AI For Students Studying

AI For Students Studying — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • World model (artificial intelligence)

    World model (artificial intelligence)

    A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act without constant real-world trial and error. World models differ from systems that merely classify or generate outputs. They simulate dynamics such as physics, object interactions, and causality. Early ideas date to the 1990s. Modern versions power robots, autonomous driving, and interactive video generation. == History == Jürgen Schmidhuber introduced the term world model in machine learning in 1990. He proposed recurrent neural networks that predict future states from observations and use those predictions to train agents. David Ha and Schmidhuber revived the concept in a 2018 paper. Their agents learned to drive virtual cars and play video games inside self-generated simulations. Yann LeCun advanced the idea in a 2022 position paper titled "A Path Towards Autonomous Machine Intelligence". He argued that intelligence requires predictive models of the world rather than pure pattern matching. LeCun proposed the joint embedding predictive architecture (JEPA) as a practical foundation. LeCun and collaborators developed several JEPA variants. V-JEPA 2 reached state-of-the-art performance on video understanding and physical reasoning at the time. It supports zero-shot robot control in unfamiliar environments. Introduced in March 2026, LeWorldModel trains stably end-to-end from raw pixels, uses two loss terms, and avoids hand-crafted heuristics. LeCun founded Advanced Machine Intelligence Labs (AMI) in 2026 to further develop world models. Google DeepMind introduced Genie in 2024. The model learned interactive environments from unlabeled internet videos. Genie 2 followed in late 2024 and added three-dimensional generation. The Genie series set benchmarks for general-purpose simulation. 
Genie 3 was introduced in August 2025. It produces photorealistic interactive worlds from text prompts; the worlds are displayed at 24 frames per second and can be explored in real time with text or image prompts. The model supports persistent three-dimensional worlds. Waymo adopted Genie 3 in February 2026 and used it to create a specialized world model for autonomous driving simulation, called the Waymo World Model. It produces synchronized camera and lidar outputs and creates edge cases that real robotaxis rarely encounter. PCMag described the edge cases as unusual. General Intuition announced a $133.7 million seed round. World Labs raised $1 billion. AMI raised $1.03 billion. In April 2026, Alibaba announced Happy Oyster, its world model designed to be real-time and “flowy”. It includes a directing mode for world building based on text and image prompts and a wandering mode for exploring the resulting world. It can generate 3-minute in-world video clips. Also in April, World Labs, co-founded by Fei-Fei Li, unveiled Spark 2.0, an open-source 3D Gaussian splatting rendering engine that targets smartphone-class devices. In June 2026, Nvidia released Cosmos 3, a family of open-weight models. It combines previously independent capabilities: physical reasoning, world simulation, and action generation. Cosmos 3 can process and generate text, image, video, audio, and action sequences. The model employs a "Mixture-of-Transformers" (MoT) approach. An autoregressive (AR) transformer handles reasoning and next-token prediction, while a diffusion transformer (DT) does multimodal generation. Encoders (ViT for vision, VAE for visual/audio, and domain-specific encoders for actions) generate a shared representation space, using 3D multi-dimensional rotary position embedding (mRoPE) for spatial and temporal information. The family includes Cosmos3-Nano (16B parameters) for workstations and Cosmos3-Super (64B parameters) for research. 
== Architecture == World models process raw sensory data such as video frames or lidar scans. They compress this input into compact latent representations. The system then predicts future representations rather than pixel-by-pixel reconstructions. Many modern world models use joint embedding predictive architecture (JEPA). An encoder turns observations into embeddings. A predictor estimates one or a suite of embeddings from the current one and an action. In some cases a critic chooses one embedding as the best result. A regularizer keeps embeddings well-behaved. The model trains by minimizing prediction error in embedding space. This approach avoids the high cost of generating every detail. Some architectures add explicit components. A fast reactive path handles immediate responses. A slower deliberative path performs longer-horizon planning. Video prediction accuracy or robot success rates are key metrics, but do not always predict real-world performance. Generative world models such as Genie 3 combine these with a simulator. They accept text prompts or layouts and output consistent video, lidar, or three-dimensional scenes. World models often train with self-supervised learning. They use large unlabeled datasets of video or robot interactions. Self-supervised learning can speed learning. Reinforcement learning can fine-tune a model for specific tasks. == Applications == World models support robot learning. Agents train inside simulations and transfer skills to the physical world. This reduces the need for dangerous or expensive real-world trials. Autonomous vehicles use world models to test rare events. Waymo's system simulates tornadoes or unusual pedestrian behavior. Companies train planners without putting vehicles on public roads. Interactive entertainment benefits from world models. Genie 3 lets users generate playable environments from simple descriptions. Game studios prototype levels faster. Scientific simulation gains from these models. 
Researchers model physical systems or biological processes at scale. Planners in logistics or urban design test strategies inside accurate digital twins. == Comparison with large language models == Both world models and large language models (LLMs) run inference on their inputs to make predictions. LLMs operate on textual inputs. They predict the next token in text sequences. They excel at language-oriented tasks such as translation or summarization. However, they lack understanding of physics. World models operate on sensor inputs such as pixels. They predict state changes in that data in latent space. This design supports planning and causal reasoning. LLMs generate fluent text but often fail at consistent physical predictions. Their architecture employs transformers with refinements such as mixture of experts. World models divide an inference task into work performed by encoders, predictors, simulators, and other pieces. They typically handle multimodal inputs such as video, lidar, radar, and audio, guided by textual prompting. LLMs power chatbots and code assistants. World models drive embodied agents that act in dynamic environments, such as autonomous driving. The two may be combined in hybrid systems. For example, an LLM handles instructions, while a world model manages low-level control. World model proponents such as LeCun claim that because LLMs are trained only on text, they have no ability to predict anything beyond text, such as real-world events. == Benchmarks == World model benchmarks test physical understanding, long-term consistency, planning, and generalization from sensor data. Meta introduced three benchmarks for V-JEPA 2. IntPhys 2 measures a model's ability to detect physics violations. It presents pairs of videos that diverge when one breaks physical rules. Humans score near 100% accuracy. V-JEPA 2 achieves little better than random chance on many conditions. 
Minimal Video Pairs (MVPBench) tests physical understanding through multiple-choice questions based on short video clips. It probes object interactions and causality. Something-Something tests action recognition. Epic-Kitchens-100 tests human action anticipation. DeepMind benchmark: interactive evaluation measures consistency over minutes of interaction, memory of off-screen objects, and response to user actions or text prompts. Waymo benchmark: output generation quality; metrics include realism, controllability (via text prompts), and usefulness for training planners in simulated worlds. However, pixel reconstruction error rate with episodic rewards often fails. Other benchmarks: Epic-Kitchens-100 (often measured with Recall@5), Ego4D, 50 Salads, Breakfast, etc. Potential benchmarks: zero-shot transfer to robots, long-horizon planning, and implausible prediction rate.
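The encoder/predictor loop described under Architecture can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the JEPA implementation of any published model: the one-dimensional environment, the fixed averaging encoder, the linear predictor, and the names `encode`, `train_predictor`, and `plan` are all hypothetical. Because the encoder is fixed here, no regularizer against embedding collapse is needed, which a learned encoder would require.

```python
import random


def step(state, action):
    """Toy environment dynamics (assumed): the state shifts by the action."""
    return state + action


def observe(state):
    """Two redundant 'sensor' readings of the same hidden state."""
    return (state, state)


def encode(obs):
    """Hypothetical fixed encoder: compress the observation to one number."""
    return 0.5 * (obs[0] + obs[1])


def train_predictor(steps=2000, lr=0.05, seed=0):
    """Fit z_next ~ w_z * z + w_a * a by minimizing error in embedding space."""
    rng = random.Random(seed)
    w_z, w_a = 0.0, 0.0
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        z = encode(observe(s))
        z_target = encode(observe(step(s, a)))  # embedding of the real next state
        err = (w_z * z + w_a * a) - z_target    # prediction error, never in pixel space
        w_z -= lr * err * z                     # stochastic gradient step on squared error
        w_a -= lr * err * a
    return w_z, w_a


def plan(z, z_goal, w_z, w_a, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the action whose predicted embedding lands closest to the goal."""
    return min(candidates, key=lambda a: abs(w_z * z + w_a * a - z_goal))


if __name__ == "__main__":
    w_z, w_a = train_predictor()
    print("learned weights:", w_z, w_a)
    print("chosen action:", plan(0.0, 0.5, w_z, w_a))
```

The planner never touches the environment: it queries only the learned predictor, which is the sense in which a world model lets an agent act without real-world trial and error.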

  • With Folded Hands ...

    With Folded Hands ...

    "With Folded Hands ..." is a 1947 science fiction novelette by American writer Jack Williamson (1908–2006). In writing it, Williamson was influenced by the aftermath of World War II, the atomic bombings of Hiroshima and Nagasaki, and his concern that "some of the technological creations we had developed with the best intentions might have disastrous consequences in the long run." The novelette first appeared in the July 1947 issue of Astounding Science Fiction and was later included in The Science Fiction Hall of Fame, Volume Two (1973) after being voted one of the best novellas up to 1965. In 1950, it was the first of several Astounding stories adapted for NBC's radio series Dimension X. == Rewrite and sequel == The 1947 publication was followed by a novel-length rewrite, with a different setting and inventor. At the behest of Astounding editor-in-chief John W. Campbell, a new ending had the robots defeated by means of what Williamson and Campbell would later christen "psionics". This novel was serialized, also in Astounding (March, April, May 1948), as ... And Searching Mind, and finally published in hardback book form as The Humanoids (1949). Much later, in 1980, Williamson followed with another sequel, The Humanoid Touch. == Plot summary == Underhill, a seller of "Mechanicals" (unthinking robots that perform menial tasks) in the small town of Two Rivers, is startled to find a competitor's store on his way home. The competitors are not humans but are small black robots who appear more advanced than anything Underhill has encountered before. They describe themselves as "humanoids". Disturbed at his encounter, Underhill rushes home to discover that his wife has taken in a new lodger, a mysterious old man named Sledge. In the course of the next day, the new Mechanicals have appeared everywhere in town. They state that they only follow the Prime Directive: "to serve and obey and guard men from harm". 
Offering their services free of charge, they replace humans as police officers, bank tellers, and more, and eventually drive Underhill out of business. Despite the humanoids' benign appearance and mission, Underhill soon realizes that, in the name of their Prime Directive, the mechanicals have essentially taken over every aspect of human life. No humans may engage in any behavior that might endanger them, and every human action is carefully scrutinized. Suicide is prohibited. Humans who resist the Prime Directive are taken away and lobotomized, so that they may live happily under the direction of the humanoids. Underhill learns that his lodger Sledge is the creator of the humanoids and is on the run from them. Sledge explains that 60 years earlier he had discovered the force of "rhodomagnetics" on the planet Wing IV and that his discovery resulted in a war that destroyed his planet. In his grief, Sledge designed the humanoids to help humanity and be invulnerable to human exploitation. However, he eventually realized that they had instead taken control of humanity, in the name of their Prime Directive, to make humans happy. The humanoids are spreading out from Wing IV to every human-occupied planet to implement their Prime Directive. Sledge and Underhill attempt to stop the humanoids by aiming a rhodomagnetic beam at Wing IV, but fail. The humanoids take Sledge away for surgery. He returns with no memory of his prior life, stating that he is now happy under the humanoids' care. Underhill is driven home by the humanoids, sitting "with folded hands," as there is nothing left to do. == Origins == In a 1991 interview, Williamson revealed how the story construction reflected events of his childhood in addition to technological extrapolations: I wrote "With Folded Hands" immediately after World War II, when the shadow of the atomic bomb had just fallen over SF and was just beginning to haunt the imaginations of people in the US. 
The story grows out of that general feeling that some of the technological creations we had developed with the best intentions might have disastrous consequences in the long run (that idea, of course, still seems relevant today). The notion I was consciously working on specifically came out of a fragment of a story I had worked on for a while about an astronaut in space who is accompanied by a robot obviously superior to him physically—i.e., the robot wasn't hurt by gravity, extremes of temperature, radiation, or whatever. Just looking at the fragment gave me the sense of how inferior humanity is in many ways to mechanical creations. That basic recognition was the essence of the story, and as I wrote it up in my notes the theme was that the perfect machine would prove to be perfectly destructive... It was only when I looked back at the story much later on that I was able to realize that the emotional reach of the story undoubtedly derived from my own early childhood, when people were attempting to protect me from all those hazardous things a kid is going to encounter in the isolated frontier setting I grew up in. As a result, I felt frustrated and over protected by people whom I couldn't hate because I loved them. A sort of psychological trap. Specifically, the first three years of my life were spent on a ranch at the top of the Sierra Madre Mountains on the headwaters of the Yaqui River in Sonora, Mexico. ... [My mother] was terrified by this environment. My father built a crib that became a psychological prison for me, particularly because my mother apparently kept me in it too long, when I needed to get out and crawl on the floor. ... In retrospect, I'm certain I projected my fears and suspicions of this kind of conditioning, and these projections became the governing emotional principle of "With Folded Hands" and The Humanoids. == Reception == In 2024, Robert Silverberg wrote an essay in which he asserted that "With Folded Hands..." 
is "probably the best story ever written about robots" and suggested that Elon Musk's Optimus Generation 2 is the realization of the "humanoids" along with their worst drawbacks.

  • The Great Automatic Grammatizator

    The Great Automatic Grammatizator

    The Great Automatic Grammatizator (published in the U.S. as The Umbrella Man and Other Stories) is a posthumous 1998 collection of thirteen short stories written by British author Roald Dahl. The stories were selected for teenagers from Dahl's adult works. All the stories included were published elsewhere originally; their sources are noted below. The stories, with the exception of the war story "Katina", possess a deadpan, ironic, bizarre, or even macabre sense of humor. They generally end with unexpected plot twists. == Stories == "The Great Automatic Grammatizator" (from Someone Like You): A mechanically-minded man reasons that the rules of grammar are fixed by certain, almost mathematical principles. By exploiting this idea, he is able to create a mammoth machine that can write a prize-winning novel in roughly fifteen minutes. The story ends on a fearful note, as more and more of the world's writers are forced into licensing their names—and all hope of human creativity—to the machine. "Mrs. Bixby and the Colonel's Coat" (from Kiss Kiss): Mrs. Bixby cheats on her dentist husband with a rich, dashing colonel. When their relationship breaks off, the colonel offers Mrs. Bixby a gorgeous and expensive mink coat. In an attempt to explain the coat away, Mrs. Bixby sets up an elaborate trick with the help of a pawn shop—but her husband learns of the ruse and manages to turn the tables. "The Butler" (from More Tales of the Unexpected): An obnoxious and newly wealthy couple employs a butler and chef to impress dinner guests. The butler recommends that the husband buy expensive wines to please his guests, and the man slavishly follows the idea. The butler and the chef reap the rewards of this idea, while making fools of the "fashionable" couple. "Man from the South" (from Someone Like You): At a seaside resort in Jamaica, a strange old man makes a bet with an American man in his late teens. 
If the young man's cigarette lighter can spark ten times without fail, the American will win a brand-new Cadillac car—but failure means losing the little finger of his right hand. The high-tension wager ensues, and with only a few sparks left, a woman—who knows only too well the cost of the old man's bets—appears and stops the madness. "The Landlady" (from Kiss Kiss): A young man traveling to London on business stops at a bed and breakfast along the way, where a strange and slightly dotty landlady eagerly welcomes him. The eccentric nature of the house, and the news that only two other young men have ever stayed there, confuse and frighten the young man. In the end, the landlady—who indulges in the hobby of taxidermy—and the boy share a drink of tea that tastes of bitter almonds, and the landlady softly smiles at what may be her latest stuffing project. "Parson's Pleasure" (from Kiss Kiss): A man discovers an extremely rare piece of Chippendale furniture at the farm of some boorish ranchers. He desperately attempts to buy the piece cheap, in the hope of selling it at auction to earn a huge profit. He manages to buy the piece "for firewood", only for the ranchers to destroy it in an attempt to make it fit into his car. "The Umbrella Man" (from More Tales of the Unexpected): On a rainy day, a mother and daughter meet a gentlemanly old man on a street corner, who offers them a beautiful silk umbrella in exchange for a pound note. They trade, and the daughter notices that the "feeble" old man suddenly seems much sprier. They follow him, and discover that the gentleman is a con artist who visits various pubs, has a drink, and then steals another umbrella to continue the cycle. "Katina" (from Over to You: Ten Stories of Flyers and Flying): A group of RAF pilots stationed in Greece during World War II discover a hauntingly beautiful young girl, whose "family is beneath the rubble." She becomes their squadron's unofficial "mascot". 
In the end, her fragile life is taken as she stands defiantly against a rain of bullets from Nazi aircraft, shaking her fists at the heavens. "The Way Up to Heaven" (from Kiss Kiss): Mrs. Foster suffers from a chronic phobia of being late for appointments. Her husband enjoys the cruel sport of purposely delaying their activities, just to rile his wife. On the day when Mrs. Foster is due to fly to Paris to visit her grandchildren, her husband engages in his usual tricks. But as Mrs. Foster rushes from their taxi to the house to find him, she hears a strange noise—and turns triumphantly toward her cab. It is only when she returns, and calls a man to "repair the lift" that was stuck between floors in the house, that readers guess Mr. Foster's fate. "Royal Jelly" (from Kiss Kiss): New parents fear for the life of their little girl, who is sickly and dangerously underweight. The husband, a beekeeper, remembers hearing of the miraculous royal jelly used by bees to transform one particular larva into a queen. He adds the mixture to his daughter's bottles, and she puts on weight at an astonishing rate. The mother senses that something is amiss, and the husband confesses his actions—along with the fact that he himself swallowed buckets of the jelly for months in an attempt to cure his impotence. The royal jelly did the trick—but the strange side-effects include a disturbing metamorphosis for both father and daughter. "Vengeance is Mine Inc." (from More Tales of the Unexpected): Two brothers who are short of cash bemoan their fate over breakfast while reading the society column of a newspaper. They hit upon a scheme to take revenge on cruel tabloid writers in exchange for money from wealthy patrons. The unconventional plan works, and the brothers line their pockets with the spoils of their plans. "Taste" (from Someone Like You): A rich man with a beautiful young daughter hosts a dinner party, inviting a famous connoisseur of fine wines. 
When the rich man boasts that he has a wine that the expert cannot identify, the stakes become frighteningly high: if he can guess the name and vintage of the wine, he will win his daughter's hand. After an elaborate show, the expert guesses correctly; however, the family's maid appears and inadvertently exposes the guest as a cheat, thus saving the girl. "Neck" (from Someone Like You): A newspaper heir finds himself suddenly engaged to the voluptuous and controlling Lady Tutton. He loses all control of his life, and only his trusted butler and friends realize how broken he is by her control. A weekend trip to their estate, however, proves the perfect opportunity for Lord Tutton to engage in revenge against his wicked wife: her head is trapped in a valuable piece of wooden sculpture, and he must decide whether to use a saw or an axe to cut her free. == Publication details == Dahl, Roald (19 January 2004). The Umbrella Man and Other Stories. Speak. ISBN 9780142400876. == Reception == Groff Conklin in 1954 called the short story "The Great Automatic Grammatizator" "an awe-inspiring fantasy-satire ... an unforgettable bit of biting nonsense".

  • Gene expression programming

    Gene expression programming

    Gene expression programming (GEP) in computer programming is an evolutionary algorithm that creates computer programs or models. These computer programs are complex tree structures that learn and adapt by changing their sizes, shapes, and composition, much like a living organism. And like living organisms, the computer programs of GEP are also encoded in simple linear chromosomes of fixed length. Thus, GEP is a genotype–phenotype system, benefiting from a simple genome to keep and transmit the genetic information and a complex phenotype to explore the environment and adapt to it. == Background == Evolutionary algorithms use populations of individuals, select individuals according to fitness, and introduce genetic variation using one or more genetic operators. Their use in artificial computational systems dates back to the 1950s where they were used to solve optimization problems (e.g. Box 1957 and Friedman 1959). But it was with the introduction of evolution strategies by Rechenberg in 1965 that evolutionary algorithms gained popularity. A good overview text on evolutionary algorithms is the book "An Introduction to Genetic Algorithms" by Mitchell (1996). Gene expression programming belongs to the family of evolutionary algorithms and is closely related to genetic algorithms and genetic programming. From genetic algorithms it inherited the linear chromosomes of fixed length; and from genetic programming it inherited the expressive parse trees of varied sizes and shapes. In gene expression programming the linear chromosomes work as the genotype and the parse trees as the phenotype, creating a genotype/phenotype system. This genotype/phenotype system is multigenic, thus encoding multiple parse trees in each chromosome. This means that the computer programs created by GEP are composed of multiple parse trees. Because these parse trees are the result of gene expression, in GEP they are called expression trees. Masood Nekoei, et al. 
utilized this expression programming style in ABC optimization to develop ABCEP, a method that outperformed other evolutionary algorithms. == Encoding: the genotype == The genome of gene expression programming consists of a linear, symbolic string or chromosome of fixed length composed of one or more genes of equal size. These genes, despite their fixed length, code for expression trees of different sizes and shapes. An example of a chromosome with two genes, each of size 9, is the string (position zero indicates the start of each gene):
012345678012345678
L+a-baccd**cLabacd
where “L” represents the natural logarithm function and “a”, “b”, “c”, and “d” represent the variables and constants used in a problem. == Expression trees: the phenotype == As shown above, the genes of gene expression programming all have the same size. However, these fixed-length strings code for expression trees of different sizes. This means that the size of the coding regions varies from gene to gene, allowing for adaptation and evolution to occur smoothly. For example, the mathematical expression $\sqrt{(a-b)(c+d)}$ can also be represented as an expression tree (figure not reproduced here) in which “Q” represents the square root function. This kind of expression tree consists of the phenotypic expression of GEP genes, whereas the genes are linear strings encoding these complex structures. For this particular example, the linear string corresponds to:
01234567
Q*-+abcd
which is the straightforward reading of the expression tree from top to bottom and from left to right. These linear strings are called k-expressions (from Karva notation). Going from k-expressions to expression trees is also very simple. For example, the following k-expression:
01234567890
Q*b**+baQba
is composed of two different terminals (the variables “a” and “b”), two different functions of two arguments (“*” and “+”), and a function of one argument (“Q”). 
Its expression gives the corresponding expression tree (figure not reproduced here). == K-expressions and genes == The k-expressions of gene expression programming correspond to the region of genes that gets expressed. This means that there might be sequences in the genes that are not expressed, which is indeed true for most genes. The reason for these noncoding regions is to provide a buffer of terminals so that all k-expressions encoded in GEP genes always correspond to valid programs or expressions. The genes of gene expression programming are therefore composed of two different domains – a head and a tail – each with different properties and functions. The head is used mainly to encode the functions and variables chosen to solve the problem at hand, whereas the tail, while also used to encode the variables, provides essentially a reservoir of terminals to ensure that all programs are error-free. For GEP genes the length of the tail is given by the formula $t = h(n_{\max} - 1) + 1$, where $h$ is the head's length and $n_{\max}$ is the maximum arity. For example, for a gene created using the set of functions F = {Q, +, −, ∗, /} and the set of terminals T = {a, b}, $n_{\max} = 2$. And if we choose a head length of 15, then $t = 15(2 - 1) + 1 = 16$, which gives a gene length $g$ of 15 + 16 = 31. The randomly generated string below is an example of one such gene:
0123456789012345678901234567890
*b+a-aQab+//+b+babbabbbababbaaa
It encodes an expression tree (figure not reproduced here) that, in this case, uses only 8 of the 31 elements that constitute the gene. It's not hard to see that, despite their fixed length, each gene has the potential to code for expression trees of different sizes and shapes, with the simplest composed of only one node (when the first element of a gene is a terminal) and the largest composed of as many nodes as there are elements in the gene (when all the elements in the head are functions with maximum arity). 
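The decoding rule and the tail-length formula described above can be sketched in Python. This is a minimal illustration rather than a reference GEP implementation: the names `ARITY`, `tail_length`, `decode`, and `to_infix` are hypothetical, and `*` is assumed to denote multiplication in the gene strings. A k-expression is read left to right and the tree is filled level by level, so a queue of nodes still waiting for arguments is enough.

```python
from collections import deque

# Number of arguments per function symbol; any other symbol is a terminal.
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}


def tail_length(head_length, max_arity):
    """Tail length t = h * (n_max - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def decode(gene):
    """Build the expression tree breadth-first.

    Returns (root, used), where each node is [symbol, children] and
    `used` is the number of gene elements that were actually expressed.
    """
    root = [gene[0], []]
    waiting = deque([root])
    pos = 1
    while waiting:
        node = waiting.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            waiting.append(child)
    return root, pos


def to_infix(node):
    """Render a decoded tree as a fully parenthesized expression."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    return f"({to_infix(children[0])} {symbol} {to_infix(children[1])})"


if __name__ == "__main__":
    tree, used = decode("Q*-+abcd")
    print(to_infix(tree), used)

    gene = "*b+a-aQab+//+b+babbabbbababbaaa"  # head of 15, tail of 16
    tree, used = decode(gene)
    print(to_infix(tree), used, "of", len(gene))
```

Because the tail holds only terminals and is long enough for the worst case (every head symbol a function of maximum arity), `decode` never runs past the end of a well-formed gene, which is why no bounds check appears in the loop.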
It's also not hard to see that it is trivial to implement all kinds of genetic modification (mutation, inversion, insertion, recombination, and so on) with the guarantee that all resulting offspring encode correct, error-free programs. == Multigenic chromosomes == The chromosomes of gene expression programming are usually composed of more than one gene of equal length. Each gene codes for a sub-expression tree (sub-ET) or sub-program. Then the sub-ETs can interact with one another in different ways, forming a more complex program. The figure shows an example of a program composed of three sub-ETs. In the final program the sub-ETs could be linked by addition or some other function, as there are no restrictions to the kind of linking function one might choose. Some examples of more complex linkers include taking the average, the median, the midrange, thresholding their sum to make a binomial classification, applying the sigmoid function to compute a probability, and so on. These linking functions are usually chosen a priori for each problem, but they can also be evolved elegantly and efficiently by the cellular system of gene expression programming. == Cells and code reuse == In gene expression programming, homeotic genes control the interactions of the different sub-ETs or modules of the main program. The expression of such genes results in different main programs or cells, that is, they determine which genes are expressed in each cell and how the sub-ETs of each cell interact with one another. In other words, homeotic genes determine which sub-ETs are called upon and how often in which main program or cell and what kind of connections they establish with one another. === Homeotic genes and the cellular system === Homeotic genes have exactly the same kind of structural organization as normal genes and they are built using an identical process. 
They also contain a head domain and a tail domain, with the difference that the heads contain now linking functions and a special kind of terminals – genic terminals – that represent the normal genes. The expression of the normal genes results as usual in different sub-ETs, which in the cellular system are called ADFs (automatically defined functions). As for the tails, they contain only genic terminals, that is, derived features generated on the fly by the algorithm. For example, the chromosome in the figure has three normal genes and one homeotic gene and encodes a main program that invokes three different functions a total of four times, linking them in a particular way. From this example it is clear that the cellular system not only allows the unconstrained evolution of linking functions but also code reuse. And it shouldn't be hard to implement recursion in this system. === Multiple main programs and multicellular systems === Multicellular systems are composed of more than one homeotic gene. Each homeotic gene in this system puts together a different combination of sub-expression trees or ADFs, creating multiple cells or main programs. For example, the program shown in the figure was created using a cellular system with two cells and three normal genes. The applications of these multicellular systems are mu

  • Audio-visual speech recognition

    Audio-visual speech recognition

    Audio-visual speech recognition (AVSR) is a technique that uses image processing, in the form of lip reading, to help speech recognition systems recognize indeterminate phones or decide between candidates of nearly equal probability. The lip-reading and speech-recognition components each work separately, and their results are then combined at the feature-fusion stage. As the name suggests, an AVSR system has two parts: an audio part and a visual part. In the audio part, features such as the log-mel spectrogram or MFCCs are extracted from the raw audio samples, and a model turns them into a feature vector. In the visual part, some variant of a convolutional neural network is generally used to compress the image into a feature vector. The two vectors (audio and visual) are then concatenated and used to predict the target.
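    The fusion step can be sketched as follows. This is a toy example under stated assumptions, not a real AVSR pipeline: the feature values and classifier weights are invented for illustration, and the names `fuse`, `softmax`, and `classify` are hypothetical. It uses the phones /m/ and /n/, which can sound alike but look different on the lips, so the visual feature breaks a tie the audio features cannot.

```python
import math


def fuse(audio_features, visual_features):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(audio_features) + list(visual_features)


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, labels):
    """Linear classifier over the fused vector; returns (label, probabilities)."""
    scores = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


if __name__ == "__main__":
    audio = [0.5, 0.5]   # assumed audio embedding: equally consistent with /m/ and /n/
    visual = [1.0]       # assumed visual embedding: 1.0 means the lips are closed
    labels = ["m", "n"]
    weights = [
        [1.0, 0.0, 2.0],   # /m/: closed lips count strongly in favor
        [0.0, 1.0, -2.0],  # /n/: closed lips count against
    ]
    label, probs = classify(fuse(audio, visual), weights, labels)
    print(label, probs)
```

    In a real system the two feature vectors would come from trained audio and visual encoders, and the classifier would be learned jointly rather than written by hand.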

    Read more →
  • Ghost in the Shell

    Ghost in the Shell

    Ghost in the Shell is a Japanese cyberpunk military science fiction media franchise that began with the eponymous manga series, written and illustrated by Masamune Shirow. The manga, first serialized from 1989 to 1991, is set in the mid-21st century and follows the fictional counter-cyberterrorist organization Public Security Section 9, led by protagonist Major Motoko Kusanagi. Animation studio Production I.G has produced several anime adaptations of the series. These include the 1995 film of the same name and its 2004 sequel, Ghost in the Shell 2: Innocence; the 2002 television series Ghost in the Shell: Stand Alone Complex and its 2020 follow-up, Ghost in the Shell: SAC_2045; and the Ghost in the Shell: Arise original video animation series. In addition, an American-produced live-action film was released in March 2017. == Overview == === Title === According to the original editor, Koichi Yuri, the title Ghost in the Shell came from Shirow at first, but when Yuri asked for "something more flashy", Shirow came up with "攻殻機動隊 Koukaku Kidou Tai (Shell Squad)". Shirow nonetheless remained attached to including "Ghost in the Shell" as well, even if in smaller type. === Setting === The manga and the many anime adaptations are primarily set in the mid-twenty-first century in the fictional Japanese city of Niihama, Niihama Prefecture (新浜県新浜市, Niihama-ken Niihama-shi), otherwise known as New Port City (ニューポートシティ, Nyū Pōto Shiti). They follow the members of Public Security Section 9, a task force consisting of various professionals skilled at solving and preventing crime, mostly with some sort of police background. Political intrigue and counter-terrorism operations are standard fare for Section 9, but the various actions of corrupt officials, companies, and cyber-criminals in each scenario are unique and require the diverse skills of Section 9's staff to prevent a series of incidents from escalating. 
In this post-cyberpunk iteration of a possible future, computer technology has advanced to the point that many members of the public possess cyberbrains, technology that allows them to interface their biological brain with various networks. The level of cyberization varies from simple minimal interfaces to almost complete replacement of the brain with cybernetic parts, in cases of severe trauma. This can also be combined with various levels of prostheses, with a fully prosthetic body enabling a person to become a cyborg. The main character of Ghost in the Shell, Major Motoko Kusanagi, is such a cyborg, having had a terrible accident befall her as a child that ultimately required her to use a full-body prosthesis to house her cyberbrain. This high level of cyberization, however, opens the brain up to attacks from highly skilled hackers, with the most dangerous being those who will hack a person to bend to their whims. == Media == === Literature === ==== Original manga ==== The original Ghost in the Shell manga ran in Japan from April 1989 to November 1990 in Kodansha's manga anthology Young Magazine, and was released in a tankōbon volume on October 2, 1991. Ghost in the Shell 2: Man-Machine Interface followed in 1997 for nine issues in Young Magazine, and was collected in the Ghost in the Shell: Solid Box on December 1, 2000. Then a standard version with modifications and new pages was published on June 26, 2001. Four stories from Man-Machine Interface that were not released in tankobon format from previous releases were later collected in Ghost in the Shell 1.5: Human-Error Processor, and published by Kodansha on July 17, 2003. Several art books have also been published for the manga. === Films === ==== Animated films ==== Two animated films based on the original manga have been released, both directed by Mamoru Oshii and animated by Production I.G. Ghost in the Shell was released in 1995 and follows the "Puppet Master" storyline from the manga. 
It was re-released in 2008 as Ghost in the Shell 2.0 with new audio and updated 3D computer graphics in certain scenes. Innocence, otherwise known as Ghost in the Shell 2: Innocence, was released in 2004, with its story based on a chapter from the first manga. ==== Live-action film ==== In 2008, DreamWorks and producer Steven Spielberg acquired the rights to a live-action film adaptation of the original Ghost in the Shell manga. On January 24, 2014, Rupert Sanders was announced as director, with a screenplay by William Wheeler. In April 2016, the full cast was announced, which included Juliette Binoche, Chin Han, Lasarus Ratuere and Kaori Momoi, and Scarlett Johansson in the lead role; the casting of Johansson drew accusations of whitewashing. Principal photography on the film began on location in Wellington, New Zealand, on February 1, 2016. Filming wrapped in June 2016. Ghost in the Shell premiered in Tokyo on March 16, 2017, and was released in the United States on March 31, 2017, in 2D, 3D and IMAX 3D. It received mixed reviews, with praise for its visuals and Johansson's performance but criticism for its script. === Television === ==== Stand Alone Complex TV series, film and ONA ==== In 2002, Ghost in the Shell: Stand Alone Complex premiered on Animax, presenting a new telling of Ghost in the Shell independent from the original manga, focusing on Section 9's investigation of the Laughing Man hacker. It was followed in 2004 by a second season titled Ghost in the Shell: S.A.C. 2nd GIG, which focused on the Individual Eleven terrorist group. The primary storylines of both seasons were compressed into OVAs broadcast as Ghost in the Shell: Stand Alone Complex The Laughing Man in 2005 and Ghost in the Shell: Stand Alone Complex Individual Eleven in 2006. Also in 2006, Ghost in the Shell: Stand Alone Complex - Solid State Society, featuring Section 9's confrontation with a hacker known as the Puppeteer, was broadcast, serving as a finale to the anime series. 
The extensive score for the series and its films was composed by Yoko Kanno. On April 7, 2017, Kodansha and Production I.G announced that Kenji Kamiyama and Shinji Aramaki would be co-directing a new Kōkaku Kidōtai anime production. On December 7, 2018, it was reported by Netflix that they had acquired the worldwide streaming rights to the original net animation (ONA) anime series, titled Ghost in the Shell: SAC_2045, and that it would premiere on April 23, 2020. The series is in 3DCG and Sola Digital Arts collaborated with Production I.G on the project. Ilya Kuvshinov handled character designs. The series had two seasons of 12 episodes each. In addition to the anime, a series of published books, two separate manga adaptations, and several video games for consoles and mobile phones have been released for Stand Alone Complex. ==== Arise OVA, TV series and film ==== In 2013, a new iteration of the series titled Ghost in the Shell: Arise premiered, taking an original look at the Ghost in the Shell world, set before the original manga. It was released as a series of four original video animation (OVA) episodes (with limited theatrical releases) from 2013 to 2014, then recompiled as a 10-episode television series under the title of Kōkaku Kidōtai: Arise - Alternative Architecture. An additional fifth OVA titled Pyrophoric Cult, originally premiering in the Alternative Architecture broadcast as two original episodes, was released on August 26, 2015. Kazuchika Kise served as the chief director of the series, with Tow Ubukata as head writer. Cornelius was brought onto the project to compose the score for the series, with the Major's new voice actress Maaya Sakamoto also providing vocals for certain tracks. Ghost in the Shell: The New Movie, also known as Ghost in the Shell: Arise − The Movie or New Ghost in the Shell, is a 2015 film directed by Kazuya Nomura that serves as a finale to the Ghost in the Shell: Arise story arc. 
The film is a continuation of the plot of the Pyrophoric Cult episode of Arise, and ties up loose ends from that arc. A manga adaptation was serialized in Kodansha's Young Magazine from March 13 to August 26, 2013. ==== 2026 anime ==== On May 25, 2024, it was announced that a new anime television series adaptation will be produced by Science Saru for a July 2026 premiere. Science Saru will be in a production committee with Bandai Namco Filmworks, Kodansha and Production I.G. The series will be directed by Monkochan, with a script by EnJoe Toh. === Video games === Ghost in the Shell was developed by Exact and released for the PlayStation on July 17, 1997, in Japan by Sony Computer Entertainment. It is a third-person shooter featuring an original storyline in which the player takes the role of a rookie member of Section 9. The video game's soundtrack, Megatech Body, features various techno artists, such as Takkyu Ishino, Scan X and Mijk Van Dijk. Several video games were also developed to tie into the Stand Alone Complex television series, in addition to a first-person shooter by Nexon and Neople titled Ghost in the Shell: Stand Alone Complex - First Assault Online,

    Read more →
  • A.I.s

    A.I.s

    A.I.s is a themed anthology of science fiction short works edited by American writers Jack Dann and Gardner Dozois. It was first published in paperback by Ace Books in December 2004. It was reissued as an ebook by Baen Books in June 2013. The book collects ten novelettes and short stories by various science fiction authors, together with a preface by the editors. == Contents == "Preface" (Jack Dann and Gardner Dozois) "Antibodies" (Charles Stross) "Trojan Horse" (Michael Swanwick) "Birth Day" (Robert Reed) "The Hydrogen Wall" (Gregory Benford) "The Turing Test" (Chris Beckett) "Dante Dreams" (Stephen Baxter) "The Names of All the Spirits" (J. R. Dunn) "From the Corner of My Eye" (Alexander Glass) "Halfjack" (Roger Zelazny) "Computer Virus" (Nancy Kress)

    Read more →
  • Document AI

    Document AI

    Document AI, also known as Document Intelligence, refers to a field of technology that employs machine learning (ML) techniques, such as natural language processing (NLP). These techniques are used to develop computer models capable of analyzing documents in a manner akin to human review. Through NLP, computer systems are able to understand relationships and contextual nuances in document contents, which facilitates the extraction of information and insights. Additionally, this technology enables the categorization and organization of the documents themselves. The applications of Document AI extend to processing and parsing a variety of semi-structured documents, such as forms, tables, receipts, invoices, tax forms, contracts, loan agreements, and financial reports. == Key features == Machine learning is utilized in Document AI to extract information from both printed and digital documents. This technology recognizes images, text, and characters in various languages, aiding in the extraction of insights from unstructured documents. The use of this technology can improve the speed and quality of decision-making in document analysis. Additionally, the automation of data extraction and validation can contribute to increased efficiency in document analysis processes. Since the early 2020s, the integration of large language models has extended Document AI beyond extraction toward generative tasks, including the automated drafting of forms, contracts, and document summaries. == Example == A business letter contains information in the form of text, as well as other types of information, such as the position of the text. For instance, a typical letter contains two addresses before the body of the text. The address at the very top (sometimes aligned to the right) is the sender address. This is normally followed by the date of the letter, with the place of writing. After this, the receiver address is listed. 
The distinction between the sender address and the receiver address is conveyed solely by the position of the address on the page, i.e. there is no textual indication like Sender: in front of the addresses. == Data dimensions and ML architecture == Data is typically divided into spatial data and time-series data. The former includes things like images, maps and graphs, while the latter includes signals such as stock prices or voice recordings. Document AI combines text data, which has a time dimension, with other types of data, such as the position of an address in a business letter, which is spatial. Historically in machine learning, spatial data was analyzed using a convolutional neural network, and temporal data using a recurrent neural network. With the advent of the dimension-type-agnostic transformer architecture, these two types of dimension can be combined more easily; Document AI is an example of this. == Benchmarks == Several public datasets are used to evaluate Document AI systems. FUNSD (Form Understanding in Noisy Scanned Documents) contains 199 annotated forms with token- and block-level labels for form understanding tasks. CORD (Consolidated Receipt Dataset) supports key information extraction from receipts. DocVQA contains approximately 50,000 questions over 12,000 document images for layout-aware visual question answering. == Common uses == Document AI systems are used to automate document processing and information extraction in business and financial workflows, including invoice and receipt processing, data entry automation, anomaly detection, mortgage processing, loan portfolio monitoring, credit risk management, and fraud detection such as counterfeit currency and fraudulent checks. They are also applied in regulatory compliance and contract analysis, including assessing changes in legal and regulatory documents. 
In real estate, Document AI supports document classification and structured information extraction for standardized processing and analytics. With the adoption of generative AI, Document AI systems can also generate and pre-fill structured documents such as contracts or business forms from natural language prompts.
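    The business-letter example above, in which only position distinguishes the sender address from the receiver address, can be sketched in a few lines. This is a toy rule under stated assumptions, not how any particular Document AI product works: the `TextBlock` type, the normalized coordinates, and the `label_addresses` helper are all hypothetical, and the blocks are assumed to have already been identified as addresses by an earlier step.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    x: float  # left edge as a fraction of page width (assumed convention)
    y: float  # top edge as a fraction of page height, 0.0 = top of page

def label_addresses(address_blocks):
    """Label address blocks purely by vertical order: topmost is the sender."""
    ordered = sorted(address_blocks, key=lambda block: block.y)
    labels = {}
    if len(ordered) >= 1:
        labels["sender"] = ordered[0]
    if len(ordered) >= 2:
        labels["receiver"] = ordered[1]
    return labels

# Hypothetical letter: the texts carry no "Sender:" or "To:" marker.
blocks = [
    TextBlock("B. Reader, 2 Low Road, Oldtown", x=0.10, y=0.25),
    TextBlock("A. Writer, 1 High Street, Newtown", x=0.60, y=0.05),
]
labels = label_addresses(blocks)
```

    A learned model would replace the fixed rule with layout-aware features, but the input is the same combination of text and spatial position.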

    Read more →
  • Creately

    Creately

    Creately is a SaaS visual collaboration tool with diagramming and design capabilities, designed by Cinergix. The application is mostly known for creating flowcharts, organization charts, project charts, UML diagrams, mind maps, and other business visuals. == History == The initial beta version of Creately was released by Chandika Jayasundara. Hiraash Thawfeek, Nick Foster and Charanjit Singh joined the project in the same year. Chandika Jayasundara is CEO of Cinergix. The headquarters of the company is located in Mentone, Victoria, Australia. == Features and reception == Creately provides predefined templates and diagram elements for use in projects. It provides a drag-and-drop feature with which both predefined and custom-made shapes can be used to build the desired diagram, and the same workspace can be shared with multiple people for collaboration. Some reviewers have cited its lack of accessible integration options as a downside. According to the company, Creately integrates with Slack and Confluence but does not yet integrate with Zapier or OneDrive. It is compatible with Google Drive and Dropbox. The software is available in both freemium and paid versions.

    Read more →
  • Jake Elwes

    Jake Elwes

    Jake Elwes is a British media artist, hacker and researcher. Their practice is the exploration of artificial intelligence (AI), queer theory and technical biases. They are known for using AI to create art in mediums such as video, performance and installation. Elwes considers themselves to be neuroqueer, and their work on queering technology addresses issues caused by the normative biases of artificial intelligence. == Education and early life == Elwes was born in London to British contemporary artist and painter Luke Elwes and Anneke, daughter of Hans Dumoulin. Elwes is the great-grandchild of Army officer James Hennessy and portrait painter Simon Elwes RA, son of Victorian opera singer Gervase Elwes. Elwes studied at the Slade School of Fine Art from 2013 to 2017, where they began using computer code as a medium. In 2016 they attended the School of Machines, Making & Make-Believe in Berlin with artist and educator Gene Kogan. Elwes was introduced to drag performance by their collaborator Dr Joe Parslow, who holds a PhD in drag performance. Drag performance has since become instrumental to Elwes' work. == Career == Elwes' work with artificial intelligence is cited as a hopeful strategy to make AI more playful and diverse. Elwes' work has been exhibited in numerous international art museums and galleries and was featured in a BBC documentary on the history of video art. They were a 2021 finalist for the Lumen Prize and received the Honorary Mention of the 2022 Prix Ars Electronica in the Interactive Art + category. They also curated and presented the opening provocation "The New Real - Artistic and Queer Visions of AI Futures" to the UK government with two drag artists at the AI UK conference 2024. Elwes is part of the Radical Faeries countercultural movement. 
They have exhibited in museums and galleries across Europe and Asia, including: Victoria and Albert Museum (London, UK) - The Zizi Show (2023-2024) for the first digital commission in their photography center's digital gallery; Pinakothek der Moderne (Munich, Germany) - Glitch. Die Kunst Der Störung (2023-2024); ZKM (Karlsruhe, Germany) - Biomedia (2021-2022); National Museum of Modern and Contemporary Art (Cheongju, South Korea) - What an Artificial World (2024); Somerset House (London, UK) - The Horror Show! (2022-2023); Gazelli Art House (London, UK) - Jake Elwes: Data • Glitch • Utopia (2023) (survey exhibition); Jut Art Museum (Taipei, Taiwan) - Future Lives, Future You (2023-2024); Max Ernst Museum (Brühl, Germany) - Surreal Futures (2023-2024); Zabludowicz Collection (London, UK) - Among the Machines (2022); Ars Electronica (Linz, Austria) - Prix Ars Electronica, CyberArts Exhibition (2022); Institute of Contemporary Arts (ICA) (London, UK) - Do Androids Dream on Silver Screens? (2023); Arebyte gallery (London, UK) - Real-Time Constraints (2020); Ming Contemporary Art Museum (McaM) (Shanghai, China) - Mind the Deep (2019); HMKV (Hartware MedienKunstVerein) (Dortmund, Germany) - House of Mirrors: Artificial Intelligence as Phantasm (2022); Today Art Museum (Beijing, China) - Future of Today: DEJA VU (2019); Science Gallery (Dublin, Ireland) - BIAS (2021-2022); Yuz Museum (Shanghai, China) - Lying Sophia and Mocking Alexa (2021); Fotomuseum Winterthur; The Onassis Foundation (Athens, Greece) - You and AI (2021); Royal College of Art (London, UK) - Event Two (2019) (50th anniversary of Computer Arts Society & Event One); Museum für Naturkunde (Berlin, Germany) - Forschungsfall Nachtigall (2019); Frankfurter Kunstverein (Frankfurt, Germany) - I am here to learn (2018); Nature Morte (Delhi, India) - Gradient Descent (2018); BALTIC Centre for Contemporary Art (Newcastle, UK) - Bloomberg New Contemporaries (2017). == Artworks == === The Zizi Project - a deepfake drag cabaret === The Zizi 
Project is a series of works that explore the interaction of drag and A.I. Currently, The Zizi Project is made up of multiple artworks. ==== Zizi - Queering the Dataset (2019) ==== Knowing that facial recognition technology statistically struggles to recognize black women or transgender people, Elwes set out to "Queer the Dataset" through an open-sourced generative adversarial network (GAN, a type of machine learning model and an early form of generative artificial intelligence). Elwes added a dataset of 1,000 photos of drag kings and queens to the GAN's 70,000 faces, collected in a standardised facial recognition dataset called the Flickr-Faces-HQ Dataset (FFHQ). They then created new simulacra faces, known as deepfakes. "We queer that data so it shifts all of the weights in this neural network from a space of normativity into a space of queerness and otherness. Suddenly all of the faces start to break down and you see mascara dissolve into lipstick and blue eye shadow turn into a pink wig," said Elwes in a 2023 interview for Artnet. ==== Zizi & Me (2020–2023) ==== Zizi & Me is an ongoing multimedia collaboration between drag queen Me The Drag Queen and a deepfake A.I. clone of Me The Drag Queen. Using neural networks trained on filmed footage, the project creates a virtual body that can mimic reference movements. The first act, which features a digital lip-sync duet to Anything You Can Do (I Can Do Better), satirises the idea of A.I. being mistaken for a human, using drag performance and cabaret to critique societal narratives about A.I. and its role in shaping identity. The project is part of The Zizi Project by Jake Elwes, which explores the intersection of drag performance and A.I. ==== The Zizi Show - A Deepfake Drag Cabaret (2020) ==== The Zizi Show is a deepfake drag act based on artificial intelligence (AI). It has been presented live and as an interactive online artwork. It is an exploration of queer culture and the algorithms, philosophy and ethics of AI. 
The Zizi Show was exhibited as the inaugural exhibition in the digital gallery at the V&A’s Photography Center from 2023 to 2024. ==== Zizi in Motion: A Deepfake Drag Utopia (Movement by Wet Mess) (2023) ==== "Zizi in Motion" is a multichannel silent video installation featuring AI-generated deepfake performances, which are dynamically re-animated through the movements of London drag artist Wet Mess. The movements of Wet Mess cause the AI-generated visuals to glitch and distort, showcasing the interaction between drag performance and artificial intelligence. The work explores the potential for queer communities to ethically and creatively reclaim and repurpose deepfake technology, using it to celebrate queer bodies and identities. === Art in the Cage of Digital Reproduction (2024) === In an act of protest on 26 November 2024, Elwes facilitated indirect access to an early access token for OpenAI’s Sora text-to-video model through a Hugging Face frontend under the account "PR Puppets". The accompanying statement called to 'denormalize the exploitation of artists by major AI companies for training data, R&D, and publicity'. The incident attracted international press coverage that called into question the role of artists in shaping the future of generative AI versus merely serving as data and credibility providers for tech giants. Elwes also coordinated a collection of mini essays, titled "Art in the Cage of Digital Reproduction", with responses and reflections from the signees and guest writers. === Installations exploring interpretation and feedback loops between neural networks === Elwes has created works based on the interpretations and misinterpretations between different neural networks and training datasets, including A.I. Interprets A.I. Interpreting ‘Against Interpretation’ (Sontag 1966) from 2023, Closed Loop from 2017, and Auto-Encoded Buddha from 2016. ==== A.I. Interprets A.I. Interpreting ‘Against Interpretation’ (Sontag 1966) (2023) ==== A.I. Interprets A.I. 
Interpreting ‘Against Interpretation’ (Sontag 1966) is a three-channel video artwork in which an AI interprets Susan Sontag’s essay into images, and then another AI reinterprets those images back into language. The piece highlights how AI-generated art can misinterpret and introduce bias. ==== Closed Loop (2017) ==== Closed Loop is a two-channel video in which two neural networks engage in a continuous feedback loop, one generating images based on the text output and the other creating text based on the image output. The work explores how AI models misinterpret and evolve in a surreal, self-perpetuating conversation, without human input. ==== Auto-Encoded Buddha (2016) ==== Auto-Encoded Buddha is a mixed-media piece in which an AI, trained on 5,000 Buddha images, attempts to generate an image of a Buddha statue. The AI struggles to accurately represent the Buddha, highlighting the limitations of early generative neural networks. The work is a tribute to Nam June Paik’s TV Buddha (1974). === CUSP (2019) === In their video work CUSP (2019), Elwes places marsh birds generated using artificial intelligence into a tidal landscape. These digitally generated and constantly shifting birds are recorded in dialogue with native

    Read more →
  • We Appreciate Power

    We Appreciate Power

    "We Appreciate Power" is a song by Canadian musician Grimes, featuring American musician Hana. It was released on November 29, 2018, billed as the lead single from her fifth studio album Miss Anthropocene, however it is only available on the Japanese and deluxe releases. The song was written and produced by Grimes, Poppy (originally), Hana and Chris Greatti. == Background and release == The song was supposed to be one of two collaborations between Grimes and American singer Poppy, for the latter's second studio album Am I a Girl?. In an interview, Poppy mentioned that she wrote two songs with Grimes; one about "destroying things" and another about "power". The other song, "Play Destroy", was featured on the album. Grimes shared a lyric of the song with a photo of her with Poppy on Twitter in May 2018. Following feuds between the two singers, the song was released by Grimes featuring singer Hana instead. On November 26, Grimes announced she would be releasing new music on November 29. Two days later, she revealed that the single is titled "We Appreciate Power" and features Hana, and shared the artwork. The release of the song was accompanied by a lyric video directed by Grimes and her brother Mac Boucher. == Music and lyrics == "We Appreciate Power" is an industrial rock, nu metal, and techno-industrial song. The track is regarded as a further step into Grimes's experimentation with guitars that started on 2015's Art Angels. The track was compared to the works of Nine Inch Nails; Jillian Mapes of Pitchfork described the song as "an immediate onslaught of mutilated noise—distorted metal guitar chug, bloody screams, a guitar loop that conjures fear and demands worship. Flashes of Nine Inch Nails' Pretty Hate Machine reverberate through the drum programming and synths." 
Brendan Klinkenberg of Rolling Stone placed the song "somewhere between power pop and straightforward industrial (with an extended bridge reminiscent of the most sweeping moments in a Final Fantasy score)" and "a distinctly 2018 take on Nine Inch Nails-esque hard-edged rock." A press release stated that the song was inspired by the North Korean band Moranbong and was written "from the perspective of a Pro-A.I. Girl Group Propaganda machine who use song, dance, sex and fashion to spread goodwill towards Artificial Intelligence." In addition, Grimes stated that simply listening to the song reduces the listener's risk of ending up on any future AI overlord's hit list when it reigns supreme, mirroring the Roko's basilisk theory. Lyrically, the song touches on transhumanist ideas such as the betterment and future of the human race, the possibilities of merging consciousness with machines to extend life indefinitely through mind uploading, and the idea that reality may be simulated. The song's chorus generated a spike in interest in the word "capitulate". == Critical reception == Pitchfork critic Jillian Mapes wrote: "If "Freak on a Leash" isn't a dealbreaker, then the supervillain allure of "We Appreciate Power" might pull you in (it legitimately slaps), but it just as well may leave you weighed down by Grimes' commitment to the absolute darkest timeline." Billboard's Gil Kaufman described the song as "a dystopian, aggressive dive into a more rock-leaning sound." Similarly, Brendan Klinkenberg of Rolling Stone called it "the most aggressive single Grimes has released to date." Noisey called the song "an absolute motherfucker of a single" and opined that it sounds "like a K-pop band covering nu-metal". Justin Kamp of Paste described the track as a "glitchy empowerment anthem that chugs along on screeching synths and Grimes' repeated exultations of power." == Personnel == Credits adapted from Tidal. 
Grimes – vocals, guitar, production, engineering; Hana – vocals, guitar, additional production; Chris Greatti – guitar, keyboards, production, engineering; Zakk Cervini – mixing

    Read more →
  • Shyster (expert system)

    Shyster (expert system)

    SHYSTER is a legal expert system developed at the Australian National University in Canberra in 1993. It was written as the doctoral dissertation of James Popple under the supervision of Robin Stanton, Roger Clarke, Peter Drahos, and Malcolm Newey. A full technical report of the expert system, and a book further detailing its development and testing have also been published. SHYSTER emphasises its pragmatic approach, and posits that a legal expert system need not be based upon a complex model of legal reasoning in order to produce useful advice. Although SHYSTER attempts to model the way in which lawyers argue with cases, it does not attempt to model the way in which lawyers decide which cases to use in those arguments. SHYSTER is of a general design, permitting its operation in different legal domains. It was designed to provide advice in areas of case law that have been specified by a legal expert using a bespoke specification language. Its knowledge of the law is acquired, and represented, as information about cases. It produces its advice by examining, and arguing about, the similarities and differences between cases. It derives its name from Shyster: a slang word for someone who acts in a disreputable, unethical, or unscrupulous way, especially in the practice of law and politics. == Methods == SHYSTER is a specific example of a general category of legal expert systems, broadly defined as systems that make use of artificial intelligence (AI) techniques to solve legal problems. Legal AI systems can be divided into two categories: legal retrieval systems and legal analysis systems. SHYSTER belongs to the latter category of legal analysis systems. Legal analysis systems can be further subdivided into two categories: judgment machines and legal expert systems. SHYSTER again belongs to the latter category of legal expert systems. 
A legal expert system, as Popple uses the term, is a system capable of performing at a level expected of a lawyer: "AI systems which merely assist a lawyer in coming to legal conclusions or preparing legal arguments are not here considered to be legal expert systems; a legal expert system must exhibit some legal expertise itself." Designed to operate in more than one legal domain, and be of specific use to the common law of Australia, SHYSTER accounts for statute law, case law, and the doctrine of precedent in areas of private law. Whilst it accommodates statute law, it is primarily a case-based system, in contradistinction to rule-based systems like MYCIN. More specifically, it was designed in a manner enabling it to be linked with a rule-based system to form a hybrid system. Although case-based reasoning possesses an advantage over rule-based systems by the elimination of complex semantic networks, it suffers from intractable theoretical obstacles: without some further theory it cannot be predicted what features of a case will turn out to be relevant. Users of SHYSTER therefore require some legal expertise. Richard Susskind argues that "jurisprudence can and ought to supply the models of law and legal reasoning that are required for computerized [sic] implementation in the process of building all expert systems in law". Popple, however, believes jurisprudence is of limited value to developers of legal expert systems. He posits that a lawyer must have a model of the law (maybe unarticulated) which includes assumptions about the nature of law and legal reasoning, but that model need not rest on basic philosophical foundations. It may be a pragmatic model, developed through experience within the legal system. Many lawyers perform their work with little or no jurisprudential knowledge, and there is no evidence to suggest that they are worse, or better, at their jobs than lawyers well-versed in jurisprudence. 
The fact that many lawyers have mastered the process of legal reasoning, without having been immersed in jurisprudence, suggests that it may indeed be possible to develop legal expert systems of good quality without jurisprudential insight. As a pragmatic legal expert system SHYSTER is the embodiment of this belief. A further example of SHYSTER’s pragmatism is its simple knowledge representation structure. This structure was designed to facilitate specification of different areas of case law using a specification language. Areas of case law are specified in terms of the cases and attributes of importance in those areas. SHYSTER weights its attributes and checks for dependence between them. In order to choose cases upon which to construct its opinions, SHYSTER calculates distances between cases and uses these distances to determine which of the leading cases are nearest to the instant case. To this end SHYSTER can be seen to adopt and expand upon nearest neighbor search methods used in pattern recognition. These nearest cases are used to produce an argument (based on similarities and differences between the cases) about the likely outcome in the instant case. This argument relies on the doctrine of precedent; it assumes that the instant case will be decided the same way as was the nearest case. SHYSTER then uses information about these nearest cases to construct a report. The report that SHYSTER generates makes a prediction and justifies that prediction by reference only to cases and their similarities and differences: the calculations that SHYSTER performs in coming to its opinion do not appear in that opinion. Safeguards are employed to warn users if SHYSTER doubts the veracity of its advice. == Results == SHYSTER was tested in four different and disparate areas of case law. 
Four specifications were written, each representing an area of Australian law: an aspect of the law of trover; the meaning of "authorization [sic]" in copyright law of Australia; the categorisation of employment contracts; and the implication of natural justice in administrative decision-making. SHYSTER was evaluated under five headings: its usefulness, its generality, the quality of its advice, its limitations, and possible enhancements that could be made to it. Despite its simple knowledge representation structure, it has shown itself capable of producing good advice, and its simple structure has facilitated the specification of different areas of law. Appreciating the difficulties encountered by legal expert systems developers in adequately representing legal knowledge can assist in appreciating the shortcomings of digital rights management technologies. Some academics believe future digital rights management systems may become sophisticated enough to permit exceptions to copyright law. To this end SHYSTER's attempt to model "authorization [sic]" in the Copyright Act can be viewed as pioneering work in this field. The term "authorization [sic]" is undefined in the Copyright Act. Consequently, a number of cases have been before the courts seeking answers as to what conduct amounts to authorisation. The main contexts in which the issue has arisen are analogous to permitted exceptions to copyright currently prevented by most digital rights management technologies: "home taping of recorded materials, photocopying in educational institutions and performing works in public". When applied to one case concerning compact cassettes, SHYSTER successfully agreed that Amstrad did not authorise the infringement. == SHYSTER-MYCIN == Popple highlighted the most obvious avenue of future research using SHYSTER as the development of a rule-based system, and the linking together of that rule-based system with the existing case-based system to form a hybrid system. 
This intention was eventually realised by Thomas O’Callaghan, the creator of SHYSTER-MYCIN: a hybrid legal expert system first presented at ICAIL '03, 24–28 June 2003 in Edinburgh, Scotland. MYCIN is an existing medical expert system, which was adapted for use with SHYSTER. MYCIN’s controversial "certainty factor" is not used in SHYSTER-MYCIN. The reason for this is the difficulty in scientifically establishing how certain a fact is in a legal domain. The rule-based approach of the MYCIN part is used to reason with the provisions of an Act of Parliament only. This hybrid system enables the case-based system (SHYSTER) to determine open textured concepts when required by the rule-based system (MYCIN). The ultimate conclusion of this joint endeavour is that a hybrid approach is preferred in the creation of legal expert systems where "it is appropriate to use rule-based reasoning when dealing with statutes, and…case-based reasoning when dealing with cases".
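
The nearest-case selection described in the Methods section can be illustrated with a small sketch. The excerpt does not give SHYSTER's actual distance measure, so the weighted mismatch count below is an assumption, as are the case names, attribute values, and weights; it only shows the general idea of picking the leading case nearest to the instant case and predicting the same outcome.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    attributes: tuple  # one 0/1 value per attribute (hypothetical encoding)
    outcome: str


def weighted_distance(a, b, weights):
    """Sum the weights of the attributes on which two cases differ.

    This measure is an assumption for illustration, not SHYSTER's own formula.
    """
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def nearest_case(instant, leading_cases, weights):
    """Return the leading case with the smallest distance to the instant case."""
    return min(
        leading_cases,
        key=lambda c: weighted_distance(instant, c.attributes, weights),
    )


# Hypothetical leading cases and attribute weights.
leading = [
    Case("Case A", (1, 0, 1, 1), "plaintiff"),
    Case("Case B", (0, 1, 0, 0), "defendant"),
    Case("Case C", (1, 1, 0, 1), "plaintiff"),
]
weights = (2.0, 1.0, 1.0, 0.5)
instant = (0, 1, 0, 1)

nearest = nearest_case(instant, leading, weights)
# Under the doctrine of precedent, predict the nearest case's outcome.
predicted_outcome = nearest.outcome
```

A report built on this would then argue from the attributes the instant case shares with, and differs from, the nearest case, without exposing the distance arithmetic.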

  • Waveform graphics

    Waveform graphics

    Waveform graphics is a simple vector graphics system introduced by Digital Equipment Corporation (DEC) on the VT55 and VT105 terminals in the mid-1970s. It was used to produce graphics output from mainframes and minicomputers. DEC used the term "waveform graphics" to refer specifically to the hardware, but it was used more generally to describe the whole system. The system was designed to use as little computer memory as possible. At any given X location it could draw two dots at given Y locations, making it suitable for producing two superimposed waveforms, line charts or histograms. Text and graphics could be mixed, and there were additional tools for drawing axes and markers. The waveform graphics system was used only for a short period of time before it was replaced by the more sophisticated ReGIS system, first introduced on the VT125 in 1981. ReGIS allowed the construction of arbitrary vectors and other shapes. Whereas DEC normally provided a backward compatible solution in newer terminal models, they did not choose to do this when ReGIS was introduced, and waveform graphics disappeared from later terminals. == Description == Waveform graphics was introduced on the VT55 terminal in October 1975, an era when memory was extremely expensive. Although it was technically possible to produce a bitmap display using a framebuffer using technology of the era, the memory needed to do so at a reasonable resolution was typically beyond the price point that made it practical. All sorts of systems were used to replace computer memory with other concepts, like the storage tubes used in the Tektronix 4010 terminals, or the zero memory racing-the-beam system used in the Atari 2600. DEC chose to attack this problem through a clever use of a small buffer representing only the vertical positions on the screen. Such a system could not draw arbitrary shapes, but would allow the display of graph data. 
The system was based on a 512 by 236 pixel display, producing 512 vertical columns along the X-axis, and 236 horizontal rows on the Y-axis. Y locations were counted up from the bottom, so the coordinate 0,0 was in the lower left, and 511, 235 in the upper right. Had this been implemented using a framebuffer with each location represented by a single bit, 512 × 236 × 1 = 120,832 bits, or 15,104 bytes, would have been required. At the time, memory cost about $50 per kilobyte, so the buffer alone would cost over $700 (equivalent to $4,570 in 2025). Instead, the waveform graphics system used one byte of memory for each X axis location, with the byte's value representing the Y location. This required only 512 bytes for each graph, a total of 1024 bytes for the two graphs. Drawing a line required the programmer to construct a series of Y locations and send them as individual points; the terminal could not connect the dots itself. To make this easier, the terminal automatically incremented the X location every time a Y coordinate was received, so a graph line could be sent as a long string of numbers for subsequent Y locations instead of having to repeatedly send the X location every time. Drawing normally started by sending a single instruction to set the initial X location, often 0 on the left, and then sending in data for the entire curve. The system also included storage for up to 512 markers on both lines. These were always drawn centered on the Y value of the line they were associated with, meaning that a simple on/off indication for X locations was all that was needed, requiring only 1024 bits, or 128 bytes, in total. The markers extended 16 pixels vertically, and could only be aligned on 16-pixel boundaries, so they were not necessarily centered across the underlying graph. Markers were used to indicate important points on the graph, where a symbol of some sort would normally be used. 
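
The memory figures above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are illustrative, and the price of $50 per kilobyte is the approximate figure quoted in the text.

```python
WIDTH, HEIGHT = 512, 236          # display resolution from the text
PRICE_PER_KIB = 50                # approximate dollars per kilobyte

# A one-bit-per-pixel framebuffer.
framebuffer_bits = WIDTH * HEIGHT
framebuffer_bytes = framebuffer_bits // 8
framebuffer_cost = framebuffer_bytes / 1024 * PRICE_PER_KIB

# Waveform graphics: one byte (a Y value) per X column, for two graphs.
graph_bytes = WIDTH * 1 * 2

# Markers: one on/off bit per X column, for two graphs.
marker_bytes = WIDTH * 2 // 8

print(framebuffer_bits, framebuffer_bytes, framebuffer_cost)
print(graph_bytes, marker_bytes)
```

One byte per column is enough because a byte holds values 0 to 255, which covers all 236 rows.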
The system also allowed a vertical line to be drawn for every horizontal location and a horizontal one at every vertical location. These were also stored as simple on/off bits, requiring another 128 bytes of memory. These lines were used to draw axes and scale lines, or could be used for a screen-spanning crosshair cursor. A separate set of two 7-bit registers held additional information about the drawing style and other settings. Although complex from the user's perspective, this system was easy to implement in hardware. A cathode ray tube produces a display by scanning the screen in a series of horizontal motions, moving down one vertical line after each horizontal scan. At any given instant during this process, the display hardware examines a few memory locations to see if anything needs to be displayed. For instance, it can determine whether to draw a marker on graph 0 by examining register 1 to see if markers are turned on, looking in the marker buffer to see if there is a 1 at the current X location, and then examining the Y location of graph 0 to see if it is within 16 pixels of the current scan line. If all of these are true, a spot is drawn to present that portion of the marker. As this will be true for 16 vertical locations during the scanning process, a 16-pixel high marker will be drawn. Sold alone, the VT55 was priced at $2,496 (equivalent to $16,295 in 2025). Like other models of the VT50 series, the terminal could be equipped with an optional wet-paper printer in a panel on the right of the screen. This added $800 (equivalent to $5,223 in 2025) to the price. DEC also offered the VT55 in a package with a small model of the PDP-11 to create one model of the DEClab 11/03 system. The DEClab normally sold for $14,000 (equivalent to $91,397 in 2025) with a DECwriter II (LA36) hard-copy terminal, or for $15,000 (equivalent to $97,925 in 2025) with the VT55. 
The system had I/O channels for up to 15 lab devices, and included libraries for FORTRAN and BASIC for reading the data and creating graphs. The fairly extensive VT55 Programmers Manual covered the latter in depth. == Commands and data == Data was sent to the terminal using an extended set of codes similar to those introduced on the VT52. VT52 codes generally started with the ESC character (octal 33, decimal 27) and were then followed by a single letter instruction. For instance, the string of four characters ESC H ESC J would reposition the cursor in the upper left (home) and then clear the screen from that point down. These codes were basically modeless; triggered by the ESC, the resulting escape mode automatically exited again when the command was complete. Escape codes could be interspersed with display text anywhere in the stream of data. In contrast, the graphics system was entirely modal, with escape sequences being sent to cause the terminal to enter or exit graph drawing mode. Data sent between these two codes were interpreted by the graphics hardware, so text and graphics could not be mixed in a single stream of instructions. Graphics mode was entered by sending the string ESC 1, and exited again with the string ESC 2. Even the commands within the graphics mode were modal; characters were interpreted as being additional data for the previous load character (command) until another load character was seen. 
Ten load characters were available:
@ - no operation, used to tell the terminal the last command is no longer active
A - load data into register 0, selecting the drawing mode for the two graphs
I - load data into register 1, selecting other drawing options
H - load the starting X position (Horizontal) for the following commands
B - load data for Y locations for graph 0 starting at the H position selected earlier
J - load data for Y locations for graph 1 starting at the H position selected earlier
C - store a marker on graph 0 at the following X location
K - store a marker on graph 1 at the following X location
D - draw a horizontal line at the given Y location
L - draw a vertical line at the given X location
X and Y locations were sent as 10-bit binary numbers, encoded as ASCII characters, with 5 bits per character. This means that any number within the 1024 number space (2^10) can be stored as a string of two characters. To ensure the characters can be transmitted over 7-bit links, the pattern 01 is placed in front of both 5-bit numbers, producing 7-bit ASCII values that are always within the printable range. This results in a somewhat complex encoding algorithm. For instance, to encode the decimal value 102, first convert it to the 10-bit binary pattern 0001100110. Split that into upper and lower 5-bit parts, 00011 and 00110. Then prepend binary 01 to each part to produce the 7-bit numbers 0100011 and 0100110. Convert these back to decimal, giving 35 and 38, and look up those characters in an ASCII chart, finding # and &. These have to be sent to the terminal least significant character first. If these were being used to set the X coordinate, the complete string would be H&#. When used as X and Y locations for the graphs, extra digits were ignored. For instance, the 512 pixel X axis r
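
The coordinate encoding described above (two 5-bit halves, each prefixed with the bits 01, sent least significant character first) can be sketched as follows. The function names are hypothetical, and the sketch follows only the description in this text rather than the terminal's manual.

```python
PREFIX = 0b0100000  # the bits "01" placed in front of a 5-bit value
MASK = 0b11111      # lower five bits


def encode_coordinate(value: int) -> str:
    """Encode a 10-bit value as two characters, least significant first."""
    if not 0 <= value < 1024:
        raise ValueError("value must fit in 10 bits")
    low = value & MASK
    high = (value >> 5) & MASK
    return chr(PREFIX | low) + chr(PREFIX | high)


def decode_coordinate(chars: str) -> int:
    """Reverse encode_coordinate: strip the prefixes and rejoin the halves."""
    low = ord(chars[0]) & MASK
    high = ord(chars[1]) & MASK
    return (high << 5) | low


def set_x_command(x: int) -> str:
    """Build the H load character followed by the encoded X position."""
    return "H" + encode_coordinate(x)


# Round-trip the rightmost column of the 512-pixel X axis.
command = set_x_command(511)
assert decode_coordinate(command[1:]) == 511
```

Every encoded character falls between ASCII 32 and 63, which is why the data survives a 7-bit link.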

  • Neural architecture search

    Neural architecture search

    Neural architecture search (NAS) is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par with or outperform hand-designed architectures. Methods for NAS can be categorized according to the search space, search strategy and performance estimation strategy used: The search space defines the type(s) of ANN that can be designed and optimized. The search strategy defines the approach used to explore the search space. The performance estimation strategy evaluates the performance of a possible ANN from its design (without constructing and training it). NAS is closely related to hyperparameter optimization and meta-learning and is a subfield of automated machine learning (AutoML). == Reinforcement learning == Reinforcement learning (RL) can underpin a NAS search strategy. Barret Zoph and Quoc Viet Le applied NAS with RL targeting the CIFAR-10 dataset and achieved a network architecture that rivals the best manually-designed architecture for accuracy, with an error rate of 3.65, 0.09 percent better and 1.05x faster than a related hand-designed model. On the Penn Treebank dataset, that model composed a recurrent cell that outperforms LSTM, reaching a test set perplexity of 62.4, or 3.6 perplexity better than the prior leading system. On the PTB character language modeling task it achieved bits per character of 1.214. Learning a model architecture directly on a large dataset can be a lengthy process. NASNet addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of convolutional cells to return feature maps that serve two main functions when convoluting an input feature map: normal cells that return maps of the same extent (height and width) and reduction cells in which the returned feature map height and width is reduced by a factor of two. 
For the reduction cell, the initial operation applied to the cell's inputs uses a stride of two (to reduce the height and width). The learned aspect of the design included elements such as which lower layer(s) each higher layer took as input, the transformations applied at that layer, and how multiple outputs are merged at each layer. In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the ImageNet dataset by stacking copies of this cell, each with its own parameters. The approach yielded accuracy of 82.7% top-1 and 96.2% top-5. This exceeded the best human-invented architectures at a cost of 9 billion fewer FLOPS, a reduction of 28%. The system continued to exceed the manually-designed alternative at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. E.g., for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the COCO dataset. In the so-called Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with policy gradient to select a subgraph that maximizes the validation set's expected reward. The model corresponding to the subgraph is trained to minimize a canonical cross entropy loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold less than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached test perplexity of 55.8. == Evolution == An alternative approach to NAS is based on evolutionary algorithms, which have been employed by several groups. An evolutionary algorithm for neural architecture search generally performs the following procedure. 
First a pool consisting of different candidate architectures along with their validation scores (fitness) is initialised. At each step the architectures in the candidate pool are mutated (e.g.: 3x3 convolution instead of a 5x5 convolution). Next the new architectures are trained from scratch for a few epochs and their validation scores are obtained. This is followed by replacing the lowest scoring architectures in the candidate pool with the better, newer architectures. This procedure is repeated multiple times and thus the candidate pool is refined over time. Mutations in the context of evolving ANNs are operations such as adding or removing a layer, which include changing the type of a layer (e.g., from convolution to pooling), changing the hyperparameters of a layer, or changing the training hyperparameters. On CIFAR-10 and ImageNet, evolution and RL performed comparably, while both slightly outperformed random search. == Bayesian optimization == Bayesian Optimization (BO), which has proven to be an efficient method for hyperparameter optimization, can also be applied to NAS. In this context, the objective function maps an architecture to its validation error after being trained for a number of epochs. At each iteration, BO uses a surrogate to model this objective function based on previously obtained architectures and their validation errors. One then chooses the next architecture to evaluate by maximizing an acquisition function, such as expected improvement, which provides a balance between exploration and exploitation. Acquisition function maximization and objective function evaluation are often computationally expensive for NAS, and make the application of BO challenging in this context. Recently, BANANAS has achieved promising results in this direction by introducing a high-performing instantiation of BO coupled to a neural predictor. 
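
The evolutionary procedure described in the Evolution section above can be sketched as a toy loop. Everything concrete here is an assumption for illustration: architectures are reduced to lists of kernel sizes, and `fitness` is a hypothetical stand-in for training a candidate for a few epochs and measuring its validation score.

```python
import random

KERNELS = [1, 3, 5, 7]  # hypothetical choices for each layer's kernel size


def fitness(arch):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return -sum(abs(k - 3) for k in arch)


def mutate(arch, rng):
    """Change the kernel size of one randomly chosen layer."""
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = rng.choice([k for k in KERNELS if k != child[i]])
    return child


def evolve(pool_size=8, layers=4, steps=200, seed=0):
    rng = random.Random(seed)
    # Initialise the candidate pool with random architectures.
    pool = [[rng.choice(KERNELS) for _ in range(layers)]
            for _ in range(pool_size)]
    initial_best = max(fitness(a) for a in pool)
    for _ in range(steps):
        child = mutate(rng.choice(pool), rng)
        worst = min(pool, key=fitness)
        # Replace the lowest-scoring candidate only if the child is better.
        if fitness(child) > fitness(worst):
            pool[pool.index(worst)] = child
    return initial_best, max(pool, key=fitness)


initial_best, best = evolve()
```

Because a candidate is only ever replaced by a strictly better one, the best score in the pool cannot decrease from one step to the next.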
== Hill-climbing == Another group used a hill climbing procedure that applies network morphisms, followed by short cosine-annealing optimization runs. The approach yielded competitive results, requiring resources on the same order of magnitude as training a single network. E.g., on CIFAR-10, the method designed and trained a network with an error rate below 5% in 12 hours on a single GPU. == Multi-objective search == While most approaches solely focus on finding architecture with maximal predictive performance, for most practical applications other objectives are relevant, such as memory consumption, model size or inference time (i.e., the time required to obtain a prediction). Because of that, researchers created a multi-objective search. LEMONADE is an evolutionary algorithm that adopted Lamarckism to efficiently optimize multiple objectives. In every generation, child networks are generated to improve the Pareto frontier with respect to the current population of ANNs. Neural Architect is claimed to be a resource-aware multi-objective RL-based NAS with network embedding and performance prediction. Network embedding encodes an existing network to a trainable embedding vector. Based on the embedding, a controller network generates transformations of the target network. A multi-objective reward function considers network accuracy, computational resource and training time. The reward is predicted by multiple performance simulation networks that are pre-trained or co-trained with the controller network. The controller network is trained via policy gradient. Following a modification, the resulting candidate network is evaluated by both an accuracy network and a training time network. The results are combined by a reward engine that passes its output back to the controller network. 
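
The Pareto frontier mentioned in the multi-objective section above can be made concrete with a short sketch. The candidate names and their (error, model size) values are invented for illustration, and both objectives are assumed to be minimized; this shows only the dominance test, not LEMONADE or Neural Architect themselves.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives)
                   for other in candidates.values())
    }


# Hypothetical networks: (validation error, model size in MB).
candidates = {
    "net_a": (0.05, 120.0),
    "net_b": (0.07, 40.0),
    "net_c": (0.08, 90.0),
    "net_d": (0.04, 300.0),
}
front = pareto_front(candidates)
```

Here `net_c` drops out because `net_b` is both more accurate and smaller, while the other three each represent a different trade-off.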
== One-shot models == RL or evolution-based NAS require thousands of GPU-days of searching/training to achieve state-of-the-art computer vision results as described in the NASNet, mNASNet and MobileNetV3 papers. To reduce computational cost, many recent NAS methods rely on the weight-sharing idea. In this approach, a single overparameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large Directed Acyclic Graph (DAG) whose subgraphs are different candidate neural networks. Thus, in a supernetwork, the weights are shared among a large number of different sub-architectures that have edges in common, each of which is considered as a path within the supernet. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the learned parameters, a set of architecture parameters are learnt to depict preference for one module over another. Such methods reduce the required computational resources to only a few GPU days. More recent works further combine this weight-sharing paradigm, with a continuous relaxation of the search space, which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of ne

  • SQLf

    SQLf

    SQLf is an extension of SQL that applies fuzzy set theory to express flexible (fuzzy) queries over traditional (or "regular") relational databases. Among the known extensions proposed to SQL, it is currently the most complete, because it allows the use of diverse fuzzy elements in all the constructions of the SQL language. SQLf is the only known proposal for a flexible query system that allows linguistic quantification over sets of rows in queries, achieved through the extension of SQL nesting and partitioning structures with fuzzy quantifiers. It also allows the use of quantifiers to qualify the quantity of search criteria satisfied by single rows. Several mechanisms are proposed for query evaluation, the most important being the one based on the derivation principle. This consists in deriving classic queries that produce, given a threshold t, a t-cut of the result of the fuzzy query, so that the additional processing cost of using a fuzzy language is diminished. == Basic block == The fundamental querying structure of SQLf is the multi-relational block. This structure is based on the three basic operations of relational algebra (projection, Cartesian product and selection) and on the application of fuzzy set concepts. The result of a SQLf query is a fuzzy set of rows, that is, a fuzzy relation instead of a regular relation. A basic block in SQLf consists of a SELECT clause, a FROM clause and an optional WHERE clause. The semantics of this query structure are as follows. The SELECT clause corresponds to the projection. It specifies the relations' attributes (or attribute expressions) that will be selected. The resulting table is a fuzzy set and it is given in decreasing order of satisfaction degree. The SELECT clause also specifies a calibration that is intended to restrict the set of rows retrieved. There are two kinds of calibrations: quantitative and qualitative. 
In quantitative calibration the user specifies the number of results to be retrieved, so that the query will retrieve the rows with the highest membership degrees up to the number of required answers. In qualitative calibration the user specifies a minimum level of satisfaction that every retrieved row must reach. The FROM clause corresponds to the Cartesian product. The query is evaluated on the Cartesian product of the relations that are specified in this clause. The WHERE clause corresponds to the selection. It specifies the condition for which the satisfaction degree will be calculated. Rows that do not satisfy the condition at all are rejected. This condition is a fuzzy predicate that may involve any attribute of the relations. The following is an example of a SELECT query that returns a list of hotels that are cheap. The query retrieves all rows from the Hotels table that satisfy the fuzzy predicate cheap defined by the fuzzy set μ=(∞, ∞, 25, 30). The result is sorted in descending order by the membership degree of the query.
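
The semantics of the basic block can be illustrated in Python rather than in SQLf syntax, which is not reproduced here. The sketch assumes the fuzzy set for cheap is a left-shoulder trapezoid (full membership up to 25, falling linearly to zero at 30); the hotel rows and the function names are hypothetical.

```python
def cheap(price: float) -> float:
    """Membership degree of a price in the assumed 'cheap' fuzzy set."""
    if price <= 25:
        return 1.0
    if price >= 30:
        return 0.0
    return (30 - price) / 5


def fuzzy_select(rows, predicate, top_n=None, min_degree=0.0):
    """Score rows, reject degree 0, sort descending, then calibrate.

    top_n models quantitative calibration; min_degree models qualitative
    calibration.
    """
    scored = [(name, predicate(price)) for name, price in rows]
    kept = [row for row in scored if row[1] > 0 and row[1] >= min_degree]
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept if top_n is None else kept[:top_n]


# Hypothetical Hotels table: (name, price).
hotels = [("Alpha", 20.0), ("Bravo", 27.5), ("Charlie", 29.0), ("Delta", 45.0)]

all_cheap = fuzzy_select(hotels, cheap)
two_best = fuzzy_select(hotels, cheap, top_n=2)
at_least_half = fuzzy_select(hotels, cheap, min_degree=0.5)
```

The row for Delta is rejected because its degree is 0, and the remaining rows come back ordered from most to least cheap.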
