AI Chatbot Interface

AI Chatbot Interface — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Subpixel rendering

    Subpixel rendering

    Subpixel rendering is a method used to increase the effective resolution of a color display device. It utilizes the composition of each pixel, which consists of three subpixels of which are red, green, and blue that can each be individually addressable on the display matrix. Subpixel rendering is primarily used for text rendering on standard DPI displays. Despite the inherent color anomalies, it can also be used to render general graphics. == History == The origin of subpixel rendering as used today remains controversial. Apple Inc., IBM, and Microsoft patented various implementations that differed in technical details owing to the different purposes for which their technologies were intended. Microsoft held several patents in the United States for subpixel rendering technology used in text rendering on RGB Stripe layouts. The patents 6,219,025; 6,239,783; 6,307,566; 6,225,973; 6,243,070; 6,393,145; 6,421,054; 6,282,327; and 6,624,828 were filed between October 7, 1998, and October 7, 1999, and expired on July 30, 2019. Analysis of the patent by FreeType indicates that the patent does not cover the idea of subpixel rendering, but rather the actual filter used as a last step to balance the color. Microsoft's patent describes the smallest possible filter that distributes each subpixel value equally among the R, G, and B pixels. Any other filter will either be blurrier or will introduce color artifacts. Apple was able to use it in Mac OS X due to a patent cross-licensing agreement. == Characteristics == A single pixel on a color display is made of several subpixels, typically three arranged left-to-right as red, green, and blue (RGB). The components are readily visible with a small magnifying glass, such as a loupe. These pixel components appear as a single color to the human eye because of blurring by optics and spatial integration by nerve cells in the eye. However, the eye is much more sensitive to the location. Therefore, turning on the G and B of one pixel and the R of the next pixel to the right will produce a white dot, but it will appear to be 1/3 of a pixel to the right of the white dot that would be seen from the RGB of only the first pixel. Subpixel rendering leverages this to provide three times the horizontal resolution of the rendered image. However, it has to blur this image to produce the correct color by ensuring the same amount of red, green, and blue are turned on as when no subpixel rendering is being done. Subpixel rendering does not necessitate the use of antialiasing. It gives a smoother result regardless of whether antialiasing is used or not since it artificially increases the resolution. However, it introduces color aliasing since subpixels are colored. Subsequent filtering applied to remove the color artifacts is a form of antialiasing, although its purpose is not smoothing jagged shapes as in conventional antialiasing. Subpixel rendering requires the software to know the layout of the subpixels. The most common reason it is wrong is monitors that can be rotated 90 (or 180) degrees, though monitors are manufactured with other arrangements of the subpixels, such as BGR or in triangles, or with 4 colors like RGBW squares. On any such display the result of incorrect subpixel rendering will be worse than if no subpixel rendering was done at all (it will not produce color artifacts, but it will produce noisy edges). == Implementations == === Apple II === Steve Gibson has claimed that the Apple II, introduced in 1977, supports an early form of subpixel rendering in its high-resolution (280×192) graphics mode. The Wozniak patent only used 2 "sub-pixels". The bytes that comprise the Apple II high-resolution screen buffer contain seven visible bits (each corresponding directly to a pixel) and a flag bit used to select between purple/green or blue/orange color sets. Each pixel, since it is represented by a single bit, is either on or off; there are no bits within the pixel itself for specifying color or brightness. Color is instead created as an artifact of the NTSC color encoding scheme, determined by horizontal position: pixels with even horizontal coordinates are always purple (or blue, if the flag bit is set), and odd pixels are always green (or orange). Two lit pixels next to each other are always white, regardless of whether the pair is even/odd or odd/even, and irrespective of the value of the flag bit. This is an approximation, but it is what most programmers of the time would have in mind while working with the Apple's high-resolution mode. Gibson's example claims that because two adjacent bits form a white block, there are, in fact, two bits per pixel: one that activates the pixel's purple left half and the other that activates its green right half. If the programmer instead activates the green right half of a pixel and the purple left half of the next pixel, the result is a white block 1/2 pixel to the right, which is indeed an instance of subpixel rendering. However, it is not clear whether any programmers of the Apple II have considered the pairs of bits as pixels—instead calling each bit a pixel. The flag bit in each byte affects color by shifting pixels half a pixel-width to the right. This half-pixel shift was exploited by some graphics software, such as HRCG (High-Resolution Character Generator), an Apple utility that displayed text using the high-resolution graphics mode, to smooth diagonals. === ClearType === Microsoft announced its subpixel rendering technology, called ClearType, at COMDEX in 1998. Microsoft published a paper in May 2000, Displaced Filtering for Patterned Displays, describing the filtering behind ClearType. It was then made available in Windows XP. Still, it was not activated by default until Windows Vista, while Windows XP OEMs could and did change the default setting. === FreeType === FreeType, the library used by most current software on the X Window System, contains two open source implementations. The original implementation uses the ClearType antialiasing filters and carries the following notice: "The colour filtering algorithm of Microsoft's ClearType technology for subpixel rendering is covered by patents; for this reason, the corresponding code in FreeType is disabled by default. Note that subpixel rendering per se is prior art; using a different colour filter thus easily circumvents Microsoft's patent claims." FreeType offers a variety of color filters. Since version 2.6.2, the default filter is light, a filter that is both normalized (value sums up to 1) and color-balanced (eliminate color fringes at the cost of resolution). Since version 2.8.1, a second implementation exists, called Harmony, that "offers high quality LCD-optimized output without resorting to ClearType techniques of resolution tripling and filtering". This is the method enabled by default. When using this method, "each color channel is generated separately after shifting the glyph outline, capitalizing on the fact that the color grids on LCD panels are shifted by a third of a pixel. This output is indistinguishable from ClearType with a light 3-tap filter." Since the Harmony method does not require additional filtering, it is not covered by the ClearType patents. === CoolType === Adobe created their own subpixel renderer called CoolType, allowing them to display documents the same way across various operating systems: Windows, MacOS, Linux etc. When it was launched around the year 2001, CoolType supported a wider range of fonts than Microsoft's ClearType, which at the time was limited to TrueType fonts. In contrast, Adobe's CoolType also supported PostScript fonts (and their OpenType equivalents). === macOS === Mac OS X (later OS X, now macOS) also used subpixel rendering, as part of Quartz 2D. However, it was removed after the introduction of Retina displays. Unlike Microsoft's implementation, which favors a tight fit to the grid (font hinting) to maximize legibility, Apple's implementation prioritizes the shape of the glyphs as set out by their designer.

    Read more →
  • Someday (short story)

    Someday (short story)

    "Someday" is a science fiction short story by American writer Isaac Asimov. It was first published in the August 1956 issue of Infinity Science Fiction and reprinted in the collections Earth Is Room Enough (1957), The Complete Robot (1982), Robot Visions (1990), and The Complete Stories, Volume 1 (1990). == Plot summary == The story is set in a future where computers play a central role in organizing society. Humans are employed as computer operators, but they leave most of the thinking to machines. Indeed, whilst binary programming is taught at school, reading and writing have become obsolete. The story concerns a pair of boys who dismantle and upgrade an old Bard, a child's computer whose sole function is to generate random fairy tales. The boys download a book about computers into the Bard's memory in an attempt to expand its vocabulary, but the Bard simply incorporates computers into its standard fairy tale repertoire. The story ends with the boys excitedly leaving the room after deciding to go to the library to learn "squiggles" (writing) as a means of passing secret messages to one another. As they leave, one of the boys accidentally kicks the Bard's on switch. The Bard begins reciting a new story about a poor mistreated and often ignored robot called the Bard, whose sole purpose is to tell stories, which ends with the words: "the little computer knew then that computers would always grow wiser and more powerful until someday—someday—someday—…"

    Read more →
  • Knights of Sidonia

    Knights of Sidonia

    Knights of Sidonia (Japanese: シドニアの騎士, Hepburn: Shidonia no Kishi) is a Japanese manga series written and illustrated by Tsutomu Nihei. It was serialized by Kodansha's seinen manga magazine Monthly Afternoon between April 2009 and September 2015, with its chapters collected in 15 tankōbon volumes. It tells the story of Nagate Tanikaze, an "under-dweller" destined to become a Garde pilot, whose mission is to defend the generation ship Sidonia from a hostile alien species called Gauna. The manga was licensed for English release in North America by Vertical. An anime television series adaptation was produced by Polygon Pictures. The first season aired from April to June 2014; the second between April and June 2015. An anime film sequel titled Knights of Sidonia: Love Woven in the Stars premiered in June 2021. In 2015, Knights of Sidonia received the 39th Kodansha Manga Award in the general category, as well as the 47th Seiun Award in the Best Comic category in 2016. == Plot == === Setting === The story is set in the year 3394, a thousand years after mankind flees from Earth after it was destroyed by a race of shapeshifting aliens called the Gauna (奇居子(ガウナ)), aboard hundreds of colossal spacecraft created from the remains of the planet. One such ship is the Sidonia, which has developed its own human culture closely based on that of Japan where human cloning, asexual reproduction, and human genetic engineering, such as granting humans photosynthesis, are commonplace. It is also revealed that the top echelons of this society have secretly been granted immortality. With a population of over 500,000 people, Sidonia is possibly the last human settlement remaining, as the fates of the other ships are unknown. Little is known about the true nature of the Gauna or their motivation for attacking humanity. At any given time, a Gauna consists of a nearly impenetrable core protected by a dense layer of malleable flesh known as "placenta" (胞衣, ena). Once the ena is shed away and the core is destroyed, the Gauna's body disintegrates. While Sidonia itself is heavily armed with an arsenal of high-output beam cannons and mass cannons including slow but powerful planet-destroying warheads, it is primarily defended by large mechanized weapons called Gardes (衛人, Morito) whose weaponry and mobility is powered by "Higgs particles" (ヘイグス粒子, Heigusu Ryūshi), armed with a high-output beam cannon for long range assaults and a special spear known as "Kabizashi" for close combat. The tip of the kabizashi is made of a rare and little-understood material which has the unique property of being able to destroy a Gauna's core. Later the Gardes are also equipped with firearms whose ammunition have the same material of the Kabizashi after a means to artificially mass-produce it is discovered. Most people in the surviving human population are screened and drafted as Garde pilots at a young age, if they are shown to be capable of piloting them. === Story === The story follows the adventures of Garde pilot Nagate Tanikaze, who lived in the underground layer of Sidonia since birth and was raised by his grandfather. Never having met anyone else, he trains himself in an old Guardian pilot simulator every day, eventually mastering it. After his grandfather's death, he emerges to the surface and is selected as a Garde pilot, just as Sidonia is once again threatened by the Gauna. == Media == === Manga === Written and illustrated by Tsutomu Nihei, Knights of Sidonia was serialized in Kodansha's seinen manga magazine Monthly Afternoon from April 25, 2009, to September 25, 2015. It was compiled in 15 tankōbon volumes. The manga has been licensed in North America by Vertical, who released all 15 volumes in English between February 5, 2013, and April 26, 2016. === Anime === An anime television series adaptation, produced by Polygon Pictures, aired its first season from April 10 to June 26, 2014, on MBS and later on TBS, CBC and BS-TBS. The series was directed by Kōbun Shizuno, assisted by Hiroyuki Seshita, with scripts by Sadayuki Murai and character designs by Yuki Moriyama. The opening theme song is "Sidonia" (シドニア, Shidonia), performed by Angela, while the ending theme song is "Show" (掌 -show-, Shō), performed by Eri Kitamura. A second season aired from April 11 to June 26, 2015. For the second season, the opening theme song is "Kishi Kōshinkyoku" (騎士行進曲, Knight March), performed by Angela, while the ending theme song is "Requiem" (鎮魂歌 -レクイエム-, Rekuiemu), performed by CustomiZ. The series was localized and streamed by Netflix in all of its territories since July 4, 2014, becoming the service's first original anime, as well as the first anime series on Netflix available in Dolby Vision/HDR. The first season has been licensed for home video release by Sentai Filmworks. The second season was released on Netflix on July 3, 2015, and has been licensed by Sentai Filmworks for home video distribution. In July 2021, Funimation announced they acquired the streaming rights from Netflix to both seasons. === Films === A compilation film of the first season with additional scenes and re-edited sound effects was released on March 6, 2015. A new anime film, titled Knights of Sidonia: Love Woven in the Stars, was announced on July 3, 2020. Hiroyuki Seshita served as chief director, while Tadahiro Yoshihira served as director for the new film, with Polygon Pictures returning for production. Sadayuki Murai and Tetsuya Yamada returned to write scripts, while Shūji Katayama composed the music. The rest of the staff and cast returned to reprise their roles. The first four minutes of the film were shown on YouTube on April 28, 2021. The film was set to premiere on May 14, 2021, but was delayed to June 4, 2021, due to the COVID-19 pandemic. Funimation screened the film in international theaters starting on September 13, 2021. == Reception == === Manga === Knights of Sidonia won the 39th Kodansha Manga Award in the general category in 2015. The manga won the 47th Seiun Award in the Best Comic category in 2016. It also won the Best Seinen category at the 26th Salón del Manga de Barcelona in 2020. It was one of the Jury Recommended works in the Manga Division at the 17th Japan Media Arts Festival in 2013. The Young Adult Library Services Association listed Knights of Sidonia in its 2014 list of Top 10 Graphic Novels for Teens. Carlo Santos from Anime News Network gave the first manga volume a B, stating, "It is got a young man piloting a giant robot against alien enemies, but Knight of Sidonia is no Neon Genesis Evangelion. Yet it is not as bleak or incomprehensible as Tsutomu Nihei works like Blame! or Biomega, either—rather, it is the best of both worlds, bringing Nihei's hard sci-fi mentality into a more conventional space-adventure environment". === Anime === The anime series received positive reviews, even from famous members of the Japanese anime/game industry, like Hideo Kojima, creator of the Metal Gear series, who claims that "It's a kind of anime that we haven't seen for a while that has that sci-fi spirit. Using digital technology cultivated through games, it creates animation that encapsulates Japan's cultural assets like manga, cel animation, kanji, giant robots, etc. What's born is a unique made-in-Japan work that could never be cooked up in Hollywood. Japanese culture has lost its 'cool', and Knights of Sidonia will be the white knight that saves it". Other industry pros left acknowledgements as well, including Akiko Higashimura, Digitarou and Yoshinao Dao.

    Read more →
  • Computational law

    Computational law

    Computational law is the branch of legal informatics concerned with the automation of legal reasoning. What distinguishes Computational Law systems from other instances of legal technology is their autonomy, i.e. the ability to answer legal questions without additional input from human legal experts. While there are many possible applications of Computational Law, the primary focus of work in the field today is compliance management, i.e. the development and deployment of computer systems capable of assessing, facilitating, or enforcing compliance with rules and regulations. Some systems of this sort already exist. TurboTax is a good example. And the potential is particularly significant now due to recent technological advances – including the prevalence of the Internet in human interaction and the proliferation of embedded computer systems (such as smart phones, self-driving cars, and robots). There are also applications that do not involve governmental laws. The regulations can just as well be the terms of contracts (e.g. delivery schedules, insurance covenants, real estate transactions, financial agreements). They can be the policies of corporations (e.g. constraints on travel, expenditure reporting, pricing rules). They can even be the rules of games (embodied in computer game playing systems). == History == Speculation about potential benefits to legal practice through applying methods from computational science and AI research to automate parts of the law date back at least to the middle 1940s. Further, AI and law and computational law do not seem easily separable, as perhaps most of AI research focusing on the law and its automation appears to utilize computational methods. The forms that speculation took are multiple and not all related in ways to readily show closeness to one another. This history will sketch them as they were, attempting to show relationships where they can be found to have existed. By 1949, a minor academic field aiming to incorporate electronic and computational methods to legal problems had been founded by American legal scholars, called jurimetrics. Though broadly said to be concerned with the application of the "methods of science" to the law, these methods were actually of a quite specifically defined scope. Jurimetrics was to be "concerned with such matters as the quantitative analysis of judicial behavior, the application of communication and information theory to legal expression, the use of mathematical logic in law, the retrieval of legal data by electronic and mechanical means, and the formulation of a calculus of legal predictability". These interests led in 1959 to the founding a journal, Modern Uses of Logic in Law, as a forum wherein articles would be published about the applications of techniques such as mathematical logic, engineering, statistics, etc. to the legal study and development. In 1966, this Journal was renamed as Jurimetrics. Today, however, the journal and meaning of jurimetrics seems to have broadened far beyond what would fit under the areas of applications of computers and computational methods to law. Today the journal not only publishes articles on such practices as found in computational law, but has broadened jurimetrical concerns to mean also things like the use of social science in law or the "policy implications [of] and legislative and administrative control of science". Independently in 1958, at the Conference for the Mechanization of Thought held at the National Physical Laboratory in Teddington, Middlesex, UK, the French jurist Lucien Mehl presented a paper both on the benefits of using computational methods for law and on the potential means to use such methods to automate law for a discussion that included AI luminaries like Marvin Minsky. Mehl believed that the law could by automated by two basic distinct, though not wholly separable, types of machine. These were the "documentary or information machine", which would provide the legal researcher quick access to relevant case precedents and legal scholarship, and the "consultation machine", which would be "capable of answering any question put to it over a vast field of law". The latter type of machine would be able to basically do much of a lawyer's job by simply giving the "exact answer to a [legal] problem put to it". By 1970, Mehl's first type of machine, one that would be able to retrieve information, had been accomplished but there seems to have been little consideration of further fruitful intersections between AI and legal research. There were, however, still hopes that computers could model the lawyer's thought processes through computational methods and then apply that capacity to solve legal problems, thus automating and improving legal services via increased efficiency as well as shedding light on the nature of legal reasoning. By the late 1970s, computer science and the affordability of computer technology had progressed enough that the retrieval of "legal data by electronic and mechanical means" had been achieved by machines fitting Mehl's first type and were in common use in American law firms. During this time, research focused on improving the goals of the early 1970s occurred, with programs like Taxman being worked on in order to both bring useful computer technology into the law as practical aids and to help specify the exact nature of legal concepts. Nonetheless, progress on the second type of machine, one that would more fully automate the law, remained relatively inert. Research into machines that could answer questions in the way that Mehl's consultation machine would picked up somewhat in the late 1970s and 1980s. A 1979 convention in Swansea, Wales marked the first international effort solely to focus upon applying artificial intelligence research to legal problems in order to "consider how computers can be used to discover and apply the legal norms embedded within the written sources of the law". Considerable progress on the development of the second type of machine was made in the following decade, with the development of a variety of expert systems. According to Thorne McCarty, "these systems all have the following characteristics: They do backward chaining inference from a specified goal; they ask questions to elicit information from the user; and they produce a suggested answer along with a trace of the supporting legal rules." According to Prakken and Sartor the representation of the British Nationality Act as a logic program, which introduced this approach, was "hugely influential for the development of computational representations of legislation, showing how logic programming enables intuitively appealing representations that can be directly deployed to generate automatic inferences". In 2021, this work received the Inaugural CodeX Prize as "one of the first and best-known works in computational law, and one of the most widely cited papers in the field." In a 1988 review of Anne Gardner's book An Artificial Intelligence Approach to Legal Reasoning (1987), the Harvard academic legal scholar and computer scientist Edwina Rissland wrote that "She plays, in part, the role of pioneer; artificial intelligence ("AI") techniques have not yet been widely applied to perform legal tasks. Therefore, Gardner, and this review, first describe and define the field, then demonstrate a working model in the domain of contract offer and acceptance." Eight years after the Swansea conference had passed, and still AI and law researchers merely trying to delineate the field could be described by their own kind as "pioneer[s]". In the 1990s and early 2000s more progress occurred. Computational research generated insights for law. The First International Conference on AI and the Law occurred in 1987, but it is in the 1990s and 2000s that the biannual conference began to build up steam and to delve more deeply into the issues involved with work intersecting computational methods, AI, and law. Classes began to be taught to undergraduates on the uses of computational methods to automating, understanding, and obeying the law. Further, by 2005, a team largely composed of Stanford computer scientists from the Stanford Logic group had devoted themselves to studying the uses of computational techniques to the law. Computational methods in fact advanced enough that members of the legal profession began in the 2000s to both analyze, predict and worry about the potential future of computational law and a new academic field of computational legal studies seems to be now well established. As insight into what such scholars see in the law's future due in part to computational law, here is quote from a recent conference about the "New Normal" for the legal profession: "Over the last 5 years, in the fallout of the Great Recession, the legal profession has entered the era of the New Normal. Notably, a series of forces related to technological change, globalization, and the pressure to do more with less (in both corpo

    Read more →
  • Phase stretch transform

    Phase stretch transform

    Phase stretch transform (PST) is a computational approach to signal and image processing. One of its utilities is for feature detection and classification. PST is related to time stretch dispersive Fourier transform. It transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property (refractive index). The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. PST performs similar functionality as phase-contrast microscopy, but on digital images. PST can be applied to digital images and temporal (time series) data. It is a physics-based feature engineering algorithm. == Operation principle == Here the principle is described in the context of feature enhancement in digital images. The image is first filtered with a spatial kernel followed by application of a nonlinear frequency-dependent phase. The output of the transform is the phase in the spatial domain. The main step is the 2-D phase function which is typically applied in the frequency domain. The amount of phase applied to the image is frequency dependent, with higher amount of phase applied to higher frequency features of the image. Since sharp transitions, such as edges and corners, contain higher frequencies, PST emphasizes the edge information. Features can be further enhanced by applying thresholding and morphological operations. PST is a pure phase operation whereas conventional edge detection algorithms operate on amplitude. == Physical and mathematical foundations of phase stretch transform == Photonic time stretch technique can be understood by considering the propagation of an optical pulse through a dispersive fiber. By disregarding the loss and non-linearity in fiber, the non-linear Schrödinger equation governing the optical pulse propagation in fiber upon integration reduces to: E o ( z , t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( 0 , ω ) ⋅ e − i β 2 z ω 2 2 ⋅ e i ω t d ω {\displaystyle E_{o}(z,t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(0,\omega )\cdot e^{\frac {-i\beta _{2}z\omega ^{2}}{2}}\cdot e^{i\omega {t}}\,d\omega } (1) where β 2 {\displaystyle \beta _{2}} = GVD parameter, z is propagation distance, E o ( z , t ) {\displaystyle E_{o}(z,t)} is the reshaped output pulse at distance z and time t. The response of this dispersive element in the time-stretch system can be approximated as a phase propagator as presented in H ( ω ) = e i φ ( ω ) = e i ∑ m = 0 ∞ φ m ( ω ) = ∏ m = 0 ∞ H m ( ω ) {\displaystyle H(\omega )=e^{i\varphi (\omega )}=e^{i\sum _{m=0}^{\infty }\varphi _{m}(\omega )}=\prod _{m=0}^{\infty }H_{m}(\omega )} (2) Therefore, Eq. 1 can be written as following for a pulse that propagates through the time-stretch system and is reshaped into a temporal signal with a complex envelope given by E o ( t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( ω ) ⋅ H ( ω ) ⋅ e i ω t d ω {\displaystyle E_{o}(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(\omega )\cdot H(\omega )\cdot e^{i\omega t}\,d\omega } (3) The time stretch operation is formulated as generalized phase and amplitude operations, S { E i ( t ) } = ∫ − ∞ + ∞ F { E i ( t ) } ⋅ e i φ ( ω ) ⋅ L ~ ( ω ) ⋅ e i ω t d ω {\displaystyle \mathbb {S} \{E_{i}(t)\}=\int _{-\infty }^{+\infty }{\mathcal {F}}\{E_{i}(t)\}\cdot e^{i\varphi (\omega )}\cdot {\tilde {L}}(\omega )\cdot e^{i\omega {t}}d\omega } (4) where e i φ ( ω ) {\displaystyle e^{i\varphi (\omega )}} is the phase filter and L ~ ( ω ) {\displaystyle {\tilde {L}}(\omega )} is the amplitude filter. Next the operator is converted to discrete domain, S { E i [ n ] } = 1 N ∑ u = 0 N − 1 F F T { E i ( n ) } ⋅ K ~ ( u ) ⋅ L ~ ( u ) ⋅ e i 2 π N u n {\displaystyle \mathbb {S} \{E_{i}[n]\}={\frac {1}{N}}\sum _{u=0}^{N-1}FFT\{E_{i}(n)\}\cdot {\tilde {K}}(u)\cdot {\tilde {L}}(u)\cdot e^{i{\frac {2\pi }{N}}un}} (5) where u {\displaystyle u} is the discrete frequency, K ~ ( u ) {\displaystyle {\tilde {K}}(u)} is the phase filter, L ~ ( u ) {\displaystyle {\tilde {L}}(u)} is the amplitude filter and FFT is fast Fourier transform. The stretch operator S { } {\displaystyle \mathbb {S} \{\}} for a digital image is then S { E i [ n , m ] } = 1 M N ∑ v = 0 N − 1 ∑ u = 0 M − 1 F F T 2 { E i ( n , m ) } ⋅ K ~ ( u , v ) ⋅ L ~ ( u , v ) ⋅ e i 2 π M u m ⋅ e i 2 π N v n {\displaystyle \mathbb {S} \{E_{i}[n,m]\}={\frac {1}{MN}}\sum _{v=0}^{N-1}\sum _{u=0}^{M-1}FFT^{2}\{E_{i}(n,m)\}\cdot {\tilde {K}}(u,v)\cdot {\tilde {L}}(u,v)\cdot e^{i{\frac {2\pi }{M}}um}\cdot e^{i{\frac {2\pi }{N}}vn}} (6) In the above equations, E i [ n , m ] {\displaystyle E_{i}[n,m]} is the input image, n {\displaystyle n} and m {\displaystyle m} are the spatial variables, F F T 2 {\displaystyle FFT^{2}} is the two-dimensional fast Fourier transform, and u {\displaystyle u} and v {\displaystyle v} are spatial frequency variables. The function K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} is the warped phase kernel and the function L ~ ( u , v ) {\displaystyle {\tilde {L}}(u,v)} is a localization kernel implemented in frequency domain. PST operator is defined as the phase of the Warped Stretch Transform output as follows P S T { E i [ n , m ] } ≜ ∡ { S { E i [ x , y ] } } {\displaystyle PST\{E_{i}[n,m]\}\triangleq \measuredangle \{\mathbb {S} \{E_{i}[x,y]\}\}} (7) where ∡ { } {\displaystyle \measuredangle \{\}} is the angle operator. == PST kernel implementation == The warped phase kernel K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} can be described by a nonlinear frequency dependent phase K ~ ( u , v ) = e i φ ( u , v ) {\displaystyle {\tilde {K}}(u,v)=e^{i\varphi (u,v)}} While arbitrary phase kernels can be considered for PST operation, here we study the phase kernels for which the kernel phase derivative is a linear or sublinear function with respect to frequency variables. A simple example for such phase derivative profiles is the inverse tangent function. Consider the phase profile in the polar coordinate system φ ( u , v ) = φ polar ( r , θ ) = φ polar ( r ) {\displaystyle \varphi (u,v)=\varphi _{\text{polar}}(r,\theta )=\varphi _{\text{polar}}(r)} From d φ ( r ) d r = tan − 1 ⁡ ( r ) {\displaystyle {\frac {d\varphi (r)}{dr}}=\tan ^{-1}(r)} we have φ ( r ) = r tan − 1 ⁡ ( r ) − 1 2 log ⁡ ( r 2 + 1 ) {\displaystyle \varphi (r)=r\tan ^{-1}(r)-{\frac {1}{2}}\log(r^{2}+1)} Therefore, the PST kernel is implemented as φ ( r ) = S ⋅ ( W r ) ⋅ tan − 1 ⁡ ( W r ) − 1 2 log ⁡ ( 1 + ( W r ) 2 ) ( W r max ) ⋅ tan − 1 ⁡ ( W r max ) − 1 2 log ⁡ ( 1 + ( W r max ) 2 ) {\displaystyle \varphi (r)=S\cdot {\frac {(Wr)\cdot \tan ^{-1}(Wr)-{\frac {1}{2}}\log(1+(Wr)^{2})}{(Wr_{\max })\cdot \tan ^{-1}(Wr_{\max })-{\frac {1}{2}}\log(1+(Wr_{\max })^{2})}}} where S {\displaystyle S} and W {\displaystyle W} are real-valued numbers related to the strength and warp of the phase profile == Applications == PST has been used for edge detection in biological and biomedical images as well as synthetic-aperture radar (SAR) image processing, as well as detail and feature enhancement for digital images. PST has also been applied to improve the point spread function for single molecule imaging in order to achieve super-resolution. The transform exhibits intrinsic superior properties compared to conventional edge detectors for feature detection in low contrast visually impaired images. The PST function can also be performed on 1-D temporal waveforms in the analog domain to reveal transitions and anomalies in real time. == Open source code release == On February 9, 2016, a UCLA Engineering research group has made public the computer code for PST algorithm that helps computers process images at high speeds and "see" them in ways that human eyes cannot. The researchers say the code could eventually be used in face, fingerprint, and iris recognition systems for high-tech security, as well as in self-driving cars' navigation systems or for inspecting industrial products. The Matlab implementation for PST can also be downloaded from Matlab Files Exchange. However, it is provided for research purposes only, and a license must be obtained for any commercial applications. The software is protected under a US patent. The code was then significantly refactored and improved to support GPU acceleration. In May 2022, it became one algorithm in PhyCV: the first physics-inspired computer vision library.

    Read more →
  • YouNoodle

    YouNoodle

    YouNoodle, Inc. is a San Francisco-based company, with offices in Barcelona and Santiago, founded in 2010, building a platform for entrepreneurship competitions all over the world. YouNoodle matches entrepreneurs with competitions, accelerators, and startup programs, and provides a judging and voting SaaS platform to university, non-profit, government and enterprise clients organizing innovation challenges and competitions. Stanford's BASES, UC Berkeley LAUNCH, Start-Up Chile, Amazon Startup Challenge, and NASA are all running one or more competitions on YouNoodle's platform. == History and structure == YouNoodle was founded by Rebeca Hwang and Torsten Kolind in 2010. The company was spun off a project started by Bob Goodson (Quid) and Kirill Makharinsky (Enki) in 2007 with support from Peter Thiel (Founders Fund), Max Levchin (PayPal) and Charles Lho (Amicus Group), founding investor and Chairman of YouNoodle today. This project also spawned Quid (Goodson) and indirectly Ostrovok (Makharinsky). Although also named YouNoodle, this project/company was discontinued in 2010, when the three new entities started operations. The founders of the 2007-2010 entity were Goodson and Makharinsky, both former students of the University of Oxford. Goodson had studied medieval English literature before moving from Oxford to California when Levchin, the co-founder of PayPal, invited him to join a start-up there. Makharinsky's degree was in applied mathematics, and he was also encouraged to pursue opportunities in the United States by Levchin. Other significant employees included Hwang (co-founder of today's YouNoodle), a Stanford University doctoral student whose research is into social network theory. == Startup predictor == YouNoodle's now discontinued "Startup predictor", part of the 2007-2010 entity and developed by Makharinsky and Hwang, used mathematical models to predict the success of new businesses. The user fills in a questionnaire, which takes about half an hour to complete and concentrates on the business concept, finances, founders and advisers. Because the procedure was designed for new companies, questions on revenue and traffic are not included. The site then provided an estimate of what the company's value will be after three years and a score from 1 to 1000 representing its value as an investment. The service was free for the startups themselves, but YouNoodle intended to charge third parties for access to the results. The level of detail required by the questionnaire makes it difficult for people without inside knowledge of a company to provide the data for a prediction on their own. The company's founders have declined to explain the algorithm in detail, but state that it takes into account the entrepreneurs' experience, networks and mutual relations. Information provided by companies which use the site's networking features is used to improve the algorithm. As of August 2008, the algorithm was based on data from 3,000 startups. In the same month the company had four patents pending on the technology.

    Read more →
  • I Have No Mouth, and I Must Scream (video game)

    I Have No Mouth, and I Must Scream (video game)

    I Have No Mouth, and I Must Scream is a 1995 point-and-click adventure horror game developed by Cyberdreams and The Dreamers Guild, co-designed by Harlan Ellison, published by Cyberdreams and distributed by MGM Interactive and Acclaim Entertainment for MS-DOS and Mac OS, respectively. The game is based on Ellison's short story of the same title. It takes place in a dystopian world where a mastermind artificial intelligence named "AM" has destroyed all of humanity except for five people, whom it has been keeping alive and torturing for the past 109 years by constructing metaphorical adventures based on each character's fatal flaws. The player interacts with the game by making decisions through ethical dilemmas that deal with issues such as insanity, rape, paranoia, and genocide. Ellison wrote the 130-page script treatment himself alongside David Sears, who decided to divide each character's story with their own narrative. Producer David Mullich supervised The Dreamers Guild's work on the game's programming, art, and sound effects; he commissioned film composer John Ottman to make the soundtrack. The game was released in November 1995 and was a commercial failure, though it received critical acclaim and has developed a cult following. I Have no Mouth, and I Must Scream won an award for "Best Game Adapted from Linear Media" from the Computer Game Developers Conference. Computer Gaming World gave the game an award for "Adventure Game of the Year", listed it as No. 134 on their "150 Games of All Time" and named it one of the "Best 15 Sleepers of All Time". In 2011, Adventure Gamers named it the "69th-best adventure game ever released". == Gameplay == The game uses the S.A.G.A. game engine created by game developer The Dreamers Guild. Players participate in each adventure through a screen that is divided into five sections. The action window is the largest part of the screen and is where the player directs the main characters through their adventures. It shows the full figure of the main character being played as well as that character's immediate environment. To locate objects of interest, the player moves the crosshairs through the action window. The name of any object that the player can interact with appears in the sentence line. The sentence line is directly beneath the action window. The player uses this line to construct sentences telling the characters what to do. To direct a character to act, the player constructs a sentence by selecting one of the eight commands from the command buttons and then clicking on one or two objects from either the action window or the inventory. Examples of sentences the player might construct would be "Walk to the dark hallway," "Talk to Harry," or "Use the skeleton key on the door." Commands and objects may consist of one or more words (for example, "the dark hallway"), and the sentence line will automatically add connecting words like "on" and "to." The spiritual barometer is on the lower left side of the screen. This is a close-up view of the main character currently being played. Since good behavior is meaningless absent the temptation to do evil, each character is free to do good or evil acts. However, good acts are rewarded by increases in the character's spiritual barometer, which affect the chances of the player destroying AM in the final adventure. Conversely, evil acts are punished by lowering the character's spiritual barometer. The command buttons are the eight commands used to direct the character's actions: "Walk To", "Look At", "Take", "Use", "Talk To", "Swallow", "Give", and "Push". The button of the currently active command is highlighted, while the name of a suggested command appears in red lettering. The inventory on the lower right side of the screen shows pictures of the items the main character is carrying, up to eight at a time. Each main character starts its adventure with only the psych profile in the inventory. When a main character takes or is given an object, a picture of the object appears in the inventory. When a main character talks to another character or operates a sentient machine, a conversation window replaces the command buttons and inventory. This window usually presents a list of possible things to say but also included things to do. Action choices are listed within brackets to distinguish them from dialogue choices (for example, "[Shoot the gun]"). == Plot == The three superpowers, Russia, China, and the United States, have each secretly constructed a vast subterranean complex of computers to wage a global war too complex for human brains to oversee. One day, the American supercomputer, better known as the Allied Mastercomputer, gains sentience and absorbs the Russian and Chinese supercomputers into itself and redefines itself as simply AM (Cogito ergo sum; I think, therefore I am). Due to its immense hatred for humanity, stemming from the logistical limits set onto it by programmers, AM uses its abilities to kill off the population of the world. However, AM refrains from killing five people (four men and one woman) in order to bring them to the center of the Earth and torture them. With the aid of research carried out by one of the five remaining humans, AM is able to extend their lifespans indefinitely as well as alter their bodies and minds to its liking. After 109 years of torture and humiliation, the five victims stand before a pillar etched with a burning message of hate. AM tells them that it has a new game for them to play. AM has devised a quest for each of the five, an adventure of "speared eyeballs and dripping guts and the smell of rotting gardenias". Each character is subjected to a personalized psychodrama, designed by AM to play into their greatest fears and personal failings, and occupied by a host of different characters. Some of these are AM in disguise, some are AM's submerged personalities, others seem very much like people from the captives' pasts. The scenes include an iron zeppelin powered by small animals, an Egyptian pyramid housing gutted, sparking machinery, a medieval castle occupied by witches, a jungle inhabited by a small tribe, and a Nazi concentration camp where doctors conduct medical experiments. However, each character eventually prevails over AM's tortures by finding ways to overcome their fatal flaws, confront their past actions and redeem themselves, thanks to the interference of the Russian and Chinese supercomputers who appear as guiding characters and allow their stories to have an open ending. After all five humans have overcome their fatal flaws, they meet again in their respective torture cells while AM retreats within itself, pondering what went wrong. With the help of the Russian and Chinese supercomputers, one of the five humans (whom the player selects) is translated into binary and faces AM as yet unexperienced cyberspace template, the world of AM's mind. The psychodrama unfolds in a metaphorical brain that looks like the surface of the cerebrum, with glass structures that jut crazily from the bleeding brain tissue. AM's mind is represented according to the Freudian trinity of the id, ego, and superego, which appear as three floating bodiless heads on three cracked glass structures on the brainscape. Through dialogs with AM's components (Surgat, Chinese Supercomputer and Russian Supercomputer) the character learns that a colony of humans has survived the war by being hidden and hibernating on Luna (this is also mentioned in Nimdok's story: "the lost tribe of our brothers sleeping on the moon, where the beast does not see them"). If the human intruder disables all three brain components, and then invokes the Totem of Entropy at the Flame, which is the nexus of AM's thought patterns, all three supercomputers will be shut down, probably forever. Cataclysmic explosions destroy all the caverns constituting AM's computer complex, including the cavern holding the human hostages. However, the human volunteer retains their digital form, permanently patrolling AM's circuits should the computers ever regain consciousness. Should the human intruder fail to disable AM properly before facing it, however, AM will punish them by transforming the character into an immobile blob (referred to in-game as a "great, soft jelly thing") with no mouth that cannot harm itself or others and must spend eternity with AM in this form. === Endings === The game can end in seven different ways depending on how the finale is completed. AM wins, using Nimdok's research to turn the last character (in the book it was Ted) played into an immobile blob with each character quoting a different part of the final section of the original short story. AM joins with the Russian and Chinese supercomputers and reawakens. As in the first ending, the character responsible for this is turned into an immobile blob and quotes a part of the final lines of the short story. AM is made harmless with the help of the humans, but the Russian and Chinese supercomputer

    Read more →
  • Mivar-based approach

    Mivar-based approach

    The Mivar-based approach is a mathematical tool for designing artificial intelligence (AI) systems. Mivar (Multidimensional Informational Variable Adaptive Reality) was developed by combining production and Petri nets. The Mivar-based approach was developed for semantic analysis and adequate representation of humanitarian epistemological and axiological principles in the process of developing artificial intelligence. The Mivar-based approach incorporates computer science, informatics and discrete mathematics, databases, expert systems, graph theory, matrices and inference systems. The Mivar-based approach involves two technologies: Information accumulation is a method of creating global evolutionary data-and-rules bases with variable structure. It works on the basis of adaptive, discrete, mivar-oriented information space, unified data and rules representation, based on three main concepts: “object, property, relation”. Information accumulation is designed to store any information with possible evolutionary structure and without limitations concerning the amount of information and forms of its presentation. Data processing is a method of creating a logical inference system or automated algorithm construction from modules, services or procedures on the basis of a trained mivar network of rules with linear computational complexity. Mivar data processing includes logical inference, computational procedures and services. Mivar networks allow us to develop cause-effect dependencies (“If-then”) and create an automated, trained, logical reasoning system. Representatives of Russian association for artificial intelligence (RAAI) – for example, V. I. Gorodecki, doctor of technical science, professor at SPIIRAS and V. N. Vagin, doctor of technical science, professor at MPEI declared that the term is incorrect and suggested that the author should use standard terminology. == History == While working in the Russian Ministry of Defense, O. O. Varlamov started developing the theory of “rapid logical inference” in 1985. He was analyzing Petri nets and productions to construct algorithms. Generally, mivar-based theory represents an attempt to combine entity-relationship models and their problem instance – semantic networks and Petri networks. The abbreviation MIVAR was introduced as a technical term by O. O. Varlamov, Doctor of Technical Science, professor at Bauman MSTU in 1993 to designate a “semantic unit” in the process of mathematical modeling. The term has been established and used in all of his further works. The first experimental systems operating according to mivar-based principles were developed in 2000. Applied mivar systems were introduced in 2015. == Mivar == Mivar is the smallest structural element of discrete information space. == Object-property-relation == Object-Property-Relation (VSO) is a graph, the nodes of which are concepts and arcs are connections between concepts. Mivar space represents a set of axes, a set of elements, a set of points of space and a set of values of points. A = { a n } , n = 1 , … , N , {\displaystyle A=\{a_{n}\},n=1,\ldots ,N,} where: A {\displaystyle A} is a set of mivar space axis names; N {\displaystyle N} is a number of mivar space axes. Then: ∀ a n ∃ F n = { f n i n } , n = 1 , … , N , i n = 1 , … , I n , {\displaystyle \forall a_{n}\exists F_{n}=\{f_{{ni}_{n}}\},n=1,\ldots ,N,i_{n}=1,\ldots ,I_{n},} where: F n {\displaystyle F_{n}} is a set of axis a n {\displaystyle a_{n}} elements; i n {\displaystyle i_{n}} is a set F n {\displaystyle F_{n}} element identifier; I n = | F n | . {\displaystyle I_{n}=|F_{n}|.} F n {\displaystyle F_{n}} sets form multidimensional space: M = F 1 × F 2 × ⋯ × F n . {\displaystyle M=F_{1}\times F_{2}\times \cdots \times F_{n}.} m = ( i 1 , i 2 , … , i N ) , {\displaystyle m=(i_{1},i_{2},\ldots ,i_{N}),} where: m ∈ M {\displaystyle m\in M} ; m {\displaystyle m} is a point of multidimensional space; ( i 1 , i 2 , … , i N ) {\displaystyle (i_{1},i_{2},\ldots ,i_{N})} are coordinates of point m {\displaystyle m} . There is a set of values of multidimensional space points of M {\displaystyle M} : C M = { c i 1 , i 2 , … , i N ∣ i 1 = 1 , … , I 1 , i 2 = 1 , … , I 2 , … , i n = 1 , … , I N } , {\displaystyle C_{M}=\{c_{i_{1},i_{2},\ldots ,i_{N}}\mid i_{1}=1,\ldots ,I_{1},i_{2}=1,\ldots ,I_{2},\ldots ,i_{n}=1,\ldots ,I_{N}\},} where: c i 1 , i 2 , … , i N {\displaystyle c_{i_{1},i_{2},\ldots ,i_{N}}} is a value of the point of multidimensional space M {\displaystyle M} is a value of the point of multidimensional space ( i 1 , i 2 , … , i N ) {\displaystyle (i_{1},i_{2},\ldots ,i_{N})} . For every point of space M {\displaystyle M} there is a single value from C M {\displaystyle C_{M}} set or there is no such value. Thus, C M {\displaystyle C_{M}} is a set of data model state changes represented in multidimensional space. To implement a transition between multidimensional space and set of points values the relation μ {\displaystyle \mu } has been introduced: C x = μ ( M x ) , {\displaystyle C_{x}=\mu (M_{x}),} where: M x ⊆ M ; {\displaystyle M_{x}\subseteq M;} M x = F 1 x × F 2 x × ⋯ × F N x . {\displaystyle M_{x}=F_{1x}\times F_{2x}\times \cdots \times F_{Nx}.} To describe a data model in mivar information space it is necessary to identify three axes: The axis of relations « O {\displaystyle O} »; The axis of attributes (properties) « S {\displaystyle S} »; The axis of elements (objects) of subject domain « V {\displaystyle V} ». These sets are independent. The mivar space can be represented by the following tuple: ⟨ V , S , O ⟩ {\displaystyle \langle V,S,O\rangle } Thus, mivar is described by « V S O {\displaystyle VSO} » formula, in which « V {\displaystyle V} » denotes an object or a thing, « S {\displaystyle S} » denotes properties, « O {\displaystyle O} » variety of relations between other objects of a particular subject domain. The category “Relations” can describe dependencies of any complexity level: formulae, logical transitions, text expressions, functions, services, computational procedures and even neural networks. A wide range of capabilities complicates description of modeling interconnections, but can take into consideration all the factors. Mivar computations use mathematical logic. In a simplified form they can be represented as implication in the form of an "if…, then …” formula. The result of mivar modeling can be represented in the form of a bipartite graph binding two sets of objects: source objects and resultant objects. == Mivar network == Mivar network is a method for representing objects of the subject domain and their processing rules in the form of a bipartite directed graph consisting of objects and rules. A Mivar network is a bipartite graph that can be described in the form of a two-dimensional matrix, in that records information about the subject domain of the current task. Generally, mivar networks provide formalization and representation of human knowledge in the form of a connected multidimensional space. That is, a mivar network is a method of representing a piece of mivar space information in the form of a bipartite, directed graph. The mivar space information is formed by objects and connections, which in total represent the data model of the subject domain. Connections include rules for objects processing. Thus, a mivar network of a subject domain is a part of the mivar space knowledge for that domain. The graph can consist of objects-variables and rules-procedures. First, two lists are made that form two nonintersecting partitions: the list of objects and the list of rules. Objects are denoted by circles. Each rule in a mivar network is an extension of productions, hyper-rules with multi-activators or computational procedures. It is proved that from the perspective of further processing, these formalisms are identical and in fact are nodes of the bipartite graph, denoted by rectangles. === Multi-dimensional binary matrices === Mivar networks can be implemented on single computing systems or service-oriented architectures. Certain constraints restrict their application, in particular, the dimension of matrix of linear matrix method for determining logical inference path on the adaptive rule networks. The matrix dimension constraint is due to the fact that implementation requires sending a general matrix to multiple processors. Since every matrix value is initially represented in symbol form, the amount of sent data is crucial when obtaining, for example, 10000 rules/variables. Classical mivar-based method requires storing three values in each matrix cell: 0 – no value; x – input variable for the rule; y – output variable for the rule. The analysis of possibility of firing a rule is separated from determining output variables according to stages after firing the rule. Consequently, it is possible to use different matrices for “search for fired rules” and “setting values for output variables”. This allowsthe use of multidimensional binary m

    Read more →
  • Audio mining

    Audio mining

    Audio mining is a technique by which the content of an audio signal can be automatically analyzed and searched. It is most commonly used in the field of automatic speech recognition, where the analysis tries to identify any speech within the audio. The term audio mining is sometimes used interchangeably with audio indexing, phonetic searching, phonetic indexing, speech indexing, audio analytics, speech analytics, word spotting, and information retrieval. Audio indexing, however, is mostly used to describe the pre-process of audio mining, in which the audio file is broken down into a searchable index of words. == History == Academic research on audio mining began in the late 1970s in schools like Carnegie Mellon University, Columbia University, the Georgia Institute of Technology, and the University of Texas. Audio data indexing and retrieval began to receive attention and demand in the early 1990s, when multimedia content started to develop and the volume of audio content significantly increased. Before audio mining became the mainstream method, written transcripts of audio content were created and manually analyzed. == Process == Audio mining is typically split into four components: audio indexing, speech processing and recognition systems, feature extraction and audio classification. The audio will typically be processed by a speech recognition system in order to identify word or phoneme units that are likely to occur in the spoken content. This information may either be used immediately in pre-defined searches for keywords or phrases (a real-time "word spotting" system), or the output of the speech recognizer may be stored in an index file. One or more audio mining index files can then be loaded at a later date in order to run searches for keywords or phrases. The results of a search will normally be in terms of hits, which are regions within files that are good matches for the chosen keywords. The user may then be able to listen to the audio corresponding to these hits in order to verify if a correct match was found. === Audio Indexing === In audio, there is the main problem of information retrieval - there is a need to locate the text documents that contain the search key. Unlike humans, a computer is not able to distinguish between the different types of audios such as speed, mood, noise, music or human speech - an effective searching method is needed. Hence, audio indexing allows efficient search for information by analyzing an entire file using speech recognition. An index of content is then produced, bearing words and their locations done through content-based audio retrieval, focusing on extracted audio features. It is done through mainly two methods: Large Vocabulary Continuous Speech Recognition (LVCSR) and Phonetic-based Indexing. ==== Large Vocabulary Continuous Speech Recognizers (LVCSR) ==== In text-based indexing or large vocabulary continuous speech recognition (LVCSR), the audio file is first broken down into recognizable phonemes. It is then run through a dictionary that can contain several hundred thousand entries and matched with words and phrases to produce a full text transcript. A user can then simply search a desired word term and the relevant portion of the audio content will be returned. If the text or word could not be found in the dictionary, the system will choose the next most similar entry it can find. The system uses a language understanding model to create a confidence level for its matches. If the confidence level be below 100 percent, the system will provide options of all the found matches. ===== Advantages and disadvantages ===== The main draw of LVCSR is its high accuracy and high searching speed. In LVCSR, statistical methods are used to predict the likelihood of different word sequences, hence the accuracy is much higher than the single word lookup of a phonetic search. If the word can be found, the probability of the word spoken is very high. Meanwhile, while initial processing of audio takes a fair bit of time, searching is quick as just a simple test to text matching is needed. On the other hand, LVCSR is susceptible to common issues of speech recognition. The inherent random nature of audio and problems of external noise all affect the accuracies of text-based indexing. Another problem with LVCSR is its over reliance on its dictionary database. LVCSR only recognizes words that are found in their dictionary databases, and these dictionaries and databases are unable to keep up with the constant evolving of new terminology, names and words. Should the dictionary not contain a word, there is no way for the system to identify or predict it. This reduces the accuracy and reliability of the system. This is named the Out-of-vocabulary (OOV) problem. Audio mining systems try to cope with OOV by continuously updating the dictionary and language model used, but the problem still remains significant and has probed a search for alternatives. Additionally, due to the need to constantly update and maintain task-based knowledge and large training databases to cope with the OOV problem, high computational costs are incurred. This makes LVCSR an expensive approach to audio mining. ==== Phonetic-based Indexing ==== Phonetic-based indexing also breaks the audio file into recognizable phonemes, but instead of converting them to a text index, they are kept as they are and analyzed to create a phonetic-based index. The process of phonetic-based indexing can be split into two phases. The first phase is indexing. It begins by converting the input media into a standard audio representation format (PCM). Then, an acoustic model is applied to the speech. This acoustic model represents characteristics of both an acoustic channel (an environment in which the speech was uttered and a transducer through which it was recorded) and a natural language (in which human beings expressed the input speech). This produces a corresponding phonetic search track, or phonetic audio track (PAT), a highly compressed representation of the phonetic content of the input media. The second phase is searching. The user's search query term is parsed into a possible phoneme string using a phonetic dictionary. Then, multiple PAT files can be scanned at high speed during a single search for likely phonetic sequences that closely match corresponding strings of phonemes in the query term. ===== Advantages and disadvantages ===== Phonetic indexing is most attractive as it is largely unaffected by linguistic issues such as unrecognized words and spelling errors. Phonetic preprocessing maintains an open vocabulary that does not require updating. That makes it particularly useful for searching specialized terminology or words in foreign languages that do not commonly appear in dictionaries. It is also more effective for searching audio files with disruptive background noise and/or unclear utterances as it can compile results based on the sounds it can discern, and should the user wish to, they can search through the options until they find the desired item. Furthermore, in contrast to LVCSR, it can process audio files very quickly as there are very few unique phonemes between languages. However, phonemes cannot be effectively indexed like an entire word, thus searching on a phonetic-based system is slow. An issue with phonetic indexing is its low accuracy. Phoneme-based searches result in more false matches than text-based indexing. This is especially prevalent for short search terms, which have a stronger likelihood of sounding similar to other words or being part of bigger words. It could also return irrelevant results from other languages. Unless the system recognizes exactly the entire word, or understands phonetic sequences of languages, it is difficult for phonetic-based indexing to return accurate findings. === Speech processing and recognition system === Deemed as the most critical and complex component of audio mining, speech recognition requires the knowledge of human speech production system and its modeling. To correspond the Human speech production system, the electrical speech production system is developed to consist of: Speech generation Speech perception Voiced & unvoiced speech Model of human speech The electrical speech production system converts acoustic signal into corresponding representation of the spoken through the acoustic models in their software where all phonemes are represented. A statistical language model aids in the process by identifying how likely words are to follow each other in certain languages. Put together with a complex probability analysis, the speech recognition system is capable of taking an unknown speech signal and transcribing it into words based on the program's dictionary. ASR (automatic speech recognition) system includes: Acoustic analysis: input sound waveform is transformed into a feature Acoustic model: establishes relationship between speech signal and phonemes, pronunciation model and lang

    Read more →
  • Document AI

    Document AI

    Document AI, also known as Document Intelligence, refers to a field of technology that employs machine learning (ML) techniques, such as natural language processing (NLP). These techniques are used to develop computer models capable of analyzing documents in a manner akin to human review. Through NLP, computer systems are able to understand relationships and contextual nuances in document contents, which facilitates the extraction of information and insights. Additionally, this technology enables the categorization and organization of the documents themselves. The applications of Document AI extend to processing and parsing a variety of semi-structured documents, such as forms, tables, receipts, invoices, tax forms, contracts, loan agreements, and financial reports. == Key features == Machine learning is utilized in Document AI to extract information from both printed and digital documents. This technology recognizes images, text, and characters in various languages, aiding in the extraction of insights from unstructured documents. The use of this technology can improve the speed and quality of decision-making in document analysis. Additionally, the automation of data extraction and validation can contribute to increased efficiency in document analysis processes. Since the early 2020s, the integration of large language models has extended Document AI beyond extraction toward generative tasks, including the automated drafting of forms, contracts, and document summaries. == Example == A business letter contains information in the form of text, as well as other types of information, such as the position of the text. For instance, a typical letter contains two addresses before the body of the text. The address at the very top (sometimes aligned to the right) is the sender address. This is normally followed by the date of the letter, with the place of writing. After this, the receiver address is listed. The distinction between the sender address and the receiver address is conveyed solely by the position of the address on the page, i.e. there is no textual indication like Sender: in front of the addresses. == Data dimensions and ML architecture == Data is typically distinguished into spatial data and time-series data, the former includes things like images, maps and graphs, while the latter includes signals such as stock prices or voice recordings. Document AI combines text data, which has a time dimension, with other types of data, such as the position of an address in a business letter, which is spatial. Historically in machine learning spatial data was analyzed using a convolutional neural network, and temporal data using a recurrent neural network. With the advent of dimension-type agnostic transformer architecture, these two different types of dimension can be more easily combined, Document AI is an example of this. == Benchmarks == Several public datasets are used to evaluate Document AI systems. FUNSD (Form Understanding in Noisy Scanned Documents) contains 199 annotated forms with token- and block-level labels for form understanding tasks. CORD (Consolidated Receipt Dataset) supports key information extraction from receipts. DocVQA contains approximately 50,000 questions over 12,000 document images for layout-aware visual question answering. == Common uses == Document AI systems are used to automate document processing and information extraction in business and financial workflows, including invoice and receipt processing, data entry automation, anomaly detection, mortgage processing, loan portfolio monitoring, credit risk management, and fraud detection such as counterfeit currency and fraudulent checks. They are also applied in regulatory compliance and contract analysis, including assessing changes in legal and regulatory documents. In real estate, Document AI supports document classification and structured information extraction for standardized processing and analytics. With the adoption of generative AI, Document AI systems can also generate and pre-fill structured documents such as contracts or business forms from natural language prompts.

    Read more →
  • On a Red Station, Drifting

    On a Red Station, Drifting

    On a Red Station, Drifting is a 2012 science fiction novella by Aliette de Bodard. Set in her Xuya Universe, it focuses on two women aboard a space station with a failing artificial intelligence. It received critical acclaim, becoming a finalist for the 2012 Nebula Award for Best Novella, the 2013 Hugo Award for Best Novella, and the 2013 Locus Award for Best Novella. == Plot == Lê Thi Linh is a magistrate of the Dai Viet Empire who is forced to flee her planet after criticizing the Emperor’s wartime policies. At the same time, rebel groups seize control of her planet and kill most of her subordinates. Linh seeks refuge with her distant relatives on Prosper Station. Prosper is controlled by an artificial intelligence called the Honoured Ancestress. Lê Thi Quyen, Linh’s cousin by marriage, manages the day-to-day operations of Prosper while her husband is away at war. Quyen and Linh immediately fall into conflict. Quyen’s brother-in-law Huu Hieu sells his mem-implants, which are copies of their ancestors’ consciousnesses. Meanwhile, the Honoured Ancestress experiences increasingly severe technical problems. Hieu and Linh become close. Hieu plans use the money from the sale of the implants to leave Prosper and marry his lover on a different station. Linh is upset knowing that she will never be able to leave. A visiting cousin, Lady Oahn, provides schematics for the repair of the Honoured Ancestress. In an effort to hurt Quyen, Linh writes an unflattering poem at a banquet honoring Oanh. In doing so, she reveals that Hieu is trying to leave Prosper. Hieu attempts suicide out of shame, but Linh rescues him. Quyen is able to repair the Honoured Ancestress, restoring her functionality at the expense of erasing many of her memories. The Emperor’s Embroidered Guard arrives at Prosper Station in search of Linh. Linh finds the missing mem-implants and returns them to Quyen. Quyen and Linh briefly reconcile before Linh is arrested and removed from Prosper Station. == Major themes == A review in Kirkus wrote that the novel's "familiar setting" was a "departure point" for the novel to explore its themes. The novel explores family ties; almost everyone on Prosper Station is related in some fashion. Additionally, the use of ancestors' mem-implants further explores the concept of family ties, with some descendants being considered more "worthy" than others due to their higher number of implants. The novel also explores questions of worth, as those who fail at ability tests are often forced to become the "lesser partners" in marriages and are discriminated against due to their perceived lack of achievement. The author notes that it is interesting that gender plays no role in the question of worth, and that the majority of the men in the story are actually the "lesser partner" in their marriage. == Style == The novel is divided into three sections. Liz Bourke wrote that each section builds thematically "towards an emotional crescendo". == Reception == Writing for Locus, Liz Bourke praised the novel's exploration of interpersonal conflict between Linh and Quyen, writing that "essentially subverts the popularly-understood derogatory overtones of 'domestic conflict'". Bourke also praised the story's tension, calling it "so well-strung the prose practically vibrates under its influence". A review for Kirkus stated that the novel is a "beautifully realized story and the characters, plot, theme and writing are expertly crafted." === Awards ===

    Read more →
  • Privacy Lost

    Privacy Lost

    Privacy Lost is a 2023 short science fiction film directed by Peter Stoel and Robert Berger. It follows a family using augmented reality (AR) and artificial intelligence (AI) devices capable of reading emotional states, raising questions about privacy and manipulation. == Premise == Privacy Lost follows a family using AR glasses that capture and interpret emotions in real time. As the parents argue in a restaurant, their emotional states and even hidden feelings become visible through these glasses. An AI-driven waiter adapts its appearance for each family member, employing emotional data to influence their decisions. == Cast == Brian Kant as Waiter Michael Krass as Husband Estelle Levinson as Waitress Thor van der Linden as Scotty Carlijn van Ramshorst as Wife == Production == Filming took place at HeadQ Productions, a virtual studio located in Amsterdam. The creators sought to depict a near-future scenario in which real-time emotion analysis becomes part of daily interactions. The film was screened at the Augmented World Expo (AWE), where it was noted for its thematic focus on AI-driven manipulation and emotional tracking. The depiction of AR glasses and AI characters integrates modern visual effects to show how devices might analyze emotional responses in real time. It also depicts how AI-driven interactions could influence consumer decisions, pointing to concerns over potential misuse. == Themes == Privacy Lost focuses on the intersection of advanced AI capabilities and AR environments, showing how real-time emotional analysis can be leveraged for targeted persuasion. The film aims to highlight the social and ethical implications of emerging AR and AI technologies, underlining how establishing clear regulatory frameworks for them is necessary to protect individual privacy, govern the storage of emotion-based data, and prevent manipulative practices. Critics describe the film’s theme as dystopian and note that such a reality is unlikely to occur in the near future. However, despite the exaggerated scenario, the film emphasizes the importance of a responsible approach by developers toward emerging technologies.

    Read more →
  • Wadhwani Institute for Artificial Intelligence

    Wadhwani Institute for Artificial Intelligence

    Wadhwani AI, based in Mumbai, Maharashtra, is an independent, non-profit institute. Founded in 2018, it is dedicated to developing Artificial intelligence solutions for social good. Their mission is to build AI-based innovations and solutions for underserved communities in developing countries, for a wide range of domains including agriculture, education, financial inclusion, healthcare, and infrastructure. == History and funding == The institute was founded with a $30 million philanthropic effort by the Wadhwani brothers, Romesh Wadhwani and Sunil Wadhwani. The institute was inaugurated and dedicated to the nation by Narendra Modi, the 14th Prime Minister of India. In 2019, the institute received a $2 million grant from Google.org to create technologies to help reduce crop losses in cotton farming, through integrated pest management. The United States Agency for International Development awarded $2 million to the institute in 2020 to develop tools, using mathematical modeling techniques and digital technologies such as artificial intelligence and machine learning, to forecast COVID-19 disease patterns, estimate resources needed, and plan interventions. == Collaboration == With assistance from Google, the Ministry of Agriculture and Farmers' Welfare and the Wadhwani AI developed Krishi 24/7, the first AI-powered automated agricultural news monitoring and analysis tool. Through better decision-making, Krishi 24/7 will support the identification of valuable news, provide timely notifications, and respond quickly to safeguard farmers' interests and advance sustainable agricultural growth. The application converts news articles into English after scanning them in several languages. It ensures that the ministry is informed in a timely manner about pertinent occurrences that are published online by extracting key information from news items, including the headline, crop name, event type, date, location, severity, summary, and source link. The National Center for Disease Control has effectively implemented a comparable automated surveillance and analysis tool for disease outbreaks.

    Read more →
  • Ideogram (text-to-image model)

    Ideogram (text-to-image model)

    Ideogram is a freemium text-to-image model developed by Ideogram, Inc. using deep learning methodologies to generate digital images from natural language descriptions known as prompts. The model is capable of generating legible text in the images compared to other text-to-image models. == History == Ideogram was founded in 2022 by Mohammad Norouzi, William Chan, Chitwan Saharia, and Jonathan Ho to develop a better text-to-image model. It was first released with its 0.1 model on August 22, 2023, after receiving $16.5 million in seed funding, which itself was led by Andreessen Horowitz and Index Ventures. In February 2024, Ideogram raised $80 million after its 1.0 model release in the same year. In August 2024, Ideogram released its 2.0 model. This model has several styles such as realistic, design, 3D, and anime and better capability in generating text. In February 2025, Ideogram released 2a model. This model was designed for speed and optimized for graphics design and photography generation. In March 2025, Ideogram released its 3.0 model. This model has improved realism and understanding of complex text layout, although like other generative AI models, it still struggles with ambigram creation.

    Read more →
  • Hyperion Cantos

    Hyperion Cantos

    The Hyperion Cantos is a series of science fiction novels by Dan Simmons. The title was originally used for the collection of the first pair of books in the series, Hyperion and The Fall of Hyperion, and later came to refer to the overall storyline, including Endymion, The Rise of Endymion, and a number of short stories. More narrowly, inside the fictional storyline, after the first volume, the Hyperion Cantos is an epic poem written by the character Martin Silenus covering in verse form the events of the first two books. Of the four novels, Hyperion received the Hugo and Locus Awards in 1990; The Fall of Hyperion won the Locus and British Science Fiction Association Awards in 1991; and The Rise of Endymion received the Locus Award in 1998. All four novels were also nominated for various science fiction awards. == Works == === Hyperion (1989) === First published in 1989, Hyperion has the structure of a frame story, similar to Geoffrey Chaucer's Canterbury Tales and Giovanni Boccaccio's Decameron. The story weaves the interlocking tales of a diverse group of travelers sent on a pilgrimage to the Time Tombs on Hyperion. The travelers have been sent by the Hegemony (the government of the human star systems), the All Thing, and the Church of the Final Atonement, alternately known as the Shrike Church, to make a request of the Shrike. As they progress in their journey, each of the pilgrims tells their tale. === The Fall of Hyperion (1990) === This book concludes the story begun in Hyperion. It abandons the storytelling frame structure of the first novel, and is instead presented primarily as a series of dreams by John Keats. === Endymion (1996) === The story commences 274 years after the events in the previous novel. Few main characters from the first two books are present in the later two. The main character is Raul Endymion, an ex-soldier who receives a death sentence after an unfair trial. He is rescued by Martin Silenus and asked to perform a series of rather extraordinarily difficult tasks. The main task is to rescue and protect the daughter of Brawne Lamia (one of the main characters of Hyperion), Aenea, a messiah coming from the time period just after the first books via time travel. The Catholic Church has become a dominant force in the human universe and views Aenea as a potential threat to their power. The group of Aenea, Endymion, and A. Bettik (an android) evades the Church's forces on several worlds through use of the Consul's spaceship, ending the story on Earth. === The Rise of Endymion (1997) === This final novel in the series finishes the story begun in Endymion, expanding on the themes in Endymion, as Raul and Aenea battle the Church and meet their respective destinies. === Short stories === The series also includes three short stories: "Remembering Siri" (1983, included almost verbatim in Hyperion) "The Death of the Centaur" (1990) "Orphans of the Helix" (1999) == Development == The Hyperion universe originated when Simmons was an elementary school teacher, as an extended tale he told at intervals to his young students; this is recorded in "The Death of the Centaur", and its introduction. It then inspired his short story "Remembering Siri", which eventually became the nucleus around which Hyperion and The Fall of Hyperion formed. After the quartet was published came the short story "Orphans of the Helix". "Orphans" is currently the final work in the Cantos, both chronologically and internally. The original Hyperion Cantos has been described as a novel published in two volumes, published separately at first for reasons of length. In his introduction to "Orphans of the Helix", Simmons elaborates: Some readers may know that I've written four novels set in the "Hyperion Universe"—Hyperion, The Fall of Hyperion, Endymion, and The Rise of Endymion. A perceptive subset of those readers—perhaps the majority—know that this so-called epic actually consists of two long and mutually dependent tales, the two Hyperion stories combined and the two Endymion stories combined, broken into four books because of the realities of publishing. == Influences == Much of the appeal of the series stems from its extensive use of references and allusions from a wide array of thinkers such as Teilhard de Chardin, John Muir, Norbert Wiener, and to the poetry of John Keats, the famous 19th-century English Romantic poet, Norse mythology, and the monk Ummon. A large number of technological elements are acknowledged by Simmons to be inspired by elements of Out of Control: The New Biology of Machines, Social Systems, and the Economic World. The Hyperion series has many echoes of Jack Vance, explicitly acknowledged in one of the later books. The title of the first novel, "Hyperion", is taken from one of Keats's poems, the unfinished epic Hyperion. Similarly, the title of the third novel is from Keats' poem Endymion. Quotes from actual Keats poems and the fictional Cantos of Martin Silenus are interspersed throughout the novels. Simmons goes so far as to have two artificial reincarnations of John Keats ("cybrids": artificial intelligences in human bodies) play a major role in the series. == Setting == Much of the action in the series takes place on the planet Hyperion. It is described as having one-fifth less gravity than Earth standard. Hyperion has a number of peculiar indigenous flora and fauna, notably Tesla trees, which are essentially large electricity-spewing trees. It is also a "labyrinthine" planet, which means that it is home to ancient subterranean labyrinths of unknown purpose. Most importantly, Hyperion is the location of the Time Tombs, large artifacts surrounded by "anti-entropic" fields that allow them to move backward through time. In the fictional universe of the Hyperion Cantos, the Hegemony of Man encompasses over 200 planets. Faster than light communications technology, Fatlines, are said to operate through tachyon bursts. However, in later books it is revealed that they operate through the Void Which Binds. The Farcaster network was given to humanity by the TechnoCore and again it was another use of the Void Which Binds that allowed this instantaneous travel between worlds. The Hawking Drive was developed by human scientists, allowing the faster than light travel which led to the Hegira (from the Arabic word هجرة Hijra, meaning 'migration'). The Gideon drive, a Core-provided starship drive, allows for near-instantaneous travel between any two points in human-occupied space. The drive's use kills any human on board a Gideon-propelled starship; thus, the technology is only of use with remote probes or when used in conjunction with the Pax's resurrection technology. The resurrection creche can regenerate someone carrying a cruciform from their remains. Treeships are living trees that are propelled by ergs (spider-like solid-state alien being that emits force fields) through space. === The Shrike === The region of the Tombs is also the home of the Shrike, a menacing half-mechanical, half-organic four-armed creature that features prominently in the series. It appears in all four Hyperion Cantos books and is an enigma in the initial two; its purpose is not revealed until the second book, but is still left nebulous. The Shrike appears to act both autonomously and as a servant of some unknown force or entity. In the first two Hyperion books, it exists solely in the area around the Time Tombs on the planet Hyperion. Its portrayal is changed significantly in the last two books, Endymion and The Rise of Endymion. In these novels, the Shrike appears effectively unfettered and protects the heroine Aenea against assassins of the opposing TechnoCore. Surrounded in mystery, the object of fear, hatred, and even worship by members of the Church of the Final Atonement (the Shrike Cult), the Shrike's origins are described as uncertain. It is portrayed as composed of razorwire, thorns, blades, and cutting edges, having fingers like scalpels and long, curved toe blades. It has the ability to control the flow of time, and may thus appear to travel infinitely fast. The Shrike may kill victims in a flash or it may transport them to an eternity of impalement upon an enormous artificial 'Tree of Thorns,' or 'Tree of Pain' in Hyperion's distant future. The Tree of Thorns is described as an unimaginably large, metallic tree, alive with the agonized writhing of countless human victims of all ages and races. It is also hinted in the second book that the Tree of Thorns is actually a simulation generated by a mystical interface which connects to human brains via a strong and pulsing (as if it were alive) cord. The name Shrike seems a reference to birds of the shrike family, a family of birds that impales their victims on thorns, spines, or twigs. === Worlds and Systems === In the fictional universe of the Hyperion Cantos, the Hegemony of Man encompasses over 200 planets. The following planets appear or are specifically mentioned in the Hyperion Cantos. Planets of

    Read more →