AI Face Grader

AI Face Grader — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Lossless join decomposition

    Lossless join decomposition

    In database design, a lossless join decomposition is a decomposition of a relation r {\displaystyle r} into relations r 1 , r 2 {\displaystyle r_{1},r_{2}} such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data. Lossless join can also be called non-additive. == Definition == A relation r {\displaystyle r} on schema R {\displaystyle R} decomposes losslessly onto schemas R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} if π R 1 ( r ) ⋈ π R 2 ( r ) = r {\displaystyle \pi _{R_{1}}(r)\bowtie \pi _{R_{2}}(r)=r} , that is r {\displaystyle r} is the natural join of its projections onto the smaller schemas. A pair ( R 1 , R 2 ) {\displaystyle (R_{1},R_{2})} is a lossless-join decomposition of R {\displaystyle R} or said to have a lossless join with respect to a set of functional dependencies F {\displaystyle F} if any relation r ( R ) {\displaystyle r(R)} that satisfies F {\displaystyle F} decomposes losslessly onto R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} . Decompositions into more than two schemas can be defined in the same way. == Criteria == A decomposition R = R 1 ∪ R 2 {\displaystyle R=R_{1}\cup R_{2}} has a lossless join with respect to F {\displaystyle F} if and only if the closure of R 1 ∩ R 2 {\displaystyle R_{1}\cap R_{2}} includes R 1 ∖ R 2 {\displaystyle R_{1}\setminus R_{2}} or R 2 ∖ R 1 {\displaystyle R_{2}\setminus R_{1}} . In other words, one of the following must hold: ( R 1 ∩ R 2 ) → ( R 1 ∖ R 2 ) ∈ F + {\displaystyle (R_{1}\cap R_{2})\to (R_{1}\setminus R_{2})\in F^{+}} ( R 1 ∩ R 2 ) → ( R 2 ∖ R 1 ) ∈ F + {\displaystyle (R_{1}\cap R_{2})\to (R_{2}\setminus R_{1})\in F^{+}} === Criteria for multiple sub-schemas === Multiple sub-schemas R 1 , R 2 , . . . , R n {\displaystyle R_{1},R_{2},...,R_{n}} have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the schemas have been joined into a single schema. Once we have a new sub-schema made from a lossless join, we are not allowed to use any of its isolated sub-schema to join with any of the other schemas. For example, if we can do a lossless join on a pair of schemas R i , R j {\displaystyle R_{i},R_{j}} to form a new schema R i , j {\displaystyle R_{i,j}} , we use this new schema (rather than R i {\displaystyle R_{i}} or R j {\displaystyle R_{j}} ) to form a lossless join with another schema R k {\displaystyle R_{k}} (which may already be joined (e.g., R k , l {\displaystyle R_{k,l}} )). == Example == Let R = { A , B , C , D } {\displaystyle R=\{A,B,C,D\}} be the relation schema, with attributes A, B, C and D. Let F = { A → B C } {\displaystyle F=\{A\rightarrow BC\}} be the set of functional dependencies. Decomposition into R 1 = { A , B , C } {\displaystyle R_{1}=\{A,B,C\}} and R 2 = { A , D } {\displaystyle R_{2}=\{A,D\}} is lossless under F because R 1 ∩ R 2 = A {\displaystyle R_{1}\cap R_{2}=A} and we have a functional dependency A → B C {\displaystyle A\rightarrow BC} . In other words, we have proven that ( R 1 ∩ R 2 → R 1 ∖ R 2 ) ∈ F + {\displaystyle (R_{1}\cap R_{2}\rightarrow R_{1}\setminus R_{2})\in F^{+}} .

    Read more →
  • FuseBase

    FuseBase

    FuseBase (previously Nimbus Note and Nimbus Platform) is a B2B SaaS platform. It is among the first to support the Model Context Protocol (MCP), an open standard enabling seamless integration of AI agents with external tools, systems, and data sources. == History == The platform was founded in 2014 as Nimbus Note, the platform started as a cross-platform note-taking and information management tool. As it evolved into Nimbus Platform, it added project management and client portal capabilities. In 2023, the company rebranded as FuseBase, pivoting to connect and automate both internal and external collaboration through AI Agents and cutting-edge protocol adoption like MCP. At the same time, FuseBase was named Product of the Year on Product Hunt. == Technical overview == The platform integrates the Model Context Protocol (MCP), an open-source framework created by Anthropic. MCP allows AI models to securely access and interact with external data, tools, and systems. This enables FuseBase AI Agents to gather relevant context, perform actions, and provide more advanced automation.

    Read more →
  • Dispo

    Dispo

    Dispo (formerly David's Disposable) is an American photo sharing and social networking app owned by Dispo, Inc. and co-founded by CEO Daniel Liss, YouTuber David Dobrik, and Natalie Mariduena. When the app initially launched on iOS in December 2019, it briefly charted as the most downloaded free app on the App Store, ahead of both Disney+ and Instagram. The app was rebranded and relaunched as Dispo, expanding from a simple camera app to a full social network in March 2021. It is based on the disposable camera. == History == On December 21, 2019, the app was first launched on the App Store under the name "David's Disposable." In its first week of release, it was downloaded more than a million times, reaching number one among free apps in the App Store. In June 2020, the team decided to rename the app to Dispo, purchasing the Dispo.fun domain on June 21, 2020. The company announced the change in September 2020. The early Dispo team consisted of Dobrik's longtime friend and business associate Natalie Mariduena as its treasurer, entrepreneur and venture capitalist Daniel Liss as chief executive officer, Regynald Augustin as first engineer, and Briana Hokanson as lead designer. In October 2020, the company raised a $4M seed round with backing from Alexis Ohanian's venture fund Seven Seven Six alongside other investors including Unshackled Ventures, Shrug Capital, and Weekend Fund. In February 2021, Axios reported that the app had generated US$20 million in its series A round, led by Spark Capital. At this time, the app was valued at US$200 million. A New York Times profile asked, "Are Disposables the Future of Photosharing?" In March 2021, the app was officially relaunched with new social network features and its invite-only feature was dropped. On March 21, 2021, it was announced that Spark Capital would sever all ties with Dispo in light of several disparaging allegations against David Dobrik and The Vlog Squad. The same day, it was announced that Dobrik would leave the company and step down from the company's board of directors. On March 22, 2021, Seven Seven Six and Unshackled Ventures announced they would be standing by the company and its remaining employees but donating profits to charity. In June, 2021, CEO Daniel Liss announced Dispo's official Series A. Investors and advisors in the new Dispo include Ohanian's Seven Seven Six, Unshackled, Endeavor, photographers Annie Leibovitz and Raven B. Varona, NBA stars Kevin Durant and Andre Iguodala (through their 35 Ventures and F9 Strategies venture firms, respectively). Other participants include Cara Delevingne, Sofia Vergara, Shade Room CEO Angelica Nwandu, Latin World Entertainment CEO Luis Balaguer, and Amplify Africa co-founders Damilare Kujembola and Timi Adeyeba. == Overview == Dispo has been compared to other image sharing and social networking services, most notably Instagram and VSCO, although users cannot immediately see the photos they have taken using the app. When a user attempts to take a photo, the interface mimics the developing process of a disposable camera. Users can take as many photos on the app as they want; they do not appear on the app however, until 9 am the next day. Once the set of photos appear on the app, users can choose to save them or share them with other users in a "roll". == Reception == Screen Rant has called the app "like Clubhouse [referring to the app] but for photos," comparing the early invite-only features of the apps. As it greatly restricts the user's editing options and sets out to offer a more authentic social networking experience, the app has been widely dubbed the "anti-Instagram". Between March 2021 and June 2021, the app reached the top ten in the App Store's photo/video rankings on 5 continents including in the US, Japan, Spain, Germany, Brazil, and Australia. It has been a notable success in Japan, where it opened its first international office in July 2021. In July 2021, NBA number one draft pick Cade Cunningham announced he had selected Dispo as his exclusive social media partner for the NBA draft.

    Read more →
  • Tiimo

    Tiimo

    Tiimo is an app designed to help neurodivergent individuals with planning their life. In August 2024 the company raised €1.4 million, bringing their total funding to €4.3 million. At that point they had over 500,000 users, including 50,000 paid users. The app has Apple Watch support and a learning platform that includes courses on well-being and neurodiversity. The app was founded by Helene Lassen Nørlem and Melissa Würtz Azari in 2015. After being a finalist in 2024, in December 2025 Tiimo was won Apple’s iPhone App of the Year. The premium version is $10/mo and features an AI chatbot alongside the daily planner.

    Read more →
  • Tapingo

    Tapingo

    Tapingo was an American mobile commerce application that offers advance ordering for pickup and food delivery services for college campuses. The company was acquired by Grubhub in September 2018 for approximately $150 million. Following the acquisition, Tapingo’s campus-ordering functionality was integrated into the Grubhub app (Grubhub Campus Dining) and the Tapingo service was discontinued during 2019. Tapingo is differentiated from other on-demand delivery/logistics companies, such as Waiter.com, Postmates, or DoorDash, by focusing its efforts on serving the college market. Through Tapingo, users can browse menus, place orders, pay for the meal and schedule the pickup or have it delivered. On certain campuses, students are able to use their university's meal dollars to pay for food. In the spring of 2012, Tapingo first launched its services on five campuses (Santa Clara University, Loyola Marymount University, Biola University, the University of Maine, and California Lutheran University), and has since expanded to more than 200 college campuses across the U.S. and Canada, serving 100 markets. To date, Tapingo has received venture funding from Carmel Ventures, Khosla Ventures, Kinzon Capital, DCM Ventures and Qualcomm Ventures. In fall 2015, Tapingo announced expansion plans through major partnership deals with national brands like Chipotle Mexican Grill and 7-Eleven, regional restaurants such as Taco Bueno, and global foodservice provider Aramark.

    Read more →
  • Deep Zoom

    Deep Zoom

    Deep Zoom is a technology developed by Microsoft for efficiently transmitting and viewing images. It allows users to pan around and zoom in on a large, high resolution image or a large collection of images. It reduces the time required for initial load by downloading only the region being viewed or only at the resolution it is displayed at. Subsequent regions are downloaded as the user pans to (or zooms into) them; animations are used to hide any jerkiness in the transition. The libraries are also available in other platforms including Java and Flash. == History == The Deep Zoom file format is very similar to the Google Maps image format where images are broken into tiles and then displayed as required. The tiling typically follows a quadtree pattern of increasing resolution of image (in other words twice the zoom and twice the resolution). The main difference is that with Google Maps the actual details on the image change from one zoom level to another, while with Deep Zoom the same image is displayed at each zoom level. Seadragon Software, formerly Sand Codex, first created the Seadragon technology and its implementation of what is now called Deep Zoom. This technology was then absorbed into the Microsoft Live Labs when Seadragon Software was acquired. Engineers from Seadragon now work with Microsoft to integrate their work into technology such as Silverlight and Photosynth. == Deep Zoom examples == The most famous implementation of Deep Zoom was probably the first: the memorabilia collection at the Hard Rock website. Conceived and designed by Duncan/Channon and built by Vertigo, it was demonstrated for the first time in March 2008 at the Microsoft MIX convention in Las Vegas. In 2010, Microsoft Live Labs partnered with the University of California, Berkeley to create ChronoZoom, a DeepZoom-powered time visualization tool that pushed the limits of DeepZoom, since it required zooming from the scale of 13 billion years down to a single day. The project has since graduated to development under Microsoft Research. Another example is the Deep Earth project. It is described by its creators as "a community project focused on creating a rich interactive mapping control using Silverlight2 Deep Zoom. Concentrating on Microsoft Virtual Earth imagery and data the project offers team members the opportunity to learn and share while creating something cool and useful." A paintings collection project http://galleryzoom.co.uk/ shows 1000 high resolution/sensor images individually indexed. (Using Deep Zoom Composer). Blaise Aguera y Arcas gave a demonstration of Seadragon and Photosynth at the 2007 TED conference. In November 2009, 352 Media Group, a Silverlight developer in the Microsoft Silverlight Partner Program, created an example of Deep Zoom using Microsoft Silverlight version 3. It is online at 352 Media Group's Web site. The Winston Churchill Deep Zoom Archived 2010-07-04 at the Wayback Machine mosaic, created by Silverlight developers Shoothill, features as both an online interactive deep zoom and a standalone deep zoom which forms part of the Churchill exhibit in the Churchill War Rooms in Whitehall. In 2010, Shoothill built the Sumatran Tiger Deep Zoom - the largest seen to date - for worldwide conservation charity Fauna and Flora International, featuring thousands of images of endangered species. An early example of Deep Zoom-like technology was implemented at The Department of Maori Affairs in New Zealand in 1997. The technology was used to display Maori land ownership. == Deep Zoom images == The file format used by Deep Zoom (as well as Photosynth and Seadragon Ajax) is XML based. Users can specify a single large image (dzi) or a collection of images (dzc). It also allows for "Sparse Images"; where some parts of the image have greater resolution than others, an example of which can be found on the Seadragon Ajax home page; The bike image displayed is a sparse image. Though used in the proprietary Deep Zoom, the dzi format is open and able to be used by anyone. === Deep Zoom image (dzi) === A DZI has two parts: a DZI file (with either a .dzi or .xml extension) and a subdirectory of image folders. Each folder in the image subdirectory is labeled with its level of resolution. Higher numbers correspond to a higher resolution level; inside each folder are the image tiles corresponding to that level of resolution, numbered consecutively in columns from top left to bottom right. === Deep Zoom collection (dzc) === A DZC is a collection of some number of DZIs linked and referenced by a DZC file (with either a .dzc or .xml extension). At a high level, a collection is a number of image thumbnails whose location is kept track of by the .dzc/.xml file, when zooming into an image, it accesses greater resolutions tiles. A DZC's structure is similar to that of a DZI; the .dzc/.xml file defines the collection and the subdirectory of folders maps to the DZI file structure, each with their set of .dzi/.xml and image tiles. The DZC is used in Microsoft's Pivot, but not in SeaDragon per se. === Sparse Images === Sparse images are a sub-classification of the DZI file type. A sparse image is normally a number of separate photographs with varying resolution levels that have been placed in a single DZI instead of a DZC. Sparse images have no different file structure than that of a DZI and differ only in that there is not a single "highest resolution" level for the entire DZI. == Software that uses Deep Zoom == Image Composite Editor - image stitching tool created by Microsoft Research Deep Zoom Composer - collage maker and simple panorama tool created by Microsoft. Images' resolution is maintained when exporting for web use (via Silverlight Deep Zoom or JavaScript using a third-party template). No longer available for download from Microsoft though it can be found on various other sources such as Internet Archive. == iPhone OS development == Microsoft Live Labs has created an application for the App Store called Seadragon Mobile. It is run over the internet and includes Deep Zoom on the following categories; art, history, maps, photos, Photosynth which anybody can upload to, space and technology & web.

    Read more →
  • Resel

    Resel

    In image analysis, a resel (from resolution element) represents the actual spatial resolution in an image or a volumetric dataset. The number of resels in the image may be lower or equal to the number of pixel/voxels in the image. In an actual image the resels can vary across the image and indeed the local resolution can be expressed as "resels per pixel" (or "resels per voxel"). In functional neuroimaging analysis, an estimate of the number of resels together with random field theory is used in statistical inference. Keith Worsley has proposed an estimate for the number of resels/roughness. The word "resel" is related to the words "pixel", "texel", and "voxel". Waldo R. Tobler is probably among the first to use the word.

    Read more →
  • CapCut

    CapCut

    CapCut, known domestically as JianYing (Chinese: 剪映; pinyin: Jiǎnyìng) and formerly internationally as ViaMaker, is a video editor developed by ByteDance, available as a mobile app, desktop app, and web app. == History == The app was first released in China in 2019 and was initially available for iPhone and Android. In 2020, it was rebranded in English from ViaMaker to CapCut and became available globally. It later expanded to include web and desktop versions for Mac and Windows. In 2022, CapCut reached 200 million active users. According to The Wall Street Journal, in March 2023, it was the second-most downloaded app in the U.S., behind that of Chinese discount retailer Temu. In January 2025, CapCut had over 1 billion downloads on the Google Play Store. On February 1, 2021, CapCut Pro for Windows was launched. On November 27, the Pro version for Mac was launched. In July 2025, CapCut Pro for HarmonyOS was available on HarmonyOS NEXT tablets. In July 2024, CapCut was reported by the South China Morning Post to be a generative AI (GenAI) application that led global AI app downloads, with approximately 38.42 million downloads and 323 million monthly active users. == Features == CapCut supports basic video editing functions, including editing, trimming, and adding or splitting clips. Editing projects is limited to single-layer editing, but the app supports overlay options that enable additional effects, including multi-layer editing. The app includes a library of pre-made templates and a tool that generates editable video captions. It also provides photo editing tools, including retouch and product photo features integrated within the editing interface. CapCut's video editor includes AI-based features such as video and script generation. Users can export or save completed projects directly to different social media platforms. CapCut includes a free version and a paid Pro version with cloud storage and advanced features. == Controversies == === Illegal data collection === In July 2023, many users of CapCut accused it of illegally profiting off their personal data. A class-action lawsuit filed in the U.S. District Court for the Northern District of Illinois on July 28, 2023, alleged that CapCut illegally harvests and profits from user data including biometric information and geolocation without consent. In September 2025, a federal court excluded most of the lawsuit, which alleged that TikTok’s parent company improperly scraped private data from CapCut's video editing software, as lacking grounds, with some of the class action continuing to move forward. == Bans and restrictions == === Ban in India === As a response to border clashes with China in May 2020, the Indian government banned around 56 Chinese applications including CapCut and TikTok, which is owned by CapCut's parent company ByteDance. Indian users were unable to use and download the application. As of February 2022, around 273 Chinese applications have been banned by the Indian government under the concern of national security and Indian user privacy. === Ban in the United States === On January 18, 2025, at 10 PM EST, CapCut was banned in the United States along with TikTok and all other ByteDance apps due to the implementation of the Protecting Americans from Foreign Adversary Controlled Applications Act. Hours after the suspension of services took effect, President Donald Trump indicated on Truth Social that he would issue an executive order on the day of his inauguration "to extend the period of time before the law's prohibitions take effect". On January 21, CapCut began restoring service. On February 13, Google and Apple restored CapCut on the App Store and Google Play Store.

    Read more →
  • Winner-take-all in action selection

    Winner-take-all in action selection

    Winner-take-all is a computer science concept that has been widely applied in behavior-based robotics as a method of action selection for intelligent agents. Winner-take-all systems work by connecting modules (task-designated areas) in such a way that when one action is performed it stops all other actions from being performed, so only one action is occurring at a time. The name comes from the idea that the "winner" action takes all of the motor system's power. == History == In the 1980s and 1990s, many roboticists and cognitive scientists were attempting to find speedier and more efficient alternatives to the traditional world modeling method of action selection. In 1982, Jerome A. Feldman and D.H. Ballard published the "Connectionist Models and Their Properties", referencing and explaining winner-take-all as a method of action selection. Feldman's architecture functioned on the simple rule that in a network of interconnected action modules, each module will set its own output to zero if it reads a higher input than its own in any other module. In 1986, Rodney Brooks introduced behavior-based artificial intelligence. Winner-take-all architectures for action selection soon became a common feature of behavior-based robots, because selection occurred at the level of the action modules (bottom-up) rather than at a separate cognitive level (top-down), producing a tight coupling of stimulus and reaction. == Types of winner-take-all architectures == === Hierarchy === In the hierarchical architecture, actions or behaviors are programmed in a high-to-low priority list, with inhibitory connections between all the action modules. The agent performs low-priority behaviors until a higher-priority behavior is stimulated, at which point the higher behavior inhibits all other behaviors and takes over the motor system completely. Prioritized behaviors are usually key to the immediate survival of the agent, while behaviors of lower priority are less time-sensitive. For example, "run away from predator" would be ranked above "sleep." While this architecture allows for clear programming of goals, many roboticists have moved away from the hierarchy because of its inflexibility. === Heterarchy and fully distributed === In the heterarchy and fully distributed architecture, each behavior has a set of pre-conditions to be met before it can be performed, and a set of post-conditions that will be true after the action has been performed. These pre- and post-conditions determine the order in which behaviors must be performed and are used to causally connect action modules. This enables each module to receive input from other modules as well as from the sensors, so modules can recruit each other. For example, if the agent's goal were to reduce thirst, the behavior "drink" would require the pre-condition of having water available, so the module would activate the module in charge of "find water". The activations organize the behaviors into a sequence, even though only one action is performed at a time. The distribution of larger behaviors across modules makes this system flexible and robust to noise. Some critics of this model hold that any existing set of division rules for the predecessor and conflictor connections between modules produce sub-par action selection. In addition, the feedback loop used in the model can in some circumstances lead to improper action selection. === Arbiter and centrally coordinated === In the arbiter and centrally coordinated architecture, the action modules are not connected to each other but to a central arbiter. When behaviors are triggered, they begin "voting" by sending signals to the arbiter, and the behavior with the highest number of votes is selected. In these systems, bias is created through the "voting weight", or how often a module is allowed to vote. Some arbiter systems take a different spin on this type of winner-take-all by using a "compromise" feature in the arbiter. Each module is able to vote for or against each smaller action in a set of actions, and the arbiter selects the action with the most votes, meaning that it benefits the most behavior modules. This can be seen as violating the general rule against creating representations of the world in behavior-based AI, established by Brooks. By performing command fusion, the system is creating a larger composite pool of knowledge than is obtained from the sensors alone, forming a composite inner representation of the environment. Defenders of these systems argue that forbidding world-modeling puts unnecessary constraints on behavior-based robotics, and that agents benefits from forming representations and can still remain reactive.

    Read more →
  • DataScene

    DataScene

    DataScene is a scientific graphing, animation, data analysis, and real-time data monitoring software package. It was developed with the Common Language Infrastructure technology and the GDI+ graphics library. With the two Common Language Runtime engines - the .Net and Mono frameworks - DataScene runs on all major operating systems. With DataScene, the user can plot 39 types 2D & 3D graphs (e.g., Area graph, Bar graph, Boxplot graph, Pie graph, Line graph, Histogram graph, Surface graph, Polar graph, Water Fall graph, etc.), manipulate, print, and export graphs to various formats (e.g., Bitmap, WMF/EMF, JPEG, PNG, GIF, TIFF, PostScript, and PDF), analyze data with different mathematical methods (fitting curves, calculating statics, FFT, etc.), create chart animations for presentations (e.g. with PowerPoint), classes, and web pages, and monitor and chart real-time data. == History == DataScene was first released (version 1.0) in March 2009 for the Windows platform and the .Net 2.0 framework. Since version 2.0, DataScene has been ported to the Mono framework 2.6 and all Linux and Unix/X11 operating systems. Cyberwit offers free licensing for the Express edition of DataScene.

    Read more →
  • Speech recognition

    Speech recognition

    Speech recognition (automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT)) is a sub-field of computational linguistics concerned with methods and technologies that translate spoken language into text or other interpretable forms. Speech recognition applications include voice user interfaces, where the user speaks to a device, which "listens" and processes the audio. Common voice applications include interpreting commands for calling, call routing, home automation, and aircraft control. These applications are called direct voice input. Productivity applications include searching audio recordings, creating transcripts, and dictation. Speech recognition can be used to analyse speaker characteristics, such as identifying native language using pronunciation assessment. Voice recognition (speaker identification) refers to identifying the speaker, rather than speech contents. Recognizing the speaker can simplify the task of translating speech in systems trained on a specific person's voice. It can also be used to authenticate the speaker as part of a security process. == History == Applications for speech recognition developed over many decades, with progress accelerated due to advances in deep learning and the use of big data. These advances are reflected in an increase in academic papers, and greater system adoption. Key areas of growth include vocabulary size, more accurate recognition for unfamiliar speakers (speaker independence), and faster processing speed. === Pre-1970 === 1952 – Bell Labs researchers, Stephen Balashek, R. Biddulph, and K. H. Davis, built Audrey for single-speaker digit recognition. Their system located the formants in the power spectrum of each utterance. 1960 – Gunnar Fant developed and published the source–filter model of speech production. 1962 – IBM's 16-word "Shoebox" machine's speech recognition debuted at the 1962 World's Fair. 1966 – Linear predictive coding, a speech coding method, was proposed by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone. 1969 – Funding at Bell Labs came to a halt for several years after the company's head engineer, John R. Pierce, wrote an open letter criticizing speech recognition research. This defunding lasted until Pierce retired and James L. Flanagan took over. Raj Reddy was the first person to work on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Previous systems required users to pause after each word. Reddy's system issued spoken commands for playing chess. Around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. DTW processed speech by dividing it into short frames (e.g. 10 ms segments) and treating each frame as a unit. Speaker independence, however, remained unsolved. === 1970–1990 === 1971 – DARPA funded a five-year speech recognition research project, Speech Understanding Research, seeking a minimum vocabulary size of 1,000 words. The project considered speech understanding a key to achieving progress in speech recognition, which was later disproved. BBN, IBM, Carnegie Mellon (CMU), and Stanford Research Institute participated. 1972 – The IEEE Acoustics, Speech, and Signal Processing group held a conference in Newton, Massachusetts. 1976 – The first ICASSP was held in Philadelphia, which became a major venue for publishing on speech recognition. During the late 1960s, Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analysis. A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the hidden Markov model (HMM) for speech recognition. James Baker had learned about HMMs while at the Institute for Defense Analysis. HMMs enabled researchers to combine sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model. By the mid-1980s, Fred Jelinek's team at IBM created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. Jelinek's statistical approach placed less emphasis on emulating human brain processes in favor of statistical modelling. (Jelinek's group independently discovered the application of HMMs to speech.) This was controversial among linguists since HMMs are too simplistic to account for many features of human languages. However, the HMM proved to be a highly useful way for modelling speech and replaced dynamic time warping as the dominant speech recognition algorithm in the 1980s. 1982 – Dragon Systems, founded by James and Janet M. Baker, was one of IBM's few competitors. === Practical speech recognition === The 1980s also saw the introduction of the n-gram language model. 1987 – The back-off model enabled language models to use multiple-length n-grams, and CSELT used HMM to recognize languages (in software and hardware, e.g. RIPAC). At the end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM. It could take up to 100 minutes to decode 30 seconds of speech. Practical products included: 1984 – the Apricot Portable was released with up to 4096 words support, of which only 64 could be held in RAM at a time. 1987 – a recognizer from Kurzweil Applied Intelligence 1990 – Dragon Dictate, a consumer product released in 1990. AT&T deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without a human operator. The technology was developed by Lawrence Rabiner and others at Bell Labs. By the early 1990s, the vocabulary of the typical commercial speech recognition system had exceeded the average human vocabulary. Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU. Sphinx-II was the first to do speaker-independent, large vocabulary, continuous speech recognition, and it won DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone. Huang later founded the speech recognition group at Microsoft in 1993. Reddy's student Kai-Fu Lee joined Apple, where, in 1992, he helped develop the Casper speech interface prototype. Lernout & Hauspie, a Belgium-based speech recognition company, acquired other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. L&H was used in Windows XP. L&H was an industry leader until an accounting scandal destroyed it in 2001. L&H speech technology was bought by ScanSoft, which became Nuance in 2005. Apple licensed Nuance software for its digital assistant Siri. ==== 2000s ==== In the 2000s, DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002, followed by Global Autonomous Language Exploitation (GALE) in 2005. Four teams participated in EARS: IBM; a team led by BBN with LIMSI and the University of Pittsburgh; Cambridge University; and a team composed of ICSI, SRI, and the University of Washington. EARS funded the collection of the Switchboard telephone speech corpus, which contained 260 hours of recorded conversations from over 500 speakers. The GALE program focused on Arabic and Mandarin broadcast news. Google's first effort at speech recognition came in 2007 after recruiting Nuance researchers. Its first product, GOOG-411, was a telephone-based directory service. Since at least 2006, the U.S. National Security Agency has employed keyword spotting, allowing analysts to index large volumes of recorded conversations and identify speech containing "interesting" keywords. Other government research programs focused on intelligence applications, such as DARPA's EARS program and IARPA's Babel program. In the early 2000s, speech recognition was dominated by hidden Markov models combined with feed-forward artificial neural networks (ANN). Later, speech recognition was taken over by long short-term memory (LSTM), a recurrent neural network (RNN) published by Sepp Hochreiter & Jürgen Schmidhuber in 1997. LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks that require memories of events that happened thousands of discrete time steps earlier, which is important for speech. Around 2007, LSTMs trained with Connectionist Temporal Classification (CTC) began to outperform. In 2015, Google reported a 49 percent error‑rate reduction in its speech recognition via CTC‑trained LSTM. Transformers, a type of neural network based solely on attention, were adopted in computer vision and language modelling, and then to speech recognition. Deep feed-forward (non-recurrent) networks for acoustic modelling were introduced in 2009 by Geoffrey Hinton and his students at the University of Toronto, and by Li Deng and colleagues at Microsoft Research. In contrast to the prioer incremental improvements, deep learning decreased error rates by 30%. Both shallow and deep forms (e.g., recurrent nets) of ANNs had been explored since the 1980s. Howev

    Read more →
  • Logistics automation

    Logistics automation

    Logistics automation is the application of computer software or automated machinery to logistics operations in order to improve its efficiency. Typically this refers to operations within a warehouse or distribution center, with broader tasks undertaken by supply chain engineering systems and enterprise resource planning systems. Logistics automation systems can powerfully complement the facilities provided by these higher level computer systems. The focus on an individual node within a wider logistics network allows systems to be highly tailored to the requirements of that node. == Components == Logistics automation systems comprise a variety of hardware and software components: Fixed machinery Automated storage and retrieval systems, including: Cranes serve a rack of locations, allowing many levels of stock to be stacked vertically, and allowing for higher storage densities and better space utilization than alternatives. In systems produced by Amazon Robotics, automated guided vehicles move items to a human picker. Conveyors: Containers can enter automated conveyors in one area of the warehouse and, either through hard-coded rules or data input, be moved to a selected destination. Vertical carousels based on the paternoster lift system or using space optimization, similar to vending machines, but on a larger scale. Sortation systems: similar to conveyors but typically with higher capacity and able to divert containers more quickly. Typically used to distribute high volumes of small cartons to a large set of locations. Industrial robots: four- to six-axis industrial robots, e.g. palletizing robots, are used for palletizing, depalletizing, packaging, commissioning and order picking. Typically all of these will automatically identify and track containers using barcodes or, increasingly, RFID tags. Motion check weighers may be used to reject cases or individual products that are under or over their specified weight. They are often used in kitting conveyor lines to ensure all pieces belonging in the kit are present. Mobile technology Radio data terminals: these are handheld or truck-mounted terminals which connect by radio to logistics automation software and provide instructions to operators moving throughout the warehouse. Many also have barcode scanners to allow identification of containers more quickly and accurately than manual keyboard entry. Software Integration software: this provides overall control of the automation machinery and allows cranes to be connected to conveyors for seamless stock movements. Operational control software: provides low-level decision-making, such as where to store incoming containers, and where to retrieve them when requested. Business control software: provides higher-level functionality, such as identification of incoming deliveries/stock, scheduling order fulfillment, and assignment of stock to outgoing trailers. == Benefits == A typical warehouse or distribution center will receive stock of a variety of products from suppliers and store these until the receipt of orders from customers, whether individual buyers (e.g. mail order), retail branches (e.g. chain stores), or other companies (e.g. wholesalers). A logistics automation system may provide the following: Automated goods in processes: Incoming goods can be marked with barcodes and the automation system notified of the expected stock. On arrival, the goods can be scanned and thereby identified, and taken via conveyors, sortation systems, and automated cranes into an automatically assigned storage location. Automated goods retrieval for orders: On receipt of orders, the automation system is able to immediately locate goods and retrieve them to a pick-face location. Automated dispatch processing: Combining knowledge of all orders placed at the warehouse the automation system can assign picked goods into dispatch units and then into outbound loads. Sortation systems and conveyors can then move these onto the outgoing trailers. If needed, repackaging to ensure proper protection for further distribution or to change the package format for specific retailers/customers. A complete warehouse automation system can drastically reduce the workforce required to run a facility, with human input required only for a few tasks, such as picking units of product from a bulk packed case. Even here, assistance can be provided with equipment such as pick-to-light units. Smaller systems may only be required to handle part of the process. Examples include automated storage and retrieval systems, which simply use cranes to store and retrieve identified cases or pallets, typically into a high-bay storage system which would be unfeasible to access using fork-lift trucks or any other means. The use of Automatic Guided Vehicles maximizes the output compared to humans since they can do repetitive tasks for long hours and with least to no supervision. An AGV is built and programmed for precision and accuracy thereby reducing the chances of errors in a warehouse, especially when dealing with fragile goods. == Automation software == Software or cloud-based SaaS solutions are used for logistics automation which helps the supply chain industry in automating the workflow as well as management of the system. Knowledge @ Wharton staff writers noted in 2011 that some manufacturers and retailers were weathering the Great Recession "by signing up for pay-as-you-go logistics services available through the Internet 'cloud'". They identified the benefits and reduced costs which came from sharing information about shipments with suppliers, hauliers and end users. There is little generalized software available in this market. This is because there is no rule to generalize the system as well as work flow even though the practice is more or less the same. Most of the commercial companies do use one or the other of the custom solutions. But there are various software solutions that are being used within the departments of logistics. There are a few departments in Logistics, namely: Conventional Department, Container Department, Warehouse, Marine Engineering, Heavy Haulage, etc. Software used in these departments Conventional department : CVT software / CTMS software. Container Trucking: CTMS software Warehouse : WMS/WCS Improving Effectiveness of Logistics Management Logistical Network Information Transportation Sound Inventory Management Warehousing, Materials Handling & Packaging

    Read more →
  • PatchMatch

    PatchMatch

    PatchMatch is an algorithm used to quickly find correspondences (or matches) between small square regions (or patches) of an image. It has various applications in image editing, such as reshuffling or removing objects from images or altering their aspect ratios without cropping or noticeably stretching them. PatchMatch was first presented in a 2011 paper by researchers at Princeton University. == Algorithm == The goal of the algorithm is to find the patch correspondence by defining a nearest-neighbor field (NNF) as a function f : R 2 → R 2 {\displaystyle f:\mathbb {R} ^{2}\to \mathbb {R} ^{2}} of offsets, which is over all possible matches of patch (location of patch centers) in image A, for some distance function of two patches D {\displaystyle D} . So, for a given patch coordinate a {\displaystyle a} in image A {\displaystyle A} and its corresponding nearest neighbor b {\displaystyle b} in image B {\displaystyle B} , f ( a ) {\displaystyle f(a)} is simply b − a {\displaystyle b-a} . However, if we search for every point in image B {\displaystyle B} , the work will be too hard to complete. So the following algorithm is done in a randomized approach in order to accelerate the calculation speed. The algorithm has three main components. Initially, the nearest-neighbor field is filled with either random offsets or some prior information. Next, an iterative update process is applied to the NNF, in which good patch offsets are propagated to adjacent pixels, followed by random search in the neighborhood of the best offset found so far. Independent of these three components, the algorithm also uses a coarse-to-fine approach by building an image pyramid to obtain the better result. === Initialization === When initializing with random offsets, we use independent uniform samples across the full range of image B {\displaystyle B} . This algorithm avoids using an initial guess from the previous level of the pyramid because in this way the algorithm can avoid being trapped in local minima. === Iteration === After initialization, the algorithm attempted to perform iterative process of improving the N N F {\displaystyle NNF} . The iterations examine the offsets in scan order (from left to right, top to bottom), and each undergoes propagation followed by random search. === Propagation === We attempt to improve f ( x , y ) {\displaystyle f(x,y)} using the known offsets of f ( x − 1 , y ) {\displaystyle f(x-1,y)} and f ( x , y − 1 ) {\displaystyle f(x,y-1)} , assuming that the patch offsets are likely to be the same. That is, the algorithm will take new value for f ( x , y ) {\displaystyle f(x,y)} to be arg ⁡ min ( x , y ) D ( f ( x , y ) ) , D ( f ( x − 1 , y ) ) , D ( f ( x , y − 1 ) ) {\displaystyle \arg \min \limits _{(x,y)}{D(f(x,y)),D(f(x-1,y)),D(f(x,y-1))}} . So if f ( x , y ) {\displaystyle f(x,y)} has a correct mapping and is in a coherent region R {\displaystyle R} , then all of R {\displaystyle R} below and to the right of f ( x , y ) {\displaystyle f(x,y)} will be filled with the correct mapping. Alternatively, on even iterations, the algorithm search for different direction, fill the new value to be arg ⁡ min ( x , y ) { D ( f ( x , y ) ) , D ( f ( x + 1 , y ) ) , D ( f ( x , y + 1 ) ) } {\displaystyle \arg \min \limits _{(x,y)}\{D(f(x,y)),D(f(x+1,y)),D(f(x,y+1))\}} . === Random search === Let v 0 = f ( x , y ) {\displaystyle v_{0}=f(x,y)} , we attempt to improve f ( x , y ) {\displaystyle f(x,y)} by testing a sequence of candidate offsets at an exponentially decreasing distance from v 0 {\displaystyle v_{0}} u i = v 0 + w α i R i {\displaystyle u_{i}=v_{0}+w\alpha ^{i}R_{i}} where R i {\displaystyle R_{i}} is a uniform random in [ − 1 , 1 ] × [ − 1 , 1 ] {\displaystyle [-1,1]\times [-1,1]} , w {\displaystyle w} is a large window search radius which will be set to maximum picture size, and α {\displaystyle \alpha } is a fixed ratio often assigned as 1/2. This part of the algorithm allows the f ( x , y ) {\displaystyle f(x,y)} to jump out of local minimum through random process. === Halting criterion === The often used halting criterion is set the iteration times to be about 4~5. Even with low iteration, the algorithm works well.

    Read more →
  • DexNet

    DexNet

    Dex-net is a robotic. It uses a Grasp Quality Convolutional Neural Network to learn how to grasp unusually shaped objects. == History == Dex-net was developed by University of California, Berkeley professor Ken Goldberg and graduate student Jeff Mahler. == Design == Dex-net includes a high-resolution 3-D sensor and two arms, each controlled by a different neural network. One arm is equipped with a conventional robot gripper and another with a suction system. The robot’s software scans an object and then asks both neural networks to decide, on the fly, whether to grab or suck a particular object. It runs on an off-the-shelf industrial machine made by Swiss robotics company ABB. The software learns by attempting to pick up objects in a virtual environment. Dex-Net can generalize from an object it has seen before to a new one. The robot can "nudge" such virtual objects to examine if it is unsure how to grasp them. The trial data set was 6.7 million point clouds, grasps and analytic grasp metrics generated from thousands of 3D models. Grasps are defined as a gripper's planar position, angle and depth relative to an RGB-D sensor. == Mean picks per hour == A metric called mean picks per hour (MPPH) is calculated by multiplying the average time per pick and the average probability of success for a specific set of objects. The new metric allows labs working on picking robots to compare their results. Humans are capable of between 400 and 600 MPPH. In a contest organized by Amazon recently, the best robots were capable of between 70 and 95. Dex-net has achieved 200 to 300.

    Read more →
  • Shape analysis (digital geometry)

    Shape analysis (digital geometry)

    This article describes shape analysis to analyze and process geometric shapes. == Description == Shape analysis is the (mostly) automatic analysis of geometric shapes, for example using a computer to detect similarly shaped objects in a database or parts that fit together. For a computer to automatically analyze and process geometric shapes, the objects have to be represented in a digital form. Most commonly a boundary representation is used to describe the object with its boundary (usually the outer shell, see also 3D model). However, other volume based representations (e.g. constructive solid geometry) or point based representations (point clouds) can be used to represent shape. Once the objects are given, either by modeling (computer-aided design), by scanning (3D scanner) or by extracting shape from 2D or 3D images, they have to be simplified before a comparison can be achieved. The simplified representation is often called a shape descriptor (or fingerprint, signature). These simplified representations try to carry most of the important information, while being easier to handle, to store and to compare than the shapes directly. A complete shape descriptor is a representation that can be used to completely reconstruct the original object (for example the medial axis transform). == Application fields == Shape analysis is used in many application fields: archeology for example, to find similar objects or missing parts architecture for example, to identify objects that spatially fit into a specific space medical imaging to understand shape changes related to illness or aid surgical planning virtual environments or on the 3D model market to identify objects for copyright purposes security applications such as face recognition entertainment industry (movies, games) to construct and process geometric models or animations computer-aided design and computer-aided manufacturing to process and to compare designs of mechanical parts or design objects. == Shape descriptors == Shape descriptors can be classified by their invariance with respect to the transformations allowed in the associated shape definition. Many descriptors are invariant with respect to congruency, meaning that congruent shapes (shapes that could be translated, rotated and mirrored) will have the same descriptor (for example moment or spherical harmonic based descriptors or Procrustes analysis operating on point clouds). Another class of shape descriptors (called intrinsic shape descriptors) is invariant with respect to isometry. These descriptors do not change with different isometric embeddings of the shape. Their advantage is that they can be applied nicely to deformable objects (e.g. a person in different body postures) as these deformations do not involve much stretching but are in fact near-isometric. Such descriptors are commonly based on geodesic distances measures along the surface of an object or on other isometry invariant characteristics such as the Laplace–Beltrami spectrum (see also spectral shape analysis). There are other shape descriptors, such as graph-based descriptors like the medial axis or the Reeb graph that capture geometric and/or topological information and simplify the shape representation but can not be as easily compared as descriptors that represent shape as a vector of numbers. From this discussion it becomes clear, that different shape descriptors target different aspects of shape and can be used for a specific application. Therefore, depending on the application, it is necessary to analyze how well a descriptor captures the features of interest.

    Read more →