Leela Zero

Leela Zero

Leela Zero is a free and open-source computer Go program released on 25 October 2017. It is developed by Belgian programmer Gian-Carlo Pascutto, the author of chess engine Sjeng and Go engine Leela. Leela Zero's algorithm is based on DeepMind's 2017 paper about AlphaGo Zero. Unlike the original Leela, which has a lot of human knowledge and heuristics programmed into it, the program code in Leela Zero only knows the basic rules and nothing more. The knowledge that makes Leela Zero a strong player is contained in a neural network, which is trained based on the results of previous games that the program played. Leela Zero is trained by a distributed effort, which is coordinated at the Leela Zero website. Members of the community provide computing resources by running the client, which generates self-play games and submits them to the server. The self-play games are used to train newer networks. Generally, over 500 clients have connected to the server to contribute resources. The community has provided high quality code contributions as well. == Version history == Leela Zero finished third at the BerryGenomics Cup World AI Go Tournament in Fuzhou, Fujian, China on 28 April 2018. The New Yorker at the end of 2018 characterized Leela and Leela Zero as "the world’s most successful open-source Go engines". In early 2018, another team branched Leela Chess Zero from the same code base, also to verify the methods in the AlphaZero paper as applied to the game of chess. AlphaZero's use of Google TPUs was replaced by a crowd-sourcing infrastructure and the ability to use graphics card GPUs via the OpenCL library. Even so, it is expected to take a year of crowd-sourced training to make up for the dozen hours that AlphaZero was allowed to train for its chess match in the paper. The distributed training server was shut down on 2021-02-15, marking the end of Leela Zero project. The page now directs visitors to KataGo and SAI. The model sizes increased steadily over time. The first released model has hash name d645af97, size 1x8 (1 layer, 8 channels), and released at 2017-11-10 13:04. The last released model has hash name 0e9ea880, size 40x256, and was released at 2021-02-15 09:04. == Technology == Leela Zero is an (almost) exact replication of AlphaGo Zero in both training process and architecture. The training process is Monte-Carlo Tree Search with self-play, exactly the same as AlphaGo Zero. The architecture is the same as AlphaGo Zero (with one difference). Consider the last released model, 0e9ea880. It has 47 million parameters, and the following architecture: The stem of the network takes as input a 18x19x19 tensor representation of the Go board. 8 channels are the positions of the current player's stones from the last eight time steps. (1 if there is a stone, 0 otherwise. If the time step go before the beginning of the game, then 0 in all positions.) 8 channels are the positions of the other player's stones from the last eight time steps. 1 channel is all 1 if black is to move, and 0 otherwise. 1 channel is all 1 if white is to move, and 0 otherwise. (This channel is not present in the original AlphaGo Zero) The body is a ResNet with 40 residual blocks and 256 channels. There are two heads, a policy head and a value head. Policy head outputs a logit array of size 19 × 19 + 1 {\displaystyle 19\times 19+1} , representing the logit of making a move in one of the points, plus the logit of passing. Value head outputs a number in the range ( − 1 , + 1 ) {\displaystyle (-1,+1)} , representing the expected score for the current player. -1 represents current player losing, and +1 winning.

Ultra Hal

Ultra Hal is a chatbot intended to function as a virtual assistant. It was developed by Zabaware, Inc. Ultra Hal uses a natural language interface with animated characters using speech synthesis. Users can communicate with the chatterbot via typing or via a speech recognition engine. It utilizes the WordNet lexical dictionary. Its name is an allusion to HAL 9000, the artificial intelligence from the movie 2001: A Space Odyssey. Ultra Hal won the 2007 Loebner Prize for "most human" chatterbot.

Audio mining

Audio mining is a technique by which the content of an audio signal can be automatically analyzed and searched. It is most commonly used in the field of automatic speech recognition, where the analysis tries to identify any speech within the audio. The term audio mining is sometimes used interchangeably with audio indexing, phonetic searching, phonetic indexing, speech indexing, audio analytics, speech analytics, word spotting, and information retrieval. Audio indexing, however, is mostly used to describe the pre-process of audio mining, in which the audio file is broken down into a searchable index of words. == History == Academic research on audio mining began in the late 1970s in schools like Carnegie Mellon University, Columbia University, the Georgia Institute of Technology, and the University of Texas. Audio data indexing and retrieval began to receive attention and demand in the early 1990s, when multimedia content started to develop and the volume of audio content significantly increased. Before audio mining became the mainstream method, written transcripts of audio content were created and manually analyzed. == Process == Audio mining is typically split into four components: audio indexing, speech processing and recognition systems, feature extraction and audio classification. The audio will typically be processed by a speech recognition system in order to identify word or phoneme units that are likely to occur in the spoken content. This information may either be used immediately in pre-defined searches for keywords or phrases (a real-time "word spotting" system), or the output of the speech recognizer may be stored in an index file. One or more audio mining index files can then be loaded at a later date in order to run searches for keywords or phrases. The results of a search will normally be in terms of hits, which are regions within files that are good matches for the chosen keywords. The user may then be able to listen to the audio corresponding to these hits in order to verify if a correct match was found. === Audio Indexing === In audio, there is the main problem of information retrieval - there is a need to locate the text documents that contain the search key. Unlike humans, a computer is not able to distinguish between the different types of audios such as speed, mood, noise, music or human speech - an effective searching method is needed. Hence, audio indexing allows efficient search for information by analyzing an entire file using speech recognition. An index of content is then produced, bearing words and their locations done through content-based audio retrieval, focusing on extracted audio features. It is done through mainly two methods: Large Vocabulary Continuous Speech Recognition (LVCSR) and Phonetic-based Indexing. ==== Large Vocabulary Continuous Speech Recognizers (LVCSR) ==== In text-based indexing or large vocabulary continuous speech recognition (LVCSR), the audio file is first broken down into recognizable phonemes. It is then run through a dictionary that can contain several hundred thousand entries and matched with words and phrases to produce a full text transcript. A user can then simply search a desired word term and the relevant portion of the audio content will be returned. If the text or word could not be found in the dictionary, the system will choose the next most similar entry it can find. The system uses a language understanding model to create a confidence level for its matches. If the confidence level be below 100 percent, the system will provide options of all the found matches. ===== Advantages and disadvantages ===== The main draw of LVCSR is its high accuracy and high searching speed. In LVCSR, statistical methods are used to predict the likelihood of different word sequences, hence the accuracy is much higher than the single word lookup of a phonetic search. If the word can be found, the probability of the word spoken is very high. Meanwhile, while initial processing of audio takes a fair bit of time, searching is quick as just a simple test to text matching is needed. On the other hand, LVCSR is susceptible to common issues of speech recognition. The inherent random nature of audio and problems of external noise all affect the accuracies of text-based indexing. Another problem with LVCSR is its over reliance on its dictionary database. LVCSR only recognizes words that are found in their dictionary databases, and these dictionaries and databases are unable to keep up with the constant evolving of new terminology, names and words. Should the dictionary not contain a word, there is no way for the system to identify or predict it. This reduces the accuracy and reliability of the system. This is named the Out-of-vocabulary (OOV) problem. Audio mining systems try to cope with OOV by continuously updating the dictionary and language model used, but the problem still remains significant and has probed a search for alternatives. Additionally, due to the need to constantly update and maintain task-based knowledge and large training databases to cope with the OOV problem, high computational costs are incurred. This makes LVCSR an expensive approach to audio mining. ==== Phonetic-based Indexing ==== Phonetic-based indexing also breaks the audio file into recognizable phonemes, but instead of converting them to a text index, they are kept as they are and analyzed to create a phonetic-based index. The process of phonetic-based indexing can be split into two phases. The first phase is indexing. It begins by converting the input media into a standard audio representation format (PCM). Then, an acoustic model is applied to the speech. This acoustic model represents characteristics of both an acoustic channel (an environment in which the speech was uttered and a transducer through which it was recorded) and a natural language (in which human beings expressed the input speech). This produces a corresponding phonetic search track, or phonetic audio track (PAT), a highly compressed representation of the phonetic content of the input media. The second phase is searching. The user's search query term is parsed into a possible phoneme string using a phonetic dictionary. Then, multiple PAT files can be scanned at high speed during a single search for likely phonetic sequences that closely match corresponding strings of phonemes in the query term. ===== Advantages and disadvantages ===== Phonetic indexing is most attractive as it is largely unaffected by linguistic issues such as unrecognized words and spelling errors. Phonetic preprocessing maintains an open vocabulary that does not require updating. That makes it particularly useful for searching specialized terminology or words in foreign languages that do not commonly appear in dictionaries. It is also more effective for searching audio files with disruptive background noise and/or unclear utterances as it can compile results based on the sounds it can discern, and should the user wish to, they can search through the options until they find the desired item. Furthermore, in contrast to LVCSR, it can process audio files very quickly as there are very few unique phonemes between languages. However, phonemes cannot be effectively indexed like an entire word, thus searching on a phonetic-based system is slow. An issue with phonetic indexing is its low accuracy. Phoneme-based searches result in more false matches than text-based indexing. This is especially prevalent for short search terms, which have a stronger likelihood of sounding similar to other words or being part of bigger words. It could also return irrelevant results from other languages. Unless the system recognizes exactly the entire word, or understands phonetic sequences of languages, it is difficult for phonetic-based indexing to return accurate findings. === Speech processing and recognition system === Deemed as the most critical and complex component of audio mining, speech recognition requires the knowledge of human speech production system and its modeling. To correspond the Human speech production system, the electrical speech production system is developed to consist of: Speech generation Speech perception Voiced & unvoiced speech Model of human speech The electrical speech production system converts acoustic signal into corresponding representation of the spoken through the acoustic models in their software where all phonemes are represented. A statistical language model aids in the process by identifying how likely words are to follow each other in certain languages. Put together with a complex probability analysis, the speech recognition system is capable of taking an unknown speech signal and transcribing it into words based on the program's dictionary. ASR (automatic speech recognition) system includes: Acoustic analysis: input sound waveform is transformed into a feature Acoustic model: establishes relationship between speech signal and phonemes, pronunciation model and lang

RIPAC (microprocessor)

RIPAC was a VLSI single-chip microprocessor designed for automatic recognition of the connected speech, one of the first of this use. The project of the microprocessor RIPAC started in 1984. RIPAC was aimed to provide efficient real-time speech recognition services to the italian telephone system provided by SIP. The microprocessor was presented in September 1986 at The Hague (Netherlands) at EUSPICO conference. It was composed of 70.000 transistors and structured as Harvard architecture. The name RIPAC is the acronym for "Riconoscimento del PArlato Connesso", that means "Recognition of the connected speech" in Italian. The microprocessor was designed by the Italian companies CSELT and ELSAG and was produced by SGS: a combination of Hidden Markov Model and Dynamic Time Warping algorithms was used for processing speech signals. It was able to do real-time speech recognition of Italian and many languages with a good affordability. The chip, issued by U.S. Patent No. 4,907,278, worked at first run.

VideoThang

VideoThang was free video editing software for Windows 2000, XP, and Vista. The software has three parts to it which are My Stuff, Edit My Stuff, and My Mix. The software accepts MOV, AVI, MPG, MP4, PNG, WMV, FLV, and MP3 standards. Its official website is now no longer available. == Reception == Jan Ozer, of Pcmag, said that the software "suffers from several unfortunate design and implementation flaws that dramatically limit output quality and overall utility." Jon L. Jacobi, of PC World, said that the software "may not be the most flexible multimedia editor in the world, but the trim/zoom basics are there, it's free, and it's so simple to use that just about anyone in the world should be able figure it out." Amit Agarwal, of Digital Inspiration, said that the software "doesn’t offer loads of features like other video editors but is perfect for making quick video slideshows of your pictures that you can upload on the web or share via email."

Multiple buffering

In computer science, multiple buffering is the use of more than one buffer to hold a block of data, so that a "reader" will see a complete (though perhaps old) version of the data instead of a partially updated version of the data being created by a "writer". It is very commonly used for computer display images. It is also used to avoid the need to use dual-ported RAM (DPRAM) when the readers and writers are different devices. == Description == === Double buffering Petri net === The Petri net in the illustration shows double buffering. Transitions W1 and W2 represent writing to buffer 1 and 2 respectively while R1 and R2 represent reading from buffer 1 and 2 respectively. At the beginning, only the transition W1 is enabled. After W1 fires, R1 and W2 are both enabled and can proceed in parallel. When they finish, R2 and W1 proceed in parallel and so on. After the initial transient where W1 fires alone, this system is periodic and the transitions are enabled – always in pairs (R1 with W2 and R2 with W1 respectively). == Double buffering in computer graphics == In computer graphics, double buffering is a technique for drawing graphics that shows less stutter, tearing, and other artifacts. It is difficult for a program to draw a display so that pixels do not change more than once. For instance, when updating a page of text, it is much easier to clear the entire page and then draw the letters than to somehow erase only the pixels that are used in old letters but not in new ones. However, this intermediate image is seen by the user as flickering. In addition, computer monitors constantly redraw the visible video page (traditionally at around 60 times a second), so even a perfect update may be visible momentarily as a horizontal divider between the "new" image and the un-redrawn "old" image, known as tearing. === Software double buffering === A software implementation of double buffering has all drawing operations store their results in some region of system RAM; any such region is often called a "back buffer". When all drawing operations are considered complete, the whole region (or only the changed portion) is copied into the video RAM (the "front buffer"); this copying is usually synchronized with the monitor's raster beam in order to avoid tearing. Software implementations of double buffering necessarily require more memory and CPU time than single buffering because of the system memory allocated for the back buffer, the time for the copy operation, and the time waiting for synchronization. Compositing window managers often combine the "copying" operation with "compositing" used to position windows, transform them with scale or warping effects, and make portions transparent. Thus, the "front buffer" may contain only the composite image seen on the screen, while there is a different "back buffer" for every window containing the non-composited image of the entire window contents. === Page flipping === In the page-flip method, instead of copying the data, both buffers are capable of being displayed. At any one time, one buffer is actively being displayed by the monitor, while the other, background buffer is being drawn. When the background buffer is complete, the roles of the two are switched. The page-flip is typically accomplished by modifying a hardware register in the video display controller—the value of a pointer to the beginning of the display data in the video memory. The page-flip is much faster than copying the data and can guarantee that tearing will not be seen as long as the pages are switched over during the monitor's vertical blanking interval—the blank period when no video data is being drawn. The currently active and visible buffer is called the front buffer, while the background page is called the back buffer. == Triple buffering == In computer graphics, triple buffering is similar to double buffering but can provide improved performance. In double buffering, the program must wait until the finished drawing is copied or swapped before starting the next drawing. This waiting period could be several milliseconds during which neither buffer can be touched. In triple buffering, the program has two back buffers and can immediately start drawing in the one that is not involved in such copying. The third buffer, the front buffer, is read by the graphics card to display the image on the monitor. Once the image has been sent to the monitor, the front buffer is flipped with (or copied from) the back buffer holding the most recent complete image. Since one of the back buffers is always complete, the graphics card never has to wait for the software to complete. Consequently, the software and the graphics card are completely independent and can run at their own pace. Finally, the displayed image was started without waiting for synchronization and thus with minimum lag. Due to the software algorithm not polling the graphics hardware for monitor refresh events, the algorithm may continuously draw additional frames as fast as the hardware can render them. For frames that are completed much faster than interval between refreshes, it is possible to replace a back buffers' frames with newer iterations multiple times before copying. This means frames may be written to the back buffer that are never used at all before being overwritten by successive frames. Nvidia has implemented this method under the name "Fast Sync". An alternative method sometimes referred to as triple buffering is a swap chain three buffers long. After the program has drawn both back buffers, it waits until the first one is placed on the screen, before drawing another back buffer (i.e. it is a 3-long first in, first out queue). Most Windows games seem to refer to this method when enabling triple buffering. == Quad buffering == The term quad buffering is the use of double buffering for each of the left and right eye images in stereoscopic implementations, thus four buffers total (if triple buffering was used then there would be six buffers). The command to swap or copy the buffer typically applies to both pairs at once, so at no time does one eye see an older image than the other eye. Quad buffering requires special support in the graphics card drivers which is disabled for most consumer cards. AMD's Radeon HD 6000 Series and newer support it. 3D standards like OpenGL and Direct3D support quad buffering. == Double buffering for DMA == The term double buffering is used for copying data between two buffers for direct memory access (DMA) transfers, not for enhancing performance, but to meet specific addressing requirements of a device (particularly 32-bit devices on systems with wider addressing provided via Physical Address Extension). Windows device drivers are a place where the term "double buffering" is likely to be used. Linux and BSD source code calls these "bounce buffers". Some programmers try to avoid this kind of double buffering with zero-copy techniques. == Other uses == Double buffering is also used as a technique to facilitate interlacing or deinterlacing of video signals.

JasPer

JasPer is a computer software project to create a reference implementation of the codec specified in the JPEG-2000 Part-1 standard (i.e. ISO/IEC 15444-1) - started in 1997 at Image Power Inc. and at the University of British Columbia. It consists of a C library and some sample applications useful for testing the codec. The copyright owner began licensing the code to the public under an MIT License-style license in 2004 in response to requests from the open-source community. As of 2011 JasPer operated as a component of many software projects, both free and proprietary, including (but not limited to) netpbm (as of release 10.12), ImageMagick and KDE (as of version 3.2). As of 22 June 2010 the GEGL graphics library supported JasPer in its latest Git versions. In a series of objective JPEG-2000-compression quality tests conducted in 2004, "JasPer was the best codec, closely followed by IrfanView and Kakadu". However, Jasper remains one of the slowest implementations of the JPEG-2000 codec, as it was designed for reference, not performance. == Etymology == The name "JasPer" has simultaneous connotations with Canada's Jasper National Park, with the semi-precious gemstone, jasper, and with "JP" as an abbreviation of the JPEG-2000 standard.