AI App Just Like Chatgpt

  • Articulatory speech recognition

    Articulatory speech recognition means the recovery of speech (in the form of phonemes, syllables or words) from acoustic signals with the help of articulatory modeling or an extra input of articulatory movement data. Speech recognition (also called automatic speech recognition or acoustic speech recognition) means the recovery of speech from acoustics (the sound wave) only. Articulatory information is extremely helpful when the acoustic input is of low quality, for example because of noise or missing data. Measurable information from the articulatory system (e.g. tongue and jaw movements) can supplement acoustic signals to improve phone recognition accuracy by 2%. However, attempts to estimate articulatory data from acoustic signals alone have not significantly enhanced recognition performance.

  • Wilkinson's Grammar of Graphics

    The Grammar of Graphics (GoG) is a grammar-based system for representing graphics that provides grammatical constraints on the composition of data and information visualizations. A graphical grammar differs from a graphics pipeline in that it focuses on semantic components such as scales and guides, statistical functions, coordinate systems, marks and aesthetic attributes. For example, a bar chart can be converted into a pie chart by specifying a polar coordinate system without any other change in graphical specification. The grammar of graphics concept was introduced by Leland Wilkinson in 2001 (Wilkinson et al., 2001; Wilkinson, 2005), and graphical grammars have since been written in a variety of languages with various parameterisations and extensions. The major implementations of graphical grammars are nViZn, created by a team at SPSS/IBM; Polaris, which focuses on multidimensional relational databases and was commercialised as Tableau; a revised Layered Grammar of Graphics by Hadley Wickham in ggplot2; and Vega-Lite, a visualisation grammar with added interactivity. The grammar of graphics continues to evolve with alternate parameterisations, extensions, or new specifications.

    == Wilkinson's Grammar of Graphics ==

    === Theory ===

    Wilkinson conceived the seven elements of a graphic to be:

      • Variables: mapping of objects to values represented in a graphic
      • Algebra: operations to combine variables and specify dimensions of graphs
      • Geometry: creation of geometric graphs from variables
      • Aesthetics: sensory attributes
      • Statistics: functions to change the appearance and representation of graphs
      • Scales: represent variables on measured dimensions
      • Coordinates: mapping to coordinate systems

    With these, Wilkinson hypothesised that:

      • These seven constructs are orthogonal, and virtually all known statistical charts can be generated relatively parsimoniously.
      • This computational system is not a taxonomy of charts; rather, it describes the meaning of what we do when we construct statistical graphics.

    === Implementations ===

    Wilkinson wrote SYSTAT, a statistical software package, in the early 1980s. This program was noted for its comprehensive graphics, including the first software implementation of the heatmap display now widely used among biologists. After his company grew to 50 employees, he sold it to SPSS in 1995. At SPSS, he assembled a team of graphics programmers who developed the nViZn platform that produces the visualizations in SPSS, Clementine, and other analytics products.

    While at Stanford, Tableau founders Hanrahan and Stolte, as well as Diane Tang, created the predecessor to Tableau, named Polaris. Polaris was a data visualization software tool, built with the support of a United States Department of Energy defense program, the Accelerated Strategic Computing Initiative (ASCI). The main differences between Wilkinson's system and Polaris are the use of SQL relational algebra for database services and the use of shelves instead of cross and nest operators.

    == Wickham's Layered Grammar of Graphics ==

    === Theory ===

    Hadley Wickham conceived an alternate parameterisation of the syntax Wilkinson had derived, creating a layered grammar of graphics which he implemented as ggplot2 for users of the R programming language. This added a hierarchy of defaults based around the idea of building up a graphic from multiple layers. Wickham conceived these elements to be:

      • Defaults: consists of data and mapping
          • Data: dataset
          • Mapping: aesthetic mappings
      • Layer: consists of data, mapping, geom, stat, and position
          • Data: dataset, or inherit from defaults
          • Mapping: aesthetic mappings, or inherit from defaults
          • Geom: geometric object
          • Stat: statistical transformation
          • Position: position adjustment
      • Scale: mapping of data to aesthetic attributes
      • Coord: mapping of data to the plane of the plot
      • Facet: split up the data

    === Reception ===

    Wilkinson is generally positive about Wickham's parameterisation and implementation of ggplot2, praising its elegance and expressivity whilst claiming that his original Grammar of Graphics is capable of representing a wider range of statistical graphics.

    === Implementations ===

    ggplot2 is the first implementation of a layered grammar of graphics in R, and implementations in other programming languages have ensued. These include the direct ports plotnine for Python, gramm for MATLAB, Lets-Plot for Kotlin and gadfly for Julia. Projects inspired by elements of Wickham's grammar include Vega-Lite, which specifies plots in JSON and uses a JavaScript engine. Implementations for Python include Vega-Altair (built on top of Vega-Lite).

    == Vega-Lite: A Grammar of Interactive Graphics ==

    === Theory ===

    Vega-Lite combines ideas from Wilkinson's Grammar of Graphics and Wickham's Layered Grammar of Graphics with a composition algebra for layered and multi-view displays and a grammar of interaction. The Vega-Lite specification is instantiated in JSON and rendered by the lower-level Vega.

    The graphical grammar implemented by Vega-Lite is composed of the following:

      • Unit: consists of data, transforms, mark-type and encoding
          • Data: relational table consisting of records (rows) and named attributes (columns)
          • Transforms: data transformations
          • Mark-type: geometric object for visual encoding
          • Encodings: mapping of data attributes to visual mark properties, where each encoding consists of:
              • Channel: e.g. colour, shape, size, or text
              • Field: data attribute
              • Data-type: e.g. nominal, ordinal, quantitative, or temporal
              • Value: use a literal instead of a data-type
              • Functions: e.g. binning, aggregation, and sorting
              • Scale: maps from data domain to visual range
              • Guide: axis or legend for visualising scale
      • Composite Views: compose views from multiple unit specifications with operators:
          • Layer: charts plotted on top of each other
          • Hconcat/Vconcat: place views side-by-side
          • Facet: subset data to produce a trellis plot
          • Repeat: multiple plots similar to facet but with full data replication in each cell
      • Interaction: selections identify the set of points a user is interested in manipulating, with components:
          • Selection: get the minimal number of backing points
          • Name: reference
          • Type: how many backing values are stored
          • Predicate: determine the set of selected points e.g. single, list, interval
          • Domain|Range: store data domain or visual range
          • Event: e.g. mouseover, mousedown, mouseup
          • Init: initialise with specific backing points
          • Transforms: e.g. project, toggle, translate, zoom, and nearest
          • Resolve: resolve selections to union or intersect

    ==== Implementations ====

    Whilst Vega-Lite is the sole implementation of this graphics grammar specification with compilation to Vega, other implementations do create JSON files which can be interpreted by Vega-Lite.

    == Related projects ==

      • ggplot2 is an R package for plotting
      • Tableau Software (originally known as Polaris) is commercial software built using the Grammar of Graphics
      • nViZn, built by Wilkinson
      • SYSTAT (statistics package), built by Wilkinson
      • ggpy, ggplot for Python, but has not been updated since 20 November 2016
      • plotnine started as an effort to improve the scalability of ggplot for Python and is largely compatible with ggplot2 syntax
      • Plotly - interactive, online ggplot2 graphs
      • gramm, a plotting class for MATLAB inspired by ggplot2
      • gadfly, a system for plotting and visualization written in Julia, based largely on ggplot2
      • Chart::GGPlot - ggplot2 port in Perl, but has not been updated since 16 March 2023
      • The Lets-Plot for Python library includes a native backend and a Python API, which was mostly based on the ggplot2 package
      • Lets-Plot Kotlin API is an open-source plotting library for statistical data implemented using the Kotlin programming language, and is built on the principles of layered graphics first described in Leland Wilkinson's work The Grammar of Graphics
      • ggplotnim, a plotting library for the Nim programming language inspired by ggplot2
      • Vega and Vega-Lite are plotting libraries that use JSON to specify plots
      • Vega-Altair, a Python library built on top of Vega-Lite
      • chart-parts - React-friendly Grammar of Graphics, but has not been updated since 10 Dec 2021
      • g2 - a JavaScript library
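
The Vega-Lite unit and layer concepts described in this article can be made concrete with a small sketch. The Python snippet below only assembles plain dictionaries in the shape of a Vega-Lite JSON specification and serialises them; it does not render anything. The records, the field names `category` and `amount`, and the helper `unit_spec` are made up for illustration and are not part of any library.

```python
import json

SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"

# Hypothetical data: a relational table of records with named attributes.
records = [
    {"category": "A", "amount": 28},
    {"category": "B", "amount": 55},
    {"category": "C", "amount": 43},
]


def unit_spec(values, mark, encoding):
    """Assemble one unit specification: data, mark type and encodings.

    `unit_spec` is an illustrative helper, not a Vega-Lite API.
    """
    return {
        "$schema": SCHEMA,
        "data": {"values": values},
        "mark": mark,
        "encoding": encoding,
    }


# Each encoding maps a channel (x, y) to a field with a data type.
bar_chart = unit_spec(
    records,
    "bar",
    {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
)

# A composite view: the "layer" operator plots charts on top of each other.
layered = {
    "$schema": SCHEMA,
    "data": {"values": records},
    "layer": [
        {"mark": "bar", "encoding": bar_chart["encoding"]},
        {
            "mark": "text",
            "encoding": {
                "x": {"field": "category", "type": "nominal"},
                "y": {"field": "amount", "type": "quantitative"},
                "text": {"field": "amount", "type": "quantitative"},
            },
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(bar_chart, indent=2))
```

The resulting JSON text is what a Vega-Lite renderer would consume; swapping the `mark` value or adding a second layer changes the chart without touching the data, which is the point of separating the grammar's components.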

  • Network Abstraction Layer

    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non-conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards.

    == Introduction ==

    An increasing number of services and the growing popularity of high-definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher-quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements.

    The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.), interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question of how to handle this variety of applications and networks.

    To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as:

      • RTP/IP for any kind of real-time wire-line and wireless Internet services.
      • File formats, e.g., ISO MP4 for storage and MMS.
      • H.32X for wireline and wireless conversational services.
      • MPEG-2 systems for broadcasting services, etc.

    The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte-stream and packet-format uses of NAL units, parameter sets, and access units. A short description of these concepts is given below.

    == NAL units ==

    The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.
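
As a rough illustration of the one-byte H.264/AVC header and the two-byte HEVC header mentioned above, the following Python sketch splits the header bits into their fields. The field names and bit widths follow the syntax element names in the H.264 and HEVC specifications rather than anything stated in this article, and the function and class names are hypothetical.

```python
from typing import NamedTuple


class H264NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, expected to be 0
    nal_ref_idc: int         # 2 bits
    nal_unit_type: int       # 5 bits, the type of data in the payload


class HevcNalHeader(NamedTuple):
    forbidden_zero_bit: int     # 1 bit, expected to be 0
    nal_unit_type: int          # 6 bits
    nuh_layer_id: int           # 6 bits
    nuh_temporal_id_plus1: int  # 3 bits


def parse_h264_header(nal: bytes) -> H264NalHeader:
    """Decode the single header byte at the start of an H.264/AVC NAL unit."""
    if len(nal) < 1:
        raise ValueError("an H.264 NAL unit needs at least 1 byte")
    b = nal[0]
    return H264NalHeader(b >> 7, (b >> 5) & 0x03, b & 0x1F)


def parse_hevc_header(nal: bytes) -> HevcNalHeader:
    """Decode the two header bytes at the start of an HEVC NAL unit."""
    if len(nal) < 2:
        raise ValueError("an HEVC NAL unit needs at least 2 bytes")
    v = (nal[0] << 8) | nal[1]
    return HevcNalHeader(v >> 15, (v >> 9) & 0x3F, (v >> 3) & 0x3F, v & 0x07)


if __name__ == "__main__":
    print(parse_h264_header(bytes([0x67])))
    print(parse_hevc_header(bytes([0x40, 0x01])))
```

Everything after the header bytes is payload, so a caller would typically dispatch on `nal_unit_type` and hand the remaining bytes to the matching payload parser.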

    == NAL Units in Byte-Stream Format Use ==

    Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit.

    A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired.

    == NAL Units in Packet-Transport System Use ==

    In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes.

    == VCL and Non-VCL NAL Units ==

    NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures.
    Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures).

    == Parameter Sets ==

    A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets:

      • Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.)
      • Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.)

    The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set, and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss.
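
The two-level indirection described above (VCL NAL unit to PPS identifier, PPS to SPS identifier) can be sketched as a pair of lookup tables. This is a minimal illustration, not decoder code: the class names, the handful of fields, and the `resolve` method are hypothetical, and real parameter sets carry many more syntax elements.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Sps:
    """Illustrative subset of a sequence parameter set."""
    sps_id: int
    width: int
    height: int
    bit_depth: int


@dataclass(frozen=True)
class Pps:
    """Illustrative subset of a picture parameter set."""
    pps_id: int
    sps_id: int            # identifier of the SPS this PPS applies on top of
    chroma_qp_offset: int


class ParameterSetStore:
    """Keeps the most recently received parameter set for each identifier."""

    def __init__(self) -> None:
        self._sps: Dict[int, Sps] = {}
        self._pps: Dict[int, Pps] = {}

    def add_sps(self, sps: Sps) -> None:
        # A repeated SPS with the same id simply overwrites the stored copy.
        self._sps[sps.sps_id] = sps

    def add_pps(self, pps: Pps) -> None:
        self._pps[pps.pps_id] = pps

    def resolve(self, pps_id: int) -> Tuple[Pps, Sps]:
        """Follow the id carried by a VCL NAL unit to its PPS and SPS."""
        pps = self._pps[pps_id]      # KeyError if the PPS never arrived
        sps = self._sps[pps.sps_id]  # KeyError if the SPS never arrived
        return pps, sps


if __name__ == "__main__":
    store = ParameterSetStore()
    store.add_sps(Sps(sps_id=0, width=1920, height=1080, bit_depth=8))
    store.add_pps(Pps(pps_id=3, sps_id=0, chroma_qp_offset=-2))
    print(store.resolve(3))
```

Because the store is keyed by identifier, it does not matter whether the parameter sets arrived in-band or through a separate channel, as long as they are present before the first VCL NAL unit that refers to them.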

    In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself.

    == Access Units ==

    A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture.

    The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present.

    Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to

  • Seam carving

    Seam carving (or liquid rescaling) is an algorithm for content-aware image resizing, developed by Shai Avidan, of Mitsubishi Electric Research Laboratories (MERL), and Ariel Shamir, of the Interdisciplinary Center and MERL. It functions by establishing a number of seams (paths of least importance) in an image and then automatically removing seams to reduce image size or inserting seams to extend it. Seam carving also allows manually defining areas in which pixels may not be modified, and features the ability to remove whole objects from photographs.

    The purpose of the algorithm is image retargeting, which is the problem of displaying images without distortion on media of various sizes (cell phones, projection screens) using document standards, like HTML, that already support dynamic changes in page layout and text but not images. Image retargeting was invented by Vidya Setlur, Saeko Takagi, Ramesh Raskar, Michael Gleicher and Bruce Gooch in 2005. The work by Setlur et al. won the 10-year impact award in 2015.

    == Seams ==

    Seams can be either vertical or horizontal. A vertical seam is a path of pixels connected from top to bottom in an image with one pixel in each row. A horizontal seam is similar, except that the connection runs from left to right. The importance/energy function values a pixel by measuring its contrast with its neighbor pixels.

    == Process ==

    The below example describes the process of seam carving: The seams to remove depend only on the dimension (height or width) one wants to shrink. It is also possible to invert step 4 so the algorithm enlarges in one dimension by copying a low-energy seam and averaging its pixels with its neighbors.

    === Computing seams ===

    Computing a seam consists of finding a path of minimum energy cost from one end of the image to another. This can be done via Dijkstra's algorithm, dynamic programming, a greedy algorithm or graph cuts, among others.
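
Of the options just listed, dynamic programming is the easiest to sketch. The Python example below works on a grayscale image stored as a list of rows and assumes a simple energy function (absolute horizontal plus vertical difference between neighbours, with clamping at the borders); that particular energy function and all function names are illustrative choices, not the ones prescribed by the original paper.

```python
def energy_map(gray):
    """Energy of each pixel as its contrast with its neighbours.

    Assumption: energy = |right - left| + |down - up|, clamped at the borders.
    """
    h, w = len(gray), len(gray[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = gray[y][max(x - 1, 0)], gray[y][min(x + 1, w - 1)]
            up, down = gray[max(y - 1, 0)][x], gray[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row forming a minimum-energy vertical seam."""
    h, w = len(energy), len(energy[0])
    # cost[y][x] = lowest cumulative energy of any seam ending at (y, x).
    cost = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest pixel in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Drop the seam pixel from every row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]


if __name__ == "__main__":
    gray = [[10, 10, 200, 10], [10, 10, 200, 10], [10, 10, 200, 10]]
    seam = find_vertical_seam(energy_map(gray))
    print(seam, remove_vertical_seam(gray, seam))
```

To shrink an image by several columns, the energy map would be recomputed after each removal and the three functions called again.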

    ==== Dynamic programming ====

    Dynamic programming is a programming method that stores the results of sub-calculations in order to simplify calculating a more complex result. Dynamic programming can be used to compute seams. When computing a vertical seam (path) of lowest energy, for each pixel in a row we compute the energy of the current pixel plus the lowest cumulative energy among the three possible pixels above it.

    The images below depict a DP process to compute one optimal seam. Each square represents a pixel, with the top-left value in red representing the energy value of that pixel. The value in black represents the cumulative sum of energies leading up to and including that pixel.

    The energy calculation is trivially parallelized for simple functions. The calculation of the DP array can also be parallelized with some interprocess communication. However, the problem of making multiple seams at the same time is harder for two reasons: the energy needs to be regenerated after each removal for correctness, and simply tracing back multiple seams can produce overlaps. Avidan 2007 computes all seams by removing each seam iteratively and storing an "index map" to record all the seams generated. The map holds an "nth seam" number for each pixel in the image, and can be used later for size adjustment.

    If one ignores both issues, however, a greedy approximation for parallel seam carving is possible. To do so, one starts with the minimum-energy pixel at one end and keeps choosing the minimum-energy path to the other end. The used pixels are marked so that they are not picked again. Local seams can also be computed for smaller parts of the image in parallel for a good approximation.

    == Issues ==

    The algorithm may need user-provided information to reduce errors. This can consist of painting the regions which are to be preserved. With human faces it is possible to use face detection.

    Sometimes the algorithm, by removing a low-energy seam, may end up inadvertently creating a seam of higher energy. The solution to this is to simulate the removal of a seam and then check the energy delta to see whether the energy increases (forward energy). If it does, other seams are preferred instead.

    == Implementations ==

    Adobe Systems acquired a non-exclusive license to seam carving technology from MERL, and implemented it as a feature in Photoshop CS4, where it is called Content Aware Scaling. As the license is non-exclusive, other popular computer graphics applications (e.g. GIMP, digiKam, and ImageMagick) as well as some stand-alone programs (e.g. iResizer) also have implementations of this technique, some of which are released as free and open source software. There also exists an implementation for webpages.

    == Improvements and extensions ==

      • Better energy function and application to video by introducing 2D (time+1D) seams.
      • Faster implementation on GPU.
      • Application of this forward energy function to static images.
      • Multi-operator: combine with cropping and scaling.
      • Much faster removal of multiple seams.
      • Removing seams through neural deformation fields to extend to continuous domains like 3D scenes.

    A 2010 review of eight image retargeting methods found that seam carving produced output that was ranked among the worst of the tested algorithms. It was, however, a part of one of the highest-ranking algorithms: the multi-operator extension mentioned above (combined with cropping and scaling).

  • Haskins Laboratories

    Haskins Laboratories, Inc. is an independent research laboratory, founded in 1935 and located in New Haven, Connecticut since 1970. Many current Haskins researchers are affiliated with Yale University's Child Study Center and/or the University of Connecticut. Haskins is a multidisciplinary and international community of researchers who conduct basic research on spoken and written language and global literacy. A guiding perspective of their research has been to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and developing an early working prototype of a reading machine for the blind to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system.

    == Research tools and facilities ==

    Haskins Laboratories is equipped, in-house, with a comprehensive suite of tools and capabilities to advance its mission of research into language and literacy. As of 2014, these included:

      • Anechoic chamber
      • Electroencephalography
          • BioSemi 264 electrode, 24 bit Active Two System
          • EGI 128 electrode, Geodesic EEG System 300
      • Electromagnetic articulography (EMMA)
          • Carstens AG501
          • NDI WAVE
      • Eye tracking: HL is equipped with 3 SR Research eye-trackers.
          • 2 Model Eyelink 1000 systems.
          • 1 Model Eyelink 1000plus system.
      • Magnetic resonance imaging: Haskins has access to MRI scanners through agreements with the University of Connecticut and the Yale School of Medicine. On-site, HL has a Linux computer cluster dedicated to analysis of MRI data.
      • Motion capture: HL is equipped with a Vicon motion capture system with one Basler high-speed digital camera, six Vicon MX T-20 cameras and a Vicon MX Giganet for synching camera data and connecting cameras to the data capture computer.
      • Near infrared spectroscopy: HL has a TechEn CW6 8x8 system (four emitters; eight detectors).
      • Ultrasound sonogram

    == History ==

    Many researchers have contributed to scientific breakthroughs at Haskins Laboratories since its founding. All of them are indebted to the pioneering work and leadership of Caryl Parker Haskins, Franklin S. Cooper, Alvin Liberman, Seymour Hutner and Luigi Provasoli. The history presented here focuses on the research program of the division of Haskins Laboratories that, since the 1940s, has been best known for its work in the areas of speech, language, and reading.

    === 1930s ===

      • Caryl Haskins and Franklin S. Cooper established Haskins Laboratories in 1935. It was originally affiliated with Harvard University, MIT, and Union College in Schenectady, NY. Caryl Haskins conducted research in microbiology, radiation physics, and other fields in Cambridge, MA and Schenectady.
      • In 1939 Haskins Laboratories moved its center to New York City. Seymour Hutner joined the staff to set up a research program in microbiology, genetics, and nutrition. The descendant of the program led by Hutner eventually became a department of Pace University in New York. The two identically named organizations are no longer formally affiliated.

    === 1940s ===

      • The U.S. Office of Scientific Research and Development, under Vannevar Bush, asked Haskins Laboratories to evaluate and develop technologies for assisting blinded World War II veterans. Experimental psychologist Alvin Liberman joined Haskins Laboratories to assist in developing a "sound alphabet" to represent the letters in a text for use in a reading machine for the blind.
      • Luigi Provasoli joined Haskins Laboratories to set up a research program in marine biology. The program in marine biology moved to Yale University in 1970 and disbanded with Provasoli's retirement in 1978.

    === 1950s ===

      • Franklin S. Cooper invented the pattern playback, a machine that converts pictures of the acoustic patterns of speech back into sound. With this device, Alvin Liberman, Cooper, and Pierre Delattre (later joined by Katherine Safford Harris, Leigh Lisker, Arthur Abramson, and others) discovered the acoustic cues for the perception of phonetic segments (consonants and vowels).
      • Liberman and colleagues proposed a motor theory of speech perception to resolve the acoustic complexity: they hypothesized that we perceive speech by tapping into a biological specialization, a speech module, that contains knowledge of the acoustic consequences of articulation.
      • Liberman, aided by Frances Ingemann and others, organized the results of the work on speech cues into a groundbreaking set of rules for speech synthesis by the Pattern Playback.

    === 1960s ===

      • Franklin S. Cooper and Katherine Safford Harris, working with Peter MacNeilage, were the first researchers in the U.S. to use electromyographic techniques, pioneered at the University of Tokyo, to study the neuromuscular organization of speech.
      • Leigh Lisker and Arthur Abramson looked for simplification at the level of articulatory action in the voicing of certain contrasting consonants. They showed that many acoustic properties of voicing contrasts arise from variations in voice onset time, the relative phasing of the onset of vocal cord vibration and the end of a consonant. Their work has been widely replicated and elaborated, here and abroad, over the following decades.
      • Donald Shankweiler and Michael Studdert-Kennedy used a dichotic listening technique (presenting different nonsense syllables simultaneously to opposite ears) to demonstrate the dissociation of phonetic (speech) and auditory (nonspeech) perception by finding that phonetic structure devoid of meaning is an integral part of language, typically processed in the left cerebral hemisphere.
      • Liberman, Cooper, Shankweiler, and Studdert-Kennedy summarized and interpreted fifteen years of research in "Perception of the Speech Code", still among the most cited papers in the speech literature. It set the agenda for many years of research at Haskins and elsewhere by describing speech as a code in which speakers overlap (or coarticulate) segments to form syllables.
      • Researchers at Haskins connected their first computer to a speech synthesizer designed by Haskins Laboratories' engineers. Ignatius Mattingly, with British collaborators John N. Holmes and J.N. Shearme, adapted the Pattern Playback rules to write the first computer program for synthesizing continuous speech from a phonetically spelled input. A further step toward a reading machine for the blind combined Mattingly's program with an automatic look-up procedure for converting alphabetic text into strings of phonetic symbols.

    === 1970s ===

      • In 1970, Haskins Laboratories moved to New Haven, Connecticut, and entered into affiliation agreements with Yale University and the University of Connecticut; Haskins remains fully independent of both Yale and UConn, administratively and financially. The lab's original location in New Haven, at 270 Crown Street (from 1970 to 2005), was leased from Yale University.
      • Isabelle Liberman, Donald Shankweiler, and Alvin Liberman teamed up with Ignatius Mattingly to study the relationship between speech perception and reading, a topic implicit in Haskins Laboratories' research program since its inception. They developed the concept of phonemic awareness, the knowledge that would-be readers must be aware of the phonemic structure of their language in order to be able to read. Leonard Katz related the work to contemporary cognitive theory and provided expertise in experimental design and data analysis. Under the broad rubric of the "alphabetic principle", this is the core of the lab's present program of reading pedagogy.
      • Patrick Nye joined Haskins Laboratories to lead a team working on the reading machine for the blind. The project culminated when the addition of an optical character recognizer allowed investigators to assemble the first automatic text-to-speech reading machine. By the end of the decade this technology had advanced to the point where commercial concerns assumed the task of designing and manufacturing reading machines for the blind.
      • In 1973, Franklin S. Cooper was selected to form a panel of six experts charged with investigating the famous 18-minute gap in the White House office tapes of President Richard Nixon related to the Watergate scandal.
      • Building on earlier work, Philip Rubin developed the sinewave synthesis program, which was then used by Robert Remez, Rubin, and colleagues to show that listeners can perceive continuous speech without traditional speech cues from a pattern of sinewaves that track the changing resonances of the vocal tract. This paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space.
      • Philip Rubin and colleagues developed Paul Mermelstein's anatomically simplified vocal tract model, originally worked on at Bell Laboratories, into the first articulatory synthesizer that can be controlled in a phy

    Read more →
  • NeoPaint

    NeoPaint

    NeoPaint is a raster graphics editor for Windows and MS-DOS. It supports several file formats including JPEG, GIF, BMP, PNG, and TIFF. The developer, NeoSoft, advertises NeoPaint as "being simple enough for use by children while remaining powerful enough for the purposes of advanced image editing". The first version, NeoPaint 1.0, was released in 1992 on floppy disks. It supported video modes ranging from 640x350 to 1024x768 and multiple fonts. NeoPaint 2.2 came out for MS-DOS 3.1 in 1993, with support for 2, 16, or 256 color images in Hercules, EGA, VGA, and Super VGA modes. NeoPaint 3.1 was released in 1995, supporting 24-bit images and formats like PCX, TIFF and BMP. NeoPaint 3.2 was released in 1996. An updated version, NeoPaint 3.2a, supported the GIF file format. NeoPaint 3.2d was released in 1998. A Windows 95 version named NeoPaint for Windows v4.0 was released in 1999, supporting the PNG file format. On September 1, 2018 the program was rebranded as PixelNEO, becoming one of the VisualNEO software products. Formats such as JPEG 2000, ICO, CUR, PSD and RAW are supported.

    Read more →
  • Bump (application)

    Bump (application)

    Bump was an iOS and Android mobile app that enabled smartphone users to transfer contact information, photos and files between devices. In 2011, it was #8 on Apple's list of all-time most popular free iPhone apps, and by February 2013 it had been downloaded 125 million times. Its developer, Bump Technologies, shut down the service and discontinued the app on January 31, 2014, after being acquired by Google for Google Photos and Android Camera. == Features == Bump sent contact information, photos and files to another device over the internet. Before activating the transfer, each user confirmed what they wanted to send to the other user. To initiate a transfer, two people physically bumped their phones together. A screen appeared on both users' smartphone displays, allowing them to confirm what they wanted to send to each other. When two users bumped their phones, software on the phones sent a variety of sensor data to an algorithm running on Bump servers, including the location of the phone, accelerometer readings, IP address, and other sensor readings. The algorithm figured out which two phones felt the same physical bump and then transferred the information between those phones. Bump did not use Near Field Communication. With the February 2012 release of Bump 3.0 for iOS, the company streamlined the app to focus on its most frequently used features: contact and photo sharing. Bump 3.0 for Android maintained the features eliminated from the iOS version but moved them behind swipeable layers. In May 2012, a Bump update enabled users to transfer photos from their phone to their computer via a web service. To initiate a transfer, the user went to the Bump website on their computer and bumped the smartphone on the computer keyboard's space bar. By December 2012, various Bump updates for iOS and Android had added the ability to share video, audio, and any other files. Users swiped to access those features. 
In February 2013, an update to the Bump iOS and Android apps enabled users to transfer photos, videos, contacts and other files from a computer to a smartphone and vice versa via a web service. To perform the transfer, users went to the Bump website on their computer and bumped the smartphone on the computer keyboard's space bar. == History == The underlying idea of a synchronous gesture, such as bumping two devices together to transfer content or pair them, was first conceived by Ken Hinckley of Microsoft Research in 2003. This idea was presented at a user interface and technology conference that same year. The paper proposed the use of accelerometers and a bumping gesture of two devices to enable communication, screen sharing and content transfer between them. Similar to this original concept, the idea for the Bump app was conceived by David Lieb, a former employee of Texas Instruments, while he was attending the University of Chicago Booth School of Business for his MBA. While going through the orientation and meeting process of business school, he became frustrated by constantly entering contact information into his iPhone and felt that the process could be improved. His fellow Texas Instruments employees Andy Huibers and Jake Mintz, the latter also a classmate of Lieb's in the University of Chicago's MBA program, joined Lieb to form Bump Technologies. Bump Technologies launched in 2008 and was located in Mountain View, CA. Early funding for the project was provided by startup incubator Y Combinator, Sequoia Capital and other angel investors. It gained attention at the CTIA international wireless conference, due to its accessibility and novelty factor. In October 2009, Bump received $3.4m in Series A funding, followed in January 2011 by a $16m Series B financing round led by Andreessen Horowitz. Silicon Valley venture capitalist Marc Andreessen sat on the company's board. 
The Bump app debuted in the Apple iOS App Store in March 2009 and was “one of the apps that helped to define the iPhone” (Harry McCracken, Technologizer). It soon became the billionth download on Apple's App Store. An Android version launched in November 2009. By the time Bump 3.0 for iOS was released in February 2012, the app had been installed 77 million times, with users sharing more than 2 million photos daily. As of February 2013, there had been 125 million Bump app downloads. == Other apps created by Bump Technologies == Bump Technologies worked with PayPal in March 2010 to create a PayPal iPhone application. The application, which allows two users to automatically activate an Internet transfer of money between their accounts, found widespread adoption. A similar version was released for Android in August 2010. The Bump capability in PayPal's apps was removed in March 2012. At that time, Bump Technologies released Bump Pay, an iOS app that lets users transfer money via PayPal by physically bumping two smartphones together. The tool was originally created for the Bump team to use when splitting up restaurant bills. The payment feature was not added to the Bump app because the company “wanted to make it as simple as possible so people understand how this works,” Lieb told ABC News. Bump Pay was the first app from the company's Bump Labs initiative. A goal of Bump Labs is to test new app ideas that may not fit within the main Bump app. ING Direct added a feature to its iPhone app in 2011 that lets users transfer money to each other using Bump's technology. The feature was later added to its Android app, now called Capital One 360. In July 2012, Bump Technologies released Flock, an iPhone photo sharing app. An Android version was released in December 2012. 
Using geolocation data embedded in photos and a user's Facebook connections, Flock finds pictures the user takes while out with friends and family and puts everyone's photos from that event into a single shared album. Users receive a push notification after the event, asking if they want to share their photos with friends who were there in the moment. The app will also scan previous photos in the iPhone camera roll and uncover photos that have yet to be shared. If location services were enabled at the time a photo was taken, Flock allows users to create an album of photos from the past with the friends who were there with them. == Acquisition by Google == On September 16, 2013, Bump Technologies announced that it had been acquired by Google. On December 31, 2013, they broke the news that both Bump and Flock would be discontinued so that the team could focus on new projects at Google. The apps were removed from the App Store and Google Play on January 31, 2014. The company subsequently deleted all user data and shut down their servers, thus rendering existing installations of the apps inoperable.

    Read more →
  • Aphelion (software)

    Aphelion (software)

    The Aphelion Imaging Software Suite is a software suite that includes three base products (Aphelion Lab, Aphelion Dev, and Aphelion SDK) for addressing image processing and image analysis applications. The suite also includes a set of extension programs to implement specific vertical applications that benefit from imaging techniques. The Aphelion software products can be used to prototype and deploy applications, or can be integrated, in whole or in part, into a user's system as processing and visualization libraries whose components are available as both DLLs and .Net components. == History and evolution == The development of Aphelion started in 1995 as a joint project of a French company, ADCIS S.A., and an American company, Amerinex Applied Imaging, Inc. (AAI). Aphelion's image processing and analysis functions were built from operators available in the KBVision software developed and sold by Amerinex's predecessor, Amerinex Artificial Intelligence Inc. In the 1990s, the XLim software library was developed at the Center of Mathematical Morphology of Mines ParisTech, and both companies carried out its development tasks. The first version of Aphelion was completed and released in April 1996. Successive versions were released before the first official stable release, which was presented in December 1996 at the Photonics East conference in Boston and in January 1997 at the Solutions Vision show in Paris, where it competed with Stemmer Imaging's CVB imaging toolbox. In 1998, version 2.3 of Aphelion for Windows 98 was released, and its user base was growing in both France and the United States. Version 3.0, totally rewritten to take advantage of Microsoft's then-recent ActiveX technology, was officially released in 2000. 
It also became available as a « Developer » version, for rapid prototyping of applications using its intuitive GUI and the macro recording capability, and a « Core » version, including the full library as a set of ActiveX components to be used by software developers, integrators and original equipment manufacturers (OEM). As AAI turned its focus to security, in 2001, ADCIS took the lead on developing Aphelion. AAI focused on millimeter wave scanners for concealed weapon detection at airports, and eventually merged with Millimetrics to become Millivision. In 2004, ADCIS specified version 4.0 of Aphelion. The set of image processing/analysis functions was rewritten one more time to be compatible with the .NET technology and the emergence of 64-bit architecture PCs. In addition, the GUI was redesigned to address two usage types: a semi-automatic use where the user is guided through the different steps of functions, and a fully automatic use where the expert user can quickly invoke imaging functions. Its first release was presented at the IPOT exhibition in Birmingham, UK the same year. During the Vision Show in Paris in October 2008, the new Aphelion Lab product was launched for users who are not specialists in image processing. It is easier to use and includes fewer image processing functions. It was then included in the Aphelion Image Processing Suite, consisting of Aphelion Dev (replacing Aphelion Developer), Aphelion Lab, Aphelion SDK (replacing Aphelion Core), and a set of extensions. ADCIS is still working on the suite, and updated versions with new extensions and functionalities continually become available from the websites of both companies. In 2015, support was added for very large images and scan microscope images (virtual slides combined into a very large JPEG 2000 image) for high throughput imaging, and new specific extensions were also added. 
In late 2015, ADCIS announced Aphelion's port for tablets and smartphones, for vertical applications. The name "Aphelion" comes from the astronomical term of the same name, meaning the point in the orbit of a planet around the Sun where it lies farthest from the Sun; the term is applied in a metaphorical sense. Unix was the operating system used on scientific workstations in the 1990s, such as those manufactured by market leader Sun Microsystems, from which the Windows-based Aphelion suite was quite removed. == Description == Aphelion is a software suite to be used for image processing and image analysis. It supports 2D and 3D, monochrome, color, and multi-band images. It is developed by ADCIS, a French software house located in Saint-Contest, Calvados, Normandy. Aphelion is widely used in the scientific/industry community to solve basic and complex imaging applications. First, the imaging application is quickly developed from the Graphical User Interface, involving a set of functions that can be automatically recorded into a macro command. The macro languages available in Aphelion (i.e. BasicScript, Python, and C#) help to process batches of images, and prompt the user if needed for specific parameters that are applied to the imaging functions. All Aphelion image processing functions are written in C++, and the Aphelion user interface is written in C#. C++ functions can be called from the C# language thanks to the use of dedicated wrappers. The main principle of image processing is to automatically process pixels of a digital image, then extract one or more objects of interest (e.g. cells in the field of biology, inclusions in the field of material science) and compute one or more measurements on those objects to quantify the image and generate a verdict (good image, image with defects, cancerous cells). 
In other words, starting from an image, pixels are processed by a set of successive functions or operators until only measurements are computed and used as the input of a third-party system or classification software that will classify the objects of interest extracted during the imaging process. An acquisition system such as a digital camera, a video camera, an optical or electron microscope, a medical scanner, or a smartphone can be used to capture images. The set of values or pixels can be processed as a 1D image (1D signal), a 2D image (array of pixel values corresponding to a monochrome or color image), or a 3D image (array of voxels in 3D space) displayed using volume rendering or as surfaces using 3D rendering. A 2D color image is made of three-value pixels (typically Red, Green, and Blue information or another color space), and a 3D image is made of monochrome, color (indexed colors are often used), multispectral, or hyperspectral data. When dealing with videos, an additional band is added corresponding to temporal information. The Aphelion Software Suite includes three base products, and a set of optional extensions for specific applications: Aphelion Lab: Entry-level package for non-experts in image processing. It helps to quickly segment an image in a semi-automatic or manual way, and to compute a set of measurements on the objects of interest extracted during the segmentation process. A set of wizards guides the user from image acquisition to report generation. Aphelion Dev: Full imaging environment including over 450 functions to develop and deploy an application that involves image processing and analysis. It also includes a set of macro-command languages to automate any application to be invoked from the user interface. It also helps to run the imaging algorithm on multiple images that are stored on disk, available on the network, or captured by an acquisition device. 
Aphelion libraries for image processing and visualization are provided in Aphelion Dev as DLLs and .Net components. Aphelion SDK: A set of libraries to develop a stand-alone application with a custom interface based on the Aphelion libraries. This software development kit includes display, processing and analysis functions that can be used by software developers and OEMs. It is provided as DLLs and .Net components. The stand-alone application is typically developed in C# on one computer, and then deployed on multiple PCs and systems. A set of optional extensions can be added to the « Aphelion Dev » product, depending on the application. An evaluation version of Aphelion can be run on a PC for 30 days. A permanent version of Aphelion is available based on a perpetual license. Upgrades are available through a maintenance agreement based on a yearly fee. Technical support is provided by the engineers who are developing the product. The goal of image processing is usually to extract object(s) of interest in an image, and then to classify them based on some characteristics such as shape, density, position, etc. Using Aphelion, this goal is achieved by performing the following tasks: Load an image from disk or acquire an image using an acquisition device. Enhance the image by removing noise or modifying its contrast. Segment the image by extracting objects of interest to be measured and analyzed. Typically, for simple applications, a threshold is performed to generate a binary image. Then, morphological operators are applied to clean the image and only keep obj

    Read more →
  • Onshape

    Onshape

    Onshape is a computer-aided design (CAD) software system, delivered over the Internet via a software as a service (SaaS) model. It makes extensive use of cloud computing, with compute-intensive processing and rendering performed on Internet-based servers, and users are able to interact with the system via a web browser or the iOS and Android apps. As a SaaS system, Onshape upgrades are released directly to the web interface, and the software does not require maintenance by the user. Onshape allows teams to collaborate on a single shared design, the same way multiple writers can work together editing a shared document via cloud services. It is primarily focused on mechanical CAD (MCAD) and is used for product and machinery design across many industries, including consumer electronics, mechanical machinery, medical devices, 3D printing, machine parts, and industrial equipment. As of 2025, Onshape is popularly used as a CAD suite for the FIRST Robotics Competition (FRC) alongside the MKCad application available in the Onshape App Store. == Company history == Onshape was developed by a company with the same name. Founded in 2012, Onshape was based in Cambridge, Massachusetts (USA), with offices in Singapore and Pune, India. Its leadership team includes several engineers and executives who originated from SolidWorks, a popular 3D CAD program that runs on Microsoft Windows. Onshape’s co-founders include two former SolidWorks CEOs, Jon Hirschtick and John McEleney. In November 2012, former SolidWorks CEOs Jon Hirschtick and John McEleney led six co-founders launching Belmont Technology, a placeholder name that was later changed to Onshape. The company’s first round of funding was $9 million from North Bridge Venture Partners and Commonwealth Capital. In March 2015, Onshape released the public beta version of its cloud CAD software, after pre-production testing with more than a thousand CAD professionals in 52 countries. 
Included in the beta launch was Onshape for iPhone. In August 2015, the company released its Onshape for Android app. In December 2015, Onshape launched its full commercial release. The company also launched the Onshape App Store, offering CAM, simulation, rendering and other cloud-based engineering tools. The Onshape App Store was launched with 24 developer partners. In April 2016, Onshape introduced its Education Plan, with a free version of Onshape Professional geared for college students and educators. In May 2016, Onshape released FeatureScript, a new open source (MIT licensed) programming language for creating and customizing CAD features. In October 2019, Onshape agreed to be acquired by PTC. The acquisition closed in November 2019 for $470 million. In February 2024, Onshape released iOS support for the Apple Vision Pro, allowing for real-world applications of CAD models and prototypes. In January 2025, Onshape released the CAM studio, allowing users to generate G-code for up to 5-axis simultaneous milling. == Funding == Onshape was a venture-backed company with investments from firms including Andreessen Horowitz, Commonwealth Capital Ventures, New Enterprise Associates (NEA) and North Bridge Venture Partners. Total venture funding amounted to $169 million. 
== Supported file formats == === Modelling === ==== Importing ==== As of May 2025, Onshape supported importing (opening) the following common CAD file formats: Parasolid X_T (preferred); STEP (ISO 10303); ISO JT (ISO 14306); ACIS; IGES; CATIA v4, v5, v6; Autodesk Inventor (Part .IPT, Assembly .IAM, Presentation .IPN, Drawing .IDW); Pro/ENGINEER, Creo; Rhinoceros 3D (.3dm); .STL; .OBJ; SolidWorks file formats; Siemens NX file formats; and drawings (.DXF/.DWG). ==== Exporting ==== Onshape supports exporting to the following formats: STEP (ISO 10303); Parasolid XT; ACIS; IGES; SolidWorks file formats; .STL; Rhinoceros 3D (.3dm); and Collada (XML-spec based textual file). === Drawing === Ordinary engineering or technical drawings can be exported as .PDF files. === Other Formats === In addition to CAD file formats, Onshape supports importing some non-CAD file formats for viewing and referencing. === Assembly === Assemblies can be imported from and exported to: STEP (ISO 10303); Parasolid XT; ACIS; Pro/ENGINEER, Creo; ISO JT; Rhinoceros 3D (.3dm); Siemens NX file formats; and SolidWorks Pack and Go zip file. Formats to which assemblies can only be exported are: IGES; .STL; and Collada (XML-spec based textual file).

    Read more →
  • Vismon

    Vismon

    Vismon was the Bell Labs system which displayed authors' faces on one of their internal e-mail systems. The name was a pun on the sysmon program used at Bell to show the load on computer systems. It can also be interpreted as "visual monitor". The system inspired Rich Burridge to develop the similar but more widespread faces system, which spread with Unix distributions in the 1980s. This in turn inspired Steve Kinzler to develop the Picons, or personal icons, which have the goal of offering symbols and other images, as well as faces, to represent individuals and institutions in email messages. Other systems such as the faces available on the LAN email functions of the NeXTSTEP platform also seem to have been influenced by the original Vismon capabilities. The faces program in Plan 9 is the direct descendant of this system. Vismon was the work of Rob Pike and Dave Presotto. It was based on some early experiments by Luca Cardelli. Many other scientists and engineers of the Computing Science Research Center of the Murray Hill facility were also involved. All had been spurred by the introduction in 1983 of the new Blit graphics terminal developed by Pike and Bart Locanthi and marketed by Teletype Corporation of Skokie, Illinois as the DMD 5620. Pike was eager, along with his colleagues, to exploit the new graphic capabilities. Pike and company went around their Center, convincing everybody, from directors and administrative assistants to engineers and scientists, to pose as they got out a 4×5 view camera with a Polaroid back and took black-and-white photos (Polaroid type 52) of their faces. Their efforts yielded nearly 100 faces, which they digitised with a scanner from graphics colleagues. They wrote several programs to transform the faces, store them and serve them on several machines at the lab. As time went by, they added faces from outside their Center and outside Bell Labs. 
This database also led to the pico image editor (originally named zunk) which was used for image transformations, many of them with colleagues as the preferred target. The first programs built around vismon were used to announce incoming mail in a dedicated window, using the 48 by 48 pixel faces. Later on the faces were also used to decorate line printer banners.

    Read more →
  • KoalaPad

    KoalaPad

    The KoalaPad is a graphics tablet, released in 1983 by US company Koala Technologies Corporation, for the Apple II, TRS-80 Color Computer (as the TRS-80 Touch Pad), Atari 8-bit computers, Commodore 64, and IBM PC compatibles. Originally designed by Dr. David Thornburg as a low-cost computer drawing tool for schools, the KoalaPad and the bundled drawing program, KoalaPainter, were popular with home users as well. KoalaPainter was called KoalaPaint in some versions for the Apple II, and PC Design for the IBM PC. A program called Graphics Exhibitor was included for creating slideshow presentations from KoalaPainter drawings. == Description == The pad was four inches square (i.e. roughly 10×10 cm) and mounted on a slightly inclined base with the back of the pad higher than the front. At the top, "behind" the pad, were two buttons. The pad hooked into the computer using the analog signals of the joystick ports (the so-called paddle inputs), which meant that it had a low resolution and tended to jostle the cursor if moved during use. As an alternative to the drawing stylus, the pad could just as easily be operated by the user's fingers for tasks that demanded less precision, such as selecting between menu items (thus using the pad as a kind of "indirect touch screen"). The top-mounted buttons tended to be somewhat frustrating to use, as the user had to "reach around" the stylus to push the buttons in order to start or stop drawing. A similar tablet from Atari, the Atari CX77 Touch Tablet, addressed this with a built-in button on the stylus, which some enterprising users adapted for use with their KoalaPad. == KoalaPainter == The pad shipped with a simple bitmap graphics editor developed by Audio Light called KoalaPainter, PC Design or Micro Illustrator depending on the target machine (see release history). Although bundled with the pad, KoalaPainter could also be operated using an ordinary digital joystick. 
One unique feature of the program, for its time, was that it held two pictures in the computer's memory, allowing the user to flip from one to the other—a function commonly used in order to study the differences between an original and a modified picture, and to copy and paste between two different pictures. Some third-party bitmap editors could also be used with the KoalaPad, such as Broderbund's Dazzle Draw for the Apple II. === Release history === KoalaPainter for Commodore 64 (1983) and Atari 8-bit computers (1983) PC Design for the IBM PC (1983) Micro Illustrator for the Apple II (1983), Atari 8-bit computers (1983) and Commodore Plus/4 (1984) KoalaPainter II for Commodore 64 (1984) === Reception === Ahoy! called KoalaPainter "a very powerful and effective color drawing package", and concluded that it and the KoalaPad were "excellent in ease of use, a fine choice for a beginner as well as young children". BYTE's reviewer stated in December 1984 that he made far fewer errors when using an Apple Mouse with MousePaint than with a KoalaPad and its software. He found that MousePaint was easier to use and more efficient, predicting that the mouse would receive more software support than the pad. Cassie Stahl in InfoWorld's Essential Guide to Atari Computers praised the tablet and its documentation, rating it "Excellent" among all categories and stating that "Playing with the KoalaPad becomes addictive. It does everything it claims to, and it does it well". She also liked Micro Illustrator, rating it "Excellent" except for "Good" for Performance. While criticizing the limited erase function, Stahl reported an undocumented feature enabling exporting pictures to other software. 
=== File format === The Commodore 64 version of KoalaPainter used a fairly simple file format corresponding directly to the way bitmapped graphics are handled on the computer: A two-byte load address, followed immediately by 8,000 bytes of raw bitmap data, 1,000 bytes of raw "Video Matrix" data, 1,000 bytes of raw "Color RAM" data, and a one-byte Background Color field. == KoalaWare == Koala Technologies offered more software beyond the bundled KoalaPainter and Graphics Exhibitor for use with the pad. Among these applications, marketed under the moniker KoalaWare (like KoalaPainter itself), was educational software for use with customized keypads and overlays, such as spelling tools, music programs, and mathematics instruction software, as well as software for "translating" graphical designs into Logo programs.
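The file layout described above can be illustrated with a short Python sketch. This is only a minimal illustration, not KoalaPainter's own code: the names `KoalaImage` and `parse_koala` are hypothetical, the load address is assumed to be stored little-endian (as is usual for Commodore 64 program files), and the sample data is synthetic with arbitrary values.

```python
import struct
from typing import NamedTuple

# Field sizes as given in the file format description.
BITMAP_SIZE = 8000   # raw bitmap data
MATRIX_SIZE = 1000   # raw "Video Matrix" data
COLOR_SIZE = 1000    # raw "Color RAM" data
FILE_SIZE = 2 + BITMAP_SIZE + MATRIX_SIZE + COLOR_SIZE + 1  # 10,003 bytes


class KoalaImage(NamedTuple):
    """Hypothetical container for the five fields of the file."""
    load_address: int
    bitmap: bytes
    video_matrix: bytes
    color_ram: bytes
    background: int


def parse_koala(data: bytes) -> KoalaImage:
    """Split a C64 KoalaPainter file into its five fields."""
    if len(data) != FILE_SIZE:
        raise ValueError(f"expected {FILE_SIZE} bytes, got {len(data)}")
    # Assumption: two-byte load address in little-endian byte order.
    (load_address,) = struct.unpack_from("<H", data, 0)
    offset = 2
    bitmap = data[offset:offset + BITMAP_SIZE]
    offset += BITMAP_SIZE
    video_matrix = data[offset:offset + MATRIX_SIZE]
    offset += MATRIX_SIZE
    color_ram = data[offset:offset + COLOR_SIZE]
    offset += COLOR_SIZE
    background = data[offset]  # final one-byte Background Color field
    return KoalaImage(load_address, bitmap, video_matrix, color_ram, background)


# Synthetic file for demonstration; the values are arbitrary, not real image data.
sample = (
    struct.pack("<H", 0x6000)
    + bytes(BITMAP_SIZE)
    + bytes([0x12]) * MATRIX_SIZE
    + bytes([0x03]) * COLOR_SIZE
    + bytes([0x06])
)
image = parse_koala(sample)
print(hex(image.load_address), len(image.bitmap), image.background)
```

Because every field has a fixed size, the parser needs no header inspection beyond a length check; the field boundaries follow directly from the byte counts listed in the description.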

    Read more →
  • Shape factor (image analysis and microscopy)

    Shape factor (image analysis and microscopy)

    Shape factors are dimensionless quantities used in image analysis and microscopy that numerically describe the shape of a particle, independent of its size. Shape factors are calculated from measured dimensions, such as diameter, chord lengths, area, perimeter, centroid, moments, etc. The dimensions of the particles are usually measured from two-dimensional cross-sections or projections, as in a microscope field, but shape factors also apply to three-dimensional objects. The particles could be the grains in a metallurgical or ceramic microstructure, or the microorganisms in a culture, for example. The dimensionless quantities often represent the degree of deviation from an ideal shape, such as a circle, sphere or equilateral polyhedron. Shape factors are often normalized, that is, the value ranges from zero to one. A shape factor equal to one usually represents an ideal case or maximum symmetry, such as a circle, sphere, square or cube. == Aspect ratio == The most common shape factor is the aspect ratio, a function of the largest diameter and the smallest diameter orthogonal to it: $A_R = \frac{d_{\min}}{d_{\max}}$. The normalized aspect ratio varies from approaching zero for a very elongated particle, such as a grain in a cold-worked metal, to near unity for an equiaxed grain. The reciprocal of the right side of the above equation is also used, such that the AR varies from one to approaching infinity. == Circularity == Another very common shape factor is the circularity (or isoperimetric quotient), a function of the perimeter $P$ and the area $A$: $f_{\text{circ}} = \frac{4\pi A}{P^2}$. The circularity of a circle is 1, and much less than one for a starfish footprint. The reciprocal of the circularity equation is also used, such that $f_{\text{circ}}$ varies from one for a circle to infinity. 
== Elongation shape factor == The less-common elongation shape factor is defined as the square root of the ratio of the two second moments $i_n$ of the particle around its principal axes: $f_{\text{elong}} = \sqrt{\frac{i_2}{i_1}}$. == Compactness shape factor == The compactness shape factor is a function of the polar second moment $i_n$ of a particle and a circle of equal area $A$: $f_{\text{comp}} = \frac{A^2}{2\pi\sqrt{i_1^2 + i_2^2}}$. The $f_{\text{comp}}$ of a circle is one, and much less than one for the cross-section of an I-beam. == Waviness shape factor == The waviness shape factor of the perimeter is a function of the convex portion $P_{\text{cvx}}$ of the perimeter to the total: $f_{\text{wav}} = \frac{P_{\text{cvx}}}{P}$. Some properties of metals and ceramics, such as fracture toughness, have been linked to grain shapes. == An application of shape factors == Greenland, the largest island in the world, has an area of 2,166,086 km²; a coastline (perimeter) of 39,330 km; a north–south length of 2670 km; and an east–west length of 1290 km. The aspect ratio of Greenland is $A_R = \frac{1290}{2670} = 0.483$. The circularity of Greenland is $f_{\text{circ}} = \frac{4\pi(2166086)}{39330^2} = 0.0176$. The aspect ratio agrees with an eyeball estimate on a globe. Such an estimate on a typical flat map, using the Mercator projection, would be less accurate due to the distorted scale at high latitudes. The circularity is deceptively low, due to the fjords that give Greenland a very jagged coastline (see the coastline paradox). A low value of circularity does not necessarily indicate a lack of symmetry, and shape factors are not limited to microscopic objects.
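The Greenland figures above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are illustrative rather than taken from any particular image analysis package, and the inputs are the measurements quoted in the text.

```python
import math


def aspect_ratio(d_min: float, d_max: float) -> float:
    """Normalized aspect ratio: smallest diameter over largest diameter."""
    return d_min / d_max


def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric quotient 4*pi*A / P**2; equals 1 for a circle."""
    return 4.0 * math.pi * area / perimeter ** 2


# Greenland, using the measurements quoted in the text (km and km^2).
greenland_ar = aspect_ratio(1290.0, 2670.0)
greenland_circ = circularity(2_166_086.0, 39_330.0)
print(round(greenland_ar, 3), round(greenland_circ, 4))  # 0.483 0.0176

# Sanity check: a unit circle has area pi and perimeter 2*pi.
unit_circle_circ = circularity(math.pi, 2.0 * math.pi)
```

Both quantities are ratios of like units, so the results are dimensionless and do not depend on whether lengths are given in kilometres or any other unit.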

  • Kolmogorov–Arnold Networks

    Kolmogorov–Arnold Networks

    Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines.
== History ==
KANs were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, and Liu (2015) investigated theoretical applications in optimal transport and image encryption. More recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation.
== Architecture ==
KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given $x=(x_1,x_2,\dots,x_n)$ consisting of $n$ variables, a multivariate continuous function $f(x)$ can be represented as:
$$f(x)=f(x_1,\dots,x_n)=\sum_{q=1}^{2n+1}\Phi_q\left(\sum_{p=1}^{n}\varphi_{q,p}(x_p)\right) \tag{1}$$
This formulation contains two nested summations: an outer and an inner sum. The outer sum $\sum_{q=1}^{2n+1}$ aggregates $2n+1$ terms, each involving a function $\Phi_q:\mathbb{R}\to\mathbb{R}$. The inner sum $\sum_{p=1}^{n}$ computes $n$ terms for each $q$, where each term $\varphi_{q,p}:[0,1]\to\mathbb{R}$ is a continuous function of the single variable $x_p$. The inner continuous functions $\varphi_{q,p}$ are universal, independent of $f$, while the outer functions $\Phi_q$ depend on the specific function $f$ being represented. The representation (1) holds for all multivariate functions $f$ as proved in . If $f$ is continuous, then the outer functions $\Phi_q$ are continuous; if $f$ is discontinuous, then the corresponding $\Phi_q$ are generally discontinuous, while the inner functions $\varphi_{q,p}$ remain the same universal functions. Liu et al. proposed the name KAN.
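To make the nested-sum shape of representation (1) concrete, the sketch below evaluates an expression of that form for a toy case with n = 2. It rewrites the product x₁·x₂ as ((x₁+x₂)² − (x₁−x₂)²)/4, padded with zero terms to reach 2n+1 = 5 outer functions. The inner functions here are hand-picked for this one target and are not the universal inner functions of the theorem; the names `ka_form`, `inner`, and `outer` are assumptions made for the example.

```python
def ka_form(x, inner, outer):
    """Evaluate sum_q Phi_q( sum_p phi_{q,p}(x_p) ).

    inner[q][p] is the univariate function phi_{q,p};
    outer[q] is the univariate function Phi_q.
    """
    total = 0.0
    for Phi_q, row in zip(outer, inner):
        inner_sum = sum(phi(x_p) for phi, x_p in zip(row, x))
        total += Phi_q(inner_sum)
    return total


def identity(t):
    return t


def negate(t):
    return -t


def zero(t):
    return 0.0


# n = 2 inputs, 2n + 1 = 5 outer terms (three of them are zero padding).
inner = [
    [identity, identity],   # x1 + x2
    [identity, negate],     # x1 - x2
    [zero, zero],
    [zero, zero],
    [zero, zero],
]
outer = [
    lambda u: u * u / 4.0,    # +(x1 + x2)^2 / 4
    lambda u: -u * u / 4.0,   # -(x1 - x2)^2 / 4
    zero,
    zero,
    zero,
]

print(ka_form((0.5, 0.25), inner, outer))  # 0.125, i.e. 0.5 * 0.25
```

Every function involved takes a single real argument; the only multivariate operation is addition, which is the structural point the theorem makes.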
A general KAN network consisting of $L$ layers takes $x$ to generate the output as:
$$\mathrm{KAN}(x)=(\Phi^{L-1}\circ\Phi^{L-2}\circ\cdots\circ\Phi^{1}\circ\Phi^{0})\,x \tag{3}$$
Here, $\Phi^{l}$ is the function matrix of the $l$-th KAN layer or a set of pre-activations. Let $i$ denote the neuron of the $l$-th layer and $j$ the neuron of the $(l+1)$-th layer. The activation function $\varphi_{j,i}^{l}$ connects $(l, i)$ to $(l+1, j)$:
$$\varphi_{j,i}^{l},\quad l=0,\dots,L-1,\; i=1,\dots,n_{l},\; j=1,\dots,n_{l+1} \tag{4}$$
where $n_l$ is the number of nodes of the $l$-th layer. Thus, the function matrix $\Phi^{l}$ can be represented as an $n_{l+1}\times n_{l}$ matrix of activations:
$$x^{l+1}=\begin{pmatrix}\varphi_{1,1}^{l}(\cdot)&\varphi_{1,2}^{l}(\cdot)&\cdots&\varphi_{1,n_{l}}^{l}(\cdot)\\\varphi_{2,1}^{l}(\cdot)&\varphi_{2,2}^{l}(\cdot)&\cdots&\varphi_{2,n_{l}}^{l}(\cdot)\\\vdots&\vdots&\ddots&\vdots\\\varphi_{n_{l+1},1}^{l}(\cdot)&\varphi_{n_{l+1},2}^{l}(\cdot)&\cdots&\varphi_{n_{l+1},n_{l}}^{l}(\cdot)\end{pmatrix}x^{l}$$
== Implementations ==
To make the KAN layers optimizable, each activation function is formed by combining a basis function and a spline:
$$\varphi(x)=w_{b}\,b(x)+w_{s}\,\text{spline}(x)$$
where $b(x)$ is the basis function, usually defined as $\mathrm{silu}(x)=x/(1+e^{-x})$, and $w_{b}$ is the base weight matrix.
Also, $w_{s}$ is the spline weight matrix and $\text{spline}(x)$ is the spline function. The spline function can be a sum of B-splines:
$$\text{spline}(x)=\sum_{i}c_{i}B_{i}(x)$$
Many studies have suggested using other polynomial and curve functions instead of B-splines to create new KAN variants.
== Functions used ==
The choice of functional basis strongly influences the performance of KANs. Common function families include:
  • B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations.
  • RBFs (including Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures.
  • Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation.
  • Rational functions: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials.
  • Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning.
  • Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components.
  • Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs.
== Usage ==
In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks:
  • Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws.
  • Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments.
  • Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks.
  • Sensor data processing: KANs have recently been applied to sensor data because they can model complex nonlinear relationships with relatively few parameters and offer improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems.
== Drawbacks ==
KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.
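The edge function from the Implementations section, φ(x) = w_b·b(x) + w_s·spline(x), and the layer sum x_j^{l+1} = Σ_i φ_{j,i}^l(x_i^l) can be sketched in plain Python as follows. This is an illustrative sketch rather than any library's implementation: SiLU is taken as x/(1+e^{-x}), the B-spline basis is evaluated with the standard Cox–de Boor recursion, and the function names, the parameter layout (`edges[j][i]` as a dictionary), and the uniform knot vector are assumptions made for the example.

```python
import math


def silu(x):
    """Basis function b(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def bspline_basis(i, k, knots, x):
    """Cox-de Boor recursion for the i-th B-spline of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = ((x - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, knots, x))
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def edge_function(x, w_b, w_s, coeffs, knots, degree):
    """phi(x) = w_b * silu(x) + w_s * sum_i c_i * B_i(x)."""
    spline = sum(c * bspline_basis(i, degree, knots, x)
                 for i, c in enumerate(coeffs))
    return w_b * silu(x) + w_s * spline


def kan_layer(x, edges):
    """Output node j is the sum over inputs i of phi_{j,i}(x[i]).

    edges[j][i] holds the parameters of phi_{j,i} as keyword arguments.
    """
    return [sum(edge_function(x_i, **params) for x_i, params in zip(x, row))
            for row in edges]


# Toy layer: 2 inputs -> 1 output, quadratic B-splines on uniform knots.
# 7 knots and degree 2 give 7 - 2 - 1 = 4 basis functions per edge.
knots = [-3, -2, -1, 0, 1, 2, 3]
params = {"w_b": 1.0, "w_s": 1.0, "coeffs": [1.0, 1.0, 1.0, 1.0],
          "knots": knots, "degree": 2}
edges = [[params, params]]

print(kan_layer([0.0, 0.5], edges))
```

In a trainable KAN, the coefficients `coeffs` and the weights `w_b` and `w_s` would be the learned parameters of each edge; here they are fixed so that the spline part equals one on the interval [-1, 1).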

  • Automated Lip Reading

    Automated Lip Reading

    Automated Lip Reading (ALR) is a software technology developed by speech recognition expert Frank Hubner. The software analyses a video image of a person talking: it examines the shapes made by the lips and turns them into sounds, which are then compared to a dictionary to find matches to the words being spoken. The technology was used successfully to analyse silent home movie footage of Adolf Hitler taken by Eva Braun at their Bavarian retreat Berghof. The video, with words, was included in a documentary titled "Hitler's Private World" (Revealed Studios, 2006). Source: "New Technology catches Hitler off guard".

  • Oculus Quill

    Oculus Quill

    Quill is a painting and animation software for virtual reality. It runs on Microsoft Windows with Oculus Rift headsets. It is used to create 3D paintings and animated cartoons. Quill was released on November 29, 2016, on the Oculus Store. Theater Elsewhere (formerly Quill Theater), an application for viewing creations made in Quill, was later made available following the release of the Oculus Quest. In September 2021, Facebook (now known as Meta Platforms), the owner of Oculus, sold Quill to its original creator, who continues to develop and support the app.
== Development ==
Quill was originally developed by Oculus Story Studio as an internal tool for the creative needs of the studio's project Dear Angelica, directed by Saschka Unseld along with its art director Wesley Allsbrook.
== Controls ==
The software works on Oculus Rift utilizing its 6DoF motion controllers. Users can paint in 3D space using their hands naturally, and animate those paintings with keyframes. They can also capture videos and photos of their creations.
== Reception ==
Dear Angelica, a VR story fully painted in Quill, was nominated for an Emmy Award in 2017.
