AI Coding Meta

AI Coding Meta — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • U-Net

    U-Net

    U-Net is a convolutional neural network that was developed for image segmentation. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segmentation of a 512 × 512 image takes less than a second on a modern (2015) GPU using the U-Net architecture. The U-Net architecture has also been employed in diffusion models for iterative image denoising. This technology underlies many modern image generation models, such as DALL-E, Midjourney, and Stable Diffusion. U-Net is also being explored for language models. In that setting tokenization is not a separate step, which lets the model learn spelling more easily while concurrently vectorizing (tokenizing) higher-level concepts. == Description == The U-Net architecture stems from the so-called "fully convolutional network". The main idea is to supplement a usual contracting network with successive layers in which pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A subsequent convolutional layer can then learn to assemble a precise output based on this information. One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, and yields a u-shaped architecture. The network only uses the valid part of each convolution and has no fully connected layers. To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important for applying the network to large images, since otherwise the resolution would be limited by the GPU memory. Recently, there has also been interest in receptive-field-based U-Net models for medical image segmentation. 
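The effect of using only the valid part of each convolution, and of mirroring the input to supply missing border context, can be sketched with a little arithmetic. The following is a minimal illustration, not the authors' code. The depth of four pooling steps, two 3 × 3 convolutions per level, and the 572-pixel input are assumptions taken from the commonly cited configuration rather than from the text above, and both function names are hypothetical.

```python
def unet_output_size(input_size, depth=4, convs_per_level=2, kernel=3):
    """Side length of the output of a U-shaped network built from unpadded
    ("valid") convolutions, 2x2 max pooling and 2x upsampling.

    depth, convs_per_level and kernel are illustrative assumptions.
    """
    shrink = convs_per_level * (kernel - 1)  # pixels lost per level
    size = input_size
    # Contracting path: convolve, then halve the resolution.
    for _ in range(depth):
        size -= shrink
        if size <= 0 or size % 2:
            raise ValueError("input size is incompatible with 2x2 pooling")
        size //= 2
    # Convolutions at the bottom of the "U".
    size -= shrink
    # Expansive path: double the resolution, then convolve.
    for _ in range(depth):
        size = size * 2 - shrink
    if size <= 0:
        raise ValueError("input size is too small for this depth")
    return size


def mirror_pad_row(row, pad):
    """Extend a 1-D sequence on both sides by reflecting it about its edges.

    Requires 0 <= pad < len(row). The edge value itself is not repeated.
    """
    left = row[1:pad + 1][::-1]
    right = row[-pad - 1:-1][::-1]
    return left + row + right


input_size = 572
output_size = unet_output_size(input_size)      # 388
border = (input_size - output_size) // 2        # 92 pixels of context per side
padded = mirror_pad_row([1, 2, 3, 4], 2)        # [3, 2, 1, 2, 3, 4, 3, 2]
```

Under these assumptions each output tile is smaller than its input tile by 92 pixels on every side, which is the context that mirroring has to supply at the image border.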
== Network architecture == The network consists of a contracting path and an expansive path, which gives it the u-shaped architecture. The contracting path is a typical convolutional network that consists of repeated application of convolutions, each followed by a rectified linear unit (ReLU) and a max pooling operation. During the contraction, the spatial information is reduced while feature information is increased. The expansive path combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path. == Applications == There are many applications of U-Net in biomedical image segmentation, such as brain image segmentation (BRATS) and liver image segmentation (SLIVER07), as well as protein binding site prediction. U-Net implementations have also found use in the physical sciences, for example in the analysis of micrographs of materials. Variations of the U-Net have also been applied to medical image reconstruction. Variants and applications of U-Net include: pixel-wise regression using U-Net and its application to pansharpening; 3D U-Net, which learns dense volumetric segmentation from sparse annotation; TernausNet, a U-Net with a VGG11 encoder pre-trained on ImageNet for image segmentation; image-to-image translation to estimate fluorescent stains; and binding site prediction in protein structures. == History == U-Net was created by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in 2015 and reported in the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation". It improves on and extends the fully convolutional network (FCN) described by Evan Shelhamer, Jonathan Long, and Trevor Darrell in "Fully convolutional networks for semantic segmentation" (2014).

  • Image subtraction

    Image subtraction

    Image subtraction, also called pixel subtraction or difference imaging, is an image processing technique in which the digital numeric value of one pixel or whole image is subtracted from that of another, and a new image is generated from the result. This is primarily done for one of two reasons: levelling uneven sections of an image, such as when half of an image has a shadow on it, or detecting changes between two images. This method can show things in the image that have changed position, brightness, color, or shape. For this technique to work, the two images must first be spatially aligned to match features between them, and their photometric values and point spread functions must be made compatible, either by careful calibration or by post-processing (using color mapping). The pre-processing needed before differencing varies in complexity with the type of image, but it is essential to ensure good subtraction of static features. Image subtraction is commonly used in fields such as time-domain astronomy (where it is known primarily as difference imaging) to find objects that fluctuate in brightness or move. In automated searches for asteroids or Kuiper belt objects, the target moves and will be in one place in one image, and in another place in a reference image made an hour or day later. Thus, image processing algorithms can make the fixed stars in the background disappear, leaving only the target. Distinct families of astronomical image subtraction techniques have emerged, operating in either image space or frequency space, with distinct trade-offs in both quality of subtraction and computational cost. These algorithms lie at the heart of almost all modern (and upcoming) transient surveys, and can enable the detection of even faint supernovae embedded in bright galaxies. 
Nevertheless, in astronomical imaging, significant 'residuals' remain around bright, complex sources, necessitating further algorithmic steps to identify candidates (known as real-bogus classification). The Hutchinson metric can be used as a "measure of the discrepancy between two images for use in fractal image processing".
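The basic operation described above can be sketched in a few lines. This is a minimal illustration that assumes the two images are already spatially aligned and photometrically matched, which the text notes is the hard part in practice. Images are represented as plain lists of rows of grayscale values, and the function names and the threshold are hypothetical.

```python
def subtract_images(image, reference):
    """Pixel-wise difference of two equally sized grayscale images."""
    if len(image) != len(reference) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image, reference)
    ):
        raise ValueError("images must have the same shape")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image, reference)
    ]


def changed_pixels(difference, threshold):
    """(row, column) positions whose absolute difference exceeds threshold."""
    return [
        (r, c)
        for r, row in enumerate(difference)
        for c, value in enumerate(row)
        if abs(value) > threshold
    ]


# A bright static "star" at (0, 0) and a faint target that moves
# from (1, 1) in the reference to (1, 2) in the new image.
reference = [[200, 10, 10],
             [10, 50, 10],
             [10, 10, 10]]
image = [[201, 10, 11],
         [10, 10, 50],
         [10, 10, 10]]

difference = subtract_images(image, reference)
moved = changed_pixels(difference, threshold=5)   # [(1, 1), (1, 2)]
```

The static source cancels to within the noise, while the moving target leaves a negative residual at its old position and a positive one at its new position.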

  • Human–robot interaction

    Human–robot interaction

    Human–robot interaction (HRI) is the study of interactions between humans and robots. Human–robot interaction is a multidisciplinary field with contributions from human–computer interaction, artificial intelligence, robotics, natural language processing, design, psychology and philosophy. A subfield known as physical human–robot interaction (pHRI) has tended to focus on device design to enable people to safely interact with robotic systems. == Origins == Human–robot interaction has been a topic of both science fiction and academic speculation even before any robots existed. Because much of active HRI development depends on natural language processing, many aspects of HRI are continuations of human communications, a field of research which is much older than robotics. The origin of HRI as a discrete problem was stated by 20th-century author Isaac Asimov in 1941, in his novel I, Robot. Asimov coined Three Laws of Robotics, namely: A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws. These three laws provide an overview of the goals engineers and researchers hold for safety in the HRI field, although the fields of robot ethics and machine ethics are more complex than these three principles. However, generally human–robot interaction prioritizes the safety of humans that interact with potentially dangerous robotics equipment. Solutions to this problem range from the philosophical approach of treating robots as ethical agents (individuals with moral agency), to the practical approach of creating safety zones. These safety zones use technologies such as lidar to detect human presence or physical barriers to protect humans by preventing any contact between machine and operator. 
Although initially robots in the human–robot interaction field required some human intervention to function, research has expanded this to the extent that fully autonomous systems are now far more common than in the early 2000s. Autonomous systems range from simultaneous localization and mapping systems, which provide intelligent robot movement, to natural-language processing and natural-language generation systems, which allow for natural, human-like interaction that meets well-defined psychological benchmarks. Anthropomorphic robots (machines which imitate human body structure) are better described by the biomimetics field, but overlap with HRI in many research applications. Examples of robots which demonstrate this trend include Willow Garage's PR2 robot, the NASA Robonaut, and Honda ASIMO. However, robots in the human–robot interaction field are not limited to human-like robots: Paro and Kismet are both robots designed to elicit emotional response from humans, and so fall into the category of human–robot interaction. Goals in HRI range from industrial manufacturing with cobots, through medical technology for rehabilitation, autism intervention, and elder care, to entertainment, human augmentation, and human convenience. Future research therefore covers a wide range of fields, much of which focuses on assistive robotics, robot-assisted search-and-rescue, and space exploration. == The goal of friendly human–robot interactions == Robots are artificial agents with capacities of perception and action in the physical world, often referred to by researchers as the workspace. Their use has become widespread in factories, but nowadays they are also found in the most technologically advanced societies in such critical domains as search and rescue, military battle, mine and bomb detection, scientific exploration, law enforcement, entertainment and hospital care. 
These new domains of application imply a closer interaction with the user, sharing not only the workspace but also goals in terms of task achievement. The subfield of physical human–robot interaction (pHRI) has largely focused on device design to enable people to safely interact with robotic systems, but is increasingly developing algorithmic approaches in an attempt to support fluent and expressive interactions between humans and robotic systems. With advances in AI, research focuses on the one hand on the safest possible physical interaction and on the other on socially appropriate interaction, which depends on cultural criteria. The goal is to build intuitive and easy communication with the robot through speech, gestures, and facial expressions. Kerstin Dautenhahn refers to friendly human–robot interaction as "robotiquette", defining it as the "social rules for robot behaviour (a 'robotiquette') that is comfortable and acceptable to humans". The robot has to adapt itself to our way of expressing desires and orders, and not the contrary. But everyday environments such as homes have much more complex social rules than those implied by factories or even military environments. Thus, the robot needs perceiving and understanding capacities to build dynamic models of its surroundings. It needs to categorize objects, recognize and locate humans, and further recognize their emotions. The need for dynamic capacities pushes forward every sub-field of robotics. Furthermore, by understanding and perceiving social cues, robots can enable collaborative scenarios with humans. For example, with the rapid rise of personal fabrication machines such as desktop 3D printers and laser cutters entering our homes, scenarios may arise where robots can collaboratively share control, coordinate, and achieve tasks together. Industrial robots have already been integrated into industrial assembly lines and are working collaboratively with humans. 
The social impact of such robots has been studied, and the results indicate that workers still treat robots as social entities and rely on social cues to understand and work with them. At the other end of HRI research, cognitive modelling of the "relationship" between humans and robots benefits both psychologists and robotics researchers, and user studies are often of interest to both sides. This research endeavours part of human society. For effective human–humanoid robot interaction, numerous communication skills and related features should be implemented in the design of such artificial agents and systems. == General HRI research == HRI research spans a wide range of fields, some general to the nature of HRI. === Methods for perceiving humans === Methods for perceiving humans in the environment are based on sensor information. Research on sensing components and software led by Microsoft provides useful results for extracting human kinematics (see Kinect). An example of an older technique is the use of colour information, for example the fact that, for light-skinned people, the hands are lighter than the clothes worn. In any case, an a priori human model can then be fitted to the sensor data. The robot builds or has (depending on its level of autonomy) a 3D map of its surroundings, in which the humans' locations are registered. Most methods aim to build a 3D model of the environment through vision. Proprioceptive sensors give the robot information about its own state, expressed relative to a reference. Theories of proxemics may be used to perceive and plan around a person's personal space. A speech recognition system is used to interpret human desires or commands. By combining the information from proprioception, external sensors, and speech, the robot can infer the human's position and state (for example standing or seated). 
In this respect, natural-language processing is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural-language data. For instance, neural-network architectures and learning algorithms can be applied to various natural-language processing tasks, including part-of-speech tagging, chunking, named-entity recognition, and semantic role labeling. === Methods for motion planning === Motion planning in dynamic environments is a challenge that can at the moment only be achieved for robots with 3 to 10 degrees of freedom. Humanoid robots or even two-armed robots, which can have up to 40 degrees of freedom, are unsuited for dynamic environments with today's technology. However, lower-dimensional robots can use the potential field method to compute trajectories which avoid collisions with humans. === Cognitive models and theory of mind === Humans exhibit negative social and emotional responses as well as decreased trust toward some robots that closely, but imperfectly, resemble humans; this phenomenon has been termed the "Uncanny Valley". However, recent research on telepresence robots has established that mimicking human body postures and expressive gestures makes the robots likeable and engaging in a remote setting. Further, the presence o

  • Haskins Laboratories

    Haskins Laboratories

    Haskins Laboratories, Inc. is an independent research laboratory, founded in 1935 and located in New Haven, Connecticut since 1970. Many current Haskins researchers are affiliated with Yale University's Child Study Center and/or the University of Connecticut. Haskins is a multidisciplinary and international community of researchers who conduct basic research on spoken and written language and global literacy. A guiding perspective of their research has been to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and development of an early working prototype of a reading machine for the blind to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system. == Research tools and facilities == Haskins Laboratories is equipped, in-house, with a comprehensive suite of tools and capabilities to advance its mission of research into language and literacy. As of 2014, these included: Anechoic chamber. Electroencephalography: BioSemi 264 electrode, 24 bit Active Two System; EGI 128 electrode, Geodesic EEG System 300. Electromagnetic articulography (EMMA): Carstens AG501; NDI WAVE. Eye tracking: HL is equipped with three SR Research eye-trackers: two Model Eyelink 1000 systems and one Model Eyelink 1000plus system. Magnetic resonance imaging: Haskins has access to MRI scanners through agreements with the University of Connecticut and the Yale School of Medicine. On-site, HL has a Linux computer cluster dedicated to analysis of MRI data. Motion capture: HL is equipped with a Vicon motion capture system with one Basler high-speed digital camera, six Vicon MX T-20 cameras and a Vicon MX Giganet for synching camera data and connecting cameras to the data capture computer. 
Near infrared spectroscopy: HL has a TechEn CW6 8x8 system (four emitters; eight detectors). Ultrasound sonogram. == History == Many researchers have contributed to scientific breakthroughs at Haskins Laboratories since its founding. All of them are indebted to the pioneering work and leadership of Caryl Parker Haskins, Franklin S. Cooper, Alvin Liberman, Seymour Hutner and Luigi Provasoli. The history presented here focuses on the research program of the division of Haskins Laboratories that, since the 1940s, has been best known for its work in the areas of speech, language, and reading. === 1930s === Caryl Haskins and Franklin S. Cooper established Haskins Laboratories in 1935. It was originally affiliated with Harvard University, MIT, and Union College in Schenectady, NY. Caryl Haskins conducted research in microbiology, radiation physics, and other fields in Cambridge, MA and Schenectady. In 1939 Haskins Laboratories moved its center to New York City. Seymour Hutner joined the staff to set up a research program in microbiology, genetics, and nutrition. The descendant of the program led by Hutner eventually became a department of Pace University in New York. The two identically named organizations are no longer formally affiliated. === 1940s === The U.S. Office of Scientific Research and Development, under Vannevar Bush, asked Haskins Laboratories to evaluate and develop technologies for assisting blinded World War II veterans. Experimental psychologist Alvin Liberman joined Haskins Laboratories to assist in developing a "sound alphabet" to represent the letters in a text for use in a reading machine for the blind. Luigi Provasoli joined Haskins Laboratories to set up a research program in marine biology. The program in marine biology moved to Yale University in 1970 and disbanded with Provasoli's retirement in 1978. === 1950s === Franklin S. 
Cooper invented the pattern playback, a machine that converts pictures of the acoustic patterns of speech back into sound. With this device, Alvin Liberman, Cooper, and Pierre Delattre (and later joined by Katherine Safford Harris, Leigh Lisker, Arthur Abramson, and others), discovered the acoustic cues for the perception of phonetic segments (consonants and vowels). Liberman and colleagues proposed a motor theory of speech perception to resolve the acoustic complexity: they hypothesized that we perceive speech by tapping into a biological specialization, a speech module, that contains knowledge of the acoustic consequences of articulation. Liberman, aided by Frances Ingemann and others, organized the results of the work on speech cues into a groundbreaking set of rules for speech synthesis by the Pattern Playback. === 1960s === Franklin S. Cooper and Katherine Safford Harris, working with Peter MacNeilage, were the first researchers in the U.S. to use electromyographic techniques, pioneered at the University of Tokyo, to study the neuromuscular organization of speech. Leigh Lisker and Arthur Abramson looked for simplification at the level of articulatory action in the voicing of certain contrasting consonants. They showed that many acoustic properties of voicing contrasts arise from variations in voice onset time, the relative phasing of the onset of vocal cord vibration and the end of a consonant. Their work has been widely replicated and elaborated, here and abroad, over the following decades. Donald Shankweiler and Michael Studdert-Kennedy used a dichotic listening technique (presenting different nonsense syllables simultaneously to opposite ears) to demonstrate the dissociation of phonetic (speech) and auditory (nonspeech) perception by finding that phonetic structure devoid of meaning is an integral part of language, typically processed in the left cerebral hemisphere. 
Liberman, Cooper, Shankweiler, and Studdert-Kennedy summarized and interpreted fifteen years of research in "Perception of the Speech Code", still among the most cited papers in the speech literature. It set the agenda for many years of research at Haskins and elsewhere by describing speech as a code in which speakers overlap (or coarticulate) segments to form syllables. Researchers at Haskins connected their first computer to a speech synthesizer designed by Haskins Laboratories' engineers. Ignatius Mattingly, with British collaborators, John N. Holmes and J.N. Shearme, adapted the Pattern playback rules to write the first computer program for synthesizing continuous speech from a phonetically spelled input. A further step toward a reading machine for the blind combined Mattingly's program with an automatic look-up procedure for converting alphabetic text into strings of phonetic symbols. === 1970s === In 1970, Haskins Laboratories moved to New Haven, Connecticut, and entered into affiliation agreements with Yale University and the University of Connecticut; Haskins remains fully independent of both Yale and UConn, administratively and financially. The lab's original location in New Haven, at 270 Crown Street (from 1970 to 2005), was leased from Yale University. Isabelle Liberman, Donald Shankweiler, and Alvin Liberman teamed up with Ignatius Mattingly to study the relationship between speech perception and reading, a topic implicit in Haskins Laboratories' research program since its inception. They developed the concept of phonemic awareness, the knowledge that would-be readers must be aware of the phonemic structure of their language in order to be able to read. Leonard Katz related the work to contemporary cognitive theory and provided expertise in experimental design and data analysis. Under the broad rubric of the "alphabetic principle", this is the core of the lab's present program of reading pedagogy. 
Patrick Nye joined Haskins Laboratories to lead a team working on the reading machine for the blind. The project culminated when the addition of an optical character recognizer allowed investigators to assemble the first automatic text-to-speech reading machine. By the end of the decade this technology had advanced to the point where commercial concerns assumed the task of designing and manufacturing reading machines for the blind. In 1973, Franklin S. Cooper was selected to form a panel of six experts charged with investigating the famous 18-minute gap in the White House office tapes of President Richard Nixon related to the Watergate scandal. Building on earlier work, Philip Rubin developed the sinewave synthesis program, which was then used by Robert Remez, Rubin, and colleagues to show that listeners can perceive continuous speech without traditional speech cues from a pattern of sinewaves that track the changing resonances of the vocal tract. This paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space. Philip Rubin and colleagues developed Paul Mermelstein's anatomically simplified vocal tract model, originally worked on at Bell Laboratories, into the first articulatory synthesizer that can be controlled in a phy

  • Software engineering demographics

    Software engineering demographics

    Software engineers make up a significant portion of the global workforce. As of 2022, there are an estimated 26.9 million professional software engineers worldwide, up from 21 million in 2016. == By country == === United States === In 2023, there were an estimated 1.6 million professional software developers in North America. There are 166 million people employed in the US workforce, making software developers 0.96% of the total workforce. ==== Summary ==== ==== Software engineers vs. traditional engineers ==== The following two tables compare the number of software engineers (611,900 in 2002) versus the number of traditional engineers (1,157,020 in 2002). There are another 1,500,000 people in system analysis, system administration, and computer support, many of whom might be called software engineers. Many systems analysts manage software development teams, and as analysis is an important software engineering role, many of them may be considered software engineers in the near future. This means that the number of software engineers may actually be much higher. It is important to note that the number of software engineers declined by 5 to 10 percent from 2000 to 2002. ==== Computer managers vs. construction and engineering managers ==== Computer and information system managers (264,790) manage software projects, as well as computer operations. Similarly, Construction and engineering managers (413,750) oversee engineering projects, manufacturing plants, and construction sites. Computer management is 64% the size of construction and engineering management. ==== Software engineering educators vs. engineering educators ==== Most people working in the field of computer science, whether making software systems (software engineering) or studying the theoretical and mathematical facts of software systems (computer science), acquire degrees in computer science. According to the U.S. 
Bureau of Labor Statistics (May 2023 data), there were approximately 44,800 postsecondary computer science teachers and 50,300 engineering teachers, indicating that the computer science educator workforce is about 89% as large as that of engineering educators. The combined number of postsecondary chemistry (25,400) and physics (17,100) teachers totaled 42,500, slightly less than the number of computer science educators. ==== Other software and engineering roles ==== ==== Relation to IT demographics ==== Software engineers are part of the much larger software, hardware, application, and operations community. In 2000 in the U.S., there were about 680,000 software engineers and about 10,000,000 IT workers. As of early 2025, there are an estimated 47.2 million software developers worldwide, an increase of roughly 50% from 31 million in Q1 2022. There are no numbers on testers in the BLS data. === India === There has been healthy growth in the number of India's IT professionals over the past few years. From a base of 6,800 knowledge workers in 1985–86, the number increased to 522,000 software and services professionals by the end of 2001–02. It is estimated that out of these 528,000 knowledge workers, almost 170,000 are working in the IT software and services export industry; nearly 106,000 are working in IT-enabled services and over 230,000 in user organizations. === Australia === In May 2024, the Australian government reported that 169,300 Australians are employed as software and applications programmers, 17% of whom are women. The role grew annually by 8,300 workers. === Russia === According to the Russian government, the number of IT specialists in the country increased by 13% in 2023, reaching approximately 857,000. During the initial phase of the 2022 invasion of Ukraine, an estimated 100,000 IT specialists left Russia.
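The ratios quoted in the United States section can be checked with a short calculation. The sketch below only re-derives percentages from the figures stated in the text and adds no new data. The helper name `share_percent` is hypothetical.

```python
def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Software developers as a share of the US workforce.
us_share = share_percent(1_600_000, 166_000_000)

# Computer managers relative to construction and engineering managers.
manager_ratio = share_percent(264_790, 413_750)

# Computer science educators relative to engineering educators.
educator_ratio = share_percent(44_800, 50_300)

# Chemistry plus physics educators.
chem_phys_total = 25_400 + 17_100

# Worldwide growth from 31 million (Q1 2022) to 47.2 million (early 2025).
worldwide_growth = share_percent(47.2 - 31.0, 31.0)

print(f"{us_share:.2f}% of the US workforce")
print(f"{manager_ratio:.0f}% manager ratio")
print(f"{educator_ratio:.0f}% educator ratio")
print(f"{chem_phys_total:,} chemistry and physics teachers")
print(f"{worldwide_growth:.0f}% worldwide growth")
```

The first four results match the rounded figures in the text. The worldwide growth works out to about 52%, which the text rounds to 50%.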

  • Fyuse

    Fyuse

    Fyuse is a spatial photography app which lets users capture and share interactive 3D images. By tilting or swiping one's smartphone, one can view such "fyuses" from various angles, as if one were walking around an object or subject. The app blends photography and video to create an interactive medium and was first published for iOS in April 2014. The Android version was released at the end of 2014. == The app == Fyuse lets users capture panoramas, selfies, and full 360° views of objects and allows one to view captured moments from different angles. It has its own personal gallery, social network and standalone web integration. With the app, Fyusion also created a social networking platform similar to Instagram. Fyuses can be shared, commented on, liked and re-shared to one's followers (called Echoes). One can build a network of followers, and with engagement tracking one can see how many times an image has been interacted with. The images can also be saved for private, offline viewing, shared to other social networks such as Facebook or Twitter, or embedded on a website, where desktop users can interact with them by dragging the mouse. Furthermore, in the compass tab other fyuses can be discovered using the app's system of tags and categories. One's Fyuse feed is prepopulated with top users, and one can follow people to see when they post a new fyuse. The app will also find one's friends if one signs up with Facebook or connects it with one's Twitter account. To create a fyuse, one moves one's phone camera around a person or object in one direction, or moves or tilts the phone, while holding a finger on the screen. By combining photography and video, the app allows one to capture moments that one may not otherwise have been able to capture, by recording not a single moment in time but many small moments stitched together. 
According to Fyusion CEO Radu Rusu, a photo freezes a moment in time, while a video captures moments in a linear timeline; both are still flat when viewed. A fyuse image captures a moment in space, where one can see not only one side of something but also around it. Once rendering is done, a fyuse can also be edited: one can trim the fyuse for length and edit the brightness, contrast, exposure, saturation and sharpness. One can also add a vignette and apply filters, with options to adjust their intensity. After editing, one can write a description, add hashtags, and tag parts of the fyuse before one can (voluntarily) publish and share it. Version 1.0 has been described as an "alpha prototype", and version 2.0 was released on 17 December 2014. Version 3.0 introduced 3D tagging, by which users can layer 3D graphics that animate with each interaction to add some context to the content. Version 4.0 was released on December 21, 2016 for iOS. Since January 2016 (v3.2) the app allows the export of fyuses as Live Photos. The app has also been described as a more sophisticated version of 3D stickers and flip images. == Applications == The app has many applications for e-commerce, such as for fashion designers who want to showcase a garment from every angle, or real estate listings and Airbnb-type sites that want to make their rental properties seem as enticing as possible. The app can also be used for interactive art, 360° panoramas and selfies. == History == San Francisco-based Fyusion Inc.'s three founders (Radu B. Rusu, CTO Stefan Holzer, and VP of Engineering Stephen Miller) worked together at Willow Garage, the robotics research lab started by early Google employee Scott Hassan in the area of "personal robotics". Hassan decided to turn the lab into more of an incubator, suggesting that the members spin off their technologies into consumer-facing enterprises. Rusu first set out with an open-source 3D perception software startup called Open Perception. 
Fyusion was officially founded in 2013, and soon after Rusu and his cofounders patented the technology for spatial photography. The company closed a seed funding round at the end of May, raising $3.35 million from investors, including an angel investment from Sun Microsystems cofounder Andreas Bechtolsheim. In 2014 the Fyuse team consisted of 13 employees, mostly engineers and designers, recruited from around the globe. In March 2015 the team displayed their app at Katy Perry's premiere for the movie "Prismatic World Tour on Epix", where Perry also took Fyuse for a test run. == Augmented reality == In September 2016 Fyusion unveiled its platform for creating augmented reality content using one's smartphone. It takes the images from one's smartphone and converts them into 3D holographic images, which one can then view on an AR headset. According to Rusu, "by making it easy for people to capture their surroundings on any mobile device, [Fyusion is] revolutionizing the way that people view the world around them"; he also states that for "AR to be successful, anyone should be able to create content for it", as opposed to the current "small number of content creators and an even smaller number of hardware players". According to him, "the applications of [Fyusion's] technology for consumers and businesses are incredibly limitless". The platform builds on the company's patented 3D spatio-temporal platform, which uses advanced sensor fusion, machine learning and computer vision algorithms, and part of it is built into the Fyuse app. Before committing to releasing a separate consumer product, the company intends to wait until the HoloLens device becomes available to the public. Until then, any representation created using Fyuse is AR ready and will be able to be shown in HoloLens in the future. == Fyuse - Point of No Return == Fyuse - Point of No Return is a science fiction short advert for Fyuse 3.0 in which Fyuse's digital medium is extrapolated into the future. 
In the film a woman uses a mini scanning-drone to 3D scan a tree with Fyuse and later recreate it as an augmented reality object at another place.

    Read more →
  • Transcription software

    Transcription software

    Transcription software assists in the conversion of human speech into a text transcript. Audio or video files can be transcribed manually or automatically. Transcriptionists can replay a recording several times in a transcription editor and type what they hear. Transcription hot keys can speed up manual transcription, and when the audio is unclear the sound can be filtered or equalized, or its tempo adjusted. With speech recognition technology, transcriptionists can automatically convert recordings to text transcripts by opening recordings on a PC and uploading them to a cloud service for automatic transcription, or transcribe speech in real time by using digital dictation. Depending on the quality of the recordings, machine-generated transcripts may still need to be manually verified. The accuracy of automatic transcription depends on several factors, such as background noise, the speakers' distance from the microphone, and accents. Transcription software, as with transcription services, is often used for business, legal, or medical purposes. Compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for subtitles and closed captions. Some clinical environments also use digital tools to support transcription workflows, including ambient documentation systems that employ speech recognition to capture portions of clinical encounters and generate draft notes for later review. These tools are typically used alongside conventional transcription methods. Transcription "software" differs from a transcription "service" in that the former is sufficiently automated that a user can run the entire system without engaging outside personnel. Newer software-as-a-service and cloud computing models use artificial intelligence, machine learning and natural language processing to convert speech to text and continuously learn new phrases and accents. 
AI transcription can, however, lead to hallucinations and other errors. == Development == Google Research released Google Live Transcribe, a free Android app that runs on Google Cloud. Google Chrome has a built-in Live Caption feature for English. Google Docs, Google Translate, Google Assistant, Gboard and the Google Text-to-Speech engine also support transcription. OpenAI released Whisper, an open-source deep learning model for speech recognition, in September 2022. In 2024, Transkriptor, an AI-powered transcription platform, was launched; it automatically converts audio and video recordings into text using speech recognition, supports transcription in 100 languages, and processes content uploaded through a web interface as well as mobile and browser extensions. It is part of the Tor.app suite of AI-based language processing tools.

    Read more →
  • G'MIC

    G'MIC

    G'MIC (GREYC's Magic for Image Computing) is a free and open-source framework for image processing. It defines a script language that allows the creation of complex macros. Originally usable only through a command-line interface, it is now best known as a GIMP plugin and is also included in Krita. G'MIC is dual-licensed under CeCILL-2.1 or CeCILL-C. == Features == G'MIC's graphical interface is notable for its noise removal filters, which came from an earlier project by the same authors called GREYCstoration. G'MIC offers many built-in commands for image processing, including basic mathematical manipulations, look-up tables, and filtering operations. More complex macros and pipelines built out of those commands are defined in its library files. == Interpreters == === Command line === G'MIC is primarily a script language callable from a shell. For example, a single command can display the image contained in the file image.jpg and allow zooming in to examine pixel values. Several filters can also be applied in succession, for example to crop and then resize an image. === Graphical interface === G'MIC comes with a Qt-based graphical interface, which may be integrated as a GIMP or Krita plugin. It contains several hundred filters written in the G'MIC language, dynamically updated through an internet feed. The interface provides a preview and setting sliders for each filter. G'MIC is one of the most popular GIMP plugins. === G'MIC Online === Most of the filters available for the graphical interface are also available online. === ZArt === ZArt is a graphical interface for real-time manipulation of webcam images. === libgmic === libgmic is a C++ library that can be linked to third-party applications. It is integrated in Flowblade and Veejay.
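
The command-line usage described in the "Command line" section can also be driven from a script. The following is a minimal Python sketch and is not taken from the G'MIC documentation: it only assembles argument lists for the `gmic` executable, which is assumed to be installed and on the PATH. The helper names `gmic_argv` and `run`, the crop box and the resize percentages are illustrative assumptions. `crop` and `resize` are built-in G'MIC commands, but their arguments should be checked against the installed version.

```python
import shutil
import subprocess


def gmic_argv(image_path, *commands):
    """Build an argument list for the gmic command-line tool.

    Each entry of `commands` is a (name, arguments) pair, for example
    ("crop", "0,0,250,250"). Nothing is executed here.
    """
    argv = ["gmic", image_path]
    for name, arguments in commands:
        argv.extend([name, arguments])
    return argv


# Passing only a file name asks gmic to load and display that image.
display_cmd = gmic_argv("image.jpg")

# Filters applied in succession: crop to a box, then halve both dimensions.
# The coordinates and percentages are placeholders, not values from the text.
pipeline_cmd = gmic_argv(
    "image.jpg",
    ("crop", "0,0,250,250"),
    ("resize", "50%,50%"),
)


def run(argv):
    """Run the command only if the gmic executable is available."""
    if shutil.which(argv[0]) is None:
        return None  # gmic is not installed; skip instead of failing
    return subprocess.run(argv, check=False)
```

Calling `run(pipeline_cmd)` would execute the pipeline where G'MIC is installed; building the argument list separately keeps the example inspectable on machines without it.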

    Read more →
  • Clara.io

    Clara.io

    Clara.io is web-based freemium 3D computer graphics software developed by Exocortex, a Canadian software company. The free or "Basic" tier of the freemium offering, however, places severe restrictions, such as on saving models and importing texture maps, which are not disclosed in the company's own descriptions of its plans. == History == Clara.io was announced in July 2013, and first presented as part of the official SIGGRAPH 2013 program later that month. By November 2013, when the open beta period started, Clara.io had 14,000 registered users. Clara.io claimed to have 26,000 registered users in January 2014, which grew to 85,000 by December 2014. Clara.io was permanently shut down on December 31, 2022, although the site remains partially functional for logged-in users. == Features == Polygonal modeling; constructive solid geometry; key frame animation; skeletal animation; hierarchical scene graph; texture mapping; photorealistic rendering (streaming cloud rendering using V-Ray Cloud); scene publishing via HTML iframe embedding; FBX, Collada, OBJ, STL and Three.js import/export; collaborative real-time editing; revision control (versioning and history); scripting, plugins and REST APIs; 3D model library; unlisted and private scenes (paid subscriptions only). == Technology == Clara.io is developed using HTML5, JavaScript, WebGL and Three.js. Clara.io does not rely on any browser plugins and thus runs on any platform that has a modern standards-compliant browser.

    Read more →
  • Odor source localization

    Odor source localization

    Odor source localization (OSL) is the problem of locating the origin of an airborne or waterborne chemical plume using one or more mobile sensors, typically robots equipped with chemical sensors. The task sits at the intersection of robotics, fluid dynamics and machine olfaction. Chemical plumes in turbulent flows are intermittent and patchy, and most chemical sensors respond slowly and have limited selectivity, so the instantaneous reading available to a moving sensor is a poor proxy for the underlying time-averaged concentration field. Robotic OSL has been studied since the late 1980s and has applications including the detection of gas leaks, search and rescue after industrial accidents, and environmental monitoring of industrial emissions. == History == Robotic odor search emerged in the late 1980s and 1990s, drawing on earlier work in chemical ecology that had described how moths and other insects locate distant pheromone sources. R. A. Russell at Monash University was among the first to build mobile robots that followed chemical trails on the floor and tracked airborne odor plumes. Distributed and multi-robot odor search were investigated by Hayes, Martinoli and Goodman at the California Institute of Technology and EPFL, who studied cooperative plume-tracing on simulated and physical robot swarms. In 2007 Vergassola, Villermaux and Shraiman introduced infotaxis, an information-theoretic search strategy in which a sensor moves so as to maximize the expected information gain about source location, rather than following a chemical concentration gradient; the paper appeared in Nature and prompted substantial follow-up work in the robotics community. From the mid-2010s, multi-rotor unmanned aerial vehicles carrying lightweight chemical sensors became a common experimental platform for OSL research. 
== Problem formulation == OSL is generally decomposed into three sub-problems: plume detection (deciding whether a chemical signal is present), plume traversal (moving so as to remain in contact with the plume), and source declaration (deciding when the source has been reached). The mathematical difficulty depends strongly on the assumed dispersion model. In laminar or low-Reynolds number flows a Gaussian advection–diffusion model gives a smooth concentration field with a well-defined gradient. In turbulent flows, which dominate most realistic environments, the plume is filamentary: the sensor receives short, randomly spaced bursts of chemical separated by periods of zero signal, and the time-averaged field is not a useful guide on the time scales at which a robot must act. Source-term estimation, surveyed by Hutchinson and colleagues, additionally aims to recover both the position and the release rate of the source from the observed concentrations, often using probabilistic filters. == Biological inspiration == Many OSL strategies are explicitly modeled on the behavior of male moths flying upwind toward a pheromone source. As reviewed by Cardé and Willis, moths combine an upwind surge whenever they detect a filament of pheromone with a wider crosswind cast when contact is lost, producing a characteristic zig-zag trajectory that has been transposed onto mobile robots by several groups. Other biological models draw on the search behavior of dogs and of marine animals such as blue crabs and lobsters, which integrate chemical and bilateral hydrodynamic cues over much shorter ranges. == Algorithms and strategies == === Reactive strategies === Reactive strategies select the next motion as a direct function of the current sensor reading. Chemotaxis steers along the locally estimated concentration gradient, which is effective in laminar plumes but degrades severely in turbulence. 
Anemotaxis exploits a measured wind direction by surging upwind when chemical contact is made. The bio-inspired cast-and-surge family combines anemotaxis with a deterministic crosswind cast on contact loss, and is the dominant reactive approach for turbulent environments. === Probabilistic and information-theoretic strategies === Probabilistic methods maintain a posterior distribution over possible source locations and choose actions that improve that distribution. The infotaxis strategy of Vergassola, Villermaux and Shraiman selects the move that maximizes the expected reduction in entropy of the source-location posterior, and is effective in regimes where the spatial gradient is unusable. Bayesian source-term estimation extends this idea by inferring both source position and release rate, typically using particle filters or sequential Monte Carlo. === Map-based strategies === Map-based methods build a spatial model of the time-averaged gas distribution from sensor readings collected along the robot's trajectory and search for local maxima in that model. Lilienthal and colleagues describe a family of kernel-based gas distribution mapping techniques in which point measurements are convolved with a Gaussian kernel to produce a spatially extrapolated estimate. Such methods are most useful when the source can be assumed quasi-stationary and the robot is able to revisit locations. === Multi-robot and swarm strategies === Multiple robots searching cooperatively can shorten search times. Cooperative formations spread the sensors across the crosswind axis, making detection of an intermittent plume more likely. Swarm-based approaches, reviewed by Wang and colleagues, deploy larger numbers of simpler agents and rely on collective behavior rather than centralized planning; reported advantages include improved coverage of the search area and the possibility of locating multiple sources in parallel. 
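
The map-based strategy described above can be illustrated with a small Python sketch. It is a simplified, normalized Gaussian-kernel average over point measurements, followed by a search for the maximum of the resulting map, and is not the exact formulation used by Lilienthal and colleagues. The function names, the kernel width `sigma`, the `min_weight` cut-off and the synthetic raster-sweep data are all assumptions made for the example.

```python
import math


def kernel_gas_map(samples, grid_x, grid_y, sigma, min_weight=1e-9):
    """Estimate a time-averaged concentration map from point measurements.

    samples: list of (x, y, reading) tuples collected along the trajectory.
    grid_x, grid_y: coordinates of the grid cell centres.
    sigma: Gaussian kernel width, in the same units as x and y (assumed).
    min_weight: crude confidence cut-off; cells whose total kernel weight
        falls below it are left out of the map (assumed heuristic).
    Returns a dict mapping (gx, gy) to the kernel-weighted mean reading.
    """
    estimate = {}
    for gx in grid_x:
        for gy in grid_y:
            weight_sum = 0.0
            weighted_readings = 0.0
            for x, y, reading in samples:
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                w = math.exp(-d2 / (2.0 * sigma ** 2))
                weight_sum += w
                weighted_readings += w * reading
            if weight_sum >= min_weight:
                estimate[(gx, gy)] = weighted_readings / weight_sum
    return estimate


def map_maximum(estimate):
    """Return the grid cell with the highest estimated concentration."""
    return max((value, cell) for cell, value in estimate.items())[1]


# Synthetic raster sweep over a 7 x 7 area: a smooth bump centred on a
# hypothetical quasi-stationary source at (4, 2).
samples = [
    (x, y, math.exp(-((x - 4) ** 2 + (y - 2) ** 2) / 4.0))
    for x in range(7)
    for y in range(7)
]

estimate = kernel_gas_map(samples, range(7), range(7), sigma=0.7)
peak = map_maximum(estimate)
```

With these smooth synthetic readings the maximum of the map coincides with the assumed source cell. A normalized average of this kind becomes unreliable far from any measurement, which is why the sketch includes a weight cut-off and why the sweep covers the whole grid.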
== Sensors and platforms == Most OSL systems use metal-oxide semiconductor (MOX) sensors, photoionization detectors or electrochemical cells, which trade off sensitivity, selectivity, response time and power consumption. Ishida and colleagues describe how these sensors interact with airflow around the robot body, an effect that motivates careful aerodynamic design and active sampling. Mobile platforms include wheeled ground robots for indoor and structured outdoor environments, multi-rotor unmanned aerial vehicles for open spaces and elevated sources, and autonomous underwater vehicles for chemical plumes in the marine environment. == Notable systems == Among the early demonstrations, R. A. Russell's series of differential-drive robots at Monash University localized volatile sources in still and ventilated rooms during the 1990s. The Smelling Nano Aerial Vehicle reported by Burgués and colleagues used a Crazyflie nano-quadcopter (approximately 27 grams in mass and 10 cm across) carrying a custom MOX gas sensing board, and built three-dimensional gas distribution maps of indoor releases from sweeping flights of less than three minutes. The GADEN simulator, released by Monroy and colleagues, couples three-dimensional dispersion computed from an OpenFOAM CFD solver with models of MOX and photo-ionization gas sensors, and is widely used to test mobile-robot olfaction algorithms in simulation. == Applications == Reported applications include the localization of natural-gas and methane leaks in urban infrastructure, search for chemical contamination after industrial accidents, search and rescue, and environmental monitoring of industrial emissions. Drug- and explosives-detection robots are an adjacent application area, although these typically rely on close-range sniffing rather than long-range plume tracking. 
== Open challenges == Open challenges identified in recent reviews include the limited speed, selectivity and stability of available chemical sensors; the scarcity of standardized, large-scale benchmarks comparable to those available in computer vision; reliable handling of multi-source environments, where standard single-source assumptions fail; and the integration of OSL with other autonomous-vehicle subsystems such as obstacle avoidance and navigation in three-dimensional turbulent flow.

    Read more →
  • Echo Lake (software)

    Echo Lake (software)

    Echo Lake (also known as Family Album Creator) was the most notable multimedia software product produced by Delrina; it debuted in June 1995. It was touted internally as a "cross [of] Quark Xpress and Myst". It featured an immersive 3D environment in which a user could go to a virtual desktop in a virtual office and assemble video and audio clips along with images, then output the result either as a virtual book that other users of the program could view or as a printed copy. It was a highly innovative product for its time, but it was ultimately hampered by the difficulty many users of that period had in getting their own multimedia content into a computer. Creative Wonders bought the rights to Echo Lake, which was reshaped as an introductory program on multimedia and re-released as Family Album Creator in 1996.

    Read more →
  • Image subtraction

    Image subtraction

    Image subtraction, also called pixel subtraction or difference imaging, is an image processing technique whereby the digital numeric value of one pixel or whole image is subtracted from another image, and a new image is generated from the result. This is primarily done for one of two reasons: levelling uneven sections of an image, such as half an image having a shadow on it, or detecting changes between two images. This method can show things in the image that have changed position, brightness, color, or shape. For this technique to work, the two images must first be spatially aligned to match features between them, and their photometric values and point spread functions must be made compatible, either by careful calibration or by post-processing (using color mapping). The complexity of the pre-processing needed before differencing varies with the type of image, but it is essential to ensure good subtraction of static features. The technique is commonly used in fields such as time-domain astronomy (where it is known primarily as difference imaging) to find objects that fluctuate in brightness or move. In automated searches for asteroids or Kuiper belt objects, the target moves and will be in one place in one image, and in another place in a reference image made an hour or day later. Thus, image processing algorithms can make the fixed stars in the background disappear, leaving only the target. Distinct families of astronomical image subtraction techniques have emerged, operating in either image space or frequency space, with distinct trade-offs in both quality of subtraction and computational cost. These algorithms lie at the heart of almost all modern (and upcoming) transient surveys, and can enable the detection of even faint supernovae embedded in bright galaxies. 
Nevertheless, in astronomical imaging, significant 'residuals' remain around bright, complex sources, necessitating further algorithmic steps to identify candidates (known as real-bogus classification). The Hutchinson metric can be used as a "measure of the discrepancy between two images for use in fractal image processing".
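
The differencing step itself can be sketched in a few lines of Python. This is a minimal illustration that assumes the two frames are already spatially aligned and photometrically matched, as the text requires; it does none of the point-spread-function matching that real astronomical pipelines perform. The function names, the tiny 3 × 4 frames and the threshold are made up for the example.

```python
def subtract_images(new, reference):
    """Pixel-wise difference (new - reference) of two aligned grayscale images.

    Both images are lists of rows of numbers with identical dimensions.
    """
    if len(new) != len(reference) or any(
        len(row_n) != len(row_r) for row_n, row_r in zip(new, reference)
    ):
        raise ValueError("images must have identical dimensions")
    return [
        [n - r for n, r in zip(row_n, row_r)]
        for row_n, row_r in zip(new, reference)
    ]


def changed_pixels(difference, threshold):
    """Coordinates (row, col) whose absolute difference exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(difference)
        for j, value in enumerate(row)
        if abs(value) > threshold
    ]


# Two fixed "stars" (values 200 and 180) on a background of 10, plus a
# moving target of brightness 90 that shifts one pixel to the right.
reference = [
    [200, 10, 10, 10],
    [10, 90, 10, 10],
    [10, 10, 10, 180],
]
new = [
    [200, 10, 10, 10],
    [10, 10, 90, 10],
    [10, 10, 10, 180],
]

difference = subtract_images(new, reference)
candidates = changed_pixels(difference, threshold=20)
```

The fixed stars cancel to zero, while the moving target leaves a negative residual at its old position and a positive one at its new position, so `candidates` contains exactly those two pixels.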

    Read more →
  • Subpixel rendering

    Subpixel rendering

    Subpixel rendering is a method used to increase the effective resolution of a color display device. It exploits the composition of each pixel, which consists of three subpixels (red, green, and blue) that can each be addressed individually on the display matrix. Subpixel rendering is primarily used for text rendering on standard-DPI displays. Despite the inherent color anomalies, it can also be used to render general graphics. == History == The origin of subpixel rendering as used today remains controversial. Apple Inc., IBM, and Microsoft patented various implementations that differed in technical details owing to the different purposes for which their technologies were intended. Microsoft held several patents in the United States for subpixel rendering technology used in text rendering on RGB stripe layouts. The patents 6,219,025; 6,239,783; 6,307,566; 6,225,973; 6,243,070; 6,393,145; 6,421,054; 6,282,327; and 6,624,828 were filed between October 7, 1998, and October 7, 1999, and expired on July 30, 2019. Analysis of the patent by FreeType indicates that it does not cover the idea of subpixel rendering itself, but rather the actual filter used as a last step to balance the color. Microsoft's patent describes the smallest possible filter that distributes each subpixel value equally among the R, G, and B pixels. Any other filter will either be blurrier or will introduce color artifacts. Apple was able to use it in Mac OS X due to a patent cross-licensing agreement. == Characteristics == A single pixel on a color display is made of several subpixels, typically three arranged left-to-right as red, green, and blue (RGB). The components are readily visible with a small magnifying glass, such as a loupe. These pixel components appear as a single color to the human eye because of blurring by optics and spatial integration by nerve cells in the eye. However, the eye is much more sensitive to the location. 
Therefore, turning on the G and B of one pixel and the R of the next pixel to the right will produce a white dot, but it will appear to be 1/3 of a pixel to the right of the white dot that would be seen from the RGB of only the first pixel. Subpixel rendering leverages this to provide three times the horizontal resolution of the rendered image. However, it has to blur this image to produce the correct color by ensuring the same amount of red, green, and blue are turned on as when no subpixel rendering is being done. Subpixel rendering does not necessitate the use of antialiasing. It gives a smoother result regardless of whether antialiasing is used or not since it artificially increases the resolution. However, it introduces color aliasing since subpixels are colored. Subsequent filtering applied to remove the color artifacts is a form of antialiasing, although its purpose is not smoothing jagged shapes as in conventional antialiasing. Subpixel rendering requires the software to know the layout of the subpixels. The most common reason it is wrong is monitors that can be rotated 90 (or 180) degrees, though monitors are manufactured with other arrangements of the subpixels, such as BGR or in triangles, or with 4 colors like RGBW squares. On any such display the result of incorrect subpixel rendering will be worse than if no subpixel rendering was done at all (it will not produce color artifacts, but it will produce noisy edges). == Implementations == === Apple II === Steve Gibson has claimed that the Apple II, introduced in 1977, supports an early form of subpixel rendering in its high-resolution (280×192) graphics mode. The Wozniak patent only used 2 "sub-pixels". The bytes that comprise the Apple II high-resolution screen buffer contain seven visible bits (each corresponding directly to a pixel) and a flag bit used to select between purple/green or blue/orange color sets. 
Each pixel, since it is represented by a single bit, is either on or off; there are no bits within the pixel itself for specifying color or brightness. Color is instead created as an artifact of the NTSC color encoding scheme, determined by horizontal position: pixels with even horizontal coordinates are always purple (or blue, if the flag bit is set), and odd pixels are always green (or orange). Two lit pixels next to each other are always white, regardless of whether the pair is even/odd or odd/even, and irrespective of the value of the flag bit. This is an approximation, but it is what most programmers of the time would have in mind while working with the Apple's high-resolution mode. Gibson's example claims that because two adjacent bits form a white block, there are, in fact, two bits per pixel: one that activates the pixel's purple left half and the other that activates its green right half. If the programmer instead activates the green right half of a pixel and the purple left half of the next pixel, the result is a white block 1/2 pixel to the right, which is indeed an instance of subpixel rendering. However, it is not clear whether any programmers of the Apple II have considered the pairs of bits as pixels—instead calling each bit a pixel. The flag bit in each byte affects color by shifting pixels half a pixel-width to the right. This half-pixel shift was exploited by some graphics software, such as HRCG (High-Resolution Character Generator), an Apple utility that displayed text using the high-resolution graphics mode, to smooth diagonals. === ClearType === Microsoft announced its subpixel rendering technology, called ClearType, at COMDEX in 1998. Microsoft published a paper in May 2000, Displaced Filtering for Patterned Displays, describing the filtering behind ClearType. It was then made available in Windows XP. Still, it was not activated by default until Windows Vista, while Windows XP OEMs could and did change the default setting. 
=== FreeType === FreeType, the library used by most current software on the X Window System, contains two open source implementations. The original implementation uses the ClearType antialiasing filters and carries the following notice: "The colour filtering algorithm of Microsoft's ClearType technology for subpixel rendering is covered by patents; for this reason, the corresponding code in FreeType is disabled by default. Note that subpixel rendering per se is prior art; using a different colour filter thus easily circumvents Microsoft's patent claims." FreeType offers a variety of color filters. Since version 2.6.2, the default filter is light, a filter that is both normalized (value sums up to 1) and color-balanced (eliminate color fringes at the cost of resolution). Since version 2.8.1, a second implementation exists, called Harmony, that "offers high quality LCD-optimized output without resorting to ClearType techniques of resolution tripling and filtering". This is the method enabled by default. When using this method, "each color channel is generated separately after shifting the glyph outline, capitalizing on the fact that the color grids on LCD panels are shifted by a third of a pixel. This output is indistinguishable from ClearType with a light 3-tap filter." Since the Harmony method does not require additional filtering, it is not covered by the ClearType patents. === CoolType === Adobe created their own subpixel renderer called CoolType, allowing them to display documents the same way across various operating systems: Windows, MacOS, Linux etc. When it was launched around the year 2001, CoolType supported a wider range of fonts than Microsoft's ClearType, which at the time was limited to TrueType fonts. In contrast, Adobe's CoolType also supported PostScript fonts (and their OpenType equivalents). === macOS === Mac OS X (later OS X, now macOS) also used subpixel rendering, as part of Quartz 2D. 
However, it was removed after the introduction of Retina displays. Unlike Microsoft's implementation, which favors a tight fit to the grid (font hinting) to maximize legibility, Apple's implementation prioritizes the shape of the glyphs as set out by their designer.
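
The mechanism described under "Characteristics" can be illustrated with a short Python sketch for an RGB stripe layout: a row is sampled at three times the horizontal resolution, optionally passed through a filter that spreads each sample equally over three neighbouring subpixels, and then packed into (R, G, B) pixels. The equal-weight three-tap filter is an illustrative assumption based on the description above; it is not claimed to be the filter used by ClearType, FreeType or any other implementation, and the function names are hypothetical.

```python
def box_filter(samples):
    """Spread each subpixel sample equally over itself and its two neighbours."""
    n = len(samples)
    out = []
    for i in range(n):
        left = samples[i - 1] if i > 0 else 0.0
        right = samples[i + 1] if i < n - 1 else 0.0
        out.append((left + samples[i] + right) / 3.0)
    return out


def pack_rgb(samples):
    """Group a row sampled at 3x horizontal resolution into (R, G, B) pixels."""
    if len(samples) % 3:
        raise ValueError("row length must be a multiple of 3")
    return [tuple(samples[i:i + 3]) for i in range(0, len(samples), 3)]


# Three pixels wide, i.e. nine subpixel samples.
# Subpixels 1, 2 and 3 are lit: G and B of pixel 0 plus R of pixel 1,
# which is a white dot shifted one third of a pixel to the right.
shifted_dot = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
unfiltered = pack_rgb(shifted_dot)
filtered = pack_rgb(box_filter(shifted_dot))

# A line one subpixel wide that lands only on the green subpixel of pixel 1.
thin_line = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
thin_unfiltered = pack_rgb(thin_line)
thin_filtered = pack_rgb(box_filter(thin_line))
```

Without filtering, the thin line would be drawn in pure green. After filtering, its energy is shared equally between the red, green and blue subpixels of the same pixel, which removes the color fringe at the cost of some blur, the trade-off the text describes.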

    Read more →
  • Tiimo

    Tiimo

    Tiimo is an app designed to help neurodivergent individuals plan their lives. In August 2024 the company raised €1.4 million, bringing its total funding to €4.3 million. At that point it had over 500,000 users, including 50,000 paid users. The app has Apple Watch support and a learning platform that includes courses on well-being and neurodiversity. The app was founded by Helene Lassen Nørlem and Melissa Würtz Azari in 2015. After being a finalist in 2024, Tiimo won Apple's iPhone App of the Year award in December 2025. The premium version costs $10 per month and features an AI chatbot alongside the daily planner.

    Read more →
  • List of computer graphics journals

    List of computer graphics journals

    This list of computer graphics journals includes notable peer-reviewed scientific and academic journals that focus on computer graphics, visualization, and related areas such as rendering, animation, image processing, and geometric modeling. == Journals == ACM Transactions on Graphics; Computers & Graphics; IEEE Computer Graphics and Applications; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; Graphical Models; Journal of Computer Graphics Techniques; Presence: Teleoperators and Virtual Environments; Virtual Reality; Simulation & Gaming.

    Read more →