AI Chat Sites

AI Chat Sites — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • VSCO

    VSCO

    VSCO ( ), formerly known as VSCO Cam, is a photography mobile app available for iOS and Android devices. The app was created by Joel Flory and Greg Lutze. The VSCO app allows users to capture photos in the app and edit them, using preset filters and editing tools. == History == Visual Supply Company was founded by Joel Flory and Greg Lutze in California, in 2011. VSCO was launched in 2012. It raised $40 million from investors in May 2014. In 2017, VSCO launched a subscription model. As of 2018, Visual Supply Company has $90 million in funding from investors and over 2 million paying members. In 2019, VSCO acquired Rylo, a video editing startup founded by the original developer of Instagram’s Hyperlapse. Visual Supply Company has locations in Oakland, California, where it is headquartered, and Chicago, Illinois. In December 2020 VSCO acquired AI-powered video editing app Trash. In April 2018, VSCO reached over 30 million users. In September 2023, Eric Wittman was appointed as the new CEO and co-founder Joel Flory became executive chairman. == Usage == Users must register an account to use the app. Photos can be taken or imported from the camera roll, as well as short videos or animated GIFs (known in the app as DSCO; iOS only). The user can edit their photos through various preset filters, or through the "toolkit" feature which allows finer adjustments to fade, clarity, skin tone, tint, sharpness, saturation, contrast, temperature, exposure, and other properties. Users have the option of posting their photos to their profile, where they can also add captions and hashtags. Photos can also be exported back into the camera roll or shared with other social networking services. The users also have an option to edit their own videos from their camera roll with the VSCO yearly membership, but they are not able to post camera roll as VSCO Film X videos to their account on VSCO. JPEG and raw image files can be used. Research on image based social media platforms has found that engagement with posting, editing, and interacting with images can influence users' mood, self esteem, and body satisfaction. Studies also suggest that greater emotional investment in social media content is associated with increased negative psychological outcomes including stress and depressive symptoms. == In popular culture == VSCO's Oakland headquarters was a key filming location for Boots Riley's 2018 film Sorry to Bother You.

    Read more →
  • Abu Dhabi Autonomous Racing League

    Abu Dhabi Autonomous Racing League

    The Abu Dhabi Autonomous Racing League (A2RL) is an autonomous racing league based in Abu Dhabi and organized by ASPIRE, part of the UAE government's Advanced Technology Research Council. It has three distinct categories: the "car race", the drone race, and the buggy race. The first car race was held on 27 April 2024 at the Yas Marina Circuit, marking the first major autonomous formula race outside the US since the now-folded Roborace championship. The first drone race was held on 11 and 12 April 2025. == Formats == A2RL has three distinct formats, the formula racing format (dubbed the Car Race), the quadcopter drone racing format (dubbed the Drone Race), and the off-road dune buggy racing format (dubbed the Buggy Race). === Car Race === A2RL's main event, the car race is a standard formula racing format with self-driving formula cars. The cars are made by Dallara and are modified versions of Super Formula cars with Yokohama tires. These cars had the CPUs of their AIs mounted where the driver's seat is on a non-modified chassis, as well as hydraulic actuators for AI control of the vehicle, multiple sensor systems including LIDAR and GPS, and a large LED indicator showing the status of the AI. The first car race was held on 27 April 2024. This race was marked by the cars' subpar performance: Out of four cars that qualified, only two finished the race - the other two did not. The next race was held on 15 November 2025, with 11 teams. ==== Technical specifications ==== The full list of technical specifications are as follows: Chassis: Dallara EAV24 (modified Dallara SF23) Forward suspension: Pushrod type, torsion bar spring, adjustable dampers, third element Rear suspension: Pushrod type, torsion bar, coil springs, adjustable dampers, third element Tires: Yokohama Advan Drive-by-wire system: Provided by Meccanica 42, the DBW system consists of steering and brake actuators, with a central ECU that coordinates the driving actions and reacts to any critical situation in real-time. Brakes: Brembo calipers, Brembo carbon discs, electro-hydraulically activated Engine: 4 Piston Racing K20C1 (based on Honda 2.0l; turbocharged 4-cylinder engine) Gearbox: 3MO 6-speed gearbox Sensor suite: 7x Sony IMX728 cameras, 4x ZF ProWave radar units, 3x Seyond Falcon Kinetic lidar units Main computer: Neousys RGS-8805GC ==== Races held ==== === Drone Race === Created in partnership with the Drone Champions' League, the drone race is the quadcopter drone racing aerial format of the A2RL. The first race was held on 11/12 April 2025 at the ADNEC Marina Hall. 10 teams are scheduled to take part. === Buggy Race === The buggy race will be the off-road format of the A2RL using self-driving dune buggies. No date or number of teams has been announced for the first race. === Other events === A2RL is known to host AI vs AI and Human vs AI events, in Abu Dhabi and abroad. One such event took place at the Suzuka Circuit in Japan. The Human vs AI race was precluded due to AI car "Yalla" crashing into the wall during the formation lap. == Team lists ==

    Read more →
  • Akoma Ntoso

    Akoma Ntoso

    Akoma Ntoso (Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies, AKN) is an international technical standard for representing legal documents (executive, legislative, and judiciary) in a structured manner using a domain specific, legal XML vocabulary. The term akoma ntoso means "linked hearts" in the Akan language of West Africa. Akoma Ntoso is a legal document standard designed to serve as a basis for modern machine-readable and fully digital legislative and judicial processes. This is achieved by providing a coherent syntax and well-defined semantics to represent legal documents in a digital format. It is designed to be suitable as a common exchange format in all parliamentary, legal and judicial systems around the world. Taking advantage of the shared heritage present in all legal systems, Akoma Ntoso has been developed to have ample flexibility to respond to all the differences in texts, languages, and legal practices. Aiming to expand on certain common practices, the standard therefore has a broad scope. It includes a common extensible model for data (the document content) and metadata (such as bibliographic information and annotations). Specifically, as a common legal document standard for the interchange of legal documents it is designed to be highly flexible in its support of documents and functionalities, maintaining a large set of both structural and semantic building blocks (over 500 entities in version 3.0) for representing this wide variety of document types of virtually all legal traditions. It is extensible in order to allow for modifications to address the individual criteria of organizations or unique aspects of various legal practices and languages without sacrificing interoperability with other systems. Akoma Ntoso is as such part of a wider approach to developing open, non-proprietary technical standards for structuring legal documents and information under the name of Legal XML, which also includes formats and standards for, e.g., eContracts, eNotarization, electronic court filings, the technical representation of legal norms and rules (LegalRuleML) or technical standards for the interfaces of, e.g., litigant portal exchange platforms. Akoma Ntoso allows machine-driven processes to operate on the syntactic and semantic components of digital parliamentary, judicial and legislative documents, thus facilitating the development of high-quality information resources. It can substantially enhance the performance, accountability, quality and openness of parliamentary and legislative operations based on best practices and guidance through machine-assisted drafting and machine-assisted (legal) analysis. Embedded in the environment of the semantic web, it forms the basis for a heterogenous yet interoperable ecosystem, with which these tools can operate and communicate, as well as for future applications and use cases based on digital law or rule representation. == Definition == The Akoma Ntoso standard defines a set of machine readable electronic representations in XML format of the building blocks of parliamentary, legislative and judiciary documents. As official self-description, the standard (...) defines a set of simple, technology-neutral electronic representations of parliamentary, legislative and judiciary documents for e-services in a worldwide context and provides an enabling framework for the effective exchange of "machine readable" parliamentary, legislative and judiciary documents such as legislation, debate record, minutes, judgements, etc. Providing access to primary legal materials, parliamentary works and judiciaries documents is not just a matter of giving physical or on-line access to them. "Open access" requires the information to be described and classified in a uniform and organized way so that content is structured into meaningful elements that can be read and understood by software applications, so that the content is made "machine readable" and more sophisticated applications than on-screen display are made possible. The standard is composed of: an XML vocabulary that defines the mapping between the structure of legal documents and their equivalent in XML; specifications of an XML schema that defines the structure of legal documents in XML. They provide rich possibilities of description for several types of parliamentary, legislative and judiciary document, such as bills, acts and parliamentary records, judgments, or gazettes; a recommended naming convention for providing unique identifiers to legal sources based on FRBR model; a MIME type definition. == History and adoption == Akoma Ntoso started as an UNDESA project in 2004 within the initiative "Strengthening Parliaments' Information Systems in Africa". Its core vocabulary was created mostly by Monica Palmirani and Fabio Vitali, two professors from the Centre for Research in the History, Philosophy, and Sociology of Law and in Computer Science and Law (CIRSFID) of the University of Bologna. A first legislative text editor supporting Akoma Ntoso was developed in 2007 on the base of OpenOffice. In 2010 European Parliament developed an open source web-based application called AT4AM based on Akoma Ntoso for facilitating the production and the management of legislative amendments. Thanks to this project, the application of Akoma Ntoso could be extended to new type of documents (e.g. legislative proposal, transcript) and to other scenarios (e.g., multilingual translation process). Akoma Ntoso also was explicitly designed to be compliant with CEN Metalex, one of the other popular legal standards, which is used in the legislation.gov.uk. In 2012, the Akoma Ntoso specifications became the main working base for the activities of the LegalDocML Technical Committee within the LegalXML member section of OASIS. The "United States Legislative Markup" (USLM) schema for the United States Code (the US codified laws), developed in 2013, and the LexML Brasil XML schema for Brazilian legislative and judiciary documents, developed before, in 2008, were both designed to be consistent with Akoma Ntoso. The United States Library of Congress created the Markup of US Legislation in Akoma Ntoso challenge in July 2013 to create representations of selected US bills using the most recent Akoma Ntoso standard within a couple months for a $5000 prize, and the Legislative XML Data Mapping challenge in September 2013 to produce a data map for US bill XML and UK bill XML to the most recent Akoma Ntoso schema within a couple months for a $10000 prize. The National Archives of UK converted all the legislation in AKN in 2014. The availability of bulk legislation "moved the UK's ranking from fourth to first, in the 2014 Global Open Data Index, for legislation". The Senate of Italian Republic provides, since July 2016, all the bills in Akoma Ntoso as bulk in open data repository. The German Federal Ministry of the Interior started the project Elektronische Gesetzgebung ("Electronic Legislation") in 2015/2016 and published Version 1.0 of the German application profile "LegalDocML.de" in March 2020. The projects aim is to digitalize the entire legislative lifecycle from drafting to publication. Germany decided to adopt a model-driven development approach to creating and providing a subschema-based application profile in order to ensure interoperability among organizationally independent actors, each with their respective IT landscapes and tools. In this initial version LegalDocML.de covers draft bills in the form of laws, regulations and general administrative directives. As part of an ongoing development process, the standard could incrementally be expanded in future stages to include all relevant document types of parliamentary, legislative and promulgation processes and tools. The High-Level Committee on Management (HLCM), part of the United Nations System Chief Executives Board for Coordination, set up a Working Group on Document Standards that approved in April 2017 to adopt Akoma Ntoso as standard for modeling its documentation. Akoma Ntoso in its version 1.0 is finally adopted as OASIS standard in the frame of LegalDocML in August 2018.

    Read more →
  • IJCAI Award for Research Excellence

    IJCAI Award for Research Excellence

    The IJCAI Award for Research Excellence is a biannual award before given at the IJCAI conference to researcher in artificial intelligence as a recognition of excellence of their career. Beginning in 2016, the conference is held annually and so is the award. == Laureates == The recipients of this award have been: John McCarthy (1985) Allen Newell (1989) Marvin Minsky (1991) Raymond Reiter (1993) Herbert A. Simon (1995) Aravind Joshi (1997) Judea Pearl (1999) Donald Michie (2001) Nils Nilsson (2003) Geoffrey E. Hinton (2005) Alan Bundy (2007) Victor R. Lesser (2009) Robert Kowalski (2011) Hector Levesque (2013) Barbara Grosz (2015) for her pioneering research in Natural Language Processing and in theories and applications of Multiagent Collaboration. Michael I. Jordan (2016) for his groundbreaking and impactful research in both the theory and application of statistical machine learning. Andrew Barto (2017) for his pioneering work in the theory of reinforcement learning. Jitendra Malik (2018) Yoav Shoham (2019) Eugene Freuder (2020) Richard S. Sutton (2021) Stuart J. Russell (2022) Sarit Kraus (2023) for her pioneering work of the study of interactions among self-interested agents, creating the field of automated negotiation, and developing methods for coalition formation and teamwork, both as formal models and real-world implementations. == Winners of also Turing Award == John McCarthy (1971) Allen Newell (1975) Marvin Minsky (1969) Herbert A. Simon (1975) Judea Pearl (2011) Geoffrey Hinton (2018) Andrew Barto (2024) Richard S. Sutton (2024)

    Read more →
  • Autonomous logistics

    Autonomous logistics

    Autonomous logistics describes systems that provide unmanned, autonomous transfer of equipment, baggage, people, information or resources from point-to-point with minimal human intervention. Autonomous logistics is a new area being researched and currently there are few papers on the topic, with even fewer systems developed or deployed. With web enabled cloud software there are companies focused on developing and deploying such systems which will begin coming online in 2018. == Autonomous logistics vehicles == There are several subclasses of autonomous logistics vehicles: Ground autonomous logistics Based on Unmanned ground vehicle technology, a large autonomous logistics tracked carrier, which can be deployed in a tropical forest for day and night, has been developed. Another example is the TerraMax autonomous truck based on Oshkosh's Medium Tactical Vehicle Replacement (MTVR) military truck platform. Most recently, TerraMax competed in the 2007 Darpa Urban Challenge. The MTVR was designed for the U.S. Marine Corps with a 70% off-road mission profile. TerraMax's unmanned ground vehicle kit does not interfere with the conventional operation of the vehicle. A robust sensor suite allows for 360-degree situational awareness around TerraMax. Elements of the autonomous navigation kit could be used to enhance driver awareness. The complete kit could be used in applications such as snow removal on airport runways. Aerial autonomous logistics Based on unmanned aerial vehicle technology, aerial autonomous logistics (or logistics UAVs) provides transfer of resources and equipment in disaster relief situations, replenishment operations, reconnaissance operations where information is gathered, and general parcel or package delivery. Space autonomous logistics Describes the ability to provide logistics to and from space, be that orbital, lunar or beyond. Current space logistics vehicle examples are the Progress spacecraft, Russian expendable freighter uncrewed resupply spacecraft and the Automated Transfer Vehicle, expendable uncrewed resupply spacecraft developed by the European Space Agency. Above Water autonomous logistics Based on unmanned surface vehicle technology, this class of vehicles provides a range of surface fleet replenishment and equipment transfer capabilities. Subsea autonomous logistics Using autonomous underwater vehicle technology, these vehicles provide re-supply to underwater facilities, reconnaissance of underwater structures, emergency recovery capability, and so on. == Agent-based logistics == Shipping containers handle most of today's intercontinental transport of packaged goods. Managing them in terms of planning and scheduling is a challenging task due to the complexity and dynamics of the involved processes. Hence, recent developments show an increasing trend towards autonomous control with software agents acting on behalf of the logistic objects. Despite the high degree of autonomy it is still necessary to cooperate in order to achieve certain goals. The current trends and recent changes in logistics lead to new, complex and partially conflicting requirements for logistic planning and control systems. Due to the distributed nature of logistics, the usage of agent technology is promising. Due to the mobile nature of logistics, the usage of mobile agent technology is promising as well. Scenarios of usage of mobile agents in logistics has been envisioned.

    Read more →
  • Legal Knowledge Interchange Format

    Legal Knowledge Interchange Format

    The Legal Knowledge Interchange Format (LKIF) was developed in the European ESTRELLA project and was designed with the goal of becoming a standard for representing and interchanging policy, legislation and cases, including their justificatory arguments, in the legal domain. LKIF builds on and uses the Web Ontology Language (OWL) for representing concepts and includes a reusable basic ontology of legal concepts. The core of LKIF consists of a combination of OWL-DL and SWRL. LKIF was designed with two main roles in mind: the translation of legal knowledge bases written in different representation formats and formalisms and to be a knowledge representation formalism which could be part of larger architectures for developing legal knowledge systems.

    Read more →
  • IJCAI Award for Research Excellence

    IJCAI Award for Research Excellence

    The IJCAI Award for Research Excellence is a biannual award before given at the IJCAI conference to researcher in artificial intelligence as a recognition of excellence of their career. Beginning in 2016, the conference is held annually and so is the award. == Laureates == The recipients of this award have been: John McCarthy (1985) Allen Newell (1989) Marvin Minsky (1991) Raymond Reiter (1993) Herbert A. Simon (1995) Aravind Joshi (1997) Judea Pearl (1999) Donald Michie (2001) Nils Nilsson (2003) Geoffrey E. Hinton (2005) Alan Bundy (2007) Victor R. Lesser (2009) Robert Kowalski (2011) Hector Levesque (2013) Barbara Grosz (2015) for her pioneering research in Natural Language Processing and in theories and applications of Multiagent Collaboration. Michael I. Jordan (2016) for his groundbreaking and impactful research in both the theory and application of statistical machine learning. Andrew Barto (2017) for his pioneering work in the theory of reinforcement learning. Jitendra Malik (2018) Yoav Shoham (2019) Eugene Freuder (2020) Richard S. Sutton (2021) Stuart J. Russell (2022) Sarit Kraus (2023) for her pioneering work of the study of interactions among self-interested agents, creating the field of automated negotiation, and developing methods for coalition formation and teamwork, both as formal models and real-world implementations. == Winners of also Turing Award == John McCarthy (1971) Allen Newell (1975) Marvin Minsky (1969) Herbert A. Simon (1975) Judea Pearl (2011) Geoffrey Hinton (2018) Andrew Barto (2024) Richard S. Sutton (2024)

    Read more →
  • Smart speaker

    Smart speaker

    A smart speaker is a type of loudspeaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one "wake word" (or several "wake words"). Some smart speakers also act as smart home hubs by using Wi-Fi, Bluetooth, Thread, and other protocol standards to extend usage beyond audio playback and control home automation devices connected through a local area network. == History == Early voice-activated devices began in 2013 with MIT's Jasper project, which used multiple microphones and cloud software to power hands-free interactions from across a room. The first commercial smart speaker was the Amazon Echo, which was released in 2014 powered by Alexa and a ring of far-field microphones. Google followed in 2016 with Home, powered by Google Assistant. By 2017, devices like the Echo Show and Home Hub (later called Nest Hub) added touchscreens and video, creating the "smart display" subcategory. In 2018, Apple joined the smart speaker trend by launching the HomePod, which focused on high-quality audio alongside their built-in assistant Siri. ASUS release its own smart Speaker Xiao-Bu in 2019 with Artificial Intelligence, it terminates the Cloud Service on June 1st, 2025, which means all real-time service such as weather, news, currency conversion is affected. Sonos's 1st smart speaker Sonos One released in 2017, powered by Alexa. Invoke by Harman Kardon was powered by Microsoft's intelligent personal assistant, Cortana. In the early 2020s, smart speakers gained on-device voice processing for faster responses and improved privacy. New standards such as Matter and Thread allowed multitudes of smart-home devices (even from completely different brands) to work together. == Features == === Audio and Voice === Smart speakers use multiple microphones along with noise-cancelling software to pick up your voice from across the room, even when music is playing or the assistant is already talking. Noise suppression and echo cancellation is also used by the speaker so it can focus in on who is talking and ignore any background noises. Most smart speaker models can recognize who is speaking by voiceprint, which allows the speaker to grab information from that person's calendar, preferences, or music playlists. Listening to music on a speaker is when importance for good audio quality becomes apparent. Entry-level (cheaper) speakers such as the Home Mini or the Echo Dot have a single full-range driver. These lower-end speakers typically aren't great for listening to music as the audio quality is pretty poor. More advanced units such as the Home Max or Echo Studio have separate tweeters and woofers meant for listening to music in high quality. === Connectivity and smart-home control === Most connect over Wi-Fi or Bluetooth and support hub protocols like Thread and Matter. That lets them not only stream and play music but also allows you to control various brands of smart lights, thermostats, door locks, cameras, and much more-all from one point of control. Each can have its own designated interface and features in-house, usually launched or controlled via application or home automation software. These devices are able to communicate with each other via peer-to-peer connection through mesh networking. These speakers and related smart devices are typically controlled with one smartphone application. === Assistant services and skills === The built-in assistants handle timers, alarms, reminders, news briefings, weather updates, send messages to other smart devices, send texts, make calls, and simple questions. You can combine actions together in what are typically known as routines (for example saying "good morning" turns on lights, starts the coffee, says the weather, and reads the news) and add extra functions known as skills or actions (for things like ordering food or playing trivia games). This hands-free use of smart speakers can help assist those with disabilities. Most other technologies need the user to be able to physically interact with the device. Smart speakers are not bound by these limitations and can serve as an excellent tool for those who are unable to use their arms or legs or have vision issues. Although these tasks can be completed by a phone or computer, consumers tend to lean towards smart speakers due to factors such as their range being much greater than that of a phone and the need to not have to physically interact with the speaker to get the voice assistant as with most smartphones, certain parts of a phone may need to be interacted with to activate the speaking assistant. === Smart displays === Some smart speakers also include a screen to show the user a visual response. A smart speaker with a touchscreen is known as a smart display; these integrate a conversational user interface with display screens to augment voice interaction with images and video. They are powered by one of the common voice assistants and offer additional controls for smart home devices, feature streaming apps, and web browsers with touch controls for selecting content. The first smart displays were introduced in 2017 by Amazon (Amazon Echo Show) and Google (Google/Nest Home Hub). Hotel chain Marriott International partnered with Amazon to install Echo devices in select hotels since 2018. A Taiwanese startup, Aiello, launched the Aiello Voice Assistant (AVA) in the Asian hotel market in 2019, claiming it is powered by a multi-AI model system. Angie by Nomadix, which is similar to the Amazon Echo, launched its first product in 2017, specifically targeting hotel properties in the North America. In May 2019, Angie Hospitality acquired the assets of Roxy, a competitor that also built its own speech-enabled virtual assistant technology for hotels. This acquisition merged two proprietary NLP stacks into the current Nomadix product. === Artificial intelligence === The newest speakers can use on-device AI or cloud-based generative models to allow the smart speaker to carry on much more natural conversations, draft emails or recipes, suggest ideas based on context, or even create short pieces of music or art. This AI evolution allows these speakers to do far more than what they could do before. == Accuracy == According to a study by Proceedings of the National Academy of Sciences of the United States of America released In March 2020, the six biggest tech development companies, Amazon, Apple, Google, Yandex, IBM and Microsoft, have misidentified more words spoken by "black people" than "white people". The systems tested errors and unreadability, with a 19 and 35 percent discrepancy for the former and a 2 and 20 percent discrepancy for the latter. The North American Chapter of the Association for Computational Linguistics (NAACL) also identified a discrepancy between male and female voices. According to their research, Google's speech recognition software is 13 percent more accurate for men than women. It performs better than the systems used by Bing, AT&T, and IBM. == Privacy concerns == The built-in microphone in smart speakers is continuously listening for wake words followed by a command. However, these continuously listening microphones also raise privacy concerns among users. According to a survey taken by 1,007 people in Western Europe, it is clear that privacy is the biggest concern holding consumers back from buying "smart" products. these concerns include what is being recorded, how the data will be used, how it will be protected, and whether it will be used for invasive advertising. Furthermore, an analysis of Amazon Echo Dots showed that 30–38% of "spurious audio recordings were human conversations", suggesting that these devices capture audio other than strictly detection of the wake word. === As a wiretap === There are strong concerns that the ever-listening microphone of smart speakers presents a perfect candidate for wiretapping. In 2017, British security researcher Mark Barnes showed that pre-2017 Echos have exposed pins which allow for a compromised OS to be booted. According to Umar Iqbal, an assistant professor at Washington University in St. Louis, research indicates that data from consumer interactions with Alexa was used to targeted advertisements and products to consumer with over 40% of transmitted data lacking proper encryption raising privacy concerns. Further data indicates that due to the Smart Speakers ability to always capture audio, it begins to pick up on external conversations from consumers not related to commands given to the smart speaker. Things such as other members in the household, consumers on the phone and even TV audio can be picked up by these speakers and stored for future use by companies. === Voice assistance vs privacy === While voice assistants provide a valuable service, there can be some hesitation towards using them in various social contexts, such as in public or around other users. However, only more recently have users begun interac

    Read more →
  • Digital image processing

    Digital image processing

    Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; second, the development of mathematics (especially the creation and improvement of discrete mathematics theory); and third, the demand for a wide range of applications in environment, agriculture, military, industry and medical science has increased. == History == Many of the techniques of digital image processing, or digital picture processing as it often was called, were developed in the 1960s, at Bell Laboratories, the Jet Propulsion Laboratory, Massachusetts Institute of Technology, University of Maryland, and a few other research facilities, with application to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photograph enhancement. The purpose of early image processing was to improve the quality of the image. In image processing, the input is a low-quality image, and the output is an image with improved quality. Common image processing includes image enhancement, restoration, encoding, and compression. The first successful application was the American Jet Propulsion Laboratory (JPL). They used image processing techniques such as geometric correction, gradation transformation, noise removal, etc. on the thousands of lunar photos sent back by the Space Detector Ranger 7 in 1964, taking into account the position of the Sun and the environment of the Moon. The impact of the successful mapping of the Moon's surface map by the computer has been a success. Later, more complex image processing was performed on the nearly 100,000 photos sent back by the spacecraft, so that the topographic map, color map and panoramic mosaic of the Moon were obtained, which achieved extraordinary results and laid a solid foundation for human landing on the Moon. The cost of processing was fairly high, however, with the computing equipment of that era. That changed in the 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. This led to images being processed in real-time, for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they started to take over the role of dedicated hardware for all but the most specialized and computer-intensive operations. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest. === Image sensors === The basis for modern image sensors is metal–oxide–semiconductor (MOS) technology, invented at Bell Labs between 1955 and 1960, This led to the development of digital semiconductor image sensors, including the charge-coupled device (CCD) and later the CMOS sensor. The charge-coupled device was invented by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. MOS image sensors are widely used in optical mouse technology. The first optical mouse, invented by Richard F. Lyon at Xerox in 1980, used a 5 μm NMOS integrated circuit sensor chip. Since the first commercial optical mouse, the IntelliMouse introduced in 1999, most optical mouse devices use CMOS sensors. === Image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression became the basis for JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. Its highly efficient DCT compression algorithm was largely responsible for the wide proliferation of digital images and digital photos, with several billion JPEG images produced every day as of 2015. Medical imaging techniques produce very large amounts of data, especially from CT, MRI and PET modalities. As a result, storage and communications of electronic image data are prohibitive without the use of compression. JPEG 2000 image compression is used by the DICOM standard for storage and transmission of medical images. The cost and feasibility of accessing large image data sets over low or various bandwidths are further addressed by use of another DICOM standard, called JPIP, to enable efficient streaming of the JPEG 2000 compressed image data. === Digital signal processor (DSP) === Electronic signal processing was revolutionized by the wide adoption of MOS technology in the 1970s. MOS integrated circuit technology was the basis for the first single-chip microprocessors and microcontrollers in the early 1970s, and then the first single-chip digital signal processor (DSP) chips in the late 1970s. DSP chips have since been widely used in digital image processing. The discrete cosine transform (DCT) image compression algorithm has been widely implemented in DSP chips, with many companies developing DSP chips based on DCT technology. DCTs are widely used for encoding, decoding, video coding, audio coding, multiplexing, control signals, signaling, analog-to-digital conversion, formatting luminance and color differences, and color formats such as YUV444 and YUV411. DCTs are also used for encoding operations such as motion estimation, motion compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding, variable encoding, and motion vectors, and decoding operations such as the inverse operation between different color formats (YIQ, YUV and RGB) for display purposes. DCTs are also commonly used for high-definition television (HDTV) encoder/decoder chips. == Tasks == Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analogue means. In particular, digital image processing is a concrete application of, and a practical technology based on: Classification Feature extraction Multi-scale signal analysis Pattern recognition Projection Some techniques that are used in digital image processing include: Anisotropic diffusion Hidden Markov models Image editing Image restoration Independent component analysis Linear filtering Neural networks Partial differential equations Pixelation Point feature matching Principal components analysis Self-organizing maps Wavelets == Digital image transformations == === Filtering === Digital filters are used to blur and sharpen digital images. Filtering can be performed by: convolution with specifically designed kernels (filter array) in the spatial domain masking specific frequency regions in the frequency (Fourier) domain The following examples show both methods: ==== Image padding in Fourier domain filtering ==== Images are typically padded before being transformed to the Fourier space, the highpass filtered images below illustrate the consequences of different padding techniques: Notice that the highpass filter shows extra edges when zero padded compared to the repeated edge padding. ==== Filtering code examples ==== MATLAB example for spatial domain highpass filtering. === Affine transformations === Affine transformations enable basic image transformations including scale, rotate, translate, mirror and shear as is shown in the following examples: To apply the affine

    Read more →
  • Distributed artificial intelligence

    Distributed artificial intelligence

    Distributed Artificial Intelligence (DAI) (also called Decentralized Artificial Intelligence) is a melding of artificial intelligence with distributed computing. From artificial intelligence comes the theory and technology for constructing or analyzing an intelligent system. But where artificial intelligence uses psychology as a source of ideas, inspiration, and metaphor, DAI uses sociology, economics, and management science for inspiration. Where the focus of artificial intelligence is on the individual, the focus of DAI is on the group. Distributed computing provides the computational substrate on which this group focus can occur. Using techniques from artificial intelligence, communication theory, control theory, and interaction theory, it produces a cooperative solution to problems by a decentralized group of computational entities (agents). DAI is closely related to and a predecessor of the field of multi-agent systems. They are distinguished generally by multi-agent systems being open, where the entities might arise from different interests and have individual goals, and distributed artificial-intelligence systems, where the entities have common goals. There are numerous applications and tools. == Definition == Distributed Artificial Intelligence (DAI) is an approach to solving complex learning, planning, and decision-making problems. It is embarrassingly parallel, thus able to exploit large scale computation and spatial distribution of computing resources. These properties allow it to solve problems that require the processing of very large data sets. DAI systems consist of autonomous learning processing nodes (agents), that are distributed, often at a very large scale. DAI nodes can act independently, and partial solutions are integrated by communication between nodes, often asynchronously. By virtue of their scale, DAI systems are robust and elastic, and by necessity, loosely coupled. Furthermore, DAI systems are built to be adaptive to changes in the problem definition or underlying data sets due to the scale and difficulty in redeployment. DAI systems do not require all the relevant data to be aggregated in a single location, in contrast to monolithic or centralized Artificial Intelligence systems, which have tightly coupled and geographically close processing nodes. Therefore, DAI systems often operate on sub-samples or hashed impressions of very large datasets. In addition, the source dataset may change or be updated during the course of the execution of a DAI system. == Development == In 1975 distributed artificial intelligence emerged as a subfield of artificial intelligence that dealt with interactions of intelligent agents. As a scientific discipline, it progressed through a series of workshops in the USA (International Workshop on Distributed Artificial Intelligence, held in 13 editions from 1978 - 1994), Europe (Workshop on Modelling Autonomous Agents in a Multi-Agent World https://link.springer.com/conference/maamaw), and Asia (Multi-Agent and Cooperative Computation Workshop (MACC) https://sites.google.com/view/sig-macc/macc-workshop?authuser=0). Distributed artificial intelligence systems were conceived as a group of intelligent entities, called agents, that interacted by cooperation, by coexistence, or by competition. DAI is categorized into multi-agent systems and distributed problem solving. In multi-agent systems the main focus is how agents coordinate their knowledge and activities. For distributed problem solving the major focus is how the problem is decomposed and the solutions are synthesized. == Goals == The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled Coordination of the actions and communication of the nodes Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios. == Approaches == Two types of DAI has emerged: In Multi-agent systems agents coordinate their knowledge and activities and reason about the processes of coordination. Agents are physical or virtual entities that can act, perceive their environment, and communicate with other agents. An agent is autonomous and has skills to achieve goals. The agents change the state of their environment by their actions. There are a number of different coordination techniques. In distributed problem solving the work is divided among nodes and the knowledge is shared. The main concerns are task decomposition and synthesis of the knowledge and solutions. DAI can apply a bottom-up approach to AI, similar to the subsumption architecture as well as the traditional top-down approach of AI. In addition, DAI can also be a vehicle for emergence. === Challenges === The challenges in Distributed AI are: How to carry out communication and interaction of agents and which communication language or protocols should be used. How to ensure the coherency of agents. How to synthesise the results among 'intelligent agents' group by formulation, description, decomposition and allocation. == Applications and tools == Areas where DAI have been applied are: Electronic commerce, e.g. for trading strategies the DAI system learns financial trading rules from subsamples of very large samples of financial data Networks, e.g. in telecommunications the DAI system controls the cooperative resources in a WLAN network Routing, e.g. model vehicle flow in transport networks Scheduling, e.g. flow shop scheduling where the resource management entity ensures local optimization and cooperation for global and local consistency Search engines, e.g. in LLM federated search like Ithy where document retrieval and analysis are distributed to DAI agents before aggregation Multi-Agent systems, e.g. artificial life, the study of simulated life Electric power systems, e.g. Condition Monitoring Multi-Agent System (COMMAS) applied to transformer condition monitoring, and IntelliTEAM II Automatic Restoration System DAI integration in tools has included: ECStar is a distributed rule-based learning system. == Agents == === Systems: Agents and multi-agents === Notion of Agents: Agents can be described as distinct entities with standard boundaries and interfaces designed for problem solving. Notion of Multi-Agents: Multi-Agent system is defined as a network of agents which are loosely coupled working as a single entity like society for problem solving that an individual agent cannot solve. === Software agents === The key concept used in DPS and MABS is the abstraction called software agents. An agent is a virtual (or physical) autonomous entity that has an understanding of its environment and acts upon it. An agent is usually able to communicate with other agents in the same system to achieve a common goal, that one agent alone could not achieve. This communication system uses an agent communication language. A first classification that is useful is to divide agents into: reactive agent – A reactive agent is not much more than an automaton that receives input, processes it and produces an output. deliberative agent – A deliberative agent in contrast should have an internal view of its environment and is able to follow its own plans. hybrid agent – A hybrid agent is a mixture of reactive and deliberative, that follows its own plans, but also sometimes directly reacts to external events without deliberation. Well-recognized agent architectures that describe how an agent is internally structured are: ASMO (emergence of distributed modules) BDI (Believe Desire Intention, a general architecture that describes how plans are made) InterRAP (A three-layer architecture, with a reactive, a deliberative and a social layer) PECS (Physics, Emotion, Cognition, Social, describes how those four parts influences the agents behavior). Soar (a rule-based approach)

    Read more →
  • Adaptive neuro fuzzy inference system

    Adaptive neuro fuzzy inference system

    An adaptive neuro-fuzzy inference system or adaptive network-based fuzzy inference system (ANFIS) is a kind of artificial neural network that is based on Takagi–Sugeno fuzzy inference system, a class of fuzzy models introduced by Tomohiro Takagi and Michio Sugeno for system identification and control. The technique was developed in the early 1990s. Since it integrates both neural networks and fuzzy logic principles, it has potential to capture the benefits of both in a single framework. Its inference system corresponds to a set of fuzzy IF–THEN rules that have learning capability to approximate nonlinear functions. Hence, ANFIS is considered to be a universal estimator. For using the ANFIS in a more efficient and optimal way, one can use the best parameters obtained by genetic algorithm. It has uses in intelligent situational aware energy management system. == ANFIS architecture == It is possible to identify two parts in the network structure, namely premise and consequence parts. In more details, the architecture is composed by five layers. The first layer takes the input values and determines the membership functions belonging to them. It is commonly called fuzzification layer. The membership degrees of each function are computed by using the premise parameter set, namely {a,b,c}. The second layer is responsible of generating the firing strengths for the rules. Due to its task, the second layer is denoted as "rule layer". The role of the third layer is to normalize the computed firing strengths, by dividing each value for the total firing strength. The fourth layer takes as input the normalized values and the consequence parameter set {p,q,r}. The values returned by this layer are the defuzzificated ones and those values are passed to the last layer to return the final output. === Fuzzification layer === The first layer of an ANFIS network describes the difference to a vanilla neural network. Neural networks in general are operating with a data pre-processing step, in which the features are converted into normalized values between 0 and 1. An ANFIS neural network doesn't need a sigmoid function, but it's doing the preprocessing step by converting numeric values into fuzzy values. Here is an example: Suppose, the network gets as input the distance between two points in the 2d space. The distance is measured in pixels and it can have values from 0 up to 500 pixels. Converting the numerical values into fuzzy numbers is done with the membership function which consists of semantic descriptions like near, middle and far. Each possible linguistic value is given by an individual neuron. The neuron “near” fires with a value from 0 until 1, if the distance is located within the category "near". While the neuron “middle” fires, if the distance in that category. The input value “distance in pixels” is split into three different neurons for near, middle and far.

    Read more →
  • Nature Manifesto

    Nature Manifesto

    Nature Manifesto is an Immersive sound piece and multimedia installation by Icelandic artist Björk and artist and curator Aleph Molinari, created in collaboration with the French Institute for Research and Coordination in Acoustics/Music (IRCAM). The installation was showcased at the Centre Pompidou in Paris, France from November 20, 2024 to December 9, 2024, as part of the museum's "Biodiversity: Which Culture for Which Future?" forum. It combines natural soundscapes, calls of extinct animals reconstructed through artificial intelligence, and Björk's narration to address damages to biodiversity and the collapse of ecosystems. == Background == Björk's work intricately weaves themes of nature and technology, reflecting her deep engagement with both realms. In 2008, she co-founded the Náttúra campaign to protest the construction of foreign-backed aluminum factories in Iceland, aiming to protect the country's natural landscapes. She released the single "Náttúra" featuring Thom Yorke, with all proceeds supporting this environmental initiative. Her 2011 album Biophilia further exemplifies this synthesis, exploring the relationships between music, nature, and technology through a multimedia project that included interactive apps, custom-made instruments, and educational workshops. Björk's Cornucopia tour (2019-2023) seamlessly integrates themes of nature preservation and environmental activism, and featured a recorded message by Swedish climate activist Greta Thunberg. The tour's fusion of music, technology, and natural imagery reflects Björk's vision of a harmonious coexistence between humanity and nature, advocating for sustainable futures. Björk has previously used artificial intelligence in her works. In 2020, she collaborated with Microsoft to create Kórsafn, a sound installation for the Sister City Hotel lobby in New York City which used an AI-powered model that elaborated choral recordings from her discography through a sensor on the rooftop of the building that would generate music according to data like the weather and the seasons. For her charity single "Oral", featuring Spanish singer Rosalía, she released a music video directed by photographer and visual artist Carlota Guerrero, who used AI-generated deepfake versions of the artists. == Concept == Nature Manifesto is a three-minute and forty-second immersive sound piece. The composition merges Björk's voice, as she articulates a manifesto on biodiversity and the climate crisis, with cries of extinct and endangered animals, harmonizing them with natural soundscapes. The installation was curated by Chloé Siganos and Aleph Molinari, with associate curator Delphine Le Gatt. The primary goal of Nature Manifesto is to foster a deeper understanding of humanity's impact on the natural world. Conceived as a "post-optimistic" manifesto, Aleph Molinari stated that the project's purpose was to "offer a voice to nature". He stated that "the modern concept of nature itself is problematic [...] because it’s a concept born in the Romantic period and, with the rise of the industrial era, became an antithesis to human civilisation and everything urban. Nature came to define what was outside, the savage Other... But nature is everything that we’re part of." The soundscape features recreated calls of extinct and endangered species, developed in collaboration with the French sound research institute IRCAM. Artificial intelligence was employed to simulate the vocalizations of animals that no longer exist in the wild. To save energy and lessen the ecological impact of the use of AI, the research institute developed a "frugal AI" model capable of generating audio in real-time on local servers without a graphics processing unit. The sounds were then produced and edited by Björk in collaboration with Robin Meier Wiratunga and Bergur Þórisson. The installation was located within the Centre Pompidou's escalator, known as the "caterpillar". The installation was further supported by videos created by visual artist Sam Balfus (also known as Balfua) by using artificial intelligence, and edited by Santiago Molinari. == Activism == To sustain and broaden the themes presented in Nature Manifesto, Björk publicly urged French President Emmanuel Macron to prohibit bottom trawling within France's marine protected areas (MPA). She criticized the French government's claim of protecting 30% of its marine territories, highlighting that over 90% of these MPAs exist only on paper, allowing destructive practices like bottom trawling to continue unchecked. She collaborated with non-governmental organizations Sustainable Ocean Alliance, Ungir umhverfissinnar and Bloom, to advocate for genuine ocean conservation. Björk promoted the cause through her social media profiles by sharing petitions. In November 2024, Björk lent her Instagram account to French environmental activists to directly address Macron. The activists used the platform to call for stronger protection of the ocean, urging Macron to impose stricter restrictions on harmful fishing practices, particularly bottom trawling. == Reception == Nature Manifesto received mixed to positive reviews from critics. Some critiques focused on the installation's setting, suggesting that the movement inherent to the escalator space diminished the immersive potential of the soundscape. The choice of using artificial intelligence was also questioned. Björk and Molinari defended this, as both see AI as a tool that can be used creatively and sustainably, with Björk focusing on the importance of human input to give AI a "soul", and Molinari stressing the need for sustainable technological practices in the broader context of digital life. After the exhibition ended, Björk further opinionated: "this is how we will work in the future. [...] if there is no soul in tomorrow's music made by AI it is because [no one] put it there and we have to speak out and guard this as listeners", further stating that there is already "soulless muzak" [sic] on Spotify, "mass manufactured without the attention of creativity".

    Read more →
  • Rapid PHP Editor

    Rapid PHP Editor

    rapid PHP Editor is a PHP Editor that incorporates many functions such as AutoComplete, Syntax checker, debugger and many other tools for fast PHP development. Rapid PHP Editor also contain other development tools for helping on HTML, CSS, JavaScript and many other languages. Is part of a family of products covering most aspects of modern web development integrating as well many other capabilities used by developers. Some features: (X)HTML to HTML5 CSS to CSS3 Code intelligence Powerful search and replace Support for several frameworks Code beautifier FTP Explorer (FTP/SFTP/FTPS) File explorer Database explorer Code snippets Validators and Debuggers FAST, real fast Many other tools available (many more to describe all here) == History == Rapid PHP Editor was built using the Delphi programming language.

    Read more →
  • Tip and cue

    Tip and cue

    Tip and cue, sometimes referred to as tip and que, tipping and cueing, or tipping and queing, is a method for satellite imagery and reconnaissance satellites to automatically coordinate tracking of objects across different satellites in real or near real-time. This technique ensures continuous tracking of targets as they move across different regions by handing them off between satellites, sharing satellite imagery and collateral across discrete satellites. The coordination between various satellites and their complementary sensors allows for more accurate and efficient data collection. This system is particularly useful in scenarios requiring real-time monitoring and rapid response; the method significantly improves situational awareness and operational effectiveness. Tip and cue techniques involve integrating various sensor systems, each playing a specific role in the tracking process. As a target moves, it is handed off from one satellite to another, ensuring continuous monitoring. This coordination optimizes data collection and analysis, enhancing overall tracking accuracy. The real-time information gathered by these satellites is critical for decision-making in various applications, including defense and surveillance. By leveraging multiple satellites and their sensors, it provides broader coverage and more reliable tracking, and the continuous handoff between satellites ensures there are no gaps in monitoring, essential for high-stakes applications. The real-time data provided by this system allows for timely and informed decisions, improving response times and outcomes. Tip and cue methodologies are a part of geospatial intelligence, or GEOINT. Robert Cardillo, a former director of the National Geospatial-Intelligence Agency, highlighted the importance of tip and cue methods to their data collection efforts in 2015. == Historical Development == The concept of tip and cue in satellite monitoring has its origins in early military applications designed to enhance missile detection and tracking systems. During the Cold War, advancements in infrared sensing technologies laid the groundwork for more sophisticated tip and cue techniques. The integration of different sensor types, such as radar and optical sensors, in the 1990s expanded the capabilities of tip and cue systems beyond military applications. These advancements have made tip and cue techniques essential for various civilian uses, including disaster monitoring and environmental surveillance. Significant progress was made with the advent of high-speed data processing and communication technologies in the early 2000s, further refining the method. Advanced algorithms and data fusion techniques have been introduced to better integrate information from multiple sensors. Machine learning technologies now play a crucial role in improving detection and prediction capabilities, allowing for more adaptive and efficient tracking. Richmond and Brennan of Lockheed Martin, presenting to the annual technical conference of the Maui Space Surveillance Complex (formerly the Air Force Maui Optical Station (AMOS)), discussed the algorithms needed for 'tip and cue', to facilitate "multi-phenomenology data fusion." The Space Surveillance Telescope (SST) at Naval Communication Station Harold E. Holt in Australia, operated by the United States Space Force and designed by the Massachusetts Institute of Technology Lincoln Laboratory, was reported by the Defense Advanced Research Projects Agency (DARPA) to be a leader in creating and improving tip and cue techniques, from a large library of orbital object data. == Technical overview == Tip and cue systems utilize a network of at least two satellites equipped with complementary sensor technologies to track moving objects in real-time. The method involves detecting a target with a primary sensor, such as an infrared or photographic sensor, which then cues secondary sensors on the same or other satellites for more detailed monitoring. This handoff process between discrete systems ensures continuous tracking as the target moves across different areas, leveraging each systems strengths. Data collected by these systems and sensors are rapidly processed and shared among the network, enhancing situational awareness. This coordination optimizes resource usage and improves the accuracy of tracking moving objects over large areas. The primary sensors detect initial targets based on specific signatures, such as heat or movement, and then cue secondary sensors to gather more precise data. This ensures that each sensor operates within its optimal range, maintaining high tracking accuracy and reliability. The integration of various sensor types, including optical, radar, and infrared, allows the system to function effectively under different conditions and environments. Real-time data processing and communication between satellites and ground stations are crucial for timely and accurate target tracking. Satellites using tip and cue processes may use either passive or active scanning methodoloigies. These systems may also leverage both orbital and ground-based ELINT (electronic signals intelligence). == Known use cases == Tip and cue systems have been extensively utilized in military applications, particularly for missile detection and defense. These systems enable early detection of missile launches using infrared sensors, which then cue other sensors to track the missile's trajectory more accurately. In environmental monitoring, tip and cue techniques help track natural disasters such as wildfires and hurricanes by coordinating various satellite sensors for comprehensive data collection and analysis. Surveillance and reconnaissance operations also benefit from tip and cue systems, which provide continuous and precise tracking of moving objects, enhancing situational awareness. Additionally, these systems are used in maritime surveillance to monitor ship movements and detect illegal activities such as smuggling and piracy. Tip and cue systems are used in disaster management. For instance, during wildfires, infrared sensors can detect heat signatures, prompting other sensors to gather detailed imagery and data on fire spread and intensity. This coordinated approach allows for real-time monitoring and rapid response, crucial for mitigating damage and saving lives. Similarly, in hurricane tracking, satellites equipped with various sensors can monitor storm development and progression, providing timely information for emergency management agencies. The integration of multiple sensor types ensures accurate and comprehensive coverage of these dynamic and fast-changing events. In maritime surveillance, or maritime domain awareness (MDA), tip and cue systems enhance the detection and monitoring of vessel movements, contributing to maritime security. By coordinating satellite sensors, these systems can track ships over vast ocean areas, identifying potential threats or illegal activities such as smuggling, piracy, and illegal fishing. The ability to maintain continuous surveillance and share data in real-time with maritime authorities improves response times and enforcement capabilities. This application of tip and cue systems not only aids in law enforcement but also supports environmental conservation efforts by monitoring protected marine areas. Automatic Identification System (AIS) is one of the most important sources of data for the MDA agencies. AIS is used in order for ships to know each other's whereabouts, they transmit a signal from ship to ship and to shore. Lately, the system has been developed into satellite system, so called satellite AIS, which makes the system more effective. All ocean-going vessels above 300 tons, are supposed to use and transmit via AIS according to the International Maritime Organization. The satellite constellations help facilitate this with tip and cue methodologies.

    Read more →
  • Whisper (speech recognition system)

    Whisper (speech recognition system)

    Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been showed to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. At around the 2010s, deep neural network approaches became more common for speech recognition models, which were enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited due to their inability to capture sequential data, which later led to developments of Seq2seq approaches, which include recurrent neural networks, which made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range in machine learning, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. According to a NYT report, in 2021 OpenAI believed they exhausted sources of higher-quality data to train their large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 being released in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel Log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). If the token is not present, then the decoder predicts timestamps relative to the segment, and quantized to 20 ms intervals. <|nospeech|> for voice activity detection. <|startoftranscript|>, and <|endoftranscript|> . Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. For the files still remaining after the filtering process, audio files were then broken into 30-second segments paired with the subset of the transcript that occurs within that time. If this predicted spoken language differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with batch size 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine if the source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, it was fine-tuned to suppress the prediction of speaker names and low-quality sources were then removed. == Capacity == While Whisper does not outperform models which specialize in the LibriSpeech dataset, when tested across many datasets, it is more robust and makes 55.2% fewer errors than other models. Whisper has a differing error rate with respect to transcribing different languages, with a higher word error rate in languages not well-represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every 10 transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. == Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts that were generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. Another application is when OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultral heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to make sure accuracy, formatting, and accessibility are all standard.

    Read more →