AI Chat Youtube

AI Chat Youtube — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Text simplification

    Text simplification

    Text simplification is an aspect of natural language processing that involves modifying, organizing, or categorizing existing text to make it easier to understand while retaining its original meaning. This process is essential in today's world, where communication is increasingly complex due to advancements in science, technology, and media. Human languages are inherently intricate, with extensive vocabularies and complex structures that can be challenging for machines to handle efficiently. Researchers have found that semantic compression techniques can help streamline and simplify text by reducing linguistic diversity and simplifying the vocabulary used in a given context. == Example == Text simplification involves modifying complex sentences into simpler ones to enhance readability and comprehension. Siddharthan (2006) provides an example to illustrate this process. The original sentence contains multiple clauses and phrases, which can be broken down into simpler sentences for better understanding. Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold. Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today. An approach to text simplification involves lexical simplification via lexical substitution, a process that replaces complex words with simpler synonyms. Identifying complex words is a challenge addressed by machine learning classifiers trained on labeled data. Researchers have found that asking labelers to sort words by complexity levels yields more consistent results than the traditional method of categorizing words as simple or complex.

    Read more →
  • Anthrobotics

    Anthrobotics

    Anthrobotics is the science of developing and studying robots that are either entirely or in some way human-like. The term anthrobotics was originally coined by Mark Rosheim in a paper entitled "Design of An Omnidirectional Arm" presented at the IEEE International Conference on Robotics and Automation, May 13–18, 1990, pp. 2162–2167. Rosheim says he derived the term from "...Anthropomorphic and Robotics to distinguish the new generation of dexterous robots from its simple industrial robot forebears." The word gained wider recognition as a result of its use in the title of Rosheim's subsequent book Robot Evolution: The Development of Anthrobotics, which focussed on facsimiles of human physical and psychological skills and attributes. However, a wider definition of the term anthrobotics has been proposed, in which the meaning is derived from anthropology rather than anthropomorphic. This usage includes robots that respond to input in a human-like fashion, rather than simply mimicking human actions, thus theoretically being able to respond more flexibly or to adapt to unforeseen circumstances. This expanded definition also encompasses robots that are situated in social environments with the ability to respond to those environments appropriately, such as insect robots, robotic pets, and the like. Anthrobotics is now taught at some universities, encouraging students not only to design and build robots for environments beyond current industrial applications, but also to speculate on the future of robotics that are embedded in the world at large, as mobile phones and computers are today. In 2016 philosopher Luis de Miranda created the Anthrobotics Cluster at the University of Edinburgh "a platform of cross-disciplinary research that seeks to investigate some of the biggest questions that will need to be answered" on the relationship between humans, robots and intelligent systems and "a think tank on the social spread of robotics, and also how automation is part of the definition of what humans have always been". to explore the symbiotic relationship between humans and automated protocols.

    Read more →
  • CatDV

    CatDV

    CatDV is a media asset manager program for handling multimedia production workflows developed by Square Box Systems. Quantum Corporation acquired Square Box Systems in 2020. == Versions == The full family of CatDV Products is as follows: CatDV Standalone Products CatDV Professional Edition CatDV Pegasus CatDV Networked Products CatDV Essential - entry level server product CatDV Enterprise Server - for MySQL databases and most common server platforms including Linux, Windows and Mac OS X CatDV Pegasus Server - adds features such as high performance full-text indexing, access control lists, and more CatDV Worker Node - automated workflow and transcoding engine CatDV Web Client - provides access to the CatDV database via a web browser. There is no need to install special software on the desktop, making it easy to deploy to a large number of users. CatDV Professional Edition & Pegasus Clients - designed to support the multi-user capabilities of the CatDV Enterprise and Workgroup Servers from the desktop Using plugins and scripting, which often require additional professional services support to set up, complex integrations with a wide variety of third party systems (including archive, cloud storage, and artificial intelligence) are possible. == Awards == CatDV won two awards in 2010, a blue ribbon from Creative COW Magazine and a "Best of Show Vidy Award" from Videography. In April 2012 Square Box won a Queen's Award for Enterprise for CatDV.

    Read more →
  • GamePigeon

    GamePigeon

    GamePigeon is a mobile app for iOS devices, developed by Vitalii Zlotskii and released on September 13, 2016. The game takes advantage of the iOS 10 update, which expanded how users could interact with Apple's Messages app. GamePigeon is only available through the Messages app, which allows players to start and respond to different party games in conversations. == Release == The app was first released on September 13, 2016, coinciding with the launch of iOS 10. The app was released for free, although it includes in-app purchases to unlock additional items, such as cosmetic skins, avatar items, new game modes, and an option to remove ads. == Games in the app == The following is a list of games that users can play within GamePigeon: Sources: Poker was one of the games included in GamePigeon at launch, although it has since been removed and is no longer listed on the game's App Store description. == Reception == GamePigeon has enjoyed commercial success, with VentureBeat noting that GamePigeon was ranked number-one in the "Top Free" category of the iMessage App Store, six months after its release. Critically, GamePigeon has been generally well received, being highlighted by online media publications early on shortly after the iOS 10 launch. It has since been included on many "best iMessage apps" lists. Based on over 162,000 ratings, the game holds a 4.0 out of 5 rating on the App Store. Julian Chokkattu of Digital Trends wrote "GamePigeon should be like the pre-installed versions of Solitaire and Minesweeper that used to come with older iterations of Windows." On its launch day, Boy Genius Report included it on a list of "10 of the best iMessage apps, games and stickers for iOS 10 on launch day." The Daily Dot wrote, "GamePigeon is easily the best current gaming option within iMessages." 8-ball and cup pong have been particularly well received by media outlets. The Daily Dot had specific praise for the app's billiards game: "8-Ball controls shockingly smoothly with your fingers, and there’s nothing quite like destroying a dear friend in poker." During his 2020 U.S. presidential campaign, Cory Booker was cited as playing the game with his family. In 2017, CNBC cited one teenager who expressed that GamePigeon was one of just a few reasons that those in her age range use the iMessage app. The game has received particular positive reception for allowing introverted individuals to exercise a form social activity; similarly, the game was highlighted as a way to maintain social distancing guidelines during the COVID-19 pandemic. As an April Fools' Day joke in 2020, The Chronicle, a Duke University newspaper, published that Duke's athletic program adopted GamePigeon's Cup Pong as an official varsity sport.

    Read more →
  • CinePlayer

    CinePlayer

    CinePlayer is a software based media player used to review Digital Cinema Packages (DCP) without the need for a digital cinema server by Doremi Labs. CinePlayer can play back any DCP, not just those created by Doremi Mastering products. In addition to playing DCPs, CinePlayer can also playback JPEG2000 image sequences and many popular multimedia file types. There are two versions of CinePlayer available, standard and Pro. The standard version supports playback of non-encrypted, 2D DCP's up to 2K resolution. The Pro version supports playback of encrypted, 2D or 3D DCP's with subtitles up to 4K resolution. == Supported formats == === Containers === AVI MOV MXF MPG TS WMV M2TS MTS MP4 MKV === Video codecs === JPEG2000 ProRes 422 DNxHD YUV Uncompressed 8-10 bits DIVX XVID MPEG4 AVC / H-264 VC-1 MPEG2 === Supported image sequences === BMP TIFF TGA DPX JPG J2C === Supported audio files === WAV MP3 WMA MP2

    Read more →
  • Robotics

    Robotics

    Robotics is the interdisciplinary study and practice of the design, construction, operation, and use of robots. A roboticist is someone who specializes in robotics. Robotics usually combines four aspects of design work: a power source (e.g. a battery), mechanical construction, a control system (electrical circuits), and software (run by remote control or artificial intelligence). The goal of most robotics is to design machines that can assist humans in various fields, such as agriculture, construction, domestic work, food processing, inventory management, manufacturing, medicine, military, mining, space exploration, and transportation. Robots impact humans by displacing workers. Some expect this to occur at an increasing rate, leading to proposed solutions such as basic income. Robotics is itself a lucrative business that creates careers, especially for postgraduates. Roboticists often aim to create machines that seem to interface naturally with humans. The field is under active research and development, with areas of interest including robot kinematics and quantum robotics. == Design == Robotics usually combines four aspects of design work to create a robot: Power source: Potential energy sources include wired electricity, a battery, and/or petrol. Mechanical construction: A physical form or combination of forms is designed to functionally achieve tasks within a given range of environments. This can include locomotive elements such as wheels and caterpillar tracks, as well as hydraulic limbs and manipulators (e.g. hands). Control system: Electrical circuits (utilizing components such as diodes and transistors) are used to run software, govern motor movement, and read sensors. Software: A program is how a robot decides when or how to do something. Robotic programs can be run by remote control, artificial intelligence (AI), or a hybrid of the two. AI programming is an important part of robotic navigation and human–robot interaction. === Power source === Many different types of batteries can be used as a power source. Most are lead–acid batteries, which are safe and have relatively long shelf lives but are rather heavy compared to silver–cadmium batteries, which are much smaller in volume and much more expensive. Designing a battery-powered robot needs to take into account factors such as safety, cycle lifetime, and weight. Generators, often some type of internal combustion engine, can also be used, but are often mechanically complex and inefficient. Additionally, a tether could connect the robot to a power supply, saving weight and space, but requiring a cumbersome cable. Potential power sources include: Flywheel energy storage Hydraulics Nuclear Organic garbage (through anaerobic digestion) Pneumatics (compressed gases) Solar power === Mechanical construction === Actuators are the "muscles" of a robot, the parts which convert stored energy into movement. The most popular actuators are electric motors that rotate a wheel or gear and linear actuators that control factory robots. Most robots use electric motors—often brushed and brushless DC motors in portable robots or AC motors in industrial robots and computer numerical control machines—especially in systems with lighter loads and where the predominant form of motion is rotational. Meanwhile, linear actuators move in and out and often have quicker direction changes, particularly when large forces are needed, such as with industrial robotics. They are typically powered by oil or compressed air, but can also be powered by electricity, usually via a motor and a leadscrew. The mechanical rack and pinion is common. Recent alternatives to DC motors are piezoelectric motors, including ultrasonic motors, in which tiny piezoceramic elements vibrate many thousands of times per second, causing linear or rotary motion. One type uses the vibration of the piezo elements to step the motor in a circle or a straight line; another type uses the piezo elements to vibrate a nut or drive a screw. The advantages of these motors are nanometer resolution, speed, and force for their size. Series elastic actuation (SEA) relies on introducing intentional elasticity between the motor actuator and the load for robust force control. Due to the resultant lower reflected inertia, series elastic actuation improves safety during robot interactions or collisions. Further, it provides energy efficiency and shock absorption (mechanical filtering) while reducing excessive wear on the transmission and other components. This approach has successfully been employed in various robots, particularly advanced manufacturing robots and walking humanoid robots. The controller design of a series elastic actuator is most often performed within the passivity framework as it ensures the safety of interaction with unstructured environments. However, this framework suffers from stringent limitations imposed on the controller, which may impact performance. Pneumatic artificial muscles, also known as air muscles, are special tubes that expand (typically up to 42%) when air is forced inside them; they are used in some robot applications. Muscle wire, also known as shape memory alloy, is a material that contracts (under 5%) when electricity is applied; they have been used for some small robots. Electroactive polymers are a plastic material that can contract substantially (up to 380% activation strain) from electricity and have been used in the facial muscles and arms of humanoid robots, as well as to enable new robots to float, fly, swim or walk. Additionally, elastic carbon nanotubes are a promising experimental artificial muscle technology. The absence of defects in carbon nanotubes enables these filaments to deform elastically by several percent, with energy storage levels of perhaps 10 J/cm3 for metal nanotubes. Human biceps could be replaced with wire of this material measuring 8 millimetres (3⁄8 in) in diameter, feasibly allowing future robots to outperform humans. ==== Locomotion ==== Robots with only one or two wheel(s) can have advantages such as greater efficiency, reduced parts, and navigation through confined areas. A one-wheeled robot balances on a round ball; Carnegie Mellon University's Ballbot is the approximate height and width of a person. Several attempts have also been made to build spherical robots (also known as orb bots or ball bots), which move by spinning a weight inside the ball or rotating outer shells. Two-wheeled balancing robots generally use a gyroscope to detect how much a robot is falling and drive the wheels proportionally up to hundreds of times per second to counterbalance the fall, based on inverted pendulum dynamics. NASA's Robonaut has been mounted to a Segway for a similar effect. Most mobile robots have four wheels or continuous tracks. Six wheels can give better traction in outdoor terrain, while tracks provide even more grip. Tracked wheels are common for outdoor off-road robots, but are difficult to use indoors. A small number of skating robots have been developed, one of which is a multimodal walking and skating device with four legs and unpowered wheels. Several robots have been made that can walk on two legs, but not yet as reliably as a human. Many other robots have been built that walk on more than two legs, being significantly easier. Walking robots could be used for uneven terrains, providing a high degree of mobility and efficiency, but two-legged robots can currently only handle flat floors or perhaps stairs. Some approaches have included: The zero moment point (ZMP) is the algorithm used by robots such as Honda's ASIMO. The robot's onboard computer tries to keep the total inertial forces (the combination of Earth's gravity and the acceleration and deceleration of walking) exactly opposed by the floor reaction force (the force of the floor pushing back on the robot's foot). In this way, the two forces cancel out, leaving no moment (force causing the robot to rotate and fall over). Human observers note that this is not exactly how a human walks, with some describing ASIMO's walk as looking like it needs use the bathroom. ASIMO's walking algorithm utilizes some dynamic balancing, but requires a flat surface. Several robots, built in the 1980s by Marc Raibert at the MIT Leg Laboratory, successfully demonstrated very dynamic walking. Initially, a robot with only one leg, and a very small foot could stay upright simply by hopping. The movement is the same as that of a person on a pogo stick. As the robot falls to one side, it would jump slightly in that direction to catch itself. Soon, the algorithm was generalized to two and four legs. A bipedal robot was demonstrated running and even performing somersaults. A quadruped was also demonstrated which could trot, run, pace, and bound. A more advanced approach is a dynamic balancing algorithm, which constantly monitors the robot's motion and places the feet to maintain stability. This technique has been demonstrated by Anybots' Dexter robot (

    Read more →
  • Evolutionary robotics

    Evolutionary robotics

    Evolutionary robotics is an embodied approach to Artificial Intelligence (AI) in which robots are automatically designed using Darwinian principles of natural selection. The design of a robot, or a subsystem of a robot such as a neural controller, is optimized against a behavioral goal (e.g. run as fast as possible). Usually, designs are evaluated in simulations as fabricating thousands or millions of designs and testing them in the real world is prohibitively expensive in terms of time, money, and safety. An evolutionary robotics experiment starts with a population of randomly generated robot designs. The worst performing designs are discarded and replaced with mutations and/or combinations of the better designs. This evolutionary algorithm continues until a prespecified amount of time elapses or some target performance metric is surpassed. Evolutionary robotics methods are particularly useful for engineering machines that must operate in environments in which humans have limited intuition (nanoscale, space, etc.). Evolved simulated robots can also be used as scientific tools to generate new hypotheses in biology and cognitive science, and to test old hypothesis that require experiments that have proven difficult or impossible to carry out in reality. == History == In the early 1990s, two separate European groups demonstrated different approaches to the evolution of robot control systems. Dario Floreano and Francesco Mondada at EPFL evolved controllers for the Khepera robot. Adrian Thompson, Nick Jakobi, Dave Cliff, Inman Harvey, and Phil Husbands evolved controllers for a Gantry robot at the University of Sussex. However the body of these robots was presupposed before evolution. The first simulations of evolved robots were reported by Karl Sims and Jeffrey Ventrella of the MIT Media Lab, also in the early 1990s. However these so-called virtual creatures never left their simulated worlds. The first evolved robots to be built in reality were 3D-printed by Hod Lipson and Jordan Pollack at Brandeis University at the turn of the 21st century.

    Read more →
  • Network Abstraction Layer

    Network Abstraction Layer

    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards. == Introduction == An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements. The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.) interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question about how to handle this variety of applications and networks. To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed in order to provide "network friendliness" to enable simple and effective customization of the use of VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as: RTP/IP for any kind of real-time wire-line and wireless Internet services. File formats, e.g., ISO MP4 for storage and MMS. H.32X for wireline and wireless conversational services. MPEG-2 systems for broadcasting services, etc. The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte stream, and packet formats uses of NAL units, parameter sets, and access units. A short description of these concepts is given below. == NAL units == The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream. == NAL Units in Byte-Stream Format Use == Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired. == NAL Units in Packet-Transport System Use == In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes. == VCL and Non-VCL NAL Units == NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures. Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures). == Parameter Sets == A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets: Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.) Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.) The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself. == Access Units == A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to

    Read more →
  • Google Mobile Services

    Google Mobile Services

    Google Mobile Services (GMS) is a collection of proprietary applications and application programming interfaces (APIs) services from Google that are typically pre-installed on the majority of Android devices, such as smartphones, tablets, and smart TVs. GMS is not a part of the Android Open Source Project (AOSP), which means an Android manufacturer needs to obtain a license from Google in order to legally pre-install GMS on an Android device. This license is provided by Google without any licensing fees except in the EU. == Core applications == The following are core applications that are part of Google Mobile Services: Google Search Google Chrome YouTube Google Play Google Drive Gmail Google Meet Google Maps Google Photos Google TV YouTube Music === Historically === Google+ Google Hangouts Google Wallet Google Play Magazines Google Play Music Google Play Movies & TV Google Duo == Reception, competitors, and regulators == === FairSearch === Numerous European firms filed a complaint to the European Commission stating that Google had manipulated their power and dominance within the market to push their Services to be used by phone manufacturers. The firms were joined under the name FairSearch, and the main firms included were Microsoft, Expedia, TripAdvisor, Nokia and Oracle. FairSearch's major problem with Google's practices was that they believed Google were forcing phone manufacturers to use their Mobile Services. They claimed Google managed this by asking these manufacturers to sign a contract stating that they must preinstall specific Google Mobile Services, such as Maps, Search and YouTube, in order to get the latest version of Android. Google swiftly responded stating that they "continue to work co-operatively with the European Commission". === Aptoide === The third-party Android app store Aptoide also filed an EU competition complaint against Google once again stating that they are misusing their power within the market. Aptoide alleged that Google was blocking third-party app stores from being on Google Play, as well as blocking Google Chrome from downloading any third-party apps and app stores. As of June 2014, Google had not responded to these allegations. === Abuse of Android dominance === In May 2019, Umar Javeed, Sukarma Thapar, Aaqib Javeed vs. Google LLC & Ors. the Competition Commission of India ordered an antitrust probe against Google for abusing its dominant position with Android to block market rivals. In Prima Facie opinion the commission held that mandatory pre-installation of the entire Google Mobile Services (GMS) suite, under Mobile Application Distribution Agreements (MADA), amounts to the imposition of unfair conditions on the device manufacturers. === EU antitrust ruling === On July 18, 2018, the European Commission fined Google €4.34 billion for breaching EU antitrust rules which resulted in a change of licensing policy for the GMS in the EU. A new paid licensing agreement for smartphones and tablets shipped into the EEA was created. The change is that the GMS is now decoupled from the base Android and will be offered under a separate paid licensing agreement. === Privacy policy === At the same time, Google faced problems with various European data protection agencies, most notably In the United Kingdom and France. The problem they faced was that they had a set of 60 rules merged into one, which allowed Google to "track users more closely". Google once again came out and stated that their new policies still abide by European Union laws. === Android distributions without Google Mobile Services === After surveillance and privacy concerns, several custom android distributions have been implemented, such as GrapheneOS, LineageOS, CalyxOS, iodéOS or /e/OS, and they come either without any GMS installed by default or with microG, that adds a compatibility layer.

    Read more →
  • Pixel-art scaling algorithms

    Pixel-art scaling algorithms

    Pixel art scaling algorithms are graphical filters that attempt to enhance the appearance of hand-drawn 2D pixel art graphics. These algorithms are a form of automatic image enhancement. Pixel art scaling algorithms employ methods significantly different than the common methods of image rescaling, which have the goal of preserving the appearance of images. As pixel art graphics are commonly used at very low resolutions, they employ careful coloring of individual pixels. This results in graphics that rely on a high amount of stylized visual cues to define complex shapes. Several specialized algorithms have been developed to handle re-scaling of such graphics. These specialized algorithms can improve the appearance of pixel-art graphics, but in doing so they introduce changes. Such changes may be undesirable, especially if the goal is to faithfully reproduce the original appearance. Since a typical application of this technology is improving the appearance of fourth-generation and earlier video games on arcade and console emulators, many pixel art scaling algorithms are designed to run in real-time for sufficiently small input images at 60-frames per second. This places constraints on the type of programming techniques that can be used for this sort of real-time processing. Many work only on specific scale factors. 2× is the most common scale factor, while 3×, 4×, 5×, and 6× exist but are less used. == Algorithms == === SAA5050 'Diagonal Smoothing' === The Mullard SAA5050 Teletext character generator chip (1980) used a primitive pixel scaling algorithm to generate higher-resolution characters on the screen from a lower-resolution representation from its internal ROM. Internally, each character shape was defined on a 5 × 9 pixel grid, which was then interpolated by smoothing diagonals to give a 10 × 18 pixel character, with a characteristically angular shape, surrounded to the top and the left by two pixels of blank space. The algorithm only works on monochrome source data, and assumes the source pixels will be logically true or false depending on whether they are 'on' or 'off'. Pixels 'outside the grid pattern' are assumed to be off. The algorithm works as follows: A B C --\ 1 2 D E F --/ 3 4 1 = B | (A & E & !B & !D) 2 = B | (C & E & !B & !F) 3 = E | (!A & !E & B & D) 4 = E | (!C & !E & B & F) Note that this algorithm, like the Eagle algorithm below, has a flaw: If a pattern of 4 pixels in a hollow diamond shape appears, the hollow will be obliterated by the expansion. The SAA5050's internal character ROM carefully avoids ever using this pattern. The degenerate case: becomes: === EPX/Scale2×/AdvMAME2× === Eric's Pixel Expansion (EPX) is an algorithm developed by Eric Johnston at LucasArts around 1992, when porting the SCUMM engine games from the IBM PC (which ran at 320 × 200 × 256 colors) to the early color Macintosh computers, which ran at more or less double that resolution. The algorithm works as follows, expanding P into 4 new pixels based on P's surroundings: 1=P; 2=P; 3=P; 4=P; IF C==A => 1=A IF A==B => 2=B IF D==C => 3=C IF B==D => 4=D IF of A, B, C, D, three or more are identical: 1=2=3=4=P Later implementations of this same algorithm (as AdvMAME2× and Scale2×, developed around 2001) are slightly more efficient but functionally identical: 1=P; 2=P; 3=P; 4=P; IF C==A AND C!=D AND A!=B => 1=A IF A==B AND A!=C AND B!=D => 2=B IF D==C AND D!=B AND C!=A => 3=C IF B==D AND B!=A AND D!=C => 4=D AdvMAME2× is available in DOSBox via the scaler=advmame2x dosbox.conf option. The AdvMAME4×/Scale4× algorithm is just EPX applied twice to get 4× resolution. ==== Scale3×/AdvMAME3× and ScaleFX ==== The AdvMAME3×/Scale3× algorithm (available in DOSBox via the scaler=advmame3x dosbox.conf option) can be thought of as a generalization of EPX to the 3× case. The corner pixels are calculated identically to EPX. 1=E; 2=E; 3=E; 4=E; 5=E; 6=E; 7=E; 8=E; 9=E; IF D==B AND D!=H AND B!=F => 1=D IF (D==B AND D!=H AND B!=F AND E!=C) OR (B==F AND B!=D AND F!=H AND E!=A) => 2=B IF B==F AND B!=D AND F!=H => 3=F IF (H==D AND H!=F AND D!=B AND E!=A) OR (D==B AND D!=H AND B!=F AND E!=G) => 4=D 5=E IF (B==F AND B!=D AND F!=H AND E!=I) OR (F==H AND F!=B AND H!=D AND E!=C) => 6=F IF H==D AND H!=F AND D!=B => 7=D IF (F==H AND F!=B AND H!=D AND E!=G) OR (H==D AND H!=F AND D!=B AND E!=I) => 8=H IF F==H AND F!=B AND H!=D => 9=F There is also a variant improved over Scale3× called ScaleFX, developed by Sp00kyFox, and a version combined with Reverse-AA called ScaleFX-Hybrid. === Eagle === Eagle works as follows: for every in pixel, we will generate 4 out pixels. First, set all 4 to the color of the pixel we are currently scaling (as nearest-neighbor). Next look at the three pixels above, to the left, and diagonally above left: if all three are the same color as each other, set the top left pixel of our output square to that color in preference to the nearest-neighbor color. Work similarly for all four pixels, and then move to the next one. Assume an input matrix of 3 × 3 pixels where the centermost pixel is the pixel to be scaled, and an output matrix of 2 × 2 pixels (i.e., the scaled pixel) first: |Then . . . --\ CC |S T U --\ 1 2 . C . --/ CC |V C W --/ 3 4 . . . |X Y Z | IF V==S==T => 1=S | IF T==U==W => 2=U | IF V==X==Y => 3=X | IF W==Z==Y => 4=Z Thus if we have a single black pixel on a white background it will vanish. This is a bug in the Eagle algorithm but is solved by other algorithms such as EPX, 2xSaI, and HQ2x. === 2×SaI === 2×SaI, short for 2× Scale and Interpolation engine, was inspired by Eagle. It was designed by Derek Liauw Kie Fa, also known as Kreed, primarily for use in console and computer emulators, and it has remained fairly popular in this niche. Many of the most popular emulators, including ZSNES and VisualBoyAdvance, offer this scaling algorithm as a feature. Several slightly different versions of the scaling algorithm are available, and these are often referred to as Super 2×SaI and Super Eagle. The 2xSaI family works on a 4 × 4 matrix of pixels where the pixel marked A below is scaled: I E F J G A B K --\ W X H C D L --/ Y Z M N O P For 16-bit pixels, they use pixel masks which change based on whether the 16-bit pixel format is 565 or 555. The constants colorMask, lowPixelMask, qColorMask, qLowPixelMask, redBlueMask, and greenMask are 16-bit masks. The lower 8 bits are identical in either pixel format. Two interpolation functions are described: INTERPOLATE(uint32 A, UINT32 B). -- linear midpoint of A and B if (A == B) return A; return ( ((A & colorMask) >> 1) + ((B & colorMask) >> 1) + (A & B & lowPixelMask) ); Q_INTERPOLATE(uint32 A, uint32 B, uint32 C, uint32 D) -- bilinear interpolation; A, B, C, and D's average x = ((A & qColorMask) >> 2) + ((B & qColorMask) >> 2) + ((C & qColorMask) >> 2) + ((D & qColorMask) >> 2); y = (A & qLowPixelMask) + (B & qLowPixelMask) + (C & qLowPixelMask) + (D & qLowPixelMask); y = (y >> 2) & qLowPixelMask; return x + y; The algorithm checks A, B, C, and D for a diagonal match such that A==D and B!=C, or the other way around, or if they are both diagonals or if there is no diagonal match. Within these, it checks for three or four identical pixels. Based on these conditions, the algorithm decides whether to use one of A, B, C, or D, or an interpolation among only these four, for each output pixel. The 2xSaI arbitrary scaler can enlarge any image to any resolution and uses bilinear filtering to interpolate pixels. Since Kreed released the source code under the GNU General Public License, it is freely available to anyone wishing to utilize it in a project released under that license. Developers wishing to use it in a non-GPL project would be required to rewrite the algorithm without using any of Kreed's existing code. It is available in DOSBox via scaler=2xsai option. === hqnx family === Maxim Stepin's hq2x, hq3x, and hq4x are for scale factors of 2:1, 3:1, and 4:1 respectively. Each work by comparing the color value of each pixel to those of its eight immediate neighbors, marking the neighbors as close or distant, and using a pre-generated lookup table to find the proper proportion of input pixels' values for each of the 4, 9 or 16 corresponding output pixels. The hq3x family will perfectly smooth any diagonal line whose slope is ±0.5, ±1, or ±2 and which is not anti-aliased in the input; one with any other slope will alternate between two slopes in the output. It will also smooth very tight curves. Unlike 2xSaI, it anti-aliases the output. hqnx was initially created for the Super NES emulator ZSNES. The author of bsnes has released a space-efficient implementation of hq2x to the public domain. A port to shaders, which has comparable quality to the early versions of xBR, is available. Before the port, a shader called "scalehq" has often been confused for hqx. === xBR family === There are 6 filters in this family: xBR , xBRZ, xBR-Hybrid, Super xBR, xBR+3D and Super xBR+3D. xBR ("scale by rules"), cre

    Read more →
  • Odor source localization

    Odor source localization

    Odor source localization (OSL) is the problem of locating the origin of an airborne or waterborne chemical plume using one or more mobile sensors, typically robots equipped with chemical sensors. The task sits at the intersection of robotics, fluid dynamics and machine olfaction. Chemical plumes in turbulent flows are intermittent and patchy, and most chemical sensors respond slowly and have limited selectivity, so the instantaneous reading available to a moving sensor is a poor proxy for the underlying time-averaged concentration field. Robotic OSL has been studied since the late 1980s and has applications including the detection of gas leaks, search and rescue after industrial accidents, and environmental monitoring of industrial emissions. == History == Robotic odor search emerged in the late 1980s and 1990s, drawing on earlier work in chemical ecology that had described how moths and other insects locate distant pheromone sources. R. A. Russell at Monash University was among the first to build mobile robots that followed chemical trails on the floor and tracked airborne odor plumes. Distributed and multi-robot odor search were investigated by Hayes, Martinoli and Goodman at the California Institute of Technology and EPFL, who studied cooperative plume-tracing on simulated and physical robot swarms. In 2007 Vergassola, Villermaux and Shraiman introduced infotaxis, an information-theoretic search strategy in which a sensor moves so as to maximize the expected information gain about source location, rather than following a chemical concentration gradient; the paper appeared in Nature and prompted substantial follow-up work in the robotics community. From the mid-2010s, multi-rotor unmanned aerial vehicles carrying lightweight chemical sensors became a common experimental platform for OSL research. == Problem formulation == OSL is generally decomposed into three sub-problems: plume detection (deciding whether a chemical signal is present), plume traversal (moving so as to remain in contact with the plume), and source declaration (deciding when the source has been reached). The mathematical difficulty depends strongly on the assumed dispersion model. In laminar or low-Reynolds number flows a Gaussian advection–diffusion model gives a smooth concentration field with a well-defined gradient. In turbulent flows, which dominate most realistic environments, the plume is filamentary: the sensor receives short, randomly spaced bursts of chemical separated by periods of zero signal, and the time-averaged field is not a useful guide on the time scales at which a robot must act. Source-term estimation, surveyed by Hutchinson and colleagues, additionally aims to recover both the position and the release rate of the source from the observed concentrations, often using probabilistic filters. == Biological inspiration == Many OSL strategies are explicitly modeled on the behavior of male moths flying upwind toward a pheromone source. As reviewed by Cardé and Willis, moths combine an upwind surge whenever they detect a filament of pheromone with a wider crosswind cast when contact is lost, producing a characteristic zig-zag trajectory that has been transposed onto mobile robots by several groups. Other biological models draw on the search behavior of dogs and of marine animals such as blue crabs and lobsters, which integrate chemical and bilateral hydrodynamic cues over much shorter ranges. == Algorithms and strategies == === Reactive strategies === Reactive strategies select the next motion as a direct function of the current sensor reading. Chemotaxis steers along the locally estimated concentration gradient, which is effective in laminar plumes but degrades severely in turbulence. Anemotaxis exploits a measured wind direction by surging upwind when chemical contact is made. The bio-inspired cast-and-surge family combines anemotaxis with a deterministic crosswind cast on contact loss, and is the dominant reactive approach for turbulent environments. === Probabilistic and information-theoretic strategies === Probabilistic methods maintain a posterior distribution over possible source locations and choose actions that improve that distribution. The infotaxis strategy of Vergassola, Villermaux and Shraiman selects the move that maximizes the expected reduction in entropy of the source-location posterior, and is effective in regimes where the spatial gradient is unusable. Bayesian source-term estimation extends this idea by inferring both source position and release rate, typically using particle filters or sequential Monte Carlo. === Map-based strategies === Map-based methods build a spatial model of the time-averaged gas distribution from sensor readings collected along the robot's trajectory and search for local maxima in that model. Lilienthal and colleagues describe a family of kernel-based gas distribution mapping techniques in which point measurements are convolved with a Gaussian kernel to produce a spatially extrapolated estimate. Such methods are most useful when the source can be assumed quasi-stationary and the robot is able to revisit locations. === Multi-robot and swarm strategies === Multiple robots searching cooperatively can shorten search times. Cooperative formations spread the sensors across the crosswind axis, making detection of an intermittent plume more likely. Swarm-based approaches, reviewed by Wang and colleagues, deploy larger numbers of simpler agents and rely on collective behavior rather than centralized planning; reported advantages include improved coverage of the search area and the possibility of locating multiple sources in parallel. == Sensors and platforms == Most OSL systems use metal-oxide semiconductor (MOX) sensors, photoionization detectors or electrochemical cells, which trade off sensitivity, selectivity, response time and power consumption. Ishida and colleagues describe how these sensors interact with airflow around the robot body, an effect that motivates careful aerodynamic design and active sampling. Mobile platforms include wheeled ground robots for indoor and structured outdoor environments, multi-rotor unmanned aerial vehicles for open spaces and elevated sources, and autonomous underwater vehicles for chemical plumes in the marine environment. == Notable systems == Among the early demonstrations, R. A. Russell's series of differential-drive robots at Monash University localized volatile sources in still and ventilated rooms during the 1990s. The Smelling Nano Aerial Vehicle reported by Burgués and colleagues used a Crazyflie nano-quadcopter (approximately 27 grams in mass and 10 cm across) carrying a custom MOX gas sensing board, and built three-dimensional gas distribution maps of indoor releases from sweeping flights of less than three minutes. The GADEN simulator, released by Monroy and colleagues, couples three-dimensional dispersion computed from an OpenFOAM CFD solver with models of MOX and photo-ionization gas sensors, and is widely used to test mobile-robot olfaction algorithms in simulation. == Applications == Reported applications include the localization of natural-gas and methane leaks in urban infrastructure, search for chemical contamination after industrial accidents, search and rescue, and environmental monitoring of industrial emissions. Drug- and explosives-detection robots are an adjacent application area, although these typically rely on close-range sniffing rather than long-range plume tracking. == Open challenges == Open challenges identified in recent reviews include the limited speed, selectivity and stability of available chemical sensors; the scarcity of standardized, large-scale benchmarks comparable to those available in computer vision; reliable handling of multi-source environments, where standard single-source assumptions fail; and the integration of OSL with other autonomous-vehicle subsystems such as obstacle avoidance and navigation in three-dimensional turbulent flow.

    Read more →
  • Automation integrator

    Automation integrator

    An automation integrator is a systems integrator company or individual who makes different versions of automation hardware and software work together, generally combining several subsystems to work together as one large system. The title may refer to those who only integrate hardware, although these will often work with software integrators. Software created by automation integrators allows devices to communicate with each other, as well as collecting and reporting data. The magazine Control Engineering publishes an annual “Automation Integrator Guide” which lists over 2,000 automation integrators. They also give an annual system integrator of the year award to three automation integration firms. The Control System Integrators Association (CSIA) maintains a buyers' guide of over 1200 member and nonmember systems integrators known as the Industrial Automation Exchange, or CSIA Exchange for short. == Certification == The Control System Integrators Association (CSIA) certifies automation integrators, through an audit based on 79 critical criteria from the best practices manual. Companies must be associate members of the CSIA to be eligible for certification. Integrators can also receive certification through a program launched in 2012 by the Robotics Industries Association. == Industries == Automation Integrators work in a wide variety of industries which use robotics and automation. Some of the most common include:

    Read more →
  • Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, and aesthetic ranking. == Algorithm == The CLIP method trains a pair of models contrastively. One model takes in a piece of text as input and outputs a single vector representing its semantic content. The other model takes in an image and similarly outputs a single vector representing its visual content. The models are trained so that the vectors corresponding to semantically similar text-image pairs are close together in the shared vector space, while those corresponding to dissimilar pairs are far apart. To train a pair of CLIP models, one would start by preparing a large dataset of image-caption pairs. During training, the models are presented with batches of N {\displaystyle N} image-caption pairs. Let the outputs from the text and image models be respectively v 1 , . . . , v N , w 1 , . . . , w N {\displaystyle v_{1},...,v_{N},w_{1},...,w_{N}} . Two vectors are considered "similar" if their dot product is large. The loss incurred on this batch is the multi-class N-pair loss, which is a symmetric cross-entropy loss over similarity scores: − 1 N ∑ i ln ⁡ e v i ⋅ w i / T ∑ j e v i ⋅ w j / T − 1 N ∑ j ln ⁡ e v j ⋅ w j / T ∑ i e v i ⋅ w j / T {\displaystyle -{\frac {1}{N}}\sum _{i}\ln {\frac {e^{v_{i}\cdot w_{i}/T}}{\sum _{j}e^{v_{i}\cdot w_{j}/T}}}-{\frac {1}{N}}\sum _{j}\ln {\frac {e^{v_{j}\cdot w_{j}/T}}{\sum _{i}e^{v_{i}\cdot w_{j}/T}}}} In essence, this loss function encourages the dot product between matching image and text vectors ( v i ⋅ w i {\displaystyle v_{i}\cdot w_{i}} ) to be high, while discouraging high dot products between non-matching pairs. The parameter T > 0 {\displaystyle T>0} is the temperature, which is parameterized in the original CLIP model as T = e − τ {\displaystyle T=e^{-\tau }} where τ ∈ R {\displaystyle \tau \in \mathbb {R} } is a learned parameter. Other loss functions are possible. For example, Sigmoid CLIP (SigLIP) proposes the following loss function: L = 1 N ∑ i , j ∈ 1 : N f ( ( 2 δ i , j − 1 ) ( e τ w i ⋅ v j + b ) ) {\displaystyle L={\frac {1}{N}}\sum _{i,j\in 1:N}f((2\delta _{i,j}-1)(e^{\tau }w_{i}\cdot v_{j}+b))} where f ( x ) = ln ⁡ ( 1 + e − x ) {\displaystyle f(x)=\ln(1+e^{-x})} is the negative log sigmoid loss, and the Dirac delta symbol δ i , j {\displaystyle \delta _{i,j}} is 1 if i = j {\displaystyle i=j} else 0. == CLIP models == While the original model was developed by OpenAI, subsequent models have been trained by other organizations as well. === Image model === The image encoding models used in CLIP are typically vision transformers (ViT). The naming convention for these models often reflects the specific ViT architecture used. For instance, "ViT-L/14" means a "vision transformer large" (compared to other models in the same series) with a patch size of 14, meaning that the image is divided into 14-by-14 pixel patches before being processed by the transformer. The size indicator ranges from B, L, H, G (base, large, huge, giant), in that order. Other than ViT, the image model is typically a convolutional neural network, such as ResNet (in the original series by OpenAI), or ConvNeXt (in the OpenCLIP model series by LAION). Since the output vectors of the image model and the text model must have exactly the same length, both the image model and the text model have fixed-length vector outputs, which in the original report is called "embedding dimension". For example, in the original OpenAI model, the ResNet models have embedding dimensions ranging from 512 to 1024, and for the ViTs, from 512 to 768. Its implementation of ViT was the same as the original one, with one modification: after position embeddings are added to the initial patch embeddings, there is a LayerNorm. Its implementation of ResNet was the same as the original one, with 3 modifications: In the start of the CNN (the "stem"), they used three stacked 3x3 convolutions instead of a single 7x7 convolution, as suggested by. There is an average pooling of stride 2 at the start of each downsampling convolutional layer (they called it rect-2 blur pooling according to the terminology of ). This has the effect of blurring images before downsampling, for antialiasing. The final convolutional layer is followed by a multiheaded attention pooling. ALIGN a model with similar capabilities, trained by researchers from Google used EfficientNet, a kind of convolutional neural network. === Text model === The text encoding models used in CLIP are typically Transformers. In the original OpenAI report, they reported using a Transformer (63M-parameter, 12-layer, 512-wide, 8 attention heads) with lower-cased byte pair encoding (BPE) with 49152 vocabulary size. Context length was capped at 76 for efficiency. Like GPT, it was decoder-only, with only causally-masked self-attention. Its architecture is the same as GPT-2. Like BERT, the text sequence is bracketed by two special tokens [SOS] and [EOS] ("start of sequence" and "end of sequence"). Take the activations of the highest layer of the transformer on the [EOS], apply LayerNorm, then a final linear map. This is the text encoding of the input sequence. The final linear map has output dimension equal to the embedding dimension of whatever image encoder it is paired with. These models all had context length 77 and vocabulary size 49408. ALIGN used BERT of various sizes. == Dataset == === WebImageText === The CLIP models released by OpenAI were trained on a dataset called "WebImageText" (WIT) containing 400 million pairs of images and their corresponding captions scraped from the internet. The total number of words in this dataset is similar in scale to the WebText dataset used for training GPT-2, which contains about 40 gigabytes of text data. The dataset contains 500,000 text-queries, with up to 20,000 (image, text) pairs per query. The text-queries were generated by starting with all words occurring at least 100 times in English Wikipedia, then extended by bigrams with high mutual information, names of all Wikipedia articles above a certain search volume, and WordNet synsets. The dataset is private and has not been released to the public, and there is no further information on it. ==== Data preprocessing ==== For the CLIP image models, the input images are preprocessed by first dividing each of the R, G, B values of an image by the maximum possible value, so that these values fall between 0 and 1, then subtracting by [0.48145466, 0.4578275, 0.40821073], and dividing by [0.26862954, 0.26130258, 0.27577711]. The rationale was that these are the mean and standard deviations of the images in the WebImageText dataset, so this preprocessing step roughly whitens the image tensor. These numbers slightly differ from the standard preprocessing for ImageNet, which uses [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. If the input image does not have the same resolution as the native resolution (224×224 for all except ViT-L/14@336px, which has 336×336 resolution), then the input image is first scaled by bicubic interpolation, so that its shorter side is the same as the native resolution, then the central square of the image is cropped out. === Others === ALIGN used over one billion image-text pairs, obtained by extracting images and their alt-tags from online crawling. The method was described as similar to how the Conceptual Captions dataset was constructed, but instead of complex filtering, they only applied a frequency-based filtering. Later models trained by other organizations had published datasets. For example, LAION trained OpenCLIP with published datasets LAION-400M, LAION-2B, and DataComp-1B. == Training == In the original OpenAI CLIP report, they reported training 5 ResNet and 3 ViT (ViT-B/32, ViT-B/16, ViT-L/14). Each was trained for 32 epochs. The largest ResNet model took 18 days to train on 592 V100 GPUs. The largest ViT model took 12 days on 256 V100 GPUs. All ViT models were trained on 224×224 image resolution. The ViT-L/14 was then boosted to 336×336 resolution by FixRes, resulting in a model. They found this was the best-performing model. In the OpenCLIP series, the ViT-L/14 model was trained on 384 A100 GPUs on the LAION-2B dataset, for 160 epochs for a total of 32B samples seen. == Applications == === Cross-modal retrieval === CLIP's cross-modal retrieval enables the alignment of visual and textual data in a shared latent space, allowing users to retrieve images based on text descriptions and vice versa, without the need for explicit image annotations. In text-to-image retrieval, users input descriptive text, and CLIP retrieves images with matching embeddings. In image-to-text retrieval, images are used to find related text content. CLIP’s ability to connect vis

    Read more →
  • Companion robot

    Companion robot

    A companion robot is a robot created to create real or apparent companionship for human beings. Target markets for companion robots include the elderly and single children. Companions robots are expected to communicate with non-experts in a natural and intuitive way. They offer a variety of functions, such as monitoring the home remotely, communicating with people, or waking people up in the morning. Their aim is to perform a wide array of tasks including educational functions, home security, diary duties, entertainment and message delivery services, etc. The idea of companionship with robots has already existed on science fictions of 1970s, like R2-D2. Starting from the late 20th century, companion robots became a reality, mostly as robotic pets. Besides entertainment purposes, interactive robots were also introduced as a personal service robot for elderly care around 2000. == Characteristics == Companion robots try to interact with users. They gather information about users based on their interactions and yield feedback. This procedure varies slightly based on their specific roles. For example, social-companion robots make simple conversations, while pet-companion robots mimic being real pets. == Types == Companion robots can perform a variety of tasks and they are produced in a specialized manner according to their purpose or target audience in order to increase convenience and end user satisfaction. === Social companion robots === Social companion robots are designed to provide companionship and be a solution for unwanted solitude. They often mimic adult human, child or pet behaviours appealing to the user base. Robots which are specifically devised for simple conversations, conveying emotions and respond to user feelings fall under this category. === Assistive companion robots === Assistive companion robots are aimed at people who require constant care because of age, disability or rehabilitation purposes. Such robots can help disadvantaged users with their daily tasks, act as reminders (e.g., for regular medication) and facilitate mobility in everyday actions. Assistive companion robots reduce the intensity of labour that should be performed by caretakers, nurses and legal guardians. === Educational companion robots === Educational companion robots perform tutorship for students, regardless of their ages, and can teach desired subjects with activities tailored for the user such as interactive assignments and games. Rather than replacing teachers and instructors, educational companion robots are aides to them. === Therapeutic companion robots === Designed for individuals coping with stress (PTSD in severe cases), anxiety and loneliness; therapeutic companion robots support users' emotional and mental wellbeing. Such robots can be utilized in hospitals and care facilities as well as dwellings where the distressed user may need the most help. Therapeutic companion robots bear a vast resemblance to assistive companion robots to the extent of being a branch of them; the nuance between these two types of companion robots is that the former is for long-term/lifetime usage while the latter is mostly for the duration of the therapy received by the user. === Pet companion robots === Pet companion robots are for individuals who seek an alternative to live pets as live animals demand a considerable amount of care and may not be eligible for people with allergies. These robots aim to be perfect imitations of a pet while diminishing the chore aspect of having one. === Entertainment companion robots === Entertainment companion robots are designed solely for entertainment and can provide numerous ways of entertainment, ranging from dancing to playing games with the user. People who would appreciate an individual to have fun with are the main audience of such products. === Personal assistant robots === Personal assistant robots help people with daily tasks, management, scheduling, reminding etc. Their area of activity can be offices as well as homes and public spaces. === Sex robots === Sex robots are anthropomorphic robotic sex dolls that have human-like movement or behavior, and some degree of artificial intelligence. As of 2026, although elaborately instrumented sex dolls have been created by a number of inventors, no fully animated sex robots yet exist. Simple devices have been created which can speak, make facial expressions, or respond to touch. There is controversy as to whether developing them would be morally justifiable. In 2015, robot ethicist Kathleen Richardson called for a ban on the creation of anthropomorphic sex robots with concerns about normalizing relationships with machines and reinforcing female dehumanization. Questions about their ethics, effects, and possible legal regulations have been discussed since then. == Examples == There are several companion robot prototypes, and these include Paro, CompanionAble, and EmotiRob, among others. === Paro === Paro is a pet-type robot system developed by Japan's National Institute of Advanced Industrial Science and Technology (AIST). The robot, which looked like a small harp seal, was designed as a therapeutic tool for use in hospitals and nursing homes. The robot is programmed to cry for attention and respond to its name. Experiments showed that Paro facilitated elderly residents to communicate with each other, which led to psychological improvements. === CompanionAble === This robot is classified as an FP 7 EU project. It is built to "cooperate with Ambient Assistive Living environment". The autonomous device, which is also built to support the elderly, helps its owner interact with smart home environment as well as caregivers. The robot functions as a mobile friend, by which natural interaction is possible via speech and the touchscreen to detect and track people at home. === EmotiRob === EmotiRob is developed in a robotics project which is the continuity of the MAPH (Active Media For the Handicap) project in emotion synthesis. The aim of the project was to maintain emotional interaction with children. EmotiRob designed in a way that a child can hold it in a his/her arms and with which he/she could interact by talking to it, and then the robot would express itself through body postures or facial expressions. It has cognitive capabilities, which are further extended so that the robot can have a natural linguistic interaction with its owner through the DRAGON speech-recognition software developed by a company called NUANCE. Such interaction is expected to facilitate a child's cognitive development and develop new learning patterns. === LOVOT === Lovot is a Japanese company robot whose only purpose is "to make you happy". It features over 50 sensors that mimic the behavior of a human baby or small pet, a 360° camera with a microphone, the ability to distinguish humans from objects, neoteny eyes, and an internal warmth of 30° celsius. An interactive Lovot Café was opened in Japan October 3, 2020. === NICOBO === Nicobo was developed by Panasonic and was influenced by the loneliness of lockdowns created as a measure of the COVID-19 pandemic. It was designed to appear vulnerable, which creates empathy in its owners. Nicobo's name derives from the Japanese word for "smile". It wags its tail, engages in baby talk, and stays as a housemate. === Hyodol === Hyodol is an advanced care robot designed to support the elderly by reminding them to take their medications and monitoring their movements to keep their guardians informed. Additionally, this innovative robot can detect and respond to the emotional states of its elderly users, adding a layer of personalized care. Hyodol is designed with the appearance and speech style of a 7-year-old Korean grandchild, featuring a soft fabric exterior and user interaction methods such as striking the head or patting the back. It is equipped with various sensors and wireless communication technologies to collect and process data, supporting mobile apps and PC web monitoring systems for remote monitoring from anywhere. In South Korea, approximately 10,000 Hyodol robots are deployed to the homes of elderly individuals living alone, providing essential support and companionship. Local governments, including provincial and county offices, have embraced Hyodol as a solution to address social challenges stemming from the country's rapidly aging society.Furthermore, the robot is widely utilized in the treatment of dementia patients at a university hospital in Gangwon province. Hyodol was honored with the Mobile World Congress (MWC) Global Mobile Awards (GLOMO) in the "Best Mobile Innovation for Connected Health and Wellbeing" category on February 29, 2024. === Moxie === Moxie was a companion robot for autistic children developed by a company called Embodied. Although it had limited motion, it presented itself as a lifelike avatar. It was designed to help the children learn emotional cognition, using remotely hosted large language models to direct its respons

    Read more →
  • Direct voice input

    Direct voice input

    Direct voice input (DVI), sometimes called voice input control (VIC), is a style of human–machine interaction "HMI" in which the user makes voice commands to issue instructions to the machine through speech recognition. In the field of military aviation, DVI has been introduced into the cockpits of several modern military aircraft, such as the Eurofighter Typhoon, the Lockheed Martin F-35 Lightning II, the Dassault Rafale, the KF-21 Boramae and the Saab JAS 39 Gripen. Such systems have also been used for various other purposes, including industry control systems and speech recognition assistance for impaired individuals. == Overview == DVI systems can be divided into two major categories of functionality: "user-dependent" or "user-independent". A user-dependent system requires that a personal voice template to be generated for a specific person; the template for this individual has to be loaded onto their assigned machine prior to use of the DVI system for it to function properly. In contrast, a user-independent system does not require any personal voice template, being intended to respond correctly to the voice of any user. They can also be categorised between "discrete recognition" and "continuous recognition". Users of a discrete recognition system must pause between each word so that the DVI system can identify the separations between each word, while a continuous speech recognition system is capable of understanding a normal rate of speech. During the mid-2000s, researchers at the National Aerospace Laboratory in the Netherlands examined the use of DVI in the "GRACE" simulator; a total of twelve pilots participated in the ensuing experiment. The tests performed reportedly revealed that, while the hardware itself functioned well, several improvements were desirable prior to real-world deployment on aircraft since DVI operations actually consumed more time in comparison to traditional existing methods. Recommendations for improvements included the adoption of simpler syntax, the achievement of a greater recognition rate, and a decrease in response times; all of the issues encountered were determined to be of a technological nature, and were deemed feasible to resolve. The researchers concluded that in cockpits, especially during emergencies where pilots have to operate entirely on their own, a DVI system could be highly relevant, but that it was not of crucial importance during most other conceivable scenarios. Around the same time, evaluations of DVI systems for civil aviation purposes were conducted within the framework of Project SafeSound, coordinated by the European Union. It involved the observation of pilot workloads in real-world cockpits and contrasting them against pilot activity in flight simulators using both conventional systems and DVI assistance. The project aimed to enhance aviation safety and to decrease the workload in both ground and flight operations via the application of enhanced audio functions. == Applications == === Aviation === Prior to its widespread deployment, a handful of conventional military aircraft were converted to trial DVI systems; examples include the Harrier AV-8B and F-16 VISTA. In another case, a General Dynamics F-16 Fighting Falcon simulator was modified with DVI for a voice control study that was undertaken by the Royal Netherlands Air Force. DVI trials have also been conducted on helicopters, including the Boeing AH-64 Apache, showing the potential to improve flight safety and mission effectiveness. Numerous modern fighter aircraft have been outfitted with DVI systems, often in combination with various other man-machine interface schemes, such as HOTAS-compliant controls and other advanced control technologies. The combination of Voice and HOTAS control schemes has sometimes been referred to as the "V-TAS" concept. A prominent fighter aircraft to be furnished with a V-TAS cockpit is the Eurofighter Typhoon. The Lockheed Martin F-35 Lightning II also features a DVI system, which was developed by Adacel. Other examples includes the Dassault Rafale and the Saab JAS 39 Gripen. Numerous aircraft have been planned to use DVI. At one stage, the United States Air Force had sought to integrate DVI upon the Lockheed Martin F-22 Raptor; however, the technology was eventually judged to pose too many technical risks at that point in time, and thus such efforts were abandoned. === Personal === By 1990, working prototypes of speech recognition systems were being demonstrated; these were being promoted for the purpose of providing an effective man-machine interface for individuals with impaired speech. Techniques employed included time-encoded digital speech and automatic token set selection. Investigations of these early DVI systems reportedly included the use of automatic diagnostic routines and limited-scale trials using volunteers. During the 2010s, various companies were offering voice recognition systems to the general public in the form of personal digital assistants. One example is the Google Voice service, which allows users to pose questions via a DVI package installed on either a personal computer, tablet, or mobile phone. Numerous digital assistants have been developed, such as Amazon Echo, Siri, and Cortana, that use DVI to interact with users. === Commercial === DVI technology has enabled automated telephone systems to be widely deployed. Many companies commonly use centralised phone systems that route callers to the correct department via such methods. Various car manufacturers have also furnished their road vehicles with DVI systems; these typically allow drivers to control infotainment systems and interact with mobile phones with more convenience than legacy methods. During the late 1980s, investigations into the use of DVI systems for controlling CNC machines and other manufacturing apparatus were underway. During the 2010s, such systems were being used for logistics and warehouse management purposes.

    Read more →