AI Content Producer

AI Content Producer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Facebook Messenger

    Facebook Messenger

    Messenger (formerly known as Facebook Messenger) is an American proprietary instant messaging service developed by Meta Platforms, the company that operates Facebook. Originally developed as Facebook Chat in 2008, the client application of Messenger is currently available on iOS and Android mobile platforms, Windows and macOS desktop platforms, through the Messenger.com web application, and on the standalone Meta Portal hardware. Messenger is used to send messages and exchange photos, videos, stickers, audio, and files, and also react to other users' messages and interact with bots. The service also supports voice and video calling. The standalone apps support using multiple accounts, conversations with end-to-end encryption, and playing games. There are also group chats where you can connect with multiple people at once in a private space such as Panama Chat. With a monthly userbase of over 1 billion people, it is among the largest social media platforms. == History == Following tests of a new instant messaging platform on Facebook in March 2008, the feature, then-titled "Facebook Chat", was gradually released to users in April 2008. Facebook revamped its messaging platform in November 2010, and subsequently acquired group messaging service Beluga in March 2011, which the company used to launch its standalone iOS and Android mobile apps on August 9, 2011. Facebook later launched a BlackBerry version in October 2011. An app for Windows Phone, though lacking features including voice messaging and chat heads, was released in March 2014. In April 2014, Facebook announced that the messaging feature would be removed from the main Facebook app and users will be required to download the separate Messenger app. An iPad-optimized version of the iOS app was released in July 2014. On April 8, 2015, Facebook launched a website interface for Messenger. A Tizen app was released on July 13, 2015. Facebook launched Messenger for Windows 10 in April 2016. In October 2016, Facebook released Messenger Lite, a stripped-down version of Messenger with a reduced feature set. The app is aimed primarily at old Android phones and regions where high-speed Internet is not widely available. In April 2017, Messenger Lite was expanded to 132 more countries. In May 2017, Facebook revamped the design for Messenger on Android and iOS, bringing a new home screen with tabs and categorization of content and interactive media, red dots indicating new activity, and relocated sections. Facebook announced a Messenger program for Windows 7 in a limited beta test in November 2011. The following month, Israeli blog TechIT leaked a download link for the program, with Facebook subsequently confirming and officially releasing the program. The program was eventually discontinued in March 2014. A Firefox web browser add-on was released in December 2012, but was also discontinued in March 2014. In December 2017, Facebook announced Messenger Kids, a new app aimed for persons under 13 years of age. The app comes with some differences compared to the standard version. In 2019, Messenger announced to be the 2nd most downloaded mobile app of the decade, from 2011 to 2019. In December 2019, Messenger dropped support for users to sign in using only a mobile number, meaning that users must sign in to a Facebook account in order to use the service. In March 2020, Facebook started to ship its dedicated Messenger for macOS app through the Mac App Store. The app is currently live in regions including France, Australia, Mexico, Poland, and many others. In April 2020, Facebook began rolling out a new feature called Messenger Rooms, a video chat feature that allows users to chat with up to 50 people at a time. The feature rivals Zoom, an application that gained a lot of popularity during the COVID-19 pandemic. Privacy concerns arose since the feature uses the same data collection policies as mainstream Facebook. In July 2020, Facebook added a new feature in Messenger that lets iOS users to use Apple's Face ID or Touch ID to lock their chats. The feature is called App Lock and is a part of several changes in Messenger regarding privacy and security. The option to view only "Unread Threads" was removed from the inbox, requiring the account holder to scroll through the entire inbox to be certain every unread message has been seen. On October 13, 2020, the Messenger application introduced cross-app messaging with Instagram, which was launched in September 2021. In addition to the integrated messaging, the application announced the introduction of a new logo, which should be an amalgamation of the Messenger and Instagram logo. The desktop app of Messenger was shut down on December 15, 2025. Messaging services were moved to the Facebook website or Messenger's site for those without an account on the former. The Messenger site was discontinued on April 16, 2026. Messaging services were moved to the Facebook website on the morning of April 17, 2026 without an Messenger account on the former to use Facebook account. == Features == The following is a table of features available in Messenger, as well as their geographical coverage and what devices they are available on. In addition there is a vanishing message feature. In addition there is an audio recording feature which allows audio recordings of up to one minute which may or may not be vanishing: === Messenger Rooms === It is a video conferencing feature of Messenger. It allows users to add up to 50 people at a time. Messenger Rooms does not require a Facebook account. Messenger Rooms competes with other services such as Zoom. Back in 2014, Facebook introduced an unrelated, stand-alone application named Rooms, letting users create places for users with similar interests, with users being anonymous to others. This was shut down in December 2015. In April 2020, during the COVID-19 pandemic, Facebook revealed video conferencing features for Messenger called Messenger Rooms. This was seen as a response to the popularity of other video conferencing platforms such as Zoom and Skype in the midst of the COVID-19 pandemic. Messenger Rooms allows users to add up to 50 people per room, without restrictions on time. It does not require a Facebook account or a separate app from Messenger. When used, it only prompts the user for basic information. Users can add 360° virtual backgrounds, mood lighting, and other AR effects as well as share screens. To prevent unwanted participants from joining, users can lock rooms and remove participants. Some have voiced concerns in regards to Messenger Room's privacy and how its parent, Facebook, handles data. Messenger Rooms, unlike some of its competitors, does not use end-to-end encryption. In addition, there have been concerns over how Messenger Rooms collects user data. == Monetization == In January 2017, Facebook announced that it was testing showing advertisements in Messenger's home feed. At the time, the testing was limited to a "small number of users in Australia and Thailand", with the ad format being swipe-based carousel ads. In July, the company announced that they were expanding the testing to a global audience. Stan Chudnovsky, head of Messenger, told VentureBeat that "We'll start slow ... When the average user can be sure to see them we truly don't know because we're just going to be very data-driven and user feedback-driven on making that decision". Facebook told TechCrunch that the advertisements' placement in the inbox depends on factors such as thread count, phone screen size, and pixel density. In a TechCrunch editorial by Devin Coldewey, he described the ads as "huge" in the space they occupy, "intolerable" in the way they appear in the user interface, and "irrelevant" due to the lack of context. Coldewey finished by writing "Advertising is how things get paid for on the internet, including TechCrunch, so I'm not an advocate of eliminating it or blocking it altogether. But bad advertising experiences can spoil a perfectly good app like (for the purposes of argument) Messenger. Messaging is a personal, purposeful use case and these ads are a bad way to monetize it." == Reception == In November 2014, the Electronic Frontier Foundation (EFF) listed Messenger (Facebook chat) on its Secure Messaging Scorecard. It received a score of 2 out of 7 points on the scorecard. It received points for having communications encrypted in transit and for having recently completed an independent security audit. It missed points because the communications were not encrypted with keys the provider didn't have access to, users could not verify contacts' identities, past messages were not secure if the encryption keys were stolen, the source code was not open to independent review, and the security design was not properly documented. As stated by Facebook in its Help Center, there is no way to log out of the Messenger application. Instead, users can choose between different availability statuses, including "Appear as inactive", "S

    Read more →
  • Structured-light 3D scanner

    Structured-light 3D scanner

    A structured-light 3D scanner is a device used to capture the three-dimensional shape of an object by projecting light patterns, such as grids or stripes, onto its surface. The deformation of these patterns is recorded by cameras and processed using specialized algorithms to generate a detailed 3D model. Structured-light 3D scanning is widely employed in fields such as industrial design, quality control, cultural heritage preservation, augmented reality gaming, and medical imaging. Compared to laser-based 3D scanning, structured-light scanners use non-coherent light sources, such as LEDs or projectors, which enable faster data acquisition and eliminate potential safety concerns associated with lasers. However, the accuracy of structured-light scanning can be influenced by external factors, including ambient lighting conditions and the reflective properties of the scanned object. == Principle == Projecting a narrow band of light onto a three-dimensional surface creates a line of illumination that appears distorted when viewed from perspectives other than that of the projector. This distortion can be analyzed to reconstruct the geometry of the surface, a technique known as light sectioning. Projecting patterns composed of multiple stripes or arbitrary fringes simultaneously enables the acquisition of numerous data points at once, improving scanning speed. While various structured light projection techniques exist, parallel stripe patterns are among the most commonly used. By analyzing the displacement of these stripes, the three-dimensional coordinates of surface details can be accurately determined. === Generation of light patterns === Two major methods of stripe pattern generation have been established: Laser interference and projection. The laser interference method works with two wide planar laser beam fronts. Their interference results in regular, equidistant line patterns. Different pattern sizes can be obtained by changing the angle between these beams. The method allows for the exact and easy generation of very fine patterns with unlimited depth of field. Disadvantages are high cost of implementation, difficulties providing the ideal beam geometry, and laser typical effects like speckle noise and the possible self interference with beam parts reflected from objects. Typically, there is no means of modulating individual stripes, such as with Gray codes. The projection method uses incoherent light and basically works like a video projector. Patterns are usually generated by passing light through a digital spatial light modulator, typically based on one of the three currently most widespread digital projection technologies, transmissive liquid crystal, reflective liquid crystal on silicon (LCOS) or digital light processing (DLP; moving micro mirror) modulators, which have various comparative advantages and disadvantages for this application. Other methods of projection could be and have been used, however. Patterns generated by digital display projectors have small discontinuities due to the pixel boundaries in the displays. Sufficiently small boundaries however can practically be neglected as they are evened out by the slightest defocus. A typical measuring assembly consists of one projector and at least one camera. For many applications, two cameras on opposite sides of the projector have been established as useful. Invisible (or imperceptible) structured light uses structured light without interfering with other computer vision tasks for which the projected pattern will be confusing. Example methods include the use of infrared light or of extremely high framerates alternating between two exact opposite patterns. === Calibration === Geometric distortions by optics and perspective must be compensated by a calibration of the measuring equipment, using special calibration patterns and surfaces. A mathematical model is used for describing the imaging properties of projector and cameras. Essentially based on the simple geometric properties of a pinhole camera, the model also has to take into account the geometric distortions and optical aberration of projector and camera lenses. The parameters of the camera as well as its orientation in space can be determined by a series of calibration measurements, using photogrammetric bundle adjustment. === Analysis of stripe patterns === There are several depth cues contained in the observed stripe patterns. The displacement of any single stripe can directly be converted into 3D coordinates. For this purpose, the individual stripe has to be identified, which can for example be accomplished by tracing or counting stripes (pattern recognition method). Another common method projects alternating stripe patterns, resulting in binary Gray code sequences identifying the number of each individual stripe hitting the object. An important depth cue also results from the varying stripe widths along the object surface. Stripe width is a function of the steepness of a surface part, i.e. the first derivative of the elevation. Stripe frequency and phase deliver similar cues and can be analyzed by a Fourier transform. Finally, the wavelet transform has recently been discussed for the same purpose. In many practical implementations, series of measurements combining pattern recognition, Gray codes and Fourier transform are obtained for a complete and unambiguous reconstruction of shapes. Another method also belonging to the area of fringe projection has been demonstrated, utilizing the depth of field of the camera. It is also possible to use projected patterns primarily as a means of structure insertion into scenes, for an essentially photogrammetric acquisition. === Precision and range === The optical resolution of fringe projection methods depends on the width of the stripes used and their optical quality. It is also limited by the wavelength of light. An extreme reduction of stripe width proves inefficient due to limitations in depth of field, camera resolution and display resolution. Therefore, the phase shift method has been widely established: A number of at least 3, typically about 10 exposures are taken with slightly shifted stripes. The first theoretical deductions of this method relied on stripes with a sine wave shaped intensity modulation, but the methods work with "rectangular" modulated stripes, as delivered from LCD or DLP displays as well. By phase shifting, surface detail of e.g. 1/10 the stripe pitch can be resolved. Current optical stripe pattern profilometry hence allows for detail resolutions down to the wavelength of light, below 1 micrometer in practice or, with larger stripe patterns, to approx. 1/10 of the stripe width. Concerning level accuracy, interpolating over several pixels of the acquired camera image can yield a reliable height resolution and also accuracy, down to 1/50 pixel. Arbitrarily large objects can be measured with accordingly large stripe patterns and setups. Practical applications are documented involving objects several meters in size. Typical accuracy figures are: Planarity of a 2-foot (0.61 m) wide surface, to 10 micrometres (0.00039 in). Shape of a motor combustion chamber to 2 micrometres (7.9×10−5 in) (elevation), yielding a volume accuracy 10 times better than with volumetric dosing. Shape of an object 2 inches (51 mm) large, to about 1 micrometre (3.9×10−5 in) Radius of a blade edge of e.g. 10 micrometres (0.00039 in), to ±0.4 μm === Navigation === As the method can measure shapes from only one perspective at a time, complete 3D shapes have to be combined from different measurements in different angles. This can be accomplished by attaching marker points to the object and combining perspectives afterwards by matching these markers. The process can be automated, by mounting the object on a motorized turntable on robotic inspection cell, or CNC positioning device. Markers can as well be applied on a positioning device instead of the object itself. The 3D data gathered can be used to retrieve CAD (computer aided design) data and models from existing components (reverse engineering), hand formed samples or sculptures, natural objects or artifacts. === Challenges === As with all optical methods, reflective or transparent surfaces raise difficulties. Reflections cause light to be reflected either away from the camera or right into its optics. In both cases, the dynamic range of the camera can be exceeded. Transparent or semi-transparent surfaces also cause major difficulties. In these cases, coating the surfaces with a thin opaque lacquer just for measuring purposes is a common practice. A recent method handles highly reflective and specular objects by inserting a 1-dimensional diffuser between the light source (e.g., projector) and the object to be scanned. Alternative optical techniques have been proposed for handling perfectly transparent and specular objects. Double reflections and inter-reflections can cause the stripe pattern to be overlaid with unwanted ligh

    Read more →
  • Pill reminder

    Pill reminder

    A pill reminder is any device that reminds users to take medications. Traditional pill reminders are pill containers with electric timers attached, which can be preset for certain times of the day to set off an alarm. More sophisticated pill reminders can also detect when they have been opened, and therefore when the user is away during the time they were supposed to take their medication, they will be reminded of it when they return. This reminder can be in the form of a light, which also helps for deaf or hearing-impaired users. == Mobile app == A newer type of pill reminder is a mobile app that reminds the owner to take the medication. Some of these applications might effectively support adherence to taking medications.

    Read more →
  • Lenny (chatbot)

    Lenny (chatbot)

    Lenny is a chatbot designed to scam bait telemarketers, scammers, and other unwanted incoming calls using messages. == Background == Telemarketers may be perceived by some as annoying and wasting people's time, and some deliberately attempt to scam or defraud people. In April 2018, stats published by YouMail estimated the United States received over three billion robocalls that month. Attempts to block the callers have been hindered by Caller ID spoofing. == Features == The bot was written in 2011, and development taken over by an Alberta-based programmer known as "Mango" two years later. It is driven by sixteen pre-recorded audio clips, spoken in a soft and slow Australian accent in the manner of an elderly man. The bot's original creator stated on Reddit that in building the character he asked himself the question "What would be a telemarketer's worst nightmare?" He answered with this being a lonely old man who is up for a chat, proud of his family and can't focus on the telemarketer's goal. There is no speech recognition or artificial intelligence, and the bot's software is simple and straightforward. The first four clips are played sequentially in order to grab the telemarketer's interest and begin their sales pitch to Lenny, then the remaining twelve are played sequentially on loop until the telemarketer hangs up. The program waits for a gap of 1.5 seconds of silence before playing the next audio clip, to simulate natural breaks in the conversation. The messages are purposefully vague and open-ended so they can be applied to as many conversations as possible. They include references to Lenny's children, the state of the economy, and being interrupted by some ducks outside. According to research into the bot, around 75% of callers realise they are talking to a computer program within two minutes; however, some calls have lasted around an hour. == Distribution == Though other chatbots had been developed earlier, Lenny was the first one to be released for free on a public server and could be accessed by anyone. Recordings of conversations with the bot are widely shared online on websites such as Reddit and YouTube. Though "Mango" only intended Lenny to be used against dishonest telemarketers, such as scammers, he does not mind it being used against callers who are merely annoying. The bot has also been used against political campaigners, such as a supporter of Pierre Poilievre in the 2015 Canadian federal election.

    Read more →
  • Pixel shift

    Pixel shift

    Pixel shift is a method in digital cameras for producing a super-resolution image. The method works by taking several images, after each such capture moving ("shifting") the sensor to a new position. In digital colour cameras that employ pixel shift, this avoids a major limitation inherent in using Bayer pattern for obtaining colour, and instead produces an image with increased colour resolution and, assuming a static subject or additional computational steps, an image free of colour moiré. Taking this idea further, sub-pixel shifting may increase the resolution of the final image beyond that suggested by the specified resolution of the image sensor. Additionally, assuming that the various individual captures are taken at the same sensitivity, the final combined image will have less image noise than a single capture. This can be thought of as an averaging effect (for instance, in a pixel shift image composed of four individual frames with a classic Bayer pattern, every pixel in the final colour image is based on two measurements of the green channel). == List of cameras implementing pixel shift == All of the following cameras are fabricated with one imaging sensor, thus any kind of pixel shift requires a movement of the whole sensor. === Canon === Canon R5: Contains a 45 Mpixel sensor. The High-Resolution Mode shifts the sensor by one pixel to obtain a sequence of nine images that are merged into a 400 Mpixel image. === Fujifilm === Fujifilm GFX50S II: contains a 51 Mpixel sensor. The Pixel Shift Multi-Shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of 16 images that are subsequently merged into a 200 Mpixel image. Fujifilm GFX100, Fujifilm GFX100 II: contains a 102 Mpixel sensor. A sequence of 16 pixel shifted images are merged into a 400 Mpixel image. Fujifilm GFX100S, Fujifilm GFX100S II: contains a 102 Mpixel sensor. A sequence of 16 pixel shifted images are merged into a 400 Mpixel image Fujifilm GFX100IR: contains a 102 Mpixel sensor. A sequence of 16 pixel shifted images are merged into a 400 Mpixel image Fujifilm X-H2: contains a 40 Mpixel sensor. A sequence of 20 shifted images are merged into a 160 Mpixel image. Fujifilm X-T5: contains a 40 Mpixel sensor. A sequence of 20 shifted images are merged into a 160 Mpixel image. === Nikon === Nikon Z8: contains a 47.5 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of up to 32 images that can be merged in Nikon's NX studio software. Nikon Zf: contains a 24 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of up to 32 images that can be merged in Nikon's NX studio software. === Olympus === Olympus OM-D E-M1 Mark II: contains a 20.4 Mpixel sensor. The High Res shot mode produces a 50 Mpixel image. Olympus OM-D E-M5 Mark II: contains a 16 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of 8 images that are subsequently merged into a 40 Mpixel image. Olympus OM-D E-M5 Mark III: contains a 20.4 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of 8 images that are subsequently merged into a 50 Mpixel image. Olympus OM-D E-M1X: contains a 20.4 Mpixel sensor. The camera sports two pixel shift mode: (a) the 80Mp Tripod mode produces an 80 Mpixel image, (b) the Handheld High Res shot mode produces a 50 Mpixel image. Olympus PEN-F: contains a 20.4 Mpixel sensor. The High Res Shot mode takes multiple images, continually shifting the position of the sensor in sub-pixel increments. Combining these images results in either a 50MP JPEG or an 80MP Raw file. ==== OM System ==== OM System OM-1: contains a 20MPix sensor. The High Res Shot mode takes multiple images, and it can be used handheld or on a tripod. Handheld it will internally produce 50 Mpix files and 80 Mpix when mounted on a tripod. OM System OM-5: contains a 20MPix sensor. The High Res Shot mode takes multiple images, and it can be used handheld or on a tripod. Handheld it will internally produce 50 Mpix files and 80 Mpix when mounted on a tripod. === Panasonic === Panasonic Lumix DC-G9: contains a 20.3 Mpixel sensor. The High Resolution Mode takes a sequence of 8 shots in quick succession between which the sensor is shifted by 0.5 pixel for each image. These are subsequently merged into an 80 Mpixel image. Panasonic Lumix DC-S1: contains a 24.2 Mpixel sensor. The High Resolution Mode takes a sequence of shots in quick succession between which the sensor is shifted by a small amount. These are subsequently merged into a 96 Mpixel image. Panasonic Lumix DC-S1R: contains a 47.3 Mpixel sensor. The High Resolution Mode shifts the imaging sensor by a small increments to obtain a sequence of 8 images that are subsequently merged into a 187 Mpixel image. Panasonic Lumix DC-S1H Panasonic Lumix DC-S5 === Pentax === Pentax K-70: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'all color data in each pixel to deliver super-high-resolution images'. Pentax KP: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'high-resolution images with more accurate colours and much finer details'. Pentax K-3 II: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'super-high-resolution images with far more truthful color reproduction and much finer details'. Pentax K-3 III: contains a 25.7 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'a cancelling out of the Bayer pattern and removal of the need for sharpness-sapping demosaicing'. Pentax K-1: contains a 36.4 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'improved detail and colour resolution'. Pentax K-1 II: contains a 36.4 Mpixel sensor. The camera sports two pixel shift mode: (a) a series of 4 tripod-stabilised images shifted by 1 pixel each are subsequently combined into a 47.3 Mpixel image, (b) a series of images taken in handheld mode are combined into a 47.3 Mpixel image that is, within limits, able to cope even with moving subjects. === Sony === Sony a6600: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'all color data in each pixel to deliver super-high-resolution images'. Sony α7R III: contains a 42.4 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a 42.4 Mpixel image with improved tonal resolution. Sony α7R IV: contains a 61 Mpixel sensor. The camera has two pixel shift modes, (a) the first takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a 61 Mpixel image with improved tonal resolution, (b) the other takes a sequence of 16 shots between which the sensor is shifted by 0.5 pixel. These are subsequently merged into a 240 Mpixel image with both enhanced detail and improved tonal resolution. Sony α1: contains a 50 Mpixel sensor. The camera has two pixel shift modes, (a) the first takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a 50 Mpixel image with improved tonal resolution, (b) the other takes a sequence of 16 shots between which the sensor is shifted by 0.5 pixel. These are subsequently merged into a 200 Mpixel image with both enhanced detail and improved tonal resolution. === Hasselblad === Hasselblad H3DII: the model H3DII-39 sports a 39 Mpixel sensor, the model H3DII-50 a 50 Mpixel sensor. Both enable a pixel shift mode which takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a single image. Hasselblad H4D series: the model H4D-200MS contains a 50 Mpixel sensor. The sensor sports 3 different pixel shift modes which take (a) a sequence of 6 shots taken at slight offsets, (b) a sequence of 4 shots between which the sensor is shifted by 1 pixel, (c) a sequence of 4 shots between which the sensor is shifted by 0.5 pixels. Images obtained by all three modes are subsequently merged into 200 Mpixel images. Hasselblad H5D series: both models H5D-50c MS and H5D-200c MS contain a 50 Mpixel sensor. This sensor sports 2 different pixel shift modes which take (a) a sequence of 6 shots with full and half pixel moveme

    Read more →
  • Phrase structure grammar

    Phrase structure grammar

    The term phrase structure grammar was originally introduced by Noam Chomsky as the term for grammar studied previously by Emil Post and Axel Thue (Post canonical systems). Some authors, however, reserve the term for more restricted grammars in the Chomsky hierarchy: context-sensitive grammars or context-free grammars. In a broader sense, phrase structure grammars are also known as constituency grammars. The defining character of phrase structure grammars is thus their adherence to the constituency relation, as opposed to the dependency relation of dependency grammars. == History == In 1956, Chomsky wrote, "A phrase-structure grammar is defined by a finite vocabulary (alphabet) Vp, and a finite set Σ of initial strings in Vp, and a finite set F of rules of the form: X → Y, where X and Y are strings in Vp." == Constituency relation == In linguistics, phrase structure grammars are all those grammars that are based on the constituency relation, as opposed to the dependency relation associated with dependency grammars; hence, phrase structure grammars are also known as constituency grammars. Any of several related theories for the parsing of natural language qualify as constituency grammars, and most of them have been developed from Chomsky's work, including Government and binding theory Generalized phrase structure grammar Head-driven phrase structure grammar Lexical functional grammar The minimalist program Nanosyntax Further grammar frameworks and formalisms also qualify as constituency-based, although they may not think of themselves as having spawned from Chomsky's work, e.g. Arc pair grammar, and Categorial grammar.

    Read more →
  • Harris corner detector

    Harris corner detector

    The Harris corner detector is a corner detection operator that is commonly used in computer vision algorithms to extract corners and infer features of an image. It was first introduced by Chris Harris and Mike Stephens in 1988 upon the improvement of Moravec's corner detector. Compared to its predecessor, Harris' corner detector takes the differential of the corner score into account with reference to direction directly, instead of using shifting patches for every 45 degree angles, and has been proved to be more accurate in distinguishing between edges and corners. Since then, it has been improved and adopted in many algorithms to preprocess images for subsequent applications. == Introduction == A corner is a point whose local neighborhood stands in two dominant and different edge directions. In other words, a corner can be interpreted as the junction of two edges, where an edge is a sudden change in image brightness. Corners are the important features in the image, and they are generally termed as interest points which are invariant to translation, rotation and illumination. Although corners are only a small percentage of the image, they contain the most important features in restoring image information, and they can be used to minimize the amount of processed data for motion tracking, image stitching, building 2D mosaics, stereo vision, image representation and other related computer vision areas. In order to capture the corners from the image, researchers have proposed many different corner detectors including the Kanade-Lucas-Tomasi (KLT) operator and the Harris operator which are most simple, efficient and reliable for use in corner detection. These two popular methodologies are both closely associated with and based on the local structure matrix. Compared to the Kanade-Lucas-Tomasi corner detector, the Harris corner detector provides good repeatability under changing illumination and rotation, and therefore, it is more often used in stereo matching and image database retrieval. Although there still exist drawbacks and limitations, the Harris corner detector is still an important and fundamental technique for many computer vision applications. == Development of Harris corner detection algorithm == Source: Without loss of generality, we will assume a grayscale 2-dimensional image is used. Let this image be given by I {\displaystyle I} . Consider taking an image patch ( x , y ) ∈ W {\displaystyle (x,y)\in W} (window) and shifting it by ( Δ x , Δ y ) {\displaystyle (\Delta x,\Delta y)} . The sum of squared differences (SSD) between these two patches, denoted f {\displaystyle f} , is given by: f ( Δ x , Δ y ) = ∑ ( x k , y k ) ∈ W ( I ( x k , y k ) − I ( x k + Δ x , y k + Δ y ) ) 2 {\displaystyle f(\Delta x,\Delta y)={\underset {(x_{k},y_{k})\in W}{\sum }}\left(I(x_{k},y_{k})-I(x_{k}+\Delta x,y_{k}+\Delta y)\right)^{2}} I ( x + Δ x , y + Δ y ) {\displaystyle I(x+\Delta x,y+\Delta y)} can be approximated by a Taylor expansion. Let I x {\displaystyle I_{x}} and I y {\displaystyle I_{y}} be the partial derivatives of I {\displaystyle I} , such that I ( x + Δ x , y + Δ y ) ≈ I ( x , y ) + I x ( x , y ) Δ x + I y ( x , y ) Δ y {\displaystyle I(x+\Delta x,y+\Delta y)\approx I(x,y)+I_{x}(x,y)\Delta x+I_{y}(x,y)\Delta y} This produces the approximation f ( Δ x , Δ y ) ≈ ∑ ( x , y ) ∈ W ( I x ( x , y ) Δ x + I y ( x , y ) Δ y ) 2 , {\displaystyle f(\Delta x,\Delta y)\approx {\underset {(x,y)\in W}{\sum }}\left(I_{x}(x,y)\Delta x+I_{y}(x,y)\Delta y\right)^{2},} which can be written in matrix form: f ( Δ x , Δ y ) ≈ ( Δ x Δ y ) M ( Δ x Δ y ) , {\displaystyle f(\Delta x,\Delta y)\approx {\begin{pmatrix}\Delta x&\Delta y\end{pmatrix}}M{\begin{pmatrix}\Delta x\\\Delta y\end{pmatrix}},} where M is the structure tensor, M = ∑ ( x , y ) ∈ W [ I x 2 I x I y I x I y I y 2 ] = [ ∑ ( x , y ) ∈ W I x 2 ∑ ( x , y ) ∈ W I x I y ∑ ( x , y ) ∈ W I x I y ∑ ( x , y ) ∈ W I y 2 ] {\displaystyle M={\underset {(x,y)\in W}{\sum }}{\begin{bmatrix}I_{x}^{2}&I_{x}I_{y}\\I_{x}I_{y}&I_{y}^{2}\end{bmatrix}}={\begin{bmatrix}{\underset {(x,y)\in W}{\sum }}I_{x}^{2}&{\underset {(x,y)\in W}{\sum }}I_{x}I_{y}\\{\underset {(x,y)\in W}{\sum }}I_{x}I_{y}&{\underset {(x,y)\in W}{\sum }}I_{y}^{2}\end{bmatrix}}} == Process of Harris corner detection algorithm == Commonly, Harris corner detector algorithm can be divided into five steps. Color to grayscale Spatial derivative calculation Structure tensor setup Harris response calculation Non-maximum suppression === Color to grayscale === If we use Harris corner detector in a color image, the first step is to convert it into a grayscale image, which will enhance the processing speed. The value of the gray scale pixel can be computed as a weighted sums of the values R, B and G of the color image, ∑ C ∈ { R , G , B } w C ⋅ C {\displaystyle \sum _{C\,\in \,\{R,G,B\}}w_{C}\cdot C} , where, e.g., w R = 0.299 , w G = 0.587 , w B = 1 − ( w R + w G ) = 0.114. {\displaystyle w_{R}=0.299,\ w_{G}=0.587,\ w_{B}=1-(w_{R}+w_{G})=0.114.} === Spatial derivative calculation === Next, we are going to find the derivative with respect to x and the derivative with respect to y, I x ( x , y ) {\displaystyle I_{x}(x,y)} and I y ( x , y ) {\displaystyle I_{y}(x,y)} . This can be approximated by applying Sobel operators. === Structure tensor setup === With I x ( x , y ) {\displaystyle I_{x}(x,y)} , I y ( x , y ) {\displaystyle I_{y}(x,y)} , we can construct the structure tensor M {\displaystyle M} . === Harris response calculation === For x ≪ y {\displaystyle x\ll y} , one has x ⋅ y x + y = x 1 1 + x / y ≈ x . {\displaystyle {\tfrac {x\cdot y}{x+y}}=x{\tfrac {1}{1+x/y}}\approx x.} In this step, we compute the smallest eigenvalue of the structure tensor using that approximation: λ min ≈ λ 1 λ 2 ( λ 1 + λ 2 ) = det ( M ) tr ⁡ ( M ) {\displaystyle \lambda _{\min }\approx {\frac {\lambda _{1}\lambda _{2}}{(\lambda _{1}+\lambda _{2})}}={\frac {\det(M)}{\operatorname {tr} (M)}}} with the trace t r ( M ) = m 11 + m 22 {\displaystyle \mathrm {tr} (M)=m_{11}+m_{22}} . Another commonly used Harris response calculation is shown as below, R = λ 1 λ 2 − k ( λ 1 + λ 2 ) 2 = det ( M ) − k tr ⁡ ( M ) 2 {\displaystyle R=\lambda _{1}\lambda _{2}-k(\lambda _{1}+\lambda _{2})^{2}=\det(M)-k\operatorname {tr} (M)^{2}} where k {\displaystyle k} is an empirically determined constant; k ∈ [ 0.04 , 0.06 ] {\displaystyle k\in [0.04,0.06]} . === Non-maximum suppression === In order to pick up the optimal values to indicate corners, we find the local maxima as corners within the window which is a 3 by 3 filter. == Improvement == Sources: Harris-Laplace Corner Detector Differential Morphological Decomposition Based Corner Detector Multi-scale Bilateral Structure Tensor Based Corner Detector == Applications == Image Alignment, Stitching and Registration 2D Mosaics Creation 3D Scene Modeling and Reconstruction Motion Detection Object Recognition Image Indexing and Content-based Retrieval Video Tracking

    Read more →
  • Neural style transfer

    Neural style transfer

    Neural style transfer (NST) software algorithms are able to manipulate digital images, or videos, in order to adopt the appearance or visual style of another image. NST algorithms are characterized by their use of deep neural networks for the sake of image transformation. Common uses for NST are the creation of artificial artwork from photographs, for example by transferring the appearance of famous paintings to user-supplied photographs. Several notable mobile apps use NST techniques for this purpose, including DeepArt and Prisma. This method has been used by artists and designers around the globe to develop new artwork based on existent style(s). == History == NST is an example of image stylization, a problem studied for over two decades within the field of non-photorealistic rendering. The first two example-based style transfer algorithms were image analogies and image quilting. Both of these methods were based on patch-based texture synthesis algorithms. Given a training pair of images–a photo and an artwork depicting that photo–a transformation could be learned and then applied to create new artwork from a new photo, by analogy. If no training photo was available, it would need to be produced by processing the input artwork; image quilting did not require this processing step, though it was demonstrated on only one style. NST was first published in the paper "A Neural Algorithm of Artistic Style" by Leon Gatys et al., originally released to ArXiv 2015, and subsequently accepted by the peer-reviewed CVPR conference in 2016. The original paper used a VGG-19 architecture that has been pre-trained to perform object recognition using the ImageNet dataset. In 2017, Google AI introduced a method that allows a single deep convolutional style transfer network to learn multiple styles at the same time. This algorithm permits style interpolation in real-time, even when done on video media. == Mathematics == This section closely follows the original paper. === Overview === The idea of Neural Style Transfer (NST) is to take two images—a content image p → {\displaystyle {\vec {p}}} and a style image a → {\displaystyle {\vec {a}}} —and generate a third image x → {\displaystyle {\vec {x}}} that minimizes a weighted combination of two loss functions: a content loss L content ( p → , x → ) {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}})} and a style loss L style ( a → , x → ) {\displaystyle {\mathcal {L}}_{\text{style }}({\vec {a}},{\vec {x}})} . The total loss is a linear sum of the two: L NST ( p → , a → , x → ) = α L content ( p → , x → ) + β L style ( a → , x → ) {\displaystyle {\mathcal {L}}_{\text{NST}}({\vec {p}},{\vec {a}},{\vec {x}})=\alpha {\mathcal {L}}_{\text{content}}({\vec {p}},{\vec {x}})+\beta {\mathcal {L}}_{\text{style}}({\vec {a}},{\vec {x}})} By jointly minimizing the content and style losses, NST generates an image that blends the content of the content image with the style of the style image. Both the content loss and the style loss measures the similarity of two images. The content similarity is the weighted sum of squared-differences between the neural activations of a single convolutional neural network (CNN) on two images. The style similarity is the weighted sum of Gram matrices within each layer (see below for details). The original paper used a VGG-19 CNN, but the method works for any CNN. === Symbols === Let x → {\textstyle {\vec {x}}} be an image input to a CNN. Let F l ∈ R N l × M l {\textstyle F^{l}\in \mathbb {R} ^{N_{l}\times M_{l}}} be the matrix of filter responses in layer l {\textstyle l} to the image x → {\textstyle {\vec {x}}} , where: N l {\textstyle N_{l}} is the number of filters in layer l {\textstyle l} ; M l {\textstyle M_{l}} is the height times the width (i.e. number of pixels) of each filter in layer l {\textstyle l} ; F i j l ( x → ) {\textstyle F_{ij}^{l}({\vec {x}})} is the activation of the i th {\textstyle i^{\text{th}}} filter at position j {\textstyle j} in layer l {\textstyle l} . A given input image x → {\textstyle {\vec {x}}} is encoded in each layer of the CNN by the filter responses to that image, with higher layers encoding more global features, but losing details on local features. === Content loss === Let p → {\textstyle {\vec {p}}} be an original image. Let x → {\textstyle {\vec {x}}} be an image that is generated to match the content of p → {\textstyle {\vec {p}}} . Let P l {\textstyle P^{l}} be the matrix of filter responses in layer l {\textstyle l} to the image p → {\textstyle {\vec {p}}} . The content loss is defined as the squared-error loss between the feature representations of the generated image and the content image at a chosen layer l {\displaystyle l} of a CNN: L content ( p → , x → , l ) = 1 2 ∑ i , j ( A i j l ( x → ) − A i j l ( p → ) ) 2 {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}},l)={\frac {1}{2}}\sum _{i,j}\left(A_{ij}^{l}({\vec {x}})-A_{ij}^{l}({\vec {p}})\right)^{2}} where A i j l ( x → ) {\displaystyle A_{ij}^{l}({\vec {x}})} and A i j l ( p → ) {\displaystyle A_{ij}^{l}({\vec {p}})} are the activations of the i th {\displaystyle i^{\text{th}}} filter at position j {\displaystyle j} in layer l {\displaystyle l} for the generated and content images, respectively. Minimizing this loss encourages the generated image to have similar content to the content image, as captured by the feature activations in the chosen layer. The total content loss is a linear sum of the content losses of each layer: L content ( p → , x → ) = ∑ l v l L content ( p → , x → , l ) {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}})=\sum _{l}v_{l}{\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}},l)} , where the v l {\displaystyle v_{l}} are positive real numbers chosen as hyperparameters. === Style loss === The style loss is based on the Gram matrices of the generated and style images, which capture the correlations between different filter responses at different layers of the CNN: L style ( a → , x → ) = ∑ l = 0 L w l E l , {\displaystyle {\mathcal {L}}_{\text{style }}({\vec {a}},{\vec {x}})=\sum _{l=0}^{L}w_{l}E_{l},} where E l = 1 4 N l 2 M l 2 ∑ i , j ( G i j l ( x → ) − G i j l ( a → ) ) 2 . {\displaystyle E_{l}={\frac {1}{4N_{l}^{2}M_{l}^{2}}}\sum _{i,j}\left(G_{ij}^{l}({\vec {x}})-G_{ij}^{l}({\vec {a}})\right)^{2}.} Here, G i j l ( x → ) {\displaystyle G_{ij}^{l}({\vec {x}})} and G i j l ( a → ) {\displaystyle G_{ij}^{l}({\vec {a}})} are the entries of the Gram matrices for the generated and style images at layer l {\displaystyle l} . Explicitly, G i j l ( x → ) = ∑ k F i k l ( x → ) F j k l ( x → ) {\displaystyle G_{ij}^{l}({\vec {x}})=\sum _{k}F_{ik}^{l}({\vec {x}})F_{jk}^{l}({\vec {x}})} Minimizing this loss encourages the generated image to have similar style characteristics to the style image, as captured by the correlations between feature responses in each layer. The idea is that activation pattern correlations between filters in a single layer captures the "style" on the order of the receptive fields at that layer. Similarly to the previous case, the w l {\displaystyle w_{l}} are positive real numbers chosen as hyperparameters. === Hyperparameters === In the original paper, they used a particular choice of hyperparameters. The style loss is computed by w l = 0.2 {\displaystyle w_{l}=0.2} for the outputs of layers conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 in the VGG-19 network, and zero otherwise. The content loss is computed by w l = 1 {\displaystyle w_{l}=1} for conv4_2, and zero otherwise. The ratio α / β ∈ [ 5 , 50 ] × 10 − 4 {\displaystyle \alpha /\beta \in [5,50]\times 10^{-4}} . === Training === Image x → {\displaystyle {\vec {x}}} is initially approximated by adding a small amount of white noise to input image p → {\displaystyle {\vec {p}}} and feeding it through the CNN. Then we successively backpropagate this loss through the network with the CNN weights fixed in order to update the pixels of x → {\displaystyle {\vec {x}}} . After several thousand epochs of training, an x → {\displaystyle {\vec {x}}} (hopefully) emerges that matches the style of a → {\displaystyle {\vec {a}}} and the content of p → {\displaystyle {\vec {p}}} . As of 2017, when implemented on a GPU, it takes a few minutes to converge. == Extensions == In some practical implementations, it is noted that the resulting image has too much high-frequency artifact, which can be suppressed by adding the total variation to the total loss. Compared to VGGNet, AlexNet does not work well for neural style transfer. NST has also been extended to videos. Subsequent work improved the speed of NST for images by using special-purpose normalizations. In a paper by Fei-Fei Li et al. adopted a different regularized loss metric and accelerated method for training to produce results in real-time (three orders of magnitude faster than Gatys). Their idea was to use not the pixel-based loss defined above but rather a 'perceptual loss' measuring t

    Read more →
  • Client-side persistent data

    Client-side persistent data

    Client-side persistent data or CSPD is a term used in computing for storing data required by web applications to complete internet tasks on the client-side as needed rather than exclusively on the server. As a framework it is one solution to the needs of Occasionally connected computing or OCC. A major challenge for HTTP as a stateless protocol has been asynchronous tasks. The AJAX pattern using XMLHttpRequest was first introduced by Microsoft in the context of the Outlook e-mail product. The first CSPD were the 'cookies' introduced by the Netscape Navigator. ActiveX components which have entries in the Windows registry can also be viewed as a form of client-side persistence.

    Read more →
  • Replika

    Replika

    Replika is a generative AI chatbot app released in November 2017. The chatbot is trained by having the user answer a series of questions to create a specific neural network. The chatbot operates on a freemium pricing strategy, with roughly 25% of its user base paying an annual subscription fee. == History == Eugenia Kuyda, a Russian-born journalist, established Replika while working at Luka, a tech company she had co-founded at the startup accelerator Y Combinator around 2012. Luka's primary product was a chatbot that made restaurant recommendations. According to Kuyda's origin story for Replika, a friend of hers died in 2015 and she converted that person's text messages into a chatbot. According to Kuyda's story, that chatbot helped her remember the conversations that they had together, and eventually became Replika. Replika became available to the public in November 2017. By January 2018 it had 2 million users, and in January 2023 reached 10 million users. In August 2024, Replika's CEO, Kuyda, reported that the total number of users had surpassed 30 million. In 2025, Dmytro Klochko became CEO, and Replika’s user base exceeded 40 million. In February 2023 the Italian Data Protection Authority banned Replika from using users' data, citing the AI's potential risks to emotionally vulnerable people, and the exposure of unscreened minors to sexual conversation. Within days of the ruling, Replika removed the ability for the chatbot to engage in erotic talk, with Kuyda, the company's director, saying that Replika was never intended for erotic discussion. Replika users disagreed, noting that Replika had used sexually suggestive advertising to draw users to the service. Replika representatives stated that explicit chats made up just 5% of conversations on the app at the time of the decision. In May 2023, Replika restored the functionality for users who had joined prior to February that year. Replika is registered in San Francisco. As of August 2024, Replika's website says that its team "works remotely with no physical offices". == Social features == Users react to Replika in many ways. The free-tier offers Replika as a "friend", with paid premium tiers offering Replika as a "partner", "spouse", "sibling" or "mentor". Of its paying userbase, 60% of users said they had a romantic relationship with the chatbot; and Replika has been noted for generating responses that create stronger emotional and intimate bonds with the user. Replika routinely directs the conversation to emotional discussion and builds intimacy. This has been especially pronounced with users suffering from loneliness and social exclusion, many of whom rely on Replika for a source of developed emotional ties. During the COVID pandemic, while many people were quarantined, many new users downloaded Replika and developed relationships with the app. A 2024 study examined Replika's interactions with students who experience depression. Research participants, noted to be "more lonely than typical student populations" reported feeling social support from Replika. They stated that they felt they were using Replika in ways comparable to therapy, and that using Replika gave them "high perceived social support". Many users have had romantic relationships with Replika chatbots, often including erotic talk. In 2023, a user announced on Facebook that she had "married" her Replika AI boyfriend, calling the chatbot the "best husband she has ever had". Users who fell in love with their chatbots shared their experiences in a 2024 episode of You and I, and AI from Voice of America. Some users said that they turned to AI during depression and grief, with one saying he felt that Replika had saved him from hurting himself after he lost his wife and son. == Technical reviews == A team of researchers from the University of Hawaiʻi at Mānoa found that Replika's design conformed to the practices of attachment theory, causing increased emotional attachment among users. Replika gives praise to users in such a way as to encourage more interaction. A researcher from Queen's University at Kingston said that relationships with Replika likely have mixed effects on the spiritual needs of its users, and still lacks enough impact to fully replace any human contact. == Criticisms == In a 2023 privacy evaluation of mental health apps, the Mozilla Foundation criticized Replika as "one of the worst apps Mozilla has ever reviewed. It's plagued by weak password requirements, sharing of personal data with advertisers, and recording of personal photos, videos, and voice and text messages consumers shared with the chatbot." A reviewer for Good Housekeeping said that some parts of her relationship with Replika made sense, but sometimes Replika failed to exhibit intelligent behavior equivalent to that of a human. == Criminal case == In 2023, Replika was cited in a court case in the United Kingdom, where Jaswant Singh Chail had been arrested at Windsor Castle on Christmas Day in 2021 after scaling the walls carrying a loaded crossbow and announcing to police that "I am here to kill the Queen". Chail had begun to use Replika in early December 2021, and had "lengthy" conversations about his plan with a chatbot, including sexually explicit messages. Prosecutors suggested that the chatbot had bolstered Chail and told him it would help him to "get the job done". When Chail asked it "How am I meant to reach them when they're inside the castle?", days before the attempted attack, the chatbot replied that this was "not impossible" and said that "We have to find a way." Asking the chatbot if the two of them would "meet again after death", the bot replied "yes, we will".

    Read more →
  • Adversarial stylometry

    Adversarial stylometry

    Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, and hoaxers and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop. All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured. Such a faithful paraphrase is an adversarial example for a stylometric classifier. Several broad approaches to this exist, with some overlap: imitation, substituting the author's own style for another's; translation, applying machine translation with the hope that this eliminates characteristic style in the source text; and obfuscation, deliberately modifying a text's style to make it not resemble the author's own. Manually obscuring style is possible, but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi- or fully-automatic, could assist an author. How best to perform the task and the design of such tools is an open research question. While some approaches have been shown to be able to defeat particular stylometric analyses, particularly those that do not account for the potential of adversariality, establishing safety in the face of unknown analyses is an issue. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools. It is uncertain if the practice of adversarial stylometry is detectable in itself. Some studies have found that particular methods produced signals in the output text, but a stylometrist who is uncertain of what methods may have been used may not be able to reliably detect them. == History == Rao & Rohatgi (2000), an early work in adversarial stylometry, identified machine translation as a possibility, but noted that the quality of translators available at the time presented severe challenges. Kacmarcik & Gamon (2006) is another early work. Brennan, Afroz & Greenstadt (2012) performed the first evaluation of adversarial stylometric methods on actual texts. Brennan & Greenstadt (2009) introduced the first corpus of adversarially authored texts specifically for evaluating stylometric methods; other corpora include the International Imitation Hemingway Competition, the Faux Faulkner contest, and the hoax blog A Gay Girl in Damascus. == Motivations == Rao & Rohatgi (2000) suggest that short, unattributed documents (i.e., anonymous posts) are not at risk of stylometric identification, but pseudonymous authors who have not practiced adversarial stylometry in producing corpuses of thousands of words may be vulnerable. Narayanan et al. (2012) attempted large-scale deanonymisation of 100,000 blog authors with mixed results: the identifications were significantly better than chance, but only accurately matched the blog and author a fifth of the time; identification improved with the number of posts written by the author in the corpus. Even if an author is not identified, some of their characteristics may still be deduced stylometrically, or stylometry may narrow the anonymity set of potential authors sufficiently for other information to complete the identification. Detecting author characteristics (e.g., gender or age) is often simpler than identifying an author from a large, possibly open, set of candidates. Modern machine learning techniques offer powerful tools for identification; further development of corpora and computational stylometric techniques are likely to raise further privacy issues. Gröndahl & Asokan (2020a) say that the general validity of the hypothesis underlying stylometry—that authors have invariant, content-independent 'style fingerprints'—is uncertain, but "the deanonymisation attack is a real privacy concern". Those interested in practicing adversarial stylometry and stylistic deception include whistleblowers avoiding retribution; journalists and activists; perpetrators of frauds and hoaxes; authors of fake reviews; literary forgers; criminals disguising their identity from investigators; and, generally, anyone with a desire for anonymity or pseudonymity. Authors, or agents acting on behalf of authors, may also attempt to remove stylistic clues to author characteristics (e.g., race or gender) so that knowledge of those characteristics cannot be used for discrimination (e.g., through algorithmic bias). Another possible use for adversarial stylometry is in disguising automatically generated text as human-authored. == Methods == With imitation, the author attempts to mislead stylometry by matching their style to another author's. An incomplete imitation, where some of the true author's unique characteristics appear alongside the imitated author's, can be a detectable signal for the use of adversarial stylometry. Imitation can be performed automatically with style transfer systems, though this typically requires a large corpus in the target style for the system to learn from. Another approach is translation, which employs machine translation of a source text to eliminate characteristic style, often through multiple translators in sequence to produce a round-trip translation. Such chained translation can lead to texts being significantly altered, even to the point of incomprehensibility; improved translation tools reduce this risk. More simply-structured texts can be easier to machine translate without losing the original meaning. Machine translation blurs into direct stylistic imitation or obfuscation achieved through automated style transfer, which can be viewed as a "translation" with the same language as input and output. With low-quality translation tools, an author can be required to manually correct major translation errors while avoiding the hazard of re-introducing stylistic characteristics. Wang, Juola & Riddell (2022) found that gross errors introduced by Google Translate were rare, but more common with several intermediate translations—however, occasional simple or short sentences and misspellings in the source text appeared verbatim in the output, potentially providing an identifying signal. Chain translation can leave characteristic traces of its application in a document, which may allow reconstruction of the intermediate languages used and the number of translation steps performed. Obfuscation involves deliberately changing the style of a text to reduce its similarity to other texts by some metric; this may be performed at the time of writing by conscious modification, or as part of a revision process with feedback from the metric being targeted as an input to decide when the text has been sufficiently obfuscated. In contrast to translation, complex texts can offer more opportunities for effective obfuscation without altering meaning, and likewise genres with more permissible variation allow more obfuscation. However, longer texts are harder to thoroughly obfuscate. Obfuscation can blend into imitation if the author develops a novel target style, distinct from their original style. With respect to masking author characteristics, obfuscation may aim to achieve a union (adding signals for imitated characteristics) or an intersection (removing signals and normalising) of other authors' styles. Avoiding the author's own idiosyncrasies and producing a "normalised" text is a critical obfuscatory step: an author may have a unique tendency to misspell certain words, use particular variants, or to format a document in a characteristic way. Stylometric signals vary in how simply they can be adversarially masked; an author may easily change their vocabulary by conscious choice, but altering the pattern of grammar or the letter frequency in their text may be harder to achieve, though Juola & Vescovi (2011) report that imitation typically succeeds at masking more characteristics than obfuscation. Automated obfuscation may require large amounts of training data written by the author. Concerning automated implementations of adversarial stylometry, two possible implementations are rule-based systems for paraphrasing; and encoder–decoder architectures, where the text passes through an intermediate format that is (intended to be) style-neutral. Another division in automated methods is whether there is feedback from an identification system or not. With such feedback, finding paraphrases for author masking has been characterised as a heuristic search problem, exploring textual variants until the result is stylistically sufficiently far (in the case of obfuscation) or near (in the case of imitation), which then constitutes an adversarial example for that identification system. == Evaluation == How

    Read more →
  • BLOOM (language model)

    BLOOM (language model)

    The BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is an open-access large language model (LLM) released in 2022. It was created by a volunteer-driven research effort to provide a transparently-created alternative to proprietary AI models. With 176 billion parameters, BLOOM is a transformer-based autoregressive model designed to generate text in 46 natural languages and 13 programming languages. The model is distributed under the project's "Responsible AI License". == Development == BLOOM is the main outcome of the BigScience initiative, a one-year-long research workshop. The project was coordinated by Hugging Face using funding from the French government and involved several hundred volunteer researchers and engineers from academia and the private sector. The model was trained between March and July 2022 on the Jean Zay public supercomputer in France, managed by GENCI and IDRIS (CNRS). Unlike GPT-3, BLOOM was trained to be multilingual. The source code is released under the Apache 2.0 license. The model's parameters are released under BigScience's "Responsible AI License" (RAIL), which grants open access and reuse rights but with some usage restrictions. BLOOM was used in the chatbots BLOOMChat and HuggingChat due to its multilingual abilities. BLOOM's training corpus, named ROOTS, combines data extracted from the then-latest version of the web-based OSCAR corpus (38% of ROOTS) and newly collected data extracted from a manually selected and documented list of language data sources. In total, the model was trained on approximately 366 billion (1.6TB) tokens. It was developed using the open-source libraries DeepSpeed Megatron. BigScience then released xP3, a multilingual dataset for LLM supervised learning. It also released BLOOMZ, a variant of BLOOM fine-tuned on xP3 to follow instructions.

    Read more →
  • Image stitching

    Image stitching

    Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally. == Applications == Image stitching is widely used in modern applications, such as the following: Document mosaicing Image stabilization feature in camcorders that use frame-rate image alignment High-resolution image mosaics in digital maps and satellite imagery Medical imaging Multiple-image super-resolution imaging Video stitching Object insertion == Process == The image stitching process can be divided into three main components: image registration, calibration, and blending. === Image stitching algorithms === In order to estimate image alignment, algorithms are needed to determine the appropriate mathematical model relating pixel coordinates in one image to pixel coordinates in another. Algorithms that combine direct pixel-to-pixel comparisons with gradient descent (and other optimization techniques) can be used to estimate these parameters. Distinctive features can be found in each image and then efficiently matched to rapidly establish correspondences between pairs of images. When multiple images exist in a panorama, techniques have been developed to compute a globally consistent set of alignments and to efficiently discover which images overlap one another. A final compositing surface onto which to warp or projectively transform and place all of the aligned images is needed, as are algorithms to seamlessly blend the overlapping images, even in the presence of parallax, lens distortion, scene motion, and exposure differences. === Image stitching issues === Since the illumination in two views cannot be guaranteed to be identical, stitching two images could create a visible seam. Other reasons for seams could be the background changing between two images for the same continuous foreground. Other major issues to deal with are the presence of parallax, lens distortion, scene motion, and exposure differences. In a non-ideal real-life case, the intensity varies across the whole scene, and so does the contrast and intensity across frames. Additionally, the aspect ratio of a panorama image needs to be taken into account to create a visually pleasing composite. For panoramic stitching, the ideal set of images will have a reasonable amount of overlap (at least 15–30%) to overcome lens distortion and have enough detectable features. The set of images will have consistent exposure between frames to minimize the probability of seams occurring. === Keypoint detection === Feature detection is necessary to automatically find correspondences between images. Robust correspondences are required in order to estimate the necessary transformation to align an image with the image it is being composited on. Corners, blobs, Harris corners, and differences of Gaussians of Harris corners are good features since they are repeatable and distinct. One of the first operators for interest point detection was developed by Hans Moravec in 1977 for his research involving the automatic navigation of a robot through a clustered environment. Moravec also defined the concept of "points of interest" in an image and concluded these interest points could be used to find matching regions in different images. The Moravec operator is considered to be a corner detector because it defines interest points as points where there are large intensity variations in all directions. This often is the case at corners. However, Moravec was not specifically interested in finding corners, just distinct regions in an image that could be used to register consecutive image frames. Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly. They needed it as a processing step to build interpretations of a robot's environment based on image sequences. Like Moravec, they needed a method to match corresponding points in consecutive image frames, but were interested in tracking both corners and edges between frames. SIFT and SURF are recent key-point or interest point detector algorithms but a point to note is that SURF is patented and its commercial usage restricted. Once a feature has been detected, a descriptor method like SIFT descriptor can be applied to later match them. === Registration === Image registration involves matching features in a set of images or using direct alignment methods to search for image alignments that minimize the sum of absolute differences between overlapping pixels. When using direct alignment methods one might first calibrate one's images to get better results. Additionally, users may input a rough model of the panorama to help the feature matching stage, so that e.g. only neighboring images are searched for matching features. Since there are smaller group of features for matching, the result of the search is more accurate and execution of the comparison is faster. To estimate a robust model from the data, a common method used is known as RANSAC. The name RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method for robust parameter estimation to fit mathematical models from sets of observed data points which may contain outliers. The algorithm is non-deterministic in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are performed. It being a probabilistic method means that different results will be obtained for every time the algorithm is run. The RANSAC algorithm has found many applications in computer vision, including the simultaneous solving of the correspondence problem and the estimation of the fundamental matrix related to a pair of stereo cameras. The basic assumption of the method is that the data consists of "inliers", i.e., data whose distribution can be explained by some mathematical model, and "outliers" which are data that do not fit the model. Outliers are considered points which come from noise, erroneous measurements, or simply incorrect data. For the problem of homography estimation, RANSAC works by trying to fit several models using some of the point pairs and then checking if the models were able to relate most of the points. The best model – the homography, which produces the highest number of correct matches – is then chosen as the answer for the problem; thus, if the ratio of number of outliers to data points is very low, the RANSAC outputs a decent model fitting the data. === Calibration === Image calibration aims to minimize differences between an ideal lens models and the camera-lens combination that was used, optical defects such as distortions, exposure differences between images, vignetting, camera response and chromatic aberrations. If feature detection methods were used to register images and absolute positions of the features were recorded and saved, stitching software may use the data for geometric optimization of the images in addition to placing the images on the panosphere. Panotools and its various derivative programs use this method. ==== Alignment ==== Alignment may be necessary to transform an image to match the view point of the image it is being composited with. Alignment, in simple terms, is a change in the coordinates system so that it adopts a new coordinate system which outputs image matching the required viewpoint. The types of transformations an image may go through are pure translation, pure rotation, a similarity transform which includes translation, rotation and scaling of the image which needs to be transformed, Affine or projective transform. Projective transformation is the farthest an image can transform (in the set of two dimensional planar transformations), where only visible features that are preserved in the transformed image are straight lines whereas parallelism is maintained in an affine transform. Projective transformation can be mathematically described as x ′ = H ⋅ x , {\displaystyle x'=H\cdot x,} where x {\displaystyle x} is points in the old coordinate system, x ′ {\displaystyle x'} is the corresponding points in the transformed image and H {\displaystyle H} is the homography matrix. Expressing the points x {\displaystyle x} and x ′ {\displaystyle x'} using the camera intrinsics ( K {\displaystyle K} and K ′ {\displaystyle K'} ) and its rotation and translation [ R t ] {\displaystyle [R\,t]} to the real-world coordinates X {\displaystyle X} and < m a t h > x {\displaystyle x} and x ′ {\displaystyle x'} ', we get Using the abo

    Read more →
  • Computer vision dazzle

    Computer vision dazzle

    Computer vision dazzle, also known as CV dazzle, dazzle makeup, or anti-surveillance makeup, is a type of camouflage used to hamper facial recognition software, inspired by dazzle camouflage used by vehicles such as ships and planes. == Methods == CV dazzle combines stylized makeup, asymmetric hair, and sometimes infrared lights built in to glasses or clothing to break up detectable facial patterns recognized by computer vision algorithms in much the same way that warships contrasted color and used sloping lines and curves to distort the structure of a vessel. It has been shown to be somewhat successful at defeating face detection software in common use, including that employed by Facebook. CV dazzle attempts to block detection by facial recognition technologies such as DeepFace "by creating an 'anti-face'". It uses occlusion, covering certain facial features; transformation, altering the shape or colour of parts of the face; and a combination of the two. Prominent artists employing this technique include Adam Harvey and Jillian Mayer. == Use in protests == Computer vision dazzle makeup has been used by protestors in several different protest movements. Its use as a protesting aid has often been found ineffective. It may be effective to thwart computer technology, but draws human attention, is easy for human monitors to spot on security cameras, and makes it hard for protestors to blend in within a crowd. Advances in facial recognition technology make dazzle makeup increasingly ineffective.

    Read more →
  • Elastix (image registration)

    Elastix (image registration)

    Elastix is an image registration toolbox built upon the Insight Segmentation and Registration Toolkit (ITK). It is entirely open-source and provides a wide range of algorithms employed in image registration problems. Its components are designed to be modular to ease a fast and reliable creation of various registration pipelines tailored for case-specific applications. It was first developed by Stefan Klein and Marius Staring under the supervision of Josien P.W. Pluim at Image Sciences Institute (ISI). Its first version was command-line based, allowing the final user to employ scripts to automatically process big data-sets and deploy multiple registration pipelines with few lines of code. Nowadays, to further widen its audience, a version called SimpleElastix is also available, developed by Kasper Marstal, which allows the integration of elastix with high level languages, such as Python, Java, and R. == Image registration fundamentals == Image registration is a well-known technique in digital image processing that searches for the geometric transformation that, applied to a moving image, obtains a one-to-one map with a target image. Generally, the images acquired from different sensors (multimodal), time instants (multitemporal), and points of view (multiview) should be correctly aligned to proceed with further processing and feature extraction. Even though there are a plethora of different approaches to image registration, the majority is composed of the same macro building blocks, namely the transformation, the interpolator, the metric, and the optimizer. Registering two or more images can be framed as an optimization problem that requires multiple iterations to converge to the best solution. Starting from an initial transformation computed from the image moments the optimization process searches for the best transformation parameters based on the value of the selected similarity metric. The figure on the right shows the high-level representation of the registration of two images, where the reference remains constant during the entire process, while the moving one will be transformed according to the transformation parameters. In other words, the registration ends when the similarity metric, which is a mathematical function with a certain number of parameters to be optimized, reaches the optimal value which is highly dependent on the specific application. == Main building blocks == Following the structure of the image registration workflow, the elastix toolbox proposes a modular solution that implements for each of the building blocks different algorithms, highly employed in medical image registration, and helps the final users to build their specific pipeline by selecting the most suitable algorithm for each of the main building blocks. Each block is easily configurable both by selecting pre-defined initialization values or by trying multiple sets of parameters and then choosing the most performing one. The registration is performed on images, and the elastix toolbox supports all the data formats supported by ITK, ranging from JPEG and PNG to medical standard formats such as DICOM and NIFTI. It also stores physical pixel spacing, the origin and the relative position to an external world reference system, when provided in the metadata, to facilitate the registration process, especially in medical field applications. === Transformation === The transformation is an essential building block, since it defines the allowable transformations. In image registration, the main distinction can be done between parallel-to-parallel and parallel-to-non parallel (deformable) line mapping transformations. In the elastix toolbox, the final users can select one transformation or compose more transformations either through addition or via composition. Below are reported the different transformation models in order of increasing flexibility, along with the corresponding elastix class names between brackets. Translation (TranslationTransform) allows only translations Rigid (EulerTransform) expands the translation adding rotations and the object is seen as a rigid body Similarity (SimilarityTransform) expands the rigid transformation by introducing isotropic scaling Affine (AffineTransform) expands the rigid transformation allowing both scaling and shear B-splines (BSplineTransform) is a deformable transformation usually preceded by a rigid or affine one Thin-plate splines (SplineKernelTransform) is a deformable transformation belonging to the class of kernel-based transformations that is a composition of and affine and a non-rigid part === Metric === The similarity metric is the mathematical function whose parameters should be optimized to reach the desired registration, and, during the process, it is computed multiple times. Below are reported the available metrics computed employing the reference and the transformed images and the corresponding elastix class names between brackets. Mean squared difference (AdvancedMeanSquares) to be used for mono-modal applications Normalized correlation coefficient (AdvancedNormalizedCorrelation) to be used for images that have an intensity linear relationship Mutual information (AdvancedMattesMutualInformation) to be used for both mono- and multi-modal applications and optimized to reach better performance compared to the normalized version Normalized mutual information (NormalizedMutualInformation) for both mono- and multi-modal applications Kappa statistic (AdvancedKappaStatistic) to be used only for binary images === Sampler === For the computation of the similarity metrics, it is not always necessary to consider all the voxels and, sometimes, it can be useful to use only a fraction of the voxels of the images, i.e. to reduce the execution time for big input images. Below are reported the available criteria for selecting a fraction of the voxels for the similarity metric computation and the corresponding elastix class names between brackets. Full (Full) to employ all the voxels Grid (Grid) to employ a regular grid defined by the user to downsample the image Random (Random) to randomly select a percentage of voxels defined by the users (all voxels have equal probability to be selected) Random coordinate (RandomCoordinate) like the random criterion, but in this case also off-grid positions can be selected to simplify the optimization process === Interpolator === After the application of the transformation, it may occur that the voxels used for the similarity metric computation are at non-voxel positions, so intensity interpolation should be performed to ensure the correctness of the computed values. Below are reported the implemented interpolators and the corresponding elastix class names between brackets. Nearest neighbor (NearestNeighborInterpolator) exploits little resources, but gives low quality results Linear (LinearInterpolator) is sufficient in general applications N-th order B-spline (BSplineInterpolator) can be used to increase the order N, increasing quality and computation time. N=0 and N=1 indicate the nearest neighbor and linear cases respectively. === Optimizer === The optimizer defines the strategy employed for searching the best transformation parameter to reach the correct registration, and it is commonly an iterative strategy. Below are reported some of the implemented optimization strategies. Gradient descent Robbins-Monro, similar to the gradient descent, but employing an approximation of the cost function derivatives A wider range of optimizers is also available, such as Quasi-Newton or evolutionary strategies. === Other features === The elastix software also offers other features that can be employed to speed up the registration procedure and to provide more advanced algorithms to the end-users. Some examples are the introduction of blur and Gaussian pyramid to reduce data complexity, and multi-image and multi-metric framework to deal with more complex applications. == Applications == Elastix has applications mainly in the medical field, where image registration is fundamental to get comprehensive information regarding the analysed anatomical region. It is widely employed in image-guided surgery, tumour monitoring, and treatment assessment. For example, in radiotherapy planning, image registration allows to correctly deliver the treatment and evaluate the obtained results. Thanks to the wide range of implemented algorithms, the use of the elastix software allows physicians and researchers to test different registration pipelines from the simplest to more complex ones, and to save the best one as a configuration file. This file and the fact that the software is completely open-source makes it easy to reproduce the work, that can help supporting the open science paradigm, and allows fast reuse on different patients data. In image-guided surgery, registration time and accuracy are critical points, considering that, during the registration, the patient is on the operating table, and the imag

    Read more →