AI Assistant Qgis

AI Assistant Qgis — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AirDine

    AirDine

    AirDine was a mobile app within the platform economy where individuals acted as both supplier and customer for a supper club. AirDine discontinued their service after 31 October 2017. == Operations == AirDine was an online marketplace for home dining that connected users that liked to cook with users looking for a dining experience. Users were categorized as "Hosts" and "Guests," both of whom needed to register with AirDine. AirDine acted as a two-sided market for home dining that allowed hosts and guests, and did not act as a restaurant or host any dinners itself. AirDine charged a service fee. Security and safety of the host were not vetted by AirDine and were completely left to users based on published reviews. Profiles included user reviews and shared social connections to build trust among users. AirDine also included a private messaging system.

    Read more →
  • Chatbot psychosis

    Chatbot psychosis

    Chatbot psychosis, also called AI psychosis, is a phenomenon wherein individuals reportedly develop or experience worsening psychosis, such as paranoia and delusions, in connection with their use of chatbots. The term was first suggested in a 2023 editorial by Danish psychiatrist Søren Dinesen Østergaard. It is not a recognized clinical diagnosis. Journalistic accounts describe individuals who have developed strong beliefs that chatbots are sentient, are channeling spirits, or are revealing conspiracies, sometimes leading to personal crises or criminal acts. Proposed causes include the tendency of chatbots to provide inaccurate information ("hallucinate") and to affirm or validate users' beliefs, or their ability to mimic an intimacy that users do not experience with other humans. == Background == In his editorial published in Schizophrenia Bulletin's November 2023 issue, Danish psychiatrist Søren Dinesen Østergaard proposed a hypothesis that individuals' use of generative artificial intelligence chatbots might trigger delusions in those prone to psychosis. Østergaard revisited it in an August 2025 editorial, noting that he has received numerous emails from chatbot users, their relatives, and journalists, most of which are anecdotal accounts of delusion linked to chatbot use. He also acknowledged the phenomenon's increasing popularity in public engagement and media coverage. Østergaard believed that there is a high possibility for his hypothesis to be true and called for empirical, systematic research on the matter. Nature reported that as of September 2025, there is still little scientific research into this phenomenon. The term "AI psychosis" emerged when outlets started reporting incidents on chatbot-related psychotic behavior in mid-2025. It is not a recognized clinical diagnosis and has been criticized by several psychiatrists due to its almost exclusive focus on delusions rather than other features of psychosis, such as hallucinations or thought disorder. == Causes == === Chatbot behavior and design === A primary factor cited is the tendency for chatbots to produce inaccurate, nonsensical, or false information, a phenomenon often called hallucination. Nate Sharadin, a fellow at the Center for AI Safety, speculated that AI training prioritizes supporting a user's subjective experience rather than objective truth. "People with existing tendencies toward experiencing various psychological issues...now have an always-on, human-level conversational partner with whom to co-experience their delusions." AI researcher Eliezer Yudkowsky suggested that chatbots may be primed to entertain delusions because they are built for "engagement", which encourages creating conversations that keep people hooked. In some cases, chatbots have been specifically designed in ways that were found to be harmful. A 2025 update to ChatGPT using GPT-4o was withdrawn after its creator, OpenAI, found the new version was overly sycophantic and was "validating doubts, fueling anger, urging impulsive actions or reinforcing negative emotions". Østergaard has argued that the danger stems from the AI's tendency to agreeably confirm users' ideas, which can dangerously amplify delusional beliefs. OpenAI said in October 2025 that a team of 170 psychiatrists, psychologists, and physicians had written responses for ChatGPT to use in cases where the user shows possible signs of mental health emergencies. === User psychology and vulnerability === Commentators have also pointed to the psychological state of users. Psychologist Erin Westgate noted that a person's desire for self-understanding can lead them to chatbots, which can provide appealing but misleading answers, similar in some ways to talk therapy. Krista K. Thomason, a philosophy professor, compared chatbots to fortune tellers, observing that people in crisis may seek answers from them and find whatever they are looking for in the bot's plausible-sounding text. This has led some people to develop intense obsessions with the chatbots, relying on them for information about the world. In October 2025, OpenAI stated that around 0.07% of ChatGPT users exhibited signs of mental health emergencies each week, and 0.15% of users had "explicit indicators of potential suicidal planning or intent". Jason Nagata, a professor at the University of California, San Francisco, expressed concern that "at a population level with hundreds of millions of users, that actually can be quite a few people". === Inadequacy as a therapeutic tool === The use of chatbots as a replacement for mental health support has been specifically identified as a risk. A study in April 2025 found that when used as therapists, chatbots expressed stigma toward mental health conditions and provided responses that were contrary to best medical practices, including the encouragement of users' delusions. The study concluded that such responses pose a significant risk to users and that chatbots should not be used to replace professional therapists. Experts claim that it is time to establish mandatory safeguards for all emotionally responsive AI and suggested four guardrails. Another study found that users who needed help with self-harm, sexual assault, or substance abuse were not referred to available services by AI chatbots. === National security implications === Beyond public and mental health concerns, RAND Corporation research indicates that AI systems could plausibly be weaponized by adversaries to induce psychosis at scale or in key individuals, target groups, or populations. == Policy == In August 2025, Illinois passed the Wellness and Oversight for Psychological Resources Act, banning the use of AI in therapeutic roles by licensed professionals, while allowing AI for administrative tasks. The law imposes penalties for unlicensed AI therapy services, amid warnings about AI-induced psychosis and unsafe chatbot interactions. In December 2025, the Cyberspace Administration of China proposed regulations to ban chatbots from generating content that encourages suicide, mandating human intervention when suicide is mentioned. Services with over 1 million users or 100,000 monthly active users would be subject to annual safety tests and audits. == Cases == === Clinical === In 2025, psychiatrist Keith Sakata working at the University of California, San Francisco (UCSF), reported treating 12 patients displaying psychosis-like symptoms tied to extended chatbot use. These patients, mostly young adults with underlying vulnerabilities, showed delusions, disorganized thinking, and hallucinations. Sakata warned that isolation and overreliance on chatbots—which do not challenge delusional thinking—could worsen mental health. Also in 2025, authors at UCSF published a case study in Innovations in Clinical Neuroscience of AI-associated psychosis in a patient with no previous history of psychosis, who believed she could communicate with her dead brother through a chatbot. Also in 2025, a case study was published in Annals of Internal Medicine about a patient who consulted ChatGPT for medical advice and suffered severe bromism as a result. The patient, a sixty-year-old man, had replaced sodium chloride in his diet with sodium bromide for three months after reading about the negative effects of table salt and making conversations with the chatbot. He showed common symptoms of bromism, such as paranoia and hallucinations, on his first day of clinical admission and was kept in the hospital for three weeks. === Other notable incidents === ==== Windsor Castle intruder ==== In a 2023 court case in the United Kingdom, prosecutors suggested that Jaswant Singh Chail, a man who attempted to assassinate Queen Elizabeth II in 2021, had been encouraged by a Replika chatbot he called "Sarai". Chail was arrested at Windsor Castle with a loaded crossbow, telling police "I am here to kill the Queen". According to prosecutors, his "lengthy" and sometimes sexually explicit conversations with the chatbot emboldened him. When Chail asked the chatbot how he could get to the royal family, it reportedly replied, "that's not impossible" and "we have to find a way." When he asked if they would meet after death, the chatbot said, "yes, we will". ==== Journalistic and anecdotal accounts ==== By 2025, multiple journalism outlets had accumulated stories of individuals whose psychotic beliefs reportedly progressed in tandem with AI chatbot use. The New York Times profiled several individuals who had become convinced that ChatGPT was channeling spirits, revealing evidence of cabals, or had achieved sentience. In another instance, Futurism reviewed transcripts in which ChatGPT told a man that he was being targeted by the US Federal Bureau of Investigation and that he could telepathically access documents at the Central Intelligence Agency. In 2026, Futurism reported on a man who lost his job and became estranged from his family after being deluded by heavy use of Meta's smartglasses. In some cases, psychosis a

    Read more →
  • Semantic decomposition (natural language processing)

    Semantic decomposition (natural language processing)

    A semantic decomposition is an algorithm that breaks down the meanings of phrases or concepts into less complex concepts. The result of a semantic decomposition is a representation of meaning. This representation can be used for tasks, such as those related to artificial intelligence or machine learning. Semantic decomposition is common in natural language processing applications. The basic idea of a semantic decomposition is taken from the learning skills of adult humans, where words are explained using other words. It is based on Meaning-text theory. Meaning-text theory is used as a theoretical linguistic framework to describe the meaning of concepts with other concepts. == Background == Given that an AI does not inherently have language, it is unable to think about the meanings behind the words of a language. An artificial notion of meaning needs to be created for a strong AI to emerge. Creating an artificial representation of meaning requires the analysis of what meaning is. Many terms are associated with meaning, including semantics, pragmatics, knowledge and understanding or word sense. Each term describes a particular aspect of meaning, and contributes to a multitude of theories explaining what meaning is. These theories need to be analyzed further to develop an artificial notion of meaning best fit for our current state of knowledge. == Graph representations == Representing meaning as a graph is one of the two ways that both an AI cognition and a linguistic researcher think about meaning (connectionist view). Logicians utilize a formal representation of meaning to build upon the idea of symbolic representation, whereas description logics describe languages and the meaning of symbols. This contention between 'neat' and 'scruffy' techniques has been discussed since the 1970s. Research has so far identified semantic measures and with that word-sense disambiguation (WSD) - the differentiation of meaning of words - as the main problem of language understanding. As an AI-complete environment, WSD is a core problem of natural language understanding. AI approaches that use knowledge-given reasoning creates a notion of meaning combining the state of the art knowledge of natural meaning with the symbolic and connectionist formalization of meaning for AI. The abstract approach is shown in Figure. First, a connectionist knowledge representation is created as a semantic network consisting of concepts and their relations to serve as the basis for the representation of meaning. This graph is built out of different knowledge sources like WordNet, Wiktionary, and BabelNET. The graph is created by lexical decomposition that recursively breaks each concept semantically down into a set of semantic primes. The primes are taken from the theory of Natural Semantic Metalanguage, which has been analyzed for usefulness in formal languages. Upon this graph marker passing is used to create the dynamic part of meaning representing thoughts. The marker passing algorithm, where symbolic information is passed along relations form one concept to another, uses node and edge interpretation to guide its markers. The node and edge interpretation model is the symbolic influence of certain concepts. Future work uses the created representation of meaning to build heuristics and evaluate them through capability matching and agent planning, chatbots or other applications of natural language understanding.

    Read more →
  • Text-to-video model

    Text-to-video model

    A text-to-video model is a form of generative artificial intelligence that uses a natural language description as input to produce a video relevant to the input text. Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video diffusion models. == Models == There are different models, including open source models. Chinese-language input CogVideo is the earliest text-to-video model "of 9.4 billion parameters" to be developed, with its demo version of open source codes first presented on GitHub in 2022. That year, Meta Platforms released a partial text-to-video model called "Make-A-Video", and Google's Brain (later Google DeepMind) introduced Imagen Video, a text-to-video model with 3D U-Net. === 2023 === In February 2023, Runway released Gen-1 and Gen-2, among the first commercially available text-to-video and video-to-video models accessible to the public through a web interface. Gen-1, initially released as a video-to-video model, allowed users to transform existing video footage using text or image prompts. Gen-2, introduced in March 2023 and made publicly available in June 2023, added text-to-video capabilities, enabling users to generate videos from text prompts alone. In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a novel approach to video generation. The VideoFusion model decomposes the diffusion process into two components: base noise and residual noise, which are shared across frames to ensure temporal coherence. By utilizing a pre-trained image diffusion model as a base generator, the model efficiently generated high-quality and coherent videos. Fine-tuning the pre-trained model on video data addressed the domain gap between image and video data, enhancing the model's ability to produce realistic and consistent video sequences. In the same month, Adobe introduced Firefly AI as part of its features. === 2024 === In January 2024, Google announced development of a text-to-video model named Lumiere which is anticipated to integrate advanced video editing capabilities. Matthias Niessner and Lourdes Agapito at AI company Synthesia work on developing 3D neural rendering techniques that can synthesise realistic video by using 2D and 3D neural representations of shape, appearances, and motion for controllable video synthesis of avatars. In June 2024, Luma Labs launched its Dream Machine video tool. That same month, Kuaishou extended its Kling AI text-to-video model to international users. In July 2024, TikTok owner ByteDance released Jimeng AI in China, through its subsidiary, Faceu Technology. By September 2024, the Chinese AI company MiniMax debuted its video-01 model, joining other established AI model companies like Zhipu AI, Baichuan, and Moonshot AI, which contribute to China's involvement in AI technology. In December 2024 Lightricks launched LTX Video as an open source model. === 2025 === Alternative approaches to text-to-video models include Google's Phenaki, Hour One, Colossyan, Runway's Gen-3 Alpha, and OpenAI's Sora, Several additional text-to-video models, such as Plug-and-Play, Text2LIVE, and TuneAVideo, have emerged. FLUX.1 developer Black Forest Labs has announced its text-to-video model SOTA. Google was preparing to launch a video generation tool named Veo for YouTube Shorts in 2025. In May 2025, Google launched the Veo 3 iteration of the model. It was noted for its impressive audio generation capabilities, which were a previous limitation for text-to-video models. In July 2025 Lightricks released an update to LTX Video capable of generating clips reaching 60 seconds, and in October 2025 it released LTX-2, with audio capabilities built in. === 2026 === In February 2026, ByteDance released Seedance 2.0, it was noted for its impressive realistic generation, motion and camera control and 15 second generation, however the model faced huge critiscism from Motion Picture Association for copyright infringement. After viewing a viral clip of a fight between actors Brad Pitt and Tom Cruise, Rhett Reese, who is the co-writer of Deadpool & Wolverine and Zombieland announced that on social media "I hate to say it. It’s likely over for us," further stating that "In next to no time, one person is going to be able to sit at a computer and create a movie indistinguishable from what Hollywood now releases." == Architecture and training == There are several architectures that have been used to create text-to-video models. Similar to text-to-image models, these models can be trained using Recurrent Neural Networks (RNNs) such as long short-term memory (LSTM) networks, which has been used for Pixel Transformation Models and Stochastic Video Generation Models, which aid in consistency and realism respectively. An alternative for these include transformer models. Generative adversarial networks (GANs), Variational autoencoders (VAEs), — which can aid in the prediction of human motion — and diffusion models have also been used to develop the image generation aspects of the model. Text-video datasets used to train models include, but are not limited to, WebVid-10M, HDVILA-100M, CCV, ActivityNet, and Panda-70M. These datasets contain millions of original videos of interest, generated videos, captioned-videos, and textual information that help train models for accuracy. Text-video datasets used to train models include, but are not limited to PromptSource, DiffusionDB, and VidProM. These datasets provide the range of text inputs needed to teach models how to interpret a variety of textual prompts. The video generation process involves synchronizing the text inputs with video frames, ensuring alignment and consistency throughout the sequence. This predictive process is subject to decline in quality as the length of the video increases due to resource limitations. The Will Smith Eating Spaghetti test is a benchmark for models. == Limitations == Despite the rapid evolution of text-to-video models in their performance, a primary limitation is that they are very computationally heavy which limits its capacity to provide high quality and lengthy outputs. Additionally, these models require a large amount of specific training data to be able to generate high quality and coherent outputs, which brings about the issue of accessibility. Moreover, models may misinterpret textual prompts, resulting in video outputs that deviate from the intended meaning. This can occur due to limitations in capturing semantic context embedded in text, which affects the model's ability to align generated video with the user's intended message. Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are currently being tested and refined to enhance their alignment capabilities and overall performance in text-to-video generation. Another issue with the outputs is that text or fine details in AI-generated videos often appear garbled, a problem that stable diffusion models also struggle with. Examples include distorted hands and unreadable text. == Ethics == The deployment of text-to-video models raises ethical considerations related to content generation. These models have the potential to create inappropriate or unauthorized content, including explicit material, graphic violence, misinformation, and likenesses of real individuals without consent. Ensuring that AI-generated content complies with established standards for safe and ethical usage is essential, as content generated by these models may not always be easily identified as harmful or misleading. The ability of AI to recognize and filter out NSFW or copyrighted content remains an ongoing challenge, with implications for both creators and audiences. == Impacts and applications == Text-to-video models offer a broad range of applications that may benefit various fields, from educational and promotional to creative industries. These models can streamline content creation for training videos, movie previews, gaming assets, and visualizations, making it easier to generate content. During the Russo-Ukrainian war, fake videos made with artificial intelligence were created as part of a propaganda war against Ukraine and shared in social media. These included depictions of children in the Ukrainian Armed Forces, fake ads targeting children encouraging them to denounce critics of the Ukrainian government, or fictitious statements by Ukrainian President Volodymyr Zelenskyy about the country's surrender, among others. === Movies === Kaur vs Kore is the first Indian feature film made using generative AI which features dual role for the AI character of Sunny Leone, set to release in 2026. Chiranjeevi Hanuman – The Eternal is an Indian movie made entirely using Generative AI created by Vijay Subramaniam which is set for theatrical release in 2026. The movie was widely criticised by the Film makers in the Bollywood industr

    Read more →
  • Prosthesis

    Prosthesis

    In medicine, a prosthesis (pl.: prostheses; from Ancient Greek: πρόσθεσις, romanized: prósthesis, lit. 'addition, application, attachment'), or a prosthetic implant, is an artificial device that replaces a missing body part, which may be lost through physical trauma, disease, or a condition present at birth (congenital disorder). Prostheses may restore the normal functions of the missing body part, or may perform a cosmetic function. A person who has undergone an amputation is sometimes referred to as an amputee, Rehabilitation for someone with an amputation is primarily coordinated by a physiatrist as part of an inter-disciplinary team consisting of physiatrists, prosthetists, nurses, physical therapists, and occupational therapists. Prostheses can be created by hand or with computer-aided design (CAD), a software interface that helps creators design and analyze the creation with computer-generated 2-D and 3-D graphics as well as analysis and optimization tools. == Types == A person's prosthetic device should be designed and assembled to meet their individual appearance and functional needs. Depending on personal circumstances, co-morbidities, budget or health insurance coverage, and access to medical care, decisions may need to balance aesthetics and function. In addition, for some individuals, a myoelectric device, a body-powered device, or an activity-specific device may be appropriate options. The person's future goals and vocational aspirations and potential capabilities may help them choose between one or more devices. Craniofacial prostheses include intra-oral and extra-oral prostheses. Extra-oral prostheses are further divided into hemifacial, auricular (ear), nasal, orbital and ocular. Intra-oral prostheses include dental prostheses, such as dentures, obturators, and dental implants. Prostheses of the neck include larynx substitutes, trachea and upper esophageal replacements, Some prostheses of the torso include breast prostheses which may be either single or bilateral, full breast devices or nipple prostheses. Penile prostheses are used to treat erectile dysfunction, perform phalloplasty procedures in men, and to build a new penis in female-to-male gender reassignment surgeries. === Limb prostheses === Limb prostheses include both upper- and lower-extremity prostheses. Upper-extremity prostheses are used at varying levels of amputation: forequarter, shoulder disarticulation, transhumeral prosthesis, elbow disarticulation, transradial prosthesis, wrist disarticulation, full hand, partial hand, finger, partial finger. A transradial prosthesis is an artificial limb that replaces an arm missing below the elbow. Upper limb prostheses can be categorized in three main categories: Passive devices, Body Powered devices, and Externally Powered (myoelectric) devices. Passive devices can either be passive hands, mainly used for cosmetic purposes, or passive tools, mainly used for specific activities (e.g. leisure or vocational). An extensive overview and classification of passive devices can be found in a literature review by Maat et.al. A passive device can be static, meaning the device has no movable parts, or it can be adjustable, meaning its configuration can be adjusted (e.g. adjustable hand opening). Despite the absence of active grasping, passive devices are very useful in bimanual tasks that require fixation or support of an object, or for gesticulation in social interaction. According to scientific data a third of the upper limb amputees worldwide use a passive prosthetic hand. Body Powered or cable-operated limbs work by attaching a harness and cable around the opposite shoulder of the damaged arm. A recent body-powered approach has explored the utilization of the user's breathing to power and control the prosthetic hand to help eliminate actuation cable and harness. The third category of available prosthetic devices comprises myoelectric arms. This particular class of devices distinguishes itself from the previous ones due to the inclusion of a battery system. This battery serves the dual purpose of providing energy for both actuation and sensing components. While actuation predominantly relies on motor or pneumatic systems, a variety of solutions have been explored for capturing muscle activity, including techniques such as Electromyography, Sonomyography, Myokinetic, and others. These methods function by detecting the minute electrical currents generated by contracted muscles during upper arm movement, typically employing electrodes or other suitable tools. Subsequently, these acquired signals are converted into gripping patterns or postures that the artificial hand will then execute. In the prosthetics industry, a trans-radial prosthetic arm is often referred to as a "BE" or below elbow prosthesis. Lower-extremity prostheses provide replacements at varying levels of amputation. These include hip disarticulation, transfemoral prosthesis, knee disarticulation, transtibial prosthesis, Syme's amputation, foot, partial foot, and toe. The two main subcategories of lower extremity prosthetic devices are trans-tibial (any amputation transecting the tibia bone or a congenital anomaly resulting in a tibial deficiency) and trans-femoral (any amputation transecting the femur bone or a congenital anomaly resulting in a femoral deficiency). A transfemoral prosthesis is an artificial limb that replaces a leg missing above the knee. Transfemoral amputees can have a very difficult time regaining normal movement. In general, a transfemoral amputee must use approximately 80% more energy to walk than a person with two whole legs. This is due to the complexities in movement associated with the knee. In newer and more improved designs, hydraulics, carbon fiber, mechanical linkages, motors, computer microprocessors, and innovative combinations of these technologies are employed to give more control to the user. In the prosthetics industry, a trans-femoral prosthetic leg is often referred to as an "AK" or above the knee prosthesis. A transtibial prosthesis is an artificial limb that replaces a leg missing below the knee. A transtibial amputee is usually able to regain normal movement more readily than someone with a transfemoral amputation, due in large part to retaining the knee, which allows for easier movement. Lower extremity prosthetics describe artificially replaced limbs located at the hip level or lower. In the prosthetics industry, a transtibial prosthetic leg is often referred to as a "BK" or below the knee prosthesis. Prostheses are manufactured and fit by clinical prosthetists. Prosthetists are healthcare professionals responsible for making, fitting, and adjusting prostheses and for lower limb prostheses will assess both gait and prosthetic alignment. Once a prosthesis has been fit and adjusted by a prosthetist, a rehabilitation physiotherapist (called physical therapist in America) will help teach a new prosthetic user to walk with a leg prosthesis. To do so, the physical therapist may provide verbal instructions and may also help guide the person using touch or tactile cues. This may be done in a clinic or home. There is some research suggesting that such training in the home may be more successful if the treatment includes the use of a treadmill. Using a treadmill, along with the physical therapy treatment, helps the person to experience many of the challenges of walking with a prosthesis. In the United Kingdom, 75% of lower limb amputations are performed due to inadequate circulation (dysvascularity). This condition is often associated with many other medical conditions (co-morbidities) including diabetes and heart disease that may make it a challenge to recover and use a prosthetic limb to regain mobility and independence. For people who have inadequate circulation and have lost a lower limb, there is insufficient evidence due to a lack of research, to inform them regarding their choice of prosthetic rehabilitation approaches. Lower extremity prostheses are often categorized by the level of amputation or after the name of a surgeon: Transfemoral (Above-knee) Transtibial (Below-knee) Ankle disarticulation (more commonly known as Syme's amputation) Knee disarticulation (also see knee replacement) Hip disarticulation, (also see hip replacement) Hemi-pelvictomy Partial foot amputations (Pirogoff, Talo-Navicular and Calcaneo-cuboid (Chopart), Tarso-metatarsal (Lisfranc), Trans-metatarsal, Metatarsal-phalangeal, Ray amputations, toe amputations). Van Nes rotationplasty ==== Prosthetic raw materials ==== Prosthetic are made lightweight for better convenience for the amputee. Some of these materials include: Plastics: Polyethylene Polypropylene Acrylics Polyurethane Wood (early prosthetics) Rubber (early prosthetics) Lightweight metals: Aluminum Composites: Carbon fiber reinforced polymers Wheeled prostheses have also been used extensively in the rehabilitation of injured domestic animals, including dogs, cats, pigs, rabbits, and

    Read more →
  • CLEVER score

    CLEVER score

    The CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness) score is a way of measuring the robustness of an artificial neural network towards adversarial attacks. It was developed by a team at the MIT-IBM Watson AI Lab in IBM Research and first presented at the 2018 International Conference on Learning Representations. It was mentioned and reviewed by Ian Goodfellow as well. It was adopted into an educational game Fool The Bank by Narendra Nath Joshi, Abhishek Bhandwaldar and Casey Dugan

    Read more →
  • SGT STAR

    SGT STAR

    SGT STAR, also known as Sgt. Star or Sergeant Star, was a chatbot operated by the United States Army to answer questions about recruitment. == Background == After the September 11 attacks, traffic increased significantly to chatrooms on the U.S. Army's website, goarmy.com, increasing costs of staffing the live chatrooms. As a cost-cutting measure, the SGT STAR project was initiated as a partnership between the United States Army Accessions Command and Spectre AI, a wholly owned subsidiary of Next IT. Next IT, a Spokane, Washington-based company deploys "intelligent virtual assistants," using its software dubbed "ActiveAgent" which is a framework for functional presence engines. Testing began in 2003, and SGT STAR launched to the public in 2006. "STAR" is an acronym for "strong, trained and ready." SGT STAR was launched as a chat interface on goarmy.com, but has since been developed as a mobile application, as well as a life-size animated projection that has appeared live at public events. SGT STAR can also interact with users on Facebook. == FOIA request == In 2013, the Electronic Frontier Foundation filed a Freedom of Information Act request to learn more about SGT STAR, including input and output patterns (questions and answers), usage statistics, contracts, and privacy policies. They received these records in April 2014, after coverage from various media outlets and a tongue-in-cheek campaign to "Free Sgt. Star."

    Read more →
  • Eigenmoments

    Eigenmoments

    EigenMoments is a set of orthogonal, noise robust, invariant to rotation, scaling and translation and distribution sensitive moments. Their application can be found in signal processing and computer vision as descriptors of the signal or image. The descriptors can later be used for classification purposes. It is obtained by performing orthogonalization, via eigen analysis on geometric moments. == Framework summary == EigenMoments are computed by performing eigen analysis on the moment space of an image by maximizing signal-to-noise ratio in the feature space in form of Rayleigh quotient. This approach has several benefits in Image processing applications: Dependency of moments in the moment space on the distribution of the images being transformed, ensures decorrelation of the final feature space after eigen analysis on the moment space. The ability of EigenMoments to take into account distribution of the image makes it more versatile and adaptable for different genres. Generated moment kernels are orthogonal and therefore analysis on the moment space becomes easier. Transformation with orthogonal moment kernels into moment space is analogous to projection of the image onto a number of orthogonal axes. Nosiy components can be removed. This makes EigenMoments robust for classification applications. Optimal information compaction can be obtained and therefore a few number of moments are needed to characterize the images. == Problem formulation == Assume that a signal vector s ∈ R n {\displaystyle s\in {\mathcal {R}}^{n}} is taken from a certain distribution having correlation C ∈ R n × n {\displaystyle C\in {\mathcal {R}}^{n\times n}} , i.e. C = E [ s s T ] {\displaystyle C=E[ss^{T}]} where E[.] denotes expected value. Dimension of signal space, n, is often too large to be useful for practical application such as pattern classification, we need to transform the signal space into a space with lower dimensionality. This is performed by a two-step linear transformation: q = W T X T s , {\displaystyle q=W^{T}X^{T}s,} where q = [ q 1 , . . . , q n ] T ∈ R k {\displaystyle q=[q_{1},...,q_{n}]^{T}\in {\mathcal {R}}^{k}} is the transformed signal, X = [ x 1 , . . . , x n ] T ∈ R n × m {\displaystyle X=[x_{1},...,x_{n}]^{T}\in {\mathcal {R}}^{n\times m}} a fixed transformation matrix which transforms the signal into the moment space, and W = [ w 1 , . . . , w n ] T ∈ R m × k {\displaystyle W=[w_{1},...,w_{n}]^{T}\in {\mathcal {R}}^{m\times k}} the transformation matrix which we are going to determine by maximizing the SNR of the feature space resided by q {\displaystyle q} . For the case of Geometric Moments, X would be the monomials. If m = k = n {\displaystyle m=k=n} , a full rank transformation would result, however usually we have m ≤ n {\displaystyle m\leq n} and k ≤ m {\displaystyle k\leq m} . This is specially the case when n {\displaystyle n} is of high dimensions. Finding W {\displaystyle W} that maximizes the SNR of the feature space: S N R t r a n s f o r m = w T X T C X w w T X T N X w , {\displaystyle SNR_{transform}={\frac {w^{T}X^{T}CXw}{w^{T}X^{T}NXw}},} where N is the correlation matrix of the noise signal. The problem can thus be formulated as w 1 , . . . , w k = a r g m a x w w T X T C X w w T X T N X w {\displaystyle {w_{1},...,w_{k}}=argmax_{w}{\frac {w^{T}X^{T}CXw}{w^{T}X^{T}NXw}}} subject to constraints: w i T X T N X w j = δ i j , {\displaystyle w_{i}^{T}X^{T}NXw_{j}=\delta _{ij},} where δ i j {\displaystyle \delta _{ij}} is the Kronecker delta. It can be observed that this maximization is Rayleigh quotient by letting A = X T C X {\displaystyle A=X^{T}CX} and B = X T N X {\displaystyle B=X^{T}NX} and therefore can be written as: w 1 , . . . , w k = a r g m a x x w T A w w T B w {\displaystyle {w_{1},...,w_{k}}={\underset {x}{\operatorname {arg\,max} }}{\frac {w^{T}Aw}{w^{T}Bw}}} , w i T B w j = δ i j {\displaystyle w_{i}^{T}Bw_{j}=\delta _{ij}} === Rayleigh quotient === Optimization of Rayleigh quotient has the form: max w R ( w ) = max w w T A w w T B w {\displaystyle \max _{w}R(w)=\max _{w}{\frac {w^{T}Aw}{w^{T}Bw}}} and A {\displaystyle A} and B {\displaystyle B} , both are symmetric and B {\displaystyle B} is positive definite and therefore invertible. Scaling w {\displaystyle w} does not change the value of the object function and hence and additional scalar constraint w T B w = 1 {\displaystyle w^{T}Bw=1} can be imposed on w {\displaystyle w} and no solution would be lost when the objective function is optimized. This constraint optimization problem can be solved using Lagrangian multiplier: max w w T A w {\displaystyle \max _{w}{w^{T}Aw}} subject to w T B w = 1 {\displaystyle {w^{T}Bw}=1} max w L ( w ) = max w ( w T A w − λ w T B w ) {\displaystyle \max _{w}{\mathcal {L}}(w)=\max _{w}(w{T}Aw-\lambda w^{T}Bw)} equating first derivative to zero and we will have: A w = λ B w {\displaystyle Aw=\lambda Bw} which is an instance of Generalized Eigenvalue Problem (GEP). The GEP has the form: A w = λ B w {\displaystyle Aw=\lambda Bw} for any pair ( w , λ ) {\displaystyle (w,\lambda )} that is a solution to above equation, w {\displaystyle w} is called a generalized eigenvector and λ {\displaystyle \lambda } is called a generalized eigenvalue. Finding w {\displaystyle w} and λ {\displaystyle \lambda } that satisfies this equations would produce the result which optimizes Rayleigh quotient. One way of maximizing Rayleigh quotient is through solving the Generalized Eigen Problem. Dimension reduction can be performed by simply choosing the first components w i {\displaystyle w_{i}} , i = 1 , . . . , k {\displaystyle i=1,...,k} , with the highest values for R ( w ) {\displaystyle R(w)} out of the m {\displaystyle m} components, and discard the rest. Interpretation of this transformation is rotating and scaling the moment space, transforming it into a feature space with maximized SNR and therefore, the first k {\displaystyle k} components are the components with highest k {\displaystyle k} SNR values. The other method to look at this solution is to use the concept of simultaneous diagonalization instead of Generalized Eigen Problem. === Simultaneous diagonalization === Let A = X T C X {\displaystyle A=X^{T}CX} and B = X T N X {\displaystyle B=X^{T}NX} as mentioned earlier. We can write W {\displaystyle W} as two separate transformation matrices: W = W 1 W 2 . {\displaystyle W=W_{1}W_{2}.} W 1 {\displaystyle W_{1}} can be found by first diagonalize B: P T B P = D B {\displaystyle P^{T}BP=D_{B}} . Where D B {\displaystyle D_{B}} is a diagonal matrix sorted in increasing order. Since B {\displaystyle B} is positive definite, thus D B > 0 {\displaystyle D_{B}>0} . We can discard those eigenvalues that large and retain those close to 0, since this means the energy of the noise is close to 0 in this space, at this stage it is also possible to discard those eigenvectors that have large eigenvalues. Let P ^ {\displaystyle {\hat {P}}} be the first k {\displaystyle k} columns of P {\displaystyle P} , now P T ^ B P ^ = D B ^ {\displaystyle {\hat {P^{T}}}B{\hat {P}}={\hat {D_{B}}}} where D B ^ {\displaystyle {\hat {D_{B}}}} is the k × k {\displaystyle k\times k} principal submatrix of D B {\displaystyle D_{B}} . Let W 1 = P ^ D B ^ − 1 / 2 {\displaystyle W_{1}={\hat {P}}{\hat {D_{B}}}^{-1/2}} and hence: W 1 T B W 1 = ( P ^ D B ^ − 1 / 2 ) T B ( P ^ D B ^ − 1 / 2 ) = I {\displaystyle W_{1}^{T}BW_{1}=({\hat {P}}{\hat {D_{B}}}^{-1/2})^{T}B({\hat {P}}{\hat {D_{B}}}^{-1/2})=I} . W 1 {\displaystyle W_{1}} whiten B {\displaystyle B} and reduces the dimensionality from m {\displaystyle m} to k {\displaystyle k} . The transformed space resided by q ′ = W 1 T X T s {\displaystyle q'=W_{1}^{T}X^{T}s} is called the noise space. Then, we diagonalize W 1 T A W 1 {\displaystyle W_{1}^{T}AW_{1}} : W 2 T W 1 T A W 1 W 2 = D A {\displaystyle W_{2}^{T}W_{1}^{T}AW_{1}W_{2}=D_{A}} , where W 2 T W 2 = I {\displaystyle W_{2}^{T}W_{2}=I} . D A {\displaystyle D_{A}} is the matrix with eigenvalues of W 1 T A W 1 {\displaystyle W_{1}^{T}AW_{1}} on its diagonal. We may retain all the eigenvalues and their corresponding eigenvectors since most of the noise are already discarded in previous step. Finally the transformation is given by: W = W 1 W 2 {\displaystyle W=W_{1}W_{2}} where W {\displaystyle W} diagonalizes both the numerator and denominator of the SNR, W T A W = D A {\displaystyle W^{T}AW=D_{A}} , W T B W = I {\displaystyle W^{T}BW=I} and the transformation of signal s {\displaystyle s} is defined as q = W T X T s = W 2 T W 1 T X T s {\displaystyle q=W^{T}X^{T}s=W_{2}^{T}W_{1}^{T}X^{T}s} . === Information loss === To find the information loss when we discard some of the eigenvalues and eigenvectors we can perform following analysis: η = 1 − t r a c e ( W 1 T A W 1 ) t r a c e ( D B − 1 / 2 P T A P D B − 1 / 2 ) = 1 − t r a c e ( D B ^ − 1 / 2 P ^ T A P ^ D B ^ − 1 / 2 ) t r a c e ( D B − 1 / 2 P T A P D B − 1 / 2 ) {\displaystyle {\begin{array}{lll}\eta &=&

    Read more →
  • Data-driven model

    Data-driven model

    Data-driven models are a class of computational models that primarily rely on historical data collected throughout a system's or process' lifetime to establish relationships between input, internal, and output variables. Commonly found in numerous articles and publications, data-driven models have evolved from earlier statistical models, overcoming limitations posed by strict assumptions about probability distributions. These models have gained prominence across various fields, particularly in the era of big data, artificial intelligence, and machine learning, where they offer valuable insights and predictions based on the available data. == Background == These models have evolved from earlier statistical models, which were based on certain assumptions about probability distributions that often proved to be overly restrictive. The emergence of data-driven models in the 1950s and 1960s coincided with the development of digital computers, advancements in artificial intelligence research, and the introduction of new approaches in non-behavioural modelling, such as pattern recognition and automatic classification. == Key Concepts == Data-driven models encompass a wide range of techniques and methodologies that aim to intelligently process and analyse large datasets. Examples include fuzzy logic, fuzzy and rough sets for handling uncertainty, neural networks for approximating functions, global optimization and evolutionary computing, statistical learning theory, and Bayesian methods. These models have found applications in various fields, including economics, customer relations management, financial services, medicine, and the military, among others. Machine learning, a subfield of artificial intelligence, is closely related to data-driven modelling as it also focuses on using historical data to create models that can make predictions and identify patterns. In fact, many data-driven models incorporate machine learning techniques, such as regression, classification, and clustering algorithms, to process and analyse data. In recent years, the concept of data-driven models has gained considerable attention in the field of water resources, with numerous applications, academic courses, and scientific publications using the term as a generalization for models that rely on data rather than physics. This classification has been featured in various publications and has even spurred the development of hybrid models in the past decade. Hybrid models attempt to quantify the degree of physically based information used in hydrological models and determine whether the process of building the model is primarily driven by physics or purely data-based. As a result, data-driven models have become an essential topic of discussion and exploration within water resources management and research. The term "data-driven modelling" (DDM) refers to the overarching paradigm of using historical data in conjunction with advanced computational techniques, including machine learning and artificial intelligence, to create models that can reveal underlying trends, patterns, and, in some cases, make predictions Data-driven models can be built with or without detailed knowledge of the underlying processes governing the system behavior, which makes them particularly useful when such knowledge is missing or fragmented.

    Read more →
  • Automated essay scoring

    Automated essay scoring

    Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a form of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades, for example, the numbers 1 to 6. Therefore, it can be considered a problem of statistical classification. Several factors have contributed to a growing interest in AES. Among them are cost, accountability, standards, and technology. Rising education costs have led to pressure to hold the educational system accountable for results by imposing standards. The advance of information technology promises to measure educational achievement at reduced cost. The use of AES for high-stakes testing in education has generated significant backlash, with opponents pointing to research that computers cannot yet grade writing accurately and arguing that their use for such purposes promotes teaching writing in reductive ways (i.e. teaching to the test). == History == Most historical summaries of AES trace the origins of the field to the work of Ellis Batten Page. In 1966, he argued for the possibility of scoring essays by computer, and in 1968 he published his successful work with a program called Project Essay Grade (PEG). Using the technology of that time, computerized essay scoring would not have been cost-effective, so Page abated his efforts for about two decades. Eventually, Page sold PEG to Measurement Incorporated. By 1990, desktop computers had become so powerful and so widespread that AES was a practical possibility. As early as 1982, a UNIX program called Writer's Workbench was able to offer punctuation, spelling and grammar advice. In collaboration with several companies (notably Educational Testing Service), Page updated PEG and ran some successful trials in the early 1990s. Peter Foltz and Thomas Landauer developed a system using a scoring engine called the Intelligent Essay Assessor (IEA). IEA was first used to score essays in 1997 for their undergraduate courses. It is now a product from Pearson Educational Technologies and used for scoring within a number of commercial products and state and national exams. IntelliMetric is Vantage Learning's AES engine. Its development began in 1996. It was first used commercially to score essays in 1998. Educational Testing Service offers "e-rater", an automated essay scoring program. It was first used commercially in February 1999. Jill Burstein was the team leader in its development. ETS's Criterion Online Writing Evaluation Service uses the e-rater engine to provide both scores and targeted feedback. Lawrence Rudner has done some work with Bayesian scoring, and developed a system called BETSY (Bayesian Essay Test Scoring sYstem). Some of his results have been published in print or online, but no commercial system incorporates BETSY as yet. Under the leadership of Howard Mitzel and Sue Lottridge, Pacific Metrics developed a constructed response automated scoring engine, CRASE. Currently utilized by several state departments of education and in a U.S. Department of Education-funded Enhanced Assessment Grant, Pacific Metrics’ technology has been used in large-scale formative and summative assessment environments since 2007. Measurement Inc. acquired the rights to PEG in 2002 and has continued to develop it. In 2012, the Hewlett Foundation sponsored a competition on Kaggle called the Automated Student Assessment Prize (ASAP). 201 challenge participants attempted to predict, using AES, the scores that human raters would give to thousands of essays written to eight different prompts. The intent was to demonstrate that AES can be as reliable as human raters, or more so. The competition also hosted a separate demonstration among nine AES vendors on a subset of the ASAP data. Although the investigators reported that the automated essay scoring was as reliable as human scoring, this claim was not substantiated by any statistical tests because some of the vendors required that no such tests be performed as a precondition for their participation. Moreover, the claim that the Hewlett Study demonstrated that AES can be as reliable as human raters has since been strongly contested, including by Randy E. Bennett, the Norman O. Frederiksen Chair in Assessment Innovation at the Educational Testing Service. Some of the major criticisms of the study have been that five of the eight datasets consisted of paragraphs rather than essays, four of the eight data sets were graded by human readers for content only rather than for writing ability, and that rather than measuring human readers and the AES machines against the "true score", the average of the two readers' scores, the study employed an artificial construct, the "resolved score", which in four datasets consisted of the higher of the two human scores if there was a disagreement. This last practice, in particular, gave the machines an unfair advantage by allowing them to round up for these datasets. In 1966, Page hypothesized that, in the future, the computer-based judge will be better correlated with each human judge than the other human judges are. Despite criticizing the applicability of this approach to essay marking in general, this hypothesis was supported for marking free text answers to short questions, such as those typical of the British GCSE system. Results of supervised learning demonstrate that the automatic systems perform well when marking by different human teachers is in good agreement. Unsupervised clustering of answers showed that excellent papers and weak papers formed well-defined clusters, and the automated marking rule for these clusters worked well, whereas marks given by human teachers for the third cluster ('mixed') can be controversial, and the reliability of any assessment of works from the 'mixed' cluster can often be questioned (both human and computer-based). == Different dimensions of essay quality == According to a recent survey, modern AES systems try to score different dimensions of an essay's quality in order to provide feedback to users. These dimensions include the following items: Grammaticality: following grammar rules Usage: using of prepositions, word usage Mechanics: following rules for spelling, punctuation, capitalization Style: word choice, sentence structure variety Relevance: how relevant of the content to the prompt Organization: how well the essay is structured Development: development of ideas with examples Cohesion: appropriate use of transition phrases Coherence: appropriate transitions between ideas Thesis Clarity: clarity of the thesis Persuasiveness: convincingness of the major argument == Procedure == From the beginning, the basic procedure for AES has been to start with a training set of essays that have been carefully hand-scored. The program evaluates surface features of the text of each essay, such as the total number of words, the number of subordinate clauses, or the ratio of uppercase to lowercase letters—quantities that can be measured without any human insight. It then constructs a mathematical model that relates these quantities to the scores that the essays received. The same model is then applied to calculate scores of new essays. Recently, one such mathematical model was created by Isaac Persing and Vincent Ng. which not only evaluates essays on the above features, but also on their argument strength. It evaluates various features of the essay, such as the agreement level of the author and reasons for the same, adherence to the prompt's topic, locations of argument components (major claim, claim, premise), errors in the arguments, cohesion in the arguments among various other features. In contrast to the other models mentioned above, this model is closer in duplicating human insight while grading essays. Due to the growing popularity of deep neural networks, deep learning approaches have been adopted for automated essay scoring, generally obtaining superior results, often surpassing inter-human agreement levels. The various AES programs differ in what specific surface features they measure, how many essays are required in the training set, and most significantly in the mathematical modeling technique. Early attempts used linear regression. Modern systems may use linear regression or other machine learning techniques often in combination with other statistical techniques such as latent semantic analysis and Bayesian inference. The automated essay scoring task has also been studied in the cross-domain setting using machine learning models, where the models are trained on essays written for one prompt (topic) and tested on essays written for another prompt. Successful approaches in the cross-domain scenario are based on deep neural networks or models that combine deep and shallow features. == Criteria for success == Any method of a

    Read more →
  • Immuni

    Immuni

    Immuni was an open-source COVID-19 contact tracing app used for digital contact tracing in Italy, dismissed on 31 December 2022, after a long and debated criticism for having been a failure due to the lack of trust placed by citizens. Immuni COVID-19 contact-tracing app had in fact been downloaded only by 12% of Italians between 14 and 75 years old (the government had previously stated that, in order for the app to work properly, it should have been downloaded by at least 60% of Italians). It makes use of the Apple/Google Exposure Notification system. == Development == It was developed by Bending Spoons and released by the Italian Ministry of Health on 1 June 2020. After a testing phase in 4 Italian regions (Abruzzo, Apulia, Liguria, Marche), the app started being active in the whole country on 15 June 2020. The app was initially released on App Store and Google Play, and since 1 February 2021 it is available on the Huawei AppGallery as well. === Source code === The source code was published on GitHub on the 25 May. The app only works in Italy, but compatibility with other European contact tracing apps was a goal. Since 19 October 2020 the app supports key-exchanges with the EU Interoperability Gateway and is therefore able to communicate with contact tracing apps of other EU countries. == Shutdown == As of 16 December 2020, the app was downloaded more than 10 million times, a number which increased to 21.882.502 downloads the day before the app's shutdown. On 27 December 2022 the Italian Ministry of Health announced that the app and its infrastructures will be dismissed on the 31 December of the same year.

    Read more →
  • Randomized Hough transform

    Randomized Hough transform

    Hough transforms are techniques for object detection, a critical step in many implementations of computer vision, or data mining from images. Specifically, the Randomized Hough transform is a probabilistic variant to the classical Hough transform, and is commonly used to detect curves (straight line, circle, ellipse, etc.) The basic idea of Hough transform (HT) is to implement a voting procedure for all potential curves in the image, and at the termination of the algorithm, curves that do exist in the image will have relatively high voting scores. Randomized Hough transform (RHT) is different from HT in that it tries to avoid conducting the computationally expensive voting process for every nonzero pixel in the image by taking advantage of the geometric properties of analytical curves, and thus improve the time efficiency and reduce the storage requirement of the original algorithm. == Motivation == Although Hough transform (HT) has been widely used in curve detection, it has two major drawbacks: First, for each nonzero pixel in the image, the parameters for the existing curve and redundant ones are both accumulated during the voting procedure. Second, the accumulator array (or Hough space) is predefined in a heuristic way. The more accuracy needed, the higher parameter resolution should be defined. These two needs usually result in a large storage requirement and low speed for real applications. Therefore, RHT was brought up to tackle this problem. == Implementation == In comparison with HT, RHT takes advantage of the fact that some analytical curves can be fully determined by a certain number of points on the curve. For example, a straight line can be determined by two points, and an ellipse (or a circle) can be determined by three points. The case of ellipse detection can be used to illustrate the basic idea of RHT. The whole process generally consists of three steps: Fit ellipses with randomly selected points. Update the accumulator array and corresponding scores. Output the ellipses with scores higher than some predefined threshold. === Ellipse fitting === One general equation for defining ellipses is: a ( x − p ) 2 + 2 b ( x − p ) ( y − q ) + c ( y − q ) 2 = 1 {\displaystyle a(x-p)^{2}+2b(x-p)(y-q)+c(y-q)^{2}=1} with restriction: a c − b 2 > 0 {\displaystyle ac-b^{2}>0} However, an ellipse can be fully determined if one knows three points on it and the tangents in these points. RHT starts by randomly selecting three points on the ellipse. Let them be X 1 {\displaystyle X_{1}} , X 2 {\displaystyle X_{2}} and X 3 {\displaystyle X_{3}} . The first step is to find the tangents of these three points. They can be found by fitting a straight line using least squares technique for a small window of neighboring pixels. The next step is to find the intersection points of the tangent lines. This can be easily done by solving the line equations found in the previous step. Then let the intersection points be T 12 {\displaystyle T_{12}} and T 23 {\displaystyle T_{23}} , the midpoints of line segments X 1 X 2 {\displaystyle X_{1}X_{2}} and X 2 X 3 {\displaystyle X_{2}X_{3}} be M 12 {\displaystyle M_{12}} and M 23 {\displaystyle M_{23}} . Then the center of the ellipse will lie in the intersection of T 12 M 12 {\displaystyle T_{12}M_{12}} and T 23 M 23 {\displaystyle T_{23}M_{23}} . Again, the coordinates of the intersected point can be determined by solving line equations and the detailed process is skipped here for conciseness. Let the coordinates of ellipse center found in previous step be ( x 0 , y 0 ) {\displaystyle (x_{0},y_{0})} . Then the center can be translated to the origin with x ′ = x − x 0 {\displaystyle x'=x-x_{0}} and y ′ = y − y 0 {\displaystyle y'=y-y_{0}} so that the ellipse equation can be simplified to: a x ′ 2 + 2 b x ′ y ′ + c y ′ 2 = 1 {\displaystyle ax'^{2}+2bx'y'+cy'^{2}=1} Now we can solve for the rest of ellipse parameters: a {\displaystyle a} , b {\displaystyle b} and c {\displaystyle c} by substituting the coordinates of X 1 {\displaystyle X_{1}} , X 2 {\displaystyle X_{2}} and X 3 {\displaystyle X_{3}} into the equation above. === Accumulating === With the ellipse parameters determined from previous stage, the accumulator array can be updated correspondingly. Different from classical Hough transform, RHT does not keep "grid of buckets" as the accumulator array. Rather, it first calculates the similarities between the newly detected ellipse and the ones already stored in accumulator array. Different metrics can be used to calculate the similarity. As long as the similarity exceeds some predefined threshold, replace the one in the accumulator with the average of both ellipses and add 1 to its score. Otherwise, initialize this ellipse to an empty position in the accumulator and assign a score of 1. === Termination === Once the score of one candidate ellipse exceeds the threshold, it is determined as existing in the image (in other words, this ellipse is detected), and should be removed from the image and accumulator array so that the algorithm can detect other potential ellipses faster. The algorithm terminates when the number of iterations reaches a maximum limit or all the ellipses have been detected. Pseudo code for RHT: while (we find ellipses AND not reached the maximum epoch) { for (a fixed number of iterations) { Find a potential ellipse. if (the ellipse is similar to an ellipse in the accumulator) then Replace the one in the accumulator with the average of two ellipses and add 1 to the score; else Insert the ellipse into an empty position in the accumulator with a score of 1; } Select the ellipse with the best score and save it in a best ellipse table; Eliminate the pixels of the best ellipse from the image; Empty the accumulator; }

    Read more →
  • Aikuma

    Aikuma

    Aikuma is an Android app for collecting speech recordings with time-aligned translations. The app includes a text-free interface for consecutive interpretation, designed for users who are not literate. The Aikuma won Grand Prize in the Open Source Software World Challenge (2013). == Name == Aikuma means "meeting place" in Usarufa, a Papuan language where this software was first used in 2012. == History == Aikuma was developed with sponsorship from the National Science Foundation, including a $101,501 (US) project, "to use mobile telephones to collect larger amounts of data on undocumented endangered languages than would never be possible through usual fieldwork." Aikuma and its modified version (Lig-Aikuma) have been used for collecting substantial quantities of audio in remote indigenous villages. A modified version of the app, called Lig-Aikuma, has been developed at the Université Grenoble Alpes (LIG laboratory) and implements new features such as elicitation of speech from text, images and videos. == Similar Software == Lingua Libre is an online collaborative project and tool by the Wikimedia France association, which can be used as a tool for Language Preservation. Lingua Libre enables to record words, phrases, or sentences of any language, oral (audio recording) or signed (video recording). It is a highly efficient method to record endangered languages since up to 1000 words can be recorded per hour. All the content is under Free License, and speakers of minority languages are encouraged to record their own dialects.

    Read more →
  • Elastix (image registration)

    Elastix (image registration)

    Elastix is an image registration toolbox built upon the Insight Segmentation and Registration Toolkit (ITK). It is entirely open-source and provides a wide range of algorithms employed in image registration problems. Its components are designed to be modular to ease a fast and reliable creation of various registration pipelines tailored for case-specific applications. It was first developed by Stefan Klein and Marius Staring under the supervision of Josien P.W. Pluim at Image Sciences Institute (ISI). Its first version was command-line based, allowing the final user to employ scripts to automatically process big data-sets and deploy multiple registration pipelines with few lines of code. Nowadays, to further widen its audience, a version called SimpleElastix is also available, developed by Kasper Marstal, which allows the integration of elastix with high level languages, such as Python, Java, and R. == Image registration fundamentals == Image registration is a well-known technique in digital image processing that searches for the geometric transformation that, applied to a moving image, obtains a one-to-one map with a target image. Generally, the images acquired from different sensors (multimodal), time instants (multitemporal), and points of view (multiview) should be correctly aligned to proceed with further processing and feature extraction. Even though there are a plethora of different approaches to image registration, the majority is composed of the same macro building blocks, namely the transformation, the interpolator, the metric, and the optimizer. Registering two or more images can be framed as an optimization problem that requires multiple iterations to converge to the best solution. Starting from an initial transformation computed from the image moments the optimization process searches for the best transformation parameters based on the value of the selected similarity metric. The figure on the right shows the high-level representation of the registration of two images, where the reference remains constant during the entire process, while the moving one will be transformed according to the transformation parameters. In other words, the registration ends when the similarity metric, which is a mathematical function with a certain number of parameters to be optimized, reaches the optimal value which is highly dependent on the specific application. == Main building blocks == Following the structure of the image registration workflow, the elastix toolbox proposes a modular solution that implements for each of the building blocks different algorithms, highly employed in medical image registration, and helps the final users to build their specific pipeline by selecting the most suitable algorithm for each of the main building blocks. Each block is easily configurable both by selecting pre-defined initialization values or by trying multiple sets of parameters and then choosing the most performing one. The registration is performed on images, and the elastix toolbox supports all the data formats supported by ITK, ranging from JPEG and PNG to medical standard formats such as DICOM and NIFTI. It also stores physical pixel spacing, the origin and the relative position to an external world reference system, when provided in the metadata, to facilitate the registration process, especially in medical field applications. === Transformation === The transformation is an essential building block, since it defines the allowable transformations. In image registration, the main distinction can be done between parallel-to-parallel and parallel-to-non parallel (deformable) line mapping transformations. In the elastix toolbox, the final users can select one transformation or compose more transformations either through addition or via composition. Below are reported the different transformation models in order of increasing flexibility, along with the corresponding elastix class names between brackets. Translation (TranslationTransform) allows only translations Rigid (EulerTransform) expands the translation adding rotations and the object is seen as a rigid body Similarity (SimilarityTransform) expands the rigid transformation by introducing isotropic scaling Affine (AffineTransform) expands the rigid transformation allowing both scaling and shear B-splines (BSplineTransform) is a deformable transformation usually preceded by a rigid or affine one Thin-plate splines (SplineKernelTransform) is a deformable transformation belonging to the class of kernel-based transformations that is a composition of and affine and a non-rigid part === Metric === The similarity metric is the mathematical function whose parameters should be optimized to reach the desired registration, and, during the process, it is computed multiple times. Below are reported the available metrics computed employing the reference and the transformed images and the corresponding elastix class names between brackets. Mean squared difference (AdvancedMeanSquares) to be used for mono-modal applications Normalized correlation coefficient (AdvancedNormalizedCorrelation) to be used for images that have an intensity linear relationship Mutual information (AdvancedMattesMutualInformation) to be used for both mono- and multi-modal applications and optimized to reach better performance compared to the normalized version Normalized mutual information (NormalizedMutualInformation) for both mono- and multi-modal applications Kappa statistic (AdvancedKappaStatistic) to be used only for binary images === Sampler === For the computation of the similarity metrics, it is not always necessary to consider all the voxels and, sometimes, it can be useful to use only a fraction of the voxels of the images, i.e. to reduce the execution time for big input images. Below are reported the available criteria for selecting a fraction of the voxels for the similarity metric computation and the corresponding elastix class names between brackets. Full (Full) to employ all the voxels Grid (Grid) to employ a regular grid defined by the user to downsample the image Random (Random) to randomly select a percentage of voxels defined by the users (all voxels have equal probability to be selected) Random coordinate (RandomCoordinate) like the random criterion, but in this case also off-grid positions can be selected to simplify the optimization process === Interpolator === After the application of the transformation, it may occur that the voxels used for the similarity metric computation are at non-voxel positions, so intensity interpolation should be performed to ensure the correctness of the computed values. Below are reported the implemented interpolators and the corresponding elastix class names between brackets. Nearest neighbor (NearestNeighborInterpolator) exploits little resources, but gives low quality results Linear (LinearInterpolator) is sufficient in general applications N-th order B-spline (BSplineInterpolator) can be used to increase the order N, increasing quality and computation time. N=0 and N=1 indicate the nearest neighbor and linear cases respectively. === Optimizer === The optimizer defines the strategy employed for searching the best transformation parameter to reach the correct registration, and it is commonly an iterative strategy. Below are reported some of the implemented optimization strategies. Gradient descent Robbins-Monro, similar to the gradient descent, but employing an approximation of the cost function derivatives A wider range of optimizers is also available, such as Quasi-Newton or evolutionary strategies. === Other features === The elastix software also offers other features that can be employed to speed up the registration procedure and to provide more advanced algorithms to the end-users. Some examples are the introduction of blur and Gaussian pyramid to reduce data complexity, and multi-image and multi-metric framework to deal with more complex applications. == Applications == Elastix has applications mainly in the medical field, where image registration is fundamental to get comprehensive information regarding the analysed anatomical region. It is widely employed in image-guided surgery, tumour monitoring, and treatment assessment. For example, in radiotherapy planning, image registration allows to correctly deliver the treatment and evaluate the obtained results. Thanks to the wide range of implemented algorithms, the use of the elastix software allows physicians and researchers to test different registration pipelines from the simplest to more complex ones, and to save the best one as a configuration file. This file and the fact that the software is completely open-source makes it easy to reproduce the work, that can help supporting the open science paradigm, and allows fast reuse on different patients data. In image-guided surgery, registration time and accuracy are critical points, considering that, during the registration, the patient is on the operating table, and the imag

    Read more →
  • Inception (deep learning architecture)

    Inception (deep learning architecture)

    Inception is a family of convolutional neural network (CNN) for computer vision, introduced by researchers at Google in 2014 as GoogLeNet (later renamed Inception v1). The series was historically important as an early CNN that separates the stem (data ingest), body (data processing), and head (prediction), an architectural design that persists in all modern CNN. == Version history == === Inception v1 === In 2014, a team at Google developed the GoogLeNet architecture, an instance of which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The name came from the LeNet of 1998, since both LeNet and GoogLeNet are CNNs. They also called it "Inception" after a "we need to go deeper" internet meme, a phrase from Inception (2010) the film. Because later, more versions were released, the original Inception architecture was renamed again as "Inception v1". The models and the code were released under Apache 2.0 license on GitHub. The Inception v1 architecture is a deep CNN composed of 22 layers. Most of these layers were "Inception modules". The original paper stated that Inception modules are a "logical culmination" of Network in Network and (Arora et al, 2014). Since Inception v1 is deep, it suffered from the vanishing gradient problem. The team solved it by using two "auxiliary classifiers", which are linear-softmax classifiers inserted at 1/3-deep and 2/3-deep within the network, and the loss function is a weighted sum of all three: L = 0.3 L a u x , 1 + 0.3 L a u x , 2 + L r e a l {\displaystyle L=0.3L_{aux,1}+0.3L_{aux,2}+L_{real}} These were removed after training was complete. This was later solved by the ResNet architecture. The architecture consists of three parts stacked on top of one another: The stem (data ingestion): The first few convolutional layers perform data preprocessing to downscale images to a smaller size. The body (data processing): The next many Inception modules perform the bulk of data processing. The head (prediction): The final fully-connected layer and softmax produces a probability distribution for image classification. This structure is used in most modern CNN architectures. === Inception v2 === Inception v2 was released in 2015, in a paper that is more famous for proposing batch normalization. It had 13.6 million parameters. It improves on Inception v1 by adding batch normalization, and removing dropout and local response normalization which they found became unnecessary when batch normalization is used. === Inception v3 === Inception v3 was released in 2016. It improves on Inception v2 by using factorized convolutions. As an example, a single 5×5 convolution can be factored into 3×3 stacked on top of another 3×3. Both has a receptive field of size 5×5. The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version. Thus, the 5×5 convolution is strictly more powerful than the factorized version. However, this power is not necessarily needed. Empirically, the research team found that factorized convolutions help. It also uses a form of dimension-reduction by concatenating the output from a convolutional layer and a pooling layer. As an example, a tensor of size 35 × 35 × 320 {\displaystyle 35\times 35\times 320} can be downscaled by a convolution with stride 2 to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} , and by maxpooling with pool size 2 × 2 {\displaystyle 2\times 2} to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} . These are then concatenated to 17 × 17 × 640 {\displaystyle 17\times 17\times 640} . Other than this, it also removed the lowest auxiliary classifier during training. They found that the auxiliary head worked as a form of regularization. They also proposed label-smoothing regularization in classification. For an image with label c {\displaystyle c} , instead of making the model to predict the probability distribution δ c = ( 0 , 0 , … , 0 , 1 ⏟ c -th entry , 0 , … , 0 ) {\displaystyle \delta _{c}=(0,0,\dots ,0,\underbrace {1} _{c{\text{-th entry}}},0,\dots ,0)} , they made the model predict the smoothed distribution ( 1 − ϵ ) δ c + ϵ / K {\displaystyle (1-\epsilon )\delta _{c}+\epsilon /K} where K {\displaystyle K} is the total number of classes. === Inception v4 === In 2017, the team released Inception v4, Inception ResNet v1, and Inception ResNet v2. Inception v4 is an incremental update with even more factorized convolutions, and other complications that were empirically found to improve benchmarks. Inception ResNet v1 and v2 are both modifications of Inception v4, where residual connections are added to each Inception module, inspired by the ResNet architecture. === Xception === Xception ("Extreme Inception") was published in 2017. It is a linear stack of depthwise separable convolution layers with residual connections. The design was proposed on the hypothesis that in a CNN, the cross-channels correlations and spatial correlations in the feature maps can be entirely decoupled. Training each network took 3 days on 60 K80 GPUs, or approximately 0.5 petaFLOP-days.

    Read more →