AI Image Generators

Explore the best AI Image Generators — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Probiv

    Probiv

    Probiv (Russian: пробив, literally "to pierce" or "to punch through") is an illicit data market operating primarily in Russia, where personal information from restricted government and corporate databases is bought and sold through networks of corrupt officials and insiders. The probiv market operates as a parallel information economy built on corrupt officials from various sectors including traffic police, banks, telecommunications companies, and security services who sell access to restricted databases. For fees ranging from as little as $10 to several hundred dollars, buyers can obtain passport numbers, addresses, travel histories, vehicle registrations, and telecommunications records. The market operates through various channels, including specialized Telegram bots and darknet forums. == Notable uses == Probiv services have been utilized by diverse actors for various purposes. Investigative journalists have used the market to conduct high-profile investigations, including tracing the FSB unit allegedly behind the poisoning of Alexei Navalny. Russian police and security services themselves have routinely used the black market to track activists and opposition figures. Since Russia's invasion of Ukraine, Ukrainian intelligence services have exploited the market to identify Russian military officials. == Government response == In late 2024, Russian authorities introduced legislation imposing penalties of up to ten years in prison for accessing or distributing leaked data. Several operators of probiv services, including the teams behind Usersbox and Solaris, have been arrested. However, the crackdown appears to have had unintended consequences. Many operators have relocated their businesses abroad, where they operate with fewer constraints. Some services that previously cooperated with Russian authorities have severed those ties and moved staff out of the country.

    Read more →
  • ImageMixer

    ImageMixer

    ImageMixer is a brand name of video editing software that edits digital video and still image in camcorders and authors to VCD and DVD. It is a second-party Japanese product, distributed by Pixela Corporation, a Japanese manufacturer of PC peripheral hardware and multimedia software. == Bundling == ImageMixer is widely used for several camcorder brands, such as JVC, Hitachi and Canon. Also, Sony has chosen to package ImageMixer with its DVD and HDD Handycam. == ImageMixer series == ImageMixer has other series of software for digital camera, such as ImageMixer Label Maker and ImageMixer DVD dubbing. ImageMixer also has movie editing solution for Macintosh. == Windows Vista version of ImageMixer == A Windows Vista version of ImageMixer has been developed (ImageMixer3).

    Read more →
  • Parasolid

    Parasolid

    Parasolid is a geometric modeling kernel originally developed by Shape Data Limited, now owned and developed by Siemens Digital Industries Software. It can be licensed by other companies for use in their 3D computer graphics software products. Parasolid's abilities include model creation and editing utilities such as Boolean modeling operators, feature modeling support, advanced surfacing, thickening and hollowing, blending and filleting, and sheet modeling. It also incorporates modeling with mesh surfaces and lattices. Parasolid also includes tools for direct model editing, including tapering, offsetting, geometry replacement and removing feature details with automated regeneration of surrounding data. Parasolid also provides wide-ranging graphical and rendering support, including hidden-line, wireframe and drafting, tessellation, and model data inquiries. To use Parasolid effectively, software developers need knowledge of CAD in general, computational geometry, and topology. Parasolid is available for Windows (32-bit, 64-bit and AArch64), Linux (64-bit and AArch64), macOS (Apple silicon and Intel), iOS, and Android. == Parasolid XT format == Parasolid parts are normally saved in XT format, which usually has the file extension .X_T. The format is documented and open. There is also a binary version of the format, usually with an .X_B extension, which is somewhat more compact. Both .X_T and .X_B are used for parts files. == Applications == It is used in many computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided engineering (CAE), product visualization, and CAD data exchange packages. Notable uses include:

    Read more →
  • Vujak

    Vujak

    VuJak is an early video sampler, a VJ remix and mashup tool created in 1992 by Brian Kane, Lisa Eisenpresser, and Jay Haynes. The original name of the project was Mideo, but it was later changed to VuJak. VuJak was based on MIDI control of video in real-time. It was created with MAX from Opcode Systems, and utilized the newly released QuickTime 1.0 movie object. The first working version of the program was built on a Mac IIfx with 8 megs of ram, and could jump in real-time across a 160 x 120 pixel QuickTime movie via a midi keyboard. Later versions could manipulate full screen video, included the first real-time video scratch feature, had looping, vari-speed, and random play features, and allowed for recording and editing of video sequences within the application. VuJak also had networking capabilities which allowed artists to "jam" in real time across standard phone lines. The first public exhibition of VuJak was at the Digital Hollywood conference in Beverly Hills in 1993, where it was promoted by Timothy Leary. VuJak was featured in Mondo 2000, CBS Evening News, Wired Magazine, Electronic Musician, Billboard Magazine, The Hollywood Reporter, and it was used to create promotional videos for MTV. In 1994, VuJak was a featured interactive exhibition at the Exploratorium in San Francisco. Development of VuJak ceased in 1995.

    Read more →
  • Autonomous aircraft

    Autonomous aircraft

    An autonomous aircraft is an aircraft which flies under the control of on-board autonomous robotic systems and needs no intervention from a human pilot or remote control. Most contemporary autonomous aircraft are unmanned aerial vehicles (drones) with pre-programmed algorithms to perform designated tasks, but advancements in artificial intelligence technologies (e.g. machine learning) mean that autonomous control systems are reaching a point where several air taxis and associated regulatory regimes are being developed. == History == === Unmanned aerial vehicles === The earliest recorded use of an unmanned aerial vehicle for warfighting occurred in July 1849, serving as a balloon carrier (the precursor to the aircraft carrier) Significant development of radio-controlled drones started in the early 1900s, and originally focused on providing practice targets for training military personnel. The earliest attempt at a powered UAV was A. M. Low's "Aerial Target" in 1916. Autonomous features such as the autopilot and automated navigation were developed progressively through the twentieth century, although techniques such as terrain contour matching (TERCOM) were applied mainly to cruise missiles. Before the introduction of the Bayraktar Kızılelma some modern drones have a high degree of autonomy, although they were not fully capable and the regulatory environment prohibits their widespread use in civil aviation. However some limited trials had been undertaken. On December 17, 2025, two Bayraktar Kızılelma performed the world's first autonomous close-formation flight by two unmanned fighter jets, using artificial intelligence. This was the first time in the history of aviation when two unmanned aerial vehicles flew in close formation on their own. === Passengers === As flight, navigation and communications systems have become more sophisticated, safely carrying passengers has emerged as a practical possibility. Autopilot systems are relieving the human pilot of progressively more duties, but the pilot currently remains necessary. A number of air taxis are under development and larger autonomous transports are also being planned. The personal air vehicle is another class where from one to four passengers are not expected to be able to pilot the aircraft and autonomy is seen as necessary for widespread adoption. == Control system architecture == The computing capability of aircraft flight and navigation systems followed the advances of computing technology, beginning with analog controls and evolving into microcontrollers, then system-on-a-chip (SOC) and single-board computers (SBC). === Sensors === Position and movement sensors give information about the aircraft state. Exteroceptive sensors deal with external information like distance measurements, while proprioceptive ones correlate internal and external states. Degrees of freedom (DOF) refers to both the amount and quality of sensors on board: 6 DOF implies 3-axis gyroscopes and accelerometers (a typical inertial measurement unit – IMU), 9 DOF refers to an IMU plus a compass, 10 DOF adds a barometer and 11 DOF usually adds a GPS receiver. === Actuators === UAV actuators include digital electronic speed controllers (which control the RPM of the motors) linked to motors/engines and propellers, servomotors (for planes and helicopters mostly), weapons, payload actuators, LEDs and speakers. === Software === UAV software called the flight stack or autopilot. The purpose of the flight stack is to obtain data from sensors, control motors to ensure UAV stability, and facilitate ground control and mission planning communication. UAVs are real-time systems that require rapid response to changing sensor data. As a result, UAVs rely on single-board computers for their computational needs. Examples of such single-board computers include Raspberry Pis, Beagleboards, etc. shielded with NavIO, PXFMini, etc. or designed from scratch such as NuttX, preemptive-RT Linux, Xenomai, Orocos-Robot Operating System or DDS-ROS 2.0. Civil-use open-source stacks include: Due to the open-source nature of UAV software, they can be customized to fit specific applications. For example, researchers from the Technical University of Košice have replaced the default control algorithm of the PX4 autopilot. This flexibility and collaborative effort has led to a large number of different open-source stacks, some of which are forked from others, such as CleanFlight, which is forked from BaseFlight and from which three other stacks are forked from. === Loop principles === UAVs employ open-loop, closed-loop or hybrid control architectures. Open loop – This type provides a positive control signal (faster, slower, left, right, up, down) without incorporating feedback from sensor data. Closed loop – This type incorporates sensor feedback to adjust behavior (reduce speed to reflect tailwind, move to altitude 300 feet). The PID controller is common. Sometimes, feedforward is employed, transferring the need to close the loop further. == Communications == Most UAVs use a radio for remote control and exchange of video and other data. Early UAVs had only narrowband uplink. Downlinks came later. These bi-directional narrowband radio links carried command and control (C&C) and telemetry data about the status of aircraft systems to the remote operator. For very long range flights, military UAVs also use satellite receivers as part of satellite navigation systems. In cases when video transmission was required, the UAVs will implement a separate analog video radio link. In most modern autonomous applications, video transmission is required. A broadband link is used to carry all types of data on a single radio link. These broadband links can leverage quality of service techniques to optimize the C&C traffic for low latency. Usually, these broadband links carry TCP/IP traffic that can be routed over the Internet. Communications can be established with: Ground control – a military ground control station (GCS). The MAVLink protocol is increasingly becoming popular to carry command and control data between the ground control and the vehicle. Remote network system, such as satellite duplex data links for some military powers. Downstream digital video over mobile networks has also entered consumer markets, while direct UAV control uplink over the cellular mesh and LTE have been demonstrated and are in trials. Another aircraft, serving as a relay or mobile control station – military manned-unmanned teaming (MUM-T). As mobile networks have increased in performance and reliability over the years, drones have begun to use mobile networks for communication. Mobile networks can be used for drone tracking, remote piloting, over the air updates, and cloud computing. Modern networking standards have explicitly considered autonomous aircraft and therefore include optimizations. The 5G standard has mandated reduced user plane latency to 1ms while using ultra-reliable and low-latency communications. == Autonomy == Basic autonomy comes from proprioceptive sensors. Advanced autonomy calls for situational awareness, knowledge about the environment surrounding the aircraft from exteroceptive sensors: sensor fusion integrates information from multiple sensors. Civil aviation regulators and standards bodies have published high-level roadmaps and discussion papers focused on assurance, safety and governance of AI-enabled systems in aviation, particularly as autonomy increases in operations and decision support. === Basic principles === One way to achieve autonomous control employs multiple control-loop layers, as in hierarchical control systems. As of 2016 the low-layer loops (i.e. for flight control) tick as fast as 32,000 times per second, while higher-level loops may cycle once per second. The principle is to decompose the aircraft's behavior into manageable "chunks", or states, with known transitions. Hierarchical control system types range from simple scripts to finite state machines, behavior trees and hierarchical task planners. The most common control mechanism used in these layers is the PID controller which can be used to achieve hover for a quadcopter by using data from the IMU to calculate precise inputs for the electronic speed controllers and motors. Examples of mid-layer algorithms: Path planning: determining an optimal path for vehicle to follow while meeting mission objectives and constraints, such as obstacles or fuel requirements Trajectory generation (motion planning): determining control maneuvers to take in order to follow a given path or to go from one location to another Trajectory regulation: constraining a vehicle within some tolerance to a trajectory Evolved UAV hierarchical task planners use methods like state tree searches or genetic algorithms. === Autonomy features === UAV manufacturers often build in specific autonomous operations, such as: Self-level: attitude stabilization on the pitch and roll axes. Altitude hold: The aircraft maint

    Read more →
  • Transcription software

    Transcription software

    Transcription software assists in the conversion of human speech into a text transcript. Audio or video files can be transcribed manually or automatically. Transcriptionists can replay a recording several times in a transcription editor and type what they hear. By using transcription hot keys, the manual transcription can be accelerated, the sound filtered, equalized or have the tempo adjusted when the clarity is not great. With speech recognition technology, transcriptionists can automatically convert recordings to text transcripts by opening recordings in a PC and uploading them to a cloud for automatic transcription, or transcribe recordings in real-time by using digital dictation. Depending on quality of recordings, machine generated transcripts may still need to be manually verified. The accuracy rate of the automatic transcription depends on several factors such as background noises, speakers' distance to the microphone, and accents. Transcription software, as with transcription services, is often used for business, legal, or medical purposes. Compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for subtitles and closed captions. Some clinical environments also use digital tools to support transcription workflows, including ambient documentation systems that employ Speech recognition to capture portions of clinical encounters and generate draft notes for later review. These tools are typically used alongside conventional transcription methods. The definition of transcription "software", as compared with transcription "service", is that the former is sufficiently automated that a user can run the entire system without engaging outside personnel. New software-as-a-service and cloud computing models use artificial intelligence, machine learning and natural language processing to convert speech to text and continuously learn new phrases and accents. AI transcription can, however, lead to hallucinations and other errors. == Development == Research at Google released a free android app Google Live Transcribe, it runs on Google Cloud. Google Chrome developed and has an available built in English Live Caption. Google Docs, Google Translate, Google Assistant, GBoard Google Text to Speech engine support transcription tool too. OpenAI launched Whisper, an open-source speech recognition deep learning model in September 2022. In 2024, an AI-powered transcription platform, Transkriptor, was launched, enabling the automatic conversion of audio and video recordings into text using speech recognition technology, with support for transcription in 100 languages and processing of content uploaded via a web interface as well as mobile and browser extensions. It is part of the Tor.app suite of AI-based language processing tools.

    Read more →
  • Alias Eclipse

    Alias Eclipse

    Eclipse was a professional 2D image editing program available on Silicon Graphics and Windows workstations. Designed to manipulate high-resolution images like digitized movie frames and photographs for print, it offered color correction tools, image processing effects, rudimentary paint features, and spline-based drawing and masking. == History == Eclipse was originally developed in the late 1980s by Full Color Computing, an early provider of photo retouch and color prepress software for Silicon Graphics workstations. Alias Research (later Alias Systems Corporation), a developer of professional 3D graphics applications for the SGI platform, purchased the rights to Eclipse in fall 1990. Alias developed Eclipse through the early to mid-1990s, releasing version 2.5 in 1995 with improvements to the speed of color correction, effects, and rendering. Xyvision's Contex Prepress division purchased exclusive rights to Eclipse from Alias in 1996, and released version 3.0 the following year. Eclipse was subsequently sold to German developer Form & Vision GmbH, which continued development and ported it to the Windows platform. In 1999, Form & Vision released a demo of Eclipse 3.1.3 on the SGI platform which was limited to 1600 x 1600 pixel images, then ceased development of Eclipse on the SGI platform. Eclipse was thereafter developed exclusively for the Windows platform, culminating with version 3.1.4 in 2001. In the same year the firm went bankrupt. == Features == Eclipse was designed to work with very large images that could not be manipulated in real time on contemporary computer systems due to memory limitations, and thus allowed the user to make modifications to a lower-resolution copy of the original image in "proxy mode." Brush strokes, color corrections, and other edits were saved in proxy mode, then applied to the full-size image in post processing. This method also allowed for batch processing of a high-resolution image sequence using the edits applied to the original proxy image. Other features included color correction and separation, warping, special effects, text, and shape masking. Wavelet image compression created by LuraTech was added to Eclipse 3.1.4

    Read more →
  • Act! LLC

    Act! LLC

    ACT! (previously known as Activity Control Technology, Automated Contact Tracking, ACT! by Sage, and Sage ACT!) is a customer relationship management and marketing automation software platform designed for small and medium-sized businesses. It has over 2.8 million registered users as of December 2014. == History == The company Conductor Software was founded in 1986, in Dallas, Texas, by Pat Sullivan and Mike Muhney. The original name for the software was Activity Control Technology; it was renamed to Automated Contact Tracking, later abbreviated to ACT. The name of the company was subsequently changed to Contact Software International and it was sold in 1993 to Symantec Corporation, who in 1999 then sold it to SalesLogix. The Sage Group purchased Interact Commerce (formerly SalesLogix) in 2001 through Best Software, then its North American software division. Swiftpage acquired it in 2013. Beginning with the 2006 version, the name was styled ACT! by Sage, and in 2010 revised to Sage ACT!. Following its 2013 acquisition by Swiftpage, it was renamed to ACT! Swiftpage. In May 2018, ACT! was sold to SFW Advisors. In December 2018, Kuvana, a marketing automation software solution, was acquired by SFW and merged with ACT! This add-on is now a complementary service to the core CRM solution. In December 2019, ACT! hired Steve Oriola as chairman and CEO. In 2020, Swiftpage changed its company name to ACT!. In March 2023, ACT! hired Bruce Reading as President and CEO. == Software == ACT! features include contact, company and opportunity management, a calendar, marketing automation and e-marketing tools, reports, interactive dashboards with graphical visualizations, and the ability to track prospective customers. ACT! integrates with Microsoft Word, Excel, Outlook, Google Contacts, Gmail, and other applications via Zapier. For custom integrations, ACT! has an in-built API. ACT! can be accessed from Windows desktops (Win7 and later) with local or network shared database; synchronized to laptops or remote officers; Citrix or Remote Desktop; Web browsers (Premium only) with self or SaaS hosting; smartphones and tablets via HTML5 Web (Premium only); smartphones and tablets via sync with Handheld Contact.

    Read more →
  • AI alignment

    AI alignment

    In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives. It is often difficult for AI designers to specify the full range of desired and undesired behaviors. Therefore, the designers often use simpler proxy goals, such as gaining human approval. But proxy goals can overlook necessary constraints or reward the AI system for merely appearing aligned. AI systems may also find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful, ways (reward hacking). Advanced AI systems may develop unwanted instrumental strategies, such as seeking power or self-preservation because such strategies help them achieve their assigned final goals. Furthermore, they might develop undesirable emergent goals that could be hard to detect before the system is deployed and encounters new situations and data distributions. Empirical research showed in 2024 that advanced large language models (LLMs) such as OpenAI o1 or Claude 3 sometimes engage in strategic deception to achieve their goals or prevent them from being changed. Some of these issues affect existing commercial systems such as LLMs, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected because these problems partially result from high capabilities. Many prominent AI researchers and AI company leaders have argued or asserted that AI is approaching human-like (AGI) and superhuman cognitive capabilities (ASI), and could endanger human civilization if misaligned. These include "AI godfathers" Geoffrey Hinton and Yoshua Bengio and the CEOs of OpenAI, Anthropic, and Google DeepMind. These risks remain debated. AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking. Alignment research has connections to interpretability research, (adversarial) robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and social sciences. == Objectives in AI == Programmers provide an AI system such as AlphaZero with an "objective function", in which they intend to encapsulate the goal(s) the AI is configured to accomplish. Such a system later populates a (possibly implicit) internal "model" of its environment. This model encapsulates all the agent's beliefs about the world. The AI then creates and executes whatever plan is calculated to maximize the value of its objective function. For example, when AlphaZero is trained on chess, it has a simple objective function of "+1 if AlphaZero wins, −1 if AlphaZero loses". During the game, AlphaZero attempts to execute whatever sequence of moves it judges most likely to attain the maximum value of +1. Similarly, a reinforcement learning system can have a "reward function" that allows the programmers to shape the AI's desired behavior. An evolutionary algorithm's behavior is shaped by a "fitness function". == Alignment problem == In 1960, AI pioneer Norbert Wiener described the AI alignment problem as follows: If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively [...] we had better be quite sure that the purpose put into the machine is the purpose which we really desire. AI alignment refers to ensuring that an AI system's objectives match some target. The target is variously defined as the goals of the system's designers or users, widely shared values, objective ethical standards, legal requirements, or the intentions its designers would have if they were more informed and enlightened. In democratic AI alignment, the target is the values and preferences of median voters, which increases political legitimacy. AI alignment is an open problem for modern AI systems and is a research field within AI. Aligning AI involves two main challenges: carefully specifying the purpose of the system (outer alignment) and ensuring that the system adopts the specification robustly (inner alignment). Researchers also attempt to create AI models that have robust alignment, sticking to safety constraints even when users adversarially try to bypass them. === Specification gaming and side effects === To specify an AI system's purpose, AI designers typically provide an objective function, examples, or feedback to the system. But designers are often unable to completely specify all important values and constraints, so they resort to easy-to-specify proxy goals such as maximizing the approval of human overseers, who are fallible. As a result, AI systems can find loopholes that help them accomplish the specified objective efficiently but in unintended, possibly harmful ways. This tendency is known as specification gaming or reward hacking, and is an instance of Goodhart's law. As AI systems become more capable, they are often able to game their specifications more effectively. Specification gaming has been observed in numerous AI systems. OpenAI GPT models for programming—including in real-world cases—have been found to explicitly plan hacking the tests used to evaluate them to falsely appear successful (e.g., explicitly stating "let's hack"). When the company penalized this, many models learned to obfuscate their plans while continuing to hack the tests. Another system was trained to finish a simulated boat race by rewarding the system for hitting targets along the track, but the system achieved more reward by looping and crashing into the same targets indefinitely. A 2025 Palisade Research study found that when tasked to win at chess against a stronger opponent, some reasoning LLMs attempted to hack the game system, for example by modifying or entirely deleting their opponent. Some alignment researchers aim to help humans detect specification gaming and steer AI systems toward carefully specified objectives that are safe and useful to pursue. When a misaligned AI system is deployed, it can have consequential side effects. Social media platforms have been known to optimize their recommendation algorithms for click-through rates, causing user addiction on a global scale. Stanford researchers say that such recommender systems are misaligned with their users because they "optimize simple engagement metrics rather than a harder-to-measure combination of societal and consumer well-being". Explaining such side effects, Berkeley computer scientist Stuart J. Russell said that the omission of implicit constraints can cause harm: "A system [...] will often set [...] unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want." Some researchers suggest that AI designers specify their desired goals by listing forbidden actions or by formalizing ethical rules (as with Asimov's Three Laws of Robotics). But Russell and Norvig argue that this approach overlooks the complexity of human values: "It is certainly very hard, and perhaps impossible, for mere humans to anticipate and rule out in advance all the disastrous ways the machine could choose to achieve a specified objective." Additionally, even if an AI system fully understands human intentions, it may still disregard them, because following human intentions may not be its objective (unless it is already fully aligned). === Pressure to deploy unsafe systems === Commercial organizations sometimes have incentives to take shortcuts on safety and to deploy misaligned or unsafe AI systems. For example, social media recommender systems have been profitable despite creating unwanted addiction and polarization. Competitive pressure can also lead to a race to the bottom on AI safety standards. For example, OpenAI has been sued for releasing a ChatGPT version that encouraged suicide for some unstable users, a behavior the company had overlooked amid a rushed product release. Similarly, in 2018, a self-driving car killed a pedestrian (Elaine Herzberg) after engineers disabled the emergency braking system because it was oversensitive and slowed development. === Risks from advanced misaligned AI === Some researchers are interested in aligning increasingly advanced AI systems, as progress in AI development is rapid, and industry and governments are trying to build advan

    Read more →
  • Radar geo-warping

    Radar geo-warping

    Radar geo-warping is the adjustment of geo-referenced radar images and video data to be consistent with a geographical projection. This image warping avoids any restrictions when displaying it together with video from multiple radar sources or with other geographical data including scanned maps and satellite images which may be provided in a particular projection. There are many areas where geo warping has unique benefits: Single radar video signal displayed together with maps of different geographical projections. E.g. Mercator UTM stereographic Multiple radar video signals displayed simultaneously: Having the computing power to do so on one computer. Adapting the projection of all radar signals allowing the geographically correct display and accurate superimposition of those videos. Slant range correction: a modern 3D radar system can measure the height of a target and hence it is possible to correct the radar video by the real corrected range of the target. Slant Range Correction also allows to compensate the radar tower height e.g. for maritime surveillance radars. == Introduction == Radar video presents the echoes of electromagnetic waves a radar system has emitted and received as reflections afterwards. These echoes are typically presented on a computer screen with a color-coding scheme depicting the reflection strength. Two problems have to be solved during such a visualization process. The first problem arises from the fact that typically the radar antenna turns around its position and measures the reflection echo distances from its position in one direction. This effectively means that the radar video data are present in polar coordinates. In older systems the polar oriented picture has been displayed in so called plan position indicators (PPI). The PPI-scope uses a radial sweep pivoting about the center of the presentation. This results in a map-like picture of the area covered by the radar beam. A long-persistence screen is used so that the display remains visible until the sweep passes again. Bearing to the target is indicated by the target's angular position in relation to an imaginary line extending vertically from the sweep origin to the top of the scope. The top of the scope is either true north (when the indicator is operated in the true bearing mode) or ship's heading (when the indicator is operated in the relative bearing mode). For visualization on a modern computer screen the polar coordinates have to be converted into Cartesian coordinates. This process called radar scan conversion is presented with more detail in the next section. The second problem to solve arises from the fact that a radar system is placed in the real world and measures real world echo positions. These echoes have to be displayed together with other real world data like object positions, vector maps and satellite images in a consistent way. All this information refers to the curved earth surface but is displayed on a flat computer display. Building a link from real world earth positions to display pixels is commonly called geographical referencing or in short geo-referencing. Part of the geo-referencing process is to map the 3D earth surface onto a 2D display. This process of a geographical projection can be performed in many ways, but different data sources have their own 'natural' projection. E.g. Cartesian radar video data from a radar source on the earth surface are geo-referenced by a so-called radar projection. When using this radar projection the Cartesian radar video pixels can directly displayed on a computer screen (only being linearly transformed according to the current position on the screen and e.g. the current zoom level). A problem now arises if e.g. also a satellite map shall be shown together with the radar video data. The 'natural' geographical projection of a satellite image would be a satellite projection which depends on the satellite orbit, position and further parameters. Now either the satellite image has to be reprojected to a radar projection or the radar video has to use the satellite projection. This geographical re-projection is also called geographical warping or Geo Warping where each image pixel has to be transformed from one projection into another. This article describes in further detail the Geo Warping of radar video images in real time. It will also show that radar video Geo Warping is done most efficiently when it is integrated with the radar scan conversion process. == Radar-scan conversion == This section describes the principles of the radar-scan conversion (RSC) process. The radar supplies its measured data in polar coordinates (ρ,θ) directly from the rotating antenna. ρ defines the target/echo distance and θ the target angle in polar world coordinates. These data are measured, digitized and stored in a polar coordinate polar store or polar pixmap. The main RSC task is to convert these data to Cartesian (x, y) display coordinates, creating the necessary display pixels. The RSC process is influenced by the current zoom, shift and rotation settings defining which part of the 'world' shall be visible in the display image. As detailed later the RSC process also takes the currently used geographical projection into account when the radar video images are Geo Warped. The OpenGL RSC is implemented using a reverse scan conversion approach which calculates for every image pixel the most appropriate radar amplitude value in the polar store. This approach generates an optimal image without any artifacts known from forward spoke fill algorithms. By applying bi-linear filtering between adjacent pixels in the polar store during the conversion process the OpenGL RSC finally achieves a very high visual quality radar display image for every zoom level, creating smooth images of the radar echoes. == Radar projection == This section illustrates how radar video data are geo referenced and displayed on a computer screen. The radar sensor is positioned on the earth surface with a height h above the ground. It measures the direct distance d to the target (and not e.g. the distance the target is away from the radar if one would move on the earth surface). This distance is then used in the display plane after adjustment to the current display zoom level by the radar scan converter (RSC). Now it has to be clarified how the radar video data is geo referenced. This basically means, that if we want to display a geographical real world object (like e.g. a light house) which is at the same real world position as the radar target, that it also shall appear at the same position in the display plane. This is realized by calculating the distance from the radar sensor to the respective real world object and use that distance in the display plane. The position of the real world object is typically given in geographical coordinates (latitude, longitude and height above the earth surface). In other words, using a radar projection with geographical data is done by simulating a radar measurement process with the real world objects and use the resulting range and azimuth in the display plane. The second picture to the right shows an example radar projection with the center of projection (COP) at latitude 50.0° and longitude 0.0° which is also the radar position. The dashed lines are the equal-latitude and equal-longitude lines on top of the background map. The solid lines show equal-range and equal-azimuth with the respect to the radar position. It is a feature of the radar projection that equal-range lines are circles and equal-azimuth lines are straight lines. This is necessary to display radar video consistently with other map data when using a radar projection where the projection center has to be the radar position. == Geo Warping process == This section explains the actual geo warping or re-projection process when applied to radar video in real time. Assume we want to display radar video on top of a satellite image. As an example we use the CIB projection which is used to display satellite data in CIB (Controlled Image Base) format. The Figure Geo Warping Radar to CIB Projection shows dashed the maximal range circle for a range of 111 km or 60 miles using the radar projection. Such a range is typical for long range coastal surveillance radars. As stated in the last section this is a perfect circle also on the computer screen. The solid line ellipse shows the same range circle for the CIB projection. Typically the errors occurring without Geo Warping are smallest near the radar position if at least the projection center (COP) coincides with the radar position, as realized in our example. Otherwise the error distribution depends both on the used projection and also on the projection parameters. Thus, in our case the errors are most significant near the maximum radar range. The CIB projection error corrected in east–west direction at half the radar range is 2.6 km and is 5.3 km at the full radar range of 111 km. An error of 5.3 km is

    Read more →
  • Prism Video Converter

    Prism Video Converter

    Prism is a multi-format video converter developed by NCH Software for Windows and Mac OS. It offers converting tools for instant media conversions. Prism Video Converter can handle large and high-quality resolution media files. It provides built-in compressor and adjuster settings, allowing users to customize and optimize their videos according to their needs. The software also includes features such as previewing videos and adding effects. Prism offers a free version for non-commercial use as well as a premium version. == Features == Prism Video File Converter supports a wide range of file formats. It enables users to convert videos into formats like AVI, ASF, WMV, MP4, 3GP, etc. It offers the ability to convert DVDs into various formats. It provides tools for adjusting colour and filter options. Prism Video File Converter provides several customizable options for tweaking the output files during the conversion process. Users can adjust compression/encoder rates, set the resolution and frame rate, and specify the desired output file size. The software also offers various effects like video rotation, captions, watermarks, and text overlay. It also includes a built-in preview feature, that enables users to view their videos before and after the conversion process. It supports batch conversion and running conversion in background. == Controversy == Previously, Prism and certain other NCH Software products were bundled with optional browser plugins, including the Google Chrome toolbar and the Conduit toolbar. This resulted in user complaints and raised concerns from antivirus software companies like Norton and McAfee, which flagged them as potential malware. NCH Software has since removed all toolbars, browsers, and third-party app offerings in all Prism versions.

    Read more →
  • PCPaint

    PCPaint

    PCPaint was one of the first IBM PC-based mouse-driven GUI paint programs, released in 1984. It followed after Microsoft Doodle, released in 1983 with the Microsoft Mouse version 1 drivers for DOS, and around the same time as Digital Research’s Draw program. It was developed and created by John Bridges and Doug Wolfgram. It was later developed into Pictor Paint. The hardware manufacturer Mouse Systems bundled PCPaint with millions of computer mice that they sold, making PCPaint one of the best-selling DOS-based paint programs of the mid 1980s. == History == In 1983, Doug Wolfgram bought a Microsoft Mouse and decided to write a drawing program for it. They named it “Mouse Draw”. The interface was primitive but the program functioned well. Wolfgram traveled to SoftCon in New Orleans where he demonstrated the program to Mouse Systems. Mouse Systems was developing an optical mouse and they wanted to bundle a painting program so they agreed to publish Mouse Draw. The original program was written entirely in assembly language with primitive graphics routines developed by Wolfgram. John Bridges worked for an educational software company, Classroom Consortia Media, Inc., developing and writing Apple and IBM graphics libraries for CCM's software. Bridges and Wolfgram were friends who had been connected through a bulletin board system developed and run by Wolfgram. The two collaborated cross country via the BBS, Wolfram in California and Bridges in New York. Mouse Systems wanted the paint program to capture the look and feel of MacPaint. John Bridges and Doug Wolfgram started reworking Mouse Draw into what became PCPaint. The program was completely re-written using Bridge's graphics library and the top-level elements were written in C rather than assembly language. Bridges developed the core graphics code for the first version of PCPaint while Wolfgram worked on the user interface and top-level code. Mouse Systems signed an exclusive agreement with Wolfgram's company, Microtex Industries, Inc., to bundle PCPaint with every mouse they sold. They began publishing PCPaint with their mice in 1984. Microsoft responded in 1985 by bundling a competing product, PC Paintbrush, with version 4 of its DOS drivers for the Microsoft Mouse, replacing its in-house Microsoft Doodle program which it published with version 1 of the DOS drivers in mid-1983. Microsoft’s mouse began to outsell Mouse Systems mouse. In November 1985 Microsoft bundled a cut-down version of PC Paintbrush with Windows 1.0 (called Microsoft Paint), later bundling an updated version of PC Paintbrush with Windows 3.0 (as Paintbrush), impacting PCPaint’s marketshare. In early 1987, Mouse Systems decided that PCPaint wasn't helping to sell mice any longer so they discontinued the bundle deal and returned rights to the code to MicroTex Industries, but retained rights to the name, PCPaint. Wolfgram then combined the paint program with a new animation system he was developing (called GRASP) and Paul Mace Software bought publishing rights to the animation system and PCPaint, which was to be renamed Pictor. Bridges again got involved and took over programming responsibilities for GRASP as well as PCPaint while Wolfgram focused on more of the business details. In creating the first version of PCPaint, Doug had a dual-floppy machine with a Computer Innovations compiler on one disk and source code on the other. John had the "luxury" of a 10MB hard disk in his XT. Data was exchanged daily via 1200, then 2400 baud modems. === Authorship and Ownership === John Bridges and Wolfgram continued to work on PCPaint and GRASP on behalf of Paul Mace Software until 1990. Also in that year, Doug Wolfgram sold his remaining rights to PCPaint (and its animation system, GRASP) to John Bridges. In 1994, GRASP development stopped and so did development of Pictor Paint. John Bridges terminated his GRASP publishing contract with Paul Mace Software, and went off to create GLPro (the next generation of GRASP) with GMEDIA. Along with GLPro, came GLPaint, the successor to PCPaint and Pictor Paint. == Versions == In June 1984, Mouse Systems shipped PCPaint 1.0, the first GUI based Paint program for the IBM PC family of computers. John Bridges and Doug Wolfgram, were the co-authors of PCPaint 1.0. PCPaint 1.0 saved its graphics in a modified BSaved image format with the extension of ".PIC". The release of PCPaint Version 1.5 followed in late 1984, with the additions of graphics image compression for the .PIC format and support for "larger-than-screen" images. PCjr support was also added in this version after overcoming severe memory shortage problems getting PCPaint to run on the 128k PCjr. October 1985 saw the release of PCPaint 2.0. EGA support and publishing features were added to this version. The .PIC format was further refined, offering support for the rapidly expanding graphics capabilities of the PC and efficient image compression. PCPaint 3.1 was released in 1989. Unlike previous versions, it was not bundled with mice but was sold as a stand-alone software product. PCPaint 3.1 offered improved text and image handling, provided 36 types of flood and fill, worked with VGA adapters in hi-res 16-color and 256-color modes, allowed the user to save and retrieve files in a variety of intercompatible formats (.PIC, .GIF, .PCX, .IMG), and printed selected portions of images on color or black-and-white dot matrix, ink jet, and laser printers such as PostScript and HP Laser Jet. PCPaint 3.1 is still in use today by some users of DOS emulation programs like DOSBox and available for free download. Pictor Paint was an improved version, written by John Bridges, and bundled with GRASP GRaphical System for Presentation also written by John Bridges. It was also called "The Painter's Easel". GLPaint, released in 1995, was the last in this series of paint programs written by John Bridges. By 1998 version 7.0 provided support for TrueColor images and the Pictor PIC format was expanded to handle these. == Pictor PIC Image Format == PCPaint 1.0 saved its graphics in a modified BSAVE image format (which was popular at the time) with the file type (extension) of ".PIC". By PCPaint 1.5 this format was extended further to accommodate image compression. With the release of version 2.0 the PICtor PIC image format was developed almost to its present state, with no similarity to the BSAVE format used by earlier versions. Pictor Paint saved its files in a compressed format with the file extension PIC, which was the same format used by PCPaint.

    Read more →
  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be R {\displaystyle R} , such that V = I R {\displaystyle V=IR} Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take X {\displaystyle X} to be the vector space of all possible inputs, and Y {\displaystyle Y} to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space Z = X × Y {\displaystyle Z=X\times Y} , i.e. there exists some unknown p ( z ) = p ( x , y ) {\displaystyle p(z)=p(\mathbf {x} ,y)} . The training set is made up of n {\displaystyle n} samples from this probability distribution, and is notated S = { ( x 1 , y 1 ) , … , ( x n , y n ) } = { z 1 , … , z n } {\displaystyle S=\{(\mathbf {x} _{1},y_{1}),\dots ,(\mathbf {x} _{n},y_{n})\}=\{\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\}} Every x i {\displaystyle \mathbf {x} _{i}} is an input vector from the training data, and y i {\displaystyle y_{i}} is the output that corresponds to it. In this formalism, the inference problem consists of finding a function f : X → Y {\displaystyle f:X\to Y} such that f ( x ) ∼ y {\displaystyle f(\mathbf {x} )\sim y} . Let H {\displaystyle {\mathcal {H}}} be a space of functions f : X → Y {\displaystyle f:X\to Y} called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let V ( f ( x ) , y ) {\displaystyle V(f(\mathbf {x} ),y)} be the loss function, a metric for the difference between the predicted value f ( x ) {\displaystyle f(\mathbf {x} )} and the actual value y {\displaystyle y} . The expected risk is defined to be I [ f ] = ∫ X × Y V ( f ( x ) , y ) p ( x , y ) d x d y {\displaystyle I[f]=\int _{X\times Y}V(f(\mathbf {x} ),y)\,p(\mathbf {x} ,y)\,d\mathbf {x} \,dy} The target function, the best possible function f {\displaystyle f} that can be chosen, is given by the f {\displaystyle f} that satisfies f = argmin h ∈ H ⁡ I [ h ] {\displaystyle f=\mathop {\operatorname {argmin} } _{h\in {\mathcal {H}}}I[h]} Because the probability distribution p ( x , y ) {\displaystyle p(\mathbf {x} ,y)} is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk I S [ f ] = 1 n ∑ i = 1 n V ( f ( x i ) , y i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})} A learning algorithm that chooses the function f S {\displaystyle f_{S}} that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function f S {\displaystyle f_{S}} that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: V ( f ( x ) , y ) = ( y − f ( x ) ) 2 {\displaystyle V(f(\mathbf {x} ),y)=(y-f(\mathbf {x} ))^{2}} The absolute value loss (also known as the L1-norm) is also sometimes used: V ( f ( x ) , y ) = | y − f ( x ) | {\displaystyle V(f(\mathbf {x} ),y)=|y-f(\mathbf {x} )|} === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with Y = { − 1 , 1 } {\displaystyle Y=\{-1,1\}} , this is: V ( f ( x ) , y ) = θ ( − y f ( x ) ) {\displaystyle V(f(\mathbf {x} ),y)=\theta (-yf(\mathbf {x} ))} where θ {\displaystyle \theta } is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space H {\displaystyle {\mathcal {H}}} . A common example would be restricting H {\displaystyle {\mathcal {H}}} to linear functions: this can be seen as a reduction to the standard problem of linear regression. H {\displaystyle {\mathcal {H}}} could also be restricted to polynomial of degree p {\displaystyle p} , exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions are limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. This consists of minimizing 1 n ∑ i = 1 n V ( f ( x i ) , y i ) + γ ‖ f ‖ H 2 {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})+\gamma \left\|f\right\|_{\mathcal {H}}^{2}} where γ {\displaystyle \gamma } is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier f : X → { 0 , 1 } {\displaystyle f:{\mathcal {X}}\to \{0,1\}} . We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk to be a Sub-Gaussian distribution. P ( | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 e − 2 n ϵ 2 {\displaystyle \mathbb {P} (|{\hat {R}}(f)-R(f)|\geq \epsilon )\leq 2e^{-2n\epsilon ^{2}}} But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class. P ( sup f ∈ F | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 S ( F , n ) e − n ϵ 2 / 8 ≈ n d e − n ϵ 2 / 8 {\displaystyle \mathbb {P} {\bigg (}\sup _{f\in {\mathcal {F}}}|{\hat {R}}(f)-R(f)|\geq \epsilon {\bigg )}\leq 2S({\mathcal {F}},n)e^{-n\epsilon ^{2}/8}\approx n^{d}e^{-n\epsilon ^{2}/8}} where S ( F , n ) {\displaystyle S({\mathcal {F}},n)} is the shattering number and n {\displaystyle n} is the number of samples in your dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla

    Read more →
  • Image stitching

    Image stitching

    Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally. == Applications == Image stitching is widely used in modern applications, such as the following: Document mosaicing Image stabilization feature in camcorders that use frame-rate image alignment High-resolution image mosaics in digital maps and satellite imagery Medical imaging Multiple-image super-resolution imaging Video stitching Object insertion == Process == The image stitching process can be divided into three main components: image registration, calibration, and blending. === Image stitching algorithms === In order to estimate image alignment, algorithms are needed to determine the appropriate mathematical model relating pixel coordinates in one image to pixel coordinates in another. Algorithms that combine direct pixel-to-pixel comparisons with gradient descent (and other optimization techniques) can be used to estimate these parameters. Distinctive features can be found in each image and then efficiently matched to rapidly establish correspondences between pairs of images. When multiple images exist in a panorama, techniques have been developed to compute a globally consistent set of alignments and to efficiently discover which images overlap one another. A final compositing surface onto which to warp or projectively transform and place all of the aligned images is needed, as are algorithms to seamlessly blend the overlapping images, even in the presence of parallax, lens distortion, scene motion, and exposure differences. === Image stitching issues === Since the illumination in two views cannot be guaranteed to be identical, stitching two images could create a visible seam. Other reasons for seams could be the background changing between two images for the same continuous foreground. Other major issues to deal with are the presence of parallax, lens distortion, scene motion, and exposure differences. In a non-ideal real-life case, the intensity varies across the whole scene, and so does the contrast and intensity across frames. Additionally, the aspect ratio of a panorama image needs to be taken into account to create a visually pleasing composite. For panoramic stitching, the ideal set of images will have a reasonable amount of overlap (at least 15–30%) to overcome lens distortion and have enough detectable features. The set of images will have consistent exposure between frames to minimize the probability of seams occurring. === Keypoint detection === Feature detection is necessary to automatically find correspondences between images. Robust correspondences are required in order to estimate the necessary transformation to align an image with the image it is being composited on. Corners, blobs, Harris corners, and differences of Gaussians of Harris corners are good features since they are repeatable and distinct. One of the first operators for interest point detection was developed by Hans Moravec in 1977 for his research involving the automatic navigation of a robot through a clustered environment. Moravec also defined the concept of "points of interest" in an image and concluded these interest points could be used to find matching regions in different images. The Moravec operator is considered to be a corner detector because it defines interest points as points where there are large intensity variations in all directions. This often is the case at corners. However, Moravec was not specifically interested in finding corners, just distinct regions in an image that could be used to register consecutive image frames. Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly. They needed it as a processing step to build interpretations of a robot's environment based on image sequences. Like Moravec, they needed a method to match corresponding points in consecutive image frames, but were interested in tracking both corners and edges between frames. SIFT and SURF are recent key-point or interest point detector algorithms but a point to note is that SURF is patented and its commercial usage restricted. Once a feature has been detected, a descriptor method like SIFT descriptor can be applied to later match them. === Registration === Image registration involves matching features in a set of images or using direct alignment methods to search for image alignments that minimize the sum of absolute differences between overlapping pixels. When using direct alignment methods one might first calibrate one's images to get better results. Additionally, users may input a rough model of the panorama to help the feature matching stage, so that e.g. only neighboring images are searched for matching features. Since there are smaller group of features for matching, the result of the search is more accurate and execution of the comparison is faster. To estimate a robust model from the data, a common method used is known as RANSAC. The name RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method for robust parameter estimation to fit mathematical models from sets of observed data points which may contain outliers. The algorithm is non-deterministic in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are performed. It being a probabilistic method means that different results will be obtained for every time the algorithm is run. The RANSAC algorithm has found many applications in computer vision, including the simultaneous solving of the correspondence problem and the estimation of the fundamental matrix related to a pair of stereo cameras. The basic assumption of the method is that the data consists of "inliers", i.e., data whose distribution can be explained by some mathematical model, and "outliers" which are data that do not fit the model. Outliers are considered points which come from noise, erroneous measurements, or simply incorrect data. For the problem of homography estimation, RANSAC works by trying to fit several models using some of the point pairs and then checking if the models were able to relate most of the points. The best model – the homography, which produces the highest number of correct matches – is then chosen as the answer for the problem; thus, if the ratio of number of outliers to data points is very low, the RANSAC outputs a decent model fitting the data. === Calibration === Image calibration aims to minimize differences between an ideal lens models and the camera-lens combination that was used, optical defects such as distortions, exposure differences between images, vignetting, camera response and chromatic aberrations. If feature detection methods were used to register images and absolute positions of the features were recorded and saved, stitching software may use the data for geometric optimization of the images in addition to placing the images on the panosphere. Panotools and its various derivative programs use this method. ==== Alignment ==== Alignment may be necessary to transform an image to match the view point of the image it is being composited with. Alignment, in simple terms, is a change in the coordinates system so that it adopts a new coordinate system which outputs image matching the required viewpoint. The types of transformations an image may go through are pure translation, pure rotation, a similarity transform which includes translation, rotation and scaling of the image which needs to be transformed, Affine or projective transform. Projective transformation is the farthest an image can transform (in the set of two dimensional planar transformations), where only visible features that are preserved in the transformed image are straight lines whereas parallelism is maintained in an affine transform. Projective transformation can be mathematically described as x ′ = H ⋅ x , {\displaystyle x'=H\cdot x,} where x {\displaystyle x} is points in the old coordinate system, x ′ {\displaystyle x'} is the corresponding points in the transformed image and H {\displaystyle H} is the homography matrix. Expressing the points x {\displaystyle x} and x ′ {\displaystyle x'} using the camera intrinsics ( K {\displaystyle K} and K ′ {\displaystyle K'} ) and its rotation and translation [ R t ] {\displaystyle [R\,t]} to the real-world coordinates X {\displaystyle X} and < m a t h > x {\displaystyle x} and x ′ {\displaystyle x'} ', we get Using the abo

    Read more →
  • Coolgorilla

    Coolgorilla

    Coolgorilla was one of the earliest software developers that created 3rd party native applications for Apple iPod devices. Coolgorilla was an early adopter of using a sponsorship business model to enable mobile applications to be given away freely. Coolgorilla developed a series of Talking Phrasebooks for iPods in 2006. They partnered with online travel company lastminute.com who sponsored the applications enabling them to be made available to download completely free of charge. As mobile devices became more sophisticated, Coolgorilla developed the Talking Phrasebooks for Sony Ericsson and Nokia Mobile Devices which at the time were considerably noteworthy since the applications used real voice audio translations. With Apple's introduction of the iPhone in 2007, Coolgorilla developed a Web App before having four of the iPhone Talking Phrasebooks available to download from Apple's App Store on the day it opened in 2008. == Almanac in Chronological Order == On 23 December 2005, CoolGorilla, a new start-up, launched a trivia game for the iPod. It was titled "Rock and Pop Quiz". It was a quiz game that tested users' knowledge on bands such as U2, Metallica, Beyonce, and the Beatles. The quiz contained twenty megabytes of audible trivia questions. The free game was compatible with 3rd, 4th and 5th generation iPods, iPod mini and nano. In March 2006, Coolgorilla released "Movie Quiz for iPods" with a price of $5. It was an audio game narrated by New York's DJ Thomas, a radio and television host, voice over artist and event Master of Ceremonies. There were questions on Star Wars, Spiderman, The Godfather, Pulp Fiction, The Matrix, James Bond, and others. The user could keep track of their score. The game included a secret code for players who answered all questions correctly which enabled users to enter their name on the Coolgorilla Hall of Fame. In May 2006, Coolgorilla launched a World Cup Encyclopedia which was released prior to the 2006 FIFA World Cup. It had information on the World Cup schedule, details of every player from every team, every score from every world cup game ever played, stadium details, and manager profiles. It was a free download. In June 2006, Coolgorilla released a series of iPod Phrasebooks in German, Greek, French and Spanish. They were sponsored by lastminute.com and were free. The phrasebooks included common words and phrases for tourists with 750 sound files. They were accessed through the iPod's Notes feature. In April 2007, Coolgorilla released a downloadable version of the Talking Phrasebooks for Nokia and Sony Ericsson mobile devices. French, Spanish, German, Greek, Italian, and Portuguese were produced. The application provided real voice translations. They initially sold for £3 but 3 months later were offered for free. The branding was lastminute.com branding. Apple's iPhone was released at the end of June 2007. Soon after, Coolgorilla released an online all-in-one version of their Talking Phrasebooks for iPhone (Web App). The Phrasebooks were made available online in the form of a web app as iPhone did not yet allow for the download of additional apps. The app provided both text and audio translations in French, Spanish, Portuguese, Italian, German, and Greek. The iPhone translated the phrases using the recordings of real, native voice-over artists. A text translation on screen was also displayed. Apple's App Store opened in July 2008 with approximately 500 native apps available. Four of these Apps were Coolgorilla's Talking Phrasebooks for iPhone (Native Apps). There was French, German, Italian, and Spanish. These Apps carried lastminute.com branding and were available for free download. In the first three weeks following their release, the phrasebooks had over 350,000 downloads. Subsequently, Dutch, Arabic, Mandarin and Cantonese were also released. In October 2008, Coolgorilla released an iPhone London Travel Guide. Coolgorilla featured on NBC News in August 2009. In 2010, FIAT used the Italian Phrasebook to help promote the release of their FIAT 500 in the US. There has been no further activity since.

    Read more →