AI Data Jobs

AI Data Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Diagnostically acceptable irreversible compression

    Diagnostically acceptable irreversible compression

    Diagnostically acceptable irreversible compression (DAIC) is the amount of lossy compression which can be used on a medical image to produce a result that does not prevent the reader from using the image to make a medical diagnosis. The term was first introduced at a workshop on irreversible compression convened by the European Society of Radiology (ESR) in Palma de Mallorca October 13, 2010, the results of which were reported in a subsequent position paper. == Determination == The "amount of compression" in irreversible compression used to be determined by the compression ratio, where the acceptable minimum is determined by the algorithm (typically JPEG or J2K) and the data type (body part and imaging method). Such a definition is easy to follow, and has been used by medical bodies in 2010 around the world. However, its downside is obvious: the compression ratio tells nothing about the real quality of the image, as different compressors can produce vastly different qualities under the same file size. For example, the JPEG format of 1992 can perform as well as many modern formats given newer techniques exploited in mozjpeg and ISO libjpeg, yet they would be lumped together with the legacy encoders in such a scheme. The image compression community has long used objective quality metrics like SSIM to measure the effects of compression. In the absence of good data regarding SSIM, the ESR review of 2010 concluded that it is still difficult to establish a criterion for whether a particular irreversible compression scheme applied with particular parameters to a particular individual image, or category of images, avoids the introduction of some quantifiable risk of a diagnostic error for any particular diagnostic task. A 2017 study showed that a SSIM variant called 4-G-r (4-component, gradient, structural component of SSIM) best reflects changes in images that affect the decision of radiologists out of 16 SSIM variants. A 2020 study shows that visual information fidelity (VIF), feature similarity index (FSIM), and noise quality metric (NQM) best reflect radiologist preferences out of ten metrics. It also mentions that the original version of SSIM works as poorly as a basic root-mean-square distance (RMSD) for this purpose, a result echoed by the 2017 study. The 4-G-r modification is not tested in the study.

    Read more →
  • Seam carving

    Seam carving

    Seam carving (or liquid rescaling) is an algorithm for content-aware image resizing, developed by Shai Avidan, of Mitsubishi Electric Research Laboratories (MERL), and Ariel Shamir, of the Interdisciplinary Center and MERL. It functions by establishing a number of seams (paths of least importance) in an image and automatically removes seams to reduce image size or inserts seams to extend it. Seam carving also allows manually defining areas in which pixels may not be modified, and features the ability to remove whole objects from photographs. The purpose of the algorithm is image retargeting, which is the problem of displaying images without distortion on media of various sizes (cell phones, projection screens) using document standards, like HTML, that already support dynamic changes in page layout and text but not images. Image Retargeting was invented by Vidya Setlur, Saeko Takage, Ramesh Raskar, Michael Gleicher and Bruce Gooch in 2005. The work by Setlur et al. won the 10-year impact award in 2015. == Seams == Seams can be either vertical or horizontal. A vertical seam is a path of pixels connected from top to bottom in an image with one pixel in each row. A horizontal seam is similar with the exception of the connection being from left to right. The importance/energy function values a pixel by measuring its contrast with its neighbor pixels. == Process == The below example describes the process of seam carving: The seams to remove depends only on the dimension (height or width) one wants to shrink. It is also possible to invert step 4 so the algorithm enlarges in one dimension by copying a low energy seam and averaging its pixels with its neighbors. === Computing seams === Computing a seam consists of finding a path of minimum energy cost from one end of the image to another. This can be done via Dijkstra's algorithm, dynamic programming, greedy algorithm or graph cuts among others. ==== Dynamic programming ==== Dynamic programming is a programming method that stores the results of sub-calculations in order to simplify calculating a more complex result. Dynamic programming can be used to compute seams. If attempting to compute a vertical seam (path) of lowest energy, for each pixel in a row we compute the energy of the current pixel plus the energy of one of the three possible pixels above it. The images below depict a DP process to compute one optimal seam. Each square represents a pixel, with the top-left value in red representing the energy value of that pixel. The value in black represents the cumulative sum of energies leading up to and including that pixel. The energy calculation is trivially parallelized for simple functions. The calculation of the DP array can also be parallelized with some interprocess communication. However, the problem of making multiple seams at the same time is harder for two reasons: the energy needs to be regenerated for each removal for correctness and simply tracing back multiple seams can form overlaps. Avidan 2007 computes all seams by removing each seam iteratively and storing an "index map" to record all the seams generated. The map holds a "nth seam" number for each pixel on the image, and can be used later for size adjustment. If one ignores both issues however, a greedy approximation for parallel seam carving is possible. To do so, one starts with the minimum-energy pixel at one end, and keep choosing the minimum energy path to the other end. The used pixels are marked so that they are not picked again. Local seams can also be computed for smaller parts of the image in parallel for a good approximation. == Issues == The algorithm may need user-provided information to reduce errors. This can consist of painting the regions which are to be preserved. With human faces it is possible to use face detection. Sometimes the algorithm, by removing a low energy seam, may end up inadvertently creating a seam of higher energy. The solution to this is to simulate a removal of a seam, and then check the energy delta to see if the energy increases (forward energy). If it does, prefer other seams instead. == Implementations == Adobe Systems acquired a non-exclusive license to seam carving technology from MERL, and implemented it as a feature in Photoshop CS4, where it is called Content Aware Scaling. As the license is non-exclusive, other popular computer graphics applications (e. g. GIMP, digiKam, and ImageMagick) as well as some stand-alone programs (e. g. iResizer) also have implementations of this technique, some of which are released as free and open source software. There also exists an implementation for webpages. == Improvements and extensions == Better energy function and application to video by introducing 2D (time+1D) seams. Faster implementation on GPU. Application of this forward energy function to static images. Multi-operator: Combine with cropping and scaling. Much faster removal of multiple seams. Removing seams through neural deformation fields to extend to continuous domains like 3D scenes. A 2010 review of eight image retargeting methods found that seam carving produced output that was ranked among the worst of the tested algorithms. It was, however, a part of one of the highest-ranking algorithms: the multi-operator extension mentioned above (combined with cropping and scaling).

    Read more →
  • Microsoft To Do

    Microsoft To Do

    Microsoft To Do (previously styled as Microsoft To-Do) is a cloud-based task management application. It allows users to manage their tasks from a smartphone, tablet and computer. The technology is produced by the team behind Wunderlist, which was acquired by Microsoft, and the stand-alone apps feed into the existing Tasks feature of the Outlook product range. == History == Microsoft To Do was first launched as a preview with basic features in April 2017. Later more features were added including Task list sharing in June 2018. In September 2019, a major update to the app was unveiled, adopting a new user interface with a closer resemblance to Wunderlist. The name was also slightly updated by removing the hyphen from To-Do. In May 2020, Microsoft officially closed the doors on Wunderlist, ending its active service in favor of improving and expanding Microsoft To Do.

    Read more →
  • AppyStore

    AppyStore

    AppyStore is a comprehensive learning videos and games app for kids up to the age of 8 years. The platform developed by Mauj Mobile, a mobile value-added services (VAS) provider curates content to help in child development by leveraging technology. Mauj is funded by Sequoia Capital, Westbridge Capital and Intel Capital. == Background == AppyStore was launched in 2014 as a platform providing content for kids between the ages of 1.5 and 6 years. AppyStore subsequently extended its services for kids up to 8 years of age. The company operates on a subscription-based model and claims to have 5,000 learning games and videos segregated in 18 learning areas developed to help children gain optimal skills and qualities. According to an article published in Business Standard, the application is claimed to be one of the top 5 apps that help to enhance the logical and imaginative capabilities of children. AppyStore was awarded the Best app for kids by Google Play in December 2017. == Service == The company provides content via a website and an Android app. The website and android app provide learning games, rhymes, phonics, reading, stories, science, numbers, maths, logic videos comprising puzzles, worksheets, videos and fun activities and the premium subscription also includes physical worksheets which are home delivered. This content is educational and has been handpicked by teachers and experts with an understanding of the major areas of child development milestones for children up to 8 years of age. The mobile application also allows parents to track the progress of their child on the basis of the number of videos viewed.

    Read more →
  • JDoodle

    JDoodle

    JDoodle is a cloud-based online integrated development environment and compiler platform that supports execution of source code in 70+ programming languages including Java, Python, C/C++, PHP, Ruby, Perl, HTML, and more. It provides zero‑setup code for compilation, execution, and sharing via a web browser interface. == Features == Provides real‑time collaboration and code embedding via shareable URLs and APIs Offers an integrated terminal interface supporting database engines such as MySQL and MongoDB. JDroid — AI‑assistant to generate code snippets, optimize code, and assist debugging. == Languages and frameworks supported ==

    Read more →
  • Word error rate

    Word error rate

    Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system. The WER metric typically ranges from 0 to 1, where 0 indicates that the compared pieces of text are exactly identical, and 1 (or larger) indicates that they are completely different with no similarity. This way, a WER of 0.8 means that there is an 80% error rate for compared sentences. The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort. This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity and word error rate. Word error rate can then be computed as: W E R = S + D + I N = S + D + I S + D + C {\displaystyle {\mathit {WER}}={\frac {S+D+I}{N}}={\frac {S+D+I}{S+D+C}}} where S is the number of substitutions, D is the number of deletions, I is the number of insertions, C is the number of correct words, N is the number of words in the reference (N=S+D+C) The intuition behind 'deletion' and 'insertion' is how to get from the reference to the hypothesis. So if we have the reference "This is wikipedia" and hypothesis "This _ wikipedia", we call it a deletion. Note that since N is the number of words in the reference, the word error rate can be larger than 1.0, namely if the number of insertions I is larger than the number of correct words C. When reporting the performance of a speech recognition system, sometimes word accuracy (WAcc) is used instead: W A c c = 1 − W E R = N − S − D − I N = C − I N {\displaystyle {\mathit {WAcc}}=1-{\mathit {WER}}={\frac {N-S-D-I}{N}}={\frac {C-I}{N}}} Since the WER can be larger than 1.0, the word accuracy can be smaller than 0.0. == Experiments == It is commonly believed that a lower word error rate shows superior accuracy in recognition of speech, compared with a higher word error rate. However, at least one study has shown that this may not be true. In a Microsoft Research experiment, it was shown that, if people were trained under "that matches the optimization objective for understanding", (Wang, Acero and Chelba, 2003) they would show a higher accuracy in understanding of language than other people who demonstrated a lower word error rate, showing that true understanding of spoken language relies on more than just high word recognition accuracy. == Other metrics == One problem with using a generic formula such as the one above, however, is that no account is taken of the effect that different types of error may have on the likelihood of successful outcome, e.g. some errors may be more disruptive than others and some may be corrected more easily than others. These factors are likely to be specific to the syntax being tested. A further problem is that, even with the best alignment, the formula cannot distinguish a substitution error from a combined deletion plus insertion error. Hunt (1990) has proposed the use of a weighted measure of performance accuracy where errors of substitution are weighted at unity but errors of deletion and insertion are both weighted only at 0.5, thus: W E R = S + 0.5 D + 0.5 I N {\displaystyle {\mathit {WER}}={\frac {S+0.5D+0.5I}{N}}} There is some debate, however, as to whether Hunt's formula may properly be used to assess the performance of a single system, as it was developed as a means of comparing more fairly competing candidate systems. A further complication is added by whether a given syntax allows for error correction and, if it does, how easy that process is for the user. There is thus some merit to the argument that performance metrics should be developed to suit the particular system being measured. Whichever metric is used, however, one major theoretical problem in assessing the performance of a system is deciding whether a word has been “mis-pronounced,” i.e. does the fault lie with the user or with the recogniser. This may be particularly relevant in a system which is designed to cope with non-native speakers of a given language or with strong regional accents. The pace at which words should be spoken during the measurement process is also a source of variability between subjects, as is the need for subjects to rest or take a breath. All such factors may need to be controlled in some way. For text dictation it is generally agreed that performance accuracy at a rate below 95% is not acceptable, but this again may be syntax and/or domain specific, e.g. whether there is time pressure on users to complete the task, whether there are alternative methods of completion, and so on. The term "Single Word Error Rate" is sometimes referred to as the percentage of incorrect recognitions for each different word in the system vocabulary. == Edit distance == The word error rate may also be referred to as the length normalized edit distance. The normalized edit distance between X and Y, d( X, Y ) is defined as the minimum of W( P ) / L ( P ), where P is an editing path between X and Y, W ( P ) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P).

    Read more →
  • Image translation

    Image translation

    Image translation is the machine translation of images of printed text (posters, banners, menus, screenshots etc.). This is done by applying optical character recognition (OCR) technology to an image to extract any text contained in the image, and then have this text translated into a language of their choice, and the applying digital image processing on the original image to get the translated image with a new language. == General == Machine translation made available on the internet (web and mobile) is a notable advance in multilingual communication eliminating the need for an intermediary translator/interpreter, translating foreign texts still poses a problem to the user as they cannot be expected to be able to type the foreign text they wish to translate and understand. Manually entering the foreign text may prove to be a difficulty especially in cases where an unfamiliar alphabet is used from a script which user can't read, e.g. Cyrillic, Chinese, Japanese etc. for an English speaker or any speaker of a Latin-based language or vice versa. The technical advancements in OCR made it possible to recognize text from images. The possibility to use one's mobile device's camera to capture and extract printed text is also known as mobile OCR and was first introduced in Japanese manufactured mobile telephones in 2004. Using the handheld's camera one could take a picture of (a line of) text and have it extracted (digitalized) for further manipulation such as storing the information in their contacts list, as a web page address (URL) or text to use in an SMS/email message etc. Presently, mobile devices having a camera resolution of 2 megapixels or above with an auto-focus ability, often feature the text scanner service. Taking the text scanning facility one step further, image translation emerged, giving users the ability to capture text with their mobile phone's camera, extract the text, and have it translated in their own language. More and more applications emerged on this technology including Word Lens. After getting acquired by Google, it was made a part of Google Translate mobile app. Another simultaneous advancement in Image Processing, has also made it possible now to replace the text on the image with the translated text and create a new image altogether. == History == The development of the image translation service springs from the advances in OCR technology (miniaturization and reduction of memory resources consumed) enabling text scanning on mobile telephones. Among the first to announce mobile software capable of “reading” text using the mobile device's camera is International Wireless Inc. who in February 2003 released their “CheckPoint” and “WebPoint” applications. “CheckPoint” reads critical symbolic information on checks and is aimed at reducing losses that mobile merchants suffer from “bounced” checks by scanning the MICR number on the bottom of a check, while “WebPoint” enables the visual recognition and decoding of printed URL's, which are then opened by the device's web browser. The first commercial release of a mobile text scanner, however, took place in December 2004 when Vodafone and Sharp began selling the 902SH mobile which was the first to feature a 2 megapixel digital camera with optical zoom. Among the device's various multimedia features was the built-in text/bar code/QR code scanner. The text scanner function could handle up to 60 alphabetical characters simultaneously. The scanned text could be then sent as an email or SMS message, added as a dictionary entry or, in the case of scanned URLs, opened via the device's web browser. All subsequent Sharp mobiles feature the text scanner functionality. In September 2005, NEC Corporation and the Nara Institute of Science and Technology in Japan (NAIST) announced new software capable of transforming cameraphones into text scanners. The application differs substantially from similarly equipped mobile telephones in Japan (able to scan businesscards and small bits of text and use OCR to convert that to editable text or to URL addresses) by it ability to scan a whole page. The two companies, however, said they would not release the software commercially before the end of 2008. Combining the text scanner function with machine translation technology was first made by US company RantNetwork who in July 2007 started selling the Communilator, a machine translation application for mobile devices featuring the Image Translation functionality. Using the built-in camera, the mobile user could take a picture of some printed text, apply OCR to recognize the text and then translate it into any one of over 25 language available. In April 2008 Nokia showcased their Shoot-to-Translate application for the N73 model which is capable of taking a picture using the device's camera, extracting the text and then translating it. The application only offers Chinese to English translation, and does not handle large segments of text. Nokia said they are in the process of developing their Multiscanner product which, besides scanning text and business cards, would be able to translate between 52 languages. Again in April 2008, Korean company Unichal Inc. released their handheld Dixau text scanner capable of scanning and recognizing English text and then translating it into Korean using online translation tools such as Wikipedia or Google Translate. The device is connected to a PC or a laptop via the USB port. In February 2009, Bulgarian company Interlecta presented at the Mobile World Congress in Barcelona their mobile translator including image recognition and speech synthesis. The application handles all European languages along with Chinese, Japanese and Korean. The software connects to a server over the Internet to accomplish the image recognition and the translation. In May 2014, Google acquired Word Lens to improve the quality of visual and voice translation. It is able to scan text or picture with one's device and have it translated instantly. Since the OCR has been improving many companies or website started combining OCR and translation, to read the text from an image and show the translated text. In August 2018, an Indian company created ImageTranslate. It is able to read, translate and re-create the image in another language. As of late 2018, the tool added 13 new languages, including Arabic, Thai, Vietnamese, Hindi, and Bengali, significantly increasing its utility in Asia and the Middle East. This helps users translate photos already stored in their phone's gallery, not just live, real-time views. Currently, image translation is offered by the following companies: Google Translate app with camera ImageTranslate Yandex

    Read more →
  • Quantum robotics

    Quantum robotics

    Quantum robotics is an interdisciplinary field that investigates the intersection of robotics and quantum mechanics. This field, in particular, explores the applications of quantum phenomena such as quantum entanglement within the realm of robotics. Examples of its applications include quantum communication in multi-agent cooperative robotic scenarios, the use of quantum algorithms in performing robotics tasks, and the integration of quantum devices (e.g., quantum detectors) in robotic systems. == Introduction == The free-space quantum communication between mobile platforms was proposed for reconfigurable quantum key distribution (QKD) applications using unmanned aerial vehicle (UAVs, a.k.a. drones) in 2017. This technology was later advanced in various aspects in mobile drone and vehicle platforms in several configurations such as drone-to-drone, drone-to-moving vehicle, and vehicle-to-vehicle systems. Some research has contributed to low-size, low-weight, and low-power quantum key distribution systems for small-form UAVs, the characterization of a polarization-based receiver for mobile free-space optical QKD, and optical-relayed entanglement distribution using drones as mobile nodes. The topic of free-space quantum communication between mobile platforms, initially developed to meet the need for free-space QKD and entanglement distribution using mobile nodes, was brought into the robotics domain as an emerging interdisciplinary mechatronics topic to investigate the interface between quantum technologies and the robotic systems domain. The main advantage of such integrated technology is the guaranteed security in communication between multi-agent and cooperative autonomous systems. Other advances are anticipated. == Quantum entanglement == According to quantum mechanics, entanglement occurs when more than one particle become connected. If the state of one particle changes then it will instantly change the state of other particles regardless of their distance. Entangled sensors do the same kind of work and achieve strong sensitivity. A group of quantum robots can measure magnetic fields, gravitational fields and other physical properties using entangled sensors with high rate of accuracy. Again the connection of one robot to other is increased (become strong) by quantum entanglement. == Quantum teleportation == Quantum teleportation is the transfer of quantum information (not physical objects). This is used in case of multi robot process. One robot is programmed with a complex quantum update. Then that robot can teleport that complex quantum information (the update) to other robots. This teleportation or communication is very secure because all the work is done in quantum state. == Kinematics == Quantum computing has been proposed as being optimal for calculating inverse kinematics values. == Alice and Bob robots == In the realm of quantum mechanics, the names Alice and Bob are frequently employed to illustrate various phenomena, protocols, and applications. These include their roles in QKD, quantum cryptography, entanglement, and teleportation. The terms "Alice Robot" and "Bob Robot" serve as analogous expressions that merge the concepts of Alice and Bob from quantum mechanics with mechatronic mobile platforms (such as robots, drones, and autonomous vehicles). For example, the Alice Robot functions as a transmitter platform that communicates with the Bob Robot, housing the receiving detectors.

    Read more →
  • Ernie Bot

    Ernie Bot

    Ernie Bot (Chinese: 文心一言, Pinyin: wénxīn yīyán), full name Enhanced Representation through Knowledge Integration, is an artificial intelligence chatbot developed by the Chinese technology company Baidu. Ernie Bot rivals GPT models in Chinese NLP tasks. It is built on the company's ERNIE series of large language models, which have been in development since 2019. The service was first launched for invited testing on March 16, 2023, and was released to the general public on August 31, 2023, after receiving approval from Chinese regulators. Since its public launch, Ernie Bot has undergone several updates, with newer versions like ERNIE 4.0 and 4.5 released to improve its capabilities. The service has seen rapid user adoption, reportedly reaching over 200 million users by April 2024. It has been integrated into various products, notably powering AI features for the Chinese release of Samsung's Galaxy S24 smartphones. As a product operating in China, Ernie Bot is subject to the country's censorship regulations. It has been observed to refuse answers to politically sensitive questions, such as those regarding CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, and other topics deemed taboo by the government. == History == Ernie Bot was initially released for invited testing on March 16, 2023. The live release demo was reported to have been prerecorded, which caused Baidu's stock to drop 10 percent on the day of the launch. The company's stock gained 14 percent the following day after analysts from Citigroup and Bank of America tested Ernie Bot and gave it positive preliminary reviews. On August 31, 2023, Ernie Bot was released to the public after receiving approval from Chinese regulatory authorities. By December 2023, Baidu announced the service had surpassed 100 million users. In January 2024, Hong Kong newspaper South China Morning Post reported that a university research lab linked to the People's Liberation Army (PLA) had tested Ernie Bot for military response scenarios. Baidu denied the allegations, stating it had no connection with the academic paper. That same month, Ernie was integrated into Samsung's Galaxy S24 lineup for its launch in China. The user base reportedly grew to 200 million by April 2024 and 300 million by June 2024. In September 2024, Baidu changed the chatbot's Chinese name from "Wenxin Yiyan" (文心一言) to "Wenxiaoyan" (文小言) to position it as a search assistant. On March 16, 2025, Baidu announced version 4.5 and the reasoning model ERNIE X1. The following month, at the Create2025 Baidu AI Developer Conference, the company released the Wenxin 4.5 Turbo and Wenxin X1 Turbo models, designed to be faster and less expensive to operate. == Development == Ernie Bot is based on Baidu's ERNIE (Enhanced Representation through Knowledge Integration) series of foundation models. The general training process begins with pre-training on large datasets, followed by refinement using techniques like supervised fine-tuning, reinforcement learning with human feedback, and prompt engineering. === Foundation models === ==== Ernie 3.0 ==== The model powering the initial launch of Ernie Bot. It was trained with 10 billion parameters on a 4-terabyte corpus consisting of plain text and a large-scale knowledge graph. ==== Ernie 3.5 ==== Released in June 2023. At the time of release, its performance was reported as "slightly inferior" to OpenAI's GPT-4. ==== Ernie 4.0 ==== Unveiled in October 2023 and released to paying subscribers in November. According to Baidu, this version featured improved performance over its predecessor, with information updated to April 2023. ==== Ernie X1 ==== Announced in March 2025, with Ernie X1 positioned as a specialized reasoning model. Baidu stated that performance improvements were achieved through new technologies such as "FlashMask" dynamic attention masking and a heterogeneous multimodal mixture-of-experts architecture. === Turbo Models === In June 2024, Baidu announced Ernie 4.0 Turbo. In April 2025, Ernie 4.5 Turbo and X1 Turbo were released. These models are optimized for faster response times and lower operational costs. == Service == In its subscription options, the professional plan gives users access to Ernie 4.0 with a payment either for a month or with reduced payment for auto-renewal per month. Meanwhile, Ernie 3.5 is free of charge. Ernie 4.0, the language model for Ernie bot, has information updated to April 2023. == Censorship == Ernie Bot is subject to the Chinese government's censorship regime. In public tests with journalists, Ernie Bot refused to answer questions about CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, the persecution of Uyghurs in China in Xinjiang, and the 2019–2020 Hong Kong protests. When queried about the origin of SARS-CoV-2, Ernie Bot stated that it originated among American vape users.

    Read more →
  • Macromedia FreeHand

    Macromedia FreeHand

    Macromedia FreeHand (formerly Aldus FreeHand) is a discontinued computer application for creating two-dimensional vector graphics oriented primarily to professional illustration, desktop publishing and content creation for the Web. FreeHand was similar in scope, intended market, and functionality to Adobe Illustrator, CorelDRAW and Xara Designer Pro. Because of FreeHand's dedicated page layout and text control features, it also compares to Adobe InDesign and QuarkXPress. Professions using FreeHand include graphic design, illustration, cartography, fashion and textile design, product design, architects, scientific research, and multimedia production. FreeHand was created by Altsys Corporation in 1988 and licensed to Aldus Corporation, which released versions 1 through 4. In 1994, Aldus merged with Adobe Systems and because of the overlapping market with Adobe Illustrator, FreeHand was returned to Altsys by order of the Federal Trade Commission. Altsys was later bought by Macromedia, which released FreeHand versions 5 through 11 (FreeHand MX). In 2005, Adobe Systems acquired Macromedia and its product line which included FreeHand MX, under whose ownership it presently resides. Since 2003, FreeHand development has been discontinued; in the Adobe Systems catalog, FreeHand has been replaced by Adobe Illustrator. FreeHand MX continues to run under Windows 11 and under Mac OS X 10.6 (Snow Leopard) within Rosetta, a PowerPC code emulator, and requires a registration patch supplied by Adobe. FreeHand 10 runs without problems on Mac OS X Snow Leopard with Rosetta enabled, and does not require a registration patch. Later versions of macOS can use a Mac OS X Snow Leopard Server virtual machine to emulate the required PowerPC support. == History == === Altsys and Aldus FreeHand === In 1984, James R. Von Ehr founded Altsys Corporation to develop graphics applications for personal computers. Based in Plano, Texas, the company initially produced font editing and conversion software; Fontastic Plus, Metamorphosis, and the Art Importer. Their premier PostScript font-design package, Fontographer, was released in 1986 and was the first such program on the market. With the PostScript background having been established by Fontographer, Altsys also developed FreeHand (originally called Masterpiece) as a Macintosh Postscript-based illustration program that used Bézier curves for drawing and was similar to Adobe Illustrator. FreeHand was announced as "... a Macintosh graphics program described as having all the features of Adobe's Illustrator plus drawing tools such as those in Mac Paint and Mac Draft and special effects similar to those in Cricket Draw." Seattle's Aldus Corporation acquired a licensing agreement with Altsys Corporation to release FreeHand along with their flagship product, Pagemaker, and Aldus FreeHand 1.0 was released in 1988. FreeHand's product name used intercaps; the F and H were capitalized. The partnership between the two companies continued with Altsys developing FreeHand and with Aldus controlling marketing and sales. After 1988, a competitive exchange between Aldus FreeHand and Adobe Illustrator ensued on the Macintosh platform with each software advancing new tools, achieving better speed, and matching significant features. Windows PC development also allowed Illustrator 2 (aka, Illustrator 88 on the Mac) and FreeHand 3 to release Windows versions to the graphics market. FreeHand 1.0 sold for $495 in 1988. It included the standard drawing tools and features as other draw programs including special effects in fills and screens, text manipulation tools, and full support for CMYK color printing. It was also possible to create and insert PostScript routines anywhere within the program. FreeHand performed in preview mode instead of keyline mode but performance was slower. FreeHand 2.0 sold for $495 in 1989. Besides improving on the features of FreeHand 1.0, FreeHand 2 added faster operation, Pantone colors, stroked text, flexible fill patterns and automatically import graphic assets from other programs. It added accurate control over a color monitor screen display, limited only by its resolution. FreeHand 3.0 sold for $595 in 1991. New features included resizable color, style, and layer panels including an Attributes menu. Also tighter precision of both the existing tools and aligning of objects. FH3 created compound Paths. Text could be converted to paths, applied to an ellipse, or made vertical. Carried over from version 1.0, FreeHand 3 suffered by having text entered into a dialog box instead of directly to the page. In October 1991, a 3.1 upgrade made FreeHand work with System 7 but additionally, it supported pressure-sensitive drawing which offered varying line widths with a users stroke. It improved element manipulation and added more import/export options. FreeHand 4.0 sold for $595 in 1994. Altsys ported FreeHand 3.0 to the NeXT system creating a new program named Virtuoso. Virtuoso continued its development at Altsys and version 2.0 of Virtuoso was feature-equivalent to FreeHand 4 (with the addition of NeXT-specific features such as Services and Display PostScript) and file compatible, with Virtuoso 2 able to open FreeHand 4 files and vice versa. A prominent feature of this version was the ability to type directly into the page and wrap inside or outside any shape. It also included drag-and-drop color imaging, a larger pasteboard, and a user interface that featured floating, rollup panels. The colors palette included a color mixer for adding new colors to the swatch list. Speed increases were made. In the same year of FreeHand 4 release, Adobe Systems announced merger plans with Aldus Corporation for $525 million. Fear about the end of competition between these two leading applications was reported in the media and expressed by customers (Illustrator versus FreeHand and Adobe Photoshop versus Aldus PhotoStyler.) Because of this overlapping of the market, Altsys stepped in by suing Aldus, saying that the merger deal was "a prima facie violation of a non-compete clause within the FreeHand licensing agreement." Altsys CEO Jim Von Ehr explained, "No one loves FreeHand more than we do. We will do whatever it takes to see it survive." The Federal Trade Commission issued a complaint against Adobe Systems on October 18, 1994, ordering a divestiture of FreeHand to "remedy the lessening of competition resulting from the acquisition as alleged in the Commission's complaint," and further, the FTC ordering, "That for a period of ten (10) years from the date on which this order becomes final, respondents shall not, without the prior approval of the Commission, directly or indirectly, through subsidiaries, partnerships, or otherwise .. Acquire any Professional Illustration Software or acquire or enter into any exclusive license to Professional Illustration Software;" (referring to FreeHand.) FreeHand was returned to Altsys with all licensing and marketing rights as well as Aldus FreeHand's customer list. === Macromedia Freehand === By late 1994, Altsys still retained all rights to FreeHand. Despite brief plans to keep it in-house to sell it along with Fontographer and Virtuoso, Altsys reached an agreement with the multimedia software company, Macromedia, to be acquired. This mutual agreement provided FreeHand and Fontographer a new home with ample resources for marketing, sales, and competition against the newly merged Adobe-Aldus company. Altsys would remain in Richardson, Texas, but would be renamed as the Digital Arts Group of Macromedia and was responsible for the continued development of FreeHand. Macromedia received FreeHand's 200,000 customers and expanded its traditional product line of multimedia graphics software to illustration and design graphics software. CEO James Von Ehr became a Macromedia vice-president until 1997 when he left to start another venture. FreeHand 5.0 sold for $595 in 1995. This version featured a more customizable and expanded workspace, multiple views, stronger design and editing tools, a report generator, spell check, paragraph styles, multicolor gradient fills up to 64 colors, speed improvements, and it accepted Illustrator plugins. In September 1995, a 5.5 upgrade added Photoshop plug-in support, PDF import capabilities, the Extract feature, inline graphics to text, improved auto-expanding text containers, the Crop feature, and the Create PICT Image feature. A FreeHand 5.5 upgrade was part of the FreeHand Graphics Studio (a suite that included Fontographer, Macromedia xRes image editing application, and Extreme 3D animation and modeling application). FreeHand 6.0 in 1996. This version only existed in beta. Some Freehand 7 prerelease versions were released under the Freehand 6 tag. FreeHand 7.0 sold for $399 in 1996, or $449 as part of the FreeHand Graphics Studio (see above.) Features included a redesigned user interface that allowed recombining Inspectors, Panel Tabs, Dockable Panels, Smart Cursors,

    Read more →
  • Cepstral mean and variance normalization

    Cepstral mean and variance normalization

    Cepstral mean and variance normalization (CMVN) is a computationally efficient normalization technique for robust speech recognition. The performance of CMVN is known to degrade for short utterances. This is due to insufficient data for parameter estimation and loss of discriminable information as all utterances are forced to have zero mean and unit variance. CMVN minimizes distortion by noise contamination for robust feature extraction by linearly transforming the cepstral coefficients to have the same segmental statistics. Cepstral Normalization has been effective in the CMU Sphinx for maintaining a high level of recognition accuracy over a wide variety of acoustical environments. == Cepstral Normalization Techniques == There are multiple algorithms that achieve Cepstral Normalization in different ways. === Fixed codeword-dependent cepstral normalization (FCDCN) === FCDCN was developed to provide a form of compensation that provides greater recognition accuracy than SDCN but in a more computationally-efficient manner than the CDCN algorithm. The FCDCN algorithm applies an additive correction that depends on the instantaneous SNR of the input (like SDCN), but that can also vary from codeword to codeword (like CDCN). === Multiple Fixed Codeword-dependent Cepstral Normalization (MFCDCN) === MFCDCN is a simple extension of FCDCN algorithm that does not need environment specific training. In MFCDCN, compensation vectors are pre-computed in parallel for a set of target environments, using the FCDCN algorithm. === Incremental Multiple Fixed Codeword-dependent Cepstral Normalization (IMFCDCN) === While environment selection for the compensation vectors of MFCDCN is generally performed on an utterance-by-utterance basis, IMFCFCN improves on it by allowing the classification process to make use of cepstral vectors from previous utterances in a given session. == Cepstral Noise Subtraction == Automatic speech recognition (ASR) describes the steps of transcribing speech utterances represented as acoustic wave forms to written words. As is, CMVN has been used in different applications as this technique has proven to provide better speech recognitions results in different environments. CMVN has the capabilities to reduce differences between test and training data produced by channel distortions and colorizations . CMVN has also been found to be able to reduce differences in feature representation between speakers can also partly reduce the influence of background noise.

    Read more →
  • Rapid prototyping

    Rapid prototyping

    Rapid prototyping is a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer aided design (CAD) data. Construction of the part or assembly is usually done using 3D printing or "additive layer manufacturing" technology. The first methods for rapid prototyping became available in mid 1987 and were used to produce models and prototype parts. Today, they are used for a wide range of applications and are used to manufacture production-quality parts in relatively small numbers if desired without the typical unfavorable short-run economics. This economy has encouraged online service bureaus. Historical surveys of RP technology start with discussions of simulacra production techniques used by 19th-century sculptors. Some modern sculptors use the progeny technology to produce exhibitions and various objects. The ability to reproduce designs from a dataset has given rise to issues of rights, as it is now possible to interpolate volumetric data from 2D images. As with CNC subtractive methods, the computer-aided-design – computer-aided manufacturing CAD -CAM workflow in the traditional rapid prototyping process starts with the creation of geometric data, either as a 3D solid using a CAD workstation, or 2D slices using a scanning device. For rapid prototyping this data must represent a valid geometric model; namely, one whose boundary surfaces enclose a finite volume, contain no holes exposing the interior, and do not fold back on themselves. In other words, the object must have an "inside". The model is valid if for each point in 3D space the computer can determine uniquely whether that point lies inside, on, or outside the boundary surface of the model. CAD post-processors will approximate the application vendors' internal CAD geometric forms (e.g., B-splines) with a simplified mathematical form, which in turn is expressed in a specified data format which is a common feature in additive manufacturing: STL file format, a de facto standard for transferring solid geometric models to SFF machines. To obtain the necessary motion control trajectories to drive the actual SFF, rapid prototyping, 3D printing or additive manufacturing mechanism, the prepared geometric model is typically sliced into layers, and the slices are scanned into lines (producing a "2D drawing" used to generate trajectory as in CNC's toolpath), mimicking in reverse the layer-to-layer physical building process. == Application areas == Rapid prototyping is also commonly applied in software engineering to try out new business models and application architectures such as Aerospace, Automotive, Financial Services, Product development, and Healthcare. Aerospace design and industrial teams rely on prototyping in order to create new AM methodologies in the industry. Using SLA they can quickly make multiple versions of their projects in a few days and begin testing quicker. Rapid Prototyping allows designers/developers to provide an accurate idea of how the finished product will turn out before putting too much time and money into the prototype. 3D printing being used for Rapid Prototyping allows for Industrial 3D printing to take place. With this, you could have large-scale moulds to spare parts being pumped out quickly within a short period of time. == Types of Rapid Prototyping == Stereolithography (SLA) → a laser-cured photopolymer for materials such as thermoplastic-like photopolymers. Selective Laser Sintering (SLS) → a laser-sintered powder for materials such as Nylon or TPU. Direct Metal Laser Sintering (DMLS) → laser-sintered metal powder for materials like stainless steel, titanium, chrome, and aluminum. Fused Deposition Modeling (FDM) → fused extrusions of filaments like ABS, PC, and PPCU. Multi Jet Fusion (MJF) → it is an inkjet array selective fusing across bed of nylon powder for Black Nylon 12. PolyJet (PJET) → it is a uv-cured jetted photopolymer to work with acrylic-based and elastomeric photopolymers. Computer Numerical Controlled Machine (CNC) → it is used for manipulating engineering-grade thermoplastics and metals. Injection Molding (IM) → the injection is done using aluminum molds and it is used for thermoplastics, metals and liquid silicone rubber. Vacuum Casting→ is a manufacturing process used to create high-quality prototypes and small batches of parts. == History == In the 1970s, Joseph Henry Condon and others at Bell Labs developed the Unix Circuit Design System (UCDS), automating the laborious and error-prone task of manually converting drawings to fabricate circuit boards for the purposes of research and development. By the 1980s, U.S. policy makers and industrial managers were forced to take note that America's dominance in the field of machine tool manufacturing evaporated, in what was named the machine tool crisis. Numerous projects sought to counter these trends in the traditional CNC CAM area, which had begun in the US. Later when Rapid Prototyping Systems moved out of labs to be commercialized, it was recognized that developments were already international and U.S. rapid prototyping companies would not have the luxury of letting a lead slip away. The National Science Foundation was an umbrella for the National Aeronautics and Space Administration (NASA), the US Department of Energy, the US Department of Commerce NIST, the US Department of Defense, Defense Advanced Research Projects Agency (DARPA), and the Office of Naval Research coordinated studies to inform strategic planners in their deliberations. One such report was the 1997 Rapid Prototyping in Europe and Japan Panel Report in which Joseph J. Beaman founder of DTM Corporation [DTM RapidTool pictured] provides a historical perspective: The roots of rapid prototyping technology can be traced to practices in topography and photosculpture. Within TOPOGRAPHY Blanther (1892) suggested a layered method for making a mold for raised relief paper topographical maps .The process involved cutting the contour lines on a series of plates which were then stacked. Matsubara (1974) of Mitsubishi proposed a topographical process with a photo-hardening photopolymer resin to form thin layers stacked to make a casting mold. PHOTOSCULPTURE was a 19th-century technique to create exact three-dimensional replicas of objects. Most famously Francois Willeme (1860) placed 24 cameras in a circular array and simultaneously photographed an object. The silhouette of each photograph was then used to carve a replica. Morioka (1935, 1944) developed a hybrid photo sculpture and topographic process using structured light to photographically create contour lines of an object. The lines could then be developed into sheets and cut and stacked, or projected onto stock material for carving. The Munz (1956) Process reproduced a three-dimensional image of an object by selectively exposing, layer by layer, a photo emulsion on a lowering piston. After fixing, a solid transparent cylinder contains an image of the object. "The Origins of Rapid Prototyping - RP stems from the ever-growing CAD industry, more specifically, the solid modeling side of CAD. Before solid modeling was introduced in the late 1980's, three-dimensional models were created with wire frames and surfaces. But not until the development of true solid modeling could innovative processes such as RP be developed. Charles Hull, who helped found 3D Systems in 1986, developed the first RP process. This process, called stereolithography, builds objects by curing thin consecutive layers of certain ultraviolet light-sensitive liquid resins with a low-power laser. With the introduction of RP, CAD solid models could suddenly come to life". The technologies referred to as Solid Freeform Fabrication are what we recognize today as rapid prototyping, 3D printing or additive manufacturing: Swainson (1977), Schwerzel (1984) worked on polymerization of a photosensitive polymer at the intersection of two computer controlled laser beams. Ciraud (1972) considered magnetostatic or electrostatic deposition with electron beam, laser or plasma for sintered surface cladding. These were all proposed but it is unknown if working machines were built. Hideo Kodama of Nagoya Municipal Industrial Research Institute was the first to publish an account of a solid model fabricated using a photopolymer rapid prototyping system (1981). The first 3D rapid prototyping system relying on Fused Deposition Modeling (FDM) was made in April 1992 by Stratasys but the patent did not issue until June 9, 1992. Sanders Prototype, Inc introduced the first desktop inkjet 3D Printer (3DP) using an invention from August 4, 1992 (Helinski), Modelmaker 6Pro in late 1993 and then the larger industrial 3D printer, Modelmaker 2, in 1997. Z-Corp using the MIT 3DP powder binding for Direct Shell Casting (DSP) invented 1993 was introduced to the market in 1995. Even at that early date the technology was seen as having a place in manufacturing practice. A low resol

    Read more →
  • Adobe Encore

    Adobe Encore

    Adobe Encore (previously Adobe Encore DVD) was a DVD authoring software tool produced by Adobe Systems and targeted at professional video producers. Video and audio resources could be used in their current format for development, allowing the user to transcode them to MPEG-2 video and Dolby Digital audio upon project completion. DVD menus could be created and edited in Adobe Photoshop using special layering techniques. Adobe Encore did not support writing to a Blu-ray Disc using AVCHD 2.0. Encore is bundled with Adobe Premiere Pro CS6. Adobe Encore CS6 was the last release. While Premiere Pro CC has moved to the Creative Cloud, Encore has now been discontinued. == Licensing == All forms of Adobe Encore used a proprietary licensing system from its developer, Adobe Systems. Versions 1.0 and 1.5 required a separate license fee (rather than making 1.5 available as a free update). Version 3, also known as CS3, was sold only in bundle with Premiere CS3. Encore CS4, CS5, CS5.5 and CS6 were only sold in the Premiere Pro CS4, CS5, CS5.5 and CS6 bundles, respectively. Adobe CC subscribers no longer have access to Adobe Encore CS6. Adobe Encore is not included with Premiere Pro CC. == Functionality == Adobe Encore allowed for creating interactive DVD menus from Photoshop documents, which could be tweaked from within Encore. Video and audio streams could be embedded in the DVD and be made to play when certain elements of the menu are interacted with. It had similar functionality to Adobe Flash and Premiere Pro, due to its ability to both edit video on a timeline and embed interactive content.

    Read more →
  • Captions (app)

    Captions (app)

    Mirage (formerly known as Captions) is a video-generating, video-editing and AI research company headquartered in New York City. Their first app, Captions, is available on iOS, Android, and Web and offers a suite of tools aimed at streamlining the creation and editing of videos. Their enterprise platform, Mirage Studio, generates AI actors and videos for marketing assets and video campaigns. == History == Mirage was co-founded by Gaurav Misra and Dwight Churchill. During Misra's time leading design engineering at Snap Inc., he followed the rise of a new category of video, the "talking video." In 2021, Misra left Snap to found Mirage with his former colleague Churchill. Later that year, the Captions app launched with early backing from venture capital firms Sequoia Capital and Andreessen Horowitz as well as individual investors. In 2023, the company released Lipdub, an Al dubbing app which translates any video with spoken audio into 28 languages. In October 2023, Captions shared that it maintained over 100,000 daily active users with "about a million" videos being created monthly. In November 2024, Captions acquired AlpacaML, a generative AI company that focused on art and other images. In June 2025, Captions launched Mirage Studio, for marketers and advertising agencies. In September 2025, Captions rebranded their company to Mirage. This change reflects the company's focus on developing their proprietary foundation model and future video products. == Products == The Captions app offers features to automate common production tasks including captioning, editing, dubbing, script creation, and music integration. Mirage Studio allows users to generate AI avatars and create short-form videos from prompts or audio. == Awards == In 2023, the company was recognized as part of Fast Company's "Next Big Things In Tech" series. In 2024, the company won 2 Webby Awards for Best Use of AI & Machine Learning and Creative Production.

    Read more →
  • Parasolid

    Parasolid

    Parasolid is a geometric modeling kernel originally developed by Shape Data Limited, now owned and developed by Siemens Digital Industries Software. It can be licensed by other companies for use in their 3D computer graphics software products. Parasolid's abilities include model creation and editing utilities such as Boolean modeling operators, feature modeling support, advanced surfacing, thickening and hollowing, blending and filleting, and sheet modeling. It also incorporates modeling with mesh surfaces and lattices. Parasolid also includes tools for direct model editing, including tapering, offsetting, geometry replacement and removing feature details with automated regeneration of surrounding data. Parasolid also provides wide-ranging graphical and rendering support, including hidden-line, wireframe and drafting, tessellation, and model data inquiries. To use Parasolid effectively, software developers need knowledge of CAD in general, computational geometry, and topology. Parasolid is available for Windows (32-bit, 64-bit and AArch64), Linux (64-bit and AArch64), macOS (Apple silicon and Intel), iOS, and Android. == Parasolid XT format == Parasolid parts are normally saved in XT format, which usually has the file extension .X_T. The format is documented and open. There is also a binary version of the format, usually with an .X_B extension, which is somewhat more compact. Both .X_T and .X_B are used for parts files. == Applications == It is used in many computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided engineering (CAE), product visualization, and CAD data exchange packages. Notable uses include:

    Read more →