AI Coding Newsletter

AI Coding Newsletter — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Lawbot

    Lawbot

    Lawbots are a broad class of customer-facing legal AI applications that are used to automate specific legal tasks, such as document automation and legal research. The terms robot lawyer and lawyer bot are used as synonyms to lawbot. A robot lawyer or a robo-lawyer refers to a legal AI application that can perform tasks that are typically done by paralegals or young associates at law firms. However, there is some debate on the correctness of the term. Some commentators say that legal AI is technically speaking neither a lawyer nor a robot and should not be referred to as such. Other commentators believe that the term can be misleading and note that the robot lawyer of the future will not be one all-encompassing application but a collection of specialized bots for various tasks. Lawbots use various artificial intelligence techniques or other intelligent systems to limit humans' direct ongoing involvement in certain steps of a legal matter. The user interfaces on lawbots vary from smart searches and step-by-step forms to chatbots. Consumer and enterprise-facing lawbot solutions often do not require direct supervision from a legal professional. Depending on the task, some client-facing solutions used at law firms operate under an attorney supervision. == Levels of autonomy == The following levels of autonomy (LoA) are suggested for automated AI legal reasoning: Level 0 (LoA0): No automation for AI legal reasoning Level 1 (LoA1): Simple assistance automation Level 2 (LoA2): Advanced assistance automation Level 3 (LoA3): Semi-autonomous automation Level 4 (LoA4): Domain automation Level 5 (LoA5): Fully-autonomous automation Level 6 (LoA6): Superhuman automation == Examples == Some legal AI solutions are developed and marketed directly to the customers or consumers, whereas other applications are tools for the attorneys at law firms. There are already hundreds of legal AI solutions that operate in multitude of ways varying in sophistication and dependence on scripted algorithms. One notable legal technology chatbot application is DoNotPay. It had started off as an app for contesting parking tickets, but has since expanded to include features that help users with many different types of legal issues, ranging from consumer protection to immigration rights and other social issues. == Impact on the legal industry == In the 2016 report, Deloitte estimated that more than 110,000 law jobs in just the United Kingdom alone could disappear within the next twenty years due to automation. This change could result in the creation of more highly skilled jobs and in the reduction of paralegal and temporary positions. Deloitte's report asserts that "there is significant potential for high-skilled roles that involve repetitive processes to be automated by smart and self-learning algorithms". According to Lawyers to Engage, between 22% of a lawyer’s work and 35% of a legal assistant’s work can be automated in the US. Top law schools like Harvard have already begun to integrate Artificial Intelligence into the curriculum. Legal tech start-up companies have begun developing applications that assist law firms with completing low-risk legal processes. These applications can enable lawyers to focus on more work that requires their specific expertise. The automation of processes like contract reviewing, enforcement of negotiations (smart contracts) and client intake (expert systems) allows law firms to streamline their procedures and improve efficiency. In addition, automation benefits small-to-medium law firms that do not have the resources to utilize junior talent on such routine tasks. The increase of law firms utilizing automated applications could result into legal tech becoming a necessity in the industry. Digital Reason CEO, Tim Estes, stated that those who refuse the opportunity to integrate AI in their workflow are “most at risk.” In 2018, Forbes reported a 713% increase in investments in legal tech. This rapid growth is reflective of law firms beginning to “cede business to… new model legal providers… that meld technological, business and legal expertise.” == Access to law and justice == It has been widely estimated for at least the last generation that all the programs and resources devoted to ensuring access to justice address only 20% of the civil legal needs of low-income people in the United States. Drawing on this experience, in late 2011, the U.S. government-funded Legal Services Corporation decided to convene a summit of leaders to explore how best to use technology in the access-to-justice community. The group adopted a mission for The Summit on the Use of Technology to Expand Access to Justice (Summit) consistent with the magnitude of the challenge: "to explore the potential of technology to move the United States toward providing some form of effective assistance to 100% of persons otherwise unable to afford an attorney for dealing with essential civil legal needs". In April 2017, joined by Microsoft and Pro Bono Net, the Legal Services Corporation (LSC) announced a pilot program to develop online, statewide legal portals to direct individuals with civil legal needs to the most appropriate forms of assistance. == Technological limitations == Current research in subjects such as computational privacy, explainable machine learning, Bayesian deep learning, knowledge-intensive machine learning, and transfer learning reveals that we do not yet have the technology to enable Level 4 to 6 AI lawbots. In 2023, OpenLaw began developing a model called Law Bot, which interacts in a conversational way as an attorney. The dialogue format makes it possible for Law Bot to answer follow-up questions, challenge incorrect premises, and reject inappropriate requests. Currently, they try to ensure it is in full compliance with all laws and regulations while conducting further beta testing before releasing it to the general public.

    Read more →
  • Pixlr

    Pixlr

    Pixlr is a group of SaaS creative tools including Pixlr.com, Designs.ai and Vectr.com. Pixlr.com is a cloud-based set of image editing tools and utilities, including AI image generation and enhancements. The Pixlr suite targets users who require subjectively simple, or more advanced, photo editing as well as graphic design. It features a freemium business model with subscription plans—Plus, Premium and Teams. The platform can be used on desktop and also smartphones and tablets. Pixlr is compatible with various image formats such as JPEG, PNG, WEBP, GIF, PSD (Photoshop Document) and PXZ (native Pixlr document format). Designs.ai lets users create content using AI, with a goal of being within two minutes, across different media types including videos, text, banners and audio. Vectr.com was acquired in 2017 before being spun out into Pixlr Group in 2023. == History == Pixlr was founded in 2008 and built on Macromedia Flash. On 19 July 2011, Autodesk announced that they had acquired the Pixlr suite. In 2013, Time listed Pixlr as one of the top 50 websites of the year. In 2017, Pixlr was acquired from Autodesk. It was subsequently rebuilt and relaunched in HTML5 in 2019. In September 2023, Pixlr was awarded as the Top 13 GenAi Web Product by the world's top venture firm Andreessen Horowitz. In November 2023, Pixlr, Designs.ai and Vectr were combined as a new business group named Pixlr Group focusing on generative AI and creative software solutions. In May 2024, Pixlr was featured as one of the top 18 progressive web applications highlighted on Google I/O. == Versions == Pixlr.com rebranded itself as a full creative suite in 2019 by introducing Pixlr X, Pixlr E and Pixlr M. The platform introduced more features in December 2021 with a new logo and added tools which included: Brushes, the 'Heal tool', Animation, and Batch upload. The brush feature enables the creation of hand-drawn effects. The Heal tool allows users to remove unwanted objects from their images whereas the Animation feature can be used to include movements into their edits. Users can also utilize Batch upload to edit up to 50 images simultaneously. In November 2022, Pixlr 2023 was launched, adding more tools such as "AI smart resize", colorization, text wrapping and other additional effects. In November 2023, Pixlr 2024 was launched with Pixlr Designer and new AI-powered updates which includes AI image generation, AI infill, AI inpainting and more.

    Read more →
  • LumenVox

    LumenVox

    LumenVox is a privately held speech recognition software company based in San Diego, California. LumenVox has been described as one of the market leaders in the speech recognition software industry. == History == LumenVox was founded in 2001 as subsidiary of Progressive Computing. According to LumenVox CEO Edward Miller, when Progressive had initially looked to add speech recognition to its own phone system, it found the existing offerings too expensive and recognized a niche in the market for a more affordable speech recognition product. This led to the development of LumenVox with an aim to bring speech recognition to small-to-midsized businesses. LumenVox is one of the major providers of automatic speech recognition for telephone systems, and as of 2006, became the second largest provider of speech recognition software. == Products == The primary LumenVox product is the LumenVox Speech Engine. It is a speaker-independent automatic speech recognizer that uses the Speech Recognition Grammar Specification for building and defining grammars. It has been integrated with several of the major voice platforms, including Avaya Voice Portal/Interactive Response, Aculab, and BroadSoft's BroadWorks. The Speech Engine was originally derived from CMU Sphinx, but LumenVox has added considerable development effort to make it a commercial-ready product. LumenVox also offers a product called the Speech Tuner, which provides a graphical means of testing and troubleshooting speech recognition applications. == Open source support == LumenVox was recognized as one of the top VoIP companies in 2008 for its work in providing its offerings to the open source community, an effort by the company that began in 2006 when it partnered with Digium. At that time, Digium, maintainer of the open source Asterisk PBX, integrated the LumenVox Speech Engine into Asterisk. This made LumenVox the first commercially available speech recognition engine for Asterisk. As one of the earlier commercial software integrations with Asterisk, the LumenVox integration has been described as one of the applications that helped to mainstream Asterisk. In 2009, LumenVox also began offering access to the Speech Engine as a monthly subscription, bringing the cost of entry down even lower for open source users. LumenVox is also integrated with the open source UniMRCP project, which provides open source client and server libraries for the Media Resource Control Protocol.

    Read more →
  • Adobe PhotoDeluxe

    Adobe PhotoDeluxe

    PhotoDeluxe was a consumer-oriented image editing software line published by Adobe Systems from 1996 until July 8, 2002. At that time it was replaced by Adobe's newly launched consumer-oriented image editing software Photoshop Elements. Adobe no longer provides technical support for the PhotoDeluxe software line. PhotoDeluxe had a range of image processing capabilities for the home photographer and image handler. These included removing red-eye, cropping, and adjusting brightness, contrast, and sharpness. It also included software to extract pictures from an image scanner. Among the functionality included was the ability to dynamically resize photos and export them in a wide range of formats. It also had a range of printing options including printing multiple copies of an image on the same page. It was often bundled free with Epson scanners or as free software with new computers. == Features == Despite the critical concerns regarding the quality of the setup, Photo Deluxe supports layering, blurs, sharpening, cloning, gradient fills, color and background switches, color variations, resizing options, and many other features. Another drawback of PhotoDeluxe was that it was designed for Mac computers, so working on Windows PC was a problem for those who were unable to customize their preferences. == Versions == === Adobe PhotoDeluxe 1.0 === The first version was released in 1996 for Windows and Macintosh computers. In one year, it sold over one million copies. === Adobe PhotoDeluxe 2.0 === The new version was released in 1997 and had added features such as a Clone Tool, red-eye removal, and sample templates for making posters, cards, and calendars. It also had new special effect features. === Adobe PhotoDeluxe 3.0 === The 3rd version was released in 1998. The new features included customizable clipart settings, the ability to import photos on the web, enhanced repair activities following Guided Activities, and Adobe Connectables to add new activities. === Adobe PhotoDeluxe Home Edition (4.0) === Version 4.0 was created by the makers of Photoshop. It had advanced abilities such as tools to add animation, voice, and music to a picture. It also had features to restore photos to their original position. == History == Adobe PhotoDeluxe 1.0 was released in 1996 for Macintosh computers, initially retailing for an MSRP of $49. The software did quite well, reportedly selling over a million copies by February of the next year, primarily due to bundles with companies like Apple and Hewlett-Packard. PhotoDeluxe was primarily advertised to consumers as a way to do basic photo manipulation, such as cropping and rotating images, or creating simple cards and calendars. PhotoDeluxe 2.0 was released in 1997, and was the last version of PhotoDeluxe that Adobe made that worked on Macs. PhotoDeluxe 2.0 became the "number one selling consumer photo-editing software product in the world." PhotoDeluxe 3.0 was released in 1998, where it was rebranded as "3.0 Home Edition", as Adobe released PhotoDeluxe Business Edition later that year for a higher price. PhotoDeluxe Home Edition, unofficially called PhotoDeluxe 4.0, was released in 1999 and was the last version of PhotoDeluxe to be released. Adobe officially cancelled PhotoDeluxe on July 8, 2002, citing the presence of Photoshop and Photoshop Elements, with support being officially cancelled in mid-2003. No version of PhotoDeluxe is compatible with Windows 10, rendering the program obsolete. == Pricing == All home versions of PhotoDeluxe retailed for an MSRP of $49. PhotoDeluxe 2.0 and onwards allowed users to upgrade from a previous version of PhotoDeluxe or a competing piece of graphics software for $39. Additionally PhotoDeluxe Business Edition allowed a similar deal, allowing users to upgrade from other versions of PhotoDeluxe or a competing software for $59, instead of its normal price of $99. Adobe also offered a bundle allowing users of 1.0 or 2.0 to get 3.0 and Business Edition for $79.

    Read more →
  • Showcase Workshop

    Showcase Workshop

    Showcase Workshop, also referred to as Showcase, is a SaaS company that develops a presentation-building application for business use. Users upload files and images to a web platform which generates presentations viewable on a suite of mobile apps. Showcase was founded in 2011. The company’s headquarters are in Wellington, New Zealand. == History == Showcase Workshop was originally developed in response to dynamically changing content being presented on iPads at the 2012 Olympics. After market-testing a beta version of the core application, Showcase Workshop launched commercially in 2012. In 2014 Showcase partnered with Vodafone Global Enterprise. == Product == Users upload pre-existing PDFs, videos, images and Microsoft Office documents to a secure server, building presentations or ‘showcases’ which can then be downloaded via the mobile apps. The presentations are used for mobile sales enablement, training, or operational/health and safety purposes. == Reception == Reviewers have praised the ease of use of Showcase, calling it a “better alternative to developing a native app” and “intuitive”. Criticisms include the lack of differing templates and a lack of complex customisation controls. Showcase was nominated for a Tabby Award in 2014 and won a Tabby Award in 2015 for its Windows app.

    Read more →
  • Bump (application)

    Bump (application)

    Bump was an iOS and Android mobile app that enabled smartphone users to transfer contact information, photos and files between devices. In 2011, it was #8 on Apple's list of all-time most popular free iPhone apps, and by February 2013 it had been downloaded 125 million times. Its developer, Bump Technologies, shut down the service and discontinued the app on January 31, 2014, after being acquired by Google for Google Photos and Android Camera. == Features == Bump sent contact information, photos and files to another device over the internet. Before activating the transfer, each user confirmed what they want to send to the other user. To initiate a transfer, two people physically bumped their phones together. A screen appeared on both users' smartphone displays, allowing them to confirm what they want to send to each other. When two users bumped their phones, software on the phones send a variety of sensor data to an algorithm running on Bump servers, which included the location of the phone, accelerometer readings, IP address, and other sensor readings. The algorithm figured out which two phones felt the same physical bump and then transfers the information between those phones. Bump did not use Near Field Communication. February 2012 release of Bump 3.0 for iOS, the company streamlined the app to focus on its most frequently used features: contact and photo sharing. Bump 3.0 for Android maintained the features eliminated from the iOS version but moved them behind swipeable layers. In May 2012, a Bump update enabled users to transfer photos from their phone to their computer via a web service. To initiate a transfer, the user goes to the Bump website on their computer and bumps the smartphone on the computer keyboard's space bar. By December 2012, various Bump updates for iOS and Android had added the abilities to share video, audio, and any files. Users swipe to access those features. In February 2013, an update to the Bump iOS and Android apps enabled users to transfer photos, videos, contacts and other files from a computer to a smartphone and vice versa via a web service. To perform the transfer, users went to the Bump website on their computer and bump the smartphone on the computer keyboard's space bar. == History == The underlying idea of a synchronous gesture like bumping two devices for content transfer or pairing them was first conceived by Ken Hinkley of Microsoft Research in 2003. This idea was presented at a user interface and technology conference that same year. The paper proposed the use of accelerometers and a bumping gesture of two devices to enable communication, screen sharing and content transfer between them. Similar to this original concept, the idea for Bump app was conceived by David Lieb, a former employee of Texas Instruments, while he was attending the University of Chicago Booth School of Business for his MBA. While going through the orientation and meeting process of business school, he became frustrated by constantly entering contact information into his iPhone and felt that the process could be improved. His fellow Texas Instruments employees Andy Huibers and Jake Mintz, who was a classmate of Lieb's at the University of Chicago's MBA program, joined Lieb to form Bump Technologies. Bump Technologies launched in 2008 and is located in Mountain View, CA. Early funding for the project was provided by startup incubator Y Combinator, Sequoia Capital and other angel investors. It gained attention at the CTIA international wireless conference, due to its accessibility and novelty factor. In October 2009, Bump received $3.4m in Series A funding followed in January 2011 with a $16m series B financing round led by Andreessen Horowitz. Silicon Valley venture capitalist Marc Andreessen sits on the company's board. The Bump app debuted in the Apple iOS App Store in March 2009 and was “one of the apps that helped to define the iPhone” (Harry McCracken, Technologizer). It soon became the billionth download on Apple's App Store. An Android version launched in November 2009. By the time Bump 3.0 for iOS was released in February 2012, the app had been installed 77 million times, with users sharing more than 2 million photos daily. As of February 2013, there had been 125 million Bump app downloads. == Other apps created by Bump Technologies == Bump Technologies worked with PayPal in March 2010 to create a PayPal iPhone application. The application, which allows two users to automatically activate an Internet transfer of money between their accounts, found widespread adoption. A similar version was released for Android in August 2010. The Bump capability in PayPal's apps was removed in March 2012. At that time, Bump Technologies released Bump Pay, an iOS app that lets users transfer money via PayPal by physically bumping two smartphones together. The tool was originally created for the Bump team to use when splitting up restaurant bills. The payment feature was not added to the Bump app because the company “wanted to make it as simple as possible so people understand how this works,” Lieb told ABC News. Bump Pay was the first app from the company's Bump Labs initiative. A goal of Bump Labs is to test new app ideas that may not fit within the main Bump app. ING Direct added a feature to its iPhone app in 2011 that lets users transfer money to each other using Bump's technology. The feature was later added to its Android app, now called Capital One 360. In July 2012, Bump Technologies released Flock, an iPhone photo sharing app. An Android version was released in December 2012. Using geolocation data embedded in photos and a user's Facebook connections, Flock finds pictures the user takes while out with friends and family and puts everyone's photos from that event into a single shared album. Users receive a push notification after the event, asking if they want to share their photos with friends who were there in the moment. The app will also scan previous photos in the iPhone camera roll and uncover photos that have yet to be shared. If location services were enabled at the time a photo was taken, Flock allows users to create an album of photos from the past with the friends who were there with them. == Acquisition by Google == On September 16, 2013, Bump Technologies announced that it had been acquired by Google. On December 31, 2013, they broke the news that both Bump and Flock would be discontinued so that the team could focus on new projects at Google. The apps were removed from the App Store and Google Play on January 31, 2014. The company subsequently deleted all user data and shut down their servers, thus rendering existing installations of the apps inoperable.

    Read more →
  • Feed forward (control)

    Feed forward (control)

    A feed forward (sometimes written feedforward) is an element or pathway within a control system that passes a controlling signal from a source in its external environment to a load elsewhere in its external environment. This is often a command signal from an external operator. In control engineering, a feedforward control system is a control system that uses sensors to detect disturbances affecting the system and then applies an additional input to minimize the effect of the disturbance. This requires a mathematical model of the system so that the effect of disturbances can be properly predicted. A control system which has only feed-forward behavior responds to its control signal in a pre-defined way without responding to the way the system reacts; it is in contrast with a system that also has feedback, which adjusts the input to take account of how it affects the system, and how the system itself may vary unpredictably. In a feed-forward system, the control variable adjustment is not error-based. Instead it is based on knowledge about the process in the form of a mathematical model of the process and knowledge about, or measurements of, the process disturbances. Some prerequisites are needed for control scheme to be reliable by pure feed-forward without feedback: the external command or controlling signal must be available, and the effect of the output of the system on the load should be known (that usually means that the load must be predictably unchanging with time). Sometimes pure feed-forward control without feedback is called 'ballistic', because once a control signal has been sent, it cannot be further adjusted; any corrective adjustment must be by way of a new control signal. In contrast, 'cruise control' adjusts the output in response to the load that it encounters, by a feedback mechanism. These systems could relate to control theory, physiology, or computing. == Overview == With feed-forward or feedforward control, the disturbances are measured and accounted for before they have time to affect the system. In the house example, a feed-forward system may measure the fact that the door is opened and automatically turn on the heater before the house can get too cold. The difficulty with feed-forward control is that the effects of the disturbances on the system must be accurately predicted, and there must not be any unmeasured disturbances. For instance, if a window was opened that was not being measured, the feed-forward-controlled thermostat might let the house cool down. The term has specific meaning within the field of CPU-based automatic control. The discipline of feedforward control as it relates to modern, CPU based automatic controls is widely discussed, but is seldom practiced due to the difficulty and expense of developing or providing for the mathematical model required to facilitate this type of control. Open-loop control and feedback control, often based on canned PID control algorithms, are much more widely used. There are three types of control systems: open-loop, feed-forward, and feedback. An example of a pure open-loop control system is manual non-power-assisted steering of a motor car; the steering system does not have access to an auxiliary power source and does not respond to varying resistance to turning of the direction wheels; the driver must make that response without help from the steering system. In comparison, power steering has access to a controlled auxiliary power source, which depends on the engine speed. When the steering wheel is turned, a valve is opened which allows fluid under pressure to turn the wheels. A sensor monitors that pressure so that the valve only opens enough to cause the correct pressure to reach the wheel turning mechanism. This is feed-forward control where the output of the system, the change in direction of travel of the vehicle, plays no part in the system. See Model predictive control. If the driver is included in the system, then they do provide a feedback path by observing the direction of travel and compensating for errors by turning the steering wheel. In that case you have a feedback system, and the block labeled System in Figure(c) is a feed-forward system. In other words, systems of different types can be nested, and the overall system regarded as a black-box. Feedforward control is distinctly different from open-loop control and teleoperator systems. Feedforward control requires a mathematical model of the plant (process and/or machine being controlled) and the plant's relationship to any inputs or feedback the system might receive. Neither open-loop control nor teleoperator systems require the sophistication of a mathematical model of the physical system or plant being controlled. Control based on operator input without integral processing and interpretation through a mathematical model of the system is a teleoperator system and is not considered feedforward control. == History == Historically, the use of the term feedforward is found in works by Harold S. Black in US patent 1686792 (invented 17 March 1923) and D. M. MacKay as early as 1956. While MacKay's work is in the field of biological control theory, he speaks only of feedforward systems. MacKay does not mention feedforward control or allude to the discipline of feedforward controls. MacKay and other early writers who use the term feedforward are generally writing about theories of how human or animal brains work. Black also has US patent 2102671 invented 2 August 1927 on the technique of feedback applied to electronic systems. The discipline of feedforward controls was largely developed by professors and graduate students at Georgia Tech, MIT, Stanford and Carnegie Mellon. Feedforward is not typically hyphenated in scholarly publications. Meckl and Seering of MIT and Book and Dickerson of Georgia Tech began the development of the concepts of Feedforward Control in the mid-1970s. The discipline of Feedforward Controls was well defined in many scholarly papers, articles and books by the late 1980s. == Benefits == The benefits of feedforward control are significant and can often justify the extra cost, time and effort required to implement the technology. Control accuracy can often be improved by as much as an order of magnitude if the mathematical model is of sufficient quality and implementation of the feedforward control law is well thought out. Energy consumption by the feedforward control system and its driver is typically substantially lower than with other controls. Stability is enhanced such that the controlled device can be built of lower cost, lighter weight, springier materials while still being highly accurate and able to operate at high speeds. Other benefits of feedforward control include reduced wear and tear on equipment, lower maintenance costs, higher reliability and a substantial reduction in hysteresis. Feedforward control is often combined with feedback control to optimize performance. == Model == The mathematical model of the plant (machine, process or organism) used by the feedforward control system may be created and input by a control engineer or it may be learned by the control system. Control systems capable of learning and/or adapting their mathematical model have become more practical as microprocessor speeds have increased. The discipline of modern feedforward control was itself made possible by the invention of microprocessors. Feedforward control requires integration of the mathematical model into the control algorithm such that it is used to determine the control actions based on what is known about the state of the system being controlled. In the case of control for a lightweight, flexible robotic arm, this could be as simple as compensating between when the robot arm is carrying a payload and when it is not. The target joint angles are adjusted to place the payload in the desired position based on knowing the deflections in the arm from the mathematical model's interpretation of the disturbance caused by the payload. Systems that plan actions and then pass the plan to a different system for execution do not satisfy the above definition of feedforward control. Unless the system includes a means to detect a disturbance or receive an input and process that input through the mathematical model to determine the required modification to the control action, it is not true feedforward control. === Open system === In control theory, an open system is a feed forward system that does not have any feedback loop to control its output. In contrast, a closed system uses on a feedback loop to control the operation of the system. In an open system, the output of the system is not fed back into the input to the system for control or operation. == Applications == === Physiological feed-forward system === In physiology, feed-forward control is exemplified by the normal anticipatory regulation of heartbeat in advance of actual physical exertion by the central autonomic network. Feed-forward

    Read more →
  • Non-separable wavelet

    Non-separable wavelet

    Non-separable wavelets are multi-dimensional wavelets that are not directly implemented as tensor products of wavelets on some lower-dimensional space. They have been studied since 1992. They offer a few important advantages. Notably, using non-separable filters leads to more parameters in design, and consequently better filters. The main difference, when compared to the one-dimensional wavelets, is that multi-dimensional sampling requires the use of lattices (e.g., the quincunx lattice). The wavelet filters themselves can be separable or non-separable regardless of the sampling lattice. Thus, in some cases, the non-separable wavelets can be implemented in a separable fashion. Unlike separable wavelet, the non-separable wavelets are capable of detecting structures that are not only horizontal, vertical or diagonal (show less anisotropy). == Examples == Red-black wavelets Contourlets Shearlets Directionlets Steerable pyramids Non-separable schemes for tensor-product wavelets

    Read more →
  • Microsoft Whiteboard

    Microsoft Whiteboard

    Microsoft Whiteboard is a free multi-platform application, as well as an online service and a feature in Microsoft Teams, which simulates a virtual whiteboard and enables real-time collaboration between users. == Overview and features == Microsoft Whiteboard allows users to draw on a virtual whiteboard using input methods such as a stylus pen or a mouse and keyboard, and write down notes, draw connections between shareable ideas, and interact in real time. Microsoft Whiteboard is available to download on the following platforms and devices: Microsoft Windows (on Windows 10 or above) Android Apple iOS Surface Hub devices It is also available on the web and as a feature in Microsoft Teams. Microsoft Whiteboard allows users with Microsoft accounts to view, edit, and share whiteboards using the provided tools and options. The feature set includes tools for drawing, shapes, and media. Drawing in Microsoft Whiteboard is called inking. It works both on mobile devices and computers. The inking toolbar has customizable pencils, a ruler, a highlighter, an eraser, and an object selector. Whiteboard can recognize shapes drawn by hand and straighten them. Holding the Shift key on a computer while inking draws straight lines. Microsoft Whiteboard has keyboard shortcuts for some functions. Additional features include inserting sticky notes, text boxes, stickers, as well as images. Grid lines and colors are adjustable. Different templates can be inserted into the whiteboard. Users can also share their reactions. A feature limited to boards created in Microsoft Teams, is the ability to make them read-only; other participants from the meeting cannot edit them. == Reviews == PC Magazine gave Microsoft Whiteboard a score of 3.5 out of 5, praising the app's free availability and plentiful templates. It compared it to other, paid whiteboarding solutions, and concluded that Microsoft offers the best free one. Some of the cons, described by PCMag, include the inability to view boards without a Microsoft account and the inability to create custom templates.

    Read more →
  • WebPlus

    WebPlus

    Serif WebPlus was a website design program for Microsoft Windows, developed by the software company, Serif. It allows users to design, create and upload their website onto the internet without any knowledge of HTML or other web technologies. Much like Microsoft Word, WebPlus uses WYSIWYG drag and drop editing to add and position text, images and links as they would appear on the finished web page. Once a user has designed their site, WebPlus can preview the site in a web browser before uploading the site using the in-built FTP. The software comes with a variety of pre-designed sample websites containing Filler text like Lorem ipsum, which can be used as a template for quickly designing a site. It also provides drawing tools for creating and editing buttons and web graphics. == Free WebPlus Starter Edition == Previously Serif had made available feature limited Starter Editions of their software, based on older versions, which could be obtained and used free of charge. For WebPlus the final free edition was based on version X5 and this was released in September 2012. This continued to be available from Serif's server until it was withdrawn around March 2016. WebPlus was then only available as a paid-for version X8. == Program Withdrawal == In March 2016, Serif announced that WebPlus X8 would be the final version, and that there were no current plans to design an application to replace it. Sales of WebPlus X8 by Serif were ended around December 2016. In early 2018, Serif announced that Serif Web Resources, hosted on Serif servers and required to implement some advanced web-site functionality in WebPlus created sites, would no longer work after 31 August 2018. In 2018, Serif also shutdown the servers that generated the "Plus" software registration numbers on-line from the product version and the individual generated installation number. Serif revealed the alternative was to use a universal master registration number, which is 881887. This is known to work with post 2003 Serif "Plus" software (e.g. verified to work with PagePlus v5.02). However, later Serif "Plus" software still registers itself automatically if within a certain recent period of a previous Serif software registration on the same PC. == Supported platforms == WebPlus was developed for Microsoft Windows "Win32" graphical desktop interface and is fully compatible with Windows XP, Windows Vista (32/64bit), Windows 7 (32/64bit) and Windows 8. == Features == Web hosting to upload websites to the internet with the address www.sitename.webplus.net and email [email protected]. E-Commerce tool to create online stores with providers such as PayPal. Form wizard generates online forms to collect information from website visitors. Add blogs, forums, hit counters, online polls and content management systems to websites using Smart Objects. Google Maps tool embeds maps and optional navigation markers within a website. Site navigation bars adopt a website's structure providing a tool for navigating around the website. Photo gallery groups a collection of images together and displays them as an animated slideshow. Search engine optimization (SEO) tools optimise a websites search ranking with the likes of Google, Yahoo! and Bing. Collect website metrics such as page popularity and number of website hits using Google Analytics. WebPlus X5 introduced a button studio for creating button graphics. Restrict access to specific pages on a website with a secure member's area. WebPlus automatically converts images and graphics into a web targeted format, optimising them for fast download. Embed YouTube videos within a web page. Add animated effects to a website with Animated GIFs, Animated Marquees or by importing Flash videos. Stream news and information feeds to a website using RSS and podcasts. Automated Site Checker analyses and corrects potential problems with a website. AdSense tool incorporates Google AdSense advertisements into a website In-built FTP transfers files onto a web server, uploading a website to the internet. In-built Basic Photo Editor the PhotoLab can make automatic adjustments and "Quick Fix's" to photos. From X5, WebPlus offers image editing and filters, through its PhotoLab and also provides a dedicated background-removal tool in the form of Cutout Studio. Display images, Flash videos and web pages using animated Lightboxes. Filter Effects can be applied to the graphical objects, giving convincing, realistic effects such as glass, metallic, plastic and other 2D/3D filters. WebPlus also provides QuickShapes for creating button and web graphics. These predefined shapes can be quickly modified with sliders to adjust certain parameters, for example creating rounded rectangles, etc. Shapes include: rectangles, ellipses, stars, spirals, cogs, petals, etc.

    Read more →
  • Adobe PhotoDeluxe

    Adobe PhotoDeluxe

    PhotoDeluxe was a consumer-oriented image editing software line published by Adobe Systems from 1996 until July 8, 2002. At that time it was replaced by Adobe's newly launched consumer-oriented image editing software Photoshop Elements. Adobe no longer provides technical support for the PhotoDeluxe software line. PhotoDeluxe had a range of image processing capabilities for the home photographer and image handler. These included removing red-eye, cropping, and adjusting brightness, contrast, and sharpness. It also included software to extract pictures from an image scanner. Among the functionality included was the ability to dynamically resize photos and export them in a wide range of formats. It also had a range of printing options including printing multiple copies of an image on the same page. It was often bundled free with Epson scanners or as free software with new computers. == Features == Despite the critical concerns regarding the quality of the setup, Photo Deluxe supports layering, blurs, sharpening, cloning, gradient fills, color and background switches, color variations, resizing options, and many other features. Another drawback of PhotoDeluxe was that it was designed for Mac computers, so working on Windows PC was a problem for those who were unable to customize their preferences. == Versions == === Adobe PhotoDeluxe 1.0 === The first version was released in 1996 for Windows and Macintosh computers. In one year, it sold over one million copies. === Adobe PhotoDeluxe 2.0 === The new version was released in 1997 and had added features such as a Clone Tool, red-eye removal, and sample templates for making posters, cards, and calendars. It also had new special effect features. === Adobe PhotoDeluxe 3.0 === The 3rd version was released in 1998. The new features included customizable clipart settings, the ability to import photos on the web, enhanced repair activities following Guided Activities, and Adobe Connectables to add new activities. === Adobe PhotoDeluxe Home Edition (4.0) === Version 4.0 was created by the makers of Photoshop. It had advanced abilities such as tools to add animation, voice, and music to a picture. It also had features to restore photos to their original position. == History == Adobe PhotoDeluxe 1.0 was released in 1996 for Macintosh computers, initially retailing for an MSRP of $49. The software did quite well, reportedly selling over a million copies by February of the next year, primarily due to bundles with companies like Apple and Hewlett-Packard. PhotoDeluxe was primarily advertised to consumers as a way to do basic photo manipulation, such as cropping and rotating images, or creating simple cards and calendars. PhotoDeluxe 2.0 was released in 1997, and was the last version of PhotoDeluxe that Adobe made that worked on Macs. PhotoDeluxe 2.0 became the "number one selling consumer photo-editing software product in the world." PhotoDeluxe 3.0 was released in 1998, where it was rebranded as "3.0 Home Edition", as Adobe released PhotoDeluxe Business Edition later that year for a higher price. PhotoDeluxe Home Edition, unofficially called PhotoDeluxe 4.0, was released in 1999 and was the last version of PhotoDeluxe to be released. Adobe officially cancelled PhotoDeluxe on July 8, 2002, citing the presence of Photoshop and Photoshop Elements, with support being officially cancelled in mid-2003. No version of PhotoDeluxe is compatible with Windows 10, rendering the program obsolete. == Pricing == All home versions of PhotoDeluxe retailed for an MSRP of $49. PhotoDeluxe 2.0 and onwards allowed users to upgrade from a previous version of PhotoDeluxe or a competing piece of graphics software for $39. Additionally PhotoDeluxe Business Edition allowed a similar deal, allowing users to upgrade from other versions of PhotoDeluxe or a competing software for $59, instead of its normal price of $99. Adobe also offered a bundle allowing users of 1.0 or 2.0 to get 3.0 and Business Edition for $79.

    Read more →
  • RagTime

    RagTime

    RagTime is a frame-oriented business publishing software which combines word processing, spreadsheets, simple drawings, image processing, and charts, in a single document/program, integrated software. It is often used to create forms, reports, documentation, desktop publishing, and in office environments. Typical users are business clients, educational institutions, administrations, architects, and also private users. Ragtime includes the following modules: Page layout (forms, templates etc.) Word processing Image processing Spreadsheets, similar to Microsoft Excel Formulas and functions which can be used throughout, in text, graphics, and spreadsheets Charts in different types of diagrams Drawings in vector graphics including lines, polygons, Bézier curves and more Slide show (presentation of RagTime documents) Audio/video Buttons (pop-up menus, switches, and more) that can be used within RagTime documents Import/export of various file formats Support of the AppleScript scripting language available system-wide under macOS == Principle == RagTime differs from most other comparable programs or software packages in its strict frame-oriented design: all content is contained within frames on each page. The content can have a fixed position within its frame or, if it is text or a spreadsheet, flow into another frame that is connected to the first frame via a so-called “pipeline”. RagTime has no different document types for different types of data; all content is stored in a single compound document type. Thus, a RagTime document not only can contain multiple pages, but also multiple layouts within the same document; e.g. spreadsheets in addition to text and images. The RagTime filename extension is .rtd (RagTime document); for templates the extension is .rtt (RagTime template). The current version is RagTime 6.6.5. It is available for OS X (10.6-10.14) and Windows (XP/Vista/7/8/10). == Extensions == FileTime – allows accessing “FileMaker Pro” databases from RagTime documents under OS X RagTime Connect – ODBC database connection for RagTime 6 (Mac and Windows) Johannes – print extension for the simple creation of stapled or folded brochures, booklets etc. PowerFunctions – additional functions for a more effective creation of intelligent documents for exchanging data and for use in mixed Mac/Windows environments MetaFormula – SYLK-based extension that allows calculating text as formula == History == RagTime has been developed since 1985 for the Macintosh – originally named MacFrame – and was published in 1986. When released, it already had the present name, which was chosen following the then-available software package Lotus Jazz. In the European Macintosh market, RagTime quickly gained a prominent position that continues to this day, even though the market share has decreased. Despite repeated attempts, the program could not gain acceptance in the North American market due to its high cost ($395 in 1990). The North American sales office closed in 1991, shortly after Claris Corporation released ClarisWorks which duplicated much of the functionality of RagTime for a lower price. After the manufacturer – first Brüning & Everth, followed by B&E Software and today RagTime.de Development – had focused on the Macintosh only for a very long time, it also released a Windows version, RagTime 5.0, in 1999. However, the program could not assume great significance against established competitors, especially Microsoft Office. Until mid-2006 RagTime was, in addition to the commercial version, also available as a free version (RagTime Solo) for personal use. RagTime Solo included the same features and performance (except for spelling and Syllabification) dictionaries), but was not allowed for use in commercial environments. In other languages RagTime Solo was distributed as RagTime Privat. In a press release from July 5, 2006, RagTime announced the discontinuation of RagTime Solo: “… the RagTime Solo license conditions were often misinterpreted or deliberately flouted. Therefore we discontinued RagTime Solo, there will be no private version of RagTime 6 anymore.” After a successful start of the RagTime 6.0 software, sales edged significantly lower in the following years. Disagreements arose among the shareholders about the continuation of the company, which filed for bankruptcy in July 2007. As a result, the rights to RagTime were taken over by the newly established company RagTime.de Development GmbH, which was responsible for the development. The sales partner RagTime.de Sales GmbH distributed the RagTime products until October 2015. Today RagTime.de Development GmbH is also responsible for sales. The last level of development is the extensively revamped version RagTime 6.6 of 8 October 2015, which also includes new OS X features (e.g. high-resolution “Retina” displays) and supports Windows 10. == Programming == RagTime 1-3 were developed in Pascal, since version 4 the development is completely coded in C++. External programming and automation can be implemented via AppleScript on a Mac, and via OLE/COM-API (e.g. Visual Basic) under Windows. On a Mac, RagTime provides a comprehensive AppleScript library, for the automation of almost any task, from automatic document creation to the export of PDF documents. RagTime also supports “recordings” by use of the “AppleScript Editor”, which allows recording the interactive RagTime operation as an AppleScript program sequence. AppleScripts can be saved in the RagTime document and called via menu or shortcut keys. On Windows, RagTime (since version 6) disposes over an OLE/COM API, which allows automating many RagTime components via external programming. For that purpose there is a type library that installs the available RagTime OLE/COM object catalogue. Programming can be realized in all programming languages supported by Microsoft.

    Read more →
  • Manifold regularization

    Manifold regularization

    In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition. == Manifold regularizer == === Motivation === Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions. In particular, manifold regularization extends the technique of Tikhonov regularization as applied to Reproducing kernel Hilbert spaces (RKHSs). Under standard Tikhonov regularization on RKHSs, a learning algorithm attempts to learn a function f {\displaystyle f} from among a hypothesis space of functions H {\displaystyle {\mathcal {H}}} . The hypothesis space is an RKHS, meaning that it is associated with a kernel K {\displaystyle K} , and so every candidate function f {\displaystyle f} has a norm ‖ f ‖ K {\displaystyle \left\|f\right\|_{K}} , which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes its norm into account in order to penalize complex functions. Formally, given a set of labeled training data ( x 1 , y 1 ) , … , ( x ℓ , y ℓ ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{\ell },y_{\ell })} with x i ∈ X , y i ∈ Y {\displaystyle x_{i}\in X,y_{i}\in Y} and a loss function V {\displaystyle V} , a learning algorithm using Tikhonov regularization will attempt to solve the expression arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ ‖ f ‖ K 2 {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma \left\|f\right\|_{K}^{2}} where γ {\displaystyle \gamma } is a hyperparameter that controls how much the algorithm will prefer simpler functions over functions that fit the data better. Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space X {\displaystyle X} , but instead from a nonlinear manifold M ⊂ X {\displaystyle M\subset X} . The geometry of this manifold, the intrinsic space, is used to determine the regularization norm. === Laplacian norm === There are many possible choices for the intrinsic regularizer ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} . Many natural choices involve the gradient on the manifold ∇ M {\displaystyle \nabla _{M}} , which can provide a measure of how smooth a target function is. A smooth function should change slowly where the input data are dense; that is, the gradient ∇ M f ( x ) {\displaystyle \nabla _{M}f(x)} should be small where the marginal probability density P X ( x ) {\displaystyle {\mathcal {P}}_{X}(x)} , the probability density of a randomly drawn data point appearing at x {\displaystyle x} , is large. This gives one appropriate choice for the intrinsic regularizer: ‖ f ‖ I 2 = ∫ x ∈ M ‖ ∇ M f ( x ) ‖ 2 d P X ( x ) {\displaystyle \left\|f\right\|_{I}^{2}=\int _{x\in M}\left\|\nabla _{M}f(x)\right\|^{2}\,d{\mathcal {P}}_{X}(x)} In practice, this norm cannot be computed directly because the marginal distribution P X {\displaystyle {\mathcal {P}}_{X}} is unknown, but it can be estimated from the provided data. === Graph-based approach of the Laplacian norm === When the distances between input points are interpreted as a graph, then the Laplacian matrix of the graph can help to estimate the marginal distribution. Suppose that the input data include ℓ {\displaystyle \ell } labeled examples (pairs of an input x {\displaystyle x} and a label y {\displaystyle y} ) and u {\displaystyle u} unlabeled examples (inputs without associated labels). Define W {\displaystyle W} to be a matrix of edge weights for a graph, where W i j {\displaystyle W_{ij}} is a similarity built from distance measure between the data points x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} (so that more close implies higher W i j {\displaystyle W_{ij}} ). Define D {\displaystyle D} to be a diagonal matrix with D i i = ∑ j = 1 ℓ + u W i j {\displaystyle D_{ii}=\sum _{j=1}^{\ell +u}W_{ij}} and L {\displaystyle L} to be the Laplacian matrix D − W {\displaystyle D-W} . Then, as the number of data points ℓ + u {\displaystyle \ell +u} increases, L {\displaystyle L} converges to the Laplace–Beltrami operator Δ M {\displaystyle \Delta _{M}} , which is the divergence of the gradient ∇ M {\displaystyle \nabla _{M}} . Then, if f {\displaystyle \mathbf {f} } is a vector of the values of f {\displaystyle f} at the data, f = [ f ( x 1 ) , … , f ( x l + u ) ] T {\displaystyle \mathbf {f} =[f(x_{1}),\ldots ,f(x_{l+u})]^{\mathrm {T} }} , the intrinsic norm can be estimated: ‖ f ‖ I 2 = 1 ( ℓ + u ) 2 f T L f {\displaystyle \left\|f\right\|_{I}^{2}={\frac {1}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As the number of data points ℓ + u {\displaystyle \ell +u} increases, this empirical definition of ‖ f ‖ I 2 {\displaystyle \left\|f\right\|_{I}^{2}} converges to the definition when P X {\displaystyle {\mathcal {P}}_{X}} is known. === Solving the regularization problem with graph-based approach === Using the weights γ A {\displaystyle \gamma _{A}} and γ I {\displaystyle \gamma _{I}} for the ambient and intrinsic regularizers, the final expression to be solved becomes: arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ A ‖ f ‖ K 2 + γ I ( ℓ + u ) 2 f T L f {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma _{A}\left\|f\right\|_{K}^{2}+{\frac {\gamma _{I}}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As with other kernel methods, H {\displaystyle {\mathcal {H}}} may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} , the optimal solution f ∗ {\displaystyle f^{}} must be a linear combination of the kernel centered at each of the input points: for some weights α i {\displaystyle \alpha _{i}} , f ∗ ( x ) = ∑ i = 1 ℓ + u α i K ( x i , x ) {\displaystyle f^{}(x)=\sum _{i=1}^{\ell +u}\alpha _{i}K(x_{i},x)} Using this result, it is possible to search for the optimal solution f ∗ {\displaystyle f^{}} by searching the finite-dimensional space defined by the possible choices of α i {\displaystyle \alpha _{i}} . === Functional approach of the Laplacian norm === The idea beyond the graph-Laplacian is to use neighbors to estimate the Laplacian. This method is akin to local averaging methods, that are known to scale poorly in high-dimensional problems. Indeed, the graph Laplacian is known to suffer from the curse of dimensionality. Luckily, it is possible to leverage expected smoothness of the function to estimate thanks to more advanced functional analysis. This method consists of estimating the Laplacian operator using derivatives of the kernel reading ∂ 1 , j K ( x i , x ) {\displaystyle \partial _{1,j}K(x_{i},x)} where ∂ 1 , j {\displaystyle \partial _{1,j}} denotes the partial derivatives according to the j-th coordinate of the first variable. This second approach to the Laplacian norm is to put in relation with meshfree methods, that contrast with the finite difference method in PDE. == Applications == Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function V {\displaystyle V} and hypothesis space H {\displaystyle {\mathcal {H}}} . Two commonly used examples are the families of support vector machines and regularized least squares algorithm

    Read more →
  • Word error rate

    Word error rate

    Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system. The WER metric typically ranges from 0 to 1, where 0 indicates that the compared pieces of text are exactly identical, and 1 (or larger) indicates that they are completely different with no similarity. This way, a WER of 0.8 means that there is an 80% error rate for compared sentences. The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort. This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity and word error rate. Word error rate can then be computed as: W E R = S + D + I N = S + D + I S + D + C {\displaystyle {\mathit {WER}}={\frac {S+D+I}{N}}={\frac {S+D+I}{S+D+C}}} where S is the number of substitutions, D is the number of deletions, I is the number of insertions, C is the number of correct words, N is the number of words in the reference (N=S+D+C) The intuition behind 'deletion' and 'insertion' is how to get from the reference to the hypothesis. So if we have the reference "This is wikipedia" and hypothesis "This _ wikipedia", we call it a deletion. Note that since N is the number of words in the reference, the word error rate can be larger than 1.0, namely if the number of insertions I is larger than the number of correct words C. When reporting the performance of a speech recognition system, sometimes word accuracy (WAcc) is used instead: W A c c = 1 − W E R = N − S − D − I N = C − I N {\displaystyle {\mathit {WAcc}}=1-{\mathit {WER}}={\frac {N-S-D-I}{N}}={\frac {C-I}{N}}} Since the WER can be larger than 1.0, the word accuracy can be smaller than 0.0. == Experiments == It is commonly believed that a lower word error rate shows superior accuracy in recognition of speech, compared with a higher word error rate. However, at least one study has shown that this may not be true. In a Microsoft Research experiment, it was shown that, if people were trained under "that matches the optimization objective for understanding", (Wang, Acero and Chelba, 2003) they would show a higher accuracy in understanding of language than other people who demonstrated a lower word error rate, showing that true understanding of spoken language relies on more than just high word recognition accuracy. == Other metrics == One problem with using a generic formula such as the one above, however, is that no account is taken of the effect that different types of error may have on the likelihood of successful outcome, e.g. some errors may be more disruptive than others and some may be corrected more easily than others. These factors are likely to be specific to the syntax being tested. A further problem is that, even with the best alignment, the formula cannot distinguish a substitution error from a combined deletion plus insertion error. Hunt (1990) has proposed the use of a weighted measure of performance accuracy where errors of substitution are weighted at unity but errors of deletion and insertion are both weighted only at 0.5, thus: W E R = S + 0.5 D + 0.5 I N {\displaystyle {\mathit {WER}}={\frac {S+0.5D+0.5I}{N}}} There is some debate, however, as to whether Hunt's formula may properly be used to assess the performance of a single system, as it was developed as a means of comparing more fairly competing candidate systems. A further complication is added by whether a given syntax allows for error correction and, if it does, how easy that process is for the user. There is thus some merit to the argument that performance metrics should be developed to suit the particular system being measured. Whichever metric is used, however, one major theoretical problem in assessing the performance of a system is deciding whether a word has been “mis-pronounced,” i.e. does the fault lie with the user or with the recogniser. This may be particularly relevant in a system which is designed to cope with non-native speakers of a given language or with strong regional accents. The pace at which words should be spoken during the measurement process is also a source of variability between subjects, as is the need for subjects to rest or take a breath. All such factors may need to be controlled in some way. For text dictation it is generally agreed that performance accuracy at a rate below 95% is not acceptable, but this again may be syntax and/or domain specific, e.g. whether there is time pressure on users to complete the task, whether there are alternative methods of completion, and so on. The term "Single Word Error Rate" is sometimes referred to as the percentage of incorrect recognitions for each different word in the system vocabulary. == Edit distance == The word error rate may also be referred to as the length normalized edit distance. The normalized edit distance between X and Y, d( X, Y ) is defined as the minimum of W( P ) / L ( P ), where P is an editing path between X and Y, W ( P ) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P).

    Read more →
  • Normalization (image processing)

    Normalization (image processing)

    In image processing, normalization is a process that changes the range of pixel intensity values, a kind of intensity mapping. Applications include photographs with poor contrast due to glare, for example. A typical case is contrast stretching. In more general fields of data processing, such as digital signal processing, it is referred to as dynamic range expansion. The purpose of dynamic range expansion in the various applications is usually to bring the image, or other type of signal, into a range that is more familiar or normal to the senses, hence the term normalization. Often, the motivation is to achieve consistency in dynamic range for a set of data, signals, or images to avoid mental distraction or fatigue. For example, a newspaper will strive to make all of the images in an issue share a similar range of grayscale. Auto-normalization in image processing software typically normalizes to the full dynamic range of the number system specified in the image file format. == Definition == Normalization transforms an n-dimensional grayscale image I : { X ⊆ R n } → { Min , . . , Max } {\displaystyle I:\{\mathbb {X} \subseteq \mathbb {R} ^{n}\}\rightarrow \{{\text{Min}},..,{\text{Max}}\}} with intensity values in the range ( Min , Max ) {\displaystyle ({\text{Min}},{\text{Max}})} , into a new image I N : { X ⊆ R n } → { newMin , . . , newMax } {\displaystyle I_{N}:\{\mathbb {X} \subseteq \mathbb {R} ^{n}\}\rightarrow \{{\text{newMin}},..,{\text{newMax}}\}} with intensity values in the range ( newMin , newMax ) {\displaystyle ({\text{newMin}},{\text{newMax}})} . The linear normalization of a grayscale digital image is performed according to the formula I N = ( I − Min ) newMax − newMin Max − Min + newMin {\displaystyle I_{N}=(I-{\text{Min}}){\frac {{\text{newMax}}-{\text{newMin}}}{{\text{Max}}-{\text{Min}}}}+{\text{newMin}}} For example, if the intensity range of the image is 50 to 180 and the desired range is 0 to 255 the process entails subtracting 50 from each of pixel intensity, making the range 0 to 130. Then each pixel intensity is multiplied by 255/130, making the range 0 to 255. Normalization might also be non-linear, as the relationship between I {\displaystyle I} and I N {\displaystyle I_{N}} may not be linear. An example of non-linear normalization is when the normalization follows a sigmoid function, in which case the normalized image is computed according to the formula I N = ( newMax − newMin ) 1 1 + e − I − β α + newMin {\displaystyle I_{N}=({\text{newMax}}-{\text{newMin}}){\frac {1}{1+e^{-{\frac {I-\beta }{\alpha }}}}}+{\text{newMin}}} Where α {\displaystyle \alpha } defines the width of the input intensity range, and β {\displaystyle \beta } defines the intensity around which the range is centered. Gamma correction (log/inverse log) is also a common transformation function. === Colorspace === Intensity operations generally operate on a colorspace that maps to the human perception of lightness without intentionally changing the other properties. This can be done, for example, by operating on the L component of the CIELAB color space, or approximately by operating on the Y component of YCbCr. It is also possible to operate on each of the RGB color channels, though the result will not always make sense. == Contrast stretching == This is the most significant and essential technique of spatial-based image enhancement. The basic intent of this contrast enhancement technique is to adjust the local contrast in the image so as to bring out the clear regions or objects in the image. Low-contrast images often result from poor or non-uniform lighting conditions, a limited dynamic range of the imaging sensor, or improper settings of the lens aperture. This operation tries to change the intensity of the pixel in the image, particularly in the input image, to obtain an enhanced image. It is based on the number of techniques, namely local, global, dark and bright levels of contrast. The contrast enhancement is considered as the amount of color or gray differentiation that lies among the different features in an image. The contrast enhancement improves the quality of image by increasing the luminance difference between the foreground and background. A contrast stretching transformation can be achieved by: Stretching the dark range of input values into a wider range of output values: This involves increasing the brightness of the darker areas in the image to enhance details and improve visibility. Shifting the mid-range of input values: This involves adjusting the brightness levels of the mid-tones in the image to improve overall contrast and clarity. Compressing the bright range of input values: This process involves reducing the brightness of the brighter areas in the image to prevent overexposure resulting in a more balanced and visually appealing image. It can be described as the following piecewise funciton: I N = { s 1 r 1 I if I < r 1 s 2 − s 1 r 1 − r 2 ( I − r 1 ) if r 1 ≤ I ≤ r 2 1 − s 2 1 − r 2 ( I − r 2 ) if I > r 2 {\displaystyle I_{N}={\begin{cases}{\frac {s_{1}}{r_{1}}}I&{\text{if }}Ir_{2}\end{cases}}} Where: ( r 1 , s 1 ) {\displaystyle (r_{1},s_{1})} defines the transition point between the "dark" range to the "main" range. ( r 2 , s 2 ) {\displaystyle (r_{2},s_{2})} defines the transition point between the "main" range to the "bright" range. A typical linear stretch is obtained when ( r 1 , s 1 ) = ( r min , 0 ) {\displaystyle (r_{1},s_{1})=(r_{\text{min}},0)} and ( r 2 , s 2 ) = ( r max , 1 ) {\displaystyle (r_{2},s_{2})=(r_{\text{max}},1)} , where r min {\displaystyle r_{\text{min}}} and r max {\displaystyle r_{\text{max}}} denote the minimum and maximum levels in the source image. === Global contrast stretching === Global Contrast Stretching considers all color palate ranges at once to determine the maximum and minimum values for the entire RGB color image. This approach utilizes the combination of RGB colors to derive a single maximum and minimum value for contrast stretching across the entire image. === Local contrast stretching === Local contrast stretching (LCS) is an image enhancement method that focuses on locally adjusting each pixel's value to improve the visualization of structures within an image, particularly in both the darkest and lightest portions. It operates by utilizing sliding windows, known as kernels, which traverse the image. The central pixel within each kernel is adjusted using the following formula: I p ( x , y ) = 255 × [ I 0 ( x , y ) − m i n ] ( m a x − m i n ) {\displaystyle I_{p}(x,y)=255\times {\frac {[I_{0}(x,y)-min]}{(max-min)}}} Where: Ip(x,y) is the color level for the output pixel (x,y) after the contrast stretching process. I0(x,y) is the color level input for data pixel (x, y). max is the maximum value for color level in the input image within the selected kernel. min is the minimum value for color level in the input image within the selected kernel. A piecewise form (see above) may also be used. LCS can be applied to the three color channels of an image separately.

    Read more →