Cost-sensitive machine learning

Cost-sensitive machine learning is an approach within machine learning that takes into account the different costs associated with different types of errors. Unlike traditional approaches, it introduces a cost matrix that explicitly specifies the penalty or benefit of each combination of actual and predicted class. The underlying difficulty that cost-sensitive machine learning addresses is that minimizing different kinds of classification errors is a multi-objective optimization problem. == Overview == Cost-sensitive machine learning optimizes models according to the specific consequences of misclassifications, which makes it valuable in a variety of applications. It is especially useful in problems with a high imbalance in class distribution and a high imbalance in associated costs. Cost-sensitive machine learning introduces a scalar cost function in order to find one of the (possibly many) Pareto-optimal points of this multi-objective optimization problem, similar to the weighted sum model. == Cost Matrix == The cost matrix is a central element of cost-sensitive modeling, explicitly defining the costs or benefits associated with different prediction outcomes in classification tasks. Represented as a table, the matrix aligns true and predicted classes, assigning a cost value to each combination. For instance, in binary classification it may assign different costs to false positives and false negatives. The cost matrix is used to calculate the expected cost, or loss, of a classifier.
The expected loss is a double summation over the joint probabilities of actual and predicted classes:

$$\text{Expected Loss} = \sum_{i}\sum_{j} P(\text{Actual}_i, \text{Predicted}_j) \cdot \text{Cost}_{\text{Actual}_i,\, \text{Predicted}_j}$$

Here, $P(\text{Actual}_i, \text{Predicted}_j)$ denotes the joint probability that the actual class is $i$ and the predicted class is $j$, so each entry of the cost matrix is weighted by how often the corresponding outcome occurs. This allows practitioners to tune models according to the specific consequences of misclassifications in scenarios where the impact of prediction errors varies across classes. == Applications == === Fraud Detection === In finance, cost-sensitive machine learning is applied to fraud detection. By assigning different costs to false positives and false negatives, models can be tuned to minimize the overall financial impact of misclassifications. === Medical Diagnostics === In healthcare, cost-sensitive machine learning is used in medical diagnostics. The approach allows models to be customized according to the potential harm associated with different kinds of misdiagnosis, making their application more patient-centric. == Challenges == A typical challenge in cost-sensitive machine learning is reliably determining the cost matrix, which may also change over time. == Literature == Cost-Sensitive Machine Learning (2011). CRC Press. ISBN 9781439839287. Abhishek, K.; Abdelaziz, D. M. (2023). Machine Learning for Imbalanced Data: Tackle Imbalanced Datasets Using Machine Learning and Deep Learning Techniques. Packt Publishing. ISBN 9781801070881.
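The expected-loss formula can be made concrete with a minimal Python sketch. It assumes that rows index the actual class and columns the predicted class, in both the joint-probability table and the cost matrix. It also shows one common way a cost matrix is used at prediction time: picking the class with the lowest expected cost given a model's estimated class probabilities. The cost values and confusion counts are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def joint_from_confusion(counts: Matrix) -> List[List[float]]:
    """Convert a confusion matrix of counts (rows = actual, columns = predicted)
    into joint probabilities P(Actual_i, Predicted_j)."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def expected_loss(joint: Matrix, cost: Matrix) -> float:
    """Double sum over i, j of P(Actual_i, Predicted_j) * Cost[i][j]."""
    if abs(sum(sum(row) for row in joint) - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    return sum(
        p * c
        for p_row, c_row in zip(joint, cost)
        for p, c in zip(p_row, c_row)
    )


def min_cost_prediction(posterior: Sequence[float], cost: Matrix) -> int:
    """Return the predicted class j with the lowest expected cost
    sum_i P(Actual_i | x) * Cost[i][j], given class probabilities for one input."""
    n_pred = len(cost[0])
    risks = [
        sum(posterior[i] * cost[i][j] for i in range(len(posterior)))
        for j in range(n_pred)
    ]
    return min(range(n_pred), key=lambda j: risks[j])


# Hypothetical fraud example: class 0 = legitimate, class 1 = fraud.
# A false positive (reviewing a legitimate transaction) costs 1 unit,
# a false negative (missing a fraud) costs 50 units; correct predictions cost 0.
COST = [[0.0, 1.0],
        [50.0, 0.0]]
CONFUSION = [[900, 50],   # actual legitimate: 900 passed, 50 flagged
             [10, 40]]    # actual fraud: 10 missed, 40 flagged

# 0.05 * 1 + 0.01 * 50 = 0.55 expected cost per transaction
loss = expected_loss(joint_from_confusion(CONFUSION), COST)
```

With these costs, `min_cost_prediction([0.97, 0.03], COST)` returns 1. Flagging has an expected cost of 0.97, while letting the transaction through costs 0.03 × 50 = 1.5, so even a 3% fraud probability justifies flagging. For a posterior of `[0.99, 0.01]` the expected costs are 0.5 (pass) and 0.99 (flag), so the transaction is passed.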

Excalidraw

Excalidraw is an open-source, web-based virtual whiteboard and diagramming application. It is used to create diagrams, wireframes, and sketches within a web browser without requiring account registration. The software features a characteristic hand-drawn visual style and supports real-time multi-user collaboration using client-side end-to-end encryption. Excalidraw is released under the MIT License and is maintained by Excalidraw s.r.o., a company based in Brno, Czech Republic. == History == Excalidraw was created on 1 January 2020 by Christopher Chedeau, then a software engineer at Facebook (now Meta Platforms). Chedeau, who had previously co-created React Native and Prettier, initially developed the application as a personal project and registered its domain on 3 January 2020. Within its first months, the project attracted open-source contributors who helped expand its features and rewrite the codebase in TypeScript and React. By early 2021, day-to-day operations had passed to Czech developers David Luzar and Milos Vetesnik. In May 2021, the team incorporated Excalidraw s.r.o. in Brno and launched a commercial cloud-based version named Excalidraw+ to fund development of the open-source project. By May 2026, the main open-source repository on GitHub had accumulated over 123,000 stars. == Features and architecture == The application provides an infinite canvas for geometric shapes, lines, arrows, text, and freehand drawing. Its visual presentation relies on Rough.js, a JavaScript graphics library that alters standard vector paths to mimic irregular, hand-drawn lines. Excalidraw operates as a progressive web application (PWA), allowing local installation and offline use, and saves data to the browser's local storage. Files are stored in a native JSON-based format with the .excalidraw extension, and canvases can be exported to PNG or SVG. Real-time collaboration sessions are relayed through a server using Socket.IO.
Data transmission uses the browser's native Web Cryptography API to achieve end-to-end encryption. A symmetric AES key is generated on the client side and appended to the sharing URL as a fragment identifier (the part after the # character). Because web browsers do not transmit URL fragments to HTTP servers, the server never receives the key and cannot read the data it relays. == Ecosystem == Excalidraw is distributed as an npm package, allowing third-party developers to embed the whiteboard component directly into their own React web applications. Community-developed extensions integrate the application's file format into text editors and note-taking applications, including Visual Studio Code and Obsidian. Excalidraw is also integrated into commercial platforms such as Notion and HackerRank. == Reception == Google's developer relations team published a technical case study on Excalidraw as a reference implementation for progressive web apps. The analysis highlighted the software's adoption of advanced web platform capabilities, in particular its use of the File System Access API and the native Clipboard API to replicate desktop software behavior within a web browser.
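The key-in-the-fragment scheme described under Features and architecture can be illustrated with a short Python sketch. This is not Excalidraw's code. The `#room=<id>,<key>` link layout, the 128-bit key size, and the base64url encoding are assumptions made for illustration. The actual AES encryption, which happens in the browser through the Web Cryptography API, is left out because Python's standard library has no AES implementation. What the sketch does show is the property the scheme depends on: the request target an HTTP client sends to the server contains only the path and query, never the fragment.

```python
import base64
import secrets
from urllib.parse import urlsplit


def new_room_key(num_bytes: int = 16) -> str:
    """Random 128-bit key, base64url-encoded without padding (assumed encoding)."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def share_link(base_url: str, room_id: str, key: str) -> str:
    """Hypothetical link layout: the room id and the key both live after the '#'."""
    return f"{base_url}#room={room_id},{key}"


def http_request_target(url: str) -> str:
    """What an HTTP client puts in its request line: path plus query, never the fragment."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def key_from_link(url: str) -> str:
    """Client side only: read the key back out of the fragment."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith("room="):
        raise ValueError("not a collaboration link")
    _room_id, key = fragment[len("room="):].split(",", 1)
    return key


key = new_room_key()
link = share_link("https://example.com/", "abc123", key)
# The server only ever sees "/" for this link; the key stays in the browser.
sent_to_server = http_request_target(link)
```

Collaborators who open the link recover the key with `key_from_link` locally, while the relay server only handles ciphertext addressed by room id.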

DiscoVision

DiscoVision is a name associated with several things related to the LaserDisc video format. It was the original name of the "Reflective Optical Videodisc System" format, later known as "LaserVision" or LaserDisc. == Description == MCA DiscoVision, Inc. was a division of the entertainment company MCA (Music Corporation of America), established in 1969 to develop and sell an optical videodisc system. MCA released discs, pressed in Carson and Costa Mesa, California, on the DiscoVision label from the format's launch in Atlanta, Georgia, in 1978 until 1982, ending with the release of the film The Four Seasons. DiscoVision titles included films from Universal Pictures, Paramount Pictures, and Warner Bros. Pictures, as well as Disney content. Agreements were made with Columbia Pictures and United Artists, though no discs from either studio were released on the DiscoVision label. Most of these companies later established their own labels for the format, the first being Paramount, which released a dozen films on the Paramount Home Video label in the summer of 1981. The successor to MCA DiscoVision, DiscoVision Associates (DVA), was the result of a partnership between IBM and MCA. It was hoped that the partnership would improve the quality of DiscoVision pressings, but no appreciable improvement ever materialized. In 1981, responsibility for the laser videodisc was sold to Pioneer Electronic Corporation. MCA DiscoVision had already formed a partnership with Pioneer in 1977, Universal Pioneer, to produce the Pioneer PR-7820 player (the first industrial DiscoVision player model, from 1978) and to establish disc pressing plants in Japan. As part of the partnership, Pioneer, in association with MCA, operated a disc replication facility in Kofu, Japan. Some of the last DiscoVision label discs were manufactured by Pioneer in Japan.
In the same year, MCA discontinued its DiscoVision branding because of the sale of the technology to Pioneer (which then rebranded the format as LaserDisc), and rebranded its laserdisc releases, now manufactured by Pioneer, under the MCA Videodisc banner; this was subsequently changed to the "MCA Home Video" name for both its VHS and videodisc releases. Some of DiscoVision's technical staff went on to form MCA Video Games, in an effort to produce video game cartridges. DiscoVision Associates later evolved into a patent holding company that manages and licenses intellectual property related to LaserDisc, Compact Disc, and other optical disc technologies, as well as non-disc fields. In 1989, Pioneer acquired DiscoVision Associates, which continues to license its technologies independently. As the patents in its portfolio expired, DiscoVision became less visible. However, its success as a patent holding company encouraged other companies to generate royalty income from their own patent portfolios.

Digital intermediate

Digital intermediate (DI) is a motion picture finishing process which classically involves digitizing a motion picture and manipulating the color and other image characteristics. == Definition and overview == A digital intermediate often replaces or augments the photochemical timing process and is usually the final creative adjustment to a movie before distribution in theaters. It is distinguished from the telecine process, in which film is scanned and color is manipulated early in the process to facilitate editing. However, the line between telecine and DI is increasingly blurred, and both are often performed on the same hardware by colorists with similar backgrounds. These two steps are typically part of the overall color management process of a motion picture, at different points in time. A digital intermediate is also customarily done at higher resolution and with greater color fidelity than telecine transfers. Although originally used to describe a process that started with film scanning and ended with film recording, the term digital intermediate is also used to describe color correction, color grading, and even final mastering when a digital camera is used as the image source and/or when the final movie is not output to film. This is due to advances in digital cinematography and digital projection technologies that strive to match film origination and film projection. In traditional photochemical film finishing, an intermediate is produced by exposing film to the original camera negative. The intermediate is then used to mass-produce the prints that are distributed to theaters. Color grading is done by varying the amount of red, green, and blue light used to expose the intermediate. The digital intermediate process uses digital tools to color grade, which allows much finer control of individual colors and areas of the image, and allows adjustment of image structure (grain, sharpness, etc.).
The intermediate for film reproduction can then be produced by means of a film recorder. The physical intermediate film that results from the recording process is sometimes also called a digital intermediate, and is usually recorded to internegative (IN) stock, which is inherently finer-grained than original camera negative (OCN). One of the key technical achievements that made the transition to DI possible was the use of 3D look-up tables, which could mimic how the digital image would look once printed onto release print stock. This removed a large amount of guesswork from the filmmaking process and allowed greater freedom in color grading while reducing risk. The digital master is often used as a source for a DCI-compliant distribution of the motion picture for digital projection. For archival purposes, the digital master created during the digital intermediate process can be recorded to very stable, high-dynamic-range yellow-cyan-magenta (YCM) separations on black-and-white film with an expected life of 100 years or more. While still subject to the natural degradation of any analog chemical master, this archival format, long used in the industry before the invention of DI, was considered valuable because it provides an archival medium independent of changes in digital data recording technologies and file formats that might otherwise render digitally archived material unreadable in the long term. A "film intermediate" is an analog variation of a digital intermediate, in which a project shot on digital video is printed onto film stock and transferred back to digital video to emulate film. The term was coined after the process was used on the Oscar-winning 2012 short film "Curfew". The process was also used on the films Dune (2021) and The Batman (2022).
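A 3D look-up table stores an output color for each point of a regular lattice over the RGB cube, and colors that fall between lattice points are interpolated. The Python sketch below uses trilinear interpolation; tetrahedral interpolation is another common choice. `toy_print_look` is a hypothetical stand-in for a transform that in practice would be derived from measurements of a particular print stock. The lattice size of 17 points per axis is an assumed value.

```python
from itertools import product
from typing import Callable, List, Tuple

RGB = Tuple[float, float, float]
Lut = List[List[List[RGB]]]


def build_lut(size: int, transform: Callable[[RGB], RGB]) -> Lut:
    """Sample `transform` on a size x size x size lattice over [0, 1]^3."""
    step = 1.0 / (size - 1)
    return [[[transform((r * step, g * step, b * step)) for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def apply_lut(lut: Lut, rgb: RGB) -> RGB:
    """Look up `rgb`, trilinearly interpolating between the 8 surrounding lattice points."""
    size = len(lut)
    # Position of the (clamped) input color in lattice coordinates.
    coords = [min(max(c, 0.0), 1.0) * (size - 1) for c in rgb]
    base = [min(int(c), size - 2) for c in coords]  # lower corner of the enclosing cell
    frac = [c - i for c, i in zip(coords, base)]    # offset within the cell, 0..1
    out = [0.0, 0.0, 0.0]
    for dr, dg, db in product((0, 1), repeat=3):
        weight = ((frac[0] if dr else 1.0 - frac[0])
                  * (frac[1] if dg else 1.0 - frac[1])
                  * (frac[2] if db else 1.0 - frac[2]))
        corner = lut[base[0] + dr][base[1] + dg][base[2] + db]
        for k in range(3):
            out[k] += weight * corner[k]
    return (out[0], out[1], out[2])


def toy_print_look(rgb: RGB) -> RGB:
    """Hypothetical 'print emulation': a contrast curve plus a slight warm shift."""
    r, g, b = (c ** 1.5 for c in rgb)
    return (min(r * 1.05, 1.0), g, b * 0.95)


PRINT_LUT = build_lut(17, toy_print_look)
preview = apply_lut(PRINT_LUT, (0.3, 0.5, 0.7))
```

In a grading system the table would be applied to every pixel of the displayed image, giving the colorist a preview of the printed result. Only the per-color lookup is shown here.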
== History == Telecine tools to electronically capture film images are nearly as old as broadcast television, but the resulting images were widely considered unsuitable for exposing back onto film for theatrical distribution. Film scanners and recorders with quality sufficient to produce images that could be intercut with regular film began appearing in the 1970s, with significant improvements in the late 1980s and early 1990s. During this time, digitally processing an entire feature-length film was impractical because the scanners and recorders were extremely slow and the image files were too large relative to the computing power available. Instead, individual shots or short sequences were processed for visual effects. In 1992, visual effects supervisor and producer Chris F. Woods broke through several "techno-barriers" in creating a digital studio to produce the visual effects for the 1993 release Super Mario Bros. It was the first feature film project to digitally scan a large number of VFX plates (over 700) at 2K resolution. It was also the first film scanned and recorded at Kodak's newly launched Cinesite facility in Hollywood. The project-based studio was the first to use Discreet Logic's (now Autodesk) Flame and Inferno systems on a feature film; these systems enjoyed early dominance as high-resolution, high-performance digital compositing systems. Digital film compositing for visual effects was immediately embraced, while optical printer use for VFX declined just as quickly. Chris Watts further revolutionized the process on the 1998 feature film Pleasantville, becoming the first visual effects supervisor for New Line Cinema to scan, process, and record the majority of a feature-length, live-action Hollywood film digitally. The first Hollywood film to use a digital intermediate process from beginning to end was O Brother, Where Art Thou? in 2000; in Europe the first was Chicken Run, released the same year. The process rapidly caught on in the mid-2000s.
Around 50% of Hollywood films went through a digital intermediate in 2005, increasing to around 70% by mid-2007. This is due not only to the extra creative options the process affords filmmakers but also to the need for high-quality scanning and color adjustments to produce movies for digital cinema. == Milestones ==
1990: The Rescuers Down Under – First feature-length film to be entirely recorded to film from digital files; in this case, animation assembled on computers using Walt Disney Feature Animation and Pixar's CAPS system.
1992: Visual effects supervisor and producer Chris F. Woods creates a VFX studio to produce the visual effects for the 1993 film Super Mario Bros. It was the first 35mm feature film to digitally scan a large number of VFX plates (over 700) at 2K resolution, as well as to output the finished VFX to 35mm negative at 2K.
1993: Snow White and the Seven Dwarfs – First film to be entirely scanned to digital files, manipulated, and recorded back to film at 4K resolution. The restoration project was done entirely at 4K resolution and 10-bit color depth, using the Cineon system to digitally remove dirt and scratches and restore faded colors.
1998: Pleasantville – The first time the majority of a new feature film was scanned, processed, and recorded digitally. The movie's black-and-white-meets-color world was filmed entirely in color and selectively desaturated and contrast-adjusted digitally. The work was done in Los Angeles by Cinesite, using a Spirit DataCine for scanning at 2K resolution and a MegaDef color correction system from the UK company Pandora International.
1998: Zingo – The first feature film to use digital color correction via digital intermediate in its entirety. The work was performed at the Digital Film Lab in Copenhagen, using a Spirit DataCine to transfer the entire film to digital files at 2K resolution. The digital intermediate process was also used to perform a digital blowup of the film's original Super 16 source format to a 35mm output.
1999: At Pacific Ocean Post Film, a team led by John McCunn and Greg Kimble used Kodak film scanners and a laser film printer, Cineon software, and proprietary tools to rebuild and repair the first two reels of the 1968 Beatles film Yellow Submarine for re-release.
1999: Star Wars: Episode I – The Phantom Menace – Industrial Light & Magic (ILM) scanned the entirety of the visual effects-laden film for digital enhancement and the integration of thousands of separately filmed elements with computer-generated characters and environments. Apart from the approximately 2,000 effects shots that were digitally manipulated, the remaining 170 non-effects shots were also scanned for continuity. However, after the digital shots were manipulated at ILM, they were filmed out individually and sent to Deluxe Labs, where they were processed and color timed photochemically.
2000: Sorted – The first feature-length, color 35mm motion picture to use the digital intermediate process in its entirety, from inception to completion. The film was produced at Wave Pictures' digital intermediate film facility in London, England. It was scanned at 2K resolution with 8 bits of color depth per color, per pixel, using a pin-registered, liquid-gate Oxberry

Friending and following

Friending is the act of adding someone to a list of "friends" on a social networking service. The notion does not necessarily involve the concept of friendship. It is also distinct from the idea of a "fan", as used on the websites of businesses, bands, artists, and others, because it is more than a one-way relationship: a fan only receives content, whereas a friend can also communicate back with the person who friended them. Friending someone usually grants that person special privileges on the service with respect to oneself. On Facebook, for example, one's friends have the privilege of viewing and posting to one's timeline. Following is a similar concept on other social networking services, such as Twitter and Instagram, where a person (the follower) chooses to add content from another person or page to their news feed. Unlike friending, following is not necessarily mutual, and a person can unfollow (stop following) or block another user at any time without affecting that user's following status. The first scholarly definition and examination of friending and defriending (removing someone from one's friend list, also called unfriending) was "Hyperfriendship and beyond: Friends and Social Norms on LiveJournal" by David Fono and Kate Raynes-Goldie (2005), which identified the use of the term as both a noun and a verb by users of the early social networking site and blogging platform LiveJournal, launched in 1999. == Friend/follower count, friend collecting, and multiple accounts == Adding people to a friend list without regard to whether one is actually their friend is sometimes known as friend whoring. Matt Jones of Dopplr went so far as to coin the expression "friending considered harmful" to describe the problem of focusing on friending ever more people at the expense of actually making use of a social network.
Friend collecting is the adding of hundreds or thousands of friends or followers, a not uncommon order of magnitude on some social sites. Faced with such large audiences, many teen users feel pressured to heavily curate their posts, sharing only carefully posed and edited photographs with well-thought-out captions. Some Instagram users create a second account, known as a Finsta (short for "fake Instagram"). A Finsta is typically private, and the owner allows only close friends to follow it. Because the follower count is kept low, the posts can be more candid and silly. Users may also create multiple accounts based on their interests; for example, someone with a personal social media account who is also a photographer might maintain a separate account for photography. Managing a large network also carries risks: scholars suggest that it can contribute to social anxiety, as users may feel jealous or experience a "fear of missing out". == Unfriending and unfollowing == Unfriending is the act of removing someone from a friends list. On Facebook, the action is unilateral: either person can take it, and it ends the friendship on both sides. Unfriending is often used when one user has been flirting and made the other uncomfortable. Unfollowing works differently. When a user unfollows someone on Instagram or Twitter, any follow in the opposite direction is unaffected, so a one-sided relationship can continue. Often, the unfollowed user does not realize they were unfollowed and continues to follow the other person. == Social network friending and friendship == There are distinct groups of "friends" that one can friend on a social networking service. The notion of a social network friend does not necessarily embody the concept of friendship. Although terminology has not yet evolved to distinguish the different types of social networking friends, they can be broken into the following three categories.
Friends who are actually known: people who may be one's friends or family in real life, with whom one has regular interaction either online or offline.
Organizational friends: companies and other organizations that maintain a "friending" relationship as a contacts list.
Complete strangers: social networking "friends" with whom one has no relationship at all.
Within these categories, "friends" can be made up of strong ties, weak existing ties, weak latent ties, and parasocial ties. Strong ties include close family members and friends, with whom self-disclosure, intimacy, and frequent contact occur. Weak existing ties include acquaintances, co-workers, and distant relatives with whom the user has inconsistent contact. Weak latent ties include people in a similar geographical location or profession who could serve as a future bridge to other connections. Parasocial ties include celebrities, public figures, and media personas. People tend to reciprocate friending, marking as a friend someone who has marked them as a friend; this is a social norm on social networking services. However, it blurs the distinction between who is an actual friend and who is merely a contact, and tagging someone as a "contact" after they have marked one as a "friend" can be perceived as impolite. Other concerns about this issue are discussed in Sherry Turkle's Alone Together, which analyzes many behavioral dynamics of social media friendships. Turkle describes herself as "cautiously optimistic", but expresses concern that remote communication may undermine genuine face-to-face conversation, lowering people's expectations of one another. One social networking service, FriendFeed, allowed users to friend someone as a "fake" friend: the person "fake" friended received the usual friending notifications, but their updates were not received by the person who friended them.
Gavin Bell, author of Building Social Web Applications, describes this mechanism as "ludicrous". A 2007 survey by the Center for the Digital Future found that only 23% of internet users had at least one virtual friend whom they had met only online. One might expect the number of virtual friends to be directly proportional to Internet use; the same survey found that 20% of heavy users (more than 3 hours per day), who claimed an average of 8.7% online friends, reported at least one relationship that started online and moved to in-person contact. These results and related issues are discussed in the book Networked: The New Social Operating System, co-written by Lee Rainie and Barry Wellman (2012). == Ethical considerations == The act of "friending" someone on a social networking service has particular ethical implications for judges in the United States. The codes of judicial conduct of the various states generally include some form of provision that judges should avoid even the appearance of impropriety. Whether this restricts or even prohibits judges from "friending" attorneys who appear before them, or law enforcement personnel, has been analyzed by the judicial ethics panels of several states, which have not all given the same guidance: The New York State Judicial Ethics committee in 2009 simply advised judges to exercise caution, noting that "friending" someone on a social networking service is a publicly observable act that differs little from other public behavior that judges already have to consider. The Florida Judicial Ethics Advisory committee in 2009 noted that, since judges are normal human beings, it was unavoidable that they would form friendships outside their judicial responsibilities.
It prohibited judges from friending any attorneys who appear before them, while allowing them to friend attorneys who do not, on the grounds that such friending may give the general public the appearance (even if not the reality) that the friended attorneys hold special sway with the judge. A minority opinion of the committee asserted that there is a substantive difference between "friending" on a social networking service and actual friendship, and that the general public, being aware of the norms of social networking services, was capable of drawing this distinction and would not reasonably conclude either a special degree of influence or a violation of the code of judicial conduct. This minority opinion was outnumbered twice in 2009, both in the Judicial Ethics Advisory committee and in the Florida Supreme Court Judicial Ethics Advisory committee. The South Carolina judicial conduct committee in 2009 permitted judges to friend attorneys and law enforcement personnel, provided that no judicial business is conducted on or discussed via the social networking service. "... a judge should not become isolated from the community in which the judge lives," the committee stated. The Kentucky Judicial Ethics committee in 2010 took the same position as the minority opinion in Florida. It urged judges to exercise caution, but recognized that the act of friending "does not, in and of itself, indicate the degree or intensity of a judge's relationship with the person who is the 'friend'."

Bigram

A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n = 2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including computational linguistics, cryptography, and speech recognition. Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar). == Applications == Bigrams, along with other n-grams, are used in most successful language models for speech recognition. Bigram frequency attacks can be used in cryptography to solve cryptograms (see frequency analysis). Bigram frequency is one approach to statistical language identification. Some activities in logology or recreational linguistics involve bigrams. These include attempts to find English words beginning with every possible bigram, or words containing a string of repeated bigrams, such as logogogue. == Bigram frequency in the English language == The frequencies of the most common letter bigrams in a large English corpus are listed below, in descending order down each column:
th 3.56%   of 1.17%   io 0.83%
he 3.07%   ed 1.17%   le 0.83%
in 2.43%   is 1.13%   ve 0.83%
er 2.05%   it 1.12%   co 0.79%
an 1.99%   al 1.09%   me 0.79%
re 1.85%   ar 1.07%   de 0.76%
on 1.76%   st 1.05%   hi 0.76%
at 1.49%   to 1.05%   ri 0.73%
en 1.45%   nt 1.04%   ro 0.73%
nd 1.35%   ng 0.95%   ic 0.70%
ti 1.34%   se 0.93%   ne 0.69%
es 1.34%   ha 0.93%   ea 0.69%
or 1.28%   as 0.87%   ra 0.69%
te 1.20%   ou 0.87%   ce 0.65%
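The counting described above can be sketched in a few lines of Python. In this sketch, `bigrams` extracts adjacent pairs and `skip_bigrams` produces gappy bigrams with a chosen maximum gap. `letter_bigram_frequencies` computes letter-bigram percentages like those listed above; counting only within words is an assumption about how such tables are compiled. `bigram_probability` gives the maximum-likelihood estimate P(word | previous word) used in a simple bigram language model. The example corpus is a toy one, so its percentages will not match the English figures.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def bigrams(tokens: Sequence[str]) -> List[Pair]:
    """Adjacent pairs: the n-gram case n = 2 (works on word lists or on a string of letters)."""
    return list(zip(tokens, tokens[1:]))


def skip_bigrams(tokens: Sequence[str], max_gap: int) -> List[Pair]:
    """Gappy bigrams: ordered pairs with at most `max_gap` tokens between them.
    max_gap=0 reduces to ordinary bigrams."""
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


def letter_bigram_frequencies(text: str) -> Dict[str, float]:
    """Percentage share of each letter bigram, counted inside words only."""
    counts: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(a + b for a, b in bigrams(word))
    total = sum(counts.values())
    return {bg: 100.0 * n / total for bg, n in counts.most_common()}


def bigram_probability(tokens: Sequence[str], prev: str, word: str) -> float:
    """Maximum-likelihood estimate P(word | prev) = C(prev, word) / C(prev as a history)."""
    pair_counts = Counter(bigrams(tokens))
    history_counts = Counter(tokens[:-1])
    if history_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / history_counts[prev]
```

For example, with `words = "the cat sat on the mat".split()`, `bigram_probability(words, "the", "cat")` returns 0.5, because "the" is followed once by "cat" and once by "mat". `skip_bigrams(["a", "b", "c"], 1)` returns `[("a", "b"), ("a", "c"), ("b", "c")]`, and `letter_bigram_frequencies("the then")` gives 40.0 for "th" and "he" and 20.0 for "en".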

CodePen

CodePen is an online community for testing and showcasing user-created HTML, CSS and JavaScript code snippets. It functions as an online code editor and open-source learning environment, where developers can create code snippets, called "pens," and test them. It was founded in 2012 by full-stack developers Alex Vazquez and Tim Sabat and front-end designer Chris Coyier. Its employees work remotely, rarely all meeting together in person. CodePen is a large community for web designers and developers to showcase their coding skills, with an estimated 330,000 registered users and 14.16 million monthly visitors.