Software diagnosis

Software diagnosis

Software diagnosis (also: software diagnostics) refers to concepts, techniques, and tools that allow for obtaining findings, conclusions, and evaluations about software systems and their implementation, composition, behaviour, and evolution. It serves as means to monitor, steer, observe and optimize software development, software maintenance, and software re-engineering in the sense of a business intelligence approach specific to software systems. It is generally based on the automatic extraction, analysis, and visualization of corresponding information sources of the software system. It can also be manually done and not automatic. == Applications == Software diagnosis supports all branches of software engineering, in particular project management, quality management, risk management as well as implementation and test. Its main strength is to support all stakeholders of software projects (in particular during software maintenance and for software re-engineering tasks) and to provide effective communication means for software development projects. For example, software diagnosis facilitates "bridging an essential information gap between management and development, improve awareness, and serve as early risk detection instrument". Software diagnosis includes assessment methods for "perfective maintenance" that, for example, apply "visual analysis techniques to combine multiple indicators for low maintainability, including code complexity and entanglement with other parts of the system, and recent changes applied to the code". == Characteristics == In contrast to manifold approaches and techniques in software engineering, software diagnosis does not depend on programming languages, modeling techniques, software development processes or the specific techniques used in the various stages of the software development process. Instead, software diagnosis aims at analyzing and evaluating the software system in its as-is state and based on system-generated information to bypass any subjective or potentially outdated information sources (e.g., initial software models). For it, software diagnosis combines and relates sources of information that are typically not directly linked. Examples: Source-code metrics are related with software developer activity to gain insight into developer-specific effects on software code quality. System structure and run-time execution traces are correlated to facilitate program comprehension through dynamic analysis in software maintenance tasks. == Principles == The core principle of software diagnosis is to automatically extract information from all available information sources of a given software projects such as source code base, project repository, code metrics, execution traces, test results, etc. To combine information, software-specific data mining, analysis, and visualization techniques are applied. Its strength results, among various reasons, from integrating decoupled information spaces in the scope of a typical software project, for example development and developer activities (recorded by the repository) and code and quality metrics (derived by analyzing source code) or key performance indicators (KPIs). == Examples == Examples of software diagnosis tools include software maps and software metrics. == Critics == Software diagnosis—in contrast to many approaches in software engineering—does not assume that developer capabilities, development methods, programming or modeling languages are right or wrong (or better or worse compared to each other): Software diagnosis aims at giving insight into a given software system and its status regardless of the methods, languages, or models used to create and maintain the system. === Related subjects === Cost estimation in software engineering Programming productivity Rapid application development Software design Software development Software documentation Software map Software release life cycle Systems design Systems Development Life Cycle

Lingua Libre

Lingua Libre is an online collaborative project and tool by the Wikimédia France association, which aims to build a collaborative, multilingual, audiovisual speech corpus under a free license. It mostly consists of a rapid recording online service which allows the user to chain hundreds of recordings. Contributors have produced content in 310+ languages. == Description == Lingua Libre enables the recording of words, phrases or sentences of any language, oral (audio recording) or signed (video recording). Words are presented to the speaker in the form of a list, created on the spot, in advance, or by reusing an existing Wikimedia category. The speaker simply reads the word displayed on the screen, and the software moves on to the next word when it detects a silence after the read word. This principle, borrowed from the open source software Shtooka recorder with the help of its creator, Nicolas Vion, makes it possible to record several hundreds of words per hour. The recordings are then uploaded automatically from the web client to the Wikimedia Commons media library. In spring 2021, Lingua Libre was offline due to a fire in Strasbourg, but no audio recordings were lost. === Use of the recordings === The recordings can be consulted either on Lingua Libre or on Commons. They are mainly used on other Wikimedia projects, for example to illustrate entries on Wiktionaries or proper nouns in Wikipedia articles. The re-use of the recordings in a language teaching context is envisaged. Language learners can freely download pronunciations and use them on GoldenDict, a popular dictionary software. Thus, audio recordings can be used as “Pronunciation Dictionaries” on GoldenDict without needing internet connection. The recordings are also reused in Natural Language Processing projects, for example to drive Mozilla's DeepSpeech speech recognition engines. == Versions == Lingua Libre was initiated on January 23, 2015 and has had three successive versions: === Lingua Libre v.1 (2016) === As part of the Languages of France project, which aims to document and promote the regional languages of France on Wikimedia and Internet projects in general, the conception of Lingua Libre started in November 2015, partly funded by the DGLFLF (General Delegation for the French language and the languages of France). The first version of the project was launched in August 2016. Only suitable for audio recording, Lingua Libre was shown during a workshop on Occitan language in December 2016, and then presented to the online Wikimedia community and at international events in 2017. === Lingua Libre v.2 (2018) === A complete rebuilding was launched at the end of 2017. The new version of Lingua Libre is based on MediaWiki, uses Wikibase and OAuth to better integrate into the Wikimedia environment. The interface is translated via Translatewiki.net so that the project can be used by a large number of communities. The new version of the site was ready in June 2018 and opened to the public in August 2018. === Lingua Libre v.2.2 (2020) === In 2020, important changes were made to the platform; a new look was developed especially for the site, the .org domain replaced the .fr domain used until then, and added support for sign languages through video recording. == Statistics == In the first two years of the project's launch, approximately 10,000 recordings were made. The transition to v.2 was accompanied by a sharp increase in the contributions. The number of recordings multiplied by 10 in less than a year, exceeding the 100,000 threshold in May 2019. These recordings were made by 127 speakers in almost 50 languages. By September 2020, the platform had more than 300,000 recordings in 90 languages with more than 350 speakers. The 500,000 recordings milestone was reached in June 2021, thanks to 540 speakers of 120 languages.

PhyCV

PhyCV is the first computer vision library which utilizes algorithms directly derived from the equations of physics governing physical phenomena. The algorithms appearing in the first release emulate the propagation of light through a physical medium with natural and engineered diffractive properties followed by coherent detection. Unlike traditional algorithms that are a sequence of hand-crafted empirical rules, physics-inspired algorithms leverage physical laws of nature as blueprints. In addition, these algorithms can, in principle, be implemented in real physical devices for fast and efficient computation in the form of analog computing. Currently PhyCV has three algorithms, Phase-Stretch Transform (PST) and Phase-Stretch Adaptive Gradient-Field Extractor (PAGE), and Vision Enhancement via Virtual diffraction and coherent Detection (VEViD). All algorithms have CPU and GPU versions. PhyCV is now available on GitHub and can be installed from pip. == History == Algorithms in PhyCV are inspired by the physics of the photonic time stretch (a hardware technique for ultrafast and single-shot data acquisition). PST is an edge detection algorithm that was open-sourced in 2016 and has 800+ stars and 200+ forks on GitHub. PAGE is a directional edge detection algorithm that was open-sourced in February, 2022. PhyCV was originally developed and open-sourced by Jalali-Lab @ UCLA in May 2022. In the initial release of PhyCV, the original open-sourced code of PST and PAGE is significantly refactored and improved to be modular, more efficient, GPU-accelerated and object-oriented. VEViD is a low-light and color enhancement algorithm that was added to PhyCV in November 2022. == Background == === Phase-Stretch Transform (PST) === Phase-Stretch Transform (PST) is a computationally efficient edge and texture detection algorithm with exceptional performance in visually impaired images. The algorithm transforms the image by emulating propagation of light through a device with engineered diffractive property followed by coherent detection. It has been applied in improving the resolution of MRI image, extracting blood vessels in retina images, dolphin identification, and waste water treatment, single molecule biological imaging, and classification of UAV using micro Doppler imaging. === Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) === Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) is a physics-inspired algorithm for detecting edges and their orientations in digital images at various scales. The algorithm is based on the diffraction equations of optics. Metaphorically speaking, PAGE emulates the physics of birefringent (orientation-dependent) diffractive propagation through a physical device with a specific diffractive structure. The propagation converts a real-valued image into a complex function. Related information is contained in the real and imaginary components of the output. The output represents the phase of the complex function. === Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) === Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) an efficient and interpretable low-light and color enhancement algorithm that reimagines a digital image as a spatially varying metaphoric light field and then subjects the field to the physical processes akin to diffraction and coherent detection. The term “Virtual” captures the deviation from the physical world. The light field is pixelated and the propagation imparts a phase with an arbitrary dependence on frequency which can be different from the quadratic behavior of physical diffraction. VEViD can be further accelerated through mathematical approximations that reduce the computation time without appreciable sacrifice in image quality. A closed-form approximation for VEViD which we call VEViD-lite can achieve up to 200 FPS for 4K video enhancement. == PhyCV on the Edge == Featuring low-dimensionality and high-efficiency, PhyCV is ideal for edge computing applications. In this section, we demonstrate running PhyCV on NVIDIA Jetson Nano in real-time. === NVIDIA Jetson Nano Developer Kit === NVIDIA Jetson Nano Developer Kit is a small- sized and power-efficient platform for edge computing applications. It is equipped with an NVIDIA Maxwell architecture GPU with 128 CUDA cores, a quad-core ARM Cortex-A57 CPU, 4GB 64-bit LPDDR4 RAM, and supports video encoding and decoding up to 4K resolution. Jetson Nano also offers a variety of interfaces for connectivity and expansion, making it ideal for a wide range of AI and IoT applications. In our setup, we connect a USB camera to the Jetson Nano to acquire videos and demonstrate using PhyCV to process the videos in real-time. === Real-time PhyCV on Jetson Nano === We use the Jetson Nano (4GB) with NVIDIA JetPack SDK version 4.6.1, which comes with pre- installed Python 3.6, CUDA 10.2, and OpenCV 4.1.1. We further install PyTorch 1.10 to enable the GPU accelerated PhyCV. We demonstrate the results and metrics of running PhyCV on Jetson Nano in real-time for edge detection and low-light enhancement tasks. For 480p videos, both operations achieve beyond 38 FPS, which is sufficient for most cameras that capture videos at 30 FPS. For 720p videos, PhyCV low-light enhancement can operate at 24 FPS and PhyCV edge detection can operate at 17 FPS. == Highlights == === Modular Code Architecture === The code in PhyCV has a modular design which faithfully follows the physical process from which the algorithm was originated. Both PST and PAGE modules in the PhyCV library emulate the propagation of the input signal (original digital image) through a device with engineered diffractive property followed by coherent (phase) detection. The dispersive propagation applies a phase kernel to the frequency domain of the original image. This process has three steps in general, loading the image, initializing the kernel and applying the kernel. In the implementation of PhyCV, each algorithm is represented as a class in Python and each class has methods that simulate the steps described above. The modular code architecture follows the physics behind the algorithm. Please refer to the source code on GitHub for more details. === GPU Acceleration === PhyCV supports GPU acceleration. The GPU versions of PST and PAGE are built on PyTorch accelerated by the CUDA toolkit. The acceleration is beneficial for applying the algorithms in real-time image video processing and other deep learning tasks. The running time per frame of PhyCV algorithms on CPU (Intel i9-9900K) and GPU (NVIDIA TITAN RTX) for videos at different resolutions are shown below. Note that the PhyCV low-light enhancement operates in the HSV color space, so the running time also includes RGB to HSV conversion. However, for all running times using GPUs, we ignore the time of moving data from CPUs to GPUs and count the algorithm operation time only. == Installation and Examples == Please refer to the GitHub README file for a detailed technical documentation. == Current Limitations == === I/O (Input/Output) Bottleneck for Real-time Video Processing === When dealing with real-time video streams from cameras, the frames are captured and buffered in CPU and have to be moved to GPU to run the GPU-accelerated PhyCV algorithms. This process is time-consuming and it is a common bottleneck for real-time video-processing algorithms. === Lack of Parameter Adaptivity for Different Images === Currently, the parameters of PhyCV algorithms have to be manually tuned for different images. Although a set of pre-selected parameters work relatively well for a wide range of images, the lack of parameter adaptivity for different images remains a limitation for now.

Doubao

Doubao (Chinese: 豆包) is an artificial intelligence assistant developed by ByteDance. == History == The chatbot was launched in August 2023. By November 2024, it had become China's most popular AI chatbot, with approximately 60 million monthly active users according to industry analytics. == Design == Doubao is powered by Volcano Engine (Volcengine), 120 trillion tokens consumed per day. == Variants == === Dola === The international version of Doubao is Dola which was launched in August 2023 as Cici. Dola is powered by OpenAI's GPT series of large language models and by Google's Gemini.

Doubao

Doubao (Chinese: 豆包) is an artificial intelligence assistant developed by ByteDance. == History == The chatbot was launched in August 2023. By November 2024, it had become China's most popular AI chatbot, with approximately 60 million monthly active users according to industry analytics. == Design == Doubao is powered by Volcano Engine (Volcengine), 120 trillion tokens consumed per day. == Variants == === Dola === The international version of Doubao is Dola which was launched in August 2023 as Cici. Dola is powered by OpenAI's GPT series of large language models and by Google's Gemini.

Exercism

Exercism is an online, open-source, free coding platform that offers code practice and mentorship on 77 different programming languages. == History == Software developer Katrina Owen created Exercism while she was teaching programming at Jumpstart Labs. The platform was developed as an internal tool to solve the problem of her own students not receiving feedback on the coding problems they were practicing. Katrina put the site publicly online and found that people were sharing it with their friends, practicing together and giving each other feedback. Within 12 months, the site had organically grown to see over 6,000 users had submitted code or feedback, and hundreds of volunteers contribute to the languages or tooling on the platform. In 2016, Jeremy Walker joined as co-founder and CEO. In July 2018, the site was relaunched with a new design and centered around a formal mentoring mode, at which point Katrina stepped back from day-to-day involvement. == Product == In the past, the website differed from other coding platforms by requiring students to download exercises through a command line client, solve the code on their own computers then submit the solution for feedback, at which point they can also view other's solutions to the same problem. Since its second relaunch in 2021, solutions can be edited and submitted through a web editor, though the command line client remains available. Exercism has tracks for 74 programming languages. Among the notable languages taught: ABAP, C, C#, C++, CoffeeScript, Delphi, Elm, Erlang, F#, Gleam, Go, Java, JavaScript, Julia, Kotlin, Objective-C, PHP, Python, Raku, Red, Ruby, Rust, Scala, Swift, and V (Vlang). In 2023, the site launched a "12 in 23" challenge for users to learn the basics of 12 different languages - one per month in 2023. == Open source == The Exercism codebase is open source. In April 2016, it consisted of 50 repositories including website code, API code, command-line code and, most of all, over 40 stand-alone repositories for different language tracks. As of February 2024 Exercism has 14,344 contributors, maintains 366 repositories, and 19,603 mentors.

Semantic interpretation

Semantic interpretation is an important component in dialog systems. It is related to natural language understanding, but mostly it refers to the last stage of understanding. The goal of interpretation is binding the user utterance to concept, or something the system can understand. Typically it is creating a database query based on user utterance.