In image processing, normalization is a process that changes the range of pixel intensity values, a kind of intensity mapping. Applications include photographs with poor contrast due to glare, for example. A typical case is contrast stretching. In more general fields of data processing, such as digital signal processing, it is referred to as dynamic range expansion. The purpose of dynamic range expansion in the various applications is usually to bring the image, or other type of signal, into a range that is more familiar or normal to the senses, hence the term normalization. Often, the motivation is to achieve consistency in dynamic range for a set of data, signals, or images to avoid mental distraction or fatigue. For example, a newspaper will strive to make all of the images in an issue share a similar range of grayscale. Auto-normalization in image processing software typically normalizes to the full dynamic range of the number system specified in the image file format. == Definition == Normalization transforms an n-dimensional grayscale image I : { X ⊆ R n } → { Min , . . , Max } {\displaystyle I:\{\mathbb {X} \subseteq \mathbb {R} ^{n}\}\rightarrow \{{\text{Min}},..,{\text{Max}}\}} with intensity values in the range ( Min , Max ) {\displaystyle ({\text{Min}},{\text{Max}})} , into a new image I N : { X ⊆ R n } → { newMin , . . , newMax } {\displaystyle I_{N}:\{\mathbb {X} \subseteq \mathbb {R} ^{n}\}\rightarrow \{{\text{newMin}},..,{\text{newMax}}\}} with intensity values in the range ( newMin , newMax ) {\displaystyle ({\text{newMin}},{\text{newMax}})} . The linear normalization of a grayscale digital image is performed according to the formula I N = ( I − Min ) newMax − newMin Max − Min + newMin {\displaystyle I_{N}=(I-{\text{Min}}){\frac {{\text{newMax}}-{\text{newMin}}}{{\text{Max}}-{\text{Min}}}}+{\text{newMin}}} For example, if the intensity range of the image is 50 to 180 and the desired range is 0 to 255 the process entails subtracting 50 from each of pixel intensity, making the range 0 to 130. Then each pixel intensity is multiplied by 255/130, making the range 0 to 255. Normalization might also be non-linear, as the relationship between I {\displaystyle I} and I N {\displaystyle I_{N}} may not be linear. An example of non-linear normalization is when the normalization follows a sigmoid function, in which case the normalized image is computed according to the formula I N = ( newMax − newMin ) 1 1 + e − I − β α + newMin {\displaystyle I_{N}=({\text{newMax}}-{\text{newMin}}){\frac {1}{1+e^{-{\frac {I-\beta }{\alpha }}}}}+{\text{newMin}}} Where α {\displaystyle \alpha } defines the width of the input intensity range, and β {\displaystyle \beta } defines the intensity around which the range is centered. Gamma correction (log/inverse log) is also a common transformation function. === Colorspace === Intensity operations generally operate on a colorspace that maps to the human perception of lightness without intentionally changing the other properties. This can be done, for example, by operating on the L component of the CIELAB color space, or approximately by operating on the Y component of YCbCr. It is also possible to operate on each of the RGB color channels, though the result will not always make sense. == Contrast stretching == This is the most significant and essential technique of spatial-based image enhancement. The basic intent of this contrast enhancement technique is to adjust the local contrast in the image so as to bring out the clear regions or objects in the image. Low-contrast images often result from poor or non-uniform lighting conditions, a limited dynamic range of the imaging sensor, or improper settings of the lens aperture. This operation tries to change the intensity of the pixel in the image, particularly in the input image, to obtain an enhanced image. It is based on the number of techniques, namely local, global, dark and bright levels of contrast. The contrast enhancement is considered as the amount of color or gray differentiation that lies among the different features in an image. The contrast enhancement improves the quality of image by increasing the luminance difference between the foreground and background. A contrast stretching transformation can be achieved by: Stretching the dark range of input values into a wider range of output values: This involves increasing the brightness of the darker areas in the image to enhance details and improve visibility. Shifting the mid-range of input values: This involves adjusting the brightness levels of the mid-tones in the image to improve overall contrast and clarity. Compressing the bright range of input values: This process involves reducing the brightness of the brighter areas in the image to prevent overexposure resulting in a more balanced and visually appealing image. It can be described as the following piecewise funciton: I N = { s 1 r 1 I if I < r 1 s 2 − s 1 r 1 − r 2 ( I − r 1 ) if r 1 ≤ I ≤ r 2 1 − s 2 1 − r 2 ( I − r 2 ) if I > r 2 {\displaystyle I_{N}={\begin{cases}{\frac {s_{1}}{r_{1}}}I&{\text{if }}I
Semantic interpretation
Semantic interpretation is an important component in dialog systems. It is related to natural language understanding, but mostly it refers to the last stage of understanding. The goal of interpretation is binding the user utterance to concept, or something the system can understand. Typically it is creating a database query based on user utterance.
Topincs
Topincs is a software for rapid development of web databases and web applications. It is based on LAMP and the semantic technology Topic Maps. A Topincs web database makes information accessible through browsing very much like a Wiki. Editing a page on a subject is done through forms rather than markup editing. A web database can be tailored into a web application to provide specific user groups a contextualized approach to the data. All modeling and development tasks are performed in the web browser. No other development tools are necessary. The server requires Apache, MySQL and PHP. The client works on any standards-compliant web browser on desktops, laptops, tablets, and mobile phones. The layout is automatically adjusted to smaller screens. The programmatic access to data is done via a virtual object-oriented programming interface which is set up over the schema in a few minutes. It is interpreted rather than generated. Portions of the database can be pulled into memory to accelerate bulk access. == Features == Browseable data High-quality web forms Little to no programming Development done in the browser, no other tools required Client runs in any standard-compliant web browser Virtual object-oriented programming interface User interface adjusts to screen size Supports desktops, laptops, tablets, and mobile phones Flexible data modeling == Challenges == Requires rethinking the development process and dropping many hard learned habits Requires a familiarity with two ISO standards ISO 13259 and 19756 Forms cannot be easily adjusted in layout and behavior Server installation difficult and prone to error == License == Topincs can be used in a private network for any purpose for free. The use in a public network is restricted to non-commercial applications.
Web-based simulation
Web-based simulation (WBS) is the invocation of computer simulation services over the World Wide Web, specifically through a web browser. Increasingly, the web is being looked upon as an environment for providing modeling and simulation applications, and as such, is an emerging area of investigation within the simulation community. == Application == Web-based simulation is used in several contexts: In e-learning, various principles can quickly be illustrated to students by means of interactive computer animations, for example during lecture demonstrations and computer exercises. In distance learning, web-based simulation may provide an alternative to installing expensive simulation software on the student computer, or an alternative to expensive laboratory equipment. In software engineering, web-based emulation allows application development and testing on one platform for other target platforms, for example for various mobile operating systems or mobile web browsers, without the need of target hardware or locally installed emulation software. In online computer games, 3D environments can be simulated, and old home computers and video game consoles can be emulated, allowing the user to play old computer games in the web browser. In medical education, nurse education and allied health education (like sonographer training), web-based simulations can be used for learning and practicing clinical healthcare procedures. Web-based procedural simulations emphasize the cognitive elements such as the steps of the procedure, the decisions, the tools/devices to be used, and the correct anatomical location. == Client-side vs server-side approaches == Web-based simulation can take place either on the server side or on the client side. In server-side simulation, the numerical calculations and visualization (generation of plots and other computer graphics) is carried out on the web server, while the interactive graphical user interface (GUI) often partly is provided by the client-side, for example using server-side scripting such as PHP or CGI scripts, interactive services based on Ajax or a conventional application software remotely accessed through a VNC Java applet. In client-side simulation, the simulation program is downloaded from the server side but completely executed on the client side, for example using Java applets, Flash animations, JavaScript, or some mathematical software viewer plug-in. Server-side simulation is not scalable for many simultaneous users, but places fewer demands on the user computer performance and web-browser plug-ins than client-side simulation. The term on-line simulation sometimes refers to server-side web-based simulation, sometimes to symbiotic simulation, i.e. a simulation that interacts in real-time with a physical system. The upcoming cloud-computing technologies can be used for new server-side simulation approaches. For instance, there are multi-agent-simulation applications which are deployed on cloud-computing instances and act independently. This allows simulations to be highly scalable. == Existing tools == AgentSheets – graphically programmed tool for creating web-based The Sims-like simulation games, and for teaching beginner students programming. AnyLogic – a graphically programmed tool that generates Java code for discrete-event simulation, system dynamics and agent-based models Easy Java Simulations – a tool for modelling and visualization of physical phenomenons, that automatically generates Java code from mathematical expressions. ExploreLearning Gizmos – a large library of interactive online simulations for math and science education in grades 3–12. FreeFem++ Javascript Version – FreeFem++ is a free and open source PDE solver using the finite element method. GNU Octave web interfaces – MATLAB compatible open-source software Lanner Group Ltd L-SIM Server – Java-based discrete-event simulation engine which supports model standards such as BPMN 2.0 Nanohub – web 2.0 in-browser interactive simulation of nanotechnology NetLogo – a multi-agent programming language and integrated modeling environment that runs on the Java Virtual Machine OpenPlaG – PHP-based function graph plotter for the use on websites OpenEpi – web-based packet of tools for biostatistics Recursive Porous Agent Simulation Toolkit (Repast) – agent-based modeling and simulation toolkit implemented in Java and many other languages SageMath – open-source numerical-analysis software with web interface, based on the Python programming language SimScale – web-based simulation platform supporting computational fluid dynamics, solid mechanics, and thermodynamics StarLogo – agent-based simulation language written in Java. VisSim viewer – graphically programmed data-flow diagrams for simulation of dynamical systems webMathematica and Mathematica Player – a computer algebra system and programming language. VisualSim Architect – VisualSim Explorer enables system-level models to be embedded in documents for viewing, simulation and analysis from within a web browser without any local software installation.
Apache Parquet
Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem inspired by Google Dremel interactive ad-hoc query system for analysis of read-only nested data. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == The open-source project to build Apache Parquet began as a joint effort between Twitter and Cloudera using the record shredding and assembly algorithm as described in Google's Dremel. Parquet was designed as an improvement on the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop. The name 'parquet' (lit. 'small compartment') refers to a style of decorative flooring and was chosen to "evoke the bottom layer of a database with an interesting layout". The first version, Apache Parquet 1.0, was released in July 2013. Since April 27, 2015, Apache Parquet has been a top-level Apache Software Foundation (ASF)-sponsored project. == Features == Apache Parquet is implemented using the record-shredding and assembly algorithm, which accommodates the complex data structures that can be used to store data. The values in each column are stored in contiguous memory locations, providing the following benefits: Column-wise compression is efficient in storage space Encoding and compression techniques specific to the type of data in each column can be used Queries that fetch specific column values need not read the entire row, thus improving performance Apache Parquet is implemented using the Apache Thrift framework, which increases its flexibility; it can work with a number of programming languages like C++, Java, Python, PHP, etc. As of August 2015, Parquet supports the big-data-processing frameworks including Apache Hive, Apache Drill, Apache Impala, Apache Crunch, Apache Pig, Cascading, Presto and Apache Spark. It is one of the external data formats used by the pandas Python data manipulation and analysis library. == Compression and encoding == In Parquet, compression is performed column by column, which enables different encoding schemes to be used for text and integer data. This strategy also keeps the door open for newer and better encoding schemes to be implemented as they are invented. Parquet supports various compression formats: snappy, gzip, LZO, brotli, zstd, and LZ4. === Dictionary encoding === Parquet has an automatic dictionary encoding enabled dynamically for data with a small number of unique values (i.e. below 105) that enables significant compression and boosts processing speed. === Bit packing === Storage of integers is usually done with dedicated 32 or 64 bits per integer. For small integers, packing multiple integers into the same space makes storage more efficient. === Run-length encoding (RLE) === To optimize storage of multiple occurrences of the same value, run-length encoding is used, which is where a single value is stored once along with the number of occurrences. Parquet implements a hybrid of bit packing and RLE, in which the encoding switches based on which produces the best compression results. This strategy works well for certain types of integer data and combines well with dictionary encoding. == Cloud Storage and Data Lakes == Parquet is widely used as the underlying file format in modern cloud-based data lake architectures. Cloud storage systems such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage commonly store data in Parquet format due to its efficient columnar representation and retrieval capabilities. Data lakehouse frameworks—including Apache Iceberg, Delta Lake, and Apache Hudi —build an additional metadata layer on top of Parquet files to support features such as schema evolution, time-travel queries, and ACID-compliant transactions. In these architectures, Parquet files serve as the immutable storage layer while the table formats manage data versioning and transactional integrity. == Comparison == Apache Parquet is comparable to RCFile and Optimized Row Columnar (ORC) file formats — all three fall under the category of columnar data storage within the Hadoop ecosystem. They all have better compression and encoding with improved read performance at the cost of slower writes. In addition to these features, Apache Parquet supports limited schema evolution, i.e., the schema can be modified according to the changes in the data. It also provides the ability to add new columns and merge schemas that do not conflict. Apache Arrow is designed as an in-memory complement to on-disk columnar formats like Parquet and ORC. The Arrow and Parquet projects include libraries that allow for reading and writing between the two formats. == Implementations == Known implementations of Parquet include:
Spectral shape analysis
Spectral shape analysis relies on the spectrum (eigenvalues and/or eigenfunctions) of the Laplace–Beltrami operator to compare and analyze geometric shapes. Since the spectrum of the Laplace–Beltrami operator is invariant under isometries, it is well suited for the analysis or retrieval of non-rigid shapes, i.e. bendable objects such as humans, animals, plants, etc. == Laplace == The Laplace–Beltrami operator is involved in many important differential equations, such as the heat equation and the wave equation. It can be defined on a Riemannian manifold as the divergence of the gradient of a real-valued function f: Δ f := div grad f . {\displaystyle \Delta f:=\operatorname {div} \operatorname {grad} f.} Its spectral components can be computed by solving the Helmholtz equation (or Laplacian eigenvalue problem): Δ φ i + λ i φ i = 0. {\displaystyle \Delta \varphi _{i}+\lambda _{i}\varphi _{i}=0.} The solutions are the eigenfunctions φ i {\displaystyle \varphi _{i}} (modes) and corresponding eigenvalues λ i {\displaystyle \lambda _{i}} , representing a diverging sequence of positive real numbers. The first eigenvalue is zero for closed domains or when using the Neumann boundary condition. For some shapes, the spectrum can be computed analytically (e.g. rectangle, flat torus, cylinder, disk or sphere). For the sphere, for example, the eigenfunctions are the spherical harmonics. The most important properties of the eigenvalues and eigenfunctions are that they are isometry invariants. In other words, if the shape is not stretched (e.g. a sheet of paper bent into the third dimension), the spectral values will not change. Bendable objects, like animals, plants and humans, can move into different body postures with only minimal stretching at the joints. The resulting shapes are called near-isometric and can be compared using spectral shape analysis. == Discretizations == Geometric shapes are often represented as 2D curved surfaces, 2D surface meshes (usually triangle meshes) or 3D solid objects (e.g. using voxels or tetrahedra meshes). The Helmholtz equation can be solved for all these cases. If a boundary exists, e.g. a square, or the volume of any 3D geometric shape, boundary conditions need to be specified. Several discretizations of the Laplace operator exist (see Discrete Laplace operator) for the different types of geometry representations. Many of these operators do not approximate well the underlying continuous operator. == Spectral shape descriptors == === ShapeDNA and its variants === The ShapeDNA is one of the first spectral shape descriptors. It is the normalized beginning sequence of the eigenvalues of the Laplace–Beltrami operator. Its main advantages are the simple representation (a vector of numbers) and comparison, scale invariance, and in spite of its simplicity a very good performance for shape retrieval of non-rigid shapes. Competitors of shapeDNA include singular values of Geodesic Distance Matrix (SD-GDM) and Reduced BiHarmonic Distance Matrix (R-BiHDM). However, the eigenvalues are global descriptors, therefore the shapeDNA and other global spectral descriptors cannot be used for local or partial shape analysis. === Global point signature (GPS) === The global point signature at a point x {\displaystyle x} is a vector of scaled eigenfunctions of the Laplace–Beltrami operator computed at x {\displaystyle x} (i.e. the spectral embedding of the shape). The GPS is a global feature in the sense that it cannot be used for partial shape matching. === Heat kernel signature (HKS) === The heat kernel signature makes use of the eigen-decomposition of the heat kernel: h t ( x , y ) = ∑ i = 0 ∞ exp ( − λ i t ) φ i ( x ) φ i ( y ) . {\displaystyle h_{t}(x,y)=\sum _{i=0}^{\infty }\exp(-\lambda _{i}t)\varphi _{i}(x)\varphi _{i}(y).} For each point on the surface the diagonal of the heat kernel h t ( x , x ) {\displaystyle h_{t}(x,x)} is sampled at specific time values t j {\displaystyle t_{j}} and yields a local signature that can also be used for partial matching or symmetry detection. === Wave kernel signature (WKS) === The WKS follows a similar idea to the HKS, replacing the heat equation with the Schrödinger wave equation. === Improved wave kernel signature (IWKS) === The IWKS improves the WKS for non-rigid shape retrieval by introducing a new scaling function to the eigenvalues and aggregating a new curvature term. === Spectral graph wavelet signature (SGWS) === SGWS is a local descriptor that is not only isometric invariant, but also compact, easy to compute and combines the advantages of both band-pass and low-pass filters. An important facet of SGWS is the ability to combine the advantages of WKS and HKS into a single signature, while allowing a multiresolution representation of shapes. == Spectral Matching == The spectral decomposition of the graph Laplacian associated with complex shapes (see Discrete Laplace operator) provides eigenfunctions (modes) which are invariant to isometries. Each vertex on the shape could be uniquely represented with a combinations of the eigenmodal values at each point, sometimes called spectral coordinates: s ( x ) = ( φ 1 ( x ) , φ 2 ( x ) , … , φ N ( x ) ) for vertex x . {\displaystyle s(x)=(\varphi _{1}(x),\varphi _{2}(x),\ldots ,\varphi _{N}(x)){\text{ for vertex }}x.} Spectral matching consists of establishing the point correspondences by pairing vertices on different shapes that have the most similar spectral coordinates. Early work focused on sparse correspondences for stereoscopy. Computational efficiency now enables dense correspondences on full meshes, for instance between cortical surfaces. Spectral matching could also be used for complex non-rigid image registration, which is notably difficult when images have very large deformations. Such image registration methods based on spectral eigenmodal values indeed capture global shape characteristics, and contrast with conventional non-rigid image registration methods which are often based on local shape characteristics (e.g., image gradients).
Software design
Software design is the process of conceptualizing how a software system will work before it is implemented or modified. Software design also refers to the direct result of the design process – the concepts of how the software will work which may be formally documented or may be maintained less formally, including via oral tradition. The design process enables a designer to model aspects of a software system before it exists with the intent of making the effort of writing the code more efficiently. Creativity, past experience, a sense of what makes "good" software, and a commitment to quality are success factors for a competent design. A software design can be compared to an architected plan for a house. High-level plans represent the totality of the house (e.g., a three-dimensional rendering of the house). Lower-level plans provide guidance for constructing each detail (e.g., the plumbing lay). Similarly, the software design model provides a variety of views of the proposed software solution. == Part of the overall process == In terms of the waterfall development process, software design is the activity that occurs after requirements analysis and before coding. Requirements analysis determines what the system needs to do without determining how it will do it, and thus, multiple designs can be imagined that satisfy the requirements. The design can be created while coding, without a plan or requirements analysis, but for more complex projects this is less feasible. Completing a design prior to coding allows for multidisciplinary designers and subject-matter experts to collaborate with programmers to produce software that is useful and technically sound. Sometimes, a simulation or prototype is created to model the system in an effort to determine a valid and good design. == Code as design == A common point of confusion with the term design in software is that the process applies at multiple levels of abstraction such as a high-level software architecture and lower-level components, functions and algorithms. A relatively formal process may occur at high levels of abstraction but at lower levels, the design process is almost always less formal where the only artifact of design may be the code itself. To the extent that this is true, software design refers to the design of the design. Edsger W. Dijkstra referred to this layering of semantic levels as the "radical novelty" of computer programming, and Donald Knuth used his experience writing TeX to describe the futility of attempting to design a program prior to implementing it: TEX would have been a complete failure if I had merely specified it and not participated fully in its initial implementation. The process of implementation constantly led me to unanticipated questions and to new insights about how the original specifications could be improved. == Artifacts == A design process may include the production of art Software design documentation such as flow chart, use case, Pseudocode, Unified Modeling Language model and other Fundamental modeling concepts. For user centered software, design may involve user experience design yielding a storyboard to help determine those specifications. Documentation may be reviewed to allow constraints, specifications and even requirements to be adjusted prior to coding. == Iterative design == Software systems inherently deal with uncertainties, and the size of software components can significantly influence a system's outcomes, both positively and negatively. Neal Ford and Mark Richards propose an iterative approach to address the challenge of identifying and right-sizing components. This method emphasizes continuous refinement as teams develop a more nuanced understanding of system behavior and requirements. The approach typically involves a cycle with several stages: A high-level partitioning strategy is established, often categorized as technical or domain-based. Guidelines for the smallest meaningful deployable unit, referred to as "quanta," are defined. While these foundational decisions are made early, they may be revisited later in the cycle if necessary. Initial components are identified based on the established strategy. Requirements are assigned to the identified components. The roles and responsibilities of each component are analyzed to ensure clarity and minimize overlap. Architectural characteristics, such as scalability, fault tolerance, and maintainability, are evaluated. Components may be restructured based on feedback from development teams. This cycle serves as a general framework and can be adapted to different domains. == Design principles == Design principles enable a software engineer to navigate the design process. Davis suggested principles which have been refined over time as: The design process should not suffer from "tunnel vision" A good designer should consider alternative approaches, judging each based on the requirements of the problem, the resources available to do the job. The design should be traceable to the analysis model Because a single element of the design model can often be traced back to multiple requirements, it is necessary to have a means for tracking how requirements have been satisfied by the design model. The design should not reinvent the wheel Systems are constructed using a set of design patterns, many of which have likely been encountered before. These patterns should always be chosen as an alternative to reinvention. Time is short and resources are limited; design time should be invested in representing (truly new) ideas by integrating patterns that already exist (when applicable). The design should "minimize the intellectual distance" between the software and the problem as it exists in the real world That is, the structure of the software design should, whenever possible, mimic the structure of the problem domain. The design should exhibit uniformity and integration A design is uniform if it appears fully coherent. In order to achieve this outcome, rules of style and format should be defined for a design team before design work begins. A design is integrated if care is taken in defining interfaces between design components. The design should be structured to accommodate change The design concepts discussed in the next section enable a design to achieve this principle. The design should be structured to degrade gently, even when aberrant data, events, or operating conditions are encountered Well-designed software should never "bomb"; it should be designed to accommodate unusual circumstances, and if it must terminate processing, it should do so in a graceful manner. Design is not coding, coding is not design Even when detailed procedural designs are created for program components, the level of abstraction of the design model is higher than the source code. The only design decisions made at the coding level should address the small implementation details that enable the procedural design to be coded. The design should be assessed for quality as it is being created, not after the fact A variety of design concepts and design measures are available to assist the designer in assessing quality throughout the development process. The design should be reviewed to minimize conceptual (semantic) errors There is sometimes a tendency to focus on minutiae when the design is reviewed, missing the forest for the trees. A design team should ensure that major conceptual elements of the design (omissions, ambiguity, inconsistency) have been addressed before worrying about the syntax of the design model. == Design concepts == Design concepts provide a designer with a foundation from which more sophisticated methods can be applied. Design concepts include: Abstraction Reducing the information content of a concept or an observable phenomenon, typically to retain only information that is relevant for a particular purpose. It is an act of Representing essential features without including the background details or explanations. Architecture The overall structure of the software and the ways in which that structure provides conceptual integrity for a system. Good software architecture will yield a good return on investment with respect to the desired outcome of the project, e.g. in terms of performance, quality, schedule and cost. Control hierarchy A program structure that represents the organization of a program component and implies a hierarchy of control. Data structure Representing the logical relationship between elements of data. Design pattern A designer may identify a design aspect of the system that has solved in the past. The reuse of such patterns can increase software development velocity. Information hiding Modules should be specified and designed so that information contained within a module is inaccessible to other modules that have no need for such information. Modularity Dividing the solution into parts (modules). Refinement The process of elaboration. A hierarchy is developed by decomposing a macrosco