FMLLR

FMLLR

In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that are typically applied in a speaker adaptive way, where fMLLR transforms acoustic features to speaker adapted features by a multiplication operation with a transformation matrix. In some literature, fMLLR is also known as the Constrained Maximum Likelihood Linear Regression (cMLLR). == Overview == fMLLR transformations are trained in a maximum likelihood sense on adaptation data. These transformations may be estimated in many ways, but only maximum likelihood (ML) estimation is considered in fMLLR. The fMLLR transformation is trained on a particular set of adaptation data, such that it maximizes the likelihood of that adaptation data given a current model-set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research also shows that fMLLR is an excellent acoustic feature for DNN/HMM hybrid speech recognition models. The advantage of fMLLR includes the following: the adaptation process can be performed within a pre-processing phase, and is independent of the ASR training and decoding process. this type of adapted feature can be applied to deep neural networks (DNN) to replace traditionally used mel-spectrogram in end-to-end speech recognition models. fMLLR's speaker adaptation process leads to a significant performance boost for ASR models, hence outperforming other transform or features like MFCCs (Mel-Frequency Cepstral Coefficients) and FBANKs (Filter bank) coefficients. fMLLR features can be efficiently realized with speech toolkits like Kaldi. Major problem and disadvantage of fMLLR: when the amount of adaptation data is limited, the transformation matrices tends to easily overfit the given data. == Computing fMLLR transform == Feature transform of fMLLR can be easily computed with the open source speech tool Kaldi, the Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular the section Appendix B.1 "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form x {\displaystyle x} → A {\displaystyle A} x {\displaystyle x} + b {\displaystyle +b} , which can be written in the form x {\displaystyle x} →W x ^ {\displaystyle {\hat {x}}} , where x ^ {\displaystyle {\hat {x}}} = [ x 1 ] {\displaystyle {\begin{bmatrix}x\\1\end{bmatrix}}} is the acoustic feature x {\displaystyle x} with a 1 appended. Note that this differs from some of the literature where the 1 comes first as x ^ {\displaystyle {\hat {x}}} = [ 1 x ] {\displaystyle {\begin{bmatrix}1\\x\end{bmatrix}}} . The sufficient statistics stored are: K = ∑ t , j , m γ j , m ( t ) Σ j m − 1 μ j m x ( t ) + {\displaystyle K=\sum _{t,j,m}\gamma _{j,m}(t)\textstyle \Sigma _{jm}^{-1}\mu _{jm}x(t)^{+}\displaystyle } where Σ j m − 1 {\displaystyle \textstyle \Sigma _{jm}^{-1}\displaystyle } is the inverse co-variance matrix. And for 0 ≤ i ≤ D {\displaystyle 0\leq i\leq D} where D {\displaystyle D} is the feature dimension: G ( i ) = ∑ t , j , m γ j , m ( t ) ( 1 σ j , m 2 ( i ) ) x ( t ) + x ( t ) + T {\displaystyle G^{(i)}=\sum _{t,j,m}\gamma _{j,m}(t)\left({\frac {1}{\sigma _{j,m}^{2}(i)}}\right)x(t)^{+}x(t)^{+T}\displaystyle } For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper "Maximum likelihood linear transformations for HMM-based speech recognition ". Note that the Kaldi script that performs the feature transforms of fMLLR differs with by using a column of the inverse in place of the cofactor row. In other words, the factor of the determinant is ignored, as it does not affect the transform result and can causes potential danger of numerical underflow or overflow. == Comparing with other features or transforms == Experiment result shows that by using the fMLLR feature in speech recognition, constant improvement is gained over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc). In particular, fMLLR features outperform MFCCs and FBANKs coefficients, which is mainly due to the speaker adaptation process that fMLLR performs. In, phoneme error rate (PER, %) is reported for the test set of TIMIT with various neural architectures: As expected, fMLLR features outperform MFCCs and FBANKs coefficients despite the use of different model architecture. Where MLP (multi-layer perceptron) serves as a simple baseline, on the other hand RNN, LSTM, and GRU are all well known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations over a standard GRU model, Li-GRU thus effectively address the gradient vanishing problem of recurrent models. As a result, the best performance is obtained with the Li-GRU model on fMLLR features. == Extract fMLLR features with Kaldi == fMLLR can be extracted as reported in the s5 recipe of Kaldi. Kaldi scripts can certainly extract fMLLR features on different dataset, below are the basic example steps to extract fMLLR features from the open source speech corpora Librispeech. Note that the instructions below are for the subsets train-clean-100,train-clean-360,dev-clean, and test-clean, but they can be easily extended to support the other sets dev-other, test-other, and train-other-500. These instruction are based on the codes provided in this GitHub repository, which contains Kaldi recipes on the LibriSpeech corpora to execute the fMLLR feature extraction process, replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository. Install Kaldi. Install Kaldiio. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl to run.pl: Change the data path in run.sh to your LibriSpeech data path, the directory LibriSpeech/ should be under that path. For example: Install flac with: sudo apt-get install flac Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included), for simplicity you can use the modified run.sh. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command: Compute the fMLLR features by running the following script, the script can also be downloaded here: Compute alignments using: Apply CMVN and dump the fMLLR features to new .ark files, the script can also be downloaded here: Use the Python script to convert Kaldi generated .ark features to .npy for your own dataloader, an example Python script is provided:

NER model

NER is one of several formulas for accessing live subtitles in television broadcasts and events that are produced using speech recognition. The three letters stand for number, edit error and recognition error. It has been promoted as an alternative to Word error rate (Word Error Rate) which is a more objective measure. The overall score is calculated as follows: Firstly, the number of edit and recognition errors is deducted from the total number of words in the live subtitles. This number is then divided by the total number of words in the live subtitles and finally multiplied by one hundred. N E R v a l u e = N − E − R N ∗ 100 {\displaystyle NERvalue={\frac {N-E-R}{N}}100} . The acronyms stand for the following: N (number) = total number of words in the live subtitles E (Edit error) = edit error R (Recognition error) = recognition error This measurement process has been used for public television broadcasts in European countries like Italy and Switzerland. One major drawback with NER is that it requires a human assessor to rate errors as either: 1 Minor edition or recognition errors 2 Normal edition or recognition errors 3 Serious errors which are then weighted in the assessment process. This is both subjective, time consuming and costly. Also, NER fails to account for words left out subtitles which is something that does not take account of the D/deaf audience who want verbatim subtitles. As a result, NER cannot accurately reflect the audience's experience of subtitles. Another problem is the inconsistency of human evaluation of subtitles, particularly with live subtitles, where there are differing opinions of the importance of subtitle errors. By way of contrast, Word error rate is an objective measure of subtitle errors, since it measures the textual discrepancy between the subtitles and the speech.

Closest point method

The closest point method (CPM) is an embedding method for solving partial differential equations on surfaces. The closest point method uses standard numerical approaches such as finite differences, finite element or spectral methods in order to solve the embedding partial differential equation (PDE) which is equal to the original PDE on the surface. The solution is computed in a band surrounding the surface in order to be computationally efficient. In order to extend the data off the surface, the closest point method uses a closest point representation. This representation extends function values to be constant along directions normal to the surface. == Definitions == Closest Point function: Given a surface S , c p ( x ) {\displaystyle {\mathcal {S}},cp(\mathbf {x} )} refers to a (possibly non-unique) point belonging to S {\displaystyle {\mathcal {S}}} , which is closest to x {\displaystyle \mathbf {x} } [SE]. Closest point extension: Let S {\displaystyle {\mathcal {S}}} , be a smooth surface in R d {\displaystyle \mathbb {R} ^{d}} . The closest point extension of a function u : S → R {\displaystyle u:{\mathcal {S}}\rightarrow \mathbb {R} } , to a neighborhood Ω {\displaystyle \Omega } of S {\displaystyle {\mathcal {S}}} , is the function v : Ω → R {\displaystyle v:\Omega \rightarrow \mathbb {R} } , defined by v ( x ) = u ( c p ( x ) ) {\displaystyle v(\mathbf {x} )=u(cp(\mathbf {x} ))} . == Closest point method == Initialization consists of these steps [EW]: If it is not already given, a closest point representation of the surface is constructed. A computational domain is chosen. Typically this is a band around the surface. Replace surface gradients by standard gradients in R 3 {\displaystyle \mathbb {R} ^{3}} . Solution is initialized by extending the initial surface data on to the computational domain using the closest point function. After initialization, alternate between the following two steps: Using the closest point function, extend the solution off the surface to the computational domain. Compute the solution to the embedding PDE on a Cartesian mesh in the computational domain for one time step. == Banding == The surface PDE is extended into R 3 {\displaystyle \mathbb {R} ^{3}} however it is only necessary to solve this new PDE near the surface. Hence, we solve the PDE in a band surrounding the surface for efficient computational purposes. Ω c x : ‖ x − c p ( x ) ‖ 2 ≤ λ {\displaystyle \Omega _{c}{x:\|x-cp(x)\|_{2}\leq \lambda }} where λ {\displaystyle \lambda } is the bandwidth. == Example: Heat equation on a circle == Using initial profile u S ( θ , t ) = sin ⁡ ( θ ) {\displaystyle u_{S}(\theta ,t)=\sin(\theta )} leads to the solution u S ( θ , t ) = exp ⁡ ( − t ) sin ⁡ ( θ ) {\displaystyle u_{S}(\theta ,t)=\exp(-t)\sin(\theta )} for the heat equation. Forward Euler time-stepping is used with relation Δ t = 0.1 Δ x 2 {\displaystyle \Delta t=0.1\Delta x^{2}} and degree-four interpolation polynomials for the interpolations. Second-order centered differences are used for the spatial discretization. The CPM results in the expected second order error in the solution u {\displaystyle u} . == Applications == The closest point method can be applied to various PDEs on surfaces. Reaction–diffusion problems on point clouds [RD], eigenvalue problems [EV], and level set equations [LS] are a few examples.

Content as a service

Content as a service (CaaS) or managed content as a service (MCaaS) is a service-oriented model, where the service provider delivers the content on demand to the service consumer via web services that are licensed under subscription. The content is hosted by the service provider centrally in the cloud and offered to a number of consumers that need the content delivered into any applications or system, hence content can be demanded by the consumers as and when required. Content as a Service is a way to provide raw content (in other words, without the need for a specific human compatible representation, such as HTML) in a way that other systems can make use of it. Content as a Service is not meant for direct human consumption, but rather for other platforms to consume and make use of the content according to their particular needs. This happens usually on the cloud, with a centralized platform which can be globally accessible and provides a standard format for your content. With Content as a Service, you centralize your content into a single repository, where you can manage it, categorize it, make it available to others, search for it, or do whatever you wish with it. == Overview == The content delivered typically could be one or more of the following The technical terminology related to equipment or spares that is required to procure or design the materials The industrial terminology of the equipment or spares Technical values pertaining to various types, specifications, applications, characteristics of equipment or spares Sourcing information which will help in procurement or supply-chain management of equipment or spares Descriptive specifications of equipment or spares based on the product reference number or identifier UNSPSC codes or industry practiced classifications ISO, IEC compliant terminology Ontology or Technical Dictionary of products & services Predefined content for specific business needs The term "Content as a service" (CaaS) is considered to be part of the nomenclature of cloud computing service models & Service-oriented architecture along with Software as a service (SaaS), Infrastructure as a service (IaaS), and Platform as a service (PaaS).

Report generator

A report generator is a computer program whose purpose is to take data from a source such as a database, XML stream or a spreadsheet, and use it to produce a document in a format which satisfies a particular human readership. Report generation functionality is almost always present in database systems, where the source of the data is the database itself. It can also be argued that report generation is part of the purpose of a spreadsheet. Standalone report generators may work with multiple data sources and export reports to different document formats. Information systems theory specifies that information delivered to a target human reader must be timely, accurate and relevant. Report generation software targets the final requirement by making sure that the information delivered is presented in the way most readily understood by the target reader. == History == An early report writer was part of NOMAD developed in the 1970s. The evolution of reporting software has a rich history dating back to the mid-20th century, driven by the increasing need for businesses to efficiently analyze and present data. Initially, manual extraction and tabulation were commonplace, but the advent of computers in the 1960s marked a transformative phase with the emergence of basic reporting tools. The 1980s saw the widespread adoption of database management systems, laying the groundwork for more sophisticated reporting capabilities. Notable dedicated reporting software, such as Crystal Reports and BusinessObjects, gained prominence in the 1990s amidst the growing demand for business intelligence. The 21st century witnessed a paradigm shift towards web-based reporting solutions and the rise of self-service BI tools, empowering users to create reports independently. Presently, reporting software continues to evolve with a focus on data visualization, integration of artificial intelligence, and the imperative for real-time analytics in decision-making.

Round-trip engineering

Round-trip engineering (RTE) in the context of model-driven architecture is a functionality of software development tools that synchronizes two or more related software artifacts, such as, source code, models, configuration files, documentation, etc. between each other. The need for round-trip engineering arises when the same information is present in multiple artifacts and when an inconsistency may arise in case some artifacts are updated. For example, some piece of information was added to/changed in only one artifact (source code) and, as a result, it became missing in/inconsistent with the other artifacts (in models). == Overview == Round-trip engineering is closely related to traditional software engineering disciplines: forward engineering (creating software from specifications), reverse engineering (creating specifications from existing software), and reengineering (understanding existing software and modifying it). Round-trip engineering is often wrongly defined as simply supporting both forward and reverse engineering. In fact, the key characteristic of round-trip engineering that distinguishes it from forward and reverse engineering is the ability to synchronize existing artifacts that evolved concurrently by incrementally updating each artifact to reflect changes made to the other artifacts. Furthermore, forward engineering can be seen as a special instance of RTE in which only the specification is present and reverse engineering can be seen as a special instance of RTE in which only the software is present. Many reengineering activities can also be understood as RTE when the software is updated to reflect changes made to the previously reverse engineered specification. === Types === Various books describe two types of RTE: partial or uni-directional RTE: changes made to a higher level representation of a code and model are reflected in lower level, but not otherwise; the latter might be allowed, but with limitations that may not affect higher-level abstractions full or bi-directional RTE: regardless of changes, both higher and lower-level code and model representations are synchronized if any of them altered === Auto synchronization === Another characteristic of round-trip engineering is automatic update of the artifacts in response to automatically detected inconsistencies. In that sense, it is different from forward- and reverse engineering which can be both manual (traditionally) and automatic (via automatic generation or analysis of the artifacts). The automatic update can be either instantaneous or on-demand. In instantaneous RTE, all related artifacts are immediately updated after each change made to one of them. In on-demand RTE, authors of the artifacts may concurrently update the artifacts (even in a distributed setting) and at some point choose to execute matching to identify inconsistencies and choose to propagate some of them and reconcile potential conflicts. === Iterative approach === Round trip engineering may involve an iterative development process. After you have synchronized your model with revised code, you are still free to choose the best way to work – make further modifications to the code or make changes to your model. You can synchronize in either direction at any time and you can repeat the cycle as many times as necessary. == Software == Many commercial tools and research prototypes support this form of RTE; a 2007 book lists Rational Rose, Together, ESS-Model, BlueJ, and Fujaba among those capable, with Fujaba said to be capable to also identify design patterns. == Limitations == A 2005 book on Visual Studio notes for instance that a common problem in RTE tools is that the model reversed is not the same as the original one, unless the tools are aided by leaving laborious annotations in the source code. The behavioral parts of UML impose even more challenges for RTE. Usually, UML class diagrams are supported to some degree; however, certain UML concepts, such as associations and containment do not have straightforward representations in many programming languages which limits the usability of the created code and accuracy of code analysis/reverse engineering (e.g., containment is hard to recognize in the code). A more tractable form of round-trip engineering is implemented in the context of framework application programming interfaces (APIs), whereby a model describing the usage of a framework API by an application is synchronized with that application's code. In this setting, the API prescribes all correct ways the framework can be used in applications, which allows precise and complete detection of API usages in the code as well as creation of useful code implementing correct API usages. Two prominent RTE implementations in this category are framework-specific modeling languages and Spring Roo (Java). Round-trip engineering is critical for maintaining consistency among multiple models and between the models and the code in Object Management Group's (OMG) Model-driven architecture. OMG proposed the QVT (query/view/transformation) standard to handle model transformations required for MDA. To date, a few implementations of the standard have been created. (Need to present practical experiences with MDA in relation to RTE). == Controversies == === Code generation controversy === Code generation (forward-engineering) from models means that the user abstractly models solutions, which are connoted by some model data, and then an automated tool derives from the models parts or all of the source code for the software system. In some tools, the user can provide a skeleton of the program source code, in the form of a source code template where predefined tokens are then replaced with program source code parts during the code generation process. UML (if used for MDA) diagrams specification was criticized for lack the detail which is needed to contain the same information as is covered with the program source. Some developers even claim that "the Code is the design". == Disadvantages == There is a serious risk that the generated code will rapidly differ from the model or that the reverse-engineered model will lose its reflection on the code or a mix of these two problems as result of cycled reengineering efforts. Regarding behavioral/dynamic part of UML for features like statechart diagram there is no equivalents in programming languages. Their translation during code-generation will result in common programming statement (.e.g if,switch,enum) being either missing or misinterpreted. If edited and imported back may result in different or incomplete model. The same goes for code snippets used for code generation stage for the pattern-implementation and user-specific logic: intermixed they may not be easily reverse-engineered back. There is also general lack of advanced tooling for modelling that are comparable to that of modern IDEs (for testing, debugging, navigation, etc.) for general-purpose programming languages and domain-specific languages. == Examples in software engineering == Perhaps the most common form of round-trip engineering is synchronization between UML (Unified Modeling Language) models and the corresponding source code and entity–relationship diagrams in data modelling and database modelling. Round-trip engineering based on Unified Modeling Language (UML) needs three basic tools for software development: Source Code Editor; UML Editor for the Attributes and Methods; Visualisation of UML structure

Tom's Planner

Tom's Planner is a web-based tool and application service provider for project planning, management and collaboration. == History == Tom's Planner is based on Curaçao. In November 2009, it announced its public beta launch on TechCrunch and moved out of beta in August 2010. In 2013 Tom's Planner acquired its competitor Gantto. == Software == Tom's Planner is project management software that enables the creation of project schedules (Gantt charts) using a visual perspective. Tom's Planner uses the Freemium Business Model. Users can register for a free account or choose a paid version. Tom's Planner is available in five languages and is used by thousands of users on a daily basis in more than 100 countries worldwide. Customers range from fortune 500 companies to small mom-and-pop shops. == Reviews == Tom's Planner has been reviewed by PC World, TechCrunch, Lifehacker, and several other periodicals.