Parasolid

Parasolid

Parasolid is a geometric modeling kernel originally developed by Shape Data Limited, now owned and developed by Siemens Digital Industries Software. It can be licensed by other companies for use in their 3D computer graphics software products. Parasolid's abilities include model creation and editing utilities such as Boolean modeling operators, feature modeling support, advanced surfacing, thickening and hollowing, blending and filleting, and sheet modeling. It also incorporates modeling with mesh surfaces and lattices. Parasolid also includes tools for direct model editing, including tapering, offsetting, geometry replacement and removing feature details with automated regeneration of surrounding data. Parasolid also provides wide-ranging graphical and rendering support, including hidden-line, wireframe and drafting, tessellation, and model data inquiries. To use Parasolid effectively, software developers need knowledge of CAD in general, computational geometry, and topology. Parasolid is available for Windows (32-bit, 64-bit and AArch64), Linux (64-bit and AArch64), macOS (Apple silicon and Intel), iOS, and Android. == Parasolid XT format == Parasolid parts are normally saved in XT format, which usually has the file extension .X_T. The format is documented and open. There is also a binary version of the format, usually with an .X_B extension, which is somewhat more compact. Both .X_T and .X_B are used for parts files. == Applications == It is used in many computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided engineering (CAE), product visualization, and CAD data exchange packages. Notable uses include:

Graphical Kernel System

The Graphical Kernel System (GKS) is a 2D computer graphics system using vector graphics, introduced in 1977. It was suitable for making line and bar charts and similar tasks. A key concept was cross-system portability, based on an underlying coordinate system that could be represented on almost any hardware. GKS is best known as the basis for the graphics in the GEM GUI system used on the Atari ST and as part of Ventura Publisher. A draft international standard was circulated for review in September 1983. Final ratification of the standard was achieved in 1985, making it the first ISO graphics standard. A 3D system modelled on GKS was introduced as PHIGS, which saw some use in the 1980s and early 1990s. == Overview == GKS provides a set of drawing features for two-dimensional vector graphics suitable for charting and similar duties. The calls are designed to be portable across different programming languages, graphics devices and hardware, so that applications written to use GKS will be readily portable to many platforms and devices. GKS was fairly common on computer workstations in the 1980s and early 1990s. GKS formed the basis of Digital Research's GSX which evolved into VDI, one of the core components of GEM. GEM was the native GUI on the Atari ST and was occasionally seen on PCs, particularly in conjunction with Ventura Publisher. GKS was little used commercially outside these markets, but remains in use in some scientific visualization packages. It is also the underlying API defining the Computer Graphics Metafile. One popular application based on an implementation of GKS is the GR Framework, a C library for high-performance scientific visualization that has become a common plotting backend among Julia users. A main developer and promoter of the GKS was José Luis Encarnação, formerly director of the Fraunhofer Institute for Computer Graphics (IGD) in Darmstadt, Germany. GKS has been standardized in the following documents: ANSI standard ANSI X3.124 of 1985. ISO 7942:1985 standard, revised as ISO 7942:1985/Amd 1:1991 and ISO/IEC 7942-1:1994, as well as ISO/IEC 7942-2:1997, ISO/IEC 7942-3:1999 and ISO/IEC 7942-4:1998 The language bindings are ISO standard ISO 8651. GKS-3D (Graphical Kernel System for Three Dimensions) functional definition is ISO standard ISO 8805, and the corresponding C bindings are ISO/IEC 8806. The functionality of GKS is wrapped up as a data model standard in the STEP standard, section ISO 10303-46.

Charge based boundary element fast multipole method

The charge-based formulation of the boundary element method (BEM) is a dimensionality reduction numerical technique that is used to model quasistatic electromagnetic phenomena in highly complex conducting media (targeting, e.g., the human brain) with a very large (up to approximately 1 billion) number of unknowns. The charge-based BEM solves an integral equation of the potential theory written in terms of the induced surface charge density. This formulation is naturally combined with fast multipole method (FMM) acceleration, and the entire method is known as charge-based BEM-FMM. The combination of BEM and FMM is a common technique in different areas of computational electromagnetics and, in the context of bioelectromagnetism, it provides improvements over the finite element method. == Historical development == Along with more common electric potential-based BEM, the quasistatic charge-based BEM, derived in terms of the single-layer (charge) density, for a single-compartment medium has been known in the potential theory since the beginning of the 20th century. For multi-compartment conducting media, the surface charge density formulation first appeared in discretized form (for faceted interfaces) in the 1964 paper by Gelernter and Swihart. A subsequent continuous form, including time-dependent and dielectric effects, appeared in the 1967 paper by Barnard, Duck, and Lynn. The charge-based BEM has also been formulated for conducting, dielectric, and magnetic media, and used in different applications. In 2009, Greengard et al. successfully applied the charge-based BEM with fast multipole acceleration to molecular electrostatics of dielectrics. A similar approach to realistic modeling of the human brain with multiple conducting compartments was first described by Makarov et al. in 2018. Along with this, the BEM-based multilevel fast multipole method has been widely used in radar and antenna studies at microwave frequencies as well as in acoustics. == Physical background - surface charges in biological media == The charge-based BEM is based on the concept of an impressed (or primary) electric field E i {\displaystyle \mathbf {E} ^{i}} and a secondary electric field E s {\displaystyle \mathbf {E} ^{s}} . The impressed field is usually known a priori or is trivial to find. For the human brain, the impressed electric field can be classified as one of the following: A conservative field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed density of EEG or MEG current sources in a homogeneous infinite medium with the conductivity σ {\displaystyle \sigma } at the source location; An instantaneous solenoidal field E i {\displaystyle \mathbf {E} ^{i}} of an induction coil obtained from Faraday's law of induction in a homogeneous infinite medium (air), when transcranial magnetic stimulation (TMS) problems are concerned; A surface field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed surface current density J i = σ E i {\displaystyle \mathbf {J} ^{i}=\sigma \mathbf {E} ^{i}} of current electrodes injecting electric current at a boundary of a compartment with conductivity σ {\displaystyle \sigma } when transcranial direct-current stimulation (tDCS) or deep brain stimulation (DBS) are concerned; A conservative field E i {\displaystyle \mathbf {E} ^{i}} of charges deposited on voltage electrodes for tDCS or DBS. This specific problem requires a coupled treatment since these charges will depend on the environment; In application to multiscale modeling, a field E i {\displaystyle \mathbf {E} ^{i}} obtained from any other macroscopic numerical solution in a small (mesoscale or microscale) spatial domain within the brain. For example, a constant field can be used. When the impressed field is "turned on", free charges located within a conducting volume D immediately begin to redistribute and accumulate at the boundaries (interfaces) of regions of different conductivity in D. A surface charge density ρ ( r ) {\displaystyle \rho (\mathbf {r} )} appears on the conductivity interfaces. This charge density induces a secondary conservative electric field E s {\displaystyle \mathbf {E} ^{s}} following Coulomb's law. One example is a human under a direct current powerline with the known field E i {\displaystyle \mathbf {E} ^{i}} directed down. The superior surface of the human's conducting body will be charged negatively while its inferior portion is charged positively. These surface charges create a secondary electric field that effectively cancels or blocks the primary field everywhere in the body so that no current will flow within the body under DC steady state conditions. Another example is a human head with electrodes attached. At any conductivity interface with a normal vector n {\displaystyle \mathbf {n} } pointing from an "inside" (-) compartment of conductivity σ − {\displaystyle \sigma ^{-}} to an "outside" (+) compartment of conductivity σ + {\displaystyle \sigma ^{+}} , Kirchhoff's current law requires continuity of the normal component of the electric current density. This leads to the interfacial boundary condition in the form for every facet at a triangulated interface. As long as σ ± {\displaystyle \sigma ^{\pm }} are different from each other, the two normal components of the electric field, E ± ⋅ n {\displaystyle \mathbf {E} ^{\pm }\cdot \mathbf {n} } , must also be different. Such a jump across the interface is only possible when a sheet of surface charge exists at that interface. Thus, if an electric current or voltage is applied, the surface charge density follows. The goal of the numerical analysis is to find the unknown surface charge distribution and thus the total electric field E = E i + E s {\displaystyle \mathbf {E} =\mathbf {E} ^{i}+\mathbf {E} ^{s}} (and the total electric potential if required) anywhere in space. == System of equations for surface charges == Below, a derivation is given based on Gauss's law and Coulomb's law. All conductivity interfaces, denoted by S, are discretized into planar triangular facets t m {\displaystyle t_{m}} with centers r m {\displaystyle \mathbf {r} _{m}} . Assume that an m-th facet with the normal vector n m {\displaystyle \mathbf {n} _{m}} and area A m {\displaystyle A_{m}} carries a uniform surface charge density ρ m {\displaystyle \rho _{m}} . If a volumetric tetrahedral mesh were present, the charged facets would belong to tetrahedra with different conductivity values. We first compute the electric field E m + {\displaystyle \mathbf {E} _{m}^{+}} at the point r m + δ n m {\displaystyle \mathbf {r} _{m}+\delta \mathbf {n} _{m}} , for δ → 0 + {\displaystyle \delta \rightarrow 0^{+}} i.e., just outside facet 𝑚 at its center. This field contains three contributions: The continuous impressed electric field E i {\displaystyle \mathbf {E} ^{i}} itself; An electric field of the m-th charged facet itself. Very close to the facet, it can be approximated as the electric field of an infinite sheet of uniform surface charge ρ m {\displaystyle \rho _{m}} . By Gauss's law, it is given by + ρ m / 2 ε 0 ⋅ n m {\displaystyle +\rho _{m}/2\varepsilon _{0}\cdot \mathbf {n} _{m}} where ε 0 {\displaystyle \varepsilon _{0}} is a background electrical permittivity; An electric field generated by all other facets t n {\displaystyle t_{n}} , which we approximate as point charges of charge A n ρ n {\displaystyle A_{n}\rho _{n}} at each center r n {\displaystyle \mathbf {r} _{n}} . A similar treatment holds for the electric field E m − {\displaystyle \mathbf {E} _{m}^{-}} just inside facet 𝑚, but the electric field of the flat sheet of charge changes its sign. Using Coulomb's law to calculate the contribution of facets different from t m {\displaystyle t_{m}} , we find From this equation, we see that the normal component of the electric field indeed undergoes a jump through the charged interface. This is equivalent to a jump relation of the potential theory. As a second step, the two expressions for E m ± {\displaystyle \mathbf {E} _{m}^{\pm }} are substituted into the interfacial boundary condition σ − E m − ⋅ n m = σ + E m + ⋅ n m {\displaystyle \sigma ^{-}\mathbf {E} _{m}^{-}\cdot \mathbf {n} _{m}=\sigma ^{+}\mathbf {E} _{m}^{+}\cdot \mathbf {n} _{m}} , applied to every facet 𝑚. This operation leads to a system of linear equations for unknown charge densities ρ m {\displaystyle \rho _{m}} which solves the problem: where K m = σ − − σ + σ − + σ + {\displaystyle K_{m}={\frac {\sigma ^{-}-\sigma ^{+}}{\sigma ^{-}+\sigma ^{+}}}} is the electric conductivity contrast at the m-th facet. The normalization constant ε 0 {\displaystyle \varepsilon _{0}} will cancel out after the solution is substituted in the expression for E s {\displaystyle \mathbf {E} ^{s}} and becomes redundant. == Application of fast multipole method == For modern characterizations of brain topologies with ever-increasing levels of complexity, the above system of equations for ρ m {\displaystyle \rho _{m}} is very large; it is t

Frequent pattern discovery

Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining; it describes the task of finding the most frequent and relevant patterns in large datasets. The concept was first introduced for mining transaction databases. Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a data set with frequency no less than a user-specified or auto-determined threshold. == Techniques == Techniques for FP mining include: market basket analysis cross-marketing catalog design clustering classification recommendation systems For the most part, FP discovery can be done using association rule learning with particular algorithms Eclat, FP-growth and the Apriori algorithm. Other strategies include: Frequent subtree mining Structure mining Sequential pattern mining and respective specific techniques. Implementations exist for various machine learning systems or modules like MLlib for Apache Spark.

Accumulated local effects

Accumulated local effects (ALE) is a machine learning interpretability method. == Concepts == ALE uses a conditional feature distribution as an input and generates augmented data, creating more realistic data than a marginal distribution. It ignores far out-of-distribution (outlier) values. Unlike partial dependence plots and marginal plots, ALE is not defeated in the presence of correlated predictors. It analyzes differences in predictions instead of averaging them by calculating the average of the differences in model predictions over the augmented data, instead of the average of the predictions themselves. == Example == Given a model that predicts house prices based on its distance from city center and size of the building area, ALE compares the differences of predictions of houses of different sizes. The result separates the impact of the size from otherwise correlated features. == Limitations == Defining evaluation windows is subjective. High correlations between features can defeat the technique. ALE requires more and more uniformly distributed observations than PDP so that the conditional distribution can be reliably determined. The technique may produce inadequate results if the data is highly sparse, which is more common with high-dimensional data (curse of dimensionality).

Sketch Engine

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in over 100 languages. == History of development == Sketch Engine is a product of Lexical Computing, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff. He started a collaboration with Pavel Rychlý, a computer scientist working at the Natural Language Processing Centre, Masaryk University, and the developer of Manatee and Bonito (two major parts of the software suite). Kilgarriff also introduced the concept of word sketches. Since then, Sketch Engine has been commercial software, however, all the core features of Manatee and Bonito that were developed by 2003 (and extended since then) are freely available under the GPL license within the NoSketch Engine suite. == Features == A list of tools available in Sketch Engine: Word sketches – a one-page automatic derived summary of a word's grammatical and collocational behaviour Word sketch difference – compares and contrasts two words by analysing their collocations Distributional thesaurus – automated thesaurus for finding words with similar meaning or appearing in the same/similar context Concordance search – finds occurrences of a word form, lemma, phrase, tag or complex structure Collocation search – word co-occurrence analysis displaying the most frequent words (for a search word) which can be regarded as collocation candidates Word lists – generates frequency lists which can be filtered with complex criteria n-grams – generates frequency lists of multi-word expressions Terminology / Keyword extraction (both monolingual and bilingual) – automatic extraction of key words and multi-word terms from texts (based on frequency count and linguistic criteria) Diachronic analysis (Trends) – detecting words which undergo changes in the frequency of use in time (show trending words) Corpus building and management – create corpora from the Web or uploaded texts including part-of-speech tagging and lemmatization which can be used as data mining software Parallel corpus (bilingual) facilities – looking up translation examples (EUR-Lex corpus, Europarl corpus, OPUS corpus, etc.) or building a parallel corpus from own aligned texts Text type analysis – statistics of metadata in the corpus === Keywords and terminology extraction === Sketch Engine can perform automatic term extraction by identifying words typical of a particular corpus, document, or text. Single words and multi-word units can be extracted from monolingual or bilingual texts. The terminology extraction feature provides a list of relevant terms based on comparison with a large corpus of general language. This functionality is also available as a separate service called OneClick Terms with a dedicated interface. === SKELL === A free web service based on Sketch Engine and aimed at language learners and teachers is SKELL (formerly SkELL). It exploits Sketch Engine's proprietary GDEX (Good Dictionary Examples) scoring function to provide authentic example sentences for specific target words. Results are drawn from a special corpus of high-quality texts covering everyday, standard, formal, and professional language and displayed as a concordance. SKELL also includes simplified versions of Sketch Engine's word sketch and thesaurus functions. It has been suggested that SKELL can be used, for instance, to help students understand the meaning and/or usage of a word or phrase; to help teachers wanting to use example sentences in a class; to discover and explore collocates; to create gap-fill exercises; to teach various kinds of homonyms and polysemous words. SKELL was first presented in 2014, when only English was supported. Later, support was added for Russian, Czech, German, Italian and Estonian. == List of text corpora == Sketch Engine provides access to more than 800 text corpora. There are monolingual as well as multilingual corpora of different sizes (from one thousand words up to 85 billion words) and various sources (e.g. web, books, subtitles, legal documents). The list of corpora includes British National Corpus, Brown Corpus, Cambridge Academic English Corpus and Cambridge Learner Corpus, CHILDES corpora of child language, OpenSubtitles (a set of 60 parallel corpora), 24 multilingual corpora of EUR-Lex documents, the TenTen Corpus Family (multi-billion web corpora), and Trends corpora (monitor corpora with daily updates). == Architecture == Sketch Engine consists of three main components: an underlying database management system called Manatee, a web interface search front-end called Bonito, and a web interface for corpus building and management called Corpus Architect. === Manatee === Manatee is a database management system specifically devised for effective indexing of large text corpora. It is based on the idea of inverted indexing (keeping an index of all positions of a given word in the text). It has been used to index text corpora comprising tens of billions of words. Searching corpora indexed by Manatee is performed by formulating queries in the Corpus Query Language (CQL). Manatee is written in C++ and offers an API for a number of other programming languages including Python, Java, Perl and Ruby. Recently, it was rewritten into Go for faster processing of corpus queries. === Bonito === Bonito is a web interface for Manatee providing access to corpus search. In the client–server model, Manatee is the server and Bonito plays the client part. It is written in Python. === Corpus Architect === Corpus Architect is a web interface providing corpus building and management features. It is also written in Python. == Applications == Sketch Engine has been used by major British and other publishing houses for producing dictionaries such as Macmillan English Dictionary, Dictionnaires Le Robert, Oxford University Press or Shogakukan. Four of United Kingdom's five biggest dictionary publishers use Sketch Engine.

Abess

abess (Adaptive Best Subset Selection, also ABESS) is a machine learning method designed to address the problem of best subset selection. It aims to determine which features or variables are crucial for optimal model performance when provided with a dataset and a prediction task. abess was introduced by Zhu in 2020 and it dynamically selects the appropriate model size adaptively, eliminating the need for selecting regularization parameters. abess is applicable in various statistical and machine learning tasks, including linear regression, the Single-index model, and other common predictive models. abess can also be applied in biostatistics. == Basic Form == The basic form of abess is employed to address the optimal subset selection problem in general linear regression. abess is an l 0 {\displaystyle l_{0}} method, it is characterized by its polynomial time complexity and the property of providing both unbiased and consistent estimates. In the context of linear regression, assuming we have knowledge of n {\displaystyle n} independent samples ( x i , y i ) , i = 1 , … , n {\displaystyle (x_{i},y_{i}),i=1,\ldots ,n} , where x i ∈ R p × 1 {\displaystyle x_{i}\in \mathbb {R} ^{p\times 1}} and y i ∈ R {\displaystyle y_{i}\in \mathbb {R} } , we define X = ( x 1 , … , x n ) ⊤ {\displaystyle X=(x_{1},\ldots ,x_{n})^{\top }} and y = ( y 1 , … , y n ) ⊤ {\displaystyle y=(y_{1},\ldots ,y_{n})^{\top }} . The following equation represents the general linear regression model: y = X β + ε . {\displaystyle y=X\beta +\varepsilon .} To obtain appropriate parameters β {\displaystyle \beta } , one can consider the loss function for linear regression: L n LR ( β ; X , y ) = 1 2 n ‖ y − X β ‖ 2 2 . {\displaystyle {\mathcal {L}}_{n}^{\text{LR}}(\beta ;X,y)={\frac {1}{2n}}\|y-X\beta \|_{2}^{2}.} In abess, the initial focus is on optimizing the loss function under the l 0 {\displaystyle l_{0}} constraint. That is, we consider the following problem: min β ∈ R p × 1 L n LR ( β ; X , y ) , subject to ‖ β ‖ 0 ≤ s , {\displaystyle \min _{\beta \in \mathbb {R} ^{p\times 1}}{\mathcal {L}}_{n}^{\text{LR}}(\beta ;X,y),{\text{ subject to }}\|\beta \|_{0}\leq s,} where s {\displaystyle s} represents the desired size of the support set, and ‖ β ‖ 0 = ∑ i = 1 p I ( β i ≠ 0 ) {\displaystyle \|\beta \|_{0}=\sum _{i=1}^{p}{\mathcal {I}}_{(\beta _{i}\neq 0)}} is the l 0 {\displaystyle l_{0}} norm of the vector. To address the optimization problem described above, abess iteratively exchanges an equal number of variables between the active set and the inactive set. In each iteration, the concept of sacrifice is introduced as follows: For j in the active set ( j ∈ A ^ {\displaystyle j\in {\hat {\mathcal {A}}}} ): ξ j = L n LR ( β ^ A ∖ { j } ) − L n LR ( β ^ A ) = X j ⊤ X j 2 n ( β ^ j ) 2 {\displaystyle \xi _{j}={\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{{\mathcal {A}}\backslash \{j\}}\right)-{\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}\right)={\frac {{\boldsymbol {X}}_{j}^{\top }{\boldsymbol {X}}_{j}}{2n}}\left({\hat {\beta }}_{j}\right)^{2}} For j in the inactive set ( j ∉ A ^ {\displaystyle j\notin {\hat {\mathcal {A}}}} ): ξ j = L n LR ( β ^ A ) − L n LR ( β ^ A + t ^ { j } ) = X j ⊤ X j 2 n ( d ^ j X j ⊤ X j / n ) 2 {\displaystyle \xi _{j}={\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}\right)-{\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}+{\hat {\boldsymbol {t}}}^{\{j\}}\right)={\frac {{\boldsymbol {X}}_{j}^{\top }{\boldsymbol {X}}_{j}}{2n}}\left({\frac {{\hat {\mathrm {d} }}_{j}}{{\boldsymbol {X}}_{j}^{\top }{\boldsymbol {X}}_{j}/n}}\right)^{2}} Here are the key elements in the above equations: β ^ A {\displaystyle {\hat {\beta }}^{\mathcal {A}}} : This represents the estimate of β {\displaystyle \beta } obtained in the previous iteration. A ^ {\displaystyle {\hat {\mathcal {A}}}} : It denotes the estimated active set from the previous iteration. β ^ A ∖ { j } {\displaystyle {\hat {\boldsymbol {\beta }}}^{{\mathcal {A}}\backslash \{j\}}} : This is a vector where the j-th element is set to 0, while the other elements are the same as β ^ A {\displaystyle {\hat {\beta }}^{\mathcal {A}}} . t ^ { j } = arg ⁡ min t L n LR ( β ^ A + t { j } ) {\displaystyle {\hat {\boldsymbol {t}}}^{\{j\}}=\arg \min _{t}{\mathcal {L}}_{n}^{\text{LR}}\left({\hat {\boldsymbol {\beta }}}^{\mathcal {A}}+{\boldsymbol {t}}^{\{j\}}\right)} : Here, t { j } {\displaystyle t^{\{j\}}} represents a vector where all elements are 0 except the j-th element. d ^ j = X j ⊤ ( y − X β ^ ) / n {\displaystyle {\hat {d}}_{j}={\boldsymbol {X}}_{j}^{\top }({\boldsymbol {y}}-{\boldsymbol {X}}{\hat {\boldsymbol {\beta }}})/n} : This is calculated based on the equation mentioned. The iterative process involves exchanging variables, with the aim of minimizing the sacrifices in the active set while maximizing the sacrifices in the inactive set during each iteration. This approach allows abess to efficiently search for the optimal feature subset. In abess, select an appropriate s max {\displaystyle s_{\max }} and optimize the above problem for active sets size s = 1 , … , s max {\displaystyle s=1,\ldots ,s_{\max }} using the information criterion GIC = n log ⁡ L n LR + s log ⁡ p log ⁡ log ⁡ n , {\displaystyle {\text{GIC}}=n\log {\mathcal {L}}_{n}^{\text{LR}}+s\log p\log \log n,} to adaptively choose the appropriate active set size s {\displaystyle s} and obtain its corresponding abess estimator. == Generalizations == The splicing algorithm in abess can be employed for subset selection in other models. === Distribution-Free Location-Scale Regression === In 2023, Siegfried extends abess to the case of Distribution-Free and Location-Scale. Specifically, it considers the optimization problem max ϑ ∈ R P , β ∈ R J , γ ∈ R J ∑ i = 1 N ℓ i ( ϑ , x i ⊤ β , exp ⁡ ( x i ⊤ γ ) − 1 ) , {\displaystyle \max _{{\boldsymbol {\vartheta }}\in \mathbb {R} ^{P},{\boldsymbol {\beta }}\in \mathbb {R} ^{J},{\boldsymbol {\gamma }}\in \mathbb {R} ^{J}}\sum _{i=1}^{N}\ell _{i}\left({\boldsymbol {\vartheta }},{\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\beta }},{\sqrt {\exp \left({\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\gamma }}\right)}}^{-1}\right),} subject to ‖ ( β ⊤ , γ ⊤ ) ⊤ ‖ 0 ≤ s , {\displaystyle \left\|\left({\boldsymbol {\beta }}^{\top },{\boldsymbol {\gamma }}^{\top }\right)^{\top }\right\|_{0}\leq s,} where ℓ i {\displaystyle \ell _{i}} is a loss function, ϑ {\displaystyle {\boldsymbol {\vartheta }}} is a parameter vector, β {\displaystyle {\boldsymbol {\beta }}} and γ {\displaystyle {\boldsymbol {\gamma }}} are vectors, and x i {\displaystyle {\boldsymbol {x}}_{i}} is a data vector. This approach, demonstrated across various applications, enables parsimonious regression modeling for arbitrary outcomes while maintaining interpretability through innovative subset selection procedures. === Groups Selection === In 2023, Zhang applied the splicing algorithm to group selection, optimizing the following model: min β ∈ R p L n LR ( β ; X , y ) subject to ∑ j = 1 J I ( ‖ β G j ‖ 2 ≠ 0 ) ≤ s {\displaystyle \min _{{\boldsymbol {\beta }}\in \mathbb {R} ^{p}}{\mathcal {L}}_{n}^{\text{LR}}(\beta ;X,y){\text{ subject to }}\sum _{j=1}^{J}I\left(\|{\boldsymbol {\beta }}_{G_{j}}\|_{2}\neq 0\right)\leq s} Here are the symbols involved: J {\displaystyle J} : Total number of feature groups, representing the existence of J {\displaystyle J} non-overlapping feature groups in the dataset. G j {\displaystyle G_{j}} : Index set for the j {\displaystyle j} -th feature group, where j {\displaystyle j} ranges from 1 to J {\displaystyle J} , representing the feature grouping structure in the data. s {\displaystyle s} : Model size, a positive integer determined from the data, limiting the number of selected feature groups. === Regression with Corrupted Data === Zhang applied the splicing algorithm to handle corrupted data. Corrupted data refers to information that has been disrupted or contains errors during the data collection or recording process. This interference may include sensor inaccuracies, recording errors, communication issues, or other external disturbances, leading to inaccurate or distorted observations within the dataset. === Single Index Models === In 2023, Tang applied the splicing algorithm to optimal subset selection in the Single-index model. The form of the Single Index Model (SIM) is given by y i = g ( b ⊤ x i , e i ) , i = 1 , … , n , {\displaystyle y_{i}=g({\boldsymbol {b}}^{\top }{\boldsymbol {x}}_{i},e_{i}),\quad i=1,\ldots ,n,} where b {\displaystyle {\boldsymbol {b}}} is the parameter vector, e i {\displaystyle e_{i}} is the error term. The corresponding loss function is defined as l n ( β ) = ∑ i = 1 n ( r i n − 1 2 − x i ⊤ β ) 2 , {\displaystyle l_{n}({\boldsymbol {\beta }})=\sum _{i=1}^{n}\left({\frac {r_{i}}{n}}-{\frac {1}{2}}-{\boldsymbol {x}}_{i}^{\top }{\boldsymbol {\beta }}\right)^{2},} where r {\disp