AI Coding Interview Meta

AI Coding Interview Meta — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Metadatabase

    Metadatabase

    Metadatabase is a database model for (1) metadata management, (2) global query of independent databases, and (3) distributed data processing. The word metadatabase is an addition to the dictionary. Originally, metadata was only a common term referring simply to "data about data", such as tags, keywords, and markup headers. However, in this technology, the concept of metadata is extended to also include such data and knowledge representation as information models (e.g., relations, entities-relationships, and objects), application logic (e.g., production rules), and analytic models (e.g., simulation, optimization, and mathematical algorithms). In the case of analytic models, it is also referred to as a Modelbase. These classes of metadata are integrated with some modeling ontology to give rise to a stable set of meta-relations (tables of metadata). Individual models are interpreted as metadata and entered into these tables. As such, models are inserted, retrieved, updated, and deleted in the same manner as ordinary data do in an ordinary (relational) database. Users will also formulate global queries and requests for processing of local databases through the Metadatabase, using the globally integrated metadata. The Metadatabase structure can be implemented in any open technology for relational databases. == Significance == The Metadatabase technology is developed at Rensselaer Polytechnic Institute at Troy, New York, by a group of faculty and students (see the references at the end of the article), starting in late 1980s. Its main contribution includes the extension of the concept of metadata and metadata management, and the original approach of designing a database for metadata applications. These conceptual results continue to motivate new research and new applications. At the level of particular design, its openness and scalability is tied to that of the particular ontology proposed: It requires reverse-representation of the application models in order to save them into the meta-relations. In theory, the ontology is neutral, and it has been proven in some industrial applications. However, it needs more development to establish it for the field as an open technology. The requirement of reverse-representation is common to any global information integration technology. A way to facilitate it in the Metadatabase approach is to distribute a core portion of it at each local site, to allow for peer-to-peer translation on the fly.

    Read more →
  • Scikit-learn

    Scikit-learn

    scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project. == Overview == The scikit-learn project started as scikits.learn, a Google Summer of Code project by French data scientist David Cournapeau. The name of the project derives from its role as a "scientific toolkit for machine learning", originally developed and distributed as a third-party extension to SciPy. The original codebase was later rewritten by other developers. In 2010, contributors Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort and Vincent Michel, from the French Institute for Research in Computer Science and Automation in Saclay, France, took leadership of the project and released the first public version of the library on February 1, 2010. In November 2012, scikit-learn as well as scikit-image were described as two of the "well-maintained and popular" scikits libraries. In 2019, it was noted that scikit-learn is one of the most popular machine learning libraries on GitHub. At that time, the project had over 1,400 contributors and the documentation received 42 million visits in 2018. According to a 2022 Kaggle survey of nearly 24,000 respondents from 173 countries, scikit-learn was identified as the most widely used machine learning framework. == Features == Large catalogue of well-established machine learning algorithms and data pre-processing methods (i.e. feature engineering) Utility methods for common data-science tasks, such as splitting data into train and test sets, cross-validation and grid search Consistent way of running machine learning models (estimator.fit() and estimator.predict()), which libraries can implement Declarative way of structuring a data science process (the Pipeline), including data pre-processing and model fitting == Examples == Fitting a random forest classifier: == Implementation == scikit-learn is largely written in Python, and uses NumPy extensively for high-performance linear algebra and array operations. Furthermore, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases, extending these methods with Python may not be possible. scikit-learn integrates well with many other Python libraries, such as Matplotlib and plotly for plotting, NumPy for array vectorization, Pandas dataframes, SciPy, and many more. == History == scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later that year, Matthieu Brucher joined the project and started to use it as a part of his thesis work. In 2010, INRIA, the French Institute for Research in Computer Science and Automation, got involved and the first public release (v0.1 beta) was published in late January 2010. The project released its first stable version, 1.0.0, on September 24, 2021. The release was the result of over 2,100 merged pull requests, approximately 800 of which were dedicated to improving documentation. Development continues to focus on bug fixes, efficiency and feature expansion. The latest version, 1.8, was released on December 10, 2025. This update introduced native Array API support, enabling the library to perform GPU computations by directly using PyTorch and CuPy arrays. This version also included bug fixes, improvements and new features, such as efficiency improvements to the fit time of linear models. == Applications == Scikit-learn is widely used across industries for a variety of machine learning tasks such as classification, regression, clustering, and model selection. The following are real-world applications of the library: === Finance and Insurance === AXA uses scikit-learn to speed up the compensation process for car accidents and to detect insurance fraud. Zopa, a peer-to-peer lending platform, employs scikit-learn for credit risk modelling, fraud detection, marketing segmentation, and loan pricing. BNP Paribas Cardif uses scikit-learn to improve the dispatching of incoming mail and manage internal model risk governance through pipelines that reduce operational and overfitting risks. J.P. Morgan reports broad usage of scikit-learn across the bank for classification tasks and predictive analytics in financial decision-making. === Retail and E-Commerce === Booking.com uses scikit-learn for hotel and destination recommendation systems, fraudulent reservation detection, and workforce scheduling for customer support agents. HowAboutWe uses it to predict user engagement and preferences on a dating platform. Lovely leverages the library to understand user behaviour and detect fraudulent activity on its platform. Data Publica uses it for customer segmentation based on the success of past partnerships. Otto Group integrates scikit-learn throughout its data science stack, particularly in logistics optimization and product recommendations. === Media, Marketing, and Social Platforms === Spotify applies scikit-learn in its recommendation systems. Betaworks uses the library for both recommendation systems (e.g., for Digg) and dynamic subspace clustering applied to weather forecasting data. PeerIndex used scikit-learn for missing data imputation, tweet classification, and community clustering in social media analytics. Bestofmedia Group employs it for spam detection and ad click prediction. Machinalis utilizes scikit-learn for click-through rate prediction and relational information extraction for content classification and advertising optimization. Change.org applies scikit-learn for targeted email outreach based on user behaviour. === Technology === AWeber uses scikit-learn to extract features from emails and build pipelines for managing large-scale email campaigns. Solido applies it to semiconductor design tasks such as rare-event estimation and worst-case verification using statistical learning. Evernote, Dataiku, and other tech companies employ scikit-learn in prototyping and production workflows due to its consistent API and integration with the Python ecosystem. === Academia === Télécom ParisTech integrates scikit-learn in hands-on coursework and assignments as part of its machine learning curriculum. == Awards == 2019 Inria-French Academy of Sciences-Dassault Systèmes Innovation Prize: Awarded in recognition of scikit-learn's impact as a major free software breakthrough in machine learning and its role in the digital transformation of science and industry. 2022 Open Science Award for Open Source Research Software: Awarded by the French Ministry of Higher Education and Research as part of the second National Plan for Open Science. The project was recognized in the "Community" category for its technical quality, its large international contributor network, and the quality of its documentation.

    Read more →
  • Discrete skeleton evolution

    Discrete skeleton evolution

    Discrete Skeleton Evolution (DSE) describes an iterative approach to reducing a morphological or topological skeleton. It is a form of pruning in that it removes noisy or redundant branches (spurs) generated by the skeletonization process, while preserving information-rich "trunk" segments. The value assigned to individual branches varies from algorithm to algorithm, with the general goal being to convey the features of interest of the original contour with a few carefully chosen lines. Usually, clarity for human vision (aka. the ability to "read" some features of the original shape from the skeleton) is valued as well. DSE algorithms are distinguished by complex, recursive decision-making processes with high computational requirements. Pruning methods such as by structuring element (SE) convolution and the Hough transform are general purpose algorithms which quickly pass through an image and eliminate all branches shorter than a given threshold. DSE methods are most applicable when detail retention and contour reconstruction are valued. == Methodology == === Pre-processing === Input images will typical contain more data than is necessary to generate an initial skeleton, and thus must be reduced in some way. Reducing the resolution, converting to grayscale, and then binary by masking or thresholding are common first steps. Noise removal may occur before and/or after converting an image to binary. Morphological operations such as closing, opening, and smoothing of the binary image may also be part of pre-processing. Ideally, the binarized contour should be as noise-free as possible before the skeleton is generated. === Skeletonization === DSE techniques may be applied to an existing skeleton or incorporated as part of the skeleton growing algorithm. Suitable skeletons may be obtained using a variety of methods: Thinning algorithms, such as the Grassfire transform Voronoi diagram Medial Axis Transform or Symmetry Axis Transform Distance Mapping === Significance Measures === DSE and related methods remove entire spurious branches while leaving the main trunk intact. The intended result is typically optimized for visual clarity and retention of information, such that the original contour can be reconstructed from the fully pruned skeleton. The value of various properties must be weighted by the application, and improving the efficiency is an ongoing topic of research in computer vision and image processing. Some significance measures include: Discrete Bisector Function Contour length Bending Potential Ratio Discrete Curve Evolution === Iteration === Each branch is evaluated during a pass through the skeletonized image according to the specific algorithm being used. Low value branches are removed and the process is repeated until a desired threshold of simplicity is reached. === Reconstruction === If all points on the output skeleton are the center points of maximal disks of the image and the radius information is retained, a contour image can be reconstructed == Applications == === Handwriting and text parsing === Variability in hand-written text is an ongoing challenge, simplification makes it somewhat easier for computer vision algorithms to make judgements about intended characters. === Soft body classification (animals) === The maximal disks centered on the skeleton imply roughly spherical masses, the features of the extracted skeleton are relatively unchanged even as the soft body deforms or self-occludes. Skeleton information is one facet of determining whether two animals are the "same" some way, though it must usually be paired with another technique to effectively identify a target. === Medical uses === Investigation of organs, tissue damage and deformation caused by disease.

    Read more →
  • Boundary vector field

    Boundary vector field

    The boundary vector field (BVF) is an external force for parametric active contours (i.e. Snakes). In the fields of computer vision and image processing, parametric active contours are widely used for segmentation and object extraction. The active contours move progressively towards its target based on the external forces. There are a number of shortcomings in using the traditional external forces, including the capture range problem, the concave object extraction problem, and high computational requirements. The BVF is generated by an interpolation scheme which reduces the computational requirement significantly, and at the same time, improves the capture range and concave object extraction capability. The BVF is also tested in moving object tracking and is proven to provide fast detection method for real time video applications.

    Read more →
  • Feature engineering

    Feature engineering

    Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability. Beyond machine learning, the principles of feature engineering are applied in various scientific fields, including physics. For example, physicists construct dimensionless numbers such as the Reynolds number in fluid dynamics, the Nusselt number in heat transfer, and the Archimedes number in sedimentation. They also develop first approximations of solutions, such as analytical solutions for the strength of materials in mechanics. == Clustering == One of the applications of feature engineering has been clustering of feature-objects or sample-objects in a dataset. Especially, feature engineering based on matrix decomposition has been extensively used for data clustering under non-negativity constraints on the feature coefficients. These include Non-Negative Matrix Factorization (NMF), Non-Negative Matrix-Tri Factorization (NMTF), Non-Negative Tensor Decomposition/Factorization (NTF/NTD), etc. The non-negativity constraints on coefficients of the feature vectors mined by the above-stated algorithms yields a part-based representation, and different factor matrices exhibit natural clustering properties. Several extensions of the above-stated feature engineering methods have been reported in literature, including orthogonality-constrained factorization for hard clustering, and manifold learning to overcome inherent issues with these algorithms. Other classes of feature engineering algorithms include leveraging a common hidden structure across multiple inter-related datasets to obtain a consensus (common) clustering scheme. An example is Multi-view Classification based on Consensus Matrix Decomposition (MCMD), which mines a common clustering scheme across multiple datasets. MCMD is designed to output two types of class labels (scale-variant and scale-invariant clustering), and: is computationally robust to missing information, can obtain shape- and scale-based outliers, and can handle high-dimensional data effectively. Coupled matrix and tensor decompositions are popular in multi-view feature engineering. == Predictive modelling == Feature engineering in machine learning and statistical modeling involves selecting, creating, transforming, and extracting data features. Key components include feature creation from existing data, transforming and imputing missing or invalid features, reducing data dimensionality through methods like Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and selecting the most relevant features for model training based on importance scores and correlation matrices. Features vary in significance. Even relatively insignificant features may contribute to a model. Feature selection can reduce the number of features to prevent a model from becoming too specific to the training data set (overfitting). Feature explosion occurs when the number of identified features is too large for effective model estimation or optimization. Common causes include: Feature templates - implementing feature templates instead of coding new features Feature combinations - combinations that cannot be represented by a linear system Feature explosion can be limited via techniques such as regularization, kernel methods, and feature selection. == Automation == Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. Related academic literature can be roughly separated into two types: Multi-relational Decision Tree Learning (MRDTL) uses a supervised algorithm that is similar to a decision tree. Deep Feature Synthesis uses simpler methods. === Multi-relational Decision Tree Learning (MRDTL) === Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It innovatively uses selection graphs as decision nodes, refined systematically until a specific termination criterion is reached. Most MRDTL studies base implementations on relational databases, which results in many redundant operations. These redundancies can be reduced by using techniques such as tuple id propagation. === Open-source implementations === There are a number of open-source libraries and tools that automate feature engineering on relational data and time series: featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning. MCMD: An open-source feature engineering algorithm for joint clustering of multiple datasets. OneBM or One-Button Machine combines feature transformations and feature selection on relational data with feature selection techniques. OneBM helps data scientists reduce data exploration time allowing them to try and error many ideas in short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with a little effort, time, and cost. getML community is an open source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It has been shown to be at least 60 times faster than tsflex, tsfresh, tsfel, featuretools or kats. tsfresh is a Python library for feature extraction on time series data. It evaluates the quality of the features using hypothesis testing. tsflex is an open source Python library for extracting features from time series data. Despite being 100% written in Python, it has been shown to be faster and more memory efficient than tsfresh, seglearn or tsfel. seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library. tsfel is a Python package for feature extraction on time series data. kats is a Python toolkit for analyzing time series data. === Deep feature synthesis === The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition. == Feature stores == The feature store is where the features are stored and organized for the explicit purpose of being used to either train models (by data scientists) or make predictions (by applications that have a trained model). It is a central location where you can either create or update groups of features created from multiple different data sources, or create and update new datasets from those feature groups for training models or for use in applications that do not want to compute the features but just retrieve them when it needs them to make predictions. A feature store includes the ability to store code used to generate features, apply the code to raw data, and serve those features to models upon request. Useful capabilities include feature versioning and policies governing the circumstances under which features can be used. Feature stores can be standalone software tools or built into machine learning platforms. == Alternatives == Feature engineering can be a time-consuming and error-prone process, as it requires domain expertise and often involves trial and error. Deep learning algorithms may be used to process a large raw dataset without having to resort to feature engineering. However, deep learning algorithms still require careful preprocessing and cleaning of the input data. In addition, choosing the right architecture, hyperparameters, and optimization algorithm for a deep neural network can be a challenging and iterative process.

    Read more →
  • Dominant resource fairness

    Dominant resource fairness

    Dominant resource fairness (DRF) is a rule for fair division. It is particularly useful for dividing computing resources in among users in cloud computing environments, where each user may require a different combination of resources. DRF was presented by Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker and Ion Stoica in 2011. == Motivation == In an environment with a single resource, a widely used criterion is max-min fairness, which aims to maximize the minimum amount of resource given to a user. But in cloud computing, it is required to share different types of resource, such as: memory, CPU, bandwidth and disk-space. Previous fair schedulers, such as in Apache Hadoop, reduced the multi-resource setting to a single-resource setting by defining nodes with a fixed amount of each resource (e.g. 4 CPU, 32 MB memory, etc.), and dividing slots which are fractions of nodes. But this method is inefficient, since not all users need the same ratio of resources. For example, some users need more CPU whereas other users need more memory. As a result, most tasks either under-utilize or over-utilize their resources. DRF solves the problem by maximizing the minimum amount of the dominant resource given to a user (then the second-minimum etc., in a leximin order). The dominant resource may be different for different users. For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF will try to equalize the CPU share given to user A and the memory share given to user B. == Definition == There are m resources. The total capacities of the resources are r1,...,rm. There are n users. Each users runs individual tasks. Each task has a demand-vector (d1,..,dm), representing the amount it needs of each resource. It is implicitly assumed that the utility of a user equals the number of tasks he can perform. For example, if user A runs tasks with demand-vector [1 CPU, 4 GB RAM], and receives 3 CPU and 8 GB RAM, then his utility is 2, since he can perform only 2 tasks. More generally, the utility of a user receiving x1,...,xm resources is minj(xj/dj), that is, the users have Leontief utilities. The demand-vectors are normalized to fractions of the capacities. For example, if the system has 9 CPUs and 18 GB RAM, then the above demand-vector is normalized to [1/9 CPU, 2/9 GB]. For each user, the resource with the highest demand-fraction is called the dominant resource. In the above example, the dominant resource is memory, as 2/9 is the largest fraction. If user B runs a task with demand-vector [3 CPU, 1 GB], which is normalized to [1/3 CPU, 1/18 GB], then his dominant resource is CPU. DRF aims to find the maximum x such that all agents can receive at least x of their dominant resource. In the above example, this maximum x is 2/3: User A gets 3 tasks, which require 3/9 CPU and 2/3 GB. User B gets 2 tasks, which require 2/3 CPU and 1/9 GB. The maximum x can be found by solving a linear program; see Lexicographic max-min optimization. Alternatively, the DRF can be computed sequentially. The algorithm tracks the amount of dominant resource used by each user. At each round, it finds a user with the smallest allocated dominant resource so far, and allocates the next task of this user. Note that this procedure allows the same user to run tasks with different demand vectors. == Properties == DRF has several advantages over other policies for resource allocation. Proportionality: each user receives at least as much resources as they could get in a system in which all resources are partitioned equally among users (the authors call this condition "sharing incentive"). Strategyproofness: a user cannot get a larger allocation by lying about his needs. Strategyproofness is important, as evidence from cloud operators show that users try to manipulate the servers in order to get better allocations. Envy-freeness: no user would prefer the allocation of another user. Pareto efficiency: no other allocation is better for some users and not worse for anyone. Population monotonicity: when a user leaves the system, the allocations of remaining users do not decrease. When there is a single resource that is a bottleneck resource (highly demanded by all users), DRF reduces to max-min fairness. However, DRF violates resource monotonicity: when resources are added to the system, some allocations might decrease. == Extensions == Weighted DRF is an extension of DRF to settings in which different users have different weights (representing their different entitlements). Parkes, Procaccia and Shah formally extend weighted DRF to a setting in which some users do not need all resources (that is, they may have demand 0 to some resource). They prove that the extended version still satisfies proportionality, Pareto-efficiency, envy-freeness, strategyproofness, and even Group strategyproofness. On the other hand, they show that DRF may yield poor utilitarian social welfare, that is, the sum of utilities may be only 1/m of the optimum. However, they prove that any mechanism satisfying one of proportionality, envy-freeness or strategyproofness may suffers from the same low utilitarian welfare. They also extend DRF to the setting in which the users' demands are indivisible (as in fair item allocation). For the indivisible setting, they relax envy-freeness to EF1. They show that strategyproofness is incompatible with PO+EF1 or with PO+proportionality. However, a mechanism called SequentialMinMax satisfies efficiency, proportionality and EF1. Wang, Li and Liang present DRFH - an extension of DRF to a system with several heterogeneous servers. == Implementation == DRF was first implemented in Apache Mesos - a cluster resource manager, and it led to better throughput and fairness than previously used fair-sharing schemes.

    Read more →
  • Cloud-native computing

    Cloud-native computing

    Cloud native computing is an approach in software development that utilizes cloud computing to "build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds". These technologies, such as containers, microservices, serverless functions, cloud native processors and immutable infrastructure, deployed via declarative code are common elements of this architectural style. Cloud native technologies focus on minimizing users' operational burden. Cloud native techniques "enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil." This independence contributes to the overall resilience of the system, as issues in one area do not necessarily cripple the entire application. Additionally, such systems are easier to manage, and monitor, given their modular nature, which simplifies tracking performance and identifying issues. Frequently, cloud-native applications are built as a set of microservices that run in Open Container Initiative compliant containers, such as Containerd, and may be orchestrated in Kubernetes and managed and deployed using DevOps and Git CI workflows (although there is a large amount of competing open source that supports cloud-native development). The advantage of using containers is the ability to package all software needed to execute into one executable package. The container runs in a virtualized environment, which isolates the contained application from its environment.

    Read more →
  • Radek Maneuver

    Radek Maneuver

    The Radek Maneuver is a scale-up-then-scale-down tactic used in the administration of web services, specifically those deployed under a cloud computing paradigm (by a provider e.g. Amazon Elastic Compute Cloud or Microsoft Azure). == History == Developed by Olivier "Radek" Dabrowski in the mid-2010s, the Radek Maneuver was originally conceived of in using and maintaining applications running on a PaaS system. == Execution == The Radek Maneuver consists of a series of steps, usually executed via the PaaS or web portal interface. The tactic should be used when a service is misbehaving or otherwise experiencing errors, and the suspected cause is the underlying cloud layer, rather than the application layer. This includes networking issues and other "bad box" problems. The steps are as follows: Identify the application or service which is misbehaving. Increase the compute resource (number of CPU cores, amount of ram) for the instance on which the application is running. This is also known as scaling up. Wait for the application to re-deploy and stabilize. Scale back down to the original instance size. == Principle of action == This scale-up-scale-down method is understood to shift the application to a different physical machine underlying the PaaS service or application virtual machine. While this layer of the cloud computing stack is generally out of the access of an application developer (instead in the hands of the cloud provider), the maneuver allows troubleshooting and dodging errors in that layer.

    Read more →
  • Smart data capture

    Smart data capture

    Smart data capture (SDC), also known as 'intelligent data capture' or 'automated data capture', describes the branch of technology concerned with using computer vision techniques like optical character recognition (OCR), barcode scanning, object recognition and other similar technologies to extract and process information from semi-structured and unstructured data sources. IDC characterize smart data capture as an integrated hardware, software, and connectivity strategy to help organizations enable the capture of data in an efficient, repeatable, scalable, and future-proof way. Data is captured visually from barcodes, text, IDs and other objects - often from many sources simultaneously - before being converted and prepared for digital use, typically by artificial intelligence-powered software. An important feature of SDC is that it focuses not just on capturing data more efficiently but serving up easy-to-access, actionable insights at the instant of data collection to both frontline and desk-based workers, aiding decision-making and making it a two-way process. Smart data capture automates and accelerates capture, applying insights in real time and automating processes based on extracted input. Smart data capture is designed to be repeatable and scalable to reduce low-level manual tasks and eliminate human error. To achieve this goal, smart data capture solutions are often made available using specialist software installed on commodity hardware such as smartphones. However, some solutions may rely on specialized hardware such as dedicated scanning devices, wearables or shop floor robots. == Differences from OCR == Optical character recognition applications are typically concerned with the actual data capture process; they are intended to faithfully reproduce text, words, letters and symbols from a printed document. Smart data capture is multimodal, capable of extracting data from a wider range of semi-structured and unstructured sources, going beyond basic text recognition to offer a wider scope of applications. By extending functionality to provide actionable insights at the point of capture, SDC is also a two-way process (capture-display), while OCR is more commonly one-way (capture only), primarily used for data input. Smart data capture solutions typically have two parts: Data capture (which includes OCR, barcode scanning, object recognition) Functionality that then uses this data to provide actionable insights at the point of capture. == Applications == Smart data capture can be applied to almost any industry and application that requires visual information capture and interpretation. This may include: Retail Warehouse inventory control Logistics, handling and shipping Manufacturing Field service Healthcare Transport and travel Fraud detection

    Read more →
  • T Layout

    T Layout

    The T-Layout is an architectural and design concept for web applications, specifically tailored to improve the user experience on mobile devices. It features a horizontally scrollable container divided into three distinct sections, each spanning the full width of the screen, and was developed to optimise space usage and streamline navigation. == Background == The T-Layout introduces horizontal scrolling as a complementary method to the conventional pop-up-based navigation system in mobile web applications. In this layout, the central section which is visible by default upon accessing the application, facilitates the main content of a URL address and is flanked by two "helper" sections. This approach minimises the need for extensive user movements, in order to reach navigation controls typically located at the top of the screen. It is aimed at enhancing the user experience on mobile devices by providing an easier way to access essential content such as the main navigation, e-commerce related screens, or user account related information, ensuring that those elements are readily accessible while requiring minimal user effort. The T-Layout was first implemented by E (e-streetwear.com) in their mobile web app layout, and it was inspired by the interfaces of well-tested native mobile apps like Instagram and Revolut. A study titled "Mobile Navigation and User Preferences Survey" indicated a preference among mobile app users for one-handed usage, primarily navigating with their thumb. These insights led to the T-Layout Experiment, which compared the efficiency of using swipe gestures to access navigational elements against reaching traditional navigation controls. == Development history == It was first released as the mobile layout of E in early 2023. It was originally developed based on six principles: user-centric functionality, lightweight filesize, HTML and CSS implementation with minimal or no use of JavaScript required, suitable both for browser and server-rendering architectures, intuitive design, and improved SEO. The development of the T-Layout was driven by the necessity for more ergonomic and user-friendly interfaces in mobile web applications. Its design, reminiscent of the letter 'T', emerged as a solution to several usability challenges mobile device users face, emphasising ease of access and efficient screen space utilisation. In July 2023, E formalised the concept and its technical specifications, introducing it to the web design and development community. In October 2023 the "Mobile Navigation and User Preferences Survey" was conducted, establishing that the vast majority of individuals prefer to use mobile applications by holding the phone in a one-handed grip, utilising only the thumb for gestures when possible. The subsequent "T-Layout Experiment", designed to measure the time in seconds and the distance (user effort) in pixels, required to access navigational elements by traditionally tapping on fixed-positioned controls compared to swiping anywhere on the screen. The results proved that swipe gestures require less time and much less effort. == Styling and features == The main characteristic of the T-Layout is its horizontal scrolling feature, which can improve navigation efficiency while preserving the functionality of traditionally structured user interfaces. Its Implementation can be achieved with a combination of HTML and styling with CSS as well as precompiled Scss and Sass, CSS-in-JS, and styled JSX. It can be either a purely HTML/CSS solution but JavaScript can be utilised as well to add more specific functionalities, while It can be implemented to both existing and new applications. Its application in server-side rendering architectures will ensure that all its underlying principles apply. Although principally each section in the layout has a distinct role and facilitates specific types of content, the T-Layout as a concept is versatile, and it is adaptable allowing modifications in the layout or how it's implemented to cater to the specific needs of different applications.

    Read more →
  • Switch (app)

    Switch (app)

    Switch was a mobile-only job-matching app that connected candidates directly to hiring managers. Candidates could upload their resumes and connect their social and professional media profiles, but remain anonymous while searching. Users received a daily set of job recommendations that fit their backgrounds and salary criteria, and swipe right to apply. Employers post many jobs on Switch directly, which eliminates the need for third-party job boards and recruiters, and connects job seekers to hiring managers. Switch reveals a candidate’s identity to one employer at a time, only after the candidate matches with that employer. When candidates and employers match, they can chat within the app. Switch is available for iOS, with an Android version in development. == History == === Founding === Yarden Tadmor founded Switch in New York City in January 2014. For the first 10 months, Tadmor funded the company himself. By December 2014, Switch had raised $1.4 million in funding from venture capitals firms Metamorphic Ventures, SG VC, BAM and Rhodium. Tadmor's inspiration for Switch came after being frustrated by his experience both as a job seeker, and also as a supervisor hiring at numerous technology startup companies. Tadmor has said of Switch, “We operate on the five-second resume principle, which is usually the amount of time a recruiter spends on a resume. They scan through the typical data points and move on.” Switch was designed for passive job seekers to browse openings discreetly and connect quickly. Originally, Switch served only the New York metro area technology sector while in early beta, but Tadmor always intended to expand into national coverage. Soon, the company started including all major metropolitan markets across the U.S. In May 2015, Switch announced it would start sourcing tech and media jobs from all the job boards available online. Later in 2015, Switch began to post jobs in smaller urban areas. The company also expanded industries and jobs to include restaurant staff, retail sales, healthcare, nursing and education. Tadmor subsequently founded Livekick, a one-on-one private fitness and yoga instruction company, based in New York. == Operation == In May 2015, Switch reported generating over 400,000 job applications. The company said that nine of the 50 largest websites in the U.S. were using the service. It had grown its customer base to thousands of companies in a few months from launch including Microsoft, Amazon, Facebook, IBM, Yahoo!, eBay, DropBox, SoundCloud, and Wikipedia. John Cline, software development manager at eBay, told ABC’s Good Morning America that Switch is now his “main way of finding new prospective employees.” Switch uses a double opt-in technique, meaning job seekers and employers must both say yes before moving forward. They also use swiping technology and intelligent matching algorithms to connect job seekers and employers. The user experience is different for each group, but the major attraction for both sides is the speed at which they can be connected. === Features === Swipe is a major aspect of the Switch user experience. Job seekers swipe to apply to jobs, or left to pass on positions. Employers respond and swipe right to reciprocate interest, or left to eliminate the candidate. Direct connection between job seekers and employers allows hiring managers and job seekers to start an immediate conversation. Hiring managers can message with job seekers within the app, and both parties can quickly vet one another and decide whether to move forward. Easy profile creation from social media and in-app profile editing helps job seekers focus on finding a job. === Users === Job Seekers can either load their profile manually or pull in professional credentials from social media. They can post validated photos on their Facebook account. Switch’s matching algorithm analyzes the job seeker’s location, experience, and skills to bring them jobs they may be interested in. Job seekers swipe to apply and, if the employer shows interest too, only then does Switch’s system reveal the job seeker’s identity to the corporate recruiter or hiring manager. The job seeker and hiring manager can then chat through the app. Employers behave similarly to job seekers. Hiring managers or corporate recruiters sign up online, add open positions, then view Switch-recommended candidates or wait for job seekers to swipe right. Employers can select relevant job seekers by swiping right on their profiles, then chat directly in the app. === Subscriptions === The app is currently free for users and employers. == Company overview == === Financials === Switch closed out its seed round in May 2015 with $2 million in seed round funding. Investors include Marker VC, Metamorphic, Rhodium, 500 Startups, BAM, SG VC and Marcel Legrand. In a July 2015 interview with Tadmor, he claimed that Switch had raised $2.4 million to date. == Reception == Thanks to its swipe technology and double opt-in make-up, the media often refers to Switch as the Tinder for jobs. Switch has received features in lists and app reviews as an effective tool to improve your digital job search, particularly on the mobile platform. “It’s minimal effort to connect with relevant matches,” said Good Morning America workplace contributor Tory Johnson. “Which is what everybody wants to find.”

    Read more →
  • List of Haskell software and tools

    List of Haskell software and tools

    This is a list of Haskell software and tools, including compilers, interpreters, build tools, package managers, integrated development environments, libraries, and other development utilities. == Compilers, interpreters and editors == Emacs — text editor Glasgow Haskell Compiler (GHC) Hugs — bytecode interpreter (discontinued) IntelliJ IDEA — IDE with Haskell support via plugins Vim — text editor Visual Studio Code — editor/IDE with Haskell support via extensions == Libraries and frameworks == Parsec — parser combinator library Servant — web framework Yesod — web framework == Build tools and package management == Cabal — build system and packaging infrastructure Haskell Platform — bundled distribution of Haskell tools and libraries (deprecated) Stack — build tool and dependency manager == Language tools and static analysis == Fourmolu — code formatter based on Ormolu Haskell Language Server — implementation of the Language Server Protocol for Haskell HLint — source code suggestion and linting tool Hoogle — Haskell API search engine Ormolu — code formatter Stan — static analysis tool Stylish Haskell — source code formatter == Interactive environments == GHCi — interactive REPL for the Glasgow Haskell Compiler IHaskell — Jupyter kernel for Haskell == Debugging and profiling tools == hp2ps — heap profiling visualization tool ThreadScope — parallel execution visualizer for Haskell programs == Documentation generators == Haddock — API documentation generator for Haskell == Parser and lexer generators == Alex — lexer generator for Haskell Happy — parser generator for Haskell == Testing frameworks == HUnit — unit testing framework QuickCheck — property-based testing library == Version control == Darcs — distributed version control system written in Haskell

    Read more →
  • Image stitching

    Image stitching

    Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally. == Applications == Image stitching is widely used in modern applications, such as the following: Document mosaicing Image stabilization feature in camcorders that use frame-rate image alignment High-resolution image mosaics in digital maps and satellite imagery Medical imaging Multiple-image super-resolution imaging Video stitching Object insertion == Process == The image stitching process can be divided into three main components: image registration, calibration, and blending. === Image stitching algorithms === In order to estimate image alignment, algorithms are needed to determine the appropriate mathematical model relating pixel coordinates in one image to pixel coordinates in another. Algorithms that combine direct pixel-to-pixel comparisons with gradient descent (and other optimization techniques) can be used to estimate these parameters. Distinctive features can be found in each image and then efficiently matched to rapidly establish correspondences between pairs of images. When multiple images exist in a panorama, techniques have been developed to compute a globally consistent set of alignments and to efficiently discover which images overlap one another. A final compositing surface onto which to warp or projectively transform and place all of the aligned images is needed, as are algorithms to seamlessly blend the overlapping images, even in the presence of parallax, lens distortion, scene motion, and exposure differences. === Image stitching issues === Since the illumination in two views cannot be guaranteed to be identical, stitching two images could create a visible seam. Other reasons for seams could be the background changing between two images for the same continuous foreground. Other major issues to deal with are the presence of parallax, lens distortion, scene motion, and exposure differences. In a non-ideal real-life case, the intensity varies across the whole scene, and so does the contrast and intensity across frames. Additionally, the aspect ratio of a panorama image needs to be taken into account to create a visually pleasing composite. For panoramic stitching, the ideal set of images will have a reasonable amount of overlap (at least 15–30%) to overcome lens distortion and have enough detectable features. The set of images will have consistent exposure between frames to minimize the probability of seams occurring. === Keypoint detection === Feature detection is necessary to automatically find correspondences between images. Robust correspondences are required in order to estimate the necessary transformation to align an image with the image it is being composited on. Corners, blobs, Harris corners, and differences of Gaussians of Harris corners are good features since they are repeatable and distinct. One of the first operators for interest point detection was developed by Hans Moravec in 1977 for his research involving the automatic navigation of a robot through a clustered environment. Moravec also defined the concept of "points of interest" in an image and concluded these interest points could be used to find matching regions in different images. The Moravec operator is considered to be a corner detector because it defines interest points as points where there are large intensity variations in all directions. This often is the case at corners. However, Moravec was not specifically interested in finding corners, just distinct regions in an image that could be used to register consecutive image frames. Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly. They needed it as a processing step to build interpretations of a robot's environment based on image sequences. Like Moravec, they needed a method to match corresponding points in consecutive image frames, but were interested in tracking both corners and edges between frames. SIFT and SURF are recent key-point or interest point detector algorithms but a point to note is that SURF is patented and its commercial usage restricted. Once a feature has been detected, a descriptor method like SIFT descriptor can be applied to later match them. === Registration === Image registration involves matching features in a set of images or using direct alignment methods to search for image alignments that minimize the sum of absolute differences between overlapping pixels. When using direct alignment methods one might first calibrate one's images to get better results. Additionally, users may input a rough model of the panorama to help the feature matching stage, so that e.g. only neighboring images are searched for matching features. Since there are smaller group of features for matching, the result of the search is more accurate and execution of the comparison is faster. To estimate a robust model from the data, a common method used is known as RANSAC. The name RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method for robust parameter estimation to fit mathematical models from sets of observed data points which may contain outliers. The algorithm is non-deterministic in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are performed. It being a probabilistic method means that different results will be obtained for every time the algorithm is run. The RANSAC algorithm has found many applications in computer vision, including the simultaneous solving of the correspondence problem and the estimation of the fundamental matrix related to a pair of stereo cameras. The basic assumption of the method is that the data consists of "inliers", i.e., data whose distribution can be explained by some mathematical model, and "outliers" which are data that do not fit the model. Outliers are considered points which come from noise, erroneous measurements, or simply incorrect data. For the problem of homography estimation, RANSAC works by trying to fit several models using some of the point pairs and then checking if the models were able to relate most of the points. The best model – the homography, which produces the highest number of correct matches – is then chosen as the answer for the problem; thus, if the ratio of number of outliers to data points is very low, the RANSAC outputs a decent model fitting the data. === Calibration === Image calibration aims to minimize differences between an ideal lens models and the camera-lens combination that was used, optical defects such as distortions, exposure differences between images, vignetting, camera response and chromatic aberrations. If feature detection methods were used to register images and absolute positions of the features were recorded and saved, stitching software may use the data for geometric optimization of the images in addition to placing the images on the panosphere. Panotools and its various derivative programs use this method. ==== Alignment ==== Alignment may be necessary to transform an image to match the view point of the image it is being composited with. Alignment, in simple terms, is a change in the coordinates system so that it adopts a new coordinate system which outputs image matching the required viewpoint. The types of transformations an image may go through are pure translation, pure rotation, a similarity transform which includes translation, rotation and scaling of the image which needs to be transformed, Affine or projective transform. Projective transformation is the farthest an image can transform (in the set of two dimensional planar transformations), where only visible features that are preserved in the transformed image are straight lines whereas parallelism is maintained in an affine transform. Projective transformation can be mathematically described as x ′ = H ⋅ x , {\displaystyle x'=H\cdot x,} where x {\displaystyle x} is points in the old coordinate system, x ′ {\displaystyle x'} is the corresponding points in the transformed image and H {\displaystyle H} is the homography matrix. Expressing the points x {\displaystyle x} and x ′ {\displaystyle x'} using the camera intrinsics ( K {\displaystyle K} and K ′ {\displaystyle K'} ) and its rotation and translation [ R t ] {\displaystyle [R\,t]} to the real-world coordinates X {\displaystyle X} and < m a t h > x {\displaystyle x} and x ′ {\displaystyle x'} ', we get Using the abo

    Read more →
  • Trello

    Trello

    Trello is a web-based, kanban-style list-making application developed by Atlassian. Created in 2011 by Fog Creek Software, it was spun out to form the basis of a separate company in New York City in 2014 and sold to Atlassian in January 2017. == History == The name Trello is derived from the word trellis, which had been a code name for the project at its early stages. Trello was released at a TechCrunch event by Fog Creek founder Joel Spolsky. In September 2011 Wired magazine named the application one of "The 7 Coolest Startups You Haven't Heard of Yet". Lifehacker said "it makes project collaboration simple and kind of enjoyable". In 2014, it raised US$10.3 million in funding from Index Ventures and Spark Capital. Prior to its acquisition, Trello had sold 22% of its shares to investors, with the remaining shares held by founders Michael Pryor and Joel Spolsky. In May 2016, Trello claimed it had more than 1.1 million daily active users and 14 million total signups. In May 2015, Trello expanded internationally with localized interfaces for Brazil, Germany, and Spain. In 2016 Trello launched the Power-Up platform, allowing 3rd party developers to build and distribute extensions known as Power-Ups to Trello. Initial integrations included Zendesk, SurveyMonkey and Giphy. By January 2022 there were a total of 247 power-ups listed in the Power-Up directory. On 9 January 2017, Atlassian announced its intent to acquire Trello for $425 million. The transaction was made with $360 million in cash and $65 million in shares and options. In December 2018, Trello announced its acquisition of Butler, a company that developed a leading power-up for automating tasks within a Trello board. Trello announced 35 million users in March 2019 and 50 million users in October 2019. In 2020 Craig Jones, then cybersecurity operations director at Sophos, found that the company exposed the personally identifiable information (PII) data of its users, exposed through public Trello boards; the researcher first tweeted about this issue in the year 2018. On 16 January 2024 Trello suffered a data breach containing over 15 million unique email addresses, names and usernames, when the data was posted on a popular hacking forum. The data was obtained by enumerating a publicly accessible resource using email addresses from previous breach corpuses; it was then added on 22 January 2024 to the famous website collecting data breaches "Have I Been Pwned?". == Uses == Users can create task boards with different columns and move the tasks between them. Typically columns include task statuses such as To Do, In Progress, Done. The tool can be used for personal and business purposes including real estate management, software project management, school bulletin boards, lesson planning, accounting, web design, gaming, and law office case management. == Architecture == According to a Fog Creek blog post in January 2012, the client was a thin web layer which downloads the main app, written in CoffeeScript and compiled to minified JavaScript, using Backbone.js, HTML5 .pushState(), and the Mustache templating language. The server was built on top of MongoDB, Node.js and a modified version of Socket.io. == Reception == On 26 January 2017, PC Magazine gave Trello a 3.5 / 5 rating, calling it "flexible" and saying that "you can get rather creative", while noting that "it may require some experimentation to figure out how to best use it for your team and the workload you manage."

    Read more →
  • Caffe (software)

    Caffe (software)

    Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework, originally developed at University of California, Berkeley. It is open source, under a BSD license. It is written in C++, with a Python interface. == History == Yangqing Jia created the Caffe project during his PhD at UC Berkeley, while working the lab of Trevor Darrell. The first version, called "DeCAF", made its first appearance in Spring 2013 when it was used for the ILSVRC challenge (later called ImageNet). The library was named Caffe and released to the public in December 2013. It reached end-of-support in 2018. It is hosted on GitHub. == Features == Caffe supports many different types of deep learning architectures geared towards image classification and image segmentation. It supports CNN, RCNN, LSTM and fully-connected neural network designs. Caffe supports GPU- and CPU-based acceleration computational kernel libraries such as Nvidia cuDNN and Intel MKL. == Applications == Caffe is being used in academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Yahoo! has also integrated Caffe with Apache Spark to create CaffeOnSpark, a distributed deep learning framework. == Caffe2 == In April 2017, Facebook announced Caffe2, which included new features such as recurrent neural network (RNN). At the end of March 2018, Caffe2 was merged into PyTorch.

    Read more →