AI Content Repurposing Service

AI Content Repurposing Service — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Bigram

    Bigram

    A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including in computational linguistics, cryptography, and speech recognition. Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar). == Applications == Bigrams, along with other n-grams, are used in most successful language models for speech recognition. Bigram frequency attacks can be used in cryptography to solve cryptograms. See frequency analysis. Bigram frequency is one approach to statistical language identification. Some activities in logology or recreational linguistics involve bigrams. These include attempts to find English words beginning with every possible bigram, or words containing a string of repeated bigrams, such as logogogue. == Bigram frequency in the English language == The frequency of the most common letter bigrams in a large English corpus is: th 3.56% of 1.17% io 0.83% he 3.07% ed 1.17% le 0.83% in 2.43% is 1.13% ve 0.83% er 2.05% it 1.12% co 0.79% an 1.99% al 1.09% me 0.79% re 1.85% ar 1.07% de 0.76% on 1.76% st 1.05% hi 0.76% at 1.49% to 1.05% ri 0.73% en 1.45% nt 1.04% ro 0.73% nd 1.35% ng 0.95% ic 0.70% ti 1.34% se 0.93% ne 0.69% es 1.34% ha 0.93% ea 0.69% or 1.28% as 0.87% ra 0.69% te 1.20% ou 0.87% ce 0.65%

    Read more →
  • Software configuration management

    Software configuration management

    Software configuration management (SCM), a.k.a. software change and configuration management (SCCM), is the software engineering practice of tracking and controlling changes to a software system. It is part of the larger cross-disciplinary field of configuration management (CM). SCM includes version control and the establishment of baselines. == Goals == The goals of SCM include: Configuration identification - Identifying configurations, configuration items and baselines. Configuration control - Implementing a controlled change process. This is usually achieved by setting up a change control board whose primary function is to approve or reject all change requests that are sent against any baseline. Configuration status accounting - Recording and reporting all the necessary information on the status of the development process. Configuration auditing - Ensuring that configurations contain all their intended parts and are sound with respect to their specifying documents, including requirements, architectural specifications and user manuals. Build management - Managing the process and tools used for builds. Process management - Ensuring adherence to the organization's development process. Environment management - Managing the software and hardware that host the system. Teamwork - Facilitate team interactions related to the process. Defect tracking - Making sure every defect has traceability back to the source. With the introduction of cloud computing and DevOps the purposes of SCM tools have become merged in some cases. The SCM tools themselves have become virtual appliances that can be instantiated as virtual machines and saved with state and version. The tools can model and manage cloud-based virtual resources, including virtual appliances, storage units, and software bundles. The roles and responsibilities of the actors have become merged as well with developers now being able to dynamically instantiate virtual servers and related resources. == History == == Examples == Ansible – Open-source software platform for remote configuring and managing computers CFEngine – Configuration management software Chef – Configuration management toolPages displaying short descriptions of redirect targets LCFG – Computer configuration management system NixOS – Linux distribution OpenMake Software – DevOps company Otter Puppet – Open source configuration management software Salt – Configuration management software Rex – Open source software

    Read more →
  • Decorrelation

    Decorrelation

    Decorrelation is a general term for any process that is used to reduce autocorrelation within a signal, or cross-correlation within a set of signals, while preserving other aspects of the signal. A frequently used method of decorrelation is the use of a matched linear filter to reduce the autocorrelation of a signal as far as possible. Since the minimum possible autocorrelation for a given signal energy is achieved by equalising the power spectrum of the signal to be similar to that of a white noise signal, this is often referred to as signal whitening. == Process == === Signal processing === Most decorrelation algorithms are linear, but there are also non-linear decorrelation algorithms. Many data compression algorithms incorporate a decorrelation stage. For example, many transform coders first apply a fixed linear transformation that would, on average, have the effect of decorrelating a typical signal of the class to be coded, prior to any later processing. This is typically a Karhunen–Loève transform, or a simplified approximation such as the discrete cosine transform. By comparison, sub-band coders do not generally have an explicit decorrelation step, but instead exploit the already-existing reduced correlation within each of the sub-bands of the signal, due to the relative flatness of each sub-band of the power spectrum in many classes of signals. Linear predictive coders can be modelled as an attempt to decorrelate signals by subtracting the best possible linear prediction from the input signal, leaving a whitened residual signal. Decorrelation techniques can also be used for many other purposes, such as reducing crosstalk in a multi-channel signal, or in the design of echo cancellers. In image processing decorrelation techniques can be used to enhance or stretch, colour differences found in each pixel of an image. This is generally termed as 'decorrelation stretching'. === Neuroscience === In neuroscience, decorrelation is used in the analysis of the neural networks in the human visual system. The raw inputs from cone cells and rod cells under go many steps of processing before it is handled by the visual cortex. These steps generally perform decorrelation, both spatial (surround suppression in the retina) and temporal (handling of movement in the lateral geniculate nucleus). === Cryptography === In cryptography, decorrelation is used in cipher design (see Decorrelation theory) and in the design of hardware random number generators.

    Read more →
  • Mentimeter

    Mentimeter

    Mentimeter (or Menti for short) is a Swedish company based in Stockholm that develops and maintains an eponymous app used to create presentations with real-time feedback. == Foundation and background == Based in Stockholm, Sweden, the Mentimeter app was started by Swedish entrepreneur Johnny Warström and Niklas Ingvar as a response to unproductive meetings. The initial start-up budget was $500,000 raised by a group of prominent investors, including Per Appelgren in 2014, following the market's tendency to invest in Scandinavia. The app also focuses on online collaboration for the education sector, allowing students or public members to answer questions anonymously. The app enables users to share knowledge and real-time feedback on mobile devices with presentations, polls or brainstorming sessions in classes, meetings, gatherings, conferences and other group activities. == Achievements == By 2021, Mentimeter had over 270 million users and was one of Sweden's fastest-growing startups. The company also ranked #10 on 20 Fastest Growing 500 Startups Batch 16 Companies. It was ranked Stockholm's fastest growing company of the 2018 edition of the DI Gasell Award. Mentimeter has a freemium business model.

    Read more →
  • Clipmap

    Clipmap

    In computer graphics, clipmapping is a method of clipping a mipmap to a subset of data pertinent to the geometry being displayed. This is useful for loading as little data as possible when memory is limited, such as on a graphics processing unit. The technique is used for LODing in NVIDIA’s implementation of voxel cone tracing. The high-resolution levels of the mipmapped scene representation are clipped to a region near the camera, while lower resolution levels are clipped further away. == MegaTexture == MegaTexture is a clipmap implementation developed by id Software. It was introduced in their id Tech 4 engine and also appeared in id Tech 5 and id Tech 6 before being removed in id Tech 7. MegaTexture is a texture allocation technique that uses a single, extremely large texture rather than repeating multiple smaller textures. It is also featured in Splash Damage's game Enemy Territory: Quake Wars, and was developed by id Software former technical director John Carmack. MegaTexture employs a single large texture space for static terrain. The texture is stored on removable media or a computer's hard drive and streamed as needed, allowing large amounts of detail and variation over a large area with comparatively little RAM usage. Depending on the pixel resolution per square meter, covering a large area could require several gigabytes of memory. However, RAM is also filled by the rest of the game and the underlying operating system, limiting the amount available for texturing. As the player moves around the game, different sections of the MegaTexture are loaded into memory. They are then scaled to the correct size and applied to the 3D models of the terrain. Id has presented a more advanced technique that builds upon the MegaTexture idea and virtualizes both the geometry and the textures to obtain unique geometry down to the equivalent of the texel: the sparse voxel octree (SVO). It works by raycasting the geometry represented by voxels (instead of triangles) stored in an octree. The goal is to stream parts of the octree into video memory, going further down along the tree for nearby objects to give them more details, and to use higher level, larger voxels for farther objects, which give an automatic level of detail (LOD) system for both geometry and textures at the same time. The geometric detail that can be obtained using this method is nearly infinite, which removes the need for faking 3-dimensional details with techniques such as normal mapping. Despite that most voxel rendering tests use very large amounts of memory (up to several GB), Jon Olick of id Software claimed the technology is able to compress such SVO to 1.15 bits per voxel of position data. == Virtual texturing == Unlike clipmaps, which clip each mip level around a viewpoint-dependent clipcenter and therefore work best for terrain, virtual texturing preprocesses texture data into equally sized tiles that can be streamed for arbitrary textured geometry. Rage, powered by the id Tech 5 engine, uses a more advanced technique called virtual texturing. Textures can measure up to 128000×128000 pixels and are also used for in-game models and sprites, etc. and not just the terrain. Wolfenstein: The New Order and the 2016 version of Doom also use these. Carmageddon: Reincarnation also uses virtual texturing, though unlike id's virtual texturing system, which is designed for unique texture-mapping everywhere, their system is designed to use storage space sparingly while still offering good blend of texture variation and resolution.

    Read more →
  • Foveated rendering

    Foveated rendering

    Foveated rendering is a rendering technique which uses an eye tracker integrated with a virtual reality headset to reduce the rendering workload by greatly reducing the image quality in the peripheral vision (outside of the zone gazed by the fovea). A less sophisticated variant called fixed foveated rendering doesn't utilise eye tracking and instead assumes a fixed focal point. == History == Research into foveated rendering dates back at least to 1991. At Tech Crunch Disrupt SF 2014, Fove unveiled a headset featuring foveated rendering. This was followed by a successful kickstarter in May 2015. At CES 2016, SensoMotoric Instruments (SMI) demoed a new 250 Hz eye tracking system and a working foveated rendering solution. It resulted from a partnership with camera sensor manufacturer Omnivision who provided the camera hardware for the new system. In July 2016, Nvidia demonstrated during SIGGRAPH a new method of foveated rendering claimed to be invisible to users. In February 2017, Qualcomm announced their Snapdragon 835 Virtual Reality Development Kit (VRDK) which includes foveated rendering support called Adreno Foveation. == Use == According to chief scientist Michael Abrash at Oculus, utilising foveated rendering in conjunction with sparse rendering and deep learning image reconstruction has the potential to require an order of magnitude fewer pixels to be rendered in comparison to a full image. Later, these results have been demonstrated and published. In December 2019, fixed foveated rendering support was added to the Oculus Quest SDK. A number of VR headsets have included on-board eye tracking to provide support for foveated rendering, including HTC's Vive Pro Eye (2019), Meta Quest Pro (2022), PlayStation VR2 (2023), and Apple Vision Pro (2024). In 2025, Valve announced the upcoming Steam Frame headset, which applies a variation of the technique known as "foveated streaming" for wireless streaming from a PC to the headset; the method similarly uses variance in bit rate, and is performed at the encoder level rather than the software level.

    Read more →
  • Observability (software)

    Observability (software)

    In software engineering, more specifically in distributed computing, observability is the ability to collect data about programs' execution, modules' internal states, and the communication among components. To improve observability, software engineers use a wide range of logging and tracing techniques to gather telemetry information, and tools to analyze and use it. Observability is foundational to site reliability engineering, as it is the first step in triaging a service outage. One of the goals of observability is to minimize the amount of prior knowledge needed to debug an issue. == Etymology, terminology and definition == The term is borrowed from control theory, where the "observability" of a system measures how well its state can be determined from its outputs. Similarly, software observability measures how well a system's state can be understood from the obtained telemetry (metrics, logs, traces, profiling). The definition of observability varies by vendor: Observability is the process of making a system’s internal state more transparent. Systems are made observable by the data they produce, which in turn helps you to determine if your infrastructure or application is healthy and functioning normally. a measure of how well you can understand and explain any state your system can get into, no matter how novel or bizarre [...] without needing to ship new code software tools and practices for aggregating, correlating and analyzing a steady stream of performance data from a distributed application along with the hardware and network it runs onobservability starts by shipping all your raw data to central service before you begin analysisthe ability to measure a system’s current state based on the data it generates, such as logs, metrics, and traces Observability is tooling or a technical solution that allows teams to actively debug their system. Observability is based on exploring properties and patterns not defined in advance. proactively collecting, visualizing, and applying intelligence to all of your metrics, events, logs, and traces—so you can understand the behavior of your complex digital system The term is frequently referred to as its numeronym o11y (where 11 stands for the number of letters between the first letter and the last letter of the word). This is similar to other computer science abbreviations such as i18n and l10n and k8s. === Observability vs. monitoring === Observability and monitoring are sometimes used interchangeably. As tooling, commercial offerings and practices evolved in complexity, "monitoring" was re-branded as observability in order to differentiate new tools from the old. The terms are commonly contrasted in that systems are monitored using predefined sets of telemetry, and monitored systems may be observable. Majors et al. suggest that engineering teams that only have monitoring tools end up relying on expert foreknowledge (seniority), whereas teams that have observability tools rely on exploratory analysis (curiosity). == Telemetry types == Observability relies on three main types of telemetry data: metrics, logs and traces. Those are often referred to as "pillars of observability". === Metrics === A metric is a point in time measurement (scalar) that represents some system state. Examples of common metrics include: number of HTTP requests per second; total number of query failures; database size in bytes; time in seconds since last garbage collection. Monitoring tools are typically configured to emit alerts when certain metric values exceed set thresholds. Thresholds are set based on knowledge about normal operating conditions and experience. Metrics are typically tagged to facilitate grouping and searchability. Application developers choose what kind of metrics to instrument their software with, before it is released. As a result, when a previously unknown issue is encountered, it is impossible to add new metrics without shipping new code. Furthermore, their cardinality can quickly make the storage size of telemetry data prohibitively expensive. Since metrics are cardinality-limited, they are often used to represent aggregate values (for example: average page load time, or 5-second average of the request rate). Without external context, it is impossible to correlate between events (such as user requests) and distinct metric values. === Logs === Logs, or log lines, are generally free-form, unstructured text blobs that are intended to be human readable. Modern logging is structured to enable machine parsability. As with metrics, an application developer must instrument the application upfront and ship new code if different logging information is required. Logs typically include a timestamp and severity level. An event (such as a user request) may be fragmented across multiple log lines and interweave with logs from concurrent events. === Traces === ==== Distributed traces ==== A cloud native application is typically made up of distributed services which together fulfill a single request. A distributed trace is an interrelated series of discrete events (also called spans) that track the progression of a single user request. A trace shows the causal and temporal relationships between the services that interoperate to fulfill a request. Instrumenting an application with traces means sending span information to a tracing backend. The tracing backend correlates the received spans to generate presentable traces. To be able to follow a request as it traverses multiple services, spans are labeled with unique identifiers that enable constructing a parent-child relationship between spans. Span information is typically shared in the HTTP headers of outbound requests. === Continuous profiling === Continuous profiling is another telemetry type used to precisely determine how an application consumes resources. === Instrumentation === To be able to observe an application, telemetry about the application's behavior needs to be collected or exported. Instrumentation means generating telemetry alongside the normal operation of the application. Telemetry is then collected by an independent backend for later analysis. In fast-changing systems, instrumentation itself is often the best possible documentation, since it combines intention (what are the dimensions that an engineer named and decided to collect?) with the real-time, up-to-date information of live status in production. Instrumentation can be automatic, or custom. Automatic instrumentation offers blanket coverage and immediate value; custom instrumentation brings higher value but requires more intimate involvement with the instrumented application. Instrumentation can be native - done in-code (modifying the code of the instrumented application) - or out-of-code (e.g. sidecar, eBPF). Verifying new features in production by shipping them together with custom instrumentation is a practice called "observability-driven development". == "Pillars of observability" == Metrics, logs and traces are most commonly listed as the pillars of observability. Majors et al. suggest that the pillars of observability are high cardinality, high-dimensionality, and explorability, arguing that runbooks and dashboards have little value because "modern systems rarely fail in precisely the same way twice." == Self monitoring == Self monitoring is a practice where observability stacks monitor each other, in order to reduce the risk of inconspicuous outages. Self monitoring may be put in place in addition to high availability and redundancy to further avoid correlated failures.

    Read more →
  • Cloud computing

    Cloud computing

    Cloud computing is defined by the International Organization for Standardization (ISO) as "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on demand". It is commonly referred to as "the cloud". == Characteristics == In 2011, the National Institute of Standards and Technology (NIST) identified five "essential characteristics" for cloud systems. Below are the exact definitions according to NIST: On-demand self-service: "A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider." Broad network access: "Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations)." Resource pooling: " The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand." Rapid elasticity: "Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear unlimited and can be appropriated in any quantity at any time." Measured service: "Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service. By 2023, the International Organization for Standardization (ISO) had expanded and refined the list. == History == The history of cloud computing extends to the 1960s, with the initial concepts of time-sharing becoming popularized via remote job entry (RJE). The "data center" model, where users submitted jobs to operators to run on mainframes, was predominantly used during this era. This period saw broad experimentation with making large-scale computing power more accessible through time-sharing, while optimizing infrastructure, platforms, and applications to improve efficiency for end users. The "cloud" metaphor for virtualized services dates to 1994, when it was used by General Magic for the universe of "places" that mobile agents in the Telescript environment could "go". The metaphor is credited to David Hoffman, a General Magic communications specialist, based on its long-standing use in networking and telecom. The expression cloud computing became more widely known in 1996 when Compaq Computer Corporation drew up a business plan for future computing and the Internet. The company's ambition was to supercharge sales with "cloud computing-enabled applications". The business plan foresaw that online consumer file storage would likely be commercially successful. As a result, Compaq decided to sell server hardware to internet service providers. In the 2000s, the application of cloud computing began to take shape with the establishment of Amazon Web Services (AWS) in 2002, which allowed developers to build applications independently. In 2006 Amazon Simple Storage Service, known as Amazon S3, and the Amazon Elastic Compute Cloud (EC2) were released. In 2008 NASA's development of the first open-source software for deploying private and hybrid clouds. The following decade saw the launch of various cloud services. In 2010, Microsoft launched Microsoft Azure, and Rackspace Hosting and NASA initiated an open-source cloud-software project, OpenStack. IBM introduced the IBM SmartCloud framework in 2011, and Oracle announced the Oracle Cloud in 2012. In December 2019, Amazon launched AWS Outposts, a service that extends AWS infrastructure, services, APIs, and tools to customer data centers, co-location spaces, or on-premises facilities. == Value proposition == Cloud computing can shorten time to market by offering pre-configured tools, scalable resources, and managed services, allowing users to focus on core business value rather than maintaining infrastructure. Cloud platforms can enable organizations and individuals to reduce upfront capital expenditures on physical infrastructure by shifting to an operational expenditure model, where costs scale with usage. Cloud platforms also offer managed services and tools, such as artificial intelligence, data analytics, and machine learning, which might otherwise require significant in-house expertise and infrastructure investment. While cloud computing can offer cost advantages through effective resource optimization, organizations often face challenges such as unused resources, inefficient configurations, and hidden costs without proper oversight and governance. Many cloud platforms provide cost management tools, such as AWS Cost Explorer and Azure Cost Management, and frameworks like FinOps have emerged to standardize financial operations in the cloud. Cloud computing also facilitates collaboration, remote work, and global service delivery by enabling secure access to data and applications from any location with an internet connection. Cloud providers offer various redundancy options for core services, such as managed storage and managed databases, though redundancy configurations often vary by service tier. Advanced redundancy strategies, such as cross-region replication or failover systems, typically require explicit configuration and may incur additional costs or licensing fees. Cloud environments operate under a shared responsibility model, where providers are typically responsible for infrastructure security, physical hardware, and software updates, while customers are accountable for data encryption, identity and access management (IAM), and application-level security. These responsibilities vary depending on the cloud service model—Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS)—with customers typically having more control and responsibility in IaaS environments and progressively less in PaaS and SaaS models, often trading control for convenience and managed services. == Adoption and suitability == The decision to adopt cloud computing or maintain on-premises infrastructure depends on factors such as scalability, cost structure, latency requirements, regulatory constraints, and infrastructure customization. Organizations with variable or unpredictable workloads, limited capital for upfront investments, or a focus on rapid scalability benefit from cloud adoption. Startups, SaaS companies, and e-commerce platforms often prefer the pay-as-you-go operational expenditure (OpEx) model of cloud infrastructure. Additionally, companies prioritizing global accessibility, remote workforce enablement, disaster recovery, and leveraging advanced services such as AI/ML and analytics are well-suited for the cloud. In recent years, some cloud providers have started offering specialized services for high-performance computing and low-latency applications, addressing some use cases previously exclusive to on-premises setups. On the other hand, organizations with strict regulatory requirements, highly predictable workloads, or reliance on deeply integrated legacy systems may find cloud infrastructure less suitable. Businesses in industries like defense, government, or those handling highly sensitive data often favor on-premises setups for greater control and data sovereignty. Additionally, companies with ultra-low latency requirements, such as high-frequency trading (HFT) firms, rely on custom hardware (e.g., FPGAs) and physical proximity to exchanges, which most cloud providers cannot fully replicate despite recent advancements. Similarly, tech giants like Google, Meta, and Amazon build their own data centers due to economies of scale, predictable workloads, and the ability to customize hardware and network infrastructure for optimal efficiency. However, these companies also use cloud services selectively for certain workloads and applications where it aligns with their operational needs. In practice, many organizations are increasingly adopting hybrid cloud architectures, combining on-premises infrastructure with cloud services. This approach allows businesses to balance scalability, cost-effectiveness, and control, offering the benefits of both deployment models while mitigating their respective limitations. == Challenges and limitations == One of the primary challenges of cloud computing, compared with traditional on-premises systems, is maintaining data security and privacy. Cloud users entrust their sensitive data to third-party providers, who may not have adequate measures to protect it from unau

    Read more →
  • Conditional random field

    Conditional random field

    Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. The kind of graph used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions. Other examples where CRFs are used are: labeling or parsing of sequential data for natural language processing or biological sequences, part-of-speech tagging, shallow parsing, named entity recognition, gene finding, peptide critical functional region finding, and object recognition and image segmentation in computer vision. == Description == CRFs are a type of discriminative undirected probabilistic graphical model. Lafferty, McCallum and Pereira define a CRF on observations X {\displaystyle {\boldsymbol {X}}} and random variables Y {\displaystyle {\boldsymbol {Y}}} as follows: Let G = ( V , E ) {\displaystyle G=(V,E)} be a graph such that Y = ( Y v ) v ∈ V {\displaystyle {\boldsymbol {Y}}=({\boldsymbol {Y}}_{v})_{v\in V}} , so that Y {\displaystyle {\boldsymbol {Y}}} is indexed by the vertices of G {\displaystyle G} . Then ( X , Y ) {\displaystyle ({\boldsymbol {X}},{\boldsymbol {Y}})} is a conditional random field when each random variable Y v {\displaystyle {\boldsymbol {Y}}_{v}} , conditioned on X {\displaystyle {\boldsymbol {X}}} , obeys the Markov property with respect to the graph; that is, its probability is dependent only on its neighbours in G and not its past states: P ( Y v | X , { Y w : w ≠ v } ) = P ( Y v | X , { Y w : w ∼ v } ) {\displaystyle P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\neq v\})=P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\sim v\})} , where w ∼ v {\displaystyle {\mathit {w}}\sim v} means that w {\displaystyle w} and v {\displaystyle v} are neighbors in G {\displaystyle G} . What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets X {\displaystyle {\boldsymbol {X}}} and Y {\displaystyle {\boldsymbol {Y}}} , the observed and output variables, respectively; the conditional distribution p ( Y | X ) {\displaystyle p({\boldsymbol {Y}}|{\boldsymbol {X}})} is then modeled. === Inference === For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible: If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithm for the case of HMMs. If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions. If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include: Loopy belief propagation Alpha expansion Mean field inference Linear programming relaxations === Parameter learning === Learning the parameters θ {\displaystyle \theta } is usually done by maximum likelihood learning for p ( Y i | X i ; θ ) {\displaystyle p(Y_{i}|X_{i};\theta )} . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex. It can be solved for example using gradient descent algorithms, or Quasi-Newton methods such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used. === Examples === In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables X {\displaystyle X} represents a sequence of observations and Y {\displaystyle Y} represents a hidden (or unknown) state variable that needs to be inferred given the observations. The Y i {\displaystyle Y_{i}} are structured to form a chain, with an edge between each Y i − 1 {\displaystyle Y_{i-1}} and Y i {\displaystyle Y_{i}} . As well as having a simple interpretation of the Y i {\displaystyle Y_{i}} as "labels" for each element in the input sequence, this layout admits efficient algorithms for: model training, learning the conditional distributions between the Y i {\displaystyle Y_{i}} and feature functions from some corpus of training data. decoding, determining the probability of a given label sequence Y {\displaystyle Y} given X {\displaystyle X} . inference, determining the most likely label sequence Y {\displaystyle Y} given X {\displaystyle X} . The conditional dependency of each Y i {\displaystyle Y_{i}} on X {\displaystyle X} is defined through a fixed set of feature functions of the form f ( i , Y i − 1 , Y i , X ) {\displaystyle f(i,Y_{i-1},Y_{i},X)} , which can be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for Y i {\displaystyle Y_{i}} . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for Y i {\displaystyle Y_{i}} . Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence. Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence X {\displaystyle X} at any point during inference, and the range of the feature functions need not have a probabilistic interpretation. == Variants == === Higher-order CRFs and semi-Markov CRFs === CRFs can be extended into higher order models by making each Y i {\displaystyle Y_{i}} dependent on a fixed number k {\displaystyle k} of previous variables Y i − k , . . . , Y i − 1 {\displaystyle Y_{i-k},...,Y_{i-1}} . In conventional formulations of higher order CRFs, training and inference are only practical for small values of k {\displaystyle k} (such as k ≤ 5), since their computational cost increases exponentially with k {\displaystyle k} . However, another recent advance has managed to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely-long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer (SM), a nonparametric Bayesian model for learning infinitely-long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions (which are driven by an SM). This allows for devising efficient approximate training and inference algorithms for the model, without undermining its capability to capture and model temporal dependencies of arbitrary length. There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence Y {\displaystyle Y} . This provides much of the power of higher-order CRFs to model long-range dependencies of the Y i {\displaystyle Y_{i}} , at a reasonable computational cost. Finally, large-margin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs. === Latent-dynamic conditional random field === Latent-dynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively. In an LDCRF, like in any sequence tagging task, given a sequence of observations x = x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} , the main problem the model must solve is how to assign a sequence of labels y = y 1 , … , y n {\displaystyle y_{1},\dots ,y_{n}} from one finite set

    Read more →
  • GNU toolchain

    GNU toolchain

    The GNU toolchain is a broad collection of programming tools produced by the GNU Project. These tools form a toolchain (a suite of tools used in a serial manner) used for developing software applications and operating systems. The GNU toolchain plays a vital role in development of Linux, some BSD systems, and software for embedded systems. Parts of the GNU toolchain are also directly used with or ported to other platforms such as Solaris, macOS, Microsoft Windows (via Cygwin and MinGW/MSYS/WSL2), Sony PlayStation Portable (used by PSP modding scene) and Sony PlayStation 3. == Components == Projects in the GNU toolchain are: GNU Autotools (build system) – Software build toolset from GNU GNU Binutils – GNU software development tools for executable code GNU Bison – Yacc-compatible parser generator program GNU C Library – GNU implementation of the standard C libraryPages displaying short descriptions of redirect targets GNU Compiler Collection – Free and open-source compiler for various programming languages GNU Debugger – Source-level debugger GNU m4 – General-purpose macro processor GNU make – Software build automation tool

    Read more →
  • Likewise, Inc.

    Likewise, Inc.

    Likewise, Inc., is an American technology startup company which provides a social networking service for finding and saving content recommendations for movies, TV shows, books, and podcasts. A team of ex-Microsoft employees founded Likewise in October 2017 with financial investment from Microsoft co-founder Bill Gates. The company is led by CEO Ian Morris and as of 2020 had a team of about 35 employees. Its headquarters operates in Bellevue, Washington. As of July 2020, 1 million users had joined the platform. == History == === Ideation (October 2017) === In 2017, former Microsoft Communications Chief Larry Cohen came up with the idea for Likewise in Bill Gates’ private office, Gates Ventures. Cohen currently serves as Gates Ventures’ CEO and managing partner. Cohen collaborated with colleagues Michael Dix and Ian Morris to co-found what would become Likewise, with Morris as its CEO. Gates funded the company's early development. The company developed its platform in stealth mode before launching publicly in October 2018. === Release (October 2018) === Likewise officially released its platform in the US and Canada on October 3, 2018. === Growth (2020 COVID-19 pandemic) === Likewise experienced accelerated growth alongside the COVID-19 pandemic. From March 2020 to July 2020, the platform's monthly active users tripled in numbers. The company reached one million users in July 2020. == Applications == === Mobile === Likewise is available as a mobile app for the Android and iOS mobile operating systems. Users receive recommendations from the Likewise algorithm, people they follow, and the Likewise editorial team. === Likewise TV === In October 2019, the company launched its Apple TV app called Likewise TV. The television app organizes shows across streaming services under one watchlist. On July 20, 2020, Likewise TV expanded to Android TV and Amazon Fire TV users.

    Read more →
  • Circular thresholding

    Circular thresholding

    Circular thresholding is an algorithm for automatic image threshold selection in image processing. Most threshold selection algorithms assume that the values (e.g. intensities) lie on a linear scale. However, some quantities such as hue and orientation are a circular quantity, and therefore require circular thresholding algorithms. The example shows that the standard linear version of Otsu's method when applied to the hue channel of an image of blood cells fails to correctly segment the large white blood cells (leukocytes). In contrast the white blood cells are correctly segmented by the circular version of Otsu's method. == Methods == There are a relatively small number of circular image threshold selection algorithms. The following examples are all based on Otsu's method for linear histograms: (Tseng, Li and Tung 1995) smooth the circular histogram, and apply Otsu's method. The histogram is cyclically rotated so that the selected threshold is shifted to zero. Otsu's method and histogram rotation are applied iteratively until several heuristics involving class size, threshold location, and class variance are satisfied. (Wu et al. 2006) smooth the circular histogram until it contains only two peaks. The histogram is cyclically rotated so that the midpoint between the peaks is shifted to zero. Otsu's method and histogram rotation are applied iteratively until convergence of the threshold. (Lai and Rosin 2014) applied Otsu's method to the circular histogram. For the two class circular thresholding task they showed that, for a histogram with an even number of bins, the optimal solution for Otsu's criterion of within-class variance is obtained when the histogram is split into two halves. Therefore the optimal solution can be efficiently obtained in linear rather than quadratic time. == References and further reading == D.-C. Tseng, Y.-F. Li, and C.-T. Tung, Circular histogram thresholding for color image segmentation in Proc. Int. Conf. Document Anal. Recognit., 1995, pp. 673–676. J. Wu, P. Zeng, Y. Zhou, and C. Olivier, A novel color image segmentation method and its application to white blood cell image analysis in Proc. Int. Conf. Signal Process., vol. 2. 2006, pp. 16–20. Y.K. Lai, P.L. Rosin, Efficient Circular Thresholding, IEEE Trans. on Image Processing 23(3), 992–1001 (2014). doi:10.1109/TIP.2013.2297014

    Read more →
  • Pydio

    Pydio

    Pydio Cells, previously known as just Pydio and formerly known as AjaXplorer, is an open-source file-sharing and synchronisation software that runs on the user's own server or in the cloud. == Presentation == The project was created by musician Charles Du Jeu (current CEO and CTO) in 2007 under the name AjaXplorer. The name was changed in 2013 and became Pydio (an acronym for Put Your Data in Orbit). In May 2018, Pydio switched from PHP to Go with the release of Pydio Cells. The PHP version reached end-of-life state on 31 December 2019. Pydio Cells runs on any server supporting a recent Go version. Windows/Linux/macOS on the Intel architecture are directly supported; a fully functional working ARM implementation is under active development. Pydio Cells has been developed from scratch using the Go programming language; release 4.0.0 introduced code refactoring to fully support the Go modular structure as well as grid computing. Nevertheless, the web-based interface of Cells is very similar to the one from Pydio 8 (in PHP), and it successfully replicates most of its features, while adding a few more. There is also a new synchronisation client (also written in Go). The PHP version has been phased out as the company's focus is moving to Pydio Cells, with community feedback on the new features. According to the company, the switch to the new environment was made "to overcome inherent PHP limitations and provide you with a future-proof and modern solution for collaborating on documents". From a technical point of view, Pydio differs from solutions such as Google Drive or Dropbox. Pydio is not based on a public cloud; instead, the software connects to the user's existing storage (such as SAN / Local FS, SAMBA / CIFS, (s)FTP, NFS, S3-compatible cloud storage, Azure Blob Storage, Google Cloud Storage) as well as to the existing user directories (LDAP / AD, OAuth2 / OIDC SSO, SAML / Azure ADFS SSO, RADIUS, Shibboleth...), which allows companies to keep their data inside their infrastructure, according to their data security policy and user rights management. The software is built in a modular perspective; up to Pydio 8, various plugins allowed administrators to implement extra features. On the server side, Pydio Cells is deployed as a collection of independent microservices communicating among themselves using gRPC and logging user actions via Activity Streams 2.0 (AS2). Pydio Cells microservices are built with the Go Micro framework (using an embedded NATS server). A standard installation will deploy all required services on the same physical server, but for the purposes of performance, reliability and high availability, these can now be spread across several different servers (even in geographically separate locations) according to the 12-factors architecture pattern. Pydio Cells is available either through a free and open-source community distribution (Pydio Cells Home), or a commercially-licensed enterprise distribution (in two variants, Pydio Cells Connect and Pydio Cells Enterprise), which add features not available in the community distribution as well as additional levels of support beyond the community forums. == Features == File sharing between different internal users and across other Pydio instances SSL/TLS Encryption WebDAV file server Creation of dedicated workspaces, for each line of business / project / client, with a dedicated user rights management for each workspace. File-sharing with external users (private links, public links, password protection, download limitation, etc.) Online viewing and editing of documents with Collabora Office (Pydio Cells Enterprise also offers OnlyOffice integration) Preview and editing of image files Integrated audio and video reader Activity stream ('timeline') for all actions taken by users Integrated chat platform Client applications are available for all major desktop and mobile platforms.

    Read more →
  • Mobile simulator

    Mobile simulator

    A mobile simulator is a software application for a personal computer which creates a virtual machine version of a mobile device, such as a mobile phone, iPhone, other smartphone, or calculator, on the computer. This may sometimes also be termed an emulator. The mobile simulator allows the user to use features and run applications on the virtual mobile on their computer as though it was the actual mobile device. A mobile simulator lets you test a website and determine how well it performs on various types of mobile devices. A good simulator tests mobile content quickly on multiple browsers and emulates several device profiles simultaneously. This allows analysis of mobile content in real-time, locate errors in code, view rendering in an environment that simulates the mobile browser, and optimize the site for performance. Mobile simulators may be developed using programming languages such as Java, .NET and JavaScript.

    Read more →
  • XLeratorDB

    XLeratorDB

    XLeratorDB is a suite of database function libraries that enable Microsoft SQL Server to perform a wide range of additional (non-native) business intelligence and ad hoc analytics. The libraries, which are embedded and run centrally on the database, include more than 450 individual functions similar to those found in Microsoft Excel spreadsheets. The individual functions are grouped and sold as six separate libraries based on usage: finance, statistics, math, engineering, unit conversions and strings. WestClinTech, the company that developed XLeratorDB, claims it is "the first commercial function package add-in for Microsoft SQL Server." == Company history == WestClinTech (LLC), founded by software industry veterans Charles Flock and Joe Stampf in 2008, is located in Irvington, New York, United States. Flock was a co-founder of The Frustum Group, developer of the OPICS enterprise banking and trading platform, which was acquired by London-based Misys, PLC in 1996. Stampf joined Frustum in 1994 and with Flock remained active with the company after acquisition, helping to develop successive generations of OPICS now employed by over 150 leading financial institutions worldwide. Following a full year of research, development and testing, WestClinTech introduced and recorded its first commercial sale of XLeratorDB in April 2009. In September 2009, XLeratorDB became available to all Federal agencies through NASA's Strategic Enterprise-Wide Procurement (SEWP-IV) program, a government-wide acquisition contract. == Technology == XLeratorDB uses Microsoft SQL CLR(Common Language Runtime) technology. SQL CLR allows managed code to be hosted by, and run in, the Microsoft SQL Server environment. SQL CLR relies on the creation, deployment and registration of .NET Framework assemblies that are physically stored in managed code dynamic-link libraries (DLL). The assemblies may contain .NET namespaces, classes, functions, and properties. Because managed code compiles to native code prior to execution, functions using SQL CLR can achieve significant performance increases versus the equivalent functions written in T-SQL in some scenarios. XLeratorDB requires Microsoft SQL Server 2005 or SQL Server 2005 Express editions, or later (compatibility mode 90 or higher). The product installs with PERMISSION_SET=SAFE. SAFE mode, the most restrictive permission set, is accessible by all users. Code executed by an assembly with SAFE permissions cannot access external system resources such as files, the network, the internet, environment variables, or the registry. == Functions == In computer science, a function is a portion of code within a larger program which performs a specific task and is relatively independent of the remaining code. As used in database and spreadsheet applications these functions generally represent mathematical formulas widely used across a variety of fields. While this code may be user-generated, it is also embedded as a pre-written sub-routine in applications. These functions are typically identified by common nomenclature which corresponds to their underlying operations: e.g. IRR identifies the function which calculates Internal Rate of Return on a series of periodic cash flows. === Function uses === As subroutines, functions can be integrated and used in a variety of ways, and as part of larger, more complicated applications. Within large enterprise applications they may, for example, play an important role in defining business rules or risk management parameters, while remaining virtually invisible to end users. Within database management systems and spreadsheets, however, these kinds of functions also represent discrete sets of tools; they can be accessed directly and utilized on a stand-alone basis, or in more complex, user-defined configurations. In this context, functions can be used for business intelligence and ad hoc analysis of data in fields such as finance, statistics, engineering, math, etc. === Function types === XLeratorDB uses three kinds of functions to perform analytic operations: scalar, aggregate, and a hybrid form which WestClinTech calls Range Queries. Scalar functions take a single value, perform an operation and return a single value. An example of this type of function is LOG, which returns the logarithm of a number to a specified base. Aggregate functions operate on a series of values but return a single, summarizing value. An example of this type of function is AVG, which returns the average of values in a specified group. In XLeratorDB there are some functions which have characteristics of aggregate functions (operating on multiple series of values) but cannot be processed in SQL CLR using single column inputs, such as AVG does. For example, irregular internal rate of return (XIRR), a financial function, operates on a collection of cash flow values from one column, but must also apply variable period lengths from another column and an initial iterative assumption from a third, in order to return a single, summarizing value. WestClinTech documentation notes that Range Queries specify the data to be included in the result set of the function independently of the WHERE clause associated with the T-SQL statement, by incorporating a SELECT statement into the function as a string argument; the function then traps that SELECT statement, executes it internally and processes the result. Some XLeratorDB functions that employ Range Queries are: NPV, XNPV, IRR, XIRR, MIRR, MULTINOMIAL, and SERIESSUM. Within the application these functions are identified by a "_q" naming convention: e.g. NPV_q, IRR_q, etc. == Analytic functions == === SQL Server functions === Microsoft SQL Server is the #3 selling database management system (DBMS), behind Oracle and IBM. (While versions of SQL Server have been on the market since 1987, XLeratorDB is compatible with only the 2005 edition and later.) Like all major DBMS, SQL Server performs a variety of data mining operations by returning or arraying data in different views (also known as drill-down). In addition, SQL Server uses Transact-SQL (T-SQL) to execute four major classes of pre-defined functions in native mode. Functions operating on the DBMS offer several advantages over client layer applications like Excel: they utilize the most up-to-date data available; they can process far larger quantities of data; and, the data is not subject to exporting and transcription errors. SQL Server 2008 includes a total of 58 functions that perform relatively basic aggregation (12), math (23) and string manipulation (23) operations useful for analytics; it includes no native functions that perform more complex operations directly related to finance, statistics or engineering. === Excel functions === Microsoft Excel, a component of Microsoft Office suite, is one of the most widely used spreadsheet applications on the market today. In addition to its inherent utility as a stand-alone desktop application, Excel overlaps and complements the functionality of DBMS in several ways: storing and arraying data in rows and columns; performing certain basic tasks such as pivot table and aggregating values; and facilitating sharing, importing and exporting of database data. Excel's chief limitation relative to a true database is capacity; Excel 2003 is limited to some 65k rows and 256 columns; Excel 2007 extends this capacity to roughly 1million rows and 16k columns. By comparison, SQL Server is able to manage over 500k terabytes of memory. Excel offers, however, an extensive library of specialized pre-written functions which are useful for performing ad hoc analysis on database data. Excel 2007 includes over 300 of these pre-defined functions, although customized functions can also be created by users, or imported from third party developers as add-ons. Excel functions are grouped by type: === Excel business intelligence functions === Operating on the client computing layer Excel plays an important role as a business intelligence tool because it: performs a wide array of complex analytic functions not native to most DBMS software offers far greater ad hoc reporting and analytic flexibility than most enterprise software provides a medium for sharing and collaborating because of its ubiquity throughout the enterprise Microsoft reinforces this positioning with Business Intelligence documentation that positions Excel in a clearly pivotal role. === XLeratorDB vs. Excel functions === While operating within the database environment, XLeratorDB functions utilize the same naming conventions and input formats, and in most cases, return the same calculation results as Excel functions. XLeratorDB, coupled with SQL Server's native capabilities, compares to Excel's function sets as follows:

    Read more →