AI Detector Excel File


  • Image texture


    An image texture is the small-scale structure perceived in an image, based on the spatial arrangement of color or intensities. It can be quantified by a set of metrics calculated in image processing. Image texture metrics give information about the whole image or about selected regions. Image textures can be artificially created or found in natural scenes captured in an image. Texture is one of the cues that can be used to help in segmentation or classification of images. For more accurate segmentation the most useful features are spatial frequency and average grey level. To analyze an image texture in computer graphics, there are two approaches: the structured approach and the statistical approach.

    == Structured approach ==
    A structured approach sees an image texture as a set of primitive texels in some regular or repeated pattern. This works well when analyzing artificial textures. To obtain a structured description, the spatial relationship of the texels is characterized by using a Voronoi tessellation of the texels.

    == Statistical approach ==
    A statistical approach sees an image texture as a quantitative measure of the arrangement of intensities in a region. In general this approach is easier to compute and is more widely used, since natural textures are made of patterns of irregular subelements.

    === Edge detection ===
    Edge detection is used to determine the number of edge pixels in a specified region, which helps characterize texture complexity. After edges have been found, the direction of the edges can also be used as a characteristic of texture and can be useful in determining patterns in the texture. These directions can be represented as an average or in a histogram. Consider a region with N pixels. A gradient-based edge detector is applied to this region, producing two outputs for each pixel p: the gradient magnitude Mag(p) and the gradient direction Dir(p).
    The edgeness per unit area can be defined by $F_{edgeness} = \frac{|\{p \mid Mag(p) > T\}|}{N}$ for some threshold T. To include orientation with edgeness, histograms of both gradient magnitude and gradient direction can be used. $H_{mag}(R)$ denotes the normalized histogram of gradient magnitudes of region R, and $H_{dir}(R)$ denotes the normalized histogram of gradient orientations of region R. Both are normalized according to the size $N_R$. Then $F_{mag,dir} = (H_{mag}(R), H_{dir}(R))$ is a quantitative texture description of region R.

    === Co-occurrence matrices ===
    The co-occurrence matrix captures numerical features of a texture using spatial relations of similar gray tones. Numerical features computed from the co-occurrence matrix can be used to represent, compare, and classify textures. The following are a subset of standard features derivable from a normalized co-occurrence matrix:

    $$\begin{aligned}
    \text{Angular 2nd Moment} &= \sum_{i}\sum_{j} p[i,j]^{2} \\
    \text{Contrast} &= \sum_{i=1}^{Ng}\sum_{j=1}^{Ng} n^{2}\, p[i,j], \text{ where } |i-j| = n \\
    \text{Correlation} &= \frac{\sum_{i=1}^{Ng}\sum_{j=1}^{Ng} (ij)\, p[i,j] - \mu_{x}\mu_{y}}{\sigma_{x}\sigma_{y}} \\
    \text{Entropy} &= -\sum_{i}\sum_{j} p[i,j] \ln(p[i,j])
    \end{aligned}$$

    where $p[i,j]$ is the $[i,j]$th entry in a gray-tone spatial dependence matrix, and Ng is the number of distinct gray levels in the quantized image. One negative aspect of the co-occurrence matrix is that the extracted features do not necessarily correspond to visual perception.
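As a rough illustration of the co-occurrence features listed above, the following sketch builds a normalized co-occurrence matrix for a single displacement and evaluates the four features. It is a minimal pure-Python sketch under stated assumptions, not a reference implementation: the function names `cooccurrence` and `texture_features`, the horizontal displacement (dx=1, dy=0), and the 0-based gray levels are choices made here for illustration. Contrast is computed as the sum of (i-j)^2 p[i,j], which equals the grouped form with n = |i-j|, and both contrast and correlation are unaffected by using 0-based instead of 1-based indices.

```python
import math

def cooccurrence(img, dx=1, dy=0, levels=None):
    """Normalized co-occurrence matrix p[i][j] for the displacement (dx, dy).

    img is a list of rows of integer gray levels in 0..levels-1.
    """
    if levels is None:
        levels = max(max(row) for row in img) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("displacement does not fit inside the image")
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Angular second moment, contrast, correlation and entropy of p."""
    idx = range(len(p))
    asm = sum(p[i][j] ** 2 for i in idx for j in idx)
    contrast = sum((i - j) ** 2 * p[i][j] for i in idx for j in idx)
    # Entropy skips zero entries, using the convention 0 * ln(0) = 0.
    entropy = -sum(p[i][j] * math.log(p[i][j])
                   for i in idx for j in idx if p[i][j] > 0)
    px = [sum(p[i]) for i in idx]                     # row marginal
    py = [sum(p[i][j] for i in idx) for j in idx]     # column marginal
    mu_x = sum(i * px[i] for i in idx)
    mu_y = sum(j * py[j] for j in idx)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * px[i] for i in idx))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * py[j] for j in idx))
    cross = sum(i * j * p[i][j] for i in idx for j in idx)
    # Correlation is undefined for a flat region; 0.0 is an arbitrary fallback.
    corr = (cross - mu_x * mu_y) / (sd_x * sd_y) if sd_x * sd_y > 0 else 0.0
    return {"asm": asm, "contrast": contrast,
            "correlation": corr, "entropy": entropy}

# A 4x4 checkerboard: every horizontal neighbour pair has different levels.
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
features = texture_features(cooccurrence(checker))
```

On the checkerboard every horizontal pair is either (0,1) or (1,0), so the matrix has two equal off-diagonal entries; a real application would typically combine several displacements and quantize the image to a small number of gray levels first.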
    Co-occurrence texture analysis is used in dentistry for the objective evaluation of lesions [DOI: 10.1155/2020/8831161], treatment efficacy [DOI: 10.3390/ma13163614; DOI: 10.11607/jomi.5686; DOI: 10.3390/ma13173854; DOI: 10.3390/ma13132935] and bone reconstruction during healing [DOI: 10.5114/aoms.2013.33557; DOI: 10.1259/dmfr/22185098; EID: 2-s2.0-81455161223; DOI: 10.3390/ma13163649].

    === Laws texture energy measures ===
    Another approach is to use local masks to detect various types of texture features. Laws originally used four vectors representing texture features to create sixteen 2D masks from the outer products of the pairs of vectors. The four vectors and relevant features were as follows:

    L5 = [ +1 +4  6 +4 +1 ] (Level)
    E5 = [ -1 -2  0 +2 +1 ] (Edge)
    S5 = [ -1  0  2  0 -1 ] (Spot)
    R5 = [ +1 -4  6 -4 +1 ] (Ripple)

    To these four, a fifth is sometimes added:

    W5 = [ -1 +2  0 -2 +1 ] (Wave)

    From Laws' four vectors, sixteen 5x5 "energy maps" are then filtered down to nine in order to remove certain symmetric pairs. For instance, L5E5 measures vertical edge content and E5L5 measures horizontal edge content. The average of these two measures is the "edginess" of the content. The resulting nine maps used by Laws are as follows:

    L5E5/E5L5   L5R5/R5L5   E5S5/S5E5
    S5S5        R5R5        L5S5/S5L5
    E5E5        E5R5/R5E5   S5R5/R5S5

    Running each of these nine masks over an image, and writing each result to the pixel at the mask origin ([2,2]), produces nine "energy maps", or conceptually an image in which each pixel is associated with a vector of nine texture attributes.

    === Autocorrelation and power spectrum ===
    The autocorrelation function of an image can be used to detect repetitive patterns of textures.

    == Texture segmentation ==
    Image texture can be used to describe regions and to divide an image into segments. There are two main types of segmentation based on image texture: region based and boundary based. Though image texture is not a perfect measure for segmentation, it is used along with other measures, such as color, that help solve the segmentation of an image.
    === Region based ===
    Attempts to group or cluster pixels based on texture properties.

    === Boundary based ===
    Attempts to group or cluster pixels based on edges between pixels that come from different texture properties.
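Returning to the Laws texture energy measures described earlier, the sketch below builds the 5x5 masks as outer products of the L5, E5, S5 and R5 vectors and reduces the responses to nine values. It is a simplified illustration, not Laws' full procedure: the helper names (`outer`, `apply_mask`, `energy`, `laws_features`) are hypothetical, the response is only computed where the mask fits entirely inside the image, and the "energy" is assumed here to be the mean absolute response over the whole region rather than a per-pixel windowed map.

```python
# Laws' four 1D vectors as given in the text.
VECTORS = {
    "L5": [1, 4, 6, 4, 1],      # Level
    "E5": [-1, -2, 0, 2, 1],    # Edge
    "S5": [-1, 0, 2, 0, -1],    # Spot
    "R5": [1, -4, 6, -4, 1],    # Ripple
}

def outer(u, v):
    """5x5 mask: u varies down the rows, v varies across the columns."""
    return [[a * b for b in v] for a in u]

def apply_mask(img, mask):
    """Slide the mask over img, keeping only positions where it fits."""
    rows, cols = len(img), len(img[0])
    return [[sum(mask[i][j] * img[r + i][c + j]
                 for i in range(5) for j in range(5))
             for c in range(cols - 4)]
            for r in range(rows - 4)]

def energy(img, a, b):
    """Mean absolute response of mask a x b (assumed energy measure)."""
    resp = apply_mask(img, outer(VECTORS[a], VECTORS[b]))
    return sum(abs(v) for row in resp for v in row) / (len(resp) * len(resp[0]))

def laws_features(img):
    """Nine features: six averaged symmetric pairs plus three diagonal masks."""
    feats = {}
    for a, b in [("L5", "E5"), ("L5", "R5"), ("E5", "S5"),
                 ("L5", "S5"), ("E5", "R5"), ("S5", "R5")]:
        feats[f"{a}{b}/{b}{a}"] = (energy(img, a, b) + energy(img, b, a)) / 2
    for a in ("S5", "R5", "E5"):
        feats[a + a] = energy(img, a, a)
    return feats

# An 8x8 image with a vertical step edge between columns 3 and 4.
step = [[0 if c < 4 else 10 for c in range(8)] for r in range(8)]
features = laws_features(step)
```

With the vertical step edge, the L5E5 mask responds while E5L5 gives zero, which matches the statement that L5E5 measures vertical edge content and E5L5 horizontal edge content.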

  • Organizational information theory


    Organizational Information Theory (OIT) is a communication theory, developed by Karl Weick, offering systemic insight into the processing and exchange of information within organizations and among their members. Unlike earlier structure-centered theory, OIT focuses on the process of organizing in dynamic, information-rich environments. Given that, it contends that the main activity of organizations is the process of making sense of equivocal information. Organizational members are instrumental in reducing equivocality and achieving sensemaking through three strategies: enactment, selection, and retention of information. With a framework that is interdisciplinary in nature, organizational information theory's desire to eliminate both ambiguity and complexity from workplace messaging builds upon earlier findings from general systems theory and phenomenology.

    == Inspiration and influence of pre-existing theories ==
    1. General Systems Theory
    General Systems Theory, in its most basic premise, describes the phenomenon of a cohesive group of interrelated parts. When one part of the system is changed or affected, it will affect the system as a whole. Weick uses this theoretical framework from 1950 to influence his organizational information theory. Likewise, organizations can be viewed as a system of related parts that work together towards a common goal or vision. Applying this to Weick's organizational information theory, organizations must work to reduce ambiguity and complexity in the workplace to maximize cohesiveness and efficiency. Weick uses the term coupling to describe how organizations, like a system, can be composed of interrelated and dependent parts. Coupling looks at the relationship between people and work. There are two types of coupling:

    1. Loose coupling
    Loose coupling means that while people within the organization or system are connected and often work together, they do not depend on one another to continue or fully complete individual work.
    The dependencies are weak and workflow is flexible. For example, "if the whole Science department completely shuts down because all of teachers are sick or for whatsoever reason, the school can still continue to operate because other departments are still present."

    2. Tight coupling
    Tight coupling means that connections within an organization are strong and dependent. If one part of the organization is not operating correctly, the organization as a whole cannot operate to its fullest potential. "For instance, the format and ink section completely shuts down hence the succeeding steps cannot be continued, so the whole process of the organization will be dropped. Thus, components of a system are directly dependent on one another."

    2. Theory of evolution
    The theory of evolution, by Charles Darwin, is a framework for survival of the fittest. According to Darwin, organisms attempt to adapt and live in an unforgiving environment. Those that are unsuccessful in adaptation do not survive, while the strong organisms continue to thrive and reproduce. Weick draws inspiration from Darwin to incorporate a biological perspective into his theory. It is natural for organizations to have to adapt to incoming information that often interferes with the preexisting environment. Organizations that are able to plan and alter strategies in accordance with their constant need for organizing and sensemaking will survive and be the most successful. However, there is a notable difference between animal evolution and survival of the fittest in organizations: "A given animal is what it is; variation comes through mutation. But the nature of an organization can change when its members alter their behavior."

    == Assumptions ==
    1. Human organizations exist in an information environment
    Unlike sender-receiver models, OIT takes a situational perspective. Karl Weick views a human organization as an open social system.
    People in that system develop a mechanism to establish goals, obtain and process information, or perceive the environment. In this process, people and the environment come to conclusions on "what's going on here?". Colville believes that this attributional process is retrospective. Take an educational institution as an example. A university can obtain information regarding students' needs in numerous ways. It might create a feedback section on its website. It could organize alumni panels or academic affairs to attract prospective students and collect concrete questions they are interested in. It may also conduct a survey or host a focus group to get the information. After that, the staff of the university have to decide how to deal with this information, and on that basis the university has to set and accomplish its goals for current and prospective students.

    2. The information an organization receives differs in terms of equivocality
    Weick posits that numerous feasible interpretations of reality exist when organizations process information. Their varying levels of understandability lead to different outcomes of information inputs. In other academic works, scholars tend to say that messages are uncertain or ambiguous, whereas in OIT messages are described as equivocal. The theory holds that people proactively exclude a number of possibilities to perceive what is going on in the environment. Due to OIT's situational perspective, the meanings of messages consist of the messages, the interpretations of receivers, and the interactional context. By contrast, ambiguity and uncertainty can imply that a standard answer, the one true objective interpretation, exists. Also, Weick emphasizes that "the equivocality is the engine that motivates people to organize". Maitlis and Christianson state that equivocality triggers sensemaking in three situations: environmental jolts and organizational crises, threats to identity, and planned change interventions.

    3. Human organizations engage in information processing to reduce equivocality of information
    Based upon the first two assumptions, OIT proposes that information processing within organizations is a social activity. Sharing is the key feature of organizational information processing. In that particular context, members jointly make sense of reality by reducing equivocality. In other words, sensemaking is a joint responsibility that takes numerous interdependent people to accomplish. In this process, organizations and their members combine actions and attributions in order to find the balance between the complexity of thoughts and the simplicity of actions. Weick also proposes that people create their own environment through enactment, which is the action of making sense. This is because people have different perceptual schemas and selective perception, so people create different information environments. In creating different information environments, people can arrive at the same or close to the same understanding or solution through different thought processes and overall understanding.

    == Key concepts ==
    === The organization ===
    To place Weick's vision of Organizational Information Theory into proper working context, it helps to explore his view of what constitutes the organization and how its individuals embody that construct. From a fundamental standpoint, he believed that organizational validation is derived not from bricks and mortar, or locale, but from a series of events which enable entities to "collect, manage and use the information they receive." In elaborating further on what constitutes an organization during early writings outlining OIT, Weick said, "The word organization is a noun and it is also a myth. If one looks for an organization, one will not find it.
What will be found is that there are events linked together, that transpire within concrete walls and these sequences, their pathways, their timing, are the forms we erroneously make into substances when we talk about an organization". When viewed in this modular fashion, the organization meets Weick's theoretical vision by encompassing parameters that are less bound by concrete, wood, and structural restraints and more by an ability to serve as a repository where information can be consistently and effectively channeled. Taking these defining characteristics into account, proper channel execution relies on maximization of messaging clarity, context, delivery and evolution through any system. One example as to how these interactions might unfold on a more granular level within these confines can be gleaned through Weick's double interact loop, which he considers the "building blocks of every organization". Simply put, double interacts describe interpersonal exchanges that, inherently, occur across the organizational chain of command and in life, itself. Thus: "An act occurs when you say something (Can I have a Popsicle?). An interact occurs when you say something and I respond ("No, it will spoil your dinner

  • Quality of Data


    Quality of Data (QoD) is a designation coined by L. Veiga that specifies and describes the required Quality of Service of a distributed storage system from the point of view of the consistency of its data. It can be used to support big data management frameworks, workflow management, and HPC systems (mainly for data replication and consistency). It takes into account data semantics, namely the Time interval of data freshness, the Sequence of tolerable number of outstanding versions of the data read before a refresh, and the Value divergence allowed before displaying it. Initially it was based on a model from existing research on vector-field consistency, which was awarded the best-paper prize at the ACM/IFIP/Usenix Middleware Conference 2007, and it was later enhanced for increased scalability and fault tolerance. This consistency model has been successfully applied and proven in the big data key/value store Apache HBase, initially designed as a middleware module sitting between clusters from separate data centres. The HBase-QoD coupling minimises bandwidth usage and optimises resource allocation during replication, achieving the desired consistency level at a more fine-grained level. QoD is defined by the three-dimensional vector k=(θ,σ,ν), but with a broader view of the issue, applicable also to large-scale data management techniques with regard to their timely delivery.

    == Other descriptions ==
    Quality of Data should not be confused with other definitions of data quality such as completeness, validity, and accuracy.
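To make the three bounds in k=(θ,σ,ν) more concrete, the sketch below tracks one replicated numeric item and reports when any of the time, sequence, or value limits is exceeded. It is a hypothetical illustration, not the HBase-QoD implementation: the names `QoDBound` and `ReplicaTracker`, the use of seconds for θ, a count of unpropagated updates for σ, and a relative difference for ν are all assumptions made here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QoDBound:
    theta: float  # assumed: max seconds a replica may go without a refresh
    sigma: int    # assumed: max number of updates not yet propagated
    nu: float     # assumed: max relative divergence, e.g. 0.5 means 50%

class ReplicaTracker:
    """Tracks how far one replicated value has drifted from the primary."""

    def __init__(self, bound, value, now=0.0):
        self.bound = bound
        self.replicated_value = value   # what the remote replica last saw
        self.latest_value = value       # current value at the primary
        self.pending = 0                # updates since the last replication
        self.last_sync = now

    def record_update(self, value):
        self.latest_value = value
        self.pending += 1

    def must_replicate(self, now):
        """True as soon as any one of the three bounds is exceeded."""
        if self.pending == 0:
            return False                # nothing outstanding to send
        if now - self.last_sync > self.bound.theta:
            return True                 # time bound exceeded
        if self.pending > self.bound.sigma:
            return True                 # sequence bound exceeded
        diff = abs(self.latest_value - self.replicated_value)
        base = abs(self.replicated_value)
        divergence = diff / base if base else (float("inf") if diff else 0.0)
        return divergence > self.bound.nu   # value bound exceeded

    def replicate(self, now):
        self.replicated_value = self.latest_value
        self.pending = 0
        self.last_sync = now

tracker = ReplicaTracker(QoDBound(theta=10.0, sigma=2, nu=0.5), value=100.0)
tracker.record_update(110.0)
needs_sync = tracker.must_replicate(now=1.0)   # all three bounds still hold
```

Deferring replication until a bound is about to be violated is what lets a scheme of this kind trade bandwidth for a controlled amount of staleness; how the real system batches and ships updates is not specified in the text above.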

  • Small data


    Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that makes it accessible, informative and actionable. The term "big data" is about machines and "small data" is about people. This is to say that eyewitness observations or five pieces of related data could be small data. Small data is what we used to think of as data. The only way to comprehend big data is to reduce the data into small, visually appealing objects representing various aspects of large data sets (such as histograms, charts, and scatter plots). Big data is all about finding correlations, but small data is all about finding the causation, the reason why. A formal definition of small data has been proposed by Allen Bonde, former vice-president of Innovation at Actuate (now part of OpenText): "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks." Another definition of small data is the small set of specific attributes produced by the Internet of Things. These are typically a small set of sensor data such as temperature, wind speed, vibration and status. In 2016 Martin Lindstrom estimated that “If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% percent are really based on Small Data.” Small data includes everything from Snapchat to simple objects such as the post-it note. Lindstrom believes we become so focused on big data that we tend to forget about more basic concepts and creativity. Lindstrom defines Small Data "as seemingly insignificant observations you identify in consumers’ homes, is everything from how you place your shoes on how you hang your paintings". He thus considers that one should thoroughly master the basics (small data) in order to mine and find correlations.

    == Academic Recognition and Methodology ==
    The growing significance of "small data" as a distinct field of inquiry was highlighted by the 2024 Thematic Einstein Semester (TES) on Small Data Analysis, hosted by the Berlin Mathematics Research Center MATH+. A central focus of this semester was the transition from theoretical analysis to practical decision-making. Because small data sets are primarily used to drive specific actions, the presentation of results becomes an essential methodological step. The semester’s findings emphasized that while small data may lack volume, it often contains a high density of "many possible interpretations." Consequently, the final conference of the TES was structured around the pillars of interpretation, explanation, and knowledge gain. Participants sought to develop new mathematical and methodical representations that could accurately depict this wealth of interpretative possibilities. This work underscores that analyzing small data is not purely a computational task; it requires a robust interface between mathematics and diverse disciplines to ensure that insights are both contextually grounded and scientifically rigorous.

    == Uses in business ==
    === Marketing ===
    Bonde has written about the topic for Forbes, Direct Marketing News, CMO.com and other publications. According to Martin Lindstrom, in his book, Small Data: "[In customer research, small data is] Seemingly insignificant behavioural observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for breakthrough ideas or completely new ways to turnaround brands." His approach is based on the combination of the observation of small samples with intuition. Marketers can obtain market insights from gathering Small Data by engaging with and observing people in their own environments.
    In comparison to Big Data, Small Data has the power to trigger emotions and to provide insights into the reasons behind the behaviours of customers. It may uncover detailed information on a person's extroversion or introversion, self-confidence, whether one is having problems in his/her relationship, etc. According to Lindstrom, relationships among people and customer segments are organized around four criteria:
      • Climate: It reveals, for example, how a person's environment affects their diet.
      • Rulership: The power or government in charge.
      • Religion: The prevalence of religion in a country, depending on its influence, indicates whether a person's decision-making process is impacted by their belief system.
      • Tradition: Cultural norms influence people's behaviors and interactions.
    Many companies underestimate the power of Small Data, using samples of millions of consumers instead of recognizing the value of closely observing small samples in their market research. In his book, Lindstrom defines "7Cs", which companies should consider in the attempt to derive meaningful customer insights and market trends through small data from their customers:
      • Collecting: Understanding the manner in which observations are translated inside a home.
      • Clues: Uncovering other distinctive emotional reflections that can be observed.
      • Connecting: Identifying the consequences of emotional behaviour.
      • Causation: Understanding what emotions are being evoked.
      • Correlation: Identifying the initial date of appearance of the behaviour or emotion.
      • Compensation: Identifying the unmet or unfulfilled desire.
      • Concept: Defining the “big idea” compensation for the identified consumer need.
    Some of Lindstrom's clients such as Lowes Foods looked at data in a different way and actually chose to live with the customer. “As you enter their store, they have now created an amazing community where every staff member acts in a character mood, based on Small Data”.
    The supermarket did everything it could to make the customer feel at home. All employee behaviours are inspired by customer feedback gathered from interviews conducted directly in customers' homes.

    === Healthcare ===
    Researchers at Cornell University started developing applications to monitor health problems in patients, based on small data. This is an initiative of Cornell's Small Data Lab, in close cooperation with Weill Cornell Medicine College, led by Deborah Estrin. The Small Data Lab developed a series of apps, focusing not only on gathering data on patients' pain but also on tracking habits in areas such as grocery shopping. In the case of patients with rheumatoid arthritis, for example, which has flares and remissions that do not follow a particular cycle, the app gathers information passively, making it possible to forecast when a flare might be coming based on small changes in behaviour. Other apps monitor online grocery shopping, using this information to adapt each user's groceries to the recommendations of nutritionists, or monitor email language to identify patterns that might indicate "fluctuations in cognitive performance, fatigue, side effects of medication or poor sleep, and other conditions and treatments that are typically self-reported and self-medicated".

    === Postal Service ===
    The United States Postal Service (USPS) used optical character recognition (OCR) to automatically read and process 98% of all hand-addressed mail and 99.5% of machine-printed mail. By combining this technology with its small data sample of US zip codes, the USPS can now process more than 36,000 pieces of mail per hour.

    === Aerospace ===
    In 2015, Boeing established an analytics lab for aerospace data in cooperation with Carnegie Mellon University to leverage the university's leadership in machine learning, language technologies and data analytics.
    One of the initiative's projects aims to standardize maintenance logs using AI to dramatically reduce costs. Currently, there is no standardized procedure for documenting maintenance logs, which leads to small but highly unstructured data sets. As a result, it is very difficult for maintenance workers to interpret these variations in maintenance logs within a short period of time. However, with AI and a narrow data set of common aircraft maintenance terminology, it becomes possible to dynamically translate these logs in real time. By using AI to enhance the speed and accuracy of the airline maintenance workflow, airlines stand to save billions, according to the Harvard Business Review.

  • SWILE


    SWILE (formerly: Lunchr) is a French app-based company that focuses on improving the employee experience. Among other services, the platform offers meal vouchers, gift vouchers, mobility vouchers, and business travel solutions. In March 2020, it was renamed SWILE and entered the lunch break and meal voucher market.

    == History ==
    The company was founded as Lunchr by Loïc Soubeyrand in 2016. Originally, Lunchr was an app for pre-ordering lunch on the spot or to go. In January 2017, the company raised €2.5 million in seed funding from Daphni. In 2018, the company raised €11 million (Series A) from Idinvest, followed by another €30 million in February 2019 (Series B), notably from Index Ventures and Kima Ventures. In January 2020, Lunchr became one of the first startups to join the French Tech 120. A few months later, in March, Lunchr diversified its services, adding team life management tools and changing its brand name to Swile. In June 2020, the company raised a further €70 million in a new round of financing (Series C) from the same investors and the BPI. In November 2020, Swile acquired Briq, a startup specializing in employee engagement. In January 2021, Swile won a tender with Carrefour and distributed 62,000 Swile cards to its employees. In early October 2021, a new $200 million (€175 million) fundraising round, in which Japan's SoftBank joined other investors, brought Swile's valuation to $1 billion. President Emmanuel Macron cited the company as "a further proof that FrenchTech is at the forefront internationally." In May 2022, the company acquired the travel management start-up Okarito for €6 million.

    == Overview ==
    Swile operates in two countries (France and Brazil) and has a total of 1,000 employees, 5.5 million users and 85,000 corporate customers, including Carrefour, Le Monde, JCDecaux, PSG, Airbnb, Spotify, Red Bull, and TikTok in the private sector, as well as numerous local authorities and ministerial references in the public sector.

  • Berlekamp–Rabin algorithm


    In number theory, Berlekamp's root finding algorithm, also called the Berlekamp–Rabin algorithm, is a probabilistic method of finding roots of polynomials over the field $\mathbb{F}_p$ with $p$ elements. The method was discovered by Elwyn Berlekamp in 1970 as an auxiliary to the algorithm for polynomial factorization over finite fields. The algorithm was later modified by Rabin for arbitrary finite fields in 1979. The method was also independently discovered before Berlekamp by other researchers.

    == History ==
    The method was proposed by Elwyn Berlekamp in his 1970 work on polynomial factorization over finite fields. His original work lacked a formal correctness proof and was later refined and modified for arbitrary finite fields by Michael Rabin. In 1986 René Peralta proposed a similar algorithm for finding square roots in $\mathbb{F}_p$. In 2000 Peralta's method was generalized for cubic equations.

    == Statement of problem ==
    Let $p$ be an odd prime number. Consider the polynomial $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ over the field $\mathbb{F}_p \simeq \mathbb{Z}/p\mathbb{Z}$ of remainders modulo $p$. The algorithm should find all $\lambda$ in $\mathbb{F}_p$ such that $f(\lambda) = 0$ in $\mathbb{F}_p$.

    == Algorithm ==
    === Randomization ===
    Let $f(x) = (x-\lambda_1)(x-\lambda_2)\cdots(x-\lambda_n)$. Finding all roots of this polynomial is equivalent to finding its factorization into linear factors. To find such a factorization it is sufficient to split the polynomial into any two non-trivial divisors and factorize them recursively.
    To do this, consider the polynomial $f_z(x) = f(x-z) = (x-\lambda_1-z)(x-\lambda_2-z)\cdots(x-\lambda_n-z)$ where $z$ is some element of $\mathbb{F}_p$. If one can represent this polynomial as the product $f_z(x) = p_0(x)p_1(x)$, then in terms of the initial polynomial it means that $f(x) = p_0(x+z)p_1(x+z)$, which provides the needed factorization of $f(x)$.

    === Classification of $\mathbb{F}_p$ elements ===
    Due to Euler's criterion, for every monomial $(x-\lambda)$ exactly one of the following properties holds:
      • The monomial is equal to $x$ if $\lambda = 0$,
      • The monomial divides $g_0(x) = (x^{(p-1)/2} - 1)$ if $\lambda$ is a quadratic residue modulo $p$,
      • The monomial divides $g_1(x) = (x^{(p-1)/2} + 1)$ if $\lambda$ is a quadratic non-residue modulo $p$.
    Thus if $f_z(x)$ is not divisible by $x$, which may be checked separately, then $f_z(x)$ is equal to the product of the greatest common divisors $\gcd(f_z(x); g_0(x))$ and $\gcd(f_z(x); g_1(x))$.
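The three-way classification above can be checked numerically: $(x-\lambda)$ divides $g_0(x)$ exactly when $g_0(\lambda) = 0$, that is, when $\lambda^{(p-1)/2} \equiv 1 \pmod p$. The following short sketch does this with Python's built-in modular exponentiation; the function name `classify` and its string labels are hypothetical, chosen only for this illustration.

```python
def classify(lam, p):
    """Say which of x, g0, g1 the factor (x - lam) divides, for an odd prime p."""
    lam %= p
    if lam == 0:
        return "x"
    # Euler's criterion: lam^((p-1)/2) is 1 for residues and p-1 for non-residues.
    return "g0" if pow(lam, (p - 1) // 2, p) == 1 else "g1"

# Compare against a brute-force list of squares modulo 11.
squares_mod_11 = {(x * x) % 11 for x in range(1, 11)}
labels = {lam: classify(lam, 11) for lam in range(11)}
```

For $p = 11$ the brute-force set of squares is {1, 3, 4, 5, 9}, and these are the values that the sketch assigns to $g_0$.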

    === Berlekamp's method ===
    The property above leads to the following algorithm:
      1. Explicitly calculate the coefficients of $f_z(x) = f(x-z)$,
      2. Calculate the remainders of $x, x^2, x^{2^2}, x^{2^3}, x^{2^4}, \ldots, x^{2^{\lfloor \log_2 p \rfloor}}$ modulo $f_z(x)$ by squaring the current polynomial and taking the remainder modulo $f_z(x)$,
      3. Using exponentiation by squaring and the polynomials calculated in the previous step, calculate the remainder of $x^{(p-1)/2}$ modulo $f_z(x)$,
      4. If $x^{(p-1)/2} \not\equiv \pm 1 \pmod{f_z(x)}$, then the $\gcd$ mentioned above provides a non-trivial factorization of $f_z(x)$,
      5. Otherwise all roots of $f_z(x)$ are either residues or non-residues simultaneously and one has to choose another $z$.
    If $f(x)$ is divisible by some non-linear primitive polynomial $g(x)$ over $\mathbb{F}_p$, then when calculating $\gcd$ with $g_0(x)$ and $g_1(x)$ one will obtain a non-trivial factorization of $f_z(x)/g_z(x)$; thus the algorithm allows one to find all roots of arbitrary polynomials over $\mathbb{F}_p$.

    === Modular square root ===
    Consider the equation $x^2 \equiv a \pmod{p}$ having elements $\beta$ and $-\beta$ as its roots. Solving this equation is equivalent to factorizing the polynomial $f(x) = x^2 - a = (x-\beta)(x+\beta)$ over $\mathbb{F}_p$.
In this particular case it is sufficient to calculate only $\gcd(f_z(x); g_0(x))$. For this polynomial exactly one of the following properties will hold: the GCD is equal to $1$, which means that $z+\beta$ and $z-\beta$ are both quadratic non-residues; the GCD is equal to $f_z(x)$, which means that both numbers are quadratic residues; or the GCD is equal to $(x-t)$, which means that exactly one of these numbers is a quadratic residue. In the third case the GCD is equal to either $(x-z-\beta)$ or $(x-z+\beta)$. This allows the solution to be written as $\beta = (t-z) \pmod{p}$. === Example === Assume we need to solve the equation $x^2 \equiv 5 \pmod{11}$. For this we need to factorize $f(x) = x^2 - 5 = (x-\beta)(x+\beta)$. Consider some possible values of $z$: Let $z = 3$. Then $f_z(x) = (x-3)^2 - 5 = x^2 - 6x + 4$, thus $\gcd(x^2 - 6x + 4; x^5 - 1) = 1$. Both numbers $3 \pm \beta$ are quadratic non-residues, so we need to take some other $z$. Let $z = 2$. Then $f_z(x) = (x-2)^2 - 5 = x^2 - 4x - 1$, thus $\gcd(x^2 - 4x - 1; x^5 - 1) \equiv x - 9 \pmod{11}$. From this follows $x - 9 = x - 2 - \beta$, so $\beta \equiv 7 \pmod{11}$ and $-\beta \equiv -7 \equiv 4 \pmod{11}$.
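The worked example can be reproduced with a short Python sketch. This is only an illustration of the quadratic case $f(x) = x^2 - a$, not a general root finder: the function names are hypothetical, the shift $z$ is scanned deterministically instead of being chosen at random, and `pow(c1, -1, p)` assumes Python 3.8 or later. Since $f_z$ has degree 2, every remainder modulo $f_z$ is a linear polynomial, stored here as a pair `(c1, c0)` meaning $c_1 x + c_0$.

```python
def _mul_mod_fz(u, v, b, c, p):
    """Multiply two linear polynomials modulo f_z, using x^2 = b*x + c (mod f_z)."""
    u1, u0 = u
    v1, v0 = v
    hi = u1 * v1  # coefficient of x^2 before reduction
    return ((u1 * v0 + u0 * v1 + hi * b) % p, (u0 * v0 + hi * c) % p)


def _x_pow_mod_fz(e, b, c, p):
    """Compute x^e modulo f_z by exponentiation by squaring."""
    result = (0, 1)  # the constant polynomial 1
    base = (1, 0)    # the polynomial x
    while e:
        if e & 1:
            result = _mul_mod_fz(result, base, b, c, p)
        base = _mul_mod_fz(base, base, b, c, p)
        e >>= 1
    return result


def sqrt_with_shift(a, p, z):
    """Try one shift z; return beta with beta^2 = a (mod p), or None if z fails."""
    a %= p
    z %= p
    if (z * z - a) % p == 0:
        return z  # f_z is divisible by x, so z itself is a root of x^2 - a
    # f_z(x) = (x - z)^2 - a, hence x^2 = 2z*x + (a - z^2) modulo f_z
    b = (2 * z) % p
    c = (a - z * z) % p
    c1, c0 = _x_pow_mod_fz((p - 1) // 2, b, c, p)
    if c1 == 0:
        return None  # gcd with g_0 is 1 or f_z: no split for this z
    # g_0 mod f_z = c1*x + (c0 - 1); its root t gives the factor (x - t)
    t = (1 - c0) * pow(c1, -1, p) % p
    beta = (t - z) % p
    return beta if beta * beta % p == a else None


def modular_sqrt(a, p):
    """Square root of a modulo an odd prime p (assumption: z is scanned in order)."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        raise ValueError("a is not a quadratic residue modulo p")
    for z in range(p):
        beta = sqrt_with_shift(a, p, z)
        if beta is not None:
            return beta
    raise ValueError("no suitable shift found")
```

With these definitions, `sqrt_with_shift(5, 11, 3)` fails (the gcd is 1) and `sqrt_with_shift(5, 11, 2)` recovers $\beta = 7$, matching the two shifts tried in the example.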
A manual check shows that, indeed, $7^2 \equiv 49 \equiv 5 \pmod{11}$ and $4^2 \equiv 16 \equiv 5 \pmod{11}$. == Correctness proof == The algorithm finds a factorization of $f_z(x)$ in all cases except those in which all numbers $z+\lambda_1, z+\lambda_2, \ldots, z+\lambda_n$ are quadratic residues or non-residues simultaneously. According to the theory of cyclotomy, the probability of such an event for the case when $\lambda_1, \ldots, \lambda_n$ are all residues or non-residues simultaneously (that is, when $z = 0$ would fail) may be estimated as $2^{-k}$, where $k$ is the number of distinct values in $\lambda_1, \ldots, \lambda_n$. In this way even for the worst case of $k = 1$ and $f(x) = (x-\lambda)^n$, the probability of error may be estimated as $1/2$, and for the modular square root case the error probability is at most $1/4$. == Complexity == Let a polynomial have degree $n$. We derive the algorithm's complexity as follows: Due to the binomial theorem $(x-z)^k = \sum_{i=0}^{k} \binom{k}{i} (-z)^{k-i} x^i$, we may transition from $f(x)$ to $f(x-z)$ in $O(n^2)$ time. Polynomial multiplication a

    Read more →
  • Car–Parrinello molecular dynamics

    Car–Parrinello molecular dynamics

    Car–Parrinello molecular dynamics (CPMD) refers to either a method used in molecular dynamics (also known as the Car–Parrinello method) or the computational chemistry software package used to implement this method. The CPMD method is one of the major methods for calculating ab initio molecular dynamics (ab initio MD or AIMD). Ab initio molecular dynamics (AIMD) is a computational method that uses first principles through quantum mechanics to simulate the motion of atoms in a system. It is a type of molecular dynamics (MD) simulation that does not rely on empirical potentials or force fields to describe the interactions between atoms, but rather calculates these interactions entirely from the electronic structure of the system using quantum mechanics. In an ab initio MD simulation, the total energy of the system is calculated at each time step using density functional theory (DFT), Hartree-Fock (HF), or other electronic structure calculation methods. The forces acting on each atom are then determined from the gradient of the energy with respect to the atomic coordinates, and the equations of motion are solved to predict the trajectory of the atoms. AIMD permits chemical bond breaking and forming events to occur and accounts for electronic polarization effect. Therefore, Ab initio MD simulations can be used to study a wide range of phenomena, including the structural, thermodynamic, and dynamic properties of materials and chemical reactions. They are particularly useful for systems that are not well described by empirical potentials or force fields, such as systems with strong electronic correlation or systems with many degrees of freedom. However, ab initio MD simulations are computationally demanding and require significant computational resources. 
The CPMD method is related to the more common Born–Oppenheimer molecular dynamics (BOMD) method in that the quantum mechanical effect of the electrons is included in the calculation of energy and forces for the classical motion of the nuclei. CPMD and BOMD are different types of AIMD. However, whereas BOMD treats the electronic structure problem within the time-independent Schrödinger equation, CPMD explicitly includes the electrons as active degrees of freedom, via (fictitious) dynamical variables. The software is a parallelized plane wave / pseudopotential implementation of density functional theory, particularly designed for ab initio molecular dynamics. == Car–Parrinello method == The Car–Parrinello method is a type of molecular dynamics, usually employing periodic boundary conditions, planewave basis sets, and density functional theory, proposed by Roberto Car and Michele Parrinello in 1985 while working at SISSA, who were subsequently awarded the Dirac Medal by ICTP in 2009. In contrast to Born–Oppenheimer molecular dynamics wherein the nuclear (ions) degree of freedom are propagated using ionic forces which are calculated at each iteration by approximately solving the electronic problem with conventional matrix diagonalization methods, the Car–Parrinello method explicitly introduces the electronic degrees of freedom as (fictitious) dynamical variables, writing an extended Lagrangian for the system which leads to a system of coupled equations of motion for both ions and electrons. In this way, an explicit electronic minimization at each time step, as done in Born–Oppenheimer MD, is not needed: after an initial standard electronic minimization, the fictitious dynamics of the electrons keeps them on the electronic ground state corresponding to each new ionic configuration visited along the dynamics, thus yielding accurate ionic forces. 
In order to maintain this adiabaticity condition, it is necessary that the fictitious mass of the electrons is chosen small enough to avoid a significant energy transfer from the ionic to the electronic degrees of freedom. This small fictitious mass in turn requires that the equations of motion are integrated using a smaller time step than the one (1–10 fs) commonly used in Born–Oppenheimer molecular dynamics. Currently, the CPMD method can be applied to systems that consist of a few tens or hundreds of atoms and access timescales on the order of tens of picoseconds. == General approach == In CPMD the core electrons are usually described by a pseudopotential and the wavefunction of the valence electrons are approximated by a plane wave basis set. The ground state electronic density (for fixed nuclei) is calculated self-consistently, usually using the density functional theory method. Kohn-Sham equations are often used to calculate the electronic structure, where electronic orbitals are expanded in a plane-wave basis set. Then, using that density, forces on the nuclei can be computed, to update the trajectories (using, e.g. the Verlet integration algorithm). In addition, however, the coefficients used to obtain the electronic orbital functions can be treated as a set of extra spatial dimensions, and trajectories for the orbitals can be calculated in this context. == Fictitious dynamics == CPMD is an approximation of the Born–Oppenheimer MD (BOMD) method. In BOMD, the electrons' wave function must be minimized via matrix diagonalization at every step in the trajectory. CPMD uses fictitious dynamics to keep the electrons close to the ground state, preventing the need for a costly self-consistent iterative minimization at each time step. The fictitious dynamics relies on the use of a fictitious electron mass (usually in the range of 400 – 800 a.u.) to ensure that there is very little energy transfer from nuclei to electrons, i.e. to ensure adiabaticity. 
Any increase in the fictitious electron mass resulting in energy transfer would cause the system to leave the ground-state BOMD surface. === Lagrangian === $\mathcal{L} = \frac{1}{2}\left(\sum_{I}^{\mathrm{nuclei}} M_I \dot{\mathbf{R}}_I^2 + \mu \sum_{i}^{\mathrm{orbitals}} \int d\mathbf{r}\, |\dot{\psi}_i(\mathbf{r},t)|^2\right) - E\left[\{\psi_i\},\{\mathbf{R}_I\}\right] + \sum_{ij} \Lambda_{ij}\left(\int d\mathbf{r}\, \psi_i \psi_j - \delta_{ij}\right),$ where $\mu$ is the fictitious mass parameter; $E[\{\psi_i\},\{\mathbf{R}_I\}]$ is the Kohn–Sham energy density functional, which outputs energy values when given Kohn–Sham orbitals and nuclear positions. === Orthogonality constraint === $\int d\mathbf{r}\, \psi_i^{*}(\mathbf{r},t)\, \psi_j(\mathbf{r},t) = \delta_{ij},$ where $\delta_{ij}$ is the Kronecker delta. === Equations of motion === The equations of motion are obtained by finding the stationary point of the Lagrangian under variations of $\psi_i$ and $\mathbf{R}_I$, with the orthogonality constraint. $M_I \ddot{\mathbf{R}}_I = -\nabla_I\, E\left[\{\psi_i\},\{\mathbf{R}_I\}\right]$ and $\mu \ddot{\psi}_i(\mathbf{r},t) = -\frac{\delta E}{\delta \psi_i^{*}(\mathbf{r},t)} + \sum_j \Lambda_{ij} \psi_j(\mathbf{r},t),$ where $\Lambda_{ij}$ is a Lagrange multiplier matrix that enforces the orthonormality constraint. === Born–Oppenheimer limit === In the formal limit where $\mu \to 0$, the equations of motion approach Born–Oppenheimer molecular dynamics. == Software packages == There are a number of software packages available for performing AIMD simulations.
Some of the most widely used packages include: CP2K: an open-source software package for AIMD. Quantum Espresso: an open-source package for performing DFT calculations. It includes a module for AIMD. VASP: a commercial software package for performing DFT calculations. It includes a module for AIMD. Gaussian: a commercial software package that can perform AIMD. NWChem: an open-source software package for AIMD. LAMMPS: an open-source software package for performing classical and ab initio MD simulations. SIESTA: an open-source software package for AIMD. ORCA: a general-purpose quantum chemistry package. == Applications == Studying the behavior of water across different environments, such as near a hydrophobic graphene sheet. Investigating the structure and dynamics of liquid water at ambient temperature. Solving the heat transfer problems (heat conduction and thermal radiation), such as in Si/Ge superlattices. Probing the proton transfer along hydrogen-bonds in different environments, such as in 1D water chains inside carbon nanotubes. Evaluating the critical point of crystals, composites, and solid-state materials, such as aluminum. Predicting and modelling different phases and phase transitions, such as in the amorphous phase of the phase-change memory material GeSbTe. Studying the combustion of combustibles, such as lignite-water systems. Measuring th

    Read more →
  • Knuth–Eve algorithm

    Knuth–Eve algorithm

    In computer science, the Knuth–Eve algorithm is an algorithm for polynomial evaluation. It preprocesses the coefficients of the polynomial to reduce the number of multiplications required at runtime. Ideas used in the algorithm were originally proposed by Donald Knuth in 1962. His procedure opportunistically exploits structure in the polynomial being evaluated. In 1964, James Eve determined for which polynomials this structure exists, and gave a simple method of "preconditioning" polynomials (explained below) to endow them with that structure. == Algorithm == === Preliminaries === Consider an arbitrary polynomial $p \in \mathbb{R}[x]$ of degree $n$. Assume that $n \geq 3$. Define $m$ such that: if $n$ is odd then $n = 2m+1$, and if $n$ is even then $n = 2m+2$. Unless otherwise stated, all variables in this article represent either real numbers or univariate polynomials with real coefficients. All operations in this article are done over $\mathbb{R}$. Again, the goal is to create an algorithm that returns $p(x)$ given any $x$. The algorithm is allowed to depend on the polynomial $p$ itself, since its coefficients are known in advance. === Overview === ==== Key idea ==== Using polynomial long division, we can write $p(x) = q(x) \cdot (x^2 - \alpha) + (\beta x + \gamma),$ where $x^2 - \alpha$ is the divisor. Picking a value for $\alpha$ fixes both the quotient $q$ and the coefficients in the remainder, $\beta$ and $\gamma$. The key idea is to cleverly choose $\alpha$ such that $\beta = 0$, so that $p(x) = q(x) \cdot (x^2 - \alpha) + \gamma.$
This way, no operations are needed to compute the remainder polynomial, since it is just a constant. We apply this procedure recursively to $q$, expressing $p(x) = \left(\left(q(x) \cdot (x^2 - \alpha_m) + \gamma_m\right) \cdots\right) \cdot (x^2 - \alpha_1) + \gamma_1.$ After $m$ recursive calls, the quotient $q$ is either a linear or a quadratic polynomial. In this base case, the polynomial can be evaluated with (say) Horner's method. ==== "Preconditioning" ==== For arbitrary $p$, it may not be possible to force $\beta = 0$ at every step of the recursion. Consider the polynomials $p^e$ and $p^o$ with coefficients taken from the even and odd terms of $p$ respectively, so that $p(x) = p^e(x^2) + x \cdot p^o(x^2).$ If every root of $p^o$ is real, then it is possible to write $p$ in the form given above. Each $\alpha_i$ is a different root of $p^o$, counting multiple roots as distinct. Furthermore, if at least $n-1$ roots of $p$ lie in one half of the complex plane, then every root of $p^o$ is real. Ultimately, it may be necessary to "precondition" $p$ by shifting it (setting $p(x) \gets p(x+t)$ for some $t$) to endow it with the structure that most of its roots lie in one half of the complex plane. At runtime, this shift has to be "undone" by first setting $x \gets x - t$. === Preprocessing step === The following algorithm is run once for a given polynomial $p$.
At this point, the values of $x$ that $p$ will be evaluated on are not known. ==== Better choice of t ==== While any $t \geq \mathrm{Re}(r_2)$ can work, it is possible to remove one addition during evaluation if $t$ is also chosen such that two roots of $p(x+t)$ are symmetric about the origin. In that case, $\alpha_1$ can be chosen such that the shifted polynomial has a factor of $x^2 - \alpha_1$, so $\gamma_1 = 0$. It is always possible to find such a $t$. One possible algorithm for choosing $t$ is: === Evaluation step === The following algorithm evaluates $p$ at some, now known, point $x$. Assuming $t$ is chosen optimally, $\gamma_1 = 0$. So, the final iteration of the loop can instead run $y \gets y \cdot (s - \alpha_i),$ saving an addition. == Analysis == In total, evaluation using the Knuth–Eve algorithm for a polynomial of degree $n$ requires $n$ additions and $\lfloor n/2 \rfloor + 2$ multiplications, assuming $t$ is chosen optimally. No algorithm to evaluate a given polynomial of degree $n$ can use fewer than $n$ additions or fewer than $\lceil n/2 \rceil$ multiplications during evaluation. This result assumes only addition and multiplication are allowed during both preprocessing and evaluation. The Knuth–Eve algorithm is not well-conditioned.
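The evaluation loop described above can be sketched in Python as follows. This is a hedged reconstruction from the nested form $p(x) = ((q(x)(x^2 - \alpha_m) + \gamma_m) \cdots)(x^2 - \alpha_1) + \gamma_1$, not the article's own pseudocode: the function names and argument layout are assumptions, the preprocessing that finds $t$, the $\alpha_i$ and the $\gamma_i$ is taken as already done, and the saving of one addition when $\gamma_1 = 0$ is not implemented.

```python
def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed highest degree first."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


def evaluate_preconditioned(x, base_coeffs, alphas, gammas, t=0.0):
    """Evaluate p(x) from its preconditioned form (hypothetical helper).

    base_coeffs: coefficients of the innermost linear or quadratic quotient q.
    alphas, gammas: [alpha_1, ..., alpha_m] and [gamma_1, ..., gamma_m].
    t: shift applied during preprocessing; undone here by x <- x - t.
    """
    x = x - t
    s = x * x                      # x^2 is computed once and reused
    y = horner(base_coeffs, x)     # base case: evaluate q directly
    # Unwind the recursion from the innermost level (index m) to the outermost (index 1).
    for alpha, gamma in zip(reversed(alphas), reversed(gammas)):
        y = y * (s - alpha) + gamma
    return y
```

For instance, with $q(x) = x + 1$, $\alpha_1 = 2$ and $\gamma_1 = 3$ the form expands to $x^3 + x^2 - 2x + 1$, and `evaluate_preconditioned(2.0, [1.0, 1.0], [2.0], [3.0])` gives the same value as direct evaluation at $x = 2$. Each level of the loop costs one multiplication, which is where the saving over Horner's method comes from.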

    Read more →
  • StatMuse

    StatMuse

    StatMuse Inc. is an American artificial intelligence company founded in 2014. It operates an eponymous website that hosts a database of sports statistics covering the four major North American sports leagues, the Women's National Basketball Association (WNBA), NCAA Division I men's basketball, NCAA Division I Football Bowl Subdivision, the Big Five association football leagues in Europe, and various professional golf tours. == History == The company was founded by friends Adam Elmore and Eli Dawson in 2014. In email correspondence to the Springfield News-Leader, Elmore detailed that he and Dawson, fans of the National Basketball Association (NBA), were compelled to create StatMuse after they realized there was no online platform where they could search "Lebron James most points" [sic] and quickly get a result "showing his highest scoring games." As a startup, the company's goal was to utilize a type of artificial intelligence called natural language processing (NLP) for sports. In 2015, the company was part of the second group of startups accepted into the Disney Accelerator program. The company secured support from several investors, including The Walt Disney Company, Techstars, Allen & Company, the NFL Players Association, Greycroft and NBA Commissioner David Stern. As part of their partnership with Disney, StatMuse signed a content deal with ESPN (owned by Disney) to provide stats content on social media and television during the 2015–16 NBA season. Initially, the company only had stats available for the NBA, but eventually expanded to provide stats for the other major North American sports leagues. The company's initial demographic was players of fantasy sports, but it eventually expanded to target general sports fans as well. StatMuse offers responses to user queries in the voices of sports-related public figures. Dawson shared with VentureBeat that StatMuse brings people in and records them saying different words and phrases. 
These celebrity voices were made accessible through Google's Google Assistant service, Microsoft's Cortana virtual assistant, and Amazon's Echo devices. The company launched its phone app in September 2017. The app allows users to access StatMuse's sports statistics database by submitting queries in their natural language. Upon the launch of the phone app, Fitz Tepper of TechCrunch wrote that: "The technology isn't perfect – some of the pauses between words are a bit awkward, making it clear that some phrases are being stitched together on the fly. But this is the exception, and on the whole, most responses sound pretty good." StatMuse plug-ins for Slack and Facebook Messenger were also made, providing text-based sports stats. In 2019, StatMuse received investment from the Google Assistant Investment program. The service launched a premium option dubbed StatMuse+ in May 2023, offering options that had previously been included for free, such as unlimited searches and full results in data tables. The premium version also included early access to new features and a personalized search history, as well as not having ads. The app received a variety of feedback. In January 2024, the service launched a Premier League version of the website dubbed StatMuse FC. It is planned to introduce more leagues on the website.

    Read more →
  • Technical data management system

    Technical data management system

    A technical data management system (TDMS) is a document management system (DMS) pertaining to the management of technical and engineering drawings and documents. Often the data are contained in 'records' of various forms, such as on paper, microfilms or digital media. Hence technical data management is also concerned with record management involving technical data. Technical document management systems are used within large organisations with large scale projects involving engineering. For example, a TDMS can be used for integrated steel plants (ISP), automobile factories, aero-space facilities, infrastructure companies, city corporations, research organisations, etc. In such organisations, technical archives or technical documentation centres are created as central facilities for effective management of technical data and records. TDMS functions are similar to that of conventional archive functions in concepts, except that the archived materials in this case are essentially engineering drawings, survey maps, technical specifications, plant and equipment data sheets, feasibility reports, project reports, operation and maintenance manuals, standards, etc. Document registration, indexing, repository management, reprography, etc. are parts of TDMS. Various kinds of sophisticated technologies such as document scanners, microfilming and digitization camera units, wide format printers, digital plotters, software, etc. are available, making TDMS functions an easier process than previous times. == Constituents of a technical data management system == Technical data refers to both scientific and technical information recorded and presented in any form or manner (excluding financial and management information). A Technical Data Management System is created within an organisation for archiving and sharing information such as technical specifications, datasheets and drawings. 
Similar to other types of data management system, a Technical Data Management System consists of the 4 crucial constituents mentioned below. === Data planning === Data plans (long-term or short-term) are constructed as the first essential step of a proper and complete TDMS. It is created to ultimately help with the 3 other constituents, data acquisition, data management and data sharing. A proper data plan should not exceed 2 pages and should address the following basics: Types of data (samples, experiment results, reports, drawings, etc.) and metadata (data that summarizes and describes other data. In this case, it refers to details such as sample sizes, experiment conditions and procedures, dates of reports, explanations of drawings, etc.) Means of researches and collections of data (field works, experiments in production lines, etc.) Costs of researches Policies for access, sharing (re-use within the organisation and re-distribution to the public) Proposals for archiving data and maintaining access to it === Data acquisition === Raw data is collected from primary sites of the organisations through the use of modern technologies. Please reference the table below for examples. The data collected is then transferred to technical data centres for data management. === Data management === After data acquisition, data is sorted out, whilst useful data is archived, unwanted data is disposed. When managing and archiving data, the features below of the data are considered. Names, labels, values and descriptions for variables and records. (In the case of TDMS, one example is names of equipments on an equipment datasheet) Derived data from the original data, with code, algorithm or command file used to create them. (In the case of TDMS, one example is an expectation report derived from the analysis of an equipment datasheet) Metadata associates with the data being archived === Data sharing === Archived and managed data are accessible to rightful entities. 
A proper and complete TDMS should share data to a suitable extent, under suitable security, in order to achieve optimal usage of data within the organisation. It aims for easy access when reused by other researchers and hence it enhances other research processes. Data is often referred in other tests and technical specifications, where new analysis is generated, managed and archived again. As a result, data is flowing within the organisation under effective management through the use of TDMS. == Advantages and disadvantages of usage of technical data management systems == There are strengths and weakness when using technical data management systems (TDMS) to archive data. Some of the advantages and disadvantages are listed below. === Advantages === ==== 1. Faster and easier data management ==== Since TDMS is integrated into the organisation's systems, whenever workers develop data files (SolidWorks, AutoCAD, Microsoft Word, etc.), they can also archive and manage data, linking what they need to their current work, at the same time they can also update the archives with useful data. This speeds up working processes and makes them more efficient. ==== 2. Increased security ==== All data files are centralized, hence internal and external data leakages are less likely to happen, and the data flow is more closely monitored. As a result, data in the organisation is more secured. ==== 3. Increased collaboration within the organisation ==== Since the data files are centralized and the data flow within the organisation increases, researchers and workers within the organisation are able to work on joint projects. More complex tasks can be performed for higher yields. ==== 4. Compatible to various formats of data ==== TDMS is compatible to many formats of data, from basic data like Microsoft Words to complex data like voice data. This enhances the quality of the management of data archived. === Disadvantages === ==== 1. 
Higher financial costs ==== Implementing TDMS into the organisation's systems involves monetary costs. Maintenance also costs a certain amount of human resources and money. These resources involve opportunity costs, as they could be utilized elsewhere. ==== 2. Lower stability ==== Since TDMS manages and centralizes all the data the organisation processes, it links the working processes within the whole organisation together. It also increases the vulnerability of the organisation's data network. If TDMS is not stable enough, or when it is exposed to hacker and virus attacks, the organisation's data flow might shut down completely, affecting work on an organisation-wide scale and lowering stability as a result. == Comparison between traditional data management approaches and technical data management systems == Test engineers and researchers face great challenges in turning complex test results and simulation data into usable information for higher yields of firms. These challenges are: increasingly complicated designs; reduced time and budgets; and demand for higher quality. === Traditional data management approaches === Many organisations still apply conventional file management systems, due to the difficulty of building a proper and complete archive for data management. The first approach is the simple file-folder system. This causes ineffectiveness, as workers and researchers have to manually go through numerous layers of systems and files to find the target data. Moreover, the target data may contain files with different formats, and these files may not be stored on the same machine. These files are also easily lost if renamed or moved to another location. The second approach is conventional databases such as Oracle. These databases enable easy search and access of data. However, a great drawback is that a huge effort is required to prepare and model the data.
For large-scale projects, huge monetary costs are incurred, and extra IT staff must be employed to constantly handle, expand and maintain the inflexible system, which is customized for specific tasks rather than all tasks. In the long term, it is not cost-effective. === Technical data management systems (TDMS) === TDMS is developed based on 3 principles: flexible and organized file storage, a self-scaling hybrid data index, and an interactive post-processing environment. In practice, the system mainly consists of 3 components: data files with essential and relevant metadata; data finders for organizing and managing data regardless of file format; and software for searching, analyzing and reporting. With metadata attached to the original data files, the data finder can identify different related data files during searches, even if they are in different file formats. TDMS hence allows researchers to search for data as if browsing the Internet. Finally, it can adapt to changes and update itself accordingly, unlike databases. == Comparison between strong information systems and weak information systems == Complex organizations may need large amounts

    Read more →
  • Energy informatics

    Energy informatics

    Energy informatics is a research field covering the use of information and communication technology to address energy utilization and management challenges. Methods used for "smart" implementations often combine IoT sensors with artificial intelligence and machine learning. Energy informatics is founded on flow networks that are the major suppliers and consumers of energy. Their efficiency can be improved by collecting and analyzing information. == Application areas == Among others, the field considers the following application areas: Smart Buildings, by developing ICT-centred solutions for improving the energy efficiency of buildings; Smart Cities, by investigating the synergies between demand patterns and supply availability of energy flows in cities and communities to improve energy efficiency, increase integration of renewable sources, and provide resilience towards system faults caused by extreme situations, like hurricanes and flooding; Smart Industries, including the development of ICT-centred solutions for improving the energy efficiency and predictability of energy-intensive industrial processes, without compromising process and product quality; and Smart Energy Networks, by developing ICT-centred solutions for coordinating the supply and demand in environmentally sustainable energy networks.

  • AVT Statistical filtering algorithm

    AVT Statistical filtering algorithm

    The AVT statistical filtering algorithm is an approach to improving the quality of raw data collected from various sources. It is most effective when inband noise is present. In those cases AVT is better at filtering data than a band-pass filter or any digital filtering based on variation of. Conventional filtering is useful when the signal/data has a different frequency than the noise, so that the signal/data can be separated from the noise by frequency discrimination. Frequency discrimination filtering is done using low-pass, high-pass and band-pass filters, named for the relative frequency criterion each configuration targets. Those filters are built from passive and active components and are sometimes implemented in software using algorithms based on the fast Fourier transform (FFT). AVT filtering is implemented in software, and its inner workings are based on statistical analysis of the raw data. When the signal frequency (the distribution frequency of the useful data) coincides with the noise frequency (the distribution frequency of the noisy data), we have inband noise. In these situations frequency discrimination filtering does not work, since the noise and the useful signal are indistinguishable by frequency; this is where AVT excels. Several methods/algorithms for filtering under such conditions are briefly described below. == Averaging algorithm == 1. Collect n samples of data. 2. Calculate the average value of the collected data. 3. Present/record the result as the actual data. == Median algorithm == 1. Collect n samples of data. 2. Sort the data in ascending or descending order (the direction does not matter). 3. Select the value at position n/2 and present/record it as the final result representing the data sample. == AVT algorithm == AVT stands for Antonyan Vardan Transform; its implementation is explained below. 
1. Collect n samples of data. 2. Calculate the standard deviation and the average value. 3. Drop any data that lies outside the band average ± one standard deviation. 4. Calculate the average value of the remaining data. 5. Present/record the result as the actual value representing the data sample. This algorithm is based on amplitude discrimination and can easily reject any noise that is unlike the actual signal, that is, noise that differs from the signal by more than one standard deviation. Note that this type of filtering can be used in situations where the actual environmental noise is not known in advance. It is preferable to use the median rather than the average in the steps above. Originally, the AVT algorithm used the average value and compared it with the result of the median on the data window. == Filtering algorithms comparison == Using a system with a signal value of 1 and noise added at the 0.1% and 1% levels simplifies quantifying algorithm performance. An R script is used to create pseudo-random noise, add it to the signal, and analyze the results of filtering with several algorithms. Please refer to the article "Reduce Inband Noise with the AVT Algorithm" for details. The graphs show that the AVT algorithm provides the best results compared with the median and averaging algorithms for data sample sizes of 32, 64 and 128 values. Note that the graphs were created by analyzing a random data array of 10000 values. A sample of this data is represented graphically below. From the graphs it is apparent that AVT outperforms the other filtering algorithms, providing 5% to 10% more accurate data when analyzing the same datasets. Considering the random nature of the noise used in this numerical experiment, which borders on the worst-case situation where the actual signal level is below the ambient noise, the precision improvements from processing data with the AVT algorithm are significant. == AVT algorithm variations == === Cascaded AVT === In some situations better results can be obtained by cascading several stages of AVT filtering. 
This will produce a single constant value, which can be used for equipment that has known stable characteristics, such as thermometers, thermistors and other slow-acting sensors. === Reverse AVT === 1. Collect n samples of data. 2. Calculate the standard deviation and the average value. 3. Drop any data that lies within the band average ± one standard deviation. 4. Calculate the average value of the remaining data. 5. Present/record the result as the actual data. This is useful for detecting minute signals that are close to the background noise level. == Possible applications and uses == Filtering data that is near or below the noise level. Planet detection, to filter raw data from the Kepler space telescope. Filtering noise from sound sources where all other filtering methods (low-pass filter, high-pass filter, band-pass filter, digital filter) fail. Pre-processing scientific data for data analysis (smoothness) before plotting (see Plot (graphics)). SETI (Search for extraterrestrial intelligence), for detecting/distinguishing extraterrestrial signals from the cosmic background. Image filtering to detect altered images: an image of Jupiter generated by this program shows the alterations detected in an original picture that was modified to be visually appealing by applying filters. Another version of this comparison is the Reverse AVT filter applied to the same original Jupiter image, where only the altered portion is visible, as the noise that was eliminated by the AVT algorithm. Image filtering to estimate data density from images: a picture of the Pillars of Creation nebula shows data density in filtered images from Hubble and Webb. Note that the image on the left has big patches of missing data marked with simpler color patterns.
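The AVT and Reverse AVT steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the author's reference implementation: the function name `avt_filter`, the `reverse` flag, the use of the population standard deviation, and the fallback to the plain average when no sample survives are all assumptions made for the example.

```python
import statistics


def avt_filter(samples, reverse=False):
    """Average the samples inside (or, if reverse=True, outside)
    the band average ± one standard deviation."""
    mean = statistics.fmean(samples)
    # Assumption: population standard deviation; the text does not say which.
    std = statistics.pstdev(samples)
    low, high = mean - std, mean + std

    if reverse:
        # Reverse AVT: keep only the data outside the band.
        kept = [x for x in samples if x < low or x > high]
    else:
        # AVT: keep only the data inside the band.
        kept = [x for x in samples if low <= x <= high]

    # Assumption: fall back to the plain average if nothing is kept.
    return statistics.fmean(kept) if kept else mean


# Hypothetical window: a constant signal of 1.0 with one large outlier.
window = [1.0] * 7 + [9.0]
plain_average = statistics.fmean(window)   # 2.0, pulled up by the outlier
median_value = statistics.median(window)   # 1.0
avt_value = avt_filter(window)             # 1.0, outlier rejected
reverse_value = avt_filter(window, True)   # 9.0, only the outlier remains
```

On this toy window the plain average is distorted by the outlier, while AVT recovers the signal level and Reverse AVT isolates the outlier. Real data with inband noise will not separate this cleanly.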

  • Expectation propagation

    Expectation propagation

    Expectation propagation (EP) is a technique in Bayesian machine learning. EP finds approximations to a probability distribution. It uses an iterative approach that exploits the factorization structure of the target distribution. It differs from other Bayesian approximation approaches such as variational Bayesian methods. More specifically, suppose we wish to approximate an intractable probability distribution $p(\mathbf{x})$ with a tractable distribution $q(\mathbf{x})$. Expectation propagation achieves this approximation by minimizing the Kullback–Leibler divergence $\mathrm{KL}(p\|q)$. Variational Bayesian methods minimize $\mathrm{KL}(q\|p)$ instead. If $q(\mathbf{x})$ is a Gaussian $\mathcal{N}(\mathbf{x}\mid\mu,\Sigma)$, then $\mathrm{KL}(p\|q)$ is minimized with $\mu$ and $\Sigma$ being equal to the mean of $p(\mathbf{x})$ and the covariance of $p(\mathbf{x})$, respectively; this is called moment matching. == Applications == Expectation propagation via moment matching plays a vital role in approximation for indicator functions that appear when deriving the message passing equations for TrueSkill.
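To make the moment-matching step concrete, the sketch below fits a single one-dimensional Gaussian to a Gaussian mixture by matching mean and variance. It covers only this one step, not a full EP iteration; the mixture is a hypothetical stand-in for the target distribution, and the function name `moment_match` is an assumption introduced for illustration.

```python
def moment_match(weights, means, variances):
    """Return (mu, var) of the Gaussian whose first two moments equal
    those of the 1-D mixture sum_k w_k * N(m_k, v_k)."""
    # First moment of the mixture: E[x] = sum_k w_k * m_k
    mu = sum(w * m for w, m in zip(weights, means))
    # Second moment of the mixture: E[x^2] = sum_k w_k * (v_k + m_k^2)
    second = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    # Variance = E[x^2] - E[x]^2
    return mu, second - mu * mu


# Hypothetical target: two equally weighted unit-variance components at -1 and +1.
mu, var = moment_match([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

For this symmetric mixture the matched Gaussian has mean 0 and variance 2. The single Gaussian is broader than either component because it has to cover both modes, which is typical of minimizing $\mathrm{KL}(p\|q)$.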

  • Bottom-up and top-down approaches

    Bottom-up and top-down approaches

    Bottom-up and top-down are strategies of composition and decomposition in fields as diverse as information processing and ordering knowledge, software, humanistic and scientific theories (see systemics), time management, and organization. In practice they can be seen as a style of thinking, teaching, or leadership. A top-down approach (also known as stepwise design and stepwise refinement and in some cases used as a synonym of decomposition) is essentially the breaking down of a system to gain insight into its compositional subsystems in a reverse engineering fashion. In a top-down approach an overview of the system is formulated, specifying, but not detailing, any first-level subsystems. Each subsystem is then refined in yet greater detail, sometimes in many additional subsystem levels, until the entire specification is reduced to base elements. A top-down model is often specified with the assistance of black boxes, which makes it easier to manipulate. However, black boxes may fail to clarify elementary mechanisms or be detailed enough to realistically validate the model. A top-down approach starts with the big picture, then breaks down into smaller segments. A bottom-up approach is the piecing together of systems to give rise to more complex systems, thus making the original systems subsystems of the emergent system. Bottom-up processing is a type of information processing based on incoming data from the environment to form a perception. From a cognitive psychology perspective, information enters the eyes in one direction (sensory input, or the "bottom"), and is then turned into an image by the brain that can be interpreted and recognized as a perception (output that is "built up" from processing to final cognition). In a bottom-up approach the individual base elements of the system are first specified in great detail. 
These elements are then linked together to form larger subsystems, which in turn are linked, sometimes at many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, by which the beginnings are small but eventually grow in complexity and completeness. But "organic strategies" may result in a tangle of elements and subsystems, developed in isolation and subject to local optimization as opposed to meeting a global purpose. == Computer science == === Software development === In the software development process, the top-down and bottom-up approaches play a key role. Top-down approaches emphasize planning and a complete understanding of the system. It is inherent that no coding can begin until a sufficient level of detail has been reached in the design of at least some part of the system. Top-down approaches are implemented by putting stubs in place of modules that have not yet been written. But this delays testing of the ultimate functional units of a system until significant design is complete. Bottom-up emphasizes coding and early testing, which can begin as soon as the first module has been specified. But this approach runs the risk that modules may be coded without a clear idea of how they link to other parts of the system, and that such linking may not be as easy as first thought. Re-usability of code is one of the main benefits of a bottom-up approach. Top-down design was promoted in the 1970s by IBM researcher Harlan Mills and by Niklaus Wirth. Mills developed structured programming concepts for practical use and tested them in a 1969 project to automate the New York Times morgue index. The engineering and management success of this project led to the spread of the top-down approach through IBM and the rest of the computer industry. Among other achievements, Niklaus Wirth, the developer of the Pascal programming language, wrote the influential paper Program Development by Stepwise Refinement. 
Since Niklaus Wirth went on to develop languages such as Modula and Oberon (where one could define a module before knowing about the entire program specification), one can infer that top-down programming was not strictly what he promoted. Top-down methods were favored in software engineering until the late 1980s, and object-oriented programming assisted in demonstrating the idea that both aspects of top-down and bottom-up programming could be used. Modern software design approaches usually combine top-down and bottom-up approaches. Although an understanding of the complete system is usually considered necessary for good design—leading theoretically to a top-down approach—most software projects attempt to make use of existing code to some degree. Pre-existing modules give designs a bottom-up flavor. === Programming === Top-down is a programming style, the mainstay of traditional procedural languages, in which design begins by specifying complex pieces and then dividing them into successively smaller pieces. The technique for writing a program using top-down methods is to write a main procedure that names all the major functions it will need. Later, the programming team looks at the requirements of each of those functions and the process is repeated. These compartmentalized subroutines eventually will perform actions so simple they can be easily and concisely coded. When all the various subroutines have been coded the program is ready for testing. By defining how the application comes together at a high level, lower-level work can be self-contained. In a bottom-up approach the individual base elements of the system are first specified in great detail. These elements are then linked together to form larger subsystems, which in turn are linked, sometimes at many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, by which the beginnings are small, but eventually grow in complexity and completeness. 
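The top-down technique described above (a main procedure that names the major functions first, each refined later) might look like the following sketch. The order-report task and every function name here are hypothetical, chosen only for illustration; two of the functions are deliberately left as stubs to show how the top level can be exercised before the details exist.

```python
def read_orders(path):
    # Stub: a refined version would parse the file at `path`.
    # For now it returns fixed (item, quantity, unit_price) records.
    return [("widget", 2, 3.50), ("gadget", 1, 10.00)]


def compute_total(orders):
    # Already refined: simple enough to code directly.
    return sum(quantity * price for _, quantity, price in orders)


def format_report(total):
    # Stub: a refined version might produce a full multi-line report.
    return f"TOTAL {total:.2f}"


def main(path="orders.csv"):
    # Top level, written first: it only names the major steps.
    orders = read_orders(path)
    total = compute_total(orders)
    return format_report(total)
```

Calling `main()` already runs end to end on the stubbed data, so the overall structure can be checked while `read_orders` and `format_report` are still placeholders.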
Object-oriented programming (OOP) is a paradigm that uses "objects" to design applications and computer programs. In mechanical engineering, with software programs such as Pro/ENGINEER, Solidworks, and Autodesk Inventor, users can design products as individual pieces rather than as part of a whole and later join those pieces together to form assemblies, like building with Lego. Engineers call this "piece part design". === Parsing === Parsing is the process of analyzing an input sequence (such as that read from a file or a keyboard) in order to determine its grammatical structure. This method is used in the analysis of both natural languages and computer languages, as in a compiler. Bottom-up parsing is a parsing strategy that recognizes the text's lowest-level small details first, before its mid-level structures, and leaves the highest-level overall structure to last. In top-down parsing, on the other hand, one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. == Natural sciences == === Nanotechnology === Top-down and bottom-up are two approaches for the manufacture of products. These terms were first applied to the field of nanotechnology by the Foresight Institute in 1989 to distinguish between molecular manufacturing (to mass-produce large atomically precise objects) and conventional manufacturing (which can mass-produce large objects that are not atomically precise). Bottom-up approaches seek to have smaller (usually molecular) components built up into more complex assemblies, while top-down approaches seek to create nanoscale devices by using larger, externally controlled ones to direct their assembly. Certain valuable nanostructures, such as silicon nanowires, can be fabricated using either approach, with processing methods selected on the basis of targeted applications. 
A top-down approach often uses traditional workshop or microfabrication methods, in which externally controlled tools are used to cut, mill, and shape materials into the desired shape and order. Micropatterning techniques, such as photolithography and inkjet printing, belong to this category. Vapor treatment can be regarded as a new top-down secondary approach to engineering nanostructures. Bottom-up approaches, in contrast, use the chemical properties of single molecules to cause single-molecule components to (a) self-organize or self-assemble into some useful conformation, or (b) rely on positional assembly. These approaches use the concepts of molecular self-assembly and/or molecular recognition. See also Supramolecular chemistry. Such bottom-up approaches should, broadly speaking, be able to produce devices in parallel and much more cheaply than top-down methods, but could potentially be overwhelmed as the size and complexity of the desired assembly increases. === Neuroscience and psychology === These terms are also employed in cognitive sciences including neuroscience, cognitive neuroscience and cognitive psychology to discuss the flow of information in processing. Typically, sensory input is considered bottom-up, and higher cognitive processes, which have more information from other sources, are considered top-down. A bottom-up proc

  • Universal Data Element Framework

    Universal Data Element Framework

    The Universal Data Element Framework (UDEF) was a controlled vocabulary developed by The Open Group. It provided a framework for categorizing, naming, and indexing data. It assigned to every item of data a structured alphanumeric tag plus a controlled vocabulary name that describes the meaning of the data. This allowed relating data elements to similar elements defined by other organizations. UDEF defined a Dewey-decimal like code for each concept. For example, an "employee number" is often used in human resource management. It has a UDEF tag a.5_12.35.8 and a controlled vocabulary description "Employee.PERSON_Employer.Assigned.IDENTIFIER". UDEF has been superseded by the Open Data Element Framework (ODEF). == Examples == In an application used by a hospital, the last name and first name of several people could include the following example concepts: Patient Person Family Name – find the word “Patient” under the UDEF object “Person” and find the word “Family” under the UDEF property “Name” Patient Person Given Name – find the word “Patient” under the UDEF object “Person” and find the word “Given” under the UDEF property “Name” Doctor Person Family Name – find the word “Doctor” under the UDEF object “Person” and find the word “Family” under the UDEF property “Name” Doctor Person Given Name – find the word “Doctor” under the UDEF object “Person” and find the word “Given” under the UDEF property “Name” For the examples above, the following UDEF IDs are available: “Patient Person Family Name” the UDEF ID is “au.5_11.10” “Patient Person Given Name” the UDEF ID is “au.5_12.10” “Doctor Person Family Name” the UDEF ID is “aq.5_11.10” “Doctor Person Given Name” the UDEF ID is “aq.5_12.10”
