AI Bot Grammar Checker

AI Bot Grammar Checker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Verbot

    Verbot

    The Verbot (short for Verbal-Robot) was a chatbot program and artificial intelligence software development kit (SDK) designed for Windows and web platforms. == Early beginning == The origin of verbot traces back to Michael Mauldin's research during his time as a graduate student and post-doctoral fellow at Carnegie Mellon University. The creative foundation also stems from Peter Plantec's work in personality psychology and art direction. === Historic outline === In 1994, Michael Loren Mauldin, founder of Lycos, Inc., developed a prototype chatbot, Julia, which competed in the internationally known Turing test, for the coveted Loebner Prize. The Turing test matches computer scientist judges against machines to see if they can distinguish a computer from a real human. Julia was refined and developed, and in 1997, Dr. Mauldin and Peter Plantec, a clinical psychologist and animator, formed Virtual Personalities, Inc. (now Conversive, Inc.) in order to create a virtual human interface that would incorporate real-time animation as well as speech and natural language processing. The initial release, a stand-alone virtual person called Sylvie, was beta-tested to the public. This release was well received, and finally, after several versions, the production release (deemed version 3) of the Verbally Enhanced Software Robot, or Verbot, was deployed in fall 2000. The grandfather of all Verbots is Rog-O-Matic, which, although it could not talk, could and did explore a virtual world. Julia has been active on the internet in one form or another since 1989. A close cousin of Julia is Lycos, a robot that explores the World Wide Web and answers questions about it. Sylvie was the first Verbot with a face and a voice. Sylvie was the first Virtual Human with advanced, flexible interfacing capability. === Beginnings === The Virtual Personalities story goes back to 1978, where Mauldin was attending Rice University. Fascinated by the idea of ELIZA, he proceeded to write a program called "PET" for his 8 kilobyte Commodore PET Computer. PET included simple induction as a way to post new information, for example: Subject: I like my friend (later) Subject: I like food. PET: I have heard that food is your friend. Meanwhile, Plantec was separately designing a personality for "Entity", a theoretical virtual human that would interact comfortably with humans without pretending to be one. At that time the technology was not advanced enough to realize Entity. Mauldin got so involved with this that he majored in Computer Science and minored in Linguistics. === Rogue === In the late seventies and early eighties, a popular computer game at universities was Rogue, an implementation of Dungeons and Dragons where the player would descend 26 levels in a randomly created dungeon, fighting monsters, gathering treasure, and searching for the elusive "Amulet of Yendor". Mauldin was one of four grad students who devoted a large amount of time to building a program called "Rog-O-Matic" capable of retrieving the amulet and emerging victorious from the dungeon. === TinyMUD === In 1989, when James Aspnes at Carnegie Mellon created the first TinyMUD (a descendant of MUD and AberMUD), Mauldin was one of the first to create a computer player that would explore the text-based world of TinyMUD. But his first robot, Gloria, gradually accreted more and more linguistic ability, to the point that it could pass the "unsuspecting" Turing test. In this version of the test, the human has no reason to suspect that one of the other occupants of the room is controlled by a computer, and so is more polite and asks fewer probing questions. The second generation of Mauldin's TinyMUD robots was Julia, created on Jan. 8, 1990. Julia slowly developed into a more and more capable conversational agent, and assumed useful duties in the TinyMUD world, including tour guide, information assistant, note-taker, and message-relayer. She could even play the card game hearts along with the other human players. In 1991, Julia attended the first Loebner Prize contest in Boston, Massachusetts. Although she only finished third, she was ranked by one judge as more human than one of the human confederates, winning a coveted certificate of humanness in the world's first restricted Turing test. Julia continued to log in to various TinyMUD's and TinyMucks for the next seven years, and chatted with hundreds of people a month over the internet. === Lycos === Julia's job was to explore a virtual world consisting of pages of textual descriptions, with links between them, and to construct an internal map of that world and answer questions about it (including path information such as the shortest route from one room to another, and matching information, such as which rooms contained a certain kind of object or textual description). It was therefore only a very short cognitive leap from Julia to Lycos, another robotic agent that explores a virtual world made of hyperlinked pages of text, and which answers questions about those pages. Sylvie was born and her abilities were expanded greatly to include interfacing with computers and control systems via her serial ports. === Sylvie === Sylvie was the first intelligent animated virtual human. She was designed both as a conversation agent and as a virtual human interface that would form a bridge between the two. She became more popular as a conversation agent, but her designers believe she serves as a prototype for future virtual human interface design that will help us all cope with the increasing complexity of technology. As an aside, Plantec noticed that a large number of Sylvies have been sold in Southeast Asia. Upon investigation, he found out that students had discovered a "test" mode that would allow them to type in English sentences that Sylvie would pronounce in her somewhat stylized English. == Ownership == In 1997, Dr. Mauldin and Peter Plantec formed Virtual Personalities, Inc. to create Natural Language Processing solutions for companies. In 2001 Virtual Personalities, Inc. became Conversive, Inc. to reflect the focus on providing Customer Service and Marketing to the Enterprise Market. In late 2012 Avaya, Inc. acquired Conversive's assets including Verbots. == Verbot versions == The Verbot 4 version was created and released in 2004. In 2005 Version 4.1 of the Verbot Software was released with many feature enhancements and bug fixes, including built-in support for embedding C# code in outputs and conditionals. In early 2006 Conversive launched Verbots Online allowing Verbot 4 users to upload their knowledge and show off their bots to the world. In 2009 Version 5 was released, completely free and fully featured. In early 2012 the last version of Verbot, 5.0.1.2, was released to the general public with support for Windows 7. Later in 2012 Verbots Online completely shut down. == Verbots today == Verbots.com, its community of users, and its forums no longer exist, but the software and users can still be found. There has been no active development since the early 2012 release of Verbot 5.0.1.2.

    Read more →
  • PureXML

    PureXML

    pureXML is the native XML storage feature in the IBM Db2 data server. pureXML provides query languages, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that Db2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format. == Technical information == Db2 includes two distinct storage mechanisms: one for efficiently managing traditional SQL data types, and another for managing XML data. The underlying storage mechanism is transparent to users and applications; they simply use SQL (including SQL with XML extensions or SQL/XML) or XQuery to work with the data. XML data is stored in columns of Db2 tables that have the XML data type. XML data is stored in a parsed format that reflects the hierarchical nature of the original XML data. As such, pureXML uses trees and nodes as its model for storing and processing XML data. If you instruct Db2 to validate XML data against an XML schema prior to storage, Db2 annotates all nodes in the XML hierarchy with information about the schema types; otherwise, it will annotate the nodes with default type information. Upon storage, Db2 preserves the internal structure of XML data, converting its tag names and other information into integer values. Doing so helps conserve disk space and also improves the performance of queries that use navigational expressions. However, users aren't aware of this internal representation. Finally, Db2 automatically splits XML nodes across multiple database pages, as needed. XML schemas specify which XML elements are valid, in what order these elements should appear in XML data, which XML data types are associated with each element, and so on. pureXML allows you to validate the cells in a column of XML data against no schema, one schema, or multiple schemas. pureXML also provides tools to support evolving XML schemas. IBM has enhanced its programming language interfaces to support access to its XML data. These enhancements span Java (JDBC), C (embedded SQL and call-level interface), COBOL (embedded SQL), PHP, and Microsoft's .NET Framework (through the DB2.NET provider). == History == pureXML was first included in the DB2 9 for Linux, Unix, and Microsoft Windows release, which was codenamed Viper, in June 2006. It was available on DB2 9 for z/OS in March 2007. In October 2007, IBM released DB2 9.5 with improved XML data transaction performance and improved storage savings. In June 2009, IBM released DB2 9.7 with XML supported for database-partitioned, range-partitioned, and multi-dimensionally clustered tables as well as compression of XML data and indices. == Competition == Db2 is a hybrid data server—it offers data management for traditional relational data, as well as providing native XML data management. Other vendors that offer data management for both relational data and native XML storage include Oracle with its 11g product and Microsoft with its SQL Server product. pureXML also competes with native XML databases like BaseX, eXist, MarkLogic or Sedna. == Books == IBM International Technical Support Organization (ITSO) has published the following books, which are available in print or as free e-books: DB2 9: pureXML Overview and Fast Start DB2 9 pureXML Guide The following books are also available for purchase: DB2 pureXML Cookbook: Master the Power of IBM Hybrid Data Server == Education and training == The following pureXML classroom and online courses are available from IBM Education: Query and Manage XML Data with DB2 9. IBM course CG130. Classroom. Duration: 4 days. Query XML Data with DB2 9. IBM course CG100. Classroom. Duration: 2 days (first 2 days of CG130). Managing XML Data in DB2 9. IBM course CG160. Classroom. Duration: 2 days (last 2 days of CG130). DB2 pureXML. IBM Course CT140. Self-paced study plus Live Virtual Classroom.

    Read more →
  • Lai–Robbins lower bound

    Lai–Robbins lower bound

    The Lai–Robbins lower bound gives an asymptotic lower bound on the regret that any uniformly good algorithm must incur in the stochastic multi-armed bandit problem. The original result was proved by Tze Leung Lai and Herbert Robbins in 1985 for parametric exponential families. Later work extended the statement to more general classes of distributions. == Multi-armed bandit problem == The multi-armed bandit problem (MAB) is a sequential game in which the player must trade off exploration (to learn) and exploitation (to earn). The player chooses among K {\displaystyle K} actions (arms) with unknown distributions ν = ( ν 1 , … , ν K ) {\displaystyle \nu =(\nu _{1},\dots ,\nu _{K})} . The player is assumed to know a class of distributions D {\displaystyle {\mathcal {D}}} such that for every k {\displaystyle k} one has ν k ∈ D {\displaystyle \nu _{k}\in {\mathcal {D}}} (for example, D {\displaystyle {\mathcal {D}}} may be the family of Gaussian or Bernoulli distributions). At each round t = 1 , … , T {\displaystyle t=1,\dots ,T} the player selects (pulls) an arm a t {\displaystyle a_{t}} and observes a reward X t ∼ ν a t {\displaystyle X_{t}\sim \nu _{a_{t}}} . We denote N a ( t ) := ∑ s = 1 t 1 { a s = a } {\displaystyle N_{a}(t):=\sum _{s=1}^{t}\mathbf {1} _{\{a_{s}=a\}}} the number of times arm a {\displaystyle a} has been pulled in the first t {\displaystyle t} rounds, μ ( ν ) := ( μ 1 , … , μ K ) {\displaystyle \mu (\nu ):=(\mu _{1},\dots ,\mu _{K})} the vector of arm means, where μ k = E X ∼ ν k [ X ] {\displaystyle \mu _{k}=\mathbb {E} _{X\sim \nu _{k}}[X]} , μ ∗ := max a μ a {\displaystyle \mu ^{}:=\max _{a}\mu _{a}} the highest mean Δ a := μ ∗ − μ a ≥ 0 {\displaystyle \Delta _{a}:=\mu ^{}-\mu _{a}\geq 0} the gap of arm a {\displaystyle a} . An arm a {\displaystyle a} with μ a = μ ∗ {\displaystyle \mu _{a}=\mu ^{}} is called an optimal arm; otherwise it is a suboptimal arm. The goal is to minimize the regret at horizon T {\displaystyle T} , defined by R T := ∑ a = 1 K Δ a E [ N a ( T ) ] . {\displaystyle R_{T}:=\sum _{a=1}^{K}\Delta _{a}\,\mathbb {E} [N_{a}(T)].} Intuitively, the regret is the (expected) total loss compared to always playing an optimal arm: regret = ∑ a ( cost of playing a ) × ( times a is played ) . {\displaystyle {\text{regret}}=\sum _{a}\ ({\text{cost of playing }}a)\times ({\text{times }}a{\text{ is played}}).} An MAB algorithm is a (possibly randomized) policy that, at each round t {\displaystyle t} , choose an arm a_t by using the observations received from previous turns. === Intuitive example === Suppose a farmer must choose, each year, one of K {\displaystyle K} seed varieties to plant. Each variety k {\displaystyle k} has an unknown average yield μ k {\displaystyle \mu _{k}} . If the farmer knew the best variety (with mean μ ∗ {\displaystyle \mu ^{}} ) he would plant it every year; in reality he must try varieties to learn which is best. The cumulative regret after T {\displaystyle T} years measures the total expected loss in yield due to imperfect knowledge. Remarks The model above is the stochastic MAB; there also exist adversarial variants. One may consider a fixed-horizon setting (known T {\displaystyle T} ) or an anytime setting (unknown T {\displaystyle T} ). == Lai–Robbins lower bound == The theorem gives the right amount of time we should pull a suboptimal arm k {\displaystyle k} to distinguish whether we are in the instance with ν k {\displaystyle \nu _{k}} or with ν ~ k {\displaystyle {\tilde {\nu }}_{k}} where ν ~ k {\displaystyle {\tilde {\nu }}_{k}} is such that μ ~ k > μ ∗ {\displaystyle {\tilde {\mu }}_{k}>\mu ^{}} . Knowning a lower bound on the number of pull of every suboptimal arm gives a lower bound on the regret as only suboptimal arms contribute to the regret. Before stating the formal theorem we need to define what is a consistent algorithm. === Consistency (uniformly good algorithms) === Let D {\displaystyle {\mathcal {D}}} be a class of probability distributions and consider K {\displaystyle K} arms with reward distributions ν = ( ν 1 , … , ν K ) ∈ D K {\displaystyle \nu =(\nu _{1},\dots ,\nu _{K})\in {\mathcal {D}}^{K}} . An algorithm is said to be consistent (also called uniformly good) on D K {\displaystyle {\mathcal {D}}^{K}} if, for every instance ν ∈ D K {\displaystyle \nu \in {\mathcal {D}}^{K}} , the expected regret R T ( ν ) {\displaystyle R_{T}(\nu )} grows subpolynomially: ∀ α > 0 , R T ( ν ) = o ( T α ) as T → ∞ {\displaystyle \forall \alpha >0,\qquad R_{T}(\nu )=o(T^{\alpha })\quad {\text{as }}T\to \infty } This assumption excludes algorithms that perform well on some instances but incur linear regret on others. === Formal lower bound === For any suboptimal arm a {\displaystyle a} . For a distribution ν a ∈ D {\displaystyle \nu _{a}\in {\mathcal {D}}} and a threshold x {\displaystyle x} , define K inf ( ν a , x , D ) := inf { KL ⁡ ( ν a , ν ′ ) : ν ′ ∈ D , μ ′ > x } {\displaystyle {\mathcal {K}}_{\inf }(\nu _{a},x,{\mathcal {D}}):=\inf {\Bigl \{}\operatorname {KL} (\nu _{a},\nu '):\nu '\in {\mathcal {D}},\ \mu '>x{\Bigr \}}} where KL ⁡ ( ⋅ , ⋅ ) {\displaystyle \operatorname {KL} (\cdot ,\cdot )} denotes the Kullback-Leibler divergence. Then, for any algorithm consistent on D K {\displaystyle {\mathcal {D}}^{K}} and for every instance ν ∈ D K {\displaystyle \nu \in {\mathcal {D}}^{K}} , every suboptimal arm a {\displaystyle a} satisfies E ν [ N a ( T ) ] ≥ ln ⁡ T K inf ( ν a , μ ∗ , D ) + o ( ln ⁡ T ) {\displaystyle \mathbb {E} _{\nu }[N_{a}(T)]\geq {\frac {\ln T}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{},{\mathcal {D}})}}+o(\ln T)} Consequently, the regret satisfies R T ( ν ) ≥ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ , D ) ) ln ⁡ T + o ( ln ⁡ T ) {\displaystyle R_{T}(\nu )\geq \left(\sum _{a:\,\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{},{\mathcal {D}})}}\right)\ln T+o(\ln T)} The original 1985 paper established this result for exponential families; later work showed that the bound holds under much weaker assumptions on D {\displaystyle {\mathcal {D}}} . === Intuition === Consistency imposes that, for every ν {\displaystyle \nu } , the number of pulls of an optimal arm must be large. This means that μ ∗ {\displaystyle \mu ^{}} is estimated very accurately. The goal is to determine, for a suboptimal arm k {\displaystyle k} , how many samples are needed to be confident, with the appropriate level of confidence, that μ k < μ ∗ {\displaystyle \mu _{k}<\mu ^{}} . To do so, we use what is called the most confusing instance: an instance close to ν {\displaystyle \nu } such that arm k {\displaystyle k} is optimal. We define it as ν ~ {\displaystyle {\tilde {\nu }}} such that, for all a ≠ k {\displaystyle a\neq k} , ν ~ a = ν a {\displaystyle {\tilde {\nu }}_{a}=\nu _{a}} , and ν ~ k {\displaystyle {\tilde {\nu }}_{k}} is chosen so that μ ~ k > μ ∗ {\displaystyle {\tilde {\mu }}_{k}>\mu ^{}} . The objective is to determine how many samples of arm k {\displaystyle k} are required to distinguish whether we are in the instance with ν k {\displaystyle \nu _{k}} or with ν ~ k {\displaystyle {\tilde {\nu }}_{k}} in terms of KL {\displaystyle \operatorname {KL} } distance. == Algorithms achieving the Lai–Robbins lower bound == Several algorithms are known to achieve the Lai–Robbins asymptotic lower bound under specific assumptions on the reward distribution class D {\displaystyle {\mathcal {D}}} . The following list summarizes a non-exhaustive list of algorithms matching the lower bound. == Extension to other problems == === Structured bandit === A more complexe is structured bandit where we know that the mean of each arm is in a set with some restriction. In this case we can prove a smaller lower bound that use the knowledge of this set. === Best arm identification (BAI) === A similar result has been proved for best arm identification, which is the same game except that, instead of minimizing the regret, the goal is to identify the best arm with probability 1 − δ {\displaystyle 1-\delta } using as few rounds as possible. === Reinforcement Learning (RL) === Similar results have been proved for regret minimization in average-reward reinforcement learning. The order is also ln ⁡ T {\displaystyle \ln T} , with a constant that depends on the problem.

    Read more →
  • Randomized rounding

    Randomized rounding

    In computer science and operations research, randomized rounding is a widely used approach for designing and analyzing approximation algorithms. Many combinatorial optimization problems are computationally intractable to solve exactly (to optimality). For such problems, randomized rounding can be used to design fast (polynomial time) approximation algorithms—that is, algorithms that are guaranteed to return an approximately optimal solution given any input. The basic idea of randomized rounding is to convert an optimal solution of a relaxation of the problem into an approximately-optimal solution to the original problem. The resulting algorithm is usually analyzed using the probabilistic method. == Overview == The basic approach has three steps: Formulate the problem to be solved as an integer linear program (ILP). Compute an optimal fractional solution x {\displaystyle x} to the linear programming relaxation (LP) of the ILP. Round the fractional solution x {\displaystyle x} of the LP to an integer solution x ′ {\displaystyle x'} of the ILP. (Although the approach is most commonly applied with linear programs, other kinds of relaxations are sometimes used. For example, see Goemans' and Williamson's semidefinite programming-based Max-Cut approximation algorithm.) In the first step, the challenge is to choose a suitable integer linear program. Familiarity with linear programming, in particular modelling using linear programs and integer linear programs, is required. For many problems, there is a natural integer linear program that works well, such as in the Set Cover example below. (The integer linear program should have a small integrality gap; indeed randomized rounding is often used to prove bounds on integrality gaps.) In the second step, the optimal fractional solution can typically be computed in polynomial time using any standard linear programming algorithm. In the third step, the fractional solution must be converted into an integer solution (and thus a solution to the original problem). This is called rounding the fractional solution. The resulting integer solution should (provably) have cost not much larger than the cost of the fractional solution. This will ensure that the cost of the integer solution is not much larger than the cost of the optimal integer solution. The main technique used to do the third step (rounding) is to use randomization, and then to use probabilistic arguments to bound the increase in cost due to the rounding (following the probabilistic method from combinatorics). Therein, probabilistic arguments are used to show the existence of discrete structures with desired properties. In this context, one uses such arguments to show the following: Given any fractional solution x {\displaystyle x} of the LP, with positive probability the randomized rounding process produces an integer solution x ′ {\displaystyle x'} that approximates x {\displaystyle x} according to some desired criterion. Finally, to make the third step computationally efficient, one either shows that x ′ {\displaystyle x'} approximates x {\displaystyle x} with high probability (so that the step can remain randomized) or one derandomizes the rounding step, typically using the method of conditional probabilities. The latter method converts the randomized rounding process into an efficient deterministic process that is guaranteed to reach a good outcome. == Example: the set cover problem == The following example illustrates how randomized rounding can be used to design an approximation algorithm for the set cover problem. Fix any instance ⟨ c , S ⟩ {\displaystyle \langle c,{\mathcal {S}}\rangle } of set cover over a universe U {\displaystyle {\mathcal {U}}} . === Computing the fractional solution === For step 1, let IP be the standard integer linear program for set cover for this instance. For step 2, let LP be the linear programming relaxation of IP, and compute an optimal solution x ∗ {\displaystyle x^{}} to LP using any standard linear programming algorithm. This takes time polynomial in the input size. The feasible solutions to LP are the vectors x {\displaystyle x} that assign each set s ∈ S {\displaystyle s\in {\mathcal {S}}} a non-negative weight x s {\displaystyle x_{s}} , such that, for each element e ∈ U {\displaystyle e\in {\mathcal {U}}} , x ′ {\displaystyle x'} covers e {\displaystyle e} —the total weight assigned to the sets containing e {\displaystyle e} is at least 1, that is, ∑ s ∋ e x s ≥ 1. {\displaystyle \sum _{s\ni e}x_{s}\geq 1.} The optimal solution x ∗ {\displaystyle x^{}} is a feasible solution whose cost ∑ s ∈ S c ( S ) x s ∗ {\displaystyle \sum _{s\in {\mathcal {S}}}c(S)x_{s}^{}} is as small as possible. Note that any set cover C {\displaystyle {\mathcal {C}}} for S {\displaystyle {\mathcal {S}}} gives a feasible solution x {\displaystyle x} (where x s = 1 {\displaystyle x_{s}=1} for s ∈ C {\displaystyle s\in {\mathcal {C}}} , x s = 0 {\displaystyle x_{s}=0} otherwise). The cost of this C {\displaystyle {\mathcal {C}}} equals the cost of x {\displaystyle x} , that is, ∑ s ∈ C c ( s ) = ∑ s ∈ S c ( s ) x s . {\displaystyle \sum _{s\in {\mathcal {C}}}c(s)=\sum _{s\in {\mathcal {S}}}c(s)x_{s}.} In other words, the linear program LP is a relaxation of the given set-cover problem. Since x ∗ {\displaystyle x^{}} has minimum cost among feasible solutions to the LP, the cost of x ∗ {\displaystyle x^{}} is a lower bound on the cost of the optimal set cover. === Randomized rounding step === In step 3, we must convert the minimum-cost fractional set cover x ∗ {\displaystyle x^{}} into a feasible integer solution x ′ {\displaystyle x'} (corresponding to a true set cover). The rounding step should produce an x ′ {\displaystyle x'} that, with positive probability, has cost within a small factor of the cost of x ∗ {\displaystyle x^{}} .Then (since the cost of x ∗ {\displaystyle x^{}} is a lower bound on the cost of the optimal set cover), the cost of x ′ {\displaystyle x'} will be within a small factor of the optimal cost. As a starting point, consider the most natural rounding scheme: For each set s ∈ S {\displaystyle s\in {\mathcal {S}}} in turn, take x s ′ = 1 {\displaystyle x'_{s}=1} with probability min ( 1 , x s ∗ ) {\displaystyle \min(1,x_{s}^{})} , otherwise take x s ′ = 0 {\displaystyle x'_{s}=0} . With this rounding scheme, the expected cost of the chosen sets is at most ∑ s c ( s ) x s ∗ {\displaystyle \sum _{s}c(s)x_{s}^{}} , the cost of the fractional cover. This is good. Unfortunately the coverage is not good. When the variables x s ∗ {\displaystyle x_{s}^{}} are small, the probability that an element e {\displaystyle e} is not covered is about ∏ s ∋ e 1 − x s ∗ ≈ ∏ s ∋ e exp ⁡ ( − x s ∗ ) = exp ⁡ ( − ∑ s ∋ e x s ∗ ) ≈ exp ⁡ ( − 1 ) . {\displaystyle \prod _{s\ni e}1-x_{s}^{}\approx \prod _{s\ni e}\exp(-x_{s}^{})=\exp {\Big (}-\sum _{s\ni e}x_{s}^{}{\Big )}\approx \exp(-1).} So only a constant fraction of the elements will be covered in expectation. To make x ′ {\displaystyle x'} cover every element with high probability, the standard rounding scheme first scales up the rounding probabilities by an appropriate factor λ > 1 {\displaystyle \lambda >1} . Here is the standard rounding scheme: Fix a parameter λ ≥ 1 {\displaystyle \lambda \geq 1} . For each set s ∈ S {\displaystyle s\in {\mathcal {S}}} in turn, take x s ′ = 1 {\displaystyle x'_{s}=1} with probability min ( λ x s ∗ , 1 ) {\displaystyle \min(\lambda x_{s}^{},1)} , otherwise take x s ′ = 0 {\displaystyle x'_{s}=0} . Scaling the probabilities up by λ {\displaystyle \lambda } increases the expected cost by λ {\displaystyle \lambda } , but makes coverage of all elements likely. The idea is to choose λ {\displaystyle \lambda } as small as possible so that all elements are provably covered with non-zero probability. Here is a detailed analysis. ==== Lemma (approximation guarantee for rounding scheme) ==== Fix λ = ln ⁡ ( 2 | U | ) {\displaystyle \lambda =\ln(2|{\mathcal {U}}|)} . With positive probability, the rounding scheme returns a set cover x ′ {\displaystyle x'} of cost at most 2 ln ⁡ ( 2 | U | ) c ⋅ x ∗ {\displaystyle 2\ln(2|{\mathcal {U}}|)c\cdot x^{}} (and thus of cost O ( log ⁡ | U | ) {\displaystyle O(\log |{\mathcal {U}}|)} times the cost of the optimal set cover). (Note: with care the O ( log ⁡ | U | ) {\displaystyle O(\log |{\mathcal {U}}|)} can be reduced to ln ⁡ ( | U | ) + O ( log ⁡ log ⁡ | U | ) {\displaystyle \ln(|{\mathcal {U}}|)+O(\log \log |{\mathcal {U}}|)} .) ==== Proof ==== The output x ′ {\displaystyle x'} of the random rounding scheme has the desired properties as long as none of the following "bad" events occur: the cost c ⋅ x ′ {\displaystyle c\cdot x'} of x ′ {\displaystyle x'} exceeds 2 λ c ⋅ x ∗ {\displaystyle 2\lambda c\cdot x^{}} , or for some element e {\displaystyle e} , x ′ {\displaystyle x'} fails to cover e {\displaystyle e} . The expectation of each x s ′ {\displaystyle x'_{s}} is at most λ x s ∗ {\displaystyle \lambda x_{s

    Read more →
  • Core FTP

    Core FTP

    Core FTP LE is a freeware secure FTP client for Windows, developed by CoreFTP.com. Features include FTP, SSL/TLS, SFTP via SSH, and HTTP/HTTPS support. Secure FTP clients encrypt account information and data transferred across the internet, protecting data from being seen, or sniffed across networks. Core FTP is a traditional FTP client with local files displayed on the left, remote files on the right. Core FTP Server is a secure FTP server for Windows, developed by CoreFTP.com, starting in 2010. == Licensing == CoreFTP LE is free for personal, educational, non-profit, and business use.

    Read more →
  • Bibliometrician

    Bibliometrician

    A bibliometrician is a researcher or a specialist in bibliometrics. It is near-synonymous with an informetrican (who studies informetrics), a scientometrican (who study scientometrics) and a webometrician, who study webometrics. == Notable bibliometricians == Christine L. Borgman Samuel C. Bradford Blaise Cronin Margaret Elizabeth Egan Eugene Garfield (developer of the Science Citation Index and the Impact factor) Jorge E. Hirsch (developer of the h-index) Alfred J. Lotka Vasily Nalimov Derek J. de Solla Price Ronald Rousseau George Kingsley Zipf

    Read more →
  • Living lab

    Living lab

    The concept of the living lab has been defined in multiple ways. A definition from the European Network of Living Labs (ENoLL) is used most widely, describing them as "user-centred open innovation ecosystems” that integrate research and innovation through co-creation in real-world environments.[1] Emerging at the intersection of ambient intelligence research and user experience methodologies in the late 1990s, the concept was pioneered at the Massachusetts Institute of Technology (MIT) as a way to study human interaction with new technologies in natural settings. Over time, living labs have evolved beyond their origins as controlled research environments, becoming dynamic platforms for participatory design, collaborative experimentation, and iterative innovation across various domains, including urban development, healthcare, sustainability, and digital technology. Characterized by principles such as real-world experimentation, active user involvement, and multi-stakeholder collaboration, living labs enable the continuous adaptation and validation of solutions in everyday contexts. Today, they are implemented globally, supported by networks like the European Network of Living Labs (ENoLL), and increasingly recognized as vital tools for addressing local and global transformation agendas. == Background == The term "living lab" has emerged in parallel from the ambient intelligence (AmI) research communities context and from the discussion on experience and application research (EAR). The emergence of the term is based on the concept of user experience and ambient intelligence. The term dates back to the late 1990s when Professor William J. Mitchell, Kent Larson, and Alex (Sandy) Pentland at the Massachusetts Institute of Technology were credited with first exploring the concept of a living laboratory. It was first associated with MIT's Media Lab as a concept for studying real-life contexts, where they described a living lab as a controlled environment designed to test new information and communication technology (ICT) innovations in a simulated home setting. This was also when some of the key characteristics often assigned to living labs today began to take shape. They argued that a living lab represents a user-centric research methodology for sensing, prototyping, validating and refining complex solutions in multiple and evolving real-life contexts. Research on living labs has expanded since the 1990s, especially in the 2010s, with growing interest in co-creation and participatory design. Particularly in Europe, the living lab evolved into a model that focused on studying user interactions with technology in real-world environments. This shift was influenced by earlier experiences in participatory design and social experiments with ICT. As interest grew, the term began to encompass a broader array of initiatives and projects, leading to variations in its interpretation and implementation. Today, living labs are used in various fields, such as technology, healthcare, and urban sustainability, showing a transition from a narrow focus on their role as controlled environments to a more wide-ranging understanding of collaborative innovation addressing real societal challenges, while also being referred to with various descriptions and definitions available from different sources. == Description == The ENoLL definition that refers to living labs as "user-centred open innovation ecosystems” that integrate research and innovation through co-creation in real-world environments is the most widely accepted description of living labs in academic literature. In simple terms, living labs can be described as an organization or experimental space, that can be both virtually or physically located, bringing different stakeholders from research, business, government, and citizens together to design and test solutions to be implemented in a real world environment. A common definition for the living lab term still does not exist to this day, which is due to the fact that living labs are interpreted and implemented across different contexts and can cover a wide range of activities and organizations, leading to different understandings of how living labs should function. Living labs also often operate in various territorial contexts (e.g. city, agglomeration, region, campus), and can vary in their methodological approach integrating concurrent research and innovation processes within a public-private-people partnership. Despite these variations, common characteristics include user-centricity, real-world experimentation, multi-stakeholder collaboration, and iterative innovation processes. The systematic user co-creation approach refers to integrating research and innovation processes through the co-creation, exploration, experimentation and evaluation of innovative ideas, scenarios, concepts and related technological artefacts in real life use cases. Such use cases involve user communities, not only as observed subjects but also as a source of creation. This approach allows all involved stakeholders to concurrently consider both the global performance of a product or service and its potential adoption by users. This consideration may be made at the earlier stage of research and development and through all elements of the product life-cycle, from design up to recycling. User-centred research methods, such as action research, community informatics, contextual design, user-centered design, participatory design, empathic design, emotional design, and other usability methods, already exist but fail to sufficiently empower users for co-creating into open development environments. More recently, the Web 2.0 has demonstrated the positive impact of involving user communities in new product development (NPD) such as mass collaboration projects (e.g. crowdsourcing, Wisdom of Crowds) in collectively creating new contents and applications. Real-world experimentation emphasizes conducting activities in real-life settings to ensure that the results of the projects and solutions are applicable to actual market conditions. Multi-stakeholder collaboration refers to an approach that involved various stakeholders, such as users, businesses, researchers, and government entities, working together towards a common goal. This is an important characteristics of living lab because collaboration of these diverse groups allows for exchange of ideas and perspectives, which are thought to enhance innovation processes. Iterative innovation processes involve a cyclical method of developing products or services, where stages such as research, development, testing, and implementation are revisited multiple times based on feedback and evaluation. This process allows for continuous improvement of the innovation, product, or service being developed. In particular, the ongoing involvement of the user creates feedback mechanisms that are ultimately key to successful development and implementation of products and services. A living lab is not similar to a testbed as its philosophy is to turn users, from being traditionally considered as observed subjects for testing modules against requirements, into value creation in contributing to the co-creation and exploration of emerging ideas, breakthrough scenarios, innovative concepts and related artefacts. Hence, a living lab rather constitutes an experiential environment, which could be compared to the concept of experiential learning, where users are immersed in a creative social space for designing and experiencing their own future. Living labs could also be used by policy makers and users/citizens for designing, exploring, experiencing and refining new policies and regulations in real-life scenarios for evaluating their potential impacts before their implementations. == European Network of Living Labs (ENoLL) == The European Network of Living Labs (ENoLL) is an international, non-profit, independent association of certified living labs, which popularized the living lab concept in the aim to increase user involvement in innovation. Formed in November 2006 under the guidance of the Finnish European Presidency, ENoLL is composed of a variety of stakeholders, including municipalities and research institutes, businesses, and users. Its primary role is to support the collaboration among living labs across Europe and includes many living labs focused on user-driven innovation across sectors. ENoLL focuses on facilitating knowledge exchange, joint actions and project partnerships among its historically labelled +/- 500 members, influencing EU policies, promoting living labs and enabling their implementation worldwide. ENoLL serves as a platform for linking living labs around the globe, which enables knowledge sharing and collaborative learning among diverse cultural environments. Membership to the platform is open to organizations worldwide, and ENoLL has expanded beyond Europe to include global members. ENoLL follows an application and accreditation pro

    Read more →
  • Snap rounding

    Snap rounding

    Snap rounding is a method of approximating line segment locations by creating a grid and placing each point in the centre of a cell (pixel) of the grid. The method preserves certain topological properties of the arrangement of line segments. Drawbacks include the potential interpolation of additional vertices in line segments (lines become polylines), the arbitrary closeness of a point to a non-incident edge, and arbitrary numbers of intersections between input line-segments. The 3 dimensional case is worse, with a polyhedral subdivision of complexity n becoming complexity O(n4). There are more refined algorithms to cope with some of these issues, for example iterated snap rounding guarantees a "large" separation between points and non-incident edges. == Algorithm == ... (please edit). See, and https://www.cgal.org/ () == Properties == Canonicity: Efficiency; A number of efficient implementations exist. Conversely there are undesirable properties: Non-idempotence: Repeated applications can cause arbitrary drift of points. Exception on "Stable snap rounding" algorithms, see https://doi.org/10.1016/j.comgeo.2012.02.011

    Read more →
  • Pocketbook (application)

    Pocketbook (application)

    Pocketbook was a Sydney-based free budget planner and personal finance app launched in 2012. The app helped users setup and manage budgets, track spending and manage bills. As of 2016 Pocketbook claimed to support over 250,000 Australians, in January 2018 that number was 435,000. After being acquired by Zip Co Ltd in 2016, it was announced in 2022 that the app was to be shut down and all user accounts deleted. == History == Pocketbook was founded by Alvin Singh and Bosco Tan in 2012. It was conceived in 2011 in a Wolli Creek apartment as a tool for Alvin and Bosco to take control of their money. In 2013, Pocketbook raised $500,000 from technology fund Tank Stream Ventures, and a group of investors including TV personality David Koch, Geoff Levy, David Shein and Peter Cooper. In September 2016 Digital retail finance and payment industry player zipMoney (now trading as Zip Co Limited) acquired Pocketbook in a $7.5m deal == Features == The app synced with the bank account of users and would organize spending into different categories. Users could also be reminded of bill payments, analyse spending and set spending limits. They can also be alerted of fraudulent transactions and deductions. The app employs security measures like end to end encryption, CloudFlare protection, fraud detection, identity protection etc. Pocketbook was available via web and mobile version. == Awards == Personal Finance Innovator of the Year by Fintech Business Awards 2017 Innovator of the Year by OPTUS MyBusiness Awards 2017 Best Finance App of 2016 by Australian Fintech Best Personal Finance App: Pocketbook won the 2016 Finder Innovation Awards, presented at a gala dinner hosted by media personality and The New Inventors presenter James O'Loghlin. Best Mobile App of the Year Winner: StartCon hosted the first annual Australasian Startup Awards. Over 200 nominations in 14 categories and an overall winner were reviewed, and winners were determined by public voting, with over 63,000 votes in total. Best New Startup 2014 by StartupSmart. Finalist in the SWIFT Innotribe startup competition in Dubai in 2013.

    Read more →
  • Leiden algorithm

    Leiden algorithm

    The Leiden algorithm is a community detection algorithm developed by Traag et al at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity. == Improvement over Louvain method == Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (though, the method by which nodes are considered in Leiden is more efficient) and a graph aggregation step. However, to address the issues with poorly-connected communities and the merging of smaller communities into larger communities (the resolution limit of modularity), the Leiden algorithm employs an intermediate refinement phase in which communities may be split to guarantee that all communities are well-connected. Consider, for example, the following graph: Three communities are present in this graph (each color represents a community). Additionally, the center "bridge" node (represented with an extra circle) is a member of the community represented by blue nodes. Now consider the result of a node-moving step which merges the communities denoted by red and green nodes into a single community (as the two communities are highly connected): Notably, the center "bridge" node is now a member of the larger red community after node moving occurs (due to the greedy nature of the local node moving algorithm). In the Louvain method, such a merging would be followed immediately by the graph aggregation phase. However, this causes a disconnection between two different sections of the community represented by blue nodes. In the Leiden algorithm, the graph is instead refined: The Leiden algorithm's refinement step ensures that the center "bridge" node is kept in the blue community to ensure that it remains intact and connected, despite the potential improvement in modularity from adding the center "bridge" node to the red community. == Graph components == Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph. === Vertices and edges === A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let V {\displaystyle V} be the set of vertices, and E {\displaystyle E} be the set of edges: V := { v 1 , v 2 , … , v n } E := { e i j , e i k , … , e k l } {\displaystyle {\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}} where e i j {\displaystyle e_{ij}} is the directed edge from vertex v i {\displaystyle v_{i}} to vertex v j {\displaystyle v_{j}} . We can also write this as an ordered pair: e i j := ( v i , v j ) {\displaystyle {\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}} === Community === A community is a unique set of nodes: C i ⊆ V C i ⋂ C j = ∅ ∀ i ≠ j {\displaystyle {\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}} and the union of all communities must be the total set of vertices: V = ⋃ i = 1 C i {\displaystyle {\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}} === Partition === A partition is the set of all communities: P = { C 1 , C 2 , … , C n } {\displaystyle {\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}} == Partition quality == How communities are partitioned is an integral part on the Leiden algorithm. How partitions are decided can depend on how their quality is measured. Additionally, many of these metrics contain parameters of their own that can change the outcome of their communities. === Modularity === Modularity is a highly used quality metric for assessing how well a set of communities partition a graph. The equation for this metric is defined for an adjacency matrix, A, as: Q = 1 2 m ∑ i j ( A i j − k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q={\frac {1}{2m}}\sum _{ij}(A_{ij}-{\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: A i j {\displaystyle A_{ij}} represents the edge weight between nodes i {\displaystyle i} and j {\displaystyle j} ; see Adjacency matrix; k i {\displaystyle k_{i}} and k j {\displaystyle k_{j}} are the sum of the weights of the edges attached to nodes i {\displaystyle i} and j {\displaystyle j} , respectively; m {\displaystyle m} is the sum of all of the edge weights in the graph; c i {\displaystyle c_{i}} and c j {\displaystyle c_{j}} are the communities to which the nodes i {\displaystyle i} and j {\displaystyle j} belong; and δ {\displaystyle \delta } is Kronecker delta function: δ ( c i , c j ) = { 1 if c i and c j are the same community 0 otherwise {\displaystyle {\begin{aligned}\delta (c_{i},c_{j})&={\begin{cases}1&{\text{if }}c_{i}{\text{ and }}c_{j}{\text{ are the same community}}\\0&{\text{otherwise}}\end{cases}}\end{aligned}}} === Reichardt Bornholdt Potts Model (RB) === One of the most well used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB). This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition. This model introduces a resolution parameter γ {\displaystyle \gamma } and is highly similar to the equation for modularity. This model is defined by the following quality function for an adjacency matrix, A, as: Q = ∑ i j ( A i j − γ k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q=\sum _{ij}(A_{ij}-\gamma {\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: γ {\displaystyle \gamma } represents a linear resolution parameter === Constant Potts Model (CPM) === Another metric similar to RB is the Constant Potts Model (CPM). This metric also relies on a resolution parameter γ {\displaystyle \gamma } The quality function is defined as: H = − ∑ i j ( A i j w i j − γ ) δ ( c i , c j ) {\displaystyle H=-\sum _{ij}(A_{ij}w_{ij}-\gamma )\delta (c_{i},c_{j})} === Understanding Potts Model resolution parameters/Resolution limit === Typically Potts models such as RB or CPM include a resolution parameter in their calculation. Potts models are introduced as a response to the resolution limit problem that is present in modularity maximization based community detection. The resolution limit problem is that, for some graphs, maximizing modularity may cause substructures of a graph to merge and become a single community and thus smaller structures are lost. These resolution parameters allow modularity adjacent methods to be modified to suit the requirements of the user applying the Leiden algorithm to account for small substructures at a certain granularity. The figure on the right illustrates why resolution can be a helpful parameter when using modularity based quality metrics. In the first graph, modularity only captures the large scale structures of the graph; however, in the second example, a more granular quality metric could potentially detect all substructures in a graph. == Algorithm == The Leiden algorithm starts with a graph of disorganized nodes (a) and sorts it by partitioning them to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). The method it uses is similar to the Louvain algorithm, except that after moving each node it also considers that node's neighbors that are not already in the community it was placed in. This process results in our first partition (b), also referred to as P {\displaystyle {\mathcal {P}}} . Then the algorithm refines this partition by first placing each node into its own individual community and then moving them from one community to another to maximize modularity. It does this iteratively until each node has been visited and moved, and each community has been refined - this creates partition (c), which is the initial partition of P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} . Then an aggregate network (d) is created by turning each community into a node. P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} is used as the basis for the aggregate network while P {\displaystyle {\mathcal {P}}} is used to create its initial partition. Because we use the original partition P {\displaystyle {\mathcal {P}}} in this step, we must retain it so that it can be used in future iterations. These steps together form the first iteration of the algorithm. In subsequent iterations, the nodes of the aggregate network (which each represent a community) are once again placed into their own individual communities and then sorted according to modularity to form a new P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} , forming (e) in the above graphic. In the case depicted by the graph, the nodes were already sorted optimally, so no change too

    Read more →
  • Online analytical processing

    Online analytical processing

    In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete). == Overview of OLAP systems == At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures that are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. For example: Sales Fact Table +-------------+----------+ | sale_amount | time_id | +-------------+----------+ Time Dimension | 930.10| 1234 |----+ +---------+-------------------+ +-------------+----------+ | | time_id | timestamp | | +---------+-------------------+ +---->| 1234 | 20080902 12:35:43 | +---------+-------------------+ === Multidimensional databases === Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem unlike other models. === Aggregations === It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or aggregation function). The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A search algorithm. Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). == Types == OLAP systems have been traditionally categorized using the following taxonomy. === Multidimensional OLAP (MOLAP) === MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion. Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache. Advantages of MOLAP Fast query performance due to optimized storage, multidimensional indexing and caching. Smaller on-disk size of data compared to data stored in relational database due to compression techniques. Automated computation of higher-level aggregates of the data. It is very compact for low dimension data se

    Read more →
  • Collaborative diffusion

    Collaborative diffusion

    Collaborative Diffusion is a type of pathfinding algorithm which uses the concept of antiobjects, objects within a computer program that function opposite to what would be conventionally expected. Collaborative Diffusion is typically used in video games, when multiple agents must path towards a single target agent. For example, the ghosts in Pac-Man. In this case, the background tiles serve as antiobjects, carrying out the necessary calculations for creating a path and having the foreground objects react accordingly, whereas having foreground objects be responsible for their own pathing would be conventionally expected. Collaborative Diffusion is favored for its efficiency over other pathfinding algorithms, such as A, when handling multiple agents. Also, this method allows elements of competition and teamwork to easily be incorporated between tracking agents. Notably, the time taken to calculate paths remains constant as the number of agents increases.

    Read more →
  • QANDA

    QANDA

    QANDA (stands for 'Q and A') is an AI-based learning platform developed by Mathpresso Inc., a South Korea-based education technology company. Its best known feature is a solution search, which uses optical character recognition technology to scan problems and provide step-by-step solutions and learning content. As of March 2024, QANDA solved over 6.3 billion questions. QANDA has 90 million total registered users and has reached 8 million monthly active users (MAU) in 50 countries. 90% of the cumulative users are from overseas such as Vietnam and Indonesia. In January 2024, its MathGPT, a math-specific small large language model set a new world record, surpassed Microsoft's 'ToRA 13B', the previous record holder in benchmarks assessing mathematical performance such as 'MATH' (high school math) and 'GSM8K' (grade school math). 'MathGPT' was co-developed with Upstage and KT. In March 2024, Mathpresso launched 'Cramify' (formerly known as Prep.Pie), an AI-powered study material generator designed to create personalized exam prep materials for U.S. college students. It uses generative AI to create customized study materials uploaded by students. Its features include a range of tools including study summarizer and question solver. == History == Co-founder Jongheun ‘Ray’ Lee first came up with the idea of QANDA during his freshman year in college. While he was tutoring to earn money, Lee realized that the quality of education a student receives is greatly based on their location. Lee saw his K-12 students were regularly asking similar questions and realized that these questions were from a pre-selected number of textbooks currently being used in schools. He decided to team up with his high school friend, Yongjae ‘Jake’ Lee to build a platform whereby, one uses a mobile app to scan and submit questions, and students can ask and receive detailed responses. Lee's school friends, Wonguk Jung and Hojae Jeong, joined the team. In June 2015, Mathpresso, Inc. was founded in Seoul, South Korea. In January 2016, Mathpresso's first product QANDA was launched. It supported a Q&A feature between students and tutors. In October 2017, QANDA introduced an AI-based search capability that permitted users to search for answers in seconds. In April 2020, Jake Yongjae Lee(CEO & co-founder) and Ray Jongheun Lee (co-founder) were selected as Forbes 30 under 30 Asia. In June 2021, QANDA raised $50 million in series C funding. Jake Yongjae Lee was recognized as an Innovator Under 35 by MIT Technology Review. In November 2021, QANDA secured a strategic investment from Google. Since its inception, it has received backing in Series C funding from investors namely Google, Yellowdog, GGV Capital, Goodwater Capital, KDB, and SKS Private Equity with participation from SoftBank Ventures Asia, Legend Capital, Mirae Asset Venture Investment, and Smilegate Investment. In September 2023, Mathpresso has raised $8 million (10 billion KRW) from Korea's telecom giant, KT. The total cumulative investment is about 130 million US dollars. The partnership aims to accelerate the development of an education-specific Large Language Model. The company intends to incorporate the LLM model to fortify its AI tutor, which later will be integrated into the existing services: QANDA App, B2B & B2G Saas, and 1:1 online tutoring (QANDA Tutor). == Features == QANDA features OCR-based solution search, one-on-one Q&A tutoring, a study timer. In 2021, QANDA launched additional features, including the premium subscription model that offers unlimited “byte-sized” micro-video lectures and the community feature that enhances collaborative learning. In 2021, QANDA launched QANDA Tutor, a tablet-based 1:1 tutoring service and QANDA Study, a 1:N online school in Vietnam. In 2022, QANDA launched an exam prep feature that offers past exam materials from school via online. This feature is currently available in South Korea. In August 2023, QANDA launched a beta version of an LLM-powered AI Tutor. == Awards and recognition == Best Hidden Gems of 2017 by Google Playstore 2018 AWS AI Startup Challenge Award National representative for the Google AI for Social Good APAC, 2018 Best Self-Improvement Apps of 2018 by Google Playstore GSV Edtech 150 — the Most Transformational Growth Companies in Digital Learning Speaker at the Google App Summit, 2021 Selected as a prospect unicorn company by Korea Technology Finance Corporation in 2023 Winner of G20-DIA Global Pitching in 2023 2021, 2022, 2023 East Asia EdTech 150 by HolonIQ

    Read more →
  • Information scientist

    Information scientist

    The term information scientist developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science - CIS) or high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the term information officer or information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute "Criteria for Information Science" (appendix 1) as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also Award of Merit - Association for Information Science and Technology Marcia Bates David Blair (information technologist) Samuel C. Bradford Michael Buckland John M. Carroll Blaise Cronin Emilia Currás Brenda Dervin Eugene Garfield Paul B. Kantor Frederick Wilfrid Lancaster Calvin Mooers Tefko Saracevic Linda C. Smith Robert Saxton Taylor Brian Campbell Vickery Thomas D. Wilson == Additional reading == Ellis, David and Merete Haugan. (1997) "Modelling the information seeking patterns of engineers and research scientists in an industrial environment" (Journal of Documentation, Volume 53(4): pp. 384–403) Poole, Alex H. (2024). "'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029. Vickery, Brian Campbell (1988) "Essays presented to B. C. Vickery" (Journal of Documentation, Volume 44, pp. 199–283). Vickery, B. & Vickery, A. (1987) Information Science in theory and practice (London: Bowker-Saur, pp. 361–369)

    Read more →
  • Overcategorization

    Overcategorization

    Overcategorization or category clutter is a phenomenon during classification where too many categories or classes are assigned to a document, record, or item. Overcategorization is related to the library and information science (LIS) concepts of document classification and subject indexing. It is also related to online shopping where excessive product categories can overwhelm users with too many choices or make it more difficult for customers to find the products they need. Although these categories are intended to improve organization and ease of navigation when shipping online, too many categories can lower customer satisfaction, increase difficulty navigating the online store, and reduce future shopping intentions. In LIS, the ideal number of terms that should be assigned to classify an item are measured by the variables precision and recall. Assigning few category labels that are most closely related to the content of the item being classified will result in searches that have high precision, I.e., where a high proportion of the results are closely related to the query. Assigning more category labels to each item will reduce the precision of each search, but increase the recall, retrieving more relevant results. Related LIS concepts include exhaustivity of indexing and information overload. == Basic principles == If too many categories are assigned to a given document, the implications for users depend on how informative the links are. If the user is able to distinguish between useful and not useful links, the damage is limited: The user only wastes time selecting links. In many cases, however, the user cannot judge whether or not a given link will turn out to be fruitful. In that case he or she has to follow the link and to read or skim another document. The worst case scenario is, of course, that even after reading the new document the user is unable to decide whether or not it might be useful if its subject matter is not thoroughly investigated. Overcategorization also has another unpleasant implication: It makes the system (for example in Wikipedia) difficult to maintain in a consistent way. If the system is inconsistent, it means that when the user considers the links in a given category, he or she will not find all documents relevant to that category. Basically, the problem of overcategorization should be understood from the perspective of relevance and the traditional measures of recall and precision. If too few relevant categories are assigned to a document, recall may decrease. If too many non-relevant categories are assigned, precision becomes lower. The hard job is to say which categories are fruitful or relevant for future use of the document.

    Read more →