AI Analytics Social Media

AI Analytics Social Media — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cloudflare

    Cloudflare

    Cloudflare, Inc., is an American technology company headquartered in San Francisco, California, that provides a range of internet services, including content delivery network (CDN) services, cloud cybersecurity, DDoS mitigation, and ICANN-accredited domain registration. The company's services act primarily as a reverse proxy between website visitors and a customer's hosting provider, improving performance and protecting against malicious traffic. Cloudflare was founded in 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn. The company went public on the New York Stock Exchange in 2019 under the ticker symbol NET. Cloudflare has since expanded its offerings to include edge computing through its Workers platform, a public DNS resolver (1.1.1.1), and a VPN-like service known as WARP. In recent years, the company has integrated artificial intelligence into its infrastructure, acquiring companies such as Replicate and launching tools to manage AI bots and scrapers. According to W3Techs, Cloudflare is used by approximately 21.3% of all websites on the Internet as of January 2026. The company has been the subject of controversy regarding its policy of content neutrality. While Cloudflare executives have historically advocated for remaining a neutral infrastructure provider, the company has terminated services for specific high-profile websites associated with hate speech and violence, including The Daily Stormer, 8chan, and Kiwi Farms, following significant public pressure. Cloudflare has also faced criticism and litigation regarding copyright infringement by websites using its services, notably losing a lawsuit against Japanese publishers in 2025. The company experienced significant global outages in late 2025 which disrupted services for major platforms internationally. == History == Cloudflare was founded on July 26, 2009, by Matthew Prince, Lee Holloway, and Michelle Zatlyn. Prince and Holloway had previously collaborated on Project Honey Pot, a product of Unspam Technologies that partly inspired the basis of Cloudflare. In 2009, the company was venture-capital funded. On August 15, 2019, Cloudflare submitted its S-1 filing for an initial public offering on the New York Stock Exchange under the stock ticker NET. It opened for public trading on September 13, 2019, at $15 per share. According to the company, the name 'Cloudflare' was chosen, over the initial 'WebWall', because it best described what they were trying to do: build a "firewall in the cloud." In 2020, Cloudflare co-founder and COO Michelle Zatlyn was named president. Cloudflare has acquired web-services and security companies, including StopTheHacker (February 2014), CryptoSeal (June 2014), Eager Platform Co. (December 2016), Neumob (November 2017), S2 Systems (January 2020), Linc (December 2020), Zaraz (December 2021), Vectrix (February 2022), Area 1 Security (February 2022), Nefeli Networks (March 2024), BastionZero (May 2024), and Kivera (October 2024). Replicate (November 2025), and Human Native (January 2026). Since at least 2017, Cloudflare has used a wall of lava lamps at its San Francisco headquarters as a source of randomness for encryption keys, alongside double pendulums at its London offices and a Geiger counter at its Singapore offices. The lava lamp installation implements the Lavarand method, where a camera transforms the unpredictable shapes of the "lava" blobs into a digital image. In Q4 2022, Cloudflare provided paid services to 162,086 customers. In October 2024, Cloudflare won a lawsuit against patent troll Sable Networks. Sable paid Cloudflare $225,000, granted it a royalty-free license to its patent portfolio, and dedicated its patents to the public by abandoning its patent rights. In November 2025, it was announced Cloudflare had agreed to acquire Replicate, a San Francisco–based platform that enables software developers to run, fine-tune, and deploy open-source machine-learning models via an API without managing infrastructure. In January 2026, Cloudflare released an analysis regarding BGP routing leaks observed from the Venezuelan state-owned ISP CANTV (AS8048), which occurred on January 2 coincides with the arrest of Nicolás Maduro. While some security researchers had speculated that the outages were linked to U.S. cyber operations, Cloudflare's data indicated that the anomalies were consistent with a pattern of "insufficient routing export and import policies" by the ISP rather than malicious external interference. In January 2026, Cloudflare acquired Human Native, an AI data marketplace that brokers transactions between developers and content creators, for an undisclosed amount. On January 16, 2026, Cloudflare acquired The Astro Technology Company, the developers behind the open-source web framework Astro. In May 2026, Cloudflare announced the elimination of approximately 1,100 positions, around 20 percent of its workforce, in a restructuring the company attributed to the rapid adoption of artificial intelligence tools. The announcement coincided with the company's first-quarter 2026 earnings, which reported a record $639.8 million in quarterly revenue, a 34 percent year-over-year increase. CEO Matthew Prince stated the cuts were not driven by performance concerns but reflected roles made obsolete by AI, and that Cloudflare expected to employ more people by the end of 2027 than at any point during 2026. == Products == Cloudflare provides network and security products for consumers and businesses, utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content distribution network to serve content across its network of servers. It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as DNS over HTTPS, SMTP, and HTTP/2 with support for HTTP/2 Server Push. As of 2023, Cloudflare handles an average of 45 million HTTP requests per second. As of 2024, Cloudflare servers are powered by AMD EPYC 9684X processors. Cloudflare also provides analysis and reports on large-scale outages, including Verizon's October 2024 outage. === Artificial intelligence === In 2023, Cloudflare launched "Workers AI", a framework allowing for use of Nvidia GPU's within Cloudflare's network. In 2024, Cloudflare launched a tool that prevents bots from scraping websites. To build automatic bot detector models, the company analyzed "AI" bots and crawler traffic. The company also launched an "AI" assistant to generate charts based on queries by leveraging "Workers AI". Cloudflare announced plans in September 2024 to launch a marketplace where website owners can sell "AI" model providers access to scrape their site's content. In March 2025, Cloudflare announced a new feature called "AI Labyrinth", which combats unauthorized "AI" data scraping by serving fake "AI"-generated content to LLM bots. In July, the company rolled out a permission-based setting to allow websites to automatically block online bots from scraping data and content. Cloudflare released AutoRAG into beta in 2025. AutoRAG (retrieval augmented generation) creates a vector database of a website's unstructured content to identify relationships between concepts. It is part of an initiative with Microsoft, alongside their NLWeb standard, to make websites easier for people and automated systems to query. Cloudflare and GoDaddy partnered in April 2026 to enable AI Crawl Control features on GoDaddy hosted websites. This would allow site owners to decide how AI bot crawlers interact with their content. === DDoS mitigation === Cloudflare provides free and paid DDoS mitigation services that protect customers from distributed denial of service (DDoS) attacks. Cloudflare received media attention in June 2011 for providing DDoS mitigation for the website of LulzSec, a black hat hacking group. In March 2013, The Spamhaus Project was targeted by a DDoS attack that Cloudflare reported exceeded 300 gigabits per second (Gbit/s). Patrick Gilmore, of Akamai, stated that at the time it was "the largest publicly announced DDoS attack in the history of the Internet". While trying to defend Spamhaus against the DDoS attacks, Cloudflare ended up being attacked as well; Google and other companies eventually came to Spamhaus' defense and helped it to absorb the unprecedented amount of attack traffic. In 2014, Cloudflare began providing free DDoS mitigation for artists, activists, journalists, and human rights groups under the name "Project Galileo". In 2017, it extended the service to electoral infrastructure and political campaigns under the name "Athenian Project". By 2025, more than 2,900 users and organizations were participating in Project Galileo, including 31 US states. In February 2014, Cloudflare claimed to have mitigated an NTP reflection attack against an unnamed European customer, which it stated peaked at 400 Gbit/s. In November 2014, it reported a 500 Gbit/s DDoS attack in Hong Kong. In July 2021, the company claimed to have absorbed a DDoS atta

    Read more →
  • Nearest centroid classifier

    Nearest centroid classifier

    In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation. When applied to text classification using word vectors containing tfidf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback. An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors. == Algorithm == === Training === Given labeled training samples { ( x → 1 , y 1 ) , … , ( x → n , y n ) } {\displaystyle \textstyle \{({\vec {x}}_{1},y_{1}),\dots ,({\vec {x}}_{n},y_{n})\}} with class labels y i ∈ Y {\displaystyle y_{i}\in \mathbf {Y} } , compute the per-class centroids μ → ℓ = 1 | C ℓ | ∑ i ∈ C ℓ x → i {\displaystyle \textstyle {\vec {\mu }}_{\ell }={\frac {1}{|C_{\ell }|}}{\underset {i\in C_{\ell }}{\sum }}{\vec {x}}_{i}} where C ℓ {\displaystyle C_{\ell }} is the set of indices of samples belonging to class ℓ ∈ Y {\displaystyle \ell \in \mathbf {Y} } . === Prediction === The class assigned to an observation x → {\displaystyle {\vec {x}}} is y ^ = arg ⁡ min ℓ ∈ Y ‖ μ → ℓ − x → ‖ {\displaystyle {\hat {y}}={\arg \min }_{\ell \in \mathbf {Y} }\|{\vec {\mu }}_{\ell }-{\vec {x}}\|} .

    Read more →
  • NOMINATE (scaling method)

    NOMINATE (scaling method)

    NOMINATE (an acronym for nominal three-step estimation) is a multidimensional scaling application developed by US political scientists Keith T. Poole and Howard Rosenthal in the early 1980s to analyze preferential and choice data, such as legislative roll-call voting behavior. In its most well-known application, members of the US Congress are placed on a two-dimensional map, with politicians who are ideologically similar (i.e. who often vote the same) being close together. One of these two dimensions corresponds to the familiar left–right political spectrum (liberal–conservative in the United States). As computing capabilities grew, Poole and Rosenthal developed multiple iterations of their NOMINATE procedure: the original D-NOMINATE method, W-NOMINATE, and most recently DW-NOMINATE (for dynamic, weighted NOMINATE). In 2009, Poole and Rosenthal were the first recipients of the Society for Political Methodology's Best Statistical Software Award for their development of NOMINATE. In 2016, the society awarded Poole its Career Achievement Award, stating that "the modern study of the U.S. Congress would be simply unthinkable without NOMINATE legislative roll call voting scores." == Procedure == The main procedure is an application of multidimensional scaling techniques to political choice data. Though there are important technical differences between these types of NOMINATE scaling procedures, all operate under the same fundamental assumptions. First, that alternative choices can be projected on a basic, low-dimensional (often two-dimensional) Euclidean space. Second, within that space, individuals have utility functions which are bell-shaped (normally distributed), and maximized at their ideal point. Because individuals also have symmetric, single-peaked utility functions which center on their ideal point, ideal points represent individuals' most preferred outcomes. That is, individuals most desire outcomes closest their ideal point, and will choose/vote probabilistically for the closest outcome. Ideal points can be recovered from observing choices, with individuals exhibiting similar preferences placed more closely than those behaving dissimilarly. It is helpful to compare this procedure to producing maps based on driving distances between cities. For example, Los Angeles is about 1,800 miles from St. Louis; St. Louis is about 1,200 miles from Miami; and Miami is about 2,700 miles from Los Angeles. From this (dis)similarities data, any map of these three cities should place Miami far from Los Angeles, with St. Louis somewhere in between (though a bit closer to Miami than Los Angeles). Just as cities like Los Angeles and San Francisco would be clustered on a map, NOMINATE places ideologically similar legislators (e.g., liberal Senators Barbara Boxer (D-Calif.) and Al Franken (D-Minn.)) closer to each other, and farther from dissimilar legislators (e.g., conservative Senator Tom Coburn (R-Okla.)) based on the degree of agreement between their roll call voting records. At the heart of the NOMINATE procedures (and other multidimensional scaling methods, such as Poole's Optimal Classification method) are algorithms they utilize to arrange individuals and choices in low dimensional (usually two-dimensional) space. Thus, NOMINATE scores provide "maps" of legislatures. Using NOMINATE procedures to study congressional roll call voting behavior from the First Congress to the present-day, Poole and Rosenthal published Congress: A Political-Economic History of Roll Call Voting in 1997 and the revised edition Ideology and Congress in 2007. In 2009, Poole and Rosenthal were named the first recipients of the Society for Political Methodology's Best Statistical Software Award for their development of NOMINATE, a recognition conferred to "individual(s) for developing statistical software that makes a significant research contribution". In 2016, Keith T. Poole was awarded the Society for Political Methodology's Career Achievement Award. The citation for this award reads, in part, "One can say perfectly correctly, and without any hyperbole: the modern study of the U.S. Congress would be simply unthinkable without NOMINATE legislative roll call voting scores. NOMINATE has produced data that entire bodies of our discipline—and many in the press—have relied on to understand the U.S. Congress." == Dimensions == Poole and Rosenthal demonstrate that—despite the many complexities of congressional representation and politics—roll call voting in both the House and the Senate can be organized and explained by no more than two dimensions throughout the sweep of American history. The first dimension (horizontal or x-axis) is the familiar left-right (or liberal-conservative) spectrum on economic matters. The second dimension (vertical or y-axis) picks up attitudes on cross-cutting, salient issues of the day (which include or have included slavery, bimetallism, civil rights, regional, and social/lifestyle issues). Rosenthal and Poole have initially argued that the first dimension refers to socio-economic matters and the second dimension to race-relations. However, the often confusing and residual nature of the second dimension has led to the second dimension being largely ignored by other researchers. For the most part, congressional voting is uni-dimensional, with most of the variation in voting patterns explained by placement along the liberal-conservative first dimension. While the first dimension of the DW-NOMINATE score is able to predict results at 83% accuracy, the addition of the second dimension only increases accuracy to 85%. Furthermore, the second dimension only provided a significant increase in accuracy for Congresses 1-99. As late as the 1990s, the second dimension was able to measure partisan splits in abortion and gun rights issues. However, a 2017 analysis found that since 1987, the votes of the US Congress had best fit a one-dimensional model, suggesting increasing party polarization after 1987. == Interpretation of nominate scores == For illustrative purposes, consider the following plots which use W-NOMINATE scores to scale members of Congress and uses the probabilistic voting model (in which legislators farther from the "cutting line" between "yea" and "nay" outcomes become more likely to vote in the predicted manner) to illustrate some major Congressional votes in the 1990s. Some of these votes, like the House's vote on President Clinton's welfare reform package (the Personal Responsibility and Work Opportunity Act of 1996) are best modeled through the use of the first (economic liberal-conservative) dimension. On the welfare reform vote, nearly all Republicans joined the moderate-conservative bloc of House Democrats in voting for the bill, while opposition was virtually confined to the most liberal Democrats in the House. The errors (those representatives on the "wrong" side of the cutting line which separates predicted "yeas" and predicted "nays") are generally close to the cutting line, which is what we would expect. A legislator directly on the cutting line is indifferent between voting "yea" and "nay" on the measure. All members are shown on the left panel of the plot, while only errors are shown on the right panel: Economic ideology also dominates the Senate vote on the Balanced Budget Amendment of 1995: On other votes, however, a second dimension (which has recently come to represent attitudes on cultural and lifestyle issues) is important. For example, roll call votes on gun control routinely split party coalitions, with socially conservative "blue dog" Democrats joining most Republicans in opposing additional regulation and socially liberal Republicans joining most Democrats in supporting gun control. The addition of the second dimension accounts for these inter-party differences, and the cutting line is more horizontal than vertical (meaning the cleavage is found on the second dimension rather than the first dimension on these votes) This pattern was evident in the 1991 House vote to require waiting periods on handguns: == Political ideology == DW-NOMINATE scores have been used widely to describe the political ideology of political actors, political parties and political institutions. For instance, a score in the first dimension that is close to either pole means that such score is located at one of the extremes in the liberal-conservative scale. So, a score closer to 1 is described as conservative whereas a score closer to −1 can be described as liberal. Finally, a score at zero or close to zero is described as moderate. == Political polarization == Poole and Rosenthal (beginning with their 1984 article "The Polarization of American Politics") have also used NOMINATE data to show that, since the 1970s, party delegations in Congress have become ideologically homogeneous and distant from one another (a phenomenon known as "polarization"). Using DW-NOMINATE scores (which permit direct comparisons between members of different Congress

    Read more →
  • Dominance-based rough set approach

    Dominance-based rough set approach

    The dominance-based rough set approach (DRSA) is an extension of rough set theory for multi-criteria decision analysis (MCDA), introduced by Greco, Matarazzo and Słowiński. The main change compared to the classical rough sets is the substitution for the indiscernibility relation by a dominance relation, which permits one to deal with inconsistencies typical to consideration of criteria and preference-ordered decision classes. == Multicriteria classification (sorting) == Multicriteria classification (sorting) is one of the problems considered within MCDA and can be stated as follows: given a set of objects evaluated by a set of criteria (attributes with preference-order domains), assign these objects to some pre-defined and preference-ordered decision classes, such that each object is assigned to exactly one class. Due to the preference ordering, improvement of evaluations of an object on the criteria should not worsen its class assignment. The sorting problem is very similar to the problem of classification, however, in the latter, the objects are evaluated by regular attributes and the decision classes are not necessarily preference ordered. The problem of multicriteria classification is also referred to as ordinal classification problem with monotonicity constraints and often appears in real-life application when ordinal and monotone properties follow from the domain knowledge about the problem. As an illustrative example, consider the problem of evaluation in a high school. The director of the school wants to assign students (objects) to three classes: bad, medium and good (notice that class good is preferred to medium and medium is preferred to bad). Each student is described by three criteria: level in Physics, Mathematics and Literature, each taking one of three possible values bad, medium and good. Criteria are preference-ordered and improving the level from one of the subjects should not result in worse global evaluation (class). As a more serious example, consider classification of bank clients, from the viewpoint of bankruptcy risk, into classes safe and risky. This may involve such characteristics as "return on equity (ROE)", "return on investment (ROI)" and "return on sales (ROS)". The domains of these attributes are not simply ordered but involve a preference order since, from the viewpoint of bank managers, greater values of ROE, ROI or ROS are better for clients being analysed for bankruptcy risk . Thus, these attributes are criteria. Neglecting this information in knowledge discovery may lead to wrong conclusions. == Data representation == === Decision table === In DRSA, data are often presented using a particular form of decision table. Formally, a DRSA decision table is a 4-tuple S = ⟨ U , Q , V , f ⟩ {\displaystyle S=\langle U,Q,V,f\rangle } , where U {\displaystyle U\,\!} is a finite set of objects, Q {\displaystyle Q\,\!} is a finite set of criteria, V = ⋃ q ∈ Q V q {\displaystyle V=\bigcup {}_{q\in Q}V_{q}} where V q {\displaystyle V_{q}\,\!} is the domain of the criterion q {\displaystyle q\,\!} and f : U × Q → V {\displaystyle f\colon U\times Q\to V} is an information function such that f ( x , q ) ∈ V q {\displaystyle f(x,q)\in V_{q}} for every ( x , q ) ∈ U × Q {\displaystyle (x,q)\in U\times Q} . The set Q {\displaystyle Q\,\!} is divided into condition criteria (set C ≠ ∅ {\displaystyle C\neq \emptyset } ) and the decision criterion (class) d {\displaystyle d\,\!} . Notice, that f ( x , q ) {\displaystyle f(x,q)\,\!} is an evaluation of object x {\displaystyle x\,\!} on criterion q ∈ C {\displaystyle q\in C} , while f ( x , d ) {\displaystyle f(x,d)\,\!} is the class assignment (decision value) of the object. An example of decision table is shown in Table 1 below. === Outranking relation === It is assumed that the domain of a criterion q ∈ Q {\displaystyle q\in Q} is completely preordered by an outranking relation ⪰ q {\displaystyle \succeq _{q}} ; x ⪰ q y {\displaystyle x\succeq _{q}y} means that x {\displaystyle x\,\!} is at least as good as (outranks) y {\displaystyle y\,\!} with respect to the criterion q {\displaystyle q\,\!} . Without loss of generality, we assume that the domain of q {\displaystyle q\,\!} is a subset of reals, V q ⊆ R {\displaystyle V_{q}\subseteq \mathbb {R} } , and that the outranking relation is a simple order between real numbers ≥ {\displaystyle \geq \,\!} such that the following relation holds: x ⪰ q y ⟺ f ( x , q ) ≥ f ( y , q ) {\displaystyle x\succeq _{q}y\iff f(x,q)\geq f(y,q)} . This relation is straightforward for gain-type ("the more, the better") criterion, e.g. company profit. For cost-type ("the less, the better") criterion, e.g. product price, this relation can be satisfied by negating the values from V q {\displaystyle V_{q}\,\!} . === Decision classes and class unions === Let T = { 1 , … , n } {\displaystyle T=\{1,\ldots ,n\}\,\!} . The domain of decision criterion, V d {\displaystyle V_{d}\,\!} consist of n {\displaystyle n\,\!} elements (without loss of generality we assume V d = T {\displaystyle V_{d}=T\,\!} ) and induces a partition of U {\displaystyle U\,\!} into n {\displaystyle n\,\!} classes Cl = { C l t , t ∈ T } {\displaystyle {\textbf {Cl}}=\{Cl_{t},t\in T\}} , where C l t = { x ∈ U : f ( x , d ) = t } {\displaystyle Cl_{t}=\{x\in U\colon f(x,d)=t\}} . Each object x ∈ U {\displaystyle x\in U} is assigned to one and only one class C l t , t ∈ T {\displaystyle Cl_{t},t\in T} . The classes are preference-ordered according to an increasing order of class indices, i.e. for all r , s ∈ T {\displaystyle r,s\in T} such that r ≥ s {\displaystyle r\geq s\,\!} , the objects from C l r {\displaystyle Cl_{r}\,\!} are strictly preferred to the objects from C l s {\displaystyle Cl_{s}\,\!} . For this reason, we can consider the upward and downward unions of classes, defined respectively, as: C l t ≥ = ⋃ s ≥ t C l s C l t ≤ = ⋃ s ≤ t C l s t ∈ T {\displaystyle Cl_{t}^{\geq }=\bigcup _{s\geq t}Cl_{s}\qquad Cl_{t}^{\leq }=\bigcup _{s\leq t}Cl_{s}\qquad t\in T} == Main concepts == === Dominance === We say that x {\displaystyle x\,\!} dominates y {\displaystyle y\,\!} with respect to P ⊆ C {\displaystyle P\subseteq C} , denoted by x D p y {\displaystyle xD_{p}y\,\!} , if x {\displaystyle x\,\!} is better than y {\displaystyle y\,\!} on every criterion from P {\displaystyle P\,\!} , x ⪰ q y , ∀ q ∈ P {\displaystyle x\succeq _{q}y,\,\forall q\in P} . For each P ⊆ C {\displaystyle P\subseteq C} , the dominance relation D P {\displaystyle D_{P}\,\!} is reflexive and transitive, i.e. it is a partial pre-order. Given P ⊆ C {\displaystyle P\subseteq C} and x ∈ U {\displaystyle x\in U} , let D P + ( x ) = { y ∈ U : y D p x } {\displaystyle D_{P}^{+}(x)=\{y\in U\colon yD_{p}x\}} D P − ( x ) = { y ∈ U : x D p y } {\displaystyle D_{P}^{-}(x)=\{y\in U\colon xD_{p}y\}} represent P-dominating set and P-dominated set with respect to x ∈ U {\displaystyle x\in U} , respectively. === Rough approximations === The key idea of the rough set philosophy is approximation of one knowledge by another knowledge. In DRSA, the knowledge being approximated is a collection of upward and downward unions of decision classes and the "granules of knowledge" used for approximation are P-dominating and P-dominated sets. The P-lower and the P-upper approximation of C l t ≥ , t ∈ T {\displaystyle Cl_{t}^{\geq },t\in T} with respect to P ⊆ C {\displaystyle P\subseteq C} , denoted as P _ ( C l t ≥ ) {\displaystyle {\underline {P}}(Cl_{t}^{\geq })} and P ¯ ( C l t ≥ ) {\displaystyle {\overline {P}}(Cl_{t}^{\geq })} , respectively, are defined as: P _ ( C l t ≥ ) = { x ∈ U : D P + ( x ) ⊆ C l t ≥ } {\displaystyle {\underline {P}}(Cl_{t}^{\geq })=\{x\in U\colon D_{P}^{+}(x)\subseteq Cl_{t}^{\geq }\}} P ¯ ( C l t ≥ ) = { x ∈ U : D P − ( x ) ∩ C l t ≥ ≠ ∅ } {\displaystyle {\overline {P}}(Cl_{t}^{\geq })=\{x\in U\colon D_{P}^{-}(x)\cap Cl_{t}^{\geq }\neq \emptyset \}} Analogously, the P-lower and the P-upper approximation of C l t ≤ , t ∈ T {\displaystyle Cl_{t}^{\leq },t\in T} with respect to P ⊆ C {\displaystyle P\subseteq C} , denoted as P _ ( C l t ≤ ) {\displaystyle {\underline {P}}(Cl_{t}^{\leq })} and P ¯ ( C l t ≤ ) {\displaystyle {\overline {P}}(Cl_{t}^{\leq })} , respectively, are defined as: P _ ( C l t ≤ ) = { x ∈ U : D P − ( x ) ⊆ C l t ≤ } {\displaystyle {\underline {P}}(Cl_{t}^{\leq })=\{x\in U\colon D_{P}^{-}(x)\subseteq Cl_{t}^{\leq }\}} P ¯ ( C l t ≤ ) = { x ∈ U : D P + ( x ) ∩ C l t ≤ ≠ ∅ } {\displaystyle {\overline {P}}(Cl_{t}^{\leq })=\{x\in U\colon D_{P}^{+}(x)\cap Cl_{t}^{\leq }\neq \emptyset \}} Lower approximations group the objects which certainly belong to class union C l t ≥ {\displaystyle Cl_{t}^{\geq }} (respectively C l t ≤ {\displaystyle Cl_{t}^{\leq }} ). This certainty comes from the fact, that object x ∈ U {\displaystyle x\in U} belongs to the lower approximation P _ ( C l t ≥ ) {\displaystyle {\underline {P}}(Cl_{t}^{\geq })} (respectively P _ ( C l t ≤ ) {\displaystyle {\underl

    Read more →
  • Fling (social network)

    Fling (social network)

    Fling was a social media app available for IOS and Android. It was founded in 2014 by Marco Nardone and was taken offline in August 2016. == Overview == In 2012, Marco Nardone founded the startup Unii and launched Unii.com, a social network intended for students in the UK. While working on this service, Nardone had the idea for a messaging service where pictures could be sent to strangers in January 2014. The app Fling was then developed and released between March and July 2014. After a month, it already had 375,000 downloads and 180,000 active users on iOS. Users were able to take pictures inside the app and send them to 50 random people all over the world. The recipient could then choose to answer via chat or reply by sending a picture themselves. The app was used by many users as a medium to exchange sexually explicit pictures and for sexting with strangers. This led to the app being removed from the App Store in June 2015. In the 19 days that followed, flings developers rewrote the App almost completely from scratch, working around the clock. The feature to message random strangers was removed, and the app was readmitted into the App Store as a messenger App resembling Snapchat. But the redesigned Application did not have the success of its predecessor. The funding ran out and the parent company Unii went bankrupt. The company was not able to pay their content moderation team anymore, leading to a new surge of pornographic content on the App. Shortly after that, the Social Network was taken offline in August 2016. It has been inactive since. During the 2 years Fling was online, $21 million was raised from investors while generating no revenue at all. Of this $21 million (£16.5m), £5 million came from Nardone's father. == Allegations against CEO == Former employees made multiple allegations against Marco Nardone, the Founder and CEO of Unii and Fling. According to these claims, he behaved erratic and abusive, throwing "things across the office". He hired his girlfriend as the head of human resources to handle issues between him and his staff. Employees who left the company often had "some part of their pay held back". According to the reports, he also spent the money raised from investors irresponsibly, having no clear concept of a budget. Some of that money was used on expensive restaurants in London, a luxurious office for CEO Nardone and advertisements for Fling on Twitter and Facebook. Nardone also spent time partying in Ibiza with two employees, while the developer team in London frantically tried to get Fling back online after it being removed from the App Store. In December 2017 he pleaded guilty to assaulting his girlfriend at a domestic violence court.

    Read more →
  • Physical neural network

    Physical neural network

    A physical neural network is a type of artificial neural network in which an electrically adjustable material is used to emulate the function of a neural synapse or a higher-order (dendritic) neuron model. "Physical" neural network is used to emphasize the reliance on physical hardware used to emulate neurons as opposed to software-based approaches. More generally the term is applicable to other artificial neural networks in which a memristor or other electrically adjustable resistance material is used to emulate a neural synapse. == Types of physical neural networks == === ADALINE === In the 1960s Bernard Widrow and Ted Hoff developed ADALINE (Adaptive Linear Neuron) which used electrochemical cells called memistors (memory resistors) to emulate synapses of an artificial neuron. The memistors were implemented as 3-terminal devices operating based on the reversible electroplating of copper such that the resistance between two of the terminals is controlled by the integral of the current applied via the third terminal. The ADALINE circuitry was briefly commercialized by the Memistor Corporation in the 1960s enabling some applications in pattern recognition. However, since the memistors were not fabricated using integrated circuit fabrication techniques the technology was not scalable and was eventually abandoned as solid-state electronics became mature. === Analog VLSI === In 1989 Carver Mead published his book Analog VLSI and Neural Systems, which spun off perhaps the most common variant of analog neural networks. The physical realization is implemented in analog VLSI. This is often implemented as field effect transistors in low inversion. Such devices can be modelled as translinear circuits. This is a technique described by Barrie Gilbert in several papers around mid 1970th, and in particular his Translinear Circuits from 1981. With this method circuits can be analyzed as a set of well-defined functions in steady-state, and such circuits assembled into complex networks. === Physical Neural Network === Alex Nugent describes a physical neural network as one or more nonlinear neuron-like nodes used to sum signals and nanoconnections formed from nanoparticles, nanowires, or nanotubes which determine the signal strength input to the nodes. Alignment or self-assembly of the nanoconnections is determined by the history of the applied electric field performing a function analogous to neural synapses. Numerous applications for such physical neural networks are possible. For example, a temporal summation device can be composed of one or more nanoconnections having an input and an output thereof, wherein an input signal provided to the input causes one or more of the nanoconnection to experience an increase in connection strength thereof over time. Another example of a physical neural network is taught by U.S. Patent No. 7,039,619 entitled "Utilized nanotechnology apparatus using a neural network, a solution and a connection gap," which issued to Alex Nugent by the U.S. Patent & Trademark Office on May 2, 2006. A further application of physical neural network is shown in U.S. Patent No. 7,412,428 entitled "Application of hebbian and anti-hebbian learning to nanotechnology-based physical neural networks," which issued on August 12, 2008. Nugent and Molter have shown that universal computing and general-purpose machine learning are possible from operations available through simple memristive circuits operating the AHaH plasticity rule. More recently, it has been argued that also complex networks of purely memristive circuits can serve as neural networks. === Phase change neural network === In 2002, Stanford Ovshinsky described an analog neural computing medium in which phase-change material has the ability to cumulatively respond to multiple input signals. An electrical alteration of the resistance of the phase change material is used to control the weighting of the input signals. === Memristive neural network === Greg Snider of HP Labs describes a system of cortical computing with memristive nanodevices. The memristors (memory resistors) are implemented by thin film materials in which the resistance is electrically tuned via the transport of ions or oxygen vacancies within the film. DARPA's SyNAPSE project has funded IBM Research and HP Labs, in collaboration with the Boston University Department of Cognitive and Neural Systems (CNS), to develop neuromorphic architectures which may be based on memristive systems. === Protonic artificial synapses === In 2022, researchers reported the development of nanoscale brain-inspired artificial synapses, using the ion proton (H+), for 'analog deep learning'.

    Read more →
  • Latent and observable variables

    Latent and observable variables

    In statistics, latent variables (from Latin: present participle of lateo 'lie hidden') are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such latent variable models are used in many disciplines, including engineering, medicine, ecology, physics, machine learning/artificial intelligence, natural language processing, bioinformatics, chemometrics, demography, economics, management, political science, psychology and the social sciences. Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. Among the earliest expressions of this idea is Francis Bacon's polemic the Novum Organum, itself a challenge to the more traditional logic expressed in Aristotle's Organon: But the latent process of which we speak, is far from being obvious to men’s minds, beset as they now are. For we mean not the measures, symptoms, or degrees of any process which can be exhibited in the bodies themselves, but simply a continued process, which, for the most part, escapes the observation of the senses. In this situation, the term hidden variables is commonly used, reflecting the fact that the variables are meaningful, but not observable. Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms hypothetical variables or hypothetical constructs may be used in these situations. The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world. == Examples == === Psychology === Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model. The "Big Five personality traits" have been inferred using factor analysis. extraversion spatial ability wisdom: “Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.” Spearman's g, or the general intelligence factor in psychometrics === Economics === Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. However, by linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. Quality of life is a latent variable which cannot be measured directly, so observable variables are used to infer quality of life. Observable variables to measure quality of life include wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging. === Medicine === Latent-variable methodology is used in many branches of medicine. A class of problems that naturally lend themselves to latent variables approaches are longitudinal studies where the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth (see box). == Inferring latent variables == There exists a range of different model classes and methodology that make use of latent variables and allow inference in the presence of latent variables. Models include: linear mixed-effects models and nonlinear mixed-effects models Hidden Markov models Factor analysis Item response theory Analysis and inference methods include: Principal component analysis Instrumented principal component analysis Partial least squares regression Latent semantic analysis and probabilistic latent semantic analysis EM algorithms Metropolis–Hastings algorithm === Bayesian algorithms and methods === Bayesian statistics is often used for inferring latent variables. Latent Dirichlet allocation The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories. The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.

    Read more →
  • Reservoir computing

    Reservoir computing

    Reservoir computing is a framework for computation derived from recurrent neural network theory that maps input signals into higher dimensional computational spaces through the dynamics of a fixed, non-linear system called a reservoir. After the input signal is fed into the reservoir, which is treated as a "black box," a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The first key benefit of this framework is that training is performed only at the readout stage, as the reservoir dynamics are fixed. The second is that the computational power of naturally available systems, both classical and quantum mechanical, can be used to reduce the effective computational cost. == History == The first examples of reservoir neural networks demonstrated that randomly connected recurrent neural networks could be used for sensorimotor sequence learning, and simple forms of interval and speech discrimination. In these early models the memory in the network took the form of both short-term synaptic plasticity and activity mediated by recurrent connections. In other early reservoir neural network models the memory of the recent stimulus history was provided solely by the recurrent activity. Overall, the general concept of reservoir computing stems from the use of recursive connections within neural networks to create a complex dynamical system. It is a generalisation of earlier neural network architectures such as recurrent neural networks, liquid-state machines and echo-state networks. Reservoir computing also extends to physical systems that are not networks in the classical sense, but rather continuous systems in space and/or time: e.g. a literal "bucket of water" can serve as a reservoir that performs computations on inputs given as perturbations of the surface. The resultant complexity of such recurrent neural networks was found to be useful in solving a variety of problems including language processing and dynamic system modeling. However, training of recurrent neural networks is challenging and computationally expensive. Reservoir computing reduces those training-related challenges by fixing the dynamics of the reservoir and only training the linear output layer. A large variety of nonlinear dynamical systems can serve as a reservoir that performs computations. In recent years semiconductor lasers have attracted considerable interest as computation can be fast and energy efficient compared to electrical components. Recent advances in both AI and quantum information theory have given rise to the concept of quantum neural networks. These hold promise in quantum information processing, which is challenging to classical networks, but can also find application in solving classical problems. In 2018, a physical realization of a quantum reservoir computing architecture was demonstrated in the form of nuclear spins within a molecular solid. However, the nuclear spin experiments in did not demonstrate quantum reservoir computing per se as they did not involve processing of sequential data. Rather the data were vector inputs, which makes this more accurately a demonstration of quantum implementation of a random kitchen sink algorithm (also going by the name of extreme learning machines in some communities). In 2019, another possible implementation of quantum reservoir processors was proposed in the form of two-dimensional fermionic lattices. In 2020, realization of reservoir computing on gate-based quantum computers was proposed and demonstrated on cloud-based IBM superconducting near-term quantum computers. Reservoir computers have been used for time-series analysis purposes. In particular, some of their usages involve chaotic time-series prediction, separation of chaotic signals, and link inference of networks from their dynamics. == Classical reservoir computing == === Reservoir === The 'reservoir' in reservoir computing is the internal structure of the computer, and must have two properties: it must be made up of individual, non-linear units, and it must be capable of storing information. The non-linearity describes the response of each unit to input, which is what allows reservoir computers to solve complex problems. Reservoirs are able to store information by connecting the units in recurrent loops, where the previous input affects the next response. The change in reaction due to the past allows the computers to be trained to complete specific tasks. Reservoirs can be virtual or physical. Virtual reservoirs are typically randomly generated and are designed like neural networks. Virtual reservoirs can be designed to have non-linearity and recurrent loops, but, unlike neural networks, the connections between units are randomized and remain unchanged throughout computation. Physical reservoirs are possible because of the inherent non-linearity of certain natural systems. The interaction between ripples on the surface of water contains the nonlinear dynamics required in reservoir creation, and a pattern recognition RC was developed by first inputting ripples with electric motors then recording and analyzing the ripples in the readout. === Readout === The readout is a neural network layer that performs a linear transformation on the output of the reservoir. The weights of the readout layer are trained by analyzing the spatiotemporal patterns of the reservoir after excitation by known inputs, and by utilizing a training method such as a linear regression or a Ridge regression. As its implementation depends on spatiotemporal reservoir patterns, the details of readout methods are tailored to each type of reservoir. For example, the readout for a reservoir computer using a container of liquid as its reservoir might entail observing spatiotemporal patterns on the surface of the liquid. === Types === ==== Context reverberation network ==== An early example of reservoir computing was the context reverberation network. In this architecture, an input layer feeds into a high dimensional dynamical system which is read out by a trainable single-layer perceptron. Two kinds of dynamical system were described: a recurrent neural network with fixed random weights, and a continuous reaction–diffusion system inspired by Alan Turing's model of morphogenesis. At the trainable layer, the perceptron associates current inputs with the signals that reverberate in the dynamical system; the latter were said to provide a dynamic "context" for the inputs. In the language of later work, the reaction–diffusion system served as the reservoir. ==== Echo state network ==== The tree echo state network (TreeESN) model represents a generalization of the reservoir computing framework to tree structured data. ==== Liquid-state machine ==== Chaotic liquid state machine The liquid (i.e. reservoir) of a chaotic liquid state machine (CLSM), or chaotic reservoir, is made from chaotic spiking neurons but which stabilize their activity by settling to a single hypothesis that describes the trained inputs of the machine. This is in contrast to general types of reservoirs that don't stabilize. The liquid stabilization occurs via synaptic plasticity and chaos control that govern neural connections inside the liquid. CLSM showed promising results in learning sensitive time series data. ==== Nonlinear transient computation ==== This type of information processing is most relevant when time-dependent input signals depart from the mechanism's internal dynamics. These departures cause transients or temporary altercations which are represented in the device's output. ==== Deep reservoir computing ==== The extension of the reservoir computing framework towards deep learning, with the introduction of deep reservoir computing and of the deep echo state network (DeepESN) model allows to develop efficiently trained models for hierarchical processing of temporal data, at the same time enabling the investigation on the inherent role of layered composition in recurrent neural networks. == Quantum reservoir computing == Quantum reservoir computing may use the nonlinear nature of quantum mechanical interactions or processes to form the characteristic nonlinear reservoirs but may also be done with linear reservoirs when the injection of the input to the reservoir creates the nonlinearity. The marriage of machine learning and quantum devices is leading to the emergence of quantum neuromorphic computing as a new research area. === Types === ==== Gaussian states of interacting quantum harmonic oscillators ==== Gaussian states are a paradigmatic class of states of continuous variable quantum systems. Although they can nowadays be created and manipulated in, e.g, state-of-the-art optical platforms, naturally robust to decoherence, it is well-known that they are not sufficient for, e.g., universal quantum computing because transformations that preserve the Gaussian nature of a state are linear. Normally, linear dynamics would not be sufficient for nontrivial reser

    Read more →
  • CapCut

    CapCut

    CapCut, known domestically as JianYing (Chinese: 剪映; pinyin: Jiǎnyìng) and formerly internationally as ViaMaker, is a video editor developed by ByteDance, available as a mobile app, desktop app, and web app. == History == The app was first released in China in 2019 and was initially available for iPhone and Android. In 2020, it was rebranded in English from ViaMaker to CapCut and became available globally. It later expanded to include web and desktop versions for Mac and Windows. In 2022, CapCut reached 200 million active users. According to The Wall Street Journal, in March 2023, it was the second-most downloaded app in the U.S., behind that of Chinese discount retailer Temu. In January 2025, CapCut had over 1 billion downloads on the Google Play Store. On February 1, 2021, CapCut Pro for Windows was launched. On November 27, the Pro version for Mac was launched. In July 2025, CapCut Pro for HarmonyOS was available on HarmonyOS NEXT tablets. In July 2024, CapCut was reported by the South China Morning Post to be a generative AI (GenAI) application that led global AI app downloads, with approximately 38.42 million downloads and 323 million monthly active users. == Features == CapCut supports basic video editing functions, including editing, trimming, and adding or splitting clips. Editing projects is limited to single-layer editing, but the app supports overlay options that enable additional effects, including multi-layer editing. The app includes a library of pre-made templates and a tool that generates editable video captions. It also provides photo editing tools, including retouch and product photo features integrated within the editing interface. CapCut's video editor includes AI-based features such as video and script generation. Users can export or save completed projects directly to different social media platforms. CapCut includes a free version and a paid Pro version with cloud storage and advanced features. == Controversies == === Illegal data collection === In July 2023, many users of CapCut accused it of illegally profiting off their personal data. A class-action lawsuit filed in the U.S. District Court for the Northern District of Illinois on July 28, 2023, alleged that CapCut illegally harvests and profits from user data including biometric information and geolocation without consent. In September 2025, a federal court excluded most of the lawsuit, which alleged that TikTok’s parent company improperly scraped private data from CapCut's video editing software, as lacking grounds, with some of the class action continuing to move forward. == Bans and restrictions == === Ban in India === As a response to border clashes with China in May 2020, the Indian government banned around 56 Chinese applications including CapCut and TikTok, which is owned by CapCut's parent company ByteDance. Indian users were unable to use and download the application. As of February 2022, around 273 Chinese applications have been banned by the Indian government under the concern of national security and Indian user privacy. === Ban in the United States === On January 18, 2025, at 10 PM EST, CapCut was banned in the United States along with TikTok and all other ByteDance apps due to the implementation of the Protecting Americans from Foreign Adversary Controlled Applications Act. Hours after the suspension of services took effect, President Donald Trump indicated on Truth Social that he would issue an executive order on the day of his inauguration "to extend the period of time before the law's prohibitions take effect". On January 21, CapCut began restoring service. On February 13, Google and Apple restored CapCut on the App Store and Google Play Store.

    Read more →
  • Modern Hopfield network

    Modern Hopfield network

    Modern Hopfield networks (also known as Dense Associative Memories) are generalizations of the classical Hopfield networks that break the linear scaling relationship between the number of input features and the number of stored memories. This is achieved by introducing stronger non-linearities (either in the energy function or neurons’ activation functions) leading to super-linear (even an exponential) memory storage capacity as a function of the number of feature neurons. The network still requires a sufficient number of hidden neurons. The key theoretical idea behind the modern Hopfield networks is to use an energy function and an update rule that is more sharply peaked around the stored memories in the space of neuron’s configurations compared to the classical Hopfield network. == Classical Hopfield networks == Hopfield networks are recurrent neural networks with dynamical trajectories converging to fixed point attractor states and described by an energy function. The state of each model neuron i {\textstyle i} is defined by a time-dependent variable V i {\displaystyle V_{i}} , which can be chosen to be either discrete or continuous. A complete model describes the mathematics of how the future state of activity of each neuron depends on the known present or previous activity of all the neurons. In the original Hopfield model of associative memory, the variables were binary, and the dynamics were described by a one-at-a-time update of the state of the neurons. An energy function quadratic in the V i {\displaystyle V_{i}} was defined, and the dynamics consisted of changing the activity of each single neuron i {\displaystyle i} only if doing so would lower the total energy of the system. This same idea was extended to the case of V i {\displaystyle V_{i}} being a continuous variable representing the output of neuron i {\displaystyle i} , and V i {\displaystyle V_{i}} being a monotonic function of an input current. The dynamics became expressed as a set of first-order differential equations for which the "energy" of the system always decreased. The energy in the continuous case has one term which is quadratic in the V i {\displaystyle V_{i}} (as in the binary model), and a second term which depends on the gain function (neuron's activation function). While having many desirable properties of associative memory, both of these classical systems suffer from a small memory storage capacity, which scales linearly with the number of input features. == Discrete variables == A simple example of the Modern Hopfield network can be written in terms of binary variables V i {\displaystyle V_{i}} that represent the active V i = + 1 {\displaystyle V_{i}=+1} and inactive V i = − 1 {\displaystyle V_{i}=-1} state of the model neuron i {\displaystyle i} . E = − ∑ μ = 1 N mem F ( ∑ i = 1 N f ξ μ i V i ) {\displaystyle E=-\sum \limits _{\mu =1}^{N_{\text{mem}}}F{\Big (}\sum \limits _{i=1}^{N_{f}}\xi _{\mu i}V_{i}{\Big )}} In this formula the weights ξ μ i {\textstyle \xi _{\mu i}} represent the matrix of memory vectors (index μ = 1... N mem {\displaystyle \mu =1...N_{\text{mem}}} enumerates different memories, and index i = 1... N f {\displaystyle i=1...N_{f}} enumerates the content of each memory corresponding to the i {\displaystyle i} -th feature neuron), and the function F ( x ) {\displaystyle F(x)} is a rapidly growing non-linear function. The update rule for individual neurons (in the asynchronous case) can be written in the following form V i ( t + 1 ) = sign ⁡ [ ∑ μ = 1 N mem ( F ( ξ μ i + ∑ j ≠ i ξ μ j V j ( t ) ) − F ( − ξ μ i + ∑ j ≠ i ξ μ j V j ( t ) ) ) ] {\displaystyle V_{i}^{(t+1)}=\operatorname {sign} {\bigg [}\sum \limits _{\mu =1}^{N_{\text{mem}}}{\bigg (}F{\Big (}\xi _{\mu i}+\sum \limits _{j\neq i}\xi _{\mu j}V_{j}^{(t)}{\Big )}-F{\Big (}-\xi _{\mu i}+\sum \limits _{j\neq i}\xi _{\mu j}V_{j}^{(t)}{\Big )}{\bigg )}{\bigg ]}} which states that in order to calculate the updated state of the i {\textstyle i} -th neuron the network compares two energies: the energy of the network with the i {\displaystyle i} -th neuron in the ON state and the energy of the network with the i {\displaystyle i} -th neuron in the OFF state, given the states of the remaining neuron. The updated state of the i {\displaystyle i} -th neuron selects the state that has the lowest of the two energies. In the limiting case when the non-linear energy function is quadratic F ( x ) = x 2 {\displaystyle F(x)=x^{2}} these equations reduce to the familiar energy function and the update rule for the classical binary Hopfield network. The memory storage capacity of these networks can be calculated for random binary patterns. For the power energy function F ( x ) = x n {\displaystyle F(x)=x^{n}} the maximal number of memories that can be stored and retrieved from this network without errors is given by N mem max ≈ 1 2 ( 2 n − 3 ) ! ! N f n − 1 ln ⁡ ( N f ) {\displaystyle N_{\text{mem}}^{\max }\approx {\frac {1}{2(2n-3)!!}}{\frac {N_{f}^{n-1}}{\ln(N_{f})}}} For an exponential energy function F ( x ) = e x {\textstyle F(x)=e^{x}} the memory storage capacity is exponential in the number of feature neurons N mem max ≈ 2 N f / 2 {\displaystyle N_{\text{mem}}^{\max }\approx 2^{N_{f}/2}} == Continuous variables == Modern Hopfield networks or Dense Associative Memories can be best understood in continuous variables and continuous time. Consider the network architecture, shown in Fig.1, and the equations for the neurons' state evolutionwhere the currents of the feature neurons are denoted by x i {\textstyle x_{i}} , and the currents of the memory neurons are denoted by h μ {\displaystyle h_{\mu }} ( h {\displaystyle h} stands for hidden neurons). There are no synaptic connections among the feature neurons or the memory neurons. A matrix ξ μ i {\displaystyle \xi _{\mu i}} denotes the strength of synapses from a feature neuron i {\displaystyle i} to the memory neuron μ {\displaystyle \mu } . The synapses are assumed to be symmetric, so that the same value characterizes a different physical synapse from the memory neuron μ {\displaystyle \mu } to the feature neuron i {\displaystyle i} . The outputs of the memory neurons and the feature neurons are denoted by f μ {\displaystyle f_{\mu }} and g i {\displaystyle g_{i}} , which are non-linear functions of the corresponding currents. In general these outputs can depend on the currents of all the neurons in that layer so that f μ = f ( { h μ } ) {\displaystyle f_{\mu }=f(\{h_{\mu }\})} and g i = g ( { x i } ) {\textstyle g_{i}=g(\{x_{i}\})} . It is convenient to define these activation function as derivatives of the Lagrangian functions for the two groups of neuronsThis way the specific form of the equations for neuron's states is completely defined once the Lagrangian functions are specified. Finally, the time constants for the two groups of neurons are denoted by τ f {\displaystyle \tau _{f}} and τ h {\displaystyle \tau _{h}} , I i {\displaystyle I_{i}} is the input current to the network that can be driven by the presented data. General systems of non-linear differential equations can have many complicated behaviors that can depend on the choice of the non-linearities and the initial conditions. For Hopfield networks, however, this is not the case - the dynamical trajectories always converge to a fixed point attractor state. This property is achieved because these equations are specifically engineered so that they have an underlying energy function The terms grouped into square brackets represent a Legendre transform of the Lagrangian function with respect to the states of the neurons. If the Hessian matrices of the Lagrangian functions are positive semi-definite, the energy function is guaranteed to decrease on the dynamical trajectory This property makes it possible to prove that the system of dynamical equations describing temporal evolution of neurons' activities will eventually reach a fixed point attractor state. In certain situations one can assume that the dynamics of hidden neurons equilibrates at a much faster time scale compared to the feature neurons, τ h ≪ τ f {\textstyle \tau _{h}\ll \tau _{f}} . In this case the steady state solution of the second equation in the system (1) can be used to express the currents of the hidden units through the outputs of the feature neurons. This makes it possible to reduce the general theory (1) to an effective theory for feature neurons only. The resulting effective update rules and the energies for various common choices of the Lagrangian functions are shown in Fig.2. In the case of log-sum-exponential Lagrangian function the update rule (if applied once) for the states of the feature neurons is the attention mechanism commonly used in many modern AI systems (see Ref. for the derivation of this result from the continuous time formulation). == Relationship to classical Hopfield network with continuous variables == Classical formulation of continuous Hopfield networks can be understood as a

    Read more →
  • Markov blanket

    Markov blanket

    In statistics and machine learning, a Markov blanket of a random variable is a set of variables that renders the variable conditionally independent of all other variables in the system. This concept is central in probabilistic graphical models and feature selection. If a Markov blanket is minimal—meaning that no variable in it can be removed without losing this conditional independence—it is called a Markov boundary. Identifying a Markov blanket or boundary allows for efficient inference and helps isolate relevant variables for prediction or causal reasoning. The terms Markov blanket and Markov boundary were coined by Judea Pearl in 1988. A Markov blanket may be derived from the structure of a probabilistic graphical model such as a Bayesian network or Markov random field. == Definition == A Markov blanket of a random variable Y {\displaystyle Y} in a random variable set S = { X 1 , … , X n } {\displaystyle {\mathcal {S}}=\{X_{1},\ldots ,X_{n}\}} is any subset S 1 {\displaystyle {\mathcal {S}}_{1}} of S {\displaystyle {\mathcal {S}}} , conditioned on which other variables are independent with Y {\displaystyle Y} : Y ⊥ ⊥ S ∖ S 1 ∣ S 1 {\displaystyle Y\perp \!\!\!\perp {\mathcal {S}}\smallsetminus {\mathcal {S}}_{1}\mid {\mathcal {S}}_{1}} It means that S 1 {\displaystyle {\mathcal {S}}_{1}} contains at least all the information one needs to infer Y {\displaystyle Y} , where the variables in S ∖ S 1 {\displaystyle {\mathcal {S}}\smallsetminus {\mathcal {S}}_{1}} are redundant. In general, a given Markov blanket is not unique. Any set in S {\displaystyle {\mathcal {S}}} that contains a Markov blanket is also a Markov blanket itself. Specifically, S {\displaystyle {\mathcal {S}}} is a Markov blanket of Y {\displaystyle Y} in S {\displaystyle {\mathcal {S}}} . === Example === In a Bayesian network, the Markov blanket of a node consists of its parents, its children, and its children's other parents (i.e., co-parents). Knowing the values of these nodes makes the target node conditionally independent of the rest of the network. In a Markov random field, the Markov blanket of a node is simply its immediate neighbors. == Markov condition == The concept of a Markov blanket is rooted in the Markov condition, which states that in a probabilistic graphical model, each variable is conditionally independent of its non-descendants given its parents. This condition implies the existence of a minimal separating set — the Markov blanket — that shields a variable from the rest of the network. For instance, when a person holds an object stationary against gravity, the object’s acceleration is fully determined by its direct causes—namely, the upward force from the hand and the downward gravitational pull. Other variables such as air pressure or temperature are causally irrelevant. == Markov boundary == A Markov boundary of Y {\displaystyle Y} in S {\displaystyle {\mathcal {S}}} is a subset S 2 {\displaystyle {\mathcal {S}}_{2}} of S {\displaystyle {\mathcal {S}}} , such that S 2 {\displaystyle {\mathcal {S}}_{2}} itself is a Markov blanket of Y {\displaystyle Y} , but any proper subset of S 2 {\displaystyle {\mathcal {S}}_{2}} is not a Markov blanket of Y {\displaystyle Y} . In other words, a Markov boundary is a minimal Markov blanket. The Markov boundary of a node A {\displaystyle A} in a Bayesian network is the set of nodes composed of A {\displaystyle A} 's parents, A {\displaystyle A} 's children, and A {\displaystyle A} 's children's other parents. In a Markov random field, the Markov boundary for a node is the set of its neighboring nodes. In a dependency network, the Markov boundary for a node is the set of its parents. === Uniqueness of Markov boundary === The Markov boundary always exists. Under some mild conditions, the Markov boundary is unique. However, for most practical and theoretical scenarios multiple Markov boundaries may provide alternative solutions. When there are multiple Markov boundaries, quantities measuring causal effect could fail. == In cognitive science == In the study of consciousness, brain function, and complex adaptive systems, Markov blankets are proposed as a mathematical mechanism which delimits the extent of cognitive entities, whether it be physical or causal.

    Read more →
  • Grammatical evolution

    Grammatical evolution

    Grammatical evolution (GE) is a genetic programming (GP) technique (or approach) from evolutionary computation pioneered by Conor Ryan, JJ Collins and Michael O'Neill in 1998 at the BDS Group in the University of Limerick. As in any other GP approach, the objective is to find an executable program, program fragment, or function, which will achieve a good fitness value for a given objective function. In most published work on GP, a LISP-style tree-structured expression is directly manipulated, whereas GE applies genetic operators to an integer string, subsequently mapped to a program (or similar) through the use of a grammar, which is typically expressed in Backus–Naur form. One of the benefits of GE is that this mapping simplifies the application of search to different programming languages and other structures. == Problem addressed == In type-free, conventional Koza-style GP, the function set must meet the requirement of closure: all functions must be capable of accepting as their arguments the output of all other functions in the function set. Usually, this is implemented by dealing with a single data-type such as double-precision floating point. While modern Genetic Programming frameworks support typing, such type-systems have limitations that Grammatical Evolution does not suffer from. == GE's solution == GE offers a solution to the single-type limitation by evolving solutions according to a user-specified grammar (usually a grammar in Backus-Naur form). Therefore, the search space can be restricted, and domain knowledge of the problem can be incorporated. The inspiration for this approach comes from a desire to separate the "genotype" from the "phenotype": in GP, the objects the search algorithm operates on and what the fitness evaluation function interprets are one and the same. In contrast, GE's "genotypes" are ordered lists of integers which code for selecting rules from the provided context-free grammar. The phenotype, however, is the same as in Koza-style GP: a tree-like structure that is evaluated recursively. This model is more in line with how genetics work in nature, where there is a separation between an organism's genotype and the final expression of phenotype in proteins, etc. Separating genotype and phenotype allows a modular approach. In particular, the search portion of the GE paradigm needn't be carried out by any one particular algorithm or method. Observe that the objects GE performs search on are the same as those used in genetic algorithms. This means, in principle, that any existing genetic algorithm package, such as the popular GAlib, can be used to carry out the search, and a developer implementing a GE system need only worry about carrying out the mapping from list of integers to program tree. It is also in principle possible to perform the search using some other method, such as particle swarm optimization (see the remark below); the modular nature of GE creates many opportunities for hybrids as the problem of interest to be solved dictates. Brabazon and O'Neill have successfully applied GE to predicting corporate bankruptcy, forecasting stock indices, bond credit ratings, and other financial applications. GE has also been used with a classic predator-prey model to explore the impact of parameters such as predator efficiency, niche number, and random mutations on ecological stability. It is possible to structure a GE grammar that for a given function/terminal set is equivalent to genetic programming. == Criticism == Despite its successes, GE has been the subject of some criticism. One issue is that as a result of its mapping operation, GE's genetic operators do not achieve high locality which is a highly regarded property of genetic operators in evolutionary algorithms. == Variants == Although GE was originally described in terms of using an Evolutionary Algorithm, specifically, a Genetic Algorithm, other variants exist. For example, GE researchers have experimented with using particle swarm optimization to carry out the searching instead of genetic algorithms with results comparable to that of normal GE; this is referred to as a "grammatical swarm"; using only the basic PSO model it has been found that PSO is probably equally capable of carrying out the search process in GE as simple genetic algorithms are. (Although PSO is normally a floating-point search paradigm, it can be discretized, e.g., by simply rounding each vector to the nearest integer, for use with GE.) Yet another possible variation that has been experimented with in the literature is attempting to encode semantic information in the grammar in order to further bias the search process. Other work showed that, with biased grammars that leverage domain knowledge, even random search can be used to drive GE. == Related work == GE was originally a combination of the linear representation as used by the Genetic Algorithm for Developing Software (GADS) and Backus Naur Form grammars, which were originally used in tree-based GP by Wong and Leung in 1995 and Whigham in 1996. Other related work noted in the original GE paper was that of Frederic Gruau, who used a conceptually similar "embryonic" approach, as well as that of Keller and Banzhaf, which similarly used linear genomes. == Implementations == There are several implementations of GE. These include the following.

    Read more →
  • Vector database

    Vector database

    A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases which primarily look up records by exact match. Use-cases for vector databases include similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation (RAG). Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows. == Techniques == Common techniques for similarity search on high-dimensional vectors include: Hierarchical Navigable Small World (HNSW) graphs Locality-sensitive hashing (LSH) and sketching Product quantization (PQ) Inverted files These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases. == Applications == Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation. === Retrieval-augmented generation === An especially common use-case for vector databases is in retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model, and the large language model proceeds to create a response to the prompt given this context. == Implementations ==

    Read more →
  • Stochastic gradient descent

    Stochastic gradient descent

    Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate. The basic idea behind stochastic approximation can be traced back to the Robbins–Monro algorithm of the 1950s. Today, stochastic gradient descent has become an important optimization method in machine learning. == Background == Both statistical estimation and machine learning consider the problem of minimizing an objective function that has the form of a sum: Q ( w ) = 1 n ∑ i = 1 n Q i ( w ) , {\displaystyle Q(w)={\frac {1}{n}}\sum _{i=1}^{n}Q_{i}(w),} where the parameter w {\displaystyle w} that minimizes Q ( w ) {\displaystyle Q(w)} is to be estimated. Each summand function Q i {\displaystyle Q_{i}} is typically associated with the i {\displaystyle i} -th observation in the data set (used for training). In classical statistics, sum-minimization problems arise in least squares and in maximum-likelihood estimation (for independent observations). The general class of estimators that arise as minimizers of sums are called M-estimators. However, in statistics, it has been long recognized that requiring even local minimization is too restrictive for some problems of maximum-likelihood estimation. Therefore, contemporary statistical theorists often consider stationary points of the likelihood function (or zeros of its derivative, the score function, and other estimating equations). The sum-minimization problem also arises for empirical risk minimization. There, Q i ( w ) {\displaystyle Q_{i}(w)} is the value of the loss function at i {\displaystyle i} -th example, and Q ( w ) {\displaystyle Q(w)} is the empirical risk. When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations: w := w − η ∇ Q ( w ) = w − η n ∑ i = 1 n ∇ Q i ( w ) . {\displaystyle w:=w-\eta \,\nabla Q(w)=w-{\frac {\eta }{n}}\sum _{i=1}^{n}\nabla Q_{i}(w).} The step size is denoted by η {\displaystyle \eta } (sometimes called the learning rate in machine learning) and here " := {\displaystyle :=} " denotes the update of a variable in the algorithm. In many cases, the summand functions have a simple form that enables inexpensive evaluations of the sum-function and the sum gradient. For example, in statistics, one-parameter exponential families allow economical function-evaluations and gradient-evaluations. However, in other cases, evaluating the sum-gradient may require expensive evaluations of the gradients from all summand functions. When the training set is enormous and no simple formulas exist, evaluating the sums of gradients becomes very expensive, because evaluating the gradient requires evaluating all the summand functions' gradients. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of summand functions at every step. This is very effective in the case of large-scale machine learning problems. == Iterative method == In stochastic (or "on-line") gradient descent, the true gradient of Q ( w ) {\displaystyle Q(w)} is approximated by a gradient at a single sample: w := w − η ∇ Q i ( w ) . {\displaystyle w:=w-\eta \,\nabla Q_{i}(w).} As the algorithm sweeps through the training set, it performs the above update for each training sample. Several passes can be made over the training set until the algorithm converges. If this is done, the data can be shuffled for each pass to prevent cycles. Typical implementations may use an adaptive learning rate so that the algorithm converges. In pseudocode, stochastic gradient descent can be presented as : A compromise between computing the true gradient and the gradient at a single sample is to compute the gradient against more than one training sample (called a "mini-batch") at each step. This can perform significantly better than "true" stochastic gradient descent described, because the code can make use of vectorization libraries rather than computing each step separately as was first shown in where it was called "the bunch-mode back-propagation algorithm". It may also result in smoother convergence, as the gradient computed at each step is averaged over more training samples. The convergence of stochastic gradient descent has been analyzed using the theories of convex minimization and of stochastic approximation. Briefly, when the learning rates η {\displaystyle \eta } decrease with an appropriate rate, and subject to relatively mild assumptions, stochastic gradient descent converges almost surely to a global minimum when the objective function is convex or pseudoconvex, and otherwise converges almost surely to a local minimum. This is in fact a consequence of the Robbins–Siegmund theorem. == Linear regression == Suppose we want to fit a straight line y ^ = w 1 + w 2 x {\displaystyle {\hat {y}}=w_{1}+w_{2}x} to a training set with observations ( ( x 1 , y 1 ) , ( x 2 , y 2 ) … , ( x n , y n ) ) {\displaystyle ((x_{1},y_{1}),(x_{2},y_{2})\ldots ,(x_{n},y_{n}))} and corresponding estimated responses ( y ^ 1 , y ^ 2 , … , y ^ n ) {\displaystyle ({\hat {y}}_{1},{\hat {y}}_{2},\ldots ,{\hat {y}}_{n})} using least squares. The objective function to be minimized is Q ( w ) = ∑ i = 1 n Q i ( w ) = ∑ i = 1 n ( y ^ i − y i ) 2 = ∑ i = 1 n ( w 1 + w 2 x i − y i ) 2 . {\displaystyle Q(w)=\sum _{i=1}^{n}Q_{i}(w)=\sum _{i=1}^{n}\left({\hat {y}}_{i}-y_{i}\right)^{2}=\sum _{i=1}^{n}\left(w_{1}+w_{2}x_{i}-y_{i}\right)^{2}.} The last line in the above pseudocode for this specific problem will become: [ w 1 w 2 ] ← [ w 1 w 2 ] − η [ ∂ ∂ w 1 ( w 1 + w 2 x i − y i ) 2 ∂ ∂ w 2 ( w 1 + w 2 x i − y i ) 2 ] = [ w 1 w 2 ] − η [ 2 ( w 1 + w 2 x i − y i ) 2 x i ( w 1 + w 2 x i − y i ) ] . {\displaystyle {\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}\leftarrow {\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}{\frac {\partial }{\partial w_{1}}}(w_{1}+w_{2}x_{i}-y_{i})^{2}\\{\frac {\partial }{\partial w_{2}}}(w_{1}+w_{2}x_{i}-y_{i})^{2}\end{bmatrix}}={\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}2(w_{1}+w_{2}x_{i}-y_{i})\\2x_{i}(w_{1}+w_{2}x_{i}-y_{i})\end{bmatrix}}.} Note that in each iteration or update step, the gradient is only evaluated at a single x i {\displaystyle x_{i}} . This is the key difference between stochastic gradient descent and batched gradient descent. In general, given a linear regression y ^ = ∑ k ∈ 1 : m w k x k {\displaystyle {\hat {y}}=\sum _{k\in 1:m}w_{k}x_{k}} problem, stochastic gradient descent behaves differently when m < n {\displaystyle m

  • Tucker decomposition

    Tucker decomposition

    In mathematics, Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor. It is named after Ledyard R. Tucker although it goes back to Hitchcock in 1927. Initially described as a three-mode extension of factor analysis and principal component analysis it may actually be generalized to higher mode analysis, which is also called higher-order singular value decomposition (HOSVD) or the M-mode SVD. The algorithm to which the literature typically refers when discussing the Tucker decomposition or the HOSVD is the M-mode SVD algorithm introduced by Vasilescu and Terzopoulos, but misattributed to Tucker or De Lathauwer etal. It may be regarded as a more flexible PARAFAC (parallel factor analysis) model. In PARAFAC the core tensor is restricted to be "diagonal". In practice, Tucker decomposition is used as a modelling tool. For instance, it is used to model three-way (or higher way) data by means of relatively small numbers of components for each of the three or more modes, and the components are linked to each other by a three- (or higher-) way core array. The model parameters are estimated in such a way that, given fixed numbers of components, the modelled data optimally resemble the actual data in the least squares sense. The model gives a summary of the information in the data, in the same way as principal components analysis does for two-way data. For a 3rd-order tensor T ∈ F n 1 × n 2 × n 3 {\displaystyle T\in F^{n_{1}\times n_{2}\times n_{3}}} , where F {\displaystyle F} is either R {\displaystyle \mathbb {R} } or C {\displaystyle \mathbb {C} } , Tucker Decomposition can be denoted as follows, T = T × 1 U ( 1 ) × 2 U ( 2 ) × 3 U ( 3 ) {\displaystyle T={\mathcal {T}}\times _{1}U^{(1)}\times _{2}U^{(2)}\times _{3}U^{(3)}} where T ∈ F d 1 × d 2 × d 3 {\displaystyle {\mathcal {T}}\in F^{d_{1}\times d_{2}\times d_{3}}} is the core tensor, a 3rd-order tensor that contains the 1-mode, 2-mode and 3-mode singular values of T {\displaystyle T} , which are defined as the Frobenius norm of the 1-mode, 2-mode and 3-mode slices of tensor T {\displaystyle {\mathcal {T}}} respectively. U ( 1 ) , U ( 2 ) , U ( 3 ) {\displaystyle U^{(1)},U^{(2)},U^{(3)}} are unitary matrices in F d 1 × n 1 , F d 2 × n 2 , F d 3 × n 3 {\displaystyle F^{d_{1}\times n_{1}},F^{d_{2}\times n_{2}},F^{d_{3}\times n_{3}}} respectively. The k-mode product (k = 1, 2, 3) of T {\displaystyle {\mathcal {T}}} by U ( k ) {\displaystyle U^{(k)}} is denoted as T × U ( k ) {\displaystyle {\mathcal {T}}\times U^{(k)}} with entries as ( T × 1 U ( 1 ) ) ( i 1 , j 2 , j 3 ) = ∑ j 1 = 1 d 1 T ( j 1 , j 2 , j 3 ) U ( 1 ) ( j 1 , i 1 ) ( T × 2 U ( 2 ) ) ( j 1 , i 2 , j 3 ) = ∑ j 2 = 1 d 2 T ( j 1 , j 2 , j 3 ) U ( 2 ) ( j 2 , i 2 ) ( T × 3 U ( 3 ) ) ( j 1 , j 2 , i 3 ) = ∑ j 3 = 1 d 3 T ( j 1 , j 2 , j 3 ) U ( 3 ) ( j 3 , i 3 ) {\displaystyle {\begin{aligned}({\mathcal {T}}\times _{1}U^{(1)})(i_{1},j_{2},j_{3})&=\sum _{j_{1}=1}^{d_{1}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(1)}(j_{1},i_{1})\\({\mathcal {T}}\times _{2}U^{(2)})(j_{1},i_{2},j_{3})&=\sum _{j_{2}=1}^{d_{2}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(2)}(j_{2},i_{2})\\({\mathcal {T}}\times _{3}U^{(3)})(j_{1},j_{2},i_{3})&=\sum _{j_{3}=1}^{d_{3}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(3)}(j_{3},i_{3})\end{aligned}}} Altogether, the decomposition may also be written more directly as T ( i 1 , i 2 , i 3 ) = ∑ j 1 = 1 d 1 ∑ j 2 = 1 d 2 ∑ j 3 = 1 d 3 T ( j 1 , j 2 , j 3 ) U ( 1 ) ( j 1 , i 1 ) U ( 2 ) ( j 2 , i 2 ) U ( 3 ) ( j 3 , i 3 ) {\displaystyle T(i_{1},i_{2},i_{3})=\sum _{j_{1}=1}^{d_{1}}\sum _{j_{2}=1}^{d_{2}}\sum _{j_{3}=1}^{d_{3}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(1)}(j_{1},i_{1})U^{(2)}(j_{2},i_{2})U^{(3)}(j_{3},i_{3})} Taking d i = n i {\displaystyle d_{i}=n_{i}} for all i {\displaystyle i} is always sufficient to represent T {\displaystyle T} exactly, but often T {\displaystyle T} can be compressed or efficiently approximately by choosing d i < n i {\displaystyle d_{i} Read more →