VOCEDplus

VOCEDplus is a free international research database about tertiary education, maintained and developed by staff at the National Centre for Vocational Education Research (NCVER) in Adelaide, South Australia. The focus of the database content is the relation of post-compulsory education and training to workforce needs, skills development, and social inclusion. == Structure == The content of the VOCEDplus database encompasses vocational education and training (VET), higher education, lifelong learning, informal learning, VET in schools, adult and community education, apprenticeships/traineeships, international education, providers of education and training, and workforce development. It is international in scope and contains over 84,000 English language records, many with links to full text documents. VOCEDplus contains extensive Australian materials and includes a wide range of international information, covering outcomes of tertiary education in the shape of published research, practice, policy, and statistics. Entries are included for the following types of publications: reports; annual reports; papers; discussion papers; occasional papers; working papers; books; book chapters; conference papers; conference proceedings; journals; journal articles; policy documents; published statistics; theses; podcasts; and teaching and training materials. Each database entry contains standard bibliographic information and an abstract. Many entries include full text access via the publisher's website or a digitised copy. == History == === 1989-1997 === In the early years VOCEDplus was known as VOCED. The original database was produced by a network of clearinghouses across Australia with the aim of sharing activities in the technical and further education (TAFE) sector. VOCED was produced in hardcopy and an electronic version was distributed on diskette. 
=== 1997-2001 === 1997 - the first web version of VOCED was made available from the National Centre for Vocational Education Research (NCVER) organisational website 1998 - a major project to upgrade the database and expand its international coverage commenced 2001 - creation of VOCED's own website 2001 - VOCED endorsed as the UNESCO international database for technical and vocational education and training (TVET) research information === 2001-2009 === Many changes to the database and website occurred during this period with a focus on continuous improvement to meet the needs of users and utilise emerging technologies. 2006 - materials produced for two adult literacy and learning programs funded by the Australian Department of Education, Employment and Workplace Relations (DEEWR) - the Workplace English Language and Learning (WELL) Programme and the Adult Literacy National Project (ALNP) included in VOCED 2007 - the Australian clearinghouse network transferred most of the hardcopy collections to NCVER, to form a centralised repository of resources 2009 - materials produced by Reframing the Future (RTF) a vocational education and training workforce development initiative of the Australian, State and Territory Governments included in VOCED === 2009-2014 === A major rebuild of the database and website was undertaken during this period to take advantage of the potential of new technologies to provide improved services and incorporate Web 2.0 technologies (RSS feeds, and share and bookmark tools). 
2009 - scope expanded to more fully encompass the higher education sector 2011 - launch of VOCEDplus with the name change representing the enhanced features and extended focus 2012 - a major retrospective digitisation project commenced and by the end of the 2012-2013 financial year a total of 9,328 publications (593,534 pages/microfiche frames) had been digitised, ensuring these publications are available electronically for free === 2014-2019 === A number of significant curated content products were released during this period. 2015 - release of a refreshed look to adopt the new NCVER branding plus a number of search enhancements (Guided search, Expert search, and Glossary search) were added 2015 - first in the series of 'Focus on...' pages released 2016 - launch of the 'Pod Network', a convenient and efficient platform that allows instant access to research and a multitude of resources on a range of subjects 2017 - completion of the 'Pod Network', consisting of 20 Pods (on broad subjects including Apprenticeships and traineeships, Foundation skills, Teaching and learning, Career development, and Students) and 74 Podlets (on narrow topics including Online learning, Social media, VET in schools, STEM skills, and Adult literacy) 2018 - launch of the 'Timeline of Australian VET Policy Initiatives' and the 'VET Knowledge Bank' which contains a suite of products capturing Australia's diverse, complex and ever-changing VET system 2019 - after an internal review, a refreshed, streamlined version of the 'Pod Network' was released, consisting of 13 Pods and 20 Podlets 2019 - launch of the 'VET Practitioner Resource' which contains a range of information to support VET practitioners in their work and is organised into three sections: (1) Teaching, training and assessment: standards, guidance, research and good practice resources to inform daily work; (2) Practitioners as researchers: information for undertaking practitioner-led research; and (3) The VET workforce: 
information about VET teachers and trainers, and the professional development needs of the VET workforce 2019 - VOCEDplus celebrated 30 years of providing information to the tertiary education sector and the homepage was refreshed to make it more modern and easier to use === 2020- === VOCEDplus continued to be accessible throughout the COVID-19 pandemic. 2020-2021 - the VET Knowledge Bank added a dedicated page, 'COVID-19 announcements', that showcases the measures introduced by the Australian, state and territory governments to mitigate the impact of the pandemic and promote economic recovery 2020-2024 - published research about the effects of the pandemic on education and training, providers, students, labour markets, employment and employees was collected and made permanently available in the database 2024 - VOCEDplus celebrated 35 years of providing information to the tertiary education sector. The homepage was refreshed and a number of enhancements and new features were implemented including a new My Profile feature, improvements to My Selection, accessible search history and saved searches, enhanced search functionality, and improved navigation.

International Road Traffic and Accident Database

The International Road Traffic and Accident Database (IRTAD) is an initiative dedicated to compiling and analyzing global road crash data. It is managed by the International Transport Forum (ITF) under the auspices of its permanent working group on road safety, commonly referred to as the IRTAD Group. The primary objective of IRTAD is to provide a robust empirical basis for international comparisons in the field of road safety and to offer data to support the formulation of effective road safety policies. == Data availability == A portion of the data gathered by IRTAD is accessible for free through the OECD statistics website; the remaining data requires a subscription. == History == The IRTAD database was originally started in 1988 by Germany's Federal Institution for Roads (BASt) in response to demands for international comparative data. It was later taken over and expanded by the International Transport Forum and has grown to be an important resource for comparing road safety metrics between countries worldwide, although mostly in the developed world. Every year, the ITF publishes comparative and country-by-country road safety data gathered for the IRTAD database and analysed by the IRTAD Group in the ITF Road Safety Annual Report, informally known as the "IRTAD Report". Over the years, the IRTAD acronym has come to stand not only for the database, but also for the Traffic Safety Data and Analysis Group (usually referred to as the IRTAD Group). The IRTAD Group is the International Transport Forum's permanent working group on road safety. It consists of a group of international road safety experts drawn from national road administrations, road safety research institutes, international organizations, automobile associations, insurance companies, car manufacturers and other road safety stakeholders. The IRTAD Group is a major forum for international road safety collaboration and exchange of best practices. 
Its focus is on improving road safety data as a basis for targeting interventions that are effective in reducing the number of road deaths and serious traffic injuries. The work of IRTAD, among that of others, has spawned the creation of road safety observatories for different world regions: the Ibero-American Road Safety Observatory (OISEVI), the African Road Safety Observatory, and the South-East Asian Road Safety Observatory. The ITF supports OISEVI through the Spanish-language IRTAD-LAC database and is actively involved in the implementation of the African and South-East Asian observatories. The genesis of the road safety observatory movement dates back to 2008, when the ITF, via IRTAD, began to facilitate twinning between countries striving to improve their road safety record and countries with high road safety performance. The initial twinning was between Jamaica and the United Kingdom. This work was supported by the World Bank, the Inter-American Development Bank (IADB) and the FIA Foundation. The twinning between Argentina and Spain in 2011 led to the creation of OISEVI. In 2016, the ITF set up Safer City Streets, a global traffic safety network for cities that replicates the successful IRTAD approach for urban road safety.

AlphaTensor

AlphaTensor is an artificial intelligence system developed by DeepMind for discovering efficient matrix multiplication algorithms using reinforcement learning. Introduced in 2022, the system was based on AlphaZero and formulated the search for matrix multiplication algorithms as a single-player game called TensorGame. AlphaTensor was designed to search for new ways to multiply matrices with fewer scalar multiplication operations. Matrix multiplication is a fundamental operation in linear algebra, numerical analysis, scientific computing, computer graphics, and machine learning. The system discovered thousands of matrix multiplication algorithms, including algorithms that rediscovered known human-designed methods and others that improved on previously known results for particular matrix sizes and mathematical settings. == Background == Matrix multiplication is one of the basic operations in numerical computing. The standard algorithm for multiplying two square matrices has cubic time complexity, while faster algorithms such as the Strassen algorithm reduce the number of multiplication operations by using more complex algebraic decompositions. Finding optimal matrix multiplication algorithms can be difficult because it involves searching through a large space of possible tensor decompositions. AlphaTensor approached this problem by representing algorithm discovery as TensorGame, in which each move corresponds to an operation that reduces a tensor representing matrix multiplication. The goal of the game is to find a low-rank decomposition of the matrix multiplication tensor, corresponding to an efficient multiplication algorithm. == Development == AlphaTensor was developed by DeepMind and described in a paper published in Nature in October 2022. The system built on the reinforcement-learning approach used in AlphaZero, which had previously been applied to games such as Go, chess, and shogi. 
Unlike those games, TensorGame involved a very large search space, requiring changes to the AlphaZero-style search method and neural network architecture. DeepMind released source code and discovered algorithms associated with the publication through a public GitHub repository. == Results == AlphaTensor discovered matrix multiplication algorithms over both standard arithmetic and finite fields. One widely reported result was a method for multiplying 4 × 4 matrices over the field with two elements using 47 multiplication operations, improving on the 49 operations required by applying Strassen's algorithm recursively in that setting. The system also found algorithms optimized for particular computer hardware, including algorithms designed for graphics processing units and Tensor Processing Units. DeepMind stated that some of the hardware-specific algorithms improved practical execution time compared with commonly used algorithms on the tested hardware. == Significance == AlphaTensor was described as an example of using machine learning not only to apply existing algorithms, but to assist in discovering new ones. The work was connected to broader research in algorithm discovery, automated machine learning, program synthesis, and computational complexity theory, especially the open problem of determining the optimal complexity of matrix multiplication. AlphaTensor later became part of a broader group of Google DeepMind systems for algorithm and mathematical discovery, alongside systems such as AlphaDev and AlphaEvolve.
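
The link between low-rank tensor decompositions and multiplication algorithms described in the Background section can be illustrated with Strassen's algorithm for 2 × 2 matrices. The following Python sketch is not DeepMind's code: the row-major index convention and the names `matmul_tensor`, `tensor_from_factors`, `multiply_with_factors`, `U`, `V` and `W` are assumptions made for this example. It checks that seven rank-one terms sum to the 2 × 2 matrix multiplication tensor, and that the same factors multiply two matrices using seven scalar products instead of eight.

```python
from itertools import product


def matmul_tensor(n):
    """T[a][b][c] = 1 when (A entry a) * (B entry b) contributes to C entry c.

    Entries are indexed row-major, an illustrative convention for this sketch.
    """
    size = n * n
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i, j, k in product(range(n), repeat=3):
        T[i * n + k][k * n + j][i * n + j] = 1
    return T


# Strassen's seven products, written as coefficient vectors over the
# entries (11, 12, 21, 22) of A (rows of U), B (rows of V) and C (rows of W).
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
W = [[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
     [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]


def tensor_from_factors(U, V, W):
    """Sum the rank-one tensors u (x) v (x) w, one per row of the factors."""
    size = len(U[0])
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for u, v, w in zip(U, V, W):
        for a, b, c in product(range(size), repeat=3):
            T[a][b][c] += u[a] * v[b] * w[c]
    return T


def multiply_with_factors(A, B, U, V, W):
    """Multiply square matrices using one scalar product per rank-one term."""
    n = len(A)
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    products = [
        sum(ui * ai for ui, ai in zip(u, a)) * sum(vi * bi for vi, bi in zip(v, b))
        for u, v in zip(U, V)
    ]
    c = [sum(w[idx] * p for w, p in zip(W, products)) for idx in range(n * n)]
    return [c[r * n:(r + 1) * n] for r in range(n)]


# The seven terms reproduce the tensor exactly, so the decomposition has rank at most 7.
decomposition_is_exact = tensor_from_factors(U, V, W) == matmul_tensor(2)
result = multiply_with_factors([[1, 2], [3, 4]], [[5, 6], [7, 8]], U, V, W)
```

In this picture, a search such as the one described above looks for factor lists with fewer rows than previously known decompositions for a given matrix size; the sketch only verifies a known decomposition and performs no search.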

Document

A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content. The etymology of the word "document" derives from the Latin documentum, which denotes a "teaching" or "lesson": the verb doceō denotes "to teach". Historically, the term "document" was usually used to indicate written proof useful as evidence of a truth or fact. In the Computer Age, the term "document" typically refers to a primarily textual computer file, encompassing its structural and format elements, such as fonts, colors, and images. In the contemporary era, the definition of "document" has expanded beyond its traditional medium, such as paper, to encompass electronic documents as well. History, events, examples, opinions, stories, and creativity can all be expressed in documents. "Documentation" is distinct because it has more denotations than "document". Documents are also distinguished from "realia", which are three-dimensional objects that would otherwise satisfy the definition of "document" because they memorialize or represent thought. Documents are usually considered to be two-dimensional representations. == Abstract definitions == The concept of "document" has been defined by Suzanne Briet as "any concrete or symbolic indication, preserved or recorded, for reconstructing or for proving a phenomenon, whether physical or mental." An often-cited article concludes that "the evolving notion of document" among Jonathan Priest, Paul Otlet, Briet, Walter Schürmeyer, and the other documentalists increasingly emphasized whatever functioned as a document rather than traditional physical forms of documents. The shift to digital technology would seem to make this distinction even more important. David M. Levy has said that an emphasis on the technology of digital documents has impeded our understanding of digital documents as documents. 
A conventional document, such as a mail message or a technical report, exists physically in digital technology as a string of bits, as does everything else in a digital environment. As an object of study, it has been made into a document. It has become physical evidence by those who study it. "Document" is defined in library and information science and documentation science as a fundamental, abstract idea: the word denotes everything that may be represented or memorialized to serve as evidence. The classic example provided by Briet is an antelope: "An antelope running wild on the plains of Africa should not be considered a document[;] she rules. But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Indeed, scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document." This opinion has been interpreted as an early expression of actor–network theory. == Kinds == A document can be structured, like tabular documents, lists, forms, or scientific charts, semi-structured like a book or a newspaper article, or unstructured like a handwritten note. Documents are sometimes classified as secret, private, or public. They may also be described as drafts or proofs. When a document is copied, the source is denominated the "original". 
Documents are used in numerous fields, e.g.: Academia: manuscript, thesis, paper, journal, chart, and technical drawing Media: mock-up, script, image, photography, and newspaper article Administration, law, and politics: application, brief, certificate, commission, constitutional document, form, gazette, identity document, license, manifesto, summons, census, and white paper Business: invoice, request for proposal, proposal, contract, packing slip, manifest, report (detailed and summary), spreadsheet, material safety data sheet, waybill, bill of lading, financial statement, nondisclosure agreement (NDA), mutual nondisclosure agreement, and user guide Geography and planning: topographic map, cadastre, legend, and architectural plan Such standard documents can be drafted based on a template. == Drafting == The page layout of a document is how information is graphically arranged in the space of the document, e.g., on a page. If the appearance of the document is of concern, the page layout is generally the responsibility of a graphic designer. Typography concerns the design of letter and symbol forms and their physical arrangement in the document (see typesetting). Information design concerns the effective communication of information, especially in industrial documents and public signs. Simple textual documents may not require visual design and may be drafted only by an author, clerk, or transcriber. Forms may require a visual design for their initial fields, but not to complete the forms. == Media == Traditionally, the medium of a document was paper and the information was applied to it in ink, either by handwriting (to make a manuscript) or by a mechanical process (e.g., a printing press or laser printer). Today, some short documents also may consist of sheets of paper stapled together. 
Historically, documents were inscribed with ink on papyrus (starting in ancient Egypt) or parchment; scratched as runes or carved on stone using a sharp tool, e.g., the Tablets of Stone described in the Bible; stamped or incised in clay and then baked to make clay tablets, e.g., in the Sumerian and other Mesopotamian civilizations. The papyrus or parchment was often rolled into a scroll or cut into sheets and bound into a codex (book). Contemporary electronic means of memorializing and displaying documents include: Monitor of a desktop computer, laptop, tablet; optionally with a printer to produce a hard copy; Personal digital assistant; Dedicated e-book device; Electronic paper, typically, using the Portable Document Format (PDF); Information appliance; Digital audio player; and Radio and television service provider. Digital documents usually require a specific file format to be presentable in a specific medium. == In law == Documents in all forms frequently serve as material evidence in criminal and civil proceedings. The forensic analysis of such a document is within the scope of questioned document examination. To catalog and manage the large number of documents that may be produced during litigation, Bates numbering is often applied to all documents in the lawsuit so that each document has a unique, arbitrary, identification number.

Chinese speech synthesis

Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties because Chinese characters frequently have different pronunciations in different contexts, because the complex prosody is essential to conveying the meaning of words, and because native speakers sometimes disagree about the correct pronunciation of certain phonemes. == Concatenation (Ekho and KeyTip) == Recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; these synthesizers are also inflexible in terms of speed and expression. However, because these synthesizers do not rely on a corpus, there is no noticeable degradation in performance when they are given more unusual or awkward phrases. Ekho is an open source TTS which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and experimentally Korean. Some of the Mandarin syllables have been pitch-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials". cjkware.com used to ship a product called KeyTip Putonghua Reader which worked similarly; it contained 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). == Lightweight synthesizers (eSpeak and Yuet) == The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has experimented with Mandarin and Cantonese. eSpeak was used by Google Translate from May 2010 until December 2010. 
The commercial product "Yuet" is also lightweight (it is intended to be suitable for resource-constrained environments like embedded systems); it was written from scratch in ANSI C starting from 2013. Yuet claims a built-in NLP model that does not require a separate dictionary; the engine is claimed to produce speech with clear word boundaries and emphasis on appropriate words. Communication with its author is required to obtain a copy. Both eSpeak and Yuet can synthesise speech for Cantonese and Mandarin from the same input text, and can output the corresponding romanisation (for Cantonese, Yuet uses Yale and eSpeak uses Jyutping; both use Pinyin for Mandarin). eSpeak does not concern itself with word boundaries when these do not affect which syllable should be spoken. == Corpus-based == A "corpus-based" approach can sound very natural in most cases but can err in dealing with unusual phrases if they cannot be matched with the corpus. The synthesiser engine is typically very large (hundreds or even thousands of megabytes) due to the size of the corpus. === iFlyTek === Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML) which can include additional markup to clarify the pronunciation of characters and to add some prosody information. The amount of data involved is not disclosed by iFlyTek but can be seen from the commercial products that iFlyTek have licensed their technology to; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is used for the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average". 
The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work. === NeoSpeech === There is an online interactive demonstration for NeoSpeech speech synthesis, which accepts Chinese characters and also pinyin if it's enclosed in their proprietary "VTML" markup. === Mac OS === Mac OS had Chinese speech synthesizers available up to version 9. This was removed in 10.0 and reinstated in 10.7 (Lion). === Historical corpus-based synthesizers (no longer available) === A corpus-based approach was taken by Tsinghua University in SinoSonic, with the Harbin dialect voice data taking 800 Megabytes. This was planned to be offered as a download but the link was never activated. Nowadays, only references to it can be found on Internet Archive. Bell Labs' approach, which was demonstrated online in 1997 but subsequently removed, was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0-7923-8027-6), and the former employee who was responsible for the project, Chilin Shih (who subsequently worked at the University of Illinois) put some notes about her methods on her website.
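
The simple concatenation approach described in the Concatenation section can be sketched in a few lines of Python. This is an illustration only and does not reproduce Ekho or KeyTip: the syllable bank `BANK`, the helper `fake_recording`, the function `synthesise` and the sample rate are all hypothetical, and short sine tones stand in for real syllable recordings.

```python
import math

SAMPLE_RATE = 8000  # Hz; an arbitrary value chosen for this sketch


def fake_recording(freq_hz, n_samples=400):
    """Stand-in for a recorded syllable: a short sine tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n_samples)]


# Hypothetical syllable bank keyed by pinyin syllable plus tone number
# (5 is used here for the neutral tone).
BANK = {
    "ni3": fake_recording(220),
    "hao3": fake_recording(330),
    "ma5": fake_recording(440),
}


def synthesise(syllables, bank=BANK):
    """Join stored recordings end to end, with no smoothing at the joins."""
    samples = []
    for syllable in syllables:
        if syllable not in bank:
            raise KeyError(f"no recording for syllable {syllable!r}")
        samples.extend(bank[syllable])
    return samples


audio = synthesise(["ni3", "hao3", "ma5"])
```

Because each syllable is looked up independently, any sequence of known syllables can be produced, including unusual phrases; the cost is that nothing adapts the pitch or duration of one recording to its neighbours, which is why the joins in such systems sound forced.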

Negobot

Negobot, also referred to as Lolita or Lolita chatbot, is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and automatic learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent that is fluent in typical teenage slang and misspellings and knowledgeable about pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. 
Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it to treat a conversation with a potential predator as a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be addressed by a user. Once a user starts a conversation, the bot frames the conversation in a way that keeps the target engaged, extracting personal information and discouraging the target from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make the target feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging the interaction. In addition, the bot may provide fake information about itself in an attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to the BBC over the legality of this undercover investigation. 
He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

Leiden algorithm

The Leiden algorithm is a community detection algorithm developed by Traag et al. at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity. == Improvement over Louvain method == Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (though the method by which nodes are considered in Leiden is more efficient) and a graph aggregation step. However, to address the issues with poorly connected communities and the merging of smaller communities into larger communities (the resolution limit of modularity), the Leiden algorithm employs an intermediate refinement phase in which communities may be split to guarantee that all communities are well connected. Consider, for example, the following graph: Three communities are present in this graph (each color represents a community). Additionally, the center "bridge" node (represented with an extra circle) is a member of the community represented by blue nodes. Now consider the result of a node-moving step which merges the communities denoted by red and green nodes into a single community (as the two communities are highly connected): Notably, the center "bridge" node is now a member of the larger red community after node moving occurs (due to the greedy nature of the local node moving algorithm). In the Louvain method, such a merging would be followed immediately by the graph aggregation phase. However, this causes a disconnection between two different sections of the community represented by blue nodes. 
In the Leiden algorithm, the graph is instead refined: The Leiden algorithm's refinement step ensures that the center "bridge" node is kept in the blue community to ensure that it remains intact and connected, despite the potential improvement in modularity from adding the center "bridge" node to the red community. == Graph components == Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph. === Vertices and edges === A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let V {\displaystyle V} be the set of vertices, and E {\displaystyle E} be the set of edges: V := { v 1 , v 2 , … , v n } E := { e i j , e i k , … , e k l } {\displaystyle {\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}} where e i j {\displaystyle e_{ij}} is the directed edge from vertex v i {\displaystyle v_{i}} to vertex v j {\displaystyle v_{j}} . We can also write this as an ordered pair: e i j := ( v i , v j ) {\displaystyle {\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}} === Community === A community is a unique set of nodes: C i ⊆ V C i ⋂ C j = ∅ ∀ i ≠ j {\displaystyle {\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}} and the union of all communities must be the total set of vertices: V = ⋃ i = 1 C i {\displaystyle {\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}} === Partition === A partition is the set of all communities: P = { C 1 , C 2 , … , C n } {\displaystyle {\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}} == Partition quality == How communities are partitioned is an integral part on the Leiden algorithm. How partitions are decided can depend on how their quality is measured. 
Additionally, many of these quality metrics have parameters of their own that can change the communities they produce.

=== Modularity ===

Modularity is a widely used quality metric for assessing how well a set of communities partitions a graph. For an adjacency matrix, A, it is defined as:

{\displaystyle Q={\frac {1}{2m}}\sum _{ij}\left(A_{ij}-{\frac {k_{i}k_{j}}{2m}}\right)\delta (c_{i},c_{j})}

where: {\displaystyle A_{ij}} represents the edge weight between nodes {\displaystyle i} and {\displaystyle j} (see Adjacency matrix); {\displaystyle k_{i}} and {\displaystyle k_{j}} are the sums of the weights of the edges attached to nodes {\displaystyle i} and {\displaystyle j}, respectively; {\displaystyle m} is the sum of all of the edge weights in the graph; {\displaystyle c_{i}} and {\displaystyle c_{j}} are the communities to which the nodes {\displaystyle i} and {\displaystyle j} belong; and {\displaystyle \delta } is the Kronecker delta function:

{\displaystyle \delta (c_{i},c_{j})={\begin{cases}1&{\text{if }}c_{i}{\text{ and }}c_{j}{\text{ are the same community}}\\0&{\text{otherwise}}\end{cases}}}

=== Reichardt Bornholdt Potts Model (RB) ===

One of the most widely used quality metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB). This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition. This model introduces a resolution parameter {\displaystyle \gamma } and is otherwise very similar to modularity.
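Because the RB model builds directly on modularity, it can help to evaluate the modularity formula on a small case first. The following is a minimal Python sketch, assuming an unweighted, undirected graph stored as a dense adjacency matrix. The function name `modularity` and the six-node example graph (two triangles joined by one edge) are illustrative choices, not part of any library.

```python
def modularity(adjacency, membership):
    """Evaluate Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * delta(c_i, c_j).

    `adjacency` is a symmetric matrix (list of lists) and `membership[i]`
    is the community label of node i.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]  # k_i
    two_m = sum(degrees)                       # 2m: each edge is counted twice
    total = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:  # Kronecker delta
                total += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3.
edge_list = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edge_list:
    adjacency[u][v] = adjacency[v][u] = 1

two_triangles = [0, 0, 0, 1, 1, 1]
one_community = [0, 0, 0, 0, 0, 0]

print(modularity(adjacency, two_triangles))
print(modularity(adjacency, one_community))
```

For this graph the two-triangle partition gives Q = 5/14 (about 0.357), while placing every node in a single community gives Q = 0, so the partition that follows the visible structure scores higher.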
The RB model is defined by the following quality function for an adjacency matrix, A:

{\displaystyle Q=\sum _{ij}\left(A_{ij}-\gamma {\frac {k_{i}k_{j}}{2m}}\right)\delta (c_{i},c_{j})}

where {\displaystyle \gamma } is a linear resolution parameter.

=== Constant Potts Model (CPM) ===

Another metric similar to RB is the Constant Potts Model (CPM). This metric also relies on a resolution parameter {\displaystyle \gamma }. The quality function is defined as:

{\displaystyle H=-\sum _{ij}(A_{ij}w_{ij}-\gamma )\delta (c_{i},c_{j})}

Here {\displaystyle w_{ij}} denotes the weight of the edge between nodes {\displaystyle i} and {\displaystyle j}. Because of the leading minus sign, this form is minimized rather than maximized.

=== Understanding Potts Model resolution parameters/Resolution limit ===

Potts models such as RB and CPM typically include a resolution parameter in their calculation. Potts models were introduced as a response to the resolution limit problem present in community detection based on modularity maximization. The resolution limit problem is that, for some graphs, maximizing modularity may cause substructures of a graph to merge into a single community, so that smaller structures are lost. The resolution parameter allows modularity-like methods to be tuned so that the Leiden algorithm detects small substructures at the granularity the user requires.

The figure on the right illustrates why resolution can be a helpful parameter when using modularity-based quality metrics. In the first graph, modularity only captures the large-scale structures of the graph; in the second example, however, a more granular quality metric could potentially detect all substructures in the graph.

== Algorithm ==

The Leiden algorithm starts with a graph of disorganized nodes (a) and partitions them to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities).
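The quantity being optimized depends on which quality function is chosen and on its resolution parameter. As a rough illustration of the resolution limit discussed above, the sketch below evaluates the RB and CPM formulas as written in this article on a ring of 30 five-node cliques, a common toy example for this effect. It assumes an unweighted graph (so every edge weight is 1) and sums over all ordered pairs, including i = j; the function names and the example graph are hypothetical and do not come from any Leiden library.

```python
def ring_of_cliques(num_cliques, clique_size):
    """Adjacency matrix of cliques in a ring, neighbors joined by one edge."""
    n = num_cliques * clique_size
    adjacency = [[0] * n for _ in range(n)]
    for c in range(num_cliques):
        base = c * clique_size
        members = range(base, base + clique_size)
        for i in members:
            for j in members:
                if i != j:
                    adjacency[i][j] = 1
        # Link the last node of this clique to the first node of the next one.
        a = base + clique_size - 1
        b = ((c + 1) % num_cliques) * clique_size
        adjacency[a][b] = adjacency[b][a] = 1
    return adjacency


def rb_quality(adjacency, membership, gamma):
    """RB quality: sum_ij (A_ij - gamma*k_i*k_j/2m) * delta. Higher is better."""
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:
                total += adjacency[i][j] - gamma * degrees[i] * degrees[j] / two_m
    return total


def cpm_energy(adjacency, membership, gamma):
    """CPM energy: -sum_ij (A_ij - gamma) * delta, with w_ij = 1. Lower is better."""
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:
                total += adjacency[i][j] - gamma
    return -total


adjacency = ring_of_cliques(30, 5)
n = len(adjacency)
separate = [node // 5 for node in range(n)]   # one community per clique
merged = [node // 10 for node in range(n)]    # neighboring cliques paired up

for gamma in (1.0, 2.0):
    print("RB", gamma, rb_quality(adjacency, separate, gamma),
          rb_quality(adjacency, merged, gamma))
for gamma in (0.01, 0.5):
    print("CPM", gamma, cpm_energy(adjacency, separate, gamma),
          cpm_energy(adjacency, merged, gamma))
```

With γ = 1 the RB score is 586 for the merged partition against 578 for the separate cliques, so the modularity-like setting prefers to merge neighboring cliques. Raising γ to 2 reverses the preference (542 against 556). Under these assumptions CPM shows the same switch between γ = 0.01 and γ = 0.5.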
The node-moving method is similar to that of the Louvain algorithm, except that after moving each node, the Leiden algorithm also revisits that node's neighbors that are not already in the community the node was placed in. This process results in the first partition (b), also referred to as {\displaystyle {\mathcal {P}}}. The algorithm then refines this partition by first placing each node into its own individual community and then moving nodes from one community to another to maximize modularity. It does this iteratively until each node has been visited and moved and each community has been refined; this creates partition (c), which is the initial partition of {\displaystyle {\mathcal {P}}_{\text{refined}}}. Then an aggregate network (d) is created by turning each community into a node. {\displaystyle {\mathcal {P}}_{\text{refined}}} is used as the basis for the aggregate network, while {\displaystyle {\mathcal {P}}} is used to create its initial partition. Because the original partition {\displaystyle {\mathcal {P}}} is used in this step, it must be retained so that it can be used in future iterations. These steps together form the first iteration of the algorithm.

In subsequent iterations, the nodes of the aggregate network (each of which represents a community) are once again placed into their own individual communities and then sorted according to modularity to form a new {\displaystyle {\mathcal {P}}_{\text{refined}}}, forming (e) in the above graphic. In the case depicted by the graph, the nodes were already sorted optimally, so no change too