AI Email Free Generator

AI Email Free Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • The Business Cloud

    The Business Cloud

    The Business Cloud is an API enabled self-service platform, developed by Domo, that provides an array of services like data connection and data visualization. == History == Domo, Inc. was founded in 2010 by Josh James who also co-founded the web analytics software company Omniture in 1996, which he took public in 2006. Domo launched the Domo Appstore, with 1,000 apps with social and mobile capabilities, in 2016. This appstore creates a network of business apps and an ecosystem of companies into a single, integrated business cloud. This decision came after Domo announced a $131 million round of funding from BlackRock. According to the company, the concept behind The Business Cloud is to connect smaller clouds relating to apps or other functional areas of a business into a single business cloud that allows self-service and other social features to customers. == Services == The Business Cloud is offered as a free service, claimed to be the world's first business cloud with Domo appstore as one of its core services. This free package includes all of the Domo's features and functionality including Domo platform, Domo Apps, visualizations, alerts, company directories, org charts, profiles, tasks and Domo Mobile. The Business Cloud allows customers to leverage their preferred cloud as well as on-premises software and monitor all aspects of their business in routine. The company is supported by a $500 million fund from investors all over the world.

    Read more →
  • Lion algorithm

    Lion algorithm

    Lion algorithm (LA) is one among the bio-inspired (or) nature-inspired optimization algorithms (or) that are mainly based on meta-heuristic principles. It was first introduced by B. R. Rajakumar in 2012 in the name, Lion’s Algorithm. It was further extended in 2014 to solve the system identification problem. This version was referred as LA, which has been applied by many researchers for their optimization problems. == Inspiration from lion’s social behaviour == Lions form a social system called a "pride", which consists of 1–3 pair of lions. A pride of lions shares a common area known as territory in which a dominant lion is called as territorial lion. The territorial lion safeguards its territory from outside attackers, especially nomadic lions. This process is called territorial defense. It protects the cubs till they become sexually matured. The maturity period is about 2–4 years. The pride undergoes survival fights to protect its territory and the cubs from nomadic lions. Upon getting defeated by the nomadic lions, the dominating nomadic lion takes the role of territorial lion by killing or driving out the cubs of the pride. The lioness of the pride give birth to cubs though the new territorial lion. When the cubs of the pride mature and considered to be stronger than the territorial lion, they take over the pride. This process is called territorial take-over. If territorial take-over happens, either the old territorial lion, which is considered to be laggard, is driven out or it leaves the pride. The stronger lions and lioness form the new pride and give birth to their own cubs == Terminology == In the LA, the terms that are associated with lion’s social system are mapped to the terminology of optimization problems. Few of such notable terms are related here. Lion: A potential solution to be generated or determined as optimal (or) near-optimal solution of the problem. The lion can be a territorial lion and lioness, cubs and nomadic lions that represent the solution based on the processing steps of the LA. Territorial lion: The strongest solution of the pride that tends to meet the objective function. Nomadic lion: A random solution, sometimes termed as nomad, to facilitate the exploration principle Laggard lion: Poor solutions that are failed in the survival fight. Pride: A pool of potential solutions i.e. a lion, lioness and their cubs, that are potential solutions of the search problem. Fertility evaluation: A process of evaluating whether the territorial lion and lioness are able to provide potential solutions in the future generations i.e. It ensures that the lion or lioness converge at every generation. Survival fight: It is a greedy selection process, which is often carried out between the pride and nomadic lion. == Algorithm == The steps involved in LA are given below: Pride Generation: Generate X m a l e {\displaystyle X^{male}} , X f e m a l e {\displaystyle X^{female}} and X 1 n o m a d {\displaystyle X_{1}^{nomad}} Determine f ( X m a l e ) {\displaystyle f(X^{male})} , f ( X f e m a l e ) {\displaystyle f(X^{female})} , f ( X 1 n o m a d ) {\displaystyle f(X_{1}^{nomad})} Initialize f r e f {\displaystyle f^{ref}} as f ( X m a l e ) {\displaystyle f(X^{male})} and N g {\displaystyle N_{g}} as 0 Memorize X m a l e {\displaystyle X^{male}} and X f e m a l e {\displaystyle X^{female}} Apply Fertility evaluation Process Generation of cubpool by mating Gender clustering: Define X c u b m a l e {\displaystyle X_{cub}^{male}} and X c u b f e m a l e {\displaystyle X_{cub}^{female}} Initialize a g e c u b {\displaystyle age_{cub}} as zero Apply Cub growth function Territorial defense: If X m a l e {\displaystyle X^{male}} (or pride) fails in the survival fight i.e. X 1 n o m a d {\displaystyle X_{1}^{nomad}} defeats the pride, go to step 4, else continue Increase a g e c u b {\displaystyle age_{cub}} by 1 and check whether cub attains maturity i.e., if a g e c u b > a g e m a x {\displaystyle age_{cub}>age_{max}} , go to Step 9, else continue Territorial takeover: If X c u b m a l e {\displaystyle X_{cub}^{male}} and X c u b f e m a l e {\displaystyle X_{cub}^{female}} are found to be closer to optimal solution, update X m a l e {\displaystyle X^{male}} and X f e m a l e {\displaystyle X^{female}} Increment N g {\displaystyle N_{g}} by 1 Repeat from Step 5, if termination criterion is not violated, else return X m a l e {\displaystyle X^{male}} as the near-optimal solution == Variants == The LA has been further taken forward to adopt in different problem areas. According to the characteristics of the problem area, significant amendment has been done in the processes and the models used in the LA. Accordingly, diverse variants have been developed by the researchers. They can be broadly grouped as hybrid LAs and non-hybrid LAs. Hybrid LAs are the LAs that are amended by the principle of other meta-heuristics, whereas the Non-hybrid LAs take any scientific amendment inside its operation that are felt to be essential to attend the respective problem area. == Applications == LA is applied in diverse engineering applications that range from network security, text mining, image processing, electrical systems, data mining and many more. Few of the notable applications are discussed here. Networking applications: In WSN, LA is used to solve the cluster head selection problem by determining optimal cluster head. Route discovery problem in both the VANET and MANET are also addressed by the LA in the literature. It is also used to detect attacks in advanced networking scenarios such as Software-Defined Networks (SDN) Power Systems: LA has attended generation rescheduling problem in a deregulated environment, optimal localization and sizing of FACTS devices for power quality enhancement and load-frequency controlling problem Cloud computing: LA is used in optimal container-resource allocation problem in cloud environment and cloud security

    Read more →
  • Visual Peer Review

    Visual Peer Review

    == Development and history == Visual Peer Review was first described in a 2017 classroom study by Friedman and Rosen, which examined how students evaluate peer-produced data visualizations using structured rubrics. Developed within the broader fields of data visualization, information visualization, and educational technology, the system emphasized clear labeling, visual integrity, and reduction of chartjunk. Students assigned rubric scores and provided written explanations, aligning the activity with established principles of peer review. Follow-up research expanded both the methodological and analytic dimensions of the framework. Friedman and colleagues applied natural language processing (NLP) to peer-review text to analyze part-of-speech patterns, sentence complexity, and comment length. These analyses offered insight into how students expressed critique and engaged with core design principles. Later studies incorporated advanced statistical modeling to evaluate system-level behavior, including peer review networks and reviewer typologies. Between 2021 and 2024, the framework underwent iterative refinement through a series of studies that explored interface design, behavioral nudges, reviewer engagement, and social network dynamics. The system was influenced by earlier work in computer-supported peer review—particularly My Reviewers, a rubric-based writing assessment platform developed by Joe Moxley at the University of South Florida. While Moxley's platform focused on text-based feedback, Visual Peer Review adapted its core structure to support critique of DataVis and visual analytics. To guide structured analysis and feedback, Friedman and Rosen also drew on the “what, why, and how” framework introduced by Liu and Stasko (2010), which emphasizes understanding a visualization's purpose, task alignment, and encoding strategy. == Framework and components == Visual Peer Review is designed to support critique, reflection, and learning in courses focusing on data visualization, visual analytics, and related fields in educational technology. The system consists of interconnected component. Core components include: Visual Artifacts: Students generate original visualizations using software such as R (e.g., ggplot2), Tableau, Python, or Adobe Illustrator. These artifacts may include statistical graphics, dashboards, or design-oriented infographics. Rubric-Based Assessment: Peer reviewers evaluate submitted visualizations using structured rubrics grounded in visualization theory and design heuristics. Rubric dimensions typically include: Use of labeling and axis scales Minimalization of chartjunk and clutter (following Tufte's principles) Optimization of the data–ink ratio Preservation of visual integrity through accurate representation (lie factor) Written Peer Comments: In addition to scoring, reviewers provide narrative feedback explaining their reasoning. These comments aim to improve design literacy, strengthen visual reasoning, and support the learning process common to peer review across educational contexts. Instructor Analytics Dashboard: Instructors access an analytics dashboard that displays peer-review activity across the course. Metrics include comment length, rubric coverage, participation patterns, and potential indicators of disengagement. These features position the framework within the domain of learning analytics, where visualized data helps instructors monitor student progress and identify support needs. == Ongoing development == Current work focuses on enhancing rubric structure, integrating principles from human–computer interaction, DataVis and expanding learning-analytics capabilities. Ongoing studies investigate how interface design, reviewer behavior, and classroom context influence the quality of feedback and overall engagement. Continuing development positions Visual Peer Review at the intersection of data visualization education, peer assessment, and educational technology.

    Read more →
  • Reference data

    Reference data

    Reference data is data used to classify or categorize other data. Typically, they are static or slowly changing over time. Examples of reference data include: Units of measurement Country codes Corporate codes Fixed conversion rates e.g., weight, temperature, and length Calendar structure and constraints Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's reference data. Without consistency across business process or applications, for example, similar things may be described in quite different ways. Reference data gain in value when they are widely re-used and widely referenced. Examples of good practice in reference data management include: Formalize the reference data management Use external reference data as much as possible Govern the reference data specific to your enterprise Manage reference data at enterprise level Version control your reference data

    Read more →
  • Distributed file system for cloud

    Distributed file system for cloud

    A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations (create, delete, modify, read, write) on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system. Users can share computing resources through the Internet thanks to cloud computing which is typically characterized by scalable and elastic resources – such as physical servers, applications and any services that are virtualized and allocated dynamically. Synchronization is required to make sure that all devices are up-to-date. Distributed file systems enable many big, medium, and small enterprises to store and access their remote data as they do local data, facilitating the use of variable resources. == Overview == === History === Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s. Sun Microsystem's Network File System became available in the 1980s. Before that, people who wanted to share files used the sneakernet method, physically transporting files on storage media from place to place. Once computer networks started to proliferate, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. Users initially used FTP to share files. FTP first ran on the PDP-10 at the end of 1973. Even with FTP, files needed to be copied from the source computer onto a server and then from the server onto the destination computer. Users were required to know the physical addresses of all computers involved with the file sharing. === Supporting techniques === Modern data centers must support large, heterogenous environments, consisting of large numbers of computers of varying capacities. Cloud computing coordinates the operation of all such systems, with techniques such as data center networking (DCN), the MapReduce framework, which supports data-intensive computing applications in parallel and distributed systems, and virtualization techniques that provide dynamic resource allocation, allowing multiple operating systems to coexist on the same physical server. === Applications === Cloud computing provides large-scale computing thanks to its ability to provide the needed CPU and storage resources to the user with complete transparency. This makes cloud computing particularly suited to support different types of applications that require large-scale distributed processing. This data-intensive computing needs a high performance file system that can share data between virtual machines (VM). Cloud computing dynamically allocates the needed resources, releasing them once a task is finished, requiring users to pay only for needed services, often via a service-level agreement. Cloud computing and cluster computing paradigms are becoming increasingly important to industrial data processing and scientific applications such as astronomy and physics, which frequently require the availability of large numbers of computers to carry out experiments. == Architectures == Most distributed file systems are built on the client-server architecture, but other, decentralized, solutions exist as well. === Client-server architecture === Network File System (NFS) uses a client-server architecture, which allows sharing of files between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous clients' processes, probably running on different machines and under different operating systems, to access files on a distant server, ignoring the actual location of files. Relying on a single server results in the NFS protocol suffering from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem since each server is working independently. The model of NFS is a remote file service. This model is also called the remote access model, which is in contrast with the upload/download model: Remote access model: Provides transparency, the client has access to a file. He sends requests to the remote file (while the file remains on the server). Upload/download model: The client can access the file only locally. It means that the client has to download the file, make modifications, and upload it again, to be used by others' clients. The file system used by NFS is almost the same as the one used by Unix systems. Files are hierarchically organized into a naming graph in which directories and files are represented by nodes. === Cluster-based architectures === A cluster-based architecture ameliorates some of the issues in client-server architectures, improving the execution of applications in parallel. The technique used here is file-striping: a file is split into multiple chunks, which are "striped" across several storage servers. The goal is to allow access to different parts of a file in parallel. If the application does not benefit from this technique, then it would be more convenient to store different files on different servers. However, when it comes to organizing a distributed file system for large data centers, such as Amazon and Google, that offer services to web clients allowing multiple operations (reading, updating, deleting,...) to a large number of files distributed among a large number of computers, then cluster-based solutions become more beneficial. Note that having a large number of computers may mean more hardware failures. Two of the most widely used distributed file systems (DFS) of this type are the Google File System (GFS) and the Hadoop Distributed File System (HDFS). The file systems of both are implemented by user level processes running on top of a standard operating system (Linux in the case of GFS). ==== Design principles ==== ===== Goals ===== Google File System (GFS) and Hadoop Distributed File System (HDFS) are specifically built for handling batch processing on very large data sets. For that, the following hypotheses must be taken into account: High availability: the cluster can contain thousands of file servers and some of them can be down at any time A server belongs to a rack, a room, a data center, a country, and a continent, in order to precisely identify its geographical location The size of a file can vary from many gigabytes to many terabytes. The file system should be able to support a massive number of files The need to support append operations and allow file contents to be visible even while a file is being written Communication is reliable among working machines: TCP/IP is used with a remote procedure call RPC communication abstraction. TCP allows the client to know almost immediately when there is a problem and a need to make a new connection. ===== Load balancing ===== Load balancing is essential for efficient operation in distributed environments. It means distributing work among different servers, fairly, in order to get more work done in the same amount of time and to serve clients faster. In a system containing N chunkservers in a cloud (N being 1000, 10000, or more), where a certain number of files are stored, each file is split into several parts or chunks of fixed size (for example, 64 megabytes), the load of each chunkserver being proportional to the number of chunks hosted by the server. In a load-balanced cloud, resources can be efficiently used while maximizing the performance of MapReduce-based applications. ===== Load rebalancing ===== In a cloud computing environment, failure is the norm, and chunkservers may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. That leads to load imbalance in a distributed file system, meaning that the file chunks are not distributed equitably between the servers. Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if free space on the first server falls below a certain threshold. However, this centralized approach can become a bottleneck for those master servers, if they become unable to manage a large number of file accesses, as it increases their already heavy loads. The load rebalance problem is NP-hard. In order to get a large number of chunkservers to work in collaboration, and to

    Read more →
  • Maritime Informatics

    Maritime Informatics

    Maritime Informatics is a thematic topic within the broader discipline of informatics. It can be considered as both a field of study and domain of application. As an application domain, it is the outlet of innovations originating from data science and artificial intelligence; as a field of study, it is positioned between computer science and marine engineering. == Beginnings of maritime informatics == As a result of the increasing levels of digitalisation occurring in the maritime sector starting around 2010 and stimulated by the EU-endorsed MonaLisa project for sea traffic management (STM), a number of academics and shipping industry leaders recognised that the maritime transportation sector would benefit from a specific field of study and application to be known as Maritime Informatics - the use of information systems, data sharing and data analytics in the business and operations of maritime transportation. They considered that it would lead to improvements in efficiency, safety, resilience, and ecological sustainability - all of which are currently lacking for many aspects of sea transport. One of the first public airings of the concept of Maritime Informatics was a presentation delivered on 11 September 2014 in Gothenburg, Sweden. A proposal for an inaugural minitrack on Maritime Informatics was accepted for the 2015 Americas Conference on Information Systems in Puerto Rico where three papers were presented. Since then numerous publications has been brought forward captured at www.maritimeinformatics.org and in late 2020 the first reference book on Maritime Informatics was co-written by 81 expert contributors (47 practitioners and 34 researchers) from 20 countries. Most impactful authors and journals in the domain have been documented in a review paper. Dimitrios Zissis, Luca Cazzanti and Leonardo M. Millefiori are the top three authors; top journals and conferences include Ocean Engineering, Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems, Sensors, the international Conference On Engineering, Technology And Innovation, Expert Systems With Applications, IEEE Access, and Journal of Navigation. == Background == The shipping industry has several particular organisational aspects that are recognised and taken into account in maritime informatics: It is predominantly a self-organising ecosystem Many activities are undertaken as part of episodic tight coupling There is a so-called maritime stack There is increasing pressure to balance capital productivity and energy efficiency There is the potential virtuous interplay between different types of systems == Data sharing == Digital data sharing is key to the all-important, arguably fundamental, data analytics aspects of maritime informatics because it opens the way for better access to relevant and reliable data. As in land-based commerce, digital data sharing is a growing phenomenon in maritime operations - though there is a way to go. It is enabling greater transparency for all those involved in the transportation of goods and passengers, not least being the end-customer. This leads to better and more informed decision-making and planning by all those involved. The push for digitalisation and data sharing is being pursued both by governments and the commercial sector. For example, the Member States of the IMO agreed a mandatory requirement for their governments to introduce electronic information exchange between ships and ports as from 8 April 2019. Meanwhile, commercial operators, particularly in the container lines are putting systems in place for sharing data for mutual benefit in their operations. Data sharing is an important aspect of the Port Collaborative Decision Making (PortCDM) and Port Call Optimization initiatives, both of which seek to improve the coordination, synchronization and efficiency of the port call process by enabling a common and shared situational awareness among all those involved. == Standardisation == The availability and sharing of relevant digital data underpins maritime informatics and is key to more effective and efficient coordination and synchronisation in the predominantly self-organising ecosystem that is maritime transportation. For this to occur, a high priority underpinning maritime informatics is the encouragement of standardised digital data exchange and data sharing, leading, in turn, to improvements in shipping analytics. Improved availability of data will support better historical analysis, now-casting and forecasting. The International Maritime Organization (IMO) FAL Committee is taking the lead in ensuring that the common terms used in the various standards being developed or in use in the maritime sector are compatible and therefore interoperable as far as is practicable, by creating and maintaining The IMO Compendium on Facilitation and Electronic Business. The IMO Compendium consists of an IMO Data Set and IMO Reference Data Model agreed by the main organisations involved in the development of standards for the electronic exchange of information related to the FAL Convention: the World Customs Organization (WCO), the United Nations Economic Commission for Europe (UNECE) and the International Organization for Standardization (ISO). There are several other prominent international governmental and non-governmental organisations actively contributing to the ongoing standardisation and harmonisation process including the UN Electronic Data Interchange for Administration, Commerce and Transport (UN EDIFACT), the Digital Container Shipping Association (DCSA), the International Harbour Masters Association (IHMA) and BIMCO - the world's largest direct-membership organisation for shipowners, charterers, shipbrokers and agents.

    Read more →
  • Algorithm characterizations

    Algorithm characterizations

    Algorithm characterizations are attempts to formalize the word algorithm. Algorithm does not have a generally accepted formal definition. Researchers are actively working on this problem. This article will present some of the "characterizations" of the notion of "algorithm" in more detail. == The problem of definition == Over the last 200 years, the definition of the algorithm has become more complicated and detailed as researchers have tried to pin down the term. Indeed, there may be more than one type of "algorithm". But most agree that algorithm has something to do with defining generalized processes for the creation of "output" integers from other "input" integers – "input parameters" arbitrary and infinite in extent, or limited in extent but still variable—by the manipulation of distinguishable symbols (counting numbers) with finite collections of rules that a person can perform with paper and pencil. The most common number-manipulation schemes—both in formal mathematics and in routine life—are: (1) the recursive functions calculated by a person with paper and pencil, and (2) the Turing machine or its Turing equivalents—the primitive register-machine or "counter-machine" model, the random-access machine model (RAM), the random-access stored-program machine model (RASP) and its functional equivalent "the computer". When we are doing "arithmetic" we are really calculating by the use of "recursive functions" in the shorthand algorithms we learned in grade school, for example, adding and subtracting. The proofs that every "recursive function" we can calculate by hand we can compute by machine and vice versa—note the usage of the words calculate versus compute—is remarkable. But this equivalence together with the thesis (unproven assertion) that this includes every calculation/computation indicates why so much emphasis has been placed upon the use of Turing-equivalent machines in the definition of specific algorithms, and why the definition of "algorithm" itself often refers back to "the Turing machine". This is discussed in more detail under Stephen Kleene's characterization. The following are summaries of the more famous characterizations (Kleene, Markov, Knuth) together with those that introduce novel elements—elements that further expand the definition or contribute to a more precise definition. [ A mathematical problem and its result can be considered as two points in a space, and the solution consists of a sequence of steps or a path linking them. Quality of the solution is a function of the path. There might be more than one attribute defined for the path, e.g. length, complexity of shape, an ease of generalizing, difficulty, and so on. ] == Chomsky hierarchy == There is more consensus on the "characterization" of the notion of "simple algorithm". All algorithms need to be specified in a formal language, and the "simplicity notion" arises from the simplicity of the language. The Chomsky (1956) hierarchy is a containment hierarchy of classes of formal grammars that generate formal languages. It is used for classifying of programming languages and abstract machines. From the Chomsky hierarchy perspective, if the algorithm can be specified on a simpler language (than unrestricted), it can be characterized by this kind of language, else it is a typical "unrestricted algorithm". Examples: a "general purpose" macro language, like M4 is unrestricted (Turing complete), but the C preprocessor macro language is not, so any algorithm expressed in C preprocessor is a "simple algorithm". See also Relationships between complexity classes. == Features of a good algorithm == The following are desirable features of a well-defined algorithm, as discussed in Scheider and Gersting (1995): Unambiguous Operations: an algorithm must have specific, outlined steps. The steps should be exact enough to precisely specify what to do at each step. Well-Ordered: The exact order of operations performed in an algorithm should be concretely defined. Feasibility: All steps of an algorithm should be possible (also known as effectively computable). Input: an algorithm should be able to accept a well-defined set of inputs. Output: an algorithm should produce some result as an output, so that its correctness can be reasoned about. Finiteness: an algorithm should terminate after a finite number of instructions. Properties of specific algorithms that may be desirable include space and time efficiency, generality (i.e. being able to handle many inputs), or determinism. == 1881 John Venn's negative reaction to W. Stanley Jevons's Logical Machine of 1870 == In early 1870 W. Stanley Jevons presented a "Logical Machine" (Jevons 1880:200) for analyzing a syllogism or other logical form e.g. an argument reduced to a Boolean equation. By means of what Couturat (1914) called a "sort of logical piano [,] ... the equalities which represent the premises ... are "played" on a keyboard like that of a typewriter. ... When all the premises have been "played", the panel shows only those constituents whose sum is equal to 1, that is, ... its logical whole. This mechanical method has the advantage over VENN's geometrical method..." (Couturat 1914:75). For his part John Venn, a logician contemporary to Jevons, was less than thrilled, opining that "it does not seem to me that any contrivances at present known or likely to be discovered really deserve the name of logical machines" (italics added, Venn 1881:120). But of historical use to the developing notion of "algorithm" is his explanation for his negative reaction with respect to a machine that "may subserve a really valuable purpose by enabling us to avoid otherwise inevitable labor": (1) "There is, first, the statement of our data in accurate logical language", (2) "Then secondly, we have to throw these statements into a form fit for the engine to work with – in this case the reduction of each proposition to its elementary denials", (3) "Thirdly, there is the combination or further treatment of our premises after such reduction," (4) "Finally, the results have to be interpreted or read off. This last generally gives rise to much opening for skill and sagacity." He concludes that "I cannot see that any machine can hope to help us except in the third of these steps; so that it seems very doubtful whether any thing of this sort really deserves the name of a logical engine."(Venn 1881:119–121). == 1943, 1952 Stephen Kleene's characterization == This section is longer and more detailed than the others because of its importance to the topic: Kleene was the first to propose that all calculations/computations—of every sort, the totality of—can equivalently be (i) calculated by use of five "primitive recursive operators" plus one special operator called the mu-operator, or be (ii) computed by the actions of a Turing machine or an equivalent model. Furthermore, he opined that either of these would stand as a definition of algorithm. A reader first confronting the words that follow may well be confused, so a brief explanation is in order. Calculation means done by hand, computation means done by Turing machine (or equivalent). (Sometimes an author slips and interchanges the words). A "function" can be thought of as an "input-output box" into which a person puts natural numbers called "arguments" or "parameters" (but only the counting numbers including 0—the nonnegative integers) and gets out a single nonnegative integer (conventionally called "the answer"). Think of the "function-box" as a little man either calculating by hand using "general recursion" or computing by Turing machine (or an equivalent machine). "Effectively calculable/computable" is more generic and means "calculable/computable by some procedure, method, technique ... whatever...". "General recursive" was Kleene's way of writing what today is called just "recursion"; however, "primitive recursion"—calculation by use of the five recursive operators—is a lesser form of recursion that lacks access to the sixth, additional, mu-operator that is needed only in rare instances. Thus most of life goes on requiring only the "primitive recursive functions." === 1943 "Thesis I", 1952 "Church's Thesis" === In 1943 Kleene proposed what has come to be known as Church's thesis: "Thesis I. Every effectively calculable function (effectively decidable predicate) is general recursive" (First stated by Kleene in 1943 (reprinted page 274 in Davis, ed. The Undecidable; appears also verbatim in Kleene (1952) p.300) In a nutshell: to calculate any function the only operations a person needs (technically, formally) are the 6 primitive operators of "general" recursion (nowadays called the operators of the mu recursive functions). Kleene's first statement of this was under the section title "12. Algorithmic theories". He would later amplify it in his text (1952) as follows: "Thesis I and its converse provide the exact definition of the notion of a calculation (decision) procedure or algorithm, for the

    Read more →
  • Algorithms and Combinatorics

    Algorithms and Combinatorics

    Algorithms and Combinatorics (ISSN 0937-5511) is a book series in mathematics, and particularly in combinatorics and the design and analysis of algorithms. It is published by Springer Science+Business Media, and was founded in 1987. == Books == The books published in this series include: The Simplex Method: A Probabilistic Analysis (Karl Heinz Borgwardt, 1987, vol. 1) Geometric Algorithms and Combinatorial Optimization (Martin Grötschel, László Lovász, and Alexander Schrijver, 1988, vol. 2; 2nd ed., 1993) Systems Analysis by Graphs and Matroids (Kazuo Murota, 1987, vol. 3) Greedoids (Bernhard Korte, László Lovász, and Rainer Schrader, 1991, vol. 4) Mathematics of Ramsey Theory (Jaroslav Nešetřil and Vojtěch Rödl, eds., 1990, vol. 5) Matroid Theory and its Applications in Electric Network Theory and in Statics (Andras Recszki, 1989, vol. 6) Irregularities of Partitions: Papers from the meeting held in Fertőd, July 7–11, 1986 (Gábor Halász and Vera T. Sós, eds., 1989, vol. 8) Paths, Flows, and VLSI-Layout: Papers from the meeting held at the University of Bonn, Bonn, June 20–July 1, 1988 (Bernhard Korte, László Lovász, Hans Jürgen Prömel, and Alexander Schrijver, eds., 1990, vol. 9) New Trends in Discrete and Computational Geometry (János Pach, ed., 1993, vol. 10) Discrete Images, Objects, and Functions in Z n {\displaystyle \mathbb {Z} ^{n}} (Klaus Voss, 1993, vol. 11) Linear Optimization and Extensions (Manfred Padberg, 1999, vol. 12) The Mathematics of Paul Erdős I (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 13) The Mathematics of Paul Erdős II (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 14) Geometry of Cuts and Metrics (Michel Deza and Monique Laurent, 1997, vol. 15) Probabilistic Methods for Algorithmic Discrete Mathematics (M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, 1998, vol. 16) Modern Cryptography, Probabilistic Proofs and Pseudorandomness (Oded Goldreich, 1999, vol. 17) Geometric Discrepancy: An Illustrated Guide (Jiří Matoušek, 1999, vol. 18) Applied Finite Group Actions (Adalbert Kerber, 1999, vol. 19) Matrices and Matroids for Systems Analysis (Kazuo Murota, 2000, vol. 20; corrected ed., 2010) Combinatorial Optimization (Bernhard Korte and Jens Vygen, 2000, vol. 21; 5th ed., 2012) The Strange Logic of Random Graphs (Joel Spencer, 2001, vol. 22) Graph Colouring and the Probabilistic Method (Michael Molloy and Bruce Reed, 2002, Vol. 23) Combinatorial Optimization: Polyhedra and Efficiency (Alexander Schrijver, 2003, vol. 24. In three volumes: A. Paths, flows, matchings; B. Matroids, trees, stable sets; C. Disjoint paths, hypergraphs) Discrete and Computational Geometry: The Goodman-Pollack Festschrift (B. Aronov, S. Basu, J. Pach, and M. Sharir, eds., 2003, vol. 25) Topics in Discrete Mathematics: Dedicated to Jarik Nešetril on the Occasion of his 60th birthday (M. Klazar, J. Kratochvíl, M. Loebl, J. Matoušek, R. Thomas, and P. Valtr, eds., 2006, vol. 26) Boolean Function Complexity: Advances and Frontiers (Stasys Jukna, 2012, Vol. 27) Sparsity: Graphs, Structures, and Algorithms (Jaroslav Nešetřil and Patrice Ossona de Mendez, 2012, vol. 28) Optimal Interconnection Trees in the Plane (Marcus Brazil and Martin Zachariasen, 2015, vol. 29) Combinatorics and Complexity of Partition Functions (Alexander Barvinok, 2016, vol. 30)

    Read more →
  • Pydio

    Pydio

    Pydio Cells, previously known as just Pydio and formerly known as AjaXplorer, is an open-source file-sharing and synchronisation software that runs on the user's own server or in the cloud. == Presentation == The project was created by musician Charles Du Jeu (current CEO and CTO) in 2007 under the name AjaXplorer. The name was changed in 2013 and became Pydio (an acronym for Put Your Data in Orbit). In May 2018, Pydio switched from PHP to Go with the release of Pydio Cells. The PHP version reached end-of-life state on 31 December 2019. Pydio Cells runs on any server supporting a recent Go version. Windows/Linux/macOS on the Intel architecture are directly supported; a fully functional working ARM implementation is under active development. Pydio Cells has been developed from scratch using the Go programming language; release 4.0.0 introduced code refactoring to fully support the Go modular structure as well as grid computing. Nevertheless, the web-based interface of Cells is very similar to the one from Pydio 8 (in PHP), and it successfully replicates most of its features, while adding a few more. There is also a new synchronisation client (also written in Go). The PHP version has been phased out as the company's focus is moving to Pydio Cells, with community feedback on the new features. According to the company, the switch to the new environment was made "to overcome inherent PHP limitations and provide you with a future-proof and modern solution for collaborating on documents". From a technical point of view, Pydio differs from solutions such as Google Drive or Dropbox. Pydio is not based on a public cloud; instead, the software connects to the user's existing storage (such as SAN / Local FS, SAMBA / CIFS, (s)FTP, NFS, S3-compatible cloud storage, Azure Blob Storage, Google Cloud Storage) as well as to the existing user directories (LDAP / AD, OAuth2 / OIDC SSO, SAML / Azure ADFS SSO, RADIUS, Shibboleth...), which allows companies to keep their data inside their infrastructure, according to their data security policy and user rights management. The software is built in a modular perspective; up to Pydio 8, various plugins allowed administrators to implement extra features. On the server side, Pydio Cells is deployed as a collection of independent microservices communicating among themselves using gRPC and logging user actions via Activity Streams 2.0 (AS2). Pydio Cells microservices are built with the Go Micro framework (using an embedded NATS server). A standard installation will deploy all required services on the same physical server, but for the purposes of performance, reliability and high availability, these can now be spread across several different servers (even in geographically separate locations) according to the 12-factors architecture pattern. Pydio Cells is available either through a free and open-source community distribution (Pydio Cells Home), or a commercially-licensed enterprise distribution (in two variants, Pydio Cells Connect and Pydio Cells Enterprise), which add features not available in the community distribution as well as additional levels of support beyond the community forums. == Features == File sharing between different internal users and across other Pydio instances SSL/TLS Encryption WebDAV file server Creation of dedicated workspaces, for each line of business / project / client, with a dedicated user rights management for each workspace. File-sharing with external users (private links, public links, password protection, download limitation, etc.) Online viewing and editing of documents with Collabora Office (Pydio Cells Enterprise also offers OnlyOffice integration) Preview and editing of image files Integrated audio and video reader Activity stream ('timeline') for all actions taken by users Integrated chat platform Client applications are available for all major desktop and mobile platforms.

    Read more →
  • Information quality

    Information quality

    Information quality (IQ) is a contextual property of or a perspective to the content within information systems. There exist two complementary yet partially conflicting definitions of high-quality: firstly, information is considered high quality if it is fit for its intended purpose ; secondly, it is deemed high quality if it conforms to specified requirements . The primary distinction between these definitions is that Juran's perspective focuses on the suitability of information for its intended purpose, which can be measured by the success of its application even without direct access to or exact knowledge of the data. For example, a black-box AI with access to English Wikipedia can work well for users' purposes but using Estonian Wikipedia fails for the same purposes. Given that the AI remains the same, it can be concluded that English version data would be of higher quality in comparison to Estonian version, even without exact comparison of data contents and their properties in each version. In contrast, Crosby emphasizes adherence to predefined specifications, assuming specific criteria rather than measuring the success of its use; for instance, information in Wikipedia could be proven to be good based on criteria such as existing peer validation and academic references, even if the AI results are poor. This approach falls into problems when data is not completely accessible or all quality properties cannot be known and measured leading to false impression of quality due to lacking and misleading metrics. Numerous IQ frameworks and methodologies provide tangible approach to assess and measure DQ/IQ in a robust and rigorous manner. == Conceptual problems == Although the foundational definitions are usable for most everyday purposes, specialists often use more complex models for information quality. It has been suggested, however, that higher the quality the greater will be the confidence in meeting more general, less specific contexts. == Dimensions and metrics of information quality == "Information quality" is a measure of its fitness for use or conformance to requirements. In this way, "quality" is considered contextual and it can then vary across users and uses of the information. The exact degree of quality is often described with dimensions such as accuracy, timeliness, completeness, and similar scales. Although a huge amount of academic research has been directed to these dimensions, there does not exist consensus on their definitions or practical usefulness . Historically, Richard Wang and Diane Strong proposed a list of dimensions or elements used in assessing Information Quality is: Intrinsic IQ: accuracy, objectivity, believability, reputation Contextual IQ: relevance, value-added, timeliness, completeness, amount of information Representational IQ: interpretability, format, coherence, compatibility Accessibility IQ: accessibility, access security Other authors propose similar but different lists of dimensions for analysis, and emphasize measurement and reporting as information quality metrics. Larry English prefers the term "characteristics" to dimensions. However, a considerable amount of information quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Research has recently shown the huge diversity of terms and classification structures used. === Quality metrics === Source: Authority/verifiability Authority refers to the expertise or recognized official status of a source. Consider the reputation of the author and publisher. When working with legal or government information, consider whether the source is the official provider of the information. Verifiability refers to the ability of a reader to verify the validity of the information irrespective of how authoritative the source is. To verify the facts is part of the duty of care of the journalistic deontology, as well as, where possible, to provide the sources of information so that they can be verified Scope of coverage Scope of coverage refers to the extent to which a source explores a topic. Consider time periods, geography or jurisdiction and coverage of related or narrower topics. Composition and organization Composition and organization has to do with the ability of the information source to present its particular message in a coherent, logically sequential manner. Objectivity Objectivity is the bias or opinion expressed when a writer interprets or analyze facts. Consider the use of persuasive language, the source's presentation of other viewpoints, its reason for providing the information and advertising. Integrity Adherence to moral and ethical principles; soundness of moral character The state of being whole, entire, or undiminished Comprehensiveness Of large scope; covering or involving much; inclusive: a comprehensive study. Comprehending mentally; having an extensive mental grasp. Insurance. covering or providing broad protection against loss. Validity Validity of some information has to do with the degree of obvious truthfulness which the information carries Uniqueness As much as 'uniqueness' of a given piece of information is intuitive in meaning, it also significantly implies not only the originating point of the information but also the manner in which it is presented and thus the perception which it conjures. The essence of any piece of information we process consists to a large extent of those two elements. Timeliness Timeliness refers to information that is current at the time of publication. Consider publication, creation and revision dates. Beware of Web site scripting that automatically reflects the current day's date on a page. Reproducibility (utilized primarily when referring to instructive information) Means that documented methods are capable of being used on the same data set to achieve a consistent result. == Professional associations == IQ International—the International Association for Information and Data Quality IQ International is a not-for-profit, vendor neutral, professional association formed in 2004, dedicated to building the information and data quality profession. CDOIQ Society Chief Data Officers and Information Quality Society is a global professional society supporting data leaders with networking, meetings, best practices, experience, certification, and training. == Information quality conferences == A number of major conferences relevant to information quality are held annually: Annual MIT Chief Data Officer & Information Quality (CDOIQ) Symposium Annual conferences held at the Massachusetts Institute of Technology, Cambridge, MA, USA Data Governance and Information Quality Conference Commercial conferences held each year in the USA Data Quality Asia Pacific Commercial conference held annually in Sydney or Melbourne, Australia Enterprise Data and Business Intelligence Conference Europe Commercial conferences held annually in London, England. Information and Data Quality Conference Not for profit conference run annually by IQ International (the International Association for Information and Data Quality) in the USA International Conference on Information Quality Academic Conference launched through MITIQ held annually at a University Master Data Management & Data Governance Conferences Six major conferences are run annually by the MDM Institute in venues such as London, San Francisco, Sydney, Toronto, Madrid, Frankfurt, Shanghai and New York City.

    Read more →
  • Algorithmic mechanism design

    Algorithmic mechanism design

    Algorithmic mechanism design (AMD) lies at the intersection of economic game theory, optimization, and computer science. The prototypical problem in mechanism design is to design a system for multiple self-interested participants, such that the participants' self-interested actions at equilibrium lead to good system performance. Typical objectives studied include revenue maximization and social welfare maximization. Algorithmic mechanism design differs from classical economic mechanism design in several respects. It typically employs the analytic tools of theoretical computer science, such as worst case analysis and approximation ratios, in contrast to classical mechanism design in economics which often makes distributional assumptions about the agents. It also considers computational constraints to be of central importance: mechanisms that cannot be efficiently implemented in polynomial time are not considered to be viable solutions to a mechanism design problem. This often, for example, rules out the classic economic mechanism, the Vickrey–Clarke–Groves auction. == History == Noam Nisan and Amir Ronen first coined "Algorithmic mechanism design" in a research paper published in 1999.

    Read more →
  • Hall circles

    Hall circles

    Hall circles (also known as M-circles and N-circles) are a graphical tool in control theory used to obtain values of a closed-loop transfer function from the Nyquist plot (or the Nichols plot) of the associated open-loop transfer function. Hall circles have been introduced in control theory by Albert C. Hall in his thesis. == Construction == Consider a closed-loop linear control system with open-loop transfer function given by transfer function G ( s ) {\displaystyle G(s)} and with a unit gain in the feedback loop. The closed-loop transfer function is given by T ( s ) = G ( s ) 1 + G ( s ) {\textstyle T(s)={\frac {G(s)}{1+G(s)}}} . To check the stability of T(s), it is possible to use the Nyquist stability criterion with the Nyquist plot of the open-loop transfer function G(s). Note, however, that the Nyquist plot of G(s) does not give the actual values of T(s). To get this information from the G(s)-plane, Hall proposed to construct the locus of points in the G(s)-plane such that T(s) has constant magnitude and also the locus of points in the G(s)-plane such that T(s) has constant phase angle. Given a positive real value M representing a fixed magnitude, and denoting G(s) by z, the points satisfying M = | T ( s ) | = | G ( s ) | | 1 + G ( s ) | = | z | | 1 + z | {\displaystyle M=|T(s)|={\frac {|G(s)|}{|1+G(s)|}}={\frac {|z|}{|1+z|}}} are given by the points z in the G(s)-plane such that the ratio of the distance between z and 0 and the distance between z and -1 is equal to M. The points z satisfying this locus condition are circles of Apollonius, and this locus is known in the context of control systems as M-circles. Given a positive real value N representing a phase angle, the points satisfying N = arg ⁡ [ G ( s ) 1 + G ( s ) ] = arg ⁡ [ G ( s ) ] − arg ⁡ [ 1 + G ( s ) ] = arg ⁡ [ z ] − arg ⁡ [ 1 + z ] {\displaystyle N=\arg \left[{\frac {G(s)}{1+G(s)}}\right]=\arg[G(s)]-\arg[1+G(s)]=\arg[z]-\arg[1+z]} are given by the points z in the G(s)-plane such that the angle between -1 and z and the angle between 0 and z is constant. In other words, the angle opposed to the line segment between -1 and 0 must be constant. This implies that the points z satisfying this locus condition are arcs of circles, and this locus is known in the context of control systems as N-circles. == Usage == To use the Hall circles, a plot of M and N circles is done over the Nyquist plot of the open-loop transfer function. The points of the intersection between these graphics give the corresponding value of the closed-loop transfer function. Hall circles are also used with the Nichols plot and in this setting, are also known as Nichols chart. Rather than overlaying directly the Hall circles over the Nichols plot, the points of the circles are transferred to a new coordinate system where the ordinate is given by 20 log 10 ⁡ ( | G ( s ) | ) {\displaystyle 20\log _{10}(|G(s)|)} and the abscissa is given by arg ⁡ ( G ( s ) ) {\displaystyle \arg(G(s))} . The advantage of using Nichols chart is that adjusting the gain of the open loop transfer function directly reflects in up and down translation of the Nichols plot in the chart.

    Read more →
  • Topological deep learning

    Topological deep learning

    Topological deep learning (TDL) is a research field that extends deep learning to handle complex, non-Euclidean data structures. Traditional deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel in processing data on regular grids and sequences. However, scientific and real-world data often exhibit more intricate data domains encountered in scientific computations, including point clouds, meshes, time series, scalar fields graphs, or general topological spaces like simplicial complexes and CW complexes. TDL addresses this by incorporating topological concepts to process data with higher-order relationships, such as interactions among multiple entities and complex hierarchies. This approach leverages structures like simplicial complexes and hypergraphs to capture global dependencies and qualitative spatial properties, offering a more nuanced representation of data. TDL also encompasses methods from computational and algebraic topology that permit studying properties of neural networks and their training process, such as their predictive performance or generalization properties. The mathematical foundations of TDL are algebraic topology, differential topology, and geometric topology. Therefore, TDL can be generalized for data on differentiable manifolds, knots, links, tangles, curves, etc. == History and motivation == Traditional techniques from deep learning often operate under the assumption that a dataset is residing in a highly-structured space (like images, where convolutional neural networks exhibit outstanding performance over alternative methods) or a Euclidean space. The prevalence of new types of data, in particular graphs, meshes, and molecules, resulted in the development of new techniques, culminating in the field of geometric deep learning, which originally proposed a signal-processing perspective for treating such data types. While originally confined to graphs, where connectivity is defined based on nodes and edges, follow-up work extended concepts to a larger variety of data types, including simplicial complexes and CW complexes, with recent work proposing a unified perspective of message-passing on general combinatorial complexes. An independent perspective on different types of data originated from topological data analysis, which proposed a new framework for describing structural information of data, i.e., their "shape," that is inherently aware of multiple scales in data, ranging from local information to global information. While at first restricted to smaller datasets, subsequent work developed new descriptors that efficiently summarized topological information of datasets to make them available for traditional machine-learning techniques, such as support vector machines or random forests. Such descriptors ranged from new techniques for feature engineering over new ways of providing suitable coordinates for topological descriptors, or the creation of more efficient dissimilarity measures. Contemporary research in this field is largely concerned with either integrating information about the underlying data topology into existing deep-learning models or obtaining novel ways of training on topological domains. == Learning on topological spaces == One of the core concepts in topological deep learning is considering the domain upon which this data is defined and supported. In case of Euclidean data, such as images, this domain is a grid, upon which the pixel value of the image is supported. In a more general setting this domain might be a topological domain. Studying and developing deep learning models that are supported ln topological domains constitute the essence of topological deep learning. Next, we introduce the most common topological domains that are encountered in a deep learning setting. These domains include, but not limited to, graphs, simplicial complexes, cell complexes, combinatorial complexes and hypergraphs. Given a finite set S of abstract entities, a neighborhood function N {\displaystyle {\mathcal {N}}} on S is an assignment that attach to every point x {\displaystyle x} in S a subset of S or a relation. Such a function can be induced by equipping S with an auxiliary structure. Edges provide one way of defining relations among the entities of S. More specifically, edges in a graph allow one to define the notion of neighborhood using, for instance, the one hop neighborhood notion. Edges however, limited in their modeling capacity as they can only be used to model binary relations among entities of S since every edge is connected typically to two entities. In many applications, it is desirable to permit relations that incorporate more than two entities. The idea of using relations that involve more than two entities is central to topological domains. Such higher-order relations allow for a broader range of neighborhood functions to be defined on S to capture multi-way interactions among entities of S. Next we review the main properties, advantages, and disadvantages of some commonly studied topological domains in the context of deep learning, including (abstract) simplicial complexes, regular cell complexes, hypergraphs, and combinatorial complexes. ==== Comparisons among topological domains ==== Each of the enumerated topological domains has its own characteristics, advantages, and limitations: Simplicial complexes Simplest form of higher-order domains. Extensions of graph-based models. Admit hierarchical structures, making them suitable for various applications. Hodge theory can be naturally defined on simplicial complexes. Require relations to be subsets of larger relations, imposing constraints on the structure. Cell Complexes Generalize simplicial complexes. Provide more flexibility in defining higher-order relations. Each cell in a cell complex is homeomorphic to an open ball, attached together via attaching maps. Boundary cells of each cell in a cell complex are also cells in the complex. Represented combinatorially via incidence matrices. Hypergraphs Allow arbitrary set-type relations among entities. Relations are not imposed by other relations, providing more flexibility. Do not explicitly encode the dimension of cells or relations. Useful when relations in the data do not adhere to constraints imposed by other models like simplicial and cell complexes. Combinatorial Complexes : Generalize and bridge the gaps between simplicial complexes, cell complexes, and hypergraphs. Allow for hierarchical structures and set-type relations. Combine features of other complexes while providing more flexibility in modeling relations. Can be represented combinatorially, similar to cell complexes. ==== Hierarchical structure and set-type relations ==== The properties of simplicial complexes, cell complexes, and hypergraphs give rise to two main features of relations on higher-order domains, namely hierarchies of relations and set-type relations. ===== Rank function ===== A rank function on a higher-order domain X is an order-preserving function rk: X → Z, where rk(x) attaches a non-negative integer value to each relation x in X, preserving set inclusion in X. Cell and simplicial complexes are common examples of higher-order domains equipped with rank functions and therefore with hierarchies of relations. ===== Set-type relations ===== Relations in a higher-order domain are called set-type relations if the existence of a relation is not implied by another relation in the domain. Hypergraphs constitute examples of higher-order domains equipped with set-type relations. Given the modeling limitations of simplicial complexes, cell complexes, and hypergraphs, we develop the combinatorial complex, a higher-order domain that features both hierarchies of relations and set-type relations. The learning tasks in TDL can be broadly classified into three categories: Cell classification: Predict targets for each cell in a complex. Examples include triangular mesh segmentation, where the task is to predict the class of each face or edge in a given mesh. Complex classification: Predict targets for an entire complex. For example, predict the class of each input mesh. Cell prediction: Predict properties of cell-cell interactions in a complex, and in some cases, predict whether a cell exists in the complex. An example is the prediction of linkages among entities in hyperedges of a hypergraph. In practice, to perform the aforementioned tasks, deep learning models designed for specific topological spaces must be constructed and implemented. These models, known as topological neural networks, are tailored to operate effectively within these spaces. === Topological neural networks === Central to TDL are topological neural networks (TNNs), specialized architectures designed to operate on data structured in topological domains. Unlike traditional neural networks tailored for grid-like structures, TNNs are adept at handling more intricate data representations, such as graphs

    Read more →
  • OpenSMILE

    OpenSMILE

    openSMILE is source-available software for automatic extraction of features from audio signals and for classification of speech and music signals. "SMILE" stands for "Speech & Music Interpretation by Large-space Extraction". The software is mainly applied in the area of automatic emotion recognition and is widely used in the affective computing research community. The openSMILE project exists since 2008 and is maintained by the German company audEERING GmbH since 2013. openSMILE is provided free of charge for research purposes and personal use under a source-available license. For commercial use of the tool, the company audEERING offers custom license options. == Application Areas == openSMILE is used for academic research as well as for commercial applications in order to automatically analyze speech and music signals in real-time. In contrast to automatic speech recognition which extracts the spoken content out of a speech signal, openSMILE is capable of recognizing the characteristics of a given speech or music segment. Examples for such characteristics encoded in human speech are a speaker's emotion, age, gender, and personality, as well as speaker states like depression, intoxication, or vocal pathological disorders. The software further includes music classification technology for automatic music mood detection and recognition of chorus segments, key, chords, tempo, meter, dance-style, and genre. The openSMILE toolkit serves as benchmark in manifold research competitions such as Interspeech ComParE, AVEC, MediaEval, and EmotiW. == History == The openSMILE project was started in 2008 by Florian Eyben, Martin Wöllmer, and Björn Schuller at the Technical University of Munich within the European Union research project SEMAINE. The goal of the SEMAINE project was to develop a virtual agent with emotional and social intelligence. In this system, openSMILE was applied for real-time analysis of speech and emotion. The final SEMAINE software release is based on openSMILE version 1.0.1. In 2009, the emotion recognition toolkit (openEAR) was published based on openSMILE. "EAR" stands for "Emotion and Affect Recognition". In 2010, openSMILE version 1.0.1 was published and was introduced and awarded at the ACM Multimedia Open-Source Software Challenge. Between 2011 and 2013, the technology of openSMILE was extended and improved by Florian Eyben and Felix Weninger in the context of their doctoral thesis at the Technical University of Munich. The software was also applied for the project ASC-Inclusion, which was funded by the European Union. For this project, the software was extended by Erik Marchi in order to teach emotional expression to autistic children, based on automatic emotion recognition and visualization. In 2013, the company audEERING acquired the rights to the code-base from the Technical University of Munich and version 2.0 was published under a source-available research license. Until 2016, openSMILE was downloaded more than 50,000 times worldwide and has established itself as a standard toolkit for emotion recognition. == Awards == openSMILE was awarded in 2010 in the context of the ACM Multimedia Open Source Competition. The software tool is applied in numerous scientific publications on automatic emotion recognition. openSMILE and its extension openEAR have been cited in more than 1000 scientific publications until today.

    Read more →
  • Leiden algorithm

    Leiden algorithm

    The Leiden algorithm is a community detection algorithm developed by Traag et al at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity. == Improvement over Louvain method == Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (though, the method by which nodes are considered in Leiden is more efficient) and a graph aggregation step. However, to address the issues with poorly-connected communities and the merging of smaller communities into larger communities (the resolution limit of modularity), the Leiden algorithm employs an intermediate refinement phase in which communities may be split to guarantee that all communities are well-connected. Consider, for example, the following graph: Three communities are present in this graph (each color represents a community). Additionally, the center "bridge" node (represented with an extra circle) is a member of the community represented by blue nodes. Now consider the result of a node-moving step which merges the communities denoted by red and green nodes into a single community (as the two communities are highly connected): Notably, the center "bridge" node is now a member of the larger red community after node moving occurs (due to the greedy nature of the local node moving algorithm). In the Louvain method, such a merging would be followed immediately by the graph aggregation phase. However, this causes a disconnection between two different sections of the community represented by blue nodes. In the Leiden algorithm, the graph is instead refined: The Leiden algorithm's refinement step ensures that the center "bridge" node is kept in the blue community to ensure that it remains intact and connected, despite the potential improvement in modularity from adding the center "bridge" node to the red community. == Graph components == Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph. === Vertices and edges === A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let V {\displaystyle V} be the set of vertices, and E {\displaystyle E} be the set of edges: V := { v 1 , v 2 , … , v n } E := { e i j , e i k , … , e k l } {\displaystyle {\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}} where e i j {\displaystyle e_{ij}} is the directed edge from vertex v i {\displaystyle v_{i}} to vertex v j {\displaystyle v_{j}} . We can also write this as an ordered pair: e i j := ( v i , v j ) {\displaystyle {\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}} === Community === A community is a unique set of nodes: C i ⊆ V C i ⋂ C j = ∅ ∀ i ≠ j {\displaystyle {\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}} and the union of all communities must be the total set of vertices: V = ⋃ i = 1 C i {\displaystyle {\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}} === Partition === A partition is the set of all communities: P = { C 1 , C 2 , … , C n } {\displaystyle {\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}} == Partition quality == How communities are partitioned is an integral part on the Leiden algorithm. How partitions are decided can depend on how their quality is measured. Additionally, many of these metrics contain parameters of their own that can change the outcome of their communities. === Modularity === Modularity is a highly used quality metric for assessing how well a set of communities partition a graph. The equation for this metric is defined for an adjacency matrix, A, as: Q = 1 2 m ∑ i j ( A i j − k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q={\frac {1}{2m}}\sum _{ij}(A_{ij}-{\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: A i j {\displaystyle A_{ij}} represents the edge weight between nodes i {\displaystyle i} and j {\displaystyle j} ; see Adjacency matrix; k i {\displaystyle k_{i}} and k j {\displaystyle k_{j}} are the sum of the weights of the edges attached to nodes i {\displaystyle i} and j {\displaystyle j} , respectively; m {\displaystyle m} is the sum of all of the edge weights in the graph; c i {\displaystyle c_{i}} and c j {\displaystyle c_{j}} are the communities to which the nodes i {\displaystyle i} and j {\displaystyle j} belong; and δ {\displaystyle \delta } is Kronecker delta function: δ ( c i , c j ) = { 1 if c i and c j are the same community 0 otherwise {\displaystyle {\begin{aligned}\delta (c_{i},c_{j})&={\begin{cases}1&{\text{if }}c_{i}{\text{ and }}c_{j}{\text{ are the same community}}\\0&{\text{otherwise}}\end{cases}}\end{aligned}}} === Reichardt Bornholdt Potts Model (RB) === One of the most well used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB). This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition. This model introduces a resolution parameter γ {\displaystyle \gamma } and is highly similar to the equation for modularity. This model is defined by the following quality function for an adjacency matrix, A, as: Q = ∑ i j ( A i j − γ k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q=\sum _{ij}(A_{ij}-\gamma {\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: γ {\displaystyle \gamma } represents a linear resolution parameter === Constant Potts Model (CPM) === Another metric similar to RB is the Constant Potts Model (CPM). This metric also relies on a resolution parameter γ {\displaystyle \gamma } The quality function is defined as: H = − ∑ i j ( A i j w i j − γ ) δ ( c i , c j ) {\displaystyle H=-\sum _{ij}(A_{ij}w_{ij}-\gamma )\delta (c_{i},c_{j})} === Understanding Potts Model resolution parameters/Resolution limit === Typically Potts models such as RB or CPM include a resolution parameter in their calculation. Potts models are introduced as a response to the resolution limit problem that is present in modularity maximization based community detection. The resolution limit problem is that, for some graphs, maximizing modularity may cause substructures of a graph to merge and become a single community and thus smaller structures are lost. These resolution parameters allow modularity adjacent methods to be modified to suit the requirements of the user applying the Leiden algorithm to account for small substructures at a certain granularity. The figure on the right illustrates why resolution can be a helpful parameter when using modularity based quality metrics. In the first graph, modularity only captures the large scale structures of the graph; however, in the second example, a more granular quality metric could potentially detect all substructures in a graph. == Algorithm == The Leiden algorithm starts with a graph of disorganized nodes (a) and sorts it by partitioning them to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). The method it uses is similar to the Louvain algorithm, except that after moving each node it also considers that node's neighbors that are not already in the community it was placed in. This process results in our first partition (b), also referred to as P {\displaystyle {\mathcal {P}}} . Then the algorithm refines this partition by first placing each node into its own individual community and then moving them from one community to another to maximize modularity. It does this iteratively until each node has been visited and moved, and each community has been refined - this creates partition (c), which is the initial partition of P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} . Then an aggregate network (d) is created by turning each community into a node. P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} is used as the basis for the aggregate network while P {\displaystyle {\mathcal {P}}} is used to create its initial partition. Because we use the original partition P {\displaystyle {\mathcal {P}}} in this step, we must retain it so that it can be used in future iterations. These steps together form the first iteration of the algorithm. In subsequent iterations, the nodes of the aggregate network (which each represent a community) are once again placed into their own individual communities and then sorted according to modularity to form a new P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} , forming (e) in the above graphic. In the case depicted by the graph, the nodes were already sorted optimally, so no change too

    Read more →