BioBike(nee. BioLingua ) is a cloud-based, through-the-web programmable (Paas) symbolic biocomputing and bioinformatics platform that aims to make computational biology, and especially intelligent biocomputing (that is, the application of Artificial Intelligence to computational biology) accessible to research scientists who are not expert programmers. == Unique capabilities == BioBIKE is an integrated symbolic biocomputing and bioinformatics platform, built from the start as an entirely (what is now called) cloud-based architecture where all computing is done in remote servers, and all user access is accomplished through web browsers. BioBIKE has a built-in frame system in which all objects, data, and knowledge are represented. This enables code written either in the native Lisp, in the visual programming language, or systems of rules expressed in the SNARK theorem prover to access the whole of biological knowledge in an integrated manner. For its time (released in 2002) it was unique in permitting users to create fully functional biocomputing programs that run on the back-end servers entirely through the web browser UI. (In modern terms it was one of the first PaaS (Platform as a Service) systems, predating even Salesforce in this capability.) Initially this programming was carried out in raw Lisp, but Jeff Elhai's team at VCU, with NSF funding, created an entirely graphical programming environment on top of BioBIKE based upon the Boxer-style programming environments. Being a multi-headed, multi-threaded, multi-user, multi-tenancy cloud-based system, BioBIKE users were able to directly work together through their web browsers, remotely sharing the same listener and memory space. This permitted a unique sort of collaboration, discussed in Shrager (2007). A specialized offshoot of BioBIKE called "BioDeducta" includes SRI's SNARK theorem prover, offering unique "deductive biocomputing" capabilities. == Implementation == BioBIKE is open-source software implemented using the Lisp programming language. Continuing development takes place by the BioBIKE team centered at Virginia Commonwealth University . == History == BioBIKE was originally called "BioLingua", and was developed by Jeff Shrager at The Carnegie Inst. of Washington Dept. of Plant Biology, and JP Massar with funding from NASA's Astrobiology Division. Shrager and Massar wanted to create a web-based, multi-user Lisp Machine, specialized for bioinformatics. Other early contributors to the project included Mike Travers, and Jeff Elhai of VCU. Elhai obtained continuing funding from the National Science Foundation for the project, which was renamed BioBIKE. Elhai and colleagues added BioBIKE's unique visual programming language. Shrager, meanwhile, collaborated with Richard Waldinger at SRI to build SRI's (SNARK) theorem prover into BioBIKE, creating a deductive biocomputing system, called BioDeducta. == Instances == There used to be a number of BioBIKE verticals in different biological domains, including viral pathogens, cyanobacteria and other bacteria, Arabidopsis thaliana, and several others described in the references.
Language engineering
Language engineering involves the creation of natural language processing systems, whose cost and outputs are measurable and predictable. It is a distinct field contrasted to natural language processing and computational linguistics. A recent trend of language engineering is the use of Semantic Web technologies for the creation, archiving, processing, and retrieval of machine processable language data. Meta-Language Engineering is a proposed extension of Language Engineering first recorded in 2025, associated with the work of Delyone de Paula Canedo Filho. The term is used to designate an approach that, in addition to natural language processing, encompasses the symbolic, cognitive, and epistemological structuring of language systems.
Information professional
The term information professional or information specialist refers to professionals responsible for the collection, documentation, organization, storage, preservation, retrieval, and dissemination of printed and digital information. The service delivered to the client is known as an information service. The term "information professional" is a versatile one, used to describe similar and sometimes overlapping professions, such as librarians, archivists, information managers, information systems specialists, information scientists, records managers, and information consultants. However, terminology differs among sources and organisations. Information professionals are employed in a variety of private, public, and academic institutions, as well as independently. == Skills == Since the term information professional is broad, the skills required for this profession are also varied. A Gartner report in 2011 pointed out that "Professional roles focused on information management will be different to that of established IT roles. An 'information professional' will not be one type of role or skill set, but will in fact have a number of specializations". Thus, an information professional can possess a variety of different skills, depending on the sector in which the person is employed. Some essential cross-sector skills are: IT skills, such as word-processing and spreadsheets, digitisation skills, and conducting Internet searches, together with skills loan systems, databases, content management systems, and specially designed programmes and packages. Customer service. An information professional should have the ability to address the information needs of customers. Language proficiency. This is essential in order to manage the information at hand and deal with customer needs. Soft skills. These include skills such as negotiating, conflict resolution, and time management. Management training. An information professional should be familiar with notions such as strategic planning and project management. Moreover, an information professional should be skilled in planning and using relevant systems, in capturing and securing information, and in accessing it to deliver service whenever the information is required. == Associations == Most countries have a professional association who oversee the professional and academic standards of librarians and other information professionals. There are also international associations related to LIS (library and information science), the most prominent of which is the International Federation of Library Associations and Institutions (IFLA). In many countries, LIS courses are accredited by the relevant professional association, as the American Library Association (ALA) in the USA, the Chartered Institute of Library and Information Professionals (CILIP) in the UK, and the Australian Library and Information Association (ALIA) in Australia. == Qualifications == Educational institutions around the world offer academic degrees, or degrees on related subjects such as Archival Studies, Information Systems, Information Management, and Records Management. Some of the institutions offering information science education refer to themselves as an iSchool, such as the CiSAP (Consortium of iSchools Asia Pacific, founded 2006) in Asia and the iSchool Caucus in the USA. There are also online e-learning resources, some of which offer certification for information professionals. === Africa === Information development in Africa started later than in other continents, mainly due to a lack of internet access, expertise and resources to manage digital infrastructure, and "opportunities for capacity development and knowledge-sharing". Nowadays, academic degrees in information studies are available at many universities of African countries, such as the University of Pretoria (South Africa), University of Nairobi (Kenya), Makerere University (Uganda), University of Botswana (Botswana), and University of Nigeria (Nigeria). === Asia === LIS-related studies are available in more than 30 Asian countries. Some examples listed by iSchools Inc. are the University of Hong Kong, University of Tsukuba, Japan, Yonsei University, South Korea, National Taiwan University and Wuhan University, China. Centre of Library and Information Management Science (CLIMS) at Tata Institute of Social Science in Mumbai, India. In Southeast Asia, the Congress of Southeast Asian Librarians (CONSAL) connects librarians and libraries in more than 10 countries with resources, networking opportunities, and support for growing library systems. === Australasia === The Australian Library and Information Association (ALIA) as of 2021 lists six schools offering undergraduate and postgraduate accredited university courses for "Librarian and Information Specialists" on their website. In New Zealand, the Open Polytechnic of New Zealand and the Victoria University of Wellington offer undergraduate and postgraduate degree courses for information professionals. === Europe === The majority of European countries have universities, colleges, or schools which offer bachelor's degrees in LIS studies. Over 40 universities offer master's degrees in LIS-related fields, and many institutions, such as the Swedish School of Library and Information Science at the University of Borås (Sweden), the University of Barcelona (Spain), Loughborough University (UK), and Aberystwyth University (Wales, UK) also offer PhD degrees. === North America === Information studies and degrees are available at numerous academic institutions throughout the U.S. and Canada. U.S. professional associations, together with their European counterparts, have undertaken many educational initiatives and pioneered many advances in the field of Information studies, such as increased interdisciplinarity and more effective delivery of distance learning. The Association for Intelligent Information Management, based in Silver Spring, Maryland, offers a qualification called Certified Information Professional (CIP), earned upon passing an examination, with certification remaining valid for three years. === South America === There are many schools and colleges in Latin America, which offer courses in Library Science, Archival Studies, and Information Studies, however these subjects are taught completely separately.
Enterprise information integration
Enterprise information integration (EII) is the ability to support a unified view of data and information for an entire organization. The goal of EII is to get a large set of heterogeneous data sources to appear to a user or system as a single, homogeneous data source. In a data virtualization application of EII, there is a process of information integration, using data abstraction to provide a unified interface (known as uniform data access) for viewing all the data within an organization, and a single set of structures and naming conventions (known as uniform information representation) to represent this data. == Overview == Data within an enterprise can be stored in heterogeneous formats, including relational databases (which themselves come in a large number of varieties), text files, XML files, spreadsheets and a variety of proprietary storage methods, each with their own indexing and data access methods. Standardized data access APIs have emerged that offer a specific set of commands to retrieve and modify data from a generic data source. Many applications exist that implement these APIs' commands across various data sources, most notably relational databases. Such APIs include ODBC, JDBC, XQJ, OLE DB, and more recently ADO.NET. There are also standard formats for representing data within a file that are very important to information integration. The best-known of these is XML, which has emerged as a standard universal representation format. There are also more specific XML "grammars" defined for specific types of data such as Geography Markup Language for expressing geographical features and Directory Service Markup Language for holding directory-style information. In addition, non-XML standard formats exist such as iCalendar for representing calendar information and vCard for business card information. Enterprise Information Integration (EII) applies data integration commercially. Despite the theoretical problems described above, the private sector shows more concern with the problems of data integration as a viable product. EII emphasizes neither correctness nor tractability, but speed and simplicity. === Uniform data access === Uniform data access means connectivity and controllability across numerous target data sources. Necessary to fields such as EII and Electronic Data Interchange (EDI), it is most often used regarding analysis of disparate data types and data sources, which must be rendered into a uniform information representation, and generally must appear homogenous to the analysis tools—when the data being analyzed is typically heterogeneous and widely varying in size, type, and original representation. === Uniform information representation === Uniform information representation allows information from several realms or disciplines to be displayed and worked with as if it came from the same realm or discipline. It takes information from a number of sources, which may have used different methodologies and metrics in their data collection, and builds a single large collection of information, where some records may be more complete than others across all fields of data Uniform information representation is particularly important in EII and Electronic Data Interchange (EDI), where different departments of a large organization may have collected information for different purposes, with different labels and units, until one department realized that data already collected by those other departments could be re-purposed for their own needs—saving the enterprise the effort and cost of re-collecting the same information. === Combining disparate data sets === Each data source is disparate and as such is not designed to support EII. Therefore, data virtualization as well as data federation depends upon accidental data commonality to support combining data and information from disparate data sets. Because of this lack of data value commonality across data sources, the return set may be inaccurate, incomplete, and impossible to validate. One solution is to recast disparate databases to integrate these databases without the need for ETL. The recast databases support commonality constraints where referential integrity may be enforced between databases. The recast databases provide designed data access paths with data value commonality across databases. === Simplicity of deployment === Even if recognized as a solution to a problem, EII as of 2009 currently takes time to apply and offers complexities in deployment. Proposed schema-less solutions include "Lean Middleware". === Handling higher-order information === Analysts experience difficulty—even with a functioning information integration system—in determining whether the sources in the database will satisfy a given application. Answering these kinds of questions about a set of repositories requires semantic information like metadata and/or ontologies. == Applications == EII products enable loose coupling between homogeneous-data consuming client applications and services and heterogeneous-data stores. Such client applications and services include Desktop Productivity Tools (spreadsheets, word processors, presentation software, etc.), development environments and frameworks (Java EE, .NET, Mono, SOAP or RESTful Web services, etc.), business intelligence (BI), business activity monitoring (BAM) software, enterprise resource planning (ERP), Customer relationship management (CRM), business process management (BPM and/or BPEL) Software, and web content management (CMS). == Data access technologies == Service Data Objects (SDO) for Java, C++ and .Net clients and any type of data source XQuery and XQuery API for Java
Collective operation
Collective operations are building blocks for interaction patterns, that are often used in SPMD algorithms in the parallel programming context. Hence, there is an interest in efficient realizations of these operations. A realization of the collective operations is provided by the Message Passing Interface (MPI). == Definitions == In all asymptotic runtime functions, we denote the latency α {\displaystyle \alpha } (or startup time per message, independent of message size), the communication cost per word β {\displaystyle \beta } , the number of processing units p {\displaystyle p} and the input size per node n {\displaystyle n} . In cases where we have initial messages on more than one node we assume that all local messages are of the same size. To address individual processing units we use p i ∈ { p 0 , p 1 , … , p p − 1 } {\displaystyle p_{i}\in \{p_{0},p_{1},\dots ,p_{p-1}\}} . If we do not have an equal distribution, i.e. node p i {\displaystyle p_{i}} has a message of size n i {\displaystyle n_{i}} , we get an upper bound for the runtime by setting n = max ( n 0 , n 1 , … , n p − 1 ) {\displaystyle n=\max(n_{0},n_{1},\dots ,n_{p-1})} . A distributed memory model is assumed. The concepts are similar for the shared memory model. However, shared memory systems can provide hardware support for some operations like broadcast (§ Broadcast) for example, which allows convenient concurrent read. Thus, new algorithmic possibilities can become available. == Broadcast == The broadcast pattern is used to distribute data from one processing unit to all processing units, which is often needed in SPMD parallel programs to dispense input or global values. Broadcast can be interpreted as an inverse version of the reduce pattern (§ Reduce). Initially only root r {\displaystyle r} with i d {\displaystyle id} 0 {\displaystyle 0} stores message m {\displaystyle m} . During broadcast m {\displaystyle m} is sent to the remaining processing units, so that eventually m {\displaystyle m} is available to all processing units. Since an implementation by means of a sequential for-loop with p − 1 {\displaystyle p-1} iterations becomes a bottleneck, divide-and-conquer approaches are common. One possibility is to utilize a binomial tree structure with the requirement that p {\displaystyle p} has to be a power of two. When a processing unit is responsible for sending m {\displaystyle m} to processing units i . . j {\displaystyle i..j} , it sends m {\displaystyle m} to processing unit ⌈ ( i + j ) / 2 ⌉ {\displaystyle \left\lceil (i+j)/2\right\rceil } and delegates responsibility for the processing units ⌈ ( i + j ) / 2 ⌉ . . j {\displaystyle \left\lceil (i+j)/2\right\rceil ..j} to it, while its own responsibility is cut down to i . . ⌈ ( i + j ) / 2 ⌉ − 1 {\displaystyle i..\left\lceil (i+j)/2\right\rceil -1} . Binomial trees have a problem with long messages m {\displaystyle m} . The receiving unit of m {\displaystyle m} can only propagate the message to other units, after it received the whole message. In the meantime, the communication network is not utilized. Therefore pipelining on binary trees is used, where m {\displaystyle m} is split into an array of k {\displaystyle k} packets of size ⌈ n / k ⌉ {\displaystyle \left\lceil n/k\right\rceil } . The packets are then broadcast one after another, so that data is distributed fast in the communication network. Pipelined broadcast on balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , whereas for the non-pipelined case it takes O ( ( α + β n ) log p ) {\displaystyle {\mathcal {O}}((\alpha +\beta n)\log p)} cost. == Reduce == The reduce pattern is used to collect data or partial results from different processing units and to combine them into a global result by a chosen operator. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by ⊗ {\displaystyle \otimes } and the result is eventually stored on p 0 {\displaystyle p_{0}} . The reduction operator ⊗ {\displaystyle \otimes } must be associative at least. Some algorithms require a commutative operator with a neutral element. Operators like s u m {\displaystyle sum} , m i n {\displaystyle min} , m a x {\displaystyle max} are common. Implementation considerations are similar to broadcast (§ Broadcast). For pipelining on binary trees the message must be representable as a vector of smaller object for component-wise reduction. Pipelined reduce on a balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} . == All-Reduce == The all-reduce pattern (also called allreduce) is used if the result of a reduce operation (§ Reduce) must be distributed to all processing units. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by an operator ⊗ {\displaystyle \otimes } and the result is eventually stored on all p i {\displaystyle p_{i}} . Analog to the reduce operation, the operator ⊗ {\displaystyle \otimes } must be at least associative. All-reduce can be interpreted as a reduce operation with a subsequent broadcast (§ Broadcast). For long messages a corresponding implementation is suitable, whereas for short messages, the latency can be reduced by using a hypercube (Hypercube (communication pattern) § All-Gather/ All-Reduce) topology, if p {\displaystyle p} is a power of two. All-reduce can also be implemented with a butterfly algorithm and achieve optimal latency and bandwidth. All-reduce is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , since reduce and broadcast are possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} with pipelining on balanced binary trees. All-reduce implemented with a butterfly algorithm achieves the same asymptotic runtime. == Prefix-Sum/Scan == The prefix-sum or scan operation is used to collect data or partial results from different processing units and to compute intermediate results by an operator, which are stored on those processing units. It can be seen as a generalization of the reduce operation (§ Reduce). Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} . The operator ⊗ {\displaystyle \otimes } must be at least associative, whereas some algorithms require also a commutative operator and a neutral element. Common operators are s u m {\displaystyle sum} , m i n {\displaystyle min} and m a x {\displaystyle max} . Eventually processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ <= i {\displaystyle \otimes _{i'<=i}} m i ′ {\displaystyle m_{i'}} . In the case of the so-called exclusive prefix sum, processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ < i {\displaystyle \otimes _{i'
Predictive text
Predictive text is an input technology used where one key or button represents many letters, such as on the physical numeric keypads of mobile phones and in accessibility technologies. Each key press results in a prediction rather than repeatedly sequencing through the same group of "letters" it represents, in the same, invariable order. Predictive text could allow for an entire word to be input by a single keypress. Predictive text makes efficient use of fewer device keys to input writing into a text message, an e-mail, an address book, a calendar, and the like. The most widely used, general, predictive text systems are T9, iTap, eZiText, and LetterWise/WordWise. There are many ways to build a device that predicts text, but all predictive text systems have initial linguistic settings that offer predictions that are re-prioritized to adapt to each user. This learning adapts, by way of the device memory, to a user's disambiguating feedback that results in corrective key presses, such as pressing a "next" key to get to the intention. Most predictive text systems have a user database to facilitate this process. Theoretically the number of keystrokes required per desired character in the finished writing is, on average, comparable to using a keyboard. This is approximately true provided that all words used are in its database, punctuation is ignored, and no input mistakes are made when typing or spelling. The theoretical keystrokes per character, KSPC, of a keyboard is KSPC=1.00, and of multi-tap is KSPC=2.03. Eatoni's LetterWise is a predictive multi-tap hybrid, which when operating on a standard telephone keypad achieves KSPC=1.15 for English. The choice of which predictive text system is the best to use involves matching the user's preferred interface style, the user's level of learned ability to operate predictive text software, and the user's efficiency goal. There are various levels of risk in predictive text systems, versus multi-tap systems, because the predicted text that is automatically written provides the speed and mechanical efficiency benefit, which, if the user is not careful to review, results in transmitting misinformation. Predictive text systems take time to learn to use well, and so generally, a device's system has user options to set up the choice of multi-tap or any one of several schools of predictive text methods. == Background == Short message service (SMS) permits a mobile phone user to send text messages (also called messages, SMSes, texts, and txts) as a short message. The most common system of SMS text input is referred to as "multi-tap". Using multi-tap, a key is pressed multiple times to access the list of letters on that key. For instance, pressing the "2" key once displays an "a", twice displays a "b" and three times displays a "c". To enter two successive letters that are on the same key, the user must either pause or hit a "next" button. A user can type by pressing an alphanumeric keypad without looking at the electronic equipment display. Thus, multi-tap is easy to understand and can be used without any visual feedback. However, multi-tap is not very efficient, requiring potentially many keystrokes to enter a single letter. In ideal predictive text entry, all words used are in the dictionary, punctuation is ignored, no spelling mistakes are made, and no typing mistakes are made. The ideal dictionary would include all slang, proper nouns, abbreviations, URLs, foreign-language words and other user-unique words. This ideal circumstance gives predictive text software a reduction in the number of key strokes a user is required to enter a word. The user presses the number corresponding to each letter. As long as the word exists in the predictive text dictionary or is correctly disambiguated by non-dictionary systems, it will appear. For instance, pressing "4663" will typically be interpreted as the word good, provided that a linguistic database in English is currently in use, though alternatives such as home, hood and hoof are also valid interpretations of the sequence of key strokes. The most widely used systems of predictive text are Tegic's T9, Motorola's iTap, and the Eatoni Ergonomics' LetterWise and WordWise. T9 and iTap use dictionaries, but Eatoni Ergonomics' products use a disambiguation process, a set of statistical rules to recreate words from keystroke sequences. All predictive text systems require a linguistic database for every supported input language. == Dictionary vs. non-dictionary systems == Traditional disambiguation works by referencing a dictionary of commonly used words, though Eatoni offers a dictionaryless disambiguation system. In dictionary-based systems, as the user presses the number buttons, an algorithm searches the dictionary for a list of possible words that match the keypress combination and offers up the most probable choice. The user can then confirm the selection and move on, or use a key to cycle through the possible combinations. A non-dictionary system constructs words and other sequences of letters from the statistics of word parts. To attempt predictions of the intended result of keystrokes not yet entered, disambiguation may be combined with a word completion facility. Either system (disambiguation or predictive) may include a user database, which can be further classified as a "learning" system when words or phrases are entered into the user database without direct user intervention. The user database is for storing words or phrases that are not well disambiguated by the pre-supplied database. Some disambiguation systems further attempt to correct spelling, format text or perform other automatic rewrites, with the risky effect of either enhancing or frustrating user efforts to enter text. == History == The predictive text and autocomplete technology was invented out of necessities by Chinese scientists and linguists in the 1950s to solve the input inefficiency of the Chinese typewriter, as the typing process involved finding and selecting thousands of logographic characters on a tray, drastically slowing down the word processing speed. The actuating keys of the Chinese typewriter created by Lin Yutang in the 1940s included suggestions for the characters following the one selected. In 1951, the Chinese typesetter Zhang Jiying arranged Chinese characters in associative clusters, a precursor of modern predictive text entry, and broke speed records by doing so. Predictive entry of text from a telephone keypad has been known at least since the 1970s (Smith and Goodwin, 1971). Predictive text was mainly used to look up names in directories over the phone until mobile phone text messaging came into widespread use. == Example == On a typical phone keypad, if users wished to type the in a "multi-tap" keypad entry system, they would need to: Press 8 (tuv) once to select t. Press 4 (ghi) twice to select h. Press 3 (def) twice to select e. Meanwhile, in a phone with predictive text, they need only: Press 8 once to select the (tuv) group for the first character. Press 4 once to select the (ghi) group for the second character. Press 3 once to select the (def) group for the third character. The system updates the display as each keypress is entered, to show the most probable entry. In this example, prediction reduced the number of button presses from five to three. The effect is even greater with longer words and those composed of letters later in each key's sequence. A dictionary-based predictive system is based on the hope that the desired word is in the dictionary. That hope may be misplaced if the word differs in any way from common usage—in particular, if the word is not spelled or typed correctly, is slang, or is a proper noun. In these cases, some other mechanism must be used to enter the word. Furthermore, the simple dictionary approach fails with agglutinative languages, where a single word does not necessarily represent a single semantic entity. == Companies and products == Predictive text is developed and marketed in a variety of competing products, such as Nuance Communications's T9. Other products include Motorola's iTap; Eatoni Ergonomic's LetterWise (character, rather than word-based prediction); WordWise (word-based prediction without a dictionary); EQ3 (a QWERTY-like layout compatible with regular telephone keypads); Prevalent Devices's Phraze-It; Xrgomics' TenGO (a six-key reduced QWERTY keyboard system); Adaptxt (considers language, context, grammar and semantics); Lightkey (a predictive typing software for Windows); Clevertexting (statistical nature of the language, dictionaryless, dynamic key allocation); and Oizea Type (temporal ambiguity); Intelab's Tauto; WordLogic's Intelligent Input Platform™ (patented, layer-based advanced text prediction, includes multi-language dictionary, spell-check, built-in Web search); Google's Gboard. == Textonyms == Words produced by the same combination of keypresses have been called "textonyms"; also "txtonyms"; or "T9o
Broadcast (parallel pattern)
Broadcast is a collective communication primitive in parallel programming to distribute programming instructions or data to nodes in a cluster. It is the reverse operation of reduction. The broadcast operation is widely used in parallel algorithms, such as matrix-vector multiplication, Gaussian elimination and shortest paths. The Message Passing Interface implements broadcast in MPI_Bcast. == Definition == A message M [ 1.. m ] {\displaystyle M[1..m]} of length m {\displaystyle m} should be distributed from one node to all other p − 1 {\displaystyle p-1} nodes. T byte {\displaystyle T_{\text{byte}}} is the time it takes to send one byte. T start {\displaystyle T_{\text{start}}} is the time it takes for a message to travel to another node, independent of its length. Therefore, the time to send a package from one node to another is t = s i z e × T byte + T start {\displaystyle t=\mathrm {size} \times T_{\text{byte}}+T_{\text{start}}} . p {\displaystyle p} is the number of nodes and the number of processors. == Binomial Tree Broadcast == With Binomial Tree Broadcast the whole message is sent at once. Each node that has already received the message sends it on further. This grows exponentially as each time step the amount of sending nodes is doubled. The algorithm is ideal for short messages but falls short with longer ones as during the time when the first transfer happens only one node is busy. Sending a message to all nodes takes log 2 ( p ) t {\displaystyle \log _{2}(p)t} time which results in a runtime of log 2 ( p ) ( m T byte + T start ) {\displaystyle \log _{2}(p)(mT_{\text{byte}}+T_{\text{start}})} == Linear Pipeline Broadcast == The message is split up into k {\displaystyle k} packages and sent piecewise from node n {\displaystyle n} to node n + 1 {\displaystyle n+1} . The time needed to distribute the first message piece is p t = m k T byte + T start {\textstyle pt={\frac {m}{k}}T_{\text{byte}}+T_{\text{start}}} whereby t {\displaystyle t} is the time needed to send a package from one processor to another. Sending a whole message takes ( p + k ) ( m T byte k + T start ) = ( p + k ) t = p t + k t {\displaystyle (p+k)\left({\frac {mT_{\text{byte}}}{k}}+T_{\text{start}}\right)=(p+k)t=pt+kt} . Optimal is to choose k = m ( p − 2 ) T byte T start {\displaystyle k={\sqrt {\frac {m(p-2)T_{\text{byte}}}{T_{\text{start}}}}}} resulting in a runtime of approximately m T byte + p T start + m p T start T byte {\displaystyle mT_{\text{byte}}+pT_{\text{start}}+{\sqrt {mpT_{\text{start}}T_{\text{byte}}}}} The run time is dependent on not only message length but also the number of processors that play roles. This approach shines when the length of the message is much larger than the amount of processors. == Pipelined Binary Tree Broadcast == This algorithm combines Binomial Tree Broadcast and Linear Pipeline Broadcast, which makes the algorithm work well for both short and long messages. The aim is to have as many nodes work as possible while maintaining the ability to send short messages quickly. A good approach is to use Fibonacci trees for splitting up the tree, which are a good choice as a message cannot be sent to both children at the same time. This results in a binary tree structure. We will assume in the following that communication is full-duplex. The Fibonacci tree structure has a depth of about d ≈ log Φ ( p ) {\displaystyle d\approx \log _{\Phi }(p)} whereby Φ = 1 + 5 2 {\displaystyle \Phi ={\frac {1+{\sqrt {5}}}{2}}} the golden ratio. The resulting runtime is ( m k T byte + T start ) ( d + 2 k − 2 ) {\textstyle ({\frac {m}{k}}T_{\text{byte}}+T_{\text{start}})(d+2k-2)} . Optimal is k = n ( d − 2 ) T byte 3 T start {\displaystyle k={\sqrt {\frac {n(d-2)T_{\text{byte}}}{3T_{\text{start}}}}}} . This results in a runtime of 2 m T byte + T start log Φ ( p ) + 2 m log Φ ( p ) T start T byte {\displaystyle 2mT_{\text{byte}}+T_{\text{start}}\log _{\Phi }(p)+{\sqrt {2m\log _{\Phi }(p)T_{\text{start}}T_{\text{byte}}}}} . == Two Tree Broadcast (23-Broadcast) == === Definition === This algorithm aims to improve on some disadvantages of tree structure models with pipelines. Normally in tree structure models with pipelines (see above methods), leaves receive just their data and cannot contribute to send and spread data. The algorithm concurrently uses two binary trees to communicate over. Those trees will be called tree A and B. Structurally in binary trees there are relatively more leave nodes than inner nodes. Basic Idea of this algorithm is to make a leaf node of tree A be an inner node of tree B. It has also the same technical function in opposite side from B to A tree. This means, two packets are sent and received by inner nodes and leaves in different steps. === Tree construction === The number of steps needed to construct two parallel-working binary trees is dependent on the amount of processors. Like with other structures one processor can is the root node who sends messages to two trees. It is not necessary to set a root node, because it is not hard to recognize that the direction of sending messages in binary tree is normally top to bottom. There is no limitation on the number of processors to build two binary trees. Let the height of the combined tree be h = ⌈log(p + 2)⌉. Tree A and B can have a height of h − 1 {\displaystyle h-1} . Especially, if the number of processors correspond to p = 2 h − 1 {\displaystyle p=2^{h}-1} , we can make both sides trees and a root node. To construct this model efficiently and easily with a fully built tree, we can use two methods called "Shifting" and "Mirroring" to get second tree. Let assume tree A is already modeled and tree B is supposed to be constructed based on tree A. We assume that we have p {\displaystyle p} processors ordered from 0 to p − 1 {\displaystyle p-1} . ==== Shifting ==== The "Shifting" method, first copies tree A and moves every node one position to the left to get tree B. The node, which will be located on -1, becomes a child of processor p − 2 {\displaystyle p-2} . ==== Mirroring ==== "Mirroring" is ideal for an even number of processors. With this method tree B can be more easily constructed by tree A, because there are no structural transformations in order to create the new tree. In addition, a symmetric process makes this approach simple. This method can also handle an odd number of processors, in this case, we can set processor p − 1 {\displaystyle p-1} as root node for both trees. For the remaining processors "Mirroring" can be used. === Coloring === We need to find a schedule in order to make sure that no processor has to send or receive two messages from two trees in a step. The edge, is a communication connection to connect two nodes, and can be labelled as either 0 or 1 to make sure that every processor can alternate between 0 and 1-labelled edges. The edges of A and B can be colored with two colors (0 and 1) such that no processor is connected to its parent nodes in A and B using edges of the same color- no processor is connected to its children nodes in A or B using edges of the same color. In every even step the edges with 0 are activated and edges with 1 are activated in every odd step. === Time complexity === In this case the number of packet k is divided in half for each tree. Both trees are working together the total number of packets k = k / 2 + k / 2 {\displaystyle k=k/2+k/2} (upper tree + bottom tree) In each binary tree sending a message to another nodes takes 2 i {\displaystyle 2i} steps until a processor has at least a packet in step i {\displaystyle i} . Therefore, we can calculate all steps as d := log 2 ( p + 1 ) ⇒ log 2 ( p + 1 ) ≈ log 2 ( p ) {\displaystyle d:=\log _{2}(p+1)\Rightarrow \log _{2}(p+1)\approx \log _{2}(p)} . The resulting run time is T ( m , p , k ) ≈ ( m k T byte + T start ) ( 2 d + k − 1 ) {\textstyle T(m,p,k)\approx ({\frac {m}{k}}T_{\text{byte}}+T_{\text{start}})(2d+k-1)} . (Optimal k = m ( 2 d − 1 ) T byte / T start {\textstyle k={\sqrt {{m(2d-1)T_{\text{byte}}}/{T_{\text{start}}}}}} ) This results in a run time of T ( m , p ) ≈ m T byte + T start ⋅ 2 log 2 ( p ) + m ⋅ 2 log 2 ( p ) T start T byte {\displaystyle T(m,p)\approx mT_{\text{byte}}+T_{\text{start}}\cdot 2\log _{2}(p)+{\sqrt {m\cdot 2\log _{2}(p)T_{\text{start}}T_{\text{byte}}}}} . == ESBT-Broadcasting (Edge-disjoint Spanning Binomial Trees) == In this section, another broadcasting algorithm with an underlying telephone communication model will be introduced. A Hypercube creates network system with p = 2 d ( d = 0 , 1 , 2 , 3 , . . . ) {\displaystyle p=2^{d}(d=0,1,2,3,...)} . Every node is represented by binary 0 , 1 {\displaystyle {0,1}} depending on the number of dimensions. Fundamentally ESBT(Edge-disjoint Spanning Binomial Trees) is based on hypercube graphs, pipelining( m {\displaystyle m} messages are divided by k {\displaystyle k} packets) and binomial trees. The Processor 0 d {\displaystyle 0^{d}} cyclically spreads packets to roots of ESB