AI Chatbot You Can Talk To

AI Chatbot You Can Talk To — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Zero-shot learning

    Zero-shot learning

    Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception. == Background and history == The first paper on zero-shot learning in natural language processing appeared in a 2008 paper by Chang, Ratinov, Roth, and Srikumar, at the AAAI'08, but the name given to the learning paradigm there was dataless classification. The first paper on zero-shot learning in computer vision appeared at the same conference, under the name zero-data learning. The term zero-shot learning itself first appeared in the literature in a 2009 paper from Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09. This terminology was repeated later in another computer vision paper and the term zero-shot learning caught on, as a take-off on one-shot learning that was introduced in computer vision years earlier. In computer vision, zero-shot learning models learned parameters for seen classes along with their class representations and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes. In natural language processing, the key technical direction developed builds on the ability to "understand the labels"—represent the labels in the same semantic space as that of the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper made use of the Explicit Semantic Analysis (ESA) representation but later papers made use of other representations, including dense representations. This approach was also extended to multilingual domains, fine entity typing and other problems. Moreover, beyond relying solely on representations, the computational approach has been extended to depend on transfer from other tasks, such as textual entailment and question answering. The original paper also points out that, beyond the ability to classify a single example, when a collection of examples is given, with the assumption that they come from the same distribution, it is possible to bootstrap the performance in a semi-supervised like manner (or transductive learning). Unlike standard generalization in machine learning, where classifiers are expected to correctly classify new samples to classes they have already observed during training, in ZSL, no samples from the classes have been given during training the classifier. It can therefore be viewed as an extreme case of domain adaptation. == Prerequisite information for zero-shot classes == Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this type of information can be of several types. Learning with attributes: classes are accompanied by pre-defined structured description. For example, for bird descriptions, this could include "red head", "long beak". These attributes are often organized in a structured compositional way, and taking that structure into account improves learning. While this approach was used mostly in computer vision, there are some examples for it also in natural language processing. Learning from textual description. As pointed out above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language description. This could include for example a wikipedia description of the class. Class-class similarity. Here, classes are embedded in a continuous space. A zero-shot classifier can predict that a sample corresponds to some position in that space, and the nearest embedded class is used as a predicted class, even if no such samples were observed during training. == Generalized zero-shot learning == The above ZSL setup assumes that at test time, only zero-shot samples are given, namely, samples from new unseen classes. In generalized zero-shot learning, samples from both new and known classes, may appear at test time. This poses new challenges for classifiers at test time, because it is very challenging to estimate if a given sample is new or known. Some approaches to handle this include: a gating module, which is first trained to decide if a given sample comes from a new class or from an old one, and then, at inference time, outputs either a hard decision, or a soft probabilistic decision a generative module, which is trained to generate feature representation of the unseen classes—a standard classifier can then be trained on samples from all classes, seen and unseen. == Domains of application == Zero shot learning has been applied to the following fields: image classification semantic segmentation image generation object detection natural language processing computational biology abstract reasoning

    Read more →
  • Controlled vocabulary

    Controlled vocabulary

    A controlled vocabulary provides a way to organize knowledge for subsequent retrieval. Controlled vocabularies are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction. == In library and information science == In library and information science, controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by a bijection between concepts and preferred terms. In short, controlled vocabularies reduce unwanted ambiguity inherent in normal human languages where the same concept can be given different names and ensure consistency. For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms—subject headings in this case—have to be chosen to handle choices between variant spellings of the same word (American versus British), choice among scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues. Choices of preferred terms are based on the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure, scope of the controlled vocabulary). Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term pool has to be qualified to refer to either swimming pool or the game pool to ensure that each preferred term or heading refers to only one concept. === Types used in libraries === There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences: Historically, subject headings were designed to describe books in library catalogs by catalogers while thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend to be broader in scope describing whole books, while thesauri tend to be more specialized covering very specific disciplines. Because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct order. Subject headings tend to use more pre-coordination of terms such that the designer of the controlled vocabulary will combine various concepts together to form one preferred subject heading. (e.g., children and terrorism) while thesauri tend to use singular direct terms. Thesauri list not only equivalent terms but also narrower, broader terms and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the Library of Congress Subject Heading itself did not have much syndetic structure until 1943, and it was not until 1985 when it began to adopt the thesauri type term "Broader term" and "Narrow term". The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well known subject heading systems include the Library of Congress system, Medical Subject Headings (MeSH) created by the United States National Library of Medicine, and Sears. Well known thesauri include the Art and Architecture Thesaurus and the ERIC Thesaurus. When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter consistency and stability of the language. Lastly the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as tags, to aid in the content identification process of documents, or other information system entities (e.g. DBMS, Web Services) qualifies as metadata. == Indexing languages == There are three main types of indexing languages. Controlled indexing language – only approved terms can be used by the indexer to describe the document Natural language indexing language – any term from the document in question can be used to describe the document Free indexing language – any term (not only from the document) can be used to describe the document When indexing a document, the indexer also has to choose the level of indexing exhaustivity, the level of detail in which the document is described. For example, using low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general the higher the indexing exhaustivity, the more terms indexed for each document. In recent years free text search as a means of access to documents has become popular. This involves using natural language indexing with an indexing exhaustively set to maximum (every word in the text is indexed). These methods have been compared in some studies, such as the 2007 article, "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search". === Advantages === Controlled vocabularies are often claimed to improve the accuracy of free text searching, such as to reduce irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word football for example. Football is the name given to a number of different team sports. Worldwide the most popular of these team sports is association football, which also happens to be called soccer in several countries. The word football is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for football therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging the documents in such a way that the ambiguities are eliminated. Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic). In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term. === Disadvantages === A controlled vocabulary search may lead to unsatisfactory recall, in that it will fail to retrieve some documents that are actually relevant to the search question. This is particularly problematic when the search question involves terms that are sufficiently tangential to the subject area such that the indexer might have decided to tag it using a different term (but the searcher might consider the same). Essentially, this can be avoided only by an experienced user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer. Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main focus. But it turns out that for the searcher that article is relevant and hence recall fails. A free text search would automatically pick up that article regardless. On the other hand, free text searches have high exhaustivity (every word is searched) so although it has much lower precision, it has potential for high recall as long as the searcher overcome the problem of synonyms by entering every combination. Controlled vocabularies may become outdated rapidly in fast developing fields of knowledge, unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index terms might misinterpret the author, while this precise problem is not a factor in a free text, as it uses the author's own words. The use of controlled vocabularies can be costly compared to free

    Read more →
  • Artificial intelligence in Brazilian industry

    Artificial intelligence in Brazilian industry

    In 2022, 16.9% (1,620) of the 9,586 Brazilian industrial companies with 100 or more employees used artificial intelligence in their operations Among the companies that used AI, the areas of administration (73.8%), product project development (65.9%), processes, services and marketing (65.1%) were those that used it the most, followed by the areas of production (56.4%) and logistics (48.4%). == Current scenario == === Adoption in Brazilian industrial sectors === In senior management, the majority (56%) of executives have a long-term vision for its use. The study also shows that IT, Innovation, and Marketing are the areas where AI use is most widespread, and that 43% of companies are developing or adapting the algorithms they use. The majority of large institutions that reported some type of AI use purchased these solutions from other companies (76%). Some factors for the adoption of artificial intelligence in companies include the establishment of an autonomous strategy by the company (87.0%), and the influence of suppliers and/or customers (63.0%) and the main difficulties in using technologies were high costs (80.8%), lack of qualified personnel in the company (54.6%) and excessive economic risks (49.5%). Three variables are considered the most relevant to explain the option to use AI: the implementation of a digital security policy, the size of companies with 250 or more employees and the characteristics of the company related to information and communication. When analyzing AI use by company size in Brazil, large companies have the highest proportion of AI use, mainly due to their investment capacity and technology experimentation. However, when comparing Brazil and Europe, indicators show an acceleration in AI use among large European companies, while in Brazil the situation remains stable. In 2023, 30% of large companies in the European bloc used some type of AI, a figure that rose to 41% in 2024, while in Brazil these proportions were 41% in 2023 and 38% in 2024. === Workforce === The challenge of upskilling begins with employees who are capable of understanding recent technological changes. Similarly, companies must create the environment and conditions for workforce development conducive to innovation, and universities must be prepared to provide knowledge aligned with the transition process, which in turn must be supported by public policies. The concern with training a specialized workforce in AI can be seen in the low number of graduates and PhDs in computer science and computer engineering in Brazil, compared to the number shown in other countries. As recorded in the document Recommendations for the Advancement of Artificial Intelligence in Brazil, 2019 data from the Coordination for the Improvement of Higher Education Personnel (CAPES) indicate that "the number of PhDs graduated annually in computing remained below 400 in 2016, and is not expected to have increased during the Covid-19 pandemic" (ABC, 2023). In the United States, by contrast, the number of PhDs graduated in these two areas has remained around 1,800 for the past 11 years, and during this period, the number of PhDs specializing in AI jumped from 10% to 19%. Based on data from the CNPq Lattes Platform (October 2019), it is possible to observe that the number of professionals in the AI field in Brazil is 4,429 specialists. This is still a small number compared to the 415,166 IT jobs in the country's business sector alone. === R&D, scientific production and integration with industry === China and the United States lead in the number of publications. These two countries are followed by the G7 members: India, Austria, South Korea, and Spain. Brazil appears in the next group, alongside the Netherlands, Russia, Indonesia, and Ireland. Regarding the promotion of research and technologies related to AI, public entities such as the Coordination for the Improvement of Higher Education Personnel (Capes) and the National Council for Scientific and Technological Development (CNPq) stood out as the main funders. Currently, different countries and territories have been promoting the development of Artificial Intelligence (AI). In the Brazilian case, one of the main initiatives is the creation of Engineering Research Centers/Applied Research Centers (CPE/CPA) in AI by the São Paulo Research Foundation (FAPESP), in collaboration with the Ministry of Science, Technology and Innovation (MCTI), the Ministry of Communications (MC) and the Brazilian Internet Steering Committee (CGI.br). In terms of the number of patents filed and the volume of investments, the leading nations in AI are the United States, China, France, Germany, the United Kingdom, Russia, India, Switzerland, Japan, South Korea, the Netherlands, Sweden, Finland, Ireland, Singapore, Canada, Israel, and Italy. Brazil appears among the top twenty countries in some rankings, mainly due to its good number of publications (approximately 10% of the number of articles published by the United States). The US is home to approximately 60% of the world's top AI researchers, followed by China (11%), Europe (10%), and Canada (6%). To change this scenario, in August 2024, the Brazilian government announced an investment of R$23 billion until 2028 in artificial intelligence, seeking to “transform the country into a global reference in innovation”. == Future challenges == The Organization for Economic Cooperation and Development (2020) report highlighted three factors that hinder the digital transformation journey and application of AI in Brazil: insufficient infrastructure, high costs due to the tax system, and financial limitations, such as limited access to financing. The costs of adopting technology, its incompatibility with the business, and the lack of training also represent obstacles that Brazilian industry must overcome. There are also inherent obstacles for companies. A McKinsey review emphasizes that once a company chooses one or more sectors to focus on, it must select specific applications. Buyers aren't interested in artificial intelligence simply because it's a breakthrough technology; they want AI to generate a good return on investment, whether by solving specific problems, saving money, or increasing sales. If an AI vendor tried to offer a horizontal solution, the value proposition might not be as compelling. Part of the solution to Brazil's technological backwardness involves building an ecosystem fueled by private institutions, universities, and governments.

    Read more →
  • Environmental informatics

    Environmental informatics

    Environmental informatics is the science of information applied to environmental science. As such, it provides the information processing and communication infrastructure to the interdisciplinary field of environmental sciences aiming at data, information and knowledge integration, the application of computational intelligence to environmental data as well as the identification of environmental impacts of information technology. Environmental informatics thus acts as a bridge, providing an interdisciplinary means of analysing, describing and understanding the complex interactions between humans, nature and technology. Since each field of applied computer science has its own subject matter, terminology and methods, specialised disciplines, such as environmental, bio- and geoinformatics have emerged, each of which combines computer science with a specific field of application such as environmental, bio- or geosciences. Environmental informatics, bioinformatics and geoinformatics all deal with computer-based processing of environmental phenomena. However, environmental informatics is the only field that pursues normative goals (e.g., political goals of environmental protection, environmental planning, and sustainability). This also influences the choice of methods. This also distinguishes it from application areas such as numerical weather prediction, which is considered an early and important example of computer simulation of environmental phenomena. The UK Natural Environment Research Council defines environmental informatics as the "research and system development focusing on the environmental sciences relating to the creation, collection, storage, processing, modelling, interpretation, display and dissemination of data and information." Kostas Karatzas defined environmental informatics as the "creation of a new 'knowledge-paradigm' towards serving environmental management needs." Karatzas argued further that environmental informatics "is an integrator of science, methods and techniques and not just the result of using information and software technology methods and tools for serving environmental engineering needs." Environmental informatics emerged in early 1990 in Central Europe. Current initiatives to effectively manage, share, and reuse environmental and ecological data are indicative of the increasing importance of fields like environmental informatics and ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE and Data Conservancy. == Subject matter and objectives == The subject of environmental informatics are environmental information systems (EIS). An EIS 'is a computer-based system that integrates and stores data collected about the natural environment and provides powerful methods for accessing and evaluating it.' This allows environmental data to be processed by computers for environmental protection, planning, research and technology. According to Jaeschke and Bossel, environmental informatics has three interrelated objectives: Environmental informatics serves to procure data and information for describing the state and development of the environment. Of particular importance is information that is needed to prevent or limit undesirable changes and to support desirable changes. Based on the evaluation and analysis of data, environmental informatics improves our understanding of the environment and the interactions between nature, technology and society. It thus supports environmentally relevant decisions. This enables the influence of development (system correction), the assessment of the effects and side effects of potential measures, and the creation of tools for the routine planning, implementation and monitoring of measures. == History == The simulation model World3, which formed the basis of the highly acclaimed study The Limits to Growth, is considered the starting point of environmental informatics. It incorporated environmental information, among other things, to calculate scenarios for global development. In the mid-1980s, interest grew in structuring environmental protection as an area of application for computer science. One of the first publications in German was the book Informatik im Umweltschutz. Anwendungen und Perspektiven (Computer science in environmental protection. Applications and perspectives) from 1986. The term 'environmental informatics' did not appear until around 1993, which is why the development of environmental informatics is usually referred to as having taken place in the 1990s. In 1993, the first university chair for environmental informatics was established in Cottbus. In 1994, the anthology Umweltinformatik. Informatikmethoden für Umweltschutz und Umweltforschung (Environmental Informatics: Informatics Methods for Environmental Protection and Environmental Research) was published. The development of environmental informatics was 'primarily initiated by German computer science.' In the English-speaking world, the volume Environmental Informatics was published in 1995, mainly based on the German anthology of 1994. An article in the conference proceedings of the World Computer Congress of the International Federation for Information Processing (IFIP) in Hamburg in 1994 describes the initial situation of environmental informatics as follows: 'On the one hand, we suffer from the huge amount of available data – people sometimes speak of data graveyards – on the other hand, the really relevant data may still be missing.' This statement indicates the need that led to the emergence of environmental informatics as a specialised discipline of applied computer science. Furthermore, the specific characteristics and processing requirements of environmental data necessitated the emergence of environmental informatics. The special features of environmental data include: The data structures required are highly heterogeneous due to specific processes and differing perspectives on environmental aspects (e.g., water protection, emission control, hazardous substances). In addition to the heterogeneity of the data, heterogeneous databases also play a role, as environmental data is often obtained and presented in an interdisciplinary manner. Obligations change frequently as a result of new legislation, whether regional (e.g. state regulations on water protection), national (e.g. federal emission control regulations) or international (e.g. Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH). The objects represented are often multidimensional and, therefore, require complex geometric representation using curves or polygons. It is often necessary to process uncertain, imprecise or incomplete data, which is, for example, the result of extrapolations or forecasts. A new "knowledge paradigm" has emerged to meet the requirements of environmental management. Environmental informatics produces its own concepts, methods and techniques and is not merely the result of using information and communication technology methods and tools to meet environmental requirements. The development of environmental informatics since the 1990s has been significantly influenced by the newly established conferences EnviroInfo, ISESS and ITEE and is documented in the respective proceedings. Aspects of sustainability and sustainable development were increasingly integrated into environmental informatics after 2000, thereby expanding the field. In 2004, the Working Group on Sustainable Information Society of the Gesellschaft für Informatik e. V. (German Informatics Society, GI) published the Memorandum on a Sustainable Information Society, which formulates recommendations for an information society that is compatible with human, social and natural needs. Since 2007, environmental informatics has often been described in more detail as informatics for environmental protection, sustainable development and risk management. The increased focus on sustainability has also contributed to the formation of the research focus Information and Communications Technology for Sustainability (ICT4S) and to the emergence of the international conference ICT4S in 2013. ICT-ENSURE, the European Commission's funding measure for the establishment of a European research area on "ICT for Environmental Sustainability Research" (2008–2010), has also contributed to the structuring of environmental informatics. == Environmental informatics and sustainable development == Efforts to place environmental informatics within the context of sustainable development have been growing since 2000 and were significantly influenced by the Memorandum on a Sustainable Information Society. According to this Memorandum, the information society offers great but unevenly distributed opportunities for education, participation and intercultural understanding. In addition, the Memorandum highlighted the material and energy consumption of inf

    Read more →
  • Ciscogate

    Ciscogate

    Ciscogate, also known as the Black Hat Bug, is the name given to a legal incident that occurred at the Black Hat Briefings security conference in Las Vegas, Nevada, on July 27, 2005. On the morning of the first day of the conference, July 26, 2005, some attendees noticed that 30 pages of text had been physically ripped out of the extensive conference presentation booklet the night before at the request of Cisco Systems and the CD-ROM with presentation slides was not included. It was determined the pages covered a talk to be given by Michael Lynn, a security researcher with Atlanta-based IBM Internet Security Systems (ISS). Instead of the pages with the details, attendees found a photographed copy of a notice from Black Hat saying "Due to some last minute changes beyond Black Hat's control, and at the request of the presenter, the included materials aren't up to the standards Black Hat tries to meet. Black Hat will be the first to apologize. We hope the vendors involved will follow suit." According to Lynn's lawyer, his employer had approved of the talk leading up to the conference but changed their minds two days before the scheduled talk, forbidding him from presenting. Lynn's original presentation was to cover a vulnerability in Cisco routers. The presentation was one of four scheduled to follow Jeff Moss' keynote address on the first day of the conference, titled "Cisco IOS Security Architecture". After being told by his employer that he could not present on the topic, Lynn chose an alternate topic. Cisco and ISS had offered to give new joint presentation but this was turned down by Black Hat because the original speaking slot was given to Lynn, not Cisco. Lynn's presentation began by covering security issues in services that allow users to make Voice over IP telephone calls. Shortly after beginning the presentation Lynn changed back to his original topic and began disclosing some technical details of the vulnerability he found in Cisco routers stating that he would rather resign from his job at ISS than keep the details private. == Lawsuit == Shortly after Lynn concluded his talk he met Jennifer Granick, who would soon become his lawyer. During their initial meeting Lynn told Granick that he expected to be sued. Later in the evening Lynn had heard that Cisco and ISS had filed a lawsuit and requested a temporary restraining order against Black Hat but not himself. A public relations representative from Black Hat told Granick that the lawsuit was against both Black Hat and Lynn and that the companies had scheduled an Ex parte hearing in San Francisco the next morning to request the restraining order. That night, Andrew Valentine, an attorney for ISS and Cisco called Lynn who directed them to Granick. During the conversation Valentine explained the claims and accusations against Lynn, which included three things: 1) ISS claimed copyright over the presentation that Lynn gave, 2) Cisco claimed copyright over the decompiled machine code obtained from the router which was included in the presentation, and 3) Cisco claimed the presentation contained trade secrets. These complaints were outlined in a civil complaint at the U.S. Northern District of California and filed against both Lynn and Black Hat. According to Granick, she and Valentine were able agree to an injunction to settle the case without court proceedings. This deal was almost called off due to an inadvertent mistake by Black Hat in which they had restored Lynn's presentation on their web server. Black Hat, Granick, and the plaintiff's lawyers were able to resolve this problem and the deal stood. One condition of the settlement required Lynn to provide an image of all computer data he used in his research to be provided to a third party for forensic analysis before erasing his research and any Cisco data from his systems. The settlement also stipulated that Lynn was prohibited from talking about the vulnerability in the future. == FBI Investigation == Shortly after lawyers for Lynn and ISS / Cisco filed settlement papers, FBI agents from the Las Vegas office arrived at the conference to begin asking questions. According to Granick, they were there at the request of the Atlanta FBI office and Lynn was not of interest. Granick asserted the Fifth and Sixth amendment rights on behalf of her client, Lynn. Granick asserted his rights for the Atlanta office and asked if an arrest warrant had been issued for Lynn. Over the next 24 hours Granick was not able to ascertain the status of a warrant but ultimately determined no warrant was issued. When the FBI was asked about the case by a journalist, spokesman Paul Bresson declined to discuss the case saying "Our policy is to not make any comment on anything that is ongoing. That's not to confirm that something is, because I really don't know". Granick would only confirm to journalists that the "investigation has to do with the presentation". == Response == === Attendees === Attendees of Black Hat Briefings, as well as many that also attended DEF CON, were not happy with vendors threatening legal action over vulnerability disclosure. The term "Ciscogate" was coined quickly by an unknown person, but some attendees were quick to create shirts to commemorate the incident. === Cisco === Mojgan Khalili, a senior manager for corporate PR at Cisco, issued a statement to the press saying "It is important to note that the information Mr. Lynn presented was not a disclosure of a new vulnerability or a flaw with Cisco IOS software. Mr. Lynn's research explores possible ways to expand exploitations of existing security vulnerabilities impacting routers." === ISS === Kim Duffy, managing director of ISS Australia, was asked about ISS's response to the incident. Duffy responded that it was "business as usual" as the company handled the incident "strictly by the book". He gave a brief statement to ZDNet UK saying "ISS has published rules for disclosure and that is what we stick to. We didn't care to publish [the disclosure] because we were not ready. We had not completed the research to our satisfaction so it was not ready to be disclosed". ISS spokesperson Roger Fortier confirmed that Lynn was no longer employed with the company and that ISS was still working with Cisco on the matter. He gave a statement to the Washington Post saying "ISS and Cisco have been working on this in the background and didn't feel at this time that the material was ready for publication. The decision was made on Monday to pull the presentation because we wanted to make sure the research was fully baked."

    Read more →
  • Communication-avoiding algorithm

    Communication-avoiding algorithm

    Communication-avoiding algorithms minimize movement of data within a memory hierarchy for improving its running-time and energy consumption. These minimize the total of two costs (in terms of time and energy): arithmetic and communication. Communication, in this context refers to moving data, either between levels of memory or between multiple processors over a network. It is much more expensive than arithmetic. == Formal theory == === Two-level memory model === A common computational model in analyzing communication-avoiding algorithms is the two-level memory model: There is one processor and two levels of memory. Level 1 memory is infinitely large. Level 0 memory ("cache") has size M {\displaystyle M} . In the beginning, input resides in level 1. In the end, the output resides in level 1. Processor can only operate on data in cache. The goal is to minimize data transfers between the two levels of memory. === Matrix multiplication === Corollary 6.2: More general results for other numerical linear algebra operations can be found in. The following proof is from. == Motivation == Consider the following running-time model: Measure of computation = Time per FLOP = γ Measure of communication = No. of words of data moved = β ⇒ Total running time = γ·(no. of FLOPs) + β·(no. of words) From the fact that β >> γ as measured in time and energy, communication cost dominates computation cost. Technological trends indicate that the relative cost of communication is increasing on a variety of platforms, from cloud computing to supercomputers to mobile devices. The report also predicts that gap between DRAM access time and FLOPs will increase 100× over coming decade to balance power usage between processors and DRAM. Energy consumption increases by orders of magnitude as we go higher in the memory hierarchy. United States president Barack Obama cited communication-avoiding algorithms in the FY 2012 Department of Energy budget request to Congress: New Algorithm Improves Performance and Accuracy on Extreme-Scale Computing Systems. On modern computer architectures, communication between processors takes longer than the performance of a floating-point arithmetic operation by a given processor. ASCR researchers have developed a new method, derived from commonly used linear algebra methods, to minimize communications between processors and the memory hierarchy, by reformulating the communication patterns specified within the algorithm. This method has been implemented in the TRILINOS framework, a highly-regarded suite of software, which provides functionality for researchers around the world to solve large scale, complex multi-physics problems. == Objectives == Communication-avoiding algorithms are designed with the following objectives: Reorganize algorithms to reduce communication across all memory hierarchies. Attain the lower-bound on communication when possible. The following simple example demonstrates how these are achieved. === Matrix multiplication example === Let A, B and C be square matrices of order n × n. The following naive algorithm implements C = C + A B: for i = 1 to n for j = 1 to n for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) Arithmetic cost (time-complexity): n2(2n − 1) for sufficiently large n or O(n3). Rewriting this algorithm with communication cost labelled at each step for i = 1 to n {read row i of A into fast memory} - n2 reads for j = 1 to n {read C(i,j) into fast memory} - n2 reads {read column j of B into fast memory} - n3 reads for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) {write C(i,j) back to slow memory} - n2 writes Fast memory may be defined as the local processor memory (CPU cache) of size M and slow memory may be defined as the DRAM. Communication cost (reads/writes): n3 + 3n2 or O(n3) Since total running time = γ·O(n3) + β·O(n3) and β >> γ the communication cost is dominant. The blocked (tiled) matrix multiplication algorithm reduces this dominant term: ==== Blocked (tiled) matrix multiplication ==== Consider A, B and C to be n/b-by-n/b matrices of b-by-b sub-blocks where b is called the block size; assume three b-by-b blocks fit in fast memory. for i = 1 to n/b for j = 1 to n/b {read block C(i,j) into fast memory} - b2 × (n/b)2 = n2 reads for k = 1 to n/b {read block A(i,k) into fast memory} - b2 × (n/b)3 = n3/b reads {read block B(k,j) into fast memory} - b2 × (n/b)3 = n3/b reads C(i,j) = C(i,j) + A(i,k) B(k,j) - {do a matrix multiply on blocks} {write block C(i,j) back to slow memory} - b2 × (n/b)2 = n2 writes Communication cost: 2n3/b + 2n2 reads/writes << 2n3 arithmetic cost Making b as large possible: 3b2 ≤ M we achieve the following communication lower bound: 31/2n3/M1/2 + 2n2 or Ω (no. of FLOPs / M1/2) == Previous approaches for reducing communication == Most of the approaches investigated in the past to address this problem rely on scheduling or tuning techniques that aim at overlapping communication with computation. However, this approach can lead to an improvement of at most a factor of two. Ghosting is a different technique for reducing communication, in which a processor stores and computes redundantly data from neighboring processors for future computations. Cache-oblivious algorithms represent a different approach introduced in 1999 for fast Fourier transforms, and then extended to graph algorithms, dynamic programming, etc. They were also applied to several operations in linear algebra as dense LU and QR factorizations. The design of architecture specific algorithms is another approach that can be used for reducing the communication in parallel algorithms, and there are many examples in the literature of algorithms that are adapted to a given communication topology.

    Read more →
  • PureXML

    PureXML

    pureXML is the native XML storage feature in the IBM Db2 data server. pureXML provides query languages, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that Db2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format. == Technical information == Db2 includes two distinct storage mechanisms: one for efficiently managing traditional SQL data types, and another for managing XML data. The underlying storage mechanism is transparent to users and applications; they simply use SQL (including SQL with XML extensions or SQL/XML) or XQuery to work with the data. XML data is stored in columns of Db2 tables that have the XML data type. XML data is stored in a parsed format that reflects the hierarchical nature of the original XML data. As such, pureXML uses trees and nodes as its model for storing and processing XML data. If you instruct Db2 to validate XML data against an XML schema prior to storage, Db2 annotates all nodes in the XML hierarchy with information about the schema types; otherwise, it will annotate the nodes with default type information. Upon storage, Db2 preserves the internal structure of XML data, converting its tag names and other information into integer values. Doing so helps conserve disk space and also improves the performance of queries that use navigational expressions. However, users aren't aware of this internal representation. Finally, Db2 automatically splits XML nodes across multiple database pages, as needed. XML schemas specify which XML elements are valid, in what order these elements should appear in XML data, which XML data types are associated with each element, and so on. pureXML allows you to validate the cells in a column of XML data against no schema, one schema, or multiple schemas. pureXML also provides tools to support evolving XML schemas. IBM has enhanced its programming language interfaces to support access to its XML data. These enhancements span Java (JDBC), C (embedded SQL and call-level interface), COBOL (embedded SQL), PHP, and Microsoft's .NET Framework (through the DB2.NET provider). == History == pureXML was first included in the DB2 9 for Linux, Unix, and Microsoft Windows release, which was codenamed Viper, in June 2006. It was available on DB2 9 for z/OS in March 2007. In October 2007, IBM released DB2 9.5 with improved XML data transaction performance and improved storage savings. In June 2009, IBM released DB2 9.7 with XML supported for database-partitioned, range-partitioned, and multi-dimensionally clustered tables as well as compression of XML data and indices. == Competition == Db2 is a hybrid data server—it offers data management for traditional relational data, as well as providing native XML data management. Other vendors that offer data management for both relational data and native XML storage include Oracle with its 11g product and Microsoft with its SQL Server product. pureXML also competes with native XML databases like BaseX, eXist, MarkLogic or Sedna. == Books == IBM International Technical Support Organization (ITSO) has published the following books, which are available in print or as free e-books: DB2 9: pureXML Overview and Fast Start DB2 9 pureXML Guide The following books are also available for purchase: DB2 pureXML Cookbook: Master the Power of IBM Hybrid Data Server == Education and training == The following pureXML classroom and online courses are available from IBM Education: Query and Manage XML Data with DB2 9. IBM course CG130. Classroom. Duration: 4 days. Query XML Data with DB2 9. IBM course CG100. Classroom. Duration: 2 days (first 2 days of CG130). Managing XML Data in DB2 9. IBM course CG160. Classroom. Duration: 2 days (last 2 days of CG130). DB2 pureXML. IBM Course CT140. Self-paced study plus Live Virtual Classroom.

    Read more →
  • Grid-oriented storage

    Grid-oriented storage

    Grid-oriented Storage (GOS) was a term used for data storage by a university project during the era when the term grid computing was popular. == Description == GOS was a successor of the term network-attached storage (NAS). GOS systems contained hard disks, often RAIDs (redundant arrays of independent disks), like traditional file servers. GOS was designed to deal with long-distance, cross-domain and single-image file operations, which is typical in Grid environments. GOS behaves like a file server via the file-based GOS-FS protocol to any entity on the grid. Similar to GridFTP, GOS-FS integrates a parallel stream engine and Grid Security Infrastructure (GSI). Conforming to the universal VFS (Virtual Filesystem Switch), GOS-FS can be pervasively used as an underlying platform to best utilize the increased transfer bandwidth and accelerate the NFS/CIFS-based applications. GOS can also run over SCSI, Fibre Channel or iSCSI, which does not affect the acceleration performance, offering both file level protocols and block level protocols for storage area network (SAN) from the same system. In a grid infrastructure, resources may be geographically distant from each other, produced by differing manufacturers, and have differing access control policies. This makes access to grid resources dynamic and conditional upon local constraints. Centralized management techniques for these resources are limited in their scalability both in terms of execution efficiency and fault tolerance. Provision of services across such platforms requires a distributed resource management mechanism and the peer-to-peer clustered GOS appliances allow a single storage image to continue to expand, even if a single GOS appliance reaches its capacity limitations. The cluster shares a common, aggregate presentation of the data stored on all participating GOS appliances. Each GOS appliance manages its own internal storage space. The major benefit of this aggregation is that clustered GOS storage can be accessed by users as a single mount point. GOS products fit the thin-server categorization. Compared with traditional “fat server”-based storage architectures, thin-server GOS appliances deliver numerous advantages, such as the alleviation of potential network/grid bottle-necks, CPU and OS optimized for I/O only, ease of installation, remote management and minimal maintenance, low cost and Plug and Play, etc. Examples of similar innovations include NAS, printers, fax machines, routers and switches. An Apache server has been installed in the GOS operating system, ensuring an HTTPS-based communication between the GOS server and an administrator via a Web browser. Remote management and monitoring makes it easy to set up, manage, and monitor GOS systems. == History == Frank Zhigang Wang and Na Helian proposed a funding proposal to the UK government titled “Grid-Oriented Storage (GOS): Next Generation Data Storage System Architecture for the Grid Computing Era” in 2003. The proposal was approved and granted one million pounds in 2004. The first prototype was constructed in 2005 at Centre for Grid Computing, Cambridge-Cranfield High Performance Computing Facility. The first conference presentation was at IEEE Symposium on Cluster Computing and Grid (CCGrid), 9–12 May 2005, Cardiff, UK. As one of the five best work-in-progress, it was included in the IEEE Distributed Systems Online. In 2006, the GOS architecture and its implementations was published in IEEE Transactions on Computers, titled “Grid-oriented Storage: A Single-Image, Cross-Domain, High-Bandwidth Architecture”. Starting in January 2007, demonstrations were presented at Princeton University, Cambridge University Computer Lab and others. By 2013, the Cranfield Centre still used future tense for the project. Peer-to-peer file sharings use similar techniques.

    Read more →
  • Mooky (app)

    Mooky (app)

    Mooky was a location-based social and dating application, designed to help its users to find the perfect match by providing a large scale of filters. Mooky was free of charge. The app made use of mobile devices' geolocation, a feature of smart phones and other devices which allows users to locate other users who are nearby. == History == Mooky was published on Google Play on April 17, 2016, by Mooky BV. The latest version of this application was version 1.0.6. == Overview == === How it works === Mooky used Facebook to build a user profile with photos and basic information, like the user's surname and age. From there on the user had to fill in their Mooky profile, which contains information about the user's height, posture, hair color, eye color, ethnicity and religion. After this the user could select its preferences to find matches nearby. === User verification === Mooky asked their users to take a selfie holding a piece of paper saying 'Mooky'. Mooky would then manually accept or decline the user verification.

    Read more →
  • Algorithms and Combinatorics

    Algorithms and Combinatorics

    Algorithms and Combinatorics (ISSN 0937-5511) is a book series in mathematics, and particularly in combinatorics and the design and analysis of algorithms. It is published by Springer Science+Business Media, and was founded in 1987. == Books == The books published in this series include: The Simplex Method: A Probabilistic Analysis (Karl Heinz Borgwardt, 1987, vol. 1) Geometric Algorithms and Combinatorial Optimization (Martin Grötschel, László Lovász, and Alexander Schrijver, 1988, vol. 2; 2nd ed., 1993) Systems Analysis by Graphs and Matroids (Kazuo Murota, 1987, vol. 3) Greedoids (Bernhard Korte, László Lovász, and Rainer Schrader, 1991, vol. 4) Mathematics of Ramsey Theory (Jaroslav Nešetřil and Vojtěch Rödl, eds., 1990, vol. 5) Matroid Theory and its Applications in Electric Network Theory and in Statics (Andras Recszki, 1989, vol. 6) Irregularities of Partitions: Papers from the meeting held in Fertőd, July 7–11, 1986 (Gábor Halász and Vera T. Sós, eds., 1989, vol. 8) Paths, Flows, and VLSI-Layout: Papers from the meeting held at the University of Bonn, Bonn, June 20–July 1, 1988 (Bernhard Korte, László Lovász, Hans Jürgen Prömel, and Alexander Schrijver, eds., 1990, vol. 9) New Trends in Discrete and Computational Geometry (János Pach, ed., 1993, vol. 10) Discrete Images, Objects, and Functions in Z n {\displaystyle \mathbb {Z} ^{n}} (Klaus Voss, 1993, vol. 11) Linear Optimization and Extensions (Manfred Padberg, 1999, vol. 12) The Mathematics of Paul Erdős I (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 13) The Mathematics of Paul Erdős II (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 14) Geometry of Cuts and Metrics (Michel Deza and Monique Laurent, 1997, vol. 15) Probabilistic Methods for Algorithmic Discrete Mathematics (M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, 1998, vol. 16) Modern Cryptography, Probabilistic Proofs and Pseudorandomness (Oded Goldreich, 1999, vol. 17) Geometric Discrepancy: An Illustrated Guide (Jiří Matoušek, 1999, vol. 18) Applied Finite Group Actions (Adalbert Kerber, 1999, vol. 19) Matrices and Matroids for Systems Analysis (Kazuo Murota, 2000, vol. 20; corrected ed., 2010) Combinatorial Optimization (Bernhard Korte and Jens Vygen, 2000, vol. 21; 5th ed., 2012) The Strange Logic of Random Graphs (Joel Spencer, 2001, vol. 22) Graph Colouring and the Probabilistic Method (Michael Molloy and Bruce Reed, 2002, Vol. 23) Combinatorial Optimization: Polyhedra and Efficiency (Alexander Schrijver, 2003, vol. 24. In three volumes: A. Paths, flows, matchings; B. Matroids, trees, stable sets; C. Disjoint paths, hypergraphs) Discrete and Computational Geometry: The Goodman-Pollack Festschrift (B. Aronov, S. Basu, J. Pach, and M. Sharir, eds., 2003, vol. 25) Topics in Discrete Mathematics: Dedicated to Jarik Nešetril on the Occasion of his 60th birthday (M. Klazar, J. Kratochvíl, M. Loebl, J. Matoušek, R. Thomas, and P. Valtr, eds., 2006, vol. 26) Boolean Function Complexity: Advances and Frontiers (Stasys Jukna, 2012, Vol. 27) Sparsity: Graphs, Structures, and Algorithms (Jaroslav Nešetřil and Patrice Ossona de Mendez, 2012, vol. 28) Optimal Interconnection Trees in the Plane (Marcus Brazil and Martin Zachariasen, 2015, vol. 29) Combinatorics and Complexity of Partition Functions (Alexander Barvinok, 2016, vol. 30)

    Read more →
  • Documentalist

    Documentalist

    A documentalist is a professional, trained in documentation science and specializing in assisting researchers in their search for scientific and technical documentation. With the development of bibliographical databases such as MEDLINE, documentalists were professionals who searched such databases on the behalf of users. When the field of documentation changed its name to information science, the terms information specialist or information professional often replaced the term documentalist.

    Read more →
  • Artificial intelligence in architecture

    Artificial intelligence in architecture

    Artificial intelligence in architecture is the use of artificial intelligence in automation, design, and planning in the architectural process or in assisting human skills in the field of architecture. AI has been used by some architects for design, and has been proposed as a way to automate planning and routine tasks in the field. == Implications == === Benefits === Artificial intelligence, according to ArchDaily, is said to potentially significantly augment the architectural profession through its ability to improve the design and planning process as well as increasing productivity. Through its ability to handle a large amount of data, AI is said to potentially allow architects a range of design choices with criteria considerations such as budget, requirements adjusted to space, and sustainability goals calculated as part of the design process. ArchDaily said this may allow the design of optimized alternatives that can then undergo human review. AI tools are also said to potentially allow architects to assimilate urban and environmental data to inform their designs, streamlining initial stages of project planning and increasing efficiency and productivity. The advances in generative design through the input of specific prompts allow architects to produce visual designs, including photorealistic images, and thus render and explore various material choices and spatial configurations. ArchDaily noted this could speed the creative process as well as allow for experimentation and sophistication in the design. Additionally, AI's capacity for pattern recognition and coding could aid architects in organizing design resources and developing custom applications, thus enhancing the efficiency and collaboration between both architects and AI. AI is thought to also be able to contribute to the sustainability of buildings by analyzing various factors and following recommended energy-efficient modifications, thus pushing the industry towards greener practices. The use of AI in building maintenance, project management, and the creation of immersive virtual reality experiences are also thought of as potentially augmenting the architectural design process and workflow. Examples include the use of text-to-image systems such as Midjourney to create detailed architectural images, and the use of AI optimization systems from companies such as Finch3D and Autodesk to automatically generate floor plans from simple programmatic inputs. In contrast to digital-only creative practices, the high materiality of architectural outputs requires transitions from ephemeral digital files to permanent physical structures that are subject to strict safety regulations, material constraints, sensory intuition, and site-specific cultural contexts, making full automation difficult. Early adopters such as architect Stephen Coorlas have actively challenged the boundaries of architectural practice through AI. His early experimental initiative, Speculations on AI and Architecture, confronts the discipline's traditional workflows by training text-to-image AI tools such as Midjourney, Luma AI, and PromeAI to generate more nuanced architectural illustrations including construction documents, architectural details, and assembly sequences for various structures. Coorlas inputs precise terminology and architectural language to provoke the AI into producing axonometric drawings that resemble conventional documentation, then experiments with animating the outputs using AI generated depth maps and other AI image-to-3D wireframe tools. Stephen's inventive process invites architects and designers to reconsider authorship, automation, and the future of visual communication in the built environment. Rather than treating AI as a peripheral tool, Stephen has advocated for AI to be a speculative collaborator capable of engaging with discipline-specific challenges. His work contributes to the growing discourse on generative design, parametric optimization, and the philosophical implications of machine-assisted creativity raising urgent questions about how such technologies will reshape architectural agency, precision, and pedagogy. Another prominent advocate is Architect Andrew Kudless, who in an interview to Dezeen recounted that he uses AI to innovate in architectural design by incorporating materials and scenes not usually present in initial plans, which he believes can significantly alter client presentations. He told Dezeen he believes one should show clients renderings from the onset, with AI assisting in this work, arguing that changes in design should be a positive aspect of the client-designer relationship by actively involving clients in the process. Additionally, Kudless highlighted the AI's potential to facilitate labor in architectural firms, particularly in automating rendering tasks, thus reducing the workload on junior staff while maintaining control over the creative output. === Emergent aesthetics === In an interview for the AItopia series to Dezeen, designer Tim Fu discussed the transformative potential of AI in architecture, and proposed a future where AI could herald a "neoclassical futurist" style, blending the grandeur of classical aesthetics with futuristic design. Through his collaborative project, The AI Stone Carver, Fu showcased how AI can innovate traditional practices by generating design concepts that are then realized through human craftsmanship, such as stone carving by mason Till Apfel. This approach, he believed, celebrated the fusion of diverse architectural styles and also emphasized the unique capabilities of AI in enhancing creative design processes. Fu told Dezeen he envisions the integration of AI in design as a means to revive the ornamentation and detailed aesthetics characteristic of classical architecture, moving away from minimalism, which he said dominates contemporary architecture. He argued that AI's involvement in the ideation phase of design allows for a reversal in the roles of machine and human, enabling architects and designers to focus on creating more intricate and ornamental structures. Fu's optimistic outlook extended to the broader impact of AI on the architectural field, seeing it as an indispensable tool that will shift rather than replace human roles, enriching the field with innovative designs that pay homage to the beauty and qualities of classical architecture not present in contemporary architecture while embracing new technologies. This perspective resonates with designers like Manas Bhatia, whose explorations similarly embrace generative AI as a co-creator and a medium to express ideas, blend architectural traditions, and speculate spatial futures. === Concerns === As AI continues to expand its presence across various industries, its impact on the architectural profession has become a topic of growing discussion. These discussions focus on how AI processes may influence traditional architectural practices, potentially altering job roles, and shaping the nature of creativity. While AI-driven processes may increase efficiency in some aspects of the profession, they also raise questions about the potential loss of unique design perspectives. These thoughts have been countered by many prominent creative figures in the realm of AI architecture, such as Stephen Coorlas, Tim Fu, Hassan Ragab, and Manas Bhatia who have showcased the amplification of creativity in design and potential benefits in terms of restoring creative power to the designer. A key concern is that AI-powered tools could diminish the need for human involvement in specific tasks traditionally performed by architects. This has led to speculation that the profession may increasingly shift toward roles focused on oversight, coordination, and strategic decision-making rather than hands-on design work. In some design scenarios, algorithmically generated solutions can be adjusted to prioritize efficiency and cost-effectiveness, which some argue may overshadow the creative and contextual nuances that define individual architectural styles. As with any discipline though, it has been determined that AI can be configured to provide beneficial results based on inputs and end goals the architect or designer assigns it. There are also concerns about the potential for AI to exacerbate inequalities within the architectural profession. For instance, larger firms with greater resources to invest in advanced AI technologies may gain a competitive edge over smaller firms and independent architects. This dynamic could contribute to industry consolidation, potentially limiting the diversity of architectural practice and stifling innovation. Ethical considerations in regard to cultural sensitivity have also been raised due to the datasets used to train AI. Without proper vetting of data or implementing failsafe overrides, AI generated outcomes can trend toward overly documented and prioritized content.

    Read more →
  • Nona-binning

    Nona-binning

    Nona-binning is a pixel binning technique used in high-resolution image sensors, primarily in smartphone cameras. The method is based on merging groups of nine neighbouring pixels arranged in a 3×3 pattern. This configuration allows a sensor with very small individual pixels to increase its effective light sensitivity when operating in low-light conditions, while still maintaining high nominal resolution in bright environments. == Overview == Nona-binning is most commonly implemented in sensors with a resolution of 108 megapixels and higher. As pixel counts grew, the physical dimensions of individual pixels continued to shrink, reducing the amount of light captured by each. The 3×3 binning structure enables a sensor to operate in two modes. In well-lit scenes, each pixel is processed separately, providing the full resolution of the sensor. In darker settings, nine pixels with identical colour filters are combined into a single output unit, increasing signal strength and reducing noise. == Technical principles == Unlike the traditional Bayer colour filter array, which alternates colours on a per-pixel basis, nona-binning uses a grouped layout. The sensor forms blocks of nine pixels with matching colour filters — typically within a Quad Bayer–derived arrangement extended to 3×3 regions. When operating in the binning mode, the sensor aggregates the charge generated by all nine pixels in each block. This increases effective sensitivity but lowers the final image resolution. When lighting conditions allow, the sensor returns to processing pixel data individually. == Applications == Nona-binning is primarily used in: Smartphone photography, particularly in devices equipped with sensors exceeding 100 megapixels. Low-light imaging, where increased sensitivity improves exposure stability and reduces noise. Computational photography systems, such as multi-frame processing and HDR capture. == Related technologies == Nona-binning belongs to the broader group of pixel-binning approaches used in modern sensors. Other implementations include Tetracell, which merges four pixels in a 2×2 block, and hexa-binning, which combines six pixels, though it is less common. All of these methods aim to balance the high nominal resolution of mobile sensors with the need for improved low-light performance.

    Read more →
  • Token-based replay

    Token-based replay

    Token-based replay technique is a conformance checking algorithm that checks how well a process conforms with its model by replaying each trace on the model (in Petri net notation ). Using the four counters produced tokens, consumed tokens, missing tokens, and remaining tokens, it records the situations where a transition is forced to fire and the remaining tokens after the replay ends. Based on the count at each counter, we can compute the fitness value between the trace and the model. == The algorithm == Source: The token-replay technique uses four counters to keep track of a trace during the replaying: p: Produced tokens c: Consumed tokens m: Missing tokens (consumed while not there) r: Remaining tokens (produced but not consumed) Invariants: At any time: p + m ≥ c ≥ m {\displaystyle p+m\geq c\geq m} At the end: r = p + m − c {\displaystyle r=p+m-c} At the beginning, a token is produced for the source place (p = 1) and at the end, a token is consumed from the sink place (c' = c + 1). When the replay ends, the fitness value can be computed as follows: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})} == Example == Suppose there is a process model in Petri net notation as follows: === Example 1: Replay the trace (a, b, c, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity c {\displaystyle \mathbf {c} } consumes 1 token and produces 1 token ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} and c = 2 + 1 = 3 {\displaystyle c=2+1=3} ). Step 5: The activity d {\displaystyle \mathbf {d} } consumes 2 tokens and produces 1 token ( p = 5 + 1 = 6 {\displaystyle p=5+1=6} , c = 3 + 2 = 5 {\displaystyle c=3+2=5} ). Step 6: The token at the end place is consumed ( c = 5 + 1 = 6 {\displaystyle c=5+1=6} ). The trace is complete. The fitness of the trace ( a , b , c , d {\displaystyle \mathbf {a,b,c,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 0 6 ) + 1 2 ( 1 − 0 6 ) = 1 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {0}{6}})+{\frac {1}{2}}(1-{\frac {0}{6}})=1} === Example 2: Replay the trace (a, b, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity d {\displaystyle \mathbf {d} } needs to be fired but there are not enough tokens. One artificial token was produced and the missing token counter is increased by one ( m = 1 {\displaystyle m=1} ). The artificial token and the token at place [ b , d ] {\displaystyle [\mathbf {b,d} ]} are consumed ( c = 2 + 2 = 4 {\displaystyle c=2+2=4} ) and one token is produced at place end ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} ). Step 5: The token in the end place is consumed ( c = 4 + 1 = 5 {\displaystyle c=4+1=5} ). The trace is complete. There is one remaining token at place [ a , c ] {\displaystyle [\mathbf {a,c} ]} ( r = 1 {\displaystyle r=1} ). The fitness of the trace ( a , b , d {\displaystyle \mathbf {a,b,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 1 5 ) + 1 2 ( 1 − 1 5 ) = 0.8 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {1}{5}})+{\frac {1}{2}}(1-{\frac {1}{5}})=0.8}

    Read more →
  • Cancer Likelihood in Plasma

    Cancer Likelihood in Plasma

    Cancer Likelihood in Plasma (CLiP) refers to a set of ensemble learning methods for integrating various genomic features useful for the noninvasive detection of early cancers from blood plasma. An application of this technique for early detection of lung cancer (Lung-CLiP) was originally described by Chabon et al. (2020) from the labs of Ash Alizadeh and Max Diehn at Stanford. This method relies on several improvements to cancer personalized profiling by deep sequencing (CAPP-Seq) for analysis of circulating tumor DNA (ctDNA). The CLiP technique integrates multiple distinctive genomic features of a cancer of interest findings within a machine-learning framework for cancer detection. For example, studies have shown that the majority of somatic mutations found in cell-free DNA (cfDNA) are not tumor derived, but instead reflect clonal hematopoeisis (also known as CHIP). Even though CHIP tends to target specific genes, it also involves many generally non-recurrent mutations that can be shed from leukocytes and detected in cfDNA, regardless of whether profiling patients with cancer and healthy adults. However, genuine tumor derived ctDNA mutations can be distinguished from CHIP-derived mutations. This is because unlike tumor-derived mutations, CHIP-derived mutations that are shed from leukocytes into plasma tend to occur on longer cfDNA fragments, and to lack specific mutational signatures such as those associated with tobacco smoking in lung cancer that are also found in tumor derived ctDNA molecules. CLiP integrates these features within hierarchical ensemble machine learning models that consider somatic mutations and copy number alternations, among other features. While the CLiP method is unique in relying exclusively on mutations and copy number alterations, it is related to a variety of other liquid biopsy methods being commercially developed for early cancer detection using ctDNA and proteins (e.g., CancerSEEK / DETECT-A ), cfDNA fragmentation patterns (e.g., DELFI), and DNA methylation (e.g., cfMeDIP-Seq, Grail). While the CLiP method has not yet been broadly applied for population-based cancer screening, it has been shown to distinguish discriminate early-stage lung cancers from risk-matched controls across multiple cohorts of patients enrolled across the US.

    Read more →