Small data

Small data

Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that makes it accessible, informative and actionable. The term "big data" is about machines and "small data" is about people. This is to say that eyewitness observations or five pieces of related data could be small data. Small data is what we used to think of as data. The only way to comprehend Big data is to reduce the data into small, visually-appealing objects representing various aspects of large data sets (such as histogram, charts, and scatter plots). Big Data is all about finding correlations, but Small Data is all about finding the causation, the reason why. A formal definition of small data has been proposed by Allen Bonde, former vice-president of Innovation at Actuate - now part of OpenText: "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks." Another definition of small data is: The small set of specific attributes produced by the Internet of Things. These are typically a small set of sensor data such as temperature, wind speed, vibration and status. It was estimated (2016) that “If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% percent are really based on Small Data.” as Martin Lindstrom puts it. Small data includes everything from Snapchat to simple objects such as the post-it note. Lindstrom believes we become so focused on Big-Data that we tend to forget about more basic concepts and creativity. Lindstrom defines Small Data "as seemingly insignificant observations you identify in consumers’ homes, is everything from how you place your shoes on how you hang your paintings". He thus considers that one should perfectly master the basic (Small Data) in order to mine and find correlations. == Academic Recognition and Methodology == The growing significance of "small data" as a distinct field of inquiry was highlighted by the 2024 Thematic Einstein Semester (TES) on Small Data Analysis, hosted by the Berlin Mathematics Research Center MATH+. A central focus of this semester was the transition from theoretical analysis to practical decision-making. Because small data sets are primarily used to drive specific actions, the presentation of results becomes an essential methodological step. The semester’s findings emphasized that while small data may lack volume, it often contains a high density of "many possible interpretations." Consequently, the final conference of the TES was structured around the pillars of interpretation, explanation, and knowledge gain. Participants sought to develop new mathematical and methodical representations that could accurately depict this wealth of interpretative possibilities. This work underscores that analyzing small data is not purely a computational task; it requires a robust interface between mathematics and diverse disciplines to ensure that insights are both contextually grounded and scientifically rigorous. == Uses in business == === Marketing === Bonde has written about the topic for Forbes, Direct Marketing News, CMO.com and other publications. According to Martin Lindstrom, in his book, Small Data: "{In customer research, small data is} Seemingly insignificant behavioural observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for breakthrough ideas or completely new ways to turnaround brands." His approach is based on the combination of the observation of small samples with intuition. Marketers can obtain market insights from gathering Small Data by engaging with and observing people in their own environments. In comparison to Big Data, Small Data has the power to trigger emotions and to provide insights into the reasons behind the behaviours of customers. It may uncover detailed information on a person's extroversion or introversion, self-confidence, whether one is having problems in his/her relationship, etc. According to Lindstrom, relationships among people and customer segments are organized around four criteria: Climate: It reveals for example how a person's environment affects their diet. Rulership: The power or government in charge Religion: The prevalence of religion in a country, depending on its influence, indicates whether a person's decision making process is impacted by their belief system. Tradition: Cultural norms influence people's behaviors and interactions. Many companies underestimate the power of Small Data, using samples of millions of consumers instead of recognizing the value of closely observing small samples in their market research. In his book, Lindstrom defines "7Cs", which companies should consider in the attempt to derive meaningful customer insights and market trends through small data from their customers: Collecting: Understanding the manner in which observations are translated inside a home. Clues: Uncovering other distinctive emotional reflections that can be observed. Connecting: Identifying the consequences of emotional behaviour. Causation: Understanding what emotions are being evoked. Correlation: Identifying the initial date of appearance of the behaviour or emotion. Compensation: Identifying the unmet or unfulfilled desire. Concept: Defining the “big idea” compensation for the identified consumer need. Some of Lindstrom's clients such as Lowes Foods looked at data in a different way and actually chose to live with the customer. “As you enter their store, they have now created an amazing community where every staff member acts in a character mood, based on Small Data”. The supermarket made everything it can to make the customer feel at home. All the behaviours of employees are inspired by customer feedbacks gathered from interviews directly done at customer’s home. === Healthcare === Researchers at Cornell University started developing applications to monitor health problems in patients, based on small data. This is an initiative of Cornell's Small Data Lab, in close cooperation with Weill Cornell Medicine College, led by Deborah Estrin. The Small Data Lab developed a series of apps, focusing not only on gathering data from patients' pain but also tracking habits in areas such as grocery shopping. In the case of patients with rheumatoid arthritis for example, which has flares and remissions that do not follow a particular cycle, the app gathers information passively, thus allowing to forecast when a flare might be coming up based on small changes in behaviour. Other apps developed also include monitoring online grocery shopping, to use this information from every user to adapt their groceries to the recommendations of nutritionists, or monitoring email language to identify patterns that might indicate "fluctuations in cognitive performance, fatigue, side effects of medication or poor sleep, and other conditions and treatments that are typically self-reported and self-medicated". === Postal Service === The United States Postal Service (USPS) used optical character recognition (OCR) to automatically read and process 98% of all hand-addressed mail and 99.5% of machine-printed mail. By combining this technology with its small data sample of US zip codes, the USPS can now process more than 36,000 pieces of mail per hour. === Aerospace === In 2015, Boeing established the analytics lab for aerospace data in cooperation with the Carnegie Mellon University to leverage the university's leadership in machine learning, language technologies and data analytics. One of the initiatives projects aims to by standardize maintenance logs using AI to dramatically reduce costs. Currently, there is no standardized procedure to document maintenance logs leading to small but highly unstructured data sets. As a result, it becomes highly difficult for maintenance workers to translate these variations in maintenance logs within a short period of time. However, with AI and a narrow data set of common aircraft maintenance terminology, it becomes possible to dynamically translate these logs in real time. By using AI to enhance the speed and accuracy of the airline maintenance workflow, airlines stand to save billions according to the Harvard Business Review.

Ontology learning

Ontology learning (ontology extraction, ontology augmentation generation, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process. Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical or symbolic techniques are used to extract relation signatures, often based on pattern-based or definition-based hypernym extraction techniques. == Procedure == Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text. The process is usually split into the following eight tasks, which are not all necessarily applied in every ontology learning system. === Domain terminology extraction === During the domain terminology extraction step, domain-specific terms are extracted, which are used in the following step (concept discovery) to derive concepts. Relevant terms can be determined, e.g., by calculation of the TF/IDF values or by application of the C-value / NC-value method. The resulting list of terms has to be filtered by a domain expert. In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods therefore are clustering and the application of statistical similarity measures. === Concept discovery === In the concept discovery step, terms are grouped to meaning bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are these domain-specific terms and their synonyms, which were identified in the domain terminology extraction step. === Concept hierarchy derivation === In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for the derivation of a concept hierarchy exists in the usage of several patterns that should indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such pattern can be analyzed efficiently, but they often occur too infrequently to extract enough sub- or supersumption relationships. Instead, bootstrapping methods are developed, which learn these patterns automatically and therefore ensure broader coverage. === Learning of non-taxonomic relations === In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption. Such relationships are, e.g., works-for or located-in. There are two common approaches to solve this subtask. The first is based upon the extraction of anonymous associations, which are named appropriately in a second step. The second approach extracts verbs, which indicate a relationship between entities, represented by the surrounding words. The result of both approaches need to be evaluated by an ontologist to ensure accuracy. === Rule discovery === During rule discovery, axioms (formal description of concepts) are generated for the extracted concepts. This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and the application of transformation rules on the resulting dependency tree. The result of this process is a list of axioms, which, afterwards, is comprehended to a concept description. This output is then evaluated by an ontologist. === Ontology population === At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples. === Concept hierarchy extension === In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures. === Frame and Event detection === During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from applying SVM with kernel methods to semantic role labeling (SRL) to deep semantic parsing techniques. == Tools == Dog4Dag (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBOEdit 2.1. It allows for term generation, sibling generation, definition generation, and relationship induction. Integrated into Protégé 4.1 and OBO-Edit 2.1, DOG4DAG allows ontology extension for all common ontology formats (e.g., OWL and OBO). Limited largely to EBI and Bio Portal lookup service extensions.

Anti-social Media Bill (Nigeria)

Anti-social Media Bill was introduced by the Senate of the Federal Republic of Nigeria on 5 November 2019 to criminalise the use of the social media in peddling false or malicious information. The original title of the bill is Protection from Internet Falsehood and Manipulations Bill 2019. It was sponsored by Senator Mohammed Sani Musa from the largely conservative northern Nigeria. After the bill passed second reading on the floor of the Nigeria Senate and its details were made public, information emerged on the social media accusing the sponsor of the bill of plagiarising a similar law in Singapore which is at the bottom of global ranking in the freedom of speech and of the press. But the senator denied that he plagiarised Singaporean law. == Opposition to the bill == Angry reactions trailed the introduction of the bill, and a number of civil society organisations, human rights activists, and Nigerian citizens unanimously opposed the bill. International rights group, Amnesty International and Human Rights Watch condemned the proposed legislation saying it is aimed at gagging freedom of speech which is a universal right in a country of over two hundred million people. Opposition political parties are very critical of the bill and accused the government of attempting to strip bare, Nigerian citizens of their rights to free speech and destroying same social media on whose power and influence the ruling All Progressives Congress, APC came to power in 2015. Nigeria Information Minister, Lai Mohammed has been at the center of public criticism because he is suspected to be the brain behind the proposed act. Lai was a former spokesman of then opposition All Progressives Congress. A "Stop the Social Media Bill! You can no longer take our rights from us" online petition campaign to force the Nigeria parliament to drop the bill received over 90,000 signatures within 24 hours. In November 2019, after the bill passed second reading in the senate, Akon Eyakenyi, a senator from Akwa Ibom State publicly said he would resist the bill. === Support for the bill === Those who support the proposed act especially Senators have often argued that the law would help curtail hate speech. President Muhammad Buhari who is seen as a beneficiary of the influence and power of the social media and free speech has been mute about it. But the president's senior aides and family members have publicly spoken in support of the bill. In November 2019, the wife of the president, Aisha Buhari, told a gathering at the Nigeria's National Mosque in the capital, Abuja that if China with over one billion people could regulate the social media, Nigeria should do same. But Nigerians reacted saying Nigeria is not a one-party communist state like China. Days later, a daughter to the president, Zahra Indimi told a gathering of young people in Abuja that social media had become a potent weapon for bullying those they thought were doing better than them in terms of social class and called for a critical regulation. == Key provisions of the bill == === Title === Protection from Internet Falsehoods, Manipulations and Other Related Matters Bill 2019. === Explanatory memorandum === This Act is to prevent Falsehoods and Manipulations in Internet transmission and correspondences in Nigeria. To suppress falsehoods and manipulations and counter the effects of such communications and transmissions and to sanction offenders with a view to encouraging and enhancing transparency by Social Media Platforms using the internet correspondences. === Objectives === One objective of the bill is to prevent the transmission of false statements or declaration of facts in Nigeria. Another objective of the bill is to end the financing of online mediums that transmit false statements. Measures will be taken to detect and control inauthentic behaviour and misuse of online accounts (parody accounts). When paid content is posted towards a political end, there will be measures to ensure the poster discloses such information. There will be sanction for offenders. === Transmission of false statement === According to the bill, a person must not: Transmit a statement that is false or, Transmit a statement that might: i. Affect the security or any part of Nigeria. ii. Affect public health, public safety or public finance. iii. Affect Nigeria's relationship with other countries. iv. influence the outcome of an election to any office in a general election. v. Cause enmity or hatred towards a person or group of persons. Anyone guilty of the above is liable to a fine of N300,000 or three years' imprisonment or both (for individual); and a fine not exceeding ten million naira (for corporate organisations). Same punishment applies for fake online accounts that transmit statements listed above. === Parody accounts === The bill says a person shall not open an account to transmit false statement. Anyone found guilty will be fined N200,000 or three years' imprisonment or both (for an individual) or five million naira (for corporate organisations). If such accounts transmit a statement that will affect security or influence the outcome of an election, such a person will be fined N300,000 or three years' imprisonment or both. If a person receives payment or reward to help another to transmit false statements knowingly, he/she is liable to a fine of N150,000 or three years' imprisonment or both. If a person receives payment or reward to help another to transmit a statement affects security or influence the outcome of an election, the fine is N300,000 or three years' imprisonment or both (for individual) and ten million naira for organisations. === Declaration === According to the bill, a law enforcement department can issue a "declaration" to offenders. And this declaration will be issued even if the "false statement" has been corrected or pulled down. The offender will be required to publish a "correction notice" in a specified newspaper, online location or other printed publication of Nigeria. Failure to comply, a person is liable to N200,000 or 12 months' imprisonment or both (for individual) and five million naira for organisations. === Access blocking order === The bill says the law enforcement department will also issue an access blocking order to offenders. The law enforcement department may direct the NCC to order the internet access service provider to disable access by users in Nigeria to the online location and the NCC must give the internet access service provider an access blocking order. An internet access service provider that does not comply with any access blocking order is liable on conviction to a fine not exceeding ten million naira for each day during any part of which that order is not fully complied with, up to a total of five million naira.

Deconfliction line

A deconfliction line is an official line of communications established between militaries who are or could be hostile, to avoid dangerous misunderstandings and miscalculations based on ignorance. The ultimate aim is to avoid accidents and conflict escalation. In the 2010s and 2020s, the US and Russia set up deconfliction lines during the Syrian civil war and Russo-Ukrainian War. They were regularly tested by military staff, and used by air traffic controllers and senior military officers. They were used to avoid midair collisions between aircraft in the same or adjacent airspace, and sometimes to give warning of airstrikes. In April 2017, Russia severed the Syrian line in retaliation for a called strike.

Communications system

A communications system is a collection of individual telecommunications networks systems, relay stations, tributary stations, and terminal equipment usually capable of interconnection and interoperation to form an integrated whole. Communication systems allow the transfer of information from one place to another or from one device to another through a specified channel or medium. The components of a communications system serve a common purpose, are technically compatible, use common procedures, respond to controls, and operate in union. In the structure of a communication system, the transmitter first converts the data received from the source into a light signal and transmits it through the medium to the destination of the receiver. The receiver connected at the receiving end converts it to digital data, maintaining certain protocols e.g. FTP, ISP assigned protocols etc. Telecommunications is a method of communication (e.g., for sports broadcasting, mass media, journalism, etc.). Communication is the act of conveying intended meanings from one entity or group to another through the use of mutually understood signs and semiotic rules. == Types == === By media === An optical communication system is any form of communications system that uses light as the transmission medium. Equipment consists of a transmitter, which encodes a message into an optical signal, a communication channel, which carries the signal to its destination, and a receiver, which reproduces the message from the received optical signal. Fiber-optic communication systems transmit information from one place to another by sending light through an optical fiber. The light forms a carrier signal that is modulated to carry information. A radio communication system is composed of several communications subsystems that give exterior communications capabilities. A radio communication system comprises a transmitting conductor in which electrical oscillations or currents are produced and which is arranged to cause such currents or oscillations to be propagated through the free space medium from one point to another remote therefrom and a receiving conductor at such distant point adapted to be excited by the oscillations or currents propagated from the transmitter. Power-line communication systems operate by impressing a modulated carrier signal on power wires. Different types of power-line communications use different frequency bands, depending on the signal transmission characteristics of the power wiring used. Since the power wiring system was originally intended for transmission of AC power, the power wire circuits have only a limited ability to carry higher frequencies. The propagation problem is a limiting factor for each type of power line communications. === By technology === A duplex communication system is a system composed of two connected parties or devices which can communicate with one another in both directions. The term duplex is used when describing communication between two parties or devices. Duplex systems are employed in nearly all communications networks, either to allow for a communication "two-way street" between two connected parties or to provide a "reverse path" for the monitoring and remote adjustment of equipment in the field. An antenna is basically a small length of a conductor that is used to radiate or receive electromagnetic waves. It acts as a conversion device. At the transmitting end it converts high frequency current into electromagnetic waves. At the receiving end it transforms electromagnetic waves into electrical signals that is fed into the input of the receiver. several types of antenna are used in communication. Examples of communications subsystems include the Defense Communications System (DCS). === Examples: by technology === Telephone Mobile phone Tablet computer Television Telegraph Edison Telegraph TV cable Computer === By application area === The term transmission system is used in the telecommunications industry to emphasize the intermediate media, protocols, and equipment in the circuit, rather than particular end-user applications. A tactical communications system is a communications system that (a) is used within, or in direct support of tactical forces (b) is designed to meet the requirements of changing tactical situations and varying environmental conditions, (c) provides securable communications, such as voice, data, and video, among mobile users to facilitate command and control within, and in support of, tactical forces, and (d) usually requires extremely short installation times, usually on the order of hours, in order to meet the requirements of frequent relocation. An Emergency communication system is any system (typically computer based) that is organized for the primary purpose of supporting the two way communication of emergency messages between both individuals and groups of individuals. These systems are commonly designed to integrate the cross-communication of messages between are variety of communication technologies. An Automatic call distributor (ACD) is a communication system that automatically queues, assigns and connects callers to handlers. This is used often in customer service (such as for product or service complaints), ordering by telephone (such as in a ticket office), or coordination services (such as in air traffic control). A Voice Communication Control System (VCCS) is essentially an ACD with characteristics that make it more adapted to use in critical situations (no waiting for dial tone, or lengthy recorded announcements, radio and telephone lines equally easily connected to, individual lines immediately accessible etc..) == Key components == =

Situated

In artificial intelligence and cognitive science, the term situated refers to an agent which is embedded in an environment. The term situated is commonly used to refer to robots, but some researchers argue that software agents can also be situated if: they exist in a dynamic (rapidly changing) environment, which they can manipulate or change through their actions, and which they can sense or perceive. Examples might include web-based agents, which can alter data or trigger processes (such as purchases) over the internet, or virtual-reality bots which inhabit and change virtual worlds, such as Second Life. Being situated is generally considered to be part of being embodied, but it is useful to consider each perspective individually. The situated perspective emphasizes that intelligent behaviour derives from the environment and the agent's interactions with it. The nature of these interactions are defined by an agent's embodiment.

Go-box

Go-box is a name used for a number of electronic devices. The "Go-Box" is often a box, crate, carry-case, modified briefcase or similar construction containing electronic equipment pre-setup and ready to function. The box can then be taken into the field or placed at a remote site with minimal effort. These are often used by radio amateurs (or "Hams") for emergency communications, experimental work, or field communications. This has also led to similar equipment being used in the Emergency Services, utility companies, military, and government agencies. A search of the YouTube website can reveal a number of ideas for these devices mostly built by people at home. Terms created after the use of "go-box" include the "go-bag" which is an 'essentials' bag of items needed for evacuations or quick departures, i.e. medicines, clothes, torch, Broadcast radio receiver, batteries, etc. In Austria it is a radio transmitter used in trucks as part of the Videomaut toll collection system. One use of the term in the United States it is a device which is supposed to change traffic signals from red to green. U.S. Fire trucks have a similar device, called an Opticon, that uses an infrared beam. Two residents of Miami, Florida, were arrested for selling fake go-boxes online. Several hundred were sold, prices ranging from $69 to $150. In reality, the boxes contained nothing more than strobe lights.