Data Science Africa

Data Science Africa

Data Science Africa (DSA) is a non-profit knowledge sharing professional group that aims at bringing together leading researchers and practitioners working on data science methods or applications relevant to Africa, and providing training on state of the art data science methods to students and others interested in developing practical skills. Since 2013, DSA has been organizing conference, workshops and summer schools on machine learning and data science across East Africa. Facilitators of Summer School and workshops are researchers and practitioners from the academia, private and public institutions across the world. == Summer schools and workshops == The first summer school which started as Gaussian Process Summer School was held at Makerere University in Kampala, Uganda from 6th to 9 August 2013. The First Data Science Summer School and Workshop was held at Dedan Kimathi University of Technology in Nyeri, Kenya from 15th to 19 June 2015. The Second Data Science Summer School was held at Makerere University, Kampala, Uganda from 27th to 29 July 2016, and the workshop was held at Pulse Lab, Kampala, Uganda from 30 July to 1 August 2016. The Third Data Science Summer School and Workshop was held at Nelson Mandela African Institute of Science and Technology, Tanzania from 19th to 21 July 2017. Among the sponsors of the event was ARM

Verbal overshadowing

Verbal overshadowing is a phenomenon where giving a verbal description of sensory input impairs formation of memories of that input. This was first reported by Schooler and Engstler-Schooler (1990) where it was shown that the effects can be observed across multiple domains of cognition which are known to rely on non-verbal knowledge and perceptual expertise. One example of this is memory, which has been known to be influenced by language. Seminal work by Carmichael and collaborators (1932) demonstrated that when verbal labels are connected to non-verbal forms during an individual's encoding process, it could potentially bias the way those forms are reproduced. Because of this, memory performance relying on reportable aspects of memory that encode visual forms should be vulnerable to the effects of verbalization. == Initial findings == Schooler and Engstler-Schooler (1990) were the first to report findings of verbal overshadowing. In their study, participants watched a video of a simulated robbery and were instructed to either verbally describe the robber or engage in a control task. Those who engaged in giving a verbal description were less likely to correctly identify the robber from a test lineup, compared to those who engaged in the control task. A larger effect was detected when the verbal description was provided 20, rather than 5, minutes after the video, and immediately before the test lineup. A meta-analysis by Meissner and Brigham (2001) supported the effects of verbal overshadowing, showing a small but reliably negative effect. == General effects of verbal overshadowing == The effects of verbal overshadowing have been generalized across multiple domains of cognition that are known to rely on non-verbal knowledge and perceptual expertise, such as memory. Memory has been known to be influenced by language. Seminal work by Carmichael and collaborators (1932) demonstrated that labels attached to, or associated with, non-verbal forms during memory encoding can affect the way the forms were subsequently reproduced. Because of this, memory performance that relies on reportable aspects of memory that encode visual forms should be vulnerable to the effects of verbalization. Pelizzon, Brandimonte, and Luccio (2002) found that visual memory representations appear to incorporate visual, spatial, and temporal characteristics. It is explained as follows: With the temporal code (where the only information available is the sequence of the stimuli), performance levels remain high, unless participants are required to retrieve the stimuli in a different order from that used at encoding (visual cue). In this case, performance is significantly impaired, even in the presence of a visual cue. The study showed that order information acts as a link between the two separate representations of figure and background, hence preventing verbal overshadowing at encoding (temporal component) or attenuating its influence at retrieval (spatial component).(p. 960) Hatano, Ueno, Kitagami, and Kawaguchi found that verbal overshadowing is likely to occur when participants verbally described targets in detail. Detailed verbal descriptions resulted in more frequently inaccurate descriptions that in turn created inaccurate representations in the memories of participants. Inaccuracies are also likely to occur when face recognition comes immediately after verbalization. Other forms of non-verbal knowledge affected by verbal overshadowing include the following: [Verbal overshadowing] has also been observed when participants attempt to generate descriptions of other 'difficult-to-describe' stimuli such as colors (Schooler and Engstler-Schooler, 1990) or abstract figures (Brandimonte et al., 1997), or other non-visual tasks such as wine tasting (Melcher and Schooler, 1996), decision making (Wilson and Schooler, 1991), and insight problem-solving. (p. 871) (Schooler et al., 1993) Verbalization of stimuli leads to the disruption of non-reportable processes that are necessary for achieving insight solutions, which are distinct from language processes. Schooler, Ohlsson, and Brooks (1993) found that face recognition requires information that cannot be adequately verbalized, giving rise to difficulty in describing factors in recognition judgments. Subjects were less effective in solving insight problems when compelled to put their thoughts in words, which suggests that language may interfere with thought. The verbal overshadowing effect was not seen when participants engaged in articulatory suppression. Performance was reduced in both the verbal and non-verbal description conditions. This is evidence that verbal encoding plays a role in face recognition. By testing with distracting faces presented between study and test, Lloyd-Jones and Brown (2008) suggested a dual-process approach to recognition memory took place, that verbalization influenced familiarity-based processes at first, but its effects were later seen on recollection, when discrimination between items became more difficult. == Verbal overshadowing in facial recognition == The verbal overshadowing effect can be found for facial recognition because faces are predominately processed in a holistic or configurable manner. (Tanaka & Farah, 1993; Tanaka & Sengco, 1997) Verbalizing one's memory for a face is done using a featural or analytic strategy, leading to a drift from the configurable information about the face and to impaired recognition performance. However, Fallshore & Schooler (1995) found that the verbal overshadowing effect was not found when participants described faces of races different from their own. A study by Brown and Lloyd-Jones (2003) found that there was no verbal overshadowing effect found in car descriptions; it was only seen in facial descriptions. The authors noted that descriptions were no different on any measure including accuracy. It is suggested that less expertise in verbalizing faces rather than cars invokes a stronger shift in verbal and featural processing. This supports the concept of a transfer inappropriate retrieval framework and addresses some limitations of the effect. Wickham and Swift (2006) suggested that the verbal overshadowing effect is not seen in describing all faces, and one aspect that determines this is distinctiveness. Results showed that typical faces produce verbal overshadowing, while distinctive faces did not. In studies of eyewitness reports, variation in response criteria given by participants influenced the quality of the descriptions generated and accuracy on identification task, known as the retrieval-based effect. Face recognition was also impaired when subjects described a familiar face, such as a parent, or when describing a previously seen but novel face. Dodson, Johnson, and Schooler (1997) found that recognition was also impaired when participants were provided with a description of a previously seen face, and they were able to ignore provided versus self-generated descriptions more easily. This finding of verbal overshadowing suggested that eyewitness recognition is not only affected by their own descriptions, but of descriptions heard from others, such other eyewitness testimonies. == Voice recognition == The verbal overshadowing effect has also been found to affect voice identification. Research shows that describing a non-verbal stimuli leads to a decrease in recognition accuracy. In an unpublished study by Schooler, Fiore, Melcher, and Ambadar (1996), participants listened to a tape-recorded voice, after which they were asked either to verbally describe it or to not do so, and then asked to distinguish the voice from 3 similar distractor voices. The results showed that verbal overshadowing impaired accuracy of recognition based on gut feeling, suggesting an overall verbal overshadowing for voice recognition. Due to the forensic relevance of voices heard over the telephone and harassing phone calls that are often a problem for police, Perfect, Hunt, and Harris (2002) examined the influence of three factors on accuracy and confidence in voice recognition from a line-up. They expected to find an effect, because voice represents a class of stimuli that is difficult to describe verbally. This meets Schooler et al.'s (1997) modality mismatch criterion, meaning that describing the speakers age, gender, or accent is difficult, making voice recognition susceptible to the verbal overshadowing phenomenon. It was found that the method of memory encoding had no impact on performance, and that hearing a telephone voice reduced confidence but did not affect accuracy. They also found that providing a verbal description impaired accuracy but had no effect on confidence. The data showed an effect of verbal overshadowing in voice recognition and provided yet another disassociation between confidence and performance. Although there was a difference in confidence level, witnesses were able to identify voices over the telephone as accurately as voices heard direc

Code (cryptography)

In cryptology, a code is a method used to encrypt a message that operates at the level of meaning; that is, words or phrases are converted into something else. A code might transform "change" into "CVGDK" or "cocktail lounge". The U.S. National Security Agency defined a code as "A substitution cryptosystem in which the plaintext elements are primarily words, phrases, or sentences, and the code equivalents (called "code groups") typically consist of letters or digits (or both) in otherwise meaningless combinations of identical length." A codebook is needed to encrypt, and decrypt the phrases or words. By contrast, ciphers encrypt messages at the level of individual letters, or small groups of letters, or even, in modern ciphers, individual bits. Messages can be transformed first by a code, and then by a cipher. Such multiple encryption, or "superencryption" aims to make cryptanalysis more difficult. Another comparison between codes and ciphers is that a code typically represents a letter or groups of letters directly without the use of mathematics. As such the numbers are configured to represent these three values: 1001 = A, 1002 = B, 1003 = C, ... . The resulting message, then would be 1001 1002 1003 to communicate ABC. Ciphers, however, utilize a mathematical formula to represent letters or groups of letters. For example, A = 1, B = 2, C = 3, ... . Thus the message ABC results by multiplying each letter's value by 13. The message ABC, then would be 13 26 39. Codes have a variety of drawbacks, including susceptibility to cryptanalysis and the difficulty of managing the cumbersome codebooks, so ciphers are now the dominant technique in modern cryptography. In contrast, because codes are representational, they are not susceptible to mathematical analysis of the individual codebook elements. In the example, the message 13 26 39 can be cracked by dividing each number by 13 and then ranking them alphabetically. However, the focus of codebook cryptanalysis is the comparative frequency of the individual code elements matching the same frequency of letters within the plaintext messages using frequency analysis. In the above example, the code group, 1001, 1002, 1003, might occur more than once and that frequency might match the number of times that ABC occurs in plain text messages. (In the past, or in non-technical contexts, code and cipher are often used to refer to any form of encryption). == One- and two-part codes == Codes are defined by "codebooks" (physical or notional), which are dictionaries of codegroups listed with their corresponding plaintext. Codes originally had the codegroups assigned in 'plaintext order' for convenience of the code designed, or the encoder. For example, in a code using numeric code groups, a plaintext word starting with "a" would have a low-value group, while one starting with "z" would have a high-value group. The same codebook could be used to "encode" a plaintext message into a coded message or "codetext", and "decode" a codetext back into plaintext message. In order to make life more difficult for codebreakers, codemakers designed codes with no predictable relationship between the codegroups and the ordering of the matching plaintext. In practice, this meant that two codebooks were now required, one to find codegroups for encoding, the other to look up codegroups to find plaintext for decoding. Such "two-part" codes required more effort to develop, and twice as much effort to distribute (and discard safely when replaced), but they were harder to break. The Zimmermann Telegram in January 1917 used the German diplomatic "0075" two-part code system which contained upwards of 10,000 phrases and individual words. == One-time code == A one-time code is a prearranged word, phrase or symbol that is intended to be used only once to convey a simple message, often the signal to execute or abort some plan or confirm that it has succeeded or failed. One-time codes are often designed to be included in what would appear to be an innocent conversation. Done properly they are almost impossible to detect, though a trained analyst monitoring the communications of someone who has already aroused suspicion might be able to recognize a comment like "Aunt Bertha has gone into labor" as having an ominous meaning. Famous example of one time codes include: In the Bible, Jonathan prearranges a code with David, who is going into hiding from Jonathan's father, King Saul. If, during archery practice, Jonathan tells the servant retrieving arrows "the arrows are on this side of you," David may safely return to court; if the command is "the arrows are beyond you," David must flee. "One if by land; two if by sea" in "Paul Revere's Ride" made famous in the poem by Henry Wadsworth Longfellow "Climb Mount Niitaka" - the signal to Japanese planes to begin the attack on Pearl Harbor During World War II the British Broadcasting Corporation's overseas service frequently included "personal messages" as part of its regular broadcast schedule. The seemingly nonsensical stream of messages read out by announcers were actually one time codes intended for Special Operations Executive (SOE) agents operating behind enemy lines. An example might be "The princess wears red shoes" or "Mimi's cat is asleep under the table". Each code message was read out twice. By such means, the French Resistance were instructed to start sabotaging rail and other transport links the night before D-day. "Over all of Spain, the sky is clear" was a signal (broadcast on radio) to start the nationalist military revolt in Spain on July 17, 1936. Sometimes messages are not prearranged and rely on shared knowledge hopefully known only to the recipients. An example is the telegram sent to U.S. President Harry Truman, then at the Potsdam Conference to meet with Soviet premier Joseph Stalin, informing Truman of the first successful test of an atomic bomb. "Operated on this morning. Diagnosis not yet complete but results seem satisfactory and already exceed expectations. Local press release necessary as interest extends great distance. Dr. Groves pleased. He returns tomorrow. I will keep you posted." == Idiot code == An idiot code is a code that is created by the parties using it. This type of communication is akin to the hand signals used by armies in the field. Example: Any sentence where 'day' and 'night' are used means 'attack'. The location mentioned in the following sentence specifies the location to be attacked. Plaintext: Attack X. Codetext: We walked day and night through the streets but couldn't find it! Tomorrow we'll head into X. An early use of the term appears to be by George Perrault, a character in the science fiction book Friday by Robert A. Heinlein: The simplest sort [of code] and thereby impossible to break. The first ad told the person or persons concerned to carry out number seven or expect number seven or it said something about something designated as seven. This one says the same with respect to code item number ten. But the meaning of the numbers cannot be deduced through statistical analysis because the code can be changed long before a useful statistical universe can be reached. It's an idiot code... and an idiot code can never be broken if the user has the good sense not to go too often to the well. Terrorism expert Magnus Ranstorp said that the men who carried out the September 11 attacks on the United States used basic e-mail and what he calls "idiot code" to discuss their plans. == Cryptanalysis of codes == While solving a monoalphabetic substitution cipher is easy, solving even a simple code is difficult. Decrypting a coded message is a little like trying to translate a document written in a foreign language, with the task basically amounting to building up a "dictionary" of the codegroups and the plaintext words they represent. One fingerhold on a simple code is the fact that some words are more common than others, such as "the" or "a" in English. In telegraphic messages, the codegroup for "STOP" (i.e., end of sentence or paragraph) is usually very common. This helps define the structure of the message in terms of sentences, if not their meaning, and this is cryptanalytically useful. Further progress can be made against a code by collecting many codetexts encrypted with the same code and then using information from other sources spies newspapers diplomatic cocktail party chat the location from where a message was sent where it was being sent to (i.e., traffic analysis) the time the message was sent, events occurring before and after the message was sent the normal habits of the people sending the coded messages etc. For example, a particular codegroup found almost exclusively in messages from a particular army and nowhere else might very well indicate the commander of that army. A codegroup that appears in messages preceding an attack on a particular location may very well stand for that location. Cribs can be an immediate giveaway to the definiti

ServerNet

ServerNet is a switched fabric communications link primarily used in proprietary computers made by Tandem Computers, Compaq, and HP. Its features include good scalability, clean fault containment, error detection and failover. The ServerNet architecture specification defines a connection between nodes, either processor or high performance I/O nodes such as storage devices. == History == Tandem Computers developed the original ServerNet architecture and protocols for use in its own proprietary computer systems starting in 1992, and released the first ServerNet systems in 1995. Early attempts to license the technology and interface chips to other companies failed, due in part to a disconnect between the culture of selling complete hardware / software / middleware computer systems and that needed for selling and supporting chips and licensing technology. A follow-on development effort ported the Virtual Interface Architecture to ServerNet with PCI interface boards connecting personal computers. Infiniband directly inherited many ServerNet features. As of 2017, systems still ship based on the ServerNet architecture.

Reverse proxy

In computer networks, a reverse proxy or surrogate server is a proxy server that appears to any client to be an ordinary web server, but in reality merely acts as an intermediary that forwards the client's requests to one or more ordinary web servers. Reverse proxies help increase scalability, performance, resilience, and security, but they also carry a number of risks. Companies that run web servers often set up reverse proxies to facilitate the communication between an Internet user's browser and the web servers. An important advantage of doing so is that the web servers can be hidden behind a firewall on a company-internal network, and only the reverse proxy needs to be directly exposed to the Internet. Reverse proxy servers are implemented in popular open-source web servers. Dedicated reverse proxy servers are used by some of the biggest websites on the Internet. A reverse proxy is capable of tracking IP addresses of requests that are relayed through it as well as reading and/or modifying any non-encrypted traffic. However, this implies that anyone who has compromised the server could do so as well. Reverse proxies differ from forward proxies, which are used when the client is restricted to a private, internal network and asks a forward proxy to retrieve resources from the public Internet. == Uses == Large websites and content delivery networks use reverse proxies, together with other techniques, to balance the load between internal servers. Reverse proxies can keep a cache of static content, which further reduces the load on these internal servers and the internal network. It is also common for reverse proxies to add features such as compression or TLS encryption to the communication channel between the client and the reverse proxy. Reverse proxies can inspect HTTP headers, which, for example, allows them to present a single IP address to the Internet while relaying requests to different internal servers based on the URL of the HTTP request. Reverse proxies can hide the existence and characteristics of origin servers. This can make it more difficult to determine the actual location of the origin server / website and, for instance, more challenging to initiate legal action such as takedowns or block access to the website, as the IP address of the website may not be immediately apparent. Additionally, the reverse proxy may be located in a different jurisdiction with different legal requirements, further complicating the takedown process. Application firewall features can protect against common web-based attacks, like a denial-of-service attack (DoS) or distributed denial-of-service attacks (DDoS). Without a reverse proxy, removing malware or initiating takedowns (while simultaneously dealing with the attack) on one's own site, for example, can be difficult. In the case of secure websites, a web server may not perform TLS encryption itself, but instead offload the task to a reverse proxy that may be equipped with TLS acceleration hardware. (See TLS termination proxy.) A reverse proxy can distribute the load from incoming requests to several servers, with each server supporting its own application area. In the case of reverse proxying web servers, the reverse proxy may have to rewrite the URL in each incoming request in order to match the relevant internal location of the requested resource. A reverse proxy can reduce load on its origin servers by caching static content and dynamic content, known as web acceleration. Proxy caches of this sort can often satisfy a considerable number of website requests, greatly reducing the load on the origin server(s). A reverse proxy can optimize content by compressing it in order to speed up loading times. In a technique named "spoon-feeding", a dynamically generated page can be produced in its entirety and served to the reverse proxy, which can feed the page to the client as the connection allows. The program that generates the page need not remain open, thus releasing server resources during the possibly extended time the client requires to complete the transfer. Reverse proxies can operate wherever multiple web-servers must be accessible via a single public IP address. The web servers listen on different ports in the same machine, with the same local IP address or, possibly, on different machines with different local IP addresses. The reverse proxy analyzes each incoming request and delivers it to the right server within the local area network. Reverse proxies can perform A/B testing and multivariate testing without requiring application code to handle the logic of which version is served to a client. A reverse proxy can add access authentication to a web server that does not have any authentication. == Risks == When the transit traffic is encrypted and the reverse proxy needs to filter/cache/compress or otherwise modify or improve the traffic, the proxy first must decrypt and re-encrypt communications. This requires the proxy to possess the TLS certificate and its corresponding private key, extending the number of systems that can have access to non-encrypted data and making it a more valuable target for attackers. The vast majority of external data breaches happen either when hackers succeed in abusing an existing reverse proxy that was intentionally deployed by an organization, or when hackers succeed in converting an existing Internet-facing server into a reverse proxy server. Compromised or converted systems allow external attackers to specify where they want their attacks proxied to, enabling their access to internal networks and systems. Applications that were developed for the internal use of a company are not typically hardened to public standards and are not necessarily designed to withstand all hacking attempts. When an organization allows external access to such internal applications via a reverse proxy, they might unintentionally increase their own attack surface and invite hackers. If a reverse proxy is not configured to filter attacks or it does not receive daily updates to keep its attack signature database up to date, a zero-day vulnerability can pass through unfiltered, enabling attackers to gain control of the system(s) that are behind the reverse proxy server. Giving the reverse proxy of a third party access to private keys (for caching or optimizing content) places the entire triad of confidentiality, integrity and availability in the hands of the third party who operates the proxy. A reverse proxy is a single point of failure for the back-end services it fronts: an outage caused by misconfiguration, a denial-of-service attack, or a software fault can make every fronted service unreachable to outside clients, even when the back-end services themselves remain healthy. For example, a 2020 outage at Cloudflare briefly took down major sites and services that relied on its reverse-proxy edge, including Discord.

Color reproduction

Color reproduction is an aspect of color science concerned with producing light spectra that evoke a desired color, either through additive (light emitting) or subtractive (surface color) models. It converts physical correlates of color perception (CIE 1931 XYZ color space tristimulus values and related quantities) into light spectra that can be experienced by observers. In this way, it is the opposite of colorimetry. It is concerned with the faithful reproduction of a color in one medium, with a color in another, so it is a central concept in color management and relies heavily on color calibration. For example, food packaging must be able to faithfully reproduce the colors of the foods therein in order to appeal to a customer. This involves proper color calibration of at least four devices: Lighting, which must have a high color rendering index and not give a color cast to the object. Camera, which measures the reflected spectrum of the object and converts to a trichromatic color space (e.g. RGB). Screen, which reproduces color so a designer can proof the captured image and make color corrections as necessary. Printer, which reproduces the final color on paper.

Data product

In data management and product management, a data product is a reusable, active, and standardized data asset designed to deliver measurable value to its users, whether internal or external, by applying the rigorous principles of product thinking and management. It comprises one or more data artifacts (e.g., datasets, models, pipelines) and is enriched with metadata, including governance policies, data quality rules, data contracts, and, where applicable, a software bill of materials (SBOM) to document its dependencies and components. Ownership of a data product is aligned to a specific domain or use case, ensuring accountability, stewardship, and its continuous evolution throughout its lifecycle. Adhering to the FAIR principles – findable, accessible, interoperable, and reusable – a data product is designed to be discoverable, scalable, reusable, and aligned with both business and regulatory standards, driving innovation and efficiency in modern data ecosystems. == History == In 2012, DJ Patil proposed the first documented definition: a data product is a product that facilitates an end goal through the use of data. In 2019, Zhamak Dehghani introduced Data Mesh, with a strong focus on domain-oriented data products. Later, in 2020, she solidifies Data Mesh around four principles, one being Data as a Product, in which she defines Data Product as the node on the mesh that encapsulates three structural components required for its function, providing access to the domain's analytical data as a product. In 2024, Andrea Gioia published one of the first books specifically on data products post Data Mesh announcement. In his book, Gioia defines the concept of pure data product. In 2025, during the Data Day Texas conference, Jean-Georges Perrin and a collective of product managers and data engineers got together to craft the current definition and make it available to the public domain. In July 2025, Bitol, a project of The Linux Foundation, released and early version of the Open Data Product Standard (ODPS) aiming at normalizing data products