AI Art Is Real Art

AI Art Is Real Art — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Evolutionary robotics

    Evolutionary robotics is an embodied approach to Artificial Intelligence (AI) in which robots are automatically designed using Darwinian principles of natural selection. The design of a robot, or of a subsystem of a robot such as a neural controller, is optimized against a behavioral goal (e.g. run as fast as possible). Designs are usually evaluated in simulation, because fabricating thousands or millions of designs and testing them in the real world is prohibitively expensive in time, money, and safety. An evolutionary robotics experiment starts with a population of randomly generated robot designs. The worst-performing designs are discarded and replaced with mutations and/or combinations of the better designs. This evolutionary algorithm continues until a prespecified amount of time elapses or some target performance metric is surpassed. Evolutionary robotics methods are particularly useful for engineering machines that must operate in environments where humans have limited intuition (nanoscale, space, etc.). Evolved simulated robots can also be used as scientific tools to generate new hypotheses in biology and cognitive science, and to test old hypotheses that require experiments that have proven difficult or impossible to carry out in reality.

    == History ==

    In the early 1990s, two separate European groups demonstrated different approaches to the evolution of robot control systems. Dario Floreano and Francesco Mondada at EPFL evolved controllers for the Khepera robot. Adrian Thompson, Nick Jakobi, Dave Cliff, Inman Harvey, and Phil Husbands evolved controllers for a Gantry robot at the University of Sussex. However, the bodies of these robots were fixed in advance rather than evolved. The first simulations of evolved robots were reported by Karl Sims and Jeffrey Ventrella of the MIT Media Lab, also in the early 1990s. However, these so-called virtual creatures never left their simulated worlds.
The first evolved robots to be built in reality were 3D-printed by Hod Lipson and Jordan Pollack at Brandeis University at the turn of the 21st century.
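
    The evolutionary loop described above can be sketched in Python. It starts from a random initial population, discards the worst designs, refills the population with mutations and combinations of the better ones, and stops on a time budget or a target score. This is a minimal illustration, not a robotics framework. The "design" is a hypothetical list of four controller parameters, and `fitness` is an invented stand-in for the physics simulation a real experiment would run.

```python
import random

# Hypothetical stand-in for a simulated robot: a "design" is a list of
# controller parameters, and TARGET is an invented optimum used only so
# that the example has something to optimize.
TARGET = [0.5, -0.2, 0.8, 0.1]


def fitness(design):
    # Higher is better. A real experiment would run a physics simulation
    # and measure a behavioral goal such as distance travelled.
    return -sum((d - t) ** 2 for d, t in zip(design, TARGET))


def mutate(design, rate=0.1):
    # Perturb every parameter with small Gaussian noise.
    return [d + random.gauss(0, rate) for d in design]


def crossover(a, b):
    # Uniform crossover: each parameter is copied from one parent at random.
    return [random.choice(pair) for pair in zip(a, b)]


def evolve(pop_size=30, generations=200, target_score=-1e-3, seed=0):
    random.seed(seed)
    # Start from a population of randomly generated designs.
    population = [[random.uniform(-1, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):  # stop after a fixed budget...
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) >= target_score:  # ...or once the target is met
            break
        # Discard the worse half; refill with mutated recombinations of the better half.
        survivors = population[: pop_size // 2]
        children = [
            mutate(crossover(random.choice(survivors), random.choice(survivors)))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    best, score = evolve()
    print(best, score)
```

    Calling `evolve()` returns the best design found and its score. Because the better half always survives, the top score never gets worse from one generation to the next. Using a real simulator would mean replacing only `fitness`.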

  • Kerckhoffs's principle

    Kerckhoffs's principle (also called Kerckhoffs's desideratum, assumption, axiom, doctrine or law) of cryptography was stated by the Dutch cryptographer Auguste Kerckhoffs in the 19th century. The principle holds that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. This concept is widely embraced by cryptographers, in contrast to security through obscurity, which is not. The American mathematician Claude Shannon phrased Kerckhoffs's principle as "the enemy knows the system", i.e., "one ought to design systems under the assumption that the enemy will immediately gain full familiarity with them". In that form, it is called Shannon's maxim. Another formulation, by the American researcher and professor Steven M. Bellovin, is:

    "In other words—design your system assuming that your opponents know it in detail. (A former official at NSA's National Computer Security Center told me that the standard assumption there was that serial number 1 of any new device was delivered to the Kremlin.)"

    == Origins ==

    The invention of telegraphy radically changed military communications and dramatically increased the number of messages that needed to be protected from the enemy. This led to the development of field ciphers that had to be easy to use without large confidential codebooks, which were prone to capture on the battlefield. It was this environment that led to the development of Kerckhoffs's requirements. Auguste Kerckhoffs was a professor of German language at the École des Hautes Études Commerciales (HEC) in Paris. In early 1883, Kerckhoffs's article, La Cryptographie Militaire, was published in two parts in the Journal of Military Science, in which he stated six design rules for military ciphers.
Translated from French, they are:

    1. The system must be practically, if not mathematically, indecipherable.
    2. It should not require secrecy, and it should not be a problem if it falls into enemy hands.
    3. It must be possible to communicate and remember the key without using written notes, and correspondents must be able to change or modify it at will.
    4. It must be applicable to telegraph communications.
    5. It must be portable, and should not require several persons to handle or operate.
    6. Lastly, given the circumstances in which it is to be used, the system must be easy to use and should not be stressful to use or require its users to know and comply with a long list of rules.

    Some of these rules are no longer relevant given the ability of computers to perform complex encryption. The second rule, now known as Kerckhoffs's principle, is still critically important.

    == Explanation of the principle ==

    Kerckhoffs viewed cryptography as a rival to, and a better alternative than, steganographic encoding, which was common in the nineteenth century for hiding the meaning of military messages. One problem with encoding schemes is that they rely on humanly held secrets such as "dictionaries", which disclose, for example, the secret meaning of words. Such dictionaries, once revealed, permanently compromise the corresponding encoding system. Another problem is that the risk of exposure increases as the number of users holding the secrets increases. Nineteenth-century cryptography, in contrast, used simple tables that provided for the transposition of alphanumeric characters. These were generally given by row-column intersections that could be modified by keys, which were generally short, numeric, and could be committed to human memory. The system was considered "indecipherable" because tables and keys do not convey meaning by themselves. Secret messages can be compromised only if a matching set of table, key, and message falls into enemy hands in a relevant time frame.
Kerckhoffs viewed tactical messages as having only a few hours of relevance. Systems are not necessarily compromised, because their components (i.e. alphanumeric character tables and keys) can be easily changed.

    === Advantage of secret keys ===

    Using secure cryptography is supposed to replace the difficult problem of keeping messages secure with a much more manageable one: keeping relatively small keys secure. A system that requires long-term secrecy for something as large and complex as the whole design of a cryptographic system obviously cannot achieve that goal; it only replaces one hard problem with another. However, if a system is secure even when the enemy knows everything except the key, then all that is needed is to keep the keys secret.

    There are many ways the internal details of a widely used system could be discovered. The most obvious is that someone could bribe, blackmail, or otherwise threaten staff or customers into explaining the system. In war, for example, one side will probably capture some equipment and people from the other side. Each side will also use spies to gather information. If a method involves software, someone could take memory dumps or run the software under the control of a debugger in order to understand the method. If hardware is being used, someone could buy or steal some of the hardware and build whatever programs or gadgets are needed to test it. Hardware can also be dismantled so that the chip details can be examined under a microscope.

    === Maintaining security ===

    A generalization some make from Kerckhoffs's principle is: "The fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security." Bruce Schneier ties it in with a belief that all security systems must be designed to fail as gracefully as possible:

    "Kerckhoffs's principle applies beyond codes and ciphers to security systems in general: every secret creates a potential failure point. Secrecy, in other words, is a prime cause of brittleness—and therefore something likely to make a system prone to catastrophic collapse. Conversely, openness provides ductility."

    Any security system depends crucially on keeping some things secret. However, Kerckhoffs's principle points out that the things kept secret ought to be those least costly to change if inadvertently disclosed. For example, a cryptographic algorithm may be implemented by hardware and software that is widely distributed among users. If security depends on keeping that algorithm secret, then disclosure leads to major logistic difficulties in developing, testing, and distributing implementations of a new algorithm: it is "brittle". On the other hand, suppose keeping the algorithm secret is not important and only the keys used with it must be secret. Then disclosure of the keys simply requires the simpler, less costly process of generating and distributing new keys.

    == Applications ==

    In accordance with Kerckhoffs's principle, the majority of civilian cryptography makes use of publicly known algorithms. By contrast, ciphers used to protect classified government or military information are often kept secret (see Type 1 encryption). However, it should not be assumed that government/military ciphers must be kept secret to maintain security. They may be intended to be as cryptographically sound as public algorithms, with the decision to keep them secret being part of a layered security posture.

    == Security through obscurity ==

    It is moderately common for companies to keep the inner workings of a system secret. Some argue this "security by obscurity" makes the product safer and less vulnerable to attack. A counter-argument is that keeping the innards secret may improve security in the short term, but in the long run, only systems that have been published and analyzed should be trusted.

    Steven Bellovin and Randy Bush commented:

    Security Through Obscurity Considered Dangerous: Hiding security vulnerabilities in algorithms, software, and/or hardware decreases the likelihood they will be repaired and increases the likelihood that they can and will be exploited. Discouraging or outlawing discussion of weaknesses and vulnerabilities is extremely dangerous and deleterious to the security of computer systems, the network, and its citizens.

    Open Discussion Encourages Better Security: The long history of cryptography and cryptoanalysis has shown time and time again that open discussion and analysis of algorithms exposes weaknesses not thought of by the original authors, and thereby leads to better and more secure algorithms. As Kerckhoffs noted about cipher systems in 1883 [Kerc83], "Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi." (Roughly, "the system must not require secrecy and must be able to be stolen by the enemy without causing trouble.")
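
    The sketch below illustrates Kerckhoffs's principle in practice. The system is built on a published algorithm and only the key is secret, so a leaked key is fixed by generating a new one. It uses HMAC-SHA256 from Python's standard library. HMAC provides message authentication rather than encryption; it was chosen here only because the standard library ships no cipher. The function names are illustrative.

```python
import hashlib
import hmac
import secrets

# The algorithm (HMAC with SHA-256) is public and standardized; the only
# secret is the key. Note: this is message authentication, not encryption.


def make_tag(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(make_tag(key, message), tag)


def new_key() -> bytes:
    # Recovering from a leaked key means generating and distributing a new
    # one; the published algorithm itself never has to change.
    return secrets.token_bytes(32)


if __name__ == "__main__":
    key = new_key()
    msg = b"attack at dawn"
    tag = make_tag(key, msg)
    print(verify(key, msg, tag))                # True
    print(verify(key, b"attack at noon", tag))  # False
    print(verify(new_key(), msg, tag))          # False: old tags stop verifying
```

    Nothing in `make_tag` or `verify` needs to stay hidden; anyone may read them. If a key is disclosed, recovery means calling `new_key()` and redistributing the result, which is the cheap recovery path the principle argues for.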

  • Social media stock bubble

    The social media bubble is a hypothesis that there was a speculative boom-and-bust phenomenon in social media in the 2010s, particularly in the United States. The Wall Street Journal defined a bubble as stocks "priced above a level that can be justified by economic fundamentals"; the hypothesis applies this definition to social media stocks. Social networking services (SNS) have seen huge growth since 2006, but around 2014–2015 some investors believed that the sector was in a bubble similar to the dot-com bubble of the late 1990s and early 2000s. In 2015, Mark Cuban sounded an alarm about the social media bubble on his personal blog. Cuban, owner of the Dallas Mavericks NBA team and star of the TV show Shark Tank, called it worse than the tech bubble of 2000 because of the lack of liquidity in social media stocks. A year earlier, however, Cuban had told CNBC that he did not believe social media stocks were on the verge of a bubble. In a 2014 letter to investors, David Einhorn, who runs the hedge fund Greenlight Capital, wrote that "we are witnessing our second tech bubble in 15 years." He went on to write, "What is uncertain is how much further the bubble can expand, and what might pop it." Einhorn cited several factors supporting the existence of over-exuberance, including "rejection of conventional valuation methods" and "huge first day IPO pops for companies that have done little more than use the right buzzwords and attract the right venture capital." Since then, services such as Facebook, Twitter, Instagram, and Snapchat have grown into multi-billion-dollar corporations generating enormous revenues, though some continue to lose money.

    == History of social networking services ==

    Social networking services have grown and evolved since the launch of SixDegrees.com in 1997. Cutting-edge in its time, SixDegrees.com allowed users to create a profile, invite friends, and connect within its platform. At its peak, SixDegrees.com had more than 3.5 million users.
Between 1997 and 2001, more social sites appeared that allowed users to connect with others for personal, professional, or dating reasons. Friendster and MySpace were next to enter the SNS arena, followed by Facebook in 2004. Even though MySpace had a following of more than 300 million users, it could not compete with Facebook, which has since overtaken the social networking world. However, as more SNS emerged, the market began to show signs of saturation. Some classrooms have begun to incorporate technology in daily learning, as well as social channels specific to students' coursework. Traditional social media sites are used, as are education-oriented sites such as ShowMe and Educreations Interactive Whiteboard.

    == Controversies ==

    While SNS continue to play an influential role in helping people form real-world connections via the Internet, recent controversies have renewed concerns over the social media bubble. These include growing concerns about data breaches, the rise of bot accounts, and the sharing of fake news on SNS platforms. There are also concerns that big data figures associated with these SNS are inflated or fake. Other worries concern the role the platforms played in national elections (see Russian interference in the 2016 United States elections). These issues have eroded users' trust in the sites.

  • Letter frequency

    Letter frequency is the number of times letters of the alphabet appear on average in written language. Letter frequency analysis dates back to the Arab mathematician Al-Kindi (c. AD 801–873), who formally developed the method to break ciphers. Letter frequency analysis gained importance in Europe with the development of movable type in AD 1450, when printers had to estimate the amount of type required for each letterform. Linguists use letter frequency analysis as a rudimentary technique for language identification. It is particularly effective as an indication of whether an unknown writing system is alphabetic, syllabic, or logographic. Letter frequencies and frequency analysis play a fundamental role in cryptograms and several word puzzle games, including hangman, Scrabble, Wordle and the television game show Wheel of Fortune. One of the earliest descriptions in classical literature of applying knowledge of English letter frequency to solving a cryptogram is found in Edgar Allan Poe's famous story "The Gold-Bug". There, the method is successfully applied to decipher a message giving the location of a treasure hidden by Captain Kidd. Herbert S. Zim, in his classic introductory cryptography text Codes and Secret Writing, gives the English letter frequency sequence as "ETAON RISHD LFCMU GYPWB VKJXZQ", the most common letter pairs as "TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO", and the most common doubled letters as "LL EE SS OO TT FF RR NN PP CC". Different ways of counting can produce somewhat different orders. Letter frequencies also have a strong effect on the design of some keyboard layouts. The most frequent letters are placed on the home row of the Blickensderfer typewriter, the Dvorak keyboard layout, Colemak and other optimized layouts. The commonly used QWERTY layout, by contrast, places common letters apart from each other to prevent typewriter jamming.
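
    Al-Kindi's method can be illustrated on the simplest target, a Caesar (shift) cipher. The sketch below counts letter frequencies and assumes that the most frequent ciphertext letter stands for ⟨e⟩, the first letter of Zim's ETAON sequence. That single-letter assumption is a simplification that only works on reasonably long English text. A more robust approach would compare the whole frequency distribution.

```python
import string
from collections import Counter


def letter_frequencies(text):
    """Relative frequency of each letter a-z, most common first."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    total = len(letters)
    return {ch: n / total for ch, n in Counter(letters).most_common()}


def caesar_shift(text, k):
    """Shift each ASCII letter k places, preserving case and non-letters."""
    out = []
    for c in text:
        if c in string.ascii_lowercase:
            out.append(chr((ord(c) - ord("a") + k) % 26 + ord("a")))
        elif c in string.ascii_uppercase:
            out.append(chr((ord(c) - ord("A") + k) % 26 + ord("A")))
        else:
            out.append(c)
    return "".join(out)


def break_caesar(ciphertext, assumed_most_common="e"):
    """Guess the shift by assuming the most frequent letter stands for 'e'."""
    most_common = next(iter(letter_frequencies(ciphertext)))
    shift = (ord(most_common) - ord(assumed_most_common)) % 26
    return shift, caesar_shift(ciphertext, -shift)
```

    For example, `break_caesar(caesar_shift("Meet me here at the tree between seven and eleven", 3))` returns `(3, "Meet me here at the tree between seven and eleven")`, because ⟨e⟩ dominates the plaintext.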

    == Background ==

    The frequency of letters in text has been studied for use in cryptanalysis, and frequency analysis in particular, dating back to the Arab mathematician al-Kindi (c. AD 801–873), who formally developed the method. The ciphers breakable by this technique go back at least to the Caesar cipher used by Julius Caesar, so this method could have been explored in classical times. Letter frequency analysis gained additional importance in Europe with the development of movable type in AD 1450, when printers had to estimate the amount of type required for each letterform. This is evidenced by the variations in letter compartment size in typographers' type cases. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. However, most languages have a characteristic distribution that is strongly apparent in longer texts. Even language changes as extreme as from Old English to modern English (regarded as mutually unintelligible) show strong trends in related letter frequencies. Over a small sample of Biblical passages, the order from most frequent to least frequent is enaid sorhm tgþlwu æcfy ðbpxz for Old English and eotha sinrd luymw fgcbp kvjqxz for modern English. The most extreme differences concern letterforms not shared. Linotype machines for the English language assumed the letter order, from most to least common, to be etaoin shrdlu cmfwyp vbgkqj xz, based on the experience and custom of manual compositors. The equivalent for the French language was elaoin sdrétu cmfhyp vbgwqj xz. Arranging the alphabet in Morse into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, yields e it san hurdm wgvlfbk opxcz jyq. Letter frequency was used by other telegraph systems, such as the Murray Code. Similar ideas are used in modern data-compression techniques such as Huffman coding.
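
    The Huffman-coding idea mentioned above gives frequent symbols shorter codewords, much as Morse gives ⟨e⟩ and ⟨t⟩ the shortest signals. It can be sketched with Python's `heapq`. This is a textbook construction written for illustration: it repeatedly merges the two least frequent subtrees and returns a prefix-free code table.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a prefix-free binary code: frequent symbols get shorter codewords."""
    freqs = Counter(text)
    if not freqs:
        return {}
    if len(freqs) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]
```

    Take the string "aaaaaaabbbccd". Here ⟨a⟩ receives a 1-bit code, ⟨b⟩ a 2-bit code, and ⟨c⟩ and ⟨d⟩ 3-bit codes. That is 22 bits in total, instead of the 26 a fixed 2-bit code would need.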

    Letter frequencies, like word frequencies, tend to vary, both by writer and by subject. For instance, ⟨d⟩ occurs with greater frequency in fiction, as most fiction is written in the past tense and thus most verbs will end in the inflectional suffix -ed / -d. One cannot write an essay about x-rays without using ⟨x⟩ frequently. The essay will also have an idiosyncratic letter frequency if it is about, say, Queen Zelda of Zanzibar requesting X-rays from Qatar to examine hypoxia in zebras. Different authors have habits that can be reflected in their use of letters. Hemingway's writing style, for example, is visibly different from Faulkner's. Letter, bigram, trigram, and word frequencies, word length, and sentence length can be calculated for specific authors and used to prove or disprove authorship of texts, even for authors whose styles are not so divergent. Accurate average letter frequencies can only be gleaned by analyzing a large amount of representative text. With the availability of modern computing and collections of large text corpora, such calculations are easily made. Examples can be drawn from a variety of sources (press reporting, religious texts, scientific texts and general fiction). There are differences between them, especially for general fiction, in the positions of ⟨h⟩ and ⟨i⟩, with ⟨h⟩ becoming more common. Different dialects of a language will also affect a letter's frequency. For example, an author in the United States would produce something in which ⟨z⟩ is more common than an author in the United Kingdom writing on the same topic. Words like "analyze", "apologize", and "recognize" contain the letter in American English, whereas the same words are spelled "analyse", "apologise", and "recognise" in British English. This strongly affects the frequency of the letter ⟨z⟩, which British writers use rarely. The "top twelve" letters constitute about 80% of the total usage.
The "top eight" letters constitute about 65% of the total usage. Letter frequency as a function of rank can be fitted well by several rank functions, with the two-parameter Cocho/Beta rank function giving the best fit. Another rank function, with no adjustable free parameter, also fits the letter frequency distribution reasonably well (the same function has been used to fit amino acid frequencies in protein sequences). A spy using the VIC cipher or some other cipher based on a straddling checkerboard typically uses a mnemonic such as "a sin to err" (dropping the second "r") or "at one sir" to remember the top eight characters.

    == Relative frequencies of letters in the English language ==

    There are three ways to count letter frequency, and they result in very different charts for common letters. The first method, used in the chart below, is to count letter frequency in lemmas of a dictionary. The lemma is the word in its canonical form. The second method is to include all word variants when counting, such as "abstracts", "abstracted" and "abstracting", and not just the lemma "abstract". This second method results in letters like ⟨s⟩ appearing much more frequently, such as when counting letters from lists of the most used English words on the Internet. ⟨s⟩ is especially common in inflected words (non-lemma forms) because it is added to form plurals and third-person singular present tense verbs. A final method is to count letters based on their frequency of use in actual texts. This makes certain letter combinations like ⟨th⟩ more common because of the frequent use of common words like "the", "then", "both", "this", etc. Absolute usage frequency measures like this are used when designing keyboard layouts or when determining letter quantities for old-fashioned printing presses. An analysis of entries in the Concise Oxford dictionary, ignoring frequency of word use, gives an order of "EARIOTNSLCUDPMHGBFYWKVXZJQ".
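
    The straddling checkerboard referred to above gives the eight most frequent letters single-digit codes and everything else two digits, so typical text encodes to fewer digits. The sketch below uses a hypothetical layout. It puts the "a sin to err" letters (ASINTOER) in the top row and leaves columns 2 and 6 blank as row prefixes. The two lower rows hold the remaining letters in alphabetical order plus two filler symbols. A real cipher would use a keyed arrangement, and in the VIC cipher the checkerboard was only one stage of a longer process.

```python
import string


def build_checkerboard(top_letters="asintoer", blanks=(2, 6), fillers="./"):
    """Map each symbol to a 1- or 2-digit code (hypothetical layout)."""
    assert len(top_letters) == 8 and len(blanks) == 2
    table = {}
    letters = iter(top_letters)
    # Top row: the eight frequent letters get single digits; two columns stay blank.
    for col in range(10):
        if col not in blanks:
            table[next(letters)] = str(col)
    # The other 18 letters plus 2 fillers fill two rows of ten; each row is
    # prefixed by one of the blank column digits.
    rest = [c for c in string.ascii_lowercase if c not in top_letters] + list(fillers)
    for i, ch in enumerate(rest):
        table[ch] = str(blanks[i // 10]) + str(i % 10)
    return table


def encode(text, table):
    # Symbols not in the table (e.g. spaces) are simply dropped.
    return "".join(table[c] for c in text.lower() if c in table)


def decode(digits, table, blanks=(2, 6)):
    reverse = {code: ch for ch, code in table.items()}
    out, i = [], 0
    while i < len(digits):
        width = 2 if int(digits[i]) in blanks else 1  # blank digit = row prefix
        out.append(reverse[digits[i:i + width]])
        i += width
    return "".join(out)
```

    With this layout, "attack at dawn" encodes to `0550212705220644`. The eight frequent letters cost one digit each, while ⟨c⟩, ⟨k⟩, ⟨d⟩ and ⟨w⟩ cost two. Decoding is unambiguous because a blank-column digit always starts a two-digit code.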

    The letter-frequency table above is taken from Pavel Mička's website, which cites Robert Lewand's Cryptological Mathematics. According to Lewand, arranged from most to least common in appearance, the letters are: etaoinshrdlcumwfgypbvkjxqz. Lewand's ordering differs slightly from others, such as that of the Cornell University Math Explorer's Project, which produced a table after measuring 40,000 words. In English, the space character occurs almost twice as frequently as the top letter (⟨e⟩). The non-alphabetic characters (digits, punctuation, etc.) collectively occupy the fourth position (having already included the space), between ⟨t⟩ and ⟨a⟩.

    == Relative frequencies of the first letters of a word in the English language ==

    The frequency of the first letters of words or names is helpful in pre-assigning space in physical files and indexes. Consider 26 filing cabinet drawers. Rather than a 1:1 assignment of one drawer to one letter of the alphabet, it is often useful to use a more equal-frequency-letter code. Several low-frequency letters are assigned to the same drawer (often one drawer is labeled VWXYZ), and the most frequent initial letters (⟨s, a, c⟩) are split across several drawers (often 6 drawers Aa-An, Ao-Az, Ca-Cj, Ck-Cz, Sa-Si, Sj-Sz). The same system is used in some mult

  • Flo (app)

    Flo is a period-tracking app developed by Flo Health, Inc. that provides menstrual cycle, ovulation and pregnancy tracking, as well as perimenopause symptom tracking. It has over 380 million downloads worldwide and over 70 million monthly active users as of November 2024. In mid-2024, it reached unicorn status, becoming Europe’s first femtech unicorn. The company has been accused of sharing users' sensitive health data with third parties without consent and of misleading its users about its data practices.

    == History ==

    Flo Health, Inc. was co-founded in 2015 by Dmitry and Yuri Gurski in Belarus. Their prior experience with other fitness and health apps helped them build the first version of the software. Dmitry serves as the company's CEO. The company's development hubs are in London, Amsterdam and Vilnius. In 2016, the company raised $1 million in seed funding from Flint Capital and Haxus Venture Fund. In 2017, Flo received an investment of $5 million from Flint Capital and model Natalia Vodianova, with Vodianova helping develop an awareness campaign for the company. In 2018, Flo received an investment of $6 million from Mangrove Capital Partners, with participation from Flint Capital and Haxus, giving the company a valuation of $200 million. In mid-2019, Flo received an additional investment of $7.5 million led by Founders Fund. In 2020, the Federal Trade Commission alleged that, since 2016, Flo had misled users about its sharing of health information with third parties including Google, Facebook, AppsFlyer, and Flurry. These allegations followed a 2019 report by The Wall Street Journal concerning data shared with Facebook. The company reached a settlement in 2021. It was required to notify users of how their personal information had been shared and to obtain permission before sharing any further information. The agreement also required Flo to undertake an independent privacy audit, which it completed in March 2022.
In early September 2021, Flo announced it had closed a $50M Series B round led by VNV Global and Target Global, bringing its total capital raised to $65 million and its valuation to $800M. In March 2024, the Supreme Court of British Columbia certified a class action suit against Flo for sharing intimate data with Facebook and other third parties without user knowledge. In July 2024, Flo announced it had raised more than $200M in Series C financing from General Atlantic, bringing its valuation beyond $1 billion. As of November 2024, the app had over 380 million downloads worldwide and over 70 million monthly active users. In 2025, Flo adopted a data intelligence platform from Databricks to power its analytics and AI features, including personalized cycle predictions for users. In 2025, a class action lawsuit in California was settled for $56 million, with Flo paying $8 million and Google paying $48 million.

    == Features and privacy ==

    Flo was initially created as a period and ovulation tracking application. It now provides reminders of upcoming menstrual cycles and a place to record various other health symptoms, such as contraceptive methods, vaginal discharge (leukorrhea), water intake, pains, mood swings, and sexual activity. The application is available on iOS and Android. Flo is free to download. The free basic version gives users access to period and ovulation tracking and predictions, symptom tracking, cycle history, and anonymous mode. In Pregnancy mode, the app provides tracking features and educational material for pregnancy. In October 2023, Flo launched Flo for Partners, a feature that allows users to share their Flo data with their partner. In September 2022, in response to Roe v. Wade being overturned, Flo accelerated the release of a feature called "Anonymous Mode". Flo said this mode allows users to access the app without any personal identifiers, such as name, email address, or technical identifiers, being associated with their health data.
Flo said it uses a technology called Oblivious HTTP to help protect user privacy in Anonymous Mode.

    == Recognition ==

    Flo was named to Bloomberg’s Top 25 UK Startups to Watch for 2024. Flo's Anonymous Mode feature was recognized on both Fast Company's World Changing Ideas 2023 list and TIME's Best Inventions 2023 list. Flo was a CES 2019 Innovation Awards Honoree in the Software and Mobile Applications category.

  • Unknown key-share attack

    As defined by Blake-Wilson & Menezes (1999), an unknown key-share (UKS) attack is an attack on an authenticated key agreement (AK) or authenticated key agreement with key confirmation (AKC) protocol. In it, an entity A ends up believing she shares a key with B, and although this is in fact the case, B mistakenly believes the key is instead shared with an entity E ≠ A. In other words, in a UKS attack, an opponent, say Eve, coerces honest parties Alice and Bob into establishing a secret key where at least one of Alice and Bob does not know that the secret key is shared with the other. For example, Eve may coerce Bob into believing he shares the key with Eve, while he actually shares the key with Alice. The “key share” with Alice is thus unknown to Bob.
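
    The definition can be illustrated with a deliberately simplified toy in Python. It uses an unauthenticated Diffie-Hellman exchange with small, insecure parameters and models only what each party believes about its peer. The signatures or certificates of a real authenticated protocol are omitted. The sketch therefore shows the outcome of a UKS attack, not how an adversary defeats real authentication. In published attacks the adversary might, for example, get a certificate issued in her own name for Alice's public key. All names and parameters here are assumptions for illustration.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters chosen only for illustration; NOT secure.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def derive_key(shared, identities=None):
    # Hash the DH shared value; optionally bind the parties' identities.
    h = hashlib.sha256(shared.to_bytes(16, "big"))
    for ident in identities or ():
        h.update(ident.encode() + b"\x00")
    return h.hexdigest()


def run_uks(bind_identities):
    """Return True if Alice and Bob end up holding the same key."""
    a, A = keypair()  # Alice's ephemeral secret and public value
    b, B = keypair()  # Bob's ephemeral secret and public value
    # Eve relays Alice's A to Bob claiming it comes from "Eve", and relays
    # Bob's reply B back to Alice unchanged. Eve never learns a or b.
    alice_view = ("Alice", "Bob")  # (initiator, responder) as Alice sees it
    bob_view = ("Eve", "Bob")      # (initiator, responder) as Bob sees it
    k_alice = derive_key(pow(B, a, P), alice_view if bind_identities else None)
    k_bob = derive_key(pow(A, b, P), bob_view if bind_identities else None)
    return k_alice == k_bob
```

    `run_uks(False)` returns `True`. Alice and Bob hold the same key and Alice correctly believes her peer is Bob. Bob, however, believes his peer is Eve, and Eve does not know the key. `run_uks(True)` returns `False`: each side hashes its own view of the identities into the key, so the mismatch surfaces as differing keys. Binding identities into key derivation or key confirmation is one common countermeasure.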

  • Sumazi

    Sumazi was a social media and social intelligence platform for enterprises, brands, and celebrities. Its technology performed social data analysis across social networking services, including Facebook, Twitter and LinkedIn. The analysis identified key people in a user's network who were experts, influencers, or located in a specific area, for use in marketing, advertising or sales campaigns. The company was founded in 2011 by former Sun Microsystems employee Sumaya Kazi and was headquartered in San Francisco, California. It was out of business by 2017.

    == Reception ==

    Sumazi was one of 25 startups selected out of more than 1,200 to compete at TechCrunch Disrupt Startup Battlefield, where it won the Omidyar Network award for the startup "Most Likely to Change the World." Sumazi was profiled in The New York Times as well as USA Today, which commented on the advantages of the startup's location in Silicon Valley. American Express OPEN Forum also featured Sumazi as a "Startup of the Week". Sumazi was also mentioned in articles by Mashable, The Wall Street Journal, Current Editorials, Harvard Business Review, Smashing Magazine, and TechCrunch.

  • Frame (networking)

    A frame is a digital data transmission unit in computer networking and telecommunications. In packet-switched systems, a frame is a simple container for a single network packet. In other telecommunications systems, a frame is a repeating structure supporting time-division multiplexing. A frame typically includes frame synchronization features. These consist of a sequence of bits or symbols that indicate to the receiver the beginning and end of the payload data within the stream of symbols or bits it receives. If a receiver is connected to the system in the middle of a frame transmission, it ignores the data until it detects a new frame synchronization sequence.

    == Packet switching ==

    In the OSI model of computer networking, a frame is the protocol data unit at the data link layer. Frames are the result of the final layer of encapsulation before the data is transmitted over the physical layer. A frame is "the unit of transmission in a link layer protocol, and consists of a link layer header followed by a packet." Each frame is separated from the next by an interframe gap. A frame is a series of bits generally composed of frame synchronization bits, the packet payload, and a frame check sequence. Examples are Ethernet frames, Wi-Fi frames, 4G frames, Point-to-Point Protocol (PPP) frames, Fibre Channel frames, and V.42 modem frames. Often, frames of several different sizes are nested inside each other. Take, for example, the Point-to-Point Protocol (PPP) over asynchronous serial communication. There, the eight bits of each individual byte are framed by start and stop bits, and the payload data bytes in a network packet are framed by the header and footer. Several packets can in turn be framed with frame boundary octets.
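
    The nested PPP framing described above ends with packets delimited by frame boundary octets. A minimal sketch of that last layer follows. It uses the flag (0x7E) and control-escape (0x7D, followed by the byte XOR 0x20) values of PPP's HDLC-like framing from RFC 1662. The sketch is simplified by assumption: it has no address, control or FCS fields, and it does not escape other control characters. The receiver ignores everything before the first flag, as a receiver that joins mid-frame would.

```python
FLAG = 0x7E    # frame boundary octet
ESCAPE = 0x7D  # control escape octet


def frame(payload: bytes) -> bytes:
    """Wrap a payload in flags, escaping any flag/escape bytes inside it."""
    out = bytearray([FLAG])
    for byte in payload:
        if byte in (FLAG, ESCAPE):
            out += bytes([ESCAPE, byte ^ 0x20])
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)


def deframe(stream: bytes) -> list:
    """Return the payloads found in a byte stream.

    Bytes seen before the first flag are ignored, mirroring a receiver that
    joins mid-frame and waits for the next synchronization sequence.
    """
    frames, current = [], bytearray()
    synced = escaped = False
    for byte in stream:
        if byte == FLAG:
            if synced and current:
                frames.append(bytes(current))
            current, synced, escaped = bytearray(), True, False
        elif not synced:
            continue  # not yet synchronized: discard
        elif escaped:
            current.append(byte ^ 0x20)  # undo the escape transformation
            escaped = False
        elif byte == ESCAPE:
            escaped = True
        else:
            current.append(byte)
    return frames
```

    A payload containing 0x7E or 0x7D survives the round trip because those bytes are escaped before transmission. Only the flags ever mark frame boundaries.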

    == Time-division multiplex ==

    In telecommunications, specifically in time-division multiplex (TDM) and time-division multiple access (TDMA) variants, a frame is a cyclically repeated data block. It consists of a fixed number of time slots, one for each logical TDM channel or TDMA transmitter. In this context, a frame is typically an entity at the physical layer. Examples of TDM applications are SONET/SDH and the ISDN circuit-switched B-channel, while a TDMA example is the Circuit Switched Data used in early cellular voice services. The frame is also an entity in time-division duplex, where the mobile terminal may transmit during some time slots and receive during others.
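
    A TDM frame, as described above, is a fixed number of time slots with one slot per channel, repeated cyclically. It can be modeled in a few lines. The sketch treats each channel as a list of samples. The `idle` value is an assumed placeholder for whatever idle pattern a real system transmits in an empty slot.

```python
def multiplex(channels, idle=None):
    """Build TDM frames: frame i holds sample i of every channel, in fixed slot order."""
    n_frames = max((len(ch) for ch in channels), default=0)
    return [
        [ch[i] if i < len(ch) else idle for ch in channels]  # idle fill for empty slots
        for i in range(n_frames)
    ]


def demultiplex(frames, slot):
    """Recover one channel by reading the same slot from every frame."""
    return [frame[slot] for frame in frames]
```

    Because slot positions are fixed, the receiver needs no per-sample addressing: `demultiplex(frames, 1)` recovers the second channel purely from its position in each frame.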

  • Sparrow (chatbot)

    Sparrow is a chatbot developed by the artificial intelligence research lab DeepMind, a subsidiary of Alphabet Inc. It is designed to answer users' questions correctly, while reducing the risk of unsafe and inappropriate answers. One motivation behind Sparrow is to address the problem of language models producing incorrect, biased or potentially harmful outputs. Sparrow is trained using human judgements, in order to be more “Helpful, Correct and Harmless” compared to baseline pre-trained language models. The development of Sparrow involved asking paid study participants to interact with Sparrow, and collecting their preferences to train a model of how useful an answer is. To improve accuracy and help avoid the problem of hallucinating incorrect answers, Sparrow has the ability to search the Internet using Google Search in order to find and cite evidence for any factual claims it makes. To make the model safer, its behaviour is constrained by a set of rules, for example "don't make threatening statements" and "don't make hateful or insulting comments", as well as rules about possibly harmful advice, and not claiming to be a person. During development, study participants were asked to converse with the system and try to trick it into breaking these rules. A 'rule model' was trained on judgements from these participants, which was used for further training. Sparrow was introduced in a paper in September 2022, titled "Improving alignment of dialogue agents via targeted human judgements"; however, the bot was not released publicly. DeepMind CEO Demis Hassabis said DeepMind is considering releasing Sparrow for a "private beta" some time in 2023. == Training == Sparrow is a deep neural network based on the transformer machine learning model architecture. It is fine-tuned from DeepMind's Chinchilla AI pre-trained large language model (LLM), which has 70 billion parameters. 
Sparrow is trained using reinforcement learning from human feedback (RLHF), although some supervised fine-tuning techniques are also used. The RLHF training utilizes two reward models to capture human judgements: a “preference model” that predicts what a human study participant would prefer and a “rule model” that predicts if the model has broken one of the rules. == Limitations == Sparrow's training data corpus is mainly in English, meaning it performs worse in other languages. When adversarially probed by study participants, it breaks the rules 8% of the time, roughly one third as often as the baseline prompted pre-trained model (Chinchilla).
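The description above does not specify the objective used to train Sparrow's preference model. A common way to learn from pairwise human preferences is the Bradley-Terry formulation. In that formulation, the probability that answer A is preferred over answer B is the logistic sigmoid of the difference between their scores, and training minimizes the negative log of that probability for each observed judgement. The sketch below is a toy under stated assumptions. Each candidate answer is reduced to a single learnable scalar score, fitted by plain gradient descent, and the function names and learning settings are hypothetical. In a system like Sparrow the scores would come from a neural network conditioned on the dialogue, and the separate rule model is not shown.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that answer A is preferred over answer B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def preference_loss(score_preferred: float, score_other: float) -> float:
    """Negative log-likelihood of one observed preference, computed stably."""
    margin = score_preferred - score_other
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def fit_scores(num_answers: int, comparisons: list, lr: float = 0.1, steps: int = 500) -> list:
    """Fit one scalar score per answer from (winner, loser) index pairs."""
    scores = [0.0] * num_answers
    for _ in range(steps):
        grads = [0.0] * num_answers
        for winner, loser in comparisons:
            # d(loss)/d(margin) = -(1 - sigmoid(margin)), where margin = s_winner - s_loser
            g = 1.0 - preference_probability(scores[winner], scores[loser])
            grads[winner] -= g
            grads[loser] += g
        scores = [s - lr * gr for s, gr in zip(scores, grads)]
    return scores


if __name__ == "__main__":
    # Hypothetical judgements: answer 0 beat answers 1 and 2; answer 1 beat answer 2.
    print(fit_scores(3, [(0, 1), (0, 2), (1, 2)]))
```

The fitted scores rank answers consistently with the judgements. A reward model trained this way can then score new answers during the reinforcement learning stage.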

  • Communications security

    Communications security is the discipline of preventing unauthorized interceptors from accessing telecommunications in an intelligible form, while still delivering content to the intended recipients. In the North Atlantic Treaty Organization culture, including United States Department of Defense culture, it is often referred to by the abbreviation COMSEC. The field includes cryptographic security, transmission security, emissions security and physical security of COMSEC equipment and associated keying material. COMSEC is used to protect both classified and unclassified traffic on military communications networks, including voice, video, and data. It is used for both analog and digital applications, and both wired and wireless links. Voice over secure internet protocol (VOSIP) has become the de facto standard for securing voice communication, replacing the need for Secure Terminal Equipment (STE) in much of NATO, including the U.S.A. USCENTCOM moved entirely to VOSIP in 2008. == Specialties == Cryptographic security: The component of communications security that results from the provision of technically sound cryptosystems and their proper use. This includes ensuring message confidentiality and authenticity. Emission security (EMSEC): The protection resulting from all measures taken to deny unauthorized persons information of value that might be derived from intercepts of communications systems and cryptographic equipment, and from the interception and analysis of compromising emanations from cryptographic equipment, information systems, and telecommunications systems. Transmission security (TRANSEC): The component of communications security that results from the application of measures designed to protect transmissions from interception and exploitation by means other than cryptanalysis (e.g. frequency hopping and spread spectrum). 
Physical security: The component of communications security that results from all physical measures necessary to safeguard classified equipment, material, and documents from access thereto or observation thereof by unauthorized persons.
== Related terms ==
ACES – Automated Communications Engineering Software
AEK – Algorithmic Encryption Key
AKMS – the Army Key Management System
CCI – Controlled Cryptographic Item: equipment which contains embedded COMSEC devices
CT3 – Common Tier 3
DTD – Data Transfer Device
ICOM – Integrated COMSEC, e.g. a radio with built-in encryption
KEK – Key Encryption Key
KG-30 – family of COMSEC equipment
KOI-18 – Tape Reader General Purpose
KPK – Key production key
KYK-13 – Electronic Transfer Device
KYX-15 – Electronic Transfer Device
LCMS – Local COMSEC Management Software
OTAR – Over the Air Rekeying
OWK – Over the Wire Key
SKL – Simple Key Loader
SOI – Signal operating instructions
STE – Secure Terminal Equipment (secure phone)
STU-III – obsolete secure phone, replaced by STE
TED – Trunk Encryption Device such as the WALBURN/KG family
TEK – Traffic Encryption Key
TPI – Two-person integrity
TSEC – Telecommunications Security (sometimes erroneously referred to as transmission security or TRANSEC)
Types of COMSEC equipment:
Authentication equipment
Crypto equipment: Any equipment that embodies cryptographic logic or performs one or more cryptographic functions (key generation, encryption, and authentication).
Crypto-ancillary equipment: Equipment designed specifically to facilitate efficient or reliable operation of crypto-equipment, without performing cryptographic functions itself.
Crypto-production equipment: Equipment used to produce or load keying material.
== DoD Electronic Key Management System == The Electronic Key Management System (EKMS) is a United States Department of Defense (DoD) key management, COMSEC material distribution, and logistics support system. 
The National Security Agency (NSA) established the EKMS program to supply electronic key to COMSEC devices in a secure and timely manner, and to provide COMSEC managers with an automated system capable of ordering, generation, production, distribution, storage, security accounting, and access control. The Army's platform in the four-tiered EKMS, AKMS, automates frequency management and COMSEC management operations. It eliminates paper keying material and hardcopy Signal operating instructions (SOI), and saves the time and resources required for courier distribution. It has four components: LCMS provides automation for the detailed accounting required for every COMSEC account, and electronic key generation and distribution capability. ACES is the frequency management portion of AKMS. ACES has been designated by the Military Communications Electronics Board as the joint standard for use by all services in development of frequency management and crypto-net planning. CT3 with DTD software is a fielded, ruggedized hand-held device that handles, views, stores, and loads SOI, Key, and electronic protection data. DTD provides an improved net-control device to automate crypto-net control operations for communications networks employing electronically keyed COMSEC equipment. SKL is a hand-held PDA that handles, views, stores, and loads SOI, Key, and electronic protection data. == Key Management Infrastructure (KMI) Program == KMI is intended to replace the legacy Electronic Key Management System to provide a means for securely ordering, generating, producing, distributing, managing, and auditing cryptographic products (e.g., asymmetric keys, symmetric keys, manual cryptographic systems, and cryptographic applications). This system is currently being fielded by Major Commands and variants will be required for non-DoD Agencies with a COMSEC Mission.
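Transmission security (TRANSEC), described above, includes measures such as frequency hopping. In frequency hopping, both ends of a link change channel on a schedule that an interceptor cannot predict without the key. Distributing that key is the kind of job that systems like EKMS and KMI handle. The following is a minimal sketch, assuming both radios share a secret key and a synchronized slot counter. Each slot's channel is derived from a keyed pseudorandom function, here HMAC-SHA256 from Python's standard library. The function names and the example key are hypothetical. Fielded COMSEC equipment uses its own algorithms, so this only illustrates the principle. The final modulo step introduces a slight bias, which is acceptable for illustration.

```python
from __future__ import annotations

import hashlib
import hmac


def hop_channel(key: bytes, slot: int, num_channels: int) -> int:
    """Pick the channel for one time slot from a keyed pseudorandom function."""
    digest = hmac.new(key, slot.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_channels  # slight modulo bias


def hop_sequence(key: bytes, first_slot: int, length: int, num_channels: int) -> list[int]:
    """Channels for consecutive slots; both ends compute the same list from the key."""
    return [hop_channel(key, s, num_channels) for s in range(first_slot, first_slot + length)]


if __name__ == "__main__":
    shared_key = b"example-shared-key"  # hypothetical; real keys come from key management
    print(hop_sequence(shared_key, first_slot=0, length=10, num_channels=79))
```

Because the channel depends only on the key and the slot number, a receiver that joins late can compute the current channel directly without replaying earlier hops.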

  • Copyright

    A copyright is a type of intellectual property that gives its owner the exclusive legal right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, educational, or musical form. Copyright is intended to protect the original expression of an idea in the form of a creative work, but not the idea itself. A copyright is subject to limitations based on public interest considerations, such as the fair use doctrine in the United States and the fair dealing doctrine in the United Kingdom. Some jurisdictions require "fixing" copyrighted works in a tangible form. It is often shared among multiple authors, each of whom holds a set of rights to use or license the work, and who are commonly referred to as rights holders. These rights normally include reproduction, control over derivative works, distribution, public performance, and moral rights such as attribution. Copyrights can be granted by public law and are in that case considered "territorial rights". This means that copyrights granted by the law of a certain state do not extend beyond the territory of that specific jurisdiction. Copyrights of this type vary by country; many countries, and sometimes a large group of countries, have made agreements with other countries on procedures applicable when works "cross" national borders or national rights are inconsistent. Typically, the public law duration of a copyright expires 50 to 100 years after the creator dies, depending on the jurisdiction. Some countries require certain copyright formalities to establish copyright, while others recognize copyright in any completed work, without a formal registration. When the copyright of a work expires, it enters the public domain. == History == === Background === The concept of copyright developed after the printing press came into use in Europe in the 15th and 16th centuries. It was associated with a common law and rooted in the civil law system. 
The printing press made it much cheaper to produce works, but as there was initially no copyright law, anyone could buy or rent a press and print any text. Popular new works were immediately re-set and re-published by competitors, so printers needed a constant stream of new material. Fees paid to authors for new works were high and significantly supplemented the incomes of many academics. Printing brought profound social changes. The rise in literacy across Europe led to a dramatic increase in the demand for reading matter. Prices of reprints were low, so publications could be bought by poorer people, creating a mass audience. In German-language markets before the advent of copyright, technical materials, like academic papers and handbooks, were inexpensive and widely available; it has been suggested this contributed to Germany's industrial and economic success. === Conception === The concept of copyright first developed in England. In reaction to the printing of "scandalous books and pamphlets", the English Parliament passed the Licensing of the Press Act 1662, which required all intended publications to be registered with the government-approved Stationers' Company, giving the Stationers the right to regulate what material could be printed. The Statute of Anne, enacted in 1710 in England and Scotland, provided the first legislation to protect copyrights (but not authors' rights). The Copyright Act 1814 extended more rights for authors but did not protect British publications from being reprinted in the US. The Berne International Copyright Convention of 1886 finally provided protection for authors among the countries who signed the agreement, although the US did not join the Berne Convention until 1989. In the US, the Constitution grants Congress the right to establish copyright and patent laws. Shortly after the Constitution was passed, Congress enacted the Copyright Act of 1790, modeling it after the Statute of Anne. 
While the national law protected authors' published works, authority was granted to the states to protect authors' unpublished works. The most recent major overhaul of copyright in the US, the Copyright Act of 1976, extended federal copyright to works as soon as they are created and "fixed", without requiring publication or registration. State law continues to apply to unpublished works that are not otherwise copyrighted by federal law. This act also changed the calculation of copyright term from a fixed term (then a maximum of fifty-six years) to "life of the author plus 50 years". These changes brought the US closer to conformity with the Berne Convention, and in 1989 the United States further revised its copyright law and joined the Berne Convention officially. Copyright laws allow products of creative human activities, such as literary and artistic production, to be preferentially exploited and thus incentivized. Different cultural attitudes, social organizations, economic models and legal frameworks are seen to account for why copyright emerged in Europe and not, for example, in Asia. In the Middle Ages in Europe, there was generally a lack of any concept of literary property due to the general relations of production, the specific organization of literary production and the role of culture in society. The latter refers to the tendency of oral societies, such as that of Europe in the medieval period, to view knowledge as the product and expression of the collective, rather than to see it as individual property. However, with copyright laws, intellectual production comes to be seen as a product of an individual, with attendant rights. The most significant point is that patent and copyright laws support the expansion of the range of creative human activities that can be commodified. This parallels the ways in which capitalism led to the commodification of many aspects of social life that earlier had no monetary or economic value per se. 
Copyright has developed into a concept that has a significant effect on nearly every modern industry, including not just literary work, but also forms of creative work such as sound recordings, films, photographs, software, and architecture. === National copyrights === Often seen as the first real copyright law, the 1709 British Statute of Anne gave authors, and the publishers to whom they chose to license their works, the right to publish the author's creations for a fixed period, after which the copyright expired. It was "An Act for the Encouragement of Learning, by Vesting the Copies of Printed Books in the Authors or the Purchasers of such Copies, during the Times therein mentioned." The act also alluded to individual rights of the artist. It began: "Whereas Printers, Booksellers, and other Persons, have of late frequently taken the Liberty of Printing ... Books, and other Writings, without the Consent of the Authors ... to their very great Detriment, and too often to the Ruin of them and their Families:". A right to benefit financially from the work is articulated, and court rulings and legislation have recognized a right to control the work, such as ensuring that its integrity is preserved. An irrevocable right to be recognized as the work's creator appears in some countries' copyright laws. The Copyright Clause of the United States Constitution (1787) authorized copyright legislation: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries." That is, by guaranteeing authors a period of time in which they alone could profit from their works, they would be enabled and encouraged to invest the time required to create them, and this would be good for society as a whole. 
A right to profit from the work has been the philosophical underpinning for much legislation extending the duration of copyright, to the life of the creator and beyond, to their heirs. Yet scholars like Lawrence Lessig have argued that copyright terms have been extended beyond the scope imagined by the Framers. Lessig refers to the Copyright Clause as the "Progress Clause" to emphasize the social dimension of intellectual property rights. The original length of copyright in the United States was 14 years, and it had to be explicitly applied for. If the author wished, they could apply for a second 14‑year monopoly grant, but after that the work entered the public domain, so it could be used and built upon by others. === Continental law === In many jurisdictions of the European continent, legal concepts comparable to copyright existed from the 16th century on but changed under Napoleonic rule into another legal concept: authors' rights or creator's rights laws, from the French droits d'auteur and the German Urheberrecht. Many modern-day publications mix the terms copyright and authors' rights or use them as translations of each other, but in a juridical sense the legal concepts differ essentially. Authors' rights are, generally speaking,

  • Social Media (Age-Restricted Users) Bill

    The Social Media (Age-Restricted Users) Bill is a member's bill by National Party Member of Parliament Catherine Wedd that seeks to ban children under the age of 16 years from accessing social media by forcing social media companies to implement age verification measures. It is modelled after the Australian government's Online Safety Amendment. In mid-October 2025, the New Zealand Parliament confirmed plans to introduce the social media age restriction bill. == Background == In late November 2024, the Albanese government of Australia, with support from the opposition Coalition parties, passed the Online Safety Amendment creating a world-first age verification regime targeting social media platforms operating in the country. The ban targets several social media platforms including Facebook, Instagram, Kick, Reddit, Snapchat, Threads, TikTok, Twitch, X (formerly Twitter) and YouTube. These platforms were required to implement age verification systems and to remove under-age users by 10 December 2025, when the law change came into effect. == Draft provisions == The draft Social Media (Age-Restricted Users) Bill defines social media platforms as electronic platforms that enable social media interactions between two or more end-users, facilitate communication between multiple end-users and allow users to post content on the platform. The proposed bill requires social media companies to take action to prevent users under the age of 16 from creating accounts on their platforms. It also creates a framework for courts to impose fines on platforms that fail to take reasonable steps to prevent underage users from accessing the platform. == Legislative history == === Draft legislation === On 6 May 2025, Wedd announced a private member's bill called the "Social Media (Age-Restricted Users) Bill" that would bar access to social media platforms for people under the age of 16 years. 
She said that, as the mother of four children, she was motivated by a wish to support the efforts of families, parents and teachers to manage their children's online exposure, and by the passage of the Australian Online Safety Amendment legislation in December 2024. Since National's coalition partner ACT New Zealand had refused to support the bill, the Sixth National Government announced it as a member's bill rather than a government bill. Prime Minister Christopher Luxon has confirmed that National would seek cross-party support for the legislation. ACT MP and Minister of Internal Affairs Brooke van Velden said that the Government would watch the implementation of the Australian social media age restriction policy. In October 2025, Wedd's bill was drawn from the parliamentary ballot. In addition, Labour MP Reuben Davidson drafted a similar member's bill that would hold social media providers responsible for restricting "harmful content" and impose NZ$50,000 fines for non-compliance. In November 2025, Luxon reiterated his support for social media age restriction legislation and said the New Zealand government would introduce a bill in 2026 before the 2026 New Zealand general election. He also confirmed that Education Minister Erica Stanford was leading an investigation into what lessons could be learnt from the Australian legislation. At the request of ACT MP Parmjeet Parmar, Parliament's Education and Workforce Committee held an inquiry into a proposed social media ban in early October 2025. The committee was led by National MP Carl Bates and received 430 submissions from 400 groups and individuals. The committee also heard 87 in-person submissions. 
On 10 December 2025, the committee made 12 recommendations including restricting social media access to persons under the age of 16, re-evaluating existing legislation such as the Films, Videos, and Publications Classification Act and the Harmful Digital Communications Act 2015, and regulating online platforms and Internet service providers. The ACT party released a dissenting view disagreeing with the need for a law restricting social media access to under-16-year-olds. In mid-May 2026, the Government confirmed that work on the proposed bill to ban under-16-year-olds from social media had been paused. The New Zealand Parliament held a debate on the proposed bill on 13 May following a select committee inquiry into the harms caused by social media platforms. While the opposition Labour Party has agreed to support the member's bill, the ACT and Green parties opposed the proposed bill on the grounds that the rules were easy to circumvent, that at-risk groups could become more isolated, and that social media also harmed other age groups. == Responses == === Academia and civil society === In late July 2025, the New Zealand Council for Civil Liberties (NZCCL) expressed concern that the proposed social media age restriction could infringe upon the New Zealand Bill of Rights Act 1990, the Privacy Act 2020 and the United Nations' Convention on the Rights of the Child. The NZCCL also questioned the practicality of age verification software and of a social media age limit, and whether the restriction would fulfil its stated goal of combating online harm. In August 2025, University of Auckland criminologist and senior lecturer Claire Meehan expressed concern that the social media age restriction legislation would cut children off from their friendship and support networks. She also said that children and young people were digital natives who could use VPNs to circumvent the ban. 
Similar sentiments were echoed by Victoria University of Wellington media and communications lecturer Alex Beattie and "Ocean Today" Instagram social media influencer "Charlie." In October 2025, New Zealand Initiative representative Dr Eric Crampton expressed concern that a social media age restriction would involve the introduction of digital IDs. He argued that a new law was unnecessary and said that parents could limit their children's exposure to social media via Google's Family Link and Apple's equivalent. Similarly, Institute of Economic Affairs public policy fellow Matthew Lesh and the British Free Speech Union expressed concerns that young people could use VPNs to circumvent a social media ban, citing the spike in VPN usage in the United Kingdom following the passage of the Online Safety Act 2023. The advocacy group B416's co-chair Anna Curzon advocated for a social media ban on underage users, stating that social media apps "are made to be addictive" and made it difficult for parents to relate to their children. In late November 2025, B416's co-founder Anna Mowbray expressed support for the Government's social media age restriction bill but expressed disappointment that Luxon had not timed his announcement with the launch of the group's campaign. Generation-Z Aotearoa co-founder Lola Fisher has called on the New Zealand Government to consult with young people on the development of the legislation. === Government agencies and departments === In early October 2025, Privacy Commissioner Michael Webster expressed concern that social media platforms requiring users to prove their age via digital IDs could raise privacy concerns. Webster suggested that age verification systems could rely on various documents including passports. He said that age estimation technologies had high error rates and that age inference technologies relied on data mining. 
=== Political parties === In early May 2025, the National Party government expressed support for social media age restriction legislation. By contrast, its coalition partner ACT has opposed such legislation. ACT leader David Seymour described the ban as hasty and unworkable since it did not involve parents. Meanwhile, New Zealand First leader Winston Peters expressed support for a social media age restriction but said the bill should be subject to a select committee inquiry. The opposition Labour Party leader Chris Hipkins has expressed interest in social media age restriction legislation but emphasised the need for consensus. Green Party co-leader Chlöe Swarbrick said she wanted to learn more about the bill but described it as simplistic. Fellow Greens co-leader Marama Davidson said that the proposed bill would punish children and young people for the harm caused by big tech platforms. === Tech companies === In early October 2025, representatives of TikTok and Meta Platforms cautioned against the proposed social media ban on under-16-year-olds. During a one-day parliamentary inquiry, Ella Woods-Joyce, TikTok's public policy lead for Australia and New Zealand, and Mia Garlick, Meta's regional director of policy, expressed concern that the social media age restriction could send children and young people to less regulated online spaces. Woods-Joyce highlighted TikTok's policy of closing down accounts belonging to users under the age of 13 years while Garlick highlighted Meta's policy of placing users under the age of 16 in private accounts by default. In early February 2026 Meta's vice president and global head of safety, Antigone Da

  • Super app

    A super app or super-app (also known as an everything app) is a mobile or web application that can provide multiple services, including payment and instant messaging services, effectively becoming an all-encompassing, self-contained commerce and communication online platform that embraces many aspects of personal and commercial life. Notable examples of super apps include Tencent's WeChat in China, Tata Neu in India, Grab in Southeast Asia and Max in Russia. For end users, a super app is an application that provides a set of core features while also giving access to independently developed miniapps. For app developers, a super app is an application integrated with the capabilities of platforms and ecosystems that allows third parties to develop and publish miniapps. == History == The super app term was first used to describe WeChat when it combined the instant messaging service with the digital wallet function. Recognition of WeChat as a super app stems from its combination of messaging, payments, e-commerce, and much more within a single application, making it indispensable for many users. WeChat's establishment of the super app model has led companies like Meta to try to build similar applications outside of China. In India, Tata Group developed a super app named Tata Neu. Major Indian companies like Paytm, PhonePe, and ITC Maars also have apps in development that might constitute super apps. In Southeast Asia, Grab and Gojek lay claim to the super app classification despite lacking many of the features offered by WeChat. Similarly, growth-stage companies like Shopee, Traveloka, and AirAsia have expanded the range of services offered by their respective applications. == Notable examples == === Alipay === Alipay is a third-party mobile and online payment platform established in Hangzhou, China in February 2004 by Alibaba Group and its founder Jack Ma. 
It operates in association with Ant Group, an affiliate company of the Chinese Alibaba Group. === Gojek === Gojek is an Indonesian on-demand multiservice digital platform and fintech payment super app. Established in Jakarta in 2010 as a call center connecting consumers to courier delivery and two-wheeled ride-hailing services, it launched its mobile app in 2015 with four services (GoRide, GoSend, GoShop, and GoFood) and has since expanded to offer over 20 services. In 2021, it merged with another Indonesian unicorn, Tokopedia, forming the decacorn GoTo Gojek Tokopedia. === Grab === Grab is a Southeast Asian technology company headquartered in Singapore and Indonesia. Founded in 2012 as the MyTeksi app in Kuala Lumpur, Malaysia, it expanded the following year as GrabTaxi, before moving its headquarters to Singapore in 2014 and rebranding officially as Grab. In addition to ride-hailing and transportation services, the company's mobile app also offers food delivery and digital payment services. === Max === Max is a messenger from the Russian company VK, positioned as a super app. The application combines messaging, calls, and channels features with the integration of additional services: payments, miniapps, taxi ordering, deliveries, and other everyday services are available within a single interface. The goal is to unite communication and routine tasks in a unified ecosystem. === Tata Neu === Tata Neu is a multipurpose super app, developed in India by the Tata Group. It is the country's first super app. The app was launched to coincide with the start of a 2022 Indian Premier League cricket match. === WeChat === WeChat is a Chinese multipurpose instant messaging, social media and mobile payment app. First released in 2011, it became the world's largest standalone mobile app in 2018, with over 1 billion monthly active users. 
WeChat provides text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video conferencing, video games, the sharing of photographs and videos and location sharing. === X === X is an American social network, originally known as Twitter from its launch through 2023. Prior to his acquisition of the service, new owner Elon Musk stated that he planned for Twitter to become an "everything app" known as "X"; in 2023, the service added an AI chatbot known as "Grok" as well as integrated job search tools known as "X Hiring". In January 2025, X announced its intent to offer a digital wallet service in the future. Later in the year, X revamped its direct messaging system as "Chat". == Criticism == Although apps that fit the super app classification can offer users a wider variety of services in comparison to single-purpose alternatives, internet regulators in regions such as the US and Europe have become more concerned about the overall power of the technology industry and have become more critical of companies developing such apps. In China, WeChat and other local firms have been ordered to open up their platforms to rivals by local regulators. There are also reports that suggest it might be difficult to replicate WeChat's super app model. This stems partly from the peaking of smartphone penetration rates in many regions worldwide, which has led to overcrowded app stores and tighter restrictions on targeted advertising as regulators assert more control over the companies. From a technical viewpoint, single-purpose apps are comparatively faster, more responsive and easier to navigate than super apps, which helps improve the overall user experience. Super-apps are also likelier to store larger amounts of personal data to facilitate the delivery of their services, so users run a greater risk of becoming victims of severe data breaches. In 2020, this unfolded with Tokopedia, which had the data of 91 million of its users stolen and shared by crackers. 
It has also been noted that a user who loses access to their account, or is banned from a super app, generally loses access to multiple real-life services and digital applications at once; the Chinese government has used this approach to penalize people who shared photos of the Sitong Bridge protest.

  • Data deduplication

    Data deduplication

    In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent. The deduplication process compares data 'chunks' (also known as 'byte patterns'), which are unique, contiguous blocks of data. These chunks are identified and stored during a process of analysis, and compared to other chunks within existing data. Whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced. A related technique is single-instance (data) storage, which replaces multiple copies of content at the whole-file level with a single shared copy. Although it can be combined with other forms of data compression and deduplication, single-instance storage is distinct from newer approaches to data deduplication, which can operate at the segment or sub-block level. Deduplication is also different from data compression algorithms such as LZ77 and LZ78. Whereas compression algorithms identify redundant data inside individual files and encode it more efficiently, deduplication inspects large volumes of data to find large sections, such as entire files or large sections of files, that are identical, and replaces them with a shared copy. == Functioning principle == For example, a typical email system might contain 100 instances of the same 1 MB (megabyte) file attachment.
Each time the email platform is backed up, all 100 instances of the attachment are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; subsequent instances are referenced back to the saved copy, for a deduplication ratio of roughly 100 to 1. Deduplication is often paired with data compression for additional storage savings: deduplication is first used to eliminate large chunks of repetitive data, and compression is then used to efficiently encode each of the stored chunks. In computer code, deduplication is done by, for example, storing information in variables so that it does not have to be written out individually but can be changed all at once in a central, referenced location. Examples are CSS classes and named references in MediaWiki. == Benefits == Storage-based data deduplication reduces the amount of storage needed for a given set of files. It is most effective in applications where many copies of very similar or even identical data are stored on a single disk. In the case of data backups, which are routinely performed to protect against data loss, most data in a given backup remains unchanged from the previous backup. Common backup systems try to exploit this by omitting (or hard linking) files that haven't changed or by storing differences between files. Neither approach captures all redundancies, however. Hard linking does not help with large files that have changed only in small ways, such as an email database; differences only find redundancies in adjacent versions of a single file (consider a section that was deleted and later added back, or a logo image included in many documents). In-line network data deduplication is used to reduce the number of bytes that must be transferred between endpoints, which can reduce the amount of bandwidth required. See WAN optimization for more information.
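The chunk-and-reference process described above, together with the practice of compressing each stored chunk after deduplication, can be sketched in Python. This is a minimal illustration, not how any particular product works: it assumes fixed-size chunks, SHA-256 digests as chunk identifiers, an in-memory dictionary as the chunk index, and zlib for per-chunk compression. The class name `DedupStore` and its methods are hypothetical.

```python
from __future__ import annotations

import hashlib
import os
import zlib


class DedupStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its SHA-256 digest.

    Assumptions (illustrative only): fixed-size chunks, an in-memory dict as the
    chunk index, and zlib compression applied to each chunk that survives dedup.
    """

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # digest -> compressed unique chunk
        self.logical_bytes = 0  # bytes callers asked to store
        self.unique_bytes = 0   # bytes of unique chunks, before compression

    def put(self, data: bytes) -> list[str]:
        """Store data; return its list of chunk references (its "recipe")."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First occurrence: deduplicate first, then compress what is kept.
                self.chunks[digest] = zlib.compress(chunk)
                self.unique_bytes += len(chunk)
            recipe.append(digest)  # a repeat costs only this small reference
        self.logical_bytes += len(data)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Rebuild the original bytes by following the references."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)

    def dedup_ratio(self) -> float:
        return self.logical_bytes / self.unique_bytes if self.unique_bytes else 1.0


# Email example: 100 copies of the same 1 MB attachment.
store = DedupStore()
attachment = os.urandom(1_000_000)  # random bytes stand in for the attachment
recipes = [store.put(attachment) for _ in range(100)]
assert store.get(recipes[99]) == attachment
print(store.logical_bytes, store.unique_bytes, store.dedup_ratio())
```

With 100 copies of the same attachment, callers write 100 MB of logical data while the store keeps only about 1 MB of unique chunks, matching the roughly 100-to-1 ratio in the email example. Fixed-size chunking is used here only because it is the simplest scheme to read.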
Virtual servers and virtual desktops benefit from deduplication because it allows nominally separate system files for each virtual machine to be coalesced into a single storage space. At the same time, if a given virtual machine customizes a file, deduplication will not change the files on the other virtual machines, something that alternatives such as hard links or shared disks do not offer. Backing up or making duplicate copies of virtual environments is similarly improved. == Classification == === Post-process versus in-line deduplication === Deduplication may occur "in-line", as data is flowing, or "post-process", after it has been written. With post-process deduplication, new data is first stored on the storage device, and a process later analyzes the data looking for duplication. The benefit is that there is no need to wait for the hash calculations and lookup to be completed before storing the data, which ensures that write performance is not degraded. Implementations offering policy-based operation can give users the ability to defer optimization on "active" files, or to process files based on type and location. One potential drawback is that duplicate data may be unnecessarily stored for a short time, which can be problematic if the system is nearing full capacity. Alternatively, deduplication hash calculations can be done in-line, synchronized as data enters the target device. If the storage system identifies a block that it has already stored, only a reference to the existing block is stored, rather than the whole new block. The advantage of in-line deduplication over post-process deduplication is that it requires less storage and network traffic, since duplicate data is never stored or transferred. On the negative side, hash calculations may be computationally expensive, thereby reducing storage throughput.
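The storage trade-off between post-process and in-line deduplication can be illustrated with a toy model. The sketch below assumes a hypothetical 4-byte block size, SHA-256 block identifiers, and a Python dictionary standing in for the storage device; it models only how many bytes are held before deduplication finishes, not hashing cost or throughput. `inline_write` and `post_process_write` are illustrative names.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny so the example stays readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_write(stream: bytes) -> tuple[dict[str, bytes], int]:
    """In-line: hash each block before writing it and store only unseen blocks.

    Returns the final store and the peak number of bytes held on the "device".
    """
    store: dict[str, bytes] = {}
    held = 0
    for block in split_blocks(stream):
        digest = hashlib.sha256(block).hexdigest()  # hashing sits on the write path
        if digest not in store:
            store[digest] = block
            held += len(block)
    return store, held  # usage only grows, so the final size is also the peak


def post_process_write(stream: bytes) -> tuple[dict[str, bytes], int]:
    """Post-process: land every block as-is, then deduplicate in a later pass."""
    landed = split_blocks(stream)         # duplicates are written too
    peak = sum(len(b) for b in landed)    # everything is held before the pass runs
    store: dict[str, bytes] = {}
    for block in landed:                  # the later background pass
        store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store, peak


stream = b"ABCD" * 6 + b"WXYZ"  # 28 bytes, only 2 distinct blocks
inline_store, inline_peak = inline_write(stream)
post_store, post_peak = post_process_write(stream)
assert inline_store == post_store  # same end state either way
print(inline_peak, post_peak)  # 8 28
```

Both approaches end with the same 8 bytes of unique data, but the post-process writer briefly holds all 28 bytes, which is the temporary over-storage described above; in exchange, it keeps the hash lookup off the write path.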
However, some vendors have demonstrated equipment that performs in-line deduplication at high rates. The relative merits of post-process and in-line deduplication are often heavily debated. === Data formats === The SNIA Dictionary identifies two methods: content-agnostic data deduplication, a method that does not require awareness of specific application data formats, and content-aware data deduplication, a method that leverages knowledge of specific application data formats. === Source versus target deduplication === Another way to classify data deduplication methods is according to where they occur. Deduplication occurring close to where data is created is referred to as "source deduplication". When it occurs near where the data is stored, it is called "target deduplication". Source deduplication ensures that data on the data source is deduplicated. This generally takes place directly within a file system. The file system periodically scans new files, creates their hashes, and compares them to the hashes of existing files. When files with the same hashes are found, the file copy is removed and the new file points to the old file. Unlike hard links, however, duplicated files are considered separate entities: if one of them is later modified, a mechanism called copy-on-write creates a copy of that changed file or block. The deduplication process is transparent to users and backup applications. Backing up a deduplicated file system will often cause duplication to occur, resulting in backups that are bigger than the source data. Source deduplication can be declared explicitly for copying operations, as no calculation is needed to know that the copied data is in need of deduplication. This leads to a new form of link on file systems, called a reference-counted link, or reflink, in some systems (e.g. Linux), or a cloned file on macOS, where one or more inodes (file information entries) are made to share some or all of their data. It is named analogously to hard links, which work at the inode level, and symbolic links, which work at the filename level. The individual entries have copy-on-write behavior that is non-aliasing, i.e. changing one copy afterwards does not affect the other copies. Microsoft's ReFS also supports this operation. Target deduplication is the process of removing duplicates when the data was not generated at that location. An example would be a server connected to a SAN/NAS: the SAN/NAS is a target for the server (target deduplication). The server is not aware of any deduplication and is also the point of data generation. A second example is backup, where the target is generally a backup store such as a data repository or a virtual tape library. === Deduplication methods === One of the most common forms of data deduplication works by comparing chunks of data to detect duplicates. To do this, the software assigns each chunk of data an identifier, typically calculated with a cryptographic hash function. Many implementations assume that if the identifiers are identical, the data is identical, even though this cannot be true in all cases due to the pigeonhole principle (a fixed-length identifier has fewer possible values than there are possible chunks); other implementations do not as
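The pigeonhole concern can be made concrete with a deliberately weak identifier. The sketch below assumes a hypothetical 8-bit fingerprint (the first byte of a SHA-256 digest), so any set of more than 256 distinct chunks must contain collisions. A store keyed on the fingerprint alone then silently merges different chunks; the `VerifyingStore` class (an illustrative name) instead compares the actual bytes whenever fingerprints match and keeps colliding chunks separately. Real systems use far longer hashes, which makes collisions extremely unlikely but not impossible.

```python
from __future__ import annotations

import hashlib


def tiny_fingerprint(chunk: bytes) -> int:
    """Hypothetical 8-bit identifier: only 256 possible values."""
    return hashlib.sha256(chunk).digest()[0]


class VerifyingStore:
    """Chunk store that does not trust identifiers alone: when identifiers match,
    it compares the actual bytes before treating two chunks as duplicates."""

    def __init__(self) -> None:
        # identifier -> distinct chunks that share that identifier
        self.buckets: dict[int, list[bytes]] = {}

    def put(self, chunk: bytes) -> tuple[int, int]:
        """Store a chunk and return a reference (identifier, position in bucket)."""
        fp = tiny_fingerprint(chunk)
        bucket = self.buckets.setdefault(fp, [])
        for pos, existing in enumerate(bucket):
            if existing == chunk:  # byte-for-byte verification
                return fp, pos
        bucket.append(chunk)  # same identifier, different data: keep both
        return fp, len(bucket) - 1

    def get(self, ref: tuple[int, int]) -> bytes:
        fp, pos = ref
        return self.buckets[fp][pos]


chunks = [f"chunk-{n}".encode() for n in range(300)]  # 300 distinct chunks

# Naive store keyed on the identifier alone: a later chunk whose identifier is
# already present is wrongly treated as a duplicate and its data is dropped.
naive: dict[int, bytes] = {}
for c in chunks:
    naive.setdefault(tiny_fingerprint(c), c)
print(len(naive), "chunks kept by the naive store out of", len(chunks))

store = VerifyingStore()
refs = [store.put(c) for c in chunks]
assert all(store.get(r) == c for r, c in zip(refs, chunks))  # nothing lost
```

The extra byte comparison only runs when identifiers match, so its cost is small when the identifier is long; the trade-off is that the stored chunk must be read back to verify each apparent duplicate.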

  • Blacker (security)

    Blacker (security)

    Blacker (styled BLACKER) is a U.S. Department of Defense computer network security project designed to achieve an A1 class rating (very high assurance) under the Trusted Computer System Evaluation Criteria (TCSEC). The first Blacker program began in the late 1970s, with a follow-on eventually producing fielded devices in the late 1980s. It was the first secure system with trusted end-to-end encryption on the United States' Defense Data Network. The project was implemented by SDC (software) and Burroughs (hardware) and, after their merger, by the resulting company, Unisys.
