Facebook is an American social networking service owned by the American technology conglomerate Meta Platforms. It was founded in 2004 by Mark Zuckerberg, along with his Harvard College roommates and fellow students Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes. The name Facebook derives from the face book directories often given to American university students. The service was initially limited to Harvard students before gradually expanding to other universities in North America. Since 2006, Facebook has permitted registration by individuals aged 13 and older, with the exception of South Korea, Spain, and Quebec, where the minimum age is 14. As of December 2023, Facebook reported approximately 3.07 billion monthly active users worldwide. As of July 2025, it was ranked as the third-most-visited website in the world, with 23 percent of its traffic originating from the United States. It was the most downloaded mobile application of the 2010s. Facebook can be accessed from devices with Internet connectivity, such as personal computers, tablets and smartphones. After registering, users can create a profile revealing personal information about themselves. They can post text, photos and multimedia which are shared either publicly or exclusively with other users who have agreed to be their friend, depending on privacy settings. Users can also communicate directly with each other with Messenger, edit messages (within 15 minutes after sending), join common-interest groups, and receive notifications on the activities of their Facebook friends and the pages they follow. Facebook has often been criticized over issues such as user privacy (as with the Facebook–Cambridge Analytica data scandal), political manipulation (as with the 2016 U.S. elections) and mass surveillance. The company has also been subject to criticism over its psychological effects such as addiction and low self-esteem, and over content such as fake news, conspiracy theories, copyright infringement, and hate speech. Commentators have accused Facebook of willingly facilitating the spread of such content, as well as overemphasizing its number of users to appeal to advertisers. == History == The history of Facebook traces its growth from a college networking site to a global social networking service. While attending Phillips Exeter in the early 2000s, Zuckerberg met Kris Tillery. Tillery, a one-time project collaborator with Zuckerberg, would create a school-based social networking project called Photo Address Book. Photo Address Book was a digital face book, created through a linked database composed of student information derived from the official records of the Exeter Student Council. The database contained linkages such as name, dorm-specific landline numbers, and student headshots. Mark Zuckerberg built a website called "Facemash" in 2003 while attending Harvard University. The site was comparable to Hot or Not and used photos from online face books, asking users to choose the 'hotter' person". Zuckerberg was reported and faced expulsion, but the charges were dropped. A "face book" is a student directory featuring photos and personal information. In January 2004, Zuckerberg coded a new site known as "TheFacebook", stating, "It is clear that the technology needed to create a centralized Website is readily available ... the benefits are many." Zuckerberg met with Harvard student Eduardo Saverin, and each agreed to invest $1,000. On February 4, 2004, Zuckerberg launched "TheFacebook". Membership was initially restricted to students of Harvard College. Dustin Moskovitz, Andrew McCollum, and Chris Hughes joined Zuckerberg to help manage the growth of the site. It became available successively to most universities in the US and Canada. In 2004, Napster co-founder Sean Parker became company president and the company moved to Palo Alto, California. PayPal co-founder Peter Thiel gave Facebook its first investment. In 2005, the company dropped "the" from its name after purchasing the domain name Facebook.com. In 2006, Facebook opened to everyone at least or only 13 years old with a valid email address. Facebook introduced key features like the News Feed, which became central to user engagement. By late 2007, Facebook had 100,000 pages on which companies promoted themselves. Facebook had surpassed MySpace in global traffic and became the world's most popular social media platform. Microsoft announced that it had purchased a 1.6% share of Facebook for $240 million ($373 million in 2025 dollars), giving Facebook an implied value of around $15 billion ($23.3 billion in 2025 dollars). Facebook focused on generating revenue through targeted advertising based on user data, a model that drove its rapid financial growth. In 2012, Facebook went public with one of the largest IPOs in tech history. Acquisitions played a significant role in Facebook's dominance. In 2012, it purchased Instagram, followed by WhatsApp and Oculus VR in 2014, extending its influence beyond social networking into messaging and virtual reality. The Facebook–Cambridge Analytica data scandal in 2018 revealed misuse of user data to influence elections, sparking global outcry and leading to regulatory fines and hearings. Facebook's role in global events, including its use in organizing movements like the Arab Spring and its impact on events like the Rohingya genocide in Myanmar, highlighted its dual nature as a tool for both empowerment and harm. In 2021, Facebook rebranded as Meta, reflecting its shift toward building the "metaverse" and focusing on virtual reality and augmented reality technologies. == Features == Facebook does not officially publish a maximum character limit for posts; however, user posts can be lengthy, with unofficial sources suggesting a high character limit. Posts may also include images and videos. According to Facebook's official business documentation, videos can be up to 240 minutes long and 10 GB in file size, with supported resolutions up to 1080p. Users can "friend" users, both sides must agree to being friends. Posts can be changed to be seen by everyone (public), friends, people in a certain group (group) or by selected friends (private). Users can join groups. Groups are composed of persons with shared interests. For example, they might go to the same sporting club, live in the same suburb, have the same breed of pet or share a hobby. Posts posted in a group can be seen only by those in a group, unless set to public. Users are able to buy, sell, and swap things on Facebook Marketplace or in a Buy, Swap and Sell group. Facebook users may advertise events, which can be offline, on a website other than Facebook, or on Facebook. == Website == === Technical aspects === The site's primary color is blue as Zuckerberg is red–green colorblind, a realization that occurred after a test taken around 2007. Facebook was initially built using PHP, a popular scripting language designed for web development. PHP was used to create dynamic content and manage data on the server side of the Facebook application. Zuckerberg and co-founders chose PHP for its simplicity and ease of use, which allowed them to quickly develop and deploy the initial version of Facebook. As Facebook grew in user base and functionality, the company encountered scalability and performance challenges with PHP. In response, Facebook engineers developed tools and technologies to optimize PHP performance. One of the most significant was the creation of the HipHop Virtual Machine (HHVM). This significantly improved the performance and efficiency of PHP code execution on Facebook's servers. The site upgraded from HTTP to the more secure HTTPS in January 2011. ==== 2012 architecture ==== Facebook is developed as one monolithic application. According to an interview in 2012 with Facebook build engineer Chuck Rossi, Facebook compiles into a 1.5 GB binary blob which is then distributed to the servers using a custom BitTorrent-based release system. Rossi stated that it takes about 15 minutes to build and 15 minutes to release to the servers. The build and release process has zero downtime. Changes to Facebook are rolled out daily. Facebook used a combination platform based on HBase to store data across distributed machines. Using a tailing architecture, events are stored in log files, and the logs are tailed. The system rolls these events up and writes them to storage. The user interface then pulls the data out and displays it to users. Facebook handles requests as AJAX behavior. These requests are written to a log file using Scribe (developed by Facebook). Data is read from these log files using Ptail, an internally built tool to aggregate data from multiple Scribe stores. It tails the log files and pulls data out. Ptail data are separated into three streams and sent to clusters in different data centers (Plugin impression, News feed impressions, Actions (plugin + news feed)). Puma is used to manage periods of high data
AIXI
AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence. AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs. == Etymology == According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by ξ {\displaystyle \xi } (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations. == Definition == AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment μ {\displaystyle \mu } . The interaction proceeds in time steps, from t = 1 {\displaystyle t=1} to t = m {\displaystyle t=m} , where m ∈ N {\displaystyle m\in \mathbb {N} } is the lifespan of the AIXI agent. At time step t, the agent chooses an action a t ∈ A {\displaystyle a_{t}\in {\mathcal {A}}} (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" e t ∈ E = O × R {\displaystyle e_{t}\in {\mathcal {E}}={\mathcal {O}}\times \mathbb {R} } , which consists of an "observation" o t ∈ O {\displaystyle o_{t}\in {\mathcal {O}}} (e.g., a camera image) and a reward r t ∈ R {\displaystyle r_{t}\in \mathbb {R} } , distributed according to the conditional probability μ ( o t r t | a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t ) {\displaystyle \mu (o_{t}r_{t}|a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t})} , where a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t {\displaystyle a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t}} is the "history" of actions, observations and rewards. The environment μ {\displaystyle \mu } is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that μ {\displaystyle \mu } is computable, that is, the observations and rewards received by the agent from the environment μ {\displaystyle \mu } can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The only goal of the AIXI agent is to maximize ∑ t = 1 m r t {\displaystyle \sum _{t=1}^{m}r_{t}} , that is, the sum of rewards from time step 1 to m. The AIXI agent is associated with a stochastic policy π : ( A × E ) ∗ → A {\displaystyle \pi :({\mathcal {A}}\times {\mathcal {E}})^{}\rightarrow {\mathcal {A}}} , which is the function it uses to choose actions at every time step, where A {\displaystyle {\mathcal {A}}} is the space of all possible actions that AIXI can take and E {\displaystyle {\mathcal {E}}} is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) μ {\displaystyle \mu } can also be thought of as a stochastic policy (which is a function): μ : ( A × E ) ∗ × A → E {\displaystyle \mu :({\mathcal {A}}\times {\mathcal {E}})^{}\times {\mathcal {A}}\rightarrow {\mathcal {E}}} , where the ∗ {\displaystyle } is the Kleene star operation. In general, at time step t {\displaystyle t} (which ranges from 1 to m), AIXI, having previously executed actions a 1 … a t − 1 {\displaystyle a_{1}\dots a_{t-1}} (which is often abbreviated in the literature as a < t {\displaystyle a_{ Since the rise of social media, there have been numerous cases of individuals being influenced towards committing suicide or self-harm through their use of social media, and even of individuals arranging to broadcast suicide attempts, some successful, on social media. Researchers have studied social media and suicide to determine what, if any, risks social media poses in terms of suicide, and to identify methods of mitigating such risks, if they exist. The search for a correlation has not yet uncovered a clear answer. == Background == Suicide is one of the leading causes of death worldwide, and as of 2020, the second leading cause of death in the United States for those aged 15–34. According to the Center for Disease Control and Prevention, suicide was the third leading cause of death among adolescents in the US, from 1999 to 2006. In 2020, people in the US had a suicide rate of 13.5 per 100,000. Suicide was a leading cause of death in the United States accounting for 48,183 deaths in 2021. Suicide rates increased by 30 per cent from 2000 to 2018 and declined in 2019 and 2020. Suicide remains a significant public health issue worldwide, despite prevention efforts and treatments. Suicide has been identified not only as an individual phenomenon but also as being influenced by social and environmental factors. There is growing evidence that online activity has influenced suicide-related behavior. The use of social media throughout the 21st century has grown exponentially. For this reason, there are a variety of sources that are accessible to the public in various forms, especially social media sites such as Facebook, Instagram, Twitter, YouTube, Snapchat, TikTok and many more. Although these platforms were intended to allow people to connect virtually, these platforms can lead to cyber-bullying, insecurity, and emotional distress, and sometimes may influence a person to attempt suicide. Bullying, whether on social media or elsewhere, physical or not, significantly increases victims' risk of suicidal behavior. Since social media was introduced some people have taken their lives as a result of cyberbullying. Furthermore, suicide rates among teenagers have increased from 2010 to 2022 as social media has become something that people interact with more throughout their day-to-day lives. Media algorithms tend to popularize videos and posts to inform the country of the rising trouble, which may create a popular appeal to the young and immature minds of teenagers. This is why, social media could provide higher risks with the promotion of different kinds of pro-suicidal sites, message boards, chat rooms, and forums. Moreover, the Internet not only reports suicide incidents but documents suicide methods (for example, suicide pacts, an agreement between two or more people to kill themselves at a particular time and often by the same lethal means). Therefore, the role the Internet plays, particularly social media, in suicide-related behavior is a topic of growing interest. == Cyberbullying == There is substantial evidence that the Internet and social media can influence suicide-related behavior. Such evidence includes an increase in exposure to graphic content. A research study conducted by Sameer Hinduja and Justin Patchin found a correlation between cyberbullying and suicide. According to their findings, cyber-bullying increases suicidal thoughts by 14.5 percent and suicide attempts by 8.7 percent. Particularly alarming is the fact that children and young people under 25 who are victims of cyberbullying are more than twice as likely to self-harm and engage in suicidal behavior. Overall, teen suicide rates have increased within the past decade.This presents a significant public health concern, with over 40,000 suicides in the United States and nearly one million worldwide annually. Adolescents involved in cyberbullying often downplay its seriousness by calling it a joke or blaming the victim. These moral disengagement strategies can normalize harmful behavior and reduce feelings of guilt. This normalization may increase emotional distress and contribute to risks like depression and suicidal thoughts. Recent data from the Centers for Disease Control and Prevention reveals that 14.9 per cent of teenagers have experienced online bullying, while 13.6 per cent of teenagers have seriously attempted suicide. Both of these incidents are in increasing numbers in the United States. Furthermore, in numerous recent incidents, cyber-bullying led the victim to commit suicide; this phenomenon is now known as cyberbullicide. Many parents and children are unaware of the dangers and potential legal consequences of cyberbullying. As a response, anti-bullying regulations implemented by schools aim to prevent any form of bullying, including through technology, and protect students from online harassment. While some states have enacted laws against cyberbullying, there are currently no federal regulations addressing this issue. == Social media's influence on suicide == The media may portray suicidal behavior or language which can potentially influence people to act on these suicidal ideation. This may include news reports of actual suicides that have occurred or television shows and films that reenact suicides. Some organizations have proposed guidelines about how the media should report suicide. There is evidence that compliance with the guidelines varies. Some research showed that it is unclear whether the guidelines have successfully reduced the number of suicides. On the contrary, other research studies stated that the guidelines have worked in some cases. == Impact of pro-suicidal sites, message boards, chat rooms and forums == Social media platforms have transformed traditional methods of communication by allowing instantaneous and interactive sharing of information created and controlled by individuals, groups, organizations, and governments. As of the third quarter of 2022, Facebook had 266 million monthly active users, between Canada and the US. An immense quantity of information on the topic of suicide is available on the Internet and via social media. The information available on social media on the topic of suicide can influence suicidal behavior, both negatively and positively. The social cognitive theory plays a vital role in suicide attempts influenced through social media. This theory is demonstrated when one is influenced by what they see through various processes that form into modeled behaviors. This can be shown when people post their suicide attempts online or promote suicidal behavior in general. Contributors to these social media platforms may also exert peer pressure and encourage others to take their own lives, idolize those who have killed themselves, and facilitate suicide pacts. These pro-suicidal sites reported the following. For example, on a Japanese message board in 2008, it was shared that people can kill themselves using hydrogen sulfide gas. Shortly afterwards, 220 people attempted suicide in this way, and 208 were successful. Biddle et al. conducted a systematic Web search of 12 suicide-associated terms (e.g., suicide, suicide methods, how to kill yourself, and best suicide methods) to analyze the search results, and found that pro-suicide sites and chat rooms that discussed general issues associated with suicide most often occurred within the first few hits of a search. In another study, 373 suicide-related websites were found using Internet search engines and examined. Among them, 31% were suicide-neutral, 29% were anti-suicide, and 11% were pro-suicide. Together, these studies have shown that obtaining pro-suicide information on the Internet, including detailed information on suicide methods, is very easy. While social media has been prevalent in young adult suicide, some young adults find comfort and solace through these platforms. Young adults are making connections with people in like situations that are helping them feel less lonely. Although the public opinion is that message boards are harmful, the following studies show how they point to suicide prevention and have positive influences. A study using content analysis analyzed all of the postings on the AOL Suicide Bulletin Board over 11 months and concluded that most contributions contained positive, empathetic, and supportive postings. Then, a multi-method study was able to demonstrate that the users of such forums experience a great deal of social support and only a small amount of social strain. Lastly, in the survey participants were asked to assess the extent of their suicidal thoughts on a 7-level scale (0, absolutely no suicidal thoughts, to 7, very strong suicidal thoughts) for the time directly before their first forum visit and at the time of the survey. The study found a significant reduction after using the forum. The study however cannot conclude the forum is the only reason for the decrease. Together, these studies show how forums can reduce the number of The IBM 2780 and the IBM 3780 are devices developed by IBM for performing remote job entry (RJE) and other batch functions over telephone lines; they communicate with the mainframe via Binary Synchronous Communications (BSC or Bisync) and replaced older terminals using synchronous transmit-receive (STR). In addition, IBM has developed workstation programs for the 1130, 360/20, 2922, System/360 other than 360/20, System/370 and System/3. == 2780 Data Transmission Terminal == The 2780 Data Transmission Terminal first shipped in 1967. It consists of: A line printer similar to the IBM 1443 that can print up to 240 lines per minute (lpm), or 300 lpm using an extremely restricted character set. A card reader/punch unit, similar to an IBM 1442, that can read up to 400 cards per minute (cpm) and can punch up to 355 cpm. A line buffer that stores data received or to be transmitted over the communications line. A binary synchronous adapter which controls the flow of data over the communications line. The 2780 is capable of local (offline) card to print operation. It comes in four models: Model 1: Can read punched cards and transmit the data to a remote host computer, and can receive and print data sent by the host. Model 2: Same as Model 1 but adds the ability to punch card data received from the host. Model 3: Can only print data received from the host, but not send data to it. Model 4: Can read and punch card data, but has no printing capabilities. The 2780 uses a dedicated communication line at speeds of 1200, 2000, 2400 or 4800 bits per second. It is a half duplex device, although full duplex lines can be used with some increase in throughput. It can communicate in Transcode (a 6-bit code), 8-bit EBCDIC, or 7-bit ASCII. == 2770 Data Communication System == The 2770, announced in 1969, "was said to surpass all other IBM terminals in the variety of available input-output devices." The 2770 was developed by the IBM General Products Division (GPD) in Rochester, MN. It comes standard with a desktop terminal with keyboard. The printer and other devices (any two in any combination) can be attached to the 2772 Multi-Purpose Control unit. Possible devices include: 50 Magnetic Data Inscriber 545 Card Punch Model 3 (non-printing) or Model 4 (printing) 1017 Paper Tape Reader 1018 Paper Tape Punch 1053 Printer Model 1 1255 Magnetic Character Reader Models 1, 2 or 3 2203 Printer Model A1 or A2 2213 Printer Model 1 or 2 2265 Display Station Model 2 2502 Card Reader Model A1 or A2 5496 Data Recorder == 3780 Data Communications Terminal == In May 1972, IBM announced the IBM 3780, an enhanced version of the 2780. The 3780 was developed by IBM's Data Processing Division (DPD). There is one model, with an optional card punch. The 3780 drops Transcode support and incorporates several performance enhancements. It supports compression of blank fields in data using run-length encoding. It provides the ability to interleave data between devices, introduces double buffering, and adds support for the Wait-before-transmit ACKnowledgement (WACK) and Temporary Text Delay (TTD) Binary Synchronous control characters. The integrated punched card unit can read cards at 600 cards per minute. The integrated printer is rated at 300, 350 or 425 lines per minute based on characters set (63, 52 or 39 characters). The 3781 Card Punch is an optional feature. It punches 160 columns per second, or 91 cards per minute if all 80 columns are punched. The IBM 2780 and 3780 were later emulated on various types of equipment, including eventually the personal computer. A notable early emulation was the DN60, by Digital Equipment Corporation in the late 1970s. == 3770 Data Communications System == In 1974 IBM Data Processing Division (DPD) offered a successor to the 3780, called the 3770 Data Communications System, supporting SDLC, BSC, BSC Multi-leaving and SNA, depending on the configuration. The 3770 is a family of desk console style terminals that offers a variety of keyboard and printer combinations as well as I/O equipment attachment and communications features. The terminals come built into a desk and include the following models: 3771 Communication Terminal (optional card reader, optional card punch, wire matrix printer) Models 1 (40 cps printer), 2 (80 cps printer), and 3 (120 cps printer). 3773 Communication Terminal (diskette, wire matrix printer) Models 1 (40 cps printer), 2 (80 cps printer), and 3 (120 cps printer). Each model has a P version which adds some programming features. 3774 Communication Terminal (optional card reader, optional card punch, optional belt printer, wire matrix printer) Models 1 (80 cps printer), and 2 (120 cps printer). Each model has a P version which adds some programming features, a 480-character display and a non-removable diskette. 3775 Communication Terminal (optional card reader, optional card punch, optional diskette, belt printer) Model 1 (120 lpm printer). The model P1 adds some programming features, a 480-character display and a non-removable diskette. 3776 Communication Terminal (optional card reader, optional card punch, optional diskette, belt printer) Models 1 (300 lpm printer) and 2 (400 lpm printer). Models 3 and 4 are similar to models 1 and 2. 3777 Communication Terminal (optional card reader, optional diskette, train printer) Model 1 (up to 1000 lpm printer depending on character set). Model 2 adds an optional card punch, model 3 adds an optional magnetic tape drive and model 4 replaces the train printer with a slower model called the IBM 3262. The model 4 also allows a second, optional, 3262. The following I/O devices can be attached to a 3770 terminal: IBM 2502 Card Reader: Models A1 (up to 150 card per minute), A2 (up to 300 cards per minute) or A3 (up to 400 cards per minute) IBM 3203 Printer Model 3: 1000 LPM using 48 character set IBM 3501 Card Reader: Up to 50 cards per minute desktop unit IBM 3521 Card Punch: Up to 50 cards per minute IBM 3782 Card Attachment unit, which allows the 2502 or 3521 to be attached to any terminal except the 3777 IBM 3784 Line Printer, can be attached to a 3774 as a second printer. Up to 155 LPM with 48 characters set print belt. == Workstation programs == IBM distributes workstation programs with systems software including OS/360 Attached Support Processor (ASP) Houston Automatic Spooling Priority (HASP and HASP II) Operating System/Virtual Storage 1 (OS/VS1) Operating System/Virtual Storage 2 (OS/VS2 MVS) Release 2 through 3.8 MVS versions from MVS/SP Version 1 through z/OS Priority Output Writers, Execution processors and input Readers (POWER) Remote Spooling Communications Subsystem (RSCS) Except for the RJE workstation programs in OS/360, these programs use a variation of BSC known as Multi-leaving. In addition, IBM provides separately ordered workstation programs using BSC. Systems Network Architecture (SNA) and TCP/IP. Workstation programs are available from IBM and third-party vendors to support all of these protocols: 2770/3770 2780/3780 Multileaving Network Job Entry (NJE) OS/360 RJE SNA TCP/IP Ultra was the designation adopted by British military intelligence in June 1941 for wartime signals intelligence obtained by breaking high-level encrypted enemy radio and teleprinter communications at the Government Code and Cypher School (GC&CS) at Bletchley Park. Ultra eventually became the standard designation among the western Allies for all such intelligence. The name arose because the intelligence obtained was considered more important than that designated by the highest British security classification then used (Most Secret) and so was regarded as being Ultra Secret. Several other cryptonyms had been used for such intelligence. The code name "Boniface" was used as a cover name for Ultra. In order to ensure that the successful code-breaking did not become apparent to the Germans, British intelligence created a fictional MI6 master spy, Boniface, who controlled a fictional series of agents throughout Germany. Information obtained through code-breaking was often attributed to the human intelligence from the Boniface network. The U.S. used the codename Magic for its decrypts from Japanese sources, including the "Purple" cipher. Much of the German cipher traffic was encrypted on the Enigma machine. Used properly, the German military Enigma would have been virtually unbreakable; in practice, shortcomings in operation allowed it to be broken. The term "Ultra" has often been used almost synonymously with "Enigma decrypts". However, Ultra also encompassed decrypts of the German Lorenz SZ 40/42 machines that were used by the German High Command, and the Hagelin machine. Many observers, at the time and later, regarded Ultra as immensely valuable to the Allies. Winston Churchill was reported to have told King George VI, when presenting to him Stewart Menzies (head of the Secret Intelligence Service and the person who controlled distribution of Ultra decrypts to the government): "It is thanks to the secret weapon of General Menzies, put into use on all the fronts, that we won the war!" F. W. Winterbotham quoted the western Supreme Allied Commander, Dwight D. Eisenhower, at war's end describing Ultra as having been "decisive" to Allied victory. Sir Harry Hinsley, Bletchley Park veteran and official historian of British Intelligence in World War II, made a similar assessment of Ultra, saying that while the Allies would have won the war without it, "the war would have been something like two years longer, perhaps three years longer, possibly four years longer than it was." However, Hinsley and others have emphasized the difficulties of counterfactual history in attempting such conclusions, and some historians, such as John Keegan, have said the shortening might have been as little as the three months it took the United States to deploy the atomic bomb. == Sources of intelligence == Most Ultra intelligence was derived from reading radio messages that had been encrypted with cipher machines, complemented by material from radio communications using traffic analysis and direction finding. In the early phases of the war, particularly during the eight-month Phoney War, the Germans could transmit most of their messages using land lines and so had no need to use radio. This meant that those at Bletchley Park had some time to build up experience of collecting and starting to decrypt messages on the various radio networks. German Enigma messages were the main source, with those of the German air force (the Luftwaffe) predominating, as they used radio more and their operators were particularly ill-disciplined. === German === ==== Enigma ==== "Enigma" refers to a family of electro-mechanical rotor cipher machines. These produced a polyalphabetic substitution cipher and were widely thought to be unbreakable in the 1920s, when a variant of the commercial Model D was first used by the Reichswehr. The German Army (Heer), Navy, Air Force, Nazi party, Gestapo and German diplomats used Enigma machines in several variants. Abwehr (German military intelligence) used a four-rotor machine without a plugboard and Naval Enigma used different key management from that of the army or air force, making its traffic far more difficult to cryptanalyse; each variant required different cryptanalytic treatment. The commercial versions were not as secure and Dilly Knox of GC&CS is said to have broken one before the war. German military Enigma was first broken in December 1932 by Marian Rejewski and the Polish Cipher Bureau, using a combination of brilliant mathematics, the services of a spy in the German office responsible for administering encrypted communications, and good luck. The Poles read Enigma to the outbreak of World War II and beyond, in France. At the turn of 1939, the Germans made the systems ten times more complex, which required a tenfold increase in Polish decryption equipment, which they could not meet. On 25 July 1939, the Polish Cipher Bureau handed reconstructed Enigma machines and their techniques for decrypting ciphers to the French and British. Gordon Welchman wrote, Ultra would never have got off the ground if we had not learned from the Poles, in the nick of time, the details both of the German military Enigma machine, and of the operating procedures that were in use. At Bletchley Park, some of the key people responsible for success against Enigma included mathematicians Alan Turing and Hugh Alexander and, at the British Tabulating Machine Company, chief engineer Harold Keen. After the war, interrogation of German cryptographic personnel led to the conclusion that German cryptanalysts understood that cryptanalytic attacks against Enigma were possible but were thought to require impracticable amounts of effort and investment. The Poles' early start at breaking Enigma and the continuity of their success gave the Allies an advantage when World War II began. ==== Lorenz cipher ==== In June 1941, the Germans started to introduce on-line stream cipher teleprinter systems for strategic point-to-point radio links, to which the British gave the code-name Fish. Several systems were used, principally the Lorenz SZ 40/42 (codenamed "Tunny" by the British) and Geheimfernschreiber ("Sturgeon"). These cipher systems were cryptanalysed, particularly Tunny, which the British thoroughly penetrated. It was eventually attacked using Colossus machines, which were the first digital programme-controlled electronic computers. In many respects the Tunny work was more difficult than for the Enigma, since the British codebreakers had no knowledge of the machine producing it and no head-start such as that the Poles had given them against Enigma. Although the volume of intelligence derived from this system was much smaller than that from Enigma, its importance was often far higher because it produced primarily high-level, strategic intelligence that was sent between Wehrmacht high command (Oberkommando der Wehrmacht, OKW). The eventual bulk decryption of Lorenz-enciphered messages contributed significantly, and perhaps decisively, to the defeat of Nazi Germany. Nevertheless, the Tunny story has become much less well known among the public than the Enigma one. At Bletchley Park, some of the key people responsible for success in the Tunny effort included mathematicians W. T. "Bill" Tutte and Max Newman and electrical engineer Tommy Flowers. === Italian === In June 1940, the Italians were using book codes for most of their military messages, except for the Italian Navy, which in early 1941 had started using a version of the Hagelin rotor-based cipher machine C-38. This was broken from June 1941 onwards by the Italian subsection of GC&CS at Bletchley Park. === Japanese === In the Pacific theatre, a Japanese cipher machine, called "Purple" by the Americans, was used for highest-level Japanese diplomatic traffic. It produced a polyalphabetic substitution cipher, but unlike Enigma, was not a rotor machine, being built around electrical stepping switches. It was broken by the US Army Signal Intelligence Service and disseminated as Magic. Detailed reports by the Japanese ambassador to Germany were encrypted on the Purple machine. His reports included reviews of German assessments of the military situation, reviews of strategy and intentions, reports on direct inspections by the ambassador (in one case, of Normandy beach defences), and reports of long interviews with Hitler. The Japanese are said to have obtained an Enigma machine in 1937, although it is debated whether they were given it by the Germans or bought a commercial version, which, apart from the plugboard and internal wiring, was the German Heer/Luftwaffe machine. Having developed a similar machine, the Japanese did not use the Enigma machine for their most secret communications. The chief fleet communications code system used by the Imperial Japanese Navy was called JN-25 by the Americans, and by early 1942 the US Navy had made considerable progress in decrypting Japanese naval messages. The US Army also made progress on the Clarizen, Inc. is a project management software and collaborative work management company. Clarizen uses a software as a service business model. Clarizen's features include attaching CAD drawings to a project, moving between the project view and design view and an E-mail reporting feature. In May 2014 Clarizen raised $35 million in venture capital investment led by Goldman Sachs. The round brought investment to $90 million. Previous investors, including Benchmark Capital, Carmel Ventures, DAG Ventures, Opus Capital and Vintage Investment Partners participated. In April 2020, Clarizen appointed Matt Zilli as its new CEO, replacing Boaz Chalamish who is appointed as Executive Chairman. In January 2021 Clarizen was acquired by Planview. Data lineage refers to the process of tracking how data is generated, transformed, transmitted and used across systems over time. It documents data's origins, transformations and movements, providing detailed visibility into its life cycle. This process simplifies the identification of errors in data analytics workflows, by enabling users to trace issues back to their root causes. Data lineage facilitates the ability to replay specific segments or inputs of the dataflow. This can be used in debugging or regenerating lost outputs. In database systems, this concept is closely related to data provenance, which involves maintaining records of inputs, entities, systems and processes that influence data. Data provenance provides a historical record of data origins and transformations. It supports forensic activities such as data-dependency analysis, error/compromise detection, recovery, auditing and compliance analysis: "Lineage is a simple type of why provenance." Data governance plays a critical role in managing metadata by establishing guidelines, strategies and policies. Enhancing data lineage with data quality measures and master data management adds business value. Although data lineage is typically represented through a graphical user interface (GUI), the methods for gathering and exposing metadata to this interface can vary. Based on the metadata collection approach, data lineage can be categorized into three types: Those involving software packages for structured data, programming languages and Big data systems. Data lineage information includes technical metadata about data transformations. Enriched data lineage may include additional elements such as data quality test results, reference data, data models, business terminology, data stewardship information, program management details and enterprise systems associated with data points and transformations. Data lineage visualization tools often include masking features that allow users to focus on information relevant to specific use cases. To unify representations across disparate systems, metadata normalization or standardization may be required. == Representation of data lineage == Representation broadly depends on the scope of the metadata management and reference point of interest. Data lineage provides sources of the data and intermediate data flow hops from the reference point with backward data lineage, leading to the final destination's data points and its intermediate data flows with forward data lineage. These views can be combined with end-to-end lineage for a reference point that provides a complete audit trail of that data point of interest from sources to their final destinations. As the data points or hops increase, the complexity of such representation becomes incomprehensible. Thus, the best feature of the data lineage view is the ability to simplify the view by temporarily masking unwanted peripheral data points. Tools with the masking feature enable scalability of the view and enhance analysis with the best user experience for both technical and business users. Data lineage also enables companies to trace sources of specific business data to track errors, implement changes in processes and implementing system migrations to save significant amounts of time and resources. Data lineage can improve efficiency in business intelligence BI processes. Data lineage can be represented visually to discover the data flow and movement from its source to destination via various changes and hops on its way in the enterprise environment. This includes how the data is transformed along the way, how the representation and parameters change and how the data splits or converges after each hop. A simple representation of the Data Lineage can be shown with dots and lines, where dots represent data containers for data points, and lines connecting them represent transformations the data undergoes between the data containers. Data lineage can be visualized at various levels based on the granularity of the view. At a very high-level, data lineage is visualized as systems that the data interacts with before it reaches its destination. At its most granular, visualizations at the data point level can provide the details of the data point and its historical behavior, attribute properties and trends and data quality of the data passed through that specific data point in the data lineage. The scope of the data lineage determines the volume of metadata required to represent its data lineage. Usually, data governance and data management of an organization determine the scope of the data lineage based on their regulations, enterprise data management strategy, data impact, reporting attributes and critical data elements of the organization. == Rationale == Distributed systems like Google Map Reduce, Microsoft Dryad, Apache Hadoop (an open-source project) and Google Pregel provide such platforms for businesses and users. However, even with these systems, Big Data analytics can take several hours, days or weeks to run, simply due to the data volumes involved. For example, a ratings prediction algorithm for the Netflix Prize challenge took nearly 20 hours to execute on 50 cores, and a large-scale image processing task to estimate geographic information took 3 days to complete using 400 cores. "The Large Synoptic Survey Telescope is expected to generate terabytes of data every night and eventually store more than 50 petabytes, while in the bioinformatics sector, the 12 largest genome sequencing houses in the world now store petabytes of data apiece. It is very difficult for a data scientist to trace an unknown or an unanticipated result. === Big data debugging === Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Machine learning, among other algorithms, is used to transform and analyze the data. Due to the large size of the data, there could be unknown features in the data. The massive scale and unstructured nature of data, the complexity of these analytics pipelines, and long runtimes pose significant manageability and debugging challenges. Even a single error in these analytics can be extremely difficult to identify and remove. While one may debug them by re-running the entire analytics through a debugger for stepwise debugging, this can be expensive due to the amount of time and resources needed. Auditing and data validation are other major problems due to the growing ease of access to relevant data sources for use in experiments, the sharing of data between scientific communities and use of third-party data in business enterprises. As such, more cost-efficient ways of analyzing data intensive scale-able computing (DISC) are crucial to their continued effective use. === Challenges in Big Data debugging === ==== Massive scale ==== According to an EMC/IDC study, 2.8 ZB of data were created and replicated in 2012. Furthermore, the same study states that the digital universe will double every two years between now and 2020, and that there will be approximately 5.2 TB of data for every person in 2020. Based on current technology, the storage of this much data will mean greater energy usage by data centers. ==== Unstructured data ==== Unstructured data usually refers to information that doesn't reside in a traditional row-column database. Unstructured data files often include text and multimedia content, such as e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents. While these types of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly into a database. The amount of unstructured data in enterprises is growing many times faster than structured databases are growing. Big data can include both structured and unstructured data, but IDC estimates that 90 percent of Big Data is unstructured data. The fundamental challenge of unstructured data sources is that they are difficult for non-technical business users and data analysts alike to unbox, understand and prepare for analytic use. Beyond issues of structure, the sheer volume of this type of data contributes to such difficulty. Because of this, current data mining techniques often leave out valuable information and make analyzing unstructured data laborious and expensive. In today's competitive business environment, companies have to find and analyze the relevant data they need quickly. The challenge is going through the volumes of data and accessing the level of detail needed, all at a high speed. The challenge only grows as the degree of granularity increases. One possible solution is hardware. Some vendors are using increased memory and parallel processing to crunch large volumes of data quickly. Another method is putting data in-memory but using a gridSocial media and suicide
IBM remote batch terminals
Ultra (cryptography)
Clarizen
Data lineage