Judea Pearl

Judea Pearl (Hebrew: יהודה פרל; born September 4, 1936) is an Israeli-American electrical engineer, computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks (see the article on belief propagation). He is also credited with developing a theory of causal and counterfactual inference based on structural models (see the article on causality). In 2011, the Association for Computing Machinery (ACM) awarded Pearl the Turing Award, the highest distinction in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning". He is the author of several books, including the technical Causality: Models, Reasoning and Inference, and The Book of Why, a book on causality aimed at the general public. Judea Pearl is the father of journalist Daniel Pearl, who was kidnapped and murdered by terrorists in Pakistan connected with Al-Qaeda and the International Islamic Front in 2002. == Biography == Judea Pearl was born in Tel Aviv, British Mandate for Palestine, in 1936 to Eliezer and Tova Pearl, who were Polish Jewish immigrants, and grew up in Bnei Brak. His grandfather Chaim Pearl was one of Bnei Brak's founders. He is a descendant of Menachem Mendel of Kotzk on his mother's side. After serving in the Israel Defense Forces and joining a kibbutz, Pearl decided to study engineering in 1956. He received a B.S. in electrical engineering from the Technion in 1960. That same year, he emigrated to the United States and pursued graduate studies. He received an M.S. in electrical engineering from the Newark College of Engineering (now New Jersey Institute of Technology) in 1961, and went on to receive an M.S. in physics from Rutgers University and a PhD in electrical engineering from the Polytechnic Institute of Brooklyn (now the New York University Tandon School of Engineering) in 1965.
He worked at RCA Research Laboratories (now SRI International) in Princeton, New Jersey on superconductive parametric amplifiers and storage devices and at Electronic Memories, Inc., on advanced memory systems. When semiconductors "wiped out" Pearl's work, as he later expressed it, he joined UCLA's School of Engineering in 1970 and started work on probabilistic artificial intelligence. He is one of the founding editors of the Journal of Causal Inference. Pearl is currently a professor of computer science and statistics and director of the Cognitive Systems Laboratory at UCLA. He and his wife, Ruth, had three children. In addition, as of 2011, he is a member of the International Advisory Board of NGO Monitor. Former Israeli Chief Rabbi, Rabbi Yisrael Meir Lau, partnered with Judea Pearl in the documentary With My Whole Broken Heart. == Murder of Daniel Pearl == In 2002, his son, Daniel Pearl, a journalist working for the Wall Street Journal was kidnapped and murdered in Pakistan, leading Judea and the other members of the family and friends to create the Daniel Pearl Foundation. On the seventh anniversary of Daniel's death, Judea wrote an article in the Wall Street Journal titled Daniel Pearl and the Normalization of Evil: When will our luminaries stop making excuses for terror?. Emeritus Chief Rabbi Jonathan Sacks quoted Judea Pearl's beliefs in a lesson on Judaism: "I asked Judea Pearl, father of the murdered journalist Daniel Pearl, why he was working for reconciliation between Jews and Muslims...he replied with heartbreaking lucidity, 'Hate killed my son. Therefore I am determined to fight hate.'" == Views == On his religious views, Pearl states that he is a "practicing disbeliever." He is very connected to Jewish traditions such as holidays and kiddush on Friday night. Pearl sits on the NGO Monitor international advisory board, a right-wing organization based in Jerusalem that reports on non-governmental organization activity from a pro-Israel perspective. 
== Research == Pearl is credited for "laying the foundations of modern artificial intelligence, so computer systems can process uncertainty and relate causes to effects." He is one of the pioneers of Bayesian networks and the probabilistic approach to artificial intelligence, and one of the first to mathematize causal modeling in the empirical sciences. His work is also intended as a high-level cognitive model. He is interested in the philosophy of science, knowledge representation, nonstandard logics, and learning. Pearl is described as "one of the giants in the field of artificial intelligence" by UCLA computer science professor Richard E. Korf. His work on causality has "revolutionized the understanding of causality in statistics, psychology, medicine and the social sciences" according to the Association for Computing Machinery. === Notable contributions === A summary of Pearl's scientific contributions is available in a chronological account authored by Stuart J. Russell (2012). An annotated bibliography of Pearl's contributions was compiled by the ACM in 2012. A video describing Pearl's major contributions to AI is available here. Pearl's opinion pieces, touching on Jewish identity, the war on terrorism, and the Middle East conflict can be accessed here. === Books === Heuristics, Addison-Wesley, 1984 Probabilistic Reasoning in Intelligent Systems, Morgan-Kaufmann, 1988 Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. I Am Jewish: Personal Reflections Inspired by the Last Words of Daniel Pearl, Jewish Lights, 2004. (Winner of a 2004 National Jewish Book Award) Causal Inference in Statistics: A Primer, (with Madelyn Glymour and Nicholas Jewell), Wiley, 2016. ISBN 978-1-119-18684-7 A previous survey: Causal inference in statistics: An overview, Statistics Surveys, 3:96–146, 2009. Pearl, Judea; Dana Mackenzie (2018). "The Book of Why: The New Science of Cause and Effect". Science. 361 (6405): 855. 
Bibcode:2018Sci...361..855.. doi:10.1126/science.aau9731. === Awards ===

Glossary of robotics

Robotics is the branch of technology that deals with the design, construction, operation, structural disposition, manufacture and application of robots. Robotics is related to the sciences of electronics, engineering, mechanics, and software. The following is a list of common definitions related to the robotics field. == A == Actuator: a motor that translates control signals into mechanical movement. The control signals are usually electrical but may, more rarely, be pneumatic or hydraulic. The power supply may likewise be any of these. It is common for electrical control to be used to modulate a high-power pneumatic or hydraulic motor. Aerobot: a robot capable of independent flight on other planets. A type of aerial robot. Arduino: an open-source microcontroller platform widely used for small-scale robotic experimentation and physical computing. Artificial intelligence: the intelligence of machines and the branch of computer science that aims to create it. Aura (satellite): a robotic spacecraft launched by NASA in 2004 which collects atmospheric data from Earth. Automaton: an early self-operating robot, performing exactly the same actions, over and over. Autonomous vehicle: a vehicle equipped with an autopilot system, which is capable of driving from one point to another without input from a human operator. == B == Biomimetic: See Bionics. Bionics: also known as biomimetics, biognosis, biomimicry, or bionical creativity engineering; the application of biological methods and systems found in nature to the study and design of engineering systems and modern technology. == C == CAD/CAM (computer-aided design and computer-aided manufacturing): These systems and their data may be integrated into robotic operations. Čapek, Karel: Czech author who coined the term 'robot' in his 1921 play, Rossum's Universal Robots. Chandra X-ray Observatory: a robotic spacecraft launched by NASA in 1999 to collect astronomical data.
Cloud robotics: robots that draw additional computing capacity and intelligence from the cloud. Combat, robot: a hobby or sport event where two or more robots fight in an arena to disable each other. This has developed from a hobby in the 1990s to several TV series worldwide. Cruise missile: a robot-controlled guided missile that carries an explosive payload. Cyborg: also known as a cybernetic organism, a being with both biological and artificial (e.g. electronic, mechanical or robotic) parts. == D == Degrees of freedom: the extent to which a robot can move itself; expressed in terms of Cartesian coordinates (x, y, and z) and angular movements (yaw, pitch, and roll). Delta robot: a tripod linkage, used to construct fast-acting manipulators with a wide range of movement. Drive Power: The energy source or sources for the robot actuators. == E == Emergent behaviour: a complicated resultant behaviour that emerges from the repeated operation of simple underlying behaviours. Envelope (Space), Maximum: The volume of space encompassing the maximum designed movements of all robot parts including the end-effector, workpiece, and attachments. Explosive ordnance disposal robot: A mobile robot designed to assess whether an object contains explosives; some carry detonators that can be deposited at the object and activated after the robot withdraws. == F == FIRST (For Inspiration and Recognition of Science and Technology): an organization founded by inventor Dean Kamen in 1989 in order to develop ways to inspire students in engineering and technology fields. Forward chaining: a process in which events or received data are considered by an entity to intelligently adapt its behavior. == G == Gynoid: A humanoid robot designed to look like a human female. == H == Haptic: tactile feedback technology using the operator's sense of touch. Also sometimes applied to robot manipulators with their own touch sensitivity. Hexapod (platform): A movable platform using six linear actuators.
Often used in flight simulators and fairground rides, they also have applications as a robotic manipulator. Hexapod (walker): A six-legged walking robot, using a simple insect-like locomotion. Human–computer interaction. Humanoid: A robotic entity designed to resemble a human being in form, function, or both. Hydraulics: the control of mechanical force and movement, generated by the application of liquid under pressure. cf. pneumatics. == I == Industrial robot: A reprogrammable, multifunctional manipulator designed to move material, parts, tools, or specialized devices through variable programmed motions for the performance of a variety of tasks. Insect robot: A small robot designed to imitate insect behaviors rather than complex human behaviors. == K == Kalman filter: a mathematical technique to estimate the value of a sensor measurement, from a series of intermittent and noisy values. Kinematics: the study of motion, as applied to robots. This includes both the design of linkages to perform motion, their power, control and stability; also their planning, such as choosing a sequence of movements to achieve a broader task. Inverse Kinematics: the process of determining joint angles required for a robot's end-effector to reach a desired position and orientation in space. Used in motion planning to calculate motor commands from target positions. == L == Linear actuator: A form of motor that generates a linear movement directly. == M == Manipulator or gripper: A robotic 'hand'. Mobile robot: A self-propelled and self-contained robot that is capable of moving over a mechanically unconstrained course. Muting: The deactivation of a presence-sensing safeguarding device during a portion of the robot cycle. Mecanum wheel: A wheel fitted with angled rollers that enables a robot vehicle to move in multiple directions, including sideways.
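As an illustration of the Kalman filter entry under K, the following is a minimal sketch of a scalar (one-dimensional) filter, assuming the measured quantity is roughly constant. The function name `kalman_1d`, its parameters, and the sensor readings are all hypothetical and chosen only for this example; `None` stands in for a time step with no measurement.

```python
def kalman_1d(measurements, meas_var, process_var=0.0,
              init_est=0.0, init_var=1000.0):
    """Scalar Kalman filter for a quantity assumed to be (nearly) constant."""
    est, var = init_est, init_var
    history = []
    for z in measurements:
        # Predict: the model says the value stays the same, so only the
        # uncertainty grows (by the process variance).
        var += process_var
        if z is None:
            # Intermittent data: no measurement this step, keep the prediction.
            history.append(est)
            continue
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        est = est + gain * (z - est)
        var = (1.0 - gain) * var
        history.append(est)
    return history


# Made-up noisy readings of a value near 10; None marks a missing sample.
readings = [10.3, None, 9.6, 10.1, None, 9.9, 10.2, 9.8]
estimates = kalman_1d(readings, meas_var=0.25)
print(round(estimates[-1], 2))
```

A large initial variance tells the filter to trust the first measurement far more than the arbitrary starting estimate. Practical robot filters usually track a vector state (for example, position and velocity) with matrix forms of the same predict and update steps.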
== O == Ornithopter: An aerial robot or drone that achieves flight through a flapping-wing mechanism rather than rotating blades or fixed wings, often utilized for highly maneuverable flight. == P == Parallel manipulator: an articulated robot or manipulator based on a number of kinematic chains, actuators and joints, in parallel. cf. serial manipulator. Pendant: Any portable control device that permits an operator to control the robot from within the restricted envelope (space) of the robot. Pneumatics: the control of mechanical force and movement, generated by the application of compressed gas. cf. hydraulics. Powered exoskeleton: a wearable mobile machine that allows for limb movement with increased strength and endurance. Prosthetic robots: programmable manipulators or devices for missing human limbs. == R == Remote manipulator: A manipulator under direct human control, often used for work with hazardous materials. Robonaut: a development project conducted by NASA to create humanoid robots capable of using space tools and working in similar environments to suited astronauts. == S == Sensor fusion: The process of combining data from multiple sensors, such as LiDAR, cameras, global positioning systems (GPS), and inertial measurement units (IMUs), to produce a more accurate and reliable understanding of an environment than using a single sensor alone. It is widely used in robotics and autonomous systems to improve perception, localization, and decision-making. Serial manipulator: an articulated robot or manipulator with a single series kinematic chain of actuators. cf. parallel manipulator. Service robots: machines that extend human capabilities. Servo: a motor that moves to and maintains a set position under command, rather than continuously moving. Servomechanism: An automatic device that uses error-sensing negative feedback to correct the performance of a mechanism.
Single Point of Control: The ability to operate the robot such that initiation of robot motion from one source of control is possible only from that source and cannot be overridden from another source. Slow Speed Control: A mode of robot motion control where the velocity of the robot is limited to allow persons sufficient time either to withdraw from the hazardous motion or stop the robot. Snake robot: A robot component resembling a tentacle or elephant's trunk, where many small actuators are used to allow continuous curved motion of a robot component, with many degrees of freedom. This is usually applied to snake-arm robots, which use this as a flexible manipulator. A rarer application is the snakebot, where the entire robot is mobile and snake-like, so as to gain access through narrow spaces. Stepper motor. Stewart platform: A movable platform using six linear actuators, hence also known as a Hexapod. Subsumption architecture: A robot architecture that uses a modular, bottom-up design beginning with the least complex behavioral tasks. Surgical robot: a remote manipulator used for keyhole surgery. Swarm robotics: the use of large numbers of mostly simple physical robots, whose actions may seek to incorporate emergent behavior observed in social insects (swarm intelligence). Synchro. == T == Teach Mode: The control state that al

Information leakage

Information leakage happens whenever a system that is designed to be closed to an eavesdropper nonetheless reveals some information to unauthorized parties. In other words, information leakage occurs when secret information correlates with, or can be correlated with, observable information. For example, in an encrypted instant messaging network, a network engineer without the capacity to crack encryption codes could still see when messages are transmitted, even without being able to read them. == Risk vectors == A modern example of information leakage is the leakage of secret information via data compression, by using variations in data compression ratio to reveal correlations between known (or deliberately injected) plaintext and secret data combined in a single compressed stream. Another example is the key leakage that can occur when using some public-key systems when cryptographic nonce values used in signing operations are insufficiently random. Poor randomness undermines the proper functioning of a cryptographic system even in benign circumstances, because it can easily produce crackable keys that cause key leakage. Information leakage can sometimes be deliberate: for example, an algorithmic converter may be shipped that intentionally leaks small amounts of information, in order to provide its creator with the ability to intercept the users' messages, while still allowing the user to maintain an illusion that the system is secure. This sort of deliberate leakage is sometimes known as a subliminal channel. Generally, only very advanced systems employ defenses against information leakage. The following countermeasures are commonly implemented: Use steganography to hide the fact that a message is transmitted at all. Use chaffing to make it unclear to whom messages are transmitted (but this does not hide from others the fact that messages are transmitted).
For busy re-transmitting proxies, such as a Mixmaster node: randomly delay and shuffle the order of outbound packets - this will assist in disguising a given message's path, especially if there are multiple, popular forwarding nodes, such as are employed with Mixmaster mail forwarding. When a data value is no longer going to be used, erase it from the memory.
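The compression-ratio leak described under Risk vectors can be sketched in a few lines. This is an illustrative toy, not a reproduction of any specific published attack: the secret value, the function name `compressed_len`, and the two guesses are invented for the example. It assumes an observer who can inject data into the same compressed stream as the secret and can see only the compressed length.

```python
import zlib

# Hypothetical secret that shares a compressed stream with attacker data.
SECRET = b"sessionid=7f3a9c2e5b1d4086"


def compressed_len(attacker_data: bytes) -> int:
    """Length an eavesdropper would observe for attacker data plus secret."""
    return len(zlib.compress(attacker_data + b"\n" + SECRET))


# A guess that matches the secret lets the compressor replace the second
# copy with a short back-reference, so the output is smaller.
right = compressed_len(b"sessionid=7f3a9c2e5b1d4086")
wrong = compressed_len(b"sessionid=qwkzjxvmhgutypnl")
print(right, wrong)
```

The correct guess yields a shorter output than the incorrect one, so the observable length correlates with the secret even though the content itself is never revealed.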

Server-Gated Cryptography

Server-Gated Cryptography (SGC), also known as International Step-Up by Netscape, is a defunct mechanism that was used to step up from 40-bit or 56-bit to 128-bit cipher suites with SSL. It was created in response to United States federal legislation on the export of strong cryptography in the 1990s. The legislation had limited encryption to weak algorithms and shorter key lengths in software exported outside of the United States of America. When the legislation added an exception for financial transactions, SGC was created as an extension to SSL with the certificates being restricted to financial organisations. In 1999, this list was expanded to include online merchants, healthcare organizations, and insurance companies. This legislation changed in January 2000, resulting in vendors no longer shipping export-grade browsers and SGC certificates becoming available without restriction. Internet Explorer supported SGC starting with patched versions of Internet Explorer 3. SGC became obsolete when Internet Explorer 5.01 SP1 and Internet Explorer 5.5 started supporting strong encryption without the need for a separate high encryption pack (except on Windows 2000, which needs its own high encryption pack that was included in Service Pack 2 and later). "Export-grade" browsers are unusable on the modern Web due to many servers disabling export cipher suites. Additionally, these browsers are incapable of using SHA-2 family signature hash algorithms like SHA-256. Certification authorities are trying to phase out the new issuance of certificates with the older SHA-1 signature hash algorithm. The continuing use of SGC facilitates the use of obsolete, insecure Web browsers with HTTPS. However, while certificates that use the SHA-1 signature hash algorithm remain available, some certificate authorities continue to issue SGC certificates (often charging a premium for them) although they are obsolete. 
The reason certificate authorities can charge a premium for SGC certificates is that browsers only allowed a limited number of roots to support SGC. When an SSL handshake took place, the software (e.g. a web browser) listed the ciphers that it supported. Although the weaker exported browsers included only weaker ciphers in their initial SSL handshake, they also contained stronger cryptographic algorithms. Two protocols were used to activate them. Netscape Communicator 4 used International Step-Up, which used the now obsolete insecure renegotiation to change to a stronger cipher suite. Microsoft used SGC, which sends a new Client Hello message listing the stronger cipher suites on the same connection after the certificate is determined to be SGC capable, and also supported Netscape Step-Up for compatibility (though this support in the NT 4.0 SP6 and IE 5.01 version had a bug where changing MAC algorithms during Step-Up did not work properly).

Data independence

Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes made in the definition and organization of data. Application programs should not, ideally, be exposed to details of data representation and storage. The DBMS provides an abstract view of the data that hides such details. There are two types (levels) of data independence: physical and logical. Data independence and operation independence together provide data abstraction. == Logical data independence == The logical structure of the data is known as the 'schema definition'. In general, if a user application operates on a subset of the attributes of a relation, it should not be affected later when new attributes are added to the same relation. Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas. == Physical data independence == The physical structure of the data is referred to as "physical data description". Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues since, conceptually, there is no difference in the operations carried out against the data. The types of data independence can be summarized as follows: Logical data independence: The ability to change the logical (conceptual) schema without changing the external schema (user view) is called logical data independence. For example, the addition or removal of entities, attributes, or relationships in the conceptual schema should be possible without having to change existing external schemas or rewrite existing application programs. Physical data independence: The ability to change the physical schema without changing the logical schema is called physical data independence.
For example, a change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas. View-level data independence: the view level is always independent, because no level exists above it that a change could affect. == Data independence == Data independence can be explained as follows: Each higher level of the data architecture is immune to changes of the next lower level of the architecture. The logical schema stays unchanged even though the storage space or type of some data is changed for reasons of optimization or reorganization. In this case the external schema does not change, although changes to the internal schema may be required because part of the physical schema was reorganized. Physical data independence is present in most database and file environments, in which details such as the hardware storage encoding, the exact location of data on disk, and the merging of records are hidden from the user. == Data independence types == The ability to modify a schema definition at one level without affecting the schema definition at the next higher level is called data independence. There are two levels of data independence: physical data independence and logical data independence. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means that the physical storage level can be changed without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to a banking system).
Logical Data independence means if we add some new columns or remove some columns from table then the user view and programs should not change. For example: consider two users A & B. Both are selecting the fields "EmployeeNumber" and "EmployeeName". If user B adds a new column (e.g. salary) to his table, it will not affect the external view for user A, though the internal schema of the database has been changed for both users A & B. Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.
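The employee example above can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table name `employee`, the view name `employee_directory`, the index name, and the sample rows are assumptions made for illustration. The view plays the role of user A's external schema. The added column stands for a logical change and the added index for a physical one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmployeeNumber INTEGER, EmployeeName TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# External schema: user A only ever queries this view.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT EmployeeNumber, EmployeeName FROM employee"
)
query = "SELECT * FROM employee_directory ORDER BY EmployeeNumber"
before = conn.execute(query).fetchall()

# Logical change: a new attribute is added to the relation.
conn.execute("ALTER TABLE employee ADD COLUMN Salary REAL")
# Physical change: a new access path (index) is added.
conn.execute("CREATE INDEX idx_employee_number ON employee (EmployeeNumber)")

after = conn.execute(query).fetchall()
print(before == after)
```

Because the view names its columns explicitly, neither change alters what user A's query returns. A view defined with `SELECT *` would not give the same insulation in every DBMS, which is one reason logical independence is the harder of the two to guarantee.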

Google Books Ngram Viewer

The Google Books Ngram Viewer is an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in printed sources published between 1500 and 2022 in Google's text corpora in English, Chinese (simplified), French, German, Hebrew, Italian, Russian, or Spanish. There are also some specialized English corpora, such as American English, British English, and English Fiction. The program can search for a word or a phrase. The n-grams are matched with the text within the selected corpus, and if found in 40 or more books, are then displayed as a graph. The program supports searches for parts of speech and wildcards. It is routinely used in research. == History == The Ngram Viewer was created by Google software engineers Will Brockman and Jon Orwant , who teamed up with Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden. The service was released on December 16, 2010. Before the release, it was difficult to quantify the rate of linguistic change because of the absence of a database that was designed for this purpose, said Steven Pinker, a well-known linguist who was one of the co-authors of the Science paper published on the same day. The Google Books Ngram Viewer was developed in the hope of opening a new window to quantitative research in the humanities field, and the database contained 500 billion words from 5.2 million books publicly available from the very beginning. The intended audience was scholarly, but the Google Books Ngram Viewer made it possible for anyone with a computer to see a graph that represents the diachronic change of the use of words and phrases with ease. Lieberman said in response to The New York Times that the developers aimed to provide even children with the ability to browse cultural trends throughout history. In the Science paper, Lieberman and his collaborators called the method of high-volume data analysis in digitized texts "culturomics". 
== Usage == Commas delimit user-entered search terms, where each comma-separated term is searched in the database as an n-gram (for example, "nursery school" is a 2-gram or bigram). The Ngram Viewer then returns a plotted line chart. Due to limitations on the size of the Ngram database, only matches found in at least 40 books are indexed. == Limitations == The data sets of the Ngram Viewer have been criticized for their reliance upon inaccurate optical character recognition (OCR) and for including large numbers of incorrectly dated and categorized texts. Because of these errors, and because they are uncontrolled for bias (such as the increasing amount of scientific literature, which causes other terms to appear to decline in popularity), care must be taken in using the corpora to study language or test theories. Furthermore, the data sets may not reflect general linguistic or cultural change and can only hint at such an effect because they do not involve any metadata like date published, author, length, or genre, to avoid any potential copyright infringements. Systemic errors like the confusion of s and f in pre-19th century texts (due to the use of ſ, the long s, which is similar in appearance to f) can cause systemic bias. Although the Google Books team claims that the results are reliable from 1800 onwards, poor OCR and insufficient data mean that frequencies given for languages such as Chinese may only be accurate from 1970 onward, with earlier parts of the corpus showing no results at all for common terms, and data for some years containing more than 50% noise. Guidelines for doing research with data from Google Ngram have been proposed that try to address some of the issues discussed above.
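The comma-delimited query format and the 40-book threshold described under Usage can be sketched as follows. This is a simplified illustration, not Google's implementation: the function names, the whitespace-based tokenization, and the sample counts are assumptions.

```python
def parse_query(query: str):
    """Split a comma-delimited query into (term, n) pairs.

    Assumes words are separated by whitespace, so n is the word count.
    """
    terms = [t.strip() for t in query.split(",") if t.strip()]
    return [(t, len(t.split())) for t in terms]


def indexed_terms(book_counts: dict, min_books: int = 40):
    """Keep only n-grams that occur in at least `min_books` books."""
    return {term for term, count in book_counts.items() if count >= min_books}


parsed = parse_query("nursery school, kindergarten, child care")
print(parsed)

# Hypothetical per-term book counts; the second falls below the threshold.
kept = indexed_terms({"nursery school": 120, "zzyzx road": 3})
print(kept)
```

Here "nursery school" and "child care" are parsed as 2-grams and "kindergarten" as a 1-gram, and a term found in fewer than 40 books is dropped before plotting.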

Data lake

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video). A data lake can be established on premises (within an organization's data centers) or in the cloud (using cloud services). == Background == James Dixon, then chief technology officer at Pentaho, coined the term by 2011 to contrast it with the data mart, which is a smaller repository of interesting attributes derived from raw data. In promoting data lakes, he argued that data marts have several inherent problems, such as information siloing. PricewaterhouseCoopers (PwC) said that data lakes could "put an end to data silos". In their study on data lakes, they noted that enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository." == Examples == Many companies use cloud storage services such as Google Cloud Storage and Amazon S3 or a distributed file system such as Apache Hadoop distributed file system (HDFS). Academic interest in the concept of data lakes is gradually growing. For example, Personal DataLake at Cardiff University is a new type of data lake which aims at managing big data of individual users by providing a single point of collecting, organizing, and sharing personal data. Early data lakes, such as Hadoop 1.0, had limited capabilities because Hadoop 1.0 supported only batch-oriented processing (MapReduce).
Interacting with it required expertise in Java, map reduce and higher-level tools like Apache Pig, Apache Spark and Apache Hive (which were also originally batch-oriented). == Criticism == Poorly managed data lakes have been facetiously called data swamps. In June 2015, David Needle characterized "so-called data lakes" as "one of the more controversial ways to manage big data". PwC was also careful to note in their research that not all data lake initiatives are successful. They quote Sean Martin, CTO of Cambridge Semantics: We see customers creating big data graveyards, dumping everything into Hadoop distributed file system (HDFS) and hoping to do something with it down the road. But then they just lose track of what’s there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents. They describe companies that build successful data lakes as gradually maturing their lake as they figure out which data and metadata are important to the organization. Another criticism is that the term data lake is used with many different meanings. It may be used to refer to, for example: any tools or data management practices that are not data warehouses; a particular technology for implementation; a raw data reservoir; a hub for ETL offload; or a central hub for self-service analytics. While critiques of data lakes are warranted, in many cases they apply to other data projects as well. For example, the definition of data warehouse is also changeable, and not all data warehouse efforts have been successful. In response to various critiques, McKinsey noted that the data lake should be viewed as a service model for delivering business value within the enterprise, not a technology outcome. == Data lakehouses == Data lakehouses are a hybrid approach that can ingest a variety of raw data formats like a data lake, while also providing ACID transactions and enforced data quality like a data warehouse.