Evolvability (computer science)

Evolvability (computer science)

The term evolvability is a framework of computational learning introduced by Leslie Valiant in his paper of the same name. The aim of this theory is to model biological evolution and categorize which types of mechanisms are evolvable. Evolution is an extension of PAC learning and learning from statistical queries. == General framework == Let F n {\displaystyle F_{n}\,} and R n {\displaystyle R_{n}\,} be collections of functions on n {\displaystyle n\,} variables. Given an ideal function f ∈ F n {\displaystyle f\in F_{n}} , the goal is to find by local search a representation r ∈ R n {\displaystyle r\in R_{n}} that closely approximates f {\displaystyle f\,} . This closeness is measured by the performance Perf ⁡ ( f , r ) {\displaystyle \operatorname {Perf} (f,r)} of r {\displaystyle r\,} with respect to f {\displaystyle f\,} . As is the case in the biological world, there is a difference between genotype and phenotype. In general, there can be multiple representations (genotypes) that correspond to the same function (phenotype). That is, for some r , r ′ ∈ R n {\displaystyle r,r'\in R_{n}} , with r ≠ r ′ {\displaystyle r\neq r'\,} , still r ( x ) = r ′ ( x ) {\displaystyle r(x)=r'(x)\,} for all x ∈ X n {\displaystyle x\in X_{n}} . However, this need not be the case. The goal then, is to find a representation that closely matches the phenotype of the ideal function, and the spirit of the local search is to allow only small changes in the genotype. Let the neighborhood N ( r ) {\displaystyle N(r)\,} of a representation r {\displaystyle r\,} be the set of possible mutations of r {\displaystyle r\,} . For simplicity, consider Boolean functions on X n = { − 1 , 1 } n {\displaystyle X_{n}=\{-1,1\}^{n}\,} , and let D n {\displaystyle D_{n}\,} be a probability distribution on X n {\displaystyle X_{n}\,} . Define the performance in terms of this. Specifically, Perf ⁡ ( f , r ) = ∑ x ∈ X n f ( x ) r ( x ) D n ( x ) . {\displaystyle \operatorname {Perf} (f,r)=\sum _{x\in X_{n}}f(x)r(x)D_{n}(x).} Note that Perf ⁡ ( f , r ) = Prob ⁡ ( f ( x ) = r ( x ) ) − Prob ⁡ ( f ( x ) ≠ r ( x ) ) . {\displaystyle \operatorname {Perf} (f,r)=\operatorname {Prob} (f(x)=r(x))-\operatorname {Prob} (f(x)\neq r(x)).} In general, for non-Boolean functions, the performance will not correspond directly to the probability that the functions agree, although it will have some relationship. Throughout an organism's life, it will only experience a limited number of environments, so its performance cannot be determined exactly. The empirical performance is defined by Perf s ⁡ ( f , r ) = 1 s ∑ x ∈ S f ( x ) r ( x ) , {\displaystyle \operatorname {Perf} _{s}(f,r)={\frac {1}{s}}\sum _{x\in S}f(x)r(x),} where S {\displaystyle S\,} is a multiset of s {\displaystyle s\,} independent selections from X n {\displaystyle X_{n}\,} according to D n {\displaystyle D_{n}\,} . If s {\displaystyle s\,} is large enough, evidently Perf s ⁡ ( f , r ) {\displaystyle \operatorname {Perf} _{s}(f,r)} will be close to the actual performance Perf ⁡ ( f , r ) {\displaystyle \operatorname {Perf} (f,r)} . Given an ideal function f ∈ F n {\displaystyle f\in F_{n}} , initial representation r ∈ R n {\displaystyle r\in R_{n}} , sample size s {\displaystyle s\,} , and tolerance t {\displaystyle t\,} , the mutator Mut ⁡ ( f , r , s , t ) {\displaystyle \operatorname {Mut} (f,r,s,t)} is a random variable defined as follows. Each r ′ ∈ N ( r ) {\displaystyle r'\in N(r)} is classified as beneficial, neutral, or deleterious, depending on its empirical performance. Specifically, r ′ {\displaystyle r'\,} is a beneficial mutation if Perf s ⁡ ( f , r ′ ) − Perf s ⁡ ( f , r ) ≥ t {\displaystyle \operatorname {Perf} _{s}(f,r')-\operatorname {Perf} _{s}(f,r)\geq t} ; r ′ {\displaystyle r'\,} is a neutral mutation if − t < Perf s ⁡ ( f , r ′ ) − Perf s ⁡ ( f , r ) < t {\displaystyle -t<\operatorname {Perf} _{s}(f,r')-\operatorname {Perf} _{s}(f,r) 0 {\displaystyle \epsilon >0\,} , for all ideal functions f ∈ F n {\displaystyle f\in F_{n}} and representations r 0 ∈ R n {\displaystyle r_{0}\in R_{n}} , with probability at least 1 − ϵ {\displaystyle 1-\epsilon \,} , Perf ⁡ ( f , r g ( n , 1 / ϵ ) ) ≥ 1 − ϵ , {\displaystyle \operatorname {Perf} (f,r_{g(n,1/\epsilon )})\geq 1-\epsilon ,} where the sizes of neighborhoods N ( r ) {\displaystyle N(r)\,} for r ∈ R n {\displaystyle r\in R_{n}\,} are at most p ( n , 1 / ϵ ) {\displaystyle p(n,1/\epsilon )\,} , the sample size is s ( n , 1 / ϵ ) {\displaystyle s(n,1/\epsilon )\,} , the tolerance is t ( 1 / n , ϵ ) {\displaystyle t(1/n,\epsilon )\,} , and the generation size is g ( n , 1 / ϵ ) {\displaystyle g(n,1/\epsilon )\,} . F {\displaystyle F\,} is evolvable over D {\displaystyle D\,} if it is evolvable by some R {\displaystyle R\,} over D {\displaystyle D\,} . F {\displaystyle F\,} is evolvable if it is evolvable over all distributions D {\displaystyle D\,} . == Results == The class of conjunctions and the class of disjunctions are evolvable over the uniform distribution for short conjunctions and disjunctions, respectively. The class of parity functions (which evaluate to the parity of the number of true literals in a given subset of literals) are not evolvable, even for the uniform distribution. Evolvability implies PAC learnability.

Label noise

Label noise refers to errors or inaccuracies in the class labels of data instances. This is a widespread issue in machine learning datasets, arising from human annotator mistakes, unclear labeling instructions, automated labeling methods, or adversarial attacks in supervised learning. Label noise can be roughly divided into random noise, where labels are flipped independently of input features, and systematic noise, where mislabeling is dependent on certain patterns or biases in the data. Label noise can be damaging to model performance, especially for complex models that may overfit to noisy labels rather than generalizable patterns. Many approaches have been proposed to deal with the effects of label noise, including robust loss functions, noise-tolerant algorithms, data cleaning methods, and semi-supervised learning approaches. To reduce the impact of wrong labels during training, techniques like label smoothing, sample reweighting and using trusted validation sets are used. The role of noise-robust training paradigms and curriculum learning strategies to improve resilience against mislabeled data is also explored in recent research.

Minion (solver)

Minion is a solver for satisfaction problems. Unlike constraint programming toolkits, which expect users to write programs in a traditional programming language like C++, Java or Prolog, Minion takes a text file which specifies the problem, and solves using only this. This makes using Minion much simpler, at the cost of much less customization. Minion has been shown to be faster than major commercial constraint solvers including CPLEX (formerly IBM ILOG). == Overview == Minion was introduced in 2006 by researchers at the University of St Andrews as a “fast, scalable” solver for large and hard CSP instances. The project provides a compact input language and a low-overhead C++ implementation aimed at throughput and memory efficiency. == Design and features == Minion implements a range of variable and constraint types commonly used in CSP modelling, plus search heuristics and optimisation support. The solver architecture prioritises cache-friendly data structures and specialised propagators. Notably, the developers adapted watched literal techniques from SAT solving to speed up constraint propagation for, among others, Boolean sums, the element global constraint, and table constraints. The modelling approach relies on a plain-text format (parsed by Minion) rather than embedding models into a host programming language. This reduces overhead and supports rapid “model-and-run” experimentation for large benchmark sets. == Performance == In the original evaluation on standard benchmarks, the authors reported that Minion often ran between one and two orders of magnitude faster than state-of-the-art toolkits of the time (including ILOG Solver and Gecode) on large, hard instances, with smaller gains—or slowdowns—on easier problems. Subsequent research has used Minion as a baseline solver in empirical studies and test generation tasks, reflecting its adoption within parts of the constraint programming community. == Applications == Minion has been applied in academic work on combinatorial search, scheduling and test generation, and is available to other environments via wrappers (for example, from the R language).

POSC Caesar

POSC Caesar Association (PCA) is an international, open and not-for-profit, member organization that promotes the development of open specifications to be used as standards for enabling the interoperability of data, software and related matters. PCA is the initiator of ISO 15926 "Integration of life-cycle data for process plants including oil and gas production facilities" and is committed to its maintenance and enhancement. Nils Sandsmark has been the General Manager of POSC Caesar Association since 1999 and Thore Langeland, Norwegian Oil Industry Association (Norwegian: Oljeindustriens Landsforening, OLF), is the chairman of the board. == History == === Caesar Offshore === The first predecessor of POSC Caesar Association, the Caesar Offshore program, started in 1993. The original focus was on standardizing technical data definitions for capital intensive projects at the handover from the EPC contractor to the owner/operators of onshore and offshore oil and gas production facilities. The program was sponsored by The Research Council of Norway, two EPC contractors (Aker Maritime and Kværner), three owners/operators (Norsk Hydro, Saga Petroleum and Statoil) and DNV as service provider and project owner. === POSC Caesar project === During the period 1994–96, Caesar Offshore Program was defined as a project of Petrotechnical Open Software Corporation (POSC) (now Energistics), and changed its name to the POSC Caesar Project. In 1995 the project was joined by BP, Brown and Root and Elf Aquitaine and in 1997 by Intergraph, IBM, Oracle, Lloyd's, Shell, ABB and UMOE Technologies. During that time, POSC Caesar also became a member of European Process Industries STEP Technical Liaison Executive (EPISTLE) where it collaborates with PISTEP (UK), and USPI-NL (The Netherlands) on the development of ISO 10303, also known as "Standard for the Exchange of Product model data (STEP)". === POSC Caesar Association === In 1997, POSC Caesar Association was founded as an independent, global, non-profit, member organization. POSC Caesar Association serves an international membership and collaborates with other international organizations. It has its main office in Norway. Albeit the name of POSC Caesar Association still hints to its past as a project within the Petrotechnical Open Software Corporation (POSC) (now Energistics), from 1997 onwards, the organization has been independent. Energistics and POSC Caesar Association do collaborate, and are formally member in each other's organization. == Membership == POSC Caesar Association has with its current 36 members from around the world and has established an international footprint (with a strong membership in Norway) that includes a variety of backgrounds, from academia and solution providers to engineering contractors and owners/operators. The members are (subdivided by organization type): Associations: Energistics (USA) and The Norwegian Oil Industry Association (OLF, Norway); Universities and Research Institutes: International Research Institute of Stavanger (IRIS, Norway), Norwegian University of Science and Technology (NTNU, Norway), Korea Advanced Institute of Science and Technology (KAIST, Korea), SINTEF (Norway), University of Bergen (Norway), University of Oslo (Norway), University of Stavanger (Norway), University of Tromsø (Norway) and Western Norway Research Institute (Norway); Oil and Gas Companies: BP (UK), Petronas (Malaysia) and Statoil (Norway); Engineering contractors and consultants: Akvaplan-niva (Norway), Aker Solutions (Norway), Asset Life Cycle Information Management (ALCIM, Malaysia), CAESAR systems (USA), Bechtel (USA), Det Norske Veritas (DNV, Norway), Information Logic (USA) and iXIT Engineering Technology (Germany), Phusion IM Ltd (UK); Solution providers: Aveva (UK), Bentley Systems (USA), Jotne EPM Technology (Norway), Epsis (Norway), Eurostep (Sweden), International Business Machines Corporation (IBM, USA), Siemens - Comos Industry Solutions (before Innotec) (Germany), Intergraph (USA), Invenia (Norway), Keel Solution (Denmark), Noumenon (UK), NRX (Canada), Octaga (Norway) and Tektonisk (Norway). In general, the organization holds three membership meetings a year; one in January / February in North-America (typically USA), one in April / May in Europe (typically Norway) and one in October in Asia (typically Malaysia). == Activities and services == === Initiator and custodian of ISO 15926 === In consultation with the other EPISTLE members and the International Organization for Standardization (ISO), it was decided in 2003 (some say already in 1997) that for modeling-technical reasons it was better to discontinue the development of ISO 10303 and to initiate the development of ISO 15926 "Integration of life-cycle data for process plants including oil and gas production facilities." Over the years, the scope of the standard has increased from the initial capital-intensive projects in the upstream oil and gas industry, to include also relevant terminology for downstream oil and gas industry applications and to deal with real-time data related to the actual oil and gas production. ISO 15926 has also over the years evolved from a dictionary (a list of terms with definitions), over a taxonomy (added hierarchy) to an ontology (a formal representation of a set of concepts within a domain and the relationships between those concepts). ISO 15926 is therefore sometimes nicknamed the "Oil and Gas Ontology", for some considered to be an essential prerequisite together with Semantic Web technologies to get to better interoperability, an optimal use of all available data across boundaries and an increase in efficiency. This is what some call the next generation of Integrated Operations. === Reference data services === Placeholders: Flow scheme of WIP - RDS - ISO and role of SIGs RDS Standards in database pilot (ISO) === Special interest groups === Placeholders: Overview of SIGs Drilling and Completion Reservoir and Production Operations and Maintenance == Projects == There are a number of projects (co-)organized by POSC Caesar Association working on the extension of the ISO 15926 standard in different application areas. === Capital intensive projects application domain === The following projects are running at the moment (August 2009): The ADI Project of FIATECH, to build the tools (which will then be made available in the public domain) The IDS Project of POSC Caesar Association, to define product models required for data sheets A joint collaboration project between FIATECH POSC Caesar Association is the ADI-IDS project is the ISO 15926 WIP === Upstream oil and gas industry application domain === The following projects are currently running (August 2009): The Integrated Operations in the High North (IOHN) project is working on extending ISO 15926 to handle real-time data transmission and (pre-)processing to enable the next generation of Integrated Operations. The Environment Web project to include environmental reporting terms and definitions as used in EPIM's EnvironmentWeb in ISO 15926. Finalised projects include: The Integrated Information Platform (IIP) project working on establishing a real-time information pipeline based on open standards. It worked among others on: Daily Drilling Report (DDR) to including all terms and definitions in ISO 15926. This standard became mandatory on February 1, 2008 for reporting on the Norwegian Continental Shelf by the Norwegian Petroleum Directorate (NPD) and Safety Authority Norway (PSA). NPD says that the quality of the reports has improved considerably since. Daily Production Report (DPR) to including all terms and definitions in ISO 15926. This standard was tested successfully on the Valhall (BP-operated) and Åsgard (StatoilHydro-operated) fields offshore Norway. The terminology and XML schemata developed have also been included in Energistics’ PRODML standard. == Conferences and events == === Semantic Days === === Sogndal academic network meeting === == Collaborations == POSC Caesar is collaborating with a number of standardization bodies, including: Mimosa: collaboration on open information standards for Operations and Maintenance mainly for the downstream oil and gas industry; FIATECH: collaboration on open information standards for life cycle data of capital projects; Energistics: collaboration on information standards for the upstream oil and gas industry, including WITSML and PRODML; OASIS: collaboration on e-business standards; ISO TC184/SC4: the host of the ISO 15926 standard.

Eimear Kenny

Eimear E. Kenny is a researcher in population genetics and translation genomics, and is the Founding Director of the Institute for Genomic Health, and Endowed Chair and Professor of Genomic Health at the Icahn School of Medicine at Mount Sinai. She is known for novel approaches in computational genomics, advancing the study of human genetic variation and its connection to disease risk and diagnosis. Her research has laid the foundation for integrating artificial intelligence (AI) and genomics into precision medicine and routine clinical care. By combining genomics, computer science, and medicine, her work leverages genomic sequencing technologies and machine learning algorithms to uncover insights that improve patient care, accelerate genomic data analysis, and enable the future of AI-driven healthcare. She has led multiple genomics-based clinical trials, applying computational biology and AI in clinical settings to advance genomic medicine and precision healthcare. == Research == A recipient of the Early-Career Award from the American Society of Human Genetics (USA), Kenny, as of 2024, leads a team in genetics, computer science, and medicine, focusing on genetic ancestry, large-scale genomics, clinical trials, and genomic medicine at the Institute for Genomic Health. The lab works to advance understanding of genetic ancestry and its impact on health in order to inform better clinical medicine models. She is recognized for her work to leverage biobanks for translational genomics and her development of new genetic tests an strategies for health care management. In one study, she and her colleagues investigated genetic disorders that might be under-diagnosed due to insufficient data, and found a variant in a collagen gene associated with Steel syndrome. This syndrome caused short stature and bone and joint issues and was thought to be rare. However, the study revealed it is common in individuals with Puerto Rican ancestry. Three of Kenny's genomic medicine clinical trials assessed how to bring new technology, such as digital apps, or information, such as polygenic risk scores, into routine clinical care. In the 2010s, Kenny was instrumental in several large-scale sequencing studies, including the 1000 Genomes Project, the Exome Sequencing Project, the Genome Sequencing Project, and the Trans-Omics for Precision Medicine. In 2012, she led work that discovered the variant responsible for blond hair in Melanesia, work that was featured in the Smithsonian NHGRI Human Genome Exhibit in Washington, D.C. In 2017, her group was one of the first to demonstrate that polygenic risk scores derived in predominantly European populations have reduced accuracy when applied in populations now widely acknowledged as a major challenge in the field of genomic risk prediction. As of 2024, she is Principal Investigator in many NIH-funded international consortium focused on computational genomics and genomic medicine, including Electronic Medical Records and Genomics, Polygenic Risk Methods in Diverse Populations, and the Human Pangenome Reference Consortium. In 2023, Kenny played a key role in a groundbreaking advancement in genomics research by helping to map a diverse human pangenome—a major shift from reliance on a single reference genome. Unlike the earlier genetic map, based on one man of mixed European and African ancestry in Buffalo, this new pangenome project captures far greater human genetic diversity. As reported by The Washington Post, Kenny's work demonstrates how a more inclusive human genome can drive discoveries in rare genetic diseases, improve genomic medicine, and accelerate the future of precision healthcare. Kenny was co-developer and current license holder for Random Forest adMIXture (RFMix), a patented software for inferring continental and sub-continental ancestry at genomic loci. == Education and career == Kenny graduated from Trinity College Dublin with a BA in Biochemistry in 1999 and did a masters in Bioinformatics at Leeds University. She received her PhD in Computational Genomics at Rockefeller University, and did her post-doctoral work in the lab of Dr. Carlos D. Bustamante at Stanford University. === Academic appointments === As of 2024, at Mount Sinai, she serves as the Endowed Chair and Professor of Genomic Health, Professor at the Department of Medicine and Professor at the Department of Genetics and Genomic Sciences. Since 2018 she has served as the Founding Director of the Institute for Genomic Health, and since 2022, she also serves as the Founding Director of the Center for Translational Genomics. She is also the Director of Translational Research, Division for Genomic Medicine. Former appointments include Assistant Professor at the Department of Genetics and Genomic Sciences and Member at The Charles Bronfman Institute of Personalized Medicine, both at Mount Sinai. She was also Bioinformatics Programmer at the California Institute of Technology, and research assistant at the Massachusetts Institute of Technology. == Publications == As of 2024, Kenny is an advisor to Cell Genomics. Google Scholar reports 50,623 citations, an h-index of 66 and an i10-index of 130. The five most-cited articles she contributed to are: Auton, A; Brooks, LD; Durbin, RM; Garrison, EP; Kang, HM; Korbel, JO; Marchini, JL; McCarthy, S; McVean, GA; Abecasis, GR (2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526...68T. doi:10.1038/nature15393. PMC 4750478. PMID 26432245.. Cited by 14847 Abecasis, GR; Auton, A; Brooks, LD; DePristo, MA; Durbin, RM; Handsaker, RE; Kang, HM; Marth, GT; McVean, GA (2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491...56T. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.. Cited by 8287 Jacob A. Tennessen et al. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes.Science337,64–69(2012).DOI:10.1126/science.1219240 Cited by 1886 Taliun, D.; Harris, D.N.; Kessler, M.D.; et al. (2021). "Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program". Nature. 590 (7845): 290–299. Bibcode:2021Natur.590..290T. doi:10.1038/s41586-021-03205-y. PMC 7875770. PMID 33568819.. Cited by 1369 Vilhjálmsson, BJ; et al. (2015). "Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores". Am J Hum Genet. 97 (4): 576–92. doi:10.1016/j.ajhg.2015.09.001. PMC 4596916. PMID 26430803.. Cited by 1327

CEITON

CEITON is a web-based software system for facilitating and automating business processes such as planning, scheduling, and payroll using workflow technologies. The system is used by several media companies such as MDR, Yle, RAI and Red Bull Media House. In December 2018, the first CEITON User Group Meeting took place in Leipzig, Germany. == Architecture == The software runs on a server (on premises) or in the cloud and is scalable on parallel servers. Data security is warranted by role-based access control (RBAC). The software is used via web-browsers and not dependent on particular system software. == Structure and Features == CEITON combines the two classical approaches of production planning and control and workflow management. === Project Management === The scheduling system plans, manages, bills, and analyzes projects or tasks. It manages human and technical resources, material, and locations on a single GUI. The system uses a gantt chart to assign tasks to be done to available and eligible resources (i.e. staff), automatically or by drag-and-drop. The scheduling module includes material management, resource management/ human resource management, integration of freelancers, clients and suppliers, long-term budget planning, time-tracking, shift scheduling, quality management, delivery and logistics, document management, archive, analysis and controlling, business reporting, as well as all accounting and documentation processes. === Workflow === The workflow management system module coordinates business processes. Processes are defined once as a workflow and then repeatedly executed. Human resources are automatically assigned to steps (tasks) and integrated in workflow forms. Systems are integrated with an EAI/SOAP module, allowing data exchange with arbitrary external systems which are also involved in the business process. It also features a 3-D workflow overview in which the status of each project step can be determined by its color in the overview. === Process Management === For project and order processing management, business processes are designed as workflows, and coordinate communication automatically. Different user interfaces for staff, customers or suppliers can be created so each gets only relevant information. Different workflow forms are associated with different log-ins. The main application for the system is knowledge-based business processes, in which many people are involved and virtual results are produced, e.g. in research, or development of media products, such as TV and movies. Broadcasters and media companies such as MDR and Yle use CEITON to control their production processes for products and services and coordinate complex workflows with all kinds of resources. === Integrations === An integrated EAI module allows CEITON to integrate every external system in any business process without programming, using SOAP and similar technologies. Aspera and FileCatalyst were integrated for faster data transfer, yet complex ERP systems and numerous SAP modules have also been integrated, for example, to extract working times to payroll. === Mobile Working === Since Version 7, released in 2015, CEITON includes a time-tracking module allowing employees to enter their times from mobile devices such as tablets running Android, iPhones etc. == History == Ceiton Technologies (SME tech firm), the company developing CEITON, was founded in Leipzig, Germany in 2000, staffing solutions for the Bureau of Internal Revenue in Manila, Philippines, were implemented in 2000 together with the Deutsche Gesellschaft für Technische Zusammenarbeit of the German government. The first version (1.0) of the software was released in July 2001. The product was originally developed for German broadcasting companies. CEITON is named after the Japanese concept Seiton, one of the principles of Japanese workplace design methodology known as 5S. Since version 7, released in 2015, CEITON includes a time-tracking module allowing employees to enter their times from mobile devices such as tablets running Android, iPhones etc. In May 2005 CEITON won the IQ innovation award, sponsored by Siemens, in the category Excellent innovation in the IT-sector. Since 2007, CEITON has been present at the broadcast trade fairs NAB in Las Vegas and IBC in Amsterdam. In 2020, the company celebrated its 20th anniversary.

Safe and Secure Innovation for Frontier Artificial Intelligence Models Act

The Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, or SB 1047, was a failed 2024 California bill intended to "mitigate the risk of catastrophic harms from AI models so advanced that they are not yet known to exist". Specifically, the bill would have applied to models which cost more than $100 million to train and were trained using a quantity of computing power greater than 1026 integer or floating-point operations. SB 1047 would have applied to all AI companies doing business in California—the location of the company would not matter. The bill would have created protections for whistleblowers and required developers to perform risk assessments of their models prior to release, with guidance from the Government Operations Agency. It would also have established CalCompute, a University of California public cloud computing cluster for startups, researchers and community groups. == Background == The rapid increase in capabilities of AI systems in the 2020s, including the release of ChatGPT in November 2022, caused some researchers and members of the public to become concerned about the existential risks associated with increasingly powerful AI systems. Hundreds of tech executives and AI researchers, including two of the so-called "Godfathers of AI", Geoffrey Hinton and Yoshua Bengio, signed a statement in May 2023 calling for the mitigation of the "risk of extinction from AI" to be a global priority alongside "pandemics and nuclear war". However, the plausibility of these risks is still widely debated. Strong regulation of AI has been criticized for purportedly causing regulatory capture by large AI companies like OpenAI, a phenomenon in which regulation advances the interest of larger companies at the expense of smaller competition and the public in general, although OpenAI ended up opposing the bill. Other advocates of AI regulation aim to prevent bias and privacy violations, rather than existential risks. For example, some experts who view existential concerns as overblown and unrealistic view them as a distraction from near-term harms of AI like discriminatory automated decision making. In the face of existential concerns, technology companies have made voluntary commitments to conduct safety testing, for example at the AI Safety Summit and AI Seoul Summit. In 2023, not long before the bill was proposed, Governor Newsom of California and President Biden issued executive orders on artificial intelligence. State Senator Wiener said SB 1047 draws heavily on the Biden executive order, and is motivated by the absence of unified federal legislation on AI safety. Historically, California has passed regulation on several tech issues itself, including consumer privacy and net neutrality, in the absence of action by Congress. == History == === Proposal and voting === The bill was authored by State Senator Scott Wiener. Wiener first proposed AI legislation for California through an intent bill called SB 294, the Safety in Artificial Intelligence Act, in September 2023. On February 7, 2024, Wiener introduced SB 1047. On May 21, SB 1047 passed the Senate 32–1. The bill was significantly amended by Wiener on August 15, 2024, in response to industry advice. Amendments included adding clarifications, and removing the creation of a "Frontier Model Division" and the penalty of perjury. On August 28, the bill passed the State Assembly 48–16. Then, due to the amendments, the bill was once again voted on by the Senate, passing 30–9. === Veto by governor === On September 29, Governor Gavin Newsom vetoed the bill. The deadline for California lawmakers to overrule Newsom's veto was November 30, 2024. Newsom cited concerns over the bill's regulatory framework targeting only large AI models based on their computational size, while not taking into account whether the models are deployed in high-risk environments. Newsom emphasized that this approach could create a false sense of security, overlooking smaller models that might present equally significant risks. He acknowledged the need for AI safety protocols but stressed the importance of adaptability in regulation as AI technology continues to evolve rapidly. Governor Newsom also committed to working with technology experts, federal partners, and research institutions, including the Carnegie Endowment for International Peace, led by former California Supreme Court Justice Mariano-Florentino Cuéllar; and Stanford University's Human-Centered AI (HAI) Institute, led by Dr. Fei-Fei Li. He announced plans to collaborate with these entities to advance responsible AI development, aiming to protect the public while fostering innovation. == Provisions == SB 1047 would have covered AI models with training compute over 1026 integer or floating-point operations and a cost of over $100 million. If a covered model is fine-tuned using more than $10 million, the resulting model would also have been covered. The bill would have defined critical harms with respect to four categories: Creation or use of a chemical, biological, radiological, or nuclear weapon Cyberattacks on critical infrastructure causing mass casualties or at least $500 million of damage Autonomous crimes causing mass casualties or at least $500 million of damage Other harms of comparable severity Developers would have needed to create a "safety and security protocol" before training covered models. Before deployment, they would have submitted a statement of compliance, confirming they took reasonable care to take measures to prevent covered models that pose an unreasonable risk of critical harms. The statement would have included risk assessments and descriptions of their compliance process. These rules would have applied to both covered models and their derivatives, including post-training modifications, with annual third-party audits required starting in 2026. Safeguards to reduce risk included the ability to shut down the model, which has been variously described as a "kill switch" and "circuit breaker". Whistleblowing provisions would have protected employees who report safety problems and incidents. Additionally, SB 1047 would have created a public cloud computing cluster called CalCompute, associated with the University of California, to support startups, researchers, and community groups that lack large-scale computing resources. === Compliance and supervision === SB 1047 would have required developers, beginning January 1, 2026, to annually retain a third-party auditor to perform an independent audit of compliance with the requirements of the bill, as provided. The Government Operations Agency would have reviewed the results of safety tests and incidents, and issue guidance, standards, and best practices. The bill would have created a Board of Frontier Models to supervise the application of the bill by the Government Operations Agency. It is would be composed of 9 members. == Reception == === Subjects of debate === Proponents of the bill described its provisions as simple and narrowly focused, with Sen. Scott Weiner describing it as a "light-touch, basic safety bill". This was disputed by critics of the bill, who described the bill's language as vague and criticized it as consolidating power in the largest AI companies at the expense of smaller ones. Proponents, in turn, argued that the bill only applies to models trained using more than 1026 FLOPS and with over $100 million, or fine-tuned with more than $10 million, and that the threshold could be increased if needed. The penalty of perjury was also a subject of debate, and was eventually removed through an amendment. The scope of the "kill switch" requirement was also reduced, following concerns from open-source developers. The use of the term "reasonable assurance" in the bill was also controversial, and it was eventually amended to "reasonable care". Critics then argued that "reasonable care" imposed an excessive burden by requiring confidence that models could not be used to cause catastrophic harm; proponents claimed that the standard did not require certainty and that it already applied to AI developers under existing law. === Support and opposition === Individual supporters of the bill included Turing Award recipients Yoshua Bengio and Geoffrey Hinton, Elon Musk, Bill de Blasio, Kevin Esvelt, Dan Hendrycks, Vitalik Buterin, OpenAI whistleblowers Daniel Kokotajlo and William Saunders, Lawrence Lessig, Sneha Revanur, Stuart Russell, Jan Leike, actors Mark Ruffalo, Sean Astin, and Rosie Perez, Scott Aaronson, and Max Tegmark. Over 120 Hollywood celebrities, including Mark Hamill, Jane Fonda, and J. J. Abrams, also signed a statement in support of the bill. Max Tegmark likened the bill's focus on holding companies responsible for the harms caused by their models to the FDA requiring clinical trials before a company can release a drug to the market. Organizations sponsoring the bill included the Center for AI Safety, Economic Security California and Encode. The la