AI Coding Meta

AI Coding Meta — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Canva

    Canva

    Canva Pty Ltd. is an Australian multinational proprietary software company launched in 2013 based in Sydney, Australia. The platform provides a graphic design platform to create visual content for presentations, websites, and other digital products. Its uses include templates for presentations, posters, and social media content, as well as photo and video editing functionality. The platform uses a drag-and-drop interface designed for users without professional design training or experience. Canva operates on a freemium model and has added features such as print services and video editing tools over time. == History == === 2013–2020 === Canva was founded in Perth, Australia, by Melanie Perkins, Cliff Obrecht and Cameron Adams on 1 January 2013. One of the company's early investors was Susan Wu, an American entrepreneur. In its first year, Canva had more than 750,000 users. In 2017, the company reached profitability and had 294,000 paying customers. In January 2018, Perkins announced that the company had raised A$40 million from Sequoia Capital, Blackbird Ventures, and Felicis Ventures, and the company was valued at A$1 billion. It raised A$70 million in May 2019, followed by A$85 million in October 2019 and the launch of Canva for Enterprise. In December 2019, Canva announced Canva for Education, a free product for schools and other educational institutions intended to facilitate collaboration between students and teachers. === 2021–2025 === In June 2020, Canva announced a partnership with FedEx Office and with Office Depot the following month. As of June 2020, Canva's valuation had risen to A$6 billion, rising to A$40 billion by September 2021. In September 2021, Canva raised US$200 million, with its value peaking that year at US$40 billion. By September 2022, the valuation of the company had leveled at US$26 billion. While Canva's value declined from its 2021 peak by mid-2022, it remained one of Australia's most prominent technology companies, alongside Atlassian. In March 2022, Canva had over 75 million monthly active users. In 2023, the pair were named in the Australian Financial Review's AFR Rich List as among the 10 most wealthy people in Australia. On 7 December 2022, Canva launched Magic Write, which is the platform's AI-powered copywriting assistant. On 22 March 2023, Canva announced its new Assistant tool, which makes recommendations on graphics and styles that match the user's existing design. On 11 January 2024, Canva launched its own GPT in OpenAI's GPT Store. The company has announced it intends to compete with Google and Microsoft in the office software category with website and whiteboard products. In May 2024, the company announced the launch of Canva Enterprise, a plan designed for large organisations, alongside new tools including Work Kits, Courses and AI capabilities. In 2024, it announced a co-funded solar energy project to enhance its sustainability efforts. On 10 April 2025, Canva released Visual Suite 2. The new interface combines Canva's design and productivity tools. New features include a spreadsheets application (Canva Sheets), a generative AI coding assistant (Canva Code), a chatbot, and an updated photo editor that can modify or remove background objects. In August 2025, Canva launched a stock sale to employees, valuing the company at US$42 billion. == Acquisitions == In 2018, the company acquired presentations startup Zeetings for an undisclosed amount, as part of its expansion into the presentations space. In May 2019, the company announced the acquisitions of Pixabay and Pexels, two free stock photography sites based in Germany, which enabled Canva users to access their photos for designs. In February 2021, Canva acquired Austrian startup Kaleido.ai and the Czech-based Smartmockups. In 2022, Canva acquired Flourish, a London-based data visualization startup. In March 2024, Canva acquired UK-based Serif, the developers of the Affinity suite of graphic design software, for approximately $380 million. In August 2024, Canva acquired the AI image generation platform and startup, Leonardo AI, for an undisclosed amount. In June 2025, it was announced that Canva had acquired Australian AI marketing startup MagicBrief for an undisclosed amount. In February 2026, Canva acquired two startups: Cavalry, which specializes in animation software, and MangoAI, which focuses on improving advertising performance. In April 2026, Canva acquired Simtheory, an AI Workflow Tool, and Ortto, a marketing automation tool. == Philanthropy == Canva's co-founders, Melanie Perkins and Cliff Obrecht, have publicly stated their intention to donate a significant portion of their personal wealth to charity. In 2021, Canva started a partnership with GiveDirectly, a nonprofit organization operating in low income areas that makes unconditional cash transfers to families living in extreme poverty. Since then, the company has donated $50 million to support GiveDirectly's work across Malawi. In 2025, Canva announced an additional $100 million commitment to expand its GiveDirectly partnership. == Controversies == === Data breach === In May 2019, Canva experienced a data breach in which the data of roughly 139 million users was exposed. The exposed data included real names of users, usernames, email addresses, geographical information, and password hashes for some users. In January 2020, approximately 4 million user passwords were decrypted and shared online. Canva responded by resetting the passwords of every user who had not changed their password since the initial breach. === Russian operations === In May 2022 Canva was criticized for continuing to provide free access to its services in Russia, even after suspending payment processing in the country. Activists from the Ukrainian diaspora in Australia and others said this could be viewed as indirectly supporting Russia’s war effort. They noted the company was the only one of several major Australian firms to receive the lowest “digging in” rating on a tracker run by the Yale School of Management for failing to pull out of Russia. Canva responded that it had suspended financial transactions in Russia from March 2022 and maintained the free version to allow the continued creation and sharing of “pro-peace and anti-war” content for its 1.4 million Russian users.

    Read more →
  • Algorithmic transparency

    Algorithmic transparency

    Algorithmic transparency is the principle that the factors that influence the decisions made by algorithms should be visible, or transparent, to the people who use, regulate, and are affected by systems that employ those algorithms. Although the phrase was coined in 2016 by Nicholas Diakopoulos and Michael Koliska about the role of algorithms in deciding the content of digital journalism services, the underlying principle dates back to the 1970s and the rise of automated systems for scoring consumer credit. The phrases "algorithmic transparency" and "algorithmic accountability" are sometimes used interchangeably – especially since they were coined by the same people – but they have subtly different meanings. Specifically, "algorithmic transparency" states that the inputs to the algorithm and the algorithm's use itself must be known, but they need not be fair. "Algorithmic accountability" implies that the organizations that use algorithms must be accountable for the decisions made by those algorithms, even though the decisions are being made by a machine, and not by a human being. Current research around algorithmic transparency interested in both societal effects of accessing remote services running algorithms, as well as mathematical and computer science approaches that can be used to achieve algorithmic transparency. In the United States, the Federal Trade Commission's Bureau of Consumer Protection studies how algorithms are used by consumers by conducting its own research on algorithmic transparency and by funding external research. In the European Union, the data protection laws that came into effect in May 2018 include a "right to explanation" of decisions made by algorithms, though it is unclear what this means. Furthermore, the European Union founded The European Center for Algorithmic Transparency (ECAT).

    Read more →
  • Energy informatics

    Energy informatics

    Energy informatics is a research field covering the use of information and communication technology to address energy utilization and management challenges. Methods used for "smart" implementations often combine IoT sensors with artificial intelligence and machine learning. Energy Informatics is founded on flow networks that are the major suppliers and consumers of energy. Their efficiency can be improved by collecting and analyzing information. == Application areas == The field among other consider application areas within: Smart Buildings by developing ICT-centred solutions for improving the energy-efficiency of buildings. Smart Cities by investigating the synergies between demand patterns and supply availability of energy flows in cities and communities to improve energy efficiency, increase integration of renewable sources, and provide resilience towards system faults caused by extreme situations, like hurricanes and flooding. Smart Industries including the development of ICT-centred solutions for improving the energy efficiency and predictability of energy intensive industrial processes, without compromising process and product quality. Smart Energy Networks by developing ICT-centred solutions for coordinating the supply and demand in environmentally sustainable energy networks.

    Read more →
  • Data quality

    Data quality

    Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning". Data is deemed of high quality if it correctly represents the real-world construct to which it refers. Apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. When this is the case, businesses may adopt recognised international standards for data quality (See #International Standards for Data Quality below). Data governance can also be used to form agreed upon definitions and standards, including international standards, for data quality. In such cases, data cleansing, including standardization, may be required in order to ensure data quality. == Definitions == Defining data quality is difficult due to the many contexts data are used in, as well as the varying perspectives among end users, producers, and custodians of data. From a consumer perspective, data quality is: "data that are fit for use by data consumers" data "meeting or exceeding consumer expectations" data that "satisfies the requirements of its intended use" From a business perspective, data quality is: data that are "'fit for use' in their intended operational, decision-making and other roles" or that exhibits "'conformance to standards' that have been set, so that fitness for use is achieved" data that "are fit for their intended uses in operations, decision making and planning" "the capability of data to satisfy the stated business, system, and technical requirements of an enterprise" From a standards-based perspective, data quality is: the "degree to which a set of inherent characteristics (quality dimensions) of an object (data) fulfills requirements" "the usefulness, accuracy, and correctness of data for its application" Arguably, in all these cases, "data quality" is a comparison of the actual state of a particular set of data to a desired state, with the desired state being typically referred to as "fit for use," "to specification," "meeting consumer expectations," "free of defect," or "meeting requirements." These expectations, specifications, and requirements are usually defined by one or more individuals or groups, standards organizations, laws and regulations, business policies, or software development policies. == Dimensions of data quality == Drilling down further, those expectations, specifications, and requirements are stated in terms of characteristics or dimensions of the data, such as: accessibility or availability accuracy or correctness comparability completeness or comprehensiveness consistency, coherence, or clarity credibility, reliability, or reputation flexibility plausibility relevance, pertinence, or usefulness timeliness or latency uniqueness validity or reasonableness A systematic scoping review of the literature suggests that data quality dimensions and methods with real world data are not consistent in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data. == International standards for data quality == ISO 8000 is an international standard for data quality. Managed by the International Organization for Standardization, the ISO 8000 standards address and describe general aspects of data quality including principles, vocabulary and measurement data governance data quality management data quality assessment quality of master data, including exchange of characteristic data and identifiers quality of industrial data == History == Before the rise of the inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services. This was so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available. Companies with an emphasis on marketing often focused their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-out; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization. For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross tabulation, modeling and outlier detection, verifying data integrity, etc. == Overview == There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991) adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996). A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Nearly 200 such terms have been identified and there is little agreement in their nature (are these concepts, goals or criteria?), their definitions or measures (Wang et al., 1993). Software engineers may recognize this as a similar problem to "ilities". MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991). In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the U.S. economy of data quality problems at over U.S. $600 billion per annum (Eckerson, 2002). Incorrect data – which includes invalid and outdated information – can originate from different data sources – through data entry, or data migration and conversion projects. In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed. One reason contact data becomes stale very quickly in the average database – more than 45 million Americans change their address every year. In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger Regulatory Compliance function - a recognition of the importance of Data/Information Quality to organizations. Problems with data quality don't only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency. En

    Read more →
  • AI nationalism

    AI nationalism

    AI nationalism is the idea that nations should develop and control their own artificial intelligence technologies to advance their own interests and ensure technological sovereignty. This concept is gaining traction globally, leading countries to implement new laws, form strategic alliances, and invest significantly in domestic AI capabilities. == Global trends and national strategies == In 2018, British technology investor Ian Hogarth published an influential essay titled AI Nationalism. He argued that as AI gains more power and its economic and military significance expands, governments will take measures to bolster their own domestic AI industries, and predicted that the advancement of machine learning systems would lead to what he termed "AI nationalism." He anticipated that this rise in AI would accelerate a global arms race, resulting in more closed economies, restrictions on foreign acquisitions, and limitations on the movement of talent. Hogarth predicted that AI policy would become a central focus of government agendas. He also criticized Britain’s approach to AI strategy, citing the sale of London-based DeepMind—one of the leading AI laboratories, acquired by Google for a relatively modest £400 million in 2014—as a significant misstep. AI nationalism is chiefly reflected in the escalating rhetoric of an artificial intelligence arms race, portraying AI development as a zero-sum game where the winner gains significant economic, political, and military advantages. This mindset, as highlighted in a 2017 Pentagon report, warns that sharing AI technology could erode technological supremacy and enhance rivals' capabilities. The winner-takes-all mentality of AI nationalism poses risks including unsafe AI development, increased geopolitical tension, and potential military aggression (such as cyberattacks or targeting AI professionals). Several countries, including Canada, France, and India, have formulated national strategies to advance their positions in AI. In the United States, a leading player in the global AI arena, trade policies have been enacted to restrict China's access to critical microchips, reflecting a strategic effort to maintain a technological edge. The United States’ National Security Commission on Artificial Intelligence (NSCAI) frames AI development as a critical aspect of a broader technology competition crucial for national success. It emphasizes the need to outpace China in AI to maintain strategic advantage, reflecting AI nationalism by linking geopolitical power directly to advancements in AI. France has seen notable governmental support for local AI startups, particularly those specializing in language technologies that cater to French and other non-English languages. In Saudi Arabia, Crown Prince Mohammed bin Salman is investing billions in AI research and development. The country has actively collaborated with major technology firms such as Amazon, IBM, and Microsoft to establish itself as a prominent AI hub. == Historical and cultural context == AI nationalism is seen as deeply connected to historical racism and imperialism. It is viewed not merely as a technological competition but as a contest over racial and civilizational superiority. Historically, technological achievements were often used to justify colonialism and racial hierarchies, with Western societies perceiving their advancements as evidence of superiority. In the context of AI, this historical context continues to shape views on intelligence and development. Some argue that AI nationalism reinforces the idea of fundamental civilizational divides, especially between the Western world and China. This perspective often frames China's progress in AI as a direct challenge to Western values, presenting the AI competition as a struggle over values. AI nationalism is said to draw from long-standing anti-Asian stereotypes, such as the "Yellow Peril," which portray Asian nations as threats to Western civilization. This viewpoint links Asian technological advances with dehumanization and artificiality, reflecting persistent anxieties about China's growing role in the global tech landscape. == Implications == AI nationalism is seen as a component of a broader trend towards the fragmentation of the internet, where digital services are increasingly influenced by local regulations and national interests. This shift is creating a new technological landscape in which the impact of artificial intelligence on individuals' lives can vary significantly depending on their geographic location. J. Paul Goode argues that AI nationalism may exacerbate existing societal divisions by promoting the development of systems that embed cultural biases, thereby privileging certain groups while disadvantaging others.

    Read more →
  • Nike+iPod

    Nike+iPod

    The Nike+iPod Sport Kit is an activity tracker device, developed by Nike, Inc., which measures and records the distance and pace of a walk or run. The Nike+iPod consists of a small transmitter device attached to or embedded in a shoe, which communicates with either the Nike+ Sportband, or a receiver plugged into an iPod Nano. It can also work directly with a 2nd Generation iPod Touch (or higher), iPhone 3GS, iPhone 4, iPhone 4S, iPhone 5, The Nike+iPod was announced on May 23, 2006. On September 7, 2010, Nike released the Nike+ Running App (originally called Nike+ GPS) on the App Store, which used a tracking engine powered by MotionX that does not require the separate shoe sensor or pedometer. This application works using the accelerometer and GPS of the iPhone and the accelerometer of the iPod Touch, which does not have a GPS chip. Nike+Running is compatible with the iPhone 6 and iPhone 6 Plus down to iPhone 3GS and iPod touch. On June 21, 2012, Nike released Nike+ Running App for Android. The current app is compatible with all Android phones running 4.0.3 and up. == Overview == The sensor and iPod kit were revealed on May 20, 2006. The kit stores information such as the elapsed time of the workout, the distance traveled, pace, and calories burned by the individual. Nike+ was a collaboration between Nike and Apple; the platform consisted of an iPod, a wireless chip, Nike shoes that accepted the wireless chip, an iTunes membership, and a Nike+ online community. iPods using Nike iPod require a sensor and remote. The next upgraded product was the Sportband kit, which was announced in April 2008. The kit allows users to store run information without the iPod Nano. The Sportband consists of two parts: a rubber holding strap which is worn around the wrist, and a receiver which resembles a USB key-disk. The receiver displays information comparable to that of the iPod kit on the built-in display. After a run, the receiver can be plugged straight into a USB port and the software will upload the run information automatically to the Nike+ website. As of August 2008 "Nike+iPod for the Gym" launched, allowing users to record their cardio workouts directly to their iPods. No Sport kit or shoe sensor is required; all that is needed is a compatible iPod (1st–6th generation iPod Nano or 2nd/3rd gen iPod Touch) and an enabled piece of cardio equipment. As of March 2009, the seven largest commercial equipment providers were shipping enabled equipment (Life Fitness, Technogym, Precor USA, Star Trac, Cybex International, Matrix Fitness and Free Motion). The models of compatible cardio equipment include treadmills, stationary bicycles, stair climbers, ellipticals, and others such as Precor's Adaptive Motion Trainer. Once the user syncs an iPod with iTunes, the cardio workouts are automatically stored at Nikeplus.com, where each workout is visualized and tracked based on the number of calories burned. The calories are converted to "CardioMiles", at a ratio of 100:1, allowing cardio users to take full advantage of all the tools and features of Nikeplus.com, and allow them to engage in challenges with other runners, walkers and cardio users, using a common currency. With the release of the second-generation iPod Touch in 2008, Apple Inc. included a built-in ability to receive Nike+ signals, which allowed the iPod to connect directly to the wireless sensor thus eliminating the need for an external receiver to be connected. Apple also added this capability to the iPhone 3GS (released 2009), iPhone 4 (2010), and third-generation iPod Touch (2009). Those devices use their Broadcom Bluetooth chipset to receive the signals. On June 7, 2010, Polar and Nike introduced the Polar WearLink+ that works with Nike+. This new product works with the Nike+ SportBand and the fifth generation iPod nano in conjunction with the Nike+ iPod Sport Kit. Polar WearLink+ that works with Nike+ communicates directly with the fifth generation iPod nano and Nike+ SportBand using a proprietary digital protocol but it is dual-mode so it is also compatible with most Polar training computers (all those using 5 kHz analog transmission technology). Nike+ had 18 million global users as of April 2013. One year later, Nike updated the number of global users to 28 million. In iOS 6.1.2 (and possibly higher), a hole in the compatibility for the app has allowed jailbroken iPad users to use the native Nike + iPod iPhone and iPod app by moving the app bundle and setting permissions for the app. On April 30, 2018, Nike retired services for legacy Nike wearable devices, such as the Nike+ FuelBand and the Nike+ SportWatch GPS, and previous versions of apps, including Nike Run Club and Nike Training Club version 4.X and lower. Likewise, Nike no longer supported the Nike+ Connect software that transferred data to a NikePlus Profile or the Nike+ Fuel/FuelBand and Nike+ Move apps. == Sports kit equipment == The kit consists of two pieces: a piezoelectric sensor with a Nordic Semiconductor nRF2402 transmitter that is mounted under the inner sole of the shoe and a receiver that connects to the iPod. They communicate using a 2.4 GHz wireless radio and use Nordic Semiconductor's "ShockBurst" network protocol. The wireless data is encrypted in transit, but some uniquely identifying data is sent in the plain. The wireless protocol was reverse engineered and documented by Dmitry Grinberg in 2011. Nike recommends that the shoe be a Nike+ model with a special pocket in which to place the device. Nike has released the sensor for individual sale meaning that consumers no longer have to purchase the whole set (the iPod receiver and sensor). As the sensor battery cannot be replaced, a new one must be purchased every time the battery runs out. Aftermarket solutions are available to users who do not want to use shoes with built-in or hand-made pockets for the foot sensor, such as shoe pouches and containment devices designed to affix the sensor against the shoe laces. No matter how the sensor is integrated with the user's shoes, care must be taken that it is firmly fixed in place and will not jerk around while in use, which would degrade the accuracy. == Sports kit usage == The Sports Kit can be used to track running, which it refers to as "workouts". New workouts are started by plugging the receiving unit into the iPod, then navigating through the iPod menu system. The user chooses a goal for the workout, which might be to cover a specific distance, or burn a number of calories, or work out for a specified time. A workout can also be started without a goal, which is called a "Basic Workout". When the workout goal has been set, the receiver seeks the sensor, possibly asking the user to "walk around to activate [the] sensor". The user then must press the center button on the iPod to begin the workout. Audio feedback is provided in the user's choice of generic male or female voice by the iPod over the course of the workout, depending on the type of workout chosen. For goal-oriented workouts, the feedback will correspond to significant milestones toward the goal. In a distance workout, for example, the audio feedback will inform the user as each mile or kilometer has been completed, as well as the half-way point of the workout, and a countdown of four 100-meter increments at the end of the workout. The iPod's control wheel functions change slightly during a workout. The Pause button now not only pauses the music but also the workout. Similarly, the Menu button is used to access the controls to end the workout. The Forward and Back buttons are unchanged, performing audio track skip and reverse functions. The Center button has two functions: audio feedback about the current distance, time, and pace are provided when the button is tapped once, while if the button is held down the iPod skips to the "PowerSong" - an audio track chosen by the user, generally intended for motivation. In addition to the in-workout audio feedback, there are pre-recorded congratulations provided by Lance Armstrong, Tiger Woods, Joan Benoit Samuelson, and Paula Radcliffe whenever a user achieves a personal best (such as fastest mile, fastest 5K, fastest 10K, longest run yet) or reaches certain long-term milestones (such as 250 miles, 500 kilometers). This "celebrity feedback" is heard after the usual end-of-run statistics. While the Sports Kit can be used immediately after purchase, it will report more accurate results if it is calibrated before the first usage and then regularly afterwards. For calibration, the user finds a fixed known distance of at least 0.25 mile or 400 meters and then sets the Nike+ to calibration mode for the walk or run over that distance. When the walk or run is complete, the device calibrates itself and future workout reporting will reflect statistics closer to that individual user's workout style. Consumer Reports magazine tested the device and found it accurate as long as you keep an even pace. In workouts with varied pa

    Read more →
  • Enterprise Objects Framework

    Enterprise Objects Framework

    The Enterprise Objects Framework, or simply EOF, was introduced by NeXT in 1994 as a pioneering object-relational mapping product for its NeXTSTEP and OpenStep development platforms. EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code. EOF enjoyed some niche success in the mid-1990s among financial institutions who were attracted to the rapid application development advantages of NeXT's object-oriented platform. Since Apple Inc's merger with NeXT in 1996, EOF has evolved into a fully integrated part of WebObjects, an application server also originally from NeXT. Many of the core concepts of EOF re-emerged as part of Core Data, which further abstracts the underlying data formats to allow it to be based on non-SQL stores. == History == In the early 1990s NeXT Computer recognized that connecting to databases was essential to most businesses and yet also potentially complex. Every data source has a different data-access language (or API), driving up the costs to learn and use each vendor's product. The NeXT engineers wanted to apply the advantages of object-oriented programming, by getting objects to "talk" to relational databases. As the two technologies are very different, the solution was to create an abstraction layer, insulating developers from writing the low-level procedural code (SQL) specific to each data source. The first attempt came in 1992 with the release of Database Kit (DBKit), which wrapped an object-oriented framework around any database. Unfortunately, NEXTSTEP at the time was not powerful enough and DBKit had serious design flaws. NeXT's second attempt came in 1994 with the Enterprise Objects Framework (EOF) version 1, a complete rewrite that was far more modular and OpenStep compatible. EOF 1.0 was the first product released by NeXT using the Foundation Kit and introduced autoreleased objects to the developer community. The development team at the time was only four people: Jack Greenfield, Rich Williamson, Linus Upson and Dan Willhite. EOF 2.0, released in late 1995, further refined the architecture, introducing the editing context. At that point, the development team consisted of Dan Willhite, Craig Federighi, Eric Noyau and Charly Kleissner. EOF achieved a modest level of popularity in the financial programming community in the mid-1990s, but it would come into its own with the emergence of the World Wide Web and the concept of web applications. It was clear that EOF could help companies plug their legacy databases into the Web without any rewriting of that data. With the addition of frameworks to do state management, load balancing and dynamic HTML generation, NeXT was able to launch the first object-oriented Web application server, WebObjects, in 1996, with EOF at its core. In 2000, Apple Inc. (which had merged with NeXT) officially dropped EOF as a standalone product, meaning that developers would be unable to use it to create desktop applications for the forthcoming Mac OS X. It would, however, continue to be an integral part of a major new release of WebObjects. WebObjects 5, released in 2001, was significant for the fact that its frameworks had been ported from their native Objective-C programming language to the Java language. Critics of this change argue that most of the power of EOF was a side effect of its Objective-C roots, and that EOF lost the beauty or simplicity it once had. Third-party tools, such as EOGenerator, help fill the deficiencies introduced by Java (mainly due to the loss of categories). The Objective-C code base was re-introduced with some modifications to desktop application developers as Core Data, part of Apple's Cocoa API, with the release of Mac OS X Tiger in April 2005. == How EOF works == Enterprise Objects provides tools and frameworks for object-relational mapping. The technology specializes in providing mechanisms to retrieve data from various data sources, such as relational databases via JDBC and JNDI directories, and mechanisms to commit data back to those data sources. These mechanisms are designed in a layered, abstract approach that allows developers to think about data retrieval and commitment at a higher level than a specific data source or data source vendor. Central to this mapping is a model file (an "EOModel") that you build with a visual tool — either EOModeler, or the EOModeler plug-in to Xcode. The mapping works as follows: Database tables are mapped to classes. Database columns are mapped to class attributes. Database rows are mapped to objects (or class instances). You can build data models based on existing data sources or you can build data models from scratch, which you then use to create data structures (tables, columns, joins) in a data source. The result is that database records can be transposed into Java objects. The advantage of using data models is that applications are isolated from the idiosyncrasies of the data sources they access. This separation of an application's business logic from database logic allows developers to change the database an application accesses without needing to change the application. EOF provides a level of database transparency not seen in other tools and allows the same model to be used to access different vendor databases and even allows relationships across different vendor databases without changing source code. Its power comes from exposing the underlying data sources as managed graphs of persistent objects. In simple terms, this means that it organizes the application's model layer into a set of defined in-memory data objects. It then tracks changes to these objects and can reverse those changes on demand, such as when a user performs an undo command. Then, when it is time to save changes to the application's data, it archives the objects to the underlying data sources. === Using Inheritance === In designing Enterprise Objects developers can leverage the object-oriented feature known as inheritance. A Customer object and an Employee object, for example, might both inherit certain characteristics from a more generic Person object, such as name, address, and phone number. While this kind of thinking is inherent in object-oriented design, relational databases have no explicit support for inheritance. However, using Enterprise Objects, you can build data models that reflect object hierarchies. That is, you can design database tables to support inheritance by also designing enterprise objects that map to multiple tables or particular views of a database table. == Enterprise Objects (EOs) == An Enterprise Object is analogous to what is often known in object-oriented programming as a business object — a class which models a physical or conceptual object in the business domain (e.g. a customer, an order, an item, etc.). What makes an EO different from other objects is that its instance data maps to a data store. Typically, an enterprise object contains key-value pairs that represent a row in a relational database. The key is basically the column name, and the value is what was in that row in the database. So it can be said that an EO's properties persist beyond the life of any particular running application. More precisely, an Enterprise Object is an instance of a class that implements the com.webobjects.eocontrol.EOEnterpriseObject interface. An Enterprise Object has a corresponding model (called an EOModel) that defines the mapping between the class's object model and the database schema. However, an enterprise object doesn't explicitly know about its model. This level of abstraction means that database vendors can be switched without it affecting the developer's code. This gives Enterprise Objects a high degree of reusability. == EOF and Core Data == Despite their common origins, the two technologies diverged, with each technology retaining a subset of the features of the original Objective-C code base, while adding some new features. === Features Supported Only by EOF === EOF supports custom SQL; shared editing contexts; nested editing contexts; and pre-fetching and batch faulting of relationships, all features of the original Objective-C implementation not supported by Core Data. Core Data also does not provide the equivalent of an EOModelGroup—the NSManagedObjectModel class provides methods for merging models from existing models, and for retrieving merged models from bundles. === Features Supported Only by Core Data === Core Data supports fetched properties; multiple configurations within a managed object model; local stores; and store aggregation (the data for a given entity may be spread across multiple stores); customization and localization of property names and validation warnings; and the use of predicates for property validation. These features of the original Objective-C implementation are not supported by the Java implementation.

    Read more →
  • Subject indexing

    Subject indexing

    Subject indexing is the act of describing or classifying a document by index terms, keywords, or other symbols in order to indicate what different documents are about, to summarize their contents or to increase findability. In other words, the objective is to identify and describe the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document, such as a book; objects in a collection, such as a library; and documents (such as books and articles) within a field of knowledge. Subject indexing is used in information retrieval especially to create bibliographic indexes to retrieve documents on a particular subject. Examples of academic indexing services are Zentralblatt MATH, Chemical Abstracts, and PubMed. The index terms were mostly assigned by experts but author keywords are also common. The process of indexing begins with any analysis of the subject of the document. The indexer must then identify terms that appropriately identify the subject, either by extracting words directly from the document or assigning words from a controlled vocabulary. The terms in the index are then presented in a systematic order. Indexers must decide how many terms to include and how specific the terms should be. Together this gives a depth of indexing. == Subject analysis == The first step in indexing is to decide on the subject matter of the document. In manual indexing, the indexer would consider the subject matter in terms of answer to a set of questions such as "Does the document deal with a specific product, condition or phenomenon?". As the analysis is influenced by the knowledge and experience of the indexer, it follows that two indexers may analyze the content differently and so come up with different index terms. This will impact on the success of retrieval. === Automatic vs. manual subject analysis === Automatic indexing follows set processes of analyzing frequencies of word patterns and comparing results to other documents in order to assign to subject categories. This requires no understanding of the material being indexed. This leads to more uniform indexing but at the expense of the true meaning being interpreted. A computer program will not understand the meaning of statements and may therefore fail to assign some relevant terms or assign incorrectly. Human indexers focus their attention on certain parts of the document such as the title, abstract, summary and conclusions, as analyzing the full text in depth is costly and time-consuming. An automated system takes away the time limit and allows the entire document to be analyzed, but also has the option to be directed to particular parts of the document. == Term selection == The second stage of indexing involves the translation of the subject analysis into a set of index terms. This can involve extracting from the document or assigning from a controlled vocabulary. With the ability to conduct a full text search widely available, many people have come to rely on their own expertise in conducting information searches and full text search has become very popular. Subject indexing and its experts, professional indexers, catalogers, and librarians, remains crucial to information organization and retrieval. These experts understand controlled vocabularies and are able to find information that cannot be located by full text search. The cost of expert analysis to create subject indexing is not easily compared to the cost of hardware, software and labor to manufacture a comparable set of full-text, fully searchable materials. With new web applications that allow every user to annotate documents, social tagging has gained popularity especially in the Web. One application of indexing, the book index, remains relatively unchanged despite the information revolution. === Extraction/Derived indexing === Extraction indexing involves taking words directly from the document. It uses natural language and lends itself well to automated techniques where word frequencies are calculated and those with a frequency over a pre-determined threshold are used as index terms. A stop-list containing common words (such as "the", "and") would be referred to and such stop words would be excluded as index terms. Automated extraction indexing may lead to loss of meaning of terms by indexing single words as opposed to phrases. Although it is possible to extract commonly occurring phrases, it becomes more difficult if key concepts are inconsistently worded in phrases. Automated extraction indexing also has the problem that, even with use of a stop-list to remove common words, some frequent words may not be useful for allowing discrimination between documents. For example, the term glucose is likely to occur frequently in any document related to diabetes. Therefore, use of this term would likely return most or all the documents in the database. Post-coordinated indexing where terms are combined at the time of searching would reduce this effect but the onus would be on the searcher to link appropriate terms as opposed to the information professional. In addition terms that occur infrequently may be highly significant for example a new drug may be mentioned infrequently but the novelty of the subject makes any reference significant. One method for allowing rarer terms to be included and common words to be excluded by automated techniques would be a relative frequency approach where frequency of a word in a document is compared to frequency in the database as a whole. Therefore, a term that occurs more often in a document than might be expected based on the rest of the database could then be used as an index term, and terms that occur equally frequently throughout will be excluded. Another problem with automated extraction is that it does not recognize when a concept is discussed but is not identified in the text by an indexable keyword. Since this process is based on simple string matching and involves no intellectual analysis, the resulting product is more appropriately known as a concordance than an index. === Assignment indexing === An alternative is assignment indexing where index terms are taken from a controlled vocabulary. This has the advantage of controlling for synonyms as the preferred term is indexed and synonyms or related terms direct the user to the preferred term. This means the user can find articles regardless of the specific term used by the author and saves the user from having to know and check all possible synonyms. It also removes any confusion caused by homographs by inclusion of a qualifying term. A third advantage is that it allows the linking of related terms whether they are linked by hierarchy or association, e.g. an index entry for an oral medication may list other oral medications as related terms on the same level of the hierarchy but would also link to broader terms such as treatment. Assignment indexing is used in manual indexing to improve inter-indexer consistency as different indexers will have a controlled set of terms to choose from. Controlled vocabularies do not completely remove inconsistencies as two indexers may still interpret the subject differently. == Index presentation == The final phase of indexing is to present the entries in a systematic order. This may involve linking entries. In a pre-coordinated index the indexer determines the order in which terms are linked in an entry by considering how a user may formulate their search. In a post-coordinated index, the entries are presented singly and the user can link the entries through searches, most commonly carried out by computer software. Post-coordination results in a loss of precision in comparison to pre-coordination. == Depth of indexing == Indexers must make decisions about what entries should be included and how many entries an index should incorporate. The depth of indexing describes the thoroughness of the indexing process with reference to exhaustivity and specificity. === Exhaustivity === An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives a higher recall, or more likelihood of all the relevant articles being retrieved, however, this occurs at the expense of precision. This means that the user may retrieve a larger number of irrelevant documents or documents which only deal with the subject in little depth. In a manual system a greater level of exhaustivity brings with it a greater cost as more man-hours are required. The additional time taken in an automated system would be much less significant. At the other end of the scale, in a selective index only the most important aspects are covered. Recall is reduced in a selective index as if an indexer does not include enough terms, a highly relevant article may be overlooked. Therefore, indexers should strive for a balance and consider what the document may be used. They may also have to consider the implications of time and expense. === Specificity === The specificity describes how closel

    Read more →
  • Feature hashing

    Feature hashing

    In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma

    Read more →
  • AI-driven design automation

    AI-driven design automation

    AI-driven design automation is the use of artificial intelligence (AI) to automate and improve different parts of the electronic design automation (EDA) process. It is particularly important in the design of integrated circuits (chips) and complex electronic systems, where it can potentially increase productivity, decrease costs, and speed up design cycles. AI Driven Design Automation uses several methods, including machine learning, expert systems, and reinforcement learning. These are used for many tasks, from planning a chip's architecture and logic synthesis to its physical design and final verification. == History == === 1980s–1990s: Expert systems and early experiments === The use of AI for design automation originated in the 1980s and 1990s, mainly with the creation of expert systems. These systems tried to capture the knowledge and practical rules used by human design experts, and used these rules, along with reasoning engines, to direct the design process. A notable early project was the ULYSSES system from Carnegie Mellon University. ULYSSES was a CAD tool integration environment that let expert designers turn their design methods into scripts that could be run automatically. It treated design tools as sources of knowledge that a scheduler could manage. Another example was the ADAM (Advanced Design AutoMation) system at the University of Southern California, which used an expert system called the Design Planning Engine. This engine figured out design strategies on the fly and handled different design jobs by organizing specialized knowledge into structured formats called frames. Other systems like DAA (Design Automation Assistant) used a rule-based approach for specific jobs, such as register transfer level (RTL) design for systems like the IBM 370. Researchers at Carnegie Mellon University also created TALIB, an expert system for mask layout that used over 1200 rules, and EMUCS/DAA for CPU architectural design which used about 70 rules. These projects showed that AI worked better for problems where relatively few rules were required to describe much larger amounts of data. At the same time, there was a surge of tools called silicon compilers like MacPitts, Arsenic, and Palladio. They used algorithms and search techniques to explore different design paradigms. This was another way to automate design, even if it was not always based on expert systems. Early tests with neural networks in VLSI design also happened during this time, although they were not as common as systems based on rules. === 2000s: Introduction of machine learning === In the 2000s, interest in AI for design automation increased. This was mostly because of better machine learning (ML) algorithms and more available data from design and manufacturing. For example, they were used to model and reduce the effects of small manufacturing differences in semiconductor devices. This became very important as the size of components on chips became smaller. The large amount of data created during chip design provided the foundation needed to train smarter ML models. This allowed for predicting outcomes and optimizing in areas that were hard to automate before. === 2016–2020: Reinforcement learning and large scale initiatives === A major turning point happened in the mid to late 2010s, sparked by successes in other areas of AI. The success of DeepMind's AlphaGo in mastering the game of Go inspired researchers. They began to apply reinforcement learning (RL) to difficult EDA problems. These problems often require searching through many options and making a series of decisions. In 2018, the U.S. DARPA started the Intelligent Design of Electronic Assets (IDEA) program. A main goal of IDEA was to create a fully automated layout generator that required no human intervention, able to produce a chip design ready for manufacturing from RTL specifications in 24 hours. Another big initiative was the OpenROAD project, a large effort under IDEA led by UC San Diego with industry and university partners, aimed to build an open source, independent toolchain. It used machine learning, parallelization and divide and conquer approaches. A much-publicized but controversial demonstration of RL's potential came from Google researchers between 2020 and 2021. They created a deep reinforcement learning method for planning the layout of a chip, known as floorplanning. They reported that this method created layouts that were as good as or better than those made by human experts, and it did so in less than six hours. This method used a type of network called a graph convolutional neural network. It showed that it could learn general patterns that could be applied to new problems, getting better as it saw more chip designs. The technology was later used to design Google's Tensor Processing Unit (TPU) accelerators. However, in the original paper, the improvement (if any) from AI was not demonstrated. There was no comparison with existing non-AI tools performing the same task, and since the data is proprietary, no ability for anyone else to perform this comparison. Various efforts to reproduce the AI algorithm, and compare its results with various commercial and academic tools, have yielded mixed results with no conclusive advantage to AI. === 2020s: Autonomous systems and agents === Entering the 2020s, the industry saw the commercial launch of autonomous AI driven EDA systems. For example, Synopsys launched DSO.ai (Design Space Optimization AI) in early 2020, calling it the first autonomous artificial intelligence application for chip design in the industry. This system uses reinforcement learning to search for the best ways to optimize a design within the huge number of possible solutions, trying to improve power, performance, and area (PPA). By 2023, DSO.ai had been used to produce over 100 commercial chips, showing mainstream adoption. Synopsys later grew its AI tools into a suite called Synopsys.ai. The goal was to use AI in the entire EDA workflow, including verification and testing. These advancements, which combine modern AI methods with cloud computing and large data resources, have led to talks about a new phase in EDA. Industry experts and participants sometimes call this 'EDA 4.0'. This new era is defined by the widespread use of AI and machine learning to deal with growing design complexity, automate more of the design process, and help engineers handle the huge amounts of data that EDA tools create. The purpose of EDA 4.0 is to optimize product performance, get products to market faster and make development and manufacturing smoother through intelligent automation. == Applications == Artificial intelligence (AI) is now used in many stages of the electronic design workflow. It aims to improve productivity, get better results, and handle the growing complexity of modern integrated circuits. AI helps designers from the very first ideas about architecture all the way to manufacturing and testing. === High level synthesis and architectural exploration === In the first phases of chip design, AI helps with High Level Synthesis (HLS) and exploring different system level design options (DSE). These processes are key for turning general ideas into detailed hardware plans. AI algorithms, often using supervised learning, are used to build simpler, substitute models. These models can quickly guess important design measurements like area, performance, and power for many different architectural options or HLS settings. For example, the Ithemal tool uses deep neural networks to estimate how fast basic code blocks will run, which helps in making processor architecture decisions. Similarly, PRIMAL uses machine learning estimate power use at the register transfer level (RTL), giving early information about how much power the chip will use. Reinforcement learning (RL) and Bayesian optimization are also used to guide the DSE process. They help search through the many parameters to find the best HLS settings or architectural details like cache sizes. LLMs are also being tested for creating architectural plans or initial C code for HLS, as seen with GPT4AIGChip. === Logic synthesis and optimization === Logic synthesis starts from a high level hardware description and generates an optimized list of electronic gates, known as a gate level netlist, that is ready for placement, routing, and then construction in a specific manufacturing process. AI methods help with different parts of this process, including logic optimization, technology mapping, and making improvements after mapping. Supervised learning, especially with Graph Neural Networks (GNNs), is good at handling data or problems that can be represented as graphs. Since circuit diagrams are instances of directed graphs, supervised learning can help create models that predict design properties like power or error rates in circuits. In logic synthesis and optimization reinforcement learning is used to perform logic optimization directly. In some cases ag

    Read more →
  • Known-item search

    Known-item search

    Known-item search is a specialization of information exploration which represents the activities carried out by searchers who have a particular item in mind. In the context of library catalogs, known‐item search means a search for an item for which the author or title is known. Although the concept of known-item search originated in library science, it is now applied in the context of web search and other online search activities. Known-item search is distinguished from exploratory search, in which a searcher is unfamiliar with the domain of their search goal, unsure about the ways to achieve their goal, and/or unsure about what their goal is.

    Read more →
  • Transaction data

    Transaction data

    Transaction data or transaction information is a category of data describing transactions. Transaction data/information gather variables generally referring to reference data or master data – e.g. dates, times, time zones, currencies. Typical transactions are: Financial transactions about orders, invoices, payments; Work transactions about plans, activity records; Logistic transactions about deliveries, storage records, travel records, etc.. == Management == Recording and storing transactions is called records management. The record of the transaction is stored in a place where the retention can be guaranteed and where data is archived or removed following a retention period. Formats of recorded transactions can be digital data in databases and spreadsheets, or handwritten texts in physical documents like former bankbooks. Transaction processing systems are application software that generate transactions and manage transaction data/information, e.g. SAP and Oracle Financials. == Data warehousing == Transaction data can be summarised in a data warehouse, which helps accessibility and analysis of the data.

    Read more →
  • Rademacher complexity

    Rademacher complexity

    In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of sets with respect to a probability distribution. The concept can also be extended to real valued functions. == Definitions == === Rademacher complexity of a set === Given a set A ⊆ R m {\displaystyle A\subseteq \mathbb {R} ^{m}} , the Rademacher complexity of A is defined as follows: Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]} where σ 1 , σ 2 , … , σ m {\displaystyle \sigma _{1},\sigma _{2},\dots ,\sigma _{m}} are independent random variables drawn from the Rademacher distribution i.e. Pr ( σ i = + 1 ) = Pr ( σ i = − 1 ) = 1 / 2 {\displaystyle \Pr(\sigma _{i}=+1)=\Pr(\sigma _{i}=-1)=1/2} for i ∈ { 1 , 2 , … , m } {\displaystyle i\in \{1,2,\dots ,m\}} , and a = ( a 1 , … , a m ) ∈ A {\displaystyle a=(a_{1},\ldots ,a_{m})\in A} . Some authors take the absolute value of the sum before taking the supremum, but if A {\displaystyle A} is symmetric this makes no difference. === Rademacher complexity of a function class === Let S = { z 1 , z 2 , … , z m } ⊆ Z {\displaystyle S=\{z_{1},z_{2},\dots ,z_{m}\}\subseteq Z} be a sample of points and consider a function class F {\displaystyle {\mathcal {F}}} of real-valued functions over Z {\displaystyle Z} . Then, the empirical Rademacher complexity of F {\displaystyle {\mathcal {F}}} given S {\displaystyle S} is defined as: Rad S ⁡ ( F ) = 1 m E σ [ sup f ∈ F | ∑ i = 1 m σ i f ( z i ) | ] {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{f\in {\mathcal {F}}}\left|\sum _{i=1}^{m}\sigma _{i}f(z_{i})\right|\right]} This can also be written using the previous definition: Rad S ⁡ ( F ) = Rad ⁡ ( F ∘ S ) {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})=\operatorname {Rad} ({\mathcal {F}}\circ S)} where F ∘ S {\displaystyle {\mathcal {F}}\circ S} denotes function composition, i.e.: F ∘ S := { ( f ( z 1 ) , … , f ( z m ) ) ∣ f ∈ F } {\displaystyle {\mathcal {F}}\circ S:=\{(f(z_{1}),\ldots ,f(z_{m}))\mid f\in {\mathcal {F}}\}} The worst case empirical Rademacher complexity is Rad ¯ m ( F ) = sup S = { z 1 , … , z m } Rad S ⁡ ( F ) {\displaystyle {\overline {\operatorname {Rad} }}_{m}({\mathcal {F}})=\sup _{S=\{z_{1},\dots ,z_{m}\}}\operatorname {Rad} _{S}({\mathcal {F}})} Let P {\displaystyle P} be a probability distribution over Z {\displaystyle Z} . The Rademacher complexity of the function class F {\displaystyle {\mathcal {F}}} with respect to P {\displaystyle P} for sample size m {\displaystyle m} is: Rad P , m ⁡ ( F ) := E S ∼ P m [ Rad S ⁡ ( F ) ] {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}}):=\mathbb {E} _{S\sim P^{m}}\left[\operatorname {Rad} _{S}({\mathcal {F}})\right]} where the above expectation is taken over an identically independently distributed (i.i.d.) sample S = ( z 1 , z 2 , … , z m ) {\displaystyle S=(z_{1},z_{2},\dots ,z_{m})} generated according to P {\displaystyle P} . == Intuition == The Rademacher complexity is typically applied on a function class of models that are used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings. When the function class is rich enough, it contains functions that can appropriately adapt for each arrangement of labels, simulated by the random draw of σ i {\displaystyle \sigma _{i}} under the expectation, so that this quantity in the sum is maximized. The Rademacher complexity of a set A {\displaystyle A} can be rewritten as Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] = 1 m 2 m ∑ σ ∈ { − 1 / m , + 1 / m } m [ sup a ∈ A ⟨ σ , a ⟩ ] . {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]={\frac {1}{{\sqrt {m}}2^{m}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}}\left[\sup _{a\in A}\langle \sigma ,a\rangle \right].} Each term in the summation is the farthest distance of the set A {\displaystyle A} from the origin, along a unit-length direction σ {\displaystyle \sigma } . The directions are along the vertices of a hypercube. Thus, we can also write it as Rad ⁡ ( A ) = 1 2 m 1 2 m − 1 ∑ σ ∈ { − 1 / m , + 1 / m } m / { − 1 , + 1 } [ sup a ∈ A ⟨ σ , a ⟩ − inf a ∈ A ⟨ σ , a ⟩ ] {\displaystyle \operatorname {Rad} (A)={\frac {1}{2{\sqrt {m}}}}{\frac {1}{2^{m-1}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}}\left[\sup _{a\in A}\langle \sigma ,a\rangle -\inf _{a\in A}\langle \sigma ,a\rangle \right]} Here, the set { − 1 / m , + 1 / m } m / { − 1 , + 1 } {\displaystyle \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}} denotes half of the vertices of a hypercube, selected so that each diagonal has exactly one vertex selected. In words, this states that 2 m Rad ⁡ ( A ) {\displaystyle 2{\sqrt {m}}\operatorname {Rad} (A)} is precisely the average width of the set A {\displaystyle A} along all diagonal directions of a hypercube. == Examples == A singleton set has 0 width in any direction, so it has Rademacher complexity 0. The set A = { ( 1 , 1 ) , ( 1 , 2 ) } ⊆ R 2 {\displaystyle A=\{(1,1),(1,2)\}\subseteq \mathbb {R} ^{2}} has average width 1 / 2 {\displaystyle 1/{\sqrt {2}}} along the two diagonal directions of the square, so it has Rademacher complexity 1 / 4 {\displaystyle 1/4} . The unit cube [ 0 , 1 ] m {\displaystyle [0,1]^{m}} has constant width m {\displaystyle {\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / 2 {\displaystyle 1/2} . Similarly, the unit cross-polytope { x ∈ R m : ‖ x ‖ 1 ≤ 1 } {\displaystyle \{x\in \mathbb {R} ^{m}:\|x\|_{1}\leq 1\}} has constant width 2 / m {\displaystyle 2/{\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / m {\displaystyle 1/m} . == Using the Rademacher complexity == The Rademacher complexity can be used to derive data-dependent upper-bounds on the learnability of function classes. Intuitively, a function-class with smaller Rademacher complexity is easier to learn. === Bounding the representativeness === In machine learning, it is desired to have a training set that represents the true distribution of some sample data S {\displaystyle S} . This can be quantified using the notion of representativeness. Denote by P {\displaystyle P} the probability distribution from which the samples are drawn. Denote by H {\displaystyle H} the set of hypotheses (potential classifiers) and denote by F {\displaystyle {\mathcal {F}}} the corresponding set of error functions, i.e., for every hypothesis h ∈ H {\displaystyle h\in H} , there is a function f h ∈ F {\displaystyle f_{h}\in F} , that maps each training sample (features,label) to the error of the classifier h {\displaystyle h} (note in this case hypothesis and classifier are used interchangeably). For example, in the case that h {\displaystyle h} represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function f h {\displaystyle f_{h}} returns 0 if h {\displaystyle h} correctly classifies a sample and 1 else. We omit the index and write f {\displaystyle f} instead of f h {\displaystyle f_{h}} when the underlying hypothesis is irrelevant. Define: L P ( f ) := E z ∼ P [ f ( z ) ] {\displaystyle L_{P}(f):=\mathbb {E} _{z\sim P}[f(z)]} – the expected error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the real distribution P {\displaystyle P} ; L S ( f ) := 1 m ∑ i = 1 m f ( z i ) {\displaystyle L_{S}(f):={1 \over m}\sum _{i=1}^{m}f(z_{i})} – the estimated error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the sample S {\displaystyle S} . The representativeness of the sample S {\displaystyle S} , with respect to P {\displaystyle P} and F {\displaystyle {\mathcal {F}}} , is defined as: Rep P ⁡ ( F , S ) := sup f ∈ F ( L P ( f ) − L S ( f ) ) {\displaystyle \operatorname {Rep} _{P}({\mathcal {F}},S):=\sup _{f\in F}(L_{P}(f)-L_{S}(f))} Smaller representativeness is better, since it provides a way to avoid overfitting: it means that the true error of a classifier is not much higher than its estimated error, and so selecting a classifier that has low estimated error will ensure that the true error is also low. Note however that the concept of representativeness is relative and hence can not be compared across distinct samples. The expected representativeness of a sample can be bounded above by the Rademacher complexity of the function class: If F {\displaystyle {\mathcal {F}}} is a set of functions with range within [ 0 , 1 ] {\displaystyle [0,1]} , then Rad P , m ⁡ ( F ) − ln ⁡ 2 2 m ≤ E S ∼ P m [ Rep P ⁡ ( F , S ) ] ≤ 2 Rad P , m ⁡ ( F ) {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}})-{\sqrt {\frac {\ln 2}{2m}}}\leq \mathbb {E} _{S\sim P^{m}}[\operatorname {Rep} _{P}({\

    Read more →
  • Algorithmic Puzzles

    Algorithmic Puzzles

    Algorithmic Puzzles is a book of puzzles based on computational thinking. It was written by computer scientists Anany and Maria Levitin, and published in 2011 by Oxford University Press. == Topics == The book begins with a "tutorial" introducing classical algorithm design techniques including backtracking, divide-and-conquer algorithms, and dynamic programming, methods for the analysis of algorithms, and their application in example puzzles. The puzzles themselves are grouped into three sets of 50 puzzles, in increasing order of difficulty. A final two chapters provide brief hints and more detailed solutions to the puzzles, with the solutions forming the majority of pages of the book. Some of the puzzles are well known classics, some are variations of known puzzles making them more algorithmic, and some are new. They include: Puzzles involving chessboards, including the eight queens puzzle, knight's tours, and the mutilated chessboard problem Balance puzzles River crossing puzzles The Tower of Hanoi Finding the missing element in a data stream The geometric median problem for Manhattan distance == Audience and reception == The puzzles in the book cover a wide range of difficulty, and in general do not require more than a high school level of mathematical background. William Gasarch notes that grouping the puzzles only by their difficulty and not by their themes is actually an advantage, as it provides readers with fewer clues about their solutions. Reviewer Narayanan Narayanan recommends the book to any puzzle aficionado, or to anyone who wants to develop their powers of algorithmic thinking. Reviewer Martin Griffiths suggests another group of readers, schoolteachers and university instructors in search of examples to illustrate the power of algorithmic thinking. Gasarch recommends the book to any computer scientist, evaluating it as "a delight".

    Read more →
  • Australian Geoscience Data Cube

    Australian Geoscience Data Cube

    The Australian Geoscience Data Cube (AGDC) is an approach to storing, processing and analyzing large collections of Earth observation data. The technology is designed to meet challenges of national interest by being agile and flexible with vast amounts of layered grid data. The AGDC reduces processing time of traditional image analysis by calibrating, pre-computing known extents, pixel alignment and storing metadata in a cell lattice structure. The temporal-pixel aligned data can often be analysed faster across space and time dimensions than previous scene based techniques. This allows the AGDC to be flexible in tackling future challenges and improve analysis times on every-increasing data repositories of earth observation. The AGDC has also been used internationally to allow countries to maintain ecologically sustainable programs and reduce the difficulty curve of utilizing Remote Sensing data. == Background == The AGDC was originally conceived by Geoscience Australia but is now maintained in a partnership between Geoscience Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO) and National Computational Infrastructure National Facility (Australia) (NCI). This is made possible by the funding from the partnership and a number of organisations such as National Collaborative Research Infrastructure Strategy (NCRIS). == Analysis ready data, ingestion and indexing == The data processed in the cube is made analysis ready before being ingested and indexed into the AGDC. Analysis ready data is pre-processed data that has applied corrections for instrument calibration (gains and offsets), geolocation (spatial alignment) and radiometry (solar illumination, incidence angle, topography, atmospheric interference). The ingestion process manages the translation of datasets into the storage units while maintaining a database index. The data within the storage and index can be accessed via API calls often compiled within code such as Python (programming language). Example: s2a_l1c = dc.load(product='s2a_level1c_granule',x=(147.36, 147.41), y=(-35.1, -35.15), measurements=['04','03','02'], output_crs='EPSG:4326', resolution=(-0.00025,0.00025)) === Datasets currently stored === Geoscience Australia Landsat Surface Reflectance (1987 to present) Landsat Pixel Quality Landsat Fractional Cover Landsat NDVI === Datasets that have been piloted === USGS Landsat Surface Reflectance SRTM DEM Himawari 8 MODIS Sentinel-2 L1C / S2A Australian Gridded Climate Data == Open source == The AGDC code base is situated in GitHub as an open repository. The core code base moved to the Open Data Cube in early 2017 as part of an international collaboration. Whilst the code base is the Open Data Cube, individual cubes exist as their own right such as the AGDC on the National Computational Infrastructure National Facility (Australia) (NCI) using the High-Performance Computing Cluster HPCC. The core code can be installed on personal computers or public computers (using git) and has many unit tests. Documentation for the code base exists on Read the Docs. == Challenges of the AGDC == The AGDC is designed to meet nationally significant challenges such as the following. Sustainability Environment Water resource management Disaster assist Policy development Community planning Forest preservation Carbon measurement == International awards == The AGDC won the 2016 Content Platform of the Year award from Geospatial World Forum.

    Read more →