AI For Business For Dummies

AI For Business For Dummies — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Neural processing unit

    Neural processing unit

    A neural processing unit (NPU), also known as an AI accelerator or deep learning processor, is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and computer vision. == Use == Their purpose is either to efficiently execute already trained AI models (inference) or to train AI models. NPUs can be more efficient in terms of speed or power consumption. NPU applications include algorithms for robotics, Internet of things, and data-intensive or sensor-driven tasks. They are often manycore or spatial designs and focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability. As of 2024, a widely used datacenter-grade AI integrated circuit chip, the Nvidia H100 GPU, contains tens of billions of MOSFETs. === Consumer devices === AI accelerators are used in Apple silicon, Qualcomm, Samsung, Huawei, and Google Tensor smartphone processors. Vision processing units are accelerators specialized for machine vision algorithms such as CNN (convolutional neural networks) and SIFT (scale-invariant feature transform). They are used in devices that need to keep track of objects visually such as AR headsets and drones. It is more recently (circa 2017) added to processors from Apple and (circa 2022) to processors from Intel and AMD. All models of Intel Meteor Lake processors have a built-in versatile processor unit (VPU) for accelerating inference for computer vision and deep learning. On consumer devices, the NPU is intended to be small, power-efficient, but reasonably fast when used to run small models. To do this they are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common metric is trillions of operations per second (TOPS). Although TOPS does not explicitly specify the kind of operations, it is typically INT8 additions and multiplications. === Datacenters === Accelerators are used in cloud computing servers: e.g., tensor processing units (TPU) for Google Cloud Platform, and Trainium and Inferentia chips for Amazon Web Services. Many vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design. Since the late 2010s, graphics processing units designed by companies such as Nvidia and AMD often include AI-specific hardware in the form of dedicated functional units for low-precision matrix-multiplication operations. These GPUs are commonly used as AI accelerators, both for training and inference. === Scientific computation === Although NPUs are tailored for low-precision (e.g., FP16, INT8) matrix multiplication operations, they can be used to emulate higher-precision matrix multiplications in scientific computing. As modern GPUs place much focus on making the NPU part fast, using emulated FP64 (Ozaki scheme) on NPUs can potentially outperform native FP64. This has been demonstrated using FP16-emulated FP64 on NVIDIA TITAN RTX and using INT8-emulated FP64 on NVIDIA consumer GPUs and the A100 GPU. Consumer GPUs especially benefited as they have limited FP64 hardware capacity, showing a 6× speedup. Since CUDA Toolkit 13.0 Update 2, cuBLAS automatically uses INT8-emulated FP64 matrix multiplication of the equivalent precision if it is faster than native. This is in addition to the FP16-emulated FP32 feature introduced in version 12.9. == Programming == An operating system or a higher-level library may provide application programming interfaces such as TensorFlow with LiteRT Next (Android), CoreML (iOS, macOS) or DirectML (Windows). Formats such as ONNX are used to represent trained neural networks. Consumer CPU-integrated NPUs are accessible through vendor-specific APIs. AMD (Ryzen AI), Intel (OpenVINO), Apple silicon (CoreML), and Qualcomm (SNPE) each have their own APIs, which can be built upon by a higher-level library. GPUs generally use existing GPGPU pipelines such as CUDA and OpenCL adapted for lower precisions and specialized matrix-multiplication operations. Vulkan is also being used. Custom-built systems such as the Google TPU use private interfaces. There are a large number of separate underlying acceleration APIs and compilers/runtimes in use in the AI field, causing a great increase in software development effort due to the many combinations involved. As of 2025, the open standard organization Khronos Group is pursuing standardization of AI-related interfaces to reduce the amount of work needed. Khronos is working on three separate fronts: expansion of data types and intrinsic operations in OpenCL and Vulkan, inclusion of compute graphs in SPIR-V, and a NNEF/SkriptND file format for describing a neural network.

    Read more →
  • Classora

    Classora

    Classora is a knowledge base for the Internet oriented to data analysis. From a practical point of view, Classora is a digital repository that stores structured information and allows it to be displayed in multiple formats: analytically, graphically, geographically (through maps); as well as carry out OLAP analysis. The information contained in Classora comes from public sources and is uploaded into the system through bots and ETL processes. The Knowledge Base has a commercial API for semantic enhancement, and an open web through which any user can access to part of the information collected (it also allows users to complete data and share opinions). Internally, Classora is organized into Knowledge Units and Reports. A «Knowledge Unit» is any element of the World about which information may be stored and presented in the form of a data sheet (a person, a company, a country, etc.) A «Report» is a group of Knowledge Units: a ranking of companies, a sport classification table, a survey about people, etc. In fact, one of the technical capabilities of Classora is that it allows the comparison of reports and knowledge units gathered from different sources, thereby generating an added value for the media in which this information is published: digital media, interactive TV, etc. == Key definitions == === Knowledge unit === The units of knowledge (also known as entries) in Classora are data sheets that have a certain semantic equivalence with the articles on the Wikipedia: they store information about any element of the world, be it a film, a country, a company or an animal. However, they differ from Wikipedia in that Classora stores structured information, enriched with a metadata layer; and therefore it is able to automatically interpret the meaning of each unit of knowledge. === Data report === A report is a group of units of knowledge in which the repetition of elements is not allowed. This definition includes any list, poll, ranking, etc.; and, in general, any consultation that involves more than one unit of knowledge. Classora excels at the reports management due to its visualization capabilities, being able to display data in the form of tables, graphs and maps. Types of reports: Sports scores: Sports competitions results sanctioned by the competent institution. Rankings and lists: All types of interesting and curious lists, whether they have an implicit order or not. Polls: Units of knowledge that are ranked according to users’ votes. Queries to the Knowledge Base: Questions from users using CQL. Networks of connections: automatically calculated from the reports and the taxonomy of each Knowledge Unit. === Organizational taxonomy === An organizational taxonomy (also referred to as entry type) is a data sheet that brings together the common attributes of a set of units of knowledge. For instance, the organizational taxonomy F1 Driver displays attributes such as date of debut, team, etc.; and the organizational taxonomy Football Club presents attributes such as city, stadium, etc. In Classora, taxonomies are hierarchically organized, so that they inherit attributes from their parent taxonomies. For instance, F1 Driver is a subsidiary taxonomy of Sportsperson, which is a subsidiary taxonomy of Person, which in turn is a subsidiary taxonomy of Organism. The simplest type of entry in Classora is Classora Object. All the other taxonomies are its subsidiaries and inherit its attributes. In fact, the only attribute Classora Object possesses is name (all units of knowledge are required to have one name at least). == Architecture of Classora == === Data Extraction Module === The Data Extraction Module consists of a set of robots coordinated by software that also manages the potential incidents. Most of the information available in Classora is automatically uploaded through those robots, which connect to the main online public sources to gather all types of data. There are three categories of robots: Extraction robots: responsible for the massive uploading of reports from official public sources (FIFA, CIA, IMF, Eurostat...). They are used for either absolute or incremental data uploading. Data scanner robots: responsible for looking for and updating the data of a unit of knowledge. They use specific sources to perform this task: Wikipedia, IMDB, World Bank, etc. Content aggregators: they don’t connect to external sources. Instead, they generate new information using Classora’s internal database. === Participatory Module === In Classora’s Open Website, Internet users may participate providing their knowledge as they would on the Wikipedia. There are different ways to participate: adding or correcting data in the Knowledge Base, voting in surveys (participatory rankings) and creating new Knowledge Units and Data Reports. === Connectivity Module === The Knowledge Base is designed to be embedded in multi-platform, multi-channel systems, thus enabling its integration into mobile devices, tablets, interactive TV, etc. This integration may be carried out through specific plugins (for navigators or other devices) or an API REST that provides content in XML or JSON formats. The API is divided into three blocks of operations. The first one is the block of general utility tools (ranging from autosuggest components about geographical hierarchies to operations to obtain the list of today’s celebrity birthdays, using CQL). The second one is the block of operations for widget generation (graphs, maps, rankings) using information from the knowledge base. Finally, there is a block of operations designed for the publication of free-source content. == Project statistics == As of April 2012, 2,000,000 Knowledge Units, 15,000 Reports, around 10,000 Maps and several million potential Comparative Analyses had been added to Classora. According to the site of web metrics Alexa, Classora Open Website is ranked at 100,557 globally and at 2,880 in the Spanish traffic ranking. Users spend an average of 9 ½ minutes in Classora.

    Read more →
  • Torus interconnect

    Torus interconnect

    A torus interconnect is a switch-less network topology for connecting processing nodes in a parallel computer system. == Introduction == In geometry, a torus is created by revolving a circle about an axis coplanar to the circle. While this is a general definition in geometry, the topological properties of this type of shape describes the network topology in its essence. === Geometry illustration === In the representations below, the first is a one dimension torus, a simple circle. The second is a two dimension torus, in the shape of a 'doughnut'. The animation illustrates how a two dimension torus is generated from a rectangle by connecting its two pairs of opposite edges. At one dimension, a torus topology is equivalent to a ring interconnect network, in the shape of a circle. At two dimensions, it becomes equivalent to a two dimension mesh, but with extra connection at the edge nodes. === Torus network topology === A torus interconnect is a switch-less topology that can be seen as a mesh interconnect with nodes arranged in a rectilinear array of N = 2, 3, or more dimensions, with processors connected to their nearest neighbors, and corresponding processors on opposite edges of the array connected.[1] In this lattice, each node has 2N connections. This topology is named for the lattice formed in this way, which is topologically homogeneous to an N-dimensional torus. == Visualization == The first 3 dimensions of torus network topology are easier to visualize and are described below: 1D Torus: one dimension, n nodes are connected in closed loop with each node connected to its two nearest neighbors. Communication can take place in two directions, +x and −x. A 1D Torus is the same as ring interconnection. 2D Torus: two dimensions with degree of four, the nodes are imagined laid out in a two-dimensional rectangular lattice of n rows and n columns, with each node connected to its four nearest neighbors, and corresponding nodes on opposite edges connected. Communication can take place in four directions, +x, −x, +y, and −y. The total nodes of a 2D Torus is n2. 3D Torus: three dimensions, the nodes are imagined in a three-dimensional lattice in the shape of a rectangular prism, with each node connected with its six neighbors, with corresponding nodes on opposing faces of the array connected. Each edge consists of n nodes. communication can take place in six directions, +x, −x, +y, −y, +z, −z. Each edge of a 3D Torus consist of n nodes. The total nodes of 3D Torus is n3. ND Torus: N dimensions, each node of an N dimension torus has 2N neighbors, Communication can take place in 2N directions. Each edge consists of n nodes. Total nodes of this torus is nN. The main motivation of having higher dimension of torus is to achieve higher bandwidth, lower latency, and higher scalability. Higher-dimensional arrays are difficult to visualize. The above ruleset shows that each higher dimension adds another pair of nearest neighbor connections to each node. == Performance == A number of supercomputers on the TOP500 list use three-dimensional torus networks, e.g. IBM's Blue Gene/L and Blue Gene/P, and the Cray XT3. IBM's Blue Gene/Q uses a five-dimensional torus network. Fujitsu's K computer and the PRIMEHPC FX10 use a proprietary three-dimensional torus 3D mesh interconnect called Tofu. === 3D Torus performance simulation === Sandeep Palur and Dr. Ioan Raicu from Illinois Institute of Technology conducted experiments to simulate 3D torus performance. Their experiments ran on a computer with 250GB RAM, 48 cores and x86_64 architecture. The simulator they used was ROSS (Rensselaer’s Optimistic Simulation System). They mainly focused on three aspects: Varying network size Varying number of servers Varying message size They concluded that throughput decreases with the increase of servers and network size. Otherwise, throughput increases with the increase of message size. === 6D Torus product performance === Fujitsu Limited developed a 6D torus computer model called "Tofu". In their model, a 6D torus can achieve 100 GB/s off-chip bandwidth, 12 times higher scalability than a 3D torus, and high fault tolerance. The model is used in the K computer and Fugaku. === Cost === While long wrap-around links may be the easiest way to visualize the connection topology, in practice, restrictions on cable lengths often make long wrap-around links impractical. Instead, directly connected nodes—including nodes that the above visualization places on opposite edges of a grid, connected by a long wrap-around link—are physically placed nearly adjacent to each other in a folded torus network. Every link in the folded torus network is very short—almost as short as the nearest-neighbor links in a simple grid interconnect—and therefore low-latency.

    Read more →
  • VK (service)

    VK (service)

    VK (short for its original name VKontakte; Russian: ВКонтакте, lit. 'InContact') is a Russian online social media and social networking service based in Saint Petersburg. VK is available in multiple languages but it is predominantly used by Russian speakers. VK users can message each other publicly or privately, edit messages, create groups, public pages, and events; share and tag images, audio, and video; and play browser-based games. As of August 2018, VK had at least 500 million accounts. As of November 2022, it was the sixth most popular website in Russia. The network was also popular in Ukraine until it was banned by the Verkhovna Rada in 2017. According to Semrush, in 2024, VK was the 30th most visited website in the world; as YouTube is subject to blocking in Russia, VK Video overtook Google's top position in monthly web traffic for the first time in December 2024, as part of the major substitution to domestic business. == History == VKontakte was conceived in 2006 when Pavel Durov, creator of the popular student forum spbgu.ru, met his former classmate Vyacheslav Mirilashvili in St. Petersburg after graduating from the Faculty of Philology at St Petersburg State University. Vyacheslav showed Durov the increasingly popular Facebook, after which the friends decided to create a new Russian social network. Lev Leviev, an Israeli classmate of Vyacheslav Mirilashivili, became the third co-founder. Vyacheslav Mirilashvili borrowed the money from his billionaire father and became the largest shareholder. Lev Leviev took over operational management, and Durov became CEO. Pavel Durov convinced his older brother Nikolai, a multiple winner of international math and programming competitions, to develop the site. Durov launched VKontakte for beta testing in September 2006. The following month, the domain name Vkontakte.ru was registered. The new project was incorporated on 19 January 2007 as a Russian private limited company. In February 2007 the site reached a user base of over 100,000 and was recognized as the second largest company in Russia's nascent social network market. In the same month, the site was subjected to a severe DDoS attack, which briefly put it offline. The user base reached 1 million in July 2007, and 10 million in April 2008. In December 2008 VK overtook rival Odnoklassniki as Russia's most popular social networking service. == Website == Similar to many social networks, the platform's fundamental features revolve around private messaging, sharing photos, posting status updates, and exchanging links with friends. VK also provides tools for administering online communities and managing celebrity pages. The site allows its users to upload, search and stream media content, such as videos and music. VK features an advanced search engine, that allows complex queries for finding friends, as well as a real-time news search. VK updated its features and design in April 2016. === Features === Messaging. VK Private Messages can be exchanged between groups of 2 to 500 people. An email address can also be specified as the recipient. Each message may contain up to 10 attachments: Photos, Videos, Audio Files, Maps (an embedded map with a manually placed marker), and Documents. News. VK users can post on their profile walls, each post may contain up to 10 attachments – media files, maps, and documents (see above). User mentions and hashtags are supported. In the case of multiple photo attachments, the previews are automatically scaled and arranged in a magazine-style layout. The news feed can be switched between all news (default) and most interesting modes. The site features a news-recommendation engine, global real-time search, and individual search for posts and comments on specific users' walls. Communities. VK features three types of communities. Groups are better suited for decentralized communities (discussion boards, wiki-style articles, editable by all members, etc.). Public pages is a news feed-orientated broadcasting tool for celebrities and businesses. The two types are largely interchangeable, the main difference being in the default settings. The third type of community is called Events, which are used for appropriately organizing concerts and events in an appropriate way. Like buttons. VK like buttons for posts, comments, media, and external sites operate differently from Facebook. Liked content doesn't get automatically pushed to the user's wall, but is saved in the private Favorites section instead. The user has to press a second 'share with friends' button to share an item on their wall or send it via private message to a friend. Privacy. Users can control the availability of their content within the network and on the Internet. Blanket and granular privacy settings are available for pages and individual content. Synchronization with other social networks. Any news published on the VK wall will appear on Facebook or Twitter. Certain news may not be published by clicking on the logo next to the "Send" button. Editing a post in VK does not change the post in Facebook or Twitter and vice versa. However, removing the news in VK will remove it from other social networks. SMS service. Russian users can receive and reply to a private message or leave a comment for community news using SMS. Music. Users have access to the audio files uploaded by other users. In addition, users can upload the audio files themselves, create playlists and share audios with others by attaching to messages and wall posts. The uploaded audio files cannot violate copyright laws. === Popularity === As of May 2017, according to Alexa Internet ranking, VK is one of the most visited websites in some Eurasian countries. It is: 4th most visited in Russia; 3rd most visited in Belarus; 6th most visited in Kazakhstan; 8th most visited in Kyrgyzstan and Moldova; 12th most visited in Latvia. It was the fourth most viewed site in Ukraine until, in May 2017, the Ukrainian government banned the use of VK in Ukraine. According to a study for May 2018 conducted by Factum Group Ukraine VK remained the fourth most viewed site in Ukraine, but Facebook was twice as much visited. For 2019, VK appeared as the most visited social network in Ukraine according to Alexa. According to the Internet Association of Ukraine the share of Ukrainian Internet users who visit VK daily had fallen from 54% to 10% from September 2016 to September 2019. They also claimed in November 2019 that Facebook was the most popular social network. VK was expected to gain most of the users lost by Facebook and Instagram after they were blocked in Russia in 2022, according to a Calltouch poll. == Ownership == Initially, founder and CEO Pavel Durov owned 20% of shares (although he had majority voting power through proxy votes), and a trio of Russian-Israeli investors Yitzchak Mirilashvili, his father Mikhael Mirilashvili, and Lev Leviev owned 60%, 10%, and 10% respectively. In 2007, Digital Sky Technologies, an investment company managed by Yuri Milner, acquired a total of 24.99% of the shares from shareholders, investing $16.3 million. In preparation for the IPO in September 2010, DST separated international and Russian assets: the former formed the DST Global fund, while the latter, including VKontakte and rival social network Odnoklassniki, were merged into Mail.ru Group. Mail.ru Group used part of the money to acquire 7.5% of the social network for $112.5 million at a valuation of the entire project of 1.5 billion dollars. After exercising a 7.5% option in July 2011 for $111.7 million, Mail.ru Group accumulated a 39.99% stake in VKontakte. The head of Mail.ru Group, Dmitry Grishin, voiced the company's intention to gain 100% control over VKontakte. MRG was discussing with shareholders to buy out shares from the valuation of the entire company in $2-3 billion. In the summer of 2011, Mirilashvili and Leviev were ready to accept in payment owned by Mail.ru Group shares of Facebook, Groupon, and Zynga, but the deal failed due to Durov's unwillingness to sell a stake on MRG terms. Later, the co-founders considered VKontakte's IPO as an alternative. In March 2012, Durov "accidentally" became plugged into the negotiations where Mirilashvili and Leviev discussed selling their stakes directly to Mail.ru Group's main investor, Alisher Usmanov. On the same day, Durov deleted the pages of the first co-investors, stopped contacting them, and soon announced that VKontakte would postpone its IPO indefinitely. On 29 May 2012, Mail.ru Group announced its decision to yield control of the company to Durov by offering him the voting rights on its shares. Combined with Durov's personal 12% stake, this gave him 52% of the votes. In April 2013, the Mirilashvili family sold its 40% share in VK to United Capital Partners for $1.12 billion, while Lev Leviev sold his 8% share in the same deal, giving United Capital Partners 48% ownership. In January 2014, VK's founder Pavel Durov sold his 12% stake in the company to I

    Read more →
  • Spanner (database)

    Spanner (database)

    Spanner is a distributed SQL database management and storage service developed by Google. It provides features such as global transactions, strongly consistent reads, and automatic multi-site replication and failover. Spanner is used in Google F1, the database for its advertising business Google Ads, as well as Gmail and Google Photos. == Features == Spanner stores large amounts of mutable structured data. Spanner allows users to perform arbitrary queries using SQL with relational data while maintaining strong consistency and high availability for that data with synchronous replication. Key features of Spanner: Transactions can be applied across rows, columns, tables, and databases within a Spanner universe. Clients can control the replication and placement of data using automatic multi-site replication and failover. Replication is synchronous and strongly consistent. Reads are strongly consistent and data is versioned to allow for stale reads: clients can read previous versions of data, subject to garbage collection windows. Supports a native SQL interface for reading and writing data. Support for Graph Query Language == History == Spanner was first described in 2012 for internal Google data centers. Spanner's SQL capability was added in 2017 and documented in a SIGMOD 2017 paper. It became available as part of Google Cloud Platform in 2017, under the name "Cloud Spanner". == Architecture == Spanner uses the Paxos algorithm as part of its operation to shard (partition) data across up to hundreds of servers. It makes heavy use of hardware-assisted clock synchronization using GPS clocks and atomic clocks to ensure global consistency. TrueTime is the brand name for Google's distributed cloud infrastructure, which provides Spanner with the ability to generate monotonically increasing timestamps in data centers around the world. Google's F1 SQL database management system (DBMS) is built on top of Spanner, replacing Google's custom MySQL variant.

    Read more →
  • Data security

    Data security

    Data security or data protection is the process of securing digital information to protect it from online threats. Data security or protection means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach. Data security protects computer hardware, software, storage devices, and the data of user devices. Data security also protects the data of organizations, companies and administrative controls. Data security guarantees the protection of individual data, such as identity documents and bank data, and protects against unauthorized access, theft and loss of individual data. Data security also protects data breaches that occurs in companies and industries. Good security measures in industries reduce the probability of data breaches, and employees can rely on the company with their data and private information to be kept secured while companies can continue to maintain a stable reputation. The CIA Triad (Confidentiality, Integrity, and Availability) is what is used to practice what an information security is required to follow. Confidentiality, protects information from being accessed by unauthorized persons. Integrity, makes sure data is trustworthy; and Availability, meaning that data can be accessed by approved users when it is needed; are three goals for data security. Non-repudiation in data security definition, is a device/service that shows where the data originated from and the proof of integrity. == Technologies == === Disk encryption === Disk encryption refers to encryption technology that encrypts data on a hard disk drive. It takes data from a storage device and coverts it into an unreadable format. Disk encryption typically takes form in either software (see disk encryption software) or hardware (see disk encryption hardware) which can be used together. Disk encryption is often referred to as on-the-fly encryption (OTFE) or transparent encryption. Full disk encryption encrypts each individual sector of a disk volume. Files and user data are encrypted to hinder unauthorized users from accessing without a decryption key. A diversifier permits a plaintext of a specific disk sector to be encrypted into different ciphertexts, which does not require additional storage, such as an initialization vector (IV) or message authentication code (MAC). === Software versus hardware-based mechanisms for protecting data === Software-based security solutions encrypt the data to protect it from theft. However, a malicious program or a hacker could corrupt the data to make it unrecoverable, making the system unusable. Hardware-based security solutions prevent read and write access to data, which provides very strong protection against tampering and unauthorized access. Hardware-based security or assisted computer security offers an alternative to software-only computer security. Security tokens such as those using PKCS#11 or a mobile phone may be more secure due to the physical access required in order to be compromised. Access is enabled only when the token is connected and the correct PIN is entered (see two-factor authentication). However, dongles can be used by anyone who can gain physical access to it. Newer technologies in hardware-based security solve this problem by offering full proof of security for data. Working off hardware-based security: A hardware device allows a user to log in, log out and set different levels through manual actions. Many devices use biometric technology to prevent malicious users from logging in, logging out, and changing privilege levels. The current state of a user of the device is read by controllers in peripheral devices such as hard disks. Illegal access by a malicious user or a malicious program is interrupted based on the current state of a user by hard disk and DVD controllers making illegal access to data impossible. Hardware-based access control is more secure than the protection provided by the operating systems as operating systems are vulnerable to malicious attacks by viruses and hackers. The data on hard disks can be corrupted after malicious access is obtained. With hardware-based protection, the software cannot manipulate the user privilege levels. A hacker or a malicious program cannot gain access to secure data protected by hardware or perform unauthorized privileged operations. This assumption is broken only if the hardware itself is malicious or contains a backdoor. The hardware protects the operating system image and file system privileges from being tampered with. Therefore, a completely secure system can be created using a combination of hardware-based security and secure system administration policies. === Backups === Backup is the process of reproducing copies of essential data and storing in a separate, secured place. It is used to ensure data that is lost can be recovered from another source. Backups contains a minimum of one copy of the data that requires preservation. It is considered essential to keep a backup of any data in most industries and the process is recommended for any files of importance to a user. There are 3 types of backups; full backups, incremental backups, and differential backups. Full backups secure all data from a production system, such as a server, database, or other connected data source. It is impossible to lose all data in a full backup if a breach or corruption were to occur. Full backups require a significantly large amount of time to back up and may be time-consuming taking hours to days to complete. Incremental backups only secures changed data since last backup. While all backups are done in full backups, incremental backups only save data that is recently or frequently changed. Incremental backups require lower storage costs making it a prominent solution for growing datasets. === Data Privacy === Data privacy (or information privacy) is the right for individual's data to be secured to obstruct the use of unauthorized access. It gives individuals control over their data and how it can be shared to third parties. The U.S Privacy Protection Law (see Privacy laws of the United States) requires organizations to inform individuals of how their data is collected and when a data breach occurs. By implementing an encryption, it ensures that private data is unreadable to cybercriminals. === Data masking === Data masking of structured data is the process of obscuring (masking) specific data within a database table or cell to ensure that data security is maintained and sensitive information is not exposed to unauthorized personnel. This may include masking the data from users (for example so banking customer representatives can only see the last four digits of a customer's national identity number), developers (who need real production data to test new software releases but should not be able to see sensitive financial data), outsourcing vendors, etc. Data masking is a form of encryption, as it obscures data by modifying particular letters and numbers to keep data concealed and protected from potential hackers. The individual that has access to the code that decrypts the replaced characters are the only ones that can uncover the data. === Data erasure === Data erasure (or data deletion, data destruction) is a method of software-based overwriting that permanently clears all electronic data residing on a hard drive or other digital media to ensure that no sensitive data is lost when an asset is retired or reused. Article 17: Right to be Forgotten states that users have the right to permanently remove all of their private information from their old devices/services to give people more control over their data. Users are able to switch between devices efficiently. == Threats == === Malware === Malware (or malicious software) is designed to destroy, corrupt or gain unauthorized access to a computer for the purpose of stealing, or destroying data. Hackers who use malware typically utilize many types of malware, which includes computer virus, computer worms, ransomware, spyware and Trojan horse to create a vast system of disruption and cause easy data theft. One of the victims of the vast system of disruption includes healthcare workers, who are targeted by compromised systems by infections and then having their data attacked. === Phishing === Phishing is a type of scam that allows hackers to hoax people using psychological and social engineering (using human emotions such as their trust and fear) tactics into giving personal data through emails and messages, and install computer viruses if the individual were to click on a malicious link unknowingly. Attackers are able to create websites that are very similar to original websites, which makes it difficult to detect a fake website, causing individuals to fall for giving in information. Phishing attackers use human emotion to exploit them, such as making them feel fear, urgency, sympathy with the message

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →
  • Polygraphic substitution

    Polygraphic substitution

    Polygraphic substitution is a substitution cipher in which a uniform substitution is performed on blocks of letters. When the length of the block is specifically known, more precise terms are used: for instance, a cipher in which pairs of letters are substituted is bigraphic. As a concept, polygraphic substitution contrasts with monoalphabetic (or simple) substitutions in which individual letters are uniformly substituted, or polyalphabetic substitutions in which individual letters are substituted in different ways depending on their position in the text. In theory, there is some overlap in these definitions; one could conceivably consider a Vigenère cipher with an eight-letter key to be an octographic substitution. In practice, this is not a useful observation since it is far more fruitful to consider it to be a polyalphabetic substitution cipher. == Specific ciphers == In 1563, Giambattista della Porta devised the first bigraphic substitution. However, it was nothing more than a matrix of symbols. In practice, it would have been all but impossible to memorize, and carrying around the table would lead to risks of falling into enemy hands. In 1854, Charles Wheatstone came up with the Playfair cipher, a keyword-based system that could be performed on paper in the field. This was followed up over the next fifty years with the closely related four-square and two-square ciphers, which are slightly more cumbersome but offer slightly better security. In 1929, Lester S. Hill developed the Hill cipher, which uses matrix algebra to encrypt blocks of any desired length. However, encryption is very difficult to perform by hand for any sufficiently large block size, although it has been implemented by machine or computer. This is therefore on the frontier between classical and modern cryptography. == Cryptanalysis of general polygraphic substitutions == Polygraphic systems do provide a significant improvement in security over monoalphabetic substitutions. Given an individual letter 'E' in a message, it could be encrypted using any of 52 instructions depending on its location and neighbors, which can be used to great advantage to mask the frequency of individual letters. However, the security boost is limited; while it generally requires a larger sample of text to crack, it can still be done by hand. One can identify a polygraphically-encrypted text by performing a frequency chart of polygrams and not merely of individual letters. These can be compared to the frequency of plaintext English. The distribution of digrams is even more stark than individual letters. For example, the six most common letters in English (23%) represent approximately half of English plaintext, but it takes only the most frequent 8% of the 676 digrams to achieve the same potency. In addition, even in a plaintext many thousands of characters long, one would expect that nearly half of the digrams would not occur, or only barely. In addition, looking over the text one would expect to see a fairly regular scattering of repeated text in multiples of the block length and relatively few that are not multiples. Cracking a code identified as polygraphic is similar to cracking a general monoalphabetic substitution except with a larger 'alphabet'. One identifies the most frequent polygrams, experiments with replacing them with common plaintext polygrams, and attempts to build up common words, phrases, and finally meaning. Naturally, if the investigation led the cryptanalyst to suspect that a code was of a specific type, like a Playfair or order-2 Hill cipher, then they could use a more specific attack.

    Read more →
  • AI effect

    AI effect

    The AI effect is a phenomenon in which advances in artificial intelligence lead to a redefinition of what is considered intelligence, such that capabilities achieved by AI systems are no longer regarded as examples of "real" intelligence. The concept has been used to describe both a cognitive tendency and a sociotechnical pattern, in which successful AI techniques are reclassified as routine computation or absorbed into other domains. Historian Pamela McCorduck described this as a recurring feature of AI research, noting in her 2004 book Machines Who Think that once a problem is solved, it is no longer considered evidence of intelligence. Researcher Rodney Brooks similarly observed in 2002 that once systems are understood, they are often regarded as "just computation". == Definition == The AI effect refers to a shift in how intelligence is defined as machines acquire new capabilities. Tasks such as playing chess, recognizing speech, or interpreting images were historically considered indicators of intelligence, but after successful automation they are often reclassified as routine computation. McCorduck described this as an "odd paradox", in which successful AI systems are assimilated into other domains, leaving AI researchers to focus on unsolved problems. The phenomenon is often interpreted as an instance of moving the goalposts. A commonly cited formulation is Tesler's theorem, often expressed as "AI is whatever hasn't been done yet". When problems are not fully formalised, they may be described using models involving human computation, such as human-assisted Turing machines. == Historical examples == === Game playing === Early AI systems capable of playing games such as checkers and chess were initially regarded as demonstrations of machine intelligence. As these systems improved and became better understood, their achievements were often reinterpreted as examples of computation rather than intelligence. The victory of IBM's Deep Blue over Garry Kasparov in 1997 is a frequently cited example. Critics argued that the system relied on brute-force methods rather than genuine understanding. === Pattern recognition === Technologies such as optical character recognition and speech recognition were once considered core problems in artificial intelligence. As these systems became reliable and widely deployed, they were increasingly treated as standard engineering solutions. === Integration into applications === Many techniques originally developed within AI research have been incorporated into broader technological systems, including marketing, automation, and software applications. Michael Swaine reported in 2007 that AI advances are often presented as developments in other fields. Marvin Minsky observed that successful AI innovations often evolve into separate disciplines. Nick Bostrom noted in 2006 that widely adopted technologies are often no longer labeled as AI. == Contemporary discussion == The AI effect continues to be discussed in the context of recent advances in machine learning, particularly large language models and other generative AI systems. As these systems have become more widely used, some researchers and commentators have noted that their capabilities are frequently described as statistical or mechanical once understood, rather than as intelligence. A 2016 survey of artificial intelligence also noted that AI systems are increasingly embedded in everyday applications, reinforcing earlier observations that successful AI technologies tend to become normalized and no longer identified as AI. At the same time, the widespread commercial use of artificial intelligence has led to greater visibility of the field, contrasting with earlier periods in which AI techniques were often present but unacknowledged. == Interpretations == === Cognitive bias === Some authors describe the AI effect as a cognitive bias in which expectations of intelligence shift as machines achieve new capabilities. === Sociotechnical perspective === Another interpretation emphasizes how technologies are reclassified over time as they become widespread and commercially successful. === Philosophical debate === Some philosophers argue that reclassification reflects genuine conceptual distinctions rather than bias. == Historical context == During periods such as the AI winter, researchers sometimes avoided the term "artificial intelligence" due to negative perceptions. In the 21st century, however, the term "AI" has become widely used in public discourse and marketing. == Broader implications == The AI effect has been linked to broader questions about human uniqueness and the nature of intelligence. Michael Kearns suggested that people may seek to preserve a special role for humans. Similar patterns have been observed in studies of animal cognition. Herbert A. Simon noted that artificial intelligence can provoke strong emotional reactions.

    Read more →
  • Control-flow diagram

    Control-flow diagram

    A control-flow diagram (CFD) is a diagram to describe the control flow of a business process, process or review. Control-flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines. They are one of the classic business process modeling methodologies, along with flow charts, drakon-charts, data flow diagrams, functional flow block diagram, Gantt charts, PERT diagrams, and IDEF. == Overview == A control-flow diagram can consist of a subdivision to show sequential steps, with if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical figures are used to represent operations, data, or equipment, and arrows are used to indicate the sequential flow from one to another. There are several types of control-flow diagrams, for example: Change-control-flow diagram, used in project management Configuration-decision control-flow diagram, used in configuration management Process-control-flow diagram, used in process management Quality-control-flow diagram, used in quality control. In software and systems development, control-flow diagrams can be used in control-flow analysis, data-flow analysis, algorithm analysis, and simulation. Control and data are most applicable for real time and data-driven systems. These flow analyses transform logic and data requirements text into graphic flows which are easier to analyze than the text. PERT, state transition, and transaction diagrams are examples of control-flow diagrams. == Types of control-flow diagrams == === Process-control-flow diagram === A flow diagram can be developed for the process [control system] for each critical activity. Process control is normally a closed cycle in which a sensor. The application determines if the sensor information is within the predetermined (or calculated) data parameters and constraints. The results of this comparison, which controls the critical component. This [feedback] may control the component electronically or may indicate the need for a manual action. This closed-cycle process has many checks and balances to ensure that it stays safe. It may be fully computer controlled and automated, or it may be a hybrid in which only the sensor is automated and the action requires manual intervention. Further, some process control systems may use prior generations of hardware and software, while others are state of the art. === Performance-seeking control-flow diagram === The figure presents an example of a performance-seeking control-flow diagram of the algorithm. The control law consists of estimation, modeling, and optimization processes. In the Kalman filter estimator, the inputs, outputs, and residuals were recorded. At the compact propulsion-system-modeling stage, all the estimated inlet and engine parameters were recorded. In addition to temperatures, pressures, and control positions, such estimated parameters as stall margins, thrust, and drag components were recorded. In the optimization phase, the operating-condition constraints, optimal solution, and linear-programming health-status condition codes were recorded. Finally, the actual commands that were sent to the engine through the DEEC were recorded.

    Read more →
  • G.9963

    G.9963

    Recommendation G.9963 is a home networking standard under development at the International Telecommunication Union standards sector, the ITU-T. It was begun in 2010 by ITU-T to add multiple-input and multiple-output (known as MIMO) capabilities to the G.hn standard originally defined in Recommendation G.9960. The standard is also known as "G.hn-mimo". As part of the family of G.hn standards, G.9963 was endorsed by the HomeGrid Forum.

    Read more →
  • Social business model

    Social business model

    The social business model is use of social media tools and social networking behavioral standards by businesses for communication with customers, suppliers, and others. Combining social networking etiquette (being helpful, transparent and authentic) with business engagement on LinkedIn (for one-to-one interaction), Twitter (for immediacy) and Facebook (for content sharing) more fully involves employees in the organization and increases customer intimacy and trust. == Overview == Traditional business models, particularly in large organizations, have had as one common characteristic careful limitation of direct contact between those within the organization and those outside of it. Only certain specific individuals (most frequently in roles such as sales, customer service and field consulting) were designated as "customer-facing" personnel. Organizations further limited outside access to internal employees through filtering mechanisms such as publishing only a main switchboard number (whether routed through a live receptionist or an interactive voice response system) and generic "sales@" or "info@" email addresses. The Cluetrain Manifesto (written by Rick Levine, Christopher Locke, Doc Searls, and David Weinberger and published in 1999) was among the first books to predict the demise of this old order and the emergence of more open business models, though most of the business world was slow to adopt the book's recommended cultural changes. Thirteen years later, authors Dion Hinchcliffe and Peter Kim added structural underpinnings to the cultural shifts outlined in The Cluetrain Manifesto in their book, Social Business by Design. The book details many of the ways social media tools and practices are being adopted within organizations, to support both internal employee collaboration and external customer engagement (which the authors describe as the "bigger problem"). == Elements == In implementing the social business model, organizations apply social networking protocols and tools in a range of areas, potentially including: Marketing Customer Support Recruiting Crowdsourcing Internal employee collaboration Sales Product Development Supply Chain Operations Investor Relations == Characteristics of organizations adopting the social business model == Organizations that fully adopt the social business model will exhibit four key characteristics: Connected – employees will be able to seamlessly engage one-on-one in real-time with other employees and individuals outside the organization (customers, prospects, partners, media, etc.) using a variety of communications methods including text chat, voice, file sharing, email, and video chat. Social – employees will follow social networking etiquette (being authentic, helpful and transparent) in external interactions. The focus will be on answering questions and providing information rather than overt sales or promotion. Presence – these conversations may originate on the company's website or elsewhere online (e.g., publication websites, industry portals, or social networking sites such as LinkedIn or Facebook). Intelligent – organizations will use in-depth analytics to monitor connections, social interactions and presence; measure corresponding business results; and continually adjust and improve practices for increased effectiveness. == Technical and functional requirements == While much of the change inherent in adopting the social business model is cultural, it also requires process changes enabled by social business technology. Functional requirements for a social business technology platform include: Analytics (including the cost of engagement as well as various measures of return on investment such as leads, sales, referrals, recommendations, and retained customers). Integration with other social media and business tools such as CRM systems, partner relationship management (PRM) software, product development, website analytics, and employee-recruiting applications. Rules-based workflow (e.g. routing a comment to the appropriate individual for a response, based on content). Geolocation (so customers or prospects can be automatically routed to local sales or customer service representatives). Content sharing. Collaboration tools. Transparency (i.e., people should know who they are engaging with) Unified communications (the ability to engage via voice, text, video, email, and share a wide variety of file types) Storage (the ability to store interactions for legal, training, compliance or compensation purposes, and purge stored data when no longer needed based on company policy or regulatory requirements). Immediacy (real-time monitoring and response).

    Read more →
  • GasBuddy

    GasBuddy

    GasBuddy is a technology company headquartered in Dallas, United States, that offers mobile applications and websites for tracking crowd-sourced locations and prices of gas stations and convenience stores in the United States and Canada. Their platforms offer information sourced from users, gas station operators, and partner companies. They also provide business-to-business services to gas stations and convenience store owners. == History == GasBuddy was founded in Minneapolis in 2000 by Dustin Coupal, Jason Toews as a community website for sharing gas prices. In 2004, they filed as a for-profit corporation in Minnesota under the name GasBuddy Organization Inc. In 2009, GasBuddy launched OpenStore, a platform that allows convenience stores to build and manage their own mobile apps. In 2010, the company launched its own mobile apps that allowed users to input gas prices from their smartphones. In 2013, Oil Price Information Service (OPIS), a subsidiary of UCG, acquired GasBuddy. OPIS is a provider of petroleum pricing and news for businesses. In 2016, IHS acquired OPIS, separating from GasBuddy, which remained with UCG as a subsidiary company. Initially only available in the United States and Canada, GasBuddy launched in Australia in March 2016. Also in that year, GasBuddy released a completely redesigned app, its first major redesign since its release in 2010. GasBuddy also unveiled a new logo and launched GasBuddy Business Pages. GasBuddy shut down the Australian version of their app in 2022. In 2017, GasBuddy launched a gas savings program titled "Pay with GasBuddy" intended to let consumers save at gas stations in the United States. In the same year, GasBuddy was involved in a lawsuit with Reveal Mobile, a location-based marketing company, over the sale of user location data. It was revealed that GasBuddy sold information on more than 4.5 million users to Reveal each month for $9.50 per 1000 users. According to CNET, that information included "users' latitude, longitude, IP address, and time stamps on the data collected," which sparked concern in the media and between its users. In 2021, the GasBuddy app rose to the most popular app on both Android and iPhone platforms in the wake of the Colonial Pipeline ransomware attack PDI acquired GasBuddy in 2021.

    Read more →
  • Cryptovirology

    Cryptovirology

    Cryptovirology refers to the study of cryptography use in malware, such as ransomware and asymmetric backdoors. Traditionally, cryptography and its applications are defensive in nature, and provide privacy, authentication, and security to users. Cryptovirology employs a twist on cryptography, showing that it can also be used offensively. It can be used to mount extortion based attacks that cause loss of access to information, loss of confidentiality, and information leakage, tasks which cryptography typically prevents. The field was born with the observation that public-key cryptography can be used to break the symmetry between what an antivirus analyst sees regarding malware and what the attacker sees. The antivirus analyst sees a public key contained in the malware, whereas the attacker sees the public key contained in the malware as well as the corresponding private key (outside the malware) since the attacker created the key pair for the attack. The public key allows the malware to perform trapdoor one-way operations on the victim's computer that only the attacker can undo. == Overview == The field encompasses covert malware attacks in which the attacker securely steals private information such as symmetric keys, private keys, PRNG state, and the victim's data. Examples of such covert attacks are asymmetric backdoors. An asymmetric backdoor is a backdoor (e.g., in a cryptosystem) that can be used only by the attacker, even after it is found. This contrasts with the traditional backdoor that is symmetric, i.e., anyone that finds it can use it. Kleptography, a subfield of cryptovirology, is the study of asymmetric backdoors in key generation algorithms, digital signature algorithms, key exchanges, pseudorandom number generators, encryption algorithms, and other cryptographic algorithms. The NIST Dual EC DRBG random bit generator has an asymmetric backdoor in it. The EC-DRBG algorithm utilizes the discrete-log kleptogram from kleptography, which by definition makes the EC-DRBG a cryptotrojan. Like ransomware, the EC-DRBG cryptotrojan contains and uses the attacker's public key to attack the host system. The cryptographer Ari Juels indicated that NSA effectively orchestrated a kleptographic attack on users of the Dual EC DRBG pseudorandom number generation algorithm and that, although security professionals and developers have been testing and implementing kleptographic attacks since 1996, "you would be hard-pressed to find one in actual use until now." Due to public outcry about this cryptovirology attack, NIST rescinded the EC-DRBG algorithm from the NIST SP 800-90 standard. Covert information leakage attacks carried out by cryptoviruses, cryptotrojans, and cryptoworms that, by definition, contain and use the public key of the attacker is a major theme in cryptovirology. In "deniable password snatching," a cryptovirus installs a cryptotrojan that asymmetrically encrypts host data and covertly broadcasts it. This makes it available to everyone, noticeable by no one (except the attacker), and only decipherable by the attacker. An attacker caught installing the cryptotrojan claims to be a virus victim. An attacker observed receiving the covert asymmetric broadcast is one of the thousands, if not millions of receivers, and exhibits no identifying information whatsoever. The cryptovirology attack achieves "end-to-end deniability." It is a covert asymmetric broadcast of the victim's data. Cryptovirology also encompasses the use of private information retrieval (PIR) to allow cryptoviruses to search for and steal host data without revealing the data searched for even when the cryptotrojan is under constant surveillance. By definition, such a cryptovirus carries within its own coding sequence the query of the attacker and the necessary PIR logic to apply the query to host systems. == History == The first cryptovirology attack and discussion of the concept was by Adam L. Young and Moti Yung, at the time called "cryptoviral extortion" and it was presented at the 1996 IEEE Security & Privacy conference. In this attack, a cryptovirus, cryptoworm, or cryptotrojan contains the public key of the attacker and hybrid encrypts the victim's files. The malware prompts the user to send the asymmetric ciphertext to the attacker who will decipher it and return the symmetric decryption key it contains for a fee. The victim needs the symmetric key to decrypt the encrypted files if there is no way to recover the original files (e.g., from backups). The 1996 IEEE paper predicted that cryptoviral extortion attackers would one day demand e-money, long before Bitcoin even existed. Many years later, the media relabeled cryptoviral extortion as ransomware. In 2016, cryptovirology attacks on healthcare providers reached epidemic levels, prompting the U.S. Department of Health and Human Services to issue a Fact Sheet on Ransomware and HIPAA. The fact sheet states that when electronic protected health information is encrypted by ransomware, a breach has occurred, and the attack therefore constitutes a disclosure that is not permitted under HIPAA, the rationale being that an adversary has taken control of the information. Sensitive data might never leave the victim organization, but the break-in may have allowed data to be sent out undetected. California enacted a law that defines the introduction of ransomware into a computer system with the intent of extortion as being against the law. == Examples == === Tremor virus === While viruses in the wild have used cryptography in the past, the only purpose of such usage of cryptography was to avoid detection by antivirus software. For example, the tremor virus used polymorphism as a defensive technique in an attempt to avoid detection by anti-virus software. Though cryptography does assist in such cases to enhance the longevity of a virus, the capabilities of cryptography are not used in the payload. The One-half virus was amongst the first viruses known to have encrypted affected files. === Tro_Ransom.A virus === An example of a virus that informs the owner of the infected machine to pay a ransom is the virus nicknamed Tro_Ransom.A. This virus asks the owner of the infected machine to send $10.99 to a given account through Western Union. Virus.Win32.Gpcode.ag is a classic cryptovirus. This virus partially uses a version of 660-bit RSA and encrypts files with many different extensions. It instructs the owner of the machine to email a given mail ID if the owner desires the decryptor. If contacted by email, the user will be asked to pay a certain amount as ransom in return for the decryptor. === CAPI === It has been demonstrated that using just 8 different calls to Microsoft's Cryptographic API (CAPI), a cryptovirus can satisfy all its encryption needs. == Other uses of cryptography-enabled malware == Apart from cryptoviral extortion, there are other potential uses of cryptoviruses, such as deniable password snatching, cryptocounters, private information retrieval, and in secure communication between different instances of a distributed cryptovirus.

    Read more →
  • Data profiling

    Data profiling

    Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to: Find out whether existing data can be easily used for other purposes Improve the ability to search data by tagging it with keywords, descriptions, or assigning it to a category Assess data quality, including whether the data conforms to particular standards or patterns Assess the risk involved in integrating data in new applications, including the challenges of joins Discover metadata of the source database, including value patterns and distributions, key candidates, foreign-key candidates, and functional dependencies Assess whether known metadata accurately describes the actual values in the source database Understanding data challenges early in any data intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance for improving data quality. == Introduction == Data profiling refers to the analysis of information for use in a data warehouse in order to clarify the structure, content, relationships, and derivation rules of the data. Profiling helps to not only understand anomalies and assess data quality, but also to discover, register, and assess enterprise metadata. The result of the analysis is used to determine the suitability of the candidate source systems, usually giving the basis for an early go/no-go decision, and also to identify problems for later solution design. == How data profiling is conducted == Data profiling utilizes methods of descriptive statistics such as minimum, maximum, mean, mode, percentile, standard deviation, frequency, variation, aggregates such as count and sum, and additional metadata information obtained during data profiling such as data type, length, discrete values, uniqueness, occurrence of null values, typical string patterns, and abstract type recognition. The metadata can then be used to discover problems such as illegal values, misspellings, missing values, varying value representation, and duplicates. Different analyses are performed for different structural levels. E.g. single columns could be profiled individually to get an understanding of frequency distribution of different values, type, and use of each column. Embedded value dependencies can be exposed in a cross-columns analysis. Finally, overlapping value sets possibly representing foreign key relationships between entities can be explored in an inter-table analysis. Normally, purpose-built tools are used for data profiling to ease the process. The computational complexity increases when going from single column, to single table, to cross-table structural profiling. Therefore, performance is an evaluation criterion for profiling tools. == When is data profiling conducted? == According to Kimball, data profiling is performed several times and with varying intensity throughout the data warehouse developing process. A light profiling assessment should be undertaken immediately after candidate source systems have been identified and DW/BI business requirements have been satisfied. The purpose of this initial analysis is to clarify at an early stage if the correct data is available at the appropriate detail level and that anomalies can be handled subsequently. If this is not the case the project may be terminated. Additionally, more in-depth profiling is done prior to the dimensional modeling process in order assess what is required to convert data into a dimensional model. Detailed profiling extends into the ETL system design process in order to determine the appropriate data to extract and which filters to apply to the data set. Additionally, data profiling may be conducted in the data warehouse development process after data has been loaded into staging, the data marts, etc. Conducting data at these stages helps ensure that data cleaning and transformations have been done correctly and in compliance of requirements. == Benefits and examples == Data profiling can improve data quality, shorten the implementation cycle of major projects, and improve users' understanding of data. Discovering business knowledge embedded in data itself is one of the significant benefits derived from data profiling. It can improve data accuracy in corporate databases.

    Read more →