AI Grammar Paraphrase

AI Grammar Paraphrase — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cybersecurity in space

    Cybersecurity in space

    Cybersecurity in space involves the defense of all space assets (e.g. navigation systems, satellites, ground antennas, networks, etc.). The security of space can be affected by attacks such as disruption, corruption as well as the destruction of depended-upon assets/collected data. Government (e.g. militaries) and non-government sectors (e.g. financial industries) have started to become more reliant on numerous space-based services. Due to the criticality of these services, space security experts have identified these assets as high-value targets (HVT) that can cause detrimental consequences to all of Earth. == Scope and definitions == Space assets are broken down by three sub-sectors: the space component, the ground component, and the individual user component. The architecture of space assets is extremely complex and allows for a frequent attack vector utilized, the disruption by radio frequency (RF) cyber-attacks. In 2020, a memorandum was published by President Donald Trump, Space Policy Directive‑5 (SPD‑5). It established principles to ensure the safeguarding of all space assets. In 2023, the National Institute of Standards and Technology’s (NIST) published IR 8270, Introduction to Cybersecurity for Commercial Satellite Operations. This report established a baseline risk-management framework (RMF) to be implemented into space operations. == History == During the Cold War in the 1950s-1960s, the United States and Russia entered what was called the “Space Race”. By 1957, the Soviet Union successfully launched the first satellite into space named Sputnik. By 1961, the first key milestone was accomplished when the Soviet Union’s Yuri Gagarin became the first human to orbit Earth. This was later followed by the first American, Alan Shepard, to be launched into space; this was followed by John Glenn becoming the first American to orbit Earth in 1962. In 1969, a pinnacle milestone was reached when Apollo 11 launched into space and Neil Armstrong became the first man to walk on the moon. As space operations furthered, Commercial off-the-shelf products became increasingly popular but resulted in a rapid increase to the cyber-attack surface. Public awareness of space security did not increase until 2022, when the Viasat KA-SAT incident occurred, resulting in the disruption of a large number of modems across Europe. The attack was later accredited to Russia by the U.S. and the U.K. Policy and standards started to rapidly increase by 2020. The establishment of SPD-5 was released in 2020 followed by asset hardening instructions in 2022, and NIST’s IR 8270 in 2023. It was not until 2025 that Europe published their own findings in the Space Threat Landscape 2025 Report. This document led to the EU’s security proposals and standards. == Threats == === Radio-frequency Interference and Global Navigation Satellite Systems (GNSS) Spoofing === Space services are highly dependent on RF links for systems such as GNSS, however, a consequence of this dependency on RF is denial of service and deception. In 2017, the Black Sea maritime event occurred when numerous ships were subject to spoofing. Space services depend on RF links susceptible to jamming (denial) and spoofing (deception), including for GNSS/Positioning, Navigation, and Timing (PNT). Annotated incidents include the 2017 Black Sea maritime spoofing event affecting numerous ships, and extensive aviation GNSS spoofing patterns surveyed in various regions during 2024–2025. === Network intrusion and malware === Cyber threats can intrude and infect assets with malware. They do this by finding misconfiguration vulnerabilities, remote-management interfaces, and/or supply-chain vulnerabilities mainly in ground networks and user terminals. When KA-SAT occurred, it resulted from bulk modem disturbances. Forensic analysts later suggested malicious management controls and wiper malware as the root cause. === Supply-chain and lifecycle risks === The outsource of COTS components, external vendors, and software defined payloads allowed for vulnerabilities to emerge in the System/Product Lifecycle. In response, EU recommended the implementation of lifecycle-wide controls as mitigating factors. === Espionage, disruption, and influence === As Advanced Persistent Threats (APTs), Global Positioning System (GPS) intervention, and information warfare increased, assets like transponders became more frequent targets of attack. == Noteworthy incidents == The Viasat KA‑SAT incident of 2022, where a large number of modems in Europe were disrupted, resulted in the loss of telemetry access to a significant amount of wind turbines in Germany. The mass GNSS deception of the Black Sea in 2017 affected numerous ships when they started to convey fake central locations in Russia. Between 2024 and 2025, there was a mass, repetitive aviation GNSS spoofing that affected the aircraft of various regions. == Standards, guidelines, and best practices == SPD‑5 (U.S.) – This established risk-based engineering, verifying and ensuring positive control, and the implementation of risk mitigation controls. NIST IR 8270 – This created a RMF for COTS satellites. CISA/FBI SATCOM Advisory (AA22‑076) – Provided guidance on hardening techniques such as least-privileged, access control, encryption, etc.). ENISA Space Threat Landscape 2025 – It established the categorization of assets to organize threats, ensuring the observation of system/product lifecycle, and an RMF for COTS satellites. ECSS‑E‑ST‑80C (2024) – This established a standard for securing lifecycles in space, covering all segments (e.g. ground, launch, etc.). == Regulation and governance == As of 2025, there is no international regulations established for space assets, but the U.S., EU, and ESA institutional initiatives have published standards to address security concerns. The U.S. implemented SPD-5 and the Federal Communications Commission (FCC); the FCC addressed orbital debris. While the EU created standards to address technological mandates and support the implementation of NIS2. Lastly, the ESA created a special operations center to safeguard their satellites. International governance is still evolving, but forums have been held by the United Nations Committee on the Peaceful Uses of Outer Space. International conversations under forums such as the UN Committee on the Peaceful Uses of Outer Space (COPUOS) progressively note the cyber–space safety relationship, though formal global norms specific to space cybersecurity continue evolving. == Risk management approaches == Through RMF, mitigation controls have been implemented to reduce the risk of exploitation while increasing the security of space. Controls addressing mitigation include proper configuration, system hardening, zero-trust architectures, encryption, etc. Both the government and industries have placed an emphasis on incident response procedures to identify, contain, and remediate breaches.

    Read more →
  • Neurorobotics

    Neurorobotics

    Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems. Neural systems include brain-inspired algorithms (e.g. connectionist networks), computational models of biological neural networks (e.g. artificial spiking neural networks, large-scale simulations of neural microcircuits) and actual biological systems (e.g. in vivo and in vitro neural nets). Such neural systems can be embodied in machines with mechanic or any other forms of physical actuation. This includes robots, prosthetic or wearable systems but also, at smaller scale, micro-machines and, at the larger scales, furniture and infrastructures. Neurorobotics is that branch of neuroscience with robotics, which deals with the study and application of science and technology of embodied autonomous neural systems like brain-inspired algorithms. It is based on the idea that the brain is embodied and the body is embedded in the environment. Therefore, most neurorobots are required to function in the real world, as opposed to a simulated environment. Beyond brain-inspired algorithms for robots neurorobotics may also involve the design of brain-controlled robot systems. == Major classes of models == Neurorobots can be divided into various major classes based on the robot's purpose. Each class is designed to implement a specific mechanism of interest for study. Common types of neurorobots are those used to study motor control, memory, action selection, and perception. === Locomotion and motor control === Neurorobots are often used to study motor feedback and control systems, and have proved their merit in developing controllers for robots. Locomotion is modeled by a number of neurologically inspired theories on the action of motor systems. Locomotion control has been mimicked using models or central pattern generators, clumps of neurons capable of driving repetitive behavior, to make four-legged walking robots. Other groups have expanded the idea of combining rudimentary control systems into a hierarchical set of simple autonomous systems. These systems can formulate complex movements from a combination of these rudimentary subsets. This theory of motor action is based on the organization of cortical columns, which progressively integrate from simple sensory input into a complex afferent signals, or from complex motor programs to simple controls for each muscle fiber in efferent signals, forming a similar hierarchical structure. Another method for motor control uses learned error correction and predictive controls to form a sort of simulated muscle memory. In this model, awkward, random, and error-prone movements are corrected for using error feedback to produce smooth and accurate movements over time. The controller learns to create the correct control signal by predicting the error. Using these ideas, robots have been designed which can learn to produce adaptive arm movements or to avoid obstacles in a course. === Learning and memory systems === Robots designed to test theories of animal memory systems. Many studies examine the memory system of rats, particularly the rat hippocampus, dealing with place cells, which fire for a specific location that has been learned. Systems modeled after the rat hippocampus are generally able to learn mental maps of the environment, including recognizing landmarks and associating behaviors with them, allowing them to predict the upcoming obstacles and landmarks. Another study has produced a robot based on the proposed learning paradigm of barn owls for orientation and localization based on primarily auditory, but also visual stimuli. The hypothesized method involves synaptic plasticity and neuromodulation, a mostly chemical effect in which reward neurotransmitters such as dopamine or serotonin affect the firing sensitivity of a neuron to be sharper. The robot used in the study adequately matched the behavior of barn owls. Furthermore, the close interaction between motor output and auditory feedback proved to be vital in the learning process, supporting active sensing theories that are involved in many of the learning models. Neurorobots in these studies are presented with simple mazes or patterns to learn. Some of the problems presented to the neurorobot include recognition of symbols, colors, or other patterns and execute simple actions based on the pattern. In the case of the barn owl simulation, the robot had to determine its location and direction to navigate in its environment. === Action selection and value systems === Action selection studies deal with negative or positive weighting to an action and its outcome. Neurorobots can and have been used to study simple ethical interactions, such as the classical thought experiment where there are more people than a life raft can hold, and someone must leave the boat to save the rest. However, more neurorobots used in the study of action selection contend with much simpler persuasions such as self-preservation or perpetuation of the population of robots in the study. These neurorobots are modeled after the neuromodulation of synapses to encourage circuits with positive results. In biological systems, neurotransmitters such as dopamine or acetylcholine positively reinforce neural signals that are beneficial. One study of such interaction involved the robot Darwin VII, which used visual, auditory, and a simulated taste input to "eat" conductive metal blocks. The arbitrarily chosen good blocks had a striped pattern on them while the bad blocks had a circular shape on them. The taste sense was simulated by conductivity of the blocks. The robot had positive and negative feedbacks to the taste based on its level of conductivity. The researchers observed the robot to see how it learned its action selection behaviors based on the inputs it had. Other studies have used herds of small robots which feed on batteries strewn about the room, and communicate its findings to other robots. === Sensory perception === Neurorobots have also been used to study sensory perception, particularly vision. These are primarily systems that result from embedding neural models of sensory pathways in automatas. This approach gives exposure to the sensory signals that occur during behavior and also enables a more realistic assessment of the degree of robustness of the neural model. It is well known that changes in the sensory signals produced by motor activity provide useful perceptual cues that are used extensively by organisms. For example, researchers have used the depth information that emerges during replication of human head and eye movements to establish robust representations of the visual scene. == Biological robots == Biological robots are not officially neurorobots in that they are not neurologically inspired AI systems, but actual neuron tissue wired to a robot. This employs the use of cultured neural networks to study brain development or neural interactions. These typically consist of a neural culture raised on a multielectrode array (MEA), which is capable of both recording the neural activity and stimulating the tissue. In some cases, the MEA is connected to a computer which presents a simulated environment to the brain tissue and translates brain activity into actions in the simulation, as well as providing sensory feedback The ability to record neural activity gives researchers a window into a brain, which they can use to learn about a number of the same issues neurorobots are used for. An area of concern with the biological robots is ethics. Many questions are raised about how to treat such experiments. The central question concerns consciousness and whether or not the rat brain experiences it. There are many theories about how to define consciousness. == Implications for neuroscience == Neuroscientists benefit from neurorobotics because it provides a blank slate to test various possible methods of brain function in a controlled and testable environment. While robots are more simplified versions of the systems they emulate, they are more specific, allowing more direct testing of the issue at hand. They also have the benefit of being accessible at all times, while it is more difficult to monitor large portions of a brain while the human or animal is active, especially individual neurons. The development of neuroscience has produced neural treatments. These include pharmaceuticals and neural rehabilitation. Progress is dependent on an intricate understanding of the brain and how exactly it functions. It is difficult to study the brain, especially in humans, due to the danger associated with cranial surgeries. Neurorobots can improved the range of tests and experiments that can be performed in the study of neural processes.

    Read more →
  • Principle of rationality

    Principle of rationality

    The principle of rationality (or rationality principle) was coined by Karl R. Popper in his Harvard Lecture of 1963, and published in his book Myth of Framework. It is related to what he called the 'logic of the situation' in an Economica article of 1944/1945, published later in his book The Poverty of Historicism. According to Popper's rationality principle, agents act in the most adequate way according to the objective situation. It is an idealized conception of human behavior which he used to drive his model of situational analysis. Cognitive scientist Allen Newell elaborated on the principle in his account of knowledge level modeling. == Popper == Popper called for social science to be grounded in what he called situational analysis or situational logic. This requires building models of social situations which include individual actors and their relationship to social institutions, e.g. markets, legal codes, bureaucracies, etc. These models attribute certain aims and information to the actors. This forms the 'logic of the situation', the result of reconstructing meticulously all circumstances of an historical event. The 'principle of rationality' is the assumption that people are instrumental in trying to reach their goals, and this is what drives the model. Popper believed that this model could be continuously refined to approach the objective truth. Popper called his principle of rationality nearly empty (a technical term meaning without empirical content) and strictly speaking false, but nonetheless tremendously useful. These remarks earned him a lot of criticism because seemingly he had swerved from his famous Logic of Scientific Discovery. Among the many philosophers having discussed Popper's principle of rationality from the 1960s up to now are Noretta Koertge, R. Nadeau, Viktor J. Vanberg, Hans Albert, E. Matzner, Ian C. Jarvie, Mark A. Notturno, John Wettersten, Ian C. Böhm. == Newell == In the context of knowledge-based systems, Newell (in 1982) proposed the following principle of rationality: "If an agent has knowledge that one of its actions will lead to one of its goals, then the agent will select that action." This principle is employed by agents at the knowledge level to move closer to a desired goal. An important philosophical difference between Newell and Popper is that Newell argued that the knowledge level is real in the sense that it exists in nature and is not made up. This allowed Newell to treat the rationality principle as a way of understanding nature and avoid the problems Popper ran into by treating knowledge as non physical and therefore non empirical.

    Read more →
  • Time series

    Time series

    In mathematics, a time series is a sequence of data points indexed, listed, or graphed in chronological order. Most commonly, a time series consists of observations recorded at successive equally spaced points in time. Thus, it represents a form of discrete-time data. A time series may describe measurements collected over seconds, days, years, or even centuries. Common examples include heights of ocean tides, counts of sunspots, daily temperature readings, and the closing values of stock market indices such as the Dow Jones Industrial Average. A time series is often visualized using a run chart (a type of temporal line chart), which helps identify patterns such as trends, seasonal effects, and irregular fluctuations. Time series are widely used in statistics, actuarial science, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and many other areas of applied science and engineering that involve temporal measurements. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. Generally, time series data is modeled as a stochastic process. While regression analysis is often employed in such a way as to test relationships between one or more different time series, this type of analysis is not usually called "time series analysis", which refers in particular to relationships between different points in time within a single series. Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility). Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language). == Methods for analysis == Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and wavelet analysis; the latter include auto-correlation and cross-correlation analysis. In the time domain, correlation and analysis can be made in a filter-like manner using scaled correlation, thereby mitigating the need to operate in the frequency domain. Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or moving-average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate. == Panel data == A time series is one type of panel data. Panel data is the general class, a multidimensional data set, whereas a time series data set is a one-dimensional panel (as is a cross-sectional dataset). A data set may exhibit characteristics of both panel data and time series data. One way to tell is to ask what makes one data record unique from the other records. If the answer is the time data field, then this is a time series data set candidate. If determining a unique record requires a time data field and an additional identifier which is unrelated to time (e.g. student ID, stock symbol, country code), then it is panel data candidate. If the differentiation lies on the non-time identifier, then the data set is a cross-sectional data set candidate. == Analysis == There are several types of motivation and data analysis available for time series which are appropriate for different purposes. === Motivation === In the context of statistics, econometrics, quantitative finance, seismology, meteorology, and geophysics the primary goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication engineering it is used for signal detection. Other applications are in data mining, pattern recognition and machine learning, where time series analysis can be used for clustering, classification, query by content, anomaly detection as well as forecasting. === Exploratory analysis === A simple way to examine a regular time series is manually with a line chart. The datagraphic shows tuberculosis deaths in the United States, along with the yearly change and the percentage change from year to year. The total number of deaths declined in every year until the mid-1980s, after which there were occasional increases, often proportionately - but not absolutely - quite large. A study of corporate data analysts found two challenges to exploratory time series analysis: discovering the shape of interesting patterns, and finding an explanation for these patterns. Visual tools that represent time series data as heat map matrices can help overcome these challenges. === Estimation, filtering, and smoothing === This approach may be based on harmonic analysis and filtering of signals in the frequency domain using the Fourier transform, and spectral density estimation. Its development was significantly accelerated during World War II by mathematician Norbert Wiener, electrical engineers Rudolf E. Kálmán, Dennis Gabor and others for filtering signals from noise and predicting signal values at a certain point in time. An equivalent effect may be achieved in the time domain, as in a Kalman filter; see filtering and smoothing for more techniques. Other related techniques include: Autocorrelation analysis to examine serial dependence Spectral analysis to examine cyclic behavior which need not be related to seasonality. For example, sunspot activity varies over 11 year cycles. Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity. Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see trend estimation and decomposition of time series === Curve fitting === Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data. For processes that are expected to generally grow in magnitude one of the curves in the graphic (and many others) can be fitted by estimating their parameters. The construction of economic time series involves the estimation of some components for some dates by interpolation between values ("benchmarks") for earlier and later dates. Interpolation is estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines"). Interpolation is useful where the data surrounding the missing data is available and its trend, seasonality, and longer-term cycles are known. This is often done by using a relat

    Read more →
  • ShowDocument

    ShowDocument

    ShowDocument is an online web application that allows multiple users to conduct web meetings, upload, share and review documents from remote locations. The service was developed by the HBR Labs company, established in 2007. == Features == Users can collaborate on and review documents in real time, with annotations and text being visible to all users and accessible for co-editing. The idea of every user being able to annotate can cause conflicts within the sessions, and so main navigation options are under the "presenter"'s control - which can be given to a different user as well. An earlier version of the application, by contrast, had allowed all users to navigate and edit at once, causing the system to drop all incomplete edits. It is possible to draw and write on a virtual whiteboard, and to stream a YouTube video to a group in full synchronization. A feature also exists for co-browsing of Google Maps. Entering an open session in the application can be done with a given code number, or by receiving a link through an Email message. Different file formats can be uploaded and saved either online or offline, such as PDF. A PDF file's text cannot be edited - text is edited through the separate text editor. Although the platform contains a text chat, it is not intended to replace instant messaging software, as there are no extensive messaging features. The application has a paid and free version, with the free version having a few limitations: audio and video options are disabled, number of participants is limited and sessions are time-limited. == Development == ShowDocument was first developed in 2007. On September 8, 2009, HBR labs released a new update which included features such as secure online document storage and mobile device support.

    Read more →
  • Data annotation

    Data annotation

    Data annotation is the process of labeling or tagging relevant metadata within a dataset to enable machines to interpret the data accurately. The dataset can take various forms, including images, audio files, video footage, or text. == Applications == Data is a fundamental component in the development of artificial intelligence (AI). Training AI models, particularly in computer vision and natural language processing, requires large volumes of annotated data. Proper annotation ensures that machine learning algorithms can recognize patterns and make accurate predictions. Common types of data annotation include classification, bounding boxes, semantic segmentation, and keypoint annotation. Data annotation is used in AI-driven fields, including healthcare, autonomous vehicles, retail, security, and entertainment. By accurately labeling data, machine learning models can perform complex tasks such as object detection, sentiment analysis, and speech recognition with greater precision. This growing demand has led to the emergence of specialized sectors and platforms dedicated to AI training and human-in-the-loop workflows, which often utilize Reinforcement Learning from Human Feedback (RLHF) to refine model behavior. == In computer vision == === Image classification === Image classification, also known as image categorization, involves assigning predefined labels to images. Machine learning algorithms trained on classified images can later recognize objects and differentiate between categories. For instance, an AI model trained to recognize furniture styles can distinguish between Georgian and Rococo armchairs. === Semantic segmentation === Semantic segmentation assigns each pixel in an image to a specific class, such as trees, vehicles, humans, or buildings. This type of annotation enables machine learning models to differentiate objects by grouping similar pixels, allowing for a detailed understanding of an image. === Bounding boxes === Bounding box annotation involves drawing rectangular boxes around objects in an image. This technique is commonly used in autonomous driving, security surveillance, and retail analytics to detect and classify objects such as pedestrians, vehicles, and products on store shelves. === 3D cuboids === 3D cuboid annotation enhances traditional bounding boxes by adding depth, enabling models to predict an object's spatial orientation, movement, and size. This method is particularly useful for autonomous vehicles and robotics, where understanding object dimensions and depth is critical. === Polygonal annotation === For objects with irregular shapes, such as curved or multi-sided items, polygonal annotation provides more precise labeling than bounding boxes. This technique is often used in applications that require detailed object recognition, such as medical imaging or aerial mapping. === Keypoint annotation === Keypoint annotation marks specific points on an object, such as facial landmarks or body joints, to enable tracking and motion analysis. This method is widely used in facial recognition, emotion detection, sports analytics, and augmented reality applications.

    Read more →
  • Instance selection

    Instance selection

    Instance selection (or dataset reduction, or dataset condensation) is an important data pre-processing step that can be applied in many machine learning (or data mining) tasks. Approaches for instance selection can be applied for reducing the original dataset to a manageable volume, leading to a reduction of the computational resources that are necessary for performing the learning process. Algorithms of instance selection can also be applied for removing noisy instances, before applying learning algorithms. This step can improve the accuracy in classification problems. Algorithm for instance selection should identify a subset of the total available data to achieve the original purpose of the data mining (or machine learning) application as if the whole data had been used. Considering this, the optimal outcome of IS would be the minimum data subset that can accomplish the same task with no performance loss, in comparison with the performance achieved when the task is performed using the whole available data. Therefore, every instance selection strategy should deal with a trade-off between the reduction rate of the dataset and the classification quality. == Instance selection algorithms == The literature provides several different algorithms for instance selection. They can be distinguished from each other according to several different criteria. Considering this, instance selection algorithms can be grouped in two main classes, according to what instances they select: algorithms that preserve the instances at the boundaries of classes and algorithms that preserve the internal instances of the classes. Within the category of algorithms that select instances at the boundaries it is possible to cite DROP3, ICF and LSBo. On the other hand, within the category of algorithms that select internal instances, it is possible to mention ENN and LSSm. In general, algorithm such as ENN and LSSm are used for removing harmful (noisy) instances from the dataset. They do not reduce the data as the algorithms that select border instances, but they remove instances at the boundaries that have a negative impact on the data mining task. They can be used by other instance selection algorithms, as a filtering step. For example, the ENN algorithm is used by DROP3 as the first step, and the LSSm algorithm is used by LSBo. There is also another group of algorithms that adopt different selection criteria. For example, the algorithms LDIS, CDIS and XLDIS select the densest instances in a given arbitrary neighborhood. The selected instances can include both, border and internal instances. The LDIS and CDIS algorithms are very simple and select subsets that are very representative of the original dataset. Besides that, since they search by the representative instances in each class separately, they are faster (in terms of time complexity and effective running time) than other algorithms, such as DROP3 and ICF. Besides that, there is a third category of algorithms that, instead of selecting actual instances of the dataset, select prototypes (that can be synthetic instances). In this category it is possible to include PSSA, PSDSP and PSSP. The three algorithms adopt the notion of spatial partition (a hyperrectangle) for identifying similar instances and extract prototypes for each set of similar instances. In general, these approaches can also be modified for selecting actual instances of the datasets. The algorithm ISDSP adopts a similar approach for selecting actual instances (instead of prototypes).

    Read more →
  • Automation in construction

    Automation in construction

    Automation in construction is the combination of methods, processes, and systems that allow for greater machine autonomy in construction activities. Construction automation may have multiple goals, including but not limited to, reducing jobsite injuries, decreasing activity completion times, and assisting with quality control and quality assurance. Some systems may be fielded as a direct response to increasing skilled labor shortages in some countries. Opponents claim that increased automation may lead to less construction jobs and that software leaves heavy equipment vulnerable to hackers. Research insights on this subject are today published in several journals such as Automation in Construction by Elsevier. == Uses of automation in construction == Equipment control and management: Automation can be used to control and monitor construction equipment, such as cranes, excavators, and bulldozers. Material handling: Automated systems can be used to handle, transport, and place materials such as concrete, bricks, and stones. Surveying: Automated survey equipment and drones can be used to collect and analyze data on construction sites. Quality control: Automated systems can be used to monitor and control the quality of materials and construction processes. Safety management: Automated systems can be used to monitor and control safety conditions on construction sites. Scheduling and planning: Automated systems can be used to manage schedules, resources, and costs. Waste management: Automated systems can be used to manage and dispose of waste materials generated during construction. 3D printing: Automated 3D printing can be used to create prototypes, models, and even full-scale building components. == Autonomous heavy equipment == Advances in sensors, machine learning, and autonomous vehicle technology have led to the development of self-operating construction equipment and retrofit systems designed to automate excavators, bulldozers, tracked loaders, skid steer loaders, and haul trucks, allowing them to perform tasks with limited human supervision. Since 2017, tech companies have developed autonomous or semi-autonomous retrofit kits that can be installed on existing construction machinery. Examples include Bedrock Robotics, Built Robotics, and SafeAI, which develop sensor and software systems that enable excavators and other earthmoving machines to operate with varying degrees of autonomy. Major equipment manufacturers have also introduced autonomous capabilities: Caterpillar and John Deere have developed autonomous or semi-autonomous systems for construction and mining equipment, including haul trucks and earthmoving machines. == Transportation сonstruction == Kratos Defense & Security Solutions fielded the world’s first Autonomous Truck-Mounted Attenuator (ATMA) in 2017, in conjunction with Royal Truck & Equipment. == Benefits of automation in construction == The use of automation in construction has become increasingly prevalent in recent years due to its numerous benefits. Automation in construction refers to the use of machinery, software, and other technologies to perform tasks that were previously done manually by workers. One of the most significant benefits of automation in construction is increased productivity. Automation can help speed up construction processes, reduce project completion times, and improve overall efficiency. For example, using automated machinery for tasks such as concrete pouring, bricklaying, and welding can significantly increase the speed and accuracy of these tasks, allowing for more work to be completed in a shorter amount of time. Another benefit of automation in construction is improved safety. By automating tasks that are hazardous to workers, such as demolition or working at height, companies can reduce the risk of accidents and injuries on site. Automation can also help to reduce worker fatigue, which can be a significant factor in accidents and mistakes. Overall, the use of automation in construction can improve productivity, reduce costs, increase safety, and improve the quality of construction projects. As technology continues to advance, the use of automation is likely to become even more prevalent in the construction industry.

    Read more →
  • Hamilton C shell

    Hamilton C shell

    Hamilton C shell is a clone of the Unix C shell and utilities for Microsoft Windows created by Nicole Hamilton at Hamilton Laboratories as a completely original work, not based on any prior code. It was first released on OS/2 on December 12, 1988 and on Windows NT in July 1992. The OS/2 version was discontinued in 2003 but the Windows version continues to be actively supported. == Design == Hamilton C shell differs from the Unix C shell in several respects. These include its compiler architecture, its use of threads, and the decision to follow Windows rather than Unix conventions. === Parser === The original C shell uses an ad hoc parser. This has led to complaints about its limitations. It works well enough for the kinds of things users type interactively but not very well for the more complex commands a user might take time to write in a script. It is not possible, for example, to pipe the output of a foreach statement into grep. There was a limit to how complex a command it could handle. By contrast, Hamilton uses a top-down recursive descent parser that allows it to compile statements to an internal form before running them. As a result, statements can be nested or piped arbitrarily. The language has also been extended with built-in and user-defined procedures, local variables, floating point and additional expression, editing and wildcarding operators, including an "indefinite directory" wildcard construct written as "..." that matches zero or more directory levels as required to make the rest of the pattern match. === Threads === Lacking fork or a high performance way to recreate that functionality, Hamilton uses the Windows threads facilities instead. When a new thread is created, it runs within the same process space and it shares all of the process state. If one thread changes the current directory or the contents of memory, it's changed for all the threads. It's much cheaper to create a thread than a process but there's no isolation between them. To recreate the missing isolation of separate processes, the threads cooperate to share resources using locks. === Windows conventions === Hamilton differs from other Unix shells in that it also directly supports Windows conventions for drive letters, filename slashes, escape characters, etc.

    Read more →
  • Leakage (machine learning)

    Leakage (machine learning)

    In statistics and machine learning, leakage (also known as data leakage or target leakage) refers to the use of information during model training that would not be available at prediction time. This results in overly optimistic performance estimates, as the model appears to perform better during evaluation than it actually would in a production environment. Leakage is often subtle and indirect, making it difficult to detect and eliminate. It can lead a statistician or modeler to select a suboptimal model, which may be outperformed by a leakage-free alternative. == Leakage modes == Leakage can occur at multiple stages of the machine learning workflow. Broadly, its sources can be divided into two categories: those arising from features and those arising from training examples. === Feature leakage === Feature or column-wise leakage is caused by the inclusion of columns which are one of the following: a duplicate label, a proxy for the label, or the label itself. These features, known as anachronisms, will not be available when the model is used for predictions, and result in leakage if included when the model is trained. For example, including a "MonthlySalary" column when predicting "YearlySalary"; or "MinutesLate" when predicting "IsLate". === Training example leakage === Row-wise leakage is caused by improper sharing of information between rows of data. Types of row-wise leakage include: Premature featurization; leaking from premature featurization before Cross-validation/Train/Test split (must fit MinMax/ngrams/etc on only the train split, then transform the test set) Duplicate rows between train/validation/test (for example, oversampling a dataset to pad its size before splitting; or, different rotations/augmentations of a single image; bootstrap sampling before splitting; or duplicating rows to up sample the minority class) Non-independent and identically distributed random (non-IID) data Time leakage (for example, splitting a time-series dataset randomly instead of newer data in test set using a train/test split or rolling-origin cross-validation) Group leakage—not including a grouping split column (for example, Andrew Ng's group had 100k x-rays of 30k patients, meaning ~3 images per patient. The paper used random splitting instead of ensuring that all images of a patient were in the same split. Hence the model partially memorized the patients instead of learning to recognize pneumonia in chest x-rays.) A 2023 review found data leakage to be "a widespread failure mode in machine-learning (ML)-based science", having affected at least 294 academic publications across 17 disciplines, and causing a potential reproducibility crisis. == Detection == Data leakage in machine learning can be detected through various methods, focusing on performance analysis, feature examination, data auditing, and model behavior analysis. Performance-wise, unusually high accuracy or significant discrepancies between training and test results often indicate leakage. Inconsistent cross-validation outcomes may also signal issues. Feature examination involves scrutinizing feature importance rankings and ensuring temporal integrity in time series data. A thorough audit of the data pipeline is crucial, reviewing pre-processing steps, feature engineering, and data splitting processes. Detecting duplicate entries across dataset splits is also important. For language models, the Min-K% method can detect the presence of data in a pretraining dataset. It presents a sentence suspected to be present in the pretraining dataset, and computes the log-likelihood of each token, then compute the average of the lowest K of these. If this exceeds a threshold, then the sentence is likely present. This method is improved by comparing against a baseline of the mean and variance. Analyzing model behavior can reveal leakage. Models relying heavily on counter-intuitive features or showing unexpected prediction patterns warrant investigation. Performance degradation over time when tested on new data may suggest earlier inflated metrics due to leakage. Advanced techniques include backward feature elimination, where suspicious features are temporarily removed to observe performance changes. Using a separate hold-out dataset for final validation before deployment is advisable.

    Read more →
  • Commonsense knowledge (artificial intelligence)

    Commonsense knowledge (artificial intelligence)

    In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. == Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y) UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y) HasA (A "rabbit" has a "tail" | X possesses Y element or feature) CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y) Desires (a "child" desires "the aroma of baking" | X has a desire for Y) CreatedBy ("cake" is created by a "baker" | X is created by Y) PartOf (a "knife" is be part of a "knife set" | X is a part of Y) Causes ("Heat" causes "cooking"| X is what causes Y) LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y) AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y) DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C) SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y) ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y) HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B) MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal) CausesDesire ("baking" makesYou want to "follow recipe" | X causes the desire to do Y) MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y) HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y) HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following) HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y) == Commonsense knowledge bases == Cyc Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine) Evi Graphiq

    Read more →
  • Semantic analysis (machine learning)

    Semantic analysis (machine learning)

    In machine learning, semantic analysis of a text corpus is the task of building structures that approximate concepts from a large set of documents. It generally does not involve prior semantic understanding of the documents. Semantic analysis strategies include: Metalanguages based on first-order logic, which can analyze the speech of humans. Understanding the semantics of a text is symbol grounding: if language is grounded, it is equal to recognizing a machine-readable meaning. For the restricted domain of spatial analysis, a computer-based language understanding system was demonstrated. Latent semantic analysis (LSA), a class of techniques where documents are represented as vectors in a term space. A prominent example is probabilistic latent semantic analysis (PLSA). Latent Dirichlet allocation, which involves attributing document terms to topics. n-grams and hidden Markov models, which work by representing the term stream as a Markov chain, in which each term is derived from preceding terms. == Stochastic semantic analysis ==

    Read more →
  • Radek Maneuver

    Radek Maneuver

    The Radek Maneuver is a scale-up-then-scale-down tactic used in the administration of web services, specifically those deployed under a cloud computing paradigm (by a provider e.g. Amazon Elastic Compute Cloud or Microsoft Azure). == History == Developed by Olivier "Radek" Dabrowski in the mid-2010s, the Radek Maneuver was originally conceived of in using and maintaining applications running on a PaaS system. == Execution == The Radek Maneuver consists of a series of steps, usually executed via the PaaS or web portal interface. The tactic should be used when a service is misbehaving or otherwise experiencing errors, and the suspected cause is the underlying cloud layer, rather than the application layer. This includes networking issues and other "bad box" problems. The steps are as follows: Identify the application or service which is misbehaving. Increase the compute resource (number of CPU cores, amount of ram) for the instance on which the application is running. This is also known as scaling up. Wait for the application to re-deploy and stabilize. Scale back down to the original instance size. == Principle of action == This scale-up-scale-down method is understood to shift the application to a different physical machine underlying the PaaS service or application virtual machine. While this layer of the cloud computing stack is generally out of the access of an application developer (instead in the hands of the cloud provider), the maneuver allows troubleshooting and dodging errors in that layer.

    Read more →
  • Virtual intelligence

    Virtual intelligence

    Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.

    Read more →
  • Model compression

    Model compression

    Model compression is a machine learning technique for reducing the size of trained models. Large models can achieve high accuracy, but often at the cost of significant resource requirements. Compression techniques aim to compress models without significant performance reduction. Smaller models require less storage space, and consume less memory and compute during inference. Compressed models enable deployment on resource-constrained devices such as smartphones, embedded systems, edge computing devices, and consumer electronics computers. Efficient inference is also valuable for large corporations that serve large model inference over an API, allowing them to reduce computational costs and improve response times for users. Model compression is not to be confused with knowledge distillation, in which a smaller "student" model is trained to imitate the input-output behavior of a larger "teacher" model (as opposed to using the "teacher"'s trained parameters or the "teacher"'s training targets). == Techniques == Several techniques are employed for model compression. === Pruning === Pruning sparsifies a large model by setting some parameters to exactly zero. This effectively reduces the number of parameters. This allows the use of sparse matrix operations, which are faster than dense matrix operations. Pruning criteria can be based on magnitudes of parameters, the statistical pattern of neural activations, Hessian values, etc. === Quantization === Quantization reduces the numerical precision of weights and activations. For example, instead of storing weights as 32-bit floating-point numbers, they can be represented using 8-bit integers. Low-precision parameters take up less space, and takes less compute to perform arithmetic with. It is also possible to quantize some parameters more aggressively than others, so for example, a less important parameter can have 8-bit precision while another, more important parameter, can have 16-bit precision. Inference with such models requires mixed-precision arithmetic. Quantized models can also be used during training (rather than after training). PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. === Low-rank factorization === Weight matrices can be approximated by low-rank matrices. Let W {\displaystyle W} be a weight matrix of shape m × n {\displaystyle m\times n} . A low-rank approximation is W ≈ U V T {\displaystyle W\approx UV^{T}} , where U {\displaystyle U} and V {\displaystyle V} are matrices of shapes m × k , n × k {\displaystyle m\times k,n\times k} . When k {\displaystyle k} is small, this both reduces the number of parameters needed to represent W {\displaystyle W} approximately, and accelerates matrix multiplication by W {\displaystyle W} . Low-rank approximations can be found by singular value decomposition (SVD). The choice of rank for each weight matrix is a hyperparameter, and jointly optimized as a mixed discrete-continuous optimization problem. The rank of weight matrices may also be pruned after training, taking into account the effect of activation functions like ReLU on the implicit rank of the weight matrices. == Training == Model compression may be decoupled from training, that is, a model is first trained without regard for how it might be compressed, then it is compressed. However, it may also be combined with training. The "train big, then compress" method trains a large model for a small number of training steps (less than it would be if it were trained to convergence), then heavily compress the model. It is found that at the same compute budget, this method results in a better model than lightly compressed, small models. In Deep Compression, the compression has three steps. First loop (pruning): prune all weights lower than a threshold, then finetune the network, then prune again, etc. Second loop (quantization): cluster weights, then enforce weight sharing among all weights in each cluster, then finetune the network, then cluster again, etc. Third step: Use Huffman coding to losslessly compress the model. The SqueezeNet paper reported that Deep Compression achieved a compression ratio of 35 on AlexNet, and a ratio of ~10 on SqueezeNets.

    Read more →