As a subfield in artificial intelligence, diagnosis is concerned with the development of algorithms and techniques that are able to determine whether the behaviour of a system is correct. If the system is not functioning correctly, the algorithm should be able to determine, as accurately as possible, which part of the system is failing, and which kind of fault it is facing. The computation is based on observations, which provide information on the current behaviour. The expression diagnosis also refers to the answer of the question of whether the system is malfunctioning or not, and to the process of computing the answer. This word comes from the medical context where a diagnosis is the process of identifying a disease by its symptoms. == Example == An example of diagnosis is the process of a garage mechanic with an automobile. The mechanic will first try to detect any abnormal behavior based on the observations on the car and his knowledge of this type of vehicle. If he finds out that the behavior is abnormal, the mechanic will try to refine his diagnosis by using new observations and possibly testing the system, until he discovers the faulty component; the mechanic plays an important role in the vehicle diagnosis. == Expert diagnosis == The expert diagnosis (or diagnosis by expert system) is based on experience with the system. Using this experience, a mapping is built that efficiently associates the observations to the corresponding diagnoses. The experience can be provided: By a human operator. In this case, the human knowledge must be translated into a computer language. By examples of the system behaviour. In this case, the examples must be classified as correct or faulty (and, in the latter case, by the type of fault). Machine learning methods are then used to generalize from the examples. The main drawbacks of these methods are: The difficulty acquiring the expertise. The expertise is typically only available after a long period of use of the system (or similar systems). Thus, these methods are unsuitable for safety- or mission-critical systems (such as a nuclear power plant, or a robot operating in space). Moreover, the acquired expert knowledge can never be guaranteed to be complete. In case a previously unseen behaviour occurs, leading to an unexpected observation, it is impossible to give a diagnosis. The complexity of the learning. The off-line process of building an expert system can require a large amount of time and computer memory. The size of the final expert system. As the expert system aims to map any observation to a diagnosis, it will in some cases require a huge amount of storage space. The lack of robustness. If even a small modification is made on the system, the process of constructing the expert system must be repeated. A slightly different approach is to build an expert system from a model of the system rather than directly from an expertise. An example is the computation of a diagnoser for the diagnosis of discrete event systems. This approach can be seen as model-based, but it benefits from some advantages and suffers some drawbacks of the expert system approach. == Model-based diagnosis == Model-based diagnosis is an example of abductive reasoning using a model of the system. In general, it works as follows: We have a model that describes the behaviour of the system (or artefact). The model is an abstraction of the behaviour of the system and can be incomplete. In particular, the faulty behaviour is generally little-known, and the faulty model may thus not be represented. Given observations of the system, the diagnosis system simulates the system using the model, and compares the observations actually made to the observations predicted by the simulation. The modelling can be simplified by the following rules (where A b {\displaystyle Ab\,} is the Abnormal predicate): ¬ A b ( S ) ⇒ I n t 1 ∧ O b s 1 {\displaystyle \neg Ab(S)\Rightarrow Int1\wedge Obs1} A b ( S ) ⇒ I n t 2 ∧ O b s 2 {\displaystyle Ab(S)\Rightarrow Int2\wedge Obs2} (fault model) The semantics of these formulae is the following: if the behaviour of the system is not abnormal (i.e. if it is normal), then the internal (unobservable) behaviour will be I n t 1 {\displaystyle Int1\,} and the observable behaviour O b s 1 {\displaystyle Obs1\,} . Otherwise, the internal behaviour will be I n t 2 {\displaystyle Int2\,} and the observable behaviour O b s 2 {\displaystyle Obs2\,} . Given the observations O b s {\displaystyle Obs\,} , the problem is to determine whether the system behaviour is normal or not ( ¬ A b ( S ) {\displaystyle \neg Ab(S)\,} or A b ( S ) {\displaystyle Ab(S)\,} ). This is an example of abductive reasoning. == Diagnosability == A system is said to be diagnosable if whatever the behavior of the system, we will be able to determine without ambiguity a unique diagnosis. The problem of diagnosability is very important when designing a system because on one hand one may want to reduce the number of sensors to reduce the cost, and on the other hand one may want to increase the number of sensors to increase the probability of detecting a faulty behavior. Several algorithms for dealing with these problems exist. One class of algorithms answers the question whether a system is diagnosable; another class looks for sets of sensors that make the system diagnosable, and optionally comply to criteria such as cost optimization. The diagnosability of a system is generally computed from the model of the system. In applications using model-based diagnosis, such a model is already present and doesn't need to be built from scratch.
Journal of Machine Learning Research
The Journal of Machine Learning Research is a peer-reviewed open access scientific journal covering machine learning. It was established in 2000 and the first editor-in-chief was Leslie Kaelbling. The current editors-in-chief are Francis Bach (Inria) and David Blei (Columbia University). == History == The journal was established as an open-access alternative to the journal Machine Learning. In 2001, forty editorial board members of Machine Learning resigned, saying that in the era of the Internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives. The open access model employed by the Journal of Machine Learning Research allows authors to publish articles for free and retain copyright, while archives are freely available online. Print editions of the journal were published by MIT Press until 2004 and by Microtome Publishing thereafter. From its inception, the journal received no revenue from the print edition and paid no subvention to MIT Press or Microtome Publishing. In response to the prohibitive costs of arranging workshop and conference proceedings publication with traditional academic publishing companies, the journal launched a proceedings publication arm in 2007 and now publishes proceedings for several leading machine learning conferences, including the International Conference on Machine Learning, COLT, AISTATS, and workshops held at the Conference on Neural Information Processing Systems.
Technical data management system
A technical data management system (TDMS) is a document management system (DMS) pertaining to the management of technical and engineering drawings and documents. Often the data are contained in 'records' of various forms, such as on paper, microfilms or digital media. Hence technical data management is also concerned with record management involving technical data. Technical document management systems are used within large organisations with large scale projects involving engineering. For example, a TDMS can be used for integrated steel plants (ISP), automobile factories, aero-space facilities, infrastructure companies, city corporations, research organisations, etc. In such organisations, technical archives or technical documentation centres are created as central facilities for effective management of technical data and records. TDMS functions are similar to that of conventional archive functions in concepts, except that the archived materials in this case are essentially engineering drawings, survey maps, technical specifications, plant and equipment data sheets, feasibility reports, project reports, operation and maintenance manuals, standards, etc. Document registration, indexing, repository management, reprography, etc. are parts of TDMS. Various kinds of sophisticated technologies such as document scanners, microfilming and digitization camera units, wide format printers, digital plotters, software, etc. are available, making TDMS functions an easier process than previous times. == Constituents of a technical data management system == Technical data refers to both scientific and technical information recorded and presented in any form or manner (excluding financial and management information). A Technical Data Management System is created within an organisation for archiving and sharing information such as technical specifications, datasheets and drawings. Similar to other types of data management system, a Technical Data Management System consists of the 4 crucial constituents mentioned below. === Data planning === Data plans (long-term or short-term) are constructed as the first essential step of a proper and complete TDMS. It is created to ultimately help with the 3 other constituents, data acquisition, data management and data sharing. A proper data plan should not exceed 2 pages and should address the following basics: Types of data (samples, experiment results, reports, drawings, etc.) and metadata (data that summarizes and describes other data. In this case, it refers to details such as sample sizes, experiment conditions and procedures, dates of reports, explanations of drawings, etc.) Means of researches and collections of data (field works, experiments in production lines, etc.) Costs of researches Policies for access, sharing (re-use within the organisation and re-distribution to the public) Proposals for archiving data and maintaining access to it === Data acquisition === Raw data is collected from primary sites of the organisations through the use of modern technologies. Please reference the table below for examples. The data collected is then transferred to technical data centres for data management. === Data management === After data acquisition, data is sorted out, whilst useful data is archived, unwanted data is disposed. When managing and archiving data, the features below of the data are considered. Names, labels, values and descriptions for variables and records. (In the case of TDMS, one example is names of equipments on an equipment datasheet) Derived data from the original data, with code, algorithm or command file used to create them. (In the case of TDMS, one example is an expectation report derived from the analysis of an equipment datasheet) Metadata associates with the data being archived === Data sharing === Archived and managed data are accessible to rightful entities. A proper and complete TDMS should share data to a suitable extent, under suitable security, in order to achieve optimal usage of data within the organisation. It aims for easy access when reused by other researchers and hence it enhances other research processes. Data is often referred in other tests and technical specifications, where new analysis is generated, managed and archived again. As a result, data is flowing within the organisation under effective management through the use of TDMS. == Advantages and disadvantages of usage of technical data management systems == There are strengths and weakness when using technical data management systems (TDMS) to archive data. Some of the advantages and disadvantages are listed below. === Advantages === ==== 1. Faster and easier data management ==== Since TDMS is integrated into the organisation's systems, whenever workers develop data files (SolidWorks, AutoCAD, Microsoft Word, etc.), they can also archive and manage data, linking what they need to their current work, at the same time they can also update the archives with useful data. This speeds up working processes and makes them more efficient. ==== 2. Increased security ==== All data files are centralized, hence internal and external data leakages are less likely to happen, and the data flow is more closely monitored. As a result, data in the organisation is more secured. ==== 3. Increased collaboration within the organisation ==== Since the data files are centralized and the data flow within the organisation increases, researchers and workers within the organisation are able to work on joint projects. More complex tasks can be performed for higher yields. ==== 4. Compatible to various formats of data ==== TDMS is compatible to many formats of data, from basic data like Microsoft Words to complex data like voice data. This enhances the quality of the management of data archived. === Disadvantages === ==== 1. Higher financial costs ==== Implementing TDMS into the organisation's systems involves monetary costs. Maintenance costs certain amount of human resources and money as well. These resources involve opportunity costs as they can be utilized in other aspects. ==== 2. Lower stability ==== Since TDMS manages and centralizes all the data the organisation processes, it links the working processes within the whole organisation together. It also increases the vulnerability of the organisation data network. If TDMS is not stable enough or when it is exposed to hacker and virus attacks, the organisation's data flow might shut down completely, affecting the work in an organisation-wide scale and leading to a lower stability as results. == Comparison between traditional data management approaches and technical data management systems == Test engineers and researchers are facing great challenges in turning complex test results and simulation data into usable information for higher yields of firms. These challenges are listed below. Increase in complication of designs Reduced in time and budgets available Higher quality is demanded === Traditional data management approaches === Many organisations are still applying the conventional file management systems, due to the difficulty in building a proper and complete archives for data management. The first approach is the simple file-folder system. This costs the problem of ineffectiveness as workers and researchers have to manually go through numerous layers of systems and files for the target data. Moreover, the target data may contain files with different formats and these files may not be stored in the same machine. These files are also easily lost if renamed or moved to another location. The second approach is conventional databases such as Oracle. These databases are capable of enabling easy search and access of data. However, a great drawback is that huge effort for preparing and modeling the data is required. For large-scale projects, huge monetary costs are induced, and extra IT human resources must be employed for constant handling, expanding and maintaining the inflexible system, which is custom for specific tasks, instead of all tasks. In the long-term, it is not cost-effective. === Technical data management systems (TDMS) === TDMS is developed based on 3 principles, flexible and organized file storage, self-scaling hybrid data index, and an interactive post-processing environment. The system in practical, mainly consists of 3 components, data files with essential and relevant Metadata, data finders for organizing and managing data regardless of files formats, and, a software of searching, analyzing and reporting. With metadata attached to original data files, the data finder can identify different related data files during searches, even if they are in different file formats. TDMS hence allows researchers to search for data like browsing the Internet. Last but not least, it can adapt to changes and update itself according to the changes, unlike databases. == Comparison between strong information systems and weak information systems == Complex organizations may need large amounts
List of algorithm general topics
This is a list of algorithm general topics. Analysis of algorithms Ant colony algorithm Approximation algorithm Best and worst cases Big O notation Combinatorial search Competitive analysis Computability theory Computational complexity theory Embarrassingly parallel problem Emergent algorithm Evolutionary algorithm Fast Fourier transform Genetic algorithm Graph exploration algorithm Heuristic Hill climbing Implementation Las Vegas algorithm Lock-free and wait-free algorithms Monte Carlo algorithm Numerical analysis Online algorithm Polynomial time approximation scheme Problem size Pseudorandom number generator Quantum algorithm Random-restart hill climbing Randomized algorithm Running time Sorting algorithm Search algorithm Stable algorithm (disambiguation) Super-recursive algorithm Tree search algorithm
VMDS
VMDS abbreviates the relational database technology called Version Managed Data Store provided by GE Energy as part of its Smallworld technology platform and was designed from the outset to store and analyse the highly complex spatial and topological networks typically used by enterprise utilities such as power distribution and telecommunications. VMDS was originally introduced in 1990 as has been improved and updated over the years. Its current version is 6.0. VMDS has been designed as a spatial database. This gives VMDS a number of distinctive characteristics when compared to conventional attribute only relational databases. == Distributed server processing == VMDS is composed of two parts: a simple, highly scalable data block server called SWMFS (Smallworld Master File Server) and an intelligent client API written in C and Magik. Spatial and attribute data are stored in data blocks that reside in special files called data store files on the server. When the client application requests data it has sufficient intelligence to work out the optimum set of data blocks that are required. This request is then made to SWMFS which returns the data to the client via the network for processing. This approach is particularly efficient and scalable when dealing with spatial and topological data which tends to flow in larger volumes and require more processing then plain attribute data (for example during a map redraw operation). This approach makes VMDS well suited to enterprise deployment that might involve hundreds or even thousands of concurrent clients. == Support for long transactions == Relational databases support short transactions in which changes to data are relatively small and are brief in terms in duration (the maximum period between the start and the end of a transaction is typically a few seconds or less). VMDS supports long transactions in which the volume of data involved in the transaction can be substantial and the duration of the transaction can be significant (days, weeks or even months). These types of transaction are common in advanced network applications used by, for example, power distribution utilities. Due to the time span of a long transaction in this context the amount of change can be significant (not only within the scope of the transaction, but also within the context of the database as a whole). Accordingly, it is likely that the same record might be changed more than once. To cope with this scenario VMDS has inbuilt support for automatically managing such conflicts and allows applications to review changes and accept only those edits that are correct. == Spatial and topological capabilities == As well as conventional relational database features such as attribute querying, join fields, triggers and calculated fields, VMDS has numerous spatial and topological capabilities. This allows spatial data such as points, texts, polylines, polygons and raster data to be stored and analysed. Spatial functions include: find all features within a polygon, calculate the Voronoi polygons of a set of sites and perform a cluster analysis on a set of points. Vector spatial data such as points, polylines and polygons can be given topological attributes that allow complex networks to be modelled. Network analysis engines are provided to answer questions such as find the shortest path between two nodes or how to optimize a delivery route (the travelling salesman problem). A topology engine can be configured with a set of rules that define how topological entities interact with each other when new data is added or existing data edited. == Data abstraction == In VMDS all data is presented to the application as objects. This is different from many relational databases that present the data as rows from a table or query result using say JDBC. VMDS provides a data modelling tool and underlying infrastructure as part of the Smallworld technology platform that allows administrators to associate a table in the database with a Magik exemplar (or class). Magik get and set methods for the Magik exemplar can be automatically generated that expose a table's field (or column). Each VMDS row manifests itself to the application as an instance of a Magik object and is known as an RWO (or real world object). Tables are known as collections in Smallworld parlance. # all_rwos hold all the rwos in the database and is heterogeneous all_rwos << my_application.rwo_set() # valve_collection holds the valve collection valves << all_rwos.select(:collection, {:valve}) number_of_valves << valves.size Queries are built up using predicate objects: # find 'open' valves. open_valves << valves.select(predicate.eq(:operating_status, "open")) number_of_open_valves << open_valves.size _for valve _over open_valves.elements() _loop write(valve.id) _endloop Joins are implemented as methods on the parent RWO. For example, a manager might have several employees who report to him: # get the employee collection. employees << my_application.database.collection(:gis, :employees) # find a manager called 'Steve' and get the first matching element steve << employees.select(predicate.eq(:name, "Steve").and(predicate.eq(:role, "manager")).an_element() # display the names of his direct reports. name is a field (or column) # on the employee collection (or table) _for employee _over steve.direct_reports.elements() _loop write(employee.name) _endloop Performing a transaction: # each key in the hash table corresponds to the name of the field (or column) in # the collection (or table) valve_data << hash_table.new_with( :asset_id, 57648576, :material, "Iron") # get the valve collection directly valve_collection << my_application.database.collection(:gis, :valve) # create an insert transaction to insert a new valve record into the collection a # comment can be provide that describes the transaction transaction << record_transaction.new_insert(valve_collection, valve_data, "Inserted a new valve") transaction.run()
Kernel (image processing)
In image processing, a kernel, convolution matrix, or mask is a small matrix used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between the kernel and an image. Or more simply, when each pixel in the output image is a function of the nearby pixels (including itself) in the input image, the kernel is that function. == Details == The general expression of a convolution is g x , y = ω ∗ f x , y = ∑ i = − a a ∑ j = − b b ω i , j f x − i , y − j , {\displaystyle g_{x,y}=\omega f_{x,y}=\sum _{i=-a}^{a}{\sum _{j=-b}^{b}{\omega _{i,j}f_{x-i,y-j}}},} where g ( x , y ) {\displaystyle g(x,y)} is the filtered image, f ( x , y ) {\displaystyle f(x,y)} is the original image, ω {\displaystyle \omega } is the filter kernel. Every element of the filter kernel is considered by − a ≤ i ≤ a {\displaystyle -a\leq i\leq a} and − b ≤ j ≤ b {\displaystyle -b\leq j\leq b} . Depending on the element values, a kernel can cause a wide range of effects: The above are just a few examples of effects achievable by convolving kernels and images. === Origin === The origin is the position of the kernel which is above (conceptually) the current output pixel. This could be outside of the actual kernel, though usually it corresponds to one of the kernel elements. For a symmetric kernel, the origin is usually the center element. == Convolution == Convolution is the process of adding each element of the image to its local neighbors, weighted by the kernel. This is related to a form of mathematical convolution. The matrix operation being performed—convolution—is not traditional matrix multiplication, despite being similarly denoted by . For example, if we have two three-by-three matrices, the first a kernel, and the second an image piece, convolution is the process of flipping both the rows and columns of the kernel and multiplying locally similar entries and summing. The element at coordinates [2, 2] (that is, the central element) of the resulting image would be a weighted combination of all the entries of the image matrix, with weights given by the kernel: ( [ a b c d e f g h i ] ∗ [ 1 2 3 4 5 6 7 8 9 ] ) [ 2 , 2 ] = {\displaystyle \left({\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}}{\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}}\right)[2,2]=} ( i ⋅ 1 ) + ( h ⋅ 2 ) + ( g ⋅ 3 ) + ( f ⋅ 4 ) + ( e ⋅ 5 ) + ( d ⋅ 6 ) + ( c ⋅ 7 ) + ( b ⋅ 8 ) + ( a ⋅ 9 ) . {\displaystyle (i\cdot 1)+(h\cdot 2)+(g\cdot 3)+(f\cdot 4)+(e\cdot 5)+(d\cdot 6)+(c\cdot 7)+(b\cdot 8)+(a\cdot 9).} The other entries would be similarly weighted, where we position the center of the kernel on each of the boundary points of the image, and compute a weighted sum. The values of a given pixel in the output image are calculated by multiplying each kernel value by the corresponding input image pixel values. This can be described algorithmically with the following pseudo-code: for each image row in input image: for each pixel in image row: set accumulator to zero for each kernel row in kernel: for each element in kernel row: if element position corresponding to pixel position then multiply element value corresponding to pixel value add result to accumulator endif set output image pixel to accumulator corresponding input image pixels are found relative to the kernel's origin. If the kernel is symmetric then place the center (origin) of the kernel on the current pixel. The kernel will overlap the neighboring pixels around the origin. Each kernel element should be multiplied with the pixel value it overlaps with and all of the obtained values should be summed. This resultant sum will be the new value for the current pixel currently overlapped with the center of the kernel. If the kernel is not symmetric, it has to be flipped both around its horizontal and vertical axis before calculating the convolution as above. The general form for matrix convolution is [ x 11 x 12 ⋯ x 1 n x 21 x 22 ⋯ x 2 n ⋮ ⋮ ⋱ ⋮ x m 1 x m 2 ⋯ x m n ] ∗ [ y 11 y 12 ⋯ y 1 n y 21 y 22 ⋯ y 2 n ⋮ ⋮ ⋱ ⋮ y m 1 y m 2 ⋯ y m n ] = ∑ i = 0 m − 1 ∑ j = 0 n − 1 x ( m − i ) ( n − j ) y ( 1 + i ) ( 1 + j ) {\displaystyle {\begin{bmatrix}x_{11}&x_{12}&\cdots &x_{1n}\\x_{21}&x_{22}&\cdots &x_{2n}\\\vdots &\vdots &\ddots &\vdots \\x_{m1}&x_{m2}&\cdots &x_{mn}\\\end{bmatrix}}{\begin{bmatrix}y_{11}&y_{12}&\cdots &y_{1n}\\y_{21}&y_{22}&\cdots &y_{2n}\\\vdots &\vdots &\ddots &\vdots \\y_{m1}&y_{m2}&\cdots &y_{mn}\\\end{bmatrix}}=\sum _{i=0}^{m-1}\sum _{j=0}^{n-1}x_{(m-i)(n-j)}y_{(1+i)(1+j)}} === Edge handling === Kernel convolution usually requires values from pixels outside of the image boundaries. There are a variety of methods for handling image edges. Extend The nearest border pixels are conceptually extended as far as necessary to provide values for the convolution. Corner pixels are extended in 90° wedges. Other edge pixels are extended in lines. Wrap The image is conceptually wrapped (or tiled) and values are taken from the opposite edge or corner. Mirror The image is conceptually mirrored at the edges. For example, attempting to read a pixel 3 units outside an edge reads one 3 units inside the edge instead. Crop / Avoid overlap Any pixel in the output image which would require values from beyond the edge is skipped. This method can result in the output image being slightly smaller, with the edges having been cropped. Move kernel so that values from outside of image is never required. Machine learning mainly uses this approach. Example: Kernel size 10x10, image size 32x32, result image is 23x23. Kernel Crop Any pixel in the kernel that extends past the input image isn't used and the normalizing is adjusted to compensate. Constant Use constant value for pixels outside of image. Usually black or sometimes gray is used. Generally this depends on application. === Normalization === Normalization is defined as the division of each element in the kernel by the sum of all kernel elements, so that the sum of the elements of a normalized kernel is unity. This will ensure the average pixel in the modified image is as bright as the average pixel in the original image. === Optimization === Fast convolution algorithms include: separable convolution ==== Separable convolution ==== 2D convolution with an M × N kernel requires M × N multiplications for each sample (pixel). If the kernel is separable, then the computation can be reduced to M + N multiplications. Using separable convolutions can significantly decrease the computation by doing 1D convolution twice instead of one 2D convolution. === Implementation === Here a concrete convolution implementation done with the GLSL shading language :
Information Rules
Information Rules is a 1999 book by Carl Shapiro and Hal Varian applying traditional economic theories to modern information-based technologies. The book examines commercial strategies appropriate to companies that deal in information, given the high "first copy" and low "subsequent copy" costs of information commodities, such as music CDs or original texts. == Content == The book examines competing standards, and how a company might influence widespread consumer acceptance of one over another, such as VHS versus Betamax, or HD DVD versus Blu-ray. The book mentions possible business strategies of such publishers as Encyclopædia Britannica who have to confront how to stay viable as technology changes the value and availability of information.