AI App For Home Design

AI App For Home Design — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Bin picking

    Bin picking

    Bin picking (also referred to as random bin picking) is a core problem in computer vision and robotics. The goal is to have a robot with sensors and cameras attached to it pick-up known objects with random poses out of a bin using a suction gripper, parallel gripper, or other kind of robot end effector. Early work on bin picking made use of Photometric Stereo in recovering the shapes of objects and to determine their orientation in space. Amazon previously held a competition focused on bin picking referred to as the "Amazon Picking Challenge", which was held from 2015 to 2017. The challenge tasked entrants with building their own robot hardware and software that could attempt simplified versions of the general task of picking and stowing items on shelves. The robots were scored by how many items were picked and stowed in a fixed amount of time. The first Amazon Robotics challenge was won by a team from TU Berlin in 2015, followed by a team from TU Delft and the Dutch company "Fizyr" in 2016. The last Amazon Robotics Challenge was won by the Australian Centre for Robotic Vision at Queensland University of Technology with their robot named Cartman. The Amazon Robotics/Picking Challenge was discontinued following the 2017 competition. Although there can be some overlap, bin picking is distinct from "each picking" and the bin packing problem.

    Read more →
  • Information scientist

    Information scientist

    The term information scientist developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science - CIS) or high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the term information officer or information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute "Criteria for Information Science" (appendix 1) as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also Award of Merit - Association for Information Science and Technology Marcia Bates David Blair (information technologist) Samuel C. Bradford Michael Buckland John M. Carroll Blaise Cronin Emilia Currás Brenda Dervin Eugene Garfield Paul B. Kantor Frederick Wilfrid Lancaster Calvin Mooers Tefko Saracevic Linda C. Smith Robert Saxton Taylor Brian Campbell Vickery Thomas D. Wilson == Additional reading == Ellis, David and Merete Haugan. (1997) "Modelling the information seeking patterns of engineers and research scientists in an industrial environment" (Journal of Documentation, Volume 53(4): pp. 384–403) Poole, Alex H. (2024). "'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029. Vickery, Brian Campbell (1988) "Essays presented to B. C. Vickery" (Journal of Documentation, Volume 44, pp. 199–283). Vickery, B. & Vickery, A. (1987) Information Science in theory and practice (London: Bowker-Saur, pp. 361–369)

    Read more →
  • Leiden algorithm

    Leiden algorithm

    The Leiden algorithm is a community detection algorithm developed by Traag et al at Leiden University. It was developed as a modification of the Louvain method. Like the Louvain method, the Leiden algorithm attempts to optimize modularity in extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity. == Improvement over Louvain method == Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (though, the method by which nodes are considered in Leiden is more efficient) and a graph aggregation step. However, to address the issues with poorly-connected communities and the merging of smaller communities into larger communities (the resolution limit of modularity), the Leiden algorithm employs an intermediate refinement phase in which communities may be split to guarantee that all communities are well-connected. Consider, for example, the following graph: Three communities are present in this graph (each color represents a community). Additionally, the center "bridge" node (represented with an extra circle) is a member of the community represented by blue nodes. Now consider the result of a node-moving step which merges the communities denoted by red and green nodes into a single community (as the two communities are highly connected): Notably, the center "bridge" node is now a member of the larger red community after node moving occurs (due to the greedy nature of the local node moving algorithm). In the Louvain method, such a merging would be followed immediately by the graph aggregation phase. However, this causes a disconnection between two different sections of the community represented by blue nodes. In the Leiden algorithm, the graph is instead refined: The Leiden algorithm's refinement step ensures that the center "bridge" node is kept in the blue community to ensure that it remains intact and connected, despite the potential improvement in modularity from adding the center "bridge" node to the red community. == Graph components == Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph. === Vertices and edges === A graph is composed of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let V {\displaystyle V} be the set of vertices, and E {\displaystyle E} be the set of edges: V := { v 1 , v 2 , … , v n } E := { e i j , e i k , … , e k l } {\displaystyle {\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}} where e i j {\displaystyle e_{ij}} is the directed edge from vertex v i {\displaystyle v_{i}} to vertex v j {\displaystyle v_{j}} . We can also write this as an ordered pair: e i j := ( v i , v j ) {\displaystyle {\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}} === Community === A community is a unique set of nodes: C i ⊆ V C i ⋂ C j = ∅ ∀ i ≠ j {\displaystyle {\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}} and the union of all communities must be the total set of vertices: V = ⋃ i = 1 C i {\displaystyle {\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}} === Partition === A partition is the set of all communities: P = { C 1 , C 2 , … , C n } {\displaystyle {\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}} == Partition quality == How communities are partitioned is an integral part on the Leiden algorithm. How partitions are decided can depend on how their quality is measured. Additionally, many of these metrics contain parameters of their own that can change the outcome of their communities. === Modularity === Modularity is a highly used quality metric for assessing how well a set of communities partition a graph. The equation for this metric is defined for an adjacency matrix, A, as: Q = 1 2 m ∑ i j ( A i j − k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q={\frac {1}{2m}}\sum _{ij}(A_{ij}-{\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: A i j {\displaystyle A_{ij}} represents the edge weight between nodes i {\displaystyle i} and j {\displaystyle j} ; see Adjacency matrix; k i {\displaystyle k_{i}} and k j {\displaystyle k_{j}} are the sum of the weights of the edges attached to nodes i {\displaystyle i} and j {\displaystyle j} , respectively; m {\displaystyle m} is the sum of all of the edge weights in the graph; c i {\displaystyle c_{i}} and c j {\displaystyle c_{j}} are the communities to which the nodes i {\displaystyle i} and j {\displaystyle j} belong; and δ {\displaystyle \delta } is Kronecker delta function: δ ( c i , c j ) = { 1 if c i and c j are the same community 0 otherwise {\displaystyle {\begin{aligned}\delta (c_{i},c_{j})&={\begin{cases}1&{\text{if }}c_{i}{\text{ and }}c_{j}{\text{ are the same community}}\\0&{\text{otherwise}}\end{cases}}\end{aligned}}} === Reichardt Bornholdt Potts Model (RB) === One of the most well used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB). This model is used by default in most mainstream Leiden algorithm libraries under the name RBConfigurationVertexPartition. This model introduces a resolution parameter γ {\displaystyle \gamma } and is highly similar to the equation for modularity. This model is defined by the following quality function for an adjacency matrix, A, as: Q = ∑ i j ( A i j − γ k i k j 2 m ) δ ( c i , c j ) {\displaystyle Q=\sum _{ij}(A_{ij}-\gamma {\frac {k_{i}k_{j}}{2m}})\delta (c_{i},c_{j})} where: γ {\displaystyle \gamma } represents a linear resolution parameter === Constant Potts Model (CPM) === Another metric similar to RB is the Constant Potts Model (CPM). This metric also relies on a resolution parameter γ {\displaystyle \gamma } The quality function is defined as: H = − ∑ i j ( A i j w i j − γ ) δ ( c i , c j ) {\displaystyle H=-\sum _{ij}(A_{ij}w_{ij}-\gamma )\delta (c_{i},c_{j})} === Understanding Potts Model resolution parameters/Resolution limit === Typically Potts models such as RB or CPM include a resolution parameter in their calculation. Potts models are introduced as a response to the resolution limit problem that is present in modularity maximization based community detection. The resolution limit problem is that, for some graphs, maximizing modularity may cause substructures of a graph to merge and become a single community and thus smaller structures are lost. These resolution parameters allow modularity adjacent methods to be modified to suit the requirements of the user applying the Leiden algorithm to account for small substructures at a certain granularity. The figure on the right illustrates why resolution can be a helpful parameter when using modularity based quality metrics. In the first graph, modularity only captures the large scale structures of the graph; however, in the second example, a more granular quality metric could potentially detect all substructures in a graph. == Algorithm == The Leiden algorithm starts with a graph of disorganized nodes (a) and sorts it by partitioning them to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). The method it uses is similar to the Louvain algorithm, except that after moving each node it also considers that node's neighbors that are not already in the community it was placed in. This process results in our first partition (b), also referred to as P {\displaystyle {\mathcal {P}}} . Then the algorithm refines this partition by first placing each node into its own individual community and then moving them from one community to another to maximize modularity. It does this iteratively until each node has been visited and moved, and each community has been refined - this creates partition (c), which is the initial partition of P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} . Then an aggregate network (d) is created by turning each community into a node. P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} is used as the basis for the aggregate network while P {\displaystyle {\mathcal {P}}} is used to create its initial partition. Because we use the original partition P {\displaystyle {\mathcal {P}}} in this step, we must retain it so that it can be used in future iterations. These steps together form the first iteration of the algorithm. In subsequent iterations, the nodes of the aggregate network (which each represent a community) are once again placed into their own individual communities and then sorted according to modularity to form a new P refined {\displaystyle {\mathcal {P}}_{\text{refined}}} , forming (e) in the above graphic. In the case depicted by the graph, the nodes were already sorted optimally, so no change too

    Read more →
  • Enterprise data planning

    Enterprise data planning

    Enterprise data planning is the starting point for enterprise wide change. It states the destination and describes how you will get there. It defines benefits, costs and potential risks. It provides measures to be used along the way to judge progress and adjust the journey according to changing circumstances. Data is fundamental to investment enterprises. Effective, economic management of data underpins operations and enables transformations needed to satisfy customer demands, competition and regulation. Data warehouse(s) and other aspects of the overall data architecture are critical to the enterprise. EDMworks has created a strategic data planning approach for the Investment Sector. It consists of a planning process, planning intranets, templates and training materials. EDMworks planning process is based on the belief that extensive domain knowledge significantly shortens planning iterations and enables progressively higher quality plans to be produced and implemented. This approach drives the development of an effective and economic enterprise data architecture. Enterprise data planning is based on proven business disciplines. Key architectural layers for data and applications are then added in order to provide an enterprise wide understanding of the uses and interdependencies of data. This enables the definition of the core components of the EDM plan: Industry structure and business objectives Assessment of systems and services Target architecture for applications, data and infrastructure Target organization structures Systems, database, infrastructure and organizational plans Business case, costs, benefits, results and risks. EDMworks uses several components from the Open Systems Group TOGAF enterprise systems planning process. TOGAF acts as an extension to good business planning methods to provide a framework for the development of the systems and data architectural components. == History == James Martin was one of the pathfinders in data planning methodologies. He was one of the first to identify data as being an enterprise wide asset that required management. He developed a series of tools and methods to support that process. Most of the large consulting firms developed their own methods to address the same basic issue. Frequently, their approaches were incorporated into their own branded system development methodologies that encompassed the complete systems development life-cycle. Others, such as Ed Tozer, developed more focused offerings that dealt with the complexities of extracting key business needs from senior management and then defining relevant architectural visions for the specific enterprise. From these various sources, the concepts of Business, Data, Applications and Technology Architectures emerged. The Open Group Architectural Framework (TOGAF) has taken this work forward and has established a sound method in TOGAF version 9. EDMworks approach is to adopt these planning and architectural practices as a basis and then add two additional dimensions to the planning and implementation focus: Domain knowledge of the Investments sector. Investments is a complex global industry with a common set of characteristics about clients, information vendors, competition and regulation. Domain knowledge significantly improves the quality of the planning and implementation processes Development of people and teams. Change is a major feature of in any Enterprise Data Management program and people and teams both need development in order to make EDM effective throughout an organization.

    Read more →
  • Dark mode

    Dark mode

    A dark mode, dark theme, night mode, or light-on-dark color scheme is a color scheme that uses light-colored text, icons, and graphical user interface elements on a dark background. It is often discussed in terms of computer user interface design and web design. Many modern websites and operating systems offer the user an optional light-on-dark display mode. Some users find dark mode displays more visually appealing, and claim that it can reduce eye strain. Displaying white at full brightness uses roughly six times as much power as pure black on a 2016 Google Pixel, which has an OLED display. However, conventional LED displays may not benefit from reduced power consumption; but if a LED display has the partial dimming features, it still benefits from reduced power consumption. Most modern operating systems support an optional light-on-dark color scheme. == History == Microsoft introduced the high contrast themes in Windows 95. Later, Microsoft introduced a dark theme in the Anniversary Update of Windows 10 in 2016. In 2018, Apple followed in macOS Mojave. In September 2019, iOS 13 and Android 10 both introduced dark modes. Some operating systems provide tools to change the dark mode state automatically at sundown or sunrise. A "prefers-color-scheme" option was created for front-end web developers in 2019, being a CSS property that signals a user's choice for their system to use a light or dark color theme. Firefox and Chromium have optional dark theme for all internal screens. It is also possible for third-party developers to implement their own dark themes. There are also a variety of browser add-ons that can re-theme web sites with dark color schemes, also aligning with system theme. Wikipedia's mobile and desktop versions received a dark mode option in 2024. == Implementation == There is a prefers-color-scheme media query in CSS, to detect if the user has requested light or dark color scheme and serve the requested color scheme. It can be indicated from the user's operating system preference or a user agent. CSS example: JavaScript example: == Energy usage == Light on dark color schemes require less energy to display on OLED displays. This positively impacts battery life and reduces energy consumption. While an OLED will consume around 40% of the power of an LCD displaying an image that is primarily black, it can use more than three times as much power to display an image with a white background, such as a document or web site. This can lead to reduced battery life and higher energy usage unless a light-on-dark color scheme is used. The long-term reduced power usage may also prolong battery life or the useful life of the display and battery. The energy savings that can be achieved using a light-on-dark color scheme are because of how OLED screens work: in an OLED screen, each subpixel generates its own light and it only consumes power when generating light. This is in contrast to how an LCD works: in an LCD, subpixels either block or allow light from an always-on (lit) LED backlight to pass through. "AMOLED Black" color schemes (that use pure black instead of dark gray) do not necessarily save more energy than other light-on-dark color schemes that use dark gray instead of black, as the power consumption on an AMOLED screen decreases proportionately to the average brightness of the displayed pixels. Although it is true that AMOLED black does save more energy than dark gray, the additional energy savings are often negligible; AMOLED black will only give an additional energy saving of less than 1%, for instance, over the dark gray that's used in the dark theme for Google's official Android apps. In November 2018, Google confirmed that dark mode on Android saved battery life. == Web issues == Some argue that a color scheme with light text on a dark background is easier to read on the screen, because the lower overall brightness causes less eyestrain, while others argue to the contrary. Some pages on the web are designed for white backgrounds; Image assets (GIF, PNG, SVG, WOFF, etc) can be used improperly causing visual artifacts if dark mode is forced (instead of designed for) with a plugin like Dark Reader.

    Read more →
  • MaPS S.A.

    MaPS S.A.

    MaPS S.A. is a software editor founded in 2011 by Thierry Muller. The company is headquartered in Luxembourg. Its platform, called MaPS System, provides Data Management software for Multichannel Marketing. == History and funding == The first version of MaPS System was released under the agency Prem1um S.A. in 2005 in the partnership with Pingroom agency. In combination with MaPS System, Prem1um also provided various consulting services in Marketing, Publishing and Sales. This is where MaPS System takes its names (M stands for Marketing, P for Publishing and S for Sales). In 2011, after being successful, Prem1um S.A. decided to enable the software MaPS System to operate independently under MaPS S.A., as a separate company and editor of the software. The first financial supports were provided by Malta ICI, a Venture Capital firm, and the local partner Chameleon Invest, a seed-capital fund led by Business Angels, who invested €900,000. In a second investment round in 2014 led by Newion Investments, a Venture Capital firm, €1.4 Million were raised, thus amounting to total assets of €2.2 Million. In 2016, the company was taken over by three private investors. In 2018, after two years of continuous growth and European expansion in Belgium, Germany and Switzerland, MaPS S.A acquired Awevo, an e-commerce web agency. == Products == The services included in MaPS System range from the data centralization, Data Governance to an optimized Multichannel Marketing. The software currently includes more than 35 modules for Master Data Management, Product Information Management, Digital Asset Management, Business Process Management including catalogue Publishing features. == Certifications == In 2019, MaPS System and Awevo received "Made in Luxembourg" label, given to the companies whose services are entirely designed in Luxembourg, without any production or development offshoring. MaPS System is a member of ICT Cluster by Luxinnovation.

    Read more →
  • Semantic translation

    Semantic translation

    Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system. An example of semantic translation is the conversion of XML data from one data model to a second data model using formal ontologies for each system such as the Web Ontology Language (OWL). This is frequently required by intelligent agents that wish to perform searches on remote computer systems that use different data models to store their data elements. The process of allowing a single user to search multiple systems with a single search request is also known as federated search. Semantic translation should be differentiated from data mapping tools that do simple one-to-one translation of data from one system to another without actually associating meaning with each data element. Semantic translation requires that data elements in the source and destination systems have "semantic mappings" to a central registry or registries of data elements. The simplest mapping is of course where there is equivalence. There are three types of Semantic equivalence: Class Equivalence - indicating that class or "concepts" are equivalent. For example: "Person" is the same as "Individual" Property Equivalence - indicating that two properties are equivalent. For example: "PersonGivenName" is the same as "FirstName" Instance Equivalence - indicating that two individual instances of objects are equivalent. For example: "Dan Smith" is the same person as "Daniel Smith" Semantic translation is very difficult if the terms in a particular data model do not have direct one-to-one mappings to data elements in a foreign data model. In that situation, an alternative approach must be used to find mappings from the original data to the foreign data elements. This problem can be alleviated by centralized metadata registries that use the ISO-11179 standards such as the National Information Exchange Model (NIEM).

    Read more →
  • Recording format

    Recording format

    A recording format is a format for encoding data for storage on a storage medium. The format can be container information such as sectors on a disk, or user/audience information (content) such as analog stereo audio. Multiple levels of encoding may be achieved in one format. For example, a text encoded page may contain HTML and XML encoding, combined in a plain text file format, using either EBCDIC or ASCII character encoding, on a UDF digitally formatted disk. In electronic media, the primary format is the encoding that requires hardware to interpret (decode) data; while secondary encoding is interpreted by secondary signal processing methods, usually computer software. == Recording container formats == A container format is a system for dividing physical storage space or virtual space for data. Data space can be divided evenly by a system of measurement, or divided unevenly with meta data. A grid may divide physical or virtual space with physical or virtual (dividers) borders, evenly or unevenly. Just as a physical container (such as a file cabinet) is divided by physical borders (such as drawers and file folders), data space is divided by virtual borders. Meta data such as a unit of measurement, address, or meta tags act as virtual borders in a container format. A template may be considered an abstract format for containing a solution as well as the content itself. Systems of measurement Metric system Geographic coordinate system Page grid Film formats Audio data format Video tape format Disk format File format Meta data Text formatting Template Data structure == Raw content formats == A raw content format is a system of converting data to displayable information. Raw content formats may either be recorded in secondary signal processing methods such as a software container format (e.g. digital audio, digital video) or recorded in the primary format. A primary raw content format may be directly observable (e.g. image, sound, motion, smell, sensation) or physical data which only requires hardware to display it, such as a phonographic needle and diaphragm or a projector lamp and magnifying glass.

    Read more →
  • Viola–Jones object detection framework

    Viola–Jones object detection framework

    The Viola–Jones object detection framework is a machine learning object detection framework proposed in 2001 by Paul Viola and Michael Jones. It was motivated primarily by the problem of face detection, although it can be adapted to the detection of other object classes. In short, it consists of a sequence of classifiers. Each classifier is a single perceptron with several binary masks (Haar features). To detect faces in an image, a sliding window is computed over the image. For each image, the classifiers are applied. If at any point, a classifier outputs "no face detected", then the window is considered to contain no face. Otherwise, if all classifiers output "face detected", then the window is considered to contain a face. The algorithm is efficient for its time, able to detect faces in 384 by 288 pixel images at 15 frames per second on a conventional 700 MHz Intel Pentium III. It is also robust, achieving high precision and recall. While it has lower accuracy than more modern methods such as convolutional neural network, its efficiency and compact size (only around 50k parameters, compared to millions of parameters for typical CNN like DeepFace) means it is still used in cases with limited computational power. For example, in the original paper, they reported that this face detector could run on the Compaq iPAQ at 2 fps (this device has a low power StrongARM without floating point hardware). == Problem description == Face detection is a binary classification problem combined with a localization problem: given a picture, decide whether it contains faces, and construct bounding boxes for the faces. To make the task more manageable, the Viola–Jones algorithm only detects full view (no occlusion), frontal (no head-turning), upright (no rotation), well-lit, full-sized (occupying most of the frame) faces in fixed-resolution images. The restrictions are not as severe as they appear, as one can normalize the picture to bring it closer to the requirements for Viola-Jones. any image can be scaled to a fixed resolution for a general picture with a face of unknown size and orientation, one can perform blob detection to discover potential faces, then scale and rotate them into the upright, full-sized position. the brightness of the image can be corrected by white balancing. the bounding boxes can be found by sliding a window across the entire picture, and marking down every window that contains a face. This would generally detect the same face multiple times, for which duplication removal methods, such as non-maximal suppression, can be used. The "frontal" requirement is non-negotiable, as there is no simple transformation on the image that can turn a face from a side view to a frontal view. However, one can train multiple Viola-Jones classifiers, one for each angle: one for frontal view, one for 3/4 view, one for profile view, a few more for the angles in-between them. Then one can at run time execute all these classifiers in parallel to detect faces at different view angles. The "full-view" requirement is also non-negotiable, and cannot be simply dealt with by training more Viola-Jones classifiers, since there are too many possible ways to occlude a face. == Components of the framework == A full presentation of the algorithm is in. Consider an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution ( M , N ) {\displaystyle (M,N)} . Our task is to make a binary decision: whether it is a photo of a standardized face (frontal, well-lit, etc) or not. Viola–Jones is essentially a boosted feature learning algorithm, trained by running a modified AdaBoost algorithm on Haar feature classifiers to find a sequence of classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . Haar feature classifiers are crude, but allows very fast computation, and the modified AdaBoost constructs a strong classifier out of many weak ones. At run time, a given image I {\displaystyle I} is tested on f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". For this reason, the Viola-Jones classifier is also called "Haar cascade classifier". === Haar feature classifiers === Consider a perceptron f w , b {\displaystyle f_{w,b}} defined by two variables w ( x , y ) , b {\displaystyle w(x,y),b} . It takes in an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution, and returns f w , b ( I ) = { 1 , if ∑ x , y w ( x , y ) I ( x , y ) + b > 0 0 , else {\displaystyle f_{w,b}(I)={\begin{cases}1,\quad {\text{if }}\sum _{x,y}w(x,y)I(x,y)+b>0\\0,\quad {\text{else}}\end{cases}}} A Haar feature classifier is a perceptron f w , b {\displaystyle f_{w,b}} with a very special kind of w {\displaystyle w} that makes it extremely cheap to calculate. Namely, if we write out the matrix w ( x , y ) {\displaystyle w(x,y)} , we find that it takes only three possible values { + 1 , − 1 , 0 } {\displaystyle \{+1,-1,0\}} , and if we color the matrix with white on + 1 {\displaystyle +1} , black on − 1 {\displaystyle -1} , and transparent on 0 {\displaystyle 0} , the matrix is in one of the 5 possible patterns shown on the right. Each pattern must also be symmetric to x-reflection and y-reflection (ignoring the color change), so for example, for the horizontal white-black feature, the two rectangles must be of the same width. For the vertical white-black-white feature, the white rectangles must be of the same height, but there is no restriction on the black rectangle's height. ==== Rationale for Haar features ==== The Haar features used in the Viola-Jones algorithm are a subset of the more general Haar basis functions, which have been used previously in the realm of image-based object detection. While crude compared to alternatives such as steerable filters, Haar features are sufficiently complex to match features of typical human faces. For example: The eye region is darker than the upper-cheeks. The nose bridge region is brighter than the eyes. Composition of properties forming matchable facial features: Location and size: eyes, mouth, bridge of nose Value: oriented gradients of pixel intensities Further, the design of Haar features allows for efficient computation of f w , b ( I ) {\displaystyle f_{w,b}(I)} using only constant number of additions and subtractions, regardless of the size of the rectangular features, using the summed-area table. === Learning and using a Viola–Jones classifier === Choose a resolution ( M , N ) {\displaystyle (M,N)} for the images to be classified. In the original paper, they recommended ( M , N ) = ( 24 , 24 ) {\displaystyle (M,N)=(24,24)} . ==== Learning ==== Collect a training set, with some containing faces, and others not containing faces. Perform a certain modified AdaBoost training on the set of all Haar feature classifiers of dimension ( M , N ) {\displaystyle (M,N)} , until a desired level of precision and recall is reached. The modified AdaBoost algorithm would output a sequence of Haar feature classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . The details of the modified AdaBoost algorithm is detailed below. ==== Using ==== To use a Viola-Jones classifier with f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} on an image I {\displaystyle I} , compute f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". === Learning algorithm === The speed with which features may be evaluated does not adequately compensate for their number, however. For example, in a standard 24x24 pixel sub-window, there are a total of M = 162336 possible features, and it would be prohibitively expensive to evaluate them all when testing an image. Thus, the object detection framework employs a variant of the learning algorithm AdaBoost to both select the best features and to train classifiers that use them. This algorithm constructs a "strong" classifier as a linear combination of weighted simple “weak” classifiers. h ( x ) = sgn ⁡ ( ∑ j = 1 M α j h j ( x ) ) {\displaystyle h(\mathbf {x} )=\operatorname {sgn} \left(\sum _{j=1}^{M}\alpha _{j}h_{j}(\mathbf {x} )\right)} Each weak classifier is a threshold function based on the feature f j {\displaystyle f_{j}} . h j ( x ) = { − s j if f j < θ j s j otherwise {\displaystyle h_{j}(\mathbf {x} )={\begin{cases}-s_{j}&{\text{if }}f_{j}<\theta _{j}\\s_{j}&{\text{otherwise}}\end{cases}}} The threshold value θ j {\displaystyle \theta _{j}} and the polarity s j ∈ ± 1 {\displaystyle s_{j}\in \pm 1} are determined in the training, as well as the coefficients α j {\displaystyle \alpha _{j}} . Here a simplified version of the lea

    Read more →
  • SQL/PSM

    SQL/PSM

    SQL/PSM (SQL/Persistent Stored Modules) is an ISO standard mainly defining an extension of SQL with a procedural language for use in stored procedures. Initially published in 1996 as an extension of SQL-92 (ISO/IEC 9075-4:1996, a version sometimes called PSM-96 or even SQL-92/PSM), SQL/PSM was later incorporated into the multi-part SQL:1999 standard, and has been part 4 of that standard since then, most recently in SQL:2023. The SQL:1999 part 4 covered less than the original PSM-96 because the SQL statements for defining, managing, and invoking routines were actually incorporated into part 2 SQL/Foundation, leaving only the procedural language itself as SQL/PSM. The SQL/PSM facilities are still optional as far as the SQL standard is concerned; most of them are grouped in Features P001-P008. SQL/PSM standardizes syntax and semantics for control flow, exception handling (called "condition handling" in SQL/PSM), local variables, assignment of expressions to variables and parameters, and (procedural) use of cursors. It also defines an information schema (metadata) for stored procedures. SQL/PSM is one language in which methods for the SQL:1999 structured types can be defined. The other is Java, via SQL/JRT. SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL. Oracle developed PL/SQL and released it in 1991, basing the language on the US Department of Defense's Ada programming language. However, Oracle has maintained a distance from the standard in its documentation. IBM's SQL PL (used in DB2) and Mimer SQL's PSM were the first two products officially implementing SQL/PSM. It is commonly thought that these two languages, and perhaps also MySQL/MariaDB's procedural language, are closest to the SQL/PSM standard. However, a PostgreSQL addon implements SQL/PSM (alongside its other procedural languages like the PL/SQL-derived plpgsql), although it is not part of the core product. RDF functionality in OpenLink Virtuoso was developed entirely through SQL/PSM, combined with custom datatypes (e.g., ANY for handling URI and Literal relation objects), sophisticated indexing, and flexible physical storage choices (column-wise or row-wise).

    Read more →
  • Reverse data management

    Reverse data management

    Reverse data management describes a branch and set of research questions in relational database theory that aim to reverse the common focus of standard data management. Instead of focusing on the "forward" transformation of an input databases (a set of relational tables) to an output table, which is the main focus of standard query evaluation, reverse data management reverses that focus and studies the possible input database transformations that would achieve a desired output. Usually the objective is to find an intervention (a deletion, addition, or change of tuples) of minimal size, in order to achieve a particular change in the output. The problem has been studied at least since the 1980s, but has received renewed attention due to an influential paper in the early 2000s that made a connection between provenance and view propagation. The term was coined in a VLDB 2011 vision paper. The problem has been receiving significant attention in recent years due to its connection to computational fairness. == Topics in reverse data management problems == Example topics in reverse data management include: Deletion propagation with source side-effects: Find a minimal number of tuples to delete in the database in order to delete a particular tuple in the output. Deletion propagation with view side-effects: Find a set of tuples to delete in the database in order to delete a particular tuple in the output, while removing the minimal number of other output tuples. Causal responsibility: Find a minimal number of tuples to delete in the database in order to make a particular input tuple counterfactual. This notion is inspired by the notions of actual cause and causal responsibility from the work of Halpern and Pearl. Resilience: Find a minimal number of tuples to delete in the database in order to make a Boolean query false. The complexity of this problem is identical to the problem of deletion propagation with source-side effects over a different database. Smallest witness problem: Find a minimal number of tuples to keep in the a database (or equivalently, delete a maximal number of tuples) while keeping a particular tuple in the output. Minimum repair: Given a database that violates certain integrity constraints, find a minimal number of tuples to delete in the database in order to fulfill all constraints (also called to "repair" the database).

    Read more →
  • Tuple

    Tuple

    In mathematics, a tuple is a finite sequence (or ordered list) of numbers. More generally, it is a sequence of mathematical objects, called the elements of the tuple. An n-tuple is a tuple of n elements, where n is a non-negative integer. There is only one 0-tuple, called the empty tuple. A 1-tuple and a 2-tuple are commonly called a singleton and an ordered pair, respectively. The term "infinite tuple" is occasionally used for "infinite sequences". Tuples are usually written by listing the elements within parentheses "( )" and separated by commas; for example, (2, 7, 4, 1, 7) denotes a 5-tuple. Other types of brackets are sometimes used, although they may have a different meaning. An n-tuple can be formally defined as the image of a function that has the set of the first n natural numbers as its domain (1, 2, ..., n). Tuples may be also defined from ordered pairs by a recurrence starting from an ordered pair; indeed, an n-tuple can be identified with the ordered pair of its (n − 1) first elements and its nth element, for example, ( ( ( 1 , 2 ) , 3 ) , 4 ) = ( 1 , 2 , 3 , 4 ) {\displaystyle \left(\left(\left(1,2\right),3\right),4\right)=\left(1,2,3,4\right)} . In computer science, tuples come in many forms. Most typed functional programming languages implement tuples directly as product types, tightly associated with algebraic data types, pattern matching, and destructuring assignment. Many programming languages offer an alternative to tuples, known as record types, featuring unordered elements accessed by label. A few programming languages combine ordered tuple product types and unordered record types into a single construct, as in C structs and Haskell records. Relational databases may formally identify their rows (records) as tuples. Tuples also occur in relational algebra; when programming the semantic web with the Resource Description Framework (RDF); in linguistics; and in philosophy. == Etymology == The term originated as an abstraction of the sequence: single, couple/double, triple, quadruple, quintuple, sextuple, septuple, octuple, ..., n‑tuple, ..., where the prefixes are taken from the Latin names of the numerals. The unique 0-tuple is called the null tuple or empty tuple. A 1‑tuple is called a single (or singleton), a 2‑tuple is called an ordered pair or couple, and a 3‑tuple is called a triple (or triplet). The number n can be any nonnegative integer. For example, a complex number can be represented as a 2‑tuple of reals, a quaternion can be represented as a 4‑tuple, an octonion can be represented as an 8‑tuple, and a sedenion can be represented as a 16‑tuple. Although these uses treat ‑tuple as the suffix, the original suffix was ‑ple as in "triple" (three-fold) or "decuple" (ten‑fold). This originates from medieval Latin plus (meaning "more") related to Greek ‑πλοῦς, which replaced the classical and late antique ‑plex (meaning "folded"), as in "duplex". == Properties == The general rule for the identity of two n-tuples is ( a 1 , a 2 , … , a n ) = ( b 1 , b 2 , … , b n ) {\displaystyle (a_{1},a_{2},\ldots ,a_{n})=(b_{1},b_{2},\ldots ,b_{n})} if and only if a 1 = b 1 , a 2 = b 2 , … , a n = b n {\displaystyle a_{1}=b_{1},{\text{ }}a_{2}=b_{2},{\text{ }}\ldots ,{\text{ }}a_{n}=b_{n}} . Thus a tuple has properties that distinguish it from a set: A tuple may contain multiple instances of the same element, so tuple ( 1 , 2 , 2 , 3 ) ≠ ( 1 , 2 , 3 ) {\displaystyle (1,2,2,3)\neq (1,2,3)} ; but set { 1 , 2 , 2 , 3 } = { 1 , 2 , 3 } {\displaystyle \{1,2,2,3\}=\{1,2,3\}} . Tuple elements are ordered: tuple ( 1 , 2 , 3 ) ≠ ( 3 , 2 , 1 ) {\displaystyle (1,2,3)\neq (3,2,1)} , but set { 1 , 2 , 3 } = { 3 , 2 , 1 } {\displaystyle \{1,2,3\}=\{3,2,1\}} . A tuple has a finite number of elements, while a set or a multiset may have an infinite number of elements. == Definitions == There are several definitions of tuples that give them the properties described in the previous section. === Tuples as functions === The 0 {\displaystyle 0} -tuple may be identified as the empty function. For n ≥ 1 , {\displaystyle n\geq 1,} the n {\displaystyle n} -tuple ( a 1 , … , a n ) {\displaystyle \left(a_{1},\ldots ,a_{n}\right)} may be identified with the surjective function F : { 1 , … , n } → { a 1 , … , a n } {\displaystyle F~:~\left\{1,\ldots ,n\right\}~\to ~\left\{a_{1},\ldots ,a_{n}\right\}} with domain domain ⁡ F = { 1 , … , n } = { i ∈ N : 1 ≤ i ≤ n } {\displaystyle \operatorname {domain} F=\left\{1,\ldots ,n\right\}=\left\{i\in \mathbb {N} :1\leq i\leq n\right\}} and with codomain codomain ⁡ F = { a 1 , … , a n } , {\displaystyle \operatorname {codomain} F=\left\{a_{1},\ldots ,a_{n}\right\},} that is defined at i ∈ domain ⁡ F = { 1 , … , n } {\displaystyle i\in \operatorname {domain} F=\left\{1,\ldots ,n\right\}} by F ( i ) := a i . {\displaystyle F(i):=a_{i}.} That is, F {\displaystyle F} is the function defined by 1 ↦ a 1 ⋮ n ↦ a n {\displaystyle {\begin{alignedat}{3}1\;&\mapsto &&\;a_{1}\\\;&\;\;\vdots &&\;\\n\;&\mapsto &&\;a_{n}\\\end{alignedat}}} in which case the equality ( a 1 , a 2 , … , a n ) = ( F ( 1 ) , F ( 2 ) , … , F ( n ) ) {\displaystyle \left(a_{1},a_{2},\dots ,a_{n}\right)=\left(F(1),F(2),\dots ,F(n)\right)} necessarily holds. Tuples as sets of ordered pairs Functions are commonly identified with their graphs, which is a certain set of ordered pairs. Indeed, many authors use graphs as the definition of a function. Using this definition of "function", the above function F {\displaystyle F} can be defined as: F := { ( 1 , a 1 ) , … , ( n , a n ) } . {\displaystyle F~:=~\left\{\left(1,a_{1}\right),\ldots ,\left(n,a_{n}\right)\right\}.} === Tuples as nested ordered pairs === Another way of modeling tuples in set theory is as nested ordered pairs. This approach assumes that the notion of ordered pair has already been defined. The 0-tuple (i.e. the empty tuple) is represented by the empty set ∅ {\displaystyle \emptyset } . An n-tuple, with n > 0, can be defined as an ordered pair of its first entry and an (n − 1)-tuple (which contains the remaining entries when n > 1): ( a 1 , a 2 , a 3 , … , a n ) = ( a 1 , ( a 2 , a 3 , … , a n ) ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=(a_{1},(a_{2},a_{3},\ldots ,a_{n}))} This definition can be applied recursively to the (n − 1)-tuple: ( a 1 , a 2 , a 3 , … , a n ) = ( a 1 , ( a 2 , ( a 3 , ( … , ( a n , ∅ ) … ) ) ) ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=(a_{1},(a_{2},(a_{3},(\ldots ,(a_{n},\emptyset )\ldots ))))} Thus, for example: ( 1 , 2 , 3 ) = ( 1 , ( 2 , ( 3 , ∅ ) ) ) ( 1 , 2 , 3 , 4 ) = ( 1 , ( 2 , ( 3 , ( 4 , ∅ ) ) ) ) {\displaystyle {\begin{aligned}(1,2,3)&=(1,(2,(3,\emptyset )))\\(1,2,3,4)&=(1,(2,(3,(4,\emptyset ))))\\\end{aligned}}} A variant of this definition starts "peeling off" elements from the other end: The 0-tuple is the empty set ∅ {\displaystyle \emptyset } . For n > 0: ( a 1 , a 2 , a 3 , … , a n ) = ( ( a 1 , a 2 , a 3 , … , a n − 1 ) , a n ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=((a_{1},a_{2},a_{3},\ldots ,a_{n-1}),a_{n})} This definition can be applied recursively: ( a 1 , a 2 , a 3 , … , a n ) = ( ( … ( ( ( ∅ , a 1 ) , a 2 ) , a 3 ) , … ) , a n ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=((\ldots (((\emptyset ,a_{1}),a_{2}),a_{3}),\ldots ),a_{n})} Thus, for example: ( 1 , 2 , 3 ) = ( ( ( ∅ , 1 ) , 2 ) , 3 ) ( 1 , 2 , 3 , 4 ) = ( ( ( ( ∅ , 1 ) , 2 ) , 3 ) , 4 ) {\displaystyle {\begin{aligned}(1,2,3)&=(((\emptyset ,1),2),3)\\(1,2,3,4)&=((((\emptyset ,1),2),3),4)\\\end{aligned}}} === Tuples as nested sets === Using Kuratowski's representation for an ordered pair, the second definition above can be reformulated in terms of pure set theory: The 0-tuple (i.e. the empty tuple) is represented by the empty set ∅ {\displaystyle \emptyset } ; Let x {\displaystyle x} be an n-tuple ( a 1 , a 2 , … , a n ) {\displaystyle (a_{1},a_{2},\ldots ,a_{n})} , and let x → b ≡ ( a 1 , a 2 , … , a n , b ) {\displaystyle x\rightarrow b\equiv (a_{1},a_{2},\ldots ,a_{n},b)} . Then, x → b ≡ { { x } , { x , b } } {\displaystyle x\rightarrow b\equiv \{\{x\},\{x,b\}\}} . (The right arrow, → {\displaystyle \rightarrow } , could be read as "adjoined with".) In this formulation: ( ) = ∅ ( 1 ) = ( ) → 1 = { { ( ) } , { ( ) , 1 } } = { { ∅ } , { ∅ , 1 } } ( 1 , 2 ) = ( 1 ) → 2 = { { ( 1 ) } , { ( 1 ) , 2 } } = { { { { ∅ } , { ∅ , 1 } } } , { { { ∅ } , { ∅ , 1 } } , 2 } } ( 1 , 2 , 3 ) = ( 1 , 2 ) → 3 = { { ( 1 , 2 ) } , { ( 1 , 2 ) , 3 } } = { { { { { { ∅ } , { ∅ , 1 } } } , { { { ∅ } , { ∅ , 1 } } , 2 } } } , { { { { { ∅ } , { ∅ , 1 } } } , { { { ∅ } , { ∅ , 1 } } , 2 } } , 3 } } {\displaystyle {\begin{array}{lclcl}()&&&=&\emptyset \\&&&&\\(1)&=&()\rightarrow 1&=&\{\{()\},\{(),1\}\}\\&&&=&\{\{\emptyset \},\{\emptyset ,1\}\}\\&&&&\\(1,2)&=&(1)\rightarrow 2&=&\{\{(1)\},\{(1),2\}\}\\&&&=&\{\{\{\{\emptyset \},\{\emptyset ,1\}\}\},\\&&&&\{\{\{\emptyset \},\{\emptyset ,1\}\},2\}\}\\&&&&\\(1,2,3)&=&(1,2)\rightarrow 3&=&\{\{(1,2)\},\{(1,2),3\}\}\\&&&=&\{\{\{\{\{\{\empty

    Read more →
  • Depth peeling

    Depth peeling

    In computer graphics, depth peeling is an exact multipass method of order-independent transparency that extracts transparent fragments into depth layers and composites those layers in depth order. Depth peeling has the advantage of being able to generate correct results even for complex images containing intersecting transparent objects. == Method == Depth peeling works by rendering the image multiple times. Depth peeling uses two Z buffers, one that works conventionally, and one that is not modified, and sets the minimum distance at which a fragment can be drawn without being discarded. For each pass, the previous pass' conventional Z-buffer is used as the minimal Z-buffer, so each pass removes already-captured nearer fragments and draws the next depth layer behind them. The resulting images can then be composited in depth order to form a single image. A major drawback of classical depth peeling is performance: it requires one geometry pass per peeled layer, so scenes with high depth complexity require many passes that each re-rasterize the transparent geometry. Later variants reduce the number of passes by peeling multiple layers or both front and back layers in a pass. Dual depth peeling reduces the geometry-pass count from N to N/2+1 by peeling one layer from the front and one from the back in each pass, while multi-layer depth peeling peels several layers per pass and reported up to an 8x speed-up in RGBA8 settings.

    Read more →
  • Navigational database

    Navigational database

    A navigational database is a type of database in which records or objects are found primarily by following references from other objects. The term was popularized by the title of Charles Bachman's 1973 Turing Award paper, The Programmer as Navigator. This paper emphasized the fact that the new disk-based database systems allowed the programmer to choose arbitrary navigational routes following relationships from record to record, contrasting this with the constraints of earlier magnetic-tape and punched card systems where data access was strictly sequential. One of the earliest navigational databases was Integrated Data Store (IDS), which was developed by Bachman for General Electric in the 1960s. IDS became the basis for the CODASYL database model in 1969. Although Bachman described the concept of navigation in abstract terms, the idea of navigational access came to be associated strongly with the procedural design of the CODASYL Data Manipulation Language. Writing in 1982, for example, Tsichritzis and Lochovsky state that "The notion of currency is central to the concept of navigation." By the notion of currency, they refer to the idea that a program maintains (explicitly or implicitly) a current position in any sequence of records that it is processing, and that operations such as GET NEXT and GET PRIOR retrieve records relative to this current position, while also changing the current position to the record that is retrieved. Navigational database programming thus came to be seen as intrinsically procedural; and moreover to depend on the maintenance of an implicit set of global variables (currency indicators) holding the current state. As such, the approach was seen as diametrically opposed to the declarative programming style used by the relational model. The declarative nature of relational languages such as SQL offered better programmer productivity and a higher level of data independence (that is, the ability of programs to continue working as the database structure evolves.) Navigational interfaces, as a result, were gradually eclipsed during the 1980s by declarative query languages. During the 1990s it started becoming clear that for certain applications handling complex data (for example, spatial databases and engineering databases), the relational calculus had limitations. At that time, a reappraisal of the entire database market began, with several companies describing the new systems using the marketing term NoSQL. Many of these systems introduced data manipulation languages which, while far removed from the CODASYL DML with its currency indicators, could be understood as implementing Bachman's "navigational" vision. Some of these languages are procedural; others (such as XPath) are entirely declarative. Offshoots of the navigational concept, such as the graph database, found new uses in modern transaction processing workloads. == Description == Navigational access is traditionally associated with the network model and hierarchical model of database, and conventionally describes data manipulation APIs in which records (or objects) are processed one at a time, iteratively. The essential characteristic as described by Bachman, however, is finding records by virtue of their relationship to other records: so an interface can still be navigational if it has set-oriented features. From this viewpoint, the key difference between navigational data manipulation languages and relational languages is the use of explicit named relationships rather than value-based joins: for department with name="Sales", find all employees in set department-employees versus find employees, departments where employee.department-code = department.code and department.name="Sales". In practice, however, most navigational APIs have been procedural: the above query would be executed using procedural logic along the lines of the following pseudo-code: On this viewpoint, the key difference between navigational APIs and the relational model (implemented in relational databases) is that relational APIs use "declarative" or logic programming techniques that ask the system what to fetch, while navigational APIs instruct the system in a sequence of steps how to reach the required records. Most criticisms of navigational APIs fall into one of two categories: Usability: application code quickly becomes unreadable and difficult to debug Data independence: application code needs to change whenever the data structure changes For many years the primary defence of navigational APIs was performance. Database systems that support navigational APIs often use internal storage structures that contain physical links or pointers from one record to another. While such structures may allow very efficient navigation, they have disadvantages because it becomes difficult to reorganize the physical placement of data. It is quite possible to implement navigational APIs without low-level pointer chasing (Bachman's paper envisaged logical relationships being implemented just as in relational systems, using primary keys and foreign keys), so the two ideas should not be conflated. But without the performance benefits of low-level pointers, navigational APIs become harder to justify. Hierarchical models often construct primary keys for records by concatenating the keys that appear at each level in the hierarchy. Such composite identifiers are found in computer file names (/usr/david/docs/index.txt), in URIs, in the Dewey decimal system, and for that matter in postal addresses. Such a composite key can be considered as representing a navigational path to a record; but equally, it can be considered as a simple primary key allowing associative access. As relational systems came to prominence in the 1980s, navigational APIs (and in particular, procedural APIs) were criticized and fell out of favour. The 1990s, however, brought a new wave of object-oriented databases that often provided both declarative and procedural interfaces. One explanation for this is that they were often used to represent graph-structured information (for example spatial data and engineering data) where access is inherently recursive: the mathematics originally underpinning SQL (specifically, first-order predicate calculus) does not have sufficient power to support recursive queries, even those as simple as a transitive closure. More recent SQL implementations do support hierarchical and recursive queries. A current example of a popular navigational API can be found in the Document Object Model (DOM) often used in web browsers and closely associated with JavaScript. The DOM is essentially an in-memory hierarchical database with an API that is both procedural and navigational. By contrast, the same data (XML or HTML) can be accessed using XPath, which can be categorized as declarative and navigational: data is accessed by following relationships, but the calling program does not issue a sequence of instructions to be followed in order. Languages such as SPARQL used to retrieve Linked Data from the Semantic Web are also simultaneously declarative and navigational. == Examples == IBM Information Management System IDMS

    Read more →
  • Subject (documents)

    Subject (documents)

    In library and information science documents (such as books, articles and pictures) are classified and searched by subject – as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in this field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this and in general there is not always consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need to have a deeper understanding of what a subject is. The question: "what is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years (see below) == Theoretical view == === Charles Ammi Cutter (1837–1903) === For Cutter the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred [...] to those intellections [...] that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60) and: the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge (Miksa, 1983a, p. 61). Bernd Frohmann adds: "The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts. Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification (DDC), by contrast, severs those connections. Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a poorly semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic... .... The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation". (Frohmann, 1994, 112–113). Cutter's early view on what a subject is, is probably wiser than most understandings that dominated the 20th century – and also the understanding reflected in the ISO-standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. When that is said, it should be added that they are not particularly detailed or clear. We only get a vague idea of the social nature of subjects. === S. R. Ranganathan (1892–1972) === A classification system with an explicit theoretical foundation is Ranganathan's Colon Classification. Ranganathan provided an explicit definition of the concept of "subject": Subject – an organized body of ideas, whose extension and intension are likely to fall coherently within the field of interests and comfortably within the intellectual competence and the field of inevitable specialization of a normal person. A related definition is given by one of Ranganathan's students: A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several... Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The colon system is based on the combination of single elements from facets to subject designation. This is the reason why the combined nature of subjects are emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject (but is alternatively termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is formulated in hard words (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...". It seems unacceptable that Ranganathan defines the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether or not subjects are combined or not should be examined once their definition has been given, it should not determined a priori, in the definition. Besides the emphasis on the combined, organizing and systematizing nature of subjects contains Ranganathan's definition of subject the pragmatic demand, that a subject should be determined in a way that suits a normal person's competency or specialization. Again we see a strange kind of wishful thinking mixing a general understanding of a concept with demands put by his own specific system. One thing is what the word subject means, quite another issue is how to provide subject descriptions that fulfill demands such as the specificity of a given information retrieval language which fulfill demands put on the system, such as precision and recall. If researchers too often define terms in ways that favor specific kinds of systems, that are such definitions not useful to provide more general theories about subjects, subject analysis and IR. Among other things are comparative studies of different kinds of systems made difficult. Based on these arguments, as well as additional arguments which have been used in the literature, we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO-standard for topic maps, may Ranganathan's definition be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For such purpose is another understanding of "subject" necessary. === Patrick Wilson (1927–2003) === In his book Wilson (1968) examined – in particular by thought experiments – the suitability of different methods of examining the subject of a document. The methods were: identifying the author's purpose for writing the document, weighing the relative dominance and subordination of different elements in the picture, which the reading imposes on the reader, grouping or count the document's use of concepts and references, construing a set of rules for selecting elements deemed necessary (as opposed to unnecessary) for the work as a whole. Patrick Wilson shows convincingly that each of these methods are insufficient to determine the subject of a document and is led to conclude ( p. 89): "The notion of the subject of a writing is indeterminate..." or, on p. 92 (about what users may expect to find using a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection to the last quote has Wilson an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. Based on this argumentation is Wilson led to conclude: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness". Wilson's concept of subject was discussed by Hjørland (1992) who found that it is problematic to give up the precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position which Hjørland found unacceptable and unnecessary. Concerning the authors' use of ambiguous terms, the role of the subject analysis is to determine which documents would be fruitful for users to identify whether or not the documents use one or another term or whether a given term i

    Read more →