Stochastic grammar

Stochastic grammar

A stochastic grammar (statistical grammar) is a grammar framework with a probabilistic notion of grammaticality: Stochastic context-free grammar Statistical parsing Data-oriented parsing Hidden Markov model (or stochastic regular grammar) Estimation theory The grammar is realized as a language model. Allowed sentences are stored in a database together with the frequency how common a sentence is. Statistical natural language processing uses stochastic, probabilistic and statistical methods, especially to resolve difficulties that arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. "A probabilistic model consists of a non-probabilistic model plus some numerical quantities; it is not true that probabilistic models are inherently simpler or less structural than non-probabilistic models." == Examples == A probabilistic method for rhyme detection is implemented by Hirjee & Brown in their study in 2013 to find internal and imperfect rhyme pairs in rap lyrics. The concept is adapted from a sequence alignment technique using BLOSUM (BLOcks SUbstitution Matrix). They were able to detect rhymes undetectable by non-probabilistic models.

Douglas Parkhill

Douglas F. Parkhill is a Canadian technologist and former research minister, best known for his pioneering work on what is now called cloud computing, and his work on Canada's Telidon videotex project. He started working at the Canadian ministry of Communications (now part of the Department of Trade and Industry) in 1969, having previously worked at the Mitre Corporation. He was responsible for many activities in communications satellites, computer communications, command and control systems and telecommunications. He was winner of the Treasury Board of Canada Secretariat's Outstanding Achievement award in 1982, the Conestoga shield for services to government and industry in computer communications research and development, the Touche Ross award for Telidon development. He was an author of several publications including the 1966 book, The Challenge of the Computer Utility. In the book, Parkhill thoroughly explored many of the modern-day characteristics of cloud computing (elastic provisioning through a utility service) as well as the comparison to the electricity industry and the use of public, private, government and community forms. The book won the McKinsey Foundation award for distinguished contributions to management literature. He worked with Dave Godfrey, the Canadian writer and novelist on a later book Gutenberg two about the social and political meaning of computer technology. He was in charge of research at the Federal Department of Communications at the time when the department was funding development of the Telidon videotext system, was heavily involved in promoting the system, and had overall control of the program. In a radio broadcast in 1980, he outlined some of the potential of the system, from financial information, to theatre reservations, with the ability to pay and print out tickets from the system. He later documented the history of the Telidon project, and the history of videotext in general. == Publications == The Challenge of the Computer Utility, Addison-Wesley, 1966, ISBN 0-201-05720-4 edited with Dave Godfrey, Gutenberg Two: The New Electronics and Social Change, Press Porcepic, 1979, ISBN 0-88878-191-1 The Beginning of a Beginning. Ottawa; Department of Communications, 1987. A history of the Telidon project.

Paola Velardi

Paola Velardi (born in Rome, April 26, 1955) is a full professor of computer science at Sapienza University in Rome, Italy. Her research encompasses Artificial Intelligence and specifically, natural language processing, machine learning business intelligence and semantic web. Velardi is one of the hundred female scientists included in the database "100esperte.it" (translated from Italian with "100 female experts"). This online, open database champions the recognition of top-rated female scientists in Science, Technology, Engineering and Mathematics (STEM) areas. Among her prestigious appointments and honors, her inclusion stands out —alongside 45 other international female scientists from the past, present, and future— in the Women in Science pavilion of UNESCO’s Virtual Science Museum. == Research == Paola Velardi's research activity has focused, since the early 1980s, on Artificial Intelligence, with a particular emphasis on natural language processing (NLP), Machine learning, and data mining. Her scientific contributions have evolved over time, following the sector's primary paradigms: Semantic Web and Ontologies: She is known for her pioneering work on semantic disambiguation and automated ontology learning, collaborating on the development of systems such as OntoLearn. Social Computing and Predictive Analysis: She has conducted research on extracting information from social media for epidemiological monitoring (syndromic surveillance) and for the identification of opinion leaders. In the educational field, she has developed machine learning models to predict the risk of student dropout. AI for Health and Elder Monitoring: She has coordinated projects to support frailty in the elderly, developing systems based on ambient intelligence and wearables to detect clinical and behavioral anomalies. She has also contributed to models for analyzing behavioral changes through dynamic clustering. Generative AI and Finance: More recently, her research has expanded into the use of generative AI and deep learning for finance, including benchmark studies on price trend prediction based on Limit Order Books (LOB) and the development of diffusion models for realistic market simulation (the TRADES project). According to Google Scholar bibliometrics updated until December 2025, Velardi's scientific publications have been cited more than 8100 times. Her h-index was 42. She has published more than 200 papers in international journals and conference proceedings. Some of her publications have been published in top rated journals such as Artificial Intelligence, Computational Linguistics, Knowledge-Based Systems, IEEE Transactions on Data and Knowledge Engineering , IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Computers, IEEE Transactions on Software Engineering , Data Mining and Knowledge Discovery, and Journal of Web Semantics. == Education and previous employments == Velardi graduated in electronic engineering from Sapienza University in 1978. From 1978 to 1983, she worked for the Ugo Bordoni Foundation, a research institution focusing on ICT and working under the supervision of the Italian Ministry of Economic Development. In 1983, she was a visiting scholar at Stanford University. During this period she became passionate about Artificial Intelligence, which will remain her area of research throughout her career. From 1984 to 1986, she came back to her natal city and worked as a researcher for IBM. From 1986 to 1996 she was an associate professor in the engineering faculty of Polytechnic University of the Marches (Ancona, Italy). Starting in November 1996, she taught in and did research for the Department of Computer Science at the Sapienza University. Velardi was the head of Bachelor and Master Programs in Computer Science at Sapienza University from 2010 to 2013 and from 2015 to 2016. == Current employment == Since November 2001, Velardi has been a full professor in the department of computer science ("Dipartimento di Informatica" in Italian) at Sapienza University in Rome, Italy. Since 2013, she has been the coordinator of the Distance Learning Degree in Computer Science at Sapienza University. As of today, Velardi is a Senior Associate at the Institute of Cognitive Sciences and Technologies (ISTC) of the CNR. == Recognition == Velardi is one of the hundred female scientists included in the database "100esperte.it" (translated from Italian with "100 female experts"). This database lists top Italian female STEM scientists. Six out of one hundred scientists in the 100esperte's database are computer scientists like Velardi. Velardi is in the list of the top Italian scientists. A top scientist appearing in the Top-Italian-Scientists database is a scientist whose h-index is greater than 30. In March 2017, she was given an IBM Faculty Award for her research on social recommender systems. In December 2018, Velardi was included in the list of the 50 most influential Italian women in science and technology by Inspiring Fifty, a non-profit that aims to increase diversity in STEM by making female role models in tech more visible. In September 2019 she was the local co-organizer and Program Chair of the 6th ACM Celebration of Women in Computing. In November 2019 Velardi received the Standout Woman Award International at the seat of the Italian Parliament in Montecitorio. == Causes == Velardi aims at debunking the myth of computer science as a man-oriented and "inflexible" discipline. She is the founder of the project "NERD? Non e' roba per donne?" (translated from Italian: "NERD? Is it not stuff for women?"). This project was launched by Velardi in 2012 in the Department of Computer Science at Sapienza University. Since 2013 the project has been carried out in partnership with IBM Italy, which later created a spin-off of the project. The goal of the project is two-fold: (1) conveying computer science as creative, interdisciplinary and problem-solving-oriented science, and (2) encouraging young female students in studying computer science by, for instance, developing apps for smartphones. She has been the program chair of the 19th ACM celebration of Women in Computing. She is the creator and coordinator of the G4GRETA, an educational project that involves students of the third and fourth grades of Rome and Lazio. The project combines the development of IT skills with the themes of environmental sustainability and soft skills (teambuilding, pitching, social networking, etc.) Velardi is also involved in scientific dissemination. In 2020 and 2021 she cooperated with RaiCultura, the cultural division of RAI, the national broadcasting company.

HOCR

hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML. == Software == The following OCR software can output the recognition result as hOCR file: OCRopus Tesseract Cuneiform ghostscript HebOCR gcv2hocr gImageReader == Example == The following example is an extract of an hOCR file: The recognized text is stored in normal text nodes of the HTML file. The distribution into separate lines and words is here given by the surrounding span tags. Moreover, the usual HTML entities are used, for example the p tag for a paragraph. Additional information is given in the properties such as: different layout elements such as "ocr_par", "ocr_line", "ocrx_word" geometric information for each element with a bounding box "bbox" language information "lang" some confidence values "x_wconf" == bbox == === General === The Layout of the Bounding Box Object or bbox Object is Grammar. property-name = "bbox" property-value = uint uint uint uint ==== Example ==== bbox 0 0 100 200 The bbox - short for "bounding box" - of an element is a rectangular box around this element, which is defined by the upper-left corner (x0, y0) and the lower-right corner (x1, y1). the values are with reference to the top-left corner of the document image and measured in pixels the order of the values are x0 y0 x1 y1 = "left top right bottom" ===== Usage ===== Use x_bboxes below for character bounding boxes. Do not use bbox unless the bounding box of the layout component is, in fact, rectangular, some non-rectangular layout components may have rectangular bounding boxes if the non-rectangularity is caused by floating elements around which text flows. The bounding box bbox of this line is shown in blue and it is span by the upper-left corner (10, 20) and the lower-right corner (160, 30). All coordinates are measured with reference to the top-left corner of the document image which border is drawn in black. == Searchable PDF files == The hOCR format is most commonly used in order to make searchable PDF files or as an extracted metadata of the PDF file. In order to create searchable PDF files we can use a scanned document image and a .hocr file of the particular image. We can use the following open source tools in order to achieve that. === hocr-tools === Source: hocr-tools is an open source library written in Python. It has a command-line utility attached in the scripts called hocr-pdf that enables us to convert standard hocr files to a searchable PDF file. It is also worth noting that the version for dealing with hocr files in RTL or non-Latin scripts like Arabic, we need to use the GitHub repository at the moment. hocr-pdf We can use the hocr-pdf utility using the following basic syntax. hocr-pdf—savefile final.pdf folder_images_and_hocr The folder_images_and_hocr must contain the respective .jpg and .hocr format files with their file extensions changed. ==== Known issues ==== Some of the known issues of hocr-pdf script in PyPI installation are the following. Not up to date with GitHub repository. hocr-pdf is broken on line 134 due to decodebytes() depreciated after Python 3.1 ==== Known fixes ==== Compile hocr-tools using latest GitHub repository. === hocr2pdf === hocr2pdf is another library that supports the conversion of hocr files. It is written in C++ and is cross-compatible with other libraries. It also has support for UTF-8 languages but that may require some additional debugging and browsing through some google conversation records to achieve that. According to Ubuntu Manpages,ExactImage is a fast C++ image processing library. Unlike many other library frameworks it allows operation in several color spaces and bit depths natively, resulting in low memory and computational requirements. hocr2pdf creates well layouted, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system. == hOCR to PDF attempts == In addition to the following discussed and stable libraries there have been many contributions to the hOCR format over the years with support from many of the early adopters of this format. You can get access to inlaying text on an Image with hOCR and converting that in a PDF file using Python 2 with this 12-year-old script as of 2021. This script can also be updated and made functional by converting that Python 2 Source code to Python 3 Supported Context. - HOCRConverter by jbrinley (Documentation) === HOCRConverter === The HOCRConverter is a script written in Python 2.x that can used in order to convert a hOCR file with a specified image file in order to convert it to a searchable PDF file. You can see the documentation using the link above. ==== Known issues ==== Has not been tested. Does not natively support Python 3.x

AI Humanizers Reviews: What Actually Works in 2026

Curious about the best AI humanizer? An AI humanizer is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI humanizer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

Microapp

A microapp is a super-specialized application designed to perform one task or use case with the only objective of doing it well. They follow the single responsibility principle, which states that "a class should have one and only one reason to change." Micro applications help developers create less complex applications while reducing costs by breaking down monolithic systems into groups of independent services acting as one system. A good example of Microapps would be https://docs.citrix.com/en-us/legacy-archive/downloads/microapps.pdfthat provide single purpose action from Salesforce and over 40 applications on its workspace. == Requirements and characteristics == Microapps usually are accessible on any device, display, or operating system without installation on the viewer's device. To qualify as a microapp, the entity must: be built and deployed as an independent software module bring together various media types into a single experience have advanced security and compliance features be functionally-extensible comply with granular data demands be agnostic single use case oriented Microapps differentiate from traditional web or mobile applications by how the end-user interacts with them. Consequently, they can be embedded in websites or viewed online to bypass app stores and are typically built to provide a focused experience to the user. == Usage == Microapps are typically used for commercial purposes to reduce development costs for projects not requiring the large scope of a traditional web or mobile application. In addition, they are often used to showcase in-depth information or enrich marketing material with interactivity. Lately, micro apps are being used to boost productivity by providing quick tools to people to reuse best practices. Users have been interacting with microapps for a while with suites like Microsoft 365 and Google Workspace, where each one of their end-user services could be considered as a microapp. All these microapps share a unique identity manager to provide a unified user experience. == Benefits == Replacing monolith systems with microapps provide several advantages like: Reduce complexity for developers and users. Smaller, more cohesive, and maintainable codebases Scalable organizations with decoupled, autonomous teams Allows for hyper-specialization Independent deployment Multi-stack == Cloud-native microapps == Technologies like Kubernetes, or OpenShift, allow companies to replace their monolith and legacy systems with modular software taking advantage of microapps on reducing costs and improve reliability and security. == Microapps vs. microservices == There is a widespread misunderstanding between these two concepts, which is the key difference. Microservices is an architectural style that is systems-centric, meaning it decouples the presentation and data layer using web services APIs. On the other side, micro apps behave more as a super-architecture style (that embraces microservices among other types), and it is user-centric, meaning they decouple the whole monolith system onto modules that are designed to interact with final users. Both architectural styles rely on modularity to provide high performance, scalability, and resilience. == Considerations == Developing Micro apps requires a different approach than traditional software, and user experience is crucial. The following considerations are essential for switching to microapps. To run multiple microapps is required a single identity management system. Microservices are well suited to make microapps more powerful Apps with different levels of maturity might create a non-unified user experience. Duplication of dependencies can create security issues and inefficiencies. Suitable for well-organized teams

Google matrix

A Google matrix is a particular stochastic matrix that is used by Google's PageRank algorithm. The matrix represents a graph with edges representing links between pages. The PageRank of each page can then be generated iteratively from the Google matrix using the power method. However, in order for the power method to converge, the matrix must be stochastic, irreducible and aperiodic. == Adjacency matrix A and Markov matrix S == In order to generate the Google matrix G, we must first generate an adjacency matrix A which represents the relations between pages or nodes. Assuming there are N pages, we can fill out A by doing the following: A matrix element A i , j {\displaystyle A_{i,j}} is filled with 1 if node j {\displaystyle j} has a link to node i {\displaystyle i} , and 0 otherwise; this is the adjacency matrix of links. A related matrix S corresponding to the transitions in a Markov chain of given network is constructed from A by dividing the elements of column "j" by a number of k j = Σ i = 1 N A i , j {\displaystyle k_{j}=\Sigma _{i=1}^{N}A_{i,j}} where k j {\displaystyle k_{j}} is the total number of outgoing links from node j to all other nodes. The columns having zero matrix elements, corresponding to dangling nodes, are replaced by a constant value 1/N. Such a procedure adds a link from every sink, dangling state a {\displaystyle a} to every other node. Now by the construction the sum of all elements in any column of matrix S is equal to unity. In this way the matrix S is mathematically well defined and it belongs to the class of Markov chains and the class of Perron-Frobenius operators. That makes S suitable for the PageRank algorithm. == Construction of Google matrix G == Then the final Google matrix G can be expressed via S as: G i j = α S i j + ( 1 − α ) 1 N ( 1 ) {\displaystyle G_{ij}=\alpha S_{ij}+(1-\alpha ){\frac {1}{N}}\;\;\;\;\;\;\;\;\;\;\;(1)} By the construction the sum of all non-negative elements inside each matrix column is equal to unity. The numerical coefficient α {\displaystyle \alpha } is known as a damping factor. Usually S is a sparse matrix and for modern directed networks it has only about ten nonzero elements in a line or column, thus only about 10N multiplications are needed to multiply a vector by matrix G. == Examples of Google matrix == An example of the matrix S {\displaystyle S} construction via Eq.(1) within a simple network is given in the article CheiRank. For the actual matrix, Google uses a damping factor α {\displaystyle \alpha } around 0.85. The term ( 1 − α ) {\displaystyle (1-\alpha )} gives a surfer probability to jump randomly on any page. The matrix G {\displaystyle G} belongs to the class of Perron-Frobenius operators of Markov chains. The examples of Google matrix structure are shown in Fig.1 for Wikipedia articles hyperlink network in 2009 at small scale and in Fig.2 for University of Cambridge network in 2006 at large scale. == Spectrum and eigenstates of G matrix == For 0 < α < 1 {\displaystyle 0<\alpha <1} there is only one maximal eigenvalue λ = 1 {\displaystyle \lambda =1} with the corresponding right eigenvector which has non-negative elements P i {\displaystyle P_{i}} which can be viewed as stationary probability distribution. These probabilities ordered by their decreasing values give the PageRank vector P i {\displaystyle P_{i}} with the PageRank K i {\displaystyle K_{i}} used by Google search to rank webpages. Usually one has for the World Wide Web that P ∝ 1 / K β {\displaystyle P\propto 1/K^{\beta }} with β ≈ 0.9 {\displaystyle \beta \approx 0.9} . The number of nodes with a given PageRank value scales as N P ∝ 1 / P ν {\displaystyle N_{P}\propto 1/P^{\nu }} with the exponent ν = 1 + 1 / β ≈ 2.1 {\displaystyle \nu =1+1/\beta \approx 2.1} . The left eigenvector at λ = 1 {\displaystyle \lambda =1} has constant matrix elements. With 0 < α {\displaystyle 0<\alpha } all eigenvalues move as λ i → α λ i {\displaystyle \lambda _{i}\rightarrow \alpha \lambda _{i}} except the maximal eigenvalue λ = 1 {\displaystyle \lambda =1} , which remains unchanged. The PageRank vector varies with α {\displaystyle \alpha } but other eigenvectors with λ i < 1 {\displaystyle \lambda _{i}<1} remain unchanged due to their orthogonality to the constant left vector at λ = 1 {\displaystyle \lambda =1} . The gap between λ = 1 {\displaystyle \lambda =1} and other eigenvalue being 1 − α ≈ 0.15 {\displaystyle 1-\alpha \approx 0.15} gives a rapid convergence of a random initial vector to the PageRank approximately after 50 multiplications on G {\displaystyle G} matrix. At α = 1 {\displaystyle \alpha =1} the matrix G {\displaystyle G} has generally many degenerate eigenvalues λ = 1 {\displaystyle \lambda =1} (see e.g. [6]). Examples of the eigenvalue spectrum of the Google matrix of various directed networks is shown in Fig.3 from and Fig.4 from. The Google matrix can be also constructed for the Ulam networks generated by the Ulam method [8] for dynamical maps. The spectral properties of such matrices are discussed in [9,10,11,12,13,15]. In a number of cases the spectrum is described by the fractal Weyl law [10,12]. The Google matrix can be constructed also for other directed networks, e.g. for the procedure call network of the Linux Kernel software introduced in [15]. In this case the spectrum of λ {\displaystyle \lambda } is described by the fractal Weyl law with the fractal dimension d ≈ 1.3 {\displaystyle d\approx 1.3} (see Fig.5 from ). Numerical analysis shows that the eigenstates of matrix G {\displaystyle G} are localized (see Fig.6 from ). Arnoldi iteration method allows to compute many eigenvalues and eigenvectors for matrices of rather large size [13]. Other examples of G {\displaystyle G} matrix include the Google matrix of brain [17] and business process management [18], see also. Applications of Google matrix analysis to DNA sequences is described in [20]. Such a Google matrix approach allows also to analyze entanglement of cultures via ranking of multilingual Wikipedia articles abouts persons [21] == Historical notes == The Google matrix with damping factor was described by Sergey Brin and Larry Page in 1998 [22], see also articles on PageRank history [23], [24].