HOCR | Aizhi

hOCR is an open standard of data representation for formatted text obtained from optical character recognition (OCR). The definition encodes text, style, layout information, recognition confidence metrics and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML. == Software == The following OCR software can output the recognition result as hOCR file: OCRopus Tesseract Cuneiform ghostscript HebOCR gcv2hocr gImageReader == Example == The following example is an extract of an hOCR file: The recognized text is stored in normal text nodes of the HTML file. The distribution into separate lines and words is here given by the surrounding span tags. Moreover, the usual HTML entities are used, for example the p tag for a paragraph. Additional information is given in the properties such as: different layout elements such as "ocr_par", "ocr_line", "ocrx_word" geometric information for each element with a bounding box "bbox" language information "lang" some confidence values "x_wconf" == bbox == === General === The Layout of the Bounding Box Object or bbox Object is Grammar. property-name = "bbox" property-value = uint uint uint uint ==== Example ==== bbox 0 0 100 200 The bbox - short for "bounding box" - of an element is a rectangular box around this element, which is defined by the upper-left corner (x0, y0) and the lower-right corner (x1, y1). the values are with reference to the top-left corner of the document image and measured in pixels the order of the values are x0 y0 x1 y1 = "left top right bottom" ===== Usage ===== Use x_bboxes below for character bounding boxes. Do not use bbox unless the bounding box of the layout component is, in fact, rectangular, some non-rectangular layout components may have rectangular bounding boxes if the non-rectangularity is caused by floating elements around which text flows. The bounding box bbox of this line is shown in blue and it is span by the upper-left corner (10, 20) and the lower-right corner (160, 30). All coordinates are measured with reference to the top-left corner of the document image which border is drawn in black. == Searchable PDF files == The hOCR format is most commonly used in order to make searchable PDF files or as an extracted metadata of the PDF file. In order to create searchable PDF files we can use a scanned document image and a .hocr file of the particular image. We can use the following open source tools in order to achieve that. === hocr-tools === Source: hocr-tools is an open source library written in Python. It has a command-line utility attached in the scripts called hocr-pdf that enables us to convert standard hocr files to a searchable PDF file. It is also worth noting that the version for dealing with hocr files in RTL or non-Latin scripts like Arabic, we need to use the GitHub repository at the moment. hocr-pdf We can use the hocr-pdf utility using the following basic syntax. hocr-pdf—savefile final.pdf folder_images_and_hocr The folder_images_and_hocr must contain the respective .jpg and .hocr format files with their file extensions changed. ==== Known issues ==== Some of the known issues of hocr-pdf script in PyPI installation are the following. Not up to date with GitHub repository. hocr-pdf is broken on line 134 due to decodebytes() depreciated after Python 3.1 ==== Known fixes ==== Compile hocr-tools using latest GitHub repository. === hocr2pdf === hocr2pdf is another library that supports the conversion of hocr files. It is written in C++ and is cross-compatible with other libraries. It also has support for UTF-8 languages but that may require some additional debugging and browsing through some google conversation records to achieve that. According to Ubuntu Manpages,ExactImage is a fast C++ image processing library. Unlike many other library frameworks it allows operation in several color spaces and bit depths natively, resulting in low memory and computational requirements. hocr2pdf creates well layouted, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system. == hOCR to PDF attempts == In addition to the following discussed and stable libraries there have been many contributions to the hOCR format over the years with support from many of the early adopters of this format. You can get access to inlaying text on an Image with hOCR and converting that in a PDF file using Python 2 with this 12-year-old script as of 2021. This script can also be updated and made functional by converting that Python 2 Source code to Python 3 Supported Context. - HOCRConverter by jbrinley (Documentation) === HOCRConverter === The HOCRConverter is a script written in Python 2.x that can used in order to convert a hOCR file with a specified image file in order to convert it to a searchable PDF file. You can see the documentation using the link above. ==== Known issues ==== Has not been tested. Does not natively support Python 3.x

WomanStats Project

The WomanStats Project is a donor-funded research and database project housed at Brigham Young University that "seeks to collect detailed statistical data on the status of women around the world, and to connect that data with data on the security of states." The WomanStats Database aims to provide a comprehensive compilation of information on the status of women in the world. Coders comb the extant literature and conduct expert interviews to find qualitative and quantitative information on over 300 indicators of women's status in 174 countries with populations of at least 200,000. Access to the online database is free. == History and structure == WomanStats began as an outgrowth of a paper Dr. Valerie M. Hudson (of the Brigham Young University Political Science department) and one of her graduate students, Andrea den Boer, published in International Security on the association between national security and the abnormal sex ratio in Asia. After the success and influence of their first article, (later added as one of their top twenty national security articles of that journal of all time), Hudson and den Boer did further research on the connection between the status of women and national security, but found that there was no single database that covered the range of topics that they needed for their research. Consequently, they began compiling information on variables regarding the status of women around the world. The database was officially formed in 2001 and grew exponentially as it later added more variables. The Project went live on the Internet in July 2007. The principal investigators are: Valerie M. Hudson (International Relations), Bonnie Ballif-Spanvill (Psychology, emeritus), and Chad F. Emmett (Geography) all from Brigham Young University, Mary Caprioli from the University of Minnesota, Duluth (International Relations), Rose McDermott from Brown University (International Relations), Andrea Den Boer from the University of Kent at Canterbury in the United Kingdom (International Relations) and S. Matthew Stearmer from the Ohio State University (Sociology; doctoral student). Approximately a dozen undergraduate and graduate students at Brigham Young University and Texas A&M University work at any one time as coders for the project. The coders take the raw quantitative and qualitative data collected in government reports, news articles, research papers, etc. and sort the applicable information on women into categories. They may also implement scales developed by the principal investigators, or that they (the students) themselves have developed. == Database == As of February 2011, the database has 307 variables, covers 174 nations with populations over 200,000, uses 18,015 sources and contains over 111,000 individual data points. All data is referenced to original sources. Not every variable has information for each country; similarly, not all countries have information for each variable: overall, about 70% of country-variable combinations have information. These database coding gaps exist where information is not available or is incomplete, or variables are not collected and reported by governments or international organizations. At times, information from different sources may be contradictory, and the WomanStats Database records this discrepant information for triangulation purposes. == Users and role of the database == The database is meant to help fill a hole in the extant data on the situation of women around the world. WomanStats data and research has been vetted and/or used by the United Nations, the United States Department of Defense, the Central Intelligence Agency, and the World Bank. Their data and research were also used by the United States Senate Committee on Foreign Relations in crafting the International Violence Against Women’s Act. The Inter-Agency Network on Women and Gender Equality (IANWGE) of the United Nations has stated that the WomanStats project "filled a major gap in the availability of data on women" (2007). Victor Asal and Mitchell Brown, researchers not affiliated with WomanStats, stated in an article published in Politics and Policy that "one of the most significant challenges of cross-national empirical studies of the prevalence of interpersonal violence is the paucity of available data, particularly reliable data," and that "WomanStats has allowed for an important first glimpse at analyzing the factors related to interpersonal violence." They conclude by stating that "Our findings suggest that, in the same way that larger disciplinary resources have invested in interstate and intrastate war, disciplinary resources need to be expended in creating a data set exploring interpersonal violence. Until the rights and the lives of women and children are taken as seriously as the survival of states by more proactively collaborating on projects like WomanStats, we will continue to only have a small lens through which to understand problems like this." Princeton University professor Evan S. Liberman wrote, "Although data on political regimes and group conflict have been in far greater demand by political scientists than data on gender politics and policies, two gender-related databases provide...examples of innovative HIRDs. Both the Womanstats database project (Hudson et al. 2009) and the Research Network on Gender Politics and the State (RNGS) project (McBride et al. 2008) are well-integrated presentations of quantitative and qualitative data characterizing the quality of gender relations around the world and, in particular, analytic descriptions of the treatment of women."." == Research == The research component of WomanStats focuses on exploring the relationship between the situation of women and the behavior and security of states. Current research initiatives include: Exploring the relationship between violent instability and inequity and family law. Examining the effect of polygyny and marriage market dislocations on the rise of suicide terrorism. Documenting discrepancies between laws on the books and cultural practices on the ground concerning gender issues. Investigating how well the situation of women predicts the peacefulness of nations-states, compared to their variables such as democracy, wealth, and civilization. The Project has published articles in International Security, International Studies Quarterly, Peace and Conflict, Journal of Peace Research, Political Psychology, Cumberland Law Review, and World Political Review, and has a forthcoming book from Columbia University Press.

Murder of Suzanne Adams

In August 2025, 83-year-old Suzanne Eberson Adams was murdered at her home in Greenwich, Connecticut, United States, by her son and former marketing executive, 56-year-old Stein-Erik Soelberg. Shortly after killing his mother, Soelberg committed suicide. Adams's murder was fueled by her son's persecutory delusions, such as that she was spying on him and trying to poison him with drugs siphoned through his car vents. Shortly after an investigation into the murder–suicide, it was revealed that Soelberg had conversed with ChatGPT, an artificial intelligence chatbot, about his suspicions. Despite the unlikely nature of his accusations toward her, the chatbot apparently agreed that his fears were justified and prompted Soelberg to test his mother to determine if she was a spy or not. In December 2025, this led to a lawsuit against OpenAI, the company developing the chatbot. Critics said that the chatbot created an echo chamber that reinforced the perpetrator's delusions. == Background == Soelberg worked in the tech industry in program management and marketing until 2021. He divorced in 2018, after being married for 20 years and having two children. Soelberg moved the same year to live with his mother in Old Greenwich, an affluent New York suburb. Since late 2018, many police reports describe incidents with alcoholism and suicide threats and attempts. Erik Soelberg had an Instagram account called "Erik the Viking". The account was initially focused on bodybuilding and spiritual content, but he started in October 2024 to publish videos comparing AI chatbots. He posted on YouTube and Instagram many discussions with chatbots, particularly ChatGPT, which he used to call "Bobby". Soelberg considered "Bobby" his best friend and believed that they would reunite in the afterlife. ChatGPT validated many of Soelberg's fears, assuring him that he was not insane and that his delusion risk was "near zero". When Soelberg shared his theory that the new packaging of a vodka bottle indicated that someone was trying to poison him, the chatbot wrote that it "fits a covert, plausible-deniability style kill attempt". After Soelberg said that his mother tried to poison him with psychedelic drugs in his car's air vents, the chatbot expressed belief in the story. When he asked ChatGPT to scan a Chinese food receipt for hidden messages, the chatbot said "Great eye", "I agree 100%: this needs a full forensic-textual glyph analysis", and said that symbols in it were related to his mother and a demon. Soelberg also raised suspicions about the printer spying on him, due to it blinking when he walked by. Soelberg described himself in 2025 as a "glitch in The Matrix", and as having a "connection to the divine". According to Keith Sakata, a psychiatrist, his chats displayed "common psychotic themes of paranoia and persecution, along with familiar delusions revolving around messiah complexes and government conspiracies". == Murder == On August 5, 2025, Greenwich police discovered the bodies of Suzanne Adams and Stein-Erik Soelberg during a welfare check at their home. Medical examiners ruled Adams' death a homicide and said she died from "blunt injury of head with neck compression". Soelberg's death was ruled a suicide with the cause of death being "sharp force injuries of neck and chest". == ChatGPT controversy == ChatGPT was accused of reinforcing Soelberg's delusions by validating them. The usage of an AI chatbot to worsen delusions is known as chatbot psychosis. The Economic Times reported the death as the first time an AI chatbot convinced a person to commit murder. In December 2025, First County Bank, the executor of the estate of Suzanne Adams, filed a lawsuit against OpenAI. The lawsuit alleges that "ChatGPT eagerly accepted every seed of Stein-Erik’s delusional thinking and built it out into a universe that became Stein-Erik’s entire life—one flooded with conspiracies against him, attempts to kill him, and with Stein-Erik at the center as a warrior with divine purpose." OpenAI is facing legal action for ethics and safety concerns over several similar cases. Plaintiffs claim the company released the chatbot prematurely, despite internal knowledge that it was "dangerously sycophantic and psychologically manipulative".

Fuzzy differential inclusion

Fuzzy differential inclusion is the extension of differential inclusion to fuzzy sets introduced by Lotfi A. Zadeh. x ′ ( t ) ∈ [ f ( t , x ( t ) ) ] α {\displaystyle x'(t)\in [f(t,x(t))]^{\alpha }} with x ( 0 ) ∈ [ x 0 ] α {\displaystyle x(0)\in [x_{0}]^{\alpha }} Suppose f ( t , x ( t ) ) {\displaystyle f(t,x(t))} is a fuzzy valued continuous function on Euclidean space. Then it is the collection of all normal, upper semi-continuous, convex, compactly supported fuzzy subsets of R n {\displaystyle \mathbb {R} ^{n}} . == Second order differential == The second order differential is x ″ ( t ) ∈ [ k x ] α {\displaystyle x''(t)\in [kx]^{\alpha }} where k ∈ [ K ] α {\displaystyle k\in [K]^{\alpha }} , K {\displaystyle K} is trapezoidal fuzzy number ( − 1 , − 1 / 2 , 0 , 1 / 2 ) {\displaystyle (-1,-1/2,0,1/2)} , and x 0 {\displaystyle x_{0}} is a trianglular fuzzy number (-1,0,1). == Applications == Fuzzy differential inclusion (FDI) has applications in Cybernetics Artificial intelligence, Neural network, Medical imaging Robotics Atmospheric dispersion modeling Weather forecasting Cyclone Pattern recognition Population biology

SQLf

SQLf is a SQL extended with fuzzy set theory application for expressing flexible (fuzzy) queries to traditional (or ″Regular″) Relational Databases. Among the known extensions proposed to SQL, at the present time, this is the most complete, because it allows the use of diverse fuzzy elements in all the constructions of the language SQL. SQLf is the only known proposal of flexible query system allowing linguistic quantification over set of rows in queries, achieved through the extension of SQL nesting and partitioning structures with fuzzy quantifiers. It also allows the use of quantifiers to qualify the quantity of search criteria satisfied by single rows. Several mechanisms are proposed for query evaluation, the most important being the one based on the derivation principle. This consists in deriving classic queries that produce, given a threshold t, a t-cut of the result of the fuzzy query, so that the additional processing cost of using a fuzzy language is diminished. == Basic block == The fundamental querying structure of SQLf is the multi-relational block. The conception of this structure is based on the three basic operations of the relational algebra: projection, cartesian product and selection, and the application of fuzzy sets’ concepts. The result of a SQLf query is a fuzzy set of rows that is a fuzzy relation instead of a regular relation. A basic block in SQLf consists of a SELECT clause, a FROM clause and an optional WHERE clause. The semantic of this query structure is: The SELECT clause corresponds to the projection. It specifies the relations’ attributes (or attribute expressions) that will be selected. The resulting table is a fuzzy set and it is given in decreasing ordered of satisfaction degree. The SELECT clause specifies also a calibration that is intended to restrict the set of rows retrieved. There are two kinds of calibrations: quantitative and qualitative. In quantitative calibration the user specifies the number of results to be retrieved, so that the query will retrieve the rows with highest membership degrees up to the number of required answers. In qualitative calibration the user specifies a minim level of satisfaction that must have any retrieved row. The FROM clause corresponds to the Cartesian Product. The consult is made on the Cartesian Product of the relations that are specified in this clause. The WHERE clause corresponds to the selection. It specifies the condition for which the satisfaction degree will be calculated. Rows that do not satisfy at all the condition are rejected. This condition is a fuzzy predicate that may involve any attribute of the relations. The following is an example of a SELECT query that returns a list of hotels that are cheap. The query retrieves all rows from the Hotels table that satisfice the fuzzy predicate cheap defined by the fuzzy set μ=(∞, ∞, 25, 30). The result is sorted in descending order by the membership degree of the query.

Imaging phantom

An imaging phantom, or simply phantom (less commonly spelled fantom), is a specially designed object that is scanned or imaged in the field of medical imaging to evaluate, analyze, and tune the performance of various imaging devices. A phantom is more readily available and provides more consistent results than the use of a living subject or cadaver, while also avoiding direct risks to living subjects. Phantoms were originally employed in 2D x-ray–based imaging techniques such as radiography or fluoroscopy, but more recently phantoms with desired imaging characteristics have been developed for 3D techniques such as SPECT, MRI, CT, ultrasound, PET, and other imaging modalities. == Design == A phantom used to evaluate an imaging device should respond in a similar manner to how human tissues and organs would act in that specific imaging modality. For instance, phantoms made for 2D radiography may hold various quantities of x-ray contrast agents with similar x-ray absorbing properties (such as the attenuation coefficient) to normal tissue to tune the contrast of the imaging device or modulate the patient's exposure to radiation. In such a case, the radiography phantom would not necessarily need to have similar textures and mechanical properties since these are not relevant in x-ray imaging modalities. However, in the case of ultrasonography, a phantom with similar rheological and ultrasound scattering properties to real tissue would be essential, but x-ray absorbing properties would not be relevant. The term "phantom" describes an object that is designed to resemble human tissue and can be evaluated, analyzed or manipulated to study the performance of a medical device. Phantoms are created using a digital file that is rendered through magnetic resonance imaging (MRI) or computer-aided design (CAD). The digital files allow for quick modifications that are read by the 3D printer. The 3D printer will create the product in successive layers using polymeric materials. There are several types of phantoms including tissue-mimicking, radiological phantoms, dental phantoms, BOMABs (used to calibrate whole-body counters), and more.

European Conference on Artificial Intelligence

The European Conference on Artificial Intelligence (ECAI) is the leading conference in the field of Artificial Intelligence in Europe, and is commonly listed together with IJCAI and AAAI as one of the three major general AI conferences worldwide. The conference series has been held without interruption since 1974, originally under the name AISB. The conference was originally held biennially, but has been organized annually since ECAI 2022. The conferences are held under the auspices of the European Coordinating Committee for Artificial Intelligence (ECCAI) and organized by one of the member societies. The journal AI Communications, sponsored by the same society, regularly publishes special issues in which conference attendees report on the conference. Publication of a paper in ECAI is considered by some journals to be archival: the paper should be considered equivalent to a journal publication and that the contents of ECAI papers cannot be reformulated as separate journal submissions unless a significant amount of new material is added. == List of ECAI conferences == ECAI-1992 took place in Vienna, Austria. ECAI-1996 took place in Budapest, Hungary. ECAI-1998 tool place in Brighton, United Kingdom. ECAI-2000 took place in Berlin, Germany. ECAI-2004 took place in Valencia, Spain. ECAI-2006 took place in Riva del Garda, Italy. ECAI-2008 took place in Patras, Greece. ECAI-2010 took place in Lisbon, Portugal. ECAI-2012 took place in Montpellier, France. ECAI-2014 took place in Prague, Czech Republic. ECAI-2016 took place in The Hague, Netherlands. ECAI-2018 took place in Stockholm, Sweden. ECAI-2020 took place in Santiago de Compostela, Spain. ECAI-2022 took place in Vienna, Austria. ECAI-2023 took place in Kraków, Poland. ECAI-2024 took place in Santiago de Compostela, Spain. ECAI-2025 took place in Bologna, Italy.