AI Assistant For Writing

AI Assistant For Writing — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apache Kudu

    Apache Kudu

    Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It completes Hadoop's storage layer to enable fast analytics on fast data. The open source project to build Apache Kudu began as an internal project at Cloudera. The first version, Apache Kudu 1.0, was released on 19 September 2016.

    == Comparison with other storage engines ==

    Kudu was designed and optimized for OLAP workloads. Like HBase, it is a real-time store that supports key-indexed record lookup and mutation. Kudu differs from HBase in that Kudu's data model is a more traditional relational model, while HBase is schemaless. Kudu's "on-disk representation is truly columnar and follows an entirely different storage design than HBase/Bigtable".
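
    To make the terms "column-oriented" and "key-indexed record lookup and mutation" concrete, here is a minimal Python sketch. It is a toy in-memory model, not Kudu's on-disk format or API; the table contents and the names `columns`, `key_index`, `lookup` and `update` are all invented for illustration.

```python
# Toy contrast between a row layout and a column layout (illustrative only).
rows = [
    {"id": 1, "host": "a", "latency_ms": 12},
    {"id": 2, "host": "b", "latency_ms": 30},
    {"id": 3, "host": "a", "latency_ms": 18},
]

# Column-oriented layout: one list per column, so an analytic scan
# touches only the columns it needs.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Hypothetical primary-key index: key value -> position in every column.
key_index = {key: pos for pos, key in enumerate(columns["id"])}


def mean_latency() -> float:
    """OLAP-style scan that reads a single column."""
    values = columns["latency_ms"]
    return sum(values) / len(values)


def lookup(key: int) -> dict:
    """Key-indexed record lookup: reassemble one row from the columns."""
    pos = key_index[key]
    return {name: values[pos] for name, values in columns.items()}


def update(key: int, column: str, value) -> None:
    """Key-indexed mutation of a single cell."""
    columns[column][key_index[key]] = value
```

    The scan in `mean_latency` never reads the `host` column, which is the property that makes columnar layouts attractive for analytics, while `lookup` and `update` show that individual records remain addressable by key.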

  • 2024 National Public Data breach

    2024 National Public Data breach

    In August 2024, three class-action lawsuits were filed against National Public Data, along with over 14 complaints filed in federal court, claiming that the company permitted hackers to steal sensitive private information covering millions of individuals. The theft was alleged to have occurred in April 2024. One of the lawsuits specifically claims that in April, a hacker going by the moniker "USDoD" posted a notice on the dark web, offering the data for sale at the price of US$3.5 million. The information stolen is alleged to include 2.9 billion records containing full names, current and past addresses, Social Security numbers, dates of birth, and telephone numbers. The stolen data contains records for people in the US, UK, and Canada.

    National Public Data confirmed on August 16, 2024, that a breach had occurred: an attacker had been trying to break into its systems since December 2023, and the breach itself took place from April 2024 over the following few months. The company also confirmed that 2.9 billion records were obtained, though it was still working to determine how many people were affected by the breach and was working with law enforcement to identify the hacker.

    == Jerico Pictures ==

    Jerico Pictures, Inc., doing business as National Public Data, was a data broker company that performed employee background checks. Its primary service was collecting information from public data sources, including criminal records, addresses, and employment history, and offering that information for sale. On October 2, 2024, Jerico Pictures filed for Chapter 11 bankruptcy, as it faced over a dozen lawsuits over the breach and was potentially liable "for credit monitoring for hundreds of millions of potentially impacted individuals." In December 2024, National Public Data shut down, showing a closure notice on its website.

  • List of security assessment tools

    List of security assessment tools

    This is a list of available software and hardware tools that are designed for or are particularly suited to various kinds of security assessment and security testing.

    == Operating systems and tool suites ==

    Several operating systems and tool suites provide bundles of tools useful for various types of security assessment.

    === Operating system distributions ===

    Kali Linux (formerly BackTrack), a penetration-test-focused Linux distribution based on Debian
    Pentoo, a penetration-test-focused Linux distribution based on Gentoo
    ParrotOS, a Linux distro focused on penetration testing, forensics, and online anonymity

    == Tools ==

  • Reflection (computer graphics)

    Reflection (computer graphics)

    Reflection in computer graphics is used to render reflective objects like mirrors and shiny surfaces. Accurate reflections are commonly computed using ray tracing, whereas approximate reflections can usually be computed faster by using simpler methods such as environment mapping. Reflections on shiny surfaces like wood or tile can add to the photorealistic effects of a 3D rendering.

    == Approaches to reflection rendering ==

    Many techniques exist for rendering environment reflections; they differ in precision and in computational and implementation complexity. Combinations of these techniques are also possible.

    Image order rendering algorithms based on tracing rays of light, such as ray tracing or path tracing, typically compute accurate reflections on general surfaces, including multiple reflections and self-reflections. However, these algorithms are generally still too computationally expensive for real-time rendering (even though specialized hardware exists, such as Nvidia RTX) and require a different rendering approach from the rasterization that is typically used.

    Reflections on planar surfaces, such as planar mirrors or water surfaces, can be computed simply and accurately in real time with two-pass rendering: one pass for the viewer and one for the view in the mirror, usually with the help of a stencil buffer. Some older video games achieved this effect with one-pass rendering by putting the whole mirrored scene behind a transparent plane representing the mirror.

    Reflections on non-planar (curved) surfaces are more challenging for real-time rendering. The main approaches include:

    Environment mapping (e.g. cube mapping): a technique widely used, e.g. in video games, that offers a reflection approximation mostly sufficient to the eye but lacks self-reflections and requires pre-rendering of the environment map. The precision can be increased by using a spatial array of environment maps instead of just one. It is also possible to generate cube map reflections in real time, at the cost of memory and computational requirements.

    Screen space reflections (SSR): a more expensive technique that traces reflection rays using per-pixel data. It requires the surface normal and either a depth buffer (local space) or a position buffer (world space). The disadvantage is that objects not captured in the rendered frame cannot appear in the reflections, which results in unresolved or false intersections and causes artefacts such as vanishing reflections and virtual images. SSR was originally introduced as Real Time Local Reflections in CryENGINE 3.

    == Types of reflection ==

    Polished - A polished reflection is an undisturbed reflection, like a mirror or chrome surface.
    Blurry - A blurry reflection means that tiny random bumps, or microfacets, on the surface of the material cause the reflection to be blurry.
    Metallic - A reflection is metallic if the highlights and reflections retain the color of the reflective object.
    Glossy - This term can be misused: sometimes it is a setting that is the opposite of blurry (e.g. when "glossiness" has a low value, the reflection is blurry). Sometimes the term is used as a synonym for "blurred reflection". Glossy used in this context means that the reflection is actually blurred.

    === Polished or mirror reflection ===

    Mirrors are usually almost 100% reflective.

    === Metallic reflection ===

    Normal (nonmetallic) objects reflect light and colors in the original color of the object being reflected. Metallic objects reflect lights and colors altered by the color of the metallic object itself.

    === Blurry reflection ===

    Many materials are imperfect reflectors, where the reflections are blurred to various degrees due to surface roughness that scatters the rays of the reflections.

    === Glossy reflection ===

    A fully glossy reflection shows highlights from light sources but does not show a clear reflection from objects.

    == Examples of reflections ==

    === Wet floor reflections ===

    The wet floor effect is a graphic effects technique popular in conjunction with Web 2.0 style pages, particularly in logos. The effect can be done manually or created automatically with an installable auxiliary tool. Unlike a standard computer reflection (and the Java water effect popular in first-generation web graphics), the wet floor effect involves a gradient and often a slant in the reflection, so that the mirrored image appears to be hovering over or resting on a wet floor.
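
    Both ray tracing and environment mapping, as described under the approaches above, rest on the same basic step: mirroring the incoming view direction about the surface normal. The following Python sketch shows that step and a simple cube-map face selection. The function names `reflect` and `cube_face` are illustrative assumptions, and the face-selection rule (largest-magnitude axis) is the usual convention rather than something stated in the text.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def reflect(d: Vec3, n: Vec3) -> Vec3:
    """Mirror incident direction d about unit normal n: r = d - 2 (d . n) n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2 * dot * ni for di, ni in zip(d, n))


def cube_face(r: Vec3) -> str:
    """Choose the cube-map face a direction points at (largest-magnitude axis)."""
    axis = max(range(3), key=lambda i: abs(r[i]))
    sign = "+" if r[axis] >= 0 else "-"
    return sign + "xyz"[axis]


# A ray travelling straight down onto a floor whose normal points up
# bounces straight back up and samples the +y face of the environment map.
bounced = reflect((0, -1, 0), (0, 1, 0))
face = cube_face(bounced)
```

    In an environment-mapped renderer the reflected direction is only used to look up a pre-rendered image, which is why the technique is fast but cannot show self-reflections.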

  • Indic computing

    Indic computing

    Indic Computing means "computing in Indic", i.e., in Indian scripts and languages. It involves developing software in Indic scripts and languages, input methods, localization of computer applications, web development, database management, spell checkers, speech-to-text and text-to-speech applications, and OCR in Indian languages. Unicode standard version 15.0 specifies codes for 9 Indic scripts in Chapter 12, titled "South and Central Asia-I, Official Scripts of India". The 9 scripts are Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu. Many Indic Computing projects are under way. They involve some government sector companies, some volunteer groups and individuals.

    == Government sector ==

    The Indian Union Government made it mandatory for mobile phone companies whose handsets are manufactured, stored, sold and distributed in India to support displaying and typing text using fonts for all 22 languages. This move has led to a rise in the use of Indian languages by millions of users.

    === TDIL ===

    The Department of Electronics and Information Technology, India initiated TDIL (Technology Development for Indian Languages) with the objective of developing information processing tools and techniques to facilitate human-machine interaction without a language barrier; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services. In 2005, it started distributing language software tools developed by government, academic and private organisations on CD for non-commercial use. Some of the outcomes of the TDIL programme have been deployed on the Indian Language Technology Proliferation & Deployment Centre. This Centre disseminates all the linguistic resources, tools and applications that have been developed under TDIL funding. The programme expanded rapidly under the leadership of Dr. Swaran Lata, who also gave it an international footprint. She has since retired.

    === C-DAC ===

    C-DAC is an India-based government software company involved in developing language-related software. It is best known for developing the InScript keyboard, the standard keyboard for Indian languages. It has also developed many Indic language solutions, including word processors, typing tools, text-to-speech software and OCR in Indian languages.

    ==== BharateeyaOO.org ====

    The work developed out of CDAC, Bangalore (earlier known as NCST, Bangalore) became BharateeyaOO. OpenOffice 2.1 had support for over 10 Indian languages.

    ==== BOSS ====

    BOSS Linux was developed by the Centre for Development of Advanced Computing (CDAC) to promote the use of open-source software in India.

    == NGO and Volunteer groups ==

    === Indlinux ===

    The Indlinux organisation helped organise the individual volunteers working on different Indic language versions of Linux and its applications.

    === Sarovar ===

    Sarovar.org is India's first portal to host projects under free/open source licenses. It is located in Trivandrum, India and hosted at the Asianet data center. Sarovar.org is customised, installed and maintained by Linuxense as part of their community services and sponsored by River Valley Technologies. Sarovar.org is built on Debian Etch and GForge and runs off METTLE.

    === Pinaak ===

    Pinaak is a non-government charitable society devoted to Indic language computing. It works on software localization, developing language software, localizing open source software and enriching online encyclopedias. In addition, Pinaak educates people about computing, ethical use of the Internet and the use of Indian languages on the Internet.

    === Ankur Group ===

    Ankur Group is working toward supporting the Bengali language on the Linux operating system, including a localized Bengali GUI, a Live CD, an English-to-Bengali translator, Bengali OCR and a Bengali dictionary.
    === BhashaIndia ===

    === SMC ===

    SMC is a free software group working to bridge the language divide in Kerala on the technology front, and is today the biggest language computing community in India.

    == Input methods ==

    === Full size keyboards ===

    With the advent of Unicode, inputting Indic text on a computer has become very easy. A number of methods exist for this purpose, but the main ones are:

    ==== InScript ====

    InScript is the standard keyboard for Indian languages. It was developed by C-DAC and standardized by the Government of India. It now comes built into all major operating systems, including Microsoft Windows (2000, XP, Vista, 7), Linux and Macintosh.

    ==== Phonetic transliteration ====

    This is a typing method in which the user types text in an Indian language using Roman characters and it is phonetically converted to the equivalent text in an Indian script in real time. This type of conversion is done by phonetic text editors, word processors and software plugins. Building on the idea, one can use phonetic IME tools that allow Indic text to be input in any application. Some examples of phonetic transliterators are Xlit, Google Indic Transliteration, BarahaIME, Indic IME, Rupantar, SMC's Indic Keyboard and Microsoft Indic Language Input Tool. SMC's Indic Keyboard has support for as many as 23 languages, whereas Google Indic Keyboard only supports 11 Indian languages.

    They can be broadly classified as:

    Fixed transliteration scheme based tools: they work using a fixed transliteration scheme to convert text. Some examples are Indic IME, Rupantar and BarahaIME.
    Intelligent/learning based transliteration tools: they compare the word with a dictionary and then convert it to the equivalent word in the target language. Some of the popular ones are Google Indic Transliteration, Xlit, Microsoft Indic Language Input Tool and QuillPad.
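
    As a rough illustration of the fixed-scheme category described above, the following Python sketch converts Roman input to Devanagari with a small hard-coded table. The mapping is a hypothetical toy scheme invented for this example (it is not the scheme used by Indic IME, Rupantar, BarahaIME or any other named tool) and covers only a handful of letters; the code points themselves are standard Unicode Devanagari.

```python
VIRAMA = "\u094d"  # sign that suppresses a consonant's inherent vowel

# Hypothetical toy scheme: Roman letter -> Devanagari consonant.
CONSONANTS = {
    "k": "\u0915",  # ka
    "t": "\u0924",  # ta
    "n": "\u0928",  # na
    "m": "\u092e",  # ma
    "r": "\u0930",  # ra
    "s": "\u0938",  # sa
}

# Roman vowel -> (independent letter, dependent sign written after a consonant).
VOWELS = {
    "a": ("\u0905", ""),        # inherent vowel: no sign needed
    "i": ("\u0907", "\u093f"),
    "e": ("\u090f", "\u0947"),
}


def transliterate(text: str) -> str:
    """Convert Roman text to Devanagari using the fixed toy scheme above."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWELS:
                out.append(VOWELS[nxt][1])  # attach the dependent vowel sign
                i += 1                      # the vowel has been consumed
            else:
                out.append(VIRAMA)          # no vowel follows: kill inherent 'a'
        elif ch in VOWELS:
            out.append(VOWELS[ch][0])       # vowel not preceded by a consonant
        else:
            out.append(ch)                  # pass through anything unmapped
        i += 1
    return "".join(out)
```

    With this table, `transliterate("namaste")` yields the six code points na, ma, sa, virama, ta and the e-sign. A dictionary-based tool would instead pick the most likely word for ambiguous input, which a fixed table cannot do.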
    ==== Remington (typewriter) ====

    This layout was developed before computers had been invented or deployed with Indic languages, when typewriters were the only means to type text in Indic scripts. Since typewriters were mechanical and could not include a script processor engine, each character had to be placed on the keyboard separately, which resulted in a very complex and difficult-to-learn keyboard layout. With the advent of Unicode, the Remington layout was added to various typing tools for the sake of backward compatibility, so that old typists did not have to learn a new keyboard layout. Nowadays this layout is used only by typists who are accustomed to it after several years of use. One tool that includes the Remington layout is Indic IME. A font that is based on the Remington keyboard layout is Kruti Dev. Another online tool that very closely supports the old Remington keyboard layout using Kruti Dev is the Remington Typing tool.

    === Braille ===

    IBus Sharada Braille, which supports seven Indian languages, was developed by SMC.

    === Mobile phones with Numeric keyboards ===

    Basic mobile phone models have 12 keys, like the plain old telephone keypad. Each key is mapped to 3 or 4 English letters to facilitate data entry in English. There are two ways to input Indian languages with this kind of keypad: the first is the multi-tap method, and the second uses visual help from the screen, as in Panini Keypad. The primary usage is SMS. The 140-byte message size, which holds 160 characters in the 7-bit alphabet used for English and other Roman-script languages, can accommodate only about 70 characters when Unicode is used. Proprietary compression is sometimes used to increase the capacity of a single message for complex-script languages like Hindi. A research study of the available methods and recommendations for a proposed standard was released by the Broadband Wireless Consortium of India (BWCI).

    ==== Transliteration/Phonetic methods ====

    English is used to type in Indian languages.

    QuillPad
    IndiSMS

    ==== Native methods ====

    In native methods, the letters of the language are displayed on the screen corresponding to the numeral keys, based on the probabilities of those letters for that language. Additional letters can be accessed by using a special key. When a word is partially typed, options are presented from which the user can make a selection.

    === Smart phones with Qwerty keyboards ===

    Most smart phones have about 35 keys catering primarily to the English language. Numerals and some symbols are accessed with a special key called Alt. Indic input methods are yet to evolve for these types of phones, as Unicode rendering support is not widely available.

    === For Smart Phones with Soft/Virtual keyboards ===

    InScript is being adopted for smart phone usage. For Android phones that can render Indic languages, the Swalekh Multilingual Keypad and Multiling Keyboard apps are available. Gboard offers support for several Indian languages.

    == Localization ==

    Localization means translating software, operating systems, websites and various other applications into Indian languages. Various volunteer groups are working in this direction.

    === Mandrake Tamil Version ===

    A notable example is the Tamil version of Mandrake Linux (defunct since 2011). Tamil speakers in Toronto (Canada) released Mandrake,

  • Collaboration-oriented architecture

    Collaboration-oriented architecture

    Collaboration Oriented Architecture (COA) is a computer system that is designed to collaborate with, or use services from, systems that are outside of the operator's control. Collaboration Oriented Architecture will often use Service Oriented Architecture to deliver the technical framework. Collaboration Oriented Architecture is the ability to collaborate between systems that are based on the Jericho Forum principles or "Commandments". Bill Gates and Craig Mundie (Microsoft) clearly articulated the need for people to work outside of their organizations in a secure and collaborative manner in their opening keynote to the RSA Security Conference in February 2007. Successful implementation of a Collaboration Oriented Architecture implies the ability to inter-work securely over the Internet and will typically mean resolving the problems that come with de-perimeterisation.

    == Etymology ==

    The term Collaboration Oriented Architectures was defined and developed at a meeting of the Jericho Forum held at HSBC on 6 July 2007.

    == Definition ==

    The key elements that qualify a security architecture as a Collaboration Oriented Architecture are as follows:

    Protocol: Systems use appropriately secure protocols to communicate.
    Authentication: The protocol is authenticated with user and/or system credentials.
    Federation: User and/or system credentials are accepted and validated by systems that are not under your (locus of) control.
    Network Agnostic: The design does not rely on a secure network, thus it will operate securely from an Intranet to raw Internet.
    Trust: The collaborating systems have the capacity to be able to confirm to a specified degree of confidence that the components in a transaction chain have.
    Risk: The collaborating systems can make a risk assessment on any transaction based on the communicated levels of required trust, based on the required degree of identity, confidentiality, integrity, availability.

    == Authentication ==

    Working in a collaborative multi-sourced environment implies the need for authentication, authorization and accountability, which must interoperate and be exchanged outside of your locus (area) of control.

    People/systems must be able to manage permissions of resources and rights of users they don't control.
    There must be the capability of trusting an organization that can authenticate individuals or groups, thus eliminating the need to create separate identities.
    In principle, only one instance of a person / system / identity may exist, but privacy necessitates support for multiple instances, or one instance with multiple facets, often referred to as personas.
    Systems must be able to pass on security credentials / assertions.
    Multiple loci (areas) of control must be supported.

  • Hit-testing

    Hit-testing

    In computer graphics programming, hit-testing (hit detection, picking, or pick correlation) is the process of determining whether a user-controlled cursor (such as a mouse cursor or touch-point on a touch-screen interface) intersects a given graphical object (such as a shape, line, or curve) drawn on the screen. Hit-testing may be performed on the movement or activation of a mouse or other pointing device. Hit-testing is used by GUI environments to respond to user actions, such as selecting a menu item or a target in a game based on its visual location. In web programming languages such as HTML, SVG, and CSS, this is associated with the concept of pointer-events (e.g. user-initiated cursor movement or object selection). Collision detection is a related concept for detecting intersections of two or more different graphical objects, rather than intersection of a cursor with one or more graphical objects.

    == Algorithm ==

    There are many different algorithms that may be used to perform hit-testing, with different performance or accuracy outcomes. One common hit-test algorithm works on axis-aligned bounding boxes. The key idea is that two boxes can only be apart if the box being tested lies entirely above, entirely below, entirely to the left of, or entirely to the right of the current box. If none of these holds, the boxes are colliding.
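
    The axis-aligned bounding box logic described above can be sketched in Python as follows. This is a minimal illustration rather than the listing from the original article; the `Box` type, its field names and the screen-coordinate convention (y grows downward, so `top <= bottom`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in screen coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Two boxes collide unless one lies entirely to one side of the other."""
    separated = (
        a.right < b.left      # a is entirely to the left of b
        or a.left > b.right   # a is entirely to the right of b
        or a.bottom < b.top   # a is entirely above b
        or a.top > b.bottom   # a is entirely below b
    )
    return not separated


def hit_test(box: Box, x: float, y: float) -> bool:
    """Cursor hit-test: is the point (x, y) inside the box, edges included?"""
    return box.left <= x <= box.right and box.top <= y <= box.bottom
```

    For example, `hit_test(Box(0, 0, 10, 10), 5, 5)` is true, while a cursor at `(11, 5)` misses the same box. Treating touching edges as a hit is a design choice; strict inequalities would exclude them.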

  • Site Security Handbook

    Site Security Handbook

    The Site Security Handbook, RFC 2196, is a guide on setting computer security policies and procedures for sites that have systems on the Internet (however, the information provided should also be useful to sites not yet connected to the Internet). The guide lists issues and factors that a site must consider when setting its own policies. It makes a number of recommendations and provides discussions of relevant areas. This guide is only a framework for setting security policies and procedures. In order to have an effective set of policies and procedures, a site will have to make many decisions, gain agreement, and then communicate and implement these policies. The guide is a product of the IETF SSH working group and was published in 1997, obsoleting the earlier RFC 1244 from 1991.

  • Doubao

    Doubao

    Doubao (Chinese: 豆包) is an artificial intelligence assistant developed by ByteDance.

    == History ==

    The chatbot was launched in August 2023. By November 2024, it had become China's most popular AI chatbot, with approximately 60 million monthly active users according to industry analytics.

    == Design ==

    Doubao is powered by Volcano Engine (Volcengine), with 120 trillion tokens consumed per day.

    == Variants ==

    === Dola ===

    The international version of Doubao is Dola, which was launched in August 2023 as Cici. Dola is powered by OpenAI's GPT series of large language models and by Google's Gemini.

  • Comparison of operating systems

    Comparison of operating systems

    These tables compare operating systems for computer devices, listing general and technical information for a number of widely used and currently available PC or handheld (including smartphone and tablet computer) operating systems. The article "Usage share of operating systems" provides a broader and more general comparison of operating systems that includes servers, mainframes and supercomputers. Because of the large number and variety of available Linux distributions, they are all grouped under a single entry; see comparison of Linux distributions for a detailed comparison. There is also a variety of BSD and DOS operating systems, covered in comparison of BSD operating systems and comparison of DOS operating systems.

    == Nomenclature ==

    The nomenclature for operating systems varies among providers and sometimes within providers. For purposes of this article the terms used are:

    kernel
        In some operating systems, the OS is split into a low level region called the kernel and higher level code that relies on the kernel. Typically the kernel implements processes but its code does not run as part of a process.
    hybrid kernel
    monolithic kernel
    Nucleus
        In some operating systems there is OS code permanently present in a contiguous region of memory addressable by unprivileged code; in IBM systems this is typically referred to as the nucleus. The nucleus typically contains both code that requires special privileges and code that can run in an unprivileged state. Typically some code in the nucleus runs in the context of a dispatching unit, e.g., address space, process, task, thread, while other code runs independent of any dispatching unit. In contemporary operating systems unprivileged applications cannot alter the nucleus.

    License and pricing policies vary widely among different systems. Among others, the tables below use the following terms:

    BSD
        BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software.
    bundled
        The fee is included in the price of the hardware.

    == General information ==

    == Technical information ==

    == Security ==

    == Commands ==

    For POSIX compliant (or partly compliant) systems like FreeBSD, Linux, macOS or Solaris, the basic commands are the same because they are standardized. NOTE: On Linux systems, which specific program or even 'command' is called may vary by distribution, via the POSIX alias function. For example, if you wanted to use the DOS dir to give you a directory listing with one detailed file listing per line, you could use alias dir='ls -lahF' (e.g. in a session configuration file).

  • Security and Privacy in Computer Systems

    Security and Privacy in Computer Systems

    Security and Privacy in Computer Systems is a paper by Willis Ware that was first presented to the public at the 1967 Spring Joint Computer Conference.

    == Significance ==

    Ware's presentation was the first public conference session about information security and privacy in respect of computer systems, especially networked or remotely-accessed ones. The IEEE Annals of the History of Computing said that Ware's 1967 Spring Joint Computer Conference session, together with 1970's Ware report, marked the start of the field of computer security.

  • Text Database and Dictionary of Classic Mayan

    Text Database and Dictionary of Classic Mayan

    The project Text Database and Dictionary of Classic Mayan (abbr. TWKM) promotes research on the writing and language of pre-Hispanic Maya culture. It is housed in the Faculty of Arts at the University of Bonn and was established with funding from the North Rhine-Westphalian Academy of Sciences, Humanities and the Arts. The project has a projected run-time of fifteen years and is directed by Nikolai Grube from the Department of Anthropology of the Americas at the University of Bonn. The goal of the project is to conduct computer-based studies of all extant Maya hieroglyphic texts from an epigraphic and cultural-historical standpoint, and to produce and publish a database and a comprehensive dictionary of the Classic Mayan language.

    == Subject of the Project ==

    The text database, as well as the dictionary that will be compiled by the conclusion of the project, will be assembled based on all known texts from the pre-Hispanic Maya culture. These texts were produced and used between approximately the third century B.C. and A.D. 1500, in a region that today includes parts of Mexico, Guatemala, Belize, and Honduras. The thousands of hieroglyphic inscriptions on monuments, ceramics, or daily objects that have survived into the present offer insight into the language's vocabulary and structure. The project's database and dictionary will digitally represent original spellings using the logo-syllabic Maya hieroglyphs, as well as their transcription and transliteration in the Roman alphabet. The data will additionally be annotated with various epigraphic analyses, translations, and further object-specific information.

    == Project Partners ==

    TWKM will employ digital technologies in order to compile and make available the data and metadata, as well as to publish the project's research results. The project thereby methodologically positions itself in the field of the digital humanities. The project will be conducted in cooperation with the project partners (below), the research association for the eHumanities TextGrid, as well as the University and Regional Library of Bonn (ULB). The working environment that is currently under construction, in which the data and metadata will be compiled and annotated, will be realized in the TextGrid Laboratory, a software component of the virtual research environment. A further component of this software, the TextGrid Repository, will make the data that are authorized for publication freely available online and ensure their long-term storage. The tools for data compilation and annotation obtained from the modularly constructed and extended TextGrid Lab thereby provide everything needed to support the research team's typical epigraphic workflow. The workflow usually begins by documenting the texts and the objects on which they are preserved, and by compiling descriptive data. It then continues with the various levels of epigraphic and linguistic analysis, and concludes in the best case with a translation of the analyzed inscription and a corresponding publication. In cooperation with the ULB, selected data will additionally be made available. The project's Virtual Inscription Archive will present online, in the Digital Collections of the ULB, hieroglyphic inscriptions selected from the published data in the repository, including an image of and brief information about the texts and the objects on which they are written, epigraphic analysis, and translation.

    == Project Goal ==

    One of the project's goals is to produce a dictionary of Classic Mayan, in both digital and print form, towards the end of the project run-time. Additionally, a database with a corpus of inscriptions, including their translations and epigraphic analyses, will be made freely available online. The database will furthermore provide ontology-like links between the contextual object data and the inscriptions, and among the objects themselves, thereby allowing a cultural-historical arrangement of all contents within the periods of pre-Hispanic Maya culture. The contents of the database are additionally linked to citations of relevant literature. As a result, the database will also make freely available, to both the scientific community and other interested parties, a bibliography representing the research history and a base of knowledge concerning ancient Maya culture and script. In addition, the Classic Maya script, in its temporally defined stages of language development, will be gathered into and documented in a comprehensive language corpus with the aid of the information gathered by the project. In collaboration with all project participants, the corpus data can be used, with the aid of various comparative analyses and computational linguistic methods such as inference-based methods, to confirm readings of some hieroglyphs that are currently only partially confirmed, and eventually to decipher the Classic Maya script completely.

  • FuseBase

    FuseBase

    FuseBase (previously Nimbus Note and Nimbus Platform) is a B2B SaaS platform. It is among the first to support the Model Context Protocol (MCP), an open standard enabling integration of AI agents with external tools, systems, and data sources.

    == History ==

    The platform was founded in 2014 as Nimbus Note and started as a cross-platform note-taking and information management tool. As it evolved into Nimbus Platform, it added project management and client portal capabilities. In 2023, the company rebranded as FuseBase, pivoting to connect and automate both internal and external collaboration through AI agents and the adoption of protocols such as MCP. At the same time, FuseBase was named Product of the Year on Product Hunt.

    == Technical overview ==

    The platform integrates the Model Context Protocol (MCP), an open-source framework created by Anthropic. MCP allows AI models to securely access and interact with external data, tools, and systems. This enables FuseBase AI Agents to gather relevant context, perform actions, and provide more advanced automation.

    Read more →
  • System Service Descriptor Table

    System Service Descriptor Table

    The System Service Descriptor Table (SSDT) is an internal dispatch table within Microsoft Windows. == Function == The SSDT maps syscalls to kernel function addresses. When a user-space application issues a syscall, it passes a service index as a parameter to indicate which syscall is being called. The SSDT is then used to resolve the address of the corresponding function within ntoskrnl.exe. In modern Windows kernels, two SSDTs are used: one for generic routines (KeServiceDescriptorTable) and a second (KeServiceDescriptorTableShadow) for graphical routines. A parameter passed by the calling user-space application determines which SSDT is used. == Hooking == Modifying the SSDT makes it possible to redirect syscalls to routines outside the kernel. These routines can be used either to hide the presence of software or to act as a backdoor that gives attackers persistent code execution with kernel privileges. For both reasons, hooking SSDT calls is a common technique in Windows kernel-mode rootkits, and it is also used by antivirus software. In 2010, many computer security products that relied on hooking SSDT calls were shown to be vulnerable to exploits that use race conditions to attack the products' security checks.
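The dispatch and hooking behaviour described above can be pictured with a toy model. This is only a conceptual sketch in Python, not kernel code: the handler functions, the `dispatch` and `hook` helpers, and the bit layout used to pick a table are all assumptions made for illustration.

```python
from typing import Callable, List

Handler = Callable[[], str]

# Hypothetical handlers standing in for kernel routines.
def nt_open_file() -> str:
    return "kernel: open file"

def nt_close() -> str:
    return "kernel: close handle"

def gdi_draw() -> str:
    return "graphics: draw"

# Two tables, mirroring the generic table and the shadow table for graphical routines.
service_table: List[Handler] = [nt_open_file, nt_close]
shadow_table: List[Handler] = [gdi_draw]
TABLES = [service_table, shadow_table]

# Assumption for this sketch: one bit above the low 12 bits selects the table,
# and the low 12 bits are the index into that table.
TABLE_SHIFT = 12
INDEX_MASK = 0xFFF

def dispatch(service_number: int) -> str:
    """Resolve a service number to a handler through the tables and call it."""
    table = TABLES[(service_number >> TABLE_SHIFT) & 0x1]
    index = service_number & INDEX_MASK
    if index >= len(table):
        raise ValueError("invalid service index")
    return table[index]()

def hook(table: List[Handler], index: int, replacement: Handler) -> Handler:
    """Overwrite one table entry and return the original handler."""
    original = table[index]
    table[index] = replacement
    return original
```

Replacing an entry with `hook(service_table, 0, my_handler)` makes every later `dispatch(0)` run `my_handler` instead of the original routine, which is the redirection that both rootkits and security products rely on.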

    Read more →
  • AI content watermarking

    AI content watermarking

    AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user. AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence. Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically. The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use). Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key, whose performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications). == Background == Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs. Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise. 
The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording. Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns. In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content. == Formal definitions and design goals == Most modern AI watermarking schemes can be formalized as a pair of algorithms $(\mathsf{Wm}, \mathsf{Detect})$ parameterized by a secret key $k$. The embedding algorithm $\mathsf{Wm}$ takes a generative model $M$ (and optionally a prompt) and returns a watermarked output $x$; the detection algorithm $\mathsf{Detect}(x, k)$ outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether $x$ was produced by the watermarked generator. The literature evaluates such schemes along several largely conflicting criteria. Imperceptibility, or quality preservation, is measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ. 
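As a concrete instance of one of the quality metrics named above, the following is a minimal sketch of PSNR computed over two flat sequences of sample values. The function name `psnr` and the default peak value of 255 (8-bit samples) are assumptions for illustration; real evaluations usually operate on full image arrays.

```python
import math
from typing import Sequence

def psnr(original: Sequence[float], watermarked: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    if len(original) != len(watermarked) or len(original) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    # Mean squared error between the clean and the watermarked samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the watermarked output is closer to the original, so an imperceptible watermark should keep this value high.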
Detectability is typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or 10^-6), or as the number of tokens or pixels needed to reach a given confidence level. Robustness refers to the requirement that the watermark should survive expected modifications like JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation. Distortion-freeness is a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not. Forgery resistance or unforgeability means an adversary without the secret key should be unable to produce content that passes detection. == Techniques == AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection. There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models. Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms. They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection. 
A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time. === Text === Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. Modern approaches for large language models alter the autoregressive sampling process so that some statistical signature is left in the choice of tokens, while leaving the surface form of the text unchanged. The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias $\delta$ to a pseudorandomly selected subset of vocabulary logits before softmax sampling. Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution. Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model. ==== KGW: token-probability shifting ==== The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks. At each decoding step $t$, a pseudorandom function (PRF) keyed by a secret $k$ is applied to a context window of $h$ previous tokens to deterministically partition the vocabulary $V$ of size $N$ into a "green list" $G \subset V$ of size $\gamma N$ and its complement, the "red list" $R = V \setminus G$, where $\gamma \in (0,1)$ (typically $\gamma = 1/2$) is the green fraction. 
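The keyed partition described above can be sketched as follows. This is a minimal illustration, not the construction used in the KGW paper: a keyed BLAKE2b hash seeding Python's `random` module stands in for the PRF, and the function name `green_list` and its parameters are hypothetical.

```python
import hashlib
import random
from typing import Sequence, Set

def green_list(context: Sequence[int], key: bytes, vocab_size: int,
               gamma: float = 0.5) -> Set[int]:
    """Derive the green list G for one decoding step from the previous h tokens."""
    # Stand-in PRF (assumption): keyed hash of the context window.
    prf = hashlib.blake2b(key=key, digest_size=8)
    for token_id in context:
        prf.update(token_id.to_bytes(4, "big"))
    seed = int.from_bytes(prf.digest(), "big")

    # The seed fixes a permutation of the vocabulary; the first gamma*N ids are green.
    rng = random.Random(seed)
    vocabulary = list(range(vocab_size))
    rng.shuffle(vocabulary)
    return set(vocabulary[: int(gamma * vocab_size)])
```

Because the partition depends only on the key and the context window, anyone holding the key can recompute it later without access to the language model; the red list is simply every token id not in the returned set.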
A logits processor then increments every green-list logit by a fixed bias $\delta > 0$ before softmax: $\ell'_v = \ell_v + \delta \cdot \mathbf{1}[v \in G]$, so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content. Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition $g(\cdot)$ for each token, counts the number of green hits $|G|_{\text{hits}}$ in a sequence of length $T$, and computes a one-proportion z-test statistic: $z = \frac{|G|_{\text{hits}} - \gamma T}{\sqrt{T \gamma (1 - \gamma)}}$. Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean $\gamma T$; a large positive $z$ rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below 10^-5 on the OPT-1.3B model. A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window
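The logit bias and the z-test given above can be sketched in a few lines. The function names are hypothetical, the green list is passed in as a plain set of token ids, and the p-value uses a normal approximation to the binomial count, which is an assumption made for illustration.

```python
import math
from typing import List, Sequence, Set

def bias_logits(logits: Sequence[float], green: Set[int], delta: float) -> List[float]:
    """Add delta to every green-list logit; red-list logits are unchanged."""
    return [logit + delta if v in green else logit
            for v, logit in enumerate(logits)]

def z_score(green_hits: int, num_tokens: int, gamma: float) -> float:
    """One-proportion z statistic: (hits - gamma*T) / sqrt(T*gamma*(1-gamma))."""
    return (green_hits - gamma * num_tokens) / math.sqrt(
        num_tokens * gamma * (1.0 - gamma))

def p_value(z: float) -> float:
    """One-sided upper-tail probability under a standard normal (approximation)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For instance, 25 scored tokens that are all green with a green fraction of 1/2 give z = (25 - 12.5) / sqrt(6.25) = 5.0, while text with exactly half of its tokens green gives z = 0 and is indistinguishable from unwatermarked text.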

    Read more →