AI Data Manager Jobs

AI Data Manager Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Brill tagger

    The Brill tagger is an inductive method for part-of-speech tagging. It was invented by Eric Brill and described in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger". It is a form of supervised learning, which aims to minimize error, and a transformation-based process, in the sense that a tag is assigned to each word and then changed using a set of predefined rules. In the transformation process, a known word is first assigned its most frequent tag, while an unknown word is naively assigned the tag "noun". High accuracy is eventually achieved by applying the rules iteratively and changing the incorrect tags. This approach ensures that valuable information, such as the morphosyntactic construction of words, is employed in an automatic tagging process.

    == Algorithm ==

    The algorithm starts with initialization, which is the assignment of tags based on their probability for each word (for example, "dog" is more often a noun than a verb). Then "patches" are determined via rules that correct (probable) tagging errors made in the initialization phase.

    Initialization:
      • Known words (in vocabulary): assign the most frequent tag associated with that form of the word
      • Unknown word

    == Rules and processing ==

    The input text is first tokenized, or broken into words. Typically in natural language processing, contractions such as "'s", "n't", and the like are considered separate word tokens, as are punctuation marks. A dictionary and some morphological rules then provide an initial tag for each word token. For example, a simple lookup would reveal that "dog" may be a noun or a verb (the most frequent tag is simply chosen), while an unknown word will be assigned some tag(s) based on capitalization, various prefix or suffix strings, etc. (such morphological analyses, which Brill calls Lexical Rules, may vary between implementations).
After all word tokens have (provisional) tags, contextual rules apply iteratively, to correct the tags by examining small amounts of context. This is where the Brill method differs from other part of speech tagging methods such as those using Hidden Markov Models. Rules are reapplied repeatedly, until a threshold is reached, or no more rules can apply. Brill rules are of the general form: tag1 → tag2 IF Condition where the Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation: IN NN WDPREVTAG DT while would change the tag of a word from IN (preposition) to NN (common noun), if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than its more common use as a conjunction (many rules are more general). Rules should only operate if the tag being changed is also known to be permissible, for the word in question or in principle (for example, most adjectives in English can also be used as nouns). Rules of this kind can be implemented by simple Finite-state machines. See Part of speech tagging for more general information including descriptions of the Penn Treebank and other sets of tags. Typical Brill taggers use a few hundred rules, which may be developed by linguistic intuition or by machine learning on a pre-tagged corpus. == Code == Brill's code pages at Johns Hopkins University are no longer on the web. An archived version of a mirror of the Brill tagger at its latest version as it was available at Plymouth Tech can be found on Archive.org. The software uses the MIT License.
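
The two stages described above (initial tagging, then contextual patches) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` type, the toy `LEXICON`, and the function names are hypothetical and are not taken from Brill's code, and only one rule template (previous tag plus an optional word test) is modelled.

```python
from typing import Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    """from_tag -> to_tag IF the previous tag is prev_tag (and the word matches, if given)."""
    from_tag: str
    to_tag: str
    prev_tag: str
    word: Optional[str] = None  # None: the rule applies to any word


def initial_tags(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    # Known words get their most frequent tag; unknown words default to noun (NN).
    return [lexicon.get(tok.lower(), "NN") for tok in tokens]


def apply_rules(tokens: List[str], tags: List[str], rules: List[Rule],
                max_passes: int = 10) -> List[str]:
    # Reapply the contextual rules until nothing changes or the pass limit is hit.
    tags = list(tags)
    for _ in range(max_passes):
        changed = False
        for rule in rules:
            for i in range(1, len(tokens)):
                word_ok = rule.word is None or tokens[i].lower() == rule.word
                if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag and word_ok:
                    tags[i] = rule.to_tag
                    changed = True
        if not changed:
            break
    return tags


# Toy lexicon (an assumption for this example, not real corpus statistics).
LEXICON = {"all": "DT", "the": "DT", "while": "IN", "dog": "NN"}
# The rule from the text: IN NN WDPREVTAG DT while
RULES = [Rule(from_tag="IN", to_tag="NN", prev_tag="DT", word="while")]

tokens = ["all", "the", "while"]
tagged = apply_rules(tokens, initial_tags(tokens, LEXICON), RULES)
print(list(zip(tokens, tagged)))
```

In this toy run "while" starts as IN and is patched to NN because the preceding tag is DT. A learned tagger would also check that the new tag is permissible for the word, which this sketch omits.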

  • Screen generator

    A screen generator, also known as a screen painter, screen mapper, or forms generator, is a software package (or component thereof) that enables data entry screens to be generated declaratively, by "painting" them on the screen WYSIWYG-style or by filling in forms, rather than by writing code to display them manually. 4GLs commonly incorporate a screen generator feature. Screen generators are also commonly bundled with database systems, especially entry-level databases. A screen generator is one aspect of an application generator, which can also include other functions such as report generation and a data dictionary. The earliest screen generators were character-based; by the 1990s, GUI support had become common, followed by support for generating HTML forms. Some screen generators work by generating code to display the screen in a high-level language (for example, COBOL); others store the screen definition in a data file or in database tables, and have a runtime component responsible for actually displaying the form and receiving and validating user input.

    == Examples ==

    Examples of screen generators include:
      • IBM Screen Definition Facility II, which generates screens for CICS BMS, IMS MFS, ISPF, GDDM and CSP/AD
      • Performix for Informix
      • Microsoft Visual Basic
      • the forms component of Microsoft Access
      • Oracle Developer, in particular its Oracle Forms component
      • the QDesign component of PowerHouse
      • SystemBuilder/SB+
      • the Screen Painter component of SAP's ABAP Workbench
      • the FoxView component of FoxPro. FoxView was originally developed by Luis Castro as a dBASE screen generator named ViewGen; Fox purchased it and bundled it with FoxPro 1.0. Later, Fox replaced Castro's code with their own screen painter code.
      • dBASE, which included a built-in screen generator from dBASE IV onwards; in dBASE III and earlier, third-party screen generators were available, including the already mentioned ViewGen
      • DPS 1100 for UNIVAC 1100 series mainframes
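
The second approach mentioned above, a stored screen definition plus a runtime component, might look roughly like the following Python sketch. The definition format, field attributes, and function names are all hypothetical and chosen for illustration; they do not correspond to any of the listed products.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical screen definition, as it might be stored in a data file or table.
CUSTOMER_SCREEN: Dict[str, Any] = {
    "title": "Customer entry",
    "fields": [
        {"name": "name", "label": "Name", "type": "text", "required": True, "max_len": 30},
        {"name": "age", "label": "Age", "type": "int", "required": False, "min": 0, "max": 150},
    ],
}


def render(screen: Dict[str, Any]) -> str:
    """Runtime step 1: turn the declarative definition into a character-based form."""
    lines = [screen["title"], "=" * len(screen["title"])]
    for field in screen["fields"]:
        marker = "*" if field.get("required") else " "  # "*" flags required fields
        lines.append(f"{marker} {field['label']}: ____")
    return "\n".join(lines)


def validate(screen: Dict[str, Any], raw: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Runtime step 2: check raw user input against the same definition."""
    values: Dict[str, Any] = {}
    errors: List[str] = []
    for field in screen["fields"]:
        text = raw.get(field["name"], "").strip()
        if not text:
            if field.get("required"):
                errors.append(f"{field['label']} is required")
            continue
        if field["type"] == "int":
            try:
                number = int(text)
            except ValueError:
                errors.append(f"{field['label']} must be a whole number")
                continue
            if number < field.get("min", number) or number > field.get("max", number):
                errors.append(f"{field['label']} is out of range")
                continue
            values[field["name"]] = number
        else:
            if len(text) > field.get("max_len", len(text)):
                errors.append(f"{field['label']} is too long")
                continue
            values[field["name"]] = text
    return values, errors


print(render(CUSTOMER_SCREEN))
print(validate(CUSTOMER_SCREEN, {"name": "Ada", "age": "36"}))
```

The point of the design is that changing the form means editing data, not display code: the same `render` and `validate` functions serve any screen described in this format.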

  • iWork

    iWork is an office suite of applications created by Apple for its macOS, iPadOS, and iOS operating systems, and also available cross-platform through the iCloud website. iWork includes the presentation application Keynote, the word-processing and desktop-publishing application Pages, and the spreadsheet application Numbers. Apple's design goals in creating iWork have been to allow Mac users to easily create attractive documents and spreadsheets, making use of macOS's extensive font library, integrated spelling checker, sophisticated graphics APIs and its AppleScript automation framework. The equivalent Microsoft Office applications to Pages, Numbers, and Keynote are Word, Excel, and PowerPoint, respectively. Although Microsoft Office applications cannot open iWork documents, iWork applications can open Office documents for editing, and export documents from iWork's native formats (.pages, .numbers, .key) to Microsoft Office formats (.docx, .xlsx, .pptx, etc.) as well as to PDF files. The oldest application in iWork is Keynote, first released as a standalone application in 2003 for use by Steve Jobs in his presentations. Steve Jobs announced Keynote saying "It's for when your presentation really matters". Pages was released with the first iWork bundle in 2004; Numbers was added in 2007 with the release of iWork '08. The next release, iWork '09, also included beta access to iWork.com, an online service that allowed users to upload and share documents on the web, now integrated into Apple's iCloud service. A version of iWork for iOS was released in 2010 with the first iPad, and the apps have been regularly updated since, including the addition of iPhone support. In 2013, Apple launched iWork web apps in iCloud; even years later, however, their functionality is somewhat limited compared to equivalents on the desktop. iWork was initially sold as a suite for $79, then later at $19.99 per app on OS X and $9.99 per app on iOS. 
Apple announced in October 2013 that all iOS and OS X devices purchased onwards, whether new or refurbished, would be eligible for a free download of all three iWork apps: after device setup, the user can "claim" the apps on the App Store, after which they are permanently linked to the user’s Apple ID. iWork for iCloud, which also incorporates a document hosting service, is free to all iCloud users. iWork was released for free on macOS and iOS (including older or resold devices) in April 2017. In September 2016, Apple announced that the real-time collaboration feature would be available for all iWork apps. == History == The first version of iWork, iWork '05, was announced on January 11, 2005 at the Macworld Conference & Expo and made available on January 22 in the United States and on January 29 worldwide. iWork '05 comprised two applications: Keynote 2, a presentation creation program, and Pages, a word processor. iWork '05 was sold for US$79. A 30-day trial was also made available for download on Apple's website. Originally IGG Software held the rights to the name iWork. While iWork was billed by Apple as "a successor to AppleWorks", it does not replicate AppleWorks's database and drawing tools. However, iWork integrates with existing applications from Apple's iLife suite through the Media Browser, which allows users to drag and drop music from iTunes, movies from iMovie, and photos from iPhoto and Aperture directly into iWork documents. iWork '06 was released on January 10, 2006 and contained updated versions of both Keynote and Pages. Both programs were released as universal binaries for the first time, allowing them to run natively on both PowerPC processors and the Intel processors used in the new iMac desktop computers and MacBook Pro notebooks which had been announced on the same day as the new iWork suite. 
The next version of the suite, iWork '08, was announced and released on August 7, 2007 at a special media event at Apple's campus in Cupertino, California. iWork '08, like previous updates, contained updated versions of Keynote and Pages. A new spreadsheet application, Numbers, was also introduced. Numbers differed from other spreadsheet applications, including Microsoft Excel, in that it allowed users to create documents containing multiple spreadsheets on a flexible canvas using a number of built-in templates. iWork '09, was announced on January 6, 2009 and released the same day. It contains updated versions of all three applications in the suite. iWork '09 also included access to a beta version of the iWork.com service, which allowed users to share documents online until that service was decommissioned at the end of July 2012. Users of iWork '09 could upload a document directly from Pages, Keynote, or Numbers and invite others to view it online. Viewers could write notes and comments in the document, and download a copy in iWork, Microsoft Office, or PDF formats. iWork '09 was also released with the Mac App Store on January 6, 2011 at $19.99 per application, and received regular updates after this point, including links to iCloud and a high-DPI version designed to match Apple's MacBook Pro with Retina Display. On January 27, 2010, Apple announced iWork for iPad, to be available as three separate $9.99 applications from the App Store. This version has also received regular updates including a version for pocket iPhone and iPod Touch devices, and an update to take advantage of Retina Display devices and the larger screens of recent iPhones. On October 22, 2013, Apple announced an overhaul of the iWork software for both the Mac and iOS. Both suites were made available via the respective App Stores. The update is free for current iWork owners and was also made available free of charge for anyone purchasing an OS X or iOS device after October 1, 2013. 
Any user activating the newly free iWork apps on a qualifying device can download the same apps on another iOS or OS X device logged into the same App Store account. The new OS X versions have been criticized for losing features such as multiple selection, linked text boxes, bookmarks, 2-up page views, mail merge, searchable comments, ability to read/export RTF files, default zoom and page count, integration with AppleScript. Apple has provided a road-map for feature re-introduction, stating that it hopes to reintroduce some missing features within the next six months. As of April 1, 2014 a few features—e.g., the ability to set the default zoom—had been reintroduced, though scores had not. Due to using a completely new file format that can work across macOS, Windows, and in most web browsers by using the online iCloud web apps, versions of iWork beginning with iWork 13 and later do not open or allow editing of documents created in versions prior to iWork '09, with users who attempt to open older iWork files being given a pop-up in the new iWork 13 app versions telling them to use the previous iWork '09 (which users may or may not have on their machine) in order to open and edit such files. Accordingly, the current version for OS X (which was initially only compatible with OS X Mavericks 10.9 onwards) moves any previously installed iWork '09 apps to an iWork '09 folder on the users machine (in /Applications/iWork '09/), as a work-around to allow users continued use of the earlier suite in order to open and edit older iWork documents locally on their machine. In October 2015, Apple released an update to mitigate this issue, allowing users to open documents saved in iWork '06 and iWork '08 formats in the latest version of Pages. In 2016, Apple announced that the real-time collaboration feature would be available for all iWork apps, instead of being constrained to using iWork for iCloud. The feature is comparable to Google Docs. 
== Versions == === Major releases === === Updates === iWork '09 received several updates: iWork 9.0.3 DVD (for Mac OS X 10.5.6 "Leopard" or newer; released August 26, 2010) iWork 9.0.4 (for Mac OS X 10.5.6 "Leopard" or newer; released August 26, 2010) iWork 9.1 (for Mac OS X 10.6.6 "Snow Leopard" or newer; released July 20, 2011) iWork 9.3 (for Mac OS X 10.7.4 "Lion" or newer; released December 4, 2012) The Mac App Store version of iWork was updated on October 15, 2015 for 10.10 "Yosemite" or newer. It is the final release to support 10.10 "Yosemite" and 10.11 "El Capitan". Keynote 6.6, Pages 5.6 and Numbers 3.6 are included. iWork received a major update again on March 28, 2019 with Keynote 9.0, Pages 8.0 and Numbers 6.0. == Components == === Common components === Products in the iWork suite share a number of components, largely as a result of sharing underlying code from the Cocoa and similar shared application programming interfaces (APIs). Among these are the well known universal multilingual spell checker, which can also be found in products like Safari and Mail. Grammar checking, find and replace, style and color pickers are similar examples of design features found throughout the Apple application space. Moreover, the applications

  • Splitwise

    Splitwise is an online expense-splitting application software accessible via web browser and mobile app. The app facilitates repayments of shared bills by calculating what each person in a group owes. The primary competitor to the app is Venmo, which only operates in the U.S. Splitwise allows users to create groups with friends to determine what each person owes. All expenses and allocations are added to the app, and Splitwise simplifies the transaction history to determine exactly what payments need to be made to whom to settle outstanding balances. Splitwise stores user information via cloud storage. It was developed and is owned by Splitwise Inc., based in Providence, Rhode Island, United States. == History == The app was launched in February 2011 as SplitTheRent, intended to be used for rent splitting, by Ryan Laughlin, Jon Bittner and Marshall Weir. In September 2013, Splitwise was integrated with Venmo to allow users to settle payments via Venmo. In April 2024, Splitwise partnered with Tink, a Visa payment services company, to incorporate a bank transfer feature directly in the Splitwise app. === Financing === In December 2014, the company raised $1.4 million. In October 2016, the company raised $5 million. In April 2021, Splitwise raised $20 million in funding from series A round run by Insight Partners. == Reception == A 2022 opinion piece in The Guardian by London journalist Imogen West-Knights shared the negative effects of exactly splitting bills among friends and family members. West-Knights argued that Splitwise and similar apps can "turn people into those true enemies of all that is fun and joyful in the world: accountants." However, she said the app does work better when used by couples rather than friend groups. Other reviews noted that the app makes people petty. 
In contrast, an article published by Condé Nast Traveler describes how Splitwise eliminated the stress caused by complicated offline bill splitting, saying it "fixed such a pervasive obstacle in group travel." Coverage by The Wall Street Journal lands somewhere between the two contrasting views, saying Splitwise and similar apps are helpful, but that users need to be prepared for the difficult money-related conversations that may arise. An etiquette advisor at Debrett's said, "The less talk you can have about money on any of these occasions, the better." An editor suggested that the conversation can be as simple as asking, "We’re splitting this evenly, right?" before a meal.
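
The simplification step described earlier (reducing a history of shared expenses to a short list of payments) can be illustrated with a simple greedy netting approach. This is a sketch of one common technique, not Splitwise's actual algorithm, which the text does not describe; it assumes every expense is split equally, works in integer cents, and uses hypothetical names throughout.

```python
from typing import Dict, List, Tuple

Expense = Tuple[str, int, List[str]]  # (payer, amount in cents, participants)


def net_balances(expenses: List[Expense]) -> Dict[str, int]:
    """Positive balance: the person is owed money. Negative: the person owes money."""
    balance: Dict[str, int] = {}
    for payer, amount, participants in expenses:
        share, remainder = divmod(amount, len(participants))
        balance[payer] = balance.get(payer, 0) + amount
        for i, person in enumerate(participants):
            # The first `remainder` participants absorb the leftover cents.
            owed = share + (1 if i < remainder else 0)
            balance[person] = balance.get(person, 0) - owed
    return balance


def settle(balance: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Greedily match debtors to creditors; returns (from, to, cents) payments."""
    debtors = [[p, -b] for p, b in sorted(balance.items()) if b < 0]
    creditors = [[p, b] for p, b in sorted(balance.items()) if b > 0]
    payments: List[Tuple[str, str, int]] = []
    i = j = 0
    while i < len(debtors) and j < len(creditors):
        amount = min(debtors[i][1], creditors[j][1])
        payments.append((debtors[i][0], creditors[j][0], amount))
        debtors[i][1] -= amount
        creditors[j][1] -= amount
        if debtors[i][1] == 0:
            i += 1
        if creditors[j][1] == 0:
            j += 1
    return payments


expenses = [
    ("Ana", 9000, ["Ana", "Ben", "Cy"]),  # Ana paid 90.00 for all three
    ("Ben", 3000, ["Ana", "Ben", "Cy"]),  # Ben paid 30.00 for all three
]
balances = net_balances(expenses)
print(balances)
print(settle(balances))
```

Here two expenses collapse into two payments to Ana, because only each person's net position matters once the balances are computed.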

  • Human Race Machine

    The Human Race Machine (HRM) is a computerized console composed of four different programs. The Human Race Machine program allows participants to see themselves with the facial characteristics of six different races, including Asian, White, African, Middle Eastern, and Indian, mapped onto their own face. The Age Machine allows viewers to see an aged version of their own face. A version of this methodology has been used for over twenty years by the FBI and the National Center for Missing and Exploited Children to help locate kidnap victims and missing children. The Couples Machine combines photographs of two people in different percentages to show the appearance of their child. The Anomaly Machine lets viewers see themselves with facial anomalies. The HRM was created by artist Nancy Burson and David Kramlich; it uses morphing technology. It was shown on Oprah on February 16, 2006.

  • Mojito (framework)

    Mojito is an environment-agnostic Model-View-Controller (MVC) web application framework. It was designed by Yahoo.

    == Features ==

    Mojito supports agile development of web applications. It has built-in support for unit testing, internationalization, and syntax and coding convention checks. Both server and client components are written in JavaScript. Mojito allows developers designing web applications to leverage the utilities of both configuration and an MVC framework. Because both environments run JavaScript, Mojito can run in JavaScript-enabled web browsers and on servers using Node.js. Mojito applications mainly consist of two components:
      • JSON configuration files: these define relationships between code components, assets, routing paths, and framework defaults, and are available at the application and mojit level.
      • Directories: these reflect the MVC architecture and are used to separate resources such as assets, libraries, middleware, etc.

    == Architecture ==

    In Mojito, both server-side and client-side scripting is done in JavaScript, allowing code to run on both client and server and thereby breaking the "front-end back-end barrier." It has both client and server runtimes.

    === Server runtime ===

    This block houses operations needed by server-side components. Services include routing rules, an HTTP server, a config loader and a disk-based loader.

    === Client runtime ===

    This block houses operations called upon while running client-side components. Services include local storage/cache access and a JSON-based/URL-based loader.

    === Core ===

    Core functions can be accessed on the client or the server. Services include the registry, dispatcher, front controller, and resource store.

    === Container ===

    This is where the mojit object comes into the picture. The container also includes the services used by mojits. The API and Mojito services blocks cater to the services needed for the execution of mojits.

    === API (Action Context) ===

    Mojito services are a customizable service block.
It offers mojits a range of services which might be needed by mojit to carry out certain actions. These services can be availed at both client and server side. Reusable services can be created and aggregated to the core here. == Mojits == Mojits are the modules of a Mojito application. An application consists of one or more mojits. A mojit encompasses a Model, Views and a Controller defined by JSON configuration files. It includes a View factory where views are created according to the model and a View cache that holds frequently requested views to aid performance. === Application Architecture === A Mojito application is a set of mojits facilitated by configurable JSON files which define the code for model, view and controller. This MVC structure works with API block and Mojito services, and can be deployed at both client and server side. While the application is deployed at client side, it can call server-side modules using binders. Binders are mojit codes that let mojits request services from each other. Mojit Proxy acts as an intermediary between binders and mojit's API (application context) block and other mojits. Controllers are command-issuing units of mojits. Models mirror the core logic and hold data. Applications can have multiple models. They can be centrally accessed from controllers. View files are created in accordance with controllers and models, and are marked-up before they are sent to users as output. 
=== Application Directory Structure ===

Directory structure of a Mojito application with one mojit:

    [mojito_app]/
    |-- application.json
    |-- assets/
    |   `-- favicon.icon
    |-- yui_modules/
    |   `-- .{affinity}.js
    |-- index.js
    |-- mojits/
    |   `-- [mojit_name]/
    |       |-- assets/
    |       |-- yui_modules/
    |       |   `-- .{affinity}.js
    |       |-- binders/
    |       |   `-- {view_name}.js
    |       |-- controller.{affinity}.js
    |       |-- defaults.json
    |       |-- definition.json
    |       |-- lang/
    |       |   `-- {mojit_name}_{lang}.js
    |       |-- models/
    |       |   `-- {model_name}.{affinity}.js
    |       |-- tests/
    |       |   |-- yui_modules/
    |       |   |   `-- {module_name}.{affinity}-tests.js
    |       |   |-- controller.{affinity}-tests.js
    |       |   `-- models/
    |       |       `-- {model_name}.{affinity}-tests.js
    |       `-- views/
    |           |-- {view_name}.{view_engine}.html
    |           `-- {view_name}.{device}.{view_engine}.html
    |-- package.json
    |-- routes.json (deprecated)
    |-- server.js

== Model, View and Controller ==

The Model hosts data, which is accessed by the Controller and presented to the View. The Controller also handles any client requests for data, in which case it fetches data from the Model and passes the data to the client. All three components are clustered in the mojit. Mojits are physically represented by directory structures, and an application can have multiple mojits. Every mojit can have one controller, one or more views, and zero or more models.

=== Model ===

The model represents the application data and is independent of the view and the controller. It contains code to manipulate the data. Models are found in the models directory of each mojit. Functions include:
  • Storing information for access by the controller
  • Validation and error handling
  • Metadata required by the view

=== Controller ===

The controller acts as a connecting agent between model and view. It supplies input to the model and, after fetching data from the model, passes it to the view.
Functions include:
  • Redirection
  • Monitoring authentication
  • Web safety
  • Encoding

=== View ===

The view acts as a presentation filter by highlighting some model attributes and suppressing others. A view can be understood as a visual permutation of the model. The view renders data received from the controller and displays it to the end user.
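
The division of labour between model, controller, and view described above can be shown with a small, framework-neutral sketch. Mojito itself is written in JavaScript and has its own API; the Python below does not reproduce that API, and every class and function name in it is hypothetical.

```python
from typing import Callable, Dict


class GreetingModel:
    """Holds application data and validates access to it; knows nothing about views."""

    def __init__(self) -> None:
        self._greetings = {"en": "Hello", "fr": "Bonjour"}

    def greeting_for(self, lang: str) -> str:
        if lang not in self._greetings:
            raise ValueError(f"unsupported language: {lang}")
        return self._greetings[lang]


def greeting_view(data: Dict[str, str]) -> str:
    """Renders the data handed over by the controller as markup for the end user."""
    return f"<p>{data['greeting']}, {data['user']}!</p>"


class GreetingController:
    """Connects model and view: takes request input, fetches data, passes it on."""

    def __init__(self, model: GreetingModel, view: Callable[[Dict[str, str]], str]) -> None:
        self.model = model
        self.view = view

    def index(self, params: Dict[str, str]) -> str:
        lang = params.get("lang", "en")
        data = {
            "greeting": self.model.greeting_for(lang),
            "user": params.get("user", "world"),
        }
        return self.view(data)


controller = GreetingController(GreetingModel(), greeting_view)
print(controller.index({"lang": "fr", "user": "Ana"}))
```

Because the view is passed in as a plain function, one controller can be paired with several views, which mirrors the rule that a mojit has one controller and one or more views.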

  • Cloud-Based Secure File Transfer

    Cloud-Based Secure File Transfer is a managed or hosted file transfer service that provides cloud storage that can be accessed via SSH File Transfer Protocol (SFTP). These services allow secure, reliable file transfers while offering the scalability, redundancy, and high availability of cloud infrastructure. == Technical overview == The evolution of file transfer protocols began with File Transfer Protocol (FTP) and SSH File Transfer Protocol (SFTP). SFTP offered enhanced security through the use of SSH (Secure Shell) encryption, which addressed many of the security concerns associated with traditional FTP. Over time, as businesses increasingly adopted cloud infrastructure, the demand for services that integrate secure file transfer with cloud storage led to the rise of Cloud-Based Secure File Transfer services. These services combine the benefits of secure, encrypted file transfer with the scalability and flexibility of cloud-based storage systems. Traditional on-premises SFTP typically involves setting up and managing physical or virtual servers to handle file transfers. In contrast, Cloud-Based Secure File Transfer utilizes managed cloud infrastructure, such as AWS EC2, Azure VMs, or Google Cloud, to automate scaling, ensure redundancy, and provide high availability. These cloud environments can be configured to automatically scale with demand, enabling businesses to handle large volumes of data transfers without the need for extensive physical hardware. == Features == Scalability and availability: Cloud-Based Secure File Transfer services are inherently scalable, with features like load balancing, multi-region deployments, and auto-scaling groups that adjust resources in response to traffic spikes. This ensures that the system can handle varying workloads and provides continuous availability, even during high-demand periods. 
Cost-effectiveness: By eliminating the need for physical infrastructure and reducing ongoing server maintenance costs, Cloud-Based Secure File Transfer services offer significant cost savings compared to traditional on-premises services. Cloud providers typically offer pay-as-you-go pricing models, where users only pay for the resources they use, further optimizing costs. Security and compliance: Cloud-Based Secure File Transfer products offer strong security measures, including end-to-end encryption, key management, detailed logging, and auditing. These services are often compliant with industry regulations such as HIPAA (Health Insurance Portability and Accountability Act), GDPR (General Data Protection Regulation), and SOC 2 (System and Organization Controls), ensuring that data transfers meet necessary security and privacy standards. == Cloud-Based Secure File Transfer providers == == Uses == Cloud-Based Secure File Transfer is used across various industries to securely transfer sensitive data and integrate into business workflows. In healthcare, Cloud-Based Secure File Transfer is essential for securely transferring electronic Protected Health Information (ePHI), ensuring compliance with regulations like HIPAA. In financial institutions, it is used to protect sensitive financial data during transfer, maintaining privacy and security. Data analytics also benefits from Cloud-Based Secure File Transfer, offering a secure and efficient method for transferring large datasets between systems or partners. Technically, Cloud-Based Secure File Transfer is often integrated into enterprise workflows through automated file transfers, using scripting or APIs. It also plays a key role in cloud backup and disaster recovery, ensuring that files are securely transferred and stored in cloud environments, which supports business continuity. However, businesses must address certain implementation challenges. 
Despite its secure design, Cloud-Based Secure File Transfer is not immune to risks such as misconfigured SSH keys, improper access control, or inadequate encryption. Regular security audits and careful configuration management are necessary to minimize the risk of data breaches. Additionally, integrating Cloud-Based Secure File Transfer with legacy systems can present challenges, such as incompatible APIs or outdated authentication methods. == Comparisons with related technologies == Cloud-Based Secure File Transfer differs from traditional SFTP primarily in its deployment and management model. Traditional SFTP services are typically hosted on-premises or on virtual servers, requiring manual configuration, ongoing infrastructure maintenance, and security management by in-house IT teams. In contrast, Cloud-Based Secure File Transfer is offered as a Software-as-a-Service (SaaS) service, reducing infrastructure overhead by eliminating the need for dedicated hardware or virtual machines. This model simplifies management through centralized web-based interfaces, automated updates, and built-in scalability. While Cloud-Based Secure File Transfer is focused on providing secure file transfers over the SFTP protocol, Managed File Transfer (MFT) platforms generally support a broader range of protocols, including FTP, FTPS, HTTP/S, and AS2. MFT services often include advanced features such as end-to-end encryption, extensive automation, compliance reporting, and integration with enterprise systems. Cloud-Based Secure File Transfer services may offer some of these features but are typically more lightweight and streamlined, targeting organizations seeking a secure and scalable alternative to traditional SFTP without the full suite of MFT capabilities. As such, Cloud-Based Secure File Transfer can be seen as a specialized subset within the broader managed file transfer ecosystem.

  • Systems development life cycle

    The systems development life cycle (SDLC) describes the typical phases and progression between phases during the development of a computer-based system. These phases progress from inception to retirement. At base, there is just one life cycle, but the taxonomy used to describe it may vary; the cycle may be classified into different numbers of phases and various names may be used for those phases. The SDLC is analogous to the life cycle of a living organism from its birth to its death. In particular, the SDLC varies by system in much the same way that each living organism has a unique path through its life. The SDLC does not prescribe how engineers should go about their work to move the system through its life cycle. Prescriptive techniques are referred to using various terms such as methodology, model, framework, and formal process. Other terms are used for the same concept as SDLC, including software development life cycle (also SDLC), application development life cycle (ADLC), and system design life cycle (also SDLC). These other terms focus on a different scope of development and are associated with different prescriptive techniques, but are about the same essential life cycle. The term "life cycle" is often written without a space, as "lifecycle", with the former more popular in the past and in non-engineering contexts. The acronym SDLC was coined when the longer form was more popular and has remained associated with the expansion, even though the shorter form is popular in engineering. Also, SDLC is relatively unique as opposed to the TLA SDL, which is highly overloaded. == Phases == Depending on the source, the SDLC is described as having different phases and using different terms. Even so, there are common aspects. The following attempts to describe notable phases using notable terminology. The phases are somewhat ordered by the natural sequence of development, although they can be overlapping and iterative. 
=== Conceptualization === During conceptualization (a.k.a. conceptual design, system investigation, feasibility), options and priorities are considered. A feasibility study can determine whether the development effort is worthwhile via activities such as understanding user needs, cost estimation, benefit analysis, and resource analysis. A study should address operational, financial, technical, human factors, and legal/political concerns. === Requirements analysis === Requirements analysis (a.k.a. preliminary design) involves understanding the problem and determining what is needed. Often this involves engaging users to define the requirements and recording them in a document known as a requirements specification. === Design === During the design phase (a.k.a. detail design), a solution is planned. The plan can include relatively high-level information such as describing the major components of the system. The plan can include relatively low-level information such as describing functions, screen layout, business rules, and process flow. The design phase is informed by the requirements of the system. The design must satisfy each requirement. The design may be recorded in textual documents as well as functional hierarchy diagrams, example screen images, business rules, process diagrams, pseudo-code, and data models. === Construction === During construction (a.k.a. implementation, production), the system is realized. Based on the design, hardware and software components are created and integrated. This phase includes testing sub-components, components and the integration of some components, but typically does not include testing at the complete system level. This phase may include the development of training materials, including user manuals and help files. === Acceptance === The acceptance phase (a.k.a. system testing) is about testing the complete system to ensure that it meets customer expectations (requirements). === Deployment === The deployment phase (a.k.a. 
implementation) involves the logistics of delivery to the customer. Some systems are deployed as a single instance (e.g., in the cloud), and deployment may be ad hoc and manual. Some systems are built in quantity and are associated with manufacturing processes and commissioning. This phase may include training users to use the system. It may include transitioning future development to support staff. === Maintenance === During the maintenance phase (a.k.a. operation, utilization, support), development is largely inactive, although this phase does include customer support for resolving user issues and recording suggestions for improvement. Fixes and enhancements are handled by returning to the first phase, conceptualization. For minor changes, the cycle may be significantly abbreviated compared to initial development. === Decommission === Decommission (a.k.a. disposition, retirement, phase-out) is when the system is removed from use, i.e., when it reaches end-of-life. == Practices == === Management and control === SDLC phase objectives are described in this section with key deliverables, a description of recommended tasks, and a summary of related control objectives for effective management. It is critical for the project manager to establish and monitor control objectives while executing projects. Control objectives are clear statements of the desired result or purpose and should be defined and monitored throughout a project. Control objectives can be grouped into major categories (domains), and relate to the SDLC phases as shown in the figure. To manage and control a substantial SDLC initiative, a work breakdown structure (WBS) captures and schedules the work. The WBS and all programmatic material should be kept in the "project description" section of the project notebook. The project manager chooses a WBS format that best describes the project. 
The diagram shows that coverage spans numerous phases of the SDLC, but the associated MCD (Management Control Domains) shows mappings to SDLC phases. For example, Analysis and Design is primarily performed as part of the Acquisition and Implementation Domain, and System Build and Prototype is primarily performed as part of delivery and support. === Work breakdown structured organization === The upper section of the WBS provides an overview of the project scope and timeline. It should also summarize the major phases and milestones. The middle section is based on the SDLC phases. WBS elements consist of milestones and tasks to be completed rather than activities to be undertaken, and have a deadline. Each task has a measurable output (e.g., an analysis document). A WBS task may rely on one or more activities (e.g., coding). Parts of the project needing support from contractors should have a statement of work (SOW). The development of an SOW does not occur during a specific phase of SDLC but is developed to include the work from the SDLC process that may be conducted by contractors. === Baselines === Baselines are established after four of the five phases of the SDLC, and are critical to the iterative nature of the model. Baselines become milestones. functional baseline: established after the conceptual design phase. allocated baseline: established after the preliminary design phase. product baseline: established after the detailed design and development phase. updated product baseline: established after the production construction phase. In the following diagram, these stages are divided into ten steps, from definition to creation and modification of IT work products:

    Read more →
  • Maximum inner-product search

    Maximum inner-product search

    Maximum inner-product search (MIPS) is a search problem, with a corresponding class of search algorithms which attempt to maximise the inner product between a query and the data items to be retrieved. MIPS algorithms are used in a wide variety of big data applications, including recommendation algorithms and machine learning. Formally, for a database of vectors $x_i$ defined over a set of labels $S$ in an inner product space with an inner product $\langle \cdot, \cdot \rangle$ defined on it, MIPS search can be defined as the problem of determining $\underset{i \in S}{\operatorname{arg\,max}}\ \langle x_i, q \rangle$ for a given query $q$. Although there is an obvious linear-time implementation, it is generally too slow to be used on practical problems. However, efficient algorithms exist to speed up MIPS search. Under the assumption of all vectors in the set having constant norm, MIPS can be viewed as equivalent to a nearest neighbor search (NNS) problem in which maximizing the inner product is equivalent to minimizing the corresponding distance metric in the NNS problem. Like other forms of NNS, MIPS algorithms may be approximate or exact. MIPS search is used as part of DeepMind's RETRO algorithm.
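The linear-time scan and the constant-norm equivalence with nearest neighbor search can be illustrated with a minimal sketch. The function names (`mips_exact`, `nearest_neighbor`, `normalize`) and the toy vectors are assumptions made for this example and do not come from any particular library. The equivalence follows from $\lVert x - q \rVert^2 = \lVert x \rVert^2 + \lVert q \rVert^2 - 2\langle x, q \rangle$: when $\lVert x \rVert$ is the same for every database vector, minimizing the distance is the same as maximizing the inner product.

```python
import math


def inner(x, y):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def mips_exact(database, query):
    """Exact MIPS by linear scan: index of the vector with the largest inner product."""
    return max(range(len(database)), key=lambda i: inner(database[i], query))


def nearest_neighbor(database, query):
    """Exact Euclidean nearest neighbor by linear scan."""
    return min(range(len(database)), key=lambda i: math.dist(database[i], query))


def normalize(x):
    """Scale a vector to unit norm."""
    norm = math.sqrt(inner(x, x))
    return [a / norm for a in x]


# Toy data (hypothetical): norms differ, so MIPS and NNS may disagree.
database = [[5.0, 0.0], [1.0, 1.0], [0.0, 0.5]]
query = [1.0, 1.0]

print(mips_exact(database, query), nearest_neighbor(database, query))

# After normalizing every database vector, both searches pick the same item.
unit_database = [normalize(v) for v in database]
print(mips_exact(unit_database, query), nearest_neighbor(unit_database, query))
```

In the unnormalized case the long vector wins the inner-product comparison even though it is not the closest point, which is why MIPS cannot in general be replaced by a plain nearest neighbor search without a norm assumption or a transformation of the data.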

    Read more →
  • Toolchain

    Toolchain

    A toolchain is a set of software development tools used to build and otherwise develop software. Often, the tools are executed sequentially and form a pipeline such that the output of one tool is the input for the next. Sometimes the term is used for a set of related tools that are not necessarily executed sequentially. A relatively common and simple toolchain consists of the tools to build for a particular operating system (OS) and CPU architecture: a compiler, a linker, and a debugger. With a cross-compiler, a toolchain can support cross-platform development. For building more complex software systems, many other tools may be in the toolchain. For example, for a video game, the toolchain may include tools for preparing sound effects, music, textures, 3-dimensional models and animations, and for combining these resources into the finished product.

    Read more →
  • Evntlive

    Evntlive

    Evntlive was an interactive digital concert venue that allowed music fans worldwide to stream concerts to their computer, tablet, or phone. Based in Redwood City, CA, EVNTLIVE Beta launched on April 15, 2013. EVNTLIVE provided users with the ability to switch camera angles, view All Access interviews and clips from artists, buy music, and chat with other online concert-goers in the in-app feature. Users could watch live and on-demand concerts with both free and pay-per-view concerts offered. In its first two months, EVNTLIVE streamed live performances of popular artists ranging from Bon Jovi to Wale, as well as music festivals such as Taste of Country and Mountain Jam; including performances by The Lumineers, Gary Clark Jr., Phil Lesh & Friends, Primus, and more. On December 6, 2013, Evntlive was acquired and absorbed by Yahoo!. The site ceased operations and redirected viewers to Yahoo! Music and Yahoo! Screen promptly afterwards. == About the Platform == EvntLive is an HTML5, web-based platform available on laptops, iPads, and mobile devices. Users must register for a free account on Evntlive’s website in order to reserve tickets and access live and on-demand content. Once they reserve tickets, they can view All Access features from their favorite artists or bands, purchase music, and interact with other online audience members using Buzz. Users can also switch between alternate camera angles as though they are on the concert floor - sharing the experience with their friends online in real-time. 
EvntLive was acquired by Yahoo in December 2013 == Artists == Bon Jovi Wale Escape the Fate The Parlotones === Taste of Country Music Festival === Trace Adkins Willie Nelson Justin Moore Montgomery Gentry Craig Campbell Blackberry Smoke Gloriana Dustin Lynch LoCash Cowboys Rachel Farley Parmalee Joe Nichols === Mountain Jam Music Festival === Source: The Lumineers Primus Widespread Panic Gov't Mule Phil Lesh The Avett Brothers Dispatch Rubblebucket Michael Franti Jackie Greene Deer Tick Gary Clark Jr. ALO The London Souls Nicki Bluhm Amy Helm The Lone Bellow The Revivalists Swear and Shake Roadkill Ghost Choir Michael Bernard Fitzgerald Michele Clark 's Sunset Sessions Semi Precious Weapons Dale Earnhardt Jr. Jr. DigiTour Media Pentatonix Allstar Weekend Tyler Ward === Launch Music Festival ===

    Read more →
  • List of COBOL software and tools

    List of COBOL software and tools

    This is a list of software and programming tools for the COBOL programming language, which includes compilers, IDEs, build tools, testing, frameworks, and related projects. == Compilers and runtimes == Fujitsu NetCOBOL — COBOL compiler for Windows, Linux, and mainframes GnuCOBOL — open-source COBOL compiler translating COBOL to C and then compiling with GCC IBM COBOL — mainframe COBOL compiler for IBM z/OS and IBM i platforms Micro Focus COBOL — commercial COBOL compiler and runtime for enterprise systems FairCom RTG – A commercial real-time database and runtime solution developed by FairCom Corporation. It provides integration with COBOL applications for transaction processing and modernization projects, and is used in enterprise environments requiring high-performance data management. == Integrated development environments == Eclipse IDE — with COBOL plugin support, Micro Focus or Bitlang extensions. IBM Developer for z/OS — IDE for COBOL and PL/I mainframe development Micro Focus Visual COBOL — IDE integration for Visual Studio, Visual Studio Code, and Eclipse OpenCOBOLIDE — open-source lightweight IDE for GnuCOBOL Visual Studio Code — with COBOL extensions via Bitlang COBOL and GnuCOBOL Language Server == Frameworks, libraries, and APIs == ACUCOBOL-GT — runtime and API library suite from Micro Focus CICS — IBM middleware for transaction processing in COBOL applications DB2 and IMS APIs — database access libraries commonly used with COBOL applications == Build tools and package managers == Apache Ant — scripting and build automation for COBOL/Java hybrid systems GNU Make — common build tool for compiling COBOL via GnuCOBOL Jenkins — used for CI/CD automation with COBOL builds == Testing and quality assurance == COBOL Check — open-source unit testing framework for COBOL IBM Rational Performance Tester — automated performance testing of web and server-based applications from the Rational Software division of IBM Micro Focus Unit Testing Framework — integrated 
COBOL unit testing tool == Debugging and profiling tools == GnuCOBOL debug mode — command-line debugging integrated in GnuCOBOL compiler IBM Debug Tool for z/OS — mainframe debugging for COBOL and PL/I Micro Focus Animator — step-through debugger for COBOL code

    Read more →
  • Speculative decoding

    Speculative decoding

    Speculative decoding is an inference-time optimization for autoregressive large language models (LLMs) that generates multiple tokens per decoding step instead of one. A smaller draft model proposes a sequence of candidate tokens, and the larger target model verifies them in a single forward pass through a modified rejection sampling scheme. The verification preserves the target model's original output distribution, so the technique produces the same results as standard decoding while cutting latency by roughly two to three times. The name is an analogy to speculative execution in CPU design, where a processor runs instructions along a predicted branch before the outcome is known. == Background == Standard autoregressive decoding in large language models generates one token at a time. The model computes a probability distribution over its vocabulary, samples the next token, and feeds that token back as input. For large models, this process is bottlenecked by memory bandwidth rather than arithmetic throughput: loading the model's parameters from high-bandwidth memory (HBM) to the processor takes up most of the wall-clock time at each step. Because of this, a forward pass over one token and a forward pass over several tokens in a batch take roughly the same time. Speculative decoding relies on this property. == Mechanism == The technique alternates between two phases: drafting and verification. During drafting, a fast approximation model generates a short run of K candidate tokens, typically between 3 and 12. The draft model is usually a much smaller version of the target model or a lightweight auxiliary network. During verification, the target model scores the entire draft sequence in one batched forward pass. A modified rejection sampling algorithm compares the draft and target probabilities at each position. 
A draft token is always accepted if the target model assigns it at least as much probability as the draft model did; otherwise it is accepted with probability equal to the ratio of the target probability to the draft probability. The first token that is rejected is resampled from a corrected distribution (the normalized positive part of the difference between the target and draft distributions), and everything after it is discarded. The result is that the output distribution is the same as if each token had been generated one at a time. How many tokens get accepted per cycle depends on how well the draft model matches the target. For common words and predictable continuations the match tends to be good, so the target model can confirm several tokens at once. == History == An early precursor was blockwise parallel decoding, proposed in 2018 by Stern, Shazeer, and Uszkoreit. Their method predicted multiple future tokens through auxiliary prediction heads and validated them against the autoregressive model, but it only worked with greedy decoding and did not preserve the full sampling distribution. The modern form of the technique came from Yaniv Leviathan, Matan Kalman, and Yossi Matias at Google Research, who posted "Fast Inference from Transformers via Speculative Decoding" on arXiv in November 2022. Separately and at about the same time, Charlie Chen and colleagues at DeepMind arrived at a closely related method they called speculative sampling, published in February 2023. Both papers introduced the use of rejection sampling to guarantee that the output distribution is unchanged. Leviathan et al. showed roughly 2–3x speedup on T5-XXL (11 billion parameters); Chen et al. reported 2–2.5x on the Chinchilla model (70 billion parameters). The Leviathan et al. paper was presented as an oral at the International Conference on Machine Learning in July 2023. == Variants == SpecInfer (Miao et al., 2024) uses multiple small language models to jointly build a tree of candidate continuations rather than a single chain. The target model verifies the whole tree in parallel and keeps the longest valid path, with reported speedups of 1.5–3.5x. 
Medusa (Cai et al., 2024) takes a different approach by not using a separate draft model at all. Extra lightweight decoding heads are attached to the target model itself, and each one predicts a token at a different future position. The candidates are evaluated through a tree-structured attention mechanism. The authors measured 2.2–3.6x speedup. EAGLE (Li et al., 2024) performs autoregression on the target model's internal feature representations (specifically the second-to-top layer) rather than on tokens directly. On LLaMA 2 Chat 70B, this gave a 2.7–3.5x latency reduction. Later versions added dynamic draft trees (EAGLE-2) and further optimizations (EAGLE-3), reaching 3–6.5x speedup. == Adoption == By 2024, speculative decoding had become a standard part of production LLM serving. Google uses it in the AI Overviews feature of Google Search. Open-source inference frameworks such as vLLM, NVIDIA's TensorRT-LLM, and SGLang all include built-in support for speculative decoding and its variants. Apple, AWS, and Meta have also published research extending the method or deploying it at scale.
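The draft-and-verify cycle described under Mechanism can be sketched with toy distributions. This is an illustrative sketch under stated assumptions, not the implementation from either paper: `draft_dist` and `target_dist` are hypothetical stand-ins for real models (here they ignore the context and return fixed probabilities over a three-token vocabulary), and the verification loop scores positions one at a time, whereas a real system would obtain all target probabilities from a single batched forward pass.

```python
import random

# Hypothetical toy "models": fixed next-token probabilities over 3 tokens.
TARGET = [0.5, 0.3, 0.2]
DRAFT = [0.2, 0.2, 0.6]


def target_dist(context):
    return TARGET


def draft_dist(context):
    return DRAFT


def sample(probs, rng):
    """Draw a token index according to the given (unnormalized) weights."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def speculative_step(draft_dist, target_dist, k, prefix, rng):
    """One cycle: draft k tokens, verify them, return the tokens that are kept."""
    # Drafting: the cheap model proposes k tokens autoregressively.
    context = list(prefix)
    drafted = []
    for _ in range(k):
        q = draft_dist(tuple(context))
        token = sample(q, rng)
        drafted.append((token, q))
        context.append(token)

    # Verification: accept each token with probability min(1, p / q).
    context = list(prefix)
    kept = []
    for token, q in drafted:
        p = target_dist(tuple(context))
        if rng.random() < min(1.0, p[token] / q[token]):
            kept.append(token)
            context.append(token)
            continue
        # First rejection: resample from the positive part of (p - q),
        # then discard the rest of the draft.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        kept.append(sample(residual, rng))
        return kept

    # Every draft token was accepted: take one extra token from the target.
    kept.append(sample(target_dist(tuple(context)), rng))
    return kept


print(speculative_step(draft_dist, target_dist, 4, (), random.Random(0)))
```

Each cycle yields between one and k + 1 tokens. Because a rejected token is replaced by a draw from the residual distribution, the first token of every cycle is distributed exactly according to the target model, which is the property that keeps the overall output distribution unchanged.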

    Read more →
  • Apache CarbonData

    Apache CarbonData

    Apache CarbonData is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in Hadoop, namely RCFile and ORC. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == CarbonData was developed at Huawei in 2013. The project was donated to the Apache community in 2015 and submitted to the Apache Incubator in June 2016. The project won top honors in the Big Data category of the BlackDuck 2016 Open Source Rookies of the Year awards. Apache CarbonData has been a top-level Apache Software Foundation (ASF)-sponsored project since May 1, 2017.

    Read more →
  • Web application firewall

    Web application firewall

    A Web application firewall (WAF) is a specific form of application firewall that filters, monitors, and blocks HTTP traffic to and from a web service. By inspecting HTTP traffic, it can prevent attacks exploiting a Web application's known vulnerabilities, such as SQL injection, cross-site scripting (XSS), file inclusion, and improper system configuration. Financial institutions often utilize WAFs to help in the mitigation of Web application zero-day vulnerabilities, as well as hard-to-patch bugs or weaknesses through custom attack signature strings. == History == Dedicated Web application firewalls entered the market in the late 1990s during a time when web server attacks were becoming more prevalent. Early WAF products, from Kavado and Gilian technologies, tried to solve the increasing amount of attacks on Web applications in the late 1990s. In 2002, the open-source project ModSecurity was formed in order to make WAF technology more accessible. They finalized a core rule set for protecting Web applications, based on OASIS Web Application Security Technical Committee’s (WAS TC) vulnerability work. In 2003, they expanded and standardized rules through the Open Web Application Security Project’s (OWASP) Top 10 List, an annual ranking for Web security vulnerabilities. This list would become the industry standard for Web application security compliance. Since then, the market has continued to grow and evolve, especially focusing on credit card fraud prevention. With the development of the Payment Card Industry Data Security Standard (PCI DSS), a standardization of control over cardholder data, security has become more regulated in this sector. == Description == A Web application firewall is a special type of application firewall that applies specifically to Web applications. It is deployed in front of Web applications and analyzes bi-directional web-based (HTTP) traffic – detecting and blocking anything malicious. 
The OWASP provides a broad technical definition for a WAF as “a security solution on the Web application level which – from a technical point of view – does not depend on the application itself”. According to the PCI DSS Information Supplement for requirement 6.6, a WAF is defined as “a security policy enforcement point positioned between a Web application and the client endpoint. This functionality can be implemented in software or hardware, running in an appliance device, or in a typical server running a common operating system. It may be a stand-alone device or integrated into other network components.” In other words, a WAF can be a virtual or physical appliance that prevents vulnerabilities in Web applications from being exploited by outside threats. These vulnerabilities may be because the application itself is a legacy type or was insufficiently coded by design. The WAF addresses these code shortcomings by special configurations of rule-sets, also known as policies. Previously unknown vulnerabilities can be discovered through penetration testing or via a vulnerability scanner. A Web application vulnerability scanner, also known as a web application security scanner, is defined in the SAMATE NIST 500-269 as “an automated program that examines Web applications for potential security vulnerabilities. In addition to searching for Web application-specific vulnerabilities, the tools also look for software coding errors.” Resolving vulnerabilities is commonly referred to as remediation. Corrections to the code can be made in the application, but typically a more prompt response is necessary. In these situations, the application of a custom policy for a unique Web application vulnerability to provide a temporary but immediate fix (known as a virtual patch) may be necessary. 
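The idea of a virtual patch can be made concrete with a small sketch. Everything named here is hypothetical: the `VirtualPatch` class, the `/search` endpoint, the `q` parameter, and the deliberately crude signature are assumptions chosen for illustration and do not reflect the rule syntax of any real WAF product. The point is only that the rule targets one known-vulnerable input of one application and blocks matching requests before they reach the unpatched code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualPatch:
    """A custom policy scoped to a single endpoint and parameter."""
    path: str
    parameter: str
    pattern: re.Pattern

    def blocks(self, path, params):
        # Only inspect the endpoint this patch was written for.
        if path != self.path:
            return False
        value = params.get(self.parameter, "")
        return bool(self.pattern.search(value))


# Hypothetical signature for SQL-injection attempts against /search?q=...
# It is crude on purpose and would also block some legitimate input.
PATCHES = [
    VirtualPatch(
        path="/search",
        parameter="q",
        pattern=re.compile(r"('|--|;|\bunion\b)", re.IGNORECASE),
    )
]


def filter_request(path, params):
    """Return an HTTP status: 403 if any virtual patch matches, otherwise 200."""
    for patch in PATCHES:
        if patch.blocks(path, params):
            return 403
    return 200


print(filter_request("/search", {"q": "shoes"}))
print(filter_request("/search", {"q": "x' UNION SELECT password FROM users --"}))
```

Scoping the rule to a single path and parameter limits false positives elsewhere in the application, which is one reason a virtual patch can be deployed quickly while the underlying code fix is still in progress.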
WAFs are not an ultimate security solution; rather, they are meant to be used in conjunction with other network perimeter security solutions such as network firewalls and intrusion prevention systems to provide a holistic defense strategy. WAFs typically follow a positive security model, a negative security model, or a combination of both, as mentioned by the SANS Institute. WAFs use a combination of rule-based logic, parsing, and signatures to detect and prevent attacks such as cross-site scripting and SQL injection. In general, features like browser emulation, obfuscation and virtualization, and IP obfuscation are used to attempt to bypass WAFs. The OWASP produces a list of the top ten Web application security flaws. All commercial WAF offerings cover these ten flaws at a minimum. There are non-commercial options as well. As mentioned earlier, the well-known open-source WAF engine called ModSecurity is one of these options. A WAF engine alone is insufficient to provide adequate protection; therefore OWASP, along with Trustwave's Spiderlabs, helps organize and maintain a Core-Rule Set via GitHub to use with the ModSecurity WAF engine. == Deployment options == Although the names for operating modes may differ, WAFs are basically deployed inline in three different ways. According to NSS Labs, deployment options are transparent bridge, transparent reverse proxy, and reverse proxy. "Transparent" refers to the fact that the HTTP traffic is sent straight to the Web application; therefore the WAF is transparent between the client and server. This is in contrast to reverse proxy, where the WAF acts as a proxy, and the client’s traffic is sent directly to the WAF. The WAF then separately sends filtered traffic to Web applications. This can provide additional benefits such as IP masking but may introduce disadvantages such as performance latencies. 
== JA3 fingerprint == JA3, developed by Salesforce in 2017, is a technique for generating a unique fingerprint for SSL/TLS traffic based on specific fields in the handshake, such as the version, cipher suites, and extensions used by the client. This fingerprint enables the identification and tracking of clients based on the characteristics of their encrypted traffic. In the context of distributed denial of service (DDoS) protection, JA3 fingerprints are used to detect and differentiate malicious traffic, often associated with attack bots, from legitimate traffic, allowing for more precise filtering of potential threats. In September 2023, AWS WAF announced built-in support for JA3, enabling customers to inspect the JA3 fingerprints of incoming requests. JA3 was deprecated in May 2025 in favor of JA4. JA4 is currently patent pending.
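A JA3-style fingerprint can be sketched in a few lines. The text above names only the version, cipher suites, and extensions; the two additional fields (elliptic curves and point formats), the separator characters, the MD5 digest, and the skipping of GREASE placeholder values follow the publicly documented JA3 format as understood here and should be treated as assumptions of this sketch. The function names and the sample handshake values are hypothetical, and parsing the ClientHello itself is out of scope.

```python
import hashlib

# GREASE placeholder values (0x0A0A, 0x1A1A, ..., 0xFAFA) are skipped so that
# randomized placeholders do not change the fingerprint.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the decimal field values: '-' within a field, ',' between fields."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join(
        "-".join(str(value) for value in field if value not in GREASE)
        for field in fields
    )


def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string (32 hexadecimal characters)."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, including one GREASE cipher that is dropped.
hello = (769, [47, 53, 0x0A0A], [0, 10, 11], [23, 24, 25], [0])
print(ja3_string(*hello))
print(ja3_fingerprint(*hello))
```

Because the fingerprint depends only on how the client builds its handshake, two requests from the same TLS library and configuration produce the same hash regardless of source IP address, which is what makes it useful for grouping bot traffic.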

    Read more →