AI Avatar Tools

AI Avatar Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Reparameterization trick

    Reparameterization trick

    The reparameterization trick (aka "reparameterization gradient estimator") is a technique used in statistical machine learning, particularly in variational inference, variational autoencoders, and stochastic optimization. It allows for the efficient computation of gradients through random variables, enabling the optimization of parametric probability models using stochastic gradient descent, and the variance reduction of estimators. It was developed in the 1980s in operations research, under the name of "pathwise gradients", or "stochastic gradients". Its use in variational inference was proposed in 2013. == Mathematics == Let z {\displaystyle z} be a random variable with distribution q ϕ ( z ) {\displaystyle q_{\phi }(z)} , where ϕ {\displaystyle \phi } is a vector containing the parameters of the distribution. === REINFORCE estimator === Consider an objective function of the form: L ( ϕ ) = E z ∼ q ϕ ( z ) [ f ( z ) ] {\displaystyle L(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[f(z)]} Without the reparameterization trick, estimating the gradient ∇ ϕ L ( ϕ ) {\displaystyle \nabla _{\phi }L(\phi )} can be challenging, because the parameter appears in the random variable itself. In more detail, we have to statistically estimate: ∇ ϕ L ( ϕ ) = ∇ ϕ ∫ d z q ϕ ( z ) f ( z ) {\displaystyle \nabla _{\phi }L(\phi )=\nabla _{\phi }\int dz\;q_{\phi }(z)f(z)} The REINFORCE estimator, widely used in reinforcement learning and especially policy gradient, uses the following equality: ∇ ϕ L ( ϕ ) = ∫ d z q ϕ ( z ) ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) = E z ∼ q ϕ ( z ) [ ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) ] {\displaystyle \nabla _{\phi }L(\phi )=\int dz\;q_{\phi }(z)\nabla _{\phi }(\ln q_{\phi }(z))f(z)=\mathbb {E} _{z\sim q_{\phi }(z)}[\nabla _{\phi }(\ln q_{\phi }(z))f(z)]} This allows the gradient to be estimated: ∇ ϕ L ( ϕ ) ≈ 1 N ∑ i = 1 N ∇ ϕ ( ln ⁡ q ϕ ( z i ) ) f ( z i ) {\displaystyle \nabla _{\phi }L(\phi )\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }(\ln q_{\phi }(z_{i}))f(z_{i})} The REINFORCE estimator has high variance, and many methods were developed to reduce its variance. === Reparameterization estimator === The reparameterization trick expresses z {\displaystyle z} as: z = g ϕ ( ϵ ) , ϵ ∼ p ( ϵ ) {\displaystyle z=g_{\phi }(\epsilon ),\quad \epsilon \sim p(\epsilon )} Here, g ϕ {\displaystyle g_{\phi }} is a deterministic function parameterized by ϕ {\displaystyle \phi } , and ϵ {\displaystyle \epsilon } is a noise variable drawn from a fixed distribution p ( ϵ ) {\displaystyle p(\epsilon )} . This gives: L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ f ( g ϕ ( ϵ ) ) ] {\displaystyle L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[f(g_{\phi }(\epsilon ))]} Now, the gradient can be estimated as: ∇ ϕ L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ ∇ ϕ f ( g ϕ ( ϵ ) ) ] ≈ 1 N ∑ i = 1 N ∇ ϕ f ( g ϕ ( ϵ i ) ) {\displaystyle \nabla _{\phi }L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[\nabla _{\phi }f(g_{\phi }(\epsilon ))]\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }f(g_{\phi }(\epsilon _{i}))} == Examples == For some common distributions, the reparameterization trick takes specific forms: Normal distribution: For z ∼ N ( μ , σ 2 ) {\displaystyle z\sim {\mathcal {N}}(\mu ,\sigma ^{2})} , we can use: z = μ + σ ϵ , ϵ ∼ N ( 0 , 1 ) {\displaystyle z=\mu +\sigma \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,1)} Exponential distribution: For z ∼ Exp ( λ ) {\displaystyle z\sim {\text{Exp}}(\lambda )} , we can use: z = − 1 λ log ⁡ ( ϵ ) , ϵ ∼ Uniform ( 0 , 1 ) {\displaystyle z=-{\frac {1}{\lambda }}\log(\epsilon ),\quad \epsilon \sim {\text{Uniform}}(0,1)} Discrete distribution can be reparameterized by the Gumbel distribution (Gumbel-softmax trick or "concrete distribution") and diffusion models. In general, any distribution that is differentiable with respect to its parameters can be reparameterized by inverting the multivariable CDF function, then apply the implicit method. See for an exposition and application to the Gamma, Beta, Dirichlet, and von Mises distributions. == Applications == === Variational autoencoder === In Variational Autoencoders (VAEs), the VAE objective function, known as the Evidence Lower Bound (ELBO), is given by: ELBO ( ϕ , θ ) = E z ∼ q ϕ ( z | x ) [ log ⁡ p θ ( x | z ) ] − D KL ( q ϕ ( z | x ) | | p ( z ) ) {\displaystyle {\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{z\sim q_{\phi }(z|x)}[\log p_{\theta }(x|z)]-D_{\text{KL}}(q_{\phi }(z|x)||p(z))} where q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is the encoder (recognition model), p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is the decoder (generative model), and p ( z ) {\displaystyle p(z)} is the prior distribution over latent variables. The gradient of ELBO with respect to θ {\displaystyle \theta } is simply E z ∼ q ϕ ( z | x ) [ ∇ θ log ⁡ p θ ( x | z ) ] ≈ 1 L ∑ l = 1 L ∇ θ log ⁡ p θ ( x | z l ) {\displaystyle \mathbb {E} _{z\sim q_{\phi }(z|x)}[\nabla _{\theta }\log p_{\theta }(x|z)]\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\theta }\log p_{\theta }(x|z_{l})} but the gradient with respect to ϕ {\displaystyle \phi } requires the trick. Express the sampling operation z ∼ q ϕ ( z | x ) {\displaystyle z\sim q_{\phi }(z|x)} as: z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ , ϵ ∼ N ( 0 , I ) {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,I)} where μ ϕ ( x ) {\displaystyle \mu _{\phi }(x)} and σ ϕ ( x ) {\displaystyle \sigma _{\phi }(x)} are the outputs of the encoder network, and ⊙ {\displaystyle \odot } denotes element-wise multiplication. Then we have ∇ ϕ ELBO ( ϕ , θ ) = E ϵ ∼ N ( 0 , I ) [ ∇ ϕ log ⁡ p θ ( x | z ) + ∇ ϕ log ⁡ q ϕ ( z | x ) − ∇ ϕ log ⁡ p ( z ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{\epsilon \sim {\mathcal {N}}(0,I)}[\nabla _{\phi }\log p_{\theta }(x|z)+\nabla _{\phi }\log q_{\phi }(z|x)-\nabla _{\phi }\log p(z)]} where z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon } . This allows us to estimate the gradient using Monte Carlo sampling: ∇ ϕ ELBO ( ϕ , θ ) ≈ 1 L ∑ l = 1 L [ ∇ ϕ log ⁡ p θ ( x | z l ) + ∇ ϕ log ⁡ q ϕ ( z l | x ) − ∇ ϕ log ⁡ p ( z l ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )\approx {\frac {1}{L}}\sum _{l=1}^{L}[\nabla _{\phi }\log p_{\theta }(x|z_{l})+\nabla _{\phi }\log q_{\phi }(z_{l}|x)-\nabla _{\phi }\log p(z_{l})]} where z l = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ l {\displaystyle z_{l}=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon _{l}} and ϵ l ∼ N ( 0 , I ) {\displaystyle \epsilon _{l}\sim {\mathcal {N}}(0,I)} for l = 1 , … , L {\displaystyle l=1,\ldots ,L} . This formulation enables backpropagation through the sampling process, allowing for end-to-end training of the VAE model using stochastic gradient descent or its variants. === Variational inference === More generally, the trick allows using stochastic gradient descent for variational inference. Let the variational objective (ELBO) be of the form: ELBO ( ϕ ) = E z ∼ q ϕ ( z ) [ log ⁡ p ( x , z ) − log ⁡ q ϕ ( z ) ] {\displaystyle {\text{ELBO}}(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[\log p(x,z)-\log q_{\phi }(z)]} Using the reparameterization trick, we can estimate the gradient of this objective with respect to ϕ {\displaystyle \phi } : ∇ ϕ ELBO ( ϕ ) ≈ 1 L ∑ l = 1 L ∇ ϕ [ log ⁡ p ( x , g ϕ ( ϵ l ) ) − log ⁡ q ϕ ( g ϕ ( ϵ l ) ) ] , ϵ l ∼ p ( ϵ ) {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi )\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\phi }[\log p(x,g_{\phi }(\epsilon _{l}))-\log q_{\phi }(g_{\phi }(\epsilon _{l}))],\quad \epsilon _{l}\sim p(\epsilon )} === Dropout === The reparameterization trick has been applied to reduce the variance in dropout, a regularization technique in neural networks. The original dropout can be reparameterized with Bernoulli distributions: y = ( W ⊙ ϵ ) x , ϵ i j ∼ Bernoulli ( α i j ) {\displaystyle y=(W\odot \epsilon )x,\quad \epsilon _{ij}\sim {\text{Bernoulli}}(\alpha _{ij})} where W {\displaystyle W} is the weight matrix, x {\displaystyle x} is the input, and α i j {\displaystyle \alpha _{ij}} are the (fixed) dropout rates. More generally, other distributions can be used than the Bernoulli distribution, such as the gaussian noise: y i = μ i + σ i ⊙ ϵ i , ϵ i ∼ N ( 0 , I ) {\displaystyle y_{i}=\mu _{i}+\sigma _{i}\odot \epsilon _{i},\quad \epsilon _{i}\sim {\mathcal {N}}(0,I)} where μ i = m i ⊤ x {\displaystyle \mu _{i}=\mathbf {m} _{i}^{\top }x} and σ i 2 = v i ⊤ x 2 {\displaystyle \sigma _{i}^{2}=\mathbf {v} _{i}^{\top }x^{2}} , with m i {\displaystyle \mathbf {m} _{i}} and v i {\displaystyle \mathbf {v} _{i}} being the mean and variance of the i {\displaystyle i} -th output neuron. The reparameterization trick can be applied to all such cases, resulting in the variational dropout method.

    Read more →
  • Write or Die

    Write or Die

    Write or Die is an online web application designed to combat writer's block by letting users of the application punish themselves if they slow down or stop typing in the application's window. How severe the punishments are depends on the mode the user chooses, which ranges from "Gentle" to "Kamikaze". It was reviewed by publications PCWorld, the Los Angeles Times and The Guardian, and it was most notably used by writers Helen Oyeyemi and David Nicholls. The creator, Jeff Printy, explained that he wrote the application because he wants "to be published and make a living as a writer."

    Read more →
  • Biomedical data science

    Biomedical data science

    Biomedical data science is a multidisciplinary field which leverages large volumes of data to promote biomedical innovation and discovery. Biomedical data science draws from various fields including Biostatistics, Biomedical informatics, and machine learning, with the goal of understanding biological and medical data. It can be viewed as the study and application of data science to solve biomedical problems. Modern biomedical datasets often have specific features which make their analyses difficult, including: Large numbers of feature (sometimes billions), typically far larger than the number of samples (typically tens or hundreds) Noisy and missing data Privacy concerns (e.g., electronic health record confidentiality) Requirement of interpretability from decision makers and regulatory bodies Many biomedical data science projects apply machine learning to such datasets. These characteristics, while also present in many data science applications more generally, make biomedical data science a specific field. Examples of biomedical data science research include: Computational genomics Computational imaging Electronic health records data mining Biomedical network science Clinical Natural Language Processing (NLP) == Computational Imaging and Deep Learning == Computational imaging is a cornerstone of biomedical data science, focusing on the development of algorithms to enhance, analyze, and interpret medical imagery. In recent years, the field has been transformed by the integration of deep learning, particularly through the use of Convolutional Neural Networks. Deep learning started from researchers manually defining characteristics like edge detection or texture representation learning. In a more modern approach of computational imaging, models automatically learn a hierarchy of features directly from raw pixel data. This overlap between data science and deep learning is applied across several key tasks: Classification: Identifying the presence of specific diseases, such as distinguishing between benign and malignant tumors in histopathology slides or detecting pneumonia in chest X-rays. Segmentation: The precise delineation of anatomical structures or lesions. A notable example is the U-Net architecture, which is widely used for biomedical image segmentation to help clinicians quantify organ volume or track tumor growth. Detection: Automating the localization of small objects, such as identifying microcalcifications in mammograms or polyps during colonoscopies. Registration: The process of aligning multiple images to provide a comprehensive view of the patient's anatomy. Even with all of these enhancements, the application of deep learning in medical imaging requires accomplishing vigorous challenges. An example of these changes is building large, annotated datasets and creating the imperative for model interpretability in clinical decision-making. == Electronic Health Records == Electronic Health Records (EHRs) are a digital alternative to patient paper charts, usually including individual records or population health information. EHRs can be used in a wide variety of applications, including research and analysation as they often include demographics, diagnoses, medications, test results, and personal statistics. === History === ==== 1960s ==== The earliest precursor is considered Dr. Lawrence Weed's problem-oriented medical record (POMR) published in the 1968 which sorts and groups medical records by medical diagnoses and symptoms. The POMR was the first system to organize based off of patient information rather than the source (doctors, nurses, attendings, etc.). In 1969, the Regenstrief Institute developed and published the Regenstrief Medical Record System which established electronic writing, storage, and retrieval of records which served as the basis for modern EHR systems. ==== 2000s ==== In 2009, the Health Information Technology for Economic and Clinical Health Act (HITECH Act) was passed in the United States. This act standardized privacy and distribution of EHRs and increased the acceptance and utilization of EHRs within medical and academic settings. == Artificial Intelligence and Machine Learning Applications == Machine Learning and Artificial Intelligence have become central tools in biomedical data science. Recent advances in large language models (LLMs) have expanded their role beyond text, with models trained directly on genomic sequences enabling tasks such as gene function prediction, variant effect analysis, and drug discovery. In clinical settings, Natural Language Processing (NLP) models are applied to electronic health records to extract structured insights from unstructured clinical notes and data, supporting diagnosis and treatment planning. Beyond genomics, AI models have been applied to protein structure prediction. AlphaFold, developed by Google DeepMind, uses deep learning to predict three-dimensional protein structures from amino acid sequences with high accuracy. These predictions have been used to support drug target identification and the study of disease mechanisms. == Knowledge Graphs == Knowledge graphs (KGs) are widely used in biomedical data science to represent and analyze complex relationships among biological and medical entities. By structuring data as nodes (e.g., genes, diseases, drugs) and edges (relationships), KGs enable computational methods to extract insights and support decision-making. These biomedical relationships can be efficiently modeled and queried using technologies such as Neo4j. === Biomedical Research Applications === KGs provide biomedical researchers with a way to model complex biological systems. They have been used to identify the relationships between diseases and biomolecules, support drug repurposing, and to uncover new biological insights. Additional applications include: Identification of novel antibiotic resistance genes through graph-based link prediction. Finding associations between miRNA and diseases. Prediction of protein-protein interactions. === Clinical Applications === In clinical settings, KGs can be used to make visual representations of a patient's electronic health records. The data obtained from these graphs can assist healthcare providers in improving patient diagnoses and prescribing more effective drugs. Additionally, embeddings derived from resources like the Unified Medical Language System (UMLS) enable natural language processing of clinical text and similarity analysis between medical concepts. === Limitations === Despite their advantages, knowledge graphs face several challenges. Some of these include: High algorithmic complexity and large biological datasets make the process computationally expensive. KG construction can be a time-consuming process that requires careful attention to assign appropriate node types and vocabularies. Using data from a wide range of datasets in one KG requires them to be effectively integrated. == Privacy == A primary challenge in biomedical data science is maintaining medical privacy. Conducting research requires that data be collected on a number of people for training and testing purposes and is stored within biomedical datasets. This poses a risk for violating patient confidentiality and may dissuade people from participating in studies. The main sources of health statistics are surveys administrative and medical records health care claims data, vital records surveillance disease registries grey literature and peer-reviewed literature. Large data collection is a useful tool for researching various medical conditions. Researchers use these large datasets of information to identify factors that may make people more susceptible to certain diseases. Large amounts of collected data can help researchers identify patterns for disease probabilities. The findings can show a person is more likely for a condition, or identify environmental, social, and personal habits that may lead to adverse health issues. Institutions researching using personal medical information come with a moral and legal responsibility to protect the use of that information. Protection of the collected information has become a big concern. Sophisticated and coordinated attacks on certain medical systems happen more frequently. Medical companies, medical insurance and private businesses have invested a great deal into the protection of personal data. Despite this, data breaches continue to be documented. The chart below shows the top healthcare breaches in 2025. For these reasons, many people have reservations about giving up their personal data. Aside from the legitimate use of personal data there have been instances where companies have found methods to profit from brokering medical information. Concerns exist regarding unauthorized use of sensitive information within these data companies. If a person is identified within a dataset, then sensitive data can be used to discriminate against them. For example, insurance companies may charge a hi

    Read more →
  • Server.com

    Server.com

    Server.com is a domain name that was owned by software as a service (SaaS) company Server Corporation. They offered a suite of services from 1996 until 2007. It was the first SaaS site to offer a variety of services and the first to use the term WebApp to describe its services. It was selected as an Incredibly Useful Site by Yahoo! Internet Life magazine. net magazine listed Server.com among the 100 most influential websites of all time. Server.com launched in 1996 offering the first online personal information manager. In 1997, they rolled out the first threaded message board service; the first web based mailing list manager; one of the first online calendar services; and one of the first online form builders. In 2000, Server.com partnered with NBCi and became server.snap.com until 2001. In 2001, Server.com was serving 100 million monthly pageviews. Media Life declared it one of the 20 biggest ad domains on the Web. In 2002, Server.com developed one of the first web-based RSS aggregators. In 2007, all services were moved to YourWebApps.com. The domain name Server.com was sold in 2009 for $770,000.

    Read more →
  • Teaspiller

    Teaspiller

    Teaspiller was a US-based web application for customers to find accountants and hire them to do their taxes and accounting online. In 2013 the company was acquired by Intuit, Inc and added to its TurboTax product line. The Teaspiller employees and code were all acquired and the product was renamed as "TurboTax CPA select". It enabled accountants to work remotely with clients (share files, send secure messages, schedule appointments), as well as find new clients looking for their specific skills through a complex search algorithm. This was done through extended profiles containing licensing information, professional histories, user ratings, peer endorsements, association memberships, and practice areas. The service had been called an H&R Block killer by Business Insider as it helped customers find accountants to prepare tax returns online. As of 2011 it had 20,000 US accountants listed on the site. The application was built using the Django framework. == History == Teaspiller was built by Vemdara, LLC, a web company based in New York and founded in 2009 by Amit Vemuri (a former VP at Travelocity). The web application was launched in 2010. In 2013 the company was acquired by Intuit as part of their TurboTax product line and renamed as "TurboTax CPA select".

    Read more →
  • Dave's Redistricting

    Dave's Redistricting

    Dave's Redistricting App (DRA) is an online web app originally created by Dave Bradlee that allows anyone to simulate redistricting a U.S. state's congressional and legislative districts. == Purpose == According to Bradlee, the software was designed to "put power in people's hands," and so that they "can see how the process works, so it's a little less mysterious than it was 10 years ago." Bradlee has noticed that many citizens are taking this process seriously and using his app to create legitimate redistricting maps that could be put in place. Some websites have called Bradlee the pioneer and cause of the rise of do-it-yourself redistricting. States such as Montana in 2021 allowed the general population to use it to submit redistricting proposals following the 2020 United States Census. Dave's Redistricting has frequently been mentioned as a resource that can be used to combat gerrymandering, given that the public has free access to it. Political science firms such as FiveThirtyEight have used the website to draw examples of gerrymandered districts, including on their famous Atlas of Redistricting. Dave Bradlee built the first generation of DRA. DRA 2020 is built by a small team of volunteers—Dave Bradlee, Terry Crowley, Alec Ramsay, and David Rinn—all with a shared passion for technology & democracy and all Microsoft veterans. Their mission is to empower civic organizations and citizen activists to advocate for fair congressional and legislative districts and increased transparency in the redistricting process. == Functions == Users can redraw the congressional and state legislative districts for all 50 states, the District of Columbia, and Puerto Rico using a variety of census and election datasets including Cook PVI. Maps can be optimized for different criteria. DRA 2020 added several major features to the first generation app: Sharing & collaborative editing of maps, like Google Docs Multiple statewide elections for all 50 states including the ability to import your own data Comprehensive analytics for evaluating and comparing maps Custom overlays, and Block-level editing DRA remains free to use. == Versions == 2.2: This uses Bing Maps, an outdated software that projects the districts of a single state onto a map of the United States. 2.5: After Bing Maps announced that it would no longer be updating for the foreseen future, the U.S. Map feature was removed. DRA 2020: At the end of 2018, a beta version of 2020 was released. This version that did not require Microsoft Silverlight and could be used in any web browser. DRA 2020 has been under continuous development since and is built using React (JavaScript library), Mapbox, OpenStreetMap, TypeScript, Node.js, Amazon Web Services, as well as many open source components, tools, and icons.

    Read more →
  • Software configuration management

    Software configuration management

    Software configuration management (SCM), a.k.a. software change and configuration management (SCCM), is the software engineering practice of tracking and controlling changes to a software system. It is part of the larger cross-disciplinary field of configuration management (CM). SCM includes version control and the establishment of baselines. == Goals == The goals of SCM include: Configuration identification - Identifying configurations, configuration items and baselines. Configuration control - Implementing a controlled change process. This is usually achieved by setting up a change control board whose primary function is to approve or reject all change requests that are sent against any baseline. Configuration status accounting - Recording and reporting all the necessary information on the status of the development process. Configuration auditing - Ensuring that configurations contain all their intended parts and are sound with respect to their specifying documents, including requirements, architectural specifications and user manuals. Build management - Managing the process and tools used for builds. Process management - Ensuring adherence to the organization's development process. Environment management - Managing the software and hardware that host the system. Teamwork - Facilitate team interactions related to the process. Defect tracking - Making sure every defect has traceability back to the source. With the introduction of cloud computing and DevOps the purposes of SCM tools have become merged in some cases. The SCM tools themselves have become virtual appliances that can be instantiated as virtual machines and saved with state and version. The tools can model and manage cloud-based virtual resources, including virtual appliances, storage units, and software bundles. The roles and responsibilities of the actors have become merged as well with developers now being able to dynamically instantiate virtual servers and related resources. == History == == Examples == Ansible – Open-source software platform for remote configuring and managing computers CFEngine – Configuration management software Chef – Configuration management toolPages displaying short descriptions of redirect targets LCFG – Computer configuration management system NixOS – Linux distribution OpenMake Software – DevOps company Otter Puppet – Open source configuration management software Salt – Configuration management software Rex – Open source software

    Read more →
  • GNU toolchain

    GNU toolchain

    The GNU toolchain is a broad collection of programming tools produced by the GNU Project. These tools form a toolchain (a suite of tools used in a serial manner) used for developing software applications and operating systems. The GNU toolchain plays a vital role in development of Linux, some BSD systems, and software for embedded systems. Parts of the GNU toolchain are also directly used with or ported to other platforms such as Solaris, macOS, Microsoft Windows (via Cygwin and MinGW/MSYS/WSL2), Sony PlayStation Portable (used by PSP modding scene) and Sony PlayStation 3. == Components == Projects in the GNU toolchain are: GNU Autotools (build system) – Software build toolset from GNU GNU Binutils – GNU software development tools for executable code GNU Bison – Yacc-compatible parser generator program GNU C Library – GNU implementation of the standard C libraryPages displaying short descriptions of redirect targets GNU Compiler Collection – Free and open-source compiler for various programming languages GNU Debugger – Source-level debugger GNU m4 – General-purpose macro processor GNU make – Software build automation tool

    Read more →
  • Bandhan Tod

    Bandhan Tod

    Bandhan Tod is a mobile app to stop child marriage in India's Bihar state through SOS button in the app. When the SOS on Bandhan Tod is activated, the nearest small NGO will attempt to resolve the issue. If the family resists, then the police gets notified. Till now so many child marriages has been cancelled through Bandhan Tod interventions. Bandhan Tod is an initiative of Gender Alliance managed by Prashanti Tiwari to support the state government's efforts to end child marriage and dowry.

    Read more →
  • Containerization (computing)

    Containerization (computing)

    In software engineering, containerization is operating-system-level virtualization or application-level virtualization over multiple resources so that software applications can run in isolated user spaces called containers in any cloud or non-cloud environment, regardless of type or vendor. The term "container" has different meanings in different contexts, and it is important to ensure that the intended definition aligns with the audience's understanding. == Usage == Each container is basically a fully functional and portable cloud or non-cloud computing environment surrounding the application and keeping it independent of other environments running in parallel. Individually, each container simulates a different software application and runs isolated processes by bundling related configuration files, libraries and dependencies. But, collectively, multiple containers share a common operating system kernel (OS). In recent times, containerization technology has been widely adopted by cloud computing platforms like Amazon Web Services, Microsoft Azure, Google Cloud Platform, and IBM Cloud. Containerization has also been pursued by the U.S. Department of Defense as a way of more rapidly developing and fielding software updates, with first application in its F-22 air superiority fighter. == History == The concept of containerization in computing originated from early operating system–level isolation mechanisms. One of the earliest implementations was the chroot system call introduced in Version 7 Unix in 1979, which changed the apparent root directory for a process and its children, providing a basic form of filesystem isolation. In the early 2000s, more advanced forms of operating system–level virtualization were developed. FreeBSD introduced "jails" in 2000, which extended isolation by restricting processes to a subset of system resources. Around the same time, Solaris introduced "zones" (also known as Solaris Containers), providing similar capabilities with resource management and isolation features. Linux later incorporated comparable functionality through kernel features such as namespaces and control groups (cgroups), which enabled isolation of process IDs, network stacks, filesystems, and resource allocation. These features formed the foundation for Linux Containers (LXC), which provided a userspace interface for managing containers. The widespread adoption of containerization accelerated with the release of Docker in 2013, which introduced a standardized format for packaging applications and their dependencies, along with tooling for image distribution and container management. == Types of containers == OS containers Application containers == Security issues == Because of the shared OS, security threats can affect the whole containerized system. In containerized environments, security scanners generally protect the OS, but not the application containers, which adds unwanted vulnerability. == Container management, orchestration, clustering == Container orchestration or container management is mostly used in the context of application containers. Implementations providing such orchestration include Kubernetes and Docker swarm. == Container cluster management == Container clusters need to be managed. This includes functionality to create a cluster, to upgrade the software or repair it, balance the load between existing instances, scale by starting or stopping instances to adapt to the number of users, to log activities and monitor produced logs or the application itself by querying sensors. Open-source implementations of such software include OKD and Rancher. Quite a number of companies provide container cluster management as a managed service, like Alibaba, Amazon, Google, and Microsoft.

    Read more →
  • PlantUML

    PlantUML

    PlantUML is an open-source tool allowing users to create diagrams from a plain text language. Besides various UML diagrams, PlantUML has support for various other software development related formats (such as Archimate, Block diagram, BPMN, C4, Computer network diagram, ERD, Gantt chart, Mind map, and WBD), as well as visualisation of JSON and YAML files. The language of PlantUML is an example of a domain-specific language. Besides its own DSL, PlantUML also understands AsciiMath, Creole, DOT, and LaTeX. It uses Graphviz software to lay out its diagrams and Tikz for LaTeX support. Images can be output as PNG, SVG, LaTeX and even ASCII art. PlantUML has also been used to allow blind people to design and read UML diagrams. == Applications that use PlantUML == There are various extensions or add-ons that incorporate PlantUML. Atom has a community maintained PlantUML syntax highlighter and viewer. Confluence wiki has a PlantUML plug-in for Confluence Server, which renders diagrams on-the-fly during a page reload. There is an additional PlantUML plug-in for Confluence Cloud. Doxygen integrates diagrams for which sources are provided after the startuml command. Eclipse has a PlantUML plug-in. Google Docs has an add-on called PlantUML Gizmo that works with the PlantUML.com server. IntelliJ IDEA can create and display diagrams embedded into Markdown (built-in) or in standalone files (using a plugin). LaTeX using the Tikz package has limited support for PlantUML. LibreOffice has Libo_PlantUML extension to use PlantUML diagrams. MediaWiki has a PlantUML plug-in which renders diagrams in pages as SVG or PNG. Microsoft Word can use PlantUML diagrams via a Word Template Add-in. There is an additional Visual Studio Tools for Office add-in called PlantUML Gizmo that works in a similar fashion. NetBeans has a PlantUML plug-in. Notepad++ has a PlantUML plug-in. Obsidian has a PlantUML plug-in. Org-mode has a PlantUML org-babel support. Rider has a PlantUML plug-in. Sublime Text has a PlantUML package called PlantUmlDiagrams for Sublime Text 2 and 3. Visual Studio Code has various PlantUML extensions on its marketplace, most popular being PlantUML by jebbs. Vnote open source notetaking markdown application has built in PlantUML support. Xcode has a community maintained Source Editor Extension to generate and view PlantUML class diagrams from Swift source code. == Text format to communicate UML at source code level == PlantUML uses well-formed and human-readable code to render the diagrams. There are other text formats for UML modelling, but PlantUML supports many diagram types, and does not need an explicit layout, though it is possible to tweak the diagrams if necessary. +--------------------------------------+ | TEDx Talks Recommendation | | System | +--------------------------------------+ | +----------------------------------+ | | | Visitor | | | +----------------------------------+ | | | + View Recommended Talks | | | | + Search Talks | | | +----------------------------------+ | +--------------------------------------+ | | V +--------------------------------------+ | Authenticated User | +--------------------------------------+ | +----------------------------------+ | | | User | | | +----------------------------------+ | | | + View Recommended Talks | | | | + Search Talks | | | | + Save Favorite Talks | | | +----------------------------------+ | +--------------------------------------+ | | V +--------------------------------------+ | Admin | +--------------------------------------+ | +----------------------------------+ | | | Admin | | | +----------------------------------+ | | | + CRUD Talks | | | | + Manage Users | | | +----------------------------------+ | +--------------------------------------+

    Read more →
  • Toolchain

    Toolchain

    A toolchain is a set of software development tools used to build and otherwise develop software. Often, the tools are executed sequentially and form a pipeline such that the output of one tool is the input for the next. Sometimes the term is used for a set of related tools that are not necessarily executed sequentially. A relatively common and simple toolchain consists of the tools to build for a particular operating system (OS) and CPU architecture: a compiler, a linker, and a debugger. With a cross-compiler, a toolchain can support cross-platform development. For building more complex software systems, many other tools may be in the toolchain. For example, for a video game, the toolchain may include tools for preparing sound effects, music, textures, 3-dimensional models and animations, and for combining these resources into the finished product.

    Read more →
  • List of security hacking incidents

    List of security hacking incidents

    This list of security hacking incidents covers important or noteworthy events in the history of security hacking and cracking. == 1900 == === 1903 === Magician and inventor Nevil Maskelyne disrupts John Ambrose Fleming's public demonstration of Guglielmo Marconi's purportedly secure wireless telegraphy technology, sending insulting Morse code messages through the auditorium's projector. == 1930s == === 1932 === Polish cryptologists Marian Rejewski, Henryk Zygalski and Jerzy Różycki broke the Enigma machine code. === 1939 === Alan Turing, Gordon Welchman and Harold Keen worked together to develop the codebreaking device Bombe (based off of Rejewski's work on Bomba). The Enigma machine's use of a reliably small key space makes it vulnerable to brute force attacks. == 1940s == === 1943 === René Carmille, comptroller general of the Vichy French Army, hacked the punch card system used by the Nazis to locate Jews. === 1949 === The theory that underlies computer viruses was first made public in 1949, when computer pioneer John von Neumann presented a paper titled "Theory and Organization of Complicated Automata". In the paper, von Neumann speculated that computer programs could reproduce themselves. == 1950s == === 1955 === At MIT, "hack" first came to mean playing with machines. An April 1955 meeting of the Tech Model Railroad Club has one say that "Mr. Eccles requests that anyone working or hacking on the electrical system turn the power off to avoid fuse blowing." === 1957 === Joe "Joybubbles" Engressia, a blind seven-year-old boy with perfect pitch, discovered that whistling the fourth E above middle C (a frequency of 2600 Hz) would interfere with AT&T's automated telephone systems, thereby inadvertently opening the door for phreaking. == 1960s == Various phreaking boxes are used to interact with automated telephone systems. === 1963 === The first ever reference to malicious hacking is 'phreaking' in MIT's student newspaper, The Tech, containing hackers tying up the lines with Harvard, configuring the PDP-1 to make free calls, war dialing and accumulating large phone bills. === 1965 === William D. Mathews from MIT finds a vulnerability in a CTSS running on an IBM 7094. The standard text editor on the system was designed to be used by one user at a time, working in one directory, and so it created a temporary file with a constant name for all instances of the editor. The flaw was discovered when two system programmers were editing at the same time and the temporary files for the message of the day and the password file became swapped, causing the contents of the system CTSS password file to display to any user logging into the system. === 1967 === The first known incidence of network penetration hacking took place when members of a computer club at a suburban Chicago high school were provided access to IBM's APL network. In the fall of 1967, IBM (through Science Research Associates) approached Evanston Township High School with the offer of four 2741 Selectric teletypewriter-based terminals with dial-up modem connectivity to an experimental computer system which implemented an early version of the APL programming language. The APL network system was structured into workspaces which were assigned to various clients using the system. Working independently, the students quickly learned the language and the system. They were free to explore the system, often using existing code available in public workspaces as models for their own creations. Eventually, curiosity drove the students to explore the system's wider context. This first informal network penetration effort was later acknowledged as helping harden the security of one of the first publicly accessible networks:Science Research Associates undertook to write a full APL system for the IBM 1500. They modeled their system after APL/360, which had by that time been developed and seen substantial use inside of IBM, using code borrowed from MAT/1500 where possible. In their documentation, they acknowledge their gratitude to "a number of high school students for their compulsion to bomb the system". This was an early example of a kind of sportive, but very effective, debugging that was often repeated in the evolution of APL systems. == 1970s == === 1971 === John T. Draper (later nicknamed Captain Crunch), his friend Joe Engressia (also known as Joybubbles), and blue box phone phreaking hit the news with an Esquire magazine feature story. === 1979 === Kevin Mitnick breaks into his first major computer system, the Ark, which was the computer system Digital Equipment Corporation (DEC) used for developing their RSTS/E operating system software. == 1980s == === 1980 === The FBI investigates a breach of security at National CSS (NCSS). The New York Times, reporting on the incident in 1981, describes hackers as: Technical experts, skilled, often young, computer programmers who almost whimsically probe the defenses of a computer system, searching out the limits and the possibilities of the machine. Despite their seemingly subversive role, hackers are a recognized asset in the computer industry, often highly prized. The newspaper describes white hat activities as part of a "mischievous but perversely positive 'hacker' tradition". When a National CSS employee revealed the existence of his password cracker, which he had used on customer accounts, the company chastised him not for writing the software but for not disclosing it sooner. The letter of reprimand stated that "The Company realizes the benefit to NCSS and in fact encourages the efforts of employees to identify security weaknesses to the VP, the directory, and other sensitive software in files". === 1981 === Chaos Computer Club forms in Germany. Ian Murphy, aka Captain Zap, was the first cracker to be tried and convicted as a felon. Murphy broke into AT&T's computers in 1981 and changed the internal clocks that metered billing rates. People were getting late-night discount rates when they called at midday. Of course, the bargain-seekers who waited until midnight to call long distance were hit with high bills. === 1983 === The 414s break into 60 computer systems at institutions ranging from the Los Alamos National Laboratory to Manhattan's Memorial Sloan-Kettering Cancer Center. The incident appeared as the cover story of Newsweek with the title "Beware: Hackers at play". As a result, the U.S. House of Representatives held hearings on computer security and passed several laws. The group KILOBAUD is formed in February, kicking off a series of other hacker groups that formed soon after. The movie WarGames introduces the wider public to the phenomenon of hacking and creates a degree of mass paranoia about hackers and their supposed abilities to bring the world to a screeching halt by launching nuclear ICBMs. The U.S. House of Representatives begins hearings on computer security hacking. In his Turing Award lecture, Ken Thompson mentions "hacking" and describes a security exploit that he calls a "Trojan horse". === 1984 === Someone calling himself Lex Luthor founds the Legion of Doom. Named after a Saturday morning cartoon, the LOD had the reputation of attracting "the best of the best"—until one of the most talented members called Phiber Optik feuded with Legion of Doomer Erik Bloodaxe and got 'tossed out of the clubhouse'. Phiber's friends formed a rival group, the Masters of Deception. The Comprehensive Crime Control Act gives the Secret Service jurisdiction over computer fraud. The Cult of the Dead Cow forms in Lubbock, Texas, and begins publishing its underground ezine. The hacker magazine 2600 begins regular publication, right when TAP was putting out its final issue. The editor of 2600, "Emmanuel Goldstein" (whose real name is Eric Corley), takes his handle from the leader of the resistance in George Orwell's Nineteen Eighty-Four. The publication provides tips for would-be hackers and phone phreaks, as well as commentary on the hacker issues of the day. Today, copies of 2600 are sold at most large retail bookstores. The Chaos Communication Congress, the annual European hacker conference organized by the Chaos Computer Club, is held in Hamburg, Germany. William Gibson's groundbreaking science fiction novel Neuromancer, about "Case", a futuristic computer hacker, is published. Considered the first major cyberpunk novel, it brought into hacker jargon such terms as "cyberspace", "the matrix", "simstim", and "ICE". === 1985 === KILOBAUD is re-organized into P.H.I.R.M. and begins sysopping hundreds of bulletin board systems (BBSs) throughout the United States, Canada, and Europe. The online 'zine Phrack is established. The Hacker's Handbook is published in the UK. The FBI, Secret Service, Middlesex County NJ Prosecutor's Office and various local law enforcement agencies execute seven search warrants concurrently across New Jersey on July 12, 1985, seizing equipment from BBS operators and users alike for "complicity in computer theft", under a n

    Read more →
  • Dominant resource fairness

    Dominant resource fairness

    Dominant resource fairness (DRF) is a rule for fair division. It is particularly useful for dividing computing resources in among users in cloud computing environments, where each user may require a different combination of resources. DRF was presented by Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker and Ion Stoica in 2011. == Motivation == In an environment with a single resource, a widely used criterion is max-min fairness, which aims to maximize the minimum amount of resource given to a user. But in cloud computing, it is required to share different types of resource, such as: memory, CPU, bandwidth and disk-space. Previous fair schedulers, such as in Apache Hadoop, reduced the multi-resource setting to a single-resource setting by defining nodes with a fixed amount of each resource (e.g. 4 CPU, 32 MB memory, etc.), and dividing slots which are fractions of nodes. But this method is inefficient, since not all users need the same ratio of resources. For example, some users need more CPU whereas other users need more memory. As a result, most tasks either under-utilize or over-utilize their resources. DRF solves the problem by maximizing the minimum amount of the dominant resource given to a user (then the second-minimum etc., in a leximin order). The dominant resource may be different for different users. For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF will try to equalize the CPU share given to user A and the memory share given to user B. == Definition == There are m resources. The total capacities of the resources are r1,...,rm. There are n users. Each users runs individual tasks. Each task has a demand-vector (d1,..,dm), representing the amount it needs of each resource. It is implicitly assumed that the utility of a user equals the number of tasks he can perform. For example, if user A runs tasks with demand-vector [1 CPU, 4 GB RAM], and receives 3 CPU and 8 GB RAM, then his utility is 2, since he can perform only 2 tasks. More generally, the utility of a user receiving x1,...,xm resources is minj(xj/dj), that is, the users have Leontief utilities. The demand-vectors are normalized to fractions of the capacities. For example, if the system has 9 CPUs and 18 GB RAM, then the above demand-vector is normalized to [1/9 CPU, 2/9 GB]. For each user, the resource with the highest demand-fraction is called the dominant resource. In the above example, the dominant resource is memory, as 2/9 is the largest fraction. If user B runs a task with demand-vector [3 CPU, 1 GB], which is normalized to [1/3 CPU, 1/18 GB], then his dominant resource is CPU. DRF aims to find the maximum x such that all agents can receive at least x of their dominant resource. In the above example, this maximum x is 2/3: User A gets 3 tasks, which require 3/9 CPU and 2/3 GB. User B gets 2 tasks, which require 2/3 CPU and 1/9 GB. The maximum x can be found by solving a linear program; see Lexicographic max-min optimization. Alternatively, the DRF can be computed sequentially. The algorithm tracks the amount of dominant resource used by each user. At each round, it finds a user with the smallest allocated dominant resource so far, and allocates the next task of this user. Note that this procedure allows the same user to run tasks with different demand vectors. == Properties == DRF has several advantages over other policies for resource allocation. Proportionality: each user receives at least as much resources as they could get in a system in which all resources are partitioned equally among users (the authors call this condition "sharing incentive"). Strategyproofness: a user cannot get a larger allocation by lying about his needs. Strategyproofness is important, as evidence from cloud operators show that users try to manipulate the servers in order to get better allocations. Envy-freeness: no user would prefer the allocation of another user. Pareto efficiency: no other allocation is better for some users and not worse for anyone. Population monotonicity: when a user leaves the system, the allocations of remaining users do not decrease. When there is a single resource that is a bottleneck resource (highly demanded by all users), DRF reduces to max-min fairness. However, DRF violates resource monotonicity: when resources are added to the system, some allocations might decrease. == Extensions == Weighted DRF is an extension of DRF to settings in which different users have different weights (representing their different entitlements). Parkes, Procaccia and Shah formally extend weighted DRF to a setting in which some users do not need all resources (that is, they may have demand 0 to some resource). They prove that the extended version still satisfies proportionality, Pareto-efficiency, envy-freeness, strategyproofness, and even Group strategyproofness. On the other hand, they show that DRF may yield poor utilitarian social welfare, that is, the sum of utilities may be only 1/m of the optimum. However, they prove that any mechanism satisfying one of proportionality, envy-freeness or strategyproofness may suffers from the same low utilitarian welfare. They also extend DRF to the setting in which the users' demands are indivisible (as in fair item allocation). For the indivisible setting, they relax envy-freeness to EF1. They show that strategyproofness is incompatible with PO+EF1 or with PO+proportionality. However, a mechanism called SequentialMinMax satisfies efficiency, proportionality and EF1. Wang, Li and Liang present DRFH - an extension of DRF to a system with several heterogeneous servers. == Implementation == DRF was first implemented in Apache Mesos - a cluster resource manager, and it led to better throughput and fairness than previously used fair-sharing schemes.

    Read more →
  • FreshBooks

    FreshBooks

    FreshBooks is accounting software operated by 2ndSite Inc. primarily for small and medium-sized businesses. It is a web-based software as a service (SaaS) model, that can be accessed through a desktop or mobile device. The company was founded in 2003 and is based in Toronto, Canada. == History == FreshBooks was founded in 2004 by Mike McDerment, Levi Cooperman, and Joe Sawada in Toronto, Ontario. McDerment incorporated a second company, BillSpring in January 2015 to work on new product development. It was rolled back into FreshBooks as an updated interface in 2016. Initially FreshBooks functioned like an electronic invoicing program targeting IT professionals. After the release of the new interface, the initial release of FreshBooks was referred to as "FreshBooks Classic." FreshBooks Classic was discontinued in 2022 after migrating users to the new platform. FreshBooks Classic's front-end application was built in PHP, and the backend services were built in Python while the new FreshBooks uses the same backend services with a JavaScript single-page application. == Product == FreshBooks is a subscription-based accounting software platform that provides features such as invoicing, accounts payable, expense and time tracking, retainers, fixed asset depreciation, purchase orders, payroll integrations, mileage tracking, double-entry accounting, and standard business reporting. Financial data is stored in the cloud on a unified ledger, enabling access from desktop and mobile devices. The platform includes a free API for integration with external applications and supports multiple tax rates and currencies. It also offers project management and payroll functionalities. Pricing is based on a recurring monthly fee. FreshBooks supports country-specific tax calculations, including GST and HST in Canada, sales taxes in the United States, and MTD compliance in the UK. == Operations == FreshBooks has its headquarters in Toronto, Canada with operations in North America, Europe and Australia. Founder Mike McDerment was the chief executive officer of the company from 2003 until 2021, when he stepped down and was replaced by Don Epperson, but stayed as the executive chair. Don Epperson had previously joined FreshBooks as executive director in 2019. == Funding == FreshBooks was initially self-funded. In 2014, the company raised a Series A venture investment of $30 million led by the venture capital firm Oak Investment Partners, with participation by Georgian Partners and Atlas Venture. In 2017, FreshBooks announced that it raised another $43 million in funding from Accomplice, Georgian Partners and Oak Investment Partners. On August 10, 2021, FreshBooks announced that it had secured $80.75 million in Series E funding and $50 million in debt financing. FreshBooks also reached a valuation of more than $1 billion.

    Read more →