Local tangent space alignment (LTSA) is a method for manifold learning, which can efficiently learn a nonlinear embedding into low-dimensional coordinates from high-dimensional data, and can also reconstruct high-dimensional coordinates from embedding coordinates. It is based on the intuition that when a manifold is correctly unfolded, all of the tangent hyperplanes to the manifold will become aligned. It begins by computing the k-nearest neighbors of every point. It computes the tangent space at every point by computing the d-first principal components in each local neighborhood. It then optimizes to find an embedding that aligns the tangent spaces, but it ignores the label information conveyed by data samples, and thus can not be used for classification directly.
Cooliris (plugin)
Cooliris (for Desktop), formerly known as PicLens, was a web browser extension developed by Cooliris, Inc, and later acquired by Yahoo. The plugin provides an interactive 3D-like experience for viewing digital images and videos from the web and from desktop applications. The software places a small icon atop image thumbnails that appear on a webpage. Clicking on the icon loads the Cooliris 3D Wall, a browsing environment that gives the user the effect of flying through a three-dimensional space. Released to the public in January 2008, The New York Times described Cooliris as the "new immersive approach to Web navigation". Cooliris went out to win the 2008 Crunchies Award for Best Design. The plugin has received over 50 million downloads. As of May 2014 browser plugins are unavailable from the official website. There are only links to tablet apps - for iOS and Android.
GPT-4Chan
Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for occasionally hosting hateful and extremist content. The model learned to mimic the style and tone of /pol/ users, producing text that is often intentionally offensive to groups (racist, sexist, homophobic, etc.) and nihilistic. Kilcher deployed the model on the /pol/ board itself, where it interacted with other users without revealing its identity. He also made the model publicly available on Hugging Face, a platform for sharing and using AI models, until it was removed from the platform. The project sparked criticism and debate in the AI community. Some people questioned the ethics, legality, and social impact of creating and distributing such a model. Some of the issues raised by the GPT-4chan controversy include the potential harm of spreading hate speech, the responsibility of AI developers and platforms, the need for regulation and oversight of AI models, and the role of open source and transparency in AI research. == Development == The development of GPT-4chan began in May 2022, when Kilcher announced his project on his YouTube channel. Notably, at the time before ChatGPT, he explained that he wanted to create a large language model that could generate realistic and coherent text in the style of /pol/, one of the most notorious online communities. He indicated that he was inspired by the success of GPT-3, a powerful AI model created by OpenAI, and GPT-J, an open-source model, with GPT-3 comparable performance, released by EleutherAI, a group of independent AI researchers. Kilcher decided to use GPT-J as the base model for his project, and fine-tune it with a large dataset of /pol/ posts. The Raiders of the Lost Kek dataset contained over 100 million posts from /pol/, spanning from June 2016-November 2019. Kilcher then proceeded to fine-tune the GPT-J model on the 4chan data. He also showed some examples of the model’s outputs, which ranged from political opinions, conspiracy theories, jokes, insults, and threats, to more creative and bizarre texts, such as poems, stories, songs, and code. He said that he was impressed by the model’s ability to generate fluent and diverse text, and that he was curious to see how it would interact with real /pol/ users. == Release == In June 2022, Kilcher deployed his model on the /pol/ board itself, using a bot that he programmed to post and reply to threads. He did not reveal the model’s identity, and he let it run autonomously, without any human supervision or intervention. He wanted to conduct a natural experiment, and to observe the model’s behavior and impact in a real-world setting. Furthermore, he also wanted to test the model’s robustness, and to see how it would handle the challenges and dynamics of /pol/, such as trolling, flaming, baiting, and moderation. At the same time, Kilcher also made his model publicly available on Hugging Face, a platform for sharing and using AI models. He wanted to share his work with the AI community and the public, and that he hoped that his model would inspire and enable others to create and explore new applications and possibilities with large language models. Likewise, he also said that he wanted to spark a discussion and a debate about the ethical and social implications of his project, and that he welcomed feedback and criticism from anyone. He provided a link to his model’s page on Hugging Face, where anyone could access and use the model through a web interface or an API, and also provided a link to his GitHub repository, where anyone could download and inspect the model’s code and data. == Controversy == The release of GPT-4chan to the public caused a lot of reactions and responses from various audiences. On the /pol/ board, the model’s posts and replies attracted a lot of attention and engagement from other users, who were mostly unaware of the model’s identity and nature. Some users praised the model for its intelligence, creativity, and humor, and agreed with its opinions and views. Some users challenged the model for its ignorance, inconsistency, and absurdity, and disagreed with its claims and arguments. Some users tried to troll, bait, or expose the model, and attempted to trick or test it with various questions and scenarios. The model’s posts and replies also generated a lot of controversy and conflict among the users, who often engaged in heated and violent debates and fights with each other. On Hugging Face, the model’s page received a lot of visits and requests from users who wanted to try out and experiment with the model. The model’s page also received a lot of feedback and reviews from users who rated and commented on the model. However, with the controversy of the model, access to it was gated and then disabled on Hugging Face for concerns about the potential harm the model could cause. The incident was notable for the direct intervention of CEO Clément Delangue in the talk pages, a very unusual occurrence compared to the normal practices of content moderation. The release of GPT-4chan also sparked a lot of media coverage and public attention, as various news outlets and social media platforms reported and commented on the model’s project. On YouTube, the model’s video received a lot of views and interactions from viewers who watched and followed the project. Furthermore, a petition condemning the deployment of GPT-4chan gained over 300 signatures from technology experts.
Vector database
A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases which primarily look up records by exact match. Use-cases for vector databases include similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation (RAG). Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows. == Techniques == Common techniques for similarity search on high-dimensional vectors include: Hierarchical Navigable Small World (HNSW) graphs Locality-sensitive hashing (LSH) and sketching Product quantization (PQ) Inverted files These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases. == Applications == Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation. === Retrieval-augmented generation === An especially common use-case for vector databases is in retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model, and the large language model proceeds to create a response to the prompt given this context. == Implementations ==
AppValley
AppValley is an independent American digital distribution service operated and trademarked by AppValley LLC. It serves as an alternative app store for the iOS mobile operating system, which allows users to download applications that are not available on the App Store, most commonly tweaked "++" apps, jailbreak apps, and apps including paid apps on the app store. == Legality == AppValley is among several services that violate enterprise developer certificates from Apple. The terms under which these are granted make clear that they are for companies who wish to distribute apps to their employees. AppValley uses these certificates to distribute software directly to non-employees, thereby bypassing the AppStore. AppValley's conduct had implications in U.S. sanctioned markets like Iran, Iraq, North Korea, Cuba, and Venezuela, which have all been subject to commercial sanctions. Among the software offered by AppValley and other services is pirated software, including paid apps on the app store and premium versions of Instagram, Spotify, Pokémon Go, and others. For instance, AppValley distributes an ad-free version of the music streaming app Spotify even on the free tier. == History == The website was founded in May 2017, releasing late that month with a very basic version of the app. There were less than 100 apps available for download at this time. On Jan 19, 2018, a new version dubbed AppValley 2.0 was released bringing dark mode, more categories, a search, and a much faster interface. On February 14, 2019, a Chinese partner "Jason Wu" allegedly took control of the main Twitter account and domain, causing the original AppValley developers to migrate to the domain app-valley.vip and the Twitter account handle @App_Valley_vip. As of September 2024, the app-valley.vip domain now redirects to appvalley.signulous.com. Today, AppValley continues to offer an alternative to Apple's App Store where app developers can publish their applications. == Features == AppValley is a mobile app installer which can also support iOS version that can be installed and downloaded on the mobile or the devices of the people who wish to get access to many different applications available. AppValley also contains apps that have been modified or tweaked for user preferences, and allows the user to by pass national restrictions on the use of apps, without having to resort to jailbreaking. As of June 2, 2020, there are over 1300 apps available for download.
Camfecting
In computer security, camfecting is the process of attempting to hack into a person's webcam and activate it without the webcam owner's permission. The remotely activated webcam can be used to watch anything within the webcam's field of vision, sometimes including the webcam owner themselves. Camfecting is most often carried out by infecting the victim's computer with a virus that can provide the hacker access to their webcam. This attack is specifically targeted at the victim's webcam, and hence the name camfecting, a portmanteau of the words camera and infecting. Typically, a webcam hacker or a camfecter sends his victim an innocent-looking application which has a hidden Trojan software through which the camfecter can control the victim's webcam. The camfecter virus installs itself silently when the victim runs the original application. Once installed, the camfecter can turn on the webcam and capture pictures/videos. The camfecter software works just like the original webcam software present in the victim computer, the only difference being that the camfecter controls the software instead of the webcam's owner. == Notable cases == Marcus Thomas, former assistant director of the FBI's Operational Technology Division in Quantico, said in a 2013 story in The Washington Post that the FBI had been able to covertly activate a computer's camera—without triggering the light that lets users know it is recording—for several years. In November 2013, American teenager Jared James Abrahams pleaded guilty to hacking over 100-150 women and installing the highly invasive malware Blackshades on their computers in order to obtain nude images and videos of them. One of his victims was Miss Teen USA 2013 Cassidy Wolf. Researchers from Johns Hopkins University have shown how to covertly capture images from the iSight camera on MacBook and iMac models released before 2008, by reprogramming the microcontroller's firmware. == Prevention == A computer that does not have an up-to-date webcam software or any anti-virus (or firewall) software installed and operational may be at increased risk for camfecting from different types of malware. Softcams may nominally increase this risk, if not maintained or configured properly. Although a person cannot protect themselves from zero-day exploits that could potentially activate a camera unknowingly, such as Pegasus is able to do on smartphones. The only way to truly avoid being watched through your own camera is by blocking it physically, since software blocks can be overriden by advanced persistent threats. A simple piece of tape is more commonly used to offuscate the feed of the camera. With even Mark Zuckerberg doing so on his personal laptop that appeared during a presentation. And it being the way Snowden, an ex-contractor for the NSA, is portrayed to do so to prevent camfecting in the biopic Snowden. There is now a market for the manufacture and sale of sliding lens covers that allow users to physically block their computer's camera and, in some cases, microphone. A number of phone and laptop manufacturers tried to implement pop-up cameras that can only be opened manually by the user. But the trend did not become mainstream because of the engineering it took to keep the mechanisms up to date, aswell as the fragility and durability of the cameras.
Dhammin
Dhammin (Arabic: ضمّن) is a political platform that manages candidates' electoral campaigns for the National Assembly, Municipal Council or Cooperative Society councils of Kuwait. The platform was founded by Abdullah Al-Salloum and it is, according to news reports and interviews, the first within the field to apply distributed-systems' methodologies.