AI Chat Prompt Generator

AI Chat Prompt Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Photometric stereo

    Photometric stereo

    Photometric stereo is a technique in computer vision for estimating the surface normals of objects by observing that object under different lighting conditions (photometry). It is based on the fact that the amount of light reflected by a surface is dependent on the orientation of the surface in relation to the light source and the observer. By measuring the amount of light reflected into a camera, the space of possible surface orientations is limited. Given enough light sources from different angles, the surface orientation may be constrained to a single orientation or even overconstrained. The technique was originally introduced by Woodham in 1980. The special case where the data is a single image is known as shape from shading, and was analyzed by B. K. P. Horn in 1989. Photometric stereo has since been generalized to many other situations, including extended light sources and non-Lambertian surface finishes. Current research aims to make the method work in the presence of projected shadows, highlights, and non-uniform lighting. Photometric stereo is widely used in various fields, including archaeology, cultural heritage conservation, and quality control. It is now integrated into widely used open-source software, such as Meshroom. == Basic method == Under Woodham's original assumptions — Lambertian reflectance, known point-like distant light sources, and uniform albedo — the problem can be solved by inverting the linear equation I = L ⋅ n {\displaystyle I=L\cdot n} , where I {\displaystyle I} is a (known) vector of m {\displaystyle m} observed intensities, n {\displaystyle n} is the (unknown) surface normal, and L {\displaystyle L} is a (known) 3 × m {\displaystyle 3\times m} matrix of normalized light directions. This model can easily be extended to surfaces with non-uniform albedo, while keeping the problem linear. Taking an albedo reflectivity of k {\displaystyle k} , the formula for the reflected light intensity becomes I = k ( L ⋅ n ) . {\displaystyle I=k(L\cdot n).} If L {\displaystyle L} is square (there are exactly 3 lights) and non-singular, it can be inverted, giving L − 1 I = k n . {\displaystyle L^{-1}I=kn.} Since the normal vector is known to have length 1, k {\displaystyle k} must be the length of the vector k n {\displaystyle kn} , and n {\displaystyle n} is the normalised direction of that vector. If L {\displaystyle L} is not square (there are more than 3 lights), a generalisation of the inverse can be obtained using the Moore–Penrose pseudoinverse, by simply multiplying both sides with L T {\displaystyle L^{T}} , giving L T I = L T k ( L ⋅ n ) , {\displaystyle L^{T}I=L^{T}k(L\cdot n),} ( L T L ) − 1 L T I = k n , {\displaystyle (L^{T}L)^{-1}L^{T}I=kn,} after which the normal vector and albedo can be solved as described above. == Non-Lambertian surfaces == The classical photometric stereo problem concerns itself only with Lambertian surfaces, with perfectly diffuse reflection. This is unrealistic for many types of materials, especially metals, glass and smooth plastics, and will lead to aberrations in the resulting normal vectors. Many methods have been developed to lift this assumption. In this section, a few of these are listed. === Specular reflections === Historically, in computer graphics, the commonly used model to render surfaces started with Lambertian surfaces and progressed first to include simple specular reflections. Computer vision followed a similar course with photometric stereo. Specular reflections were among the first deviations from the Lambertian model. These are a few adaptations that have been developed. Many techniques ultimately rely on modelling the reflectance function of the surface, that is, how much light is reflected in each direction. This reflectance function has to be invertible. The reflected light intensities towards the camera is measured, and the inverse reflectance function is fit onto the measured intensities, resulting in a unique solution for the normal vector. === General BRDFs and beyond === According to the Bidirectional reflectance distribution function (BRDF) model, a surface may distribute the amount of light it receives in any outward direction. This is the most general known model for opaque surfaces. Some techniques have been developed to model (almost) general BRDFs. In practice, all of these require many light sources to obtain reliable data. These are methods in which surfaces with general BRDFs can be measured. Determine the explicit BRDF prior to scanning. To do this, a different surface is required that has the same or a very similar BRDF, of which the actual geometry (or at least the normal vectors for many points on the surface) is already known. The lights are then individually shone upon the known surface, and the amount of reflection into the camera is measured. Using this information, a look-up table can be created that maps reflected intensities for each light source to a list of possible normal vectors. This puts constraints on the possible normal vectors the surface may have, and reduces the photometric stereo problem to an interpolation between measurements. Typical known surfaces to calibrate the look-up table with are spheres for their wide variety of surface orientations. Restricting the BRDF to be symmetrical. If the BRDF is symmetrical, the direction of the light can be restricted to a cone about the direction to the camera. Which cone this is depends on the BRDF itself, the normal vector of the surface, and the measured intensity. Given enough measured intensities and the resulting light directions, these cones can be approximated and therefore the normal vectors of the surface. Some progress has been made towards modelling an even more general surfaces, such as Spatially Varying Bidirectional Distribution Functions (SVBRDF), Bidirectional surface scattering reflectance distribution functions (BSSRDF), and accounting for interreflections. However, such methods are still fairly restrictive in photometric stereo. Better results have been achieved with structured light. == Uncalibrated photometric stereo == Uncalibrated Photometric Stereo is an approach in photometric stereo that aims to reconstruct the 3D shape of an object from images captured under unknown lighting conditions. Unlike classical methods, which often assume controlled or known lighting setups, this approach removes these constraints, making it adaptable to diverse and real-world environments. The advent of deep learning has revolutionized universal PS by replacing handcrafted assumptions with data-driven models. Recent approaches leverage Transformer-based architectures and multi-scale encoder–decoder networks to directly estimate surface normals from input images. Uncalibrated Photometric Stereo is inherently an ill-posed problem, as it attempts to recover 3D shape and lighting conditions simultaneously from images alone. This leads to fundamental ambiguities in the reconstruction process, which manifest as systematic errors in the recovered geometry, including global distortions in the object's overall shape, and misinterpretation of surface orientation, where concave regions may appear convex and vice versa. To address the challenges of uncalibrated photometric stereo, hybrid methods have emerged that combine multi-view stereo and photometric stereo. These approaches leverage the strengths of both techniques, including geometric reliability and resolution.

    Read more →
  • Social film

    Social film

    A social film is a type of interactive film that is presented through the lens of social media. A social film is distributed digitally and integrates with a social networking service, such as Facebook or YouTube. It combines features of web video, social network games and social media. == Key elements == Social films are a more recent phenomenon, and, in turn, there are few precedents for their format. Although there are not many examples of this genre of film, the medium has certain identifiable elements: Casual entertainment Social media User-generated content Game mechanics Using just one of these factors or a combination of them, a social film engages viewers to interact directly with the work. This can be done through usual social media functionality like comments and ranking or adding directly to the narrative itself. Just as with memes, social film distribution relies on the viral spread enabled by social media. This is based on the viral expansion loops model, in which a viewer benefits from sharing the application with friends, exponentially creating new viewers compelled to share the application. == History == One of the first social films to be created was from the YouTube channel lonelygirl15. This social film started in 2006 and was created by Miles Beckett , Mesh Flinders, and Greg Goodfried. They used YouTube posts to create an interactive video series about a fictional character who showcased her life in a vlog format. As the videos went on, more bizarre things would keep happening to the main character, Bree, before she just stopped uploading. This channel was not only the first viral social film, but went on to be one of the first viral YouTube channels to be created. It did take a few years to see any more films in this genre, but 2011 saw many people start to try their hand at making these films. The first social film in this year was a film called Him, Her and Them which was produced and released by Murmur in April 2011. It was distributed exclusively through Facebook and promoted as the first “Facebook film.” The film is composed of short video clips and interactive slideshows, integrating Facebook's Social Graph API. Users participate via text-based additions to the story, which are viewable only by friends within their social network. In May 2011, Canon and Ron Howard teamed up to create Project Imagin8ion, which was a photo contest where photographers submitted photos and the top 8 photos would be the inspiration for a short film. This short film was called "When You Find Me" and could be found exclusively on YouTube. In July 2011, Intel and Toshiba partnered together to create Hollywood's first Social Film experience, a thriller called Inside, directed by D.J. Caruso and starring Emmy Rossum. The project is broken up into several segments across multiple social media platforms including Facebook, YouTube, and Twitter. In this instance, the audience is challenged to help Emmy Rossum's character, Christina, safely make it out of the room she's been trapped in. This particular form of social film is a major undertaking in that it combines social media activity with A-list acting talent to create a user experience that all happens in real time. Although not quite the same idea, Hollywood also started experimenting with the idea of interactive and crowd-sourced films. One of the first examples of this was a short film called "Life In A Day" directed by Kevin Macdonald and produced by Ridley Scott. Kevin asked people from all over the world to submit videos onto YouTube of what they were doing on July 24th, 2010. They combined all of the best videos that were submitted together to create one film of people doing different things all around the world, no matter how boring or simple those things seemed. They took this short to film festivals before releasing it to the public on YouTube in 2011. In August 2012, Intel and Toshiba partnered again to create The Beauty Inside, directed by Drake Doremus, starring Mary Elizabeth Winstead and Topher Grace. It's Hollywood's first social film that gives everyone in the audience a chance to play Alex, the lead role. The experience will be broken up into six filmed episodes interspersed with real-time interactive storytelling that all takes place on Alex's Facebook timeline. In August 2013, Intel and Toshiba released their third entry into the category, The Power Inside, directed by Will Speck and Josh Gordon and starring Harvey Keitel, Analeigh Tipton, and Craig Roberts. It's Hollywood's first social film that asks the audience to audition to help save or destroy the world. The experience is broken up into six filmed episodes interspersed with user-generated content and interactive storytelling on the main character's Facebook timeline. In 2015, Intel partnered with Dell for their fourth entry, What Lives Inside directed by Robert Stromberg and starring Colin Hanks, Catherine O'Hara, and J. K. Simmons. The first of four episodes was released on Hulu on March 25, 2015.

    Read more →
  • Client-side encryption

    Client-side encryption

    Client-side encryption is the cryptographic technique of encrypting data on the sender's side, before it is transmitted to a server such as a cloud storage service. Client-side encryption features an encryption key that is not available to the service provider, making it difficult or impossible for service providers to decrypt hosted data. Client-side encryption allows for the creation of applications whose providers cannot access the data its users have stored, thus offering a high level of privacy. Applications utilizing client-side encryption are sometimes marketed under the misleading or incorrect term "zero-knowledge", but this is a misnomer, as the term zero-knowledge describes something entirely different in the context of cryptography. == Details == Client-side encryption seeks to eliminate the potential for data to be viewed by service providers (or third parties that compel service providers to deliver access to data), client-side encryption ensures that data and files that are stored in the cloud can only be viewed on the client-side of the exchange. This prevents data loss and the unauthorized disclosure of private or personal files, providing increased peace of mind for its users. Current recommendations by industry professionals as well as academic scholars offer great vocal support for developers to include client-side encryption to protect the confidentiality and integrity of information. === Examples of services that use client-side encryption by default === Tresorit MEGA Cryptee Cryptomator === Examples of services that optionally support client-side encryption === Apple iCloud offers optional client-side encryption when "Advanced Data Protection for iCloud" is enabled. Google Drive, Google Docs, Google Meet, Google Calendar, and Gmail — However, as of Jul 2024, optional client-side encryption features are only available to paid users. === Examples of services that do not support client-side encryption === Dropbox === Examples of client-side encrypted services that no longer exist === SpiderOak Backup

    Read more →
  • Social television

    Social television

    Social television is the union of television and social media. Millions of people now share their TV experience with other viewers on social media such as Twitter and Facebook using smartphones and tablets. TV networks and rights holders are increasingly sharing video clips on social platforms to monetise engagement and drive tune-in. The social TV market covers the technologies that support communication and social interaction around TV as well as companies that study television-related social behavior and measure social media activities tied to specific TV broadcasts – many of which have attracted significant investment from established media and technology companies. The market is also seeing numerous tie-ups between broadcasters and social networking players such as Twitter and Facebook. The market is expected to be worth $256bn by 2017. Social TV was named one of the 10 most important emerging technologies by the MIT Technology Review on Social TV in 2010. And in 2011, David Rowan, the editor of Wired magazine, named Social TV at number three of six in his peek into 2011 and what tech trends to expect to get traction. Ynon Kreiz, CEO of the Endemol Group told the audience at the Digital Life Design (DLD) conference in January 2011: "Everyone says that social television will be big. I think it's not going to be big—it's going to be huge". Much of the investment in the earlier years of social TV went into standalone social TV apps. The industry believed these apps would provide an appealing and complimentary consumer experience which could then be monetized with ads. These apps featured TV listings, check-ins, stickers and synchronised second-screen content but struggled to attract users away from Twitter and Facebook. Most of these companies have since gone out of business or been acquired amid a wave of consolidation and the market has instead focused on the activities of the social media channels themselves – such as Twitter Amplify, Facebook Suggested Videos and Snapchat Discover – and the technologies that support them. == Twitter == Twitter and Facebook are both helping users connect around media, which can provoke strong debate and engagement. Both social platforms want to be the 'digital watercooler' and host conversation around TV because the engagement and data about what media people consume can then be used to generate advertising revenue. As an open platform, conversation on Twitter is closely aligned with real-time events. In May 2013, it launched Twitter Amplify – an advertising product for media and consumer brands. With Amplify, Twitter runs video highlights from major live broadcasts, with advertisers' names and messages playing before the clip. By February 2014, all four major U.S. TV networks had signed up to the Amplify program, bringing a variety of premium TV content onto the social platform in the form of in-tweet real-time video clips. In June 2014, Twitter acquired its Twitter Amplify partner in the U.S. SnappyTV, a company that was helping broadcasters and rights holders to share video content both organically across social and via Twitter's Amplify program. Twitter continues to rely on Grabyo, which has also struck numerous deals with some of the largest broadcasters and rights holders in Europe and North America to share video content across Facebook and Twitter. == Facebook == Facebook made significant changes to its platform in 2014 including updates to its algorithm to enhance how it serves video in users' feeds. It also launched video autoplay to get users to watch the videos in their feeds. It rapidly surpassed Twitter and by the end of 2014 it was enjoying three billion video views a day on its platform and had announced a partnership with the NFL, one of Twitter's most active Twitter Amplify partners. In April 2015, at its F8 Developer Conference, it revealed it was working with Grabyo among other technology partners to bring video onto its platform. Then in July it announced it would be launching Facebook Suggested Videos, bringing related videos and ads to anyone that clicks on a video – a move that not only competed with Twitter's commercial video offering but also put it in direct competition with YouTube. == TV Time == TV Time is a television dedicated social network that allows users to keep track of the television series they watch, as well as films. It also allows them to express their reaction to the media they have seen with episode specific voting for favorite characters and emotional reaction to episodes, as well as commenting in episode restrictive pages. This way users are able to avoid spoilers while also finding a precise audience and community for each of their interactions, as opposed to bigger, non-television dedicated social medias such as Facebook and Twitter where the likelihood of unintentionally reading spoilers is much higher. TV Time offers an analytics service called "TVLytics" where the votes and reactions collected from users can be studied for research and television production purposes. == Advertising == According to Businessinsider.com, there are variety of applications for social TV, including support for TV ad sales, optimizing TV ad buys, making ad buys more efficient, as a complement to audience measurement, and eventually, audience forecasting and real-time optimization. Social TV data can ease access to focus groups and may create a positive feedback loop for generating ultra-sticky TV programming and multi-screen ad campaigns. == In numbers == Viewers share their TV experience on social media in real-time as events unfold: between 88-100m Facebook users login to the platform during the primetime hours of 8pm – 11pm in the US. The volume of social media engagement in TV is also rising – according to Nielsen SocialGuide, there was a 38% increase in tweets about TV in 2013 to 263m. For the 2014 Super Bowl, Twitter reported that a record 24.9 million tweets about the game were sent during the telecast, peaking at 381,605 tweets per minute. Facebook reported that 50 million people discussed the Super Bowl, generating 185 million interactions. The 2014 Oscars generated 5m tweets, viewed by an audience of 37m unique Twitter users and delivering 3.3bn impressions globally as conversation and key moments were shared virally across the platform. In 2014 the All England Lawn Tennis Club (AELTC), hosts of Wimbledon, used Grabyo to share video content across social. The videos were viewed 3.5 million times across Facebook and Twitter. In partnered with Grabyo again in 2015 and the videos generated over 48 million views across Facebook and Twitter. == Television shows with social integration == Here are some examples of how TV executives are integrating social elements with TV shows: C-SPAN streamed tweets from US Senators and Representatives during the quorum call The Voice had the judges of the program tweet during the show and the posts scrolls on the bottom of the screen. The use of Twitter also led to an increase in viewers. "Glee" Entertainment Weekly created a second screen viewing platform for the Glee season 3 premiere. == Related publications == Erika Jonietz. "Making TV Social, Virtually" MIT Technology Review. (January 11, 2010) AmigoTV (Alcatel-Lucent; Coppens et al.) – 2004 www.ist-ipmedianet.org/Alcatel_EuroiTV2004_AmigoTV_short_paper_S4-2.pdf Nextream (MIT Media Lab, Martin et al.) – 2010 Social Interactive Television: Immersive Shared Experiences and Perspectives (P. Cesar, D. Geerts, and K. Chorianopoulos (eds.)) – 2009 Social TV and the Emergence of Interactive TV – Multimedia Research Group – November 2010 Interactive Social TV on Service Oriented Environments: Challenges and Enablers (May 2011) == Systems == Boxee – acquired by Samsung GetGlue – acquired by i.TV Grabyo KIT digital Miso TV Tank Top TV WiO Xbox Live

    Read more →
  • Teknomo–Fernandez algorithm

    Teknomo–Fernandez algorithm

    The Teknomo–Fernandez algorithm (TF algorithm), is an efficient algorithm for generating the background image of a given video sequence. By assuming that the background image is shown in the majority of the video, the algorithm is able to generate a good background image of a video in O ( R ) {\displaystyle O(R)} -time using only a small number of binary operations and Boolean bit operations, which require a small amount of memory and has built-in operators found in many programming languages such as C, C++, and Java. == History == People tracking from videos usually involves some form of background subtraction to segment foreground from background. Once foreground images are extracted, then desired algorithms (such as those for motion tracking, object tracking, and facial recognition) may be executed using these images. However, background subtraction requires that the background image is already available and unfortunately, this is not always the case. Traditionally, the background image is searched for manually or automatically from the video images when there are no objects. More recently, automatic background generation through object detection, medial filtering, medoid filtering, approximated median filtering, linear predictive filter, non-parametric model, Kalman filter, and adaptive smoothening have been suggested; however, most of these methods have high computational complexity and are resource-intensive. The Teknomo–Fernandez algorithm is also an automatic background generation algorithm. Its advantage, however, is its computational speed of only O ( R ) {\displaystyle O(R)} -time, depending on the resolution R {\displaystyle R} of an image and its accuracy gained within a manageable number of frames. Only at least three frames from a video is needed to produce the background image assuming that for every pixel position, the background occurs in the majority of the videos. Furthermore, it can be performed for both grayscale and colored videos. == Assumptions == The camera is stationary. The light of the environment changes only slowly relative to the motions of the people in the scene. The number of people does not occupy the scene for most of the time at the same place. Generally, however, the algorithm will certainly work whenever the following single important assumption holds: For each pixel position, the majority of the pixel values in the entire video contain the pixel value of the actual background image (at that position).As long as each part of the background is shown in the majority of the video, the entire background image needs not to appear in any of its frames. The algorithm is expected to work accurately. == Background image generation == === Equations === For three frames of image sequence x 1 {\displaystyle x_{1}} , x 2 {\displaystyle x_{2}} , and x 3 {\displaystyle x_{3}} , the background image B {\displaystyle B} is obtained using B = x 3 ( x 1 ⊕ x 2 ) + x 1 x 2 {\displaystyle B=x_{3}(x_{1}\oplus x_{2})+x_{1}x_{2}} where ⊕ {\displaystyle \oplus } denotes the exclusive disjunctive bit operator. The Boolean mode function S {\displaystyle S} of the table occurs when the number of 1 entries is larger than half of the number of images such that S = { 1 , if ∑ i = 1 n x i ≥ ⌈ n 2 + 1 ⌉ , and n ≥ 3 0 , otherwise {\displaystyle S={\begin{cases}1,&{\text{if }}\sum _{i=1}^{n}x_{i}\geq \left\lceil {\frac {n}{2}}+1\right\rceil ,{\text{ and }}n\geq 3\\0,&{\text{otherwise}}\end{cases}}} For three images, the background image B {\displaystyle B} can be taken as the value x ¯ 1 x 2 x 3 + x 1 x ¯ 2 x 3 + x 1 x 2 x ¯ 3 + x 1 x 2 x 3 {\displaystyle {\bar {x}}_{1}x_{2}x_{3}+x_{1}{\bar {x}}_{2}x_{3}+x_{1}x_{2}{\bar {x}}_{3}+x_{1}x_{2}x_{3}} === Background generation algorithm === At the first level, three frames are selected at random from the image sequence to produce a background image by combining them using the first equation. This yields a better background image at the second level. The procedure is repeated until desired level L {\displaystyle L} . == Theoretical accuracy == At level ℓ {\displaystyle \ell } , the probability p ℓ {\displaystyle p_{\ell }} that the modal bit predicted is the actual modal bit is represented by the equation p ℓ = ( p ℓ − 1 ) 3 + 3 ( p ℓ − 1 ) 2 ( 1 − p ℓ − 1 ) {\displaystyle p_{\ell }=(p_{\ell -1})^{3}+3(p_{\ell -1})^{2}(1-p_{\ell -1})} . The table below gives the computed probability values across several levels using some specific initial probabilities. It can be observed that even if the modal bit at the considered position is at a low 60% of the frames, the probability of accurate modal bit determination is already more than 99% at 6 levels. == Space complexity == The space requirement of the Teknomo–Fernandez algorithm is given by the function O ( R F + R 3 L ) {\displaystyle O(RF+R3^{L})} , depending on the resolution R {\displaystyle R} of the image, the number F {\displaystyle F} of frames in the video, and the desired number L {\displaystyle L} of levels. However, the fact that L {\displaystyle L} will probably not exceed 6 reduces the space complexity to O ( R F ) {\displaystyle O(RF)} . == Time complexity == The entire algorithm runs in O ( R ) {\displaystyle O(R)} -time, only depending on the resolution of the image. Computing the modal bit for each bit can be done in O ( 1 ) {\displaystyle O(1)} -time while the computation of the resulting image from the three given images can be done in O ( R ) {\displaystyle O(R)} -time. The number of the images to be processed in L {\displaystyle L} levels is O ( 3 L ) {\displaystyle O(3^{L})} . However, since L ≤ 6 {\displaystyle L\leq 6} , then this is actually O ( 1 ) {\displaystyle O(1)} , thus the algorithm runs in O ( R ) {\displaystyle O(R)} . == Variants == A variant of the Teknomo–Fernandez algorithm that incorporates the Monte-Carlo method named CRF has been developed. Two different configurations of CRF were implemented: CRF9,2 and CRF81,1. Experiments on some colored video sequences showed that the CRF configurations outperform the TF algorithm in terms of accuracy. However, the TF algorithm remains more efficient in terms of processing time. == Applications == Object detection Face detection Face recognition Pedestrian detection Video surveillance Motion capture Human-computer interaction Content-based video coding Traffic monitoring Real-time gesture recognition

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →
  • Time-lock puzzle

    Time-lock puzzle

    A time-lock puzzle, or time-released cryptography, encrypts a message that cannot be decrypted until a specified amount of time has passed. The concept was first described by Timothy C. May, and a solution first introduced by Ron Rivest, Adi Shamir, and David A. Wagner in 1996. Time-lock puzzle are useful in cases where confidentiality of information is determined by time, such as a diarist who does not want their views released until 50 years after their death, an auction where bids are sealed until the bidding period is closed, electronic voting, and contract signing. They can additionally be used in creating further cryptographic primitives, such as verifiable delay functions and zero knowledge proofs. Time-released cryptography can be achieved through several different mechanisms. Use mathematical problems requiring sequential calculations to solve, and cannot be solved with parallelization. Thus, adding more computers to a problem will not help solve the problem faster. Use of a trusted agent, or multiple agents who each hold a part of the message and cryptographic keys, who release the message after a specified time period has passed. Distribute public encryption keys to users, and place private cryptographic keys with a trusted agent in an offline location, to be released at a later date.

    Read more →
  • What I eat in a day video

    What I eat in a day video

    "What I eat in a day" videos are a trend on several social media platforms where a person describes all the meals and snacks that they eat during a given day, often as part of a given diet. The videos, shared on platforms including Twitter, TikTok and YouTube, become increasingly popular in 2020, with some of them accumulating millions of views, and they are considered a profitable industry for the people making them. Some have raised concerns that the videos may promote an unrealistic standard for healthy eating and contribute to the development of eating disorders. == Format == These videos often feature a montage of the food that the creator eats over the course of the day, sometimes with the associated calorie count of the foods that they describe. Unlike related mukbang videos, however, in which participants eat large amounts of food, the diets described are often restrictive. However, other videos are labeled as "unhealthy" and depict large portion sizes and higher amounts of processed food. == Popularity == "What I eat in a day" videos have existed for a long time, especially on YouTube, but they have become much more widespread in recent years. This phenomenon is self-reinforcing because when social media users watch or like these videos they are likely to see more of them in the future. Indeed, some of the most successful videos have tens of millions of view each. == Criticism and controversy == Several dieticians and mental health professionals over the impacts that these videos can have, as they can advocate a restrictive style of eating and not "promote body diversity." They have also raised concerns that this trend could contribute to a rise in disordered eating, especially since use of social media is known to increase feelings of negative body image. This trend is particularly prevalent among young adults, which are also the group with the highest vulnerability to eating disorders. More recently, a portion of these videos have begun to challenge diets and depict more realistic ways of eating in order to reduce the potential consequences of the trend.

    Read more →
  • ISO/IEC JTC 1/SC 24

    ISO/IEC JTC 1/SC 24

    ISO/IEC JTC 1/SC 24 Computer graphics, image processing and environmental data representation is a standardization subcommittee of the joint subcommittee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), which develops and facilitates standards within the field of computer graphics, image processing, and environmental data representation. The international secretariat of ISO/IEC JTC 1/SC 24 is the British Standards Institute (BSI) located in the United Kingdom. == History == ISO/IEC JTC 1/SC 24 was formed in 1987 from ISO/TC 97 as a result of Resolution 21 at the ISO/IEC JTC 1 plenary. The group's origins began in computer graphics, the standardization of which was originally under ISO/IEC JTC 1/SC 21/WG 2. However, when ISO/IEC JTC 1/SC 24 was created, the standardization activity of ISO/IEC JTC 1/SC 21/WG 2 was carried over to the new subcommittee. The initial five working groups of ISO/IEC JTC 1/SC 24 were titled, “Architecture,” “Application programming interfaces,” “Metafiles and interfaces,” “Language bindings,” and “Validation, testing and registration.” The work of ISO/IEC JTC 1/SC 24 began with the Graphical Kernel System (GKS), which was adopted from ISO/IEC JTC 1/SC 21/WG 2. However, since GKS only addressed 2D functionality, attention turned to the standardization of 3D functionality. This resulted in two standards being published: GKS-3D in 1988 and PHIGS in 1989, both of which addressed 3D functionality. Since 1991, ISO/IEC JTC 1/SC 24 has held plenaries in a number of countries, including the Netherlands, Germany, United States, France, Canada, Japan, Sweden, Korea, United Kingdom, Australia, and Czech Republic. == Scope == The scope of ISO/IEC JTC 1/SC 24 is the “Standardization of interfaces for information technology based applications relating to”: Computer graphics Image processing Environmental data representation Support for the Mixed and Augmented Reality (MAR) Interaction with, and visual representation of, information Included are the following related areas: Modeling and simulation and related reference models Virtual reality with accompanying augmented reality/augmented virtuality aspects and related reference models Application program interfaces Functional specifications Representation models Interchange formats, encodings and their specifications, including metafiles Device interfaces Testing methods Registration procedures Presentation and support for creation of multimedia, hypermedia, and mixed reality documents Excluded are the following areas: Character and image coding Coding of multimedia, hypermedia, and mixed reality document interchange formats JTC 1 work in user system interfaces and document presentation ISO/TC 207 work on ISO 14000 environment management, ISO/TC 211 work on geographic information and geomatics Software environments as described by ISO/IEC JTC 1/SC 22 == Structure == ISO/IEC JTC 1/SC 24 is made up of four active working groups, each of which carries out specific tasks in standards development within the field of computer graphics, image processing and environmental data representation, together with ITU-T Study Group 16. As a response to changing standardization needs, working groups of ISO/IEC JTC 1/SC 24 can be disbanded if their area of work is no longer applicable, or established if new working areas arise. The focus of each working group is described in the group's terms of reference. Active working groups of ISO/IEC JTC 1/SC 24 are: == Collaborations == ISO/IEC JTC 1/SC 24 works in close collaboration with a number of other organizations or subcommittees, both internal and external to ISO or IEC, in order to avoid conflicting or duplicative work. Organizations internal to ISO or IEC that collaborate with or are in liaison to ISO/IEC JTC 1/SC 24 include: ISO/IEC JTC 1/WG 7, Sensor Networks ISO/IEC JTC 1/SC 29, Coding of audio, picture, multimedia and hypermedia information ISO/IEC JTC 1/SC 32, Data management and interchange ISO/TAG 14, Imagery and technology ISO/TC 130, Graphic Technology ISO/TC 184/SC 4, Industrial data ISO/TC 211, Geographic information/Geomatics ISO/TC 215, Health informatics IEC TC 100, Audio, video and multimedia system and equipment Some organizations external to ISO or IEC that collaborate with or are in liaison to ISO/IEC JTC 1/SC 24 include: Defence Geospatial Information Working Group (DGIWG) Digital Imaging and Communications in Medicine (DICOM) International Hydrographic Organization (IHO) The Khronos Group NATO - Joint Intelligence Surveillance and Reconnaissance Capability Group (JISRCG) OMG Robotics DTF Open CGM Open Geospatial Consortium (OGC) SEDRIS Organization Simulation Interoperability Standards Organization (SISO) US National Imagery Transmission Format Standard (NITFS) Technical Board (US NTB) Web3D Consortium World Intellectual Property Organization (WIPO) World Wide Web Consortium (W3C) == Member countries == Countries pay a fee to ISO to be members of subcommittees. The 11 "P" (participating) members of ISO/IEC JTC 1/SC 24 are: Australia, China, Egypt, France, India, Japan, Republic of Korea, Portugal, Russian Federation, United Kingdom, and United States. The 22 "O" (observer) members of ISO/IEC JTC 1/SC 24 are: Argentina, Austria, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Cuba, Czech Republic, Finland, Ghana, Hungary, Iceland, Indonesia, Islamic Republic of Iran, Italy, Kazakhstan, Malaysia, Poland, Romania, Serbia, Slovakia, Switzerland, and Thailand. == Published standards == ISO/IEC JTC 1/SC 24 currently has 80 published standards under their direct responsibility within the field of computer graphics, image processing, and environmental data representation, including:

    Read more →
  • Myrinet

    Myrinet

    Myrinet, ANSI/VITA 26-1998, is a high-speed local area networking system designed by the company Myricom to be used as an interconnect between multiple machines to form computer clusters. == Description == Myrinet was promoted as having lower protocol overhead than standards such as Ethernet, and therefore better throughput, less interference, and lower latency while using the host CPU. Although it can be used as a traditional networking system, Myrinet is often used directly by programs that "know" about it, thereby bypassing a call into the operating system. Earlier versions of Myrinet used a variety of media and connectors: Generation 2 used copper media with DC-37 (Myrinet-LAN, M2L- controllers and switches) or microribbon (Myrinet-SAN, M2M-) connectors. Generation 3 used copper media with HSSDC (Myrinet-Serial, M3S-) or microribbon (Myrinet-SAN, M3M-) connectors, or fiber with LC-connectors (Myrinet-Fiber, M3F-). The later versions of Myrinet physically consist of two fibre optic cables, upstream and downstream, connected to the host computers with a single connector. Machines are connected via low-overhead routers and switches, as opposed to connecting one machine directly to another. Myrinet includes a number of fault-tolerance features, mostly backed by the switches. These include flow control, error control, and "heartbeat" monitoring on every link. The "fourth-generation" Myrinet, called Myri-10G, supported a 10 Gbit/s data rate and can use 10 Gigabit Ethernet on PHY, the physical layer (cables, connectors, distances, signaling). Myri-10G started shipping at the end of 2005. Myrinet was approved in 1998 by the American National Standards Institute for use on the VMEbus as ANSI/VITA 26-1998. One of the earliest publications on Myrinet is a 1995 IEEE article. === Performance === Myrinet is a lightweight protocol with little overhead that allows it to operate with throughput close to the basic signaling speed of the physical layer. For supercomputing, the low latency of Myrinet is even more important than its throughput performance, since, according to Amdahl's law, a high-performance parallel system tends to be bottlenecked by its slowest sequential process, which in all but the most embarrassingly parallel supercomputer workloads is often the latency of message transmission across the network. === Deployment === According to Myricom, 141 (28.2%) of the June 2005 TOP500 supercomputers used Myrinet technology. In the November 2005 TOP500, the number of supercomputers using Myrinet was down to 101 computers, or 20.2%, in November 2006, 79 (15.8%), and by November 2007, 18 (3.6%), a long way behind gigabit Ethernet at 54% and InfiniBand at 24.2%. In the June 2014 TOP500 list, the number of supercomputers using Myrinet interconnect was 1 (0.2%). In November 2013, the assets of Myricom (including the Myrinet technology) were acquired by CSP Inc. In 2016, it was reported that Google had also offered to buy the company.

    Read more →
  • Data proliferation

    Data proliferation

    Data proliferation refers to the prodigious amount of data, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate and the usability problems that result from attempting to store and manage that data. While originally pertaining to problems associated with paper documentation, data proliferation has become a major problem in primary and secondary data storage on computers. While digital storage has become cheaper, the associated costs, from raw power to maintenance and from metadata to search engines, have not kept up with the proliferation of data. Although the power required to maintain a unit of data has fallen, the cost of facilities which house the digital storage has tended to rise. Data proliferation has been documented as a problem for the U.S. military since August 1971, in particular regarding the excessive documentation submitted during the acquisition of major weapon systems. Efforts to mitigate data proliferation and the problems associated with it are ongoing. == Problems caused == The problem of data proliferation is affecting all areas of commerce as a result of the availability of relatively inexpensive data storage devices. This has made it very easy to dump data into secondary storage immediately after its window of usability has passed. This masks problem that could gravely affect the profitability of businesses and the efficient functioning of health services, police and security forces, local and national governments, and many other types of organizations. Data proliferation is problematic for several reasons: Difficulty when trying to find and retrieve information. At Xerox, on average it takes employees more than one hour per week to find hard-copy documents, costing $2,152 a year to manage and store them. For businesses with more than 10 employees, this increases to almost two hours per week at $5,760 per year. In large networks of primary and secondary data storage, problems finding electronic data are analogous to problems finding hard copy data. Data loss and legal liability when data is disorganized, not properly replicated, or cannot be found promptly. In April 2005, the Ameritrade Holding Corporation told 200,000 current and past customers that a tape containing confidential information had been lost or destroyed in transit. In May of the same year, Time Warner Incorporated reported that 40 tapes containing personal data on 600,000 current and former employees had been lost en route to a storage facility. In March 2005, a Florida judge hearing a $2.7 billion lawsuit against Morgan Stanley issued an "adverse inference order" against the company for "willful and gross abuse of its discovery obligations." The judge cited Morgan Stanley for repeatedly finding misplaced tapes of e-mail messages long after the company had claimed that it had turned over all such tapes to the court. Increased manpower requirements to manage increasingly chaotic data storage resources. Slower networks and application performance due to excess traffic as users search and search again for the material they need. High cost in terms of the energy resources required to operate storage hardware. A 100 terabyte system will cost up to $35,040 a year to run—not counting cooling costs. == Proposed solutions == Applications that better utilize modern technology Reductions in duplicate data (especially as caused by data movement) Improvement of metadata structures Improvement of file and storage transfer structures User education and discipline The implementation of Information Lifecycle Management solutions to eliminate low-value information as early as possible before putting the rest into actively managed long-term storage in which it can be quickly and cheaply accessed.

    Read more →
  • Software token

    Software token

    A software token (a.k.a. soft token) is a piece of a two-factor authentication security device that may be used to authorize the use of computer services. Software tokens are stored on a general-purpose electronic device such as a desktop computer, laptop, PDA, or mobile phone and can be duplicated. (Contrast hardware tokens, where the credentials are stored on a dedicated hardware device and therefore cannot be duplicated — absent physical invasion of the device) Because software tokens are something one does not physically possess, they are exposed to unique threats based on duplication of the underlying cryptographic material - for example, computer viruses and software attacks. Both hardware and software tokens are vulnerable to bot-based man-in-the-middle attacks, or to simple phishing attacks in which the one-time password provided by the token is solicited, and then supplied to the genuine website in a timely manner. Software tokens do have benefits: there is no physical token to carry, they do not contain batteries that will run out, and they are cheaper than hardware tokens. == Security architecture == There are two primary architectures for software tokens: shared secret and public-key cryptography. For a shared secret, an administrator will typically generate a configuration file for each end-user. The file will contain a username, a personal identification number, and the secret. This configuration file is given to the user. The shared secret architecture is potentially vulnerable in a number of areas. The configuration file can be compromised if it is stolen and the token is copied. With time-based software tokens, it is possible to borrow an individual's PDA or laptop, set the clock forward, and generate codes that will be valid in the future. Any software token that uses shared secrets and stores the PIN alongside the shared secret in a software client can be stolen and subjected to offline attacks. Shared secret tokens can be difficult to distribute, since each token is essentially a different piece of software. Each user must receive a copy of the secret, which can create time constraints. Some newer software tokens rely on public-key cryptography, or asymmetric cryptography. This architecture eliminates some of the traditional weaknesses of software tokens, but does not affect their primary weakness (ability to duplicate). A PIN can be stored on a remote authentication server instead of with the token client, making a stolen software token no good unless the PIN is known as well. However, in the case of a virus infection, the cryptographic material can be duplicated and then the PIN can be captured (via keylogging or similar) the next time the user authenticates. If there are attempts made to guess the PIN, it can be detected and logged on the authentication server, which can disable the token. Using asymmetric cryptography also simplifies implementation, since the token client can generate its own key pair and exchange public keys with the server.

    Read more →
  • Tradeshift

    Tradeshift

    Tradeshift is a cloud based business network and platform for purchase-to-pay automation, supply chain payments, marketplaces, virtual cards and supply chain financing. Its 2018 round of funding, led by Goldman Sachs, raised US$250 million at a valuation of $1.1 billion, giving the company unicorn status. Tradeshift is headquartered in San Francisco, California and has offices in London, Copenhagen, Bucharest and Kuala Lumpur. Tradeshift has reprocessed over $1 trillion USD through transactions on its network. == History == Tradeshift was founded in 2010 by Christian Lanng, Mikkel Hippe Brun, and Gert Sylvest. Inspiration for Tradeshift came after they created the world's first large scale peer-to-peer infrastructure for an e-business called NemHandel. The founders also had leading roles (Governing board member, Technical Director) in the European Commission project PEPPOL inside the European Union. In 2010, the Tradeshift platform launched in May in Copenhagen. Tradeshift won the European Startup Awards in the category of "Best Business or Enterprise Startup." In 2011, Tradeshift made its app marketplace available. In 2012, Tradeshift moved their headquarters from Copenhagen to San Francisco. In 2013, Tradeshift opened an R&D center in Suzhou, China. Tradeshift opened an additional office in London. And LATAM e-invoicing capabilities were added through partnership with Invoiceware. In 2014, Tradeshift expanded with offices in Tokyo, Paris, and Munich. The EU Commission officially approved the Universal Business Language (UBL) data format – a format Tradeshift supports – as eligible for referencing in tenders from public administrations. In 2015, Tradeshift won the Circulars "Digital Disruptor" Award at the WEF conference in Davos, Switzerland. Tradeshift also acquired product information management company Merchantry, and launched e-procurement and supplier risk management solutions. In 2016, Tradeshift acquired Hyper Travel and secured a $75 million series-D round funding. In 2017, Tradeshift acquired IBX Business Network and launches Tradeshift Ada. In 2018, Tradeshift secured a $250 million series-E round funding. and launched Blockchain Payments, the latter as part of Tradeshift Pay. In December 2018 Tradeshift acquired Babelway, an online B2B integration platform. The acquisition added three new office locations to Tradeshift (Salt Lake City, Louvain-la-neuve, Belgium, Cairo Egypt). In Q3 2018, Tradeshift reported year-over-year revenue growth of 400%, new bookings growth of 284%, and gross merchandise volume (GMV) growth of 262%. New total contract value also grew by US$47 million. Additionally, it added 27 new customers including Hertz, Shiseido, ECU and multiple Fortune 500 companies. In July 2023, HSBC and Tradeshift announced an agreement to launch a new, jointly owned business focused on the development of embedded finance solutions and financial services apps. As part of the agreement, HSBC made a $35 million investment into Tradeshift and joined its board. The agreement was part of a funding round which is expected to raise a minimum of $70 million from HSBC and other investors. The new joint venture will allow HSBC and Tradeshift to deploy a range of digital solutions across Tradeshift and other platforms. This includes payment and fintech services embedded into trade, e-commerce and marketplace experiences. In September 2023, CEO Lanng was fired for "gross misconduct on multiple grounds," including "allegations of sexual assault and harassment." Tradeshift was alleged to have fired his accuser after she complained to the company's human resources department, its co-founders and members of its board of directors about his abuse. == Financials == The company's valuation as of May 2018 was $1.1 billion. Tradeshift is now considered a unicorn, and, according to Bloomberg, will not need any further funding. Jan 14, 2020, Tradeshift announced that they had raised $240 million in Series F finance. == Acquisitions == In 2015, Tradeshift acquired product information management company Merchantry. Merchantry is a retail product information management (PIM) software for multi-vendor ecommerce retailers. In 2016, Tradeshift acquired Hyper Travel. Hyper Travel is a travel management service that allows customers to access travel agents via its native messaging apps, SMS, and email. In 2017, Tradeshift acquired IBX Group. In 2018, Tradeshift acquired Babelway, an online B2B integration platform.

    Read more →
  • Backup

    Backup

    In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. === 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. ==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

    Read more →
  • Initialization vector

    Initialization vector

    In cryptography, an initialization vector (IV) or starting variable is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation. Some cryptographic primitives require the IV only to be non-repeating, and the required randomness is derived internally. In this case, the IV is commonly called a nonce (a number used only once), and the primitives (e.g. CBC) are considered stateful rather than randomized. This is because an IV need not be explicitly forwarded to a recipient but may be derived from a common state updated at both sender and receiver side. (In practice, a short nonce is still transmitted along with the message to consider message loss.) An example of stateful encryption schemes is the counter mode of operation, which has a sequence number for a nonce. The IV size depends on the cryptographic primitive used; for block ciphers it is generally the cipher's block-size. In encryption schemes, the unpredictable part of the IV has at best the same size as the key to compensate for time/memory/data tradeoff attacks. When the IV is chosen at random, the probability of collisions due to the birthday problem must be taken into account. Traditional stream ciphers such as RC4 do not support an explicit IV as input, and a custom solution for incorporating an IV into the cipher's key or internal state is needed. Some designs realized in practice are known to be insecure; the WEP protocol is a notable example, and is prone to related-IV attacks. == Motivation == A block cipher is one of the most basic primitives in cryptography, and frequently used for data encryption. However, by itself, it can only be used to encode a data block of a predefined size, called the block size. For example, a single invocation of the AES algorithm transforms a 128-bit plaintext block into a ciphertext block of 128 bits in size. The key, which is given as one input to the cipher, defines the mapping between plaintext and ciphertext. If data of arbitrary length is to be encrypted, a simple strategy is to split the data into blocks each matching the cipher's block size, and encrypt each block separately using the same key. This method is not secure as equal plaintext blocks get transformed into equal ciphertexts, and a third party observing the encrypted data may easily determine its content even when not knowing the encryption key. To hide patterns in encrypted data while avoiding the re-issuing of a new key after each block cipher invocation, a method is needed to randomize the input data. In 1980, the NIST published a national standard document designated Federal Information Processing Standard (FIPS) PUB 81, which specified four so-called block cipher modes of operation, each describing a different solution for encrypting a set of input blocks. The first mode implements the simple strategy described above, and was specified as the electronic codebook (ECB) mode. In contrast, each of the other modes describe a process where ciphertext from one block encryption step gets intermixed with the data from the next encryption step. To initiate this process, an additional input value is required to be mixed with the first block, and which is referred to as an initialization vector. For example, the cipher-block chaining (CBC) mode requires an unpredictable value, of size equal to the cipher's block size, as additional input. This unpredictable value is added to the first plaintext block before subsequent encryption. In turn, the ciphertext produced in the first encryption step is added to the second plaintext block, and so on. The ultimate goal for encryption schemes is to provide semantic security: by this property, it is practically impossible for an attacker to draw any knowledge from observed ciphertext. It can be shown that each of the three additional modes specified by the NIST are semantically secure under so-called chosen-plaintext attacks. == Properties == Properties of an IV depend on the cryptographic scheme used. A basic requirement is uniqueness, which means that no IV may be reused under the same key. For block ciphers, repeated IV values devolve the encryption scheme into electronic codebook mode: equal IV and equal plaintext result in equal ciphertext. In stream cipher encryption uniqueness is crucially important as plaintext may be trivially recovered otherwise. Example: Stream ciphers encrypt plaintext P to ciphertext C by deriving a key stream K from a given key and IV and computing C as C = P xor K. Assume that an attacker has observed two messages C1 and C2 both encrypted with the same key and IV. Then knowledge of either P1 or P2 reveals the other plaintext since C1 xor C2 = (P1 xor K) xor (P2 xor K) = P1 xor P2. Many schemes require the IV to be unpredictable by an adversary. This is effected by selecting the IV at random or pseudo-randomly. In such schemes, the chance of a duplicate IV is negligible, but the effect of the birthday problem must be considered. As for the uniqueness requirement, a predictable IV may allow recovery of (partial) plaintext. Example: Consider a scenario where a legitimate party called Alice encrypts messages using the cipher-block chaining mode. Consider further that there is an adversary called Eve that can observe these encryptions and is able to forward plaintext messages to Alice for encryption (in other words, Eve is capable of a chosen-plaintext attack). Now assume that Alice has sent a message consisting of an initialization vector IV1 and starting with a ciphertext block CAlice. Let further PAlice denote the first plaintext block of Alice's message, let E denote encryption, and let PEve be Eve's guess for the first plaintext block. Now, if Eve can determine the initialization vector IV2 of the next message she will be able to test her guess by forwarding a plaintext message to Alice starting with (IV2 xor IV1 xor PEve); if her guess was correct this plaintext block will get encrypted to CAlice by Alice. This is because of the following simple observation: CAlice = E(IV1 xor PAlice) = E(IV2 xor (IV2 xor IV1 xor PAlice)). Depending on whether the IV for a cryptographic scheme must be random or only unique the scheme is either called randomized or stateful. While randomized schemes always require the IV chosen by a sender to be forwarded to receivers, stateful schemes allow sender and receiver to share a common IV state, which is updated in a predefined way at both sides. == Block ciphers == Block cipher processing of data is usually described as a mode of operation. Modes are primarily defined for encryption as well as authentication, though newer designs exist that combine both security solutions in so-called authenticated encryption modes. While encryption and authenticated encryption modes usually take an IV matching the cipher's block size, authentication modes are commonly realized as deterministic algorithms, and the IV is set to zero or some other fixed value. == Stream ciphers == In stream ciphers, IVs are loaded into the keyed internal secret state of the cipher, after which a number of cipher rounds are executed prior to releasing the first bit of output. For performance reasons, designers of stream ciphers try to keep that number of rounds as small as possible, but because determining the minimal secure number of rounds for stream ciphers is not a trivial task, and considering other issues such as entropy loss, unique to each cipher construction, related-IVs and other IV-related attacks are a known security issue for stream ciphers, which makes IV loading in stream ciphers a serious concern and a subject of ongoing research. == WEP IV == The 802.11 encryption algorithm called WEP (short for Wired Equivalent Privacy) used a short, 24-bit IV, leading to reused IVs with the same key, which led to it being easily cracked. Packet injection allowed for WEP to be cracked in times as short as several seconds. This ultimately led to the deprecation of WEP. == SSL 2.0 IV == In cipher-block chaining mode (CBC mode), the IV need not be secret, but must be unpredictable (In particular, for any given plaintext, it must not be possible to predict the IV that will be associated to the plaintext in advance of the generation of the IV.) at encryption time. Additionally for the output feedback mode (OFB mode), the IV must be unique. In particular, the (previously) common practice of re-using the last ciphertext block of a message as the IV for the next message is insecure (for example, this method was used by SSL 2.0). If an attacker knows

    Read more →