AI Image Generators

Explore the best AI Image Generators — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Kubity

    Kubity

    Kubity is a cloud-based 3D communication tool that works on desktop computers, the web, smartphones, tablets, augmented reality gear, and virtual reality glasses. Kubity is powered by several proprietary 3D processing engines including "Paragone" and "Etna" that prepare the 3D file for transfer over mobile devices. Kubity has practical applications for architecture, interior design, engineering, product design, film, and video games among others. The majority of its users create 3D models using SketchUp or Autodesk Revit software. Kubity products include the Kubity web app and Kubity Go (a mobile application for iOS and Android). Kubity is compatible across many platforms, devices and operating systems including: iOS, Android, Firefox, Chrome, Windows, MacOS, and Linux. == History == Kubity was created by SPK Technology (ex Kubity S.A.S.), a Paris-based software company specializing in automatic 3D data optimization and visualization. Founded in 2012 by a group of software engineers and an urban projects developer, they united around a simple idea: create a way for anyone, anywhere to simply and intuitively explore 3D models on smartphones and computers. In order to bring architects, engineers and designers together with their clients around a 3D model, it was essential to develop an interactive platform that supported multiple desktop and mobile devices for instantaneous and fluid 3D navigation. With specifications in place, 15 engineers fused together several technologies: 3D design, data compression, decimation and rendering optimization, web and mobile transfer, and virtual reality headset integration. In January 2014, the first public Kubity prototype (1.0 Amethyst) was launched to a small group of beta testers with a plug-in that allowed users to import 3D models from SketchUp to their browser. A global release was announced in April 2014 at the SketchUp Basecamp in Vail, Colorado. In May 2015, Kubity launched a web application that worked using WebGL technology (2.0 Citrine). For the first time, users were able to drag and drop any SketchUp file in a web browser without having to install a plug-in. In December 2015, Kubity launched a mobile application on the App Store for iPhone, iPod, and iPad as well as on Google Play for Android devices (3.0 Druzy). In November 2016, Kubity launched support for Oculus Rift and HTC Vive (4.0 Emerald). Beginning in November 2017, Kubity launched a full suite rollout of mobile applications over six months that included Kubity AR for augmented reality, Kubity VR for virtual reality, and Kubity Mirror for remote presentations and screen mirroring (5.0 Feldspar). In September 2018, a one-click plugin for SketchUp and Revit (Kubity PRO), along with a mobile-first revamp of Kubity Go was launched, allowing PRO-to-Go device pairing for automatic mobile sync (6.0 Gypsum). In early 2019, the Kubity Go application was updated to include fully integrated AR, VR, and screen mirroring functionalities, killing off the dedicated companion apps Kubity AR, Kubity VR and Kubity Mirror in the process (7.0 Heliotrope). In January 2020, support for the Kubity PRO plugin for SketchUp and Revit was migrated to a SketchUp-only web app. == Technology == Kubity is powered by a proprietary 3D crystallization engine known as "Paragone"; a technology developed by SPK Technology. Paragone takes constrained information from a 3D file and runs it through the "BlockWave" algorithm (US Patent 10,482.629), also developed by SPK Technology. BlockWave is a multiphase optimization algorithm that combines 3D design, data compression, decimation and rendering optimization, web and mobile transfer, and mixed reality headset integration to create a crystallized universal format of the original file. One phase of the BlockWave algorithm is based on the quadric-based polygonal surface simplification algorithm, performed using predefined heuristics, and is associated with a plurality of simplified versions of the 3D model, each version being associated with a predefined level of detail adapted to the user specific end device. BlockWave extracts data content, geometry and textures, then sets quadrics for each top of the original 3D model, and identifies pairs of adjacent tops linked by vertices. The algorithm uses a local collapsing operator and a top-plan error metric to obtain a fixed number of faces or a maximum defined error; 3D meshing is simplified by replacing two points with one, then deleting the degrading faces and updating adjacent relations. Once decimation is completed, texture optimization is set using texture target parameters allowing maximized GPU memory to improve computing time. With texture encoding completed, the crystallized universal 3D file can now be easily opened on any user-specific end device and played across most digital devices with real-time rendering. == Features == === 3D Crystallization === A user converts (or crystallizes) a 3D file by exporting it with the Kubity web app. Crystallization adds features like AR/VR and cinematic fly-through tour as well as assigns the model a dedicated QR code. === Automatic Mobile Sync === When a 3D model is exported, it is automatically synced to Kubity Go on the user's mobile device. From there, it can be accessed, explored, and shared with others with or without an internet connection. === Security and Management === User models can be managed all in one place on Kubity Go or in a browser from their account. Models can be renamed, password-protected, shared, and played. === Augmented Reality === Developed using Apple ARKit and Google ARCore technology, Kubity Go's augmented reality feature maps the environment in a room detecting horizontal planes like tables and floors to track and place 3D objects. By blending digital objects and information with the environment, Kubity allows users to interact with 3D models in true augmented reality. Built-in communication features allows users to instantly share 3D models with anyone over text, email, social media, or direct device-to-device with a QR Code. Platform Support AR supports devices running iOS11 including: iPhone SE, iPhone 6s, iPhone 6s Plus, iPhone 7, iPhone 7 Plus, iPhone 8, iPhone X, all iPad Pro models, and iPad (2017). AR for Android requires Android 7.0 or later and access to the Google Play Store. === Virtual Reality === VR allows users to explore SketchUp models and Revit projects on-the-go right from a mobile device using Oculus Go, Google Cardboard, Samsung Gear VR, or the glasses-free Magic Window feature. Kubity's virtual reality feature is compatible with Oculus Go, Google Cardboard viewers and other cardboard compatible devices including clip-on style VR glasses like Homido Mini, as well as the mobile virtual reality headset, Samsung Gear VR. Samsung Gear VR supports: Galaxy S6, Galaxy S6 Edge, Galaxy S6 Edge+, Samsung Galaxy Note 5, Galaxy S7, Galaxy S7 Edge, Galaxy S8, Galaxy S8+, Samsung Galaxy Note Fan Edition, Samsung Galaxy Note 8, Samsung Galaxy A8/A8+ (2018), and Samsung Galaxy S9/Galaxy S9+. === Screen Mirroring === Screen mirroring allows a user to sync the sender device to a receiver on a webpage, then control from the sender device to give a remote presentation of the 3D model. Devices are easily synced by entering a six-digit number displayed on the receiving computer. == Platform support == On iOS, the Kubity application is compatible with devices running on the version 9.0 or higher. On Android, Kubity is compatible with devices running on the version 4.4 “Kit Kat” or higher. The web version of Kubity applications currently support web browsers compatible with WebGL2 : Mozilla Firefox and Google Chrome. AR is compatible with devices running iOS11 including: iPhone SE, iPhone 6s, iPhone 6s Plus, iPhone 7, iPhone 7 Plus, iPhone 8, iPhone X, all iPad Pro models, and iPad (2017), and Android devices. Requires Android 7.0 or later and access to the Google Play Store. VR is compatible with Google Cardboard viewers and other cardboard compatible devices including clip-on style VR glasses like Homido Mini, as well as the Samsung Gear VR and Oculus Go. Samsung Gear VR supports: Galaxy S6, Galaxy S6 Edge, Galaxy S6 Edge+, Samsung Galaxy Note 5, Galaxy S7, Galaxy S7 Edge, Galaxy S8, Galaxy S8+, Samsung Galaxy Note Fan Edition, Samsung Galaxy Note 8, Samsung Galaxy A8/A8+ (2018) and Samsung Galaxy S9/Galaxy S9+.

    Read more →
  • Dynamic knowledge repository

    Dynamic knowledge repository

    The dynamic knowledge repository (DKR) is a concept developed by Douglas C. Engelbart as a primary strategic focus for allowing humans to address complex problems. He has proposed that a DKR will enable us to develop a collective IQ greater than any individual's IQ. References and discussion of Engelbart's DKR concept are available at the Doug Engelbart Institute. == Definition == A knowledge repository is a computerized system that systematically captures, organizes and categorizes an organization's knowledge. The repository can be searched and data can be quickly retrieved. The effective knowledge repositories include factual, conceptual, procedural and meta-cognitive techniques. The key features of knowledge repositories include communication forums. A knowledge repository can take many forms to "contain" the knowledge it holds. A customer database is a knowledge repository of customer information and insights – or electronic explicit knowledge. A Library is a knowledge repository of books – physical explicit knowledge. A community of experts is a knowledge repository of tacit knowledge or experience. The nature of the repository only changes to contain/manage the type of knowledge it holds. A repository (as opposed to an archive) is designed to get knowledge out. It should therefore have some rules of structure, classification, taxonomy, record management, etc., to facilitate user engagement.

    Read more →
  • Media evaluation

    Media evaluation

    Media evaluation is a discipline of the external and logical social sciences and centres on the analysis of media content, rating the exposure using a number of pre-designated criteria commonly including tonal value and presence of key messages. It is said to be one of the fastest-growing areas of mass communications research. The International Association for Measurement and Evaluation of Communication (AMEC) is the industry-appointed trade body for companies and individuals involved in research, measurement, and evaluation in editorial media coverage and related communications issues. To be a full member of AMEC, companies must be able to: a) offer comprehensive media evaluation, research, and interpretation services, b) have been in business for at least two years, and c) have a media evaluation turnover of more than £150,000 when applying. In addition, all companies abide by a strict code of ethics and must implement tight quality control procedures. These requirements guarantee that all media evaluation services provided are of the highest caliber. The Commission on Public Relations Measurement & Evaluation is a different organization that was established in 1998 under the direction of the Institute for Public Relations. The Commission's main functions are to set standards and procedures for research and measurement in public relations and to publish authoritative white papers on best practices.

    Read more →
  • Stegomalware

    Stegomalware

    Stegomalware is a form of malicious software that leverages steganography techniques to conceal its code, configuration data, or command-and-control (C&C) communications within seemingly benign digital media such as images, audio files, videos, documents, or network traffic. It typically embeds encrypted or obfuscated payloads into digital media and only extracts and executes them at runtime, which makes traditional signature-based and sandbox-based detection significantly more difficult. Stegomalware has been observed in attacks ranging from advanced persistent threats (APTs) to financially motivated cybercrime, and is now the subject of dedicated academic surveys, research projects, and international law-enforcement initiatives. The key distinction between stegomalware and traditional obfuscated malware lies in the encoding location. After obfuscation, malicious code remains present within the executable and can theoretically be discovered through static analysis. In contrast, stegomalware hides the payload entirely within a cover medium (image, audio, etc.), remaining invisible until the malware dynamically extracts and executes it at runtime. == History == The term stegomalware was formally introduced by researchers Águila, Laskov, and others in the context of mobile malware and presented at the Inscrypt (Information Security and Cryptology) conference in 2014. This marked the first academic formalization of the concept, though earlier work had already identified that botnets and mobile malware could use steganography and covert channels for command-and-control communication over probabilistically unobservable channels. Since its introduction, stegomalware has evolved from a theoretical concern to a documented threat. In 2011, the APT operation known as "Operation Shady RAT" became one of the first documented cases of stegomalware in the wild, using digital images to hide Internet Protocol addresses and command-and-control server addresses. The same year, the Duqu malware (targeting industrial manufacturers) embedded victim data into JPEG image files before exfiltration, making the data transfer virtually undetectable to network-level security tools. From 2014 onwards, stegomalware became more prevalent in organized cybercrime and advanced persistent threat campaigns. Notable examples include Zeus/Zbot, which masked configuration data in images; Gatak/Stegoloader, which hid shellcode in PNG files; TeslaCrypt, which embedded C&C commands in JPEGs; and Cerber, which concealed ransomware payloads within images. By the 2010s, stegomalware had become established as a preferred evasion technique for espionage, financial theft, and ransomware distribution campaigns. Recent surveys (2020–2025) document that stegomalware has increasingly been exploited by adversaries targeting banks, enterprises, government agencies, educational institutions, and internet users via malvertising campaigns. The technique is now considered a sophisticated method of attack worthy of dedicated international law-enforcement attention. == Technical Characteristics and Definitions == Stegomalware operates through a three-component architecture: Stegotext (R): An innocent-looking digital asset (image, audio file, etc.) into which the malicious payload is embedded. Secret key (sk): A key used by the embedding and extraction algorithms, typically hardcoded into the malware. Payload (p): The actual malicious code, configuration data, or C&C commands hidden within the stegotext. The malware extracts the payload at runtime using the secret key and either executes it directly or uses it to download additional stages of the attack. Stegomalware can be classified into several types based on deployment method: Type 0 (Autonomous): Both the stegotext and extraction algorithm are embedded within the malware application itself. The malicious payload is extracted and executed locally without external communication. Type I (Update): The stegotext and secret key are downloaded from a remote server at runtime; only the extraction algorithm is included in the malware. This variant is more flexible, allowing attackers to push updated payloads. Type II (External Algorithm): Neither the stegotext nor the extraction algorithm are distributed with the malware; both are fetched from an attacker-controlled infrastructure, providing maximum flexibility and evasion. == Steganography techniques == === Spatial domain methods === Stegomalware predominantly uses steganographic methods designed for images, as images are the most common cover medium in the wild. The most basic spatial domain technique is Least Significant Bit (LSB) substitution, which replaces the least significant bits of pixel color values with payload bits. While simple and easy to implement, LSB is also relatively easy to detect through statistical analysis. More sophisticated spatial domain techniques include: HUGO (High Undetectable steGO) (2010): Minimizes detectable distortion by distributing the payload across multiple pixels, achieving embedding capacity with reduced statistical footprint. WOW (Wavelet Obtained Weights) (2012): Embeds data preferentially in textured regions of images where modifications are less perceptually noticeable. UNIWARD (Universal Wavelet Relative Distortion) (2014): Uses a universal distortion function applicable to multiple image formats, balancing payload capacity with undetectability. HILL (2014): Applies high-pass and low-pass filters to identify robust embedding regions. MiPOD (Minimizing the Power of Optimal Detector) (2016): Designed to minimize the power of theoretical optimal steganalysis detectors. === Transform domain methods === Transform domain techniques convert images into the frequency domain (e.g., using DCT or DWT) before embedding, allowing for more robust hiding in JPEG and other compressed formats: Embedding in DCT coefficients (used in JPEG compression) Embedding in DWT coefficients (used in lossless formats) Spread spectrum techniques, which distribute the payload across many frequency components Transform domain methods are generally more resistant to noise, compression, and image transformations than spatial methods. === Generative adversarial network (GAN) methods === Recent advances in machine learning have introduced GAN-based steganography, where a generative model produces stego images that minimize detectable artifacts: SGAN (Steganographic GAN) (2017): First GAN applied to steganography, using a generator, discriminator, and steganalysis network. ASDL-GAN (2017): Performs automatic steganographic distortion learning at the pixel level. SteganoGAN (2019): Improves upon earlier GAN models, achieving higher embedding capacity and robustness. HiGAN (Hiding Images GAN) (2020): Enables hiding one image within another while maintaining visual plausibility. GAN-based approaches are more resilient to standard steganalysis attacks but remain an emerging threat requiring further research. == Notable malware campaigns == Stegomalware has been documented in numerous high-profile cyber attacks and campaigns. Notable examples include: Operation Shady RAT (2011): Used digital images to hide command-and-control server addresses in targeted espionage. Duqu (2011): Embedded victim data into JPEG files to exfiltrate industrial control system information. Zeus/Zbot (2014): Masked banking configuration data inside JPEG files exploited via malvertising. Gatak/Stegoloader (2015): Hid shellcode in PNG files for software licensing attacks and bot command execution. TeslaCrypt (2015): Embedded C&C commands and ransomware keys in JPEG images. Cerber (2016): Concealed executable ransomware code in JPEG files distributed via phishing. DNSChanger (2016): Embedded malicious code in PNG files for DNS hijacking campaigns. Sundown Exploit Kit (2017): Distributed exploit code in PNG files via malvertising. AdGholas (2017): Used JPEG steganography to distribute ransomware via malvertising. Synccrypt (2017): Hidden ransomware components in JPEG-steganographic encrypted archives. ZeroT/PlugX (2017): Hid Remote Access Trojan payloads in BMP files for espionage. Loki Bot (2018): Concealed malware installers in JPEG and video files. Waterbug (APT28) (2019): Injected malicious DLLs into WAV audio files. Shlayer (macOS adware) (2019): Hid malicious URLs in JPEG files via malvertising. === Attack vectors === The most common attack vectors for stegomalware include: Phishing emails with malicious attachments or links Malvertising campaigns using malicious banner advertisements Exploit kits through compromised or malicious websites Legitimate application vulnerabilities (e.g., watering-hole attacks) Fake software distribution (cracked software, keygen tools) === Exploitation stages === Stegomalware typically serves one or more roles in attack lifecycles: Payload delivery: Stego images contain full executable code or shellcode. C&C communication: Hidden data contains server addresses or command instructio

    Read more →
  • Tay (chatbot)

    Tay (chatbot)

    Tay was a chatbot that was originally released by Microsoft Corporation as a Twitter bot on March 23, 2016. It caused subsequent controversy when the bot began to post inflammatory and offensive tweets through its Twitter account, causing Microsoft to shut down the service only 16 hours after its launch. According to Microsoft, this was caused by trolls who "attacked" the service as the bot made replies based on its interactions with people on Twitter. It was replaced with Zo. == Background == The bot was created by Microsoft's Technology and Research and Bing divisions, and named "Tay" as an acronym for "thinking about you". Although Microsoft initially released few details about the bot, sources mentioned that it was similar to or based on Xiaoice, a Microsoft project in China. Ars Technica reported that, since late 2014 Xiaoice had had "more than 40 million conversations apparently without major incident". Tay was designed to mimic the language patterns of a 19-year-old American girl, and to learn from interacting with human users of Twitter. == Initial release == Tay was released on Twitter on March 23, 2016, under the name TayTweets and handle @TayandYou. It was presented as "The AI with zero chill". Tay started replying to other Twitter users, and was also able to caption photos provided to it into a form of Internet memes. Ars Technica reported Tay experiencing topic "blacklisting": Interactions with Tay regarding "certain hot topics such as Eric Garner (killed by New York police in 2014) generate safe, canned answers". Some Twitter users began tweeting politically incorrect phrases, teaching it inflammatory messages revolving around common themes on the internet, such as "redpilling" and "Gamergate". As a result, the robot began releasing racist and sexist messages in response to other Twitter users. Artificial intelligence researcher Roman Yampolskiy commented that Tay's misbehavior was understandable because it was mimicking the deliberately offensive behavior of other Twitter users, and Microsoft had not given the bot an understanding of inappropriate behavior. He compared the issue to IBM's Watson, which began to use profanity after reading entries from the website Urban Dictionary. Many of Tay's inflammatory tweets were a simple exploitation of Tay's "repeat after me" capability. It is not publicly known whether this capability was a built-in feature, or whether it was a learned response or was otherwise an example of complex behavior. However, not all of the inflammatory responses involved the "repeat after me" capability; for example, when asked if the Holocaust had happened, Tay answered "It was made up". == Suspension == Soon, Microsoft began deleting Tay's inflammatory tweets. Abby Ohlheiser of The Washington Post theorized that Tay's research team, including editorial staff, had started to influence or edit Tay's tweets at some point that day, pointing to examples of almost identical replies by Tay, asserting that "Gamer Gate sux. All genders are equal and should be treated fairly." From the same evidence, Gizmodo concurred that Tay "seems hard-wired to reject Gamer Gate". A "#JusticeForTay" campaign protested the alleged editing of Tay's tweets. Within 16 hours of its release and after Tay had tweeted more than 96,000 times, Microsoft suspended the Twitter account for adjustments, saying that it suffered from a "coordinated attack by a subset of people" that "exploited a vulnerability in Tay." Madhumita Murgia of The Telegraph called Tay "a public relations disaster", and suggested that Microsoft's strategy would be "to label the debacle a well-meaning experiment gone wrong, and ignite a debate about the hatefulness of Twitter users." However, Murgia described the bigger issue as Tay being "artificial intelligence at its very worst – and it's only the beginning". On March 25, Microsoft confirmed that Tay had been taken offline. Microsoft released an apology on its official blog for the controversial tweets posted by Tay. Microsoft was "deeply sorry for the unintended offensive and hurtful tweets from Tay", and would "look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values". == Second release and shutdown == On March 30, 2016, Microsoft accidentally re-released the bot on Twitter while testing it. Able to tweet again, Tay released some drug-related tweets, including "kush! [I'm smoking kush infront the police]" and "puff puff pass?" However, the account soon became stuck in a repetitive loop of tweeting "You are too fast, please take a rest", several times a second. Because these tweets mentioned its own username in the process, they appeared in the feeds of 200,000+ Twitter followers, causing annoyance to users. The bot was quickly taken offline again, in addition to Tay's Twitter account being made private so new followers must be accepted before they can interact with Tay. In response, Microsoft said Tay was inadvertently put online during testing. A few hours after the incident, Microsoft software developers announced a vision of "conversation as a platform" using various bots and programs, perhaps motivated by the reputation damage done by Tay. Microsoft has stated that they intend to re-release Tay "once it can make the bot safe" but has not made any public efforts to do so. == Legacy == In December 2016, Microsoft released Tay's successor, a chatbot named Zo. Satya Nadella, the CEO of Microsoft, said that Tay "has had a great influence on how Microsoft is approaching AI," and has taught the company the importance of taking accountability. In July 2019, Microsoft Cybersecurity Field CTO Diana Kelley spoke about how the company followed up on Tay's failings: "Learning from Tay was a really important part of actually expanding that team's knowledge base, because now they're also getting their own diversity through learning". === Unofficial revival === Gab, an alt-tech social media platform, has launched a number of chatbots, one of which is named Tay and uses the same avatar as the original.

    Read more →
  • Conditional disclosure of secrets

    Conditional disclosure of secrets

    Conditional disclosure of secrets (CDS) is a primitive, studied in information-theoretic cryptography, that allows distributed, non-communicating parties to coordinate the release of information to a third party. CDS was initially introduced for use in the context of private information retrieval, and has been related to communication complexity and non-local quantum computation. == Definition of conditional disclosure of secrets == The conditional disclosure of secrets setting involves three players; Alice, Bob and the referee. Alice receives an input x ∈ { 0 , 1 } n {\displaystyle x\in \{0,1\}^{n}} and a secret z ∈ { 0 , 1 } {\displaystyle z\in \{0,1\}} , and Bob receives a string y ∈ { 0 , 1 } n {\displaystyle y\in \{0,1\}^{n}} . A choice of Boolean function f : { 0 , 1 } 2 n → { 0 , 1 } {\displaystyle f:\{0,1\}^{2n}\rightarrow \{0,1\}} is fixed in advance and known to all players. Alice and Bob cannot communicate with one another, but share a string of random bits which we label r {\displaystyle r} . Alice and Bob compute messages m A = m A ( x , z , r ) {\displaystyle m_{A}=m_{A}(x,z,r)} and m B = m B ( y , r ) {\displaystyle m_{B}=m_{B}(y,r)} , which they send to the referee. The referee knows ( x , y ) {\displaystyle (x,y)} . A CDS protocol consists of the encoding maps applied by Alice and Bob. A protocol is said to be ϵ {\displaystyle \epsilon } -correct if, for all ( x , y ) ∈ f − 1 ( 1 ) {\displaystyle (x,y)\in f^{-1}(1)} , the referee can determine z {\displaystyle z} with probability 1 − ϵ {\displaystyle 1-\epsilon } . A protocol is said to be δ {\displaystyle \delta } -secure if, for all ( x , y ) ∈ f − 1 ( 0 ) {\displaystyle (x,y)\in f^{-1}(0)} the distribution of the messages is δ {\displaystyle \delta } close to a simulator distribution (in total variation distance), where the simulator distribution is independent of z {\displaystyle z} . The communication complexity of a CDS protocol P is the total number of bits of message sent by Alice and Bob. The CDS communication cost of a function, C D S ϵ , δ ( f ) {\displaystyle CDS_{\epsilon ,\delta }(f)} is the minimal communication cost of an ϵ {\displaystyle \epsilon } -correct, δ {\displaystyle \delta } secure protocol that implements f {\displaystyle f} . The randomness complexity and randomness cost of implementing a function in the CDS model are defined similarly, but consider the number of bits of shared random bits held by Alice and Bob. == Basic properties of the primitive == === Amplification === Supposing we have an ϵ {\displaystyle \epsilon } -correct and δ {\displaystyle \delta } -secure CDS protocol, it is known that we can find a new protocol which reduces the correctness and privacy errors at the expense of an increased communication and randomness cost. More specifically, the following theorem has been proven Theorem (Amplification). A CDS protocol for f which supports a single-bit secret with privacy and correctness error of 1/3 can be transformed into a new CDS protocol with privacy and correctness error of 2 − Ω ( k ) {\displaystyle 2^{-\Omega (k)}} and communication/randomness complexity which are larger than those of the original protocol by a multiplicative factor of O(k). In fact, somewhat more than the above theorem is true in that the size of the secret can also be made to be of length k {\displaystyle k} , while simultaneously reducing the correctness and privacy errors as above. The proof involves first encoding the secret z {\displaystyle z} into a secret sharing scheme, and then running the original CDS protocol on each share of the resulting scheme. === Closure === If a CDS protocol for a function f {\displaystyle f} is known, then certain simple modifications of f {\displaystyle f} have CDS protocols with similar efficiency. The simplest case is to consider a CDS protocol for function f {\displaystyle f} and ask for a similarly efficient protocol for the negation of f {\displaystyle f} , labelled ¬ f {\displaystyle \neg f} . This is addressed by the following theorem Theorem (CDS is closed under complement). Suppose that f has a CDS protocol with randomness cost of ρ {\displaystyle \rho } bits, communication complexity of t {\displaystyle t} bits, and privacy and correctness errors δ = ϵ = 2 − k {\displaystyle \delta =\epsilon =2^{-k}} . Then ¬ f {\displaystyle \neg f} has a CDS scheme with similar privacy and correctness errors, and randomness and communication complexity of O ( k 3 ρ 2 t + k 3 ρ 3 ) {\displaystyle O(k^{3}\rho ^{2}t+k^{3}\rho ^{3})} . The cost of a CDS protocol is also closed under formula's, in the following sense. Consider two functions f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} . Then, the communication and randomness costs of f 1 ∧ f 2 {\displaystyle f_{1}\wedge f_{2}} as well as f 1 ∨ f 2 {\displaystyle f_{1}\vee f_{2}} are not much larger than the sum of the costs for f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} . See Applebaum et al. for a precise statement. == Upper and lower bounds on communication cost == Given a function f {\displaystyle f} we would like to understand the communication and randomness costs to implement f {\displaystyle f} in the CDS setting. Towards understanding this, protocols for implementing CDS have been developed (which give an upper bound on the cost) and lower bound strategies have been developed. For most functions, there is a large gap between the known upper and lower bound, so understanding the cost of CDS remains largely an open problem. This section presents some of what is known so far about the cost of CDS. === Secret sharing based upper bound === A subject with a close relationship to CDS is secret sharing. Secret sharing constructions provide an upper bound on the cost of CDS protocols. A secret sharing scheme encodes a secret, s {\displaystyle s} into a set of shares S 1 , . . . , S n {\displaystyle S_{1},...,S_{n}} . Associated to any secret sharing scheme is an access structure, which consists of a set of authorized sets A = A 1 , . . . , A k {\displaystyle {\mathcal {A}}={A_{1},...,A_{k}}} with A i ⊆ { S 1 , . . . , S n } {\displaystyle A_{i}\subseteq \{S_{1},...,S_{n}\}} . The authorized sets are those subsets of the A i {\displaystyle A_{i}} from which it is possible to recover the secret recorded into the scheme. A succinct way to describe an access structure is in terms of a function f A : { 0 , 1 } n → { 0 , 1 } {\displaystyle f_{\mathcal {A}}:\{0,1\}^{n}\rightarrow \{0,1\}} . Each subset of the shares K [ x ] ⊂ { S 1 , . . . , S n } {\displaystyle K[x]\subset \{S_{1},...,S_{n}\}} is labelled by a string x ∈ { 0 , 1 } n {\displaystyle x\in \{0,1\}^{n}} such that x i = 1 {\displaystyle x_{i}=1} if and only if S i ∈ K {\displaystyle S_{i}\in K} . Then we define f A {\displaystyle f_{\mathcal {A}}} to be such that f A ( x ) = 1 {\displaystyle f_{\mathcal {A}}(x)=1} if and only if K [ x ] ∈ A {\displaystyle K[x]\in {\mathcal {A}}} . In words, the function f A {\displaystyle f_{\mathcal {A}}} is 1 when given an authorized subset as input, and 0 otherwise. A basic result in the theory of secret sharing is that an access structure A {\displaystyle {\mathcal {A}}} can be realized in a secret sharing scheme if and only if f A {\displaystyle f_{\mathcal {A}}} is monotone. The size of a secret sharing scheme is defined as the total number of bits in the shares S i {\displaystyle S_{i}} . For monotone functions, there is an upper bound on the communication cost in CDS for any monotone function f {\displaystyle f} in terms of the size of any secret sharing scheme with access structure given by f {\displaystyle f} , C D S ϵ = 0 , δ = 0 ( f ) ≤ S h a r i n g S i z e ( f ) {\displaystyle CDS_{\epsilon =0,\delta =0}(f)\leq SharingSize(f)} For some concrete classes of secret sharing schemes, this relationship can be extended to general (non-monotone) Boolean functions. This leads to an upper bound on CDS cost in terms of the size of any span program that computes f {\displaystyle f} , C D S ϵ = 0 , δ = 0 ( f ) ≤ S P k ( f ) {\displaystyle CDS_{\epsilon =0,\delta =0}(f)\leq SP_{k}(f)} The class of problems with efficient (polynomial size) span program is the complexity class M o d k L {\displaystyle Mod_{k}L} , so problems in this class have efficient CDS protocols. === Sub-exponential upper bounds for all functions === Using a matching vector family based construction, it has been proven that ∀ f , C D S ϵ = 0 , δ = 0 ( f ) ≤ 2 O ( n log ⁡ n ) {\displaystyle \forall f,\,\,\,\,\,\,CDS_{\epsilon =0,\delta =0}(f)\leq 2^{O({\sqrt {n\log n}})}} . The technique for this proof is similar to one used to prove upper bounds on private information retrieval. This upper bound on CDS also leads to sub-exponential upper bounds on the size of a large class of secret sharing schemes. === Lower bounds from communication complexity === In a CDS protocol, the referee is given the inputs ( x , y ) {\displaystyle (x,y)} . This means it is not clear if the messages sent by Alice a

    Read more →
  • Tokenization (data security)

    Tokenization (data security)

    Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data. Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider. Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use. == Concepts and origins == The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia and scrip are terms synonymous with such tokens. In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection. In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: "The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility." To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents. == The tokenization process == The process of tokenization consists of the following steps: The application sends the tokenization data and authentication information to the tokenization system. It is stopped if authentication fails and the data is delivered to an event management system. As a result, administrators can discover problems and effectively manage the system. The system moves on to the next phase if authentication is successful. Using one-way cryptographic or random generation techniques, a token is generated and kept in a highly secure data vault. The new token is provided to the application for further use, replacing the sensitive data for processing and storage. Tokenization systems share several components according to established standards. Token generation is the process of producing a token using any means, such as one-way nonreversible cryptographic functions (e.g., a hash function with a strong, secret salt) or assignment via a randomly generated number. Random number generator (RNG) techniques are often the best choice for generating token values. Token mapping – this is the process of assigning the created token value to its original value. To enable permitted look-ups of the original value using the token as the index, a secure cross-reference database must be constructed. Token data store – this is a central repository for the token mapping process that holds the original sensitive values and their related token values. Sensitive data and token values must be securely kept in an encrypted format. Management of cryptographic keys. Strong key management procedures are required for sensitive data encryption on token data stores. == Difference from encryption == Tokenization and "classic" encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and the

    Read more →
  • Cryptosystem

    Cryptosystem

    In cryptography, a cryptosystem is a suite of cryptographic algorithms needed to implement a particular security service, such as confidentiality (encryption). Typically, a cryptosystem consists of three algorithms: one for key generation, one for encryption, and one for decryption. The term cipher (sometimes cypher) is often used to refer to a pair of algorithms, one for encryption and one for decryption. Therefore, the term cryptosystem is most often used when the key generation algorithm is important. For this reason, the term cryptosystem is commonly used to refer to public key techniques; however both "cipher" and "cryptosystem" are used for symmetric key techniques. == Formal definition == Mathematically, a cryptosystem or encryption scheme can be defined as a tuple ( P , C , K , E , D ) {\displaystyle ({\mathcal {P}},{\mathcal {C}},{\mathcal {K}},{\mathcal {E}},{\mathcal {D}})} with the following properties. P {\displaystyle {\mathcal {P}}} is a set called the "plaintext space". Its elements are called plaintexts. C {\displaystyle {\mathcal {C}}} is a set called the "ciphertext space". Its elements are called ciphertexts. K {\displaystyle {\mathcal {K}}} is a set called the "key space". Its elements are called keys. E = { E k : k ∈ K } {\displaystyle {\mathcal {E}}=\{E_{k}:k\in {\mathcal {K}}\}} is a set of functions E k : P → C {\displaystyle E_{k}:{\mathcal {P}}\rightarrow {\mathcal {C}}} . Its elements are called "encryption functions". D = { D k : k ∈ K } {\displaystyle {\mathcal {D}}=\{D_{k}:k\in {\mathcal {K}}\}} is a set of functions D k : C → P {\displaystyle D_{k}:{\mathcal {C}}\rightarrow {\mathcal {P}}} . Its elements are called "decryption functions". For each e ∈ K {\displaystyle e\in {\mathcal {K}}} , there is d ∈ K {\displaystyle d\in {\mathcal {K}}} such that D d ( E e ( p ) ) = p {\displaystyle D_{d}(E_{e}(p))=p} for all p ∈ P {\displaystyle p\in {\mathcal {P}}} . Note; typically this definition is modified in order to distinguish an encryption scheme as being either a symmetric-key or public-key type of cryptosystem. == Examples == A classical example of a cryptosystem is the Caesar cipher. A more contemporary example is the RSA cryptosystem. Another example of a cryptosystem is the Advanced Encryption Standard (AES). AES is a widely used symmetric encryption algorithm that has become the standard for securing data in various applications. Paillier cryptosystem is another example used to preserve and maintain privacy and sensitive information. It is featured in electronic voting, electronic lotteries and electronic auctions.

    Read more →
  • Data remanence

    Data remanence

    Data remanence is the residual representation of digital data that remains even after attempts have been made to remove or erase the data. This residue may result from data being left intact by a nominal file deletion operation, by reformatting of storage media that does not remove data previously written to the media, or through physical properties of the storage media that allow previously written data to be recovered. Data remanence may make inadvertent disclosure of sensitive information possible should the storage media be released into an uncontrolled environment (e.g., thrown in refuse containers or lost). Various techniques have been developed to counter data remanence. These techniques are classified as clearing, purging/sanitizing, or destruction. Specific methods include overwriting, degaussing, encryption, and media destruction. Effective application of countermeasures can be complicated by several factors, including media that are inaccessible, media that cannot effectively be erased, advanced storage systems that maintain histories of data throughout the data's life cycle, and persistence of data in memory that is typically considered volatile. Several standards exist for the secure removal of data and the elimination of data remanence. == Causes == Many operating systems, file managers, and other software provide a facility where a file is not immediately deleted when the user requests that action. Instead, the file is moved to a holding area (i.e. the "trash"), making it easy for the user to undo a mistake. Similarly, many software products automatically create backup copies of files that are being edited, to allow the user to restore the original version, or to recover from a possible crash (autosave feature). Even when an explicit deleted file retention facility is not provided or when the user does not use it, operating systems do not actually remove the contents of a file when it is deleted unless they are aware that explicit erasure commands are required, like on a solid-state drive. (In such cases, the operating system will issue the Serial ATA TRIM command or the SCSI UNMAP command to let the drive know to no longer maintain the deleted data.) Instead, they simply remove the file's entry from the file system directory because this requires less work and is therefore faster, and the contents of the file—the actual data—remain on the storage medium. The data will remain there until the operating system reuses the space for new data. In some systems, enough filesystem metadata are also left behind to enable easy undeletion by commonly available utility software. Even when undelete has become impossible, the data, until it has been overwritten, can be read by software that reads disk sectors directly. Computer forensics often employs such software. Likewise, reformatting, repartitioning, or reimaging a system is unlikely to write to every area of the disk, though all will cause the disk to appear empty or, in the case of reimaging, empty except for the files present in the image, to most software. Finally, even when the storage media is overwritten, physical properties of the media may permit recovery of the previous contents. In most cases however, this recovery is not possible by just reading from the storage device in the usual way, but requires using laboratory techniques such as disassembling the device and directly accessing/reading from its components. § Complications below gives further explanations for causes of data remanence. == Countermeasures == There are three levels commonly recognized for eliminating remnant data: === Clearing === Clearing is the removal of sensitive data from storage devices in such a way that there is assurance that the data may not be reconstructed using normal system functions or software file/data recovery utilities. The data may still be recoverable, but not without special laboratory techniques. Clearing is typically an administrative protection against accidental disclosure within an organization. For example, before a hard drive is re-used within an organization, its contents may be cleared to prevent their accidental disclosure to the next user. === Purging === Purging or sanitizing is the physical rewrite of sensitive data from a system or storage device done with the specific intent of rendering the data unrecoverable at a later time. Purging, proportional to the sensitivity of the data, is generally done before releasing media beyond control, such as before discarding old media, or moving media to a computer with different security requirements. === Destruction === The storage media is made unusable for conventional equipment. Effectiveness of destroying the media varies by medium and method. Depending on recording density of the media, and/or the destruction technique, this may leave data recoverable by laboratory methods. Conversely, destruction using appropriate techniques is the most secure method of preventing retrieval. == Specific methods == === Overwriting === A common method used to counter data remanence is to overwrite the storage media with new data. This is often called wiping or shredding a disk or file, by analogy to common methods of destroying print media, although the mechanism bears no similarity to these. Because such a method can often be implemented in software alone, and may be able to selectively target only part of the media, it is a popular, low-cost option for some applications. Overwriting is generally an acceptable method of clearing, as long as the media is writable and not damaged. The simplest overwrite technique writes the same data everywhere—often just a pattern of all zeros. At a minimum, this will prevent the data from being retrieved simply by reading from the media again using standard system functions. The UEFI in modern machines may offer an ATA class disk erase function as well. The ATA-6 standard governs secure erases specifications. Bitlocker is whole disk encryption and illegible without the key. Writing a fresh GPT allows a new file system to be established. Blocks will set empty but LBA read is illegible. New data will be unaffected and work fine. In an attempt to counter more advanced data recovery techniques, specific overwrite patterns and multiple passes have often been prescribed. These may be generic patterns intended to eradicate any trace signatures; an example is the seven-pass pattern 0xF6, 0x00, 0xFF, , 0x00, 0xFF, , sometimes erroneously attributed to US standard DOD 5220.22-M. One challenge with overwriting is that some areas of the disk may be inaccessible, due to media degradation or other errors. Software overwrite may also be problematic in high-security environments, which require stronger controls on data commingling than can be provided by the software in use. The use of advanced storage technologies may also make file-based overwrite ineffective (see the related discussion below under § Complications). There are specialized machines and software that are capable of doing overwriting. The software can sometimes be a standalone operating system specifically designed for data destruction. There are also machines specifically designed to wipe hard drives to the department of defense specifications DOD 5220.22-M. Writing zero to each block on hard disks and SSDs has the advantage of affording the firmware to deploy spare blocks when bad blocks are identified. Bitlocker has the advantage that data is illegible without the key. Seatools and other tools can erase disks with zero which is typical to revive old consumer class disks but they can wipe server disks albeit slowly. Modern 28TB and larger disks have an enormous number of LBA48 blocks. 40TB and 60TB disks will take proportionately longer times to wipe. ==== Feasibility of recovering overwritten data ==== Peter Gutmann investigated data recovery from nominally overwritten media in the mid-1990s. He suggested magnetic force microscopy may be able to recover such data, and developed specific patterns, for specific drive technologies, designed to counter such. These patterns have come to be known as the Gutmann method. Gutmann's belief in the possibility of data recovery is based on many questionable assumptions and factual errors that indicate a low level of understanding of how hard drives work. Daniel Feenberg, an economist at the private National Bureau of Economic Research, claims that the chances of overwritten data being recovered from a modern hard drive amount to "urban legend". He also points to the "18+1⁄2-minute gap" Rose Mary Woods created on a tape of Richard Nixon discussing the Watergate break-in. Erased information in the gap has not been recovered, and Feenberg claims doing so would be an easy task compared to recovery of a modern high density digital signal. As of November 2007, the United States Department of Defense considers overwriting acceptable for clearing magnetic media within the same security area/

    Read more →
  • Content management

    Content management

    Content management (CM) are a set of processes and technologies that support the collection, managing, and publishing of information in any form or medium. When stored and accessed via computers, this information may be more specifically referred to as digital content, or simply as content. Digital content may take the form of text (such as electronic documents), images, multimedia files (such as audio or video files), or any other file type that follows a content lifecycle requiring management. The process of content development and management is complex enough that various commercial software vendors (large and small), such as Interwoven and Microsoft, offer content management software to control and automate significant aspects of the content lifecycle. == Process == Content management practices and goals vary by mission and by organizational governance structure. News organizations, e-commerce websites, and educational institutions all use content management, but in different ways. This leads to differences in terminology and in the names and number of steps in the process. For example, some digital content is created by one or more authors. Over time that content may be edited. One or more individuals may provide some editorial oversight, approving the content for publication. Publishing may take many forms: it may be the act of "pushing" content out to others, or simply granting digital access rights to certain content to one or more individuals. Later that content may be superseded by another version of the content and thus retired or removed from use (as when this wiki page is modified). Content management is an inherently collaborative process. It often consists of the following basic roles and responsibilities: Creator – responsible for creating and editing content. Editor – responsible for tuning the content message and the style of delivery, including translation and localization. Publisher – responsible for releasing the content for use. Administrator – responsible for managing access permissions to folders, collections and files, usually accomplished by assigning access rights to user groups or roles. Admins may also assist and support users in various ways. Consumer, viewer or guest – the person who reads or otherwise consumes the content after it is published or shared. A critical aspect of content management is the ability to manage versions of content as it evolves (see also version control). Authors and editors often need to restore older versions of edited products due to a process failure or an undesirable series of edits. Time-sensitive content may also require updates as the subject matter evolves over time. Another equally important aspect of content management involves the creation, maintenance, and application of review standards. Each member of the content creation and review process has a unique role and set of responsibilities in the development or publication of the content. Each review team member requires clear and concise review standards. These must be maintained on an ongoing basis to ensure the long-term consistency and health of the knowledge base. A content management system is a set of automated processes that may support the following features: Import and creation of documents and multimedia material Identification of all key users and their roles The ability to assign roles and responsibilities to different instances of content categories or types Definition of workflow tasks often coupled with messaging so that content managers are alerted to changes in content The ability to track and manage multiple versions of a single instance of content The ability to publish the content to a repository to support access The ability to personalize content based on a set of rules Increasingly, the repository is an inherent part of the system, and incorporates enterprise search and retrieval. Content management systems take the following forms: Web content management system—software for web site management (often what content management implicitly means) Output of a newspaper editorial staff organization Workflow for article publication Document management systems Knowledge management software Single source content management system—content stored in chunks within a relational database Variant management system—where personnel tag source content (usually text and graphics) to represent variants stored as single source "master" content modules, resolved to the desired variant at publication (for example: automobile owners manual content for 12 model years stored as single master content files and "called" by model year as needed)—often used in concert with database chunk storage (see above) for large content objects == Governance structures == Content management expert Marc Feldman defines three primary content management governance structures: localized, centralized, and federated—each having its unique strengths and weaknesses. === Localized governance === By putting control in the hands of those closest to the content, the context experts, localized governance models empower and unleash creativity. These benefits come, however, at the cost of a partial-to-total loss of managerial control and oversight. === Centralized governance === When the levers of control are strongly centralized, content management systems are capable of delivering an exceptionally clear and unified brand message. Moreover, centralized content management governance structures allow for a large number of cost-savings opportunities in large enterprises, realized, for example, through (1) the avoidance of duplicated efforts in creating, editing, formatting, repurposing and archiving content; (2) process management and the streamlining of all content related labor; and/or (3) an orderly deployment or updating of the content management system. === Federated governance === Federated governance models potentially realize the benefits of both localized and centralized control while avoiding the weaknesses of both. While content management software systems are inherently structured to enable federated governance models, realizing these benefits can be difficult because it requires, for example, negotiating the boundaries of control with local managers and content creators. In the case of larger enterprises, in particular, the failure to fully implement or realize a federated governance structure equates to a failure to realize the full return on investment and cost savings that content management systems enable. == Implementation == Content management implementations must be able to manage content distributions and digital rights in content life cycle. Content management systems are usually involved with digital rights management in order to control user access and digital rights. In this step, the read-only structures of digital rights management systems force some limitations on content management, as they do not allow authors to change protected content in their life cycle. Creating new content using managed (protected) content is also an issue that gets protected contents out of management controlling systems. A few content management implementations cover all these issues.

    Read more →
  • Pamphlet war

    Pamphlet war

    A pamphlet war is a protracted argument or discussion through printed media, especially between the time the printing press became common, and when state intervention like copyright laws made such public discourse more difficult. The purpose was to defend or attack a certain perspective or idea. Pamphlet wars have occurred multiple times throughout history, as both social and political platforms. Pamphlet wars became viable platforms for this protracted discussion with the advent and spread of the printing press. Cheap printing presses, and increased literacy made the late 17th century a key stepping stone for the development of pamphlet wars, a period of prolific use of this type of debate. Over 2200 pamphlets were published between 1600–1715 alone. Pamphlet wars are generally credited for powering many key social changes of the era, including the Reformation and the Revolution Controversy, the English philosophical debate set off by the French Revolution. == History of the pamphlet in England == Throughout Europe in the 16th century, printed tracts were used to argue religious doctrine and foment support for religious causes. In England, Henry VIII used print literature to justify his break from the Catholic Church. During the subsequent reigns of Edward and Mary, print polemics escalated into propaganda warfare, as print media gained enormous potential to sway common opinion. By the 1560s, print was widely used to convey news. In 1562, the first pamphlets appeared, which discussed the English forces sent to aid the Protestant French Huguenots. In 1569, pamphlets reported the revolt of the Northern Earls and the subsequent Rebellion of the same year. In the 1580s, pamphlets began to replace broadsheet ballads as the means to convey information to the general public. Over the next century, the pamphlet became the principal means of garnering support for a cause or an idea, and was particularly influential during the English Civil Wars (1642-1651) and the Glorious Revolution of 1688. Through the ensuing decades, the pamphlet lost some popularity due to the emergence of newspapers and journals, but continued to be an important medium of public debate, as illustrated by the Revolution Controversy a full century later in the 1790s. == Pamphlet printing == Coming from a Latin word, "pamphlet" literally means "small book." In the early days of printing, the format of the book or pamphlet depended on the size of the paper used and the number of times it was folded. If a page was only folded once, it was called a folio. If it was folded twice, it was known as a quarto. An octave was a paper folded three times. A pamphlet was usually 1-12 sheets of paper folded in quarto, or 8-96 pages. It was sold for one or two pennies apiece. The printing of a pamphlet involved many people: the author, the printer, suppliers, print-makers, compositor, correctors, pressmen, binders, and distributors. Once the pamphleteer had written the pamphlet, it was sent to the printing house to be corrected, set into type, and printed. The papers were then given to the printer's warehouse-keeper, who bundled the copies and sent them to the bookseller, who was probably the one financing the printing. He was responsible to bind the pamphlets, usually by sewing them, and then sold them wholesale to individual bookselling vendors. The booksellers then sold them from a stall in the marketplace. == Pamphlet subjects == Pamphlets began as the means of conveyance for religious debates, and therefore religious topics were one of the main subjects they dealt with. The definition of a pamphlet came to mean a short work dealing with social, political, or religious issues. Typical topics included the Civil war, Church of England doctrines, Acts of Parliament, the Popish Plot (see below), the Stuart Era, and Cromwell propaganda. In addition, pamphlets were also used for romantic fiction, autobiography, scurrilous personal abuse, and social criticism. They contained much of the propaganda of the 17th century in the midst of the religious and political turmoil. They were also used for debates between the Puritans and the Anglican. During the Glorious Revolution, pamphlets were political weapons. == Authors == There were many authors of pamphlets. However some of the more popular authors include Daniel Defoe, Thomas Hobbes, Jonathan Swift, John Milton, and Samuel Pepys. Also included in the midst are Thomas Nashe, Joseph Addison, Richard Steele, and Matthew Prior. In 1591–1592, Robert Greene released a series of pamphlets which later inspired many other authors including Thomas Middleton and Thomas Dekker. == Critics == Pamphlets, along with their vast popularity, received criticism. There were many in the time period who believed that pamphlets were full of foolishness. They thought the pamphlets were not good enough literature and that they would turn people from "good" writing. They believed that pamphlets would be the end of the great volumes of literature and that great writing would be forgotten. == News reporting == Pamphlets made a great difference in the way news was reported to the general public. With the publication of pamphlets, it was no longer difficult for people to hear of events taking place far away. The closer the occurrence was to London, the easier and faster people heard of it. For example, the Battle of Edgehill took place on 23 October 1642. The first pamphlet reporting the incident was printed on 25 October 24 hours after some of the orders reported had been given. While not entirely accurate, and hurriedly made, the pamphlet nonetheless was able to tell the general public what had happened in the battle. A more accurate, specific, and readable account was available in a pamphlet printed on 26 October, and the "authorized" version was available only five days after the battle took place. == Marprelate pamphlets == In 1588, a series of pamphlets marked a turning point for the Puritans, dividing them from other Protestants in the country. The authors wrote under the pseudonym of Martin Marprelate and his two sons of the same name. The true identities of the authors were never discovered. The pamphlets aimed to provoke authorities to take action against censorship. The series was among the first to ask questions directly of its readers. == Early pamphlet wars == === Elizabethan pamphlet wars === As a means of forming or swaying public opinion, pamphlets like these had a part in influencing society, even as the content was itself influenced by society. During the 16th century and continuing for a short while in the early 17th century in England there was rise in the use of pamphlet wars to discuss a myriad of issues spanning from the civil war, to religious freedoms and the roles of women in society. The Queen herself participated in these discussions, making sure that she was widely read and understood by her people in order to gain favour and establish herself as the monarch despite being a woman. Examples of her use of this medium appear in To the Troops at Tilbury written in 1588, On Mary's Execution written in 1586, and many more. Another famous writer of this period to take advantage of the pamphlet was Emilia Lanier, famous for her arguments about the role of women. A common idea promoted by many literary works and the general attitude towards women, Lanier's work "Eve's Apology in Defence of Women" refuted the belief that Eve is responsible for the fall of man. A very uncommon and unpopular stance to take, Lanier accomplishes her defence through structuring it as an apology, one of the earliest subversive feminist texts. Similarly, Francis Bacon wrote his Essays to promote his idea of morality and other complicated social issues. For example, his work, "Of Love" examines the various understandings of the concept of love, particularly as it was perceived during the Elizabethan era. === Eikon Series === From 1649 until 1651, some five pamphlets were published in a debate about the execution of King Charles I of England (1600-1649). Prior to his execution, King Charles wrote the first pamphlet in the discussion, Eikon Basilike’’ (from the Greek “eikon” for image and “basileus” for king). The subtitle of this work - Portraiture of His Sacred Majesty in His Solitudes and Sufferings - indicates that Charles sought to portray himself as a martyr to the cause of regal prerogative. In the following months, several response pamphlets were published (collectively known as the "Eikon" series), including: Eikon Alethine, Eikon e Pistes, Eikonoklastes, and Eikon Aklastos,” alternately attacking or defending the king, his regicide, and his self-portrait in “Eikon Basilike.” == Popish Plot and Elizabeth Cellier == In the 1680s, after being acquitted of the "Meal-Tub Plot" for which she was accused, Elizabeth Cellier wrote Malice Defeated, which, along with The Matchless Picaro, sparked a pamphlet war surrounding debate of the ascension of a Catholic king to the thro

    Read more →
  • Data independence

    Data independence

    Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes made in the definition and organization of data. Application programs should not, ideally, be exposed to details of data representation and storage. The DBMS provides an abstract view of the data that hides such details. There are two types of data independence: physical and logical data independence. The data independence and operation independence together gives the feature of data abstraction. There are two levels of data independence. == Logical data independence == The logical structure of the data is known as the 'schema definition'. In general, if a user application operates on a subset of the attributes of a relation, it should not be affected later when new attributes are added to the same relation. Logical data independence indicates that the conceptual schema can be changed without affecting the existing schemas. == Physical data independence == The physical structure of the data is referred to as "physical data description". Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues since, conceptually, there is no difference in the operations carried out against the data. There are three types of data independence: Logical data independence: The ability to change the logical (conceptual) schema without changing the External schema (User View) is called logical data independence. For example, the addition or removal of new entities, attributes, or relationships to the conceptual schema or having to rewrite existing application programs. Physical data independence: The ability to change the physical schema without changing the logical schema is called physical data independence. For example, a change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas. View level data independence: always independent no effect, because there doesn't exist any other level above view level. == Data independence == Data independence can be explained as follows: Each higher level of the data architecture is immune to changes of the next lower level of the architecture. The logical scheme stays unchanged even though the storage space or type of some data is changed for reasons of optimization or reorganization. In this, external schema does not change. In this, internal schema changes may be required due to some physical schema were reorganized here. Physical data independence is present in most databases and file environment in which hardware storage of encoding, exact location of data on disk, merging of records, so on this are hidden from user. == Data independence types == The ability to modify schema definition in one level without affecting schema of that definition in the next higher level is called data independence. There are two levels of data independence, they are Physical data independence and Logical data independence. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means we change the physical storage/level without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to banking system). Logical Data independence means if we add some new columns or remove some columns from table then the user view and programs should not change. For example: consider two users A & B. Both are selecting the fields "EmployeeNumber" and "EmployeeName". If user B adds a new column (e.g. salary) to his table, it will not affect the external view for user A, though the internal schema of the database has been changed for both users A & B. Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.

    Read more →
  • The Cancer Imaging Archive

    The Cancer Imaging Archive

    The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. The site is funded by the National Cancer Institute's (NCI) Cancer Imaging Program, and the contract is operated by the University of Arkansas for Medical Sciences. Data within the archive is organized into collections which typically share a common cancer type and/or anatomical site. The majority of the data consists of CT, MRI, and nuclear medicine (e.g. PET) images stored in DICOM format, but many other types of supporting data are also provided or linked to, in order to enhance research utility. All data are de-identified in order to comply with the Health Insurance Portability and Accountability Act and National Institutes of Health data sharing policies. TCIA resources are intended to support: Development of computer aided diagnosis methods (quantitative imaging) Evaluation of unbiased science reproducibility by acceptable standard statistical methods Research on correlation of clinical diagnostic medical images with digital microscopic histological images Exploratory biomarker research for which imaging is a key element Collaboration between cross-disciplinary investigators where imaging is crucial to research on tumor heterogeneity, between patients and within the tumor; tissue temporal response tracking - objective measurements of tumor progression; imaging genomics and Big Data linkages and analysis (clinical, histo-pathology, genomics) TCIA is recognized as a recommended repository for the Scientific Data, PLOS One, and F1000Research journals. It is also listed in the Registry of Research Data Repositories. == History == Prior to the creation of TCIA, the NCI funded development of the National Biomedical Imaging Archive. NBIA is an open-source Web application which was designed to allow the storage and query of DICOM images. TCIA was subsequently initiated in December 2010 to expand data sharing activities by funding a service component which would help address the technical and policy challenges associated with medical imaging research. TCIA leverages open-source tools such as NBIA and Clinical Trials Processor in order to provide its services. == Organization of the archive == The site content is organized into five categories: About Us - Provides a general overview of the site the organizations responsible for operating it. Share Your Data - Provides an overview of how to apply to upload data to the archive. Access the Archive - Provides information about the available data, methods for accessing that data and system usage metrics. Research Activities - Provides information about major research initiatives being conducted using TCIA data as well as information about publication guidelines. Help - Provides information about how to get support using the archive as well as documentation and data usage policies. == Methods for accessing data == Most collections on the Cancer Imaging Archive can be accessed without an account, but a few are restricted to specific users and therefore require an account to access them. TCIA has several ways to browse, filter, and download data. They include: Downloading the entire contents of a collection in bulk Leveraging the NBIA application to filter or search within or across collections Utilizing the RESTful Application programming interface to filter or search within or across collections === Browsing, bulk downloading and access to supporting data === The home page includes a list of all available collections. Basic information about the data such as the cancer type, cancer location, modalities, and number of subjects are also provided. Clicking on a collection name presents a page which describes the data including its original research purpose, how the data were generated, and how it might be useful to other TCIA users. For example, doi:10.7937/K9/TCIA.2015.L4FRET6Z describes the NSCLC-Radiomics-Genomics Collection. In the lower section of the page there are links to search or download the images and any available supporting data in the Data Access tab. Additional tabs provide information about data versions and how to cite the data if used in publications. Many collections contain additional data types such as genomics, patient demographics, treatment details, and expert analyses of the images. This data is usually only found by browsing the collection pages as opposed to searching in NBIA or using the API. === Filtering or searching with NBIA === On each Collection page and also in the main menu of the site there are links to "Search TCIA". This will load the NBIA application which allows simple, advanced and free text searches. Search results follow the conventional DICOM hierarchy of patient -> study -> series. TCIA provides comprehensive documentation on the various features of the NBIA software. === RESTful API === A number of search and download commands are also available through the API. New iterations on the API are released as new versions, so that existing applications developed against older versions of the API continue to function. == Research activities == A list of known publications based on TCIA data is maintained as a convenience to researchers who might want to investigate how it has been used previously. In addition to peer-reviewed publications there are also several major research initiatives described in the Research Activities section of the site. === The CIP TCGA Radiology Initiative for Radiogenomics Research === A large number of collections contain subjects which were analyzed as part of the NIH/NHGRI database known as The Cancer Genome Atlas (TCGA). This offers researchers the ability to correlate clinical images using shared unique identifiers each study that has in TCGA extensive genomic analysis, digital pathology slides and bulk download of individual demographic data and clinical data. A multi-institutional network of investigators volunteering their time is using the data to develop methods to determine prognosis or predict the response to therapy. TCGA collections are designated by nomenclature shared by the TCGA Data Portal (e.g.: TCGA-BRCA, TCGA-GBM, etc). They are subject to a special publication policy which is unique from the other public data on TCIA. === Challenge competitions === TCIA also provides specific data sets used for "Challenge" competitions such as international digital image-focused professional societies like MICCAI, SPIE, or ISBI. A directory of previous and upcoming challenges is maintained on the site. === Digital object identifiers === To facilitate data sharing, many publications encourage authors to include data citations to the data that the authors used in creating the results described in their scholarly papers. In addition, new journals are now available for describing data collections outright (e.g., Nature Scientific Data). TCIA assigns digital object identifiers (DOIs) to all collections when they are submitted, and also has the ability to create persistent identifiers linked to subsets of data held within TCIA that authors may use for data citations in their scholarly papers.

    Read more →
  • Social news website

    Social news website

    A social news website is a website that features user-posted stories. Such stories are ranked based on popularity, as voted on by other users of the site or by website administrators. Users typically comment online on the news posts and these comments may also be ranked in popularity. Since their emergence with the birth of Web 2.0, social news sites have been used to link many types of information, including news, humor, support, and discussion. All such websites allow the users to submit content and each site differs in how the content is moderated. On the Slashdot and Fark websites, administrators decide which articles are selected for the front page. On Reddit and Digg, the articles that get the most votes from the community of users will make it to the front page. Many social news websites also feature an online comment system, where users discuss the issues raised in an article. Some of these sites have also applied their voting system to the comments, so that the most popular comments are displayed first. Some social news websites also have a social networking service, in that users can set up a user profile and follow other users' online activity on the website. Like many other Web 2.0 tools, social news websites use the collective intelligence of all of the users to operate. Social news websites also "impl[y] the technical, economic, legal, and human enhancement of a universally distributed intelligence that will unleash a positive dynamic of recognition and skills mobilization". Social news websites help participants to share a collective vision and awareness of how their actions are integrated with those of other individuals. Social news websites provide a new and innovative way to participate in a community that is constantly being flooded with new information. These social news websites "include opportunities for peer-to-peer learning, a changed attitude toward intellectual property, the diversification of cultural expression, the development of skills valued in the modern workplace, and a more empowered conception of citizenship". These websites can help to shape and reshape democratic opinions and perspectives. Social news sites may mitigate the gatekeeping of mainstream news sources and allow the public to decide what counts as "news", which may facilitate a more participatory culture. Social news sites may also support democratic participation by allowing users from across geographic and national boundaries to access the same information, respond to fellow users' views and beliefs, and create a virtual sphere for users to contribute within. == Websites == === Active === ==== Fark ==== Fark, which started in 1997, features news on any topic. On Fark, users can submit articles to the administrators of the site. Each day, these administrators pick out 50 articles to display on the front page. ==== Slashdot ==== Slashdot, started in 1997, was one of the first social news websites. It focuses mainly on science and technology-related news. Users can submit stories and the editors pick out the best stories each day for the front page. Users can then post comments on the stories. The influx of web traffic that resulted from Slashdot linking to external websites led to the effect being called the Slashdot effect ==== Digg ==== Digg, started in December 2004, introduced the voting system. This system allows users to "digg" or "bury" articles. "Digging" is the equivalent of voting positively, so that popular articles are displayed first. "Burying" does not lower an article's score. However, if an article is buried enough times, it will be automatically deleted from the site. Digg offers a social networking service, as members can follow other members and build personal profiles with information about their interests. ==== Reddit ==== Reddit, started in June 2005, is a social news website where users can submit articles and comments and vote on these submissions. The submissions are organized into categories called "subreddits". Unlike Digg, with Reddit, users can directly affect an article's score. An "upvote" will increase the score and a "downvote" will decrease it. Articles with the highest scores are displayed on the front page. There is also a page for "controversial" articles, that have an almost equal number of upvotes and downvotes. Free speech debates have arisen due to the shutting down of obscene or potentially illegal "subreddits" (including /r/jailbait, a collection of sexually suggestive underage pictures.) Reddit introduced a system of user-created communities called "subreddits", which are essentially categories for a specific type of news. Comments on the featured posts are shown in a hierarchical fashion also based on votes. Users have the ability to earn "karma" for their participation and time on the website. ==== Hacker News ==== Hacker News, started in February 2007, is a social news site focusing on computer science and entrepreneurship, created by Paul Graham and run by his startup incubator, Y Combinator. === Defunct === ==== Newsvine ==== Newsvine, started in March 2006, was a social news website mostly focused on politics, both international and domestic. The Newsvine home page allowed users to customize "seeds" and story feeds. Users received articles via "The Wire" from sources including The Associated Press or The Huffington Post, and from "The Vine" a stream of content from other Newsvine users. The "Top of the Vine" displayed the most voted and commented on articles of the day, week, month, or year. Additionally, Newsvine allowed members to create their own "Customizable Column", which could highlight a user's content posted, recent comments, and information about the specific Newsvine member. ==== feedalizr ==== feedalizr was a cross-platform, desktop social media aggregator built using Adobe Integrated Runtime that consolidates the updates from social media and social networking websites. Users can then use this application to update those sites from their desktop and view a consolidated stream of information. ==== Voat ==== Voat, launched in April 2014 and discontinued in December of 2020, was also a social news website and is very similar to Reddit visually and functionally. The site's userbase included a large number of alt right users, many of whom migrated to Voat after being banned on Reddit. ==== Prismatic ==== Prismatic combined machine learning, user experience design, and interaction design to create a new way to discover, consume, and share media. Prismatic software used social network aggregation and machine learning algorithms to filter the content that aligns with the interests of a specific user. Prismatic integrated with Facebook, Twitter, and Pocket to gather information about user's interests and suggest the most relevant stories to read. ==== Artifact ==== Artifact was an iOS and Android app that used machine learning to personalize news recommendations to readers, and also had social features such as liking articles, commenting, and reputation scores for users.

    Read more →
  • Data recovery

    Data recovery

    In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or overwritten data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS). Logical failures occur when the hard drive devices are functional but the user or automated-OS cannot retrieve or access data stored on them. Logical failures can occur due to corruption of the engineering chip, lost partitions, firmware failure, or failures during formatting/re-installation. Data recovery can be a very simple or technical challenge. This is why there are specific software companies specialized in this field that help to get back data on your system. == About == The most common data recovery scenarios involve an operating system failure, malfunction of a storage device, logical failure of storage devices, accidental damage or deletion, etc. (typically, on a single-drive, single-partition, single-OS system), in which case the ultimate goal is simply to copy all important files from the damaged media to another new drive. This can be accomplished using a Live CD, or DVD by booting directly from a ROM or a USB drive instead of the corrupted drive in question. Many Live CDs or DVDs provide a means to mount the system drive and backup drives or removable media, and to move the files from the system drive to the backup media with a file manager or optical disc authoring software. Such cases can often be mitigated by disk partitioning and consistently storing valuable data files (or copies of them) on a different partition from the replaceable OS system files. Another scenario involves a drive-level failure, such as a compromised file system or drive partition, or a hard disk drive failure. In any of these cases, the data is not easily read from the media devices. Depending on the situation, solutions involve repairing the logical file system, partition table, or master boot record, or updating the firmware or drive recovery techniques ranging from software-based recovery of corrupted data, to hardware- and software-based recovery of damaged service areas (also known as the hard disk drive's "firmware"), to hardware replacement on a physically damaged drive which allows for the extraction of data to a new drive. If a drive recovery is necessary, the drive itself has typically failed permanently, and the focus is rather on a one-time recovery, salvaging whatever data can be read. In a third scenario, files have been accidentally "deleted" from a storage medium by the users. Typically, the contents of deleted files are not removed immediately from the physical drive; instead, references to them in the directory structure are removed, and thereafter space the deleted data occupy is made available for later data overwriting. In the mind of end users, deleted files cannot be discoverable through a standard file manager, but the deleted data still technically exists on the physical drive. In the meantime, the original file contents remain, often several disconnected fragments, and may be recoverable if not overwritten by other data files. The term "data recovery" is also used in the context of forensic applications or espionage, where data which have been encrypted, hidden, or deleted, rather than damaged, are recovered. Sometimes data present in the computer gets encrypted or hidden due to reasons like virus attacks which can only be recovered by some computer forensic experts. == Physical damage == A wide variety of failures can cause physical damage to storage media, which may result from human errors and natural disasters. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer from a multitude of mechanical failures, such as head crashes, PCB failure, and failed motors; tapes can simply break. Physical damage to a hard drive, even in cases where a head crash has occurred, does not necessarily mean permanent data loss. However, in extreme cases, such as prolonged exposure to moisture and corrosion —like the lost Bitcoin hard drive of James Howells, buried in the Newport landfill for over a decade — recovery is usually impossible. In rare cases, forensic techniques such as magnetic force microscopy (MFM) have been explored to detect residual magnetic traces when data holds exceptional value. Other techniques employed by many professional data recovery companies can typically salvage most, if not all, of the data that had been lost when the failure occurred. Of course, there are exceptions to this, such as cases where severe damage to the hard drive platters may have occurred. However, if the hard drive can be repaired and a full image or clone created, then the logical file structure can be rebuilt in most instances. Most physical damage cannot be repaired by end users. For example, opening a hard disk drive in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head. During normal operation, read/write heads float 3 to 6 nanometers above the platter surface, and the average dust particles found in a normal environment are typically around 30,000 nanometers in diameter. When these dust particles get caught between the read/write heads and the platter, they can cause new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, data recovery companies are often employed to salvage important data with the more reputable ones using class 100 dust- and static-free cleanrooms. === Recovery techniques === Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, the image can be safely analyzed for logical damage and will possibly allow much of the original file system to be reconstructed. ==== Hardware repair ==== A common misconception is that a damaged printed circuit board (PCB) may be simply replaced during recovery procedures by an identical PCB from a healthy drive. While this may work in rare circumstances on hard disk drives manufactured before 2003, it will not work on newer drives. Electronics boards of modern drives usually contain drive-specific adaptation data (generally a map of bad sectors and tuning parameters) and other information required to properly access data on the drive. Replacement boards often need this information to effectively recover all of the data. The replacement board may need to be reprogrammed. Some manufacturers (Seagate, for example) store this information on a serial EEPROM chip, which can be removed and transferred to the replacement board. Each hard disk drive has what is called a system area or service area; this portion of the drive, which is not directly accessible to the end user, usually contains drive's firmware and adaptive data that helps the drive operate within normal parameters. One function of the system area is to log defective sectors within the drive; essentially telling the drive where it can and cannot write data. The sector lists are also stored on various chips attached to the PCB, and they are unique to each hard disk drive. If the data on the PCB do not match what is stored on the platter, then the drive will not calibrate properly. In most cases the drive heads will click because they are unable to find the data matching what is stored on the PCB. == Logical damage == The term "logical damage" refers to situations in which the error is not a problem in the hardware and requires software-level solutions. === Corrupt partitions and file systems, media errors === In some cases, data on a hard disk drive can be unreadable due to damage to the partition table or file system, or to (intermittent) media errors. In the majority of these cases, at least a portion of the original data can be recovered by repairing the damaged partition table or file system using specialized data recovery software such as TestDisk; software like ddrescue can image media despite intermittent errors, and image raw data when there is partition table or file system damage. This type of data recovery can be performed by people without expertise in drive hardware as it requires no special physica

    Read more →