AI Content Repurposing Service

AI Content Repurposing Service — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Deep Zoom

    Deep Zoom

    Deep Zoom is a technology developed by Microsoft for efficiently transmitting and viewing images. It allows users to pan around and zoom in on a large, high resolution image or a large collection of images. It reduces the time required for initial load by downloading only the region being viewed or only at the resolution it is displayed at. Subsequent regions are downloaded as the user pans to (or zooms into) them; animations are used to hide any jerkiness in the transition. The libraries are also available in other platforms including Java and Flash. == History == The Deep Zoom file format is very similar to the Google Maps image format where images are broken into tiles and then displayed as required. The tiling typically follows a quadtree pattern of increasing resolution of image (in other words twice the zoom and twice the resolution). The main difference is that with Google Maps the actual details on the image change from one zoom level to another, while with Deep Zoom the same image is displayed at each zoom level. Seadragon Software, formerly Sand Codex, first created the Seadragon technology and its implementation of what is now called Deep Zoom. This technology was then absorbed into the Microsoft Live Labs when Seadragon Software was acquired. Engineers from Seadragon now work with Microsoft to integrate their work into technology such as Silverlight and Photosynth. == Deep Zoom examples == The most famous implementation of Deep Zoom was probably the first: the memorabilia collection at the Hard Rock website. Conceived and designed by Duncan/Channon and built by Vertigo, it was demonstrated for the first time in March 2008 at the Microsoft MIX convention in Las Vegas. In 2010, Microsoft Live Labs partnered with the University of California, Berkeley to create ChronoZoom, a DeepZoom-powered time visualization tool that pushed the limits of DeepZoom, since it required zooming from the scale of 13 billion years down to a single day. The project has since graduated to development under Microsoft Research. Another example is the Deep Earth project. It is described by its creators as "a community project focused on creating a rich interactive mapping control using Silverlight2 Deep Zoom. Concentrating on Microsoft Virtual Earth imagery and data the project offers team members the opportunity to learn and share while creating something cool and useful." A paintings collection project http://galleryzoom.co.uk/ shows 1000 high resolution/sensor images individually indexed. (Using Deep Zoom Composer). Blaise Aguera y Arcas gave a demonstration of Seadragon and Photosynth at the 2007 TED conference. In November 2009, 352 Media Group, a Silverlight developer in the Microsoft Silverlight Partner Program, created an example of Deep Zoom using Microsoft Silverlight version 3. It is online at 352 Media Group's Web site. The Winston Churchill Deep Zoom Archived 2010-07-04 at the Wayback Machine mosaic, created by Silverlight developers Shoothill, features as both an online interactive deep zoom and a standalone deep zoom which forms part of the Churchill exhibit in the Churchill War Rooms in Whitehall. In 2010, Shoothill built the Sumatran Tiger Deep Zoom - the largest seen to date - for worldwide conservation charity Fauna and Flora International, featuring thousands of images of endangered species. An early example of Deep Zoom-like technology was implemented at The Department of Maori Affairs in New Zealand in 1997. The technology was used to display Maori land ownership. == Deep Zoom images == The file format used by Deep Zoom (as well as Photosynth and Seadragon Ajax) is XML based. Users can specify a single large image (dzi) or a collection of images (dzc). It also allows for "Sparse Images"; where some parts of the image have greater resolution than others, an example of which can be found on the Seadragon Ajax home page; The bike image displayed is a sparse image. Though used in the proprietary Deep Zoom, the dzi format is open and able to be used by anyone. === Deep Zoom image (dzi) === A DZI has two parts: a DZI file (with either a .dzi or .xml extension) and a subdirectory of image folders. Each folder in the image subdirectory is labeled with its level of resolution. Higher numbers correspond to a higher resolution level; inside each folder are the image tiles corresponding to that level of resolution, numbered consecutively in columns from top left to bottom right. === Deep Zoom collection (dzc) === A DZC is a collection of some number of DZIs linked and referenced by a DZC file (with either a .dzc or .xml extension). At a high level, a collection is a number of image thumbnails whose location is kept track of by the .dzc/.xml file, when zooming into an image, it accesses greater resolutions tiles. A DZC's structure is similar to that of a DZI; the .dzc/.xml file defines the collection and the subdirectory of folders maps to the DZI file structure, each with their set of .dzi/.xml and image tiles. The DZC is used in Microsoft's Pivot, but not in SeaDragon per se. === Sparse Images === Sparse images are a sub-classification of the DZI file type. A sparse image is normally a number of separate photographs with varying resolution levels that have been placed in a single DZI instead of a DZC. Sparse images have no different file structure than that of a DZI and differ only in that there is not a single "highest resolution" level for the entire DZI. == Software that uses Deep Zoom == Image Composite Editor - image stitching tool created by Microsoft Research Deep Zoom Composer - collage maker and simple panorama tool created by Microsoft. Images' resolution is maintained when exporting for web use (via Silverlight Deep Zoom or JavaScript using a third-party template). No longer available for download from Microsoft though it can be found on various other sources such as Internet Archive. == iPhone OS development == Microsoft Live Labs has created an application for the App Store called Seadragon Mobile. It is run over the internet and includes Deep Zoom on the following categories; art, history, maps, photos, Photosynth which anybody can upload to, space and technology & web.

    Read more →
  • List of network buses

    List of network buses

    List of electrical characteristics of single collision domain segment "slow speed" network buses: The number of nodes can be limited by either number of available addresses or bus capacitance. None of the above use any analog domain modulation techniques like MLT-3 encoding, PAM-5 etc. PSI5 designed with automation applications in mind is a bit unusual in that it uses Manchester code.

    Read more →
  • Data steward

    Data steward

    A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for. Data stewards have a specialist role that utilizes an organization's data governance processes, policies, guidelines and responsibilities for administering an organizations' entire data in compliance with policy and/or regulatory obligations (e.g., GDPR, HIPAA). The overall objective of a data steward is the data quality of the data assets, datasets, data records and data elements. This includes documenting metainformation for the data, such as definitions, related rules/governance, physical manifestation, and related data models (most of these properties being specific to an attribute/concept relationship), identifying owners/custodian's various responsibilities, relations insight pertaining to attribute quality, aiding with project requirement data facilitation and documentation of capture rules. Data stewards begin the stewarding process with the identification of the data assets and elements which they will steward, with the ultimate result being standards, controls and data entry. The steward works closely with business glossary standards analysts (for standards), with data architect/modelers (for standards), with DQ analysts (for controls) and with operations team members (good-quality data going in per business rules) while entering data. Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources. Master data management often makes references to the need for data stewardship for its implementation to succeed. Data stewardship must have precise purpose, fit for purpose or fitness. == Data steward responsibilities == A data steward ensures that each assigned data element: Has clear and unambiguous data element definition Does not conflict with other data elements in the metadata registry (removes duplicates, overlap etc.) Has clear enumerated value definitions if it is of type Code Is still being used (remove unused data elements) Is being used consistently in various computer systems Is being used, fit for purpose = Data Fitness Has adequate documentation on appropriate usage and notes Documents the origin and sources of authority on each metadata element Is protected against unauthorised access or change Responsibilities of data stewards vary between different organisations and institutions. For example, at Delft University of Technology, data stewards are perceived as the first contact point for any questions related to research data. They also have subject-specific background allowing them to easily connect with researchers and to contextualise data management problems to take into account disciplinary practices. == Types of data stewards == Depending on the set of data stewardship responsibilities assigned to an individual, there are 4 types (or dimensions of responsibility) of data stewards typically found within an organization: Data object data steward - responsible for managing reference data and attributes of one business data entity Business data steward - responsible for managing critical data, both reference and transactional, created or used by one business function. The data steward may also serve as a liaison between the organization's data users and technical teams, helping to bridge the gap between business needs and technical requirements. They may also play a role in educating others within the organization about best practices for data management, and advocating for data-driven decision-making. Process data steward - responsible for managing data across one business process System data steward - responsible for managing data for at least one IT system == Benefits of data stewardship == Systematic data stewardship can foster: Faster analysis Consistent use of data management resources Easy mapping of data between computer systems and exchange documents Lower costs associated with migration to (for example) service-oriented architecture (SOA) Mitigation of data risk Better control of dangers associated with privacy, legal, errors, etc. Assignment of each data element to a person sometimes seems like an unimportant process. But multiple groups have found that users have greater trust and usage rates in systems where they can contact a person with questions on each data element. == Examples == Delft University of Technology (TU Delft) offers an example of data stewardship implementation at a research institution. In 2017 the Data Stewardship Project was initiated at TU Delft to address research data management needs in a disciplinary manner across the whole campus. Dedicated data stewards with subject-specific background were appointed at every TU Delft faculty to support researchers with data management questions and to act as a linking point with the other institutional support services. The project is coordinated centrally by TU Delft Library, and it has its own website, blog and a YouTube channel. The [1]EPA metadata registry furnishes an example of data stewardship. Note that each data element therein has a "POC" (point of contact). In 2023, ETH Zurich launched the Data Stewardship Network (DSN) to facilitate collaboration among employees engaged in data management, analysis, and code development across research groups. The DSN serves as a platform for networking and knowledge exchange, aiming to professionalize the role of data stewards who support research data management and reproducible workflows. Established by the team for Research Data Management and Digital Curation at the ETH Library, the DSN collaborates with Scientific IT Services to provide expertise in areas such as storage infrastructure and reproducible workflows. == Data stewardship applications == Information stewardship applications are business solutions used by business users acting in the role of information steward (interpreting and enforcing information governance policy, for example). These developing solutions represent, for the most part, an amalgam of a number of disparate, previously IT-centric tools already on the market, but are organized and presented in such a way that information stewards (a business role) can support the work of information policy enforcement as part of their normal, business-centric, day-to-day work in a range of use cases. The initial push for the formation of this new category of packaged software came from operational use cases — that is, use of business data in and between transactional and operational business applications. This is where most of the master data management efforts are undertaken in organizations. However, there is also now a faster-growing interest in the new data lake arena for more analytical use cases.

    Read more →
  • Plaintext

    Plaintext

    In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. == Overview == With the advent of computing, the term plaintext expanded beyond human-readable documents to mean any data, including binary files, in a form that can be viewed or used without requiring a key or other decryption device. Information—a message, document, file, etc.—if to be communicated or stored in an unencrypted form is referred to as plaintext. Plaintext is used as input to an encryption algorithm; the output is usually termed ciphertext, particularly when the algorithm is a cipher. Codetext is less often used, and almost always only when the algorithm involved is actually a code. Some systems use multiple layers of encryption, with the output of one encryption algorithm becoming "plaintext" input for the next. == Secure handling == Insecure handling of plaintext can introduce weaknesses into a cryptosystem by letting an attacker bypass the cryptography altogether. Plaintext is vulnerable in use and in storage, whether in electronic or paper format. Physical security means the securing of information and its storage media from physical, attack—for instance by someone entering a building to access papers, storage media, or computers. Discarded material, if not disposed of securely, may be a security risk. Even shredded documents and erased magnetic media might be reconstructed with sufficient effort. If plaintext is stored in a computer file, the storage media, the computer and its components, and all backups must be secure. Sensitive data is sometimes processed on computers whose mass storage is removable, in which case physical security of the removed disk is vital. In the case of securing a computer, useful (as opposed to handwaving) security must be physical (e.g., against burglary, brazen removal under cover of supposed repair, installation of covert monitoring devices, etc.), as well as virtual (e.g., operating system modification, illicit network access, Trojan programs). Wide availability of keydrives, which can plug into most modern computers and store large quantities of data, poses another severe security headache. A spy (perhaps posing as a cleaning person) could easily conceal one, and even swallow it if necessary. Discarded computers, disk drives and media are also a potential source of plaintexts. Most operating systems do not actually erase anything— they simply mark the disk space occupied by a deleted file as 'available for use', and remove its entry from the file system directory. The information in a file deleted in this way remains fully present until overwritten at some later time when the operating system reuses the disk space. With even low-end computers commonly sold with many gigabytes of disk space and rising monthly, this 'later time' may be months later, or never. Even overwriting the portion of a disk surface occupied by a deleted file is insufficient in many cases. Peter Gutmann of the University of Auckland wrote a celebrated 1996 paper on the recovery of overwritten information from magnetic disks; areal storage densities have gotten much higher since then, so this sort of recovery is likely to be more difficult than it was when Gutmann wrote. Modern hard drives automatically remap failing sectors, moving data to good sectors. This process makes information on those failing, excluded sectors invisible to the file system and normal applications. Special software, however, can still extract information from them. Some government agencies (e.g., US NSA) require that personnel physically pulverize discarded disk drives and, in some cases, treat them with chemical corrosives. This practice is not widespread outside government, however. Garfinkel and Shelat (2003) analyzed 158 second-hand hard drives they acquired at garage sales and the like, and found that less than 10% had been sufficiently sanitized. The others contained a wide variety of readable personal and confidential information. See data remanence. Physical loss is a serious problem. The US State Department, Department of Defense, and the British Secret Service have all had laptops with secret information, including in plaintext, lost or stolen. Appropriate disk encryption techniques can safeguard data on misappropriated computers or media. On occasion, even when data on host systems is encrypted, media that personnel use to transfer data between systems is plaintext because of poorly designed data policy. For example, in October 2007, HM Revenue and Customs lost CDs that contained the unencrypted records of 25 million child benefit recipients in the United Kingdom. Modern cryptographic systems resist known plaintext or even chosen plaintext attacks, and so may not be entirely compromised when plaintext is lost or stolen. Older systems resisted the effects of plaintext data loss on security with less effective techniques—such as padding and Russian copulation to obscure information in plaintext that could be easily guessed.

    Read more →
  • Deep learning in photoacoustic imaging

    Deep learning in photoacoustic imaging

    Photoacoustic imaging (PA) is based on the photoacoustic effect, in which optical absorption causes a rise in temperature, which causes a subsequent rise in pressure via thermo-elastic expansion. This pressure rise propagates through the tissue and is sensed via ultrasonic transducers. Due to the proportionality between the optical absorption, the rise in temperature, and the rise in pressure, the ultrasound pressure wave signal can be used to quantify the original optical energy deposition within the tissue. Photoacoustic imaging has applications of deep learning in both photoacoustic computed tomography (PACT) and photoacoustic microscopy (PAM). PACT utilizes wide-field optical excitation and an array of unfocused ultrasound transducers. Similar to other computed tomography methods, the sample is imaged at multiple view angles, which are then used to perform an inverse reconstruction algorithm based on the detection geometry (typically through universal backprojection, modified delay-and-sum, or time reversal ) to elicit the initial pressure distribution within the tissue. PAM on the other hand uses focused ultrasound detection combined with weakly focused optical excitation (acoustic resolution PAM or AR-PAM) or tightly focused optical excitation (optical resolution PAM or OR-PAM). PAM typically captures images point-by-point via a mechanical raster scanning pattern. At each scanned point, the acoustic time-of-flight provides axial resolution while the acoustic focusing yields lateral resolution. == Applications of deep learning in PACT == The first application of deep learning in PACT was by Reiter et al. in which a deep neural network was trained to learn spatial impulse responses and locate photoacoustic point sources. The resulting mean axial and lateral point location errors on 2,412 of their randomly selected test images were 0.28 mm and 0.37 mm respectively. After this initial implementation, the applications of deep learning in PACT have branched out primarily into removing artifacts from acoustic reflections, sparse sampling, limited-view, and limited-bandwidth. There has also been some recent work in PACT toward using deep learning for wavefront localization. There have been networks based on fusion of information from two different reconstructions to improve the reconstruction using deep learning fusion based networks. === Using deep learning to locate photoacoustic point sources === Traditional photoacoustic beamforming techniques modeled photoacoustic wave propagation by using detector array geometry and the time-of-flight to account for differences in the PA signal arrival time. However, this technique failed to account for reverberant acoustic signals caused by acoustic reflection, resulting in acoustic reflection artifacts that corrupt the true photoacoustic point source location information. In Reiter et al., a convolutional neural network (similar to a simple VGG-16 style architecture) was used that took pre-beamformed photoacoustic data as input and outputted a classification result specifying the 2-D point source location. ==== Deep learning for PA wavefront localization ==== Johnstonbaugh et al. was able to localize the source of photoacoustic wavefronts with a deep neural network. The network used was an encoder-decoder style convolutional neural network. The encoder-decoder network was made of residual convolution, upsampling, and high field-of-view convolution modules. A Nyquist convolution layer and differentiable spatial-to-numerical transform layer were also used within the architecture. Simulated PA wavefronts served as the input for training the model. To create the wavefronts, the forward simulation of light propagation was done with the NIRFast toolbox and the light-diffusion approximation, while the forward simulation of sound propagation was done with the K-Wave toolbox. The simulated wavefronts were subjected to different scattering mediums and Gaussian noise. The output for the network was an artifact free heat map of the targets axial and lateral position. The network had a mean error rate of less than 30 microns when localizing target below 40 mm and had a mean error rate of 1.06 mm for localizing targets between 40 mm and 60 mm. With a slight modification to the network, the model was able to accommodate multi target localization. A validation experiment was performed in which pencil lead was submerged into an intralipid solution at a depth of 32 mm. The network was able to localize the lead's position when the solution had a reduced scattering coefficient of 0, 5, 10, and 15 cm−1. The results of the network show improvements over standard delay-and-sum or frequency-domain beamforming algorithms and Johnstonbaugh proposes that this technology could be used for optical wavefront shaping, circulating melanoma cell detection, and real-time vascular surgeries. === Removing acoustic reflection artifacts (in the presence of multiple sources and channel noise) === Building on the work of Reiter et al., Allman et al. utilized a full VGG-16 architecture to locate point sources and remove reflection artifacts within raw photoacoustic channel data (in the presence of multiple sources and channel noise). This utilization of deep learning trained on simulated data produced in the MATLAB k-wave library, and then later reaffirmed their results on experimental data. === Ill-posed PACT reconstruction === In PACT, tomographic reconstruction is performed, in which the projections from multiple solid angles are combined to form an image. When reconstruction methods like filtered backprojection or time reversal, are ill-posed inverse problems due to sampling under the Nyquist-Shannon's sampling requirement or with limited-bandwidth/view, the resulting reconstruction contains image artifacts. Traditionally these artifacts were removed with slow iterative methods like total variation minimization, but the advent of deep learning approaches has opened a new avenue that utilizes a priori knowledge from network training to remove artifacts. In the deep learning methods that seek to remove these sparse sampling, limited-bandwidth, and limited-view artifacts, the typical workflow involves first performing the ill-posed reconstruction technique to transform the pre-beamformed data into a 2-D representation of the initial pressure distribution that contains artifacts. Then, a convolutional neural network (CNN) is trained to remove the artifacts, in order to produce an artifact-free representation of the ground truth initial pressure distribution. ==== Using deep learning to remove sparse sampling artifacts ==== When the density of uniform tomographic view angles is under what is prescribed by the Nyquist-Shannon's sampling theorem, it is said that the imaging system is performing sparse sampling. Sparse sampling typically occurs as a way of keeping production costs low and improving image acquisition speed. The typical network architectures used to remove these sparse sampling artifacts are U-net and Fully Dense (FD) U-net. Both of these architectures contain a compression and decompression phase. The compression phase learns to compress the image to a latent representation that lacks the imaging artifacts and other details. The decompression phase then combines with information passed by the residual connections in order to add back image details without adding in the details associated with the artifacts. FD U-net modifies the original U-net architecture by including dense blocks that allow layers to utilize information learned by previous layers within the dense block. Another technique was proposed using a simple CNN based architecture for removal of artifacts and improving the k-wave image reconstruction. ==== Removing limited-view artifacts with deep learning ==== When a region of partial solid angles are not captured, generally due to geometric limitations, the image acquisition is said to have limited-view. As illustrated by the experiments of Davoudi et al., limited-view corruptions can be directly observed as missing information in the frequency domain of the reconstructed image. Limited-view, similar to sparse sampling, makes the initial reconstruction algorithm ill-posed. Prior to deep learning, the limited-view problem was addressed with complex hardware such as acoustic deflectors and full ring-shaped transducer arrays, as well as solutions like compressed sensing, weighted factor, and iterative filtered backprojection. The result of this ill-posed reconstruction is imaging artifacts that can be removed by CNNs. The deep learning algorithms used to remove limited-view artifacts include U-net and FD U-net, as well as generative adversarial networks (GANs) and volumetric versions of U-net. One GAN implementation of note improved upon U-net by using U-net as a generator and VGG as a discriminator, with the Wasserstein metric and gradient penalty to stabilize training (WGAN-GP). ==== Pixel-wise interpolation

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →
  • Social computing

    Social computing

    Social computing is an area of computer science that is concerned with the intersection of social behavior and computational systems. It is based on creating or fostering existing social conventions and social contexts through the use of software and technology. Blogs, email, instant messaging, social network services, wikis, social bookmarking and other instances of what is often called social software illustrate ideas from social computing. The rise in social computing is attributed to the prevalence of personal devices and increased overall computing power. This enables a growing number of users to participate in sharing content and interact with another. == Definitions == Humans—and human behavior—are profoundly social. Humans tend to orient to one another and develop abilities to interact with each other and other species. This ranges from expression and gesture through spoken, written, and body language. Humans are influenced by the behavior of those around them and can rely on social context and cues to make decisions. An example of a behavior relying on social contexts is applauding at the end of the play. This is based on the context that the show ended, and other audience members are applauding. Social information provides a basis for inferences, planning, and coordinating activity. == Examples == Common tools include blogs, email, instant messaging, social networking sites, wikis, and social bookmarking platforms. These technologies enable users to generate content, share knowledge, and interact in real time. == Applications == The rise of social computing has highlighted opportunities for businesses. Businesses are interacting on social computing platforms and investing in facilities to support and research social computing.Business models can leverage the massive customer bases that accumulate through social computing channels. Some organizations have started their own blogs and networks (McAfee, 2006, Joe, 2005). Organizations from diverse industry sectors such as Google, Cisco, and Fox, have sought to acquire or invest in successful social computing enterprises. A business blog can serve as a source of information and promotion for the company. This allows the company to share content about the company and their initiatives. Businesses have also interacted with social computing to market themselves and interact with customers. A notable example is Wendy's with their X (formerly Twitter) account. The account was primarily used to promote business promotions and interact with users in a playful or meaningful way. E-commerce web sites have allowed users to leave reviews and feedback on purchases which has improved online shopping experience for sellers and consumers.As another example of social computing’s business applications, many e-commerce Web sites have adopted online product/vendor feedback/reputation systems. Such systems provide an asynchronous platform for the consumer community to share experiences collectively and influence their purchasing behavior. They also provide a vehicle for eliciting feedback information valuable to the vendors and e-commerce site operators.Consumers can use the feedback systems to make a more educated choice on a purchase by comparing reviews between products or vendors. Sellers can track consumer behaviors and trends regarding a product and adjust their supply according to the demand. == Challenges and criticism == Social computing raises several concerns related to privacy, data security, and algorithmic bias. The widespread collection and analysis of user-generated data can lead to ethical dilemmas, especially when users are unaware of how their information is used. Critics also highlight issues of digital labor, surveillance, and the spread of misinformation, which can influence public opinion and social dynamics. === Term appearance === The term appeared in the mid 1990s after technology advancements and development of the web. In 1994, the concept of social computing was first proposed by Schuler. He thought, "Social computing is a computing application, with software as the medium or focus of social relationships." === Premise === The premise of social computing is that it is possible to design digital systems that support useful functionality by making socially produced information available to their users. This information may be provided directly, as when systems show the number of users who have rated a review as helpful or not. Or the information may be provided after being filtered and aggregated, as is done when systems recommend a product based on what else people with similar purchase history have purchased. Alternatively, the information may be provided indirectly, as is the case with Google's page rank algorithms which orders search results based on the number of pages that (recursively) point to them. In all of these cases, information that is produced by a group of people is used to provide or enhance the functioning of a system. Social computing is concerned with systems of this sort and the mechanisms and principles that underlie them. Social computing can be defined as follows: "Social Computing" refers to systems that support the gathering, representation, processing, use, and dissemination of information that is distributed across social collectivities such as teams, communities, organizations, and markets. Moreover, the information is not "anonymous" but is significantly precise because it is linked to people, who are in turn linked to other people. More recent definitions, however, have foregone the restrictions regarding anonymity of information, acknowledging the continued spread and increasing pervasiveness of social computing. As an example, Hemmatazad, N. (2014) defined social computing as "the use of computational devices to facilitate or augment the social interactions of their users, or to evaluate those interactions in an effort to obtain new information." Social computing has to do with supporting "computations" that are carried out by groups of people, an idea that has been popularized in James Surowiecki's book, The Wisdom of Crowds. Examples of social computing in this sense include collaborative filtering, online auctions, reputation systems, computational social choice, tagging, and verification games. The social information processing page focuses on this sense of social computing. == History == === Technology infrastructure === Users were able to interact more with websites after the development of Web 2.0. This was an advancement from Web 1.0. Comode G. and Krishnamurthy B. (2008) note that "content creators were few in Web 1.0 with the vast majority of users simply acting as consumers of content." Web 2.0 provided functionalities that allowed for low-cost web-hosting services and introduced features with browser windows that used basic information structure and expanded it to as many devices as possible using HTTP, or Hypertext Transfer Protocol. Sometimes referred to as "Enterprise 2.0", a term derived from Web 2.0, social software for enterprise generally refers to the use of social computing in corporate intranets and in other medium- and large-scale business environments. It consisted of a class of tools that allowed for networking and social changes to businesses at the time. It was a layering of the business tools on Web 2.0 and brought forth several applications and collaborative software with specific uses. FinanceElectronic negotiation, which first came up in 1969 and was adapted over time to suit financial markets networking needs, represents an important and desirable coordination mechanism for electronic markets. Negotiation between agents (software agents as well as humans) allows cooperative and competitive sharing of information to determine a proper price. Recent research and practice has also shown that electronic negotiation is beneficial for the coordination of complex interactions among organizations. Electronic negotiation has recently emerged as a very dynamic, interdisciplinary research area covering aspects from disciplines such as Economics, Information Systems, Computer Science, Communication Theory, Sociology and Psychology.Social computing has become more widely known because of its relationship to a number of recent trends. These include the growing popularity of social software and Web 3.0, increased academic interest in social network analysis, the rise of open source as a viable method of production, and a growing conviction that all of this can have a profound impact on daily life. A February 13, 2006 paper by market research company Forrester Research suggested that: === Developments === PLATO was one of the earliest examples of social computing in a live production environment with initially hundreds and soon thousands of users. The PLATO computer system was developed by the University of Illinois at Urbana Champaign in 1960s. In the 70s, the system supported social software applications for multi-us

    Read more →
  • Pamphlet war

    Pamphlet war

    A pamphlet war is a protracted argument or discussion through printed media, especially between the time the printing press became common, and when state intervention like copyright laws made such public discourse more difficult. The purpose was to defend or attack a certain perspective or idea. Pamphlet wars have occurred multiple times throughout history, as both social and political platforms. Pamphlet wars became viable platforms for this protracted discussion with the advent and spread of the printing press. Cheap printing presses, and increased literacy made the late 17th century a key stepping stone for the development of pamphlet wars, a period of prolific use of this type of debate. Over 2200 pamphlets were published between 1600–1715 alone. Pamphlet wars are generally credited for powering many key social changes of the era, including the Reformation and the Revolution Controversy, the English philosophical debate set off by the French Revolution. == History of the pamphlet in England == Throughout Europe in the 16th century, printed tracts were used to argue religious doctrine and foment support for religious causes. In England, Henry VIII used print literature to justify his break from the Catholic Church. During the subsequent reigns of Edward and Mary, print polemics escalated into propaganda warfare, as print media gained enormous potential to sway common opinion. By the 1560s, print was widely used to convey news. In 1562, the first pamphlets appeared, which discussed the English forces sent to aid the Protestant French Huguenots. In 1569, pamphlets reported the revolt of the Northern Earls and the subsequent Rebellion of the same year. In the 1580s, pamphlets began to replace broadsheet ballads as the means to convey information to the general public. Over the next century, the pamphlet became the principal means of garnering support for a cause or an idea, and was particularly influential during the English Civil Wars (1642-1651) and the Glorious Revolution of 1688. Through the ensuing decades, the pamphlet lost some popularity due to the emergence of newspapers and journals, but continued to be an important medium of public debate, as illustrated by the Revolution Controversy a full century later in the 1790s. == Pamphlet printing == Coming from a Latin word, "pamphlet" literally means "small book." In the early days of printing, the format of the book or pamphlet depended on the size of the paper used and the number of times it was folded. If a page was only folded once, it was called a folio. If it was folded twice, it was known as a quarto. An octave was a paper folded three times. A pamphlet was usually 1-12 sheets of paper folded in quarto, or 8-96 pages. It was sold for one or two pennies apiece. The printing of a pamphlet involved many people: the author, the printer, suppliers, print-makers, compositor, correctors, pressmen, binders, and distributors. Once the pamphleteer had written the pamphlet, it was sent to the printing house to be corrected, set into type, and printed. The papers were then given to the printer's warehouse-keeper, who bundled the copies and sent them to the bookseller, who was probably the one financing the printing. He was responsible to bind the pamphlets, usually by sewing them, and then sold them wholesale to individual bookselling vendors. The booksellers then sold them from a stall in the marketplace. == Pamphlet subjects == Pamphlets began as the means of conveyance for religious debates, and therefore religious topics were one of the main subjects they dealt with. The definition of a pamphlet came to mean a short work dealing with social, political, or religious issues. Typical topics included the Civil war, Church of England doctrines, Acts of Parliament, the Popish Plot (see below), the Stuart Era, and Cromwell propaganda. In addition, pamphlets were also used for romantic fiction, autobiography, scurrilous personal abuse, and social criticism. They contained much of the propaganda of the 17th century in the midst of the religious and political turmoil. They were also used for debates between the Puritans and the Anglican. During the Glorious Revolution, pamphlets were political weapons. == Authors == There were many authors of pamphlets. However some of the more popular authors include Daniel Defoe, Thomas Hobbes, Jonathan Swift, John Milton, and Samuel Pepys. Also included in the midst are Thomas Nashe, Joseph Addison, Richard Steele, and Matthew Prior. In 1591–1592, Robert Greene released a series of pamphlets which later inspired many other authors including Thomas Middleton and Thomas Dekker. == Critics == Pamphlets, along with their vast popularity, received criticism. There were many in the time period who believed that pamphlets were full of foolishness. They thought the pamphlets were not good enough literature and that they would turn people from "good" writing. They believed that pamphlets would be the end of the great volumes of literature and that great writing would be forgotten. == News reporting == Pamphlets made a great difference in the way news was reported to the general public. With the publication of pamphlets, it was no longer difficult for people to hear of events taking place far away. The closer the occurrence was to London, the easier and faster people heard of it. For example, the Battle of Edgehill took place on 23 October 1642. The first pamphlet reporting the incident was printed on 25 October 24 hours after some of the orders reported had been given. While not entirely accurate, and hurriedly made, the pamphlet nonetheless was able to tell the general public what had happened in the battle. A more accurate, specific, and readable account was available in a pamphlet printed on 26 October, and the "authorized" version was available only five days after the battle took place. == Marprelate pamphlets == In 1588, a series of pamphlets marked a turning point for the Puritans, dividing them from other Protestants in the country. The authors wrote under the pseudonym of Martin Marprelate and his two sons of the same name. The true identities of the authors were never discovered. The pamphlets aimed to provoke authorities to take action against censorship. The series was among the first to ask questions directly of its readers. == Early pamphlet wars == === Elizabethan pamphlet wars === As a means of forming or swaying public opinion, pamphlets like these had a part in influencing society, even as the content was itself influenced by society. During the 16th century and continuing for a short while in the early 17th century in England there was rise in the use of pamphlet wars to discuss a myriad of issues spanning from the civil war, to religious freedoms and the roles of women in society. The Queen herself participated in these discussions, making sure that she was widely read and understood by her people in order to gain favour and establish herself as the monarch despite being a woman. Examples of her use of this medium appear in To the Troops at Tilbury written in 1588, On Mary's Execution written in 1586, and many more. Another famous writer of this period to take advantage of the pamphlet was Emilia Lanier, famous for her arguments about the role of women. A common idea promoted by many literary works and the general attitude towards women, Lanier's work "Eve's Apology in Defence of Women" refuted the belief that Eve is responsible for the fall of man. A very uncommon and unpopular stance to take, Lanier accomplishes her defence through structuring it as an apology, one of the earliest subversive feminist texts. Similarly, Francis Bacon wrote his Essays to promote his idea of morality and other complicated social issues. For example, his work, "Of Love" examines the various understandings of the concept of love, particularly as it was perceived during the Elizabethan era. === Eikon Series === From 1649 until 1651, some five pamphlets were published in a debate about the execution of King Charles I of England (1600-1649). Prior to his execution, King Charles wrote the first pamphlet in the discussion, Eikon Basilike’’ (from the Greek “eikon” for image and “basileus” for king). The subtitle of this work - Portraiture of His Sacred Majesty in His Solitudes and Sufferings - indicates that Charles sought to portray himself as a martyr to the cause of regal prerogative. In the following months, several response pamphlets were published (collectively known as the "Eikon" series), including: Eikon Alethine, Eikon e Pistes, Eikonoklastes, and Eikon Aklastos,” alternately attacking or defending the king, his regicide, and his self-portrait in “Eikon Basilike.” == Popish Plot and Elizabeth Cellier == In the 1680s, after being acquitted of the "Meal-Tub Plot" for which she was accused, Elizabeth Cellier wrote Malice Defeated, which, along with The Matchless Picaro, sparked a pamphlet war surrounding debate of the ascension of a Catholic king to the thro

    Read more →
  • Pixel binning

    Pixel binning

    Pixel binning, also known as binning, is a process image sensors of digital cameras use to combine adjacent pixels throughout an image, by summing or averaging their values, during or after readout. It improves low-light performance while still allowing for highly detailed photographs in good light. Charge from adjacent pixels in CCD or charge-coupled device image sensors and some other image sensors can be combined during readout, increasing the line rate or frame rate. In the context of image processing, binning is the procedure of combining clusters of adjacent pixels, throughout an image, into single pixels. For example, in 2×2 binning, an array of 4 pixels becomes a single larger pixel, reducing the number of pixels to 1/4 and halving the image resolution in each dimension. The result can be the sum, average, median, minimum, or maximum value of the cluster. Some systems use more advanced algorithms such as considering the values of nearby pixels, edge detection, self-claimed "AI", etc. to increase the perceived visual quality of the final downsized image. This aggregation, although associated with loss of information, reduces the amount of data to be processed, facilitating analysis. The binned image has lower resolution, but the relative noise level in each pixel is generally reduced. == History == Normally, an increase in megapixel count on a constant image sensor size would lead to a sacrifice of the surface size of the individual pixels, which would result in each pixel being able to catch less light in the same time, thus leading to a darker and/or noisier image in low light (given the same exposure time). In the past, camera manufacturers had to compromise between low-light performance and the amount of detail in good light, by dropping the megapixel count like HTC did in 2013 with their four-megapixel "UltraPixel" camera. However, this results in less detailed images in daylight where enough light is available. With pixel binning, the camera has "the best of both worlds", meaning both the benefit of high detail in good light and the benefit of high brightness in low light. In low light, the surfaces of four or more pixels can act as one large pixel that catches far more light. For example, some smartphones such as the Samsung Galaxy A15 are able to capture photographs with up to fifty megapixels in daylight. However, in low light, the individual pixels would be too small to capture the light needed for a bright image with the short exposure time available for handheld shooting. Therefore, with pixel binning activated, the 50-megapixel image sensor acts as a 12.5-megapixel image sensor, a quarter of its original resolution, with an accordingly larger surface area per pixel.

    Read more →
  • Bitcoin Satoshi Vision

    Bitcoin Satoshi Vision

    Bitcoin Satoshi Vision (BSV) is a cryptocurrency that is a hard fork of Bitcoin Cash. Bitcoin Satoshi Vision was created in November 2018 by a group of individuals led by Craig Steven Wright, who has claimed since 2015 to be Satoshi Nakamoto, the creator of the original bitcoin. == History == === 2018 split from Bitcoin Cash === On 15 November 2018, a hard fork chain split of Bitcoin Cash occurred between two rival factions called Bitcoin Cash and Bitcoin SV. On 15 November 2018 Bitcoin Cash traded at about $289, and Bitcoin SV traded at about $96.50, down from $425.01 on 14 November for the un-split Bitcoin Cash. The split originated from what was described as a "civil war" in two competing Bitcoin Cash camps. The first camp, supported by entrepreneur Roger Ver and Jihan Wu of Bitmain, promoted the software entitled Bitcoin ABC (short for Adjustable Blocksize Cap), which would maintain the block size at 32 MB. The second camp led by Craig Steven Wright and billionaire Calvin Ayre put forth a competing software version Bitcoin SV, short for "Bitcoin Satoshi Vision", which would increase the block size limit to 128 MB. === 2019 de-listing from Binance === In April 2019, an online feud broke out between those who supported the claims of Bitcoin SV supporter Craig Wright that he was Satoshi Nakamoto, and those who did not. The feud resulted in cryptocurrency exchange Binance de-listing Bitcoin SV from their platform, stating that: At Binance, we periodically review each digital asset we list to ensure that it continues to meet the high level of standard we expect. When a coin or token no longer meets this standard, or the industry changes, we conduct a more in-depth review and potentially delist it. We believe this best protects all of our users. When we conduct these reviews, we consider a variety of factors. Here are some that drive whether we decide to delist a digital asset: Commitment of team to project Level and quality of development activity Network / smart contract stability Level of public communication Responsiveness to our periodic due diligence requests Evidence of unethical / fraudulent conduct Contribution to a healthy and sustainable crypto ecosystem === 2021 network attack === In August 2021, Bitcoin SV suffered a 51% attack, after previously suffering attacks in June and July of the same year. Such an attack involves cryptocurrency miners gaining control of more than half of a network's computing power; these kinds of network attacks have the goal of preventing new transactions from gaining confirmations, allowing the attackers to double-spend coins. Adam James, senior editor at OKEx Insights claimed that "In the intermediate term, the attack has seemingly somewhat-negligible impact on its current price action," however "Faith in [Bitcoin SV] will likely be reduced following the incident." === 2024 high court ruling === In March 2024, Mr Justice James Mellor in the British High Court ruled that Wright is not Satoshi Nakamoto.

    Read more →
  • Social bot

    Social bot

    A social bot, refers to fully or partially automated social media accounts designed to perform most regular users’ actions, such as liking, posting content, and chatting with other users. Although their levels of autonomy vary, and often include a human-in-the-loop, social bots can use artificial intelligence to perform social media actions and can use large language models to mimic human dialogue. Social bots can operate alone or in groups that coordinate messaging as part of a network of coordinated inauthentic behavior. Social bots are often used to perform ad fraud by artificially boosting viewership and engagement metrics and to spread disinformation on social media. == Uses == Social bots are used for a large number of purposes on a variety of social media platforms, including Twitter, Instagram, Facebook, and YouTube. One common use of social bots is to inflate a social media user's apparent popularity, usually by artificially manipulating their engagement metrics with large volumes of fake likes, reposts, or replies. Social bots can similarly be used to artificially inflate a user's follower count with fake followers, creating a false perception of a larger and more influential online following than is the case. The use of social bots to create the impression of a large social media influence allows individuals, brands, and organizations to attract a higher number of human followers and boost their online presence. Fake engagement can be bought and sold in the black market of social media engagement. Corporations typically use automated customer service agents on social media to affordably manage high levels of support requests. Social bots are used to send automated responses to users’ questions, sometimes prompting the user to private message the support account with additional information. The increased use of automated support bots and virtual assistants has led to some companies laying off customer-service staff. Social bots are also often used to influence public opinion. Autonomous bot accounts can flood social media with large numbers of posts expressing support for certain products, companies, or political campaigns, creating the impression of organic grassroots support. This can create a false perception of the number of people who support a certain position, which may also have effects on the direction of stock prices or on elections. Messages with similar content can also influence fads or trends. Many social bots are also used to amplify phishing attacks. These malicious bots are used to trick a social media user into giving up their passwords or other personal data. This is usually accomplished by posting links claiming to direct users to news articles that would in actuality direct to malicious websites containing malware. Scammers often use URL shortening services such as TinyURL and bit.ly to disguise a link's domain address, increasing the likelihood of a user clicking the malicious link. The presence of fake social media followers and high levels of engagement help convince the victim that the scammer is in fact a trusted user. Social bots can be a tool for computational propaganda. Bots can also be used for algorithmic curation, algorithmic radicalization, and/or influence-for-hire, a term that refers to the selling of an account on social media platforms. == History == Bots have coexisted with computer technology since the earliest days of computing. Social bots have their roots in the 1950s with Alan Turing, whose work focused on machine intelligence with the development of the Turing Test. The following decades saw further progress made towards the goal of creating programs capable of mimicking human behavior, notably with Joseph Weizenbaum’s creation of ELIZA. Considered to be one of the first Chatbots, ELIZA could simulate natural conversations with human users through pattern matching. Its most famous script was DOCTOR, a simulation of a Rogerian psychotherapist that was programmed to chat with patients and respond to questions. With the growth of social media platforms in the early 2000s, these bots could be used to interact with much larger user groups in an inconspicuous manner. Early instances of autonomous agents on social media could be found on sites like MySpace, with social bots being used by marketing firms to inflate activity on a user’s page in an effort to make them appear more popular. Social bots have been observed on a large variety of social media websites, with Twitter being one of the most widely observed examples. The creation of Twitter bots is generally against the site’s terms of service when used to post spam or to automatically like and follow other users, but some degree of automation using Twitter’s API may be permitted if used for “entertainment, informational, or novelty purposes.” Other platforms such as Reddit and Discord also allow for the use of social bots as long as they are not used to violate policies regarding harmful content and abusive behavior. Social media platforms have developed their own automated tools to filter out messages that come from bots, although they cannot detect all bot messages. == Legal regulation == Due to the difficulty of recognizing social bots and separating them from "eligible" automation via social media APIs, it is unclear how legal regulation can be enforced. Social bots are expected to play a role in shaping public opinion by autonomously acting as influencers. Some social bots have been used to rapidly spread misinformation, manipulate stock markets, influence opinion on companies and brands, promote political campaigns, and engage in malicious phishing campaigns. In the United States, some states have started to implement legislation in an attempt to regulate the use of social bots. In 2019, California passed the Bolstering Online Transparency Act (the B.O.T. Act) to make it unlawful to use automated software to appear indistinguishable from humans for the purpose of influencing a social media user's purchasing and voting decisions. Other states such as Utah and Colorado have passed similar bills to restrict the use of social bots. The Artificial Intelligence Act (AI Act) in the European Union is the first comprehensive law governing the use of Artificial Intelligence. The law requires transparency in AI to prevent users from being tricked into believing they are communicating with another human. AI-generated content on social media must be clearly marked as such, preventing social bots from using AI in a manner that mimics human behavior. == Detection == The first generation of bots could sometimes be distinguished from real users by their often superhuman capacities to post messages. Later developments have succeeded in imprinting more "human" activity and behavioral patterns in the agent. With enough bots, it might be even possible to achieve artificial social proof. To unambiguously detect social bots as what they are, a variety of criteria must be applied together using pattern detection techniques, some of which are: cartoon figures as user pictures sometimes also random real user pictures are captured (identity fraud) reposting rate temporal patterns sentiment expression followers-to-friends ratio length of user names variability in (re)posted messages engagement rate (like/followers rate) analysis of the time series of social media posts Social bots are always becoming increasingly difficult to detect and understand. The bots' human-like behavior, ever-changing behavior of the bots, and the sheer volume of bots covering every platform may have been a factor in the challenges of removing them. Social media sites, like Twitter, are among the most affected, with CNBC reporting up to 48 million of the 319 million users (roughly 15%) were bots in 2017. Botometer (formerly BotOrNot) is a public Web service that checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot. The system leverages over a thousand features. An active method for detecting early spam bots was to set up honeypot accounts that post nonsensical content, which may get reposted (retweeted) by the bots. However, bots evolve quickly, and detection methods have to be updated constantly, because otherwise they may get useless after a few years. One method is the use of Benford's Law for predicting the frequency distribution of significant leading digits to detect malicious bots online. This study was first introduced at the University of Pretoria in 2020. Another method is artificial-intelligence-driven detection. Some of the sub-categories of this type of detection would be active learning loop flow, feature engineering, unsupervised learning, supervised learning, and correlation discovery. Some operations of bots work together in a synchronized way. For example, ISIS used Twitter to amplify its Islamic content by numerous orchestrated accounts which further pushed an item to the Hot List news, thus further a

    Read more →
  • Yao's test

    Yao's test

    In cryptography and the theory of computation, Yao's test is a test defined by Andrew Chi-Chih Yao in 1982, against pseudo-random sequences. A sequence of words passes Yao's test if an attacker with reasonable computational power cannot distinguish it from a sequence generated uniformly at random. == Formal statement == === Boolean circuits === Let P {\displaystyle P} be a polynomial, and S = { S k } k {\displaystyle S=\{S_{k}\}_{k}} be a collection of sets S k {\displaystyle S_{k}} of P ( k ) {\displaystyle P(k)} -bit long sequences, and for each k {\displaystyle k} , let μ k {\displaystyle \mu _{k}} be a probability distribution on S k {\displaystyle S_{k}} , and P C {\displaystyle P_{C}} be a polynomial. A predicting collection C = { C k } {\displaystyle C=\{C_{k}\}} is a collection of boolean circuits of size less than P C ( k ) {\displaystyle P_{C}(k)} . Let p k , S C {\displaystyle p_{k,S}^{C}} be the probability that on input s {\displaystyle s} , a string randomly selected in S k {\displaystyle S_{k}} with probability μ ( s ) {\displaystyle \mu (s)} , C k ( s ) = 1 {\displaystyle C_{k}(s)=1} , i.e. Moreover, let p k , U C {\displaystyle p_{k,U}^{C}} be the probability that C k ( s ) = 1 {\displaystyle C_{k}(s)=1} on input s {\displaystyle s} a P ( k ) {\displaystyle P(k)} -bit long sequence selected uniformly at random in { 0 , 1 } P ( k ) {\displaystyle \{0,1\}^{P(k)}} . We say that S {\displaystyle S} passes Yao's test if for all predicting collection C {\displaystyle C} , for all but finitely many k {\displaystyle k} , for all polynomial Q {\displaystyle Q} : === Probabilistic formulation === As in the case of the next-bit test, the predicting collection used in the above definition can be replaced by a probabilistic Turing machine, working in polynomial time. This also yields a strictly stronger definition of Yao's test (see Adleman's theorem). Indeed, one could decide undecidable properties of the pseudo-random sequence with the non-uniform circuits described above, whereas BPP machines can always be simulated by exponential-time deterministic Turing machines.

    Read more →
  • ZeroPC

    ZeroPC

    ZeroPC was a commercial webtop developed by ZeroDesktop, Inc. located in San Mateo, California. ZeroPC has been called a personal cloud OS. It mimicked the look, feel and functionality of the desktop environment of a real operating system. The software was launched in September 2011 through Disrupt SF 2011 event and recently selected to the finalist of SXSW 2012 in Innovative Web Technology category. ZeroPC is web-based and required a Java applet to operate bundled productivity tool Thinkfree. The web applications found on ZeroPC are built on Java in the back end. Features included drag-and-drop functionality, cloud dashboard and personal cloud storage meta services. ZeroPC belonged to a category of services that intended to turn the Web into a full-fledged platform by using Web services as a foundation along with presentation technologies that replicated the experience of desktop applications for users. ZeroPC aggregates content so users can easily access, transfer and share whatever content they want, using a web browser from any device. Its meta-cloud layer supports Dropbox, Box, SugarSync, OneDrive, 4Shared, Google Drive, Evernote, Picasa, Flickr, Instagram, Facebook, Twitter, and Photobucket. ZeroPC Cloud OS platform also provides extensive APIs for iOS and Android App developers. Some of the features found on ZeroPC are: File sharing, Webmail, Cloud Content Navigator, Instant messenger, Sticky Note, Audio/Video Player and Office productivity applications. ZeroPC 2.0 platform ran on AWS for free and paid users. Its platform is licensable to Telco and ISV for commercial purpose. Their clients are SFR, SK Telecom, Hancom and others. As of June 1, 2017, ZeroPC's servers were switched off completely, and ZeroPC is no longer in service since its parent company, NComputing, had launched Virtual Desktop Service in the cloud (AWS) to public. == Browser and Platform Compatibility == The ZeroPC web desktop was compatible with Mac OS X and Microsoft Windows platforms. It is certified to operate on Safari 6.0, Firefox 15.0.1, Google Chrome 22.0.1229.79 m and Internet Explorer 8 and 9. The ZeroPC front end user interface executes entirely within a web browser (see above) and uses HTML, some features of HTML5, JavaScript, AJAX and an optional Java plug-in. == Security == All communication between the ZeroPC front end user interface and the ZeroPC back end servers is encrypted using SSL (HTTPS) protocol. Furthermore, any content stored in the ZeroPC server-side repository is also encrypted using 256-bit Advanced Encryption Standard (AES-256) by Amazon S3 on AWS. ZeroPC users could connect their ZeroPC profile to other storage services such as Dropbox and Box. This connection allows the ZeroPC user to fully manage their content stored in these other storage services. To establish the connection ZeroPC rigorously adhered to the Oauth implementation provided by the target storage service. Upon completion of the Oauth process, ZeroPC stores the relevant access token in the user's profile. This token, along with all other sensitive password related data was encrypted using AES 256-bit key size. == Implementations == As noted above, the ZeroPC platform was hosted on Amazon Web Services infrastructure and is available to the general consumer. A user was allowed to sign up by selecting one of three account plans including a no-cost option. The ZeroPC could also be white-labeled for organizations wishing to provide this functionality to their own users. The white-label options include managed hosting on Amazon Web Services infrastructure and also installation within the organization's IT infrastructure. == User Access Points == The ZeroPC infrastructure provided user access to content and features in several different ways. As described in this article the user can access their information by signing into the ZeroPC web desktop. Additionally, ZeroPC offers native applications designed to run on popular mobile devices including smartphones and tablets. == Leadership == ZeroPC was founded by Chief Executive Officer, Young Song, an entrepreneur who previously founded NComputing, a $60 million venture-backed company. He also co-founded eMachines, Inc., a low-cost computer brand (later acquired by Gateway).

    Read more →
  • Stegomalware

    Stegomalware

    Stegomalware is a form of malicious software that leverages steganography techniques to conceal its code, configuration data, or command-and-control (C&C) communications within seemingly benign digital media such as images, audio files, videos, documents, or network traffic. It typically embeds encrypted or obfuscated payloads into digital media and only extracts and executes them at runtime, which makes traditional signature-based and sandbox-based detection significantly more difficult. Stegomalware has been observed in attacks ranging from advanced persistent threats (APTs) to financially motivated cybercrime, and is now the subject of dedicated academic surveys, research projects, and international law-enforcement initiatives. The key distinction between stegomalware and traditional obfuscated malware lies in the encoding location. After obfuscation, malicious code remains present within the executable and can theoretically be discovered through static analysis. In contrast, stegomalware hides the payload entirely within a cover medium (image, audio, etc.), remaining invisible until the malware dynamically extracts and executes it at runtime. == History == The term stegomalware was formally introduced by researchers Águila, Laskov, and others in the context of mobile malware and presented at the Inscrypt (Information Security and Cryptology) conference in 2014. This marked the first academic formalization of the concept, though earlier work had already identified that botnets and mobile malware could use steganography and covert channels for command-and-control communication over probabilistically unobservable channels. Since its introduction, stegomalware has evolved from a theoretical concern to a documented threat. In 2011, the APT operation known as "Operation Shady RAT" became one of the first documented cases of stegomalware in the wild, using digital images to hide Internet Protocol addresses and command-and-control server addresses. The same year, the Duqu malware (targeting industrial manufacturers) embedded victim data into JPEG image files before exfiltration, making the data transfer virtually undetectable to network-level security tools. From 2014 onwards, stegomalware became more prevalent in organized cybercrime and advanced persistent threat campaigns. Notable examples include Zeus/Zbot, which masked configuration data in images; Gatak/Stegoloader, which hid shellcode in PNG files; TeslaCrypt, which embedded C&C commands in JPEGs; and Cerber, which concealed ransomware payloads within images. By the 2010s, stegomalware had become established as a preferred evasion technique for espionage, financial theft, and ransomware distribution campaigns. Recent surveys (2020–2025) document that stegomalware has increasingly been exploited by adversaries targeting banks, enterprises, government agencies, educational institutions, and internet users via malvertising campaigns. The technique is now considered a sophisticated method of attack worthy of dedicated international law-enforcement attention. == Technical Characteristics and Definitions == Stegomalware operates through a three-component architecture: Stegotext (R): An innocent-looking digital asset (image, audio file, etc.) into which the malicious payload is embedded. Secret key (sk): A key used by the embedding and extraction algorithms, typically hardcoded into the malware. Payload (p): The actual malicious code, configuration data, or C&C commands hidden within the stegotext. The malware extracts the payload at runtime using the secret key and either executes it directly or uses it to download additional stages of the attack. Stegomalware can be classified into several types based on deployment method: Type 0 (Autonomous): Both the stegotext and extraction algorithm are embedded within the malware application itself. The malicious payload is extracted and executed locally without external communication. Type I (Update): The stegotext and secret key are downloaded from a remote server at runtime; only the extraction algorithm is included in the malware. This variant is more flexible, allowing attackers to push updated payloads. Type II (External Algorithm): Neither the stegotext nor the extraction algorithm are distributed with the malware; both are fetched from an attacker-controlled infrastructure, providing maximum flexibility and evasion. == Steganography techniques == === Spatial domain methods === Stegomalware predominantly uses steganographic methods designed for images, as images are the most common cover medium in the wild. The most basic spatial domain technique is Least Significant Bit (LSB) substitution, which replaces the least significant bits of pixel color values with payload bits. While simple and easy to implement, LSB is also relatively easy to detect through statistical analysis. More sophisticated spatial domain techniques include: HUGO (High Undetectable steGO) (2010): Minimizes detectable distortion by distributing the payload across multiple pixels, achieving embedding capacity with reduced statistical footprint. WOW (Wavelet Obtained Weights) (2012): Embeds data preferentially in textured regions of images where modifications are less perceptually noticeable. UNIWARD (Universal Wavelet Relative Distortion) (2014): Uses a universal distortion function applicable to multiple image formats, balancing payload capacity with undetectability. HILL (2014): Applies high-pass and low-pass filters to identify robust embedding regions. MiPOD (Minimizing the Power of Optimal Detector) (2016): Designed to minimize the power of theoretical optimal steganalysis detectors. === Transform domain methods === Transform domain techniques convert images into the frequency domain (e.g., using DCT or DWT) before embedding, allowing for more robust hiding in JPEG and other compressed formats: Embedding in DCT coefficients (used in JPEG compression) Embedding in DWT coefficients (used in lossless formats) Spread spectrum techniques, which distribute the payload across many frequency components Transform domain methods are generally more resistant to noise, compression, and image transformations than spatial methods. === Generative adversarial network (GAN) methods === Recent advances in machine learning have introduced GAN-based steganography, where a generative model produces stego images that minimize detectable artifacts: SGAN (Steganographic GAN) (2017): First GAN applied to steganography, using a generator, discriminator, and steganalysis network. ASDL-GAN (2017): Performs automatic steganographic distortion learning at the pixel level. SteganoGAN (2019): Improves upon earlier GAN models, achieving higher embedding capacity and robustness. HiGAN (Hiding Images GAN) (2020): Enables hiding one image within another while maintaining visual plausibility. GAN-based approaches are more resilient to standard steganalysis attacks but remain an emerging threat requiring further research. == Notable malware campaigns == Stegomalware has been documented in numerous high-profile cyber attacks and campaigns. Notable examples include: Operation Shady RAT (2011): Used digital images to hide command-and-control server addresses in targeted espionage. Duqu (2011): Embedded victim data into JPEG files to exfiltrate industrial control system information. Zeus/Zbot (2014): Masked banking configuration data inside JPEG files exploited via malvertising. Gatak/Stegoloader (2015): Hid shellcode in PNG files for software licensing attacks and bot command execution. TeslaCrypt (2015): Embedded C&C commands and ransomware keys in JPEG images. Cerber (2016): Concealed executable ransomware code in JPEG files distributed via phishing. DNSChanger (2016): Embedded malicious code in PNG files for DNS hijacking campaigns. Sundown Exploit Kit (2017): Distributed exploit code in PNG files via malvertising. AdGholas (2017): Used JPEG steganography to distribute ransomware via malvertising. Synccrypt (2017): Hidden ransomware components in JPEG-steganographic encrypted archives. ZeroT/PlugX (2017): Hid Remote Access Trojan payloads in BMP files for espionage. Loki Bot (2018): Concealed malware installers in JPEG and video files. Waterbug (APT28) (2019): Injected malicious DLLs into WAV audio files. Shlayer (macOS adware) (2019): Hid malicious URLs in JPEG files via malvertising. === Attack vectors === The most common attack vectors for stegomalware include: Phishing emails with malicious attachments or links Malvertising campaigns using malicious banner advertisements Exploit kits through compromised or malicious websites Legitimate application vulnerabilities (e.g., watering-hole attacks) Fake software distribution (cracked software, keygen tools) === Exploitation stages === Stegomalware typically serves one or more roles in attack lifecycles: Payload delivery: Stego images contain full executable code or shellcode. C&C communication: Hidden data contains server addresses or command instructio

    Read more →
  • Chaos Communication Congress

    Chaos Communication Congress

    The Chaos Communication Congress is an annual hacker conference organized by the Chaos Computer Club. The congress features a variety of lectures and workshops on technical and political issues related to security, cryptography, privacy and online freedom of speech. It has taken place regularly at the end of the year since 1984, with the current date and duration (27–30 December) established in 2005. It is considered one of the largest events of its kind, alongside DEF CON in Las Vegas. == History == The congress is held in Germany. It started in 1984 in Hamburg, moved to Berlin in 1998, and back to Hamburg in 2012, having exceeded the capacity of the Berlin venue with more than 4500 attendees. Since then, it attracts an increasing number of people: around 6600 attendees in 2012, over 13000 in 2015, and more than 15000 in 2017. From 2017 to 2019, it took place at the Trade Fair Grounds in Leipzig, since the Hamburg venue (CCH) was closed for renovation in 2017 and the existing space was not enough for the growing congress. The congress moved back to Hamburg in 2023, after the renovation of CCH was finished. A large range of speakers are featured. The event is organized by volunteers called Chaos Angels. The non-members entry fee for four days was €100 in 2016, and was raised to €120 in 2018 to include a public transport ticket for the Leipzig area. An important part of the congress are the assemblies, semi-open spaces with clusters of tables and internet connections for groups and individuals to collaborate and socialize in projects, workshops and hands-on talks. These assembly spaces, introduced at the 2012 meeting, combine the hack center project space and distributed group spaces of former years. From 1997 to 2004 the congress also hosted the annual German Lockpicking Championships. 2005 was the first year the Congress lasted four days instead of three and lacked the German Lockpicking Championships. 2020 was the first year where the Congress did not take place at a physical location due to the COVID-19 pandemic, giving way to the first Remote Chaos Experience (rC3). The Chaos Computer Club announced to return to the now newly renovated Congress Center Hamburg for the 37th edition of the Chaos Communication Congress. The announcement confirms the usual date of 27-30 December, notably omitting the year it will be held. On 18 October 2022, they confirmed that the congress will indeed not be held in 2022. On 6 October 2023, the CCC announced that 37C3 will take place again on the usual dates in 2023. === Timeline ===

    Read more →