AI Google Grammar Checker

AI Google Grammar Checker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Free Studio

    Free Studio

    Free Studio is a freeware set of multimedia computer programs developed by DVDVideoSoft. The programs are available in one integrated package and also as separate downloads (Free Studio Manager is included in both). == Overview == The Free Studio software bundle consists of about 48 programs, grouped into several sections: YouTube, MP3 & Audio, CD-DVD-BD, DVD & Video, Photo & Images, Mobiles, Apple Devices, and 3D. The largest group is the DVD & Video section containing 14 different applications. Mobiles section is the second largest group with 13 programs. However, the YouTube section, particularly YouTube downloading programs, has gained more popularity among users. The programs have been tested and endorsed by a dozen of software portals and have won awards from these sites. Free Studio is most popular in Germany, Greece, Italy, and the United States. It is also popular in Japan, France, and the United Kingdom. Some of the programs in the package are free and open-source software. == History == DVDVideoSoft project was launched in 2006 by company Digital Wave Ltd., for software development to produce multimedia application software. The founders distributed paid software as an affiliate at the start, later their own products appeared on the site. Free YouTube Download was the first successful program, then DVDVideoSoft created and launched several other 'Free YouTube' applications. Later on upon users' requests DVDVideoSoft started developing other kinds of applications including media converters etc. Today DVDVideoSoft offers up to 49 different programs for video, audio and image processing individually or integrated into the Free Studio package. == Features == DVDVideoSoft YouTube programs can be used to download YouTube videos in their original format and convert them to AVI, DVD, MP4, WMV etc. or different audio formats. YouTube section contains Free Video Call Recorder for Skype button, but the program itself is not included into FS installation (it has to be downloaded and installed separately). The "MP3 & Audio" section consists of the programs which convert audio files between different formats, convert audio files to Flash for web, extract audio from video files, edit audio files (Free Audio Dub), rip and burn CDs. Enclosed in the CD-DVD-BD section are the applications that enable users to burn files and folders to discs, to convert videos to a DVD format and vice versa, to burn CDs, and to copy music from audio CDs into files. The "DVD and Video" section contains several desktop video and DVD converters. Some of the programs can flip, rotate and cut (Free Video Dub) videos. One of the most popular programs from the section is Free Video Dub. Converted videos are now, contrary to previous versions, watermarked if no paid membership is present. Free Studio includes several applications for Apple phones, iPods and other devices. The Mobiles section contains a dozen video converters for various mobile devices such as cell phones, Tablets and Game consoles. They convert videos to play them on (BlackBerry, HTC, LG phones, Sony/Sony Ericsson, Nintendo, Xbox, Motorola phones, etc.) The "Photo & Images" section incorporates the programs for image conversion and resizing, extracting JPEG frames from videos (Free Video To JPEG Converter), recording screen activities, making screenshots (Free Screen Recorder). The 3D section is composed of the programs to make 3D videos and 3D images. There are several algorithms which allow to create different types of 3D images. == Supported formats == === Video formats === Input: .avi; .ivf; .div; .divx; .mpg; .mpeg; .mpe; .mp4; .m4v; .wmv; .asf; .webm; .mkv; .mov; .qt; .ts; .mts; .m2t; .m2ts; .mod; .tod; .vro; .dat; .3gp2; .3gpp; .3gp; .3g2; .dvr-ms; .flv; .f4v; .amv; .rm; .rmm; .rv; .rmvb; .ogv; DVD video Output: .mp4; .wmv; .avi; .mkv; .webm; .flv; .swf; .mov; .3gp; .m2ts; DVD video === Audio formats === Input: .mp3 .wav; .aac; .m4a; .m4b; .wma; .ogg; .flac; .ra; .ram; .amr; .ape; .mka; .tta; .aiff; .au; .mpc; .spx; .ac3; audio cd Output: .mp3; .m4a; .aac; .wav; .wma; .ogg; .flac; .ape; audio CD === Image formats === Input: .jpg, .png, .bmp, .gif, .tga Output: .jpg, .png, .bmp, .gif, .tga, .pdf == Reception == The programs have been tested and endorsed by Chip Online, Tucows, SnapFiles, Brothersoft, and Softonic and have won awards from these sites. Free Studio is most popular in Germany, United States and Italy. It is also popular in Japan, France and the United Kingdom. The most popular applications, according to CNET statistics, include Free YouTube to MP3 Converter, Free Video to MP3 Converter, Free MP4 Video Converter and Free YouTube Download. Other programs with high rank: Free AVI Video Converter, Free Video Editor, Free Audio Converter and Free Studio in a whole. == Criticism == Free Studio (as can be common for freeware packages) is criticized for toolbar and Web search engine installation. Older versions have also included OpenCandy, which is loaded automatically, with no request for user approval. There can be difficulties installing only the programs needed without installing bundled extra programs. In March 2017, DVDVideoSoft announced that it had stopped showing other products' ads during installation and removed all toolbars, search engines, and OpenCandy.

    Read more →
  • Social media mining

    Social media mining

    Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports targeting advertising to users or academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find the valuable minerals; likewise, social media mining sifts through social media data in order to discern patterns and trends about matters such as social media usage, online behaviour, content sharing, connections between individuals, buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, as such organizations can use the analyses for tasks such as design strategies, introduce programs, products, processes or services. Social media mining uses concepts from computer science, data mining, machine learning, and statistics. Mining is based on social network analysis, network science, sociology, ethnography, optimization and mathematics. It attempts to formally represent, measure and model patterns from social media data. In the 2010s, major corporations, governments and not-for-profit organizations began mining to learn about customers, clients and others. Platforms such as Google, Facebook (partnered with Datalogix and BlueKai) conduct mining to target users with advertising. Scientists and machine learning researchers extract insights and design product features. Users may not understand how platforms use their data. Users tend to click through Terms of Use agreements without reading them, leading to ethical questions about whether platforms adequately protect users' privacy. During the 2016 United States presidential election, Facebook allowed Cambridge Analytica, a political consulting firm linked to the Trump campaign, to analyze the data of an estimated 87 million Facebook users to profile voters, creating controversy when this was revealed. == Background == As defined by Kaplan and Haenlein, social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." There are many categories of social media including, but not limited to, social networking (Facebook or LinkedIn), microblogging (Twitter), photo sharing (Flickr, Instagram, Photobucket, or Picasa), news aggregation (Google Reader, StumbleUpon, or Feedburner), video sharing (YouTube, MetaCafe), livecasting (Ustream or Twitch), virtual worlds (Kaneva), social gaming (World of Warcraft), social search (Google, Bing, or Ask.com), and instant messaging (Google Talk, Skype, or Yahoo! messenger). The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without having a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997. Since then, many other social media sites have been introduced, each providing service to millions of people. These individuals form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior and combining them with the observations and measurements of this virtual world, one can systematically analyze and mine social media. Social media mining is the process of representing, analyzing, and extracting meaningful patterns from data in social media, resulting from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and evaluation dilemma. Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides necessary tools to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media. == Uses == Social media mining is used across several industries including business development, social science research, health services, and educational purposes. Once the data received goes through social media analytics, it can then be applied to these various fields. Often, companies use the patterns of connectivity that pervade social networks, such as assortativity—the social similarity between users that are induced by influence, homophily, and reciprocity and transitivity. These forces are then measured via statistical analysis of the nodes and connections between these nodes. Social analytics also uses sentiment analysis, because social media users often relay positive or negative sentiment in their posts. This provides important social information about users' emotions on specific topics. These three patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network. Companies would be interested in this information in order to decide who they may hire for influencer marketing. These influencers are determined by recognition, activity generation, and novelty—three requirements that can be measured through the data mined from these sites. Analysts also value measures of homophily: the tendency of two similar individuals to become friends. Users have begun to rely on information of other users' opinions in order to understand diverse subject matter. These analyses can also help create recommendations for individuals in a tailored capacity. By measuring influence and homophily, online and offline companies are able to suggest specific products for individuals consumers, and groups of consumers. Social media networks can use this information themselves to suggest to their users possible friends to add, pages to follow, and accounts to interact with. == Perception == Modern social media mining is a controversial practice that has led to exponential gains in user growth for tech giants such as Facebook, Inc., Twitter, and Google. Companies such as these, considered "Big Tech" are companies that build algorithms that take advantage of user input to understand their preferences, and keep them on the platform as much as possible. These inputs, that can be as simple as time spent on a given screen, provide the data being mined, and lead to companies profiting heavily from using that data to capitalize on extremely accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; Most of the largest platforms now average over 1 billion active users per month as of 2021. It has been claimed by a multitude of anti-algorithm personalities, like Tristan Harris or Chamath Palihapitiya, that certain companies (specifically Facebook) valued growth above all else, and ignored potential negative impacts from these growth engineering tactics. At the same time, users have now created their own data arbitrages with the help of their own data, through content monetization and becoming influencers. Users typically have access to a varied set of analytics specific to people that interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: Influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics, and allowed third parties to access that information as well, at times unbeknownst to even the user whose data is being viewed/bought. == Research == === Research areas === Social media event detection – Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information to understand the current emerging topics/events. Public health monitoring and surveillance - Using large-scale analysis of social media to study large cohorts of patients and the general public, e.g. to obtain early warning signals of drug-drug interactions and adverse drug reactions, or understand human reproduction and sexual interest. Community structure (Community Detection/Evolution/Evaluation) – Identifying communities on social networks, how t

    Read more →
  • Data stream management system

    Data stream management system

    A data stream management system (DSMS) is a computer software system to manage continuous data streams. It is similar to a database management system (DBMS), which is, however, designed for static data in conventional databases. A DBMS also offers a flexible query processing so that the information needed can be expressed using queries. However, in contrast to a DBMS, a DSMS executes a continuous query that is not only performed once, but is permanently installed. Therefore, the query is continuously executed until it is explicitly uninstalled. Since most DSMS are data-driven, a continuous query produces new results as long as new data arrive at the system. This basic concept is similar to complex event processing so that both technologies are partially coalescing. == Functional principle == One important feature of a DSMS is the possibility to handle potentially infinite and rapidly changing data streams by offering flexible processing at the same time, although there are only limited resources such as main memory. The following table provides various principles of DSMS and compares them to traditional DBMS. == Processing and streaming models == One of the biggest challenges for a DSMS is to handle potentially infinite data streams using a fixed amount of memory and no random access to the data. There are different approaches to limit the amount of data in one pass, which can be divided into two classes. For the one hand, there are compression techniques that try to summarize the data and for the other hand there are window techniques that try to portion the data into (finite) parts. === Synopses === The idea behind compression techniques is to maintain only a synopsis of the data, but not all (raw) data points of the data stream. The algorithms range from selecting random data points called sampling to summarization using histograms, wavelets or sketching. One simple example of a compression is the continuous calculation of an average. Instead of memorizing each data point, the synopsis only holds the sum and the number of items. The average can be calculated by dividing the sum by the number. However, it should be mentioned that synopses cannot reflect the data accurately. Thus, a processing that is based on synopses may produce inaccurate results. === Windows === Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to FIFO lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data. There are also different approaches to implementing windows. There are, for example, approaches that use timestamps or time intervals for system-wide windows or buffer-based windows for each single processing step. Sliding-window query processing is also suitable to being implemented in parallel processors by exploiting parallelism between different windows and/or within each window extent. == Query processing == Since there are a lot of prototypes, there is no standardized architecture. However, most DSMS are based on the query processing in DBMS by using declarative languages to express queries, which are translated into a plan of operators. These plans can be optimized and executed. A query processing often consists of the following steps. === Formulation of continuous queries === The formulation of queries is mostly done using declarative languages like SQL in DBMS. Since there are no standardized query languages to express continuous queries, there are a lot of languages and variations. However, most of them are based on SQL, such as the Continuous Query Language (CQL), StreamSQL and ESP. There are also graphical approaches where each processing step is a box and the processing flow is expressed by arrows between the boxes. The language strongly depends on the processing model. For example, if windows are used for the processing, the definition of a window has to be expressed. In StreamSQL, a query with a sliding window for the last 10 elements looks like follows: This stream continuously calculates the average value of "price" of the last 10 tuples, but only considers those tuples whose prices are greater than 100.0. In the next step, the declarative query is translated into a logical query plan. A query plan is a directed graph where the nodes are operators and the edges describe the processing flow. Each operator in the query plan encapsulates the semantic of a specific operation, such as filtering or aggregation. In DSMSs that process relational data streams, the operators are equal or similar to the operators of the Relational algebra, so that there are operators for selection, projection, join, and set operations. This operator concept allows the very flexible and versatile processing of a DSMS. === Optimization of queries === The logical query plan can be optimized, which strongly depends on the streaming model. The basic concepts for optimizing continuous queries are equal to those from database systems. If there are relational data streams and the logical query plan is based on relational operators from the Relational algebra, a query optimizer can use the algebraic equivalences to optimize the plan. These may be, for example, to push selection operators down to the sources, because they are not so computationally intensive like join operators. Furthermore, there are also cost-based optimization techniques like in DBMS, where a query plan with the lowest costs is chosen from different equivalent query plans. One example is to choose the order of two successive join operators. In DBMS this decision is mostly done by certain statistics of the involved databases. But, since the data of a data streams is unknown in advance, there are no such statistics in a DSMS. However, it is possible to observe a data stream for a certain time to obtain some statistics. Using these statistics, the query can also be optimized later. So, in contrast to a DBMS, some DSMS allows to optimize the query even during runtime. Therefore, a DSMS needs some plan migration strategies to replace a running query plan with a new one. === Transformation of queries === Since a logical operator is only responsible for the semantics of an operation but does not consist of any algorithms, the logical query plan must be transformed into an executable counterpart. This is called a physical query plan. The distinction between a logical and a physical operator plan allows more than one implementation for the same logical operator. The join, for example, is logically the same, although it can be implemented by different algorithms like a Nested loop join or a Sort-merge join. Notice, these algorithms also strongly depend on the used stream and processing model. Finally, the query is available as a physical query plan. === Execution of queries === Since the physical query plan consists of executable algorithms, it can be directly executed. For this, the physical query plan is installed into the system. The bottom of the graph (of the query plan) is connected to the incoming sources, which can be everything like connectors to sensors. The top of the graph is connected to the outgoing sinks, which may be for example a visualization. Since most DSMSs are data-driven, a query is executed by pushing the incoming data elements from the source through the query plan to the sink. Each time when a data element passes an operator, the operator performs its specific operation on the data element and forwards the result to all successive operators. == Examples == AURORA, StreamBase Systems, Inc. Archived 23 March 2009 at the Wayback Machine Hortonworks DataFlow IBM Streams NIAGARA Query Engine NiagaraST: A Research Data Stream Management System at Portland State University Odysseus, an open source Java-based framework for Data Stream Management Systems Pipeline DB PIPES Archived 24 December 2016 at the Wayback Machine, webMethods Business Events QStream SAS Event Stream Processing SQLstream STREAM StreamGlobe StreamInsight TelegraphCQ WSO2 Stream Processor

    Read more →
  • Scalable Coherent Interface

    Scalable Coherent Interface

    The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple interface; i.e. a standard to replace existing buses in multiprocessor systems with one with no inherent scalability and performance limitations. The IEEE Std 1596-1992, IEEE Standard for Scalable Coherent Interface (SCI) was approved by the IEEE standards board on March 19, 1992. It saw some use during the 1990s, but never became widely used and has been replaced by other systems from the early 2000s. == History == Soon after the Fastbus (IEEE 960) follow-on Futurebus (IEEE 896) project in 1987, some engineers predicted it would already be too slow for the high performance computing marketplace by the time it would be released in the early 1990s. In response, a "Superbus" study group was formed in November 1987. Another working group of the standards association of the Institute of Electrical and Electronics Engineers (IEEE) spun off to form a standard targeted at this market in July 1988. It was essentially a subset of Futurebus features that could be easily implemented at high speed, along with minor additions to make it easier to connect to other systems, such as VMEbus. Most of the developers had their background from high-speed computer buses. Representatives from companies in the computer industry and research community included Amdahl, Apple Computer, BB&N, Hewlett-Packard, CERN, Dolphin Server Technology, Cray Research, Sequent, AT&T, Digital Equipment Corporation, McDonnell Douglas, National Semiconductor, Stanford Linear Accelerator Center, Tektronix, Texas Instruments, Unisys, University of Oslo, University of Wisconsin. The original intent was a single standard for all buses in the computer. The working group soon came up with the idea of using point-to-point communication in the form of insertion rings. This avoided the lumped capacitance, limited physical length/speed of light problems and stub reflections in addition to allowing parallel transactions. The use of insertion rings is credited to Manolis Katevenis who suggested it at one of the early meetings of the working group. The working group for developing the standard was led by David B. Gustavson (chair) and David V. James (Vice Chair). David V. James was a major contributor for writing the specifications including the executable C-code. Stein Gjessing’s group at the University of Oslo used formal methods to verify the coherence protocol and Dolphin Server Technology implemented a node controller chip including the cache coherence logic. Different versions and derivatives of SCI were implemented by companies like Dolphin Interconnect Solutions, Convex, Data General AViiON (using cache controller and link controller chips from Dolphin), Sequent and Cray Research. Dolphin Interconnect Solutions implemented a PCI and PCI-Express connected derivative of SCI that provides non-coherent shared memory access. This implementation was used by Sun Microsystems for its high-end clusters, Thales Group and several others including volume applications for message passing within HPC clustering and medical imaging. SCI was often used to implement non-uniform memory access architectures. It was also used by Sequent Computer Systems as the processor memory bus in their NUMA-Q systems. Numascale developed a derivative to connect with coherent HyperTransport. == The standard == The standard defined two interface levels: The physical level that deals with electrical signals, connectors, mechanical and thermal conditions The logical level that describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives, control and status registers, and initialization and error recovery facilities. This structure allowed new developments in physical interface technology to be easily adapted without any redesign on the logical level. Scalability for large systems is achieved through a distributed directory-based cache coherence model. (The other popular models for cache coherency are based on system-wide eavesdropping (snooping) of memory transactions – a scheme which is not very scalable.) In SCI each node contains a directory with a pointer to the next node in a linked list that shares a particular cache line. SCI defines a 64-bit flat address space (16 exabytes) where 16 bits are used for identifying a node (65,536 nodes) and 48 bits for address within the node (256 terabytes). A node can contain many processors and/or memory. The SCI standard defines a packet switched network. === Topologies === SCI can be used to build systems with different types of switching topologies from centralized to fully distributed switching: With a central switch, each node is connected to the switch with a ringlet (in this case a two-node ring). In distributed switching systems, each node can be connected to a ring of arbitrary length and either all or some of the nodes can be connected to two or more rings. The most common way to describe these multi-dimensional topologies is k-ary n-cubes (or tori). The SCI standard specification mentions several such topologies as examples. The 2-D torus is a combination of rings in two dimensions. Switching between the two dimensions requires a small switching capability in the node. This can be expanded to three or more dimensions. The concept of folding rings can also be applied to the Torus topologies to avoid any long connection segments. === Transactions === SCI sends information in packets. Each packet consists of an unbroken sequence of 16-bit symbols. The symbol is accompanied by a flag bit. A transition of the flag bit from 0 to 1 indicates the start of a packet. A transition from 1 to 0 occurs 1 (for echoes) or 4 symbols before the packet end. A packet contains a header with address command and status information, payload (from 0 through optional lengths of data) and a CRC check symbol. The first symbol in the packet header contains the destination node address. If the address is not within the domain handled by the receiving node, the packet is passed to the output through the bypass FIFO. In the other case, the packet is fed to a receive queue and may be transferred to a ring in another dimension. All packets are marked when they pass the scrubber (a node is established as scrubber when the ring is initialized). Packets without a valid destination address will be removed when passing the scrubber for the second time to avoid filling the ring with packets that would otherwise circulate indefinitely. === Cache coherence === Cache coherence ensures data consistency in multiprocessor systems. The simplest form applied in earlier systems was based on clearing the cache contents between context switches and disabling the cache for data that were shared between two or more processors. These methods were feasible when the performance difference between the cache and memory were less than one order of magnitude. Modern processors with caches that are more than two orders of magnitude faster than main memory would not perform anywhere near optimal without more sophisticated methods for data consistency. Bus based systems use eavesdropping (snooping) methods since buses are inherently broadcast. Modern systems with point-to point links use broadcast methods with snoop filter options to improve performance. Since broadcast and eavesdropping are inherently non-scalable, these are not used in SCI. Instead, SCI uses a distributed directory-based cache coherence protocol with a linked list of nodes containing processors that share a particular cache line. Each node holds a directory for the main memory of the node with a tag for each line of memory (same line length as the cache line). The memory tag holds a pointer to the head of the linked list and a state code for the line (three states – home, fresh, gone). Associated with each node is also a cache for holding remote data with a directory containing forward and backward pointers to nodes in the linked list sharing the cache line. The tag for the cache has seven states (invalid, only fresh, head fresh, only dirty, head dirty, mid valid, tail valid). The distributed directory is scalable. The overhead for the directory based cache coherence is a constant percentage of the node’s memory and cache. This percentage is in the order of 4% for the memory and 7% for the cache. == Legacy == SCI is a standard for connecting the different resources within a multiprocessor computer system, and it is not as widely known to the public as for example the Ethernet family for connecting different systems. Different system vendors implemented different variants of SCI for their internal system infrastructure. These different implementations interface to very intricate mechanisms in processors and memory systems and each vendor has to preserve some degrees of

    Read more →
  • List of Haskell software and tools

    List of Haskell software and tools

    This is a list of Haskell software and tools, including compilers, interpreters, build tools, package managers, integrated development environments, libraries, and other development utilities. == Compilers, interpreters and editors == Emacs — text editor Glasgow Haskell Compiler (GHC) Hugs — bytecode interpreter (discontinued) IntelliJ IDEA — IDE with Haskell support via plugins Vim — text editor Visual Studio Code — editor/IDE with Haskell support via extensions == Libraries and frameworks == Parsec — parser combinator library Servant — web framework Yesod — web framework == Build tools and package management == Cabal — build system and packaging infrastructure Haskell Platform — bundled distribution of Haskell tools and libraries (deprecated) Stack — build tool and dependency manager == Language tools and static analysis == Fourmolu — code formatter based on Ormolu Haskell Language Server — implementation of the Language Server Protocol for Haskell HLint — source code suggestion and linting tool Hoogle — Haskell API search engine Ormolu — code formatter Stan — static analysis tool Stylish Haskell — source code formatter == Interactive environments == GHCi — interactive REPL for the Glasgow Haskell Compiler IHaskell — Jupyter kernel for Haskell == Debugging and profiling tools == hp2ps — heap profiling visualization tool ThreadScope — parallel execution visualizer for Haskell programs == Documentation generators == Haddock — API documentation generator for Haskell == Parser and lexer generators == Alex — lexer generator for Haskell Happy — parser generator for Haskell == Testing frameworks == HUnit — unit testing framework QuickCheck — property-based testing library == Version control == Darcs — distributed version control system written in Haskell

    Read more →
  • Sentiment analysis

    Sentiment analysis

    Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. == Types == A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Precursors to sentimental analysis include the General Inquirer, which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale. Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles. A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text. There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis and detection of emotions. === Subjectivity/objectivity identification === This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance. Subjective and objective identification, emerging subtasks of sentiment analysis to use syntactic, semantic features, and machine learning knowledge to identify if a sentence or document contains facts or opinions. Awareness of recognizing factual and opinions is not recent, having possibly first presented by Carbonell at Yale University in 1979. The term objective refers to the incident carrying factual information. Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.' The term subjective describes the incident contains non-factual information in various forms, such as personal opinions, judgment, and predictions, also known as 'private states'. In the example down below, it reflects a private states 'We Americans'. Moreover, the target entity commented by the opinions can take several forms from tangible product to intangible topic matters stated in Liu (2010). Furthermore, three types of attitudes were observed by Liu (2010), 1) positive opinions, 2) neutral opinions, and 3) negative opinions. Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem. Each class's collections of words or phrase indicators are defined for to locate desirable patterns on unannotated text. For subjective expression, a different word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in the linguist and natural language processing field states in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, in subjective detection, the features extraction progression from curating features by hand to automated features learning. At the moment, automated learning methods can further separate into supervised and unsupervised machine learning. Patterns extraction with machine learning process annotated and unannotated text have been explored extensively by academic researchers. However, researchers recognized several challenges in developing fixed sets of rules for expressions respectably. Much of the challenges in rule development stems from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writings, 3) context-sensitive, 4) represented words with fewer usages, 5) time-sensitive, and 6) ever-growing volume. Metaphorical expressions. The text contains metaphoric expression may impact on the performance on the extraction. Besides, metaphors take in different forms, which may have been contribu

    Read more →
  • Link encryption

    Link encryption

    Link encryption is an approach to communications security that encrypts and decrypts all network traffic at each network routing point (e.g. network switch, or node through which it passes) until arrival at its final destination. This repeated decryption and encryption is necessary to allow the routing information contained in each transmission to be read and employed further to direct the transmission toward its destination, before which it is re-encrypted. This contrasts with end-to-end encryption where internal information, but not the header/routing information, is encrypted by the sender at the point of origin and only decrypted by the intended recipient. Link encryption offers two main advantages: encryption is automatic so there is less opportunity for human error. if the communications link operates continuously and carries an unvarying level of traffic, link encryption defeats traffic analysis. On the other hand, end-to-end encryption ensures only the intended recipient has access to the plaintext. Link encryption can be used with end-to-end systems by superencrypting the messages. Bulk encryption refers to encrypting a large number of circuits at once, after they have been multiplexed.

    Read more →
  • Data recovery

    Data recovery

    In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or overwritten data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS). Logical failures occur when the hard drive devices are functional but the user or automated-OS cannot retrieve or access data stored on them. Logical failures can occur due to corruption of the engineering chip, lost partitions, firmware failure, or failures during formatting/re-installation. Data recovery can be a very simple or technical challenge. This is why there are specific software companies specialized in this field that help to get back data on your system. == About == The most common data recovery scenarios involve an operating system failure, malfunction of a storage device, logical failure of storage devices, accidental damage or deletion, etc. (typically, on a single-drive, single-partition, single-OS system), in which case the ultimate goal is simply to copy all important files from the damaged media to another new drive. This can be accomplished using a Live CD, or DVD by booting directly from a ROM or a USB drive instead of the corrupted drive in question. Many Live CDs or DVDs provide a means to mount the system drive and backup drives or removable media, and to move the files from the system drive to the backup media with a file manager or optical disc authoring software. Such cases can often be mitigated by disk partitioning and consistently storing valuable data files (or copies of them) on a different partition from the replaceable OS system files. Another scenario involves a drive-level failure, such as a compromised file system or drive partition, or a hard disk drive failure. In any of these cases, the data is not easily read from the media devices. Depending on the situation, solutions involve repairing the logical file system, partition table, or master boot record, or updating the firmware or drive recovery techniques ranging from software-based recovery of corrupted data, to hardware- and software-based recovery of damaged service areas (also known as the hard disk drive's "firmware"), to hardware replacement on a physically damaged drive which allows for the extraction of data to a new drive. If a drive recovery is necessary, the drive itself has typically failed permanently, and the focus is rather on a one-time recovery, salvaging whatever data can be read. In a third scenario, files have been accidentally "deleted" from a storage medium by the users. Typically, the contents of deleted files are not removed immediately from the physical drive; instead, references to them in the directory structure are removed, and thereafter space the deleted data occupy is made available for later data overwriting. In the mind of end users, deleted files cannot be discoverable through a standard file manager, but the deleted data still technically exists on the physical drive. In the meantime, the original file contents remain, often several disconnected fragments, and may be recoverable if not overwritten by other data files. The term "data recovery" is also used in the context of forensic applications or espionage, where data which have been encrypted, hidden, or deleted, rather than damaged, are recovered. Sometimes data present in the computer gets encrypted or hidden due to reasons like virus attacks which can only be recovered by some computer forensic experts. == Physical damage == A wide variety of failures can cause physical damage to storage media, which may result from human errors and natural disasters. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer from a multitude of mechanical failures, such as head crashes, PCB failure, and failed motors; tapes can simply break. Physical damage to a hard drive, even in cases where a head crash has occurred, does not necessarily mean permanent data loss. However, in extreme cases, such as prolonged exposure to moisture and corrosion —like the lost Bitcoin hard drive of James Howells, buried in the Newport landfill for over a decade — recovery is usually impossible. In rare cases, forensic techniques such as magnetic force microscopy (MFM) have been explored to detect residual magnetic traces when data holds exceptional value. Other techniques employed by many professional data recovery companies can typically salvage most, if not all, of the data that had been lost when the failure occurred. Of course, there are exceptions to this, such as cases where severe damage to the hard drive platters may have occurred. However, if the hard drive can be repaired and a full image or clone created, then the logical file structure can be rebuilt in most instances. Most physical damage cannot be repaired by end users. For example, opening a hard disk drive in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head. During normal operation, read/write heads float 3 to 6 nanometers above the platter surface, and the average dust particles found in a normal environment are typically around 30,000 nanometers in diameter. When these dust particles get caught between the read/write heads and the platter, they can cause new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, data recovery companies are often employed to salvage important data with the more reputable ones using class 100 dust- and static-free cleanrooms. === Recovery techniques === Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, the image can be safely analyzed for logical damage and will possibly allow much of the original file system to be reconstructed. ==== Hardware repair ==== A common misconception is that a damaged printed circuit board (PCB) may be simply replaced during recovery procedures by an identical PCB from a healthy drive. While this may work in rare circumstances on hard disk drives manufactured before 2003, it will not work on newer drives. Electronics boards of modern drives usually contain drive-specific adaptation data (generally a map of bad sectors and tuning parameters) and other information required to properly access data on the drive. Replacement boards often need this information to effectively recover all of the data. The replacement board may need to be reprogrammed. Some manufacturers (Seagate, for example) store this information on a serial EEPROM chip, which can be removed and transferred to the replacement board. Each hard disk drive has what is called a system area or service area; this portion of the drive, which is not directly accessible to the end user, usually contains drive's firmware and adaptive data that helps the drive operate within normal parameters. One function of the system area is to log defective sectors within the drive; essentially telling the drive where it can and cannot write data. The sector lists are also stored on various chips attached to the PCB, and they are unique to each hard disk drive. If the data on the PCB do not match what is stored on the platter, then the drive will not calibrate properly. In most cases the drive heads will click because they are unable to find the data matching what is stored on the PCB. == Logical damage == The term "logical damage" refers to situations in which the error is not a problem in the hardware and requires software-level solutions. === Corrupt partitions and file systems, media errors === In some cases, data on a hard disk drive can be unreadable due to damage to the partition table or file system, or to (intermittent) media errors. In the majority of these cases, at least a portion of the original data can be recovered by repairing the damaged partition table or file system using specialized data recovery software such as TestDisk; software like ddrescue can image media despite intermittent errors, and image raw data when there is partition table or file system damage. This type of data recovery can be performed by people without expertise in drive hardware as it requires no special physica

    Read more →
  • Software construction

    Software construction

    Software construction is the process of creating working software via coding and integration. The process includes unit and integration testing although does not include higher level testing such as system testing. Construction is an aspect of the software development lifecycle and is integrated in the various software development process models with varying focus on construction as an activity separate from other activities. In the waterfall model, a software development effort consists of sequential phases including requirements analysis, design, and planning which are prerequisites for starting construction. In an iterative model such as scrum, evolutionary prototyping, or extreme programming, construction as an activity that occurs concurrently or overlapping other activities. Construction planning may include defining the order in which components are created and integrated, the software quality management processes, and the allocation of tasks to teams and developers. To facilitate project management, numerous construction aspects can be measured; these include the amount of code developed, modified, reused, and destroyed, code complexity, code inspection statistics, faults-fixed and faults-found rates, and effort expended. These measurements can be useful for aspects such as ensuring quality and improving the process. == Activities == Construction includes many activities. === Coding === The following are a few of the key aspects of the coding activity: Naming Choice of name for each identifier. One study showed that the effort required to debug a program is minimized when variable names are between 10 and 16 characters. Logic Organization into statements and routines Highly cohesive routines proved to be less error prone than routines with lower cohesion. A study of 450 routines found that 50 percent of the highly cohesive routines were fault free compared to only 18 percent of routines with low cohesion. Another study of a different 450 routines found that routines with the highest coupling-to-cohesion ratios had 7 times as many errors as those with the lowest coupling-to-cohesion ratios and were 20 times as costly to fix. Although studies showed inconclusive results regarding the correlation between routine sizes and the rate of errors in them, but one study found that routines with fewer than 143 lines of code were 2.4 times less expensive to fix than larger routines. Another study showed that the code needed to be changed least when routines averaged 100 to 150 lines of code. Another study found that structural complexity and amount of data in a routine were correlated with errors regardless of its size. Interfaces between routines are some of the most error-prone areas of a program. One study showed that 39 percent of all errors were errors in communication between routines. Unused parameters are correlated with an increased error rate. In one study, only 17 to 29 percent of routines with more than one unreferenced variable had no errors, compared to 46 percent in routines with no unused variables. The number of parameters of a routine should be 7 at maximum as research has found that people generally cannot keep track of more than about seven chunks of information at once. One experiment showed that designs which access arrays sequentially, rather than randomly, result in fewer variables and fewer variable references. One experiment found that loops-with-exit are more comprehensible than other kinds of loops. Regarding the level of nesting in loops and conditionals, studies have shown that programmers have difficulty comprehending more than three levels of nesting. Control flow complexity has been shown to correlate with low reliability and frequent errors. Modularity Structuring and refactoring the code into classes, packages and other structures. When considering containment, the maximum number of data members in a class shouldn't exceed 7±2. Research has shown that this number is the number of discrete items a person can remember while performing other tasks. When considering inheritance, the number of levels in the inheritance tree should be limited. Deep inheritance trees have been found to be significantly associated with increased fault rates. When considering the number of routines in a class, it should be kept as small as possible. A study on C++ programs has found an association between the number of routines and the number of faults. A study by NASA showed that the putting the code into well-factored classes can double the code reusability compared to the code developed using functional design. Error handling Encoding logic to handle both planned and unplanned errors and exceptions. Resource management Managing computational resource use via exclusion mechanisms and discipline in accessing serially reusable resources, including threads or database locks. Security Prevention of code-level security breaches such as buffer overrun and array index overflow. Optimization Optimization while avoiding premature optimization. Documentation Both embedded in the code as comments and as external documents. === Integration === Integration is about combining separately constructed parts. Concerns include planning the sequence in which components will be integrated, creating scaffolding to support interim versions of the software, determining the degree of testing and quality work performed on components before they are integrated, and determining points in the project at which interim versions are tested. === Testing === Testing can reduce the time between when faulty logic is inserted in the code and when it is detected. In some cases, testing is performed after code has been written, but in test-first programming, test cases are created before code is written. Construction includes at least two forms of testing, often performed by the developer who wrote the code: unit testing and integration testing. === Reuse === Software reuse entails more than creating and using libraries. It requires formalizing the practice of reuse by integrating reuse processes and activities into the software life cycle. The tasks related to reuse in software construction during coding and testing may include: selection of the reusable code, evaluation of code or test re-usability, reporting reuse metrics. === Quality assurance === Techniques for ensuring quality as software is constructed include: Testing One study found that the average defect detection rates of Unit testing and integration testing are 30% and 35% respectively. Software inspection With respect to software inspection, one study found that the average defect detection rate of formal code inspections is 60%. Regarding the cost of finding defects, a study found that code reading detected 80% more faults per hour than testing. Another study shown that it costs six times more to detect design defects by using testing than by using inspections. A study by IBM showed that only 3.5 hours were needed to find a defect through code inspections versus 15–25 hours through testing. Microsoft has found that it takes 3 hours to find and fix a defect by using code inspections and 12 hours to find and fix a defect by using testing. In a 700 thousand lines program, it was reported that code reviews were several times as cost-effective as testing. Studies found that inspections result in 20% - 30% fewer defects per 1000 lines of code than less formal review practices and that they increase productivity by about 20%. Formal inspections will usually take 10% - 15% of the project budget and will reduce overall project cost. Researchers found that having more than 2 - 3 reviewers on a formal inspection doesn't increase the number of defects found, although the results seem to vary depending on the kind of material being inspected. Technical review With respect to technical review, one study found that the average defect detection rates of informal code reviews and desk checking are 25% and 40% respectively. Walkthroughs were found to have a defect detection rate of 20% - 40%, but were found also to be expensive especially when project pressures increase. Code reading was found by NASA to detect 3.3 defects per hour of effort versus 1.8 defects per hour for testing. It also finds 20% - 60% more errors over the life of the project than different kinds of testing. A study of 13 reviews about review meetings, found that 90% of the defects were found in preparation for the review meeting while only around 10% were found during the meeting. Static analysis With respect to Static analysis (IEEE1028), studies have shown that a combination of these techniques needs to be used to achieve a high defect detection rate. Other studies showed that different people tend to find different defects. One study found that the extreme programming practices of pair programming, desk checking, unit testing, integration testing, and regression testing can achieve a 90% defect detection rate. An experiment involving exper

    Read more →
  • Protecting Kids From Social Media Act

    Protecting Kids From Social Media Act

    Protecting Kids on Social Media Act or HB 1891 is an American law that was introduced by William Lamberth of Sumner County, Tennessee and was signed into law by Tennessee's governor on May 2, 2024. The bill requires social media websites such as X, YouTube, TikTok, Facebook and others to verify the age of users and if those users are under 18, they must have parental consent. == Progress == The law passed the Tennessee State Legislature with little opposition: the bill had only two no votes in the House from Aftyn Behn and Vincent B. Dixie, and it had zero no votes in the Senate. == Bill summary == Every social media company must verify the age of new users after the law takes effect, and if the user had created an account before the law took effect, they must verify the age of the person attempting to access the account within 14 days. If the new user or the user who originally owned an account is under 18 years of age, they must get parental consent and the third party or social media company must not retain the data from the age verification process or obtaining parental consent. Parents who are account holders of those under 18 can view the privacy settings, set daily time restrictions, and implement breaks during which the minor cannot access the account. The law is enforced by the Attorney General of Tennessee and went into effect on January 1, 2025. == Lawsuit == On October 3, 2024, the trade association NetChoice filed a lawsuit against Tennessee Attorney General Jonathan Skrmetti in the Middle District Court of Tennessee, claiming that the law violates the First Amendment. The Judge for the case is William L. Campbell Jr. An initial case management conference was originally scheduled for December 4, 2024, however it was delayed because of the Supreme Court case United States v. Skrmetti, recommending that the conference be delayed after January 20, 2025. On February 14, 2025, Judge Eli Richardson denied NetChoice's motion for a temporary restraining order because it would disrupt the status quo of the case.

    Read more →
  • Social Media Working Group Act of 2014

    Social Media Working Group Act of 2014

    The Social Media Working Group Act of 2014 (H.R. 4263) is a bill that would direct the United States Secretary of Homeland Security to establish within the United States Department of Homeland Security (DHS) a social media working group (the Group) to provide guidance and best practices to the emergency preparedness and response community on the use of social media technologies before, during, and after a terrorist attack. The bill was introduced into the United States House of Representatives during the 113th United States Congress. == Background == === Social media === Social media is the social interaction among people in which they create, share or exchange information and ideas in virtual communities and networks. Andreas Kaplan and Michael Haenlein define social media as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." Furthermore, social media depend on mobile and web-based technologies to create highly interactive platforms through which individuals and communities share, co-create, discuss, and modify user-generated content. They introduce substantial and pervasive changes to communication between organizations, communities, and individuals. Social media differ from traditional or industrial media in many ways, including quality, reach, frequency, usability, immediacy, and permanence. === Virtual Social Media Working Group === First responders have increasingly used social media in emergency response and recovery operations. Social media tools are used to connect with citizens after a disaster and share information. The Virtual Social Media Working group (VSMWG) is an online platform that gives advice to first responders on how to safely and effectively use social media in emergency response operations. The working group is made up of subject matter experts from across the U.S. It was created by DHS in December 2010 and gives first responders guidance and best practices regarding the use of social media during emergencies. The DHS S&T and the VSMWG work with local and state governments, academics and nonprofits. Meetings of the VSMWG are chaired by the Under Secretary of Homeland Security for Science and Technology. == Provisions of the bill == This summary is based largely on the summary provided by the Congressional Research Service, a public domain source. The Social Media Working Group Act of 2014 would amend the Homeland Security Act of 2002 to direct the United States Secretary of Homeland Security to establish within the United States Department of Homeland Security (DHS) a social media working group (the Group) to provide guidance and best practices to the emergency preparedness and response community on the use of social media technologies before, during, and after a terrorist attack. The bill would require the Group to submit an annual report that includes: (1) a review of current and emerging social media technologies being used to support preparedness and response activities related to terrorist attacks, of best practices and lessons learned on the use of social media during the response to terrorist attacks that occurred during the period covered by the report, and of available training for government officials on the use of social media in response to a terrorist attack; (2) recommendations to improve DHS's use of social media and to improve information sharing among DHS and its components and among state and local governments; and (3) a summary of coordination efforts with the private sector to discuss and resolve legal, operational, technical, privacy, and security concerns. == Congressional Budget Office report == This summary is based largely on the summary provided by the Congressional Budget Office, as ordered reported by the House Committee on Homeland Security on June 11, 2014. This is a public domain source. H.R. 4263 would direct the Department of Homeland Security (DHS) to establish a working group to provide guidance and best practices on the use of social media technologies, specifically during a terrorist attack or other emergency. The group would prepare guidance for the emergency preparedness and response community. The bill would define the membership of the working group, which would include more than 20 experts from federal, state, local, and tribal governments along with nongovernmental organizations. The working group would be exempt from the Federal Advisory Committee Act and would be authorized to hold virtual meetings to fulfill the requirement to meet twice a year. The working group would be required to submit an annual report on emerging trends and best practices for emergency response through social media. Based on the cost of similar activities carried out under the DHS Acquisition and Accountability Efficiency Act and the Critical Infrastructure Research and Development Advancement Act of 2013, the Congressional Budget Office (CBO) estimates that the new DHS responsibilities and the annual report required by H.R. 4263 would cost a total of less than $500,000 annually, assuming the availability of appropriated funds. Enacting the legislation would not affect direct spending or revenues; therefore, pay-as-you-go procedures do not apply. H.R. 4263 contains no intergovernmental or private-sector mandates as defined in the Unfunded Mandates Reform Act and would impose no costs on state, local, or tribal governments. == Procedural history == The Social Media Working Group Act of 2014 was introduced into the United States House of Representatives on March 14, 2014, by Rep. Susan W. Brooks (R, IN-5). It was referred to the United States House Committee on Homeland Security and the United States House Homeland Security Subcommittee on Emergency Preparedness, Response, and Communications. On June 19, 2014, it was reported (amended) alongside House Report 113-480. On July 8, 2014, the House voted in Roll Call Vote 369 to pass the bill 375–19. == Debate and discussion == Nate Elliott, a social media expert at Forrester Research, explains that "the hope is when government or another authority tweets something, people will share it for them," but that this often doesn't happen. This problem, that "messages wash away very quickly," is the reason that the federal government is trying to formulate a better social media strategy. Rep. Steven Palazzo (R-MS), who co-sponsored the bill, stated that "social media has played a crucial role in emergency preparedness and response in Mississippi, including during disasters like Hurricane Isaac and the tornadoes that hit the Hattiesburg area a little over a year ago." He said that their goal with the bill was to "build upon existing public-private partnerships and use social media in a more strategic way in order to help save lives and property."

    Read more →
  • Cryptographic Service Provider

    Cryptographic Service Provider

    A cryptographic service provider (CSP) is a package that "provides a concrete implementation of certain cryptographic services." A CSP offers operations and protocols to support a variety of use cases. The cryptographic application programming interface (API) provided by the CSP provides common solutions for different platforms, for example hardware and cloud services. == Microsoft Windows == In Microsoft Windows, a Cryptographic Service Provider is a software library that implements the Microsoft CryptoAPI (CAPI). CSPs implement encoding and decoding functions, which computer application programs may use, for example, to implement strong user authentication or for secure email. CSPs are independent modules that can be used by different applications. A user program calls CryptoAPI functions and these are redirected to CSPs functions. Since CSPs are responsible for implementing cryptographic algorithms and standards, applications do not need to be concerned about security details. Furthermore, each application can define which CSP it is going to use on its calls to CryptoAPI. In fact, all cryptographic activity is implemented in CSPs. CryptoAPI only works as a bridge between the application and the CSP. CSPs are implemented basically as a special type of DLL with special restrictions on loading and use. Every CSP must be digitally signed by Microsoft and the signature is verified when Windows loads the CSP. In addition, after being loaded, Windows periodically re-scans the CSP to detect tampering, either by malicious software such as computer viruses or by the user him/herself trying to circumvent restrictions (for example on cryptographic key length) that might be built into the CSP's code. To obtain a signature, non-Microsoft CSP developers must supply paperwork to Microsoft promising to obey various legal restrictions and giving valid contact information. As of circa 2000, Microsoft did not charge any fees to supply these signatures. For development and testing purposes, a CSP developer can configure Windows to recognize the developer's own signatures instead of Microsoft's, but this is a somewhat complex and obscure operation unsuitable for nontechnical end users. The CAPI/CSP architecture had its origins in the era of restrictive US government controls on the export of cryptography. Microsoft's default or "base" CSP then included with Windows was limited to 512-bit RSA public-key cryptography and 40-bit symmetric cryptography, the maximum key lengths permitted in exportable mass market software at the time. CSPs implementing stronger cryptography were available only to U.S. residents, unless the CSPs themselves had received U.S. government export approval. The system of requiring CSPs to be signed only on presentation of completed paperwork was intended to prevent the easy spread of unauthorized CSPs implemented by anonymous or foreign developers. As such, it was presented as a concession made by Microsoft to the government, in order to get export approval for the CAPI itself. After the Bernstein v. United States court decision establishing computer source code as protected free speech and the transfer of cryptographic regulatory authority from the U.S. State Department to the more pro-export Commerce Department, the restrictions on key lengths were dropped, and the CSPs shipped with Windows now include full-strength cryptography. The main use of third-party CSPs is to interface with external cryptography hardware such as hardware security modules (HSM) or smart cards. === Smart Card CSP === These cryptographic functions can be realized by a smart card, thus the Smart Card CSP is the Microsoft way of a PKCS#11. Microsoft Windows is identifying the correct Smart Card CSP, which have to be used, analyzing the answer to reset (ATR) of the smart card, which is registered in the Windows Registry. Installing a new CSP, all ATRs of the supported smart cards are enlisted in the registry. === Use of CSP in MS Office password protection === Cryptographic service providers can be used for encryption of Word, Excel, and PowerPoint documents starting from Microsoft Office XP. A standard encryption algorithm with a 40-bit key is used by default, but enabling a CSP enhances key length and thus makes decryption process more continuous. This only applies to passwords that are required to open document because this password type is the only one that encrypts a password-protected document.

    Read more →
  • Toad Data Modeler

    Toad Data Modeler

    Toad Data Modeler is a database design tool allowing users to visually create, maintain, and document new or existing database systems, and to deploy changes to data structures across different platforms. It is used to construct logical and physical data models, compare and synchronize models, generate complex SQL/DDL, create and modify scripts, and reverse and forward engineer databases and data warehouse systems. Toad's data modelling software is used for database design, maintenance and documentation. == Product History == Toad Data Modeler was previously called "CASE Studio 2" before it was acquired from Charonware by Quest Software in 2006. Quest Software was acquired by Dell on September 28, 2012. On October 31, 2016, Dell finalized the sale of Dell Software to Francisco Partners and Elliott Management, which relaunched on November 1, 2016 as Quest Software. == Features/Usages == Multiple database support - Connect multiple databases natively and simultaneously, including Oracle, SAP, MySQL, SQL Server, PostgreSQL, Db2, Ingres, and Microsoft Access. Data modelling tool - Create database structures or make changes to existing models automatically and provide documentation on multiple platforms. Logical and physical modelling - Build complex logical and physical entity relationship models and reverse, forward, and engineer databases. Reporting - Generate detailed reports on existing database structures. Model customization - Add logical data to user diagrams to customize user models. All Toad products typically have 2 releases per year. == Other features == Model Actions (Compare Models, Convert Model, Merge Models, Generate Change Script) Version Control System (Apache Subversion) Naming Conventions Auto Layout Multiple Workspaces Scripting and Customization Automation Object Gallery Full Unicode Support Integration with Toad for Oracle == Related Software == Erwin Data Modeler Oracle SAP MySQL SQL Server PostgreSQL IBM Db2 Ingres Microsoft Access

    Read more →
  • Information leakage

    Information leakage

    Information leakage happens whenever a system that is designed to be closed to an eavesdropper reveals some information to unauthorized parties nonetheless. In other words: Information leakage occurs when secret information correlates with, or can be correlated with, observable information. For example, when designing an encrypted instant messaging network, a network engineer without the capacity to crack encryption codes could see when messages are transmitted, even if he could not read them. == Risk vectors == A modern example of information leakage is the leakage of secret information via data compression, by using variations in data compression ratio to reveal correlations between known (or deliberately injected) plaintext and secret data combined in a single compressed stream. Another example is the key leakage that can occur when using some public-key systems when cryptographic nonce values used in signing operations are insufficiently random. Bad randomness cannot protect proper functioning of a cryptographic system, even in a benign circumstance, it can easily produce crackable keys that cause key leakage. Information leakage can sometimes be deliberate: for example, an algorithmic converter may be shipped that intentionally leaks small amounts of information, in order to provide its creator with the ability to intercept the users' messages, while still allowing the user to maintain an illusion that the system is secure. This sort of deliberate leakage is sometimes known as a subliminal channel. Generally, only very advanced systems employ defenses against information leakage. Following are the commonly implemented countermeasures : Use steganography to hide the fact that a message is transmitted at all. Use chaffing to make it unclear to whom messages are transmitted (but this does not hide from others the fact that messages are transmitted). For busy re-transmitting proxies, such as a Mixmaster node: randomly delay and shuffle the order of outbound packets - this will assist in disguising a given message's path, especially if there are multiple, popular forwarding nodes, such as are employed with Mixmaster mail forwarding. When a data value is no longer going to be used, erase it from the memory.

    Read more →
  • Data set (IBM mainframe)

    Data set (IBM mainframe)

    In the context of IBM mainframe computers in the IBM System/360 line and its successors, a data set (IBM preferred) or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360 and OS/360, and is still used by their successors, including the current VSE and z/OS. Documentation for these systems historically preferred this term rather than file. A data set is typically stored on a direct access storage device (DASD) or magnetic tape, however unit record devices, such as punch card readers, card punches, line printers and page printers can provide input/output (I/O) for a data set (file). Data sets are not unstructured streams of bytes, but rather are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements. Within a running program they are stored in the Data Control Block (DCB) or Access Control Block (ACB), which are data structures used to access data sets using access methods. Records in a data set may be fixed, variable, or “undefined” length. == Data set organization == For OS/360, the DCB's DSORG parameter specifies how the data set is organized. It may be CQ Queued Telecommunications Access Method (QTAM) in Message Control Program (MCP) CX Communications line group DA Basic Direct Access Method (BDAM) GS Graphics device for Graphics Access Method(GAM) IS Indexed Sequential Access Method (ISAM) MQ QTAM message queue in application PO Partitioned Organization PS Physical Sequential among others. Data sets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated. Programmers utilize various access methods (such as QSAM or VSAM) in programs for reading and writing data sets. Access method depends on the given data set organization. == Record format (RECFM) == Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter. RECFM=V specifies a variable-length record. V records when stored on media are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes and flag bits. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block on tape or DASD. FB and VB are fixed-blocked, and variable-blocked, respectively. RECFM=U (undefined) is also variable length, but the length of the record is determined by the length of the block rather than by a control field. The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS could be also specified, meaning fixed-blocked standard, meaning all the blocks except the last one were required to be in full BLKSIZE length. RECFM=VBS, or variable-blocked spanned, means a logical record could be spanned across two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one. This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes. == Partitioned data set == A partitioned data set (PDS) is a data set containing multiple members, each of which holds a separate sub-data set, similar to a directory in other types of file systems. This type of data set is often used to hold load modules (old format bound executable programs), source program libraries (especially Assembler macro definitions), ISPF screen definitions, and Job Control Language. A PDS may be compared to a Zip file or COM Structured Storage. A Partitioned Data Set can only be allocated on a single volume and have a maximum size of 65,535 tracks. Besides members, a PDS contains also a directory. Each member can be accessed indirectly via the directory structure. Once a member is located, the data stored in that member are handled in the same manner as a PS (sequential) data set. Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform file compression. Compression, which is done using the IEBCOPY utility, moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.) PDS files can only reside on DASD, not on magnetic tape, in order to use the directory structure to access individual members. Partitioned data sets are most often used for storing multiple job control language files, utility control statements, and executable modules. An improvement of this scheme is a Partitioned Data Set Extended (PDSE or PDS/E, sometimes just libraries) introduced with DFSMSdfp for MVS/XA and MVS/ESA systems. A PDS/E library can store program objects or other types of members, but not both. BPAM cannot process a PDS/E containing program objects. PDS/E structure is similar to PDS and is used to store the same types of data. However, PDS/E files have a better directory structure which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore does not run out of directory blocks if not enough were specified). Also, PDS/E automatically stores members in such a way that compression operation is not needed to reclaim "dead" space. PDS/E files can only reside on DASD in order to use the directory structure to access individual members. == Generation Data Group == A Generation Data Group (GDG) is a group of non-VSAM data sets that are successive generations of historically-related data stored on an IBM mainframe (running OS/360 and its successors or DOS/360 and its successors). A GDG is usually cataloged. An individual member of the GDG collection is called a "Generation Data Set." The latter may be identified by an absolute number, ACCTG.OURGDG(1234), or a relative number: (-1) for the previous generation, (0) for the current one, and (+1) the next generation. A GDG specifies how many generations of a data set are to be kept and at what age a generation will be deleted. Whenever a new generation is created, the system checks whether one or more obsolete generations are to be deleted. The purpose of GDGs is to automate archival, using the command language JCL, the data set name given is generic. When DSN appears, the GDG data set appears along with the history number, where (0) is the most recent version (-1), (-2), ... are previous generations (+1) a new generation (see DD) Another use of GDGs is to be able to address all generations simultaneously within a JCL script without having to know the number of currently available generations. To do this, you have to omit the parentheses and the generation number in the JCL when specifying the dataset. === GDG JCL & features === Generation Data Groups are defined using either the BLDG statement of the IEHPROGM utility or the DEFINE GENERATIONGROUP statement of the newer IDCAMS utility, which allows setting various parameters. LIMIT(10) would limit the number of generations limit to 10. SCRATCH FOR (91) would retain each member, up to the limited#generations, at least 91 days. IDCAMS can also delete (and optionally uncatalog) a GDG. ==== Example ==== Creation of a standard GDG for five safety scopes, each at least 35 days old: Delete a standard GDG:

    Read more →