AI Analysis Youtube Video

AI Analysis Youtube Video — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Landweber iteration

    Landweber iteration

    The Landweber iteration or Landweber algorithm is an algorithm to solve ill-posed linear inverse problems, and it has been extended to solve non-linear problems that involve constraints. The method was first proposed in the 1950s by Louis Landweber, and it can be now viewed as a special case of many other more general methods. == Basic algorithm == The original Landweber algorithm attempts to recover a signal x from (noisy) measurements y. The linear version assumes that y = A x {\displaystyle y=Ax} for a linear operator A. When the problem is in finite dimensions, A is just a matrix. When A is nonsingular, then an explicit solution is x = A − 1 y {\displaystyle x=A^{-1}y} . However, if A is ill-conditioned, the explicit solution is a poor choice since it is sensitive to any noise in the data y. If A is singular, this explicit solution doesn't even exist. The Landweber algorithm is an attempt to regularize the problem, and is one of the alternatives to Tikhonov regularization. We may view the Landweber algorithm as solving: min x ‖ A x − y ‖ 2 2 / 2 {\displaystyle \min _{x}\|Ax-y\|_{2}^{2}/2} using an iterative method. The algorithm is given by the update x k + 1 = x k − ω A ∗ ( A x k − y ) . {\displaystyle x_{k+1}=x_{k}-\omega A^{}(Ax_{k}-y).} where the relaxation factor ω {\displaystyle \omega } satisfies 0 < ω < 2 / σ 1 2 {\displaystyle 0<\omega <2/\sigma _{1}^{2}} . Here σ 1 {\displaystyle \sigma _{1}} is the largest singular value of A {\displaystyle A} . If we write f ( x ) = ‖ A x − y ‖ 2 2 / 2 {\displaystyle f(x)=\|Ax-y\|_{2}^{2}/2} , then the update can be written in terms of the gradient x k + 1 = x k − ω ∇ f ( x k ) {\displaystyle x_{k+1}=x_{k}-\omega \nabla f(x_{k})} and hence the algorithm is a special case of gradient descent. For ill-posed problems, the iterative method needs to be stopped at a suitable iteration index, because it semi-converges. This means that the iterates approach a regularized solution during the first iterations, but become unstable in further iterations. The reciprocal of the iteration index 1 / k {\displaystyle 1/k} acts as a regularization parameter. A suitable parameter is found, when the mismatch ‖ A x k − y ‖ 2 2 {\displaystyle \|Ax_{k}-y\|_{2}^{2}} approaches the noise level. Using the Landweber iteration as a regularization algorithm has been discussed in the literature. == Nonlinear extension == In general, the updates generated by x k + 1 = x k − τ ∇ f ( x k ) {\displaystyle x_{k+1}=x_{k}-\tau \nabla f(x_{k})} will generate a sequence f ( x k ) {\displaystyle f(x_{k})} that converges to a minimizer of f whenever f is convex and the stepsize τ {\displaystyle \tau } is chosen such that 0 < τ < 2 / ( ‖ ∇ f ‖ 2 ) {\displaystyle 0<\tau <2/(\|\nabla f\|^{2})} where ‖ ⋅ ‖ {\displaystyle \|\cdot \|} is the spectral norm. Since this is special type of gradient descent, there currently is not much benefit to analyzing it on its own as the nonlinear Landweber, but such analysis was performed historically by many communities not aware of unifying frameworks. The nonlinear Landweber problem has been studied in many papers in many communities; see, for example. == Extension to constrained problems == If f is a convex function and C is a convex set, then the problem min x ∈ C f ( x ) {\displaystyle \min _{x\in C}f(x)} can be solved by the constrained, nonlinear Landweber iteration, given by: x k + 1 = P C ( x k − τ ∇ f ( x k ) ) {\displaystyle x_{k+1}={\mathcal {P}}_{C}(x_{k}-\tau \nabla f(x_{k}))} where P {\displaystyle {\mathcal {P}}} is the projection onto the set C. Convergence is guaranteed when 0 < τ < 2 / ( ‖ A ‖ 2 ) {\displaystyle 0<\tau <2/(\|A\|^{2})} . This is again a special case of projected gradient descent (which is a special case of the forward–backward algorithm) as discussed in. == Applications == Since the method has been around since the 1950s, it has been adopted and rediscovered by many scientific communities, especially those studying ill-posed problems. In X-ray computed tomography it is called simultaneous iterative reconstruction technique (SIRT). It has also been used in the computer vision community and the signal restoration community. It is also used in image processing, since many image problems, such as deconvolution, are ill-posed. Variants of this method have been used also in sparse approximation problems and compressed sensing settings.

    Read more →
  • Data lineage

    Data lineage

    Data lineage refers to the process of tracking how data is generated, transformed, transmitted and used across systems over time. It documents data's origins, transformations and movements, providing detailed visibility into its life cycle. This process simplifies the identification of errors in data analytics workflows, by enabling users to trace issues back to their root causes. Data lineage facilitates the ability to replay specific segments or inputs of the dataflow. This can be used in debugging or regenerating lost outputs. In database systems, this concept is closely related to data provenance, which involves maintaining records of inputs, entities, systems and processes that influence data. Data provenance provides a historical record of data origins and transformations. It supports forensic activities such as data-dependency analysis, error/compromise detection, recovery, auditing and compliance analysis: "Lineage is a simple type of why provenance." Data governance plays a critical role in managing metadata by establishing guidelines, strategies and policies. Enhancing data lineage with data quality measures and master data management adds business value. Although data lineage is typically represented through a graphical user interface (GUI), the methods for gathering and exposing metadata to this interface can vary. Based on the metadata collection approach, data lineage can be categorized into three types: Those involving software packages for structured data, programming languages and Big data systems. Data lineage information includes technical metadata about data transformations. Enriched data lineage may include additional elements such as data quality test results, reference data, data models, business terminology, data stewardship information, program management details and enterprise systems associated with data points and transformations. Data lineage visualization tools often include masking features that allow users to focus on information relevant to specific use cases. To unify representations across disparate systems, metadata normalization or standardization may be required. == Representation of data lineage == Representation broadly depends on the scope of the metadata management and reference point of interest. Data lineage provides sources of the data and intermediate data flow hops from the reference point with backward data lineage, leading to the final destination's data points and its intermediate data flows with forward data lineage. These views can be combined with end-to-end lineage for a reference point that provides a complete audit trail of that data point of interest from sources to their final destinations. As the data points or hops increase, the complexity of such representation becomes incomprehensible. Thus, the best feature of the data lineage view is the ability to simplify the view by temporarily masking unwanted peripheral data points. Tools with the masking feature enable scalability of the view and enhance analysis with the best user experience for both technical and business users. Data lineage also enables companies to trace sources of specific business data to track errors, implement changes in processes and implementing system migrations to save significant amounts of time and resources. Data lineage can improve efficiency in business intelligence BI processes. Data lineage can be represented visually to discover the data flow and movement from its source to destination via various changes and hops on its way in the enterprise environment. This includes how the data is transformed along the way, how the representation and parameters change and how the data splits or converges after each hop. A simple representation of the Data Lineage can be shown with dots and lines, where dots represent data containers for data points, and lines connecting them represent transformations the data undergoes between the data containers. Data lineage can be visualized at various levels based on the granularity of the view. At a very high-level, data lineage is visualized as systems that the data interacts with before it reaches its destination. At its most granular, visualizations at the data point level can provide the details of the data point and its historical behavior, attribute properties and trends and data quality of the data passed through that specific data point in the data lineage. The scope of the data lineage determines the volume of metadata required to represent its data lineage. Usually, data governance and data management of an organization determine the scope of the data lineage based on their regulations, enterprise data management strategy, data impact, reporting attributes and critical data elements of the organization. == Rationale == Distributed systems like Google Map Reduce, Microsoft Dryad, Apache Hadoop (an open-source project) and Google Pregel provide such platforms for businesses and users. However, even with these systems, Big Data analytics can take several hours, days or weeks to run, simply due to the data volumes involved. For example, a ratings prediction algorithm for the Netflix Prize challenge took nearly 20 hours to execute on 50 cores, and a large-scale image processing task to estimate geographic information took 3 days to complete using 400 cores. "The Large Synoptic Survey Telescope is expected to generate terabytes of data every night and eventually store more than 50 petabytes, while in the bioinformatics sector, the 12 largest genome sequencing houses in the world now store petabytes of data apiece. It is very difficult for a data scientist to trace an unknown or an unanticipated result. === Big data debugging === Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Machine learning, among other algorithms, is used to transform and analyze the data. Due to the large size of the data, there could be unknown features in the data. The massive scale and unstructured nature of data, the complexity of these analytics pipelines, and long runtimes pose significant manageability and debugging challenges. Even a single error in these analytics can be extremely difficult to identify and remove. While one may debug them by re-running the entire analytics through a debugger for stepwise debugging, this can be expensive due to the amount of time and resources needed. Auditing and data validation are other major problems due to the growing ease of access to relevant data sources for use in experiments, the sharing of data between scientific communities and use of third-party data in business enterprises. As such, more cost-efficient ways of analyzing data intensive scale-able computing (DISC) are crucial to their continued effective use. === Challenges in Big Data debugging === ==== Massive scale ==== According to an EMC/IDC study, 2.8 ZB of data were created and replicated in 2012. Furthermore, the same study states that the digital universe will double every two years between now and 2020, and that there will be approximately 5.2 TB of data for every person in 2020. Based on current technology, the storage of this much data will mean greater energy usage by data centers. ==== Unstructured data ==== Unstructured data usually refers to information that doesn't reside in a traditional row-column database. Unstructured data files often include text and multimedia content, such as e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents. While these types of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly into a database. The amount of unstructured data in enterprises is growing many times faster than structured databases are growing. Big data can include both structured and unstructured data, but IDC estimates that 90 percent of Big Data is unstructured data. The fundamental challenge of unstructured data sources is that they are difficult for non-technical business users and data analysts alike to unbox, understand and prepare for analytic use. Beyond issues of structure, the sheer volume of this type of data contributes to such difficulty. Because of this, current data mining techniques often leave out valuable information and make analyzing unstructured data laborious and expensive. In today's competitive business environment, companies have to find and analyze the relevant data they need quickly. The challenge is going through the volumes of data and accessing the level of detail needed, all at a high speed. The challenge only grows as the degree of granularity increases. One possible solution is hardware. Some vendors are using increased memory and parallel processing to crunch large volumes of data quickly. Another method is putting data in-memory but using a grid

    Read more →
  • Social search

    Social search

    Social search is a behavior of retrieving and searching on a social searching engine that mainly searches user-generated content such as news, videos and images related search queries on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that combines traditional algorithms. The idea behind social search is that instead of ranking search results purely based on semantic relevance between a query and the results, a social search system also takes into account social relationships between the results and the searcher. The social relationships could be in various forms. For example, in LinkedIn people search engine, the social relationships include social connections between searcher and each result, whether or not they are in the same industries, work for the same companies, belong the same social groups, and go the same schools, etc. Social search may not be demonstrably better than algorithm-driven search. In the algorithmic ranking model that search engines used in the past, relevance of a site is determined after analyzing the text and content on the page and link structure of the document. In contrast, search results with social search highlight content that was created or touched by other users who are in the Social Graph of the person conducting a search. It is a personalized search technology with online community filtering to produce highly personalized results. Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms. Depending on the feature-set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. The principle behind social search is that human network oriented results would be more meaningful and relevant for the user, instead of computer algorithms deciding the results for specific queries. == Research and implementations == Over the years, there have been different studies, researches and some implementations of Social Search. In 2008, there were a few startup companies that focused on ranking search results according to one's social graph on social networks. Companies in the social search space include Sproose, Mahalo, Jumper 2.0, Scour, Wink, Eurekster, and Delver. Former efforts include Wikia Search. In 2008, a story on TechCrunch showed Google potentially adding in a voting mechanism to search results similar to Digg's methodology. This suggests growing interest in how social groups can influence and potentially enhance the ability of algorithms to find meaningful data for end users. There are also other services like Sentiment that turn search personal by searching within the users' social circles. In 2009, a startup project called HeyStaks (www.heystaks.com) developed a web browser plugin "HayStaks". HeyStaks applies social search through collaboration in web search as a way that leads to better search results. The main motivation for HeyStaks to work on this idea is to provide the user with features that search engines didn't provide at that time. For instance, different searches have indicated that about 70% of the time when user search for something, a friend or a coworker have found it already. Also, studies have shown that approximately, 30% of people who use online search, search for something that they have found before. The startup believe that they help avoid these kind of issues by providing a shared and rich search experience through a list of recommendations that get generated based on search results. In October 2009, Google rolled out its "Social Search"; after a time in beta, the feature was expanded to multiple languages in May 2011. Before the expansion however in 2010 Bing and Google were already taking into account re-tweets and Likes when providing search results. However, after a search deal with Twitter ended without renewal, Google began to retool its Social Search. In January 2012, Google released "Search plus Your World", a further development of Social Search. The feature, which is integrated into Google's regular search as an opt-out feature, pulls references to results from Google+ profiles. The goal was to deliver better, more relevant and personalized search results with this integration. This integration however had some problems in which Google+ still is not wildly adopted or has much usage among many users. Later on, Google was criticized by Twitter for the perceived potential impact of "Search plus Your World" upon web publishers, describing the feature's release to the public as a "bad day for the web", while Google replied that Twitter refused to allow deep search crawling by Google of Twitter's content. By Google integrating Google+, the company was encouraging users to switch to Google's social networking site in order to improve search results. One famous example occurred when Google showed a link to Mark Zuckerberg's dormant Google+ account rather than the active Facebook profile. In November 2014 these accusations started to die down because Google's Knowledge Graph started to finally show links to Facebook, Twitter, and other social media sites. In December 2008, Twitter had re-introduced their people search feature. While the interface had since changed significantly, it allows you to search either full names or usernames in a straight-forward search engine. In January 2013, Facebook announced a new search engine called Graph Search still in the beta stages. The goal was to allow users to prioritize results that were popular with their social circle over the general internet. Facebook's Graph search utilized Facebook's user generated content to target users. Although there have been different researches and studies in social search, social media networks have not vested enough interest in working with search engines. LinkedIn for example has taken steps to improve its own individual search functions in order to stray users from external search engines. Even Microsoft started working with Twitter in order to integrate some tweets into Bing's search results in November 2013. Yet Twitter has its own search engine which points out how much value their data has and why they would like to keep it in house. In the end though social search will never be truly comprehensive of the subjects that matter to people unless users opt to be completely public with their information. == Social discovery == Social discovery is the use of social preferences and personal information to predict what content will be desirable to the user. Technology is used to discover new people and sometimes new experiences shopping, meeting friends or even traveling. The discovery of new people is often in real-time, enabled by mobile apps. However, social discovery is not limited to meeting people in real-time, it also leads to sales and revenue for companies via social media. An example of retail would be the addition of social sharing with music, through the iTunes music store. There is a social component to discovering new music Social discovery is at the basis of Facebook's profitability, generating ad revenue by targeting the ads to users using the social connections to enhance the commercial appeal. == Social search engines == A social search engine in an aspect can be thought of as a search engine that provides an answer for a question from another answer by identifying a person in the answer. That can happen by retrieving a user submitted query and determining that the query is related to the question; and provides an answer, including the link to the resource, as part of search results that are responsive to the query. Few social search engines depend only on online communities. Depending on the feature-set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. Social search engines are considered a part of Web 2.0 because they use the collective filtering of online communities to elevate particularly interesting or relevant content using tagging. These descriptive tags add to the meta data embedded in Web pages, theoretically improving the results for particular keywords over time. A user will generally see suggested tags for a particular search term, indicating tags that have previously been added. An implementation of a social search engine is Aardvark. Aardvark is a social search engine that is based on the "village paradigm" which is about connecting the user who has a question with friends or friends of friends whom can answer his or her question. In Aadvark, a user ask a question in different ways that mostly involves online ways such as instant messaging, email, web input or other non-online ways such as text message or voice. The Aar

    Read more →
  • Correlation immunity

    Correlation immunity

    In mathematics, the correlation immunity of a Boolean function is a measure of the degree to which its outputs are uncorrelated with some subset of its inputs. Specifically, a Boolean function is said to be correlation-immune of order m if every subset of m or fewer variables in x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\ldots ,x_{n}} is statistically independent of the value of f ( x 1 , x 2 , … , x n ) {\displaystyle f(x_{1},x_{2},\ldots ,x_{n})} . == Definition == A function f : F 2 n → F 2 {\displaystyle f:\mathbb {F} _{2}^{n}\rightarrow \mathbb {F} _{2}} is k {\displaystyle k} -th order correlation immune if for any independent n {\displaystyle n} binary random variables X 0 … X n − 1 {\displaystyle X_{0}\ldots X_{n-1}} , the random variable Z = f ( X 0 , … , X n − 1 ) {\displaystyle Z=f(X_{0},\ldots ,X_{n-1})} is independent from any random vector ( X i 1 … X i k ) {\displaystyle (X_{i_{1}}\ldots X_{i_{k}})} with 0 ≤ i 1 < … < i k < n {\displaystyle 0\leq i_{1}<\ldots Read more →

  • Trigger list

    Trigger list

    Trigger list in its most general meaning refers to a list whose items are used to initiate ("trigger") certain actions. == United States: Private financial information == In the United States, when a person applies for a mortgage loan, the lender makes a credit inquiry about the potential borrower from the national credit bureaus, Equifax, Experian and TransUnion. Unless the borrower is opted out, the credit bureaus put the applicants onto a "trigger list" of "leads" about persons who are interested in new loans. These lists are sold to numerous lenders all over the United States, and soon after the application the applicant starts receiving offers from all parts of the country. The trigger lists contain a significant amount of personal financial information. Among the buyers of trigger lists are "lead generators" which resell filtered information to borrowers, e.g., of people who live in a certain area and have a certain credit score. While the Federal Trade Commission considers the market of "trigger lists" to be a legal business, many people and organizations (such as the National Association of Mortgage Brokers) consider this a serious breach of privacy and lobby for putting this practice under regulatory controls. As of now, American consumers may opt-out from "trigger lists" by calling 1-888-5-OPTOUT (1-888-567-8688). == Nuclear non-proliferation == The Zangger Committee and the Nuclear Suppliers Group maintain lists of items that may contribute to nuclear proliferation; The nuclear non-proliferation treaty forbids its members to export such items to non-treaty members. these items are said to trigger the countries' responsibilities under the NPT, hence the name.

    Read more →
  • Voice inversion

    Voice inversion

    Voice inversion scrambling is an analog method of obscuring the content of a transmission. It is sometimes used in public service radio, automobile racing, cordless telephones and the Family Radio Service. Without a descrambler, the transmission makes the speaker "sound like Donald Duck". Despite the term, the technique operates on the passband of the information and so can be applied to any information being transmitted. == Forms and details == There are various forms of voice inversion which offer differing levels of security. Overall, voice inversion scrambling offers little true security as software and even hobbyist kits are available from kit makers for scrambling and descrambling. The cadence of the speech is not changed. It is often easy to guess what is happening in the conversation by listening for other audio cues like questions, short responses and other language cadences. In the simplest form of voice inversion, the frequency p {\displaystyle p} of each component is replaced with s − p {\displaystyle s-p} , where s {\displaystyle s} is the frequency of a carrier wave. This can be done by amplitude modulating the speech signal with the carrier, then applying a low-pass filter to select the lower sideband. This will make the low tones of the voice sound like high ones and vice versa. This process also occurs naturally if a radio receiver is tuned to a single sideband transmission but set to decode the wrong sideband. There are more advanced forms of voice inversion which are more complex and require more effort to descramble. One method is to use a random code to choose the carrier frequency and then change this code in real time. This is called Rolling Code voice inversion and one can often hear the "ticks" in the transmission which signal the changing of the inversion point. Another method is split band voice inversion. This is where the band is split and then each band is inverted separately. A rolling code can also be added to this method for variable split band inversion (VSB). Common carrier frequencies are: 2.632 kHz, 2.718 kHz, 2.868 kHz, 3.023 kHz, 3.107 kHz, 3.196 kHz, 3.333 kHz, 3.339 kHz, 3.496 kHz, 3.729 kHz and 4.096 kHz. Voice inversion offers no security at all and software is available to restore the original voice, which is why it is no longer used to protect conversations today. However, voice inversion is still found in low-end Chinese walkie talkies.

    Read more →
  • Omni-Path

    Omni-Path

    Omni-Path Architecture (OPA) is a high-performance communication architecture developed by Intel. It aims for low communication latency, low power consumption and a high throughput. It directly competes with InfiniBand. Intel planned to develop technology based on this architecture for exascale computing. The current owner of Omni-Path is Cornelis Networks. == History == Production of Omni-Path products started in 2015 and delivery of these products started in the first quarter of 2016. In November 2015, adapters based on the 2-port "Wolf River" ASIC were announced, using QSFP28 connectors with channel speeds up to 100 Gbit/s. Simultaneously, switches based on the 48-port "Prairie River" ASIC were announced. First models of that series were available starting in 2015. In April 2016, implementation of the InfiniBand "verbs" interface for the Omni-Path fabric was discussed. In October 2016, IBM, Hewlett Packard Enterprise, Dell, Lenovo, Samsung, Seagate Technology, Micron Technology, Western Digital and SK Hynix announced a joint consortium called Gen-Z to develop an open specification and architecture for non-volatile storage and memory products—including Intel's 3D Xpoint technology—which might in part compete against Omni-Path. Intel offered their Omni-Path products and components via other (hardware) vendors. For example, Dell EMC offered Intel Omni-Path as Dell Networking H-series, following the naming-standard of Dell Networking in 2017. In July 2019, Intel announced it would not continue development of Omni-Path networks and canceled OPA 200 series (200-Gbps variant of Omni-Path). In September 2020, Intel announced that the Omni-Path network products and technology would be spun out into a new venture with Cornelis Networks. Intel would continue to maintain support for legacy Omni-Path products, while Cornelis Networks continues the product line, leveraging existing Intel intellectual property related to Omni-Path architecture. In 2021, Cornelis announced Omni-Path Express, which replaces PSM2-based drivers and middleware, which trace back to PathScale's PSM created in 2003, for the existing Omni-Path hardware, with a native libfabric provider.

    Read more →
  • Signatures with efficient protocols

    Signatures with efficient protocols

    Signatures with efficient protocols are a form of digital signature invented by Jan Camenisch and Anna Lysyanskaya in 2001. In addition to being secure digital signatures, they need to allow for the efficient implementation of two protocols: A protocol for computing a digital signature in a secure two-party computation protocol. A protocol for proving knowledge of a digital signature in a zero-knowledge protocol. In applications, the first protocol allows a signer to possess the signing key to issue a signature to a user (the signature owner) without learning all the messages being signed or the complete signature. The second protocol allows the signature owner to prove that he has a signature on many messages without revealing the signature and only a (possibly) empty subset of the messages. The combination of these two protocols allows for the implementation of digital credential and ecash protocols.

    Read more →
  • Geometric primitive

    Geometric primitive

    In vector computer graphics, CAD systems, and geographic information systems, a geometric primitive (or prim) is the simplest (i.e. 'atomic' or irreducible) geometric shape that the system can handle (draw, store). Sometimes the subroutines that draw the corresponding objects are called "geometric primitives" as well. The most "primitive" primitives are point and straight line segments, which were all that early vector graphics systems had. In constructive solid geometry, primitives are simple geometric shapes such as a cube, cylinder, sphere, cone, pyramid, torus. Modern 2D computer graphics systems may operate with primitives which are curves (segments of straight lines, circles and more complicated curves), as well as shapes (boxes, arbitrary polygons, circles). A common set of two-dimensional primitives includes lines, points, and polygons, although some people prefer to consider triangles primitives, because every polygon can be constructed from triangles (polygon triangulation). All other graphic elements are built up from these primitives. In three dimensions, triangles or polygons positioned in three-dimensional space can be used as primitives to model more complex 3D forms. In some cases, curves (such as Bézier curves, circles, etc.) may be considered primitives; in other cases, curves are complex forms created from many straight, primitive shapes. == Common primitives == The set of geometric primitives is based on the dimension of the region being represented: Point (0-dimensional), a single location with no height, width, or depth. Line or curve (1-dimensional), having length but no width, although a linear feature may curve through a higher-dimensional space. Planar surface or curved surface (2-dimensional), having length and width. Volumetric region or solid (3-dimensional), having length, width, and depth. In GIS, the terrain surface is often spoken of colloquially as "2 1/2 dimensional," because only the upper surface needs to be represented. Thus, elevation can be conceptualized as a scalar field property or function of two-dimensional space, affording it a number of data modeling efficiencies over true 3-dimensional objects. A shape of any of these dimensions greater than zero consists of an infinite number of distinct points. Because digital systems are finite, only a sample set of the points in a shape can be stored. Thus, vector data structures typically represent geometric primitives using a strategic sample, organized in structures that facilitate the software interpolating the remainder of the shape at the time of analysis or display, using the algorithms of Computational geometry. A Point is a single coordinate in a Cartesian coordinate system. Some data models allow for Multipoint features consisting of several disconnected points. A Polygonal chain or Polyline is an ordered list of points (termed vertices in this context). The software is expected to interpolate the intervening shape of the line between adjacent points in the list as a parametric curve, most commonly a straight line, but other types of curves are frequently available, including circular arcs, cubic splines, and Bézier curves. Some of these curves require additional points to be defined that are not on the line itself, but are used for parametric control. A Polygon is a polyline that closes at its endpoints, representing the boundary of a two-dimensional region. The software is expected to use this boundary to partition 2-dimensional space into an interior and exterior. Some data models allow for a single feature to consist of multiple polylines, which could collectively connect to form a single closed boundary, could represent a set of disjoint regions (e.g., the state of Hawaii), or could represent a region with holes (e.g., a lake with an island). A Parametric shape is a standardized two-dimensional or three-dimensional shape defined by a minimal set of parameters, such as an ellipse defined by two points at its foci, or three points at its center, vertex, and co-vertex. A Polyhedron or Polygon mesh is a set of polygon faces in three-dimensional space that are connected at their edges to completely enclose a volumetric region. In some applications, closure may not be required or may be implied, such as modeling terrain. The software is expected to use this surface to partition 3-dimensional space into an interior and exterior. A triangle mesh is a subtype of polyhedron in which all faces must be triangles, the only polygon that will always be planar, including the Triangulated irregular network (TIN) commonly used in GIS. A parametric mesh represents a three-dimensional surface by a connected set of parametric functions, similar to a spline or Bézier curve in two dimensions. The most common structure is the Non-uniform rational B-spline (NURBS), supported by most CAD and animation software. == Application in GIS == A wide variety of vector data structures and formats have been developed during the history of Geographic information systems, but they share a fundamental basis of storing a core set of geometric primitives to represent the location and extent of geographic phenomena. Locations of points are almost always measured within a standard Earth-based coordinate system, whether the spherical Geographic coordinate system (latitude/longitude), or a planar coordinate system, such as the Universal Transverse Mercator. They also share the need to store a set of attributes of each geographic feature alongside its shape; traditionally, this has been accomplished using the data models, data formats, and even software of relational databases. Early vector formats, such as POLYVRT, the ARC/INFO Coverage, and the Esri shapefile support a basic set of geometric primitives: points, polylines, and polygons, only in two dimensional space and the latter two with only straight line interpolation. TIN data structures for representing terrain surfaces as triangle meshes were also added. Since the mid 1990s, new formats have been developed that extend the range of available primitives, generally standardized by the Open Geospatial Consortium's Simple Features specification. Common geometric primitive extensions include: three-dimensional coordinates for points, lines, and polygons; a fourth "dimension" to represent a measured attribute or time; curved segments in lines and polygons; text annotation as a form of geometry; and polygon meshes for three-dimensional objects. Frequently, a representation of the shape of a real-world phenomenon may have a different (usually lower) dimension than the phenomenon being represented. For example, a city (a two-dimensional region) may be represented as a point, or a road (a three-dimensional volume of material) may be represented as a line. This dimensional generalization correlates with tendencies in spatial cognition. For example, asking the distance between two cities presumes a conceptual model of the cities as points, while giving directions involving travel "up," "down," or "along" a road imply a one-dimensional conceptual model. This is frequently done for purposes of data efficiency, visual simplicity, or cognitive efficiency, and is acceptable if the distinction between the representation and the represented is understood, but can cause confusion if information users assume that the digital shape is a perfect representation of reality (i.e., believing that roads really are lines). == In 3D modelling == In CAD software or 3D modelling, the interface may present the user with the ability to create primitives which may be further modified by edits. For example, in the practice of box modelling the user will start with a cuboid, then use extrusion and other operations to create the model. In this use the primitive is just a convenient starting point, rather than the fundamental unit of modelling. A 3D package may also include a list of extended primitives which are more complex shapes that come with the package. For example, a teapot is listed as a primitive in 3D Studio Max. == In graphics hardware == Various graphics accelerators exist with hardware acceleration for rendering specific primitives such as lines or triangles, frequently with texture mapping and shaders. Modern 3D accelerators typically accept sequences of triangles as triangle strips.

    Read more →
  • Back-Up Interceptor Control

    Back-Up Interceptor Control

    Backup Interceptor Control (BUIC, ) was the Electronic Systems Division 416M System to backup the SAGE 416L System in the United States and Canada. BUIC deployed Cold War command, control, and coordination systems to SAGE radar stations to create dispersed NORAD Control Centers. == Background == Prior to the SAGE Direction Centers becoming operational, the USAF deployed data link systems at NORAD Control Centers with ground computers for controlling crewed interceptors. After SAGE IBM AN/FSQ-7 Combat Direction Centrals became operational and the Super Combat Centers with improved (digital) computers were cancelled, a backup to SAGE was planned in the event the above-ground SAGE Air Defense Direction Center failed. == General Electric AN/GPA-37 Course Directing Group == BUIC began with deployment of General Electric AN/GPA-37 Course Directing Groups to several Long Range Radar stations. Units designated included the "U.S. Air Force 858th Air Defense Group (BUIC) [which became] a permanent operating facility" at Naval Air Station Fallon in Nevada. == BUIC II == BUIC II was used to command and control sites using the Burroughs AN/GSA-51 Radar Course Directing Group. North Truro AFS became the first ADC installation configured for BUIC II. == BUIC III == The AN/GYK-19 (initially AN/GSA-51A) was an upgraded version of the BUIC II system designated AN/GSA-51A and required a larger building than the AN/GSA-51. The first BUIC III site was Fort Fisher AFS, and Air Defense Command's was first installed at Fort Fisher Air Force Station, North Carolina. Although more advanced systems were contemplated, the final design of the BUIC III system was an upgraded version of the BUIC II with around twice the performance. == Closure and upgrade == In 1972, the USAF decided to shut down most of the BUIC sites; most of the sites mothballed by 1974, except for the BUIC III site at Tyndall Air Force Base. In Canada the BUIC site at Senneterre was shut down, but St Margarets remained open. The remaining sites were closed between 1983-1984 when SAGE was replaced by the Joint Surveillance System. The AN/FYQ-47 Common Digitizer for the Joint Surveillance System, and the Radar Video Data Processor (RVDP) was a combined system for the Air Force and Federal Aviation Administration (FAA), it replaced the SAGE Burroughs AN/FST-2 Coordinate Data Transmitting Sets.

    Read more →
  • Business continuity and disaster recovery auditing

    Business continuity and disaster recovery auditing

    Given organizations' increasing dependency on information technology (IT) to run their operations, business continuity planning (and its subset IT service continuity planning) covers the entire organization, while disaster recovery focuses on IT. Auditing documents covering an organization's business continuity and disaster recovery (BCDR) plans provides a third-party validation to stakeholders that the documentation is complete and does not contain material misrepresentations. == Overview == Often used together, the terms business continuity (BC) and disaster recovery (DR) are very different. BC refers to the ability of a business to continue critical functions and business processes after the occurrence of a disaster, whereas DR refers specifically to the IT functions of the business, albeit a subset of BC. == Metrics == The primary objective is to protect the organization in the event that all or part of its operations and/or computer services are rendered partially or completely unusable. === DR metrics === Minimizing downtime and data loss during disaster recovery is typically measured in terms of two key concepts: Recovery time objective (RTO), time until a system is completely up and running Recovery point objective (RPO), a measure of the ability to recover files by specifying a point in time the backup copy will restore to. == The auditor's role == Role of the Internal Auditor in Auditing a Disaster Recovery Plan (DRP): 1. Governance & Oversight - Confirm roles, responsibilities, and oversight are defined, and DRP aligns with risk appetite and continuity strategy. 2. Risk Assessment & BIA - Verify risk and impact assessments identify critical systems and define RTO/RPO. 3. Plan Design & Documentation - Ensure the DRP is current, complete, and includes key recovery procedures. 4. Testing & Validation - Confirm regular DRP testing occurs and results are used to improve the plan. 5. Backup & Recovery - Assess backup frequency and recovery capabilities against RTO/RPO targets. 6. Communication & Training - Verify staff are trained and communication protocols are in place for crises. 7. Maintenance & Improvement - Ensure the DRP is regularly updated and lessons learned are integrated. == Documentation == === Disaster recovery plan === A disaster recovery plan (DRP) is a documented process or set of procedures to execute an organization's disaster recovery processes and recover and protect a business IT infrastructure in the event of a disaster. It is "a comprehensive statement of consistent actions to be taken before, during and after a disaster". The disaster could be natural, environmental or man-made. Man-made disasters could be intentional (for example, an act of a terrorist) or unintentional (that is, accidental, such as the breakage of a man-made dam or even "fat fingers" - or errant commands entered - on a computer system). ==== Types of plans ==== Although there is no one-size-fits-all plan, there are three basic strategies: prevention, including proper backups, having surge protectors and generators detection, a byproduct of routine inspections, which may discover new (potential) threats correction The latter may include securing proper insurance policies, and holding a "lessons learned" brainstorming session. ==== Best practices ==== To maximize their effectiveness, DRPs are most effective when updated frequently, and should: be an integral part of all business analysis processes, be revisited at every major corporate acquisition, at every new product launch and at every new system development milestone. be thoroughly tested, not just unpracticed bureaucratic documentation Adequate records need to be retained by the organization. The auditor examines records, billings, and contracts to verify that records are being kept. One such record is a current list of the organization's hardware and software vendors. Such list is made and periodically updated to reflect changing business practices and as part of an IT asset management system. Copies of it are stored on and off site and are made available or accessible to those who require them. An auditor tests the procedures used to meet this objective and determine their effectiveness. === Relationship to BCPs === Disaster recovery is a subset of business continuity. Where DRP encompasses the policies, tools and procedures to enable recovery of data following a catastrophic event, BCP involves keeping all aspects of a business functioning regardless of potential disruptive events. As such, a business continuity plan is a comprehensive organizational strategy that includes the DRP as well as threat prevention, detection, recovery, and resumption of operations should a data breach or other disaster event occur. Therefore, BCP consists of five component plans: Business resumption plan Occupant emergency plan Continuity of operations plan Incident management plan Disaster recovery plan The first three components (business resumption, occupant emergency, and continuity of operations plans) do not deal with the IT infrastructure. The incident management plan (IMP) does deal with the IT infrastructure, but since it establishes structure and procedures to address cyber attacks against an organization's IT systems, it generally does not represent an agent for activating the DRP; thus DRP is the only BCP component of active interest to IT. == Testing == The overall categorization of tests are functional- and discussion-based. Types of tests include: tabletop exercises, checklists, simulations, parallel processing (testing recovery site while primary site is in operation), and full interruption (fail over) tests. These apply to both BC and DR. == Benefits == Like every insurance plan, there are benefits that can be obtained from proper business continuity planning, including: Studies have shown a correlation between higher spending on auditing fees and lower rates of Incidents. Minimizing risk of delays Guaranteeing the reliability of standby systems (even automating the failure detection and recovery in certain scenarios) Providing a standard for testing the plan Minimizing decision-making during a disaster Reducing potential legal liabilities Lowering unnecessarily stressful work environment === Planning and testing methodology === According to Geoffrey H. Wold of the Disaster Recovery Journal, the entire process involved in developing a Disaster Recovery Plan consists of 10 steps: Performing a risk assessment: The planning committee prepares a risk analysis and a business impact analysis (BIA) that includes a range of possible disasters. Each functional area of the organization is analyzed to determine potential consequences. Traditionally, fire has posed the greatest threat. A thorough plan provides for "worst case" situations, such as destruction of the main building. Establishing priorities for processing and operations: Critical needs of each department are evaluated and prioritized. Written agreements for alternatives selected are prepared, with details specifying duration, termination conditions, system testing, cost, any special security procedures, procedure for the notification of system changes, hours of operation, the specific hardware and other equipment required for processing, personnel requirements, definition of the circumstances constituting an emergency, process to negotiate service extensions, guarantee of compatibility, availability, non-mainframe resource requirements, priorities, and other contractual issues. Collecting data: This includes various lists (employee backup position listing, critical telephone numbers list, master call list, master vendor list, notification checklist), inventories (communications equipment, documentation, office equipment, forms, insurance policies, workgroup and data center computer hardware, microcomputer hardware and software, office supply, off-site storage location equipment, telephones, etc.), distribution register, software and data files backup/retention schedules, temporary location specifications, any other such lists, materials, inventories, and documentation. Pre-formatted forms are often used to facilitate the data gathering process. Organizing and documenting a written plan Developing testing criteria and procedures: reasons for testing include Determining the feasibility and compatibility of backup facilities and procedures. Identifying areas in the plan that need modification. Providing training to the team managers and team members. Demonstrating the ability of the organization to recover. Providing motivation for maintaining and updating the disaster recovery plan. Testing the plan: An initial "dry run" of the plan is performed by conducting a structured walk-through test. An actual test-run must be performed. Problems are corrected. Initial testing can be plan is done in sections and after normal business hours to minimize disruptions. Subsequent tests occur during normal business hours. === Caveats/controversie

    Read more →
  • Opinion Space

    Opinion Space

    Developed at UC Berkeley, "Opinion Space" (also known as The Collective Discovery Engine) is a social media technology designed to help communities generate and exchange ideas about important issues and policies. Version 1.0 was launched on April 4, 2009, at UC Berkeley, and explored the question "Do you think legalizing marijuana is a good idea?" It has since undergone 4 different iterations, and been used in partnership with various organizations including The Occupy movement (Version 4.0, 5/24/2013) and the African Robots Network (Version 4.0, 5/25/2013). Opinion Space has also been used in collaboration with the United States State Department and the University of California's Berkeley Center for New Media (Version 2.0, 12/1/2009 and Version 3.0, 2/25/2012) to gain public perspective on foreign policy issues. Then U.S. Secretary of State Hillary Rodham Clinton explained, "Opinion Space will harness the power of connection technologies to provide a unique forum for international dialogue. This is...an opportunity to extend our engagement beyond the halls of government directly to the people of the world" (2010). The website uses data visualization and statistical analysis to present and develop public opinion and ideas. Opinion Space is a self-organizing system that uses an intuitive graphical "map" that displays patterns, trends, and insights as they emerge and employs the wisdom of crowds to identify and highlight the most insightful ideas. The system uses a game model that incorporates techniques from deliberative polling, collaborative filtering, and multidimensional visualization.

    Read more →
  • Content determination

    Content determination

    Content determination is the subtask of natural language generation (NLG) that involves deciding on the information to be communicated in a generated text. It is closely related to the task of document structuring. == Example == Consider an NLG system which summarises information about sick babies. Suppose this system has four pieces of information it can communicate The baby is being given morphine via an IV drop The baby's heart rate shows bradycardia's (temporary drops) The baby's temperature is normal The baby is crying Which of these bits of information should be included in the generated texts? == Issues == There are three general issues which almost always impact the content determination task, and can be illustrated with the above example. Perhaps the most fundamental issue is the communicative goal of the text, i.e. its purpose and reader. In the above example, for instance, a doctor who wants to make a decision about medical treatment would probably be most interested in the heart rate bradycardias, while a parent who wanted to know how her child was doing would probably be more interested in the fact that the baby was being given morphine and was crying. The second issue is the size and level of detail of the generated text. For instance, a short summary which was sent to a doctor as a 160 character SMS text message might only mention the heart rate bradycardias, while a longer summary which was printed out as a multipage document might also mention the fact that the baby is on a morphine IV. The final issue is how unusual and unexpected the information is. For example, neither doctors nor parents would place a high priority on being told that the baby's temperature was normal, if they expected this to be the case. Regardless, content determination is very important to users, indeed in many cases the quality of content determination is the most important factor (from the user's perspective) in determining the overall quality of the generated text. == Techniques == There are three basic approaches to document structuring: schemas (content templates), statistical approaches, and explicit reasoning. Schemas are templates which explicitly specify the content of a generated text (as well as document structuring information). Typically, they are constructed by manually analysing a corpus of human-written texts in the target genre, and extracting a content template from these texts. Schemas work well in practice in domains where content is somewhat standardised, but work less well in domains where content is more fluid (such as the medical example above). Statistical techniques use statistical corpus analysis techniques to automatically determine the content of the generated texts. Such work is in its infancy, and has mostly been applied to contexts where the communicative goal, reader, size, and level of detail are fixed. For example, generation of newswire summaries of sporting events. Explicit reasoning approaches have probably attracted the most attention from researchers. The basic idea is to use AI reasoning techniques (such as knowledge-based rules, planning, pattern detection, case-based reasoning, etc.) to examine the information available to be communicated (including how unusual/unexpected it is), the communicative goal and reader, and the characteristics of the generated text (including target size), and decide on the optimal content for the generated text. A very wide range of techniques has been explored, but there is no consensus as to which is most effective.

    Read more →
  • Media contacts database

    Media contacts database

    In public relations (PR) and marketing, a media contacts database is a resource which catalogs the names, contact information, and other details about people who work in various media professions. These include journalists, reporters, editors, publishers, contributors, freelance journalists, opinion writers, social media personalities/ influencers, TV show anchors, radio show hosts, DJs, and others. A media contacts database usually contains the following information: Full name of the media contact, The publication or channel they work for Designations (past and present) Topics they cover, or their beat Contact information found in public domains Online presence like blogs and other social networking sites Education Information == Overview == A media contacts database is a public relations tool that is maintained and used by PR professionals to pitch stories on a particular topic, product, or company to a specific group of journalists. These journalists would then write or speak about the particular topic in a relevant issue or episode of their shows. A media contacts database allows a PR professional to gain easy access to hundreds of journalists within a short span of time. Media contacts database are created and sold by many media research companies that offer such PR software for professionals.

    Read more →
  • Social profiling

    Social profiling

    Social profiling is the process of constructing a social media user's profile using their social data. In general, profiling refers to the data science process of generating a person's profile with computerized algorithms and technology. There are various platforms for sharing this information with the proliferation of growing popular social networks, including but not limited to LinkedIn, Google+, Facebook and Twitter. == Social profile and social data == A person's social data refers to the personal data that they generate either online or offline (for more information, see social data revolution). A large amount of these data, including one's language, location and interest, is shared through social media and social network. Users join multiple social media platforms and their profiles across these platforms can be linked using different methods to obtain their interests, locations, content, and friend list. Altogether, this information can be used to construct a person's social profile. Meeting the user's satisfaction level for information collection is becoming more challenging. This is because of too much "noise" generated, which affects the process of information collection due to explosively increasing online data. Social profiling is an emerging approach to overcome the challenges faced in meeting user's demands by introducing the concept of personalized search while keeping in consideration user profiles generated using social network data. A study reviews and classifies research inferring users social profile attributes from social media data as individual and group profiling. The existing techniques along with utilized data sources, the limitations, and challenges were highlighted. The prominent approaches adopted include machine learning, ontology, and fuzzy logic. Social media data from Twitter and Facebook have been used by most of the studies to infer the social attributes of users. The literature showed that user social attributes, including age, gender, home location, wellness, emotion, opinion, relation, influence are still need to be explored. === Personalized meta-search engines === The ever-increasing online content has resulted in the lack of proficiency of centralized search engine's results. It can no longer satisfy user's demand for information. A possible solution that would increase coverage of search results would be meta-search engines, an approach that collects information from numerous centralized search engines. A new problem thus emerges, that is too much data and too much noise is generated in the collection process. Therefore, a new technique called personalized meta-search engines was developed. It makes use of a user's profile (largely social profile) to filter the search results. A user's profile can be a combination of a number of things, including but not limited to, "a user's manual selected interests, user's search history", and personal social network data. == Social media profiling == According to Samuel D. Warren II and Louis Brandeis (1890), disclosure of private information and the misuse of it can hurt people's feelings and cause considerable damage in people's lives. Social networks provide people access to intimate online interactions; therefore, information access control, information transactions, privacy issues, connections and relationships on social media have become important research fields and are subjects of concern to the public. Ricard Fogues and other co-authors state that "any privacy mechanism has at its base an access control", that dictate "how permissions are given, what elements can be private, how access rules are defined, and so on". Current access control for social media accounts tend to still be very simplistic: there is very limited diversity in the category of relationships on for social network accounts. User's relationships to others are, on most platforms, only categorized as "friend" or "non-friend" and people may leak important information to "friends" inside their social circle but not necessarily users to they consciously want to share the information to. The below section is concerned with social media profiling and what profiling information on social media accounts can achieve. === Privacy leaks === A lot of information is voluntarily shared on online social networks, such as photos and updates on life activities (new job, hobbies, etc.). People rest assured that different social network accounts on different platforms will not be linked as long as they do not grant permission to these links. However, according to Diane Gan, information gathered online enables "target subjects to be identified on other social networking sites such as Foursquare, Instagram, LinkedIn, Facebook and Google+, where more personal information was leaked". The majority of social networking platforms use the "opt out approach" for their features. If users wish to protect their privacy, it is user's own responsibility to check and change the privacy settings as a number of them are set to default option. A major social network platforms have developed geo-tag functions and are in popular usage. This is concerning because 39% of users have experienced profiling hacking; 78% burglars have used major social media networks and Google Street-view to select their victims; and an astonishing 54% of burglars attempted to break into empty houses when people posted their status updates and geo-locations. === Facebook === Formation and maintenance of social media accounts and their relationships with other accounts are associated with various social outcomes. In 2015, for many firms, customer relationship management is essential and is partially done through Facebook. Before the emergence and prevalence of social media, customer identification was primarily based upon information that a firm could directly acquire: for example, it may be through a customer's purchasing process or voluntary act of completing a survey/loyalty program. However, the rise of social media has greatly reduced the approach of building a customer's profile/model based on available data. Marketers now increasingly seek customer information through Facebook; this may include a variety of information users disclose to all users or partial users on Facebook: name, gender, date of birth, e-mail address, sexual orientation, marital status, interests, hobbies, favorite sports team(s), favorite athlete(s), or favorite music, and more importantly, Facebook connections. However, due to the privacy policy design, acquiring true information on Facebook is no trivial task. Often, Facebook users either refuse to disclose true information (sometimes using pseudonyms) or setting information to be only visible to friends, Facebook users who "LIKE" your page are also hard to identify. To do online profiling of users and cluster users, marketers and companies can and will access the following kinds of data: gender, the IP address and city of each user through the Facebook Insight page, who "LIKED" a certain user, a page list of all the pages that a person "LIKED" (transaction data), other people that a user follow (even if it exceeds the first 500, which we usually can not see) and all the publicly shared data. === Twitter === First launched on the Internet in March 2006, Twitter is a platform on which users can connect and communicate with any other user in just 280 characters. Like Facebook, Twitter is also a crucial tunnel for users to leak important information, often unconsciously, but able to be accessed and collected by others. According to Rachel Nuwer, in a sample of 10.8 million tweets by more than 5,000 users, their posted and publicly shared information are enough to reveal a user's income range. A postdoctoral researcher from the University of Pennsylvania, Daniel Preoţiuc-Pietro and his colleagues were able to categorize 90% of users into corresponding income groups. Their existing collected data, after being fed into a machine-learning model, generated reliable predictions on the characteristics of each income group. The mobile app called Streamd.in displays live tweets on Google Maps by using geo-location details attached to the tweet, and traces the user's movement in the real world. === Profiling photos on social network === The advent and universality of social media networks have boosted the role of images and visual information dissemination. Many types of visual information on social media transmit messages from the author, location information and other personal information. For example, a user may post a photo of themselves in which landmarks are visible, which can enable other users to determine where they are. In a study done by Cristina Segalin, Dong Seon Cheng and Marco Cristani, they found that profiling user posts' photos can reveal personal traits such as personality and mood. In the study, convolutional neural networks (CNNs) is introduced. It builds on the main characteristics of computational

    Read more →