Given organizations' increasing dependency on information technology (IT) to run their operations, business continuity planning (and its subset IT service continuity planning) covers the entire organization, while disaster recovery focuses on IT. Auditing documents covering an organization's business continuity and disaster recovery (BCDR) plans provides a third-party validation to stakeholders that the documentation is complete and does not contain material misrepresentations. == Overview == Often used together, the terms business continuity (BC) and disaster recovery (DR) are very different. BC refers to the ability of a business to continue critical functions and business processes after the occurrence of a disaster, whereas DR refers specifically to the IT functions of the business, albeit a subset of BC. == Metrics == The primary objective is to protect the organization in the event that all or part of its operations and/or computer services are rendered partially or completely unusable. === DR metrics === Minimizing downtime and data loss during disaster recovery is typically measured in terms of two key concepts: Recovery time objective (RTO), time until a system is completely up and running Recovery point objective (RPO), a measure of the ability to recover files by specifying a point in time the backup copy will restore to. == The auditor's role == Role of the Internal Auditor in Auditing a Disaster Recovery Plan (DRP): 1. Governance & Oversight - Confirm roles, responsibilities, and oversight are defined, and DRP aligns with risk appetite and continuity strategy. 2. Risk Assessment & BIA - Verify risk and impact assessments identify critical systems and define RTO/RPO. 3. Plan Design & Documentation - Ensure the DRP is current, complete, and includes key recovery procedures. 4. Testing & Validation - Confirm regular DRP testing occurs and results are used to improve the plan. 5. Backup & Recovery - Assess backup frequency and recovery capabilities against RTO/RPO targets. 6. Communication & Training - Verify staff are trained and communication protocols are in place for crises. 7. Maintenance & Improvement - Ensure the DRP is regularly updated and lessons learned are integrated. == Documentation == === Disaster recovery plan === A disaster recovery plan (DRP) is a documented process or set of procedures to execute an organization's disaster recovery processes and recover and protect a business IT infrastructure in the event of a disaster. It is "a comprehensive statement of consistent actions to be taken before, during and after a disaster". The disaster could be natural, environmental or man-made. Man-made disasters could be intentional (for example, an act of a terrorist) or unintentional (that is, accidental, such as the breakage of a man-made dam or even "fat fingers" - or errant commands entered - on a computer system). ==== Types of plans ==== Although there is no one-size-fits-all plan, there are three basic strategies: prevention, including proper backups, having surge protectors and generators detection, a byproduct of routine inspections, which may discover new (potential) threats correction The latter may include securing proper insurance policies, and holding a "lessons learned" brainstorming session. ==== Best practices ==== To maximize their effectiveness, DRPs are most effective when updated frequently, and should: be an integral part of all business analysis processes, be revisited at every major corporate acquisition, at every new product launch and at every new system development milestone. be thoroughly tested, not just unpracticed bureaucratic documentation Adequate records need to be retained by the organization. The auditor examines records, billings, and contracts to verify that records are being kept. One such record is a current list of the organization's hardware and software vendors. Such list is made and periodically updated to reflect changing business practices and as part of an IT asset management system. Copies of it are stored on and off site and are made available or accessible to those who require them. An auditor tests the procedures used to meet this objective and determine their effectiveness. === Relationship to BCPs === Disaster recovery is a subset of business continuity. Where DRP encompasses the policies, tools and procedures to enable recovery of data following a catastrophic event, BCP involves keeping all aspects of a business functioning regardless of potential disruptive events. As such, a business continuity plan is a comprehensive organizational strategy that includes the DRP as well as threat prevention, detection, recovery, and resumption of operations should a data breach or other disaster event occur. Therefore, BCP consists of five component plans: Business resumption plan Occupant emergency plan Continuity of operations plan Incident management plan Disaster recovery plan The first three components (business resumption, occupant emergency, and continuity of operations plans) do not deal with the IT infrastructure. The incident management plan (IMP) does deal with the IT infrastructure, but since it establishes structure and procedures to address cyber attacks against an organization's IT systems, it generally does not represent an agent for activating the DRP; thus DRP is the only BCP component of active interest to IT. == Testing == The overall categorization of tests are functional- and discussion-based. Types of tests include: tabletop exercises, checklists, simulations, parallel processing (testing recovery site while primary site is in operation), and full interruption (fail over) tests. These apply to both BC and DR. == Benefits == Like every insurance plan, there are benefits that can be obtained from proper business continuity planning, including: Studies have shown a correlation between higher spending on auditing fees and lower rates of Incidents. Minimizing risk of delays Guaranteeing the reliability of standby systems (even automating the failure detection and recovery in certain scenarios) Providing a standard for testing the plan Minimizing decision-making during a disaster Reducing potential legal liabilities Lowering unnecessarily stressful work environment === Planning and testing methodology === According to Geoffrey H. Wold of the Disaster Recovery Journal, the entire process involved in developing a Disaster Recovery Plan consists of 10 steps: Performing a risk assessment: The planning committee prepares a risk analysis and a business impact analysis (BIA) that includes a range of possible disasters. Each functional area of the organization is analyzed to determine potential consequences. Traditionally, fire has posed the greatest threat. A thorough plan provides for "worst case" situations, such as destruction of the main building. Establishing priorities for processing and operations: Critical needs of each department are evaluated and prioritized. Written agreements for alternatives selected are prepared, with details specifying duration, termination conditions, system testing, cost, any special security procedures, procedure for the notification of system changes, hours of operation, the specific hardware and other equipment required for processing, personnel requirements, definition of the circumstances constituting an emergency, process to negotiate service extensions, guarantee of compatibility, availability, non-mainframe resource requirements, priorities, and other contractual issues. Collecting data: This includes various lists (employee backup position listing, critical telephone numbers list, master call list, master vendor list, notification checklist), inventories (communications equipment, documentation, office equipment, forms, insurance policies, workgroup and data center computer hardware, microcomputer hardware and software, office supply, off-site storage location equipment, telephones, etc.), distribution register, software and data files backup/retention schedules, temporary location specifications, any other such lists, materials, inventories, and documentation. Pre-formatted forms are often used to facilitate the data gathering process. Organizing and documenting a written plan Developing testing criteria and procedures: reasons for testing include Determining the feasibility and compatibility of backup facilities and procedures. Identifying areas in the plan that need modification. Providing training to the team managers and team members. Demonstrating the ability of the organization to recover. Providing motivation for maintaining and updating the disaster recovery plan. Testing the plan: An initial "dry run" of the plan is performed by conducting a structured walk-through test. An actual test-run must be performed. Problems are corrected. Initial testing can be plan is done in sections and after normal business hours to minimize disruptions. Subsequent tests occur during normal business hours. === Caveats/controversie
Tiimo
Tiimo is an app designed to help neurodivergent individuals with planning their life. In August 2024 the company raised €1.4 million, bringing their total funding to €4.3 million. At that point they had over 500,000 users, including 50,000 paid users. The app has Apple Watch support and a learning platform that includes courses on well-being and neurodiversity. The app was founded by Helene Lassen Nørlem and Melissa Würtz Azari in 2015. After being a finalist in 2024, in December 2025 Tiimo was won Apple’s iPhone App of the Year. The premium version is $10/mo and features an AI chatbot alongside the daily planner.
Data transformation (computing)
In computing, data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integration and data management tasks such as data wrangling, data warehousing, data integration and application integration. Data transformation can be simple or complex based on the required changes to the data between the source (initial) data and the target (final) data. Data transformation is typically performed via a mixture of manual and automated steps. Tools and technologies used for data transformation can vary widely based on the format, structure, complexity, and volume of the data being transformed. A master data recast is another form of data transformation where the entire database of data values is transformed or recast without extracting the data from the database. All data in a well-designed database is directly or indirectly related to a limited set of master database tables by a network of foreign key constraints. Each foreign key constraint is dependent upon a unique database index from the parent database table. Therefore, when the proper master database table is recast with a different unique index, the directly and indirectly related data are also recast or restated. The directly and indirectly related data may also still be viewed in the original form since the original unique index still exists with the master data. Also, the database recast must be done in such a way as to not impact the applications architecture software. When the data mapping is indirect via a mediating data model, the process is also called data mediation. == Data transformation process == Data transformation can be divided into the following steps, each applicable as needed based on the complexity of the transformation required. Data discovery Data mapping Code generation Code execution Data review These steps are often the focus of developers or technical data analysts who may use multiple specialized tools to perform their tasks. The steps can be described as follows: Data discovery is the first step in the data transformation process. Typically the data is profiled using profiling tools or sometimes using manually written profiling scripts to better understand the structure and characteristics of the data and decide how it needs to be transformed. Data mapping is the process of defining how individual fields are mapped, modified, joined, filtered, aggregated etc. to produce the final desired output. Developers or technical data analysts traditionally perform data mapping since they work in the specific technologies to define the transformation rules (e.g. visual ETL tools, transformation languages). Code generation is the process of generating executable code (e.g. SQL, Python, R, or other executable instructions) that will transform the data based on the desired and defined data mapping rules. Typically, the data transformation technologies generate this code based on the definitions or metadata defined by the developers. Code execution is the step whereby the generated code is executed against the data to create the desired output. The executed code may be tightly integrated into the transformation tool, or it may require separate steps by the developer to manually execute the generated code. Data review is the final step in the process, which focuses on ensuring the output data meets the transformation requirements. It is typically the business user or final end-user of the data that performs this step. Any anomalies or errors in the data that are found and communicated back to the developer or data analyst as new requirements to be implemented in the transformation process. == Types of data transformation == === Batch data transformation === Traditionally, data transformation has been a bulk or batch process, whereby developers write code or implement transformation rules in a data integration tool, and then execute that code or those rules on large volumes of data. This process can follow the linear set of steps as described in the data transformation process above. Batch data transformation is the cornerstone of virtually all data integration technologies such as data warehousing, data migration and application integration. When data must be transformed and delivered with low latency, the term "microbatch" is often used. This refers to small batches of data (e.g. a small number of rows or a small set of data objects) that can be processed very quickly and delivered to the target system when needed. === Benefits of batch data transformation === Traditional data transformation processes have served companies well for decades. The various tools and technologies (data profiling, data visualization, data cleansing, data integration etc.) have matured and most (if not all) enterprises transform enormous volumes of data that feed internal and external applications, data warehouses and other data stores. === Limitations of traditional data transformation === This traditional process also has limitations that hamper its overall efficiency and effectiveness. The people who need to use the data (e.g. business users) do not play a direct role in the data transformation process. Typically, users hand over the data transformation task to developers who have the necessary coding or technical skills to define the transformations and execute them on the data. This process leaves the bulk of the work of defining the required transformations to the developer, which often in turn do not have the same domain knowledge as the business user. The developer interprets the business user requirements and implements the related code/logic. This has the potential of introducing errors into the process (through misinterpreted requirements), and also increases the time to arrive at a solution. This problem has given rise to the need for agility and self-service in data integration (i.e. empowering the user of the data and enabling them to transform the data themselves interactively). There are companies that provide self-service data transformation tools. They are aiming to efficiently analyze, map and transform large volumes of data without the technical knowledge and process complexity that currently exists. While these companies use traditional batch transformation, their tools enable more interactivity for users through visual platforms and easily repeated scripts. Still, there might be some compatibility issues (e.g. new data sources like IoT may not work correctly with older tools) and compliance limitations due to the difference in data governance, preparation and audit practices. === Interactive data transformation === Interactive data transformation (IDT) is an emerging capability that allows business analysts and business users the ability to directly interact with large datasets through a visual interface, understand the characteristics of the data (via automated data profiling or visualization), and change or correct the data through simple interactions such as clicking or selecting certain elements of the data. Although interactive data transformation follows the same data integration process steps as batch data integration, the key difference is that the steps are not necessarily followed in a linear fashion and typically don't require significant technical skills for completion. There are a number of companies that provide interactive data transformation tools, including Trifacta, Alteryx and Paxata. They are aiming to efficiently analyze, map and transform large volumes of data while at the same time abstracting away some of the technical complexity and processes which take place under the hood. Interactive data transformation solutions provide an integrated visual interface that combines the previously disparate steps of data analysis, data mapping and code generation/execution and data inspection. That is, if changes are made at one step (like for example renaming), the software automatically updates the preceding or following steps accordingly. Interfaces for interactive data transformation incorporate visualizations to show the user patterns and anomalies in the data so they can identify erroneous or outlying values. Once they've finished transforming the data, the system can generate executable code/logic, which can be executed or applied to subsequent similar data sets. By removing the developer from the process, interactive data transformation systems shorten the time needed to prepare and transform the data, eliminate costly errors in the interpretation of user requirements and empower business users and analysts to control their data and interact with it as needed. == Transformational languages == There are numerous languages available for performing data transformation. Many transformation languages require a grammar to be provided. In many cases, the grammar is structured using something closely resembling Backus–Naur form (BNF). There are numerous languages
Media contacts database
In public relations (PR) and marketing, a media contacts database is a resource which catalogs the names, contact information, and other details about people who work in various media professions. These include journalists, reporters, editors, publishers, contributors, freelance journalists, opinion writers, social media personalities/ influencers, TV show anchors, radio show hosts, DJs, and others. A media contacts database usually contains the following information: Full name of the media contact, The publication or channel they work for Designations (past and present) Topics they cover, or their beat Contact information found in public domains Online presence like blogs and other social networking sites Education Information == Overview == A media contacts database is a public relations tool that is maintained and used by PR professionals to pitch stories on a particular topic, product, or company to a specific group of journalists. These journalists would then write or speak about the particular topic in a relevant issue or episode of their shows. A media contacts database allows a PR professional to gain easy access to hundreds of journalists within a short span of time. Media contacts database are created and sold by many media research companies that offer such PR software for professionals.
Time-lock puzzle
A time-lock puzzle, or time-released cryptography, encrypts a message that cannot be decrypted until a specified amount of time has passed. The concept was first described by Timothy C. May, and a solution first introduced by Ron Rivest, Adi Shamir, and David A. Wagner in 1996. Time-lock puzzle are useful in cases where confidentiality of information is determined by time, such as a diarist who does not want their views released until 50 years after their death, an auction where bids are sealed until the bidding period is closed, electronic voting, and contract signing. They can additionally be used in creating further cryptographic primitives, such as verifiable delay functions and zero knowledge proofs. Time-released cryptography can be achieved through several different mechanisms. Use mathematical problems requiring sequential calculations to solve, and cannot be solved with parallelization. Thus, adding more computers to a problem will not help solve the problem faster. Use of a trusted agent, or multiple agents who each hold a part of the message and cryptographic keys, who release the message after a specified time period has passed. Distribute public encryption keys to users, and place private cryptographic keys with a trusted agent in an offline location, to be released at a later date.
Database index
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. An index is a copy of selected columns of data, from a table, that is designed to enable very efficient search. An index normally includes a "key" or direct link to the original row of data from which it was copied, to allow the complete row to be retrieved efficiently. Some databases extend the power of indexing by letting developers create indexes on column values that have been transformed by functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of partial index, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions. == Usage == === Support for fast lookup === Most database software includes indexing technology that enables sub-linear time lookup to improve performance, as linear search is inefficient for large databases. Suppose a database contains N data items and one must be retrieved based on the value of one of the fields. A simple implementation retrieves and examines each item according to the test. If there is only one matching item, this can stop when it finds that single item, but if there are multiple matches, it must test everything. This means that the number of operations in the average case is O(N) or linear time. Since databases may contain many objects, and since lookup is a common operation, it is often desirable to improve performance. An index is any data structure that improves the performance of lookup. There are many different data structures used for this purpose. There are complex design trade-offs involving lookup performance, index size, and index-update performance. Many index designs exhibit logarithmic (O(log(N))) lookup performance and in some applications it is possible to achieve flat (O(1)) performance. === Policing the database constraints === Indexes are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY and FOREIGN KEY. An index may be declared as UNIQUE, which creates an implicit constraint on the underlying table. Database systems usually implicitly create an index on a set of columns declared PRIMARY KEY, and some are capable of using an already-existing index to police this constraint. Many database systems require that both referencing and referenced sets of columns in a FOREIGN KEY constraint are indexed, thus improving performance of inserts, updates and deletes to the tables participating in the constraint. Some database systems support an EXCLUSION constraint that ensures that, for a newly inserted or updated record, a certain predicate holds for no other record. This can be used to implement a UNIQUE constraint (with equality predicate) or more complex constraints, like ensuring that no overlapping time ranges or no intersecting geometry objects would be stored in the table. An index supporting fast searching for records satisfying the predicate is required to police such a constraint. == Index architecture and indexing methods == === Non-clustered === The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines). In a non-clustered index, The physical order of the rows is not the same as the index order. The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses. There can be more than one non-clustered index on a database table. === Clustered === Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected. Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). === Cluster === When multiple databases and multiple tables are joined, it is called a cluster (not to be confused with clustered index described previously). The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. The cluster configuration defines the data layout in the tables that are parts of the cluster. A cluster can be keyed with a B-tree index or a hash table. The data block where the table record is stored is defined by the value of the cluster key. == Column order == The order that the index definition defines the columns in is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column. For example, in a phone book organized by city first, then by last name, and then by first name, in a particular city, one can easily extract the list of all phone numbers. However, it would be very tedious to find all the phone numbers for a particular last name. One would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index. In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all the three fields, search time is minimal—but if we provide the values for city and first_name only, the search uses only the city field to retrieve all matched records. Then a sequential lookup checks the matching with first_name. So, to improve the performance, one must ensure that the index is created on the order of search columns. == Applications and limitations == Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the index data structure (typically a B-tree) until the Smith entry has been found; this is much less computationally expensive than a full table scan. Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@wikipedia.org';. This query would yield an email address for every customer whose email address ends with "@wikipedia.org", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying index data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@wikipedia.org');. This puts the wild-card at the right-most part of the query (now gro.aidepikiw@%), which the index on reverse(email_address) can satisfy. When the wildcard characters are used on both sides of the search word as %wikipedia.org%, the index available on this field is not used. Rather only a sequential search is performed, which takes O ( N ) {\displaystyle
Sentiment analysis
Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. == Types == A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Precursors to sentimental analysis include the General Inquirer, which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale. Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles. A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text. There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis and detection of emotions. === Subjectivity/objectivity identification === This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance. Subjective and objective identification, emerging subtasks of sentiment analysis to use syntactic, semantic features, and machine learning knowledge to identify if a sentence or document contains facts or opinions. Awareness of recognizing factual and opinions is not recent, having possibly first presented by Carbonell at Yale University in 1979. The term objective refers to the incident carrying factual information. Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.' The term subjective describes the incident contains non-factual information in various forms, such as personal opinions, judgment, and predictions, also known as 'private states'. In the example down below, it reflects a private states 'We Americans'. Moreover, the target entity commented by the opinions can take several forms from tangible product to intangible topic matters stated in Liu (2010). Furthermore, three types of attitudes were observed by Liu (2010), 1) positive opinions, 2) neutral opinions, and 3) negative opinions. Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem. Each class's collections of words or phrase indicators are defined for to locate desirable patterns on unannotated text. For subjective expression, a different word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in the linguist and natural language processing field states in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, in subjective detection, the features extraction progression from curating features by hand to automated features learning. At the moment, automated learning methods can further separate into supervised and unsupervised machine learning. Patterns extraction with machine learning process annotated and unannotated text have been explored extensively by academic researchers. However, researchers recognized several challenges in developing fixed sets of rules for expressions respectably. Much of the challenges in rule development stems from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writings, 3) context-sensitive, 4) represented words with fewer usages, 5) time-sensitive, and 6) ever-growing volume. Metaphorical expressions. The text contains metaphoric expression may impact on the performance on the extraction. Besides, metaphors take in different forms, which may have been contribu