Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation. == Models, data and measurement errors == Industrial processes, for example chemical or thermodynamic processes in chemical plants, refineries, oil or gas production sites, or power plants, are often represented by two fundamental means: Models that express the general structure of the processes, Data that reflects the state of the processes at a given point in time. Models can have different levels of detail, for example one can incorporate simple mass or compound conservation balances, or more advanced thermodynamic models including energy conservation laws. Mathematically the model can be expressed by a nonlinear system of equations F ( y ) = 0 {\displaystyle F(y)=0\,} in the variables y = ( y 1 , … , y n ) {\displaystyle y=(y_{1},\ldots ,y_{n})} , which incorporates all the above-mentioned system constraints (for example the mass or heat balances around a unit). A variable could be the temperature or the pressure at a certain place in the plant. === Error types === Data originates typically from measurements taken at different places throughout the industrial site, for example temperature, pressure, volumetric flow rate measurements etc. To understand the basic principles of PDR, it is important to first recognize that plant measurements are never 100% correct, i.e. raw measurement y {\displaystyle y\,} is not a solution of the nonlinear system F ( y ) = 0 {\displaystyle F(y)=0\,\!} . When using measurements without correction to generate plant balances, it is common to have incoherencies. Measurement errors can be categorized into two basic types: random errors due to intrinsic sensor accuracy and systematic errors (or gross errors) due to sensor calibration or faulty data transmission. Random errors means that the measurement y {\displaystyle y\,\!} is a random variable with mean y ∗ {\displaystyle y^{}\,\!} , where y ∗ {\displaystyle y^{}\,\!} is the true value that is typically not known. A systematic error on the other hand is characterized by a measurement y {\displaystyle y\,\!} which is a random variable with mean y ¯ {\displaystyle {\bar {y}}\,\!} , which is not equal to the true value y ∗ {\displaystyle y^{}\,} . For ease in deriving and implementing an optimal estimation solution, and based on arguments that errors are the sum of many factors (so that the Central limit theorem has some effect), data reconciliation assumes these errors are normally distributed. Other sources of errors when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other physical parameters used in equations, and incorrect structure such as unmodeled bypass lines. Other errors include unmodeled plant dynamics such as holdup changes, and other instabilities in plant operations that violate steady state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses. The normal practice of using time averages for the data input partly reduces the dynamic problems. However, that does not completely resolve timing inconsistencies for infrequently-sampled data like lab analyses. This use of average values, like a moving average, acts as a low-pass filter, so high frequency noise is mostly eliminated. The result is that, in practice, data reconciliation is mainly making adjustments to correct systematic errors like biases. === Necessity of removing measurement errors === ISA-95 is the international standard for the integration of enterprise and control systems It asserts that: Data reconciliation is a serious issue for enterprise-control integration. The data have to be valid to be useful for the enterprise system. The data must often be determined from physical measurements that have associated error factors. This must usually be converted into exact values for the enterprise system. This conversion may require manual, or intelligent reconciliation of the converted values [...]. Systems must be set up to ensure that accurate data are sent to production and from production. Inadvertent operator or clerical errors may result in too much production, too little production, the wrong production, incorrect inventory, or missing inventory. == History == PDR has become more and more important due to industrial processes that are becoming more and more complex. PDR started in the early 1960s with applications aiming at closing material balances in production processes where raw measurements were available for all variables. At the same time the problem of gross error identification and elimination has been presented. In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process., PDR also became more mature by considering general nonlinear equation systems coming from thermodynamic models., , Quasi steady state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah. Dynamic PDR was formulated as a nonlinear optimization problem by Liebman et al. in 1992. == Data reconciliation == Data reconciliation is a technique that targets at correcting measurement errors that are due to measurement noise, i.e. random errors. From a statistical point of view the main assumption is that no systematic errors exist in the set of measurements, since they may bias the reconciliation results and reduce the robustness of the reconciliation. Given n {\displaystyle n} measurements y i {\displaystyle y_{i}} , data reconciliation can mathematically be expressed as an optimization problem of the following form: min x , y ∗ ∑ i = 1 n ( y i ∗ − y i σ i ) 2 subject to F ( x , y ∗ ) = 0 y min ≤ y ∗ ≤ y max x min ≤ x ≤ x max , {\displaystyle {\begin{aligned}\min _{x,y^{}}&\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\\{\text{subject to }}&F(x,y^{})=0\\&y_{\min }\leq y^{}\leq y_{\max }\\&x_{\min }\leq x\leq x_{\max },\end{aligned}}\,\!} where y i ∗ {\displaystyle y_{i}^{}\,\!} is the reconciled value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), y i {\displaystyle y_{i}\,\!} is the measured value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), x j {\displaystyle x_{j}\,\!} is the j {\displaystyle j} -th unmeasured variable ( j = 1 , … , m {\displaystyle j=1,\ldots ,m\,\!} ), and σ i {\displaystyle \sigma _{i}\,\!} is the standard deviation of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), F ( x , y ∗ ) = 0 {\displaystyle F(x,y^{})=0\,\!} are the p {\displaystyle p\,\!} process equality constraints and x min , x max , y min , y max {\displaystyle x_{\min },x_{\max },y_{\min },y_{\max }\,\!} are the bounds on the measured and unmeasured variables. The term ( y i ∗ − y i σ i ) 2 {\displaystyle \left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\,\!} is called the penalty of measurement i. The objective function is the sum of the penalties, which will be denoted in the following by f ( y ∗ ) = ∑ i = 1 n ( y i ∗ − y i σ i ) 2 {\displaystyle f(y^{})=\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}} . In other words, one wants to minimize the overall correction (measured in the least squares term) that is needed in order to satisfy the system constraints. Additionally, each least squares term is weighted by the standard deviation of the corresponding measurement. The standard deviation is related to the accuracy of the measurement. For example, at a 95% confidence level, the standard deviation is about half the accuracy. === Redundancy === Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from redundancy in information theory. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy", "analytical redundancy", or "topological redundancy". Redundancy can be due to sensor redundancy, where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints. Redundancy is linked to the concept
Read more →