Chapter 1. Theoretical Concepts for Describing a Replication-levels-based Uncertainty Analysis Approach

Ander Zarketa-Astigarraga, Alain Martin-Mayor and Manex Martinez-Agirre
Mondragon Unibertsitatea, Faculty of Engineering, Mechanical and Industrial Production, Mondragon, Spain

1.1. Introduction

Delimiting the scope of uncertainty analysis, which sits between the fields of mathematics and physical experimentalism, may prove tedious for the newcomer trying to apply it for the first time. Nevertheless, performing such an analysis is becoming an accepted standard in any field involving experiments, and the resulting data are increasingly required to carry information on their degree of exactitude. This calls for a systematic way of accounting for the uncertainties of measured magnitudes; systematizing such an analysis, albeit possible and desirable, demands a well-founded background on the notions that underpin uncertainty theory. The fact, however, is that there seems to be no conclusive consensus regarding the basic concepts that are meant to constitute the building blocks of the theory; rather, those concepts are scattered across a number of canonical references [1, 3, 5, 7, 8] that, although they compose a closed system of notions readily applicable as a theory, lack the unified narrative necessary for a holistic view of the subject. Hence, the principal aim of the work presented herein is to gather those dispersed pieces of information and put them into the format of a structured narrative; the secondary aim is to provide the theoretical background so that its application to practical case studies may be understood on a well-founded basis.

As the purpose is to give a bottom-up description of the entities that come into play in uncertainty analysis, the chapter is structured as follows: Section 1.2 serves as a reminder of the foundational concepts of physical experimentalism; Section 1.3 introduces the notion of the measurement chain, which is the starting point for properly describing the term uncertainty, detailed in Section 1.4. Section 1.5 deals with replication levels, which are central to the process of systematizing the analysis and, finally, Section 1.6 translates the previous concepts into a mathematical formulation, developing the tools that are to be applied on practical grounds.

1.2. On the Basic Definitions of Physical Experimentalism

The study of any physical phenomenon, from an empirical standpoint, is comparative. The baseline case is usually represented by a simplified model of a particular phenomenon. The purpose of a modelization is to enclose the minimal set of entities needed to reproduce a phenomenon; such a minimal set is known as a system. The term model, as employed herein, refers to the mathematical formulation that describes a physical process taking place within a defined system. The concept of a physical phenomenon is related to the notion of change, and the questions to be answered are how, and why, a system changes (the constancy of a system can be understood as a lack of change). As such, a given model contains a set of descriptors that represent the variations of a system.
Those descriptors are termed variables in the mathematical formulation, and their material counterparts are combinations of physical properties, or magnitudes. Intuitively, the fewer the magnitudes affecting a phenomenon, the less complex the relational analysis becomes. That is why, as happens for the definition of a system, the formulation of a model seeks to describe a physical process with a set of variables that minimizes the magnitudes coming into play. That minimum need not be obtained by a unique collection of magnitudes; rather, it is left to the judgement of the experimentalist to choose, from the set of potential properties, the ones that best describe the different configurations to be found in a predefined system. The only constraint imposed upon the chosen magnitudes is that their discrete values must univocally reproduce a given configuration of the system. Each of those configurations, represented by disjoint combinations of magnitudes, is known as a state of the system.

The definitions yielded above constitute a mere conceptual framework. A model is no more than a derivation coming from the abstract mathematical strata. Insofar as a model addresses entities of different categorical sciences, such as physics and mathematics, it remains theoretical unless corroborated by empirical evidence. From this conception, a proof seeks to ascertain the closeness between the physical reality of a phenomenon and its modelization. Accounting for that closeness lies at the core of the comparative nature of experimentalism. The primary act that experiments rely on is observation, and a physically observed action constitutes a fact. The validity of an abstract, mathematical-empirical model depends on the assessment of relations between facts, or hypotheses. Hence, a hypothesis is a potentially true statement that links facts together. Thus, for a system undergoing a physical phenomenon, a hypothesis constitutes a relation between the magnitudes that determine the states of that system. It follows that a physical fact is acknowledgeable only insofar as it is measurable, i.e. as far as the observed differences between magnitudes are somehow quantified. On experimental grounds, the magnitudes that constitute the variables of a model become measurands.

Any experimental effort is carried out within a given physical scenario, or experimental set-up. A set-up is projected and built so that a desired system may be reproduced at its states of interest. Inherently, any set-up comes with two main limitations. On the one hand, the experimentalist is supposed to operate on the system somehow, i.e. the system is required to allow setting a number of measurands to known values. On the other hand, and in addition to the desired physical phenomenon, several other phenomena may be taking place. This implies the potential existence of magnitudes that are not measurands themselves. As such, even though hypotheses are stated in simple terms, yielding one-to-one causal relationships between measurands, it is practically unfeasible to incorporate the entire set of physical magnitudes into a set of hypotheses.
Based on these considerations, the overall magnitudes of a set-up that interact with a given system may be divided into three categories:

• An independent magnitude is a measurand contained in a hypothesis, and settable by the experimentalist;

• A dependent magnitude is a measurand contained in a hypothesis, and determinable by the experimentalist;

• An extraneous magnitude is not contained in a set of hypotheses, and hence does not constitute a measurand itself. However, it may affect the values of the measurands of a system.

An experimental workflow leading to the determination of a set of dependent magnitudes is called an experiment. Due to the heuristic nature of physical experimentalism, experiments are required to be normative; a way of restating this is to consider that the same abstract system is usually subjected to physical analysis repeatedly, either in a single set-up or in a number of different set-ups. Regardless of the possible different physical scenarios, for the experiments to be relevantly comparable they must comply with a reproducibility condition. This does not solely mean that the input measurands need to match among different trials, but that the experiments themselves are to be performed in a procedurally equivalent manner. That is why the design of an experiment is ultimately meant to define formal procedures, or protocols. These protocols contain the sets of rules for setting the independent magnitudes and determining the dependent ones; additionally, they should provide the experimental conditions under which the extraneous magnitudes are assumed unchanged, or frozen. In such circumstances, a theoretical model may be tested against a number of experiments to prove its validity.

1.3. On the Concept of Measurement Chain

When the previous methodology is applied to the study of a physical phenomenon, discrepancies may be found at two conceptual levels. The first one pertains to the differences between the theoretical framework and the physical reality; a model may not reflect the overall changes that a system is undergoing, or it may be valid only within a restricted range of physical magnitudes, or the effects of such magnitudes may not be acceptably imprinted on the model itself. Regardless of the specific cause, a decoupling between the two categorical strata, namely physics and mathematics, lies at the core of the divergences. Severe as it may sound, this kind of flaw has more to do with a lack of understanding of a phenomenon than with mistakes committed at the execution level of the experiments.

The second source of discrepancies is found at the experimental stage. When a model is assumed valid enough that its rejection is not considered an acceptable explanation for the detected divergences, explanations are to be sought in possible experimental errors. Taking a model for granted is not a rare exception; whether the mathematical expressions reflect basic conservation laws of physics (e.g. zero net production of energy in an isolated system) or refer to previously validated tests against universally accepted standards (e.g. calibration protocols), the underlying mathematical abstractions are not questioned. Instead, a sensible and broadly accepted consideration in experimentalism is the existence of errors in measurements.
When the scope of error analysis is limited to experimental causes, the basic concept serving as a starting point is that of the measurement chain. This chain refers to the potential deviations that contribute to the mismatch between a measurand's value and its modeled counterpart. Notice that the comparison with an idealized system also lies at the conception of the measurement chain. According to [5], the disturbances introduced in a system by the mere act of measuring may be described sequentially, and yield a total of five potentially different values for a measurand, which are schematically summarized in Fig. 1.1.

Fig. 1.1. Schematic depiction of the different potential values found on a generic measurement chain (adapted from [12]).

• The real value is the hypothetical value the measurand would have if the system were not affected by the measurement process.

• The available value is the value of the measurand in the system, at the measurement point, while the measurement is being taken. A device employed to measure a magnitude is termed a sensor, or probe. Since any sensor is intrusive to some extent, it is considered unfeasible to leave a system intact when a probe is introduced. Put in formal terminology: as the sensor itself constitutes another system, what is being measured are the changes in the state of the sensor-system; those changes come at the expense of moving the original system to a different operating point, resulting in a further change of the measurand. Carefully designed sensors or non-intrusive techniques aim at minimizing these system disturbances as much as possible.

• The achieved value is the value that the measurand has in the sensor while the measurement is being made. If attention is focused on the system constituted by the sensor itself, it would be naïve to consider that the only magnitude affecting that sensor-system is the measurand. Extraneous magnitudes may further alter the value being recorded, which ultimately causes the sensor to equilibrate with the entire environment rather than with the measurand alone. These deviations are called system/sensor interactions.

• The measured value is the value attributed to the measurand when the output of the sensor is interpreted using the best estimate of the calibration of the sensor. The only potential source of error standing between achieved and measured values is the measurement system. A proper calibration seeks to provide a complete map between the input and output of a measurement system in terms of a temporal parameter. Even in standardized calibration protocols, there exists a pre-established tolerance related to the acceptability of the measured values relative to the hypothetical real values. This tolerance can be as small as the resolution of the sensor, which imposes a lower bound on the changes that the sensor itself is able to detect on a measurand, and is usually dictated by physical constraints. However, that tolerance may grow larger for several reasons, such as fabrication defects or above-resolution measurand fluctuations; in such cases of imperfect calibration, the achieved and measured values do not coincide, and the error coming from the calibration stage needs to be accounted for.

• Finally, the corrected value refers to the experimentalist's best estimate of the real value, once the system disturbances, system/sensor interactions and calibration errors are taken into account.
As far as measurands are descriptors of a system, if errors in measurement are assumed to be an inherent part of the experiments, then a description of a system lacking information about those errors is incomplete by all means. An acceptable description requires defining certain bounds for the differences between real and corrected values, and information must be provided on how close those values are estimated to lie. However, completing a system's description with error information is not to be understood solely as a formal good-practice exercise; in fact, another of its main purposes resides in constituting a checking baseline for other experiments. It tells experimentalists whether a discrepancy between values measured at a hypothetically equivalent system state, but under potentially different experimental conditions, is significant. In cases where those values differ by more than the expected error range, it may be concluded either that the experimental conditions are not the same for the tests, or that unwanted effects are entering one of the measurement chains. The former cause is not treated herein as a formal experimental error, as it is fixable by modifying the operating point of the mistaken set-up. Accounting for measurement chain perturbations is, besides, related to two main factors: the nature of the experiments conducted and the nature of the error sources.

1.4. On the Concept of Uncertainty: Nature of Experiments and Error Sources

The previous considerations make clear that an error refers to a difference between real and corrected values, and that it constitutes a reliability descriptor of a measurand in an experiment. For a single observation in a test, the error is certainly a fixed number, computable by the experimentalist; when a finite number of measurements are performed, an error analysis may be carried out on the recorded data, or the stored values of the measurand. As those values are known, error calculations are based on statistical analysis. However, the term uncertainty addresses a different concept, as the point is not to compute the error value from known data, but to ascertain a possible value that the error might have in future tests. From this standpoint, uncertainty analysis relies on statistical inference, not statistical analysis. Uncertainty values are not descriptors in the way errors are, but estimators. The distinction is relevant in a twofold sense: on formal grounds, the mathematical tools employed in statistical analysis are different from those used in statistical inference. From a practical point of view, a test owns an unknown error value a priori. At best, the error is expected to fall within the range predicted by the uncertainty value. The calculations to obtain this uncertainty may depend on the pool of error data coming from previous tests and, if so, its value changes due to the error resulting from the experiment itself. As such, error and uncertainty analyses may come to influence each other, but they are defined in such a way that uncertainties are meant to predict the possible errors committed in further tests. Thus, uncertainty analysis is predictive by definition, whereas error analysis is not. The link between uncertainties and errors depends on the type of experiment conducted.
Statistically, experimental tests may be classified into two main groups [3, 7], whose differences are subtle enough to require further explanation. The basic criterion is the ability to obtain independent data points for a given experimental configuration. From a statistical standpoint, achieving independent data points is equivalent to repeating the experiment a number of times. The convenience of repeatability has a well-founded rationale behind it: ideally, if the same measurement were taken with sets of different observers and instruments large enough to constitute a statistical population, then the reliability of the measurements could be assured by statistics. Multiple-sample experiments are those in which uncertainties are evaluated by such a repetition. On the other hand, single-sample experiments are constrained, for whatever reason, to a limited number of observations, and their uncertainty evaluation is done by estimation, not statistical calculus.

Although the concept of statistical independence is easily understood, its experimental reality is not as evident. The difficulty seems to come from a number of factors that tend to weaken the repeatability condition [3], such as differences in the reading of the same measurand by several observers, or discrepancies in the same measurement by nominally equivalent, but individually different, sensors. Alternatively, inadequate measurement parameters may lead to single-sample observations [7]. Regardless of the number of samples taken, measurements are performed at a given sampling frequency, and the physical phenomenon being studied shows characteristic changing rates, or frequency spectra. If the lapse between consecutive readings is much larger than the period of the lowest characteristic frequency of the system, then the samples may be considered independent. Otherwise, a single-sample experiment results.

The corollary is that the mentioned experimental categories overlap. A measurement may satisfy single-sample conditions for certain configurations, whereas it becomes multiple-sampled for others. The distinction between the two categories arises when treating the recorded data: statistical inference and statistical analysis are the mathematical processing tools used when dealing with single- and multiple-sample experiments, respectively.
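Returning to the independence criterion above, the short Python sketch below (an illustration only; the data, function names and threshold interpretations are assumptions, not part of the referenced theory) contrasts an a priori check, based on the ratio between the sampling interval and the period of the lowest characteristic frequency, with an a posteriori diagnostic based on the lag-1 autocorrelation of an already recorded series.

    import random
    import statistics

    def interval_ratio(dt_sampling, t_characteristic):
        """Ratio of the lapse between readings to the period of the lowest
        characteristic frequency; values well above 1 support independence."""
        return dt_sampling / t_characteristic

    def lag1_autocorrelation(samples):
        """Rough estimate of the lag-1 autocorrelation of a recorded series;
        small magnitudes are consistent with independent samples."""
        mean = statistics.fmean(samples)
        num = sum((a - mean) * (b - mean) for a, b in zip(samples, samples[1:]))
        den = sum((x - mean) ** 2 for x in samples)
        return num / den

    random.seed(1)
    readings = [20.0 + random.gauss(0.0, 0.2) for _ in range(200)]  # hypothetical record
    print(interval_ratio(dt_sampling=5.0, t_characteristic=0.2))    # 25.0, i.e. >> 1
    print(lag1_autocorrelation(readings))                           # small for independent samples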
Nevertheless, experiments are subject to the same potential sources of error regardless of their statistical nature. Those sources are identified in Section 1.3 as being responsible for the different links that constitute the measurement chain, namely: system disturbances, system/sensor interactions and calibration errors (see Fig. 1.1). Although valid enough, that classification lacks the universality required when defining the types of error found in experiments. Instead, error sources are better classified on a temporal basis alone. If time-dependency is taken as the primary criterion, it follows that errors fall into two main categories: those that change with time, and those that do not. Additionally, the ones that change with time may be predictable in a deterministic way, or be of a wholly random nature. A way of matching this temporal classification with the one introduced in Section 1.3 is to think of a time-marching structure, or a past-present-future sequence (an ad hoc classification made by the authors).

Past errors are the ones no longer affected by time, such as fabrication defects or calibration errors that may influence the measurements. Future errors refer to those that change with time, but whose effect is known in advance: system disturbances and system/sensor interactions can be regarded as deterministic as long as they follow well-established trends. Present errors are to be understood as purely random, either because system disturbances and system/sensor interactions are poorly understood or wrongly considered, or because a number of sources are entirely contingent on experimental or operative conditions, such as scale- or display-reading actions performed by observers or natural equilibrium fluctuations, respectively.

This classification, based on deterministic grounds, allows a more precise denomination of errors. As the so-called past and future errors can be estimated, their effects are accounted for by adding a fixed correction factor to the measurements that compensates the bias introduced in the recorded value; hence, they constitute fixed or bias errors. On the other hand, present errors are not liable to correction due to their random nature or precision scattering, and are assumed to be random or precision errors. The double denomination has to do with historical differences in the terminological conventions of single- and multiple-sample uncertainty theories; whereas fixed and random errors are used in single-sample theory, bias and precision errors are typical of multiple-sample experiments. Regardless of the type of experiment conducted, it is the experimentalist's duty to account for all the correctable errors, so that the tests are carried out with only the unavoidable sources at play, which are the random ones. Additionally, both fixed and random error values are meant to be reported with a justified uncertainty analysis performed on them. A systematic approach to account for those uncertainties relies on the concept of replication levels.

1.5. On the Concept of Replication Levels

Replication and repetition are close concepts, but they refer to different experimental levels. When a test is said to be repeatable, it is meant that the recorded values of a measurand, over repeated trials, lie within an acceptably narrow range; in other words, that the committed errors are below a certain threshold. Section 1.4 links the concept of repeatability to the stage of data processing, whereby the statistical categories of experiments arise; similarly, it distinguishes between error values and error sources, showing them to be independent of each other. Keeping that distinction, replicability has to do with the conditions under which the repeated trials are assumed to take place; specifically, it points to the potential error sources that are supposed to remain unchanged across trials. As such, repeatability is related to the error values resulting from a set of experiments, whereas replicability addresses the error sources of those tests. Originally, the notion of replicability lies within the scope of single-sample experiments [5-8], and it owns an additional distinctive feature when compared to repeatability; as replicability addresses unchanged error sources, the experimentalist can make different assumptions regarding the sources that remain constant. This leads to defining different orders of replicability according to how constraining those assumptions are.
The interest in considering constant error sources resides in the auxiliary information obtained from such analyses, which ultimately serves to diagnose the system from an uncertainty standpoint. Following the classification of error sources in Section 1.4, three main replication levels are defined [5, 8] (see Fig. 1.2).

Fig. 1.2. Schematic depiction of the replication level philosophy.

• 0th order replication level: at this level, time itself is considered frozen. A way of thinking of a time-independent replication pattern is to assume that no error sources are allowed to change, just the reading that different observers make of a certain measurand on the same display. If a picture of that display is taken and shown to a number of observers, the only error source available is the scale-reading interpolation resulting from the resolution of the sensor. The utility of this particular replication pattern is found at the planning stage of an experiment, when a survey is taken over a number of sensors to check their suitability for a given measurement. If the precision of that measurement is required to lie below a predefined value, sensors with larger interpolation scattering can be discarded in advance. A generalized case should consider the contributions of electrical white noise [2] and thermodynamic equilibrium fluctuations [4]. However, the resolution interval of the device usually subsumes white noise, as white noise happens to be several orders of magnitude smaller than the detectable electrical output. A way of characterizing the relevance of thermodynamic equilibrium fluctuations is to estimate whether the medium within which measurements are taken behaves as a continuum; as those fluctuations happen at a molecular scale, the comparison between that scale and the probe's representative dimension is used as an indicator. That is precisely the meaning of the Knudsen number, $Kn = \lambda / d$, where $\lambda$ stands for the molecular mean free path and $d$ is the characteristic dimension of the probe. If the resulting $Kn$ is small, the sensor is large compared to the scale at which thermodynamic fluctuations take place, and their effect is usually averaged out over the measurement volume. As such, for typical cases in which white noise and thermodynamic fluctuations are neglected, the 0th order replication level provides the random uncertainty value of the measurement system.

• 1st order replication level: when the temporal parameter is allowed to change, the inherent unsteadiness of the process enters the uncertainty calculation. This timewise jitter, as it is termed, is captured if a number of subsequent measurements are taken while the test is running; from previous considerations, it is important to ensure the independence of those measurements, so that a proper statistical calculation can follow. Additionally, for this replication level to be representative, the measurements must cover a characteristic time-lapse of the measurand's change rate during the experiment. The lower bound of that time-lapse is dictated by the response time of the probe, but the upper bound is specifically test-dependent, and it is up to the experimentalist to choose a sensible value.
As the set of subsequent measurements is obtained while the experiment is running, both timewise jitter and interpolation errors are present in the recorded data. The timewise jitter, measured this way, only accounts for random errors of the system disturbance and system/sensor interaction types. Usually, the 1st order replication test is run separately at the debugging stage of the experimental set-up, and its value is stored to diagnose further measurements.

• Nth order replication level: this is the broadest replication level conceivable, and may be pictured by thinking that, for each measurement, the probe is replaced by a similar one coming from the same manufacturer. The potential errors committed at the calibration level are added to the uncertainty value, together with the experimental unsteadiness and interpolation scattering. Typically, tests performed on a single experimental set-up do not face the scattering problems associated with sensor identity, as the measurements are carried out with the same equipment. However, information about calibration uncertainty is necessary when comparing experimental data from different facilities that undertake the same tests. When random calibration errors are considered together with the value coming from the 1st order replication test, the resulting uncertainty is meant to be part of the final report. Unlike for the lower replication levels, there is no auxiliary test or generalized rule of thumb for obtaining the calibration error of a probe. The manufacturer may provide this information in the form of a calibration sheet, but the usual case is not to do so, either because the information itself has not been obtained or because it is withheld for confidentiality reasons. Instead, the experimentalist has to interpret the specification sheet and retrieve the necessary information, making the corresponding assumptions to justify the data.

The concept of replication levels comprises a property that determines the relation between single- and multiple-sample experiments. By definition, all orders of replication share a common feature, not mentioned so far: they require the repeatability of tests for their determination. Further, they need those repetitions to be both numerous and independent, so that statistical analysis applies to the recorded data. The evaluation of uncertainties by repetitive, independent tests is precisely the purpose of multiple-sample theory. Hence, it follows that replication orders aim at calculating the random uncertainty terms of single-sample experiments by undertaking individual, multiple-sample tests on each of those random terms. If it is not possible to perform multiple-sample tests on a term, which is the case, for example, of calibration-related errors, then statistical inference is used to estimate its contribution. All in all, the notion of replication level provides a hinge between the statistical categories of experimental tests. However, replication levels only account for random errors. For a complete description of the uncertainties, it is necessary to add the corresponding fixed errors at each level. The resultant values are named 0th, 1st and Nth order uncertainties, respectively.
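As an illustration of how the random contributions attached to each replication level might be estimated in practice, the following sketch processes two hypothetical data sets: a survey of observers reading the same frozen display (0th order) and a series of independent readings taken while the test runs (1st order). The calibration figure and the quadrature combination used for the Nth order estimate are assumptions that anticipate the formulation of Section 1.6.

    import math
    import statistics

    # 0th order: scatter of several observers reading the same frozen display.
    frozen_display_readings = [101.2, 101.3, 101.2, 101.1, 101.3, 101.2]
    u_0th = statistics.stdev(frozen_display_readings)

    # 1st order: scatter of consecutive, independent readings taken while the
    # test is running (timewise jitter plus interpolation error).
    running_test_readings = [101.0, 101.6, 100.8, 101.4, 101.1, 101.5, 100.9]
    u_1st = statistics.stdev(running_test_readings)

    # Nth order: adds a calibration contribution, here simply assumed from the
    # probe's specification sheet and combined in quadrature.
    u_cal = 0.3
    u_nth = math.sqrt(u_1st ** 2 + u_cal ** 2)

    print(f"0th order random scatter: {u_0th:.3f}")
    print(f"1st order random scatter: {u_1st:.3f}")
    print(f"Nth order random scatter: {u_nth:.3f}")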
Fixed errors coming from the measurement system are usually due to ground loops or flawed connections, and should not show up if the electrical circuitry is properly designed; hence, the 0th order uncertainty is usually equal to the random uncertainty of the measurement system. 1st order fixed errors come from deterministic system disturbances and system/sensor interactions; the typical way of treating them is by analytical-empirical correlations that model the expected effects. If those models quantify the error in terms of the measurand alone, the resultant value is directly added as a fixed error. Instead, if the correction models depend on additional parameters, the uncertainty associated with each of them affects the fixed error value, and needs to be estimated. Nth order fixed errors coming from the calibration stage are as elusive to detect as their random counterparts. Their ultimate sources are fabrication defects or unaccounted calibration biases that unavoidably leak into the tests. In measurement argot, it is said that these errors are fossilized, in the sense that they can neither be detected nor removed, but simply assumed. If the calibration process is taken for granted, the fossilized error may be neglected without further concern.

All in all, the corollary to be drawn from the previous lines is that fixed and bias errors are equivalent terms, in the sense that they both need to be estimated regardless of the type of experiment conducted. Random and precision errors differ in their statistical treatment, which ultimately depends on how large the population of measurements grows. The outputs generated by single- and multiple-sample analyses are not strictly equivalent. As mentioned, the concept of replication level is linked to the theory of single-sample experiments; as such, the notions of replication order or uncertainty order are defined within that scope. Multiple-sample theory, originally, has been more concerned with analyzing standard procedures [8]. The outcomes of multiple-sample experiments are defined to be the bias limit and the precision index [1]; the former is equivalent to the Nth order fixed error of single-sample theory, whereas the latter is related (but not equal) to the 1st order random error. When combined, they yield the overall uncertainty, which matches the Nth order uncertainty of single-sample experiments. With the core concepts laid out, what remains is to translate the key definitions into mathematical notation, laying down the proper formulation of the theory.

1.6. On the Mathematical Basis of Uncertainty Analysis

The primary question posed by uncertainty analysis is how to process the measured data so that the potential errors committed during a test get reflected in the final results. So far, the term result has not been formally introduced in its uncertainty-related meaning. Instead, the term measurand has been used to address the primary quantity, or magnitude, coming from a measurement, and the term data has served to refer to the stored values of a given measurand. A result, however, is obtained after processing the data. This processing stage may not only include converting the electrical output into a physical magnitude or applying signal treatment techniques, but also performing operations with different blocks of data.
As such, the distinctive feature of a result, from the uncertainty standpoint, is that it comes from calculations performed with a number of different measurands. Of course, there may be measurands that constitute results themselves, i.e. they need not be combined with other measurands in order to yield a proper descriptor of a system. Such a definition of result involves considering how the errors that affect each measurand in the calculation are propagated to the derived values. Although the original definition of propagation, due to Kline & McClintock [3], refers to the errors carried by each of the measurands into the result as uncertainties, updated versions of the theory state that uncertainties are meant to be specifically result-related [8]; measurands used to derive results are said to own fixed and random errors, if single-sampled, or bias limits and precision indices, if multiple-sampled. The distinction is important insofar as the information provided by uncertainties and errors is different; in fact, as uncertainties are obtained by combining errors statistically, the contributions of individual error terms are no longer discernible in an overall uncertainty value.

The analysis begins by stating the functional relation between a generic result $R$ and a number of magnitudes $x_1, \ldots, x_n$:

$R = f(x_1, \ldots, x_n)$  (1.1)

The term magnitude is used to address each of the $x_i$-s present in Eq. (1.1). From those, the set containing either dependent or independent magnitudes constitutes, by definition, the observed measurands. The rest are extraneous magnitudes that do not get measured, but that may affect the outcome if they are not properly considered; if the experiment is designed and conducted correctly, those extraneous magnitudes remain unchanged or frozen. Letting $N$ be the number of measurands, and $m$ the number of extraneous magnitudes that, for the sake of simplicity, are assumed frozen, Eq. (1.1) may be reformulated thusly:

$R = f(X_1, \ldots, X_N;\, x_1, \ldots, x_m) \;\xrightarrow{\;x_1, \ldots, x_m \text{ frozen}\;}\; R = f(X_1, \ldots, X_N)$,  (1.2)

where the right-hand-side expression is to be understood as the frozen magnitudes not affecting the result.

If the measurands are considered independent and their respective errors small enough, the overall error may be assumed to follow a linearized expression:

$\delta R = \sum_{i=1}^{N} \dfrac{\partial R}{\partial X_i}\, \delta X_i$  (1.3)

In Eq. (1.3), the terms $\partial R / \partial X_i$ constitute the error sensitivities, i.e. each of the partial contributions to the overall error due to a unit error in a measurand [6]. The terms $\delta X_i$ would represent measurand uncertainties, in case the measurands were considered results themselves. However, being terms that enter a result calculation, these factors are named variation intervals from here on, thus complying with formal terminological conventions. Further derivations of Eq. (1.3) require assumptions regarding those variation intervals.
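As a minimal numerical illustration of Eq. (1.3), the sketch below estimates the error sensitivities by central finite differences around the measured operating point and sums the resulting contributions; the chosen result function (a dynamic pressure, q = 0.5 rho v^2) and all numerical values are hypothetical.

    def sensitivities(result_fn, x, rel_step=1e-6):
        """Central-difference estimate of dR/dX_i at the operating point x."""
        sens = []
        for i, xi in enumerate(x):
            h = rel_step * (abs(xi) if xi != 0.0 else 1.0)
            x_hi = list(x); x_hi[i] = xi + h
            x_lo = list(x); x_lo[i] = xi - h
            sens.append((result_fn(x_hi) - result_fn(x_lo)) / (2.0 * h))
        return sens

    def linearized_error(result_fn, x, dx):
        """Overall error of Eq. (1.3) for the variation intervals dx."""
        return sum(s * d for s, d in zip(sensitivities(result_fn, x), dx))

    def q(x):
        """Hypothetical result: dynamic pressure 0.5 * rho * v**2."""
        return 0.5 * x[0] * x[1] ** 2

    print(linearized_error(q, x=[1.2, 30.0], dx=[0.01, 0.5]))   # 450*0.01 + 36*0.5 = 22.5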
As mentioned in Section 1.4, uncertainties refer to the probabilities of errors falling between certain values. The procedures for modeling errors and calculating uncertainties hence require the usage of probability spaces. Mathematically, when an experiment is conducted, all possible events comprising error observations constitute a set $\Omega$; such a set is interpreted as the collection of potentially observable errors that can be obtained when randomly sampling it, which is why $\Omega$ is termed a sample space. An event is the actual observation of a particular error on a given trial; intuitively, it is assumed that certain events are more probable than others, which is expressed by a functional relation $P$ that assigns probabilities to events. The triplet $(\Omega, \Sigma, P)$ is called a probability space; the variable $\Sigma$ refers to a mathematical entity, namely a $\sigma$-algebra structure, that is formally necessary to define a probability space with the required mathematical rigor, but which is not of major concern for the derivation that follows.

A probability space links events and probabilities together, but it is defined in terms of error observations instead of error values. Providing values to events is accomplished by introducing the concept of a random variable which, within the scope of uncertainty analysis, is a functional relation assigning real numbers to error observations. When a random variable $\xi$ is defined so that it maps each error observation to the actual error value coming from a test, it becomes possible to relate $\xi$ and the probability-assigning function $P$ to yield relevant information about the experiments. If an infinite number of trials were conducted, different relations between $\xi$ and $P$ would constitute continuous functions upon which the tools of differential calculus would apply. Two such functions that provide experimental information are the cumulative distribution function and the probability density function, respectively [9]:

$F : \mathbb{R} \to [0, 1], \quad F(\varepsilon) := P(\xi \le \varepsilon)$,  (1.4)

$f : \mathbb{R} \to \mathbb{R}, \quad f(\varepsilon) := \dfrac{dF}{d\varepsilon}(\varepsilon)$  (1.5)

The cumulative distribution function, Eq. (1.4), provides the probability of an error acquiring a value of $\varepsilon$ or less, which is why it is mapped to the interval $[0, 1]$ (the probability of committing an error of any value whatsoever is unity, whereas that of committing none is zero). Its derivative with respect to the random variable is the probability density function (PDF onwards), Eq. (1.5), also named the frequency distribution function. Intuitively, such a concept of derivative may be understood as $f(\varepsilon)\,d\varepsilon$ being the probability of $\xi$ falling in the interval $[\varepsilon, \varepsilon + d\varepsilon]$. Although either of the distribution functions contains the necessary information for mathematically acknowledging a random variable, the PDF provides a straightforward relation to certain experimental data, as this function can be constructed from such data.

So far, no distinction has been made regarding the particular nature of the errors being treated. Building the PDF of a given experimental error is done by following the definition of Eq. (1.5); for a set of subsequent trials, the number of times a particular error value is measured is plotted against that same value. When normalizing the number of events by the total number of trials, and in the limit of taking the number of trials to infinity, this procedure results in a continuous function that represents the probabilities of measuring those error values, which is precisely the definition of $f(\varepsilon)$. However, requiring subsequent trials for the construction of $f(\varepsilon)$ rules out treating fixed errors in such a way, as fixed errors do not manifest themselves in the scattering of a set of consecutive measurements. Instead, $f(\varepsilon)$ addresses random errors only, whose specific nature will depend on the order of replication considered in the experiments.
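The normalized-count construction just described can be sketched in a few lines; the synthetic error sample below is hypothetical, and the counts are additionally divided by the bin width so that the resulting discrete density integrates to approximately one.

    import random

    random.seed(0)
    errors = [random.gauss(0.0, 0.2) for _ in range(2000)]   # stand-in for recorded errors

    def empirical_pdf(samples, n_bins=20):
        """Return (bin centers, density values) approximating f(eps)."""
        lo, hi = min(samples), max(samples)
        width = (hi - lo) / n_bins
        counts = [0] * n_bins
        for s in samples:
            idx = min(int((s - lo) / width), n_bins - 1)
            counts[idx] += 1
        centers = [lo + (i + 0.5) * width for i in range(n_bins)]
        density = [c / (len(samples) * width) for c in counts]
        return centers, density

    centers, density = empirical_pdf(errors)
    bin_width = centers[1] - centers[0]
    print(sum(d * bin_width for d in density))   # approximately 1.0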
The fact that random errors may be represented by PDFs is acknowledged in early reports treating the subject [10, 3]. The choice of a particular distribution for describing a random error is made according to additional assumptions regarding that randomness. Usually, those assumptions lead to three typical random error distributions [9]:

• A Gaussian or normal distribution (see Fig. 1.3) is used when the scattering of the measurements is considered to be of a random nature itself:

$f(\varepsilon \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(\varepsilon - \mu)^2}{2\sigma^2}\right)$  (1.6)

White noise or equilibrium thermodynamic fluctuation measurements, if feasible, would enter this group. So would fluctuations of a measurand above the resolution of the measuring device. The standard uncertainty interval, $u$, corresponding to a Gaussian distribution is given by [9]:

$u = \dfrac{\sigma}{\sqrt{N}}$,  (1.7)

being $N$ the number of measurements, or sampling population.

Fig. 1.3. Schematic of a Gaussian distribution (may not be properly scaled).

• A uniform distribution (see Fig. 1.4) is used when the interval of possible values is known, but the measurement provides little additional information. Being $[a, b]$ the interval of possible values for $\varepsilon$:

$f(\varepsilon \mid a, b) = \begin{cases} 0, & \varepsilon < a, \\ \dfrac{1}{b - a}, & a \le \varepsilon \le b, \\ 0, & b < \varepsilon \end{cases}$  (1.8)

Fig. 1.4. Schematic of a uniform distribution (may not be properly scaled).

Again, from [9], the standard uncertainty of a uniform distribution is given by:

$u = \dfrac{b - a}{2\sqrt{3}}$  (1.9)

Digital-display equipment is considered to follow this distribution at the resolution level of the device. The resolution represents the range $b - a$ of possible values, within which the measured value is known to lie. As it is unknown how the device performs the round-off operation, the actual value is equiprobable within the range $b - a$, from which the uniform distribution results.

• A triangular distribution (see Fig. 1.5) is used when, in addition to the range $b - a$ of possible values of $\varepsilon$, the measurand is considered to lie closer to the central value of that range:

$f(\varepsilon \mid a, b) = \begin{cases} 0, & \varepsilon < a, \\ \dfrac{\varepsilon - a}{\left[(b - a)/2\right]^{2}}, & a \le \varepsilon \le \dfrac{a + b}{2}, \\ \dfrac{b - \varepsilon}{\left[(b - a)/2\right]^{2}}, & \dfrac{a + b}{2} < \varepsilon \le b, \\ 0, & b < \varepsilon \end{cases}$  (1.10)

Fig. 1.5. Schematic of a triangular distribution (may not be properly scaled).

As for the previous cases, [9] provides the standard uncertainty for a triangular distribution:

$u = \dfrac{b - a}{2\sqrt{6}}$,  (1.11)

where $b - a$ stands for the range of possible values, as before. Triangular distributions are typical for analog-display devices such as calipers or manometers. The actual reading allows discerning the indicator's closeness to the center value, which affects the subsequent round-off operation, unlike for digital displays.
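The three standard uncertainties of Eqs. (1.7), (1.9) and (1.11) reduce to one-line functions; the sketch below is purely illustrative, with the numerical inputs (a 0.1-unit display resolution and a hypothetical set of 25 repeated readings with a 0.4-unit standard deviation) chosen as examples.

    import math

    def u_gaussian(sigma, n_samples):
        """Eq. (1.7): standard uncertainty for a statistically sampled measurand."""
        return sigma / math.sqrt(n_samples)

    def u_uniform(a, b):
        """Eq. (1.9): e.g. the resolution interval [a, b] of a digital display."""
        return (b - a) / (2.0 * math.sqrt(3.0))

    def u_triangular(a, b):
        """Eq. (1.11): e.g. the finest division of an analog scale."""
        return (b - a) / (2.0 * math.sqrt(6.0))

    print(u_gaussian(sigma=0.4, n_samples=25))   # 0.08
    print(u_uniform(a=0.0, b=0.1))               # ~0.0289
    print(u_triangular(a=0.0, b=0.1))            # ~0.0204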
The definition of the typical PDFs entails additional statistical concepts not introduced so far, such as the parameters $\mu$ or $\sigma$ in Eqs. (1.6) and (1.8). Mathematically, those parameters are grouped under the notion of moment, which serves to quantitatively describe the shape of a function. Formally, the $k$-th moment of a continuous PDF about a point $\hat{\mu}$ is defined as [9]:

$\mu_k = \displaystyle\int_{-\infty}^{\infty} (\varepsilon - \hat{\mu})^{k} f(\varepsilon)\, d\varepsilon$  (1.12)

The parameter $\mu$ corresponds to the centered ($\hat{\mu} = 0$) first moment ($k = 1$) of a PDF:

$\mu = \displaystyle\int_{-\infty}^{\infty} \varepsilon\, f(\varepsilon)\, d\varepsilon$,  (1.13)

and it represents the average value of a given distribution, or the mean. The parameter $\sigma^2$ equates to the mean-centered ($\hat{\mu} = \mu$) second moment ($k = 2$) of a PDF:

$\sigma^2 = \displaystyle\int_{-\infty}^{\infty} (\varepsilon - \mu)^{2} f(\varepsilon)\, d\varepsilon$,  (1.14)

and is known as the variance of the distribution ($\sigma^2 = \mathrm{Var}$). The parameter $\sigma$ itself, which is unit-consistent with $\mu$, is termed the standard deviation, and quantifies the scattering of a distribution around its mean. Although higher-order moments provide additional morphological information on the distribution, the mean and the standard deviation suffice for the purpose of uncertainty analysis.

Running an infinite number of experiments constitutes an idealization, and on practical grounds the PDFs are not continuous, but discrete. They are obtained not from the entire population that would result from the idealized set of infinite experiments, but from the finite sampling space that represents the ones actually undertaken. As such, the moments obtained from that finite sampling space do not describe the shape of the theoretical PDF faithfully but, rather, constitute estimators of its moments. If a sufficiently large number of samples is taken, as may be the case for random errors, the estimators will lie close enough to the theoretical moments for the differences to be regarded as negligible. Otherwise, as happens for fixed errors, the experimentalist is forced to guess the PDF that would result from a hypothetical repetitive pattern of those biases. The main difference between single- and multiple-sample experiments, at a mathematical level, is to be found here: multiple-sample experiments are liable to follow a practically discrete, but conceptually continuous, PDF, whereas single-sample tests are not. However, the point of describing the different replication levels in Section 1.5 is to justify that the distinction between fixed and random errors is formally relative; in fact, fixed errors may be thought of as random errors that have not been sampled the necessary number of times to obtain a statistically relevant PDF. Written in simpler terms [3]: fixed errors are also accepted to own a theoretical PDF, similarly to random errors. With all this, the discrete estimators of the mean and the (squared) standard deviation are written as follows [9]:

$\bar{X} = \dfrac{1}{N} \displaystyle\sum_{i=1}^{N} X_i$,  (1.15)

$S^2 = \dfrac{1}{N - 1} \displaystyle\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^{2}$  (1.16)

The aim of the estimators is to provide the best estimate of a measurand, and the magnitude of the error in that estimation. Employing Eq. (1.15) as the best estimate is straightforward; agreeing on an estimator for the errors, which ultimately stand for each of the $\delta X_i$ mentioned in Eq. (1.3), is not. The problem with considering the standard deviation as the basic uncertainty estimator is that it is inconsistent for errors coming from different distributions [3]. That inconsistency is found at the definition level. As a data-scattering quantifier, the standard deviation is to be understood as follows: a value of one standard deviation around the mean encloses a certain area of the underlying PDF; that area is the probability of an error value lying in the range $[\mu - \sigma, \mu + \sigma]$.
For the typical PDFs in Eqs. (1.6), (1.8) and (1.10), those probabilities are approximately 68 %, 58 % and 65 %, respectively (see Figs. 1.3 to 1.5).

Usually, the experimentalist asks for higher probability rates. A confidence level of 95 % is commonplace when reporting experiments, which is to say that, regardless of the PDF, the scattering is quantified with a 95 % probability of finding future error values lying within the resulting confidence interval around the mean. The advantages of treating uncertainties with such a method are that it homogenizes the differences coming from different distributions, allows propagating the uncertainties with constant probability to the results, and provides the experimentalist with a flexible parameter to play with, namely the settable value of the confidence level. Bearing such considerations in mind, measurands are meant to be reported in the following manner [3]:

$X = \bar{X} \pm S \quad (b \text{ to } 1)$,  (1.17)

where $\bar{X}$ and $S$ are the arithmetic mean and standard deviation estimators, respectively, and the expression $b$ to 1 stands for the odds that the experimentalist would be willing to bet that the error is less than $S$. The parameter $b$ and the confidence level are related as $b = 1 / (1 - \gamma/100)$, with $\gamma$ being the chosen level. For the case $\gamma = 95\,\%$, the previous expression yields $b = 20$, which provides the odds 20 to 1 for the measurand-reporting expression. In addition to the interpretations given so far, such an expression tells that, for consecutive trials of an experiment, the error committed in the measurand value is expected, 19 times out of 20, to be smaller than $S$ in absolute value. For the typical Gaussian, uniform and triangular distributions, the 95 % confidence level uncertainties are related to the standard uncertainties expressed in Eqs. (1.7), (1.9) and (1.11) by constant factors of approximately 2, 1.65 and 1.90, respectively.

Reports [3, 8] show that, if a result can be linearized as in Eq. (1.3), with each of the measurands in the expression being independent and expressed as in Eq. (1.17), then a quadratic combination of uncertainty intervals (with $\delta X_i = S_i$) allows propagating the uncertainties to the result with constant probability:

$S_R = \sqrt{\displaystyle\sum_{i=1}^{N} \left( \dfrac{\partial R}{\partial X_i}\, S_i \right)^{2}}$  (1.18)

With such an approach, fixed and random errors of measurands are kept separated until the last step of computing the uncertainty of a result is undertaken. Thus, Eq. (1.18) is applied separately for fixed and random terms, namely $S_{R|\mathrm{fixed}}$ and $S_{R|\mathrm{random}}$. The overall uncertainty in a result is given as [1]:

$X_R = \sqrt{S_{R|\mathrm{fixed}}^2 + \left(t\, S_{R|\mathrm{random}}\right)^2}$,  (1.19)

where $S_{R|\mathrm{fixed}}$ and $S_{R|\mathrm{random}}$ are calculated from Eq. (1.18). In the original formulation, the parameter $t$ is the Student-t value, which depends on the degrees of freedom used in estimating each of the $S$ factors. For relatively large samples ($N > 30$), $t$ may be assumed to be approximately 2; otherwise, the Welch-Satterthwaite formula is to be applied [11, 1]. The purpose of the Student-t parameter is to account for the difference between the statistical parameter (coming from a continuous PDF) and its estimator (discrete), as mentioned before. As the 95 % confidence intervals already account for that difference, the approach undertaken herein does not employ the Student-t parameter, and the formulation changes slightly by introducing the random terms directly through their respective 95 % intervals $u_{R|\mathrm{random}_i}$:

$S_{R|\mathrm{random}} = \sqrt{\displaystyle\sum_{i=1}^{N} \left( u_{R|\mathrm{random}_i} \right)^{2}}$  (1.20)

As the errors are propagated with constant probability, the Student-t parameter is absent from the final expression:

$X_R = \sqrt{S_{R|\mathrm{fixed}}^2 + S_{R|\mathrm{random}}^2}$  (1.21)

The treatment of errors developed so far serves to systematically account for the different sources and their propagation to results. When doing so, it is common practice to provide the broadest uncertainty intervals in order to keep the measurements as conservative as possible. However, such a practice hinders the original purpose of uncertainty analysis: if results from different set-ups or facilities are to be compared, shorter uncertainty intervals allow for more precise comparisons, easily detecting differences among tests and capturing misleading measurements. Thus, the claim made herein is to employ the shortest uncertainty intervals available when reporting the measured data.
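To close the section, a worked numerical sketch of the propagation scheme of Eqs. (1.18) and (1.21) is given below for a hypothetical result, the dynamic pressure q = 0.5 rho v^2 used earlier as an example; every uncertainty figure is invented for illustration and the sensitivities are written analytically. Keeping the fixed and random contributions separate until the final combination mirrors the procedure described above.

    import math

    rho, v = 1.2, 30.0                       # measured values [kg/m^3], [m/s]
    q = 0.5 * rho * v ** 2                   # best estimate of the result

    # Analytical error sensitivities dq/drho and dq/dv at the operating point.
    sens = [0.5 * v ** 2, rho * v]

    u_fixed = [0.005, 0.2]                   # 95 % intervals of the fixed errors
    u_random = [0.010, 0.4]                  # 95 % intervals of the random errors

    def quadrature(sensitivities, intervals):
        """Constant-probability combination of Eq. (1.18)."""
        return math.sqrt(sum((s * u) ** 2 for s, u in zip(sensitivities, intervals)))

    s_q_fixed = quadrature(sens, u_fixed)
    s_q_random = quadrature(sens, u_random)
    x_q = math.sqrt(s_q_fixed ** 2 + s_q_random ** 2)    # Eq. (1.21)
    print(f"q = {q:.1f} +/- {x_q:.1f} Pa (95 % confidence)")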
1.7. Concluding Remarks

It has been shown that the historically growing importance of experimental uncertainty analysis can be addressed by a systematic procedure resting on the concept of replication levels. Those levels are delimited by the nature of the error sources that enter the experimental test. The 0th order replication level considers merely the reading-interpolation error committed at the measuring device's scope. The 1st order level adds potential system/sensor interactions, which are accounted for by analytical-empirical correlations, together with the timewise jitter. The broadest level, namely the Nth order one, encompasses the uncertainties coming from possible manufacturing defects or calibration errors. The former two levels are calculated with tools coming from the field of statistical analysis, whereas the latter contribution can only be estimated, by inference, from manufacturers' specification sheets. In any case, the mathematical strategy that backs up the mentioned procedure assumes that the combination of different uncertainty intervals is to be performed in a probability-preserving manner. This means that, when certain measurands are combined to yield a derived magnitude, the uncertainty interval of such a magnitude is meant to own a confidence level matching that of the original measurands. This approach provides a way of building a hierarchical classification of magnitudes depending on their functional relation to the basic measurands, with the overall set of magnitudes owning the same confidence level regarding their particular uncertainty intervals. These intervals can be pieced down into a number of contributors that, if traced back, correspond to the basic measurands.

Acknowledgements

The authors gratefully acknowledge the financial support of the Department of Education of the Basque Government through the Research Grant [PRE_2017_1_0178].

References

[1]. R. B. Abernethy, R. P. Benedict, R. B. Dowdell, ASME measurement uncertainty, J. Fluids Eng., Vol. 107, 1985, pp. 161-164.
[2]. J. B. Johnson, Thermal agitation of electricity in conductors, Phys. Rev., Vol. 32, 1928, pp. 97-109.
[3]. S. J. Kline, F. A. McClintock, Describing uncertainties in single-sample experiments, Mech. Eng., 1953, pp. 3-8.
[4]. Y. Mishin, Thermodynamic theory of equilibrium fluctuations, Annals of Physics, Vol. 363, 2015, pp. 48-97.
[5]. R. J. Moffat, The measurement chain and validation of experimental measurements, in Proceedings of the 6th Congress of the International Measurement Confederation (ACTA IMEKO'73), Vol. 1, Dresden, Germany, 1973, pp. 45-53.
[6]. R. J. Moffat, Contributions to the theory of single-sample uncertainty analysis, J. Fluids Eng., Vol. 104, 1982, pp. 250-258.
[7]. R. J. Moffat, Using uncertainty analysis in the planning of an experiment, J. Fluids Eng., Vol. 107, 1985, pp. 173-178.
[8]. R. J. Moffat, Describing the uncertainties in experimental results, Exp. Therm. Fluid Sci., Vol. 1, 1988, pp. 3-17.
[9]. J. Olarrea Busto, M. Cordero Gracia, Estadística, 45 Problemas Útiles, 1st Edition, Garcia Maroto Editores, Madrid, 2009.
[10]. K. Pearson, On the mathematical theory of errors of judgment, with special reference to the personal equation, Philosophical Transactions of the Royal Society of London, Series A, Vol. 198, 1902, pp. 235-299.
[11]. B. L. Welch, The generalization of Student's problem when several different population variances are involved, Biometrika, Vol. 34, 1947, pp. 28-35.
[12]. T. Arts, J.-M. Buchlin, Temperature measurements, Chapter 4, in Measurement Techniques in Fluid Dynamics, 3rd Edition, von Karman Institute for Fluid Dynamics, Brussels, 2009.