Making Time: A Study in the Epistemology of Measurement

Eran Tal

This is a pre-published version. To cite or quote this article please refer to the published version: Tal, E. (2016) “Making Time: A Study in the Epistemology of Measurement”, British Journal for the Philosophy of Science 67, pp. 297-335. (http://doi.org/10.1093/bjps/axu037)

Abstract: This article develops a model-based account of the standardization of physical measurement, taking the contemporary standardization of time as its central case-study. To standardize the measurement of a quantity, I argue, is to legislate the mode of application of a quantity-concept to a collection of exemplary artefacts. Legislation involves an iterative exchange between top-down adjustments to theoretical and statistical models regulating the application of a concept, and bottom-up adjustments to material artefacts in light of remaining gaps. The model-based account clarifies the cognitive role of ad hoc corrections, arbitrary rules and seemingly circular inferences involved in contemporary timekeeping, and explains the stability of networks of standards better than its conventionalist and constructivist counterparts.

Introduction

The reproducibility of quantitative results in the physical sciences depends on the availability of stable measurement standards. The maintenance, dissemination and improvement of standards are central tasks in metrology, the science of measurement and its application1. Under the guidance of the Bureau International des Poids et Mesures (International Bureau of Weights and Measures, or BIPM) near Paris, a worldwide network of metrological institutions is responsible for the ongoing comparison and adjustment of standards. Among the various standardization projects in which metrologists are engaged, contemporary timekeeping is often considered the most successful, with the vast majority of national time signals agreeing well within a microsecond and stable to within a few nanoseconds a month2. The standard measure of time currently used in almost every context of civil and scientific life is known as Coordinated Universal Time or UTC3. UTC is the product of an international cooperative effort by time centers that rely on state-of-the-art atomic clocks spread throughout the globe. These clocks are designed to measure the frequencies associated with specific atomic transitions, including the cesium transition, which has defined the second since 19674. The success of time standardization makes it an instructive case study for the epistemology of measurement, that is, for the study of the relationships between
measurement and knowledge. Central topics that fall under the purview of the epistemology of measurement include the conditions under which measurement produces knowledge; the content, scope, justification and limits of such knowledge; the reasons why particular methodologies of measurement and standardization succeed or fail in supporting particular knowledge claims; and the relationships between measurement and other knowledge-producing activities such as observation, theorizing, experimentation, modeling and calculation. While these topics are certainly not new to philosophy, in recent years they have been receiving focused and systematic attention from a growing number of scholars5. Intended as a contribution to the burgeoning body of literature on the epistemology of measurement, this article will attempt to clarify how successful standardization projects produce knowledge, what kind of knowledge they produce and why such knowledge should be deemed reliable. I will develop my account of standardization by examining the methods currently used to standardize time and by tracing the sources of those methods’ reliability. A central desideratum for the reliability of measurement procedures is stability, conceived as a double-aspect notion: the stability of a single measuring instrument is its tendency to produce the same measurement outcome over repeated runs, whereas the stability of a collection of instruments is their tendency to reproduce each other’s outcomes. In the case of clocks, the stability of a single clock is its tendency to ‘tick’ at the same frequency over time, whereas the stability of an ensemble of clocks is their tendency to ‘tick’ at the same frequency as each other6. What accounts for the overwhelming stability of contemporary timekeeping standards? Or, to phrase the question somewhat differently, what factors enable a variety of standardization laboratories around the world to closely reproduce Coordinated Universal Time on an ongoing basis? I will use this question as a test for the epistemology of standardization. A satisfactory account of standardization would explain how the various methods currently employed to standardize time succeed in maintaining a stable global system of timekeeping and why these methods are justified from an epistemic perspective. The various explanans one could offer for stability may be divided into two broad kinds. First, one could appeal to the discovery of empirical regularities in the behaviour of the atomic clocks that keep world time. Second, one could appeal to agreement among the metrological institutions that interpret and correct these atomic clocks. The adequate combination of these two sorts of explanans and the limits of their respective contribution to stability are contested issues among philosophers and sociologists of science. Following a discussion of the central methods and challenges involved in contemporary timekeeping (Section 1), I will explore the strengths and limits of two existing approaches to the study of standardization, namely conventionalism and constructivism (Section 2). While both approaches offer valuable insights into the practice and semantics of standardization, I will show that they focus too narrowly on either natural or social explanations for the stability of standards. The third and final section will present a novel, model-based account of standardization.
As I will argue, facts about the stability of measuring instruments are model-based, that is, sensitive to the theoretical and statistical assumptions under which those instruments are modelled. By making occasional modifications to the way measurement standards are modelled, standardization bureaus are able to regulate the mode of application of a quantity-concept even when the concept’s definition remains unchanged. The model-based account will explain how both natural and social elements are mobilized through metrological practice, and provide an epistemic justification for the ad hoc and seemingly circular aspects of time standardization. In so doing, the model-based account will shed light on the content and quality of knowledge produced by successful standardization projects.

1. Making time universal

1.1 Stability and accuracy

The measurement of time relies predominantly on counting the periods of cyclical processes, namely clocks7. Until the late 1960s, time was standardized by recurrent astronomical phenomena such as the apparent solar noon, and artificial clocks served only as secondary standards. Contemporary time standardization relies on atomic clocks, i.e. instruments that produce an electromagnetic signal that tracks the frequency of a particular atomic resonance. The two central desiderata for a reliable clock are known in the metrological jargon as frequency stability and frequency accuracy. The frequency of a clock is said to be stable if it ticks at a uniform rate, that is, if its cycles mark equal time intervals. The frequency of a clock is said to be accurate if it ticks at the desired rate, i.e. a specified number of cycles per second8. In practice, no clock has a perfectly stable frequency. The very notion of a stable frequency is an idealized one, derived from the theoretical definition of the standard second. Since 1967 the second has been defined as the duration of exactly 9,192,631,770 periods of the radiation corresponding to a hyperfine transition of cesium-133 in the ground state9. This definition plays a double role: it both defines the duration of the standard second and stipulates that a particular frequency associated with the cesium atom is uniform, i.e. that its periods are equal to one another. However, the frequency in question is a highly idealized construct. As far as the definition is concerned, the cesium atom in question is at rest at a temperature of zero kelvin with no background fields influencing the energy associated with the transition. It is only under these ideal conditions that a cesium atom would constitute a perfectly stable clock10. There are several different ways to construct clocks that would approximately satisfy – or in metrological jargon, ‘realize’ – the conditions specified by the definition. Different clock designs result in different trade-offs between frequency accuracy, frequency stability and other desiderata, such as ease of maintenance, ease of comparison and low financial cost. Primary realizations of the second are designed for optimal accuracy, i.e. minimal uncertainty with respect to the rate at which they ‘tick’. As of 2011, twelve primary realizations are maintained by leading national metrological laboratories worldwide11. These clocks are special by virtue of the fact that every known influence on their output frequency is controlled and rigorously modelled, resulting in detailed ‘uncertainty budgets.’ The clock design implemented in most primary standards is the ‘cesium fountain’, so called because cesium atoms are tossed up in a vacuum and fall down due to gravity12. The complexity of cesium fountains, however, and the need to routinely monitor their performance and environment prevent them from running continuously. Instead, each cesium fountain clock usually operates only for a few weeks at a time, about five times a year. The intermittent operation of cesium fountain clocks means that they cannot be used directly for timekeeping. Instead, they are used to calibrate secondary standards, i.e. atomic clocks that are less accurate but run continuously for years. About 400 such secondary standards are currently employed to keep world time13.
These clocks are highly stable in the short run, meaning that the ratios between the frequencies of their ‘ticks’ remain very nearly constant over weeks and months. But over longer periods the frequencies of secondary standards exhibit drifts, both relative to each other and in relation to the frequencies of primary standards.
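To fix intuitions about the two desiderata just introduced, the following sketch is illustrative only: the function names, sample values and the scatter measure are my own, and metrologists quantify frequency stability with the Allan deviation rather than a simple standard deviation. It shows how a clock can be stable yet inaccurate, or accurate on average yet unstable, relative to the defined cesium frequency.

```python
# Illustrative sketch (not the BIPM's actual statistics): distinguishing frequency
# accuracy from frequency stability for a set of hypothetical frequency readings.
F_CS = 9_192_631_770  # Hz, the defining frequency of the cesium-133 hyperfine transition

def fractional_offsets(measured_frequencies):
    """Fractional frequency offsets y = (f - f0)/f0 relative to the defined frequency."""
    return [(f - F_CS) / F_CS for f in measured_frequencies]

def accuracy_and_stability(measured_frequencies):
    """Accuracy ~ mean offset from the defined frequency; 'stability' here is a crude
    scatter measure, a stand-in for the Allan deviation used in practice."""
    y = fractional_offsets(measured_frequencies)
    mean_y = sum(y) / len(y)
    scatter = (sum((yi - mean_y) ** 2 for yi in y) / (len(y) - 1)) ** 0.5
    return mean_y, scatter

# A clock can be stable but inaccurate: a constant offset of ~2e-13 with zero scatter...
print(accuracy_and_stability([F_CS + 0.002] * 5))
# ...or accurate on average but unstable: offsets that scatter around zero.
print(accuracy_and_stability([F_CS - 0.003, F_CS + 0.003, F_CS - 0.003, F_CS + 0.003, F_CS]))
```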

1.2 The challenges of synchronicity

As neither primary nor secondary standards ‘tick’ at exactly the same rate, metrologists are faced with a variety of real durations that can all be said to fit the definition of the second with some degree of uncertainty. Metrologists are therefore faced with the task of realizing the second based on indications from multiple, and often divergent, clocks14. To tackle this challenge, metrologists construct idealized theoretical representations of their instruments. From a theoretical point of view, the unified timescale chosen as the basis for international timekeeping is called terrestrial time. Corresponding to coordinate time on the earth’s surface, this timescale is chosen so that differences in proper time among local clocks can be accounted for. Ideally, one can imagine all of the atomic clocks that participate in global timekeeping as located on a rotating surface of equal gravitational potential that approximates the earth’s sea level. Such a surface is called a ‘geoid’, and terrestrial time is the time a perfectly stable clock on that surface would tell when viewed by a distant observer. However, much like the definition of the second, the definition of terrestrial time is highly idealized and cannot be used directly to evaluate the accuracy of any concrete clock. Moreover, unlike in the case of primary standards, there is no practical way to evaluate the individual uncertainties of secondary standards from first theoretical principles. The solution is to introduce an operational measure of time that would approximate terrestrial time, while also maintaining a known relation to the indications of concrete clocks. This intermediary measure of time is Coordinated Universal Time15. Coordinated Universal Time is a measure of time whose scale interval – its basic unit – is intended to remain as close as is practically possible to a standard second on the rotating geoid. Yet UTC is not a clock; it does not actually ‘tick’, and cannot be continuously read off the display of any instrument. Instead, UTC is an abstract measure of time: a set of numbers calculated monthly in retrospect, based on the readings of participating clocks16. At the BIPM near Paris, the indications of secondary standards from over sixty national laboratories are recorded at five-day intervals, and used to calculate UTC. The end result of the calculation is a table of numbers that indicate how late or early each nation’s ‘master time’, its local approximation of UTC, has been running in the past month. Typically ranging from a few nanoseconds to a few microseconds, these numbers allow national metrological institutes to tune their clocks to internationally accepted time. Table 1 is an excerpt from the monthly publication issued by the BIPM in which deviations from UTC are reported for each national laboratory.
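To give a sense of why the geoid matters, here is a standard order-of-magnitude estimate (textbook gravitational time dilation, not a figure taken from this article) of how a clock’s elevation shifts its frequency:

```latex
\frac{\Delta f}{f} \;\approx\; \frac{g\,\Delta h}{c^{2}}
\;\approx\; \frac{9.8\ \mathrm{m\,s^{-2}}\times \Delta h}{\left(3\times 10^{8}\ \mathrm{m\,s^{-1}}\right)^{2}}
\;\approx\; 1.1\times 10^{-16}\ \text{per metre of elevation}
```

On this estimate, a laboratory one kilometre above sea level runs fast by roughly one part in 10^13, which is why clock frequencies must be referred to a common reference surface before they can be meaningfully compared.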

1.3 Bootstrapping reliability

UTC is calculated in three major steps, each involving its own set of conceptual and methodological difficulties. The first step involves processing data from hundreds of continually operating atomic clocks and calculating the free-running time scale, EAL (Échelle Atomique Libre). EAL is an average of clock indications weighted by frequency stability. Finding out which clocks are more stable than others requires some higher standard of stability against which clocks would be compared, but arriving at such a standard is the very goal of the calculation. For this reason EAL itself is used as the standard of stability for the clocks contributing to it. Every month, the BIPM updates the weight of each clock depending on how well it predicted the weighted average of the EAL clock ensemble in the past twelve months. The updated weight is then used to average clock data in the next cycle of calculation. This method promotes clocks that are stable relative to each other, while clocks whose stability relative to the overall average falls below a fixed threshold are given a weight of zero, i.e. removed from that month’s calculation. The average is then recalculated based on the remaining clocks. The process of removing offending clocks and recalculating is repeated exactly four times in each monthly cycle of calculation17.

Table 1: Excerpt from Circular-T (BIPM 2013), a monthly report through which the International Bureau of Weights and Measures disseminates Coordinated Universal Time (UTC) to national standardization institutes. The numbers in the first seven columns indicate differences in nanoseconds between UTC and each of its local approximations at five-day intervals. The last three columns indicate type-A, type-B and total uncertainties for each comparison. (Only data associated with the first twenty laboratories is shown.)
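The following is a minimal sketch of the iterative reweighting just described, with hypothetical readings, weights and threshold. It uses a clock’s deviation from the ensemble average as a crude stand-in for the stability criterion, and omits the prediction-based weighting, the weight cap and the ‘cushion’ terms discussed below.

```python
# Minimal sketch of the iterative reweighting described above. The actual BIPM
# calculation is considerably more elaborate; the readings, weights and threshold
# below are schematic stand-ins.

def weighted_average(readings, weights):
    total_w = sum(weights.values())
    return sum(weights[c] * readings[c] for c in readings) / total_w

def ensemble_average(readings, weights, threshold, n_iterations=4):
    """readings: clock -> reading (e.g. time offset in ns); weights: clock -> initial weight.
    Clocks deviating from the ensemble average by more than `threshold` are given zero
    weight and the average is recomputed; the removal-and-recalculation step is repeated
    four times, mirroring the monthly cycle described in the text."""
    weights = dict(weights)
    for _ in range(n_iterations):
        avg = weighted_average(readings, weights)
        for clock, reading in readings.items():
            if abs(reading - avg) > threshold:
                weights[clock] = 0.0  # 'offending' clock removed from this month's average
    return weighted_average(readings, weights), weights

# Hypothetical example: three well-behaved clocks and one outlier.
readings = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 80.0}
weights = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
print(ensemble_average(readings, weights, threshold=20.0))
# -> (11.0, {'A': 1.0, 'B': 1.0, 'C': 1.0, 'D': 0.0}): the outlier is zero-weighted.
```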


Though effective in weeding out ‘noisy’ clocks, the weight-updating algorithm introduces new perils to the stability of world time. First, there is the danger of a positive feedback effect, i.e. a case in which a few clocks become increasingly influential in the calculation simply because they have been dominant in the past. In this scenario, EAL would become tied to the idiosyncrasies of a handful of clocks, thereby increasing the likelihood that the remaining clocks would drift farther away from EAL. For this reason, the BIPM limits the weight allowed to any clock to a maximum of about 0.7 percent18. Besides positive feedback, a second source of potential instability is the abruptness with which updated clock weights take effect every month. Because different clocks ‘tick’ at slightly different rates, a sudden change in weights results in a sudden change of frequency of the weighted average. To avoid frequency jumps, the BIPM adds ‘cushion’ terms to the weighted average based on a prediction of that month’s jump19. As a third precautionary measure, the BIPM assigns a zero weight to new clocks for a four-month test interval before authorizing them to exert influence on international time.

While the weighting algorithm stabilizes the average to some extent, further stabilization requires modifications to the composition of the clock ensemble. As mentioned above, EAL is maintained by a free-running ensemble of secondary standards. Today the majority of these clocks are commercially manufactured by Hewlett-Packard or one of its offshoot companies, Agilent and Symmetricom. These clocks have proven to be exceptionally stable relative to each other, and the number of HP clocks that participate in UTC has been steadily increasing since their introduction into world timekeeping in the early 1990s (Petit 2004, 208)20.

The results of averaging depend not only on the choice of weighting algorithm and clock manufacturer, but also on the selection of participating laboratories. Only laboratories in nations among the eighty members and associates of the BIPM are eligible for participation in the determination of EAL. Funded by membership fees, the BIPM aims to balance the threshold requirements of metrological quality with the financial benefits of inclusiveness. Membership requires national diplomatic relations with France, the depositary of the intergovernmental treaty known as the Metre Convention (Convention du Mètre). This treaty authorizes the BIPM to standardize industrial and scientific measurement. The BIPM encourages participation in the Metre Convention by highlighting the advantages of recognized metrological competence in the domain of global trade, and by offering reduced fees to smaller states and developing countries21. Economic trends and political considerations thus influence which countries contribute to world time, and indirectly which atomic clocks are included in the calculation of UTC.

Comparing clocks in different locations around the globe requires a reliable method of fixing the interval of comparison. This is another major challenge to globalizing time. Were the clocks located in the same room, they could be connected by optical fibres to a counter that would indicate the difference, in nanoseconds, among their readings every five days. Over large distances, time signals are transmitted via satellite. In most cases Global Positioning System (GPS) satellites are used, thereby ‘linking’ the readings of participating clocks to GPS time.
But satellite transmissions are subject to delays, which fluctuate depending on atmospheric conditions. Moreover, GPS time is itself a relatively unstable derivative of UTC. These factors introduce uncertainties to clock comparison data known as time transfer noise. Transfer noise, which increases with a laboratory’s distance from Paris, is often much larger than the local instabilities of contributing clocks. This means that the stability of UTC is in effect limited by satellite transmission quality.
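The type-A, type-B and total uncertainty columns mentioned in the caption of Table 1 reflect the standard practice of combining independent uncertainty contributions in quadrature. The toy example below (with hypothetical values) illustrates why, when satellite-link noise dwarfs local clock instability, the link effectively sets the comparison uncertainty.

```python
import math

def combined_uncertainty(u_clock_ns, u_link_ns):
    """Quadrature combination of independent uncertainty contributions (hypothetical values)."""
    return math.sqrt(u_clock_ns ** 2 + u_link_ns ** 2)

# When the satellite link contributes 5 ns and the local clock only 0.3 ns, the total
# is ~5.01 ns: essentially the link noise alone.
print(combined_uncertainty(u_clock_ns=0.3, u_link_ns=5.0))
```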

1.4 Divergent standards

Despite the multiple means employed to stabilize the weighted average of clock readings, additional steps are necessary to guarantee stability, due to the fact that the frequencies of continuously operating clocks tend to drift away from those of primary standards. In the late 1950s, when atomic time scales were first calculated, they were based solely on free-running clocks. Over the course of the following two decades, technological advances revealed that universal time was running too fast: the primary standards that realized the second were beating slightly slower than the clocks that kept time. To align the two frequencies, in 1977 the second of UTC was artificially lengthened by one part in 10^12. At this time it was decided that the BIPM would make regular small corrections that would ‘steer’ the atomic second toward its officially realized duration, in an attempt to avoid future shocks22. This decision effectively split atomic time into two separate scales, each ‘ticking’ with a slightly different second: on the one hand, the weighted average of free-running clocks (EAL), and on the other the continually corrected (or ‘steered’) International Atomic Time, TAI (Temps Atomique International).

The monthly calculation of steering corrections is a remarkable algorithmic feat, relying upon intermittent calibrations against the world’s twelve primary standards. These calibrations differ significantly from one another in quality and duration23. For this reason the BIPM assigns weights, or ‘filters’, to each calibration episode depending on its quality. But filtering is not sufficient: primary standards do not have exactly the same frequency, giving rise to the concern that the duration of the UTC second could fluctuate depending on which primary standard contributed the latest calibration. To circumvent this, the steering algorithm is endowed with ‘memory’, i.e. it extrapolates data from past calibration episodes into times in which primary standards are offline. This extrapolation must itself be time-dependent, as noise limits the capacity of free-running clocks to ‘remember’ the frequency to which they were last calibrated. The BIPM therefore constructs statistical models for the relevant noise factors and uses them to derive a temporal coefficient, which is then incorporated into the calculation of ‘filters’24. This steering algorithm allows metrologists to track the difference in frequency between free-running clocks and primary standards. Ideally, the difference in frequency would remain stable, i.e. there would be a constant ratio between the ‘seconds’ of the two measures. In this ideal case, a simple linear transformation of EAL would provide us with a continuous timescale as accurate as a cesium fountain. In practice, EAL continues to drift. During the decade prior to 2009 its second lengthened by a yearly average of about 4 parts in 10^16 relative to primary standards25. This presents metrologists with a twofold problem: first, they have to decide how fast they want to ‘steer’ world time away from the drifting average. Overly aggressive steering would destabilize UTC, while too small a correction would cause clocks the world over to slowly diverge from the official (primary) second. Indeed, the BIPM has made several modifications to its steering policy in the past three decades in an attempt to optimize both smoothness and accuracy26. The other aspect of the problem is the need to stabilize the frequency of EAL itself.
One solution to this aspect of the problem is to replace clocks in the ensemble with others that drift to a lesser extent. This task has largely been accomplished in the past two decades by the proliferation of HP clocks, but some instability remained. Since 2011, the calculation of EAL has incorporated a nonlinear prediction algorithm that compensates for clock instabilities in advance, a method which appears to have significantly reduced its overall frequency drift (Panfilo 2012).
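A schematic sketch of the ‘slicing’ of steering corrections may help here. The cap on the monthly step and the initial frequency offset below are hypothetical, and the actual calculation uses the filtered, memory-based calibrations described above rather than a fixed cap.

```python
# Schematic sketch of monthly frequency steering (hypothetical numbers). The steered
# scale is obtained from the free-running average by adding an accumulated frequency
# correction; each month the correction may change by at most `max_step`, so that a
# large calibration gap is 'sliced' into small increments rather than applied at once.

def monthly_steering(gap_to_primary, max_step):
    """Portion of the remaining fractional frequency gap applied this month."""
    return max(-max_step, min(max_step, gap_to_primary))

eal_offset = 2e-14   # hypothetical: free-running second too long by 2 parts in 10^14
correction = 0.0     # accumulated steering correction applied on top of the free-running scale
max_step = 2e-15     # hypothetical cap on each monthly correction

for month in range(1, 13):
    gap = -(eal_offset + correction)  # remaining frequency gap relative to the primary second
    correction += monthly_steering(gap, max_step)
    print(month, f"steered offset from primary second: {eal_offset + correction:.1e}")
# With this hypothetical cap, absorbing the full offset takes ten monthly steps,
# after which the remaining offset is negligible.
```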

Disagreement among standards is not the sole reason for frequency steering. Abrupt changes in the ‘official’ duration of the second as realized by primary standards may also trigger steering corrections. These abrupt changes can occur when metrologists modify the ways in which they model primary standards. For example, in 1996 the metrological community reached a consensus on the effects of thermal background radiation on cesium fountains, previously a much-debated topic. A new systematic correction, which shortened the second by approximately 2 parts in 10^14, was subsequently applied to primary standards. While this difference may seem minute, it took more than a year of monthly steering corrections for UTC to ‘catch up’ with the suddenly shortened second27.
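For a sense of scale (illustrative arithmetic, not a figure from the article), a fractional frequency offset of 2 parts in 10^14 corresponds to an accumulating time error of

```latex
2\times 10^{-14} \times 86\,400\ \mathrm{s/day} \;\approx\; 1.7\ \mathrm{ns\ per\ day}
```

or roughly 600 ns per year if left uncorrected: small in everyday terms, but large compared to the nanosecond-level differences between UTC and its local approximations reported in Circular-T.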

1.5 The leap second

With the calculation of TAI, the task of realizing a unified timescale based on the definition of the standard second is complete. TAI is considered to be a realization of terrestrial time, that is, an approximation of general-relativistic coordinate time at the earth’s sea level. However, a third and last step is required to keep UTC in step with traditional time as measured by the duration of the solar day. The mean solar day is slowly increasing in duration relative to atomic time due to gravitational interaction between the earth and the moon. To keep ‘noon UTC’ closely aligned with the apparent passage of the sun over the Greenwich meridian, a leap second is occasionally added to UTC based on astronomical observations. By contrast, TAI remains free of the constraint to match astronomical phenomena, and runs ahead of UTC by an integer number of seconds28.
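A minimal sketch of the resulting TAI–UTC relation is given below. The 35-second offset is the value that held around the period of the Circular-T excerpt cited above (mid-2012 to mid-2015); a current leap-second table should be consulted for other epochs.

```python
from datetime import datetime, timedelta

# Minimal sketch of the TAI-UTC relation: UTC lags TAI by an integer number of seconds
# that increases each time a leap second is inserted. The offset below is hard-coded
# for illustration and is only valid within the stated period.
TAI_MINUS_UTC = 35  # seconds; changes only when a leap second is announced

def tai_to_utc(tai: datetime) -> datetime:
    """Convert a TAI timestamp to UTC for an epoch within the offset's validity."""
    return tai - timedelta(seconds=TAI_MINUS_UTC)

print(tai_to_utc(datetime(2013, 6, 1, 12, 0, 35)))  # -> 2013-06-01 12:00:00
```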

2. The two faces of stability

2.1 An explanatory challenge

The global synchronization of clocks in accordance with atomic time is a considerable technological achievement. Coordinated Universal Time is disseminated to all corners of civil life, from commerce and aviation to telecommunication, in a manner that is seamless to the vast majority of its users29. The task of the remainder of this article is to explain how metrologists succeed in synchronizing clocks worldwide to Coordinated Universal Time. What are the sources of this measure’s efficacy in maintaining agreement among time centers? An adequate answer must account for the way in which the various ingredients that make up the calculation of UTC contribute to its success. In particular, the function of ad hoc corrections, rules of thumb and seemingly circular inferences prevalent in the production of UTC requires explanation. What role do these mechanisms play in stabilizing UTC, and is their use justified from an epistemic point of view? In tackling these questions, I will consider two explanans that have been traditionally proposed for the stability of networks of physical measurement standards:

(i) the empirical regularities exhibited by the behaviour of measurement standards;

(ii) the social coordination of policies for regulating and interpreting the behaviour of measurement standards.

When the first explanans is emphasized, standardization is viewed as the discovery of regularities in the behaviour of some physical phenomena and the exploitation of these
regularities for constructing stable measurement standards. The efficacy of standardization methods is accordingly explained by appealing to their cognitive function, namely, to their suitability for discovering regularities in empirical data. When the second explanans is emphasized, standardization is viewed as the social coordination of policies for manipulating and interpreting physical phenomena. The efficacy of standardization methods is then explained by their consensus-building function, namely by their ability to generate universal agreement about the proper ways to conduct such manipulation and interpretation. What is the correct combination of these two explanans, and what sort of account of standardization results from their combination? Answering these questions will shed light on the purpose and epistemic status of the algorithmic manoeuvres involved in the calculation of UTC. But the scope of these questions extends beyond the measurement of time. Answering them will involve an inquiry into the goals of standardization projects, the sort of knowledge such projects produce, and the reasons such projects succeed or fail. This inquiry itself falls under the wider field known as the epistemology of measurement, which has attracted increasing attention in recent years30. Below I will propose a model-based account of standardization. This account will offer a novel analysis of the interplay between cognitive and social aspects of standardization by appealing to the dual, descriptive and normative roles of metrological models. I will frame the model-based account by comparison to two earlier strands of scholarship on measurement. The first strand is commonly labelled ‘conventionalism’ and includes a variety of views expounded by figures such as Ernst Mach ([1883] 1919, 223-4), Henri Poincaré ([1898] 1958), Hans Reichenbach ([1927] 1958, 109-149) and Rudolf Carnap ([1966] 1995, 51-121), among others. The second strand, which I will label ‘constructivism’, belongs to the social studies of science, and will be represented here by Bruno Latour (1987, 247-257) and Simon Schaffer (1992). With the exception of Schaffer, none of the above-mentioned authors were primarily concerned with standardization but with questions concerning e.g. the relationship between theory and observation, the structure of space and time, and the processes governing the social acceptance of scientific claims. My goal here is not to offer a comprehensive commentary on their writings, but rather to use some of their insights concerning standardization as starting points for the development of an epistemology of standardization.

Before proceeding to the analysis, it is worthwhile clarifying which problems this article will not attempt to resolve. I will not be concerned with realism about measurement in its many forms. Questions such as ‘are there mind-independent facts about temporal uniformity?’, ‘does the success of time standardization provide grounds for believing in the existence of such facts?’ and ‘should statements about clock accuracy be understood literally as statements about closeness to truth?’ are outside the scope of the current discussion. This is noteworthy for two reasons. First, conventionalism and constructivism are traditionally labelled as ‘anti-realist’ positions, and their adoption as starting points for the discussion may give the mistaken impression that the model-based account embraces a similar metaphysical standpoint.
However, the lessons I will draw from conventionalist and constructivist literature concern the structure and dynamics of standardization rather than any claims about truth or reality. Second, debates concerning realism commonly invoke truth or reality as potential explanans of scientific success. By contrast, the epistemological approach to measurement adopted in this article aims to explain success and failure by appealing only to resources that are practically available to scientists when measuring and standardizing. Such resources include, for example, theories, models, definitions, statistical tools, materials, instruments, data, observations, calculations, methodological and linguistic conventions, and
institutional policies and regulations. Mind-independent truths about measurable quantities, e.g. the exact true value of a quantity or the true equality of two magnitudes, are not cognitively accessible to metrologists and thus cannot serve as epistemic standards for measurement. The existence or lack of such truths can only be inferred from measurement outcomes in retrospect with the help of additional, metaphysical assumptions that are extrinsic to the practice of measuring. Indeed, were metrologists required to evaluate the accuracy and stability of their measuring instruments against such truths, metrology would be mired in scepticism31. Such truths or their lack will therefore not be considered legitimate epistemological explanans of success or failure in generating a stable network of clocks. The model-based account will accordingly remain metaphysically neutral and will neither presuppose nor deny any sort of realism about measurable quantities.

2.2 Conventionalist explanations

Any plausible account of metrological knowledge must attend to the fact that metrologists enjoy some freedom in determining the correct application of the concepts they standardize, and should be able to clarify the sources and scope of this freedom. Traditionally, philosophers of science have taken standardization to consist in arbitrary acts of definition. Conventionalists like Mach, Poincaré, Carnap and Reichenbach stressed the arbitrary nature of the choice of congruence conditions, that is, the conditions under which magnitudes of certain quantities such as length and duration are deemed equal to one another. In the case of duration, such conditions amount to a criterion of uniformity in the flow of time. In his essay on “The Measure of Time” ([1898] 1958), Poincaré argued against the existence of a mind-independent criterion of temporal uniformity. Instead, he claimed that the choice of a standard measure of time is “the fruit of an unconscious opportunism” that leads scientists to select the simplest system of laws (ibid, 36). Reichenbach called these arbitrary choices of congruence conditions ‘coordinative definitions’ because they coordinate between the abstract concepts employed by a theory and the physical relations denoted by these concepts (Reichenbach 1927, 14)32. Prior to such coordinative definition there is no fact of the matter as to whether or not two given time intervals are equal (ibid, 116). Which physical process is deemed uniform (e.g. solar day, pendulum cycle, or radiation associated with an atomic transition) depends on considerations of convenience and simplicity in the description of empirical data rather than on the data themselves.

The standardization of time, according to conventionalists, involves a free choice of a coordinative definition for uniformity. It is worth highlighting three features of this definitional sort of freedom. First, it is an a priori freedom in the sense that its exercise is independent of experience. One may choose any uniformity criterion as long as the consequences of that criterion do not contradict one another. Second, it is a freedom only in principle and not in practice. For pragmatic reasons, scientists select uniformity criteria that make their descriptions of nature as simple as possible. The actual selection of coordinative definition is therefore strongly constrained by the results of empirical tests. Third, definitional freedom is singular in the sense that it is completely exhausted by a single act of exercising it. Though a definition can be replaced by another, each such replacement annuls the previous definition. In this respect acts of definition are essentially ahistorical. Once a coordinative definition of uniformity is specified, conventionalists hold that the truth or falsity of empirical claims concerning temporal uniformity is completely fixed. How uniformly a given clock ‘ticks’ relative to a specified uniformity criterion is purely a matter of
empirical fact. The remaining task for metrologists is only to discover which clocks ‘tick’ at a more stable rate relative to the chosen definition of uniformity and to improve those clocks that are found to be less stable. In Carnap’s own words:

If we find that a certain number of periods of process P always match a certain number of periods of process P’, we say that the two periodicities are equivalent. It is a fact of nature that there is a very large class of periodic processes that are equivalent to each other in this sense. (Carnap [1966] 1995, 82-3, my emphasis)

We find that if we choose the pendulum as our basis of time, the resulting system of physical laws will be enormously simpler than if we choose my pulse beat. […] Once we make the choice, we can say that the process we have chosen is periodic in the strong sense. This is, of course, merely a matter of definition. But now the other processes that are equivalent to it are strongly periodic in a way that is not trivial, not merely a matter of definition. We make empirical tests and find by observation that they are strongly periodic in the sense that they exhibit great uniformity in their time intervals. (ibid, 84-5, my emphases)

In contemporary timekeeping, the definition of the second also functions as a coordinative definition of uniformity. Recall that the current definition of the second, in addition to fixing a unit of time, also postulates that the period of electromagnetic radiation associated with a particular transition of the cesium atom is constant. Accordingly, a conventionalist like Carnap would explain the stability of contemporary timekeeping by a combination of two factors: on the social side, the worldwide agreement to define uniformity on the basis of the frequency of the cesium transition; and on the natural side, the fact that all cesium atoms under specified conditions have the same frequency associated with that particular transition. The universality of the cesium transition frequency is, according to conventionalists, a mind-independent empirical regularity that metrologists cannot influence but may only describe more or less simply.

How does the conventionalist explanation of stability fare with respect to the actual methods used to standardize time? From a conventionalist point of view, the main task facing metrologists is that of detecting which of the hundreds of atomic clocks used to standardize time ‘tick’ at a more stable rate relative to the defined cesium frequency. If the algorithmic manoeuvres employed in the calculation of UTC serve any epistemic purpose, it is to detect these stable clocks. A conventionalist, in other words, would view UTC as an indicator for regularities in the frequencies of clocks, regularities that may be described more or less simply but are themselves independent of human choice. The reproducibility of contemporary timekeeping would accordingly be explained by the reliability with which the UTC algorithm detects those underlying regularities.

The idea that UTC is a reliable indicator of mind-independent regularities gains credence from the fact that UTC is gradually ‘steered’ towards the frequency of primary standards. As previously mentioned, primary frequency standards are rigorously evaluated for uncertainties and compared to each other in light of these evaluations. The fact that the frequencies of different primary standards are consistent with each other within uncertainty bounds can be taken as an indication for the universal regularity of the cesium frequency. Assuming, as metrologists do33, that the long-term frequency stability of UTC over years is due mostly to the contribution of primary standards, one can plausibly make the case that the algorithm that produces UTC is a reliable detector of a natural regularity, namely, the fact that all cesium-133 atoms have the same frequency associated with the specified transition.

The conventionalist analysis nevertheless leaves unexplained the success of the mechanisms that keep UTC stable in the short-term, i.e. when UTC is averaged over weeks and months. These mechanisms include the ongoing redistribution of clock weights, the limiting of maximum weight, the ‘slicing’ of steering corrections into small monthly increments and the increasingly exclusive reliance on Hewlett-Packard clocks, among others. One way of accounting for these short-term stabilizing mechanisms is to treat them as pragmatic tools for facilitating consensus among metrological institutions. I will discuss this approach in the next subsection. Another option would be to look for a genuine epistemic function that these mechanisms serve. To a conventionalist, this means finding a way of vindicating these self-stabilizing mechanisms as reliable indicators of an underlying empirical regularity. As a reliable indicator is one that is sensitive to the property being indicated, one should expect the relevant stabilizing mechanisms to do less well when such regularity is not strongly supported by the data. In practice, however, no such degradation in stability occurs. On the contrary, short-term stabilization mechanisms are designed to be as insensitive to frequency drifts or gaps in the data as is practically possible. It is rather the data that are continually adjusted to stabilize the outcome of the calculation. As already mentioned, whenever a discrepancy among the frequencies of different secondary standards persists for too long it is eliminated ad hoc, either by ignoring individual clocks or by eventually replacing them with others that are more favourable to the stability of the average. Frequency ‘shocks’ introduced by new clocks are numerically cushioned. Even corrections towards primary standards, which are supposed to increase accuracy, are spread over a long period by slicing them into incremental steering adjustments or by embedding them in a ‘memory-based’ calculation. The constancy of the cesium period in the short-term is therefore not tested by the algorithm that produces UTC. For a test implies the possibility of failure, whereas the stabilizing mechanisms employed by the BIPM in the short-term are fail-safe and intended to guard UTC against instabilities in the data. Indeed, there is no sign that metrologists even attempt to test the ‘goodness of fit’ of UTC to the individual data points that serve as the input for the calculation, let alone that they are prepared to reject UTC if it does not fit the data well enough. Rather than a hypothesis to be tested, the stability of the cesium period is a presupposition that is written into the calculation from the beginning and imposed on the data that serves as its input34. This seemingly question-begging practice of data analysis suggests either that metrological methods are fundamentally flawed or that the conventionalist explanation overlooks some important aspect of the way UTC is supposed to function. Below I will argue that the latter is the case, and that the seeming circularity in the calculation of UTC dissolves once the normative role of models in metrology is acknowledged.

2.3 Constructivist explanations

As we learned previously, UTC owes its short-term stability not to the detection of underlying regularities in clock data, but rather to the imposition of a preconceived regularity on that data through algorithmic manoeuvres. Constructivist explanations for the success of standardization projects make such regulatory practices their central explanans. According to constructivists, standardizing time is not a matter of choosing which pre-existing natural regularity to exploit; rather, it is a matter of constructing regularities from otherwise irregular
instruments and human practices. Bruno Latour and Simon Schaffer express this position in the following ways:

Time is not universal; every day it is made slightly more so by the extension of an international network that ties together, through visible and tangible linkages, each of all the reference clocks of the world and then organizes secondary and tertiary chains of references all the way to this rather imprecise watch I have on my wrist. There is a continuous trail of readings, checklists, paper forms, telephone lines, that tie all the clocks together. As soon as you leave this trail, you start to be uncertain about what time it is, and the only way to regain certainty is to get in touch again with the metrological chains. (Latour 1987, 251, emphasis in the original)

Recent studies of the laboratory workplace have indicated that institutions’ local cultures are crucial for the emergence of facts, and instruments, from fragile experiments. […] But if facts depend so much on these local features, how do they work elsewhere? Practices must be distributed beyond the laboratory locale and the context of knowledge multiplied. Thus networks are constructed to distribute instruments and values which make the world fit for science. Metrology, the establishment of standard units for natural quantities, is the principal enterprise which allows the domination of this world. (Schaffer 1992, 23)

According to Latour and Schaffer, the metrological enterprise makes a part of the noisy and irregular world outside of the laboratory replicate an order otherwise exhibited only under controlled laboratory conditions. Metrologists achieve this aim by extending networks of instruments throughout the globe along with protocols for interpreting, adjusting and comparing these instruments. In particular, atomic clocks agree about the time because standardization bureaus maintain an international bureaucratic effort to harness those clocks into synchronicity. To use Latour’s language, the stability of the network of clocks depends on an ongoing flux of ‘paper forms’ issued by a network of ‘calculation centers’. When we look for the sources of regularity by which these forms are circulated we do not find universal laws of nature but international treaties, trade agreements and protocols of meetings among clock manufacturers, theoretical physicists, astronomers and telecommunication engineers. Without the resources continuously poured into the metrological enterprise, atomic clocks would not be able to tell the same time for very long.

From a constructivist perspective, the algorithm that produces UTC is a particularly efficient mechanism for generating consensus among metrologists. Recall that Coordinated Universal Time is nothing over and above a list of corrections that the BIPM prescribes to the time signals maintained by local standardization institutes. By administering the corrections published in the monthly reports of the BIPM, metrologists from different countries are able to reach agreement despite the fact that their clocks ‘tick’ at different rates. This agreement is not arbitrary but constrained by the need to balance the central authority of the International Bureau with the autonomy of national institutes. The need for a tradeoff between centralism and autonomy accounts for the complexity of the algorithm that produces UTC, which is carefully crafted to achieve a socially optimal compromise among metrologists. A socially optimal compromise is one that achieves consensus with minimal cost to local metrological authorities, making it worthwhile for them to comply with the regulatory strictures imposed by the BIPM. Indeed, the algorithm is designed to distribute the smallest adjustments possible among as many clocks as possible. Consequently, the overall adjustment required to approximate UTC at any given local laboratory is kept to a minimum.

In stressing the importance of ongoing negotiations among metrological institutions, constructivists do not yet diverge from conventionalists, who similarly view the comparison and adjustment of standards as prerequisites for the reproducibility of measurement results. But constructivists go a step further and, unlike conventionalists, refuse to invoke the presence of an underlying empirical regularity in order to explain the stability of timekeeping standards35. On the contrary, they remind us that regularity is imposed on otherwise discrepant clocks for the sake of achieving commercial and economic goals. Only after the fact does this socially imposed regularity assume the appearance of a natural phenomenon. This, I take it, is what Latour means when he states that “[t]ime is not universal; every day it is made slightly more so by the extension of an international network [of standards]” (1987, 251). Schaffer similarly claims that facts only “work” outside of the laboratory because metrologists have already made the world outside of the laboratory “fit for science” (1992, 23). According to these statements, if they are taken literally, quantitative scientific claims attain universal validity not by virtue of any pre-existing state of the world, but by virtue of the continued efforts of metrologists who transform parts of the world until they reproduce desired quantitative relations. In what follows I will call this the reification thesis. The reification thesis is a claim about the sources of regularity exhibited by measurement outcomes outside the carefully controlled conditions of a scientific laboratory. This sort of regularity, constructivists hold, is constituted by the stabilizing practices carried out by metrologists rather than simply discovered in the course of carrying out such practices. Note that the reification thesis entails an inversion of explanans and explanandum relative to the conventionalist account. It is the successful stabilization of metrological networks that, according to Latour and Schaffer, explains universal regularities in the behaviour of instruments rather than the other way around.

How plausible is this explanatory inversion in the case of contemporary timekeeping? As already hinted above, the constructivist account fits well with the details of the case insofar as the short-term stability of standards is involved. In the short run, the UTC algorithm does not detect frequency stability in the behaviour of secondary standards but imposes stability on their behaviour. Whenever a discrepancy arises among different clocks it is eliminated by ad hoc correction or by replacing some of the clocks with others. The ad hoc nature of these adjustments guarantees that any instability, no matter how large, can be eliminated in the short run simply by redistributing instruments and ‘paper forms’ throughout the metrological network.

The constructivist account is nevertheless hard pressed to explain the fact that the corrections involved in maintaining networks of standards remain small in the long run. An integral part of what makes a network of metrological standards stable is the fact that its maintenance requires only small and occasional adjustments rather than large and frequent ones. A network that reverted to irregularity too quickly after its last recalibration would demand constant tweaking, making its maintenance ineffective.
This long-term aspect of stability is an essential part of what constitutes a successful network of standards, and is therefore in need of explanation no less than its short-term counterpart. After all, nothing guarantees that metrologists will always succeed in diminishing the magnitude and frequency of corrections they apply to networks of instruments. To illustrate this point, imagine that metrologists decided to keep the same algorithm they currently use for calculating UTC, but implemented it on the human pulse as a standard clock instead of the atomic standard36. As different humans have different pulse rates depending on the person and circumstances, the time difference between these organic standards would grow rapidly from the time of their latest correction. Institutionally imposed adjustments would only be able to bring universal time
into agreement for a short while before discrepancies among different pulse-clocks exploded once more. The same algorithm that produces UTC would be able to minimize adjustments to a few hours per month at best, instead of a few nanoseconds when implemented with atomic standards. Constructivists are faced with the task of explaining how the same mechanism of social compromise would generate either a highly stable, or a highly unstable, network depending on nothing but the kind of physical process used as a standard.

How should one explain the difference in long-term stability between the atomic and organic case? Recall that conventionalists appealed to underlying regularities in nature to explain stability: metrologists succeed in stabilizing networks of standards because they choose as standards classes of processes that exhibit a high degree of regularity in relation to one another. But this explanatory move is blocked for those who, like Latour and Schaffer, appear to endorse the reification thesis with its requirement of explanatory inversion. Constructivists who work under the assumption of the reification thesis cannot appeal to natural regularities in the behaviour of pulses or cesium atoms as primitive explanans, and would therefore be unable to explain the difference in stability. Constructivists may respond by claiming that, for contingent historical reasons, scientists have not (yet) mastered reliable control over human pulses as they have over cesium atoms. This is a historical fact about the state of human knowledge, not a mind-independent fact about hearts or cesium atoms. However, even if this claim is granted, it offers no explanation for the difference in long-term stability but only admits the lack of such an explanation. Another possibility is for constructivists to relax the reification thesis, and claim that metrologists do detect pre-existing regularities in the behaviour of their instruments, but that such regularities do not sufficiently explain how networks of standards are stabilized. Under this ‘moderate’ version of their view, constructivists admit that a combination of natural and socio-technological explanans is required to explain the stability of metrological networks. The question then arises as to how the two sorts of explanans should be combined into a single explanatory account. The following section will provide such an account.

3. Models and Standardization

3.1 A third alternative

As we have seen, conventionalists and constructivists agree that claims concerning the uniformity of a clock’s cycles are neither true nor false independent of human agency, but disagree about the scope of this agency. Conventionalists like Carnap and Reichenbach hold that human agency is limited to the choice of an a priori coordinative definition of temporal uniformity. Once the choice is made, stabilizing a network of frequency standards is a matter of empirical discovery. Constructivists like Latour and Schaffer would argue instead that claims about temporal uniformity are true or false only relative to a particular act of comparison among clocks, made at a particular time and location in an ever changing network of instruments, protocols and calculations. If such claims appear universal and context-free, it is only because they rely on metrological networks that have already been stabilized and ‘black-boxed’ so as to conceal their historicity.

As each view explains aspects of stability that the other fails to account for, a synthesis of the two views would be desirable. However, the two views are incompatible with one
another, at least when interpreted literally. Conventionalists like Carnap and Reichenbach take mind-independent empirical regularities as their primitive explanans, with which they try to account for the possibility of reproducible measurement. Constructivists like Latour and Schaffer, on the other hand, reject the appeal to natural regularities as primitive explanans, believing instead that such regularities are reified end-products of metrological activity. Any attempt to use elements from both views without addressing this fundamental tension would be incoherent and provide only an illusion of explanation. Instead, a comprehensive and coherent explanation of the stability of networks of standards requires that the very notion of stability be reconceived in a manner that obviates the need for explanatory reduction in either direction. Such explanation is provided by the model-based account of standardization, whose main tenets are as follows:

1. Stability (both over time and across particulars) is a model-based notion. Which object or process counts as stable depends on the theoretical and statistical assumptions under which that object or process is modeled.

2. To standardize a quantity-concept is to regulate its use in a manner that allows certain exemplary particulars (i.e. measurement standards) to be modeled as highly stable.

3. Metrologists are to some extent free to regulate the use of a quantity-concept, not only through acts of definition, but also through choices of exemplary particulars and the assumptions under which those exemplary particulars are modeled.

4. Success in stabilizing measurement standards cannot be explained by reduction to either natural or socio-historical factors; instead, stability results from the iterative entanglement of these factors in the process of standardization.

5. Standardization projects produce genuine empirical knowledge; this knowledge is nonetheless mediated by background theoretical and statistical assumptions and shaped by social and pragmatic constraints.

In what follows I will argue for each of these claims and illustrate them in the special case of contemporary timekeeping. In so doing I will show that the model-based account provides a more comprehensive explanation for the stability of metrological standards than its alternatives. I will conclude the section by discussing the general scope of this account.

3.2 Stability reconceived

A key move in disentangling the explanatory conflict between conventionalism and constructivism is the adoption of a subtler notion of stability. As already noted, the stability of a measuring instrument is a double-aspect notion, encompassing both repeatability and reproducibility. Both aspects are ways of ‘getting the same measurement outcome’, either over time or across particulars. Neither conventionalist nor constructivist accounts of standardization say much about what it means for measurement outcomes to agree with one another, raising the worry that the relevant authors suppose a naive criterion of agreement. Carnap, for example, seems to identify regularity in the behaviour of pendulums with a relation of proportionality among the number of swings they produce ([1966] 1995, 82). His empiricist notion of regularity pertains to the indications of instruments, namely their observed outputs37. By contrast, under the model-based view agreement pertains to measurement outcomes, i.e. to knowledge claims that are inferred from indications along with
background knowledge about the measuring system. Meaningful agreement or disagreement can be established among measurement outcomes because outcomes are already corrected for known distortions and associated with uncertainty estimates for unknown distortions. By contrast, instrument indications are not directly comparable to each other for two reasons. First, different instruments often display their outputs differently, e.g. by the position of a needle relative to a dial, the occurrence of digits on a display, the frequency of a sound, etc. Second, even when indications are converted to numbers, their convergence is neither necessary nor sufficient for agreement among measurement outcomes. After applying required error corrections, instruments that generate the same indications may be interpreted as producing incompatible measurement outcomes and vice versa. For example, when primary frequency standards are compared to each other, each of their raw output frequencies is first corrected for a host of systematic errors. As different clocks differ in their environment and design, they are assigned different correction factors. Consequently, two primary standards that ‘tick’ at the same rate would normally be considered to be in significant disagreement. It is only their already-adjusted measurement outcomes that are considered meaningful for comparison38. The notion of agreement, and ipso facto the notion of stability in both of its aspects, properly pertain to measurement outcomes and not to instrument indications. As noted above, measurement outcomes are inferred from instrument indications based on background assumptions. These assumptions may be theoretical, i.e. concern the dynamics of the interaction between measuring instrument, measured object and environment, or statistical, i.e. concern the distribution of indications and the properties of background noise. Which measurement outcome is inferred from a set of indications depends on the background assumptions involved. In other words, measurement outcomes depend not only on the indications generated by a measuring system, but also on how the measuring system is modeled both theoretically and statistically39. A model of a measuring system is an abstract and idealized representation of the apparatus, the objects being measured and elements in the environment (including, in some cases, human operators). Models of this sort are necessary preconditions for grounding inferences from indications to outcomes40. The model-dependence of measurement outcomes is widely acknowledged by metrologists and forms the basis for contemporary methods of evaluating measurement uncertainty, among other uses41. The model-dependence of measurement outcomes entails the model-dependence of agreement and hence of stability. Prior to the specification of modeling assumptions about instruments there can be no meaningful estimation of stability, as such assumptions are necessary preconditions for specifying which configurations of indications constitute stable behaviour. In other words, the behaviour of measuring instruments is properly deemed stable only relative to some set of modeling assumptions about those instruments. Whether or not measuring instruments behave in a regular, or stable, manner cannot be uniquely determined merely by observing them. Rather, stability is co-constituted by observations together with the assumptions used to interpret observations by a given scientific community at a given time42.
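To make the contrast between indications and outcomes concrete, here is a minimal Python sketch; all numbers and correction labels are invented for illustration and do not describe any actual clock comparison. Two clocks whose raw indications are identical end up with discrepant measurement outcomes once their (different) model-based corrections and uncertainty budgets are applied.

```python
# Illustrative sketch only: hypothetical numbers, not actual BIPM corrections.
from math import sqrt

def corrected_outcome(raw_fractional_freq, corrections, uncertainties):
    """Turn a raw indication into a measurement outcome: apply the model-based
    systematic corrections and combine the uncertainty components (root-sum-of-squares)."""
    value = raw_fractional_freq - sum(corrections.values())
    uncertainty = sqrt(sum(u ** 2 for u in uncertainties.values()))
    return value, uncertainty

# Two clocks whose raw indications are identical...
raw_a = raw_b = 1.2e-14  # fractional frequency offset from nominal

# ...but whose models assign different systematic corrections
# (e.g. blackbody radiation shift, gravitational redshift for altitude).
outcome_a = corrected_outcome(raw_a,
                              corrections={"blackbody": 2.1e-14, "redshift": 0.4e-14},
                              uncertainties={"blackbody": 0.2e-14, "statistical": 0.3e-14})
outcome_b = corrected_outcome(raw_b,
                              corrections={"blackbody": 1.6e-14, "redshift": 3.5e-14},
                              uncertainties={"blackbody": 0.3e-14, "statistical": 0.3e-14})

def agree(o1, o2, k=2):
    """Outcomes agree if their difference lies within k combined standard uncertainties."""
    (v1, u1), (v2, u2) = o1, o2
    return abs(v1 - v2) <= k * sqrt(u1 ** 2 + u2 ** 2)

print(agree(outcome_a, outcome_b))  # False: same indications, discrepant outcomes
```

On these invented numbers the agreement test fails, even though the raw indications coincide; conversely, clocks with different raw indications could pass the test once corrected. Agreement is a relation among modeled outcomes, not among displays.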
This is not to say that any set of modeling assumptions will suffice in order to achieve stability. As the following sections will clarify, the success of a standardization project depends on careful co-development of measuring instruments along with models representing their behaviour. Stability emerges from this exchange, and is therefore tightly constrained by empirical considerations.


3.3 Legislative freedom The model-dependence of stability has implications for the scope and limits of freedom associated with standardization. Conventionalists, recall, traced this freedom to the arbitrariness of certain definitions, such as the definition of ‘equality of duration.’ Such definitional freedom is a priori, in principle and singular, and amounts to a freedom to describe the same observations more or less simply. But precisely because it is independent of empirical considerations, definitional freedom is very limited when it comes to regulating the use of concepts. It does not allow metrologists to make changes to the physical conditions that satisfy a given definition, for any such change would (by conventionalist reasoning) require a completely new and equally arbitrary definition. Were metrologists confined to follow the definition alone, there would be no epistemic justification for the patchwork of self-fulfilling, and occasionally ad hoc, criteria they employ to evaluate the stability of clocks. According to the model-based account, a definition constitutes an important first step in determining how a quantity-concept would be applied to particulars, but this step leaves room for further interpretation. Recall that it is technologically impossible to strictly satisfy an idealized definition. It is technologically impossible, for example, to isolate a single, unperturbed cesium atom and place it exactly on the surface of the Earth’s gravitational geoid (and, indeed, physically impossible to probe such an atom without perturbing it). The notion of terrestrial time can only be approximately satisfied – or ‘realized’. Doing so requires clear criteria for what counts as a good approximation of the conditions specified by the definition. The theoretical definition alone cannot provide rules for determining which concrete approximations are more accurate than others, for it lacks necessary detail. If one were concerned merely with abstract semantics, one could think of such approximations as located on a multi-dimensional, abstract parameter space surrounding the definition’s true referent. Metrologists cannot rest content with such abstractions: having no immediate experimental access to such a parameter space, they cannot determine the location of any concrete object on that parameter space without additional low-level rules. Such rules are supplied by a chain of increasingly detailed models that run from the highly abstract, theoretical definition all the way down to concrete realizations. These mediating models function as schemas for applying the theoretical concept being standardized to concrete particulars, and are able to do so because they are far richer in detail than the bare theoretical definition. They involve assumptions about a myriad of idiosyncrasies associated with particular artefacts and their environments, assumptions that cannot be supplied by high theory and must be filled in through experiment. These models allow metrologists to specify rules for deciding which concrete particulars fall closer to the defined concept. In the case of timekeeping, metrologists specify and continually modify algorithms for deciding which clocks belong to the ensemble of standards, and which of the clocks in the ensemble approximate terrestrial time more closely. These algorithms are embedded in models that represent particular clocks with much more detail than the abstract definition of terrestrial time.
Recall that each individual clock is represented by a set of parameters including its weight in the calculation and its characteristic transfer noise. The values of these parameters are determined based on statistical and theoretical assumptions concerning, for example, atmospheric disturbances affecting the satellite signals that transmit a clock’s indications to Paris. In addition, the entire ensemble of secondary standards is modeled statistically, characterizing its ‘cushion’ parameters, noise
coefficients, frequency stability, accuracy and so on. Finally, the entire ensemble of both primary and secondary standards is characterized by a host of parameters, the most important of them being UTC itself. The values of UTC are determined based on theoretical and statistical assumptions about the behaviour of the clock ensemble as well as additional information such as astronomical observations (for the determination of leap seconds). It is to this abstract model parameter known as Coordinated Universal Time, rather than to any concrete clock, that the concept of terrestrial time is directly coordinated43. Like Reichenbach, I am using the term ‘coordination’ to denote an act that specifies the mode of application of an abstract concept to a concrete level. But the form taken by coordination in the model-based view is quite different than what conventionalists envisioned. Instead of directly linking concepts with objects or physical operations, coordination consists in the specification of a hierarchy among parameters in different models. In our case, the hierarchy links a parameter (terrestrial time) in a highly abstract and simplified theoretical model of the earth’s spacetime to a parameter (UTC) in a less abstract, theoretical-statistical model of certain atomic clocks. UTC is in turn coordinated to a myriad of parameters (UTC(k)) representing local approximations of UTC based on even more detailed, lower-level models constructed at national laboratories. Finally, the particular clocks that standardize terrestrial time are subsumed under the lowest-level models in the hierarchy. It is by constructing and modifying these mediating models as well as the exemplary artefacts themselves that metrologists exercise a second kind of freedom, one that concerns use rather than definition. Metrologists are to some extent free to decide not only how they define an ideal measurement of the quantity they are standardizing, but also what counts as an accurate concrete approximation (or ‘realization’) of this ideal. This freedom stems from the fact that measurement outcomes are model-dependent: the correct outcome of a measurement depends not only on the definition of the quantity being measured and the indications produced by the instrument, but also on a variety of ‘auxiliary’ theoretical and statistical assumptions. These assumptions concern, among other things, the design of the instrument, its various idiosyncrasies and imperfections, the statistical distribution of its indications and fluctuations in the surrounding environment. When standardizing a quantity-concept, metrologists are required to specify these sorts of modeling assumptions for the particular objects or processes they intend to use as standards. As these objects or processes are intended to become exemplary of the correct application of the concept, they are not yet associated with exact values of the quantity being standardized prior to the specification of such modeling assumptions (for otherwise standardization would be unnecessary). Hence metrologists are to some extent free to choose these values, not only through their choice of definition, but also through the theoretical and statistical assumptions with which they choose to model the relevant standards. This choice of modeling assumptions determines the distribution of errors among multiple measurement standards, thereby deciding which particulars count as more accurate realizations of the relevant quantity. 
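The following sketch illustrates, with invented numbers, how a free-running ensemble timescale might be formed as a capped, weighted average of clock rates. The clock names, frequency offsets, instability estimates and the 0.4 weight cap are assumptions made for illustration only; the sketch is loosely in the spirit of the EAL/UTC calculation described above, not a reproduction of the BIPM’s algorithm.

```python
# Illustrative sketch of a weighted ensemble timescale; clock names, numbers and the
# 0.4 weight cap are invented, not taken from the actual EAL/UTC algorithm.

clocks = {
    # name: (fractional frequency offset, estimated instability)
    "clock_A": (+3.0e-15, 0.8e-15),
    "clock_B": (-1.0e-15, 1.5e-15),
    "clock_C": (+0.5e-15, 0.5e-15),
}

def capped_weights(instabilities, cap=0.4):
    """Inverse-variance weights with an upper cap: weight trimmed by the cap is
    redistributed among the remaining clocks until the weights sum to one."""
    free = {k: 1.0 / s ** 2 for k, s in instabilities.items()}
    capped = {}
    while True:
        total, budget = sum(free.values()), 1.0 - sum(capped.values())
        over = [k for k, v in free.items() if budget * v / total > cap]
        if not over:
            break
        for k in over:
            capped[k] = cap
            del free[k]
    total, budget = sum(free.values()), 1.0 - sum(capped.values())
    weights = {k: budget * v / total for k, v in free.items()}
    weights.update(capped)
    return weights

weights = capped_weights({k: s for k, (_, s) in clocks.items()})
ensemble_offset = sum(weights[k] * f for k, (f, _) in clocks.items())
print(weights)          # the most stable clocks sit at the cap
print(ensemble_offset)  # free-running ensemble rate, later steered toward primary standards
```

The cap embodies a policy choice of the kind discussed above: it prevents any single laboratory’s clock from dominating the scale, at a small cost in short-term stability.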
The choice of modeling assumptions is legislative with respect to the application of the quantity-concept. By ‘legislation’ I mean the specification and modification of rules for deciding which particulars count as exemplary of a concept. Unlike definition, which is a singular and linguistic activity, legislation is ongoing and empirically-constrained. In the case of timekeeping, metrologists legislate the mode of application of the concept of terrestrial time to an ensemble of atomic standards. Legislation involves running and refining the algorithms that determine the values of model parameters, as well as developing and modifying the theoretical and statistical models underlying the entire calculation. Legislation also involves making physical modifications to the collection of exemplary particulars, e.g.
replacing one clock with another. I call these activities ‘legislative’ because they play a normative role with respect to the application of the quantity-concept to particulars. That is, they determine which concrete frequencies count as good approximations to the ideally defined cesium frequency, and hence what counts as the proper way of applying the concept of temporal uniformity to particulars. Metrological models thus play a dual role. On the one hand, like other scientific models, they have a representational and descriptive function, i.e. they explain and predict the behaviours of measuring systems. On the other hand, metrological models also fulfill a prescriptive function, i.e. they exemplify how certain measuring systems should be described in terms of a quantity-concept and thereby help regulate the use of that concept. This duality results in the inversion of approximation relations. In most types of scientific inquiry abstract models are meant to approximate their concrete target systems. But the models constructed during standardization projects have a special normative function, that of legislating the mode of application of concepts to concrete particulars. At each level of abstraction, the models specify what counts as an accurate application of the standardized concept at the level below. Consequently, the accuracy of a concrete frequency standard is evaluated against its abstract models and not the other way around. Figure 1 summarizes the various levels of abstraction and relations of approximation involved in contemporary atomic timekeeping.

Figure 1: A simplified hierarchy of approximations among model parameters in contemporary atomic timekeeping. Vertical position on the diagram denotes level of abstraction and arrows denote approximation relations. Note that parameters on concrete levels approximate parameters on more abstract levels. (For simplification, Terrestrial Time is represented here as approximated directly by UTC. Strictly speaking, Terrestrial Time is approximated by UTC plus a known number of ‘leap seconds’.)
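As a rough numerical gloss on the hierarchy in Figure 1, the sketch below walks from a concrete national realization up to Terrestrial Time. The 35-second leap-second count is the May 2013 value cited in the notes; the 32.184-second offset is the conventional difference between Terrestrial Time and TAI; the UTC(k) reading and its offset from UTC are invented example values, and the whole routine is a simplification, not an official conversion procedure.

```python
# Sketch of the approximation chain in Figure 1, read from concrete to abstract.
# Leap-second count as of May 2013 (see the endnotes); TT - TAI is the conventional offset.
LEAP_SECONDS_2013_05 = 35      # TAI - UTC at that date, in seconds
TT_MINUS_TAI = 32.184          # seconds, by convention

def utc_from_utck(utck_reading, utck_minus_utc):
    """A national realization UTC(k) approximates UTC up to a small, published offset."""
    return utck_reading - utck_minus_utc

def tai_from_utc(utc_seconds, leap_seconds=LEAP_SECONDS_2013_05):
    return utc_seconds + leap_seconds

def tt_from_tai(tai_seconds):
    return tai_seconds + TT_MINUS_TAI

# Hypothetical example: a laboratory reading (seconds past some epoch) offset from UTC by 12 ns.
utc = utc_from_utck(utck_reading=1000.0, utck_minus_utc=12e-9)
tt_estimate = tt_from_tai(tai_from_utc(utc))
print(tt_estimate)
```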


3.4 Achieving stability The freedom to adjust the mode of application of quantity-concepts explains why metrologists allow themselves to introduce seemingly self-fulfilling mechanisms to stabilize UTC. When standardizing a quantity-concept, metrologists try to legislate its mode of application in a manner that makes certain particulars as stable as practically possible. Successful standardization thus depends on metrologists’ choice of exemplary particulars as well as on the way they model those particulars in terms of the standardized concept. In the case of timekeeping, success is achieved by small and ongoing changes to the sanctioned mode of application of the concept of terrestrial time, so that its extension at any given cycle of the calculation coincides with those clocks metrologists know how to represent as most stable. Rather than ask: ‘how well does this clock approximate terrestrial time?’ metrologists are, to a limited extent, free to ask: ‘which models should we use to apply the concept of terrestrial time to this clock?’ In answering the second question metrologists enjoy some interpretive leeway, which they use to maximize the short-term stability of their clock ensemble. This is precisely the role of the algorithmic manoeuvres discussed above. These self-stabilizing mechanisms do not require justification for their ability to approximate terrestrial time because they are legislative with respect to the application of the concept of terrestrial time to begin with. UTC is successfully stabilized in the short run not because its calculation correctly applies the concept of terrestrial time to secondary standards; rather, UTC is chosen to determine what counts as a correct application of the concept of terrestrial time to secondary standards because this choice results in greater short-term stability. Contrary to conventionalist explanations of stability, then, the short-term stability of UTC cannot be fully explained by the presence of an independently detectable regularity in the data from individual clocks. Instead, a complete explanation must non-reducibly appeal to stabilizing policies adopted by metrological institutions. These policies are designed in part to promote a socially optimal compromise among metrological institutions, namely to balance the need for multi-national inclusiveness with the central authority of the BIPM. The metrological community could have chosen to standardize time in a more centralized way (e.g. only clocks in Paris participate) or in a more distributed way (e.g. every nation keeps its own timescale, and the BIPM merely maintains a table of the differences). Neither of these alternatives is better or worse than the current timekeeping system in a strict epistemic sense. And yet under each of these alternatives discrepancies between national time signals would have been somewhat different than they are today, despite there being no technological difference in the underlying clock network. Constructivists are therefore correct to some extent in claiming that the universality of time is constituted by the regulatory policies adopted by metrological institutions. The shape of social compromise partially determines what counts as a correct application of the notion of terrestrial time, and hence influences the distribution of errors among global time signals. Nonetheless, legislation is not merely a reflection of social or pragmatic desiderata.
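A toy simulation can make the self-fulfilling character of such mechanisms vivid. In the sketch below (invented, randomly generated data; not the BIPM’s algorithm), clocks are weighted by how closely they track the ensemble average, while that average is itself computed from those very weights; after a few iterations the ensemble settles on the clocks it has come to represent as most stable.

```python
# Toy illustration of 'self-fulfilling' ensemble weighting, using invented data.
import random
random.seed(0)

# simulated monthly rate readings for four hypothetical clocks with different noise levels
readings = {
    "A": [random.gauss(0.0, 0.2) for _ in range(12)],
    "B": [random.gauss(0.0, 0.5) for _ in range(12)],
    "C": [random.gauss(0.0, 1.0) for _ in range(12)],
    "D": [random.gauss(0.0, 0.4) for _ in range(12)],
}

weights = {k: 1.0 / len(readings) for k in readings}   # start from equal weights

for _ in range(5):  # a few iterations are enough for the weights to settle
    # ensemble value at each month, under the current weights
    ensemble = [sum(weights[k] * readings[k][m] for k in readings) for m in range(12)]
    # each clock's mean squared deviation from the ensemble it helped define
    dev = {k: sum((readings[k][m] - ensemble[m]) ** 2 for m in range(12)) / 12
           for k in readings}
    # reweight: clocks closest to the ensemble count as 'most stable' and gain weight
    raw = {k: 1.0 / (dev[k] + 1e-9) for k in readings}
    total = sum(raw.values())
    weights = {k: v / total for k, v in raw.items()}

print({k: round(w, 3) for k, w in weights.items()})  # quiet clocks end up dominating
```

The circularity is benign but real: what counts as a stable clock is judged against a reference that the weighting itself constructs.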
As the recurring qualification ‘to some extent’ hints, legislative freedom is severely, though not completely, constrained by empirical considerations. First, choices of realization are empirically constrained by the behaviours of concrete artefacts metrologists use as standards. Second, the quantity concepts being standardized are not ‘free-floating’ concepts but are already embedded in a web of assumptions. Terrestrial time, for example, is a notion that is already deeply saturated with assumptions from general relativity, atomic theory, electromagnetic theory and quantum mechanics. The task of standardizing terrestrial time in a consistent manner is therefore constrained by the need to maintain compatibility with
established standards for other quantities that feature in these theories. Finally, the same concept may be realized using multiple methods. Terrestrial time, for example, may be approximated in retrospect by post-processing indications from primary standards. The BIPM calculates such post-processed timescales and occasionally compares them to TAI44. The question ‘how well does clock X approximate terrestrial time?’ is therefore still largely an empirical question even in the context of a standardization project. It can be answered to a good degree of accuracy by comparing multiple approximations of terrestrial time. These approximations of terrestrial time nevertheless do not completely agree with one another. More generally, different applications of the same concept to different domains, or in light of a different trade-off between goals, often end up being somewhat discrepant in their results. Standardization institutes continually manage a delicate balance between the extent of legislative freedom they allow themselves in applying concepts and the inevitable gaps discovered among multiple applications of the same concept. Nothing exemplifies better the shifting attitudes of the BIPM towards this trade-off than the history of ‘steering’ corrections, which have been dispensed aggressively or smoothly over the past decades depending on whether frequency accuracy or frequency stability was preferred. The gaps discovered between different applications of the same quantity-concept are among the most important (though by no means the only) pieces of empirical knowledge amassed by standardization projects. Such gaps constitute empirical discoveries concerning the existence or absence of regularities in the behaviour of instruments, and not merely about the way metrologists use their concepts. This is a crucial point, as failing to appreciate it risks mistaking standardization projects for exercises in the social regulation of data-analysis practices. This is precisely the mistake inherent in an overly literal reading of constructivist claims, and especially in the explanatory inversion expressed by the reification thesis. Even if metrologists reached perfect consensus as to how a given quantity concept should be applied, there is no guarantee that the mode of application they have chosen will lead to consistent results. Success and failure in applying a quantity concept consistently are to be investigated empirically, and the discovery of gaps (or their absence) is accordingly a matter of obtaining genuine empirical knowledge. The discovery of gaps explains the possibility of stabilizing networks of standards in the long run. Metrologists choose to use as standards those instruments to which they have managed to apply the relevant concept most consistently, i.e. with the smallest gaps. To return to the example above, metrologists have succeeded in applying the concept of temporal uniformity to different cesium atoms with much smaller gaps than to different pulse rates. This is not only a fact about the way metrologists apply the concept of uniformity, but also about a regularity in the behaviour of cesium atoms, a regularity that is discovered when cesium clocks are subsumed under the concept of uniformity through the mediation of relevant models. Metrologists rely on such regularities for their choices of physical standards, i.e. they tend to select those instruments whose behaviour requires the smallest and least frequent model-based corrections. 
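The interplay between discovered gaps and steering corrections can be sketched as follows. The monthly cadence, the gap series and the cap on each correction are illustrative assumptions, not the BIPM’s procedure; the sketch only shows the two ingredients discussed above, namely a gap revealed by comparing two independent applications of the same concept, and a bounded correction that redistributes part of that gap.

```python
# Illustrative only: invented numbers, not the BIPM's steering procedure.
# monthly frequency difference (in 1e-15) between a post-processed realization of
# terrestrial time and the real-time scale it is compared against
gaps = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7]

MAX_STEP = 0.5   # cap on the correction applied per month ('smooth' dispensing);
                 # raising the cap corresponds to more 'aggressive' steering

accumulated_correction = 0.0
for month, gap in enumerate(gaps, start=1):
    residual = gap - accumulated_correction          # the part of the gap still uncorrected
    step = max(-MAX_STEP, min(MAX_STEP, residual))   # bounded monthly steering step
    accumulated_correction += step
    print(f"month {month}: residual gap {residual:+.2f}, steering step {step:+.2f}")

# Smaller caps favour frequency stability (the scale changes gently); larger caps favour
# accuracy (the scale hugs the post-processed estimate), mirroring the trade-off in the text.
```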
As standardization projects progress, metrologists often find new theoretical and statistical means of predicting some of the gaps that remain, thereby discovering ever ‘tighter’ regularities in the behaviours of their instruments. Consequently, a particularly efficient way of obtaining a stable network of standards is through iterative exchange between ‘top-down’ adjustment to the mode of application of concepts and ‘bottom-up’ discovery of inconsistencies in light of this application. This bidirectional exchange results in greater overall stability, as it allows metrologists to latch onto regularities in the behaviour of their instruments while redistributing remaining gaps in
a socially optimal manner45. This iterative strategy explains the efficacy of the methods chosen to standardize time and the stability of timekeeping standards in both the short and long run. Thus the model-based account has a wider explanatory scope than conventionalist and constructivist explanations, which only account for either long- or short-term stability. The model-based account also clarifies the epistemic function of seemingly circular and ad hoc parts of the UTC algorithm, an elucidation not afforded by either of the alternatives. Finally, the model-based account avoids reducing explanations of stability to either natural or social factors. This last point will be clarified in the next section. 3.5 Nature and society entangled We saw that the stability of measurement standards arises from interlocking steps of legislation and empirical discovery, both of which are mediated by models. Given this model-based account of standardization, one may be tempted to ask: how much of the resulting stability is due to nature, and how much due to socio-historical factors? This question mistakenly presupposes that the stability of measurement standards can be analyzed into distinct contributions by humans and nature. The conceptual mistake becomes apparent if one recalls that facts about the stability of measuring instruments are co-constituted by empirical observations of those instruments along with the assumptions with which those instruments are modeled by a scientific community at a given time. Though facts about stability are far from being arbitrary or merely conventional, they are nonetheless tied to a social and historical context. Attempting to estimate stability independently of a sociohistorical context means forgoing the local and changing background assumptions that allow such estimates to play an informative role in metrology. On the other hand, attempts to explain facts about stability solely on the basis of socio-historical factors are just as mistaken, for they ignore the empirical grounding of such facts. Under the model-based account, the stability of measurement standards arises from an iterative process of metrological model-building and instrument modification. In each iteration, social and practical considerations partly shape the use of a quantity-concept, allowing for new empirical information to be collected about the degree of consistency of that concept’s application. This empirical information is in turn incorporated into the relevant metrological models in the next iteration, and so on. Social, technological and epistemic considerations all become entangled in this process, and there is no conceptual sense in trying to divide the explanatory turf between them. For example, it makes no conceptual sense to try and disentangle the weight differences among clocks contributing to EAL and establish what percentage of those differences is due to mind-independent differences in clock stability, and what percentage to error-distributive policies enforced by the BIPM. These policies, recall, are partly constitutive of what counts as a stable clock, whereas the indications of clocks feed back and shape those very policies. This interplay gives rise to a virtuous stabilizing cycle, whose end-results no longer bear the distinct marks of either nature or society. These insights are intended neither as an outright rejection, nor as a reinterpretation, of conventionalism or constructivism. 
Above I rejected conventionalism in its literal reading, that is, insofar as it is understood to ascribe only an a priori, definitional and ahistorical sort of freedom to scientists engaged in the standardization of measurement. I similarly rejected the reification thesis implied by a literal reading of constructivist claims. The model-based account has been offered as a third, independent alternative, and while it is inspired by elements of conventionalism and constructivism, it is also deeply informed by considerations
from the philosophies of modeling and experimentation that extend far beyond either of the earlier accounts. Still, the model-based account may point the way to alternative, non-literal and more charitable readings of both conventionalist and constructivist claims about standardization. It seems plausible, for example, to reinterpret conventionalist claims about coordinative definition as pertaining to idealized measuring procedures rather than concrete ones46. Similarly, a moderate reading of constructivism that replaces reification with coproduction by nature and society would cohere with the ‘naturalistic’ tradition in sociology of knowledge47. Further work is required to develop such non-literal interpretations and to ascertain whether they produce compelling alternative explanations for the stability of measurement standards. 3.6 Remark on scope This article focused on the standardization of global timekeeping. Although space limitations do not permit the detailed discussion of additional examples, it should be emphasized that the model-based account relies on considerations that apply broadly to the standardization of physical measurement. The model-dependence of stability, the underdetermination of concept application by definition, the coordination of theory with measurement through increasingly detailed models and the legislative function of such models are all general features of physical measurement independently of the specific details of timekeeping. The model-based account is therefore expected to have a wide scope of application within the physical sciences, and possibly beyond them as well. The model-based account is also not limited to cases where the relevant measurement unit is defined theoretically. The kilogram, for example, is currently defined as the mass of a particular platinum-iridium artefact kept in a vault near Paris, but this artefact-based definition does not diminish the model-dependence of mass measurement. The only relevant difference between time and mass standardization is that in the case of time, the definition of the standard second plays the dual role of defining a unit and a congruence criterion, while in the case of mass the definition of the kilogram defines a unit alone. A stable network of mass standards nonetheless requires not only a standard measurement unit but also repeatable and reproducible methods for comparing masses to each other. Criteria of equality among masses, just like criteria of equality among time intervals, are abstract and depend on the assumptions under which comparisons among masses are carried out. In order to standardize the measurement of mass, the definition of the kilogram must be supplemented with recommendations concerning the kinds of balances that count as accurate (Picard 2004) as well as cleaning techniques for removing contaminants from the prototype kilogram and its many official copies (Girard 1990). The recommended weighing, washing and cleaning procedures involve a host of theoretical and statistical assumptions that are occasionally revised in order to stabilize global mass measurement further48. The persistence of inconsistencies in spite of these legislative acts is a key reason behind the current proposal to redefine the kilogram based on the Planck constant (CGPM 2011).
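For a parallel illustration in the mass case, the sketch below applies a linear post-cleaning drift model to a prototype comparison. The 0.0386 microgram-per-day figure is taken from the endnote on the 1989 reinterpretation of the kilogram; the balance reading, elapsed time and sign convention are invented, and the calculation is a simplified stand-in for, not a description of, the BIPM’s actual procedure.

```python
# Deliberately simplified sketch of model-mediated mass comparison. The drift figure
# comes from the endnotes; readings, elapsed time and sign convention are invented.
DRIFT_UG_PER_DAY = 0.0386   # assumed linear mass gain of the prototype after cleaning

def prototype_offset_ug(days_since_cleaning, drift=DRIFT_UG_PER_DAY):
    """Modeled deviation (in micrograms) of the prototype from its just-cleaned mass."""
    return drift * days_since_cleaning

def calibrate_copy(balance_difference_ug, days_since_cleaning):
    """Offset of a copy from the nominal kilogram, given the balance reading
    (copy minus prototype, in micrograms) and the modeled prototype drift."""
    return balance_difference_ug + prototype_offset_ug(days_since_cleaning)

# e.g. a copy reads 12.0 ug heavier than the prototype, 90 days after cleaning
print(calibrate_copy(12.0, 90))   # the inferred outcome depends on the drift model
```

As in the time case, the outcome attributed to the copy depends on a model of the standard’s behaviour, and revising that model redistributes errors across the network of mass standards.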

Conclusions Standardization is an ongoing activity aimed at legislating the proper mode of application of a quantity-concept to exemplary particulars. Contrary to the views of conventionalists, this legislation is not a matter of arbitrary, one-time stipulation. Instead, I have argued that legislation is an ongoing, empirically-informed activity. This activity is
required because linguistic definitions by themselves do not completely determine how the defined concept is to be applied to particulars. Instead, acts of legislation are partly constitutive of the regularities metrologists discover in the behaviour of their instruments. As I have shown, legislation proceeds by constructing a hierarchy of idealized models that mediate between the theoretical definition of the concept and concrete objects or processes. These models are iteratively modified in light of empirical data so as to maximize the stability with which concrete instruments are represented under the quantity concept. Additionally, instruments themselves are modified in light of the most recent models so as to maximize stability further. In this reciprocal exchange between abstract and concrete modifications, stable behaviour is iteratively imposed on the network ‘from above’ and discovered ‘from below’, leaving genuine room for both natural and social explanans in an account of stabilization49. As part of the process of iterative stabilization, metrologists gain knowledge concerning inconsistencies in the application of the quantity concept to particulars. This knowledge is empirical, that is, it pertains to the observed behaviours of instruments. It is reliable knowledge to the extent that those instruments have already been stabilized. And it is contextual knowledge insofar as it is mediated by the relevant models and partly shaped by social and pragmatic considerations. The model-based account of standardization is a part of a more comprehensive, model-based epistemology of measurement, which I have outlined elsewhere (Tal 2012). According to this wider account, theoretical and statistical models of measuring systems are necessary preconditions for inferring measurement outcomes from indications, for establishing the objectivity of measurement outcomes, and for evaluating accuracy, precision, error and uncertainty. Tracing these epistemic roles in detail affords a richer and more exact conception of measurement and its relation to theory and observation than do general statements about the ‘theory-ladenness’ of measurement. Additional studies are required to determine whether the model-based approach can be extended to other domains, particularly to measurement and standardization in the human and social sciences.

References
Arias, E.F. and Petit, G. (2005), “Estimation of the duration of the scale unit of TAI with primary frequency standards”, Proceedings of the IEEE International Frequency Control Symposium, pp. 244-6.
Audoin, C. and Guinot, B. (2001) The Measurement of Time, Cambridge University Press.
Azoubib, J., Granveaud, M. and Guinot, B. (1977) “Estimation of the Scale Unit of Time Scales”, Metrologia 13, pp. 87-93.
BIPM (International Bureau of Weights and Measures) (1977), “News from the BIPM”, Metrologia 13, pp. 53-4.
––––– (1978), “News from the BIPM”, Metrologia 14, pp. 89-91.
––––– (2006) The International System of Units (SI). 8th edition. http://www.bipm.org/en/si/si_brochure/
––––– (2011), BIPM Annual Report on Time Activities, Vol. 6. http://www.bipm.org/utils/en/pdf/time_ann_rep/Time_annual_report_2011.pdf
––––– (2013), Circular-T 304, ftp://ftp2.bipm.org/pub/tai/publication/cirt.304
Bloor, D. (1999) “Anti-Latour”, Studies in History and Philosophy of Science Vol. 30 (1), pp. 81-112.
Boumans, M. (2005). How Economists Model the World into Numbers. Routledge.
–––––. (2007) Measurement in Economics: A Handbook. London: Elsevier.
Carnap, R. (1995 [1966]) An Introduction to the Philosophy of Science, Dover.
CGPM (General Conference on Weights and Measures) (2011) “Resolution 1 of the 24th meeting of the CGPM”. (http://www.bipm.org/en/CGPM/db/24/1/)
Chang, H. (1995) ‘Circularity and Reliability in Measurement.’ Perspectives on Science 3.2, 153–172.
———. (2001) ‘Spirit, air, and quicksilver: The search for the "real" scale of temperature.’ Historical Studies in the Physical and Biological Sciences 31.2, 249-284.
———. (2004). Inventing Temperature: Measurement and Scientific Progress. Oxford University Press.
Galison, P. (2003) Einstein’s Clocks, Poincaré’s Maps: Empires of Time. W.W. Norton.


Girard, G. (1990) The washing and cleaning of kilogram prototypes at the BIPM. (http://www.bipm.org/utils/en/pdf/Monographie1990-1-EN.pdf)
Guinot, B. and Petit, G. (1991) “Atomic time and the rotation of pulsars”, Astronomy and Astrophysics 248, pp. 292-6.
Hacking, I. (1999) The Social Construction of What? Harvard University Press.
JCGM (Joint Committee for Guides in Metrology) (2008) Guide to the Expression of Uncertainty in Measurement. Sèvres: JCGM, http://www.bipm.org/en/publications/guides/gum.html
–––––. (2012) International Vocabulary of Metrology – Basic and General Concepts and Associated Terms. 3rd edition with minor corrections. http://www.bipm.org/en/publications/guides/vim.html
Jones, T. (2000) Splitting the Second: The Story of Atomic Time. Bristol and Philadelphia: Institute of Physics Publishing.
Klein, J. & Morgan, M. (2001). The Age of Economic Measurement. Annual supplement to vol. 33 of History of Political Economy.
Krantz, D. H., P. Suppes, R. D. Luce, and A. Tversky (1971) Foundations of Measurement: Additive and Polynomial Representations. Dover Publications.
Kuhn, T. ([1961] 1977) “The Function of Measurement in Modern Physical Sciences”, in: The Essential Tension, University of Chicago Press, pp. 178-224.
Latour, B. (1987) Science in Action. Harvard University Press.
––––– (1992) “One more turn after the social turn…”, in E. McMullin (ed.), The Social Dimension of Science. Notre Dame, IN: University of Notre Dame Press, pp. 272–294.
Leplège, A. (2003). ‘Epistemology of Measurement in the Social Sciences: Historical and Contemporary Perspectives.’ Social Science Information 42, 451-462.
Mach, E. ([1883] 1919) The Science of Mechanics. Chicago and London: Open Court Publishing.
Mari, L. (2003) “Epistemology of Measurement”, Measurement 34, pp. 17-30.
McClimans, L. (2010) ‘A theoretical framework for patient-reported outcome measures.’ Theoretical Medicine and Bioethics 31, 225-240.
Morrison, M. (2009). ‘Models, measurement and computer simulation: the changing face of experimentation.’ Philosophical Studies 143, 33-57.
Morrison, M. and Morgan, M. (1999) “Models as mediating instruments”, in Morgan, M. and Morrison, M. (eds.) Models as Mediators, Cambridge University Press, pp. 10-37.
Panfilo, G. (2012) “The new prediction algorithm for UTC: application and results”, European Frequency and Time Forum (EFTF), pp. 242-246.
Panfilo, G. and Arias, E.F. (2009) “Studies and possible improvements on EAL algorithm”, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control (UFFC-57), pp. 154-160.
Petit, G. (2004) “A new realization of terrestrial time”, 35th Annual Precise Time and Time Interval (PTTI) Meeting, pp. 307-16.
Picard, A. (2004) “The BIPM flexure-strip balance FB-2”, Metrologia 41: 319-329.
Poincaré, H. (1958 [1898]), “The Measure of Time”, in: The Value of Science, New York: Dover, pp. 26-36.
Quinn, T.J. (2003) “Open letter concerning the growing importance of metrology and the benefits of participation in the Metre Convention, notably the CIPM MRA”, http://www.bipm.org/utils/en/pdf/importance.pdf
Reichenbach, H. (1958 [1927]) The Philosophy of Space and Time, Courier Dover Publications.
Riordan, S. ‘The Objectivity of Scientific Measures’. Forthcoming in Studies in History and Philosophy of Science.
Schaffer, S. (1992) “Late Victorian metrology and its instrumentation: a manufactory of Ohms”, in Bud, R. & Cozzens, S.E. (eds.) Invisible Connections: Instruments, Institutions, and Science, SPIE Optical Engineering Press, pp. 23-56.
Soler, L., Allamel-Raffin, C., Wieber, F. & Gangloff, J.L. (2011) ‘Calibration in everyday scientific practice: a conceptual framework.’
Tal, E. (2011) ‘How Accurate Is the Standard Second?’ Philosophy of Science 78.5, 1082-96.
———. (2012) ‘The Epistemology of Measurement: A Model-Based Account.’ PhD Dissertation, University of Toronto.
———. (forthcoming) ‘Old and New Problems in Philosophy of Measurement’. Philosophy Compass.
Teller, P. (2013) ‘The concept of measurement-precision.’ Synthese 190, 189-202.
van Fraassen, Bas C. (2008). Scientific Representation: Paradoxes of Perspective. Oxford University Press.
———. (2012) ‘Modeling and Measurement: The Criterion of Empirical Grounding.’ Philosophy of Science 79.5, 773-784.


1 JCGM 2012, 2.2. 2 Barring time zone and daylight saving differences. See BIPM (2013) for a sample comparison of national approximations to UTC. An excerpt from the latter document appears in Table 1 below. 3 UTC replaced Greenwich Mean Time as the global timekeeping reference in 1972. The acronym ‘UTC’ was chosen as a compromise to avoid favoring the order of initials in either English (CUT) or French (TUC). 4 The definition of the atomic second will be discussed in the next section. 5 See especially Chang (1995, 2001, 2004), Klein & Morgan (2001), Leplège (2003), Mari (2003), Boumans (2005, 2007), van Fraassen (2008, 2012), Morrison (2009), McClimans (2010), Tal (2011, 2012), Soler et al. (2011), Teller (2013) and Riordan (forthcoming). For a discussion of the relationships between traditional and contemporary philosophy of measurement see Tal (forthcoming). 6 In official metrological terminology, these aspects of stability are called ‘measurement repeatability’ and ‘measurement reproducibility’ respectively (JCGM 2012, 2.21 & 2.25). Though conceptually distinct, these two aspects of stability cannot be evaluated separately. In the case of time measurement, for example, the only way to evaluate the stability of a clock’s frequency is to compare it with the frequencies of other clocks. Similarly, the only way to evaluate fluctuations in the outcomes of a balance is to compare them to those of other balances (or other mass-measuring instruments more generally). From an epistemological point of view, then, these two aspects of stability are deeply entangled. My claims about stability throughout this article are intended to apply to both aspects unless otherwise specified. 7 The term ‘clock’ is used here in a broad sense to include both artificial and natural systems for measuring time. This is in line with the common definition of a clock in physics, i.e. a system consisting of an oscillator and a counter (Jones 2000, 26), where the oscillator may be a naturally occurring process such as the Earth’s orbit around the Sun. 8 Frequency stability is, in principle, sufficient for reproducible timekeeping. A collection of clocks with perfectly stable frequencies would tick at constant rates relative to each other, and so the readings of any such clock would be sufficient to reproduce the readings of any of the others by simple linear conversion, barring relativistic effects. A collection of individually stable clocks is therefore also stable in the global sense of the term, i.e. supports the reproducibility of measurement outcomes. 9 BIPM (2006), 113 10 Paradoxically, under these ideal conditions it would be impossible to probe the cesium atom so as to induce the relevant transition. Hence the duration of the second is defined in a doubly counterfactual manner. 11 In 2011, active primary frequency standards were maintained by laboratories in France, Germany, Italy, Japan, the UK, and the US (BIPM 2011, 32) 12 This design allows for a higher signal-to-noise ratio through the use of Ramsey resonance. For a summary of the design principles of cesium fountains see Audoin and Guinot (2001, 170-184). 13 Panfilo (2012, 242) 14 Elsewhere I have called this the problem of multiple realizability of unit definitions and discussed the way this problem is solved in the case of primary frequency standards (Tal 2011).
This article focuses on the ways metrologists solve the problem of multiple realizability in the context of international timekeeping, where the goal is not merely to produce good local approximations of the standard second but also to maintain a unified measure of time and synchronize clocks worldwide in accordance with this measure. 15 More exactly, Terrestrial Time is approximated by International Atomic Time (TAI), identical to UTC except for leap seconds. This point will be clarified below. 16 There are many clocks that approximate UTC, of course. As will be mentioned below, the BIPM and national laboratories produce continuous time signals that are considered realizations of UTC. However, UTC itself is an abstract measure and should not be confused with its many realizations. 17 Audoin and Guinot (2001, 249). 18 The method of fixing this maximum weight has itself been modified four times in the past two decades to optimize stability, and as of 2012 a new method of calculating weights is being considered in order to increase stability further (Petit 2004, 308; Panfilo 2012, 246). 19 Audoin and Guinot 2001, 243-5. 20 As of 2010, HP clocks constituted over 70 percent of contributing clocks (BIPM 2010, 52-67). A smaller portion of continuously-running clocks are hydrogen masers, i.e. atomic clocks that probe a transition in hydrogen rather than in cesium. 21 Quinn (2003) 22 BIPM (1977, 54); BIPM (1978, 90); Audoin and Guinot (2001, 250)


23 Some primary standards are active for longer periods than others, resulting in a better signal; some calibrations suffer from higher transfer noise; and some of the primary standards are more accurate than others. See Tal (2011) for a detailed discussion of how the accuracy of primary frequency standards is evaluated. 24 Azoubib et al (1977), Arias and Petit (2005) 25 Panfilo and Arias (2009, 112) 26 Audoin and Guinot 2001, 251 27 Audoin and Guinot 2001, 251 28 In May 2013 the difference between TAI and UTC was 35 seconds (BIPM 2013). 29 This achievement is better appreciated when one contrasts it to the state of time coordination less than a century-and-a-half ago, when the transmission of time signals by telegraphic cables first became available. Peter Galison (2003) provides a detailed history of the efforts involved in extending a unified ‘geography of simultaneity’ across the globe during the 1870s and 1880s, when railroad companies, national observatories, and municipalities kept separate and conflicting timescales. Today, the magnitude of discrepancies among timekeeping standards is far smaller than is required by almost all practical applications, with the exception of few highly precise astronomical measurements. In particular, the study of millisecond pulsars requires highly stable atomic timescales (Guinot & Petit 1991). 30 See fn. 5 for references. 31 Metaphysical beliefs can still play a legitimate role in metrological practice indirectly through their psychological and social consequences. For example, metrologists may be psychologically motivated to accurately measure the value of a physical constant by their belief that the constant has an exact, mindindependent value. Their methods of accuracy evaluation would nonetheless be independent of the truth or falsity of their metaphysical beliefs, as clarified in Tal (2011, 1092-3). 32 Coordinative definitions are required because theories by themselves do not specify the application conditions for the concepts they define. A theory can only link concepts to one another, e.g. link the concept of uniformity of time to the concept of uniform motion, but it cannot determine which real motions or frequencies count as uniform (Reichenbach 1958 [1927], 14). This, of course, is true only under the very restrictive notion of theory that was accepted in the early days of logical positivism. 33 Audoin and Guinot 2001, 251 34 My point concerning the possibility of failure is not meant to invoke any particular criterion of testability such as falsifiability or severity. Rather, my point is that UTC should not be thought of as a hypothesis to be tested at all. UTC is not an estimator of a mind-independent parameter whose value metrologists are trying to approximate, but an abstract artefact in its own right that metrologists attempt to stabilize (much like their attempts to stabilize material artefacts). 35 Ian Hacking identifies explanations of stability as one of three ‘sticking points’ in the debate between social constructivists and their intellectual opponents (1999, 84-92) 36 This hypothetical example is inspired by Carnap (1995 [1966], 83-5) 37 For the sake of this argument I am not supposing any particular theory of observation. Observing the indications of a measuring instrument may involve the use of additional instruments – in some cases, even additional measuring instruments – as long as the recursion eventually bottoms out. 
38 For more details on the comparison of primary frequency standards see Tal (2011) and references therein. 39 I conceive of models as autonomous mediators between theory and experiment. For an exposition of this way of thinking about models see Morrison and Morgan (1999). My views on models and their role in measurement therefore significantly depart from those of Krantz, Suppes et al. (1971), for whom models are empirical relational structures that are homomorphic to mathematical ones. Space limitations prevent these differences from being discussed in detail here. 40 For a detailed analysis of the inferential structure of measurement procedures and its dependence on models see Tal (2012). 41 For example, JCGM (2008) (known as the ‘GUM’) is a document written by a committee of international metrological institutions that provides guidelines for evaluating measurement uncertainty based on models of the relevant measuring system. For a discussion of the epistemological role of models in measurement see also Mari (2003). 42 This claim pertains to measuring instruments. Whether or not empirical regularities in general are model-dependent in this sense is a question that falls beyond the scope of this essay.


43 More exactly, the concept of terrestrial time is directly coordinated to TAI, i.e. to UTC prior to the addition of ‘leap seconds’ (see the section on ‘leap second’ above.) 44 See Petit (2004) for an example. 45 This double-sided methodological configuration is an example of what Hasok Chang (2004, 220-34) calls ‘epistemic iteration’, i.e. enrichment of knowledge through successive partial revisions of existing traditions. It is worth noting, however, that the iterations discussed here occur on a much smaller scale than the examples discussed by Chang. Whereas Chang focuses on technological and theoretical innovations that advance the measurement of a quantity over decades or centuries, the iterations discussed here constitute small and rapid (e.g. monthly) changes to the sanctioned mode of application of a concept that are not precipitated by the introduction of novel theories or instruments. 46 Some of Reichenbach’s remarks seem to support such reading, e.g. his claim that “uniform time is not considered to be equal to the directly observed time, but is derived indirectly from it by a series of corrections.” (1958 [1927], 118) 47 The naturalistic approach in SSK is defended by Bloor (1999). It should be noted that Latour would have likely favoured a more radical reading of his own claims about metrology, according to which both nature and society arise from interactions between human and non-human actors (Latour 1992). Such reading has several interesting differences from – and also surprising commonalities with – the model-based account that must regretfully await elaboration elsewhere. 48 For example, in 1989 the definition of the kilogram was reinterpreted as referring to the mass of the International Prototype immediately after cleaning and washing in accordance with the recommended procedure, and after applying a linear correction factor of 0.0386 microgram per day to cancel out known mass drifts (Girard 1994, 320). 49 In this respect the model-based account continues the analysis of measurement offered by Kuhn ([1961] 1977). Kuhn took scientific theories to be both constitutive of the correct application of measurement procedures and as preconditions for the discovery of anomalies. The model-based account applies Kuhn’s insights to the maintenance of metrological standards, where local models play a role analogous to theories in Kuhn’s account despite operating within ‘normal’ science.
