## Assimilation of observations

Classically, the information deduced from the analysis of observations and from model simulations is obtained from two different, clearly separated streams that are then combined to provide additional insight, in order to give more strength to the conclusions, or to propose a new interpretation. In the sections above, we have shown examples where a stronger coupling between modeling and data analysis could be very instructive. This is precisely the goal of the assimilation of observations or data assimilation, which is defined by Talagrand (1997) "as the process through which all the information is used to estimate as accurately as possible the state of the atmospheric or oceanic flow. The available information essentially consists of the observations proper, and of the physical laws which govern the evolution of the flow. The latter are available in practice under the form of a numerical model."

Data assimilation techniques have been used successfully in various domains of meteorology and oceanography during the past decades (for a review, see for instance Talagrand 1997; Kalnay 2003; Brasseur 2006). One of the earliest and most important applications of such techniques is in weather forecasting, to determine the initial conditions at a time $t_0$ from which the subsequent evolution of the system at times $t > t_0$ is deduced. This step, which is called the analysis, combines some background information (also denoted as first guess or prior information) with available observations. Following the notation and formulation of Kalnay (2003), the observed variables are denoted $y_0$, while $x_b$ is the background field. The background is generally obtained from a previous forecast, using a model simulation that starts from an analysis at an earlier time $t - 1$ ($t - 1 < t_0$) and runs until time $t_0$. Both $y_0$ and $x_b$ are three-dimensional fields. They are usually not available at the same locations, so some interpolation is needed. Furthermore, observations may provide a different variable from the one computed by the model. Some transformation of model results (or alternatively of observations) is then needed before the two can be compared. This is represented by $H(x_b)$, where $H$ is called the observation operator. The difference between the observations and the first guess obtained from the model forecast, $y_0 - H(x_b)$, is called the observational increment or innovation. The analysis $x_a$ is then obtained by adding this innovation to the model forecast, using some weight $W$ that is determined by accounting for uncertainties in model results as well as in observations. This can be written as

$$x_a = x_b + W\,(y_0 - H(x_b)) \qquad (1)$$
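As an illustration of equation (1), the analysis step can be sketched for a single scalar variable. This is a minimal sketch, not the formulation of any operational system: the function names are hypothetical, and the particular weight used here (background error variance divided by the sum of background and observation error variances, for a linear observation operator) is the standard least-squares choice, assumed for the example rather than taken from the text.

```python
def analysis_update(xb, y0, obs_operator, weight):
    """Return xa = xb + W * (y0 - H(xb)) for a single scalar state."""
    innovation = y0 - obs_operator(xb)  # observational increment
    return xb + weight * innovation


def least_squares_weight(var_b, var_o, h_slope=1.0):
    """Weight for a linear operator H(x) = h_slope * x (assumed form).

    var_b and var_o are the error variances of the background and of the
    observation; both are illustrative inputs, not values from the text.
    """
    return var_b * h_slope / (h_slope * var_b * h_slope + var_o)


xb = 15.0  # background (first guess) temperature, degrees C
y0 = 16.0  # observed temperature, degrees C
W = least_squares_weight(var_b=1.0, var_o=1.0)
xa = analysis_update(xb, y0, lambda x: x, W)
print(W, xa)  # 0.5 15.5
```

With equal error variances the analysis falls halfway between background and observation. As the observation error variance tends to zero, the weight tends to one and the analysis reproduces the observation.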

The sequence of obtaining a first guess from model results and then updating it with data to provide the analysis can be repeated at successive times in an "analysis cycle". Such a technique is often called sequential assimilation (e.g. Talagrand 1997; Kalnay 2003).
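The analysis cycle can be sketched as a loop that alternates a forecast with an analysis. The one-step model below (linear damping plus a constant forcing) is purely hypothetical, the observation operator is taken as the identity, and the weight is held constant; all three are simplifying assumptions made for the example.

```python
def forecast(x, damping=0.9, forcing=1.0):
    """Hypothetical one-step model: linear damping plus constant forcing."""
    return damping * x + forcing


def run_cycle(x_init, observations, weight):
    """Alternate forecast and analysis steps; return the list of analyses."""
    analyses = []
    xa = x_init
    for y0 in observations:
        xb = forecast(xa)             # first guess from the previous analysis
        xa = xb + weight * (y0 - xb)  # analysis update with H = identity
        analyses.append(xa)
    return analyses


observations = [10.0, 10.0, 10.0]
free_run = run_cycle(0.0, observations, weight=0.0)    # data ignored
constrained = run_cycle(0.0, observations, weight=0.5)  # data used
```

With a zero weight the loop reduces to a free model run. With a non-zero weight each analysis is pulled toward the observation before the next forecast starts from it.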

In weather forecasting, models and data assimilation schemes (for instance the formulation of W and H) are improved regularly, which can introduce inconsistencies between the analyses produced for different periods. Reanalyses of the past 50 years, which use the same procedure for the whole period covered (e.g. Kalnay et al. 1996), have provided a very valuable and widely used comprehensive set of atmospheric fields.

Such a procedure could, in theory, be applied to any period and thus also to the past millennia. However, several difficulties arise, a major one being the limited amount of data. Hundreds of thousands of atmospheric and surface observations are used for the analyses in weather forecasting, covering all areas of the world and providing a very detailed knowledge of the atmospheric state, whereas only a few tens to a few hundred proxy records are available for the past millennium, mainly located on the mid-latitude continents. Secondly, analysis in weather forecasting is performed several times a day, typically every 6 hours. By contrast, proxy data provide at best seasonal resolution. It is therefore not possible to constrain the day-to-day variability of the climate system. Thirdly, proxy data contain climatic as well as nonclimatic signals, and the variables measured, such as tree-ring width, wood density or varve thickness, are usually not a simple function of climate. Determining precisely the form of the observation operator H is thus a difficult task. Finally, sophisticated data assimilation techniques are expensive, in particular if they have to be applied over relatively long periods such as the past millennium. The central processing unit (CPU) time requirements could be a limiting factor for the AOGCMs, and the techniques developed for data assimilation exercises covering the past 20 to 50 years for the atmosphere and the ocean cannot be transferred to the analysis of climate variations during the past millennium without significant adaptation.

Because of these difficulties, only a few attempts at data assimilation have been published for the past millennium (von Storch et al. 2000; Jones and Widmann 2003; van der Schrier and Barkmeijer 2005; Goosse et al. 2006). The first technique proposed was called DATUN, for Data Assimilation Through Upscaling and Nudging (von Storch et al. 2000). In this technique, the first step, upscaling, is to derive from proxy data the intensity of some large-scale patterns, for instance that of the Northern Annular Mode (also known as the Arctic Oscillation, which is linked to the NAO; see previous section). The goal is to increase the signal-to-noise ratio, as proxy data are usually more suitable for reconstructing large-scale fields than small-scale ones. In the second step, a simple nudging technique is applied to force the model to remain close to the pattern obtained by upscaling. In DATUN, the nudging is performed directly in pattern space.
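The idea of nudging in pattern space can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the DATUN implementation: the two-point "field", the pattern, the relaxation strength and the function names are all hypothetical, and the free evolution of the model between nudging steps is omitted for brevity.

```python
def project(state, pattern):
    """Inner product of two fields stored as flat lists."""
    return sum(s * p for s, p in zip(state, pattern))


def nudge_step(state, pattern, target_amplitude, strength):
    """Relax the amplitude of `pattern` in `state` toward the target.

    Only the component of the state along the pattern is modified; the
    part of the state orthogonal to the pattern is left unchanged.
    """
    amplitude = project(state, pattern) / project(pattern, pattern)
    correction = strength * (target_amplitude - amplitude)
    return [s + correction * p for s, p in zip(state, pattern)]


pattern = [1.0, -1.0]  # hypothetical large-scale dipole pattern
state = [0.5, 0.5]     # initial state with zero amplitude on the pattern
for _ in range(50):
    state = nudge_step(state, pattern, target_amplitude=2.0, strength=0.5)
final_amplitude = project(state, pattern) / project(pattern, pattern)
```

After repeated steps the pattern amplitude converges to the target, while the component of the state orthogonal to the pattern keeps its initial value.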

The goal of van der Schrier and Barkmeijer (2005) is also to bring model results close to a pattern deduced from observations. In the example they studied, the target for the model surface atmospheric circulation in winter is a reconstruction for the period 1790-1820. This is not achieved through nudging as in DATUN, but by adding small perturbations to model variables that optimally lead the atmospheric model to the target pattern. These perturbations are called "forcing singular vectors" and are computed using the adjoint of the atmospheric model. The technique can therefore be applied only to models for which an adjoint is available, which limits its range of application. As the forcing singular vectors have small amplitudes, they affect only the large-scale variability while the synoptic variability evolves freely. Van der Schrier and Barkmeijer (2005) argue that this provides an advantage compared with nudging techniques, which tend to suppress the variability in the nudged pattern. Unfortunately, no systematic comparison of the two techniques is presently available.

In the technique described by Goosse et al. (2006), it is not necessary to obtain first a large-scale pattern from observations. The proxy data are directly compared with the results of an ensemble of model simulations, using a simple quadratic law to compute a cost function CF:

$$CF_k(t) = \sqrt{\sum_{i=1}^{n} w_i \left( F_{obs}(t) - F^{k}_{mod}(t) \right)^2} \qquad (2)$$

where $CF_k(t)$ is the value of the cost function for the experiment $k$ (a member of the ensemble) for a particular period $t$, and $n$ is the number of reconstructions or proxy data used in the model-data comparison. $F_{obs}$ is the reconstruction of a variable $F$, based on observations, at a particular location for the period $t$. $F^{k}_{mod}$ is the value of the corresponding variable $F$ simulated in experiment $k$ by the model, and $w_i$ is a weight factor characterizing the statistics and reliability of proxy data.

Figure 7.9 Anomaly of summer mean temperature in Fennoscandia. The average over an ensemble of 125 simulations performed with ECBILT-CLIO-VECODE is in black. The reconstruction of Briffa et al. (1992) is in green. The red line is the best pseudo-simulation obtained by using 12 proxy records in the Northern Hemisphere extratropics (Goosse et al. 2006). The time series are grouped in 10-year averages and divided by their standard deviation. The reference period is 1500-1850.

The technique of Goosse et al. (2006) requires a relatively large number of simulations (at least 30). $CF_k(t)$ is first evaluated for all the simulations and all the time periods. For each period $t$, the minimum over all $k$ experiments is then selected, and the corresponding experiment is taken as the "best" one for this period. A "best" model estimate for the whole millennium is obtained by grouping the best simulations for all the periods $t$, which gives what is called the best pseudo-simulation. The technique thus selects among the ensemble of simulations the one that is the closest to the available reconstructions, with the cost function measuring the misfit between model results and proxy records. From this cost function, it is then also possible to assess the quality of the reconstruction for all the periods investigated. An application of this technique is illustrated in Figure 7.9, which shows that it is possible to find, for nearly all the periods, an element of the ensemble that is close to all the available proxies, at least in the case tested by Goosse et al. (2006) using only a few temperature-sensitive proxies that covered (nearly) the whole past millennium. We must be aware, however, that this technique implicitly assumes that the ensemble members cover a sufficiently wide range of internal variability compared with the observed one, so that a good analog of the observed situations can be found among the simulations. Problems could thus occur in regions where this assumption is not valid.
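The selection procedure described above can be sketched in a few lines. This is a minimal illustration, not the code of Goosse et al. (2006): the function names, the data layout and the numbers are hypothetical, and the cost function is assumed to be the square root of a weighted sum of squared model-data differences (the square root does not change which member is selected).

```python
import math


def cost(obs, sim, weights):
    """Weighted quadratic misfit between n proxy values and model values."""
    return math.sqrt(sum(w * (o - s) ** 2 for o, s, w in zip(obs, sim, weights)))


def best_pseudo_simulation(obs_by_period, ensemble, weights):
    """For each period t, return (index of best member, its cost).

    obs_by_period[t] holds the n proxy-based values for period t;
    ensemble[k][t] holds the n corresponding values simulated by member k.
    """
    best = []
    for t, obs in enumerate(obs_by_period):
        costs = [cost(obs, member[t], weights) for member in ensemble]
        k_best = min(range(len(costs)), key=costs.__getitem__)
        best.append((k_best, costs[k_best]))
    return best


# Toy data: 2 proxies, 2 periods, 3 ensemble members (illustrative values only)
obs_by_period = [[0.0, 1.0], [1.0, -1.0]]
ensemble = [
    [[0.5, 0.5], [0.0, 0.0]],
    [[0.0, 1.0], [2.0, 2.0]],
    [[-1.0, -1.0], [1.0, -0.5]],
]
selection = best_pseudo_simulation(obs_by_period, ensemble, weights=[1.0, 1.0])
print(selection)  # [(1, 0.0), (2, 0.5)]
```

The minimum cost retained for each period also indicates how well the closest member matches the proxies, which is how the quality of the reconstruction can be assessed period by period.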

A main goal of the data assimilation techniques is to provide a reconstruction of the state of the climate system that is consistent with proxy records, the physical laws described by the model, and forcing reconstructions. These techniques can be considered as a physically based way to interpolate the information provided by data that have an incomplete spatial and temporal coverage. This reconstruction of past climate variations includes the forced response of the system as well as the random variations associated with internal variability. As mentioned before, models alone cannot estimate the history of the latter component if model evolution is not constrained by observations.

Although observations provide only regional- and seasonal-scale constraints, the estimates obtained by these techniques could be used to analyze high-frequency changes. For instance, van der Schrier and Barkmeijer (2005) analyzed, in their model, the influence of the assimilation of a pattern of winter-mean sea-level pressure corresponding to the period 1790-1820 on the number of cyclones in the Atlantic, and found an enhanced cyclonic activity both south and north of the modern storm track position. The model results also include information on variables that are not assimilated, such as oceanic currents. This has allowed van der Schrier and Barkmeijer (2005) to show that changes in oceanic surface circulation and temperature in the North Atlantic played a role during the coldest periods observed in Europe in winter. Access to variables that cannot be obtained directly from proxy records makes data assimilation a very powerful tool for analyzing past climate variations. Indeed, it is possible to propose mechanisms that are consistent with proxy evidence as well as with the physics governing the atmosphere, the ocean, and sea ice. In a subsequent step, the proposed hypotheses could be tested using new data, or the results of the data assimilation exercise could be used to select data locations that would give essential information and a strong constraint on the mechanisms involved.
