Catherine Gautier and Pete Peterson
With the enormous flow of data that will be generated by the sensors on the upcoming US and international Earth Observing platforms, a number of database-related issues will have to be tackled if these new data sets are to be used optimally for global studies. While some of these issues are being addressed at the program level, for example by EOSDIS (see other papers at this conference), others need to be examined more from the users' perspective, which is what we do here. One of the database issues that users must handle is the fusion of data coming from different sources (satellites and orbiting platforms, and in-situ ship, aircraft and surface measurements) and their integration into models. One way to approach this issue is 4-D data assimilation, whereby general circulation models of the atmosphere and the ocean are used to "assimilate" the data. This is, however, not the only approach, and it may not be the best one for unraveling new processes, because these processes can be masked by all the other processes at work within the complex model used in the assimilation.
The approach we discuss here is one in which very large data sets generated from different sources are combined and integrated with simpler models to produce low level geophysical parameters for which no database yet exists at high time and space resolution. This is generally true for geophysical processes involving the ocean, or the atmosphere over the ocean, where sampling in both time and space is too limited. Only recently, however, have we understood the crucial role these oceanic and air-sea interaction processes play in the variations and evolution of the Earth's climate, and therefore the need to improve our knowledge of their variability.
The main questions we address here are: (1) how do we extract scientifically useful information from tens of gigabytes of satellite radiance data and first order geophysical products that have variable time and space sampling? and (2) what role can GIS tools play in this data melt-down, whereby hundreds of 2-D data sets spanning long periods of time are essentially digested down to a single curve?
To focus our approach to these questions, we discuss an example of a scientific topic that can be addressed with presently available data and that raises many of the issues we will face when EOS and other very large data sets become available at the end of the century. The scientific question we investigate is the annual mean meridional oceanic heat transport. This topic is of great importance to climate, since the oceans play a major role in redistributing the heat that is gained at low latitudes and lost at high latitudes. This heat redistribution by the major ocean currents helps drive weather and climate on the adjacent continents through the exchanges of heat that take place between the ocean and the atmosphere at all latitudes.
We first describe the data manipulation we presently perform to derive the annual mean meridional oceanic heat transport, including its various components. We then discuss the areas in which GIS tools could facilitate our computations. Finally, we identify the areas in which GIS tools could make a unique contribution.
There are many ways to compute the meridional oceanic heat transport, the most direct being to use ocean currents together with the ocean temperature distribution. This, however, requires knowledge of the vertical distribution of currents and temperature over all the oceans, at a spatial resolution commensurate with the spatial variability of currents and temperature and at a temporal resolution of at least a season, to obtain the mean annual meridional heat transport. Data at these resolutions have not yet been available, and our knowledge of the meridional oceanic heat transport is therefore limited to long-term (10-50 year) averages. Many indirect approaches are possible; here we present one in which the net surface heat flux is first computed over all the oceans and the meridional oceanic heat transport is then estimated from the meridional divergence of the zonal mean net surface heat flux. The main issue is therefore to compute the net heat flux at the ocean-atmosphere interface. This flux, Qnet, is given by:
$$Q_{net} = SW + LW + LHF + SE$$
where SW is the net surface solar (shortwave, hence SW) radiation flux, LW is the net longwave radiation flux, LHF is the turbulent latent heat flux corresponding to the evaporation of water from the ocean surface, and SE is the turbulent sensible heat flux. The net heat flux is taken to be positive going into the ocean and negative when leaving the ocean. Each term is evaluated individually, but we focus here on the two largest terms, SW and LHF. The other two terms are, for the time being, obtained from climatologies.
The latent heat flux is a high level geophysical parameter that cannot be measured directly, either from the surface or from space. It is typically estimated using a parameterization, that is, a formulation that relates a parameter at a generally small scale to some "measurables" at a larger scale. In the case of the latent heat flux, the "measurables" are the low level geophysical products: surface wind speed, relative and absolute humidity at the ocean surface, and sea surface temperature. These can be measured from ships but are not directly measurable from space. They can, however, be estimated from microwave and infrared radiance data through the application of a retrieval algorithm.
The determination of latent heat flux over the ocean surface from space is based on microwave brightness temperatures from the SSM/I sensor, which flies on the DMSP satellite series, and infrared radiances from the AVHRR sensor on the NOAA satellite series (Jourdan and Gautier, 1995). The microwave brightness temperatures are transformed by F. Wentz into surface wind speed and total precipitable water in the atmosphere, among other geophysical parameters, after a number of data manipulations (Wentz, 1983, 1992). The AVHRR infrared radiances are transformed into sea surface temperature by NOAA (Smith, 1992). All these data manipulations (the application of operational retrieval algorithms) are similar to those that will be applied to EOS data in the near future and are not discussed here. Our analysis starts with the low level geophysical data sets that are available to the scientific community.
The SW product is, like the LHF, based on a number of satellite data sets. In this case the original database is a set of satellite radiances in the visible and infrared spectral regions taken from geostationary and polar orbiting satellites. The data are remapped over the globe (Rossow et al., 1991) and analyzed to produce cloudy and clear (surface) radiances as well as the fractional area covered by clouds. Together with the sun illumination and viewing geometry and atmospheric properties (such as total precipitable water and total ozone amount), these data can be used to compute the net surface solar (shortwave) radiation. Daily and monthly fields of SW can be computed (Gautier et al., 1980) and included in the overall computation of the net surface heat flux.
LHF and SW are combined with climatological values of LW and SE to produce monthly mean fields of net surface heat flux. The meridional oceanic heat transport is computed by integrating the zonal average of the net heat flux from north to south. A simple rescaling by the cosine of the latitude is necessary to obtain the actual flux values.
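The integration can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' IDL procedure: the zonal-mean net flux `qbar` (W m-2, positive into the ocean) is assumed to be averaged over all longitudes with land cells counted as zero, given on latitude bands of equal width ordered from north to south, and band areas use a midpoint rule. The northward transport across a latitude φ is then minus the net heat gained by the ocean north of φ, T(φ) = -2πa² ∫ from φ to 90° of Q̄(φ′) cos φ′ dφ′, where the cos φ′ factor is the latitude rescaling mentioned above.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (m)


def meridional_transport(qbar, lat_centers_deg, dlat_deg):
    """Northward oceanic heat transport (W) at the southern edge of each band.

    qbar: zonal-mean net surface heat flux (W m-2, positive into the ocean),
          averaged over all longitudes, ordered from north to south.
    lat_centers_deg: band-centre latitudes, ordered from north to south.
    dlat_deg: band width in degrees (assumed uniform).
    """
    dphi = math.radians(dlat_deg)
    transport = []
    cumulative = 0.0  # net heat gained by the ocean north of the current edge (W)
    for q, lat in zip(qbar, lat_centers_deg):
        # Area of a full latitude band, 2*pi*a^2*cos(phi)*dphi (midpoint rule);
        # the cos(latitude) factor is the rescaling described in the text.
        band_area = 2.0 * math.pi * EARTH_RADIUS_M ** 2 * math.cos(math.radians(lat)) * dphi
        cumulative += q * band_area
        # Heat gained north of the edge must be carried away by the ocean,
        # so the northward transport across the edge is its negative.
        transport.append(-cumulative)
    return transport
```

Dividing the result by 1e15 gives petawatts. If the bands cover the whole globe, the value at the southernmost edge should be close to zero; a large residual there indicates a net global imbalance in the combined flux fields.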
The selected application (annual mean meridional oceanic heat transport) can easily be generalized to many climatologically relevant products. The procedure applies a number of data manipulation steps to derive a high level geophysical climatological product from low level geophysical products obtained from multiple satellite sensors or in-situ measurements. The high level geophysical products thus obtained have two characteristics: (1) they are many steps removed from the direct satellite data or in-situ measurements, and (2) they represent a new type of parameter that has never been derived before at this resolution. Their characteristics and time and space variability are therefore little known. This implies that many quality and accuracy evaluations must take place at every step of the computation. While quality evaluation is generally performed through visualization, accuracy evaluation is usually performed through some type of statistical analysis.
Even though the data volume has already been reduced by at least an order of magnitude by the time we acquire the data, the data sets must extend over long periods of time because we are interested in climate problems. We therefore deal with very large volumes of data. In the latent heat flux computations, for instance, one year of SSM/I data occupies four gigabytes. The data set includes each geophysical variable, for each footprint, for every swath over the whole year, organized into month-long blocks.
The first step in the SSM/I geophysical data manipulation is to transform Level 2 swath data into Level 3 gridded data: fine-resolution monthly averages of total precipitable water and surface wind speed (Figure 1). Without GIS tools we must write our own procedure to perform this task; the Interactive Data Language (IDL) is presently used for it. A standardized GIS product should be able to perform this sort of function routinely.
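The swath-to-grid step can be illustrated with a simple "drop in the bucket" average. This Python sketch stands in for the IDL procedure and rests on assumptions: footprints arrive as `(lat, lon, value)` tuples for one month, failed retrievals are marked `None`, and every footprint counts equally toward the cell that contains its centre (operational gridding may instead weight by footprint overlap).

```python
import math
from collections import defaultdict


def grid_swath_monthly(footprints, res_deg=0.5):
    """Average Level 2 footprints into a Level 3 latitude/longitude grid.

    footprints: iterable of (lat, lon, value) tuples for one month; value is
    None where the retrieval failed (e.g. rain or land contamination).
    Returns {(row, col): (mean, count)} for cells with at least one sample;
    row 0 starts at 90N and col 0 at 0E.
    """
    n_rows = int(round(180.0 / res_deg))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in footprints:
        if value is None:
            continue  # skip failed retrievals rather than averaging in a fill value
        row = min(int(math.floor((90.0 - lat) / res_deg)), n_rows - 1)  # clamp lat = -90
        col = int(math.floor((lon % 360.0) / res_deg))                  # wrap longitudes to [0, 360)
        sums[(row, col)] += value
        counts[(row, col)] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}
```

Keeping the per-cell sample count alongside the mean is deliberate: the counts are useful later for weighting and for judging how well each monthly value is sampled.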
The monthly mean SSM/I low level geophysical products, along with AVHRR SST data, are rewritten as ASCII data files, which are the inputs to our LHF model, written in FORTRAN and run on a DEC Alpha. In the case of the SW production scheme, all the initial computational steps are performed on binary data files of 3-hour averages over a seven-year period, provided as standard products by the ISCCP project. Once the daily SW fields are calculated, IDL is used to produce the monthly averages (Figure 2) and to put them in a format that can be combined with the LHF, LW and SE fields. This includes recasting all fields to 0.5 degree resolution. From this point on, all subsequent operations and visualization are performed in IDL (Figures 3 and 4).
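The daily-to-monthly averaging of the SW fields can be sketched as below. This is an illustrative Python version, not the IDL code used by the authors; it assumes each daily field is a flat list of grid values with `None` for missing data, and the `min_valid_days` threshold for accepting a monthly mean is a hypothetical choice.

```python
def monthly_mean(daily_fields, min_valid_days=20):
    """Cell-by-cell monthly mean of daily gridded fields.

    daily_fields: list of equal-length flat lists, one per day; None marks
    missing data. A cell with fewer than min_valid_days valid values is set
    to None instead of being averaged (threshold is an assumption).
    """
    n_cells = len(daily_fields[0])
    means = []
    for i in range(n_cells):
        values = [day[i] for day in daily_fields if day[i] is not None]
        if len(values) >= min_valid_days:
            means.append(sum(values) / len(values))
        else:
            means.append(None)  # too few days to trust a monthly value
    return means
```

Leaving poorly sampled cells undefined, rather than averaging whatever is available, keeps sampling gaps visible in the later quality checks.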
Statistical modeling is often used in environmental research to transform low level geophysical products into higher level products. In our application statistical and physical models are both used.
First, the LHF computations require knowledge of the surface saturation humidity and relative humidity, which are not directly measurable from space. The air mixing ratio, Qa, and the surface air temperature, Ta, therefore need to be derived from the satellite-derived total precipitable water. The goal of that step is to relate the total precipitable water statistically to the surface air humidity (expressed as the air mixing ratio, Qa) and temperature. Although it has been demonstrated that such a statistical relationship exists on monthly time scales (Liu, 1986), its accuracy has been found to vary geographically (Esbensen et al., 1993). A tool that helps assess the geographical variations in the accuracy of this relationship, and accounts for these variations in the final LHF computations, would be very useful.
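A first step toward such a tool is simply to tabulate the error of the satellite-derived Qa region by region against collocated reference values. The sketch below assumes matched samples are available as `(region, qa_estimated, qa_observed)` tuples; the region labels are hypothetical.

```python
import math
from collections import defaultdict


def accuracy_by_region(samples):
    """Bias and RMS error of an estimated Qa, grouped by region.

    samples: iterable of (region, qa_estimated, qa_observed), e.g. in g/kg.
    Returns {region: (bias, rmse, n)} with bias = mean(estimated - observed).
    """
    diffs = defaultdict(list)
    for region, estimated, observed in samples:
        diffs[region].append(estimated - observed)
    stats = {}
    for region, d in diffs.items():
        n = len(d)
        bias = sum(d) / n
        rmse = math.sqrt(sum(x * x for x in d) / n)
        stats[region] = (bias, rmse, n)
    return stats
```

Mapping the per-region RMS error back onto the grid would give an uncertainty field that could be carried forward into the LHF computation.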
Second, the LHF computations are performed with the physical model of Liu et al. (1979). This model uses the well-known aerodynamic bulk formula (ABF) to relate measurables of moisture, temperature and wind speed to the fluxes of heat and momentum across the air-sea interface. What distinguishes this model is its use of similarity theory to determine the transfer coefficients needed in the ABF. Conceptually, similarity theory states that the vertical profiles of moisture, temperature and wind speed should have similar shapes, because they are acted upon by the same physical processes throughout space and time. A more detailed theoretical discussion can be found in Businger (1975). The similarity theory requires measurements of all three parameters at two heights: the surface and a common reference height. An iterative technique is then used to find the common profile that gives the "best" overall fit for the three parameters.
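The bulk formula itself is short. The Python sketch below shows only its structure, LHF = ρ L_v C_E U (q_s - q_a), and makes strong simplifying assumptions: the transfer coefficient `C_E` is held constant instead of being found by the similarity-theory iteration of Liu et al. (1979), air density is fixed, and saturation humidity uses Bolton's (1980) vapour-pressure fit with the usual 0.98 factor for sea water.

```python
import math

RHO_AIR = 1.2   # air density, kg m-3 (assumed constant)
L_V = 2.5e6     # latent heat of vaporization, J kg-1
C_E = 1.2e-3    # moisture transfer coefficient (assumed constant, not from similarity theory)


def saturation_specific_humidity(t_celsius, pressure_hpa=1013.25):
    """Saturation specific humidity (kg/kg) from Bolton's vapour-pressure formula."""
    e_s = 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))  # hPa
    return 0.622 * e_s / (pressure_hpa - 0.378 * e_s)


def latent_heat_flux(sst_c, qa, wind_speed, c_e=C_E):
    """Bulk latent heat flux (W m-2), positive upward, i.e. heat leaving the ocean.

    sst_c: sea surface temperature (deg C); qa: air humidity (kg/kg);
    wind_speed: wind speed at the reference height (m/s).
    """
    # The 0.98 factor reduces the saturation humidity over salty sea water.
    qs = 0.98 * saturation_specific_humidity(sst_c)
    return RHO_AIR * L_V * c_e * wind_speed * (qs - qa)
```

For SST = 28 °C, Qa = 18 g/kg and U = 7 m/s this gives roughly 128 W m-2. Because the function returns heat leaving the ocean, the term enters the net flux with a minus sign under the convention that the net flux is positive into the ocean.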
The relationship between the total precipitable water (TPW) and the air mixing ratio, Qa, is based on the analysis of radiosonde measurements. Starting from the conceptual notion that Qa responds to changes in the TPW, a statistical relationship between the two was sought using available radiosonde data. Here we are using information in the vertical direction to modify data in the spatial direction. This method fails for single events, but by averaging over sufficiently long time scales (~2 weeks) a relationship is found (Liu, 1986). This is a typical example of a statistical relationship derived from point (radiosonde) data being applied to gridded data as part of producing a high level product, and probably an area where GIS tools can be useful. In a similar way, the total precipitable water is related to the surface air temperature, Ta, which determines the surface saturation humidity.
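The "fit on points, apply to the grid" pattern can be sketched as follows. For simplicity the sketch fits a straight line by ordinary least squares; this is a stand-in, not the functional form used by Liu (1986). The radiosonde pairs are assumed to be time-averaged (TPW, Qa) values, and gridded TPW uses `None` over land or missing cells.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def apply_to_grid(grid_tpw, coeffs):
    """Apply a fitted point relationship to every valid cell of a gridded TPW field."""
    a, b = coeffs
    return [[None if w is None else a + b * w for w in row] for row in grid_tpw]
```

Keeping the fit and its application separate makes it easy to refit per region or per season, which is where the geographic variations in accuracy noted earlier would be handled.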
A procedure often used with satellite data, when other data are available to complement them, is the blending of data sets of different origin. This is particularly useful when improved fields can be expected from complementing the satellite data with surface observations in which we have higher confidence. Such a data blending approach is used, for instance, to derive the global sea surface temperature produced by the Climate Analysis Center of NOAA: SSTs measured by ships and buoys are used together with AVHRR-derived SSTs to obtain an optimal SST product for climate studies (Reynolds, 1988).
The LHF produced from satellite data only (both microwave and infrared measurements) has been found to have deficiencies, and even systematic errors, in some regions (Jourdan and Gautier, 1995; Esbensen et al., 1994). We have therefore developed a method for blending satellite-derived products with available in-situ data from the Comprehensive Ocean-Atmosphere Data Set (COADS) (Slutz et al., 1985) (see Figure 3). The idea is that, while the in-situ data may be more accurate, their sampling is very limited, so the gradients provided by the satellite data sets can be used to interpolate between the sparse in-situ measurements. This is achieved by assigning weights to the in-situ data that depend on the number of measurements used to produce the time averages; the satellite data are weighted according to the number of samples (nbsTa COADS, nbsQa COADS) used to determine the COADS value. These blended inputs, Qa blend and Ta blend, are then used to recalculate the LHF with the same model as before.
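One simple way to realize count-dependent weights is sketched below. It is an assumption, not necessarily the authors' exact formula: the COADS monthly mean receives a weight equal to its sample count (nbsTa COADS or nbsQa COADS), while the satellite value receives a fixed weight `sat_weight`, a hypothetical tuning parameter. Well-sampled cells are then dominated by ship data, and cells without ship reports fall back to the satellite value.

```python
def blend(sat_value, coads_value, n_coads, sat_weight=5.0):
    """Count-weighted blend of a satellite estimate with a COADS monthly mean.

    n_coads: number of ship reports behind the COADS value.
    sat_weight: fixed weight given to the satellite value (hypothetical).
    With no ship reports, the satellite value is returned unchanged.
    """
    if n_coads == 0 or coads_value is None:
        return sat_value
    return (n_coads * coads_value + sat_weight * sat_value) / (n_coads + sat_weight)
```

Applied cell by cell to Qa and Ta, this yields fields analogous to Qa blend and Ta blend. The choice of `sat_weight` is itself a data-quality decision whose influence on the final product is worth recording.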
One of the problems with combining the data sets used to compute the four terms of the net surface heat flux is that they have different resolutions. For instance, the ISCCP data, and therefore the SW product we compute, are available only at 2.5 degree resolution, while the COADS data used to blend the LHF inputs are available at 2 degree resolution. We combine such data sets by rescaling each to the finer 0.5 degree resolution and doing the arithmetic at this scale. This involves assumptions about how the interpolation is performed. Here again, GIS tools could be integrated to provide a better way of combining data sets with different spatial resolutions.
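The rescaling step can be illustrated with the simplest interpolation assumption, nearest-neighbour replication: each 2.5 degree cell becomes a 5 x 5 block of 0.5 degree cells and each 2 degree cell a 4 x 4 block. This Python sketch assumes global grids stored as lists of rows; bilinear or area-weighted interpolation would be a different, and possibly better, assumption.

```python
def upsample(field, factor):
    """Nearest-neighbour replication of a coarse 2-D grid onto a finer grid.

    Each coarse cell becomes a factor x factor block of identical values,
    e.g. factor 5 for 2.5 deg -> 0.5 deg and factor 4 for 2 deg -> 0.5 deg.
    """
    fine = []
    for row in field:
        fine_row = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            fine.append(list(fine_row))  # copy so rows can be edited independently
    return fine


def add_fields(*fields):
    """Cell-by-cell sum of grids that already share the same 0.5 deg layout."""
    rows, cols = len(fields[0]), len(fields[0][0])
    for f in fields:
        if len(f) != rows or len(f[0]) != cols:
            raise ValueError("fields must be on the same grid")
    return [[sum(f[i][j] for f in fields) for j in range(cols)] for i in range(rows)]
```

A global 72 x 144 grid at 2.5 degrees and a 90 x 180 grid at 2 degrees both map onto the same 360 x 720 grid at 0.5 degrees, so the flux terms can then be added cell by cell.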
The blended LHF program uses four inputs: blended air temperature, blended Qa, wind speed and SST. It produces eight output fields: the Obukhov and stability indices, the Reynolds number, the latent and sensible drag coefficients, the sensible and latent heat fluxes, and the wind stress. Each of these output fields has a different sensitivity to errors in, and the data quality of, the model inputs. A tool that can assess these sensitivities would be very useful; this is discussed below.
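A generic way to estimate such sensitivities is to perturb each input in turn and difference the model output. The sketch below uses central finite differences; `bulk_lhf` is a deliberately simplified stand-in (constant transfer coefficient, saturation humidity `qs` supplied directly) for the full similarity-theory model, and the step size is an assumption.

```python
def bulk_lhf(qs, qa, u, rho=1.2, lv=2.5e6, ce=1.2e-3):
    """Simplified bulk latent heat flux (W m-2) with a constant transfer coefficient."""
    return rho * lv * ce * u * (qs - qa)


def sensitivities(model, inputs, rel_step=1e-3):
    """Central finite-difference derivative of model(**inputs) w.r.t. each input.

    inputs: {name: value}; the step is rel_step * |value| (or rel_step if value is 0).
    Returns {name: d(model)/d(name)}.
    """
    derivs = {}
    for name, value in inputs.items():
        h = rel_step * abs(value) if value != 0 else rel_step
        up = dict(inputs)
        up[name] = value + h
        down = dict(inputs)
        down[name] = value - h
        derivs[name] = (model(**up) - model(**down)) / (2.0 * h)
    return derivs
```

With the illustrative values qs = 0.023, qa = 0.018 and u = 7 m/s, dLHF/dQa is about -25,200 W m-2 per kg/kg, so a 1 g/kg error in Qa maps to roughly 25 W m-2 of LHF error. Multiplying each derivative by an estimated input error gives a first-order error budget for every output field.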
In general, some type of statistical analysis is necessary to ensure that the data sets produced are of the highest possible quality. Averages and variance fields are computed to assess whether biases or trends exist. More sophisticated analyses, suited to the data set used, are often performed. In our case, for instance, we wanted to investigate the large geographical differences we found between the satellite-only fields and the ship-only fields, and in particular the influence of data sampling on that difference. To do so, we examined the correlation between the ship sampling and the difference between the two fields. Taking the difference between the two fields directly yields large areas of strong disagreement, but when allowance is made for the variable ship sampling of the COADS data, a different picture emerges (Figure 5): where sampling is high (along established shipping lanes), the two fields are more likely to agree than where there is little or no ship activity. Satellites provide superior spatial coverage but have regions of systematic error, while the COADS data consist of direct measurements with large spatial gaps. Combining these two fields in space is therefore necessary to produce an improved LHF product with the complete coverage needed to estimate oceanic heat transport.
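The sampling analysis can be sketched with two small functions: a Pearson correlation between ship sample counts and the absolute satellite-minus-ship difference, and a table of the mean difference per sample-count class. This is an illustrative Python version assuming flat lists of collocated cell values; the class edges are hypothetical, and correlating with the logarithm of the counts may be more appropriate for strongly skewed sampling.

```python
import bisect
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_diff_by_sampling(sat, ship, counts, edges):
    """Mean |satellite - ship| difference grouped by COADS sample count.

    edges: ascending lower bounds of the count classes, e.g. [1, 10, 100].
    Cells with fewer samples than edges[0] (no ship data) are skipped.
    Returns {lower_edge: mean_abs_diff}.
    """
    groups = {}
    for s, o, n in zip(sat, ship, counts):
        k = bisect.bisect_right(edges, n) - 1  # index of the class containing n
        if k < 0:
            continue
        groups.setdefault(edges[k], []).append(abs(s - o))
    return {edge: sum(v) / len(v) for edge, v in sorted(groups.items())}
```

A negative correlation, or a mean difference that shrinks in the higher count classes, is the signature described above: the fields agree where ships sample densely.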
Here we summarize a typical sequence of procedures used to generate high level geophysical products. Some steps may be superfluous or not in the right order, but in general they are representative of those needed to derive such products.
While going through the above steps, the user performs a variety of visual checks on the quality and accuracy of the geophysical products obtained at every step. For instance, when the daily maps are derived from the raw SSM/I data, they are viewed in monthly groups to check for missing days, days with large data gaps, and the correct registration of each map. At this stage, an omitted processing step could result in missing data or land areas not being assigned the correct value, which can be checked quickly by viewing the images in monthly groups. An automatic procedure that performed this function reliably would speed up the evaluation enormously.
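Part of this visual check can be automated. The sketch below flags missing days and days whose valid-data fraction falls below a threshold; it assumes daily maps are stored as `{day_of_month: flat list of ocean-cell values}` with `None` for no data and land already masked out, and the 50% threshold is a hypothetical choice. Registration errors are not detected by this check.

```python
def check_month(daily_maps, days_in_month, min_valid_fraction=0.5):
    """Flag missing days and days with large data gaps in one month of maps.

    daily_maps: {day_of_month: flat list of ocean-cell values, None = no data}.
    Land cells are assumed to have been removed already.
    Returns (missing_days, gappy_days).
    """
    missing_days = [d for d in range(1, days_in_month + 1) if d not in daily_maps]
    gappy_days = []
    for day in sorted(daily_maps):
        values = daily_maps[day]
        n_valid = sum(1 for v in values if v is not None)
        # An empty map, or one with too few valid cells, is flagged for inspection.
        if not values or n_valid / len(values) < min_valid_fraction:
            gappy_days.append(day)
    return missing_days, gappy_days
```

Only the flagged days then need to be inspected visually, instead of every map in the month.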
Some issues where we believe GIS tools could contribute to the development of high level geophysical products such as those described here are:
At each step in the product development, choices are made with regard to data quality: which data to include, how to do smoothing, corrections for known systematic errors, etc. Currently, our final product contains a single frozen collection of these decisions. An area where GIS-type tools could be very helpful, and make a unique contribution, is the management and storage of all these data quality decisions, so that the consequences of altering combinations of them can be assessed. With regard to the issues discussed, some assessments could be:
Each of these assessments could also be tailored to specific spatial considerations (where the W -> Qa relationship is weak, coastal vs. open ocean, tropical vs. Southern Ocean) or temporal ones (El Niño vs. non-El Niño years).
In this way, the reporting of our scientific results could be enhanced by showing appropriate error ranges as a function of, for instance, the spatial averaging process or the data selection.
The idea would be to track how all the data quality decisions propagate, as a process parallel to the geophysical product development, and to produce a comprehensive data quality description for the final product. This description would be the basis for answering product assessments of the type listed above.
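The core of such decision management can be sketched as running the processing chain once for every combination of data-quality options and keeping each result together with the decisions that produced it. In this Python sketch, `pipeline` is any function taking the decisions as keyword arguments, and the decision names are hypothetical; a real system would also have to store intermediate products, which is where GIS-type data management would come in.

```python
import itertools


def run_all(pipeline, choices):
    """Run pipeline(**decision) for every combination of data-quality options.

    choices: {decision_name: [option, ...]}. Each result is returned together
    with the decisions that produced it, so results keep their provenance.
    """
    names = sorted(choices)
    runs = []
    for combo in itertools.product(*(choices[name] for name in names)):
        decision = dict(zip(names, combo))
        runs.append((decision, pipeline(**decision)))
    return runs


def result_range(runs):
    """Spread (min, max) of a scalar result across all decision combinations."""
    values = [result for _, result in runs]
    return min(values), max(values)
```

For example, with choices such as `{"blend_coads": [True, False], "smoothing_passes": [0, 1]}`, the spread returned by `result_range` is a simple error range attributable to those decisions. Restricting the runs to a region or period gives the spatially or temporally specific assessments mentioned above.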
This study was made possible by a grant from the Jet Propulsion Laboratory (JPL 959177). We also thank Bill O'Hirok for his useful suggestions and James Marquez for his help with the figures and manuscript preparation.
Esbensen, S.K., D.B. Chelton, D. Vickers and J. Sun, 1993: An analysis of errors in Special Sensor Microwave Imager evaporation estimates over the global oceans. J. Geophys. Res., 98, 7081-7101.
Gautier, C., G. Diak and S. Masse, 1980: A simple physical model to estimate incident solar radiation at the surface from GOES satellite data. J. Appl. Meteor., 19(8), 1005-1012.
Jourdan, D., and C. Gautier, 1995: Comparison between latent heat flux computed from multisensor (SSM/I and AVHRR) and from in-situ data. J. Atmos. Ocean. Technol., 12, 46-72.
Liu, W.T., K.B. Katsaros and J.A. Businger, 1979: Bulk parameterization of air-sea exchanges of heat and water vapor including the molecular constraints at the surface. J. Atmos. Sci., 36, 1722-1735.
Liu, W.T., 1986: Statistical relation between monthly mean precipitable water and surface-level humidity over global oceans. Mon. Wea. Rev., 114(8), 1591-1602.
Reynolds, R.W., 1988: A real-time global sea surface temperature analysis. J. Climate, 1, 75-86.
Rossow, W.B., L.C. Garder, P.J. Lu and A.W. Walker, 1991: International Satellite Cloud Climatology Project (ISCCP) documentation of cloud data. WMO/TD-No. 266, World Meteorological Organization, Geneva.
Smith, E.A., 1992: User's guide to the NOAA Advanced Very High Resolution Radiometer MCSST dataset. JPL D-10737, 14 pp.
Wentz, F.J., 1983: A model function for ocean microwave brightness temperature. J. Geophys. Res., 88 (C3), 1892-1908.
Wentz, F.J., 1992: User's Manual SSM/I Monthly Ocean Tape. Remote Sensing Systems Tech. Rep.