Model Performance

Evaluating the performance of complex nonlinear models, often given limited observational data that may not exactly match the variables predicted by the model, remains a fundamental problem. As noted earlier, methods for model evaluation and uncertainty analysis are well established in statistics, where formal assumptions can be made about the nature of the errors. In principle, these methods can also be applied to evaluate model structural error (sometimes called model inadequacy) where that error can also be represented by a statistical model (e.g. Wynn et al., 2001a; Wynn et al., 2001b). Bayesian statistical methods also allow prior information about model structures and parameters to be incorporated into the process. Where the necessary assumptions can be justified, these methods allow strong inference about the probabilities of predicting observations conditional on the model. These types of methods have been increasingly used in rainfall-runoff modelling (e.g. Beven and Freer, 2001; Freer et al., 1996; Kavetski et al., 2003a; Romanowicz et al., 1996; Romanowicz et al., 1994; Thiemann et al., 2001; Vrugt et al., 2005; Vrugt et al., 2003).

When considered in the context of linear statistical inference, assumptions about the structure of the errors allow the interpretation of modelling uncertainty as a probability of predicting an observation conditional on the model. Thus, for example, in linear statistics the use of the standard least squares objective function in estimating prediction confidence limits implies an assumption of uncorrelated Gaussian noise. If this assumption is not correct (as is normally the case in rainfall-runoff modelling, for example, where model residuals are often heteroscedastic and invariably correlated in time; or in hydraulics, where errors may exhibit bias and be correlated in both time and space), statistical theory shows that for a linear system the resulting parameter estimates will be biased. By analogy, we should expect a similar bias with a nonlinear model (e.g. Kavetski et al., 2003a). Different (more complete) assumptions can be made about the residuals to formulate more complex statistical likelihood functions (e.g. Gupta et al., 1999; Liu et al., 2005; Moradkhani et al., 2005; Romanowicz et al., 1996), and, in principle, the validity of the error assumptions can be checked. However, with the multiple interacting sources of uncertainty common to environmental systems, it seems that it will be difficult to formulate a consistent statistical representation of the modelling error (Kavetski et al., 2003b).
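One simple check on the independence assumption is the lag-1 autocorrelation of the model residuals. The following is a minimal sketch in plain Python, not a method prescribed by the works cited above; the function name and the sample flow values are hypothetical and chosen for illustration only.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation coefficient of a residual series.

    Values near zero are consistent with uncorrelated errors; values
    well away from zero suggest the independence assumption behind a
    least squares objective function may not hold.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    deviations = [r - mean for r in residuals]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("residuals are constant; autocorrelation undefined")
    numerator = sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    return numerator / denominator


# Hypothetical observed and predicted flows (illustrative values only).
observed = [1.0, 2.0, 4.0, 7.0, 5.0, 3.0, 2.0, 1.5]
predicted = [1.2, 2.5, 4.8, 6.0, 4.2, 2.6, 1.9, 1.6]
residuals = [o - p for o, p in zip(observed, predicted)]
print("lag-1 autocorrelation of residuals:", lag1_autocorrelation(residuals))
```

A single coefficient is only a first diagnostic; heteroscedasticity and spatial correlation would need separate checks.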

There is a very real question of whether, for the type of nonlinear modelling commonly applied in flood hydrology and hydraulics, there is adequate justification for making the strong statistical error assumptions required by these models (see the discussion of Beven and Young, 2003). This is particularly true for the evaluation of distributed model predictions where observational data are available for model evaluation. This has led to the development of alternative formal frameworks for model evaluation and rejection (Beven, in press; Pappenberger and Beven, 2004). Because these methodologies are not based on formal statistical assumptions, they will not provide estimates of the probability of predicting a measured value conditioned on the model. They can, however, provide an estimate of the range of predictions consistent with the observational data available for conditioning of the model (see, for example, the GLUE methodology below, which can use informal or fuzzy performance measures to condition belief in a model).
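The idea of a range of predictions consistent with the observations can be sketched as follows. This is a deliberately simplified illustration and not the GLUE procedure itself: it uses the mean absolute error as an informal performance measure, rejects simulations above a tolerance, and reports the unweighted envelope of those retained. The function names, the tolerance and the sample values are all assumptions made for the example.

```python
def mean_absolute_error(observed, simulated):
    """Informal performance measure: mean absolute difference."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def retained_envelope(observed, simulations, tolerance):
    """Envelope of the simulations retained as consistent with the data.

    A simulation is retained if its mean absolute error does not exceed
    `tolerance` (a hypothetical, user-chosen rejection limit). Returns
    (lower, upper) lists per time step, or None if every simulation is
    rejected.
    """
    kept = [s for s in simulations
            if mean_absolute_error(observed, s) <= tolerance]
    if not kept:
        return None
    lower = [min(values) for values in zip(*kept)]
    upper = [max(values) for values in zip(*kept)]
    return lower, upper


# Hypothetical observations and an ensemble of three model runs.
observed = [1.0, 2.0, 3.0]
simulations = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 5.0, 5.0],  # poor run, expected to be rejected
]
print(retained_envelope(observed, simulations, tolerance=0.5))
```

The resulting bounds describe which predictions survive the chosen rejection criterion; they are not probability statements about a future observation.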

There has been a tradition in rainfall-runoff modelling, for example, of using the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970) to evaluate predicted flow hydrographs. The efficiency is based on the sum of squared errors between observed and predicted values and therefore, in statistical inference, could result in biased parameter estimates where the errors do not have an independent Gaussian structure. There are many other informal evaluation measures in the literature (Blazkova and Beven, 2002; Pappenberger and Beven, 2004; Romanowicz and Beven, 2003; Schulz et al., 1999).
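For reference, the Nash-Sutcliffe efficiency is commonly written as one minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean. A minimal sketch in plain Python follows; the function name and the sample hydrograph values are illustrative assumptions.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency.

    Computed as 1 - SSE / sum((obs - mean(obs))**2). A value of 1 is a
    perfect fit; 0 means the model is no better than predicting the
    mean of the observations; negative values are worse than the mean.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0:
        raise ValueError("observations are constant; efficiency undefined")
    return 1.0 - sse / variance_sum


# Hypothetical observed and predicted flows (illustrative values only).
observed = [1.0, 2.0, 4.0, 7.0, 5.0, 3.0]
predicted = [1.2, 2.5, 4.8, 6.0, 4.2, 2.6]
print("Nash-Sutcliffe efficiency:", nash_sutcliffe(observed, predicted))
```

Because the errors are squared, large errors (typically at high flows) dominate the score.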

It appears that no particular objective function is superior to others under all circumstances and that no unambiguous way of evaluating a model with complex error structures in space and time may exist (Freer et al., 1996; Gupta et al., 2003; Gupta and Sorooshian, 1985; Sorooshian et al., 1983; van Straten and Keesman, 1991; Yan and Haan, 1991c; Yapo et al., 1996). It has been argued that in a decision-theoretic setting there would be an objective solution that minimises loss or risk. However, if there is uncertainty in both predicted outcomes and possible consequences, and different ways to compute costs or benefits, a decision maker may want to allow for such uncertainties in taking a more or less precautionary position to risk.

In summary, model evaluation criteria should be chosen to be fit for purpose. For example, if flood peaks are of interest, then a performance measure that gives greater weight to the simulation errors for the flood peaks should be used. One rational method of choosing a measure of model performance is derived from the risk-based decisions that the model is intended to inform: in the long run, does the model improve decision-making (in the sense of long-run net gain or loss of utility) compared with competing models?
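As an illustration of a measure that gives greater weight to flood peaks, the sketch below weights each squared error by the observed flow raised to a power. This weighting scheme is only one hypothetical choice among many, assumed here for the example; the function name, the `power` parameter and the sample values do not come from the text above.

```python
def peak_weighted_mse(observed, predicted, power=1.0):
    """Mean squared error with weights proportional to observed**power.

    power = 0 gives the ordinary mean squared error; larger values put
    progressively more weight on errors at high observed flows.
    Assumes non-negative observed values (e.g. river discharge).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    if any(o < 0 for o in observed):
        raise ValueError("observed values must be non-negative")
    weights = [o ** power for o in observed]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero")
    weighted_sse = sum(w * (o - p) ** 2
                       for w, o, p in zip(weights, observed, predicted))
    return weighted_sse / total_weight


# Two hypothetical simulations with the same unit error, placed either
# at low flow or at the peak.
observed = [1.0, 10.0]
error_at_low_flow = [2.0, 10.0]
error_at_peak = [1.0, 11.0]
print(peak_weighted_mse(observed, error_at_low_flow))
print(peak_weighted_mse(observed, error_at_peak))
```

With `power=0` the two simulations score identically; with `power=1` the simulation that misses the peak is penalised more heavily.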

An additional way to counter this problem is by using multi-objective or multi-criteria model calibration (Boyle et al., 2000; Franchini and Galeati, 1997; Freer et al., 2003; Gupta et al., 2003; Refsgaard, 1997; Yapo et al., 1996). In this work a distinction is made between multi-criteria and multi-objective (sometimes termed multi-signal). Multi-criteria are used when different evaluation criteria are used on the same set of data (Emsellem and de Marsily, 1971; Hogue et al., 2000; Neuman, 1973; Yan and Haan, 1991a; Yan and Haan, 1991b; Yan and Haan, 1991c) whereas multi-objective are applied when different data sets are used for model evaluation (Ambroise et al., 1995; de Grosbois et al., 1988; Hooper et al., 1988; Kuczera, 1983a; Kuczera, 1983b; Mroczkowski et al., 1997). It has been recognised that global indices on their own are not distinctive enough to discriminate between several models.

In other words, equal performance may result from fundamentally different model responses (Naef, 1981). An analogy is the classification of animals: if one looks at only one property, for example colour, a grey elephant would be the same as a grey hippo. However, if additional measures are included it might be possible to distinguish the two. Therefore, it is necessary to apply various objective functions which reflect different features of the evaluation data to test the model hypothesis (Beven, 2001). For example, it is possible to combine the Nash-Sutcliffe criterion (Nash and Sutcliffe, 1970), which is more sensitive to errors in simulating the peak values of variables, with a volumetric measure and thus extract more information on model performance out of the same data series. Another recent example is the subdivision of a hydrograph into three local criteria, which match the rising limb, the early recession and the late recession of a flow hydrograph (Boyle et al., 2003; Boyle et al., 2000). Other methods include the analysis of seasonal responses (Freer et al., 2003; Legates and McCabe, 1999). The seasonal response is closely linked to the second way forward to tackle the problem of computing ‘correct’ responses based on the ‘correct’ reasons. The analysis of several model results (multi-objective) can give valuable insight into model behaviour. For example, an inundation model should not only be able to predict the outflow hydrograph, but also water levels within the reach.
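Combining a squared-error criterion with a volumetric measure on the same data series can be sketched as below. The relative volume error used here, (sum of predictions minus sum of observations) divided by the sum of observations, is one common form of volumetric measure and is an assumption of this example, as are the function names and sample values.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum((obs - mean(obs))**2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in observed)


def relative_volume_error(observed, predicted):
    """(sum(pred) - sum(obs)) / sum(obs); zero means volume is conserved."""
    total_obs = sum(observed)
    return (sum(predicted) - total_obs) / total_obs


def evaluate(observed, predicted):
    """Return both criteria for the same data series."""
    return {
        "nse": nash_sutcliffe(observed, predicted),
        "volume_error": relative_volume_error(observed, predicted),
    }


# Hypothetical observed flows and two simulations (illustrative values).
observed = [1.0, 2.0, 3.0]
flat_simulation = [2.0, 2.0, 2.0]      # right volume, no dynamics
doubled_simulation = [2.0, 4.0, 6.0]   # right shape, wrong volume
print(evaluate(observed, flat_simulation))
print(evaluate(observed, doubled_simulation))
```

The flat simulation conserves volume but has no skill in the squared-error sense, whereas the doubled simulation reproduces the shape and fails on volume; either criterion alone would miss one of these deficiencies.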

However, mixed results have been achieved with the multi-objective framework. In some studies the uncertainty range in parameter estimates and responses could not be reduced (Blazkova et al., 2002; Kuczera and Mroczkowski, 1998), whereas in others significant improvements could be achieved. This discrepancy can be illustrated with the previous animal analogy: an evaluation criterion based on the number of legs would not improve the classification, whereas a criterion based on the length of the trunk might make a classification possible. This shows that a multi-criteria approach does not necessarily increase the information content for model evaluation.

Fundamentally, there is an unresolved problem of linking field measurements to catchment responses (Dooge et al., 1982). For example, many variables predicted by models are not the same quantity as their measured ‘equivalents’ despite being given the same name. Such discrepancies can be due to heterogeneity, scale effects, nonlinearities or measurement techniques. A soil moisture variable, for example, might be predicted as an average over a model grid element several metres in spatial extent and over a certain time step; the same variable might be measured at a point in space and time by a small gravimetric sample, or by time domain reflectometry integrating over a few tens of centimetres, or by a cross-borehole radar or resistivity technique integrating over several metres. Only the last of these might be considered to approach the same variable as predicted by the model, but it may itself be subject to a model inversion that involves additional parameters in deriving an estimate of soil moisture (Beven, 2005).

References

Ambroise, B., Perrin, J.L. and Reutenauer, D., 1995. Multicriterion Validation of a Semidistributed Conceptual-Model of the Water Cycle in the Fecht Catchment (Vosges Massif, France). Water Resources Research, 31(6): 1467-1481.

Beven, K.J., 2001. How far can we go in distributed hydrological modelling? Hydrology and Earth System Sciences, 5(1): 1-12.

Beven, K.J., in press. A Manifesto for the equifinality thesis. Journal of Hydrology, --(--): --.

Beven, K.J. and Freer, J., 2001. Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. Journal of Hydrology, 249(1-4): 11-29.

Beven, K.J. and Young, P., 2003. Comment on "Bayesian recursive parameter estimation for hydrologic models" by M. Thiemann, M. Trosset, H. Gupta, and S. Sorooshian. Water Resources Research, 39(5): art. no.-1116.

Blazkova, S. and Beven, K.J., 2002. Flood frequency estimation by continuous simulation for a catchment treated as ungauged (with uncertainty). Water Resources Research, 38(8): art. no.-1139.

Blazkova, S., Beven, K.J. and Kulasova, A., 2002. On constraining TOPMODEL hydrograph simulations using partial saturated area information. Hydrological Processes, 16(2): 441-458.

Boyle, D.P., Gupta, H. and Sorooshian, S., 2003. Multicriteria calibration of hydrologic models. In: Q. Duan, H. Gupta, S. Sorooshian, A.N. Rousseau and R. Turcotte (Editors), Advances in Calibration of Watershed Models. American Geophysical Union, Washington.

Boyle, D.P., Gupta, H.V. and Sorooshian, S., 2000. Toward improved calibration of hydrologic models: Combining the strengths of manual and automatic methods. Water Resources Research, 36(12): 3663-3674.

Chiew, F. and McMahon, T.A., 1994. Application of a daily rainfall-runoff model MODHYDROLOG to 28 Australian catchments. Journal of Hydrology, 153: 383-416.

de Grosbois, E., Hooper, R.P. and Christophersen, N., 1988. A Multisignal Automatic Calibration Methodology for Hydrochemical Models - a Case-Study of the Birkenes Model. Water Resources Research, 24(8): 1299-1307.

Dooge, J.C.I., Strupczewski, W.G. and Napiorkowski, J.J., 1982. Hydrodynamic derivation of storage parameters of the Muskingum model. Journal of Hydrology, 54(4): 371-387.

Emsellem, Y. and de Marsily, G., 1971. An automatic solution for the inverse problem. Water Resources Research, 7: 1264-1283.

Franchini, M. and Galeati, G., 1997. Comparing several genetic algorithm schemes for the calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal-Journal Des Sciences Hydrologiques, 42(3): 357-379.

Freer, J., Beven, K.J. and Ambroise, B., 1996. Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach. Water Resources Research, 32(7): 2161-2173.

Freer, J., Beven, K.J. and Peters, N., 2003. Multivariate seasonal period model rejection within the generalised likelihood uncertainty estimation procedure. In: Q.Y. Duan, H. Gupta, S. Sorooshian, A. Rousseau and R. Turcotte (Editors), Calibration of watershed models. American Geophysical Union, Washington, pp. 69-88.

Gupta, H., Sorooshian, S., Hogue, T.S. and Boyle, D.P., 2003. Advances in automatic calibration of watershed models. In: Q.Y. Duan, H. Gupta, S. Sorooshian, A. Rousseau and R. Turcotte (Editors), Calibration of watershed models. American Geophysical Union, Washington.

Gupta, H.V., Bastidas, L.A., Sorooshian, S., Shuttleworth, W.J. and Yang, Z.L., 1999. Parameter estimation of a land surface scheme using multicriteria methods. Journal of Geophysical Research-Atmospheres, 104(D16): 19491-19503.

Gupta, V.K. and Sorooshian, S., 1985. The Relationship between Data and the Precision of Parameter Estimates of Hydrologic-Models. Journal of Hydrology, 81(1-2): 57-77.

Hogue, T.S., Sorooshian, S., Gupta, H., Holz, A. and Braatz, D., 2000. A multistep automatic calibration scheme for river forecasting models. Journal of Hydrometeorology, 1(6): 524-542.

Hooper, R.P., Stone, A., Christophersen, N., Degrosbois, E. and Seip, H.M., 1988. Assessing the Birkenes Model of Stream Acidification Using a Multisignal Calibration Methodology. Water Resources Research, 24(8): 1308-1316.

Kavetski, D., Kuczera, G. and Franks, S.W., 2003a. Semidistributed hydrological modeling: A "saturation path" perspective on TOPMODEL and VIC. Water Resources Research, 39(9): art. no.-1246.

Kavetski, D.N., Franks, S.W. and Kuczera, G., 2003b. Confronting input uncertainty in environmental modelling. In: Q. Duan, H. Gupta, S. Sorooshian, A.N. Rousseau and R. Turcotte (Editors), Advances in Calibration of Watershed Models. American Geophysical Union, Washington, pp. 49-68.

Kuczera, G., 1983a. Improved Parameter Inference in Catchment Models.1. Evaluating Parameter Uncertainty. Water Resources Research, 19(5): 1151-1162.

Kuczera, G., 1983b. Improved Parameter Inference in Catchment Models.2. Combining Different Kinds of Hydrologic Data and Testing Their Compatibility. Water Resources Research, 19(5): 1163-1172.

Kuczera, G. and Mroczkowski, M., 1998. Assessment of hydrologic parameter uncertainty and the worth of multiresponse data. Water Resources Research, 34(6): 1481-1489.

Legates, D.R. and McCabe, G.J., 1999. Evaluating the use of "goodness-of-fit" measures in hydrologic and hydroclimatic model validation. Water Resources Research, 35(1): 233-241.

Liu, Y.Q., Gupta, H.V., Sorooshian, S., Bastidas, L.A. and Shuttleworth, W.J., 2005. Constraining land surface and atmospheric parameters of a locally coupled model using observational data. Journal of Hydrometeorology, 6(2): 156-172.

Moradkhani, H., Sorooshian, S., Gupta, H.V. and Houser, P.R., 2005. Dual state-parameter estimation of hydrological models using ensemble Kalman filter. Advances in Water Resources, 28(2): 135-147.

Mroczkowski, M., Raper, G.P. and Kuczera, G., 1997. The quest for more powerful validation of conceptual catchment models. Water Resources Research, 33(10): 2325-2335.

Naef, F., 1981. Can We Model the Rainfall-Runoff Process Today. Hydrological Sciences Bulletin-Bulletin Des Sciences Hydrologiques, 26(3): 281-289.

Nash, J.E. and Sutcliffe, J.V., 1970. River flow forecasting through conceptual models, Part I - A discussion of principles. Journal of Hydrology, 10: 282-290.

Neuman, S.P., 1973. Calibration of distributed parameter groundwater flow models viewed as a multiple-objective decision process under uncertainty. Water Resources Research, 9: 1006-1021.

Pappenberger, F. and Beven, K., 2004. Functional Classification and Evaluation of Hydrographs based on Multicomponent Mapping. International Journal of River Basin Management, 2(2).

Refsgaard, J.C., 1997. Parameterisation, calibration and validation of distributed hydrological models. Journal of Hydrology, 198(1-4): 69-97.

Romanowicz, R. and Beven, K.J., 2003. Estimation of flood inundation probabilities as conditioned on event inundation maps. Water Resources Research, 39(3): art. no.-1073.

Romanowicz, R., Beven, K.J. and Tawn, J., 1996. Bayesian calibration of flood inundation models. In: M.G. Anderson, D.E. Walling and P.D. Bates (Editors), Floodplain Processes. John Wiley & Sons, New York, pp. 333-360.

Romanowicz, R., Beven, K.J. and Tawn, J.A., 1994. Evaluation of predictive uncertainty in nonlinear hydraulic models using a Bayesian Approach. In: V. Barnett and K.F. Turkman (Editors), Statistics for the Environment 2, Water Related Issues. Wiley & Sons, New York, pp. 297-317.

Schulz, K., Beven, K.J. and Huwe, B., 1999. Equifinality and the problem of robust calibration in nitrogen budget simulations. Soil Science Society of America Journal, 63(6): 1934-1941.

Sorooshian, S., Gupta, V.K. and Fulton, J.L., 1983. Evaluation of Maximum-Likelihood Parameter-Estimation Techniques for Conceptual Rainfall-Runoff Models - Influence of Calibration Data Variability and Length on Model Credibility. Water Resources Research, 19(1): 251-259.

Thiemann, M., Trosset, M., Gupta, H. and Sorooshian, S., 2001. Bayesian recursive parameter estimation for hydrologic models. Water Resources Research, 37(10): 2521-2535.

van Straten, G. and Keesman, J., 1991. Uncertainty propagation and speculation in projective forecasts of environmental change: a lake eutrophication example. Journal of Forecasting, 10: 163-190.

Vrugt, J.A., Diks, C.G.H., Gupta, H.V., Bouten, W. and Verstraten, J.M., 2005. Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resources Research, 41(1): art. no.-W01017.

Vrugt, J.A., Gupta, H.V., Bouten, W. and Sorooshian, S., 2003. A Shuffled Complex Evolution Metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters. Water Resources Research, 39(8): art. no.-1201.

Willmott, C., 1981. On the validation of models. Physical Geography, 2: 184-194.

Willmott, C., 1982. Some comments on the evaluation of model performance. Bulletin of the American Meteorological Society: 1309-1313.

Wynn, H.P. et al., 2001a. Bayesian calibration of computer models - Discussion. Journal of the Royal Statistical Society Series B-Statistical Methodology, 63: 450-464.

Wynn, H.P. et al., 2001b. Bayesian calibration of computer models - Discussion. Journal of the Royal Statistical Society Series B-Statistical Methodology, 63: 450-464.

Yan, J. and Haan, C.T., 1991a. Multiobjective Parameter-Estimation for Hydrologic-Models - Multiobjective Programming. Transactions of the ASAE, 34(3): 848-856.

Yan, J. and Haan, C.T., 1991b. Multiobjective Parameter-Estimation for Hydrologic-Models - Multiobjective Programming (Correction of Transactions of the ASAE, Vol 34, No 3, Pg 848, 1991). Transactions of the ASAE, 34(4): 848-856.

Yan, J. and Haan, C.T., 1991c. Multiobjective Parameter-Estimation for Hydrologic-Models - Weighting of Errors. Transactions of the ASAE, 34(1): 135-141.

Yapo, P.O., Gupta, H.V. and Sorooshian, S., 1996. Automatic calibration of conceptual rainfall-runoff models: Sensitivity to calibration data. Journal of Hydrology, 181(1-4): 23-48.
