Fitting Flood Frequency Distributions (case study)
FRMRC
Development of decision tree funded by RPA9 of the
== Introduction ==

''Linear'' regression is a familiar concept, described in its simplest, univariate form by the ‘straight-line’ equation ''y'' = m''x'' + c. The linear relationship between the independent variable ''x'' and the dependent variable ''y'' is found by determining the (in some sense) optimum values of the two constants m and c from a set of paired ''x,y'' data. (For anyone unfamiliar with this scenario, the Science Education Resource Center (SERC) provides [http://serc.carleton.edu/introgeo/teachingwdata/StatRegression.html a good tutorial].)

However, many real-world relationships between the independent variable(s) and the dependent variable are not linear. In order to apply the methods of regression to estimate model parameters (and their associated uncertainty), either:

 1. the relationship must first be transformed so that a linear model results (e.g. by a log transformation); or
 2. a nonlinear optimisation algorithm must be used (usually implemented within a statistical software package).

Wikipedia provides [http://en.wikipedia.org/wiki/Nonlinear_regression a concise description of the nonlinear regression problem] and links to common nonlinear regression algorithms. The University of Illinois provides a useful [http://www.cse.uiuc.edu/eot/modules/optimization/index.html tool for the visualisation of several optimization methods commonly used for nonlinear regression].

The purpose of regression (both linear and nonlinear), and of the various forms of optimization used to facilitate it, is to fit a model to data in a sense that is in some way optimum.
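As a minimal sketch of the first option (linearizing by a log transformation), the Python below fits a power law ''y'' = a''x''^b^ by ordinary least squares on the logs of the data. The function name `fit_power_law_loglog`, the symbols a and b, and the noise-free data are assumptions made for this illustration only; they are not taken from any of the linked sources.

```python
import math


def fit_power_law_loglog(x, y):
    """Fit y = a * x**b by a straight-line least-squares fit to (ln x, ln y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope and intercept of the least-squares line ln y = ln a + b * ln x
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical, noise-free data generated from y = 2 * x**0.8
x_data = [1.0, 10.0, 100.0, 1000.0]
y_data = [2.0 * v ** 0.8 for v in x_data]

a_fit, b_fit = fit_power_law_loglog(x_data, y_data)
print(a_fit, b_fit)
```

With noise-free data the fit recovers the generating constants to within rounding error. With noisy data the result depends on how the error enters the model, which is the issue examined in the case study.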
The theory of model equifinality (see for example Beven (2006) and [Generalized Likelihood Uncertainty Estimation (GLUE)]) problematizes the idea that complex models, such as those describing hydrological systems, should be defined by one ‘optimal’ parameter set. However, methods for parameter estimation, which usually require some form of nonlinear regression, can form a useful starting point for building an understanding of the process and for generating pragmatic numerical representations and associated quantile values (here a quantile value is a statistic such as the 100 year return period flow).

----

== Case study: details ==

This case study presents the work of [http://dx.doi.org/10.1016/S0022-1694(99)00135-3 '''G. R. Pandey''' and '''V. T. V. Nguyen''' (1999) “''A comparative study of regression based methods in regional flood frequency analysis''” (doi:10.1016/S0022-1694(99)00135-3)]. This paper is referred to here as P&N.

----

The study compares several methods of regression (including three forms of nonlinear regression) on annual peak discharge data for 71 catchment basins in Quebec, Canada. The basins range in area from 3.9 to 86900 km^2^. The length of the discharge data sets ranges from 20 to 62 years. At each site, a [http://water.oregonstate.edu/streamflow/analysis/floodfreq/index.htm log-Pearson type III (LP3) probability distribution] was used to estimate the 10 and 100 year flood quantiles (i.e. the estimated discharges with return periods of 10 and 100 years). In this way, P&N generated a database of 71 catchments with information about catchment area and estimated 10 and 100 year flow quantiles. The objective was then to form a model that answers the question:

 * “given a catchment of area ''x'', what will be the value of the 10 and 100 year flow quantiles?”

P&N assumed two potential model forms:

 1. ''Q,,T,,'' = ''alpha,,0,,'' × ''A1^alpha1^'' × ''A2^alpha2^'' × ... × ''An^alphan^'' × ''error''
 2. ''Q,,T,,'' = ''alpha,,0,,'' × ''A1^alpha1^'' × ''A2^alpha2^'' × ... × ''An^alphan^'' + ''error''

Here ''Q,,T,,'' is the ''T''-year flood quantile, ''A1'' ... ''An'' are catchment characteristics (such as catchment area) and ''alpha,,0,,'' ... ''alphan'' are the parameters to be estimated. These are power-form function models (see [http://choctaw.er.usgs.gov/new_web/reports/other_reports/flood_frequency/streamflowcharac_1975.html Thomas and Benson, 1970]). The two models differ in their treatment of the error term:

 * For (1) the error term is multiplicative. This permitted P&N to linearize the model using a log transform, after which a range of standard linear regression methods were employed for parameter estimation.
 * For (2) the error term is additive, which results in a model that cannot be linearized. For this model, P&N applied a number of nonlinear regression methods for parameter estimation.

----

'''Nonlinear regression methods'''

P&N used [http://comjnl.oxfordjournals.org/cgi/content/abstract/3/3/175 Rosenbrock’s hill climbing algorithm] to minimize the chosen objective function. An in-depth description and comparison of practically every form of direct search optimization algorithm, including Rosenbrock’s, can be found in [http://www.cs.wm.edu/~va/research/sirev.pdf Kolda et al. (2003)].

This leads to a key component of P&N’s study, namely the definition of the objective function. The objective function (often called a cost function) is the value minimized by the chosen optimization algorithm, and it can be defined in many ways. P&N chose three forms of objective function:

 1. Ordinary Least Squares (labelled N_RMS), where the objective function is the sum of the squared errors between the observed and modelled quantiles.
 2. Relative Error (labelled N_RERR), where each squared error is divided by the square of the observed quantile before summation (designed to remove the bias towards large values in the data).
 3. Least Absolute Value (labelled N_LAV), where the objective function is the sum of the absolute values of the errors.
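The three objective functions can be sketched in Python for a single-predictor version of model (2), ''Q,,T,,'' = ''alpha,,0,,'' × ''A^alpha1^''. This is an illustration under stated assumptions, not P&N’s implementation. The function names, the starting point and the noise-free data are hypothetical, and the minimizer is a simple compass (coordinate) search standing in for Rosenbrock’s algorithm, which is not reproduced here.

```python
def predict(params, area):
    """Single-predictor power-form model: Q = alpha0 * A**alpha1."""
    alpha0, alpha1 = params
    return alpha0 * area ** alpha1


def n_rms(params, areas, q_obs):
    """Ordinary least squares: sum of squared errors."""
    return sum((q - predict(params, a)) ** 2 for a, q in zip(areas, q_obs))


def n_rerr(params, areas, q_obs):
    """Relative error: each squared error divided by the squared observation."""
    return sum(((q - predict(params, a)) / q) ** 2 for a, q in zip(areas, q_obs))


def n_lav(params, areas, q_obs):
    """Least absolute value: sum of absolute errors."""
    return sum(abs(q - predict(params, a)) for a, q in zip(areas, q_obs))


def compass_search(objective, start, step=0.5, tol=1e-8, max_iter=20000):
    """Basic direct search (NOT Rosenbrock's method): try +/- step on each
    parameter in turn, keep any improvement, halve the step when none helps."""
    x = list(start)
    best = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    x, best = trial, value
                    improved = True
        if not improved:
            step /= 2.0
    return x, best


# Hypothetical, noise-free "observed" quantiles generated from known parameters
areas = [10.0, 100.0, 1000.0, 10000.0]
true_params = (2.0, 0.75)
q_obs = [predict(true_params, a) for a in areas]

start = (1.0, 1.0)
fitted, final_value = compass_search(lambda p: n_rms(p, areas, q_obs), start)
print(fitted, final_value)
```

Passing `n_rerr` or `n_lav` in place of `n_rms` changes only the quantity being minimized. The search uses function values alone, so it also works with the non-smooth absolute-value objective.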
Having specified objective functions (1) through (3) above for use with nonlinear regression, together with a number of linear regression methods (not shown here), P&N go on to compare the performance of the models identified using each regression approach. The comparison was performed using the following jackknife procedure ([http://pareonline.net/getvn.asp?v=8&n=19 Yu, Chong Ho (2003)] describes resampling methods including the jackknife):

 1. A catchment was selected from the database and set aside.
 2. The remaining catchments were used to estimate the model parameters for the 10 and 100 year flow quantiles using each of the chosen regression methods; the fitted models were then used to predict the quantiles of the excluded catchment.
 3. The above steps were repeated until each catchment had been excluded at least once.

----

== Results ==

P&N define three model performance criteria:

 1. Mean Average Deviation (MAD), which provides an indication of prediction bias;
 2. Root Mean Squared Error (RMSE), which is effectively the standard deviation of the model error (the square root of its variance); and
 3. Root Relative Error (RERR).

Taken over all catchment areas, all three nonlinear regression model types performed better than linearizing the multiplicative model with a log transform and then applying standard linear regression methods. For example, the Generalised Least Squares (GLS) linear regression method produced MAD = 1051, RMSE = 1890 and RERR = 12.6, whereas the nonlinear ordinary least squares method produced MAD = 541, RMSE = 902 and RERR = 0.7.

----

== Comment ==

P&N draw a number of conclusions from the study. A key concern relates to the drawbacks of a log transformation of the data: following linearization, the model operates on the relationship between the log of the catchment area and the log of the flow quantiles. This artificial situation introduces a number of problems; for example, the regression process overemphasises errors generated from very low flows and underemphasises errors from high flows.
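The weighting effect of the log transform can be illustrated numerically. In the sketch below, two hypothetical catchments are each underpredicted by 20%. Their squared residuals are equal in log space but differ by a factor of a million in the real flow domain. The flow values and function names are assumptions chosen for illustration only.

```python
import math


def squared_log_residual(q_obs, q_mod):
    """Squared error as seen by a regression on log-transformed flows."""
    return (math.log(q_obs) - math.log(q_mod)) ** 2


def squared_residual(q_obs, q_mod):
    """Squared error in the real flow domain."""
    return (q_obs - q_mod) ** 2


# Hypothetical (observed, modelled) flows: both modelled values are 20% too low
small = (10.0, 8.0)
large = (10000.0, 8000.0)

log_small = squared_log_residual(*small)
log_large = squared_log_residual(*large)
raw_small = squared_residual(*small)
raw_large = squared_residual(*large)

print(log_small, log_large)  # equal weight in log space
print(raw_small, raw_large)  # very different weight in real space
```

A log-space fit therefore treats a 2-unit miss on the small flow as being as serious as a 2000-unit miss on the large flow, which matches the imbalance described above.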
Goodness-of-fit indicators applied to the log of the data are also less meaningful when transformed back to the real flow domain. Taken together, these problems favour the use of nonlinear regression methods. Other comments include:

 * The models that require a log transform tend to underpredict the flow quantiles, especially for large catchments.
 * The values of the numerical performance indices demonstrate the superiority of the nonlinear regression methods over the log-transformed models.
 * The choice of objective function that allowed the nonlinear models to perform ‘best’ depended on which performance indicator was used (e.g. the least absolute value objective function produces the best results for the MAD performance indicator).

P&N show that the performance of the various models changes with the size of the catchment areas included in the study. Chiefly, removing the large catchments reduces the dominance (in performance terms) of the nonlinear models. Both linear and nonlinear models are also less consistent across the performance criteria and the two flow return periods, i.e. it is more difficult to generate a simple model that works well across a range of smaller catchments. This presents a problem, as it is generally small catchments that are ungauged (an application for which this modelling method is commonly used). P&N suggest that the best course of action may be to choose a specific regression method and performance indicator depending on the catchment area and the required flow quantile.

----

== References ==

 * Beven, K. (2006). "A manifesto for the equifinality thesis." Journal of Hydrology 320(1-2): 18-36.
 * Kolda, T. G., R. M. Lewis and V. Torczon (2003). "Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods." SIAM Review 45(3): 385-482.
 * Pandey, G. R. and V. T. V. Nguyen (1999). "A comparative study of regression based methods in regional flood frequency analysis." Journal of Hydrology 225(1): 92.
 * Thomas, D. M. and M. A. Benson (1970). Generalization of streamflow characteristics from drainage-basin characteristics. US Geological Survey Water-Supply Paper 1975.
 * [http://PAREonline.net/getvn.asp?v=8&n=19 Yu, Chong Ho. (2003). Resampling methods: concepts, applications, and justification. Practical Assessment, Research & Evaluation, 8(19). Retrieved April 19, 2007]

----