Deriving guideline values using reference-site data

In the Water Quality Guidelines, we provide general advice on the use of reference-site data to derive guideline values for stressor and ecosystem receptor components of the pressure–stressor–ecosystem receptor (PSER) causal pathway.

Chemical and physical lines of evidence

Approaches to deriving guideline values are listed in our order of preferred use:

The referential approach to guideline value derivations is mainly applicable to physical and chemical stressors but can also be applied to toxicants. We regard this approach as inherently conservative. It’s often a good starting point when no guideline values are available.

As a default, our recommended approach for deriving guideline values in this way is to calculate an appropriate percentile of reference-site data, typically the 80th percentile.

More conservative guideline values based on lower percentiles have been applied by some jurisdictions as a precautionary measure. This may be preferred where there are indications that such change from the reference condition has potential to adversely affect ecosystems.

Ecosystem receptors

Ecosystem receptors in the Water Quality Guidelines include the biodiversity, toxicity and biomarkers lines of evidence.

You can use ecosystem receptor lines of evidence to:

  • measure how — and to what extent — ecosystems respond to stressors in the environment
  • diagnose the nature or identity of the stressor responsible for any measured change to a receptor.

Toxicity and biomarkers lines of evidence can be used as:

  • early detection information so that substantial and ecologically important disturbances can be avoided, or
  • diagnostic information in a weight-of-evidence evaluation to detect the presence of — and intensity of response to — stressors (e.g. through direct toxicity assessment) and the nature or identity of the stressors eliciting responses.

(We discuss the role of the toxicity line of evidence when deriving guideline values for single toxicants in Weight of evidence.)

Unlike biodiversity indicators, information acquired from toxicity and biomarkers lines of evidence most often lacks correlation and linkage to effects at higher levels of biological organisation (e.g. ecosystems).

Indicators within the biodiversity line of evidence may also provide the same early detection and diagnostic roles but are typically selected as surrogates or direct measures of the management goals. They can inform management of the extent to which ecosystems are being protected or are tracking towards improved ecosystem condition.

In all these contexts, guideline values for ecosystem receptors are more often couched in terms of an ‘effect size’, associated with a sampling design of specified statistical power, to detect any stressor-related change (including trend) from a reference condition.

In the case of indicators of biodiversity, any change (departure) from a reference condition may represent impact and non-achievement of the management goals.

You should obtain reference data for stressors and ecosystem receptors from the same ecosystem but away from possible environmental impacts (e.g. upstream). Refer to Study design for water quality monitoring programs for some guidance on what constitutes a reference site.

Regional guideline values for physical and chemical stressors

For the ANZECC & ARMCANZ (2000) guidelines, reference data for a range of physical and chemical (PC) stressors were obtained from state agencies, and DGVs were derived separately (where data were available) for upland river, lowland river, freshwater lakes and reservoirs, wetlands, estuaries and marine ecosystem types for 5 geographic regions: south-east Australia (Victoria, NSW, ACT, south-east Queensland, Tasmania), south-west Australia (southern WA), tropical Australia (northern WA, NT, northern Queensland), south central Australia — low rainfall area (SA) and New Zealand.

We recognised that the scale of these regional defaults was too coarse and the climatic-physiographic groupings were not natural.

Hale et al. (2012) reviewed this ecoregionalisation approach and recommended the adoption of 12 major drainage basins for inland waters and new Integrated Marine and Coastal Regionalisation of Australia (IMCRA) subdivisions for marine waters.

Since the ANZECC & ARMCANZ (2000) guidelines, guideline values for PC stressors have been developed for many specific regions and subregions, for example:

  • Murray–Darling Basin Plan 2012 (guideline values are referred to as ‘targets’)
  • Tasmania (DPIW 2008, DPIPWE 2012)
  • Victorian rivers and streams (EPAV 2003)
  • Queensland waterways (DERM 2009)
  • Indian Ocean and Timor Sea drainage divisions and all of Australia’s coastal marine waters.

Here we summarise a number of approaches that have been used to derive guideline values based on reference data. Refer to Data analysis for water quality monitoring for detailed guidance.

ANZECC & ARMCANZ (2000) methodology and derivatives developed to 2018

Regional and catchment-level derivations

Unless a jurisdiction has developed regional guideline values since 2000, the basis of national guideline values and associated guidance for upland river, lowland river, freshwater lakes and reservoirs, wetlands, estuaries and marine ecosystem types for different regions is use of:

  • 80th percentile of reference-site data, or
  • 20th percentile of reference-site data for stressors that cause problems at low concentrations, such as dissolved oxygen.

Data collected over 2 years of monthly sampling are regarded as sufficient to indicate ecosystem variability and therefore suitable for guideline value derivation.
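The percentile calculation described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `reference_guideline_value` is hypothetical, and NumPy's default linear-interpolation percentile is assumed (other percentile definitions give slightly different values for small datasets).

```python
import numpy as np

def reference_guideline_value(reference_data, percentile=80.0):
    """Return the chosen percentile of reference-site data.

    Use percentile=80 for stressors that cause problems at high values,
    and percentile=20 for stressors that cause problems at low values
    (for example, dissolved oxygen).
    """
    data = np.asarray(reference_data, dtype=float)
    data = data[~np.isnan(data)]  # ignore missing samples
    if data.size == 0:
        raise ValueError("no reference data supplied")
    # NumPy's default method interpolates linearly between order statistics.
    return float(np.percentile(data, percentile))

# Hypothetical example: 24 monthly samples (2 years) from one reference site.
monthly_samples = list(range(1, 25))
upper_gv = reference_guideline_value(monthly_samples, 80.0)
lower_gv = reference_guideline_value(monthly_samples, 20.0)
```

The 24 values used here are placeholders standing in for 2 years of monthly monitoring data; they are not real measurements.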

For high conservation or ecological value ecosystems, the objective should be to keep the water body at the reference condition.

For slightly to moderately disturbed ecosystems, test site medians should be compared with the 80th percentile guideline values.

For highly disturbed ecosystems, the 90th (or 10th) percentiles could be used, although the goal should be to improve water quality.

(Refer to Level of protection to better understand the degree of species protection for an aquatic ecosystem based on its condition.)

When using guideline values derived from reference-site data, compare the annual median of the measured test-site data with the guideline value. Where seasonal effects are expected, we recommend partitioning guideline values by season instead of making an annual assessment.
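As a sketch of the comparison described above, the following derives a guideline value per season and checks a test-site median against it. The function names, the season labels and the data layout (a dictionary of values keyed by season) are assumptions made for illustration only.

```python
from statistics import median
import numpy as np

def seasonal_guideline_values(reference_by_season, percentile=80.0):
    """Derive one guideline value per season from reference-site data."""
    return {
        season: float(np.percentile(values, percentile))
        for season, values in reference_by_season.items()
    }

def median_exceeds(test_values, guideline_value, upper=True):
    """Compare the test-site median with a guideline value.

    upper=True treats the guideline as an upper limit (80th percentile);
    upper=False treats it as a lower limit (20th percentile).
    """
    m = median(test_values)
    return m > guideline_value if upper else m < guideline_value

# Hypothetical reference data partitioned into wet and dry seasons.
reference = {"wet": [10, 20, 30, 40, 50], "dry": [1, 2, 3, 4, 5]}
guidelines = seasonal_guideline_values(reference)

# Hypothetical test-site data for the same seasons.
wet_exceeded = median_exceeds([44, 45, 46], guidelines["wet"])
dry_exceeded = median_exceeds([2, 3, 4], guidelines["dry"])
```

For an annual assessment, the same `median_exceeds` check would be applied once to the full year of test-site data against a single annual guideline value.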

Refer to Deriving site-specific guideline values for physical and chemical stressors for advice on seasonal guideline value derivations.

Jurisdictional variants of the ANZECC & ARMCANZ (2000) guidance, using the minimum data requirement and percentile as the guideline value, include:

  • Queensland water quality guidelines recommended that estimates of 80th percentiles should be based on a minimum of 18 samples collected at 1 to 2 reference sites over at least 12 months but preferably over 24 months (to capture 2 complete annual cycles) (DERM 2009). Specific considerations were given to flow conditions. The Queensland guidelines also provided more explicit guidance that included compliance with the 20th, 50th and 80th percentiles for high conservation value ecosystems. Recently, the Queensland Government has provided greater clarity around derivation of guideline values at (inland) catchment scale, including greater differentiation amongst levels of protection (ecosystem condition), advice on selection of reference sites within these different levels of ecosystem condition, and stratification of data into ‘high’ and ‘low’ flow classes.
  • For Tasmania, guideline values and water quality objectives that permit a particular cultural value to be maintained and protected were set using the national DGVs, regional values or values derived from site-specific data. Water quality datasets that satisfy the ANZECC & ARMCANZ (2000) approach for aquatic ecosystems were developed for high ecological value and slightly to moderately disturbed ecological conditions (divided into slightly modified ecological value and moderately disturbed ecosystem conditions). For PC stressors, state, regional and more specific subregional data were used, and, where possible, only percentile values with 95% confidence (≥ 14 samples) or greater were reported. Proposed draft guideline values (subject to approval by the EPA Tasmania Board) were derived using the 80th/20th percentiles for (a) riverine waters – generally for catchments but to river section level in some cases, (b) estuarine waters – by estuary flushing classes and in some cases functional zones or more specifically localised areas within zones, and (c) marine waters – provincial to mesoscale marine bioregions or more specific coastal segments.
  • When deriving locally relevant guideline values for marine waters in Western Australia, it is recommended that reference-site data ideally are collected over a 2-year period. For indicators that are seasonally variable, consideration should be given to deriving guideline values on a seasonal basis. Depending on the level of ecological protection to be achieved, a different percentile of the reference-site data is recommended to derive a relevant guideline value (EPA WA 2015).
  • In New South Wales, for estuarine waters, a method was developed to derive guideline values for 4 metaclasses of estuary types (drowned valleys, rivers, lakes and lagoons, small creeks) based on extrapolation from measurements in existing low disturbance estuaries of each type, and confirmation that the low disturbance values were materially different from values for high disturbance sites.

Physiographic layering

The Victorian guidelines (EPAV 2003) considered physiographic parameters in delineating subregions for inland waters, for example:

  • highland areas above approximately 1000 m altitude
  • forested areas in the north-east
  • cleared hills and coastal plains
  • lowland reaches of specified catchments.

Guideline values were then derived for each of these subregions using the reference data percentile approach. The 75th/25th percentile objectives were set for some stressors based on annual monthly data (12 data points) (EPAV 2003).

Recently, Victoria has provided greater clarity and advice around derivation of guideline values at (inland) catchment scale, including greater differentiation amongst levels of protection (ecosystem condition) and corresponding advice on selection of reference sites within these different levels of ecosystem condition.

New guideline value derivations at regional scales

The Water Quality Guidelines stratify the landscape into units (may be geographically independent) or ‘ecoregions’ (geographically distinct) with greater distinctive ecological character than the regions used in the ANZECC & ARMCANZ (2000) guidelines (Hale et al. 2012).

In Australia, regional guideline value derivations are recommended for, or at catchment scales within:

Ecosystem types are now based on the Australian National Aquatic Ecosystem (ANAE) classes for surface waters: palustrine (wetlands), lacustrine (lakes), riverine (rivers and streams), floodplain, estuarine and marine ecosystems.

For New Zealand rivers and streams, and where ‘attributes’ for PC stressors have not been assigned, geographically independent classifications are applied, based on environmental characteristics for the stressors.

Guideline value derivation approaches associated with the new schema include modelling of reference-site data and a combination of information on climatic zone, wet–dry season and ANAE level 3 classification.

Modelling of reference data

The CSIRO Atlas of Regional Seas (CARS 2015) is a digital atlas of seasonal ocean water properties: temperature, salinity, dissolved oxygen, nitrate, phosphate and silicate. It is synthesised (modelled) from ocean profile data for these properties arising from the World Ocean Database, Argo and many other sources, collectively called BlueLINK Ocean Archive (BOA). The source of much of BOA is underway transect water quality data.

From BOA, estimates of mean and seasonal ocean state are derived on a 3D grid (CARS). CARS is mapped on a global 0.5 degree grid (~ 55 km) on 79 depth layers, and conveys seasonal cycles at each grid point by 4 harmonic coefficients. Seasonal and depth ranges can be set from BOA to derive CARS for particular temporal and spatial requirements.
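To illustrate how a seasonal cycle can be reconstructed from a mean and 4 harmonic coefficients, the sketch below evaluates a value for a given day of the year. It assumes the 4 coefficients are the cosine and sine terms of an annual and a semi-annual harmonic; the function and parameter names are hypothetical and are not taken from the CARS data files.

```python
import math

def seasonal_value(mean, an_cos, an_sin, sa_cos, sa_sin,
                   day_of_year, days_in_year=365.25):
    """Evaluate a mean-plus-harmonics seasonal cycle at one grid point.

    Assumed form (an illustration, not the documented CARS formula):
      value = mean
            + an_cos * cos(t)  + an_sin * sin(t)      # annual harmonic
            + sa_cos * cos(2t) + sa_sin * sin(2t)     # semi-annual harmonic
    where t = 2 * pi * day_of_year / days_in_year.
    """
    t = 2.0 * math.pi * day_of_year / days_in_year
    return (mean
            + an_cos * math.cos(t) + an_sin * math.sin(t)
            + sa_cos * math.cos(2.0 * t) + sa_sin * math.sin(2.0 * t))

# Hypothetical coefficients for sea-surface temperature at one grid point.
start_of_year = seasonal_value(20.0, 3.0, 1.0, 0.5, 0.2, day_of_year=0.0)
```

Evaluating the function across all days of a season would give the modelled seasonal range at that grid point and depth layer.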

Based on measurements over many years, CARS has recently been applied to derive guideline values for Australia’s IMCRA mesoscale marine bioregions. Refer to the IMCRA derivation method for details.

For New Zealand rivers, and using data from the National River Water Quality Network and other sources, the percentile approach has been used to derive guideline values for reference conditions of different types of river sections identified by the New Zealand River Environment Classification (REC) model. This classifies and models the expected water quality of sites according to the environmental conditions that are strong determinants of baseline water quality (e.g. climate, flow, geology, land cover, network position and valley landform) (Snelder & Biggs 2002, McDowell et al. 2013). Refer to the REC derivation method for details.

Combined climatic zone, wet–dry season and ANAE level 3 classification

For the Indian Ocean and Timor Sea drainage divisions of Australia, an analysis of available datasets identified that reference-site measurement could be classified by:

  • climatic zone (particularly in the Indian Ocean drainage division)
  • season (defined only as wet and dry seasons)
  • water body type (to level 3) under the ANAE classification.

Sites with available data were further subdivided into naturally saline water and freshwater systems, as reference water quality differed strongly between them. However, there were insufficient data to derive reference-based guideline values based on 20th and/or 80th percentile values within each season and wetland type under this hierarchical classification.

For example, most wetlands of the Indian Ocean drainage division lacked sufficient wet season sampling, due largely to inaccessibility and the vast, remote and changeable nature of this landscape at that time of year.

This classification system provided guideline values at different levels of classification for some classes and indicators and also highlighted data gaps for other wetland classes and parameters.

Site-specific guideline values for physical and chemical stressors

Ideally, site-specific guideline values for PC stressors should be derived and used instead of regional DGVs.

As with regional DGVs, site-specific guideline values should be based on at least 2 years of monthly monitoring data from an appropriate site; for example, upstream of impacted areas, or from appropriate local reference systems that are representative of unimpacted water bodies.

Using sets of reference sites will provide a better characterisation of the local regional characteristics than a single site.

In some regions, water quality can be influenced by strong seasonal or event-scale effects. In these areas, it will be important to use monitoring data that cover these seasons or events, and derive guideline values appropriate to the particular period (e.g. wet and dry season values for tropical waters).

If there is evidence that the local ecosystem may be naturally stressed in some seasons (e.g. seasonal depletion of dissolved oxygen in wetlands after the wet season), then you should consider the extent to which the ecosystem will be able to accommodate any further move away from median conditions. In these cases, it might be necessary to:

  • set the reference-based guideline value at or near the median value
  • ensure that biological monitoring is implemented for assurance of ecosystem protection, as part of a multiple lines-of-evidence approach.

A new feature of the Water Quality Guidelines is advice on more flexible and dynamic guideline derivations where PC stressors are naturally and strongly related to stream discharge, because flow is often a key driver of water quality. In a natural system, parameters such as electrical conductivity (EC) can be highly dependent on stream discharge (van Dam et al. 2014). You should take this into account when deriving flow-related changes to the reference condition where appropriate.
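One possible way to express a flow-dependent guideline value is sketched below. It is an illustrative assumption, not the method described in van Dam et al. (2014): a straight line is fitted to log-transformed reference EC against log-transformed discharge, and the 80th percentile of the residuals is added to the fitted value at a given flow. All function names are hypothetical.

```python
import numpy as np

def fit_flow_relationship(discharge, ec):
    """Fit log10(EC) = intercept + slope * log10(discharge) to reference data."""
    log_q = np.log10(np.asarray(discharge, dtype=float))
    log_ec = np.log10(np.asarray(ec, dtype=float))
    slope, intercept = np.polyfit(log_q, log_ec, 1)
    residuals = log_ec - (intercept + slope * log_q)
    return float(slope), float(intercept), residuals

def flow_specific_guideline(q, slope, intercept, residuals, percentile=80.0):
    """Guideline EC at discharge q: fitted value shifted by a residual percentile."""
    offset = np.percentile(residuals, percentile)
    return float(10.0 ** (intercept + slope * np.log10(q) + offset))

# Hypothetical reference data in which EC falls as discharge rises.
discharge = np.array([1.0, 4.0, 16.0, 64.0, 256.0])
ec = 1000.0 * discharge ** -0.5
slope, intercept, residuals = fit_flow_relationship(discharge, ec)
gv_at_100 = flow_specific_guideline(100.0, slope, intercept, residuals)
```

With real reference data the residuals would not be zero, so the guideline curve would sit above the fitted line by an amount reflecting natural scatter at a given flow.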

Guideline values for toxicants

If there is no effects-based DGV, or when the natural background concentration of a toxicant exceeds the DGV, then reference data can be used to derive a site-specific guideline value.

However, you should consider the extent to which the ecosystem can accommodate further elevation of toxicant concentrations where there is evidence that the local ecosystem:

  • may be naturally stressed
  • has reduced biodiversity
  • has altered ecosystem structure compared with ecosystems without naturally elevated toxicant concentrations.

If the aquatic ecosystem has a limited ability to tolerate substantial further increases in concentration, then it might be necessary to set the reference-based guideline value at a value below the 80th percentile of the reference data (closer to the median value), and to implement biological monitoring as a condition of using the guideline value in a weight-of-evidence process.

Guideline values for ecosystem receptors

We provide detailed information on setting guideline values for ecosystem receptors in Data analysis.

General principles of guideline-value derivation according to the line of evidence

Generally, there are no fixed ‘standard’ guideline values for ecosystem receptors. Instead, guideline values for ecosystem receptors are typically couched in terms of an ‘effect size’, representing the magnitude of change (including trend) from a reference condition, associated with the effect of stressors in the environment.

Toxicity and biomarkers lines of evidence provide:

  • early detection information, or
  • diagnostic information in a weight-of-evidence evaluation (stressor presence, intensity of action and type).

For these lines of evidence, the objective of the measurement program is often to detect any change (including trend) from a control or reference condition associated with stressors. This assessment is less likely to be confounded by non–water quality related stressors and other environmental influences.

In contrast, for indicators of the biodiversity line of evidence used as ‘ecosystem response’ measures in particular, non–water quality and other environmental influences contribute to inherent variability in these biological systems. Because of this variability, associated sampling designs have a limited capacity to detect and quantify change relative to an undisturbed or reference state.

Any given sample size or number of sample units taken during a monitoring or assessment program has quantifiable constraints on its capacity to detect a change of a given magnitude. There is a strong relationship between the power (in statistical terms) of a monitoring program design, the magnitude of the effect that is detectable and the sample sizes involved.
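The relationship between power, effect size and sample size can be illustrated with a standard normal-approximation formula for comparing two group means. This is a simplified sketch that assumes a two-sided test, equal group sizes and a known common standard deviation; the function name is hypothetical, and real monitoring designs (for example, BACI designs) need a design-specific power analysis.

```python
import math
from statistics import NormalDist

def samples_per_group(sd, effect_size, alpha=0.1, beta=0.2):
    """Approximate sample size per group to detect a difference in means.

    Uses n = 2 * (z_(1 - alpha/2) + z_(1 - beta))^2 * (sd / effect_size)^2,
    the normal approximation for a two-sided, two-sample comparison.
    """
    if sd <= 0 or effect_size <= 0:
        raise ValueError("sd and effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(1.0 - beta)
    n = 2.0 * (z_alpha + z_beta) ** 2 * (sd / effect_size) ** 2
    return math.ceil(n)

# Halving the detectable effect size roughly quadruples the required sample size.
n_for_one_sd = samples_per_group(sd=1.0, effect_size=1.0)
n_for_half_sd = samples_per_group(sd=1.0, effect_size=0.5)
```

The inverse-square dependence on effect size is why small effects in highly variable biological indicators can demand impractically large sampling efforts.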

In the case of indicators of biodiversity and where inferences are strong, any change (departure, including trend) from a reference condition may represent impact and non-achievement of the management goals.

For most biological indicators selected amongst the ecosystem receptor lines of evidence, guideline values will be negotiated in terms of the maximum acceptable deviation from control or reference condition. This may include trending departure as detected in regression analysis or control charting. The basis for these negotiations will depend on the management questions being addressed and the candidate monitoring design (Chapter 10 of Downes et al. 2002).

Situation-dependent guideline value derivations

Decisions on the magnitude of acceptable change in indicators selected amongst ecosystem receptor lines of evidence may vary depending on the management context.

Weight-of-evidence evaluations

In a weight-of-evidence evaluation, the measurement program for each line of evidence investigation culminates in analyses to determine whether the collective information across the measured responses indicates a water quality-related change (e.g. guideline value exceedance, such as a significant statistical test for change in a field biological response, or observed toxicity), and where possible, the extent and likely cause of that change.

For each line of evidence, the relevant assessment is made through the Data analysis template.

Typical scenarios and management questions for respective lines of evidence might include:

  • Direct toxicity assessment of effluents or the water/sediment of interest (laboratory but also field) to measure short-term (days) response, using either acute or chronic endpoints. Is toxicity observed? Subsequent toxicity identification and evaluation (TIE) can be important to help identify a cause where an adverse laboratory response is measured.
  • Biomarkers indicate the extent to which contaminants are bioavailable and taken up by organisms. Are physiological and biochemical responses or elemental concentrations in organisms indicative of exposure to stressors, and which (class of) stressors?
  • Biodiversity responses provide more direct measures of the effect of stressors on ecosystems, and the extent and magnitude of any impacts observed. Are changes to key biodiversity indicators observed? What is the extent and possible cause of the change?

Assuming sound experimental design, conservative effect sizes may be set in each case (e.g. 5 to 10% change from controls or reference condition). Alternatively, the null hypothesis of ‘no effect’ is specified in the measured response between the ‘exposed’ and reference condition or controls.

Significance in the statistical test for change may be sufficient evidence of water quality-related change.

Assessments for licensing and compliance

Previous decisions made between stakeholders (e.g. developer and regulator) about effect size and the probability of making a Type I error (α) and Type II error (β) remain a key aspect of the Water Quality Guidelines philosophy.

The setting of these decision criteria (effect size, α and β) may be governed by principles of:

  • early intervention (setting values for critical effect sizes that are inherently flexible so as to trigger an early management response to a potential disturbance), or
  • setting a priori fixed-decision criteria under a compliance or legal framework where data are gathered under strict and rigorous hypothesis testing.

We explain these 2 approaches in Data analysis. General guidance on setting default values for decision criteria is provided later.

Broadscale assessments at large geographical scales (catchment, regions)

Rapid biological assessment (RBA) methods are appropriate cost-effective techniques for application over large numbers of sites and over large geographical areas.

The best-developed method in Australia, the Australian River Assessment System (AUSRIVAS), is based on freshwater macroinvertebrate community composition. It compares site data with regionally relevant reference conditions via a predictive model, and reports the result using a standard index.

The values of the AUSRIVAS index can range from a minimum of 0, indicating that none of the families expected at a site were actually found at that site, to a theoretical maximum of 1.0 (or greater), indicating a perfect match between the families expected and those that were found (or enrichment for > 1.0). The index is banded from X (enrichment), then A (reference band) to D (severely depleted in terms of the number of macroinvertebrate families expected).
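The observed-to-expected ratio behind an index of this kind can be sketched as below. This is a simplified illustration, not the AUSRIVAS software: it assumes a predictive model has already supplied a probability of occurrence for each family, and that only families at or above a probability threshold of 0.5 are counted. The family names are placeholders, and band boundaries are omitted because they are model-specific.

```python
def observed_over_expected(predicted_probabilities, observed_families,
                           threshold=0.5):
    """Return the ratio of observed to expected family counts.

    predicted_probabilities: dict mapping family name to its modelled
        probability of occurring at the site under reference condition.
    observed_families: set of family names actually collected at the site.
    """
    # Keep only families the model predicts with sufficient probability.
    expected_taxa = {
        family: p
        for family, p in predicted_probabilities.items()
        if p >= threshold
    }
    expected = sum(expected_taxa.values())
    if expected == 0:
        raise ValueError("no families predicted at or above the threshold")
    observed = sum(1 for family in expected_taxa if family in observed_families)
    return observed / expected

# Hypothetical model output and field sample.
probabilities = {"Family A": 1.0, "Family B": 0.5, "Family C": 0.5, "Family D": 0.2}
score = observed_over_expected(probabilities, {"Family A", "Family B", "Family D"})
```

A score of 0 means none of the expected families were found, and a score above 1.0 means more of them were found than the model expected.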

The ‘effect size’ in such a monitoring program is effectively any one of the different bands, with management objectives typically set at maintenance of current ecosystem health, or improvement (moving from a lower AUSRIVAS band to a higher one).

Different levels of protection

ANZECC & ARMCANZ (2000) recognised that guidelines for ecosystem receptors could differ depending on the level of protection assigned to the relevant water bodies (high conservation/ecological value, slightly to moderately disturbed systems and highly disturbed systems).

For high conservation/ecological value and slightly to moderately disturbed systems, management involves tracking the intrinsic attributes of the ecosystems (key structural and functional components typically imbued in management goals) to ensure they do not deviate outside natural variability as determined from baseline knowledge or accruing knowledge.

For any level of protection, local jurisdictions could negotiate site-specific guideline values, alternative to those recommended as a starting point, after considering site-specific factors. However, any decisions on effect size should be based on sound ecological principles of sustainability rather than arbitrary relaxation of any DGVs, or because of resource constraints.

For highly disturbed ecosystems, our philosophy in the Water Quality Guidelines is that, at worst, water quality is maintained. Ideally, the longer-term aim is to move towards improved water quality. For these sites, decisions on effect size may relax the DGVs, but they should still be based on sound ecological principles of sustainability.

A starting point for negotiating guidelines for ecosystem receptors

For applications at a local scale for sites of high conservation/ecological value and slightly to moderately disturbed systems (e.g. licensing, but not necessarily broadscale monitoring as would be applied for State of Environment reporting), ANZECC & ARMCANZ (2000) provided advice for baseline data collection and detecting and assessing change in ecosystem receptors.

Default guidance was provided because, for most applications using ecosystem receptor response measures in Australia, there is usually insufficient information about ecosystems on which to make informed judgments about an acceptable level of change.

Baseline data collection

Using an appropriate statistical design for the indicator response, parties should ensure an ‘adequate’ baseline is gathered for the indicators measured. This may be achieved by setting ‘conservative’ α, β and effect size values, where the effect size is determined on the basis of statistical or other criteria.

In the absence of clear information from which to set decision criteria, ANZECC & ARMCANZ (2000) recommended default targets for ecologically conservative decisions be set at α = 0.1, β = 0.2 (power of 0.8) and effect size = 10% of, or 1 SD about, the baseline mean, whichever is smaller.
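The default effect size rule above (10% of the baseline mean or 1 SD, whichever is smaller) can be expressed as a short calculation. The function name is hypothetical, and the use of the sample standard deviation is an assumption, as the text does not specify which estimator to use.

```python
from statistics import mean, stdev

def default_effect_size(baseline_values):
    """Return the smaller of 10% of the baseline mean and 1 standard deviation."""
    if len(baseline_values) < 2:
        raise ValueError("at least 2 baseline values are needed to estimate the SD")
    baseline_mean = mean(baseline_values)
    baseline_sd = stdev(baseline_values)  # sample SD (n - 1 denominator), assumed
    return min(0.10 * abs(baseline_mean), baseline_sd)

# Hypothetical baseline datasets for a biological indicator.
variable_baseline = [10, 12, 14, 16, 18]     # 10% of the mean is the smaller value
stable_baseline = [100, 101, 99, 100, 100]   # 1 SD is the smaller value
effect_variable = default_effect_size(variable_baseline)
effect_stable = default_effect_size(stable_baseline)
```

The resulting effect size, together with α = 0.1 and β = 0.2, would then feed into a power analysis for the chosen monitoring design.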

Whether these defaults are applied or not, the importance of sound and numerous baseline data cannot be overemphasised.

We strongly recommend that baseline data be gathered from at least 3 control or reference locations (for biodiversity indicators at least) over a period of at least 3 years (all indicators) wherever possible. We provide guidance for situations when it is not possible to meet these baseline requirements.

ANZECC & ARMCANZ (2000) stressed that the recommended decision criteria should be seen as the starting point for stakeholders (e.g. developer and regulator) to consider (and negotiate) what is appropriate or reasonable for each case.

In some cases, an effect size as small as 10% is achievable and necessary, but for many other responses typically measured in environmental programs, it will be very difficult to detect changes of 10% or less about some mean, and perhaps impossible in practice.

Detecting and assessing disturbances

The guidelines for detecting and assessing environmental impacts or disturbances are determined from previous decisions made between all parties. Management of discharges is considered when assessing licensing and compliance.

Where early intervention is a key principle, responsiveness to any apparent trend away from a baseline, or to reaching a feedback ‘trigger’ or threshold, will apply. The proponent/discharger may also wish to corroborate the results for a line of evidence indicator with information from stressor and other ecosystem receptor lines of evidence in a weight of evidence assessment.

Where data are gathered for compliance assessment within a legal framework, under strict and rigorous hypothesis testing, the default targets for ecologically conservative decisions (shown earlier) or other values determined beforehand apply. Under the default α = 0.1, an unacceptable disturbance has occurred if P < 0.1 in the statistical test applied to the data.
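The fixed decision rule can be illustrated with any appropriate statistical test. The sketch below uses an exact two-sample permutation test on the difference in means, chosen here only because it is simple and assumption-light; the text does not prescribe a particular test, and the function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(reference, exposed):
    """Exact two-sided permutation P value for a difference in group means."""
    pooled = list(reference) + list(exposed)
    n_ref = len(reference)
    n_exp = len(pooled) - n_ref
    observed = abs(mean(exposed) - mean(reference))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_ref pooled values to the reference group.
    for indices in combinations(range(len(pooled)), n_ref):
        ref_sum = sum(pooled[i] for i in indices)
        difference = abs((total - ref_sum) / n_exp - ref_sum / n_ref)
        if difference >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        n_splits += 1
    return extreme / n_splits

def unacceptable_disturbance(reference, exposed, alpha=0.1):
    """Apply the a priori decision rule: disturbance if P < alpha."""
    return permutation_p_value(reference, exposed) < alpha

# Hypothetical indicator values from reference and exposed sites.
decision = unacceptable_disturbance([1, 2, 3, 4], [5, 6, 7, 8])
```

Exact enumeration is only practical for small samples; for larger datasets a random subset of permutations, or a parametric test suited to the design, would normally be used.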

Regardless of the management context, we strongly recommend that parties adopt a precautionary approach and respond wisely and in a timely manner to data gathered for ‘early detection’ indicators (from toxicity and biomarkers lines of evidence).

References

ANZECC & ARMCANZ 2000, Australian and New Zealand Guidelines for Fresh and Marine Water Quality, Australian and New Zealand Environment and Conservation Council (ANZECC) & Agriculture and Resource Management Council of Australia and New Zealand (ARMCANZ), Canberra.

CSIRO 2009, CSIRO Atlas of Regional Seas (CARS), CSIRO, Canberra.

DERM 2009, Queensland Water Quality Guidelines, Version 3, Department of Environment and Resource Management, Brisbane.

DPIPWE 2012, Tasmanian Water Quality and Biological Condition Guidelines, Version 1, Department of Primary Industry Parks Water and Environment, Hobart.

DPIW 2008, Site-Specific Trigger Values for Physico-chemical Indicators monitored under the DPIW Baseline Water Quality Monitoring Program, Water Assessment Water Quality Report Series no. WA 08/52, Department of Primary Industries and Water, Hobart.

EPA WA 2015, Technical Guidance Protecting the Quality of Western Australia’s Marine Environment, Environmental Protection Authority Western Australia, Perth.

EPAV 2003, Water quality objectives for rivers and streams – ecosystem protection, publication no. 793.1, Environment Protection Authority, Victoria, Carlton.

Hale J, Butcher R, Collier K & Snelder T 2012, Ecoregionalisation and Ecosystem Types in Australian and New Zealand Marine, Coastal and Inland Water Systems, Department of Agriculture and Water Resources, Canberra.

McDowell RW, Snelder TH, Cox N, Booker DJ & Wilcock RJ 2013, Establishment of reference or baseline conditions of chemical indicators in New Zealand streams and rivers relative to present conditions, Marine and Freshwater Research 64: 387–400.

Smith EP 2006, BACI design, in: Encyclopedia of Environmetrics, John Wiley & Sons Ltd.

Snelder TH & Biggs BJF 2002, Multiscale river environment classification for water resources management, Journal of the American Water Resources Association 38: 1225–1239.

Van Dam RA, Humphrey CL, Harford AJ, Sinclair A, Jones DR, Davies S & Storey AW 2014, Site-specific water quality guidelines: 1 Derivation approaches based on physicochemical, ecotoxicological and ecological data, Environmental Science and Pollution Research 21: 118–130.
