The file biota_summary.csv summarises the assessment results for each timeseries of contaminants and biological effects in biota in the 2020 assessment of the UK's Clean Seas Environment Monitoring Programme (CSEMP). The results come from timeseries models fitted to the contaminant and biological effect data in biota_data.csv which in turn are based on data extracted from the MERMAN database on 30 March 2022. Details of the modelling procedures can be found in the method help files for contaminants, PAH metabolites, imposex and other biological effects.
The file biota_summary.csv is UTF-8-BOM encoded, so it can be read directly into Excel, or into R using the function read.csv with the argument fileEncoding = "UTF-8-BOM".
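For readers working in Python rather than R or Excel, the following is a minimal sketch of the same idea. It assumes the file sits in the working directory, and the helper name read_summary is purely illustrative. Python's "utf-8-sig" codec strips the byte order mark, so the first column name is read as series rather than with a stray leading character.

```python
import csv

def read_summary(path="biota_summary.csv"):
    """Read the summary file into a list of dicts, one per time series."""
    # "utf-8-sig" removes the byte order mark that UTF-8-BOM files start with
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

All values are returned as strings, and blank cells (for example an empty BAC_value) come back as empty strings, so numeric columns need converting before use.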
The data mostly relate to chemical concentrations and the variable descriptions are written with that in mind. In particular, 'concentration' is used to describe the measurement in all time series including biological effects. However, there are some important differences for time series of imposex, ACHE, Lysosomal labilisation period, Neutral red retention time and Stress on stress and these are described in the footnotes at the bottom of the document.
The variables in the file are:
series
Description: timeseries identifier
Unit:
Type: categorical
Levels: 6971
Note: links to the timeseries data in biota_data.csv
CSEMP_region
Description: CSEMP region where the monitoring station is located
Unit:
Type: categorical
Levels: 19
Note: see map of CSEMP regions
CSEMP_stratum
Description: subdivision of CSEMP region where the monitoring station is located
Unit:
Type: categorical
Levels: 122
Note: there are 818 CSEMP strata, most of which are WFD water bodies, but only 122 with data
biogeographic_region
Description: biogeographic region where the monitoring station is located
Unit:
Type: categorical
Levels: Northern North Sea, Southern North Sea, Eastern Channel, Western Channel & Celtic Sea, Irish Sea, Minches & Western Scotland, Scottish Continental Shelf
Note: see map of biogeographic regions
station
Description: monitoring station
Unit:
Type: categorical
Levels: 208
Note:
station_name
Description: name associated with the monitoring station
Unit:
Type: categorical
Levels: 207
Note:
latitude
Description: station latitude
Unit: decimal degrees
Type: continuous
Range: 50.11, 61.10
Note: this is a nominal position: sampling occurs in a pre-defined area broadly centred on this position
longitude
Description: station longitude
Unit: decimal degrees
Type: continuous
Range: -7.11, 2.90
Note: this is a nominal position: sampling occurs in a pre-defined area broadly centred on this position
MSTAT
Description: type of monitoring station
Unit:
Type: categorical
Levels: B, RH, IH
Note: baseline (B), reference (RH) or impacted (IH)
WLTYP
Description: station typology
Unit:
Type: categorical
Levels: Estuary, Coast, Open Sea
Note:
determinand_group
Description: contaminant or biological effect group
Unit:
Type: categorical
Levels: Metals, Organotins, PAH parent compounds, PAH alkylated compounds, PAH metabolites, Polybrominated diphenyl ethers, Organobromines (other), Organofluorines, Polychlorinated biphenyls, Dioxins, Organochlorines (other), Imposex, Biological effects (other)
Note:
determinand
Description: contaminant or biological effect
Unit:
Type: categorical
Levels: 103
Notes:
* see ICES reference codes for PARAM
* SBDE6 is the code used for the sum of BDE28, BDE47, BDE99, BD100, BD153 and BD154
* TEQDFP is the code used for the WHO TEQ_DFP (where DFP indicates dioxins, furans and planar polychlorinated biphenyls)
species
Description: species
Unit:
Type: categorical
Levels: 8
Notes:
* Crassostrea gigas: AphiaID = 140656
* Gadus morhua: AphiaID = 126436
* Limanda limanda: AphiaID = 127139
* Merlangius merlangus: AphiaID = 126438
* Mytilus edulis: AphiaID = 140480
* Nucella lapillus: AphiaID = 140403
* Platichthys flesus: AphiaID = 127141
* Pleuronectes platessa: AphiaID = 127143
matrix
Description: sample matrix (tissue)
Unit:
Type: categorical
Levels: BI, ER, HML, LI, LIS9, MU, SB, WO
Note: see ICES reference codes for MATRX
basis
Description: basis of the assessment
Unit:
Type: categorical
Levels: dry weight (D), lipid weight (L) or wet weight (W)
Note:
unit
Description: unit of measurement
Unit:
Type: categorical
Levels: %, d, mins, nmol/min/mg protein, nr/1000 cells, pmol/min/mg protein, st, TEQ ug/kg, ug/kg, ug/ml
Note:
sex
Description: sex
Unit:
Type: categorical
Levels: F, M
Note: only provided for EROD where separate time series are assessed for females (F) and males (M)
metoa
Description: method of chemical analysis
Unit:
Type: categorical
Levels: FLM-SS
Notes:
* see ICES reference codes for METOA
* only provided for PAH metabolites where the assessment concentrations depend on the method of analysis
shape_env
Description: the symbol used to summarise the fitted trend when using environmental thresholds
Unit:
Type: categorical
Levels: upward_triangle, downward_triangle, large_filled_circle, small_filled_circle, small_open_circle
Notes:
* upward_triangle: significant (p < 0.05) increase in concentration in the last 20 years
* downward_triangle: significant (p < 0.05) decrease in concentration in the last 20 years
* large_filled_circle: no significant (p > 0.05) change in concentration in the last 20 years
* small_filled_circle: insufficient years of data to test for trends
* small_open_circle: only 1-2 years of data (or a time series dominated by less-than values for which no assessment criterion is available)
* the relevant significance level is given in p_recent_trend
colour_env
Description: the colour used to summarise the status assessment when using environmental thresholds
Unit:
Type: categorical
Levels: blue, green, red, orange, black
Notes:
* blue: the mean concentration is significantly below the Background Assessment Concentration (BAC) or equivalent (p < 0.05)
* green: the mean concentration is significantly below the Environmental Assessment Criterion (EAC) or equivalent (p < 0.05)
* red: the mean concentration is not significantly below the EAC or equivalent (p > 0.05)
* orange: the mean concentration is not significantly below the BAC or equivalent (p > 0.05) and there is no EAC or equivalent
* black: no assessment criteria
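The colour rules above can be sketched as a small decision function. This is an illustration only, not the assessment code: it assumes a parametric model has been fitted (so status is judged from BAC_diff and EAC_diff), that high values are harmful, and that a missing criterion is passed as None. The function name status_colour is hypothetical.

```python
def status_colour(bac_diff=None, eac_diff=None):
    """Return the status colour from BAC_diff and EAC_diff (None if no criterion).

    A negative difference means the upper confidence limit on the mean
    concentration in the last year is below the criterion.
    """
    if bac_diff is None and eac_diff is None:
        return "black"   # no assessment criteria
    if bac_diff is not None and bac_diff < 0:
        return "blue"    # significantly below the BAC
    if eac_diff is None:
        return "orange"  # not below the BAC and no EAC
    if eac_diff < 0:
        return "green"   # significantly below the EAC
    return "red"         # not significantly below the EAC
```

For example, status_colour(bac_diff=1.2, eac_diff=-0.4) gives "green": the confidence limit is above the BAC but below the EAC.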
shape_health
Description: the symbol used to summarise the fitted trend when using human health thresholds
Unit:
Type: categorical
Levels: upward_triangle, downward_triangle, large_filled_circle, small_filled_circle, small_open_circle
Notes:
* upward_triangle: significant (p < 0.05) increase in concentration in the last 20 years
* downward_triangle: significant (p < 0.05) decrease in concentration in the last 20 years
* large_filled_circle: no significant (p > 0.05) change in concentration in the last 20 years
* small_filled_circle: insufficient years of data to test for trends
* small_open_circle: only 1-2 years of data (or a time series dominated by less-than values for which no assessment criterion is available)
* the relevant significance level is given in p_recent_trend
colour_health
Description: the colour used to summarise the status assessment when using human health thresholds
Unit:
Type: categorical
Levels: green, red, black
Notes:
* green: the mean concentration is significantly below the human Health Assessment Criterion (HAC) (p < 0.05)
* red: the mean concentration is not significantly below the HAC (p > 0.05)
* black: no assessment criterion
n_year_all
Description: number of years with data
Unit:
Type: integer
Range: 1, 20
Note:
n_year_fit
Description: number of years included in the statistical analysis
Unit:
Type: integer
Range: 1, 20
Note: some early years might be excluded because they are separated from the bulk of the data by large gaps in time, or because they are dominated by 'less-than' values
n_year_positive
Description: number of years included in the analysis that have at least one concentration measurement above the limit of detection
Unit:
Type: integer
Range: 0, 20
Note:
first_year_all
Description: first year with data
Unit: y
Type: integer
Range: 1999, 2019
Note:
first_year_fit
Description: first year included in the statistical analysis
Unit: y
Type: integer
Range: 1999, 2019
Note: see n_year_fit for explanation
last_year
Description: last year of data
Unit: y
Type: integer
Range: 2014, 2019
Notes:
* the last year is always included in the statistical analysis
* only timeseries with some data in the last six monitoring years are included in the assessment (i.e. 2014-2019 for the 2020 assessment)
p_nonlinear
Description: the significance of the nonlinear component of the trend
Unit:
Type: continuous
Range: 0, 0.038
Notes:
* this assesses whether log concentrations changed nonlinearly over the monitoring period
* it is based on a likelihood ratio test comparing the smooth model with a linear model and is only given if a smooth model is selected by AICc
p_linear
Description: the significance of the linear component of the trend
Unit:
Type: continuous
Range: 0, 1
Notes:
* this test only has a simple interpretation when the trend is linear (rather than smooth) in which case it assesses whether concentrations changed (log-linearly) over the monitoring period
* it is based on a likelihood ratio test comparing the linear model with the null model (in which only an intercept is fitted)
* for smooth models, the terms p_linear_trend and p_recent_trend are more relevant
p_overall
Description: the overall significance of the trend
Unit:
Type: continuous
Range: 0, 1
Notes:
* this assesses whether mean concentrations changed over the monitoring period
* it is based on a likelihood ratio test comparing the fitted model (smooth or linear) with the null model
* p_overall is identical to p_linear if the fitted model is linear
p_linear_trend
Description: a test of whether the mean concentrations at the start and end of the monitoring period are the same
Unit:
Type: continuous
Range: 0, 1
Notes:
* for linear models, p_linear_trend is identical to p_linear
* for smooth models, p_linear_trend is based on a Wald test that compares the fitted values at the start and end of the monitoring period
* p_linear_trend can be non-significant even if p_overall is highly significant; for example, if concentrations have increased and then decreased by the same amount
linear_trend
Description: an estimate of the change in mean concentration between the start and end of the monitoring period
Unit:
Type: continuous
Range: -63, 62
Notes:
* loosely, linear_trend can be interpreted as the percentage annual change in concentration between first_year_fit and last_year assuming the trend in concentration is log-linear
* more information can be found here
p_recent_trend
Description: a test of whether the mean concentration 20 years ago is the same as it is today
Unit:
Type: continuous
Range: 0, 1
Notes:
* p_recent_trend assesses whether the mean concentration in 2000 (or first_year_fit, whichever is later) is the same as the mean concentration in 2019 (or last_year, whichever is earlier)
* for linear models, p_recent_trend is identical to p_linear
* for smooth models, p_recent_trend is based on a Wald test that compares the fitted values in 2000 (or first_year_fit) and 2019
recent_trend
Description: an estimate of the change in mean concentration in the last 20 years
Unit:
Type: continuous
Range: -63, 62
Notes:
* loosely, recent_trend can be interpreted as the percentage annual change in concentration in the last 20 years assuming the trend in concentration is log-linear
* more information can be found here
detectable_trend
Description: a measure of the power of the time series to detect changes over time
Unit:
Type: continuous
Range: 1.3, 136
Notes:
* the annual change in log concentration (multiplied by 100) that would be detected with 90% power based on a (two-sided) test at the 5% significance level given 10 years of annual monitoring and variability typical of the time series
* loosely, detectable_trend can be interpreted as the percentage annual change in concentration detectable with 90% power in 10 years of annual monitoring
* more information can be found here
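The 'loose' percentage interpretation can be made exact with a short conversion. The sketch below assumes the trend is 100 times the annual change in natural-log concentration, as stated for detectable_trend; the base of the logarithm is not given in this document, and applying the same conversion to linear_trend and recent_trend assumes they are on the same scale. The function name is illustrative.

```python
import math

def percent_change_per_year(trend):
    """Convert 100 x annual change in natural-log concentration to an exact % change."""
    return 100 * (math.exp(trend / 100) - 1)
```

For small trends the two scales nearly coincide (a trend of 1.3 is about 1.31% per year), but they diverge for large ones: a trend of 62 corresponds to roughly an 85.9% annual increase and a trend of -63 to roughly a 46.7% annual decrease.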
mean_last_year
Description: the fitted mean concentration in last_year
Unit: see unit
Type: continuous
Range: 0, 3881043
Note:
climit_last_year
Description: the upper one-sided 95% confidence limit on the fitted mean concentration in last_year
Unit: see unit
Type: continuous
Range: 0.000000000000033, 115052551
Note:
BAC_type
Description: the name of the Background Assessment Concentration (BAC) or equivalent
Unit:
Type: categorical
Levels: BAC
Note:
BAC_value
Description: the value of the BAC (or equivalent)
Unit: see unit
Type: continuous
Range: 0.0054, 63000
Note: more details can be found in the help files on assessment criteria for contaminants in biota and biological effects
BAC_diff
Description: the difference between climit_last_year and the BAC (or equivalent)
Unit: see unit
Type: continuous
Range: -5050, 115052550
Note: a negative value means that the mean concentration in the final monitoring year is significantly (p < 0.05) below the BAC
BAC_achieved
Description: the first year (moving forward) in which the mean concentration is predicted to be below the BAC (or equivalent)
Unit: y
Type: integer
Range: 2014, 3000
Notes: there are four cases
* mean_last_year ≤ BAC: the mean concentration is already below the BAC and BAC_achieved is set to last_year
* mean_last_year > BAC and recent_trend < 0: concentrations are predicted to decrease and BAC_achieved is set to the first year that the predicted concentration is below the BAC, assuming the rate of decrease is given by recent_trend; the year is capped at 3000 to avoid implausibly distant values
* mean_last_year > BAC and recent_trend ≥ 0: concentrations are predicted to increase and the mean concentration will never be below the BAC, so BAC_achieved is arbitrarily set to 3000
* mean_last_year > BAC and no trend is estimated: BAC_achieved is left blank
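The four cases can be sketched as follows. This is a reading of the rules above rather than the assessment code itself: it assumes recent_trend is 100 times the annual change in natural-log concentration, that the projection starts from mean_last_year, and that a missing trend is passed as None. The function name year_achieved and its parameter names are hypothetical. The same logic would apply to EAC_achieved and HAC_achieved with a different threshold.

```python
import math

def year_achieved(mean_last_year, threshold, recent_trend, last_year, cap=3000):
    """Sketch of the four cases for BAC_achieved (also EAC_achieved, HAC_achieved).

    recent_trend is assumed to be 100 x the annual change in natural-log
    concentration; pass None when no trend was estimated.
    """
    if mean_last_year <= threshold:
        return last_year                  # already below the threshold
    if recent_trend is None:
        return None                       # left blank
    if recent_trend >= 0:
        return cap                        # never predicted to fall below
    # years needed for the log concentration to fall to the log threshold
    years = math.log(threshold / mean_last_year) / (recent_trend / 100)
    return min(last_year + math.ceil(years), cap)
```

For example, a mean of 20 in 2019 against a threshold of 10 with recent_trend = -10 needs log(0.5) / -0.1, or about 6.9 years, so the sketch returns 2026.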
BAC_below
Description: the result of a non-parametric test of whether mean concentrations are below the BAC (or equivalent)
Unit:
Type: categorical
Levels: above, below
Notes:
* a one-sided sign test, based on the last five monitoring years, is used to test whether mean concentrations are below the BAC; this provides a non-parametric alternative to the parametric test of status based on climit_last_year, and is useful when the data are dominated by less-than measurements
* the status of the time series (colour) is determined by BAC_diff if a parametric model has been fitted, and by BAC_below otherwise
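A one-sided sign test of this kind can be sketched as below. The details are assumptions: this document does not say which annual summary value is compared with the BAC or how ties are handled, so the input here is simply one hypothetical value per year, and a value equal to the threshold counts as not below. The 5% significance level matches the one used elsewhere in this file.

```python
from math import comb

def sign_test_below(annual_values, threshold, alpha=0.05):
    """One-sided sign test of whether values are typically below a threshold.

    annual_values: one summary value per year (hypothetical input; which annual
    summary the assessment uses is not stated here).
    """
    n = len(annual_values)
    k = sum(v < threshold for v in annual_values)   # years below the threshold
    # P(at least k of n below) when each year is equally likely above or below
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return ("below" if p_value < alpha else "above"), p_value
```

With five years, only the case where all five values are below the threshold is significant (p = 1/32 = 0.03125); four out of five gives p = 6/32 = 0.1875, which the sketch reports as "above".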
EAC_type
Description: the name of the Environmental Assessment Criterion (EAC) or equivalent
Unit:
Type: categorical
Levels: EAC, FEQG, QSsp
Notes:
* EAC = Environmental Assessment Criterion
* FEQG = Federal Environmental Quality Guideline
* QSsp = Quality Standard secondary poisoning
EAC_value
Description: the value of the EAC (or equivalent)
Unit: see unit
Type: continuous
Range: 0.024, 3340
Note: more details can be found in the help files on assessment criteria for contaminants in biota and biological effects
EAC_diff
Description: the difference between climit_last_year and the EAC (or equivalent)
Unit: see unit
Type: continuous
Range: -3340, 22428
Note: a negative value means that the mean concentration in the final monitoring year is significantly (p < 0.05) below the EAC
EAC_achieved
Description: the first year (moving forward) in which the mean concentration is predicted to be below the EAC (or equivalent)
Unit: y
Type: integer
Range: 2014, 3000
Notes: there are four cases
* mean_last_year ≤ EAC: the mean concentration is already below the EAC and EAC_achieved is set to last_year
* mean_last_year > EAC and recent_trend < 0: concentrations are predicted to decrease and EAC_achieved is set to the first year that the predicted concentration is below the EAC, assuming the rate of decrease is given by recent_trend; the year is capped at 3000 to avoid implausibly distant values
* mean_last_year > EAC and recent_trend ≥ 0: concentrations are predicted to increase and the mean concentration will never be below the EAC, so EAC_achieved is arbitrarily set to 3000
* mean_last_year > EAC and no trend is estimated: EAC_achieved is left blank
EAC_below
Description: the result of a non-parametric test of whether mean concentrations are below the EAC (or equivalent)
Unit:
Type: categorical
Levels: above, below
Notes:
* a one-sided sign test, based on the last five monitoring years, is used to test whether mean concentrations are below the EAC; this provides a non-parametric alternative to the parametric test of status based on climit_last_year, and is useful when the data are dominated by less-than measurements
* the status of the time series (colour) is determined by EAC_diff if a parametric model has been fitted, and by EAC_below otherwise
HAC_type
Description: the name of the (human) Health Assessment Criterion (HAC)
Unit:
Type: categorical
Levels: MPC, QShh
Notes:
* MPC = Maximum Permissible Concentration
* QShh = Quality Standard human health
* there are no HAC for biological effects (as one might expect)
HAC_value
Description: the value of the HAC
Unit: see unit
Type: continuous
Range: 0.052, 9146
Note: more details can be found in the help file on assessment criteria for contaminants in biota
HAC_diff
Description: the difference between climit_last_year and the HAC
Unit: see unit
Type: continuous
Range: -8771, 45882
Note: a negative value means that the mean concentration in the final monitoring year is significantly (p < 0.05) below the HAC
HAC_achieved
Description: the first year (moving forward) in which the mean concentration is predicted to be below the HAC
Unit: y
Type: integer
Range: 2014, 3000
Notes: there are four cases
* mean_last_year ≤ HAC: the mean concentration is already below the HAC and HAC_achieved is set to last_year
* mean_last_year > HAC and recent_trend < 0: concentrations are predicted to decrease and HAC_achieved is set to the first year that the predicted concentration is below the HAC, assuming the rate of decrease is given by recent_trend; the year is capped at 3000 to avoid implausibly distant values
* mean_last_year > HAC and recent_trend ≥ 0: concentrations are predicted to increase and the mean concentration will never be below the HAC, so HAC_achieved is arbitrarily set to 3000
* mean_last_year > HAC and no trend is estimated: HAC_achieved is left blank
HAC_below
Description: the result of a non-parametric test of whether mean concentrations are below the HAC
Unit:
Type: categorical
Levels: above, below
Notes:
* a one-sided sign test, based on the last five monitoring years, is used to test whether mean concentrations are below the HAC; this provides a non-parametric alternative to the parametric test of status based on climit_last_year, and is useful when the data are dominated by less-than measurements
* the status of the time series (colour) is determined by HAC_diff if a parametric model has been fitted, and by HAC_below otherwise
imposex_class
Description: an alternative classification of imposex status
Unit:
Type: categorical
Levels: A, B, C
Note:
Differences for some biological effects
For imposex, linear_trend and recent_trend are the estimated log odds ratio of the VDS (or INTS) of an individual exceeding any given value in one year relative to the previous year.
For ACHE, Lysosomal labilisation period, Neutral red retention time and Stress on stress, low values indicate unhealthy organisms. Consequently,
* colour_env is e.g. blue if the mean concentration is significantly above the BAC (p < 0.05)
* climit_last_year is the lower one-sided 95% confidence limit on the fitted mean concentration in last_year
* a positive value of BAC_diff means that the mean concentration in the final monitoring year is significantly above the BAC (p < 0.05)
and so on.