The file biota_data.csv contains the contaminant and biological effect data in biota used in the 2020 assessment of the UK's Clean Seas Environment Monitoring Programme (CSEMP). The data are a cleaned and processed subset of the data extracted from the MERMAN database on 30 March 2022.
The file is UTF-8-BOM encoded, so it can be opened directly in Excel, or read into R using the function read.csv with the argument fileEncoding = "UTF-8-BOM".
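For readers working in Python rather than Excel or R, the following is a minimal sketch of reading the file with the standard library only. It assumes the file is comma-separated with a single header row; the function name read_biota is illustrative and not part of any published tool. Python's "utf-8-sig" codec removes the byte order mark, so it does not end up attached to the first column name.

```python
import csv


def read_biota(path):
    """Read the CSV into a list of dicts, one per row (hypothetical helper)."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

A call such as read_biota("biota_data.csv") returns every value as a string, so numeric columns such as concentration need converting before use.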
The variables in the file are:
series
Description: timeseries identifier
Unit:
Type: categorical
Levels: 6971
Note: identifies the data (typically a station / contaminant / species / matrix combination) that were grouped together into a timeseries and modelled to assess status and trends
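As an illustration of how the series identifier can be used, the sketch below groups rows into timeseries and orders each one by monitoring year. It assumes the rows are dicts keyed by the variable names in this file (for example as returned by csv.DictReader); group_by_series is a hypothetical helper, not the code used in the assessment.

```python
from collections import defaultdict


def group_by_series(rows):
    """Group row dicts by 'series' and sort each group by monitoring_year."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["series"]].append(row)
    for members in groups.values():
        # monitoring_year is read from the CSV as text, so convert before sorting
        members.sort(key=lambda r: int(r["monitoring_year"]))
    return dict(groups)
```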
CSEMP_region
Description: monitoring region
Unit:
Type: categorical
Levels: 19
Note: there are 22 monitoring regions in the CSEMP but only 19 with data; the CSEMP regions are aggregated into 8 biogeographic regions for regional assessments
biogeographic_region
Description: biogeographic region used for regional assessments
Unit:
Type: categorical
Levels: Northern North Sea, Southern North Sea, Eastern Channel, Western Channel & Celtic Sea, Irish Sea, Minches & Western Scotland, Scottish Continental Shelf
Note: there are 8 biogeographic regions in total, but only 7 with data
station
Description: monitoring station
Unit:
Type: categorical
Levels: 208
Note:
latitude
Description: station latitude
Unit: decimal degrees
Type: continuous
Range: 50.11, 61.10
Note: this is a nominal position: sampling occurs in a pre-defined area broadly centred on this position
longitude
Description: station longitude
Unit: decimal degrees
Type: continuous
Range: -7.11, 2.90
Note: this is a nominal position: sampling occurs in a pre-defined area broadly centred on this position
MSTAT
Description: type of monitoring station
Unit:
Type: categorical
Levels: B, RH, IH
Note: baseline (B), reference (RH) or impacted (IH)
WLTYP
Description: station typology
Unit:
Type: categorical
Levels: Estuary, Coast, Open Sea
Note:
monitoring_year
Description: monitoring year
Unit:
Type: discrete
Range: 1999, 2019
Note: some data suppliers monitor in winter, so samples collected in, for example, December 2018 and January 2019 are regarded as coming from the same monitoring year
sample_id
Description: sample identifier
Unit:
Type: categorical
Levels: 21836
Note: data from different matrices (e.g. LI and MU) in the same fish have different sample_ids
sample_date
Description: sampling date
Unit:
Type: discrete
Range: 1999-01-05, 2019-08-29
Note:
sample_time
Description: sampling time
Unit:
Type: continuous
Range: 00:00:00, 23:29:00
Note:
sample_latitude
Description: sampling latitude
Unit: decimal degrees
Type: continuous
Range: 50.10, 61.01
Note:
sample_longitude
Description: sampling longitude
Unit: decimal degrees
Type: continuous
Range: -7.11, 2.91
Note:
species
Description: species
Unit:
Type: categorical
Levels: 8
Note:
* Crassostrea gigas: AphiaID = 140656
* Gadus morhua: AphiaID = 126436
* Limanda limanda: AphiaID = 127139
* Merlangius merlangus: AphiaID = 126438
* Mytilus edulis: AphiaID = 140480
* Nucella lapillus: AphiaID = 140403
* Platichthys flesus: AphiaID = 127141
* Pleuronectes platessa: AphiaID = 127143
sex
Description: sex
Unit:
Type: categorical
Levels: F, I, M, X
Note: see ICES reference codes for SEXCO
matrix
Description: sample matrix
Unit:
Type: categorical
Levels: BI, ER, HML, LI, LIS9, MU, SB, WO
Note: see ICES reference codes for MATRX
determinand_group
Description: contaminant or biological effect group
Unit:
Type: categorical
Levels: Metals, Organotins, PAH parent compounds, PAH alkylated compounds, PAH metabolites, Polybrominated diphenyl ethers, Organobromines (other), Organofluorines, Polychlorinated biphenyls, Dioxins, Organochlorines (other), Imposex, Biological effects (other)
Note:
determinand
Description: contaminant or biological effect
Unit:
Type: categorical
Levels: 103
Notes:
* see ICES reference codes for PARAM
* data submitted as CHRTR and VDSI have been relabelled as CHR and VDS respectively
* SBDE6 is the code used for the sum of BDE28, BDE47, BDE99, BD100, BD153 and BD154
* TEQDFP is the code used for the WHO TEQ_DFP (where DFP indicates dioxins, furans and planar polychlorinated biphenyls)
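The definition of SBDE6 above can be sketched as a simple sum. This is only an illustration: it assumes all six congeners were measured in the same sample, on the same basis and in the same unit, and that none of them is censored. It is not necessarily how the values in this file were derived, and sum_bde6 and its input mapping are hypothetical.

```python
SBDE6_CONGENERS = ("BDE28", "BDE47", "BDE99", "BD100", "BD153", "BD154")


def sum_bde6(concentrations):
    """Sum the six congeners; return None if any of them is missing.

    `concentrations` maps determinand code to a numeric concentration
    (a hypothetical input structure chosen for this example).
    """
    if any(code not in concentrations for code in SBDE6_CONGENERS):
        return None
    return sum(concentrations[code] for code in SBDE6_CONGENERS)
```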
metoa
Description: method of chemical analysis
Unit:
Type: categorical
Levels: 25
Note: see ICES reference codes for METOA
basis
Description: basis on which the measurement is expressed
Unit:
Type: categorical
Levels: D, L, W
Note: dry weight (D), lipid weight (L) or wet weight (W)
unit
Description: unit of the concentration measurement and its uncertainty
Unit:
Type: categorical
Levels: %, d, mins, nmol/min/mg protein, nr/1000 cells, pmol/min/mg protein, st, TEQ ug/kg, ug/kg, ug/ml
Note:
concentration
Description: concentration of contaminant or equivalent for biological effects
Unit: see unit column
Type: continuous
Range: 0, 5600000
Note:
qflag
Description: less-than qualifier for the concentration measurement
Unit:
Type: categorical
Levels: "", D, Q, <
Notes:
* "" (a blank or missing value) indicates a non-censored measurement
* D indicates the measurement is left-censored at the limit of detection; i.e. the measurement is below the limit of detection, but it is not known by how much; the limit of detection is given in the concentration column
* Q indicates the measurement is left-censored at the limit of quantification; i.e. the measurement is below the limit of quantification, but it is not known by how much; the limit of quantification is given in the concentration column
* < indicates the measurement is left-censored by an unspecified censoring criterion (which could be the limit of detection or quantification); the value of the censoring criterion is given in the concentration column
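One way to carry the qflag information into an analysis is to turn each measurement into an interval that contains the true value. The sketch below does this under the assumption that the measured quantity cannot be negative, so a left-censored value lies between 0 and the reported limit. The function name measurement_bounds is hypothetical.

```python
CENSORED_FLAGS = {"D", "Q", "<"}


def measurement_bounds(concentration, qflag):
    """Return (lower, upper) bounds on the true value implied by qflag."""
    flag = (qflag or "").strip()
    if flag == "":
        # non-censored: the reported value is the measurement itself
        return (concentration, concentration)
    if flag in CENSORED_FLAGS:
        # left-censored: the concentration column holds the censoring limit
        return (0.0, concentration)
    raise ValueError(f"unexpected qflag: {qflag!r}")
```

For example, a row with concentration 0.1 and qflag D gives the interval (0.0, 0.1), whereas a row with a blank qflag gives an interval of zero width.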
uncertainty
Description: uncertainty in the concentration measurement
Unit: see unit column
Type: continuous
Range: 0.00008, 920099
Note: analytical uncertainty expressed as the standard deviation; not applicable to some biological effects measurements
LNMEA
Description: mean length
Unit: cm
Type: continuous
Range: 0.1, 103
Note: length of monitoring organism, or mean length if several individuals were pooled; there are unit errors in these data, so the data should be used with caution
DRYWT
Description: dry weight of the sample
Unit: %
Type: continuous
Range: 2.46, 91.6
Note: all values are above the limit of detection
DRYWT_uncertainty
Description: uncertainty in the dry weight measurement
Unit: %
Type: continuous
Range: 0.028, 12.4
Note: analytical uncertainty expressed as the standard deviation
LIPIDWT
Description: lipid weight of the sample
Unit: %
Type: continuous
Range: 0.1, 81.4
Note:
LIPIDWT_qflag
Description: less-than qualifier for the lipid weight measurement
Unit:
Type: categorical
Levels: "", D
Note: see qflag
LIPIDWT_uncertainty
Description: uncertainty in the lipid weight measurement
Unit: %
Type: continuous
Range: 0.019, 10.1
Note: analytical uncertainty expressed as the standard deviation
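The basis column together with DRYWT and LIPIDWT allows concentrations to be put on a common basis. The following is a minimal sketch of conversion to wet weight, assuming that DRYWT and LIPIDWT are percentages of the wet weight of the sample (an assumption not stated in this file). It ignores censoring of LIPIDWT and the uncertainties, and the function to_wet_weight is hypothetical.

```python
def to_wet_weight(concentration, basis, drywt=None, lipidwt=None):
    """Convert a concentration to a wet weight basis, or return None if
    the required dry or lipid weight percentage is unavailable."""
    if basis == "W":
        return concentration
    if basis == "D":
        # dry weight basis: scale by the dry fraction of the sample
        return None if drywt is None else concentration * drywt / 100.0
    if basis == "L":
        # lipid weight basis: scale by the lipid fraction of the sample
        return None if lipidwt is None else concentration * lipidwt / 100.0
    raise ValueError(f"unexpected basis: {basis!r}")
```

For example, 10 ug/kg dry weight in a sample with DRYWT = 20 corresponds to 2 ug/kg wet weight under this assumption.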
noinp
Description: number of individuals pooled in the sample
Unit: nr
Type: discrete
Range: 1, 260
Note:
FEMALEPOP
Description: percentage of individuals in the sample that are female
Unit: %
Type: continuous
Range: 17, 100
Note: used to model imposex data when submitted as a pooled sample
CMTQCNR
Description: Comet assay cells screened
Unit: nr
Type: discrete
Range: 41, 168
Note: used to model Comet assay (%DNATAIL) data
MNCQCNR
Description: Micronucleus assay cells screened
Unit: nr
Type: discrete
Range: 1000, 5000
Note: used to model Micronucleus assay (MNC) data