Species observation data - Quality principles

Data quality: a major challenge for the SINP

Quality is a subjective and relative concept: it corresponds to how closely the representation of reality given by an information system matches reality as perceived by its users.

Records of naturalist observations feed many diagnoses, evaluations, and biodiversity assessments. Data are entered by the observer on either physical (paper) or digital media.

The information provided and the precision of the elements comprising a naturalist record determine whether it will be taken into account or discarded before use, particularly for relevant scientific analysis. Data quality thus makes it possible to qualify data (and datasets) for a given use.

Beyond the observer, other people handle the data between input and its numerous potential uses: naturalist experts, managers and database administrators, programmers, analysts, users, etc.

All actors must take care to preserve data quality, as information can be degraded (simplified, attributes erased, etc.) at any stage of the data life cycle: collection, digitization, documentation, storage, analysis, or manipulation.

Data quality on species in an information system can be defined with several criteria:

  • taxonomic, spatial, or temporal precision,
  • completeness of the given information (life stage of the species, biological status, information required by a collection protocol, etc.),
  • data structuring according to the protocol (for example, grouping observations by transect, or using the groupings provided for in the exchange standard),
  • documentation of the acquisition context (metadata describing, notably, the collection protocols),
  • traceability and sources (observer, determiner, validator),
  • use of controlled vocabularies (lists of scientific names or description elements), a prerequisite for homogeneous data processing and for interoperability (the idea of a common language allowing information systems to communicate well),
  • scientific validity (reliability of the taxonomic determination, most notably),
  • consistency between data,
  • uniqueness within a system (spotting possible duplicate entries),
  • data updates (checking for possible updates, taking into account remarks made about a specific datum).
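Several of the criteria above (completeness, controlled vocabulary, uniqueness) lend themselves to automated checks. The following is a minimal sketch, not an SINP implementation: the field names, the reference list of scientific names, and the duplicate key are all hypothetical choices made for illustration.

```python
# Hedged sketch of automated quality checks on observation records.
# All field names and reference data below are hypothetical examples.
from collections import Counter

# Hypothetical controlled vocabulary of accepted scientific names.
ACCEPTED_NAMES = {"Lutra lutra", "Milvus milvus", "Triturus cristatus"}

# Hypothetical minimal set of required fields (completeness check).
REQUIRED_FIELDS = ("scientific_name", "observer", "date", "location")

def check_record(record):
    """Return a list of quality issues found in a single record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing field: {field}")
    # Controlled vocabulary: the name must belong to the reference list.
    name = record.get("scientific_name")
    if name and name not in ACCEPTED_NAMES:
        issues.append(f"unknown scientific name: {name}")
    return issues

def find_duplicates(records):
    """Flag records sharing the same (name, date, location) key."""
    keys = Counter(
        (r.get("scientific_name"), r.get("date"), r.get("location"))
        for r in records
    )
    return [key for key, count in keys.items() if count > 1]

records = [
    {"scientific_name": "Lutra lutra", "observer": "A. Martin",
     "date": "2021-05-03", "location": "Loire"},
    {"scientific_name": "Lutra lutra", "observer": "B. Durand",
     "date": "2021-05-03", "location": "Loire"},
    {"scientific_name": "Lutra vulgaris", "observer": "",
     "date": "2021-05-04", "location": "Loire"},
]

print(check_record(records[2]))  # missing observer, unknown name
print(find_duplicates(records))  # the two "Lutra lutra" records collide
```

Note that the duplicate check only flags *possible* duplicates: two observers may legitimately record the same species at the same place on the same day, which is why the SINP treats duplicate identification as a dedicated step rather than silent deletion.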

Figure: Data quality components (© INPN)

Principles ensuring data quality

To ensure the availability of quality data, care must be taken across the whole production chain, as early as possible in the data life cycle, starting with collection.
More about collection: Guide de bonnes pratiques pour la collecte et la saisie de données naturalistes

Actors must be provided with input and management tools adapted to their needs and as interoperable as possible, in order to facilitate data sharing.
More about the tools: Guide pratique pour le développement et le choix d'un outil de saisie de données naturalistes

Data curation (notably standardization) must degrade data as little as possible.
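The point about non-degrading standardization can be sketched as follows: when mapping records to a target schema, attributes the schema does not cover are preserved rather than erased. This is an illustrative sketch only; the field names and mapping are hypothetical and do not reflect the SINP exchange standard.

```python
# Hedged sketch: lossless standardization of a record toward a target
# schema. The mapping and all field names are hypothetical examples.
FIELD_MAP = {"nom_scientifique": "scientific_name", "obs": "observer"}

def standardize(record):
    """Rename known fields; keep unmapped attributes rather than drop them."""
    out = {}
    extras = {}
    for key, value in record.items():
        if key in FIELD_MAP:
            out[FIELD_MAP[key]] = value
        else:
            extras[key] = value  # preserved instead of erased
    if extras:
        out["additional_fields"] = extras
    return out

raw = {"nom_scientifique": "Lutra lutra", "obs": "A. Martin",
       "comportement": "swimming"}
print(standardize(raw))
```

Keeping the unmapped attributes in an `additional_fields` container is one possible design choice; the essential principle is that standardization should never silently discard information.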
More about the standardization: Guide pratique pour la standardisation des données naturalistes

Uses must be adapted to the available or selected data. Users should understand the data well in order to use it better, which underlines the importance of well-described datasets (metadata).
More about valorization: Guide pratique pour la valorisation et le post-traitement de données naturalistes

Of note
The validation process within the SINP is described in the SINP methodological guide for conformity, consistency and scientific validation of data and metadata. This guide describes the general methodology, terminology, and principles on duplicate entry identification, conformity, consistency, and scientific validation.