What data are of interest?
All shared biodiversity data are valuable and important. But certain data types are particularly valuable: those that have been systematically collected and those that fill glaring gaps in current data.
Ideally, biodiversity and ecosystems research would be based on massive amounts of systematically collected data about all organism groups. Current biodiversity data are far from this ideal. Most come from citizen scientists and natural history collections, and they are usually not collected in an organized way. This makes it challenging to correctly analyze biodiversity patterns and trends. Therefore, systematically collected biodiversity data are particularly welcome additions. The current data are also strongly biased towards certain organism groups, areas and time periods. Datasets that help fill those gaps are also high on the wish list.
Systematically collected data
In 2015, GBIF implemented a data standard extension for systematically collected data. The new standard allows data providers to express in detail how the data were collected. For instance, you can describe the methods used to collect the sample, and the organism groups targeted (see GBIF best practices for more details). This information allows more sophisticated analysis of data from monitoring programs and other projects that use systematic data collection methods.
We now see a rapid increase in sample-based data, but they still constitute a minor fraction of the total data volume. More sample-based datasets are therefore particularly welcome.
Some of the current datasets that are shared as unstructured occurrence data actually consist of sample-based data. Efforts to reformat these datasets so that they can be shared as sample-based data are therefore also extremely valuable.
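As a rough illustration of what such a reformatting effort involves, the sketch below splits flat occurrence records into a table of sampling events and a table of occurrences that point back to those events. The field names are Darwin Core terms, but the records are invented, the function name is hypothetical, and the rule used to decide which records share a sampling event (same date, locality and protocol) is an assumption; a real dataset needs the data provider's knowledge of how the samples were actually taken.

```python
# Invented flat records, as they might appear in an unstructured
# occurrence dataset. Keys are Darwin Core terms; values are made up.
occurrences = [
    {"occurrenceID": "occ-1", "eventDate": "2014-06-02", "locality": "Plot A",
     "samplingProtocol": "pitfall trap",
     "scientificName": "Carabus nemoralis", "individualCount": 3},
    {"occurrenceID": "occ-2", "eventDate": "2014-06-02", "locality": "Plot A",
     "samplingProtocol": "pitfall trap",
     "scientificName": "Pterostichus melanarius", "individualCount": 7},
    {"occurrenceID": "occ-3", "eventDate": "2014-06-09", "locality": "Plot B",
     "samplingProtocol": "pitfall trap",
     "scientificName": "Carabus nemoralis", "individualCount": 1},
]

# Assumption: records sharing these three fields come from one sampling event.
EVENT_FIELDS = ("eventDate", "locality", "samplingProtocol")


def split_into_events(records):
    """Return (events, linked_occurrences) built from flat records."""
    event_ids = {}   # maps (date, locality, protocol) -> eventID
    events = []      # one row per sampling event
    linked = []      # one row per occurrence, carrying an eventID
    for rec in records:
        key = tuple(rec[field] for field in EVENT_FIELDS)
        if key not in event_ids:
            event_ids[key] = f"event-{len(event_ids) + 1}"
            event = {"eventID": event_ids[key]}
            event.update(zip(EVENT_FIELDS, key))
            events.append(event)
        # Keep the occurrence-level fields and add the link to the event.
        occ = {k: v for k, v in rec.items() if k not in EVENT_FIELDS}
        occ["eventID"] = event_ids[key]
        linked.append(occ)
    return events, linked


events, linked = split_into_events(occurrences)
for event in events:
    print(event)
for occ in linked:
    print(occ)
```

Describing the collecting method once per event, rather than repeating it on every record, is what lets users see which occurrences belong to the same sample. Details such as sampling effort or sample size would also belong at the event level, but they are left out here because they cannot be inferred from the flat records.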
Taxonomic, spatial and temporal gaps
Ideally, we would have access to an equal amount of information about all organism groups from all regions and all time periods. However, this is far from the current situation.
Taxonomically, the current data are dominated by birds, mammals and flowering plants. We have enormous data gaps for many groups of insects, fungi and microorganisms. Therefore, datasets targeting such organism groups are particularly welcome.
Similarly, the current data are characterized by strong regional biases. We have comparatively rich information from Western Europe, North America and Australia but very little from large parts of Africa and Asia. Even within countries, we typically have more data from densely populated regions. Thus, datasets from poorly sampled regions are particularly valuable.
We also have more data from recent time periods than from the past. Historical data are particularly valuable when analyzing long-term trends affecting biodiversity. The older and more detailed the data are, and the more we know about how they were obtained, the better. Therefore, data from old specimens in natural history collections, from archaeological or palaeontological samples, or from long-running monitoring programs are high on the wish list.
For further discussion of global biodiversity data gaps, see the GBIF blog posts tagged “data gaps”.