Data sharing

[ What kind of data do you have? | What do you want to do with your data? | How do you choose an appropriate tool to share your data? | Existing data repositories | How can you share the data using the EBP? ]

 

What kind of data do you have?


Biologists are joining the Big-Data club. This is the result of combined efforts in genetics and genomics (molecular data), but also of data from many monitoring programs, growing citizen science initiatives, remote sensing and satellite imagery, and the vast digitization activities that open new perspectives for data mining. This huge amount of data has high scientific value and potential, and should be mobilized to become more accessible to its end users via data portals.

If you are a scientist (professional or amateur) or a data manager and have data on biological diversity of any kind (collection, taxonomic or monitoring data, or data from literature), you most likely do not want to keep the data only to yourself.

You may also have data that you do not want to share right now, and would prefer to disseminate only information about the data, which we call metadata. "Metadata" is "data about other data". It describes the content and its context using predefined attributes, and aims to provide brief information about the characteristics of a resource (e.g. "who, what, where, when, how and for what purpose"). Modern data sharing tools make it possible to share the metadata only and add the data later.
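
As a rough illustration of a metadata-only record, the sketch below stores the "who, what, where, when, how and for what purpose" attributes in a plain Python dictionary and checks that none of them is empty. The field names, the `data_available` flag, the `missing_fields` helper and all values are hypothetical and are not taken from any particular metadata standard or from the tools described on this page.

```python
import json

# Hypothetical metadata-only record: field names and values are illustrative,
# not taken from any specific metadata standard.
metadata = {
    "who": "Example Monitoring Group",
    "what": "Butterfly transect counts",
    "where": "Example region",
    "when": "2010-2015",
    "how": "Weekly transect walks",
    "why": "Long-term population monitoring",
    "data_available": False,  # only the description is shared for now
}

REQUIRED = ("who", "what", "where", "when", "how", "why")


def missing_fields(record):
    """Return the required descriptive fields that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]


print(json.dumps(metadata, indent=2))
print("Missing fields:", missing_fields(metadata))
```

In practice a data sharing tool would define its own metadata schema; the point here is only that the description can exist, and be checked for completeness, before any data is attached.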

 

What do you want to do with your data?


There are different things scientists can do with their (meta)data. They can keep it to themselves on a local server, or they can store it online on a remote (hosting) server. The second option usually means "data sharing", as it makes the resource available to other investigators. Note that access to shared data can be controlled and revoked! The data can also be published, and "publishing" implies that, once published, access can no longer be revoked. The two terms are often used interchangeably, and both suggest the use of common practices and standards that ensure data can be discovered and reused effectively, and that data owners and custodians get the recognition they deserve for making datasets public.

Projects dealing with data mobilization, and platforms offering hosting services or tools for data sharing or data management, often develop their own data sharing policies and principles, and data providers are asked to agree to them.

Examples of data sharing agreements:

GBIF data sharing agreement

EU BON data sharing agreement

GEOSS data sharing principles

 

How do you choose an appropriate tool to share your data?


Even within the biodiversity information domain alone, quite a few data sharing tools are available. Some are well known and widely used, such as tables or structured or semi-structured text documents, while others are very specific and designed for selected data types, models, applications or purposes.

Within the framework of the EU BON project, an extensive study was conducted on data publishing and data sharing tools. The report on this assessment was published as Deliverable 2.2 and in the RIO journal. Other tools, such as those for storage, data management, data capture and portals, may also be used in data sharing workflows but are not included in this report.

About 30 data sharing tools used in the natural history domain have been evaluated, and the list is available as an online repository. This list is not meant to be exhaustive, but is a snapshot of the current state of the art.

Other searchable tool repositories:

BDTracker (Biodiversity Service and Application tracker)

DataOne Software tools catalogue

EU BON has chosen several tools, which were adapted to the needs of the community and will be available on the European Biodiversity Portal (see below).

 

Existing data repositories


Shared/published data is usually indexed and made discoverable, browsable and searchable through biodiversity infrastructures. In recent decades the number of such infrastructures (platforms, portals, repositories) has increased substantially. A comprehensive list was compiled by the EU BON partners and is available in the Resources menu. Some of these portals are multidisciplinary (e.g. GBIF, DataOne, Pangaea), others are more specific (e.g. LTER for environmental studies or BOLD for barcode data).

Other lists of repositories:

Data repositories on Wikipedia (including Biology domain)

Scientific Data (recommended data repositories for authors)

 

How can you share the data using the EBP?


Depending on the kind of data you have (occurrences, ecological/environmental data, monitoring data (including data from citizen science applications) or just metadata), the European Biodiversity Portal (EBP) offers you several tools to share the data, along with an overview of tools available elsewhere and their specifications.

Tools supported by EU BON:

Biodiversity datasets and metadata can be published through the GBIF network. The tool allows publication of four types of biodiversity datasets:

  • primary occurrence data (specimens and observations)
  • species checklists and taxonomies
  • sample-based data from monitoring programs
  • metadata-only

Arpha Data Publishing Toolkit: occurrence and specimen records stored in different databases (e.g. GBIF, BOLD, iDigBio, PlutoF) can be imported directly into manuscripts, and data papers can be created.

An online service to create, record, manage, share, analyze and mobilize biodiversity data. Data types include ecology, taxonomy, metagenomics, nature conservation and natural history collections.

Plazi TreatmentBank: a platform to store, annotate, access and distribute taxonomic treatments and the data objects within them.
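
To give a feel for the first dataset type listed above, primary occurrence data, here is a minimal sketch that writes one observation record to CSV text. It assumes the column names follow Darwin Core terms (`occurrenceID`, `basisOfRecord`, `scientificName` and so on), a standard commonly used for occurrence data; the page itself does not prescribe a format, and the record values and the `to_csv` helper are invented for illustration.

```python
import csv
import io

# Column names are Darwin Core terms (an assumption; the text above does not
# prescribe a format). The record values are invented for illustration.
FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "countryCode",
]

records = [
    {
        "occurrenceID": "example-0001",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Parus major",
        "eventDate": "2015-05-12",
        "decimalLatitude": "58.38",
        "decimalLongitude": "26.72",
        "countryCode": "EE",
    }
]


def to_csv(rows, fields=FIELDS):
    """Serialize occurrence records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(records))
```

A table like this is only the raw material; a publishing tool would typically pair it with dataset-level metadata before making it discoverable.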

Additional resources: