Biodiversity data sharing and data publishing workshop

The training took place in Sofia, Bulgaria on 22-23 March 2016.

Addressing global problems, such as biodiversity loss and the impacts of climate change, requires open access to data. This was concluded by world leaders at the Johannesburg Summit in 2002 when they established the Group on Earth Observations (GEO). EU BON (Building the European Biodiversity Observation Network) seeks to enhance biodiversity data availability and integration, and is the European contribution to the GEO Biodiversity Observation Network.

All biodiversity databases need to be integrated into GEO. Therefore, EU BON undertakes capacity building for biodiversity communities (e.g. researchers, citizen scientists, non-governmental organisations) that are involved in collecting and disseminating biodiversity information, including monitoring initiatives.

This training event is aimed at biologists and other life scientists who are actively involved in monitoring and managing biodiversity data. A core topic of the training will be the publishing of biodiversity data, in particular species occurrences, sample-based and citizen science data. The training will include a practical session during which participants will be assisted by experienced trainers from the EU BON project.

The sessions on data sharing will cover an introductory overview of key concepts, a demonstration, and a practical exercise using the GBIF Integrated Publishing Toolkit (IPT). The Global Biodiversity Information Facility (GBIF) is the world's largest initiative for enabling free access to biodiversity data via the internet.

Special attention will be paid to data paper publishing, led by specialists from Pensoft Publishers, a company well known among biodiversity scientists worldwide for its technologically cutting-edge open access journals (such as Research Ideas and Outcomes, ZooKeys, Biodiversity Data Journal and Nature Conservation) and a strong advocate of data publishing. Registration of data in the GEO registry system will also be addressed during the workshop.


The following topics were covered:

Session 01: Welcome, practicalities, participants' introductions.

Session 02: The data publishing landscape, gaps and mobilization efforts (GBIF/EU BON).

Presentation Larissa Smirnova

Session 03: How to make vegetation sampling data accessible on GBIF

Scientists can now make their sampling data available on GBIF in order to enhance accessibility for other researchers and show a commitment to the open access and reproducibility that are integral to scientific inquiry. GBIF is the world's largest source of species occurrence data, providing free and open access to more than 640 million occurrences from more than 15,000 datasets published by nearly 800 institutions. Its near real-time infrastructure is widely used, too, currently averaging more than one substantive use in peer-reviewed research per day.

Over the past two years, however, the GBIF Secretariat has been working with EU BON partners and the wider biodiversity informatics community to enable sharing of “sample event datasets”. These data derive from environmental, ecological, and natural resource investigations that follow standardized protocols for measuring and observing biodiversity. GBIF could not provide this type of data previously because of the complexity of encoding the underlying protocols in standardized ways. In March 2015, TDWG, an international body responsible for maintaining standards for the exchange of biological data, ratified changes to Darwin Core that support the mobilization of sample event based data, in particular species abundance. Then, in September 2015, GBIF released a new version of the Integrated Publishing Toolkit, or IPT (its free open-source data publishing software), that enables publication of sample event datasets, together with enhanced indexing and discovery of these datasets.

The purpose of this presentation is to highlight that GBIF now supports sample event datasets, and to explain how scientists can share their datasets freely and openly using the IPT. As an example, the presentation will demonstrate how vegetation plot surveys exported from TurboVeg are converted into the new Darwin Core sample event format.
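
To give a rough idea of what a sample event dataset looks like, the sketch below splits a flat table of plot records into an event table and a linked occurrence table. It is only an illustration: the input column names (plot, date, area_m2, species, cover_pct) and the example rows are invented and do not reflect the actual TurboVeg export layout or the conversion shown in the presentation. The output headers (eventID, samplingProtocol, sampleSizeValue, sampleSizeUnit, organismQuantity, organismQuantityType and so on) are Darwin Core terms.

```python
import csv
import io

# Hypothetical plot-survey export: one row per species per plot.
# These column names are assumptions, not the real TurboVeg layout.
PLOT_ROWS = [
    {"plot": "P1", "date": "2015-06-12", "area_m2": "25", "species": "Fagus sylvatica", "cover_pct": "60"},
    {"plot": "P1", "date": "2015-06-12", "area_m2": "25", "species": "Carpinus betulus", "cover_pct": "15"},
    {"plot": "P2", "date": "2015-06-13", "area_m2": "25", "species": "Fagus sylvatica", "cover_pct": "40"},
]


def to_sample_event(rows, protocol="vegetation plot survey"):
    """Split flat plot rows into an event table and an occurrence table."""
    events = {}
    occurrences = []
    for i, row in enumerate(rows, start=1):
        event_id = f"event-{row['plot']}-{row['date']}"
        # One event per plot visit; repeated rows for the same visit are merged.
        events.setdefault(event_id, {
            "eventID": event_id,
            "eventDate": row["date"],
            "samplingProtocol": protocol,
            "sampleSizeValue": row["area_m2"],
            "sampleSizeUnit": "square metre",
        })
        # One occurrence per species record, linked to its event by eventID.
        occurrences.append({
            "occurrenceID": f"{event_id}:occ-{i}",
            "eventID": event_id,
            "basisOfRecord": "HumanObservation",
            "scientificName": row["species"],
            "organismQuantity": row["cover_pct"],
            "organismQuantityType": "% cover",
        })
    return list(events.values()), occurrences


def to_csv(records):
    """Serialise a list of dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


events, occurrences = to_sample_event(PLOT_ROWS)
print(to_csv(events))
print(to_csv(occurrences))
```

The two resulting tables correspond to the general shape of an event core with an occurrence extension: abundance-type measurements sit on the occurrence rows, while the sampling effort (here, plot area) sits on the event rows.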

Presentation Kyle Braak

Session 04: Practical demo on how to prepare and map different types of data.

Do you have observation data in a spreadsheet or database that you want to publish through GBIF? We will demonstrate how to do this using the IPT and the Darwin Core Archive extension mechanism. Every step in between will be demonstrated and discussed.

The Integrated Publishing Toolkit (IPT) is a software tool developed by GBIF to facilitate the sharing and publishing of biodiversity data on the Internet using the GBIF network. It uses the Darwin Core standard to map species occurrence datasets and checklists, and can also handle data from natural sciences collections or observations. Now that additional sample-based terms have been added to the Darwin Core (DwC) vocabulary, the IPT can also be used by various monitoring networks that collect mainly quantitative data (environmental, ecological, and natural resource investigations).
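
The mapping step can be pictured as renaming a spreadsheet's own column headers to Darwin Core terms and checking that nothing essential is missing. The following is a minimal sketch of that idea in plain Python, not a description of how the IPT itself works: the spreadsheet headers and the example row are invented, and the list of required terms is an assumption that should be checked against the IPT user manual.

```python
# Hypothetical spreadsheet headers mapped to Darwin Core terms.
COLUMN_MAP = {
    "Record ID": "occurrenceID",
    "Species": "scientificName",
    "Date": "eventDate",
    "Lat": "decimalLatitude",
    "Lon": "decimalLongitude",
}

# Terms assumed here to be mandatory for an occurrence dataset;
# consult the IPT user manual for the authoritative list.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def map_row(row, column_map=COLUMN_MAP, basis_of_record="HumanObservation"):
    """Rename known columns to Darwin Core terms and drop unmapped ones."""
    mapped = {column_map[key]: value.strip() for key, value in row.items() if key in column_map}
    # Supply a default basisOfRecord when the spreadsheet has no such column.
    mapped.setdefault("basisOfRecord", basis_of_record)
    return mapped


def missing_terms(record, required=REQUIRED_TERMS):
    """Return the required terms that are absent or empty in a record."""
    return [term for term in required if not record.get(term)]


row = {
    "Record ID": "obs-001",
    "Species": " Parus major ",
    "Date": "2016-03-22",
    "Lat": "42.70",
    "Lon": "23.32",
    "Notes": "singing male",  # unmapped column, dropped by map_row
}
record = map_row(row)
print(record)
print(missing_terms(record))
```

Running a check like `missing_terms` over a whole spreadsheet before uploading it is one simple way to catch incomplete rows early.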

Presentation Larissa Smirnova

IPT user manual

Video tutorial on simple data publishing to the IPT by Nicolas Noé

Session 05: Data publishing practice in groups.

This session will focus on practical work. You will do some real data publishing using the latest version of the IPT and realistic use cases (or your own data).

Session 06: Registration of data in GEOSS. EU BON registry and biodiversity portal review.

The session aims to introduce users to the latest architectural updates in the GEOSS Common Infrastructure (GEOSS GCI), focusing in particular on the GEO Discovery and Access Broker (GEO DAB) and its relation to earth observation and biodiversity networks.

The session will also introduce GI-cat, the message broker that acts as the main subsystem at the core of both the GEO DAB and the EU BON metadata registry, harvesting datasets and storing their metadata using standardised models. The last part of the session will introduce the EU BON Biodiversity Portal (beta release) and explore some tools currently in development.

Presentation Antonio Camacho

Session 07: The Data Publishing Toolkit at EU BON: Automated creation of data papers, data and text integrated publishing via the ARPHA Publishing Platform.

The session aims to introduce the notion of data publishing through some newly developed data publishing workflows based on the ARPHA Writing Tool, the ARPHA Publishing Platform, and its associated journals: Biodiversity Data Journal (BDJ), Research Ideas and Outcomes (RIO), and One Ecosystem. ARPHA is an innovative publishing solution developed by Pensoft that supports the full life cycle of a manuscript, from authoring and reviewing to publishing and dissemination. The data publishing strategy of ARPHA aims to increase the proportion of structured text and data within the article content, so as to allow for both human use and machine readability to the maximum possible extent.

Biodiversity Data Journal is one of the most innovative and technologically advanced open access scholarly journals in biodiversity science. Since its launch in September 2013, more than 260 articles have been published in BDJ, and the interest of the scientific community is continuously growing. BDJ and the ARPHA Writing Tool associated with it are core elements of EU BON’s Data Publishing Toolkit.

A key element of the training will be to demonstrate how data paper manuscripts can be created at the click of a button by importing EML metadata from the GBIF IPT, DataONE and LTER. It will also cover the online import of occurrence records into BDJ manuscripts from GBIF, BOLD, iDigBio and PlutoF, and the export of published specimen records in Darwin Core Archive format to GBIF, and of treatments to Plazi.
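
The idea behind generating a manuscript from metadata can be illustrated with a few lines of Python that read the title, creators and abstract from an EML document. This is only a sketch of the general principle and not the ARPHA import itself: the sample EML record is invented and heavily reduced, and the `manuscript_skeleton` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented, heavily reduced EML record used purely for illustration.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" packageId="example-1" system="example">
  <dataset>
    <title>Vegetation plots of an example forest</title>
    <creator>
      <individualName><givenName>Ana</givenName><surName>Example</surName></individualName>
    </creator>
    <creator>
      <individualName><surName>Sample</surName></individualName>
    </creator>
    <abstract><para>Illustrative abstract text.</para></abstract>
  </dataset>
</eml:eml>"""


def manuscript_skeleton(eml_text):
    """Extract title, author names and abstract from an EML document."""
    # Only the root element is namespaced; its children are unqualified.
    dataset = ET.fromstring(eml_text).find("dataset")
    authors = []
    for creator in dataset.findall("creator"):
        parts = (
            creator.findtext("individualName/givenName"),
            creator.findtext("individualName/surName"),
        )
        # givenName is optional, so skip any missing part.
        authors.append(" ".join(p for p in parts if p))
    paragraphs = [p.text.strip() for p in dataset.findall("abstract/para") if p.text]
    return {
        "title": dataset.findtext("title", default="").strip(),
        "authors": authors,
        "abstract": " ".join(paragraphs),
    }


skeleton = manuscript_skeleton(SAMPLE_EML)
print(skeleton)
```

Because the metadata is already structured, sections such as the title, author list and abstract can be pre-filled automatically, leaving the authors to review and extend the text.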

The session includes a live, interactive demonstration of the functionalities of both ARPHA and BDJ, which attendees may practise on their own computers.

Presentation Teodor Georgiev and Viktor Senderov

Session 08: Managing citizen science projects with PlutoF workbench.

Do you want to start a citizen science project but are not sure how to manage the mass of observations from your project participants? Do you want to engage experts to evaluate observations based on photos or sound recordings but lack communication tools? Do you need to export your project data as a Darwin Core compliant spreadsheet?

PlutoF workbench offers tools to design observation data forms, moderate observations, publish project data, etc. In addition, your observation data can be linked to GBIF - a respected biodiversity data hub.

During the demo session you will learn the basics of managing biodiversity-related citizen science projects with the PlutoF workbench.

Presentation Veljo Runnel

Presentation Allan Zirk

PlutoF video tutorials

PlutoF manuals

Session 09: From a scientific publication to data in EU BON: GoldenGate Imagine conversion and TreatmentBank as access and dissemination tool of published research data.

Taxonomic, floristic and faunistic publications and expedition reports contain the descriptions of all the known species of the world’s biodiversity. They can be very rich in data and are often the only known record of a given species. There are millions of such taxonomic treatments in the printed record, but unlike the rapidly growing number of observation and DNA records accessible online, these data are still sleeping a Sleeping Beauty’s sleep. Treatments are the explicit part of taxonomic name usages, and are the first-hand link between a name and the underlying research data. In cyberspace, this constellation makes it possible to provide a link from a taxonomic name to the treatment and from there to a wealth of linked data, from the host article to specimens to multimedia. Finally, access to treatments and the data therein allows a novel level of data analysis and visualization.

Plazi’s TreatmentBank and GoldenGate provide the tools to convert biodiversity literature into semantically enhanced documents, and to store, disseminate worldwide and analyse them. They also allow direct import of articles born with biodiversity domain specific markup based on the TaxPub extension of the Journal Article Tag Suite, such as those in the journals published by Pensoft. The minting of persistent identifiers makes it possible to create a link from a taxonomic name to its treatment.

The training session will provide an introduction to the concept of semantic markup and treatments. The workflow will be demonstrated from the conversion of an article, to its upload to TreatmentBank, to its visualization and import into the Global Biodiversity Information Facility, where the data is hosted for usage in EU BON. Participants will have the option to convert their own articles. Software can be downloaded from

Presentation Donat Agosti and Guido Sautter




Start date: Tuesday, March 22, 2016 - 09:00

End date: Wednesday, March 23, 2016 - 17:00