Training materials

Title Content Author

Additional resources

Integrating biodiversity networks through Service Oriented Architecture

In this training session we will introduce the concepts of Service Oriented Architecture, Business Process Modelling and Enterprise Application Integration, analyzing their relevance to the EU BON architectural design (D 2.1) and presenting different ways to accomplish the integration of biodiversity networks and other data sources. We will present a demonstration of a working system that integrates several data sources through an Enterprise Service Bus.

Francisco Antonio García Camacho (CSIC)

Data standards: publishing sample-based data using the GBIF Integrated Publishing Toolkit

The GBIF Integrated Publishing Toolkit (IPT) is the recommended application for publishing data to the GBIF network. To date, it has been used to publish three types of data: taxon occurrences, checklists and dataset-level metadata. In this session, we will explore its adaptation for publishing sample-based data. First, we will review the essential attributes of sample data that need to be captured. Then we will introduce the Darwin Core Archive data format, explain the constraints imposed by its star-schema relational data model, and address the requirement for additional terminology in the Darwin Core vocabulary to describe the attributes of sample data, together with controlled value vocabularies for some attributes. A prototype of the IPT adapted for sample data will be demonstrated and participants will be encouraged to test it with their own data sets.
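To make the star-schema constraint concrete, the following is a minimal sketch rather than the IPT's own implementation. It assumes an event core table and an occurrence extension table, both held as inline CSV with invented rows; the column names are Darwin Core terms, while the helper functions `read_table` and `join_star` are hypothetical. A real Darwin Core Archive also carries a `meta.xml` descriptor, which is omitted here.

```python
import csv
import io

# Core table: one row per sampling event (hypothetical data).
EVENT_CORE = """eventID,samplingProtocol,sampleSizeValue,sampleSizeUnit,eventDate
plot-1,quadrat count,4,square metre,2015-06-01
plot-2,quadrat count,4,square metre,2015-06-02
"""

# Extension table: many rows may point at the same core row via eventID.
OCCURRENCE_EXTENSION = """occurrenceID,eventID,scientificName,organismQuantity,organismQuantityType
occ-1,plot-1,Bellis perennis,12,individuals
occ-2,plot-1,Trifolium repens,30,individuals
occ-3,plot-2,Bellis perennis,7,individuals
"""


def read_table(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def join_star(core_rows, extension_rows, key="eventID"):
    """Attach each extension row to the single core row it references."""
    core_by_id = {row[key]: row for row in core_rows}
    joined = []
    for ext in extension_rows:
        core = core_by_id[ext[key]]  # star schema: extensions link only to the core
        joined.append({**core, **ext})
    return joined


records = join_star(read_table(EVENT_CORE), read_table(OCCURRENCE_EXTENSION))
for rec in records:
    print(rec["eventID"], rec["scientificName"], rec["organismQuantity"], rec["sampleSizeUnit"])
```

Because every extension row can refer only to the core, and never to another extension, attributes shared by a whole sample (protocol, sample size) sit in the core row, and per-species quantities sit in the extension rows.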

Éamonn Ó Tuama (GBIF)

IPT manual

GBIF resources

Information architecture – GEOSS perspective

 

The session aims at introducing the architecture of the Global Earth Observation System of Systems (GEOSS), the GEOSS Common Infrastructure (GCI) and the GEOSS Brokering Framework. GEOSS has been created by an international voluntary effort that connects geospatial, Earth Observation and information infrastructures, acting as a gateway between producers of environmental data and end users. GEOSS aims at enhancing the relevance of Earth Observation and at offering public access to comprehensive and sustained near-real-time data, information and analyses of the environment. The GCI allows users of Earth observations to access, search and use the data, information, tools and services available through GEOSS. The GEOSS Brokering Framework implements multi-disciplinary interoperability and lowers entry barriers for both users and data providers, allowing them to continue using their tools and publishing their resources according to their standards.

The session includes a live, interactive demonstration of the GEOSS Discovery & Access Broker, based on material from the "Bringing GEOSS services into practice" workshop, which attendees may practise on their own computers.

Lorenzo Bigagli (GEOSS)  
 

Data sharing and repositories in the GBIF network

The GBIF network is diverse, spanning more than 500 institutions and connecting thousands of databases using a variety of protocols and tools. The key components of the network include the data publishing repositories, a central coordinating registry and a sophisticated search index, which supports the GBIF portal, itself considered a data repository in wider networks. During this session a live demonstration of data sharing between repositories will be given, during which the architecture of the network will be described. An installation of the GBIF Integrated Publishing Toolkit (IPT), which acts as a data publishing repository, will be used to demonstrate the services of the GBIF registry component, and specifically the management of data profiles (standards) available to data publishers. A dataset will be mapped and registered with GBIF. Crawling components will be alerted automatically, and the data will be indexed and made available for discovery and access through the GBIF portal and web services API. Some observations about this architecture will be offered, including the opportunity to collaborate with the EU BON partners to improve data security through redundant storage.

This session is aimed at people interested in the GBIF architecture and key components, the data flows within the GBIF network and the GBIF publishing tool, and at those interested in interfacing with GBIF through the portal web services API. As this is a live demo, there will be opportunity to address questions along the way, with the overarching goal that participants leave with a better understanding of the data flows than before the session.
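As an illustration of the last step in this flow (access through the web services API), the sketch below builds an occurrence search request for a single registered dataset. It is not taken from the session: it assumes the public GBIF API v1 base URL and its `datasetKey`, `limit` and `offset` query parameters, and the dataset key shown is a placeholder. The function name `occurrence_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public GBIF API v1 base URL; check the current GBIF API docs.
GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(dataset_key, limit=20, offset=0):
    """Build an occurrence search URL restricted to one registered dataset."""
    query = urlencode({"datasetKey": dataset_key, "limit": limit, "offset": offset})
    return f"{GBIF_API}/occurrence/search?{query}"


# Placeholder key: a real one is assigned by the GBIF registry on registration.
url = occurrence_search_url("00000000-0000-0000-0000-000000000000", limit=5)
print(url)
```

The resulting URL can be fetched with any HTTP client once the dataset has been crawled and indexed; only the URL construction is shown so that the example runs without network access.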

Tim Robertson (GBIF)  

Data flow and modelling in virtual laboratory

 

 

The Biodiversity Virtual e-Laboratory, BioVeL, addresses research challenges by having scientists and computer engineers work together to develop tools that combine data and analysis into efficient analytical pipelines, called "workflows." Workflows are complex digital data manipulations and modelling tasks that execute sequences of web services. BioVeL designs and deploys such workflows for a selected number of important areas in systematic, ecological, and conservation research, e.g. for the analysis of data sets with ecological, taxonomic, phylogenetic, and environmental information.
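The idea of a workflow as a sequence of services, each consuming the output of the previous one, can be sketched in a few lines. This is only an illustration and not BioVeL code: the two local functions stand in for remote web services, and all names and records are hypothetical.

```python
def clean_names(records):
    """Stand-in for a taxonomic name cleaning service: trim and normalise case."""
    return [{**r, "name": r["name"].strip().capitalize()} for r in records]


def drop_missing_coordinates(records):
    """Stand-in for a data refinement service: keep georeferenced records only."""
    return [r for r in records if r["lat"] is not None and r["lon"] is not None]


def run_workflow(data, steps):
    """Feed the output of each step into the next, in order."""
    for step in steps:
        data = step(data)
    return data


raw = [
    {"name": "  bellis perennis", "lat": 52.1, "lon": 4.3},
    {"name": "TRIFOLIUM REPENS ", "lat": None, "lon": None},
]
result = run_workflow(raw, [clean_names, drop_missing_coordinates])
print(result)
```

Keeping each step a separate unit with the same input and output shape is what allows steps to be reordered, replaced or reused across workflows.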

BioVeL data refinement and ecological niche modelling workflows allow researchers to (i) explore, access, refine, and format large data sets from major data providers; (ii) combine disparate data sets with the researchers' own data; (iii) run complex and computationally intense analytical cycles; and (iv) generate comparative maps of species distribution.

The training workshop will demonstrate use of the informatics tools and services developed by the BioVeL project to address research topics such as historical analyses, invasive species distribution modelling, endangered species distribution modelling, dynamic modelling of ecologically related species, and Essential Biodiversity Variables. In particular, there will be an introduction to the BioVeL e-infrastructure and portal. Examples of taxonomic data cleaning, ecological niche modelling, model testing, statistical analysis of GIS data, invasive and endangered species distribution modelling, and historical comparison of biodiversity from museum collections will be shown.

Hannu Saarenmaa (BioVeL)

Ecological Niche Modelling (ENM) workflows and the tutorial therein

Data sharing and repositories in DataONE

 

The mission of the Data Observation Network for Earth (DataONE) is to enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it. Organizations that collect, manage, or distribute data relevant to the Earth and the environmental sciences can collaborate with each other and with DataONE by becoming Member Nodes in DataONE. This collaboration brings broader exposure to the organization’s holdings, tools for end-users to more directly access and use data (the DataONE Investigator Toolkit), and tools to assist the organization with their preservation and curation missions. DataONE also makes available a wide range of educational materials and best practice guides for community use in data management education and has conducted sociocultural studies on the barriers and enablers for improved data sharing. This talk will provide an overview of DataONE, highlight the sociocultural and technical approaches used by DataONE to enable data sharing and data interoperability, and explore ways that DataONE and other projects can collaborate with each other.

Bruce E. Wilson (DataONE)
Introduction to GEOSS, GEO BON, EU BON

The presentation covers the origins, organisation, and current plans of the Group on Earth Observation (GEO), and the goals and information architecture of the Global Earth Observation System of Systems (GEOSS). The GEOSS Portal, registry system, data sharing principles, available data, and brokering mechanisms are explained. Participation in the GEOSS Architecture Implementation pilot process is discussed. The GEO Biodiversity Observation System (GEO BON) and its aims for information management are also covered.

Hannu Saarenmaa (UEF)
Information architecture of EU BON

The presentation covers the current update of the software architecture that will support the EU BON Biodiversity Portal, focusing on the brokering alternatives and data/metadata sharing standards (EML, OGC-CSW, SOAP and REST interfaces, etc.). GI-cat is introduced as a brokering tool to integrate new data sources. This session also includes a demonstration of the first implementation prototype of the EU BON Biodiversity Portal.

Francisco Antonio García Camacho (CSIC)  
Data standards, Darwin Core and extensions for sample-based quantitative data

The Darwin Core vocabulary, extended with a small number of additional terms, can be used in a Darwin Core Archive to encode information from sample-based data sets, i.e., data sets associated with environmental, ecological, and natural resource investigations. Such data are usually quantitative, calibrated, and follow certain protocols so that changes and trends in populations can be detected. This session introduces the new terms, the star schema model underlying Darwin Core archives consisting of a core table linked to one or more extension tables, and the associated enhancements to the GBIF Integrated Publishing Toolkit (IPT) to support publishing of sample-based data.

Éamonn Ó Tuama (GBIF)
Demonstration of GBIF/EU BON IPT for monitoring networks

The Integrated Publishing Toolkit (IPT) is a software tool developed by GBIF to facilitate the sharing and publishing of biodiversity data on the Internet using the GBIF network. It uses the Darwin Core standard to map species occurrence datasets and checklists, and can also handle data from natural sciences collections or observations. Since additional sample-based terms have been added to the DwC vocabulary, the IPT can be used by various monitoring networks collecting mainly quantitative data (environmental, ecological, and natural resource investigations). The practical part of the training will present the IPT and explain how to publish your dataset with it, via a practical example using occurrence data. Extensions of the IPT will be presented and issues will be discussed after the training.

Larissa Smirnova and Franck Theeten (RMCA)

Presentation by F. Theeten on vocabularies

Presentation by F. Theeten on EU BON IPT

Presentation by I. Peer on the sample-based data example from Israel

Introduction to sample-based data publishing

In this session we address the recent adaptation of the Darwin Core standard to accommodate data coming from sampling efforts. The changing landscape of natural sciences and the wide variety of the sample-based data produced these days are introduced. The possible application of sample-based data and their importance for modern biology is discussed. Several example use cases illustrating different types of sample data are presented.

The presentation contains the following elements:

  • The definition and types of sample-based data
  • The use of sample-based data
  • How to express sample-based data using the Darwin Core model
  • Enabling discovery and access to sample-based data via IPT
  • Examples of sample-based datasets
  • Presentation of the use cases
Larissa Smirnova (RMCA)

The data publishing landscape, gaps and mobilization efforts (GBIF / EU BON)

How to make vegetation sampling data accessible on GBIF.org

Scientists can now make their sampling data accessible on GBIF.org in order to enhance accessibility for other researchers and show a commitment to open access and reproducibility that are integral to scientific inquiry. GBIF.org is the world's largest source for species occurrence data, providing free and open access to more than 640 million occurrences from more than 15,000 datasets published by nearly 800 institutions. Its near real-time infrastructure is widely used, too, currently averaging more than one substantive use in peer-reviewed research per day. Over the past two years, however, the GBIF Secretariat has been working with EU BON partners and the wider biodiversity informatics community to enable sharing of “sample event datasets”. These data derive from environmental, ecological, and natural resource investigations that follow standardized protocols for measuring and observing biodiversity.
GBIF.org could not provide this type of data previously due to the complexity of encoding the underlying protocols in standardized ways. In March 2015, TDWG, an international body responsible for maintaining standards for the exchange of biological data, ratified changes to Darwin Core that enable support for mobilization of sample-event-based data, in particular species abundance. Then in September 2015, GBIF released a new version of the Integrated Publishing Toolkit, or IPT (its free open-source data publishing software), that enables publication of sample event datasets, and updated GBIF.org with enhanced indexing and discovery of these datasets.
The purpose of this presentation is to highlight that GBIF now supports sample event datasets, and to explain how scientists can share their datasets freely and openly through GBIF.org using the IPT. As an example, the presentation will demonstrate how vegetation plot surveys exported from TurboVeg are converted into the new Darwin Core sample event format.

Kyle Braak (GBIF)

Practical demo on how to prepare and map different types of data

IPT manual v.2.3

Video tutorial on simple data publishing to the IPT by Nicolas Noé

Registration of data in GEOSS. EU BON registry and biodiversity portal review

The session aims at introducing the user to the latest architectural updates in the GEOSS Common Infrastructure (GEOSS GCI), focusing in particular on the GEO Discovery and Access Broker (GEO DAB) and its relation to earth observation and biodiversity networks.

The session will also introduce GI-cat, the message broker that acts as the main subsystem at the core of both the GEO DAB and the EU BON metadata registry, harvesting datasets and storing their metadata using standardised models. The last part of the session will introduce the EU BON Biodiversity Portal (beta release), exploring some tools currently in development.

Francisco Antonio García Camacho (CSIC)  
The Data Publishing Toolkit at EU BON: Automated creation of data papers, data and text integrated publishing via the ARPHA Publishing Platform

The session aims at introducing the notion of data publishing through some newly developed data publishing workflows via the ARPHA Writing Tool, the ARPHA Publishing Platform, and its associated journals: Biodiversity Data Journal (BDJ), Research Ideas and Outcomes (RIO), and One Ecosystem. ARPHA is an innovative publishing solution developed by Pensoft that supports the full life cycle of a manuscript, from authoring and reviewing to publishing and dissemination. The data publishing strategy of ARPHA aims at increasing the proportion of structured text and data within the article content, so as to allow for both human use and machine readability to the maximum possible extent.

Biodiversity Data Journal is one of the most innovative and technologically advanced open access scholarly journals in biodiversity science. Since its launch in September 2013 more than 260 articles have been published in BDJ and the interest of the scientific community is continuously growing. BDJ and the ARPHA Writing Tool associated with it are core elements of EU BON’s Data Publishing Toolkit.

A key element of the training will be to demonstrate how data paper manuscripts can be created at the click of a button by importing EML metadata from GBIF IPT, DataONE and LTER, as well as online import of occurrence records into manuscripts for BDJ from GBIF, BOLD, iDigBio and PlutoF, and export of published specimen records in Darwin Core Archive and GBIF, and treatments to Plazi.
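To give a feel for what turning EML metadata into a manuscript involves, here is a minimal sketch. It is an illustration under stated assumptions, not the ARPHA implementation: the EML document is invented and heavily trimmed, only three fields are read, and the function `manuscript_skeleton` and the shape of its result are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EML document for illustration only.
EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Vegetation plots of an example reserve</title>
    <creator>
      <individualName><givenName>Ada</givenName><surName>Example</surName></individualName>
    </creator>
    <abstract><para>Plot-based counts of vascular plants.</para></abstract>
  </dataset>
</eml:eml>"""


def manuscript_skeleton(eml_text):
    """Map a few EML dataset fields onto the sections of a data paper draft."""
    dataset = ET.fromstring(eml_text).find("dataset")
    authors = [
        f"{c.findtext('individualName/givenName')} {c.findtext('individualName/surName')}"
        for c in dataset.findall("creator")
    ]
    return {
        "title": dataset.findtext("title"),
        "authors": authors,
        "abstract": dataset.findtext("abstract/para"),
    }


draft = manuscript_skeleton(EML)
print(draft)
```

Because the metadata are already structured, the mapping is mechanical, which is what makes one-click creation of a manuscript draft feasible.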

The session includes a live, interactive demonstration of the functionalities of both ARPHA and BDJ, which attendees may practise on their own computers.

Teodor Georgiev and Viktor Senderov (Pensoft)

Arpha Pensoft Guidelines

Managing citizen science projects with PlutoF workbench

You want to start a citizen science project but you are not sure how to manage the mass of observations from your project participants? You want to engage experts to evaluate observations based on photos or sound recordings but lack communication tools? You need to export your project data as a Darwin Core-compliant spreadsheet?

PlutoF workbench offers tools to design observation data forms, moderate observations, publish project data, etc. In addition, your observation data can be linked to GBIF - a respected biodiversity data hub.

During the demo session you can learn the basics of biodiversity-related citizen science project management with PlutoF workbench.
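For readers wondering what a Darwin Core-compliant spreadsheet looks like, the sketch below writes one observation as CSV with Darwin Core term names as column headers. It does not reproduce PlutoF's export: the observation, its internal field names and the function `to_darwin_core_csv` are hypothetical, and only a small subset of Darwin Core terms is used.

```python
import csv
import io

# Darwin Core term names used as column headers.
DWC_COLUMNS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate",
               "decimalLatitude", "decimalLongitude", "recordedBy"]

# Hypothetical observations as a project form might collect them.
observations = [
    {"id": "obs-1", "species": "Erithacus rubecula", "date": "2016-04-12",
     "lat": 58.38, "lon": 26.72, "observer": "A. Volunteer"},
]


def to_darwin_core_csv(rows):
    """Rename project fields to Darwin Core terms and write them as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "occurrenceID": row["id"],
            "basisOfRecord": "HumanObservation",  # Darwin Core value for field observations
            "scientificName": row["species"],
            "eventDate": row["date"],
            "decimalLatitude": row["lat"],
            "decimalLongitude": row["lon"],
            "recordedBy": row["observer"],
        })
    return buffer.getvalue()


csv_text = to_darwin_core_csv(observations)
print(csv_text)
```

Using the standard term names as headers is what lets tools such as the IPT map the columns without manual intervention.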

Veljo Runnel and Allan Zirk (UTARTU)

New developments in PlutoF web workbench by A. Zirk

PlutoF video tutorials

PlutoF manuals

From a scientific publication to data in EU BON: GoldenGate Imagine conversion and TreatmentBank as access and dissemination tool of published research data

Taxonomic, floristic and faunistic publications and expedition reports include the entire description of all the species of the world’s biodiversity. They can be very rich in data and are often the only record known of a given species. There are millions of such taxonomic treatments in the printed record, but unlike the rapidly growing number of online accessible observation and DNA records, this data is still sleeping a Sleeping Beauty’s sleep. Treatments are the explicit part of taxonomic name usages, and are the first-hand link between a name and the underlying research data. In cyberspace, this constellation makes it possible to provide a link from a taxonomic name to the treatment and from there to a wealth of linked data, from the host article to specimens to multimedia. Finally, access to treatments and the data therein enables a novel level of data analysis and visualization.
Plazi’s TreatmentBank and GoldenGate provide the tools to convert biodiversity literature into semantically enhanced documents and to store, disseminate worldwide and analyse them. They also allow direct import of articles born with TaxPub (Journal Article Tag Suite-based) biodiversity domain-specific markup, such as those in the journals published by Pensoft. The minting of persistent identifiers makes it possible to create a link from a taxonomic name to its treatment.
The training session will provide an introduction to the concept of semantic markup and treatments. The workflow will be demonstrated from the conversion of an article, to its upload to TreatmentBank, to its visualization and import into the Global Biodiversity Information Facility, where the data is hosted for usage in EU BON. Participants will have the option to convert their own articles.

Donat Agosti and Guido Sautter (Plazi)

Software: http://plazi.org/resources/treatmentbank/

GoldenGate manual