- Integrating biodiversity networks through Service Oriented Architecture - Antonio Garcia, CSIC
- Data standards: publishing sample-based data - Eamonn O Tuama, GBIF
- Information architecture - GEOSS perspective – Lorenzo Bigagli, GEOSS
- Data sharing and repositories in GBIF network - Tim Robertson
- Data flow and modelling in virtual laboratory - Hannu Saarenmaa, BioVeL
- Data sharing and repositories in DataONE – Bruce E. Wilson
The First EU BON Training Event was held on April 3rd 2014, following the General Meeting of EU BON in Crete. Information regarding logistics (location, accommodation) can be found here: http://www.symposium.eubon.eu/general-meeting-2014.
Audience: EU BON-members.
More information: contact kim.jacobsen (at) africamuseum.be, larissa.smirnova (at) africamuseum.be
Photo: Heraklion port (Wiki Commons)
Titles and Abstracts:
1. Integrating biodiversity networks through Service Oriented Architecture - Antonio Garcia (CSIC)
Presentation: EU BON_EAI_SOA-presentation.pdf
In this training session we will introduce the concepts of Service Oriented Architecture, Business Process Modelling and Enterprise Application Integration, analyzing their relevance to the EU BON architectural design (D 2.1) and presenting different ways to accomplish the integration of biodiversity networks and other data sources. We will present a demonstration of a working system that integrates several data sources through an Enterprise Service Bus.
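The integration pattern described above can be sketched in a few lines: independent data sources publish records onto a shared bus, and consumers subscribe to topics rather than to sources. This is a minimal illustration of the Enterprise Service Bus idea only; the class names, topic names, and records are invented for the example and are not part of any EU BON component.

```python
# Minimal sketch of Enterprise Application Integration through a message bus.
# All names and records here are illustrative, not any EU BON or ESB API.

from collections import defaultdict

class ServiceBus:
    """A toy Enterprise Service Bus: services publish records to named
    topics, and any number of subscribers receive those records."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)

# Two hypothetical biodiversity networks publish to the same topic, so a
# downstream consumer sees one integrated stream without knowing the sources.
bus = ServiceBus()
integrated = []
bus.subscribe("occurrences", integrated.append)

bus.publish("occurrences", {"source": "network-A", "taxon": "Larus michahellis"})
bus.publish("occurrences", {"source": "network-B", "taxon": "Falco eleonorae"})
```

The point of the pattern is that adding a new data source requires no change to existing consumers: it simply publishes to the agreed topic.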
2. Data standards: publishing sample-based data using the GBIF Integrated Publishing Toolkit - Eamonn O Tuama (GBIF)
The GBIF Integrated Publishing Toolkit (IPT) is the recommended application for publishing data to the GBIF network. To date, it has been used to publish three types of data: taxon occurrences, checklists and dataset-level metadata. In this session, we will explore its adaptation for publishing sample-based data. First, we will review the essential attributes of sample data that need to be captured. Then we will introduce the Darwin Core Archive data format, explain the constraints imposed by its star-schema relational data model, and address the requirement for additional terminology in the Darwin Core vocabulary to describe the attributes of sample data, together with controlled value vocabularies for some attributes. A prototype of the IPT adapted for sample data will be demonstrated, and participants will be encouraged to test it with their own data sets.
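The star schema mentioned above means a Darwin Core Archive has one core table with any number of extension tables, each extension row pointing back at a core row by its ID. For sample-based data the core is the sampling event. The sketch below illustrates that constraint with plain CSV; the column names are Darwin Core terms, but the rows and file layout are invented for the example (a real archive also bundles a `meta.xml` descriptor, omitted here).

```python
# Sketch of the Darwin Core Archive "star schema" for sample-based data:
# a core table of sampling events, plus an extension table of occurrences
# joined to the core by eventID. Rows are invented for illustration.

import csv
import io

# Core: one row per sampling event.
events = [
    {"eventID": "ev1", "samplingProtocol": "pitfall trap", "eventDate": "2014-04-03"},
]

# Extension: occurrences recorded during each event, linked via eventID.
occurrences = [
    {"eventID": "ev1", "scientificName": "Carabus coriaceus", "individualCount": "3"},
    {"eventID": "ev1", "scientificName": "Pterostichus niger", "individualCount": "7"},
]

def to_csv(rows):
    """Serialise a list of dicts to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

event_csv = to_csv(events)
occurrence_csv = to_csv(occurrences)

# The star constraint: every extension row points at exactly one core row,
# and extensions may not reference each other.
core_ids = {e["eventID"] for e in events}
assert all(o["eventID"] in core_ids for o in occurrences)
```

That last assertion is exactly the constraint the IPT enforces when mapping data: extensions hang off the core like points of a star, which rules out arbitrary relational joins between extensions.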
3. Information architecture – GEOSS perspective - Lorenzo Bigagli (GEOSS)
The session aims at introducing the architecture of the Global Earth Observation System of Systems (GEOSS), the GEOSS Common Infrastructure (GCI) and the GEOSS Brokering Framework. GEOSS has been created by an international voluntary effort that connects geospatial, Earth Observation and information infrastructures, acting as a gateway between producers of environmental data and end users. GEOSS aims at enhancing the relevance of Earth Observation and at offering public access to comprehensive and sustained near-real time data, information and analyses of the environment. The GCI allows the user of Earth observations to access, search and use the data, information, tools and services available through GEOSS. The GEOSS Brokering Framework implements multi-disciplinary interoperability and lowers entry barriers for both users and data providers, allowing them to continue using their tools and publishing their resources according to their standards.
The session includes a live, interactive demonstration of the GEOSS Discovery & Access Broker, based on material from the "Bringing GEOSS services into practice" workshop, which attendees may practice on their own computers.
Operating system: Windows, OS X, Linux
Memory (RAM): minimum 4 GB
Disk space: minimum 20 GB
Follow the instructions at: http://www.unige.ch/sig/enseignements/GeossInPractice/Start.html
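A broker such as the Discovery & Access Broker is typically queried through an OpenSearch-style interface: free-text keywords plus spatial and paging parameters composed into a URL. The sketch below shows only that general pattern; the endpoint URL and parameter names are placeholders, not the broker's documented API.

```python
# Sketch of composing an OpenSearch-style discovery query, in the spirit
# of a brokered search against GEOSS. Endpoint and parameter names are
# illustrative placeholders, not the DAB's actual interface.

from urllib.parse import urlencode

def build_search_url(base_url, keywords, bbox=None, count=10):
    """Compose a discovery query URL: free-text keywords, a result count,
    and an optional bounding box given as (west, south, east, north)."""
    params = {"si": 1, "ct": count, "st": keywords}
    if bbox:
        params["bbox"] = ",".join(str(v) for v in bbox)
    return base_url + "?" + urlencode(params)

url = build_search_url(
    "https://example.org/opensearch",   # placeholder endpoint
    "biodiversity",
    bbox=(23.5, 34.8, 26.5, 35.7),      # roughly Crete
)
```

The benefit of brokering is visible even in this sketch: the client composes one query in one syntax, and the broker mediates it to heterogeneous catalogues behind the scenes.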
4. Data sharing and repositories in the GBIF network - Tim Robertson (GBIF)
The GBIF network is diverse, spanning more than 500 institutions and connecting thousands of databases using a variety of protocols and tools. The key components of the network include the data publishing repositories, a central coordinating registry and a sophisticated search index, which supports the GBIF portal - itself considered a data repository in wider networks. During this session a live demonstration of data sharing between repositories will be given, during which the architecture of the network will be described. An installation of the GBIF Integrated Publishing Toolkit (IPT), which acts as a data publishing repository, will be used to demonstrate the services of the GBIF registry component, and specifically the management of data profiles (standards) available to data publishers. A dataset will be mapped, and registered with GBIF. Crawling components will be alerted automatically, and the data will be indexed and made available for discovery and access through the GBIF portal and web services API. Some observations about this architecture will be offered, including the opportunity to collaborate with the EU BON partners to improve data security through redundant storage.
This session is targeted at people interested in the GBIF architecture and key components, the data flows within the GBIF network, the GBIF publishing tool, and at those interested in interfacing with GBIF through the portal web services API. Being a live demo, opportunity will be given to address questions along the way, with the overarching goal that participants leave with a better understanding of the data flows than before the session.
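Once a dataset has been indexed, it becomes discoverable through the public GBIF web services API, e.g. the occurrence search endpoint with its `q` and `limit` parameters. The sketch below composes such a request and parses a response; to stay self-contained it decodes a trimmed, made-up JSON payload rather than calling the live service, so only the endpoint and parameters reflect the real API.

```python
# Sketch of discovering indexed records through the GBIF web services API.
# The occurrence search endpoint and its q/limit parameters are from the
# public GBIF API; the canned JSON below is invented to show the result shape.

import json
from urllib.parse import urlencode

def occurrence_search_url(query, limit=5):
    """Build a GBIF occurrence search request URL."""
    return "https://api.gbif.org/v1/occurrence/search?" + urlencode(
        {"q": query, "limit": limit}
    )

url = occurrence_search_url("Puffinus yelkouan")

# A live call would be urllib.request.urlopen(url); here we parse a minimal
# canned payload with the same top-level fields ("count", "results").
canned = (
    '{"count": 2, "results": ['
    '{"scientificName": "Puffinus yelkouan"}, '
    '{"scientificName": "Puffinus yelkouan"}]}'
)
response = json.loads(canned)
names = [r["scientificName"] for r in response["results"]]
```

Responses are paged, so a client walks larger result sets by repeating the request with an `offset` parameter until `count` records have been retrieved.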
5. Data flow and modelling in virtual laboratory - Hannu Saarenmaa (BioVeL)
Presentation: Hannu - lecture_GBIFfrance_1.pdf
The Biodiversity Virtual e-Laboratory, BioVeL, addresses research challenges by having scientists and computer engineers work together to develop tools that join data and analysis into efficient analytical pipelines, called "workflows." Workflows are complex digital data manipulations and modelling tasks that execute sequences of web services. BioVeL designs and deploys such workflows for a selected number of important areas in systematic, ecological, and conservation research, e.g. for the analysis of data sets with ecological, taxonomic, phylogenetic, and environmental information.
BioVeL data refinement and ecological niche modelling workflows allow researchers to (i) explore, access, refine, and format large data sets from major data providers; (ii) combine disparate data sets with the researchers' own data; (iii) run complex and computationally intense analytical cycles; and (iv) generate comparative maps of species distribution.
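Steps (i)-(iv) above can be pictured as functions composed into a pipeline, which is essentially what a workflow is. The sketch below wires up such a chain; each function is a stand-in for a call to a remote web service, and all names and records are invented for illustration, not BioVeL's actual services.

```python
# Sketch of a workflow as composed steps, in the spirit of BioVeL's
# data-refinement and niche-modelling pipelines. Each function stands in
# for a remote web service; records and names are invented.

def refine(records):
    """Data refinement: drop records that lack usable coordinates."""
    return [r for r in records if "lat" in r and "lon" in r]

def combine(records, own_data):
    """Merge provider records with the researcher's own observations."""
    return records + own_data

def model(records):
    """Stand-in for a computationally intense modelling step: here it
    just summarises which 1-degree grid cells are occupied."""
    return {(round(r["lat"]), round(r["lon"])) for r in records}

def run_workflow(provider_records, own_data):
    """The workflow: refine, then combine, then model."""
    return model(combine(refine(provider_records), own_data))

cells = run_workflow(
    [{"lat": 35.3, "lon": 25.1}, {"species_only": True}],  # one unusable record
    [{"lat": 35.6, "lon": 24.2}],
)
```

Because each step only consumes the previous step's output, steps can be swapped or re-run independently, which is what makes workflows reusable across the research topics listed below.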
The training workshop will demonstrate the use of the informatics tools and services developed by the BioVeL project to address research topics such as historical analyses, invasive species distribution modelling, endangered species distribution modelling, dynamic modelling of ecologically related species, and Essential Biodiversity Variables. In particular, there will be an introduction to the BioVeL e-infrastructure and portal. Examples of taxonomic data cleaning, ecological niche modelling, model testing, statistical analysis of GIS data, invasive and endangered species distribution modelling, and historical comparison of biodiversity from museum collections will be shown.
6. Data sharing and repositories in DataONE – Bruce E. Wilson (DataONE)
Presentation: DataONE Overview EU BON 2014-04.pdf
The mission of the Data Observation Network for Earth (DataONE) is to enable new science and knowledge creation through universal access to data about life on Earth and the environment that sustains it. Organizations that collect, manage, or distribute data relevant to the Earth and the environmental sciences can collaborate with each other and with DataONE by becoming Member Nodes in DataONE. This collaboration brings broader exposure to the organization's holdings, tools for end-users to more directly access and use data (the DataONE Investigator Toolkit), and tools to assist the organization with their preservation and curation missions. DataONE also makes available a wide range of educational materials and best practice guides for community use in data management education and has conducted sociocultural studies on the barriers and enablers for improved data sharing. This talk will provide an overview of DataONE, highlight the sociocultural and technical approaches used by DataONE to enable data sharing and data interoperability, and explore ways that DataONE and other projects can collaborate with each other.