GBIF Integrated Publishing Toolkit (IPT)

 

About the IPT


The GBIF IPT (Integrated Publishing Toolkit) is open source software widely used to publish and share biodiversity datasets on the GBIF network. It uses the Darwin Core (DwC) and Ecological Metadata Language (EML) standards. IPT v2.3 is the latest release (September 10th, 2015). It was developed together with EU BON as a first IPT prototype for testing the handling of sample-based data, using several use cases from the EU BON monitoring test sites.

To publish data using the IPT, you must either register with an existing IPT instance (the recommended way), such as the EU BON IPT, or install the IPT on your own computer or server.

  • Consult the GBIF user manual (available in English and Spanish) and online video to learn how to install the IPT.
  • Watch the online video tutorial on how to publish occurrence data using the IPT.
  • The IPT Wiki provides the IPT User Manual and a variety of other valuable resources, including an FAQ.
  • Visit the IPT website for a more complete description of the project, its uptake statistics, release history, and roadmap.
  • To report bugs or technical issues, use the bug/feature request tracker.
  • To follow IPT developments, discussions, and user feedback, subscribe to the IPT mailing list.

What type of data can be published using the IPT?


  • Observation data
  • Occurrence data from the literature
  • Natural history collections
  • Checklists
  • Sample-based data (data from monitoring programs, citizen science initiatives, ecological/environmental studies)
  • Metadata

 

What format should the data be in?


Data can be supplied as a text file (e.g. .txt, .tab, .csv), as an Excel file, or from a database. You can also upload an existing Darwin Core Archive (DwC-A), or an eml.xml metadata file from another repository (e.g. LTER).
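As an illustration of the text file option, the following minimal Python sketch writes a small tab-delimited occurrence table whose column headers are Darwin Core terms. The two records and the choice of columns are invented for illustration only; they do not represent a minimum set of fields required by the IPT or GBIF.

```python
import csv
import io

# Column names are Darwin Core terms; this particular selection is an
# assumption made for the example, not a list of required fields.
FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]

# Invented sample records, for illustration only.
RECORDS = [
    {
        "occurrenceID": "example-occ-1",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Parus major",
        "eventDate": "2015-05-01",
        "decimalLatitude": "52.52",
        "decimalLongitude": "13.40",
    },
    {
        "occurrenceID": "example-occ-2",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Quercus robur",
        "eventDate": "1998-09-14",
        "decimalLatitude": "48.85",
        "decimalLongitude": "2.35",
    },
]


def to_tab_delimited(records, fields=FIELDS):
    """Return the records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


text = to_tab_delimited(RECORDS)
print(text)
```

Saving the returned text to a .txt or .tab file gives a source file with one record per line and one Darwin Core term per column.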

 

IPT workflow (based on Canadensys guide book):


 

Standards used


Darwin Core (DwC), Darwin Core Archive (DwC-A), Ecological Metadata Language (EML)
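To make the relationship between these standards more concrete, the sketch below builds a tiny Darwin Core Archive in memory and then reads it back. It assumes the usual layout of a zip file holding a meta.xml descriptor, an eml.xml metadata document, and a core data file. The file contents here are heavily simplified placeholders written for this example; a real archive published by the IPT contains a complete EML document and many more mapped fields.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core text guidelines used by meta.xml.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"

# Simplified descriptor written for this example only.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""


def build_sample_archive() -> bytes:
    """Create a minimal, illustrative DwC-A as zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr("eml.xml", "<eml/>")  # placeholder, not valid EML
        archive.writestr(
            "occurrence.txt", "id\tscientificName\nocc-1\tParus major\n"
        )
    return buffer.getvalue()


def describe_archive(data: bytes) -> dict:
    """Summarise the files and core mapping declared in meta.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        root = ET.fromstring(archive.read("meta.xml"))
    core = root.find(f"{DWC_TEXT_NS}core")
    location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
    terms = [field.get("term") for field in core.findall(f"{DWC_TEXT_NS}field")]
    return {
        "files": sorted(names),
        "metadata": root.get("metadata"),
        "core_row_type": core.get("rowType"),
        "core_file": location,
        "terms": terms,
    }


info = describe_archive(build_sample_archive())
for key, value in info.items():
    print(f"{key}: {value}")
```

The descriptor ties the pieces together: it names the metadata document, identifies the core data file, and maps each column index to a Darwin Core term.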

 

Pros and Cons of the tool


Pros:

  1. Publication of different types of biodiversity data: i) primary occurrence data (specimens, observations), ii) species checklists and taxonomies, iii) sample data.
  2. Integrated metadata editor for publishing dataset level metadata.
  3. Internationalisation: user interface available in six different languages: English, French, Spanish, Traditional Chinese, Brazilian Portuguese, Japanese; instructions are available for translating the interface.
  4. Data security: controls access to data sets using three levels of dataset visibility: private, public and registered; controls which users can modify data sets, with four types of user roles.
  5. Integration with GBIF Registry: can automatically register data sets in the GBIF Registry; registration enables global discovery of data sets in both the GBIF Registry, and GBIF Data Portal.
  6. Support for large data sets: can process ~500,000 records/minute during publication; disk space is the only limiting factor; for example, a published dataset with 50 million records in DwC-A format is 3.6 GB.
  7. Standards-compliant publishing: publishes a dataset in Darwin Core Archive (DwC-A) format, a compressed set of files based on the Darwin Core terms, and the GBIF metadata profile based on the Ecological Metadata Language standard.
  8. The tool is supported by good documentation and a mailing list; the User Manual is available in both English and Spanish.
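The throughput and size figures quoted in item 6 can be turned into a rough planning estimate. The sketch below assumes that publication time and archive size scale linearly with record count and that 1 GB means 10**9 bytes; neither assumption comes from the IPT documentation, and real values depend on the hardware and on how many fields each record carries.

```python
# Figures quoted in item 6 of the list above.
RECORDS_PER_MINUTE = 500_000
EXAMPLE_RECORDS = 50_000_000
EXAMPLE_ARCHIVE_BYTES = 3_600_000_000  # 3.6 GB, assuming 1 GB = 10**9 bytes


def estimate(record_count):
    """Return (publication minutes, archive bytes) under linear scaling."""
    minutes = record_count / RECORDS_PER_MINUTE
    bytes_per_record = EXAMPLE_ARCHIVE_BYTES / EXAMPLE_RECORDS
    archive_bytes = record_count * bytes_per_record
    return minutes, archive_bytes


minutes, size = estimate(EXAMPLE_RECORDS)
print(f"{minutes:.0f} minutes, {size / 10**9:.1f} GB")
```

Under these assumptions the 50 million record example corresponds to about 100 minutes of processing and roughly 72 bytes per record in the compressed archive.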

Cons:

  1. The IPT lacks built-in data validation. Since the IPT is designed to run effectively on a common computer, validating extremely large data sets (more than 100 million records) becomes impractical. GBIF has, however, been working with its partners to provide pluggable remote validation services on performant data architecture to fill this gap.
  2. The IPT depends on server administrators to back up its data. There are plans to address this problem by adding long-term data storage and redundancy to the IPT this year.
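Because validation is not built in, a publisher may want to run some basic checks on a source file before uploading it. The following is a minimal sketch of such a pre-publication check for a tab-delimited occurrence file. The set of required columns and the coordinate range rules are assumptions chosen for this example; they are not GBIF's validation rules, and the function is not part of the IPT.

```python
import csv
import io

# Columns treated as mandatory in this example (an assumption, not an IPT rule).
REQUIRED = ("occurrenceID", "basisOfRecord", "scientificName")

# Absolute limits for coordinates in decimal degrees.
COORDINATE_LIMITS = (("decimalLatitude", 90.0), ("decimalLongitude", 180.0))


def check_rows(text, delimiter="\t"):
    """Return a list of (line number, problem) tuples for the given text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field in REQUIRED:
            if not (row.get(field) or "").strip():
                problems.append((line_no, f"missing {field}"))
        for field, limit in COORDINATE_LIMITS:
            value = (row.get(field) or "").strip()
            if not value:
                continue  # coordinates are optional in this sketch
            try:
                number = float(value)
            except ValueError:
                problems.append((line_no, f"{field} is not a number"))
                continue
            if abs(number) > limit:
                problems.append((line_no, f"{field} out of range"))
    return problems


# Invented sample data: the second record has two deliberate errors.
SAMPLE = (
    "occurrenceID\tbasisOfRecord\tscientificName\tdecimalLatitude\tdecimalLongitude\n"
    "occ-1\tHumanObservation\tParus major\t52.52\t13.40\n"
    "occ-2\tHumanObservation\t\t95.0\t13.40\n"
)

problems = check_rows(SAMPLE)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

A check of this kind reads the file one row at a time, so it stays practical on an ordinary computer for moderately sized files, although it does not replace the remote validation services mentioned above.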

 

IPT Trainings


The EU BON Training program is dedicated to data and metadata integration strategies (including registration of data with GEOSS), the use of standards, and the use of data tools developed or adopted by EU BON. During the life span of the project, five training events were organized, and the training materials are available online for consultation.

GBIF has extensive training resources on the installation and use of the IPT. Please consult their resource and training page for further information. GBIF operates through a network of national nodes, which coordinate data management within each participant country and can support you with different aspects of data publishing (technical assistance including data hosting, tools, and data standards, as well as personal assistance with the data publishing workflow). The nodes usually offer a range of capacity enhancement activities, including mentoring and training.

GBIF recently announced the BID (Biodiversity Information for Development) program, which aims to increase the amount of biodiversity information available in the ‘ACP’ nations of sub-Saharan Africa, the Caribbean and the Pacific. The program will support capacity enhancement activities and projects to mobilize biodiversity data in these regions.

Additional resources: