The GBIF IPT (Integrated Publishing Toolkit) is open source software widely used to publish and share biodiversity datasets on the GBIF network and related networks. It uses the Darwin Core (DwC) and Ecological Metadata Language (EML) standards. IPT v2.3 is the latest release (10 September 2015) and was developed together with EU BON. It has a multilingual user interface and extensive supporting documentation.
You will find detailed information on what the IPT is, and how and why it was developed, in this PLOS article and on the GBIF website. Here we summarize the most important reasons to use the IPT for publishing your data, give a short introduction to its use, and point to the most important external resources.
The IPT allows you to publish different types of biodiversity data:
- Observation data
- Occurrence data from the literature
- Natural history collections
- Sample-based data (data from monitoring programs, citizen science initiatives, ecological/environmental studies)
Integrated metadata editor: used for publishing dataset-level metadata. Any information you provide will be visible on the resource homepage and bundled together with your data when you publish. Metadata are expressed in the GBIF EML Profile standard and can also be downloaded as a Rich Text Format (RTF) file. The latter can serve as a draft manuscript describing the dataset (a “Data Paper”), which can be submitted for peer review, for example to a Pensoft open-access journal.
Internationalisation: user interface is available in six different languages: English, French, Spanish, Traditional Chinese, Brazilian Portuguese, Japanese; instructions are available for translating the interface.
Data security: controls access to data sets using three levels of dataset visibility: private, public and registered; controls which users can modify data sets, with four types of user roles.
Integration with the GBIF Registry: can automatically register data sets in the GBIF Registry; registration enables global discovery of data sets in both the GBIF Registry and the GBIF Data Portal.
Support for large data sets: can process ~500,000 records/minute during publication; disk space is the only limiting factor; for example, a published dataset with 50 million records in DwC-A format is 3.6 GB.
Standards-compliant publishing: publishes a dataset in Darwin Core Archive (DwC-A) format, a compressed set of files based on the Darwin Core terms, and the GBIF metadata profile based on the Ecological Metadata Language standard.
DOI assignment: the IPT can connect automatically to either DataCite or EZID to assign DOIs to datasets. This makes biodiversity data easier to access on the Web and facilitates tracking its re-use.
The tool is supported by good documentation and a mailing list; the User Manual is available in both English and Spanish (see the Documentation and Media sub-divisions).
To use the IPT you either need an account on an existing IPT instance (the recommended way), such as the EU BON IPT, or you need to install the IPT on your own computer or server. Please consult Documentation and Media for installation support.
The data publishing workflow using the IPT consists of several steps that the data provider has to follow to publish their data to GBIF, as graphically represented by Canadensys:
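As a rough back-of-the-envelope check on the large-dataset figures quoted above, the sketch below derives a processing time and an archive size from the two numbers given (about 500,000 records per minute, and 3.6 GB for 50 million records). It assumes linear scaling and 1 GB = 10^9 bytes; both are assumptions made for illustration, and the function name `estimate_publication` is hypothetical. Real figures will depend on the number of mapped fields, the compressibility of the data and the server hardware.

```python
# Figures quoted in the text above.
RECORDS_PER_MINUTE = 500_000      # approximate publication throughput
REFERENCE_RECORDS = 50_000_000    # example dataset size
REFERENCE_SIZE_GB = 3.6           # size of that dataset as a DwC-A

# Assumption: 1 GB = 10**9 bytes and size grows linearly with record count.
BYTES_PER_RECORD = REFERENCE_SIZE_GB * 1e9 / REFERENCE_RECORDS


def estimate_publication(records, records_per_minute=RECORDS_PER_MINUTE,
                         bytes_per_record=BYTES_PER_RECORD):
    """Return (minutes, size_gb) for a dataset, assuming linear scaling."""
    minutes = records / records_per_minute
    size_gb = records * bytes_per_record / 1e9
    return minutes, size_gb


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 50_000_000):
        minutes, size_gb = estimate_publication(n)
        print(f"{n:>11,} records: ~{minutes:.0f} min, ~{size_gb:.2f} GB")
```

Under these assumptions the 50-million-record example works out to roughly 100 minutes of processing and about 72 bytes per record in the compressed archive.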
Create your resource on the IPT: this step simply means giving a short name to it and choosing a resource type: Event, Occurrence, Checklist or Metadata-only.
Upload source file: data can be in text file format (e.g. .txt, .tab, .csv), Excel, or in a database. You can also upload an existing Darwin Core Archive (DwC-A) or eml.xml file for the metadata coming from other repositories (e.g. LTER, TER-Europe or DataOne).
Add metadata (or “data about data”). Metadata describe the content and context of a resource using predefined attributes, and aim to give a brief account of its characteristics (e.g. ‘who, what, where, when, how and for what purpose’). Well-structured and complete metadata are essential for potential users, who need them to decide whether the dataset fits their needs and purposes, for example for incorporation into statistical and modelling tools. The dataset should therefore be described as clearly and extensively as possible. On the Basic metadata page all fields are required. Metadata can be extended by adding information on taxonomy, location, sampling methods etc.
Map data to Darwin Core terms. The Data Mapping page allows a user to specify exactly how the data accessible through this IPT resource are to be configured, based on the selected core/extension. The mapping process allows a data provider to specify the meaning of each data field in DwC terms. The Darwin Core standard includes a list of defined terms and allows your data to be understood and used by anyone. It also allows an aggregator like GBIF to combine your data with other data.
Publish: publishing means that the IPT will generate your data as Darwin Core, combine it with the metadata, validate the data and package it as a standardized zip file called a “Darwin Core Archive”. The DwC-A file can be shared (e.g. with colleagues, by email) or uploaded to another IPT instance. The dataset is published privately, which means only managers can view and edit it. If you change the visibility to Public, it will be available to everyone: it is then listed on the repository homepage and accessible via the internet.
Register with GBIF: registration makes your data available to an international audience via the GBIF data portal and ensures that full attribution is given to your institution. Note that your institution has to be registered with GBIF, otherwise you will get a warning message.
For detailed instructions on every step of the publishing process, please follow the IPT manual and consult the additional documentation and media listed below.
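To make the mapping and publishing steps above more concrete, here is a minimal Python sketch of what the resulting Darwin Core Archive looks like: a zip file holding a tab-delimited data file, a `meta.xml` descriptor that ties each column to a Darwin Core term, and an `eml.xml` metadata file. This is not the IPT's own code; the IPT does all of this through its web interface. The source column names (`record_id`, `species`, ...), the function names and the placeholder EML string are hypothetical, and the placeholder is not a valid GBIF EML Profile document.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical source columns mapped to Darwin Core terms (the "mapping" step).
COLUMN_TO_TERM = {
    "record_id": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def build_meta_xml(terms, data_file="occurrence.txt"):
    """Build a simplified meta.xml descriptor for an Occurrence core."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" '
        'linesTerminatedBy="\\n" ignoreHeaderLines="1" '
        f'rowType="{DWC}Occurrence">',
        f"    <files><location>{data_file}</location></files>",
        '    <id index="0"/>',
    ]
    for index, term in enumerate(terms):
        lines.append(f'    <field index="{index}" term="{DWC}{term}"/>')
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines)


def build_archive(rows, eml_xml):
    """Package rows (dicts keyed by source column) as DwC-A zip bytes."""
    terms = list(COLUMN_TO_TERM.values())
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(terms)  # header row, skipped via ignoreHeaderLines="1"
    for row in rows:
        writer.writerow([row.get(column, "") for column in COLUMN_TO_TERM])

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", data.getvalue())
        zf.writestr("meta.xml", build_meta_xml(terms))
        zf.writestr("eml.xml", eml_xml)
    return archive.getvalue()


# Placeholder metadata only; a real archive needs a GBIF EML Profile document.
PLACEHOLDER_EML = "<eml><dataset><title>Example dataset</title></dataset></eml>"

if __name__ == "__main__":
    example_rows = [
        {"record_id": "obs-1", "species": "Parus major", "date": "2015-05-01",
         "lat": "52.1", "lon": "4.3"},
    ]
    payload = build_archive(example_rows, PLACEHOLDER_EML)
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        print(zf.namelist())
```

The sketch skips the validation the IPT performs during publication, so it is useful for understanding the archive layout rather than as a replacement for the tool.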
- GBIF user manual (available in English and Spanish)
- IPT Wiki: provides the IPT User Manual and a variety of other valuable resources, including an FAQ.
- Canadensys user guide: a simplified version of the GBIF user guide
- A primer on publishing sample data using the GBIF IPT
- Introduction and context to the tool (PLOS One article)
- Data papers: what a data paper is, why to publish one, and how.
- Other resources on GBIF site.
- Online video to learn how to install the IPT.
- Installation webinar.
- Online video tutorial on how to publish occurrence data using the IPT.
- Training material from EU BON trainings.
- Training material from GBIF training event in Madagascar (2015) (in English and French).
- Document map to publishing occurrence data: iconographic guide illustrating the publishing process for occurrence data.
- Document map to publishing checklists: iconographic guide illustrating the publishing process for checklists.
- Document map to publishing metadata: iconographic guide illustrating the publishing process for metadata.
The IPT is a community-driven tool, and users are welcome to improve it and to share their feedback and experience!
- To report bugs or technical issues, use the bug/feature request tracker.
- To follow IPT developments, discussions and user feedback, subscribe to the IPT mailing list.
The core development of the IPT happens at the GBIF Secretariat, but the coding, documentation, and internationalization are a community effort.
For technical issues relating to publishing or use of data, email the GBIF Helpdesk.