Preamble

EMPHASIS and its members are dedicated to data sharing, data persistence and data standards. Many of the EMPHASIS partners are involved in data standardization, plant ontology development and interoperability projects. For example, some participants are members of the ERA-CAPS working group on data standards, some are DataCite members able to issue digital object identifiers (DOIs), and partners have recently published jointly on metadata and data handling in plant phenotyping. All partners are thus committed to making the best use of phenotyping data by sharing and disseminating the underlying data.

Data Standards

Drawing on its experience, and in its own interest of maximizing collaboration across Europe and beyond, EMPHASIS will make sure that data are i) discoverable and available in a machine-readable format and ii) tagged with metadata using standard terms and ontologies, in enough detail that the experiment could be repeated. Because EMPHASIS deals with plants and their reaction to the environment, this includes a description of the environment and is not limited to the plant (organ) sampling strategy, sensors and evaluation. As laid down in our recommendation, EMPHASIS considers the established minimal information standards (MIAME, MIAPE, MSI, MIxS) relevant and necessary for describing phenotyping, even though they are mostly geared towards destructive phenotyping. We regard the plant-specific extensions, such as MIAME-Plant, CIMR and the MIxS plant environment package, as the most relevant for describing any kind of phenotyping experiment. To formalize this, we have begun to work on a minimal standard for plant phenotyping, MIAPPE, and to link in the ontologies that are required or, depending on the experiment, optional for a formalized and machine-readable description of phenotyping experiments, such as the Plant Ontology and the Crop Ontology. EMPHASIS will work together with these ontology projects to shape them for further needs in phenotyping. Through the EMPHASIS partners' links to the International Plant Phenotyping Network (IPPN), EMPHASIS will also see to it that standards development keeps pace with the emerging needs of the phenotyping community without jumping on every bandwagon. This preserves continuity and safeguards the value of legacy data, since fluctuating standards cannot provide the stability the community needs to profit from data sharing and reuse.
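
To illustrate what "tagged with metadata using standard terms and ontologies" can mean in practice, the following is a minimal sketch of a completeness check on an experiment description. The field names, the record layout and the placeholder term identifiers are assumptions made for this example; they are not the actual MIAPPE checklist or real ontology terms.

```python
# Hypothetical list of required metadata fields (NOT the MIAPPE checklist).
REQUIRED_FIELDS = (
    "investigation_title",
    "species",
    "growth_facility",
    "environment",
    "observed_variables",
)


def missing_metadata(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def untagged_variables(record):
    """Return the names of observed variables that carry no ontology term."""
    return [
        variable["name"]
        for variable in record.get("observed_variables", [])
        if not variable.get("ontology_term")
    ]


# Illustrative record; all values and term IDs are placeholders.
experiment = {
    "investigation_title": "Drought response trial",
    "species": "Zea mays",
    "growth_facility": "greenhouse",
    "environment": {},  # empty: the environment was not described
    "observed_variables": [
        {"name": "biomass", "ontology_term": "EXAMPLE:0001"},
        {"name": "leaf area", "ontology_term": None},
    ],
}

print(missing_metadata(experiment))
print(untagged_variables(experiment))
```

A check of this kind could run before data are published, so that incomplete descriptions are caught while the experimenter can still supply the missing information.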

In addition to the necessary metadata and ontologies, EMPHASIS will continue its work on suitable data formats for data sharing and will provide extensions to download data in formats such as ISA-Tab. Here too, EMPHASIS will keep abreast of developments and provide additional export and import facilities as needed.

EMPHASIS will also refine APIs (e.g. RESTful services) that allow data to be retrieved and collected where the data are stored at the partners.
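
As a rough sketch of how a client might collect data from such a RESTful service, the example below pages through a list endpoint using only the Python standard library. The host name, the resource name, the query parameters (page, pageSize) and the response layout ({"data": [...], "total_pages": n}) are all assumptions for illustration and do not describe an existing EMPHASIS interface.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def build_url(base_url, resource, **params):
    """Build the request URL for a resource with optional query parameters."""
    url = urljoin(base_url.rstrip("/") + "/", resource)
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


def fetch_json(url):
    """Perform a GET request and decode the JSON response body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_all(base_url, resource, fetch=fetch_json, page_size=100):
    """Collect all records from a paginated endpoint.

    Assumes (hypothetically) that every response has the form
    {"data": [...], "total_pages": n} and that pages are numbered from 0.
    """
    records = []
    page = 0
    while True:
        payload = fetch(build_url(base_url, resource, page=page, pageSize=page_size))
        records.extend(payload["data"])
        page += 1
        if page >= payload["total_pages"]:
            return records


# Usage against a hypothetical partner node (not a real endpoint):
# studies = collect_all("https://node.example.org/api", "studies")
```

Passing the fetch function as a parameter keeps the paging logic independent of the transport, so the same loop can be exercised offline or reused with an authenticated client.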

EMPHASIS emphasizes the need for data stewardship, i.e. long-term availability and the possibility of adding further metadata and observations even after experiments have been conducted.

Data and Data storage

In many phenotyping experiments one has to distinguish between the raw and the extracted data. In the simplest case the raw data are images, and the extracted data could be, e.g., biomass. In most cases only the extracted data are used for practical purposes in downstream applications. Nevertheless, the raw data need to be kept as well, to allow reanalysis with better algorithms. The algorithm and software used to extract the data will also be archived for reasons of reproducibility. Within EMPHASIS, forward-looking strategies will be developed to decide whether the original raw data are to be archived or whether, as in the case of e.g. sequencing experiments, only evaluated raw data should be kept for long-term maintenance.
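
One way to keep extracted values traceable to the raw data and to the software that produced them is to store a small provenance record next to each result. The sketch below is an illustration under assumed names: the record fields, the file name and the software name are hypothetical and do not describe an EMPHASIS format.

```python
import hashlib
import json


def sha256_of(data):
    """Return the SHA-256 checksum of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(raw_bytes, raw_name, software, version, parameters, extracted):
    """Link extracted values to the raw file and the extraction software."""
    return {
        "raw_file": raw_name,
        "raw_sha256": sha256_of(raw_bytes),
        "software": software,
        "software_version": version,
        "parameters": parameters,
        "extracted": extracted,
    }


def raw_unchanged(record, raw_bytes):
    """Check that archived raw data still match the recorded checksum."""
    return record["raw_sha256"] == sha256_of(raw_bytes)


# Placeholder bytes stand in for an image file; all names are hypothetical.
image = b"placeholder image bytes"
record = provenance_record(
    raw_bytes=image,
    raw_name="plant_0001.png",
    software="example-segmenter",
    version="1.0.0",
    parameters={"threshold": 0.5},
    extracted={"projected_leaf_area_px": 1234},
)
print(json.dumps(record, indent=2))
```

With such a record, a later reanalysis with a better algorithm can verify that it starts from the same raw file and can be compared against the earlier result and its parameters.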

Where available, EMPHASIS will also upload data to centralized and established data storage platforms to benefit the community as a whole. This is particularly the case for genomic and transcriptomic experiments, which are to be distributed via the European Bioinformatics Institute (EBI), where clear standards and requirements exist. As no centralized phenotyping repository exists yet, the EMPHASIS partners will build on the data storage systems they have already established from their own resources. These allow the storage of several petabytes of data across the partners and are backed by professional data management and backup strategies. In addition, the European Grid architecture will be used. Through the common development of APIs and the commitment to make data traceable, sustainable and discoverable, EMPHASIS will thus safeguard data availability.

Collaborations with other relevant European infrastructure

The German nodes are partners within the German Network for Bioinformatics, which is seeking collaboration with the ELIXIR network; this will help ensure that data standards are adhered to.

In addition, France Grilles (http://www.france-grilles.fr/) will be used by the French node, thus tying into the EGI (European Grid Infrastructure).