Sample and data archiving

Work-Package 1 description


More than 33,000 samples were collected. While some analyses were initiated aboard Tara, the vast majority of samples were sent directly to the collaborating laboratories for further study. The high-throughput imaging and sequencing set up in working groups nº3 and 4 generate enormous amounts of results, estimated to date at about 500 TB of raw data. These data need to be carefully archived, linked to the related physico-chemical data and associated with the corresponding sampling station.



The objectives of this working group are to:

  • Develop and implement a system for archiving biological samples in order to secure their storage and use in the long term.
  • Develop and implement a comprehensive data warehouse to host and link all types of data, provide access to them and monitor their use.




Ian Probert coordinates the working group Sample and data archiving

This working group is coordinated by Ian Probert at the Roscoff Marine Station. It involves the UMR7144 and FR2424 teams of the Roscoff Marine Station, the Oceanographic Laboratory of Villefranche-sur-Mer, the Genomic & Structural Information laboratory (GSI) of Marseille, the European Molecular Biology Laboratory (EMBL) and Altran. As collaborative partners, the VIB Structural Biology Research Center and Pangaea also contribute to this project.


Sample archiving - Task 1.1

Samples collected during the Tara Oceans expedition were labeled with a bar code system and distributed among our French, European and American partner laboratories (Roscoff Marine Station, Genoscope and Institut de Biologie de l’Ecole Normale Superieure - IBENS). To remedy the acute lack of sample tracking (location and use), OCEANOMICS aims to create a system for managing these samples.

The task is to define precisely:

  • the different types of data associated with the samples
  • the procedures and activities that use these data
  • the functions such a management system should provide
  • the standards used to assess the performance of this system

Based on the outcome of this analysis, the management system was developed and set up at the Roscoff Marine Station. After an internal test phase, the resulting interface will be made available to all partners, and an evaluation process will be put in place to guide the further development of the tool.
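As an illustration only, the kind of record such a sample management system keeps (bar code, sampling station, current location and use) could be sketched as follows. The class names, field names and methods below are hypothetical and are not taken from the actual OCEANOMICS system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """One bar-coded sample (all field names are hypothetical)."""
    barcode: str
    station: str    # sampling station identifier
    location: str   # laboratory currently holding the sample
    uses: List[str] = field(default_factory=list)  # log of analyses performed


class SampleRegistry:
    """Minimal in-memory registry tracking where samples are and how they are used."""

    def __init__(self) -> None:
        self._samples: Dict[str, Sample] = {}

    def register(self, sample: Sample) -> None:
        # Bar codes must be unique, otherwise tracking becomes ambiguous.
        if sample.barcode in self._samples:
            raise ValueError(f"duplicate barcode: {sample.barcode}")
        self._samples[sample.barcode] = sample

    def move(self, barcode: str, new_location: str) -> None:
        # Record a transfer of the sample to another laboratory.
        self._samples[barcode].location = new_location

    def record_use(self, barcode: str, description: str) -> None:
        # Append an entry to the sample's usage log.
        self._samples[barcode].uses.append(description)

    def held_at(self, location: str) -> List[str]:
        # Return the sorted bar codes of all samples held at a given laboratory.
        return sorted(
            s.barcode for s in self._samples.values() if s.location == location
        )
```

A real system would store these records in a shared database rather than in memory; the sketch only shows the two questions the text singles out, namely where a sample is and how it has been used.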


Core databasing: building an integrated eco-morpho-genetic data repository - Task 1.2

The objective of this task is to produce an integrated data warehouse with a single entry point giving access to all the primary data resulting from the Tara Oceans expedition, the output of working groups nº2, 3 and 4, and the modeling results produced by working group nº5. In most cases, the data will remain archived where they were produced, and the tool developed in this task will make it possible to access them through sophisticated queries. A web interface will allow the user to navigate through the data warehouse and search its content.
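The idea of a single entry point over data that stay where they were produced can be illustrated with a minimal sketch. Everything in it is an assumption made for the example: the `Warehouse` class, the `in_memory_source` helper (a stand-in for a remote partner archive) and the record fields are hypothetical and do not describe the actual warehouse.

```python
from typing import Callable, Dict, List

# A "source" is any callable that answers a station query for one partner site.
Source = Callable[[str], List[dict]]


def in_memory_source(records: List[dict]) -> Source:
    """Stand-in for a remote partner archive: filters local records by station."""
    def lookup(station: str) -> List[dict]:
        return [r for r in records if r["station"] == station]
    return lookup


class Warehouse:
    """Single entry point that forwards a query to every registered source."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def add_source(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def query(self, station: str) -> List[dict]:
        # Ask each source in name order and tag every record with its origin.
        results: List[dict] = []
        for name in sorted(self._sources):
            for record in self._sources[name](station):
                results.append({"source": name, **record})
        return results
```

The data themselves are never copied into the warehouse object; it only holds references to the sources and assembles the answers at query time.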


Cross-disciplinary data mining and visualization for core databasing - Task 1.3

Closely linked to the previous task, the work performed here will provide the user with visualization and data analysis tools, grouped within the warehouse, that give easy access to primary and secondary data as well as to the resulting ecosystem analyses. The strength of the Tara Oceans / OCEANOMICS dataset lies in its homogeneity, its interdisciplinarity and the range of systemic levels it covers: for a single water column, data can be cross-referenced on parameters such as satellite data, gene sequences, imaging data (from viruses to zooplankton) and several types of biophysical, physical and chemical measurements.
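Cross-referencing of this kind amounts to joining heterogeneous records on a shared sampling key. The sketch below assumes, purely for illustration, that every record carries a `station` and a `depth_m` field; these names and the `cross_reference` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, float]  # (station identifier, depth in metres)


def cross_reference(*tables: Iterable[dict]) -> Dict[Key, dict]:
    """Merge rows from several tables that share the same (station, depth) key."""
    merged: Dict[Key, dict] = defaultdict(dict)
    for table in tables:
        for row in table:
            key = (row["station"], row["depth_m"])
            # Copy every measurement except the key fields themselves.
            merged[key].update(
                {k: v for k, v in row.items() if k not in ("station", "depth_m")}
            )
    return dict(merged)
```

Called with, for example, one table of physical measurements and one of sequence counts, the function returns a single dictionary per station and depth holding all parameters available for that point of the water column.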

Comparison and analysis modules will be developed at several systemic levels:


1. Organisms:

  • Tools for analyzing meta-barcoding data
  • Tools for correlating images / genetic sequences, and for manual annotation


2. Genes:

  • Tools for analyzing metagenomic and metatranscriptomic data
  • Tools for projecting metabolic pathways and defining relationships between gene expression and environmental parameters
  • Similarity searches for exploring sequence data


3. Ecosystems:

  • Tools for comparing and visualizing ecological parameters
  • Tools for visualizing species / functions relationships
  • Tools for visualizing and exploiting inter-species interactions