Reference oceanic genes, genomes & transcriptomes - Task 4.2
The genetic information of most planktonic organisms maintained in culture remains unknown, and the vast majority of marine planktonic organisms cannot be maintained in culture at all. The greatest limitation of the above approaches is therefore the lack of available reference sequences.
The OCEANOMICS team aims to generate new reference sequences that will facilitate the analysis of information generated by metagenetic approaches.
The first metabarcoding analyses showed that a significant number of markers could not be assigned to any organism recorded in the databases. Surprisingly, the proportion of markers from unknown species is sometimes considerable. To reveal the nature of this substantial unknown biodiversity, OCEANOMICS identifies the most abundant sequences that are associated with identified taxa and searches for copies of these sequences in the sequenced genomes and transcriptomes, in order to improve the taxonomic characterization of poorly described biodiversity. In some cases, this information will be related to morphological knowledge generated by working group 3.
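As a rough illustration of the search step, the sketch below ranks barcodes by abundance, keeps those without a database assignment, and looks for exact copies of them on either strand of a set of reference sequences. It is a minimal sketch, not the OCEANOMICS pipeline: the function names, the toy sequences and the use of exact matching are all assumptions made for the example, and a real analysis would rely on an alignment tool that tolerates mismatches.

```python
from collections import Counter

# Complement table for unambiguous DNA bases (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def top_unassigned(counts, assignments, n=10):
    """Return the n most abundant barcodes that have no taxonomic assignment.

    counts: barcode -> read count; assignments: barcode -> taxon or None.
    A barcode missing from assignments is treated as unassigned.
    """
    unassigned = Counter(
        {bc: c for bc, c in counts.items() if assignments.get(bc) is None}
    )
    return [bc for bc, _ in unassigned.most_common(n)]


def find_in_references(barcode, references):
    """Names of reference sequences containing the barcode on either strand."""
    rc = reverse_complement(barcode)
    return [name for name, seq in references.items() if barcode in seq or rc in seq]


# Hypothetical toy data, for illustration only.
counts = {"ACGTTGCA": 120, "GGGTTTAA": 300, "TTTTCCCC": 50}
assignments = {"ACGTTGCA": "Bacillariophyta", "GGGTTTAA": None}
references = {"strain_A": "AAGGGTTTAACC", "strain_B": "CCGGGGAAAATT"}

for barcode in top_unassigned(counts, assignments, n=2):
    print(barcode, find_in_references(barcode, references))
```

Checking both strands matters because a marker may have been sequenced in the opposite orientation to the reference assembly.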
Protists - Reference Transcriptomes
Protist genomes can be significantly larger than the human genome, which could be a major obstacle to the metagenetic study of ocean communities. A transcriptomic approach should help overcome this obstacle, since the number of genes is rather stable from one eukaryotic species to another (about 10,000 genes per species). Furthermore, mRNA sequencing avoids contamination from prokaryotic nucleic acids, and the translated sequences can be used directly for similarity searches, which are more sensitive at the protein level.
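The translation step mentioned above can be sketched as a six-frame translation of a transcript, whose protein products would then be passed to a protein-level similarity search. This is a minimal sketch under stated assumptions: it uses the standard genetic code, whereas some protist lineages use variant codes, and the function names are hypothetical rather than part of any OCEANOMICS tool.

```python
# Standard genetic code (NCBI translation table 1), built in TCAG order.
# Assumption: the organism uses the standard code; some protists do not.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as cDNA
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    return [translate(s, f) for s in (seq, rc) for f in range(3)]


# Hypothetical toy transcript, for illustration only.
for protein in six_frame_translations("ATGGCCTAA"):
    print(protein)
```

Translating all six frames avoids having to know the strand or reading frame of an assembled transcript in advance, at the cost of producing candidate proteins that must be filtered afterwards.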
To improve the interpretation of metatranscriptomic datasets, OCEANOMICS aims to increase the number of reference transcriptomes for protists. These are obtained from cultured strains held in different collections (including the Roscoff Culture Collection) or from isolated cells identified in freshly collected plankton samples. In this context, OCEANOMICS expects to generate approximately 250 new reference transcriptomes of phylogenetic and/or environmental interest. The selection of organisms is guided by data obtained via metabarcoding and metagenomics.
For eukaryotes smaller than 20 µm, sequencing of single amplified genomes (SAGs) is used.
During the Tara Oceans expedition, organisms from the smaller size fractions were cryopreserved for this purpose. Cells are isolated by flow cytometry. The genome of each cell is then amplified, barcoded for phylogenetic assignment, and fully sequenced if necessary. A pipeline of bioinformatics tools has been developed for the annotation and analysis of these SAGs.