Welcome to the Euglena genome project.

Euglena is a genus of protists capable of both heterotrophy and photosynthesis (autotrophy). Its members are also capable of phagocytosis, possess two flagella (only one of which is involved in locomotion), have a flagellar pocket-like organelle (the reservoir), are phototropic and exhibit a distinctive ‘euglenoid’ movement when encountering a solid substrate. They are distantly related to the trypanosomatids within the Excavata supergroup.

The plastid has been sequenced, and there are ~20k EST sequences in the database, but there has been no genome sequencing effort. Given the potential importance of euglenids, in terms of both taxonomic position and unique biology, for understanding many aspects of protist and evolutionary cell biology, we initiated a sequencing project, primarily for gene discovery and comparative genomics. We are using a combination of Illumina and 454 sequencing, together with mapping of multiple transcriptome datasets to the assembly to train gene prediction. We anticipate a limited release of data for annotation purposes in spring 2015.

Who’s involved: Mark C. Field, Steve Kelly, ThankGod Ebenezer, Mark Carrington, Michael Lebert, Michael Ginger, Julius Lukes, Andrew Jackson, Joel Dacks, Bill Wickstead, and Harry De-Koning.

Strain being sequenced: Euglena gracilis Z, kindly given by William Martin (Düsseldorf). DNA was isolated using the method of Medina-Acosta and Cross (1993). Access to the data is restricted; they are available only by invitation or on specific request. If you use the data, we ask that you please acknowledge the source as follows: “E. gracilis genome data obtained from the sequence project at”.

Draft genome assembly statistics

Parameter*                   Euglena gracilis genome (draft)
# contigs                    257242
# contigs (>= 1000 bp)       97509
Total length                 639399673
Total length (>= 1000 bp)    564374092
Largest contig               246170
GC (%)                       50.15
N50                          11144

*All statistics are based on contigs of size >= 1 bp, unless otherwise noted (e.g., “# contigs (>= 0 bp)” and “Total length (>= 0 bp)” include all contigs).
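
For readers unfamiliar with the N50 value reported above: it is the length of the contig at which, counting from the longest contig downwards, the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the tool used to produce the table; the function name `n50` and the toy contig lengths are illustrative assumptions and are not project data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (hypothetical name), not part of the project pipeline.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk through the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that takes the running sum to at least half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Toy example: total length is 1500, so half is 750.
# 500 + 400 = 900 >= 750, hence the N50 is 400.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

Applied to the full list of contig lengths from an assembly, the same procedure should give the kind of N50 figure shown in the table, provided the same minimum contig length cut-off is used.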

Draft transcriptome assembly statistics

Parameter                    Euglena gracilis transcriptome (draft)
n seqs                       176638
smallest                     98
largest                      11395
n bases                      84279051
mean len                     441.48
n over 1k                    17780
mean orf percent             75.76
N50                          796
gc                           59.12

