Querying Compara
The Ensembl compara database is represented by ensembldb3.compara.Compara. This object provides a means of querying for relationships among genomes and obtaining multiple alignments.
Creating a compara instance
Instantiating Compara requires the Ensembl release, the species of interest and, optionally, an account (here we use our local account for speed). For the purpose of illustration we'll use the human, chimpanzee and macaque genomes. The resulting object has a Genome instance added as an attribute for each species, named after the capitalised common name of that species.
>>> import os
>>> from ensembldb3 import HostAccount, Compara
>>> account = HostAccount(*os.environ['ENSEMBL_ACCOUNT'].split())
>>> compara = Compara(['human', 'chimp', 'macaque'], release=85, account=account)
>>> compara.Human
Genome(species='Homo sapiens'; release='85')
>>> compara.Chimp
Genome(species='Pan troglodytes'; release='85')
>>> compara.Macaque
Genome(species='Macaca mulatta'; release='85')
Note
Use Species.get_compara_name(species_name) to see what the attribute name will be.
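The account line above splits the ENSEMBL_ACCOUNT environment variable on whitespace and passes the pieces to HostAccount. The following is a minimal sketch of that parsing step on its own. The helper name parse_account and the field order (host, user, password, optional port) are assumptions made for illustration and are not part of ensembldb3.

```python
def parse_account(value):
    """Split a whitespace-separated account string into named fields.

    Assumed field order (hypothetical): host user password [port].
    """
    fields = value.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    host, user, passwd = fields[:3]
    # The port is optional; keep None when it is not supplied.
    port = int(fields[3]) if len(fields) == 4 else None
    return {"host": host, "user": user, "passwd": passwd, "port": port}


# Hypothetical value; substitute the details of your own server.
settings = parse_account("localhost someuser somepassword")
print(settings["host"], settings["port"])
```

Because the split is on whitespace, a password containing spaces could not be expressed this way.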
Get the species tree
This is accessed from the Compara instance.
>>> tree = compara.get_species_tree(just_members=True)
>>> print(tree.ascii_art())
                    /-Pan troglodytes
          /Homininae
-root----|          \-Homo sapiens
         |
          \-Macaca mulatta
What alignment types are available
The alignments available for the chosen species can be displayed by printing the method_species_links attribute of Compara.
>>> print(compara.method_species_links)
Align Methods/Clades
===================================================================================================================
method_link_species_set_id  method_link_id  species_set_id      align_method                            align_clade
-------------------------------------------------------------------------------------------------------------------
                       756              13           35886               EPO                         8 primates EPO
                       780              13           36102               EPO               17 eutherian mammals EPO
                       781              14           36103  EPO_LOW_COVERAGE  39 eutherian mammals EPO_LOW_COVERAGE
                       788              10           36176             PECAN           23 amniota vertebrates Pecan
-------------------------------------------------------------------------------------------------------------------
Note
Any queries on this instance of compara will only return results for the indicated species. If you want to query about other species, create another instance.
Get a syntenic region
Get the genomic alignment for the BRCA2 gene region. We specify the alignment set we want the data from using the align_method and align_clade arguments of get_syntenic_regions(). We then call the get_alignment() method on a returned region. We can also specify the feature types we want the sequences in the alignment to be annotated with.
>>> human_brca2 = compara.Human.get_gene_by_stableid(stableid='ENSG00000139618')
>>> regions = compara.get_syntenic_regions(region=human_brca2, align_method='EPO', align_clade='primate')
>>> for region in regions:
...     print(region)
SyntenicRegions:
Coordinate(Human,chro...,13,32315473-32400266,1)
Coordinate(Chimp,chro...,13,31957346-32041418,-1)
Coordinate(Macaque,chro...,17,11686607-11779396,-1)
>>> aln = region.get_alignment(feature_types=["gene", "repeat"])
>>> aln
3 x 99492 dna alignment: Homo sapiens:chromosome:13:32315473-32400266:1[GGGCTTGTGGC...], Pan troglodytes:chromosome:13:31957346-32041418:1[GGGCTTGTGGC...], Macaca mulatta:chromosome:17:11686607-11779396:1[GGGCTTGTGGC...]
A Cogent3 annotated alignment object is returned. This can be queried to get annotations corresponding to specific features, to mask those features, and so on. See the Cogent3 documentation for more information on using annotations.
>>> print(aln.get_annotations_from_any_seq('CDS'))
[CDS "ENST00000380152" at [1000:1067, 3676:3925, 10390:10499...
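The alignment reported above has 99492 columns, which is more than any of the three regions spans. The sketch below works through that arithmetic using the coordinates printed for the syntenic region. It assumes the displayed coordinates are zero-based and half-open, so that a span is simply end minus start, and that each sequence covers its region contiguously, so the remaining columns are gaps. Both points are assumptions of this illustration rather than statements from the ensembldb3 documentation.

```python
# Number of columns reported for the alignment in the example above.
ALIGNMENT_COLUMNS = 99492

# (start, end) pairs copied from the printed syntenic region.
regions = {
    "Human": (32315473, 32400266),
    "Chimp": (31957346, 32041418),
    "Macaque": (11686607, 11779396),
}


def span(start, end):
    """Length of a region, assuming zero-based half-open coordinates."""
    return end - start


for name, (start, end) in regions.items():
    length = span(start, end)
    # Columns not occupied by sequence are taken to be gap positions.
    gaps = ALIGNMENT_COLUMNS - length
    print(f"{name}: {length} bases, {gaps} gap columns")
```

Under these assumptions the macaque sequence is the longest of the three and so carries the fewest gap columns.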
You can specify an equivalent query using the corresponding method_link_species_set_id value.
>>> regions = compara.get_syntenic_regions(region=human_brca2, method_clade_id=756)
>>> for region in regions:
...     print(region)
...     # tree = region.get_species_tree()
...     # print(tree.ascii_art())
SyntenicRegions:
Coordinate(Human,chro...,13,32315473-32400266,1)
Coordinate(Chimp,chro...,13,31957346-32041418,-1)
Coordinate(Macaque,chro...,17,11686607-11779396,-1)
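The two queries above return the same region because align_method='EPO' with align_clade='primate' and method_clade_id=756 identify the same row of the method_species_links table. The sketch below illustrates one way such a lookup could work, using rows copied from that table. The function matching_ids is hypothetical, and the case-insensitive substring matching is an assumption. It is consistent with 'primate' selecting the '8 primates EPO' row, but it is not taken from the ensembldb3 source.

```python
# (method_link_species_set_id, align_method, align_clade) rows copied
# from the method_species_links table printed earlier.
ROWS = [
    (756, "EPO", "8 primates EPO"),
    (780, "EPO", "17 eutherian mammals EPO"),
    (781, "EPO_LOW_COVERAGE", "39 eutherian mammals EPO_LOW_COVERAGE"),
    (788, "PECAN", "23 amniota vertebrates Pecan"),
]


def matching_ids(align_method, align_clade, rows=ROWS):
    """Return ids whose method and clade contain the given strings.

    Matching is a case-insensitive substring test (an assumption).
    """
    method = align_method.lower()
    clade = align_clade.lower()
    return [
        mlss_id
        for mlss_id, row_method, row_clade in rows
        if method in row_method.lower() and clade in row_clade.lower()
    ]


print(matching_ids("EPO", "primate"))
print(matching_ids("EPO", "eutherian"))
```

Under this matching rule, 'EPO' with 'eutherian' selects more than one row, which is one reason passing the numeric id directly can be the less ambiguous choice.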