Querying Compara

The Ensembl Compara database is represented by ensembldb3.compara.Compara. This object lets you query relationships among genomes and obtain multiple alignments.

Creating a compara instance

Instantiating Compara requires the Ensembl release, the species of interest and, optionally, an account (here we use a local account for speed). For illustration we'll use the human, chimpanzee and macaque genomes. The resulting object has a Genome instance for each species, attached as an attribute named after the capitalised common name of that species.

>>> import os
>>> from ensembldb3 import HostAccount, Compara
>>> account = HostAccount(*os.environ['ENSEMBL_ACCOUNT'].split())
>>> compara = Compara(['human', 'chimp', 'macaque'], release=85, account=account)
>>> compara.Human
Genome(species='Homo sapiens'; release='85')
>>> compara.Chimp
Genome(species='Pan troglodytes'; release='85')
>>> compara.Macaque
Genome(species='Macaca mulatta'; release='85')
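The example above unpacks the ENSEMBL_ACCOUNT environment variable straight into HostAccount. As a minimal sketch of what that unpacking relies on, the helper below splits such a string on whitespace and checks its shape before use. The helper name parse_account, the field order (host, user, password, optional port) and the sample value are assumptions made for illustration; they are not part of ensembldb3.

```python
def parse_account(value):
    """Split a whitespace-separated account string into its fields.

    Assumed layout (hypothetical, for illustration only):
    "host user password" or "host user password port".
    """
    parts = value.split()
    if len(parts) not in (3, 4):
        raise ValueError(
            "expected 'host user password [port]', got %d fields" % len(parts)
        )
    host, user, password = parts[:3]
    # The port is optional; None signals "not specified" here
    port = int(parts[3]) if len(parts) == 4 else None
    return host, user, password, port


# Hypothetical value, standing in for os.environ['ENSEMBL_ACCOUNT']
fields = parse_account("localhost myuser mypassword")
print(fields)
```

Checking the number of fields first gives a clearer error than the TypeError raised when the unpacked arguments do not match the constructor. Note that a password containing whitespace cannot be represented in this layout.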

Note

Use Species.get_compara_name(species_name) to see what the attribute name will be.

Get the species tree

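The species tree is obtained from the Compara instance. With just_members=True, as here, the tree shows only the species in this Compara instance.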

>>> tree = compara.get_species_tree(just_members=True)
>>> print(tree.ascii_art())
                    /-Pan troglodytes
          /Homininae
-root----|          \-Homo sapiens
         |
          \-Macaca mulatta

What alignment types are available

The alignments available for the chosen species can be displayed by printing the method_species_links attribute of Compara.

>>> print(compara.method_species_links)
Align Methods/Clades
===================================================================================================================
method_link_species_set_id  method_link_id  species_set_id      align_method                            align_clade
-------------------------------------------------------------------------------------------------------------------
                       756              13           35886               EPO                         8 primates EPO
                       780              13           36102               EPO               17 eutherian mammals EPO
                       781              14           36103  EPO_LOW_COVERAGE  39 eutherian mammals EPO_LOW_COVERAGE
                       788              10           36176             PECAN           23 amniota vertebrates Pecan
-------------------------------------------------------------------------------------------------------------------

Note

Any query on this instance of Compara will only return results for the indicated species. To query other species, create another instance.

Get a syntenic region

Here we get the genomic alignment for the BRCA2 gene region. We specify the alignment set the data should come from using the align_method and align_clade arguments of get_syntenic_regions(), then call the get_alignment() method on a returned region. We can also name the feature types with which the sequences in the alignment should be annotated.

>>> human_brca2 = compara.Human.get_gene_by_stableid(stableid='ENSG00000139618')
>>> regions = compara.get_syntenic_regions(region=human_brca2, align_method='EPO', align_clade='primate')
>>> for region in regions:
...     print(region)
SyntenicRegions:
  Coordinate(Human,chro...,13,32315473-32400266,1)
  Coordinate(Chimp,chro...,13,31957346-32041418,-1)
  Coordinate(Macaque,chro...,17,11686607-11779396,-1)
>>> aln = region.get_alignment(feature_types=["gene", "repeat"])
>>> aln
3 x 99492 dna alignment: Homo sapiens:chromosome:13:32315473-32400266:1[GGGCTTGTGGC...], Pan troglodytes:chromosome:13:31957346-32041418:1[GGGCTTGTGGC...], Macaca mulatta:chromosome:17:11686607-11779396:1[GGGCTTGTGGC...]

A Cogent3 annotated alignment object is returned. It can be queried for the annotations corresponding to specific features, used to mask those features, etc. See the Cogent3 documentation for more information on using annotations.

>>> print(aln.get_annotations_from_any_seq('CDS'))
[CDS "ENST00000380152" at [1000:1067, 3676:3925, 10390:10499...
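The alignment reported above is 99492 columns wide, which is longer than any of the three syntenic regions it was built from; the difference is made up of gap columns. The sketch below works this out from the printed Coordinate lines alone, using only the standard library. It assumes that the span of each region is simply end minus start (that is, that the printed coordinates are zero-based and half-open), and the names PRINTED, PATTERN and span_lengths are illustrative rather than part of ensembldb3.

```python
import re

# Alignment length and coordinates copied from the output shown above
ALIGNMENT_LENGTH = 99492
PRINTED = """\
Coordinate(Human,chro...,13,32315473-32400266,1)
Coordinate(Chimp,chro...,13,31957346-32041418,-1)
Coordinate(Macaque,chro...,17,11686607-11779396,-1)
"""

# Captures: species name, start, end, strand
PATTERN = re.compile(r"Coordinate\((\w+),.*?,(\d+)-(\d+),(-?1)\)")


def span_lengths(text):
    """Return {species: end - start} for each printed Coordinate line.

    Assumes end - start is the region length (zero-based, half-open).
    """
    lengths = {}
    for species, start, end, _strand in PATTERN.findall(text):
        lengths[species] = int(end) - int(start)
    return lengths


for species, length in span_lengths(PRINTED).items():
    gaps = ALIGNMENT_LENGTH - length
    print(f"{species}: {length} bases, {gaps} gap columns")
```

Under that assumption the human region spans 84793 bases, so its aligned sequence contains 14699 gap columns.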

You can specify an equivalent query by passing the corresponding method_link_species_set_id value as the method_clade_id argument.

>>> regions = compara.get_syntenic_regions(region=human_brca2, method_clade_id=756)
>>> for region in regions:
...     print(region)
...     # tree = region.get_species_tree()
...     # print(tree.ascii_art())
SyntenicRegions:
  Coordinate(Human,chro...,13,32315473-32400266,1)
  Coordinate(Chimp,chro...,13,31957346-32041418,-1)
  Coordinate(Macaque,chro...,17,11686607-11779396,-1)
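Both queries above resolve to the same row of the method_species_links table. The following sketch shows one way such a lookup could work, using a plain list of dictionaries copied from that table. The matching rule (exact, case-insensitive match on align_method; case-insensitive substring match on align_clade) and the function name find_method_clade_ids are assumptions for illustration only; ensembldb3 may resolve these arguments differently.

```python
# Rows copied from the method_species_links table printed earlier
ROWS = [
    {"method_link_species_set_id": 756, "align_method": "EPO",
     "align_clade": "8 primates EPO"},
    {"method_link_species_set_id": 780, "align_method": "EPO",
     "align_clade": "17 eutherian mammals EPO"},
    {"method_link_species_set_id": 781, "align_method": "EPO_LOW_COVERAGE",
     "align_clade": "39 eutherian mammals EPO_LOW_COVERAGE"},
    {"method_link_species_set_id": 788, "align_method": "PECAN",
     "align_clade": "23 amniota vertebrates Pecan"},
]


def find_method_clade_ids(rows, align_method, align_clade):
    """Return the ids of rows matching the method and the clade text.

    Assumed rule (hypothetical): the method must match exactly and the
    clade must occur as a substring, both ignoring case.
    """
    method = align_method.lower()
    clade = align_clade.lower()
    return [
        row["method_link_species_set_id"]
        for row in rows
        if row["align_method"].lower() == method
        and clade in row["align_clade"].lower()
    ]


print(find_method_clade_ids(ROWS, "EPO", "primate"))
```

With these rows, align_method='EPO' and align_clade='primate' select only the row with id 756. Passing the id directly avoids any ambiguity when a clade string could match more than one row.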