Querying Variation¶
Variation databases are attributes of a species Genome
, so we create an instance of the human genome for our examples.
>>> import os
>>> from ensembldb3 import HostAccount, Genome
>>> account = HostAccount(*os.environ['ENSEMBL_ACCOUNT'].split())
>>> human = Genome(species='human', release=85, account=account)
What variation effects are there?¶
There are distinct effect types stored in Ensembl which can be discovered from the genome,
>>> print(human.get_distinct('effect'))
['3_prime_UTR_variant', 'missense_variant', 'non_coding_transcript_exon_variant',...
Note
What we term effect
, Ensembl terms consequence. We use effect
because it’s shorter.
Get variants by effect¶
Genetic variants are represented by the region type class Variation
. We query the genome for a specific effect type, using tyhe limit argument in the interests of speed.
>>> snps = human.get_variation(effect='missense_variant', limit=3)
>>> for snp in snps:
... print(snp)
...
Variation(symbol='rs2853516'; effect=['downstream_gene_variant', 'missense_variant', 'upstream_gene_variant']; alleles='G/A')
Variation(symbol='rs28416101'; effect=['downstream_gene_variant', 'missense_variant', 'upstream_gene_variant']; alleles='T/G')
Variation(symbol='rs201969351'; effect=['downstream_gene_variant', 'missense_variant', 'upstream_gene_variant']; alleles='T/C')
Get variantion from a genomic region¶
We can also use a slightly more involved query to find all variants within the gene of a specific type. (Of course, you could also simply iterate over the variants
attribute to grab these out too.)
>>> region = human.get_region(coord_name='16', start=1000000, end=2000000)
>>> snps = human.get_features(feature_types='variation',
... region=region, limit=2)
>>> for snp in snps:
... print(snp)
...
Variation(symbol='rs560025441'; effect=['upstream_gene_variant', 'non_coding_transcript_variant', 'intron_variant']; alleles='T/C')
Variation(symbol='rs773514182'; effect=['upstream_gene_variant', 'non_coding_transcript_variant', 'intron_variant']; alleles='A/T')
Get variants from a gene¶
Gene’s have a variation
attribute.
>>> brca2 = human.get_gene_by_stableid(stableid='ENSG00000139618')
>>> for snp in brca2.variants:
... if snp.peptide_alleles is not None:
... break # so we can demonstrate stuff below
Attributes of Variation
¶
>>> snp.location, snp.num_alleles, snp.alleles
(Coordinate(Human,chro...,13,32316464-32316465,1), 2, 'C/T')
They can also have peptide alleles and position of the polymorphism in the translated peptide.
>>> print(snp.peptide_alleles, snp.translation_location)
P/L 1...
Note
If a variant does not affect protein coding sequence (either it’s not exonic or it’s a synonymous variant) then these properties have the value None
.
They have length, so single nucleotide polymorphisms have alength of 1. (The length is the length of it’s longest allele.)
>>> len(snp)
1
Flanking sequence of a variant¶
Variation
objects have a seq
attribute which in the case of a SNP is a single nucleotide long and should correspond to one of the alleles. They also have a flanking_seq
attribute. This property is a tuple with the 0th entry being the 5’- 300 nucleotides and the 1st entry being the 3’ nucleotides.
>>> print(snp.flanking_seq[0]) # 5' flank
TGATTGAAACT...
>>> print(snp.flanking_seq[1]) # 3' flank
TATTGGATCCA...
Note
The flanking sequence is only returned when the SNPs flank matches reference.
Getting allele frequencies¶
Variation
objects have an allele frequency table.
>>> snp = list(human.get_variation(symbol="rs55880202"))[0]
>>> print(snp.allele_freqs)
=================================
allele freq population_id
---------------------------------
C 5474
T 5474
C 0.9661 11888
T 0.0339 11888
---------------------------------
Validation status¶
>>> snp.validation
{'Phenotype_or_Disease', 'Frequency', '1000Genomes'}