Database Administration tools
We strongly encourage you to install a local copy of the Ensembl databases, because queries against a local installation run much faster. It also reduces the load on Ensembl’s UK servers.
Once MySQL is installed locally with enough storage to host the species you’re interested in, the administration tools we provide make it relatively straightforward to manage the Ensembl databases and keep up to date with the Ensembl release cycle.
Installing MySQL
You need to install and configure MySQL yourself. Ensembl offers some instructions.
ensembldb3 command line tool
Installing ensembldb3 places a new executable, ensembldb3, on your path. This tool provides a number of capabilities, as shown by running it without arguments:
$ ensembldb3
Usage: ensembldb3 [OPTIONS] COMMAND [ARGS]...

  admin tools for an Ensembl MySQL installation

Options:
  --help  Show this message and exit.

Commands:
  download  download databases from Ensembl using rsync,...
  drop      drop databases from a MySQL server
  exportrc  exports the rc directory to the nominated...
  install   install ensembl databases into a MySQL server
  show      shows databases corresponding to release
  status    checks download/install status using...
ensembldb3 exportrc
The command:
$ ensembldb3 exportrc -o /path/to/ensembldbrc
produces a directory ensembldbrc containing 3 files that can be used by ensembldb3:
- species.tsv
A tab-delimited file with a species’ Latin name and common name on each line. This is used to define the common names that ensembldb3.Species uses for succinctly identifying species and their databases.
- ensembldb_download.cfg
A config file with sections for the remote path, the local path, the release and each species of interest. Species sections use the species’ common name as the section title. The databases for a species are given as a comma-separated list drawn from core, variation and otherfeatures. The compara database uses its db name as its section title. Here’s an example:
    [remote path] # required
    path=ftp.ensembl.org/ensembl/pub/
    [local path] # required
    path=/tmp/ensembldb_download
    [release] # required
    release=85
    [S.cerevisiae]
    db=core
    [Xenopus]
    db=core
    [Human]
    db=core,variation
    [compara]
    db=compara
- mysql.cfg
A config file with sections for mysql and mysqlimport. Each section includes the command (the full path to the executable, with any command arguments) and the account settings (username, password).
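To make the layout of ensembldb_download.cfg concrete, here is a minimal Python sketch that reads a config of that shape with the standard library configparser and checks that the three required sections are present. It is an illustration only, not how ensembldb3 itself reads the file; the names EXAMPLE_CFG, REQUIRED and summarise_download_cfg are hypothetical.

```python
import configparser

# Example text in the same shape as the exported ensembldb_download.cfg
EXAMPLE_CFG = """\
[remote path] # required
path=ftp.ensembl.org/ensembl/pub/
[local path] # required
path=/tmp/ensembldb_download
[release] # required
release=85
[S.cerevisiae]
db=core
[Human]
db=core,variation
[compara]
db=compara
"""

# Sections the documentation marks as required
REQUIRED = ("remote path", "local path", "release")


def summarise_download_cfg(text):
    """Return (settings, dbs) parsed from download config text.

    settings maps each required section to its value; dbs maps every
    other section title to the list of database types it requests.
    """
    # inline_comment_prefixes makes the trailing "# required" notes ignorable
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)

    missing = [name for name in REQUIRED if not parser.has_section(name)]
    if missing:
        raise ValueError(f"missing required sections: {missing}")

    settings = {
        "remote path": parser["remote path"]["path"],
        "local path": parser["local path"]["path"],
        "release": parser["release"]["release"],
    }
    dbs = {
        name: [db.strip() for db in parser[name]["db"].split(",")]
        for name in parser.sections()
        if name not in REQUIRED
    }
    return settings, dbs


if __name__ == "__main__":
    settings, dbs = summarise_download_cfg(EXAMPLE_CFG)
    print(settings)
    print(dbs)
```

A check like this can be useful after hand-editing the exported file, since a missing required section is easier to spot here than partway through a long download.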
To use the contents of the exported directory, create an environment variable ENSEMBLDBRC=/path/to/ensembldbrc.
Note
If ENSEMBLDBRC is defined in your environment, the species.tsv file within that directory will be used for all ensembldb3 applications.
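As a rough illustration of the lookup described in the note, the following Python sketch reads species.tsv from the directory named by ENSEMBLDBRC. It assumes two tab-separated columns (Latin name, then common name) and no header row, which is an assumption based on the description above rather than a confirmed file specification; load_species_names is a hypothetical helper, not part of ensembldb3.

```python
import csv
import os
from pathlib import Path


def load_species_names(environ=os.environ):
    """Map Latin name -> common name from $ENSEMBLDBRC/species.tsv.

    Returns an empty dict when ENSEMBLDBRC is not set.
    Assumes (not confirmed) two tab-separated columns and no header row.
    """
    rc_dir = environ.get("ENSEMBLDBRC")
    if not rc_dir:
        return {}

    names = {}
    with (Path(rc_dir) / "species.tsv").open(newline="") as infile:
        for row in csv.reader(infile, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                names[row[0].strip()] = row[1].strip()
    return names


if __name__ == "__main__":
    for latin, common in load_species_names().items():
        print(f"{common}: {latin}")
```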
ensembldb3 download
This capability relies on the lftp client, which can be installed on Linux using conventional package managers and on macOS using conda.
The command:
$ ensembldb3 download -c /path/to/edited/ensembldb_download.cfg -n 3
will download databases for the species specified in the ensembldb_download.cfg config. The Ensembl release, the remote and local paths, and the specific databases must all be defined in that file (see ensembldb3 exportrc). The -n option sets the number of processes to run in parallel for the download (the maximum allowed is 5).
For very large databases (e.g. compara or human variation), download times can be very long. If you are running on a server, we recommend using the nohup command in that case.
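If you drive downloads from a script, a small wrapper can enforce the limit on -n before anything is launched. The Python sketch below only assembles the argument list, prefixed with nohup as recommended above. It uses just the -c and -n options shown in this section; download_command and MAX_PARALLEL are hypothetical names introduced for illustration.

```python
import shlex

MAX_PARALLEL = 5  # maximum value for -n stated in this section


def download_command(cfg_path, numprocs=1):
    """Build the argument list for a nohup-wrapped ensembldb3 download."""
    if not 1 <= numprocs <= MAX_PARALLEL:
        raise ValueError(f"numprocs must be between 1 and {MAX_PARALLEL}")
    return [
        "nohup", "ensembldb3", "download",
        "-c", str(cfg_path),
        "-n", str(numprocs),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first
    print(shlex.join(download_command("/path/to/edited/ensembldb_download.cfg", 3)))
```

The returned list can be passed to subprocess.Popen if you want the script to start the download itself.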
ensembldb3 install
The command:
$ ensembldb3 install -c /path/to/edited/ensembldb_download.cfg -m /path/to/mysql.cfg
installs the databases specified in the ensembldb_download.cfg config into the MySQL server specified by mysql.cfg. To use multiple threads for importing the data, edit the --use-threads=4 argument in the mysql.cfg file (see the example provided).
For very large databases (e.g. compara or human variation), install times can be very long. If you are running on a server, we recommend using the nohup command in that case.
ensembldb3 drop
The command:
$ ensembldb3 drop -c /path/to/edited/ensembldb_download.cfg -m /path/to/mysql.cfg
will drop the databases specified in the ensembldb_download.cfg config from the MySQL server specified by mysql.cfg. You are required to confirm before the listed databases are dropped.
ensembldb3 show
The command:
$ ensembldb3 show --release 85 -m /path/to/mysql.cfg
will display all databases from release 85 on the MySQL server specified by mysql.cfg.
ensembldb3 status
The command:
$ ensembldb3 status -c /path/to/edited/ensembldb_download.cfg
will display the download/install status of the databases specified by ensembldb_download.cfg. This command just checks whether ENSEMBLDB_DOWNLOADED and ENSEMBLDB_INSTALLED files exist.
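The same kind of check is easy to reproduce by hand. The Python sketch below reports which marker files are present in each subdirectory of a given directory. The directory layout is an assumption made for illustration (one subdirectory per database, with the marker files directly inside it), and marker_status is a hypothetical helper, not the ensembldb3 implementation.

```python
from pathlib import Path

# Marker file names mentioned in this section
MARKERS = ("ENSEMBLDB_DOWNLOADED", "ENSEMBLDB_INSTALLED")


def marker_status(release_dir):
    """Return {subdirectory name: {marker name: bool}}.

    Assumes (not confirmed) that each database has its own subdirectory
    of release_dir and that marker files sit directly inside it.
    """
    status = {}
    for db_dir in sorted(Path(release_dir).iterdir()):
        if db_dir.is_dir():
            status[db_dir.name] = {m: (db_dir / m).exists() for m in MARKERS}
    return status


if __name__ == "__main__":
    import sys

    for name, flags in marker_status(sys.argv[1]).items():
        print(name, flags)
```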
Troubleshooting
Many of the administrative functions wrap shell commands. If you encounter any issues, use the verbose flag (-v), which causes the shell commands to be printed to stdout. Then run the printed shell command directly to see all of its error messages.
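If you prefer to capture the output of a printed command rather than read it in the terminal, a short Python sketch along the following lines may help. It runs a command string copied from the verbose output and returns its exit code, stdout and stderr separately; run_and_report is a hypothetical helper and is not part of ensembldb3.

```python
import shlex
import subprocess


def run_and_report(command):
    """Run a command string and return (exit code, stdout, stderr)."""
    # shlex.split tokenises the string the way a POSIX shell would,
    # so the command runs without invoking a shell
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    import sys

    code, out, err = run_and_report(" ".join(sys.argv[1:]))
    print(f"exit code: {code}")
    print(f"stderr:\n{err}")
```

Because no shell is involved, commands that rely on pipes or redirection need to be run in a terminal instead.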