This document provides guidance and information about:
1. How Do I Sign Up for an Apollo Instance?
Only project administrators can add or remove user accounts.
To add a user account:
- Log into the Apollo Instance via the Apollo browser interface using your credentials
- Click on the “User” tab
- Click on “+ Create User”
- Click on the “Detail” tab and add the following information:
- First Name
- Last Name
- Role: Choose from “user” or “admin”
- Click on the “Groups” tab
- Assign the user to a group (optional)
- Assign the user permissions for an organism. This can only be done after the organism has been added to the Apollo Instance.
2. What Data Files are Required to Create Genome Tracks in Apollo?
This section provides a list of the available file formats that can be used with Apollo.
Table 1. Required Data Files for Genome Track Creation.
|File type|Description|Requirement|
|---|---|---|
|Fasta|Genome Reference File|Mandatory|
|Fai|Genome Reference Index File|Mandatory|
|Gff3|Genome Annotation File|Mandatory|
|Gff3.gz|Genome Annotation Compressed File|Mandatory|
|2bit|Genomic Sequence in Binary File|Optional|
|Bai|Alignment Index File|Optional|
|Vcf|Variant Calling File|Optional|
|Vcf.gz|Compressed Variant Calling File (indexed with tabix)|Optional|
|bw|A compressed, indexed, binary format of genome-wide signal data|Optional|
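Before uploading, it can help to confirm that the mandatory files from Table 1 are all present. The following is a minimal shell sketch, not part of Apollo itself. The function name `check_mandatory_files` and the naming convention (`<prefix>.fasta`, `<prefix>.fasta.fai`, `<prefix>.gff3`, `<prefix>.gff3.gz`) are assumptions made for illustration, so adjust them to match your own files.

```shell
#!/bin/sh
# Hypothetical helper: report which mandatory files from Table 1 are missing.
# Assumed naming: <dir>/<prefix>.fasta, .fasta.fai, .gff3 and .gff3.gz
check_mandatory_files() {
    dir=$1
    prefix=$2
    missing=0
    for ext in fasta fasta.fai gff3 gff3.gz; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$dir/$prefix.$ext" ]; then
            echo "missing or empty: $dir/$prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out):
# check_mandatory_files ./organism_A organism_A && echo "all mandatory files found"
```

The function returns a non-zero status if any file is missing, so it can be used in front of an upload command with `&&`.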
3. How do I Prepare My Data Prior to Upload?
Apollo is a genome browser. It allows you to visualise evidence from analyses on a genome, create and edit genes, and create and edit annotations on those genes. To view a genome in your Apollo Instance, you need the reference genome in FASTA format and the genome annotation in GFF3 format. The reference genome can be obtained from a public repository or generated by any genome assembly workflow. The genome annotation, which describes genes and other features of DNA, RNA and protein sequences, can be generated by any genome annotation workflow. Evidence files, such as alignment files (BAM), variant files (VCF) and genome coverage files (BigWig), should be prepared using the appropriate workflows before they are uploaded to your Apollo Instance.
This section outlines the steps required to prepare data files to create a genome track in Apollo. In order to create tracks in your Apollo Instance you must provide the 'mandatory' files listed in Table 1:
- Organism files:
- Genome reference file (fasta, fa, fna)
- Genome reference index file (fai). It can be generated with the following Samtools command:
>samtools faidx [species.fasta]
- Genome sequence file in binary format (2bit). The 2bit file can be created with the faToTwoBit program:
>faToTwoBit [species.fasta] [species.2bit]
- Genome annotation files:
- Genome annotation file (gff3) generated from the annotation process.
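- Genome annotation compressed file (gff3.gz). The annotation file must be sorted by position and compressed with bgzip before it can be indexed. The index for the compressed file can then be created with tabix, which is distributed with HTSlib alongside Samtools:
>tabix -p gff [species.sorted.gff3.gz]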
- Evidence track files:
- Alignment file (bam)
- Alignment index file (bai). The index file can be generated with the following Samtools command:
>samtools index [species.bam]
- Variant files:
- Variant calling file (vcf) generated from the variant calling process.
- Compressed variant calling file (vcf.gz) and its index. The variant calling file must be compressed with bgzip. The index for the compressed file can then be created with tabix:
>tabix -p vcf [species.vcf.gz]
- BigWig files (bw) are a compressed, indexed, binary format for genome-wide signal data. They can be created by generating bedGraph coverage files with bedtools and converting them with the UCSC bedGraphToBigWig program. Chrominfo.txt is a chromosome sizes file (sequence name and length, tab-separated):
>bedtools genomecov -ibam [species.bam] -bg -split -strand + -g Chrominfo.txt > [species.plus.bedGraph]
>bedtools genomecov -ibam [species.bam] -bg -split -strand - -g Chrominfo.txt > [species.minus.bedGraph]
>bedGraphToBigWig [species.plus.bedGraph] Chrominfo.txt [species.plus.bw]
>bedGraphToBigWig [species.minus.bedGraph] Chrominfo.txt [species.minus.bw]
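The commands above rely on two inputs that none of the listed steps produce: a sorted annotation file (the basis of [species.sorted.gff3.gz]) and the chromosome sizes file Chrominfo.txt. The sketch below shows one common way to produce them and is not necessarily how the authors generated theirs. The function names `sort_gff3` and `fai_to_chrom_sizes` are hypothetical, the example file names are placeholders, and the sketch assumes the GFF3 file has no embedded `##FASTA` section.

```shell
#!/bin/sh
# Sort a GFF3 file by sequence name and start coordinate, keeping '#' lines first.
# Assumption: the file has no embedded ##FASTA section.
sort_gff3() {
    in=$1
    out=$2
    {
        grep '^#' "$in" || true
        grep -v '^#' "$in" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
    } > "$out"
}

# Write "<sequence name><TAB><length>" lines from a samtools faidx index (.fai).
fai_to_chrom_sizes() {
    cut -f1,2 "$1" > "$2"
}

# Example (commented out; bgzip and tabix come from HTSlib):
# sort_gff3 species.gff3 species.sorted.gff3
# bgzip species.sorted.gff3            # writes species.sorted.gff3.gz
# tabix -p gff species.sorted.gff3.gz  # writes species.sorted.gff3.gz.tbi
# fai_to_chrom_sizes species.fasta.fai Chrominfo.txt
```

Deriving Chrominfo.txt from the .fai index keeps the sequence names identical to those in the reference genome, which avoids name mismatches when converting bedGraph files.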
4. How Do I Generate a Public-Key?
- The Project Administrator is required to share a public-key to securely transfer data into their Instance via the command line. For more information about public-key cryptography, you can view here.
- To create a public/private RSA key pair on your Unix/Linux/macOS/Windows system, run the following command on your local machine (the computer from which you will access the Apollo Instance):
>ssh-keygen -t rsa -f ~/.ssh/firstname_lastname_key
- After you run the above command, a passphrase (essentially a password) will be requested. You can either leave this blank by pressing return/enter or create a passphrase to enter whenever you use the key pair.
- The following command sets the permissions of the private-key file in ~/.ssh/ so that only you can read and write it:
>chmod 600 ~/.ssh/firstname_lastname_key
- The public key is saved as "firstname_lastname_key.pub" in ~/.ssh/. To output it to the terminal, you can use a command such as:
>cat ~/.ssh/firstname_lastname_key.pub
- You can now copy the public-key file to a convenient location. Choose one that is accessible to your mail client, such as the Desktop, by running the following command:
>cp ~/.ssh/firstname_lastname_key.pub ~/Desktop
- Send the details of your public-key to the Apollo Service Team via return email or by sending an email to firstname.lastname@example.org with the subject header "Public-key for Apollo Instance". Please note that an automated response to this email will be signed off by one of our partners, the ARDC Support Team. Please also note that you should never give out the private-key (i.e. the firstname_lastname_key file). Once your Instance has been created and your public-key has been added, login credentials will be provided to you.
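Because the private-key must never be shared, it may be worth checking which file you are about to attach. The sketch below is an illustrative helper, not part of OpenSSH or Apollo, and the function name `is_public_key` is hypothetical. It relies only on the fact that an RSA public-key file written by ssh-keygen is a single line starting with `ssh-rsa`, whereas private-key files begin with a `-----BEGIN` header line.

```shell
#!/bin/sh
# Hypothetical safety check: succeed only if the file looks like an RSA public key.
is_public_key() {
    first_line=$(head -n 1 "$1")
    case $first_line in
        "ssh-rsa "*) return 0 ;;   # public key: safe to share
        *)           return 1 ;;   # private key or something else: do not send
    esac
}

# Example (commented out):
# is_public_key ~/Desktop/firstname_lastname_key.pub && echo "OK to send"
```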
5. How Do I Upload Data and Create a Genome Track via the Command Line?
Data to support your genome annotations can be uploaded via the command line or terminal on your local machine. This method is recommended for data files that exceed 2 GB. If your data files are less than 2 GB, you can also upload them via the Apollo (browser) interface. To ensure you can create the visual representation of your genome in the Apollo Instance, known as a genome track, there are a minimum number of evidence files that are required. The files that are 'mandatory' for genome track creation are listed in Table 1.
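The 2 GB guideline above can be checked from the terminal before you choose an upload route. The following sketch is an illustration only. The function name `upload_method` is hypothetical, and it assumes that 2 GB means 2 × 1024³ bytes, which the text above does not specify.

```shell
#!/bin/sh
# Hypothetical helper: suggest an upload route from the file size.
# The limit defaults to 2 GB, taken here as 2 * 1024^3 bytes (an assumption).
upload_method() {
    file=$1
    limit=${2:-2147483648}
    # wc -c reports the size in bytes; tr strips padding added by some systems
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -gt "$limit" ]; then
        echo "command line"
    else
        echo "browser or command line"
    fi
}

# Example (commented out):
# upload_method organism_A.fa
```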
- To access your Instance, you will need to create a public-key to share with the Apollo Service Team for the creation of a login. If you have not yet received your login credentials, email your public-key file to the Apollo Service Team with the subject header "Public-key for Apollo Instance". Please note, an automated response to this email will be signed off by one of our partners, the ARDC Support Team.
- You can log in to your Apollo Instance with your credentials using the SSH command as follows:
>ssh -i /path/to/your/private_key_file username@hostname
- For example, a user with the username 'doe_user', the host name 'ocean.genome.edu.au', and a private key file 'jane_doe_key' stored in ~/.ssh/ would use the following command to log in to their Apollo Instance:
>ssh -i ~/.ssh/jane_doe_key doe_user@ocean.genome.edu.au
- Once access has been granted, your prompt will indicate that you are in your Apollo Instance, e.g. username@apollo-0##:~$. Navigate to the folder 'apollo_data' (/home/data/apollo_data in the examples below):
>cd /home/data/apollo_data
- While in the folder 'apollo_data', create a new folder for your organism named 'organism_A' using the following command:
>mkdir organism_A
- Now you can transfer your data files into this folder. To copy a single file into your new folder, 'organism_A', open a new terminal prompt connected to your local environment and run:
>scp -i ~/.ssh/jane_doe_key organism_A.fa doe_user@ocean.genome.edu.au:/home/data/apollo_data/organism_A/
- To copy a complete folder, including its subfolders and their files, to the 'organism_A' folder you can use:
>scp -i ~/.ssh/jane_doe_key -r organism_A_files/ doe_user@ocean.genome.edu.au:/home/data/apollo_data/organism_A/
- Back in the terminal connected to your Apollo Instance, navigate into the folder named 'organism_A' and make sure the genome reference files (fasta and fai) and the annotation files (gff3 and gff3.gz) are in this folder.
- Before preparing the config file, make sure the tools used in the next step (prepare-refseqs.pl and flatfile-to-json.pl) are available in the environment you are in.
- Prepare the config file with the following commands:
>prepare-refseqs.pl --fasta [organism_A.fa]
>flatfile-to-json.pl --trackLabel [organism_A] --key "organism_A" --gff organism_A.gff3 --trackType CanvasFeatures --type CDS --autocomplete all
>ln -s data/tracks tracks
>ln -s data/seq seq
>ln -s data/trackList.json trackList.json
>ln -s data/names names
- You can follow the instructions on page 7 in this document to complete the creation of your reference genome. These are:
1. Navigate to your Instance via the browser interface and log in
2. Go to the 'Organism' tab in the bottom right-hand side of the 'Annotator Panel'
3. Select (+) 'Add New Organism'
4. Fill out the details, i.e. Name, Genus, Species, Directory, etc.
5. Click on 'Create Organism'
6. At the top of the 'Annotator Panel', select the 'Tracks' tab for your organism of interest. You should now see genome tracks in the visualisation window.
- For further instructions about configuring track data files, you can read through the official Apollo documentation on data loading.
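One way to confirm that the JBrowse scripts used in this section (prepare-refseqs.pl and flatfile-to-json.pl) are available in your environment is the shell's built-in `command -v`. This is a sketch under assumptions and not necessarily the check the Apollo Service Team intends. The function name `require_tools` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report any tools that are not found on the PATH.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (commented out):
# require_tools prepare-refseqs.pl flatfile-to-json.pl && echo "tools available"
```

The function returns a non-zero status if any tool is missing, so the config preparation commands can be skipped until the environment is set up.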
6. How Do I Upload Data and Create a Genome Track via the Apollo (browser) Interface?
This method is recommended for data files that are less than 2 GB. If your data files exceed 2 GB, you can upload them via the command line. To ensure you can create the visual representation of your genome in the Apollo Instance, known as a genome track, there are a minimum number of evidence files that are required. The files that are 'mandatory' for genome track creation are listed in Table 1.
- Log into your Apollo Instance
- If the Apollo Instance is newly built, first upload your reference genome, then upload your tracks once the reference genome upload is complete.
- To upload your reference genome:
- In the 'Annotator Panel' on the right-hand side of the view, click on the 'Organism' tab
- Click on the "Upload New Organism" button at the bottom of the right-hand side panel
- Follow the instructions in the pop-up menu to upload your reference genome
- Click on the "Upload" button when finished
- To upload your tracks:
- In the 'Annotator Panel' on the right-hand side of the view, click on the 'Tracks' tab
- Click on the "New Track" button
- Select the track type
- Provide the information for the track being uploaded (e.g. GFF3)
- Click on the "Upload" button when finished
7. How Do I Make My Genome Public?
A genome can be published via your Apollo Instance (browser) interface:
- Log into your Apollo Instance
- In the 'Annotator Panel' on the right-hand side of the view, click on the 'Organism' tab.
- Tick the 'Public' box at the bottom of the 'Details' tab.
- At the top of the 'Annotator Panel', above the tabs, there is a dropdown box to the right of a chainlink icon. Select the organism from the dropdown and click on the chainlink icon.
- You should be offered the 'Public URL' in a new window. Right-click on 'Public URL' to copy the link address, which you can then share to make the genome publicly accessible.
- Users who open the public URL can select the organism they wish to view from the options on the left-hand side. If only one organism is available, it will be shown directly.
8. How Do I Manually Annotate My Genome in Apollo?
Detailed documentation on the use of Apollo for manual annotation can be found in the Apollo User’s Guide. See also tutorials on manual genome annotation and Apollo in the Training and Help Resources tab.