
Assembly with PacBio data and SMRT Portal

This tutorial will show you how to assemble a bacterial genome de novo, using the PacBio SMRT Portal on the mGVL. We will use an analysis pipeline called HGAP, the Hierarchical Genome Assembly Process.

Start

  • Open your mGVL dashboard.
  • Go to Admin, find SMRT Analysis in the list of packages, and click Install on the right.
  • SMRT Portal should now appear as one of the instance services on your GVL dashboard.
  • Open the SMRT Portal web link (to the right) and register or log in.

Input

  • Locate your PacBio data.
  • Load the PacBio data onto your GVL.
  • In the SMRT Portal, go to Design Job, the top left tab.
  • Go to Import and Manage.


  • Click Import SMRT cells.


  • Check that the file path where you put the data on your GVL is listed.
    • If not, click Add and enter the file path to the data.
  • Click on the file path and then Scan to check for new data.
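If you are not sure which directory to add, a small script can list candidate SMRT cell folders before you type the path into SMRT Portal. This is a minimal sketch, assuming RS-style SMRT cell directories that each hold a `*.metadata.xml` file and an `Analysis_Results/` subfolder with `*.bax.h5` files; the function name `find_smrt_cells` is hypothetical and not part of SMRT Analysis.

```python
from pathlib import Path


def find_smrt_cells(root):
    """List (cell_directory, number_of_bax_files) pairs under root.

    Assumption: a SMRT cell directory contains a *.metadata.xml file and an
    Analysis_Results/ subfolder holding the *.bax.h5 read files.
    """
    cells = []
    # Search recursively for metadata files; each one marks a candidate cell.
    for meta in sorted(Path(root).rglob("*.metadata.xml")):
        cell_dir = meta.parent
        bax_files = list((cell_dir / "Analysis_Results").glob("*.bax.h5"))
        # Only report cells that actually have read data next to the metadata.
        if bax_files:
            cells.append((cell_dir, len(bax_files)))
    return cells
```

For example, `for cell, n in find_smrt_cells("/path/to/your/data"): print(cell, n)` prints each cell directory it finds; the parent of those directories is the path to add under Import SMRT cells.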

Assembly

  • Go back to the top left tab, Design Job.
  • Go to Create New.
  • An Analysis window should appear. Check the box next to De novo assembly, then click Next.
  • Under Job Name, enter a name.
  • Under Protocols, choose RS_HGAP_Assembly.3.
  • Click the ellipsis (...) underneath Protocols.


This brings up the settings. Click on Assembly.

  • For Compute Minimum Seed Read Length: ensure the box is ticked
  • For Number of Seed Read Chunks: enter 12
  • For Genome Size: enter the approximate genome size of your species
  • For Target Coverage: enter 10
  • For Overlapper Error Rate: enter 0.04
  • Leave all other settings as they are.
  • Click Apply
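Genome Size and Target Coverage together set how much seed read data the assembly aims for: a 5 Mb genome at 10× coverage corresponds to 50 Mb of seed reads. The sketch below illustrates one way a minimum seed read length can be derived from those two settings. It assumes the cutoff is the length of the shortest read among the longest reads whose combined bases reach Genome Size × Target Coverage; this is a simplified illustration, not the SMRT Analysis code, and `seed_length_cutoff` is a hypothetical name.

```python
def seed_length_cutoff(read_lengths, genome_size, target_coverage):
    """Sketch of a minimum seed read length calculation.

    Assumption: take reads from longest to shortest until their summed length
    reaches genome_size * target_coverage; the last read taken sets the cutoff.
    Returns None if all reads together cannot reach the target.
    """
    target_bases = genome_size * target_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            # Reads at least this long supply the requested coverage.
            return length
    return None
```

A `None` result means the data cannot supply the requested coverage, which is one reason an overestimated Genome Size can cause problems.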

Your protocol window should look like this:

[Screenshot: SMRT Portal protocol settings]

  • Click OK.

  • In the SMRT Cells Available window, select the SMRT cell(s) to use, then click the arrow to move them to the SMRT Cells in Job window.


  • Click Save (bottom right-hand side).
  • Next to Save, click Start.
  • The Monitor Jobs window should open.
    • As each step proceeds, new items will appear under the Reports and Data tabs on the left.


Output

  • Click the top right tab, View Data. Double-click the job name to open its reports.
  • Click the different reports in the left-hand panel.
  • Look at Assembly: Polished Assembly.
  • If there is only one contig, then this is the assembled genome. We will do further polishing in the next step.
  • If there are two or more contigs, one of them may be a plasmid, or the sample may need different assembly parameters or new sequencing.
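If you have downloaded the polished assembly FASTA from the job's Data tab, you can also count its contigs and their lengths locally, which helps when deciding whether a small extra contig might be a plasmid. A minimal sketch, assuming a standard FASTA file that may or may not be gzipped; the file name `polished_assembly.fasta.gz` in the usage note and the function name `contig_lengths` are hypothetical.

```python
import gzip


def contig_lengths(fasta_path):
    """Return {contig_id: length} for a FASTA file, plain or gzipped (.gz)."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the contig ID.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence lines add to the current contig's length.
                lengths[name] += len(line)
    return lengths
```

For example, `lengths = contig_lengths("polished_assembly.fasta.gz")` followed by `print(len(lengths), sorted(lengths.values(), reverse=True))` shows the number of contigs and their sizes, largest first.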

Polishing

During polishing, raw reads are used to correct the assembly.

  • From the previous job, go to Data: Assembly: Polished Assembly and click the FASTA file to download it.
  • Unzip the .gz file.
  • Go to Design Job → Import and Manage, click New (bottom right-hand side), then select the FASTA assembly file to upload.
    • This creates a new reference.
  • Go to Design Job → Create New.
  • Choose reference-based.
  • Select protocol: RS_Resequencing.1
  • Leave all settings at their defaults.
  • Select the SMRT cell (the same cell used in the first analysis).
  • Select your reference from the drop-down menu.
  • Click Save.
  • Click Start.
  • When the job is complete, check the Reports.
    • Variants: how many were found? If fewer than 2, the assembly does not need further polishing.
    • If 2 or more variants were found, repeat the polishing step (including adding a new reference).
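The stopping rule above can also be checked on a downloaded copy of the variant calls. A minimal sketch, assuming the resequencing job's variants are available as a GFF file (the name `variants.gff.gz` is an assumption) in which every non-comment, non-blank line is one variant; `count_variants` and `needs_more_polishing` are hypothetical helpers, and the threshold of 2 comes from this tutorial.

```python
import gzip


def count_variants(gff_path):
    """Count variant records in a GFF file (plain or .gz).

    Assumption: each non-blank line that does not start with '#' is one variant.
    """
    opener = gzip.open if str(gff_path).endswith(".gz") else open
    count = 0
    with opener(gff_path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count


def needs_more_polishing(n_variants, threshold=2):
    """Tutorial rule: 2 or more variants means run another polishing round."""
    return n_variants >= threshold
```

For example, `needs_more_polishing(count_variants("variants.gff.gz"))` returns `True` when another round of polishing is warranted under this rule.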

Next

Correct with Illumina reads