
Frequently Asked Questions - Illumina

 

Frequently Asked Questions – Proposal and Sample Submission

Illumina Proposal and sample submission questions

What services does the Sequencing Facility provide?
Who can order services through the Sequencing Facility?
How do I submit a sequencing proposal?
How do I submit samples?
What are the quality/quantity requirements for submitted samples?
What happens after my sample is submitted?

CCR-SF Bioinformatics questions:

What types of bioinformatics services does the CCR-SF Bioinformatics group offer?
What types of analysis does the CCR-SF Bioinformatics group perform?
Sequencing depth and experimental design questions?
What is required to assure timely processing and delivery of my data?
What types of analysis workflows does the CCR-SF use to perform analyses?
What types of data formats will I receive from CCR-SF?
How do I analyze the data?
How are the data files delivered?
How large are the data delivery files?
How long is the data made available to download?
How do I obtain a LIMS account and submit an order in the CCR-SF LIMS?
What is the yield per lane for different sequencing platforms?

 

Resources


 

What services does the Sequencing Facility provide?

Please see the services page for a detailed list of projects we support. If your project design is not listed, please contact the sequencing facility director at ccrsfhelp@mail.nih.gov to discuss the feasibility of a custom project.

 

Who can order services through the Sequencing Facility?

All NIH research labs are eligible to order services through the Sequencing Facility.

How do I submit a sequencing proposal?

Please complete a sequencing proposal form at https://ostr.cancer.gov/sequence/request and submit it to the steering committee for approval. You may also contact Bao Tran to discuss the available platforms and the best choice for your project.

How do I submit samples?

Before submitting samples, ensure that the sequencing proposal has been approved and the CSAS request submitted. Our service is listed under: NCI-CCR Sequencing Facility (Illumina) http://ncifrederick.cancer.gov/rtp/csas/requestor/

You may then submit samples by delivering them to ccrsfhelp@mail.nih.gov, at the ATRF Room D3040 (instructions for sample delivery link). Be sure to include a sample manifest (link) with your submission.

What are the quality requirements for submitted samples?

 

All samples must be sent in 1.5 mL or 2 mL tubes.

Type of Library | Minimum DNA/RNA Requirement for Library Construction | Recommended DNA/RNA for Optimal Library Construction | Maximum Sample Volume Requirement for Library Construction | Minimum DNA/RNA Quality Requirement for Library Construction
ChIP DNA Sequencing | 10 ng | 20 ng | 30 μL | DNA should be as intact as possible with no contamination.
gDNA Sequencing | 1 μg | 3 μg | 30 μL | DNA should be as intact as possible with no contamination.
mRNA Sequencing | 100 ng | 1 μg | 50 μL | RIN should be at least 8.0.
miRNA Sequencing | 1 μg | 2 μg | 5 μL | RIN should be at least 8.0.
Total RNA Sequencing | 100 ng | 1 μg | 30 μL | RIN should be at least 8.0.
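
As a rough pre-submission check, the minimums in the table above can be encoded in a small lookup. This is only a sketch: the dictionary keys, field names, and the `check_sample` function are hypothetical and are not part of any CCR-SF tool. The facility's internal QC remains the authoritative check.

```python
# Minimum requirements transcribed from the table above.
# Keys, field names, and function names are illustrative assumptions.
REQUIREMENTS = {
    "ChIP DNA": {"min_ng": 10, "max_ul": 30, "min_rin": None},
    "gDNA": {"min_ng": 1000, "max_ul": 30, "min_rin": None},
    "mRNA": {"min_ng": 100, "max_ul": 50, "min_rin": 8.0},
    "miRNA": {"min_ng": 1000, "max_ul": 5, "min_rin": 8.0},
    "Total RNA": {"min_ng": 100, "max_ul": 30, "min_rin": 8.0},
}


def check_sample(library_type, amount_ng, volume_ul, rin=None):
    """Return a list of problems; an empty list means the minimums are met."""
    req = REQUIREMENTS[library_type]
    problems = []
    if amount_ng < req["min_ng"]:
        problems.append(
            f"amount {amount_ng} ng is below the {req['min_ng']} ng minimum"
        )
    if volume_ul > req["max_ul"]:
        problems.append(
            f"volume {volume_ul} uL exceeds the {req['max_ul']} uL maximum"
        )
    # RIN applies only to the RNA library types in the table.
    if req["min_rin"] is not None:
        if rin is None:
            problems.append("a RIN value is needed for RNA libraries")
        elif rin < req["min_rin"]:
            problems.append(f"RIN {rin} is below {req['min_rin']}")
    return problems


print(check_sample("mRNA", amount_ng=500, volume_ul=20, rin=9.1))
print(check_sample("gDNA", amount_ng=500, volume_ul=40))
```

The first call returns an empty list; the second reports both the low amount and the excess volume.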

What happens after my sample is submitted?

Before sequencing, we will perform an internal QC to confirm the information in the sample manifest and notify you if any samples do not meet minimum sequencing requirements. You will then be able to choose whether to resubmit those samples or continue and sequence them. You will be notified again when the analysis on each sample is completed and available for download.

How do I check the status of my order/sample?

You may be able to estimate the stage of your samples using our general processing flowchart. For more specific information, please contact us at ccrsfhelp@mail.nih.gov.                


 

What types of bioinformatics services does the CCR-SF Bioinformatics group offer?

Here at CCR-SF, our mission is to provide the highest-quality sequencing data to our customers. We work closely with investigators to help get their NGS projects off the ground. The services we provide include:

  • Provide experimental design consultation, including sequencing technology recommendations, library protocol consultation, and sequencing coverage and cost estimates.
  • Perform QC and secondary and tertiary data analysis for sequencing data from different platforms, including Illumina, PacBio, Oxford Nanopore, and BioNano.
  • Develop robust and reproducible analysis workflows/pipelines based on application types and sequencing technologies.
  • Support the adoption of new sequencing protocols and new technology development.
  • Provide training to customers on NGS technology and data analysis.

New services:

  • Single Cell Analysis – support for whole-transcriptome and 3’ and 5’ capture-based technologies such as 10x Genomics and BD Rhapsody scRNA-seq. Analysis support includes cell subpopulation identification, differential analysis across conditions, single-cell genomic analysis, and epigenetic marker detection.
  • Structural Variation Detection and Genome Assembly – use long-read technologies such as PacBio and Oxford Nanopore, or 10x Genomics linked-read technology, to detect genetic variations or rearrangements in the structure of chromosomes.
  • Full Length Transcriptome Analysis – use PacBio Iso-Seq to discover full-length transcripts and novel splice variants.
  • Coming soon: direct RNA-sequencing and analysis using Oxford Nanopore technology.
 

What analyses does the CCR-SF Bioinformatics group perform?

Currently we offer primary and secondary analyses for all NGS projects, including initial base-calling, demultiplexing, data quality control, and reference genome alignment of NGS reads. We also offer tertiary analyses on a limited basis for certain R&D projects, which may include de novo assembly, structural variant analysis, isoform detection, and single cell analysis. For all projects, we ensure that every sequencing run we deliver meets our high standards for yield, base-call quality, base alignment percentage, and the application-specific metrics that we have established.

 

Sequencing depth and experimental design questions?

Coverage requirements vary by application, library protocol, sequencing platform, and project-specific considerations. To identify the best approach for your project, we set up a meeting between you and representatives from our sequencing facility to make recommendations on sequencing platform, library protocol, and other needs.

For assistance in planning your experiment or to discuss specifics of your project, please contact Bao Tran. For bioinformatics consultation, please contact Dr. Maggie Cam and Yongmei Zhao. Please refer to the following links for experimental design best practices:

RNA-seq Best Practices: https://bioinformatics.cancer.gov/content/rna-seq

ChIP-seq Best Practices: coming soon

Exome-seq Best Practices: coming soon

Whole Genome Sequencing and Structural Variation Detection Best Practices: coming soon

Single Cell RNA-seq Best Practices: coming soon

 

What is required to assure timely processing and delivery of my data?

We recommend an initial consultation with the CCR-SF Bioinformatics group to discuss data analysis requirements and to establish expectations. It is also important to specify the reference genome version and annotation build for projects with human or mouse genome mapping requirements. For other reference-based sequencing projects, you will need to provide us with the reference sequences (FASTA file format or web link).

If you have any questions regarding the reference sequences or your preferred data processing options, please contact the CCR-SF Bioinformatics group via email at CCRSF_IFX@nih.gov or PacBioInfo@mail.nih.gov.

 

What types of analysis workflows does the CCR-SF use to perform analyses?

We currently provide analyses based on sequencing application type. We have designed and implemented in-house data analysis pipelines that integrate platform/vendor specific data analysis tools with popular open-source software.

Currently available custom data analysis pipelines:

1.   ChIP-seq
2.   Exome-seq
3.   Whole Genome Sequencing
4.   RNA-seq
5.   PacBio Iso-Seq
6.   PacBio Assembly
7.   Large Structural Variants Detection
8.   Single Cell RNA-seq Analysis

 

 

What types of data formats will I receive from CCR-SF?

For projects using the Illumina sequencing platform, you will receive the pass-filtered raw sequence reads in FASTQ format and the reference alignment data in BAM format. BAM files contain base-call and quality score information for all pass-filtered reads, as well as alignment information for reads that have mapped to the reference genome. We also deliver a PDF report containing a summary of the sequencing project (i.e., library and sequencing protocols, sequencing result summary, application-based QC metrics, and software details) and an Excel file containing the detailed data analysis results. Depending on the application, other QC reports, such as pre-alignment and post-alignment QC PDF reports, are provided as well.
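
For readers who want a quick look inside a delivered FASTQ file, the following is a minimal sketch that counts reads and computes the mean base quality using only the Python standard library. It assumes the common four-line FASTQ record layout and Phred+33 quality encoding; the function name and the tiny in-memory example are illustrative and are not part of the CCR-SF pipelines.

```python
import io


def fastq_summary(handle):
    """Return (read count, base count, mean Phred quality) for a FASTQ stream.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    n_reads = 0
    n_bases = 0
    qual_sum = 0
    while True:
        header = handle.readline()
        if header == "":
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near read {n_reads + 1}")
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(ord(c) - 33 for c in qual)
    mean_q = qual_sum / n_bases if n_bases else 0.0
    return n_reads, n_bases, mean_q


# Two made-up reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+\n5555\n"
)
print(fastq_summary(example))  # (2, 8, 30.0)
```

For a gzip-compressed file, the same function can be given a handle opened with `gzip.open(path, "rt")`, where `path` is the location of your own file.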

For projects using the PacBio sequencing platform, the data delivery choice is driven by the specific needs of the project. For example, when circular consensus processing is performed, the raw polymerase reads and the consensus reads are included in the data package as FASTA files. If alignment and variant calling are performed, the resulting data are provided as BAM and VCF files. Files containing the intermediate results of pipeline processing (such as the read-to-cluster mapping for Iso-Seq) are sometimes included as well. Beyond that, we are happy to deliver any of the files produced by our processing upon request. The content of the data delivery package should be discussed at project definition time.

For standard projects, the deliverable data file types are:

  • Sequencing FASTQ/FASTA files
  • Alignment BAM files or assembly files
  • Data QC statistics reports
  • Mapping or assembly statistics
  • Single cell gene matrix, clustering results, and analysis statistics (for single cell sequencing only)
 

For R&D projects, the deliverable data file types are:

Iso-Seq:

  • Raw polymerase reads, consensus reads, and polished cluster reads in FASTA format
  • Alignment BAM files
  • Annotation files for the mapped Iso-Seq cluster reads
  • Statistics reports
  • ClusterView plot

De novo Assembly:

  • Consensus call FASTA file
  • Assembly statistics reports
  • Base modification reports

Exome-seq or Structural Variants Discovery:

  • Raw FASTQ files
  • Alignment BAM files
  • SNP/Indel variant call VCF files
  • Structural variant call BED file
  • Variant annotation files
  • Variant analysis statistics reports

Single Cell RNA-seq:

  • Raw FASTQ files
  • Alignment BAM files
  • Gene matrix
  • Clustering results and statistics reports
 

How do I analyze the data?

The SF typically provides primary and secondary data analysis, which includes delivery of the FASTQ pass-filtered raw read files and alignment BAM files to the customer. Investigators are expected to provide for their own tertiary or downstream analyses not offered by the SF bioinformatics group. For investigators interested in performing their own bioinformatics in-house, there are several commercial software options from Illumina, PacBio, and third-party vendors. In addition, a large number of open-source NGS software tools are freely available from Biowulf and other online computing sources.

For investigators in need of assistance with downstream NGS data analyses, the CCR Collaborative Bioinformatics Resource (CCBR) provides expert bioinformatics data analysis for the Center for Cancer Research at the NCI free of charge. To contact the CCBR, please submit a request through the CCBR Project Submission Form.

RNA-seq Best Practices:  https://bioinformatics.cancer.gov/content/rna-seq

ChIP-seq Best Practices: coming soon
Exome-seq Best Practices: coming soon
Whole genome Best Practices: coming soon
Single cell RNA-seq Best Practices: coming soon
 

How large are the data delivery files?

Because NGS is still a rapidly evolving field, this answer changes regularly. Please contact the bioinformatics group for current data delivery file size information.

How are the data files delivered?

The original sequence, alignment, and analysis files are available to download through FTP or a web link. We recommend registering an account at GlobusFTP (https://www.globus.org/) in order to transfer data via the GlobusFTP site. You will need to register an account for each lab member planning to log in. Please see the following tutorial on registering an account and transferring data:
https://helix.nih.gov/Documentation/globus.html

If you have any issues setting up a Globus account or transferring data via the shared endpoint, please contact us.
Data delivery via hard drive is no longer available. For labs with the appropriate resources, we can quickly and directly transfer the files via FTP. Please contact the bioinformatics group to discuss your options.

 
How long is the data made available to download?

We make data available for up to two weeks starting from the date of our data delivery email announcement. It is the responsibility of the investigator, laboratory contact, or bioinformatics contact to ensure that they have retrieved their data within the given time frame. To maintain sufficient data storage for upcoming projects, the analysis files are then archived and stored for an additional four weeks.
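
The retention schedule above can be turned into calendar dates with a few lines of Python. This is a sketch only: the delivery date is made up, the function name is hypothetical, and because the download window is stated as "up to" two weeks, the computed deadline should be read as the latest possible date rather than a guarantee.

```python
from datetime import date, timedelta

DOWNLOAD_WINDOW = timedelta(weeks=2)  # download availability stated above
ARCHIVE_WINDOW = timedelta(weeks=4)   # additional archive period stated above


def retention_dates(delivery_date):
    """Return (latest download date, archive expiry date) for a delivery."""
    download_deadline = delivery_date + DOWNLOAD_WINDOW
    archive_expiry = download_deadline + ARCHIVE_WINDOW
    return download_deadline, archive_expiry


# Hypothetical delivery email date, for illustration only.
deadline, expiry = retention_dates(date(2024, 3, 1))
print(deadline, expiry)  # 2024-03-15 2024-04-12
```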

If your data is no longer available for download, please contact the SF bioinformatics group and we can re-run the data processing and alignment as necessary. However, please note that it may take longer to receive the re-analyzed data due to resource conflicts with current production runs. Whenever possible, it is best to download the data in a timely manner after receipt of the delivery notice.

 

How do I obtain a LIMS account and submit an order in the CCR-SF LIMS?

Instructions on how to initiate the LIMS account set-up process for your group, and on how to submit an order once your account is authorized by CCR-SF, are available in the LIMS user guide.

In order to have your account authorized, the group PI should email CCR-SF (CCRSF_IFX@nih.gov) with a list of group members requiring LIMS account access after successful completion of the account creation steps in the LIMS user guide.

 

What is the yield per lane for different sequencing platforms?

Platform: NextSeq
Application Type | Chemistry Version | Read Length | Total PF Reads (Million) | Total Yield (Gb) | %>=Q30 (PF)
ChIP-seq | NextSeq High Output Kit | 1x75 | 300 - 400 | 20 - 35 | >80%
mRNA | NextSeq High Output Kit | 2x150 | 600 - 800 | 100 - 120 | >75%
gDNA | NextSeq High Output Kit | 2x150 | 600 - 800 | 100 - 120 | >75%
PCR product | NextSeq High Output Kit | 1x75 | 300 - 400 | 20 - 30 | >80%
Other | NextSeq High Output Kit | 2x75 | 600 - 800 | 20 - 30 | >80%

 

Platform: HiSeq2500
Application Type | Chemistry Version | Read Length | Total PF Reads (Million) | Total Yield (Gb) | %>=Q30 (PF)
mRNA | V3 | 2x100 | 300 - 400 | 30 - 40 | >80%
Exome | V3 | 2x100 | 300 - 420 | 30 - 42 | >80%
mRNA | V4 | 2x126 | 300 - 440 | 36 - 55 | >80%
Exome | V4 | 2x126 | 350 - 450 | 42 - 56 | >80%
gDNA | V4 | 2x126 | 350 - 500 | 42 - 62 | >80%
Other | V4 | 2x126 | 300 - 450 | 36 - 56 | >80%

 

Platform: HiSeq3000/HiSeq4000
Application Type | Chemistry Version | Read Length | Total PF Reads (Million) | Total Yield (Gb) | %>=Q30 (PF)
mRNA | SBS | 2x75 | 450 - 600 | 38 - 45 | >80%
Exome | SBS | 2x150 | 480 - 620 | 80 - 90 | >80%
gDNA | SBS | 2x150 | 480 - 620 | 80 - 90 | >80%

 

Platform: NovaSeq6000 (per flowcell)
Read Length | S1 (estimated yield, Gb) | S2 (estimated yield, Gb) | S4 (estimated yield, Gb)
2 x 50 bases | 134 - 167 | 333 - 417 | NA
2 x 100 bases | 266 - 333 | 667 - 833 | NA
2 x 150 bases | 400 - 500 | 1000 - 1250 | 2400 - 3000
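
The yield figures above can be related to read counts and to expected coverage with simple arithmetic. The sketch below assumes that "Total PF Reads" counts each mate of a pair separately, which is inferred from the table arithmetic (for example, 300 million reads at 100 bases gives the listed 30 Gb) rather than stated by the facility. The function names and the 3.1 Gb human genome size are likewise assumptions for illustration, and the coverage estimate ignores duplicates, unmapped reads, and uneven coverage.

```python
def yield_gb(pf_reads_million, read_length_bp):
    """Approximate yield in gigabases from PF read count and read length."""
    return pf_reads_million * 1e6 * read_length_bp / 1e9


def mean_coverage(total_yield_gb, genome_size_gb, n_samples=1):
    """Rough mean coverage per sample when yield is split evenly."""
    return total_yield_gb / (genome_size_gb * n_samples)


# HiSeq2500 V3 mRNA row: 300 million PF reads at 100 bases each.
print(yield_gb(300, 100))  # 30.0

# Lower-bound NovaSeq S4 2 x 150 yield (2400 Gb) split across 24 samples,
# assuming a human genome of roughly 3.1 Gb.
print(round(mean_coverage(2400, 3.1, n_samples=24), 1))  # 32.3
```

Actual per-sample coverage depends on library quality and pooling balance, so these numbers are best treated as planning estimates to bring to the design consultation.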

For further questions, please contact us directly at CCRSF_IFX@nih.gov.

For ordering and pricing information, head back to Resources.

 
