“Next generation” DNA sequencing instruments provide a platform to survey the genome, transcriptome and cistrome at a higher resolution than can be obtained on most microarrays. The DFCI Microarray Core has recently purchased a SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequencer, and will soon begin offering services.
A library is constructed from genomic DNA fragments or from cDNA (derived either from fragmented mRNA or from unfragmented, short RNA). Each DNA or cDNA molecule within a library carries a synthetic adaptor ligated to each end. Each adaptor-flanked molecule is clonally amplified on a derivatized bead by emulsion PCR. The beads carrying the clonally amplified nucleic acids are deposited onto a slide, and the slide is placed on the SOLiD instrument, where the chemistry and imaging occur.
The chemistry is based on template-directed ligation of short, “dinucleotide-encoding”, 8-mer oligonucleotides. Dinucleotide encoding permits discrimination of SNPs from most chemistry and imaging errors, and subsequent in silico correction of those errors. Because the sequencing reads are short (50 bases at this time), and because the error correction scheme requires it, a reference genome sequence is necessary to analyze SOLiD data. This is a “resequencing” platform, not a platform for sequencing DNA from organisms whose genome sequence is unknown.
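The following is a minimal sketch of how dinucleotide (two-base) encoding separates a true SNP from a single measurement error. It assumes the commonly described formulation of the color code, in which bases are numbered A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes. The function names, the example sequences, and the choice of “T” as the known leading primer base are illustrative assumptions, not part of the Core's pipeline.

```python
# Sketch of two-base ("color space") encoding, under the assumptions stated above.
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode(seq, primer_base="T"):
    """Convert a base sequence to colors; primer_base is the known leading base (assumed)."""
    prev = CODE[primer_base]
    colors = []
    for base in seq:
        cur = CODE[base]
        colors.append(prev ^ cur)  # each color depends on two adjacent bases
        prev = cur
    return colors


def decode(colors, primer_base="T"):
    """Convert colors back to bases by walking forward from the known primer base."""
    prev = CODE[primer_base]
    bases = []
    for color in colors:
        prev ^= color
        bases.append(BASES[prev])
    return "".join(bases)


def diff_positions(a, b):
    """Return the indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "ATGGCATTCA"
snp_read = "ATGGTATTCA"  # hypothetical read with a C->T SNP at index 4

ref_colors = encode(reference)
snp_colors = encode(snp_read)

# A true SNP changes exactly two adjacent colors.
snp_color_diffs = diff_positions(ref_colors, snp_colors)

# A single miscalled color, by contrast, is an isolated color change;
# decoded naively, it corrupts every base from that position onward.
error_colors = list(ref_colors)
error_colors[4] = 0  # simulate one imaging error (true color is 3)
error_base_diffs = diff_positions(reference, decode(error_colors))

print("SNP changes colors at:", snp_color_diffs)
print("Single color error changes decoded bases at:", error_base_diffs)
```

Comparing colors against the color-encoded reference, rather than decoding reads first, is what lets an isolated color mismatch be flagged as a likely error while a pair of adjacent, mutually consistent mismatches is treated as a candidate SNP. This is also why a reference sequence is needed.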
A brief, animated overview of the sample preparation, sequencing chemistry, and instrumentation can be found at the following URL:
An excellent webinar, “Fundamentals of 2 Base Encoding and Color Space”, can be found at the following URL:
The Application Note at the following link covers the same concepts that are presented in the webinar:
A white paper, “A Theoretical Understanding of 2 Base Color Codes and Its Application to Annotation, Error Detection and Error Correction”, which presents a more detailed treatment of two-base encoding, can be found at the following link:
The DFCI Microarray Core has carried out pilots for a number of different applications. The pilots involve library construction, emulsion PCR, sequencing, primary data analysis (color calls, quality assessment), and secondary data analysis (base calls, quality assessment). Application-specific tertiary data analysis is being carried out by our group or by colleagues.
To date, we have constructed and sequenced the following types of libraries: genomic fragment and genomic mate-pair libraries, as well as bar-coded S.A.G.E. (Serial Analysis of Gene Expression), ChIP (Chromatin Immunoprecipitation), “small RNA”, and “whole transcriptome” libraries.
An undivided slide can be used for deposition of a single library or of mixed, “barcoded” libraries; data from mixed, barcoded libraries can be organized into library-specific files once the sequencing reactions have been completed. Alternatively, a slide can be divided into four or eight segments, and a separate library deposited onto each segment. In our hands, up to ~340 million “useable” beads can be deposited on an undivided slide. Of this number, approximately 40% (depending on the application) usually map uniquely to the human or mouse genome. Two slides can be run concurrently.
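As a rough planning aid, the figures above can be turned into an estimate of uniquely mapped reads per library. The sketch below uses only the two numbers quoted in the text (up to ~340 million useable beads per undivided slide, approximately 40% mapping uniquely). The function name is hypothetical, and the even split of beads across segments is a simplifying assumption; a divided slide may well hold fewer beads in total than an undivided one.

```python
# Back-of-the-envelope yield estimate from the figures quoted in the text.
USABLE_BEADS_PER_SLIDE = 340_000_000  # upper bound for an undivided slide
UNIQUE_MAPPING_PERCENT = 40           # approximate; varies with the application


def uniquely_mapped_per_segment(segments=1):
    """Estimate uniquely mapped reads per segment.

    Assumes (for illustration only) that beads divide evenly among segments.
    """
    if segments not in (1, 4, 8):
        raise ValueError("a slide is undivided (1) or split into 4 or 8 segments")
    mapped_per_slide = USABLE_BEADS_PER_SLIDE * UNIQUE_MAPPING_PERCENT // 100
    return mapped_per_slide // segments


for n in (1, 4, 8):
    print(f"{n} segment(s): ~{uniquely_mapped_per_segment(n):,} uniquely mapped reads each")

# Two slides can be run concurrently, doubling the per-run upper bound.
per_run_two_slides = 2 * uniquely_mapped_per_segment(1)
print(f"Two undivided slides: ~{per_run_two_slides:,} uniquely mapped reads per run")
```

These are upper-bound estimates; actual yields depend on the application and on bead deposition.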
As each service becomes available, information will appear in a “Libraries” section within the “Next Generation Sequencing” heading on our website.