Tong Hao
for any questions regarding the database
David Hill
for any questions regarding the ORFeome project
Please contact Tong Hao with your questions or comments.
Please note: Your web browser must be configured to accept cookies, and JavaScript must be enabled.
The isoform project is made possible through support from the High-Tech Fund of the Dana-Farber Cancer Institute and The Ellison Foundation (Boston, MA), and by grants from the National Cancer Institute, the National Human Genome Research Institute, and the National Institute of General Medical Sciences.
A complete understanding of the biology of the human genome will not be possible without understanding the full complement of functional proteins, or proteome, that the genome encodes. The latest GENCODE effort reaffirms the ~20,000 protein-coding genes in the human genome, but also emphasizes that the full protein-coding capacity of the genome, and the full extent to which different transcripts and isoforms are differentially expressed, remain to be determined. Alternative splicing of pre-mRNAs has profound biological significance in metazoan organisms, as it can produce many variant protein products from a single gene. The precise number of proteins produced by alternatively spliced isoforms from the human genome is not known, and the ENCODE Consortium estimates that at least 18% of loci contain as-yet-unannotated exons.
Our Human ORFeome efforts have provided the scientific community with at least one high-quality ORF model, or “reference ORF”, for most annotated protein-coding genes. What remains elusive is the systematic cloning of full-length alternative ORFs, which would allow expression and biochemical and functional characterization of all alternatively spliced proteins in the human proteome. It is well established that different isoforms of a gene can exhibit different functions, and differential function is likely reflected by differential protein interactions. Cloning full-length ORFs is important because functional differences are best understood when comparing full-length proteins.
The advent of systems biology necessitates the cloning of nearly entire sets of protein-encoding open reading frames (ORFs), gathered into ORFeome collections, so as to allow functional studies of the corresponding proteomes. Over the years, we have developed genome-scale ORF cloning and verification projects for various organisms. The Gateway-based full-length ORFeome cloning strategy has had tremendous success in numerous projects in the Center for Cancer Systems Biology (CCSB) at the Dana-Farber Cancer Institute. Our ORF-cloning pipeline has the following steps: i) predicted ORFs are precisely PCR-amplified between annotated initiation and termination codons (ATG to STOP), with either a cDNA library or RT-PCR as template and with primers that add 5’-tails containing Gateway recombinational cloning sites; ii) the resulting PCR products are recombined directionally into a Gateway Donor vector to create Entry clones; and iii) ORF sequence tags (OSTs) are obtained from the Entry clones, experimentally verifying the existence and intron-exon structure of the corresponding coding isoform.
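As an illustration of the "ATG to STOP" criterion in step i), the following is a minimal sketch of how a candidate sequence could be checked for full-length ORF structure. It is not the pipeline's actual verification code; the function name `is_full_length_orf` and the choice to reject sequences with a premature in-frame stop codon are assumptions made for this example, and only the three standard stop codons are considered.

```python
# Hypothetical helper, not part of the CCSB pipeline: checks that a DNA
# sequence runs in frame from an ATG start codon to a terminal stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only


def is_full_length_orf(seq: str) -> bool:
    """Return True if seq starts with ATG, ends with a stop codon,
    has a length divisible by 3, and has no earlier in-frame stop."""
    seq = seq.upper()
    # Need at least a start and a stop codon, and a whole number of codons.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Assumption: a premature in-frame stop disqualifies the sequence.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


if __name__ == "__main__":
    for candidate in ("ATGAAATAA", "ATGTAAAAATAA", "ATGAAA"):
        print(candidate, is_full_length_orf(candidate))
```

A check of this kind only covers sequence structure; confirming the intron-exon structure of an isoform still depends on the OST sequencing described in step iii).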
We present a first version of a human isoform ORFeome, human isoORFeome v1.1, containing 1,423 isoforms from 506 genes, of which 917 were not previously available as ORF clones. All are clonally derived, sequence-confirmed ORFs, provided as a set of Gateway Entry clones ready for transfer to Gateway-compatible expression vectors.
Binary protein-protein interaction mapping has allowed us to discriminate isoform-specific interactions. For over half of the same-gene isoform pairs in which each isoform exhibited at least one interaction, the comparative interactome profiles differ by 50% or more, suggesting that distinct isoforms engage in different sets of interactions, which in turn likely lead to functional differences.
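To make the idea of comparing interaction profiles concrete, here is a minimal sketch of one way the difference between two isoforms' interaction partners could be quantified. The text above does not specify the metric used, so the Jaccard distance (one minus shared partners over all partners) is an assumption chosen for illustration, and the function name and protein identifiers are hypothetical.

```python
# Hypothetical sketch: Jaccard distance between the interaction partner
# sets of two isoforms of the same gene. The metric actually used for the
# isoORFeome analysis is not stated in this document.
from typing import Iterable


def interaction_profile_difference(partners_a: Iterable[str],
                                   partners_b: Iterable[str]) -> float:
    """Return 1 - |A intersect B| / |A union B| for two partner sets."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        # Mirrors the restriction to isoforms with at least one interaction.
        raise ValueError("at least one isoform must have an interaction")
    return 1 - len(a & b) / len(union)


if __name__ == "__main__":
    # Made-up partner identifiers, for illustration only.
    isoform_1 = {"P1", "P2", "P3"}
    isoform_2 = {"P2", "P3", "P4"}
    print(interaction_profile_difference(isoform_1, isoform_2))  # 0.5
```

Under this assumed metric, a value of 0.5 or higher would correspond to a pair of isoforms whose profiles "differ by 50% or more".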