Less than 2% of the human genome sequence is involved in transcription of protein-coding genes. Approximately 5% has been evolutionarily conserved within the mammalian lineage. Is the rest really just "junk" DNA, or does it play an important functional role?
We are interested in investigating the relationship between DNA sequence and important biological processes such as chromosome X inactivation, aberrant DNA methylation in cancer, and DNA replication. In general, these processes act differently in distinct regions of the genome, which allows the regions to be divided into classes. Characteristics of the genomic sequence are used to build a feature vector that represents the genomic profile of each region. Individual genomic features whose distributions differ significantly between regions of one class and regions of the other are identified using statistical techniques such as Mann-Whitney/Wilcoxon rank-sum tests and signal-to-noise ratios. Machine learning classifiers, such as support vector machines (SVMs), are then trained on these significant features and used to predict the class of novel regions of the genome, suggesting how each region will be affected in a particular process.
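The feature-selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's actual code: the function names, the dictionary-based feature vectors, and the particular signal-to-noise definition (difference of class means divided by the sum of class standard deviations) are all assumptions chosen for the example. The Mann-Whitney part computes only the U statistic; in practice a statistics library would be used to turn it into a p-value, and the top-ranked features would then be passed to a classifier such as an SVM.

```python
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Assumed definition: (mean_a - mean_b) / (sd_a + sd_b)."""
    denom = stdev(a) + stdev(b)
    if denom == 0:
        return 0.0  # both classes constant: treat as uninformative here
    return (mean(a) - mean(b)) / denom


def mann_whitney_u(a, b):
    """U statistic for sample a: number of (x, y) pairs with x > y.

    Ties contribute 0.5. No p-value is computed in this sketch.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def rank_features(class_a, class_b):
    """Rank features by how strongly they separate two classes of regions.

    class_a, class_b: lists of feature vectors, each a dict mapping a
    (hypothetical) feature name to a numeric value for one region.
    Returns (name, signal_to_noise, U) tuples, strongest separation first.
    """
    results = []
    for name in class_a[0]:
        a = [region[name] for region in class_a]
        b = [region[name] for region in class_b]
        results.append((name, signal_to_noise(a, b), mann_whitney_u(a, b)))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    class_a = [{"gc": 0.6, "alu": 5}, {"gc": 0.7, "alu": 4}, {"gc": 0.8, "alu": 6}]
    class_b = [{"gc": 0.1, "alu": 5}, {"gc": 0.2, "alu": 6}, {"gc": 0.3, "alu": 4}]
    for name, s2n, u in rank_features(class_a, class_b):
        print(name, round(s2n, 3), u)
```

Ranking by the absolute signal-to-noise value treats features enriched in either class as equally interesting; a real analysis would also correct for multiple testing across features.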
Currently, we primarily employ existing genomic annotations as DNA sequence features. These include the locations of genes and of retrotransposons such as Alu and L1 repeat elements, as well as the distributions of GC bases and of other specific short nucleotide sequences, such as 3- and 5-base sequences. While these have proven to be very informative, we are investigating the genome-wide creation of novel features that characterize DNA structural properties such as bendability, curvature, and thermostability. These features may provide better clues as to the mechanistic role of DNA sequence in these processes.
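As a rough illustration of the simplest sequence features mentioned above, the following Python sketch computes the GC fraction and the relative frequencies of short nucleotide sequences (k-mers) for a single region. The function names and the handling of ambiguous bases such as N are assumptions made for this example; annotation-based features such as gene or repeat locations would come from genome annotation tables rather than from the raw sequence.

```python
from collections import Counter

BASES = set("ACGT")


def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in BASES)
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def kmer_frequencies(seq, k):
    """Relative frequency of each k-mer seen in seq (overlapping windows).

    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= BASES:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def sequence_features(seq, k=3):
    """Combine GC fraction and k-mer frequencies into one feature dict."""
    features = {"GC": gc_fraction(seq)}
    features.update(kmer_frequencies(seq, k))
    return features


if __name__ == "__main__":
    region = "ACGTACGTNNGGCC"  # made-up sequence for illustration
    print(sequence_features(region, k=3))
```

Calling `sequence_features` on each region with `k=3` or `k=5` would give one feature dictionary per region, which could then be compared between classes.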
We maintain a full mirror of the UCSC Genome Browser at http://genome-mirror.duhs.duke.edu. This provides not only direct access to the database of annotations for the human and other genomes, but also the ability to view new annotations created within the lab.
Students in the lab are generally enrolled in the graduate program in Computational Biology & Bioinformatics or come from the Department of Computer Science, but anyone with a strong computational background and an interest in biology should feel free to contact Terry.