CIS Computing & Information Services

2013 CCV/EPSCoR Bioinformatics Workshop

Friday, October 18, 2013

Digital Scholarship Lab, Rockefeller Library

Session 1: Computing & Sequencing Infrastructure, 9:00 - 10:00am

Session 2: Research Talks, 10:30am - noon

  • Oligotyping: Using information theory to find order in microbial community data
    A. Murat Eren (MBL)
    The availability and affordability of high-throughput sequencing platforms offer a unique opportunity for microbial ecologists to better understand the diversity of bacteria in natural environments. However, for the sake of computational feasibility, popular bioinformatics approaches for the analysis of bacterial community data often overlook ecological trends that are manifested by subtle nucleotide variations. Oligotyping is a supervised computational method that uses Shannon entropy to help researchers recover diversity patterns that widely used bioinformatics pipelines may have failed to explain.

  • Use of genome data to explore patterns of protein evolution
    Bob Campbell (Brown/MPPB and MBL)
    Glycoprotein hormones are essential for reproduction in vertebrates, and disruption of their function impairs fertility. Molecular studies revealed evidence for gene invention and functional changes to this hormone network during the evolution of mammals, in particular within the primate lineage. The increasing availability of complete genome data is an opportunity to apply phylogenetic methods to clarify the temporal order of these evolutionary changes, and to test for evidence of directional (positive) selection on specific amino acid positions. The results are being assessed with knowledge from comparative endocrinology to understand how network components evolve and adjust through the emergence of new functions during evolution.

  • Computational modeling of the metabolism and evolution of marine microbes
    Ying Zhang (URI)
    Microbes account for as much as 90% of the total biomass in the ocean and dominate the diversity, productivity, and resilience of marine ecosystems. The advance of high-throughput molecular technologies has enabled the identification of individual molecules that are important for the activities of marine microbes. However, a systems-level understanding of the function and evolution of microbial life can only be reached through the construction of genome-scale models combining knowledge from multiple disciplines. This talk will present procedures for integrated modeling that combines comparative genomics, structural biology, and systems biology. Examples will be given to demonstrate the application of such models for studying the metabolism and evolution of marine microbes.
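
As a rough illustration of the entropy idea mentioned in the oligotyping abstract above, the sketch below computes Shannon entropy for each column of a set of aligned reads and groups reads by the bases found at the high-entropy columns. This is a minimal sketch and not the oligotyping software itself: the function names, the toy reads, and the 0.5-bit threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy, in bits, of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over the observed characters
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def entropy_profile(aligned_reads):
    """Entropy of every column across equal-length aligned reads."""
    return [column_entropy(col) for col in zip(*aligned_reads)]

# Hypothetical toy alignment (not real data)
reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA"]
profile = entropy_profile(reads)

# Assumed threshold: keep columns whose entropy exceeds 0.5 bits
threshold = 0.5
informative = [i for i, h in enumerate(profile) if h > threshold]

# Label each read by its bases at the informative columns and count the labels
oligotypes = Counter("".join(read[i] for i in informative) for read in reads)
print(profile, informative, dict(oligotypes))
```

In this toy case only the third column varies, so it is the only one selected, and the reads split into two groups according to the base at that position.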

Lunch Break, noon - 1:00pm

Session 3: Short Research Talks, 1:00 - 1:45pm

  • De novo assembly of transcriptomes and genomes
    Adrian Reich (Wessel Lab, MCB)
    Echinodermata is a diverse phylum that spans 500 million years of evolution, but the vast majority of known genetic information within the phylum is from a single species of sea urchin, S. purpuratus. In the current study, we have sequenced and assembled de novo transcriptomes of ovaries of twenty different species, representing all five families of echinoderms. The goals of this dataset are to identify orthologous genes, resolve the phylogenetic relationships of extant echinoderms, and identify rapidly evolving genes. In addition, we have sequenced the genomes of two echinoderms and are attempting to assemble these de novo.

  • SOWHAT? The Swofford-Olsen-Waddell-Hillis (SOWH) Test of Topologies
    Samuel Church (Dunn Lab, EEB)
    The Swofford-Olsen-Waddell-Hillis (SOWH) test is a probabilistic evaluation of competing phylogenetic topologies. Using a parametric resampling approach, the SOWH test creates a null distribution of differences in likelihood values and tests the observed data against this distribution. This test is currently implemented using a complicated set of instructions that require manual manipulation of data. It also appears to be unreliable for some data sets. We present a program that automates the complicated steps of the SOWH test, along with a more reliable alternative called SOWHat.

  • Analysis of genomes for drug targets
    Bob Campbell (Brown/MPPB and MBL)
    The initial sequencing of parasite genomes revealed numerous homologs of human drug targets. This led to comparative genomics studies to comprehensively identify candidate drug targets from Neglected Tropical Disease pathogens, contributing to the creation of the WHO-sponsored TDR Targets Database. This also fostered projects to mobilize "old" medicinal chemistry from industry for "new" drug discovery against neglected diseases. Despite this progress, it has remained difficult to produce high-quality drug leads. This short talk will briefly review the work to date, and considerations for integrating bioinformatics approaches with other tools and knowledge to improve outcomes of parasite drug discovery programs.
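
The final comparison step described in the SOWH abstract above (testing an observed likelihood difference against a null distribution) might be sketched as follows. This is an illustration only and not the program presented in the talk: it assumes the likelihood differences have already been computed by external phylogenetic software, the function name is hypothetical, the numbers are made up, and the p-value is taken as a plain fraction with no small-sample correction.

```python
def sowh_p_value(observed_delta, null_deltas):
    """Fraction of simulated likelihood differences >= the observed difference."""
    if not null_deltas:
        raise ValueError("null_deltas must not be empty")
    exceed = sum(1 for d in null_deltas if d >= observed_delta)
    return exceed / len(null_deltas)

# Made-up values for illustration: differences in log-likelihood between the
# unconstrained and constrained topologies, one per simulated data set
null_deltas = [0.4, 1.2, 0.1, 2.5, 0.9, 0.3, 1.8, 0.6, 0.2, 1.1]
observed_delta = 1.5

p = sowh_p_value(observed_delta, null_deltas)
print(p)
```

A small value would suggest that the observed difference is larger than expected if the constrained topology were true. In practice many more simulated data sets than the ten shown here would be needed.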

Session 4: Data Management, 2:00 - 3:00pm

  • Data management is fun
    Casey Dunn (EEB)
    Managing data for a large, data-intensive project with multiple researchers can get more complicated than the analyses themselves. I'll discuss some of the challenges we've faced in my lab and how we have approached them.

  • Data Management: Critical Success Factors
    Jaime Combariza (CCV)
    The ready availability of powerful instruments such as genome sequencers, telescopes, electron microscopes, and remote sensing devices, coupled with the increased capability of supercomputers to process vast amounts of information, has resulted in an explosion of data (the 4th paradigm of science). This poses challenges not only for analyzing the data and moving science forward but also, and perhaps more importantly, for its proper management and curation. These challenges include planning for data management; documentation and metadata; data structure and organization; security and backup; citation; dissemination; and sharing. In this talk I will describe recommended best practices for data management that, if properly applied, will strengthen collaborative endeavors (technical and scientific) between researchers and the unit/center that may be the steward of the data. These best practices are also critical components of a "data management plan," which is a requirement for some funding agencies.