I had a great time at the ISMB conference & related events in Boston last week. As always, it is an exciting event scientifically and socially.
First there was the Open Bio Foundation’s Codefest for contributors to the Bio* libraries and other open source bioinformatics tools. It was a fun two days of collaboration and coding. I worked on designs for a multiple structure alignment data structure for BioJava, although we didn’t actually finish an implementation.
Other 3DSIG talks I found particularly interesting or relevant to my research were:
- Ray Stevens drives a well-oiled machine for solving GPCR structures. They’re trying to get structures for all 826 human GPCRs in the next couple of years, and are on track with 2-3 new structures per month.
- Debora Marks talked about EVFold.
- Rachel Kolodny discussed protein fold space. For “fold-like” thresholds, structural similarity networks form a large connected component and lots of small disconnected components (see my ISMB 2011 poster on the same topic). She finds that the large component of alpha/beta proteins disintegrates at a certain threshold rather than smoothly transitioning to smaller and smaller modules, and interprets this as a problem with our attempts to classify alpha/beta proteins as discrete folds. However, it occurs to me that this might just be a problem with our scoring functions, since RMSD is lower in unrelated alpha proteins due to their inherent globularity.
- Jeff Skolnick gave a scary talk (for me) about whether fold space reflects evolutionary relationships or physical constraints. Quite a lot of it seems to be physics: a very simple model for making random protein structures is able to replicate many properties of real proteins, including packing, surface pockets, and structural motifs. He even sees “sequence conservation” in similar pockets of random proteins! That’s bad news for those of us trying to use structural comparison for detecting distant homology. He concluded by saying that the only valid way to demonstrate homology is through careful phylogenetic analysis of not-too-distant relatives.
- Aled Edwards complained that 90% of research focuses on 10% of proteins (in kinases, at least), and called for more funding for studies on poorly characterized proteins. Science should have a grant to “fund weird-ass stuff,” he colorfully explained. He wasn’t just ranting about the system, though; he’s also trying to do something about it. Specifically, his lab is developing new inhibitors for uncharacterized kinases, which they give away for free to universities and industry. One compound, JQ1, is in clinical trials only two years after its discovery due to the free sharing of information and chemicals.
- My advisor, Phil Bourne, talked about how current databases like the PDB are inefficient and poorly integrated. He introduced his new idea of ‘The Commons’, which would allow Dropbox-like transfer of information between groups, workflow-like features for provenance and reproducibility, and a host of other features for big data. BioInform did an article about his BOSC talk on The Commons (he did four talks at ISMB!).
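To make the network picture from Rachel Kolodny's talk a bit more concrete, here is a toy sketch of how a thresholded similarity network breaks into connected components. This is purely my own illustration and not her method: the structure names, the scores, and the function names are all made up, and the "similarity" is just an arbitrary number where higher means more similar.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest component first.
    return sorted(groups.values(), key=len, reverse=True)


def component_sizes(scores, threshold):
    """Keep only pairs scoring at or above the threshold; return component sizes."""
    nodes = sorted({n for pair in scores for n in pair})
    edges = [pair for pair, score in scores.items() if score >= threshold]
    return [len(c) for c in connected_components(nodes, edges)]


# Hypothetical pairwise similarity scores between seven made-up structures.
TOY_SCORES = {
    ("A", "B"): 0.90,
    ("B", "C"): 0.60,
    ("C", "D"): 0.55,
    ("D", "E"): 0.80,
    ("F", "G"): 0.70,
}

for t in (0.5, 0.65, 0.85):
    print(t, component_sizes(TOY_SCORES, t))
```

In this toy data the five-member component falls apart as soon as the threshold passes the weak B-C and C-D links, which is the kind of abrupt disintegration (rather than gradual shrinking) described in the talk. Real analyses would of course depend heavily on the choice of scoring function.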
Then it was on to three days at ISMB. Structural biology is always a bit lacking at the main conference, but it’s a good conference for keeping up with current trends in bioinformatics.
- Metagenomics is hot. So are MOOCs (massive open online courses).
- There are tons of cloud computing systems starting up: CloudBioLinux, BaseSpace (Illumina), Seven Bridges (disclaimer: they gave me swag), Curoverse (sponsors of Codefest), Synapse (Sage Bionetworks; actually more of a workflow, but has a cloud component), etc. Most will give free time to developers if you ask for it.
- The $1000 genome race has decreased genome quality, according to Gene Myers. Biases in sequencing errors can prevent good assembly even at high read depth, so he’s a strong advocate of PacBio’s tech, which gives long reads and unbiased errors.
- Steven Brenner discussed how scientists deal with privacy concerns. While transparency may be good for science, the fact is that we aren’t very good at managing private patient data. He had numerous examples of various “leaks”, from the recent anthrax/smallpox containment breaches to about 20 reported privacy problems per month in California alone. He also called for clearer ethical standards on how to deal with leaked data. If it’s public knowledge, can you still use it for research?
- Dana Pe’er is doing some amazing things with single-cell analysis. The problem with flow cytometry is that it’s really noisy and stochastic, to the point that correlations are really weak. She seems to have overcome this problem and had some great results, both in benchmarks and in novel predictions.