I put the source code for the simulations in last month's tree topology paper online.
I just had a paper published online in BMC Evolutionary Biology. I've hosted a compiled PDF with all the figures inline, while their production office gets the official version together.
This was a fun project. It's essentially showing how the basic models of selection in population genetics play out in detailed phylogenies, where the passage of time is clearly evident. I think a visual / phylogenetic approach really helps to understand the processes at work. In this case, rather than describing an allele that reaches 100% frequency, the process of fixation describes a lineage that outcompetes its contemporaries and comes to be the progenitor of the entire future population. A fundamental finding in population genetics is that selection reduces effective population size. That is, when there is heritable variation for fitness, the patterns of ancestry connecting individuals will resemble those of a smaller population. Essentially, only a few fitter individuals have a chance of contributing their genetic legacy to the future population. This paper explores the effects of selection on phylogeny shape, with particular attention to uncovering selective dynamics through time.
It's a reassuring thing when science works the way it's supposed to and the pieces mesh together. A paper by McLeish and colleagues just came out looking at sero-prevalence of pandemic H1N1 from 2009 to winter 2010 in Scotland. This comes from measuring antibodies specific to pandemic H1N1 in a large sample of the population. McLeish et al. find that through March 2010, approximately 35% of people in Scotland were infected with pandemic H1N1 influenza. This number apparently came as a surprise to the media and the story has been getting a lot of play.
However, what's cool is that this 35% number is exactly what we expected from our epidemiological models. The basic reproductive number R0 measures the number of secondary infections expected to result from a given infection in a naive host population (a population with no previous exposure). This number was quite small for the H1N1 pandemic, estimated at around 1.2 to 1.3 from the early upswing of the pandemic in 2009. The original paper detailing the SIR model (Kermack and McKendrick 1927) gives the formula:
Z = 1 - exp(-R0 Z), where Z represents the final size of an epidemic (in terms of proportion of the population infected). Numerically solving this for these values of R0 gives an expected final epidemic size of 31% to 42%. This is amazingly on target.
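For anyone who wants to check the arithmetic, here is a minimal sketch of that numerical solve. It assumes the standard final size relation Z = 1 - exp(-R0 Z) and finds the nonzero root by bisection. The function name `final_size` and the tolerance are my own choices for illustration, not anything from the original analysis.

```python
import math


def final_size(r0, tol=1e-12):
    """Solve Z = 1 - exp(-r0 * Z) for the nonzero root by bisection.

    Assumes the standard SIR final size relation. For r0 <= 1 the only
    root in [0, 1] is Z = 0, meaning no epidemic takes off.
    """
    if r0 <= 1.0:
        return 0.0

    def f(z):
        # f(z) = z - 1 + exp(-r0 * z), written with expm1 for accuracy near 0
        return z + math.expm1(-r0 * z)

    # For r0 > 1, f is negative just above 0 and positive at z = 1,
    # so the nonzero root is bracketed by (lo, hi).
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for r0 in (1.2, 1.3):
        print(f"R0 = {r0}: final size = {final_size(r0):.3f}")
```

Running this for R0 of 1.2 and 1.3 lands at roughly 31% and 42%, the range quoted above. Bisection is used here only because it is simple and cannot diverge; any root finder would do.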
I did a small write-up on the scaling of the effective number of parameters in phylogenetic inference. I was surprised by how nicely it worked out. Basically, each additional taxon included in the model contributes one additional parameter.
I finally got the simulation code associated with the "Global migration dynamics" paper up and online. Hopefully this will prove useful to other groups working on the evolutionary dynamics of viral pathogens. Source code and analysis can be found here.
I've been trying to bring myself somewhat up to date with current web technologies. I got my coalescent visualization ported over to HTML5 / JavaScript by using Processing.js. If you're running Chrome or Safari, then this definitely makes for the better version.
Another paper to share. Here, I helped with the population genetic side of things in Rebekah Rogers's analysis of a new gene in Drosophila. Before I left the Hartl lab, I worked with Rebekah on a bioinformatic analysis of chimeric genes. These are strange accidents of evolution where two functioning genes are spliced together to create a new gene with bits from both parental genes. We found 14 such genes in Drosophila, one of which is the focus of this current work. This new gene, which we're calling Quetzalcoatl, appears to be fantastic for the flies that possess it.
From my perspective, it's good to see the bioinformatic work pay dividends.
Reuters: "Who to blame for flu? Maybe the US, study finds."
This is hilarious. Of course it has to be someone's fault. Wow.
Ph.D. comics did a piece on this. Fortunately, the articles produced by science writers were actually pretty good: U of M News Service, HHMI News, CIDRAP News.
Today, my paper on migration patterns in the flu virus was published in PLoS Pathogens. This was fun work to do, requiring approaches from multiple disciplines. While the basics of the migration model came from population genetics and coalescent theory, fitting this model to sequence data required a lot of heavy-lifting computation implemented by Peter Beerli in the program Migrate. I originally wrote my program PACT to deal with the enormous (2000+ tips) phylogenetic trees produced by this analysis. Additionally, a lot of epidemiology went into making realistic simulations on which to hone the methods.
The common ancestor of all contemporaneous H3N2 flu can be traced back to a single infection occurring somewhere in the world approximately 2-5 years beforehand. This infection, by luck and by virtue of its genotype, becomes the progenitor of the entire worldwide flu population. The main goal of this analysis was to trace this progenitor lineage through time. We found that this lineage existed primarily in China and Southeast Asia, but also, surprisingly, in the USA. The occasional presence of this progenitor lineage in the USA has important public health implications.
I'm not terribly happy with the PLoS presentation. Rather than keeping figures as line art, they were converted to low-quality bitmaps. Also, I don't like the splitting of the supporting information into 10 different files. So, in addition to the paper, I'm hosting high-quality PDFs of the figures and a single PDF of the entire supporting appendix. Go here.
About a week ago I gave an internet seminar on phylogenetics of the influenza virus over at phyloseminar.org. I'm very happy someone stepped up to organize something like this (thanks Erick!). You can watch me talk to my computer for an hour if you'd like, and the other seminars are really good as well.