I've spent a bit more time cleaning up my LaTeX template to allow fully automated web display. You can find it over on GitHub. It is currently set up for the canalization paper, but it should make a good basis for any sort of scientific manuscript. I've provided style sheets and a Ruby script to clean up output from TeX4ht into a presentable web version. This web version is hosted automatically through GitHub Pages. Thus, running a single script compiles the LaTeX source to HTML, and a GitHub push updates the public web version of the manuscript. I hope this sort of approach will prove useful for collaborative writing.
It was easy to run this on my previous manuscripts written in LaTeX. I now have web versions of the tree topology and the global migration dynamics papers up.
In the Serengeti food web paper, we present a network diagram of predator-prey relationships, illustrating network structure (Figure 3). In getting with the times, we've also made an interactive version of this figure, presenting the network in a force-directed layout. Ed coded this up in d3.js based on a version I did in Processing. Green nodes represent plants, blue nodes represent herbivores and red nodes represent carnivores. Edges connecting nodes pull them toward each other following Hooke's Law, while nodes are repelled from each other according to Coulomb's Law. We add an additional force pulling nodes belonging to the same group toward each other.
My favorite part of the visualization is the concept of focus. If you click on a node, the spring forces applied to the edges of this node are magnified, pulling its connections closer. This makes it easier to explore relationships in the network. A double-click removes focus.
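The force rules described above can be sketched roughly as follows. This is a minimal illustration in Python, not the d3.js or Processing code behind the figure: the function name `step`, the constants (`k_spring`, `rest`, `k_repel`, `focus_gain`, `dt`) and the simple explicit update are all assumptions made for the example, and the extra within-group attraction is left out for brevity.

```python
import math

def step(pos, edges, focus=None, k_spring=0.05, rest=1.0,
         k_repel=0.1, focus_gain=4.0, dt=0.1):
    """Move every node one small step along the net force acting on it.

    pos   -- dict mapping node name to an (x, y) tuple
    edges -- list of (node, node) pairs
    focus -- optional node whose springs are stiffened (hypothetical parameter)
    """
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # Coulomb-like repulsion: every pair of nodes pushes apart with 1/d^2 strength.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = k_repel / d ** 2
            fx, fy = f * dx / d, f * dy / d
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy

    # Hooke-like springs: each edge pulls its endpoints toward the rest length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        d = math.hypot(dx, dy) or 1e-9
        # Springs attached to the focused node are made stiffer.
        k = k_spring * (focus_gain if focus in (a, b) else 1.0)
        f = k * (d - rest)
        fx, fy = f * dx / d, f * dy / d
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    return {n: (pos[n][0] + dt * force[n][0], pos[n][1] + dt * force[n][1])
            for n in pos}
```

Calling `step` repeatedly lets the layout relax; passing `focus="some node"` stiffens that node's springs, which is one way to get the "pull my neighbours closer" behaviour described above.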
Our paper on modeling food webs was just published in PLoS Computational Biology. Here, I was happy to bring the statistics I've learned from phylogenetic analysis to an entirely different field. I advised Ed Baskerville in implementing MCMC and marginal likelihood estimation for network data. In this case, the data is a matrix of predator-prey relationships, which can be thought of as a network of directed edges specifying who-eats-whom. We investigated structure in the Serengeti food web through a model in which groups of species behave similarly to one another in terms of what species they eat and what species they are eaten by. The inferred model shows a high degree of trophic and spatial clustering in which a number of spatially distinct plant groups are fed upon by a few wider-ranging herbivore groups, which are in turn fed upon by just a couple of predator groups.
Also of possible interest, the supporting appendix provides a nice overview of the use of Bayesian methods for inference on network data. The model we present here really should be useful in a variety of biological contexts; genetic regulatory networks and protein interaction networks immediately come to mind. Photo by Andy Dobson.
| Journal A | Accept | Reject |
| --- | --- | --- |
| Accept | 71 (67) | 35 (43) |
| Reject | 42 (43) | 31 (27) |

| Journal B | Accept | Reject |
| --- | --- | --- |
| Accept | 57 (47) | 18 (27) |
| Reject | 16 (27) | 25 (15) |
I discovered this paper by Peter Rothwell and Christopher Martyn through an excellent blog post by Bradley Voytek. In the paper, the authors show that reviews of the same paper by two independent reviewers show a level of agreement little better than expected by chance alone. The authors repeat their experiment across two neuroscience journals. For the first journal, they have 179 pairs of reviews, with 219 of the 358 votes (61%) recommending acceptance or acceptance with revision. If votes between reviewers were distributed entirely by chance, we would expect 67 accept-accept pairs, 43 accept-reject pairs, 43 reject-accept pairs and 27 reject-reject pairs. However, if the reviewers are coming to some sort of scientific consensus we would see an overabundance of accept-accept and reject-reject pairs.
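The chance expectation quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `expected_pair_counts` is made up for the example, and it assumes both reviewers accept independently with the same pooled acceptance probability.

```python
def expected_pair_counts(n_pairs, accept_votes):
    """Expected counts of each review pair if votes were independent.

    n_pairs      -- number of manuscripts, each with two reviews
    accept_votes -- total number of 'accept' votes across all reviews
    """
    p = accept_votes / (2 * n_pairs)  # pooled probability of an accept vote
    return {
        "accept-accept": n_pairs * p * p,
        "accept-reject": n_pairs * p * (1 - p),
        "reject-accept": n_pairs * (1 - p) * p,
        "reject-reject": n_pairs * (1 - p) ** 2,
    }

# Journal A: 179 pairs of reviews, 219 of 358 votes recommending acceptance.
journal_a = {k: round(v) for k, v in expected_pair_counts(179, 219).items()}
print(journal_a)
# {'accept-accept': 67, 'accept-reject': 43, 'reject-accept': 43, 'reject-reject': 27}
```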
Here, I've shown their findings, with observed and (expected) counts for each scenario. In journal A, there appears to be little or no difference from the chance expectation, while journal B shows a very modest improvement over the chance expectation. A simple Fisher's exact test gives a P value of 0.285 on the results of journal A and a P value of less than 0.0001 on the results of journal B. Additionally, Rothwell and Martyn find little correspondence in reviewers' assessments of priority of publication.
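For readers who want to run this kind of test on the 2×2 counts themselves, here is a small standard-library sketch of a two-sided Fisher's exact test. The function name is hypothetical, and the two-sided convention used (summing all tables that are no more probable than the observed one) is one common choice and an assumption here; other conventions or software may give somewhat different P values.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

p_journal_a = fisher_exact_two_sided(71, 35, 42, 31)
p_journal_b = fisher_exact_two_sided(57, 18, 16, 25)
print(p_journal_a, p_journal_b)
```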
Interestingly, the authors also studied the reproducibility of abstract acceptance at two different scientific conferences. Here, each abstract was reviewed and rated on a 1 to 6 scale by a panel of 14 or 16 reviewers. In this case, variance across abstracts can be assessed, but also variance across reviewers (we expect some reviewers to be tougher than others in their assessments). Rothwell and Martyn find a very modest R² across abstracts of 0.11–0.15, indicating very little reviewer agreement. However, R² across reviewers was a more respectable 0.27–0.32, suggesting more variation in reviewer "toughness".
Thus, it appears that in small samples of two or three reviewers, noise from positive/negative reviewer bias may swamp the signal of a particular manuscript. This fits with my own anecdotal experiences. Usually (but by no means always) reviewers seem to agree on what's lacking in a manuscript, but will often disagree on how terrible a particular failing is to the manuscript's prospects. Perhaps if each reviewer's overall positive/negative rating bias were taken into account, we could arrive at a measure of manuscript quality that is more repeatable between independent reviewers. In turn, this could make authors less beholden to the roll of the reviewer die.
In an ongoing effort to be more open in my scientific dealings, I've posted a preprint of my latest paper to the arXiv and here on my website, as both PDF and HTML. This represents my first attempt at a straight-up modeling study. There's a lot going on with the epidemiology and evolution of influenza; I've made a model that attempts to capture all the salient details. This includes things like the yearly attack rates, rate of antigenic evolution, genetic diversity, and geographic spread. At its core, the model assumes that the antigenic phenotype of the virus can be adequately explained as a point in a Euclidean space. Mutation serves to jostle the location of the virus in this space, and infection by one virus confers immunity to subsequent infection by nearby viruses in this antigenic space. The geometric basis of the model stems from empirical studies of influenza's antigenic phenotype (see Smith et al. 2004). In this study, I find that evolution in such a space results in a "canalized" trajectory. The best move for a virus is to move as far away from its past as possible, resulting in linear antigenic movement and a distinctive single-trunked phylogenetic tree.
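To make the geometric idea concrete, here is a toy Python sketch. It is not the model from the paper: the function names, the Gaussian mutation step, and the rule that infection risk grows linearly with distance to the nearest previously seen strain are all assumptions chosen purely for illustration.

```python
import math
import random

def mutate(position, step_sd=0.1, rng=random):
    """Jostle a virus's antigenic location with a small Gaussian step (assumed form)."""
    return tuple(x + rng.gauss(0.0, step_sd) for x in position)

def infection_risk(virus, history, risk_per_unit=0.1):
    """Risk that a host with the given infection history is infected by `virus`.

    Assumed rule: risk rises linearly with antigenic distance to the nearest
    strain in the host's history, capped at 1. A naive host is fully susceptible.
    """
    if not history:
        return 1.0
    nearest = min(math.dist(virus, past) for past in history)
    return min(1.0, risk_per_unit * nearest)
```

Under a rule like this, a mutant that lands further from everything hosts have already seen finds more susceptible hosts, which is the intuition behind a virus moving away from its own past.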
I'm especially proud of my HTML version of the manuscript, which, through the magic of LaTeX, has all sorts of hyperlinking between figures and references. In addition, I've done my best to make something that's highly readable on the screen. Almost everything is taken care of by TeX4ht conversion from my LaTeX source together with a CSS stylesheet, so with only a little more work I should be able to fully automate the process.
I'm working now to put the source code for the simulations behind this online. In the meantime, I would very much welcome any feedback you might have on the manuscript. It's good to get feedback before publication, when there's still an opportunity to incorporate it.
I came across a simple visualization of England and Wales mortality data in the Guardian. And because I couldn't deal with the network-y display of hierarchical count data, I decided to redesign the graphic as a treemap. In googling for "treemap", I found d3.js, which makes extremely attractive JavaScript graphics, with a number of rather fancy built-in figure types. It seems a little harder to get into than Processing, as it exposes more of the raw JavaScript, but the results are beautiful and it provides full SVG support. Here's the mortality data laid out with d3's treemap algorithm.
In my paper on selection in viral phylogenies, I compared the effective population size of measles virus to the effective population size of human influenza virus. The concept of effective population size Ne is central to population genetics. It measures the timescale of population turnover, or, looking backwards in time, it measures how long it takes for individuals in the population to find a common ancestor. Genetic diversity is a combination of this timescale and mutation rate.
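As a rough illustration of that last point, under a standard neutral haploid coalescent (an assumption made here for the example, not a claim about the paper's methods), the expected time for two sampled lineages to find a common ancestor and the resulting pairwise diversity can be written as below. The symbols τ (generation time in years), μ (mutation rate per site per year) and π (expected pairwise differences per site) are introduced for this sketch.

```latex
\mathbb{E}[T_{2}] = N_e \tau,
\qquad
\pi \approx 2 \mu \, \mathbb{E}[T_{2}] = 2 \mu N_e \tau
```

The factor of 2 arises because mutations accumulate along both lineages leading back to the common ancestor.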
This is just a small addendum to that paper. I had wanted to include swine influenza in with the comparison of measles virus and human influenza virus, but decided that this would detract from the paper's focus. Here, the sequences of swine influenza come from de Jong et al. (2007).
The scaled effective population size Neτ of measles is estimated at 124.6 years, Neτ of global H3N2 human influenza is estimated at 7.2 years, and Neτ of European H3N2 swine influenza is estimated at 24.1 years.
This fits nicely with the observed patterns of antigenic evolution. Infection with measles confers life-long immunity; evolution of the measles genome does not change its antigenic phenotype. This results in neutral population dynamics. However, human influenza evolves in antigenic phenotype very rapidly, causing strong selective pressures that reduce effective population size. Swine influenza presents a nice example between these two extremes. In comparing rates of antigenic evolution, de Jong et al. find that "while human H3N2 viruses have evolved at a rate of about 2.0 antigenic units per year since 1982, swine H3N2 viruses have evolved more than six times more slowly, about 0.3 antigenic units per year." In this case, selective pressures still reduce effective population size, but not to the degree seen in human influenza.
In my work on flu I've been trying to build joint evolutionary and epidemiological models, where natural selection emerges dynamically from influenza strains competing for susceptible hosts. In speaking on this, I found it useful to broaden the context a bit. Here, you can think very generally of genetic / ecological variants competing with one another in some sort of ecological space. Variants that are close together in this space strongly compete, while more distant variants exist more-or-less independently.
This is exactly the model that Darwin used to illustrate the Origin of Species. Here, I've described this idea in a bit more detail and built a visualization of the model.
I put the source code to the simulations in last month's tree topology paper online.
I just had a paper published online in BMC Evolutionary Biology. I've hosted a compiled PDF with all the figures inline, while their production office gets the official version together.
This was a fun project. It's essentially showing how the basic models of selection in population genetics play out in detailed phylogenies, where the passage of time is clearly evident. I think a visual / phylogenetic approach really helps to understand the processes at work. In this case, rather than describing an allele that reaches 100% frequency, the process of fixation describes a lineage that outcompetes its contemporaries and comes to be the progenitor of the entire future population. A fundamental finding in population genetics is that selection reduces effective population size. That is, when there is heritable variation for fitness, the patterns of ancestry connecting individuals will resemble those of a smaller population. Essentially, only a few fitter individuals have a chance of contributing their genetic legacy to the future population. This paper explores the effects of selection on phylogeny shape, with particular attention to uncovering selective dynamics through time.