<?xml version="1.0" encoding="UTF-8" ?>
<rss version="0.91">
<channel>
    <title>trevorbedford</title>
    <link>http://www.trevorbedford.com</link>
    <description>RSS feed for www.trevorbedford.com</description>
    <language>en-us</language>
    <copyright>Copyright 2009-2012 Trevor Bedford</copyright>
    
	<lastBuildDate>Mon, 07 May 2012 16:47:01 GMT</lastBuildDate>

	<item>
		<pubDate>Mon, 07 May 2012 11:22:00 GMT</pubDate>
		<title>Comparing performance of Processing.js and D3.js</title>
		<link>http://www.trevorbedford.com/archive/may_07_2012.html</link>
		<description><![CDATA[
		
		<p>
		I've just started on a project for <a href="http://code.google.com/soc/">Google Summer of Code</a>, mentoring <a href="https://plus.google.com/104602171363021110147">Michael Landis</a> at Berkeley.  Michael has proposed to build a browser-based tool to visualize phylogeographic output from <a href="http://beast.bio.ed.ac.uk">BEAST</a> and similar programs.  Here, we want to track the geographic locations of lineages or species through time across a phylogeny.  An animation would start at the root of the tree and work its way forward to the present, essentially slicing the phylogeny at each point in time and showing the distribution of lineage-specific locations at this slice.
		
		<p>
		We're still in the planning stages, and one of the big questions is which Javascript library to base this on.  The top two contenders are <a href="http://processingjs.org/">Processing.js</a> and <a href="http://d3js.org/">D3.js</a>.  Processing.js takes code written in <a href="http://processing.org/">Processing</a>, essentially stripped-down Java, and draws to an HTML canvas object, specifying pixels on a grid.  D3, on the other hand, is written in pure Javascript, and all of its manipulation is in terms of SVG objects, specifying lines, circles and so on.  Although I agree in part with <a href="http://en.wikipedia.org/wiki/Donald_Knuth">Knuth</a> that "premature optimization is the root of all evil," I wanted to see whether performance would put a definite tick in one column or the other.
		
		<p>
		Here, I coded up a simple Brownian-motion-style visualization using both libraries.  The Processing.js visualization is <a href="/performance/processing/">here</a> and the D3.js visualization is <a href="/performance/d3/">here</a>.  There are 500 particles, whose velocities are constantly bumped up and down by random noise.  The XY window is adjusted every frame to match the extent of the particles' XY locations.  In addition to random noise, there is some friction slowing down the particle velocities and an attraction of each particle to {0,0}, making this an <a href="http://en.wikipedia.org/wiki/Ornstein–Uhlenbeck_process">OU process</a>.  Particle sizes are proportional to their velocities.
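		<p>
		To make the dynamics concrete, here is a minimal Python sketch of the per-frame update described above: a random velocity bump, friction, and a pull back toward {0,0}, followed by refitting the window and sizing particles by speed. The parameter values and the names step and radii are hypothetical; the post does not give its constants, and the actual Processing.js and D3.js versions may integrate the motion differently.

```python
import math
import random

# Hypothetical parameters: the post does not state the values it used.
N_PARTICLES = 500
NOISE_SD = 0.5   # standard deviation of the random velocity bump per frame
FRICTION = 0.05  # fraction of velocity lost to friction per frame
PULL = 0.01      # strength of the attraction toward (0, 0)


def step(xs, ys, vxs, vys, rng):
    """Advance all particles by one frame and return the new XY extent."""
    for i in range(len(xs)):
        # Random bump, minus friction, minus a spring-like pull to the origin.
        vxs[i] += rng.gauss(0.0, NOISE_SD) - FRICTION * vxs[i] - PULL * xs[i]
        vys[i] += rng.gauss(0.0, NOISE_SD) - FRICTION * vys[i] - PULL * ys[i]
        xs[i] += vxs[i]
        ys[i] += vys[i]
    # The plotting window is refit to the particles' extent every frame.
    return (min(xs), max(xs)), (min(ys), max(ys))


def radii(vxs, vys, scale=1.0):
    """Particle size proportional to speed."""
    return [scale * math.hypot(vx, vy) for vx, vy in zip(vxs, vys)]


rng = random.Random(2012)
xs = [0.0] * N_PARTICLES
ys = [0.0] * N_PARTICLES
vxs = [0.0] * N_PARTICLES
vys = [0.0] * N_PARTICLES
for frame in range(100):
    x_range, y_range = step(xs, ys, vxs, vys, rng)
sizes = radii(vxs, vys)
```

		With these constants the friction and the pull toward the origin keep the particle cloud bounded, so the refit window stays finite while still jittering from frame to frame.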
		
		<p class="margin">		
		<table style="float:right; text-align:center; padding:0px 30px 10px 30px;" width=300>
			<tr><td><em>Browser</em></td><td><em>Processing.js (FPS)</em></td><td><em>D3.js (FPS)</em></td></tr>
			<tr><td>Safari</td><td>58</td><td>29</td></tr>
			<tr><td>Chrome</td><td>40</td><td>34</td></tr>
			<tr><td>Firefox</td><td>40</td><td>4</td></tr>			
		</table>			
		
		<p>
		Here, I've recorded the frame rates I was getting for both Processing.js and D3.js.  I ran all of this on my MacBook Pro, so your results may vary.  Processing.js running in Safari comes out on top, nearly hitting 60 FPS, while D3.js under Safari gives roughly half this.  Chrome fares substantially worse with Processing.js, but slightly better with D3.js, while Firefox does terribly with D3.js.  I would imagine that almost all of the differences here lie in how the browsers handle SVG versus canvas rather than in the D3 and Processing libraries themselves.  Still, although I'm sure SVG performance will continue to improve, for the moment Processing.js seems to be the clear winner.
		
		<p>
		Disclaimer: this is one particular visualization.  Incorporating other aspects (transparency, polygons, etc...) could give different results entirely.
		
		]]></description>
	</item>

	<item>
		<pubDate>Thu, 22 Mar 2012 12:25:00 GMT</pubDate>
		<title>Simulating virus evolution and epidemiology</title>
		<link>http://www.trevorbedford.com/antigen/</link>
		<description><![CDATA[
		
		<p>
		I've posted the code that I used in the <a href="/canalization/">canalization paper</a> to do large-scale simulations of the evolution and epidemiology of the human influenza virus.  I've called it <a href="/antigen/">'Antigen'</a>.  In it, each virus maintains a reference to its 'parent' infection, allowing the full infection tree to be built as the simulation proceeds.  I've found this very useful, in that it preempts the usual intermediate step of phylogenetic inference.  Here, a virus's antigenic phenotype is represented as a point in an <i>n</i>-d space.  Mutations move the virus in this space, and mutant viruses that are antigenically distant from previous infections have a transmission advantage.  This differential transmission leads to the antigenic drift of the virus population.  However, phenotype is just an interface; sequence-based phenotypes could easily be implemented.
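		<p>
		The parent-reference idea can be sketched in a few lines of Python. This is a hypothetical illustration, not code from Antigen: the class name, the Gaussian mutation step and the linear, capped infection-risk function are all assumptions, chosen only to show how an infection tree falls out of parent links and how antigenic distance to past infections can feed into transmission.

```python
import math
import random


class Virus:
    """Hypothetical sketch: a virus with an n-d antigenic phenotype and a parent link."""

    def __init__(self, phenotype, parent=None):
        self.phenotype = tuple(phenotype)  # point in n-d antigenic space
        self.parent = parent               # reference to the parent infection

    def mutate(self, rng, step_sd=1.0):
        """Return a mutant whose phenotype is jostled to a nearby point."""
        moved = [x + rng.gauss(0.0, step_sd) for x in self.phenotype]
        return Virus(moved, parent=self)

    def distance(self, other):
        return math.dist(self.phenotype, other.phenotype)

    def lineage(self):
        """Follow parent links back to the root of the infection tree."""
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return path


def infection_risk(virus, immune_history, per_unit=0.1):
    """Hypothetical risk: grows linearly with distance to the closest past infection, capped at 1."""
    if not immune_history:
        return 1.0
    closest = min(virus.distance(past) for past in immune_history)
    return min(1.0, per_unit * closest)


rng = random.Random(1)
root = Virus((0.0, 0.0))
child = root.mutate(rng)
grandchild = child.mutate(rng)
```

		Because every virus points to its parent, the genealogy of any sampled virus is available directly from lineage(), with no tree inference step.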
		
		<p>
		I was very happy with how fast I got things to run.  Individual-based models usually end up slow, which limits them to relatively few individuals.  Here, I can run 40 years of simulation with 90 million hosts in around 15 minutes.  With the code <a href="https://github.com/trvrb/antigen">up on GitHub</a>, I very much hope it gets used and improved upon by others in the community.
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 19 Mar 2012 09:50:00 GMT</pubDate>
		<title>Running the numbers for Contagion... they don't come out well</title>
		<link>http://www.trevorbedford.com/archive/mar_19_2012.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<img class="offset" src="/images/contagion_movie.jpg">		
		
		<p>
		I know I'm totally late to the party with this, but after seeing <a href="http://en.wikipedia.org/wiki/Contagion_(film)">Contagion</a> on the plane back from the States in December, I've been wanting to run the numbers comparing its stated epidemiological parameters with how quickly we see the pandemic spread.  I finally got my hands on a DVD copy, so I was able to go through and record some of the epidemiological details.  I was initially skeptical of the movie, thinking they'd do the standard Hollywood thing of grossly exaggerating everything.  With this in mind, I was pleasantly surprised to see <i>R</i><sub>0</sub> listed as a very reasonable 2 and not, say, 20, and the mortality rate listed at around 20% or 25% rather than a Hollywood 90%.
		
		<p>
		Here are some of the most relevant numbers stated during the film, which helpfully lists "Day X" at the beginning of many scenes.  On day 12, a news announcer states: "The WHO estimates the number of people who have been infected worldwide to be over 8 million."  At some point between day 26 and day 28, an announcer states: "The death toll in the United States is believed to have reached 2.5 million."  And around day 35, another announcer states: "[The virus] so far has taken over 26 million lives worldwide." 
		
		<p class="margin">
		<img class="offset" src="/images/contagion_realistic.png">				
		
		<p>
		These are the results of running a simple <a href="http://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology">SIR model</a> with <i>R</i><sub>0</sub> of 2, an exponentially distributed duration of infection with a mean of 3 days and a mortality rate of 20%. The left-hand panel shows the number currently infected (prevalence) through time.  The log <i>y</i>-axis makes it abundantly clear that an epidemic undergoes an initial phase of exponential growth, until the number of remaining susceptibles drops below a certain threshold (when <i>R</i><sub>0</sub> &times; <i>S</i> < 1, with <i>S</i> the fraction still susceptible), at which point the epidemic undergoes exponential decay.  Cumulative cases (and deaths), shown in the right-hand panel, initially increase exponentially, then level off after the pandemic peaks at around day 65 or 70.  The red circle shows the 8 million infections that supposedly exist at day 12.  In the SIR model, 8 million infections are not reached until day 50.  It's clear that the timeline of the pandemic has been vastly sped up to increase the scare factor.
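		<p>
		The SIR calculation can be reproduced with a short deterministic sketch in Python. Several details are assumptions not stated in the post: a world population of 7 billion (consistent with the 1.1 billion deaths mentioned below), a single initial infection, simple Euler integration with a 0.05-day step, and the function name run_sir. The exact day on which 8 million cumulative infections is reached depends on the seeding, so this sketch need not land exactly on the figure's day 50.

```python
def run_sir(r0=2.0, mean_duration=3.0, population=7e9,
            initial_infected=1.0, days=365, dt=0.05):
    """Deterministic SIR model integrated with a simple Euler scheme.

    Returns a list of (day, prevalence, cumulative infections).
    """
    gamma = 1.0 / mean_duration  # recovery rate for an exponential duration
    beta = r0 * gamma            # transmission rate implied by R0
    s = population - initial_infected
    i = initial_infected
    cumulative = initial_infected
    history = []
    for step in range(round(days / dt)):
        new_infections = beta * s * i / population * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        cumulative += new_infections
        history.append(((step + 1) * dt, i, cumulative))
    return history


history = run_sir()
day_8m = next(day for day, _, cum in history if cum >= 8e6)
peak_day = max(history, key=lambda row: row[1])[0]
deaths = 0.2 * history[-1][2]  # 20% mortality among everyone ever infected
```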
		
		<p class="margin">
		<img class="offset" src="/images/contagion_fit.png">			
		
		<p>
		Curious as to what sort of parameters would be necessary to get the extremely rapid pandemic shown in the movie, I fit <i>R</i><sub>0</sub> to the figure of 8 million infections by day 12, arriving at <i>R</i><sub>0</sub> = 4.92.  Although the overly rapid progression could perhaps be corrected by keeping basically the same movie and just relabeling the days, other aspects of the movie are internally inconsistent.  Cheever (Fishburne) says that "without a vaccine approximately one in 12 people [8% of the population] will contract the disease."  If <i>R</i><sub>0</sub> is 2, then we expect about 80% of the human population to eventually become infected.  For comparison, the relatively mild H1N1 pandemic managed to infect around <a href="/archive/jun_14_2011.html">35% of the population</a>.
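		<p>
		The 80% figure comes from the standard final-size relation for a closed SIR epidemic, <i>z</i> = 1 &minus; exp(&minus;<i>R</i><sub>0</sub> <i>z</i>), where <i>z</i> is the fraction of the population ever infected. Here is a minimal Python sketch that solves this relation numerically and also inverts it to ask what <i>R</i><sub>0</sub> would be consistent with Cheever's "one in 12". The function names are hypothetical, and the 7 billion world population in the last line is an assumption.

```python
import math


def final_size(r0, iterations=1000):
    """Solve the SIR final-size relation z = 1 - exp(-R0 * z) by fixed-point iteration."""
    z = 1.0
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


def r0_for_attack_rate(z):
    """Invert the same relation: the R0 that yields final attack rate z."""
    return -math.log(1.0 - z) / z


attack = final_size(2.0)                 # fraction ever infected when R0 = 2
implied_r0 = r0_for_attack_rate(1 / 12)  # R0 consistent with "one in 12"
deaths = 0.2 * attack * 7e9              # assumes 7 billion people and 20% mortality
```

		An attack rate of one in 12 would require an <i>R</i><sub>0</sub> barely above 1, far from the stated value of 2.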
		
		<p>
		It seems that the movie wants to tell the story of a devastating event that stops far short of an apocalyptic scenario, i.e. "70 million deaths" rather than the 1.1 billion deaths that an <i>R</i><sub>0</sub> of 2 and a mortality rate of 20% would imply, but that the writers could not resist making the pandemic scenario scarier than necessary.
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 12 Mar 2012 11:14:00 GMT</pubDate>
		<title>Model comparison through path sampling and AICM</title>
		<link>http://www.trevorbedford.com/archive/mar_12_2012.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/pdfs/baele-model-comparison-2012.pdf"><img class="offset" src="/images/roc_curve.png"></a>		
		
		<p>
		We just had a paper published in MBE <a href="/pdfs/baele-model-comparison-2012.pdf">on comparing different phylogenetic models</a>.  Generally, it's much easier to estimate the parameters of a particular model (including a level of uncertainty on these estimates) than it is to assess which of several models is the "best" fit.  More complicated models will necessarily fit the data at hand better.  However, overly complicated models will be brittle and generalize poorly when confronted with new data.  Thus, there is a trade-off between a model's effective number of parameters and its likelihood.  Assessing this trade-off can be quite difficult computationally.
		
		<p>
		Here, <a href="http://www.kuleuven.be/rega/ecv/GuyBaele.html">Guy Baele</a> put in a lot of effort to implement very general methods of "thermodynamic integration" (as it's known in the phylogenetics literature) or "path sampling" (as it's known in the statistics literature) in <a href="http://beast.bio.ed.ac.uk/">BEAST</a>.  The basic idea is very similar to what Ed and I did with <a href="/pdfs/baskerville-serengeti-2011.pdf">the food web analysis</a>: calculate a marginal likelihood by comparing the posterior likelihood across MCMC chains run at different "temperatures".  The MBE paper shows that this estimate is much more accurate and repeatable than the very popular harmonic mean estimator of marginal likelihood.
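		<p>
		The idea behind path sampling can be shown on a toy problem where the answer is known. The log marginal likelihood equals the integral, over inverse temperatures &beta; from 0 to 1, of the expected log-likelihood under the "power posterior" proportional to <i>L</i>(&theta;)<sup>&beta;</sup> <i>p</i>(&theta;). The Python sketch below assumes a conjugate normal model so that each power posterior can be sampled exactly instead of by MCMC, and it uses evenly spaced temperatures with the trapezoidal rule. A real analysis would run an MCMC chain at each temperature, as described above, and the choice of temperature schedule matters; the function names are hypothetical.

```python
import math
import random


def path_sampling_log_ml(mean_loglik_at, betas):
    """Trapezoidal estimate of log p(y) = integral over beta in [0, 1] of E_beta[log L]."""
    values = [mean_loglik_at(b) for b in betas]
    total = 0.0
    for k in range(1, len(betas)):
        total += 0.5 * (betas[k] - betas[k - 1]) * (values[k] + values[k - 1])
    return total


# Toy conjugate model (an assumption for illustration): one observation
# y ~ Normal(theta, 1) with prior theta ~ Normal(0, 1).  The power posterior
# is Normal(beta * y / (1 + beta), 1 / (1 + beta)), so it can be sampled exactly.
Y = 1.5
rng = random.Random(42)


def log_lik(theta):
    return -0.5 * math.log(2 * math.pi) - 0.5 * (Y - theta) ** 2


def mean_loglik(beta, draws=2000):
    mean = beta * Y / (1 + beta)
    sd = math.sqrt(1 / (1 + beta))
    return sum(log_lik(rng.gauss(mean, sd)) for _ in range(draws)) / draws


betas = [k / 50 for k in range(51)]
estimate = path_sampling_log_ml(mean_loglik, betas)
exact = -0.5 * math.log(2 * math.pi * 2) - Y ** 2 / 4  # marginally, y ~ Normal(0, 2)
```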
		
		<p>
		For the paper, I did most of the work involving measuring the <a href="http://en.wikipedia.org/wiki/Akaike_information_criterion">Akaike information criterion (AIC)</a> in a Bayesian Monte Carlo context, hence the name AICM.  Like the harmonic mean estimator, this measure has the computational advantage that it can be computed directly from MCMC runs produced by BEAST or another piece of software, without resorting to a second, more complicated analysis.  We find that AICM does not match path sampling for accuracy, but definitely beats the harmonic mean.  It's currently implemented in BEAST XML and will be built into the next version of <a href="http://tree.bio.ed.ac.uk/software/tracer/">Tracer</a>.
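		<p>
		Both AICM and the harmonic mean estimator need nothing more than the log-likelihood trace from a single MCMC run. The Python sketch below assumes the posterior-simulation form of AICM in which the effective number of parameters is estimated as twice the sample variance of the sampled log-likelihoods and the maximum log-likelihood as their mean plus their variance, which gives AICM = 2<i>s</i><sup>2</sup> &minus; 2<i>l&#772;</i>. The function names are hypothetical, and this is not the BEAST or Tracer code.

```python
import math
import statistics


def aicm(logliks):
    """AICM from posterior log-likelihood samples: 2 * variance - 2 * mean (lower is better)."""
    return 2 * statistics.variance(logliks) - 2 * statistics.fmean(logliks)


def log_harmonic_mean(logliks):
    """Harmonic mean estimator of the log marginal likelihood, computed in log space."""
    neg = [-ll for ll in logliks]
    peak = max(neg)  # subtract the largest term before exponentiating, for stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in neg))
    return math.log(len(logliks)) - log_sum
```

		Because the harmonic mean is dominated by the smallest sampled likelihoods, its value can jump whenever the chain happens to visit a poor region, which is one intuition for why it is so unstable.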
		
		]]></description>
	</item>

	<item>
		<pubDate>Wed, 29 Feb 2012 13:10:00 GMT</pubDate>
		<title>Visualizing the global circulation of the human influenza virus</title>
		<link>http://www.trevorbedford.com/migration_dynamics/network.html</link>
		<description><![CDATA[
		<p>
		I've put together a <a href="/migration_dynamics/network.html">visualization showing the global migration dynamics</a> of the influenza virus.  Here, simulation parameters are based on the results of the phylogenetic analysis in the <a href="/migration_dynamics/">2010 paper</a>.  I've been using this in my talks for a while now.  Nice to finally get it online.
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 20 Feb 2012 12:42:00 GMT</pubDate>
		<title>Some thoughts on a GitHub of Science</title>
		<link>http://www.trevorbedford.com/archive/feb_20_2012.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<img class="offset" src="/images/github_large.png">			
		
		<p>
		Lately, I've been thinking more about issues surrounding Open Science and scientific publishing.  This post is in part a response to posts by <a href="http://schamberlain.github.com/scott/2012/02/13/a-github-publishing-model/">Scott Chamberlain</a> and <a href="http://marciovm.com/i-want-a-github-of-science/">Marcio von Muhlen</a>.  Marcio's idea represents a major call to arms for innovation in how science is conducted and communicated.  He states that "we need a social network of science, meaning scientific bundles of knowledge must be structured and accessible by API, with the connections among those bundles and appropriate utility metrics being what connects and prioritizes scientists."  I completely agree.  As a small step in this direction, I chose to post my latest paper to the <a href="http://arxiv.org/abs/1111.4579">arXiv</a> and to <a href="http://trvrb.github.com/canalization/">GitHub itself</a>.
		
		<p>
		Scott questions whether GitHub could be useful as a scientific publishing platform, which I think is a very different thing from Marcio's GitHub of Science.  As a publishing platform, I think the primary advantage of GitHub is the versioning system at its heart.  This would allow an audience to follow a scientific story as it progresses, but would also allow the history of a project to be queried and individual contributions to be easily assessed (at least in terms of writing and coding).  If we want to move towards a <a href="http://www.michaeleisen.org/blog/?p=694">system of post-publication peer review</a>, there needs to be a good way of continually updating a manuscript and making it obvious what each new version brings.  A nice open source analogy here (that Scott originally mentioned on Twitter) is the idea of peer review as opening <i>issues</i>.  Right now, in Google Code or GitHub, you can open an issue with a project documenting a bug or other sort of problem.  Developers can then respond to this issue and make the appropriate changes to their project (which are then linked to the issue, making tracking of specific revisions straightforward).  Peer review acts in a very similar fashion, documenting inadequacies with the approach taken in a scientific manuscript.  So, please, please, <a href="http://github.com/trvrb/canalization/issues">open an issue with the canalization paper</a>.  I would be happy to try to attend to it.
		
		<p>
		However, I think the potential for something like a GitHub of Science goes much farther than just a publishing platform.  In the current paradigm, manuscripts are built on top of manuscripts, but there is a lot of replicated effort.  Let's say someone thinks of a small, but highly relevant, addition to a paper.  For example, in the case of the canalization paper, what is the effect of vaccination on the antigenic evolution of influenza?  This could be a one figure addition to the present paper.  However, in the current system, doing this research would entail rerunning a lot of the model basics, writing a new paper, with a new introduction and a new discussion, all centered around vaccination.  This one-figure vaccination addendum may not make a paper by itself, but it would be great if it could somehow be integrated into the literature.
		
		<p>
		The basic paradigm of GitHub is the <i>forking</i> of a software project.  I write some code, you take what I've done and make some additions.  I then have the option of folding your changes back into my version, or if I'm not happy with your changes, the two versions continue on their separate ways.  With something like a GitHub of Science, someone else could fork my canalization paper, code and all, and append a short section and figure on vaccination.  I could choose to <i>pull</i> this addition, integrating it as part of the paper, or the forked version could exist on its own.  Here, I'm imagining a scenario where most collaboration manifests as a network of fork and pull requests between co-authors, where a story emerges by combining a number of individual contributions.
		
		<p>
		When I talked with <a href="http://benfry.com/">Ben Fry</a> about this, he commented that the most beneficial aspect of peer review is that it forces scientists to work in such a way that their research can be reviewed and, at least in theory, replicated.  Working in such a way that research could be <i>forked</i> would set a much higher, and better, bar in terms of documentation and reproducibility.  There is continual innovation in models for Open Science (<a href="http://stackexchange.com/">Stack Exchange</a>, <a href="http://arxiv.org/">arXiv</a>, <a href="http://github.com/">GitHub</a>, etc...).  I'm hopeful that we can eventually come up with something that gains some traction.  However, I'm sure that whatever we start with will have to produce a publishable end product, so that the old and new systems can continue forward side-by-side.
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 30 Jan 2012 01:57:00 GMT</pubDate>
		<title>Estimating global flu diversity</title>
		<link>http://www.trevorbedford.com/archive/jan_30_2012.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<img class="offset" src="/images/flu_turnover.png">		
		
		<p>
		How many strains of flu are circulating at any given moment?  And how much sampling is necessary to capture this diversity?  This came up in a conversation with <a href="http://tree.bio.ed.ac.uk/people/arambaut/">Andrew Rambaut</a> and <a href="http://www.erikvolz.info/">Erik Volz</a> last week.  Fortunately, we can get a back-of-the-envelope estimate using standard population genetic theory.  Here, I've downloaded all the amino acid sequences for the HA1 region of the H3N2 hemagglutinin protein that exist in GenBank between January 2002 and June 2009.  This figure looks at 10-week windows, with each colored region representing the frequency of a particular sequence in that window's sample.  You can see that there are a few common sequences and many rare sequences, and that sequence diversity rapidly changes over time.  The HA1 region is the region of the influenza genome most responsible for antigenic variation.  Evolution of HA1 is what allows the virus to infect people who have built up immunity to previous strains of flu.  Looking at amino acid diversity of HA1 will give an underestimate of the total genomic diversity of flu, but should be a decent proxy for functional diversity.

		<p class="margin">
		<img class="offset" src="/images/samples_vs_types.png">	
		
		<p>
		We can use <a href="http://en.wikipedia.org/wiki/Ewens's_sampling_formula">Ewens's sampling formula</a> to calculate the probability that we observe <i>k</i> distinct sequences (or alleles) in a sample of <i>n</i> sequences.  In this case, the expected number of alleles in a sample of <i>n</i> sequences is <img align="center" src="/images/ewens.png">, where <i>&theta;</i> represents the level of mutational input into the population.  This formula assumes neutral demography, no geographic subdivision and an infinite alleles mutation model, where every mutation creates a new allele.  I fit this formula to the GenBank windows, comparing the number of sequences sampled each month to the number of distinct sequences observed.  Doing so, I get an estimate for <i>&theta;</i> of 28.8, shown in red.
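		<p>
		The standard expression for the expected number of distinct alleles under Ewens's sampling formula is the sum, over <i>i</i> from 0 to <i>n</i> &minus; 1, of <i>&theta;</i>/(<i>&theta;</i> + <i>i</i>); I'm assuming this is the expression shown in the image above. A few lines of Python evaluate it for any sample size (the function name is hypothetical).

```python
def expected_alleles(theta, n):
    """Expected number of distinct alleles in a sample of n under Ewens's formula:
    the sum over i = 0 .. n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


for n in (500, 1000, 10_000):
    print(n, round(expected_alleles(28.8, n), 1))
```

		The sum grows only logarithmically in <i>n</i>, which is why even very large samples add relatively few new alleles. For very large <i>n</i> the loop becomes slow, and the equivalent closed form in terms of the digamma function is the better choice.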
		
		<p>
		With this number in hand, it's possible to estimate the number of distinct alleles that one would find in a very large sample.  We expect to find 104 alleles in a sample of 1000 sequences and 169 alleles in a sample of 10k sequences.  Estimated global prevalence of influenza is around 70 million (more during the northern hemisphere winter, but this should be good enough for our purposes).  A sample of 70 million sequences is expected to have 358 distinct sequences.  However, most of these are at very low frequency.  We would only expect to see around 30 alleles at greater than 1% frequency, 86 alleles present at >0.1% frequency and 164 alleles present at >0.01% frequency in the population.  I'm not sure exactly where to draw the line in terms of "important" variation, but I would think that 1 in 1000 is a good ballpark.  Thus, it seems to me that a sample of around 500 sequences (with an expected 84 unique alleles) would be sufficient to capture all the possibly important diversity in the HA1 protein.
				
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 16 Jan 2012 11:10:00 GMT</pubDate>
		<title>LaTeX manuscript template with web display</title>
		<link>http://www.trevorbedford.com/archive/jan_16_2012.html</link>
		<description><![CDATA[
		<p>
		I've spent a bit more time cleaning up my LaTeX template to allow fully automated web display.  You can find it <a href="http://github.com/trvrb/canalization">over on GitHub</a>. This is currently set up for the canalization paper, but it should make a good basis for any sort of scientific manuscript.  I've provided style sheets and a Ruby script to clean up output from TeX4ht into a presentable web version.  This web version is <a href="http://trvrb.github.com/canalization/">hosted automatically through GitHub Pages</a>.  Thus, a single script compiles the LaTeX source to HTML, and a GitHub push updates the public web version of the manuscript.  I hope this sort of approach will prove useful for collaborative writing.
		
		<p>
		It was easy to run this on my previous manuscripts written in LaTeX.  I now have web versions of the <a href="/tree_topology/">tree topology</a> and the <a href="/migration_dynamics/">global migration dynamics</a> papers up.
		
		]]></description>
	</item>

	<item>
		<pubDate>Tue, 03 Jan 2012 13:10:00 GMT</pubDate>
		<title>Interactive visualization of the Serengeti food web</title>
		<link>http://edbaskerville.com/static/research/serengeti-food-web/groups-figure3-interactive/</link>
		<description><![CDATA[
		<p>
		In the <a href="/pdfs/baskerville-serengeti-2011.pdf">Serengeti food web paper</a>, we present a network diagram of predator-prey relationships, illustrating network structure (<a href="/images/food_web_full.png">Figure 3</a>).  Getting with the times, we've also made an <a href="http://edbaskerville.com/static/research/serengeti-food-web/groups-figure3-interactive/">interactive version of this figure, presenting the network in a force-directed layout</a>.  Ed coded this up in <a href="http://mbostock.github.com/d3/">d3.js</a> based on a version I did in <a href="http://processing.org/">Processing</a>.  Green nodes represent plants, blue nodes represent herbivores and red nodes represent carnivores.  Edges connecting nodes pull them toward each other following Hooke's Law, while nodes are repelled from each other according to Coulomb's Law.  We add an additional force pulling nodes belonging to the same group toward each other.
		
		<p>
		My favorite part of the visualization is the concept of <i>focus</i>.  If you click on a node, the spring forces applied to the edges of this node are magnified, pulling its connections closer.  This makes it easier to explore relationships in the network.  A double-click removes focus.
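		<p>
		The forces described above, including the focus trick, can be sketched as a single explicit update step in Python. This is a hypothetical illustration with made-up constants and a plain Euler position update; it is not Ed's d3.js code and not d3's own force-layout integrator, which handles velocities and damping differently.

```python
import math

# Hypothetical constants; the interactive figure's actual values are not given.
SPRING_K = 0.05     # Hooke's-law stiffness for edges
REST_LENGTH = 30.0  # spring rest length
CHARGE = 500.0      # Coulomb-style repulsion strength
GROUP_PULL = 0.01   # extra linear attraction between nodes in the same group
FOCUS_BOOST = 5.0   # multiplier on springs touching the focused node


def layout_step(pos, edges, groups, focus=None, dt=1.0):
    """One explicit update of a force-directed layout. pos maps node -> [x, y]."""
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for a_idx, a in enumerate(nodes):
        for b in nodes[a_idx + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            r = max(math.hypot(dx, dy), 1e-6)
            # Coulomb repulsion with 1/r^2 fall-off pushes a and b apart.
            f = CHARGE / (r * r)
            force[a][0] += f * dx / r
            force[a][1] += f * dy / r
            force[b][0] -= f * dx / r
            force[b][1] -= f * dy / r
            # Same-group nodes also feel a weak pull toward each other.
            if groups[a] == groups[b]:
                force[a][0] -= GROUP_PULL * dx
                force[a][1] -= GROUP_PULL * dy
                force[b][0] += GROUP_PULL * dx
                force[b][1] += GROUP_PULL * dy
    for a, b in edges:
        # Hooke spring along each edge, magnified if it touches the focused node.
        k = SPRING_K * (FOCUS_BOOST if focus in (a, b) else 1.0)
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        r = max(math.hypot(dx, dy), 1e-6)
        f = k * (r - REST_LENGTH)
        force[a][0] += f * dx / r
        force[a][1] += f * dy / r
        force[b][0] -= f * dx / r
        force[b][1] -= f * dy / r
    for n in nodes:
        pos[n][0] += dt * force[n][0]
        pos[n][1] += dt * force[n][1]
```

		Calling layout_step repeatedly until the nodes settle gives the layout; passing a node as focus simply stiffens its springs, so its neighbors are pulled in more tightly on each step.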
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 02 Jan 2012 14:20:00 GMT</pubDate>
		<title>Spatial guilds in the Serengeti food web revealed by a Bayesian group model</title>
		<link>http://www.trevorbedford.com/pdfs/baskerville-serengeti-2011.pdf</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/pdfs/baskerville-serengeti-2011.pdf"><img class="offset" src="/images/serengeti.jpg"></a>
		
		<p>
		Our <a href="/pdfs/baskerville-serengeti-2011.pdf">paper on modeling food webs</a> was just published in PLoS Computational Biology.  Here, I was happy to bring the statistics I've learned from phylogenetic analysis to an entirely different field.  I advised <a href="http://edbaskerville.com/">Ed Baskerville</a> in implementing MCMC and marginal likelihood estimation for network data.  In this case, the data is a matrix of predator-prey relationships, which can be thought of as a network of directed edges specifying who-eats-whom.  We investigated <i>structure</i> in the Serengeti food web through a model in which groups of species behave similarly to one another in terms of what species they eat and what species they are eaten by.  The inferred model shows a high degree of trophic and spatial clustering in which a number of spatially distinct plant groups are fed upon by a few wider-ranging herbivore groups, which are in turn fed upon by just a couple of predator groups.  
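		<p>
		To give a flavor of what a group model looks like, here is a minimal Python sketch of one standard formulation: a stochastic-block-style model with its own link probability for each ordered pair of groups, and a Beta prior on each probability integrated out analytically. This is an illustrative assumption, not necessarily the exact model or priors used in the paper; it ignores self-loops (cannibalism) and shows only the likelihood, not the MCMC over group assignments.

```python
import math
from itertools import product


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def group_log_marginal(eats, groups, alpha=1.0, beta=1.0):
    """Log marginal likelihood of a directed who-eats-whom matrix under a simple
    group model with a Beta(alpha, beta) prior on each group-pair link probability."""
    n = len(eats)
    links, pairs = {}, {}
    for i, j in product(range(n), repeat=2):
        if i == j:
            continue  # ignore self-loops in this sketch
        key = (groups[i], groups[j])
        pairs[key] = pairs.get(key, 0) + 1
        links[key] = links.get(key, 0) + eats[i][j]
    total = 0.0
    for key, m in pairs.items():
        k = links[key]
        total += log_beta(alpha + k, beta + m - k) - log_beta(alpha, beta)
    return total


# Toy web: species 0-1 are plants, 2-3 herbivores, 4 a carnivore; eats[i][j] = 1 if i eats j.
eats = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
trophic = group_log_marginal(eats, [0, 0, 1, 1, 2])
scrambled = group_log_marginal(eats, [0, 1, 0, 1, 2])
```

		On this toy web, grouping species by trophic role gives a higher marginal likelihood than a scrambled grouping, which is the kind of comparison an MCMC over group assignments exploits.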
		
		<p>
		Also of possible interest, <a href="/pdfs/baskerville-serengeti-supp.pdf">the supporting appendix</a> provides a nice overview of the use of Bayesian methods for inference on network data.  The model we present here really should be useful in a variety of biological contexts; genetic regulatory networks and protein interaction networks immediately come to mind.
		<i>Photo by <a href="http://www.princeton.edu/~dobber/">Andy Dobson</a></i>.
		
		]]></description>
	</item>

	<item>
		<pubDate>Wed, 14 Dec 2011 01:27:00 GMT</pubDate>
		<title>Reproducible peer review</title>
		<link>http://www.trevorbedford.com/archive/dec_14_2011.html</link>
		<description><![CDATA[

		<p class="margin">		
		<table style="float:right; text-align:center; padding:10px 30px 15px 30px;" width=250>
			<tr><td></td><td colspan=2>Journal A</td></tr>
			<tr><td>&nbsp;</td><td>Accept</td><td>Reject</td></tr>
			<tr><td>Accept</td><td>71 (67)</td><td>35 (43)</td></tr>
			<tr><td>Reject</td><td>42 (43)</td><td>31 (27)</td></tr>
			<tr><td colspan=2>&nbsp;</td></tr>
			<tr><td></td><td colspan=2>Journal B</td></tr>
			<tr><td>&nbsp;</td><td>Accept</td><td>Reject</td></tr>
			<tr><td>Accept</td><td>57 (47)</td><td>18 (27)</td></tr>
			<tr><td>Reject</td><td>16 (27)</td><td>25 (15)</td></tr>
		</table>	

		<p>
		I discovered this <a href="http://brain.oxfordjournals.org/content/123/9/1964">paper by Peter Rothwell and Christopher Martyn</a> through an excellent <a href="http://blogs.scientificamerican.com/guest-blog/2011/11/02/what-is-peer-review-for/">blog post by Bradley Voytek</a>.  In the paper, the authors show that two independent reviewers of the same paper agree little better than expected by chance alone.  The authors repeat their experiment across two neuroscience journals.  For the first journal, they have 179 pairs of reviews, with 219 of the 358 votes (61%) recommending acceptance or acceptance with revision.  If votes between reviewers were distributed entirely by chance, we would expect 67 accept-accept pairs, 43 accept-reject pairs, 43 reject-accept pairs and 27 reject-reject pairs.  If, however, the reviewers were reaching some sort of scientific consensus, we would see an overabundance of accept-accept and reject-reject pairs.
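		<p>
		The chance expectation is simple arithmetic: with an overall acceptance fraction <i>p</i> = 219/358, independent votes give 179<i>p</i><sup>2</sup> accept-accept pairs, 179<i>p</i>(1 &minus; <i>p</i>) pairs of each mixed type and 179(1 &minus; <i>p</i>)<sup>2</sup> reject-reject pairs. In Python (the function name is hypothetical):

```python
def chance_expectation(n_pairs, accept_votes):
    """Expected accept/reject pair counts if the two votes in each pair were independent."""
    p = accept_votes / (2 * n_pairs)  # overall fraction of votes recommending acceptance
    return {
        "accept-accept": n_pairs * p * p,
        "accept-reject": n_pairs * p * (1 - p),
        "reject-accept": n_pairs * (1 - p) * p,
        "reject-reject": n_pairs * (1 - p) * (1 - p),
    }


journal_a = chance_expectation(179, 219)
journal_b = chance_expectation(116, 148)  # 116 pairs, 148 accept votes, from the table
```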
		
		<p>
		Here, I've shown their findings, with observed and (expected) counts for each scenario.  In journal A, there appears to be little or no difference from the chance expectation, while journal B shows a very modest improvement over the chance expectation.  A simple Fisher's exact test gives a <i>P</i> value of 0.285 on the results of journal A and a <i>P</i> value of less than 0.0001 on the results of journal B.  Additionally, Rothwell and Martyn find little correspondence in reviewers' assessments of publication priority.
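		<p>
		A two-sided Fisher's exact test on a 2 &times; 2 table needs only the hypergeometric distribution, so it can be sketched in plain Python without a statistics package. The function name is hypothetical, and the two-sided rule used here (summing every table with the same margins that is no more probable than the observed one) is one common convention; other software may define the two-sided test differently, so <i>P</i> values can differ slightly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance keeps tables tied with the observed one despite rounding.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-7))


p_journal_a = fisher_exact_two_sided(71, 35, 42, 31)
p_journal_b = fisher_exact_two_sided(57, 18, 16, 25)
```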
		
		<p>
		Interestingly, the authors also studied the reproducibility of abstract acceptance at two different scientific conferences.  Here, each abstract was reviewed and rated on a 1 to 6 scale by a panel of 14 or 16 reviewers.  In this design, one can assess variance across abstracts, but also variance across reviewers (we expect some reviewers to be tougher than others in their assessments).  Rothwell and Martyn find a very modest <i>R</i><sup>2</sup> across abstracts of 0.11–0.15, indicating very little reviewer agreement.  However, <i>R</i><sup>2</sup> across reviewers was a more respectable 0.27–0.32, suggesting more variation in reviewer "toughness".
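		<p>
		One way to see what these <i>R</i><sup>2</sup> values measure: when every reviewer scores every abstract, the total sum of squares splits into a part explained by differences between abstract means and a part explained by differences between reviewer means. The Python sketch below assumes such a complete, balanced table, and its function name is hypothetical; Rothwell and Martyn's exact analysis may differ.

```python
def variance_shares(scores):
    """Shares of total variance explained by abstract (row) means and by reviewer
    (column) means, where scores[i][j] is reviewer j's rating of abstract i."""
    n_rows, n_cols = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in scores]
    col_means = [sum(row[j] for row in scores) / n_rows for j in range(n_cols)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand) ** 2 for m in col_means)
    return ss_rows / ss_total, ss_cols / ss_total
```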
		
		<p>
		Thus, it appears that in small samples of two or three reviewers, noise from positive/negative reviewer bias may swamp the signal of a particular manuscript.  This fits with my own anecdotal experiences.  Usually (but by no means always) reviewers seem to agree on what's lacking in a manuscript, but will often disagree on how terrible a particular failing is to the manuscript's prospects.  Perhaps if each reviewer's overall positive/negative rating bias were taken into account, we could arrive at a measure of manuscript quality that is more repeatable between independent reviewers.  In turn, this could make authors less beholden to the roll of the reviewer die.
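		<p>
		As a rough illustration of what taking reviewer bias into account might mean, the hypothetical Python sketch below estimates each reviewer's toughness as the difference between their average rating and the overall average, and subtracts it from their scores. It assumes each reviewer has rated enough manuscripts for their average to be meaningful; with sparse, non-overlapping assignments, manuscript quality and reviewer toughness get confounded, and a random-effects model would be a more appropriate tool.

```python
from collections import defaultdict


def adjust_for_reviewer_bias(ratings):
    """ratings maps (reviewer, manuscript) -> score. Subtract each reviewer's average
    offset from the overall mean, putting tough and lenient reviewers on a common scale."""
    grand = sum(ratings.values()) / len(ratings)
    by_reviewer = defaultdict(list)
    for (reviewer, _), score in ratings.items():
        by_reviewer[reviewer].append(score)
    offset = {r: sum(s) / len(s) - grand for r, s in by_reviewer.items()}
    return {(r, m): score - offset[r] for (r, m), score in ratings.items()}


ratings = {("r1", "m1"): 4, ("r1", "m2"): 2, ("r2", "m1"): 3, ("r2", "m2"): 1}
adjusted = adjust_for_reviewer_bias(ratings)
```

		In this toy example, reviewer r2 is consistently one point tougher than r1; after adjustment, the two agree exactly on both manuscripts.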
			
		]]></description>
	</item>

	<item>
		<pubDate>Wed, 23 Nov 2011 12:23:00 GMT</pubDate>
		<title>Canalization of the evolutionary trajectory of the human influenza virus</title>
		<link>http://www.trevorbedford.com/archive/nov_23_2011.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/canalization/index.html"><img class="offset" src="/images/canal_map.png"></a>
		
		<p>
		In an ongoing effort to be more open in my scientific dealings, I've posted a preprint of my latest paper <a href="http://arxiv.org/abs/1111.4579">to the arXiv</a> and here on my website, as both <a href="/pdfs/bedford-canalization-2011.pdf">PDF</a> and <a href="/canalization/index.html">HTML</a>.  This represents my first attempt at a straight-up modeling study.  There's a lot going on with the epidemiology and evolution of influenza; I've made a model that attempts to capture all the salient details.  This includes things like the yearly attack rates, rate of antigenic evolution, genetic diversity, and geographic spread.  At its core, the model assumes that the antigenic phenotype of the virus can be adequately explained as a point in a Euclidean space.  Mutation serves to jostle the location of the virus in this space, and infection by one virus confers immunity to subsequent infection by nearby viruses in this antigenic space.  The geometric basis of the model stems from empirical studies of influenza's antigenic phenotype (see <a href="http://www.sciencemag.org/content/305/5682/371.short">Smith et al. 2004</a>).  In this study, I find that evolution in such a space results in a "canalized" trajectory.  The best move for a virus is to move as far away from its past as possible, resulting in linear antigenic movement and a distinctive single-trunked phylogenetic tree.
		
		<p>
		I'm especially proud of my <a href="/canalization/index.html">HTML version of the manuscript</a>, which, through the magic of LaTeX, has all sorts of hyperlinking between figures and references.  In addition, I've done my best to make something that's highly readable on the screen.  Almost everything is taken care of by <a href="http://tug.org/tex4ht/">TeX4ht</a> conversion of my LaTeX source, together with a CSS stylesheet, so with only a little more work I should be able to fully automate the process.
		
		<p>
		I'm working now to put the source code for the simulations behind this online.  In the meantime, I would very much welcome any feedback you might have on the manuscript.  Good to get feedback before publication, when there's still an opportunity to incorporate it.
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 31 Oct 2011 11:50:00 GMT</pubDate>
		<title>Visualizing mortality data</title>
		<link>http://www.trevorbedford.com/mortality/index.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/mortality/index.html"><img class="offset" src="/images/mortality_small.png"></a>
		
		<p>
		I came across a simple <a href="http://www.guardian.co.uk/news/datablog/2011/oct/28/mortality-statistics-causes-death-england-wales-2010#_">visualization of England and Wales mortality data in the Guardian</a>.  And because I couldn't deal with the network-y display of hierarchical count data, I decided to redesign the graphic as a tree map.  In googling for "treemap", I found <a href="http://mbostock.github.com/d3/">d3.js</a>, which makes extremely attractive Javascript graphics, with a number of rather fancy built-in figure types.  It seems a little harder to get into than <a href="http://processing.org/">Processing</a>, as it exposes more of the raw Javascript, but the results are beautiful and it provides full SVG support.  Here's the <a href="/mortality/index.html">mortality data laid out with d3's treemap algorithm</a>.
		
		]]></description>
	</item>

	<item>
		<pubDate>Tue, 25 Oct 2011 00:00:00 GMT</pubDate>
		<title>Estimating the effective population size of swine flu</title>
		<link>http://www.trevorbedford.com/archive/oct_25_2011.html</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/images/measles_swine_human_large.png"><img class="offset" src="/images/measles_swine_human.png"></a>
		
		<p>
		In my paper on <a href="/tree_topology/">selection in viral phylogenies</a>, I compared the effective population size of measles virus to that of human influenza virus.  The concept of effective population size <i>N<sub>e</sub></i> is central to population genetics.  It measures the timescale of population turnover, or, looking backwards in time, it measures how long it takes for individuals in the population to find a common ancestor.  Genetic diversity reflects the product of this timescale and the mutation rate.  
		
		<p>
		This is just a small addendum to that paper.  I had wanted to include swine influenza in the comparison of measles virus and human influenza virus, but decided that this would detract from the paper's focus.  Here, the sequences of swine influenza come from <a href="http://jvi.asm.org/cgi/content/short/81/8/4315">de Jong et al. (2007)</a>.
		
		<p>
		The scaled effective population size <i>N<sub>e</sub>&tau;</i> of measles is estimated at 124.6 years, <i>N<sub>e</sub>&tau;</i> of global H3N2 human influenza is estimated at 7.2 years, and <i>N<sub>e</sub>&tau;</i> of European H3N2 swine influenza is estimated at 24.1 years.
		
		<p>
		This fits nicely with the observed patterns of antigenic evolution.  Infection with measles confers life-long immunity; evolution of the measles genome does not change its antigenic phenotype.  This results in neutral population dynamics.  However, human influenza evolves in antigenic phenotype very rapidly, causing strong selective pressures that reduce effective population size.  Swine influenza falls nicely between these two extremes.  In comparing rates of antigenic evolution, de Jong et al. find that "while human H3N2 viruses have evolved at a rate of about 2.0 antigenic units per year since 1982, swine H3N2 viruses have evolved more than six times more slowly, about 0.3 antigenic units per year."  In this case, selective pressures still reduce effective population size, but not to the degree seen in human influenza.
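		
		<p>
		As a rough worked example of how the <i>N<sub>e</sub>&tau;</i> estimates quoted above translate into diversity, here's a minimal Python sketch.  It assumes a neutral haploid coalescent with a strict molecular clock, under which two sampled lineages share an ancestor <i>N<sub>e</sub>&tau;</i> years back on average, so their expected divergence is 2&mu;<i>N<sub>e</sub>&tau;</i> for a substitution rate &mu; per site per year.  The function names are illustrative, and no particular substitution rate is assumed.

```python
# Scaled effective population sizes (N_e * tau, in years) quoted in the text.
NE_TAU_YEARS = {
    "measles": 124.6,
    "human H3N2 influenza": 7.2,
    "swine H3N2 influenza": 24.1,
}


def expected_pairwise_divergence(ne_tau_years, subs_per_site_per_year):
    """Expected divergence between two sampled lineages under a neutral coalescent.

    The pair coalesces N_e*tau years back on average, so the path joining them
    covers 2 * N_e*tau years of branch length; multiplying by the clock rate
    gives expected substitutions per site."""
    return 2.0 * ne_tau_years * subs_per_site_per_year


def relative_timescale(a, b):
    """Ratio of the coalescent timescale of population a to that of population b."""
    return NE_TAU_YEARS[a] / NE_TAU_YEARS[b]


for name in NE_TAU_YEARS:
    ratio = relative_timescale(name, "human H3N2 influenza")
    print(f"{name}: {ratio:.1f}x the human influenza timescale")
```

		At equal clock rates, measles lineages would be expected to carry roughly 17 times the pairwise diversity of human H3N2 influenza, and swine H3N2 influenza about 3.3 times; in practice the substitution rates of these viruses differ as well.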
		
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 10 Oct 2011 00:00:00 GMT</pubDate>
		<title>Illustrating Darwin's Principle of Divergence</title>
		<link>http://www.trevorbedford.com/divergence/</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/divergence/index.html"><img class="offset" src="/images/darwin_tree_closeup.png"></a>
		
		<p>
		In my work on flu I've been trying to build joint evolutionary and epidemiological models, where natural selection emerges dynamically from influenza strains competing for susceptible hosts.  In speaking on this, I found it useful to broaden the context a bit. Here, you can think very generally of genetic / ecological variants competing with one another in some sort of ecological space.  Variants that are close together in this space strongly compete, while more distant variants exist more-or-less independently.
		
		<p>
		This is exactly the model that Darwin used in the <i>Origin of Species</i> to illustrate his principle of divergence.  Here, I've described this idea in a bit more detail and built a <a href="/divergence/index.html">visualization of the model</a>.
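		
		<p>
		To make the idea of competition falling off with distance concrete, here's a minimal Python sketch of Lotka-Volterra competition among variants sitting at fixed points on a one-dimensional niche axis.  The Gaussian competition kernel, the growth rule and all parameter values are standard textbook assumptions chosen for illustration; this isn't the code behind the visualization.

```python
import math


def competition(x, y, niche_width=1.0):
    """Assumed Gaussian competition kernel: 1 for identical variants,
    decaying toward 0 as variants move apart in ecological space."""
    d = x - y
    return math.exp(-d * d / (2.0 * niche_width ** 2))


def step(positions, abundances, growth=0.5, capacity=100.0, dt=0.1):
    """One Euler step of Lotka-Volterra competition among variants at fixed positions.

    Each variant's growth is slowed by the abundance of every variant, weighted
    by how strongly the two compete."""
    new = []
    for xi, ni in zip(positions, abundances):
        crowding = sum(competition(xi, xj) * nj for xj, nj in zip(positions, abundances))
        new.append(ni + dt * growth * ni * (1.0 - crowding / capacity))
    return new


# Two nearly identical variants and one distant variant, all starting equally abundant.
positions = [0.0, 0.1, 5.0]
abundances = [10.0, 10.0, 10.0]
for _ in range(2000):
    abundances = step(positions, abundances)
print([round(n, 1) for n in abundances])
```

		The two neighbouring variants end up splitting roughly one carrying capacity between them (about 50 each), while the distant variant grows to nearly the full capacity of 100 on its own.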
		]]></description>
	</item>

	<item>
		<pubDate>Tue, 23 Aug 2011 00:00:00 GMT</pubDate>
		<title>Wright-Fisher population genetic simulation with selection</title>
		<link>http://www.trevorbedford.com/selsim/</link>
		<description><![CDATA[
		<p>I put the source code for the simulations in last month's tree topology paper <a href="/selsim/index.html">online</a>.
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 25 Jul 2011 00:00:00 GMT</pubDate>
		<title>Strength and tempo of selection revealed in viral gene genealogies</title>
		<link>http://www.trevorbedford.com/pdfs/bedford-tree-topology-2011.pdf</link>
		<description><![CDATA[
		
		<p class="margin">
		<a href="/pdfs/bedford-tree-topology-2011.pdf"><img style="float:right; padding:8px;" src="/images/topology.png"></a>
		
		<p>
		I just had <a href="http://www.biomedcentral.com/1471-2148/11/220">a paper published online</a> in BMC Evolutionary Biology.  I've hosted a <a href="/pdfs/bedford-tree-topology-2011.pdf">compiled PDF</a> with all the figures inline, while their production office gets the official version together.  
		
		<p>This was a fun project.  It essentially shows how the basic models of selection in population genetics play out in detailed phylogenies, where the passage of time is clearly evident.  I think a visual / phylogenetic approach really helps to understand the processes at work.  In this case, rather than describing an allele that reaches 100% frequency, the process of <i>fixation</i> describes a lineage that outcompetes its contemporaries and comes to be the progenitor of the entire future population.  A fundamental finding in population genetics is that selection reduces <i>effective population size</i>; that is, when there is heritable variation in fitness, the patterns of ancestry connecting individuals will resemble those of a smaller population.  Essentially, only a few fitter individuals have a chance of contributing their genetic legacy to the future population.  This paper explores the effects of selection on phylogeny shape, with particular attention to uncovering selective dynamics through time.
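		
		<p>For readers who want to see fixation under selection in action, here's a minimal Wright-Fisher sketch in Python.  It's a generic textbook model written for illustration, not the simulation code used in the paper; the haploid population, the single beneficial allele and the parameter values are all assumptions.

```python
import random


def wright_fisher_fixation(pop_size, s, p0, rng):
    """Haploid Wright-Fisher model with one beneficial allele of selective advantage s.

    Returns (final_frequency, generations) once the allele is fixed (1.0) or lost (0.0)."""
    count = round(p0 * pop_size)
    generations = 0
    while 0 < count < pop_size:
        p = count / pop_size
        # Selection shifts the expected frequency toward the fitter allele...
        p_sel = p * (1.0 + s) / (1.0 + p * s)
        # ...and binomial sampling of the next generation adds genetic drift.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        generations += 1
    return count / pop_size, generations


def fixation_probability(pop_size, s, trials, seed=1):
    """Fraction of runs in which a single new mutant takes over the whole population."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixation(pop_size, s, 1.0 / pop_size, rng)[0] == 1.0
                for _ in range(trials))
    return fixed / trials


print(fixation_probability(pop_size=100, s=0.1, trials=500))
```

		With <i>N</i> = 100 and <i>s</i> = 0.1, diffusion theory predicts that a new beneficial mutant fixes roughly 18% of the time, well above the neutral 1/<i>N</i> = 1%, yet most beneficial mutations are still lost to drift.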
		]]></description>
	</item>

	<item>
		<pubDate>Tue, 14 Jun 2011 00:00:00 GMT</pubDate>
		<title>Matching R0 to cumulative prevalence in the H1N1 influenza pandemic</title>
		<link>http://www.plosone.org/article/info:doi/10.1371/journal.pone.0020358</link>
		<description><![CDATA[
		<p>It's a reassuring thing when science works the way it's supposed to and the pieces mesh together.  A paper by McLeish and colleagues just came out looking at <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0020358">sero-prevalence of pandemic H1N1 from 2009 to winter 2010 in Scotland</a>.  This comes from measuring antibodies specific to pandemic H1N1 in a large sample of the population.  McLeish et al. find that through March 2010, approximately 35% of people in Scotland were infected with pandemic H1N1 influenza.  This number apparently came as a surprise to the media and the story has been getting a lot of play.
		
		<p>However, what's cool is that this 35% number is exactly what we expected from our epidemiological models.  The basic reproductive number <i>R</i><sub>0</sub> measures the number of secondary infections expected to result from a given infection in a naive host population (a population with no previous exposure).  This number was quite small for the H1N1 pandemic, estimated <a href="http://www.sciencemag.org/content/324/5934/1557.abstract">at around 1.2 to 1.3 from the early upswing of the pandemic in 2009</a>.  The original paper detailing the SIR model (Kermack and McKendrick 1927) gives the formula: <img align="center" src="/images/finalsize.png">, where <i>Z</i> represents the final size of an epidemic (in terms of proportion of the population infected).  Numerically solving this for these values of <i>R</i><sub>0</sub> gives an expected final epidemic size of 31% to 42%.  This is amazingly on target.
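		
		<p>The final-size relation in the image is <i>Z</i> = 1 - exp(-<i>R</i><sub>0</sub><i>Z</i>).  Because <i>Z</i> appears on both sides, it has to be solved numerically.  Here's a minimal Python sketch using fixed-point iteration; this solver is just a convenient choice, not necessarily how the numbers above were computed.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Solve Z = 1 - exp(-R0 * Z) for the nontrivial root by fixed-point iteration.

    For R0 <= 1 the only solution is Z = 0 (no major epidemic)."""
    if r0 <= 1.0:
        return 0.0
    z = 0.5  # any starting guess in (0, 1] converges monotonically to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


for r0 in (1.2, 1.3):
    print(f"R0 = {r0}: final size = {final_size(r0):.0%}")
```

		Running it prints 31% for <i>R</i><sub>0</sub> = 1.2 and 42% for <i>R</i><sub>0</sub> = 1.3, the range quoted above.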
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 04 Apr 2011 00:00:00 GMT</pubDate>
		<title>Counting parameters in a phylogeny</title>
		<link>http://www.trevorbedford.com/writeups/effective_parameters.html</link>
		<description><![CDATA[
		<p>
		I did a <a href="/writeups/effective_parameters.html">small write-up</a> on the scaling of the effective number of parameters in phylogenetic inference.  I was surprised by how nicely it worked out.  Basically, each additional taxon included in the model contributes one additional parameter.
		]]></description>
	</item>

	<item>
		<pubDate>Tue, 08 Mar 2011 00:00:00 GMT</pubDate>
		<title>Multi-strain multi-deme model with SIRS dynamics</title>
		<link>http://www.trevorbedford.com/spatialflu/</link>
		<description><![CDATA[
		<p>I finally got the simulation code associated with the <a href="/pdfs/bedford-flu-mig-2010.pdf">"Global migration dynamics"</a> paper up and online.  Hopefully this will prove useful to other groups working on the evolutionary dynamics of viral pathogens.  Source code and analysis can be found <a href="/spatialflu/index.html">here</a>.
		]]></description>
	</item>

	<item>
		<pubDate>Thu, 02 Sep 2010 00:00:00 GMT</pubDate>
		<title>Updating coaltrace to use Javascript</title>
		<link>http://www.trevorbedford.com/coaltracejs/</link>
		<description><![CDATA[
		<p>I've been trying to bring myself somewhat up to date with current web technologies.  I got my coalescent visualization <a href="/coaltracejs/index.html">ported over</a> to HTML5 / Javascript by using <a href="http://processingjs.org">Processing.js</a>.  If you're running Chrome or Safari, then this definitely makes for the better version.
		]]></description>
	</item>

	<item>
		<pubDate>Thu, 10 Jun 2010 00:00:00 GMT</pubDate>
		<title>Adaptive impact of the chimeric gene Quetzalcoatl in Drosophila melanogaster</title>
		<link>http://www.trevorbedford.com/pdfs/rogers-qtzl-2010.pdf</link>
		<description><![CDATA[
		
		<p class="margin">	
		<a href="/pdfs/rogers-qtzl-2010.pdf"><img class="offset" src="/images/qtzl_large.png"></a>

		<p>Another <a href="/pdfs/rogers-qtzl-2010.pdf">paper to share</a>.  Here, I helped with the population genetic side of things in <a href="http://www.oeb.harvard.edu/faculty/hartl/lab/people/rebekah.html">Rebekah Rogers's</a> analysis of a new gene in <i>Drosophila</i>.  Before I left the Hartl lab, I worked with Rebekah on a bioinformatic analysis of <em>chimeric</em> genes.  These are strange accidents of evolution where two functioning genes are spliced together to create a new gene with bits from both parental genes.  We found 14 such genes in <i>Drosophila</i>, one of which is the focus of this current work.  This new gene, which we're calling <i>Quetzalcoatl</i>, appears to be highly beneficial to the flies that possess it.
		
		<p>From my perspective, it's good to see the bioinformatic work pay dividends.
		]]></description>
	</item>

	<item>
		<pubDate>Fri, 28 May 2010 00:00:00 GMT</pubDate>
		<title>Reuters: "Who to blame for flu? Maybe the US, study finds"</title>
		<link>http://www.reuters.com/article/idUSN2713330520100527</link>
		<description><![CDATA[
		<p class="emph"><a href="http://www.reuters.com/article/idUSN2713330520100527">Reuters: "Who to blame for flu? Maybe the US, study finds."</a>
		
		<p>This is hilarious.  Of course it has to be someone's fault.  Wow.  
		
		<p>Ph.D. comics did a <a href="http://www.phdcomics.com/comics.php?f=1174">piece on this</a>.  Fortunately, the articles produced by science writers were actually pretty good: <a href="http://www.eurekalert.org/pub_releases/2010-05/uom-fdd052010.php">U of M News Service</a>, <a href="http://www.hhmi.org/news/pascual20100527.html">HHMI News</a>,  <a href="http://www.cidrap.umn.edu/cidrap/content/influenza/general/news/may2710strains.html">CIDRAP News</a>.
		]]></description>
	</item>

	<item>
		<pubDate>Thu, 27 May 2010 00:00:00 GMT</pubDate>
		<title>Global migration dynamics underlie evolution and persistence of human influenza A (H3N2)</title>
		<link>http://www.trevorbedford.com/pdfs/bedford-flu-mig-2010.pdf</link>
		<description><![CDATA[
		
		<p class="margin">			
		<a href="/pdfs/bedford-flu-mig-2010.pdf"><img class="offset" src="/images/flumap.png"></a>
		
		<p>
		Today, my paper on <a href="/pdfs/bedford-flu-mig-2010.pdf">migration patterns in the flu virus</a> was published in PLoS Pathogens.  This was fun work to do, requiring approaches from multiple disciplines.  While the basics of the migration model came from population genetics and coalescent theory, fitting this model to sequence data required a lot of heavy-lifting computation implemented by Peter Beerli in the program <a href="http://popgen.sc.fsu.edu/Migrate-n.html">Migrate</a>.  I originally wrote my program <a href="/pact/index.html">PACT</a> to deal with the enormous (2000+ tips) phylogenetic trees produced by this analysis.  Additionally, a lot of epidemiology went into making realistic simulations on which to hone the methods.
		
		<p>The common ancestor of all contemporaneous H3N2 flu can be traced back to a single infection occurring somewhere in the world approximately 2-5 years beforehand.  This infection, by luck and by virtue of its genotype, becomes the progenitor of the entire worldwide flu population.  The main goal of this analysis was to trace this progenitor lineage through time.  We found that this lineage existed primarily in China and Southeast Asia, but also, surprisingly, in the USA.  The occasional presence of this progenitor lineage in the USA has important public health implications.
		
		<p>I'm not terribly happy with the PLoS presentation.  Rather than keeping figures as line art, they were converted to low quality bitmaps.  Also, I don't like the splitting of the supporting information into 10 different files.  So, in addition to the paper, I'm hosting high quality PDFs of the figures and a single PDF of the entire supporting appendix.  Go <a href="/papers.html">here</a>.
		]]></description>
	</item>

	<item>
		<pubDate>Sat, 01 May 2010 00:00:00 GMT</pubDate>
		<title>Presentation on phyloseminar.org</title>
		<link>http://phyloseminar.org</link>
		<description><![CDATA[
		<p>About a week ago I gave an internet seminar on phylogenetics of the influenza virus over at <a href="http://phyloseminar.org">phyloseminar.org</a>.  I'm very happy someone stepped up to organize something like this (thanks Erick!).  You can watch me talk to my computer for an hour if you'd like, and the other seminars are really good as well.  
		]]></description>
	</item>

	<item>
		<pubDate>Thu, 11 Mar 2010 00:00:00 GMT</pubDate>
		<title>Population genetic simulation on a mutational landscape</title>
		<link>http://www.trevorbedford.com/poptrace/</link>
		<description><![CDATA[
		
		<p class="margin">	
		<a href="/poptrace/index.html"><img class="offset" src="/images/poptrace_large.jpg"></a>
		
		<p>
		I've posted another Processing app. This one is <a href="/poptrace/index.html">a basic population genetic simulation</a>.  There are multiple variants within a population of reproducing individuals.  Variants can mutate into other variants, and the frequencies of each change over time due to genetic drift and natural selection.  
		
		<p>There are a number of basic results that are immediately obvious here, such as the conditions required for persistent variation in the population and the conditions required for the evolution of mutational robustness. 
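		
		<p>For readers who want to tinker outside the app, here's a minimal Python sketch of the same kind of process: variants that mutate into one another while drift and selection change their frequencies.  The linear arrangement of variants, the nearest-neighbour mutation scheme and the fitness values are hypothetical assumptions for illustration, not the app's actual code.

```python
import random


def simulate_variants(fitness, mu, pop_size, generations, seed=0):
    """Wright-Fisher simulation of variants 0..K-1 arranged along a line.

    Each generation, offspring pick parents in proportion to fitness (selection),
    and multinomial sampling noise supplies drift; each offspring then mutates to
    a neighbouring variant with probability mu. Returns the final count of each variant."""
    rng = random.Random(seed)
    k = len(fitness)
    pop = [0] * pop_size  # everyone starts as variant 0
    for _ in range(generations):
        weights = [fitness[v] for v in pop]
        offspring = rng.choices(pop, weights=weights, k=pop_size)
        for i, v in enumerate(offspring):
            if rng.random() < mu:
                # Step one variant left or right, staying within 0..K-1.
                offspring[i] = min(k - 1, max(0, v + rng.choice((-1, 1))))
        pop = offspring
    return [pop.count(v) for v in range(k)]


# Variant 2 is twice as fit but can only be reached by first drifting through variant 1.
print(simulate_variants([1.0, 1.0, 2.0], mu=0.01, pop_size=200, generations=500))
```

		Changing the fitness list (for example, making the intermediate variant deleterious) is a quick way to explore when the fitter variant becomes hard to reach.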
		]]></description>
	</item>

	<item>
		<pubDate>Wed, 13 Jan 2010 00:00:00 GMT</pubDate>
		<title>Compilation of most interesting Wikipedia articles</title>
		<link>http://www.trevorbedford.com/writeups/wikipedia.html</link>
		<description><![CDATA[
		<p>I often find myself getting lost in Wikipedia.  There are so many amazing things in this world.  More recently, I've started keeping track of some of the more interesting / outlandish articles I come across.  You can find the list <a href="/writeups/wikipedia.html">here</a>.
		]]></description>
	</item>

	<item>
		<pubDate>Mon, 12 Oct 2009 00:00:00 GMT</pubDate>
		<title>PACT: Posterior Analysis of Coalescent Trees</title>
		<link>http://www.trevorbedford.com/pact/</link>
		<description><![CDATA[
		
		<p class="margin">		
		<a href="/pact/index.html"><img class="offset" src="/images/single_tree.png"></a>
		
		<p>
		I wrote a program called <a href="/pact/index.html">PACT (Posterior Analysis of Coalescent Trees)</a> this spring to properly analyze the genealogical trees produced by <a href="http://popgen.scs.fsu.edu/Migrate-n.html">Migrate</a>.  I finally put in the extra effort to write documentation and make it easy for other people to use the software.  It's now available for <a href="/pact/index.html">download</a>.
		
		<p>I had originally wanted to estimate the relative contribution of various geographic regions to the evolution of the influenza virus.  Trees produced by Migrate contain an explicit description of which geographic region each branch resides in.  It was just a matter of extracting, displaying and summarizing this information.  The program can do a variety of things beyond this, and hopefully should prove a useful accessory to any sort of coalescent inference.
		]]></description>
	</item>

	<item>
		<pubDate>Wed, 23 Sep 2009 00:00:00 GMT</pubDate>
		<title>Basic coalescent simulation with physics-based layout</title>
		<link>http://www.trevorbedford.com/coaltrace/</link>
		<description><![CDATA[
		
		<p class="margin">	
		<a href="/coaltrace/index.html"><img class="offset" src="/images/coaltrace_large.jpg"></a>
		
		<p>
		I've written a small Processing <a href="/coaltrace/index.html">app to visualize the genealogical process</a>.  I've seen a lot of evolutionary trees drawn quite nicely.  However, this is the first example that I've seen that presents trees in a dynamic fashion, showing how they evolve over time.  It also allows for interactivity.  For instance, you can see how adding more individuals to an evolving population causes their evolutionary tree to deepen.
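		
		<p>The deepening tree has a simple analytical counterpart.  Here's a minimal Python sketch, assuming the standard haploid coalescent for a population of <i>N</i> individuals (for diploids, replace <i>N</i> with 2<i>N</i>).  The expected time back to the common ancestor of <i>n</i> sampled individuals is 2<i>N</i>(1 - 1/<i>n</i>), so when every individual in a small population is tracked, the tree depth grows roughly linearly with <i>N</i>.  This is a generic textbook model, not the app's code.

```python
import random


def coalescent_tmrca(n, pop_size, rng):
    """Simulated time (in generations) back to the most recent common ancestor of n
    lineages under the standard haploid coalescent: while k lineages remain, the
    waiting time to the next merger is exponential with rate k*(k-1)/2 / pop_size."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0 / pop_size
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n, pop_size):
    """Analytical expectation, 2 * pop_size * (1 - 1/n)."""
    return 2.0 * pop_size * (1.0 - 1.0 / n)


rng = random.Random(42)
for pop_size in (50, 100, 200):
    sims = [coalescent_tmrca(pop_size, pop_size, rng) for _ in range(1000)]
    print(pop_size, round(sum(sims) / len(sims), 1), round(expected_tmrca(pop_size, pop_size), 1))
```

		Any single simulated tree varies a lot around this mean, since the final merger of the last two lineages alone takes <i>N</i> generations on average.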
		
		<p>Probably the best part about writing this in <a href="http://processing.org">Processing</a> is how naturally object-oriented everything is.  Each individual in the simulation follows simple physics, repelling other individuals.  This takes care of layout without having to worry about high-level control.
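		
		<p>Here's a minimal Python sketch of that kind of repulsive layout, simplified to one dimension.  The inverse-square force, the softening term and the step size are assumptions for illustration rather than the app's actual physics; a real layout would usually also add damping or a confining force, since pure repulsion keeps pushing points apart indefinitely.

```python
def repel(positions, strength=1.0, dt=0.01, softening=1e-3):
    """One step of a simple repulsive layout in one dimension: every pair of
    individuals pushes apart with a force that falls off roughly as 1/distance^2."""
    new_positions = []
    for i, xi in enumerate(positions):
        force = 0.0
        for j, xj in enumerate(positions):
            if i != j:
                d = xi - xj
                # The sign of d pushes i away from j; softening avoids division by zero.
                force += strength * d / (abs(d) ** 3 + softening)
        new_positions.append(xi + dt * force)
    return new_positions


positions = [0.0, 0.3, 1.0]
for _ in range(200):
    positions = repel(positions)
print([round(x, 2) for x in positions])
```

		Each individual only needs to know about its neighbours' positions, so no global layout logic is required, which is the appeal of this approach.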
		
		<p>I'm planning on writing more apps in this vein.  I think it might be a very useful framework for data analysis, rather than just simulation. 
		]]></description>
	</item>

	<item>
		<pubDate>Thu, 17 Sep 2009 00:00:00 GMT</pubDate>
		<title>Welcome</title>
		<link>http://www.trevorbedford.com</link>
		<description><![CDATA[
		<p>Welcome.  I created this site to host my work, both large and small.  Large projects have a natural home as articles  in scientific journals.  However, I'll often spend an afternoon following up on some small thing that's of passing interest to me.  I would like to keep a journal of these small creations.  Not planning a blog, but something involving a bit more novelty.  We'll see what happens...
		]]></description>
	</item>

	</channel></rss>

