Monday, October 10, 2011

Hiatus

Thanks to the regular readers of this blog!

Since I started this blog, I've become a lot busier, and readership has decreased a bit, so blogging can't effectively compete for my time right now.

So, I am sorry to say that I will no longer be posting here.

I look forward to a time when I can dedicate more energy towards blogging.

In the meantime, here are some great science blogs, of which I am a fan:

Bad Science
Cosmic Variance
Not Exactly Rocket Science

Thursday, September 8, 2011

how many people even read scientific papers, anyway?

Turns out this can be a hard question to answer: most journals don't publish that information. But, at the end of the day, I think it's a pretty important question, in terms of assessing how much (if any) impact my work has had on other people, so I'm going to try to address it here.

Now, for basically any published paper, it's pretty easy (using Web of Science, Scopus, or Google Scholar) to find out how many other papers cited that one. So, what remains is to find a conversion from number of citations (C) to number of readers (R).

Fortunately, there are a few (rare!) journals that actually do publish the number of page views for the on-line versions of journal articles. In particular, the PLoS (public library of science) journals do just that. These are (pretty highly regarded) open-access journals, predominantly featuring biomedical research. So, using the PLoS data, we can take a stab at estimating how many readers one should associate with a given number of citations.

We'll do this by looking at randomly selected papers from PLoS Biology, published in 2007 and 2008. The age (more than about a year) is important because, for very recent papers, there hasn't been enough time for articles that cite them to be written, and for those citations to be cataloged, so the apparent number of citations would be erroneously small. We'll also restrict ourselves to research articles. Most journals (PLoS included) publish a lot of different types of content (including reviews of various kinds), but I'm mainly interested in research articles for this blog post.

For each paper, we'll take the number of citations from Web of Science ('cuz we need to pick some source, and they're as good as any).

I ended up selecting 50 papers, mainly because I got bored of doing data entry after that long. I did check, however, and the numerical conclusions I draw (below) are the same as I found from an analysis of the first 30 papers I looked at, so they're probably reasonably robust to the sample size.

The results are actually pretty surprising.

First up, the average number of citations was 29.48, while the average number of readers was (gasp) 6804.2.

That's way more readers than I might have naively thought. For a more in-depth analysis, we'll need to look at the relationship between number of citations and number of page views, on a paper-by-paper basis. That data is shown in the scatter plot below, on which each dot represents one paper. The red line is a best-fit line that is required to go through the origin (so a paper that's never been read can't be cited: seems reasonable, no?).


The first thing to notice is that the points don't really lie on that line. In particular, the correlation between number of readers and number of citations is actually pretty weak. We can quantify this by measuring the linear correlation coefficient, which would be 1 if the dots lay perfectly on the red line (so number of readers was a perfect predictor of number of citations), 0 if there was no relationship between these numbers, and negative if more widely-read papers actually tended to get fewer citations. For the data I show here, that correlation coefficient is 0.373, which confirms what we see visually: more highly read papers do tend to be more highly cited, but the relationship is pretty loose.

Finally, the slope of the red line tells us (roughly) how many people read a given paper for each time that paper gets cited. The fitted slope works out to 1/323 citations per page view, i.e., about 323 readers per citation. This agrees pretty well with a comparison between the averages above, which would suggest 230.8 readers per citation (6804.2 / 29.48).
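
For anyone who wants to play along at home, here's roughly what those calculations look like in Python. The numbers in the arrays are made-up placeholders, not my actual spreadsheet, and here I'm fitting page views against citations:

```python
import numpy as np

# Placeholder data: one entry per paper (made-up values, not the real dataset).
cites = np.array([10.0, 55.0, 23.0, 8.0, 41.0])               # citations (e.g., from Web of Science)
views = np.array([3000.0, 9000.0, 6500.0, 2000.0, 12000.0])   # online page views

# Best-fit line through the origin, views ~ slope * cites:
slope = (cites * views).sum() / (cites ** 2).sum()

# Linear (Pearson) correlation coefficient between citations and views:
r = np.corrcoef(cites, views)[0, 1]

# Cruder estimate: the ratio of the two averages (230.8 for the real data).
ratio = views.mean() / cites.mean()

print(f"readers per citation (fit through origin): {slope:.1f}")
print(f"correlation coefficient: {r:.2f}")
print(f"readers per citation (ratio of means): {ratio:.1f}")
```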

So, let's wrap this up by saying: for every time your paper gets cited, you can guess that several hundred people read it, but it could very well be way more (or less) than that number, because of the relatively weak correlation between readers and citations.

Finally, I'll point out a few potential flaws in my analysis:

1) I didn't consider any journals besides PLoS Biology, so I don't know how well my conclusions generalize.
2) I analyzed a pretty small number of articles.
3) There is at least one data point on that plot that looks like an "outlier", which probably could be excluded from this analysis.

Tuesday, August 30, 2011

We're all (sort of) blind, even if we can "see"

When your brain processes visual inputs, some information is ignored / discarded. This is pretty well known, and most of us have had experiences where we've failed to notice something that was right in front of us (for example, the "Where's Waldo?" books).

As a (slightly weak, but really cool) example, consider the pictures on this website. The first picture shows a man standing in front of a shelf in a supermarket aisle. It's not hard to imagine that, if you didn't know he was there, and looked pretty quickly at the scene, you might miss him.

Now, the question that I want to know the answer to (and that Freeman and Simoncelli have helped answer) is: what information is used, and what information is discarded?

To help answer this question, they hypothesized a certain set of statistics that might characterize an image. The details of these statistics are technical, and are based on a model of visual cortex.

Then, they took real images, computed those statistics for each image, and generated synthetic images that matched all of those statistics but were otherwise as random as possible.
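
To give a flavor of the synthesis idea (and only a flavor: this is not their algorithm, and the statistics they used are far richer), here's a toy Python sketch that preserves a single set of statistics, the Fourier amplitude spectrum, and randomizes everything else. It assumes scikit-image is installed just to grab a standard test image; any grayscale image would do.

```python
import numpy as np
from skimage import data  # only used to grab a standard grayscale test image

# Toy stand-in for "keep the statistics, randomize the rest": preserve the
# image's Fourier amplitude spectrum and scramble its phases. The synthetic
# image then has the same second-order statistics as the original but is
# otherwise random. (The real Freeman & Simoncelli statistics are much richer.)

rng = np.random.default_rng(0)
img = data.camera().astype(float)

F = np.fft.fft2(img)
amplitude = np.abs(F)                                   # the statistics we keep

# Random but conjugate-symmetric phases, so the inverse FFT comes out real:
random_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
synthetic = np.real(np.fft.ifft2(amplitude * np.exp(1j * random_phase)))

# The amplitude spectrum is preserved up to floating-point noise:
print(np.abs(np.abs(np.fft.fft2(synthetic)) - amplitude).max())
```

Phase-scrambled images like this look like structured clouds; with the much richer statistic sets used in the paper, the synthetic images become far harder to tell apart from the originals.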

They then had human subjects perform a discrimination task, where they were shown one picture (a real one), then another one shortly after, and were asked whether the two images were the same or not.

What they found was that there were certain (pretty severe!) image manipulations for which their subjects couldn't tell the difference between synthetic and real images, performing at chance levels (50% correct) on the discrimination task.

The structure of these unnoticeable manipulations let them infer several properties of the visual system, which agree well with what others have measured using invasive electrophysiology techniques.

So, next time you look out your window, and think you are seeing all the "stuff" that's out there, think again! You're actually only seeing a (very!) impoverished fraction of the available information.

Monday, August 29, 2011

"bad science" is really good

Frequent readers of this blog may remember an earlier post, in which I discussed the problem of publication bias in medical literature.

Recently, I came across an excellent blog called Bad Science that chronicles the issues of communicating statistical results (specifically about medical research) to the broader community, and especially the difficulties that arise when (oftentimes sensationalist) media are involved.

The posts are generally very accessible, and serve to highlight the (growing?) scale of this issue. Kudos to Goldacre!

Monday, August 8, 2011

Howard Hughes is my Patron (?)

For those who don't know, Howard Hughes was an eccentric American gazillionaire who founded the Howard Hughes Medical Institute (HHMI), and subsequently bequeathed his substantial fortune to HHMI upon his demise.

HHMI currently funds huge amounts of biological and medical research, predominantly in the US.

Recently, HHMI executives decided to start offering PhD fellowships to foreign graduate students, to support them for the last 2 or 3 years of their doctoral studies. I was lucky enough to be chosen as one of the recipients of this new award.

I am pretty excited about this for a few reasons:

1) I know some of the other students who won this award (and a few who were turned down for it), and they are a pretty talented bunch, so it's an honor to be included in this group

2) I can finish my studies at Berkeley without worrying about how to pay my tuition and salary

3) Unlike most PhD fellowships (NSF, for example), this HHMI grant includes a (modest) budget for travel to professional meetings. With the current state of Berkeley economics, I probably wouldn't get to go to many neat conferences otherwise.

4) I think this recognition will help me get grants and/or jobs in the future (although I could always be mistaken).

Anyhow, many thanks to Howard Hughes for ponying up the cash to support my studies! If you are interested, the HHMI press release has more details about the fellowship.

Also, if you are a foreign graduate student, doing a PhD in the US in a biology-related field, definitely consider applying for next year's competition!



Treating Parkinson's with Math

So... I'm back in the USofA now, after a long-ish trip to Sweden for the CNS conference. Overall, the meeting was pretty good, and there was some great science presented! On top of that, Stockholm is a gorgeous city, and well worth a visit.

One of the keynote talks at this meeting was by a German physicist-turned-neuroscientist (much like myself), on a very exciting new treatment for Parkinson's Disease.

For those of you who don't know, Parkinson's is a debilitating condition often associated with uncontrolled shaking of the limbs, and difficulty in controlling movement.

The key to the treatment is the realization that Parkinson's arises from overly synchronized neural activity in the midbrain, often caused by a lack of dopamine-producing cells. Normally, neurons fire relatively asynchronously (not all at the same time), so that synchrony is a clearly atypical situation.

The question, then, is: can that synchrony be removed, and if so, will that restore functionality for the Parkinson's patient? Shockingly, the answer is yes!

This, on its own, is nothing really new. In particular, a technique called deep brain stimulation (DBS) has been around for a while, and amounts to implanting something akin to a pacemaker in the brain. While that is already a big advance in Parkinson's treatment, it's not really a cure: as soon as one turns off the pacemaker, the symptoms return, and the effectiveness of the pacemaker often decreases over time.

What Tass and his colleagues did, however, is a bit more interesting. They started by modeling the diseased condition as a set of coupled oscillators (a standard physicsy thing to do), wherein the couplings were affected by the neural activity (via STDP, a well-known form of neural plasticity that is thought to underlie learning and adaptation).

They then realized that, if they could co-activate subsets of these oscillators, the STDP adaptation would, over time, break those connections that were forcing the synchronous activity.
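
To make that a little more concrete, here's a cartoon of those ingredients in Python: a Kuramoto-style network of coupled oscillators, an order parameter that measures synchrony, phase-staggered stimulation of subgroups, and a simple phase-dependent plasticity rule standing in for STDP. Everything here (parameters included) is made up for illustration; it is emphatically not Tass's actual model.

```python
import numpy as np

# Cartoon of the ingredients (not the real model): Kuramoto oscillators with
# plastic couplings, plus phase-staggered "coordinated reset"-style pulses
# delivered to four subgroups partway through the simulation.

rng = np.random.default_rng(0)
N, dt, steps = 50, 0.01, 4000
omega = rng.normal(10.0, 0.3, N)        # natural frequencies (rad/s)
theta = rng.uniform(0, 2 * np.pi, N)    # initial phases
K = np.full((N, N), 1.0)                # coupling matrix: strong enough to synchronize
np.fill_diagonal(K, 0.0)

groups = np.array_split(np.arange(N), 4)  # four "stimulation sites"
eta = 0.05                                # plasticity rate (made up)

def synchrony(th):
    """Kuramoto order parameter: 1 = perfectly synchronized, ~0 = desynchronized."""
    return np.abs(np.exp(1j * th).mean())

for t in range(steps):
    # Kuramoto dynamics: each oscillator is pulled toward its neighbors' phases.
    coupling = (K * np.sin(theta[None, :] - theta[:, None])).sum(axis=1) / N
    theta = (theta + dt * (omega + coupling)) % (2 * np.pi)

    # Coordinated-reset-like stimulation: nudge each subgroup toward a
    # different target phase during the middle of the run.
    if 1000 <= t < 2500:
        for g, grp in enumerate(groups):
            target = 2 * np.pi * g / len(groups)
            theta[grp] += 0.05 * np.sin(target - theta[grp])

    # Toy phase-dependent plasticity (a stand-in for STDP): couplings between
    # in-phase oscillators strengthen, couplings between out-of-phase ones weaken.
    K = np.clip(K + eta * dt * (np.cos(theta[None, :] - theta[:, None]) - 0.5), 0.0, 1.5)
    np.fill_diagonal(K, 0.0)

    if t % 500 == 0:
        print(f"step {t:4d}  synchrony = {synchrony(theta):.2f}  mean coupling = {K.mean():.2f}")
```

The important part is the plasticity update: the stimulation doesn't just scramble the phases for a moment, it pushes the couplings between out-of-phase oscillators down, which (in the real model) is what lets the desynchronization outlast the stimulation.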

So far, I think it's a fairly neat story, but not an unusual one: a physicist sees some real-world thing and says "ah... I think that's easy to model", and writes down some equations.

However, Tass took this a bit further, and invented a device to perform that neural co-activation, leading to a technique he calls Coordinated Reset stimulation. He got permission to implant it into some Parkinson's patients, and studied their outcomes.

The results were surprising: after only a short period of treatment, the Parkinson's symptoms were gone, and they did not return when the treatment ended (much unlike the standard DBS pacemaker treatments).

A summary of this talk is available online. I think it's a great reminder to physicists to keep tackling real-world problems, and not to stop once the equations are solved, but rather to keep pushing until the solution is implemented, or it becomes apparent that it is not implementable.

Wednesday, July 13, 2011

White is the color of... LGN?

A lot of computational neuroscientists use something called information theory to try to understand how the parts of the brain communicate with each other. Info theory is a relatively young field, dating back to work by Claude Shannon in the mid-1900s, and it basically formalizes (mathematically) a lot of ideas about how much one could learn from a signal.

The goal of this blog post is to understand a beautiful experimental result published in 1996 by Yang Dan and colleagues. To understand this, we need to first understand how redundancy affects information transfer efficiency.

Let's imagine that you and I are in a conversation, and I choose to repeat every word twice (so it sounds like "Hi Hi how how are are you you doing doing today today??"). Now clearly that is not an efficient use of my speech, because we know of a simple way I could have said the same thing in half as many words. One way to formalize that notion is by observing that, the way I spoke, you could predict every second word once you knew the odd-numbered words, so half the words are redundant.
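
You can put numbers on that with a tiny Python experiment, using random symbols as stand-in "words": compare the information carried per word by a random stream and by the same stream with every word said twice.

```python
import numpy as np
from collections import Counter

# A random stream of "words" versus the same stream with every word said twice.
rng = np.random.default_rng(0)
words = rng.integers(0, 8, size=10000)   # 8 equally likely words
doubled = np.repeat(words, 2)            # "hi hi how how are are ..."

def bits_per_word(seq):
    """Entropy of non-overlapping word pairs, divided by the 2 words per pair."""
    counts = Counter(zip(seq[0::2], seq[1::2]))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -(p * np.log2(p)).sum() / 2

print(f"original stream: {bits_per_word(words):.2f} bits per word")   # ~3 (log2 of 8)
print(f"doubled stream:  {bits_per_word(doubled):.2f} bits per word") # ~1.5: half is redundant
```

The doubled stream takes twice as many words, but each word carries only half the information: exactly the redundancy described above.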

What Yang Dan and colleagues showed is that the outputs of LGN (lateral geniculate nucleus, the relay station between the eye and visual cortex) neurons have the minimum possible amount of redundancy (like the case where I just say "Hi how are you doing today?" instead of repeating myself) when presented with naturalistic movies; they showed Casablanca to their subjects.

Now, on its own, that might seem unimpressive: maybe the LGN is just set up so that it always has non-redundant outputs. Well, they did a great control experiment to show that that's not true: they presented their subjects with white-noise stimuli (like the static you might see on old-timey televisions when the cable is out), and found that, in that case, the LGN outputs were highly redundant! What gives?

Well, it turns out that movies (and images) of real-world stuff (forests, cities, animals, etc.) all have very similar statistical properties. This means that, if you were to build a system for communicating those signals, you could set it up in a way that removes the redundancies that occur in those movies (for example, the fact that nearby parts of an image tend to have the same brightness). But if you took that highly engineered system and applied it to movies with different redundancies, it wouldn't work quite right.
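
That "nearby parts tend to have the same brightness" claim is easy to check. Here's a quick Python illustration with synthetic images rather than real photographs: a fake "natural-like" image, built by giving random noise the roughly 1/f amplitude spectrum of real scenes, versus plain white noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256

# White noise: every pixel is independent of every other pixel.
white = rng.standard_normal((n, n))

# "Natural-like" image: random phases, but a 1/f amplitude spectrum, which
# roughly matches the second-order statistics of real-world scenes.
fy, fx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
freq = np.sqrt(fx**2 + fy**2)
freq[0, 0] = 1.0                                    # avoid dividing by zero at DC
natural_like = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) / freq))

def neighbor_correlation(img):
    """Correlation between each pixel and the pixel immediately to its right."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

print(f"white noise:   {neighbor_correlation(white):.2f}")         # ~0
print(f"natural-like:  {neighbor_correlation(natural_like):.2f}")  # much higher, close to 1
```

A code tuned to exploit that neighbor-to-neighbor predictability works well on the natural-like image and is wasted on the white noise, which is the logic behind the white-noise control.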

The result of Yang Dan's experiment suggests that, by adapting to the natural environment (possibly over evolutionary time scales), our brains are set up so as to do the most efficient possible job for typical real-world movies!

This remains, to me, one of the best success stories of systems neuroscience, in which a combination of mathematics (understanding information theory) and experimentation led us to better understand how our brains work.