local field potential: how many people even read scientific papers, anyway?

Turns out this can be a hard question to answer. Most journals don't publish that information. But, at the end of the day, I think it's a pretty important question to answer, in terms of assessing how much (if any) impact my work has had on other people, so I'm going to try and address it here.

Now, for basically any published paper, it's pretty easy (using Web of Science, Scopus, or Google Scholar) to find out how many other papers cited that one. So, what remains is to find a conversion from number of citations (C) to number of readers (R).

Fortunately, there are a few (rare!) journals that actually do publish the number of page views for the online versions of their articles. In particular, the PLoS (Public Library of Science) journals do just that. These are (pretty highly regarded) open-access journals, predominantly featuring biomedical research. So, using the PLoS data, and treating page views as a stand-in for readers, we can take a stab at estimating how many readers one should associate with a given number of citations.

We'll do this by looking at randomly selected papers from PLoS Biology, from 2007 and 2008. Picking papers that are more than a year or so old is important because, for very recent papers, there hasn't been enough time for articles to be written that cite them, and for those citations to be cataloged, so the apparent number of citations would be erroneously small. We'll also restrict ourselves to research articles. Most journals (PLoS included) publish a lot of different types of content (including reviews of various kinds), but I'm mainly interested in research articles for this blog post.

For each paper, we'll take the number of citations from Web of Science ('cuz we need to pick some source, and they're as good as any).

I ended up selecting 50 papers, mainly because I got bored of doing data entry after that long. I did check, however, and the numerical conclusions I draw (below) are the same as I found from an analysis of the first 30 papers I looked at, so they're probably reasonably robust to the sample size.

The results are actually pretty surprising.

First up, the average number of citations was 29.48, while the average number of readers (counted as page views) was (gasp) 6804.2.

That's way more readers than I might have naively thought. For a more in-depth analysis, we'll need to look at the relationship between number of citations and number of page views, on a paper-by-paper basis. That data is shown in the scatter plot below, on which each dot represents one paper. The red line is a best-fit line that is required to go through the origin (so a paper that's never been read can't be cited: seems reasonable, no?).
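For anyone who wants to redo this kind of fit, here is a minimal sketch of a least-squares line forced through the origin. It assumes page views go on the x-axis and citations on the y-axis; the function name and the toy numbers are made up for illustration and are not the PLoS Biology sample or the code used for the plot.

```python
def slope_through_origin(views, citations):
    """Least-squares slope for the model citations = slope * views (no intercept)."""
    if len(views) != len(citations):
        raise ValueError("views and citations must have the same length")
    sum_xy = sum(v * c for v, c in zip(views, citations))
    sum_xx = sum(v * v for v in views)
    if sum_xx == 0:
        raise ValueError("need at least one paper with nonzero page views")
    return sum_xy / sum_xx


# Made-up toy numbers, NOT the data from this post.
toy_views = [1000, 2000, 4000]
toy_citations = [5, 10, 20]

slope = slope_through_origin(toy_views, toy_citations)
print("citations per page view:", slope)
print("page views per citation:", 1 / slope)
```

With no intercept term, the best-fit slope is just sum(x*y) / sum(x*x), so papers with lots of page views pull on the line much harder than lightly read ones.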

The first thing to notice is that the points don't really lie on that line. In particular, the correlation between number of readers and number of citations is actually pretty weak. We can quantify this by measuring the linear correlation coefficient, which would be 1 if the blue dots lay perfectly on an upward-sloping straight line (so number of readers was a perfect predictor of number of citations), 0 if there was no linear relationship between these numbers, and negative if more widely-read papers tended to be cited less. For the data I show here, that correlation coefficient is 0.373, which confirms what we see visually: more highly read papers are more highly cited, but not by that much.
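As a rough sketch of what that coefficient measures, here is the standard Pearson formula in plain Python. The function name and the toy inputs are hypothetical; the 0.373 quoted above came from the real sample, not from this snippet.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up toy numbers, NOT the data from this post.
toy_views = [1000, 2000, 3000, 4000]
print(pearson_r(toy_views, [5, 10, 15, 20]))   # points on a rising line
print(pearson_r(toy_views, [20, 15, 10, 5]))   # points on a falling line
```

Note that this coefficient is computed around the means of the two variables, so it doesn't depend on whether the fitted line is forced through the origin.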

Finally, the slope of the red line tells us (roughly) how many times a paper gets cited per reader, so its inverse is the number of people who read a given paper for every one time that paper is cited. The slope is 1/323, or about 323 readers per citation. That's in the same ballpark as a comparison between the averages above, which would suggest 230.8 readers per citation.
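The arithmetic behind that comparison is quick to check. This sketch uses only the three numbers quoted in the post (the two averages and the 323 from the fit); the variable names are my own.

```python
mean_citations = 29.48            # average citations per paper (from the post)
mean_views = 6804.2               # average page views per paper (from the post)
fit_readers_per_citation = 323.0  # inverse of the best-fit slope (from the post)

# Readers per citation implied by the two averages.
ratio_of_means = mean_views / mean_citations
print(round(ratio_of_means, 1))

# How far apart the two estimates are, as a ratio.
print(round(fit_readers_per_citation / ratio_of_means, 2))
```

The two estimates differ by a factor of about 1.4. Some gap is expected: the through-origin fit and the ratio of averages weight the papers differently, so they only have to agree when the points actually sit on the line.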

So, let's wrap this up by saying: for every time your paper gets cited, you can guess that several hundred people read it, but it could very well be way more (or less) than that number, because of the relatively weak correlation between readers and citations.

To wrap up this post, I'll point out a few potential flaws in my analysis:

1) I didn't consider any journals besides PLoS Biology, so I don't know how well my conclusions generalize.
2) I analyzed a pretty small number of articles.
3) There is at least one data point on that plot that looks like an "outlier", which probably could be excluded from this analysis.


Thursday, September 8, 2011
