The Unseen Scholars
Herbert Van de Sompel and Johan Bollen discuss researching information in the digital age
Accessing Complete Papers
Johan: But once scientists search for and find a paper, they want to read it. Just a few years ago, getting to the full text involved further problems, but another of Herbert's ideas, SFX, solved them.
Herbert: The situation was this: a scientist logged on to the library's system to read a research paper and found in it a link to another paper. The first paper was in a journal published by Elsevier, a large publishing firm, while the second was in a journal published by Wiley. Because the library had subscriptions to content from both publishers, the scientist should have been able to click on the link and read the second paper, but when he clicked on the link, he was told, "Sorry, you don't have a license to access this article."
The problem was that many institutions subscribed to a publisher's journals indirectly, through a content aggregator such as Ebsco, which provides access to many electronic journals. Elsevier, however, sent everyone to Wiley indiscriminately, even when a scientist should have gone to Ebsco, because that was where his institution's subscription existed. It was a huge problem.
1663: And the fix?
Herbert: The fix was SFX, a link server, or a computer that acts as a sort of concierge. Once installed at an institution, it knows all about the institution's subscriptions and services. Now when a scientist clicks, Elsevier's link goes to SFX at the scientist's institution because that link server knows which publisher or provider the scientist should get the Wiley content from. SFX sends the scientist a pop-up message that says, "Click here to access this paper."
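To make the concierge idea concrete, here is a minimal Python sketch of the lookup a link server performs, assuming a toy knowledge base keyed by journal ISSN. The ISSNs, provider URLs, and the names `Holding`, `HOLDINGS`, and `resolve` are all hypothetical; this illustrates the idea only and is not how SFX itself is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Holding:
    """Where this institution gets a given journal (hypothetical record)."""
    provider: str       # e.g. an aggregator or the publisher itself
    url_template: str   # hypothetical full-text URL pattern at that provider


# Hypothetical knowledge base for one institution, keyed by ISSN.
# The journal "1234-5678" is licensed through an aggregator, not directly.
HOLDINGS = {
    "1234-5678": Holding("Ebsco", "https://aggregator.example/{issn}/v{volume}/p{spage}"),
    "8765-4321": Holding("Wiley", "https://publisher.example/{issn}/{volume}/{spage}"),
}


def resolve(issn: str, volume: str, spage: str) -> Optional[str]:
    """Return the full-text URL at the licensed provider, or None if unlicensed."""
    holding = HOLDINGS.get(issn)
    if holding is None:
        return None  # a real link server might offer other services instead
    return holding.url_template.format(issn=issn, volume=volume, spage=spage)


print(resolve("1234-5678", "12", "345"))
# https://aggregator.example/1234-5678/v12/p345
```

The key design point is that the routing decision lives at the reader's institution: the same citation can resolve to an aggregator at one university and to the publisher at another.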
Johan: But Herbert didn't tell every institution around the world to buy an SFX link server. Instead he created a standard—OpenURL—that specifies how an information system such as Elsevier should link to a link server such as SFX. As a result, many commercial link servers were developed, and most academic institutions worldwide now use one. OpenURL is even supported by Google Scholar.
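As an illustration of what the OpenURL standard specifies, the sketch below builds and parses a link in the key/encoded-value (KEV) form of OpenURL 1.0 (ANSI/NISO Z39.88-2004) for a journal article. The link server address and the citation values are made up, the function names are hypothetical, and only a handful of the standard's keys are shown.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL of an institution's link server.
LINK_SERVER = "https://linkserver.example.edu/resolve"


def make_openurl(base: str, issn: str, volume: str, spage: str, date: str) -> str:
    """Describe a journal article as an OpenURL 1.0 KEV query string."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL 1.0
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # metadata format: journal
        "rft.genre": "article",
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": spage,                             # start page
        "rft.date": date,
    }
    return base + "?" + urlencode(params)


def parse_openurl(url: str) -> dict:
    """What the link server receives: the citation metadata, decoded."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = make_openurl(LINK_SERVER, "1234-5678", "12", "345", "2007")
print(parse_openurl(url)["rft.issn"])  # 1234-5678
```

The publisher's page only describes the cited article; it never needs to know which provider a particular reader is licensed to use. That decision stays with the reader's own link server.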
1663: That's very clever.
Johan: It gets better. The network of link servers opened up a whole new area of research. Every user who clicks on a link is announcing to his institution's link server, "I want to access this now." So the link server can keep a log file of what is being accessed and when. The records in such logs are called usage data. Herbert and I realized that if we could access the usage data of researchers worldwide, we could build an incredible picture of what is going on in science. The Andrew W. Mellon Foundation, a philanthropic foundation interested in scholarship and new tools for scholarship, came to the same conclusion and funded the MESUR project, which I've been working on for the past few years. One goal is to see whether usage data can give us a way to assess scholarly impact, that is, to identify the most influential people, the most important journals, the critical institutions, and so on.
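A minimal sketch of turning such a log into simple usage counts, assuming a hypothetical tab-separated format with a timestamp, a session identifier, and a journal identifier on each line. Real link-server logs differ, the function name is hypothetical, and MESUR's actual processing is not shown here.

```python
from collections import Counter
from typing import Iterable


def count_journal_usage(log_lines: Iterable[str]) -> Counter:
    """Count how many logged requests each journal received.

    Each line is assumed (hypothetically) to be: timestamp<TAB>session_id<TAB>journal
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _timestamp, _session_id, journal = line.split("\t")
        counts[journal] += 1
    return counts


log = [
    "2007-03-14T10:00:00Z\ts1\tjournal-A",
    "2007-03-14T10:05:00Z\ts1\tjournal-B",
    "2007-03-14T11:00:00Z\ts2\tjournal-A",
]
print(count_journal_usage(log).most_common(1))  # [('journal-A', 2)]
```

Because the counts come straight from clicks, they reflect what is being read now rather than what was cited years later.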
The value of a research paper is currently assessed using citation data. People literally count how many times the paper is cited, on the assumption that good papers get cited more often. But citation data show science as it was several years ago. It may take a year before a scientist's idea is written up, peer reviewed, and published, and then an equally long time before another scientist reads the paper and writes a new paper that cites it. It often takes several years for citations to mature in a particular discipline. If the number of citations is the only metric used to assess scholarly worth, young researchers who have been publishing for only a few years may be undervalued.
Herbert: And there is more that citation data do not reveal. Say a journal doesn't get cited much but is read by both physicists and archeologists. The journal fosters the flow of ideas between the two fields. Citation data do not reveal this because a physicist will typically not cite the archeology paper, but usage data show the connection.
In this "map" of real usage data, each dot is an electronic journal; a line between dots indicates that people accessing one journal went on to access the other. Shorter lines mean a stronger correlation. The map shows an interconnected cluster of journals from the social sciences and humanities surrounded by a network of journals from the natural sciences. The flow of information between the two domains is largely through interdisciplinary fields.
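How such a map might be derived from logs can be sketched as follows, assuming a hypothetical tab-separated log with an ISO 8601 timestamp, a session identifier, and a journal name on each line. Within each session, consecutive requests for two different journals count as one link between them, and a link's drawn length is taken as the inverse of its count, so stronger connections come out shorter. This is a simplified stand-in, not MESUR's actual method; the function names and journal labels are invented, echoing the physics and archeology example above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def build_clickstream_network(log_lines: Iterable[str]) -> Counter:
    """Count undirected journal-to-journal transitions within sessions.

    Each line is assumed (hypothetically) to be: ISO timestamp<TAB>session_id<TAB>journal
    """
    sessions = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        timestamp, session_id, journal = line.split("\t")
        sessions[session_id].append((timestamp, journal))

    edges: Counter = Counter()
    for events in sessions.values():
        events.sort()  # same-format ISO 8601 timestamps sort chronologically as text
        journals = [journal for _, journal in events]
        for a, b in zip(journals, journals[1:]):
            if a != b:  # ignore repeated requests to the same journal
                edges[tuple(sorted((a, b)))] += 1
    return edges


def edge_lengths(edges: Counter) -> Dict[Edge, float]:
    """Shorter drawn length for stronger links: length = 1 / transition count."""
    return {pair: 1.0 / count for pair, count in edges.items()}


log = [
    "2007-03-14T10:00:00Z\ts1\tphysics-journal",
    "2007-03-14T10:05:00Z\ts1\tarcheology-journal",
    "2007-03-14T10:09:00Z\ts1\tphysics-journal",
    "2007-03-14T11:00:00Z\ts2\tarcheology-journal",
    "2007-03-14T11:02:00Z\ts2\tphysics-journal",
]
network = build_clickstream_network(log)
print(network)  # Counter({('archeology-journal', 'physics-journal'): 3})
print(edge_lengths(network))
```

A layout algorithm would then place the journals so that drawn distances roughly match these lengths; that step is omitted here. Note that the link between the two journals appears even though no citation connects them.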


