In a recent post responding to the suggestion that there should be a ‘GitHub for science’, Cameron Neylon discusses the need for core technology that will allow irreducible ‘grains’ of research to be distributed. He argues that these packets of information need enough context and detail to become the ‘building blocks’ of scientific information on the web. With these in place, the higher-level online transactions that we anticipate will revolutionise and accelerate research should precipitate out with minimal effort, as they have done for software, social interaction and the rest of the wider web.
Neylon’s post links (as an example of a step in the right direction) to this work on Research Objects, “sharable, reusable digital objects that enable research to be recorded and reused“. This is great stuff and, if standardisable, might start to fulfil Neylon’s vision of a transfer protocol for research information. However, Research Objects in particular are likened to academic papers, which I think is the wrong scale at which to look at the problem. To use the code analogy: we need snippets that can be remixed into other uses, not complete programs, whether open source or not.
In laboratory chemistry, for example, an experiment itself might be made up of many research objects, such as a buffer solution of a particular composition and concentration (which is in turn made up of water of a particular purity and constituent chemicals of a particular purity from a particular manufacturer and batch). All of this data should be encoded, and one can imagine a globally unique identifier for research objects at this very granular level. Other examples might be the workflow for the physical experiment, the subsequent data processing, and the scripts used to process the data and do the statistics. Granulating and classifying all this really appeals to my geeky side, and I’ve tried to do this kind of thing in my lab-based and other research in my open lab notebook – for instance defining individual reagent or standard solutions and then using them in analyses, or documenting bits of experimental design or data analysis with associated code snippets to allow reproduction.
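To make the granularity idea concrete, here is a minimal sketch of what such a research object might look like as a data structure. This is purely illustrative – the class, field names and example reagents are my own hypothetical choices, not an existing standard – but it shows how a buffer solution could carry a globally unique identifier and point back to the more granular objects it is made from:

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class ResearchObject:
    """A hypothetical granular 'grain' of research with a unique identifier."""
    name: str
    kind: str                                    # e.g. "reagent", "workflow", "script"
    metadata: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)  # objects this one is made from
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))

# A buffer solution composed from more granular research objects:
water = ResearchObject("ultrapure water", "reagent",
                       {"purity": "18.2 MOhm.cm"})
tris = ResearchObject("Tris base", "reagent",
                      {"manufacturer": "ExampleCorp", "batch": "B1234",
                       "purity": ">=99.9%"})
buffer = ResearchObject("Tris buffer", "reagent",
                        {"concentration": "50 mM", "pH": 8.0},
                        parents=[water, tris])

# The buffer's full provenance is recoverable by walking parent identifiers:
provenance = [p.identifier for p in buffer.parents]
```

With identifiers at this level, an experiment record could cite its buffer, the buffer its constituents, and so on down the chain – exactly the kind of context Neylon argues the ‘grains’ need.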
This approach could conceivably work very well for experimental information and data, and even for the numerical analysis of data, but it doesn’t necessarily capture another important currency transacted in research – ideas – or the material joining the ideas to the research objects: the (numerical or qualitative) assessment of a body of evidence, provided by discrete pieces of research, for or against a particular idea. You could call these quantities hypothesis and synthesis. I think these fundamental concepts are often lost in the written record of research at the moment, for a number of reasons – most importantly because much of the work of proposing hypotheses and conducting synthesis falls through the cracks of ‘lost knowledge’ in the publication process. It’s difficult to get hypotheses and synthesis work published in the literature on a stand-alone basis.
Furthermore, the effort of proposing hypotheses, testing them and assessing them is something better done at the community rather than the individual level. As well as sharing the effort and avoiding repetition, community-level synthesis and hypothesis testing should result in better research. In my area of science, where we look at the complex interactions of physics, chemistry and biology in natural systems, I find there is much ‘received wisdom’ – concepts and ideas which propagate through the field with little easily accessible documentation to back them up. The evidence might be out there, buried in little packets distributed across many papers in the literature, but often it isn’t assessed openly by the community.
For example, the received wisdom (simplified here for argument’s sake) is currently that the North Sea is nitrogen limited (i.e. there is an unspoken hypothesis to this effect). A decade or two ago most people thought it was phosphorus limited. Nobody has written a paper about it or studied it specifically (at least not in the literature); people just look at one aspect or another of this when doing their own study on something else and make statements or inferences in their papers, which tend to influence the field. Other people may present evidence against the hypothesis in their paper, but they aren’t considering the subject in their analysis, so pass no comment on it. The measurement people don’t ask ‘what do the models say?’. The modellers don’t think about things in the same way, so don’t ask the question, or look for the answer. There’s no crosstalk, and no open reasoned discussion which is inclusive of the whole community. I’m not saying that I disbelieve the hypothesis; I just think most people who use the argument in discussions probably don’t have a good grip on the whole body of knowledge we have on the subject, and by restating the hypothesis they strengthen the community’s belief in it. I’m not expert enough, or well read enough in that particular subject, to know whether the idea that the North Sea is N limited is a well-evidenced hypothesis or a meme. People I trust and respect have told me it’s true, but that is no substitute for a structured and argued body of evidence. I would like a centralised source of evidence for and against such a hypothesis, and an open community-driven assessment of its validity – it would be really useful for the proposal I’m currently involved in writing. I could spend weeks reading the literature and making my own assessment, but I haven’t the time.
Similarly, in biochemistry there is currently debate over the significance of structural dynamics for the reactivity of enzymes. There are papers arguing for and against. As the author of this blog post points out, discussion in the literature can be biased by friendly or hostile reviewers, who take a strong view for or against the hypothesis in their reviews of experimental or perspective pieces. This is a problem for reasoned, trustworthy debate; and the forum for debate and response in the peer-reviewed literature is slow and difficult. By the time a response is published the field has moved on and potentially adopted the ideas presented in the original paper, which may or may not have been biased to one side of the argument. Furthermore, with papers and responses scattered throughout the literature, there is no central point from which to access the body of published knowledge (unless someone writes a review article and manages to capture all the relevant evidence – again something that is more likely to succeed if a wide community is involved rather than just a small group). Future papers cannot be caught by this ‘net’. If I want to read up, I have to a) find all the papers and b) read all the papers. If all I want to do is cite the ‘state of the art’ in the introduction to a paper I’m writing on something related but not completely dependent on the hypothesis, then I’m more likely to cite a single article which takes one view or the other, or cite one of each – thus either reinforcing one side of the argument or propagating the idea that ‘we don’t know’, which may or may not be true, and is impossible to assess without a detailed synthesis of the available information. Back to needing a community effort… If a group of experts state that they all believe a hypothesis, and do so at the front of a big body of community-compiled and analysed evidence and argument, then I’d be much happier to ‘receive their wisdom’.
Maybe this is something we can tackle with existing web technology, without the need for a new underlying research-specific standard of information transfer. There are plenty of reasons why building “X for research” isn’t a particularly good idea (which boil down to “why not just use X?”), but there is space for online tools which take research-specific data models and build web services around them – Figshare, for instance, or Mendeley. These tools are not restricted to research, however; anybody can use them. I’ve been considering a similar web service for hypotheses and syntheses recently. Let’s call it Hypothify for argument’s sake (domain already registered :-)). It would be a space where hypotheses can be proposed, evidence compiled and synthesised, and reasoned discussion conducted. Majority consensus could be built. Or not. Depending on the state of our knowledge. Hypotheses could range from the highly specific (“In our experiment X we expect to find that Y happens“) to very broad conceptual hypotheses (“It is statistically unlikely that Earth is the only planet in the universe which supports intelligent life”). Key papers could be identified in support of or against the hypothesis and short summaries written. Corresponding authors of those papers would be notified and invited to contribute. Contributions would be rated by the community. The major contributors of evidence for or against would be listed. Thus each hypothesis would be a ‘living document’ with an ‘author list’ – not peer reviewed but peer assembled. Citeable with a DOI?
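As a thought experiment, the Hypothify data model described above might be sketched like this. Everything here is hypothetical – the class names, the rating scheme and the example evidence entries are illustrative assumptions, not a design specification – but it shows how community ratings could weight evidence for and against a statement and surface a crude consensus:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """A hypothetical evidence entry: a paper summary with a stance and ratings."""
    citation: str      # e.g. a DOI (placeholder values below)
    stance: str        # "for" or "against"
    summary: str
    ratings: list = field(default_factory=list)  # community scores, e.g. 1-5

    def score(self):
        # Unrated evidence contributes nothing either way.
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0

@dataclass
class Hypothesis:
    statement: str
    evidence: list = field(default_factory=list)

    def weight(self, stance):
        return sum(e.score() for e in self.evidence if e.stance == stance)

    def consensus(self):
        # Crude majority view: compare rating-weighted evidence on each side.
        f, a = self.weight("for"), self.weight("against")
        if f == a:
            return "undecided"
        return "supported" if f > a else "contested"

h = Hypothesis("The North Sea is nitrogen limited")
h.evidence.append(Evidence("doi:placeholder-1", "for",
                           "Nutrient-addition bioassay study", ratings=[4, 5]))
h.evidence.append(Evidence("doi:placeholder-2", "against",
                           "Winter N:P ratio measurements", ratings=[2]))
print(h.consensus())  # → supported (for-weight 4.5 vs against-weight 2.0)
```

The ‘author list’ would then fall naturally out of the data: whoever contributed and rated the evidence entries attached to a hypothesis.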
In some ways hypotheses will tend to be hierarchical and interdependent or, importantly, mutually exclusive, and this could be represented where appropriate. Hypotheses needn’t be limited to science: “Edwin Drood was murdered by his uncle“. Academics and members of the public would be equally able to contribute. Some moderation would inevitably be necessary on controversial topics – climate change, for instance. But Hypothify would be a space for engagement with the wider community, in terms both of content and of the process of academic research. This is a positive thing. We take useful bits of the wider web for our work (GitHub, Twitter, Slideshare) – why not send something back the other way?
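The relations between hypotheses could be represented very simply. Again, this is a speculative sketch – the relation names are my own hypothetical vocabulary – but it shows how refinement, dependency and mutual exclusion could be recorded and queried:

```python
# Hypothetical relation types between hypotheses in such a system.
relations = []  # (hypothesis_a, relation, hypothesis_b)

def relate(a, rel, b):
    """Record a typed link between two hypothesis statements."""
    assert rel in {"refines", "depends_on", "excludes"}
    relations.append((a, rel, b))

relate("The North Sea is N limited", "excludes",
       "The North Sea is P limited")
relate("The coastal North Sea is N limited in summer", "refines",
       "The North Sea is N limited")

# Mutually exclusive hypotheses can then be surfaced directly,
# so evidence for one can automatically count against the other:
exclusive = [(a, b) for a, rel, b in relations if rel == "excludes"]
```

Even this flat list of typed links would let the service warn readers when two ‘received wisdoms’ cannot both be true.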
In my next post I’ll outline the (rather sketchy) details of how I think Hypothify might work. Would love to hear what you think! If you’re already convinced, please register your interest here.