Volunteer team working on UEA Broad

Originally posted on UEA Broad Time-Series Project:

A team of undergraduate student volunteers will be working on the UEA Broad project for the coming weeks. Led by a funded intern from UEA’s springboard programme (http://www.uea.ac.uk/internships), the team will run the project: collating and managing the data, making additional measurements, and trying to establish a sustainable model for the project to continue as a student-run volunteer effort into the future.

For the past ~2 years, regular (weekly) water samples have been taken from the lake by me and a couple of PhD student volunteers and analysed for a wide range of biogeochemical parameters – including inorganic and organic nutrients, major ions, pH, conductivity and carbonate chemistry – by the excellent analytical labs in the School of Environmental Sciences. This dataset is pretty much unique in its broad range of chemical parameters measured at such high resolution, but there are some key measurements missing. For instance, we don’t…


Additional uses for Mendeley

Alison from Mendeley asked me to write a bit about the unconventional uses I’ve put Mendeley to, ahead of the Advisor Meeting in London at the end of next month.

The thing that caught Alison’s attention was the use of Mendeley groups in grant proposals. I’ve done this for the last two grant proposals I’ve been closely involved in writing. In both cases we collated papers relevant to the proposal topic in a public Mendeley group. As well as being a generally useful process and source of material for collaboratively writing a proposal, the idea has been that the groups demonstrate initial steps towards curating a ‘knowledge base’ on the subjects we’re applying for money for. This can then go into the ‘impacts plan’ part of the proposal as something we’re giving back to the wider community. I’ve had no specific feedback from reviewers/funders on this approach, but have had very good feedback on the impacts plans more widely.

Of more direct relevance to trying to secure funding: for a proposal for a ‘desk-based’ study of trace gas fluxes in the Arctic, we needed to demonstrate that there was sufficient data to undertake the study we had proposed (suspecting that the reviewers would assume there wasn’t!). We therefore collated papers on observations of trace gas concentrations in the Arctic Ocean and overlying atmosphere into a group that was cited directly in the proposal as evidence that there was sufficient data out there for what we wanted to do. This satisfied the reviewers, one stating that we had ‘demonstrated the data was out there’ and the other expressing surprise at the number of relevant papers – I assume from their comments that both had followed the link and had a poke around at the papers we’d collated. Sadly, in spite of very positive reviews, the grant was unsuccessful for unrelated reasons. However, the approach of using Mendeley groups to demonstrate evidence/data availability seemed to be completely acceptable.

A further use I’ve put Mendeley groups to recently is to ‘publish’ a list of the articles I cite in a recent paper. It seemed like a useful thing to do – I’ve used private groups to organise the references for particular papers I’m working on, but making this one public at the end provides quick and easy links to the cited materials for anyone who wants to get at them. It may never be used, but it was almost zero effort for me to make it public, so why not?

I’m really looking forward to the Mendeley Advisor day in September – meeting the Mendeley team and finding out how other people are using it.

Google Scholar h-index and author page looking good

Recently, Google Scholar released an update allowing authors to take ownership of their publications and group multiple instances of the same publication together. It logs citations to papers and thus can calculate an ‘h-index’ along the lines of those used by Scopus and Web of Science. This is really good because i) it’s free and ii) it doesn’t just log citations from journal articles, but also from books, PhD theses and other online materials of an academic flavour. It doesn’t always get it right, but hell, neither does Scopus.

Here’s my page. It was as easy as falling off a log to find my publications and get them organised. My h-index is 2 higher on Scholar than on Scopus, both because badly formatted citations slip past Scopus’ algorithm and because Scholar counts citing sources that Scopus doesn’t. Obviously I’m pleased about this! Taking into account a wider range of citing materials is a positive for alternative metrics.

The added bonus is that when authors take ownership of their publications, Google more strongly associates them with their work. Authors’ work will thus be found more readily, and read and used (and cited, if it’s any good) more often by people searching Google for information. Other than the obvious concerns about Google’s monopoly over the whole of the universe, this is pretty handy.

Slightly plagiarised! AIBU?

Here’s the introduction to a paper I wrote in 2008 and, following it, the introduction to a paper which came out this February from a completely different set of authors. I won’t name them here. The two paragraphs are all but identical!

…In the atmosphere and ocean, ammonia and its protonated form, ammonium (NH4+) are ubiquitous. Naturally and anthropogenically produced NHx (NH3 + NH4+) is transported through the atmosphere and generally occurs in decreasing concentration in air with distance from land. It has been suggested that in preindustrial times the oceans were probably a net source of NHx to the continents [Duce et al., 1991], but this is not the case today. … NHx is produced in surface waters by the biological reduction of nitrate (either directly or via the degradation of biologically synthesized organic nitrogenous material). In solution it is partitioned between ammonium and ammonia according to equilibrium thermodynamics: The proportion of NHx that occurs as NH3 (dependent on pH, temperature and ionic strength of the medium) is available for emission to the atmosphere; the phase partitioning being dependent on the Henry’s Law coefficient. Ammonia is also emitted to the atmosphere by plants and animals in terrestrial environments (both directly and through breakdown of organic nitrogen) by soil microorganisms and by various industrial and agricultural processes, including the direct volatilization of solid ammonium nitrate salts in fertilizer. There is also evidence of a volcanic source of NHx to the atmosphere [Uematsu et al., 2004], and of substantial ammonia emissions from seabird and seal colonies [Blackall et al., 2007; Theobald et al., 2006].

In the atmosphere and ocean, NH3 and its ionized form NH4+ are ubiquitous. Naturally and anthropogenically produced NHx (NH3 + NH4+) are transported through the atmosphere and generally their concentrations in air decrease as the distance from land increases. It has been suggested that in preindustrial times, the oceans were probably a net source of NHx of the continents (Duce et al., 1991), but this is not the case today (Sutton et al., 1995, 2000). NHx is produced in surface water by the biological reduction of nitrate (either directly or via the degradation of biologically synthesized organic nitrogenous material/agricultural run-off). In a solution, NHx is partitioned between NH4+ and NH3 according to equilibrium thermodynamics: the proportion of NHx that occurs as NH3 (depending on pH, temperature and ionic strength of the medium) is available for emission to the atmosphere (Aneja et al., 2001). NH3 is also emitted to the atmosphere by plants, animals and its environments, by soil micro-organisms and by various industrial and agricultural processes, including the direct volatilization of solid NH4NO3 salts and fertilizers (Sutton et al., 2000; Li et al., 2006; Sharma et al., 2010a, b). There is also evidence of volcanic source of NHx to the atmosphere (Uematsu et al., 2004) and of substantial NH3 emissions from seabird colonies (Blackall et al., 2007; Theobald et al., 2006).

I have mixed feelings about this. It’s no big deal – nobody’s stolen my data (they could, cos it’s mostly online…), or passed off my paradigm-changing hypothesis as their own (I don’t have one of those, in case you’re wondering!). However, putting together a concise but informative introductory paragraph for that paper took me a good few hours at the time, and actually represents the end of a long process of reading, synthesis and understanding of the background literature over the course of my PhD some years back. This was my killer summary of environmental ammonia. I remember being rather pleased with it at the time. Someone else has come along, spent 15 minutes changing the odd word here or there and adding one or two more up-to-date references (mine was maybe a bit sparse reference-wise), and there they have it.

But why shouldn’t they? Better that than wasting their time re-doing a job already done. Instead, they augment it with some extra references to bring it a bit more up to date and make it more information-rich. That’s exactly what hypothify/synthify is all about, and I’m all for reducing the amount of repeated work (i.e. inefficiency) in research. And imitation is the sincerest form of flattery, of course.

I suppose the reason it feels rough is the complete lack of attribution for the work. They look clever through my hard work. When the name of the game in science is essentially looking clever, they’ve got one up on me in a competitive environment. Just dropping a reference to my original paper into that paragraph would go a long way to making me feel better about it. That’s all they could really do under the current publishing system. Don’t get me wrong, I’m not really cross about this, and it’s not going to spoil my day (it’s already been spoiled by grant proposal writing!), I’m just saying. Attribution please!

For future reference, I hereby release the above paragraphs under a Creative Commons CC-BY license, so anybody can use them – so long as they attribute them properly!

Hypothify: first outline

I’ve been pondering a hypothesis-based academic social media site for a few weeks now, and have talked with a couple of people about it. Ideas are only just beginning to coalesce, but now seems the right time to try to outline what I see hypothify doing, and how it might work. I’m conscious that it needs to start very simple, and remain as simple as possible. It’s easy to come up with a massive feature list, but identifying the most important stuff and planning how it might work is key. [edit after writing – there’s still A WAY to go on this!!]

What?

A place to propose and discuss hypotheses and to aggregate evidence for/against them. Consensus building through democratic voting up/down of evidence and discussion items. What I thought Quora was going to be when I first heard about it, but less question/answer and more evidence-based.

In traditional(ish) publishing terms it would represent a platform for ‘living’ review articles on each hypothesis. However, it would integrate social media aspects (voting and sharing) and admit wider evidence than just academic papers.

Not peer-reviewed, or peer evaluated, but peer assembled.

Why?

Hypotheses are fundamental to academic work. They represent the ideas and concepts which propagate through the academic sphere and out into the wider world as our understanding of the natural world/the universe/the human condition, etc. They are often dissociated from the piece-by-piece evidence in the traditional academic record. Currently, academics are supposed to read everything and make up their own minds on a particular matter. For each individual this is only possible for a limited number of concepts/hypotheses because of the massive time cost of i) finding all the literature, ii) reading it all and iii) keeping up to date. In reality we all take ‘received wisdom’ on many matters on trust from other academics, or tend to disbelieve everything we’re told and argue it out ourselves from first principles! Hypothify would solve the pain of i) and negate ii) and iii) by providing community-maintained consensus instead of ‘received wisdom’ on each given hypothesis.

How?

The platform would allow the proposal of hypotheses by any user. Evidence items (papers [via the Mendeley API if possible], unpublished data [figshare, slideshare, blogs or notebook entries], snippets of discussion/reasoned argument [from discussion of this hypothesis, or elsewhere via e.g. Disqus]) can be presented by any member of the community as being ‘for’ or ‘against’ the hypothesis. Key to the usefulness of evidence items will be a tweet-length summary of what each item contributes to the assessment of the hypothesis. One will have to be added by the person introducing the evidence; other ‘competing’ summaries may be added. Where necessary, discussion of the evidence can be conducted, and that discussion can itself be cited as evidence. It is conceivable that a single piece of evidence may be argued to support either side of a hypothesis. Maybe it’s necessary to recognise that evidence can be ‘agnostic’?
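
To make this concrete, here’s a minimal sketch (in Python) of the sort of data model I have in mind. Everything in it – the class and field names, and the 140-character stand-in for ‘tweet-length’ – is a hypothetical illustration, not any kind of spec:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Stance(Enum):
    FOR = "for"
    AGAINST = "against"
    AGNOSTIC = "agnostic"      # evidence argued to cut both ways

@dataclass
class Summary:
    text: str                  # tweet-length gloss of what the item contributes
    author: str
    votes: int = 0

    def __post_init__(self):
        if len(self.text) > 140:
            raise ValueError("summaries must be tweet-length")

@dataclass
class EvidenceItem:
    url: str                   # link out to the paper/data/discussion; NOT stored on-site
    stance: Stance
    added_by: str              # the community member presenting the evidence
    summaries: List[Summary] = field(default_factory=list)   # competing summaries
    votes: int = 0

@dataclass
class Hypothesis:
    statement: str
    proposer: str              # the proposer on the site, not necessarily the originator
    evidence: List[EvidenceItem] = field(default_factory=list)
```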

Key to the success of the platform will be the voting up/down of content (a la Stack Exchange). Hypotheses themselves should not be votable, I think – i.e. there will be no opportunity for individuals to vote subjectively/dogmatically for/against a hypothesis, only to vote up or down evidence supporting or contradicting it. Plus vote up or down particular summaries of evidence items, so the best summaries float to the top for each bit of evidence. So the ‘hypothesis view’ page will show the hypothesis at the top and evidence items for and against (highest voted first), with the best summary pertaining to that hypothesis for each one. Plus a link to the evidence item (i.e. NOT stored on-site). I think this is really neat because a user can find a hypothesis they’re interested in, find what the community thinks is the best evidence for and against, read those bits, and make an informed decision based on comprehensive community review of the field. It may or may not be useful to have a ‘swingometer’ for each hypothesis which represents the net votes for evidence for and against the hypothesis, giving a ‘community assessment’ of it.
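
Continuing the sketch above, the hypothesis view and the swingometer might fall out of that model something like this (again purely illustrative, reusing the hypothetical classes from the previous snippet):

```python
def hypothesis_view(h: Hypothesis) -> None:
    """Show evidence highest-voted first, each with its top summary and link."""
    for item in sorted(h.evidence, key=lambda e: e.votes, reverse=True):
        best = max(item.summaries, key=lambda s: s.votes, default=None)
        gloss = best.text if best else "(no summary yet)"
        print(f"[{item.stance.value:>8}] {item.votes:+d}  {gloss}  -> {item.url}")

def swingometer(h: Hypothesis) -> int:
    """Net community assessment: votes on 'for' evidence minus votes on
    'against' evidence; 'agnostic' items contribute nothing either way."""
    sign = {Stance.FOR: 1, Stance.AGAINST: -1, Stance.AGNOSTIC: 0}
    return sum(sign[item.stance] * item.votes for item in h.evidence)
```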

Attracting users?

What’s in it for users? Firstly, being seen to propose and contribute to hypothesis assessment will bring kudos to users. A ‘reputation’ system (also a la Stack Exchange) could be implemented to measure the reputation/value of contributions… Even badges, etc. would probably work for academics, but I think there’s a more instantly attractive ‘doughnut’ (as my good friend Charlie calls them): promotion of your research output. If you add a good summary of your paper which informs debate on a particular hypothesis, it will (if it’s good) float towards the top for that hypothesis. You will be able to engage with other interested parties and discuss your research. Google will love it.

Engage ‘the enemy’. Let’s say I propose a hypothesis, which just happens to be something I’ve proposed in papers in the literature in the past. Great. I put up the hypothesis and provide evidence items. As I’m the only contributor, the hypothesis is only ‘proposed’. To move it to the ‘community debated’ stage I need to get other people involved. So I share it on Twitter, but also invite people I know will be interested to join the debate. Furthermore, other established hypothify users will be automatically invited to join, based on their interests, the other hypotheses they’re active on and the tags which have been associated with the hypothesis in question.

As evidence items are added, the system will attempt to ‘scrobble’ originator details (emails, figshare user details, Mendeley author details) and contact the originators to inform them that their work is being used to inform debate on a particular hypothesis. They will be invited to join the debate. I’m guessing if their work is being ‘correctly’ cited they will be flattered enough to go and have a look, and if it’s being ‘incorrectly’ cited (in their opinion) they will be incensed enough to wade in and put those upstarts right. Thus the experts will hopefully filter in.
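
A sketch of what that step might look like, continuing from the classes above. Here lookup_contact() is an entirely hypothetical stand-in for whatever combination of Mendeley/figshare metadata lookups turns out to be feasible, and the ‘mail service’ is just a print call:

```python
from typing import Optional

def lookup_contact(url: str) -> Optional[str]:
    """Resolve an evidence URL to an originator's email address via external
    metadata services (Mendeley, figshare, ...). Stubbed out; hypothetical."""
    return None

def invite_originator(item: EvidenceItem, h: Hypothesis) -> None:
    """Tell an originator their work is informing a debate, and invite them in."""
    contact = lookup_contact(item.url)
    if contact is None:
        return                 # no originator details scrobbled; fail quietly
    send_message = print       # stand-in for a real mail/notification service
    send_message(f"To: {contact}\n"
                 f"Your work is being cited as evidence '{item.stance.value}' "
                 f"the hypothesis: {h.statement}. Come and join the debate.")
```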

Furthermore, as evidence and discussion accumulate and more people vote evidence, evidence summaries and the hypothesis summary up and down, the ‘top contributors’ will be identified. Those top contributors, plus the hypothesis proposer (i.e. the proposer on hypothify, not necessarily the originator of the hypothesis in the outside world [who should be cited and acknowledged]), will be identified at the head of the hypothesis view as the ‘authors’ of the synthesis. Thus each hypothesis becomes a peer-assembled citeable document (hopefully with a DOI assigned). A publication! And as we all know, in academia publications are all. And what’s really nice is that it doesn’t matter which ‘side’ you’re on. If you’re presenting valuable evidence and discussion in either direction, you’ll be listed. So all those old vested interests evaporate away like the eyewash they are.
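
Deriving that author list could be as simple as tallying net votes per contributor across both sides of the argument – once more a hypothetical sketch building on the classes above:

```python
from collections import Counter
from typing import List

def author_list(h: Hypothesis, top_n: int = 5) -> List[str]:
    """Rank contributors by the net votes their evidence and summaries have
    attracted, whichever side they argued; the on-site proposer heads the list."""
    score: Counter = Counter()
    for item in h.evidence:
        score[item.added_by] += item.votes     # credit for presenting evidence
        for s in item.summaries:
            score[s.author] += s.votes         # credit for good summaries
    top = [user for user, _ in score.most_common(top_n)]
    return [h.proposer] + [u for u in top if u != h.proposer]
```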

Synthify?

Not all problems present well as hypotheses. For instance, in my field – marine biogeochemistry – much science is exploratory, and/or based on assessing magnitudes of things: “What is the globally representative concentration of ammonium in the surface ocean?”; “What is the annual global carbon uptake by primary producers in the ocean?”. Of course, these can be presented as hypotheses (“The globally representative concentration of ammonium in the surface ocean is 150 nM”), but this is rather pointless. However, the accumulation of evidence leading to an assessment is much the same process as outlined above for hypotheses, only without the FOR and AGAINST argument. And these syntheses could then feed into wider hypotheses as evidence. Synthify.com has been taken, unfortunately, but I think it’s reasonable to conduct such data synthesis work under the hypothify banner. For the field I work in at least, I think the ‘synthify’ function will be as useful as the ‘hypothify’ one.

Anything else?

Moderation will be important and will rely strongly on the community. Controversial topics could get very sticky very quickly. Need to think about policing. Free speech is important, but balanced debate more so. Anthropogenic global warming deniers and intelligent designers are going to be a challenge to the stability and value of the system.

Integration with the rest of the web is obviously very important. All items will be fully shareable, but a proper API would ideally allow full functional push and pull to and from other sites – Mendeley, Peer Evaluation, Wikipedia, Quora, Disqus, etc.

If all this sounds irresistibly interesting, please hit the pre-sign-up at http://hypothify.kickofflabs.com

Transactions in research on the web: hypothesis and synthesis

In a recent post, responding to a suggestion that there should be a ‘GitHub for science’, Cameron Neylon discusses the need for core technology which would allow irreducible ‘grains’ of research to be distributed. He argues that these packets of information need context and sufficient information to become the ‘building blocks’ of scientific information on the web – with these in place, the higher-level online transactions that we anticipate will revolutionise and accelerate research would precipitate out with the minimum of effort, as they have done for software, social interactions, etc. across the wider web.

Neylon’s post links (as an example of a step in the right direction) to this work on Research Objects, “sharable, reusable digital objects that enable research to be recorded and reused”. This is great stuff and, if standardisable, might start to fulfil Neylon’s vision of a transfer protocol for research information. However, Research Objects in particular are likened to academic papers, which I think is the wrong scale at which to look at the problem. Using the code analogy, we need snippets that can be rehashed into other uses, not complete programs, whether open source or not.

In laboratory chemistry, for example, an experiment might itself be made up of many research objects, such as a buffer solution of a particular composition and concentration (which is in turn made up of water of a particular purity and constituent chemicals of a particular level of purity from a particular manufacturer and batch). All this data should be encoded. One can imagine a globally unique identifier for research objects at this very granular level. Other examples might be the workflow for the physical experiment and subsequent data processing, and the scripts used to process the data and do the statistics. Granulating and classifying all this really appeals to my geeky side, and I’ve tried to do this kind of stuff in my lab-based and other research in my open lab notebook, for instance defining individual reagent or standard solutions and then using them in analyses, and documenting bits of e.g. experimental design or data analysis with associated code snippets to allow reproduction.
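
As a toy illustration of what that granularity might look like in code (the structure and names here are entirely mine, not the Research Objects specification):

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GranularObject:
    """A research object that may itself be composed of research objects,
    each carrying its own globally unique identifier."""
    description: str
    metadata: Dict[str, str] = field(default_factory=dict)   # purity, batch, ...
    components: List["GranularObject"] = field(default_factory=list)
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))

water = GranularObject("ultrapure water", {"resistivity": "18.2 MOhm cm"})
salt = GranularObject("NaH2PO4", {"purity": ">99%", "batch": "hypothetical"})
buffer = GranularObject("phosphate buffer, 0.1 M", components=[water, salt])
print(buffer.guid)   # citeable at this very granular level
```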

This approach could conceivably work very well for experimental information and data, and even the numerical analysis of data, but it doesn’t necessarily capture another important transacted currency in research – ideas – or the joining material between the ideas and the research objects: the (numerical or qualitative) assessment of a body of evidence provided by discrete pieces of research in support of or against a particular idea. You could call these quantities hypothesis and synthesis. I think these fundamental concepts are often lost in the written record of research at the moment, for a number of reasons – most importantly because much of the work of proposing hypotheses and conducting syntheses tends to fall through the cracks of ‘lost knowledge’ in the publication process. It’s difficult to get hypotheses and synthesis work published in the literature on a stand-alone basis.

Furthermore, the effort of proposing hypotheses, testing them and assessing them is something which is better done at the community rather than the individual level. As well as sharing the effort and avoiding repetition, community-level synthesis and hypothesis testing should result in better research. In my area of science, where we look at the complex interactions of physics, chemistry and biology in natural systems, I find there is much ‘received wisdom’: concepts and ideas which propagate through the field with little easily accessible documentation to back them up. It might be out there, buried in little packets distributed across many papers in the literature, but often it isn’t assessed openly by the community.

For example, the received wisdom (simplified here for argument’s sake) is currently that the North Sea is nitrogen limited (i.e. there is an unspoken hypothesis to this effect). A decade or two ago most people thought it was phosphorus limited. Nobody has written a paper about it or studied it specifically (at least not in the literature); people just look at one aspect or another of this when doing their own study on something else and make statements or inferences in their papers, which tend to influence the field. Other people may present evidence against the hypothesis in their papers, but they aren’t considering the subject in their analysis so pass no comment on it. The measurement people don’t ask ‘what do the models say?’. The modellers don’t think about things in the same way, so don’t ask the question, or look for the answer. There’s no crosstalk, or open reasoned discussion which is inclusive of the whole community. I’m not saying that I disbelieve the hypothesis; I just think most people who use the argument in discussions probably don’t have a good grip on the whole body of knowledge we have on the subject. By restating the hypothesis they strengthen the community’s belief in it. I’m not expert enough or well read enough in that particular subject to know whether the idea that the North Sea is N limited is a well-evidenced hypothesis or a meme. People I trust and respect have told me it’s true, but that is no substitute for a structured and argued body of evidence. I would like a centralised source of evidence for and against such a hypothesis and an open, community-driven assessment of its validity – it would be really useful for the proposal I’m currently involved in writing. I could spend weeks reading the literature and make my own assessment, but I haven’t the time.

Similarly, in biochemistry there is currently debate over the significance of structural dynamics for the reactivity of enzymes. There are papers arguing for and against. As the author of this blog post points out, discussion in the literature can be biased by friendly or hostile reviewers, who take a strong view for or against the hypothesis in their reviews of experimental or perspective pieces. This is a problem for reasoned, trustworthy debate, and the forum for debate and response in the peer-reviewed literature is slow and difficult. By the time a response is published, the field has moved on and potentially adopted the ideas presented in the original paper, which may or may not have been biased to one side of the argument. Furthermore, with papers and responses scattered throughout the literature, there is no central point from which to access the body of published knowledge (unless someone writes a review article and manages to capture all the relevant evidence – again, something that is more likely to be successful if a wide community is involved rather than just a small group), and future papers cannot be caught by this ‘net’. If I want to read up, I have to a) find all the papers and b) read all the papers. If all I want to do is cite the ‘state of the art’ in the field in the introduction to a paper I’m writing on something related but not completely dependent on the hypothesis, then I’m more likely to cite a single article which takes one view or the other, or cite one of each, thus reinforcing either one side of the argument or propagating the idea that ‘we don’t know’ – which may or may not be true, and is impossible to assess without a detailed synthesis and assessment of the available information. Back to needing a community effort… If a group of experts state that they all believe a hypothesis, and do so at the front of a big body of community-compiled and analysed evidence and argument, then I’d be much more happy to ‘receive their wisdom’.

Maybe this is something we can tackle with existing web technology, without the need for a new underlying research-specific standard of information transfer. There are plenty of reasons why building “X” for research isn’t a particularly good idea (which boil down to “why not just use X?”), but there is space for online tools which take research-specific data models and build web services around them – Figshare, for instance, or Mendeley. These tools are not restricted to research, however; anybody can use them. I’ve been considering a similar web service for hypotheses and syntheses recently. Let’s call it Hypothify for argument’s sake (domain already registered :-)). It would be a space where hypotheses can be proposed, evidence compiled and synthesised, and reasoned discussion conducted. Majority consensus could be built. Or not. Depending on the state of our knowledge. Hypotheses could range from the highly specific (“In our experiment X we expect to find that Y happens”) to very broad conceptual hypotheses (“It is statistically unlikely that Earth is the only planet in the universe which supports intelligent life”). Key papers could be identified in support of/against the hypothesis and short summaries written. Corresponding authors of those papers would be notified and invited to contribute. Contributions would be rated by the community. The major contributors of evidence for or against would be listed. Thus each hypothesis would be a ‘living document’ with an ‘author list’. Not peer reviewed but peer assembled. Citeable with a DOI?

In some ways hypotheses will tend to be hierarchical and interdependent or, importantly, mutually exclusive, and this could be represented where appropriate. Hypotheses needn’t be limited to science: “Edwin Drood was murdered by his uncle”. Academics and members of the public would be equally able to contribute. Some moderation would inevitably be necessary on controversial topics – climate change, for instance. But Hypothify would be a space for engagement with the wider community, both in terms of content and the process of academic research. This is a positive thing. We can take useful bits of the wider web to use for our work (GitHub, Twitter, Slideshare), so why not send something back the other way?

In my next post I’ll outline the (rather sketchy) details of how I think Hypothify might work. Would love to hear what you think! If you’re already convinced, please register your interest here.