Hypothify: first outline

I’ve been pondering a hypothesis-based academic social media site for a few weeks now, and have talked with a couple of people about it. Ideas are only just beginning to coalesce, but now seems the right time to try to outline what I see hypothify doing and how it might work. I’m conscious that it needs to start very simple and remain as simple as possible. It’s easy to come up with a massive feature list, but identifying the most important stuff and planning how it might work is key. [edit after writing – there’s still A WAY to go on this!!]


A place to propose, discuss and aggregate evidence for/against hypotheses. Consensus building through democratic voting up/down of evidence and discussion items. What I thought Quora was going to be when I first heard about it, but less question/answer and more evidence-based.

In traditional(ish) publishing terms it would represent a platform for ‘living’ review articles on each hypothesis. However, it would integrate social media aspects (voting and sharing) and wider evidence than just academic papers.

Not peer-reviewed, or peer evaluated, but peer assembled.


Hypotheses are fundamental to academic work. They represent the ideas and concepts which propagate through the academic sphere and out into the wider world as our understanding of the natural world/ the universe/ the human condition etc. They are often dissociated from the piece-by-piece evidence in the traditional academic record. Currently academics are supposed to read everything and make up their own mind on a particular matter. For each individual this is only possible for a limited number of concepts/hypotheses because of the massive time cost of i) finding all the literature, ii) reading it all and iii) keeping up to date. In reality we all take ‘received wisdom’ on many matters on trust from other academics, or tend to disbelieve everything we’re told and argue it out ourselves from first principles! Hypothify would solve the pain of i) and negate ii) and iii) by providing community-maintained consensus instead of ‘received wisdom’ on each given hypothesis.


The platform would allow the proposal of hypotheses by any user. Evidence items (papers [via Mendeley API if poss], unpublished data [figshare, slideshare, blogs or notebook entries], snippets of discussion / reasoned argument [from discussion of this hypothesis or elsewhere via e.g. Disqus]) can be presented by any member of the community as being ‘for’ or ‘against’ the hypothesis. Key to the usefulness of evidence items will be a tweet-length summary of what that evidence item contributes to the assessment of the hypothesis. One will have to be added by the person introducing the evidence; other ‘competing’ summaries may be added. Where necessary, discussion of the evidence can be conducted, and this discussion can itself be cited as evidence. It is conceivable that a single piece of evidence may be argued to support either side of a hypothesis. Maybe it’s necessary to recognise that evidence can be ‘agnostic’?

Key to the success of the platform will be the voting up/down of content (a la Stack Exchange). Hypotheses themselves should not be voteable on, I think – i.e. there will be no opportunity for individuals to vote subjectively/dogmatically for/against a hypothesis, only to vote up or down evidence supporting or contradicting it. Plus vote up or down particular summaries of evidence items, so the best summaries float to the top for each bit of evidence. So the ‘hypothesis view’ page will show the hypothesis at the top and evidence items for and against (highest voted first), with the best summary pertaining to that hypothesis for each one, plus a link to the evidence item (i.e. NOT stored on-site). I think this is really neat because a user can find a hypothesis they’re interested in, find what the community thinks is the best evidence for and against, read those bits, and make an informed decision based on comprehensive community review of the field. It may or may not be useful to have a ‘swingometer’ for each hypothesis which represents the net votes for evidence for and against the hypothesis, giving a ‘community assessment’ of the hypothesis.
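To make the mechanics concrete, here’s a minimal Python sketch of how evidence items, competing summaries, voting and the ‘swingometer’ could hang together. All the names here are hypothetical – this is a back-of-envelope data model, not an implementation plan:

```python
from dataclasses import dataclass, field

@dataclass
class Summary:
    text: str       # tweet-length summary of what the evidence contributes
    votes: int = 0  # net up/down votes on this summary

@dataclass
class Evidence:
    url: str        # link out to the item (NOT stored on-site)
    stance: str     # 'for', 'against' (or maybe 'agnostic')
    votes: int = 0  # net up/down votes on the evidence item itself
    summaries: list = field(default_factory=list)

    def best_summary(self):
        # the highest-voted of the competing summaries floats to the top
        return max(self.summaries, key=lambda s: s.votes, default=None)

def hypothesis_view(evidence):
    """Evidence items for and against, highest voted first."""
    pro = sorted((e for e in evidence if e.stance == "for"),
                 key=lambda e: -e.votes)
    con = sorted((e for e in evidence if e.stance == "against"),
                 key=lambda e: -e.votes)
    return pro, con

def swingometer(evidence):
    """Net community assessment: votes for minus votes against."""
    return (sum(e.votes for e in evidence if e.stance == "for")
            - sum(e.votes for e in evidence if e.stance == "against"))
```

Note that only evidence and summaries carry votes; the hypothesis itself is never voted on directly – the swingometer is purely derived.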

Attracting users?

What’s in it for users? Firstly, being seen to propose and contribute to hypothesis assessment will bring kudos to users. A ‘reputation’ system (also a la Stack Exchange) could be implemented to measure reputation / value of contributions… Even badges etc would probably work for academics, but I think there’s a more instantly attractive ‘doughnut’ (as my good friend Charlie calls them) – promotion of your research output. If you add a good summary of your paper which informs debate on a particular hypothesis it will (if it’s good) float towards the top for that hypothesis. You will be able to engage with other interested parties and discuss your research. Google will love it.

Engage ‘the enemy’. Let’s say I propose a hypothesis, which just happens to be something I’ve proposed in papers in the literature in the past. Great. I put up the hypothesis and provide evidence items. As I’m the only contributor, the hypothesis is only ‘proposed’. To move it to the ‘community debated’ stage I need to get other people involved. So I share it on Twitter, but also invite people I know will be interested to join the debate. Furthermore, other established hypothify users will be automatically invited to join based on their interests, the other hypotheses they’re active on and the tags which have been associated with the hypothesis in question.

As evidence items are added, the system will attempt to ‘scrobble’ originator details (emails, figshare user details, Mendeley author details) and contact the originators to inform them that their work is being used to inform debate on a particular hypothesis. They will be invited to join the debate. I’m guessing if their work is being ‘correctly’ cited they will be flattered enough to go and have a look, and if it’s being ‘incorrectly’ cited (in their opinion) they will be incensed enough to wade in and put those upstarts right. Thus the experts will hopefully filter in.

Furthermore, as evidence and discussion accumulate and more people vote evidence, evidence summaries and the hypothesis summary up and down, the ‘top contributors’ will be identified. Those top contributors, plus the hypothesis proposer (i.e. the proposer on hypothify, not necessarily the originator of the hypothesis in the outside world [who should be cited and acknowledged]) will be identified at the head of the hypothesis view as the ‘authors’ of the synthesis. Thus each hypothesis becomes a peer-assembled citeable document (hopefully with a DOI assigned). A publication! And as we all know in academia, publications are all. And what’s really nice is that it doesn’t matter which ‘side’ you’re on. If you’re presenting valuable evidence and discussion in either direction, you’ll be listed. So all those old vested interests evaporate away like the eyewash they are.


Not all problems present well as hypotheses. For instance, in my field – marine biogeochemistry – much science is exploratory, and/or based on assessing magnitudes of things: “What is the globally representative concentration of ammonium in the surface ocean?”; “What is the annual global carbon uptake by primary producers in the ocean?”. Of course, these can be presented as hypotheses – “The globally representative concentration of ammonium in the surface ocean is 150 nM” – but this is rather pointless. However, the accumulation of evidence leading to an assessment is much the same process as outlined above for hypotheses, only without the FOR and AGAINST argument. And these syntheses could then feed into wider hypotheses as evidence. Synthify.com has been taken, unfortunately, but I think it’s reasonable to conduct such data synthesis work under the hypothify banner. For the field I work in at least, I think the ‘synthify’ function will be as useful as the ‘hypothify’ one.

Anything else?

Moderation will be important and will rely strongly on the community. Controversial topics could get very sticky very quickly. Need to think about policing. Free speech is important, but balanced debate more so. Anthropogenic global warming deniers and intelligent designers are going to be a challenge to the stability and value of the system.

Integration with the rest of the web is obviously very important. All items will obviously be fully shareable, but a proper API would ideally allow full functional push and pull to and from other sites – Mendeley, Peer Evaluation, Wikipedia, Quora, Disqus, etc.

If all this sounds irresistibly interesting, please hit the pre-sign-up at http://hypothify.kickofflabs.com

Arctic research proposal

The last few weeks have been increasingly dominated by writing a research proposal to the NERC Arctic funding programme. I’ll write down just how insane the process of applying for money to do research is some time soon, but right now I just want to share what we’re planning to do, because I’m actually quite excited about it. Here are the summary and objectives sections of the online form that has to be filled in (just a tiny part of the whole proposal, but a good overview).

The Arctic is a key focus of environmental research today, because it plays an important but not well understood role in the Earth’s climate system, and because it has experienced double the global average warming to date. This project will improve our understanding of the role of the Arctic seas in releasing or taking up various gases which are important in the functioning of Earth’s atmosphere and climate system and of the underlying Arctic ocean ecosystem, and how this may change in the future. This will be achieved by grouping the Arctic Ocean region (AOR) into separate areas based on their physical, chemical and biological (‘biogeochemical’) properties, i.e. where a particular area has specific characteristics which are different to adjacent areas. This process is called bioregionalisation.

This will allow us to ‘scale-up’ the relatively limited data on gas concentrations and other important biogeochemical measurements to the whole AOR. This will help us to better estimate the emission or uptake of gases which play an important role in climate (e.g. methane, nitrous oxide) or atmospheric chemistry (e.g. ammonia, dimethylsulfide). We will use satellite data on the AOR for the last decade to define these biogeochemical regions up to the present-day, and output from the latest generation of climate models to predict their changes into the future. This is an important approach because it bridges the gap between full-coverage, high resolution datasets such as satellite or model data and data collected by scientists in the field, which is relatively limited in space and time. By generalising from the high detail datasets, we can better extrapolate from the very valuable but low detail measurements, increasing their value to understanding global and regional processes.

We will use our institution’s high performance computing facilities to work through the large satellite and model datasets, using a set of rigorous statistical criteria in a computer program to determine the bioregionalisation i.e. there will be no subjective human eye defining the boundaries between the regions. This is important because it is very easy to see shapes and boundaries which don’t really exist (like picking out the face of “the man in the moon”), and to miss ones which do, particularly when we will be using multiple overlaid sets of data. Key datasets for defining the bioregions will be the amount of chlorophyll (a measure of the algal productivity of the ocean) and sea-surface temperature (SST, related to the source of the water and thus the nutrients provided for algal growth). The computer programs will output data ‘maps’ of the divisions between different bioregions as they change over time from the turn of the century to 2100. These ‘data products’ and the software we will write to produce them will be publicly available on the web for others to use. This will especially benefit scientists working in the Arctic or funded by the NERC Arctic programme.
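As a rough illustration of what an objective, eyeball-free bioregionalisation means in practice, here is a toy Python sketch that clusters grid cells on standardised log-chlorophyll and SST using a minimal k-means. This is purely illustrative – the project’s actual algorithms and calibration are far more rigorous, and every function name and parameter here is an assumption for the sketch:

```python
import numpy as np

def bioregionalise(chl, sst, k=4, iters=50, seed=0):
    """Toy bioregionalisation: cluster grid cells by (log-chlorophyll,
    SST) with a minimal k-means. chl and sst are 2-D arrays on the same
    grid; returns an integer label map of the same shape."""
    rng = np.random.default_rng(seed)
    # stack features and standardise so neither variable dominates
    feats = np.column_stack([np.log10(chl).ravel(), sst.ravel()])
    feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
    centres = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # assign each grid cell to its nearest centre
        d = ((feats[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # recompute centres (keep the old centre if a cluster empties)
        new = np.array([feats[labels == j].mean(axis=0)
                        if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels.reshape(chl.shape)
```

The point of the sketch is the division of labour: the statistical criterion (here, nearest-centre distance in feature space) draws the boundaries, not a human eye looking for shapes in the data.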

In collaboration with international colleagues we will compile datasets of measured gas concentrations in the Arctic ocean and atmosphere. Using the time and place they were collected we will be able to allocate the measurements to a particular bioregion. We will then use this information, along with the cycles of chlorophyll in each bioregion, to determine the seasonal cycles of gas concentrations in the present-day and into the future, allowing us to calculate and predict fluxes of these gases with greater certainty than previously possible. These data will feed into better estimates of future climate by providing improved input data to the new generation of Earth-system models, which are beginning to have sophisticated atmospheric chemistry models nested inside them to better predict cloud formation, methane oxidation and other climate-relevant processes. All of the data we produce on gas fluxes, and the models and data we use to produce it, will be shared openly as a public resource.

The overarching aims of this work are:

1) To improve present-day and future estimates of the net ocean-atmosphere flux of a core list of biogeochemically important trace gases (see below) over the Arctic Ocean region (AOR), by applying objective “bioregionalisation” algorithms to delineate biogeochemically-similar regions of the Arctic Ocean from satellite data and Earth-system model (ESM) output;

2) To provide a novel framework (of data products, input files and tuned statistical algorithms within an integrated set of software tools) to allow the community to undertake their own bioregionalisation and contextualisation of biogeochemical parameters (including, but not limited to, trace gas fluxes) across the AOR (or elsewhere in the global ocean, given that new input data files will need to be generated);

3) To support other NERC Arctic Programme and other Arctic-focussed studies in quantifying present and future Arctic-wide marine emissions of their gases of interest (not limited to the core gases listed below).

This will be achieved by fulfilling the following specific objectives, which are not listed in the science case, but are intended to provide an overview of the work proposed in detail within the science case document:

1) Calibration and evaluation of two proven algorithms for delineation of biogeochemical sub-provinces (BSPs) for use in the AOR, working at finer spatial scale than more traditional methods (e.g. ‘Longhurst’ biogeochemical provinces), hence ‘sub-provinces’.

2) Application of these calibrated algorithms to ten years (2002-2012) of satellite chlorophyll-a concentrations, other satellite datasets (SST, water-column penetrating LIDAR) and ancillary data to provide week-averaged, month-averaged and decadal (climatological monthly) data products delineating the BSPs, the strength of the boundaries between them, and the characteristic values of their key biogeochemical variables (SST, salinity, chlorophyll-a) for each week/ month / climatological month.

3) Re-calibration of the algorithms to best reproduce the data in objective 2) using input data from the new generation Hadley Centre ESM (HADGEM2-ES), using baseline HADGEM2 runs for 2002-2012 from the latest phase of the Coupled Model Intercomparison Project (CMIP5), and production of data products to match those produced from the satellite data. Other ESMs and high-resolution ocean and coupled models will be used for comparison.

4) Application of the algorithms to HADGEM2-ES runs for the 4 main emission scenarios run in CMIP5 for 2010-2100 to produce data products of month-averaged and decadal monthly climatologies of BSPs and related data (as in objective 2) for the AOR.

5) Using SST, salinity and windspeed data extracted from the satellite and model outputs used above, calculate high-resolution gridded fields of gas transfer velocities and associated uncertainties for a suite of trace gases of biogeochemical importance (gases and their properties already compiled by R-CoI Johnson in a recent publication).

6) In collaboration with project collaborators, other NERC-Arctic projects and the wider scientific community, compile datasets of marine and atmospheric concentrations of the core trace gases which are the focus of this project (CH4, N2O, DMS, Halocarbons, NH3) and use the satellite-derived BSPs to extrapolate observations to produce seasonal AOR concentration fields and calculate spatially-resolved fluxes for the period 2002-2012.

7) Using predicted BSP fields, project AOR trace gas fluxes to 2100 along the 4 emission scenarios.

8) Disseminate findings through academic publications and public engagement, and the framework through open online access to data and software.
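To give a flavour of the gas transfer velocity calculation in objective 5, here is a minimal Python sketch using a widely used quadratic wind-speed parameterisation. The coefficient follows Wanninkhof (2014) and is an illustrative choice only; the project would use the gas properties and schemes compiled by Johnson, and the Schmidt number here is passed in directly rather than computed per gas from SST and salinity as it would be in practice:

```python
def transfer_velocity(u10, schmidt):
    """Gas transfer velocity k (cm/h) from a quadratic wind-speed
    parameterisation: k = a * u10**2 * (Sc/660)**-0.5.
    u10 is 10-m wind speed (m/s); schmidt is the dimensionless Schmidt
    number for the gas of interest. a = 0.251 after Wanninkhof (2014),
    an illustrative choice for this sketch."""
    return 0.251 * u10 ** 2 * (schmidt / 660.0) ** -0.5

def air_sea_flux(k_cmh, c_water, c_air_eq):
    """Net sea-to-air flux (mol m-2 s-1): F = k * (Cw - Ceq), with k
    converted from cm/h to m/s and concentrations in mol/m3. Positive
    values mean emission from ocean to atmosphere."""
    return (k_cmh / 100.0 / 3600.0) * (c_water - c_air_eq)
```

Applying such a function over the gridded SST, salinity and windspeed fields, bioregion by bioregion, is what turns the extrapolated concentration fields into the spatially-resolved flux estimates of objectives 6 and 7.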