Open access medical content and the world’s largest encyclopedia

Authors & Wikipedians: Thomas Shafee, Diptanshu Das, James Heilman & Gwinyai Masukume

Wikipedia aims to make a free and accessible summary of all human knowledge and is therefore one of the best-known open access efforts. The cumulative efforts of its volunteer writers (Wikipedians) have resulted in it dwarfing all previous encyclopedias in scope and depth. Additional collaborations with members of the open access community are taking this further. Many of these ideas are globally relevant; however, a number of initiatives exist in Australia and New Zealand. A pair of recent papers in Science and JECH make the case that there has never been a better time to help shape the world’s most-read information source.

An open access encyclopedia

A few decades ago, an encyclopedia was a luxury that few could afford. Now, all with Internet access have free access to an encyclopedia larger than could fit in most homes, if printed. Wikipedia is extensively used by the general public, as well as doctors, medical students, lawmakers, and educators.

Indeed, it’s the primary free information source in many countries, especially for biomedical content. For example, during the 2014 Ebola outbreak, the rapid updating and translation of relevant Wikipedia articles into more than 115 languages led to these articles being read nearly 100 million times that year. Access to Wikipedia without data charges is also available in over 50 countries via the Wikipedia Zero project, covering more than 300 million people. The offline medical Wikipedia app and Internet-in-a-box initiatives offer greater accessibility to those with limited connectivity. With over half of the world’s population not online, and many more with only intermittent access, these efforts are critical.

Wikipedia and the open access movement strengthen each other

Wikipedia is an encyclopedia, and therefore can only summarise existing knowledge. It therefore depends on citing reliable and verifiable reference sources to support its statements. Since it is editable by anyone, it is particularly important that anyone be able to cross-check the stated ‘facts’. Indeed, Wikipedia is the 6th highest referrer of DOI links (the unique hyperlinks assigned to academic articles).

However, most Wikipedia readers (and many of its writers) do not have access to paywalled articles. How then can references be checked? Some journals provide access to Wikipedians through the Wikipedia Resource Library. This allows details within paywalled sources to be duly summarised and distributed, but it’s an imperfect solution. Readers wanting to check a source or read deeper into a subject hit the paywall, and images can’t be easily replicated. Wikipedia articles commonly cite open access articles, but there are often no open access alternatives to paywalled articles. Currently, there is no perfect solution for which sources to cite, but any efforts that strengthen open access benefit the encyclopedia.

What can be done to help?

Any advance in the open access movement aids Wikipedia, as do more targeted efforts.

On an individual level, teaching people how to directly edit Wikipedia enables them to get involved at the ground level. There are widespread Australian examples, including universities, conferences, libraries, and societies across the country. Similar events in New Zealand have been hosted by Royal Society Te Apārangi and Whanganui museum. The editing interface has been updated to be as easy to use as a Word document. People may contribute for a specific event (an edit-a-thon), or become regular contributors. The writer community organises itself into groups called ‘WikiProjects’ with shared topic interests. Efforts include adding or improving text, copy-editing, reviewing new edits, and adding images or other media.

Encouraging professional bodies to formally recognise Wikipedia editing as a service to the academic community and wider world will help legitimise it as a worthwhile use of time by busy professionals. Greater involvement by subject experts can improve Wikipedia’s quality. As yet, no Australian or New Zealand funding body formally recognises Wikipedia editing for grant or fellowship applications.

We also strongly support the expansion of dual-publishing of peer reviewed articles by academic journals (e.g. by PLOS, Gene, and Wiki.J.Med). This process creates a citable ‘version of record’ in the journal (providing academic credit for the authors) and the content is then used to create or overhaul the relevant Wikipedia pages. Through Wikipedia, health professionals can massively impact public health literacy (even obscure Wikipedia pages usually get hundreds or thousands of views per day). Academics similarly gain a public impact that is matched by few other platforms. In return, the encyclopedia benefits from the accurate and expert-reviewed information and the journal gains greater exposure.

Larger groups and organisations can also be mobilised to contribute to Wikipedia as an open access outlet. For example, Blausen Medical and others have contributed galleries of open access images and videos, which are used to illustrate the encyclopedia. Institutions such as Cochrane, Cancer Research UK, and Consumer Reports have teamed up with experienced Wikipedians and trained their members to add information and references to relevant Wikipedia articles. Journals can also be encouraged to release their back-catalogues under open access licenses, unlocking vital sources. Studies at Australia’s Monash University also recommended integrating Wikipedia editing into university courses, and several universities, such as the University of Sydney, do just this. Even database services can integrate their data into Wikipedia’s structured knowledge database, Wikidata (e.g. on genes and RNA families).

By Marcos Vinicius de Paulo (CC BY-SA 3.0), via Wikimedia Commons

The big picture

Although the recent articles in Science and JECH focused on the biomedical field, these are examples of a much wider phenomenon. For instance, there have been several ongoing collaborations with galleries, libraries and museums around the world to add their knowledge to Wikipedia under open access licenses.

Wikipedia also has the potential to be a knowledge access platform for the 4 billion people who are not currently online. Its open license allows people to translate, build upon, and distribute its content in new and innovative ways with no requirements beyond attribution and releasing what they create under a similar license.

Wikipedia and the open access movement are already intertwined. Open access publishing provides information needed for growing, improving and updating Wikipedia. Meanwhile, Wikipedians search, summarise and combine that vast sea of information into free articles. Each benefits from the strengths of the other, and can be helped by specific collaboration efforts.

The Wikimedia Foundation, the organisation that hosts Wikipedia, is currently formulating its strategy through to 2030 and has identified collaboration with the wider knowledge ecosystem as one of its key themes.


Shafee, Thomas; Masukume, Gwinyai; Kipersztok, Lisa; Das, Diptanshu; Häggström, Mikael; Heilman, James (2017-10-29). “The evolution of Wikipedia’s medical content: past, present and future”. Journal of Epidemiology and Community Health. 71 (10). doi:10.1136/jech-2016-208601.

Shafee, Thomas; Mietchen, Daniel; Su, Andrew I. (2017-08-11). “Academics can help shape Wikipedia”. Science. 357 (6351): 557–558. doi:10.1126/science.aao0462.

Lead image (white books): Michael Mandiberg (CC BY-SA 4.0), via Wikimedia Commons

This work is licensed by AOASG under a Creative Commons Attribution 4.0 International License.

Competing interests:

All authors have contributed to Wikipedia articles, are current participants in WikiProject Medicine, and are on the editorial board of WikiJournal of Medicine. Thomas Shafee is on the editorial board of PLOS Genetics. James Heilman is a former and current member of the Wikimedia Foundation board of trustees. The authors do not receive financial compensation for their contributions to these projects.


Australasian Libraries Needed to Help Scale Knowledge Unlatched

Lucy Montgomery writes about the need for new models in humanities publishing and the second round of Knowledge Unlatched. Other models include Open Library of the Humanities.

Contact: @KUnlatched

Specialist scholarly books, or monographs, are a vital form of publication for Humanities and Social Sciences (HSS) scholars globally. Monographs allow HSS researchers to develop and share complex ideas at length, and to engage with international communities of peers in processes of knowledge creation. However, library spending on books hasn’t kept pace with growth in the number of researchers required to publish a book in order to secure tenure and promotion. Dramatic increases in the costs of maintaining journal subscriptions have left libraries with little to spend on other areas. As a result monograph sales have declined by as much as 90% over 20 years.

Although a growing number of librarians, authors, research funders and publishers would like to see books transition to OA, book-length scholarly works pose unique challenges. This is because the fixed costs of publishing a 70,000–100,000-word book are much higher than they are for a 5,000–10,000-word journal article. High costs mean that ‘gold’ routes to OA are not a practical option for most authors. Monograph publishers, many of whom are not-for-profit university presses and already dependent on subsidies, are struggling to find funding to support OA experimentation. Creative approaches to enabling positive change across the system are needed.

Australasian libraries are playing a key role in the development of one such model. In 2014 Australasian libraries took part in the global pilot of a revolutionary OA book experiment: Knowledge Unlatched (KU). Libraries from around the world were invited to share the costs of making a 28-book Pilot Collection OA. The collection, which included globally relevant topics such as Constructing Muslims in France and Understanding the Global Energy Crisis, has now been downloaded more than 40,000 times by readers in 170 countries. In addition to demonstrating the viability of KU’s global library consortium approach to supporting OA for books, the award-winning Pilot also allowed KU to demonstrate the power of OA to increase the visibility of specialist scholarly books in digital landscapes. In 2015 KU helped to secure the indexing of monographs in Google Scholar.

The 2014 KU Pilot confirmed that Australasian libraries are important change-makers in the global scholarly communications landscape. KU is widely regarded as a strongly Australasian project, thanks in no small part to the three Founding Libraries that provided additional cash support for the development of the KU model: UWA, University of Melbourne and QUT. Australasia also punched well above its weight in sign-up rates for the Pilot Collection. Twenty-eight libraries from Australia and New Zealand took part, joining a global community of close to 300 libraries that contributed to making the 28-book Pilot Collection OA.

Libraries are now invited to support the next phase of the project by signing up for Round 2. Round 2 is a key step in scaling the KU model and ensuring that the project delivers on its promise to create a sustainable route to OA for large numbers of scholarly books.

As the end of the year fast approaches, we encourage you to consider signing up. Libraries have until 31 January 2016 to pledge, but we’d be happy to assist with earlier invoicing for those that would prefer to support the project from a 2015 budget. KU Round 2 is an opportunity for libraries from around the world to share the costs of making 78 new books from 26 recognised publishers OA. The 78 new books are being offered in eight individual packages. Libraries must sign up for at least six in order to participate.

As with the Pilot Collection, books in Round 2 will also be hosted on OAPEN and HathiTrust with Creative Commons licences, preserved by CLOCKSS and Portico, and MARC records will be provided to libraries.

If models like KU are to succeed, it will be because libraries have made a conscious effort to move beyond established workflows to support new, innovative approaches to OA and publishing generally. At this stage in its development the support of Australian libraries remains key to the capacity of KU to scale and operate sustainably.

Competing interests: Lucy Montgomery is Deputy Director (an unpaid voluntary position) of Knowledge Unlatched.

About the author: Associate Professor Lucy Montgomery is Deputy Director of Knowledge Unlatched and Director of the Centre for Culture and Technology at Curtin University.


Case study: Implementing DSpace at Ballarat Health Services

This blog by Gemma Siemensma, Library Manager at Ballarat Health Services, describes the thinking and processes behind the introduction of their DSpace repository, the Ballarat Health Services Digital Repository.

Ballarat Health Services (BHS) is the major hospital for western Victoria. Located an hour west of Melbourne it is the principal referral centre for the Grampians region, which extends from Bacchus Marsh to the South Australian border, covering a catchment of 48,000 square kilometres, and providing services to almost 250,000 people.

Why a digital repository?

There were two reasons why we decided to head down the repository path:

  1. The BHS Library kept hardcopy folders of research publications. There was no access to, or knowledge of, what hospital staff had produced except the occasional mention in an Annual Report. Using an EndNote library was not an option as the organisation doesn’t license it.
  2. Piles and piles of “historical stuff” were dumped in the library. People didn’t want to throw out anything of value so they kindly gave it to us. We had no idea what we had and were often asked if we had old photos etc. of past staff or old buildings.

I liked the idea of a repository as I could chase the copyright for researchers and make the items freely and publicly available. A repository was clearly in line with my organisation’s values as it would allow staff to be recognised as researchers and enhance the research reputation of the organisation.

Where I started…

It didn’t take me long to realise that I needed to go down the repository path. I had explored the functionality of our existing catalogue but it wouldn’t suffice. I spoke with several vendors to hear about their products and had a bit of a play with these online. For me (and my minuscule budget) the costs were prohibitive. I also spoke to repository staff at universities to garner their ideas and opinions (thanks to staff at the University of Ballarat and Swinburne University of Technology whom I pestered on several occasions).

I put a call out to the aliaHEALTH e-list to see what other hospitals were doing. There were few responses; however, one did mention that they were heading in the same direction and were looking at using DSpace, which could be hosted externally by Prosentient Systems, based in Sydney. Hospital IT Departments can be restrictive and Prosentient could provide the support and training I would require. I also read lots and lots about different repositories and played around with them to see if they would suit our needs.


We decided to use DSpace for a variety of reasons. It was a lot cheaper than other systems and Peninsula Health were also looking to head down this path so we figured we could work in tandem. DSpace (through Prosentient Systems) would come configured and hosted and they would look after the technical side. These are skills that no one in our library (EFT of 3.2) possessed at a high level. Their service included helping us set up an IT sub-domain; configuring the site; Handle registration; Google Analytics setup; indexing on Trove; training; and helping to test, re-test and tweak the system. It also included ongoing support and maintenance, which we pay for annually. We had worked with Prosentient before (they look after the Gratisnet ILL system for health libraries) so we were confident in their abilities to help us. DSpace was also very popular with universities so for me it had credibility and lots of support out there which I could tap into when required.

What’s included?

Our repository is divided into two sections:

  1. Research – this contains published journal articles, books, book chapters, conference papers and theses. We chase copyright from publishers for all completed works. We are yet to chase authors for pre-publication versions as we are still getting a feel for how this all works. This is something we hope to do more of in 2014. We currently sit at about 20% of full text research but expect this to rise.
  2. Historical content – so far this includes Annual Reports from BHS. These have been scanned and indexed in Trove. In the future we aim to add newspaper articles, photographs, internal reports and recordings.


Before beginning the repository it was put to both the BHS & St John of God Human Research and Ethics Committee (HREC) and the BHS Research Advisory Committee to garner support. Both were keen to see the repository implemented as it highlighted what research was undertaken within BHS and the region. The repository was also added to organisation-wide policies. This included the “Rules for Publication/Presentation” policy as well as embedding it in HREC documentation. Each year BHS holds a Research Symposium and I am asked to speak about where we are up to and encourage people to deposit works into the repository. I also promote the repository in the internal staff newsletter. There is also the possibility of external promotion in the local newspaper in relation to the historical documents. The historical components also open up opportunities to work with local historical groups, the research room at the local public library and other local archives.


The BHS Digital Repository has just ticked over its first birthday. In that time we have managed to add just over 400 items. Three-quarters of these relate to research and we have many more to add. The researchers we worked closely with in the initial stages of the project are very keen to alert us to new publications and we are finding that when we approach staff about publications they are happy to give us a list of all their work for inclusion.

Library value

In its vision to provide excellence in health care, BHS is strongly committed to the values of research, continuing education and collaboration with other service providers and is committed to sharing its knowledge and experiences to build a better health system. The BHS Library supports these values through a number of library initiatives (literature searching; database training; electronic journals and books etc.).

Over the last three years, the majority of health libraries have remained static or experienced a decrease in their budget, staff hours and space. As a manager I know that I need the library to add value to the organisation for them to recognise that we are a resource worth keeping. I feel that implementing a repository has done this. Not only has it shown us that we have transferable skills, but it has opened up professional visibility for both the organisation and the library. We communicate more widely across the organisation and in doing so are promoting the library and showing staff that we are more than just books. When we talk to researchers it puts the library in their head space and they approach us more frequently for help. It’s a win-win situation.

Gemma Siemensma
Library Manager

This is a version of a paper being presented at the 10th HLi conference: #vital on October 18th 2013

Shall we sing in CHORUS or just SHARE? Responses to the US OA policy

Well, things certainly have been moving in the land of the free since the Obama administration announced its Increasing Access to the Results of Federally Funded Scientific Research policy in February.

In short, the policy requires that within 12 months US Federal agencies that spend over $100 million in research and development have to have a plan to “support increased public access to the results of research funded by the Federal Government”. (For a more detailed analysis of that policy see this previous blog.)

In the last couple of weeks two opposing ‘solutions’ have been proposed for the implementation of the policy.

In the publishing corner…

A coalition of subscription based journal publishers has suggested a system called CHORUS – which stands for Clearing House for the Open Research of the United States. The proposal is for a “framework for a possible public-private partnership to increase public access to peer-reviewed publications that report on federally-funded research”.

The plan is to create a dedicated domain where publishers can deposit the metadata about papers that have relevant funding. When a user wants to find research they can look via CHORUS or through the funding agency site, and then view the paper through a link back to the publisher’s site.

While this sounds reasonable, the immediate questions that leap out are why this would not be searchable through search engines, and what embargo periods would be held on the full text of publications. The limited amount of information available on the proposal does not seem to address these questions.

The Association of American Publishers released their explanation of the proposal ‘Understanding CHORUS’ on 5 June. There is not a great deal of other information available, although The Chronicle published a news story about it.

The Scholarly Kitchen blog – run by the Society for Scholarly Publishing – put up a post on 4 June 2013 with some further details. According to the post, the CHORUS group represents a broad-based group of scholarly publishers, both commercial and not-for-profit. There are 11 members on the steering group and many signatory organisations. The blog states that the group collectively publishes the vast majority of the articles reporting on federally-funded research.

The time frame is fast, with plans including:

  • High-level System Architecture — Friday, June 14
  • Technical Specifications — Friday, July 26
  • Initial Proof-of-Concept — Friday, August 30

On this blog there is the comment that CHORUS is:

a much more modern and sensible response to the demand for access to published papers after a reasonable embargo period, as it doesn’t require an expensive and duplicative secondary repository like PubMed Central. Instead, it uses networked technologies in the way they were intended to be used, leveraging the Internet and the infrastructure of scientific publishing without diverting taxpayer dollars from research budgets.

Not surprisingly the comment coming from commercial publishers about diverting taxpayer dollars from research budgets has attracted some criticism, not least from Stevan Harnad in his commentary “Yet another Trojan Horse from the publishing industry” :

And, without any sense of the irony, the publisher lobby (which already consumes so much of the scarce funds available for research) is attempting to do this under the pretext of saving “precious research funds” for research!

Harnad’s main argument against this proposal is that it represents an attempt to take the power to provide open access out of the hands of researchers so that publishers gain control over both the timetable and the infrastructure for providing open access.

Mike Eisen in his blog on the topic points out that taxpayers will end up paying for the service anyway:

publishers will without a doubt try to fold the costs of creating and maintaining the system into their subscription/site license charges – the routinely ask libraries to pay for all of their “value added” services. Thus not only would potential savings never materialize, the government would end up paying the costs of CHORUS indirectly.

Harnad notes that this is a continuation of previous activities by publishers to counter the open access movement, not least the 2007 creation of PRISM (the Partnership for Research Integrity in Science and Medicine), which grew from the Association of American Publishers employing a public relations expert to “counter messages from groups such as the Public Library of Science (PLoS)”.

In the university corner…

Three days after the Scholarly Kitchen blog, the development paper for a proposal called SHARE was released from a group of university and library organisations.

The paper for SHARE (the SHared Access Research Ecosystem) states the White House directive ‘provides a compelling reason to integrate higher education’s investments to date into a system of cross-institutional digital repositories’. The plan is to federate existing university-based digital repositories, obviating the need for central repositories.

The Chronicle published a story on the proposal on the same day.

The SHARE system would draw on the metadata and repository knowledge already in place in the institutional community, such as using ORCID numbers to identify researchers. There would be a requirement that all items added to the system include the correct metadata, such as the award identifier, PI number and the repository in which the item sits.
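To make the metadata requirement concrete, here is a minimal Python sketch of the kind of check a repository might run before exposing an item to a federated system. It is an illustration only and is not taken from the SHARE paper: the class name, the field names (award_id, pi_number, repository, orcid) and the helper function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record layout; the field names are assumptions for illustration,
# not a schema defined by the SHARE proposal.
@dataclass
class DepositRecord:
    title: str
    award_id: str = ""    # award identifier from the funding agency
    pi_number: str = ""   # Principal Investigator number
    repository: str = ""  # repository in which the item sits
    orcid: str = ""       # optional researcher identifier

# The three metadata elements the proposal names as required.
REQUIRED_FIELDS = ("award_id", "pi_number", "repository")

def missing_fields(record: DepositRecord) -> list[str]:
    """Return the names of required metadata fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name).strip()]

if __name__ == "__main__":
    record = DepositRecord(title="Example paper", award_id="A-1", repository="Example Repository")
    print(missing_fields(record))
```

A record that is missing its PI number would be flagged here so that it could be corrected before it is shared more widely.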

This type of normalisation of metadata is something repository managers have already addressed in Australia, in response to the development of Trove at the National Library of Australia, which pulls information in from all Australian institutional repositories. More recently, there has also been agreement here about the metadata field to be used to identify research from a grant to comply with the NHMRC and the ARC policies.

In the SHARE proposal, existing repositories, including subject based repositories, would work together to ensure metadata matching to become a ‘linked node’ in the system. The US has a different university system to Australia with a mixture of private and state-funded institutions. But every state has one or more state-funded universities and most of these already have repositories in place. Other universities without repositories would use the repository of their relevant state university.

A significant challenge in the proposal, as it reads, is the affirmation that for the White House policy to succeed, federal agencies will need universities to require of their Principal Investigators “sufficient copyright licensing licensed to enable permanent archiving, access, and reuse of publication”. While this sounds simple, in practice it means altering university open access and intellectual property policies, and running a substantial educational campaign amongst researchers. This is no small feat.

The SHARE proposal puts forward a phased timeframe, with requirements and capabilities developed within 12-18 months, and the supporting software completed within another six months. So it would take 18 months to two years after implementation begins before the system would be operational. It is also possible that, given the policy issues, it could take longer to eventuate in reality.

There has been less discussion about the SHARE proposal on open access lists, but this is hardly surprising as more energy on these lists will be directed towards criticism of the publishers’ proposal.

So which one will win?

Despite the two proposals emerging within days of one another, the sophistication of both proposals indicates that they have been in development for some time.

Indeed, the CHORUS proposal would have required lead-time to negotiate ‘buy-in’ from the different publishers. On the other hand, the SHARE proposal includes a complex flow chart on page 4 which appears to be the equivalent of the ‘High-level System Architecture’ the CHORUS proposal states would be ready on Friday 14 June. According to a post on the LibLicense discussion list, SHARE was developed without awareness of CHORUS, so it is not an intentional ‘counterattack’.

There are glaring differences between the two proposals. SHARE envisions text and data mining as part of the system, two capabilities missing from the CHORUS proposal. SHARE also provides searching through Google rather than requiring the user to go to the system to find materials as CHORUS seems to be proposing. But as Peter Suber points out: “CHORUS sweetens the deal by proposing OA to the published versions of articles, rather than to the final versions of the author’s peer-reviewed manuscripts”.

So which will be adopted? One commentator argued that CHORUS will work because publishers have experience setting up this kind of system, whereas SHARE does not have a good track record in this area. They suggest that:

A cynical publisher might say: Let’s fight for CHORUS, but let’s make sure SHARE wins. Then we (the publishers) have the best of all worlds: the costs of the service will not be ours to bear, the system will work haphazardly and pose little threat to library subscriptions, and the blame will lie with others.

This is an area to watch.

Dr Danny Kingsley
Executive Officer
Australian Open Access Support Group