Although the main target of Open Access is … well, just that, content being freely and openly accessible for anyone in perpetuity, it has the additional advantage that content may be re-used. Of course that use has to fit the license given, but with most OA carrying a CC-BY license, there are a lot of opportunities for aggregation, text mining and the like. It is interesting that up to now only a few aggregation initiatives have sprung up, most notably PubMed Central (3.2M full text OA papers) and Europe PubMed Central (570K full text OA papers), which aggregate OA content in the biomedical and life sciences. In PMC and Europe PMC most content is deposited by publishers and authors. Apart from these subject-specific initiatives there aren’t many full text OA aggregators. Other sites either are not limited to OA and do not aggregate the papers in one place (e.g. Google Scholar) or do neither full text indexing nor aggregation (e.g. BASE, OAIster, DOAJ).
Enter Paperity, which was launched last week (so in October 2014): a multidisciplinary aggregator of Open Access scholarly content. It holds over 160K articles from over 2,100 journals. It is an initiative from Poland (well, at least the founder is from Poland, although the website is registered in France) and is led by Marcin Wojnarski. Paperity has a slick and friendly website that offers access to aggregated OA content, with full text search and a built-in PDF reader. It promises more functionality for communicating around papers and using Web 2.0 options. It has a list of journals covered, and links to versions of the same paper on publisher websites. Publishers and editors of OA journals can request that their journals be included.
Almost immediately, on Twitter and in a thread over at the GOAL Open Access discussion list, questions were raised, and answers given, that I will summarize here.
1) What are the inclusion criteria used by Paperity? Paperity aims at 100% of Open Access peer reviewed papers. Currently it is at 160K papers, which is somewhere around 10% of Gold OA papers but below 2% if you include Green OA content. It is not stated explicitly, but it seems logical that Paperity only aggregates content that it is allowed to aggregate (so not if ‘no derivatives’ is in the CC license).
2) What is the business model of Paperity? Paperity seems to have started as a ‘non-profit academic project’, but it will have to look for more structural funding, which might include ads or charging journals.
3) Will Paperity allow text mining through an API or otherwise? According to Wojnarski that is not currently possible, but Paperity is certainly sympathetic to the idea.
4) Why does Paperity focus on Gold OA journals? Paperity regards this content as the most reliable in terms of bibliographic data. Although repositories are easy to harvest, Paperity says that determining the version and status of texts there is more difficult than with publisher-provided full text from journals. According to Paperity, this initial focus on Gold OA also makes it easier to include strictly peer reviewed content only.
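As a rough check on the coverage figures in the answer to question 1, the percentages can be turned back into implied totals. This is only a back-of-the-envelope sketch: the constants are the figures quoted in this post, the function name `implied_total` is made up for illustration, and the results are estimates implied by those percentages, not independent counts of OA papers.

```python
# Figures quoted in this post (October 2014).
PAPERITY_ARTICLES = 160_000
GOLD_SHARE = 0.10             # "somewhere around 10% of Gold OA papers"
GOLD_PLUS_GREEN_SHARE = 0.02  # "below 2%" once Green OA is included


def implied_total(articles: int, share: float) -> int:
    """Return the corpus size implied by holding `articles` at the given share."""
    return round(articles / share)


# Around 10% coverage implies roughly this many Gold OA papers.
gold_total = implied_total(PAPERITY_ARTICLES, GOLD_SHARE)

# Coverage *below* 2% implies *more than* this many Gold + Green papers,
# so this value is a lower bound rather than an estimate.
gold_plus_green_floor = implied_total(PAPERITY_ARTICLES, GOLD_PLUS_GREEN_SHARE)

print(f"Implied Gold OA papers: about {gold_total:,}")
print(f"Implied Gold + Green OA papers: more than {gold_plus_green_floor:,}")
```

In other words, the quoted shares correspond to roughly 1.6 million Gold OA papers and more than 8 million papers once Green OA is counted, which shows how much of the total would still be missing even if Paperity covered all Gold OA journals.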
If Paperity develops further, I would like to see them start aggregating Green OA soon, and also add more functionality to the built-in PDF reader (e.g. annotations), text mining options, more advanced search and browsing, and faceted search results.
Jeroen Bosman, @jeroenbosman