I&M / I&O 2.0

Open Access – The Business of Scholarship

Posted on 10/10/2018 by Jan de Boer

At Erasmus University we attended the European premiere of Paywall: The Business of Scholarship, preceded by interesting keynotes by Pearl Dykstra, Robert-Jan Smits and Jason Schmitt. Some observations:

Robert-Jan Smits answered questions from a Springer representative and from Leo van der Wees, project manager of Stichting OpenRecht, about the role of publishers. He acknowledged the contribution of publishers to the dissemination of research and wants them to be part of the change. But he also made it clear that the academic community should set the conditions (and it has done so in Plan S), not the publishers.
Again, on Green OA and Plan S, Smits emphasized that Plan S only sets the conditions for immediate open access under a CC-BY license, with copyright remaining with the author. The plan does not aim for one specific model but can include Green, Gold and Diamond OA.
There will be an implementation plan for Plan S (including the proverbial devil in the details) by the end of the year.
Jason Schmitt gave a passionate introduction to his film Paywall: The Business of Scholarship. His view on the pace of change was very different from that of Plan S: he described the OA community's work as being for future generations (in contrast to 100% OA in 2020). He referred to the Ford Model T, which hadn't changed significantly in 10 years of production (https://en.wikipedia.org/wiki/Ford_Model_T#Gallery for proof of this).
Side note: the Wikipedia article on the Ford Model T offers other apt analogies with Open Access, not pursued by Jason Schmitt. For instance, as the number of Model Ts produced each year increased, the price of a single car dropped significantly. Also, the resistance to changing the Model T (supposedly all the car a person could ever need) eventually led to a loss of market share to other manufacturers.

You can also read the report on the meeting at Science Guide (in Dutch)

Jan de Boer / Sandrien Banens

Posted in I&M2.0, Open Access

Open Science FAIR

Posted on 29/01/2018 by tepronk

A while ago now, but still relevant! From 6 to 8 September the Open Science Fair took place in Athens: three days full of OS initiatives, from inventories to infrastructure. Many organisations were represented. Below is a selection from the programme.

First I attended a session on collecting data for reuse in the agrifood sector. The speakers noted that open data is not always usable as such, or takes a lot of work because it is messy: it is scattered everywhere, and it has to be filtered, validated, interpreted, etc. But solutions had been devised for this. Case study 1: data for improving agriculture was collected by having farmers submit standardised data, via an app and an easy-to-follow description of a standard way to assess soil. Case study 2: the agrifood science cloud eRosa is a complete infrastructure for scientific datasets, with interoperable semantics, research portals and research environments for bringing the data together in a standardised way. Whether this could become part of the European science cloud, or make use of it, remained an open question.

In another workshop Wolfram Horstmann of LIBER stressed the importance of supporting the ‘long tail’ of science. And indeed, most services offered by the organisations present at OSFair are mainly meant to be integrated by other organisations, not used by individual researchers. The library is a good point of contact for a more diverse clientele.

In a keynote, Sachs argued for encouraging patents to make investing in innovation worthwhile. So not everything open! Because that is harmful. But only in a limited context: no patenting of basic science and no other unethical patents.

There was also a ‘science theater’ in which participants had to put themselves in the shoes of the various stakeholders, and, to close, three hilarious short plays about Open Science. This is a nice way to help researchers see things from other perspectives.

According to Jon Tennant, the biggest problem for researchers who do want to publish OA is the cost. And according to him, 70% of OA journals charge no or only low author fees. No list was provided.

The most important ‘open’ products were publications, data and software. How do you connect them? OpenAIRE has initiatives to develop software that links these items together when they are stored in different places.

At the NIOO, Antica Culina is working on making research results accessible and storing them. The challenge is the diversity of measurements at the NIOO, even on a single organism; she took the great tit as an example. Measurements range from field observations and genomics data to data on population dynamics. How do you store all that so it is usable and linkable? And the datasets are dynamic: new data keeps being added. A useful idea: she thinks Open Science advocates should attend discipline-specific conferences, because researchers will not come to OS meetings.

Elixir has the task of preserving bioinformatics resources (software and applications). Using the EDAM ontology they have described and stored 6,100 tools in BioTOOLS. They also have the BIP! tool, which measures the quality of articles and uses it to rank search results.

There was also a session on Open Science Monitoring. There are already many monitors: OA SPARC How Open Is It, the OSI openness score, FAIR metrics, the RAND EU open science monitor study (which monitored three major themes: OA publications, OA data and open scholarly communication), OpenAIRE usage statistics, the Open Data Monitor http://opendatamonitor.eu and the Open Data Barometer. EOSC is going to create a monitoring framework and measure indicators. OpenAIRE is also designing a monitor, and DANS is designing a FAIR indicator. GO FAIR is making a FAIR monitor too. Someone in the audience asked all monitoring initiatives to align with each other and to standardise how indicators are measured. What a good point! We should be FAIR ourselves and make sure that all these initiatives together can provide a better picture!

I also attended a session on the European data catalogue. Various initiatives with their own catalogues presented, including EPOS (which Otto also works on), BlueBRIDGE, FAIRDOMHub and SeaDataNet. How FAIR is your data catalogue? The EOSC pilot has a work package that aims to bring all the different catalogues together, to counter fragmentation. Some efforts to gather information from different sources were also presented: the Omics Discovery Index, the OpenAIRE European catalogue (only data packages linked to publications), and EPOS as well.

A few more fun facts:

To promote green OA, Jisc has a Publications Router that can automatically send articles to a green repository.

Zenodo has a GitHub integration that allows code on GitHub to receive a DOI, so it can be cited.

Posted in I&M2.0

DH Clinics 3: Natural language processing and Linked Data

Posted on 19/10/2017 by joostvangemert1

On 17 October, together with colleagues Jan de Boer and Coen van der Stappen, I attended the morning programme of the third Digital Humanities Clinic, devoted to the theme Natural Language Processing and Linked Data. The venue was a very atmospheric one: the recently restored library hall of the Rijksmuseum. In her welcome, Saskia Scheltjens, head of Research Services at the Rijksmuseum, connected the past with the future: the impressive library hall with its high-stacked collection (which, incidentally, is only a small part of the library's total holdings) has since been supplemented with many extensive databases, which also play (or will play) an important role in Linked Data projects in cooperation with other institutions.

Next up was Marieke van Erp, who works at the KNAW Humanities Cluster. She gave an exceptionally clear account of Natural Language Processing: programming computers so that they can analyse large quantities of natural-language text. Marieke distinguished several levels of analysis: word analysis, syntactic analysis, context analysis and semantic analysis. Great progress is certainly being made with statistical methods, e.g. Named Entity Recognition, in which “names” of persons, organisations, places or anything else are “recognised” and correctly classified (partly on the basis of context analysis). A prerequisite for this, however, is training the software in advance with (defined) terms. Semantic analysis in particular remains a difficult point: what did the speaker or author actually mean? In short, to my mind this showed once again that digital analysis methods and close reading complement each other.
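To make the idea of “training with predefined terms” a little more concrete, here is a deliberately simplified sketch in Python. It is not a statistical NER model like the ones discussed in the talk: it is a plain dictionary (gazetteer) lookup plus one hypothetical context rule, and the gazetteer entries and labels are invented for illustration.

```python
import re

# Hypothetical gazetteer: the "(defined) terms" supplied to the software in advance.
GAZETTEER = {
    "Rijksmuseum": "ORG",
    "Amsterdam": "LOC",
    "Rembrandt": "PER",
}

# Hypothetical context rule: an unknown capitalised word right after "in" is a place.
CONTEXT_PREPOSITIONS = {"in"}


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (token, label) pairs for the tokens recognised as named entities."""
    tokens = re.findall(r"\w+", text)
    entities = []
    for i, token in enumerate(tokens):
        if token in GAZETTEER:
            # Known term: take the label straight from the gazetteer.
            entities.append((token, GAZETTEER[token]))
        elif (
            i > 0
            and tokens[i - 1].lower() in CONTEXT_PREPOSITIONS
            and token[:1].isupper()
        ):
            # Unknown capitalised word after "in": guess a location from context.
            entities.append((token, "LOC"))
    return entities


print(tag_entities("Rembrandt lived in Leiden; his work hangs in the Rijksmuseum"))
```

Leiden is not in the gazetteer and is only found through the context rule, which shows both why context analysis helps and how brittle hand-written rules are compared with trained models.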

The next presentation, by Seth van Hooland (Vrije Universiteit Brussel), had been announced as a lecture on Linked Data. Linked Data was indeed discussed, but within a much broader framework. Seth discussed four methods for modelling information that have been in use since the 1960s and 1970s: simple tables, e.g. in Excel (in which imposing a hierarchy is impossible); databases (which are pre-eminently geared to imposing a hierarchical order on information); XML (in which a hierarchy is combined with extra semantic layers); and RDF (in which “triples”, each consisting of a subject, a predicate and an object, can be linked to one another without limit). Seth said he had some reservations about RDF. In his view RDF is in fact a return to the hierarchy-free Excel principle: the possibilities for shielding information (as offered by databases and XML) are abandoned. That is wonderful in itself, but it can also cause many problems: the chain of triples can be extended endlessly.
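A minimal Python sketch of the triple idea, assuming a plain in-memory list of (subject, predicate, object) tuples rather than a real RDF library or serialisation; the `ex:` names are hypothetical and not real URIs. It shows how the object of one triple can be the subject of the next, so that a chain can be followed, and extended, indefinitely.

```python
# Each statement is a (subject, predicate, object) triple.
# The "ex:" names are hypothetical placeholders, not real URIs.
triples = [
    ("ex:NightWatch", "ex:createdBy", "ex:Rembrandt"),
    ("ex:Rembrandt", "ex:bornIn", "ex:Leiden"),
    ("ex:Leiden", "ex:locatedIn", "ex:SouthHolland"),
    ("ex:SouthHolland", "ex:partOf", "ex:Netherlands"),
]


def links_from(subject: str) -> list[tuple[str, str]]:
    """All (predicate, object) pairs stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def follow_chain(start: str, max_steps: int = 10) -> list[str]:
    """Follow the first outgoing link from each node until a dead end.

    max_steps guards against cycles: nothing in the data model itself
    limits how long such a chain can become.
    """
    path = [start]
    node = start
    for _ in range(max_steps):
        links = links_from(node)
        if not links:
            break
        predicate, node = links[0]
        path.append(f"--{predicate}--> {node}")
    return path


print("\n".join(follow_chain("ex:NightWatch")))
```

Appending one more triple about `ex:Netherlands` lengthens the chain without any change of schema, which illustrates both the appeal of RDF and the endlessness Seth van Hooland warned about.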

I enjoyed this very instructive and broad-ranging morning. Saskia Scheltjens's sense of history, Marieke van Erp's state-of-the-art lecture and Seth van Hooland's clever conceptual (and in fact also historical) presentation fitted together beautifully.

Joost van Gemert

Posted in I&M2.0

Reducing the problem? On NWO, funding applications and research assessment

Posted on 16/10/2017 by Bianca Kramer

Early this year, I wrote ‘NWO – the impact factor paradox‘, about the actual and perceived role of the impact factor in NWO grant applications, and the intentions of NWO to consider novel ideas to organize research assessment for funding distribution.

The announced national working conference and international conference on such ex ante research assessment have since taken place (I participated in the former). Last week, NWO made public which concrete measures it plans to take in this area (full report currently in Dutch only (pdf)). As stated in the report, the proposed measures have been approved by the council of university rectors, organized in the VSNU.

Below, I will go over what to me personally are the most salient aspects of the report, not only in the context of reducing the burden of applications and peer review (which is what NWO focused on), but also in light of research assessment in general and the implications for open science in particular.

REDUCING THE BURDEN OF APPLICATIONS AND PEER REVIEW
In the report as well as in the process leading up to it (including the two stakeholder conferences), the main problem to address has consistently been framed as that of an increasing burden on researchers to apply for funding, with an accompanying increase in workload for peer reviewers.

Consequently, most proposed measures aim to remedy this issue specifically, e.g. by postponing calls until enough funds are available to guarantee a minimum rate of acceptance (Fig. 1), stimulating universities to play an active role in guiding researchers’ decisions to apply for NWO funding instruments, and making decisions on tenure less contingent on the receipt of NWO funding.
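The logic of postponing a call can be sketched as a simple check in Python. This is a hypothetical illustration: the report does not give a formula here, and the function names, budget, grant size, expected number of applications and minimum rate are all invented.

```python
def expected_acceptance_rate(budget: float, grant_size: float,
                             expected_applications: int) -> float:
    """Share of the expected applications the budget could fund (whole grants only)."""
    if expected_applications <= 0:
        raise ValueError("expected_applications must be positive")
    fundable = int(budget // grant_size)
    return min(fundable, expected_applications) / expected_applications


def should_open_call(budget: float, grant_size: float,
                     expected_applications: int, minimum_rate: float) -> bool:
    """Open the call only if the expected acceptance rate reaches the minimum."""
    rate = expected_acceptance_rate(budget, grant_size, expected_applications)
    return rate >= minimum_rate


# Hypothetical call: a budget of 10 million, grants of 250,000, 300 expected applications.
rate = expected_acceptance_rate(10_000_000, 250_000, 300)
print(f"expected acceptance rate: {rate:.1%}")
print("open call now?", should_open_call(10_000_000, 250_000, 300, minimum_rate=0.25))
```

With these invented figures only 40 grants are fundable (about 13% of applications), so under a 25% minimum the sketch would advise postponing the call until more funds are available.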

Fig 1. Acceptance rate

POSSIBLY MAYBE
There are also some potential measures that NWO decided require further investigation. In two cases, NWO promises to allocate specific resources to such investigations:

SOFA (Self-organized funding allocation)

With the SOFA model, researchers would receive baseline non-competitive funding, with the stipulation that they distribute part of that funding among other researchers (see link for more details). While this model will not be implemented at this time, or experimented with on a small scale, NWO has committed to financing ‘a couple of’ PhD students to further analyze the model and investigate whether it could be fit for use (Fig. 2).
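One round of the redistribution idea described above can be sketched in Python. The baseline amount, the 50% redistribution share and who nominates whom are assumptions, and the published SOFA proposal may differ in detail, for instance by applying the redistribution repeatedly to all funds received. This sketch shows a single round only.

```python
from collections import defaultdict


def sofa_round(researchers: list[str], baseline: float, share: float,
               nominations: dict[str, dict[str, float]]) -> dict[str, float]:
    """One round of a simplified self-organized funding allocation.

    Every researcher receives the same baseline, keeps (1 - share) of it and
    passes the remaining share on to colleagues, in proportion to their own weights.
    """
    funding: dict[str, float] = defaultdict(float)
    for r in researchers:
        funding[r] += baseline * (1 - share)
        weights = nominations.get(r, {})
        if r in weights:
            raise ValueError(f"{r} may not nominate themselves")
        total = sum(weights.values())
        if total <= 0:
            raise ValueError(f"{r} must nominate at least one colleague")
        for colleague, weight in weights.items():
            funding[colleague] += baseline * share * weight / total
    return dict(funding)


# Hypothetical example: three researchers, baseline 100, half of it redistributed.
result = sofa_round(
    ["A", "B", "C"], baseline=100, share=0.5,
    nominations={"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}},
)
print(result)
```

Here A and C each end up with 75 and B, nominated by both, with 150, while the total of 300 is conserved: the budget stays the same, only its distribution follows researchers' own judgements.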

Fig 2. SOFA model

In this light, it is also interesting to note the recent investigation, published in PLOS One, of the potential effects of baseline non-competitive funding (in this case without further redistribution) in the Dutch system, and NWO’s commentary (Dutch only) challenging the assumptions and numbers used in the paper.

Preselection based on CV

Another measure that NWO plans to consider (esp. for the Innovational Research Incentives Scheme consisting of the personal Veni, Vidi and Vici grants) is to preselect applications based on CV only (Fig. 3). The expectation is that this would result in “a considerable reduction in efforts (…), with the chance that excellent proposals are missed being very small” (pdf p. 11). This assumption is based on the observation (within the domain of Social Sciences and Humanities) that “only 8% of Veni-grants are awarded to the lowest 40% in terms of CV (only 2% to the ‘worst’ [sic] 30%)”.

Choice of words aside, I see three red flags with this measure and the underlying assumption:

The percentages given might speak as much about the weight currently given to applicants’ CVs as about the content of the research proposals from these applicants. Only by blinding assessors on CVs can a true estimate be made of how many ‘excellent proposals’ would be missed by relying on CV info only.
Relying on CVs to assess eligibility for submitting a proposal would further enhance the Matthew effect the report rightly warns against in the discussion on other measures.
With this measure, the criteria on which the CV of a researcher is evaluated become of even greater importance. Will evaluation be done (explicitly or implicitly) on the basis of the length of the publication list and the perceived quality of journals and monograph publishers? Or will a more rounded approach be taken, with publications being assessed on their own value, other research output taken into account, and societal impact being considered as well?

Fig 3. Preselection based on CV

DRAWING LOTS – YES OR NO ?
One possible measure that has been discussed during the stakeholder meetings, but that NWO has decided against implementing, is that of drawing lots to decide on funding. NWO will not investigate or experiment with this approach either, but will keep an eye on current experiments in Germany (Volkswagen Stiftung – Experiment!), Denmark (Villum – Experiment*) and New Zealand (Health Research Council – Explorer grants) (Fig. 4).

Fig. 4. No plans for a lottery

Interestingly, in the first NWO stakeholder meeting, which I attended myself, a lottery draw was proposed only as a secondary selection mechanism, to decide between proposals that all score equally in the eyes of peer reviewers after the regular assessment procedure. Selection between these proposals is often perceived as arbitrary, including by the review panel, which is why a lottery was proposed as a more time-efficient and fairer option in these cases.

The proposals referenced by NWO above go much further: first, a check of proposals on basic quality criteria is done in a double-blind procedure (so on content only, not relying on CV). Subsequent selection of proposals is then done by lottery. Thus in these models, lottery is really intended to replace, not complement, selection on ‘excellence’ by peer review.
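The difference between these two uses of a lottery can be made concrete in a small Python sketch: `fund_with_tiebreak` mirrors the tie-breaking idea from the stakeholder meeting, `fund_by_lottery` the quality-check-then-lottery schemes. Proposal names, scores and the pass/fail quality check are hypothetical, and neither function is meant to reproduce the actual procedure of any of the funders mentioned.

```python
import random


def fund_with_tiebreak(scores: dict[str, float], n_awards: int,
                       rng: random.Random) -> list[str]:
    """Fund on peer-review score; draw lots only among proposals tied at the cut-off."""
    if n_awards <= 0:
        return []
    ranked = sorted(scores, key=scores.get, reverse=True)
    if n_awards >= len(ranked):
        return ranked
    cutoff = scores[ranked[n_awards - 1]]
    clear_winners = [p for p in ranked if scores[p] > cutoff]
    tied = [p for p in ranked if scores[p] == cutoff]
    # Only the places left after the clear winners are decided by lot.
    return clear_winners + rng.sample(tied, n_awards - len(clear_winners))


def fund_by_lottery(passes_quality_check: dict[str, bool], n_awards: int,
                    rng: random.Random) -> list[str]:
    """Draw lots among all proposals that pass a basic (e.g. double-blind) quality check."""
    eligible = [p for p, ok in passes_quality_check.items() if ok]
    return rng.sample(eligible, min(max(n_awards, 0), len(eligible)))


rng = random.Random(2017)  # a fixed seed makes the draw reproducible
scores = {"P1": 9, "P2": 8, "P3": 8, "P4": 8, "P5": 6}
print(fund_with_tiebreak(scores, 2, rng))  # P1, plus one of P2-P4 by lot
print(fund_by_lottery({p: s >= 7 for p, s in scores.items()}, 2, rng))
```

In the first function peer review still decides almost everything and chance only settles ties; in the second, peer review is reduced to a threshold and chance does the actual selecting. Passing an explicit, seeded `random.Random` keeps a draw reproducible, which would matter for transparency if such a lottery were used in practice.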

(*NB As far as I could ascertain, for the Villum Experiment grants, no lottery element is included; this assessment scheme only includes the double-blind peer review aspect)

Such experiments can also offer important insights into various aspects of grant assessment, such as the Matthew effect, the effect of conscious and unconscious bias of reviewers, and the difference over time in results of research selected on ‘excellence’ or on basic quality criteria only. Thus, they will contribute to more evidence-based research assessment. In the report, developing such evidence is mentioned as something NWO vows to take an active role in (pdf, p. 12) (Fig. 5).

Fig. 5. Supporting evidence-based assessment

RESEARCH ASSESSMENT: A QUESTION OF EXCELLENCE
Throughout the report, NWO stresses that while it aims to reduce the burden of applications and peer review, its goal remains to select the most ‘excellent’ proposals (e.g. see Fig. 4 above). While this reduces the problem to a manageable level, it also sidesteps the bigger question of what is good research and how best to stimulate that.

As has been argued elsewhere, e.g. in the paper “Excellence R Us” (Moore, Neylon, Eve et al., 2016), the focus on ‘excellence’ in research assessment stimulates hypercompetition and homophily (= same begets same). The resulting push to publish is not always conducive to reproducible research or to the advance of knowledge through collaboration.

This is why I think that the most interesting parts of the NWO report are proposals that will not (yet) be implemented: SOFA, double-blind peer review of proposals, allocation of at least some funding by lottery after an initial check for quality. These all have the potential to shift the current tenets of research assessment away from competing for excellence only.

Together with measures that stimulate reproducible research (funding replication studies, both requesting and rewarding pre-registrations, sharing methods and data, and open access publication of all research output), this would create a culture where sound research is rewarded and collaboration and (re)use of research findings are encouraged. This is also in line with the ambitions of NWO as a signatory to the National Plan Open Science (NPOS), and with the ambitions of the new Dutch government as outlined in the coalition agreement that was published last week (Fig. 6).

Fig 6. Quotes from Dutch government coalition agreement, published Oct 10, 2017

With the coalition agreement, NPOS and this report on funding assessment measures on the table, it will be interesting and promising to follow developments in NWO policy over the coming months.

[edited to add: Also listen to the Oct 14 episode of the Dutch investigative reporting radio program Argos where the NWO funding allocation measures are discussed]

Posted in funding, I&M2.0, nwo, research assessment

Open Science summer school

Posted on 11/10/2017 by Jeroen Bosman

by Bianca Kramer (@MsPhelps) and Jeroen Bosman (@jeroenbosman)

Open Science is the buzzword of the day. And for good reason, because under its umbrella people are working to make science and scholarship more inclusive, open, efficient and reproducible. To share and test our knowledge and ideas, Utrecht University Library offered a course on Open Science in August 2017 as part of the Utrecht University Summerschool.

Seven young researchers from around the world, and from a range of disciplines, attended a full week’s course in the middle of summer. We had turned the library’s Buchelius room into a cosy living room (yes, it can be done), with a continuous supply of stroopwafels.

The idea behind the course was for the participants to gain insight into the why and how of an array of open science practices and, ideally, also to start applying that knowledge to their research projects.

Apart from a very important introductory session (to get to know each other and to create and agree on a code of conduct) and a wrap-up session, the programme (see Fig. 1) consisted of 8 half-day sessions, one for each phase of the research workflow (one phase got a full day). Even with a full week, it was still difficult to decide what to include.

Fig. 1 The main programme schedule showing the half-day sessions

One of the premises was that participants should be allowed to be ‘selfish’ and spend their time in the way that got the most out of the course for themselves. To make that possible, we limited plenary presentations to 1 hour per session and left it up to the participants to work on one or more assignments of their choosing for the remainder of each session. Despite this approach, it was good to see many lively discussions and much group work.

The full programme, with links to all presentations, assignments and supporting material, was and still is publicly available under a CC-BY license. To bring in extra expertise and have a variety of opinions and approaches, we invited five co-lecturers: Chris Hartgerink, Stephanie Paalvast, Tessa Pronk, Jon Tennant and Rolf Zwaan.

Jon Tennant, one of the co-lecturers, in action

A special and very well received element of the course was the afternoon dedicated to outreach and advocacy. Participants had very stimulating talks about the way Open Science is practised in Utrecht with Frank Miedema (University Medical Centre Utrecht), Sanli Faez (Science Faculty) and Maud Radstake (Studium Generale).

The course as a whole was well received (all participants filled out the evaluation form), with an average score of 8 out of 10. Everyone indicated that they had learnt things that were relevant and applicable to their own research. Suggestions for a next edition included looking at the price (or offering reductions) and making the course even more personalised.

All in all it was an intensive and very enriching experience. A lot of course material was developed and is now available for reuse (and improvement). The ideas and materials are also very valuable for our own university, as it plans to stimulate and support open science practices across the board.

Posted in target groups, I&M2.0, education, Open Access, Open science | Tags: summer school, summerschool

Digital Humanities Clinic (day 1)

Posted on 05/09/2017 by Jan de Boer

Today was the first of five Digital Humanities Clinics. The aim of this series is to introduce staff of libraries and related institutions to Digital Humanities.

In the programme, with help from the researchers themselves, we raise the participants' level of knowledge. By learning together how researchers use digital collections, the library becomes a discussion partner for researchers who are not yet very digitally skilled but would like to be. This programme won't turn you into a programmer or data cruncher, but it does give you enough knowledge to advise on and support new DH research.

The morning programme always consists of lectures, the afternoon programme of a workshop.

Digitisation

Marco de Niet, director of DEN Kennisinstituut Digitale Cultuur, kicked off with a lecture on digitisation. His presentation focused mainly on the policy aspects of digitisation, not the practical ones. Digitising is more than just scanning: it is about the information, the context, the accessibility.
He named “pride in one's own profession” as a major enemy of digitisation projects. Libraries, archives, museums etc. digitise too much from the perspective of their own expertise; it is good to look beyond the boundaries of your sector and to collaborate. A historical atlas, for example, will be digitised very differently by a library than by a museum.

His presentation also contained some sobering figures (soon to be published on the DEN website).

74% of heritage institutions' collections are not described online
45% of digitised collections are not available online (digitised for internal use, or available only within the institution because of copyright)
51% of the metadata is not available in the public domain

A final observation I found interesting: “the user at the centre?”, so with a question mark. There is a fundamental difference between commercial companies that follow their users completely (what does the customer want? We build a product for that) and heritage institutions that have a responsibility for their materials (we have an object; how do we involve the public with it?).

Databases

Marnix van Berchum, external PhD candidate in musicology at Utrecht University (UU), gave a spirited presentation on databases and, in passing, tried to brush up our knowledge of early music. I now know the difference between a document database, a relational database and a graph database, but I won't try to reproduce that here. Whichever type of database you choose as a researcher, what matters most is that the information is well structured and documented. That is what makes the information in a database sustainable and reusable. Marnix then illustrated the relational database with his work for the Computerized Mensural Music Editing (CMME) project.

SQL

This tied in nicely with the afternoon programme. Janneke van der Zwaan (eScience Engineer at the Netherlands eScience Center) gave us an introduction to SQL, a language for querying and modifying relational databases. Thanks to Janneke (and her charming assistant Ben) I understand what it says here; better still, I wrote it myself:

-- per journal (issns) and month: average citation count and number of articles
SELECT issns, month, AVG(citation_count), COUNT(*)
FROM articles
GROUP BY month, issns
ORDER BY issns;

But, to put this in perspective, this is roughly the equivalent of ordering a baguette on holiday in France. For example, we queried only a single table and did not combine data from different tables, and that is precisely the essence of a relational database.
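To show what combining tables looks like, here is a small sketch using Python's built-in sqlite3 module. The journals table, its title column and all data values (including the ISSNs) are invented for illustration; only the articles columns come from the workshop query.

```python
import sqlite3

# In-memory database with two hypothetical tables linked by the issns column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journals (issns TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, issns TEXT, month INTEGER,
                           citation_count INTEGER,
                           FOREIGN KEY (issns) REFERENCES journals (issns));
    INSERT INTO journals VALUES ('1234-5678', 'Journal A'), ('8765-4321', 'Journal B');
    INSERT INTO articles (issns, month, citation_count) VALUES
        ('1234-5678', 1, 4), ('1234-5678', 1, 6), ('8765-4321', 2, 3);
""")

# The single-table query from the workshop.
single_table = conn.execute("""
    SELECT issns, month, AVG(citation_count), COUNT(*)
    FROM articles
    GROUP BY month, issns
    ORDER BY issns
""").fetchall()

# The same aggregation, joined with the journals table to show titles instead of ISSNs.
joined = conn.execute("""
    SELECT j.title, a.month, AVG(a.citation_count), COUNT(*)
    FROM articles AS a
    JOIN journals AS j ON j.issns = a.issns
    GROUP BY j.title, a.month
    ORDER BY j.title
""").fetchall()

print(single_table)
print(joined)
conn.close()
```

The journal titles are stored only once, in their own table, and the JOIN brings them together with the articles at query time; that separation is what makes a relational database relational.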

Next time: lectures on computational thinking and on tool and data criticism. In the afternoon, a workshop: Python in three hours.

Posted in digital humanities, I&M2.0