Shared, reproducible research

Eetu Mäkelä, D.Sc.
Assistant Professor in Digital Humanities / University of Helsinki
Docent (Adjunct Professor) in Computer Science / Aalto University

Open science
• Open data
• Open research process
• Open research results
• Open access to publications
• Open source
• Open peer review
• Collaborative research
• Citizen participation in research
• …

Open, reproducible research
• Open data
• Open research process
• Open research results
• Open access to publications
• Open source
• Open peer review
• Collaborative research
• Citizen participation in research
• …

Reproducibility of research is important:
• 2011-2015, Reproducibility Project: Psychology: only 35 out of 97 landmark studies were reproducible
• Over half of publications in psychology contain simple statistical errors
• Only 9 out of 53 landmark cancer studies were reproducible

Digital humanities research process

[Diagram: raw data; cleaning up data (80% of work); understanding data; exploratory tools; results; research articles]

80% of your time goes to understanding and cleaning up data, because data is always complicated and messy, and trusting it blindly cannot yield trustworthy science

Leverage collaboration and open science workflows to reduce individual workload

[Diagram: raw data; cleaning up data (80% of work); exploratory tools; understanding data; results; research articles. Annotation on the data cleaning, tools and understanding steps: collaborate, share these, speed up research for everyone + reproducibility]

Sharing data, code and publications: GitHub
• GitHub is a great collaboration platform for open research that is based on code
• Supports e.g. revision tracking, merging conflicting edits, tagging release versions, issue tracking, …
• Integrates with Zenodo, a service hosted at CERN for sharing and long-term archiving of all aspects of research
• Basically, you can upload anything there, and it gives you a resolvable DOI (Digital Object Identifier) for it
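What "resolvable" means in practice: a DOI becomes a stable link when it is appended to the https://doi.org/ resolver. The sketch below is only an illustration of that step. The function name `doi_to_url` is made up for this example, and the DOI shown is a placeholder with the Zenodo prefix 10.5281, not a real record.

```python
from urllib.parse import quote

# Public DOI resolver; any DOI appended to it redirects to the archived item.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL from a bare DOI or one written as 'doi:...'.

    Hypothetical helper for illustration; it does not contact the network
    and does not check that the DOI actually exists.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep the '/' between prefix and suffix, percent-encode unsafe characters.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI (assumption), not an actual Zenodo record.
    print(doi_to_url("doi:10.5281/zenodo.0000000"))
```

Citing the resolver URL rather than the repository address means the reference keeps working even if the GitHub repository is later moved or renamed.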

Design for Digital Publishing of Data Driven Research

Examples of open, shared, reproducible research
• Polymath
• NMRLipids
• rOpenSci
• COMHIS
  ‒ Bibliographica, ESTC, Fennica

Commensurate numismatics
• Collaboratively created unified terminology
• Place identifiers from Pleiades
• Open source code for publication platform
• Instances:
  ‒ Coin hoards of the Roman Republic Online
  ‒ Coinage of the Roman Republic Online
  ‒ Online Coins of the Roman Empire

Reproducible research is hard
• GitHub for sharing and Zenodo for long-term archiving, but how to make sure someone in the future can reproduce your results?
• Documentation!
• Literate programming = the publication and the code that led to it in the same place
  ‒ Jupyter notebooks
  ‒ R Reproducible Research package
• Versioning of data and code (git)
• Management and versioning of external dependencies
  ‒ Packrat for R
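One way to make the versioning points above concrete is to store, next to the results, a small record of exactly which data and which package versions produced them. The sketch below is a Python analogue of that idea and not Packrat itself (Packrat is an R tool). The function names and the `sample.csv` file are assumptions made for this example.

```python
import hashlib
import json
import platform
import sys
import tempfile
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installed_packages() -> dict:
    """Map each installed distribution name to its version, sorted by name."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return dict(sorted(packages.items()))


def provenance_record(data_files) -> dict:
    """Collect interpreter, platform, package versions and data checksums."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": installed_packages(),
        "data_sha256": {str(p): sha256_of(Path(p)) for p in data_files},
    }


if __name__ == "__main__":
    # Demo on a throwaway file; sample.csv is a hypothetical data set.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        sample.write_text("id,value\n1,42\n", encoding="utf-8")
        print(json.dumps(provenance_record([sample]), indent=2))
```

Committing such a record to git together with the code gives a later reader something to compare against: if the checksums or versions differ, a mismatch in results is at least explainable.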

Insuring your reproducible research against the future is even harder
• Format/software obsolescence
• Hardware obsolescence

http://j.mp/s-makela

http://presemo.helsinki.fi/meth4dh
