Facilitating Reproducibility

Many scientists now treat the reproducibility of their research as a much higher priority than they once did. But it's a time-consuming task, so many are looking for tools and workflows to support their efforts.

Hatef Monajemi, a PhD student in Civil and Environmental Engineering, and his PhD advisor, Professor David L. Donoho, have developed new software that can make reproducibility an easier goal to achieve. The software, called Clusterjob (CJ), can be used to build reproducible computational packages and to make the generation of data for a research study fully reproducible. CJ is open-source and available on GitHub.

While the full software has not yet been published, an article demonstrating its power ("Incoherence of Partial-Component Sampling in Multidimensional NMR") is currently in review for publication in the Journal of Magnetic Resonance. The study examines new approaches to assessing artifacts caused by partial-component random undersampling in NMR spectroscopy experiments, and its data can be fully reproduced using Clusterjob.

Monajemi explains: "Think of CJ as an agent that you would install on your machine and it takes care of and tracks your computations in a hassle-free and reproducible way. CJ asks for computational code of your experiments and its dependencies, it then generates a reproducible computational 'package', gives you a PID (for Package ID), finds a cluster to run your experiments and gives you back the results of your experiments when inquired. CJ can also speed up your computations by automatically distributing the task amongst many cores of a computational cluster."
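The article does not describe how CJ assigns a PID, so the following is only a sketch of one common way a tool could give a package a stable identifier: hash the experiment code and its dependencies, so identical inputs always yield the same ID and any edit yields a new one. The function names `package_id` and `package_id_from_dir` and the choice of SHA-1 are assumptions made for illustration; this is not CJ's actual implementation.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def package_id(files: dict[str, bytes]) -> str:
    """Return a hypothetical content-derived package ID (not CJ's real scheme).

    `files` maps relative paths (main script plus dependencies) to their
    contents. Paths are sorted so the ID does not depend on input order.
    """
    digest = hashlib.sha1()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        # Length-prefix both the path and the content so that renaming,
        # moving, or editing any file changes the resulting ID.
        digest.update(len(name).to_bytes(8, "big"))
        digest.update(name)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def package_id_from_dir(root: str) -> str:
    """Hash every file under `root`, e.g. a hypothetical experiment folder."""
    base = Path(root)
    files = {
        p.relative_to(base).as_posix(): p.read_bytes()
        for p in base.rglob("*")
        if p.is_file()
    }
    return package_id(files)
```

For example, `package_id({"main.m": b"x=1;", "lib/f.m": b"y=2;"})` returns the same 40-character hex string regardless of dictionary order, while changing a single byte of `main.m` produces a different ID. Hashing content rather than, say, a timestamp is what would let someone later confirm they are rerunning exactly the same package.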

A project generated by CJ can be downloaded from the Stanford Digital Repository. "Any person downloading that package should be able to fully reproduce the computations I have done, provided they have access to MATLAB," says Monajemi.

Not only does CJ make it easier to run entire sets of computations, it also prevents researchers from altering the results of those computations. This guards against problems such as p-value hacking, further improving the reproducibility of the research.

We look forward to hearing more about how others are using Clusterjob to improve the reproducibility of their research, and we congratulate Hatef on this excellent contribution.

This Data Story was written by Amy Hodge.

Find out more about the data featured in this Data Story.