OpenBioML is a decentralized research community dedicated to enabling high-impact research at the intersection of machine learning and biology. It is characterized by a "fully in the open" approach to research and a commitment to permissively licensed machine learning models, datasets, and research-related artifacts. By treating these resources as public goods and fostering a collaborative research environment, OpenBioML aims to increase structural diversity in the biology research landscape and to address some of the market failures that characterize science, with the overall goal of accelerating progress in the field.
Inspired by the success of EleutherAI, OpenBioML conducts research in the open, so that anyone, anywhere, can observe and contribute at every stage of our projects, and everyone is encouraged to do so. With this approach, we aim to prevent the formation of knowledge monopolies and to enable talented individuals with different backgrounds from all over the world to contribute.
OpenBioML community projects should be defined by large-scale collaboration. The field already has venues for competition, such as CASP for protein structure prediction, and we believe it will benefit from a setting dedicated to collaborative efforts. We hope this will help reduce duplicated work in the field and allow more researchers to focus on the most creative aspects of the research process itself.
Cutting-edge research requires large amounts of resources, including both compute and storage, to iterate rapidly on new ideas through experiments and to scale up early-stage successes. For this reason, we provide community-sponsored projects with vast amounts of resources that are made available to us through our partners.
We believe that machine learning's positive impact on biology can only be realized if the relevant research is available for all researchers to experiment with and build upon. Therefore, it is a core requirement of OpenBioML projects to release all project-associated artifacts, e.g., datasets, experiment logs, source code, and trained model checkpoints, under a permissive license such as Creative Commons Attribution (CC-BY), MIT, or Apache.
For research to function properly and for resources to be allocated on a reasonable basis, individual contributions must be credited accurately. We believe author order alone is insufficient for this, and we therefore require publications derived from community projects to include a section outlining all individual contributions.
Released project-associated artifacts must enable other researchers to reproduce project results. OpenBioML projects should therefore aim to make it possible to set up all dependencies with a single command, allow all datasets and model weights to be downloaded from a reliable file host, and make the source code available in a public project repository. Key analyses should be recorded in a digital laboratory journal together with experiment logs, and the analyses themselves should be deterministic so that their results can be reproduced exactly.
Currently, Stability AI is OpenBioML's only partner. Stability AI supports the community both by funding its organizers and by making compute, storage, and high-performance computing expertise available to its projects. Given that OpenBioML's research benefits the wider landscape of machine learning in biology and that its output can be used for commercial purposes, we hope that more organizations will support OpenBioML in the future. Please reach out to us if you are interested in supporting our community.
OpenBioML's partners support our efforts driven by the idea that open-source machine learning will play an increasingly important role in biotechnological research. Furthermore, everyone, including our partners, can freely leverage the community's research output for both non-commercial and commercial applications.