Enabling Open-Source, Collaborative Research at the Intersection of Machine Learning and Biology.

Research Philosophy

OpenBioML is a decentralized research community dedicated to enabling high-impact research at the intersection of machine learning and biology. It is characterized by a "fully in the open" approach to research and a commitment to permissively licensed machine learning models, datasets, and other research artifacts. By treating these resources as public goods and fostering a collaborative research environment, OpenBioML aims to increase structural diversity in the biology research landscape, address some of the market failures that characterize scientific research, and thereby accelerate progress in the field.

Conducting research in the open

Inspired by the success of EleutherAI, OpenBioML conducts research in the open, so that anyone can observe, and is encouraged to contribute to, every stage of our projects. With this approach, we aim to prevent the formation of knowledge monopolies and to enable talented individuals of diverse backgrounds from all over the world to contribute.

Enabling ambitious research through large-scale community resources

Cutting-edge research requires substantial resources, including compute and storage, to iterate rapidly on new ideas through experiments and to scale up early-stage successes. For this reason, we provide community-sponsored projects with the extensive resources that our partners make available to us.

Emphasizing collaboration over competition

Our community projects should be characterized by large-scale collaboration. The field already has venues for competition, such as CASP (Critical Assessment of protein Structure Prediction), but it would benefit from a setting dedicated to collaborative efforts. Collaboration also reduces duplicated work in the field and allows more researchers to focus on the most creative aspects of the research process.

Research Requirements

Releasing all project-associated artifacts under permissive licenses

Machine learning's positive impact on biology can only be realized if the relevant research is available for all researchers to experiment with and build upon. Therefore, OpenBioML projects are required to release all project-associated artifacts (e.g., datasets, experiment logs, source code, and trained model checkpoints) under a permissive license such as Creative Commons Attribution (CC BY), MIT, or Apache.

Including a section dedicated to individual contributors for credit assignment

For research to function well and for resources to be allocated sensibly, credit must be assigned properly. We believe that author order alone is insufficient, so we require publications derived from community projects to include a section outlining each individual's contributions.

Ensuring experiment reproducibility

Released project-associated artifacts must enable other researchers to reproduce project results. OpenBioML projects should therefore aim to make it possible to set up all dependencies with a single command, to download all datasets and model weights from a reliable file host, and to access the source code in a public project repository. In addition, key analyses should be recorded in a digital laboratory notebook and in experiment logs, and experiments should be deterministic wherever possible, so that rerunning them yields the same results.

Partnerships

Current partners and their relationship with OpenBioML

Stability AI is currently OpenBioML's only partner. Stability AI supports the community both by supporting its organizers and by providing compute, storage, and high-performance computing expertise for its projects.

Given that OpenBioML's research benefits the broader landscape of machine learning in biology, we hope more organizations will support it. Please reach out to us if you are interested in supporting our community.

Why OpenBioML's partners make their resources available

Our partners support our efforts because they believe that open-source machine learning will play an increasingly important role in biotechnological research. Furthermore, everyone, including our partners, can freely use the community's research output for both non-commercial and commercial applications.