AIRBDS AI-Readiness Dataset Scoring Metric
A practical guide to evaluating bioscience datasets for AI/ML use
Developed by the AIRBDS Working Group, AIBIO-UK · Funded by BBSRC · Licensed CC BY-SA 4.0
Overview
Questions this tutorial answers:
- What does the AIRBDS metric measure, and why does it matter?
- Should I use the CSV (spreadsheet) or YAML (text file) format?
- How do I work through all 28 questions and calculate a grade?
- How do I submit my completed review?
Learning Objectives
By the end of this tutorial, you will be able to:
- Describe what the AIRBDS metric assesses and how it is scored
- Choose the format (CSV or YAML) that suits your technical background
- Complete a full dataset review and assign a grade (Caution / Bronze / Silver / Gold)
- Submit a completed review to the repository
Who this is for: Researchers, data curators, and repository managers working with bioscience datasets
Skill levels covered: Beginner (CSV path) · Intermediate (YAML path)
Estimated time: 30–60 minutes per dataset review
What is the AIRBDS Metric?
The AIRBDS metric is a structured checklist of 28 Yes/No questions that assesses whether a bioscience dataset is ready for use in AI and machine learning workflows. Questions cover four areas:
| Scope | Questions | What it checks |
|---|---|---|
| Infrastructure | ACM-1 – ACM-10 | Access, licensing, unique identifiers, version control |
| Metadata | ACM-11 – ACM-17 | Bias documentation, standards, preprocessing, provenance |
| Content | ACM-18 – ACM-23 | Completeness, consistency, format |
| Ethics | ACM-24 – ACM-28 | Consent, privacy, security, data protection |
Each answer contributes to a weighted score. Datasets receive one of four grades:
| Grade | Meaning |
|---|---|
| 🔴 Caution | Fails one or more Critical criteria — serious limitations for AI/ML use |
| 🟤 Bronze | Passes most Critical questions (≥ 7/8) |
| ⚪ Silver | Passes all Critical + ≥ 50% of Important questions |
| 🟡 Gold | Passes all Critical and Important + ≥ 50% of Optional questions |
How to use this tutorial
Start here → Chapter 1: Getting Started — Choose Your Format
Chapter 1 explains the difference between the CSV and YAML formats and helps you choose the right one based on your experience level. You will then follow either:
- Chapter 2 — CSV walkthrough (beginner, no coding required, Excel or Google Sheets)
- Chapter 3 — YAML walkthrough (intermediate, text editor and command line)
Citation
If you use this metric in your research, please cite:
AIRBDS Working Group, AIBIO-UK. (2025). AIRBDS AI-Readiness Dataset Scoring Metric (v0.3). GitHub. https://github.com/AIBIO-UK/airbds-metric
Full citation metadata is in CITATION.cff.
Template attribution
This tutorial site is built using the ELIXIR Training Lesson Template by van Geest G, Kronander E, Romero Herrera JA, Žlender N, ELIXIR Training Coordination Team & Cardona A (2023). DOI: 10.5281/zenodo.7913092 · CC BY-SA 4.0. Content has been replaced with the AIRBDS tutorial and AIBIO-UK branding applied. This tutorial is therefore also licensed CC BY-SA 4.0.