YAML Walkthrough
Intermediate · Requires a text editor and basic command-line familiarity
Overview
Questions:
- How do I clone the repository and copy the template?
- How do I fill in reviewer details and answer all 28 questions in YAML?
- How do I validate the file and calculate a grade?
- How do I submit via pull request?
Learning Objectives
By the end of this chapter, you will be able to:
- Clone the repository and copy the YAML review template
- Fill in all reviewer metadata and 28 question answers
- Correctly handle the Ethics questions with the not_applicable field
- Validate the YAML file and resolve common errors
- Calculate a grade and submit via a pull request
Time: 30–60 minutes per dataset review (depending on how well you know the dataset)
Prerequisite: Complete Chapter 1 to confirm YAML is the right format for you.
Prerequisites
Before you begin, make sure you have:
- Git installed (run git --version to check)
- A plain-text editor (VS Code, Sublime Text, nano, vim, or similar)
- Python 3 (run python3 --version to check; used for validation in Step 6)
Step 1 — Get the repository
If you plan to submit your review, fork the repository on GitHub first, then clone your fork:
git clone https://github.com/<your-username>/airbds-metric.git
cd airbds-metric
If you just want to explore locally without submitting, clone the main repo directly:
git clone https://github.com/AIBIO-UK/airbds-metric.git
cd airbds-metric
Step 2 — Copy the template
Create your review file in the reviews/ folder using this naming convention:
<dataset_accession>_<your_initials>_<review_number>.yaml
cp metric/review_template.yaml reviews/E-MTAB-1234_CH_1.yaml
Replace E-MTAB-1234 with the dataset’s accession number and CH with your initials. Use a short descriptive name if there is no accession number.
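The naming convention can also be generated programmatically. The following is a minimal sketch: the function name review_filename and its sanity checks are illustrative assumptions, not part of the repository.

```python
def review_filename(accession: str, initials: str, review_number: int) -> str:
    """Build <dataset_accession>_<your_initials>_<review_number>.yaml."""
    # Hypothetical sanity checks; the repository may apply different rules.
    for part in (accession, initials):
        if not part or any(ch.isspace() or ch in "/\\" for ch in part):
            raise ValueError(f"unsuitable filename part: {part!r}")
    if review_number < 1:
        raise ValueError("review_number should be 1 or greater")
    return f"{accession}_{initials}_{review_number}.yaml"


print(review_filename("E-MTAB-1234", "CH", 1))  # E-MTAB-1234_CH_1.yaml
```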
Open the new file in your text editor.
Reference file
Keep metric/airbds_metric_v0.3.yaml open in another tab — it contains the full question text and guidance for each ACM ID.
Step 3 — Fill in reviewer and dataset metadata
The top of the file has two blocks. Fill in every field:
reviewer:
  name: "Charlie Harrison"
  initials: "CH"
  orcid: "0000-0001-2345-6789" # leave as "" if you don't have an ORCID
  affiliation: "Aberystwyth University"
  review_date: "2025-06-01" # YYYY-MM-DD format
dataset:
  name: "My Dataset Name"
  url: "https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1234/"
  hosting_resource: "ArrayExpress"
  accession: "E-MTAB-1234"
  comments: ""
  process_comments: ""
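Two of the metadata fields have a fixed shape: the ORCID (four hyphen-separated groups of four characters, where the last character may be X) and the YYYY-MM-DD review date. The sketch below shows one way to check them before committing. The function check_metadata is a hypothetical helper, not something the repository provides.

```python
import re
from datetime import datetime

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_metadata(orcid, review_date) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    # An empty string is allowed for reviewers without an ORCID.
    if orcid != "" and not (isinstance(orcid, str) and ORCID_PATTERN.fullmatch(orcid)):
        problems.append("orcid should look like 0000-0001-2345-6789 or be empty")
    # An unquoted date may be parsed as a date object, so insist on a string.
    if not (isinstance(review_date, str) and DATE_PATTERN.fullmatch(review_date)):
        problems.append("review_date should be a quoted YYYY-MM-DD string")
    else:
        try:
            datetime.strptime(review_date, "%Y-%m-%d")
        except ValueError:
            problems.append("review_date is not a real calendar date")
    return problems


print(check_metadata("0000-0001-2345-6789", "2025-06-01"))  # []
```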
Step 4 — Answer the 28 questions
Scroll to the answers: block. For each question, set answer to "Yes" or "No":
answers:
  ACM-1:
    answer: "Yes"
    comments: "Dataset is fully downloadable via FTP."
  ACM-2:
    answer: "Yes"
    comments: ""
  ACM-3:
    answer: "No"
    comments: "No authentication required; dataset is fully public."
Formatting rules
- answer must be exactly "Yes" or "No" (case-sensitive and quoted).
- comments is optional; leave it as "" if you have nothing to add.
- Do not delete any question block, even if the answer is "No".
- YAML is indentation-sensitive; do not mix tabs and spaces.
Work through all 28 question blocks (ACM-1 through ACM-28). The full question text and guidance for each ID is in metric/airbds_metric_v0.3.yaml.
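The formatting rules can be checked mechanically once the file has been parsed into a dictionary (for example, the answers value returned by yaml.safe_load). The sketch below is illustrative: check_answers is a hypothetical helper, and it assumes the IDs run from ACM-1 to ACM-28 as described above. An unquoted Yes is read as a boolean by parsers that follow YAML 1.1, such as PyYAML, so the check requires a string.

```python
EXPECTED_IDS = [f"ACM-{n}" for n in range(1, 29)]


def check_answers(answers: dict) -> list:
    """Return a list of problems found in the parsed answers block."""
    problems = []
    for qid in EXPECTED_IDS:
        block = answers.get(qid)
        if not isinstance(block, dict):
            problems.append(f"{qid}: question block is missing")
            continue
        value = block.get("answer")
        # Reject booleans (unquoted Yes/No) and any other spelling.
        if not (isinstance(value, str) and value in ("Yes", "No")):
            problems.append(f'{qid}: answer should be "Yes" or "No", got {value!r}')
    return problems


# Demonstration with an in-memory dictionary rather than a real review file.
demo = {qid: {"answer": "Yes", "comments": ""} for qid in EXPECTED_IDS}
demo["ACM-2"]["answer"] = True  # what an unquoted Yes typically becomes
del demo["ACM-7"]               # a deleted question block
for problem in check_answers(demo):
    print(problem)
```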
Weight matters
Eight questions are marked Critical (80 pts each): ACM-4, ACM-5, ACM-9, ACM-12, ACM-13, ACM-17, ACM-20, and ACM-24. Read the guidance carefully for these — failing one has a large impact on the grade.
Step 5 — Handle Ethics questions (ACM-24 to ACM-28)
These five questions apply only to datasets containing human or animal subject data.
Does your dataset contain human or animal subjects?
If no: set not_applicable: true and answer: "Yes" for each of ACM-24 to ACM-28:
  ACM-24:
    answer: "Yes"
    not_applicable: true
    comments: "No human or animal subject data in this dataset."
  ACM-25:
    answer: "Yes"
    not_applicable: true
    comments: ""
  ACM-26:
    answer: "Yes"
    not_applicable: true
    comments: ""
  ACM-27:
    answer: "Yes"
    not_applicable: true
    comments: ""
  ACM-28:
    answer: "Yes"
    not_applicable: true
    comments: ""
If yes: answer each of ACM-24 to ACM-28 normally ("Yes" or "No") and leave not_applicable: false.
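The Ethics rule lends itself to a consistency check on the parsed answers. The following sketch assumes the answers block is available as a dictionary and treats a missing not_applicable key as false. Both the helper name check_ethics and that default are assumptions made for illustration.

```python
ETHICS_IDS = [f"ACM-{n}" for n in range(24, 29)]


def check_ethics(answers: dict, has_subjects: bool) -> list:
    """Check ACM-24 to ACM-28 against the not_applicable convention."""
    problems = []
    for qid in ETHICS_IDS:
        block = answers.get(qid) or {}
        flagged = block.get("not_applicable", False)  # assumed default
        if has_subjects:
            if flagged is not False:
                problems.append(f"{qid}: not_applicable should be false")
        else:
            if flagged is not True:
                problems.append(f"{qid}: set not_applicable: true")
            if block.get("answer") != "Yes":
                problems.append(f'{qid}: set answer: "Yes" with not_applicable: true')
    return problems


# A dataset without human or animal subjects, filled in as described above.
no_subjects = {
    qid: {"answer": "Yes", "not_applicable": True, "comments": ""}
    for qid in ETHICS_IDS
}
print(check_ethics(no_subjects, has_subjects=False))  # []
```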
Step 6 — Validate your YAML
Before calculating the score, check that your file is syntactically valid:
python3 -c "import yaml; yaml.safe_load(open('reviews/E-MTAB-1234_CH_1.yaml')); print('YAML is valid')"
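If you see YAML is valid, proceed. If the file cannot be parsed, the error message shows the line and column of the problem. If you see ModuleNotFoundError: No module named 'yaml', install PyYAML first (for example with python3 -m pip install pyyaml) and run the command again.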
Common mistakes:
| Symptom | Likely cause |
|---|---|
| could not find expected ':' | Missing colon after a key |
| found character '\t' | Tab used instead of spaces for indentation |
| expected a block sequence entry | answer value not quoted: write "Yes", not Yes |
| mapping values are not allowed here | Colon inside an unquoted string; quote the value |
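Tab indentation is hard to spot by eye. As a rough aid, the sketch below lists the line numbers whose leading whitespace contains a tab character. It uses only the standard library, and find_tab_indentation is a hypothetical helper name.

```python
def find_tab_indentation(text: str) -> list:
    """Return 1-based line numbers whose indentation contains a tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-space, non-tab character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append(number)
    return hits


sample = 'answers:\n  ACM-1:\n\tanswer: "Yes"\n'
print(find_tab_indentation(sample))  # [3]
```

To check a real file, pass its contents, for example the result of pathlib.Path("reviews/E-MTAB-1234_CH_1.yaml").read_text().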
Step 7 — Calculate the score
Use these weight values:
| Weight tier | Points | Questions |
|---|---|---|
| Critical | 80 | ACM-4, 5, 9, 12, 13, 17, 20, 24 (8 total) |
| Important | 5 | ACM-1, 2, 6, 11, 16, 18, 19, 21, 25, 26, 27 (11 total) |
| Optional | 2 | ACM-3, 7, 8, 10, 14, 15, 22, 23, 28 (9 total) |
Compute pass rates (proportion of questions answered “Yes” per tier):
Critical pass rate = (Critical "Yes" count) / 8
Important pass rate = (Important "Yes" count) / 11
Optional pass rate = (Optional "Yes" count) / 9
The full scoring reference is in metric/scoring_schema.yaml.
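The pass-rate arithmetic above can be sketched in a few lines of Python. The tier membership and point values are copied from the table. The weighted score is computed here as the sum of points for every "Yes" answer, which is an assumption: metric/scoring_schema.yaml is the authoritative definition.

```python
# Points and question numbers per tier, taken from the weight table.
TIERS = {
    "Critical": (80, [4, 5, 9, 12, 13, 17, 20, 24]),
    "Important": (5, [1, 2, 6, 11, 16, 18, 19, 21, 25, 26, 27]),
    "Optional": (2, [3, 7, 8, 10, 14, 15, 22, 23, 28]),
}


def score(answers: dict):
    """Return (pass rates per tier, weighted score) for a parsed answers block."""
    rates = {}
    total = 0
    for tier, (points, numbers) in TIERS.items():
        yes_count = sum(
            1 for n in numbers if answers[f"ACM-{n}"]["answer"] == "Yes"
        )
        rates[tier] = yes_count / len(numbers)
        total += yes_count * points  # assumed definition of the weighted score
    return rates, total


# Demonstration: every question answered "Yes" except the Optional ACM-3.
demo = {f"ACM-{n}": {"answer": "Yes"} for n in range(1, 29)}
demo["ACM-3"]["answer"] = "No"
rates, total = score(demo)
print(rates["Critical"], rates["Important"], total)  # 1.0 1.0 711
```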
Step 8 — Determine the grade
| Grade | Critical pass rate | Important pass rate | Optional pass rate |
|---|---|---|---|
| 🔴 Caution | < 0.875 (< 7/8) | any | any |
| 🟤 Bronze | ≥ 0.875 (≥ 7/8) | any | any |
| ⚪ Silver | = 1.0 (all 8) | ≥ 0.5 | any |
| 🟡 Gold | = 1.0 (all 8) | = 1.0 (all 11) | ≥ 0.5 |
Record the score and grade in the result: block at the bottom of your YAML file:
result:
  weighted_score: 595
  grade: "Silver"
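The grade table can be expressed as a small function. This sketch assumes that a review receives the highest grade whose row it satisfies, so the checks run from Gold downwards. The function name grade is illustrative.

```python
def grade(critical: float, important: float, optional: float) -> str:
    """Map the three pass rates to a grade using the table thresholds."""
    if critical == 1.0 and important == 1.0 and optional >= 0.5:
        return "Gold"
    if critical == 1.0 and important >= 0.5:
        return "Silver"
    if critical >= 0.875:  # at least 7 of the 8 Critical questions
        return "Bronze"
    return "Caution"


print(grade(8 / 8, 11 / 11, 5 / 9))  # Gold
print(grade(8 / 8, 6 / 11, 0 / 9))   # Silver
print(grade(7 / 8, 11 / 11, 9 / 9))  # Bronze
print(grade(6 / 8, 11 / 11, 9 / 9))  # Caution
```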
Step 9 — Submit via pull request
# Create a feature branch
git checkout -b review/add-E-MTAB-1234
# Stage your file
git add reviews/E-MTAB-1234_CH_1.yaml
# Commit
git commit -m "review: add review for E-MTAB-1234 (CH)"
# Push to your fork
git push -u origin review/add-E-MTAB-1234
Then open a pull request on GitHub against the main branch of AIBIO-UK/airbds-metric.
Commit message convention
| Prefix | Use for |
|---|---|
| review: | Adding or updating a dataset review |
| metric: | Changes to the metric YAML |
| docs: | Documentation updates |
| fix: | Typos, broken links, formatting |
For full contribution guidelines see CONTRIBUTING.md.
Well done — you have completed a dataset review using the YAML format! 🎉
Continue to Further Resources or return to the About page.