CONTEXT LAYER

QTL data, normalized into one queryable system.

QTL evidence is spread across hundreds of studies and supplementary tables, each with its own schema, conventions, and statistical assumptions.

Devano continuously extracts, normalizes, and links this fragmented landscape into a provenance-aware system for cross-study analysis and AI-native workflows.

249 studies · 1,566 structured tables · 23 QTL types · 22 populations · 60+ tissues · 97.6% QC pass

Snapshot from corpus_overview · April 30, 2026 · numbers refresh as the index rebuilds.

01 · The capability category

Not another QTL atlas. A context layer across the ecosystem.

Most QTL resources are distributed as papers, supplementary tables, or project-specific releases. Valuable scientific resources — but fundamentally snapshots: fixed schemas, fixed assumptions, fixed release cycles.

The QTL Index is different. It is a continuously updated system that extracts, normalizes, ontology-maps, and links QTL evidence into infrastructure that computational and AI workflows can directly operate on.

CONTINUOUS INGESTION
Papers enter the index as they're published.

No fixed release cadence. New QTL studies are extracted, normalized, and QC-gated on a rolling basis — the index is a live view of the field, not a frozen snapshot.

CROSS-STUDY NORMALIZATION
Heterogeneous tables, one queryable shape.

Units, schemas, and column conventions across 1,566 extracted tables reduce to a small set of typed shapes — sentinel hits, coloc, MR, fine-mapping, conditional, trans — comparable across papers.

ONTOLOGY MAPPING
Tissues, cell types, diseases as first-class IDs.

Free-text labels resolve to BTO / CL / MONDO so a query like tissue=CSF or disease=T2D reaches every study mapping that biology, regardless of how the authors phrased it.
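A minimal sketch of how free-text labels could resolve to shared IDs, so that differently phrased labels land on the same filter. The synonym table, the helper names (resolve_label, studies_matching), the study keys, and the placeholder IDs are all hypothetical; they are not Devano's mapping and not real BTO or MONDO accessions.

```python
from typing import Optional

# Hypothetical synonym table: normalized free-text label -> ontology ID.
# The IDs are placeholders, not real BTO / MONDO accessions.
SYNONYMS = {
    "csf": "BTO:PLACEHOLDER_CSF",
    "cerebrospinal fluid": "BTO:PLACEHOLDER_CSF",
    "t2d": "MONDO:PLACEHOLDER_T2D",
    "type 2 diabetes": "MONDO:PLACEHOLDER_T2D",
    "type ii diabetes mellitus": "MONDO:PLACEHOLDER_T2D",
}


def resolve_label(label: str) -> Optional[str]:
    """Normalize case and whitespace, then look the label up."""
    key = " ".join(label.lower().split())
    return SYNONYMS.get(key)


def studies_matching(studies: dict, query: str) -> list:
    """Return study keys whose author-supplied label maps to the query's ID."""
    target = resolve_label(query)
    if target is None:
        return []
    return sorted(
        key for key, label in studies.items() if resolve_label(label) == target
    )


# Illustrative study labels, phrased differently by their (imaginary) authors.
studies = {
    "study_a": "Cerebrospinal  fluid",
    "study_b": "CSF",
    "study_c": "Type 2 diabetes",
}
matches = studies_matching(studies, "csf")
print(matches)
```

The point of the design is that the filter compares IDs, never strings, so adding a new synonym changes what a query reaches without touching the query itself.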

MULTI-MODAL LINKING
Expression, splicing, protein, methylation — linked.

eQTL, sQTL, pQTL, mQTL, caQTL, haQTL and a long tail of rarer modalities sit in the same model, joinable on gene, variant, and locus — not stacked as separate browsers.

PROVENANCE · QC
Raw supplement column to query result, auditable.

Every transform — rename, type, schema, validation — is logged. Rigorous automated QC runs on every study; high-severity failures are held back. 97.6% of studies pass.

MACHINE-USABLE CONTEXT
Designed so models can reason over it.

Progressive-disclosure MCP, deterministic schemas, SQL against parquet. An agent can orient in ~200 tokens, search in ~50/result, and pull rows directly — without pipelines or API glue.

02 · Statistical depth

Beyond sentinel hits — coloc, MR, fine-mapping, all queryable.

The structured downstream outputs every QTL paper actually publishes — colocalization, Mendelian randomization, fine-mapping posteriors, conditional analyses, trans hits, gene–phenotype maps — live in SQL-queryable parquet, not buried supplements.

~30% of indexed studies ship genome-wide summary stats. The rest of the field stops at top hits.

Extracted tables by category · 843 structured QTL outputs across 249 studies

Sentinel hits: 499
Colocalization · PP4 vs GWAS: 82
Gene–phenotype maps: 72
Phenotype maps: 49
Genome-wide summary stats · full: 36
Conditional summary stats · COJO: 34
MR results · causal estimates: 33
Fine-mapping posteriors · PIPs: 23
Trans hits · distant-acting: 13
Credible sets · resolved: 2

Chart legend (color groups): Catalog baseline; Discovery; Locus · gene; Causal inference; Trans

Each table is reachable via query_table with SQL. The chart excludes 723 supporting tables (GWAS overlaps, PheWAS, gene-prioritization curation), which together with the 843 charted tables make up the 1,566 extracted tables. Those supporting tables remain queryable but are not the focus of this section.

Live colocalization query · study 37563310 (Zhao 2023, inflammatory proteins × GTEx coloc)
PP4 = posterior probability of a single shared causal variant

SELECT protein, gene_symbol, tissue_type, pp_h4
FROM data WHERE pp_h4 > 0.7 ORDER BY pp_h4 DESC LIMIT 7;

protein  gene  tissue                                PP4
CD40     CD40  Artery_Aorta                          1.00
CD40     CD40  Nerve_Tibial                          1.00
CD40     CD40  Skin_Sun_Exposed_Lower_leg            1.00
CD6      CD6   Brain_Amygdala                        1.00
CD6      CD6   Brain_Anterior_cingulate_cortex_BA24  1.00
CD6      CD6   Brain_Cerebellar_Hemisphere           1.00
CD6      CD6   Brain_Cerebellum                      1.00

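A PP4 that rounds to 1.00 means that, under the colocalization model's assumptions, the posterior probability that the pQTL and the eQTL at this locus share a single causal variant is essentially 1. That is the evidence that lets you call a gene, not just a region.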

Live run · query_table(study_id="37563310", table_id="…ST7", query=…)
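A runnable sketch of the same query shape, for readers who want to try it locally. It assumes an in-memory sqlite3 table as a stand-in for the extracted parquet (the page does not say which SQL engine sits behind query_table). The first three rows are copied from the result above; the last row is hypothetical filler that exists only to exercise the filter. An explicit tie-break is added to ORDER BY so the output order is deterministic when many rows share PP4 = 1.00.

```python
import sqlite3

# In-memory stand-in for one extracted table. sqlite3 is used only so the
# sketch runs with the standard library; it is not the engine behind
# query_table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (protein TEXT, gene_symbol TEXT, tissue_type TEXT, pp_h4 REAL)"
)
rows = [
    # Rows copied from the result shown above.
    ("CD40", "CD40", "Artery_Aorta", 1.00),
    ("CD6", "CD6", "Brain_Amygdala", 1.00),
    ("CD6", "CD6", "Brain_Cerebellum", 1.00),
    # Hypothetical filler row, below the PP4 threshold.
    ("EXAMPLE", "EXAMPLE", "Whole_Blood", 0.42),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

query = """
SELECT protein, gene_symbol, tissue_type, pp_h4
FROM data
WHERE pp_h4 > 0.7
ORDER BY pp_h4 DESC, protein, tissue_type
LIMIT 7;
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```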

03 · Modality × ancestry × context

Comparable across modality, ancestry, and biological context.

Coverage isn't the headline — it's the consequence of the normalization layer. The same query reaches across regulatory modalities, ancestry groups, and tissue contexts that today sit in separate releases and supplements.

MODALITY STACK · 23 QTL TYPES
Studies by QTL modality · non-exclusive tags · 249 studies

eQTL · expression: 140
sQTL · splicing: 57
pQTL · protein: 52
mQTL · DNA methylation: 47
caQTL · chromatin accessibility: 38
haQTL · histone acetylation: 13
apaQTL · alt. polyadenylation: 3
16 rarer QTL types: 16

Chart legend (color groups): RNA expression; RNA processing; Protein; Epigenetic; Rare / exotic

Rarer tags include imageQTL, edQTL, nucQTL, tfQTL, vQTL, fpQTL — each queryable as a first-class filter alongside expression and protein.

WHY MULTI-MODAL MATTERS

In an iPSC multi-omic atlas (study 39986281), adding chromatin and histone QTLs to expression alone lifted GWAS-locus annotation 2.3×.

ANCESTRY COVERAGE · 22 POPULATIONS, 17 STRATIFIED DESIGNS

17 studies use ancestry-stratified designs — purpose-built to detect population-specific effects, not incidentally diverse. Ancestry filters resolve to sample-level metadata, with confidence exposed.

Studies by ancestry label · non-exclusive · multi-ancestry studies tag in multiple rows

European: 130
Not reported: 51
Mixed / multi-ethnic: 50
African: 30
East Asian: 29
African American: 19
South Asian: 13
Hispanic / Latino: 8
Finnish · Middle Eastern · Icelandic: 11
Other (Sardinian, Indonesian, Korean, …): 9

Chart legend (color groups): Non-European; Multi-ethnic; Founder / under-represented; European; Not reported

UGANDA · PLASMA pQTL
Soremekun 2026 · 41507467
N=525 · 2,873 Olink Explore proteins

399 plasma pQTLs in continental-African individuals — 37 never reported in any population. Colocalized SIRPb1 with T2D risk via an ancestry-specific mechanism.

ANCESTRY-STRATIFIED · pQTL + mQTL
Yang 2025 · 40789849
2,338 EUR vs 414 AFR · plasma SomaScan

Head-to-head stratified mapping reveals 954 AFR-specific pQTLs and a distinct set of T2D effector proteins — biology a European-only analysis misses entirely.

INDONESIAN · SPLICING QTL
Ibeh 2024 · preprint 2024.05.07
N=115 · Papuan-like ancestry · whole blood

Maps >6,000 cis-sQTLs across Indonesian island populations spanning an ancestry gradient (~2% Denisovan introgression) — a population virtually absent from existing atlases.

TISSUE & DISEASE CONTEXT · PATIENT-TISSUE STUDIES

Reference atlases like GTEx are the standard for non-diseased adult tissues. The harder-to-assemble material — patient-tissue cohorts, case-control disease-state designs, tumor tissue, rare tissues like cartilage, CSF, and atrial appendage — is scattered across individual papers and project portals. The QTL Index organizes that material in the same model as the reference data, so it's reachable by the same queries.

WHY PATIENT TISSUE MATTERS

A brain/CSF/plasma proteome atlas (study 34239129) found 48–77% of pQTLs don't colocalize with eQTL, sQTL, mQTL, or haQTL — protein-level genetic regulation is a distinct layer, and one only patient-tissue studies can access.

Tissue families · studies per family · 60+ distinct biosamples, rolled up

Immune / blood / biofluid: 136
CNS / brain · incl. CSF: 38
Metabolic · liver, islets, adipose, muscle: 23
Cardiovascular: 10
Musculoskeletal · cartilage, synovium: 10
Reproductive / developmental: 10
Tumor tissue · primary cancer: 9
Ocular · retina + RPE: 6

Rare tissues (cartilage, synovium, placenta, atrial appendage, dried blood spots, pancreatic islets) are queryable as first-class filters. 9 case-control studies and 9 tumor-tissue studies sit alongside the population-cohort designs.

04 · Provenance

Auditable from raw column to query result.

Clean-looking tables are easy. The harder thing is showing every transform between the raw supplement — usually a spreadsheet, sometimes a PDF table — and the SQL-queryable parquet, then re-running that chain as new papers land. Every rename, type, and schema decision is logged; each rebuild replays the chain.

Provenance trace · study 41501544 · Li 2026 (UKB-PPP trans-pQTL, N=52,363)
snapshot · 2026-04-30

01 · RAW (supplement S10)
chr, bp_hg38, allele0, allele1, beta, se, log10p, target, cis_trans

02 · RENAMED (rename_mapping)
beta → effect_size
allele1 → alt_allele
log10p → p_value
bp_hg38 → position
target → marker_id

03 · SCHEMA (describe_table)
chromosome  VARCHAR
position    BIGINT
ref_allele  VARCHAR
alt_allele  VARCHAR
effect_size DOUBLE
se          DOUBLE
p_value     DOUBLE
gene_symbol VARCHAR
cis_trans   VARCHAR

04 · QUERY (query_table SQL)
chr6:143503967 · ref=G, alt=T · EAF=0.253 · β=−1.297 · log10p=7785.79 · gene=FUCA1 · trans

Every row in the index unrolls this chain. Click any table → see the raw supplement column on the left, the normalized parquet schema on the right.
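
The rename step of the trace can be sketched as a function that applies a mapping and records one log entry per column it touches, so any normalized column can be walked back to its raw name. The rename map is copied from the trace above; the helper names (apply_renames, trace_back) and the log format are hypothetical. Row values are taken from the example row where the trace shows them and left as None otherwise; treating allele0 as the reference allele is an assumption. The sketch only renames; any value conversion would be a separate logged step.

```python
# Rename map copied from the RENAMED step of the trace.
RENAME_MAPPING = {
    "beta": "effect_size",
    "allele1": "alt_allele",
    "log10p": "p_value",
    "bp_hg38": "position",
    "target": "marker_id",
}


def apply_renames(raw_row, mapping):
    """Return (renamed_row, log) with one log entry per renamed column."""
    renamed, log = {}, []
    for column, value in raw_row.items():
        new_name = mapping.get(column, column)
        renamed[new_name] = value
        if new_name != column:
            log.append({"step": "rename", "from": column, "to": new_name})
    return renamed, log


def trace_back(column, log):
    """Walk a normalized column name back to its raw supplement column."""
    for entry in log:
        if entry["to"] == column:
            return entry["from"]
    return column  # untouched columns keep their raw name


# Raw column names from the trace; None marks values the trace does not show.
raw_row = {
    "chr": "chr6", "bp_hg38": 143503967, "allele0": "G", "allele1": "T",
    "beta": -1.297, "se": None, "log10p": 7785.79,
    "target": None, "cis_trans": "trans",
}
renamed, log = apply_renames(raw_row, RENAME_MAPPING)
print(renamed["effect_size"], trace_back("p_value", log))
```

Replaying the chain on a rebuild then amounts to re-running apply_renames with the stored mapping and comparing the resulting log with the previous one.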
05 · Quality control

Every study, QC'd before search.

We run rigorous automated QC on every study before it reaches search. High-severity failures are held back. Minor issues stay searchable but are visibly flagged on the study, so reviewers can see exactly why before relying on the numbers. No silent failures, no hidden caveats.
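
The gating rule described above can be sketched as a small function: any high-severity finding holds a study back, any minor finding flags it, and the reasons stay attached either way. The severity labels ("high", "minor"), the Status enum, and the gate helper are hypothetical names for illustration, not the pipeline's actual checks.

```python
from enum import Enum


class Status(Enum):
    CLEAN = "searchable, clean"
    FLAGGED = "searchable, flagged"
    HELD_BACK = "held back"


def gate(findings):
    """Map a study's QC findings to (status, visible reasons).

    `findings` is a list of (severity, reason) pairs, where severity is
    "high" or "minor" in this sketch.
    """
    severities = {severity for severity, _ in findings}
    reasons = [reason for _, reason in findings]
    if "high" in severities:
        return Status.HELD_BACK, reasons
    if "minor" in severities:
        return Status.FLAGGED, reasons
    return Status.CLEAN, reasons


held, held_reasons = gate(
    [("high", "no primary summary-stats table found in supplements")]
)
flagged, _ = gate(
    [("minor", "table category disagrees with summary-stats scope")]
)
clean, _ = gate([])
print(held.value, "|", flagged.value, "|", clean.value)
```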

QUALITY FLOOR
97.6%

of studies pass automated QC and are searchable by default.

6 studies are held back from search unless explicitly requested. 30 studies carry a visible flag and remain searchable.

searchable, clean · 213 · 85.5%
searchable, flagged · 30 · 12.0%
held back · 6 · 2.4%
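
The percentages follow directly from the three counts (213 clean, 30 flagged, 6 held back, out of 249 studies). A short check, with variable names chosen for illustration:

```python
# Counts from the breakdown above.
counts = {"searchable, clean": 213, "searchable, flagged": 30, "held back": 6}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# "Pass" here means searchable by default: clean plus flagged.
pass_rate = round(100 * (total - counts["held back"]) / total, 1)

print(total, shares, pass_rate)
```

This reproduces the 97.6% quality floor as (213 + 30) / 249.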

Of the 249 studies, 130 (52%) have parquet you can SQL today; 93 link to external repositories; 23 have no extractable data.

QC FLAGS · VISIBLE, NOT BURIED

When a study can't be loaded cleanly, the reason is a labeled flag attached to the study itself — not a footnote in a release-notes file.

Held back: A liver caQTL study declares QTL types, but no primary summary-stats table was found in its supplements, so it can't be loaded as a primary QTL study.
Held back: A consortium portal release has no tissue or cell-type metadata after extraction; without that, downstream filters would silently mis-match.
Flagged: 29 studies have a declared table category that disagrees with their declared summary-stats scope. Searchable, but flagged so reviewers can inspect before relying on the numbers.

06 · Machine-usable context

A context layer models can reason over.

The same normalized model that powers human exploration is the surface an agent can reliably use. Progressive disclosure on the MCP surface means an agent can orient in a couple hundred tokens, search in fifty per result, open any study in a few hundred more, then run SQL against the actual published table — no pipeline, no API glue, no schema guessing.

Claude · Devano MCP · qtl-index

One MCP. Four resolutions. Orient → search → open → query.

01 · Orient · corpus_overview · ~200 tokens
What's in here?

02 · Search · search_studies · ~50 tokens / result
Filter by tissue × modality × ancestry × disease.

03 · Open · study_narrative · study_detail · ~300–500 tokens
Inspect a study, its cohorts, its tables.

04 · Query · query_table · SQL against parquet
Pull rows. Filter. Aggregate.

Two extra calls round out the surface: search_entities for gene-level lookup across studies, and find_similar_studies for narrative-embedding discovery. Both ~50 tokens per result.
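
As a rough worked example of what progressive disclosure costs, the sketch below adds up the approximate per-step figures quoted above. The context_budget helper is hypothetical, and the figures are the page's approximations, so the output is an estimate rather than a measurement. The query step is left out because its size depends on the rows returned.

```python
ORIENT_TOKENS = 200             # corpus_overview, "~200 tokens"
SEARCH_TOKENS_PER_RESULT = 50   # search_studies, "~50 / result"
OPEN_TOKENS_RANGE = (300, 500)  # study_narrative / study_detail, "~300-500"


def context_budget(n_results, n_opened):
    """Return (low, high) estimated tokens spent before any query_table call."""
    fixed = ORIENT_TOKENS + SEARCH_TOKENS_PER_RESULT * n_results
    low = fixed + OPEN_TOKENS_RANGE[0] * n_opened
    high = fixed + OPEN_TOKENS_RANGE[1] * n_opened
    return low, high


# Orient once, read 10 search results, open 2 studies.
low, high = context_budget(n_results=10, n_opened=2)
print(low, high)  # 1300 1700
```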

GENE LOOKUP · search_entities("IL6R")

6 records across 5 studies spanning pQTL + eQTL, four ancestries, two tissues — one tool call.

33067605 · Folkersen 2021 · CV pQTL (N=30,931)
39316441 · Tahir 2024 · African-American pQTL
38565889 · Kalnapenkis 2024 · Estonian plasma pQTL (p=1e−106)
41507467 · Soremekun 2026 · Ugandan plasma pQTL
31917787 · Nurnberg 2020 · vascular eQTL

DISCOVERY · find_similar_studies(37794188)

Seed = Eldjarn 2023 multi-ancestry plasma pQTL. Neighbors via narrative embedding, not keyword overlap.

0.89 · Nanoparticle MS proteomics · British South Asians
0.88 · UKB-PPP plasma pQTL atlas · 14,287 sentinels
0.88 · Ma 2025 disease-stratified UKB pQTL · 28K patients

Cosine similarity >0.87 across all three. The embedding captures conceptual proximity — agents surface replication candidates without knowing study IDs in advance.
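
For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their norms. The sketch below ranks candidates against a seed that way. The three-dimensional vectors and the names near, mid, and far are toy values for illustration; they are not real study embeddings, and the page does not describe how find_similar_studies is implemented.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest(seed, candidates, k=3):
    """Return the k (score, name) pairs most similar to the seed, best first."""
    scored = [(cosine_similarity(seed, vec), name) for name, vec in candidates.items()]
    return sorted(scored, reverse=True)[:k]


# Toy vectors, not real narrative embeddings.
seed = [1.0, 0.0, 0.0]
candidates = {
    "near": [0.9, 0.1, 0.0],
    "mid": [0.5, 0.5, 0.0],
    "far": [0.0, 1.0, 0.0],
}
top = nearest(seed, candidates, k=2)
for score, name in top:
    print(f"{score:.2f} {name}")
```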

07 · Where this sits

A context layer across the QTL ecosystem.

Excellent scientific resources exist for QTL evidence. We use them ourselves. The QTL Index is not a replacement for any of these — it's the continuously normalized context layer above them, pulling paper-native evidence, downstream statistical tables, ancestry- and disease-context studies, and provenance into a single queryable model.

GTEx Portal

The reference atlas for non-diseased adult tissues — the canonical baseline for cross-tissue eQTL/sQTL effects.

eQTL Catalogue

Uniformly reprocessed eQTL studies across consortium cohorts — the standard for comparable expression QTLs.

QTLbase2

A broad public xQTL browser spanning multiple molecular phenotypes and tissue contexts.

Open Targets Genetics

A post-GWAS target prioritization platform that fuses associations with selected functional evidence.

GWAS Catalog

The standard catalog for trait associations and sentinel hits, with summary statistics for many studies.

Devano QTLx · what we add

Continuous ingestion of paper-native evidence (including supplements), cross-study normalization, downstream statistical tables (coloc/MR/fine-mapping/conditional/trans), provenance back to raw columns, and machine-usable query — all in a single model.

None of this replaces the role of consortia or curated portals. It complements them — turning the heterogeneous, ever-growing literature around them into a structure computational and agent workflows can rely on.

METHODOLOGY

Headline counts come from corpus_overview run against the QTL Index build, snapshot 2026-04-30. Per-study examples were sourced via search_studies, study_detail, study_narrative, and query_table SQL against extracted parquet. Numbers refresh after each pipeline rebuild, typically monthly.

Data presented on this page is provided by 🧬 Devano (devano.ai). Underlying studies are credited to their original authors and journals.