CONTEXT LAYER

QTL data, normalized into one queryable system.

QTL evidence is spread across hundreds of studies and supplementary tables, each with its own schema, conventions, and statistical assumptions.

Devano continuously extracts, normalizes, and links this fragmented landscape into a provenance-aware system for cross-study analysis and AI-native workflows.

249 studies

1,566 structured tables

23 QTL types

22 populations

60+ tissues

97.6% QC pass

Snapshot from corpus_overview · April 30, 2026 · numbers refresh as the index rebuilds.

01 · The capability category

Not another QTL atlas. A context layer across the ecosystem.

Most QTL resources are distributed as papers, supplementary tables, or project-specific releases. Valuable scientific resources — but fundamentally snapshots: fixed schemas, fixed assumptions, fixed release cycles.

The QTL Index is different. It is a continuously updated system that extracts, normalizes, ontology-maps, and links QTL evidence into infrastructure that computational and AI workflows can directly operate on.

CONTINUOUS INGESTION

Papers enter the index as they're published.

No fixed release cadence. New QTL studies are extracted, normalized, and QC-gated on a rolling basis — the index is a live view of the field, not a frozen snapshot.

CROSS-STUDY NORMALIZATION

Heterogeneous tables, one queryable shape.

Units, schemas, and column conventions across 1,566 extracted tables reduce to a small set of typed shapes — sentinel hits, coloc, MR, fine-mapping, conditional, trans — comparable across papers.

ONTOLOGY MAPPING

Tissues, cell types, diseases as first-class IDs.

Free-text labels resolve to BTO / CL / MONDO so a query like tissue=CSF or disease=T2D reaches every study mapping that biology, regardless of how the authors phrased it.
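As a rough illustration of the label resolution described above, the sketch below maps differently phrased tissue and disease labels to a shared ontology ID through a synonym table. The function name `resolve_label`, the synonym entries, the toy studies, and the placeholder IDs are all assumptions made for illustration; they are not the index's mapping tables and not real BTO or MONDO accessions.

```python
# Hypothetical synonym table: normalized free-text label -> ontology ID.
# The IDs are placeholders, not real BTO / MONDO accessions.
SYNONYMS = {
    "csf": "BTO:PLACEHOLDER_CSF",
    "cerebrospinal fluid": "BTO:PLACEHOLDER_CSF",
    "t2d": "MONDO:PLACEHOLDER_T2D",
    "type 2 diabetes": "MONDO:PLACEHOLDER_T2D",
}


def resolve_label(label):
    """Return the ontology ID for a free-text label, or None if unmapped."""
    key = " ".join(label.lower().split())  # normalize case and whitespace
    return SYNONYMS.get(key)


# Toy studies whose authors phrased the same tissue differently.
studies = {
    "study_a": "CSF",
    "study_b": "Cerebrospinal  fluid",
    "study_c": "whole blood",
}

# A tissue=CSF query matches on the resolved ID, not on the raw string.
target = resolve_label("csf")
matches = sorted(s for s, lbl in studies.items() if resolve_label(lbl) == target)
print(matches)
```

Matching on the resolved ID rather than the raw string is what would let one filter reach every phrasing of the same tissue.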

MULTI-MODAL LINKING

Expression, splicing, protein, methylation — linked.

eQTL, sQTL, pQTL, mQTL, caQTL, haQTL and a long tail of rarer modalities sit in the same model, joinable on gene, variant, and locus — not stacked as separate browsers.

PROVENANCE · QC

Raw supplement column to query result, auditable.

Every transform — rename, type, schema, validation — is logged. Rigorous automated QC runs on every study; high-severity failures are held back. 97.6% of studies pass.

MACHINE-USABLE CONTEXT

Designed so models can reason over it.

Progressive-disclosure MCP, deterministic schemas, SQL against parquet. An agent can orient in ~200 tokens, search in ~50/result, and pull rows directly — without pipelines or API glue.

02 · Statistical depth

Beyond sentinel hits — coloc, MR, fine-mapping, all queryable.

The structured downstream outputs every QTL paper actually publishes — colocalization, Mendelian randomization, fine-mapping posteriors, conditional analyses, trans hits, gene–phenotype maps — live in SQL-queryable parquet, not buried supplements.

~30% of indexed studies ship genome-wide summary stats. The rest of the field stops at top hits.

Extracted tables by category · 843 structured QTL outputs across 249 studies

[Bar chart. Bars: Sentinel hits (499), Colocalization · PP4 vs GWAS, Gene–phenotype maps, Phenotype maps, Genome-wide summary stats · full, Conditional summary stats · COJO, MR results · causal estimates, Fine-mapping posteriors · PIPs, Trans hits · distant-acting, Credible sets · resolved. Legend: Catalog baseline, Discovery, Locus · gene, Causal inference, Trans.]

Each table is reachable via query_table with SQL. Chart excludes 723 supporting tables (GWAS overlaps, PheWAS, gene-prioritization curation) — those remain queryable but aren't the load-bearing differentiator here.

Live colocalization query · study 37563310 (Zhao 2023, inflammatory proteins × GTEx coloc)
PP4 = posterior probability of a single shared causal variant

SELECT protein, gene_symbol, tissue_type, pp_h4
FROM data WHERE pp_h4 > 0.7 ORDER BY pp_h4 DESC LIMIT 7;

protein	gene	tissue	PP4
CD40	CD40	Artery_Aorta	1.00
CD40	CD40	Nerve_Tibial	1.00
CD40	CD40	Skin_Sun_Exposed_Lower_leg	1.00
CD6	CD6	Brain_Amygdala	1.00
CD6	CD6	Brain_Anterior_cingulate_cortex_BA24	1.00
CD6	CD6	Brain_Cerebellar_Hemisphere	1.00
CD6	CD6	Brain_Cerebellum	1.00

A PP4 that rounds to 1.00 means the colocalization posterior puts essentially all of its weight on the pQTL and the eQTL at this locus sharing a single causal variant. That is the evidence that lets you call a gene, not just a region.
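To try the same filter locally, the sketch below loads the seven rows shown above into an in-memory SQLite table named `data` and runs the query unchanged. SQLite is only a stand-in here (the index itself is described as SQL against parquet), and the extra low-PP4 row is hypothetical, added so the `WHERE` clause has something to exclude.

```python
import sqlite3

# Rows copied from the result table above (PP4 as displayed, rounded to 2 dp),
# plus one hypothetical low-PP4 row that the filter should drop.
rows = [
    ("CD40", "CD40", "Artery_Aorta", 1.00),
    ("CD40", "CD40", "Nerve_Tibial", 1.00),
    ("CD40", "CD40", "Skin_Sun_Exposed_Lower_leg", 1.00),
    ("CD6", "CD6", "Brain_Amygdala", 1.00),
    ("CD6", "CD6", "Brain_Anterior_cingulate_cortex_BA24", 1.00),
    ("CD6", "CD6", "Brain_Cerebellar_Hemisphere", 1.00),
    ("CD6", "CD6", "Brain_Cerebellum", 1.00),
    ("EXAMPLE", "EXAMPLE", "Whole_Blood", 0.12),  # hypothetical, not from the study
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (protein TEXT, gene_symbol TEXT, tissue_type TEXT, pp_h4 REAL)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# Same SQL as on the page: keep strong colocalizations, strongest first.
result = conn.execute(
    "SELECT protein, gene_symbol, tissue_type, pp_h4 "
    "FROM data WHERE pp_h4 > 0.7 ORDER BY pp_h4 DESC LIMIT 7"
).fetchall()

for protein, gene, tissue, pp4 in result:
    print(f"{protein}\t{gene}\t{tissue}\t{pp4:.2f}")
```

Because all seven real rows tie at the displayed PP4, their relative order in the output is not guaranteed by the `ORDER BY`; add a secondary sort key if a stable order matters.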

Live run · query_table(study_id="37563310", table_id="…ST7", query=…)

03 · Modality × ancestry × context

Comparable across modality, ancestry, and biological context.

Coverage isn't the headline — it's the consequence of the normalization layer. The same query reaches across regulatory modalities, ancestry groups, and tissue contexts that today sit in separate releases and supplements.

MODALITY STACK · 23 QTL TYPES

Studies by QTL modality · non-exclusive tags · 249 studies

[Bar chart. Bars: eQTL · expression (140), sQTL · splicing, pQTL · protein, mQTL · DNA methylation, caQTL · chromatin accessibility, haQTL · histone acetylation, apaQTL · alt. polyadenylation, 16 rarer QTL types. Legend: RNA expression, RNA processing, Protein, Epigenetic, Rare / exotic.]

Rarer tags include imageQTL, edQTL, nucQTL, tfQTL, vQTL, fpQTL — each queryable as a first-class filter alongside expression and protein.

WHY MULTI-MODAL MATTERS

In an iPSC multi-omic atlas (study 39986281), adding chromatin and histone QTLs to expression alone lifted GWAS-locus annotation 2.3×.

ANCESTRY COVERAGE · 22 POPULATIONS, 17 STRATIFIED DESIGNS

17 studies use ancestry-stratified designs — purpose-built to detect population-specific effects, not incidentally diverse. Ancestry filters resolve to sample-level metadata, with confidence exposed.

Studies by ancestry label · non-exclusive · multi-ancestry studies tag in multiple rows

[Bar chart. Bars: European (130), Not reported, Mixed / multi-ethnic, African, East Asian, African American, South Asian, Hispanic / Latino, Finnish · Middle Eastern · Icelandic, Other (Sardinian, Indonesian, Korean, …). Legend: Non-European, Multi-ethnic, Founder / under-represented, European, Not reported.]

UGANDA · PLASMA pQTL

Soremekun 2026 · 41507467

N=525 · 2,873 Olink Explore proteins

399 plasma pQTLs in continental-African individuals, 37 of them never reported in any population. Colocalized SIRPB1 with T2D risk via an ancestry-specific mechanism.

ANCESTRY-STRATIFIED · pQTL + mQTL

Yang 2025 · 40789849

2,338 EUR vs 414 AFR · plasma SomaScan

Head-to-head stratified mapping reveals 954 AFR-specific pQTLs and a distinct set of T2D effector proteins — biology a European-only analysis misses entirely.

INDONESIAN · SPLICING QTL

Ibeh 2024 · preprint 2024.05.07

N=115 · Papuan-like ancestry · whole blood

Maps >6,000 cis-sQTLs across Indonesian island populations spanning an ancestry gradient (~2% Denisovan introgression) — a population virtually absent from existing atlases.

TISSUE & DISEASE CONTEXT · PATIENT-TISSUE STUDIES

Reference atlases like GTEx are the standard for non-diseased adult tissues. The harder-to-assemble material — patient-tissue cohorts, case-control disease-state designs, tumor tissue, rare tissues like cartilage, CSF, and atrial appendage — is scattered across individual papers and project portals. The QTL Index organizes that material in the same model as the reference data, so it's reachable by the same queries.

WHY PATIENT TISSUE MATTERS

A brain/CSF/plasma proteome atlas (study 34239129) found 48–77% of pQTLs don't colocalize with eQTL, sQTL, mQTL, or haQTL — protein-level genetic regulation is a distinct layer, and one only patient-tissue studies can access.

Tissue families · studies per family · 60+ distinct biosamples, rolled up

[Bar chart. Bars: Immune / blood / biofluid (136), CNS / brain · incl. CSF, Metabolic · liver, islets, adipose, muscle, Cardiovascular, Musculoskeletal · cartilage, synovium, Reproductive / developmental, Tumor tissue · primary cancer, Ocular · retina + RPE.]

Rare tissues (cartilage, synovium, placenta, atrial appendage, dried blood spots, pancreatic islets) are queryable as first-class filters. 9 case-control studies and 9 tumor-tissue studies sit alongside the population-cohort designs.

04 · Provenance

Auditable from raw column to query result.

Clean-looking tables are easy. The harder thing is showing every transform between the raw supplement — usually a spreadsheet, sometimes a PDF table — and the SQL-queryable parquet, then re-running that chain as new papers land. Every rename, type, and schema decision is logged; each rebuild replays the chain.

Provenance trace · study 41501544 · Li 2026 (UKB-PPP trans-pQTL, N=52,363) · snapshot 2026-04-30

01 · RAW · supplement S10

chr
bp_hg38
allele0
allele1
beta
se
log10p
target
cis_trans

02 · RENAMED · rename_mapping

beta → effect_size
allele1 → alt_allele
log10p → p_value
bp_hg38 → position
target → marker_id

03 · SCHEMA · describe_table

chromosome  VARCHAR
position    BIGINT
ref_allele  VARCHAR
alt_allele  VARCHAR
effect_size DOUBLE
se          DOUBLE
p_value     DOUBLE
gene_symbol VARCHAR
cis_trans   VARCHAR

04 · QUERY · query_table SQL

chr6:143503967
ref=G, alt=T
EAF=0.253
β=−1.297
log10p=7785.79
gene=FUCA1
trans

Every row in the index unrolls this chain. Click any table → see the raw supplement column on the left, the normalized parquet schema on the right.
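One way to picture a logged rename step is the minimal sketch below, which applies the five renames listed in the trace to a single raw row and records each one. The function `apply_renames`, the log record shape, and the choice to pass unmapped columns through unchanged are assumptions for illustration, not the pipeline's actual implementation. The row values come from the query step of the trace; placing G in `allele0` is itself an assumption, since the trace does not list how that column is mapped.

```python
# Rename mapping copied from step 02 of the trace.
RENAME_MAPPING = {
    "beta": "effect_size",
    "allele1": "alt_allele",
    "log10p": "p_value",
    "bp_hg38": "position",
    "target": "marker_id",
}


def apply_renames(raw_row, mapping):
    """Rename columns; return (new_row, log) so every change is auditable."""
    new_row, log = {}, []
    for column, value in raw_row.items():
        new_name = mapping.get(column, column)  # unmapped columns pass through
        new_row[new_name] = value
        if new_name != column:
            log.append({"step": "rename", "from": column, "to": new_name})
    return new_row, log


# Illustrative raw row built from the values shown in step 04.
raw_row = {
    "bp_hg38": 143503967,
    "allele0": "G",  # assumed to be the reference allele
    "allele1": "T",
    "beta": -1.297,
    "log10p": 7785.79,
    "target": "FUCA1",
    "cis_trans": "trans",
}

row, log = apply_renames(raw_row, RENAME_MAPPING)
for entry in log:
    print(f'{entry["step"]}: {entry["from"]} -> {entry["to"]}')
```

Keeping the log alongside the row is what would make a rebuild replayable: the same mapping applied to the same raw supplement yields the same renamed columns and the same audit trail.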

05 · Quality control

Every study, QC'd before search.

We run rigorous automated QC on every study before it reaches search. High-severity failures are held back. Minor issues stay searchable but are visibly flagged on the study, so reviewers can see exactly why before relying on the numbers. No silent failures, no hidden caveats.
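The gating rule in the paragraph above can be sketched as a small function. The severity labels (`"high"`, `"minor"`), the status names, and the issue messages are hypothetical; the page only states that high-severity failures are held back and minor issues stay searchable with a visible flag.

```python
def qc_status(issues):
    """Map a study's QC issues, given as (severity, message) pairs, to a search status."""
    severities = {severity for severity, _message in issues}
    if "high" in severities:
        return "held_back"  # excluded from search unless explicitly requested
    if severities:
        return "searchable_flagged"  # searchable, with the reasons shown
    return "searchable_clean"


# Hypothetical studies illustrating the three outcomes.
clean_study = []
flagged_study = [("minor", "table category disagrees with summary-stats scope")]
held_study = [
    ("high", "no primary summary-stats table found"),
    ("minor", "table category disagrees with summary-stats scope"),
]

for name, issues in [
    ("clean", clean_study),
    ("flagged", flagged_study),
    ("held", held_study),
]:
    print(name, qc_status(issues))
```

A single high-severity issue dominates any number of minor ones, so a study is never both flagged and silently searchable.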

QUALITY FLOOR

97.6%

of studies pass automated QC and are searchable by default.

6 studies are held back from search unless explicitly requested. 30 studies carry a visible flag and remain searchable.

searchable, clean · 213 · 85.5%

searchable, flagged · 30 · 12.0%

held back · 6 · 2.4%

Of the 249 studies, 130 (52%) have parquet you can query with SQL today; 93 link to external repositories; 23 have no extractable data.
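The QC percentages can be reproduced from the counts on this page. The sketch below assumes the clean count is what remains after the 6 held-back and 30 flagged studies are subtracted from 249, and that "pass" means clean plus flagged; both assumptions match the published 97.6%.

```python
total = 249
held_back = 6
flagged = 30
clean = total - held_back - flagged  # studies with no flag at all


def pct(n, digits=1):
    """Share of all indexed studies, as a rounded percentage."""
    return round(100 * n / total, digits)


pass_rate = pct(clean + flagged)  # searchable by default

print(f"clean: {clean} ({pct(clean)}%)")
print(f"flagged: {flagged} ({pct(flagged)}%)")
print(f"held back: {held_back} ({pct(held_back)}%)")
print(f"QC pass: {pass_rate}%")
```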

QC FLAGS · VISIBLE, NOT BURIED

When a study can't be loaded cleanly, the reason is a labeled flag attached to the study itself — not a footnote in a release-notes file.

Held back A liver caQTL study declares QTL types but no primary summary-stats table was found in its supplements — can't be loaded as a primary QTL study.

Held back A consortium portal release has no tissue or cell-type metadata after extraction — without that, downstream filters would silently mis-match.

Flagged 29 studies have a declared table category that disagrees with their declared summary-stats scope. Searchable, but flagged so reviewers can inspect before relying on the numbers.

06 · Machine-usable context

A context layer models can reason over.

The same normalized model that powers human exploration is the surface an agent can reliably use. Progressive disclosure on the MCP surface means an agent can orient in a couple hundred tokens, search in fifty per result, open any study in a few hundred more, then run SQL against the actual published table — no pipeline, no API glue, no schema guessing.

Claude · Devano MCP

qtl-index

One MCP. Four resolutions. Orient → search → open → query.

01 · Orient

corpus_overview

~200 tokens

What's in here?

02 · Search

search_studies

~50 / result

Filter by tissue × modality × ancestry × disease.

03 · Open

study_narrative · study_detail

~300–500

Inspect a study, its cohorts, its tables.

04 · Query

query_table

SQL against parquet

Pull rows. Filter. Aggregate.

Two extra calls round out the surface: search_entities for gene-level lookup across studies, and find_similar_studies for narrative-embedding discovery. Both cost ~50 tokens per result.
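For a sense of scale, the per-step token figures quoted above add up as in the sketch below. The function `estimate_tokens` is hypothetical and only does arithmetic on the page's approximate numbers; it excludes the rows returned by `query_table`, whose size depends on the query.

```python
def estimate_tokens(n_results, n_opened, open_cost=500):
    """Rough context cost of orient -> search -> open, using the page's estimates.

    open_cost is per opened study; the page quotes roughly 300 to 500.
    """
    orient = 200              # corpus_overview
    search = 50 * n_results   # search_studies, about 50 tokens per result
    opened = open_cost * n_opened
    return orient + search + opened


# Ten search results, two studies opened.
upper = estimate_tokens(10, 2)                 # with the 500-token upper estimate
lower = estimate_tokens(10, 2, open_cost=300)  # with the 300-token lower estimate
print(lower, upper)
```

Under these figures, a session that orients, reviews ten results, and opens two studies stays under 2,000 tokens before any rows are pulled.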

GENE LOOKUP · search_entities("IL6R")

6 records across 5 studies spanning pQTL + eQTL, four ancestries, two tissues — one tool call.

33067605 · Folkersen 2021 · CV pQTL (N=30,931)

39316441 · Tahir 2024 · African-American pQTL

38565889 · Kalnapenkis 2024 · Estonian plasma pQTL (p=1e−106)

41507467 · Soremekun 2026 · Ugandan plasma pQTL

31917787 · Nurnberg 2020 · vascular eQTL

DISCOVERY · find_similar_studies(37794188)

Seed = Eldjarn 2023 multi-ancestry plasma pQTL. Neighbors via narrative embedding, not keyword overlap.

0.89 · Nanoparticle MS proteomics · British South Asians

0.88 · UKB-PPP plasma pQTL atlas · 14,287 sentinels

0.88 · Ma 2025 disease-stratified UKB pQTL · 28K patients

Cosine similarity >0.87 across all three. The embedding captures conceptual proximity — agents surface replication candidates without knowing study IDs in advance.
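Neighbor ranking by cosine similarity can be sketched as follows. The three-dimensional vectors and study names are toy values invented for illustration; the real narrative embeddings, their dimensionality, and the internals of `find_similar_studies` are not described on this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(seed, candidates, k=3):
    """Return the k (score, name) pairs most similar to the seed, best first."""
    scored = [(cosine_similarity(seed, vec), name) for name, vec in candidates.items()]
    return sorted(scored, reverse=True)[:k]


# Toy embeddings (hypothetical): study_a points almost the same way as the seed.
seed = [1.0, 1.0, 0.0]
candidates = {
    "study_a": [1.0, 1.0, 0.1],
    "study_b": [1.0, 0.0, 0.0],
    "study_c": [0.0, 0.0, 1.0],
}

for score, name in nearest(seed, candidates, k=2):
    print(f"{score:.2f} {name}")
```

Cosine similarity ignores vector length and compares direction only, which is why two narratives of different length can still score as close neighbors.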

07 · Where this sits

A context layer across the QTL ecosystem.

Excellent scientific resources exist for QTL evidence. We use them ourselves. The QTL Index is not a replacement for any of these — it's the continuously normalized context layer above them, pulling paper-native evidence, downstream statistical tables, ancestry- and disease-context studies, and provenance into a single queryable model.

GTEx Portal

The reference atlas for non-diseased adult tissues — the canonical baseline for cross-tissue eQTL/sQTL effects.

eQTL Catalogue

Uniformly reprocessed eQTL studies across consortium cohorts — the standard for comparable expression QTLs.

QTLbase2

A broad public xQTL browser spanning multiple molecular phenotypes and tissue contexts.

Open Targets Genetics

A post-GWAS target prioritization platform that fuses associations with selected functional evidence.

GWAS Catalog

The standard catalog for trait associations and sentinel hits, with summary statistics for many studies.

Devano QTLx · what we add

Continuous ingestion of paper-native evidence (including supplements), cross-study normalization, downstream statistical tables (coloc/MR/fine-mapping/conditional/trans), provenance back to raw columns, and machine-usable query — all in a single model.

None of this replaces the role of consortia or curated portals. It complements them — turning the heterogeneous, ever-growing literature around them into a structure computational and agent workflows can rely on.

METHODOLOGY

Headline counts come from corpus_overview run against the QTL Index build, snapshot 2026-04-30. Per-study examples are sourced via search_studies, study_detail, study_narrative, and query_table SQL against extracted parquet. Numbers refresh after each pipeline rebuild, typically monthly.

Data presented on this page is provided by 🧬 Devano (devano.ai). Underlying studies are credited to their original authors and journals.