QTL data, normalized into one queryable system.
QTL evidence is spread across hundreds of studies and supplementary tables, each with its own schema, conventions, and statistical assumptions.
Devano continuously extracts, normalizes, and links this fragmented landscape into a provenance-aware system for cross-study analysis and AI-native workflows.
Snapshot from corpus_overview · April 30, 2026 · numbers refresh as the index rebuilds.
Not another QTL atlas. A context layer across the ecosystem.
Most QTL resources are distributed as papers, supplementary tables, or project-specific releases. Valuable scientific resources — but fundamentally snapshots: fixed schemas, fixed assumptions, fixed release cycles.
The QTL Index is different. It is a continuously updated system that extracts, normalizes, ontology-maps, and links QTL evidence into infrastructure that computational and AI workflows can directly operate on.
No fixed release cadence. New QTL studies are extracted, normalized, and QC-gated on a rolling basis — the index is a live view of the field, not a frozen snapshot.
Units, schemas, and column conventions across 1,566 extracted tables reduce to a small set of typed shapes — sentinel hits, coloc, MR, fine-mapping, conditional, trans — comparable across papers.
Free-text labels resolve to BTO / CL / MONDO so a query like tissue=CSF or disease=T2D reaches every study mapping that biology, regardless of how the authors phrased it.
eQTL, sQTL, pQTL, mQTL, caQTL, haQTL and a long tail of rarer modalities sit in the same model, joinable on gene, variant, and locus — not stacked as separate browsers.
Every transform — rename, type, schema, validation — is logged. Rigorous automated QC runs on every study; high-severity failures are held back. 97.6% of studies pass.
Progressive-disclosure MCP, deterministic schemas, SQL against parquet. An agent can orient in ~200 tokens, search in ~50/result, and pull rows directly — without pipelines or API glue.
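As a minimal sketch of what the label resolution described above involves, the snippet below maps author-phrased tissue and disease labels to a shared term so that one filter reaches all of them. The synonym table, the `resolve_label` and `studies_matching` helpers, the study records, and the term IDs are all hypothetical placeholders for illustration; they are not Devano's mapping and not real BTO or MONDO accessions.

```python
# Hypothetical synonym table: normalized free-text label -> placeholder ontology term.
# The IDs are illustrative stand-ins, not real BTO/MONDO accessions.
SYNONYMS = {
    "csf": "BTO:EXAMPLE_CSF",
    "cerebrospinal fluid": "BTO:EXAMPLE_CSF",
    "spinal fluid": "BTO:EXAMPLE_CSF",
    "t2d": "MONDO:EXAMPLE_T2D",
    "type 2 diabetes": "MONDO:EXAMPLE_T2D",
    "type ii diabetes mellitus": "MONDO:EXAMPLE_T2D",
}


def resolve_label(label):
    """Return the placeholder term for a free-text label, or None if unmapped."""
    # Lowercase, treat hyphens as spaces, and collapse repeated whitespace.
    key = " ".join(label.lower().replace("-", " ").split())
    return SYNONYMS.get(key)


def studies_matching(studies, query_label):
    """Select studies whose author-phrased label resolves to the same term as the query."""
    term = resolve_label(query_label)
    if term is None:
        return []
    return [s["study"] for s in studies if resolve_label(s["label"]) == term]


# Toy study records, invented for the example.
STUDIES = [
    {"study": "A", "label": "Cerebrospinal fluid"},
    {"study": "B", "label": "CSF"},
    {"study": "C", "label": "Type-2 diabetes"},
    {"study": "D", "label": "whole blood"},
]

print(studies_matching(STUDIES, "CSF"))  # ['A', 'B']
print(studies_matching(STUDIES, "T2D"))  # ['C']
```

A production mapper would presumably rely on curated ontology synonyms rather than a hand-written dictionary; the point here is only that the query and the study labels meet at the resolved term, not at the raw string.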
Beyond sentinel hits — coloc, MR, fine-mapping, all queryable.
The structured downstream outputs every QTL paper actually publishes — colocalization, Mendelian randomization, fine-mapping posteriors, conditional analyses, trans hits, gene–phenotype maps — live in SQL-queryable parquet, not buried supplements.
~30% of indexed studies ship genome-wide summary stats. The rest of the field stops at top hits.
Each table is reachable via query_table with SQL. Chart excludes 723 supporting tables (GWAS overlaps, PheWAS, gene-prioritization curation) — those remain queryable but aren't the load-bearing differentiator here.
SELECT protein, gene_symbol, tissue_type, pp_h4 FROM data WHERE pp_h4 > 0.7 ORDER BY pp_h4 DESC LIMIT 7;
| protein | gene | tissue | PP4 |
|---|---|---|---|
| CD40 | CD40 | Artery_Aorta | 1.00 |
| CD40 | CD40 | Nerve_Tibial | 1.00 |
| CD40 | CD40 | Skin_Sun_Exposed_Lower_leg | 1.00 |
| CD6 | CD6 | Brain_Amygdala | 1.00 |
| CD6 | CD6 | Brain_Anterior_cingulate_cortex_BA24 | 1.00 |
| CD6 | CD6 | Brain_Cerebellar_Hemisphere | 1.00 |
| CD6 | CD6 | Brain_Cerebellum | 1.00 |
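PP4 = 1.00 (after rounding) is coloc's posterior probability that the pQTL and the eQTL at this locus share a single causal variant. It is very strong evidence under coloc's assumptions rather than certainty, and it is the data that lets you call a gene, not just a region.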
Live run · query_table(study_id="37563310", table_id="…ST7", query=…)
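To make the query above concrete without access to the index, here is a small sketch that runs the same SQL against an in-memory table, using Python's built-in sqlite3 as a stand-in for the parquet backend. The first three rows are copied from the result shown above; the fourth (GENE_X) is an invented low-PP4 row added only so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (protein TEXT, gene_symbol TEXT, tissue_type TEXT, pp_h4 REAL)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("CD40", "CD40", "Artery_Aorta", 1.00),
        ("CD40", "CD40", "Nerve_Tibial", 1.00),
        ("CD6", "CD6", "Brain_Cerebellum", 1.00),
        ("GENE_X", "GENE_X", "Whole_Blood", 0.31),  # invented row, below threshold
    ],
)

# Same statement as in the text: keep rows with strong colocalization support.
QUERY = (
    "SELECT protein, gene_symbol, tissue_type, pp_h4 "
    "FROM data WHERE pp_h4 > 0.7 ORDER BY pp_h4 DESC LIMIT 7"
)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Rows that tie on pp_h4 come back in no guaranteed order, so adding a secondary sort key (for example `ORDER BY pp_h4 DESC, protein, tissue_type`) is a reasonable choice when a reproducible top-N matters.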
Comparable across modality, ancestry, and biological context.
Coverage isn't the headline — it's the consequence of the normalization layer. The same query reaches across regulatory modalities, ancestry groups, and tissue contexts that today sit in separate releases and supplements.
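A rough illustration of what one query reaching across modalities can look like once tables share a model, again using Python's built-in sqlite3 as a stand-in engine. The table names, column names, and rows are invented for the example and do not reflect the index's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE eqtl (gene_symbol TEXT, variant_id TEXT, tissue TEXT, effect_size REAL);
    CREATE TABLE pqtl (gene_symbol TEXT, variant_id TEXT, tissue TEXT, effect_size REAL);
    -- Toy rows, invented for illustration only.
    INSERT INTO eqtl VALUES ('GENE_A', 'var1', 'blood', 0.42), ('GENE_B', 'var2', 'blood', -0.10);
    INSERT INTO pqtl VALUES ('GENE_A', 'var1', 'plasma', 0.35), ('GENE_C', 'var3', 'plasma', 0.20);
    """
)

# Keep gene/variant pairs that have both an expression and a protein effect.
JOIN_SQL = """
    SELECT e.gene_symbol, e.variant_id,
           e.effect_size AS eqtl_effect, p.effect_size AS pqtl_effect
    FROM eqtl AS e
    JOIN pqtl AS p
      ON e.gene_symbol = p.gene_symbol AND e.variant_id = p.variant_id
    ORDER BY e.gene_symbol
"""
shared = conn.execute(JOIN_SQL).fetchall()
print(shared)  # [('GENE_A', 'var1', 0.42, 0.35)]
```

The join only works because both tables use the same gene and variant identifiers, which is the part the normalization layer is meant to supply.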
Rarer tags include imageQTL, edQTL, nucQTL, tfQTL, vQTL, fpQTL — each queryable as a first-class filter alongside expression and protein.
In an iPSC multi-omic atlas (study 39986281), adding chromatin and histone QTLs to expression alone lifted GWAS-locus annotation 2.3×.
17 studies use ancestry-stratified designs — purpose-built to detect population-specific effects, not incidentally diverse. Ancestry filters resolve to sample-level metadata, with confidence exposed.
41507467 · 399 plasma pQTLs in continental-African individuals, 37 never reported in any population. Colocalized SIRPb1 with T2D risk via an ancestry-specific mechanism.
40789849 · Head-to-head stratified mapping reveals 954 AFR-specific pQTLs and a distinct set of T2D effector proteins: biology a European-only analysis misses entirely.
preprint 2024.05.07 · Maps >6,000 cis-sQTLs across Indonesian island populations spanning an ancestry gradient (~2% Denisovan introgression), a population virtually absent from existing atlases.
Reference atlases like GTEx are the standard for non-diseased adult tissues. The harder-to-assemble material — patient-tissue cohorts, case-control disease-state designs, tumor tissue, rare tissues like cartilage, CSF, and atrial appendage — is scattered across individual papers and project portals. The QTL Index organizes that material in the same model as the reference data, so it's reachable by the same queries.
A brain/CSF/plasma proteome atlas (study 34239129) found 48–77% of pQTLs don't colocalize with eQTL, sQTL, mQTL, or haQTL — protein-level genetic regulation is a distinct layer, and one only patient-tissue studies can access.
Rare tissues (cartilage, synovium, placenta, atrial appendage, dried blood spots, pancreatic islets) are queryable as first-class filters. 9 case-control studies and 9 tumor-tissue studies sit alongside the population-cohort designs.
Auditable from raw column to query result.
Clean-looking tables are easy. The harder thing is showing every transform between the raw supplement — usually a spreadsheet, sometimes a PDF table — and the SQL-queryable parquet, then re-running that chain as new papers land. Every rename, type, and schema decision is logged; each rebuild replays the chain.
Raw columns: chr, bp_hg38, allele0, allele1, beta, se, log10p, target, cis_trans
Renames: beta → effect_size; allele1 → alt_allele; log10p → p_value; bp_hg38 → position; target → marker_id
Typed schema: chromosome VARCHAR, position BIGINT, ref_allele VARCHAR, alt_allele VARCHAR, effect_size DOUBLE, se DOUBLE, p_value DOUBLE, gene_symbol VARCHAR, cis_trans VARCHAR
Example row: chr6:143503967, ref=G, alt=T, EAF=0.253, β=−1.297, log10p=7785.79, gene=FUCA1, trans
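One way to picture a logged transform chain is sketched below: every rename and type cast applied to a row is appended to a provenance log that can be replayed or audited later. The rename map is the one listed above; the `apply_with_log` helper, the log format, and the choice of Python casts are assumptions made for illustration, not Devano's pipeline. The sketch carries the log10p value over unchanged and does not attempt any conversion to a p-value.

```python
# Rename map as listed in the text; casts cover only columns present in the example.
RENAMES = {
    "beta": "effect_size",
    "allele1": "alt_allele",
    "log10p": "p_value",
    "bp_hg38": "position",
    "target": "marker_id",
}
CASTS = {"position": int, "effect_size": float, "se": float, "p_value": float}


def apply_with_log(raw_row):
    """Rename and cast one row, recording every step in a provenance log."""
    row, log = {}, []
    for column, value in raw_row.items():
        new_name = RENAMES.get(column, column)
        if new_name != column:
            log.append(("rename", column, new_name))
        cast = CASTS.get(new_name)
        if cast is not None:
            value = cast(value)
            log.append(("type", new_name, cast.__name__))
        row[new_name] = value
    return row, log


# Values taken from the example row in the text, held as strings the way a
# spreadsheet export would typically deliver them.
raw = {"bp_hg38": "143503967", "allele1": "T", "beta": "-1.297",
       "log10p": "7785.79", "target": "FUCA1", "cis_trans": "trans"}
row, log = apply_with_log(raw)
print(row)
for entry in log:
    print(entry)
```

Because the log is plain data, re-running the chain on a rebuilt table and diffing the two logs is one simple way to detect when a transform decision has changed.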
Every study, QC'd before search.
We run rigorous automated QC on every study before it reaches search. High-severity failures are held back. Minor issues stay searchable but are visibly flagged on the study, so reviewers can see exactly why before relying on the numbers. No silent failures, no hidden caveats.
97.6% of studies pass automated QC and are searchable by default.
6 studies are held back from search unless explicitly requested. 30 studies carry a visible flag and remain searchable.
Of the 249 studies, 130 (52%) have parquet you can query with SQL today; 93 link to external repositories; 23 have no extractable data.
When a study can't be loaded cleanly, the reason is a labeled flag attached to the study itself — not a footnote in a release-notes file.
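The gating rule described in this section can be sketched in a few lines. The severity labels ("high", "minor"), the status names, and both helper functions are hypothetical; only the counts (249 studies, 6 held back, 30 flagged) come from the text. The last lines check that those counts are consistent with the stated pass rate, on the assumption that "pass" means "not held back".

```python
def qc_status(issues):
    """Map a study's QC issue severities to a search status.

    `issues` is a list of severity strings, e.g. ["minor", "high"].
    """
    if "high" in issues:
        return "held_back"  # excluded from search unless explicitly requested
    if issues:
        return "flagged"    # searchable, with a visible flag on the study
    return "clean"          # searchable, no flag


def searchable_by_default(status):
    """Flagged studies stay searchable; only held-back studies are excluded."""
    return status in ("clean", "flagged")


print(qc_status([]))                  # clean
print(qc_status(["minor"]))           # flagged
print(qc_status(["minor", "high"]))   # held_back

# Counts from the text: 249 studies, 6 held back, 30 flagged.
total, held_back, flagged = 249, 6, 30
pass_rate = (total - held_back) / total
print(f"{pass_rate:.1%}")  # 97.6%
```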
A context layer models can reason over.
The same normalized model that powers human exploration is the surface an agent can reliably use. Progressive disclosure on the MCP surface means an agent can orient in a couple hundred tokens, search in fifty per result, open any study in a few hundred more, then run SQL against the actual published table — no pipeline, no API glue, no schema guessing.
One MCP. Four resolutions. Orient → search → open → query.
Two extra calls round out the surface: search_entities for gene-level lookup across studies, and find_similar_studies for narrative-embedding discovery. Both ~50 tokens per result.
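For a back-of-the-envelope feel for these figures, the helper below adds up the approximate token cost of an orient, search, and open sequence. The function name and its defaults are assumptions: 200 and 50 follow the approximate numbers quoted in the text, while 300 is an assumed stand-in for "a few hundred" and is not a measured value.

```python
def session_tokens(n_results, orient=200, per_result=50, open_study=300):
    """Rough token cost of orient -> search -> open, before any SQL result rows.

    Defaults follow the approximate figures in the text; 300 for opening a
    study is an assumed stand-in for "a few hundred".
    """
    return orient + per_result * n_results + open_study


print(session_tokens(10))  # 200 + 50*10 + 300 = 1000
```

Under these assumptions an agent reviewing ten search hits and opening one study spends on the order of a thousand tokens before it issues its first SQL query.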
6 records across 5 studies spanning pQTL + eQTL, four ancestries, two tissues — one tool call.
Seed = Eldjarn 2023 multi-ancestry plasma pQTL. Neighbors via narrative embedding, not keyword overlap.
Cosine similarity >0.87 across all three. The embedding captures conceptual proximity — agents surface replication candidates without knowing study IDs in advance.
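For readers unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than magnitude. The sketch below applies the 0.87 threshold mentioned above to toy three-dimensional vectors; the vectors and candidate names are invented, and real narrative embeddings would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for illustration.
seed = [1.0, 2.0, 3.0]
candidates = {"near": [1.0, 2.0, 2.5], "far": [3.0, -1.0, 0.0]}

# Keep candidates whose similarity to the seed clears the threshold.
neighbors = [name for name, vec in candidates.items()
             if cosine_similarity(seed, vec) > 0.87]
print(neighbors)  # ['near']
```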
A context layer across the QTL ecosystem.
Excellent scientific resources exist for QTL evidence. We use them ourselves. The QTL Index is not a replacement for any of these — it's the continuously normalized context layer above them, pulling paper-native evidence, downstream statistical tables, ancestry- and disease-context studies, and provenance into a single queryable model.
The reference atlas for non-diseased adult tissues — the canonical baseline for cross-tissue eQTL/sQTL effects.
Uniformly reprocessed eQTL studies across consortium cohorts — the standard for comparable expression QTLs.
A broad public xQTL browser spanning multiple molecular phenotypes and tissue contexts.
A post-GWAS target prioritization platform that fuses associations with selected functional evidence.
The standard catalog for trait associations and sentinel hits, with summary statistics for many studies.
Continuous ingestion of paper-native evidence (including supplements), cross-study normalization, downstream statistical tables (coloc/MR/fine-mapping/conditional/trans), provenance back to raw columns, and machine-usable query — all in a single model.
None of this replaces the role of consortia or curated portals. It complements them — turning the heterogeneous, ever-growing literature around them into a structure computational and agent workflows can rely on.
Headline counts come from corpus_overview against the QTL Index build, snapshot 2026-04-30. Per-study examples are sourced via search_studies, study_detail, study_narrative, and query_table SQL against extracted parquet. Numbers refresh after each pipeline rebuild, typically monthly.
Data presented on this page is provided by 🧬 Devano (devano.ai). Underlying studies are credited to their original authors and journals.