48,779 pathway diagram images → 1.4M edges, 1.4M nodes, 3 KEGG assignment signals
One row per image. KEGG assignments from three independent signals plus Gemini's own label and related pathways.
| pmc_id | filename | figure_number | gemini_label | gemini_related_pathways | n_genes | jaccard_kegg_id | jaccard_kegg_name | jaccard_score | caption_kegg_id | caption_kegg_name | caption_confidence | caption_reasoning | caption_jaccard | metadata_kegg_id | metadata_kegg_name | metadata_confidence | metadata_reasoning | metadata_jaccard | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PMC5403593 | PMC5403593__F2.jpg | F2 | Cell cycle | MAPK/ERK signaling\|Hippo signaling\|PI3K-AKT signaling | 2 | hsa04392 | Hippo signaling pathway - multiple species | 0.03333 | | | | | | hsa04014 | Ras signaling pathway | 0.6 | The paper's keywords explicitly list KRAS, RAS… | 0.00000 | 2017 |
| PMC5795421 | PMC5795421__F2.jpg | F2 | MicroRNA biogenesis | RNA interference\|Gene silencing\|Post-transcriptional regulation | 5 | hsa03013 | Nucleocytoplasmic transport | 0.00893 | | | | | | hsa03250 | Viral life cycle - HIV-1 | 0.6 | The paper discusses therapies to achieve an HI… | 0.00000 | 2017 |
| PMC6599049 | PMC6599049__F6.jpg | F6 | cGAS-STING signaling | Antiviral innate immunity\|Mitochondrial signaling | 4 | hsa04623 | Cytosolic DNA-sensing pathway | 0.04819 | hsa04623 | Cytosolic DNA-sensing pathway | 0.9 | The caption explicitly describes the mechanism… | 0.04819 | hsa04623 | Cytosolic DNA-sensing pathway | 0.9 | The paper title and abstract explicitly attrib… | 0.04819 | 2019 |
| PMC4509066 | PMC4509066__F2.jpg | F2 | Apoptosis | Calcium signaling\|MAPK signaling\|G protein signaling | 11 | hsa04215 | Apoptosis - multiple species | 0.20000 | | | | | | hsa04210 | Apoptosis | 0.8 | The abstract explicitly states that the 'most… | 0.05755 | 2015 |
| PMC3309942 | PMC3309942__F1.jpg | F1 | RANK signaling | NF-kB signaling\|Autophagy\|MAPK signaling | 6 | hsa04064 | NF-kappa B signaling pathway | 0.02778 | hsa04064 | NF-kappa B signaling pathway | 1.0 | The caption explicitly describes the role of p… | 0.02778 | hsa04064 | NF-kappa B signaling pathway | 0.8 | The abstract details the role of p62 as a scaf… | 0.02778 | 2012 |
One row per directed interaction extracted from a figure.
| pmc_id | filename | figure_number | source | target | interaction | uncertain |
|---|---|---|---|---|---|---|
| PMC8096095 | PMC8096095__F5.jpg | F5 | CASP3 | PARP1 | inhibition | False |
| PMC8096095 | PMC8096095__F5.jpg | F5 | CASP3 | Apoptosis | activation | False |
| PMC8096095 | PMC8096095__F5.jpg | F5 | DNA Damage | ATM | activation | False |
| PMC8096095 | PMC8096095__F5.jpg | F5 | ATM | TP53 | activation | False |
| PMC8096095 | PMC8096095__F5.jpg | F5 | TP53 | Apoptosis | activation | True |
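
Because each row is a directed interaction, the edge table maps naturally onto an adjacency list. The following is a minimal sketch, assuming the rows have already been loaded as dictionaries keyed by the column names above and that `uncertain` has been parsed into a boolean; the `build_adjacency` helper and its `include_uncertain` flag are hypothetical names introduced for illustration, not part of the dataset.

```python
from collections import defaultdict

# The five sample rows shown above, reduced to the columns used here.
# Assumption: `uncertain` is already a bool rather than the string "True"/"False".
edges = [
    {"source": "CASP3", "target": "PARP1", "interaction": "inhibition", "uncertain": False},
    {"source": "CASP3", "target": "Apoptosis", "interaction": "activation", "uncertain": False},
    {"source": "DNA Damage", "target": "ATM", "interaction": "activation", "uncertain": False},
    {"source": "ATM", "target": "TP53", "interaction": "activation", "uncertain": False},
    {"source": "TP53", "target": "Apoptosis", "interaction": "activation", "uncertain": True},
]


def build_adjacency(rows, include_uncertain=True):
    """Map each source label to a list of (target, interaction) pairs."""
    adjacency = defaultdict(list)
    for row in rows:
        # Optionally drop edges flagged as uncertain.
        if row["uncertain"] and not include_uncertain:
            continue
        adjacency[row["source"]].append((row["target"], row["interaction"]))
    return dict(adjacency)


graph = build_adjacency(edges, include_uncertain=False)
```

With `include_uncertain=False`, the TP53 → Apoptosis edge is dropped, so `TP53` no longer appears as a source in `graph`.
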
One row per node extracted from a figure. Genes mapped to Entrez IDs where possible.
| pmc_id | filename | figure_number | label | node_type | entrez_id | is_family_representative | notes |
|---|---|---|---|---|---|---|---|
| PMC5403593 | PMC5403593__F2.jpg | F2 | YAP1 | gene | 10413 | False | |
| PMC5403593 | PMC5403593__F2.jpg | F2 | ERK | gene | | True | MAPK family |
| PMC5403593 | PMC5403593__F2.jpg | F2 | PI3K | complex | | True | |
| PMC5403593 | PMC5403593__F2.jpg | F2 | beta-catenin | gene | 1499 | False | CTNNB1 |
| PMC5403593 | PMC5403593__F2.jpg | F2 | G0 | phenotype | | False | Quiescence |

| | Nodes/Image | Edges/Image |
|---|---|---|
| mean | 27.9 | 28.3 |
| std | 16.1 | 16.0 |
| min | 0 | 0 |
| 25% | 16 | 16 |
| 50% | 24 | 25 |
| 75% | 35 | 37 |
| max | 135 | 194 |
Three independent KEGG pathway assignment signals:

- Jaccard (best-match): gene-set overlap between image nodes and every KEGG pathway
- Caption KEGG: LLM assignment from the figure caption
- Metadata KEGG: LLM assignment from the paper title, abstract, and MeSH terms
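
The best-match signal can be illustrated with a short sketch. It assumes the standard Jaccard index, |A ∩ B| / |A ∪ B|, computed between the set of genes in a figure and the gene set of each KEGG pathway, keeping the highest-scoring pathway. The function names, pathway identifiers, and gene IDs below are made up for illustration; the source does not specify how ties or empty gene sets are handled.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(image_genes, pathways):
    """Return (pathway_id, score) for the pathway with the highest Jaccard score.

    Returns (None, 0.0) when no pathway shares a gene with the image.
    """
    best_id, best_score = None, 0.0
    for pathway_id, genes in pathways.items():
        score = jaccard(image_genes, genes)
        if score > best_score:
            best_id, best_score = pathway_id, score
    return best_id, best_score


# Toy data: identifiers and gene IDs are hypothetical.
toy_pathways = {
    "pathway_A": {1, 2, 3, 4},
    "pathway_B": {3, 4, 5, 6, 7, 8},
}
image_genes = {3, 4, 9}

best_id, best_score = best_match(image_genes, toy_pathways)
```

Both toy pathways share two genes with the image, but the smaller one wins because the union in the denominator grows with pathway size.
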
| Signal | Images | Coverage |
|---|---|---|
| Jaccard (best-match) | 47,526 | 97.4% |
| Caption KEGG | 32,299 | 66.2% |
| Caption Jaccard | 29,662 | 60.8% |
| Metadata KEGG | 46,577 | 95.5% |
| Metadata Jaccard | 44,200 | 90.6% |
| Caption OR Metadata | 47,719 | 97.8% |
| Any signal | 48,747 | 99.9% |
Pairwise agreement on KEGG pathway assignment (n = 31,157 images with both caption and metadata):
| Pair | Agree | Rate |
|---|---|---|
| Caption = Metadata | 17,666 | 56.7% |
| Caption = Best Jaccard | 9,799 | 31.5% |
| Metadata = Best Jaccard | 7,584 | 24.3% |
| All 3 agree | 6,687 | 21.5% |
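
As a rough illustration of how such agreement rates could be derived from the per-image table, the sketch below restricts to rows with both a caption and a metadata assignment and then counts matching KEGG IDs. The column names follow the image table; the `pairwise_agreement` helper and the toy rows are hypothetical, and exact ID equality as the definition of agreement is an assumption.

```python
def pairwise_agreement(rows):
    """Return (n, {pair: (agree_count, rate)}) over rows with caption and metadata IDs."""
    base = [r for r in rows if r["caption_kegg_id"] and r["metadata_kegg_id"]]
    n = len(base)
    counts = {
        "caption=metadata": sum(
            r["caption_kegg_id"] == r["metadata_kegg_id"] for r in base
        ),
        "caption=jaccard": sum(
            r["caption_kegg_id"] == r["jaccard_kegg_id"] for r in base
        ),
        "metadata=jaccard": sum(
            r["metadata_kegg_id"] == r["jaccard_kegg_id"] for r in base
        ),
        "all_three": sum(
            r["caption_kegg_id"] == r["metadata_kegg_id"] == r["jaccard_kegg_id"]
            for r in base
        ),
    }
    return n, {pair: (c, c / n if n else 0.0) for pair, c in counts.items()}


# Toy rows with placeholder IDs; the last row lacks a caption assignment
# and is therefore excluded from the denominator.
toy_rows = [
    {"jaccard_kegg_id": "A", "caption_kegg_id": "A", "metadata_kegg_id": "A"},
    {"jaccard_kegg_id": "B", "caption_kegg_id": "A", "metadata_kegg_id": "A"},
    {"jaccard_kegg_id": "C", "caption_kegg_id": "C", "metadata_kegg_id": "D"},
    {"jaccard_kegg_id": "A", "caption_kegg_id": None, "metadata_kegg_id": "A"},
]

n, agreement = pairwise_agreement(toy_rows)
```
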
Caption–Metadata agreement stratified by confidence. At 1.0/1.0, agreement reaches 88.9%.
| Caption Conf | Metadata Conf | n | Agree | Rate |
|---|---|---|---|---|
| 1.0 | 1.0 | 5,390 | 4,794 | 88.9% |
| 1.0 | 0.8 | 6,415 | 4,540 | 70.8% |
| 1.0 | 0.6 | 2,459 | 815 | 33.1% |
| 0.8 | 1.0 | 822 | 414 | 50.4% |
| 0.8 | 0.8 | 4,001 | 2,390 | 59.7% |
| 0.8 | 0.6 | 2,826 | 1,052 | 37.2% |
| 0.6 | 1.0 | 158 | 55 | 34.8% |
| 0.6 | 0.8 | 592 | 241 | 40.7% |
| 0.6 | 0.6 | 825 | 351 | 42.5% |

| | Best-match | Caption | Metadata |
|---|---|---|---|
| count | 47,684 | 29,662 | 44,200 |
| mean | 0.093 | 0.070 | 0.053 |
| std | 0.084 | 0.088 | 0.071 |
| 25% | 0.042 | 0.019 | 0.011 |
| 50% | 0.068 | 0.043 | 0.031 |
| 75% | 0.114 | 0.084 | 0.067 |
| max | 0.957 | 0.957 | 0.938 |
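
Best-match is highest by construction, since it takes the maximum Jaccard score over all KEGG pathways. Caption-assigned pathways score higher than metadata-assigned ones (mean 0.070 vs. 0.053; median 0.043 vs. 0.031), suggesting captions are more figure-specific.

The PI3K/AKT/mTOR axis dominates. AKT and Akt are counted separately because Gemini labels are case-sensitive.
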
| # | Gene | Appearances | # | Gene | Appearances |
|---|---|---|---|---|---|
| 1 | PI3K | 6,797 | 14 | NF-kB | 2,771 |
| 2 | AKT | 5,980 | 15 | ERK1 | 2,663 |
| 3 | mTOR | 4,991 | 16 | ERK2 | 2,655 |
| 4 | Akt | 4,290 | 17 | TLR4 | 2,636 |
| 5 | ERK | 4,277 | 18 | TRAF6 | 2,360 |
| 6 | PTEN | 3,627 | 19 | MAPK | 2,310 |
| 7 | JNK | 3,601 | 20 | PKC | 2,289 |
| 8 | IL-6 | 3,277 | 21 | MyD88 | 2,282 |
| 9 | STAT3 | 3,134 | 22 | PDK1 | 2,199 |
| 10 | p53 | 3,115 | 23 | beta-catenin | 2,164 |
| 11 | Ras | 3,056 | 24 | Wnt | 2,059 |
| 12 | EGFR | 2,939 | 25 | APC | 2,000 |
| 13 | MEK | 2,842 | | | |
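
Since labels are case-sensitive, users who want a single count per gene may wish to merge case variants first. The sketch below upper-cases labels before summing, using a few counts from the table above; `merge_case_variants` is a hypothetical helper, and upper-casing is only a crude normalization that does not reconcile aliases such as p53/TP53 or ERK/ERK1/ERK2.

```python
from collections import Counter

# A few label counts taken from the table above.
raw_counts = Counter({"PI3K": 6797, "AKT": 5980, "mTOR": 4991, "Akt": 4290})


def merge_case_variants(counts):
    """Sum counts for labels that differ only by letter case."""
    merged = Counter()
    for label, n in counts.items():
        # Assumption: upper-casing is an acceptable canonical form.
        merged[label.upper()] += n
    return merged


merged = merge_case_variants(raw_counts)
top_label, top_count = merged.most_common(1)[0]
```

Under this normalization, AKT and Akt combine to 10,270 appearances, which would place AKT ahead of PI3K.
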
Sorted by total count across all three signals. Jaccard favors smaller, specific pathways (ErbB, Prolactin) where gene overlap is high; LLM signals favor broader well-known pathways (PI3K-Akt, MAPK, JAK-STAT).
| Pathway | Caption | Metadata | Jaccard |
|---|---|---|---|
| Toll-like receptor signaling | 1,497 | 1,807 | 2,182 |
| Wnt signaling | 1,884 | 2,303 | 1,074 |
| PI3K-Akt signaling | 1,777 | 2,224 | 44 |
| TGF-beta signaling | 1,270 | 1,699 | 1,019 |
| NF-kappa B signaling | 1,035 | 1,437 | 1,041 |
| ErbB signaling | 453 | 699 | 2,160 |
| MAPK signaling | 1,233 | 1,668 | 97 |
| Apoptosis | 960 | 1,548 | 254 |
| p53 signaling | 473 | 669 | 1,474 |
| JAK-STAT signaling | 1,064 | 1,243 | 300 |
| Apoptosis - multiple species | 4 | 2 | 2,475 |
| mTOR signaling | 762 | 1,162 | 546 |
| Hedgehog signaling | 426 | 620 | 927 |
| Cell cycle | 620 | 920 | 391 |
| VEGF signaling | 214 | 383 | 1,289 |
| Cytosolic DNA-sensing | 380 | 409 | 1,001 |
| RIG-I-like receptor signaling | 414 | 469 | 842 |
| HIF-1 signaling | 390 | 533 | 800 |
| Protein processing in ER | 569 | 735 | 380 |
| Adipocytokine signaling | 232 | 406 | 1,001 |
| Complement & coagulation cascades | 359 | 611 | 608 |
| Autophagy - animal | 548 | 901 | 65 |
| Hippo signaling | 588 | 707 | 219 |
| Insulin signaling | 468 | 758 | 284 |
| NOD-like receptor signaling | 592 | 684 | 68 |
| AMPK signaling | 372 | 417 | 549 |
| Prolactin signaling | 28 | 41 | 1,241 |
| Notch signaling | 361 | 449 | 487 |
| IL-17 signaling | 110 | 188 | 987 |
| Ferroptosis | 338 | 384 | 547 |
| Th17 cell differentiation | 91 | 156 | 980 |
| T cell receptor signaling | 359 | 582 | 253 |
| Pluripotency signaling | 269 | 578 | 339 |
| Glycolysis / Gluconeogenesis | 354 | 559 | 211 |
| Autophagy - other | 7 | 7 | 1,095 |
| TNF signaling | 258 | 319 | 426 |
| Adherens junction | 65 | 93 | 838 |
| PPAR signaling | 262 | 460 | 223 |
| Fc epsilon RI signaling | 40 | 67 | 794 |
| Regulation of actin cytoskeleton | 306 | 513 | 45 |
249 KEGG pathways have at least one figure assigned by any signal:
| Signal | Pathways with ≥1 figure |
|---|---|
| Caption | 229 |
| Metadata | 240 |
| Jaccard | 237 |