Medical Pathway Analyzer

Automated extraction of gene/protein interaction networks from 49k pathway diagrams using Gemini 3 Pro

48,779 Images
1.33M Nodes
1.38M Edges
8,821 Pathways
42,402 PMC Papers
99.6% Success

Methodology

Prompt (v2)
You are an expert bioinformatician specializing in pathway analysis.
Extract gene/protein interaction networks from signaling pathway diagrams.

CRITICAL RULES:
1. Map genes to Human Entrez Gene IDs when possible
2. Protein families (Rho, GAP, MAPK) get entrez_id: null, is_family_representative: true
3. Phenotypes/outcomes (Cell scattering, APOPTOSIS, G1 Phase) get node_type: "phenotype"
4. Identify small molecules, drugs, inhibitors, or metabolites (e.g., 'PD98059', 'ATP', 'Calcium', 'Wortmannin') as node_type: "chemical". Do not exclude them. Do not assign Entrez IDs to chemicals.
5. Ensure biological processes (e.g., 'Cell Cycle', 'G1 Phase', 'Apoptosis') are strictly labeled as 'phenotype' to distinguish them from molecular entities.
6. Dashed lines or "?" marks → uncertain: true
7. Detect if the pathway diagram contains any question marks (?) and set has_question_marks accordingly
8. Identify the overall pathway/biological process shown in the diagram and provide a brief description
9. Identify the PRIMARY canonical pathway shown. Use standard names like: "PI3K-AKT signaling", "MAPK/ERK signaling", "Wnt/beta-catenin signaling", "NF-kB signaling", "TGF-beta signaling", "JAK-STAT signaling", "Notch signaling", "Hedgehog signaling", "p53 signaling", "Apoptosis", "Autophagy", "Cell cycle", "mTOR signaling", "AMPK signaling", "Hippo signaling", "TNF signaling", "Toll-like receptor signaling", "Insulin signaling", "VEGF signaling", "Calcium signaling", etc. If the pathway doesn't match a canonical name, provide the most descriptive name possible.
10. Output ONLY valid JSON with no extra text

Return this exact JSON structure:
{
  "figure_metadata": {
    "has_question_marks": boolean,
    "pathway_description": "string (1-2 sentence description)",
    "canonical_pathway": "string (primary pathway name, e.g. 'PI3K-AKT signaling')",
    "related_pathways": ["string (other pathways shown or connected)"]
  },
  "nodes": [
    {
      "label": "string",
      "node_type": "gene|phenotype|complex|chemical|other",
      "entrez_id": integer|null,
      "is_family_representative": boolean,
      "notes": "string"
    }
  ],
  "edges": [
    {
      "source": "string",
      "target": "string",
      "interaction": "activation|inhibition|binding|indirect",
      "uncertain": boolean
    }
  ]
}

Example JSON Output
{
  "figure_metadata": {
    "has_question_marks": false,
    "pathway_description": "A schematic of the cell cycle highlighting the G1 phase sub-stages (G1-pm and G1-ps). It illustrates a signaling network where ERK and YAP1 in the post-mitotic G1 phase activate PI3K and Beta-catenin in the pre-synthesis G1 phase to drive cell cycle progression.",
    "canonical_pathway": "Cell cycle",
    "related_pathways": ["MAPK/ERK signaling", "Hippo signaling", "PI3K-AKT signaling", "Wnt/beta-catenin signaling"]
  },
  "nodes": [
    {"label": "YAP1", "node_type": "gene", "entrez_id": 10413, "is_family_representative": false, "notes": ""},
    {"label": "ERK", "node_type": "gene", "entrez_id": null, "is_family_representative": true, "notes": "MAPK family"},
    {"label": "PI3K", "node_type": "complex", "entrez_id": null, "is_family_representative": true, "notes": ""},
    {"label": "beta-catenin", "node_type": "gene", "entrez_id": 1499, "is_family_representative": false, "notes": "CTNNB1"},
    {"label": "G0", "node_type": "phenotype", "entrez_id": null, "is_family_representative": false, "notes": "Quiescence"},
    {"label": "G1", "node_type": "phenotype", "entrez_id": null, "is_family_representative": false, "notes": "Gap 1 phase"}
  ],
  "edges": [
    {"source": "ERK", "target": "PI3K", "interaction": "activation", "uncertain": false},
    {"source": "YAP1", "target": "beta-catenin", "interaction": "activation", "uncertain": false},
    {"source": "ERK", "target": "beta-catenin", "interaction": "activation", "uncertain": true},
    {"source": "YAP1", "target": "PI3K", "interaction": "activation", "uncertain": true}
  ]
}
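
The structure above can be checked mechanically before a record is accepted. The following is a minimal sketch, not the project's actual validator: the function name `validate_record` is hypothetical, the checks mirror prompt rules 2 and 4 and the enumerations in the JSON structure, and the requirement that every edge endpoint matches a node label is an assumption the prompt does not state.

```python
NODE_TYPES = {"gene", "phenotype", "complex", "chemical", "other"}
INTERACTIONS = {"activation", "inhibition", "binding", "indirect"}


def validate_record(rec):
    """Return a list of problems found in one extraction; empty means it passed."""
    problems = []
    meta = rec.get("figure_metadata", {})
    if not isinstance(meta.get("has_question_marks"), bool):
        problems.append("figure_metadata.has_question_marks must be a boolean")

    labels = set()
    for i, node in enumerate(rec.get("nodes", [])):
        labels.add(node.get("label"))
        node_type = node.get("node_type")
        if node_type not in NODE_TYPES:
            problems.append(f"nodes[{i}]: unknown node_type {node_type!r}")
        eid = node.get("entrez_id")
        # bool is a subclass of int in Python, so reject it explicitly.
        if eid is not None and (isinstance(eid, bool) or not isinstance(eid, int)):
            problems.append(f"nodes[{i}]: entrez_id must be an integer or null")
        # Prompt rule 4: chemicals do not get Entrez IDs.
        if node_type == "chemical" and eid is not None:
            problems.append(f"nodes[{i}]: chemical with an entrez_id")
        # Prompt rule 2: family representatives have entrez_id null.
        if node.get("is_family_representative") is True and eid is not None:
            problems.append(f"nodes[{i}]: family representative with an entrez_id")

    for i, edge in enumerate(rec.get("edges", [])):
        if edge.get("interaction") not in INTERACTIONS:
            problems.append(f"edges[{i}]: unknown interaction {edge.get('interaction')!r}")
        if not isinstance(edge.get("uncertain"), bool):
            problems.append(f"edges[{i}]: uncertain must be a boolean")
        # Assumption: edge endpoints should refer to labels in the node list.
        for end in ("source", "target"):
            if edge.get(end) not in labels:
                problems.append(f"edges[{i}]: {end} {edge.get(end)!r} is not a node label")
    return problems


# A trimmed copy of the example output above.
example = {
    "figure_metadata": {"has_question_marks": False, "canonical_pathway": "Cell cycle"},
    "nodes": [
        {"label": "YAP1", "node_type": "gene", "entrez_id": 10413,
         "is_family_representative": False, "notes": ""},
        {"label": "PI3K", "node_type": "complex", "entrez_id": None,
         "is_family_representative": True, "notes": ""},
    ],
    "edges": [
        {"source": "YAP1", "target": "PI3K", "interaction": "activation", "uncertain": True},
    ],
}
print(validate_record(example))
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue for an image in a single pass.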

Processing Pipeline

Source: 49,014 pathway diagram images extracted from PubMed Central papers, stored in Dropbox

Model: Google Gemini 3 Pro Preview via Vertex AI

Throughput: ~120 images/minute with 200 concurrent requests using native async API

Resilience: Auto-resume on failure, 5x retry with exponential backoff, 120s timeout per request

Output: JSONL with nodes, edges, and metadata per image; exported to CSV (154 MB total)
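
The throughput and resilience settings above (200 concurrent requests, 5x retry with exponential backoff, 120s timeout) can be sketched with the standard `asyncio` library. This is an illustrative sketch under stated assumptions, not the project's code: `worker` is a hypothetical coroutine standing in for the model call, "5x retry" is read as five attempts in total, and the jitter term is an addition not mentioned in the source.

```python
import asyncio
import random


async def with_retry(make_call, *, attempts=5, base_delay=1.0, timeout=120.0):
    """Await make_call() up to `attempts` times with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            # Each attempt gets its own timeout budget.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base, 2*base, 4*base, ... plus a little jitter.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


async def process_all(items, worker, *, concurrency=200, **retry_kwargs):
    """Run worker(item) for every item with at most `concurrency` in flight.

    Returns (item, result, error) tuples in input order so that failed items
    can be written out and picked up again on a later run.
    """
    sem = asyncio.Semaphore(concurrency)

    async def one(item):
        async with sem:
            try:
                result = await with_retry(lambda: worker(item), **retry_kwargs)
                return item, result, None
            except Exception as exc:
                return item, None, exc

    return await asyncio.gather(*(one(item) for item in items))


if __name__ == "__main__":
    async def fake_worker(image_id):  # hypothetical stand-in for the model request
        await asyncio.sleep(0.01)
        return {"image": image_id, "nodes": [], "edges": []}

    results = asyncio.run(process_all(["img_1", "img_2"], fake_worker, concurrency=2))
    print(results)
```

Collecting failures as values instead of letting one exception cancel the whole batch is what makes an auto-resume step straightforward: a rerun only needs the items whose error slot is not `None`.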

Distributions

Node Types

gene 68.0%
chemical 13.7%
phenotype 11.1%
complex 4.2%
other 3.0%

Edge Types

activation 66.7%
inhibition 14.9%
binding 13.0%
indirect 5.2%

7.8% of edges marked uncertain
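
Percentages like these can be tallied directly from the JSONL output. The sketch below assumes each JSONL line holds one record in the JSON structure shown in the prompt; the function name `type_distributions` is hypothetical, and the sample lines in the usage example are toy data, not project results.

```python
import json
from collections import Counter


def type_distributions(jsonl_lines):
    """Tally node types, edge types, and the share of uncertain edges."""
    node_types, edge_types = Counter(), Counter()
    uncertain = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        rec = json.loads(line)
        node_types.update(n["node_type"] for n in rec.get("nodes", []))
        for edge in rec.get("edges", []):
            edge_types[edge["interaction"]] += 1
            uncertain += bool(edge.get("uncertain"))

    def shares(counter):
        total = sum(counter.values())
        if not total:
            return {}
        # most_common() orders the result from largest to smallest share.
        return {k: round(100 * v / total, 1) for k, v in counter.most_common()}

    n_edges = sum(edge_types.values())
    return {
        "node_types": shares(node_types),
        "edge_types": shares(edge_types),
        "uncertain_pct": round(100 * uncertain / n_edges, 1) if n_edges else 0.0,
    }


# Toy records for illustration only.
sample_lines = [
    json.dumps({
        "nodes": [{"node_type": "gene"}, {"node_type": "gene"}, {"node_type": "chemical"}],
        "edges": [{"interaction": "activation", "uncertain": False},
                  {"interaction": "inhibition", "uncertain": True}],
    }),
    json.dumps({
        "nodes": [{"node_type": "phenotype"}],
        "edges": [{"interaction": "activation", "uncertain": False},
                  {"interaction": "binding", "uncertain": False}],
    }),
]
print(type_distributions(sample_lines))
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the list, which avoids loading the whole output into memory.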

Top 20 Canonical Pathways
1,783 PI3K-AKT signaling
1,673 Wnt/beta-catenin
1,624 Toll-like receptor
1,355 TGF-beta signaling
1,197 mTOR signaling
1,188 JAK-STAT signaling
971 Apoptosis
955 NF-kB signaling
638 MAPK/ERK signaling
611 Hippo signaling
588 Autophagy
575 Hedgehog signaling
542 Unfolded Protein Response
497 Insulin signaling
478 Cell cycle
452 p53 signaling
406 PI3K-AKT-mTOR
397 Notch signaling
309 Central Carbon Metabolism
300 cGAS-STING signaling

Audit: Random Samples

5 randomly selected images from the 48,779 processed. Compare extracted networks against original diagrams.

PMC7823885 — Figure F3

Liver metabolic reprogramming and inflammatory signaling

The diagram illustrates metabolic reprogramming in liver cells, characterized by alterations in lipid and cholesterol metabolism, fatty acid oxidation, and glycolysis, coupled with inflammatory signaling pathways mediated by IL-1, MAPK/ERK, and JNK.

79 nodes 67 edges
Nodes (79)
Label | Type | Entrez | Notes
MCT1/4 | complex | | Transporter complex
Acetate | chemical | |
Acss2 | gene | 55902 |
Me1 | gene | 4199 |
Acetyl-CoA | chemical | |
Acaca | gene | 31 |
Fasn | gene | 2194 |
Fdps | gene | 2224 |
Nsdhl | gene | 50814 |
Cholesterol | chemical | |
Il1a/b | gene | 3552 | IL1A and IL1B
ERK | gene | 5595 | MAPK1/3
JNK | gene | 5599 | MAPK8
Srebp1 | gene | 6720 | SREBF1
... + 65 more
Edges (67)
Source | Target | Type | Uncertain
MCT1/4 | Acetate | activation |
Acetate | Acss2 | activation |
Acss2 | Acetyl-CoA | activation |
Il1a/b | Il1r | binding |
Il1r | ERK | activation |
ERK | SRF | activation |
JNK | Jun | activation |
... + 60 more

PMC7105607 — Figure F2

NF-kB signaling

The diagram illustrates the cell-type specific roles of NF-kB signaling pathway components (NF-kB1, IKKbeta, c-Rel, RelB) in CNS cells and immune cells, regulating inflammation, cell survival, and T cell differentiation.

19 nodes 21 edges
Nodes (19)
Label | Type | Entrez | Notes
NF-kB1 | gene | 4790 | Astrocytes & Th2
IKKbeta | gene | 3551 | Microglia, Neurons
c-Rel | gene | 5966 | T cell differentiation
RelB | gene | 5971 | Oligodendrocytes
Pro-inflammatory cytokines | phenotype | |
Th0 cells | other | | Precursor T cells
Th1 cells | phenotype | |
Th17 cells | phenotype | |
Treg cells | phenotype | |
Th2 cells | phenotype | |
... + 9 more
Edges (21)
Source | Target | Type | Uncertain
NF-kB1 | Pro-inflammatory cytokines | activation |
IKKbeta | Pro-inflammatory cytokines | activation |
c-Rel | Th17 cells | activation |
c-Rel | Treg cells | activation |
RelB | Th1 cells | activation |
NF-kB1 | Th2 cells | activation |
Th0 cells | Th1 cells | activation |
B cells | T cells | indirect | ?
... + 13 more

PMC4580081 — Figure F2

JAK-STAT signaling

Chaperone-mediated life cycle of STAT3 and JAK kinases. Details the folding of STAT3 involving NAC, prefoldin, TRiC, and HSP90/ERp57, its activation by JAKs, and translocation to nucleus or mitochondria.

17 nodes 19 edges
Nodes (17)
Label | Type | Entrez | Notes
IL-6 | gene | 3569 |
JAK1 | gene | 3716 |
JAK2 | gene | 3717 |
STAT3 | gene | 6774 |
HSP90 | gene | | Chaperone family
HOP | gene | 10963 | STIP1
ERp57 | gene | 2923 | PDIA3
CDC37 | gene | 11140 |
TRiC | complex | | Chaperonin
NAC | complex | |
prefoldin | complex | |
TOM | complex | | Outer membrane
TIM | complex | | Inner membrane
Proteasome | complex | |
HSP70 | gene | | Chaperone family
HSP40 | gene | | Chaperone family
Mitochondria | other | | Organelle
Edges (19)
Source | Target | Type | Uncertain
IL-6 | JAK1 | activation |
IL-6 | JAK2 | activation |
JAK1 | STAT3 | activation |
JAK2 | STAT3 | activation |
NAC | STAT3 | activation | ?
prefoldin | STAT3 | activation | ?
TRiC | STAT3 | activation |
HSP90 | STAT3 | binding |
ERp57 | STAT3 | binding |
STAT3 | TOM | binding | ?
STAT3 | TIM | binding | ?
CDC37 | JAK1 | binding |
HSP90 | JAK1 | binding |
Proteasome | JAK1 | inhibition |
... + 5 more

PMC7155770 — Figure F4

Necroptosis signaling

TNF superfamily death receptor signaling pathways (TNFR1, CD95, TRAIL-R) leading to cell survival, apoptosis, or necroptosis. Details the formation of intracellular complexes (Complex I, IIa, IIb) and key regulators.

39 nodes 25 edges
Nodes (39)
Label | Type | Entrez | Notes
TNFalpha | gene | 7124 |
TNFR1 | gene | 7132 |
CD95L | gene | 356 | FASLG
TRAIL | gene | 8743 | TNFSF10
CD95 | gene | 355 | FAS
FADD | gene | 8772 |
Caspase-8 | gene | 841 |
RIPK1 | gene | 8737 |
RIPK3 | gene | 11035 |
MLKL | gene | 197259 |
Apoptosis | phenotype | |
Necroptosis | phenotype | |
Survival | phenotype | |
... + 26 more
Edges (25)
Source | Target | Type | Uncertain
TNFalpha | TNFR1 | binding |
CD95L | CD95 | binding |
TRAIL | TRAIL R1/R2 | binding |
FADD | Caspase-8 | binding |
Caspase-8 | Caspase-3/-7 | activation |
Caspase-3/-7 | Apoptosis | activation |
MLKL | Necroptosis | activation |
p65 | Survival | activation |
... + 17 more

PMC5867880 — Figure F3

Regulation of Cell Migration

CD99 signaling in cell migration. In CD99 wt cells, CD99 upregulates Caveolin-1, which inhibits Src, resulting in decreased ROCK2 and ARP2/3 levels and suppressed cell migration.

10 nodes 12 edges
Nodes (10)
Label | Type | Entrez | Notes
CD99 | gene | 4267 | Membrane
Cav-1 | gene | 857 | Caveolin-1
Src | gene | 6714 |
ARP2/3 | complex | | Actin complex
ROCK2 | gene | 9475 |
pJNK | gene | | Phospho-JNK
ERK1/2 | gene | | MAPK3/1
AP1 | complex | | TF complex
MMP9 | gene | 4318 |
Cell migration | phenotype | |
Edges (12)
Source | Target | Type | Uncertain
CD99 | Cav-1 | activation |
Cav-1 | Src | inhibition |
Src | ROCK2 | activation |
CD99 | ARP2/3 | inhibition |
ARP2/3 | Cell migration | activation |
ROCK2 | Cell migration | activation |
Src | pJNK | activation |
Src | ERK1/2 | activation |
pJNK | AP1 | activation |
ERK1/2 | AP1 | activation |
AP1 | MMP9 | activation |
MMP9 | Cell migration | activation |

Audit: Same Pathway Comparison

5 images from different papers, all labeled as Autophagy. Overlap among their extracted networks indicates biological consensus: different researchers independently agree on the core pathway components.

10 genes are shared by all 5 papers.

Similarities & Differences

Consensus (Agreement)

  • 10 core genes in all 5 papers
  • ~50% average overlap (higher than Notch's 38%)
  • ULK1 complex: ULK1, ATG13, FIP200, ATG101
  • PI3K III complex: VPS34, VPS15, Beclin-1
  • ATG conjugation: ATG3, ATG5, ATG7, ATG10, ATG12

Variation (Context-dependent)

  • Regulatory focus: Some emphasize mTOR, others AMPK
  • Phagophore → Autophagosome: "activation" vs "indirect"
  • Unique genes: WIPI2 (PMC8888908), Bcl-2/Rab7 (PMC7272661)
  • Naming variants: ATG16L vs ATG16L1 vs ATG16-1
Key Insight: Autophagy shows ~50% average gene overlap, the highest consensus of any pathway we've analyzed. This reflects its status as a highly conserved, well-characterized cellular process.
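
The page does not state how the overlap figure is computed. One plausible reading is the mean pairwise Jaccard similarity between the gene sets of the papers, alongside the strict intersection for the "shared by all" list. The sketch below illustrates that reading only; the alias table (mapping the ATG16L naming variants to a single symbol), the function names, and the paper IDs and gene sets in the usage example are hypothetical toy values, not the audited data.

```python
from itertools import combinations

# Hypothetical alias table: fold the naming variants noted above into one symbol.
ALIASES = {"ATG16L": "ATG16L1", "ATG16-1": "ATG16L1"}


def normalize(label):
    """Uppercase a label and map known naming variants to one symbol."""
    key = label.strip().upper()
    return ALIASES.get(key, key)


def consensus(gene_sets):
    """gene_sets maps a paper ID to an iterable of gene labels.

    Returns (genes present in every paper, mean pairwise Jaccard similarity).
    """
    norm = {pid: {normalize(g) for g in genes} for pid, genes in gene_sets.items()}
    shared = set.intersection(*norm.values()) if norm else set()
    jaccards = [
        len(a & b) / len(a | b)
        for a, b in combinations(norm.values(), 2)
        if a | b  # skip pairs where both sets are empty
    ]
    mean = sum(jaccards) / len(jaccards) if jaccards else 0.0
    return shared, mean


# Toy input for illustration only.
toy = {
    "paper_a": ["ULK1", "ATG5", "ATG16L"],
    "paper_b": ["ULK1", "ATG5", "ATG16L1", "WIPI2"],
    "paper_c": ["ULK1", "ATG16-1", "Rab7"],
}
shared, mean_overlap = consensus(toy)
print(sorted(shared), round(mean_overlap, 2))
```

Normalizing before comparing matters here: without the alias step, the three ATG16L spellings would count as three different genes and lower both the shared list and the overlap score.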