Hippo Label Validation

Do images labeled "Hippo" actually match Hippo pathways best?

What this shows: We compare each "Hippo"-labeled image against all 239 KEGG pathways and all 984 WikiPathways pathways to verify its label.

Key question: If an image is labeled "Hippo signaling", does it have the highest gene overlap with a Hippo pathway, or does it actually match PI3K-AKT or another pathway better?

Method: Plain (unweighted) Jaccard similarity between gene sets, with genes identified by Entrez gene ID:

Jaccard(A, B) = |A ∩ B| / |A ∪ B|
= (genes in both) / (genes in either)
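A minimal sketch of the comparison, assuming each image and each reference pathway has already been reduced to a set of Entrez gene IDs. The helpers `jaccard` and `rank_pathways`, the two-pathway dictionary, and the small integer IDs are hypothetical stand-ins, not the pipeline's actual code or data. Here the label counts as confirmed when the top-ranked pathway is a Hippo pathway.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_pathways(image_genes: Set[int],
                  pathways: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score one image against every reference pathway, best match first."""
    scores = [(name, jaccard(image_genes, genes)) for name, genes in pathways.items()]
    # Highest score first; ties broken alphabetically so the order is reproducible.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


# Hypothetical usage with placeholder integer IDs standing in for Entrez IDs.
pathways = {
    "KEGG Hippo signaling": {1, 2, 3, 4, 5, 6},
    "KEGG PI3K-Akt signaling": {3, 7, 8, 9, 10, 11, 12},
}
image = {1, 2, 3}
ranking = rank_pathways(image, pathways)
top_name, top_score = ranking[0]
label_confirmed = "Hippo" in top_name
print(ranking)
print(label_confirmed)
```

Breaking ties by name keeps the ranking deterministic when two pathways score identically.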

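Example (gene symbols shown for readability; the comparison itself uses Entrez IDs): an image has genes {YAP1, LATS1, MST1}, and KEGG Hippo has {YAP1, LATS1, LATS2, MST1, MST2, SAV1, ...157 total}.

Overlap = {YAP1, LATS1, MST1} = 3 genes
Union = |A| + |B| - |A ∩ B| = 3 + 157 - 3 = 157 genes (every image gene is already in the KEGG set)
Jaccard = 3 / 157 = 1.9%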
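The arithmetic above can be reproduced directly. In this sketch gene symbols stand in for Entrez IDs, and every KEGG Hippo member beyond the six named ones is a hypothetical placeholder used only to reach 157 genes.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


image = {"YAP1", "LATS1", "MST1"}
# Six named members plus 151 hypothetical placeholders, 157 genes in total.
named = {"YAP1", "LATS1", "LATS2", "MST1", "MST2", "SAV1"}
kegg_hippo = named | {f"PLACEHOLDER_{i}" for i in range(157 - len(named))}

overlap = image & kegg_hippo
union_size = len(image) + len(kegg_hippo) - len(overlap)  # inclusion-exclusion
score = jaccard(image, kegg_hippo)
print(len(overlap), union_size, f"{score:.1%}")  # prints: 3 157 1.9%
```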

Note: Low Jaccard scores are expected. Images show partial pathway views, not the full reference pathway, so most of the union comes from reference genes that never appear in the image.
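One way to make this concrete: the intersection can be no larger than the smaller set and the union no smaller than the larger one, so Jaccard(A, B) <= min(|A|, |B|) / max(|A|, |B|). The sketch below computes that cap from set sizes alone; the function name `max_jaccard` is hypothetical.

```python
def max_jaccard(n_image: int, n_pathway: int) -> float:
    """Upper bound on Jaccard(A, B) given only the set sizes |A| and |B|.

    The intersection is at most min(|A|, |B|) and the union at least max(|A|, |B|);
    both bounds are reached together when the smaller set lies inside the larger one.
    """
    if n_image == 0 and n_pathway == 0:
        return 0.0
    return min(n_image, n_pathway) / max(n_image, n_pathway)


# A 3-gene image can score at most 3/157 against a 157-gene pathway,
# even if every one of its genes belongs to that pathway.
print(f"{max_jaccard(3, 157):.1%}")   # prints: 1.9%
# A 20-gene image against the same pathway is capped at 20/157.
print(f"{max_jaccard(20, 157):.1%}")  # prints: 12.7%
```

The cap shrinks as the reference pathway grows, which is why a partial pathway view scores low even when every gene it shows is a true member.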

⚠️ Known limitation: Gene family labels

29% of gene labels (482/1648) lack Entrez IDs. Many are Hippo-related family names:

LATS1/2 (37x), MST1/2 (32x), YAP/TAZ (25x), TEAD (35x), MOB1 (14x)

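These labels are dropped before matching, so they add nothing to the overlap. Because most of them name core Hippo components, the reported Hippo scores likely understate the true overlap.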
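A sketch of how dropped family labels depress the Hippo score. It assumes labels are mapped to genes through a symbol lookup in which family labels such as "LATS1/2" have no entry; the `to_gene_set` helper, both lookup tables, the family-to-member expansion, and the 13-gene reference set are hypothetical, and symbols stand in for Entrez IDs. Only the 482/1648 figure comes from the dataset above.

```python
from typing import Dict, Iterable, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def to_gene_set(labels: Iterable[str],
                lookup: Dict[str, Set[str]]) -> Tuple[Set[str], List[str]]:
    """Map image labels to genes; also return the labels that had no mapping."""
    genes: Set[str] = set()
    unmapped: List[str] = []
    for label in labels:
        if label in lookup:
            genes |= lookup[label]
        else:
            unmapped.append(label)
    return genes, unmapped


# Hypothetical reference set (symbols standing in for Entrez IDs).
hippo_ref = {"YAP1", "WWTR1", "LATS1", "LATS2", "STK3", "STK4",
             "SAV1", "MOB1A", "MOB1B", "TEAD1", "TEAD2", "TEAD3", "TEAD4"}
image_labels = ["YAP1", "SAV1", "LATS1/2", "MST1/2", "TEAD"]

# Only single-gene labels resolve; family labels are dropped.
strict = {"YAP1": {"YAP1"}, "SAV1": {"SAV1"}}
# Same lookup plus a hypothetical family-to-members expansion. The Hippo
# kinases MST1/MST2 are the genes STK4/STK3 (the official symbol MST1 names
# a different gene), so a naive symbol match would not resolve them either.
expanded = {
    **strict,
    "LATS1/2": {"LATS1", "LATS2"},
    "MST1/2": {"STK4", "STK3"},
    "TEAD": {"TEAD1", "TEAD2", "TEAD3", "TEAD4"},
}

genes_strict, dropped = to_gene_set(image_labels, strict)
genes_expanded, _ = to_gene_set(image_labels, expanded)
print(dropped)  # prints: ['LATS1/2', 'MST1/2', 'TEAD']
print(f"{jaccard(genes_strict, hippo_ref):.2f} -> {jaccard(genes_expanded, hippo_ref):.2f}")
print(f"{482 / 1648:.0%} of labels unmapped in the full dataset")  # prints: 29% ...
```

In this toy case every expanded member is already in the reference set, so the union stays fixed while the overlap grows. Against a non-Hippo pathway, the same expansion would mostly enlarge the union instead.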
