Two Innovations to Measure True Impact

We address both problems with new methods for linking GWAS discoveries to pharmaceutical innovation.

1. Text Mining Gene-Disease Pairs

Instead of relying on citations, we use BioBERT (a biomedical language model) to extract gene and disease mentions directly from patent text.

This captures the scientific content that patents actually use, not just what they formally cite.

Result: A comprehensive map of which gene-disease pairs appear in pharmaceutical patents.
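
A minimal sketch of the pair-building step, assuming the language model has already tagged each patent's text with gene and disease mentions. The input format, the function name `gene_disease_pairs`, the co-occurrence rule (a gene and a disease named in the same patent form a pair), and the toy patents are all illustrative assumptions, not the actual pipeline.

```python
from collections import Counter
from itertools import product


def gene_disease_pairs(patents):
    """Count the patents in which each (gene, disease) pair co-occurs.

    `patents` maps a patent ID to a list of (mention, label) tuples, where
    label is "GENE" or "DISEASE". This is a hypothetical NER output format.
    """
    counts = Counter()
    for entities in patents.values():
        # Normalize case so "pcsk9" and "PCSK9" collapse to one gene.
        genes = {text.upper() for text, label in entities if label == "GENE"}
        diseases = {text.lower() for text, label in entities if label == "DISEASE"}
        # Sets ensure each pair is counted at most once per patent.
        counts.update(product(genes, diseases))
    return counts


# Toy input for illustration only; not real patent data.
toy_patents = {
    "P1": [("PCSK9", "GENE"), ("hypercholesterolemia", "DISEASE")],
    "P2": [("pcsk9", "GENE"), ("Hypercholesterolemia", "DISEASE"), ("APOB", "GENE")],
    "P3": [("IL23R", "GENE"), ("Crohn's disease", "DISEASE")],
}
pair_counts = gene_disease_pairs(toy_patents)
```

Counting patents rather than raw mentions keeps a single patent that repeats a gene name many times from dominating the map. A real pipeline would also need to map synonyms to standard gene and disease identifiers, which this sketch leaves out.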

2. Measuring Spillovers via Pathways

We use KEGG biological pathways to define relationships between genes. When a GWAS links Gene A to a disease, we can trace how that knowledge flows to related genes.

Genes in the same pathway share biological function, so a discovery about one can inform research on the others.

Result: A "distance" metric for how far knowledge travels through biological networks.
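
One way such a distance could be computed, as a sketch: treat two genes as neighbors when they share at least one pathway, then take the shortest number of hops between them. This hop-count definition, the function name `pathway_distance`, and the toy pathway memberships are assumptions for illustration. The memberships are not taken from KEGG, and the metric used in the study may differ.

```python
from collections import deque


def pathway_distance(pathways, source, target):
    """Return the minimum number of shared-pathway hops between two genes.

    `pathways` maps a pathway ID to the set of genes it contains.
    Returns None when either gene is unknown or no chain connects them.
    """
    # Two genes are neighbors if they appear together in any pathway.
    neighbors = {}
    for members in pathways.values():
        for gene in members:
            neighbors.setdefault(gene, set()).update(members - {gene})

    if source not in neighbors or target not in neighbors:
        return None

    # Breadth-first search finds the shortest hop count.
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        gene, dist = queue.popleft()
        if gene == target:
            return dist
        for nxt in neighbors[gene] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


# Hypothetical pathway memberships, not real KEGG data.
toy_pathways = {
    "pathway_1": {"GENE_A", "GENE_B"},
    "pathway_2": {"GENE_B", "GENE_C"},
    "pathway_3": {"GENE_D"},
}
same_pathway = pathway_distance(toy_pathways, "GENE_A", "GENE_B")
two_hops = pathway_distance(toy_pathways, "GENE_A", "GENE_C")
unconnected = pathway_distance(toy_pathways, "GENE_A", "GENE_D")
```

Under this definition, a gene that shares a pathway with the GWAS-linked gene sits at distance 1, a gene reachable only through an intermediate gene sits at distance 2, and a gene with no connecting chain has no defined distance.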