Stratified Word Embeddings with Patient and Provider Metadata

Sun, 05 Apr 2026 19:52:47 -0700

Standard word embeddings treat every occurrence of a word as equivalent. To such a model, “chest pain” means the same thing whether it appears in a note written by an emergency attending about an 80-year-old Medicare patient or in one written by a resident describing an elective pre-op visit. That homogeneity is a fundamental design choice in Word2Vec and FastText, and for most downstream tasks it is a reasonable one. Clinical NLP, however, operates in a domain where who is speaking, about whom, and in what care setting are inseparable from meaning. This post describes FastTextContext, a from-scratch C++ implementation that extends FastText’s skip-gram model with learned metadata embeddings for patient demographics and provider role, fused with the word representation through a shared projection matrix.
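
To make the idea of fusing metadata through a shared projection concrete, here is a minimal sketch. It is not the FastTextContext code: the names `SharedProjection` and `fuse` are hypothetical, and combining the vectors by element-wise sum before projecting is an assumption made only for illustration.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Hypothetical shared projection: one row-major matrix (out_dim x in_dim)
// applied to every fused input, regardless of which metadata produced it.
struct SharedProjection {
    std::size_t out_dim;
    std::size_t in_dim;
    std::vector<float> weights;  // out_dim * in_dim entries, row-major

    // Plain matrix-vector product y = W x.
    Vec apply(const Vec& x) const {
        Vec y(out_dim, 0.0f);
        for (std::size_t r = 0; r < out_dim; ++r) {
            for (std::size_t c = 0; c < in_dim; ++c) {
                y[r] += weights[r * in_dim + c] * x[c];
            }
        }
        return y;
    }
};

// Assumed fusion rule (illustrative only): sum the word vector with the
// patient and provider metadata vectors, then project the result.
// All three inputs are assumed to have length proj.in_dim.
Vec fuse(const Vec& word, const Vec& patient, const Vec& provider,
         const SharedProjection& proj) {
    Vec combined(proj.in_dim, 0.0f);
    for (std::size_t i = 0; i < proj.in_dim; ++i) {
        combined[i] = word[i] + patient[i] + provider[i];
    }
    return proj.apply(combined);
}
```

Under this sketch, the same word paired with different patient or provider vectors yields a different fused representation, which is the property the stratified model is after.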

FastTextContext on Volundarhus
