Medication Extraction from Clinical Notes
This example demonstrates how to extract structured medication information from clinical notes using LangExtract. You'll learn how to define schemas for medication data, create effective prompts, and handle the complexities of medical text.
Overview
Clinical notes often contain medication information in unstructured formats. LangExtract can help extract:
- Medication names (brand and generic)
- Dosage amounts and units
- Frequency of administration
- Route of administration (oral, IV, topical, etc.)
- Duration of treatment
- Special instructions
Defining the Schema
Start by defining a Pydantic schema for medication information:
from pydantic import BaseModel, Field
from typing import Optional, List


class Medication(BaseModel):
    name: str = Field(description="Name of the medication (generic or brand)")
    dosage: Optional[str] = Field(None, description="Dosage amount and unit (e.g., '500mg', '10ml')")
    frequency: Optional[str] = Field(None, description="How often to take (e.g., 'twice daily', 'every 6 hours')")
    route: Optional[str] = Field(None, description="Route of administration (e.g., 'oral', 'IV', 'topical')")
    duration: Optional[str] = Field(None, description="Duration of treatment (e.g., '7 days', 'as needed')")
    instructions: Optional[str] = Field(None, description="Special instructions or notes")


class MedicationList(BaseModel):
    medications: List[Medication] = Field(description="List of medications mentioned in the text")
Creating Example Prompts
Provide diverse examples that cover different scenarios:
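import langextract as lx

# Each example pairs a source text with the extractions expected from it.
# extraction_text is quoted verbatim from the text; other fields go in attributes.
examples = [
    lx.data.ExampleData(
        text="Patient was prescribed aspirin 81mg daily for cardiovascular protection.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="aspirin",
                attributes={"dosage": "81mg", "frequency": "daily", "route": "oral"},
            ),
        ],
    ),
    lx.data.ExampleData(
        text="Take metformin 500mg twice daily with meals. Also prescribed lisinopril 10mg once daily.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="metformin",
                attributes={
                    "dosage": "500mg",
                    "frequency": "twice daily",
                    "route": "oral",
                    "instructions": "with meals",
                },
            ),
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="lisinopril",
                attributes={"dosage": "10mg", "frequency": "once daily", "route": "oral"},
            ),
        ],
    ),
    lx.data.ExampleData(
        text="IV antibiotics: vancomycin 1g every 12 hours for 7 days.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="vancomycin",
                attributes={
                    "dosage": "1g",
                    "frequency": "every 12 hours",
                    "route": "IV",
                    "duration": "7 days",
                },
            ),
        ],
    ),
]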
Running the Extraction
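Use LangExtract to extract medication information, then load the results into the schema:
import langextract as lx

clinical_note = """
Patient presents with hypertension. Current medications:
- Lisinopril 10mg once daily
- Hydrochlorothiazide 25mg daily
- Atorvastatin 20mg at bedtime
Patient also reports taking ibuprofen 200mg as needed for joint pain.
"""

# Gemini models need an API key, e.g. via the LANGEXTRACT_API_KEY environment variable.
result = lx.extract(
    text_or_documents=clinical_note,
    prompt_description="Extract all medication information from the clinical note, including name, dosage, frequency, route, and any special instructions.",
    examples=examples,
    model_id="gemini-2.0-flash-exp",
)

# Load each extraction into the Pydantic schema to validate its fields.
# Assumes attribute keys match the Medication field names, as in the examples.
medications = MedicationList(medications=[
    Medication(name=e.extraction_text, **(e.attributes or {}))
    for e in result.extractions
])
print(medications.medications)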
Handling Complex Cases
Clinical notes can be complex. Consider these patterns:
- Multiple medications: Extract all medications mentioned, even if scattered throughout the text
- Abbreviations: Handle common medical abbreviations (e.g., "bid" for twice daily)
- Discontinued medications: Distinguish between current and past medications if needed
- Dosage changes: Handle dosage titrations and changes over time
- Combination drugs: Extract combination medications appropriately
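One way to handle abbreviations is to expand them before or after extraction so that frequencies are stored in a consistent form. The sketch below is a minimal illustration using only the standard library; the abbreviation table and the `expand_frequency` helper are assumptions made for this example, not part of LangExtract, and the table covers only a handful of common dosing abbreviations.

```python
import re

# Illustrative subset of dosing abbreviations (assumed for this sketch).
FREQUENCY_ABBREVIATIONS = {
    "qd": "once daily",
    "bid": "twice daily",
    "tid": "three times daily",
    "qid": "four times daily",
    "prn": "as needed",
    "qhs": "at bedtime",
}


def _build_pattern() -> re.Pattern:
    """Match each abbreviation in plain ("bid") or dotted ("b.i.d.") form."""
    alternatives = []
    for abbr in FREQUENCY_ABBREVIATIONS:
        alternatives.append(r"\.".join(abbr) + r"\.?")  # dotted form, e.g. b\.i\.d\.?
        alternatives.append(abbr)                        # plain form
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE)


_PATTERN = _build_pattern()


def expand_frequency(text: str) -> str:
    """Replace known abbreviations with their long form; leave other text alone."""
    def replace(match: re.Match) -> str:
        key = match.group(0).replace(".", "").lower()
        return FREQUENCY_ABBREVIATIONS[key]

    return _PATTERN.sub(replace, text)


print(expand_frequency("metformin 500mg BID with meals"))
```

Expanding abbreviations in a separate step keeps the mapping auditable. Whether to apply it to the note before extraction or to the extracted `frequency` values afterwards depends on whether you need the source spans to match the original text exactly.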
Validation and Quality Checks
After extraction, validate the results:
- Schema validation: Ensure all required fields are present
- Dosage format: Verify dosage formats are consistent
- Route validation: Check that routes match expected values
- Source grounding: Review source spans to verify accuracy
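The first three checks above can be sketched as a small function that runs over each extracted record. This is a hedged example: the `check_medication` helper, the allowed route list, and the dosage pattern are assumptions chosen for illustration, and a real deployment would tune them to its own notes. Records are plain dictionaries here so the sketch stays independent of any particular library.

```python
import re

# Assumed conventions for this sketch; adjust to your own data.
ALLOWED_ROUTES = {"oral", "iv", "im", "topical", "subcutaneous", "inhaled"}
DOSAGE_PATTERN = re.compile(r"^\d+(?:\.\d+)?\s?(?:mcg|mg|g|ml|units?)$", re.IGNORECASE)


def check_medication(med: dict) -> list[str]:
    """Return a list of problems found in one record; empty means it passed."""
    problems = []
    if not med.get("name"):
        problems.append("missing required field: name")
    dosage = med.get("dosage")
    if dosage is not None and not DOSAGE_PATTERN.match(dosage):
        problems.append(f"unexpected dosage format: {dosage!r}")
    route = med.get("route")
    if route is not None and route.lower() not in ALLOWED_ROUTES:
        problems.append(f"unexpected route: {route!r}")
    return problems


print(check_medication({"name": "vancomycin", "dosage": "1g", "route": "IV"}))
print(check_medication({"name": "", "dosage": "two tablets", "route": "by mouth"}))
```

Returning a list of problems rather than raising on the first failure makes it easier to flag a record for human review with all of its issues attached.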
Visualization
Use LangExtract's visualization tools to review extractions:
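# Render the extractions as HTML with source highlighting
html_content = lx.visualize(result)
# In a notebook this is an HTML object with a .data attribute; otherwise a string
html_text = html_content.data if hasattr(html_content, "data") else html_content
with open("medication_visualization.html", "w", encoding="utf-8") as f:
    f.write(html_text)
Opening the saved file highlights where each medication was found in the source text, making it easy to verify accuracy.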
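For a quick check without a browser, the same idea can be approximated in plain Python by locating each extracted name in the note. The `find_spans` helper below is a hypothetical stand-in written for this example and does not use LangExtract; it does a simple case-insensitive search, so it will not catch paraphrased or misspelled mentions.

```python
import re


def find_spans(source_text: str, term: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of every case-insensitive match."""
    if not term:
        return []
    return [m.span() for m in re.finditer(re.escape(term), source_text, re.IGNORECASE)]


note = "Lisinopril 10mg once daily. Continue lisinopril."
for start, end in find_spans(note, "lisinopril"):
    print(start, end, note[start:end])
```

An extracted medication with no matching span is a candidate for manual review, since it may not appear in the note at all.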
Best Practices
- Use medical terminology: Include examples with proper medical terms
- Handle variations: Account for different ways medications are mentioned
- Validate against standards: Consider validating against drug databases when possible
- Review source grounding: Always check source spans for critical extractions
- Iterate on prompts: Refine prompts based on real clinical notes
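As a rough illustration of handling name variations and validating against a reference, the sketch below maps brand names to generics and flags names it does not recognize. The tiny in-memory table is an assumption standing in for a real drug database, and `normalize_name` is a hypothetical helper, not a LangExtract function.

```python
# Tiny stand-in for a drug database; a real system would query a maintained source.
BRAND_TO_GENERIC = {
    "lipitor": "atorvastatin",
    "zestril": "lisinopril",
    "glucophage": "metformin",
    "advil": "ibuprofen",
}
KNOWN_GENERICS = set(BRAND_TO_GENERIC.values()) | {
    "aspirin",
    "vancomycin",
    "hydrochlorothiazide",
}


def normalize_name(name: str) -> tuple[str, bool]:
    """Map a brand name to its generic and report whether the result is recognized."""
    key = name.strip().lower()
    generic = BRAND_TO_GENERIC.get(key, key)
    return generic, generic in KNOWN_GENERICS


for raw in ["Lipitor", "Hydrochlorothiazide", "foozapam"]:
    print(raw, normalize_name(raw))
```

Unrecognized names are returned unchanged with a `False` flag, so they can be routed to review instead of being silently dropped.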