LangExtract is a Python library from Google that turns unstructured text into reliable structured data using Gemini and other LLMs – with precise source grounding, schema-aware extraction, and rich visualization.
import langextract as lx

result = lx.extract(
    text_or_documents=report_text,
    prompt_description="Extract diagnoses, meds, and dates",
    examples=examples,
    model_id="gemini-2.5-flash",
)
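Grounding means every extraction records exactly where in the source it came from. The helper below is a self-contained sketch of that idea in plain Python — it is not the LangExtract API — showing how character-interval grounding lets a caller re-verify each extraction against the original text:

```python
def ground_extractions(text, extracted_strings):
    """Map each extracted string to its (start, end) character span in text.

    A simplified illustration of source grounding: the span lets callers
    re-read text[start:end] to audit or highlight the extraction.
    """
    spans = []
    cursor = 0
    for s in extracted_strings:
        start = text.find(s, cursor)
        if start == -1:
            raise ValueError(f"extraction {s!r} not found in source text")
        end = start + len(s)
        spans.append({"text": s, "start": start, "end": end})
        cursor = end  # search forward so repeated strings get distinct spans
    return spans

report = "Patient diagnosed with hypertension; started lisinopril 10mg on 2024-03-01."
grounded = ground_extractions(report, ["hypertension", "lisinopril", "2024-03-01"])
# Every span round-trips back to the original text:
assert all(report[g["start"]:g["end"]] == g["text"] for g in grounded)
```

Because spans are plain character offsets, a UI can highlight `report[start:end]` directly, and an auditor can confirm no field was hallucinated.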
Designed for production-grade information extraction, LangExtract combines powerful LLMs with deterministic post-processing and transparent provenance.
Precise source grounding
Every extracted field can be traced back to the exact character spans in the input text, enabling audits, highlighting, and robust UI overlays.
Learn about grounding →
Define your target structure in natural language, as a JSON schema, or with Pydantic-like models, and let LangExtract handle validation and coercion.
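Validation and coercion can be pictured as a deterministic post-processing step over the model's raw JSON output. The sketch below uses a plain dataclass as the target schema — it is an illustration of the concept, not the library's actual validator:

```python
from dataclasses import dataclass, fields

@dataclass
class Medication:
    name: str
    dose_mg: float

def coerce(raw: dict, schema=Medication):
    """Validate raw LLM output against a dataclass schema, coercing field types."""
    kwargs = {}
    for f in fields(schema):
        if f.name not in raw:
            raise KeyError(f"missing field: {f.name}")
        # f.type is the annotated class, so calling it coerces e.g. "10" -> 10.0
        kwargs[f.name] = f.type(raw[f.name])
    return schema(**kwargs)

med = coerce({"name": "lisinopril", "dose_mg": "10"})
```

Missing fields raise immediately rather than silently producing partial records, which is the kind of deterministic guarantee that a schema layer adds on top of the LLM.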
Schema & validation docs →
Use Gemini, Vertex AI, OpenAI, or local models via Ollama. Swap providers without rewriting extraction logic, using a unified API and plugin system.
Provider integrations →
LangExtract is already used in domains such as healthcare, finance, legal, and customer support to turn long-form documents into structured records.
Visit the Examples page for detailed walkthroughs, including a full Romeo and Juliet extraction, medication extraction, and radiology report structuring.
Process long documents efficiently with batching, streaming, and composable extraction passes. Combine summary passes with targeted follow-ups to keep quality high and costs low.
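One way to picture multi-pass extraction over a long document is chunk-with-overlap: split the text into overlapping windows, extract per window, then merge and deduplicate. The sketch below is a hypothetical illustration of that pattern — the chunk size, overlap, and merge strategy are illustrative, not library defaults:

```python
def chunk_text(text, size=1000, overlap=100):
    """Split text into overlapping windows so an entity that straddles
    a boundary appears whole in at least one chunk."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def extract_in_passes(text, extract_fn, size=1000, overlap=100):
    """Run an extraction function over each chunk and merge results,
    dropping duplicates introduced by the overlapping regions."""
    seen, merged = set(), []
    for chunk in chunk_text(text, size, overlap):
        for item in extract_fn(chunk):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

A cheap summary pass can use a fast model over every chunk, with a targeted follow-up pass re-extracting only the chunks that matched — the same composition idea the paragraph above describes.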
The Benchmarks page summarizes evaluation results and links to the evaluation scripts on GitHub.
Install from PyPI, configure your model provider, and start extracting.
pip install langextract
Or install from source for development and testing. See the Getting started guide for virtual environment and Docker instructions.
Use Gemini (via LangExtract API key or Vertex AI), OpenAI, or a local Ollama model. The Providers page walks through configuration for each backend.
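Because the backend is selected by the model_id string, swapping providers can be as simple as changing one value in configuration. The mapping below is a hypothetical sketch — the OpenAI and Ollama IDs shown are assumptions for illustration, and the env-var name is invented; consult the Providers page for the real identifiers and credential setup:

```python
import os

# Hypothetical mapping from a deployment setting to a model_id string.
# Only "gemini-2.5-flash" appears in this document; the others are
# placeholder IDs for illustration.
MODEL_IDS = {
    "gemini": "gemini-2.5-flash",
    "openai": "gpt-4o",      # assumed ID, check Providers docs
    "ollama": "gemma2:2b",   # assumed local-model ID, check Providers docs
}

def pick_model_id(backend=None):
    """Resolve the backend from an (invented) env var, defaulting to Gemini."""
    backend = backend or os.environ.get("EXTRACT_BACKEND", "gemini")
    return MODEL_IDS[backend]
```

The extraction call itself stays unchanged; only the `model_id=pick_model_id()` argument varies per environment.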
Draft a prompt_description, add a few examples, and optionally define a schema. Explore extraction design patterns and the interactive visualization tools.
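Few-shot examples pair a sample text with the extractions you expect from it. The plain-dict structure below sketches that shape — it is not LangExtract's actual example classes, which are documented in the library's API reference:

```python
# Hypothetical plain-dict representation of one few-shot example:
example = {
    "text": "Ibuprofen 200mg taken on 2024-01-05 for headache.",
    "extractions": [
        {"class": "medication", "text": "Ibuprofen"},
        {"class": "dose", "text": "200mg"},
        {"class": "date", "text": "2024-01-05"},
    ],
}

# A useful consistency check: every expected extraction should appear
# verbatim in the example text, so the example itself stays grounded.
assert all(e["text"] in example["text"] for e in example["extractions"])
```

Keeping example extractions verbatim from the example text mirrors the source-grounding guarantee the library makes about real outputs.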
Dive into the documentation, adapt real-world examples, or learn how to extend LangExtract via custom providers.