Integrated Clinical Document Processing and Curation with Linguamatics I2E

LabKey’s clinical document abstraction solution integrates with the Linguamatics I2E natural language processing (NLP) engine to streamline the abstraction and curation of unstructured data files. Teams using LabKey + I2E can significantly reduce the manual work involved in extracting structured data from free-text documents and reports by automating the acquisition, processing, and assignment of files for curation.

LabKey Server and the Linguamatics I2E natural language processing (NLP) engine integrate to support automatic extraction and curation of unstructured data for healthcare.

Document Acquisition

Automated integration of free-text documents via LabKey ETLs streamlines the acquisition process. Files can also be uploaded manually using the File Management system, which offers the flexibility to add documents ad hoc or in small batches.

Curation Workflow

Teams can define document abstraction and review workflows that support their specific research scenario and ensure that documents follow a consistent curation process. Workflows can be a combination of automated and manual processes.


Reporting features in LabKey provide a complete view of an organization’s curation operations. Managers can use information about abstractor workloads and metrics like average abstraction time per document to help balance and optimize operations.
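As a rough illustration of the metrics mentioned above, the following Python sketch computes per-abstractor workload and average abstraction time per document. The `AbstractionRecord` shape, its field names, and the sample values are assumptions made for this example; they are not a LabKey schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AbstractionRecord:
    # Hypothetical record shape; field names are illustrative, not a LabKey schema.
    document_id: str
    abstractor: str
    minutes_spent: float


def summarize_workload(records):
    """Return {abstractor: (document_count, average_minutes_per_document)}."""
    totals = defaultdict(lambda: [0, 0.0])  # [document count, total minutes]
    for rec in records:
        totals[rec.abstractor][0] += 1
        totals[rec.abstractor][1] += rec.minutes_spent
    return {
        name: (count, minutes / count)
        for name, (count, minutes) in totals.items()
    }


# Made-up sample data for the sketch.
records = [
    AbstractionRecord("doc-1", "alice", 12.0),
    AbstractionRecord("doc-2", "alice", 8.0),
    AbstractionRecord("doc-3", "bob", 15.0),
]
summary = summarize_workload(records)
```

A manager could compare document counts and averages across abstractors in a summary like this to spot uneven queues before reassigning batches.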

Processing Pipeline with I2E

Files added to LabKey can be run through the Linguamatics I2E NLP engine as part of an integrated data processing pipeline before they become available in the UI for abstraction. I2E indexes documents and extracts target values for review by abstractors.

Curation UI

A curation UI presents abstractors with a side-by-side view of unstructured documents and the data fields for abstraction, allowing them to efficiently review and record data points on a single screen. Abstractors and reviewers can easily toggle between documents in their queue and monitor their progress through an assigned document batch.

Querying & Analysis

Data generated by the curation process is stored in a structured format, allowing users to run both simple and complex queries to locate datasets of interest.
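To make the idea of querying curated data concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for structured storage. The `abstracted_values` table, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not reflect how LabKey stores or exposes curated data.

```python
import sqlite3

# In-memory database standing in for structured curation output (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abstracted_values (document_id TEXT, field TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO abstracted_values VALUES (?, ?, ?)",
    [
        ("doc-1", "histology", "adenocarcinoma"),
        ("doc-1", "stage", "II"),
        ("doc-2", "histology", "adenocarcinoma"),
        ("doc-2", "stage", "IV"),
        ("doc-3", "histology", "squamous cell carcinoma"),
        ("doc-3", "stage", "II"),
    ],
)


def documents_matching(conn, histology, stage):
    """Find documents whose abstracted histology and stage both match."""
    rows = conn.execute(
        """
        SELECT h.document_id
        FROM abstracted_values AS h
        JOIN abstracted_values AS s ON s.document_id = h.document_id
        WHERE h.field = 'histology' AND h.value = ?
          AND s.field = 'stage' AND s.value = ?
        ORDER BY h.document_id
        """,
        (histology, stage),
    ).fetchall()
    return [row[0] for row in rows]


matches = documents_matching(conn, "adenocarcinoma", "II")
```

Because each abstracted value is a row rather than a passage of free text, combining criteria such as histology and stage becomes an ordinary join instead of a manual document review.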

About Linguamatics I2E Natural Language Processing Engine

I2E is an agile, interactive text mining platform for extracting and analyzing information. Linguamatics uses a blend of methods, including machine learning, to achieve high precision and recall.

Linguamatics I2E Extracts:

  • Pathology report data, including cancer histology, grade, behavior, biomarker values, and stage
  • Patient profile data, such as diseases, medications, and lab values
  • Social determinants of health and lifestyle factors to support population health analytics
  • Phenotypic characteristics to support genotype-phenotype studies using the Human Phenotype Ontology

Key I2E Capabilities include:

  • Data Discovery – using large-scale unannotated data sets, analysts can rapidly and iteratively develop algorithms, saving months compared to manual chart review
  • Democratized NLP – an intuitive GUI provides easy user access to NLP, with no coding or scripting required
  • Programmatic Workflow Integration – SOA-friendly RESTful web services provide fail-safe and recoverable NLP processes for tight integration with LabKey’s ETL (Extract, Transform, Load) workflows

Get Started with LabKey Server

Request a Demo

Request a custom demo with a LabKey team member to explore your specific areas of interest.