Patient matching, the crucial task of accurately identifying and linking patient records across disparate healthcare systems, poses a significant challenge in the digital health landscape. WellSky is tackling this challenge head-on with sophisticated algorithms. In this blog post, we will take a deep dive into WellSky's Enterprise Patient Matching (EPM) solution, exploring its innovative approach, the challenges it addresses, and the underlying architecture that empowers its effectiveness.
The Problem: Disparate Data and Inconsistent Records
The healthcare industry faces a critical issue: patient data is scattered across numerous systems, leading to inconsistency and fragmentation. Imagine a scenario in which a patient's information is stored in separate systems for their primary care physician, specialist, hospital, and pharmacy. Each system may have variations in the patient's name, address, contact information, or even their date of birth. This fragmented data landscape can result in inaccurate medical records, compromised patient safety, and inefficient care coordination. One of our solutions is algorithm-driven patient matching.
EPM Objectives
Leveraging sophisticated algorithms, EPM aims to address this challenge through the following key objectives:
- Data integrity: Ensuring consistent and accurate patient records across all WellSky systems.
- Interoperability: Enabling seamless communication and data exchange between different healthcare systems.
- Improved patient experience: Facilitating a seamless transition across systems for WellSky clients, resulting in better care coordination and a more holistic view of the patient’s medical history.
Developing effective algorithms presents its own challenges. Failing to link records that belong to the same patient, or incorrectly linking records that belong to different patients, produces inaccurate data even if the matching algorithm itself works as designed. The criteria used for matching must therefore be accurate, comprehensive, and adaptable to various data formats, such as FHIR.
Evaluating Matching Algorithms with Precision, Recall, and F-Score
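The team assessed the effectiveness of matching algorithms using several key metrics. In the formulas below, 𝑇𝑝 is the number of true positives (correct matches returned), 𝐹𝑝 is the number of false positives (incorrect matches returned), and 𝐹𝑛 is the number of false negatives (true matches that were missed):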
- Precision: measures the percentage of true matches among the records identified as matches. A high precision score indicates that the algorithm is accurately identifying true matches.
𝑃 = 𝑇𝑝 / (𝑇𝑝 + 𝐹𝑝)
- Recall: measures the percentage of actual matches that were correctly identified by the system. A high recall score indicates that the algorithm is capturing most of the true matches in the dataset.
𝑅 = 𝑇𝑝 / (𝑇𝑝 + 𝐹𝑛)
- F-Score: combines precision and recall, providing a balanced measure of accuracy and coverage. A high F-score indicates a good balance between identifying true matches and minimizing both missed and false matches.
𝐹 = 2 × ((𝑃×𝑅) / (𝑃+𝑅))
Example: Evaluating Matching Effectiveness
Consider a dataset of five patients with similar data fields and three different matching algorithms A, B, and C.
A match query comes in with three data fields:
- First Name: Jonathan
- Last Name: Smith
- Birth Date: 03/28/1952
The match responses from each algorithm are:
- A: 1000, 3000, 4000
- B: 1000, 2000, 3000, 4000
- C: 1000, 2000, 3000, 4000, 5000
Records 1000 through 4000 are the true matches for this query. A and B both have 100% precision, but only B returned the full set of true matches. B and C both have a recall of 1, but C has lower precision because it also returned an invalid record (5000). Neither precision nor recall alone is enough to judge efficacy; the F-score, which combines them, is a better gauge.
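The example above can be checked with a short script. This is a minimal sketch, not part of EPM: it assumes that records 1000 through 4000 are the true matches (which follows from the description of A, B, and C), and the function name `precision_recall_f` is made up for illustration.

```python
def precision_recall_f(returned, relevant):
    """Return (precision, recall, F-score) for one match response."""
    returned, relevant = set(returned), set(relevant)
    true_pos = len(returned & relevant)    # correct matches returned
    false_pos = len(returned - relevant)   # incorrect matches returned
    false_neg = len(relevant - returned)   # true matches that were missed
    precision = true_pos / (true_pos + false_pos) if returned else 0.0
    recall = true_pos / (true_pos + false_neg) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f_score


# Assumption: these four records are the true matches for the query.
relevant = {1000, 2000, 3000, 4000}
responses = {
    "A": [1000, 3000, 4000],
    "B": [1000, 2000, 3000, 4000],
    "C": [1000, 2000, 3000, 4000, 5000],
}

scores = {name: precision_recall_f(ids, relevant) for name, ids in responses.items()}
for name, (p, r, f) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
# A: precision=1.00 recall=0.75 F=0.86
# B: precision=1.00 recall=1.00 F=1.00
# C: precision=0.80 recall=1.00 F=0.89
```

B scores highest on the F-score, and the single number also separates A from C, which precision or recall alone would each rank differently.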
Other Considerations: The Fellegi-Sunter Model
Rule-based matching algorithms rely on predefined rules and thresholds for data field comparisons, using pre-determined weights to calculate match scores. They are generally faster but less flexible than probabilistic models. In practice, an exact match on the compared fields does not guarantee that a record refers to the patient in the query. A query can return multiple records with the same score, matching on the same or different data fields. The record with the lowest score might even be the correct match.
It's important to understand the broader landscape of algorithms in this field. The Fellegi-Sunter model is considered a foundational probabilistic approach to record linkage. It proposes a two-step procedure that adjusts matching weights based on frequency, and it offers a different perspective compared to some other commonly used algorithms.
How the Fellegi-Sunter Model Works:
- Comparison vector: Instead of directly comparing raw data fields, the model transforms record pairs into a vector of comparison codes. Each code represents the agreement or disagreement level between corresponding fields (e.g., exact match, partial match, no match).
- Estimating probabilities: The model then uses these comparison vectors to estimate two key probabilities:
  - m-probability: The probability of observing a particular comparison vector given that the two records truly belong to the same entity.
  - u-probability: The probability of observing the same comparison vector given that the records belong to different entities.
- Match weights: Based on the m- and u-probabilities, weights are assigned to each comparison code, reflecting its discriminating power in distinguishing true matches from false matches.
- Threshold for matching: Finally, the weights are summed across all comparison codes for a given record pair, resulting in a match score. This score is compared against a predefined threshold to determine if the records are a match, a non-match, or if further manual review is required.
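The steps above can be sketched in a few lines of Python. This is a simplified illustration, not WellSky's implementation: it assumes each field either agrees or disagrees (no partial-match level), treats fields as independent, uses the common log-ratio form for the weights, and all m/u values and thresholds are made-up numbers.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PROBS = {
    "first_name": {"m": 0.95, "u": 0.01},
    "last_name": {"m": 0.95, "u": 0.02},
    "birth_date": {"m": 0.98, "u": 0.001},
}

# Hypothetical thresholds; scores in between go to manual review.
UPPER_THRESHOLD = 15.0
LOWER_THRESHOLD = 0.0


def field_weight(field, agrees):
    """Log-ratio weight for one comparison code (agree or disagree)."""
    m = FIELD_PROBS[field]["m"]
    u = FIELD_PROBS[field]["u"]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(comparison_vector):
    """Sum the weights over a comparison vector like {'first_name': True, ...}."""
    return sum(field_weight(f, agrees) for f, agrees in comparison_vector.items())


def classify(score):
    """Compare the summed score against the two thresholds."""
    if score >= UPPER_THRESHOLD:
        return "match"
    if score <= LOWER_THRESHOLD:
        return "non-match"
    return "review"


all_agree = {"first_name": True, "last_name": True, "birth_date": True}
name_differs = {"first_name": False, "last_name": True, "birth_date": True}
print(classify(match_score(all_agree)))     # match
print(classify(match_score(name_differs)))  # review
```

Fields whose agreement is rare among non-matching pairs (a small u) earn the largest agreement weights, which is why the birth date dominates the score in this sketch.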
The probabilistic nature of the model can make it challenging to interpret the reasoning behind specific match decisions, and that interpretability is crucial for building trust and accountability in healthcare settings. The exact model formulas are described in more detail in this NIH article by Huiping Xu, Xiaochun Li, and Shaun Grannis.
Data Elements Used for Weighted Scoring
Each data element is assigned a weighted score based on its significance in determining a match. For example, Social Security number (SSN) and date of birth carry higher weights due to their uniqueness and reliability. The scoring considers a wide range of data elements, including:
- Identifiers: SSN, driver's license number, Medicare/Medicaid IDs.
- Demographics: Family name, given name, date of birth, gender, marital status.
- Contact Information: Phone numbers, email addresses, fax numbers.
- Addresses: Address line, city, state, zip code, county.
- Contextual Information: Active status, organization, tags, and other relevant data points.
The weighted score ultimately grades the match, providing a qualitative assessment of the confidence level. Grades typically range from “certain” for very high confidence to “possible” for matches that require further review. It is also possible to filter out less-confident results altogether with query configuration.
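A minimal sketch of weighted scoring and grading is shown below. The weights, cutoffs, and function names are hypothetical and chosen only for illustration; only the grade labels "certain" and "possible" come from the description above.

```python
# Hypothetical weights: more unique, reliable fields count for more.
WEIGHTS = {
    "ssn": 40,
    "birth_date": 25,
    "family_name": 15,
    "given_name": 10,
    "phone": 5,
    "zip_code": 5,
}

# Hypothetical grade cutoffs.
CERTAIN_CUTOFF = 80
POSSIBLE_CUTOFF = 40


def weighted_score(query, candidate):
    """Add up the weights of fields present in the query that match exactly."""
    score = 0
    for field, weight in WEIGHTS.items():
        value = query.get(field)
        if value is not None and value == candidate.get(field):
            score += weight
    return score


def grade(score):
    """Map a score to a grade; None means the result is filtered out."""
    if score >= CERTAIN_CUTOFF:
        return "certain"
    if score >= POSSIBLE_CUTOFF:
        return "possible"
    return None


query = {"given_name": "Jonathan", "family_name": "Smith", "birth_date": "1952-03-28"}
candidate = {
    "given_name": "Jonathan",
    "family_name": "Smith",
    "birth_date": "1952-03-28",
    "ssn": "000-00-0000",  # placeholder value, not in the query
}
print(weighted_score(query, candidate), grade(weighted_score(query, candidate)))
# 50 possible
```

With only name and birth date in the query, the candidate is graded "possible"; adding a matching SSN to the query would raise the score to 90 and the grade to "certain" under these made-up weights.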
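Using the Levenshtein distance algorithm for “fuzzy” matching increases the probability of finding matches on these data elements. The algorithm counts the single-character edits (insertions, deletions, or substitutions) needed to turn one sequence into another, so values that differ by only one or two characters can still be treated as matching. This can improve the effectiveness of the results as measured by the F-score described earlier.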
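For reference, the Levenshtein distance can be computed with a standard dynamic-programming routine. The sketch below is a generic textbook version, not EPM's code; the `fuzzy_equal` helper and its two-edit default are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or no change)
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(a, b, max_edits=2):
    """Hypothetical helper: treat values within max_edits as matching."""
    return levenshtein(a.lower(), b.lower()) <= max_edits


print(levenshtein("Jonathan", "Johnathan"))  # 1
print(fuzzy_equal("Smith", "Smyth"))         # True
print(fuzzy_equal("Smith", "Jones"))         # False
```

A fixed edit budget is forgiving for long names but loose for short values, so a real system would likely tune the tolerance per field.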
Achieving Performance at Scale
One requirement set by the development team is that a match request should return results within 300 milliseconds. Meeting it takes more than a robust matching algorithm: EPM also relies on a scalable architecture that leverages cloud-native technologies to ensure reliability and performance when handling the complexities of large-scale patient data processing.
Structured, FHIR-compliant patient data sent to the Firestore is synced to a relational Cloud SQL database, which in turn feeds Elasticsearch. During an ETL process, data elements such as phone numbers and addresses are standardized into a patient document. These documents are stored in Elasticsearch, so when a match request is received, the system performs preliminary matching against a patient index. The preliminary results are then scored to ascertain matches and graded based on their scores.
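To illustrate why standardization matters before indexing, here is a small sketch of the kind of normalization an ETL step might apply. The rules, field names, and function names are assumptions for illustration (including the US-style 10-digit phone format); they do not describe EPM's actual ETL.

```python
import re


def normalize_phone(raw):
    """Keep digits only; drop a leading US country code (assumption)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def normalize_text(raw):
    """Collapse repeated whitespace and uppercase the value."""
    return " ".join(raw.split()).upper()


def to_patient_document(record):
    """Build a hypothetical standardized document from a raw record."""
    return {
        "family_name": normalize_text(record["family_name"]),
        "given_name": normalize_text(record["given_name"]),
        "phone": normalize_phone(record["phone"]),
        "address_line": normalize_text(record["address_line"]),
    }


doc = to_patient_document({
    "family_name": "Smith ",
    "given_name": "jonathan",
    "phone": "+1 (555) 123-4567",
    "address_line": "12  Main   St",
})
print(doc["phone"], "|", doc["address_line"])  # 5551234567 | 12 MAIN ST
```

Normalizing values once at ingestion means two records that differ only in formatting produce identical field values, so the preliminary index lookup and the later scoring compare like with like.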
Conclusion
WellSky’s Enterprise Patient Matching solution represents a significant advancement in addressing data fragmentation and inconsistent patient records in the healthcare industry. By leveraging sophisticated algorithms, a weighted scoring system, and a robust cloud-based architecture, WellSky is laying the foundation for healthcare organizations to achieve greater data integrity, interoperability, and better care coordination based on a more complete medical history.