Real-Time PII/PHI Detection That Catches What Others Miss
ML-powered detection identifies 40+ sensitive data types in milliseconds, including context-dependent information that regex patterns can't catch. Purpose-built for AI workflows where accuracy and speed both matter.
Why Detection Accuracy Matters
Catch What Regex Misses
Traditional DLP relies on pattern matching, which fails on natural language. "My social is one two three..." contains no digits, so an SSN regex never matches it. ML-powered detection understands the context and catches it anyway.
High accuracy on standard PII benchmarks
Cover All the Data Types That Matter
From SSNs and credit cards to medical record numbers and custom identifiers, our detection covers the sensitive data types relevant to healthcare, finance, legal, and enterprise.
40+ data types out of the box, custom patterns available
Protect Without Slowing Down AI
Detection runs in under 15ms on typical requests. Your team won't notice protection happening; they'll just notice their data is safe.
<15ms average detection latency
Meet Compliance Detection Requirements
HIPAA requires identifying PHI. SOC 2 requires data classification. Our detection, backed by detection logs, provides the foundation for meeting both.
Detection logs exportable for compliance reporting
Reduce False Positives with Tuned Models
Detection that cries wolf undermines adoption. Our models are tuned for <0.5% false positive rate, with adjustable sensitivity for your specific needs.
<0.5% false positive rate on production workloads
Scale Detection Across All AI Tools
The same detection engine protects ChatGPT and DeepSeek today, with Grok, Claude, and Magic coming soon. One policy, consistent protection everywhere.
Unified detection across 40+ integrations
How Detection Works
Text Ingestion
Input: Raw text from a user prompt, document upload, or API request
Control: Text preprocessing and normalization
Output: Prepared text ready for analysis
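To make "preprocessing and normalization" concrete, here is a minimal sketch, assuming normalization means Unicode folding plus rewriting spelled-out digits so that a later pattern layer can see them. The function name `normalize` and the `WORD_DIGITS` table are hypothetical illustrations, not the product's implementation.

```python
import re
import unicodedata

# Hypothetical spelled-digit table for this sketch only; a real normalizer
# would cover more spellings ("oh", "double five"), languages, and formats.
WORD_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
_WORD_RE = re.compile(r"\b(" + "|".join(WORD_DIGITS) + r")\b", re.IGNORECASE)


def normalize(text: str) -> str:
    """Return a normalized copy of `text` for the downstream detectors."""
    # Fold compatibility characters (e.g. full-width digits) into standard forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace whole-word spelled-out digits with numerals.
    text = _WORD_RE.sub(lambda m: WORD_DIGITS[m.group(1).lower()], text)
    # Naively join digits separated by single spaces: "1 2 3" -> "123".
    text = re.sub(r"(?<=\d) (?=\d)", "", text)
    # Collapse any remaining whitespace runs.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("My social is one two three four five six seven eight nine"))
# My social is 123456789
```

This rewrite shifts character offsets, so a real pipeline would also keep a map back to the original text for the later annotation step. The digit-joining rule is deliberately crude and would merge unrelated adjacent numbers.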
Multi-Model Analysis
Input: Prepared text
Control: An ensemble of specialized ML models: Named Entity Recognition (NER) for names and organizations; pattern-enhanced detection for formatted data (SSN, credit card, etc.); context analysis for informal mentions; custom pattern matching for organization-specific data
Output: Candidate entities with classifications
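The ensemble step can be pictured as several independent detectors that each emit candidate spans in a shared shape. The sketch below is a simplified illustration, assuming two regex-based stand-ins: one for formatted data and one for informal mentions near an SSN cue. The `Candidate` fields, detector names, and fixed scores are hypothetical; NER and custom-pattern detectors are omitted because they need a trained model or configuration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int        # offset where the entity begins
    end: int          # offset just past the entity
    entity_type: str  # e.g. "SSN"
    detector: str     # which detector proposed the span
    score: float      # illustrative detector score in [0, 1]


SSN_FORMATTED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
NINE_DIGITS = re.compile(r"\b\d{9}\b")
SSN_CUES = re.compile(r"\b(ssn|social security|social)\b", re.IGNORECASE)


def pattern_detector(text: str) -> list[Candidate]:
    """Formatted data: a dashed SSN shape is strong evidence on its own."""
    return [Candidate(m.start(), m.end(), "SSN", "pattern", 0.9)
            for m in SSN_FORMATTED.finditer(text)]


def context_detector(text: str, window: int = 40) -> list[Candidate]:
    """Informal mentions: a bare 9-digit run counts only near an SSN cue."""
    found = []
    for m in NINE_DIGITS.finditer(text):
        preceding = text[max(0, m.start() - window):m.start()]
        if SSN_CUES.search(preceding):
            found.append(Candidate(m.start(), m.end(), "SSN", "context", 0.7))
    return found


def analyze(text: str) -> list[Candidate]:
    # NER and custom-pattern detectors would plug in here with the same
    # signature; they are left out of this sketch.
    detectors = [pattern_detector, context_detector]
    return [c for detect in detectors for c in detect(text)]
```

Run on normalized text, `analyze("My social is 123456789")` yields one context candidate, while the same nine digits with no nearby cue (for example in an order number) yield nothing.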
Confidence Scoring
Input: Candidate entities
Control: Confidence scoring based on model agreement across the ensemble, pattern strength, context indicators, and similarity to training data
Output: Scored entities with confidence levels
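The technical specifications describe the ensemble method as weighted voting with confidence calibration. A minimal sketch of that idea follows, assuming candidates that propose the exact same span are grouped together, a per-detector weight table, an agreement bonus, and a Platt-style sigmoid for calibration. All weights and the sigmoid parameters `a` and `b` are placeholders; in practice they would be fit on labeled data.

```python
import math
from collections import defaultdict

# Hypothetical per-detector weights; real weights would be fit on labeled data.
WEIGHTS = {"ner": 1.0, "pattern": 1.5, "context": 0.8, "custom": 1.2}


def calibrate(raw: float, a: float = 6.0, b: float = -3.0) -> float:
    """Platt-style sigmoid mapping a raw vote in [0, 1] to a confidence.
    a and b are placeholders; in practice they are fit on held-out data."""
    return 1.0 / (1.0 + math.exp(-(a * raw + b)))


def score(candidates):
    """candidates: iterable of (start, end, entity_type, detector, score) tuples.
    Returns {(start, end, entity_type): confidence}."""
    votes = defaultdict(list)
    for start, end, etype, detector, s in candidates:
        votes[(start, end, etype)].append((detector, s))

    confidences = {}
    for span, span_votes in votes.items():
        # Weighted average of the detectors' own scores ("pattern strength").
        total_w = sum(WEIGHTS[d] for d, _ in span_votes)
        weighted = sum(WEIGHTS[d] * s for d, s in span_votes) / total_w
        # Fraction of distinct detectors that proposed this span ("model agreement").
        agreement = len({d for d, _ in span_votes}) / len(WEIGHTS)
        confidences[span] = calibrate(0.8 * weighted + 0.2 * agreement)
    return confidences
```

With these placeholder values, a span proposed by both the pattern and context detectors scores higher than the same span proposed by one detector alone. Overlapping but non-identical spans would need a merging step that this sketch skips.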
Policy Application
Input: Scored entities
Control: Apply the detection policy: minimum confidence threshold; enabled/disabled data types; sensitivity level (conservative, balanced, or aggressive)
Output: Final detection results
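The three policy controls can be sketched as a small filter. This is an illustration under stated assumptions: each sensitivity level supplies a default threshold, an explicit minimum confidence overrides it, and "conservative" is read as flagging only high-confidence detections. The class name `DetectionPolicy`, the field names, and the threshold values are hypothetical; the product may define the levels differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Assumed default thresholds: "aggressive" flags more, "conservative" flags
# only high-confidence detections. The real product's values are not stated.
LEVEL_THRESHOLDS = {"conservative": 0.90, "balanced": 0.75, "aggressive": 0.50}

Scored = Tuple[int, int, str, float]  # (start, end, entity_type, confidence)


@dataclass
class DetectionPolicy:
    sensitivity: str = "balanced"
    min_confidence: Optional[float] = None  # explicit override of the level default
    disabled_types: Set[str] = field(default_factory=set)

    def threshold(self) -> float:
        if self.min_confidence is not None:
            return self.min_confidence
        return LEVEL_THRESHOLDS[self.sensitivity]


def apply_policy(scored: List[Scored], policy: DetectionPolicy) -> List[Scored]:
    """Keep detections whose type is enabled and whose confidence clears the threshold."""
    cutoff = policy.threshold()
    return [d for d in scored if d[2] not in policy.disabled_types and d[3] >= cutoff]
```

For example, a PERSON detection at 0.6 confidence is dropped under the balanced default but kept under the aggressive level.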
Annotation and Logging
Input: Final detection results
Control: Annotate the original text and create an audit record
Output: Annotated text ready for masking, plus a detection log entry
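One way to picture this final step is a pair of functions: one that marks detected spans in the original text and one that writes a log entry. The sketch below assumes inline `<TYPE>...</TYPE>` markers and a JSON record that stores only entity types and offsets, so the log never holds the sensitive values themselves. The function names and record fields are hypothetical, not the product's log schema.

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple

Detection = Tuple[int, int, str]  # (start, end, entity_type), non-overlapping


def annotate(text: str, detections: List[Detection]) -> str:
    """Wrap each detected span in <TYPE>...</TYPE> markers for the masking step."""
    parts, last = [], 0
    for start, end, etype in sorted(detections):
        parts.append(text[last:start])
        parts.append(f"<{etype}>{text[start:end]}</{etype}>")
        last = end
    parts.append(text[last:])
    return "".join(parts)


def audit_record(detections: List[Detection], request_id: str) -> str:
    """Build a JSON log entry with types and offsets only, never raw values."""
    return json.dumps({
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "detections": [{"type": t, "start": s, "end": e}
                       for s, e, t in sorted(detections)],
    })
```

Because `audit_record` never receives the text, it cannot leak a detected value into the log; the offsets are enough to tie each entry back to the request if needed.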
Technical Specifications
Detection Performance
Model Architecture
| Component | Details |
|---|---|
| Primary models | Transformer-based NER (fine-tuned) |
| Pattern layer | Regex + format validation |
| Context layer | Contextual embedding analysis |
| Ensemble method | Weighted voting with confidence calibration |
| Training data | Healthcare, finance, and legal domain corpora |
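The pattern layer's "regex + format validation" can be illustrated with payment card numbers: a loose regex proposes digit runs, and the Luhn checksum rejects most random numbers that merely look like cards. This is a minimal sketch, assuming 13 to 19 digits with optional single spaces or dashes; `find_cards` is a hypothetical name, and real card detection would also check issuer prefixes.

```python
import re

# 13-19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn mod-10 check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str):
    """Return (start, end, digits) for regex matches that also pass Luhn."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append((m.start(), m.end(), digits))
    return hits
```

The well-known test number `4111 1111 1111 1111` passes, while changing its last digit makes the checksum fail, which is the kind of false positive format validation is meant to remove.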
Supported Data Types
Personal Identifiers
| Data Type | Examples | Risk Level |
|---|---|---|
| Full Name | John Smith, Dr. Sarah Chen | Medium |
| Social Security Number | 123-45-6789, xxx-xx-1234 | High |
| Date of Birth | 03/15/1985, March 15 1985 | Medium |
| Driver's License | State-specific formats | High |
| Passport Number | Country-specific formats | High |
| National ID | Various international formats | High |
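For SSNs, format validation can go beyond the `123-45-6789` shape. The Social Security Administration does not issue numbers with area 000, 666, or 900-999, group 00, or serial 0000, so those can be rejected outright. The sketch below applies only those structural rules to dashed numbers; it does not attempt the partially masked form (`xxx-xx-1234`) listed in the table, and `find_ssns` is a hypothetical name.

```python
import re

SSN_SHAPE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Structural rules: SSNs are never issued with area 000, 666, or 900-999,
    group 00, or serial 0000."""
    return (area not in ("000", "666") and not area.startswith("9")
            and group != "00" and serial != "0000")


def find_ssns(text: str) -> list[str]:
    """Return dashed SSN-shaped strings that pass the structural rules."""
    return [m.group() for m in SSN_SHAPE.finditer(text) if plausible_ssn(*m.groups())]
```

Passing these rules does not mean a number was actually issued; it only removes shapes that cannot be valid SSNs.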
Healthcare (PHI)
| Data Type | Examples | Risk Level |
|---|---|---|
| Medical Record Number | Facility-specific formats | High |
| Health Plan ID | Insurance member IDs | High |
| Diagnosis Code | ICD-10 codes | Medium |
| Medication | Drug names with dosages | Medium |
| Lab Results | Test values with units | Medium |
| Provider NPI | 10-digit NPI numbers | Medium |
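Provider NPIs carry their own check digit, which is a cheap way to separate real NPIs from arbitrary 10-digit numbers. The NPI check digit uses the Luhn algorithm with the constant prefix `80840` prepended to the number. The sketch below applies that rule; the function names are illustrative.

```python
import re


def luhn_valid(number: str) -> bool:
    """Standard Luhn mod-10 check over a string of ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def npi_valid(npi: str) -> bool:
    """An NPI's last digit is a Luhn check digit computed with the prefix 80840."""
    return re.fullmatch(r"[0-9]{10}", npi) is not None and luhn_valid("80840" + npi)
```

For example, `npi_valid("1234567893")` is true, while `npi_valid("1234567890")` is false because its final digit does not match the checksum.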
Financial Data
Authentication & Security
Custom Patterns
Define custom detection patterns for proprietary identifiers, internal codes, or industry-specific data types not covered by default models.
| Option | Description |
|---|---|
| Regex patterns | Define custom regex for proprietary identifiers |
| Keyword lists | Match against lists of sensitive terms |
| Contextual rules | Combine patterns with context requirements |
| Validation logic | Add format validation to reduce false positives |
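The four custom-pattern options combine naturally in a single rule definition. The sketch below is a hypothetical shape for such a rule, not the product's configuration format: a regex for the identifier, an optional keyword list, context words that must appear nearby, and a validation callback. The `EMP-` employee ID and the "Project Falcon" codename are made-up examples.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CustomPattern:
    name: str
    regex: Optional[str] = None                         # proprietary identifier format
    keywords: List[str] = field(default_factory=list)   # sensitive terms
    context: List[str] = field(default_factory=list)    # words required nearby
    validate: Optional[Callable[[str], bool]] = None     # extra format check
    window: int = 30                                     # characters searched for context

    def find(self, text: str) -> List[Tuple[int, int, str]]:
        hits = []
        if self.regex:
            for m in re.finditer(self.regex, text):
                if self.validate and not self.validate(m.group()):
                    continue  # fails validation logic
                if self.context and not self._has_context(text, m.start(), m.end()):
                    continue  # fails contextual rule
                hits.append((m.start(), m.end(), self.name))
        for kw in self.keywords:
            for m in re.finditer(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE):
                hits.append((m.start(), m.end(), self.name))
        return sorted(hits)

    def _has_context(self, text: str, start: int, end: int) -> bool:
        # Naive substring check within a window around the match.
        near = text[max(0, start - self.window):end + self.window].lower()
        return any(word.lower() in near for word in self.context)


emp = CustomPattern("EMPLOYEE_ID", regex=r"\bEMP-\d{6}\b",
                    context=["employee", "staff"],
                    validate=lambda s: s[-6:] != "000000")
print(emp.find("Employee EMP-104233 joined"))
```

Here the context requirement keeps a bare `EMP-104233` in an unrelated sentence from being flagged, and the validation callback rejects an all-zero placeholder ID.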
Detection in Action
Healthcare
Clinical staff prompting AI to draft patient communications
Detected: Patient names, MRNs, diagnoses, medications, dates of service, provider names
Outcome: All 18 HIPAA identifiers detected before reaching the LLM
Compliance: HIPAA access documentation satisfied
Financial Services
Advisors using AI to analyze client portfolios
Detected: Client names, account numbers, SSNs, portfolio values, transaction details
Outcome: Client PII protected; AI still provides useful financial analysis
Compliance: GLBA and SEC data handling requirements supported
Legal
Attorneys using AI to research case law and draft documents
Detected: Party names, case numbers, privileged communications, settlement amounts
Outcome: Confidential case details protected; AI assists with legal research
Compliance: Attorney-client privilege maintained
Enterprise
Employees using AI for everyday work tasks
Detected: Employee IDs, internal project names, customer data, API credentials
Outcome: AI usage brought under security control with consistent protection
Compliance: SOC 2 data classification requirements met
Frequently Asked Questions
How does detection handle informal mentions like "my social is..."?
Can I add custom detection patterns?
What's the false positive rate?
Does detection work on non-English text?
How do I tune detection for my specific data?
What about images and documents?
Related Features
Context-Preserving Masking
After detection, our masking replaces sensitive data with semantic tokens that maintain meaning for accurate AI responses.
Secure Unmasking
Our Reveal Technology restores original values in AI responses for authorized users, with full access controls.
Compliance Automation
Detection logs feed directly into compliance reporting for HIPAA, SOC 2, and other frameworks.
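The masking and unmasking round trip described under Related Features can be sketched in a few lines. This is an illustration, not Reveal Technology itself: it assumes typed tokens such as `[NAME_1]`, reuses one token for repeated values so the model can still tell entities apart, and restores values only when an `authorized` flag is set. The token format and function names are hypothetical, and a real system would store the token map securely per session.

```python
import re
from typing import Dict, List, Tuple

TOKEN_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def mask(text: str, detections: List[Tuple[int, int, str]]) -> Tuple[str, Dict[str, str]]:
    """Replace each detected span with a typed token such as [NAME_1].
    Repeated values reuse one token so references stay consistent."""
    token_for: Dict[Tuple[str, str], str] = {}
    counts: Dict[str, int] = {}
    parts, last = [], 0
    for start, end, etype in sorted(detections):
        key = (etype, text[start:end])
        if key not in token_for:
            counts[etype] = counts.get(etype, 0) + 1
            token_for[key] = f"[{etype}_{counts[etype]}]"
        parts.append(text[last:start])
        parts.append(token_for[key])
        last = end
    parts.append(text[last:])
    mapping = {token: value for (_, value), token in token_for.items()}
    return "".join(parts), mapping


def unmask(response: str, mapping: Dict[str, str], authorized: bool) -> str:
    """Restore original values in a model response, but only for authorized users."""
    if not authorized:
        return response
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(), m.group()), response)
```

Masking "John Smith called about MRN 12345." with NAME and MRN spans yields "[NAME_1] called about MRN [MRN_1].", and an authorized `unmask` of a response that mentions those tokens restores the original values.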
