Features

Real-Time PII/PHI Detection That Catches What Others Miss

ML-powered detection identifies 40+ sensitive data types in milliseconds—including context-dependent information that regex patterns can't catch. Purpose-built for AI workflows where accuracy and speed both matter.

40+ data types — SSNs, credit cards, PHI, and custom patterns

Context-aware — Catches "my social is 1234" that regex misses

Sub-15ms latency — Protection without workflow disruption

Why Detection Accuracy Matters

Catch What Regex Misses

Traditional DLP relies on pattern matching, which fails on natural language. A regex written for digit formats won't match "My social is one two three...". ML-powered detection understands the context and catches it anyway.

High accuracy on standard PII benchmarks

Cover All the Data Types That Matter

From SSNs and credit cards to medical record numbers and custom identifiers, our detection covers the sensitive data types relevant to healthcare, finance, legal, and enterprise.

40+ data types out of the box, custom patterns available

Protect Without Slowing Down AI

Detection runs in under 15ms on typical requests. Your team won't notice protection is happening—they'll just notice their data is safe.

<15ms average detection latency

Meet Compliance Detection Requirements

HIPAA requires identifying PHI. SOC 2 requires data classification. Our detection provides the foundation for both, and its detection logs document what was found.

Detection logs exportable for compliance reporting

Reduce False Positives with Tuned Models

Detection that cries wolf undermines adoption. Our models are tuned for a false positive rate below 0.5%, with adjustable sensitivity for your specific needs.

<0.5% false positive rate on production workloads

Scale Detection Across All AI Tools

Annotate original text and create audit record

Output

Annotated text ready for masking + detection log entry
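
As a rough illustration of this final step, the sketch below annotates detected spans and builds a log entry. It assumes detections arrive as character offsets. The `Detection` type, the inline tag format, and the log fields are hypothetical names chosen for the example, not the product's actual output format.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Detection:
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    data_type: str     # e.g. "SSN"
    confidence: float


def annotate(text, detections):
    """Wrap each detected span in an inline tag (hypothetical format)."""
    out = text
    # Work right to left so earlier offsets stay valid as the text grows.
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        tagged = f"[{d.data_type}]{out[d.start:d.end]}[/{d.data_type}]"
        out = out[:d.start] + tagged + out[d.end:]
    return out


def audit_record(text, detections):
    """Build a log entry that describes the findings without storing raw values."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "detections": [
            {"type": d.data_type, "start": d.start, "end": d.end,
             "confidence": d.confidence}
            for d in detections
        ],
    }


text = "My SSN is 123-45-6789."
found = [Detection(start=10, end=21, data_type="SSN", confidence=0.99)]
annotated = annotate(text, found)
record = audit_record(text, found)
print(annotated)
print(json.dumps(record, indent=2))
```

Logging offsets and a hash of the text, rather than the matched value itself, is one way to keep the audit trail from becoming a second copy of the sensitive data.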

Technical Specifications

Detection Performance

Metric	Value
Average latency	<15ms
P95 latency	<30ms
Accuracy (standard benchmark)	High
False positive rate	<0.5%
False negative rate	<1%
Throughput	10,000+ requests/second
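
To compare these rates against your own evaluation set, the sketch below computes them from confusion-matrix counts. It uses the standard definitions (false positives over all true negatives plus false positives, and false negatives over all true positives plus false negatives). Whether the figures above are measured this way is an assumption, and the counts in the example are invented purely for illustration.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate) from raw counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Illustrative counts only; substitute the results of your own labeled sample.
fpr, fnr = error_rates(tp=992, fp=4, tn=996, fn=8)
print(f"false positive rate: {fpr:.2%}")
print(f"false negative rate: {fnr:.2%}")
```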

Model Architecture

Component	Description
Primary models	Transformer-based NER (fine-tuned)
Pattern layer	Regex + format validation
Context layer	Contextual embedding analysis
Ensemble method	Weighted voting with confidence calibration
Training data	Healthcare, finance, legal domain corpora
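
To make the ensemble row concrete, here is a minimal sketch of weighted voting across the three layers. The layer names, the weights, and the `weighted_vote` function are hypothetical. The sketch also assumes each layer's confidence is already calibrated, so it does not show the calibration step itself.

```python
from collections import defaultdict

# Hypothetical weights; a real system would learn or tune these.
WEIGHTS = {"ner": 0.5, "pattern": 0.3, "context": 0.2}


def weighted_vote(votes, weights=WEIGHTS):
    """Combine per-layer votes into a single label and score.

    votes maps a layer name to (label, confidence in [0, 1]).
    The score is the winning label's weighted confidence, normalized
    by the total weight of the layers that voted.
    """
    scores = defaultdict(float)
    for layer, (label, confidence) in votes.items():
        scores[label] += weights[layer] * confidence
    total_weight = sum(weights[layer] for layer in votes)
    best = max(scores, key=scores.get)
    return best, scores[best] / total_weight


votes = {
    "ner": ("SSN", 0.9),
    "pattern": ("SSN", 0.8),
    "context": ("NONE", 0.6),
}
label, score = weighted_vote(votes)
print(label, round(score, 2))
```

Here two layers agree on "SSN" and outweigh the dissenting context layer, so the ensemble reports "SSN" with a combined score lower than either agreeing layer's own confidence.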

Supported Data Types

Personal Identifiers

Personal identifier data types with examples and risk levels
Data Type	Examples	Risk Level
Full Name	John Smith, Dr. Sarah Chen	Medium
Social Security Number	123-45-6789, xxx-xx-1234	High
Date of Birth	03/15/1985, March 15 1985	Medium
Driver's License	State-specific formats	High
Passport Number	Country-specific formats	High
National ID	Various international formats	High

Healthcare (PHI)

Healthcare Protected Health Information data types with examples and risk levels
Data Type	Examples	Risk Level
Medical Record Number	Facility-specific formats	High
Health Plan ID	Insurance member IDs	High
Diagnosis Code	ICD-10 codes	Medium
Medication	Drug names with dosages	Medium
Lab Results	Test values with units	Medium
Provider NPI	10-digit NPI numbers	Medium

Financial Data

Data Type	Risk Level
Credit Card Number	High
Bank Account Number	High
Routing Number	Medium
Financial Account ID	High
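
For credit card numbers, a pattern layer of the "regex + format validation" kind is often built from a loose digit pattern followed by the Luhn checksum. The sketch below shows that generic technique. It is not a description of the product's internals, and `CARD_CANDIDATE`, `luhn_valid`, and `find_card_numbers` are names invented for the example.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Starting from the rightmost digit, double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


def find_card_numbers(text):
    """Return candidate strings that match the pattern and pass Luhn."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text)
            if luhn_valid(m.group())]


sample = "Card: 4111 1111 1111 1111, order 1234567890123456"
cards = find_card_numbers(sample)
print(cards)
```

The checksum step is what keeps a 16-digit order number from being flagged as a card, which is one simple way format validation reduces false positives.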

Authentication & Security

Data Type	Risk Level
API Key	Critical
Password	Critical
OAuth Token	Critical
Private Key	Critical
Certificate	High

Custom Patterns

Define custom detection patterns for proprietary identifiers, internal codes, or industry-specific data types not covered by default models.

Regex patterns

Define custom regex for proprietary identifiers
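
As a sketch of what a custom regex pattern might look like before you register it, the example below defines two made-up identifier formats and scans text for them. The `CustomPattern` structure, the `EMP-` and `PRJ-` formats, and the risk levels are assumptions for illustration. The product's own configuration format may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomPattern:
    name: str
    regex: re.Pattern
    risk_level: str


# Hypothetical proprietary identifiers.
PATTERNS = [
    CustomPattern("EMPLOYEE_ID", re.compile(r"\bEMP-\d{6}\b"), "Medium"),
    CustomPattern("PROJECT_CODE", re.compile(r"\bPRJ-[A-Z]{3}-\d{4}\b"), "High"),
]


def scan(text, patterns=PATTERNS):
    """Return (name, match, start, end, risk_level) tuples in text order."""
    hits = []
    for p in patterns:
        for m in p.regex.finditer(text):
            hits.append((p.name, m.group(), m.start(), m.end(), p.risk_level))
    return sorted(hits, key=lambda h: h[2])


hits = scan("Ask EMP-004217 about PRJ-ABC-2024.")
for hit in hits:
    print(hit)
```

Testing a pattern locally against sample text like this, including strings that should not match, is a cheap way to catch overly broad expressions.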

How does detection handle informal mentions like "my social is..."?

Our context-aware models are trained on natural language patterns. They detect SSNs mentioned conversationally ("my social is one two three...") as well as formatted patterns ("123-45-6789").
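
To see why a format-only pattern misses the conversational case, consider the toy below. It is a rule-based stand-in, not the trained models described above: it looks for a cue word such as "social", then converts the spoken digits that follow into the formatted form. The cue list, filler words, and function name are all assumptions made for the example.

```python
import re

WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
FILLERS = {"is", "was", "number"}
CUE = re.compile(r"\b(?:social(?: security)?(?: number)?|ssn)\b", re.IGNORECASE)


def find_spoken_ssn(text):
    """Return an SSN in NNN-NN-NNNN form if nine digits follow a cue word."""
    cue = CUE.search(text)
    if cue is None:
        return None
    # Split what follows the cue into words and single digits.
    tokens = re.findall(r"[a-z]+|\d", text[cue.end():].lower())
    digits = []
    for tok in tokens:
        if tok in WORD_TO_DIGIT:
            digits.append(WORD_TO_DIGIT[tok])
        elif tok.isdigit():
            digits.append(tok)
        elif tok in FILLERS and not digits:
            continue  # skip "is" and similar words before the number starts
        else:
            break     # any other word ends the digit run
    if len(digits) != 9:
        return None
    s = "".join(digits)
    return f"{s[:3]}-{s[3:5]}-{s[5:]}"


spoken = find_spoken_ssn("my social is one two three four five six seven eight nine")
print(spoken)
```

Hand-written rules like these break quickly on paraphrases ("the last four of my social..."), which is the gap that context-aware models are meant to close.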

Can I add custom detection patterns?

Yes. Team and Enterprise plans support custom regex patterns, keyword lists, and contextual rules. You can define proprietary identifiers, internal project codes, or industry-specific data types.

What's the false positive rate?

<0.5% on production workloads with balanced sensitivity. You can adjust detection sensitivity (conservative, balanced, aggressive) to tune the tradeoff between false positives and false negatives.
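
One common way to implement sensitivity levels like these is a confidence threshold per level, sketched below. The threshold values are invented, and the sketch assumes that "aggressive" flags more candidates (fewer false negatives, more false positives) while "conservative" flags fewer. Check the product documentation for the actual semantics.

```python
# Hypothetical thresholds: a candidate is kept if its confidence meets the bar.
THRESHOLDS = {"conservative": 0.90, "balanced": 0.75, "aggressive": 0.50}


def filter_detections(candidates, sensitivity="balanced"):
    """Keep (data_type, confidence) candidates at or above the level's threshold."""
    threshold = THRESHOLDS[sensitivity]
    return [c for c in candidates if c[1] >= threshold]


candidates = [("SSN", 0.95), ("NAME", 0.80), ("DOB", 0.60)]
for level in ("conservative", "balanced", "aggressive"):
    kept = filter_detections(candidates, level)
    print(level, [data_type for data_type, _ in kept])
```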

Does detection work on non-English text?

Detection currently performs best on English text. We're expanding language support. Contact us for current coverage of your specific languages.

How do I tune detection for my specific data?

Start with our default models and monitor detection results. Adjust sensitivity levels, enable/disable specific data types, and add custom patterns as needed. Enterprise customers get tuning assistance from our team.

What about images and documents?

We detect PII in text extracted from uploaded documents (PDFs, Word docs). Image OCR and direct image PII detection are on our roadmap.

Related Features

Context-Preserving Masking

After detection, our masking replaces sensitive data with semantic tokens that maintain meaning for accurate AI responses.

Secure Unmasking

Our Reveal Technology restores original values in AI responses for authorized users—with full access controls.

Compliance Automation

Detection logs feed directly into compliance reporting for HIPAA, SOC 2, and other frameworks.

Ready to See Detection in Action?

Try our interactive demo with sample data—or get started for free with your own content.

Demo uses synthetic data • No credit card required • Full detection capabilities