
PII/PHI Detection Guide

This guide explains the sensitive data types that require protection in AI workflows (more than 40 in total), how to detect them, and how to classify their risk.

35 min read | Reference guide | For security teams


Sensitive Data Categories

Six primary categories of sensitive data that require protection in AI workflows.

Personal Identifiers

Risk level: high

Data that directly identifies an individual.

Examples: full name, Social Security number, driver's license number, passport number

Detection: pattern matching + context analysis
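Pattern matching combined with context analysis can be sketched as follows. This is a minimal illustration, not a production detector: the keyword list, the 40-character window, and the `find_ssns` name are assumptions made for the example. The structural checks reflect number ranges the Social Security Administration does not issue.

```python
import re

# Candidate SSN shape: three digits, two digits, four digits, hyphen-separated.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

# Hypothetical context keywords; a real deployment would tune this list.
CONTEXT_KEYWORDS = ("ssn", "social security", "soc sec")


def find_ssns(text: str, window: int = 40) -> list[dict]:
    """Return SSN-shaped matches, rated by whether a context keyword is nearby."""
    results = []
    lowered = text.lower()
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        # Structural check: area 000, 666, and 900-999 are never issued,
        # nor are group 00 or serial 0000.
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        # Context check: look for a keyword within `window` characters.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        has_context = any(k in lowered[start:end] for k in CONTEXT_KEYWORDS)
        results.append({
            "value": match.group(),
            "span": match.span(),
            "confidence": "high" if has_context else "low",
        })
    return results
```

A digit string with no supporting keyword is still returned, but with low confidence, so a later policy step can decide whether to mask it.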

Contact Information

Risk level: high

Data used to contact or locate an individual.

Examples: email address, phone number, physical address, IP address

Detection: regex patterns + format validation
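One way to pair a regex with format validation is to let the pattern find candidates and a stricter parser confirm them. The sketch below uses Python's standard `ipaddress` module for the validation step. The email pattern is deliberately simple and is an assumption of this example, not a full address grammar.

```python
import ipaddress
import re

# Simplified email shape; real address syntax is considerably broader.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Four dot-separated digit groups; octet ranges are checked during validation.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_emails(text: str) -> list[str]:
    """Return substrings that look like email addresses."""
    return EMAIL_PATTERN.findall(text)


def find_ipv4(text: str) -> list[str]:
    """Return IPv4-shaped substrings that also parse as valid addresses."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        found.append(candidate)
    return found
```

The validation step is what separates `192.168.1.1` from a version string or measurement such as `999.10.10.10`, which has the same shape but is not an address.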

Financial Data

Risk level: high

Banking, payment, and financial account information.

Examples: credit card numbers, bank account numbers, routing numbers, financial records

Detection: Luhn algorithm + pattern matching
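The Luhn check mentioned here is a checksum over the card digits: starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the total must be divisible by 10. A minimal sketch follows; the candidate pattern and function names are assumptions for illustration.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return card-shaped substrings whose digits pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The checksum filters out most random digit runs (order numbers, timestamps) that happen to have a card-like length, which is why it is usually applied after the pattern match rather than instead of it.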

Health Information (PHI)

Risk level: high

Medical records and health-related data.

Examples: medical records, prescriptions, lab results, insurance IDs

Detection: healthcare terminology + identifier patterns
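Combining healthcare terminology with identifier patterns can be as simple as requiring a medical label directly before the identifier. The sketch below assumes a small label vocabulary and a 6 to 10 digit identifier. Medical record number formats vary between providers, so both are illustrative assumptions.

```python
import re

# Assumed label vocabulary and identifier length; neither is a standard.
MRN_PATTERN = re.compile(
    r"\b(?:MRN|medical record(?: number| no\.?)?)\s*[:#]?\s*(\d{6,10})\b",
    re.IGNORECASE,
)


def find_mrns(text: str) -> list[str]:
    """Return identifier digits that directly follow a medical-record label."""
    return [match.group(1) for match in MRN_PATTERN.finditer(text)]
```

Anchoring on the label keeps an unrelated eight-digit number, such as an invoice ID, from being reported as PHI.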

Authentication Data

Risk level: high

Credentials and access tokens.

Examples: passwords, API keys, OAuth tokens, SSH keys

Detection: entropy analysis + pattern matching
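Entropy analysis relies on the observation that generated secrets look more random than ordinary words. A common measure is Shannon entropy in bits per character. The sketch below is an illustration only: the 20-character minimum and the 4.0 threshold are assumed values that would need tuning against real data.

```python
import math
import re
from collections import Counter

# Long runs of token-like characters are treated as candidates.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_\-]{20,}")


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of `value` in bits per character."""
    counts = Counter(value)
    length = len(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def find_high_entropy_tokens(text: str, threshold: float = 4.0) -> list[str]:
    """Return candidate tokens whose entropy meets the (assumed) threshold."""
    return [t for t in TOKEN_PATTERN.findall(text) if shannon_entropy(t) >= threshold]
```

Entropy alone flags hashes and UUIDs as readily as secrets, so it is usually paired with known key prefixes or surrounding keywords.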

Location Data

Risk level: medium

Geographic and location information.

Examples: GPS coordinates, street addresses, ZIP codes, geolocation data

Detection: geographic format patterns
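Geographic format patterns work best when the match is followed by a range check. The sketch below looks for decimal-degree coordinate pairs and US ZIP codes. Both patterns are assumptions for illustration and cover only these two formats.

```python
import re

# Decimal-degree pair such as "40.7128, -74.0060"; a fractional part is
# required, and the lookbehind avoids starting inside a longer number.
COORD_PATTERN = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")

# US ZIP code, with optional ZIP+4 extension.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def find_coordinates(text: str) -> list[tuple[float, float]]:
    """Return (latitude, longitude) pairs that fall within valid ranges."""
    pairs = []
    for match in COORD_PATTERN.finditer(text):
        lat, lon = float(match.group(1)), float(match.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs


def find_zip_codes(text: str) -> list[str]:
    """Return ZIP-shaped substrings; bare five-digit hits are ambiguous."""
    return ZIP_PATTERN.findall(text)
```

A bare five-digit number matches many things besides ZIP codes, so in practice such hits would be treated as low confidence unless address context is nearby.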

PII/PHI Types Reference

Common sensitive data types, with the detection methods (regex, ML) that apply to each.

Data Type	Category	Risk	Regex	ML	Example
SSN	Identifier	high	Yes	Yes	123-45-6789
Full Name	Identifier	high	-	Yes	John Smith
Email	Contact	high	Yes	-	john@example.com
Phone	Contact	medium	Yes	-	(555) 123-4567
Credit Card	Financial	high	Yes	-	4111-1111-1111-1111
Bank Account	Financial	high	Yes	Yes	Account: 12345678
Address	Location	medium	-	Yes	123 Main St, City
Date of Birth	Identifier	medium	Yes	Yes	01/15/1990
IP Address	Contact	medium	Yes	-	192.168.1.1
Medical Record	PHI	high	Yes	Yes	MRN: 12345678
API Key	Credential	high	Yes	Yes	sk_live_xxx...
Password	Credential	high	-	Yes	Password123!

Detection Methods

Understanding different approaches to PII/PHI detection.

Pattern Matching (Regex)

Rules-based detection using regular expressions for structured data formats.

Strengths

+ High precision for structured data
+ Fast execution
+ Predictable, repeatable results

Limitations

- Limited understanding of context
- Cannot detect unstructured PII such as names in free text
- Patterns need ongoing maintenance as formats change

Best for: SSN, credit cards, email, phone numbers
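A rules-based detector of this kind is often organized as a registry of named patterns that are all run over the same text. The sketch below shows that structure with three of the types listed above. The patterns are simplified assumptions, and `Finding` and `scan` are names invented for the example.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    data_type: str
    value: str
    start: int
    end: int


# Simplified patterns for illustration; production rules are usually stricter.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "Phone": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.]\d{4}\b"),
}


def scan(text: str) -> list[Finding]:
    """Run every registered pattern and return findings in document order."""
    findings = []
    for data_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(data_type, match.group(), match.start(), match.end()))
    return sorted(findings, key=lambda finding: finding.start)
```

Adding a data type means adding one entry to `PATTERNS`, which keeps the maintenance cost visible in a single place.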

Machine Learning (NER)

Named entity recognition models trained to identify PII in unstructured text.

Strengths

+ Handles context and variations in wording
+ Detects unstructured PII
+ Can improve as the model is retrained on new examples

Limitations

- Requires labeled training data
- Can produce false positives and miss some entities
- Computationally intensive

Best for: Names, addresses, free-form text

Hybrid Approach

Combines regex patterns with ML models for comprehensive detection.

Strengths

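+ Regex precision for structured formats plus ML coverage of free text
+ Typically higher recall and precision than either method alone
+ Handles edge cases that one method would miss

Limitations

- More complex to implement
- Requires tuning
- Higher latency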

Best for: Enterprise-grade protection
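A hybrid detector can be sketched as two sources of findings merged into one list, with overlaps resolved in favor of the higher score. In the sketch below the ML side is a hard-coded stand-in (`toy_name_detector`), not a real model. It exists only so the merge logic can run, and every name and score here is an assumption.

```python
import re
from typing import Callable, NamedTuple


class Finding(NamedTuple):
    data_type: str
    start: int
    end: int
    score: float
    source: str


SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_detect(text: str) -> list[Finding]:
    """Rules-based pass: regex hits are given a fixed score of 1.0."""
    return [Finding("SSN", m.start(), m.end(), 1.0, "regex")
            for m in SSN_PATTERN.finditer(text)]


def toy_name_detector(text: str) -> list[Finding]:
    """Stand-in for an NER model: flags one hard-coded name. Not a real model."""
    return [Finding("Full Name", m.start(), m.end(), 0.85, "ml")
            for m in re.finditer(r"John Smith", text)]


def hybrid_detect(
    text: str,
    ml_detect: Callable[[str], list[Finding]],
    min_score: float = 0.5,
) -> list[Finding]:
    """Merge regex and ML findings, keeping the higher score on overlap."""
    candidates = regex_detect(text)
    candidates += [f for f in ml_detect(text) if f.score >= min_score]
    candidates.sort(key=lambda f: (-f.score, f.start))
    accepted: list[Finding] = []
    for candidate in candidates:
        # Accept only findings that do not overlap an already accepted span.
        if all(candidate.end <= a.start or candidate.start >= a.end for a in accepted):
            accepted.append(candidate)
    return sorted(accepted, key=lambda f: f.start)
```

For example, `hybrid_detect("John Smith, SSN 123-45-6789", toy_name_detector)` returns the name from the stand-in and the SSN from the regex pass. Raising `min_score` above 0.85 drops the name.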

Risk Classification Matrix

How to prioritize protection based on data sensitivity.

High Risk

Data that can directly lead to identity theft, financial loss, or regulatory violations.

SSN, Credit Card, Bank Account, Medical Records, Passwords, API Keys

Always mask before AI transmission

Medium Risk

Data that could contribute to identification when combined with other information.

Phone Number, Email Address, Date of Birth, IP Address, Employee ID

Mask based on context and policy

Lower Risk

Data with limited sensitivity but still requiring consideration in aggregate.

First Name Only, City/State, Job Title, Organization Name

Monitor and log for audit
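The matrix above maps naturally onto a lookup from data type to risk tier to handling action. The sketch below encodes the three tiers as listed. The action names and the choice to treat unknown types as high risk are assumptions of this example.

```python
from enum import Enum


class Risk(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOWER = "lower"


# Tier membership taken from the matrix above.
RISK_BY_TYPE = {
    "SSN": Risk.HIGH, "Credit Card": Risk.HIGH, "Bank Account": Risk.HIGH,
    "Medical Records": Risk.HIGH, "Passwords": Risk.HIGH, "API Keys": Risk.HIGH,
    "Phone Number": Risk.MEDIUM, "Email Address": Risk.MEDIUM,
    "Date of Birth": Risk.MEDIUM, "IP Address": Risk.MEDIUM,
    "Employee ID": Risk.MEDIUM,
    "First Name Only": Risk.LOWER, "City/State": Risk.LOWER,
    "Job Title": Risk.LOWER, "Organization Name": Risk.LOWER,
}

# Hypothetical action names, one per tier.
HANDLING = {
    Risk.HIGH: "mask",
    Risk.MEDIUM: "mask_per_policy",
    Risk.LOWER: "log_only",
}


def handling_for(data_type: str) -> str:
    """Return the handling action; unknown types default to the strictest tier."""
    risk = RISK_BY_TYPE.get(data_type, Risk.HIGH)
    return HANDLING[risk]
```

Defaulting unknown types to the strictest tier is a conservative design choice: a new data type gets masked until someone classifies it.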

Implementation Steps

How to implement PII/PHI detection in your organization.

Data Discovery

Identify where sensitive data exists in your AI workflows

Audit current AI tool usage
Map data flows to AI systems
Identify data sources

Classification

Categorize data by type and risk level

Apply data taxonomy
Assign risk levels
Document data lineage

Detection Configuration

Configure detection rules for your data types

Enable built-in detectors
Create custom patterns
Set confidence thresholds
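Confidence thresholds can be expressed as a per-type table with a default. The sketch below is illustrative only: the threshold values, the `Detection` type, and the function name are assumptions, not settings from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    data_type: str
    value: str
    confidence: float


# Hypothetical values: a lower bar for high-risk types favors recall,
# a higher bar for noisy types such as names favors precision.
DEFAULT_THRESHOLD = 0.8
THRESHOLDS = {"SSN": 0.5, "Full Name": 0.85}


def apply_thresholds(
    detections: list[Detection],
    thresholds: dict[str, float] = THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> list[Detection]:
    """Keep detections whose confidence meets the threshold for their type."""
    return [
        d for d in detections
        if d.confidence >= thresholds.get(d.data_type, default)
    ]
```

Setting a lower bar for high-risk types accepts more false positives in exchange for fewer misses, which is usually the safer trade for data such as SSNs.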

Protection Policies

Define how each data type should be handled

Set masking rules
Configure reveal permissions
Establish exceptions
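Masking rules and reveal permissions can be sketched as a pair of functions: one replaces each detected value with a placeholder and records the original, the other restores values only for roles allowed to see that type. The role names, placeholder format, and in-memory vault below are assumptions for illustration. A real system would store originals securely.

```python
import re

RULES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}

# Hypothetical roles permitted to reveal each data type.
REVEAL_ROLES = {"SSN": {"compliance"}, "EMAIL": {"compliance", "support"}}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with numbered placeholders; return text and a vault."""
    vault: dict[str, str] = {}
    counter = 0
    for label, pattern in RULES.items():
        def substitute(match: re.Match, label: str = label) -> str:
            nonlocal counter
            counter += 1
            placeholder = f"[{label}_{counter}]"
            vault[placeholder] = match.group()
            return placeholder
        text = pattern.sub(substitute, text)
    return text, vault


def reveal(masked: str, vault: dict[str, str], role: str) -> str:
    """Restore only the values that `role` is permitted to see."""
    for placeholder, original in vault.items():
        label = placeholder[1:].rsplit("_", 1)[0]
        if role in REVEAL_ROLES.get(label, set()):
            masked = masked.replace(placeholder, original)
    return masked
```

Numbered placeholders keep distinct values distinguishable in the masked text, so an AI response that echoes a placeholder can still be mapped back to the right original.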

Monitoring

Continuously monitor and improve detection accuracy

Review detection logs
Tune false positives
Update patterns
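Detection accuracy is usually tracked with precision and recall, computed from log entries that a reviewer has labeled. The sketch below assumes each reviewed entry records whether the detector flagged it (`flagged`) and whether it was truly sensitive (`sensitive`). Both field names are hypothetical.

```python
def detection_metrics(reviewed: list[dict]) -> dict[str, float]:
    """Compute precision and recall from manually reviewed log entries."""
    true_pos = sum(1 for r in reviewed if r["flagged"] and r["sensitive"])
    false_pos = sum(1 for r in reviewed if r["flagged"] and not r["sensitive"])
    false_neg = sum(1 for r in reviewed if not r["flagged"] and r["sensitive"])
    flagged_total = true_pos + false_pos
    sensitive_total = true_pos + false_neg
    return {
        # Share of flagged items that were truly sensitive.
        "precision": true_pos / flagged_total if flagged_total else 0.0,
        # Share of truly sensitive items that were flagged.
        "recall": true_pos / sensitive_total if sensitive_total else 0.0,
    }
```

Falling precision points to patterns that need tightening (tuning false positives), while falling recall suggests new formats that the current patterns miss.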

Continue Learning

Explore related guides and resources.

AI Security 101

Complete introduction to AI security fundamentals.


HIPAA Compliance Guide

Healthcare-specific guidance for PHI protection.


Glossary

Definitions for PII, PHI, and security terms.


Automate PII/PHI Detection

Secured AI detects 40+ sensitive data types with high accuracy, protecting your data before it reaches any AI system.
