PHI vs PII: What's the Difference (and Why It Matters in Healthcare)
In healthcare, "PHI" and "PII" are often used interchangeably in meetings, tickets, training decks, and vendor questionnaires. That shorthand is understandable—but it can cause expensive problems.

TL;DR
PII is information that identifies a person (across industries). PHI is identifiable health information in a HIPAA-covered context (healthcare-specific). ePHI is PHI in electronic form. In healthcare environments, PHI often includes PII—but not all PII is PHI, and not all health data is PHI.
Important: This article is for general information, not legal advice. Definitions and obligations can vary based on your role, contracts, state law, and the facts of an incident. Confirm decisions with your privacy/compliance and legal teams.
Fast definitions: PHI, PII, and ePHI
What is PII?
PII (Personally Identifiable Information) generally means information that can identify a person—directly (like a full name) or indirectly (like a unique identifier that can be tied back to them).
PII is used broadly across industries (finance, retail, government), and security programs commonly reference it even though precise legal definitions vary by jurisdiction and statute.
What is PHI?
PHI (Protected Health Information) is identifiable health information that is handled in a HIPAA-covered context—typically by a covered entity or a business associate.
PHI typically relates to a person's health condition, care, or payment for care—and includes identifiers.
What is ePHI?
ePHI is PHI in electronic form. The "e" matters because technical controls—access control, encryption, logging, DLP, endpoint management—apply most directly to electronic systems.
Quick classification shortcut:
If it identifies a person, it may be PII.
If it identifies a person and relates to healthcare or payment in a HIPAA context, it may be PHI.
If that PHI lives in systems/files/messages, it is likely ePHI.
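The shortcut above can be sketched as a simple decision helper. This is an illustration of the ordering of the questions, not a legal test; the field names and the logic are assumptions, and real classification decisions belong to your privacy and legal teams.

```python
from dataclasses import dataclass

@dataclass
class DataArtifact:
    identifies_person: bool    # directly or indirectly identifying
    healthcare_context: bool   # relates to care/payment in a HIPAA context
    electronic: bool           # stored or transmitted electronically

def classify(artifact: DataArtifact) -> list[str]:
    """Return candidate labels in the order the shortcut asks its questions."""
    labels = []
    if artifact.identifies_person:
        labels.append("PII")
        if artifact.healthcare_context:
            labels.append("PHI")
            if artifact.electronic:
                labels.append("ePHI")
    return labels

# A patient name in an EHR export: identifying, care context, electronic
print(classify(DataArtifact(True, True, True)))   # ['PII', 'PHI', 'ePHI']
# An employee payroll record: identifying, but not in a HIPAA context
print(classify(DataArtifact(True, False, True)))  # ['PII']
```

Note the nesting: PHI only comes into play once the data is identifying, and ePHI only once it is PHI, which mirrors the three questions above.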
Why the difference matters (for CISOs and compliance)
In healthcare security programs, "PHI vs PII" is not just semantics. It affects:
- Scope: what systems are in your HIPAA Security Rule scope and what systems still need strong privacy controls
- Vendor contracting: when you need a BAA vs a standard DPA, and what controls and audit rights you require
- Incident response: what notifications and timelines may apply, who owns the decision, and what evidence you must preserve
- Training: what staff should avoid sharing in tickets, chat, email, and AI tools
- Data strategy: how to use data for analytics and AI while reducing identification risk
The operational reality: staff don't label data correctly
Most frontline staff do not think in regulatory categories. They think in tasks. If your program relies on staff perfectly identifying PHI vs PII in the moment, you will lose. Instead, build:
- Safe defaults (approved channels, templates, redaction)
- Automation (warnings, DLP, access controls)
- Clear examples ("Don't paste patient names into tickets; use record IDs.")
PHI vs PII: quick comparison matrix

| Dimension | PII | PHI | ePHI |
|---|---|---|---|
| What it is | Data that identifies a person | Identifiable health/payment info in HIPAA context | PHI stored/transmitted electronically |
| Industry scope | Any industry | Healthcare-specific | Healthcare-specific |
| Does context matter? | Less; it's about identifiability | Yes; who holds it and why matters | Yes; also depends on system/transmission |
| Typical examples | Name, email, phone, SSN | Patient name + appointment; MRN + lab results | PHI in EHR, email, cloud docs, tickets |
| Main risks | Identity theft, fraud, privacy harm | Privacy harm + regulatory exposure | Same as PHI + broader cyber risk |
Is PHI considered PII?
Often, yes—PHI typically contains identifiers, so it overlaps heavily with PII. But it helps to be precise:
- PHI is not "a type of PII" everywhere. PHI is a healthcare-specific concept tied to HIPAA context.
- Many PHI elements contain PII. Patient name, address, email, phone number, MRN, and dates can all be identifying.
- Not all PII is PHI. A hospital employee's payroll records may be PII but not PHI.
Useful way to say it internally: "PHI is often PII, plus healthcare context."
Common healthcare examples (and how to label them)
| Data element | Usually PII? | Usually PHI? | Notes |
|---|---|---|---|
| Patient name in provider system | Yes | Often yes | Treat as PHI in provider environment |
| Employee name in HR/payroll | Yes | No (typically) | Still sensitive, usually not HIPAA PHI |
| Medical record number (MRN) | Yes (indirectly) | Often yes | Strong internal identifier |
| Diagnosis code + patient name | Yes | Yes | Classic PHI example |
| Appointment reminder with name | Yes | Often yes | Care provision context is enough |
| Insurance ID + claim status | Yes | Often yes | Payment for care tied to individual |
| De-identified dataset | Not necessarily | Not necessarily | Still sensitive due to re-identification risk |
| IP address from patient portal | Sometimes | Could be | Becomes identifying when tied to portal accounts |
Operational rule: If the artifact can identify a patient and relates to care or payment, treat it as PHI and handle it through approved channels.
Edge cases that confuse teams
1) A list of patient names (no diagnoses)
Many teams assume a list of names is "just PII." In a healthcare provider context, a patient list can still be treated as PHI because it can indicate the individual received healthcare services.
2) Marketing and outreach data
Healthcare outreach campaigns can include identifiers and care context. Even when the content feels "non-clinical," it may still be regulated internally as PHI.
3) Call recordings and voicemails
A voicemail that includes "My name is… my DOB is… I need my medication refilled" contains identifiers and care context. Treat recordings and transcripts as ePHI when they contain PHI.
4) Support tickets and chat threads
Many healthcare "incidents" are not sophisticated hacks—they are convenience leaks: an agent pastes a full portal message, a nurse posts a screenshot in the wrong channel.
5) Consumer health apps
Health data collected by consumer apps may be highly sensitive, but it is not automatically PHI under HIPAA unless handled by a HIPAA covered entity.
Where ePHI fits
In many organizations, the most important practical distinction is not PHI vs PII—it is PHI vs ePHI. That is because ePHI is where security programs can meaningfully apply technical controls at scale.
Examples of ePHI in everyday workflows
- EHR records and exports (PDFs, CCDAs, clinical summaries)
- Email threads about patients and their attachments
- Cloud documents and shared drives containing patient data
- Ticket attachments and screenshots
- Chat transcripts in collaboration tools
- Call transcripts stored in contact center platforms
Why this matters for CISOs
The "e" is what turns privacy rules into security engineering: identity and access management, encryption, logging and audit trails, DLP and content inspection, endpoint controls, data lifecycle controls.
De-identified data: when it's not PHI (and still risky)
Even if a dataset is not PHI under your chosen de-identification method, it can still create risk:
- Re-identification: linking to other datasets can reveal identities
- Inference: some models can infer sensitive attributes even from partial data
- Misuse: broad access invites unintended use cases
Minimum necessary: the rule that prevents oversharing

If you want one principle that reduces both PHI and PII exposure, it is this: share the minimum necessary to accomplish the task.
Minimum necessary in modern workflows
- In tickets: reference the record ID and describe the issue; avoid pasting notes and screenshots with identifiers
- In chat: do not post patient identifiers in broad channels; use approved secure messaging
- With vendors: share synthetic examples first; escalate to real data only when necessary
- With AI tools: prefer redacted text and structured summaries over raw patient communications
Training line that works: "Use IDs, not identities."
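"Use IDs, not identities" can even be scripted into ticket-filing helpers. The sketch below is a hypothetical convenience function, assuming the caller already knows the name/ID pair; it is not a substitute for DLP scanning.

```python
def to_minimum_necessary(ticket_text: str, patient_name: str, record_id: str) -> str:
    """Replace a known patient name with the record ID before filing a ticket.
    Hypothetical helper: assumes the name/ID pair is already known to the caller."""
    return ticket_text.replace(patient_name, f"record {record_id}")

raw = "Jane Doe cannot log in to the portal"
print(to_minimum_necessary(raw, "Jane Doe", "MRN-48213"))
# record MRN-48213 cannot log in to the portal
```

The resulting ticket still describes the issue, but a reader without record-system access learns nothing about who the patient is.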
How to classify data in practice (a workable framework)
A simple 4-tier model for healthcare teams
| Tier | Examples | Handling |
|---|---|---|
| Tier 1: Regulated clinical sensitivity (PHI/ePHI) | Clinical notes, lab results, imaging, claims | Approved systems only; strict access control; logging; encryption |
| Tier 2: High-risk identifiers (PII) | SSN, driver's license, passport, payroll IDs | Restricted storage; encryption; strong IAM; monitoring |
| Tier 3: Operational sensitive | Internal HR issues, security incidents, vendor credentials | Role-restricted; avoid broad sharing; controlled tools |
| Tier 4: Public / non-sensitive | Published content, general education materials | Normal collaboration tools permitted |
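For labeling pipelines, the four tiers can be expressed as an enum plus a "strictest tier wins" rule. The type-to-tier mapping below is illustrative; your detectors and categories will differ.

```python
from enum import IntEnum

class Tier(IntEnum):
    REGULATED_CLINICAL = 1   # PHI/ePHI: approved systems only
    HIGH_RISK_PII = 2        # SSN, government IDs, payroll identifiers
    OPERATIONAL = 3          # HR issues, incidents, vendor credentials
    PUBLIC = 4               # published or non-sensitive content

# Illustrative mapping from detected data types to tiers
TIER_BY_TYPE = {
    "clinical_note": Tier.REGULATED_CLINICAL,
    "lab_result": Tier.REGULATED_CLINICAL,
    "ssn": Tier.HIGH_RISK_PII,
    "vendor_credential": Tier.OPERATIONAL,
    "blog_post": Tier.PUBLIC,
}

def strictest_tier(detected_types: list[str]) -> Tier:
    """An artifact inherits the strictest (lowest-numbered) tier of anything it contains."""
    return min((TIER_BY_TYPE.get(t, Tier.PUBLIC) for t in detected_types),
               default=Tier.PUBLIC)

print(strictest_tier(["blog_post", "lab_result"]).name)  # REGULATED_CLINICAL
```

The "strictest tier wins" rule matters in practice: a mostly public document with one embedded lab result must be handled as Tier 1.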
Controls that reduce PHI/PII exposure across real workflows
The most effective programs don't just say "don't share PHI." They make it hard to do the wrong thing accidentally.
1) Control where PHI can go (approved channel strategy)
- Define approved messaging and file storage for PHI
- Block or discourage PHI in tools not designed for it
- Document exceptions and require justification
2) Content-aware guardrails (practical DLP)
- Detect identifiers in outbound email and chat
- Show "are you sure?" prompts when sharing externally
- Scan ticket attachments and warn when they contain identifiers
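A minimal content-aware check might look like the following. The regex patterns are deliberately simple illustrations; production DLP relies on validated detectors, checksums, and context, not bare regexes.

```python
import re

# Illustrative patterns only; real DLP uses richer, validated detection.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[-: ]?\d{5,10}\b", re.IGNORECASE),
    "dob": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def detect_identifiers(text: str) -> list[str]:
    """Return the identifier types found, so the UI can raise an 'are you sure?' prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

msg = "Pt DOB 04/12/1987, MRN 4821337, needs refill"
hits = detect_identifiers(msg)
if hits:
    print(f"Warning: possible identifiers before sending externally: {hits}")
```

The point is the workflow, not the patterns: detect, warn, and let the sender reconsider before the message leaves an approved channel.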
3) Reduce bulk exports
- Build report access inside controlled systems
- Use time-limited links instead of attachments
- Monitor unusual export patterns
4) Identity, access, and auditing
- Role-based access aligned to job functions
- MFA for privileged access and remote access
- Regular access reviews
5) Endpoint controls
- Full-disk encryption for laptops
- MDM for mobile devices accessing portals
- Screen lock policies and idle timeouts
6) Vendor controls that match your PHI reality
- BAA when PHI is handled
- Data retention limits and deletion SLAs
- Access logging, admin roles, and support access constraints
PHI/PII in AI workflows (LLMs, summarization, copilots)

AI is where PHI vs PII confusion becomes a real risk—fast. People paste text into tools to save time, and the tool's output looks helpful, so the behavior spreads.
Common AI use cases that accidentally include PHI
- Summarizing patient portal messages
- Drafting letters to patients or payers
- Generating ticket summaries from screenshots
- Creating "examples" for training using real incidents
- Extracting structured fields from unstructured notes
A safer operating model for AI in healthcare
- Approved tools only for workflows that may involve PHI
- Clear data boundaries (what can be sent, what cannot)
- Redaction/masking as a default step before prompting
- Logging and oversight for prompts/outputs where required
Incident triage: questions to ask when data is exposed
When something goes wrong, the PHI vs PII distinction shapes triage and escalation.
First questions to ask
- What data was involved? Names, MRNs, DOBs, diagnoses, claims, credentials, screenshots?
- Can it identify a person? Directly or indirectly?
- Does it relate to care, payment, or healthcare operations?
- Who received it / could access it?
- How long was it exposed?
- Can we revoke access?
- Do we have logs?
Program takeaway: If your incident process begins with "Is this PHI?" you will lose time. Start with "Can someone be harmed?" and "Can we contain it?"
Frequently Asked Questions
Is PHI the same as PII?
No. They overlap, but PHI is a healthcare-specific concept tied to HIPAA context, while PII applies across industries. PHI usually contains PII; not all PII is PHI.
Is a patient's name PHI or PII?
Both, typically. In a provider environment, a patient's name is PII that should be treated as PHI, because it can indicate the person received healthcare services.
Is an appointment reminder PHI?
Often yes. A reminder ties an identifiable person to the provision of care, which is enough in a HIPAA context.
What is ePHI?
ePHI is PHI in electronic form: stored in systems, files, or messages, or transmitted electronically.
If data is 'de-identified,' is it still PHI?
Properly de-identified data is generally no longer PHI, but it still carries re-identification and inference risk and should be handled carefully.
What is the easiest way to avoid PHI/PII mistakes?
Default to approved channels, share the minimum necessary, and use record IDs instead of names ("Use IDs, not identities").
Protect PHI and PII in your AI workflows
Secured AI automatically detects and masks sensitive data before it reaches AI systems, helping healthcare and enterprise teams stay productive and compliant.
