PII & Masking

Best practices for handling personally identifiable information (PII) in the Grounded Intelligence SDK.

Note: The package is installed as lumina-sdk, but we refer to it as the Grounded Intelligence SDK.

Core Principle

Keep PII to a minimum. Prefer a pointer-first approach: correlate UI sessions and traces through provider pointers or annotations, and send transcripts only when necessary, masked by default in production.

Masking Modes

Configure how transcripts are captured:

Full Capture

Send complete, unredacted text:

import { Lumina, CaptureTranscript } from 'lumina-sdk'
 
Lumina.init({
  captureTranscript: CaptureTranscript.Full,
  maskFn: (text) => text // No masking
})

Use when:

  • Operating in a secure, compliant environment
  • Internal tools with proper data governance
  • Development/testing environments

Masked Capture

Apply a masking function before sending:

import { Lumina, CaptureTranscript } from 'lumina-sdk'
 
Lumina.init({
  captureTranscript: CaptureTranscript.Masked,
  maskFn: (text) => {
    // Mask emails
    text = text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
    // Mask phone numbers
    text = text.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]')
    // Mask SSN
    text = text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]')
    // Mask credit cards
    text = text.replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, '[CARD]')
    return text
  }
})

Use when:

  • You need transcript analysis
  • You must redact sensitive data before it leaves the client
  • You operate in a regulated industry

No Capture

Skip transcripts entirely:

import { Lumina, CaptureTranscript } from 'lumina-sdk'
 
Lumina.init({
  captureTranscript: CaptureTranscript.None
})

Use when:

  • PII compliance requires no text storage
  • Highly sensitive environments (healthcare, finance)
  • Only metadata and metrics needed

Note: You can still track metadata, tools, retrieval, and annotations. Only message transcripts are skipped.

What Gets Masked (examples)

Pattern | Example | Replacement
Email | jane@example.com | [EMAIL]
Phone | 555-123-4567, 555.123.4567, 5551234567 | [PHONE]
SSN | 123-45-6789 | [SSN]
Credit Card | 4532-1234-5678-9010, 4532 1234 5678 9010 | [CARD]
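
The phone pattern used in the examples on this page expects the area code as bare digits, so a parenthesized form such as (555) 123-4567 would pass through unmasked. A minimal sketch of a broader pattern for North American numbers follows. The name maskPhones is hypothetical and not part of the SDK, and the pattern is an assumption to adapt to your own locale.

```typescript
// Hypothetical helper (not part of lumina-sdk): masks North American phone
// numbers written with or without a parenthesized area code.
const PHONE_PATTERN = /(?:\(\d{3}\)\s*|\b\d{3}[-.\s]?)\d{3}[-.\s]?\d{4}\b/g

function maskPhones(text: string): string {
  return text.replace(PHONE_PATTERN, '[PHONE]')
}
```

The result can be chained inside maskFn alongside the other replacements, for example maskFn: (text) => maskPhones(text).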

Selective Masking

Mask only specific patterns by adjusting your maskFn:

maskFn: (text) => text
  .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
  .replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, '[CARD]')

Custom Replacement Text

// Use different replacement text
maskFn: (text) => text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '<EMAIL>')
 
// Input:  "Contact me at jane@example.com"
// Output: "Contact me at <EMAIL>"

Custom Mask Functions

You can also write custom masking functions:

Email & Phone Redaction

maskFn: (text) => {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '<EMAIL>')
    .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '<PHONE>')
}

Credit Card Masking

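maskFn: (text) => {
  return text
    .replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, '<CARD>')
    // Mask a 3-4 digit code only when a CVV/CVC label precedes it;
    // a bare /\b\d{3,4}\b/ would also redact years, amounts, and counts
    .replace(/\b(cvv2?|cvc2?)\b[\s:=#]*\d{3,4}\b/gi, '$1 <CVV>')
}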
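
A 16-digit pattern on its own will also match order numbers and other long identifiers. One common way to reduce false positives is a Luhn checksum, which valid card numbers satisfy. The sketch below is an illustration rather than SDK behavior, and passesLuhn and maskCards are hypothetical names. Note the trade-off: a mistyped card number fails the checksum and would be left unmasked, so stricter environments may prefer to mask every match.

```typescript
// Returns true when the digit string satisfies the Luhn checksum.
function passesLuhn(digits: string): boolean {
  let sum = 0
  let doubleNext = false
  // Walk right to left, doubling every second digit
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48
    if (doubleNext) {
      d *= 2
      if (d > 9) d -= 9
    }
    sum += d
    doubleNext = !doubleNext
  }
  return sum % 10 === 0
}

// Masks 16-digit sequences only when they pass the checksum.
function maskCards(text: string): string {
  return text.replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, (match) =>
    passesLuhn(match.replace(/\D/g, '')) ? '<CARD>' : match
  )
}
```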

SSN & Government IDs

maskFn: (text) => {
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '<SSN>')
    .replace(/\b[A-Z]{2}\d{6,8}\b/g, '<ID>')
}

IP Addresses

maskFn: (text) => {
  return text.replace(
    /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
    '<IP_ADDRESS>'
  )
}

API Keys & Tokens

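maskFn: (text) => {
  return text
    // Handle Bearer tokens first so dotted tokens (e.g. JWTs) are replaced whole
    .replace(/Bearer\s+[A-Za-z0-9\-._~+/]+=*/g, 'Bearer <TOKEN>')
    // Then mask any remaining long opaque strings
    .replace(/\b[A-Za-z0-9_-]{32,}\b/g, '<TOKEN>')
}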

Combine Multiple Patterns

maskFn: (text) => {
  // Apply multiple patterns
  text = text
    .replace(/\bpassword:\s*\S+/gi, 'password: <REDACTED>')
    .replace(/\bapi[_-]?key:\s*\S+/gi, 'api_key: <REDACTED>')
  
  return text
}
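
As the number of patterns grows, it can help to keep them in a list and build the mask function from it. This is a minimal sketch that assumes nothing about the SDK beyond maskFn being a (text) => string function. MaskRule and composeMask are hypothetical names.

```typescript
interface MaskRule {
  name: string
  pattern: RegExp // should carry the g flag so every occurrence is replaced
  replacement: string
}

const rules: MaskRule[] = [
  { name: 'password', pattern: /\bpassword:\s*\S+/gi, replacement: 'password: <REDACTED>' },
  { name: 'api_key', pattern: /\bapi[_-]?key:\s*\S+/gi, replacement: 'api_key: <REDACTED>' },
  { name: 'email', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, replacement: '<EMAIL>' },
]

// Builds a single mask function that applies each rule in list order.
function composeMask(ruleList: MaskRule[]): (text: string) => string {
  return (text) =>
    ruleList.reduce((acc, rule) => acc.replace(rule.pattern, rule.replacement), text)
}

const maskFn = composeMask(rules)
```

Rules run in list order, so place specific patterns ahead of generic ones that could consume part of their input.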

Structured Data Masking

When adding annotations or metadata, mask sensitive fields:

turn.annotate({
  user_email: maskEmail(user.email),
  account_id: user.accountId, // Safe to log (reference)
  ip_address: '<REDACTED>',
  session_duration_ms: 45000 // Safe (metric)
})
 
function maskEmail(email: string): string {
  const at = email.lastIndexOf('@')
  // Redact entirely if the value is not a well-formed address
  if (at < 1) return '<REDACTED>'
  return `${email[0]}***@${email.slice(at + 1)}`
}
 
// Input: "jane.doe@example.com"
// Output: "j***@example.com"
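
When annotation objects are built in several places, a small helper that redacts by key name can be less error-prone than masking each field by hand. This sketch assumes annotations are flat key/value objects. redactFields and the key list are hypothetical and not part of the SDK.

```typescript
type AnnotationValue = string | number | boolean | null

// Keys treated as sensitive; extend this list for your own schema.
const SENSITIVE_KEYS = new Set(['user_email', 'ip_address', 'phone', 'full_name'])

// Returns a copy with sensitive fields replaced by a placeholder.
function redactFields(
  annotations: Record<string, AnnotationValue>
): Record<string, AnnotationValue> {
  const out: Record<string, AnnotationValue> = {}
  for (const [key, value] of Object.entries(annotations)) {
    out[key] = SENSITIVE_KEYS.has(key) ? '<REDACTED>' : value
  }
  return out
}
```

The result could then be passed to the annotate call shown above, for example turn.annotate(redactFields({ ... })).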

Pointer-First Approach

Prefer storing references to sensitive context rather than raw transcripts.

With Providers (Recommended)

// Initialize Lumina with a UI provider (e.g., PostHog).
// Keep a reference so the pointers below can be read from it.
const uiProvider = postHogProvider({ /* ... */ })
 
Lumina.init({
  captureTranscript: CaptureTranscript.None, // Don't store transcripts
  uiAnalytics: uiProvider
})
 
const session = await Lumina.session.start()
const turn = session.turn()
 
// Attach UI pointers via annotations
turn.annotate({
  ui_session_id: uiProvider.getSessionId(),
  ui_replay_url: uiProvider.getReplayUrl(),
})
 
// Avoid setMessages(); only metadata is sent
await turn.finish()

Benefits:

  • Sensitive data stays in the replay/tracing tools
  • Dashboard shows metadata + links to replays/traces
  • Complies with strict PII policies while preserving correlation

Compliance Considerations

GDPR (General Data Protection Regulation)

Requirements:

  • Right to access: Users can request their data
  • Right to erasure: Users can request deletion
  • Data minimization: Only collect necessary data
  • Purpose limitation: Use data for stated purposes only

Lumina Implementation:

// Use pointer-based approach for EU users
Lumina.init({
  captureTranscript: CaptureTranscript.None, // No text storage
  uiAnalytics: postHogProvider({ /* ... */ })
})
 
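// Support data deletion by distinctId
async function handleGDPRDeletion(userId: string) {
  const res = await fetch('/api/lumina/delete-user', {
    method: 'DELETE',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ distinctId: userId })
  })
  if (!res.ok) throw new Error(`Deletion failed: ${res.status}`)
}
 
// Support data export
async function handleGDPRExport(userId: string) {
  const res = await fetch(`/api/lumina/export-user/${encodeURIComponent(userId)}`)
  if (!res.ok) throw new Error(`Export failed: ${res.status}`)
  return res.json()
}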

HIPAA (Health Insurance Portability and Accountability Act)

Requirements:

  • No protected health information (PHI) in logs
  • Business Associate Agreements (BAA) required
  • Audit trails for all data access
  • Encryption at rest and in transit

Lumina Implementation:

// Strict masking for healthcare
Lumina.init({
  captureTranscript: CaptureTranscript.Masked,
  maskFn: (text) => {
    // Add healthcare-specific patterns
    text = text
      .replace(/\bMRN:?\s*\d+/gi, '[MRN_REDACTED]') // Medical Record Number
      .replace(/\bDOB:?\s*[\d/-]+/gi, '[DOB_REDACTED]') // Date of Birth
      // Redact everything after the label up to the end of the sentence or line
      .replace(/\b(?:diagnosed|diagnosis|condition|symptom|medication):\s*[^.\n]+/gi, '[CONDITION_REDACTED]')
    
    return text
  }
})
 
// Or use pointer-only approach
Lumina.init({
  captureTranscript: CaptureTranscript.None
})

CCPA (California Consumer Privacy Act)

Requirements:

  • Disclose data collection practices
  • Right to opt-out of data sale
  • Right to deletion
  • Right to data portability

Lumina Implementation:

// Opt-out mechanism
function handleCCPAOptOut(userId: string) {
  // Stop tracking
  Lumina.reset()
  
  // Record opt-out
  localStorage.setItem('lumina_opted_out', 'true')
}
 
// Check opt-out status before initialization
if (localStorage.getItem('lumina_opted_out') !== 'true') {
  Lumina.init({ /* ... */ })
}
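
localStorage exists only in the browser, so the check above would throw during server-side rendering or in a test runner without a DOM. Below is a sketch of an opt-out helper with an injectable store. OptOutStore, createOptOut, and memoryStore are hypothetical names, and the key matches the one used above.

```typescript
// Minimal subset of the Web Storage interface this helper needs.
interface OptOutStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

function createOptOut(store: OptOutStore, key = 'lumina_opted_out') {
  return {
    optOut: (): void => store.setItem(key, 'true'),
    isOptedOut: (): boolean => store.getItem(key) === 'true',
  }
}

// In-memory store for tests or server-side code paths.
function memoryStore(): OptOutStore {
  const data = new Map<string, string>()
  return {
    getItem: (k) => data.get(k) ?? null,
    setItem: (k, v) => { data.set(k, v) },
  }
}
```

In the browser, window.localStorage satisfies OptOutStore and can be passed in directly.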

PCI DSS (Payment Card Industry Data Security Standard)

Requirements:

  • Never store CVV/CVV2 codes
  • Mask PAN (Primary Account Number) if stored
  • Encrypt cardholder data
  • Restrict access to cardholder data

Lumina Implementation:

Lumina.init({
  captureTranscript: CaptureTranscript.Masked,
  maskFn: (text) => text
    .replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, '[CARD]')
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
})
 
// Never log payment details in annotations
turn.annotate({
  transaction_id: 'txn_abc123', // ✅ Safe reference
  amount: 99.99, // ✅ OK
  currency: 'USD', // ✅ OK
  // ❌ Never do this:
  // card_number: '4532-1234-5678-9010'
  // cvv: '123'
})

Testing Your Mask Function

Always test your masking function with representative data:

// Simple manual tests
const testCases = [
  "Contact me at bob@example.com",
  "My card is 4532-1234-5678-9010",
  "Call me at 555-123-4567 or (555) 987-6543",
  "My SSN is 123-45-6789",
]
 
const maskFn = (text: string) => {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
    .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]')
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]')
    .replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, '[CARD]')
}
 
console.log('=== Masking Tests ===')
for (const input of testCases) {
  console.log('Input: ', input)
  console.log('Output:', maskFn(input))
}

Automated Testing

Create unit tests for your masking function:

import { describe, it, expect } from 'vitest'
 
const maskFn = (text: string) => text
  .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
  .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]')
  .replace(/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, '[CARD]')
 
describe('PII Masking', () => {
  it('masks emails', () => {
    expect(maskFn('bob@example.com')).toContain('[EMAIL]')
  })
  it('masks phone numbers', () => {
    expect(maskFn('Call 555-123-4567')).not.toContain('555-123-4567')
  })
  it('masks credit cards', () => {
    expect(maskFn('Card: 4532-1234-5678-9010')).toContain('[CARD]')
  })
})
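
Beyond checking that known PII is replaced, two properties are worth asserting: text without PII passes through unchanged, and masking already-masked text changes nothing (idempotence). Here is a framework-free sketch. checkMaskProperties is a hypothetical helper, and the same checks could be moved into vitest cases.

```typescript
type MaskFn = (text: string) => string

// Returns a list of failure descriptions; an empty list means all checks passed.
// PII samples are deliberately left out of the messages.
function checkMaskProperties(
  maskFn: MaskFn,
  cleanSamples: string[],
  piiSamples: string[]
): string[] {
  const failures: string[] = []
  for (const sample of cleanSamples) {
    if (maskFn(sample) !== sample) failures.push(`clean text was altered: ${sample}`)
  }
  piiSamples.forEach((sample, index) => {
    const once = maskFn(sample)
    if (once === sample) failures.push(`PII sample ${index} was not masked`)
    if (maskFn(once) !== once) failures.push(`masking sample ${index} is not idempotent`)
  })
  return failures
}

const sampleMask: MaskFn = (text) => text
  .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
  .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]')

const failures = checkMaskProperties(
  sampleMask,
  ['Order 1234 shipped on 2024-01-15'],
  ['Reach bob@example.com or 555-123-4567']
)
```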

Best Practices

1. Mask by Default in Production

// ✅ Good: Mask in production
const isDev = process.env.NODE_ENV === 'development'
 
Lumina.init({
  captureTranscript: isDev ? CaptureTranscript.Full : CaptureTranscript.Masked,
  maskFn: isDev ? (t) => t : (t) => t.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[EMAIL]')
})
 
// ❌ Bad: Full capture in production
Lumina.init({
  captureTranscript: CaptureTranscript.Full // Risky!
})

2. Use Pointer-Based Approach When Possible

// ✅ Good: Pointer-first architecture
Lumina.init({
  captureTranscript: CaptureTranscript.None,
  uiAnalytics: postHogProvider({ /* ... */ })
})
 
// ❌ Less optimal: Storing full transcripts
Lumina.init({
  captureTranscript: CaptureTranscript.Full // No provider correlation
})

3. Test Masking Thoroughly

// ✅ Good: Comprehensive tests
const testCases = [
  "Email: bob@example.com",
  "Phone: 555-123-4567",
  "Card: 4532-1234-5678-9010",
  "SSN: 123-45-6789",
  "Mixed: Call bob@example.com at 555-123-4567"
]
 
testCases.forEach(test => {
  const masked = maskFn(test)
  console.assert(!masked.includes('@'), 'Email not masked')
  console.assert(!masked.match(/\d{3}-\d{3}-\d{4}/), 'Phone not masked')
})

4. Document Your Privacy Approach

/**
 * PII Masking Strategy:
 *
 * - captureTranscript: Masked (production) / Full (development)
 * - Patterns masked: emails, phones, credit cards, SSN
 * - Pointer-first: UI session replays (PostHog) for full context
 * - Compliance: GDPR, HIPAA, CCPA friendly
 */
Lumina.init({
  captureTranscript: CaptureTranscript.Masked,
  maskFn: (text) => text.replace(/\b[\w.-]+@[\w.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL]')
})

5. Monitor for PII Leaks

// Add validation to catch unmasked PII
function validateNoPII(text: string): boolean {
  const piiPatterns = [
    /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, // Email
    /\d{3}-\d{2}-\d{4}/, // SSN
    /\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}/ // Credit card
  ]
  
  return !piiPatterns.some(pattern => pattern.test(text))
}
 
// Use in development to catch leaks (maskFn must be declared with `let` so it can be reassigned)
if (process.env.NODE_ENV === 'development') {
  const originalMaskFn = maskFn
  maskFn = (text) => {
    const masked = originalMaskFn(text)
    if (!validateNoPII(masked)) {
      console.error('PII detected after masking:', masked)
    }
    return masked
  }
}
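
The wrapper above prints the masked text, which would itself contain the leaked value. A variant that reports only which pattern matched is sketched below. withLeakCheck and the onLeak callback are hypothetical names, not SDK features.

```typescript
type MaskFn = (text: string) => string

// Detection patterns carry no g flag, so test() keeps no state between calls.
const LEAK_PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  card: /\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/,
}

// Wraps a mask function and reports the names of patterns still present
// in its output, without passing the text itself to the callback.
function withLeakCheck(maskFn: MaskFn, onLeak: (patternNames: string[]) => void): MaskFn {
  return (text) => {
    const masked = maskFn(text)
    const leaked = Object.entries(LEAK_PATTERNS)
      .filter(([, pattern]) => pattern.test(masked))
      .map(([name]) => name)
    if (leaked.length > 0) onLeak(leaked)
    return masked
  }
}
```

The callback can log the pattern names or increment a counter, which keeps the leaked value out of logs and error trackers.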

Next Steps