
Classification Patterns

Automatic data classification rules and patterns for compliance.

Overview

GateFlow uses pattern matching and machine learning to classify data automatically by sensitivity level, so that the same kind of data is handled the same way across your organization.

Classification Levels

| Level | Code | Description | Default Access |
| --- | --- | --- | --- |
| Public | public | Publicly available information | All users |
| Internal | internal | Internal business use | Authenticated users |
| Confidential | confidential | Sensitive business data | Role-based access |
| Restricted | restricted | Highly sensitive | Named individuals |
| PHI | phi | Protected Health Information | HIPAA-compliant access |

Built-in Patterns

PII Patterns

python
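import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

# View built-in PII patterns. The OpenAI SDK's generic request helpers
# require cast_to; httpx.Response exposes the raw JSON body.
response = client.get("/compliance/patterns/pii", cast_to=httpx.Response)
patterns = response.json()

for pattern in patterns["patterns"]:
    print(f"{pattern['name']}: {pattern['classification']}")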

Built-in PII Patterns:

| Pattern | Example | Default Classification |
| --- | --- | --- |
| SSN | 123-45-6789 | restricted |
| CREDIT_CARD | 4111-1111-1111-1111 | restricted |
| EMAIL | user@example.com | internal |
| PHONE | (555) 123-4567 | internal |
| ADDRESS | 123 Main St, City, ST 12345 | confidential |
| DATE_OF_BIRTH | 01/15/1990 | confidential |
| PASSPORT | AB1234567 | restricted |
| DRIVER_LICENSE | D1234567 | restricted |
| BANK_ACCOUNT | 1234567890 | restricted |
| IP_ADDRESS | 192.168.1.1 | internal |
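
To make the table concrete, the sketch below matches a few of the example values locally with Python's `re` module. The regular expressions and the `detect` helper are illustrative assumptions; GateFlow's built-in definitions are not shown on this page and may differ.

```python
import re

# Hypothetical regexes for three of the built-in pattern names above.
# These are NOT GateFlow's actual definitions.
ILLUSTRATIVE_PATTERNS = {
    "SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "restricted"),
    "EMAIL": (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "internal"),
    "IP_ADDRESS": (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "internal"),
}


def detect(text: str) -> dict:
    """Return {pattern_name: classification} for every pattern found in text."""
    return {
        name: level
        for name, (regex, level) in ILLUSTRATIVE_PATTERNS.items()
        if regex.search(text)
    }


found = detect("Contact user@example.com, SSN 123-45-6789")
print(found)
```

Simple expressions like these over-match (the IP pattern accepts `999.999.999.999`, for example), which is one reason to validate any pattern against sample data before relying on it.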

PHI Patterns (HIPAA)

| Pattern | Example | Classification |
| --- | --- | --- |
| MEDICAL_RECORD_NUMBER | MRN-12345678 | phi |
| HEALTH_PLAN_ID | HP-987654321 | phi |
| DIAGNOSIS_CODE | ICD-10: J06.9 | phi |
| MEDICATION | Lisinopril 10mg | phi |
| LAB_RESULT | A1C: 6.5% | phi |
| PROVIDER_NPI | 1234567890 | confidential |

Financial Patterns

| Pattern | Example | Classification |
| --- | --- | --- |
| ACCOUNT_NUMBER | 1234567890 | restricted |
| ROUTING_NUMBER | 021000021 | confidential |
| TAX_ID | 12-3456789 | restricted |
| SALARY | $125,000 | restricted |
| REVENUE | $1.2M quarterly | confidential |

Custom Patterns

Create a Custom Pattern

python
import httpx

# Define a custom classification pattern
response = client.post(
    "/compliance/patterns",
    cast_to=httpx.Response,
    body={
        "name": "employee_id",
        "description": "Company employee ID format",
        "pattern": {
            "type": "regex",
            "value": r"EMP-[A-Z]{2}\d{6}",
            "case_sensitive": False
        },
        "classification": "internal",
        "actions": {
            "on_detect": "tag",
            "redact_in_logs": True
        }
    }
)

print(f"Pattern ID: {response.json()['pattern_id']}")
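
Before registering a regex pattern, it can help to try the expression locally. The sketch below runs the same expression as the `employee_id` pattern against a few made-up sample IDs. Mapping `"case_sensitive": False` to `re.IGNORECASE` is an assumption about how the service interprets that field.

```python
import re

# Same expression as the "employee_id" pattern above. Treating
# "case_sensitive": False as re.IGNORECASE is an assumption.
EMPLOYEE_ID = re.compile(r"EMP-[A-Z]{2}\d{6}", re.IGNORECASE)

# Made-up sample values: two well-formed IDs and two malformed ones.
samples = ["EMP-NY123456", "emp-ca000042", "EMP-N123456", "EMP-NYC12345"]
matches = [s for s in samples if EMPLOYEE_ID.fullmatch(s)]
print(matches)
```

Only the first two samples match: the third has a single letter after the prefix and the fourth has three.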

Pattern Types

Regex Patterns

python
import httpx

# Regex pattern with flags
client.post(
    "/compliance/patterns",
    cast_to=httpx.Response,
    body={
        "name": "project_code",
        "pattern": {
            "type": "regex",
            "value": r"PRJ-\d{4}-[A-Z]{3}",
            "flags": ["IGNORECASE"]
        },
        "classification": "confidential"
    }
)

Keyword Patterns

python
import httpx

# Keyword-based classification
client.post(
    "/compliance/patterns",
    cast_to=httpx.Response,
    body={
        "name": "confidential_keywords",
        "pattern": {
            "type": "keyword_list",
            "values": [
                "confidential",
                "proprietary",
                "trade secret",
                "internal only",
                "do not distribute"
            ],
            "match_type": "any",
            "case_sensitive": False
        },
        "classification": "confidential"
    }
)
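
As a rough local model of what a `keyword_list` pattern with `"match_type": "any"` does, the sketch below uses plain substring matching. The `matches_keywords` helper is hypothetical, and both the `"all"` branch and the substring semantics are assumptions; the service may tokenize or normalize text differently.

```python
KEYWORDS = [
    "confidential",
    "proprietary",
    "trade secret",
    "internal only",
    "do not distribute",
]


def matches_keywords(text, keywords, match_type="any", case_sensitive=False):
    """Check text against a keyword list using simple substring matching."""
    if not case_sensitive:
        text = text.lower()
        keywords = [k.lower() for k in keywords]
    hits = [k in text for k in keywords]
    # "any": one keyword is enough; otherwise every keyword must appear.
    return any(hits) if match_type == "any" else all(hits)


print(matches_keywords("INTERNAL ONLY draft, v3", KEYWORDS))
print(matches_keywords("Public release notes", KEYWORDS))
```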

ML-Based Patterns

python
import httpx

# Machine learning classification
client.post(
    "/compliance/patterns",
    cast_to=httpx.Response,
    body={
        "name": "legal_documents",
        "pattern": {
            "type": "ml_classifier",
            "model": "document-classifier-v2",
            "labels": ["contract", "nda", "agreement", "legal"],
            "threshold": 0.85
        },
        "classification": "confidential"
    }
)
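
One plausible reading of `labels` and `threshold` is that the pattern matches when the classifier assigns any listed label a score at or above the threshold. The sketch below illustrates that reading with made-up scores; the `labels_over_threshold` helper and the score format are assumptions, not the behavior of `document-classifier-v2`.

```python
def labels_over_threshold(scores, labels, threshold):
    """Return the listed labels whose score meets the threshold, best first."""
    hits = [(label, scores[label]) for label in labels
            if scores.get(label, 0.0) >= threshold]
    return [label for label, _ in sorted(hits, key=lambda h: h[1], reverse=True)]


# Made-up classifier output for one document.
scores = {"contract": 0.91, "nda": 0.40, "invoice": 0.95}
matched = labels_over_threshold(
    scores, ["contract", "nda", "agreement", "legal"], threshold=0.85
)
print(matched)
```

Here `invoice` scores highest but is ignored because it is not among the pattern's labels, and `nda` falls below the threshold.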

Classification Rules

Rule Priority

Rules are evaluated in priority order (lower number = higher priority):

python
# Create classification rules with priority
rules = [
    {
        "name": "phi_override",
        "priority": 1,
        "conditions": [
            {"pattern": "MEDICAL_RECORD_NUMBER"},
            {"pattern": "DIAGNOSIS_CODE"}
        ],
        "operator": "any",
        "classification": "phi"
    },
    {
        "name": "pii_sensitive",
        "priority": 10,
        "conditions": [
            {"pattern": "SSN"},
            {"pattern": "CREDIT_CARD"},
            {"pattern": "BANK_ACCOUNT"}
        ],
        "operator": "any",
        "classification": "restricted"
    },
    {
        "name": "default_internal",
        "priority": 100,
        "conditions": [
            {"pattern": "EMAIL"},
            {"pattern": "PHONE"}
        ],
        "operator": "any",
        "classification": "internal"
    }
]

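import httpx  # used for cast_to; normally placed at the top of the script

for rule in rules:
    client.post(
        "/compliance/classification-rules",
        cast_to=httpx.Response,
        body=rule
    )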
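
The effect of priority ordering can be sketched locally. The example below assumes that the first matching rule wins and that unmatched content falls back to `public`; neither detail is stated on this page, and the `classify` helper is hypothetical.

```python
# Condensed copies of the three rules above, deliberately out of order.
RULES = [
    {"name": "default_internal", "priority": 100, "operator": "any",
     "conditions": [{"pattern": "EMAIL"}, {"pattern": "PHONE"}],
     "classification": "internal"},
    {"name": "phi_override", "priority": 1, "operator": "any",
     "conditions": [{"pattern": "MEDICAL_RECORD_NUMBER"},
                    {"pattern": "DIAGNOSIS_CODE"}],
     "classification": "phi"},
    {"name": "pii_sensitive", "priority": 10, "operator": "any",
     "conditions": [{"pattern": "SSN"}, {"pattern": "CREDIT_CARD"},
                    {"pattern": "BANK_ACCOUNT"}],
     "classification": "restricted"},
]


def classify(detected, rules, default="public"):
    """Return the classification of the highest-priority matching rule."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        hits = [c["pattern"] in detected for c in rule["conditions"]]
        matched = any(hits) if rule["operator"] == "any" else all(hits)
        if matched:
            return rule["classification"]  # assumption: first match wins
    return default  # assumption: nothing matched


print(classify({"EMAIL", "SSN"}, RULES))
print(classify({"EMAIL", "DIAGNOSIS_CODE"}, RULES))
```

A document containing both an email address and an SSN resolves to `restricted`, because `pii_sensitive` (priority 10) is evaluated before `default_internal` (priority 100).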

Conditional Rules

python
import httpx

# Classification based on multiple conditions
client.post(
    "/compliance/classification-rules",
    cast_to=httpx.Response,
    body={
        "name": "financial_report_rule",
        "priority": 5,
        "conditions": [
            {"pattern": "REVENUE"},
            {"metadata": {"document_type": "financial_report"}},
            {"content_contains": "quarterly results"}
        ],
        "operator": "all",  # All conditions must match
        "classification": "restricted"
    }
)
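
The three condition kinds used above (`pattern`, `metadata`, `content_contains`) and the `all` operator can be modeled as follows. This is a minimal sketch: the document shape, the helper names, and the case-insensitive substring check are all assumptions.

```python
def condition_matches(cond, doc):
    """Evaluate one condition against a document dict (hypothetical shape)."""
    if "pattern" in cond:
        return cond["pattern"] in doc["patterns"]
    if "metadata" in cond:
        return all(doc["metadata"].get(k) == v
                   for k, v in cond["metadata"].items())
    if "content_contains" in cond:
        # Assumption: substring match, ignoring case.
        return cond["content_contains"].lower() in doc["content"].lower()
    raise ValueError(f"Unknown condition: {cond}")


def rule_matches(rule, doc):
    """Combine condition results with the rule's operator ("all" or "any")."""
    results = (condition_matches(c, doc) for c in rule["conditions"])
    return all(results) if rule["operator"] == "all" else any(results)


rule = {
    "operator": "all",
    "conditions": [
        {"pattern": "REVENUE"},
        {"metadata": {"document_type": "financial_report"}},
        {"content_contains": "quarterly results"},
    ],
}
doc = {
    "patterns": {"REVENUE"},
    "metadata": {"document_type": "financial_report"},
    "content": "Quarterly results: revenue of $1.2M.",
}
print(rule_matches(rule, doc))
```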

Context-Aware Rules

python
import httpx

# Different classification based on context
client.post(
    "/compliance/classification-rules",
    cast_to=httpx.Response,
    body={
        "name": "email_context",
        "priority": 15,
        "conditions": [
            {"pattern": "EMAIL"}
        ],
        "context_rules": [
            {
                "context": {"document_type": "employee_directory"},
                "classification": "internal"
            },
            {
                "context": {"document_type": "customer_list"},
                "classification": "confidential"
            }
        ],
        "default_classification": "internal"
    }
)
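
Resolution of `context_rules` can be pictured as a lookup with a fallback. The sketch below assumes that context rules are checked in list order and that `default_classification` applies when none match; the `resolve_classification` helper is hypothetical.

```python
def resolve_classification(rule, context):
    """Pick the first context rule whose keys all match, else the default."""
    for entry in rule["context_rules"]:
        if all(context.get(k) == v for k, v in entry["context"].items()):
            return entry["classification"]
    return rule["default_classification"]


email_rule = {
    "context_rules": [
        {"context": {"document_type": "employee_directory"},
         "classification": "internal"},
        {"context": {"document_type": "customer_list"},
         "classification": "confidential"},
    ],
    "default_classification": "internal",
}

print(resolve_classification(email_rule, {"document_type": "customer_list"}))
print(resolve_classification(email_rule, {"document_type": "invoice"}))
```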

Automatic Classification

On Document Upload

python
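import httpx

# Upload with automatic classification
with open("report.pdf", "rb") as f:  # close the file once the upload finishes
    response = client.post(
        "/data/documents",
        cast_to=httpx.Response,
        files={"file": f},
        body={
            "auto_classify": True,
            "classification_rules": "default",  # Use default ruleset
            "min_classification": "internal"    # Floor classification
        },
        # Send the body fields as multipart form data alongside the file
        options={"headers": {"Content-Type": "multipart/form-data"}}
    )

result = response.json()
print(f"Detected classification: {result['classification']}")
print(f"Patterns matched: {result['patterns_matched']}")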
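
A floor classification means the result is never less sensitive than `min_classification`. The sketch below assumes the levels are ordered as they appear in the Classification Levels table, from `public` (least sensitive) to `phi` (most sensitive); that ordering and the `apply_floor` helper are assumptions.

```python
# Assumed ordering, least to most sensitive.
LEVEL_ORDER = ["public", "internal", "confidential", "restricted", "phi"]


def apply_floor(detected, floor):
    """Return whichever of the two levels is more sensitive."""
    return max(detected, floor, key=LEVEL_ORDER.index)


print(apply_floor("public", "internal"))
print(apply_floor("restricted", "internal"))
```

With a floor of `internal`, a document detected as `public` is raised to `internal`, while one detected as `restricted` is left unchanged.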

Classification Report

python
import httpx

# Get classification details for a document.
# document_id is the ID of a previously uploaded document.
report = client.get(
    f"/data/documents/{document_id}/classification-report",
    cast_to=httpx.Response
).json()

print(f"Final Classification: {report['classification']}")
print("\nPatterns Detected:")
for pattern in report["patterns_detected"]:
    print(f"  - {pattern['name']}: {pattern['count']} occurrences")
    print(f"    Locations: {pattern['locations'][:3]}...")

print("\nRules Applied:")
for rule in report["rules_applied"]:
    print(f"  - {rule['name']} (priority {rule['priority']})")

Override and Escalation

Manual Override

python
import httpx

# Override automatic classification.
# document_id is the ID of a previously uploaded document.
client.patch(
    f"/data/documents/{document_id}/classification",
    cast_to=httpx.Response,
    body={
        "classification": "restricted",
        "reason": "Contains merger details not detected by patterns",
        "override_by": "compliance_officer@company.com",
        "expires_at": None  # Permanent override
    }
)

Escalation Rules

python
import httpx

# Configure automatic escalation
client.post(
    "/compliance/escalation-rules",
    cast_to=httpx.Response,
    body={
        "name": "high_volume_pii",
        "trigger": {
            "pattern_count": {"SSN": {"$gte": 10}},
            "timeframe_minutes": 60
        },
        "action": {
            "escalate_to": ["security-team@company.com"],
            "auto_quarantine": True,
            "classification_override": "restricted"
        }
    }
)
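
The trigger above reads as "10 or more SSN detections within 60 minutes". One way to model that is a sliding window over detection timestamps, sketched below. Whether the service uses a sliding or a fixed window is not stated here, so treat the windowing and the `PatternCounter` class as assumptions.

```python
from collections import deque
from datetime import datetime, timedelta


class PatternCounter:
    """Count detections of one pattern inside a sliding time window."""

    def __init__(self, threshold, timeframe_minutes):
        self.threshold = threshold
        self.window = timedelta(minutes=timeframe_minutes)
        self.events = deque()

    def record(self, timestamp):
        """Record one detection; return True when the trigger would fire.

        Timestamps are expected in non-decreasing order.
        """
        self.events.append(timestamp)
        # Drop detections that have fallen out of the window.
        while timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold


counter = PatternCounter(threshold=10, timeframe_minutes=60)
start = datetime(2024, 1, 1, 9, 0)
fired = [counter.record(start + timedelta(minutes=i)) for i in range(10)]
print(fired)
```

The tenth detection inside the hour is the first to return `True`; a detection two hours later starts from an empty window again.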

Monitoring and Alerts

Classification Dashboard

python
import httpx

# Get classification statistics
stats = client.get(
    "/compliance/classification-stats",
    cast_to=httpx.Response,
    options={
        "params": {
            "period": "30d",
            "group_by": "classification"
        }
    }
).json()

for level, data in stats["by_classification"].items():
    print(f"{level}:")
    print(f"  Documents: {data['document_count']}")
    print(f"  Size: {data['total_size_mb']} MB")
    print(f"  Growth: {data['growth_percent']}%")

Alert Configuration

python
import httpx

# Set up classification alerts
client.post(
    "/compliance/alerts",
    cast_to=httpx.Response,
    body={
        "name": "phi_detection_alert",
        "condition": {
            "classification": "phi",
            "source": {"$ne": "healthcare_system"}  # Unexpected PHI
        },
        "notify": ["compliance@company.com"],
        "severity": "high"
    }
)
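
The `condition` block uses query-style operators such as `$ne` (and `$gte` in the escalation trigger earlier). The sketch below shows how such a condition could be evaluated against an event. The event shape, the `alert_matches` helper, and the operator semantics (borrowed from the common MongoDB-style reading) are assumptions.

```python
import operator

# Only the two operators that appear on this page.
OPERATORS = {"$ne": operator.ne, "$gte": operator.ge}


def field_matches(expected, actual):
    """Compare one field: operator dict or plain equality."""
    if isinstance(expected, dict):
        return all(OPERATORS[op](actual, operand)
                   for op, operand in expected.items())
    return actual == expected


def alert_matches(condition, event):
    """True when every field in the condition matches the event."""
    return all(field_matches(expected, event.get(field))
               for field, expected in condition.items())


condition = {"classification": "phi", "source": {"$ne": "healthcare_system"}}

print(alert_matches(condition, {"classification": "phi", "source": "crm_export"}))
print(alert_matches(condition, {"classification": "phi", "source": "healthcare_system"}))
```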

Integration with Workflows

Pre-Processing Hook

python
import httpx

# Configure classification as pre-processing step
client.post(
    "/workflows/hooks",
    cast_to=httpx.Response,
    body={
        "event": "document.uploaded",
        "action": {
            "type": "classify",
            "ruleset": "enterprise",
            "on_restricted": {
                "require_approval": True,
                "notify": ["data-governance@company.com"]
            }
        }
    }
)

Best Practices

  1. Start with built-in patterns - Use proven patterns as a foundation
  2. Layer custom patterns - Add organization-specific patterns on top
  3. Set appropriate priorities - Higher sensitivity = lower priority number
  4. Test before production - Validate patterns against sample data
  5. Monitor false positives - Tune patterns to reduce noise
  6. Regular audits - Review classification effectiveness quarterly
  7. Document overrides - Always require reasons for manual changes
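
Practices 4 and 5 can be sketched as a small local check: run a pattern over labeled samples and count true and false positives and negatives. The example reuses the `project_code` expression from earlier on this page; the sample texts and the `evaluate` helper are made up for illustration.

```python
import re


def evaluate(regex, labeled_samples):
    """Count tp/fp/fn/tn for a regex over (text, should_match) pairs."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, expected in labeled_samples:
        predicted = bool(regex.search(text))
        key = ("t" if predicted == expected else "f") + ("p" if predicted else "n")
        counts[key] += 1
    return counts


PROJECT_CODE = re.compile(r"PRJ-\d{4}-[A-Z]{3}", re.IGNORECASE)

# Made-up samples: (text, should the pattern match?)
samples = [
    ("Kickoff for PRJ-2024-ABC", True),
    ("prj-2023-xyz budget", True),
    ("PRJ-24-ABC is malformed", False),
    ("See PRJ-2024-ABCD appendix", False),
]

counts = evaluate(PROJECT_CODE, samples)
print(counts)
```

The last sample is reported as a false positive because the expression matches the `PRJ-2024-ABC` prefix of a longer token. Anchoring the expression with `\b` on both sides would be one way to tighten it.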

API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /compliance/patterns | GET | List all patterns |
| /compliance/patterns | POST | Create pattern |
| /compliance/patterns/{id} | PATCH | Update pattern |
| /compliance/classification-rules | GET | List rules |
| /compliance/classification-rules | POST | Create rule |
| /data/documents/{id}/classification-report | GET | Get classification details |
