
Tokenization & Masking

Slim.io provides two primary mechanisms for protecting sensitive data discovered during scans or intercepted by Endpoint DLP: tokenization (reversible encryption) and masking (irreversible redaction). This page covers how each works, when to use them, and how to interact with the tokenization API.

Tokenization (Reversible)

Tokenization replaces sensitive values with encrypted tokens using AES-256 encryption with message authentication. The original value can be recovered by authorized users through a time-limited grant, making tokenization suitable for scenarios where the data may be needed later.

Three Tokenization Modes

Slim.io supports three tokenization modes, each with different properties:

| Mode | Best for | Properties |
| --- | --- | --- |
| Deterministic | Cross-system joins, deduplication | Same input always produces the same token. Enables matching across CRM, warehouse, and logs without revealing the original value. |
| Randomized | Maximum security | Each tokenization produces a unique token, even for the same input. No joinability. |
| Format-Preserving | Downstream validation | Output has the same format as the input (e.g., an SSN stays 9 digits, a phone number keeps its country code). Passes downstream schema validation. |

How It Works

  1. The original PII value is normalized (e.g., SSN dashes stripped, email lowercased).
  2. A per-tenant encryption key is retrieved from the key management system.
  3. The value is encrypted using the selected mode.
  4. The encrypted token replaces the original value.
  5. A token record is stored in the vault with full audit metadata.
```
Original:  "123-45-6789"
Tokenized: "slim_AQAB8Gt9hPzI5Qx7kM2..."   (deterministic / randomized)
       or: "541-89-6743"                   (format-preserving)
```
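As a rough sketch of step 1 above: normalization canonicalizes a value before encryption so that equivalent inputs tokenize identically in deterministic mode. The rules below are illustrative only, not slim.io's actual normalization table:

```python
import re

def normalize(value: str, pii_type: str) -> str:
    """Toy normalization pass (step 1 above); illustrative rules only."""
    if pii_type == "us_ssn":
        # Strip dashes/spaces so "123-45-6789" and "123456789" match.
        return re.sub(r"[^0-9]", "", value)
    if pii_type == "email":
        # Lowercase + trim so casing differences don't break joins.
        return value.strip().lower()
    return value

print(normalize("123-45-6789", "us_ssn"))         # 123456789
print(normalize(" Alice@Example.COM ", "email"))  # alice@example.com
```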

Token Properties

Each token is:

  • Authenticated — Includes a message authentication tag bound to the tenant, PII type, and scope. Tampering is detected before any decryption is attempted.
  • Tenant-scoped — Cannot be decrypted by any other tenant’s keys.
  • URL-safe — Uses base64url encoding, suitable for HTTP headers, query strings, and JSON.
  • Audited — Every tokenization and detokenization is recorded in the decision audit trail.

Deterministic Scope

For deterministic mode, you control the linkability boundary — how broadly the same input produces the same token:

| Scope | Joinability | Use case |
| --- | --- | --- |
| Global | Same token across all connectors | Cross-system joins (CRM + warehouse + logs) |
| Connector | Same token within one data source | Intra-system matching only |
| Resource | Same token within one table/bucket | Intra-table deduplication |
| Field | Same token within one column | Maximum isolation |
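The linkability boundary can be illustrated with a toy sketch: folding the scope identifiers into the keyed input bounds which tokens can ever match. Real slim.io tokens use authenticated AES-256 encryption, not HMAC; this only demonstrates the joinability property, and every name here is hypothetical:

```python
import hmac
import hashlib

def scoped_token(value: str, scope_key: tuple) -> str:
    """Illustration only: a deterministic token whose linkability is bounded
    by everything folded into scope_key. Not the real cipher."""
    msg = "|".join(scope_key + (value,)).encode()
    return "slim_" + hmac.new(b"demo-tenant-key", msg, hashlib.sha256).hexdigest()[:16]

ssn = "123456789"
# Global scope: same token everywhere in the tenant -> cross-system joins work.
g1 = scoped_token(ssn, ("tenant-a", "global"))
g2 = scoped_token(ssn, ("tenant-a", "global"))
# Field scope: the column name is part of the boundary -> no cross-column joins.
f1 = scoped_token(ssn, ("tenant-a", "field", "crm.customers.ssn"))
f2 = scoped_token(ssn, ("tenant-a", "field", "warehouse.users.ssn"))
print(g1 == g2, f1 == f2)  # True False
```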

Key Management

Slim.io uses a three-tier key hierarchy:

  • Root Key — Stored in a cloud-managed hardware security module (HSM), never exported.
  • Per-Tenant Key — Derived per tenant with automatic rotation (default: every 90 days).
  • Data Encryption Key (DEK) — Ephemeral key used for each encryption operation, wrapped by the tenant key.

Key rotation introduces a new tenant-key version for new encryptions while previous versions are retained, so tokens wrapped under older keys remain decryptable.
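A minimal sketch of the hierarchy, assuming HKDF-style one-step derivation and a toy XOR wrap. The real flow uses an HSM-resident root key and proper AES key wrapping; all labels here are hypothetical:

```python
import hmac
import hashlib
import os

def derive(parent: bytes, label: str) -> bytes:
    """HKDF-style one-step derivation (sketch; the real KMS/HSM flow differs)."""
    return hmac.new(parent, label.encode(), hashlib.sha256).digest()

root_key = b"\x00" * 32                          # in production: HSM-resident, never exported
tenant_key_v1 = derive(root_key, "tenant-a/v1")  # rotated every ~90 days -> v2, v3, ...
tenant_key_v2 = derive(root_key, "tenant-a/v2")

dek = os.urandom(32)                             # ephemeral per-operation key
# Toy wrap: XOR with a key derived from the current tenant key (real wrap is AES key wrap).
wrap_key = derive(tenant_key_v2, "wrap")
wrapped = bytes(a ^ b for a, b in zip(dek, wrap_key))

# Tokens record which tenant-key version wrapped their DEK, so older
# versions stay available for decryption after rotation.
unwrapped = bytes(a ^ b for a, b in zip(wrapped, wrap_key))
print(unwrapped == dek)  # True
```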

Format-Preserving Tokenization

For use cases where the token must maintain the same format as the original value:

| PII Type | Input | Output | Preserved |
| --- | --- | --- | --- |
| US SSN | 123-45-6789 | 541-89-6743 | 9-digit SSN format with dashes |
| Phone | (555) 123-4567 | (555) 987-6543 | Country + area code |
| Passport (US) | 123456789 | 987654321 | 9-digit format |
| Monetary | $1,234.56 | $5,678.90 | Currency format |

Format-preserving tokenization uses a smaller cryptographic domain than standard AES-256 tokenization. For PII types with small domains (e.g., SSN), your policy must explicitly acknowledge this tradeoff. Use deterministic or randomized mode when format preservation is not required.
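To make the format contract concrete, here is a toy digit-substitution sketch. It is not a real format-preserving cipher (standards such as NIST FF1 are far more involved, and this toy is not even reversible); it only shows the contract that digits map to digits while punctuation survives:

```python
import hmac
import hashlib

def toy_format_preserving(value: str, key: bytes) -> str:
    """NOT real FPE -- demonstrates only the format contract:
    digits map to digits, everything else passes through unchanged."""
    stream = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(str((int(ch) + int(stream[i], 16)) % 10))
            i += 1
        else:
            out.append(ch)  # dashes, parens, spaces survive as-is
    return "".join(out)

token = toy_format_preserving("123-45-6789", b"demo-key")
print(len(token), token[3], token[6])  # 11 - -
```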

Tokenization Policies

Policies control what gets tokenized, in which mode, and who can detokenize.

Creating a Policy

Navigate to Tokenization → Token Policies in the dashboard, or use the API:

```
POST /api/v1/tokens/policies
Authorization: Bearer $TOKEN
Content-Type: application/json

{
  "name": "HIPAA Compliance Policy",
  "type_rules": {
    "us_ssn": {
      "mode": "format_preserving",
      "min_confidence": 0.85,
      "deterministic_scope": "global"
    },
    "email": {
      "mode": "deterministic",
      "min_confidence": 0.8
    }
  },
  "allowed_roles": ["admin", "privacy_officer"],
  "allowed_purposes": ["customer_support", "regulatory"],
  "require_reason": true,
  "failure_policy": "fail_closed"
}
```

Policy Evaluation

When a value is tokenized, the policy is evaluated in a deterministic order:

  1. Emergency disable check
  2. Policy enabled check
  3. PII type rule lookup
  4. Confidence threshold check
  5. Context label filters
  6. Mode selection
  7. Tokenize or skip
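The steps above can be sketched as a short pipeline. Field names such as `emergency_disabled` and `exclude_labels` are hypothetical stand-ins for whatever the actual policy schema uses:

```python
def evaluate(policy: dict, finding: dict):
    """Sketch of the seven-step evaluation order (field names hypothetical)."""
    if policy.get("emergency_disabled"):                  # 1. emergency disable check
        return ("skip", "emergency_disabled")
    if not policy.get("enabled", True):                   # 2. policy enabled check
        return ("skip", "policy_disabled")
    rule = policy["type_rules"].get(finding["pii_type"])  # 3. PII type rule lookup
    if rule is None:
        return ("skip", "no_rule")
    if finding["confidence"] < rule["min_confidence"]:    # 4. confidence threshold check
        return ("skip", "low_confidence")
    excluded = set(rule.get("exclude_labels", []))        # 5. context label filters
    if excluded & set(finding.get("labels", [])):
        return ("skip", "label_excluded")
    return ("tokenize", rule["mode"])                     # 6-7. mode selection, tokenize

policy = {"enabled": True,
          "type_rules": {"us_ssn": {"mode": "format_preserving", "min_confidence": 0.85}}}
print(evaluate(policy, {"pii_type": "us_ssn", "confidence": 0.92}))  # ('tokenize', 'format_preserving')
print(evaluate(policy, {"pii_type": "us_ssn", "confidence": 0.5}))   # ('skip', 'low_confidence')
```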

Every decision is logged in the audit trail with the policy ID and version.

Detokenization (Grant-Based)

Detokenization requires a time-limited grant — you cannot decrypt tokens directly. This ensures every access is authorized, audited, and bounded.

Grant Flow

  1. Request a grant with the token IDs, purpose, and reason.
  2. The system validates your role and purpose against the policy.
  3. If authorized, a single-use grant is issued (valid for 5 minutes).
  4. Use the grant to detokenize the specified tokens.
  5. The grant is consumed — it cannot be reused.
```
# Step 1: Request a grant
POST /api/v1/tokens/grant
Authorization: Bearer $TOKEN

{
  "token_ids": ["tok_abc123", "tok_def456"],
  "purpose": "customer_support",
  "reason": "Case #12345 — customer identity verification"
}

# Step 2: Detokenize with the grant
POST /api/v1/tokens/detokenize
Authorization: Bearer $TOKEN

{
  "tokens": ["tok_abc123", "tok_def456"],
  "grant_jwt": "eyJ..."
}
```

Grants are single-use and expire after 5 minutes. Each grant is bound to the requesting user, tenant, and specific token IDs — it cannot be reused for different tokens or by different users.
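The single-use, time-bounded semantics can be modeled with a small in-memory store. This is a sketch of the behavior only; the real service issues signed grant JWTs rather than opaque IDs in a dict:

```python
import time
import uuid

GRANT_TTL_SECONDS = 300  # grants expire after 5 minutes

class GrantStore:
    """Sketch of single-use, time-bounded grants (storage/JWT details omitted)."""
    def __init__(self):
        self._grants = {}

    def issue(self, user: str, token_ids: list) -> str:
        grant_id = str(uuid.uuid4())
        self._grants[grant_id] = {"user": user,
                                  "token_ids": set(token_ids),
                                  "expires": time.time() + GRANT_TTL_SECONDS}
        return grant_id

    def redeem(self, grant_id: str, user: str, token_ids: list) -> bool:
        grant = self._grants.pop(grant_id, None)       # pop => single use
        return (grant is not None
                and grant["user"] == user              # bound to the requester
                and time.time() < grant["expires"]     # bound in time
                and set(token_ids) <= grant["token_ids"])  # bound to these tokens

store = GrantStore()
g = store.issue("analyst@acme.com", ["tok_abc123", "tok_def456"])
print(store.redeem(g, "analyst@acme.com", ["tok_abc123"]))  # True
print(store.redeem(g, "analyst@acme.com", ["tok_abc123"]))  # False (already consumed)
```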

Detokenization Audit

All detokenization events are recorded with:

  • Who requested access (user ID, role)
  • Why they needed access (purpose, reason text)
  • Which tokens were accessed
  • Whether access was granted or denied
  • Timestamp and trace ID for correlation
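As a sketch, an audit event might carry the fields above in a record like this (all field names hypothetical):

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class DetokenizeAuditEvent:
    """Sketch of the audit fields listed above (names hypothetical)."""
    user_id: str      # who requested access
    role: str
    purpose: str      # why they needed access
    reason: str
    token_ids: list   # which tokens were accessed
    granted: bool     # whether access was granted or denied
    timestamp: float = field(default_factory=time.time)
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

event = DetokenizeAuditEvent("u_42", "privacy_officer", "customer_support",
                             "Case #12345", ["tok_abc123"], granted=True)
print(event.granted, len(event.trace_id))  # True 32
```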

View the audit trail at Tokenization → Detokenize Audit in the dashboard.

Masking (Irreversible)

Masking permanently removes the sensitive value by replacing it with a redaction marker. The original data cannot be recovered.

Masking Strategies

| Strategy | Output | Use case |
| --- | --- | --- |
| redact | [REDACTED] | General-purpose redaction |
| partial | ***-**-6789 | Preserve the last N characters for reference |
| category | [SSN] | Replace with the PII category label |
| hash | a1b2c3d4... | One-way hash for deduplication without revealing the value |
| null | (empty string) | Remove the value entirely |
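A minimal sketch of the five strategies, assuming a `keep_last` parameter for partial masking (parameter and label names are hypothetical):

```python
import hashlib

def mask(value: str, strategy: str, pii_label: str = "PII", keep_last: int = 4) -> str:
    """Sketch of the five masking strategies above (parameter names hypothetical)."""
    if strategy == "redact":
        return "[REDACTED]"
    if strategy == "partial":
        # Keep the last N characters; star other alphanumerics, keep punctuation.
        return "".join(c if i >= len(value) - keep_last or not c.isalnum() else "*"
                       for i, c in enumerate(value))
    if strategy == "category":
        return f"[{pii_label}]"
    if strategy == "hash":
        # One-way: joinable for deduplication but not reversible.
        return hashlib.sha256(value.encode()).hexdigest()[:16]
    if strategy == "null":
        return ""
    raise ValueError(f"unknown strategy: {strategy}")

print(mask("123-45-6789", "partial"))          # ***-**-6789
print(mask("123-45-6789", "category", "SSN"))  # [SSN]
```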

When to Use Tokenization vs. Masking

| Consideration | Tokenization | Masking |
| --- | --- | --- |
| Reversibility | Yes — authorized users can decrypt via grant | No — original value is destroyed |
| Incident investigation | Preferred — analysts can reveal values when needed | Not suitable — values are permanently lost |
| Compliance (right to erasure) | Delete the encryption key to render all tokens irrecoverable | Already irreversible by design |
| AI/LLM workflows | Tokens in LLM output can be rehydrated for authorized users | Masked values cannot be recovered |
| Cross-system joins | Deterministic tokens enable joins without revealing values | Masked values cannot be correlated |
| Downstream processing | Format-preserving tokens maintain data structure | Masked values may break downstream schemas |

A common pattern is to tokenize during initial detection and mask only when a governance policy explicitly requires permanent redaction — for example, before data is exported to a third-party system.

SDK Integration

Tokenization is available via the Python and Node.js SDKs:

```python
# Python
from slim_tokens import SlimTokens

client = SlimTokens(api_key="slim_...")
result = client.tokenize(
    items=[{"value": "123-45-6789", "pii_type": "us_ssn"}],
    policy_id="pol_hipaa",
)
print(result.results[0].token)  # slim_AQAB...
```
```javascript
// Node.js
import { SlimTokens } from '@slim-io/tokens';

const client = new SlimTokens({ apiKey: 'slim_...' });
const result = await client.tokenize({
  items: [{ value: '123-45-6789', piiType: 'us_ssn' }],
  policyId: 'pol_hipaa',
});
console.log(result.results[0].token); // slim_AQAB...
```

LLM Context Rehydration

When your LLM receives tokenized data (tokens in the context or prompt), slim.io can automatically detect and replace them with the original values — subject to your detokenization policy. This enables AI workflows to operate on real data while maintaining access controls.

Rehydration is bounded: a maximum of 100 tokens per request, with output size limits to prevent amplification attacks. Cross-tenant tokens are automatically blocked.
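A sketch of the rehydration loop under these bounds, with the vault lookup faked by a dict and the token pattern assumed (for illustration) to be `slim_` followed by base64url characters:

```python
import re

MAX_TOKENS_PER_REQUEST = 100  # rehydration bound described above

def rehydrate(text: str, resolve: dict, tenant: str) -> str:
    """Sketch: find slim_ tokens in LLM context and swap in originals,
    honoring the per-request cap and the tenant boundary."""
    seen = 0
    def replace(match):
        nonlocal seen
        token = match.group(0)
        record = resolve.get(token)
        if record is None or record["tenant"] != tenant:  # cross-tenant blocked
            return token
        if seen >= MAX_TOKENS_PER_REQUEST:                # amplification bound
            return token
        seen += 1
        return record["value"]
    return re.sub(r"slim_[A-Za-z0-9_-]+", replace, text)

vault = {"slim_AQAB8Gt9": {"tenant": "tenant-a", "value": "123-45-6789"},
         "slim_ZZZZ9999": {"tenant": "tenant-b", "value": "555-00-1111"}}
out = rehydrate("SSNs: slim_AQAB8Gt9 and slim_ZZZZ9999", vault, tenant="tenant-a")
print(out)  # SSNs: 123-45-6789 and slim_ZZZZ9999
```

Note that the tenant-b token passes through untouched: cross-tenant tokens are never rehydrated, matching the behavior described above.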

Configure rehydration settings at Tokenization → Integrations in the dashboard.
