Tokenization & Masking
Slim.io provides two primary mechanisms for protecting sensitive data discovered during scans or intercepted by Endpoint DLP: tokenization (reversible encryption) and masking (irreversible redaction). This page covers how each works, when to use them, and how to interact with the tokenization API.
Tokenization (Reversible)
Tokenization replaces sensitive values with encrypted tokens using AES-256 encryption with message authentication. The original value can be recovered by authorized users through a time-limited grant, making tokenization suitable for scenarios where the data may be needed later.
Three Tokenization Modes
Slim.io supports three tokenization modes, each with different properties:
| Mode | Best for | Properties |
|---|---|---|
| Deterministic | Cross-system joins, deduplication | Same input always produces the same token. Enables matching across CRM, warehouse, and logs without revealing the original value. |
| Randomized | Maximum security | Each tokenization produces a unique token, even for the same input. No joinability. |
| Format-Preserving | Downstream validation | Output has the same format as the input (e.g., SSN stays 9 digits, phone keeps country code). Passes downstream schema validation. |
How It Works
- The original PII value is normalized (e.g., SSN dashes stripped, email lowercased).
- A per-tenant encryption key is retrieved from the key management system.
- The value is encrypted using the selected mode.
- The encrypted token replaces the original value.
- A token record is stored in the vault with full audit metadata.
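The flow above can be sketched with the standard library. This is illustrative only: it uses an HMAC as a stand-in for the real AES-256 encryption (so the output here is not reversible) and skips key retrieval and vault storage, but it shows the normalization step and the deterministic/randomized distinction:

```python
import base64
import hashlib
import hmac
import os

# Stand-in for the per-tenant key fetched from the key management system.
TENANT_KEY = os.urandom(32)

def normalize(value: str, pii_type: str) -> str:
    # Step 1: normalization, e.g. SSN dashes stripped, email lowercased.
    if pii_type == "us_ssn":
        return value.replace("-", "")
    if pii_type == "email":
        return value.strip().lower()
    return value

def tokenize(value: str, pii_type: str, mode: str = "deterministic") -> str:
    plain = normalize(value, pii_type).encode()
    if mode == "deterministic":
        # Keyed MAC over the normalized value: same input, same token.
        body = hmac.new(TENANT_KEY, pii_type.encode() + b"|" + plain,
                        hashlib.sha256).digest()
    else:
        # Randomized: a fresh nonce makes every token unique.
        nonce = os.urandom(12)
        body = nonce + hmac.new(TENANT_KEY, nonce + plain,
                                hashlib.sha256).digest()
    # URL-safe base64, matching the documented slim_ prefix.
    return "slim_" + base64.urlsafe_b64encode(body).rstrip(b"=").decode()
```

Note that normalization is what makes "123-45-6789" and "123456789" produce the same deterministic token.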
```
Original:   "123-45-6789"
Tokenized:  "slim_AQAB8Gt9hPzI5Qx7kM2..."  (deterministic/randomized)
        or: "541-89-6743"                  (format-preserving)
```

Token Properties
Each token is:
- Authenticated — Includes a message authentication tag bound to the tenant, PII type, and scope. Tampering is detected before any decryption is attempted.
- Tenant-scoped — Cannot be decrypted by any other tenant’s keys.
- URL-safe — Uses base64url encoding, suitable for HTTP headers, query strings, and JSON.
- Audited — Every tokenization and detokenization is recorded in the decision audit trail.
Deterministic Scope
For deterministic mode, you control the linkability boundary — how broadly the same input produces the same token:
| Scope | Joinability | Use case |
|---|---|---|
| Global | Same token across all connectors | Cross-system joins (CRM + warehouse + logs) |
| Connector | Same token within one data source | Intra-system matching only |
| Resource | Same token within one table/bucket | Intra-table deduplication |
| Field | Same token within one column | Maximum isolation |
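One way to picture the scope table: the linkability boundary can be enforced by mixing a scope path into the key derivation, so two sites only produce matching tokens when they derive the same key. A minimal sketch (the label format is an assumption, not Slim.io's actual derivation):

```python
import hashlib
import hmac

def scoped_key(tenant_key: bytes, scope: str, path: str = "") -> bytes:
    # Global scope ignores the path, so every connector derives the same
    # key and tokens join across systems. Narrower scopes mix the path in
    # (e.g. "crm", "crm/users", "crm/users.email"), so the same input
    # yields different tokens on either side of the boundary.
    label = "global" if scope == "global" else f"{scope}:{path}"
    return hmac.new(tenant_key, label.encode(), hashlib.sha256).digest()
```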
Key Management
Slim.io uses a three-tier key hierarchy:
- Root Key — Stored in a cloud-managed hardware security module (HSM), never exported.
- Per-Tenant Key — Derived per tenant with automatic rotation (default: every 90 days).
- Data Encryption Key (DEK) — Ephemeral key used for each encryption operation, wrapped by the tenant key.
Key rotation creates new DEKs while retaining previous keys for decrypting older tokens.
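The rotation behavior can be illustrated with a toy keyring. The wrap operation here is a deliberately simplified XOR keystream, not the authenticated key-wrap a production system would use; the point is only that old key versions are retained so older wrapped DEKs remain decryptable:

```python
import hashlib
import os

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class TenantKeyring:
    """Toy keyring: rotation adds a new tenant-key version, but old
    versions are retained so DEKs wrapped under them still unwrap."""

    def __init__(self) -> None:
        self.versions = [os.urandom(32)]  # version 0

    def rotate(self) -> None:
        self.versions.append(os.urandom(32))

    def wrap(self, dek: bytes) -> tuple:
        # Toy wrap: XOR against a keystream from the newest key version.
        v = len(self.versions) - 1
        return v, _xor(dek, hashlib.sha256(self.versions[v]).digest())

    def unwrap(self, version: int, wrapped: bytes) -> bytes:
        return _xor(wrapped, hashlib.sha256(self.versions[version]).digest())
```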
Format-Preserving Tokenization
For use cases where the token must maintain the same format as the original value:
| PII Type | Input | Output | Preserved |
|---|---|---|---|
| US SSN | 123-45-6789 | 541-89-6743 | Last 3 digits cleartext |
| Phone | (555) 123-4567 | (555) 987-6543 | Country + area code |
| Passport (US) | 123456789 | 987654321 | 9-digit format |
| Monetary | $1,234.56 | $5,678.90 | Currency format |
Format-preserving tokenization uses a smaller cryptographic domain than standard AES-256 tokenization. For PII types with small domains (e.g., SSN), your policy must explicitly acknowledge this tradeoff. Use deterministic or randomized mode when format preservation is not required.
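To make the format contract concrete, here is a toy SSN example. It preserves the 3-2-4 digit layout and leaves the last 3 digits cleartext (matching the table), but it is not reversible and is not the FF1/FF3-style construction a real format-preserving scheme would use:

```python
import hashlib
import hmac

def fp_ssn_sketch(ssn: str, key: bytes) -> str:
    # Illustrates the format contract only: NOT a real reversible
    # format-preserving cipher.
    digits = ssn.replace("-", "")
    head, tail = digits[:6], digits[6:]  # last 3 digits stay cleartext
    stream = hmac.new(key, head.encode(), hashlib.sha256).digest()
    # Replace the leading 6 digits with keyed pseudorandom digits.
    new_head = "".join(str((int(d) + stream[i]) % 10)
                       for i, d in enumerate(head))
    return f"{new_head[:3]}-{new_head[3:5]}-{new_head[5]}{tail}"
```

The output still passes a 3-2-4 SSN regex, which is exactly why downstream schema validation keeps working.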
Tokenization Policies
Policies control what gets tokenized, in which mode, and who can detokenize.
Creating a Policy
Navigate to Tokenization → Token Policies in the dashboard, or use the API:
```
POST /api/v1/tokens/policies
Authorization: Bearer $TOKEN
Content-Type: application/json

{
  "name": "HIPAA Compliance Policy",
  "type_rules": {
    "us_ssn": {
      "mode": "format_preserving",
      "min_confidence": 0.85,
      "deterministic_scope": "global"
    },
    "email": {
      "mode": "deterministic",
      "min_confidence": 0.8
    }
  },
  "allowed_roles": ["admin", "privacy_officer"],
  "allowed_purposes": ["customer_support", "regulatory"],
  "require_reason": true,
  "failure_policy": "fail_closed"
}
```

Policy Evaluation
When a value is tokenized, the policy is evaluated in a deterministic order:
- Emergency disable check
- Policy enabled check
- PII type rule lookup
- Confidence threshold check
- Context label filters
- Mode selection
- Tokenize or skip
Every decision is logged in the audit trail with the policy ID and version.
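The evaluation order above can be sketched as a short-circuiting function. Field names beyond those shown in the policy example (such as `enabled` and `exclude_labels`) are illustrative assumptions:

```python
def evaluate(policy: dict, pii_type: str, confidence: float,
             labels: set, emergency_disabled: bool = False) -> str:
    """Walk the documented checks in order; the first failing check
    short-circuits to a skip decision."""
    if emergency_disabled:                       # 1. emergency disable
        return "skip:emergency_disabled"
    if not policy.get("enabled", True):          # 2. policy enabled
        return "skip:policy_disabled"
    rule = policy.get("type_rules", {}).get(pii_type)
    if rule is None:                             # 3. PII type rule lookup
        return "skip:no_rule"
    if confidence < rule.get("min_confidence", 0.0):  # 4. confidence
        return "skip:low_confidence"
    if labels & set(rule.get("exclude_labels", [])):  # 5. label filters
        return "skip:label_filtered"
    return f"tokenize:{rule['mode']}"            # 6-7. mode + tokenize
```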
Detokenization (Grant-Based)
Detokenization requires a time-limited grant — you cannot decrypt tokens directly. This ensures every access is authorized, audited, and bounded.
Grant Flow
- Request a grant with the token IDs, purpose, and reason.
- The system validates your role and purpose against the policy.
- If authorized, a single-use grant is issued (valid for 5 minutes).
- Use the grant to detokenize the specified tokens.
- The grant is consumed — it cannot be reused.
```
# Step 1: Request a grant
POST /api/v1/tokens/grant
Authorization: Bearer $TOKEN

{
  "token_ids": ["tok_abc123", "tok_def456"],
  "purpose": "customer_support",
  "reason": "Case #12345 — customer identity verification"
}

# Step 2: Detokenize with the grant
POST /api/v1/tokens/detokenize
Authorization: Bearer $TOKEN

{
  "tokens": ["tok_abc123", "tok_def456"],
  "grant_jwt": "eyJ..."
}
```

Grants are single-use and expire after 5 minutes. Each grant is bound to the requesting user, tenant, and specific token IDs — it cannot be reused for different tokens or by different users.
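The grant semantics (single-use, 5-minute TTL, bound to the requester and the exact token set) can be sketched with an in-memory store; this is a model of the behavior, not Slim.io's implementation:

```python
import time
import uuid

class GrantStore:
    """In-memory sketch of the grant lifecycle."""
    TTL = 300  # seconds: grants expire after 5 minutes

    def __init__(self) -> None:
        self._grants = {}

    def issue(self, user: str, token_ids: list, purpose: str) -> str:
        grant_id = str(uuid.uuid4())
        self._grants[grant_id] = {
            "user": user,
            "tokens": frozenset(token_ids),
            "purpose": purpose,
            "expires": time.time() + self.TTL,
        }
        return grant_id

    def redeem(self, grant_id: str, user: str, token_ids: list) -> bool:
        g = self._grants.pop(grant_id, None)  # pop => single-use
        if g is None or time.time() > g["expires"]:
            return False
        # The grant must match both the user and the exact token set.
        return g["user"] == user and g["tokens"] == frozenset(token_ids)
```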
Detokenization Audit
All detokenization events are recorded with:
- Who requested access (user ID, role)
- Why they needed access (purpose, reason text)
- Which tokens were accessed
- Whether access was granted or denied
- Timestamp and trace ID for correlation
View the audit trail at Tokenization → Detokenize Audit in the dashboard.
Masking (Irreversible)
Masking permanently removes the sensitive value by replacing it with a redaction marker. The original data cannot be recovered.
Masking Strategies
| Strategy | Output | Use Case |
|---|---|---|
| redact | [REDACTED] | General-purpose redaction |
| partial | ***-**-6789 | Preserve last N characters for reference |
| category | [SSN] | Replace with the PII category label |
| hash | a1b2c3d4... | One-way hash for deduplication without revealing the value |
| null | (empty string) | Remove the value entirely |
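A minimal sketch covering each row of the strategy table (the function shape and `keep_last` parameter are illustrative, not the product API):

```python
import hashlib

def mask(value: str, pii_type: str, strategy: str, keep_last: int = 4) -> str:
    if strategy == "redact":
        return "[REDACTED]"
    if strategy == "partial":
        # Mask everything but the last `keep_last` chars, keeping separators.
        visible = value[-keep_last:]
        hidden = "".join("*" if c.isalnum() else c for c in value[:-keep_last])
        return hidden + visible
    if strategy == "category":
        return f"[{pii_type.upper()}]"
    if strategy == "hash":
        # One-way: equal inputs hash equally, enabling dedup without reveal.
        return hashlib.sha256(value.encode()).hexdigest()
    if strategy == "null":
        return ""
    raise ValueError(f"unknown strategy: {strategy}")
```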
When to Use Tokenization vs. Masking
| Consideration | Tokenization | Masking |
|---|---|---|
| Reversibility | Yes — authorized users can decrypt via grant | No — original value is destroyed |
| Incident investigation | Preferred — analysts can reveal values when needed | Not suitable — values are permanently lost |
| Compliance (right to erasure) | Delete the encryption key to render all tokens irrecoverable | Already irreversible by design |
| AI/LLM workflows | Tokens in LLM output can be rehydrated for authorized users | Masked values cannot be recovered |
| Cross-system joins | Deterministic tokens enable joins without revealing values | Masked values cannot be correlated |
| Downstream processing | Format-preserving tokens maintain data structure | Masked values may break downstream schemas |
A common pattern is to tokenize during initial detection and mask only when a governance policy explicitly requires permanent redaction — for example, before data is exported to a third-party system.
SDK Integration
Tokenization is available via the Python and Node.js SDKs:
```python
# Python
from slim_tokens import SlimTokens

client = SlimTokens(api_key="slim_...")
result = client.tokenize(
    items=[{"value": "123-45-6789", "pii_type": "us_ssn"}],
    policy_id="pol_hipaa",
)
print(result.results[0].token)  # slim_AQAB...
```

```javascript
// Node.js
import { SlimTokens } from '@slim-io/tokens';

const client = new SlimTokens({ apiKey: 'slim_...' });
const result = await client.tokenize({
  items: [{ value: '123-45-6789', piiType: 'us_ssn' }],
  policyId: 'pol_hipaa',
});
console.log(result.results[0].token); // slim_AQAB...
```

LLM Context Rehydration
When your LLM receives tokenized data (tokens in the context or prompt), Slim.io can automatically detect and replace them with the original values — subject to your detokenization policy. This enables AI workflows to operate on real data while maintaining access controls.
Rehydration is bounded: a maximum of 100 tokens per request, with output size limits to prevent amplification attacks. Cross-tenant tokens are automatically blocked.
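The detect-and-replace step with its per-request cap can be sketched as a regex pass; the token pattern and `lookup` callable stand in for the real detokenization call and its policy checks:

```python
import re

TOKEN_RE = re.compile(r"slim_[A-Za-z0-9_-]+")
MAX_TOKENS = 100  # per-request bound from the docs

def rehydrate(text: str, lookup) -> str:
    """Replace slim_ tokens in LLM output with original values.

    `lookup(token)` stands in for a policy-checked detokenize call and
    returns None when access is denied (e.g. a cross-tenant token).
    Tokens beyond the per-request cap are left untouched."""
    seen = 0
    def repl(m):
        nonlocal seen
        if seen >= MAX_TOKENS:
            return m.group(0)
        original = lookup(m.group(0))
        if original is None:  # denied or unknown: leave the token alone
            return m.group(0)
        seen += 1
        return original
    return TOKEN_RE.sub(repl, text)
```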
Configure rehydration settings at Tokenization → Integrations in the dashboard.