Tokenization & Masking
Slim.io provides two primary mechanisms for protecting sensitive data discovered during scans or intercepted by Endpoint DLP: tokenization (reversible encryption) and masking (irreversible redaction). This page covers how each works, when to use them, and how to interact with the tokenization API.
Tokenization (Reversible)
Tokenization replaces sensitive values with encrypted tokens using AES-256 encryption with message authentication. The original value can be recovered by authorized users through a time-limited grant, making tokenization suitable for scenarios where the data may be needed later.
Three Tokenization Modes
Slim.io supports three tokenization modes, each with different properties:
| Mode | Best for | Properties |
|---|---|---|
| Deterministic | Cross-system joins, deduplication | Same input always produces the same token. Enables matching across CRM, warehouse, and logs without revealing the original value. |
| Randomized | Maximum security | Each tokenization produces a unique token, even for the same input. No joinability. |
| Format-Preserving | Downstream validation | Output has the same format as the input (e.g., SSN stays 9 digits, phone keeps country code). Passes downstream schema validation. |
How It Works
- The original PII value is normalized (e.g., SSN dashes stripped, email lowercased).
- A per-tenant encryption key is retrieved from the key management system.
- The value is encrypted using the selected mode.
- The encrypted token replaces the original value.
- A token record is stored in the vault with full audit metadata.
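The flow above can be sketched with the standard library. This is illustrative only: it uses an HMAC as a stand-in for the real AES-256 encryption (so the output here is not reversible) and skips key retrieval and vault storage, but it shows the normalization step and the deterministic/randomized distinction:

```python
import base64
import hashlib
import hmac
import os

# Stand-in for the per-tenant key fetched from the key management system.
TENANT_KEY = os.urandom(32)

def normalize(value: str, pii_type: str) -> str:
    # Step 1: normalization, e.g. SSN dashes stripped, email lowercased.
    if pii_type == "us_ssn":
        return value.replace("-", "")
    if pii_type == "email":
        return value.strip().lower()
    return value

def tokenize(value: str, pii_type: str, mode: str = "deterministic") -> str:
    plain = normalize(value, pii_type).encode()
    if mode == "deterministic":
        # Keyed MAC over the normalized value: same input, same token.
        body = hmac.new(TENANT_KEY, pii_type.encode() + b"|" + plain,
                        hashlib.sha256).digest()
    else:
        # Randomized: a fresh nonce makes every token unique.
        nonce = os.urandom(12)
        body = nonce + hmac.new(TENANT_KEY, nonce + plain,
                                hashlib.sha256).digest()
    # URL-safe base64, matching the documented slim_ prefix.
    return "slim_" + base64.urlsafe_b64encode(body).rstrip(b"=").decode()
```

Note that normalization is what makes "123-45-6789" and "123456789" produce the same deterministic token.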
```
Original:   "123-45-6789"
Tokenized:  "slim_AQAB8Gt9hPzI5Qx7kM2..."  (deterministic/randomized)
        or: "541-89-6743"                  (format-preserving)
```

Token Properties
Each token is:
- Authenticated — Includes a message authentication tag bound to the tenant, PII type, and scope. Tampering is detected before any decryption is attempted.
- Tenant-scoped — Cannot be decrypted by any other tenant’s keys.
- URL-safe — Uses base64url encoding, suitable for HTTP headers, query strings, and JSON.
- Audited — Every tokenization and detokenization is recorded in the decision audit trail.
Deterministic Scope
For deterministic mode, you control the linkability boundary — how broadly the same input produces the same token:
| Scope | Joinability | Use case |
|---|---|---|
| Global | Same token across all connectors | Cross-system joins (CRM + warehouse + logs) |
| Connector | Same token within one data source | Intra-system matching only |
| Resource | Same token within one table/bucket | Intra-table deduplication |
| Field | Same token within one column | Maximum isolation |
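One way to picture the scope table: the linkability boundary can be enforced by mixing a scope path into the key derivation, so two sites only produce matching tokens when they derive the same key. A minimal sketch (the label format is an assumption, not Slim.io's actual derivation):

```python
import hashlib
import hmac

def scoped_key(tenant_key: bytes, scope: str, path: str = "") -> bytes:
    # Global scope ignores the path, so every connector derives the same
    # key and tokens join across systems. Narrower scopes mix the path in
    # (e.g. "crm", "crm/users", "crm/users.email"), so the same input
    # yields different tokens on either side of the boundary.
    label = "global" if scope == "global" else f"{scope}:{path}"
    return hmac.new(tenant_key, label.encode(), hashlib.sha256).digest()
```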
Key Management
Slim.io uses a three-tier key hierarchy:
- Root Key — Stored in a cloud-managed hardware security module (HSM), never exported.
- Per-Tenant Key — Derived per tenant with automatic rotation (default: every 90 days).
- Data Encryption Key (DEK) — Ephemeral key used for each encryption operation, wrapped by the tenant key.
Key rotation creates new DEKs while retaining previous keys for decrypting older tokens.
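The rotation behavior can be illustrated with a toy keyring. The wrap operation here is a deliberately simplified XOR keystream, not the authenticated key-wrap a production system would use; the point is only that old key versions are retained so older wrapped DEKs remain decryptable:

```python
import hashlib
import os

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class TenantKeyring:
    """Toy keyring: rotation adds a new tenant-key version, but old
    versions are retained so DEKs wrapped under them still unwrap."""

    def __init__(self) -> None:
        self.versions = [os.urandom(32)]  # version 0

    def rotate(self) -> None:
        self.versions.append(os.urandom(32))

    def wrap(self, dek: bytes) -> tuple:
        # Toy wrap: XOR against a keystream from the newest key version.
        v = len(self.versions) - 1
        return v, _xor(dek, hashlib.sha256(self.versions[v]).digest())

    def unwrap(self, version: int, wrapped: bytes) -> bytes:
        return _xor(wrapped, hashlib.sha256(self.versions[version]).digest())
```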
Format-Preserving Tokenization
For use cases where the token must maintain the same format as the original value:
| PII Type | Input | Output | Preserved |
|---|---|---|---|
| US SSN | 123-45-6789 | 541-89-6743 | Last 3 digits cleartext |
| Phone | (555) 123-4567 | (555) 987-6543 | Country + area code |
| Passport (US) | 123456789 | 987654321 | 9-digit format |
| Monetary | $1,234.56 | $5,678.90 | Currency format |
Format-preserving tokenization uses a smaller cryptographic domain than standard AES-256 tokenization. For PII types with small domains (e.g., SSN), your policy must explicitly acknowledge this tradeoff. Use deterministic or randomized mode when format preservation is not required.
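To make the format contract concrete, here is a toy SSN example. It preserves the 3-2-4 digit layout and leaves the last 3 digits cleartext (matching the table), but it is not reversible and is not the FF1/FF3-style construction a real format-preserving scheme would use:

```python
import hashlib
import hmac

def fp_ssn_sketch(ssn: str, key: bytes) -> str:
    # Illustrates the format contract only: NOT a real reversible
    # format-preserving cipher.
    digits = ssn.replace("-", "")
    head, tail = digits[:6], digits[6:]  # last 3 digits stay cleartext
    stream = hmac.new(key, head.encode(), hashlib.sha256).digest()
    # Replace the leading 6 digits with keyed pseudorandom digits.
    new_head = "".join(str((int(d) + stream[i]) % 10)
                       for i, d in enumerate(head))
    return f"{new_head[:3]}-{new_head[3:5]}-{new_head[5]}{tail}"
```

The output still passes a 3-2-4 SSN regex, which is exactly why downstream schema validation keeps working.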
Tokenization Policies
Policies control what gets tokenized, in which mode, and who can detokenize.
Creating a Policy
Navigate to Tokenization → Token Policies in the dashboard, or use the API:
```
POST /api/v1/tokens/policies
Authorization: Bearer $TOKEN
Content-Type: application/json

{
  "name": "HIPAA Compliance Policy",
  "type_rules": {
    "us_ssn": {
      "mode": "format_preserving",
      "min_confidence": 0.85,
      "deterministic_scope": "global"
    },
    "email": {
      "mode": "deterministic",
      "min_confidence": 0.8
    }
  },
  "allowed_roles": ["admin", "privacy_officer"],
  "allowed_purposes": ["customer_support", "regulatory"],
  "require_reason": true,
  "failure_policy": "fail_closed"
}
```

Policy Evaluation
When a value is tokenized, the policy is evaluated in a deterministic order:
- Emergency disable check
- Policy enabled check
- PII type rule lookup
- Confidence threshold check
- Context label filters
- Mode selection
- Tokenize or skip
Every decision is logged in the audit trail with the policy ID and version.
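The evaluation order above can be sketched as a short-circuiting function. Field names beyond those shown in the policy example (such as `enabled` and `exclude_labels`) are illustrative assumptions:

```python
def evaluate(policy: dict, pii_type: str, confidence: float,
             labels: set, emergency_disabled: bool = False) -> str:
    """Walk the documented checks in order; the first failing check
    short-circuits to a skip decision."""
    if emergency_disabled:                       # 1. emergency disable
        return "skip:emergency_disabled"
    if not policy.get("enabled", True):          # 2. policy enabled
        return "skip:policy_disabled"
    rule = policy.get("type_rules", {}).get(pii_type)
    if rule is None:                             # 3. PII type rule lookup
        return "skip:no_rule"
    if confidence < rule.get("min_confidence", 0.0):  # 4. confidence
        return "skip:low_confidence"
    if labels & set(rule.get("exclude_labels", [])):  # 5. label filters
        return "skip:label_filtered"
    return f"tokenize:{rule['mode']}"            # 6-7. mode + tokenize
```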
Detokenization (Grant-Based)
Detokenization requires a time-limited grant — you cannot decrypt tokens directly. This ensures every access is authorized, audited, and bounded.
Grant Flow
- Request a grant with the token IDs, purpose, and reason.
- The system validates your role and purpose against the policy.
- If authorized, a single-use grant is issued (valid for 5 minutes).
- Use the grant to detokenize the specified tokens.
- The grant is consumed — it cannot be reused.
```
# Step 1: Request a grant
POST /api/v1/tokens/grant
Authorization: Bearer $TOKEN

{
  "token_ids": ["tok_abc123", "tok_def456"],
  "purpose": "customer_support",
  "reason": "Case #12345 — customer identity verification"
}

# Step 2: Detokenize with the grant
POST /api/v1/tokens/detokenize
Authorization: Bearer $TOKEN

{
  "tokens": ["tok_abc123", "tok_def456"],
  "grant_jwt": "eyJ..."
}
```

Grants are single-use and expire after 5 minutes. Each grant is bound to the requesting user, tenant, and specific token IDs — it cannot be reused for different tokens or by different users.
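The grant semantics (single-use, 5-minute TTL, bound to the requester and the exact token set) can be sketched with an in-memory store; this is a model of the behavior, not Slim.io's implementation:

```python
import time
import uuid

class GrantStore:
    """In-memory sketch of the grant lifecycle."""
    TTL = 300  # seconds: grants expire after 5 minutes

    def __init__(self) -> None:
        self._grants = {}

    def issue(self, user: str, token_ids: list, purpose: str) -> str:
        grant_id = str(uuid.uuid4())
        self._grants[grant_id] = {
            "user": user,
            "tokens": frozenset(token_ids),
            "purpose": purpose,
            "expires": time.time() + self.TTL,
        }
        return grant_id

    def redeem(self, grant_id: str, user: str, token_ids: list) -> bool:
        g = self._grants.pop(grant_id, None)  # pop => single-use
        if g is None or time.time() > g["expires"]:
            return False
        # The grant must match both the user and the exact token set.
        return g["user"] == user and g["tokens"] == frozenset(token_ids)
```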
Detokenization Audit
All detokenization events are recorded with:
- Who requested access (user ID, role)
- Why they needed access (purpose, reason text)
- Which tokens were accessed
- Whether access was granted or denied
- Timestamp and trace ID for correlation
View the audit trail at Tokenization → Detokenize Audit in the dashboard.
Masking (Irreversible)
Masking permanently removes the sensitive value by replacing it with a redaction marker. The original data cannot be recovered.
Masking Strategies
| Strategy | Output | Use Case |
|---|---|---|
| redact | [REDACTED] | General-purpose redaction |
| partial | ***-**-6789 | Preserve last N characters for reference |
| category | [SSN] | Replace with the PII category label |
| hash | a1b2c3d4... | One-way hash for deduplication without revealing the value |
| null | (empty string) | Remove the value entirely |
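A minimal sketch covering each row of the strategy table (the function shape and `keep_last` parameter are illustrative, not the product API):

```python
import hashlib

def mask(value: str, pii_type: str, strategy: str, keep_last: int = 4) -> str:
    if strategy == "redact":
        return "[REDACTED]"
    if strategy == "partial":
        # Mask everything but the last `keep_last` chars, keeping separators.
        visible = value[-keep_last:]
        hidden = "".join("*" if c.isalnum() else c for c in value[:-keep_last])
        return hidden + visible
    if strategy == "category":
        return f"[{pii_type.upper()}]"
    if strategy == "hash":
        # One-way: equal inputs hash equally, enabling dedup without reveal.
        return hashlib.sha256(value.encode()).hexdigest()
    if strategy == "null":
        return ""
    raise ValueError(f"unknown strategy: {strategy}")
```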
When to Use Tokenization vs. Masking
| Consideration | Tokenization | Masking |
|---|---|---|
| Reversibility | Yes — authorized users can decrypt via grant | No — original value is destroyed |
| Incident investigation | Preferred — analysts can reveal values when needed | Not suitable — values are permanently lost |
| Compliance (right to erasure) | Delete the encryption key to render all tokens irrecoverable | Already irreversible by design |
| AI/LLM workflows | Tokens in LLM output can be rehydrated for authorized users | Masked values cannot be recovered |
| Cross-system joins | Deterministic tokens enable joins without revealing values | Masked values cannot be correlated |
| Downstream processing | Format-preserving tokens maintain data structure | Masked values may break downstream schemas |
A common pattern is to tokenize during initial detection and mask only when a governance policy explicitly requires permanent redaction — for example, before data is exported to a third-party system.
SDK Integration
Tokenization is available via the Python and Node.js SDKs:
```python
# Python
from slim_tokens import SlimTokens

client = SlimTokens(api_key="slim_...")
result = client.tokenize(
    items=[{"value": "123-45-6789", "pii_type": "us_ssn"}],
    policy_id="pol_hipaa",
)
print(result.results[0].token)  # slim_AQAB...
```

```javascript
// Node.js
import { SlimTokens } from '@slim-io/tokens';

const client = new SlimTokens({ apiKey: 'slim_...' });
const result = await client.tokenize({
  items: [{ value: '123-45-6789', piiType: 'us_ssn' }],
  policyId: 'pol_hipaa',
});
console.log(result.results[0].token); // slim_AQAB...
```

LLM Context Rehydration
When your LLM receives tokenized data (tokens in the context or prompt), Slim.io can automatically detect and replace them with the original values — subject to your detokenization policy. This enables AI workflows to operate on real data while maintaining access controls.
Rehydration is bounded: a maximum of 100 tokens per request, with output size limits to prevent amplification attacks. Cross-tenant tokens are automatically blocked.
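The detect-and-replace step with its per-request cap can be sketched as a regex pass; the token pattern and `lookup` callable stand in for the real detokenization call and its policy checks:

```python
import re

TOKEN_RE = re.compile(r"slim_[A-Za-z0-9_-]+")
MAX_TOKENS = 100  # per-request bound from the docs

def rehydrate(text: str, lookup) -> str:
    """Replace slim_ tokens in LLM output with original values.

    `lookup(token)` stands in for a policy-checked detokenize call and
    returns None when access is denied (e.g. a cross-tenant token).
    Tokens beyond the per-request cap are left untouched."""
    seen = 0
    def repl(m):
        nonlocal seen
        if seen >= MAX_TOKENS:
            return m.group(0)
        original = lookup(m.group(0))
        if original is None:  # denied or unknown: leave the token alone
            return m.group(0)
        seen += 1
        return original
    return TOKEN_RE.sub(repl, text)
```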
Configure rehydration settings at Tokenization → Integrations in the dashboard.