Key Concepts

This page defines the core terminology and data model used throughout the Slim.io platform. Understanding these concepts will help you navigate the documentation and configure the platform effectively.

Connectors

A connector represents a configured link between Slim.io and a cloud storage provider. Each connector encapsulates:

Provider type — AWS S3, Google Cloud Storage, or Azure Blob Storage
Credentials — IAM Role ARN, Workload Identity Federation config, or Service Principal details
Scope — Which buckets or containers the connector has access to scan
Status — Whether the connector is active, disconnected, or in an error state

Connectors are the foundation of all scanning operations. You must have at least one active connector before running scans.
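To make the connector data model concrete, the following Python sketch models the four connector fields listed above and the rule that scans need at least one active connector. It is a minimal illustration only: the class names, enum values, and the `can_run_scans` helper are hypothetical and do not come from a published Slim.io SDK, and the role ARN is a placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum


class Provider(Enum):
    # Hypothetical identifiers for the three supported providers.
    AWS_S3 = "aws_s3"
    GCS = "gcs"
    AZURE_BLOB = "azure_blob"


class ConnectorStatus(Enum):
    ACTIVE = "active"
    DISCONNECTED = "disconnected"
    ERROR = "error"


@dataclass
class Connector:
    name: str
    provider: Provider
    # Provider-specific credential reference, e.g. an IAM Role ARN for S3.
    credentials: dict[str, str]
    # Buckets or containers this connector is allowed to scan.
    scope: list[str] = field(default_factory=list)
    status: ConnectorStatus = ConnectorStatus.DISCONNECTED


def can_run_scans(connectors: list[Connector]) -> bool:
    """Scans require at least one connector in the ACTIVE state."""
    return any(c.status is ConnectorStatus.ACTIVE for c in connectors)


# Hypothetical example connector; the ARN and bucket name are placeholders.
s3 = Connector(
    name="prod-data-lake",
    provider=Provider.AWS_S3,
    credentials={"role_arn": "arn:aws:iam::123456789012:role/ExampleScanRole"},
    scope=["example-bucket"],
    status=ConnectorStatus.ACTIVE,
)
print(can_run_scans([s3]))  # True
```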

Scans

A scan is a job that processes files in a connected cloud storage location to detect sensitive data. Scans have several properties:

Scan Type — Full (processes all files), Incremental (processes only files that are new or modified since the last scan), or Event-Driven (triggered by storage events)
Parallelism — Scans run as distributed jobs across multiple workers to increase throughput
Lifecycle — Created → Queued → Running → Completed (or Failed/Cancelled)
Tier Limits — Maximum file count and scan frequency are governed by your subscription tier
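The scan lifecycle and the Incremental scan type can be sketched as a small state machine plus a file-selection rule. This is a hedged illustration, not Slim.io's scheduler: the allowed transitions assume that Cancelled can be reached from any non-terminal state and Failed only from Running, and the `advance` and `files_for_incremental_scan` helpers are hypothetical names.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ScanState(Enum):
    CREATED = "created"
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transitions: Created -> Queued -> Running -> Completed, with
# Failed (from Running) and Cancelled (from any non-terminal state).
TRANSITIONS: dict[ScanState, set[ScanState]] = {
    ScanState.CREATED: {ScanState.QUEUED, ScanState.CANCELLED},
    ScanState.QUEUED: {ScanState.RUNNING, ScanState.CANCELLED},
    ScanState.RUNNING: {ScanState.COMPLETED, ScanState.FAILED, ScanState.CANCELLED},
    ScanState.COMPLETED: set(),
    ScanState.FAILED: set(),
    ScanState.CANCELLED: set(),
}


def advance(current: ScanState, target: ScanState) -> ScanState:
    """Move a scan to a new state, rejecting transitions not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def files_for_incremental_scan(
    files: dict[str, datetime], last_scan: Optional[datetime]
) -> list[str]:
    """Return files modified after the last scan, or all files if none ran yet."""
    if last_scan is None:
        return sorted(files)
    return sorted(path for path, mtime in files.items() if mtime > last_scan)


state = ScanState.CREATED
for nxt in (ScanState.QUEUED, ScanState.RUNNING, ScanState.COMPLETED):
    state = advance(state, nxt)
print(state.name)  # COMPLETED

last = datetime(2024, 1, 1, tzinfo=timezone.utc)
files = {
    "a.csv": datetime(2023, 12, 31, tzinfo=timezone.utc),
    "b.csv": datetime(2024, 1, 2, tzinfo=timezone.utc),
}
print(files_for_incremental_scan(files, last))  # ['b.csv']
```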

Findings

A finding is a single instance of detected sensitive data within a file or database column. Each finding includes:

PII Category — The type of sensitive data detected (SSN, email, credit card number, etc.)
Confidence Score — A value from 0.0 to 1.0 indicating detection certainty
Location — The byte offset, line number, or field path within a file, or a database.schema.table.column reference for database findings
Classifier — Which classifier produced the finding
Detection Method — How the finding was produced. Slim.io tags every finding with one of the labels below for legal transparency and auditing:
- pattern — Match produced by applying a structural rule to the value
- validated — Pattern match that additionally passed an algorithmic validation (for example, the Luhn check on a credit-card number, or a checksum on a national ID format)
- contextual — Confidence boosted because the surrounding context (column name, nearby keywords, neighbouring fields) supports the classification
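A finding can be pictured as an immutable record carrying the fields listed above. The sketch below is an assumption-labeled illustration rather than Slim.io's actual schema: the `Finding` and `DetectionMethod` names, the string-typed location, and the example values are hypothetical. It does show the two documented constraints: confidence lies between 0.0 and 1.0, and the detection method is one of the three labels.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionMethod(Enum):
    PATTERN = "pattern"
    VALIDATED = "validated"
    CONTEXTUAL = "contextual"


@dataclass(frozen=True)
class Finding:
    pii_category: str
    confidence: float
    # Byte offset, line number, field path, or database.schema.table.column.
    location: str
    classifier: str
    detection_method: DetectionMethod

    def __post_init__(self) -> None:
        # Confidence scores are defined on the closed interval [0.0, 1.0].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


# Hypothetical finding for a credit-card number that passed a Luhn check.
f = Finding(
    pii_category="credit_card_number",
    confidence=0.97,
    location="analytics.public.payments.card_number",
    classifier="credit-card-luhn",
    detection_method=DetectionMethod.VALIDATED,
)
print(f.detection_method.value)  # validated
```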

PII Categories

Slim.io recognizes a broad set of PII categories out of the box:

Category	Examples
Personal Identifiers	SSN, passport number, driver’s license, national ID
Contact Information	Email address, phone number, physical address
Financial Data	Credit card number, bank account, routing number
Health Information	Medical record number, diagnosis codes, insurance ID
Authentication	API keys, passwords, tokens, private keys
Custom	Any pattern defined through custom classifiers

Risk Scores

Every finding, file, and connector receives a computed risk score from 0 to 100:

0–25 — Low risk: minimal sensitive data exposure
26–50 — Medium risk: moderate PII presence, review recommended
51–75 — High risk: significant sensitive data, action required
76–100 — Critical risk: large-scale PII exposure, immediate remediation needed

Risk scores factor in data volume, sensitivity category, exposure level (public vs. private buckets), and whether governance policies cover the findings.
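The score bands translate directly into a lookup. This sketch covers only the banding, assuming integer scores as in the documented ranges; how Slim.io weights volume, sensitivity, exposure, and policy coverage to produce the score is not specified here, so no scoring formula is shown. The `risk_band` name is hypothetical.

```python
def risk_band(score: int) -> str:
    """Map a 0-100 risk score to its documented band."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score must be in 0..100, got {score}")
    if score <= 25:
        return "Low"
    if score <= 50:
        return "Medium"
    if score <= 75:
        return "High"
    return "Critical"


# Check the edges of every band.
for s in (0, 25, 26, 50, 51, 75, 76, 100):
    print(s, risk_band(s))
```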

Tokenization

Tokenization replaces sensitive data values with encrypted tokens using authenticated AES-256 encryption. Key properties:

Reversible — Authorized users can decrypt tokens back to the original value
Format-Preserving — Tokens maintain a consistent format for downstream system compatibility
Key Management — Encryption keys are managed per-tenant and rotated on a configurable schedule

Tokenization is an optional remediation action. You can configure governance policies to automatically tokenize specific PII categories upon detection.
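To show what "reversible, authenticated AES-256" means in practice, the sketch below uses AES-256-GCM from the third-party `cryptography` package (pyca/cryptography). It is an illustration under stated assumptions, not Slim.io's implementation: the `tokenize`/`detokenize` helpers, the token layout (nonce followed by ciphertext, base64url-encoded), and the use of the tenant ID as associated data are all hypothetical choices. It covers only reversibility and tamper detection; it does not attempt format preservation or key rotation.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def tokenize(key: bytes, tenant_id: str, value: str) -> str:
    """Encrypt a value with AES-256-GCM and return a URL-safe token."""
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat for the same key
    # The tenant ID is bound as associated data, so decryption fails unless
    # the same tenant ID is supplied, even when the key is the same.
    ciphertext = AESGCM(key).encrypt(nonce, value.encode("utf-8"), tenant_id.encode("utf-8"))
    return base64.urlsafe_b64encode(nonce + ciphertext).decode("ascii")


def detokenize(key: bytes, tenant_id: str, token: str) -> str:
    """Reverse tokenize(); raises InvalidTag if the token or tenant is wrong."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    nonce, ciphertext = raw[:12], raw[12:]
    plaintext = AESGCM(key).decrypt(nonce, ciphertext, tenant_id.encode("utf-8"))
    return plaintext.decode("utf-8")


# In a real deployment the key would come from per-tenant key management.
key = AESGCM.generate_key(bit_length=256)
token = tokenize(key, "tenant-a", "123-45-6789")
print(detokenize(key, "tenant-a", token))  # 123-45-6789
```

Because a fresh random nonce is used per call, tokenizing the same value twice yields different tokens; a deterministic scheme would be needed if equal values must map to equal tokens.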

Classifiers

A classifier is a detection rule that identifies specific types of sensitive data. Slim.io supports multiple classifier types:

Regex — Pattern matching against regular expressions (e.g., SSN format \d{3}-\d{2}-\d{4})
ML Model — Machine learning models trained on labeled datasets
Dictionary — Lookup against known value lists (e.g., medical terms, country names)
Proximity — Contextual detection based on nearby keywords (e.g., “SSN:” near a number pattern)
Checksum — Validation algorithms (e.g., Luhn check for credit card numbers)
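Three of these classifier types (regex, checksum, and proximity) can be sketched in a few lines of standard-library Python. This is not Slim.io's built-in classifier code: the pattern constants, the 20-character keyword window, and the helper names are hypothetical, and ML Model and Dictionary classifiers are not shown. The SSN pattern is the one quoted above; the Luhn function is the standard mod-10 check.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SSN_KEYWORDS = ("ssn", "social security")


def luhn_valid(number: str) -> bool:
    """Checksum classifier: the Luhn mod-10 check used for card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost (check) digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


def has_nearby_keyword(text: str, start: int, keywords: tuple[str, ...], window: int = 20) -> bool:
    """Proximity classifier: look for a keyword shortly before the match."""
    prefix = text[max(0, start - window):start].lower()
    return any(k in prefix for k in keywords)


text = "Customer SSN: 123-45-6789, card 4111 1111 1111 1111"
for m in SSN_PATTERN.finditer(text):
    print(m.group(), has_nearby_keyword(text, m.start(), SSN_KEYWORDS))
for m in CARD_PATTERN.finditer(text):
    print(m.group(), luhn_valid(m.group()))
```

A regex hit alone corresponds to a pattern match; passing the Luhn check or finding a nearby keyword is the kind of extra evidence the validated and contextual detection methods describe.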

Classifiers can be built-in (shipped with Slim.io) or custom-defined via YAML configuration.

Policies

A policy is a governance rule that defines what actions to take when specific conditions are met. Policies are defined in YAML and include:

Scope — Which connectors, buckets, or file types the policy applies to
Conditions — What findings trigger the policy (PII category, confidence threshold, risk score)
Actions — What happens when conditions are met (alert, tokenize, mask, quarantine, notify)
Mode — Dry-run (log only) or enforced (take action)
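Policies are written in YAML, but their evaluation logic can be illustrated with plain Python objects. The sketch below is hypothetical throughout: the `Policy` and `Finding` fields, the `evaluate` helper, and the example names are assumptions chosen to mirror the scope, conditions, actions, and mode described above, not the actual Slim.io policy schema.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    connector: str
    pii_category: str
    confidence: float
    risk_score: int


@dataclass
class Policy:
    name: str
    connectors: set[str]   # scope: connectors the policy applies to
    categories: set[str]   # condition: PII categories
    min_confidence: float  # condition: confidence threshold
    min_risk_score: int    # condition: risk score threshold
    actions: list[str]     # e.g. ["alert", "tokenize"]
    enforced: bool = False # False means dry-run (log only)


def evaluate(policy: Policy, finding: Finding) -> list[str]:
    """Return the actions to execute for a finding under this policy."""
    in_scope = finding.connector in policy.connectors
    matches = (
        finding.pii_category in policy.categories
        and finding.confidence >= policy.min_confidence
        and finding.risk_score >= policy.min_risk_score
    )
    if not (in_scope and matches):
        return []
    if not policy.enforced:
        # Dry-run mode: record what would have happened, take no action.
        print(f"[dry-run] {policy.name}: would run {policy.actions}")
        return []
    return list(policy.actions)


policy = Policy(
    name="tokenize-ssn-prod",
    connectors={"prod-data-lake"},
    categories={"ssn"},
    min_confidence=0.9,
    min_risk_score=51,
    actions=["tokenize", "alert"],
    enforced=True,
)
hit = Finding("prod-data-lake", "ssn", 0.95, 80)
print(evaluate(policy, hit))  # ['tokenize', 'alert']
```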

Workspaces

A workspace provides multi-tenant isolation within a single Slim.io organization. Each workspace:

Has its own set of connectors, scans, and findings
Supports role-based access control (RBAC) with Admin, Editor, and Viewer roles
Can be mapped to business units, teams, or compliance boundaries
Maintains isolated scan quotas and governance policies
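Workspace isolation and RBAC can be pictured as role assignments keyed by workspace. In this sketch the permission set granted to each role is purely hypothetical, since this page names the Admin, Editor, and Viewer roles but not their exact grants; the workspace names, user, and `is_allowed` helper are likewise illustrative.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"


# Hypothetical permission sets; actual role grants are not specified here.
PERMISSIONS: dict[Role, set[str]] = {
    Role.VIEWER: {"view_findings"},
    Role.EDITOR: {"view_findings", "run_scans", "edit_policies"},
    Role.ADMIN: {"view_findings", "run_scans", "edit_policies",
                 "manage_connectors", "manage_members"},
}

# Roles are assigned per (workspace, user), so access in one workspace
# never carries over to another.
MEMBERSHIPS: dict[tuple[str, str], Role] = {
    ("finance", "alice"): Role.ADMIN,
    ("marketing", "alice"): Role.VIEWER,
}


def is_allowed(workspace: str, user: str, permission: str) -> bool:
    """Check a permission for a user within a single workspace."""
    role = MEMBERSHIPS.get((workspace, user))
    return role is not None and permission in PERMISSIONS[role]


print(is_allowed("finance", "alice", "run_scans"))    # True
print(is_allowed("marketing", "alice", "run_scans"))  # False
print(is_allowed("hr", "alice", "view_findings"))     # False
```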