Key Concepts

This page defines the core terminology and data model used throughout the Slim.io platform. Understanding these concepts will help you navigate the documentation and configure the platform effectively.

Connectors

A connector represents a configured link between Slim.io and a cloud storage provider. Each connector encapsulates:

  • Provider type — AWS S3, Google Cloud Storage, or Azure Blob Storage
  • Credentials — IAM Role ARN, Workload Identity Federation config, or Service Principal details
  • Scope — Which buckets or containers the connector has access to scan
  • Status — Whether the connector is active, disconnected, or in an error state

Connectors are the foundation of all scanning operations. You must have at least one active connector before running scans.
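
As a rough illustration of the data model described above, the following Python sketch models a connector and the "at least one active connector" precondition. All class, field, and function names here are hypothetical and are not taken from the Slim.io API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Provider(Enum):
    AWS_S3 = "aws_s3"
    GCS = "gcs"
    AZURE_BLOB = "azure_blob"


class ConnectorStatus(Enum):
    ACTIVE = "active"
    DISCONNECTED = "disconnected"
    ERROR = "error"


@dataclass
class Connector:
    # Hypothetical shape: one field per property listed in the text.
    name: str
    provider: Provider
    credentials_ref: str  # e.g. an IAM Role ARN or Service Principal reference
    scope: list[str] = field(default_factory=list)  # buckets or containers
    status: ConnectorStatus = ConnectorStatus.DISCONNECTED


def can_run_scans(connectors: list[Connector]) -> bool:
    """Return True if at least one connector is active."""
    return any(c.status is ConnectorStatus.ACTIVE for c in connectors)
```

A caller could run `can_run_scans(connectors)` as a guard before submitting a scan job.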

Scans

A scan is a job that processes files in a connected cloud storage location to detect sensitive data. Scans have several properties:

  • Scan Type — Full (processes all files), Incremental (processes only new or modified files since last scan), or Event-Driven (triggered by storage events)
  • Parallelism — Scans run as distributed jobs across multiple workers to increase throughput
  • Lifecycle — Created → Queued → Running → Completed (or Failed/Cancelled)
  • Tier Limits — Maximum file count and scan frequency are governed by your subscription tier
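
The lifecycle above can be read as a small state machine. The sketch below is one possible encoding in Python. The source only states the happy path and the Failed/Cancelled end states, so the exact set of allowed transitions (for example, cancelling a queued scan) is an assumption, as are all names.

```python
from enum import Enum


class ScanState(Enum):
    CREATED = "created"
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition table: the happy path comes from the text,
# while the points at which a scan may fail or be cancelled are guesses.
ALLOWED = {
    ScanState.CREATED: {ScanState.QUEUED, ScanState.CANCELLED},
    ScanState.QUEUED: {ScanState.RUNNING, ScanState.CANCELLED},
    ScanState.RUNNING: {ScanState.COMPLETED, ScanState.FAILED, ScanState.CANCELLED},
    ScanState.COMPLETED: set(),
    ScanState.FAILED: set(),
    ScanState.CANCELLED: set(),
}


def advance(current: ScanState, target: ScanState) -> ScanState:
    """Return the new state, or raise ValueError for an illegal transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Terminal states map to an empty set, so any attempt to leave Completed, Failed, or Cancelled raises an error.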

Findings

A finding is a single instance of detected sensitive data within a file or database column. Each finding includes:

  • PII Category — The type of sensitive data detected (SSN, email, credit card number, etc.)
  • Confidence Score — A value from 0.0 to 1.0 indicating detection certainty
  • Location — The byte offset, line number, or field path within a file, or a database.schema.table.column reference for a database column
  • Classifier — Which classifier produced the finding
  • Detection Method — How the finding was produced. Slim.io tags every finding with one of the labels below for legal transparency and auditing:
    • pattern — Match produced by a structural rule against the value
    • validated — Pattern match that additionally passed an algorithmic validation (for example, the Luhn check on a credit-card number, or a checksum on a national ID format)
    • contextual — Confidence boosted because the surrounding context (column name, nearby keywords, neighbouring fields) supports the classification
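
To make the shape of a finding concrete, here is a minimal Python sketch of a record carrying the fields listed above. The class and field names are illustrative assumptions. Only the three detection-method labels and the 0.0 to 1.0 confidence range come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionMethod(Enum):
    PATTERN = "pattern"
    VALIDATED = "validated"
    CONTEXTUAL = "contextual"


@dataclass(frozen=True)
class Finding:
    # Hypothetical field names, one per property in the list above.
    pii_category: str
    confidence: float
    location: str
    classifier: str
    detection_method: DetectionMethod

    def __post_init__(self) -> None:
        # Enforce the documented confidence range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


example = Finding(
    pii_category="credit_card",
    confidence=0.97,
    location="sales.public.orders.card_number",  # made-up column reference
    classifier="builtin-credit-card",  # made-up classifier name
    detection_method=DetectionMethod.VALIDATED,
)
```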

PII Categories

Slim.io recognizes a broad set of PII categories out of the box:

| Category | Examples |
| --- | --- |
| Personal Identifiers | SSN, passport number, driver’s license, national ID |
| Contact Information | Email address, phone number, physical address |
| Financial Data | Credit card number, bank account, routing number |
| Health Information | Medical record number, diagnosis codes, insurance ID |
| Authentication | API keys, passwords, tokens, private keys |
| Custom | Any pattern defined through custom classifiers |

Risk Scores

Every finding, file, and connector receives a computed risk score from 0 to 100:

  • 0–25 — Low risk: minimal sensitive data exposure
  • 26–50 — Medium risk: moderate PII presence, review recommended
  • 51–75 — High risk: significant sensitive data, action required
  • 76–100 — Critical risk: large-scale PII exposure, immediate remediation needed

Risk scores factor in data volume, sensitivity category, exposure level (public vs. private buckets), and whether governance policies cover the findings.
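
The score bands above translate directly into a lookup. The helper below is a small sketch that assumes integer scores. The function name and the lowercase band labels are illustrative. How the score itself is computed is not specified here, so the sketch does not attempt it.

```python
def risk_band(score: int) -> str:
    """Map a 0-100 risk score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    if score <= 25:
        return "low"
    if score <= 50:
        return "medium"
    if score <= 75:
        return "high"
    return "critical"
```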

Tokenization

Tokenization replaces sensitive data values with encrypted tokens using authenticated AES-256 encryption. Key properties:

  • Reversible — Authorized users can decrypt tokens back to the original value
  • Format-Preserving — Tokens maintain a consistent format for downstream system compatibility
  • Key Management — Encryption keys are managed per-tenant and rotated on a configurable schedule

Tokenization is an optional remediation action. You can configure governance policies to automatically tokenize specific PII categories upon detection.

Classifiers

A classifier is a detection rule that identifies specific types of sensitive data. Slim.io supports multiple classifier types:

  • Regex — Pattern matching against regular expressions (e.g., SSN format \d{3}-\d{2}-\d{4})
  • ML Model — Machine learning models trained on labeled datasets
  • Dictionary — Lookup against known value lists (e.g., medical terms, country names)
  • Proximity — Contextual detection based on nearby keywords (e.g., “SSN:” near a number pattern)
  • Checksum — Validation algorithms (e.g., Luhn check for credit card numbers)

Classifiers can be built-in (shipped with Slim.io) or custom-defined via YAML configuration.
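
The sketch below shows how three of these classifier types (regex, checksum, and proximity) might combine, and how their outcomes could map onto the pattern, validated, and contextual detection-method labels. It is an illustration under assumptions, not Slim.io's implementation: the card-number regex, the keyword list, the 20-character context window, and all function names are invented for the example.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumed card shape: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Checksum classifier: Luhn check over the digits in `number`."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def has_nearby_keyword(text: str, start: int, keywords, window: int = 20) -> bool:
    """Proximity classifier: look for a keyword just before the match."""
    context = text[max(0, start - window):start].lower()
    return any(k in context for k in keywords)


def detect(text: str) -> list[tuple[str, str, str]]:
    """Return (category, value, detection_method) tuples."""
    results = []
    for m in SSN_PATTERN.finditer(text):
        nearby = has_nearby_keyword(text, m.start(), ("ssn", "social security"))
        results.append(("ssn", m.group(), "contextual" if nearby else "pattern"))
    for m in CARD_PATTERN.finditer(text):
        method = "validated" if luhn_valid(m.group()) else "pattern"
        results.append(("credit_card", m.group(), method))
    return results
```

For example, `detect("SSN: 123-45-6789")` tags the match as contextual because the keyword appears just before it, while the same number without a nearby keyword is tagged as a plain pattern match.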

Policies

A policy is a governance rule that defines what actions to take when specific conditions are met. Policies are defined in YAML and include:

  • Scope — Which connectors, buckets, or file types the policy applies to
  • Conditions — What findings trigger the policy (PII category, confidence threshold, risk score)
  • Actions — What happens when conditions are met (alert, tokenize, mask, quarantine, notify)
  • Mode — Dry-run (log only) or enforced (take action)
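
The following Python sketch illustrates how conditions, actions, and mode could interact when a finding is evaluated. It does not reproduce the Slim.io YAML schema. The field names are hypothetical, and scope and risk-score conditions are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical in-memory form of a policy.
    name: str
    pii_categories: set[str]
    min_confidence: float
    actions: list[str]  # e.g. "alert", "tokenize", "mask", "quarantine", "notify"
    enforced: bool = False  # False means dry-run (log only)


@dataclass
class Decision:
    triggered: bool
    actions: list[str]
    executed: bool  # True only when the policy is enforced


def evaluate(policy: Policy, pii_category: str, confidence: float) -> Decision:
    """Decide which actions a finding triggers under a policy."""
    triggered = (
        pii_category in policy.pii_categories
        and confidence >= policy.min_confidence
    )
    if not triggered:
        return Decision(False, [], False)
    # In dry-run mode the actions are reported but not carried out.
    return Decision(True, list(policy.actions), policy.enforced)
```

Returning the would-be actions even in dry-run mode lets a caller log what an enforced policy would have done before switching it on.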

Workspaces

A workspace provides multi-tenant isolation within a single Slim.io organization. Each workspace:

  • Has its own set of connectors, scans, and findings
  • Supports role-based access control (RBAC) with Admin, Editor, and Viewer roles
  • Can be mapped to business units, teams, or compliance boundaries
  • Maintains isolated scan quotas and governance policies