Scanner Fleet

The Scanner Fleet is the distributed infrastructure that executes scans across your cloud environments. Slim.io supports two scanning modes: agentless cloud scanning for object storage providers, and connector-based scanning for databases and SaaS applications.

Deployment Topologies

You can configure where scanners run and where bytes are read on a per-connector basis. The platform supports four topologies, gated by your contract:

Topology	Scanner runs in	Bytes read by	When to use
Slim-Hosted	Slim.io infrastructure	Slim.io workers (cross-account access)	Default for most customers: fastest onboarding, no infrastructure to deploy.
In-Customer-Cloud Agentless	Your cloud account, Slim.io managed	A scanner deployed inside your AWS / GCP / Azure account; bytes never leave your tenancy	Regulated industries that prohibit external data transfer. Slim.io provides a Terraform module; the scanner registers automatically once you run `terraform apply`.
BYOC (Bring Your Own Cloud)	Your cloud account, you manage	Your scanner deployment	Air-gapped environments, fully customer-operated infrastructure, custom orchestration requirements. See BYOC.
Hybrid	Mix of the above per connector	Per-connector	Larger fleets where some sources can use Slim-Hosted (fast) and others need in-customer-cloud (compliance).

When you create a connector, the wizard adapts to the topologies allowed for your account. If only one is allowed, the wizard skips the picker; if several are allowed, you choose one per connector. The chosen topology is immutable for that connector, which keeps the audit trail of what was scanned where unambiguous. To change topology, create a new connector and migrate to it.

The connector’s hosting topology is shown alongside its other metadata in your Customer Dashboard, so you can see at a glance which connectors are running where.

Agentless Cloud Scanning

Agentless scanning connects directly to cloud storage APIs without deploying any software in your environment. Slim.io authenticates using cross-account roles (AWS), Workload Identity Federation (GCP), or service principals (Azure) and reads objects remotely.

Supported Providers

Provider	Service	Auth Method	Scanning Mode
AWS	S3	IAM Role (cross-account assume role)	Agentless
GCP	Cloud Storage	Workload Identity Federation	Agentless
Azure	Blob Storage	Service Principal (OAuth2)	Agentless

How It Works

Authentication — Slim.io assumes the configured role/identity with read-only permissions
Enumeration — Lists all objects in the scoped buckets/containers, applying prefix and type filters
Streaming Download — Objects are streamed in configurable chunks (default 4 MB) to the scan workers
Detection — Each chunk passes through the PII detection pipeline
Chunk Overlap — 512 bytes of overlap between chunks catches PII values that span chunk boundaries
Finding Storage — Detected findings are stored with full metadata (location, classifier, confidence)
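
The streaming and chunk-overlap steps above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the function name `iter_chunks` and its parameters are hypothetical, and only the 4 MB chunk size and 512-byte overlap come from the text.

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # default chunk size from the text (4 MB)
OVERLAP = 512                 # bytes repeated from the previous chunk


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE,
                overlap: int = OVERLAP) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs (hypothetical helper).

    Every chunk after the first begins with the last `overlap` bytes of
    the previous chunk, so a short value split across a boundary still
    appears whole in one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    offset = 0   # absolute position of the first byte of the next chunk
    tail = b""   # bytes carried over from the previous chunk
    while True:
        fresh = stream.read(chunk_size - len(tail))
        if not fresh:
            break  # end of stream; the tail was already part of the last chunk
        data = tail + fresh
        yield offset, data
        tail = data[-overlap:] if overlap else b""
        offset += len(data) - len(tail)


# Small sizes make the overlap visible: each chunk repeats 2 bytes.
for start, chunk in iter_chunks(io.BytesIO(b"0123456789abcdefghij"),
                                chunk_size=8, overlap=2):
    print(start, chunk)
```

Under these assumptions, a value is only guaranteed to survive a boundary if it fits within the overlap window, so the overlap size bounds the longest value that can be recovered this way.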

In the Slim-Hosted topology, agentless scanning requires no infrastructure in your cloud account: Slim.io’s workers download objects directly from your storage using temporary credentials. For environments that prohibit external data transfer, use the In-Customer-Cloud Agentless or BYOC topology instead.

Incremental Enumeration

After an initial full enumeration, subsequent scans use incremental enumeration to process only changed objects. The `since` parameter filters resources by modification time, using each provider's native timestamp:

AWS S3 — Uses ListObjectsV2 with modification time filtering
GCP GCS — Uses object metadata updated timestamp
Azure Blob — Uses Last-Modified header filtering

On large buckets, this can reduce enumeration time from minutes to seconds.
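
A minimal sketch of the `since` filter, assuming a listing has already been fetched from the provider. The `ObjectInfo` record and `changed_since` function are hypothetical names used only for illustration. For S3 specifically, ListObjectsV2 returns a LastModified value per object but has no modified-since request parameter, so a filter like this one would run over the listing results.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical listing record; real listings carry more fields."""
    key: str
    last_modified: datetime


def changed_since(objects: Iterable[ObjectInfo],
                  since: Optional[datetime]) -> Iterator[ObjectInfo]:
    """Yield objects modified strictly after `since`.

    `since=None` stands for the initial full enumeration.
    """
    for obj in objects:
        if since is None or obj.last_modified > since:
            yield obj


listing = [
    ObjectInfo("a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ObjectInfo("b.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),
]
last_scan = datetime(2024, 2, 1, tzinfo=timezone.utc)
print([o.key for o in changed_since(listing, last_scan)])
```

Using timezone-aware timestamps throughout avoids comparing naive and aware `datetime` values, which Python rejects with a `TypeError`.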

Connector-Based Scanning

Connector-based scanning covers databases and SaaS applications that require protocol-specific access. Each connector implements a standardized interface for authentication, resource enumeration, and data streaming.

Supported Connectors

Databases

Provider	Protocol	Auth Method
PostgreSQL	Native wire protocol	Username/password, SSL certificates
MySQL	Native wire protocol	Username/password, SSL
Microsoft SQL Server	TDS protocol	SQL auth, Windows auth
Oracle	Oracle Net	Username/password, wallet
IBM DB2	DRDA protocol	Username/password
Snowflake	REST API + Arrow	Username/password, key pair
Databricks	REST API + SQL	Personal access token, OAuth

SaaS Applications

Provider	API	Auth Method
Salesforce	REST API	OAuth 2.0 (connected app)
Slack	Web API	Bot token (OAuth 2.0)
Microsoft Teams	Graph API	App registration (OAuth 2.0)
Google Drive	Drive API	Service account (OAuth 2.0)
OneDrive	Graph API	App registration (OAuth 2.0)
SharePoint	Graph API	App registration (OAuth 2.0)

Database Scanning Architecture

Database connectors use server-side cursors to stream results without loading entire tables into memory. A scan proceeds through these steps:

Connect — Establish connection with provided credentials, enforce read-only mode where supported
Enumerate — List all schemas, tables, and columns with metadata
Sample — For each table, sample a configurable number of rows (default: 10,000) using randomized sampling
Classify — Run PII detection on sampled values, classify at the column level
Disconnect — Clean up connection and cursor resources

Database connectors always request the minimum necessary permissions. Slim.io recommends creating a dedicated read-only database user for scanning. Never provide credentials with write or administrative access.
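
The Sample and Classify steps can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` as a stand-in for a real database connector and a single email regex as a stand-in for the PII detection pipeline; both are assumptions for the sake of a runnable example, as is the function name `classify_columns`. Only the default sample size of 10,000 rows comes from the text.

```python
import re
import sqlite3
from typing import Dict

# Stand-in detector (assumption): a real pipeline would use many classifiers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def classify_columns(conn: sqlite3.Connection, table: str,
                     sample_size: int = 10_000) -> Dict[str, float]:
    """Return, per column, the fraction of sampled non-null values
    that match the email pattern (hypothetical helper)."""
    # Identifiers cannot be bound as parameters, so quote the name explicitly.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(
        f"SELECT * FROM {quoted} ORDER BY RANDOM() LIMIT ?", (sample_size,)
    )
    columns = [d[0] for d in cursor.description]
    hits = {c: 0 for c in columns}
    seen = {c: 0 for c in columns}
    for row in cursor:  # iterate the cursor rather than fetchall() to bound memory
        for col, value in zip(columns, row):
            if value is None:
                continue
            seen[col] += 1
            if EMAIL_RE.search(str(value)):
                hits[col] += 1
    return {c: (hits[c] / seen[c] if seen[c] else 0.0) for c in columns}
```

`ORDER BY RANDOM()` is the simplest way to randomize a sample in SQLite, but it sorts the whole table, so a connector for a large production database would presumably rely on an engine-specific sampling feature instead.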

SaaS Scanning Architecture

SaaS connectors use provider APIs to enumerate and stream content:

Authenticate — Exchange OAuth credentials for access tokens
Enumerate — List channels, drives, sites, or objects using provider-specific APIs
Stream — Download message content, file attachments, or document bodies in chunks
Classify — Run PII detection on streamed content
Revoke — Clean teardown of API sessions

SaaS connectors support incremental scanning using provider-native delta mechanisms (Graph API delta queries, Slack conversation history timestamps).

Scanner Fleet Management

Fleet Overview

The Scanner Fleet page in the Customer Dashboard shows:

Active Workers — Currently running scan workers across all connectors
Connector Status — Health and authentication status of each connector
Scan Throughput — Files processed per minute across the fleet
Resource Utilization — Memory and CPU usage of scan workers

Connector Lifecycle

Each connector follows a standard lifecycle:


Create → Test → Active → Scan → Disconnect → Delete

State	Description
Created	Connector configured with credentials and scope
Testing	Credential validation in progress
Active	Credentials validated, ready for scanning
Scanning	Scan currently in progress
Error	Credential or permission issue detected
Disconnected	Temporarily disabled, configuration preserved
Deleted	Permanently removed with all associated data

Health Monitoring

Connector health is checked automatically:

Credential Validity — Periodic test of authentication (tokens, roles, secrets)
API Availability — Provider API endpoint reachability
Permission Scope — Verification that required permissions are still granted
Rate Limit Status — Current position relative to provider rate limits

Unhealthy connectors display a warning badge and are excluded from scheduled scans until the issue is resolved.

Token Generation and Rotation

Scanner Tokens

For BYOC deployments, scanner agents authenticate to the Slim.io control plane using scanner tokens. These tokens authorize the agent to:

Report scan progress and findings
Receive scan job assignments
Download classifier configurations

Generating a Token

Navigate to Settings > Scanner Fleet in the Customer Dashboard.
Click Generate Token.
Copy the token immediately — it is only shown once.
Configure the scanner agent with the token via environment variable:


SLIM_SCANNER_TOKEN=slt_xxxxxxxxxxxxxxxxxxxx
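
For illustration, a process that consumes this variable might validate it at startup as sketched below. The function name `load_scanner_token` is hypothetical, and the `slt_` prefix check is an assumption inferred from the example value above, not a documented token format.

```python
import os
from typing import Mapping

TOKEN_ENV_VAR = "SLIM_SCANNER_TOKEN"
TOKEN_PREFIX = "slt_"  # assumption: inferred from the example value


def load_scanner_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the scanner token from the environment and fail fast if it is
    missing or does not look like a scanner token (hypothetical helper)."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    if not token.startswith(TOKEN_PREFIX):
        raise RuntimeError(f"{TOKEN_ENV_VAR} does not look like a scanner token")
    return token
```

Failing at startup, without echoing the token value in the error message, keeps a misconfigured agent from retrying silently and keeps the secret out of logs.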

Token Rotation

Rotate tokens regularly (recommended: every 90 days):

Generate a new token (the old token remains valid).
Update the scanner agent configuration with the new token.
Verify the agent connects successfully with the new token.
Revoke the old token under Settings > Scanner Fleet > Active Tokens.

Revoking a token immediately disconnects any scanner agent using it. Always deploy the new token before revoking the old one to avoid downtime.

Token Permissions

Scanner tokens are scoped to specific operations:

Permission	Description
`scan:execute`	Run scan jobs assigned by the control plane
`scan:report`	Submit findings and progress updates
`config:read`	Download classifier configurations and scan parameters
`heartbeat`	Send periodic health status updates

Tokens do not have access to customer data, dashboard operations, or administrative functions.

Scanner Profiles

Scanner images are available in four profiles, each optimized for a specific category of data sources. Choose the profile that matches your connectors to minimize image size and attack surface.

Profile	Supported Connectors	Use Case
cloud-storage	AWS S3, GCP Cloud Storage, Azure Blob	Cloud object storage scanning
database	PostgreSQL, MySQL, MSSQL, Oracle, Snowflake, Databricks, DB2	Relational and analytical database scanning
saas	Slack, Teams, Salesforce, Google Drive, OneDrive, SharePoint	SaaS application and collaboration tool scanning
full	All of the above (16 connector types)	Universal scanner for mixed environments

When deploying BYOC scanners, specify the profile and version in your deployment configuration:


# Docker (always use a pinned version — never :latest)
docker pull slimio/scanner:1.0.0-database
 
# Kubernetes
image: slimio/scanner:1.0.0-cloud-storage

Scanners automatically register their capabilities with the control plane. Jobs are only assigned to scanners that have the required connectors installed. A database scan will never be sent to a cloud-storage scanner.
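
The capability matching described above can be sketched as a lookup from profile to connector set. The connector identifiers (`aws-s3`, `postgresql`, and so on) and the function name `eligible_scanners` are hypothetical; the grouping of connectors into profiles follows the table above.

```python
from typing import Dict, FrozenSet, List

# Connector identifiers are illustrative; the grouping follows the profile table.
PROFILE_CONNECTORS: Dict[str, FrozenSet[str]] = {
    "cloud-storage": frozenset({"aws-s3", "gcp-gcs", "azure-blob"}),
    "database": frozenset({"postgresql", "mysql", "mssql", "oracle",
                           "snowflake", "databricks", "db2"}),
    "saas": frozenset({"slack", "teams", "salesforce",
                       "google-drive", "onedrive", "sharepoint"}),
}
# The full profile is the union of the three specialized profiles.
PROFILE_CONNECTORS["full"] = frozenset().union(*PROFILE_CONNECTORS.values())


def eligible_scanners(connector: str, fleet: Dict[str, str]) -> List[str]:
    """Return the ids of scanners whose profile includes `connector`.

    `fleet` maps scanner id to profile name.
    """
    return sorted(
        scanner_id
        for scanner_id, profile in fleet.items()
        if connector in PROFILE_CONNECTORS[profile]
    )


fleet = {"s1": "cloud-storage", "s2": "database", "s3": "full"}
print(eligible_scanners("postgresql", fleet))  # the cloud-storage scanner s1 is excluded
```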

Scanner Versioning

Scanner images use explicit version tags following {major}.{minor}.{patch}-{profile} format. We never publish a :latest tag for BYOC images to prevent accidental upgrades.

Version Notifications

The Scanner Fleet page shows an Update Available badge when a scanner is running an older version:

BYOC scanners — You update at your own pace. The badge tells you a newer version is available with the changelog.
Slim.io-managed scanners — Updates are applied automatically via rolling deployment. No action required.
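
As a rough sketch of how a version comparison could work for tags in the {major}.{minor}.{patch}-{profile} format, the helper below parses two tags and reports whether one is a newer release of the same profile. The names `parse_tag` and `update_available` are hypothetical, and the actual logic behind the badge is not documented here.

```python
import re
from typing import NamedTuple

PROFILES = {"cloud-storage", "database", "saas", "full"}
TAG_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-([a-z-]+)")


class ScannerTag(NamedTuple):
    major: int
    minor: int
    patch: int
    profile: str


def parse_tag(tag: str) -> ScannerTag:
    """Parse a pinned tag such as '1.0.0-database'; reject anything else,
    including floating tags like 'latest'."""
    match = TAG_RE.fullmatch(tag)
    if match is None or match.group(4) not in PROFILES:
        raise ValueError(f"not a pinned scanner tag: {tag!r}")
    major, minor, patch, profile = match.groups()
    return ScannerTag(int(major), int(minor), int(patch), profile)


def update_available(running: str, published: str) -> bool:
    """True when `published` is a newer version of the same profile."""
    r, p = parse_tag(running), parse_tag(published)
    # Compare (major, minor, patch) numerically, not as strings.
    return r.profile == p.profile and p[:3] > r[:3]
```

Comparing the numeric triple rather than the raw string matters once a component reaches two digits: as strings, "1.10.0" sorts before "1.9.0".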

Safe Update Guarantee

Scanner updates never interrupt an active scan. The update process:

The scanner finishes all in-progress scan jobs
No new jobs are assigned during the update window
The new version is deployed
Health is verified before resuming job assignment

This means you can update scanners at any time without worrying about data loss or interrupted scans.
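
The update sequence can be pictured as a drain-then-replace routine. The sketch below is a simplified, single-threaded model: the `Scanner` class, `safe_update` function, and the `finish_job` and `is_healthy` callbacks are all hypothetical. It stops job assignment first so that the set of in-progress jobs can only shrink while draining, which is an assumption about ordering rather than a documented detail.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scanner:
    """Hypothetical in-memory model of one scanner."""
    version: str
    accepting_jobs: bool = True
    in_progress: List[str] = field(default_factory=list)


def safe_update(scanner: Scanner, new_version: str,
                finish_job: Callable[[str], None],
                is_healthy: Callable[[Scanner], bool]) -> bool:
    """Drain, upgrade, verify. Returns True if job assignment resumed."""
    scanner.accepting_jobs = False        # no new jobs during the update window
    while scanner.in_progress:            # let every in-progress job finish
        finish_job(scanner.in_progress.pop(0))
    scanner.version = new_version         # deploy the new version
    if is_healthy(scanner):               # verify health before resuming
        scanner.accepting_jobs = True
    return scanner.accepting_jobs
```

In this model a failed health check leaves the scanner out of rotation instead of resuming assignment on an unverified version.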

Scan Completeness

Every scan result includes a completeness score (0.0–1.0) that quantifies how thoroughly the scan covered the target data:

1.0 — All enumerated resources were scanned successfully
High — Most resources scanned, some skipped due to unsupported format or size
Medium — Significant resources skipped due to access issues or errors
0.0 — Scan could not enumerate or access the target data

The completeness score is computed from actual per-resource outcomes, not estimates. Every resource is accounted for in the coverage report with one of these statuses:

Status	Meaning
Scanned	Successfully processed through the detection pipeline
Skipped (format)	Binary or unsupported format, automatically excluded
Skipped (access denied)	Insufficient permissions to read the resource
Skipped (size)	Resource exceeds the maximum scannable size for this connector
Failed	Scan attempted but encountered an error
Cancelled	Scan was cancelled before this resource was reached

A low completeness score indicates that the scan may have missed sensitive data. Review the coverage report breakdown for details on skipped resources and remediation steps.
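
To make the idea of a score computed from per-resource outcomes concrete, here is a minimal sketch. The exact formula is not stated in this document, so the sketch assumes the simplest one: the fraction of enumerated resources with status Scanned. The status strings and the function name `completeness` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def completeness(statuses: Iterable[str]) -> float:
    """Fraction of enumerated resources that were scanned (assumed formula).

    Returns 0.0 when nothing was enumerated, matching the 0.0 case above.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["scanned"] / total


outcomes = ["scanned", "scanned", "scanned", "skipped_size"]
print(completeness(outcomes))  # 3 of 4 resources scanned
print(Counter(outcomes))       # per-status breakdown, as in a coverage report
```

A real score might weight resources by size or treat some skip reasons differently; the point here is only that every resource contributes exactly one outcome to the total.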

Scan Decision Audit Trail

Every decision the platform makes during a scan is recorded in an immutable audit trail. This provides full transparency into why specific resources were included, skipped, or handled differently.

What Gets Logged

Decision	Example
Resource included	Resource matched scan scope and format filters
Resource skipped (binary)	File classified as binary format, excluded per policy
Resource skipped (format)	Unsupported file format (e.g., video, executable)
Resource skipped (size)	File exceeds maximum scannable size
Resource skipped (access)	Insufficient permissions to read the resource
Budget exceeded	Scan reached its time or cost budget before processing this resource
Coverage target met	Scan reached its configured coverage target and stopped
Circuit isolation triggered	Connector type experiencing failures, paused to prevent cascade
Resource truncated	File partially scanned due to size limits

Viewing the Audit Trail

The decision trail is accessible from:

Scan Detail page — Click any scan to see the full decision log with filters by decision type
Compliance Export — Download the trail as CSV / JSON for archival or compliance reporting

The decision audit trail is designed for compliance and forensics. When an auditor asks “why wasn’t this bucket fully scanned?”, the trail provides a timestamped, per-resource answer.

Adding a Scanner

Click Add Scanner on the Scanner Fleet page to deploy a new scanner:

Slim.io Managed

Select “Slim.io manages it” — we provision and manage the scanner for you:

Choose a scanner profile (cloud-storage, database, saas, or full)
Click Deploy — the scanner is provisioned automatically
The scanner appears in your fleet within ~30 seconds
No infrastructure setup required on your end

The number of scanners available to you is set by your plan. Contact your administrator if you need more scanners.

Self-Hosted (BYOC)

Select “I’ll deploy in my environment” — you run the scanner in your own VPC:

Choose a scanner profile and deployment platform (Docker, Kubernetes, or Cloud Run)
Copy the generated deployment configuration (includes a pre-embedded registration token)
Run the deployment command in your environment
The scanner auto-registers and appears in your fleet

For detailed setup, see BYOC Deployment.

Both hosting modes use the same scanner software and produce identical results. The only difference is where the compute runs. Choose self-hosted when data must remain within your network boundary.

Connector Health Alerts

Configure alerts per connector to stay informed about health issues:

Alert	Default	Description
Credential Expiring	ON (7 days before)	Warns before credentials expire so you can rotate proactively
Authentication Failed	ON (immediate)	Notifies when a connector fails to authenticate
Connector Offline	OFF	Alerts when a connector is unreachable. Enable per your needs — some connectors go offline intentionally during maintenance.
Health Degraded	ON	Alerts when a connector’s health score drops below threshold

Alert thresholds and notification channels (Slack, Teams, Google Chat) are configurable per connector in the connector settings panel.

Learn More

Parallel Scanning Engine — Worker scaling and chunk partitioning
Connectors Overview — Provider-specific setup guides
BYOC Deployment — Deploying scanners in your cloud
Scan Management — Starting, scheduling, and controlling scans