Scanner Fleet
The Scanner Fleet is the distributed infrastructure that executes scans across your cloud environments. Slim.io supports two scanning modes: agentless cloud scanning for object storage providers, and connector-based scanning for databases and SaaS applications.
Agentless Cloud Scanning
Agentless scanning connects directly to cloud storage APIs without deploying any software in your environment. Slim.io authenticates using cross-account roles (AWS), Workload Identity Federation (GCP), or service principals (Azure) and reads objects remotely.
Supported Providers
| Provider | Service | Auth Method | Scanning Mode |
|---|---|---|---|
| AWS | S3 | IAM Role (cross-account assume role) | Agentless |
| GCP | Cloud Storage | Workload Identity Federation | Agentless |
| Azure | Blob Storage | Service Principal (OAuth2) | Agentless |
How It Works
- Authentication — Slim.io assumes the configured role/identity with read-only permissions
- Enumeration — Lists all objects in the scoped buckets/containers, applying prefix and type filters
- Streaming Download — Objects are streamed in configurable chunks (default 4 MB) to the scan workers
- Detection — Each chunk passes through the PII detection pipeline
- Chunk Overlap — 512 bytes of overlap between chunks catches PII values that span chunk boundaries
- Finding Storage — Detected findings are stored with full metadata (location, classifier, confidence)
Agentless scanning does not require any infrastructure in your cloud account. Slim.io’s workers download objects directly from your storage using temporary credentials. For environments that prohibit external data transfer, use BYOC mode instead.
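The streaming and chunk-overlap steps above can be sketched as follows. This is a minimal illustration, not Slim.io's worker code: the function name `iter_chunks` and its signature are assumptions, and only the 4 MB chunk size and 512-byte overlap come from the text.

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # default chunk size named in the text (4 MB)
OVERLAP = 512                 # bytes repeated from the end of the previous chunk


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE,
                overlap: int = OVERLAP) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs. Every chunk after the first starts with
    the last `overlap` bytes of the chunk before it (hypothetical helper)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    offset = 0
    tail = b""
    while True:
        fresh = stream.read(chunk_size - len(tail))
        if not fresh:
            break
        data = tail + fresh
        yield offset, data
        # The next chunk begins `overlap` bytes before the end of this one.
        tail = data[-overlap:] if overlap else b""
        offset += len(data) - len(tail)


# Tiny sizes so the overlap is easy to see.
chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4, overlap=2))
```

With these toy sizes, the value `de` would be split across the boundary between `abcd` and `efgh` if there were no overlap; with a 2-byte overlap it appears whole in the second chunk.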
Incremental Enumeration
After an initial full enumeration, subsequent scans use incremental enumeration to process only the objects that changed since the previous scan. The `since` parameter filters resources by modification time:
- AWS S3 — Uses `ListObjectsV2` with modification time filtering
- GCP GCS — Uses object metadata `updated` timestamp
- Azure Blob — Uses `Last-Modified` header filtering
This can reduce enumeration time from minutes to seconds on large buckets.
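As a rough sketch of the filtering idea, the snippet below applies a `since` cutoff to an already-listed set of objects. The `ObjectInfo` type and `changed_since` function are hypothetical names for illustration, and the strict "later than" comparison is an assumption; the source does not say how boundary timestamps are treated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class ObjectInfo:
    key: str
    last_modified: datetime  # timezone-aware, as reported by the provider


def changed_since(objects: Iterable[ObjectInfo],
                  since: Optional[datetime]) -> Iterator[ObjectInfo]:
    """Yield objects modified after `since`; None means a full enumeration."""
    for obj in objects:
        if since is None or obj.last_modified > since:
            yield obj


listing = [
    ObjectInfo("a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ObjectInfo("b.csv", datetime(2024, 3, 1, tzinfo=timezone.utc)),
]
last_scan = datetime(2024, 2, 1, tzinfo=timezone.utc)
changed = [o.key for o in changed_since(listing, last_scan)]
```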
Connector-Based Scanning
Connector-based scanning covers databases and SaaS applications that require protocol-specific access. Each connector implements a standardized interface for authentication, resource enumeration, and data streaming.
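One way to picture such a standardized interface is an abstract base class with one method per responsibility. The class and method names below are assumptions made for illustration; the actual connector interface is not documented here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class Connector(ABC):
    """Hypothetical base class covering the three operations named in the text."""

    @abstractmethod
    def authenticate(self) -> None: ...

    @abstractmethod
    def enumerate_resources(self) -> List[str]: ...

    @abstractmethod
    def stream(self, resource: str, chunk_size: int = 1024) -> Iterator[bytes]: ...


class InMemoryConnector(Connector):
    """Toy connector backed by a dict, used only to exercise the interface."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data
        self._authenticated = False

    def authenticate(self) -> None:
        self._authenticated = True

    def enumerate_resources(self) -> List[str]:
        self._require_auth()
        return sorted(self._data)

    def stream(self, resource: str, chunk_size: int = 1024) -> Iterator[bytes]:
        # Generator: the auth check runs when iteration starts.
        self._require_auth()
        payload = self._data[resource]
        for start in range(0, len(payload), chunk_size):
            yield payload[start:start + chunk_size]

    def _require_auth(self) -> None:
        if not self._authenticated:
            raise PermissionError("call authenticate() first")


connector = InMemoryConnector({"b": b"hello", "a": b"xy"})
connector.authenticate()
resources = connector.enumerate_resources()
pieces = list(connector.stream("b", chunk_size=2))
```

Because every connector exposes the same three calls, the scan pipeline can treat a database table and a SaaS document the same way once the bytes start flowing.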
Supported Connectors
Databases
| Provider | Protocol | Auth Method |
|---|---|---|
| PostgreSQL | Native wire protocol | Username/password, SSL certificates |
| MySQL | Native wire protocol | Username/password, SSL |
| Microsoft SQL Server | TDS protocol | SQL auth, Windows auth |
| Oracle | Oracle Net | Username/password, wallet |
| IBM DB2 | DRDA protocol | Username/password |
| Snowflake | REST API + Arrow | Username/password, key pair |
| Databricks | REST API + SQL | Personal access token, OAuth |
SaaS Applications
| Provider | API | Auth Method |
|---|---|---|
| Salesforce | REST API | OAuth 2.0 (connected app) |
| Slack | Web API | Bot token (OAuth 2.0) |
| Microsoft Teams | Graph API | App registration (OAuth 2.0) |
| Google Drive | Drive API | Service account (OAuth 2.0) |
| OneDrive | Graph API | App registration (OAuth 2.0) |
| SharePoint | Graph API | App registration (OAuth 2.0) |
Database Scanning Architecture
Database connectors use server-side cursors to stream results without loading entire tables into memory:
- Connect — Establish connection with provided credentials, enforce read-only mode where supported
- Enumerate — List all schemas, tables, and columns with metadata
- Sample — For each table, sample a configurable number of rows (default: 10,000) using randomized sampling
- Classify — Run PII detection on sampled values, classify at the column level
- Disconnect — Clean up connection and cursor resources
Database connectors always request the minimum necessary permissions. Slim.io recommends creating a dedicated read-only database user for scanning. Never provide credentials with write or administrative access.
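The sample step can be approximated with batch fetching plus reservoir sampling, which keeps a uniform random sample without holding the whole table in memory. This sketch uses Python's built-in `sqlite3` purely as a stand-in: SQLite has no server-side cursors, the function name `sample_rows` is hypothetical, and the source does not say which sampling algorithm Slim.io uses.

```python
import random
import sqlite3
from typing import Any, List, Optional, Tuple


def sample_rows(conn: sqlite3.Connection, table: str, sample_size: int = 10_000,
                batch_size: int = 1_000,
                seed: Optional[int] = None) -> List[Tuple[Any, ...]]:
    """Stream a table in batches and keep a uniform random sample (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[Tuple[Any, ...]] = []
    seen = 0
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            seen += 1
            if len(reservoir) < sample_size:
                reservoir.append(row)
            else:
                j = rng.randrange(seen)  # uniform in [0, seen)
                if j < sample_size:
                    reservoir[j] = row
    cursor.close()
    return reservoir


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}@example.com") for i in range(50)])
sample = sample_rows(conn, "users", sample_size=10, batch_size=7, seed=1)
```

Tables smaller than the sample size are returned in full, so small lookup tables are classified on all of their rows.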
SaaS Scanning Architecture
SaaS connectors use provider APIs to enumerate and stream content:
- Authenticate — Exchange OAuth credentials for access tokens
- Enumerate — List channels, drives, sites, or objects using provider-specific APIs
- Stream — Download message content, file attachments, or document bodies in chunks
- Classify — Run PII detection on streamed content
- Revoke — Clean teardown of API sessions
SaaS connectors support incremental scanning using provider-native delta mechanisms (Graph API delta queries, Slack conversation history timestamps).
Scanner Fleet Management
Fleet Overview
The Scanner Fleet page in the Customer Dashboard shows:
- Active Workers — Currently running scan workers across all connectors
- Connector Status — Health and authentication status of each connector
- Scan Throughput — Files processed per minute across the fleet
- Resource Utilization — Memory and CPU usage of scan workers
Connector Lifecycle
Each connector follows a standard lifecycle:
Create → Test → Active → Scan → Disconnect → Delete
| State | Description |
|---|---|
| Created | Connector configured with credentials and scope |
| Testing | Credential validation in progress |
| Active | Credentials validated, ready for scanning |
| Scanning | Scan currently in progress |
| Error | Credential or permission issue detected |
| Disconnected | Temporarily disabled, configuration preserved |
| Deleted | Permanently removed with all associated data |
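The lifecycle can be modeled as a small state machine. The state names come from the table above, but the transition table is an assumption: the source lists the states and the main path, not every allowed edge (for example, how a connector leaves the Error state).

```python
from enum import Enum


class State(Enum):
    CREATED = "Created"
    TESTING = "Testing"
    ACTIVE = "Active"
    SCANNING = "Scanning"
    ERROR = "Error"
    DISCONNECTED = "Disconnected"
    DELETED = "Deleted"


# Assumed edges, for illustration only.
TRANSITIONS = {
    State.CREATED: {State.TESTING, State.DELETED},
    State.TESTING: {State.ACTIVE, State.ERROR},
    State.ACTIVE: {State.SCANNING, State.TESTING, State.ERROR,
                   State.DISCONNECTED, State.DELETED},
    State.SCANNING: {State.ACTIVE, State.ERROR},
    State.ERROR: {State.TESTING, State.DISCONNECTED, State.DELETED},
    State.DISCONNECTED: {State.TESTING, State.DELETED},
    State.DELETED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the edge is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.CREATED
for step in (State.TESTING, State.ACTIVE, State.SCANNING, State.ACTIVE,
             State.DISCONNECTED, State.DELETED):
    state = transition(state, step)
```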
Health Monitoring
Connector health is checked automatically:
- Credential Validity — Periodic test of authentication (tokens, roles, secrets)
- API Availability — Provider API endpoint reachability
- Permission Scope — Verification that required permissions are still granted
- Rate Limit Status — Current position relative to provider rate limits
Unhealthy connectors display a warning badge and are excluded from scheduled scans until the issue is resolved.
Token Generation and Rotation
Scanner Tokens
For BYOC deployments, scanner agents authenticate to the Slim.io control plane using scanner tokens. These tokens authorize the agent to:
- Report scan progress and findings
- Receive scan job assignments
- Download classifier configurations
Generating a Token
- Navigate to Settings > Scanner Fleet in the Customer Dashboard.
- Click Generate Token.
- Copy the token immediately — it is only shown once.
- Configure the scanner agent with the token via environment variable:
SLIM_SCANNER_TOKEN=slt_xxxxxxxxxxxxxxxxxxxx
Token Rotation
Tokens should be rotated regularly (recommended: every 90 days):
- Generate a new token (the old token remains valid).
- Update the scanner agent configuration with the new token.
- Verify the agent connects successfully with the new token.
- Revoke the old token under Settings > Scanner Fleet > Active Tokens.
Revoking a token immediately disconnects any scanner agent using it. Always deploy the new token before revoking the old one to avoid downtime.
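The ordering matters more than the mechanics, so here is a toy model of the four rotation steps. `TokenRegistry` and `rotate` are hypothetical stand-ins for the dashboard actions and your own deployment tooling; only the `SLIM_SCANNER_TOKEN` variable name and the `slt_` prefix come from the example above.

```python
import secrets
from typing import Dict, Set


class TokenRegistry:
    """Toy stand-in for the control plane's token store (hypothetical)."""

    def __init__(self) -> None:
        self._active: Set[str] = set()

    def generate(self) -> str:
        token = "slt_" + secrets.token_hex(10)  # 20 hex characters
        self._active.add(token)
        return token

    def revoke(self, token: str) -> None:
        self._active.discard(token)

    def is_valid(self, token: str) -> bool:
        return token in self._active


def rotate(registry: TokenRegistry, agent_config: Dict[str, str],
           old_token: str) -> str:
    """Rotate without downtime: the old token is revoked last."""
    new_token = registry.generate()                   # 1. old token still valid
    agent_config["SLIM_SCANNER_TOKEN"] = new_token    # 2. update the agent
    if not registry.is_valid(agent_config["SLIM_SCANNER_TOKEN"]):
        raise RuntimeError("agent cannot authenticate with the new token")  # 3. verify
    registry.revoke(old_token)                        # 4. revoke only after verifying
    return new_token


registry = TokenRegistry()
old = registry.generate()
config = {"SLIM_SCANNER_TOKEN": old}
new = rotate(registry, config, old)
```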
Token Permissions
Scanner tokens are scoped to specific operations:
| Permission | Description |
|---|---|
| `scan:execute` | Run scan jobs assigned by the control plane |
| `scan:report` | Submit findings and progress updates |
| `config:read` | Download classifier configurations and scan parameters |
| `heartbeat` | Send periodic health status updates |
Tokens do not have access to customer data, dashboard operations, or administrative functions.
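Scope enforcement of this kind usually reduces to a set-membership check. The sketch below assumes a hypothetical `authorize` function; the four scope names are from the table above, while `dashboard:read` is an invented scope used only to show a denied request.

```python
from typing import FrozenSet

# The four scopes listed for scanner tokens.
SCANNER_SCOPES: FrozenSet[str] = frozenset(
    {"scan:execute", "scan:report", "config:read", "heartbeat"}
)


def authorize(token_scopes: FrozenSet[str], required: str) -> None:
    """Raise PermissionError unless the token carries the required scope."""
    if required not in token_scopes:
        raise PermissionError(f"token lacks scope: {required}")


authorize(SCANNER_SCOPES, "scan:report")  # allowed, returns None

try:
    authorize(SCANNER_SCOPES, "dashboard:read")  # hypothetical scope, not granted
    denied = False
except PermissionError:
    denied = True
```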
Scanner Profiles
Scanner images are available in four profiles, each optimized for a specific category of data sources. Choose the profile that matches your connectors to minimize image size and attack surface.
| Profile | Supported Connectors | Use Case |
|---|---|---|
| cloud-storage | AWS S3, GCP Cloud Storage, Azure Blob | Cloud object storage scanning |
| database | PostgreSQL, MySQL, MSSQL, Oracle, Snowflake, Databricks, DB2 | Relational and analytical database scanning |
| saas | Slack, Teams, Salesforce, Google Drive, OneDrive, SharePoint | SaaS application and collaboration tool scanning |
| full | All of the above (16 connector types) | Universal scanner for mixed environments |
When deploying BYOC scanners, specify the profile and version in your deployment configuration:
# Docker (always use a pinned version — never :latest)
docker pull slimio/scanner:1.0.0-database
# Kubernetes
image: slimio/scanner:1.0.0-cloud-storage
Scanners automatically register their capabilities with the control plane. Jobs are only assigned to scanners that have the required connectors installed. A database scan will never be sent to a cloud-storage scanner.
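Capability-based assignment can be sketched as a lookup from profile to connector set. The profile names match the table above; the connector identifiers (`s3`, `postgresql`, and so on) and the `eligible_scanners` function are assumptions made for this example, not the identifiers the control plane uses.

```python
from typing import Dict, List, Set

# Hypothetical connector identifiers grouped by the documented profiles.
PROFILE_CONNECTORS: Dict[str, Set[str]] = {
    "cloud-storage": {"s3", "gcs", "azure-blob"},
    "database": {"postgresql", "mysql", "mssql", "oracle",
                 "snowflake", "databricks", "db2"},
    "saas": {"slack", "teams", "salesforce", "google-drive",
             "onedrive", "sharepoint"},
}
# The full profile carries every connector from the other three.
PROFILE_CONNECTORS["full"] = set().union(*PROFILE_CONNECTORS.values())


def eligible_scanners(scanners: Dict[str, str], connector: str) -> List[str]:
    """Return the names of scanners whose profile includes the connector."""
    return sorted(name for name, profile in scanners.items()
                  if connector in PROFILE_CONNECTORS[profile])


fleet = {"scanner-a": "cloud-storage", "scanner-b": "database", "scanner-c": "full"}
postgres_targets = eligible_scanners(fleet, "postgresql")
```

Here a PostgreSQL job can go to `scanner-b` or `scanner-c`, but never to `scanner-a`.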
Scanner Versioning
Scanner images use explicit version tags following {major}.{minor}.{patch}-{profile} format. We never publish a :latest tag for BYOC images to prevent accidental upgrades.
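If you script deployments, it can help to validate tags before pulling. The following is a small sketch of parsing the documented tag format and comparing versions; `parse_tag` and `update_available` are hypothetical helpers, and the comparison rule (same profile, higher version tuple) is an assumption about how an update check might work.

```python
import re
from typing import NamedTuple

TAG_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(cloud-storage|database|saas|full)$")


class ScannerTag(NamedTuple):
    major: int
    minor: int
    patch: int
    profile: str


def parse_tag(tag: str) -> ScannerTag:
    """Parse '{major}.{minor}.{patch}-{profile}'; reject anything else."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a pinned scanner tag: {tag!r}")
    major, minor, patch, profile = match.groups()
    return ScannerTag(int(major), int(minor), int(patch), profile)


def update_available(running: str, newest: str) -> bool:
    """True when both tags share a profile and `newest` has a higher version."""
    current, candidate = parse_tag(running), parse_tag(newest)
    return current.profile == candidate.profile and candidate[:3] > current[:3]


tag = parse_tag("1.0.0-database")
```

Unpinned tags such as `latest` fail to parse, which makes accidental use easy to catch in a deployment pipeline.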
Version Notifications
The Scanner Fleet page shows an Update Available badge when a scanner is running an older version:
- BYOC scanners — You update at your own pace. The badge shows that a newer version is available, along with its changelog.
- Slim.io-managed scanners — Updates are applied automatically via rolling deployment. No action required.
Safe Update Guarantee
Scanner updates never interrupt an active scan. The update process:
- The scanner finishes all in-progress scan jobs
- No new jobs are assigned during the update window
- The new version is deployed
- Health is verified before resuming job assignment
This means you can update scanners at any time without worrying about data loss or interrupted scans.
Scan Completeness
Every scan result includes a completeness score (0.0–1.0) that quantifies how thoroughly the scan covered the target data:
- 1.0 — All enumerated resources were scanned successfully
- High — Most resources scanned, some skipped due to unsupported format or size
- Medium — Significant resources skipped due to access issues or errors
- 0.0 — Scan could not enumerate or access the target data
The completeness score is computed from actual per-resource outcomes, not estimates. Every resource is accounted for in the coverage report with one of these statuses:
| Status | Meaning |
|---|---|
| Scanned | Successfully processed through the detection pipeline |
| Skipped (format) | Binary or unsupported format, automatically excluded |
| Skipped (access denied) | Insufficient permissions to read the resource |
| Skipped (size) | Resource exceeds the maximum scannable size for this connector |
| Failed | Scan attempted but encountered an error |
| Cancelled | Scan was cancelled before this resource was reached |
A low completeness score indicates that the scan may have missed sensitive data. Review the coverage report breakdown for details on skipped resources and remediation steps.
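The exact formula behind the score is not given here. As one plausible reading, the sketch below treats completeness as the fraction of enumerated resources that were scanned successfully; the status identifiers and the `completeness` function are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical identifiers for the six statuses in the coverage report.
KNOWN_STATUSES = {
    "scanned", "skipped_format", "skipped_access_denied",
    "skipped_size", "failed", "cancelled",
}


def completeness(statuses: Iterable[str]) -> Tuple[float, Counter]:
    """Return (score, per-status counts), assuming score = scanned / total."""
    counts = Counter(statuses)
    unknown = set(counts) - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    total = sum(counts.values())
    # No enumerated resources maps to 0.0, matching the bottom of the scale.
    score = counts["scanned"] / total if total else 0.0
    return score, counts


score, breakdown = completeness(["scanned"] * 8 + ["skipped_format", "failed"])
```

The per-status counts returned alongside the score correspond to the breakdown you would review in the coverage report.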
Scan Decision Audit Trail
Every decision the platform makes during a scan is recorded in an immutable audit trail. This provides full transparency into why specific resources were included, skipped, or handled differently.
What Gets Logged
| Decision | Example |
|---|---|
| Resource included | Resource matched scan scope and format filters |
| Resource skipped (binary) | File classified as binary format, excluded per policy |
| Resource skipped (format) | Unsupported file format (e.g., video, executable) |
| Resource skipped (size) | File exceeds maximum scannable size |
| Resource skipped (access) | Insufficient permissions to read the resource |
| Budget exceeded | Scan reached its time or cost budget before processing this resource |
| Coverage target met | Scan reached its configured coverage target and stopped |
| Circuit isolation triggered | Connector type experiencing failures, paused to prevent cascade |
| Resource truncated | File partially scanned due to size limits |
Viewing the Audit Trail
The decision trail is accessible from:
- Scan Detail page — Click any scan to see the full decision log with filters by decision type
- Admin API — Query decisions programmatically for compliance reporting
The decision audit trail is designed for compliance and forensics. When an auditor asks “why wasn’t this bucket fully scanned?”, the trail provides a timestamped, per-resource answer.
Adding a Scanner
Click Add Scanner on the Scanner Fleet page to deploy a new scanner:
Slim.io Managed
Select “Slim.io manages it” — we provision and manage the scanner for you:
- Choose a scanner profile (cloud-storage, database, saas, or full)
- Click Deploy — the scanner is provisioned automatically
- The scanner appears in your fleet within ~30 seconds
- No infrastructure setup required on your end
Scanner count is managed by your plan. Contact your administrator if you need more scanners.
Self-Hosted (BYOC)
Select “I’ll deploy in my environment” — you run the scanner in your own VPC:
- Choose a scanner profile and deployment platform (Docker, Kubernetes, or Cloud Run)
- Copy the generated deployment configuration (includes a pre-embedded registration token)
- Run the deployment command in your environment
- The scanner auto-registers and appears in your fleet
For detailed setup, see BYOC Deployment.
Both hosting modes use the same scanner software and produce identical results. The only difference is where the compute runs. Choose self-hosted when data must remain within your network boundary.
Connector Health Alerts
Configure alerts per connector to stay informed about health issues:
| Alert | Default | Description |
|---|---|---|
| Credential Expiring | ON (7 days before) | Warns before credentials expire so you can rotate proactively |
| Authentication Failed | ON (immediate) | Notifies when a connector fails to authenticate |
| Connector Offline | OFF | Alerts when a connector is unreachable. Enable per your needs — some connectors go offline intentionally during maintenance. |
| Health Degraded | ON | Alerts when a connector’s health score drops below threshold |
Alert thresholds and notification channels (Slack, Teams, Google Chat) are configurable per connector in the connector settings panel.
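To make the Credential Expiring default concrete, here is a minimal sketch of a 7-day warning window. The function name and the boundary handling are assumptions: in this sketch an already-expired credential does not count as "expiring", on the theory that it would surface as an authentication failure instead.

```python
from datetime import datetime, timedelta, timezone


def credential_expiring(expires_at: datetime, now: datetime,
                        warn_before: timedelta = timedelta(days=7)) -> bool:
    """True when the credential is still valid but expires within the window."""
    return now < expires_at <= now + warn_before


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
soon = credential_expiring(now + timedelta(days=5), now)
later = credential_expiring(now + timedelta(days=8), now)
```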
Learn More
- Parallel Scanning Engine — Worker scaling and chunk partitioning
- Connectors Overview — Provider-specific setup guides
- BYOC Deployment — Deploying scanners in your cloud
- Scan Management — Starting, scheduling, and controlling scans