
Scanner Fleet

The Scanner Fleet is the distributed infrastructure that executes scans across your cloud environments. Slim.io supports two scanning modes: agentless cloud scanning for object storage providers, and connector-based scanning for databases and SaaS applications.

Agentless Cloud Scanning

Agentless scanning connects directly to cloud storage APIs without deploying any software in your environment. Slim.io authenticates using cross-account roles (AWS), Workload Identity Federation (GCP), or service principals (Azure) and reads objects remotely.

Supported Providers

| Provider | Service | Auth Method | Scanning Mode |
| --- | --- | --- | --- |
| AWS | S3 | IAM Role (cross-account assume role) | Agentless |
| GCP | Cloud Storage | Workload Identity Federation | Agentless |
| Azure | Blob Storage | Service Principal (OAuth2) | Agentless |

How It Works

  1. Authentication — Slim.io assumes the configured role/identity with read-only permissions
  2. Enumeration — Lists all objects in the scoped buckets/containers, applying prefix and type filters
  3. Streaming Download — Objects are streamed in configurable chunks (default 4 MB) to the scan workers
  4. Detection — Each chunk passes through the PII detection pipeline
  5. Chunk Overlap — 512 bytes of overlap between chunks catches PII values that span chunk boundaries
  6. Finding Storage — Detected findings are stored with full metadata (location, classifier, confidence)
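The chunking in steps 3 and 5 can be sketched in Python. This is a minimal illustration under stated assumptions, not Slim.io's worker code: the function name stream_chunks is hypothetical, and the object is assumed to arrive as a readable binary stream. Every chunk after the first begins with the last overlap bytes of the previous chunk, so a value no longer than the overlap that straddles a boundary appears whole in the following chunk.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def stream_chunks(
    stream: BinaryIO,
    chunk_size: int = 4 * 1024 * 1024,  # 4 MB default, per the steps above
    overlap: int = 512,                 # bytes shared by consecutive chunks
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs; each chunk repeats the previous chunk's last `overlap` bytes."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    tail = b""   # bytes carried over from the end of the previous chunk
    offset = 0   # absolute position of the next chunk's first byte
    while True:
        fresh = stream.read(chunk_size - len(tail))
        if not fresh:
            break
        chunk = tail + fresh
        yield offset, chunk
        tail = chunk[-overlap:] if overlap else b""  # chunk[-0:] would be the whole chunk
        offset += len(chunk) - len(tail)


# Usage: an email address that straddles the 1,000-byte chunk boundary.
data = b"x" * 990 + b"jane@example.com" + b"x" * 2000
with_overlap = list(stream_chunks(io.BytesIO(data), chunk_size=1000, overlap=100))
no_overlap = list(stream_chunks(io.BytesIO(data), chunk_size=1000, overlap=0))
print(any(b"jane@example.com" in c for _, c in with_overlap))  # True
print(any(b"jane@example.com" in c for _, c in no_overlap))    # False
```

With overlap=0 the address is split across two chunks and no single chunk contains it, which is the failure the 512-byte overlap prevents. At the 4 MB default the overlap adds only about 0.01% extra bytes per chunk.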

Agentless scanning does not require any infrastructure in your cloud account. Slim.io’s workers download objects directly from your storage using temporary credentials. For environments that prohibit external data transfer, use BYOC mode instead.

Incremental Enumeration

After an initial full enumeration, subsequent scans use incremental enumeration to process only changed objects. The since parameter filters resources by modification time:

  • AWS S3: ListObjectsV2 returns each object's LastModified timestamp, which is compared against since
  • GCP GCS: Uses the object metadata updated timestamp
  • Azure Blob: Uses each blob's Last-Modified timestamp

This reduces enumeration time from minutes to seconds on large buckets.
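For S3, ListObjectsV2 returns a LastModified timestamp for every object but accepts no date filter, so the since comparison happens on the client. The Python sketch below shows that filter over pages shaped like ListObjectsV2 responses. The name changed_keys is hypothetical, and the >= comparison assumes since is the previous scan's start time, so objects written at that exact instant are rescanned rather than missed.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator, Mapping, Optional


def changed_keys(
    pages: Iterable[Mapping[str, Any]],
    since: Optional[datetime],
) -> Iterator[str]:
    """Yield keys of objects modified at or after `since` (every key when `since` is None).

    Each page mirrors an S3 ListObjectsV2 response: an optional "Contents" list whose
    entries carry "Key" and a timezone-aware "LastModified".
    """
    for page in pages:
        for obj in page.get("Contents", []):  # pages without objects omit "Contents"
            if since is None or obj["LastModified"] >= since:
                yield obj["Key"]


# Usage with fake pages. With boto3, the pages could come from
# boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket=..., Prefix=...)
pages = [
    {"Contents": [
        {"Key": "exports/a.csv", "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"Key": "exports/b.csv", "LastModified": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    ]},
    {},  # an empty page
]
last_scan = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(list(changed_keys(pages, last_scan)))  # ['exports/b.csv']
```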

Connector-Based Scanning

Connector-based scanning covers databases and SaaS applications that require protocol-specific access. Each connector implements a standardized interface for authentication, resource enumeration, and data streaming.
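This page does not publish the connector interface itself. The Python sketch below is a hypothetical shape for it (Connector, Resource, list_resources, read_chunks, and scan_all are all assumed names) that shows what a shared interface buys: one driver loop can scan any source that implements the same operations, and cleanup runs even when authentication or a scan fails.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Protocol, Tuple


@dataclass(frozen=True)
class Resource:
    """One scannable unit, such as an object, a table, or a channel."""
    resource_id: str
    size_bytes: int


class Connector(Protocol):
    """Hypothetical shape of the standardized connector interface."""

    def authenticate(self) -> None: ...
    def list_resources(self) -> Iterator[Resource]: ...
    def read_chunks(self, resource: Resource) -> Iterator[bytes]: ...
    def close(self) -> None: ...


def scan_all(connector: Connector, detect: Callable[[bytes], Iterable[bytes]]) -> List[Tuple[str, bytes]]:
    """Authenticate, enumerate, and stream through any connector, always closing it afterwards."""
    findings: List[Tuple[str, bytes]] = []
    try:
        connector.authenticate()
        for resource in connector.list_resources():
            for chunk in connector.read_chunks(resource):
                findings.extend((resource.resource_id, hit) for hit in detect(chunk))
    finally:
        connector.close()
    return findings


class InMemoryConnector:
    """Toy connector for the usage example; satisfies the Protocol structurally."""

    def __init__(self, blobs: dict) -> None:
        self.blobs = blobs
        self.closed = False

    def authenticate(self) -> None:
        pass  # nothing to authenticate against

    def list_resources(self) -> Iterator[Resource]:
        for key, body in self.blobs.items():
            yield Resource(key, len(body))

    def read_chunks(self, resource: Resource) -> Iterator[bytes]:
        yield self.blobs[resource.resource_id]

    def close(self) -> None:
        self.closed = True


def find_ssn(chunk: bytes) -> List[bytes]:
    return re.findall(rb"\d{3}-\d{2}-\d{4}", chunk)  # toy detector for the example


conn = InMemoryConnector({"a.txt": b"ssn 123-45-6789", "b.txt": b"nothing here"})
print(scan_all(conn, find_ssn), conn.closed)  # [('a.txt', b'123-45-6789')] True
```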

Supported Connectors

Databases

| Provider | Protocol | Auth Method |
| --- | --- | --- |
| PostgreSQL | Native wire protocol | Username/password, SSL certificates |
| MySQL | Native wire protocol | Username/password, SSL |
| Microsoft SQL Server | TDS protocol | SQL auth, Windows auth |
| Oracle | Oracle Net | Username/password, wallet |
| IBM DB2 | DRDA protocol | Username/password |
| Snowflake | REST API + Arrow | Username/password, key pair |
| Databricks | REST API + SQL | Personal access token, OAuth |

SaaS Applications

| Provider | API | Auth Method |
| --- | --- | --- |
| Salesforce | REST API | OAuth 2.0 (connected app) |
| Slack | Web API | Bot token (OAuth 2.0) |
| Microsoft Teams | Graph API | App registration (OAuth 2.0) |
| Google Drive | Drive API | Service account (OAuth 2.0) |
| OneDrive | Graph API | App registration (OAuth 2.0) |
| SharePoint | Graph API | App registration (OAuth 2.0) |

Database Scanning Architecture

Database connectors use server-side cursors to stream results without loading entire tables into memory:

  1. Connect — Establish connection with provided credentials, enforce read-only mode where supported
  2. Enumerate — List all schemas, tables, and columns with metadata
  3. Sample — For each table, sample a configurable number of rows (default: 10,000) using randomized sampling
  4. Classify — Run PII detection on sampled values, classify at the column level
  5. Disconnect — Clean up connection and cursor resources
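The page does not name its randomized sampling method. Reservoir sampling (Algorithm R) is one standard way to draw a uniform, fixed-size sample from a stream while holding only k rows in memory, which fits the server-side cursor approach; the Python sketch below uses it as a swapped-in illustration. The rows iterable stands in for a cursor (with psycopg2, for example, a named cursor created by conn.cursor(name=...) is server-side). The column rule (label a column when at least half of its non-null sampled values match) and the EMAIL regex are hypothetical, not Slim.io's classifiers.

```python
import random
import re
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[object, ...]
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")  # toy detector for the example


def reservoir_sample(rows: Iterable[Row], k: int, rng: random.Random) -> List[Row]:
    """Uniform sample of up to k rows from a stream, keeping only k rows in memory."""
    sample: List[Row] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: row i survives with probability k / (i + 1)
            if j < k:
                sample[j] = row
    return sample


def classify_columns(columns: Sequence[str], sample: List[Row], threshold: float = 0.5) -> Dict[str, str]:
    """Label a column EMAIL when at least `threshold` of its non-null sampled values match."""
    labels: Dict[str, str] = {}
    for idx, name in enumerate(columns):
        values = [row[idx] for row in sample if row[idx] is not None]
        if values and sum(bool(EMAIL.fullmatch(str(v))) for v in values) / len(values) >= threshold:
            labels[name] = "EMAIL"
    return labels


rows = ((i, f"user{i}@example.com", "n/a") for i in range(50_000))  # stands in for a cursor
sample = reservoir_sample(rows, k=10_000, rng=random.Random(7))
print(len(sample), classify_columns(["id", "email", "note"], sample))  # 10000 {'email': 'EMAIL'}
```

Reservoir sampling still reads every row once. On very large tables, a database-side sampler such as PostgreSQL's TABLESAMPLE clause can avoid the full read, though its SYSTEM method samples whole pages and is therefore less uniform.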

Database connectors always request the minimum necessary permissions. Slim.io recommends creating a dedicated read-only database user for scanning. Never provide credentials with write or administrative access.

SaaS Scanning Architecture

SaaS connectors use provider APIs to enumerate and stream content:

  1. Authenticate — Exchange OAuth credentials for access tokens
  2. Enumerate — List channels, drives, sites, or objects using provider-specific APIs
  3. Stream — Download message content, file attachments, or document bodies in chunks
  4. Classify — Run PII detection on streamed content
  5. Revoke — Clean teardown of API sessions

SaaS connectors support incremental scanning using provider-native delta mechanisms (Graph API delta queries, Slack conversation history timestamps).
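For Microsoft Graph sources (Teams, OneDrive, SharePoint), a delta query returns pages linked by @odata.nextLink and ends with a page carrying @odata.deltaLink; saving that link and requesting it on the next run returns only what changed. The Python sketch below follows one such round. The function name run_delta_sync and the injected fetch callable are assumptions standing in for an authenticated HTTP client, and the URLs in the usage are placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

Page = Dict[str, Any]


def run_delta_sync(fetch: Callable[[str], Page], start_url: str) -> Tuple[List[Any], str]:
    """Follow one Graph delta round and return (changed items, deltaLink for the next run)."""
    items: List[Any] = []
    url = start_url
    while True:
        page = fetch(url)  # in practice: an authenticated GET returning parsed JSON
        items.extend(page.get("value", []))
        if "@odata.nextLink" in page:      # more pages in this round
            url = page["@odata.nextLink"]
        elif "@odata.deltaLink" in page:   # round complete; store this link for next time
            return items, page["@odata.deltaLink"]
        else:
            raise ValueError("page has neither @odata.nextLink nor @odata.deltaLink")


# Usage with canned responses instead of real HTTP calls.
responses = {
    "delta?start": {"value": [{"id": "1"}], "@odata.nextLink": "delta?page=2"},
    "delta?page=2": {"value": [{"id": "2"}], "@odata.deltaLink": "delta?token=abc"},
}
items, next_start = run_delta_sync(responses.__getitem__, "delta?start")
print([i["id"] for i in items], next_start)  # ['1', '2'] delta?token=abc
```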

Scanner Fleet Management

Fleet Overview

The Scanner Fleet page in the Customer Dashboard shows:

  • Active Workers — Currently running scan workers across all connectors
  • Connector Status — Health and authentication status of each connector
  • Scan Throughput — Files processed per minute across the fleet
  • Resource Utilization — Memory and CPU usage of scan workers

Connector Lifecycle

Each connector follows a standard lifecycle:

Create → Test → Active → Scan → Disconnect → Delete

| State | Description |
| --- | --- |
| Created | Connector configured with credentials and scope |
| Testing | Credential validation in progress |
| Active | Credentials validated, ready for scanning |
| Scanning | Scan currently in progress |
| Error | Credential or permission issue detected |
| Disconnected | Temporarily disabled, configuration preserved |
| Deleted | Permanently removed with all associated data |
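The lifecycle can be modeled as a small state machine. The Python sketch below is hypothetical: the states come from the table above, but the page only shows the main path, so every edge marked assumed in the comments (how Error is entered and left, returning to Active after a scan, re-enabling a disconnected connector) is an illustration rather than documented behavior.

```python
from enum import Enum
from typing import Dict, FrozenSet


class State(Enum):
    CREATED = "Created"
    TESTING = "Testing"
    ACTIVE = "Active"
    SCANNING = "Scanning"
    ERROR = "Error"
    DISCONNECTED = "Disconnected"
    DELETED = "Deleted"


S = State
# Main path from the lifecycle line: CREATED -> TESTING -> ACTIVE -> SCANNING -> DISCONNECTED -> DELETED.
# Every other edge is assumed for illustration.
TRANSITIONS: Dict[State, FrozenSet[State]] = {
    S.CREATED: frozenset({S.TESTING}),
    S.TESTING: frozenset({S.ACTIVE, S.ERROR}),                    # ERROR: assumed (validation fails)
    S.ACTIVE: frozenset({S.SCANNING, S.DISCONNECTED, S.ERROR}),   # DISCONNECTED, ERROR: assumed
    S.SCANNING: frozenset({S.DISCONNECTED, S.ACTIVE, S.ERROR}),   # ACTIVE, ERROR: assumed
    S.ERROR: frozenset({S.TESTING, S.DISCONNECTED}),              # assumed (fix, then retest)
    S.DISCONNECTED: frozenset({S.DELETED, S.TESTING}),            # TESTING: assumed (re-enable)
    S.DELETED: frozenset(),                                       # terminal
}


def transition(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = S.CREATED
for nxt in (S.TESTING, S.ACTIVE, S.SCANNING, S.ACTIVE, S.DISCONNECTED, S.DELETED):
    state = transition(state, nxt)
print(state.value)  # Deleted
```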

Health Monitoring

Connector health is checked automatically:

  • Credential Validity — Periodic test of authentication (tokens, roles, secrets)
  • API Availability — Provider API endpoint reachability
  • Permission Scope — Verification that required permissions are still granted
  • Rate Limit Status — Current position relative to provider rate limits

Unhealthy connectors display a warning badge and are excluded from scheduled scans until the issue is resolved.
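The rule that unhealthy connectors sit out scheduled scans can be sketched in Python. The check names mirror the list above; representing each check as a pass/fail boolean (for rate limits, "not currently throttled"), treating a check that has not reported yet as failing, and the names ConnectorHealth and eligible_for_schedule are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CHECKS = ("credential_validity", "api_availability", "permission_scope", "rate_limit_status")


@dataclass
class ConnectorHealth:
    name: str
    results: Dict[str, bool] = field(default_factory=dict)  # check name -> passed?

    @property
    def healthy(self) -> bool:
        # A check that has not reported yet counts as failing (assumed, conservative default).
        return all(self.results.get(check, False) for check in CHECKS)


def eligible_for_schedule(connectors: List[ConnectorHealth]) -> List[str]:
    """Names of connectors that scheduled scans may use; unhealthy ones are left out."""
    return [c.name for c in connectors if c.healthy]


ok = ConnectorHealth("s3-prod", {check: True for check in CHECKS})
expired = ConnectorHealth("slack", {**ok.results, "credential_validity": False})
print(eligible_for_schedule([ok, expired]))  # ['s3-prod']
```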

Token Generation and Rotation

Scanner Tokens

For BYOC deployments, scanner agents authenticate to the Slim.io control plane using scanner tokens. These tokens authorize the agent to:

  • Report scan progress and findings
  • Receive scan job assignments
  • Download classifier configurations

Generating a Token

  1. Navigate to Settings > Scanner Fleet in the Customer Dashboard.
  2. Click Generate Token.
  3. Copy the token immediately — it is only shown once.
  4. Configure the scanner agent with the token via environment variable:
```shell
SLIM_SCANNER_TOKEN=slt_xxxxxxxxxxxxxxxxxxxx
```
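A scanner agent would typically read this variable at startup and refuse to run without it. The Python sketch below assumes that behavior; the docs do not describe the agent's startup checks, and load_scanner_token and redact are hypothetical helpers. The slt_ prefix check is based only on the example token format above.

```python
import os
from typing import Mapping


def load_scanner_token(env: Mapping[str, str] = os.environ) -> str:
    """Read SLIM_SCANNER_TOKEN and fail fast at startup if it is missing or malformed."""
    token = env.get("SLIM_SCANNER_TOKEN", "").strip()
    if not token:
        raise RuntimeError("SLIM_SCANNER_TOKEN is not set")
    if not token.startswith("slt_"):  # prefix taken from the example token format
        raise RuntimeError("SLIM_SCANNER_TOKEN does not look like a scanner token")
    return token


def redact(token: str) -> str:
    """Loggable form of the token: prefix and last four characters only."""
    return token[:4] + "****" + token[-4:]


token = load_scanner_token({"SLIM_SCANNER_TOKEN": "slt_abcdefgh1234"})
print(redact(token))  # slt_****1234
```

Because the token is shown only once, logging a redacted form lets you confirm which token an agent loaded without exposing it.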

Token Rotation

Tokens should be rotated regularly (recommended: every 90 days):

  1. Generate a new token (the old token remains valid).
  2. Update the scanner agent configuration with the new token.
  3. Verify the agent connects successfully with the new token.
  4. Revoke the old token under Settings > Scanner Fleet > Active Tokens.

Revoking a token immediately disconnects any scanner agent using it. Always deploy the new token before revoking the old one to avoid downtime.
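The rotation order above (deploy, verify, then revoke) can be encoded so the old token is never revoked before the agent has reconnected. In this Python sketch the deploy, verify, and revoke callables are hypothetical hooks into your own deployment tooling; the docs describe these steps as manual dashboard actions. rotation_due applies the recommended 90-day interval.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

ROTATION_PERIOD = timedelta(days=90)  # recommended interval from this page


def rotation_due(created_at: datetime, now: datetime, period: timedelta = ROTATION_PERIOD) -> bool:
    return now - created_at >= period


def rotate(old_token: str, new_token: str,
           deploy: Callable[[str], None],
           verify: Callable[[], bool],
           revoke: Callable[[str], None]) -> None:
    """Deploy the new token, confirm the agent reconnects, and only then revoke the old one."""
    deploy(new_token)
    if not verify():
        deploy(old_token)  # the old token is still valid, so rolling back causes no downtime
        raise RuntimeError("agent did not reconnect with the new token; old token left active")
    revoke(old_token)


calls = []
rotate("slt_old", "slt_new",
       deploy=lambda t: calls.append(("deploy", t)),
       verify=lambda: True,
       revoke=lambda t: calls.append(("revoke", t)))
print(calls)  # [('deploy', 'slt_new'), ('revoke', 'slt_old')]
print(rotation_due(datetime(2024, 1, 1, tzinfo=timezone.utc),
                   datetime(2024, 4, 1, tzinfo=timezone.utc)))  # True (91 days)
```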

Token Permissions

Scanner tokens are scoped to specific operations:

| Permission | Description |
| --- | --- |
| scan:execute | Run scan jobs assigned by the control plane |
| scan:report | Submit findings and progress updates |
| config:read | Download classifier configurations and scan parameters |
| heartbeat | Send periodic health status updates |

Tokens do not have access to customer data, dashboard operations, or administrative functions.
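Scope enforcement of this kind is deny-by-default: an operation is allowed only if it maps to a scope the token holds, so anything outside the four scopes is rejected. The Python sketch below illustrates that rule with the documented scopes; the operation names in REQUIRED_SCOPE are hypothetical, since the page does not list the control-plane endpoints.

```python
from typing import Dict, FrozenSet

SCANNER_SCOPES: FrozenSet[str] = frozenset({"scan:execute", "scan:report", "config:read", "heartbeat"})

# Hypothetical control-plane operations mapped to the scope each one requires.
REQUIRED_SCOPE: Dict[str, str] = {
    "jobs.claim": "scan:execute",
    "findings.submit": "scan:report",
    "progress.update": "scan:report",
    "classifiers.download": "config:read",
    "agent.heartbeat": "heartbeat",
}


def authorize(operation: str, granted: FrozenSet[str] = SCANNER_SCOPES) -> bool:
    """Deny by default: operations with no mapped scope (dashboard, admin, data access) always fail."""
    scope = REQUIRED_SCOPE.get(operation)
    return scope is not None and scope in granted


print(authorize("findings.submit"), authorize("dashboard.export"))  # True False
```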

Scanner Profiles

Scanner images are available in four profiles, each optimized for a specific category of data sources. Choose the profile that matches your connectors to minimize image size and attack surface.

| Profile | Supported Connectors | Use Case |
| --- | --- | --- |
| cloud-storage | AWS S3, GCP Cloud Storage, Azure Blob | Cloud object storage scanning |
| database | PostgreSQL, MySQL, MSSQL, Oracle, Snowflake, Databricks, DB2 | Relational and analytical database scanning |
| saas | Slack, Teams, Salesforce, Google Drive, OneDrive, SharePoint | SaaS application and collaboration tool scanning |
| full | All of the above (17 connector types) | Universal scanner for mixed environments |

When deploying BYOC scanners, specify the profile and version in your deployment configuration:

```shell
# Docker (always use a pinned version; never :latest)
docker pull slimio/scanner:1.0.0-database
```

```yaml
# Kubernetes
image: slimio/scanner:1.0.0-cloud-storage
```

Scanners automatically register their capabilities with the control plane. Jobs are only assigned to scanners that have the required connectors installed. A database scan will never be sent to a cloud-storage scanner.
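Capability-based job routing can be sketched as a lookup from each scanner's profile to the connectors it ships with. The profile contents come from the table above; the connector identifiers (postgresql, slack, and so on) and the first-match selection in pick_scanner are assumptions for illustration, and the sketch ignores load and health.

```python
from typing import Dict, FrozenSet, Optional

CLOUD = frozenset({"s3", "gcs", "azure_blob"})
DATABASE = frozenset({"postgresql", "mysql", "mssql", "oracle", "snowflake", "databricks", "db2"})
SAAS = frozenset({"slack", "teams", "salesforce", "google_drive", "onedrive", "sharepoint"})

# Profile -> connectors it ships with; "full" is the union of the lists above
# (this sketch includes only the connectors named on this page).
PROFILES: Dict[str, FrozenSet[str]] = {
    "cloud-storage": CLOUD,
    "database": DATABASE,
    "saas": SAAS,
    "full": CLOUD | DATABASE | SAAS,
}


def pick_scanner(job_connector: str, fleet: Dict[str, str]) -> Optional[str]:
    """Return the first scanner whose profile includes the job's connector, or None."""
    for scanner_id, profile in fleet.items():
        if job_connector in PROFILES[profile]:
            return scanner_id
    return None  # no capable scanner; the caller should hold the job rather than misroute it


fleet = {"scanner-a": "cloud-storage", "scanner-b": "database"}
print(pick_scanner("postgresql", fleet), pick_scanner("slack", fleet))  # scanner-b None
```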

Scanner Versioning

Scanner images use explicit version tags following {major}.{minor}.{patch}-{profile} format. We never publish a :latest tag for BYOC images to prevent accidental upgrades.

Version Notifications

The Scanner Fleet page shows an Update Available badge when a scanner is running an older version:

  • BYOC scanners: You update at your own pace. The badge indicates that a newer version is available and shows its changelog.
  • Slim.io-managed scanners: Updates are applied automatically via rolling deployment. No action required.
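Because tags follow the {major}.{minor}.{patch}-{profile} format, an update check has to compare versions numerically, not as strings ("1.10.0" sorts before "1.9.0" as text). The Python sketch below shows one way to parse a tag and decide whether a newer release of the same profile exists; parse_tag and update_available are hypothetical names, and the profile pattern (lowercase letters and hyphens) is an assumption.

```python
import re
from typing import NamedTuple, Tuple

TAG = re.compile(r"(\d+)\.(\d+)\.(\d+)-([a-z][a-z-]*)")


class ScannerTag(NamedTuple):
    version: Tuple[int, int, int]  # compared as integers, so 1.10.0 > 1.9.0
    profile: str


def parse_tag(tag: str) -> ScannerTag:
    m = TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a pinned scanner tag: {tag!r}")  # e.g. "latest"
    major, minor, patch, profile = m.groups()
    return ScannerTag((int(major), int(minor), int(patch)), profile)


def update_available(running: str, newest: str) -> bool:
    """True when a newer release of the same profile exists."""
    cur, new = parse_tag(running), parse_tag(newest)
    return cur.profile == new.profile and new.version > cur.version


print(update_available("1.9.0-database", "1.10.0-database"))  # True
print(update_available("1.0.0-saas", "2.0.0-database"))       # False: different profile
```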

Safe Update Guarantee

Scanner updates never interrupt an active scan. The update process:

  1. No new jobs are assigned to the scanner during the update window
  2. The scanner finishes all in-progress scan jobs
  3. The new version is deployed
  4. Health is verified before resuming job assignment

This means you can update scanners at any time without worrying about data loss or interrupted scans.
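The safe-update sequence is a drain pattern: stop intake, wait for in-flight work to reach zero, deploy, and resume only after a health check passes. The Python sketch below models it with a threading.Condition. DrainingScanner and its methods are hypothetical, and the deploy and healthy callables stand in for whatever rollout mechanism the scanner actually uses.

```python
import threading
from typing import Callable


class DrainingScanner:
    """Hypothetical model of the safe-update sequence: stop intake, drain, deploy, verify, resume."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._accepting = True
        self._in_flight = 0

    def try_start_job(self) -> bool:
        """Claim a job slot; refused while an update is in progress."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def finish_job(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def safe_update(self, deploy: Callable[[], None], healthy: Callable[[], bool]) -> bool:
        with self._cond:
            self._accepting = False                            # no new jobs during the window
            self._cond.wait_for(lambda: self._in_flight == 0)  # in-progress scans finish first
        deploy()
        ok = healthy()
        with self._cond:
            self._accepting = ok  # resume assignment only if the new version is healthy
        return ok


events = []
scanner = DrainingScanner()
scanner.try_start_job()  # one scan is running when the update starts
threading.Timer(0.1, lambda: (events.append("scan finished"), scanner.finish_job())).start()
scanner.safe_update(deploy=lambda: events.append("deployed"), healthy=lambda: True)
print(events)  # ['scan finished', 'deployed']
```

If the health check fails, intake stays closed, so no job lands on a scanner whose new version has not been verified.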

Scan Completeness

Every scan result includes a completeness score (0.0–1.0) that quantifies how thoroughly the scan covered the target data:

  • 1.0 — All enumerated resources were scanned successfully
  • High — Most resources scanned, some skipped due to unsupported format or size
  • Medium — Significant resources skipped due to access issues or errors
  • 0.0 — Scan could not enumerate or access the target data

The completeness score is computed from actual per-resource outcomes, not estimates. Every resource is accounted for in the coverage report with one of these statuses:

| Status | Meaning |
| --- | --- |
| Scanned | Successfully processed through the detection pipeline |
| Skipped (format) | Binary or unsupported format, automatically excluded |
| Skipped (access denied) | Insufficient permissions to read the resource |
| Skipped (size) | Resource exceeds the maximum scannable size for this connector |
| Failed | Scan attempted but encountered an error |
| Cancelled | Scan was cancelled before this resource was reached |

A low completeness score indicates that the scan may have missed sensitive data. Review the coverage report breakdown for details on skipped resources and remediation steps.
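The page does not publish the completeness formula. One plausible reading, assumed here, is the fraction of enumerated resources whose status is Scanned, with every other status counting against coverage and an empty enumeration scoring 0.0. The Python sketch below implements that assumption using the statuses from the table.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    SCANNED = "Scanned"
    SKIPPED_FORMAT = "Skipped (format)"
    SKIPPED_ACCESS = "Skipped (access denied)"
    SKIPPED_SIZE = "Skipped (size)"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


def completeness(outcomes: Iterable[Outcome]) -> float:
    """Assumed formula: share of enumerated resources that were actually scanned."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # nothing could be enumerated or accessed
    return counts[Outcome.SCANNED] / total


report = [Outcome.SCANNED] * 95 + [Outcome.SKIPPED_FORMAT] * 3 + [Outcome.SKIPPED_ACCESS] * 2
print(completeness(report))  # 0.95
```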

Scan Decision Audit Trail

Every decision the platform makes during a scan is recorded in an immutable audit trail. This provides full transparency into why specific resources were included, skipped, or handled differently.

What Gets Logged

| Decision | Example |
| --- | --- |
| Resource included | Resource matched scan scope and format filters |
| Resource skipped (binary) | File classified as binary format, excluded per policy |
| Resource skipped (format) | Unsupported file format (e.g., video, executable) |
| Resource skipped (size) | File exceeds maximum scannable size |
| Resource skipped (access) | Insufficient permissions to read the resource |
| Budget exceeded | Scan reached its time or cost budget before processing this resource |
| Coverage target met | Scan reached its configured coverage target and stopped |
| Circuit isolation triggered | Connector type experiencing failures, paused to prevent cascade |
| Resource truncated | File partially scanned due to size limits |
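The page states that the decision trail is immutable but not how. Hash chaining is one common technique for making an append-only log tamper-evident, and the Python sketch below uses it as a swapped-in illustration, not as Slim.io's mechanism: each entry's SHA-256 covers the previous entry's hash, so altering an earlier entry without rewriting everything after it breaks verification. DecisionLog, its methods, and the field layout are hypothetical; by_decision mirrors the filter by decision type.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    timestamp: str   # ISO 8601
    resource: str
    decision: str    # e.g. "Resource skipped (size)"
    prev_hash: str
    entry_hash: str


def _digest(timestamp: str, resource: str, decision: str, prev_hash: str) -> str:
    payload = json.dumps([timestamp, resource, decision, prev_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


class DecisionLog:
    """Append-only log in which each entry's hash covers the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: List[Decision] = []

    def append(self, timestamp: str, resource: str, decision: str) -> Decision:
        prev = self._entries[-1].entry_hash if self._entries else self.GENESIS
        entry = Decision(timestamp, resource, decision, prev, _digest(timestamp, resource, decision, prev))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; an entry edited in place, reordered, or removed (except the last) fails."""
        prev = self.GENESIS
        for e in self._entries:
            if e.prev_hash != prev or e.entry_hash != _digest(e.timestamp, e.resource, e.decision, prev):
                return False
            prev = e.entry_hash
        return True

    def by_decision(self, decision: str) -> List[Decision]:
        """Filter by decision type, like the Scan Detail page filter."""
        return [e for e in self._entries if e.decision == decision]


trail = DecisionLog()
trail.append("2024-05-01T12:00:00Z", "s3://bucket/a.csv", "Resource included")
trail.append("2024-05-01T12:00:01Z", "s3://bucket/video.mp4", "Resource skipped (format)")
print(trail.verify(), len(trail.by_decision("Resource skipped (format)")))  # True 1
```

Deleting only the newest entry is not detectable from the chain alone; recording the latest hash somewhere outside the log closes that gap.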

Viewing the Audit Trail

The decision trail is accessible from:

  • Scan Detail page — Click any scan to see the full decision log with filters by decision type
  • Admin API — Query decisions programmatically for compliance reporting

The decision audit trail is designed for compliance and forensics. When an auditor asks “why wasn’t this bucket fully scanned?”, the trail provides a timestamped, per-resource answer.

Adding a Scanner

Click Add Scanner on the Scanner Fleet page to deploy a new scanner:

Slim.io Managed

Select “Slim.io manages it” — we provision and manage the scanner for you:

  1. Choose a scanner profile (cloud-storage, database, saas, or full)
  2. Click Deploy — the scanner is provisioned automatically
  3. The scanner appears in your fleet within ~30 seconds
  4. No infrastructure setup required on your end

Scanner count is managed by your plan. Contact your administrator if you need more scanners.

Self-Hosted (BYOC)

Select “I’ll deploy in my environment” — you run the scanner in your own VPC:

  1. Choose a scanner profile and deployment platform (Docker, Kubernetes, or Cloud Run)
  2. Copy the generated deployment configuration (includes a pre-embedded registration token)
  3. Run the deployment command in your environment
  4. The scanner auto-registers and appears in your fleet

For detailed setup, see BYOC Deployment.

Both hosting modes use the same scanner software and produce identical results. The only difference is where the compute runs. Choose self-hosted when data must remain within your network boundary.

Connector Health Alerts

Configure alerts per connector to stay informed about health issues:

| Alert | Default | Description |
| --- | --- | --- |
| Credential Expiring | ON (7 days before) | Warns before credentials expire so you can rotate proactively |
| Authentication Failed | ON (immediate) | Notifies when a connector fails to authenticate |
| Connector Offline | OFF | Alerts when a connector is unreachable. Enable as needed; some connectors go offline intentionally during maintenance. |
| Health Degraded | ON | Alerts when a connector's health score drops below the configured threshold |

Alert thresholds and notification channels (Slack, Teams, Google Chat) are configurable per connector in the connector settings panel.
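How the default alert settings combine can be sketched as a pure function over a connector's status. The defaults below mirror the table (Credential Expiring 7 days ahead, Connector Offline off). The health-score scale and the 0.8 threshold are hypothetical, because the page says thresholds are configurable but gives no default, and ConnectorStatus, AlertConfig, and alerts_for are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ConnectorStatus:
    name: str
    credential_expires_at: Optional[datetime]
    auth_failed: bool
    reachable: bool
    health_score: float  # assumed scale: 0.0 to 1.0


@dataclass
class AlertConfig:
    credential_expiring: bool = True
    expiry_warning: timedelta = timedelta(days=7)
    authentication_failed: bool = True
    connector_offline: bool = False  # OFF by default
    health_degraded: bool = True
    health_threshold: float = 0.8    # hypothetical; the page gives no default


def alerts_for(s: ConnectorStatus, cfg: AlertConfig, now: datetime) -> List[str]:
    fired: List[str] = []
    if (cfg.credential_expiring and s.credential_expires_at is not None
            and s.credential_expires_at - now <= cfg.expiry_warning):
        fired.append("Credential Expiring")
    if cfg.authentication_failed and s.auth_failed:
        fired.append("Authentication Failed")
    if cfg.connector_offline and not s.reachable:
        fired.append("Connector Offline")
    if cfg.health_degraded and s.health_score < cfg.health_threshold:
        fired.append("Health Degraded")
    return fired


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
status = ConnectorStatus("snowflake-prod", now + timedelta(days=3), False, False, 0.9)
print(alerts_for(status, AlertConfig(), now))  # ['Credential Expiring']
```

The connector in the usage is unreachable, but no offline alert fires until Connector Offline is enabled for it.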
