Smart Scanning Modes

Slim.io offers multiple scan modes and profiles that control how resources are discovered, prioritized, and analyzed. These options let you balance speed, cost, and coverage based on your use case — from a quick onboarding assessment to a full compliance audit.

Scan Modes

Every scan runs in one of four modes:

Mode	Behavior	Best For
Full	Scans every resource in the connector scope regardless of prior scan history	Initial baseline, periodic re-assessment
Incremental	Scans only resources that changed since the last scan, using ETags, version identifiers, and timestamps to detect changes	Ongoing monitoring with lower cost
Smart (default)	Combines incremental change detection with risk-based prioritization — high-risk resources are scanned first, unchanged low-risk resources are skipped	Day-to-day operations
Bootstrap	Progressive onboarding for large environments (see below)	First-time onboarding of petabyte-scale data stores

Smart Mode

Smart mode is the default for all scheduled scans. It works in two passes:

1. Change Detection: Identifies resources modified since the last completed scan using provider-native change signals (ETags, version IDs, modification timestamps).
2. Risk Prioritization: Ranks changed resources by risk score. Publicly exposed resources, resources with prior findings, and resources in high-sensitivity zones are scanned first.

Unchanged resources with no prior findings are skipped entirely. Unchanged resources that previously contained findings are periodically re-verified on a configurable cadence.
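
The two passes and the skip rules above can be sketched as follows. This is a minimal illustration, not Slim.io's implementation: the `Resource` fields, the `plan_smart_scan` function, and the `reverify_due` set are hypothetical names, and a single ETag stands in for whichever change signal the provider exposes.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    etag: str                 # stand-in for any provider-native change signal
    risk_score: float         # higher means riskier (hypothetical scale)
    had_findings: bool = False


def plan_smart_scan(resources, last_etags, reverify_due=frozenset()):
    """Return the names of resources to scan, highest risk first."""
    selected = []
    for r in resources:
        # Pass 1: a resource counts as changed if its ETag differs from
        # the one recorded at the last completed scan (or is unknown).
        changed = last_etags.get(r.name) != r.etag
        # Unchanged resources are kept only when they had prior findings
        # and their re-verification is due; everything else is skipped.
        if changed or (r.had_findings and r.name in reverify_due):
            selected.append(r)
    # Pass 2: order the selected resources by risk score, descending.
    selected.sort(key=lambda r: r.risk_score, reverse=True)
    return [r.name for r in selected]


resources = [
    Resource("logs/app.log", "e1", 0.2),
    Resource("public/export.csv", "e9", 0.9),
    Resource("hr/payroll.csv", "e3", 0.7, had_findings=True),
    Resource("tmp/cache.bin", "e4", 0.1),
]
last_etags = {
    "logs/app.log": "e1",
    "public/export.csv": "e8",
    "hr/payroll.csv": "e3",
    "tmp/cache.bin": "e0",
}
plan = plan_smart_scan(resources, last_etags, reverify_due={"hr/payroll.csv"})
```

Here `logs/app.log` is unchanged with no prior findings, so it is left out of the plan, while the unchanged `hr/payroll.csv` is included because it had findings and is due for re-verification.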

Bootstrap Mode

Bootstrap mode is designed for onboarding large environments where a full scan would take days or weeks. It runs in four progressive stages:

1. Metadata Scan: Enumerates all resources and collects metadata (names, types, sizes, access policies). No content is read.
2. Risk Ranking: Uses metadata signals to assign preliminary risk scores. Publicly accessible resources and resources with PII-indicative names are ranked highest.
3. Priority Sampling: Runs full content analysis on the resources with the highest preliminary risk scores.
4. Public Exposure Sweep: Scans all publicly exposed resources regardless of risk score.

After bootstrap completes, you have a prioritized findings report covering your highest-risk data. Subsequent scans can run in Smart mode to incrementally expand coverage.

Bootstrap mode typically delivers actionable findings within hours, even on environments with millions of resources. You do not need to wait for a full scan to see results.
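
As a rough illustration of how stages 2 through 4 could fit together, the sketch below ranks resources from metadata alone and then derives the priority sample and the public exposure sweep. Everything here is an assumption for illustration: the `Meta` record, the `PII_HINTS` keyword list, the scoring weights, and the `sample_size` parameter are hypothetical and do not describe Slim.io's actual scoring.

```python
from dataclasses import dataclass

# Hypothetical name fragments that suggest PII; the real signals are not documented here.
PII_HINTS = ("ssn", "passport", "payroll", "customer", "email")


@dataclass
class Meta:
    name: str
    size_bytes: int
    public: bool


def preliminary_risk(m: Meta) -> int:
    """Score a resource from metadata only; no content is read."""
    score = 0
    if m.public:
        score += 6  # assumed weight for public accessibility
    if any(hint in m.name.lower() for hint in PII_HINTS):
        score += 4  # assumed weight for a PII-indicative name
    return score


def bootstrap_plan(metas, sample_size):
    """Return (priority sample, public exposure sweep) as lists of names."""
    ranked = sorted(metas, key=preliminary_risk, reverse=True)   # stage 2
    priority = [m.name for m in ranked[:sample_size]]            # stage 3
    # Stage 4: every public resource not already covered by stage 3.
    sweep = [m.name for m in metas if m.public and m.name not in priority]
    return priority, sweep


metas = [
    Meta("archive/2019.zip", 10, False),
    Meta("public/site.css", 5, True),
    Meta("hr/payroll_2024.csv", 20, False),
    Meta("public/customer_export.csv", 30, True),
]
priority, sweep = bootstrap_plan(metas, sample_size=1)
```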

Scan Profiles

Scan profiles control the depth of content analysis and the load placed on your infrastructure:

Profile	Behavior	Best For
Light	Minimal parallelism, smaller samples. Lowest load on your data sources.	Production databases under active load
Standard (default)	Balanced parallelism and sampling. Good coverage without overloading infrastructure.	General-purpose scanning
Deep	Higher parallelism, larger samples, all classifier tiers engaged. Most thorough.	Staging environments, compliance audits, dedicated scan infrastructure

You can combine any scan mode with any scan profile. For example, an Incremental + Light scan gives you rapid change detection with minimal infrastructure impact, while a Full + Deep scan provides maximum coverage for a compliance audit.

Adaptive Sampling

For database connectors, Slim.io automatically adjusts sample sizes based on table characteristics:

Small tables (under 100,000 rows): All rows are sampled.
Medium tables (100,000 to 10 million rows): A statistically representative sample is drawn, sized to detect PII types that appear in a meaningful percentage of rows.
Large tables (over 10 million rows): Column-level sampling is used. Representative rows are sampled per column, and the column is classified as a whole. A 1-billion-row table does not need every row scanned, because representative sampling reveals the same PII types with high confidence.

Sample sizes are configurable per scan. The default is optimized for accuracy while keeping scan duration practical.

Adaptive sampling applies to all supported database connectors including PostgreSQL, MySQL, Snowflake, Databricks, Oracle, SQL Server, and DB2.
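
The size thresholds can be expressed as a small selection function, and a standard probability argument shows why a modest sample is enough on very large tables. This is a sketch under stated assumptions: the strategy names and both functions are hypothetical, and the sample-size formula assumes independent uniform random sampling, which may differ from how Slim.io actually draws samples.

```python
import math

SMALL_MAX_ROWS = 100_000       # "under 100,000 rows"
LARGE_MIN_ROWS = 10_000_000    # "over 10 million rows"


def sampling_strategy(row_count: int) -> str:
    """Pick a sampling strategy from the table's row count."""
    if row_count < SMALL_MAX_ROWS:
        return "all_rows"
    if row_count > LARGE_MIN_ROWS:
        return "column_level_sample"
    return "representative_sample"


def rows_needed(prevalence: float, confidence: float) -> int:
    """Rows to sample so that a PII type present in `prevalence` of rows
    is seen at least once with probability `confidence`.

    With n independent draws the chance of missing it is
    (1 - prevalence) ** n, so we solve (1 - prevalence) ** n <= 1 - confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))
```

Under these assumptions, `rows_needed(0.01, 0.99)` is 459: a PII type present in 1% of rows is very likely to appear in a sample of a few hundred rows, regardless of whether the table holds ten million rows or a billion.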

Budget Controls

Scans can be configured with budgets that cap resource consumption. When any budget threshold is reached, the scan stops gracefully and reports partial results.

Budget	Default	Description
Max Time	24 hours	Maximum wall-clock duration for the scan
Max Resources	Unlimited	Maximum number of resources to scan
Max Bytes	Unlimited	Maximum total bytes processed

When a budget is hit:

In-progress file processing completes (no data is discarded mid-analysis)
Remaining unprocessed resources are marked as “skipped — budget exceeded”
The scan completes with status Completed (Partial) and full coverage metrics

Budget controls are safety nets, not precision instruments. A scan may slightly exceed a budget threshold because in-flight work is allowed to finish. Plan budgets with a reasonable margin.
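
The overshoot described above follows naturally if budgets are checked between resources rather than during processing. The loop below is a minimal sketch of that behavior, not the product's scheduler: `run_with_budget`, its parameters, and the plain `Completed` status for an uninterrupted run are assumptions made for illustration.

```python
import time


def run_with_budget(resources, scan_one, max_seconds=86400,
                    max_resources=None, max_bytes=None):
    """Scan (name, size_bytes) pairs until any budget threshold is reached."""
    start = time.monotonic()
    scanned, skipped, total_bytes = [], [], 0
    for i, (name, size) in enumerate(resources):
        # Budgets are checked before starting a resource, never mid-resource.
        over_budget = (
            time.monotonic() - start >= max_seconds
            or (max_resources is not None and len(scanned) >= max_resources)
            or (max_bytes is not None and total_bytes >= max_bytes)
        )
        if over_budget:
            # Everything not yet started is marked as skipped for budget.
            skipped = [n for n, _ in resources[i:]]
            break
        scan_one(name)          # in-flight work always runs to completion
        scanned.append(name)
        total_bytes += size
    return {
        "status": "Completed (Partial)" if skipped else "Completed",
        "scanned": scanned,
        "skipped_budget": skipped,
        "bytes_scanned": total_bytes,
    }


result = run_with_budget(
    [("a", 40), ("b", 40), ("c", 40), ("d", 40)],
    scan_one=lambda name: None,
    max_bytes=100,
)
```

In this example the third resource starts while the byte total is still 80, so the scan ends at 120 bytes against a budget of 100, which is the kind of margin to plan for.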

Coverage Reports

Every completed scan produces a coverage report that shows exactly what was processed and what was not:

Metric	Description
Total Enumerated	Resources discovered during enumeration
Scanned	Resources that received content analysis
Skipped (Unchanged)	Resources unchanged since the last scan (incremental/smart modes)
Skipped (Unsupported)	Resources in formats not supported by the detection pipeline
Skipped (Budget)	Resources not reached before a budget threshold was hit
Skipped (Access Denied)	Resources the connector credentials could not access
Bytes Scanned	Total data volume processed
Coverage Percentage	Scanned / Total Enumerated, expressed as a percentage

Coverage reports are available in the dashboard and via the API. They are retained for the lifetime of the scan record, so you can compare coverage trends across scan runs.
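
To make the relationship between the metrics concrete, the sketch below derives the coverage percentage from the individual counts. It assumes every enumerated resource lands in exactly one of the scanned or skipped buckets; that assumption, the `coverage_report` function, and its key names are illustrative and not part of the documented API.

```python
def coverage_report(scanned, skipped_unchanged, skipped_unsupported,
                    skipped_budget, skipped_access_denied):
    """Combine per-bucket counts into totals and a coverage percentage."""
    total = (scanned + skipped_unchanged + skipped_unsupported
             + skipped_budget + skipped_access_denied)
    # Guard against an empty scope to avoid dividing by zero.
    pct = 100.0 * scanned / total if total else 0.0
    return {
        "total_enumerated": total,
        "scanned": scanned,
        "coverage_percentage": round(pct, 1),
    }


report = coverage_report(
    scanned=600,
    skipped_unchanged=300,
    skipped_unsupported=50,
    skipped_budget=30,
    skipped_access_denied=20,
)
```

With these counts the report shows 1,000 enumerated resources and 60.0% coverage. Note that in incremental and smart modes a low percentage can simply mean most resources were unchanged.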

Confidence Tiers

Slim.io uses a four-tier detection model. Each tier adds precision at increasing computational cost. Findings are classified by the highest tier that confirmed them.

Tier	Name	Method	Confidence	Speed
1	Pattern Matching	Fast regex-based detection of known PII formats	Low–Medium	Fastest
2	Validated	Patterns confirmed by structural checks (Luhn algorithm for credit cards, format validation for SSNs, IBAN check digits)	Medium	Fast
3	Contextual	Column names, field labels, and surrounding text provide additional signal that boosts or reduces confidence	Medium–High	Moderate
4	AI-Assisted	Borderline findings are reviewed by the AI pipeline for disambiguation using semantic understanding	High	Slowest

Most findings are resolved at Tier 1 or Tier 2. The AI-Assisted tier is only invoked for ambiguous cases where lower tiers produce inconclusive results, keeping costs low while maintaining high accuracy.
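
As an example of the kind of structural check Tier 2 refers to, the Luhn algorithm can reject most digit strings that merely look like card numbers. The function below is a generic implementation of the published algorithm; the name `luhn_valid` is illustrative, and nothing here is taken from Slim.io's code.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

A Tier 1 regex would match both `4111 1111 1111 1111` and `4111 1111 1111 1112`, but only the first passes this check.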

How Tiers Interact

Tiers run in sequence. A finding starts at Tier 1 and is promoted to higher tiers only if additional validation is needed:


Pattern Match (Tier 1)
  → Structural Validation (Tier 2)
    → Contextual Analysis (Tier 3)
      → AI Review (Tier 4, if still ambiguous)

If a finding reaches high confidence at any tier, it is finalized without progressing further. This means the majority of findings never reach the more expensive tiers.
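
The early-exit behavior can be sketched as a loop over tier functions that stops as soon as confidence crosses a threshold. This is an illustration only: the `classify` function, the numeric confidence scale, and the 0.9 threshold are assumptions, since the documentation describes confidence in qualitative terms.

```python
HIGH_CONFIDENCE = 0.9  # hypothetical cutoff for "high confidence"


def classify(candidate, tiers, threshold=HIGH_CONFIDENCE):
    """Run tiers in order; return (highest tier reached, final confidence)."""
    confidence = 0.0
    tier_reached = 0
    for number, check in enumerate(tiers, start=1):
        # Each tier sees the candidate and the confidence so far,
        # and returns an updated confidence.
        confidence = check(candidate, confidence)
        tier_reached = number
        if confidence >= threshold:
            break  # finalized; more expensive tiers are never invoked
    return tier_reached, confidence


calls = []


def pattern(candidate, conf):
    calls.append("pattern")
    return 0.5


def validated(candidate, conf):
    calls.append("validated")
    return 0.95


def contextual(candidate, conf):
    calls.append("contextual")
    return 0.97


outcome = classify("4111 1111 1111 1111", [pattern, validated, contextual])
```

In this run the finding is finalized at the second tier, so `contextual` is never called, which is how the expensive tiers stay out of the common path.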

Configuring Scans

Scan mode, profile, and budget controls are set when creating a scan via the dashboard or API:


POST /api/v1/scans
{
  "connector_id": "conn_abc123",
  "mode": "smart",
  "profile": "standard",
  "budget": {
    "max_time_seconds": 86400,
    "max_resources": null,
    "max_bytes": null
  }
}

The mode, profile, and budget parameters are optional; omitted values use the defaults shown in the tables above.
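
A client might assemble the request body like this, filling in the documented defaults for anything not supplied. Treat it as a sketch: `build_scan_request` is a hypothetical helper, and the lowercase values other than `"smart"` and `"standard"` (for example `"full"` or `"deep"`) are assumed by analogy with the example request rather than confirmed by it.

```python
import json

# "smart" and "standard" appear in the example request; the rest are assumed spellings.
VALID_MODES = {"full", "incremental", "smart", "bootstrap"}
VALID_PROFILES = {"light", "standard", "deep"}


def build_scan_request(connector_id, mode="smart", profile="standard",
                       max_time_seconds=86400, max_resources=None,
                       max_bytes=None):
    """Return the JSON body for POST /api/v1/scans as a string."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    body = {
        "connector_id": connector_id,
        "mode": mode,
        "profile": profile,
        "budget": {
            "max_time_seconds": max_time_seconds,
            "max_resources": max_resources,   # None serializes to null (unlimited)
            "max_bytes": max_bytes,
        },
    }
    return json.dumps(body, indent=2)


default_body = build_scan_request("conn_abc123")
audit_body = build_scan_request("conn_abc123", mode="full", profile="deep")
```

Calling the helper with only a connector ID reproduces the example request above, and the resulting string can be sent with any HTTP client.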