Scan Management

The Scans page is the operations center for all scanning activity. It provides a real-time view of active scans, a queue of pending jobs, full scan history, and controls to start, pause, resume, stop, and cancel scans.

Starting a New Scan

Navigate to Scans in the Customer Dashboard sidebar.
Click New Scan to open the scan configuration dialog.
Configure the scan parameters:

Connector Selection

Select the data source to scan from the dropdown. Only connectors with Active status are available. If no connectors are active, you will need to set one up first under Connectors.

Scan Types

Type	Description	When to Use
Full Discovery	Scans all resources from scratch, ignoring previous scan state	Initial baseline, periodic full re-scan, after major data migration
Incremental	Only scans resources that are new or modified since the last completed scan	Daily or weekly ongoing monitoring with lower cost
Targeted	Scans a specific path or prefix within the connector scope	Investigating a known directory, post-incident verification

For Targeted scans, an additional text field appears where you enter the path or prefix to scan (e.g., data/exports/2026/).

Classifier Template

Select a classifier template to control which classifiers run during the scan. If a default template exists, it is pre-selected automatically.

Template selected — Only classifiers in that template are active for this scan
None selected — All globally enabled classifiers run

See Classifiers for details on managing classifier sets.

Scan Profile

Control how intensively the scanner interacts with your data source:

Profile	Behavior	When to Use
Light	Minimal parallelism, smaller samples. Lowest load on your infrastructure.	Production databases under active load
Standard	Balanced parallelism and sampling. Default for most environments.	General-purpose scanning
Deep	Higher parallelism, larger samples. More thorough but higher resource usage.	Staging environments, dedicated scan infrastructure

Scan Profile controls how much load the scanner places on YOUR infrastructure, not ours. Light is recommended for production databases that serve live traffic.

Click Start Scan to submit the job.

Scan Lifecycle

Every scan progresses through defined states:


Created → Queued → Running → Completed
                     ↕           ↓
                   Paused     Stopped
                     ↓
                  Cancelled

Status	Description
Created	Scan job initialized, configuration validated
Queued	Waiting for available workers within tier limits
Running	Workers actively processing files through the detection pipeline
Paused	Scan halted at a checkpoint, can be resumed
Completed	All resources scanned and the configured coverage target was met
Partial	Scan finished with useful findings but did not meet its coverage target, or some resources errored. All findings collected before the partial transition are preserved. See Partial Scan Outcomes below.
Stopped	Manually stopped by user, partial results preserved
Cancelled	Manually cancelled by user; findings collected before cancellation are preserved (see Scan Controls)
Failed	Unrecoverable error (credential issue, infrastructure failure, or zero resources scanned)

Completed vs. Partial: A scan is Completed only when it meets its configured coverage target with no resource failures. If the scan ran successfully but fell short of that target (for example, 60% coverage against a 70% target), or some resources errored mid-scan, it transitions to Partial, not Failed. Partial scans are still useful and their findings are first-class. The distinction tells you whether you can trust the absence of findings as well as their presence.
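
The Completed, Partial, and Failed rules can be expressed as a short decision function. The following is a minimal sketch in Python, not the platform's implementation: the `ScanOutcome` fields and the `terminal_status` helper are hypothetical names, and treating "meets the target" as greater-than-or-equal is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScanOutcome:
    # Illustrative field names, not the platform's schema.
    coverage_pct: float      # actual coverage, 0-100
    coverage_target: float   # configured coverage target, 0-100
    resources_scanned: int
    resources_failed: int


def terminal_status(outcome: ScanOutcome) -> str:
    """Map a finished scan to Completed, Partial, or Failed."""
    if outcome.resources_scanned == 0:
        # Zero resources scanned is one of the documented Failed conditions.
        return "Failed"
    # Assumption: reaching the target exactly counts as meeting it.
    target_met = outcome.coverage_pct >= outcome.coverage_target
    if target_met and outcome.resources_failed == 0:
        return "Completed"
    # Findings exist, but coverage fell short or some resources errored.
    return "Partial"


print(terminal_status(ScanOutcome(95.0, 70.0, 1900, 0)))  # Completed
print(terminal_status(ScanOutcome(60.0, 70.0, 1200, 0)))  # Partial
print(terminal_status(ScanOutcome(95.0, 70.0, 1900, 3)))  # Partial
print(terminal_status(ScanOutcome(0.0, 70.0, 0, 0)))      # Failed
```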

Scan Controls

Active and paused scans expose four control actions:

Pause

Pauses a running scan at the current checkpoint. Workers finish processing their current file and then idle. The scan state is saved so it can be resumed later.

Available when scan status is Running
The scan transitions to Paused status
Polling interval slows to 30 seconds while paused

Resume

Resumes a paused scan from its saved checkpoint. Workers pick up where they left off.

Available when scan status is Paused
The scan transitions back to Running status
Polling interval returns to the active rate

Stop

Gracefully stops a scan and saves all partial results. Findings collected up to the point of stopping are preserved in the Data Catalog. The scan is marked as Stopped in history.

Available when scan status is Running or Paused
Partial findings are retained and visible in the Data Catalog
The scan cannot be resumed after stopping

Cancel

Cancels a running or paused scan. Workers finish processing their current resource, persist all findings discovered so far, and exit cleanly. The scan transitions to Cancelled status with an accurate coverage report.

Available when scan status is Running or Paused
Findings are preserved — all PII detected before cancellation is saved and visible in the Data Catalog
The coverage report shows exactly which resources were scanned and which were not yet reached
The scan cannot be resumed after cancelling

Stop vs. Cancel: Stop and Cancel both preserve findings. The difference is intent: Stop means “I have enough data, wrap up.” Cancel means “abort this scan.” Both produce an accurate coverage report showing what was and was not scanned.
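
The availability rules for the four controls can be summarized as a small lookup table. This is a minimal sketch in Python; `ALLOWED_ACTIONS`, `RESULTING_STATUS`, and `apply_action` are illustrative names and not part of the product.

```python
# Which control actions each status allows, per the rules listed above.
ALLOWED_ACTIONS = {
    "Running": {"pause", "stop", "cancel"},
    "Paused": {"resume", "stop", "cancel"},
}

# The status each action leads to.
RESULTING_STATUS = {
    "pause": "Paused",
    "resume": "Running",
    "stop": "Stopped",
    "cancel": "Cancelled",
}


def apply_action(status: str, action: str) -> str:
    """Return the new status, or raise if the action is unavailable."""
    if action not in ALLOWED_ACTIONS.get(status, set()):
        raise ValueError(f"'{action}' is not available while a scan is {status}")
    return RESULTING_STATUS[action]


print(apply_action("Running", "pause"))  # Paused
print(apply_action("Paused", "resume"))  # Running
```

Because Stopped and Cancelled have no entry in `ALLOWED_ACTIONS`, any action on them raises, which mirrors the rule that neither can be resumed.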

Pre-Scan Cost Estimate

Before a scan starts, the platform estimates the compute cost based on the number of resources, total data volume, and connector type. This estimate is displayed in the scan confirmation dialog.

If the estimated cost exceeds your configured ceiling, the scan is blocked with a clear message explaining the estimate and the ceiling. You can:

Reduce scope — Narrow the scan to a specific prefix or path
Switch to Light profile — Lower parallelism reduces cost
Increase the ceiling — Ask your administrator to raise the per-scan cost ceiling in Settings
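
The ceiling check itself is a simple comparison. The sketch below is in Python and is only an illustration: how the platform computes the estimate is not described here, so `check_cost_ceiling` takes the estimate as an input, and the function name and return shape are assumptions.

```python
def check_cost_ceiling(estimated_cost: float, ceiling: float) -> dict:
    """Decide whether a scan may start, given an estimate and a per-scan ceiling."""
    if estimated_cost <= ceiling:
        return {"allowed": True, "message": "Estimate is within the ceiling."}
    return {
        "allowed": False,
        "message": (
            f"Estimated cost {estimated_cost:.2f} exceeds "
            f"the ceiling {ceiling:.2f}."
        ),
        # The three remedies listed above.
        "options": [
            "Reduce scope",
            "Switch to Light profile",
            "Increase the ceiling",
        ],
    }


result = check_cost_ceiling(estimated_cost=120.0, ceiling=100.0)
print(result["allowed"])  # False
print(result["message"])  # Estimated cost 120.00 exceeds the ceiling 100.00.
```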

Scan Monitoring

KPI Summary

The top of the Scans page displays four key metrics:

Total Scans — Lifetime count of all scans
Active — Currently running scans
Queued — Scans waiting for available workers
Avg Duration — Average scan completion time

Active Operations

The Active Operations panel shows all currently running scans with:

Connector name and provider — Which data source is being scanned
Phase indicators — Visual progress through Discovering, Scanning, Analyzing, Complete
Progress bar — Animated gradient bar showing percentage complete
Current stage — Text description of the current pipeline phase
PII Findings — Real-time count of findings detected so far
ETA — Estimated time remaining

Concurrency Queue

Scans that are waiting for available workers appear in the queue panel with their queue position number. Scans execute in FIFO order within tier concurrency limits.
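
FIFO admission under a concurrency limit can be modeled with a queue and a running set. The following Python sketch is a simplified model, not the scheduler itself; the `ScanQueue` class and the scan IDs are hypothetical.

```python
from collections import deque
from typing import Optional


class ScanQueue:
    """FIFO queue that admits scans up to a fixed concurrency limit."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.running = set()
        self.queued = deque()

    def submit(self, scan_id: str) -> None:
        self.queued.append(scan_id)
        self._admit()

    def finish(self, scan_id: str) -> None:
        self.running.discard(scan_id)
        self._admit()

    def position(self, scan_id: str) -> Optional[int]:
        """1-based queue position, or None if the scan is not waiting."""
        try:
            return self.queued.index(scan_id) + 1
        except ValueError:
            return None

    def _admit(self) -> None:
        # Start the oldest waiting scans while capacity remains.
        while self.queued and len(self.running) < self.max_concurrent:
            self.running.add(self.queued.popleft())


q = ScanQueue(max_concurrent=2)
for scan_id in ("scan_a", "scan_b", "scan_c"):
    q.submit(scan_id)
print(q.position("scan_c"))  # 1
q.finish("scan_a")
print(sorted(q.running))     # ['scan_b', 'scan_c']
```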

Scan History

The Scan History table shows all past scans with:

Column	Description
Connector	Name and provider of the scanned data source
Status	Color-coded badge (completed, partial, failed, cancelled, stopped, paused)
Items Scanned	Total files or resources processed
PII Findings	Number of sensitive data matches detected
Duration	How long the scan took
Started At	Relative timestamp (e.g., “2h ago”)

Status badges use the following color coding:

Status	Color
Completed	Green
Partial	Amber
Running	Blue (pulsing dot)
Paused	Yellow
Stopped	Orange
Failed	Red
Cancelled	Grey

Scan Capacity

Scan capacity is provisioned per-tenant based on your subscription agreement. Your specific limits — concurrent scans, file counts, and worker parallelism — are configured by your customer success representative during onboarding and visible in Settings → Scanner → Limits in the Customer Dashboard.

If a scan is stuck at Queued status, you may have reached your tenant’s concurrent-scan capacity. Check Settings → Scanner → Limits to see your current usage, or contact your customer success representative if you need additional capacity.

Scan API

Scans can also be triggered and managed via the API:


# Start a scan
POST /api/v1/connectors/:connector_id/scan
{
  "scan_type": "full",
  "template_id": "tmpl_abc123"
}
 
# Pause a scan
POST /api/v1/scans/:scan_id/pause
 
# Resume a scan
POST /api/v1/scans/:scan_id/resume
 
# Stop a scan (preserves results)
POST /api/v1/scans/:scan_id/stop
 
# Cancel a scan (findings collected so far are preserved)
POST /api/v1/scans/:scan_id/cancel
 
# Get scan history
GET /api/v1/scans/history
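
As an illustration, the endpoints above could be wrapped in a few helper functions. This Python sketch only constructs the requests and does not send them. The host in `BASE_URL`, the `conn_1` connector ID, and the helper names are placeholders, and authentication is omitted because it is not described in this section.

```python
import json
import urllib.request

BASE_URL = "https://dashboard.example.invalid"  # placeholder host (assumption)


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for the scan API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def start_scan(connector_id, scan_type="full", template_id=None):
    body = {"scan_type": scan_type}
    if template_id is not None:
        body["template_id"] = template_id
    return build_request("POST", f"/api/v1/connectors/{connector_id}/scan", body)


def control_scan(scan_id, action):
    """action is one of: pause, resume, stop, cancel."""
    if action not in {"pause", "resume", "stop", "cancel"}:
        raise ValueError(f"unknown action: {action}")
    return build_request("POST", f"/api/v1/scans/{scan_id}/{action}")


def scan_history():
    return build_request("GET", "/api/v1/scans/history")


req = start_scan("conn_1", template_id="tmpl_abc123")
print(req.get_method(), req.full_url)
# To send a request: urllib.request.urlopen(req), once BASE_URL and
# credentials are set for your environment.
```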

Scan Scheduling

Schedule recurring scans on any connector:

Navigate to the connector detail page.
Open the Schedule tab.
Configure:

Setting	Description
Frequency	Daily, weekdays only, weekly, biweekly, monthly, or custom cron
Time	Date and time displayed in YOUR local timezone — no UTC conversion needed
Scan Profile	Light, Standard, or Deep (same as manual scans)
Scan Type	Full Discovery, Incremental, or Targeted
Max Duration	Optional time limit — if the scan doesn’t complete, it checkpoints and resumes on the next scheduled run

All scheduled times are displayed and stored in your local timezone. If you schedule a scan for “2:00 AM EST”, it runs at 2:00 AM Eastern regardless of where slim.io’s servers are located.

Resume Behavior

If a scheduled scan doesn’t complete within its time window:

The scan checkpoints at the last completed resource
On the next scheduled run, it resumes from the checkpoint (not from scratch)
If the checkpoint is older than the configured gap limit (default: 48 hours), the scan restarts fresh to avoid stale coverage
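
The resume-or-restart decision comes down to comparing the checkpoint age with the gap limit. A minimal Python sketch follows, assuming a checkpoint timestamp is available; `next_run_mode` and its return values are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GAP_LIMIT = timedelta(hours=48)  # documented default gap limit


def next_run_mode(checkpoint_at, now, gap_limit=DEFAULT_GAP_LIMIT):
    """Return 'resume' to continue from the checkpoint, or 'fresh' to restart."""
    if checkpoint_at is None:
        return "fresh"  # nothing to resume from
    if now - checkpoint_at > gap_limit:
        return "fresh"  # checkpoint is too old, coverage would be stale
    return "resume"


now = datetime(2026, 1, 10, 2, 0, tzinfo=timezone.utc)
print(next_run_mode(now - timedelta(hours=24), now))  # resume
print(next_run_mode(now - timedelta(hours=72), now))  # fresh
```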

Scan Results

Every scan (running or terminal) includes a structured coverage report so you can audit exactly how much of the target data was covered, what was skipped, and why:

Field	Description
Completeness Score	0-100%. A derived signal combining coverage percentage, enumeration status, and resource failures. Green (above 90%), yellow (above 70% and up to 90%), red (70% or below)
Coverage Target	The coverage threshold the scan was configured to meet (e.g., 70%)
Coverage Strategy	`Best effort` (continue past target if there is time) or `Strict` (mark Partial if target not met)
Coverage Percentage	Actual fraction of enumerated resources that were scanned
Target Met	Whether the scan met its configured coverage target
Enumeration Status	Whether resource discovery finished — `complete`, `in progress`, `partial`, or `failed`
Resources Enumerated	Total resources discovered by the connector
Resources Scanned	Resources actually processed by the detection pipeline
Resources Skipped	Resources excluded by policy (binary content, unsupported format, size limits)
Resources Failed	Resources that errored during scanning (permissions, network, parse errors)
Clean Confidence	For zero-finding scans: HIGH (thorough scan, no PII found), MEDIUM (partial coverage), LOW (significant gaps), NONE (couldn’t access data)
Cost Estimate	Estimated infrastructure cost for this scan
Detection Method	How findings were detected (column sampling, full scan, API pagination)

A completeness score of 70% or below means the scan may have missed sensitive data. Check the Resources Skipped and Resources Failed counts to understand what was missed and why, and review the Enumeration Status to see whether the connector even finished discovering resources.
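
The color bands for the Completeness Score translate directly into threshold checks. This is a small Python sketch of that mapping; the function name is illustrative, and how the score itself is derived is not covered here.

```python
def completeness_color(score: float) -> str:
    """Map a 0-100 completeness score to its badge color."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 90:
        return "green"
    if score > 70:
        return "yellow"
    return "red"  # 70 or below


for s in (95, 90, 70.5, 70):
    print(s, completeness_color(s))
# 95 green
# 90 yellow
# 70.5 yellow
# 70 red
```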

Partial Scan Outcomes

A scan transitions to Partial status (not Failed) when it produced useful findings but did not meet its coverage target, or some resources errored mid-scan. The `partial_reason` field on the scan record explains exactly why:

Reason	Meaning
`COVERAGE_BELOW_TARGET`	The scan finished without meeting its configured coverage target (e.g., target 70%, actual 45%)
`RESOURCE_FAILURES`	One or more resources errored during scanning (permissions, network, parse errors)
`TIME_EXCEEDED`	The scan hit its time budget before reaching the coverage target
`FINDINGS_LIMIT`	The scan hit the maximum findings limit configured for the job
`CANCELLED_BY_USER`	The user cancelled the scan after some findings had been collected
`CANCELLED_BY_SYSTEM`	The system cancelled the scan (capacity limits, deadline exceeded)

Partial scans preserve every finding collected before the partial transition. They are functionally equivalent to Completed scans for the resources that were scanned — only the coverage was incomplete. Use the coverage report to understand whether to re-scan with a higher target, a longer time budget, or different scope.

Severity Scoring

Every finding is assigned a severity level based on the combination of PII type and detection confidence:

Severity	Criteria	Examples
Critical	High-risk PII detected with high confidence	Social Security numbers, credit card numbers, bank account numbers
High	High-risk PII with medium confidence, or medium-risk PII with high confidence	Passport numbers, driver’s license numbers, medical record numbers
Medium	Medium-risk PII with medium confidence, or low-risk PII with high confidence	Email addresses, phone numbers, physical addresses
Low	Low-risk PII or any PII type detected with lower confidence	IP addresses, usernames, generic identifiers

Severity levels are visible throughout the dashboard — in the Data Catalog, Investigation view, Command Center cards, and scan history. Use severity filters to prioritize remediation of the most sensitive exposures first.
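
The severity table can be read as a lookup on two inputs, risk tier and confidence tier. The Python sketch below encodes that table. The tier labels (`high`, `medium`, `low`) and the `severity` function are assumptions for illustration; the platform's actual confidence thresholds are not specified here.

```python
TIERS = {"high", "medium", "low"}

# (risk tier, confidence tier) -> severity; anything not listed is Low.
SEVERITY = {
    ("high", "high"): "Critical",
    ("high", "medium"): "High",
    ("medium", "high"): "High",
    ("medium", "medium"): "Medium",
    ("low", "high"): "Medium",
}


def severity(risk: str, confidence: str) -> str:
    """Return the severity for a finding's risk and confidence tiers."""
    if risk not in TIERS or confidence not in TIERS:
        raise ValueError("risk and confidence must be high, medium, or low")
    return SEVERITY.get((risk, confidence), "Low")


print(severity("high", "high"))    # Critical
print(severity("medium", "high"))  # High
print(severity("low", "high"))     # Medium
print(severity("high", "low"))     # Low
```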

Finding Lifecycle

Findings progress through a defined lifecycle as your team triages and remediates them:

State	Description
Active	Newly detected finding, awaiting triage. Default state for all new detections.
Confirmed	Team has verified this is real PII that requires action.
Resolved	The underlying data exposure has been remediated (data removed, encrypted, or access restricted). Resolved findings are retained in history for audit purposes.
Suppressed	Marked as an accepted risk or known false positive. Suppressed findings are excluded from active counts and Command Center cards but remain searchable.

To change a finding’s state, select one or more findings in the Data Catalog or Investigation view and use the Actions menu. Bulk state changes are supported for efficient triage workflows.

Resolved and suppressed findings are not permanently deleted. They remain in your scan history for compliance and audit purposes, and can be reactivated if the same PII is detected again in a future scan.

Per-Object Findings

Every scan result includes a per-object breakdown showing exactly which files, tables, or resources contain sensitive data:

Field	Description
Object Path	Full path to the file, table, or resource
File Type	Detected format — CSV, JSON, Parquet, Excel, PDF, Avro, ORC, or text
Findings Count	Total PII detections in this specific object
Findings by Type	Breakdown by PII category (e.g., 12 SSN, 20 email, 8 credit card)
Classification	Highest sensitivity level found — Public, Internal, Confidential, or Restricted

Click any object to drill down into its individual findings with:

PII type filter — Show only SSNs, only emails, etc.
Confidence scores — How certain the detection is
Detection method — Column sampling, full scan, or API pagination
Masked evidence — Redacted preview (e.g., ****-**-6789) confirming the match

Object IDs are stable across re-scans. The same file scanned twice produces the same object record, with findings updated rather than duplicated.
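
One way to get stable IDs with update-in-place behavior is to derive the ID from the connector and object path, then upsert on it. The Python sketch below shows the idea under that assumption. How the platform actually derives object IDs is not documented here, and `object_id`, `upsert_findings`, and the sample path are hypothetical.

```python
import hashlib


def object_id(connector_id: str, object_path: str) -> str:
    """Derive a deterministic ID from the connector and object path."""
    key = f"{connector_id}\n{object_path}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def upsert_findings(catalog: dict, connector_id: str, object_path: str,
                    findings_by_type: dict) -> str:
    """Insert or overwrite the record for one object; return its ID."""
    oid = object_id(connector_id, object_path)
    catalog[oid] = {
        "object_path": object_path,
        "findings_by_type": dict(findings_by_type),
        "findings_count": sum(findings_by_type.values()),
    }
    return oid


catalog = {}
first = upsert_findings(catalog, "conn_1", "data/exports/2026/users.csv",
                        {"SSN": 12, "email": 20})
second = upsert_findings(catalog, "conn_1", "data/exports/2026/users.csv",
                         {"SSN": 12, "email": 20, "credit card": 8})
print(first == second, len(catalog))          # True 1
print(catalog[second]["findings_count"])      # 40
```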

Finding Feedback

On any finding in the Data Catalog, you can provide feedback:

Confirm (thumbs up) — This is real PII. Strengthens classifier confidence.
Reject (thumbs down) — This is a false positive. Helps improve detection accuracy over time.

Feedback is used to improve classifier calibration. The more feedback provided, the more accurate future scans become.

Filtering by file type

The Scan Detail page includes filter chips that let you focus on findings from one file category at a time:

Structured — Findings from columnar data files (Parquet, ORC, Avro).
Archive — Findings from inside compressed archives (ZIP, TAR, OOXML/ODF documents, GZIP).
Email — Findings from email archives (EML, MBOX, MSG) — covers headers, body, and attachments.
Legacy Office — Findings from pre-2007 binary Office files (.doc, .xls, .ppt).

Each chip shows a count badge so you can see the distribution at a glance, and clicking a chip narrows the findings list below to that category. The All chip restores the full list. Unknown appears for legacy findings that pre-date this feature.

The filter chips are visible only when the scan produced findings from one of the categories above. A scan over a database connector (which has no archives or emails) shows the chips with their counts but no truncation banner.

Archive truncation banner

If a deeply nested or extremely large archive trips the platform’s recursion safety limits, the Scan Detail page displays a high-visibility warning banner above the findings breakdown. The banner explains:

What was cut off — which safety limit was hit (recursion depth, total uncompressed size, or file count cap).
How much was scanned — exact file count and byte count processed before the limit fired.
Why this matters — findings from the archive past the cap are missing; this scan’s coverage of that archive is incomplete.
How to recover — raise the relevant limit in Settings → Scanner → Recursion and re-scan, or upgrade to a higher profile.

This banner only appears when truncation actually happened. Most scans never trigger it.

Do not interpret a missing-findings report as “no PII” if the truncation banner is showing — it means the scan was cut short, not that the archive was clean.
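
To make the three safety limits concrete, the following Python sketch walks a simulated nested archive and stops at the first limit that fires. It is a simplified model, not the scanner's code: archives are plain dictionaries (a nested dict is a nested archive, an integer is a file's uncompressed size in bytes), and the `Limits` values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_depth: int        # deepest archive nesting level allowed
    max_total_bytes: int  # cap on total uncompressed size
    max_files: int        # cap on number of files


def scan_archive(archive: dict, limits: Limits) -> dict:
    """Walk a nested archive, stopping at the first safety limit that fires."""
    stats = {"files": 0, "bytes": 0, "truncated_by": None}

    def walk(node: dict, depth: int) -> None:
        for entry in node.values():
            if stats["truncated_by"]:
                return  # a limit already fired deeper in the tree
            if isinstance(entry, dict):  # nested archive
                if depth + 1 > limits.max_depth:
                    stats["truncated_by"] = "recursion depth"
                    return
                walk(entry, depth + 1)
            else:  # plain file, entry is its size in bytes
                if stats["files"] + 1 > limits.max_files:
                    stats["truncated_by"] = "file count cap"
                    return
                if stats["bytes"] + entry > limits.max_total_bytes:
                    stats["truncated_by"] = "total uncompressed size"
                    return
                stats["files"] += 1
                stats["bytes"] += entry

    walk(archive, depth=1)
    return stats


nested = {
    "a.csv": 100,
    "inner.zip": {"b.csv": 200, "deeper.zip": {"c.csv": 300}},
}
print(scan_archive(nested, Limits(max_depth=2, max_total_bytes=10_000, max_files=100)))
# {'files': 2, 'bytes': 300, 'truncated_by': 'recursion depth'}
```

The returned counts correspond to the "how much was scanned" part of the banner, and `truncated_by` to "what was cut off".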

Learn More

Parallel Scanning Engine — Distributed scan architecture and worker scaling
Classifiers — Managing classifier sets and suppression rules
Event-Driven Scanning — Real-time scan triggers from cloud storage events
Run Your First Scan — Step-by-step guide for your first scan