Scan Management
The Scans page is the operations center for all scanning activity. It provides a real-time view of active scans, a queue of pending jobs, full scan history, and controls to start, pause, resume, stop, and cancel scans.
Starting a New Scan
- Navigate to Scans in the Customer Dashboard sidebar.
- Click New Scan to open the scan configuration dialog.
- Configure the scan parameters:
Connector Selection
Select the data source to scan from the dropdown. Only connectors with Active status are available. If no connectors are active, you will need to set one up first under Connectors.
Scan Types
| Type | Description | When to Use |
|---|---|---|
| Full Discovery | Scans all resources from scratch, ignoring previous scan state | Initial baseline, periodic full re-scan, after major data migration |
| Incremental | Only scans resources that are new or modified since the last completed scan | Daily or weekly ongoing monitoring with lower cost |
| Targeted | Scans a specific path or prefix within the connector scope | Investigating a known directory, post-incident verification |
For Targeted scans, an additional text field appears where you enter the path or prefix to scan (e.g., data/exports/2026/).
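The following minimal Python sketch shows how each scan type could narrow the set of resources to process. `Resource`, `select_resources`, and the scan-type strings are hypothetical; only `"full"` appears in the Scan API example. The sketch also assumes that a resource's modification time is enough to detect new or changed resources, which may not match how a given connector tracks changes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Resource:
    path: str
    modified_at: datetime  # last-modified time reported by the connector (assumed)


def select_resources(resources, scan_type, last_completed_at=None, prefix=None):
    """Return the resources a scan of the given type would process (hypothetical)."""
    if scan_type == "full":
        # Full Discovery: ignore previous scan state and take everything.
        return list(resources)
    if scan_type == "incremental":
        if last_completed_at is None:
            # Assumption: with no completed baseline, behave like a full scan.
            return list(resources)
        # New or modified since the last completed scan.
        return [r for r in resources if r.modified_at > last_completed_at]
    if scan_type == "targeted":
        if not prefix:
            raise ValueError("targeted scans need a path or prefix")
        return [r for r in resources if r.path.startswith(prefix)]
    raise ValueError(f"unknown scan type: {scan_type!r}")


# Usage
resources = [
    Resource("data/exports/2026/users.csv", datetime(2026, 1, 2)),
    Resource("logs/app.json", datetime(2025, 12, 30)),
]
print([r.path for r in select_resources(resources, "targeted", prefix="data/exports/2026/")])
```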
Classifier Template
Select a classifier template to control which classifiers run during the scan. If a default template exists, it is pre-selected automatically.
- Template selected — Only classifiers in that template are active for this scan
- None selected — All globally enabled classifiers run
See Classifiers for details on managing classifier sets.
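The two template rules above reduce to a small selection function. The sketch below is illustrative only. The function names and the representation of templates as sets of classifier names with an `is_default` flag are assumptions, not the product's data model.

```python
def active_classifiers(globally_enabled, template=None):
    """Return the names of the classifiers that run for one scan.

    template: set of classifier names in the selected template, or None
    when no template was selected (hypothetical representation).
    """
    if template is not None:
        # Template selected: only the classifiers in that template are active.
        return set(template)
    # No template selected: every globally enabled classifier runs.
    return set(globally_enabled)


def preselected_template(templates):
    """Return the name of the template marked as default, or None."""
    for name, meta in templates.items():
        if meta.get("is_default"):
            return name
    return None
```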
Scan Profile
Control how intensively the scanner interacts with your data source:
| Profile | Behavior | When to Use |
|---|---|---|
| Light | Minimal parallelism, smaller samples. Lowest load on your infrastructure. | Production databases under active load |
| Standard | Balanced parallelism and sampling. Default for most environments. | General-purpose scanning |
| Deep | Higher parallelism, larger samples. More thorough but higher resource usage. | Staging environments, dedicated scan infrastructure |
Scan Profile controls how much load the scanner places on YOUR infrastructure, not ours. Light is recommended for production databases that serve live traffic.
- Click Start Scan to submit the job.
Scan Lifecycle
Every scan progresses through defined states:
Created → Queued → Running → Completed
Running ↔ Paused
Running or Paused → Stopped
Running or Paused → Cancelled
| Status | Description |
|---|---|
| Created | Scan job initialized, configuration validated |
| Queued | Waiting for available workers within tier limits |
| Running | Workers actively processing files through the detection pipeline |
| Paused | Scan halted at a checkpoint, can be resumed |
| Completed | All resources scanned and the configured coverage target was met |
| Partial | Scan finished with useful findings but did not meet its coverage target, or some resources errored. All findings collected before the partial transition are preserved. See Partial Scan Outcomes below. |
| Stopped | Manually stopped by user, partial results preserved |
| Cancelled | Manually cancelled by user; findings collected before cancellation are preserved |
| Failed | Unrecoverable error (credential issue, infrastructure failure, or zero resources scanned) |
Completed vs. Partial: A scan is Completed only when it meets its configured coverage target with no resource failures. If the scan ran successfully but covered only 60% of its resources against a 70% target, or some resources errored mid-scan, it transitions to Partial, not Failed. Partial scans are still useful, and their findings are first-class. The distinction tells you whether you can trust the absence of findings, not just their presence.
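The Completed/Partial/Failed rule can be written as a small decision function. This is a sketch under assumptions. The parameter names are hypothetical. It ignores user-initiated Stop and Cancel. It treats "zero resources scanned" as Failed, as the status table describes, and it computes coverage as the fraction of enumerated resources that were scanned.

```python
def final_status(enumerated, scanned, failed, coverage_target, unrecoverable_error=False):
    """Pick the terminal status for a scan that ran to the end (illustrative).

    coverage_target is a fraction, e.g. 0.7 for a 70% target.
    """
    if unrecoverable_error or scanned == 0:
        # Credential/infrastructure failure, or nothing was scanned at all.
        return "Failed"
    coverage = scanned / enumerated  # enumerated >= scanned > 0 at this point
    if coverage >= coverage_target and failed == 0:
        return "Completed"
    # Something useful was scanned, but coverage fell short or resources errored.
    return "Partial"
```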
Scan Controls
Active and paused scans expose four control actions:
Pause
Pauses a running scan at the current checkpoint. Workers finish processing their current file and then idle. The scan state is saved so it can be resumed later.
- Available when scan status is Running
- The scan transitions to Paused status
- Polling interval slows to 30 seconds while paused
Resume
Resumes a paused scan from its saved checkpoint. Workers pick up where they left off.
- Available when scan status is Paused
- The scan transitions back to Running status
- Polling interval returns to the active rate
Stop
Gracefully stops a scan and saves all partial results. Findings collected up to the point of stopping are preserved in the Data Catalog. The scan is marked as Stopped in history.
- Available when scan status is Running or Paused
- Partial findings are retained and visible in the Data Catalog
- The scan cannot be resumed after stopping
Cancel
Cancels a running or paused scan. Workers finish processing their current resource, persist all findings discovered so far, and exit cleanly. The scan transitions to Cancelled status with an accurate coverage report.
- Available when scan status is Running or Paused
- Findings are preserved — all PII detected before cancellation is saved and visible in the Data Catalog
- The coverage report shows exactly which resources were scanned and which were not yet reached
- The scan cannot be resumed after cancelling
Stop vs. Cancel: Stop and Cancel both preserve findings. The difference is intent: Stop means “I have enough data, wrap up.” Cancel means “abort this scan.” Both produce an accurate coverage report showing what was and was not scanned.
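The availability rules for the four controls can be captured as a small transition table. The minimal sketch below mirrors the "Available when" bullets above. The function and dictionary names are hypothetical, and the action names only echo the API paths.

```python
# action -> (statuses the action is allowed from, status it leads to)
CONTROLS = {
    "pause": ({"Running"}, "Paused"),
    "resume": ({"Paused"}, "Running"),
    "stop": ({"Running", "Paused"}, "Stopped"),
    "cancel": ({"Running", "Paused"}, "Cancelled"),
}


def apply_control(status, action):
    """Return the new status, or raise if the action is not allowed."""
    allowed_from, next_status = CONTROLS[action]
    if status not in allowed_from:
        raise ValueError(f"cannot {action} a scan in status {status}")
    return next_status


def available_controls(status):
    """Control actions the dashboard would expose for a scan in this status."""
    return sorted(a for a, (allowed, _) in CONTROLS.items() if status in allowed)
```

Stopped and Cancelled appear in no action's "allowed from" set. This encodes the rule that neither can be resumed.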
Pre-Scan Cost Estimate
Before a scan starts, the platform estimates the compute cost based on the number of resources, total data volume, and connector type. This estimate is displayed in the scan confirmation dialog.
If the estimated cost exceeds your configured ceiling, the scan is blocked with a clear message explaining the estimate and the ceiling. You can:
- Reduce scope — Narrow the scan to a specific prefix or path
- Switch to Light profile — Lower parallelism reduces cost
- Increase the ceiling — Ask your administrator to raise the per-scan cost ceiling in Settings
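This page does not document how the estimate itself is computed, so the sketch below takes it as an input and shows only the ceiling check and the blocking message. The USD currency, the `check_cost` name, and the message wording are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CostCheck:
    allowed: bool
    message: str


def check_cost(estimate_usd: float, ceiling_usd: float) -> CostCheck:
    """Block the scan only when the estimate is strictly above the ceiling."""
    if estimate_usd > ceiling_usd:
        return CostCheck(
            False,
            f"Estimated cost ${estimate_usd:.2f} exceeds the per-scan ceiling of "
            f"${ceiling_usd:.2f}. Reduce scope, switch to the Light profile, "
            "or ask an administrator to raise the ceiling.",
        )
    return CostCheck(True, f"Estimated cost ${estimate_usd:.2f} is within the ${ceiling_usd:.2f} ceiling.")
```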
Scan Monitoring
KPI Summary
The top of the Scans page displays four key metrics:
- Total Scans — Lifetime count of all scans
- Active — Currently running scans
- Queued — Scans waiting for available workers
- Avg Duration — Average scan completion time
Active Operations
The Active Operations panel shows all currently running scans with:
- Connector name and provider — Which data source is being scanned
- Phase indicators — Visual progress through Discovering, Scanning, Analyzing, Complete
- Progress bar — Animated gradient bar showing percentage complete
- Current stage — Text description of the current pipeline phase
- PII Findings — Real-time count of findings detected so far
- ETA — Estimated time remaining
Concurrency Queue
Scans that are waiting for available workers appear in the queue panel with their queue position number. Scans execute in FIFO order within tier concurrency limits.
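The following sketch shows how FIFO ordering interacts with a worker limit. It assumes that each scan reserves a fixed number of workers when it starts. The real engine may scale workers dynamically (see Parallel Scanning Engine), and the class and method names are hypothetical.

```python
from collections import deque


class ScanQueue:
    """Strict FIFO admission of scans under a worker limit (illustrative only)."""

    def __init__(self, max_workers):
        self.max_workers = max_workers
        self.running = {}       # scan_id -> workers held
        self.waiting = deque()  # (scan_id, workers_needed), oldest first

    def free_workers(self):
        return self.max_workers - sum(self.running.values())

    def submit(self, scan_id, workers_needed=1):
        if workers_needed > self.max_workers:
            raise ValueError("scan needs more workers than the tier allows")
        self.waiting.append((scan_id, workers_needed))
        self._admit()

    def finish(self, scan_id):
        self.running.pop(scan_id, None)
        self._admit()

    def position(self, scan_id):
        """1-based queue position, or None if the scan is not waiting."""
        for pos, (queued_id, _) in enumerate(self.waiting, start=1):
            if queued_id == scan_id:
                return pos
        return None

    def _admit(self):
        # Only the head of the queue may start, so later scans never jump ahead.
        while self.waiting and self.waiting[0][1] <= self.free_workers():
            scan_id, need = self.waiting.popleft()
            self.running[scan_id] = need
```

Under strict FIFO, a large scan at the head of the queue can hold back smaller scans behind it even while some workers are idle. This design trades some utilization for predictable ordering.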
Scan History
The Scan History table shows all past scans with:
| Column | Description |
|---|---|
| Connector | Name and provider of the scanned data source |
| Status | Color-coded badge (completed, partial, failed, cancelled, stopped, paused) |
| Items Scanned | Total files or resources processed |
| PII Findings | Number of sensitive data matches detected |
| Duration | How long the scan took |
| Started At | Relative timestamp (e.g., “2h ago”) |
Status badges use the following color coding:
| Status | Color |
|---|---|
| Completed | Green |
| Partial | Amber |
| Running | Blue (pulsing dot) |
| Paused | Yellow |
| Stopped | Orange |
| Failed | Red |
| Cancelled | Grey |
Scan Tier Limits
Scan capacity is governed by your subscription tier:
| Tier | Max Files / Scan | Max Concurrent Workers | Scans / Month |
|---|---|---|---|
| Free | 10,000 | 2 | 10 |
| Starter | 100,000 | 5 | 50 |
| Professional | 1,000,000 | 20 | Unlimited |
| Enterprise | Unlimited | Custom | Unlimited |
If a scan is stuck at “Queued” status, check your tier’s scan quota. You may have reached the monthly scan limit or the maximum concurrent worker count.
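As a troubleshooting aid, here is a sketch that encodes the tier table and reports which limit could be keeping a scan queued. The numbers come from the table above. `None` stands for both "Unlimited" and Enterprise's "Custom" worker count, which this sketch does not check. The function names and inputs are hypothetical.

```python
# tier: (max files per scan, max concurrent workers, scans per month); None = unlimited/custom
TIER_LIMITS = {
    "Free": (10_000, 2, 10),
    "Starter": (100_000, 5, 50),
    "Professional": (1_000_000, 20, None),
    "Enterprise": (None, None, None),
}


def queued_reasons(tier, scans_this_month, workers_in_use):
    """List the tier limits that would keep a new scan in Queued."""
    _, max_workers, per_month = TIER_LIMITS[tier]
    reasons = []
    if per_month is not None and scans_this_month >= per_month:
        reasons.append(f"monthly scan limit reached ({scans_this_month}/{per_month})")
    if max_workers is not None and workers_in_use >= max_workers:
        reasons.append(f"all {max_workers} concurrent workers are busy")
    return reasons


def exceeds_file_limit(tier, file_count):
    max_files = TIER_LIMITS[tier][0]
    return max_files is not None and file_count > max_files
```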
Scan API
Scans can also be triggered and managed via the API:
```
# Start a scan
POST /api/v1/connectors/:connector_id/scan
{
  "scan_type": "full",
  "template_id": "tmpl_abc123"
}

# Pause a scan
POST /api/v1/scans/:scan_id/pause

# Resume a scan
POST /api/v1/scans/:scan_id/resume

# Stop a scan (preserves findings collected so far)
POST /api/v1/scans/:scan_id/stop

# Cancel a scan (also preserves findings collected so far)
POST /api/v1/scans/:scan_id/cancel

# Get scan history
GET /api/v1/scans/history
```
Scan Scheduling
Schedule recurring scans on any connector:
- Navigate to the connector detail page.
- Open the Schedule tab.
- Configure:
| Setting | Description |
|---|---|
| Frequency | Daily, weekdays only, weekly, biweekly, monthly, or custom cron |
| Time | Date and time displayed in YOUR local timezone — no UTC conversion needed |
| Scan Profile | Light, Standard, or Deep (same as manual scans) |
| Scan Type | Full Discovery, Incremental, or Targeted |
| Max Duration | Optional time limit — if the scan doesn’t complete, it checkpoints and resumes on the next scheduled run |
All scheduled times are displayed and stored in your local timezone. If you schedule a scan for “2:00 AM EST”, it runs at 2:00 AM Eastern regardless of where slim.io’s servers are located.
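Storing a schedule as local wall-clock time means the UTC instant shifts with daylight saving time. This sketch uses Python's standard `zoneinfo` module and assumes the user's timezone is known as an IANA name such as `America/New_York`. The function name is hypothetical.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def scheduled_run_utc(run_date: date, local_time: time, tz_name: str) -> datetime:
    """Convert a wall-clock schedule in the user's timezone to a UTC instant."""
    local = datetime.combine(run_date, local_time, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


# 2:00 AM in New York is 07:00 UTC in winter (EST) and 06:00 UTC in summer (EDT).
print(scheduled_run_utc(date(2026, 1, 15), time(2, 0), "America/New_York"))
print(scheduled_run_utc(date(2026, 7, 15), time(2, 0), "America/New_York"))
```

One edge case remains: 2:00 AM does not exist on the US spring-forward date (8 March 2026), so a scheduler needs a rule for that night. `zoneinfo` resolves such a time using the offset in effect before the transition (PEP 495, `fold=0`). This page does not say how slim.io handles that case.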
Resume Behavior
If a scheduled scan doesn’t complete within its time window:
- The scan checkpoints at the last completed resource
- On the next scheduled run, it resumes from the checkpoint (not from scratch)
- If the checkpoint is older than the configured gap limit (default: 48 hours), the scan restarts fresh to avoid stale coverage
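The resume-or-restart decision above fits in a few lines. This sketch assumes the checkpoint's age is measured from the time it was written. The 48-hour default is from this page, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_GAP_LIMIT = timedelta(hours=48)


def plan_scheduled_run(checkpoint_at, now, gap_limit=DEFAULT_GAP_LIMIT):
    """Return "resume" or "fresh" for the next scheduled run."""
    if checkpoint_at is None:
        return "fresh"   # nothing to resume from
    if now - checkpoint_at > gap_limit:
        return "fresh"   # checkpoint too old; restart to avoid stale coverage
    return "resume"      # pick up from the last completed resource
```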
Scan Results
Every scan (running or terminal) includes a structured coverage report so you can audit exactly how much of the target data was covered, what was skipped, and why:
| Field | Description |
|---|---|
| Completeness Score | 0-100% — derived signal combining coverage percentage, enumeration status, and resource failures. Green (>90%), yellow (>70%), red (70% or below) |
| Coverage Target | The coverage threshold the scan was configured to meet (e.g., 70%) |
| Coverage Strategy | Best effort (continue past target if there is time) or Strict (mark Partial if target not met) |
| Coverage Percentage | Actual fraction of enumerated resources that were scanned |
| Target Met | Whether the scan met its configured coverage target |
| Enumeration Status | Whether resource discovery finished — complete, in progress, partial, or failed |
| Resources Enumerated | Total resources discovered by the connector |
| Resources Scanned | Resources actually processed by the detection pipeline |
| Resources Skipped | Resources excluded by policy (binary content, unsupported format, size limits) |
| Resources Failed | Resources that errored during scanning (permissions, network, parse errors) |
| Clean Confidence | For zero-finding scans: HIGH (thorough scan, no PII found), MEDIUM (partial coverage), LOW (significant gaps), NONE (couldn’t access data) |
| Cost Estimate | Estimated infrastructure cost for this scan |
| Detection Method | How findings were detected (column sampling, full scan, API pagination) |
A completeness score below 70% means the scan may have missed sensitive data. Check the Resources Skipped and Resources Failed counts to understand what was missed and why, and review the Enumeration Status to see whether the connector even finished discovering resources.
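The exact formula behind the Completeness Score is not documented here, so the sketch below takes the score as an input and only applies the color bands from the table. It also computes Coverage Percentage and Target Met from the resource counts. The function names are hypothetical.

```python
def coverage_percentage(scanned, enumerated):
    """Share of enumerated resources that were scanned, as 0-100."""
    return 100.0 * scanned / enumerated if enumerated else 0.0


def target_met(scanned, enumerated, target_pct):
    return coverage_percentage(scanned, enumerated) >= target_pct


def completeness_band(score):
    """Map a 0-100 completeness score to its dashboard color."""
    if score > 90:
        return "green"
    if score > 70:
        return "yellow"
    return "red"
```

Coverage Percentage is relative to the resources that were *enumerated*. If Enumeration Status is not complete, a high percentage can still hide unscanned data, which is why the warning above asks you to check enumeration too.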
Partial Scan Outcomes
A scan transitions to Partial status (not Failed) when it produced useful findings but did not meet its coverage target, or some resources errored mid-scan. The partial_reason field on the scan record explains exactly why:
| Reason | Meaning |
|---|---|
| COVERAGE_BELOW_TARGET | The scan finished without meeting its configured coverage target (e.g., target 70%, actual 45%) |
| RESOURCE_FAILURES | One or more resources errored during scanning (permissions, network, parse errors) |
| TIME_EXCEEDED | The scan hit its time budget before reaching the coverage target |
| FINDINGS_LIMIT | The scan hit the maximum findings limit configured for the job |
| CANCELLED_BY_USER | The user cancelled the scan after some findings had been collected |
| CANCELLED_BY_SYSTEM | The system cancelled the scan (capacity limits, deadline exceeded) |
Partial scans preserve every finding collected before the partial transition. They are functionally equivalent to Completed scans for the resources that were scanned — only the coverage was incomplete. Use the coverage report to understand whether to re-scan with a higher target, a longer time budget, or different scope.
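For teams that automate triage, here is a hypothetical helper that maps each documented `partial_reason` value to a follow-up action. The reason codes come from the table above. The suggested actions are an editorial interpretation of the guidance in this section, not platform behavior.

```python
# Hypothetical triage helper: reason codes are from the docs; actions are suggestions.
NEXT_STEP = {
    "COVERAGE_BELOW_TARGET": "re-scan with a longer time budget or a narrower scope",
    "RESOURCE_FAILURES": "fix the errors listed under Resources Failed, then re-scan",
    "TIME_EXCEEDED": "re-scan with a longer time budget",
    "FINDINGS_LIMIT": "raise the job's findings limit or split the scan by prefix",
    "CANCELLED_BY_USER": "re-run the scan, or a Targeted scan on the paths not yet reached",
    "CANCELLED_BY_SYSTEM": "check tier quotas and capacity, then re-run",
}


def next_step(partial_reason: str) -> str:
    """Suggest a follow-up for a Partial scan; unknown codes fall back to review."""
    return NEXT_STEP.get(partial_reason, "review the coverage report")
```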
Severity Scoring
Every finding is assigned a severity level based on the combination of PII type and detection confidence:
| Severity | Criteria | Examples |
|---|---|---|
| Critical | High-risk PII detected with high confidence | Social Security numbers, credit card numbers, bank account numbers |
| High | High-risk PII with medium confidence, or medium-risk PII with high confidence | Passport numbers, driver’s license numbers, medical record numbers |
| Medium | Medium-risk PII with medium confidence, or low-risk PII with high confidence | Email addresses, phone numbers, physical addresses |
| Low | Low-risk PII or any PII type detected with lower confidence | IP addresses, usernames, generic identifiers |
Severity levels are visible throughout the dashboard — in the Data Catalog, Investigation view, Command Center cards, and scan history. Use severity filters to prioritize remediation of the most sensitive exposures first.
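The severity table describes a grid of PII risk against detection confidence. The sketch below reproduces that grid under one stated assumption: raw confidence scores have already been bucketed into `low`, `medium`, and `high`. This page does not give the bucket thresholds. Any low-confidence detection is rated Low, as the last row of the table says.

```python
RANK = {"low": 0, "medium": 1, "high": 2}


def severity(pii_risk: str, confidence: str) -> str:
    """Combine PII risk and confidence buckets into a severity level."""
    if confidence == "low":
        return "Low"  # any PII type detected with lower confidence
    score = RANK[pii_risk] + RANK[confidence]
    # 4: high+high; 3: high+medium or medium+high; 2: medium+medium or low+high
    return {4: "Critical", 3: "High", 2: "Medium"}.get(score, "Low")
```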
Finding Lifecycle
Findings progress through a defined lifecycle as your team triages and remediates them:
| State | Description |
|---|---|
| Active | Newly detected finding, awaiting triage. Default state for all new detections. |
| Confirmed | Team has verified this is real PII that requires action. |
| Resolved | The underlying data exposure has been remediated (data removed, encrypted, or access restricted). Resolved findings are retained in history for audit purposes. |
| Suppressed | Marked as an accepted risk or known false positive. Suppressed findings are excluded from active counts and Command Center cards but remain searchable. |
To change a finding’s state, select one or more findings in the Data Catalog or Investigation view and use the Actions menu. Bulk state changes are supported for efficient triage workflows.
Resolved and suppressed findings are not permanently deleted. They remain in your scan history for compliance and audit purposes, and can be reactivated if the same PII is detected again in a future scan.
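The sketch below models the four finding states and a bulk state change. This page does not define every allowed transition. The sketch assumes that only Active and Confirmed findings count as open work and that reactivation moves a Resolved or Suppressed finding back to Active. All names are hypothetical.

```python
from enum import Enum


class FindingState(Enum):
    ACTIVE = "Active"
    CONFIRMED = "Confirmed"
    RESOLVED = "Resolved"
    SUPPRESSED = "Suppressed"


# Assumption: Resolved and Suppressed findings are excluded from open counts.
OPEN_STATES = {FindingState.ACTIVE, FindingState.CONFIRMED}


def bulk_set_state(findings, selected_ids, new_state):
    """Apply one state change to every selected finding; nothing is deleted."""
    changed = 0
    for finding in findings:
        if finding["id"] in selected_ids:
            finding["state"] = new_state
            changed += 1
    return changed


def open_count(findings):
    return sum(1 for f in findings if f["state"] in OPEN_STATES)


def reactivate(finding):
    """Move a resolved or suppressed finding back to Active."""
    if finding["state"] not in (FindingState.RESOLVED, FindingState.SUPPRESSED):
        raise ValueError("only resolved or suppressed findings can be reactivated")
    finding["state"] = FindingState.ACTIVE
```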
Per-Object Findings
Every scan result includes a per-object breakdown showing exactly which files, tables, or resources contain sensitive data:
| Field | Description |
|---|---|
| Object Path | Full path to the file, table, or resource |
| File Type | Detected format — CSV, JSON, Parquet, Excel, PDF, Avro, ORC, or text |
| Findings Count | Total PII detections in this specific object |
| Findings by Type | Breakdown by PII category (e.g., 12 SSN, 20 email, 8 credit card) |
| Classification | Highest sensitivity level found — Public, Internal, Confidential, or Restricted |
Click any object to drill down into its individual findings with:
- PII type filter — Show only SSNs, only emails, etc.
- Confidence scores — How certain the detection is
- Detection method — Column sampling, full scan, or API pagination
- Masked evidence — Redacted preview (e.g., ****-**-6789) confirming the match
Object IDs are stable across re-scans. The same file scanned twice produces the same object record, with findings updated rather than duplicated.
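One way to get stable object IDs is to derive them deterministically from the connector and object path, then upsert the per-object summary on each re-scan. The sketch below takes that approach. The hashing scheme, field names, and the "Public" default for objects with no findings are assumptions, not the platform's actual ID format.

```python
import hashlib
from collections import Counter

# Sensitivity levels from lowest to highest, as listed in the table above.
CLASSIFICATION_ORDER = ["Public", "Internal", "Confidential", "Restricted"]


def object_id(connector_id: str, object_path: str) -> str:
    """Deterministic ID: the same connector and path always give the same ID."""
    return hashlib.sha256(f"{connector_id}:{object_path}".encode()).hexdigest()[:16]


def upsert_object(store: dict, connector_id: str, object_path: str, findings: list) -> dict:
    """Replace the object's summary in place instead of adding a duplicate record."""
    record = {
        "object_path": object_path,
        "findings_count": len(findings),
        "findings_by_type": dict(Counter(f["pii_type"] for f in findings)),
        # Highest sensitivity level found in this object.
        "classification": max(
            (f["classification"] for f in findings),
            key=CLASSIFICATION_ORDER.index,
            default="Public",
        ),
    }
    store[object_id(connector_id, object_path)] = record
    return record
```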
Finding Feedback
On any finding in the Data Catalog, you can provide feedback:
- Confirm (thumbs up) — This is real PII. Strengthens classifier confidence.
- Reject (thumbs down) — This is a false positive. Helps improve detection accuracy over time.
Feedback is used to improve classifier calibration. The more feedback provided, the more accurate future scans become.
Learn More
- Parallel Scanning Engine — Distributed scan architecture and worker scaling
- Classifiers — Managing classifier sets and suppression rules
- Event-Driven Scanning — Real-time scan triggers from cloud storage events
- Run Your First Scan — Step-by-step guide for your first scan