
Cloud DLP

Cloud DLP extends Slim.io’s data loss prevention to data at rest in cloud storage. It integrates with native cloud DLP services — Google Cloud DLP, AWS Macie, and Azure Purview — while supplementing their detections with Slim.io’s own classification engine.

How Cloud DLP Works

Cloud DLP operates through the same connector infrastructure used for scanning. When enabled, it combines two detection sources:

  1. Slim.io Detection Engine — Slim.io’s multi-layered classifier pipeline (regex, dictionary, proximity, checksum, ML) runs against file contents.
  2. Native Cloud DLP — Findings from the cloud provider’s built-in DLP service are ingested and correlated with Slim.io’s detections.

The result is a unified findings view that merges both sources, deduplicates overlapping detections, and gives each merged finding a single confidence score (the higher of the two source scores).

Provider Setup

Google Cloud DLP

Google Cloud DLP provides deep content inspection for data stored in GCS, BigQuery, and Datastore.

Prerequisites:

  • Google Cloud DLP API enabled in your project
  • A service account with roles/dlp.user and roles/dlp.reader
  • The Slim.io connector’s service account must also have DLP permissions

Configuration:

    connector:
      provider: gcp
      dlp:
        enabled: true
        inspection_template: projects/your-project/inspectionTemplates/slim-io-template
        info_types:
          - CREDIT_CARD_NUMBER
          - US_SOCIAL_SECURITY_NUMBER
          - EMAIL_ADDRESS
          - PHONE_NUMBER
          - PERSON_NAME
        min_likelihood: LIKELY
        max_findings_per_item: 100
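
To make the min_likelihood and max_findings_per_item settings in the configuration above concrete, here is a minimal Python sketch that applies the same two limits to a list of findings. Google Cloud DLP enforces these limits itself during inspection, so this is only an illustration of their meaning. The Finding record and the apply_limits helper are hypothetical names, not part of Slim.io or the Google client library; the likelihood ordering follows Google's Likelihood enum.

```python
from dataclasses import dataclass

# Google Cloud DLP likelihood values, ordered from lowest to highest.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


@dataclass(frozen=True)
class Finding:  # hypothetical record used only for this sketch
    info_type: str
    likelihood: str


def apply_limits(findings, min_likelihood="LIKELY", max_findings_per_item=100):
    """Keep findings at or above min_likelihood, capped at max_findings_per_item."""
    threshold = LIKELIHOOD_ORDER.index(min_likelihood)
    kept = [f for f in findings if LIKELIHOOD_ORDER.index(f.likelihood) >= threshold]
    return kept[:max_findings_per_item]


sample = [
    Finding("EMAIL_ADDRESS", "VERY_LIKELY"),
    Finding("PERSON_NAME", "POSSIBLE"),
    Finding("PHONE_NUMBER", "LIKELY"),
]
filtered = apply_limits(sample)
```

With min_likelihood set to LIKELY, the POSSIBLE match on PERSON_NAME is dropped and the other two findings are kept.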

Supported info types: Google Cloud DLP supports 150+ built-in info types spanning financial data, government IDs, health data, credentials, and demographic information. Slim.io imports all findings regardless of info type and maps them to its unified category taxonomy.

AWS Macie

Amazon Macie provides automated data discovery and classification for S3 buckets.

Prerequisites:

  • Amazon Macie enabled in the target AWS region
  • The Slim.io cross-account IAM role must include macie2:GetFindings and macie2:ListFindings permissions

Configuration:

    connector:
      provider: aws
      dlp:
        enabled: true
        macie:
          classification_job_schedule: daily
          managed_data_identifiers:
            - CREDIT_CARD_NUMBER
            - AWS_CREDENTIALS
            - SSH_PRIVATE_KEY
            - USA_SOCIAL_SECURITY_NUMBER
          custom_data_identifiers: []
          severity_filter: MEDIUM  # Minimum severity: LOW, MEDIUM, HIGH

Finding sync: Slim.io polls Macie findings on a configurable schedule (default: every 6 hours) and merges them with its own scan results. Overlapping findings are deduplicated by file path and data category.
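
The severity filter and the deduplication rule can be sketched in a few lines of Python. This is an illustrative sketch, not Slim.io's implementation: the dictionary fields (path, category, severity), the helper names, and the example bucket are assumptions, and keeping the first finding per key is one possible tie-breaking choice.

```python
# Severity levels accepted by severity_filter, ordered from lowest to highest.
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def meets_severity(severity, minimum="MEDIUM"):
    """True when a finding's severity is at or above the configured minimum."""
    return SEVERITY_ORDER[severity] >= SEVERITY_ORDER[minimum]


def dedupe(findings):
    """Keep the first finding seen for each (file path, data category) pair."""
    seen = set()
    unique = []
    for finding in findings:
        key = (finding["path"], finding["category"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


# Hypothetical findings; field names are assumptions made for this sketch.
macie_findings = [
    {"path": "s3://example-bucket/users.csv", "category": "SSN", "severity": "HIGH"},
    {"path": "s3://example-bucket/users.csv", "category": "SSN", "severity": "MEDIUM"},
    {"path": "s3://example-bucket/notes.txt", "category": "Email", "severity": "LOW"},
]
kept = dedupe([f for f in macie_findings if meets_severity(f["severity"])])
```

Here the LOW finding is filtered out, and the two SSN findings on the same file collapse into one.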

Azure Purview

Microsoft Purview (formerly Azure Purview) provides unified data governance including sensitive data classification.

Prerequisites:

  • Microsoft Purview account provisioned in your Azure subscription
  • The Slim.io Service Principal must have the Purview Data Reader role
  • Sensitivity labels configured in the Microsoft Purview compliance portal

Configuration:

    connector:
      provider: azure
      dlp:
        enabled: true
        purview:
          account_name: your-purview-account
          scan_rule_set: default
          sensitivity_labels:
            - Confidential
            - Highly Confidential
            - Internal
          classification_rules:
            - Credit Card Number
            - U.S. Social Security Number (SSN)
            - Email Address

Label mapping: Slim.io maps Purview sensitivity labels to its own severity system. Custom label mappings can be configured in the connector settings.
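
As a rough illustration of label mapping with custom overrides, the Python sketch below resolves a Purview sensitivity label to a severity. The severity names (low, medium, high), the default assignments, and the severity_for_label helper are all assumptions made for this example; this page does not document Slim.io's actual severity levels or defaults.

```python
# Hypothetical defaults: the real Slim.io severity names and default
# assignments are not documented here.
DEFAULT_LABEL_SEVERITY = {
    "Internal": "low",
    "Confidential": "medium",
    "Highly Confidential": "high",
}


def severity_for_label(label, custom_mappings=None, fallback="low"):
    """Resolve a sensitivity label, letting custom mappings override defaults."""
    mapping = {**DEFAULT_LABEL_SEVERITY, **(custom_mappings or {})}
    return mapping.get(label, fallback)


default_severity = severity_for_label("Highly Confidential")
overridden = severity_for_label("Confidential", {"Confidential": "high"})
unknown = severity_for_label("Public")
```

Custom mappings take precedence over the defaults, and labels with no mapping fall back to a configurable value.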

Unified Findings

Findings from Slim.io and the native cloud DLP service are correlated per file. The outcome depends on which source detected the data:

| Scenario | Behavior |
| --- | --- |
| Both detect same data | Findings are merged; confidence score uses the higher of the two |
| Only Slim.io detects | Finding is stored with Slim.io as the source |
| Only native DLP detects | Finding is imported and mapped to Slim.io categories |
| Category mismatch | Both findings are preserved with their respective categories |

Native cloud DLP findings are imported as read-only. Slim.io does not modify or delete findings in the source DLP service. All policy actions (tokenize, mask, quarantine) operate on Slim.io’s own finding records.
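
The correlation scenarios above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Slim.io's implementation: the Finding record, the source values, and the correlate helper are hypothetical, and native findings are assumed to be already mapped to Slim.io categories. The inputs are never mutated, in keeping with the read-only treatment of native findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:  # hypothetical record used only for this sketch
    path: str
    category: str
    confidence: float
    source: str  # "slim.io", "native", or "merged"


def correlate(slim_findings, native_findings):
    """Merge findings that share (path, category); keep all others unchanged."""
    slim_by_key = {(f.path, f.category): f for f in slim_findings}
    native_by_key = {(f.path, f.category): f for f in native_findings}
    unified = []
    for key in sorted(slim_by_key.keys() | native_by_key.keys()):
        s = slim_by_key.get(key)
        n = native_by_key.get(key)
        if s is not None and n is not None:
            # Both sources agree: merge and keep the higher confidence.
            path, category = key
            unified.append(
                Finding(path, category, max(s.confidence, n.confidence), "merged")
            )
        else:
            # Only one source detected it, or the categories differ.
            unified.append(s if s is not None else n)
    return unified


slim = [
    Finding("gs://example-bucket/a.csv", "SSN", 0.8, "slim.io"),
    Finding("gs://example-bucket/a.csv", "Email", 0.6, "slim.io"),
]
native = [
    Finding("gs://example-bucket/a.csv", "SSN", 0.9, "native"),
    Finding("gs://example-bucket/a.csv", "Phone", 0.7, "native"),
]
unified = correlate(slim, native)
```

Because the key includes the category, two findings on the same file with different categories never merge, which matches the category mismatch row.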

Info Type Mapping

Slim.io maintains a mapping table between native cloud DLP info types and its own category taxonomy:

| Slim.io Category | Google Cloud DLP | AWS Macie | Azure Purview |
| --- | --- | --- | --- |
| SSN | US_SOCIAL_SECURITY_NUMBER | USA_SOCIAL_SECURITY_NUMBER | U.S. Social Security Number |
| Credit Card | CREDIT_CARD_NUMBER | CREDIT_CARD_NUMBER | Credit Card Number |
| Email | EMAIL_ADDRESS | EMAIL_ADDRESS | Email Address |
| Phone | PHONE_NUMBER | USA_PHONE_NUMBER | U.S. Phone Number |
| AWS Credentials | N/A | AWS_CREDENTIALS | N/A |
| Private Key | ENCRYPTION_KEY | SSH_PRIVATE_KEY | N/A |

Custom info types from any provider are mapped to a Custom category unless a specific mapping is configured.
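
A lookup of this kind might be sketched in Python as shown below. The provider keys (gcp, aws, azure) come from the connector configuration; the to_category helper and the overrides parameter are hypothetical names for illustration. For simplicity, this sketch sends every unmapped type to Custom, not only provider-defined custom types.

```python
# Subset of the mapping table, keyed by (provider, native info type).
INFO_TYPE_MAP = {
    ("gcp", "US_SOCIAL_SECURITY_NUMBER"): "SSN",
    ("aws", "USA_SOCIAL_SECURITY_NUMBER"): "SSN",
    ("azure", "U.S. Social Security Number"): "SSN",
    ("gcp", "CREDIT_CARD_NUMBER"): "Credit Card",
    ("aws", "CREDIT_CARD_NUMBER"): "Credit Card",
    ("azure", "Credit Card Number"): "Credit Card",
    ("aws", "AWS_CREDENTIALS"): "AWS Credentials",
    ("gcp", "ENCRYPTION_KEY"): "Private Key",
    ("aws", "SSH_PRIVATE_KEY"): "Private Key",
}


def to_category(provider, native_type, overrides=None):
    """Map a native info type to a Slim.io category, defaulting to Custom."""
    key = (provider, native_type)
    if overrides and key in overrides:
        return overrides[key]
    return INFO_TYPE_MAP.get(key, "Custom")


ssn = to_category("aws", "USA_SOCIAL_SECURITY_NUMBER")
unmapped = to_category("gcp", "EMPLOYEE_BADGE_ID")
configured = to_category(
    "gcp", "EMPLOYEE_BADGE_ID", overrides={("gcp", "EMPLOYEE_BADGE_ID"): "Employee ID"}
)
```

EMPLOYEE_BADGE_ID and Employee ID are made-up names standing in for a custom info type and its configured mapping.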

Scheduling

Cloud DLP integrations follow the same scheduling model as standard scans:

  • Full sync — Import all findings from the native DLP service (run periodically or after initial setup)
  • Incremental sync — Import only new findings since the last sync (default behavior on schedule)
  • Event-driven sync — Triggered when the native service reports new findings (supported for Google Cloud DLP via Pub/Sub)

The default sync interval is 6 hours. Configure this under Connectors > [Connector Name] > DLP Settings.
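
The difference between a full and an incremental sync comes down to a timestamp filter. The Python sketch below assumes each finding carries a created_at timestamp and that the connector stores the time of its last sync; both the field name and the helper functions are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SYNC_INTERVAL = timedelta(hours=6)  # default interval stated above


def incremental(findings, last_sync):
    """Return only findings created strictly after the last sync."""
    return [f for f in findings if f["created_at"] > last_sync]


def next_sync(last_sync, interval=SYNC_INTERVAL):
    """Compute when the next scheduled sync is due."""
    return last_sync + interval


last_sync = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
findings = [
    {"id": "f-1", "created_at": datetime(2023, 12, 31, 22, 0, tzinfo=timezone.utc)},
    {"id": "f-2", "created_at": datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)},
]
new_findings = incremental(findings, last_sync)
due = next_sync(last_sync)
```

A full sync would skip the filter and import every finding; an event-driven sync would run the same import when a notification arrives instead of waiting for the next scheduled time.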

Best Practices

  1. Enable native DLP alongside Slim.io — Native services have access to provider-specific data types (e.g., AWS credentials, Azure sensitivity labels) that complement Slim.io’s general-purpose classifiers.
  2. Use Slim.io as the single pane — Even with native DLP enabled, review all findings in the Slim.io dashboard for a unified view across providers.
  3. Align info types — Configure native DLP services to detect the same categories as your Slim.io classifiers to maximize deduplication accuracy.
  4. Monitor sync health — Check the connector health page for sync failures or delays. Native DLP API rate limits can cause temporary sync interruptions.