
Create a Custom Classifier

This guide walks you through defining a custom YAML classifier to detect organization-specific sensitive data patterns.

Time required: 5–10 minutes

Prerequisites:

  • Editor or Admin role in the Customer Dashboard
  • Knowledge of the data pattern you want to detect

Step 1: Navigate to Classifiers

  1. In the Customer Dashboard, navigate to Classifiers in the sidebar.
  2. Click Create Classifier.
  3. Select YAML Editor as the creation method.

Step 2: Write the Classifier YAML

Here is an example classifier that detects an internal employee ID format:

apiVersion: slim.io/v1
kind: Classifier
metadata:
  name: internal-employee-id
  description: "Internal employee ID in format EMP-XXXXXXXX"
  category: Employee ID
  tags:
    - internal
    - hr-data
spec:
  type: proximity
  pattern: '\bEMP-[A-Z0-9]{8}\b'
  keywords:
    - "employee"
    - "emp id"
    - "staff number"
    - "personnel"
  window: 80
  confidence: high  # high | medium | low, relative to your tuning
  enabled: true
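
To make the proximity settings concrete, the following Python sketch shows one way a proximity match could be evaluated: a regex hit counts only if one of the keywords appears within `window` characters of it. This is an illustration, not Slim.io's implementation. The helper name `proximity_matches`, the case-insensitive keyword comparison, and the reading of `window` as a character count on each side of the match are all assumptions.

```python
import re

# Values copied from the example classifier above.
PATTERN = re.compile(r"\bEMP-[A-Z0-9]{8}\b")
KEYWORDS = ["employee", "emp id", "staff number", "personnel"]
WINDOW = 80  # ASSUMPTION: a character count on each side of the match


def proximity_matches(text, pattern=PATTERN, keywords=KEYWORDS, window=WINDOW):
    """Return regex matches that have at least one keyword nearby (hypothetical helper)."""
    hits = []
    for m in pattern.finditer(text):
        # Take the text surrounding the match, clipped to the string bounds.
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        context = text[start:end].lower()
        if any(keyword in context for keyword in keywords):
            hits.append(m.group())
    return hits


with_context = "Employee Record\nName: Jane Doe\nEmp ID: EMP-A1B2C3D4"
without_context = "Ticket reference EMP-A1B2C3D4 was closed."

print(proximity_matches(with_context))     # ['EMP-A1B2C3D4']
print(proximity_matches(without_context))  # []
```

The second string contains a value that satisfies the regex, but no keyword appears near it, so a proximity classifier built this way would not report it.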

Choosing the Right Type

Pattern Characteristic               Recommended Type
Fixed format with reliable regex     regex
Format needs contextual keywords     proximity
Known list of values                 dictionary
Format includes check digits         checksum
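
As an illustration of what a check digit adds over a plain regex, here is a minimal Python sketch of the Luhn algorithm, a widely used check-digit scheme for payment card numbers. It shows the general technique only: this guide does not state which checksum algorithms the checksum type supports, and `luhn_valid` is a hypothetical helper.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check (hypothetical helper)."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("79927398713"))  # True
print(luhn_valid("79927398714"))  # False
```

Both strings would satisfy a regex such as `\d{11}`, but only the first has a consistent check digit, which is why check-digit validation can reject many random look-alike values.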

Step 3: Validate the Classifier

  1. Click Validate in the YAML editor.
  2. Slim.io checks:
    • YAML syntax is valid
    • All required fields are present
    • The regex pattern compiles without errors
    • No duplicate classifier name exists
  3. Fix any reported errors before proceeding.
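
The checks above can be approximated locally before you paste YAML into the editor. The Python sketch below is an illustration under several assumptions: it works on an already-parsed classifier (a plain dict), the required fields are a guess based on the example in Step 2, and `existing_names` stands in for the classifier names already present in your account. Python's `re` syntax may also differ from the regex engine Slim.io uses, so treat a local pass as a pre-check only.

```python
import re

# ASSUMPTION: required fields inferred from the example classifier, not from a published schema.
REQUIRED_METADATA = ("name", "description")
REQUIRED_SPEC = ("type", "pattern")


def validate_classifier(doc, existing_names=()):
    """Return a list of error strings; an empty list means no problems were found."""
    errors = []
    metadata = doc.get("metadata", {})
    spec = doc.get("spec", {})

    for field in REQUIRED_METADATA:
        if field not in metadata:
            errors.append(f"missing metadata.{field}")
    for field in REQUIRED_SPEC:
        if field not in spec:
            errors.append(f"missing spec.{field}")

    # Check that the regex compiles (with Python's engine).
    if "pattern" in spec:
        try:
            re.compile(spec["pattern"])
        except re.error as exc:
            errors.append(f"pattern does not compile: {exc}")

    if metadata.get("name") in existing_names:
        errors.append(f"duplicate classifier name: {metadata['name']}")
    return errors


classifier = {
    "metadata": {"name": "internal-employee-id",
                 "description": "Internal employee ID in format EMP-XXXXXXXX"},
    "spec": {"type": "proximity", "pattern": r"\bEMP-[A-Z0-9]{8}\b"},
}
print(validate_classifier(classifier))
print(validate_classifier(classifier, existing_names={"internal-employee-id"}))
```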

Step 4: Test Against Sample Data

  1. Click Test to open the validation console.
  2. Paste sample text containing the pattern you want to detect:
     Employee Record
     Name: Jane Doe
     Emp ID: EMP-A1B2C3D4
     Department: Engineering
  3. Click Run Test.
  4. Verify the classifier matches the expected values with the correct confidence score.

Test with both positive examples (text that should match) and negative examples (text that should not match) to validate precision and recall before deploying.
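
One way to organize positive and negative examples is as a small labeled set from which you compute precision and recall. The Python sketch below applies only the regex from Step 2 (it ignores the keywords and proximity window), and the sample strings are invented for illustration.

```python
import re

PATTERN = re.compile(r"\bEMP-[A-Z0-9]{8}\b")

# (text, should_match) pairs; all samples are made up for this example.
samples = [
    ("Emp ID: EMP-A1B2C3D4", True),
    ("Staff number EMP-00012345", True),
    ("Emp ID: EMP-A1B2", False),                     # too short
    ("Order TEMP-A1B2C3D4 shipped", False),          # no word boundary before EMP
    ("Part number EMP-ABCDEFGH in catalog", False),  # right shape, wrong meaning
]

tp = fp = fn = 0
for text, expected in samples:
    predicted = PATTERN.search(text) is not None
    if predicted and expected:
        tp += 1
    elif predicted and not expected:
        fp += 1
    elif not predicted and expected:
        fn += 1

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=1.00
```

The part-number sample is a false positive for the bare regex. It is the kind of case that contextual keywords or a suppression rule are meant to remove.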

Step 5: Deploy the Classifier

  1. Click Deploy to activate the classifier.
  2. The classifier becomes active immediately and is used in all subsequent scans.
  3. Existing scan results are not updated retroactively; run a new scan to apply the classifier to data that has already been scanned.

Step 6: Verify in a Scan

  1. Trigger a scan on a connector that contains data matching your pattern.
  2. After the scan completes, navigate to the Data Catalog.
  3. Filter findings by your custom category (e.g., “Employee ID”).
  4. Verify the matches are correct and the confidence scores are appropriate.

Advanced: Detection-as-Code

For teams that manage classifiers through Git, add the YAML file to your repository:

slim-io-config/
  classifiers/
    custom/
      internal-employee-id.yaml

Enable Git sync under Settings > Integrations to automatically deploy classifier changes on merge. See Detection-as-Code for details.

Writing Effective Classifiers

  1. Be specific: narrow regex patterns produce fewer false positives.
  2. Use proximity: requiring contextual keywords near a match filters out look-alike strings that a regex alone would accept.
  3. Set appropriate confidence: higher for validated formats (for example, those with check digits), lower for broad patterns.
  4. Write suppressions: add suppression rules for known false-positive patterns.
  5. Document thoroughly: the description field should explain what the classifier detects and why.
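
The idea behind suppressions can be sketched in Python. This guide does not show Slim.io's suppression rule syntax, so the example below only illustrates the concept: matches that equal a known placeholder value are dropped. The `SUPPRESSIONS` list and the `find_ids` helper are hypothetical.

```python
import re

PATTERN = re.compile(r"\bEMP-[A-Z0-9]{8}\b")

# HYPOTHETICAL suppressions: placeholder IDs that satisfy the pattern but are not real data.
SUPPRESSIONS = [re.compile(r"EMP-X{8}"), re.compile(r"EMP-0{8}")]


def find_ids(text):
    """Return pattern matches that are not covered by any suppression."""
    return [
        m.group()
        for m in PATTERN.finditer(text)
        if not any(s.fullmatch(m.group()) for s in SUPPRESSIONS)
    ]


text = "Format: EMP-XXXXXXXX, e.g. EMP-A1B2C3D4. Test account: EMP-00000000."
print(find_ids(text))  # ['EMP-A1B2C3D4']
```

The format placeholder EMP-XXXXXXXX itself satisfies the example pattern, so documentation that describes the ID format is a likely source of false positives.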
