Data Classification Scanner

verified
Updated May 13, 2026

What this skill does

Scans GitHub repositories, S3 buckets, and Google Drive for personally identifiable information (PII), protected health information (PHI), financial data, and other sensitive content. Classifies findings by data type and sensitivity level, maps data flows, and generates a comprehensive data classification report with remediation recommendations.
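
How the scanner detects and classifies content internally is not documented here. As a rough illustration only, the "classify findings by data type and sensitivity level" step could look like the following minimal sketch, which assumes simple regex detectors. The names DETECTORS, Finding, and scan_text are hypothetical, and the patterns are deliberately simplistic.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors: (data type, sensitivity level, pattern).
# Illustrative only; real PII/PHI detection needs far more than regexes.
DETECTORS = [
    ("ssn", "critical", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("email", "high", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    ("us_phone", "high", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
]


@dataclass(frozen=True)
class Finding:
    source: str       # file, object key, or document name
    line_no: int      # 1-based line number within the source
    data_type: str    # e.g. "ssn"
    sensitivity: str  # e.g. "critical"


def scan_text(source: str, text: str) -> list[Finding]:
    """Return one Finding per (line, detector) pair that matches."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for data_type, sensitivity, pattern in DETECTORS:
            if pattern.search(line):
                findings.append(Finding(source, line_no, data_type, sensitivity))
    return findings


if __name__ == "__main__":
    sample = "ssn: 123-45-6789\ncontact: jane@example.com\nnothing here"
    for finding in scan_text("config.yml", sample):
        print(finding)
```

Keeping the sensitivity level next to each detector means every match is classified at the moment it is found, so later reporting steps only need to group and count.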

Example

PROMPT

We're a healthcare company subject to HIPAA and need to know where sensitive data lives before our annual compliance audit. Can you scan our 60 GitHub repos and 15 S3 buckets for PII and PHI? We also have a shared Google Drive with patient-facing documents. Need a classification report showing what we found, where it is, and what we should do about it.

OUTPUT
Data Classification Report - Healthcare Corp (HIPAA Scope)

EXECUTIVE SUMMARY
Total Findings: 47 across 3 data sources
Critical (PHI): 12 - SSN, medical record IDs, patient names in code/config
High (PII): 18 - Email, phone, addresses in repos and S3
Medium: 11 - Financial data, internal IDs
Low (metadata): 6 - Sample data, test fixtures

FINDINGS BY SOURCE
| Source       | Critical | High | Medium | Low | Total |
|--------------|----------|------|--------|-----|-------|
| GitHub (60)  | 8        | 14   | 7      | 4   | 33    |
| S3 (15)      | 3        | 3    | 2      | 1   | 9     |
| Google Drive | 1        | 1    | 2      | 1   | 5     |

TOP FINDINGS (Critical)
- patient-service/config: sample SSN in config template
- analytics-export bucket: CSV with MRN + diagnosis codes
- patient-portal repo: hardcoded test patient name in seed data

REMEDIATION PRIORITIES
1. Remove PHI from patient-service config, use env vars
2. Encrypt S3 analytics-export bucket, enable access logging
3. Replace seed data with synthetic PHI in patient-portal

Generated: Data flow map, sensitivity matrix, remediation checklist
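
The FINDINGS BY SOURCE table in the example output is a count of findings grouped by source and severity. A minimal sketch of that aggregation follows, assuming findings arrive as (source, severity) pairs. The tally function and the SEVERITIES list are hypothetical and not part of the skill, and the sample records are made up for illustration.

```python
from collections import Counter

# Assumed severity levels, matching the column headers of the example report.
SEVERITIES = ["critical", "high", "medium", "low"]


def tally(findings):
    """Group (source, severity) pairs into {source: {severity: count, "total": n}}.

    Severities outside SEVERITIES are ignored in this sketch.
    """
    counts = Counter(findings)
    table = {}
    for source in sorted({source for source, _ in counts}):
        # Counter returns 0 for missing keys, so empty cells need no special case.
        row = {severity: counts[(source, severity)] for severity in SEVERITIES}
        row["total"] = sum(row.values())
        table[source] = row
    return table


if __name__ == "__main__":
    # Made-up records, not taken from the report above.
    records = [
        ("github", "critical"),
        ("github", "high"),
        ("s3", "critical"),
        ("github", "critical"),
    ]
    for source, row in tally(records).items():
        print(source, row)
```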

Required Tools

GitHub
AWS
Google Drive

Compatible Agents

Claude
Cursor
Windsurf
ChatGPT
GitHub Copilot
Any MCP-compatible agent

Add to your agent

Install via CLI:

$ npx skills add webrix-ai/agent-skills --skill data-classification-scanner

Deploy Org-wide

Provision to teams via RBAC
Identity-aware execution
Signed & verified skills
Full audit trail
Auto-bundled with required MCP servers

Data Classification Scanner | Willow Marketplace