Audit Your File System with Surveilr: A Practical Guide
How to scan your file system and query files using SQL TL;DR: In this guide, you'll learn how to use surveilr to scan directories, capture file metadata in a SQLite database, and query for files using SQL—find PDFs, track document changes, and audit file system activity. Prerequisites: surveilr installed If you've ever needed to track files across your system for compliance audits or project management, you know the pain: Scattered files — Documents, spreadsheets, and PDFs across multiple directories Manual searches — Using find and Finder/Explorer is tedious and limited No audit trail — No record of what existed when Point-in-time only — No way to track file changes over time What if you could turn your entire file system into a queryable SQL database where you can find any file with SQL queries? That's exactly what surveilr does. surveilr is an edge-based, SQL-first surveillance platform that turns your file system into a Resource Surveillance State Database (RSSD)—a SQLite database containing all your file metadata. Think of it as "Git for your entire file system's metadata" that you can query with SQL. Key features: 📂 Scans directories and captures file metadata (size, timestamps, hashes) 🔍 Query files with SQL (no custom DSL to learn) 🔄 Watch mode for continuous monitoring 🔒 Works offline (edge-based, no cloud required) 📊 Perfect for compliance audits (HIPAA, GDPR, SOC 2) Let's see it in action. Initialize a new surveillance database: surveilr admin init -d my-audit.db What just happened? my-audit.db—a SQLite database that will store all your file evidence. This is your Resource Surveillance State Database (RSSD). 💡 Pro tip: The RSSD is a standard SQLite database. Once created, it's independent of surveilr and can be queried by any tool that supports SQLite. Scan your Documents folder (or any directory): # Scan your Documents folder surveilr ingest files -r ~/Documents -d my-audit.db What's happening behind the scenes: surveilr walks the directory tree recursively Computes file hashes (SHA-256) for content tracking Captures metadata: size, timestamps, permissions, extensions Stores everything in SQLite tables Now comes the fun part. Open the interactive SQL shell: surveilr shell -d my-audit.db 📌 Note: File metadata lives in two joined tables: uniform_resource — Core resource metadata (hash, size, timestamps) ur_ingest_session_fs_path_entry — File-specific metadata (path, name, extension) Find all PDF files modified in the last 30 days: SELECT e.file_path_abs, e.file_basename, u.size_bytes / 1024 AS size_kb, u.last_modified_at FROM uniform_resource u JOIN ur_ingest_session_fs_path_entry e ON u.uniform_resource_id = e.uniform_resource_id WHERE e.file_extn = 'pdf' AND u.last_modified_at > datetime('now', '-30 days') ORDER BY u.last_modified_at DESC; Example output: /Users/you/Documents/report-2024.pdf | report-2024.pdf | 2048 | 2024-05-12 /Users/you/Projects/proposal.pdf | proposal.pdf | 3248 | 2024-05-10 /Users/you/Desktop/invoice-may.pdf | invoice-may.pdf | 512 | 2024-05-08 Find all Excel spreadsheets across your system: SELECT e.file_path_abs, e.file_basename, u.size_bytes / 1024 AS size_kb, u.last_modified_at FROM uniform_resource u JOIN ur_ingest_session_fs_path_entry e ON u.uniform_resource_id = e.uniform_resource_id WHERE e.file_extn IN ('xlsx', 'xls', 'csv') ORDER BY u.last_modified_at DESC LIMIT 20; Search for documents by filename: SELECT e.file_path_abs, e.file_basename, u.last_modified_at FROM uniform_resource u JOIN ur_ingest_session_fs_path_entry e ON u.uniform_resource_id = e.uniform_resource_id WHERE e.file_extn IN ('docx', 'doc') AND e.file_basename LIKE '%report%' ORDER BY u.last_modified_at DESC; Export results as JSON for reports: surveilr shell -d my-audit.db --cmd " SELECT e.file_path_abs, e.file_basename, u.size_bytes, u.last_modified_at FROM uniform_resource u JOIN ur_ingest_session_fs_path_entry e ON u.uniform_resource_id = e.uniform_resource_id WHERE e.file_extn = 'pdf' " --output json > pdf-files-report.json Or save as a SQL file for reuse: cat > find-pdfs.sql pdfs.json Set up continuous monitoring so surveilr automatically re-scans when files change: surveilr ingest files -r ~/Documents -d my-audit.db --watch What happens: surveilr monitors the directory for changes New or modified files are automatically ingested Press Ctrl+C to stop Track PHI locations — Document where patient health information files are stored for GDPR/HIPAA audits. Map PII locations — Track where personally identifiable information is stored across your organization. Detect tampering — Establish file baselines and detect unauthorized modifications during security incidents. Problem: "Permission denied" errors when scanning Solution: Run with appropriate permissions or scan a different directory Problem: RSSD file getting too large Solution: Use --dry-run first to check file counts: surveilr ingest files -r ~/Documents --dry-run You've just created your first compliance evidence database! Here's what to explore next: Email Compliance Tracking — Track email communications via IMAP GitHub/GitLab/Jira Tracking — Project data with Singer taps surveilr.com — Full documentation and advanced features surveilr.com — Official website and documentation Installation Guide — Download and install surveilr ✅ surveilr turns your file system into a queryable SQL database Standard SQLite = no vendor lock-in, works with any SQLite tool Perfect for compliance audits (HIPAA, GDPR, SOC 2) Continuous monitoring with watch mode All data stays on your machine (edge-based, no cloud required) Result: A complete audit trail of your file system that you can query anytime Found this helpful? Visit surveilr.com to learn more! Have questions? Open an issue.
