Building a Clipboard DLP Agent: Why I Went Native
Pure Python clipboard monitoring misses 30%+ of sensitive data. Here's why I wrote a native C DLL hook and how Shannon entropy catches what regex can't.
The Problem With Pure Python
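A monitor built on Python's pyperclip has to poll: it reads the clipboard every N milliseconds and compares the result with the previous read. If the content changes twice between two polls, the first value is overwritten before you ever see it.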
For a Data Loss Prevention tool, missing anything is not acceptable.
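To make the gap concrete, here is a minimal simulation of a fixed-interval poller. It does not touch the real clipboard; the timeline, the function name observed_changes, and the 250 ms interval are all made up for illustration.

```python
from bisect import bisect_right


def observed_changes(change_times_ms, poll_interval_ms, duration_ms):
    """Return the indices of clipboard changes a fixed-interval poller sees.

    change_times_ms must be sorted. A poll at time t only sees the most
    recent change at or before t; anything overwritten earlier is lost.
    """
    seen = set()
    for t in range(0, duration_ms + 1, poll_interval_ms):
        idx = bisect_right(change_times_ms, t) - 1
        if idx >= 0:
            seen.add(idx)
    return sorted(seen)


# Hypothetical timeline: a secret is copied at 120 ms and replaced at 180 ms,
# then an unrelated copy happens at 450 ms.
changes = [120, 180, 450]
print(observed_changes(changes, poll_interval_ms=250, duration_ms=1000))
# [1, 2]: the change at 120 ms was never observed
```

Shortening the interval narrows the window but never closes it, and it costs CPU on every poll.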
The Native Hook Approach
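I wrote a C DLL that registers a clipboard listener using SetClipboardViewer on Windows. This adds a window to the clipboard viewer chain, and Windows sends that window a WM_DRAWCLIPBOARD message on every clipboard change, so there is no polling interval for a change to slip through. (On Windows Vista and later, AddClipboardFormatListener is the newer alternative and avoids maintaining the viewer chain by hand.)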
Python loads this via ctypes and registers a callback.
Why Shannon Entropy?
Regex catches patterns like credit card numbers and email addresses. But many API keys and raw key material have no fixed prefix or structure to match: they're just high-entropy strings.
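Shannon entropy measures how unpredictable a string's characters are, in bits per character. Computed from the character frequencies of the string itself, a random 40-character Base64 string typically scores a little under 5 bits/char (no 40-character string can exceed log2(40) ≈ 5.3), while English text scores ~4.0 bits/char or less.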
If entropy > 4.5 AND length > 20 AND no spaces → flag it.
This catches AWS secret keys, GitHub tokens, and private key material even when no regex in the pattern list matches them.
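The rule above (entropy > 4.5, length > 20, no spaces) can be sketched in a few lines. The function names and defaults are illustrative, and the sample key is the placeholder secret key from AWS's documentation, not a real credential.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Empirical Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def looks_like_secret(text, min_entropy=4.5, min_length=20):
    """Flag long, space-free strings whose entropy exceeds the threshold."""
    return (
        len(text) > min_length
        and " " not in text
        and shannon_entropy(text) > min_entropy
    )


samples = [
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",     # AWS docs placeholder key
    "the quick brown fox jumps over the lazy dog",  # contains spaces
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",                # long but zero entropy
]
for s in samples:
    print(f"{shannon_entropy(s):.2f}", looks_like_secret(s), s)
```

One consequence of the threshold: a hex-encoded secret uses only 16 symbols, so it can never exceed 4 bits/char and will not trip a 4.5 cutoff. Strings like that still need a pattern or a separate, lower threshold.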
The Result
The agent combines 80+ detection patterns, entropy analysis, and Luhn validation for payment card numbers. The false positive rate is under 2% in production.
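For reference, the Luhn check is short enough to show in full. This is a generic sketch of the standard checksum, not the agent's actual code; the function name luhn_valid and the choice to ignore spaces and hyphens are my own assumptions.

```python
def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True: a well-known test card number
print(luhn_valid("4111 1111 1111 1112"))  # False: the check digit is wrong
```

Running this after a card-number regex matches is what removes most false positives: a random 16-digit number passes the checksum only about one time in ten.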