Building a Clipboard DLP Agent: Why I Went Native

Pure Python clipboard monitoring misses 30%+ of sensitive data. Here's why I wrote a native C DLL hook and how Shannon entropy catches what regex can't.

The Problem With Pure Python

A pure-Python monitor built on pyperclip has to poll: read the clipboard every N milliseconds and compare it with the previous value. If the content changes twice between two polls, the first change is never seen.

For a Data Loss Prevention tool, missing anything is not acceptable.
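To make the race concrete, here is a small simulation of a polling monitor. It does not touch the real clipboard: the event timeline, the 200 ms interval, and the function name polled_observations are all made up for illustration.

```python
from bisect import bisect_right

def polled_observations(events, interval_ms, duration_ms):
    """Return the distinct clipboard values a poller would observe.

    events: list of (timestamp_ms, content), sorted by timestamp.
    """
    times = [t for t, _ in events]
    seen = []
    last = None
    for now in range(0, duration_ms + 1, interval_ms):
        # Index of the most recent clipboard change at or before this poll
        idx = bisect_right(times, now) - 1
        if idx < 0:
            continue  # nothing has been copied yet
        current = events[idx][1]
        if current != last:
            seen.append(current)
            last = current
    return seen

# Hypothetical timeline: a secret is copied at 220 ms and cleared at 380 ms,
# e.g. by a password manager that wipes the clipboard after pasting.
events = [
    (10, "meeting notes"),
    (220, "example-secret-value"),
    (380, ""),
    (650, "shopping list"),
]

seen = polled_observations(events, interval_ms=200, duration_ms=800)
missed = [content for _, content in events if content not in seen]
print("seen:  ", seen)
print("missed:", missed)
```

With polls at 0, 200, 400, 600, and 800 ms, the secret lives entirely between the 200 ms and 400 ms polls, so the poller never sees it. Shortening the interval narrows the window but cannot close it.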

The Native Hook Approach

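I wrote a C DLL that adds a window to the clipboard viewer chain with SetClipboardViewer on Windows. Windows then sends that window a WM_DRAWCLIPBOARD message every time the clipboard contents change, so the agent is notified of each change instead of sampling for it: no polling, no missed window between samples.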

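    #include <windows.h>

    void trigger_python_callback(void);  // defined elsewhere in the DLL (signature assumed)
    HWND nextViewer;  // next window in the chain, as returned by SetClipboardViewer

    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg,
                             WPARAM wParam, LPARAM lParam) {
        switch (msg) {
        case WM_DRAWCLIPBOARD:
            // Clipboard just changed: notify Python, then pass the message on
            trigger_python_callback();
            if (nextViewer)
                SendMessage(nextViewer, msg, wParam, lParam);
            return 0;
        case WM_CHANGECBCHAIN:
            // A viewer is leaving the chain: relink if it was our neighbor
            if ((HWND)wParam == nextViewer)
                nextViewer = (HWND)lParam;
            else if (nextViewer)
                SendMessage(nextViewer, msg, wParam, lParam);
            return 0;
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }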

Python loads this via ctypes and registers a callback.

Why Shannon Entropy?

Regex catches structured patterns like credit card numbers and email addresses. But many API keys and secret tokens have no fixed prefix or format to match on; they're just high-entropy strings.

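Shannon entropy measures how unpredictable a string's characters are. Computed from the string's own character frequencies, a random 40-character base64 string typically scores around 4.8 bits/char (the ceiling for any 40-character string is log2(40) ≈ 5.3 bits/char). English text scores ~4.0 bits/char.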

If entropy > 4.5 AND length > 20 AND no spaces → flag it.

This catches AWS secret keys, GitHub tokens, and private key material even when no regex in the pattern set matches them.
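The entropy rule above can be sketched in a few lines. The function names are illustrative, and the entropy is computed from each string's own character frequencies, which is an assumption about how the score is measured.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Bits per character, based on the string's own character frequencies."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_secret(text, min_entropy=4.5, min_length=20):
    """The rule from the post: entropy > 4.5, length > 20, no spaces."""
    return (
        len(text) > min_length
        and " " not in text
        and shannon_entropy(text) > min_entropy
    )

if __name__ == "__main__":
    # The placeholder secret key used in AWS documentation
    sample = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    print(round(shannon_entropy(sample), 2), looks_like_secret(sample))
    print(looks_like_secret("the quick brown fox jumps over the lazy dog"))
```

One consequence of the 4.5 threshold: a hex-only string uses at most 16 distinct symbols, so it can never score above 4.0 bits/char and will not trip this rule. Hex tokens need a pattern or a lower, alphabet-aware threshold.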

The Result

80+ detection patterns + entropy analysis + Luhn validation for payment card numbers. False positive rate under 2% in production.
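For the payment card path, here is a minimal sketch of how a regex candidate match can be confirmed with the Luhn checksum. The pattern and function names are illustrative, not the agent's actual detection rules.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return regex candidates from `text` that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text)
            if luhn_valid(m.group())]

if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number
    print(find_card_numbers("card: 4111 1111 1111 1111, order 4111111111111112"))
```

The checksum is what keeps the false positive rate down here: an arbitrary 16-digit run such as an order ID passes the regex but has only about a one in ten chance of passing Luhn.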