Automating Spam Training with rspamd-learn-helper

Fighting spam is a constant battle. Even with powerful tools like Rspamd, it’s easy to forget to train your filters regularly, especially in multi-user or low-maintenance environments. That’s where rspamd-learn-helper comes in: a simple script that brings automation, clarity, and ease of use to the spam learning process.

Motivation

Rspamd offers a powerful Bayesian filter that improves significantly with regular learning. However, manually invoking rspamc learn_spam or rspamc learn_ham quickly becomes tedious, and scripting it correctly for production use (e.g., on a mail server) is not always straightforward. When mailboxes are spread across several folders or spam is collected into shared IMAP locations, it’s especially easy to lose track and miss the opportunity to retrain the filter.

This project aims to:

  • Automate and standardize the learning process
  • Make spam training repeatable and consistent
  • Log clearly what was learned and why
  • Support integration into systemd timers or cron jobs

How It Works

The rspamd-learn-helper script scans defined mail folders (e.g., Junk, Spam, INBOX, or Ham) and invokes rspamc to train Rspamd accordingly. It supports:

  • Multiple mailbox sources
  • Custom folder mappings for spam and ham
  • Dry-run mode for safe testing
  • Verbose logging with color-coded CLI output
  • HTML reporting for monitoring and automation logs

You can integrate the script into a daily or hourly cron job, or run it manually as needed. For production environments, it’s ideal to connect it to a scheduled systemd timer, ensuring that your spam filters evolve over time with minimal effort.
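
As a rough illustration of the cron route, the snippet below builds a crontab entry that would run the helper once a day at 03:15. The install path /usr/local/bin/rspamd-learn-helper and the schedule are assumptions made for this sketch, as is the idea that the script runs without arguments; adjust them to your setup.

```shell
# Hypothetical install location; use wherever you placed the script.
helper_path="/usr/local/bin/rspamd-learn-helper"

# Crontab fields: minute hour day-of-month month day-of-week command.
# "15 3 * * *" means every day at 03:15.
cron_line="15 3 * * * $helper_path"

# Print the entry so it can be reviewed before installing it.
echo "$cron_line"

# To append it to the current user's crontab, something like this would work:
#   ( crontab -l 2>/dev/null; echo "$cron_line" ) | crontab -
```

Printing the entry first keeps the step reviewable; the commented-out command only appends to the existing crontab and does not replace it.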

Example Use Case

Imagine a mail server where:

  • Users occasionally move misclassified spam into a personal folder named .TrainingSpam
  • Legitimate emails (ham) that were falsely marked as spam are moved into .TrainingHam
  • Each user has their own Maildir folder structure, e.g. /home/username/Maildir/

With the default configuration:

home_directory="/home"
spam_folder=".TrainingSpam"
ham_folder=".TrainingHam"

The script will loop through the subdirectories of /home and look for the following paths:

/home/*/Maildir/.TrainingSpam/cur
/home/*/Maildir/.TrainingHam/cur

Each message file in these cur/ folders will be passed to Rspamd for learning as spam or ham, respectively.
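
The core of that loop can be sketched roughly as follows. This is not the actual rspamd-learn-helper source, only a minimal illustration of the idea: the function names learn_folder and learn_all and the dry_run variable are hypothetical, while the three configuration variables and the rspamc learn_spam / learn_ham commands are the ones described above. Dry-run is switched on by default here so the sketch prints what it would do instead of training anything.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names other than the three config variables
# and the rspamc subcommands are hypothetical.

home_directory="${home_directory:-/home}"
spam_folder="${spam_folder:-.TrainingSpam}"
ham_folder="${ham_folder:-.TrainingHam}"
dry_run="${dry_run:-1}"   # assumption: 1 = print only, 0 = really call rspamc

# learn_folder <learn_spam|learn_ham> <folder name>
# Feeds every message file in each user's <folder>/cur/ to rspamc.
learn_folder() {
    local action="$1" folder="$2" msg
    for msg in "$home_directory"/*/Maildir/"$folder"/cur/*; do
        # Skip directories and the unexpanded pattern when nothing matches.
        [ -f "$msg" ] || continue
        if [ "$dry_run" = "1" ]; then
            echo "DRY-RUN rspamc $action $msg"
        else
            rspamc "$action" "$msg"
        fi
    done
}

# Train on the spam folder first, then on the ham folder.
learn_all() {
    learn_folder learn_spam "$spam_folder"
    learn_folder learn_ham "$ham_folder"
}
```

Calling learn_all with dry_run=1 lists one line per message that would be learned; setting dry_run=0 hands each file to rspamc instead.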

This setup ensures:

  • Learning is done only on explicitly marked samples
  • Each user contributes to Rspamd’s Bayesian model with little risk of accidental misclassification
  • Automation is safe and transparent, with optional logging

You can adjust the folder names or base path via variables, or use custom regex-based logic if needed.
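
For instance, a server that keeps mailboxes outside /home and uses different folder names might override the defaults like this. The values /srv/mail, .Junk, and .NotJunk are purely hypothetical; only the three variable names come from the default configuration. The sketch also assumes the script builds its search paths the same way as in the example above.

```shell
# Hypothetical override values; only the variable names are taken from
# the default configuration.
home_directory="/srv/mail"
spam_folder=".Junk"
ham_folder=".NotJunk"

# Show the search patterns these values would produce. The patterns are
# quoted, so the shell prints them literally instead of expanding them.
printf '%s\n' "$home_directory/*/Maildir/$spam_folder/cur"
printf '%s\n' "$home_directory/*/Maildir/$ham_folder/cur"
```

With these values the patterns become /srv/mail/*/Maildir/.Junk/cur and /srv/mail/*/Maildir/.NotJunk/cur.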

Designed for Admins

rspamd-learn-helper was created to:

  • Reduce false positives and false negatives
  • Automate repetitive mail processing tasks
  • Gain insight into the effectiveness of the Bayesian filter
  • Offer a transparent, low-maintenance solution for small and medium-sized mail servers

If you value automation and want your mail server to get smarter over time, this helper script can make a noticeable difference in your spam detection quality.

👉 Get it here:

Pull requests, suggestions, and issues are welcome.