Google Groups as a Spam Vector

Introduction

If you’re operating a modern mail system and rely on technologies like SPF, DKIM, and DMARC, you expect spoofed and unauthenticated messages to be stopped. But what if spam arrives perfectly authenticated – via Google’s own infrastructure? That’s what happened in our environment, and it took far too long to identify the true cause: Google Groups.

Although they looked legitimate on paper, many spam messages bypassed filtering because they originated from *.googlegroups.com with valid SPF, valid DKIM signatures, and a strong sender reputation. These properties rendered Rspamd’s default filters ineffective. The spam arrived fully authenticated, often with a List-ID header identifying a Google Group, yet Rspamd let it pass with minimal scoring.

Rejecting them based solely on source domain would be irresponsible. Google Groups are used by legitimate projects, teams, and mailing lists. A blunt solution like domain-based blocking would break too much.

So instead, I decided to analyze first, then block selectively.

Objective

Before taking action, I wanted visibility. Which Google Groups were sending messages to our infrastructure? Which ones were legitimate? Which ones were spam? The goal: create a curated allowlist and denylist based on the List-ID header, which uniquely identifies a mailing list – even across Google’s infrastructure.

Implementation

To extract actionable insights, I wrote a shell script:
analyze_google_groups.sh

It parses mail logs or .eml files, extracts the List-ID headers for messages received from Google Groups, and aggregates them by frequency.

Key characteristics:

  • Fast awk-based parsing
  • Groups results by domain and counts occurrences
  • Works with .eml archives or live logs
  • Exportable to whitelist/blacklist map formats for Rspamd
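The extraction and aggregation steps can be sketched roughly as follows. This is a minimal illustration, not the script from the repository: the function names extract_list_ids and count_list_ids are hypothetical, it handles .eml files only (not logs), and it assumes the group ID sits between angle brackets and ends in googlegroups.com, so groups on custom domains would be missed.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical and not taken from
# analyze_google_groups.sh.

# Print the List-ID (lowercased, without angle brackets) of every given
# .eml file whose List-ID ends in googlegroups.com. Only the header block
# is read; the message body is never scanned.
extract_list_ids() {
  for f in "$@"; do
    awk '
      function emit() {
        # Keep only the part between < and >, e.g. <group.googlegroups.com>
        if (match(val, /<[^>]+>/)) {
          id = tolower(substr(val, RSTART + 1, RLENGTH - 2))
          if (id ~ /(^|\.)googlegroups\.com$/) print id
        }
      }
      { sub(/\r$/, "") }                          # tolerate CRLF line endings
      /^$/ { exit }                               # first blank line ends the headers
      /^[ \t]/ { if (inhdr) val = val $0; next }  # folded continuation line
      {
        if (inhdr) { emit(); inhdr = 0 }
        if (tolower($0) ~ /^list-id:/) { inhdr = 1; val = $0 }
      }
      END { if (inhdr) emit() }
    ' "$f"
  done
}

# Read one List-ID per line and print "count list-id",
# most frequent first, ties sorted by name.
count_list_ids() {
  awk '{ n[$0]++ } END { for (id in n) print n[id], id }' |
    LC_ALL=C sort -k1,1nr -k2,2
}
```

Running extract_list_ids on a directory of .eml files (the path is up to you) and piping the result into count_list_ids prints one line per group, with the message count first.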

This gave me a clear picture of which List-IDs were:

  • Frequently seen (likely internal or legitimate)
  • Rare or suspicious (spam candidates)
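A rough way to pre-sort such a frequency report is a simple threshold. The sketch below assumes input lines of the form "count list-id"; the function name split_by_frequency and the default threshold of 5 are illustrative assumptions, not values from the original analysis. Frequency is only a hint, so the resulting labels are candidates for manual review rather than a final verdict.

```shell
#!/bin/sh
# Sketch only: the helper name and the default threshold are hypothetical.

# Read "count list-id" lines and prefix each with "frequent" or "rare",
# depending on whether the count reaches the threshold (first argument).
split_by_frequency() {
  threshold=${1:-5}
  awk -v t="$threshold" '{
    label = ($1 + 0 >= t + 0) ? "frequent" : "rare"
    print label, $2, $1
  }'
}
```

Anything labelled "rare" still needs a human look before it goes into a block map: a low-traffic list can be perfectly legitimate.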

Rspamd Integration

Once the data was available, I integrated it with Rspamd via multimap:

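LIST_ID_WHITELIST {
  type = "header";
  header = "List-ID";
  regexp = true;  # treat each map entry as a regular expression
  map = "${LOCAL_CONFDIR}/local.d/listid_whitelist.map";
  score = -10.0;
}

LIST_ID_BLOCK {
  type = "header";
  header = "List-ID";
  regexp = true;  # treat each map entry as a regular expression
  map = "${LOCAL_CONFDIR}/local.d/listid_blacklist.map";
  score = 20.0;
  prefilter = true;  # run as a pre-filter so the action is applied on match
  action = "reject";
}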

Whitelisted List-IDs are explicitly scored down. Blacklisted ones are scored heavily and rejected immediately. This bypasses the issue of high reputation and perfect SPF/DKIM alignment.

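Operational Tip: Make sure no List-ID can match both maps. A reject set in pre-filter mode is applied before regular symbols are scored, so the negative whitelist score does not rescue a message that also matches the block map. Do not rely on the order of the rules in the configuration file.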
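The map files referenced by the two rules hold one entry per line. As a sketch, assuming the maps are read in regexp mode and accept entries in /pattern/flags form, a list of reviewed List-IDs could be turned into map lines like this. The function name listids_to_regexp_map is hypothetical; each pattern matches the ID inside its angle brackets, so a display name in front of it does not matter.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical.

# Read one List-ID per line and print one case-insensitive regexp map
# entry per line, e.g. /<devs\.googlegroups\.com>/i
listids_to_regexp_map() {
  # 1st expression: backslash-escape every character that is not a
  #                 letter, digit, or underscore, so dots and hyphens
  #                 are matched literally.
  # 2nd expression: wrap the escaped ID in /<...>/i
  LC_ALL=C sed -e 's/[^A-Za-z0-9_]/\\&/g' -e 's|.*|/<&>/i|'
}
```

Redirecting the output into listid_whitelist.map or listid_blacklist.map gives a file that can be checked by eye before Rspamd loads it.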

Outcome

After activating the rules:

  • Spam from obscure or disposable Google Groups was stopped instantly.
  • Legitimate dev lists and team communication remained unaffected.
  • No false positives were reported.

Most importantly, we moved from guesswork to control.

Repository:

https://github.com/filipnet/google-groups-scan

Summary

Google Groups is not the enemy, but blind trust is.
When spam is relayed through a high-reputation domain, authentication checks such as SPF and DKIM won’t help: they confirm who sent a message, not whether it is wanted.

My approach:

  • Use real data to identify groups in use
  • Only whitelist what’s proven legitimate
  • Reject or penalize everything else with confidence

This gives me full control over a common spam vector without disrupting valid traffic.