Detecting BGP Route Leaks with Python and RPKI
Building a lightweight BGP route leak detection tool using pyBGPStream and RPKI validation to catch misconfigurations before they propagate.
Why Route Leaks Still Happen
Despite years of best practices documentation, BGP route leaks remain one of the most common causes of large-scale internet outages. A single misconfigured peer can advertise routes it should not, redirecting traffic through unintended paths or causing reachability failures across entire regions.
The problem is that BGP was designed around trust. When your peer announces a prefix, your router accepts it if it matches your inbound policy. If your policy is too permissive — or if you forgot to apply one — leaked routes slip through.
The Detection Approach
Rather than waiting for NOC tickets, we can build proactive monitoring. The approach combines two data sources:
- BGP stream data — real-time route announcements from public route collectors
- RPKI validation — checking whether announcements match the signed ROA (Route Origin Authorization) records
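import json
import subprocess

from pybgpstream import BGPStream


def validate_rpki(prefix, origin_asn):
    """Return the RPKI state of an announcement: valid, invalid, or not-found."""
    # Carried over as written: this assumes the command prints JSON with a
    # "roas" list covering `prefix`. rpki-client normally writes a full VRP
    # export rather than answering per-prefix queries, so check this call
    # against your validator before relying on it.
    result = subprocess.run(
        ["rpki-client", "-j", "-n", prefix],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        return "not-found"
    roas = json.loads(result.stdout).get("roas", [])
    if not roas:
        return "not-found"  # no covering ROA is not the same as invalid
    prefix_len = int(prefix.split("/")[1])
    for roa in roas:
        if roa["asn"] == origin_asn and roa["maxLength"] >= prefix_len:
            return "valid"
    return "invalid"


stream = BGPStream(
    project="ris-live",
    record_type="updates",
    filter="prefix more 10.0.0.0/8",  # placeholder: use your own prefixes
)

for rec in stream.records():
    for elem in rec:
        if elem.type != "A":  # only look at announcements
            continue
        prefix = elem.fields["prefix"]
        as_path = elem.fields["as-path"].split()
        # Skip empty paths and AS-set origins such as "{64500,64501}".
        if not as_path or not as_path[-1].isdigit():
            continue
        origin = int(as_path[-1])
        if validate_rpki(prefix, origin) == "invalid":
            print(f"RPKI INVALID: {prefix} from AS{origin}")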
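Spawning a process for every announcement gets expensive on a busy stream. As an alternative, here is a minimal sketch of the same origin check done in-process, following the route origin validation rules of RFC 6811. The function name rov_state and the (prefix, max_length, asn) tuple layout are assumptions made for illustration; how you load the validated ROA payloads (for example from your validator's export) is left out.

```python
import ipaddress


def rov_state(prefix, origin_asn, vrps):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'.

    vrps is an iterable of (roa_prefix, max_length, asn) tuples
    (hypothetical layout, adapt it to your validator's export).
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in vrps:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA only applies if it covers the announced prefix.
        if roa_net.version != net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs a matching origin AND a length within maxLength.
        if asn == origin_asn and net.prefixlen <= max_length:
            return "valid"
    # Covered but unmatched is invalid; no covering ROA is not-found.
    return "invalid" if covered else "not-found"


# Example with documentation prefixes and private-use ASNs.
example_vrps = [("192.0.2.0/24", 24, 64500)]
print(rov_state("192.0.2.0/24", 64500, example_vrps))
print(rov_state("192.0.2.0/25", 64500, example_vrps))
print(rov_state("198.51.100.0/24", 64500, example_vrps))
```

Note the three-way result: an announcement with no covering ROA is not-found rather than invalid, which matters because much of the routing table still has no ROA at all.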
Filtering the Noise
Raw BGP stream data is noisy. A busy route collector sees millions of updates per hour. We need to filter down to what matters:
- Your prefixes — announcements for prefixes you originate or your customers originate
- Your upstream paths — routes that should only come from specific transit providers
- RPKI-invalid origins — any announcement where the origin ASN does not match the ROA
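The first two filters in the list can be sketched with the standard ipaddress module. Everything named here (MONITORED, MY_ASN, ALLOWED_UPSTREAMS, and both function names) is hypothetical and uses documentation prefixes and private-use ASNs; as_path is assumed to be the list of strings you get from splitting the stream's AS path field.

```python
import ipaddress

# Hypothetical values: replace with your own address space and transit ASNs.
MONITORED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]
MY_ASN = 64500
ALLOWED_UPSTREAMS = {64496, 64497}


def is_monitored(prefix):
    """True if prefix equals or is more specific than a monitored prefix."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.version == m.version and net.subnet_of(m) for m in MONITORED
    )


def unexpected_upstream(as_path):
    """True if we originate the route but our neighbor is not an allowed transit."""
    # Drop AS-set tokens, which simplifies adjacency for this sketch.
    hops = [int(h) for h in as_path if h.isdigit()]
    # Collapse prepending so repeated ASNs count as a single hop.
    dedup = [h for i, h in enumerate(hops) if i == 0 or h != hops[i - 1]]
    if len(dedup) < 2 or dedup[-1] != MY_ASN:
        return False
    return dedup[-2] not in ALLOWED_UPSTREAMS
```

Running is_monitored first keeps the more expensive checks off the bulk of the stream.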
The key insight is to maintain a baseline of expected announcements and alert on deviations. This is where Python shines — you can build a stateful monitor that tracks the current RIB and flags changes.
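One way such a stateful monitor could look is sketched below. The class name, its methods, and the per-peer keying are assumptions for illustration, not part of pyBGPStream; in a real pipeline the peer, prefix, and origin would come from each stream element.

```python
class BaselineMonitor:
    """Track announced origins per (peer, prefix) and flag deviations."""

    def __init__(self, expected):
        # expected: prefix -> iterable of origin ASNs considered normal.
        self.expected = {p: set(o) for p, o in expected.items()}
        # rib: (peer_asn, prefix) -> origin ASN currently announced.
        self.rib = {}

    def announce(self, peer_asn, prefix, origin_asn):
        """Record an announcement; return a message if it deviates."""
        previous = self.rib.get((peer_asn, prefix))
        self.rib[(peer_asn, prefix)] = origin_asn
        # Prefixes missing from the baseline count as deviations too.
        if origin_asn in self.expected.get(prefix, set()):
            return None
        if previous == origin_asn:
            return None  # same deviation already reported for this peer
        return (
            f"unexpected origin AS{origin_asn} for {prefix} "
            f"via peer AS{peer_asn}"
        )

    def withdraw(self, peer_asn, prefix):
        """Forget the route so a later re-announcement is reported again."""
        self.rib.pop((peer_asn, prefix), None)


monitor = BaselineMonitor({"192.0.2.0/24": {64500}})
print(monitor.announce(64496, "192.0.2.0/24", 64500))
print(monitor.announce(64496, "192.0.2.0/24", 64501))
```

Keying the state by peer means the same bad origin seen from several collector peers produces one message per peer, which gives a rough sense of how far it has spread.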
Integrating with Alerting
Once you have a detection pipeline, the next step is alerting. We push alerts to Slack, PagerDuty, or both depending on severity, so the first step is classifying each event:
# Placeholder baseline: prefix -> set of origin ASNs you expect to see.
EXPECTED_ORIGINS = {}


def classify_severity(prefix, origin_asn, rpki_status):
    """Determine alert severity based on the leak characteristics."""
    prefix_len = int(prefix.split("/")[1])
    if rpki_status == "invalid" and prefix_len <= 16:
        return "critical"  # /16 or shorter, RPKI invalid
    elif rpki_status == "invalid":
        return "warning"  # More specific than /16, still invalid
    elif origin_asn not in EXPECTED_ORIGINS.get(prefix, set()):
        return "info"  # Unexpected origin, but not RPKI-invalid
    return None
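To act on the severity that classify_severity returns, something has to decide which channel gets which event. The sketch below is one possible shape: the ROUTES table, the placeholder webhook URL, and the function names are all assumptions, and the payloads are limited to the basic fields of a Slack incoming webhook and a PagerDuty Events API v2 trigger. Planning is kept separate from sending so the routing logic can be tested without network access.

```python
import json
import urllib.request

# Hypothetical routing: which channels receive each severity.
ROUTES = {
    "critical": ("pagerduty", "slack"),
    "warning": ("slack",),
    "info": ("slack",),
}
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/REPLACE/ME"  # placeholder
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def plan_alerts(prefix, origin_asn, severity, routing_key):
    """Return a list of (url, payload) pairs for one classified event."""
    if severity is None:
        return []
    summary = (
        f"[{severity.upper()}] BGP anomaly: {prefix} "
        f"announced by AS{origin_asn}"
    )
    planned = []
    for channel in ROUTES.get(severity, ()):
        if channel == "slack":
            planned.append((SLACK_WEBHOOK_URL, {"text": summary}))
        elif channel == "pagerduty":
            planned.append((PAGERDUTY_EVENTS_URL, {
                "routing_key": routing_key,
                "event_action": "trigger",
                "payload": {
                    "summary": summary,
                    "source": "bgp-leak-monitor",  # hypothetical source name
                    "severity": severity,
                },
            }))
    return planned


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A caller would loop over plan_alerts(...) and pass each pair to post_json, ideally with error handling so a failed webhook does not stall the monitor.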
Deployment Considerations
We run this as a systemd service on a small VM colocated with our route servers. A few things we learned:
- Memory matters — pyBGPStream can consume significant memory if you are tracking a full table. Filter early and aggressively.
- Rate limit alerts — BGP convergence events can trigger hundreds of updates in seconds. Debounce your alerting with a 30-second window.
- Log everything — even if you do not alert on it. Historical BGP data is invaluable for post-incident analysis.
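The 30-second debounce can be as simple as remembering when each alert key last fired. This is a minimal sketch with an assumed class name and an injectable clock for testing; the key would typically be a (prefix, origin ASN) tuple. It never prunes old keys, which a long-running service would want to add.

```python
import time


class AlertDebouncer:
    """Suppress repeat alerts for the same key within a fixed window."""

    def __init__(self, window_seconds=30.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._last_sent = {}  # key -> time the last alert was allowed

    def should_alert(self, key):
        """Return True if an alert for key should be sent now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the window, stay quiet
        self._last_sent[key] = now
        return True


debouncer = AlertDebouncer()
key = ("192.0.2.0/24", 64501)
print(debouncer.should_alert(key))  # first event for this key
print(debouncer.should_alert(key))  # immediate repeat
```

Suppressed events do not extend the window here, so a leak that persists keeps producing one alert roughly every 30 seconds rather than going silent.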
The total cost is one small VM and a few hours of Python. Compared to the potential impact of an undetected route leak, it is a worthwhile investment.
What This Does Not Catch
This approach has limitations. It relies on public route collector visibility, so leaks that never reach a collector go undetected. It also cannot detect leaks where the origin ASN is correct but the AS path is wrong, for example a customer re-advertising one provider's routes to another. Catching those requires path validation techniques such as ASPA or BGPsec.
For most operational teams, RPKI validation plus origin monitoring covers the high-impact scenarios. Start there and add complexity as needed.