
Replacing Legacy ACLs with Infrastructure as Code

Moving from manual ACL management to declarative infrastructure as code using Terraform and Python, eliminating the risk of fat-finger mistakes at 2am.


The Problem with Manual ACL Management

If you have spent any time managing network access control lists by hand, you know the pain. A typical enterprise Cisco router might have hundreds of ACL entries spread across multiple interfaces. Each change requires an SSH session, careful typing, and a prayer that you did not accidentally lock yourself out.

Consider a typical day: a ticket comes in asking to allow TCP port 443 from the new developer subnet to the staging environment. You SSH into the router and type:

router# configure terminal
router(config)# ip access-list extended STAGING-INBOUND
router(config-ext-nacl)# 150 permit tcp 10.50.0.0 0.0.3.255 host 10.100.5.20 eq 443
router(config-ext-nacl)# end
router# write memory

Simple enough. But multiply this by dozens of routers, factor in change windows, peer review requirements, and the occasional emergency rollback, and you have a process that does not scale.

Why Infrastructure as Code

Infrastructure as Code (IaC) treats network configuration the same way software engineers treat application code: version-controlled, peer-reviewed, tested, and automatically deployed.

The benefits are immediate:

Every change is tracked in Git. Every deployment is repeatable. Every rollback is a git revert away.

Here is what that same ACL change looks like in Terraform using the Cisco IOS provider:

resource "cisco_ios_access_list_extended" "staging_inbound" {
  name = "STAGING-INBOUND"

  entry {
    sequence         = 150
    action           = "permit"
    protocol         = "tcp"
    source           = "10.50.0.0 0.0.3.255"
    destination      = "host 10.100.5.20"
    destination_port = "eq 443"
    remark           = "TICKET-4521: Allow dev subnet to staging HTTPS"
  }
}

Building the Pipeline

Step 1: Extract Current ACLs

First, we need to capture the current state. A Python script using Netmiko does the heavy lifting:

from netmiko import ConnectHandler
import json
import os

device = {
    "device_type": "cisco_ios",
    "host": "core-rtr-01.lab.internal",
    "username": "netops",
    # Credentials are injected from the vault at runtime via the environment;
    # Netmiko has no vault integration of its own.
    "password": os.environ["NETOPS_PASSWORD"],
}

def extract_acls(device_params: dict) -> list:
    """Connect to the device and return its ACLs, parsed into dicts by TextFSM."""
    conn = ConnectHandler(**device_params)
    output = conn.send_command("show ip access-lists", use_textfsm=True)
    conn.disconnect()
    return output

acls = extract_acls(device)
with open("baseline_acls.json", "w") as f:
    json.dump(acls, f, indent=2)
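The same function scales out to the rest of the fleet. A minimal sketch, assuming a hard-coded inventory list purely for illustration (in practice the host list would come from whatever inventory source you already trust):

os.makedirs("baselines", exist_ok=True)

# Hypothetical inventory; swap in your real source of truth.
fleet = ["core-rtr-01.lab.internal", "core-rtr-02.lab.internal"]

for host in fleet:
    params = {**device, "host": host}
    # One baseline file per device keeps later diffing and reconciliation simple.
    with open(f"baselines/{host}.json", "w") as f:
        json.dump(extract_acls(params), f, indent=2)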

Step 2: Define the Desired State

We store ACL definitions as structured YAML files:

# acls/staging-inbound.yaml
name: STAGING-INBOUND
interface: GigabitEthernet0/1
direction: in
entries:
  - seq: 100
    action: permit
    protocol: tcp
    source: 10.10.0.0/22
    destination: 10.100.5.0/24
    port: 443
    remark: "Production web access"
  - seq: 150
    action: permit
    protocol: tcp
    source: 10.50.0.0/22
    destination: host 10.100.5.20
    port: 443
    remark: "TICKET-4521: Dev subnet staging access"
  - seq: 999
    action: deny
    protocol: ip
    source: any
    destination: any
    log: true
    remark: "Implicit deny with logging"

Step 3: Generate and Apply Configurations

A Jinja2 template renders the YAML into Cisco IOS commands:

from jinja2 import Environment, FileSystemLoader
import yaml

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("acl_extended.j2")

with open("acls/staging-inbound.yaml") as f:
    acl_data = yaml.safe_load(f)

config = template.render(acl=acl_data)
print(config)

The rendered output:

ip access-list extended STAGING-INBOUND
 remark Production web access
 100 permit tcp 10.10.0.0 0.0.3.255 10.100.5.0 0.0.0.255 eq 443
 remark TICKET-4521: Dev subnet staging access
 150 permit tcp 10.50.0.0 0.0.3.255 host 10.100.5.20 eq 443
 remark Implicit deny with logging
 999 deny ip any any log
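Notice the translation from CIDR in the YAML (10.50.0.0/22) to IOS wildcard masks (0.0.3.255) in the rendered config. The template itself is not shown here, but the conversion is easy to handle with a custom Jinja2 filter along these lines (cidr_to_wildcard is a hypothetical helper, not a built-in filter):

import ipaddress

def cidr_to_wildcard(value: str) -> str:
    """Render '10.50.0.0/22' as '10.50.0.0 0.0.3.255'; pass 'any' and 'host x.x.x.x' through."""
    if "/" not in value:
        return value
    net = ipaddress.ip_network(value, strict=False)
    return f"{net.network_address} {net.hostmask}"

# Registered on the environment from the snippet above, the template can then
# write {{ entry.source | cidr_to_wildcard }} for sources and destinations.
env.filters["cidr_to_wildcard"] = cidr_to_wildcard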

Step 4: CI/CD Integration

We wire this into a GitHub Actions pipeline that validates syntax, runs against a lab environment, waits for approval, then deploys to production:

# .github/workflows/acl-deploy.yml
name: ACL Deployment
on:
  push:
    paths: ['acls/**']
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        # Assumes netmiko, jinja2, and pyyaml are pinned in a requirements file.
        run: pip install -r requirements.txt
      - name: Lint ACL definitions
        run: python scripts/validate_acls.py
      - name: Dry-run against lab
        run: python scripts/deploy.py --target lab --dry-run
  deploy:
    needs: validate
    environment: production
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Apply to production
        run: python scripts/deploy.py --target production
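deploy.py is likewise referenced but not shown. A minimal sketch of the dry-run behaviour it implies, reusing the rendering logic from Step 3 (the TARGETS map, the credential handling, and render_acl are assumptions made for illustration, not the actual script):

# scripts/deploy.py (sketch)
import argparse
import os
from pathlib import Path

import yaml
from jinja2 import Environment, FileSystemLoader
from netmiko import ConnectHandler

# Hypothetical per-environment inventory; a real script would read this from config.
TARGETS = {
    "lab": ["lab-rtr-01.lab.internal"],
    "production": ["core-rtr-01.lab.internal"],
}

def render_acl(path: Path) -> str:
    """Render one YAML definition into IOS commands, as in Step 3."""
    env = Environment(loader=FileSystemLoader("templates"))
    return env.get_template("acl_extended.j2").render(acl=yaml.safe_load(path.read_text()))

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", choices=TARGETS, required=True)
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()

    configs = [render_acl(p) for p in sorted(Path("acls").glob("*.yaml"))]
    for host in TARGETS[args.target]:
        if args.dry_run:
            # Dry run: print what would be pushed and never touch the device.
            print(f"--- {host} ---")
            print("\n".join(configs))
            continue
        conn = ConnectHandler(device_type="cisco_ios", host=host, username="netops",
                              password=os.environ["NETOPS_PASSWORD"])
        for config in configs:
            conn.send_config_set(config.splitlines())
        conn.save_config()
        conn.disconnect()

if __name__ == "__main__":
    main()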

Results

After migrating to IaC, we saw measurable improvements:

  • Change lead time dropped from 4 hours to 15 minutes
  • Rollback time dropped from “whatever it takes” to a single git revert
  • Configuration drift eliminated through periodic reconciliation
  • Audit compliance became trivial with Git history as the source of truth

Lessons Learned

The hardest part was not the tooling. It was convincing the team that treating network configs like code was worth the initial investment. Once the first emergency rollback took 30 seconds instead of 30 minutes, the skeptics came around.

Start small. Pick one ACL on one router. Automate it end to end. Show the results. Then scale.
