
A network MCP

What a network MCP server is, why multi-device correlation is the real value, and the hard problems: two paths for different use cases, command whitelists as a safety model, and auth delegation.


A BGP session has been flapping for 20 minutes. You need to check neighbor state on both peers, compare interface counters on both ends of the link, pull logs from the router and the firewall, and cross-reference device inventory. That’s four SSH sessions, a dozen show commands, and 15 minutes of context-switching before you even start diagnosing.

The value of a network MCP server isn’t running one command on one device — you can SSH for that. It’s an agent doing the entire correlation in one invocation: querying across devices, collecting the data, and presenting a coherent picture. Fifteen minutes of manual work compressed into a single request.

The concept

MCP (Model Context Protocol) is an open standard that gives LLM agents a structured way to call tools on external systems. A network MCP server exposes device operations — state queries, interface checks, routing tables, log retrieval — as tools an agent can invoke. The agent calls a tool with typed parameters, gets back structured data, and decides what to do next.
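The tool-call pattern can be sketched in a few lines. MCP describes tool inputs with JSON Schema; the tool name, schema, and stubbed result below are hypothetical, and a real server would register the tool through an MCP SDK rather than dispatch by hand:

```python
# Minimal sketch of an MCP-style tool: typed parameters in, structured
# data out. Tool name and fields are illustrative, not from any real server.
TOOL_SCHEMA = {
    "name": "get_interface_counters",
    "description": "Return error counters for one interface on one device",
    "inputSchema": {
        "type": "object",
        "properties": {
            "device": {"type": "string"},
            "interface": {"type": "string"},
        },
        "required": ["device", "interface"],
    },
}

def call_tool(name: str, args: dict) -> dict:
    """Validate typed arguments, run the tool, return structured data."""
    required = TOOL_SCHEMA["inputSchema"]["required"]
    missing = [k for k in required if k not in args]
    if missing:
        return {"error": f"missing parameters: {missing}"}
    # Stubbed result -- a real server would query the device here.
    return {
        "device": args["device"],
        "interface": args["interface"],
        "input_errors": 0,
        "output_drops": 0,
    }
```

The agent never sees SSH or screen-scraping details; it sees a typed contract it can reason about.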

An IETF Internet-Draft is already exploring MCP for network management. The industry is paying attention.

The interesting question isn’t the protocol. It’s the problems underneath.

Two paths

The industry frames network MCP around YANG models and NETCONF. That works — for some use cases. The mistake is thinking it’s the only path.

Well-defined use cases

Self-service VLAN provisioning, interface description updates, static route management. You know the inputs, you know the YANG model, you build a structured MCP tool. The tool is the expertise — typed parameters, validated inputs, predictable outputs.
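What "the tool is the expertise" means in practice: the validation rules live in code, not in the agent's judgment. A sketch of input validation for a hypothetical VLAN provisioning tool (the naming rule and reserved-ID list are illustrative; a real tool would derive constraints from the platform's YANG model):

```python
import re

# Hypothetical constraints for a self-service VLAN tool.
VLAN_NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,32}$")

def validate_vlan_request(vlan_id: int, name: str) -> list[str]:
    """Return a list of validation errors; empty means the request is well-formed."""
    errors = []
    if not 1 <= vlan_id <= 4094:                 # 802.1Q VLAN ID range
        errors.append(f"vlan_id {vlan_id} outside 1-4094")
    elif vlan_id in (1, 1002, 1003, 1004, 1005):  # reserved on some platforms
        errors.append(f"vlan_id {vlan_id} is reserved")
    if not VLAN_NAME_RE.match(name):
        errors.append("name must be 1-32 chars of [A-Za-z0-9_-]")
    return errors
```

Bad input is rejected before anything touches a device, regardless of what the agent asked for.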

YANG + NETCONF is the right path here.

Open-ended use cases

Network diagnostics. Troubleshooting. “Why is this circuit degraded?” You can’t anticipate every diagnostic path. One problem needs show ip bgp summary, another needs show interfaces counters errors, another needs show logging | include OSPF.

CLI is the right path here.

For diagnostics, give the agent a whitelisted set of CLI commands and let skills describe when and how to use them. The MCP tool becomes the transport — run this command on this device, return the output. The expertise lives in the skill — the human-documented runbook that tells the agent what to look for and how to interpret it.
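A skill, in this model, is nothing more exotic than a human-written runbook the agent reads before choosing commands. A hypothetical excerpt:

```markdown
## Skill: BGP session flap

When a BGP session is flapping:

1. Run `show ip bgp summary` on both peers. A neighbor cycling between
   Idle/Active and Established confirms the flap from both sides.
2. Run `show interfaces <link> counters errors` on both ends of the link.
   Rising input or CRC errors point at the physical layer.
3. Run `show logging | include BGP` on both devices. Hold-timer expiry
   suggests a protocol or congestion issue; interface down events
   suggest layer 1.
```

The MCP server stays dumb and safe; the runbook carries the team's diagnostic knowledge and can be updated without touching code.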

The SSH route needs jump server support. Most production networks don’t expose device management directly. The MCP server connects through a bastion host, the same way your NOC team does — SSH to the jump server, then to the target device. The agent follows the same path, same access controls.
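For OpenSSH, the bastion hop is the `-J` (ProxyJump) option. A sketch of building such an invocation, with hypothetical hostnames (a production server would more likely use an SSH library such as Paramiko or Netmiko, with the same jump-host topology):

```python
def build_ssh_command(jump_host: str, device: str, command: str) -> list[str]:
    """Build an OpenSSH invocation that reaches the device via a bastion.

    -J is OpenSSH's ProxyJump option: the client tunnels through
    jump_host to the target, so the agent follows the same path and
    access controls as a human operator.
    """
    return [
        "ssh",
        "-J", jump_host,
        "-o", "BatchMode=yes",       # fail rather than prompt for input
        "-o", "ConnectTimeout=10",
        device,
        command,                     # remote show command to execute
    ]
```

The resulting argument list would be handed to `subprocess.run` (or replaced entirely by a library call); the point is that the agent inherits the existing access path rather than needing direct management-plane exposure.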

The whitelist as a safety model

For read-only diagnostics, a command whitelist is the safety model. The agent can run anything on the whitelist without human approval. Anything not on the list gets blocked.

This is simpler than building a risk classification engine. You maintain a list of safe show and display commands per platform. The MCP server enforces it. No ambiguity, no edge cases in risk scoring.
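Enforcement can be a prefix match against a per-platform list. The entries below are illustrative, not a recommended whitelist:

```python
# Hypothetical per-platform whitelists. Matching is by command prefix,
# so "show ip bgp summary" is covered by the "show ip bgp" entry.
WHITELIST = {
    "cisco_ios": ["show ip bgp", "show interfaces", "show logging", "show version"],
    "junos":     ["show bgp summary", "show interfaces", "show log messages"],
}

def check_command(platform: str, command: str) -> bool:
    """Allow a command only if it starts with a whitelisted prefix."""
    allowed = WHITELIST.get(platform, [])
    cmd = " ".join(command.split())  # normalize whitespace before matching
    return any(cmd == p or cmd.startswith(p + " ") for p in allowed)
```

Everything not matched is blocked by default, including commands on platforms the server doesn't know about.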

Write operations are a different problem. Anything that changes state — configure terminal, write memory, route policy modifications — needs approval gates and human sign-off. The read/write boundary is the natural place to draw the line. Start read-only, get value immediately, and add write operations later behind explicit approval flows.

Auth delegation

The hardest unsolved problem: how does an agent act on behalf of a human?

When you SSH into a router, you authenticate as yourself. The audit log says your username ran those commands. When an agent does it, the questions stack up:

  • Whose credentials does the agent use?
  • Does the agent get its own service account, or carry the human’s token?
  • If the agent’s session is compromised, how do you revoke it without killing the human’s access?
  • How does the device know which human authorized the session?

The MCP spec includes OAuth 2.1 with scope-based access control. That handles app-level authorization. But the identity delegation chain — tracing an agent’s actions back to a specific person with their specific authorization level — is the part the industry hasn’t figured out yet.
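One building block that does exist is OAuth 2.0 Token Exchange (RFC 8693), whose `act` (actor) claim records who is acting on whose behalf. A sketch of what a delegated token's claims could carry, with hypothetical identities:

```python
# Claims for a token the agent would present: `sub` is the human who
# authorized the session, `act` (the RFC 8693 actor claim) identifies
# the agent acting on their behalf, `scope` bounds what it may do.
delegated_claims = {
    "sub": "alice@example.net",       # the authorizing human
    "scope": "network:read",          # read-only diagnostics
    "act": {"sub": "mcp-agent-07"},   # the agent presenting the token
}

def audit_identity(claims: dict) -> str:
    """Render the delegation chain for an audit log line."""
    actor = claims.get("act", {}).get("sub", claims["sub"])
    return f"{actor} on behalf of {claims['sub']} (scope: {claims['scope']})"
```

What's missing is the end-to-end plumbing: getting that chain issued, honored by network devices, and revocable per-agent without touching the human's own credentials.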

This matters for compliance. In regulated environments, “an AI agent made this change” is not an acceptable audit entry. It needs to be “this agent, authorized by this person, following this skill, made this change.”

The audit trail advantage

Here’s the counterintuitive part: an agent produces a better audit trail than a human SSH session.

When you troubleshoot manually, the record is whatever you remember to document afterward. Maybe you copy-paste some outputs into a ticket. Maybe you don’t.

An agent logs everything by design: which commands it chose, which skill guided the decision, the raw output from each device, and its interpretation. Every step is traceable. For change management, compliance, and post-incident review, that’s a significant upgrade over “I SSH’d in and fixed it.”
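A structured audit entry falls out of the tool-call loop almost for free. The field names below are illustrative:

```python
import json
from datetime import datetime, timezone

def audit_record(device: str, command: str, skill: str,
                 output: str, interpretation: str) -> str:
    """One structured, searchable JSON entry per tool invocation."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "device": device,
        "command": command,          # exactly what ran
        "skill": skill,              # which runbook guided the choice
        "raw_output": output,        # untruncated device output
        "interpretation": interpretation,  # what the agent concluded
    })
```

Every diagnostic step becomes a log line you can grep, aggregate, and attach to an incident ticket.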

The challenge is making this audit trail accessible — structured, searchable, and tied to the identity delegation chain so you can trace any action back to the human who authorized it.

Where this goes

Start with read-only. A whitelisted set of diagnostic commands, skills that encode your team’s troubleshooting knowledge, and SSH connectivity through your existing jump servers. That alone replaces the 15-minute manual correlation workflow.

Write operations, approval gates, and the auth delegation problem come next — but you don’t need to solve them to get value today.
