Why Compiled Skill Servers Outperform Flat File Distribution for Enterprise AI Workflow Deployments
This post describes an architectural pattern we developed and use internally. It is a starting point for discussion — not a product specification. The approach is novel and the trade-offs described here are specific to our deployment context.
Executive Summary
Enterprise AI workflows depend on two kinds of knowledge: tools (what the model can do) and skills (how the model should behave in a specific domain). Tools are well-served by MCP servers. Skills have typically been distributed as Markdown files on each user's local filesystem — an approach that loads the entire content into the context window at the start of every conversation, whether it is needed or not.
This paper describes an architectural pattern we developed to address this: embedding procedural skills directly inside an MCP server, serving them on demand via a tiered loading pattern. The result eliminates manual file distribution, synchronises skill updates with tool releases, reduces context consumption for conversations that do not invoke the skill, and provides a practical degree of intellectual property protection. These benefits come with real trade-offs that this paper addresses directly.
1. The Problem: Flat File Skill Distribution
The dominant pattern for distributing workflow instructions to AI assistants is straightforward: create a Markdown file describing the procedure and place it where your AI assistant will read it at startup. In Claude.ai, that typically means /mnt/skills/user/. In Claude Code, it means CLAUDE.md. The file loads into context at the start of every conversation.
For a simple one-page workflow, this is acceptable. For a complex enterprise procedure covering contract parsing, tax mapping, destination-split rules, seven structured item codes, and amendment edge cases — such a file easily exceeds 2,500 tokens. Those tokens are consumed in every conversation, including ones about unrelated tasks. Context compaction occurs earlier. Long conversations lose critical information.
There is a second problem: distribution. If your workflow encodes institutional knowledge — the specific nuances your team has refined over years of operating in a domain — distributing that knowledge to every employee as a readable Markdown file on a local filesystem is neither efficient nor secure.
2. The Approach: Skills as MCP Tools
The Model Context Protocol exposes three primitives: Tools (callable functions), Resources (readable data), and Prompts (reusable prompt templates). Our approach uses the Tools primitive to implement a skill server with three operations:
- Discovery: returns an inventory of available skill names and descriptions. Minimal load (a few dozen tokens); called first to learn what is available.
- Summary (get_skill): returns a compact summary (~500 tokens) containing core rules, workflow steps, and key parameters, designed to cover the majority of use cases without loading the full reference.
- Detail: returns the full detailed reference, optionally filtered to a named section; called only when the compact summary is insufficient for the task at hand.
The skill content is stored as TypeScript string constants inside the MCP server repository — organised into a compact summary and named detail sections. The server compiles to JavaScript and distributes like any other npm package.
The governing principle is tiered loading: load the minimum first, and load more only if needed. Most conversations invoking a complex workflow need only the compact summary. Full detail (code examples, exhaustive reference tables, edge case rules) is retrieved only when a specific ambiguity demands it.
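The three operations above can be sketched as plain functions over string constants. This is a simplified illustration, not the actual server: the skill names, section names, and content strings are hypothetical, and the real implementation would wire these handlers into the MCP SDK rather than expose them directly.

```typescript
// Hypothetical skill content, stored as string constants in the server repo.
const SKILL_SUMMARY =
  "Core rules: map each contract line to one of seven item codes; " +
  "apply destination-split rules before tax mapping.";

const SKILL_SECTIONS: Record<string, string> = {
  "tax-mapping": "Full tax mapping reference for all seven item codes.",
  "amendments": "Amendment edge cases: void and re-issue on contract change.",
};

// Tier 1: inventory of available skills (a few dozen tokens).
function listSkills(): { name: string; description: string }[] {
  return [{ name: "xero-workflow", description: "Contract-to-invoice procedure" }];
}

// Tier 2: compact summary (~500 tokens in the real server).
function getSkill(name: string): string {
  if (name !== "xero-workflow") throw new Error(`unknown skill: ${name}`);
  return SKILL_SUMMARY;
}

// Tier 3: full detail, optionally filtered to one named section.
function getSkillDetail(name: string, section?: string): string {
  if (name !== "xero-workflow") throw new Error(`unknown skill: ${name}`);
  if (section === undefined) return Object.values(SKILL_SECTIONS).join("\n\n");
  const text = SKILL_SECTIONS[section];
  if (text === undefined) throw new Error(`unknown section: ${section}`);
  return text;
}
```

A conversation typically stops at tier 2; tier 3 is reached only when the summary leaves a specific rule ambiguous.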
3. Benefits Analysis
Context Efficiency
In Claude.ai, skill files in /mnt/skills/user/ are injected into the system prompt at conversation start, so a 2,500-token file consumes 2,500 tokens of the window in every session regardless of the task. With the MCP server, nothing loads upfront: the model calls get_skill only when a domain task is invoked, loading ~500 tokens instead of 2,500.
Zero Distribution Overhead
Every employee who installs the MCP server npm package automatically receives the skills — no separate file copy, no directory to manage, no onboarding process that includes Markdown file installation instructions. The skill and the tools it references travel together in the same package.
Version Synchronisation
Skill content and tool implementation share a version number. When a new Xero tool is added and the workflow skill is updated to reference it, they publish together. Employees running npm update receive both changes atomically. There is no possible state where a skill references a tool that does not yet exist in the installed version.
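One way to make this coupling concrete is to pin the skill text to the same version string the package ships with. The sketch below is illustrative: the version, tool names, and summary text are hypothetical, and a real server would read the version from package.json at build time rather than hard-code it.

```typescript
// Single source of truth for the package version (read from package.json
// at build time in a real server; hard-coded here for illustration).
const PACKAGE_VERSION = "2.4.0";

// Hypothetical tools shipped in this version of the package.
const TOOLS = ["create_invoice", "apply_destination_split"];

function skillSummary(): string {
  // The skill may safely reference any tool in TOOLS: both the skill text
  // and the tool implementations are pinned to PACKAGE_VERSION and
  // publish together in the same npm release.
  return `[v${PACKAGE_VERSION}] Use ${TOOLS.join(", then ")}.`;
}
```

Because the skill string and the tool registry compile into the same artifact, a skill referencing a tool absent from the installed version cannot occur by construction.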
Intellectual Property Protection
A Markdown file on an employee's filesystem can be read in any text editor. Compiled JavaScript distributed via npm is substantially less readable. The institutional knowledge encoded in the skill logic — the domain-specific nuances your team has developed — is obscured from end users while remaining fully readable by the AI being served.
4. Trade-offs and Weaknesses
This approach carries real disadvantages that any practitioner should evaluate before adopting it.
| Dimension | Flat File (Markdown) | MCP Skill Server |
|---|---|---|
| Content update | Edit the file directly, no build step | Edit .ts, build, publish |
| Employee-readable | Fully readable in any editor | Compiled JS — obscured but not encrypted |
| Efficiency in Claude Code | Loaded via CLAUDE.md or @ reference | Tool call loads same content — identical footprint |
| Efficiency in Claude.ai | Injected into system prompt at startup | Loaded only on demand, tiered loading |
| Learning curve | Markdown — everyone can write it | TypeScript, MCP SDK, npm tooling required |
| Distribution | Manual copy per employee | Automatic via npm install |
| Version sync | None — tools and skills are decoupled | Atomic — same package version |
In Claude Code sessions, calling get_skill via the MCP tool loads content into the context window exactly as reading a local file would. The efficiency benefit applies primarily to Claude.ai desktop conversations, where skills were previously pre-loaded into the system prompt from the local filesystem. If your workflow is primarily Claude Code-driven, the token advantage is minimal.
5. The Alternative: The MCP Prompts Primitive
The Model Context Protocol defines three first-class primitives. We implemented our skill server using the Tools primitive, but the Prompts primitive is actually more idiomatic for this use case. MCP Prompts allow servers to expose named, reusable prompt templates that integrate natively into AI assistant interfaces.
In Claude.ai, MCP Prompts surface as slash commands — /dnd-xero-workflow — invocable directly from the chat interface without the model needing to call a discovery tool. The Prompts approach is cleaner from a protocol perspective, but it offers less control over tiered loading. The Tools primitive we use enables the summary/section-detail model that Prompts-based loading makes harder to implement.
Tools Primitive: Flexible Tiered Loading
Full control over granularity: compact summary first, specific sections on demand. Requires explicit tool calls for discovery and loading. Works in any MCP-compatible client.
Prompts Primitive: Native Interface Integration
Skills surface as slash commands in Claude.ai. Discovery and invocation without a separate discovery tool. Less flexibility in loading granularity — the skill typically loads in full.
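The structural difference is visible in the Prompts primitive's wire shapes. The interfaces below follow the MCP specification in simplified form; the prompt name and placeholder text are illustrative.

```typescript
// Simplified shapes of the MCP Prompts primitive. A prompts/list response
// advertises named templates; prompts/get returns expanded messages.
interface PromptInfo {
  name: string; // surfaces in the client UI, e.g. as a slash command
  description?: string;
  arguments?: { name: string; description?: string; required?: boolean }[];
}

interface PromptMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

// Contrast with the Tools approach: there is one retrieval step, so the
// whole skill typically loads in full, with no summary/detail tiering.
function getPrompt(name: string): { description?: string; messages: PromptMessage[] } {
  return {
    description: `Workflow skill '${name}', loaded in one shot`,
    messages: [{ role: "user", content: { type: "text", text: "(full skill text)" } }],
  };
}
```

A single prompts/get call returning the complete skill is exactly what makes the Prompts route simpler for clients and harder to tier.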
6. Practical Implications
This pattern is most valuable in cases that combine several conditions. If your organisation meets most of these criteria, the implementation investment is likely worthwhile:
- Complex, high-volume workflows. One-page skills in CLAUDE.md are over-engineering. Multi-thousand-token procedures that apply to some sessions but not others are the target use case.
- Distributed team. One employee. One filesystem. One Markdown file. That is manageable. Ten employees with unsynchronised skill updates is the problem this pattern solves.
- Proprietary institutional knowledge. If the workflow logic represents intellectual property you do not want exposed in plain text on employee machines, compiled JavaScript is preferable to a readable Markdown file.
- Strong tool-skill coupling. When skills reference specific tools by name — and those tools change with versions — atomic delivery prevents version drift.
- Primarily Claude.ai (not Claude Code) usage. The context efficiency benefit only applies to environments where skills were previously pre-loaded from the filesystem.
Conclusion
Flat file distribution works until it doesn't. For single-user deployments with simple skills, a Markdown file remains the correct approach — less complexity, direct editing, no build cycle. For team deployments with complex, proprietary workflows and Claude.ai desktop environments, the MCP skill server approach solves several problems simultaneously.
This is not a widely adopted pattern. We found no direct prior art in public MCP server literature — most existing skill-serving implementations serve facts (memory skills) or reference documentation (resource skills), not procedural workflow knowledge. The tiered loading pattern in particular — compact summary first, section on demand second — is our own contribution to this problem.
As with any architectural choice, the right decision depends on context. This paper was written to help practitioners evaluate that context honestly.
A skill in a flat file is knowledge you distribute. A skill in an MCP server is knowledge you deploy. The difference — version control, atomic delivery, IP protection, on-demand loading — only matters when distribution becomes a problem. Build accordingly.