What an AI-native operating system actually is.
Five layers. Every install ships all of them. Skipping any of them is the difference between AI that lifts the P&L and AI that sits in a license agreement. This page is the technical reference; if you want the operator-fluent overview, the Method page is shorter.
Five layers, top to bottom.
The compute substrate.
Frontier models (Claude, GPT, Gemini) for high-judgment work. Self-hosted Llama-class models for high-volume, low-margin work. Vertical-specialty models (CoCounsel, Harvey, Med-PaLM) where they meaningfully beat frontier for the domain.
Model selection is the substrate of everything above. Wrong model for the job and every layer above suffers — agents hallucinate more, knowledge retrieval fails on edge cases, governance gets harder to defend.
Most firms run on whatever ChatGPT or Copilot defaults to. That works for the easy 60% and silently breaks on the hard 40% where the workflow actually pays back.
- Anthropic Claude Opus/Sonnet
- OpenAI GPT-5
- Llama 3.3 (self-hosted)
- Domain models (CoCounsel, Harvey, etc.)
Your firm's IP, made retrievable.
Proprietary knowledge bases built from your engagement files, working papers, historical decisions, partner workproduct, and operating playbooks. Embedded into a vector store, retrieval tuned to your domain's terminology, scoped per-user via your existing permission model.
Without your knowledge in retrieval, models hallucinate around your firm's positions, partners stay the bottleneck for substantive questions, and new staff take twice as long to ramp. The knowledge layer is what makes generic AI specifically valuable to your firm.
You're paying frontier-model prices for generic answers your firm's senior people would never have given. The model layer alone can't fix this.
- Pinecone / pgvector
- LlamaIndex
- Anthropic file API
- Custom embedding fine-tunes
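A toy sketch of the retrieval shape, with the permission filter applied before ranking. The vectors and ACLs are hard-coded placeholders; a real install uses an embedding model and a vector store (pgvector, Pinecone, etc.):

```python
# Permission-scoped retrieval: ACL filter first, similarity ranking second.
import math

DOCS = {
    "memo-2021-tax":   {"vec": [0.9, 0.1, 0.0], "acl": {"partners"}},
    "intake-playbook": {"vec": [0.1, 0.8, 0.1], "acl": {"partners", "staff"}},
    "close-checklist": {"vec": [0.0, 0.2, 0.9], "acl": {"partners", "staff"}},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, user_groups, k=2):
    # Scope to what this user may see, then rank by similarity.
    visible = {d: m for d, m in DOCS.items() if m["acl"] & user_groups}
    ranked = sorted(visible, key=lambda d: cosine(query_vec, visible[d]["vec"]),
                    reverse=True)
    return ranked[:k]

# Staff never see the partner-only memo, however relevant it scores.
print(retrieve([1.0, 0.0, 0.0], {"staff"}))
```

Filtering before ranking is the design choice that makes "scoped per-user via your existing permission model" literal rather than aspirational.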
Models that act, not just talk.
Autonomous routines that execute named workflows end-to-end — drafting a memo, processing intake, reconciling a close, answering inbound. Each agent has a defined tool surface, evaluation suite, and human-review checkpoints where the workflow demands them.
Without agents, the model is a research assistant. With agents, the model is a junior associate who works 24/7 and never forgets. The agent layer is what moves the P&L — chat doesn't.
Productivity lift caps out at ~5–10% — the share of work where a senior person was happy to use a chatbot. The rest stays manual.
- Anthropic Computer Use
- Vercel AI SDK + tool use
- Inngest / Trigger.dev (orchestration)
- LangGraph (when state machines are warranted)
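The skeleton of that design in miniature: a declared tool surface, a step loop, and a human-review checkpoint before any irreversible action. The tools and the review callback are stubs for illustration; a real agent drives this loop with LLM tool use:

```python
# Minimal agent skeleton with a fixed tool surface and review checkpoint.
IRREVERSIBLE = {"send_email"}  # steps that demand human sign-off

TOOLS = {
    "draft_memo": lambda text: f"DRAFT: {text}",
    "send_email": lambda text: f"SENT: {text}",
}

def run_agent(plan, approve):
    """plan: list of (tool_name, arg); approve: human-review callback."""
    log = []
    for tool, arg in plan:
        if tool not in TOOLS:
            raise ValueError(f"tool outside declared surface: {tool}")
        # Checkpoint: irreversible steps require explicit approval.
        if tool in IRREVERSIBLE and not approve(tool, arg):
            log.append(("skipped", tool))
            continue
        log.append(("done", TOOLS[tool](arg)))
    return log

# A reviewer who rejects everything: the draft happens, the send does not.
print(run_agent([("draft_memo", "Q3 close"), ("send_email", "Q3 close")],
                approve=lambda tool, arg: False))
```

The tool surface being an explicit allowlist, rather than "whatever the model asks for", is what makes the evaluation suite and the audit story tractable later.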
The business processes themselves — rebuilt.
The actual operational workflows in scope, rebuilt around the agent layer. Some workflows get fully encoded; some get reshaped into hybrid human+agent loops; a few stay manual because the math doesn't favor automation. The Install spec says which, by name.
Layers 1–3 don't matter if the workflow underneath is broken. A poorly-designed process automated is still a poorly-designed process — just faster. The workflow layer is where most consultants stop and most install attempts fail.
You've automated yesterday's workflow. The productivity lift is real but capped at the ceiling of the old design.
- Internal workflow rebuild playbook
- Vertical workflow templates (CPA close, law intake, HVAC dispatch, etc.)
- BPMN modeling (for complex redesigns)
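The "which workflows, by name" decision reduces to back-of-envelope math per workflow. The thresholds and figures below are illustrative placeholders, not the Install's actual triage rule:

```python
# Automation triage: fully encode, run hybrid, or leave manual.
def triage(annual_hours, hourly_rate, build_cost, reliability):
    """reliability: expected fraction of runs needing no human touch."""
    annual_value = annual_hours * hourly_rate * reliability
    if annual_value < build_cost * 0.5:
        return "manual"         # the math doesn't favor automation
    if reliability >= 0.9:
        return "fully-encoded"  # agent runs end-to-end
    return "hybrid"             # human+agent loop with checkpoints

print(triage(annual_hours=400, hourly_rate=150, build_cost=30000, reliability=0.95))
print(triage(annual_hours=400, hourly_rate=150, build_cost=30000, reliability=0.60))
print(triage(annual_hours=20,  hourly_rate=150, build_cost=30000, reliability=0.95))
```

Running every in-scope workflow through a rule like this, and writing the verdicts down, is what turns "some stay manual" from a shrug into a spec.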
Productivity that isn't measured isn't productivity. Compliance that isn't logged isn't compliance.
Instrumentation for every workflow's KPIs. Audit logging for every model interaction. Data-routing controls for regulated workloads. Audit-trail evidence for peer review, regulatory exam, or just internal sanity. The dashboard your operations team actually checks.
Without observability, you have no idea whether the install is working. Without governance, you can't defend the install in a regulatory exam or peer review. These aren't afterthoughts — they're the layer that lets the install actually live in a real firm.
You're flying blind on a system that touches client data. That posture works until the first incident, at which point it doesn't.
- Sentry / Datadog (observability)
- Custom KPI dashboards
- Audit-log pipelines to SIEM
- Governance memo template (legal-ready)
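A sketch of the audit-logging shape: every model interaction leaves an entry with timestamp, caller, and content hashes rather than raw text, so the trail is defensible without becoming a second copy of client data. The wrapper and the stand-in model are illustrative:

```python
# Audit-logging wrapper: every model call leaves a trail entry.
import hashlib
import time

AUDIT_LOG = []  # in production this streams to a SIEM, not an in-memory list

def audited(model_call):
    """Wrap a model call so every interaction is recorded."""
    def wrapper(user, prompt):
        entry = {
            "ts": time.time(),
            "user": user,
            # Hash, don't store, the content.
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        }
        result = model_call(user, prompt)
        entry["response_sha256"] = hashlib.sha256(result.encode()).hexdigest()
        AUDIT_LOG.append(entry)
        return result
    return wrapper

@audited
def fake_model(user, prompt):
    return f"answer to: {prompt}"  # stand-in for a real model call

fake_model("partner-1", "summarize engagement status")
print(len(AUDIT_LOG))  # every call logged
```

Hashing instead of storing is one defensible posture for regulated workloads; some exams want the full text retained instead, which is exactly the kind of call the governance memo pins down.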
Why the order matters.
The numbering is the dependency order. Layer 5 depends on Layer 4 (you can't instrument a workflow that doesn't exist). Layer 4 depends on Layer 3 (you can't rebuild a workflow around agents you haven't built). Layer 3 depends on Layer 2 (agents without your knowledge are generic). Layer 2 depends on Layer 1 (knowledge retrieval that uses the wrong model is just slow).
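The chain can be written down and checked mechanically. The layer names and the checker below are illustrative:

```python
# Layer dependencies made explicit: each layer lists what must exist first.
DEPENDS_ON = {
    "models": [],
    "knowledge": ["models"],
    "agents": ["knowledge"],
    "workflows": ["agents"],
    "governance": ["workflows"],
}

def valid_order(sequence):
    """True if every layer in sequence has its dependencies installed first."""
    installed = set()
    for layer in sequence:
        if any(dep not in installed for dep in DEPENDS_ON[layer]):
            return False
        installed.add(layer)
    return True

print(valid_order(["models", "knowledge", "agents", "workflows", "governance"]))
print(valid_order(["governance", "models"]))  # nothing to instrument yet
```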
Most firms stop at Layer 1. Some make it to Layer 2 (Copilot + a SharePoint search). The few that ship Layers 3+ in-house do so over years and at multiples of the externally-driven Install cost.
Book an Ops Call.
30 minutes. Operator-to-operator. No deck. No follow-up nurture sequence designed to wear you down.
Book an Ops Call →