AIR-GAPPED AI

Air-gapped AI for classified environments and regulated industries

Fully disconnected AI for classified environments, hard data-residency rules, and regulators that won't tolerate any cloud-LLM connection. Onyx plus a private LLM (vLLM or Ollama) deployed inside your air-gapped network: no outbound internet required, full audit trails, FedRAMP-aligned controls.

0

Outbound internet calls at runtime. Prompts, embeddings, model weights, and chat history all stay inside your perimeter.

IL5+

Aligned controls for the environments we’ve shipped to: FedRAMP High, DoD IL4 / IL5, GovCloud, Azure Government, and sovereign EU and UK clouds.

100%

On-prem open-weight model serving (Llama, Mistral, Qwen, DeepSeek). Zero third-party cloud LLM in the data path.

What you get from an air-gapped AI deployment

Six outcomes regulated, classified, and sovereign teams see when they move private AI off cloud LLMs and onto an air-gapped stack inside their perimeter.

Zero Outbound at Runtime

No third-party cloud in the data path. Chat, search, embeddings, vector store, and LLM serving all run inside your perimeter — and stay there, even during inference.

Open-Weight LLMs on Your GPUs

Llama, Mistral, Qwen, DeepSeek, Falcon, or your fine-tuned variants — served by vLLM, SGLang, or Ollama on the GPUs inside your data center, GovCloud region, or classified enclave.

Permission-Aware Retrieval

Connectors respect each source's ACLs and clearance levels. Users see results only from documents they can already access in the source app, never a share-everything default.

FedRAMP / IL5 / Sovereign Aligned

Hardened components, FIPS-validated crypto, audit logging, and the artifact package your accreditation team needs to bring this through ATO.

Internal Artifact Mirror

Container images, model weights, and dependencies mirrored to a registry inside your perimeter. Upgrades go through your existing change-management process — no surprise dependency calls.

Accreditation-Ready Audit

Every prompt, retrieval, model call, and tool invocation is logged in a format your ATO package, IL5 review, HIPAA assessment, or SOC 2 audit expects out of the box.

Why ChatGPT Enterprise, Glean, and cloud AI SaaS can't ship to classified or air-gapped environments

ChatGPT Enterprise, Glean, and other cloud AI SaaS were built around a single architectural assumption: your prompts, documents, embeddings, and chat traffic sit in the vendor’s multi-tenant cloud. That assumption is fine for most knowledge workers — but it’s a non-starter the moment your environment is classified, IL5+, sovereign-restricted, or governed by data-residency rules a cloud vendor’s standard SOC report can’t satisfy.

An air-gapped AI deployment — Onyx + LibreChat + vLLM serving open-weight models on your hardware — is the open-source path that survives accreditation review. Same connector breadth across workplace apps, same chat-with-citations UX, same custom-assistants framework. Except the entire data path lives inside your perimeter, and nothing about the deployment depends on an outbound internet connection at runtime.

An air-gapped AI stack is what FedRAMP High, DoD IL5, sovereign-cloud, and classified teams deploy when no cloud-LLM vendor can ship to their environment: open-weight models served on your GPUs, embeddings generated inside your perimeter, per-team assistants, and audit logs in a format your accreditation team can sign off on.

Inside an air-gapped AI deployment — the 8 capabilities we build

Eight capabilities your air-gapped AI stack delivers behind your perimeter — no outbound internet at runtime, no third-party cloud in the data path, no chat or document content ever leaving the environment your security team owns.

1. Self-hosted LLM inference (no cloud calls at runtime)

vLLM, SGLang, or Ollama serving open-weight models (Llama, Mistral, Qwen, DeepSeek, Falcon, or your fine-tuned variants) on GPUs inside your perimeter. Zero outbound calls to OpenAI, Anthropic, Gemini, or any third-party cloud at inference time. We size the cluster to your peak QPS and choose quantization that fits your GPU budget.
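As a rough illustration of how quantization interacts with GPU budget, the sketch below estimates the memory needed for model weights alone and the number of GPUs required to hold them. It is a back-of-the-envelope aid, not a sizing method taken from this page: the function names and the 30% headroom for KV cache and activations are assumptions, and real sizing also depends on context length, batch size, and peak QPS.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def gpus_needed(params_billion: float, bits_per_weight: int,
                gpu_memory_gb: float, overhead_fraction: float = 0.3) -> int:
    """GPUs required to hold the weights plus an ASSUMED headroom fraction."""
    required = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return math.ceil(required / gpu_memory_gb)


if __name__ == "__main__":
    # Compare a 70B-parameter model at 16-, 8-, and 4-bit precision on 80 GB GPUs.
    for bits in (16, 8, 4):
        print(bits, weight_memory_gb(70, bits), gpus_needed(70, bits, 80))
```

Under these assumptions, a 70B model drops from roughly 140 GB of weights at 16-bit to roughly 35 GB at 4-bit, which is the kind of trade-off the quantization choice is balancing against answer quality.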

2. Private chat UI with full audit trail

LibreChat or Open WebUI branded for your org and deployed inside the air-gapped boundary. Every message, every model call, every tool invocation, and every document retrieved is logged in your database — the audit log your CISO, ISSO, and regulator each need, and the record that proves no data ever crossed the perimeter.
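One way to make an audit trail like this tamper-evident is to chain each record to the previous one by hash. The sketch below is illustrative only: hash chaining is an assumption added here, not a documented feature of LibreChat or Open WebUI, and the field names (`user`, `event`, `detail`) are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit_record(log: list, user: str, event: str, detail: dict) -> dict:
    """Append a record whose hash covers its content and the previous record's hash."""
    body = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "event": event,        # e.g. "message", "model_call", "retrieval"
        "detail": detail,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    record = dict(body, hash=_digest(body))
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Return True only if no record has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

In a real deployment the records would live in your database rather than a Python list; the list simply keeps the example self-contained.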

3. On-prem enterprise search across your sources

Onyx (formerly Danswer) running fully inside the air-gap, indexed against the workplace and line-of-business apps that exist behind your perimeter — SharePoint, network shares, on-prem Confluence, classified document stores, internal ticketing, and custom systems. Permission-aware retrieval respects each source’s ACLs and clearance levels.
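To make the idea of permission-aware retrieval concrete, here is a minimal sketch of filtering search hits against a user's group memberships before anything reaches the model. It is not Onyx's actual implementation: the `Document` type, the `allowed_groups` field, and `filter_by_acl` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups granted access in the source system


def filter_by_acl(results, user_groups):
    """Keep only documents that share at least one group with the user."""
    user_groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & user_groups]


if __name__ == "__main__":
    hits = [
        Document("d1", "Facilities handbook", frozenset({"all-staff"})),
        Document("d2", "Program X test plan", frozenset({"program-x"})),
    ]
    # A user in "all-staff" only should never see the Program X document.
    print([d.doc_id for d in filter_by_acl(hits, {"all-staff"})])
```

The design point is that filtering happens at retrieval time, so a document the user cannot open in the source app never enters the prompt context.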

4. Air-gapped RAG with on-prem embeddings

Vector store (pgvector or Qdrant) and embedding model both running on your hardware. Document embeddings are generated inside the perimeter and never transmitted out for processing by a third party. Chat answers come back with inline citations linked to the source paragraphs in the original document.
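The retrieval step behind cited answers can be sketched in a few lines. The example below ranks stored chunks by cosine similarity to a query embedding and returns each hit with its source and paragraph reference. It uses toy in-memory vectors instead of pgvector or Qdrant, and the chunk fields (`source`, `paragraph`, `embedding`) are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k most similar chunks, each carrying its citation metadata."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [
        {"source": c["source"], "paragraph": c["paragraph"], "text": c["text"]}
        for c in ranked[:k]
    ]


if __name__ == "__main__":
    chunks = [
        {"source": "policy.pdf", "paragraph": 3, "text": "Badge rules", "embedding": [1.0, 0.0]},
        {"source": "guide.docx", "paragraph": 7, "text": "Travel rules", "embedding": [0.0, 1.0]},
    ]
    for hit in top_k([1.0, 0.0], chunks, k=1):
        print(f'{hit["text"]} [{hit["source"]}, para. {hit["paragraph"]}]')
```

Because both the embedding model and the vector store run on your hardware, the query vector and the stored vectors in a step like this never leave the perimeter.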

5. FedRAMP / IL5 / sovereign-cloud aligned controls

We’ve shipped to FedRAMP High, DoD IL4 / IL5 environments, GovCloud, Azure Government, sovereign EU and UK clouds, and on-prem SCIF deployments. Every component (chat, search, embeddings, model serving) ships with the hardening, logging, and FIPS-validated cryptography your accreditation package expects.

6. Internal artifact mirror for offline upgrades

Container images, model weights, Helm charts, and OS packages are mirrored to an internal artifact registry inside your perimeter (Harbor, JFrog Artifactory, or Sonatype Nexus). Upgrades happen through your existing change-management process, with no outbound internet required to pull new versions, no surprise dependency reach-outs, and no SBOM gaps.
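A simple pre-deployment check can catch image references that would still reach for a public registry. The sketch below follows the common container-reference convention that the first path component is a registry host only if it contains a dot or colon or equals `localhost`; otherwise the reference defaults to Docker Hub. The registry name `harbor.internal.example` and the function names are hypothetical.

```python
def registry_host(image_ref: str) -> str:
    """Return the registry host an image reference would be pulled from."""
    first, sep, _rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # no explicit registry means the Docker Hub default


def external_images(image_refs, internal_registry):
    """List every reference that does not point at the internal mirror."""
    return [ref for ref in image_refs if registry_host(ref) != internal_registry]


if __name__ == "__main__":
    refs = [
        "harbor.internal.example/ai/vllm-openai:v1",  # hypothetical mirrored image
        "vllm/vllm-openai:latest",                    # would resolve to Docker Hub
    ]
    print(external_images(refs, "harbor.internal.example"))
```

Running a check like this in CI, against rendered Helm or Compose manifests, is one way to enforce the "no surprise dependency reach-outs" property before a change ticket is approved.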

7. Hardware-agnostic deployment (cloud, on-prem, edge)

Deploys on NVIDIA H100 / H200 / A100 GPUs in your data center, AMD MI300 clusters, AWS GovCloud, Azure Government, Oracle Sovereign Cloud, or edge nodes for classified field environments. One Kubernetes namespace or Docker Compose stack — the same stack across every region or enclave you need.

8. SSO, RBAC, audit, and accreditation-ready logging

SAML SSO and OIDC integrations for Okta, Microsoft Entra ID (formerly Azure AD), ICAM, PIV/CAC-aware auth, and on-prem identity providers. Role-based admin controls aligned to your clearance and need-to-know model. Audit logs in a format your ATO package, IL5 review, or HIPAA risk assessment expects out of the box.

START TODAY

Talk to an air-gapped AI deployment expert

Bring us your accreditation environment (FedRAMP, IL5, sovereign, SCIF), your model preferences, your GPU footprint, and the connector landscape you need to index. We’ll come prepared with the right deployment shape, sizing for your peak QPS, and the artifact package your accreditation team will expect.

    Contact Us
    Need experts to collaborate with on your AI/ML journey? Drop us an email and we will get in touch.

    When you need air-gapped AI, not cloud AI

    ChatGPT Enterprise, Glean, and other cloud AI SaaS cover commercial knowledge work well. That’s enough if your accreditation, data-residency, and clearance constraints permit a vendor-hosted multi-tenant deployment.

    But teams shipping AI to classified, sovereign, or regulated environments need things SaaS structurally can’t deliver:

    • Every prompt, embedding, model call, and document inside your perimeter, never in a vendor’s cloud
    • Open-weight LLMs (Llama, Mistral, Qwen, DeepSeek) served on your GPUs, not API calls to a third party
    • FedRAMP / IL5 / sovereign-cloud / SCIF-aligned hardening, not just a vendor’s SOC report
    • Internal artifact mirror for upgrades, not outbound internet pulls
    • Audit logs your ATO package and accreditation review actually accept
    • Permission-aware retrieval against classified and on-prem document stores, not just SaaS workplace apps

    An air-gapped AI deployment is the path that survives accreditation review. Deploy it once inside your perimeter, configure it for your environment, and your AI is a capability you fully own — with no outbound dependency, no vendor data path, and no surprise upgrade calls.

    Frequently asked questions

    What does "air-gapped AI" mean in practice?
    No outbound internet at runtime, no third-party cloud in the data path, and no document, prompt, or model weight ever leaving the environment your security team owns. The full stack (chat UI, search, embeddings, vector store, and LLM serving) runs on hardware inside your perimeter. Upgrades come through an internal artifact mirror your change-control process already governs.

    Can we still connect to a cloud LLM?
    Only if your accreditation package permits it. For true air-gapped environments (classified, IL5+, sensitive sovereign-cloud) the answer is no, and we serve open-weight models (Llama, Mistral, Qwen, DeepSeek) on local GPUs instead. For “sensitive but not classified” deployments that allow a vetted cloud LLM connection through a forward proxy, we set up routing rules so the cloud model is only used for non-sensitive workloads.

    Which open-weight model should we run?
    It depends on your task and GPU budget. For general chat and RAG we usually start with Llama 3.1 70B Instruct or Qwen 2.5 72B on H100s, or Mistral Small / Llama 3.1 8B for smaller clusters. For coding workloads, we look at DeepSeek-Coder-V2 or Qwen-Coder. We benchmark on your specific tasks before locking the model, because what works for one classified mission set isn't necessarily what works for another.

    How do upgrades work without internet access?
    Through an internal artifact mirror. Container images, model weights, Helm charts, OS packages, and Python wheels are pulled into a registry inside your perimeter (Harbor, JFrog Artifactory, or Sonatype Nexus) through your existing approved-import process. Upgrades happen against that mirror through your change-control workflow, under the same governance you already apply to every other system in the enclave.

    Which accreditation environments have you deployed into?
    FedRAMP High, DoD IL4 and IL5, AWS GovCloud, Azure Government, on-prem SCIF deployments, and sovereign cloud regions in the EU and UK. We work with your accrediting authority and ISSO on the artifact package, control mappings, and continuous-monitoring evidence. We don't hold an ATO ourselves, since your accreditation is yours, but we deliver the deployment in a shape your accreditation team can sign off on.

    Can we run without GPUs?
    Yes, for very small deployments. With CPU-only inference you're limited to small open-weight models (Llama 3.1 8B, Phi-3, Mistral 7B) at modest throughput, which is fine for low-volume chat and RAG serving a few dozen users. For production deployments serving hundreds of users with low latency, you'll want GPUs (A100, H100, H200, MI300) inside the perimeter. We help size before procurement.
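The routing rule described in the FAQ for "sensitive but not classified" deployments can be sketched as a small fail-closed function: the cloud model is chosen only when the environment permits it and the workload is explicitly non-sensitive. The sensitivity labels and backend names below are hypothetical, not part of any product configuration.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 0      # explicitly cleared as non-sensitive
    INTERNAL = 1
    CONTROLLED = 2


def choose_backend(sensitivity: Sensitivity, cloud_allowed: bool) -> str:
    """Route to the cloud proxy only for non-sensitive work; otherwise stay local."""
    if cloud_allowed and sensitivity is Sensitivity.PUBLIC:
        return "cloud-proxy"   # hypothetical name for the vetted forward-proxy route
    return "local-vllm"        # hypothetical name for the on-prem model endpoint


if __name__ == "__main__":
    print(choose_backend(Sensitivity.PUBLIC, cloud_allowed=True))
    print(choose_backend(Sensitivity.CONTROLLED, cloud_allowed=True))
    print(choose_backend(Sensitivity.PUBLIC, cloud_allowed=False))
```

Defaulting to the local backend means an unlabeled or mislabeled request never leaves the perimeter; in a true air-gapped enclave `cloud_allowed` is simply always false.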

    Additional resources

    AI Transformation Workshop

    Half-day strategy workshop to map your accreditation environment, open-weight model selection, and air-gapped deployment shape. Book a workshop →

    AI Strategy Session

    60-minute scoping call. We’ll talk through your accreditation environment, GPU footprint, and connector landscape, then sketch the right air-gapped AI deployment. Book a session →

    AI Consultant vs In-House Team

    Honest tradeoffs on bringing an air-gapped AI deployment in-house versus engaging a partner who has shipped through IL5 / FedRAMP High before. Read the comparison →

    Ready to deploy air-gapped AI?

    A 45-minute strategy call. We’ll walk through your accreditation environment, model and connector requirements, GPU footprint, and rollout sequence — then come back with a concrete deployment shape and the artifact package your accreditation team will need.