Open Knowledge Format: the standard AI agents read

On June 12, 2026, Google Cloud published version 0.1 of the Open Knowledge Format (OKF): an open, vendor-independent standard for structuring the knowledge that artificial intelligence agents consume. The release includes reference implementations, sample bundles over public datasets, and a specification that, according to its authors, fits on a single page.

What’s striking about the announcement isn’t the technology, which is deliberately simple. It’s the diagnosis that precedes it. Sam McVeety and Amir Hormati, the authors of the specification, spend the first half of their Google Cloud blog post explaining why the knowledge problem in organizations can’t be solved with more model capacity or better retrieval systems. Their thesis is that what’s missing isn’t another service. What’s missing is a format.

That diagnosis deserves attention well beyond Google Cloud’s data engineering teams, because the problem OKF formalizes isn’t one of technical infrastructure. It’s the problem of any organization whose value proposition depends on an AI agent being able to represent it accurately. A university, a public institution, a professional services firm: any organization that produces knowledge destined, ultimately, to be consumed by a machine.

What exactly is the Open Knowledge Format?

OKF is, at its core, a directory of markdown files with YAML metadata in the header of each document. It’s not a database. It’s not a platform. It’s not a proprietary API. It’s a format: a convention for organizing, naming and tagging text files so that the knowledge they contain can move between systems without ad hoc integrations.

The design bet lies precisely in what OKF does not require. No SDK. No account with any provider. No specific deployment platform or proprietary runtime. An OKF bundle is, in its most basic form, a folder of text files that any text editor can open, any search engine can index, any agent can read and any developer can audit without special tools.

McVeety and Hormati put it precisely: “If you’ve used Obsidian, Notion, Hugo or any of the LLM-wiki patterns that have emerged over the past year, the shape will feel familiar. OKF formalizes the small set of conventions needed to make those patterns interoperable.” Interoperability is the goal. Not the technology.

The problem Google set out to diagnose

To understand what OKF sets out to solve, it helps to understand the problem it describes, which is not primarily technical.

In most organizations, the knowledge an agent needs to answer questions accurately is distributed across systems that don’t talk to each other. A table’s schema lives in the team’s metadata catalog, accessible through that provider’s API. The meaning of a business metric (what exactly it counts, what it excludes, since what date it has been calculated this way and not another) lives in the head of whoever defined it three years ago, or in a Confluence doc nobody has updated since. The constraint that explains why two tables don’t join directly is in a Slack thread from a team that has since been reorganized. The procedure describing how to respond to an anomaly is in a Google Doc whose access permissions expired last quarter.

When an agent needs to answer “how do we calculate the retention rate of our weekly active users?”, it has to assemble that answer from fragmented, mutually incompatible surfaces, with no guarantee that what it finds is the current version of the truth. The agent infers. And inference, operating over implicit or fragmented knowledge, produces what engineers call “plausible nonsense”: answers that sound coherent and are structurally wrong.

Google’s diagnosis in the OKF specification is the same one Andrej Karpathy, the AI researcher who inspired part of the design, has articulated from the perspective of wikis: LLMs don’t get bored, don’t forget to update a cross-reference and can touch fifteen files in a single pass. The maintenance that leads humans to abandon personal wikis is exactly what LLMs do well. What LLMs cannot do, and what no one can do for them, is reason correctly about knowledge that has not been declared. Model capacity doesn’t change the underlying problem when the knowledge itself isn’t there.

More parameters, better semantic retrieval, larger context windows: all of that helps, but none of those improvements changes the content the models read. If the facts are implicit, inconsistent or were never declared, the system still has to infer too much, and it’s in inference that reasoning breaks down.

The anatomy of an OKF bundle

For the diagnosis to have practical consequences, it’s worth examining the concrete design of the specification.

A well-built OKF bundle has three layers. The first is the concept directory: a hierarchy of folders that mirrors the organization’s knowledge domains, with one file per concept and optional index.md files that ease hierarchical navigation for agents exploring the bundle incrementally. The second layer is the links between concepts: standard markdown references that connect tables to metrics, metrics to procedures, procedures to the business context that justifies them, APIs to the data they expose. The third layer is the optional log.md files that record a chronological history of changes to a specific concept, turning the bundle into a living record rather than a static snapshot.
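The second layer, the links between concepts, is what lets a consumer treat a bundle as a graph. What follows is a minimal sketch in Python, not the reference implementation: it assumes the bundle is already loaded in memory as a mapping from relative path to markdown text, and that links are inline markdown links with relative paths. The file names and the `build_link_graph` helper are hypothetical.

```python
import posixpath
import re
from typing import Dict, List

# Matches inline markdown links such as [label](../metrics/retention.md).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def build_link_graph(bundle: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each document path to the bundle documents it links to."""
    graph: Dict[str, List[str]] = {}
    for path, text in bundle.items():
        base = posixpath.dirname(path)
        targets: List[str] = []
        for raw in LINK_PATTERN.findall(text):
            if "://" in raw:
                continue  # external URL, not a concept in this bundle
            # Resolve the link relative to the linking file, dropping any #anchor.
            target = posixpath.normpath(posixpath.join(base, raw.split("#")[0]))
            if target in bundle and target not in targets:
                targets.append(target)
        graph[path] = targets
    return graph


# Hypothetical three-file bundle: an index, a table and a metric.
BUNDLE = {
    "index.md": "See [orders](tables/orders.md) and [retention](metrics/retention.md).",
    "tables/orders.md": "Feeds the [retention](../metrics/retention.md) metric.",
    "metrics/retention.md": (
        "Computed from [orders](../tables/orders.md). "
        "Background: [docs](https://example.com)."
    ),
}

graph = build_link_graph(BUNDLE)
for source, targets in graph.items():
    print(source, "->", targets)
```

Because the links are plain relative paths, the graph can be rebuilt by any consumer from the files alone, with no index service in between.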

Each document’s YAML header carries five fields: type, title, description, resource and timestamp. The type field is the only one the v0.1 specification considers mandatory, and it’s deliberately open: OKF doesn’t prescribe what types exist. Each organization defines its own vocabulary of types according to its needs. What the standard guarantees is that this vocabulary is queryable uniformly by any consumer, regardless of who produced the bundle.
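To make the header concrete, here is a small Python sketch of what a consumer might do with it. It assumes the header is delimited by `---` lines and contains only flat `key: value` pairs, which is a simplification: a real consumer would use a full YAML parser. The sample document, the `metric` type and the helper names are illustrative assumptions, not taken from the specification.

```python
from typing import Dict, List, Tuple

# The five header fields described above; only 'type' is treated as required.
KNOWN_FIELDS = ("type", "title", "description", "resource", "timestamp")


def split_header(document: str) -> Tuple[Dict[str, str], str]:
    """Split a markdown document into (header fields, body).

    Simplification: flat 'key: value' pairs only, no nested YAML.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    fields: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return fields, "\n".join(lines[index + 1:])
        if ":" in line:
            # Split on the first colon only, so timestamps and URLs survive.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return {}, document  # no closing delimiter: treat everything as body


def validate(fields: Dict[str, str]) -> List[str]:
    """Return a list of problems; a missing 'type' is the only error here."""
    problems = []
    if not fields.get("type"):
        problems.append("missing required field: type")
    return problems


SAMPLE = """---
type: metric
title: Weekly retention rate
description: Share of weekly active users who return the following week.
timestamp: 2026-06-12T00:00:00Z
---
Retention is computed over weekly active users.
"""

fields, body = split_header(SAMPLE)
print(fields["type"], validate(fields))
```

A consumer written this way never needs to know which types exist in advance: it reads whatever vocabulary the producer declared and filters on it.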

This minimalism has a principled justification the authors articulate clearly. The value of a knowledge format doesn’t come from the richness of its specification, but from the number of parties that adopt it. A standard that’s too opinionated about the content model turns adoption into a negotiation over the model, and that negotiation rarely concludes. OKF bets on being minimally opinionated (just enough for interoperability to be possible), leaving the rest to the discretion of each producer and consumer.

Google published three reference implementations alongside the specification: an enrichment agent that walks a BigQuery dataset and generates an OKF bundle for each table and view, a static HTML visualizer that turns any bundle into a navigable graph with no backend, and three sample bundles over public datasets (GA4 ecommerce, Stack Overflow and Bitcoin) produced by the reference agent and available in the repository. They’re proofs of concept, not the product. The ecosystem of producers and consumers OKF expects to generate extends well beyond what Google has implemented.

Why a format and not another service

The decision to publish OKF as an open standard rather than as a Google Cloud product has a logic worth making explicit, because it isn’t obvious.

The knowledge management market for agents is fragmented today the same way the API market was fragmented before REST and OpenAPI became shared conventions. Each data catalog provider has its own metadata model, accessible through its own API, with its own vocabulary of types and its own relationship logic. Each documentation system has its own export format. Every organization that has tried to feed agents with internal context has built its own system of adapters to translate between surfaces.

The result: every team building an agent solves the same context-assembly problem from scratch, knowledge stays trapped behind the surface that created it, and interoperability between systems requires manual integration every time something changes.

An open standard breaks that dynamic in a way a service can’t. A service requires commercial adoption: someone has to decide to pay for it, implement it and depend on the provider maintaining it. A standard requires technical adoption, whose marginal cost is significantly lower. If the standard is simple enough that any team can implement a producer in a few hours and a consumer in a few more, the barrier to entry effectively disappears.

Google’s bet with OKF is explicit in the announcement text: “the value of a knowledge format comes from how many parties speak it, not who owns it.” It’s not open-source rhetoric. It’s the same logic that made HTML, RSS or JSON have more impact than any proprietary platform built in the same era: a format’s value resides in its ubiquity, and ubiquity is only reached when the format belongs to no one.

OKF and Machine Experience: transport and origin

When I read the OKF specification, I recognized from another angle the problem Machine Experience describes: the distinction between where knowledge is produced and how it travels from there to whoever consumes it.

It’s worth illustrating with the case of the university student. The authoring environment where someone creates a page, publishes a date or records a piece of data determines whether that data is born with the structure a machine needs to trust it. If the environment records an event date as free text inside a paragraph, with no associated status or expiry field, the information it publishes cannot declare its own expiration. An agent that reads it four months later has no way of knowing it’s no longer valid: it reads “registration open until April 15” and, with no machine-readable signal indicating when it stops being true, infers it’s still current. That inference is where the student’s customer journey breaks.

The Machine Experience diagnosis operates at the origin of the problem: the moment content is born. The OKF diagnosis operates at the transport: the moment knowledge moves from one system to another. They are two distinct moments in the same chain.

The connection between them isn’t metaphorical. An OKF bundle is the external representation of what Machine Experience describes as the responsibility of the authoring environment. If the authoring environment captures an event’s expiry date as a structured field with status (not as free text, but as a typed datum with a value and an expiration date), that field can be published in an OKF document with its timestamp. If the environment captures a university program’s admission threshold as an authoritative datum (linked to its source, dated, with a maintenance owner), that datum can travel in an OKF bundle to any agent that needs it, without being reinterpreted and without asking the consumer to trust an unsupervised inference.
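The difference between a declared expiry and an inferred one can be shown in a few lines of Python. This is a sketch under a loud assumption: `valid_until` is a hypothetical field name chosen for illustration, and it is not one of the five header fields described earlier. The point is only that a consumer can check a declared date, and can say so explicitly when nothing was declared.

```python
from datetime import date
from typing import Dict, Optional


def is_current(fields: Dict[str, str], today: date) -> Optional[bool]:
    """Return True or False when an expiry is declared, None when it is not.

    'valid_until' is a hypothetical field name holding an ISO date.
    """
    raw = fields.get("valid_until")
    if raw is None:
        return None  # nothing declared: the agent could only infer
    return today <= date.fromisoformat(raw)


# Hypothetical registration notice with a declared expiry of April 15.
registration = {"type": "event", "valid_until": "2026-04-15"}

print(is_current(registration, date(2026, 4, 1)))    # before the deadline
print(is_current(registration, date(2026, 8, 15)))   # four months later
print(is_current({"type": "event"}, date(2026, 8, 15)))  # no declaration
```

Returning `None` rather than `True` for the undeclared case is the design choice that matters: it forces the consumer to treat missing structure as missing knowledge instead of silently assuming the content is still valid.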

The distinction OKF calls “producer/consumer independence” is the same one Machine Experience describes when it says the “with what” determines the “for what.” OKF formalizes the transport standard. Machine Experience names the necessary condition for that transport to work. They are two sides of the same architectural decision: OKF formalizes the external face; Machine Experience names the internal one.

What this means for a university

OKF isn’t exclusively a tool for data engineering teams at large tech companies. It’s a standard any organization can adopt to improve the quality of the knowledge its internal and external agents consume. For universities, the implications are direct.

The stale-content problem has a technical name. Four of the seven university websites I reviewed showed the same pattern: events listed as upcoming whose dates had already passed, registration deadlines that closed months ago presented as current, cutoff scores sourced from aggregators instead of primary sources. It’s not negligence. The knowledge about each piece of content’s real status exists inside the organization (in the content management systems, in the academic systems, in the admissions teams), but it doesn’t travel to the published page in a form a machine can read. OKF puts a technical name to that gap and proposes the mechanism to close it.

The admission moment is exactly where agents fail. The student’s customer journey toward enrollment is a system of constraints: pre-registration deadlines, cutoff scores, open-day dates, documentation requirements, formalization periods. Each of those constraints is, in OKF’s language, a concept that should exist in a verifiable bundle: with its type, its authoritative source, its update timestamp and its relationships to other concepts in the ecosystem. A university that publishes those constraints as text in HTML pages with no structured metadata gives any agent representing it a source it can only infer from. One that publishes them as structured knowledge gives it a source it can reason from.
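As an illustration of what publishing one of those constraints as structured knowledge could look like, here is a hedged Python sketch of a tiny producer. Everything specific in it is an assumption: the `admission_deadline` type name, the example.edu URL, the dates, and the reading of `resource` as a pointer to the authoritative source are all hypothetical, and `render_document` is not part of any OKF tooling.

```python
import json
from typing import Dict


def render_document(fields: Dict[str, str], body: str) -> str:
    """Render a concept as markdown with a YAML header.

    Values are written with json.dumps: for plain strings like these,
    a JSON string is also a valid YAML double-quoted scalar, so colons
    in URLs and timestamps stay unambiguous.
    """
    if not fields.get("type"):
        raise ValueError("the 'type' field is required")
    header = "\n".join(
        f"{key}: {json.dumps(value)}" for key, value in fields.items()
    )
    return f"---\n{header}\n---\n\n{body.strip()}\n"


deadline = render_document(
    {
        "type": "admission_deadline",  # hypothetical type name
        "title": "Pre-registration deadline, BSc Computer Science",
        "description": "Last day to submit the pre-registration form.",
        "resource": "https://example.edu/admissions/computer-science",  # placeholder
        "timestamp": "2026-06-12T09:00:00Z",
    },
    "Pre-registration closes on 2026-07-15. See [cutoff scores](cutoff-scores.md).",
)
print(deadline)
```

The output is an ordinary text file: a CMS that already stores these values as structured fields could emit it at publish time, next to the HTML page, without changing how editors work.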

The content management platform is the bundle-production infrastructure. If a university’s digital content (its programs, its events, its critical dates, its admission conditions) is produced by an authoring environment that captures that data as structured fields with status, source and expiry, that environment is already producing the ingredients of an OKF bundle. What determines whether that knowledge can travel to agents with enough structure to be reliable is whether the environment where it’s born declares it at the origin. Content management platforms that export their clients’ knowledge in OKF-aligned formats, or that directly produce compatible documents, let that knowledge be consumed by third-party agents without bilateral integration. It’s exactly what we’re tackling with OKF at Griddo: the same leap the ecosystem made when CMSs adopted standard REST APIs, from point-to-point integrations to an ecosystem of consumers that connect without friction.

OKF as GEO infrastructure

There’s a dimension of OKF the specification touches only indirectly but that has direct implications for any organization with a digital presence: its relationship with GEO (Generative Engine Optimization), the set of practices that determine whether an organization is accurately represented in the answers of generative AI systems.

The language models that power AI assistants learn from what they can read, verify and attribute. An organization whose knowledge is declared in structured form, with citable sources, current timestamps and explicit relationships between concepts, gives those models exactly the kind of knowledge they can incorporate with confidence. An organization whose knowledge is buried in unstructured text, with no update dates and no declared authority, is one the models have to infer about, and inference produces imprecise representations no marketing team can control afterward.

OKF, understood from this perspective, is GEO infrastructure, not in the sense of a positioning technique but in the most basic sense: a mechanism for an organization’s knowledge to exist in a form AI systems can read, verify and cite. Positioning in LLMs isn’t achieved by optimizing text for an algorithm; it’s achieved by being a source worthy of trust. And a source is worthy of trust when its knowledge is declared with enough structure that whoever consumes it can verify it.

At Griddo we’ve explored how web architecture determines citation success in generative systems, and how semantic search operates over content that has been structured to be understood, not just retrieved. OKF adds one more layer to that argument: it matters not only how the page is structured, but whether the knowledge that page contains can travel outside it in verifiable form. A university that masters GEO isn’t only the one with well-structured pages; it’s the one with knowledge that can be consumed by any agent with enough context to trust it.

Bots already outnumber people in internet traffic, according to recent Cloudflare data. For an organization whose visibility depends on agents representing it accurately (which is, increasingly, any organization with a digital presence), the structured-knowledge problem goes from being an architectural best practice to being a basic condition of visibility.

OKF in context: what already existed and what it formalizes

OKF doesn’t emerge in a vacuum. It formalizes a family of patterns that technical teams had developed independently over the past few years, each with its own convention and with no interoperability between them.

The most influential pattern is the LLM-wiki Karpathy described in 2024: a repository of markdown files that the agents themselves maintain and update, working as persistent memory and a knowledge base across work sessions. The logic is that LLMs are good at exactly what humans are bad at when maintaining wikis: consistency, updating cross-references, the ability to touch many files in a single pass. The wiki humans abandon because maintenance is tedious is the wiki an LLM can maintain indefinitely.

The AGENTS.md and CLAUDE.md pattern, popularized by Claude Code and other agentic environments, applies the same logic at the code-repository level: a markdown instructions file the agent reads at the start of each session that gives it the context it needs to operate without asking about every project convention. It’s a single-file OKF bundle, without the name and without the YAML header.

Obsidian vaults connected to code agents are another example: an Obsidian vault is, structurally, a bundle of markdown files with YAML frontmatter and links between documents. Obsidian’s Dataview plugin demonstrates that this format is rich enough to build complex queries over the knowledge graph with no external database.

What all these patterns share is the intuition OKF formalizes: the knowledge agents need to operate well can be represented as markdown with metadata, versioned with git and distributed without a platform. What was missing was the small set of conventions that lets one team’s wiki, another’s data catalog and a third’s documentation be consumable by the same agent without adapters. OKF is that set of conventions.

The time to adopt

OKF v0.1 is an initial specification, and its authors are explicit about it. The format will evolve as more producers and consumers emerge and as the agent ecosystem identifies which knowledge representations are actually needed in practice.

But the timing of the announcement isn’t accidental. The adoption curve of agentic systems is at the point where context and knowledge problems stop being theoretical limitations and become practical bottlenecks. Teams building agents aren’t limited by model capacity; they’re limited by the quality of the knowledge the model can read. OKF appears at that precise moment, when demand for a standard is high enough for adoption to have its own momentum.

The question for any organization isn’t whether OKF will be relevant. It’s when to adopt it and to what depth. Organizations that build OKF-compatible knowledge infrastructure now, even partially, starting with the domains most critical to their agents, will be better positioned when the consumer ecosystem matures and demand for quality bundles rises.

Organizations that wait for the standard to be fully stabilized will repeat the same mistake made with schema.org in the early years of the last decade: the time to adopt a structured-knowledge standard isn’t when everyone does it, but when doing so has low cost and high differential advantage.

For universities, the translation is concrete. The first step doesn’t require redesigning the entire digital architecture: it requires identifying which critical knowledge (admission deadlines, each program’s conditions, the dates students need to make decisions) is today buried in free text with no verifiable structure, and starting to declare it in a form an agent can read and trust. The infrastructure to do so already exists in any content management platform that captures structured fields with status and timestamps. What OKF adds is the standard that lets that knowledge travel.

Conclusion

When an organization with Google’s technical capacity publishes a specification and says what was missing wasn’t a service but a format, it’s making a diagnosis about the state of the market. The diagnosis is that the knowledge problem in organizations is a problem of declaration, not retrieval. Agents have access to more information than ever; what they lack is information that has been declared in a form they can trust.

OKF proposes the transport mechanism: the convention that lets knowledge travel from a producer to a consumer with enough context to be interpreted correctly at the destination. Machine Experience describes where it’s decided whether that mechanism has anything useful to transport: the authoring environment, the backoffice, the moment content is born.

Both are necessary, and neither solves the other’s problem. An OKF bundle built over content that was born without structure merely moves the problem. And an authoring environment that produces structured knowledge but has no standard format to distribute it leaves that knowledge trapped in the tool that created it.

For any organization that manages content destined to be consumed by agents (and that set is, increasingly, any organization with a digital presence), the question is no longer whether to structure knowledge. It’s which standard, from where and at what pace. OKF answers the first question. Machine Experience answers the second. The “with what,” as always, determines the “for what.”