In early March 2026, Google quietly shipped something that most of the tech press treated as a developer tool release. A command-line interface for Google Workspace. Useful, sure. Interesting, maybe. But buried in the GitHub repo alongside the CLI was something that deserved a lot more attention: 100+ SKILL.md files.
One for every supported API. Recipes for Gmail, Drive, Calendar, Docs, Sheets. Pre-written instructions telling AI agents exactly how to operate Google’s entire productivity suite.
The question nobody seemed to ask: “Why markdown?”
Not JSON. Not YAML. Not an OpenAPI spec. Not a proper API contract with types and validation and versioning. Just plain text files with pound signs for headers and hyphens for lists. The same format a developer uses to write a README. The same format this sentence was drafted in before it became a blog post.
Why would Google … with all the engineering firepower on the planet … choose that?
The answer to that question is the story of how the web just grew a new layer. And most people, including most enterprise IT organizations, have no idea it’s happening.
I know this because I’m one of them. I’ve been blogging on WordPress since 2011. Fifteen years of plugins, themes, post formats, permalink structures, SEO settings, and site configuration. I’m about as familiar with the inner workings of a WordPress site as a person can reasonably get without being a developer.
You want to know when I learned llms.txt was even a thing?
Today.
I stumbled across the Google announcement, started pulling the thread on why markdown, and in the middle of that conversation Claude mentioned (almost as an aside) that I should probably make sure my llms.txt was up to date. I stopped cold.
“Huh? My what?”
Turns out this site didn’t have one at all. In the next two hours I installed Yoast, enabled llms.txt generation, configured schema aggregation, connected Bing Webmaster Tools, installed the IndexNow plugin, and cleaned up thirteen dead categories that had been cluttering my taxonomy since 2012.
I’ve been running this site for fifteen years. I found out about a foundational piece of AI-era web infrastructure the same day I wrote a post about it.
That’s the unknown unknown problem. And it’s not just me.
The Accidental Standard
John Gruber invented Markdown in 2004. The goal was modest to the point of being almost embarrassingly simple: let writers write for the web without thinking about HTML. No angle brackets. No closing tags. No memorizing that a hyperlink is an anchor tag with an href attribute. Just text with a few simple conventions that could be converted to clean HTML automatically. Asterisks for emphasis, pound signs for headers, hyphens for lists.
It wasn’t trying to be infrastructure. It wasn’t trying to be anything other than a productivity tool for bloggers who were tired of accidentally breaking their posts with malformed tags.
It worked. Developers adopted it immediately because it matched how they already thought about documentation: hierarchical, reference-able, version-controllable. GitHub made it the default language for READMEs, issues, pull request descriptions, and project wikis. Static site generators like Jekyll and Hugo embraced it as their native content format. Discord moderators learned to speak it fluently to format elaborate announcements. Technical writers standardized on it because it could live in a Git repo alongside the code it documented. If you were doing anything technical on the internet after 2010, you were writing markdown whether you called it that or not. Slowly, without anyone convening a standards body or issuing a mandate, markdown became the de facto language of the technical web.
Nobody voted on this.
Nobody issued an RFC.
It just won.
But something more interesting was happening underneath all of that adoption, something that only became obvious in retrospect…
AI models were being trained on the internet.
Hundreds of billions of documents, scraped and cleaned and fed into training pipelines. And a disproportionate share of that training data was markdown. GitHub repositories. Documentation sites. Technical blogs. Developer forums. Stack Overflow answers with code blocks. All of it heavy with the clean, structured, low-noise text that markdown produces. Headings that clearly delineate topics. Lists that enumerate options. Code fences that separate instructions from prose.
Compare that to the average corporate webpage… a labyrinth of nested divs, inline styles, navigation chrome, cookie consent overlays, chat widgets, tracking pixels, and footer links to pages nobody reads. The actual content is in there somewhere, buried under layers of presentation logic that exists entirely for human visual consumption.
Models didn’t just learn to read markdown. They became native speakers. Fluent in a way they never quite became in raw HTML. When you ask an AI to produce structured content today, it reaches for markdown instinctively. Not because it was instructed to, but because that’s the substrate it thinks in. The format that was invented to spare bloggers from typing angle brackets turned out to be the format that AI finds most natural to both read and write.
That’s when markdown stopped being a convenience and became infrastructure.
And almost nobody noticed.
Quiet Standardization Nobody Voted On
In September 2024, a researcher and developer named Jeremy Howard proposed something called llms.txt. Howard is a co-founder of fast.ai and had just shipped a new Python web framework called FastHTML. The most common complaint from developers trying to use it? AI coding assistants couldn’t help them because the library was too new, created after the models’ training cutoff. The assistants kept hallucinating APIs that didn’t exist because they had no reliable source of truth for how FastHTML actually worked.
His solution was elegantly simple… put a markdown file at the root of your website that curates your most important content for AI systems. Not for human visitors… they have navigation menus and search for that. Not for Google’s crawler… it has its own well-established mechanisms. Specifically, this was for large language models trying to understand your site in real time during a conversation, when they have a limited context window and need to find the right information fast.
The analogy Howard reached for was robots.txt, the decades-old convention where websites tell crawlers what they’re allowed to index. llms.txt is the affirmative version of that. Instead of “here’s what to avoid,” it says “here’s what matters, here’s where the good stuff lives, here’s how to understand what this site is about.”
The format? You guessed it…
Markdown.
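For the curious, the structure proposed at llmstxt.org is minimal: an H1 with the site name, a blockquote summary, then H2 sections of annotated links, with an “Optional” section for material an agent can skip under context pressure. A sketch of what one might look like… the URLs and descriptions here are placeholders, not this site’s actual file:

```markdown
# DatacenterDude

> A blog about enterprise infrastructure, storage, and the AI-era web.

## Posts

- [Why Markdown Won](https://example.com/why-markdown-won.md): How a
  blogging shortcut became AI infrastructure

## Optional

- [About](https://example.com/about.md): Who writes this site
```

That’s the whole thing. No schema, no registration, no handshake… a text file at your root domain.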
The reasoning is technical and it’s worth understanding. When an AI crawler hits an HTML page cold, it has to process everything: the DOCTYPE declaration, the head tags, the navigation links, the hero image alt text, the sidebar widgets, the related posts section, the comment form, the footer with its seventeen links to legal pages. Every one of those tokens costs context window space. By the time the model gets to the actual article content, it may have burned through a significant chunk of its available attention on scaffolding that contributes nothing to understanding the page’s meaning.
Markdown simply strips all of that away. What remains is pure signal. A heading tells you what a section is about. A link tells you where to go for more. A paragraph tells you what the author actually thinks. No noise. Maximum information density per token. For a model operating under context constraints in real time, that’s not a minor optimization. It’s the difference between finding the answer and missing it entirely.
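You can feel the difference with a crude back-of-the-envelope comparison. This is my own illustration, not anyone’s benchmark: a deliberately tiny made-up HTML page versus the same content as markdown, using character counts as a rough proxy for token cost. Real tokenizers differ, and real corporate pages carry far more chrome than this, so treat the ratio as a floor:

```python
# The same short article as a (small, made-up) HTML page...
html_page = """<!DOCTYPE html><html><head><title>Post</title>
<link rel="stylesheet" href="site.css"></head><body>
<nav><ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul></nav>
<div class="content"><h1>Why Markdown Won</h1>
<p>Markdown is <strong>dense</strong> with signal.</p></div>
<footer><a href="/privacy">Privacy</a> <a href="/terms">Terms</a></footer>
</body></html>"""

# ...and as markdown. Every character here is content.
markdown_page = """# Why Markdown Won

Markdown is **dense** with signal."""

# Character counts as a rough stand-in for context-window cost.
print(len(html_page), len(markdown_page))
print(f"scaffolding overhead: {len(html_page) / len(markdown_page):.1f}x")
```

Even on this toy page, the HTML version spends several times as many characters before the actual content is reachable. Add consent banners, tracking scripts, and a related-posts widget, and the overhead climbs fast.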
The adoption curve was faster than almost anyone expected. By mid-2025, Anthropic, Perplexity, Stripe, Cloudflare, Hugging Face, Cursor, Zapier, and hundreds of other organizations had llms.txt files live at their root domains. Yoast, the SEO plugin running on tens of millions of WordPress sites, shipped native llms.txt generation as a standard feature in June 2025, making it a one-toggle operation for anyone running WordPress. The llmstxt.org specification became a community standard. Tools to validate your implementation, generate the file automatically, and audit competitor implementations started appearing.
An entire ecosystem, built in about nine months, around a freakin’ text file.
Most website owners have no idea any of this exists.
And Then Google Ships 100 Skill Files
Which brings us back to that GitHub repo, and to the question that started all of this.
The Google Workspace CLI, built by a senior developer relations engineer at Google named Justin Poehnelt, isn’t just a developer convenience for people who hate clicking through the Gmail UI. It’s something structurally different from what came before it. The entire command surface is generated dynamically at runtime from Google’s own Discovery Service, which means every time Google adds a new API endpoint, gws picks it up automatically without a new release. It outputs structured JSON by default, which any LLM can parse reliably. It has a built-in MCP server, meaning any MCP-compatible AI agent can call it directly as a tool.
It was designed from the ground up not for humans at a terminal but for AI agents operating autonomously.
And alongside the CLI, Google shipped those 100+ SKILL.md files. One for every supported API surface. Higher-level helpers for common multi-step workflows. Fifty curated recipes for the most frequent Gmail, Drive, Calendar, Docs, and Sheets operations.
Poehnelt described the philosophy behind the skill files in a single sentence that is worth writing down: “a skill file is cheaper than a hallucination.”

Without pre-written instructions, an AI agent trying to automate a Workspace task has to infer the right subcommand, guess the right flags, assume the right parameter names, and hope the output format matches what it expects.
Every one of those inferences is a potential failure point. A skill file eliminates the ambiguity entirely. It documents exact invocation patterns, expected inputs, required flags, output formats, and error handling approaches. It turns a probabilistic guessing exercise into a deterministic operation.
Read that again from an infrastructure perspective…
A skill file is documentation that an AI can execute against. Not documentation that a human reads and then implements. Documentation that is itself the interface.
That is a fundamentally different thing than what we’ve been building for the past thirty years. And markdown is the language it’s written in.
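To make the shape concrete without quoting Google’s actual files, here is what a skill file in that spirit might look like for an imaginary internal CLI I’ll call ticketctl. Every command, flag, and field below is invented for illustration… the point is the structure, not the specifics:

```markdown
# Skill: Create a support ticket

## When to use
The user asks to open, file, or escalate a support ticket.

## Invocation
ticketctl create --title "<short summary>" --priority <p1|p2|p3> --json

## Inputs
- `--title` (required): one-line summary, under 80 characters
- `--priority` (required): p1 = outage, p2 = degraded, p3 = request

## Output
JSON object with `id`, `url`, and `status` fields. Treat any non-zero
exit code as a failure and surface stderr to the user.

## Do not
- Do not guess a priority; ask the user if it is ambiguous.
```

An agent reading that file doesn’t infer anything. It executes.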
Here’s the competitive context that makes this sharper: Microsoft began deprecating the Microsoft Graph CLI on September 1, 2025, with full retirement scheduled for August 28, 2026, directing users to PowerShell as the replacement. Google launched gws with native MCP support and 100 skill files into that same window.
Spoiler alert: that juxtaposition is a signal about where the interface layer for enterprise productivity software is going, and which company is currently more serious about getting there.
The Repeating Pattern Repeats
NetApp shipped its own proprietary API layer for ONTAP years before REST was even a meaningful conversation in the storage industry. We called them ZAPIs (pronounced “zappies”) and for a long time they were how everything automated against ONTAP worked. Provisioning volumes, monitoring performance, managing snapshots, configuring data protection… all of it ran through ZAPIs. They were powerful, they were comprehensive, and they were entirely proprietary. If you wanted to build automation against NetApp storage, you learned the ZAPI protocol or you didn’t build automation.
Then REST happened. Industry-standard, interoperable, human-readable, and compatible with every modern development toolchain. NetApp started building REST parity into ONTAP beginning with version 9.6, reached functional coverage for most use cases around 9.10, and finally announced ZAPI end-of-availability with 9.18… a multi-year transition from a proprietary protocol to an open standard. For customers who had built deep automation on ZAPIs, it meant migration work. For the ecosystem broadly, it meant a better foundation going forward.
The pattern is familiar to anyone who has watched technology standardize: proprietary first, open standard second, and a long period of parallel operation in between while the installed base catches up.
The broader API story follows the same arc, just at a larger scale. Before APIs became ubiquitous, applications were silos. Data lived inside them and couldn’t get out without a human operating the interface. Integrating two enterprise systems meant custom point-to-point connections, usually built by consultants who charged by the hour and left behind code nobody could maintain. Then the REST API model arrived and created a universal handshake… a standardized, stateless, self-describing way for systems to communicate without a human in the middle. It unlocked an entire economy built on top of that integration layer. Stripe, Twilio, Plaid, every SaaS integration you’ve ever used… all of it runs on the REST API foundation.
Markdown files are doing something structurally similar, but at the content and instruction layer rather than the data layer.
The web was built for human eyes. HTML, CSS, JavaScript… all of it exists to render something visually consumable by a person sitting in front of a screen. AI can technically process it, but it’s like reading a book through a frosted window. The meaning is in there somewhere, but buried under presentation.
Markdown strips the window away. Content with just enough structure to convey hierarchy and intent, without any rendering overhead. And crucially, it’s bidirectional in a way HTML never was. Humans can read it fluently. Machines can read it fluently. And now machines can write it as operating instructions for other machines.
The trajectory from here follows the same curve we’ve seen before. A quiet period of early adoption by the technically sophisticated. A tipping point where tooling makes it accessible to everyone. A period of parallel operation where the old way and the new way coexist. And eventually, a world where the new layer is assumed to exist, and the organizations that skipped it are the ones spending money playing catch up.
Every meaningful surface on the internet will eventually have a markdown layer running parallel to the human-facing layer.
Not replacing it.
Alongside it.
A website that is also callable. Not through a formal API with authentication flows and versioning contracts and SDK maintenance burden. Through a plain text file that says: “Hey! Here is what this thing does, here is how you make it do something, and here is what you get back.”
The distinction between using a website and using an API collapses. And the instruction language for all of it is something John Gruber built in 2004 so bloggers wouldn’t have to type angle brackets.
Governance While Rome Burns
Here is where I’m going to get direct with you, because I have spent a lot of time in enterprise IT environments over the decades.
Most large organizations right now are consumed by AI conversations that look like this…
“Which LLM do we standardize on?”
“How do we govern AI outputs?”
“What’s our data residency posture?”
“Which vendor’s copilot do we license?”
These are real questions that need answers… they’re just at the wrong layer.
While those conversations are happening, the instruction layer for how AI actually interacts with the web is being quietly standardized into plain text files. The same way the API economy snuck up on organizations that were still debating SOA governance frameworks while REST was quietly rewiring the integration layer underneath them. The same way ZAPIs became a migration problem for NetApp customers who assumed proprietary would last forever. The pattern keeps repeating because the people making architecture decisions are usually looking at the application layer while the interface layer shifts underneath them.
Enterprise IT shops have thousands of internal tools, portals, and applications. Intranets, ticketing systems, documentation platforms, HR portals, finance systems, procurement workflows. Every single one of them is currently invisible to AI agents because there is no skill layer. No markdown map. No instruction set. An agent can’t operate what it can’t read, and right now it can’t read any of it.
The organizations that instrument their internal surfaces for agent interaction first, and that build the markdown layer on top of their existing systems, will have a compounding productivity advantage over those still having governance discussions. Not because they picked the right LLM, but because they built the right instruction layer for the LLM to work with.
Microsoft began deprecating its Graph CLI in September 2025. Google shipped gws with 100 skill files six months later. That’s not a coincidence. That’s a signal about where the interface layer is going.
I found out about llms.txt today. I’ve run a silly tech blog for fifteen years.
Imagine what your enterprise’s internal tools don’t know yet.
/Nick