From Human-Readable to Machine-Readable: The Quest for a Semantic Web

The Problem with Web Pages

Since the 1990s, the web has primarily served as a platform for publishing documents meant for human eyes. These documents are written in HTML, a markup language that offers only limited structural cues: it can indicate where a paragraph begins or that a word should be emphasized. CSS then adds visual styling, such as rendering text in tiny gray sans-serif fonts. That look may seem fashionable to some, but it can alienate older readers who struggle with low contrast and small type.

Source: www.joelonsoftware.com

This approach works for human readers, but computers struggle to extract meaning from such loosely structured content. Suppose you mention a book on a web page, such as Goodnight Moon by Margaret Wise Brown, illustrated by Clement Hurd: the only structural hint is the bold formatting of the title. A naive program reading that page has no way of knowing it’s encountering a book, let alone its author, illustrator, publisher, or ISBN.
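To make the problem concrete, here is a small Python sketch of such a naive program, using the standard library's `html.parser`. The page fragment and the `NaiveScraper` class are hypothetical illustrations. All the program can recover is a list of tags and the text inside the bold element; nothing tells it that the bold text is a book title or who wrote it.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the only markup around the book is <b>.
PAGE = (
    "<p>Tonight we read <b>Goodnight Moon</b> by Margaret Wise Brown, "
    "illustrated by Clement Hurd.</p>"
)


class NaiveScraper(HTMLParser):
    """Records every start tag and the text found inside <b> elements."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.bold_text = []
        self._in_bold = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "b":
            self._in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False

    def handle_data(self, data):
        if self._in_bold:
            self.bold_text.append(data)


scraper = NaiveScraper()
scraper.feed(PAGE)
print(scraper.tags)       # ['p', 'b']
print(scraper.bold_text)  # ['Goodnight Moon']
# Nothing in this output says "book", "author", or "illustrator".
```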

The Dream of a Semantic Web

Back in 1999, Tim Berners-Lee envisioned a more structured web. In his book Weaving the Web, he wrote:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.”

To realize this dream, web publishers would need to add extra markup that tells computers what the content means. A natural starting point is schema.org, a collaborative project that provides a shared vocabulary for structured data. Using a format such as RDFa or JSON-LD, you could annotate your HTML to say explicitly, “Hey, this is a book!”, along with details such as the author, illustrator, publisher, and ISBN.
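As a rough illustration of the markup involved, the Python sketch below builds a schema.org `Book` description in JSON-LD and wraps it in a `<script type="application/ld+json">` element, the usual way JSON-LD is embedded in a page. The helper names `book_jsonld` and `as_script_tag` are hypothetical. Publisher and ISBN are optional parameters and are deliberately left out of the example call rather than guessed.

```python
import json


def book_jsonld(name, author, illustrator, publisher=None, isbn=None):
    """Build a schema.org Book description as a JSON-LD dictionary."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": {"@type": "Person", "name": author},
        "illustrator": {"@type": "Person", "name": illustrator},
    }
    # Optional properties are only emitted when the caller supplies them.
    if publisher is not None:
        data["publisher"] = {"@type": "Organization", "name": publisher}
    if isbn is not None:
        data["isbn"] = isbn
    return data


def as_script_tag(data):
    """Wrap JSON-LD in the <script> element that embeds it in an HTML page."""
    # Escape "</" so a value can never close the <script> element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


book = book_jsonld("Goodnight Moon", "Margaret Wise Brown", "Clement Hurd")
print(as_script_tag(book))
```

Even this small example hints at the homework involved: the author must know that schema.org calls the type `Book`, that people are typed as `Person`, and which property names (`author`, `illustrator`, `isbn`) the vocabulary expects.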

Why So Little Progress?

Despite these good intentions, adding semantic markup remains a daunting task. After writing a polished, human-readable blog post, few authors have the mental energy left for the homework of embedding structured data. Unless a computer is actively consuming that data, the effort feels wasted. As a result, Semantic Web adoption has been painfully slow, and most of the web still lacks meaningful machine-readable annotations.

The Cost of Complexity

The burden lies in the complexity: you need to learn vocabularies, choose a serialization format, and integrate the markup without breaking your existing page. For many content creators, the payoff isn’t obvious until search engines or other services start using the data. And even then, the immediate benefit is often intangible.
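The payoff only appears once something reads the data. As a rough sketch of that consumer side (not a description of how any particular search engine works), the Python below pulls every JSON-LD block out of a page while ignoring the visible HTML. It also shows why the markup can sit in the page without disturbing what human readers see. The `JsonLdExtractor` class and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # a list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.items.append(json.loads("".join(self._buffer)))
            self._buffer = None


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book", "name": "Goodnight Moon",
 "author": {"@type": "Person", "name": "Margaret Wise Brown"}}
</script>
</head><body><p>Tonight we read <b>Goodnight Moon</b>.</p></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
books = [item for item in extractor.items if item.get("@type") == "Book"]
print(books[0]["author"]["name"])  # Margaret Wise Brown
```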

A Simpler Path Forward

The core belief behind this effort is that people will only add semantic markup to their web pages if doing so is effortless, nearly as simple as writing the content itself. The Block Protocol aims to address exactly that by providing a way to embed structured blocks (like a book, a recipe, or an event) directly into any website with little or no extra work. Instead of wrestling with JSON-LD or RDF, authors can use pre-built blocks that automatically expose semantic data.
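The Block Protocol's actual interfaces are not described here, so what follows is only a hypothetical Python illustration of the underlying idea: the author fills in a book's fields once, and the block produces both the human-readable HTML and the machine-readable JSON-LD. `BookBlock`, `to_jsonld`, and `render` are invented names and are not part of the Block Protocol.

```python
import html
import json
from dataclasses import dataclass


@dataclass
class BookBlock:
    """Hypothetical 'book block': one record, two renderings."""

    title: str
    author: str
    illustrator: str = ""

    def to_jsonld(self):
        """Machine-readable view: a schema.org Book as a JSON-LD dict."""
        data = {
            "@context": "https://schema.org",
            "@type": "Book",
            "name": self.title,
            "author": {"@type": "Person", "name": self.author},
        }
        if self.illustrator:
            data["illustrator"] = {"@type": "Person", "name": self.illustrator}
        return data

    def render(self):
        """Human-readable HTML followed by the matching JSON-LD script."""
        visible = f"<p><b>{html.escape(self.title)}</b> by {html.escape(self.author)}"
        if self.illustrator:
            visible += f", illustrated by {html.escape(self.illustrator)}"
        visible += "</p>"
        # Escape "</" so the JSON payload cannot terminate the script element.
        payload = json.dumps(self.to_jsonld()).replace("</", "<\\/")
        return visible + '\n<script type="application/ld+json">' + payload + "</script>"


block = BookBlock("Goodnight Moon", "Margaret Wise Brown", "Clement Hurd")
print(block.render())
```

Because both renderings come from the same record, the visible text and the structured data cannot drift apart, and the author never has to look up schema.org property names.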

This approach lowers the barrier to entry, making it trivial to publish both human- and machine-readable content. As more publishers adopt these blocks, the Semantic Web that Berners-Lee dreamed of could finally move from theory to practice.

Looking Ahead

The web’s original design for human documents was a brilliant start, but to unlock its full potential—for search engines, intelligent agents, and everyday automation—we need structure. The Block Protocol represents a step toward making that structure as easy as typing a few words. Progress may have been slow since 1999, but with tools like this, the next chapter of the web could be far more accessible—to both people and machines.