F3 Is a Reminder That File Formats Are Becoming Runtime Contracts

F3 is on the Hacker News front page today, and the thread is exactly what you would expect when someone proposes a "file format for the future."

Some people see a serious research idea: a next-generation columnar format with embedded WebAssembly decoders so files can stay readable even when native library support is missing. Other people see a prototype with a thin README, no obvious migration path, and a familiar problem: Parquet is everywhere, so why would anyone leave?

Both reactions are fair.

The useful takeaway is not "replace Parquet." It is this:

File formats are becoming runtime contracts.

Last updated: June 23, 2026

That matters for data systems, analytics tools, and AI agents that increasingly consume files they did not create.

What F3 Is

F3 stands for Future-proof File Format. The project describes itself as an open-source data file format designed around efficiency, interoperability, and extensibility. The README is explicit that it is a research prototype and should not be used in production.

The F3 paper's core claim is that modern columnar formats like Parquet and ORC were designed for an older hardware and workload environment. They can evolve, but every evolution runs into compatibility problems. Engines need native decoders. Tooling drifts. New encodings can be hard to deploy everywhere.

F3's answer is to make files self-describing in a stronger way:

the file carries data
the file carries metadata
the file can carry WebAssembly binaries that decode the data
native decoders can still exist
Wasm acts as a compatibility fallback when native support is unavailable

That is the interesting bit. The file does not just describe its shape. It can bring a portable decoder with it.
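
To make the fallback idea concrete, here is a minimal sketch of how a reader might pick a decode path. Every name in it (ColumnChunk, NATIVE_DECODERS, choose_decoder, the "plain_u8" encoding) is hypothetical and chosen for illustration; it is not F3's actual layout or API, and it only decides which path to take rather than executing any Wasm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical names throughout: this is not F3's real layout or API.
Decoder = Callable[[bytes], list]


@dataclass
class ColumnChunk:
    encoding: str                          # name of the encoding used for this chunk
    payload: bytes                         # encoded column bytes
    wasm_decoder: Optional[bytes] = None   # embedded Wasm module, if the file ships one


def decode_plain_u8(payload: bytes) -> list:
    """Stand-in native decoder: one unsigned byte per value."""
    return list(payload)


# Encodings this (imaginary) reader supports natively.
NATIVE_DECODERS: Dict[str, Decoder] = {"plain_u8": decode_plain_u8}


def choose_decoder(chunk: ColumnChunk, allow_wasm: bool) -> str:
    """Return which decode path a reader would take for this chunk."""
    if chunk.encoding in NATIVE_DECODERS:
        return "native"          # prefer the native decoder when one exists
    if chunk.wasm_decoder is not None and allow_wasm:
        return "wasm"            # fall back to the decoder the file carries
    return "unreadable"          # no native support and no permitted fallback
```

The allow_wasm flag is the important design choice in this sketch: the fallback exists only if the reader's policy permits running file-provided code.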

Why Developers Were Skeptical

The HN thread had a clear theme: the repo does not make the case quickly enough.

People asked:

What is this a file format for?
What Parquet shortcomings does it fix?
Why would anyone leave Parquet or ORC?
Where are the examples?
Is embedded executable code inside data a security risk?
Is this a research artifact or a project with adoption momentum?

Those are not nitpicks. File formats live or die on boring adoption constraints. A better format that nobody can read is worse than a flawed format that every warehouse, query engine, notebook, ETL tool, and object-store scanner already supports.

Parquet has massive present-tense gravity. Spark reads it. DuckDB reads it. BigQuery reads it. Snowflake reads it. Pandas, Polars, Arrow, Trino, Athena, and a pile of internal systems read it. That support is the product.

So the practical stance is simple: F3 is not a migration recommendation today.

It is a design signal.

The Wasm Decoder Idea Is the Signal

The embedded Wasm decoder idea points at a bigger shift.

Historically, a data file mostly carried bytes and schema. The runtime carried the intelligence:

the engine knew the format
the library knew the encoding
the application knew how to interpret fields
the user hoped versions lined up

F3 pushes some of that contract into the artifact itself. If a file uses a new encoding, it can include a decoder implementation. The consumer still needs a sandbox and execution policy, but the file is no longer helpless without a matching native library.

That is a powerful direction for long-lived data.

Think about files that need to survive:

scientific archives
compliance exports
ML training corpora
public datasets
government records
company data lakes
analytics snapshots

The longer a file has to live, the more painful decoder drift becomes.

This is also where the security question becomes real. A file that carries executable logic must be treated differently from a file that only carries inert bytes. Wasm is designed for sandboxed execution, but sandboxing is a policy surface, not a magic word. Readers need resource limits, capability controls, deterministic execution expectations, and a clear answer for "what is this decoder allowed to do?"
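
One way to picture "what is this decoder allowed to do?" is a policy check that runs before any embedded code does. The sketch below is an assumption-heavy illustration: SandboxPolicy, DecoderRequest, and policy_violations are invented names, and a real reader would also rely on its Wasm runtime to enforce time and instruction budgets during execution.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical types for illustration; F3 does not define these names.


@dataclass(frozen=True)
class SandboxPolicy:
    max_memory_bytes: int                           # resource limit
    allowed_imports: FrozenSet[str] = frozenset()   # capability allowlist


@dataclass(frozen=True)
class DecoderRequest:
    memory_bytes: int          # memory the embedded decoder asks for
    imports: FrozenSet[str]    # host functions the decoder wants to call


def policy_violations(request: DecoderRequest, policy: SandboxPolicy) -> List[str]:
    """List every way the request exceeds the policy; empty means it may run."""
    problems: List[str] = []
    if request.memory_bytes > policy.max_memory_bytes:
        problems.append("memory limit exceeded")
    # Anything not explicitly allowed is denied.
    for name in sorted(request.imports - policy.allowed_imports):
        problems.append(f"import not allowed: {name}")
    return problems
```

The default allowlist is empty, so a decoder that asks for any host capability is rejected unless an operator opts in.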

That makes F3 less like a simple format and more like a runtime boundary.

Why This Matters for AI Agents

AI agents make the file-format problem sharper.

Agents are constantly asked to inspect unfamiliar artifacts:

CSV exports
Parquet datasets
JSON logs
notebooks
model cards
trace files
benchmark outputs
internal reports

The agent often sees the surface text but not the deep contract. It can summarize a README, but can it verify the encoding? Can it recover column semantics? Can it explain a weird compression scheme? Can it cite which decoder produced the data?

As agents move closer to data engineering work, the file is not just input. It is an operational boundary.

We made a similar argument in "Agent Workspaces Need Filesystem Contracts": agents become safer when their workspaces expose clear, inspectable contracts. F3 applies that mindset lower in the stack. The file itself becomes more self-explaining and self-contained.

That does not mean agents should blindly execute Wasm from random files. It means agent runtimes need a stronger notion of file trust:

inert metadata is safe to inspect
embedded code requires sandbox policy
generated decoders need provenance
derived values need receipts
file-level capabilities should be explicit

This is the same trust-boundary lesson that shows up in MCP servers, plugin systems, and tool-call sandboxes. Once data can carry behavior, the reader needs policy.
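
The trust rules above could be encoded as a small decision function inside an agent runtime. This is a sketch under stated assumptions: FilePart, Action, and required_action are hypothetical names, "provenance" is reduced to an optional string, and real provenance checking (signatures, registries) is out of scope.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    INSPECT = "inspect"   # safe to read directly
    SANDBOX = "sandbox"   # may run, but only inside a sandbox
    REJECT = "reject"     # do not touch


@dataclass
class FilePart:
    kind: str                         # "metadata" or "code" in this sketch
    provenance: Optional[str] = None  # hypothetical: who produced or signed it


def required_action(part: FilePart, sandbox_available: bool) -> Action:
    """Map a part of a file to the least risky action an agent may take."""
    if part.kind == "metadata":
        return Action.INSPECT         # inert metadata is safe to inspect
    if part.kind == "code":
        # Embedded code needs both known provenance and a sandbox.
        if part.provenance and sandbox_available:
            return Action.SANDBOX
        return Action.REJECT
    return Action.REJECT              # unknown kinds are denied by default
```

Denying unknown kinds by default keeps the policy explicit: a new kind of file content has to be classified before an agent can act on it.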

Where F3 Could Matter

F3's best near-term use case is not replacing every Parquet file in a data lake.

The better fit is research and specialized systems where format evolution is the bottleneck:

testing new encodings without waiting for every engine to ship native support
distributing datasets with portable decoding behavior
experimenting with hardware-aware layouts
preserving long-lived scientific or compliance datasets
building engines that can safely execute file-provided decoders

That is still meaningful. Research prototypes do not need to win the whole market to move the conversation.

The important question is whether the idea can be packaged into something boring enough for real operators:

clear examples
obvious Parquet comparisons
reproducible benchmarks
sandbox defaults
a small reader API
compatibility with Arrow-like workflows
a migration story for existing data lakes

HN was right to ask for the "why" upfront. A future file format has to win trust before it can win adoption.

My Take

F3 is interesting because it treats file compatibility as a runtime problem, not just a schema problem.

That is a big idea.

It is also nowhere near enough to displace Parquet on its own. The data world is not short on clever formats. It is short on formats that every tool reads, every team trusts, and every operator can debug at 2 a.m.

So the practical takeaway is not "move to F3."

The takeaway is to watch the contract shift:

files will carry richer metadata
formats will need safer extension points
portable execution will become more common
readers will need policy, not just parsers
agents will need to reason about file provenance and decoder trust

That is bigger than F3. It is the direction data infrastructure has to go if files are going to outlive the tools that created them.

FAQ

What is F3?

F3 is a research prototype for a future-proof columnar data file format. It is designed around efficiency, interoperability, and extensibility, including embedded WebAssembly decoders.

Is F3 ready for production?

No. The F3 README says the project is a research prototype and should not be used in production.

Is F3 trying to replace Parquet?

The paper compares F3 against existing columnar formats such as Parquet and ORC, but developers should not treat it as a drop-in replacement today. Parquet's ecosystem support remains the practical default.

Why embed Wasm decoders in a file?

The idea is that a file can remain readable even when a native decoder for its encoding is not available. Wasm provides a portable fallback, assuming the reader has a safe sandbox and execution policy.

Is executable code inside a data file risky?

Yes, it creates a trust boundary. Wasm can reduce risk through sandboxing, but readers still need resource limits, capability controls, provenance checks, and a policy for whether embedded decoders can run at all.

Sources

Fetched June 23, 2026.
