
TL;DR
F3 is trending on Hacker News as a research prototype for a future-proof columnar file format. The useful takeaway is not to replace Parquet tomorrow. It is that data files are starting to carry more of their own runtime contract.
F3 is on the Hacker News front page today, and the thread is exactly what you would expect when someone proposes a "file format for the future."
Some people see a serious research idea: a next-generation columnar format with embedded WebAssembly decoders so files can stay readable even when native library support is missing. Other people see a prototype with a thin README, no obvious migration path, and a familiar problem: Parquet is everywhere, so why would anyone leave?
Both reactions are fair.
The useful takeaway is not "replace Parquet." It is this:
File formats are becoming runtime contracts.
That matters for data systems, analytics tools, and AI agents that increasingly consume files they did not create.
Last updated: June 23, 2026
F3 stands for Future-proof File Format. The project describes itself as an open-source data file format designed around efficiency, interoperability, and extensibility. The README is explicit that it is a research prototype and should not be used in production.
The paper's core claim is that modern columnar formats like Parquet and ORC were designed for an older hardware and workload environment. They can evolve, but every evolution runs into compatibility problems. Engines need native decoders. Tooling drifts. New encodings can be hard to deploy everywhere.
F3's answer is to make files self-describing in a stronger way:
That is the interesting bit. The file does not just describe its shape. It can bring a portable decoder with it.
The HN thread had a clear theme: the repo does not make the case quickly enough.
People asked:
Those are not nitpicks. File formats live or die on boring adoption constraints. A better format that nobody can read is worse than a flawed format that every warehouse, query engine, notebook, ETL tool, and object-store scanner already supports.
Parquet has massive present-tense gravity. Spark reads it. DuckDB reads it. BigQuery reads it. Snowflake reads it. Pandas, Polars, Arrow, Trino, Athena, and a pile of internal systems read it. That support is the product.
So the practical stance is simple: F3 is not a migration recommendation today.
It is a design signal.
The embedded Wasm decoder idea points at a bigger shift.
Historically, a data file mostly carried bytes and schema. The runtime carried the intelligence:
F3 pushes some of that contract into the artifact itself. If a file uses a new encoding, it can include a decoder implementation. The consumer still needs a sandbox and execution policy, but the file is no longer helpless without a matching native library.
That is a powerful direction for long-lived data.
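To make the "decoder travels with the file" idea concrete, here is a minimal sketch of how a reader might pick a decoder. It is not F3's actual API: `ColumnChunk`, `NATIVE_DECODERS`, `resolve_decoder`, and the `run_in_sandbox` callback are hypothetical names, and the sandbox itself is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Decoder = Callable[[bytes], List[int]]
Sandbox = Callable[[bytes, bytes], List[int]]  # (wasm_module, payload) -> values


@dataclass
class ColumnChunk:
    encoding: str                          # hypothetical encoding name stored in the file
    payload: bytes                         # encoded column bytes
    embedded_wasm: Optional[bytes] = None  # decoder module carried by the file, if any


def decode_plain_u8(payload: bytes) -> List[int]:
    """Toy native decoder: one unsigned byte per value."""
    return list(payload)


# Decoders the reader was built or installed with (assumed registry).
NATIVE_DECODERS: Dict[str, Decoder] = {"plain_u8": decode_plain_u8}


def resolve_decoder(
    chunk: ColumnChunk, allow_embedded: bool, run_in_sandbox: Sandbox
) -> Tuple[str, Decoder]:
    """Prefer a native decoder; use the embedded one only if policy allows it."""
    native = NATIVE_DECODERS.get(chunk.encoding)
    if native is not None:
        return "native", native
    if chunk.embedded_wasm is None:
        raise LookupError(f"no decoder available for encoding {chunk.encoding!r}")
    if not allow_embedded:
        raise PermissionError("policy forbids running embedded decoders")
    module = chunk.embedded_wasm
    return "embedded", lambda payload: run_in_sandbox(module, payload)
```

The ordering is the design choice worth noting: the embedded decoder is a fallback, so a reader that already understands the encoding never has to execute code from the file.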
Think about files that need to survive:
The longer a file has to live, the more painful decoder drift becomes.
This is also where the security question becomes real. A file that carries executable logic must be treated differently from a file that only carries inert bytes. Wasm is designed for sandboxed execution, but sandboxing is a policy surface, not a magic word. Readers need resource limits, capability controls, deterministic execution expectations, and a clear answer for "what is this decoder allowed to do?"
That makes F3 less like a simple format and more like a runtime boundary.
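One way to picture that boundary is a pre-flight check that runs before any embedded decoder is instantiated. The sketch below is an assumption-heavy illustration: `SandboxPolicy`, `DecoderRequirements`, and `policy_violations` are hypothetical names, the limits are arbitrary defaults, and the requirements are assumed to have been read from the module (declared memory and imports) by some earlier step.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SandboxPolicy:
    max_memory_bytes: int = 64 * 1024 * 1024       # arbitrary example limit
    max_fuel: int = 10_000_000                     # instruction budget for the runtime to enforce
    allowed_imports: FrozenSet[str] = frozenset()  # no host functions by default


@dataclass(frozen=True)
class DecoderRequirements:
    memory_bytes: int         # memory the module declares it needs
    imports: FrozenSet[str]   # host functions the module asks for


def policy_violations(req: DecoderRequirements, policy: SandboxPolicy) -> List[str]:
    """Return human-readable reasons to refuse the decoder; empty means it may proceed."""
    problems: List[str] = []
    if req.memory_bytes > policy.max_memory_bytes:
        problems.append(
            f"memory {req.memory_bytes} exceeds limit {policy.max_memory_bytes}"
        )
    for name in sorted(req.imports - policy.allowed_imports):
        problems.append(f"import not allowed: {name}")
    return problems
```

A check like this only rejects early. The memory and instruction limits still have to be enforced by the Wasm runtime while the decoder runs, because a file's declarations cannot be trusted on their own.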
AI agents make the file-format problem sharper.
Agents are constantly asked to inspect unfamiliar artifacts:
The agent often sees the surface text but not the deep contract. It can summarize a README, but can it verify the encoding? Can it recover column semantics? Can it explain a weird compression scheme? Can it cite which decoder produced the data?
As agents move closer to data engineering work, the file is not just input. It is an operational boundary.
We made a similar argument in "agent workspaces need filesystem contracts": agents become safer when their workspaces expose clear, inspectable contracts. F3 applies that mindset lower in the stack. The file itself becomes more self-explaining and self-contained.
That does not mean agents should blindly execute Wasm from random files. It means agent runtimes need a stronger notion of file trust:
This is the same trust-boundary lesson that shows up in MCP servers, plugin systems, and tool-call sandboxes. Once data can carry behavior, the reader needs policy.
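As a rough illustration of what such a policy could look like for an agent runtime, the sketch below pins embedded decoders by content hash. Everything here is hypothetical (the `Decision` values, the `decide` function, the idea of an operator-maintained allowlist); it shows one possible shape for the decision, not a recommendation from the F3 project.

```python
import hashlib
from enum import Enum
from typing import AbstractSet, Optional, Tuple


class Decision(Enum):
    NATIVE_ONLY = "native_only"      # ignore embedded code, use the installed decoder
    RUN_SANDBOXED = "run_sandboxed"  # embedded decoder is on the allowlist
    REJECT = "reject"                # unknown or missing decoder, do not read


def decide(
    embedded_decoder: Optional[bytes],
    trusted_digests: AbstractSet[str],
    has_native_decoder: bool,
) -> Tuple[Decision, Optional[str]]:
    """Return the decision plus the decoder's SHA-256 digest for audit logs."""
    digest = (
        hashlib.sha256(embedded_decoder).hexdigest()
        if embedded_decoder is not None
        else None
    )
    if has_native_decoder:
        return Decision.NATIVE_ONLY, digest
    if digest is not None and digest in trusted_digests:
        return Decision.RUN_SANDBOXED, digest
    return Decision.REJECT, digest
```

Returning the digest alongside the decision gives the agent something concrete to cite when asked which decoder produced the data it is reporting on.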
F3's best near-term use case is not replacing every Parquet file in a data lake.
The better fit is research and specialized systems where format evolution is the bottleneck:
That is still meaningful. Research prototypes do not need to win the whole market to move the conversation.
The important question is whether the idea can be packaged into something boring enough for real operators:
HN was right to ask for the "why" upfront. A future file format has to win trust before it can win adoption.
F3 is interesting because it treats file compatibility as a runtime problem, not just a schema problem.
That is a big idea.
It is also nowhere near enough to overcome Parquet by itself. The data world is not short on clever formats. It is short on formats that every tool reads, every team trusts, and every operator can debug at 2 a.m.
So the practical takeaway is not "move to F3."
The takeaway is to watch the contract shift:
That is bigger than F3. It is the direction data infrastructure has to go if files are going to outlive the tools that created them.
What is F3?
F3 is a research prototype for a future-proof columnar data file format. It is designed around efficiency, interoperability, and extensibility, including embedded WebAssembly decoders.
Is F3 ready for production?
No. The F3 README says the project is a research prototype and should not be used in production.
Does F3 replace Parquet?
The paper compares F3 against existing columnar formats such as Parquet and ORC, but developers should not treat it as a drop-in replacement today. Parquet's ecosystem support remains the practical default.
Why embed WebAssembly decoders in a file?
The idea is that a file can remain readable even when a native decoder for its encoding is not available. Wasm provides a portable fallback, assuming the reader has a safe sandbox and execution policy.
Does embedding executable decoders create a security concern?
Yes, it creates a trust boundary. Wasm can reduce risk through sandboxing, but readers still need resource limits, capability controls, provenance checks, and a policy for whether embedded decoders can run at all.
Fetched June 23, 2026.