As code gets cheaper and cheaper to write, so do re-implementations. I mentioned recently that I had an AI port one of my libraries to another language, and it ended up choosing a different design for that implementation. In many ways the functionality was the same, but the path it took to get there was different. The way that port worked was by going via the test suite.
Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by pointing an AI only at the API and the test suite. The motivation: enabling relicensing from LGPL to MIT.
The Ship of Theseus Paradox
This raises interesting questions about software identity. If you rewrite a piece of software piece by piece, at what point does it become a different program? This is the Ship of Theseus paradox applied to code.
The Ship of Theseus is a thought experiment that goes back to ancient Greece. If you replace every plank in a ship one by one, is it still the same ship? At what point does it become a different vessel? And if you take all the old planks and build another ship, which one is the "real" ship?
When an AI rewrites your code, even if it preserves the same functionality, is it still your code? The implementation details, the particular choices made along the way, the subtle bugs and their fixes — all of these contribute to the unique fingerprint of a codebase.
// Original implementation
function detectEncoding(buffer) {
  const detector = new CharsetDetector();
  return detector.detect(buffer);
}

// AI-generated port with a different design
const detectEncoding = async (buffer) => {
  const result = await analyzeBuffer(buffer);
  return result.encoding;
};
The Role of Tests in Code Identity
What's fascinating about the AI-driven porting process is that it uses tests as the specification. The tests become the identity of the code — they define what the code is, not how it does it. This is a profound shift in how we think about software.
When I write code, I'm making thousands of small decisions. Should this be a class or a function? Should this error be handled here or passed up the stack? Should this data structure be an array or a map? Each of these decisions is informed by my experience, my mental model of the problem, and my aesthetic preferences about code.
An AI making these decisions based on a test suite is free from all of that baggage. It might make different choices, some of which might be better. But are these choices still "mine"? Does the code still reflect my intentions as a programmer?
Implications for Open Source
This has far-reaching implications for open source licensing. If an AI can be pointed at a codebase and generate a clean-room implementation that passes the same tests, what happens to the original author's rights and attribution?
"The question is not whether AI can reproduce your code, but whether we consider the output to be a derivative work."
Current open source licenses assume that code is copied, modified, or extended. They don't account for the possibility that code could be "read" and then "reimplemented" by an AI. The legal framework we have wasn't designed for this scenario.
Consider the implications:
- Copyleft licenses like GPL require derivative works to be licensed under the same terms. But is AI-generated code based on GPL-licensed code a derivative work?
- Permissive licenses like MIT require attribution. But if an AI "learns" from MIT-licensed code and produces similar code, does that attribution requirement transfer?
- Trade secrets are protected by not sharing code. But if an AI can reverse-engineer functionality from tests or public documentation, what happens to that protection?
The Future of Code Ownership
As we navigate this new landscape, we'll need to develop new norms and possibly new legal frameworks around code ownership and attribution in the age of AI-generated software.
Perhaps the solution lies in being more explicit about what we consider the "essence" of our code. Is it:
- The specific implementation details?
- The algorithmic approach?
- The API surface?
- The test suite?
- Or something else entirely?
My suspicion is that we'll need to think about code on multiple levels, with different rules for each. The API surface might be one thing (with its own copyright considerations), while the implementation might be another. Tests might occupy a middle ground — functional specifications that are neither purely API nor purely implementation.
What's clear is that the old ways of thinking about code ownership are being challenged. The Ship of Theseus isn't just a philosophical puzzle anymore — it's a practical question that we, as an industry, need to answer.
And unlike the ancient Greeks, we can't afford to debate it indefinitely. The AI systems are already here, and they're already rewriting our code, one function at a time.