Monday, June 1, 2026

Continuing work on Data Packaging

My work has always been under the broad heading of Data Packaging: how information is represented, moved, protected, transformed, compressed, decompressed, and made useful across systems.

The Base64 codec now in use in millions of devices was one practical spinoff of that work. It was not the whole project. It was a clean, useful, standards-friendly tool that came out of a much larger research program into data representation, compression, randomness, encryption, and transformation. Its widespread use is evidence that the work produced real-world value: not speculative value, not paper value, but deployed value. Trantor’s work, my own contribution, and the Canadian SR&ED support behind that research helped produce code and methods that went out into the world and became part of the working substrate of modern computing.

The larger thesis, which I was already pursuing in the early 1990s, was that the apparent limits of compression were not necessarily hard physical limits imposed by CPU, RAM, disk, or transmission bandwidth. In many cases, the real limitation was that we did not yet know how to find the right representational space. If the correct multidimensional transform could be found, then compression would not merely mean packing existing symbols more tightly. It would mean finding the underlying structure from which those symbols could be regenerated.
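A toy illustration of the distinction between packing symbols and regenerating them from structure. This rests on stated assumptions: the data is produced by a simple linear congruential generator (a stand-in chosen for this sketch, not the method described here), and `lcg_bytes` is a hypothetical helper. A general-purpose compressor such as zlib, which looks for repeated symbol patterns, barely shrinks the stream, if at all. The entire stream, however, can be regenerated exactly from a seed and a length. Real data does not arrive with its generator attached, and finding that generator is the hard problem. The sketch only shows why finding it matters.

```python
import zlib

def lcg_bytes(seed: int, n: int, a: int = 1103515245, c: int = 12345, m: int = 2**31) -> bytes:
    """Hypothetical helper: emit n bytes from a linear congruential generator.

    The generator stands in for 'underlying structure'. It is an assumption
    made for this sketch, not a model of any real data source.
    """
    out = bytearray()
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append((x >> 16) & 0xFF)  # use middle bits; an LCG's low bits are weak
    return bytes(out)

data = lcg_bytes(seed=42, n=100_000)

# Symbol-level packing: zlib finds little repetition to exploit in this stream.
packed = zlib.compress(data, 9)

# Structural description: the parameters needed to regenerate the data exactly.
params = (42, 100_000)  # seed and length; a, c, m are the defaults above
regenerated = lcg_bytes(*params)

print(f"raw: {len(data)} bytes, zlib: {len(packed)} bytes, description: 2 integers")
```

The generator's output looks statistically noisy at the symbol level. Any large gain therefore has to come from knowing the structure, not from packing the symbols more tightly.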

That is why modern LLM/GPT systems are so interesting to me. They are not the same thing as what I was building, but they partially demonstrate the same intuition. A trained model is, in effect, a compressed representational space. The decoder samples or traverses that space and produces coherent output. It does not merely retrieve stored text. It generates plausible continuations from learned structure. That is close to the old Beethoven example: if the representational space captures enough of what makes Beethoven Beethoven, then the decoder can produce not only known works, but also works that were never written, yet remain structurally plausible.

This is not magic. It is compression, transformation, and decoding at a much higher level of abstraction.

That also means the current generation of AI systems should not be mistaken for the endpoint. They are impressive, but they are still algorithmically immature. They consume vast compute because the underlying methods remain crude compared with what is likely possible. The lesson I take from LLMs is not “we have reached the limit.” It is the opposite. They provide evidence that orders of magnitude of improvement are still available if we improve the representation, the transformations, the indexing, the sampling, the verification, and the packaging of knowledge itself.

So my case is simple:

The earlier grant-supported work already paid back public value through useful, deployed technology.

The project that produced that spinoff is still alive and has become more relevant, not less, in the AI era.

Further support, offered with an open hand rather than merely through narrow SR&ED mechanisms, would likely produce more public value because the core problem has now become central: how to package, verify, compress, transmit, decode, and govern knowledge in an age of generative systems.

Base64 was a useful artifact. The larger work is about the architecture of information itself.
