How WebAssembly is Eating the Database, One UDF At a Time

User-defined functions (UDFs) have been a fixture in database systems for a considerable period of time, allowing users to extend the database’s built-in functionality to complement good ol’ SQL. Although the traditional UDF is a powerful tool, developers are in most cases forced to use unfamiliar programming languages, typically unique to the database itself. However, with a portable, low-level binary format with sandboxed security controls and broad language support (WebAssembly!), there’s a new wave of UDF implementations emerging and we like what we see.

For example, Singlestore’s Code Engine provides users with the ability to create both UDFs and Table-Valued Functions (TVFs) using code compiled to Wasm, while both InfinyOn and RedPanda enable developers to manipulate data streams from within their platforms using Wasm. Our own Extism, the universal plug-in system, pulls back the curtains with a demo showing how to extend SQLite3 with Wasm UDFs. Let’s take a deeper look into what’s driving this trend, along with some thoughts and considerations for the future.

Polyglot: more languages, now and in the future

Supporting a wide array of programming languages for UDF creation translates to a market expansion for database providers by enabling wider reach into the many developer communities. On the flip side, database users want to write UDFs in the languages that are dearest to their hearts and that provide them with the best tools for productivity. Thus, more language support is clearly a win-win for both database creators and consumers, so why have we seen such limited language support for UDFs in the past? Time and material is the likely suspect. Without a common runtime and executable format for languages to target, a database vendor would need to build/integrate and maintain various language runtimes in order to offer Wasm’s broad language support. This is a complex and time-consuming task, so prioritization becomes the natural fall back, and the obvious move is to pick a widely used general purpose programming language (e.g. JavaScript) or create a Domain-Specific Language (DSL).

With the former option, the vendor has tied the success of their UDF engine to the chosen language, and with the latter, they have done the same, but also introduced more friction to product adoption due to the imposed learning curve and lack of transferrable skills.

We’re happy to say, it’s time to have your UDF cake — and eat it too. At the time of this writing, there are well over ten languages that target Wasm directly, and as additional languages are able to do so in the future, database platforms could gain these additions with little to no incremental work required.

Security

In addition to supporting various language runtimes, database providers must also ensure proper sandbox controls are in place so that the host system (i.e. the database) is protected from any unintended or malicious side effects. Wasm is particularly well adapted to this as runtimes are designed to execute code in a sandboxed environment with restricted access to the resources of the host system by default. With this isolation built-in, database providers can spend less time rolling their own secure runtime, and more time being cute on Twitter.

Performance & Portability

Wasm code is well-known for its especially small footprint, which makes it easy to bring compute to the data, and near native execution speeds makes it a natual fit for complex data workloads.

Datasets grow larger by the day, making it cumbersome and costly to ferry them back-and-forth between database and application for processing. Aside from the added latency created by such workflows, additional points of failure are added with each new application connected to a database, and concerns related to data residency both make running Wasm directly alongside the data quite appealing. Being able to keep your data where it lives, and bring (at least part of) the application to it is a huge simplification to an unclear and complex regulatory environment.

Additionally, the ability to bring compute to data eliminates the need for as many microservices — run those in the database instead!

As far as portability is concerned, why should database users implement their UDFs over and over again for different host databases? Write once, and run ~~anywhere~~ in any database! A traditional benefit of a UDF is that it can be reused across SQL queries, but why restrict that reusability to one database platform? With Wasm in the mix, the scope of UDF reuse can, in theory, be expanded to any database that supports a Wasm runtime along with a standardized interface into host capabilities. In practice, this would require some level of collaboration across the database provider ecosystem, or efforts by the community to virtualize various interfaces into a common standard, but the possiblity, and the associated benefits, are there for the taking.

Additional Thoughts and Considerations

Wasm is clearly a force multiplier for UDFs, but let’s talk about some of the design and implementation aspects that should be considered.

Host interfaces and amenities

The database provider must support a set of imports and exports that will serve as the foundation for the interaction between the database and a Wasm UDF. As noted above, it benefits the entire ecosystem if that interface can be standardized. WASI SQL Embedding is an example of one such proposal to bring standards to the way in which WASM modules can be embedded in SQL databases as extensions.
Designing high-quality language support and libraries is critical for success as UDF creators will need such support to achieve their implementation goals. For example, Python developers will want, and expect, NumPy to be available for analytical workloads. Database providers should consider including such support at the host level, escpecially when native extensions are required for performance reasons. Doing so provides users with access to the necessary libraries without forcing them to be compiled directly into the Wasm UDF, which would increase the size of the module itself.
When possible, language-specific wrappers should be provided around any imports in order to provide idiomatic language support. Look to open source projects such as Extism, which provide a good head-start for teams creating these types of utilities.

Transaction Isolation/Integrity

Most production-ready databases already run UDFs within transactions, allowing for seamless rollbacks when exceptions occur, and eliminating the need for failure compensation to be implemented in the mutation code itself. That being said, considering how transactions fit into Wasm flavored UDFs and their view of the world is important, and still very much an “exercise left to the reader” for database implementers to solve.

Observability and Validation

What tools do database providers and UDF creators need to observe and inspect the performance of UDFs and debug them when the unexpected happens? Multi-tenant observability is an underserved need within the ecosystem that should be given careful thought.
Back to the topic of host/guest interface standardization, if a bespoke interface is a must, consider the usage of tools such as Modsurfer, which provide a way for database providers and UDF creators to validate mutual compatibility in a proactive manner. The existence of certain imports and exports, runtime complexity, binary size, and function signatures are all critical components that should be part of any validation scheme.

UDFs and so much more

It would be an error of omission if we didn’t mention the other aspects of the database that are ripe for extensibility with WebAssembly aside from UDFs. SycllaDB provides us with a detailed look into their impelementation of both UDFs and User Defined Aggregates (UDAs). Stored procedures, table-valued functions (TVFs), and even deeper modifications to the core database itself (think new types, indexes, etc.) are also interesting focus areas. Time to kick the tires and let users express their deepest data driven desires!

As Wasm continues to entrench itself as an indespensable component of the computing landscape, we hope to see more support for it in the databases we know and love (e.g. Supabase, Planetscale, Neon). Got some additional thoughts on this topic? Reach out to us at hello@dylibso.com — we’d be glad to talk shop!

Dominique Saulet, Director of TPM
Steve Manuel, Co-founder + CEO

A special thank you to Carl Sverre and Shomik Ghosh for reviewing this post!