Bringing WebAssembly to PostgreSQL using Extism

Running Extism Plugins in PostgreSQL

Bringing WebAssembly to PostgreSQL using Extism

Running Extism Plugins in PostgreSQL

by: Muhammad Azeez

wasm udf postgresql plug-in

Database UDFs, or user-defined functions, are functions that are created by users to add custom functionality to a database. With UDFs, users can extend the capabilities of a database beyond its built-in functions and tailor it to their specific needs. UDFs are commonly used to perform complex calculations, manipulate data, or perform custom operations that are not natively supported by the database. They help to reduce the amount of code that needs to be written and improve performance by avoiding the need to move data outside of the database.

Supercharge UDFs with WebAssembly + Extism

We’ve previously discussed the impact of UDFs on modern database workloads, as well as the myriad benefits that WebAssembly (Wasm) brings to the mix. In this post, we’ll show an example of this by augmenting PostgreSQL with an extension that runs Wasm-powered UDFs. We’ll accomplish this feat by using the WebAssembly framework, Extism, which gives us several nice features on top of core Wasm:

  • Extism has Plug-in Development Kits (PDKs) for several programming languages, enabling users to write UDFs in their favorite language and then compile them to Wasm.
  • With Extism, we don’t need to worry about the low-level implementation details for each Wasm runtime or how memory allocation/deallocation is handled.
  • Extism hosts provide developer-friendly constructs on top of Wasm’s sandboxing and capability-based security features.

What’s pgrx

pgrx is a Rust framework for developing PostgreSQL extensions. It abstracts away the internal implementation details of PostgreSQL and allows us to focus on writing our extensions in idiomatic Rust. We can easily define functions from Rust:

  #[pg_extern]
fn to_lowercase(input: &str) -> String {
    input.to_lowercase()
}

and they’ll be available in SQL:

  pg_extism=# select to_lowercase('HELLO WORLD!');
 to_lowercase
--------------
 hello world!
(1 row)

Introducing pg_extism

pg_extism uses pgrx and the Extism Rust SDK to create a PostgreSQL extension that can run Extism plugins. Users can easily define their own custom plugins and use them in their SQL queries. The implementation is based on these 2 functions:

extism_call takes a path of a Extism plugin (a .wasm file), and the name of a function exported by the Extism plugin, and an input JSON object. extism_call then instantiates the plugin using Extism, and calls it and returns the result as a JSON object.

  #[pg_extern]
fn extism_call(path: &str, name: &str, input: Json) -> Result<Json, Error> {
    let json_string = serde_json::to_string(&input.0).unwrap();

    let mut plugin = new_plugin(path);

    let data = match plugin.call(name, json_string) {
        Ok(v) => v,
        Err(e) => error!("Error while calling plugin: {}", e),
    };

    let output = match std::str::from_utf8(data) {
        Ok(v) => v,
        Err(e) => error!("Invalid UTF-8 sequence: {}", e),
    };

    let response_json: serde_json::Value = serde_json::from_str(output).unwrap();

    Ok(pgx::Json(response_json))
}

extism_define takes a path of a Extism plugin (a .wasm file) and a function name and creates an SQL function.

  #[pg_extern]
fn extism_define(path: &str, name: &str) -> Result<(), Error> {
    let mut plugin = new_plugin(path);

    if !plugin.has_function("metadata") {
        error!("Expected a `metadata` function.");
    }

    let metadata_json = match plugin.call("metadata", "") {
        Ok(v) => v,
        Err(err) => error!("Failed to call metadata function: {}", err),
    };

    let metadata: PluginMetadata = match serde_json::from_slice(metadata_json) {
        Ok(v) => v,
        Err(err) => error!("Failed to deserialize metadata: {}", err),
    };

    // Write an SQL function that calls `extism_call` when it's run
    // See below for examples of the generated SQL
    let sql = generate_dynamic_function(path, name, &metadata);
    Ok(pgx::Spi::run(&sql)?)
}

In PostgreSQL, users will only need to call extism_define explicitly:

  select extism_define('/path/to/chatgpt.wasm', 'llm');

This will define an SQL function called llm with the following schema:

  CREATE OR REPLACE FUNCTION llm(prompt TEXT, payload TEXT) RETURNS TEXT AS $$
DECLARE
    result_json json;
    input_param json;
BEGIN
    -- Construct JSON object from parameters
    input_param := json_build_object('payload', payload,'prompt', prompt);

    -- Call the extism_define function using the provided parameters
    SELECT extism_call('/path/to/chatgpt.wasm', 'chatgpt', input_param) INTO result_json;

    -- Return the desired field from the result JSON
    RETURN (result_json->'response')::TEXT;
EXCEPTION
    WHEN others THEN
        -- Handle exceptions if necessary
        RAISE NOTICE 'An error occurred: %', SQLERRM;
        RETURN NULL;
END;
$$ LANGUAGE plpgsql;

Notice how we’re using extism_call to invoke the Extism plugin when the SQL function is called. If the plugin returns an array, then we will generate a similar SQL function:

  select extism_define('/path/to/read_csv.wasm', 'read_csv');
  CREATE OR REPLACE FUNCTION read_csv(path TEXT) RETURNS SETOF JSON AS $$
DECLARE
    result_json json;
    input_param json;
BEGIN
    -- Construct JSON object from parameters
    input_param := json_build_object('path', path);

    -- Call the extism_define function using the provided parameters
    SELECT extism_call('/path/to/read_csv.wasm', '_start', input_param) INTO result_json;

    -- Return the desired field from the result JSON
    RETURN QUERY SELECT value::JSON FROM json_array_elements((result_json->'rows')::json);
EXCEPTION
    WHEN others THEN
        -- Handle exceptions if necessary
        RAISE NOTICE 'An error occurred: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;

Notice how we’re using json_array_elements to turn the json array into SETOF JSON so that it can be used in SQL queries more naturally.

Writing plugins for the database

And how does the generate_dynamic_function know what parameters are necessary, what’s the return type, and what Extism function to invoke? Each pg_extism plugin must export a function called metadata that returns a JSON object like this:

  {
  "entryPoint": "name of exported wasm function to call",
  "parameters": {
    "param1": "String",
    "param2": "Json",
    "param3": "Number"
  },
  "returnType": "StringArray",
  "returnField": "name of the data field on the response JSON object"
}

ChatGPT plugin

This is an Extism plugin that sends a prompt to OpenAI’s ChatGPT API and returns the result. This is the implementation in JavaScript (using Extism’s JS PDK):

  function metadata() {
  const metadata = {
    entryPoint: "chatgpt",
    parameters: {
      "prompt": "String",
      "payload": "String",
    },
    returnType: "String",
    returnField: "response",
  };

  Host.outputString(JSON.stringify(metadata));
}

function chatgpt() {
  const { prompt, payload } = JSON.parse(Host.inputString());

  let body = JSON.stringify({
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": `${prompt} ${payload}`,
      },
    ],
  });

  let httpResponse = Http.request(
    {
      url: "https://api.openai.com/v1/chat/completions",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${Config.get("openai_apikey")}`,
      },
    },
    body,
  );

  let response = JSON.parse(httpResponse.body);

  if (!response.choices || !response.choices[0]) {
    throw new Error(`Unexpected response from OpenAI: ${httpResponse.body}`);
  }

  console.log(
    "LLM: Received Response from OpenAI",
    response.choices[0].message.content,
  );

  Host.outputString(JSON.stringify({
    response: response.choices[0].message.content,
  }));
}

module.exports = { chatgpt, metadata };

Users can use the plugin in SQL like this:

  pg_extism=# select extism_define('/path/to/chatgpt.wasm', 'llm');

pg_extism=# select llm('summarize this paragraph in less than 5 words: ', 'this tablet is great for reading the text is not a good as it is on my tab s2 9.7 in but this is about $200 cheaper i got it because i wanted a more natural book feel reading my books and the tab s2 was not doing it for me');
                llm
-----------------------------------
 Good tablet, cheaper than Tab S2.
(1 row)

ReadCSV plugin

This plugin is implemented in Go, using Extism’s Go PDK.

  // Some code is omitted for brevity

//export metadata
func metadata() int32 {
	plugin := PluginMetadata{
		EntryPoint: "_start", // we want the Host to call the main method
		Parameters: map[string]string{
			"path": "String",
		},
		ReturnType:  "JsonArray",
		ReturnField: "rows",
	}

	out, err := json.Marshal(plugin)
	if err != nil {
		panic(err)
	}

	pdk.Output(out)

	return 0
}

func read_csv() int32 {
	params := Params{}

	err := json.Unmarshal(pdk.Input(), &params)
	if err != nil {
		panic(err)
	}

	file, err := os.Open(params.Path)
	if err != nil {
		panic(err)
	}

	defer file.Close()

	reader := csv.NewReader(file)
	headers, err := reader.Read()
	if err != nil {
		panic(err)
	}

	var jsonArray []map[string]interface{}

	for {
		record, err := reader.Read()
		if err != nil {
			break
		}

		jsonObject := make(map[string]interface{})
		for i, value := range record {
			jsonObject[headers[i]] = value
		}

		jsonArray = append(jsonArray, jsonObject)
	}

	buff, err := json.Marshal(map[string]interface{}{"rows": jsonArray})

	if err != nil {
		panic(err)
	}

	mem := pdk.AllocateBytes(buff)
	pdk.OutputMemory(mem)

	return 8
}

func main() {
	read_csv()
}

To test the plugins, I have downloaded a subset of this Amazon Reviews dataset from Kaggle.

  -- Create a reviews table
CREATE TABLE reviews (
    id             SERIAL PRIMARY KEY,
    product_id     TEXT,
    name           TEXT,
    rating         NUMERIC,
    review_text    TEXT,
    review_summary TEXT,
    review_sentiment TEXT
);

-- Insert data from /path/to/reviews.csv into reviews table
INSERT INTO reviews (product_id, name, rating, review_text)
SELECT
    json_data->>'id' AS product_id,
    json_data->>'name' AS name,
    (json_data->>'rating')::numeric AS rating,
    json_data->>'review_text' AS review_text
FROM
    read_csv('/path/to/reviews.csv') AS json_data;

-- Ask ChatGPT to summerize and run sentiment analysis on the reviews
update reviews set
review_summary = llm('summerize this in 5 words', review_text),
review_sentiment = llm('give me sentiment of this review (from -1 to 1): ', review_text);

-- Check the results
pg_extism=# select rating, review_summary, review_sentiment from reviews limit 3;
 rating |                     review_summary                     | review_sentiment
--------+--------------------------------------------------------+------------------
      5 | "Perfect size and functional design."                  | "1"
      4 | "Efficient, user-friendly, convenient, touchy button." | "0.8"
      4 | "Convert PDF to Kindle, email articles."               | "0.4"

Future work

While our plugin system for PostgreSQL works, it’s not very fast. Here are some steps that can make it faster and more efficient:

Make communication more efficient by using a different serialization protocol

While JSON is very easy to work with, it can be slow and inefficient. This can add a lot of overhead in chatty scenarios. We can use a binary serialization format like Protocol Buffers or MessagePack to make things faster by both providing faster serialization/deserialization and decreasing the payload size.

Pre-initialize plugins using wizer

Before running Wasm modules, Wasm runtimes need to link appropriate imported functions and initialize their state. Wizer is a Bytecode Alliance tool that allows you to pre-initialize the modules and save a snapshot of them in extism_define. And then, in extism_call you can run the pre-initialized plugin. This can make initialization up to 6 times faster!

Optimize modules using binaryen

binaryen is a Wasm optimizer and compiler toolchain that can make Wasm modules smaller and faster by stripping away unnecessary code and optimizing the wasm opcodes.

Conclusion

We’re excited by all of the opportunities Wasm unlocks, by allowing us to extend systems in a safe and scalable manner. We were able to write PostgreSQL functions easily in two of our favorite programming languages: JavaScript and Go. If you’re also excited about WebAssembly, we want to talk to you! So please reach out on @twitter, email us: hello@dylibso.com, join our newsletter, or hop into the Discord and talk Extism & plug-ins with us.

The source code for pg_extism is available at https://github.com/dylibso/pg_extism

Here to help! 👋

Whether you're curious about WebAssembly or already putting it into production, we've got plenty more to share.

We are here to help, so click & let us know:

Get in touch