PL/SQL is a procedural language designed specifically to embrace SQL statements within its syntax. It includes procedural language elements such as conditions and loops and can handle exceptions (run-time errors).
PL/SQL is native to Oracle databases, and databases like IBM DB2, PostgreSQL, and MySQL support PL/SQL constructs through compatibility features.
JavaScript UDF is Couchbase's alternative to PL/SQL.
JavaScript UDF brings JavaScript's general-purpose scripting flexibility to databases, allowing for dynamic and powerful operations across modern database systems and enhancing flexibility in data querying, processing, and transformation.
Most modern databases like Couchbase, MongoDB, Snowflake, and Google BigQuery support Javascript UDF.
A common problem seen by users migrating from Oracle to Couchbase is porting their PL/SQL scripts. Instead of supporting PL/SQL, Couchbase lets users construct user-defined functions in JavaScript (supported since 2021).
JavaScript UDFs allow easy, intuitive manipulation of variant and JSON data. Variant objects passed to a UDF are transformed into native JavaScript types and values.
The unintended consequence of this is that the majority of RDBMS that have been in existence for the last ten years have strongly encouraged developers to access the database using their procedural extensions to SQL (PL/pgSQL, PL/SQL), which support procedural constructs, integration with SQL, error handling, functions and procedures, triggers, and cursors, or at the very least, functions and procedures (like Sakila). For any attempt to move away from them, all of their scripts would need to be rewritten.
Rewriting code is often a tedious task, especially when dealing with PL/SQL scripts that have been written in the 2000s and maintained since then. These scripts can be complex, often extending to thousands of lines, which can be overwhelming for the average enterprise user.
The ideal approach would be to develop a whole new PL/SQL evaluator, but that would require an excessive amount of engineering hours, and for the same use case, we already have a modern, stable, and fast JsEvaluator -- so why support another evaluator?
This makes the problem a perfect use case to leverage the ongoing advances in AI and LLMs -- and that's what we have done here. We have used Generative AI models to automate the conversion of PL/SQL to JSUDF.
As of June 2024, models have a limited context window, which means longer PL/SQLs get hit with the error:
"This model's maximum context length is 8192 tokens. However, your messages resulted in <More-than-8192> tokens. Please reduce the length of the messages."
So do we wait for AI to become more powerful and allow more tokens (like Moore's Law, but for the AI's context-length-vs-precision)?
No: that's where the ANTLR parser generator tool comes in. ANTLR is well-known to be used for Compiler and Interpreter Development. That way we can break the big script into smaller units that can be translated independently.
We use ANTLR's Listener interface to grab individual Procedure/Function/Anonymous blocks, as they are independent blocks of code. In a case where the Procedure/Function/Anonymous blocks themselves exceed the context window, we translate at a statement level (where the LLM assumes the existence of use of variables/function calls that aren't defined here but somewhere before).
Subsequently, steps 3, 4, 5, and 6 are left to the LLM (GPT), i.e., translating each PL/SQL block into a JavaScript function to the best of its ability that also preserves the operational semantics of the block and is syntactically accurate.
The results are surprisingly quite positive: the translation is 80-85% accurate.
Another benefit of the solution is that we reduce hallucination by focusing on one task at a time, resulting in more accurate translations.
The executable expects the following command-line arguments:
To briefly explain the above script, an outer loop runs for 4 iterations, incrementing x by 1000, counter by 1. The inner loop runs for 4 iterations, incrementing x by 1, counter by 1.
The translated script has a function (name generated by LLM) that does exactly what the original Anonymous PL/SQL block specifies.
PL/SQL and JS are two different languages, and the way they are supported in Oracle and Couchbase doesn't allow for a clean direct mapping between the two. Below are some limitations we discovered and the workarounds we have implemented for the same:
Procedures/Functions listed in the package specification are global and can be used from other packages via . However, the same is not true for a JavaScript Library in Couchbase, as import-export constructs are not supported by Couchbase's JavaScript evaluation implementation.
It is auto-handled by the program -- with a warning that it should be verified by a human set of eyes!
PL/SQL supports package level and session level global variables, but global variables are not supported in JsUDF deliberately by design, as this causes concern for memory leaks.
The suggested workaround requires manual tweaking of the generated translation. For example:
Any function that modifies a global variable must accept it as an argument and return it to the caller.
Any function that only reads a global can accept it as an argument.
This section shows an end-to-end package-to-library conversion using the tool.
This errors out, and that's ok -- we can fix it manually.
Shoot! There's a goof-up: employee 1 isn't Joe, it's Emily.
Again, we have an error with the generated code. Looking at the reason and exception, we can confirm that the translated code encloses in a transaction, which wasn't the case in the original.
For transactions, buckets need to have durability set, but this requires more than one data server; hence, the error.
The fix here is to alter the code to remove the enclosing translation.
Now, all functions in the original PL/SQL work in Couchbase via JS UDFs. Yes, the example is pretty trivial, but you get the gist of how to go about using the tool to migrate your PL/SQL scripts with little manual supervision.
Remember the tool is supposed to take you 80%: the other 20% still needs to be done by you, but much better than writing all of that code yourself!
This project is open-source, so feel free to contribute. Some ideas that were thrown at us included:
And also include the limitations discussed earlier.
Finally, I'd like to thank Kamini Jagtiani for guiding me and Pierre Regazzoni for helping me test the conversion tool.