Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured: Relational Database Systems (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with NoSQL databases.
Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, every modern database offers a SQL-inspired query language and a set of APIs, and applications manage their data models and data through the corresponding DDL and DML statements.
The question now stands: will SQL continue to dominate, or will SQL++ take over? What does natural language processing mean for database query language?
To answer these strategic questions, let's revisit how APIs have evolved in databases and in other industries.
For a technological revolution to occur, the underlying technology and the user interface (or API) must change. Every technology is limited by the difficulty of using or accessing it; making it accessible is an important dimension. This shift brings in new users who were previously non-consumers of the technology.
When the world transitioned from telegraphs to telephones, voice communication replaced Morse code, making long-distance communication accessible to a broader audience. Similarly, the SQL implementations in Oracle and DB2 freed programmers from the complexities of manipulating record pointers in hierarchical databases. Oracle was first to market with SQL and has maintained its lead for over 45 years. In each of these revolutions, a company that introduced a new, easy-to-use API backed by innovative technology supplanted the incumbent and came to dominate the market. Examples include Western Union to AT&T, IBM to Oracle, and Nokia to Apple.
It's interesting that IBM scientists invented relational databases and SQL, yet Oracle and others benefited most from them. This highlights the difference between "invention" and its cousin, "innovation," which brings ideas to market.
Note 1: The telegraph's advantage over the postal service came not from a better "interface" but from speeding up communication by orders of magnitude. Before Twitter, writing effective telegrams in a few words was a learned skill.
Note 2: See the inspiring story of Morse code here.
The battle between NoSQL and RDBMS has centered on performance, availability, and scalability. NoSQL databases feature declarative query languages that extend SQL to varying degrees, with MongoDB's MQL mimicking many aspects of SQL. These innovations aim to address the limitations of traditional relational databases while leveraging NoSQL's strengths. Despite these advancements, SQL has proven to be unreasonably effective, and every new database, relational or not, attempts to incorporate or emulate its principles. This has given RDBMS a significant advantage. Even after 15 years of NoSQL development, Oracle and other RDBMS still dominate, and SQL usage continues to grow. While cloud databases and NoSQL solutions have drastically reduced transaction costs,
ChatGPT and other Large Language Models (LLMs) represent a new paradigm in data management and data wrangling. They are trained on vast amounts of publicly available data, regardless of format, and use natural language Q&A as their primary interface. Users interact through both a browser-based shell (chat) and a natural language interface (API), which keeps them accessible and easy to use. For every question (query), the model provides an answer, accurate or not, captivating users with its broad capabilities. It can also process and generate multi-modal data, showcasing its versatility and its potential to redefine how we interact with and use data.
It's time to rethink the data model and the user API for databases to meet the evolving needs of humans, developers, and AI copilots. Each new generation of databases has introduced distinct data models and interfaces through new languages or APIs. Today, databases can handle almost any data, but only after it has been converted to JSON, which limits their reach. Imagine bringing the ease of use of ChatGPT to enterprise and everyday use cases. This shift requires a new data model and a new approach to querying, which we call the Natural Data Model and Natural Queries.
By leveraging natural language interactions, we can simplify complex data tasks, making data management more intuitive and accessible for users at all levels. This innovative approach promises to streamline workflows, enhance productivity, and democratize data access across the enterprise, ultimately transforming how organizations harness and utilize their data assets.
Here's the table with databases, data models, and APIs:
Note: NoSQL technologies have replaced relational databases in specific use cases and industries but have not wholesale replaced SQL databases (RDBMS). The theory of disruptive innovation holds that a new technology must offer a much easier interface to attract the non-consumers of the previous generation of technology.
The Natural Data Model is an innovative framework designed to accommodate and process any type of data users possess, regardless of format. Whether the data is structured, semi-structured, or unstructured (e.g., JSON, CSV, TSV, Avro, or plain text), the Natural Data Model seamlessly accesses it, transforms it if necessary, and queries it. This model prioritizes flexibility and accessibility, allowing users to work with diverse datasets without extensive data preparation or conversion. By embracing a universal approach to data formats, the Natural Data Model lets users focus on deriving insights and making data-driven decisions rather than getting bogged down by the complexities of data management. This approach fosters a more intuitive and efficient interaction with data, empowering users to unlock the full potential of their information assets.
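To make the idea concrete, here is a minimal Python sketch of the normalization step, assuming the simplest possible common representation: a list of dictionaries. The function names `load_records` and `query` are hypothetical and not part of any product, and Avro is left out because reading it requires a third-party library. Each format is parsed into the same record shape, and a single query then runs across all sources.

```python
import csv
import io
import json
from typing import Callable, Iterable


def load_records(text: str, fmt: str) -> list[dict]:
    """Parse `text` in the given format into a list of dict records (hypothetical helper)."""
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single JSON object or an array of objects.
        return data if isinstance(data, list) else [data]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        # DictReader uses the header row as keys; values stay strings.
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "text":
        # Unstructured text: treat each non-empty line as one record.
        return [{"line": line} for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


def query(records: Iterable[dict], predicate: Callable[[dict], bool]) -> list[dict]:
    """Return the records that satisfy `predicate`, regardless of their source format."""
    return [r for r in records if predicate(r)]


# Three sources in three formats, normalized into one list of records.
sources = [
    ('[{"name": "Ada", "city": "London"}]', "json"),
    ("name,city\nGrace,New York\nAlan,London\n", "csv"),
    ("name\tcity\nEdsger\tAustin\n", "tsv"),
]
records = [rec for text, fmt in sources for rec in load_records(text, fmt)]
londoners = query(records, lambda r: r.get("city") == "London")
print([r["name"] for r in londoners])
```

A real implementation would also need type inference (the CSV and TSV values above remain strings) and schema reconciliation across sources; the sketch skips both.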
Natural Queries represent a revolutionary approach to interacting with data, allowing users to formulate queries in a natural language like English. By enabling completely natural interactions similar to ChatGPT's Q&A format, users can ask questions and receive answers without mastering complex query languages. Natural Queries are intended for both data analysis and data manipulation. The system also supports business-specific lingo, adapting to the unique terminologies and requirements of different industries and companies. Natural Queries can be used in interactive applications, offering predictable, structured interactions that closely align with traditional SQL but with the simplicity and ease of natural language. This approach makes data querying more intuitive and accessible.
Yes, there has been huge interest in adding layers on top of databases that convert natural language questions into SQL. That's a start, not the end state.
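For comparison, the kind of translation layer mentioned above can be sketched in a few lines. This toy version is rule-based rather than LLM-based and handles only questions of the form "total <metric> [by <dimension>]"; the `orders` table, the `METRICS` and `DIMENSIONS` dictionaries, and `to_sql` are all hypothetical. The `DIMENSIONS` map shows how business-specific lingo (here, "territory" for `region`) can be resolved to schema names before SQL is generated.

```python
import re
import sqlite3

# Hypothetical schema: a single orders table.
TABLE = "orders"

# Business lingo -> SQL aggregate expressions (hypothetical).
METRICS = {
    "revenue": "SUM(amount)",
    "customers": "COUNT(DISTINCT customer)",
}

# Business lingo -> real column names; only these may appear in GROUP BY.
DIMENSIONS = {
    "region": "region",
    "territory": "region",  # company-specific synonym
    "customer": "customer",
    "client": "customer",
}


def to_sql(question: str) -> str:
    """Translate 'total <metric> [by <dimension>]' into a SQL statement."""
    match = re.fullmatch(r"total (\w+)(?: by (\w+))?\??", question.strip().lower())
    if not match:
        raise ValueError(f"cannot translate: {question!r}")
    metric, dimension = match.groups()
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    expr = METRICS[metric]
    if dimension is None:
        return f"SELECT {expr} FROM {TABLE}"
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    # Identifiers come only from the whitelists above, never from raw user text.
    column = DIMENSIONS[dimension]
    return f"SELECT {column}, {expr} FROM {TABLE} GROUP BY {column} ORDER BY {column}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("acme", "east", 100.0), ("acme", "west", 50.0), ("globex", "east", 25.0)],
)

sql = to_sql("Total revenue by territory?")
print(sql)  # SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region
print(conn.execute(sql).fetchall())  # [('east', 125.0), ('west', 50.0)]
```

Even with an LLM in place of the regular expression, the output of such a layer is still SQL against a fixed schema, which is why it is a starting point rather than the end state described here.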
Natural answers (or results) should follow the principles championed by Edward Tufte: they should be presented in a form that directly addresses the question and facilitates clear understanding and analysis. These answers might take the form of structured data, text, charts, images, or any combination of these. It's similar to how you approach a school exam, where the question doesn't dictate the format of your response. You provide whatever is necessary to fully answer and explain the question, ensuring clarity and comprehension.
The technical initiatives around these three ideas will make databases easier to use, lower the barrier to adopting them, and broaden the reach of sophisticated databases.
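One way to approximate this is to let the shape of the result, rather than the wording of the question, pick the presentation. The Python sketch below uses a deliberately simple, hypothetical heuristic (`choose_presentation` and `render` are illustrative names, not an existing API): a single value becomes a sentence, label/number pairs become a bar chart (drawn here as text), and anything else becomes a table.

```python
from numbers import Number


def choose_presentation(columns: list[str], rows: list[tuple]) -> str:
    """Pick a presentation form from the shape of a query result (hypothetical heuristic)."""
    if len(rows) == 1 and len(columns) == 1:
        return "text"  # a single value reads best as a sentence
    if len(columns) == 2 and rows and all(
        isinstance(r[0], str) and isinstance(r[1], Number) and not isinstance(r[1], bool)
        for r in rows
    ):
        return "bar_chart"  # one label and one number per row
    return "table"  # fall back to a plain table


def render(columns: list[str], rows: list[tuple]) -> str:
    """Render the result in the form chosen by choose_presentation."""
    form = choose_presentation(columns, rows)
    if form == "text":
        return f"The {columns[0]} is {rows[0][0]}."
    if form == "bar_chart":
        # Text bar chart scaled so the largest magnitude gets 20 characters.
        peak = max(abs(r[1]) for r in rows) or 1
        return "\n".join(
            f"{label:<8}{'#' * round(20 * abs(value) / peak)} {value}"
            for label, value in rows
        )
    header = " | ".join(columns)
    return "\n".join([header] + [" | ".join(str(v) for v in r) for r in rows])


print(render(["total revenue"], [(175.0,)]))
print(render(["region", "revenue"], [("east", 125.0), ("west", 50.0)]))
print(render(["customer", "region", "amount"], [("acme", "east", 100.0)]))
```

A real system would weigh the question's intent as well as the result's shape, but even this heuristic shows the answer's form being chosen to fit what it needs to explain.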