SQL, Meet AI
GenSQL uses generative AI to simplify complex database queries, making statistical analyses faster without requiring specialized knowledge.
It seems like generative artificial intelligence (AI) is showing up everywhere these days. Initially, these tools, ranging from text-to-image generators to chatbots, were largely standalone applications. But as they have grown in popularity, they have increasingly been built into everyday software like spreadsheets and word processors. That lets people use AI directly inside applications they already know: a spreadsheet can help generate a complex data visualization, and a word processor can help draft a narrative, cutting the time spent on routine tasks.
This integration also brings the power of generative AI to users who might otherwise not know how to take advantage of it. That march of progress continues, and its next target is databases. A team led by researchers at MIT and Carnegie Mellon University has developed what they call GenSQL, a programming system designed for querying generative models of database tables. The goal of GenSQL is to make complicated statistical analyses of large datasets simple by hiding the details.
Most databases are queried using structured query language (SQL). Carefully crafted queries can reveal many insights, but writing them requires an advanced understanding not only of SQL but also of the data itself, statistics, and more. GenSQL, on the other hand, uses generative AI models to detect anomalies, make predictions, correct errors, fill in missing values with realistic, in-distribution estimates, and more, without requiring any such specialized knowledge.
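To make the idea of asking questions of a generative model more concrete, here is a minimal Python sketch of two of those tasks under heavy simplifying assumptions. The table, its columns, and the model (a single bivariate Gaussian fit to two numeric columns) are hypothetical stand-ins, not the models GenSQL actually uses. The row the model considers least probable is flagged as an anomaly, and a missing value is filled with its conditional mean given the rest of the row.

```python
import math
from statistics import fmean

# Hypothetical toy table of (height_cm, weight_kg) rows; None marks a missing weight.
ROWS = [
    (160.0, 55.0), (165.0, 60.0), (170.0, 66.0), (175.0, 70.0),
    (180.0, 76.0), (185.0, 80.0), (172.0, 140.0), (178.0, None),
]


def fit_bivariate_gaussian(pairs):
    """Estimate means, variances, and covariance from complete (x, y) rows."""
    n = len(pairs)
    mx = fmean(x for x, _ in pairs)
    my = fmean(y for _, y in pairs)
    vxx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vyy = sum((y - my) ** 2 for _, y in pairs) / n
    vxy = sum((x - mx) * (y - my) for x, y in pairs) / n
    return mx, my, vxx, vyy, vxy


def log_density(model, x, y):
    """Log probability density of a complete row under the fitted Gaussian."""
    mx, my, vxx, vyy, vxy = model
    det = vxx * vyy - vxy ** 2
    dx, dy = x - mx, y - my
    # Squared Mahalanobis distance via the closed-form 2x2 inverse covariance.
    m2 = (vyy * dx * dx - 2 * vxy * dx * dy + vxx * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * m2


def impute_y(model, x):
    """Fill a missing y with its conditional mean E[y | x] under the model."""
    mx, my, vxx, _, vxy = model
    return my + (vxy / vxx) * (x - mx)


complete = [(x, y) for x, y in ROWS if y is not None]
model = fit_bivariate_gaussian(complete)

# Anomaly detection: the complete row the model finds least probable.
anomaly = min(complete, key=lambda row: log_density(model, *row))

# Missing-value imputation: conditional mean for rows whose weight is missing.
filled = [(x, y if y is not None else impute_y(model, x)) for x, y in ROWS]

print("least likely row:", anomaly)
print("imputed weight for 178 cm:", round(filled[-1][1], 1))
```

Because the suspicious row is included when the model is fit, it pulls the imputed weight upward somewhat; a fuller workflow might refit after removing flagged rows.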
GenSQL is built on top of SQL, and it integrates a traditional tabular dataset with a generative probabilistic AI model. To use the system, a user types a question as a plain natural-language sentence. This is converted into a GenSQL query that resembles traditional SQL. The query is then passed to the GenSQL planner, which prepares it for execution against an interface for a probabilistic model of tabular data. The model returns its answer as tabular data, the same kind of result a standard SQL query would return.
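The flow from query to planner to model interface to tabular result can be sketched in Python. Everything here is hypothetical: the structured query dictionaries, the operation names, the `TabularModel` interface, and the toy `IndependentGaussians` model stand in for GenSQL's real query language, planner, and probabilistic models, which this sketch does not reproduce. What it illustrates is that every query, whether it scores existing rows or generates synthetic ones, comes back as a list of rows.

```python
import math
import random
from dataclasses import dataclass
from typing import Protocol

# Hypothetical row type: column name -> numeric value.
Row = dict[str, float]


class TabularModel(Protocol):
    """Hypothetical interface a planner could target: score rows or sample new ones."""

    def logpdf(self, row: Row) -> float: ...

    def simulate(self, n: int, seed: int = 0) -> list[Row]: ...


@dataclass
class IndependentGaussians:
    """Toy stand-in model: every column is an independent normal distribution."""

    params: dict[str, tuple[float, float]]  # column -> (mean, standard deviation)

    def logpdf(self, row: Row) -> float:
        total = 0.0
        for col, value in row.items():
            mu, sigma = self.params[col]
            z = (value - mu) / sigma
            total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
        return total

    def simulate(self, n: int, seed: int = 0) -> list[Row]:
        rng = random.Random(seed)
        return [{col: rng.gauss(mu, sd) for col, (mu, sd) in self.params.items()}
                for _ in range(n)]


def plan(query: dict) -> list[tuple[str, object]]:
    """Translate a hypothetical structured query into a list of model operations."""
    if query["kind"] == "rank_by_probability":
        return [("score", query["rows"]), ("sort", None), ("limit", query["limit"])]
    if query["kind"] == "generate":
        return [("simulate", query["n"])]
    raise ValueError(f"unsupported query kind: {query['kind']}")


def execute(steps: list[tuple[str, object]], model: TabularModel) -> list[Row]:
    """Run planned operations against the model; the result is always a table of rows."""
    table: list[Row] = []
    for op, arg in steps:
        if op == "score":
            table = [{**row, "logp": model.logpdf(row)} for row in arg]
        elif op == "sort":
            table.sort(key=lambda r: r["logp"])  # least probable rows first
        elif op == "limit":
            table = table[:arg]
        elif op == "simulate":
            table = model.simulate(arg)
    return table


model = IndependentGaussians({"age": (40.0, 10.0), "bp": (120.0, 15.0)})
ranked = execute(plan({
    "kind": "rank_by_probability",
    "limit": 1,
    "rows": [{"age": 41.0, "bp": 118.0}, {"age": 39.0, "bp": 210.0}],
}), model)
synthetic = execute(plan({"kind": "generate", "n": 3}), model)
print(ranked)
print(synthetic)
```

Keeping the planner separate from the model means the same query steps could, in principle, run against any object that implements the two interface methods.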
One interesting feature of the system is that it is fully auditable. AI models are often black boxes, and it can be nearly impossible to determine how they arrived at a particular answer. The probabilistic models used by GenSQL, however, let users see the specific data that was used to answer a query. This is crucial in applications where precision is required.
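One way to picture this kind of auditability is a model that reports its evidence alongside its answer. The sketch below swaps in a simple technique that is not GenSQL's model, Nadaraya-Watson kernel regression: it predicts a value as a weighted average of training rows and returns the rows that carried the most weight. The training data, the column meanings, and the `predict_with_provenance` helper are all hypothetical.

```python
import math

# Hypothetical training rows: (age, blood_pressure).
TRAIN = [(30, 115.0), (35, 118.0), (50, 130.0), (55, 135.0), (70, 150.0)]


def predict_with_provenance(age, bandwidth=5.0, top_k=3):
    """Kernel-weighted prediction that also reports which rows drove the answer."""
    # Gaussian kernel: rows with ages closer to the query get larger weights.
    weights = [math.exp(-0.5 * ((age - a) / bandwidth) ** 2) for a, _ in TRAIN]
    total = sum(weights)
    prediction = sum(w * bp for w, (_, bp) in zip(weights, TRAIN)) / total
    # Provenance: each training row with its normalized share of the prediction.
    shares = sorted(
        ((w / total, row) for w, row in zip(weights, TRAIN)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return prediction, shares[:top_k]


pred, evidence = predict_with_provenance(52)
print(f"predicted blood pressure: {pred:.1f}")
for share, row in evidence:
    print(f"  row {row} contributed {share:.0%} of the weight")
```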
The team evaluated GenSQL by comparing it with existing AI-based data analysis tools built on traditional neural networks. GenSQL was significantly faster, running between 1.7 and 6.8 times faster than these approaches, and it also produced more accurate results. The studies further demonstrated GenSQL's usefulness for tasks like identifying erroneous data points and generating realistic synthetic data.
Looking ahead, the researchers believe additional optimizations and automation can make GenSQL even more powerful. Ultimately, they hope to build a simple, ChatGPT-like interface that would let anyone ask natural-language questions about any database they are interested in.