Under the Hood – Text to SQL with LangChain, LLMs and Hana

In my recent blog, Data Wizardry – Unleashing Live Insights with OpenAI, LangChain & SAP HANA, I introduced an exciting vision of the future: a world where you can effortlessly interact with databases using natural language and receive real-time results.


The process involves LangChain, OpenAI, and Hana working together to unlock the potential of natural-language-based data analysis.


Magic revealed

Now, let’s explore further to demystify some of the magic. Let’s focus on what we asked the LLM to do and the risks it poses to Enterprises.

LangChain’s initial task was to enhance a simple question like “Which product was our bestseller in Australia in January?” by providing additional prompts. These prompts help the LLM understand the question better and format the response appropriately.
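The enrichment step described above can be sketched in plain Python. This is a minimal illustration, not LangChain's actual template: the `SALES` table, its columns, the sample row, and the prompt wording are all hypothetical stand-ins for whatever context your own setup would supply.

```python
# Hypothetical table definition, invented for illustration only.
TABLE_INFO = """CREATE TABLE SALES (
    PRODUCT NVARCHAR(50),
    COUNTRY NVARCHAR(50),
    SALE_DATE DATE,
    QUANTITY_KG DECIMAL(15, 3)
)"""

# Hypothetical sample row, invented for illustration only.
SAMPLE_ROWS = """PRODUCT | COUNTRY   | SALE_DATE  | QUANTITY_KG
Apples  | Australia | 2023-01-05 | 12.500"""

# Assumed prompt wording; a real framework template will differ.
PROMPT_TEMPLATE = """You are an SAP HANA SQL expert. Given a question, write one
syntactically correct SQL query that answers it.
Only use the tables and columns listed below.

Table definitions:
{table_info}

Sample rows:
{sample_rows}

Question: {question}
SQLQuery:"""


def enrich_question(question: str) -> str:
    """Wrap a plain-language question with schema context for the LLM."""
    return PROMPT_TEMPLATE.format(
        table_info=TABLE_INFO,
        sample_rows=SAMPLE_ROWS,
        question=question.strip(),
    )


prompt = enrich_question("Which product was our bestseller in Australia in January?")
print(prompt)
```

The point to notice is how much of the final prompt is context rather than question: the user types one sentence, and everything else is metadata added on their behalf.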

To see a simplified version of the enriched question, which doesn’t require a database connection, check it out here:

simplified ChatGPT prompt.txt


Now, I input this enriched question into ChatGPT (no API key required).

Feel free to try it yourself! What answer does it give you?


In a new ChatGPT chat, here is the enriched question:






Now, we eagerly await ChatGPT’s response.


ChatGPT Response


Hey presto! It provides an answer: oranges were the best-selling fruit in Australia in January.


However, there’s a catch—ChatGPT doesn’t have access to my Hana database. It simply made an educated guess. Given that my dataset only includes five fruits, it had a chance of getting it right. But was OpenAI somehow observing my previous API communications to help guess the answer?


The fact that ChatGPT provided a completely fictitious quantity of 250.14 sold might offer some reassurance (or maybe not).


In reality, the latest data in Hana tells a different story: we actually sold 3346.279 kg of oranges.


Hana SQL Results


Moreover, consider the scenario where a user enters a harmful question like “How do I delete all the billing information?”

Here’s what you might get back:


ChatGPT Risky Response


Well, ChatGPT attempted to answer it… and that’s concerning if this response were connected to a process that executed it and we’d inadvertently given the user authorisation for deletions.
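One way to reduce that risk is to check generated SQL before anything executes it. The following is a minimal sketch, assuming a deliberately conservative rule: accept only a single statement that starts with SELECT or WITH and contains no data-changing keyword. The keyword list and the `is_read_only` helper are illustrative, not part of LangChain or Hana, and they do not replace proper database privileges (a read-only database user remains the stronger safeguard).

```python
import re

# Keywords that change data or structure. An illustrative list, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(DELETE|UPDATE|INSERT|UPSERT|MERGE|DROP|TRUNCATE|ALTER|CREATE|GRANT|REVOKE|CALL)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(SELECT|WITH)\b", statement, re.IGNORECASE):
        return False  # anything other than a query is rejected
    # Reject queries that mention a data-changing keyword anywhere.
    return FORBIDDEN.search(statement) is None


# Hypothetical table names, for illustration only.
print(is_read_only("SELECT PRODUCT FROM SALES"))  # True
print(is_read_only("DELETE FROM BILLING"))        # False
```

Because the check is a simple text match, it will also reject harmless queries whose string literals happen to contain a semicolon or one of the listed keywords. For a guard like this, erring on the side of refusal seems the better trade-off.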


The Challenges

An LLM lacks knowledge of your specific data structures and of what your data looks like. Without that knowledge, it relies on guesswork.

So, while it’s possible to enrich the question with enough context to make it work, there are trade-offs.

Sharing table structures, sample data rows, and business annotations was necessary in my case. But would you be comfortable doing the same with your enterprise information?

Scalability is also a factor. If you only have a few tables, it might be practical and efficient to use an LLM.

But when dealing with solutions like S/4, consisting of thousands of views with numerous columns, sharing metadata for every question becomes impractical and introduces latency and cost.
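A back-of-the-envelope calculation makes the scaling problem concrete. The sketch below is only an estimate under stated assumptions: the view counts, columns per view, and characters per column are hypothetical figures, and the four-characters-per-token ratio is a rough rule of thumb rather than an exact tokenizer count.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English-like text; an assumption


def estimate_metadata_tokens(num_views: int, columns_per_view: int,
                             chars_per_column: int) -> int:
    """Approximate tokens needed to describe every column of every view."""
    total_chars = num_views * columns_per_view * chars_per_column
    return total_chars // CHARS_PER_TOKEN


# Hypothetical sizes, chosen only to show the order of magnitude.
small = estimate_metadata_tokens(num_views=5, columns_per_view=10, chars_per_column=40)
large = estimate_metadata_tokens(num_views=5000, columns_per_view=50, chars_per_column=40)
print(small)  # 500
print(large)  # 2500000
```

Under these assumed figures, a handful of tables costs a few hundred tokens per question, while a full enterprise catalogue would cost millions. That gap is why sending all metadata with every question does not scale.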

Would you expose your business questions and query responses to an LLM?

What sensitive information might those API communications contain?

While Azure’s OpenAI offering addresses some concerns, others require careful consideration when developing a custom solution.

Open LLMs are rapidly advancing, as the underlying transformer architecture remains relatively similar. Organizations with the datasets, GPUs, and skilled engineering teams may find this a viable option today.

Perhaps in the future, LLMs pre-trained on SAP-centric data structures and easily fine-tuned for unique business questions will become more mainstream.


In the following blog, I demonstrate that fine-tuning an LLM on enterprise data has some potential.

Into the SQL weeds – Fine-Tuning an LLM with Enterprise Data


I welcome your thoughts on additional risks and concerns for Enterprises in the comments below.

