As I mentioned in my first post in this series, the central purpose of Data Science is to find patterns in data and use those patterns to make useful predictions about the future. It’s this predictive part of Data Science that gives the discipline its mystique. Data Scientists actually spend only a relatively small fraction of their time on it, compared with the more workaday activities of loading, cleaning and understanding the data. Even so, it’s the step of building predictive models that unlocks the value hidden within the data.
A side-effect of all the time I spend breathing the rarefied alpine air of the CDO community is that my SQL skills have become rather rusty. So I’ve been intrigued by the idea of using the code-generation capabilities of tools like ChatGPT and Bard to write SQL for me. But how good is the current crop of LLMs at writing SQL that not only runs, but also produces the insight you’re actually looking for? I decided to find out.