Data Transformation with the Amazing New ChatGPT-4 ‘Code Interpreter’, with Google Colab
The recently released ChatGPT-4 'Code Interpreter' is now available to all ChatGPT Plus subscribers, bringing significant advancements in code execution. I had the opportunity to test it alongside Google Colab, a cloud-based version of Jupyter Notebook, and I must say, it's a game changer.
Unlike previous versions, which could only emulate Python code, the ChatGPT-4 'Code Interpreter' can actually run Python code. It also lets you upload files directly into the interpreter, which expands its capabilities considerably. To learn more about this powerful tool, I recommend reading Ethan Mollick's informative article "What AI can do with a toolbox... Getting started with Code Interpreter," published two days ago, on Jul 7, 2023.
“This is just scratching the surface of Code Interpreter, which I think is the strongest case yet for a future where AI is a valuable companion for sophisticated knowledge work. Things that took me weeks to master in my PhD were completed in seconds by the AI, and there were generally fewer errors than I would expect from a human analyst. Human supervision is still vital, but I would not do a data project without Code Interpreter at this point.”
See also Mollick’s summary on the ChatGPT subreddit, “Code Interpreter is the MOST powerful version of ChatGPT. Here's 10 incredible use cases,” and Shubham Saboo on Twitter: "Multimodal AI is here - GPT-4 can now turn your images into a text file in a snap with the new code interpreter model. Witness the OCR magic in action" (screengrab at the end of this post).
Refactoring a Google Sheets data transformation template to Python
I made serious headway on a project I’ve been wanting to do for years: refactoring a Google Sheets data transformation template to Python.
The final Python code uses the pandas library to transform and map data in a dataset. It performs operations such as changing data types, filling missing values, creating new columns, and mapping values based on reference tables.
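To make that concrete, here is a minimal pandas sketch of those kinds of operations. This isn't my actual template; the dataframe, column names, and reference table below are hypothetical placeholders that just illustrate the idea:

import pandas as pd

# Hypothetical sample data; these columns are placeholders, not the real template's schema
df = pd.DataFrame({
    'amount': ['10', '20', None],       # numeric values stored as text
    'category_code': ['A', 'B', 'A'],   # codes to be mapped via a reference table
})
df_ref = pd.DataFrame({
    'category_code': ['A', 'B'],
    'category_name': ['Advertising', 'Billing'],
})

# Change data types
df['amount'] = pd.to_numeric(df['amount'])

# Fill missing values
df['amount'] = df['amount'].fillna(0)

# Create a new column
df['amount_with_tax'] = df['amount'] * 1.1

# Map values based on the reference table
df = df.merge(df_ref, on='category_code', how='left')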
Here are the best practices I found:
Upload the two CSVs into Colab individually, not zipped.
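If you're pulling the files into a Colab notebook (rather than attaching them through ChatGPT's own upload button), one way to do it is Colab's files helper; this is just a sketch of that step, not necessarily the exact workflow here:

from google.colab import files

# Opens a browser file picker; select the two CSVs individually rather than a zip archive
uploaded = files.upload()
print(list(uploaded.keys()))  # e.g. ['cpy-pst.csv', 'REF.csv']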
Command:
load these two csvs as dataframes, don’t show me the first few rows of each dataframe:
import pandas as pd

# Load the data
df_cpy_pst = pd.read_csv('cpy-pst.csv')

# Load the reference tables
df_ref = pd.read_csv('REF.csv')
Copy-paste the text from the .py file into ChatGPT-4, removing any extraneous intro.
Command:
Run the .py code
This new tool has tremendous value.
Further reading:
Mollick’s https://www.reddit.com/r/ChatGPT/comments/14ublwc/code_interpreter_is_the_most_powerful_version_of/
Shubham Saboo’s tweet: https://twitter.com/Saboo_Shubham_/status/1654323164187377665