Hello everyone, this is the Data Wrangler team at Microsoft!
Over the past year, we've engaged with numerous data scientists who expressed frustration over the time-consuming and often tedious tasks of cleaning, preparing, and analyzing their datasets. This process typically involves a trial-and-error approach, repeatedly searching for the right Pandas API for a transformation and verifying the results by examining the output data frames.
To address these challenges, we are thrilled to introduce a new tool within VS Code that streamlines these processes: Data Wrangler.
Data Wrangler is a free, code-centric data viewing and cleaning tool integrated into VS Code and VS Code Jupyter Notebooks. It offers a data grid user interface that allows you to view and analyze your data, provides insightful column statistics and visualizations, and automatically generates the corresponding Python and Pandas code as you clean and transform data using the UI/grid.
Let's take a common scenario where you need to identify and fill missing values in a dataset. Typically, you would write code to determine which columns contain missing values, or even search online or use ChatGPT to find the appropriate Pandas API. After running the code, you would need to write additional code to verify that the missing values have been addressed. Data Wrangler simplifies this by displaying missing value statistics above each column and providing built-in data cleaning operations to remove or fill these values. It then automatically generates the necessary Pandas and Python code, which can be exported back into your Notebook to be reused.
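For anyone who hasn't gone through this by hand, here is a rough sketch of the manual workflow described above: count the missing values, fill them, then check again. The sample data, column names, and fill strategy (median for numbers, a placeholder for text) are all made up for illustration, and this is not the code Data Wrangler itself generates, just plain Pandas.

```python
import pandas as pd

# Hypothetical sample data; the columns and values are invented for this example.
df = pd.DataFrame({
    "age": [34.0, None, 29.0, None],
    "city": ["Seattle", "Austin", None, "Boston"],
})

# Step 1: find out which columns contain missing values, and how many.
missing_before = df.isna().sum()

# Step 2: fill the gaps. The strategy here is an assumption: the median for
# the numeric column and a placeholder string for the text column.
cleaned = df.copy()
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
cleaned["city"] = cleaned["city"].fillna("Unknown")

# Step 3: write more code to verify that nothing is missing anymore.
missing_after = cleaned.isna().sum()

print(missing_before)
print(missing_after)
```

Each of the three steps is a separate round of writing, running, and inspecting output, which is the loop the column statistics and built-in operations are meant to shorten.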
Download the free VS Code extension here: https://marketplace.visualstudio.com/items?itemName=ms-tools...
Learn more about Data Wrangler in our detailed walkthrough video here: https://www.youtube.com/watch?v=5tWJVLF6PuA&ab_channel=Visua...
We would love for everyone here to try out Data Wrangler on your own data! We'll be here to answer questions and hear your feedback.
Thanks again!