18.12.2022
This is too fast!! If it is expected that the student should have some background before taking this course, it needs to be mentioned in the course overview
This course covers the essentials and provides hands-on experience working with *.csv, *.txt, *.json, and other types of text files in Python. With these tools under your belt, you’ll be an independent analyst, ready to gain more insights from your data.
By completing this course, you’ll obtain a comprehensive set of theoretical and practical tools to understand and handle raw data. We will extensively cover the differences between a file and a file object, between reading and parsing, and between structured and unstructured data. Then, we’ll dive into the complex world of data connectivity by solving hands-on tasks in Python with *.txt, *.json, *.xlsx, and *.csv formats. The more you learn about data science and grow as a practitioner, the more you realize that you can be independent in your research and predictions only if you can manage your data yourself.
At the beginning of your analyses, data will rarely be waiting for you in a neat, clean format, such as a *.csv file that you can load and use for statistical analysis directly. On the contrary, real-world data is messy, comes from various sources, and contains errors and missing or unknown values. It’s vague and, on top of that, comes in different formats, shapes, and sizes. This course teaches you how to tame this chaos.
What do you need to know to manage text files in Python easily? Apart from learning about text files and data connectivity, there are several distinctions to clarify, including File vs File Object, Reading vs Parsing Data, and Structured vs Semi-structured and Unstructured Data.
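The first two distinctions can be illustrated with a minimal sketch. It uses only the standard library, and the sample data and variable names are hypothetical, chosen for illustration rather than taken from the course material.

```python
import csv
import io
import os
import tempfile

# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("name,age\nAna,31\nBen,27\n")
    path = tmp.name  # the *file* is the data stored on disk at this path

# open() returns a *file object*: a handle Python uses to access the file.
with open(path, encoding="utf-8", newline="") as file_object:
    raw_text = file_object.read()  # reading: fetch the contents as one string

# Parsing: interpret the raw string as rows and named fields.
rows = list(csv.DictReader(io.StringIO(raw_text)))

os.remove(path)  # clean up the temporary file
print(rows)
```

Reading only returns the characters stored in the file; parsing is the separate step that gives them structure. Note that every parsed value is still a string, so numeric columns such as `age` would need an explicit conversion.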
In this section, we dig deeper into the contents of a text file and learn about its various types, giving you some ideas on how to work with a messy dataset stored in a text file, regardless of its format. More precisely, we will clarify terms like ‘character’, ‘separator’, ‘delimiter’, ‘interpreter’, and ‘encoding’ and discuss the most notable types of text files: plain text files, flat files, and fixed-width files. To round out the principles you’ll need, we conclude this section with a presentation of the most widely used naming conventions in programming.
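As a rough illustration of three of these terms (encoding, delimiter, and fixed-width layout), here is a short standard-library sketch. The sample strings, the population figures, and the fixed-width column positions are made up for the example.

```python
import csv
import io

# Encoding: the same characters map to different bytes under different encodings.
text = "café"
utf8_bytes = text.encode("utf-8")      # 'é' takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # 'é' takes one byte in Latin-1
decoded = utf8_bytes.decode("utf-8")   # decoding with the right encoding restores the text

# Delimiter: a flat file whose fields are separated by semicolons (hypothetical data).
flat = "city;population\nSofia;1200000\nVarna;330000\n"
records = list(csv.reader(io.StringIO(flat), delimiter=";"))

# Fixed-width: fields are identified by column position, not by a delimiter.
# Assumed layout: characters 0-9 hold the name, characters 10-13 hold a code.
fixed = "Sofia     BG01\nVarna     BG03\n"
parsed = [(line[:10].rstrip(), line[10:14]) for line in fixed.splitlines()]

print(len(utf8_bytes), len(latin1_bytes), records, parsed)
```

Decoding bytes with the wrong encoding is a common source of garbled characters, which is why it helps to pass an explicit `encoding` argument when opening text files.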
Once you’re equipped with general knowledge about working with text files and the specific skills to import them in Python, it’s time to see how things work in practice. In this section, we go into the details about importing data with Python’s open() function, *.csv files with pandas, *.json files in Python, and Excel files. We’ll also discuss several tips and tricks to improve the quality of your dataset in Python and show you how to save it.
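A minimal sketch of the import, clean, and save cycle described above might look like the following. It deliberately uses the standard-library `json` and `csv` modules instead of pandas so that it runs without extra dependencies; the sample records and the rule of dropping rows with a missing score are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical JSON payload in which one record has a missing value.
raw_json = '[{"name": "Ana", "score": 9.5}, {"name": "Ben", "score": null}]'
records = json.loads(raw_json)  # JSON null becomes Python None

# A simple cleaning step: drop records whose score is missing.
clean = [r for r in records if r["score"] is not None]

# Save the cleaned data as CSV text. io.StringIO stands in for a real file
# opened with open(path, "w", newline="", encoding="utf-8").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(clean)
csv_text = buffer.getvalue()

print(csv_text)
```

With pandas installed, `pandas.read_json`, `pandas.read_csv`, and `DataFrame.to_csv` cover the same steps in fewer lines; the standard-library version simply makes each step explicit.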
Through practice and taking certain risks while coding, one can become proficient in dealing with real-world raw text files.
Student feedback
“A good data scientist spends more time preparing their data than using it for prediction afterward. In just a couple of hours, you’ll acquire reliable guidelines for handling data in various formats and learn to apply them in Python.”
Worked at the European Commission
Working with Text Files in Python
with Martin Ganchev