Resolved: getting error on squeeze function when using in read_csv
How can I fix it?
Hi Dev Vansh!
Thanks for reaching out!
It seems that in your case the Location.csv file is interpreted as a DataFrame instead of a Series object which leads to an error. This may happen due to an incorrect file or an accidental modification of the existing one within the code.
The squeeze parameter in pandas is used to convert the data to a pandas Series instead of a DataFrame if the data consists of a single column. The resource file is designed with just one column, specifically for this reason.
I recommend downloading the CSV file once again and ensuring that you execute the code cells in the correct sequence.
Don't hesitate to get back to us for further assistance if needed.
Hope this helps.
Best,
Ivan
I was having the same problem. So here is what I did. It displays the data as in NumPy. Any comments?
Hello everyone!
After investigating the issue in further detail, I discovered that this is a problem related to pandas. It is due to the deprecation of the squeeze parameter in newer versions of the library. In earlier versions, squeeze=True was used to convert a single-column DataFrame into a Series. However, with its removal in newer versions, this approach doesn't work the same way as it used to.
So, one option to prevent this issue is to downgrade pandas to a previous version, where the squeeze parameter works, for instance, 1.3.5:
pip install pandas==1.3.5
Alternatively, you can modify the code a bit, read the file as a DataFrame (without applying the squeeze parameter), and select the single column after reading the CSV file. Here's how you can do that:
import pandas as pd
# Read the CSV file into a DataFrame
data = pd.read_csv('Location.csv')
# Select the single column to convert it into a Series
location_data = data.iloc[:, 0]
# Display the first five rows of the Series
location_data.head()
@Charlemagne, your code should function correctly initially. However, you may encounter challenges later when you are required to make modifications to a Series object. Some of the methods and attributes might not perform as expected if you are using a DataFrame. That's why I suggest working with a Series object from the start.
Hope this helps.
Best,
Ivan
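To illustrate the point above about methods behaving differently on a Series and a DataFrame, here is a minimal sketch. The in-memory CSV text, the column name "Location", and its values are made up for illustration; they only stand in for the real Location.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for Location.csv: a header plus a single column.
csv_text = "Location\nLocation 3\nLocation 6\nLocation 3\n"

data = pd.read_csv(io.StringIO(csv_text))  # read_csv returns a DataFrame
location_data = data.iloc[:, 0]            # first column, as a Series

print(type(data).__name__)
print(type(location_data).__name__)

# unique() is defined on a Series but not on a DataFrame.
unique_locations = location_data.unique()
frame_has_unique = hasattr(data, "unique")
print(list(unique_locations), frame_has_unique)
```

Calling data.unique() on the DataFrame would raise an AttributeError, which is the kind of surprise you avoid by converting to a Series first.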
Thanks Ivan, that was helpful. I believe there is a typo in your code. Instead of location_data = data_df.iloc[:, 0] it should be location_data = data.iloc[:, 0].
Thank you Boris!
Good luck and feel free to get back to us for further assistance if needed.
Best,
Ivan
It's probably best to use the squeeze method on the DataFrame, like so (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.squeeze.html):
locations = pd.read_csv("./Location.csv")
locations_s = locations.squeeze('columns')
type(locations_s)
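One detail worth noting about the snippet above is why the 'columns' argument is passed. The sketch below uses small made-up CSV strings (not the course file) to compare squeeze('columns') with a bare squeeze() on a file that happens to have a single row, and on a file with two columns.

```python
import io
import pandas as pd

# Hypothetical one-column, one-row CSV.
one_row = pd.read_csv(io.StringIO("Location\nLocation 3\n"))

as_series = one_row.squeeze("columns")  # still a Series, of length 1
as_scalar = one_row.squeeze()           # collapses both axes to a scalar

# Hypothetical two-column CSV: there is nothing to squeeze along the columns.
two_cols = pd.read_csv(io.StringIO("Location,Sales\nLocation 3,10\n"))
unchanged = two_cols.squeeze("columns")  # remains a DataFrame

print(type(as_series).__name__, type(as_scalar).__name__, type(unchanged).__name__)
```

Passing 'columns' keeps the result a Series no matter how many rows the file has, whereas squeeze() without an argument can hand back a plain value.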
Hi everyone!
I hope you don't mind if I join the conversation.
@Manoj: Indeed, the current version of pandas requires us to call the .squeeze() method on the DataFrame after import in order to convert single-column data into a Series.
Hope this helps.
Best,
Martin,
The 365 Team
I find this solution also works:
read_csv no longer has a squeeze parameter. It was removed in Pandas 2.0.
Removed arguments ..., squeeze, ... from read_csv()
Instead, use DataFrame.squeeze('columns') after reading the CSV.
I hope this helps!
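If you read single-column files often, the advice above can be wrapped in a small helper. This is only a sketch: read_single_column is a hypothetical name, not a pandas function, and the sample CSV text is made up. It also fails loudly when the file has more than one column, which covers the "wrong file" case mentioned earlier in the thread.

```python
import io
import pandas as pd

def read_single_column(source):
    """Hypothetical helper: read a one-column CSV and return it as a Series."""
    frame = pd.read_csv(source)
    if frame.shape[1] != 1:
        # squeeze("columns") would silently return a DataFrame here,
        # so raise an explicit error instead.
        raise ValueError(f"expected 1 column, found {frame.shape[1]}")
    return frame.squeeze("columns")

# Usage with made-up data standing in for Location.csv.
locations = read_single_column(io.StringIO("Location\nA\nB\n"))
print(type(locations).__name__)
```

Because it relies on DataFrame.squeeze rather than the removed read_csv argument, the same call should not depend on which side of pandas 2.0 you are on.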
Hi Romeo!
Thanks for reaching out.
We've already been notified about the lack of explanation on this difference in the pandas course. We will update the course content with a note on that soon.
In the note, we will basically ask you to refer to the following video for an explanation on how to use the .squeeze() method:
Hope this helps.
Best,
Martin