Pandas
An open-source data analysis and data manipulation library for Python.
It provides data structures like DataFrames and Series to efficiently handle and analyze data.
Python Documentation
Refers to the official guides, tutorials, and references provided by the Python Software Foundation to help users understand and use Python's features and libraries.
Python Library
A collection of modules and packages that provide reusable functions and tools to perform various tasks, such as data manipulation, web development, and machine learning.
Python Object
An instance of a class that encapsulates data and functions.
Objects are the basic building blocks of Python programming, allowing for structured and modular code.
Pandas DataFrame
A two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns), used for data manipulation and analysis.
Series
A one-dimensional, labeled array capable of holding data of any type (integer, string, float, etc.). It is similar to a column in a DataFrame but can also function as a single array.
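As a minimal sketch of the two structures described above, the snippet below builds one Series and one DataFrame. The product names and prices are hypothetical sample data, invented only for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
prices = pd.Series([10, 20, 30], index=["a", "b", "c"], name="price")

df = pd.DataFrame({
    "product": ["Product A", "Product B", "Product C"],
    "price": [10, 20, 30],
})

# A Series has one dimension; a DataFrame has two (rows, columns).
print(prices.shape)  # (3,)
print(df.shape)      # (3, 2)
```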
Variables
Used to store data values. They are symbolic names that reference or point to an object or value, allowing for data manipulation and processing within the program.
Panel Data
A multi-dimensional dataset involving measurements over time. In Pandas, it can be represented using multi-index DataFrames to handle complex data structures.
Metadata
Refers to data that provides information about other data. In the context of Pandas, metadata can include information such as data types, column names, and descriptions of the dataset.
Single-column Data
Refers to a dataset that consists of only one column of values. In Pandas, this is typically represented as a Series, which is a one-dimensional array-like structure.
Multi-column Data
Refers to a dataset that consists of multiple columns of values. In Pandas, this is represented as a DataFrame, which is a two-dimensional array-like structure with labeled axes.
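One way to see the single-column versus multi-column distinction is to select from the same DataFrame with single and double brackets. This is a small sketch using made-up column names.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({"price": [10, 20, 30], "stock": [5, 0, 8]})

single = df["price"]    # single brackets: one column as a Series
table = df[["price"]]   # double brackets: a one-column DataFrame

print(type(single).__name__)  # Series
print(type(table).__name__)   # DataFrame
print(table.shape)            # (3, 1)
```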
What will this function return?
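version Attribute
Refers to checking the installed version of the Pandas library using pd.__version__. This is a module attribute holding a version string, not a method, so it is read without parentheses. It helps ensure compatibility with other code and libraries.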
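A short sketch of the version check follows. The printed value depends on the installed release, so no expected output is shown; the `major` variable is just an illustrative way to use the string.

```python
import pandas as pd

# __version__ is a plain string attribute of the module.
version = pd.__version__
print(version)

# Illustrative use: extract the major version number for a compatibility check.
major = int(version.split(".")[0])
print(major)
```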
What will this function return?
dtype: object
Numpy
A fundamental Python library for numerical computations, providing support for arrays, matrices, and a wide range of mathematical functions to operate on these data structures.
What will this function return?
0 10
1 20
2 30
3 40
4 50
What will this function return?
dtype: int32
Attributes
Refer to the properties or metadata associated with data structures like DataFrames and Series. Common attributes include shape, dtype, and index.
.dtype
Refers to the data type of the elements in a Series or DataFrame column. It provides information about the kind of data stored, such as integers, floats, or strings. It is read from the Series itself (for example s.dtype), not from the pd module; a DataFrame exposes one dtype per column through .dtypes.
.size
Refers to the total number of elements in a DataFrame or Series. For a Series it is the number of values; for a DataFrame it is the product of the shape dimensions (rows multiplied by columns).
.name
Refers to the name of a Series. It is an attribute that can be set to provide a meaningful identifier for the Series, which can be useful for labeling and documentation purposes.
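The attributes above can be read directly from a Series or DataFrame, as in this sketch. The data and the name "Units Sold" are invented for illustration, and the exact integer width reported by dtype (for example int32 or int64) can differ between platforms and versions.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
s = pd.Series([10, 20, 30, 40, 50], name="Units Sold")

print(s.dtype)  # an integer dtype; the width depends on the platform
print(s.size)   # 5
print(s.name)   # Units Sold

# For a DataFrame, size is rows multiplied by columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
print(df.shape)  # (2, 3)
print(df.size)   # 6
```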
What will this function return?
int32
What will this function return?
5
What will this function return?
object
What will this function return?
4
What will this function return?
Product Categories
Indexing
Refers to accessing and modifying data in Series or DataFrames using labels, integers, or boolean arrays. It allows for efficient data selection and manipulation.
Label-based Indexing
Refers to accessing data using the labels or names of the rows and columns. It is achieved using the .loc accessor, enabling selection based on explicit index values.
Position-based Indexing
Refers to accessing data using the integer positions of the rows and columns. It is achieved using the .iloc accessor, enabling selection based on numerical positions.
RangeIndex
A default index for DataFrames and Series created using a range of integers. It is efficient and memory-saving, commonly used when explicit indexing is not necessary.
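The difference between label-based and position-based access, and the default RangeIndex, can be sketched on a small Series. The labels and values here are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
sales = pd.Series([500, 700, 900],
                  index=["Product A", "Product B", "Product C"])

by_label = sales.loc["Product B"]   # label-based lookup
by_position = sales.iloc[1]         # position-based lookup (second element)
print(by_label, by_position)        # 700 700

# Without an explicit index, pandas assigns a RangeIndex.
default = pd.Series([10, 20, 30])
print(default.index)  # RangeIndex(start=0, stop=3, step=1)
```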
What will this function return?
<class 'dict'>
What will this function return?
Index(['Product A', 'Product B', 'Product C'], dtype='object')
Indices
Refer to the labels or positions that uniquely identify rows and columns in a DataFrame or Series. They facilitate data alignment and access.
What will this function return?
22250
What will this function return?
15600
What will this function return?
20
What will this function return?
10
What will this function return?
10
What will this function return?
10
Methods
Refer to the functions that are associated with DataFrame and Series objects. These methods perform operations such as aggregation, transformation, and data manipulation.
Functions
Refer to built-in operations that perform specific tasks on data structures. They include aggregation, transformation, and manipulation functions like sum(), mean(), and groupby().
.sum()
A method that returns the sum of the values over the requested axis in a DataFrame or Series. It can be used for quick aggregation of numerical data.
.min()
A method that returns the minimum value over the requested axis in a DataFrame or Series. It is useful for finding the smallest value in a dataset.
.max()
A method that returns the maximum value over the requested axis in a DataFrame or Series. It is used to identify the largest value in a dataset.
.idxmax()
A method that returns the index label of the first occurrence of the maximum value over the requested axis in a DataFrame or Series. It tells you where the highest value sits by label, not by integer position.
.idxmin()
A method that returns the index label of the first occurrence of the minimum value over the requested axis in a DataFrame or Series. It tells you where the smallest value sits by label, not by integer position.
.head()
A method that returns the first n rows of a DataFrame or Series. By default, it returns the top 5 rows, providing a quick preview of the dataset.
.tail()
A method that returns the last n rows of a DataFrame or Series. By default, it returns the bottom 5 rows, allowing a quick look at the end of the dataset.
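The aggregation and preview methods above are called on a Series or DataFrame object. The sketch below uses a hypothetical Series of daily sales, invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
sales = pd.Series([300, 100, 2000, 700], index=["Mon", "Tue", "Wed", "Thu"])

print(sales.sum())     # 3100
print(sales.min())     # 100
print(sales.max())     # 2000
print(sales.idxmax())  # Wed  (label of the largest value)
print(sales.idxmin())  # Tue  (label of the smallest value)

first_two = sales.head(2)  # rows labeled Mon and Tue
last_two = sales.tail(2)   # rows labeled Wed and Thu
```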
dropna()
A pandas function that removes missing values from a DataFrame. Rows or columns with missing values can be dropped using this function.
fillna()
A pandas function used to replace NaN values with a specified value.
merge()
A pandas function that combines DataFrames using database-style join operations based on common columns or indices.
concat()
A pandas function used to concatenate DataFrames along a particular axis (row-wise or column-wise).
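To contrast the two combining operations, here is a small sketch: merge() joins on a shared key column, while concat() stacks frames along an axis. The table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
orders = pd.DataFrame({"product_id": [1, 2, 2], "qty": [5, 3, 1]})
products = pd.DataFrame({"product_id": [1, 2], "name": ["Widget", "Gadget"]})

# Database-style left join on the shared key column.
merged = orders.merge(products, on="product_id", how="left")
print(merged.shape)  # (3, 3)

# Row-wise concatenation; ignore_index=True rebuilds a clean 0..n-1 index.
more_orders = pd.DataFrame({"product_id": [1], "qty": [7]})
stacked = pd.concat([orders, more_orders], ignore_index=True)
print(stacked.shape)  # (4, 2)
```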
drop_duplicates()
A pandas function used to remove duplicate rows from a DataFrame.
groupby()
A pandas function that splits data into groups based on some criteria and applies a function to each group independently.
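The split-and-apply idea can be sketched with a sum per group. The categories and sales figures are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B"],
    "sales": [10, 20, 30, 40],
})

# Split rows by category, then sum the sales within each group.
totals = df.groupby("category")["sales"].sum()
print(totals["A"])  # 40
print(totals["B"])  # 60
```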
interpolate()
A pandas function used to fill NaN values using various interpolation methods.
isnull()
A pandas function that detects missing values in a DataFrame, returning a DataFrame of the same shape with boolean values.
notnull()
A pandas function that detects non-missing values in a DataFrame, returning a DataFrame of the same shape with boolean values.
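The missing-value tools above can be compared side by side on one small Series. This is a sketch with invented values; interpolate() is shown with its default linear method.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

print(s.isnull().tolist())       # [False, True, False]
print(s.notnull().tolist())      # [True, False, True]
print(s.dropna().tolist())       # [1.0, 3.0]
print(s.fillna(0).tolist())      # [1.0, 0.0, 3.0]
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0]
```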
What will this function return?
6100
What will this function return?
100
What will this function return?
2000
What will this function return?
7/4/2014
What will this function return?
1/2/2015
What will this function return?
What will this function return?
Parameters
Refer to the variables that are used in the function definition to accept input values. They define what kind of arguments the function can accept.
Arguments
The actual values or data that are passed to a function when it is called. They correspond to the parameters defined in the function signature.
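The distinction between parameters and arguments can be illustrated with a tiny helper. The function top_rows is hypothetical, written only for this example; it wraps the real head() method.

```python
import pandas as pd

# 'frame' and 'n' are parameters: names declared in the function definition.
def top_rows(frame, n=2):
    return frame.head(n)

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({"x": [1, 2, 3, 4]})

# 'df' and 3 are arguments: the actual values passed in the call.
result = top_rows(df, n=3)
print(len(result))  # 3

# Omitting the argument for n falls back to the parameter's default.
default_result = top_rows(df)
print(len(default_result))  # 2
```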
What will this function return?
What will this function return?
What will this function return?
.describe()
A method that generates descriptive statistics of a DataFrame or Series, including count, mean, standard deviation, minimum and maximum values, and quartiles.
.unique()
A method that returns the unique values in a Series or DataFrame column, in order of first appearance. It is useful for identifying distinct values within a dataset.
.nunique()
A method that returns the number of unique values in a Series or DataFrame column. It helps to understand the variability within the data.
.values
An attribute (read without parentheses) that returns the underlying data of a DataFrame or Series as a NumPy array. The pandas documentation recommends .to_numpy() for new code.
pd.array()
A top-level pandas function that creates a pandas array (an ExtensionArray) from a sequence of values. It is used to create new array-like structures, which can be useful for certain operations.
.to_numpy()
A method that converts a DataFrame or Series to a NumPy array. This is useful for performing operations that require NumPy arrays.
.sort_values()
A method that sorts a Series by its values, or a DataFrame by the values of the columns named in its by parameter. It is used for organizing data in ascending or descending order.
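A brief sketch of the inspection and conversion methods above, applied to one Series with repeated values. The numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with repeated values.
s = pd.Series([3, 1, 2, 3, 1])

print(list(s.unique()))  # [3, 1, 2]  (order of first appearance)
print(s.nunique())       # 3

ordered = s.sort_values(ascending=False)
print(ordered.tolist())  # [3, 3, 2, 1, 1]

arr = s.to_numpy()       # plain NumPy array
print(type(arr).__name__)  # ndarray

stats = s.describe()     # count, mean, std, min, quartiles, max
print(stats["mean"])     # 2.0
```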
What will this function return?
What will this function return?
Attribute Chaining
Refers to accessing multiple attributes in a single line of code. It allows for concise and readable data manipulation.
Method Chaining
Refers to applying multiple methods in succession on a DataFrame or Series in a single expression, where each method operates on the result of the previous one. It can improve readability by avoiding intermediate variables.
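Both kinds of chaining can be sketched on one small DataFrame. The data is hypothetical; wrapping the method chain in parentheses is a common style choice for splitting it over several lines, not a requirement.

```python
import pandas as pd

# Hypothetical sample data; the last row has a missing category.
df = pd.DataFrame({
    "category": ["A", "B", "A", None],
    "sales": [10, 20, 30, 40],
})

# Attribute chaining: index is an attribute of df, size an attribute of the index.
row_count = df.index.size
print(row_count)  # 4

# Method chaining: each call works on the result of the previous one.
result = (
    df.dropna()
      .groupby("category")["sales"]
      .sum()
      .sort_values(ascending=False)
)
print(result.tolist())  # [40, 20]
```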
What will this function return?
None
What will this function return?
Series vs. DataFrame
Series and DataFrames as Programming Objects
Data Selection
Refers to accessing specific subsets of data within a DataFrame or Series using indexing, slicing, and boolean indexing techniques.
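.iloc[]
An indexer for position-based indexing. It is used on a DataFrame or Series (for example df.iloc[0]) to select data by row and column positions, specified as integer indices. Slices exclude the end position.
.loc[]
An indexer for label-based indexing. It is used on a DataFrame or Series (for example df.loc["a"]) to select data by row and column labels; it also accepts boolean arrays. Slices include the end label.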
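The two indexers can be compared on a DataFrame with string row labels. This sketch uses invented data and highlights one common pitfall: label slices with .loc[] include the end label, while position slices with .iloc[] exclude the end position.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame(
    {"price": [10, 20, 30], "stock": [5, 0, 8]},
    index=["a", "b", "c"],
)

print(df.loc["b", "price"])  # 20  (row label, column label)
print(df.iloc[1, 0])         # 20  (row position, column position)

label_slice = df.loc["a":"b"]   # rows a and b: end label included
position_slice = df.iloc[0:1]   # row a only: end position excluded
print(len(label_slice), len(position_slice))  # 2 1

# .loc[] also accepts a boolean mask.
in_stock = df.loc[df["stock"] > 0]
print(in_stock.index.tolist())  # ['a', 'c']
```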
Dos and Don'ts for .iloc[] and .loc[]
Data Consistency
Refers to ensuring that the data in a dataset is accurate, reliable, and follows defined rules. It involves maintaining the integrity of the data throughout its lifecycle.
Data Cleaning
Refers to the process of identifying and correcting errors or inconsistencies in a dataset. It includes handling missing values, duplicates, and incorrect data types.
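A minimal cleaning pass might look like the sketch below, which covers the three problems named above: duplicates, incorrect data types, and missing values. The raw table is invented, and the choice to drop unparseable rows rather than fill them is an assumption made for the example.

```python
import pandas as pd

# Hypothetical raw data: a duplicated row and amounts stored as text.
raw = pd.DataFrame({
    "id": ["1", "2", "2", "3"],
    "amount": ["10", "20", "20", "n/a"],
})

clean = raw.drop_duplicates().copy()  # remove the repeated row

# Convert text to numbers; values that cannot be parsed become NaN.
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")

# Assumption for this example: rows without a valid amount are discarded.
clean = clean.dropna(subset=["amount"])
print(clean["amount"].tolist())  # [10.0, 20.0]
```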
Data Preprocessing
Involves preparing raw data for analysis by transforming it into a clean and usable format. It includes tasks such as normalization, encoding, and scaling.
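Two of the preprocessing tasks mentioned above, scaling and encoding, can be sketched with plain pandas. Min-max scaling and one-hot encoding via pd.get_dummies() are used here as illustrative choices; other techniques are equally valid. The data is invented.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "size": [10.0, 20.0, 30.0],
    "color": ["red", "blue", "red"],
})

# Min-max scaling: map the column onto the range 0 to 1.
size_range = df["size"].max() - df["size"].min()
df["size_scaled"] = (df["size"] - df["size"].min()) / size_range
print(df["size_scaled"].tolist())  # [0.0, 0.5, 1.0]

# One-hot encoding: replace the text column with one indicator column per value.
encoded = pd.get_dummies(df, columns=["color"])
print("color_red" in encoded.columns)  # True
```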
Data Preparation
Refers to the steps taken to ready a dataset for analysis or modeling. It encompasses data cleaning, preprocessing, transformation, and feature engineering.