PySpark is the Python API for Apache Spark: it lets you write Spark applications in Python and work with Resilient Distributed Datasets (RDDs) and DataFrames from Python code. This is made possible by the Py4J library, which is bundled with PySpark and lets Python interact dynamically with JVM objects. PySpark also offers a rich set of functions for writing efficient programs. In PySpark, a new column can be added to a DataFrame with the withColumn() or select() functions or through SQL expressions, while dropping or deleting a column is performed using the drop() function. In this post, let us go through the following things:
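All the examples below work on a DataFrame named df_orders. As a minimal sketch, you could build a small sample DataFrame like this (the column names column1, column2, order_name and order_id are placeholders chosen for illustration, not part of the original post):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("drop-columns-demo").getOrCreate()

# Small sample DataFrame reused in the examples that follow.
df_orders = spark.createDataFrame(
    [(1, "A", "laptop", 100), (2, "B", "phone", 101)],
    ["column1", "column2", "order_name", "order_id"],
)
df_orders.show()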
In PySpark, we can drop a single column using the drop() function. Passing the column name as an argument to drop() deletes that particular column.
Syntax: df_orders.drop('column1').show()
When we execute the above syntax, the column1 column will be dropped from the dataframe.
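Note that drop() does not modify the original DataFrame; it returns a new one. A quick sketch against the sample df_orders defined above:

# drop() returns a new DataFrame without column1; df_orders itself is unchanged.
df_no_col1 = df_orders.drop("column1")
print(df_no_col1.columns)   # column1 is gone
print(df_orders.columns)    # the original still contains column1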
We can also drop a single column with the drop() function by passing a column reference such as df.name_of_the_column as the argument.
Syntax: df_orders.drop(df_orders.column1).show()
If we execute the above syntax, then the column1 column will be dropped from the dataframe.
We can also drop several columns at once in PySpark using the drop() function. We simply pass the names of the columns to be removed as arguments to drop().
Syntax: df_orders.drop('column1', 'column2').show()
When we execute the above syntax, the column1 and column2 columns will be dropped from the dataframe.
We can also drop multiple columns using the drop() function in another way. The column names to be deleted are first collected in a list, for example columns_to_be_dropped, and this list is then unpacked into the drop() function.
Syntax: columns_to_be_dropped = ['column1', 'column2']
df_orders.drop(*columns_to_be_dropped).show()
When we execute the above syntax, the column1 and column2 columns will be dropped from the dataframe.
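As a brief sketch with the sample df_orders, the list form is equivalent to passing the names directly; the * operator simply unpacks the list into separate string arguments:

columns_to_be_dropped = ["column1", "column2"]

# These two calls produce the same result: *columns_to_be_dropped unpacks
# the list into individual arguments for drop().
df_orders.drop(*columns_to_be_dropped).show()
df_orders.drop("column1", "column2").show()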
There is another method for dropping multiple columns: chaining drop() calls. Each drop() call removes one column, so the columns are dropped one after the other in a single statement.
Syntax: df_orders.drop(df_orders.column1).drop(df_orders.column2).show()
If we execute the above syntax, then the column1 and column2 columns will be dropped from the dataframe one after the other.
Deleting more than one column by position in PySpark is done in a roundabout way. The columns (and rows) to keep are extracted by position using the select() function first, and the result is then converted into a new dataframe.
Syntax: spark.createDataFrame(df_orders.select(df_orders.columns[:2]).take(5)).show()
When we execute the above syntax, only the first two columns (column1 and column2) and the first five rows are kept, so all the remaining columns are effectively dropped from the dataframe.
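If the goal is to drop the columns at certain positions rather than keep them, a more direct sketch (my own variation, using the sample df_orders) is to slice df_orders.columns and unpack the slice into drop():

# Drop the first two columns by position: columns[:2] yields their names,
# and * unpacks them as arguments to drop().
df_orders.drop(*df_orders.columns[:2]).show()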
Deleting more than one column whose name starts with a particular string in PySpark is also done in a roundabout way. The list of column names that start with that string is built first with the help of the startswith() method, and this list is then passed to the drop() function.
Syntax: list_name = df_orders.columns
columns_to_be_deleted = [i for i in list_name if i.startswith('column')]
df_orders.drop(*columns_to_be_deleted).show()
When we execute the above syntax, columns that start with the name 'column' will be dropped from the dataframe.
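With the sample df_orders, any column whose name begins with the prefix 'column' would be removed while order_name and order_id remain; a short sketch:

# Keep only the columns that do NOT start with the given prefix.
prefix = "column"
to_drop = [c for c in df_orders.columns if c.startswith(prefix)]
df_orders.drop(*to_drop).show()   # drops column1 and column2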
Deleting more than one column whose name ends with a particular string in PySpark is done in a roundabout way as well. The list of column names that end with that string is built first with the help of the endswith() method, and this list is then passed to the drop() function.
Syntax: list_name = df_orders.columns
columns_to_be_deleted = [i for i in list_name if i.endswith('id')]
df_orders.drop(*columns_to_be_deleted).show()
When we execute the above syntax, columns whose names end with 'id' will be dropped from the dataframe.
Deleting more than one column whose name contains a particular string in PySpark is likewise done in a roundabout way. The column names that contain the string are collected into a list first, and this list is then passed to the drop() function.
Syntax: list_name = df_orders.columns
columns_to_be_deleted = [i for i in list_name if 'name' in i]
df_orders.drop(*columns_to_be_deleted).show()
When we execute the above syntax, columns that contain 'name' will be dropped from the dataframe.
We have the perfect professional PySpark Tutorial for you. Enroll now!
Deleting more than one column that contains null values in PySpark is done in a roundabout way by creating a user-defined function. The number of null values in each column is counted first using the isNull() function, and the names of the columns whose count is greater than zero are then passed to the drop() function.
Syntax: import pyspark.sql.functions as X

def drop_null_columns(df_ord):
    # Count the null values in every column of the DataFrame.
    null_counts = df_ord.select([X.count(X.when(X.col(a).isNull(), a)).alias(a) for a in df_ord.columns]).collect()[0].asDict()
    # Keep the names of the columns that contain at least one null.
    to_delete = [p for p, u in null_counts.items() if u > 0]
    return df_ord.drop(*to_delete)

drop_null_columns(df_orders).show()
When we execute the above syntax, columns that contain null values will be dropped from the dataframe.
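A quick illustrative sketch (with made-up data): column c below contains a null, so drop_null_columns() removes it and only a and b remain.

# Hypothetical example DataFrame: column "c" holds a null value.
df_with_nulls = spark.createDataFrame([(1, "x", None), (2, "y", "z")], ["a", "b", "c"])
drop_null_columns(df_with_nulls).show()   # only columns a and b are left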
Deleting more than one column that contains NaN values in PySpark is done in a roundabout way by creating a user-defined function. The number of NaN values in each column is counted first using the isnan() function, and the names of the columns whose count is greater than zero are then passed to the drop() function.
Syntax: import pyspark.sql.functions as X

def drop_nan_columns(df_ord):
    # Count the NaN values in every column of the DataFrame.
    nan_counts = df_ord.select([X.count(X.when(X.isnan(X.col(a)), a)).alias(a) for a in df_ord.columns]).collect()[0].asDict()
    # Keep the names of the columns that contain at least one NaN.
    to_delete = [p for p, u in nan_counts.items() if u > 0]
    return df_ord.drop(*to_delete)

drop_nan_columns(df_orders).show()
When we execute the above syntax, columns that contain NaN values will be dropped from the dataframe.
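One caveat: isnan() is only meaningful for floating-point columns, and applying it to other column types can fail. As a defensive sketch (the drop_nan_columns_safe name and the schema filtering are my own additions, not part of the original post), you could restrict the check to float and double columns:

from pyspark.sql import functions as X
from pyspark.sql.types import DoubleType, FloatType

def drop_nan_columns_safe(df_ord):
    # isnan() only applies to float/double columns, so check those only.
    float_cols = [f.name for f in df_ord.schema.fields
                  if isinstance(f.dataType, (FloatType, DoubleType))]
    if not float_cols:
        return df_ord
    nan_counts = df_ord.select([X.count(X.when(X.isnan(X.col(c)), c)).alias(c) for c in float_cols]).collect()[0].asDict()
    to_delete = [c for c, n in nan_counts.items() if n > 0]
    return df_ord.drop(*to_delete)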
Conclusion:
In this blog, we have learned how to drop a single column, multiple columns, columns whose names begin or end with or contain a particular string, columns by position, and columns containing null or NaN/NA values in PySpark. I hope the information provided in this blog is helpful. Feel free to comment if you have any queries.