The agg() method (an alias for aggregate()) in the pandas library performs aggregate operations on DataFrames and Series. It lets you apply one or more functions along either axis to produce summary statistics or custom calculations.
1. Basic Usage of agg()
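The agg() method can apply several aggregation functions at once. Passing a dictionary maps each column to its own list of functions. The example below computes the mean and standard deviation of column A and the sum and maximum of column B. The result has one row per function, with NaN wherever a function was not requested for a column: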
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# Apply aggregation functions
result = df.agg({
    'A': ['mean', 'std'],
    'B': ['sum', 'max']
})
print(result)
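When the same functions should apply to every column, a list can be passed instead of a dictionary. The following is a minimal sketch of that form; the frame repeats the one above, and the variable name summary is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# A list applies every function to every column.
# The result has one row per function and one column per input column.
summary = df.agg(['mean', 'std'])
print(summary)
```

Because each function is computed for both columns, this result contains no NaN cells.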
2. Aggregating Rows or Columns
By default (axis=0), agg() applies each function to every column and returns one result per column. To aggregate across the columns of each row instead, pass axis=1:
# Aggregate along rows
result = df.agg('mean', axis=1)
print(result)
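To make the difference between the two axes concrete, the sketch below runs the same aggregation both ways on a small frame. The names per_column and per_row are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# axis=0 (the default): one mean per column, indexed by column name
per_column = df.agg('mean')

# axis=1: one mean per row, computed across columns A and B
per_row = df.agg('mean', axis=1)

print(per_column)
print(per_row)
```

The first result is indexed by the column labels A and B, while the second keeps the original row index.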
3. Applying Custom Functions
You can also use agg() to apply custom functions. Here’s an example that uses a lambda to calculate the range (the difference between the maximum and minimum values) of each column:
# Custom function to calculate range
range_func = lambda x: x.max() - x.min()
# Apply the custom function
result = df.agg(range_func)
print(result)
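A custom function can also be combined with built-in aggregations in one call. The sketch below assumes a named helper, value_range (a hypothetical name chosen for this example), in place of a lambda, so that the corresponding result row gets a readable label instead of <lambda>:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

def value_range(column):
    """Return the difference between the largest and smallest value."""
    return column.max() - column.min()

# Mix built-in aggregations (given by name) with the custom function.
# Each row of the result is labelled with the function's name.
result = df.agg(['min', 'max', value_range])
print(result)
```

Using a named function rather than a lambda is a readability choice; the computed values are the same.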
4. Using agg() with GroupBy
The agg() method is often used together with groupby() to aggregate grouped data. Passing a dictionary of lists, as below, produces one row per group and hierarchical column labels such as ('Value', 'mean'):
# Create a DataFrame with grouping
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40]
})
# Group by 'Category' and apply aggregation
result = df.groupby('Category').agg({
    'Value': ['mean', 'sum']
})
print(result)
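If flat, descriptive column names are preferred over hierarchical ones, pandas (0.25 and later) also supports named aggregation, where each keyword argument names an output column and maps it to a (column, function) pair. A minimal sketch, with mean_value and total_value as illustrative output names:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40]
})

# Named aggregation: output_name=(input_column, function)
result = df.groupby('Category').agg(
    mean_value=('Value', 'mean'),
    total_value=('Value', 'sum'),
)
print(result)
```

The result is indexed by Category and has two single-level columns, which is often easier to work with in later steps.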
5. Conclusion
The agg() method is a flexible tool in pandas for running one or several aggregations in a single call. It accepts both built-in and custom functions and works on DataFrames, Series, and grouped data, which makes it useful for data analysis and manipulation.