October 13, 2024

agg() Function in Python

agg() is not a built-in Python function; it is a method of pandas DataFrame and Series objects (also available as aggregate()) that performs aggregate operations. It applies one or more functions along an axis, either column by column or row by row, to produce summary statistics or custom calculations.

1. Basic Usage of agg()

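The agg() method can apply several aggregation functions in a single call. Passing a dictionary lets you choose a different set of functions for each column. The example below computes the mean and standard deviation of column A and the sum and maximum of column B. The result has one row per function, with NaN wherever a function was not requested for a column: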

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# Apply aggregation functions
result = df.agg({
    'A': ['mean', 'std'],
    'B': ['sum', 'max']
})

print(result)
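
If you want the same functions applied to every column, a list of function names can be passed instead of a dictionary. The following is a minimal sketch using the same sample data as above; the variable name summary is only illustrative:

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# A list of function names is applied to every column.
# The result has one row per function and one column per input column.
summary = df.agg(['mean', 'std'])

print(summary)
```

Because every function is applied to every column, this form produces no NaN gaps in the result.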
    

2. Aggregating Rows or Columns

By default (axis=0), agg() aggregates each column, collapsing its rows into one value per column. To aggregate each row across its columns instead, pass axis=1:

# Compute the mean of each row (across columns A and B)
result = df.agg('mean', axis=1)

print(result)
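
To make the difference between the two axes concrete, here is a small side-by-side sketch on the same sample data. The names per_column and per_row are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# axis=0 (the default): one value per column, indexed by column name
per_column = df.agg('mean')

# axis=1: one value per row, indexed by the original row labels
per_row = df.agg('mean', axis=1)

print(per_column)
print(per_row)
```

The first result has two entries (A and B), while the second has four, one for each row of the DataFrame.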
    

3. Applying Custom Functions

You can also use agg() to apply custom functions. Here’s an example of applying a custom lambda function to calculate the range (difference between max and min values) of each column:

# Custom function to calculate range
range_func = lambda x: x.max() - x.min()

# Apply the custom function
result = df.agg(range_func)

print(result)
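
A named function works the same way as a lambda, and extra keyword arguments given to agg() are forwarded to it. The sketch below assumes a hypothetical helper, quantile_spread, that is not part of pandas; it generalizes the range calculation to the distance between any two quantiles:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

def quantile_spread(column, low=0.0, high=1.0):
    """Hypothetical helper: difference between two quantiles of a column."""
    return column.quantile(high) - column.quantile(low)

# With the defaults this is the full range (max - min) of each column
full_range = df.agg(quantile_spread)

# Keyword arguments after the function are passed through to it,
# here giving the interquartile range of each column
iqr = df.agg(quantile_spread, low=0.25, high=0.75)

print(full_range)
print(iqr)
```

Using a def rather than a lambda gives the function a name and a docstring, which can make longer aggregation code easier to read.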
    

4. Using agg() with GroupBy

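The agg() method is often combined with groupby() to aggregate each group separately. Because the dictionary below maps 'Value' to a list of functions, the result has two-level column labels: ('Value', 'mean') and ('Value', 'sum'):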

# Create a DataFrame with grouping
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40]
})

# Group by 'Category' and apply aggregation
result = df.groupby('Category').agg({
    'Value': ['mean', 'sum']
})

print(result)
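
If flat, descriptive column names are preferable to two-level labels, pandas (version 0.25 and later) also supports named aggregation, where each keyword argument maps an output column name to an (input column, function) pair. This is a sketch on the same grouped data; the output names mean_value and total_value are arbitrary choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Value': [10, 20, 30, 40]
})

# Named aggregation: output_column=(input_column, function)
result = df.groupby('Category').agg(
    mean_value=('Value', 'mean'),
    total_value=('Value', 'sum'),
)

print(result)
```

The result is indexed by Category and has two single-level columns, which is often easier to work with in later steps such as merging or exporting.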
    

5. Conclusion

The agg() method lets you compute several aggregations in a single call, per column or per row, on DataFrames, Series, and groups created with groupby(). It accepts built-in function names as well as custom functions, which makes it a convenient way to produce summary statistics during data analysis.