September 11, 2024

How to Create a DataFrame in Python

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). Heterogeneous means that different columns can hold different data types. It is the primary data structure of the pandas library, which is widely used for data analysis in Python. Below are several ways to create a DataFrame with pandas.

1. Importing pandas

Before creating a DataFrame, you need to import the pandas library.

Example:

import pandas as pd

This imports the pandas library and allows you to use it with the alias pd.

2. Creating a DataFrame from a Dictionary

You can create a DataFrame from a dictionary in which the keys are the column names and the values are the data for each column. Each value is typically a list, and all of the lists must have the same length.

Example:

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

This example creates a DataFrame with columns for “Name”, “Age”, and “City”.
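By default pandas labels the rows 0, 1 and 2. The minimal sketch below reuses the same dictionary to show two related details: the index parameter of pd.DataFrame for custom row labels (the labels 'a', 'b' and 'c' are arbitrary, chosen only for illustration), and df.dtypes to confirm that each column keeps its own type. It also shows what happens when the lists have different lengths.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}

# Supply custom row labels instead of the default 0, 1, 2
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)

# Each column keeps its own type: Age is numeric, Name and City hold text
print(df.dtypes)

# Look up a single value by row label and column name
print(df.loc['b', 'Name'])

# Lists of different lengths cannot be aligned, so pandas raises ValueError
try:
    pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24]})
except ValueError as err:
    print('ValueError:', err)
```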

3. Creating a DataFrame from a List of Lists

You can also create a DataFrame from a list of lists, where each inner list is one row, and pass the column names separately through the columns parameter.

Example:

# Creating a DataFrame from a list of lists
data = [
    ['Alice', 24, 'New York'],
    ['Bob', 27, 'Los Angeles'],
    ['Charlie', 22, 'Chicago']
]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

This creates the same DataFrame as the previous example but uses a list of lists instead of a dictionary.
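Since each inner sequence becomes one row, the rows can also be tuples, which is convenient when they come from a source such as a CSV reader or a database cursor. This is a small sketch assuming the same three columns; the number of names passed to columns must match the number of values in each row.

```python
import pandas as pd

# Rows given as tuples behave the same as rows given as lists
rows = [
    ('Alice', 24, 'New York'),
    ('Bob', 27, 'Los Angeles'),
    ('Charlie', 22, 'Chicago'),
]

df = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(df.shape)  # (3, 3): three rows, three columns

# Passing fewer column names than values per row raises ValueError
try:
    pd.DataFrame(rows, columns=['Name', 'Age'])
except ValueError as err:
    print('ValueError:', err)
```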

4. Creating a DataFrame from a List of Dictionaries

You can create a DataFrame from a list of dictionaries, where each dictionary represents a row in the DataFrame.

Example:

# Creating a DataFrame from a list of dictionaries
data = [
    {'Name': 'Alice', 'Age': 24, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 27, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 22, 'City': 'Chicago'}
]

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago

This example creates a DataFrame by specifying each row as a dictionary within a list. Each dictionary becomes one row, and pandas uses the dictionary keys as the column names.
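Because each dictionary names its own fields, the rows do not have to share the same keys. pandas uses the union of all keys as the columns and fills any missing value with NaN. The sketch below assumes, for illustration only, that Bob's record has no City and Charlie's has no Age.

```python
import pandas as pd

records = [
    {'Name': 'Alice', 'Age': 24, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 27},              # no 'City' key
    {'Name': 'Charlie', 'City': 'Chicago'},  # no 'Age' key
]

df = pd.DataFrame(records)
print(df)

# Missing entries are stored as NaN; count them per column
print(df.isna().sum())

# Age becomes a float column (24.0, 27.0, NaN) because NaN is a float value
print(df['Age'].dtype)
```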

5. Creating a DataFrame from a NumPy Array

If you have data in a NumPy array, you can convert it into a DataFrame and specify the column names.

Example:

import numpy as np

# Creating a DataFrame from a NumPy array
data = np.array([
    ['Alice', 24, 'New York'],
    ['Bob', 27, 'Los Angeles'],
    ['Charlie', 22, 'Chicago']
])

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

Output:

      Name Age         City
0    Alice  24     New York
1      Bob  27  Los Angeles
2  Charlie  22      Chicago

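This example creates a DataFrame from a NumPy array, with the column names specified. Because a NumPy array holds a single data type, the mixed values are all converted to strings: the Age column contains text such as '24' rather than integers, which is also why its spacing in the output differs slightly from the earlier examples. For mixed data, a dictionary or list is usually the better starting point; NumPy arrays are best suited to homogeneous numeric data.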
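If you do start from a mixed NumPy array, the sketch below shows two possible ways to end up with a numeric Age column, using the same sample values: converting the text afterwards with pd.to_numeric, or keeping the array purely numeric and adding the text columns separately. These are illustrations, not the only options.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one dtype, so these mixed values are all stored as strings
mixed = np.array([
    ['Alice', 24, 'New York'],
    ['Bob', 27, 'Los Angeles'],
    ['Charlie', 22, 'Chicago'],
])
print(mixed.dtype)  # a Unicode string dtype (kind 'U'), not an integer type

df = pd.DataFrame(mixed, columns=['Name', 'Age', 'City'])

# Option 1: convert the text column back to numbers after construction
df['Age'] = pd.to_numeric(df['Age'])
print(df['Age'].sum())

# Option 2: keep only numeric data in the array and add text columns separately
ages = np.array([[24], [27], [22]])
df2 = pd.DataFrame(ages, columns=['Age'])
df2.insert(0, 'Name', ['Alice', 'Bob', 'Charlie'])  # insert as the first column
df2['City'] = ['New York', 'Los Angeles', 'Chicago']
print(df2.dtypes)
```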

6. Creating an Empty DataFrame

You can create an empty DataFrame and add data to it later.

Example:

# Creating an empty DataFrame
df = pd.DataFrame(columns=['Name', 'Age', 'City'])
print(df)

Output:

Empty DataFrame
Columns: [Name, Age, City]
Index: []

This creates an empty DataFrame with specified columns, which you can populate with data later.
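As one way to populate it, the sketch below assigns each new row with df.loc, using the current number of rows as the next integer label. Adding rows one at a time is slow for large amounts of data, so it is usually better to collect the rows in a plain Python list and build the DataFrame once, as in the second part. Note that DataFrame.append was removed in pandas 2.0; pd.concat is the documented way to combine DataFrames.

```python
import pandas as pd

# Approach 1: start empty and add rows one at a time by label
df = pd.DataFrame(columns=['Name', 'Age', 'City'])
df.loc[len(df)] = ['Alice', 24, 'New York']
df.loc[len(df)] = ['Bob', 27, 'Los Angeles']
# Columns of an empty DataFrame start with object dtype, so Age may stay object here
print(df)

# Approach 2: collect rows in a plain list, then construct the DataFrame once
incoming = [('Alice', 24, 'New York'), ('Bob', 27, 'Los Angeles')]
rows = []
for name, age, city in incoming:
    rows.append({'Name': name, 'Age': age, 'City': city})
df_fast = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(df_fast)
```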