Pandas DataFrame Python: Build and Filter Rows

Create a pandas DataFrame in Python with pd.DataFrame(data), where data is a dictionary that maps each column name to a list of values and every list has the same length. Use a DataFrame when records need labeled rows and columns, column selection, or vectorized filtering. A boolean mask filters tabular data without writing a Python loop. The filtered result keeps its original index labels unless you reset them explicitly.

Python Pandas DataFrame Example For Filtering Rows

import pandas as pd

sales = pd.DataFrame({"product": ["Keyboard", "Mouse", "Monitor"], "units": [12, 7, 15]})
ready = sales[sales["units"] >= 10]
print(ready.to_string(index=False))

Output:

 product  units
Keyboard     12
 Monitor     15

How This Example Works

import pandas as pd gives the package its conventional pd alias.
The pandas DataFrame constructor turns each dictionary key into a column label. Because both lists contain three values, pandas creates three rows and supplies a default RangeIndex containing 0, 1, and 2.
sales["units"] >= 10 produces a boolean Series aligned with that index. Passing the mask back to sales[...] keeps only rows whose value is True.
to_string(index=False) hides the index in the printed table; it does not remove or renumber the stored labels.
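
The intermediate values described above can be inspected directly. This is a minimal sketch that reuses the sales table from the example; the variable name mask is introduced here for illustration and does not appear in the original code.

```python
import pandas as pd

sales = pd.DataFrame({"product": ["Keyboard", "Mouse", "Monitor"], "units": [12, 7, 15]})

# The comparison returns a boolean Series that shares the DataFrame's index.
mask = sales["units"] >= 10
print(mask.tolist())                    # [True, False, True]
print(mask.index.equals(sales.index))   # True

# Indexing with the mask keeps the rows marked True, along with their original labels.
ready = sales[mask]
print(ready.index.tolist())             # [0, 2]
```

Naming the mask is optional, but it makes the filter condition easy to print or reuse while debugging.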

Prove What the DataFrame Run Actually Did

Run the program twice on the same page, opening Run Details after each completion. Comparing the two runs shows which packages the first run had to load, so you do not mistake the cold-run duration for a DataFrame benchmark.

Observation	Cold runtime	Warm same-page runtime
User requests	0	0
Newly available packages	pandas 1.5.3, numpy 1.26.1, python-dateutil 2.8.2, six 1.16.0, pytz 2023.3	0
What changed	pandas and its dependencies became available	the runtime already had them

web.run detects import pandas and loads supported packages before the user’s Python starts. Run Details reports the difference between packages available before and after that run, so an empty warm-run Packages list means “nothing newly loaded,” not “pandas was unused.”

The total first-run duration includes package bootstrap as well as Python execution. Compare it with the warm run before blaming pd.DataFrame(...); timings vary by machine and cache state, so use the Packages list to identify bootstrap work rather than expecting a fixed speedup.
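
One way to keep bootstrap time out of the measurement is to time only the constructor and filter inside the script. The sketch below assumes time.perf_counter is available in the runtime and uses a hypothetical elapsed_ms variable; the printed value will differ between machines and runs, and timer resolution may be coarse in some environments.

```python
import time

import pandas as pd  # the module is already imported by the time the timer starts

start = time.perf_counter()
sales = pd.DataFrame({"product": ["Keyboard", "Mouse", "Monitor"], "units": [12, 7, 15]})
ready = sales[sales["units"] >= 10]
elapsed_ms = (time.perf_counter() - start) * 1000

# No fixed number is expected here; compare cold and warm runs yourself.
print(f"construct + filter: {elapsed_ms:.3f} ms")
```

If this figure stays small on both runs while the total duration drops on the warm run, the difference is likely package loading rather than DataFrame work.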

Both runs report zero user requests because the dictionary and filtering stay local. The cold run still downloads package files as runtime activity. Therefore, “no user requests” proves the snippet made no external request, not that the browser transferred zero bytes.

The Filtered-Index Trap: loc Is Not iloc

Filtering preserves labels. Here ready.index.tolist() is [0, 2], even though index=False makes the output look like a fresh two-row table. Code that treats the second displayed row as label 1 fails:

Wrong:

second_product = ready.loc[1, "product"]  # KeyError: 1

Right for the second row by position:

second_product = ready.iloc[1]["product"]

Right when later code needs sequential labels:

ready = ready.reset_index(drop=True)
second_product = ready.loc[1, "product"]

Use .loc for index labels and .iloc for zero-based positions. Reset the index only when new labels are part of the intended result; otherwise preserving source labels helps trace filtered rows back to the original data.
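
The three cases above can be checked in one short script. This is a sketch that reuses the sales and ready tables from the main example; the final line illustrates the traceability point by selecting the source rows through the preserved labels.

```python
import pandas as pd

sales = pd.DataFrame({"product": ["Keyboard", "Mouse", "Monitor"], "units": [12, 7, 15]})
ready = sales[sales["units"] >= 10]

# Label 1 belonged to the Mouse row, which the filter removed.
try:
    ready.loc[1, "product"]
except KeyError:
    print("label 1 is not in the filtered index")

print(ready.iloc[1]["product"])   # Monitor (second row by position)
print(ready.loc[2, "product"])    # Monitor (same row by its preserved label)

# Preserved labels map each filtered row back to its source row.
print(sales.loc[ready.index].equals(ready))  # True
```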

DataFrame Constructor Length Errors

A dictionary of lists must describe a rectangular table. pandas raises ValueError: All arrays must be of the same length rather than guessing how to fill a short column.

Wrong:

pd.DataFrame({"product": ["Keyboard", "Mouse"], "units": [12]})

Right:

pd.DataFrame({"product": ["Keyboard", "Mouse"], "units": [12, None]})

Add an explicit missing value only when it represents the data honestly; otherwise fix the source records before constructing the table.
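
A short check shows both behaviors and one side effect worth knowing before padding a column. This is a sketch; the name padded is introduced here for illustration.

```python
import pandas as pd

# Unequal list lengths are rejected instead of being padded.
try:
    pd.DataFrame({"product": ["Keyboard", "Mouse"], "units": [12]})
except ValueError as exc:
    print(f"ValueError: {exc}")

# An explicit None is stored as NaN, so the integer column becomes floating point.
padded = pd.DataFrame({"product": ["Keyboard", "Mouse"], "units": [12, None]})
print(padded["units"].dtype)             # float64
print(padded["units"].isna().tolist())   # [False, True]
```

Because the units column is no longer an integer column, later code that expects whole numbers may need to handle the missing value first.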

When a DataFrame Is the Right Container

Choose	When it fits
DataFrame	Multiple labeled columns need boolean indexing, column operations, or tabular analysis
Series	One labeled dimension is enough
List of dictionaries	A tiny record collection needs iteration but no columnar operations
NumPy array	Homogeneous numeric data needs matrix operations without row or column labels

A DataFrame earns its package and memory overhead when labels and vectorized operations simplify real transformations. Keep a list of dictionaries when the records are small and only need ordinary Python iteration.
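
To make the trade-off concrete, here is the same filter written both ways. This is a sketch using the example's data reshaped as a hypothetical records list; neither form is presented as the preferred one.

```python
import pandas as pd

records = [
    {"product": "Keyboard", "units": 12},
    {"product": "Mouse", "units": 7},
    {"product": "Monitor", "units": 15},
]

# Plain Python: a list comprehension is enough for a small collection.
ready_records = [row for row in records if row["units"] >= 10]
print([row["product"] for row in ready_records])   # ['Keyboard', 'Monitor']

# pandas: the same records become columns, and the filter is a boolean mask.
sales = pd.DataFrame(records)
ready = sales[sales["units"] >= 10]
print(ready["product"].tolist())                   # ['Keyboard', 'Monitor']
```

For three records the comprehension needs no extra package; the DataFrame version starts to pay off once the work involves several columns or repeated column-wise operations.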