DataFrame Rendering
DataFusion provides configurable rendering for DataFrames in both plain text and HTML
formats. The datafusion.dataframe_formatter module controls how DataFrames are
displayed in Jupyter notebooks (via _repr_html_), in the terminal (via __repr__),
and anywhere else a string or HTML representation is needed.
Basic Rendering
In a Jupyter environment, displaying a DataFrame triggers HTML rendering:
# Will display as HTML table in Jupyter
df
# Explicit display also uses HTML rendering
display(df)
In a terminal or when converting to string, plain text rendering is used:
# Plain text table output
print(df)
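The split between the two rendering paths follows the standard Python and IPython display hooks. The sketch below uses a toy class (`TinyTable`, which is hypothetical and not part of DataFusion) to show which method each path calls; it does not reflect how DataFusion builds its tables internally.

```python
class TinyTable:
    """Toy stand-in for a DataFrame; not part of DataFusion."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def __repr__(self):
        # Plain text: used by repr() and, since __str__ is not defined, by print().
        lines = [" | ".join(self.columns)]
        lines += [" | ".join(str(v) for v in row) for row in self.rows]
        return "\n".join(lines)

    def _repr_html_(self):
        # HTML: Jupyter/IPython calls this hook when the object is displayed.
        head = "".join(f"<th>{c}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = TinyTable(["a", "b"], [(1, 2), (3, 4)])
print(table)                # plain text path
html = table._repr_html_()  # what a notebook would render
```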
Configuring the Formatter
You can customize how DataFrames are rendered by configuring the global formatter:
from datafusion.dataframe_formatter import configure_formatter
configure_formatter(
    max_cell_length=25,            # Maximum characters in a cell before truncation
    max_width=1000,                # Maximum width in pixels (HTML only)
    max_height=300,                # Maximum height in pixels (HTML only)
    max_memory_bytes=2097152,      # Maximum memory for rendering (2MB)
    min_rows=10,                   # Minimum number of rows to display
    max_rows=10,                   # Maximum rows to display
    enable_cell_expansion=True,    # Allow expanding truncated cells (HTML only)
    custom_css=None,               # Additional custom CSS (HTML only)
    show_truncation_message=True,  # Show message when data is truncated
    style_provider=None,           # Custom styling provider (HTML only)
    use_shared_styles=True,        # Share styles across tables (HTML only)
)
The formatter settings affect all DataFrames displayed after configuration.
Custom Style Providers
For HTML styling, you can create a custom style provider that implements the
StyleProvider protocol:
from datafusion.dataframe_formatter import configure_formatter
class MyStyleProvider:
    def get_cell_style(self):
        """Return CSS style string for table data cells."""
        return "border: 1px solid #ddd; padding: 8px; text-align: left;"

    def get_header_style(self):
        """Return CSS style string for table header cells."""
        return (
            "background-color: #007bff; color: white; "
            "padding: 8px; text-align: left;"
        )
# Apply the custom style provider
configure_formatter(style_provider=MyStyleProvider())
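Because the style provider is described as a protocol, a class should only need the two methods shown above and no particular base class. The following sketch illustrates that idea with a hypothetical stand-in protocol, `StyleProviderSketch`; the real StyleProvider definition may differ, and the way the style string is placed into a `<th>` tag here is an assumption for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StyleProviderSketch(Protocol):
    """Hypothetical stand-in for the StyleProvider protocol described above."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class CompactStyleProvider:
    # No inheritance needed: having both methods is enough to conform.
    def get_cell_style(self) -> str:
        return "padding: 2px 6px; text-align: right;"

    def get_header_style(self) -> str:
        return "padding: 2px 6px; font-weight: bold;"


provider = CompactStyleProvider()
# runtime_checkable protocols only verify that the methods exist, not their signatures.
conforms = isinstance(provider, StyleProviderSketch)
header_cell = f"<th style='{provider.get_header_style()}'>price</th>"
```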
Custom Cell Formatters
You can register custom formatters for specific Python types. A cell formatter is any callable that takes a value and returns a string:
from datafusion.dataframe_formatter import get_formatter
formatter = get_formatter()
# Format floats to 2 decimal places
formatter.register_formatter(float, lambda v: f"{v:.2f}")
# Format dates in a custom way
from datetime import date
formatter.register_formatter(date, lambda v: v.strftime("%B %d, %Y"))
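To make the idea of per-type formatters concrete, here is a minimal, self-contained sketch of type-based dispatch. `TypeFormatterRegistry` and `format_value` are hypothetical names used only for illustration; the lookup order and fallback shown here are assumptions, not a description of how the DataFusion formatter resolves types.

```python
from datetime import date


class TypeFormatterRegistry:
    """Hypothetical registry; illustrates type-based dispatch only."""

    def __init__(self):
        self._formatters = {}

    def register_formatter(self, type_class, formatter):
        self._formatters[type_class] = formatter

    def format_value(self, value):
        # First registered type that matches wins; fall back to str().
        for type_class, formatter in self._formatters.items():
            if isinstance(value, type_class):
                return formatter(value)
        return str(value)


registry = TypeFormatterRegistry()
registry.register_formatter(float, lambda v: f"{v:.2f}")
registry.register_formatter(date, lambda v: v.strftime("%Y-%m-%d"))

# A float, a date, and an int with no registered formatter.
formatted = [registry.format_value(v) for v in (3.14159, date(2024, 1, 5), 7)]
```

Keeping each formatter a plain callable from value to string makes it easy to test in isolation before registering it.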
Custom Cell and Header Builders
For full control over the HTML of individual cells or headers, you can set custom builder functions:
from datafusion.dataframe_formatter import get_formatter
formatter = get_formatter()
# Custom cell builder receives (value, row, col, table_id) and returns HTML
def my_cell_builder(value, row, col, table_id):
    color = "red" if isinstance(value, (int, float)) and value < 0 else "black"
    return f"<td style='color: {color}; padding: 8px;'>{value}</td>"
formatter.set_custom_cell_builder(my_cell_builder)
# Custom header builder receives a schema field and returns HTML
def my_header_builder(field):
    return f"<th style='background: #333; color: white; padding: 8px;'>{field.name}</th>"
formatter.set_custom_header_builder(my_header_builder)
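Since a custom builder returns raw HTML, it can be worth escaping the value and calling the function directly to check its output before registering it. The variant below is a sketch with a hypothetical name, `safe_cell_builder`; the argument values passed for `row`, `col`, and `table_id` are assumptions chosen for the demonstration.

```python
from html import escape


def safe_cell_builder(value, row, col, table_id):
    # Same (value, row, col, table_id) signature as the builder above.
    is_negative = isinstance(value, (int, float)) and value < 0
    color = "red" if is_negative else "black"
    # Escape the text so values such as "<b>" are shown literally.
    return f"<td style='color: {color}; padding: 8px;'>{escape(str(value))}</td>"


# Call the builder by hand to inspect the HTML it produces.
negative_cell = safe_cell_builder(-5, row=0, col=0, table_id="t1")
markup_cell = safe_cell_builder("<b>x</b>", row=0, col=1, table_id="t1")
```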
Working with the Formatter Directly
You can use get_formatter() and set_formatter() for direct access to the global
formatter instance:
from datafusion.dataframe_formatter import (
    DataFrameHtmlFormatter,
    get_formatter,
    set_formatter,
)
# Get and inspect the current formatter
formatter = get_formatter()
print(formatter.max_rows)
print(formatter.max_cell_length)
# Create and set a fully custom formatter
custom_formatter = DataFrameHtmlFormatter(
    max_cell_length=50,
    max_rows=20,
    enable_cell_expansion=False,
)
set_formatter(custom_formatter)
Reset to default formatting:
from datafusion.dataframe_formatter import reset_formatter
# Reset to default settings
reset_formatter()
Memory and Display Controls
You can control how much data is displayed and how much memory is used for rendering:
from datafusion.dataframe_formatter import configure_formatter
configure_formatter(
    max_memory_bytes=4 * 1024 * 1024,  # 4MB maximum memory for display
    min_rows=20,                       # Always show at least 20 rows
    max_rows=50,                       # Show up to 50 rows in output
)
These parameters trade completeness of the displayed data against rendering time and memory use.
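The byte counts used in this guide are binary megabytes (multiples of 1024 * 1024). A small helper, hypothetical and not part of DataFusion, can make such limits easier to read:

```python
def mib(n):
    """Return n binary megabytes (MiB) expressed in bytes; hypothetical helper."""
    return n * 1024 * 1024


two_mib = mib(2)   # same value as the 2097152 used earlier
four_mib = mib(4)  # same value as 4 * 1024 * 1024
```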
Best Practices
Global Configuration: Use configure_formatter() at the beginning of your notebook to set up consistent formatting for all DataFrames.
Memory Management: Set appropriate max_memory_bytes limits to prevent performance issues with large datasets.
Shared Styles: Keep use_shared_styles=True (default) for better performance in notebooks with multiple DataFrames.
Reset When Needed: Call reset_formatter() when you want to start fresh with default settings.
Cell Expansion: Use enable_cell_expansion=True when cells might contain longer content that users may want to see in full.
Additional Resources
DataFrames - Complete guide to using DataFrames
IO - I/O Guide for reading data from various sources
Data Sources - Comprehensive data sources guide
CSV - CSV file reading
Parquet - Parquet file reading
JSON - JSON file reading
Avro - Avro file reading
Custom Table Provider - Custom table providers
API Reference - Full API reference