How to format columns in Python

An example of converting a Pandas dataframe to an Excel file with column formats using Pandas and XlsxWriter.

It isn’t possible to format any cells that already have a format such as the index or headers or any cells that contain dates or datetimes.

Note: This feature requires Pandas >= 0.16.

##############################################################################
#
# An example of converting a Pandas dataframe to an xlsx file
# with column formats using Pandas and XlsxWriter.
#
# SPDX-License-Identifier: BSD-2-Clause
# Copyright 2013-2022, John McNamara, 
#

import pandas as pd

# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Numbers':    [1010, 2020, 3030, 2020, 1515, 3030, 4545],
                   'Percentage': [.1,   .2,   .33,  .25,  .5,   .75,  .45 ],
})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter workbook and worksheet objects.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})

# Note: It isn't possible to format any cells that already have a format such
# as the index or headers or any cells that contain dates or datetimes.

# Set the column width and format.
worksheet.set_column(1, 1, 18, format1)

# Set the format but not the column width.
worksheet.set_column(2, 2, None, format2)

# Close the Pandas Excel writer and output the Excel file.
writer.close()
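One common workaround for the header and date limitation noted above is to switch off the default header, write the header cells yourself with your own format, and set the date formats on the writer. The following is a minimal sketch rather than the original author's code: the `Date` column, the header colours, and the in-memory `buffer` target are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Illustrative data (an assumption, not the article's dataframe).
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']),
    'Numbers': [1010, 2020, 3030],
})

buffer = io.BytesIO()  # in-memory target; a filename works the same way

# datetime_format / date_format set the default number format for date cells.
with pd.ExcelWriter(buffer, engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:
    # Skip the default header and leave row 0 free for a custom one.
    df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)

    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    header_format = workbook.add_format(
        {'bold': True, 'bg_color': '#D7E4BC', 'border': 1})

    # Column 0 holds the index, so the data headers start at column 1.
    for col_num, value in enumerate(df.columns):
        worksheet.write(0, col_num + 1, value, header_format)

xlsx_bytes = buffer.getvalue()
```

Leaving the `with` block closes the writer, so no explicit `close()` call is needed in this form.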

This section demonstrates visualization of tabular data using the Styler class. For information on visualization with charting, see Chart Visualization. This document is written as a Jupyter Notebook.

Styler Object and HTML

Styling should be performed after the data in a DataFrame has been processed. The Styler creates an HTML <table> and leverages the CSS styling language to manipulate many parameters, including colors, fonts, borders, and backgrounds. This allows a lot of flexibility out of the box, and even enables web developers to integrate DataFrames into their existing user interface designs.

The DataFrame.style attribute is a property that returns a Styler object. The Styler defines a _repr_html_ method, so it is rendered automatically in Jupyter Notebook.

import pandas as pd
import numpy as np
import matplotlib as mpl

df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]],
                  index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'),
                  columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))
df.style

Model:                 Decision Tree            Regression               Random
Predicted:             Tumour      Non-Tumour   Tumour      Non-Tumour   Tumour  Non-Tumour
Actual Label:
Tumour (Positive)      38.000000   2.000000     18.000000   22.000000    21      nan
Non-Tumour (Negative)  19.000000   439.000000   6.000000    452.000000   226     232.000000

The above output looks very similar to the standard DataFrame HTML representation, but the HTML here has already attached some CSS classes to each cell, even though we haven't yet created any styles. We can view these by calling the .to_html() method, which returns the raw HTML as a string; this is useful for further processing or for adding to a file (read on in More about CSS and HTML). Below we will show how we can use these classes to format the DataFrame to be more communicative. For example, here is how we can build the following styled table, s:

Confusion matrix for multiple cancer prediction models.

Model:                 Decision Tree       Regression
Predicted:             Tumour  Non-Tumour  Tumour  Non-Tumour
Actual Label:
Tumour (Positive)      38      2           18      22
Non-Tumour (Negative)  19      439         6       452

Formatting the Display

Formatting Values

Before adding styles it is useful to show that the Styler can distinguish the display value from the actual value, in both data values and index or column headers. The display value is the text printed in each cell as a string, and we can use the .format() and .format_index() methods to control it with a format spec string or a callable that takes a single value and returns a string. It is possible to define this for the whole table, for the index, for individual columns, or for MultiIndex levels.

Additionally, the format function has a precision argument to specifically help with formatting floats, decimal and thousands separators to support other locales, an na_rep argument to display missing data, and an escape argument to help display safe HTML or safe LaTeX. The default formatter is configured to adopt pandas' styler.format.precision option, controllable using with pd.option_context('styler.format.precision', 2):

df.style.format(precision=0, na_rep='MISSING', thousands=" ",
                formatter={('Decision Tree', 'Tumour'): "{:.2f}",
                           ('Regression', 'Non-Tumour'): lambda x: "$ {:,.1f}".format(x*-1e6)
                          })

Model:                 Decision Tree       Regression                Random
Predicted:             Tumour  Non-Tumour  Tumour  Non-Tumour        Tumour  Non-Tumour
Actual Label:
Tumour (Positive)      38.00   2           18      $ -22 000 000.0   21      MISSING
Non-Tumour (Negative)  19.00   439         6       $ -452 000 000.0  226     232

Using Styler to manipulate the display is a useful feature because keeping the index and data values intact for other purposes gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is an example of using the formatting functions whilst still relying on the underlying data for indexing and calculations.

weather_df = pd.DataFrame(np.random.rand(10,2)*5,
                          index=pd.date_range(start="2021-01-01", periods=10),
                          columns=["Tokyo", "Beijing"])

def rain_condition(v):
    if v 
