An example of converting a Pandas dataframe to an Excel file with column formats using Pandas and XlsxWriter.
It isn’t possible to format any cells that already have a format such as the index or headers or any cells that contain dates or datetimes.
Note: This feature requires Pandas >= 0.16.
##############################################################################
#
# An example of converting a Pandas dataframe to an xlsx file
# with column formats using Pandas and XlsxWriter.
#
# SPDX-License-Identifier: BSD-2-Clause
# Copyright 2013-2022, John McNamara
#

import pandas as pd

# Create a Pandas dataframe from some data.
df = pd.DataFrame({'Numbers': [1010, 2020, 3030, 2020, 1515, 3030, 4545],
                   'Percentage': [.1, .2, .33, .25, .5, .75, .45],
                   })

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})

# Note: It isn't possible to format any cells that already have a format such
# as the index or headers or any cells that contain dates or datetimes.

# Set the column width and format.
worksheet.set_column(1, 1, 18, format1)

# Set the format but not the column width.
worksheet.set_column(2, 2, None, format2)

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also works on older versions.)
writer.close()
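The restriction on date and datetime cells can be handled when the writer is created rather than with set_column(). The following is a minimal sketch, assuming XlsxWriter is installed: pd.ExcelWriter accepts datetime_format and date_format arguments that set the default number format for datetime and date cells. The sample data, the format strings, and the in-memory buffer are illustrative assumptions, not part of the original example.

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical sample data containing a datetime column.
df = pd.DataFrame({
    "When": [datetime(2022, 1, 1, 9, 30), datetime(2022, 1, 2, 14, 0)],
    "Numbers": [1010, 2020],
})

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.BytesIO()

# datetime_format / date_format set the default Excel number format that
# pandas applies to datetime and date cells when writing.
with pd.ExcelWriter(buffer, engine="xlsxwriter",
                    datetime_format="yyyy-mm-dd hh:mm",
                    date_format="yyyy-mm-dd") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    worksheet = writer.sheets["Sheet1"]
    # Widen the datetime column (column 0) so the formatted value is visible.
    worksheet.set_column(0, 0, 20)

# The buffer now holds the bytes of the finished xlsx file.
xlsx_bytes = buffer.getvalue()
```

Using the writer as a context manager closes it automatically, so no explicit close call is needed in this variant.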
This section demonstrates visualization of tabular data using the Styler class. For information on visualization with charting please see Chart Visualization. This document is written as a Jupyter Notebook, and can be viewed or downloaded here.
Styler Object and HTML
Styling should be performed after the data in a DataFrame has been processed. The Styler creates an HTML table and leverages CSS styling language to manipulate many parameters including colors, fonts, borders, background, etc. See here for more information on styling HTML tables. This allows a lot of flexibility out of the box, and even enables web developers to integrate DataFrames into their existing user interface designs.
The DataFrame.style attribute is a property that returns a Styler object. It has a _repr_html_ method defined on it, so it is rendered automatically in Jupyter Notebook.
import pandas as pd
import numpy as np
import matplotlib as mpl
df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]],
                  index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'),
                  columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))
df.style
Model:                 Decision Tree           Regression              Random
Predicted:             Tumour      Non-Tumour  Tumour      Non-Tumour  Tumour      Non-Tumour
Actual Label:
Tumour (Positive)      38.000000   2.000000    18.000000   22.000000   21          nan
Non-Tumour (Negative)  19.000000   439.000000  6.000000    452.000000  226         232.000000
The above output looks very similar to the standard DataFrame HTML representation. But the HTML here has already attached some CSS classes to each cell, even if we haven’t yet created any styles. We can view these by calling the .to_html() method, which returns the raw HTML as a string and is useful for further processing or adding to a file (read on in More about CSS and HTML). Below we will show how we can use these to format the DataFrame to be more communicative. For example, we can build s:
Confusion matrix for multiple cancer prediction models.
Model:                 Decision Tree           Regression
Predicted:             Tumour      Non-Tumour  Tumour      Non-Tumour
Actual Label:
Tumour (Positive)      38          2           18          22
Non-Tumour (Negative)  19          439         6           452
Formatting the Display
Formatting Values
Before adding styles it is useful to show that the Styler can distinguish the display value from the actual value, in both data values and index or column headers. To control the display value, the text is printed in each cell as a string, and we can use the .format() and .format_index() methods to manipulate this according to a format spec string or a callable that takes a single value and returns a string. It is possible to define this for the whole table, or index, or for individual columns, or MultiIndex levels. Additionally, the format function has a precision argument to specifically help with formatting floats, as well as decimal and thousands separators to support other locales, an na_rep argument to display missing data, and an escape argument to help with displaying safe HTML or safe LaTeX. The default formatter is configured to adopt pandas’ styler.format.precision option, controllable using with pd.option_context('format.precision', 2):
df.style.format(precision=0, na_rep='MISSING', thousands=" ",
                formatter={('Decision Tree', 'Tumour'): "{:.2f}",
                           ('Regression', 'Non-Tumour'): lambda x: "$ {:,.1f}".format(x*-1e6)
                          })
Model:                 Decision Tree           Regression                    Random
Predicted:             Tumour      Non-Tumour  Tumour      Non-Tumour        Tumour      Non-Tumour
Actual Label:
Tumour (Positive)      38.00       2           18          $ -22 000 000.0   21          MISSING
Non-Tumour (Negative)  19.00       439         6           $ -452 000 000.0  226         232
Using Styler to manipulate the display is a useful feature because maintaining the indexing and data values for other purposes gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is an example of using the formatting functions whilst still relying on the underlying data for indexing and calculations.
weather_df = pd.DataFrame(np.random.rand(10,2)*5,
                          index=pd.date_range(start="2021-01-01", periods=10),
                          columns=["Tokyo", "Beijing"])
def rain_condition(v):
    if v