How to Use OLAP with Python Pandas

Updated: 2022-07-18   |   Related: More > Python


Problem

If you want to perform additional operations on data stored in SQL Server, such as describing, analyzing, and visualizing it, you need an extra tool. If you have the flexibility, or a requirement, not to use Power BI, you can resort to scripting.

Solution

In this tutorial, we examine the scenario where you want to read SQL data, parse it directly into a dataframe and perform data analysis on it. When connecting to an analytical data store, this process will enable you to extract insights directly from your database, without having to export or sync the data to another system.

Getting Started

Please read my tip on How to Get Started Using Python Using Anaconda and VS Code, if you have not already. Then, open VS Code in your working directory and create a new file with the .ipynb extension.

Next, open the file by double-clicking on it and select a kernel.

You will get a list of all your conda environments and any default interpreters (if installed). You can pick an existing environment or create one beforehand from the conda interface or terminal. Assuming you do not have sqlalchemy installed, run pip install SQLAlchemy in the terminal of your target environment.

Repeat the same for the pandas package: pip install pandas.
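To verify that both packages are importable from the selected kernel, you can run a quick sanity check in a cell (the printed versions will depend on what pip installed):

import pandas as pd
import sqlalchemy

# print the installed versions to confirm the environment is set up
print(pd.__version__)
print(sqlalchemy.__version__)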

Establishing a connection

Having set up our development environment, we are ready to connect to our local SQL Server instance. First, import the required packages and run the cell:

import pandas as pd
from sqlalchemy import create_engine

Next, we must establish a connection to our server. This is what a connection string for the local database looks like with inferred credentials (or the trusted connection under pyodbc):

engine = create_engine(
    'mssql+pyodbc://'
    '@./AdventureWorks2019?' # username:pwd@server:port/database
    'driver=ODBC+Driver+17+for+SQL+Server'
    )

Let us break it down:

  • on line 2 the keywords are passed to the connection string
  • on line 3 you have the credentials, server and database in the format username:pwd@server:port/database. Here both username and password are omitted as we are connecting to the local server. The server itself is denoted by . (dot) for localhost. The port is the default. In case you must specify a port and you don't know it, check this helpful tip: Identify SQL Server TCP IP port being used. A hypothetical example with explicit credentials and port follows this list.
  • on line 4 we have the driver argument, which you may recognize from a previous tip on how to connect to SQL Server via the pyodbc module alone.
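For comparison, here is what the same pattern could look like for a remote server with SQL authentication and an explicit port. The user name, password, server name, and port below are placeholders for illustration, not values from this tip:

# hypothetical example: SQL authentication against a remote server
engine = create_engine(
    'mssql+pyodbc://'
    'my_user:my_password@my_server:1433/AdventureWorks2019?'
    'driver=ODBC+Driver+17+for+SQL+Server'
    )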

Reading data with the Pandas Library

The pandas read_sql method allows you to read data directly into a pandas dataframe. In fact, that is its biggest benefit compared to querying the data with pyodbc and converting the result set in an additional step.

Let us try out a simple query:

df = pd.read_sql(
      'SELECT [CustomerID]\
      ,[PersonID]\
      ,[StoreID]\
      ,[TerritoryID]\
      ,[AccountNumber]\
      ,[ModifiedDate]\
  FROM [Sales].[Customer]',
  engine,
  index_col='CustomerID')

The first argument (lines 2 – 8) is a string with the query we want executed. The second argument (line 9) is the engine object we previously built to connect to the server. Lastly (line 10), we have an argument for the index column. Here it is CustomerID, and it is not required. However, with a bigger dataset it can be very useful: imagine thousands of rows where each row has a timestamp column and a numerical value column. There, setting the index to the timestamp of each row at query run time saves you from doing it in post-processing later.
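Before going further, it is worth a quick look at what came back. These are standard pandas calls, shown here as a minimal sketch:

# dimensions of the result set: (row count, column count)
print(df.shape)

# first five rows, indexed by CustomerID
print(df.head())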

Explore the dataframe

Let us pause for a bit and focus on what a dataframe is and its benefits. The pandas dataframe is a tabular data structure, consisting of rows, columns, and data. It is like a two-dimensional array; however, the data it contains can also have one or multiple dimensions. Within the pandas module, the dataframe is a cornerstone object allowing quick (relatively, as there are technically quicker ways), straightforward and intuitive data selection, filtering, and ordering. Additionally, the dataframe can provide a good overview of an entire dataset by using additional pandas methods or additional modules to describe (profile) the dataset. Turning your SQL table into a pandas dataframe 'on the fly' enables you as the analyst to gain an overview of the data at hand. You can also process the data and prepare it for further analysis.
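As a short sketch of those operations, using the columns returned by the Sales.Customer query above:

# summary statistics for the numeric columns
print(df.describe())

# filtering: customers assigned to territory 1
territory_1 = df[df['TerritoryID'] == 1]

# ordering: most recently modified customers first
recent = df.sort_values('ModifiedDate', ascending=False)
print(territory_1.shape, recent.head())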

More complex example

Let us investigate defining a more complex query with a join and some parameters. Parametrizing your query can be a powerful approach if you want to use variables that exist elsewhere in your code. For example:

start_date = '2012-01-01'
end_date = '2012-12-31'
product_name = '%shorts%'
 
df2 = pd.read_sql('SELECT AVG(sod.OrderQty) [Avg Order Qty],\
                p.Name,\
                FORMAT(soh.OrderDate, \'yyyy-MM\') [Year-Month]\
        FROM Sales.SalesOrderHeader soh\
        JOIN Sales.SalesOrderDetail sod ON sod.SalesOrderID = soh.SalesOrderID\
        JOIN Production.Product p ON sod.ProductID = p.ProductID\
        WHERE soh.OrderDate >= ?\
          AND soh.OrderDate <= ?\
          AND p.Name LIKE ?\
        GROUP BY p.Name, FORMAT(soh.OrderDate, \'yyyy-MM\')',
        engine,
        params=(start_date, end_date, product_name))
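The params argument supplies a value for each ? placeholder in the order the placeholders appear in the query, so the two dates bound the OrderDate range and the LIKE pattern filters on the product name. As a hypothetical follow-up, you could summarize the monthly averages per product directly from the resulting dataframe:

# hypothetical follow-up: inspect the result and average across months per product
print(df2.head())
print(df2.groupby('Name')['Avg Order Qty'].mean())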
