Cara menggunakan python sqlite import csv

Seperti yang dapat kita lihat, data tersebut dipisahkan oleh koma, jadi ini termasuk jenis file CSV (Comma Separated Values) sehingga kita dapat menggunakan method read_csv() untuk import data.

Baca juga: Pengenalan Library Python Untuk Data Science

Di Python, untuk mengimport data ke jupyter notebook cukup mudah yaitu hanya dengan tiga langkah.

  1. Import pandas
  2. Definisikan variabel untuk menyimpan path url atau file dataset
  3. Gunakan method read_csv untuk mengimport data
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(url)

Jika menggunakan file dari komputer lokal, kita harus menentukan lokasi file tersebut tersimpan.

import pandas as pd

path = "dataset/imports-85.data"
df = pd.read_csv(path)

Kita bisa melihat dataset tersebut dalam bentuk dataframe dengan method head() untk melihat data teratas.

df.head()

Atau tail() untuk melihat data terbawah.

df.tail()

Default dari method head() dan tail() di atas adalah menampilkan 5 baris data teratas ataupun terbawah. Namun, jika kita ingin menentukan jumlah yang spesifik bisa dilakukan dengan memberikan sebuah angka di dalam tanda kurung.

Misalnya :

df.head(10)

atau

df.tail(20)

Baca juga: Cara Mengecek Tipe dan Distribusi Data di Pandas Python

Method read_csv() mengasumsikan setiap data yang dibaca telah mengandung header, sehingga jika ternyata dataset tersebut belum mengandung header seperti dataset automobiles yang kita gunakan, maka otomatis baris pertama dataset tersebut akan dijadikan header seperti terlihat di bawah ini.

Cara menggunakan python sqlite import csv

Python is a cool language, soon as you finish your first projects with it, you will be hocked.

It's a general purpose language and it's gaining popularity for the recent years among data scientists and data engineers.

Personally, I recommend Python to newbies especially teens, not just because it has a simple direct learning curve, but also because it has a large community as well as a rich ecosystem.

About this tutorial

I was helping a friend through how to solve problems by dissecting them into small tiny problems and solve them one-by-one, So here is the code assignment as the main problem we are trying to solve.

The Problem:
Use Python to upload and import CSV file into a pre-defined table in MySQL database.

Steps

  1. Create file upload form
  2. Upload the CSV using Flask
  3. Parse CSV file data
  4. Connect to the database
  5. Insert rows into a specific table in the database

Stack we will use:

  1. Flask;
    A Python Minimal (Micro) web frameworks.
  2. MySQL;
    If you use Windows you can use WAMP or XAMPP, if you are using Linux XAMPP is a good choice. If you are using macOS, XAMPP or MAMP are good solutions.
  3. Pandas;
    Pandas is an open-source python library for data analytics and manipulation. It's widely used among data scientists and data engineers.
  4. mysql.connector;
    The library that we will use to connect to the MySQL database and insert the data.

Requirements

  • Install Flask;
    Here I will install Python directly with pip, but I strongly recommend using python-virtualenv for best practice.
pip install Flask
  • Install XAMPP;
    The XAMPP package will include MySQL database and PHPMyAdmin to manage our MySQL databases.
  • Install the required libraries (Pandas, sql.connector)
pip install pandas
pip install sql.connector
  • A sample CSV file, I will be using addresses.csv from here.

0 Get Flask ready

Now let's test if Flask works by starting to create our main file main.py

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
     return "Flask CSV Filer Uploader and Parser"

if (__name__ == "__main__"):
     app.run(port = 5000)

We can run the file by $ python main.py then heading tohttp://127.0.0.1:5000/.

If everything goes as planned, you will get the rendered message.

Next step is to create the HTML page which will be responsible for uploading the CSV files.

1- Create file upload form

Let's create templates directory and our HTML upload form in it.



  
    FLASK CSV File Upload
  
  
    

Upload your CSV file

2- Upload the files to the server

Now we need to accept the incoming files from this form, let's add the reasonable function in main.py

from flask import Flask, render_template, request, redirect, url_for
import os
from os.path import join, dirname, realpath

app = Flask(__name__)

# enable debugging mode
app.config["DEBUG"] = True

# Upload folder
UPLOAD_FOLDER = 'static/files'
app.config['UPLOAD_FOLDER'] =  UPLOAD_FOLDER


# Root URL
@app.route('/')
def index():
     # Set The upload HTML template '\templates\index.html'
    return render_template('index.html')


# Get the uploaded files
@app.route("/", methods=['POST'])
def uploadFiles():
      # get the uploaded file
      uploaded_file = request.files['file']
      if uploaded_file.filename != '':
           file_path = os.path.join(app.config['UPLOAD_FOLDER'], uploaded_file.filename)
          # set the file path
           uploaded_file.save(file_path)
          # save the file
      return redirect(url_for('index'))

if (__name__ == "__main__"):
     app.run(port = 5000)

As everything is set, we can test the file upload now at http://127.0.0.1:5000/ If everything is ok, you can see the uploaded files at /static/files/

3- Parse CSV file

To parse the CSV file data into rows, we will use pandas the data analysis and manipulation library.

  • Import the library
import pandas as pd

Parse CSV function



def parseCSV(filePath):
      # CVS Column Names
      col_names = ['first_name','last_name','address', 'street', 'state' , 'zip']
      # Use Pandas to parse the CSV file
      csvData = pd.read_csv(filePath,names=col_names, header=None)
      # Loop through the Rows
      for i,row in csvData.iterrows():
             print(i,row['first_name'],row['last_name'],row['address'],row['street'],row['state'],row['zip'],)

Don't forget to add parseCSV(file_path) function soon after the file is saved in uploadFiles

# Output 
0 John Doe 120 jefferson st. Riverside  NJ 8075
1 Jack McGinnis 220 hobo Av. Phila  PA 9119
2 John "Da Man" Repici 120 Jefferson St. Riverside  NJ 8075
3 Stephen Tyler 7452 Terrace "At the Plaza" road SomeTown SD 91234
4 nan Blankman nan SomeTown  SD 298
5 Joan "the bone", Anne Jet 9th, at Terrace plc Desert City CO 123

4- Connect to the database (XAMPP/ MySQL)

  • In this step we will connect to the database and make sure it works.
  • Add import mysql.connector
  • Connect to the MySQL Server
import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  password="",
  database="databaseName"
)

Make sure to replace the conditionals right.

  • To test the Database Connection, let's list all available databases:
mycursor = mydb.cursor()

mycursor.execute("SHOW DATABASES")

# List All Databases
for x in mycursor:
  print(x)

Create the database and the table you want to insert the rows into it with the correspondent column names.

5- Insert rows into the database

With this step we will just update our loop to insert every row into the database

def parseCSV(filePath):
      # CVS Column Names
      col_names = ['first_name','last_name','address', 'street', 'state' , 'zip']
      # Use Pandas to parse the CSV file
      csvData = pd.read_csv(filePath,names=col_names, header=None)
      # Loop through the Rows
      for i,row in csvData.iterrows():
             sql = "INSERT INTO addresses (first_name, last_name, address, street, state, zip) VALUES (%s, %s, %s, %s, %s, %s)"
             value = (row['first_name'],row['last_name'],row['address'],row['street'],row['state'],str(row['zip']))
             mycursor.execute(sql, value, if_exists='append')
             mydb.commit()
             print(i,row['first_name'],row['last_name'],row['address'],row['street'],row['state'],row['zip'])

As everything goes as planned, check the new data entries at the database.

Cara menggunakan python sqlite import csv

Troubleshooting and practice

As we are concluded this tutorial, here are some Troubleshooting and ideas to carry on and practice

  • NaN; don't send NaN values to the database
  • Use Secure Form for best practice
  • Create and insert table/ columns based on CSV structure
  • Insert new values only
  • Set upload options: accept only CSV files, limit file size
  • Upload multiple files
  • Move finished files to done directly as soon as they are inserted
  • Better error management
  • List all done and uploaded files in index.html
  • View CSV table in index.html
  • Convert the csv into json

Full code

main.py

from flask import Flask, render_template, request, redirect, url_for
import os
from os.path import join, dirname, realpath

import pandas as pd
import mysql.connector

app = Flask(__name__)

# enable debugging mode
app.config["DEBUG"] = True

# Upload folder
UPLOAD_FOLDER = 'static/files'
app.config['UPLOAD_FOLDER'] =  UPLOAD_FOLDER


# Database
mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  password="",
  database="csvdata"
)

mycursor = mydb.cursor()

mycursor.execute("SHOW DATABASES")

# View All Database
for x in mycursor:
  print(x)





# Root URL
@app.route('/')
def index():
     # Set The upload HTML template '\templates\index.html'
    return render_template('index.html')


# Get the uploaded files
@app.route("/", methods=['POST'])
def uploadFiles():
      # get the uploaded file
      uploaded_file = request.files['file']
      if uploaded_file.filename != '':
           file_path = os.path.join(app.config['UPLOAD_FOLDER'], uploaded_file.filename)
          # set the file path
           uploaded_file.save(file_path)
           parseCSV(file_path)
          # save the file
      return redirect(url_for('index'))

def parseCSV(filePath):
      # CVS Column Names
      col_names = ['first_name','last_name','address', 'street', 'state' , 'zip']
      # Use Pandas to parse the CSV file
      csvData = pd.read_csv(filePath,names=col_names, header=None)
      # Loop through the Rows
      for i,row in csvData.iterrows():
             sql = "INSERT INTO addresses (first_name, last_name, address, street, state, zip) VALUES (%s, %s, %s, %s, %s, %s)"
             value = (row['first_name'],row['last_name'],row['address'],row['street'],row['state'],str(row['zip']))
             mycursor.execute(sql, value, if_exists='append')
             mydb.commit()
             print(i,row['first_name'],row['last_name'],row['address'],row['street'],row['state'],row['zip'])

if (__name__ == "__main__"):
     app.run(port = 5000)

If you like this quick tutorial, please share it and if you have any comment please let us know.


Resources

Photo by Hitesh Choudhary on Unsplash

Cara menggunakan python sqlite import csv