Things you might need before coding

Install python here

Since this solution is realised by using the programming language called python,  you should probably install it as instructed in the website linked above. Additionally, you may need to understand the programming principles independent of programming language and also learn the code syntax in python. An introductory course  can be found here .

Install MySQL connector via pip here

This allows you to connect to a MySQL database using python. Although there are other solutions, such as using a MySQL driver. I personally used the connector since it was simple to install and use immediately. Also, if you are planning on connecting to a MySQL database locally (e.g. localhost), make sure you already have MySQL installed.

Packages to install and why

CSV

Packed within this module methods that can be used to manipulate CSV ( Comma Separated Values) files. You can install this module in a similar fashion with the MySQL connector here . To use any module you install, simple import the module e.g. import csv, at the top of your python file.

OS

Another interesting module that allows you to play with operating system features. For this solution, I will be using the child module, os.path, mainly to validate and transform paths. As you did with the above module, install this module here and import os.path as well at the top of your python script.

SYS

In order to obtain a list of command line arguments, essential for this solution, I will be using the SYS module. More specifically, the command sys.argv. Similary, install module here and import into your file.

Overall logic or semi-pseudo-code

Prior to coding, it is essential to understand why you are about to type some code and how you plan on going about it. The task is simple:

Read a given database and transform its tables into single csv files

  • Obtain the database information from the user
  • Establish a connection to the database 
  • Access all the tables and their contents
  • Transform the tables into python dictionaries 
  • Write and save the csv files

This should give you an idea on how I will create a solution this the “problem”. So lets begin.

Code and its Explanation

Command Line Arguments

__name_ is a python variable that is defined by the python interpreter just before it executes the contents of the python script. If you ran your python script on the command line (ie python somescript.py), it sets the __name__ variable to a string value, “__main__”. Else if you imported the script, as you will with the CSV, SYS, and OS modules, the variable is set to the name of the script file without the “.py” (e.g. __name__ = “somescript”).  So the if statements allows me to define when the code should run automatically.

Using the sys.argv, the code extracts the command line arguments, assuming the script was ran from the command line and not imported as a module, and checks its length. Since the host, database name, username and password are all required for the database connection, the length of the arguments should be greater than 4. Why “greater than 4″ and not ” greater than 3″ ? well, the first element in the sys.argv list is always the name of the script, so name of script + 4 extra arguments = 5. Once the conditions are met, the dabatase connection is established via the function “connect_2_db”.

Connecting to the Database 

Simply out of “habit” and “paranoia”, I check if the values required for the database connections are available via the length. Once that is complete, the  connection is established within a “try, except, finally” scope. Just a fancier way ( compared to if and else statements in my weak opinion) of handling errors/exceptions. 

If the connection is established and confirmed using the is_connected method, the tables within the given database are transformed after calling the “tables_to_dictionary” function. Else if an exception is thrown, whatever lies within the except scope is executed. The finally statement allows you to trigger code regardless of how successful the try and except statement were.

Transforming MYSQL database to a python dictionary

This is where the fun starts and ends lol. It took me a while to figure this out. But I did it!  First the connection object is validated and the presence of the database name is confirmed before moving forward. An instance of the cursor is created which allows us to execute several SQL statements such SELECT, USE, DESCRIBE etc. 

Following the database selection using the USE statement, the names of the tables are extracted and inserted into a list called headers. Then we loop over the list, get all names of the columns within the current table and insert the table and its contents within a dictionary. Once all the tables have been analysed, the cursor and database connection are closed and the CSV files are written using the “write_data” function.

Writing dictionary into CSV files 

The following function is quite self-explanatory.  First it asks the user where the csv files are to be saved, then it loops through the table names within the dictionary given as an argument. Here comes the tricky part: since the columns within a table my vary in length, using hard-coded value for looping within the columns would cause an error. So I created another function called “get_longest_list” which loops through each of the columns of  a given table and determines the “longest”. The length of the longest column is used for the loop, so no hard-written value here, to make sure that all the rows of all columns in a table are not skimmed over and written to the CSV file.

Test Run

Local MySQL Database

I used the website to generate mock / placeholder SQL data which was imported into a database I created using the popular MySQL tool called MySQL workbench. As you can see below, the SQL data inserted a table named mock_data. The persons table was created by me for testing purposes. It is basically an empty table. The mock_data table contains 6 columns namely: ID, first_name, last_name, gender and ip_address. So if the script is “error free”, two CSV files called mock_data.csv and persons.csv should be created upon running the script in the command line.

 

 

 

 

 

 

 

The script ( which I named backuper.py: don’t ask why) and the arguments for database connection provided.

The result! two CSV files within the current directory one empty (persons.csv) and the other correctly filled. YEAAAAAAA!

Files here

Closing Word

 I hope this post helped you not only solve this problem but let you see the possibilities in using python to simplify repetitive tasks. See you in the next post.