Things you might need before coding
Install python here
Since this solution is written in python, you should install it as instructed on the website linked above. You may also want to understand general programming principles, independent of any language, and learn python's syntax. An introductory course can be found here.
Install MySQL connector via pip here
This allows you to connect to a MySQL database using python. Although there are other solutions, such as using a MySQL driver, I personally used the connector since it was simple to install and use immediately. Also, if you are planning on connecting to a MySQL database locally (e.g. localhost), make sure you already have MySQL installed.
Packages to install and why
CSV
Packed within this module are methods that can be used to manipulate CSV (Comma Separated Values) files. Unlike the MySQL connector, this module ships with python's standard library, so there is nothing to install. To use any module, simply import it, e.g. import csv, at the top of your python file.
OS
Another interesting module that allows you to play with operating system features. For this solution, I will be using the submodule os.path, mainly to validate and transform paths. Like csv, it is part of the standard library, so simply import os.path at the top of your python script.
SYS
In order to obtain the list of command line arguments, essential for this solution, I will be using the sys module, more specifically sys.argv. Similarly, it is built into python, so just import it into your file.
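To summarise, the top of the script should look roughly like this (a minimal sketch; I am assuming the connector is aliased as MYSQL, which is how it is referenced in the code further down):
import csv
import os.path
import sys
import mysql.connector as MYSQL  # assumed alias; installed via pip
from mysql.connector import Error  # used in the except blocks below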
Overall logic or semi-pseudo-code
Prior to coding, it is essential to understand why you are about to type some code and how you plan on going about it. The task is simple:
Read a given database and transform its tables into single csv files
- Obtain the database information from the user
- Establish a connection to the database
- Access all the tables and their contents
- Transform the tables into python dictionaries
- Write and save the csv files
This should give you an idea of how I will create a solution to this “problem”. So let's begin.
Code and its Explanation
Command Line Arguments
__name__ is a python variable that is defined by the python interpreter just before it executes the contents of a python script. If you ran your script from the command line (i.e. python somescript.py), the interpreter sets the __name__ variable to the string “__main__”. If you imported the script instead, as you will with the csv, sys, and os modules, the variable is set to the name of the script file without the “.py” (e.g. __name__ = “somescript”). So the if statement allows me to define when the code should run automatically.
if __name__ == "__main__":
    if len(sys.argv) <= 4:
        print("Please provide four values separated by a space: host database username password")
    else:
        # map each command line argument to the connection detail it represents
        active = {
            "host": sys.argv[1],
            "database": sys.argv[2],
            "user": sys.argv[3],
            "password": sys.argv[4]
        }
        connect_2_db(active)
else:
    # the script was imported as a module rather than run directly
    print("hello")
Using sys.argv, the code extracts the command line arguments, assuming the script was run from the command line and not imported as a module, and checks their length. Since the host, database name, username and password are all required for the database connection, the length of the arguments should be greater than 4. Why “greater than 4” and not “greater than 3”? Well, the first element in the sys.argv list is always the name of the script, so name of script + 4 extra arguments = 5. Once the conditions are met, the database connection is established via the function “connect_2_db”.
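For example, a run from the command line could look like this (the credentials below are placeholders, not real ones):
python backuper.py localhost mydatabase root mypassword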
Connecting to the Database
Simply out of “habit” and “paranoia”, I check via the length that the values required for the database connection are available. Once that is complete, the connection is established within a “try, except, finally” scope. Just a fancier way (compared to if and else statements, in my humble opinion) of handling errors/exceptions.
def connect_2_db(active):
    if len(active.keys()) < 4:
        raise Exception("database connection arguments are insufficient", active)
    try:
        connection = MYSQL.connect(host=active["host"], database=active["database"],
                                   user=active["user"], password=active["password"], charset='utf8')
        if connection.is_connected():
            print("connection established")
            return tables_to_dictionary(connection, active["database"])
        else:
            print("database connection was not established. Most likely the database does not exist")
    except Error as e:
        print("Error while connecting: ", e)
    finally:
        # cleanup can go here; note that a return inside finally would
        # override whatever the try block returned, so avoid returning here
        pass
    return False
If the connection is established and confirmed using the is_connected method, the tables within the given database are transformed by calling the “tables_to_dictionary” function. Else, if an exception is thrown, whatever lies within the except scope is executed. The finally statement allows you to trigger code regardless of how successful the try and except statements were.
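One gotcha worth knowing: a return statement inside a finally block silently overrides the value returned in the try block, which is why the function above returns False after the finally rather than inside it. A quick illustration:
def demo():
    try:
        return "from try"
    finally:
        # returning here instead would make demo() ignore the try's return value
        pass

print(demo())  # prints: from try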
Transforming a MySQL database into a python dictionary
This is where the fun starts and ends lol. It took me a while to figure this out. But I did it! First the connection object is validated and the presence of the database name is confirmed before moving forward. An instance of the cursor is created, which allows us to execute several SQL statements such as SELECT, USE, DESCRIBE etc.
def tables_to_dictionary(connection, db_name):
    if not connection or not db_name:
        return None
    cursor = connection.cursor()
    # select the database and list all of its tables
    cursor.execute("USE {}".format(db_name))
    cursor.execute("SHOW TABLES")
    tables = cursor.fetchall()
    all_tables = {}
    for table in tables:
        # each fetched row is a tuple whose first value is a table name
        for element in table:
            eachtable = {}
            try:
                # get the headers (column definitions) of the table
                cursor.execute("SHOW COLUMNS FROM {}".format(element))
                headers = cursor.fetchall()
                for header in headers:
                    # the first value is the header (column) name
                    header_name = header[0]
                    first_column_header = headers[0][0]
                    if not header_name in eachtable:
                        eachtable[header_name] = []
                    # select all rows from one column
                    cursor.execute("USE {}".format(db_name))
                    # sorted by the first column, which is most likely the ID
                    cursor.execute("SELECT {} from {} ORDER BY {}.{}".format(header_name, element, element, first_column_header))
                    rows = cursor.fetchall()
                    for row in rows:
                        # append each value to the column's list
                        eachtable[header_name].append(row[0])
                # update the dictionary holding all tables
                all_tables[element] = eachtable
            except Error as e:
                print("error while handling headers and table rows", e)
    cursor.close()
    connection.close()
    print("MySQL connection is closed")
    return write_data(all_tables)
Following the database selection using the USE statement, the names of the tables are extracted. Then, for each table, the column names are fetched into a list called headers. We loop over that list, collect all the rows of each column, and insert the table and its contents into a dictionary. Once all the tables have been analysed, the cursor and database connection are closed and the CSV files are written using the “write_data” function.
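To make the structure concrete, here is roughly what the all_tables dictionary would look like for the mock_data table used later in the test run (the values shown are made up):
all_tables = {
    "mock_data": {
        "ID": [1, 2, 3],
        "first_name": ["Jane", "John", "Joe"],
        "last_name": ["Doe", "Smith", "Bloggs"],
        "gender": ["Female", "Male", "Male"],
        "ip_address": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    }
}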
Writing dictionary into CSV files
The following function is quite self-explanatory. First it asks the user where the csv files are to be saved, then it loops through the table names within the dictionary given as an argument. Here comes the tricky part: since the columns within a table may vary in length, using a hard-coded value for looping through the columns would cause an error. So I created another function called “get_longest_list” which loops through each of the columns of a given table and determines the “longest”. The length of that longest column is used for the loop, so no hard-coded value here, to make sure that no rows of any column in a table are skipped when written to the CSV file.
def write_data(tables):
    if not tables:
        return
    files_save_dir = input("Where do you want your csv files saved ? = ")
    # fall back to the current directory on empty or invalid input
    if len(files_save_dir) <= 0 or not os.path.exists(files_save_dir):
        files_save_dir = "."
    directory_path = os.path.abspath(files_save_dir)
    print("Files will be saved at {}".format(directory_path))
    # loop through the keys to get table names; create a new file per table
    for table_name in tables.keys():
        # newline='' prevents the csv module from writing blank lines on Windows
        with open(os.path.join(directory_path, '{}.csv'.format(table_name)), mode='w', encoding='utf-8', newline='') as table_file:
            table_writer = csv.writer(table_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            table_columns = tables[table_name].keys()
            table_writer.writerow(table_columns)
            longest_column = get_longest_list(tables[table_name])
            cur_table = tables[table_name]
            for index in range(longest_column[0]):
                # loop through each column; if a column is shorter than the
                # current row index, pad the cell with a "-" instead
                horizontal = []
                for column_name in table_columns:
                    if len(cur_table[column_name]) <= index:
                        horizontal.append("-")
                    else:
                        horizontal.append(cur_table[column_name][index])
                # write the row
                table_writer.writerow(horizontal)
        # the with statement closes the file automatically
    return

def get_longest_list(dictionary):
    # returns [length, key] of the longest column list in the dictionary
    longest = [0, ""]
    for key in dictionary.keys():
        length = len(dictionary[key])
        if length > longest[0]:
            longest[0] = length
            longest[1] = key
    return longest
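With the hypothetical mock_data dictionary shown earlier, the generated mock_data.csv would look roughly like this:
ID,first_name,last_name,gender,ip_address
1,Jane,Doe,Female,10.0.0.1
2,John,Smith,Male,10.0.0.2
3,Joe,Bloggs,Male,10.0.0.3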
Test Run
Local MySQL Database
I used the website to generate mock/placeholder SQL data, which I imported into a database created with the popular MySQL tool called MySQL Workbench. As you can see below, the SQL data inserted a table named mock_data. The persons table was created by me for testing purposes; it is basically an empty table. The mock_data table contains the columns ID, first_name, last_name, gender and ip_address. So if the script is “error free”, two CSV files called mock_data.csv and persons.csv should be created upon running the script in the command line.
The script (which I named backuper.py: don't ask why) and the arguments for the database connection provided.
The result! Two CSV files within the current directory, one empty (persons.csv) and the other correctly filled. YEAAAAAAA!
Files here
Closing Word
I hope this post not only helped you solve this problem but also let you see the possibilities of using python to simplify repetitive tasks. See you in the next post.