This is so simple, it needs no heading.

08/19/2018 10:10:49 PM Drinks coffee as usual…


Straight to the point:

So this is the CSV data I will be working with (generated using the website https://mockaroo.com):

MOCK_DATA

Goal

Create a bar plot showing the number of emails grouped by the letter they begin with. For instance, how many emails begin with “a”, and so on. The same counting approach can be useful for visualizing emails from various email providers (e.g. yahoo, gmail, etc.).
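Since the goal mentions email providers, here is a minimal sketch of how the same counting idea could be keyed on the provider instead of the first letter. The function name `count_providers` and the sample addresses are made up for illustration, and taking the first label of the domain as "the provider" is an assumption that will not hold for every address (e.g. subdomains).

```python
from collections import Counter


def count_providers(emails):
    """Count emails by provider, taken here as the first label of the domain."""
    counts = Counter()
    for email in emails:
        # Skip values that do not look like an email address.
        if "@" not in email:
            continue
        domain = email.rsplit("@", 1)[1].lower()
        provider = domain.split(".")[0]
        if provider:
            counts[provider] += 1
    return counts


# Hypothetical sample data, not taken from MOCK_DATA.csv.
sample = ["ann@gmail.com", "bob@yahoo.com", "cat@GMAIL.com", "not-an-email"]
print(count_providers(sample))
```

The resulting counts can be plotted as bars in the same way as the first-letter counts.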

The code…

Code

import csv
import matplotlib.pyplot as plt
import numpy as np
import sys
filename = "MOCK_DATA.csv"


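def count_email(rows):
    # Map each first character to the number of emails starting with it.
    dictionary = {}
    for row in rows:
        # Skip rows that are too short or have an empty email column,
        # otherwise row[3] or email[0] would raise an IndexError.
        if len(row) < 4 or not row[3]:
            continue
        email = row[3]
        if email[0] not in dictionary:
            dictionary[email[0]] = 0
        dictionary[email[0]] += 1
    return dictionary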


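def plot_hist(dictionary):
    # Sort the keys so the bars appear in alphabetical order.
    keys = sorted(dictionary)
    newdata = [dictionary[key] for key in keys]
    # One bar position per key (computed once, outside any loop).
    x = np.arange(len(keys))
    plt.bar(x, height=newdata)
    # Bars are centered on x by default, so the ticks need no offset.
    plt.xticks(x, keys)
    plt.xlabel("Emails start with...")
    plt.ylabel("Frequency")
    plt.show()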


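data = []

try:
    # newline="" is what the csv module recommends when opening files.
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        # Note: a header row, if present, is counted like any other row.
        for row in reader:
            data.append(row)
except (OSError, csv.Error) as e:
    # OSError also covers a missing or unreadable file.
    print("Error while reading csv file:", e)
    sys.exit(-1)
if data:
    soln = count_email(data)
    print(soln)
    if soln:
        plot_hist(soln)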
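
As a possible variation on the counting step, here is a more compact sketch using `collections.Counter` and `csv.DictReader`. It assumes the CSV has a header row with a column named `email` (the column name and the sample rows below are assumptions, not taken from the actual file), and unlike the code above it lowercases the first character so that "A" and "a" land in the same bar.

```python
import csv
import io
from collections import Counter


def count_first_letters(csv_file, column="email"):
    """Count emails by first character, reading the column by header name."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Missing or empty values are skipped instead of raising an error.
        email = (row.get(column) or "").strip()
        if email:
            counts[email[0].lower()] += 1
    return counts


# Hypothetical in-memory sample standing in for MOCK_DATA.csv.
sample = io.StringIO(
    "id,first_name,last_name,email\n"
    "1,Ann,Lee,alee@example.com\n"
    "2,Bob,Ray,bray@example.org\n"
    "3,Amy,Fox,Afox@example.net\n"
)
print(count_first_letters(sample))
```

Reading by column name means the header row is never counted as data, and the code keeps working if the column order changes.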

Notes

This assumes the script and the CSV file are in the same directory (same level).
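
If you would rather not depend on where the script is launched from, one option is to build the CSV path relative to the script's own location. This is only a sketch: `data_path` is a hypothetical helper, not part of the code above.

```python
from pathlib import Path


def data_path(filename, script_file):
    """Return the path of `filename` located next to `script_file`."""
    return Path(script_file).resolve().parent / filename


# Typical use inside the script itself:
#     filename = data_path("MOCK_DATA.csv", __file__)
```

`open()` accepts the returned `Path` object directly, so the rest of the code would not need to change.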

Running the code like this:

[Image: the command used to run the script]

Produces:

[Image: the resulting bar plot]

Well, that was it… If you have suggestions, improvements, or questions, do not hesitate to comment down below.

See ya.
