Overview reference

Other Implementations

Java

Problem

Goal

Obtain a url from the command line and determine whether the given url has an rss feed (<url>/feed). If it does, display a list of all the items and allow the user to open a post by clicking on an item.

Pseudo-code

  1. Get url from user
  2. Make request to url
  3. Find rss feed link
    1. Check whether an rss feed exists, i.e. whether url/feed returns content
    2. Else return not found
  4. Make request to feed link
  5. Create UI to display news feed
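
Step 3.1 is not spelled out in the solution below, which treats any content returned from url/feed as a feed. Here is a minimal sketch of one possible check, assuming an RSS 2.0 document whose root element is rss. The helper name looks_like_rss is hypothetical and not part of the original script.

```python
import xml.etree.ElementTree as ET


def looks_like_rss(content: str) -> bool:
    """Return True if content parses as XML and its root element is <rss>."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    # Assumption: RSS 2.0, where the root tag is "rss". Atom feeds use a
    # namespaced <feed> root and would need a separate check.
    return root.tag == "rss"


feed = "<rss version='2.0'><channel><title>Blog</title></channel></rss>"
print(looks_like_rss(feed))
print(looks_like_rss("<html><body>Not found</body></html>"))
print(looks_like_rss("plain text, not xml"))
```

Only the first call should report a feed: the second input is well-formed XML with a different root, and the third does not parse.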

Solution

Get url from user

def geturlfromuser() -> str:
    userinput: str = input("Type in a valid url ")
    return userinput

This function simply reads the url typed in by the user.

Validate Url

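Before making any requests, we should check that the input at least looks like a web address. With the urlparse function from the urllib.parse module, I can extract the protocol (scheme) from the user’s input and check that it is either http or https. Note that this validates only the scheme; it does not tell us whether the url actually points to something.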

from urllib.parse import urlparse

def validateurl(url: str) -> bool:
    # Accept only web addresses: the scheme must be http or https.
    parsed = urlparse(url)
    allowedscheme = ["http", "https"]
    scheme: str = parsed.scheme
    return scheme in allowedscheme
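
To see what a scheme-only check lets through, here is a small sketch that prints the scheme and host that urlparse extracts from a few sample inputs. The stricter helper has_scheme_and_host is hypothetical and not part of the original script; it additionally requires a non-empty host.

```python
from urllib.parse import urlparse


def has_scheme_and_host(url: str) -> bool:
    """Hypothetical stricter check: http(s) scheme plus a non-empty host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.netloc != ""


samples = ["https://example.com/blog/", "https://", "example.com",
           "ftp://example.com/file"]
for sample in samples:
    parsed = urlparse(sample)
    # Show the pieces the check relies on, then the verdict.
    print(sample, "->", repr(parsed.scheme), repr(parsed.netloc),
          has_scheme_and_host(sample))
```

The input "https://" has a valid scheme but no host, so a scheme-only check accepts it while the stricter one does not. An input without any scheme, such as "example.com", is rejected by both.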

Remove trailing slashes

This function ensures there are no trailing / characters at the end of the user’s url.

def cleanurl(url: str) -> str:
    # rstrip removes every trailing "/" and is safe on an empty string,
    # whereas indexing url[len(url) - 1] raises IndexError for "".
    return url.rstrip("/")
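
The reason for the cleanup becomes visible once /feed is appended to the url. A minimal sketch, where build_feed_url is a hypothetical helper that combines both steps:

```python
def build_feed_url(url: str) -> str:
    """Hypothetical helper: strip trailing slashes, then append /feed."""
    return "{}/feed".format(url.rstrip("/"))


print(build_feed_url("https://example.com"))
print(build_feed_url("https://example.com/"))
print(build_feed_url("https://example.com//"))

# Without the cleanup, the path ends up with a double slash.
print("{}/feed".format("https://example.com/"))
```

The first three calls all produce https://example.com/feed, while the last line prints https://example.com//feed.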

Fetch the xml file at the user’s url

Once the url has been validated and cleaned up, we can make requests. Since we are simply retrieving data, the HTTP verb GET suffices. The requests module does not raise an exception for a bad response (4xx or 5xx status code) on its own. If you are used to a JS client that rejects on such responses (axios does so by default), you have to call response.raise_for_status() to get a similar experience. When the status code is 2xx, that call raises nothing.

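from typing import Optional

import requests
from requests.exceptions import HTTPError

def fetchxmlasstring(url: str) -> Optional[str]:
    # Returns the response body, or None if the request failed.
    try:
        response = requests.get(url)
        # Raise HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
        return response.text
    except HTTPError as httperror:
        print("Error:fetchxmlasstring {}".format(httperror))
    except Exception as e:
        print("General exception: {}".format(e))
    return None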

The response content can be read from the text property of the response object. As seen above, there are two except blocks with different exception types. This lets the program react differently depending on the type of exception thrown: HTTPError covers bad status codes, while the general Exception block catches everything else, such as connection failures.
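
The order of the except blocks matters: Python runs the first block whose type matches, so the most specific type has to come first. The sketch below illustrates this with the standard library's urllib.error classes as stand-ins, so that it runs without a network connection or the requests package. There, HTTPError derives from URLError, much as the HTTPError in requests derives from RequestException. The describe function is hypothetical.

```python
from urllib.error import HTTPError, URLError


def describe(error: Exception) -> str:
    """Re-raise error and report which except block handled it."""
    try:
        raise error
    except HTTPError as e:
        # Most specific first: the server answered with a bad status code.
        return "bad status {}".format(e.code)
    except URLError as e:
        # Parent class of HTTPError: the request failed for another reason.
        return "could not connect: {}".format(e.reason)
    except Exception as e:
        # Anything that is not a URL-related error.
        return "unexpected: {}".format(e)


not_found = HTTPError("https://example.com/feed", 404, "Not Found", None, None)
print(describe(not_found))
print(describe(URLError("name resolution failed")))
print(describe(ValueError("bad input")))
```

If the URLError block came first, it would also swallow the HTTPError, and the status code branch would never run.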

Get Item elements from XML

After that, it is time to extract the needed elements of the xml tree. According to this article, the most recent blog posts or articles are represented as item elements in the tree. The child nodes of each item element provide basic information about the specific post, for instance the article title, description and link. For this exercise, only these three elements are needed. Using the xml.etree.ElementTree module (imported as ET), the xml string can be parsed into an element object whose methods allow specific items to be selected.

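import xml.etree.ElementTree as ET
from typing import List

def getitems(content: str) -> List[Item]:
    root = ET.fromstring(content)
    items: List[Item] = list()
    print("finding items")
    # ".//item" matches item elements at any depth below the root.
    for element in root.findall(".//item"):
        # findtext returns the default instead of failing when a child is missing.
        title = element.findtext("title", default="")
        description = element.findtext("description", default="")
        link = element.findtext("link", default="")
        items.append(Item(title, description, link))
    return items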

You may wonder why I used “.//item” and not “item”. I did use “item” at first, until I read the documentation in more detail. Based on the documented syntax, selecting by the tag alone matches only DIRECT child nodes with the given tag, e.g. item. If you open an RSS XML file, you’ll notice that the item elements are not direct children of the root, so root.findall(“item”) returns an empty list. In the path syntax, “.” selects the current node and “//” selects all subelements on all levels beneath it, so “.//item” finds every item element in the tree regardless of depth. Try viewing the tree with an online xml to json tree tool; you should see that the item elements fall under the channel node, which is a direct child of the root node.

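Note: .// searches all levels below the current node, not only two levels down. To match exactly the items that sit directly under channel, the path “channel/item” can be used instead.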
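
The difference between the path expressions can be checked on a small hand-written feed. This is a sketch with a contrived document: the archive element is made up purely to place one item at a deeper level and is not something a real feed would contain.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
    <archive>
      <item><title>Old post</title></item>
    </archive>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# Tag alone: direct children of <rss> only, and <rss> has just <channel>.
direct = root.findall("item")
# Explicit path: items that are direct children of <channel>.
under_channel = root.findall("channel/item")
# ".//": items at any depth below the root.
anywhere = root.findall(".//item")

print(len(direct), len(under_channel), len(anywhere))
print([item.findtext("title") for item in anywhere])
```

Here the tag alone finds no items, “channel/item” finds the two items directly under channel, and “.//item” also picks up the nested one.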

Setup the Graphical User Interface

With all the effort we’ve put in, we have to see something, right? Tkinter is a well known module for quickly drawing up an interface in Python, and it ships with the standard library. The expected GUI for this exercise is a list of items, each showing two non-editable text fields (title and description) and a button (for opening the post in a browser).

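import tkinter

def setupui(items: List[Item]) -> None:
    top = tkinter.Tk()
    for item in items:
        # One title label, one description and one button per feed item.
        label = tkinter.Label(top, text=item.title)
        description = tkinter.Message(top, text=item.description)
        # The bound method item.openbrowser already knows which link to open.
        button = tkinter.Button(top, text="Open Post",
                                command=item.openbrowser)
        label.pack()
        description.pack()
        button.pack()
    # Start the event loop; this call blocks until the window is closed.
    top.mainloop()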

Since each button has to open a browser at a different url, I decided to create an Item class with title, description and link instance variables, and to create one instance per feed item. The link value is used by the instance method “openbrowser”, which is bound to the corresponding button. Once the elements are created, the gui is displayed.

import webbrowser

class Item:
    title: str
    description: str
    link: str

    def __init__(self, title: str, description: str, link: str):
        self.title = title
        self.description = description
        self.link = link

    def openbrowser(self) -> None:
        # Open the post in a new tab of the default browser.
        webbrowser.open_new_tab(self.link)
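
Passing a bound method as the button command works because each bound method carries its own instance. The sketch below shows this without opening a browser or a window: RecordingItem is a hypothetical stand-in that records the link instead of opening it. It also shows the late-binding pitfall you would hit with a plain lambda in a loop, and the default-argument fix.

```python
from typing import Callable, List

opened: List[str] = []


class RecordingItem:
    """Hypothetical stand-in for Item that records links instead of opening them."""

    def __init__(self, link: str):
        self.link = link

    def openbrowser(self) -> None:
        opened.append(self.link)


items = [RecordingItem("https://example.com/a"),
         RecordingItem("https://example.com/b")]

# Bound methods: each callback remembers the instance it came from.
callbacks: List[Callable[[], None]] = [item.openbrowser for item in items]
callbacks[0]()
callbacks[1]()
print(opened)

# Plain lambdas look up `item` when called, so all of them see the last item.
late = [lambda: item.link for item in items]
print([f() for f in late])

# A default argument captures the current item at definition time.
fixed = [lambda item=item: item.link for item in items]
print([f() for f in fixed])
```

The bound methods and the default-argument lambdas each report their own link, while the plain lambdas both report the link of the last item.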

Start the Method

def start():
    url: str = geturlfromuser()
    if validateurl(url):
        cleanedurl: str = cleanurl(url)
        # fetchxmlasstring returns nothing when the request fails.
        content = fetchxmlasstring("{}/feed".format(cleanedurl))
        if content:
            items: List[Item] = getitems(content)
            setupui(items)
        else:
            print("feed not found at {}/feed".format(cleanedurl))
    else:
        print("invalid url {}".format(url))

Run Script

Running python script for RSS Feed GUI

Full Code

here

Author Notes

This is most likely not the cleanest implementation, just a heads up.

Links

  1. Importing modules in python
  2. Typing in Python
  3. Parsing url
  4. Finding value in list 
  5. String manipulation
  6. Making Requests
  7. Python class
  8. Python class (1)
  9. Finding elements that are not direct children
  10. GUI
  11. Tkinter
  12. Tkinter events
  13. Binding arguments to functions before call
  14. Webbrowser