
Python Learning: Automate the Boring Stuff with Python | Chapter 11: My Solution to Link Verification (with a twist)

Instead of downloading the linked pages, let’s just write the links out to text files

4:40:41 PM


In the previous project, we downloaded images directly into a folder in the current working directory. Since I am already comfortable downloading files with requests, I decided not to save every linked page this time. Instead, the script requests each link only to check its status code and writes the link into one of two text files: one for good links and one for bad links.
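Stripped of the networking, the twist is just sorting each link by its status code into one of two text files. A minimal sketch of that step, assuming the status codes have already been collected into a dict; write_link_report is a hypothetical helper name and the example.com URLs are placeholders:

```python
import tempfile
from pathlib import Path


def write_link_report(statuses, good_path="good_links.txt", bad_path="bad_links.txt"):
    """Write links whose status is 200 to good_path and every other link to bad_path.

    statuses maps each link (str) to the HTTP status code (int) it returned.
    Returns a (good_count, bad_count) tuple.
    """
    good = [link for link, code in statuses.items() if code == 200]
    bad = [link for link, code in statuses.items() if code != 200]
    # write_text opens in "w" mode, so each run creates or overwrites the files
    Path(good_path).write_text("".join(link + "\n" for link in good))
    Path(bad_path).write_text("".join(link + "\n" for link in bad))
    return len(good), len(bad)


if __name__ == "__main__":
    # hypothetical links and status codes, for illustration only
    example = {
        "https://example.com/": 200,
        "https://example.com/missing-page": 404,
    }
    # write into a throwaway directory so the demo doesn't touch real files
    with tempfile.TemporaryDirectory() as tmp:
        counts = write_link_report(
            example, Path(tmp) / "good_links.txt", Path(tmp) / "bad_links.txt"
        )
        print(counts)  # (1, 1)
```

Anything other than 200 counts as bad in this sketch, so a 500 or 403 lands in the bad file alongside 404s.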

Here is the full script:


[sourcecode language="python"]

# Link Verification -> write every link on a given page to good_links.txt
# (status 200) or bad_links.txt (any other status, e.g. 404)
# USAGE: python linkver.py <url>

import logging
import sys
from urllib.parse import urljoin

import bs4
import requests

logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

if len(sys.argv) == 2:
    # get the url from the command line
    url = sys.argv[1]
    try:
        # get the page (timeout so a dead server can't hang the script)
        res = requests.get(url, timeout=10)
        res.raise_for_status()
        # get all <a> tags on the page
        page_soup = bs4.BeautifulSoup(res.text, "html.parser")
        page_links = page_soup.select("a")
        # "w" creates (or empties) both files; "with" closes them even if an error occurs
        with open("good_links.txt", "w") as good_file, open("bad_links.txt", "w") as bad_file:
            # make a request to every link
            for anchor in page_links:
                link = anchor.get("href")
                if not link:
                    # <a> tags without an href have nothing to check
                    continue
                # resolve relative links such as "/about" against the page url
                link = urljoin(url, link)
                if not link.startswith("http"):
                    # skip mailto:, javascript:, tel: and similar links
                    continue
                try:
                    link_res = requests.get(link, timeout=10)
                    # status_code is already an int
                    status = link_res.status_code
                    # sort the link into the good or bad file
                    if status == 200:
                        good_file.write(link + "\n")
                        logging.info("Good link written: %s", link)
                    else:
                        bad_file.write(link + "\n")
                        logging.info("Bad link written: %s", link)
                except Exception as err:
                    logging.error("Page Link Error: %s", err)
    except Exception as err:
        logging.error("Url Error: %s", err)
else:
    logging.error("USAGE python linkver.py <url>")

[/sourcecode]
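Relative hrefs have to be turned into absolute URLs before they can be requested. Simply gluing the page URL and the href together breaks for root-relative links: on a page at https://example.com/blog/post/, the href "/about" would become https://example.com/blog/post//about. The standard library's urllib.parse.urljoin resolves these the way a browser does. A minimal sketch of that step; resolve_link is a hypothetical helper name and the URLs are placeholders:

```python
from urllib.parse import urljoin, urlparse


def resolve_link(page_url, href):
    """Return an absolute http(s) URL for href, or None if it can't be checked."""
    if not href:
        return None  # <a> tag with a missing or empty href
    # urljoin handles "/about", "next/", "../x" and already-absolute URLs
    absolute = urljoin(page_url, href)
    if urlparse(absolute).scheme not in ("http", "https"):
        return None  # mailto:, javascript:, tel:, ...
    return absolute


if __name__ == "__main__":
    page = "https://example.com/blog/post/"  # hypothetical page URL
    for href in ["/about", "next/", "https://other.org/x", "mailto:me@example.com", None]:
        print(repr(href), "->", resolve_link(page, href))
```

Returning None for links that can't be checked lets the calling loop simply skip them instead of crashing on a missing href.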

When I ran “python linkver.py https://ajalacomfort.com”, I got the following links:

Screenshot: contents of good_links.txt

The bad links file (bad_links.txt) is empty 😀
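Because the bad links file stayed empty, this run never exercised the 404 branch. One way to see both outcomes without depending on a live site is a throwaway local server. This is a minimal sketch, assuming Python 3.7+ and using the standard library's urllib.request in place of requests so it needs no third-party packages; status_of, check_local_links, exists.html and missing.html are made-up names for illustration:

```python
import functools
import tempfile
import threading
import urllib.error
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the demo output free of per-request log lines


def status_of(url):
    """Return the HTTP status code for url (urllib raises HTTPError on 4xx/5xx)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def check_local_links():
    """Serve a temp folder holding one page, then check an existing and a missing link."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "exists.html").write_text("<p>hello</p>")
        handler = functools.partial(QuietHandler, directory=root)
        # port 0 asks the OS for any free port
        server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            base = f"http://127.0.0.1:{server.server_address[1]}/"
            return {
                "exists.html": status_of(base + "exists.html"),
                "missing.html": status_of(base + "missing.html"),
            }
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(check_local_links())  # {'exists.html': 200, 'missing.html': 404}
```

Pointing the link checker at a page on such a server that links to a missing file should put that link in bad_links.txt.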

See you soon!
