
Python Learning: Automate the Boring Stuff with Python | Chapter 11: My Solution to Link Verification (with a twist)

Instead of downloading the pages, let’s just write the links out

4:40:41 PM


In the last task we downloaded images straight into a folder in the current working directory. Since I am already comfortable downloading files with requests, this time I decided to just write the links out to text files instead of downloading every linked page.

The code is below:

 


# Link Verification -> write every link on a given page into good_links.txt (status 200) or bad_links.txt (404 and other statuses)
# USAGE: python linkver.py <url>


import logging
import sys
from urllib.parse import urljoin

import bs4
import requests


logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")


if len(sys.argv) == 2:
    # Get the url from the command line
    url = sys.argv[1]
    try:
        # Get the page
        res = requests.get(url)
        res.raise_for_status()
        # Get all links
        page_soup = bs4.BeautifulSoup(res.text, "html.parser")
        page_links = page_soup.select("a")
        # Create both text files; "w" empties anything left over from an earlier run,
        # and the with block closes the files even if something goes wrong
        with open("good_links.txt", "w") as good_file, open("bad_links.txt", "w") as bad_file:
            # Make a request to every link
            for link in page_links:
                link = link.get("href")
                # Skip <a> tags that have no href
                if not link:
                    continue
                # Resolve relative links against the page url
                link = urljoin(url, link)
                try:
                    link_res = requests.get(link)
                    # Get the status code
                    status = link_res.status_code
                    # Update the good and bad link files
                    if status == 200:
                        good_file.write(link + "\n")
                        logging.info("Good link written")
                    else:
                        bad_file.write(link + "\n")
                        logging.info("Bad link written")
                except Exception as err:
                    logging.error("Page Link Error: " + str(err))
    except Exception as err:
        logging.error("Url Error: " + str(err))
else:
    logging.error("USAGE: python linkver.py <url>")
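
The part of this task that is easiest to get wrong is turning a relative href (like /about) into a full address before requesting it. Here is a small sketch of just that step using urljoin from the standard library. The helper name resolve_links and the example.com addresses are my own assumptions for illustration, and dropping mailto: style links is a choice I made for the sketch, not something the exercise asks for.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(base_url, hrefs):
    """Turn raw href values into absolute http(s) URLs (hypothetical helper)."""
    resolved = []
    for href in hrefs:
        # <a> tags without an href give None; skip those and empty strings
        if not href:
            continue
        absolute = urljoin(base_url, href)
        # Keep only links that requests can fetch (drops mailto:, tel:, javascript:)
        if urlparse(absolute).scheme in ("http", "https"):
            resolved.append(absolute)
    return resolved


if __name__ == "__main__":
    # Made-up href values, roughly what link.get("href") could return
    sample = ["/about", "contact.html", "https://example.org/x", None, "mailto:me@example.com"]
    for link in resolve_links("https://example.com/blog/", sample):
        print(link)
```

urljoin leaves links that are already absolute untouched, so the same call can be used on every href without checking what it starts with.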

When I ran python linkver.py "https://ajalacomfort.com", I got the following links:

good_links

The bad links text file is empty 😀
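
One thing to keep in mind: an empty bad links file only means that no link answered with a status other than 200. Links whose request raised an exception are only logged, so they end up in neither file. Below is a rough sketch of the sorting step as a function that can be tried out without touching a real site. The name sort_links, the get_status parameter, and the made-up status codes are all assumptions of mine, and unlike the script, this version counts failed requests as bad links.

```python
def sort_links(links, get_status):
    """Split links into (good, bad) lists using a status lookup (hypothetical helper).

    get_status is any callable that takes a URL and returns an HTTP status code,
    or raises an exception when the request fails.
    """
    good, bad = [], []
    for link in links:
        try:
            status = get_status(link)
        except Exception:
            # In this sketch a failed request (timeout, DNS error, ...) counts as bad
            bad.append(link)
            continue
        if status == 200:
            good.append(link)
        else:
            bad.append(link)
    return good, bad


if __name__ == "__main__":
    # Made-up status codes standing in for real responses
    fake_statuses = {"https://example.com/": 200, "https://example.com/missing": 404}
    good, bad = sort_links(list(fake_statuses), fake_statuses.__getitem__)
    print("good:", good)
    print("bad:", bad)
```

With real pages, get_status could be something like lambda u: requests.get(u, timeout=10).status_code, with requests imported first.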

see you soon!
