Python Web Scraping – Testing with a Crawler

This chapter explains how to use a web crawler for testing in Python.

Introduction

In large web projects, the backend is usually covered by automated tests that run regularly, while front-end testing is often skipped. The main reason is that a website is a mix of markup and programming languages: we can write unit tests for code in one language, but testing how it interacts with another is harder. That’s why we need a suite of tests to ensure our code performs as expected.

Testing with Python

When we talk about testing here, we mean unit testing. Before we delve into testing in Python, we need to understand what a unit test is. Unit tests have the following characteristics:

  • In each unit test, at least one aspect of a component’s functionality is tested.

  • Each unit test is independent and can be run independently.

  • Unit tests do not interfere with the success or failure of any other tests.

  • Unit tests can be run in any order and must contain at least one assertion.
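These characteristics can be illustrated with a minimal sketch. The helper slugify and the names SlugifyTest and run_in_order are hypothetical and exist only for this illustration. Each test checks one aspect of the helper, contains one assertion, and does not depend on the other test.

```python
import io
import unittest


def slugify(title):
    """Hypothetical helper under test: turn a page title into a URL slug."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases(self):
        # One aspect per test: letters are lowercased.
        self.assertEqual("python", slugify("Python"))

    def test_joins_words_with_hyphens(self):
        # Another aspect: runs of whitespace become a single hyphen.
        self.assertEqual("web-scraping", slugify("Web  Scraping"))


def run_in_order(names):
    """Run the named test methods in the given order and return the result."""
    suite = unittest.TestSuite(SlugifyTest(name) for name in names)
    # Send the runner's report to a buffer so the sketch stays quiet.
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Calling run_in_order with the two method names in either order should give a successful result, because neither test relies on state left behind by the other.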

Unittest – The Python Module

The Python module for unit testing is called unittest and is included in all standard Python installations. We just need to import it; the rest is handled by the unittest.TestCase class, which does the following:

  • The unittest.TestCase class provides setUp and tearDown methods, which run before and after each unit test.

  • It also provides assertion methods that allow tests to pass or fail.

  • It runs all methods whose names begin with test (by convention test_) as unit tests.
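The following sketch shows these three behaviours together. The names events, LifecycleTest, and run_lifecycle are hypothetical; the events list only records the order in which unittest calls the methods.

```python
import io
import unittest

events = []  # records the order in which unittest calls our methods


class LifecycleTest(unittest.TestCase):
    def setUp(self):
        events.append("setUp")

    def tearDown(self):
        events.append("tearDown")

    def test_truth(self):
        events.append("test_truth")
        self.assertTrue(True)

    def helper_not_a_test(self):
        # Not collected as a test: the name does not start with "test".
        events.append("helper_not_a_test")


def run_lifecycle():
    """Run LifecycleTest and return the result plus the recorded call order."""
    events.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LifecycleTest)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result, list(events)
```

After run_lifecycle(), the recorded order is setUp, test_truth, tearDown, and helper_not_a_test never appears because the loader only picks up methods whose names start with test.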

Example

In this example, we’ll combine a web scraper with unittest. We’ll test the Wikipedia page for “Python” with two checks: the first verifies that the page title matches the search string ‘Python’; the second verifies that the page has a content div.

First, we’ll import the required Python modules. We’ll use BeautifulSoup for web scraping and, of course, unittest for testing.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import unittest

Now we need to define a class that extends unittest.TestCase. The bs object is stored as a class attribute so that it is shared across all tests. The unittest class method setUpClass, which runs once before the tests in the class, sets it up. We then define two test methods, one for the page title and the other for the page content.

class Test(unittest.TestCase):
    bs = None  # parsed page, shared by all tests in this class

    @classmethod
    def setUpClass(cls):
        # Runs once before any test: download and parse the page.
        url = 'https://en.wikipedia.org/wiki/Python'
        cls.bs = BeautifulSoup(urlopen(url), 'html.parser')

    def test_titleText(self):
        pageTitle = Test.bs.find('h1').get_text()
        self.assertEqual('Python', pageTitle)

    def test_contentExists(self):
        content = Test.bs.find('div', {'id': 'mw-content-text'})
        self.assertIsNotNone(content)

if __name__ == '__main__':
    unittest.main()

After running the above script, we get the following output (captured in an IPython session):

-------------------------------------------------------------------------
Ran 2 tests in 2.773s

OK
An exception has occurred, use %tb to see the full traceback.

SystemExit: False

D:\ProgramData\lib\site-packages\IPython\core\interactiveshell.py:2870:
UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
 warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
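The SystemExit lines in this output appear because unittest.main() calls sys.exit() when it finishes, which an interactive IPython session reports as an exception. As a minimal sketch, the same two checks can instead be run through a TextTestRunner, which returns a result object rather than exiting. To keep the sketch free of network access and third-party packages, it parses a small inline page with the standard library's html.parser instead of BeautifulSoup. SAMPLE_HTML, PageSummary, OfflinePageTest, and run_offline_tests are hypothetical names, not part of the original script.

```python
import io
import unittest
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page, so no network is needed.
SAMPLE_HTML = """
<html><body>
<h1>Python</h1>
<div id="mw-content-text"><p>Sample content.</p></div>
</body></html>
"""


class PageSummary(HTMLParser):
    """Collects the text of the first <h1> and the ids of all <div> tags."""

    def __init__(self):
        super().__init__()
        self.h1_text = None
        self.div_ids = set()
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.h1_text is None:
            self._in_h1 = True
            self.h1_text = ""
        elif tag == "div":
            div_id = dict(attrs).get("id")
            if div_id:
                self.div_ids.add(div_id)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_text += data


class OfflinePageTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Parse the sample page once and share it across both tests.
        cls.page = PageSummary()
        cls.page.feed(SAMPLE_HTML)

    def test_title_text(self):
        self.assertEqual("Python", self.page.h1_text)

    def test_content_exists(self):
        self.assertIn("mw-content-text", self.page.div_ids)


def run_offline_tests():
    """Run the tests without calling sys.exit() and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OfflinePageTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

In an interactive session, another option is to call unittest.main(argv=[''], exit=False), which runs the tests in the current module without raising SystemExit.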

Testing with Selenium

Let’s now discuss testing with Selenium in Python, also known as Selenium testing. Python’s unittest and Selenium have little in common: unittest is a test framework, whereas Selenium drives a web browser. Selenium sends the same standard Python commands to different browsers, regardless of how those browsers differ in design. Recall that we installed and used Selenium in the previous chapter. Here, we’ll create test scripts in Selenium and use them for automation.

Example

With the help of the next Python script, we’ll create a test script for automating the Facebook login page. You can modify this example to automate other forms and logins of your choice, but the concepts remain the same.

First, to connect to the web browser, we will import the webdriver from the Selenium module.

from selenium import webdriver

Now, we need to import Keys from the Selenium module.

from selenium.webdriver.common.keys import Keys

Next, we need to provide our username and password to log in to our Facebook account.

user = "gauravleekha@gmail.com"
pwd = ""

Next, provide the path to the Chrome browser’s web driver.

path = r'C:\Users\gaurav\Desktop\Chromedriver'
# Selenium 3 style; Selenium 4 passes the driver path through a Service object instead.
driver = webdriver.Chrome(executable_path=path)
driver.get("http://www.facebook.com")

Now we’ll verify that the right page has loaded by using the assert keyword to check that the page title contains “Facebook”.

assert "Facebook" in driver.title

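With the following lines of code, we send a value to the email field. Here, we locate the field by its ID, but we could also locate it by name with driver.find_element_by_name("email"). (Selenium 4.3 removed the find_element_by_* helpers; the equivalent call there is driver.find_element(By.ID, "email"), with By imported from selenium.webdriver.common.by.)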

element = driver.find_element_by_id("email")
element.send_keys(user)

With the following lines of code, we send a value to the password field. Here, we locate the field by its ID, but we could also locate it by name with driver.find_element_by_name("pass").

element = driver.find_element_by_id("pass")
element.send_keys(pwd)

The next line of code presses Enter in the password field, which submits the login form with the values we entered.

element.send_keys(Keys.RETURN)

Now we’ll close the browser.

driver.close()

After running the above script, the Chrome web browser opens, and you can see the email and password being filled in and the login form being submitted.

Comparison: unittest or Selenium

Comparing unittest and Selenium directly is difficult because they serve different needs. If you want to handle large test suites, you need the syntactic rigidity of unittest. If, on the other hand, you want flexibility in testing a website, Selenium would be your first choice. But what if we could combine the two? We can import Selenium into a Python unittest script and get the best of both worlds: Selenium obtains information about the website, and unittest evaluates whether this information meets the criteria for passing the test.

For example, let’s rewrite the Python script above to automate Facebook login, combining the two as shown below.

import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

class InputFormsCheck(unittest.TestCase):
    def setUp(self):
        # Start a fresh browser before each test.
        self.driver = webdriver.Chrome(r'C:\Users\gaurav\Desktop\chromedriver')

    def test_singleInputField(self):
        user = "gauravleekha@gmail.com"
        pwd = ""
        pageUrl = "http://www.facebook.com"
        driver = self.driver
        driver.maximize_window()
        driver.get(pageUrl)
        self.assertIn("Facebook", driver.title)
        elem = driver.find_element_by_id("email")
        elem.send_keys(user)
        elem = driver.find_element_by_id("pass")
        elem.send_keys(pwd)
        elem.send_keys(Keys.RETURN)

    def tearDown(self):
        # Close the browser after each test.
        self.driver.close()

if __name__ == "__main__":
    unittest.main()
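The division of labour in this script (a driver supplies information, unittest judges it) can be tried out without a browser. The sketch below is only an illustration under that assumption: FakeDriver is a hypothetical stand-in written for this example and is not part of Selenium, and the title it reports is made up.

```python
import io
import unittest


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver; not part of Selenium."""

    def __init__(self):
        self.title = ""
        self.closed = False

    def get(self, url):
        # A real driver would load the page; here we only fake its title.
        self.title = "Facebook - log in or sign up"

    def close(self):
        self.closed = True


class TitleCheck(unittest.TestCase):
    def setUp(self):
        # Same role as webdriver.Chrome(...) in the script above.
        self.driver = FakeDriver()

    def test_title_mentions_site(self):
        self.driver.get("http://www.facebook.com")
        # The driver supplies the information; unittest evaluates it.
        self.assertIn("Facebook", self.driver.title)

    def tearDown(self):
        self.driver.close()


def run_title_check():
    """Run TitleCheck quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TitleCheck)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Swapping FakeDriver for a real webdriver instance in setUp leaves the test method and tearDown unchanged, which is the main benefit of keeping browser setup separate from the assertions.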
