Python Web Scraping Addresses

Summary

The customer's task was to create a mailing list in Excel of the addresses of all lot owners in a subdivision in Florida. The names and addresses were available on the county appraiser's website.

Implementation

A Python script with the Selenium Chrome driver was used to perform the web scraping. Selenium was chosen because the website required user interaction to traverse its pages; Python was chosen for its simplicity and ease of working with strings.
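
The post does not show the driver setup, but a minimal sketch of the Selenium 3-style skeleton implied here might look like the following. The URL, locator, and search text are placeholders for illustration, not the appraiser site's actual values.

    import time
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes chromedriver is on the PATH
    driver.get("https://county-appraiser.example/search")  # placeholder URL

    # The site needs clicks and form entry to reach each record, which is why
    # Selenium was used instead of a plain HTTP-request approach.
    search_box = driver.find_element_by_name("q")  # placeholder locator
    search_box.send_keys("SUBDIVISION NAME")       # placeholder search text
    search_box.submit()
    time.sleep(2)  # crude pause; the stale-data handling below is more robust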

All sections were processed, taking the name and address information from the 2020 Property Record Card web page. The output was stored, unmodified, in a formatted CSV file with multiple name columns, a street column, and a city/state column. The scraping took over 8 hours and captured a total of 10,404 entries (less than 3 seconds per entry, limited by the speed of the website).
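
The CSV-writing code is not shown in the post; a minimal sketch using Python's csv module, with assumed column names matching the layout described above, could look like this:

    import csv

    # Example rows in the assumed layout: two owner-name columns, a street
    # column, and a city/state column (one row per Property Record Card).
    records = [
        ("JANE DOE", "JOHN DOE", "123 MAIN ST", "ANYTOWN, FL 32000"),
    ]

    with open("mailing_list.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["owner_name_1", "owner_name_2", "street", "city_state"])
        writer.writerows(records)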

Stale Data

The only trick needed was handling the stale-element errors raised by the Selenium driver. Normally this problem is solved with a WebDriverWait(), but that approach didn't work here. Instead, a while loop with a try/except block solved the problem:

    import time
    from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException

    # Retry until the "next page" link can be located without a stale-element error.
    staleElement = True
    while staleElement:
        try:
            nextpage = driver.find_element_by_xpath(myxpath)
            staleElement = False
        except (StaleElementReferenceException, NoSuchElementException):
            time.sleep(2)  # give the page time to finish re-rendering
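
Note that this loop polls indefinitely; if the element never appears it will hang, so capping the number of retries (or logging and re-raising after a timeout) would be a sensible addition for an unattended 8-hour run.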
