Excel Web Scraping Calculating Distance Between Zip Codes

Summary

For this project the client was looking for the distance between zip codes and cities. The client created an Excel workbook that had multiple worksheets. Each worksheet had hundreds of zip codes in a column and 13 cities in a row. This created the need for almost 20,000 distance values. I located a web address (URL) that provide a web page with the distance between two zip codes. The web scraping was implemented using Selenium with Chrome browser.

installing selenium for excel

To setup Selenium for Excel first you need to download SeleniumBasic from GitHub. The version I used is v2.0.9.0 released 3/2/2016. After installing SeleniumBasic you need to download the driver for your browser. I use Chrome. The browser driver must be installed in the same directory as SeleniumBasic (usually C:\Users\…\AppData\Local\SeleniumBasic).

To use SeleniumBasic with Excel, while in the VBA editor you need to select Selenium Type Library (Tools > References… ).

Implementation

Since the web site URL needed two zip codes I converted the client’s cities to the city airport zip code. The code was set up to loop through all worksheets. On each worksheet the VBA worked each column zip code and cycled through all row zip codes. Some times the column zip codes where zip+4 so the code used the Left function only kept the first 5 digits.

The URL was fixed for all web calls so a string was used with place holder substrings for zip1 (XXXXX) and zip2 (YYYYY). The process simply replaced the substrings with the zip strings taken from the worksheet.

Error handling was necessary for when a web page didn’t load properly and the case where the page loaded but it didn’t contain the distance information. This occurred when a column zip code was not correct. The core code had two while loops for column and row zip codes.

Worksheet Core Code

While (ws.Cells(iRow, 1) <> "") ' until column is blank, no zip code
    zip1 = ws.Cells(iRow, 1)
    zip1 = Left(zip1, 5)    ' zip + 4, only keep 5
    While (ws.Cells(2, iCol) <> "") ' across the columns
        ' create URL and replace zip codes with values
        url = "https://www.zip-codes.com/distance_calculator.asp?zip1=XXXXX&zip2=YYYYY&submit=Calculate"
        zip2 = ws.Cells(2, iCol)
        url = Replace(url, "XXXXX", zip2)
        url = Replace(url, "YYYYY", zip1)
                
        ' attempt to get website
        On Error GoTo tryAgain
        For try = 1 To Attempts
            ' get website
            driver.Get url
            Exit For
tryAgain:
        Next try
        On Error GoTo 0
                    
        If (try <= Attempts) Then
            ' class that has distance in miles
            Set miClass = driver.FindElementByClass("mi", timeout:=0, Raise:=False)
            If Not miClass Is Nothing Then
                ws.Cells(iRow, iCol) = miClass.Text ' get distance data
            Else
                ws.Cells(iRow, iCol) = 0            ' bad url, no data, flag
                ws.Cells(iRow, iCol).Interior.ColorIndex = 3
    ' red
            End If
        Else
            ws.Cells(iRow, iCol) = 0                ' unable to get url
            ws.Cells(iRow, iCol).Interior.ColorIndex = 6
       ' yellow
        End If
        iCol = iCol + 1 ' next column
    Wend
    iRow = iRow + 1 ' next row
    iCol = 2        ' restart column
Wend

Overall the VBA worked well. The error handling was added after run-time errors occurred during web scraping. A quick Internet search help recommend code changes to improve error handling.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s