Summary
For this project the client was looking for the distance between zip codes and cities. The client created an Excel workbook that had multiple worksheets. Each worksheet had hundreds of zip codes in a column and 13 cities in a row. This created the need for almost 20,000 distance values. I located a web address (URL) that provide a web page with the distance between two zip codes. The web scraping was implemented using Selenium with Chrome browser.
installing selenium for excel
To setup Selenium for Excel first you need to download SeleniumBasic from GitHub. The version I used is v2.0.9.0 released 3/2/2016. After installing SeleniumBasic you need to download the driver for your browser. I use Chrome. The browser driver must be installed in the same directory as SeleniumBasic (usually C:\Users\…\AppData\Local\SeleniumBasic).
To use SeleniumBasic with Excel, while in the VBA editor you need to select Selenium Type Library (Tools > References… ).
Implementation
Since the web site URL needed two zip codes I converted the client’s cities to the city airport zip code. The code was set up to loop through all worksheets. On each worksheet the VBA worked each column zip code and cycled through all row zip codes. Some times the column zip codes where zip+4 so the code used the Left function only kept the first 5 digits.
The URL was fixed for all web calls so a string was used with place holder substrings for zip1 (XXXXX) and zip2 (YYYYY). The process simply replaced the substrings with the zip strings taken from the worksheet.
Error handling was necessary for when a web page didn’t load properly and the case where the page loaded but it didn’t contain the distance information. This occurred when a column zip code was not correct. The core code had two while loops for column and row zip codes.
Worksheet Core Code
While (ws.Cells(iRow, 1) <> "") ' until column is blank, no zip code
zip1 = ws.Cells(iRow, 1)
zip1 = Left(zip1, 5) ' zip + 4, only keep 5
While (ws.Cells(2, iCol) <> "") ' across the columns
' create URL and replace zip codes with values
url = "https://www.zip-codes.com/distance_calculator.asp?zip1=XXXXX&zip2=YYYYY&submit=Calculate"
zip2 = ws.Cells(2, iCol)
url = Replace(url, "XXXXX", zip2)
url = Replace(url, "YYYYY", zip1)
' attempt to get website
On Error GoTo tryAgain
For try = 1 To Attempts
' get website
driver.Get url
Exit For
tryAgain:
Next try
On Error GoTo 0
If (try <= Attempts) Then
' class that has distance in miles
Set miClass = driver.FindElementByClass("mi", timeout:=0, Raise:=False)
If Not miClass Is Nothing Then
ws.Cells(iRow, iCol) = miClass.Text ' get distance data
Else
ws.Cells(iRow, iCol) = 0 ' bad url, no data, flag
ws.Cells(iRow, iCol).Interior.ColorIndex = 3
' red
End If
Else
ws.Cells(iRow, iCol) = 0 ' unable to get url
ws.Cells(iRow, iCol).Interior.ColorIndex = 6
' yellow
End If
iCol = iCol + 1 ' next column
Wend
iRow = iRow + 1 ' next row
iCol = 2 ' restart column
Wend
Overall the VBA worked well. The error handling was added after run-time errors occurred during web scraping. A quick Internet search help recommend code changes to improve error handling.