For this project the client was looking for the distance between zip codes and cities. The client created an Excel workbook that had multiple worksheets. Each worksheet had hundreds of zip codes in a column and 13 cities in a row. This created the need for almost 20,000 distance values. I located a web address (URL) that provide a web page with the distance between two zip codes. The web scraping was implemented using Selenium with Chrome browser.
installing selenium for excel
To setup Selenium for Excel first you need to download SeleniumBasic from GitHub. The version I used is v126.96.36.199 released 3/2/2016. After installing SeleniumBasic you need to download the driver for your browser. I use Chrome. The browser driver must be installed in the same directory as SeleniumBasic (usually C:\Users\…\AppData\Local\SeleniumBasic).
To use SeleniumBasic with Excel, while in the VBA editor you need to select Selenium Type Library (Tools > References… ).
Since the web site URL needed two zip codes I converted the client’s cities to the city airport zip code. The code was set up to loop through all worksheets. On each worksheet the VBA worked each column zip code and cycled through all row zip codes. Some times the column zip codes where zip+4 so the code used the Left function only kept the first 5 digits.
The URL was fixed for all web calls so a string was used with place holder substrings for zip1 (XXXXX) and zip2 (YYYYY). The process simply replaced the substrings with the zip strings taken from the worksheet.
Error handling was necessary for when a web page didn’t load properly and the case where the page loaded but it didn’t contain the distance information. This occurred when a column zip code was not correct. The core code had two while loops for column and row zip codes.
Worksheet Core Code
While (ws.Cells(iRow, 1) <> "") ' until column is blank, no zip code zip1 = ws.Cells(iRow, 1) zip1 = Left(zip1, 5) ' zip + 4, only keep 5 While (ws.Cells(2, iCol) <> "") ' across the columns ' create URL and replace zip codes with values url = "https://www.zip-codes.com/distance_calculator.asp?zip1=XXXXX&zip2=YYYYY&submit=Calculate" zip2 = ws.Cells(2, iCol) url = Replace(url, "XXXXX", zip2) url = Replace(url, "YYYYY", zip1) ' attempt to get website On Error GoTo tryAgain For try = 1 To Attempts ' get website driver.Get url Exit For tryAgain: Next try On Error GoTo 0 If (try <= Attempts) Then ' class that has distance in miles Set miClass = driver.FindElementByClass("mi", timeout:=0, Raise:=False) If Not miClass Is Nothing Then ws.Cells(iRow, iCol) = miClass.Text ' get distance data Else ws.Cells(iRow, iCol) = 0 ' bad url, no data, flag ws.Cells(iRow, iCol).Interior.ColorIndex = 3 ' red End If Else ws.Cells(iRow, iCol) = 0 ' unable to get url ws.Cells(iRow, iCol).Interior.ColorIndex = 6 ' yellow End If iCol = iCol + 1 ' next column Wend iRow = iRow + 1 ' next row iCol = 2 ' restart column Wend
Overall the VBA worked well. The error handling was added after run-time errors occurred during web scraping. A quick Internet search help recommend code changes to improve error handling.