Web Scraping Hawaii Oncologists

12/27/2023

Introduction

MediFind is a website that allows people to search for nearby doctors. According to the website, this empowers all patients to take control of their own healthcare decisions and improve both the length and quality of their lives. You can find doctors who have expertise in a given field or who perform specific procedures. For example, an oncologist is a doctor who treats cancer and provides medical care to people diagnosed with it. For my next project with SingularAgent, I wanted to improve its web scraping capabilities. Today I am announcing that SingularAgent can scrape data for all the oncologists in Hawaii listed on MediFind and store the results in a CSV file.

Conditional Loops

For a long time, SingularAgent has had the ability to perform conditional branching with If and Switch methods. A process in SingularAgent can have one or more methods. An If method executes one process if a condition is true and another process if it is false, while a Switch method executes different processes depending on the value of a parameter. The If and Switch methods gave me the flexibility to make conditional decisions. However, I had no easy way to repeat a process over and over again. I could call the same process repeatedly, but that approach was neither readable nor concise.

So I created three methods that give SingularAgent conditional loops: For, Foreach, and While. The For method calls a process a fixed number of times or a number of times determined by a condition. The Foreach method calls a process once for each parameter in a list of parameters. The While method calls a process repeatedly for as long as a condition remains true. Each of these loop methods inserts additional methods into the process at runtime.
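The runtime expansion described above can be illustrated with a small Python sketch. It is not SingularAgent's actual implementation: it assumes a hypothetical `Runner` that holds a queue of pending methods, plus hypothetical `for_method` and `foreach_method` helpers that, when executed, insert copies of a process at the front of that queue. For simplicity, the For sketch only handles a fixed count.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Deque, List

# Hypothetical model, not SingularAgent's actual code: a process is a list of
# methods, and each method receives the runner so it can insert new methods
# into the queue of methods that are still waiting to execute.
Method = Callable[["Runner"], None]


class Runner:
    def __init__(self, methods: List[Method]) -> None:
        self.queue: Deque[Method] = deque(methods)
        self.log: List[str] = []

    def insert_next(self, methods: List[Method]) -> None:
        """Queue methods to run right after the current one, in order."""
        self.queue.extendleft(reversed(methods))

    def run(self) -> None:
        while self.queue:
            self.queue.popleft()(self)


def for_method(count: int, process: List[Method]) -> Method:
    """For: at runtime, expand into `count` back-to-back copies of the process."""
    def method(runner: Runner) -> None:
        runner.insert_next(process * count)
    return method


def foreach_method(items: List[str],
                   make_process: Callable[[str], List[Method]]) -> Method:
    """Foreach: at runtime, expand into one copy of the process per parameter."""
    def method(runner: Runner) -> None:
        expanded: List[Method] = []
        for item in items:
            expanded.extend(make_process(item))
        runner.insert_next(expanded)
    return method


def say(message: str) -> Method:
    """Hypothetical method that just records a message."""
    return lambda runner: runner.log.append(message)


runner = Runner([
    say("A"),
    for_method(2, [say("B")]),
    foreach_method(["Honolulu", "Hilo"], lambda city: [say(f"search {city}")]),
    say("C"),
])
runner.run()
print(runner.log)  # ['A', 'B', 'B', 'search Honolulu', 'search Hilo', 'C']
```

Inserting at the front of the queue (rather than the back) is what keeps the expanded methods between Method A and Method C, matching the ordering in the process examples.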

Here is an example of a process in SingularAgent before the While method has been executed:

Method A -> While Method (sets up the loop for Process B) -> Method C

Here is an example of a process in SingularAgent after the While method has been executed:

Method A -> While Method -> Process B -> While Condition Check Method (if condition is true then go back to Process B) -> Method C

Now SingularAgent has the flexibility to conditionally loop processes dynamically during runtime.
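The While example can be sketched the same way. Again, this is a hypothetical model rather than SingularAgent's code: `while_method` tests the condition (an assumption, since the diagram does not say whether a check happens before the first pass) and, if it holds, queues Process B followed by a condition check method, which re-queues both for as long as the condition stays true. The usage at the bottom simulates paging through three search result pages.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical sketch (not SingularAgent's real code) of a While method that
# rewrites the queue at runtime:
#   A -> While -> B -> Condition Check (if true, queue B and the check again) -> C
Method = Callable[["Runner"], None]
Condition = Callable[["Runner"], bool]


class Runner:
    def __init__(self, methods: List[Method]) -> None:
        self.queue: Deque[Method] = deque(methods)
        self.params: Dict[str, int] = {}
        self.log: List[str] = []

    def insert_next(self, methods: List[Method]) -> None:
        """Queue methods to run right after the current method, in order."""
        self.queue.extendleft(reversed(methods))

    def run(self) -> None:
        while self.queue:
            self.queue.popleft()(self)


def while_method(condition: Condition, process: List[Method]) -> Method:
    """While: insert the process plus a condition check method at runtime."""
    def check(runner: Runner) -> None:
        # Condition check method: if still true, go back to the process.
        if condition(runner):
            runner.insert_next(process + [check])

    def method(runner: Runner) -> None:
        # Assumption: the condition is tested once before the first pass.
        check(runner)
    return method


def log(message: str) -> Method:
    return lambda runner: runner.log.append(message)


def next_page(runner: Runner) -> None:
    """Hypothetical Process B: move to the next search results page."""
    runner.params["page"] += 1
    runner.log.append(f"B: page {runner.params['page']}")


runner = Runner([
    lambda r: r.params.update(page=0),  # Method A: initialize the page counter
    while_method(lambda r: r.params["page"] < 3, [next_page]),
    log("C"),
])
runner.run()
print(runner.log)  # ['B: page 1', 'B: page 2', 'B: page 3', 'C']
```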

Challenges Overcome

I had to overcome a number of different challenges for this project. 

The first issue I ran into was that the search results page did not contain the full list of doctors once it finished loading. The remaining doctors were lazy loaded into the DOM (Document Object Model) as the user scrolled down the page, so SingularAgent had to press the End key on each doctor search results page to force the DOM to include the entire list.
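One way to express the End-key workaround is to keep pressing End until the number of loaded results stops growing. This sketch assumes three hypothetical hooks (`count_items`, `press_end`, `wait`) supplied by whatever browser automation drives the page; with Selenium, for instance, `press_end` could call `driver.find_element(By.TAG_NAME, "body").send_keys(Keys.END)`. The `FakeLazyPage` class, which reveals 10 doctors per key press, is a made-up stand-in for the real site.

```python
from typing import Callable


def load_all_results(count_items: Callable[[], int],
                     press_end: Callable[[], None],
                     wait: Callable[[], None],
                     max_presses: int = 20) -> int:
    """Press End until the number of loaded results stops growing.

    count_items, press_end and wait are hypothetical hooks into the browser.
    max_presses guards against a page that never stops loading.
    """
    previous = count_items()
    for _ in range(max_presses):
        press_end()
        wait()  # give the lazy loader time to append more results to the DOM
        current = count_items()
        if current == previous:  # nothing new loaded, so the list is complete
            break
        previous = current
    return previous


class FakeLazyPage:
    """Simulated page that reveals 10 more doctors per End key press."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.loaded = min(10, total)

    def press_end(self) -> None:
        self.loaded = min(self.loaded + 10, self.total)


page = FakeLazyPage(total=25)
total = load_all_results(lambda: page.loaded, page.press_end, lambda: None)
print(total)  # 25
```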

The next issue was finding the Next button on the doctor search results page. Because a results page could list anywhere from 0 to 25 doctors, the Next button did not appear at a fixed location on the page. Before clicking the Next button, SingularAgent had to do the following:

Finally, during testing I found a bug in the website's search results. After clicking the Next button, a doctor at the end of the current page would sometimes overwrite the first doctor on the next page. Also, searches for cities within 25 miles of each other would list the same doctors. As a result, I had to remove duplicate doctors from the data before navigating to the doctor URLs and writing to the CSV file.
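Removing duplicates before visiting profile pages can be as simple as keeping the first record seen for each key. This sketch assumes each scraped doctor is a dict with hypothetical `name`, `city`, and `url` fields and that the profile URL uniquely identifies a doctor; the example.com URLs are placeholders. Deduplication only removes repeated entries; it cannot recover a doctor whose result was overwritten by the site's pagination bug.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical record shape: {"name": ..., "city": ..., "url": ...},
# where the profile URL is assumed to identify a doctor uniquely.


def dedupe_doctors(doctors: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each doctor, preserving order."""
    seen = set()
    unique = []
    for doctor in doctors:
        key = doctor["url"]
        if key not in seen:
            seen.add(key)
            unique.append(doctor)
    return unique


def write_csv(doctors: List[Dict[str, str]], out: TextIO) -> None:
    """Write one row per doctor with a header line."""
    writer = csv.DictWriter(out, fieldnames=["name", "city", "url"])
    writer.writeheader()
    writer.writerows(doctors)


scraped = [
    {"name": "Dr. A", "city": "Honolulu", "url": "https://example.com/doctor/1"},
    {"name": "Dr. B", "city": "Honolulu", "url": "https://example.com/doctor/2"},
    # The same doctor listed again for a nearby city.
    {"name": "Dr. A", "city": "Kailua", "url": "https://example.com/doctor/1"},
]
unique = dedupe_doctors(scraped)
buffer = io.StringIO()
write_csv(unique, buffer)
print(len(unique))  # 2
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.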

Demo

Here is an overview of the process being executed in this demo:

The YouTube video is below. You'll notice that SingularAgent waits 12 seconds for each page to load. This delay is intentional because I didn't want to send too many requests to the MediFind website.
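The 12-second wait could be implemented as a plain `time.sleep(12)` after each navigation. The sketch below is a slightly different design, offered as an assumption rather than how SingularAgent does it: a hypothetical `Throttle` class that guarantees at least 12 seconds between page loads while counting time already spent scraping, so the request rate never exceeds one page every 12 seconds. The clock and sleep functions are injectable so the behavior can be checked with a fake clock instead of actually waiting.

```python
import time
from typing import Callable, List, Optional


class Throttle:
    """Ensure at least `interval` seconds pass between consecutive page loads."""

    def __init__(self, interval: float = 12.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        """Call before each navigation; sleeps only for the remaining time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


class FakeClock:
    """Test double: advances time instantly instead of sleeping."""

    def __init__(self) -> None:
        self.t = 0.0
        self.slept: List[float] = []

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.slept.append(seconds)
        self.t += seconds


fake = FakeClock()
throttle = Throttle(12.0, fake.now, fake.sleep)
throttle.wait()   # first page load: no delay
fake.t += 5.0     # pretend scraping the page took 5 seconds
throttle.wait()   # waits the remaining 7 seconds
print(fake.slept)  # [7.0]
```

In a real scraper, `Throttle(12.0)` would be created once and `wait()` called before each page navigation.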

With these changes, SingularAgent can now scrape a wide variety of websites and perform many useful automation actions on a computer. Enjoy watching!

https://www.youtube.com/watch?v=_aRD0lX2ShU