How Web Scraping is Used to Extract Google Reviews using Selenium (Python)?

 Blog /  How Web Scraping is Used to Extract Google Reviews using Selenium (Python)

  27 January 2022

how-web-scraping-is-used-to-extract-google-reviews-using-selenium

Web scraping is a useful tool, but can also lead to ethical and legal ambiguity. To begin with, a website might receive lots of applications each second. Browsing at such amazing speeds is bound to attract attention. Such large amounts of requests are likely to clog up a website's servers and, under extreme situations, might be considered a denial-of-service assault. Similarly, any webpage that requires a login may include information that is not regarded as public as a result, and scraping such web pages might put you in legal trouble.

A developer may be required to deal with Google (Map) Reviews in a variety of situations. Anyone acquainted with the Google My Business API knows that getting reviews using the API requires an accountId for every location(business). Scraping reviews may be quite useful in situations when a developer wants to work with evaluations from several sites (or does not have access to a Google business account).

This blog will show you how Selenium and BeautifulSoup are used to Scrape Google reviews.

The technique of obtaining data from web pages is known as web scraping. Various libraries that may assist you with this, including:

Beautiful Soup is a great tool for parsing the DOM, as it simply extracts data from HTML and XML files.

Scrapy is an open-source program that allows you to scrape huge databases at scale.

Selenium is a browser automation tool that aids in the interaction of JavaScript in scraping tasks (clicks, scrolls, filling in forms, drag, and drop, moving between windows and frames, etc.)

Here we will scrape the page with Selenium, add more content, and then parse the HTML file with Beautiful Soup.

Selenium

#Installing with pip
pip install selenium#Installing with conda
conda install -c conda-forge selenium

To interact with the chosen site, Selenium needs a driver.

BeautifulSoup

We will analyze the HTML page with BeautifulSoup and retrieve the data you need.

To install BeautifulSoup

#Installing with pip
pip install beautifulsoup4#Installing with conda
conda install -c anaconda beautifulsoup4

Initialization of the Web Driver and URL Input

Install and initialize the web driver, that's all there is to it. Then you will send the Google Maps URL of the place we wish to receive reviews for to the get method:

from selenium import webdriver
driver = webdriver.Chrome()#London Victoria & Albert Museum URL
url = 'https://www.google.com/maps/place/Victoria+and+Albert+Museum/@51.4966392,-0.17218,15z/data=!4m5!3m4!1s0x0:0x9eb7094dfdcd651f!8m2!3d51.4966392!4d-0.17218'
driver.get(url)

Navigating

Before going to the real web address, we specified through URL variable, the web driver is quite likely to come across the permission Google page to accept to cookies. If that's the case, you may proceed by clicking the "I agree" option.

navigating

Selenium provides numerous ways; in this case, I used find element_by_xpath()

driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button').click()#to make sure content is fully loaded we can use time.sleep() after navigating to each page
import time
time.sleep(3)

Here is when things will be complicated. The URL in the example will take you to the précised website and you will click the 41,408 reviews button to load the reviews.

On Google Maps, however, there are various sorts of profile pages, and in many cases, the location will be presented with a several other locations on the left side, and there will almost certainly be some adverts at the top of the list. The following URL, for example, will lead us to this page:

navigating

You will develop an error handler code and go to load reviews so you will not get caught on various sorts of layouts or load the wrong page.

URL = 'https://www.google.com/maps/search/bicycle+store/@51.5026862,-0.1430242,13z/data=!3m1!4b1'

try:
driver.find_element(By.CLASS_NAME, "widget-pane-link").click()
except Exception:
response = BeautifulSoup(driver.page_source, 'html.parser')
# Check if there are any paid ads and avoid them
if response.find_all('span', {'class': 'ARktye-badge'}):
ad_count = len(response.find_all('span', {'class': 'ARktye-badge'}))
li = driver.find_elements(By.CLASS_NAME, "a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd")
li[ad_count].click()
else:
driver.find_element(By.CLASS_NAME, "a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd").click()
time.sleep(5)
driver.find_element(By.CLASS_NAME, "widget-pane-link").click()

Scrolling and Loading More Reviews

It should lead us to the review page. However, the initial load would only deliver 10 reviews, and the subsequent scroll would bring another ten. We'll calculate how many times we'll have to browse and employ the chrome drivers execute_script () function to obtain all of the reviews for the location.

#Find the total number of reviews
total_number_of_reviews = driver.find_element_by_xpath('//*[@id="pane"]/div/div[1]/div/div/div[2]/div[2]/div/div[2]/div[2]').text.split(" ")[0]
total_number_of_reviews = int(total_number_of_reviews.replace(',','')) if ',' in total_number_of_reviews else int(total_number_of_reviews)#Find scroll layout
scrollable_div = driver.find_element_by_xpath('//*[@id="pane"]/div/div[1]/div/div/div[2]')#Scroll as many times as necessary to load all reviews
for i in range(0,(round(total_number_of_reviews/10 - 1))):
        driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', 
                scrollable_div)
        time.sleep(1)
dsfdsfd

The scrollable element is initially found inside the for loop above. The scroll bar is initially located at the top, which implies the vertical placement is 0. We adjust the vertical position(.scrollTop) of the scroll element(scrollable_div) to its height by giving a simple JavaScript snippet to the chrome driver(driver.execute_script) (.scrollHeight). The scroll bar is being dragged vertically from point 0 to position Y. (Y = any height you choose).

navigating

Our driver navigated towards the end of the reviews page and downloaded all of the accessible review pieces. You can now analyze them and get the information we need. (Reviewer, Review Text, Review Sentiment, and so on.) To do so, look for the same class name as the outermost item displayed inside the scroll layout, which holds all the information about individual reviews, and use it to extract a list of reviews.

Analyzing HTML And Data Extraction

response = BeautifulSoup(driver.page_source, 'html.parser')
reviews = response.find_all('div', class_='ODSEW-ShBeI NIyLF-haAclf gm2-body-2')

We can now construct a function to get important data from the HTML-parsed reviews result set. The code below would accept the answer set and provide a Pandas DataFrame with pertinent extracted data in the Review Rate, Review Time, and Review Text columns.

def get_review_summary(result_set):
rev_dict = {'Review Rate': [],
'Review Time': [],
'Review Text' : []}
for result in result_set:
review_rate = result.find('span', class_='ODSEW-ShBeI-H1e3jb')["aria-label"]
review_time = result.find('span',class_='ODSEW-ShBeI-RgZmSc-date').text
review_text = result.find('span',class_='ODSEW-ShBeI-text').text
rev_dict['Review Rate'].append(review_rate)
rev_dict['Review Time'].append(review_time)
rev_dict['Review Text'].append(review_text)
import pandas as pd 
return(pd.DataFrame(rev_dict))

This blog provided a basic introduction to web scraping using Python as well as a simple example. Data sourcing might necessitate the use of complicated data collection techniques, and without the correct tools, it can be a time-consuming and costly task.

If you are looking to Scrape Google Reviews using Python and Selenium, contact ReviewGator, today!

Request for a quote!

Send a message

Feel free to reach us if you need any assistance.

Contact Us

We’re always ready to help as well as answer all your queries. We are looking forward to hearing from you!

Call Us On

+1(832) 251 7311

Address

10685-B Hazelhurst Dr. # 25582 Houston,TX 77043 USA