How is web scraping useful in finding Amazon customer reviews?


  11 April 2022


The Internet is used to search for all kinds of information. These details are readily available online, but storing them for later use is difficult.

One option is to manually copy the data and store it on your desktop. This is, however, an extremely time-consuming task. In such circumstances, web scraping plays an important role.

Web Scraping

Web scraping is a method of extracting vast amounts of information from websites and saving it to your computer. This information can be analyzed afterward.


The first step is to determine whether the website permits data scraping. This can be checked by appending robots.txt to the end of the website's URL.

https://www.amazon.in/robots.txt

A URL consists of five parts: protocol, domain, path, query string, and fragment. Here, however, we will concentrate on three components: the domain, the path, and the query string.
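As a quick illustration, Python's standard urllib.parse module can split a URL into these parts (the search URL below is only an example):

from urllib.parse import urlparse

# Illustrative example: splitting an Amazon search URL into its parts
url = "https://www.amazon.in/s?k=titan+men+watches"
parts = urlparse(url)

print(parts.scheme)    # protocol  -> 'https'
print(parts.netloc)    # domain    -> 'www.amazon.in'
print(parts.path)      # path      -> '/s'
print(parts.query)     # query     -> 'k=titan+men+watches'
print(parts.fragment)  # fragment  -> '' (none in this URL)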


STAGES:

  • Get the URL of the page to be scraped.
  • Inspect the page's elements and determine which HTML tags are required.
  • Request the URL and retrieve the page content.
  • Find the elements of the required tags and extract their data.

Let's start coding right away!!

import requests
from bs4 import BeautifulSoup
                      

We'll start by importing the two libraries mentioned above. The requests library is used to retrieve information from a website: we request a URL and receive a response. Along with the web page content, the response also carries a status code. The second library, BeautifulSoup, converts the page content into a format that is convenient to parse.
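For example, a bare request can be made and its status code checked before any parsing. The URL below is only a placeholder, and without headers Amazon may refuse such a request, as discussed in the next section.

import requests
from bs4 import BeautifulSoup

# Placeholder URL; a bare request like this may be blocked by Amazon (status 503)
page = requests.get("https://www.amazon.in/s?k=titan+men+watches")
print(page.status_code)  # 200 means the content was returned successfully

# Convert the raw HTML into a parse tree we can search
soup = BeautifulSoup(page.content, "html.parser")
print(soup.title)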

Headers and Cookies

In most cases, Python requests do not need headers or cookies. But in some instances, when we request the page content, we get a status code of 403 or 503, which means access to that web page has been denied. In these circumstances, we pass headers and cookies as arguments to the requests.get() function.

To find your headers and cookies, go to Amazon and search for any product. Then right-click any element and select Inspect (or use the shortcut Ctrl+Shift+I). The headers and cookies can be found under the Network tab.

Don’t share cookies.
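Below is a minimal sketch of what the header and cookie dictionaries used in the following functions might look like. The keys and values are only placeholders; the real ones must be copied from your own browser session.

# Placeholder values -- replace them with the ones copied from your browser's Network tab
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}
cookie = {
    "session-id": "<your-session-id>",
    "session-token": "<your-session-token>",
}

# They are passed to requests.get() like this:
# page = requests.get(url, cookies=cookie, headers=header)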

The following function fetches the page content and status code for the required search query. A status code of 200 is needed to continue with the process.

def getAmazonSearch(search_query):
    url="https://www.amazon.in/s?k="+search_query
    print(url)
    page=requests.get(url,cookies=cookie,headers=header)
    if page.status_code==200:
        return page
    else:
        return "Error"


Scraping ASIN Numbers and Product Names

Every item on Amazon has a unique identification number called the ASIN (Amazon Standard Identification Number). Using the ASIN, we can access each product's page directly.

The ASIN numbers can be extracted from the search results using the code below.

data_asin=[]
response=getAmazonSearch('titan+men+watches')
soup=BeautifulSoup(response.content)
for i in soup.findAll("div",{'class':"sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 s-result-item sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"}):
    data_asin.append(i['data-asin'])

The findAll() function locates all the HTML tags that match the tag name, attribute, and value specified in its parameters. These tag settings are the same for every product listed on the search results pages. The content of each data-asin attribute is appended to a new list. Using this list, we can access individual data-asin numbers and their respective product pages.
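Because the long class string above tends to change whenever Amazon updates its page layout, an alternative sketch (assuming the result containers still expose a data-asin attribute) is to select every div that carries a non-empty data-asin:

# Alternative sketch: select every div that has a non-empty 'data-asin' attribute
data_asin = []
soup = BeautifulSoup(response.content, "html.parser")
for div in soup.findAll("div", attrs={"data-asin": True}):
    if div["data-asin"]:              # skip placeholder divs with an empty value
        data_asin.append(div["data-asin"])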

def Searchasin(asin):
    url="https://www.amazon.in/dp/"+asin
    print(url)
    page=requests.get(url,cookies=cookie,headers=header)
    if page.status_code==200:
        return page
    else:
        return "Error"

We now proceed in the same manner as we did with ASIN numbers. Using the corresponding HTML tag, we extract all of the 'see all customer reviews' links for each product and add the href component to a new list.

link=[]
for i in range(len(data_asin)):
    response=Searchasin(data_asin[i])
    soup=BeautifulSoup(response.content)
    for a in soup.findAll("a",{'data-hook':"see-all-reviews-link-foot"}):
        link.append(a['href'])

Extracting All Customer Reviews


We now have the review-page links for every product. Using these links, we can scrape all of the product reviews. So we define a function (similar to the previous ones) that extracts all the reviews for a product.

def Searchreviews(review_link):
    url="https://www.amazon.in"+review_link
    print(url)
    page=requests.get(url,cookies=cookie,headers=header)
    if page.status_code==200:
        return page
    else:
        return "Error"

All of the customer reviews are extracted and stored in a list using the function defined above.

reviews=[]
for j in range(len(link)):
    for k in range(1,101): #review pages are numbered starting from 1
        response=Searchreviews(link[j]+'&pageNumber='+str(k))
        soup=BeautifulSoup(response.content)
        for i in soup.findAll("span",{'data-hook':"review-body"}):
            reviews.append(i.text)

By appending '&page=2' (or 3, 4, and so on) to the search query and repeating the steps from the ASIN-scraping stage, we can retrieve products from any number of search result pages.
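For instance, here is a sketch of how several search result pages could be scraped by reusing getAmazonSearch(); the number of pages is arbitrary and only for illustration.

# Sketch: collect ASINs from the first three search result pages (page count is arbitrary)
data_asin = []
for page_number in range(1, 4):
    response = getAmazonSearch('titan+men+watches' + '&page=' + str(page_number))
    soup = BeautifulSoup(response.content, "html.parser")
    for div in soup.findAll("div", attrs={"data-asin": True}):
        if div["data-asin"]:
            data_asin.append(div["data-asin"])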

Downloading Data in CSV Format

Now when we have scraped all the reviews, we need to put them in a file so that we can analyze them further.

import pandas as pd

rev={'reviews':reviews} #converting the reviews list into a dictionary
review_data=pd.DataFrame.from_dict(rev) #converting this dictionary into a dataframe
review_data.to_csv('Scraping reviews.csv',index=False) #saving the dataframe as a CSV file

We make a dictionary out of the reviews list. Then we import the pandas package and use it to turn the dictionary into a DataFrame. Finally, we save it as a CSV file on the computer using the to_csv() function.
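To verify the export, the CSV file can be read back into a DataFrame:

import pandas as pd

# Read the exported reviews back for further analysis
review_data = pd.read_csv('Scraping reviews.csv')
print(review_data.shape)   # (number of reviews, 1)
print(review_data.head())  # first few scraped reviews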

Looking for Amazon customer reviews data scraping services? Contact us now!
