How Web Scraping is Used for Analyzing Yelp Restaurants Data?

 Blog /  How Web Scraping is Used for Analyzing Yelp Restaurants Data

  13 April 2022

How-web-scraping-is-used-for-analyzing-yelp-restaurants-data

Whenever you want to go for a pleasant lunch in the restaurants, you generally prefer to check for the hotels in your region that have the top ratings. Yelp is a website that assists you in filtering restaurants based on user ratings and reviews. The User Interface of Yelp makes it simple to filter and search results based on a person's desired location, food preference, and rating.

Here, Yelp restaurant API is used to scrape Yelp restaurant data results for in New Jersey, Kearny, and then filter out the major 10 highly rated hotels and plotted the histogram to see how restaurants are distributed based on the ratings.

API on earth

API ON EARTH

There are numerous definitions of API and how it is useful for the websites. API is an interface which provides the medium of communication among two servers. This software allows two apps to interact with one another through the use of an interface. The term "API" refers to the service provider that accepts request in an HTTP format and delivers to another server. The server responds to API and gives back.

APIs are used to retrieve data, communicate with other software components, and so on. The data in an API could be in either XML or JSON format. Obtaining data in API, you must generate the key for an API. The API key is required for retrieving data through an API because it assists user authentication. And the key also assists us in recognizing the user. Never share the key with anyone. The access should be given to user and server only.

Web Scraping

Let us look at what Web Scraping services is and how we can do it. Web scraping is the technique of gathering web data and applying it for data analysis to get usable information.

Yelp's Fusion API is used:

GET https://docs.developer.yelp.com/docs/fusion-intro

GET request can be made using the URL mentioned above, it is the HTTP request which will allow to retrieve data from Yelp's search API for restaurants. Now API is known to you so, let us create an API key that will allow you to access the data. The form provided below must be completed to generate the API key, which will result in client ID with the API key.

Get API Key

Functioning Web scrapping

Now when we have the key, let us set up an environment and execute web scraping. Here Google Collab is used, which is a free open-source IDE permits us to develop and run Python code, similar to the Jupyter. The primary step is to build up the libraries, which include the following:

Functioning Web scrapping

Let's look at the functions that each library offers:

  • For getting data, request library will be used for generating the HTTP request.
  • For building the data frame that is used for analyzing data, the pandas library is employed.
  • For making an array, the Numpy library is needed.
  • itertools is a library that allows you to use a for loop to iterate over a data structure.
  • The display library will be used to present facts in an organized manner.
  • Matplotlib is a library that assists you for plotting and generating figures.

Following the installation of the libraries, the request of HTTP is made to URL and saved the database in JSON format. The data is examined after getting it to see if there were any duplicate values, null values, typos, or missing values. If any values are present in the data, data cleaning is performed to remove any of those redundant parts of information, because they are useless in data analysis. You can continue because your data from Yelp's business search API did not contain any of these redundant values. The response that is obtained by running a request in JSON format is shown below.

installation of the libraries

As it can be seen, the data includes details such as the phone number, address, rating, location, and reviews. It is completed by using Yelp's suggested parameters, that are included in the HTTP request's query string.

installation of the libraries

Another step was to use the for loop and go through the JSON text for getting the values for address, reviews, rating, phone, and name, then save the result on the database frame and print the same. The following outcome was obtained:

installation of the libraries

The data available is in a clear form but the duplicate values must be deleted.

Wish to have analysis on Yelp restaurant data?

Request a Quote!
Ideas Flow

Data Cleaning

To get rid of duplicate values, we first counted how many of them were in the database. After evaluating the data, we used the pandas method drop duplicates () for removing duplicated entries based on Phone Number, which is unique.

installation of the libraries

Duplicate values have now been eliminated from the data

Analyze the Data

After cleaning the data, you can use the describe() method to find out how many ratings and reviews of a restaurant has on an average. you learned that the number of ratings of a hotel is near '4', and the amount of reviews are near '152'.

installation of the libraries

Now when all this information is available, you can search for the major ten restaurants in the area with the best rating.

installation of the libraries

The number of restaurants represented on x-axis, while y-axis reflects the rating in the histogram above. There are over 200 restaurants having a '4.0' rating, over 150 restaurants with a '4.5' rating, and over 50 restaurants with a '5.0' rating.

Conclusion

This blog is a beginner's tutorial on restaurant data scraping and getting acquainted with the most often used libraries for online web scraping. This blog will teach you how to interact with APIs and how to use Python in performing data scraping and analysis. To sum up the findings, it is found that the most difficult part of this blog was locating the database, cleanup the data, and then understanding data scraping.

Looking for a scraping analysis of restaurants? Contact ReviewGators!

Send a message

Feel free to reach us if you need any assistance.

Contact Us

We’re always ready to help as well as answer all your queries. We are looking forward to hearing from you!

Call Us On

+1(832) 251 7311

Address

10685-B Hazelhurst Dr. # 25582 Houston,TX 77043 USA