How to Create a Sentiment Analysis Dataset by Scraping Google Play App Reviews

  11 November 2021

A guide to using Python to scrape Android app reviews and turn the data into a sentiment analysis dataset.

Let's look at how to scrape reviews and ratings for Android apps to produce a dataset for sentiment analysis. You'll convert the app and review data into Data Frames and save them to CSV files.

Running the code in a notebook (Google Colab)

Installing necessary packages and setting up the imports

You'll learn how to:

  • Establish an objective and inclusion criteria for your dataset.
  • Collect real-world user reviews by scraping Google Play.
  • Use Pandas to convert the dataset and save it to CSV files.

Setup:

import json
import pandas as pd
from tqdm import tqdm
import seaborn as sns
import matplotlib.pyplot as plt
from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter
from google_play_scraper import Sort, reviews, app
%matplotlib inline
%config InlineBackend.figure_format='retina'
sns.set(style='whitegrid', palette='muted', font_scale=1.2)

The Goal of the Dataset

You want to know what people think of your app. Both positive and negative reviews are valuable. Negative reviews, in particular, may reveal critical missing features or service disruptions (when negative reviews suddenly become much more frequent).

Fortunately, Google Play offers a huge selection of apps, ratings, and reviews. We can scrape app metadata and reviews using the google-play-scraper package.

There are plenty of apps to choose from. However, different app categories have different target audiences, domain-specific quirks, and so on. Let's start simple.

We need apps that have been around for a long time, so that feedback has accumulated organically. We want to keep the influence of advertising campaigns on the reviews to a minimum. Because apps are updated on a regular basis, the date of each review is also important.
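The first criterion can be checked programmatically once app metadata is available. The sketch below is only an illustration: it assumes app-info dictionaries with the `released` and `minInstalls` fields that appear in the scraper output later in this post, and the thresholds `MIN_AGE_YEARS` and `MIN_INSTALLS` as well as the helper `is_candidate` are hypothetical choices, not part of the original workflow.

```python
from datetime import datetime

# Hypothetical thresholds, chosen only for illustration
MIN_AGE_YEARS = 2
MIN_INSTALLS = 100_000


def is_candidate(info, today):
    """Return True if an app-info dict looks old and popular enough to include."""
    released = info.get('released')
    if not released:
        return False  # no release date, so the app's age cannot be judged
    # Release dates look like "Nov 10, 2011" (English month abbreviations)
    release_date = datetime.strptime(released, '%b %d, %Y')
    age_years = (today - release_date).days / 365.25
    installs = info.get('minInstalls') or 0
    return age_years >= MIN_AGE_YEARS and installs >= MIN_INSTALLS


# Values taken from the Any.do example shown later in this post
sample = {'appId': 'com.anydo', 'released': 'Nov 10, 2011', 'minInstalls': 10000000}
print(is_candidate(sample, today=datetime(2021, 11, 11)))  # True
```

Passing `today` explicitly keeps the check reproducible; in a notebook you would probably pass `datetime.now()`.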

In a perfect world, you'd collect every possible review and use it to your advantage. In the real world, however, data is frequently limited (too large, inaccessible, etc.). So we'll do the best we can.

Let's pick a few apps from the Productivity category that meet these criteria. We'll use AppAnnie to select some of the most popular apps in the US:

app_packages = [
    'com.anydo',
    'com.todoist',
    'com.ticktick.task',
    'com.habitrpg.android.habitica',
    'cc.forestapp',
    'com.oristats.habitbull',
    'com.levor.liferpgtasks',
    'com.habitnow',
    'com.microsoft.todos',
    'prox.lab.calclock',
    'com.gmail.jmartindev.timetune',
    'com.artfulagenda.app',
    'com.tasks.android',
    'com.appgenix.bizcal',
    'com.appxy.planner'
]

Extracting App Information

Let's scrape the information for each app:

app_infos = []

for ap in tqdm(app_packages):
    info = app(ap, lang='en', country='us')
    # Drop the embedded comments field (if present); reviews are scraped separately below
    info.pop('comments', None)
    app_infos.append(info)

We now have the information for all 15 apps. Let's write a helper function that makes printing JSON objects easier:

def print_json(json_object):
    # Serialise with sorted keys; default=str handles values such as datetimes
    json_str = json.dumps(
        json_object,
        indent=2,
        sort_keys=True,
        default=str
    )
    print(highlight(json_str, JsonLexer(), TerminalFormatter()))

Here's an example of app data from the list:

print_json(app_infos[0])

{
  "adSupported": null,
  "androidVersion": "Varies",
  "androidVersionText": "Varies with device",
  "appId": "com.anydo",
  "containsAds": null,
  "contentRating": "Everyone",
  "contentRatingDescription": null,
  "currency": "USD",
  "description": "\ud83c\udfc6 Editor's Choice by Google\r\n\r\nAny.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done.\r\n\r\n\ud83e\udd47 \"It\u2019s A MUST HAVE PLANNER & TO DO LIST APP\" (NYTimes, USA TODAY, WSJ & Lifehacker).\r\n\r\nAny.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more.\r\n\r\n\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds\r\n\r\n\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side.\r\n\r\n\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event.\r\n\r\n\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp.\r\n\r\n\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done. \r\n\r\n---\r\n\r\nALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE\r\nCreate and set reminders with voice to your to do list. \r\nFor better task management flow we added a calendar integration to keep your agenda always up to date. \r\nFor better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments. 
\r\nTo keep your to do list up to date, we\u2019ve added a daily planner and focus mode.\r\n\r\nINTEGRATIONS\r\nAny.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More.\r\n\r\nTO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE\r\nDesigned to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it.\r\n\r\nPOWERFUL TO DO LIST TASK MANAGEMENT\r\nAdd a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. Attach files from your computer, Dropbox, or Google Drive to your to- tasks.\r\n\r\nDAILY PLANNER & LIFE ORGANIZER\r\nAny.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have. \r\n\r\nSHARE LISTS, ASSIGN & ORGANIZE TASKS\r\nTo plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list.\r\n\r\nGROCERY LIST & SHOPPING LIST\r\nAny.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.",
  "descriptionHTML": "\ud83c\udfc6 Editor's Choice by Google

Any.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done.

\ud83e\udd47 "It\u2019s A MUST HAVE PLANNER & TO DO LIST APP" (NYTimes, USA TODAY, WSJ & Lifehacker).

Any.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more.

\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds

\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side.

\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event.

\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp.

\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done.

---

ALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE
Create and set reminders with voice to your to do list.
For better task management flow we added a calendar integration to keep your agenda always up to date.
For better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments.
To keep your to do list up to date, we\u2019ve added a daily planner and focus mode.

INTEGRATIONS
Any.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More.

TO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE
Designed to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it.

POWERFUL TO DO LIST TASK MANAGEMENT
Add a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. Attach files from your computer, Dropbox, or Google Drive to your to- tasks.

DAILY PLANNER & LIFE ORGANIZER
Any.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have.

SHARE LISTS, ASSIGN & ORGANIZE TASKS
To plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list.

GROCERY LIST & SHOPPING LIST
Any.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.",
  "developer": "Any.do Calendar & To-Do List",
  "developerAddress": "Any.do Inc.\n\n6 Agripas Street, Tel Aviv\n6249106 ISRAEL",
  "developerEmail": "feedback+androidtodo@any.do",
  "developerId": "5304780265295461149",
  "developerInternalID": "5304780265295461149",
  "developerWebsite": "https://www.any.do",
  "free": true,
  "genre": "Productivity",
  "genreId": "PRODUCTIVITY",
  "headerImage": "https://lh3.googleusercontent.com/dZknnlk1LM8fYS3wjOvVHOmWKOGH1HAe691Yuh7LAeBj6a730A1CQqZnXxjNahAYUFFw",
  "histogram": [27291, 9246, 13735, 29904, 262997],
  "icon": "https://lh3.googleusercontent.com/zgOLUXCHkF91H8xuMTMLT17smwgLPwSBjUlKVWF-cZRFjlv-Uvtman7DiHEii54fbEE",
  "installs": "10,000,000+",
  "minInstalls": 10000000,
  "offersIAP": true,
  "price": 0,
  "privacyPolicy": "https://www.any.do/privacy",
  "ratings": 343174,
  "recentChanges": "Faster and smoother for better user experience!",
  "recentChangesHTML": "Faster and smoother for better user experience!",
  "released": "Nov 10, 2011",
  "reviews": 122170,
  "score": 4.43388,
  "screenshots": [
    "https://lh3.googleusercontent.com/C-L3_FPMlKVrZItAORaszhnQzlzMyXcqF_-oGaabHm_OnwUW1jz02BXBVSKi0HRUtQ",
    "https://lh3.googleusercontent.com/uAP6G5ANQcgVs4Uj6yrcsAo4OUhejTJRVCXOxnAVA5Efit_OtAnrOYyL1SUHj1rv",
    "https://lh3.googleusercontent.com/AI5mLFu0Atsl0km2FO9_IwJXNy_1q1_X6Ua3EVMZNedp0dsDToDRaWQ1UDvI6mb1-I0",
    "https://lh3.googleusercontent.com/bYCAn3mjgB4ugSY0PL-PCcMBfbvXCSFkzL-pLSIIbZ8sQByQPerHboPQ2fA126K4LDtU",
    "https://lh3.googleusercontent.com/u-dX4lpTepsvXs33ds4xxYpApuGS4JBAEb0UsvY_fPbptxnF0QxaKNW0-tJVXaP8a1E",
    "https://lh3.googleusercontent.com/qvUz_9IXHQd6FSLUALZo8NKLx-s4uDGyElPOGRsU28TCEficQc0BoNRloRRLqUkH2A",
    "https://lh3.googleusercontent.com/tEyGs6MGlY97ccLc4c_HxV9xNOpsvwQyHz6uGAezkVtxm1ydAaTj5EZSUgqlg69qrrk",
    "https://lh3.googleusercontent.com/StN0i2BskOs6HCfaPO0DMBOCQMCag3okWVI_SlFJtMytwbgNMBnD5i9hbSqdNlGxffmn",
    "https://lh3.googleusercontent.com/GRKqWfo-PLzCKwpgZ8fej4PGsUp1q9eM5a3LQeiYCOW-KUpCOIHXOp3mteZWbJ-pz4My",
    "https://lh3.googleusercontent.com/pFQQ_qi8u92duWCNXpEcNKpH2lVpD_hFd5f-UlTP_f6wft3YyYLMzwLitxt-UI6G8vs",
    "https://lh3.googleusercontent.com/AoeCU6bT1x0eHRvJwvQyOSKJ31oSayox959qMNVaSzz3uN9bvk1cGek5zyRDe1BdtA",
    "https://lh3.googleusercontent.com/vICme1f4J9vFt8wY3xBY-LshGgYyvSbsa4TLJyEtNsy0alUI0i9oMQVq8oJ4l_yR1Aw",
    "https://lh3.googleusercontent.com/7sn9m__iVM-peiG6_jkKBuE-QVH_xDaycF_oR1XJlwcAC45ybNZ_Exor09ENOJ41Q2U",
    "https://lh3.googleusercontent.com/9I_m2ZXgPtiU4Po4cw_cyIaEpZxynxQ1n3YkhFgakATfbu63a8_f8vGQDxKOHYITzew"
  ],
  "size": "Varies with device",
  "summary": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "summaryHTML": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "title": "Any.do: To do list, Calendar, Planner & Reminders",
  "updated": 1586258773,
  "url": "https://play.google.com/store/apps/details?id=com.anydo&hl=en&gl=us",
  "version": "Varies with device",
  "video": "https://www.youtube.com/embed/2nkllLD0x6o?ps=play&vq=large&rel=0&autohide=1&showinfo=0",
  "videoImage": "https://i.ytimg.com/vi/2nkllLD0x6o/hqdefault.jpg"
}

This gives us a great deal of information, including the number of ratings, the number of reviews, and the number of ratings for each score (1 to 5) in the histogram field. Let's set all of that aside for a moment and have a look at the apps' lovely icons:

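import io
import urllib.request
from PIL import Image

def format_title(title):
    # Keep only the part before the first ':' (or '-' if there is no ':')
    sep_index = title.find(':') if title.find(':') != -1 else title.find('-')
    if sep_index != -1:
        title = title[:sep_index]
    return title[:10]

# Two rows of icons; with 15 apps this shows the first 14
fig, axs = plt.subplots(2, len(app_infos) // 2, figsize=(14, 5))

for i, ax in enumerate(axs.flat):
    ai = app_infos[i]
    # Recent Matplotlib releases no longer accept a URL in plt.imread,
    # so download the icon and decode it with Pillow instead
    with urllib.request.urlopen(ai['icon']) as response:
        img = Image.open(io.BytesIO(response.read()))
    ax.imshow(img)
    ax.set_title(format_title(ai['title']))
    ax.axis('off')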
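Before moving on, the `histogram` field from the app information above is worth a quick look, because it shows how skewed the raw ratings are. The following is a minimal sketch that uses the Any.do numbers copied from the sample output; it assumes (as the text above suggests) that the list holds the rating counts for scores 1 to 5, in that order.

```python
# Rating counts for scores 1..5, copied from the "histogram" field of the Any.do sample
histogram = [27291, 9246, 13735, 29904, 262997]

total = sum(histogram)

# Share of all ratings that each score received
shares = {score: count / total for score, count in enumerate(histogram, start=1)}

# Average rating implied by the histogram
mean_score = sum(score * count for score, count in enumerate(histogram, start=1)) / total

for score, share in shares.items():
    print(f'{score} stars: {share:.1%}')
print(f'mean score: {mean_score:.2f}')
```

The implied average comes out at about 4.43, in line with the `score` field in the sample, and roughly three quarters of the ratings are 5 stars. That imbalance is why the review scraping below filters by score instead of taking reviews as they come.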

We can save the app information for later by converting the JSON objects into a Pandas data frame and saving the output to a CSV file:

app_infos_df = pd.DataFrame(app_infos)
app_infos_df.to_csv('apps.csv', index=False, header=True)

Scraping App Reviews

The scraping package can filter reviews by score, which lets us build a balanced dataset. It can also sort reviews by helpfulness, which Google Play treats as the most relevant ordering, so we can draw a sample of reviews for each app.

We're looking for:

  • A well-balanced dataset — each score (1–5) has nearly the same number of reviews.
  • A representative sample of each app's reviews

We can satisfy the first criterion by using the scraping package's filter_score_with option to filter reviews by score. For the second, we'll sort the reviews by helpfulness (Sort.MOST_RELEVANT), which are the reviews Google Play considers most important. Just in case, we'll also fetch a subset of the most recent reviews (Sort.NEWEST):

app_reviews = []

for ap in tqdm(app_packages):
    for score in range(1, 6):
        for sort_order in [Sort.MOST_RELEVANT, Sort.NEWEST]:
            rvs, _ = reviews(
                ap,
                lang='en',
                country='us',
                sort=sort_order,
                count=200 if score == 3 else 100,
                filter_score_with=score
            )
            # Tag each review with the sort order and app it came from
            for r in rvs:
                r['sortOrder'] = 'most_relevant' if sort_order == Sort.MOST_RELEVANT else 'newest'
                r['appId'] = ap
            app_reviews.extend(rvs)
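Note that the loop requests 200 reviews for score 3 and 100 for every other score. One plausible reading, which is an assumption on our part since the post does not spell it out, is that the scores will later be grouped into three sentiment classes (1 and 2 negative, 3 neutral, 4 and 5 positive). Under that hypothetical grouping, each class is asked for the same number of reviews. The sketch below tallies the requested counts; `sentiment_class` is an illustrative helper, not part of the scraping package.

```python
from collections import Counter

N_APPS = 15      # number of packages in app_packages
SORT_ORDERS = 2  # Sort.MOST_RELEVANT and Sort.NEWEST


def requested_count(score):
    """Number of reviews requested per call, mirroring the loop above."""
    return 200 if score == 3 else 100


def sentiment_class(score):
    """Hypothetical grouping of star ratings into three sentiment classes."""
    if score <= 2:
        return 'negative'
    if score == 3:
        return 'neutral'
    return 'positive'


per_class = Counter()
for score in range(1, 6):
    per_class[sentiment_class(score)] += requested_count(score) * SORT_ORDERS

# Largest number of reviews the loop can return across all apps
upper_bound = N_APPS * sum(per_class.values())

print(dict(per_class))  # {'negative': 400, 'neutral': 400, 'positive': 400}
print(upper_bound)      # 18000
```

The figure of 18000 is only an upper bound on what was requested. A run returns fewer reviews whenever an app has less than the requested number of reviews for a given score.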

Note that we added the app id and the sort order to each review. Here's an example:

print_json(app_reviews[0])

{
  "appId": "com.anydo",
  "at": "2020-04-05 22:25:57",
  "content": "Update: After getting a response from the developer I would change my rating to 0 stars if possible. These guys hide behind confusing and opaque terms and refuse to budge at all. I'm so annoyed that my money has been lost to them! Really terrible customer experience. Original: Be very careful when signing up for a free trial of this app. If you happen to go over they automatically charge you for a full years subscription and refuse to refund. Terrible customer experience and the app is just OK.",
  "repliedAt": "2020-04-07 14:09:03",
  "replyContent": "Our policy and TOS are completely transparent and can be found in the Help Center and our main page. In addition, payment can only be made upon the user's authorization via the app and Google Play. We provide users with a full 7 days trial to test the app with an additional 48 hours for a refund, along with priority support for all issues.",
  "reviewCreatedVersion": "4.17.0.3",
  "score": 1,
  "sortOrder": "most_relevant",
  "thumbsUpCount": 37,
  "userImage": "https://lh3.googleusercontent.com/a-/AOh14GiHdfNEu1DwwcJ6yNyju8Yvn4JwjpzuXvD74aVmDA",
  "userName": "Andrew Thomas"
}


repliedAt and replyContent hold the developer's response to the review. These fields can be missing, for example when the developer has not replied.
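Two small clean-up steps follow naturally from this. Because every score is fetched under two sort orders, the same review can plausibly show up twice, and the reply fields need to be read defensively. The sketch below is one way to handle both. It assumes review dictionaries shaped like the example above; `deduplicate` and `has_reply` are hypothetical helpers, and the key used to detect duplicates (app, user, timestamp, text) is our own choice.

```python
def deduplicate(review_list):
    """Drop repeated reviews, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for r in review_list:
        # Identify a review by app, user, timestamp and text (an assumed key)
        key = (r['appId'], r['userName'], str(r['at']), r['content'])
        if key in seen:
            continue
        seen.add(key)
        unique.append(r)
    return unique


def has_reply(review):
    """True if the developer replied; tolerates a missing or empty field."""
    return bool(review.get('replyContent'))


# Made-up reviews for illustration only
sample_reviews = [
    {'appId': 'com.anydo', 'userName': 'A', 'at': '2020-04-05 22:25:57',
     'content': 'Too expensive', 'replyContent': 'Sorry to hear that.',
     'sortOrder': 'most_relevant'},
    {'appId': 'com.anydo', 'userName': 'A', 'at': '2020-04-05 22:25:57',
     'content': 'Too expensive', 'replyContent': 'Sorry to hear that.',
     'sortOrder': 'newest'},
    {'appId': 'com.todoist', 'userName': 'B', 'at': '2020-04-06 10:00:00',
     'content': 'Works well', 'replyContent': None, 'sortOrder': 'newest'},
]

unique_reviews = deduplicate(sample_reviews)
print(len(unique_reviews))                      # 2
print([has_reply(r) for r in unique_reviews])   # [True, False]
```

Whether to drop such duplicates depends on the goal: keeping them preserves the requested count per score, while removing them avoids counting the same text twice.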

How many reviews did we get?

len(app_reviews)
15750

Let's save the reviews to a CSV file:

app_reviews_df = pd.DataFrame(app_reviews)
app_reviews_df.to_csv('reviews.csv', index=False, header=True)
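Since balance across scores was one of the goals, it is worth checking it on the saved file. The sketch below uses only the standard library so it can run outside the notebook; it assumes the CSV has a `score` column, as the review example above suggests, and `score_counts` is a hypothetical helper. In the notebook itself, `app_reviews_df['score'].value_counts()` gives the same tally.

```python
import csv
import io
from collections import Counter


def score_counts(csv_file):
    """Count reviews per star rating from an open CSV file with a 'score' column."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row['score']) for row in reader)


# Tiny in-memory stand-in for reviews.csv (made-up rows, for illustration only)
sample_csv = io.StringIO(
    'appId,score,content\n'
    'com.anydo,1,Too many bugs\n'
    'com.anydo,5,Love it\n'
    'com.todoist,5,Great app\n'
)

counts = score_counts(sample_csv)
print(sorted(counts.items()))  # [(1, 1), (5, 2)]
```

For the real file, open it with `open('reviews.csv', newline='', encoding='utf-8')` and pass the file object to `score_counts`.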

Summary

We now have over 15K user reviews from 15 different productivity apps.

You learned how to:

  • Define the goals and expectations for your dataset.
  • Scrape Google Play app information.
  • Scrape user reviews for Google Play apps.
  • Save the data to CSV files.

Following that, we'll use BERT to analyse the reviews for sentiment.

Are you looking for a way to harvest reviews from Google Play? Request a quotation from ReviewGators today.
