Sometimes you have to do a lot of work to get the answer to a simple question. What goods do your customers prefer, and why? What products do they dislike, and why?
Product reviews on Amazon are a perfect indicator of your customers’ mood and a direct rating of your work as a seller. The same can be said about your business competitors (and, to an even greater extent, about manufacturers) and the reviews under their products.
Positive reviews can be used as part of your advertising. By showing them to your audience you are literally saying: “Look, many people liked this product, that’s why you should try it too!” Negative opinions are a chance to pay attention to your customers’ problems. By replying to these reviews you show yourself as a polite person who is ready to help, able to hear your customers and resolve their issues.
Frankly, it can be really hard or even impossible to read all the texts under the thousands of products you sell via an Internet marketplace if you are armed only with a browser and a mouse.
So before we get into reading and content analysis, we should decide how to download Amazon reviews automatically. The data you need depends on how deep and complicated your research is supposed to be. But in the vast majority of cases, for each review you should gather:
- Product Name
- Review Title
- Review Content/Text
- Rating
- Review Publication Date
- Verified Purchase
- Author Name
- URL
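The fields listed above can be sketched as a small record type. This is only an illustration: the class name, field names, types, and sample values are assumptions made for the example and do not come from Amazon or from any of the tools discussed in this article.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class Review:
    """One scraped review; field names are illustrative, not an Amazon schema."""
    product_name: str
    title: str
    text: str
    rating: float            # star rating, e.g. 4.0 out of 5
    published: date          # date the review was posted
    verified_purchase: bool
    author: str
    url: str


# Made-up sample values, used only to show the shape of a record.
example = Review(
    product_name="Example Kettle",
    title="Works well",
    text="Boils fast and is easy to clean.",
    rating=4.0,
    published=date(2020, 1, 15),
    verified_purchase=True,
    author="J. Doe",
    url="https://example.com/review/1",
)

# A plain dict is convenient for later export to CSV or JSON.
record = asdict(example)
```

Keeping every review in one fixed shape makes later steps such as filtering by rating or exporting to a file simpler, whichever scraping tool produces the data.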
There are a lot of ways to download or scrape this data, and we will take a closer look at them below.
Software for downloading Amazon reviews
For those who prefer ready-to-use services, here is a short list of software for downloading Amazon reviews.
- Review Fetch is a SaaS (Software as a Service) tool that helps you automatically display your customer reviews on your site. It may be really handy if you already have a good reputation as a seller and want to increase sales by displaying the good reviews on your marketplace.
- OutScraper is a general-purpose web data extractor offering an Amazon reviews scraper as one of its services. The API section for extracting reviews deserves special attention, as it can act as a liaison between Amazon and your site or any other client software that works with reviews. Note that you will need to authenticate to view this API section.
- The Amazon reviews scraper from Infovium is part of a large e-commerce scraping tool focused on the most complex data extraction from Amazon products for price comparison. You can use it to work with reviews too, but it may seem too complicated for this specific task.
- Grepsr is a nice, modern-looking web data scraping tool that can download Amazon product reviews. It supports any Amazon domain and has built-in protection against blocking, plus QA checks that keep changes on Amazon’s side from affecting your data parsing process. It also offers synchronization with popular storage systems (Dropbox, Google Drive, Amazon S3, etc.) or an option to download Amazon reviews in CSV or JSON format.
Scrape Amazon reviews with Python
The following section may be useful for Python programmers who are looking for a quick example of Amazon reviews scraping.
Python is a very flexible programming language and is definitely good for web scraping, but the exact way to use it is always up to you. Without questioning your creativity, we’ll just say that reinventing the wheel is considered bad practice among programmers, and in most cases it is better to study someone else’s code than to write your own from scratch.
Let’s take a look at two articles describing how to legally scrape Amazon reviews with Python.
Python + SelectorLib
The first solution is from Scrape Hero. Besides pure Python, they use a Chrome extension called SelectorLib to mark up the data to be parsed through the Developer Toolbar and to create a template with all the selectors we will need during scraping. This template is saved as a YML file that the Python script can read. Inside the script, it takes a couple of lines of code to initialize and call the SelectorLib Extractor, which gets data from the page using the YML template.
On the one hand, most of the work in writing a scraper goes into the selectors and the data extraction logic itself, so it is a good idea to separate this inner logic from the surrounding code that iterates through pages and URLs, makes requests, and writes the results to files or databases. If the data extraction logic changes, we simply change the YML file without rewriting the script.
On the other hand, the same data extraction logic can be written in the usual manner as a separate Python function, called in one line of code after getting the HTML from the response. For most programmers this is more convenient than dealing with an external extraction tool.
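Both ideas fit in one small sketch: the selectors live in a template that is separate from the code, and the extraction itself is one function called in a single line. Note that this is not the SelectorLib API. It uses only the standard library’s xml.etree.ElementTree as a stand-in, so it assumes well-formed markup, which real Amazon pages are not guaranteed to be. The class names in the template and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: field name -> XPath relative to one review node.
# In the SelectorLib approach, a mapping like this would live in the YML file.
TEMPLATE = {
    "title": ".//a[@class='review-title']",
    "text": ".//span[@class='review-text']",
    "rating": ".//i[@class='review-rating']",
}
REVIEW_XPATH = ".//div[@class='review']"


def extract_reviews(markup, template=TEMPLATE, review_xpath=REVIEW_XPATH):
    """Apply the template to every review node and return a list of dicts."""
    root = ET.fromstring(markup)
    reviews = []
    for node in root.findall(review_xpath):
        row = {}
        for field, xpath in template.items():
            found = node.find(xpath)
            # A missing or empty element becomes None instead of raising.
            has_text = found is not None and found.text
            row[field] = found.text.strip() if has_text else None
        reviews.append(row)
    return reviews


# Made-up, well-formed sample markup; the second review has no rating element.
SAMPLE = """
<html><body>
  <div class="review">
    <a class="review-title">Great kettle</a>
    <i class="review-rating">5.0 out of 5 stars</i>
    <span class="review-text">Boils fast.</span>
  </div>
  <div class="review">
    <a class="review-title">Too noisy</a>
    <span class="review-text">Works, but loud.</span>
  </div>
</body></html>
"""

reviews = extract_reviews(SAMPLE)  # the one-line call after getting the markup
```

If the page layout changes, only TEMPLATE needs editing. Whether that mapping sits in a YML file or in a Python dict next to the function is largely a matter of taste.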
Python + Scrapy
The second article describing how to scrape Amazon reviews the Python way is from Usession Buddy. It offers the classic way of crawling data with Scrapy. They inspect the data patterns on a web page with DevTools and create a simple Scrapy Spider that uses CSS selectors to extract the reviews. The output data is saved into a CSV file.
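Scrapy can export scraped items to CSV on its own, so the following is not taken from that article. It is a minimal standard-library sketch of the same last step, assuming the reviews are already available as a list of dicts. The column names and sample rows are made up for the example.

```python
import csv
import io

FIELDS = ["product_name", "title", "rating", "text"]  # illustrative columns


def reviews_to_csv(reviews, fields=FIELDS):
    """Serialize a list of review dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" silently drops keys that are not in `fields`.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reviews)
    return buffer.getvalue()


rows = [
    {"product_name": "Example Kettle", "title": "Great kettle", "rating": 5, "text": "Boils fast."},
    {"product_name": "Example Kettle", "title": "Too noisy", "rating": 2, "text": "Works, but loud."},
]
csv_text = reviews_to_csv(rows)
```

To write a real file instead of a string, the same writer can be given a file opened with open(path, "w", newline="", encoding="utf-8").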
Compared to the Scrape Hero code, this solution has no error handling or any protection against blocking. But the compact and easy-to-read script can be extended at your discretion. Scrapy makes asynchronous requests when working with a list of URLs, so it should be faster than the SelectorLib-based script when there are many pages to fetch.
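Why asynchronous requests help can be shown with a toy example. The sketch below uses asyncio from the standard library with a simulated delay instead of real network calls. It illustrates the general idea only, not how Scrapy is implemented, and the URLs and the fetch function are placeholders.

```python
import asyncio


async def fetch(url, delay=0.05):
    """Stand-in for an HTTP request: waits, then returns fake page text."""
    await asyncio.sleep(delay)  # simulated network latency
    return f"<html>reviews for {url}</html>"


async def fetch_all(urls):
    """Start all requests at once; results come back in input order."""
    return await asyncio.gather(*(fetch(u) for u in urls))


# Placeholder URLs, not real Amazon addresses.
urls = [f"https://example.com/product/{i}/reviews" for i in range(5)]
pages = asyncio.run(fetch_all(urls))
```

Because the five simulated requests wait concurrently, the total time is close to one delay rather than five. A loop that fetched each URL in turn would pay the delay once per page.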
Conclusion
We covered the main goals of Amazon review scraping and took a quick look at some tools and tips that may help you with data extraction.