
Build a Text Dataset from AMAZON

Learn how to collect data from webpages using web scraping techniques. This tutorial covers the basics of web scraping, including downloading webpages using the Requests library and parsing them using BeautifulSoup. The dataset used in this tutorial contains approximately 12,000 reviews for 180 laptops, with a total of about 712,000 review words. Each review varies in length, ranging from 2 words to about 600 words, with an average of about 60 words.

gnatali

Presentation Transcript


  1. Build a Text Dataset from AMAZON. Raymond ZHAO Wenlong (03/07/2018)

  2. Collect data in the data age: In statistical ML, DL, and NLP, large volumes of data are key. We can collect that data from the World Wide Web.

  3. HTML: HTML stands for HyperText Markup Language. It describes the structure of web pages.

  4. Web scraping: Download the webpage and parse it.

  5. The process: Download: Requests is an HTTP library. Parse: BeautifulSoup parses a web page. See the developed script amazon_scraper.py.

  6. The dataset: There are about 12k reviews for 180 laptops, with about 712k review words in total. Reviews range from 2 words to about 600 words; the mean is about 60 words. See the AMAZON dataset amazon_reviews.json.

  7. Thanks: Thanks to Dr. Wong, David, and Linkai.
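
The download-then-parse process from slide 5 and the word-count figures from slide 6 can be illustrated with a minimal sketch. This is not the amazon_scraper.py script mentioned in the slides, which is not shown in the transcript. The CSS class `review-text`, the function names, and the sample HTML are all hypothetical, and words are assumed to be counted by splitting on whitespace. Real product pages use different markup and may restrict automated access.

```python
import statistics

import requests
from bs4 import BeautifulSoup

# Hypothetical CSS selector for a review body; real pages will differ.
REVIEW_SELECTOR = "span.review-text"


def fetch_html(url: str) -> str:
    """Download one page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_reviews(html: str) -> list[str]:
    """Extract the text of every element matching REVIEW_SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(" ", strip=True) for tag in soup.select(REVIEW_SELECTOR)]


def word_stats(reviews: list[str]) -> dict[str, float]:
    """Whitespace-split word counts: total, shortest, longest, and mean.

    Assumes at least one review; an empty list would raise an error.
    """
    counts = [len(review.split()) for review in reviews]
    return {
        "reviews": len(counts),
        "total_words": sum(counts),
        "min_words": min(counts),
        "max_words": max(counts),
        "mean_words": statistics.mean(counts),
    }


# Made-up page used for an offline demonstration (no network access needed).
SAMPLE_HTML = """
<html><body>
  <span class="review-text">Great laptop</span>
  <span class="review-text">The battery lasts all day long</span>
</body></html>
"""

if __name__ == "__main__":
    reviews = parse_reviews(SAMPLE_HTML)
    print(word_stats(reviews))
```

To scrape a live page under these assumptions, one would pass the result of `fetch_html(url)` to `parse_reviews` instead of `SAMPLE_HTML`, after adjusting the selector to the page's actual markup.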
