#5, First Floor, 4th Street , Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 pro@slogix.in

How to perform Web scraping using R?
Description

To scrape data from a web page using R: read the page's HTML, select the nodes of interest, and extract their text.

Functions Used

read_html(url) – To read the HTML content from a given URL
html_nodes(x, ".class" or "#id") – To select nodes based on CSS class or id
html_text(x) – To strip the HTML tags and extract only the text
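
As a minimal sketch of how these three functions fit together, the snippet below parses a small inline HTML string instead of a live URL, so it runs without network access. The HTML fragment, the id `title` and the class `price` are invented for illustration and do not come from any real site.

```r
library(xml2)
library(rvest)

# Hypothetical HTML fragment standing in for a downloaded page
page <- read_html('<html><body>
  <h1 id="title">Sample Book</h1>
  <span class="price">Rs. 250</span>
  <span class="price">Rs. 300</span>
</body></html>')

# Select one node by id (#) and several nodes by class (.)
title_node  <- html_nodes(page, "#title")
price_nodes <- html_nodes(page, ".price")

# Strip the tags and keep only the text
title  <- html_text(title_node)
prices <- html_text(price_nodes)

print(title)
print(prices)
```

Here `title` holds a single string and `prices` holds one string per matching node, which is why html_nodes is usually followed by indexing or head() when only the first match is wanted.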

Libraries required:

library('selectr')
library('xml2')
library('rvest')
library('stringr')

Process

  Load necessary libraries

  Get the URL of the web page

  Read the contents of the web page

  Get the necessary details from the web page using the functions listed above, with CSS selectors that match the corresponding HTML tags

  Right-click the data in the web page and click Inspect to find the HTML tags of that particular data
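
The steps above can be sketched as one small helper. This is only an illustration: `scrape_text` is a hypothetical function name, and the inline HTML and the selector `div.book-title` are invented. On a real page, the selector would be whatever the Inspect panel shows for the data of interest.

```r
library(xml2)
library(rvest)

# Hypothetical helper: read a page, select nodes by CSS selector, return their text
scrape_text <- function(src, selector) {
  page  <- read_html(src)              # src may be a URL, a file path or literal HTML
  nodes <- html_nodes(page, selector)  # selector found via right-click > Inspect
  html_text(nodes, trim = TRUE)        # trim = TRUE drops leading/trailing whitespace
}

# Invented fragment used in place of a live URL so the sketch runs offline
html <- '<html><body><div class="book-title">
  The Alchemist
</div></body></html>'

result <- scrape_text(html, "div.book-title")
print(result)
```

Keeping the selector as an argument means the same helper can be reused for each field (title, price, rating) once its selector is known.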

Sample Code

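library(rvest)
library(xml2)
library(stringr)
# Specify the URL of the website to be scraped
url1 <- "https://www.indiabookstore.net/isbn/9781606868829"
# Read the HTML content from the web page
webpage1 <- read_html(url1)
webpage1
# NOTE: apart from the book description, the CSS selectors are missing from this listing.
# The upper-case strings below are placeholders; replace each with the selector found via Inspect.
# To get the name of the book
title_html <- html_nodes(webpage1, "TITLE_SELECTOR")
title_html
title <- html_text(title_html)
head(title)
# Remove carriage returns, tabs and new lines
str_replace_all(title, "[\r\t\n]", "")
# To get the price
price_html <- html_nodes(webpage1, "PRICE_SELECTOR")
price <- html_text(price_html)
head(price)
# To get the book description
bookdes_html <- html_nodes(webpage1, "span.readable_box_small")
book_des <- html_text(bookdes_html)
head(book_des[1])
# To get the ratings
rating_html <- html_nodes(webpage1, "RATING_SELECTOR")
rating <- html_text(rating_html)
head(rating)
str_replace_all(rating, "[\r\t\n]", "")
# To get the number of ratings and reviews
ra_re_html <- html_nodes(webpage1, "RATINGS_REVIEWS_SELECTOR")
ra_re <- html_text(ra_re_html)
head(ra_re[1])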
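
To show the clean-up step in isolation, the sketch below applies the same str_replace_all pattern to a made-up string. `raw_title` is invented sample input, not output from the site. The pattern removes carriage returns, tabs and new lines but leaves ordinary spaces, so str_trim is added here as an optional extra step to drop the leading and trailing spaces.

```r
library(stringr)

# Invented example of scraped text padded with layout whitespace
raw_title <- "\n\t  Charlotte's Web \r\n"

# Same pattern as in the sample code: removes \r, \t and \n only
no_breaks <- str_replace_all(raw_title, "[\r\t\n]", "")

# Optional extra step: remove the remaining leading/trailing spaces
clean <- str_trim(no_breaks)

print(no_breaks)
print(clean)
```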
