How to scrape data using Node.js

Sometimes we need data for testing or other purposes, and as a developer you are not going to enter it all by hand. Instead of doing that manually, how about taking the data from a live website? Wait, what? How can somebody take data from a running website they do not own, without access to its database? It is possible, and the technique is called scraping: a program downloads the page's HTML and extracts the parts you need. Let's see how to scrape data using Node.js.

In this article, we are going to scrape mobile phone data from amazon.in. For reference, I am using the link below. If it is broken in the future, please leave a comment in the comment box below and I will fix it.

https://www.amazon.in/b/ref=s9_acss_bw_cg_CPACSM20_6c1_w?node=21505777031&pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-20&pf_rd_r=5XSNDP2G44RCF9N66HD0&pf_rd_t=101&pf_rd_p=f987b4ed-d4a8-4c28-8154-cf72460802c9&pf_rd_i=1389401031

Prerequisites

Node.js

If you don't have Node.js set up, follow Install NodeJS in Windows/Linux/macOS.

Project Setup

To set up the project, open your terminal and run the commands below.

mkdir amazon-scrapper

cd amazon-scrapper

npm init  # this will ask some questions; just follow along.

npm install x-ray

touch index.js

Once these commands have completed, we are ready to start writing code.

Code time

In the previous step, we installed the x-ray library from npm. This library will help us scrape data from amazon.in.

To begin, open index.js and add the following code to it.

Step 1 – Import the required libraries.

const fs = require('fs');
const XRay = require('x-ray');

fs is needed because once the scraping is done we will save the data to a file, and x-ray is needed for the scraping itself.
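To make the role of fs concrete, here is a minimal sketch of saving an array as JSON and reading it back. The records are made up for illustration, and the file goes to a temporary directory (an assumption of this sketch, so that it does not touch the project folder).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Made-up records standing in for scraped results.
const records = [{ title: 'Example Phone', price: '9,999' }];

// Write to a fresh temporary directory instead of the project folder.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'scrape-'));
const file = path.join(dir, 'data.json');

// The third argument to JSON.stringify (2) pretty-prints with two-space indentation.
fs.writeFileSync(file, JSON.stringify(records, null, 2));

// Read the file back and parse it to confirm the round trip.
const loaded = JSON.parse(fs.readFileSync(file, 'utf8'));
console.log(loaded);
```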

Step 2 – Create an x-ray instance.

const x = XRay();

Step 3 – Add the scraping logic.

const url =
	'https://www.amazon.in/b/ref=s9_acss_bw_cg_CPACSM20_6c1_w?node=21505777031&pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-20&pf_rd_r=5XSNDP2G44RCF9N66HD0&pf_rd_t=101&pf_rd_p=f987b4ed-d4a8-4c28-8154-cf72460802c9&pf_rd_i=1389401031';

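// For every <li> directly inside .s-result-list, collect the fields below.
x(url, '.s-result-list > li', [
	{
		path: '.a-link-normal@href', // href attribute of the product link
		title: 'a h2', // text of the product title
		price: '.s-price', // text of the price element
	},
])
	.then(res => {
		console.log(res);

		// Save the results as pretty-printed JSON.
		fs.writeFileSync('data.json', JSON.stringify(res, null, 2));
	})
	.catch(err => {
		console.error('ERROR', err);
	});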

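As you can see, I am passing three parameters to the x() function.

First parameter: the URL of the website we are scraping data from.

Second parameter: the scope, a CSS selector that matches each item in the result list (here, every li that is a direct child of the element with class s-result-list).

Third parameter: an array containing one object of keys and values. The keys can be anything, but each value is a CSS selector, relative to the second parameter, that points at the element where the actual data lives. A suffix such as @href extracts that attribute instead of the element's text. You can inspect the page's DOM on the given link to find these selectors.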
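The values in the object, such as '.a-link-normal@href' and 'a h2', follow a "selector@attribute" pattern. The sketch below only illustrates how such a string can be read. It is not x-ray's internal parser, and splitSelector is a hypothetical helper name. It also assumes the selector itself contains no "@" character.

```javascript
// Illustration only: NOT x-ray's own parser, just a way to read the notation.
function splitSelector(value) {
  const at = value.lastIndexOf('@');
  if (at === -1) {
    // No "@": the element's text is what gets extracted.
    return { selector: value, attribute: null };
  }
  return {
    selector: value.slice(0, at), // the CSS selector part
    attribute: value.slice(at + 1), // the attribute to read, e.g. "href"
  };
}

console.log(splitSelector('.a-link-normal@href'));
console.log(splitSelector('a h2'));
```

So path asks for the href attribute of the link, while title asks for the text inside the h2.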

Step 4 – Execute / Run the index.js file

Type the following command in the terminal. It will create a data.json file containing the scraped information in JSON format.

node index.js
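Scraped text often carries stray whitespace, and prices arrive as strings. As an optional follow-up, here is a minimal sketch of tidying records shaped like the ones in data.json. The sample values are made up, the helper names (tidyText, parsePrice, cleanRecords) are hypothetical, and the price format "12,999.00" is an assumption that may not match what the page actually returns.

```javascript
// Hypothetical sample shaped like the objects written to data.json.
// The values are made up for illustration, not real scraped output.
const sample = [
  {
    path: 'https://www.amazon.in/example-phone',
    title: '  Example Phone\n (Black, 64GB) ',
    price: ' 12,999.00 ',
  },
  { path: 'https://www.amazon.in/another-phone', title: 'Another Phone' },
];

// Collapse runs of whitespace and trim the ends; non-strings become null.
function tidyText(value) {
  return typeof value === 'string' ? value.replace(/\s+/g, ' ').trim() : null;
}

// Keep digits and the decimal point, then convert to a number.
// Returns null when the field is missing or nothing numeric is left.
function parsePrice(value) {
  if (typeof value !== 'string') return null;
  const digits = value.replace(/[^0-9.]/g, '');
  const n = Number.parseFloat(digits);
  return Number.isNaN(n) ? null : n;
}

function cleanRecords(records) {
  return records.map(r => ({
    path: tidyText(r.path),
    title: tidyText(r.title),
    price: parsePrice(r.price),
  }));
}

console.log(cleanRecords(sample));
```

To apply this to the real output, the sample array could be replaced with JSON.parse(fs.readFileSync('data.json', 'utf8')).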

So that's all about how to scrape data using Node.js. I hope you liked this article. If you run into any difficulties, let me know in the comment box below.

I am adding this code on GitHub; you can clone it from here and play with it.
