How to download all JavaScript files from a website?

“Please create a ZIP for us, so we can download everything with one click.” — I think nearly every developer has heard something like this many times.

But times change: we do not have the resources to ship files with multiple GB of data through our applications, and there are ever more architecture guidelines (especially on premises) telling us where to store the files and how to access them. (We are not using S3 on premises — that would simplify things a lot.)

How to handle it?

We can go for various front-end plugins or try to use download managers, but most of the time this requires additional components and installations.

I tried the straightforward way and want to show its limitations (a minimal sketch follows the list below).

First of all, here you can find the project to try on your own: https://github.com/robertdiers/js-multi-file-download

Limitations:

  • The browser may warn you because multiple files are being downloaded from one page
  • Downloads will ask for the target folder for each file if that browser setting is enabled
  • Keep an eye on CORS restrictions
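
For reference, here is a minimal sketch of what the straightforward way could look like: one temporary <a> element per file, clicked programmatically. The urls array of {filename, download} objects is an assumption, matching the shape used in the fetch example further below.

// Minimal sketch (untested): trigger one download per file via a temporary <a> element.
// The urls array of {filename, download} objects is an assumption.
function downloadAll(urls) {
    urls.forEach(function (e) {
        var a = document.createElement('a');
        a.href = e.download;
        a.download = e.filename; // the download attribute is ignored for cross-origin URLs
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
    });
}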

#2 Multi-file download using the FileSaver.js project

I have used it quite often, but now we have to try it with multiple files: https://github.com/eligrey/FileSaver.js/

Example: https://github.com/robertdiers/js-multi-file-download/blob/master/src/main/resources/static/index.html

Having a list of URLs (filename and download URL), you can start the downloads using the fetch command:

urls.forEach(function (e) {
    fetch(e.download)
        .then(res => res.blob())
        .then(blob => {
            saveAs(blob, e.filename); // saveAs comes from FileSaver.js
        });
});

This will download the files in parallel; Google Chrome limits the number of files it downloads at once. I have no idea how it determines this number, as it differs from execution to execution. But it works.

Be careful: this code will first download each file into browser memory and then save it to your disk.

#3 StreamSaver.js

The new one: https://github.com/jimmywarting/StreamSaver.js

I’m working in a highly regulated environment without Internet access and I’m afraid this will not work out of the box or may require approvals…

StreamSaver creates its own man in the middle that installs the service worker in a secure context hosted on GitHub static pages, either from an iframe (if your page is in a secure context) or from a new popup if your page is insecure.

To be honest — I haven’t investigated further because of this. Feel free to share your experiences with this one in the comments if you like :-)
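
For reference, typical usage as described in the project's README looks roughly like the sketch below, piping a fetch response directly to disk. The URLs are placeholders, and self-hosting the mitm page is an assumption for locked-down environments; I have not verified this setup myself.

// Rough sketch based on the StreamSaver.js README (not verified in my environment).
// In a restricted network you would probably have to self-host the mitm page:
// streamSaver.mitm = 'https://your-intranet.example/streamsaver/mitm.html'; // placeholder URL

const fileStream = streamSaver.createWriteStream('bigfile.zip');

fetch('https://your-intranet.example/files/bigfile.zip') // placeholder URL
    .then(res => res.body.pipeTo(fileStream));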

EDIT 2022–07–21

#4 Using a mix of the approaches above

Open a new tab for the downloads (as the browser will use dedicated memory for it), but execute all downloads sequentially with FileSaver.js (one by one, without overloading the backend).

  • The browser no longer handles the download list — no unexpected failures (this is done by the new JavaScript in the new tab)
  • Download throttling could be implemented

To be honest, I haven’t tried this last one yet — but let me know in the comments how you solved it :-)
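
A minimal sketch of what this could look like (untested), reusing the urls array and saveAs from the FileSaver example above; the new tab would simply run something like this:

// Sketch only (untested): sequential downloads with FileSaver.js, one file at a time.
// Assumes the same urls array of {filename, download} objects as above.
async function downloadSequentially(urls) {
    for (const e of urls) {
        const res = await fetch(e.download);
        const blob = await res.blob();
        saveAs(blob, e.filename);
        // simple throttling: wait 500 ms before the next file (adjust as needed)
        await new Promise(resolve => setTimeout(resolve, 500));
    }
}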

Learn how to "download" a website with all its resources (scripts, styles and images) with Node.js

Disclaimer

In no case shall we (Our Code World) or the developer of this module be liable for direct, indirect, special or other consequential damages resulting from the use of this module or from the web pages downloaded with this module. Use it at your own risk.

With our disclaimer, we are not saying that your computer will explode by using the module that we'll use to copy a website. We only warn that this script should not be used for illegal activities (like faking a website and exposing it on another domain), but for learning more about Node.js and web development.

Having said that, have you ever seen an awesome website with some gadget or widget that you absolutely want to have or learn how to build, but you can't find an open source library that does it? Because that should be the first step: look for an open source library that creates that gadget and, if it exists, implement it in your own project. If you don't find one, you can use the Chrome dev tools to inspect the element and see superficially how it works and how you could create it yourself. However, if you're not so lucky or you don't have the skills to copy a feature through the dev tools, you still have a chance to do it.

What could be better than having the entire code that creates the widget and being able to edit it as you want (something that will help you understand how the widget works)? That is precisely what you're going to learn in this article: how to download an entire website through its URL with Node.js using a web scraper. Web scraping (also termed screen scraping, web data extraction, web harvesting, etc.) is a technique employed to extract large amounts of data from websites, whereby the data is extracted and saved to a local file on your computer or to a database in table (spreadsheet) format.

Requirements

To download all the resources from a website, we are going to use the website-scraper module. This module allows you to download an entire website (or single web pages) to a local directory (including all the resources: CSS, images, JS, fonts, etc.).

Install the module in your project by executing the following command in the terminal:

npm install website-scraper

Note

Dynamic websites (where content is loaded by JavaScript) may not be saved correctly, because website-scraper doesn't execute JavaScript; it only parses HTTP responses for HTML and CSS files.

Visit the official GitHub repository for more information.

1. Download a single page

The scrape function returns a Promise; it makes requests to all the provided URLs and saves all the files found in their sources to the given directory. The resources will be organized into folders according to the resource type (CSS, images or scripts) inside the provided directory path. The following script will download the homepage of the Node.js website:

const scrape = require('website-scraper');

let options = {
    urls: ['https://nodejs.org/'],
    directory: './node-homepage',
};

scrape(options).then((result) => {
    console.log("Website successfully downloaded");
}).catch((err) => {
    console.log("An error occurred", err);
});

Save the previous script in a JS file (script.js) and then execute it with Node using node script.js. Once the script finishes, the node-homepage folder will contain the downloaded files.

Opening the downloaded index.html file in a web browser will show the page rendered locally.

All the scripts and style sheets were downloaded and the website works like a charm. Notice that the only error shown in the console is due to the Google Analytics script, which you should obviously remove from the code manually.

2. Download multiple pages

If you're downloading multiple pages of a website, you should provide them together in the same script. The scraper is smart enough to know that a resource shouldn't be downloaded twice (but only if the resource has already been downloaded from the same website on another page): it will download all the markup files, but not the resources that already exist.

In this example, we are going to download 3 pages of the Node.js website (index, about and blog), specified in the urls property. The content will be saved in the node-website folder (relative to where the script is executed); if it doesn't exist, it will be created. To be more organized, we are going to sort every type of resource into its own folder (images, JavaScript, CSS and fonts). The sources property is an array of objects that specifies the selectors and attribute values used to select files for downloading.

This script is useful if you only want some specific web pages:

const scrape = require('website-scraper');

scrape({
    urls: [
        'https://nodejs.org/', // Will be saved with default filename 'index.html'
        {
            url: 'https://nodejs.org/about',
            filename: 'about.html'
        },
        {
            url: 'https://blog.nodejs.org/',
            filename: 'blog.html'
        }
    ],
    directory: './node-website',
    subdirectories: [
        {
            directory: 'img',
            extensions: ['.jpg', '.png', '.svg']
        },
        {
            directory: 'js',
            extensions: ['.js']
        },
        {
            directory: 'css',
            extensions: ['.css']
        },
        {
            directory: 'fonts',
            extensions: ['.woff', '.ttf']
        }
    ],
    sources: [
        {
            selector: 'img',
            attr: 'src'
        },
        {
            selector: 'link[rel="stylesheet"]',
            attr: 'href'
        },
        {
            selector: 'script',
            attr: 'src'
        }
    ]
}).then(function (result) {
    // Outputs the downloaded resources
    // console.log(result);
    console.log("Content successfully downloaded");
}).catch(function (err) {
    console.log(err);
});

3. Recursive downloads

Imagine that you don't just need specific web pages from a website, but all of its pages. One way to do it is to use the previous script and manually specify every URL of the website, however this can be counterproductive because it will take a lot of time and you will probably overlook some URLs. That's why the scraper offers a recursive download feature that allows you to follow all the links from a page, the links from those pages, and so on. Obviously, that could lead to a very, very long (and almost infinite) crawl, which you can limit with the maximum allowed depth (the maxDepth property):

const scrape = require('website-scraper');

let options = {
    urls: ['https://nodejs.org/'],
    directory: './node-homepage',
    // Enable recursive download
    recursive: true,
    // Follow only the links from the first page (index),
    // then the links from other pages won't be followed
    maxDepth: 1
};

scrape(options).then((result) => {
    console.log("Webpages successfully downloaded");
}).catch((err) => {
    console.log("An error occurred", err);
});

The previous script should download more pages than the single-page example.

Filter external URLs

As you'd expect on any website, there will be external URLs that don't belong to the website you want to copy. To prevent those pages from being downloaded as well, you can filter URLs so that only the ones matching the website's domain are followed:

const scrape = require('website-scraper');

const websiteUrl = 'https://nodejs.org';

let options = {
    urls: [websiteUrl],
    directory: './node-homepage',
    // Enable recursive download
    recursive: true,
    // Follow only the links from the first page (index),
    // then the links from other pages won't be followed
    maxDepth: 1,
    urlFilter: function (url) {

        // If the URL starts with the domain of the website, then continue:
        // e.g. https://nodejs.org matches https://nodejs.org/en/example.html
        if (url.indexOf(websiteUrl) === 0) {
            console.log(`URL ${url} matches ${websiteUrl}`);
            return true;
        }

        return false;
    },
};

scrape(options).then((result) => {
    console.log("Webpages successfully downloaded");
}).catch((err) => {
    console.log("An error occurred", err);
});

In our example, that should decrease the number of downloaded pages.

4. Download an entire website

Note

This task requires a lot of time, so be patient.

If you want to download an entire website, you can use the recursive download option and increase the maximum allowed depth to a reasonable number (in this example 50, which is not so reasonable, but whatever):

// Downloads all the crawlable files of the website.
// The files are saved in the same structure as the website itself, by using the `bySiteStructure` filenameGenerator.
// Links to other websites are filtered out by the urlFilter.
const scrape = require('website-scraper');
const websiteUrl = 'https://nodejs.org/';

scrape({
    urls: [websiteUrl],
    urlFilter: function (url) {
        return url.indexOf(websiteUrl) === 0;
    },
    recursive: true,
    maxDepth: 50,
    prettifyUrls: true,
    filenameGenerator: 'bySiteStructure',
    directory: './node-website'
}).then((data) => {
    console.log("Entire website successfully downloaded");
}).catch((err) => {
    console.log("An error occurred", err);
});

Final recommendations

If the CSS or JS code of the website is minified (and it probably will be), we recommend using a beautifier for the language (cssbeautify for CSS or js-beautify for JavaScript) in order to pretty-print the code and make it more readable (not identical to the original source, but acceptable).
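
For example, a small Node.js script using the js-beautify package could look like the sketch below; the file paths are placeholders.

// Sketch: pretty-print a downloaded, minified JS file with js-beautify.
// './node-website/js/downloaded-script.js' is a placeholder path.
const fs = require('fs');
const beautify = require('js-beautify').js;

const minified = fs.readFileSync('./node-website/js/downloaded-script.js', 'utf8');
fs.writeFileSync('./node-website/js/downloaded-script.pretty.js',
    beautify(minified, { indent_size: 4 }));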

Happy coding!

How do I download a JavaScript file from a website?

Open the JS script's link directly and press Ctrl+S to save it (note that pressing Ctrl+S on the page itself will save the whole page, so open the script URL on its own). Save it under any desired name, copy it into your project folder, and then include it in your project files where you have included other files like jQuery and CSS.

How do I download all source files from a website?

To download a website's HTML source code, navigate to the page using your favorite browser and then select Save Page As from the File menu. You'll then be prompted to choose whether you want to download the whole page (including images) or just the source code. These download options are similar across all browsers.

How do I download a file using JavaScript?

So the steps for downloading the file will be:

  • Use the fetch API to download the script file
  • Transform the data to a blob type
  • Convert the blob object to an object URL (a string) using URL.createObjectURL()
  • Create an <a> element to download that URL
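
Put together, those steps look roughly like this (the URL and file name are placeholders):

// Rough sketch of the steps above; the URL and file name are placeholders.
fetch('https://example.com/script.js')
    .then(res => res.blob())
    .then(blob => {
        const objectUrl = URL.createObjectURL(blob);
        const a = document.createElement('a');
        a.href = objectUrl;
        a.download = 'script.js';
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
        URL.revokeObjectURL(objectUrl); // free the object URL when done
    });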

How do I download data from a URL?

To download a file from a URL:

  • Go to the URL
  • Right-click the webpage
  • Select Save As...
