`wget` is a command-line utility for non-interactive downloading of files from the web. It is designed to be robust, supports recursive retrieval, and is well suited to mirroring websites or downloading large files over HTTP, HTTPS, and FTP.
- Designed for **retrieving** content
- Handles **recursive downloads** and **site mirroring**
- Can **resume interrupted downloads**
- Ideal for scripting and automation
- Works well without user interaction
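Resuming an interrupted download, for instance, takes a single flag (the URL below is a placeholder):

```shell
# -c (--continue) picks up a partial download where it left off
# instead of re-fetching the whole file
wget -c https://example.com/large-file.iso
```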
# Wget vs curl

| Feature | `wget` | `curl` |
|------------------------|----------------------------------|-------------------------------------|
| Protocols | HTTP, HTTPS, FTP | Many (HTTP, HTTPS, FTP, SMTP, etc.) |
| Recursive Download | Yes | No |
| Resume Support | Yes (`-c`) | Yes (`-C -`) |
| Sending Data (POST) | Limited (`--post-data`) | Yes |
| Ideal Use Case | Site mirroring, bulk downloads | API calls, single requests |
| Scripting Suitability | Excellent for downloading | Excellent for data interaction |
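To make the comparison concrete, here are equivalent invocations of both tools (URLs are placeholders):

```shell
# Equivalent single-file downloads:
wget https://example.com/report.pdf        # saves as report.pdf
curl -O https://example.com/report.pdf     # -O keeps the remote filename

# Data interaction is where curl shines; wget offers only
# limited POST support via --post-data:
curl -d 'name=value' https://example.com/api
wget --post-data='name=value' -O response.txt https://example.com/api
```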
# Example Use Case
You want to download the full content of a web page, including all associated assets (images, stylesheets, scripts), from h2g2.com.
The example below downloads the page and all required assets so you can browse it offline, maintaining the structure and appearance of the original site.
```shell
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --no-parent \
     https://h2g2.com/edited_entry/A266591
```
# Explanation of Flags

* `--mirror`: Enables options suitable for mirroring (shorthand for `-r -N -l inf --no-remove-listing`)
* `--convert-links`: Converts links for local viewing
* `--adjust-extension`: Saves files with proper file extensions (e.g. `.html`)
* `--page-requisites`: Downloads all necessary assets (images, CSS, JS)
* `--no-parent`: Prevents climbing to parent directories
# Why `wget` is a Good Fit

* **Simplicity:** It’s easy to use and doesn’t require a complex setup.
* **Compatibility:** It runs on almost any platform, making it accessible for the original community members.
* **Efficiency:** It handles large numbers of pages efficiently and avoids redundant downloads.
* **Automation:** You can script it to run periodically or as needed, making it straightforward to maintain a static mirror of the site.
# Steps to Scale up with `wget`

1. **Initial Run:** Start by mirroring the entire site to get a baseline static copy.
2. **Incremental Updates:** Set up cron jobs or scheduled tasks to periodically update the static mirror, ensuring it stays current.
3. **Distribution:** The resulting static site can be easily shared or hosted on platforms like Netlify, and the simplicity means anyone familiar with basic command-line tools can manage it.
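Step 2 can be sketched as a small wrapper script plus a cron entry; the script name, paths, and schedule below are placeholders:

```shell
#!/bin/sh
# update-mirror.sh — hypothetical wrapper for refreshing the static mirror.
# --mirror already implies -N (timestamping), so files unchanged since
# the last run are skipped; --wait adds a polite delay between requests.
SITE="https://h2g2.com/edited_entry/A266591"   # target URL from the example above
DEST="$HOME/mirrors/h2g2"                      # assumed local directory

mkdir -p "$DEST"
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent \
     --wait=1 --directory-prefix="$DEST" \
     "$SITE"
```

A crontab entry such as `0 3 * * 0 /path/to/update-mirror.sh` would then refresh the mirror every Sunday at 03:00.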
# Future Considerations

* **Testing:** Start with a test run on a smaller subset of the site to ensure everything works as expected before scaling up.
* **Community Involvement:** Provide clear documentation or scripts so that community members can contribute to maintaining and updating the static mirror.
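The test run suggested above can be kept small by capping recursion depth; the scratch directory here is arbitrary:

```shell
# Recurse only one level deep as a smoke test before a full mirror run
wget --recursive --level=1 \
     --page-requisites --convert-links --adjust-extension \
     --no-parent \
     --directory-prefix=/tmp/h2g2-test \
     https://h2g2.com/edited_entry/A266591
```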