Downloading Web Pages Using WGet

1. Download and install WGet. WGet is a command-line tool that lets users download remote files from a network non-interactively. As of this writing, the latest version of the Windows installer can be found at the URL “http://gnuwin32.sourceforge.net/packages/wget.htm”.

2. In any convenient location, create a new directory named “WGetTest”.

3. In the newly created WGetTest directory, create a new text file named “WGet-DownloadWikipediaArticles.bat”, containing the following text.

for /F %%i in (ArticleNames.txt) do wget.exe --page-requisites --convert-links --html-extension http://en.wikipedia.org/wiki/%%i 
pause
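
For readers who want to see what the batch loop does before running it, here is a minimal Python sketch of the same logic. It is not part of the original procedure: the function names (read_article_names, build_commands, download_all) are hypothetical, and it assumes wget.exe is on the PATH. Like "for /F" with its default settings, it takes the first whitespace-delimited token of each line and skips blank lines.

```python
import subprocess
from pathlib import Path

# Same base URL and flags as the batch file above.
BASE_URL = "http://en.wikipedia.org/wiki/"
WGET_FLAGS = ["--page-requisites", "--convert-links", "--html-extension"]


def read_article_names(text):
    """Return the first whitespace-delimited token of each non-blank line."""
    names = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            names.append(tokens[0])
    return names


def build_commands(names, wget="wget.exe"):
    """Build one wget argument list per article name."""
    return [[wget, *WGET_FLAGS, BASE_URL + name] for name in names]


def download_all(list_file="ArticleNames.txt", dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    names = read_article_names(Path(list_file).read_text(encoding="utf-8"))
    for command in build_commands(names):
        print(" ".join(command))
        if not dry_run:
            # Assumption: wget.exe is installed and reachable on the PATH.
            subprocess.run(command, check=False)
```

Calling download_all() from the WGetTest directory only prints the three commands; download_all(dry_run=False) would run them. Because only the first token of each line is used, article names should be written with underscores rather than spaces, as in the ArticleNames.txt example.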

4. Still in the WGetTest directory, create a new text file named “ArticleNames.txt”, containing the following text.

Calculus
Gottfried_Leibniz
Isaac_Newton

5. Execute WGet-DownloadWikipediaArticles.bat. A console window will appear, and WGet will connect to Wikipedia to download the articles on Calculus, Gottfried Leibniz, and Isaac Newton. These articles will appear in a new subdirectory named “en.wikipedia.org”.

6. Open the “en.wikipedia.org” directory and browse the articles. Note that --page-requisites saves the images, stylesheets, and other files an article needs as separate local files rather than embedding them in the .html file, and --convert-links rewrites the article's links to point to those local copies. The downloads may therefore be large and slow.
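
To get a quick view of how much was downloaded, a short Python sketch like the following could be run from the WGetTest directory. It is an illustration only: the function name list_downloads is hypothetical, and it makes no assumption about the layout wget creates beyond the “en.wikipedia.org” directory named above.

```python
import os


def list_downloads(root="en.wikipedia.org"):
    """Return (relative path, size in bytes) for every file under root, largest first."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            found.append((os.path.relpath(path, root), os.path.getsize(path)))
    # Sort by size so the largest downloads are listed first.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Printing each pair returned by list_downloads() shows which files take up the most space.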
