Here is a new version of a simple script I have written to create local copies of websites suitable for browsing offline. We have been using the program successfully at the university to schedule downloads of websites during off-peak hours of internet usage, for reading the following day.
The program uses the *nix wget utility to do its magic. My code is simply a wrapper around wget that sets the proper command line arguments for creating a mirror of a website. The script uses conservative settings by default when fetching sites, in order to be respectful to website owners and other users of the network. Once a site is downloaded, the program automatically packs the files into a tar.gz archive for you. You will need Python and wget installed in order to run it.
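The attached script is the real thing; purely as an illustration of the general approach, a wrapper like this boils down to building a wget command line with conservative mirroring flags and then tarring up the result. The function names, wait time, and rate limit below are my own placeholders, not necessarily the values offline_browser uses:

```python
#!/usr/bin/env python
# Rough sketch of a wget wrapper for offline mirroring.
# Flag choices, wait/rate values, and function names are illustrative only.
import subprocess
import tarfile
from urllib.parse import urlparse

def mirror(url, wait_seconds=2, rate_limit="50k", user_agent=None):
    """Mirror a site with conservative settings, then pack it into a .tar.gz."""
    args = [
        "wget",
        "--mirror",           # recursive download with timestamping
        "--convert-links",    # rewrite links so the copy browses offline
        "--page-requisites",  # also grab images, CSS, etc.
        "--no-parent",        # stay below the starting directory
        "--wait=%d" % wait_seconds,      # pause between requests
        "--limit-rate=%s" % rate_limit,  # cap bandwidth usage
    ]
    if user_agent:
        args.append("--user-agent=" + user_agent)
    args.append(url)
    subprocess.check_call(args)

    # wget writes the mirror into a directory named after the host
    site_dir = urlparse(url).netloc
    with tarfile.open(site_dir + ".tar.gz", "w:gz") as archive:
        archive.add(site_dir)
    return site_dir + ".tar.gz"

if __name__ == "__main__":
    mirror("http://www.saintsjd.com/malawi")
```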
Here are some basic examples of how it can be used:
To view the command's help:
./offline_browser --help
To create a browsable copy of this website, you can type:
./offline_browser http://www.saintsjd.com/malawi
To create a browsable copy of this website and clean up all downloaded files, keeping only the final tar.gz archive, use the -c option (a sketch of the cleanup step follows the example):
./offline_browser -c http://www.saintsjd.com/malawi
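The cleanup itself need not be complicated; as a minimal sketch (again with an invented helper name), once the archive has been written the script can simply remove the raw mirror directory:

```python
import shutil

def cleanup(site_dir):
    """Remove the raw mirror directory, keeping only the .tar.gz archive."""
    shutil.rmtree(site_dir)
```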
Some webmasters block requests coming from wget. To identify yourself as if you were browsing with Internet Explorer, type the following (a sketch of how the -U shorthands might translate to wget's --user-agent flag appears after these examples):
./offline_browser -U IE http://www.saintsjd.com/malawi
Or for Firefox identification, with cleanup:
./offline_browser -U FF -c http://www.saintsjd.com/malawi
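Under the hood, the -U option only has to pick a User-Agent string to pass to wget. The strings below are representative browser identifiers, not necessarily the ones offline_browser sends:

```python
# Hypothetical mapping from the -U shorthand to a full User-Agent header.
USER_AGENTS = {
    "IE": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "FF": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
}

def user_agent_args(shorthand):
    """Translate an -U shorthand into a wget --user-agent argument list."""
    if shorthand is None:
        return []
    return ["--user-agent=" + USER_AGENTS[shorthand]]
```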
Have fun, and send me your bugs and improvements!
| Attachment | Size |
|---|---|
| offline_browser | 4.54 KB |