HTML Sitemap Gen V0.1

This page tells you how to configure HTML Sitemap Gen V0.1 to automatically create an HTML sitemap for you.

Firstly, this script is best run ON THE WEBSERVER! You can run it from a client and then change the script to FTP/SCP your sitemap across, but it will eat bandwidth. The system downloads every single link in your urllist.txt file, so if you have a lot of URLs, consider the bandwidth implications very carefully. Obviously, the script will also run much faster on the server itself.

About
-----

HTML Sitemap Gen V0.1 is a small script designed to generate HTML sitemaps from a list of URLs. It can be scheduled to run automatically using your cron daemon of choice.

It is released under the GNU GPL and is copyright Ben Tasker 2009. For more information on the license, please see either the LICENSE file or visit http://benscomputer.no-ip.org/LICENSE

Dependencies
------------

The script has a few dependencies, but most systems should already have them, and if not they are easy enough to get. You will need:

- wget
- sed
- awk
- tr
- BASH
- grep

Configuring the script
----------------------

The first thing you need to configure is the script itself. Open it in a text editor and set the variables to match your system. There are only three, at the top, that you need to worry about.

URLLIST should contain the path to your URL list file. This should be a text file containing all the URLs that you wish to index, one URL per line, for example:

    http://benscomputer.no-ip.org/
    http://benscomputer.no-ip.org/projects.html

(I'm working on an automatic generator for that file, so check the website soon!)

TEMPLATE should point to the sitemap template that you wish to use. The "Configuring the Sitemap Template" section below explains how to set up the template.

SITEMAPLOCAT should point to the final resting place of your sitemap.
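Before the first run, it can help to check these settings. The sketch below is not part of HTML_gen.sh. It shows the three variables with hypothetical placeholder paths, plus two hypothetical helpers. check_deps looks for the tools listed under Dependencies. check_config confirms that URLLIST and TEMPLATE are readable and warns about lines in the URL list that are not plain http(s) URLs.

```shell
#!/bin/bash
# Placeholder values for the three settings at the top of HTML_gen.sh.
# These paths are HYPOTHETICAL; change them to suit your server.
URLLIST="/home/user/sitemap/urllist.txt"
TEMPLATE="/home/user/sitemap/example_sitemap_template.html"
SITEMAPLOCAT="/var/www/html/sitemap.html"

# check_deps CMD...: report any command that is not on the PATH.
# Returns non-zero if at least one of them is missing.
check_deps() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# check_config: fail if URLLIST or TEMPLATE cannot be read, and warn
# about non-blank lines in URLLIST that are not a single http(s) URL.
check_config() {
    [ -r "$URLLIST" ]  || { echo "URLLIST not readable: $URLLIST" >&2; return 1; }
    [ -r "$TEMPLATE" ] || { echo "TEMPLATE not readable: $TEMPLATE" >&2; return 1; }
    if grep -qvE '^(https?://[^[:space:]]+)?$' "$URLLIST"; then
        echo "warning: $URLLIST contains lines that are not plain URLs" >&2
    fi
    return 0
}

# Example:
# check_deps wget sed awk tr bash grep && check_config && ./HTML_gen.sh
```

Checking up front is cheap compared with a run that downloads every page in the list, which is why the helpers fail early rather than partway through.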
There is no point generating a sitemap if it isn't going to be published in some form, so SITEMAPLOCAT can be either a full path to a file (make sure it ends in .html) or a directory. If you point it at a directory, the sitemap will keep the filename sitemap.html.

If you want, you can also change the cd /tmp line to use another directory. This simply sets where the temporary files are kept while the script is running. The script is pretty good at cleaning up after itself, so most people should be able to leave this as is.

Configuring the Sitemap Template
--------------------------------

You need to provide the system with an HTML template for your sitemap. The template is split into two sections, HEADER and FOOTER. Anything in the HEADER section will appear in the sitemap before the actual listings, whilst anything in the FOOTER section will appear afterwards.

The sections are identified by marker tags: one tag is used both to open and to close the HEADER section, and another both to open and to close the FOOTER section. A basic example template using these tags is included in the file example_sitemap_template.html.

You can include whatever you like in between these tags, but use each tag only twice: once to open its section and once to close it. Also be aware that, depending on the size of the site, the generated sitemap could contain thousands of links, so it would be wise to avoid Server Side Includes.

Running the program
-------------------

Once everything is configured, you are ready to run the program. Simply run HTML_gen.sh from the command line (you may need to chmod +x HTML_gen.sh first) and wait for it to finish. It outputs a lot of information. This is purely informational, and you can redirect it to a logfile using > logfile.txt if you really feel the need.

To have it run automatically, add it to your crontab.
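The HEADER/FOOTER template mechanism can be illustrated with a small shell sketch. It is not the code in HTML_gen.sh. The marker strings <!--HEADER--> and <!--FOOTER--> are hypothetical stand-ins for the real tags in example_sitemap_template.html. Unlike the real script, it does not download each page; it simply uses each URL as its own link text. The hypothetical build_sitemap function prints the template's HEADER section, then an HTML list with one link per URL, then the FOOTER section. Like SITEMAPLOCAT, its output argument may be a .html file path or a directory, in which case it writes sitemap.html.

```shell
#!/bin/bash
# Sketch only, not the actual HTML_gen.sh. The two marker strings are
# HYPOTHETICAL; use the tags from example_sitemap_template.html instead.
HEADER_TAG='<!--HEADER-->'
FOOTER_TAG='<!--FOOTER-->'

# Print the lines between the opening and closing occurrence of a marker.
# Each marker is expected to sit on its own line.
extract_section() {
    awk -v tag="$1" '
        index($0, tag) { inside = !inside; next }  # marker toggles the section
        inside         { print }
    ' "$2"
}

# Escape characters that are special in HTML (& must be replaced first).
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# build_sitemap URLLIST TEMPLATE OUT
# OUT may be a .html file or a directory (then OUT/sitemap.html is written).
build_sitemap() {
    local urllist="$1" template="$2" out="$3"
    if [ -d "$out" ]; then
        out="${out%/}/sitemap.html"
    fi
    {
        extract_section "$HEADER_TAG" "$template"
        echo "<ul>"
        # "|| [ -n ... ]" keeps a final line that has no trailing newline
        while IFS= read -r url || [ -n "$url" ]; do
            url=$(printf '%s' "$url" | tr -d '\r' | html_escape)
            [ -z "$url" ] && continue  # skip blank lines
            printf '<li><a href="%s">%s</a></li>\n' "$url" "$url"
        done < "$urllist"
        echo "</ul>"
        extract_section "$FOOTER_TAG" "$template"
    } > "$out"
}

# Example (file names are placeholders):
# build_sitemap urllist.txt example_sitemap_template.html /var/www/html/
```

Because the listing sits between the two sections, everything the template places before or after the links (page layout, navigation, styles) is copied through unchanged. For the scheduling step, a crontab line such as the following (the path and schedule are placeholders) would run the real script every night at 03:00 and keep its output in a log file:

    0 3 * * * /path/to/HTML_gen.sh > /tmp/sitemap_gen.log 2>&1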