HTML Sitemap Gen V0.1
This page tells you how to configure HTML Sitemap Gen V0.1 to automatically create an HTML sitemap for you.
Firstly, this script is best run ON THE WEBSERVER! You can run it from a client and then change the script
to FTP/SCP your sitemap across, but it will eat bandwidth. The script downloads every single link in your
urllist.txt file, so if you have a lot of URLs, consider the bandwidth implications very carefully. The script
will also run much faster from the server itself.
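
To get a feel for the bandwidth involved, you can count the URLs before a run. This is only a rough sketch and
not part of the script itself: the count_urls helper and the default urllist.txt path are assumptions made for
illustration.

```shell
#!/bin/bash
# Hypothetical path; point this at your own urllist file.
URLLIST="${URLLIST:-urllist.txt}"

# Count non-blank lines: each one is a URL that would be downloaded.
count_urls() {
    grep -c -v '^[[:space:]]*$' "$1"
}

if [ -f "$URLLIST" ]; then
    echo "URLs to fetch: $(count_urls "$URLLIST")"
else
    echo "No urllist found at $URLLIST" >&2
fi
```
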
About
-----
HTML Sitemap Gen V0.1 is a small script designed to generate HTML sitemaps based on a list of URLs. It can be
scheduled to automatically run using your cron daemon of choice. It is released under the GNU GPL and is
copyright Ben Tasker 2009.
For more information on the license please see either the LICENSE file or visit
http://benscomputer.no-ip.org/LICENSE
Dependencies
------------
The script does have a few dependencies, but most systems should already have them; if not, they are easy enough
to get. You will need:
- wget
- sed
- awk
- tr
- BASH
- grep
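
If you are not sure whether everything in the list above is installed, a quick check along these lines may help.
It is a minimal sketch, not something shipped with the script; the missing_deps function name is made up for this
example.

```shell
#!/bin/bash
# Print the names of any listed tools that cannot be found on the PATH.
missing_deps() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space left by the first append.
    echo "${missing# }"
}

missing=$(missing_deps wget sed awk tr bash grep)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
else
    echo "All dependencies found."
fi
```
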
Configuring the script
----------------------
The first thing you need to configure is the script itself. Open it in a text editor and set the variables
to match your system. There are only three, at the top, that you need to worry about.
URLLIST should contain the path to your urllist file. This should be a text file containing all the URLs that
you wish to index (I'm working on an automatic generator for that, so check the website soon!), with one URL
per line.
For example:
http://benscomputer.no-ip.org/
http://benscomputer.no-ip.org/projects.html
TEMPLATE should point to the sitemap template that you wish to use. See Configuring the Sitemap Template below
for how to set the template up.
SITEMAPLOCAT should point to the final resting place of your sitemap. There's no point generating one if it's
not going to be published in some form. You can either specify a full path to a file (make sure it ends in
.html) or point to a directory. If you do the latter, the sitemap will keep the filename sitemap.html.
If you want, you can also change the cd /tmp to another directory. This simply sets where the temporary files
are kept whilst the script is running. The script is pretty good at cleaning up after itself, so most
people should be able to leave this as is.
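
As a rough illustration, the top of the script might end up looking something like this once configured. The
variable names are the ones described above, but every path shown is a made-up example, and the exact layout of
the real script may differ.

```shell
# Hypothetical values; replace each path with one that exists on your system.
URLLIST="/var/www/urllist.txt"                      # one URL per line
TEMPLATE="/var/www/example_sitemap_template.html"   # HEADER/FOOTER template
SITEMAPLOCAT="/var/www/html/sitemap.html"           # a .html file, or a directory

# Working directory for temporary files; change only if /tmp is unsuitable.
cd /tmp
```
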
Configuring the Sitemap Template
--------------------------------
You need to provide the script with an HTML template for your sitemap. The template is split into two sections,
HEADER and FOOTER. Anything in the HEADER section will appear in the sitemap before the actual listings, whilst
anything in the FOOTER section will appear afterwards. Each section is marked by its own tag, which is used
both to open and to close that section: one tag for the HEADER and one for the FOOTER. A basic
example is included in the file example_sitemap_template.html.
You can include whatever you like in between these tags, but use each tag only twice: once to open and once to
close. Also be aware that, depending on the size of the site, the generated sitemap could contain thousands of
links, so it would be wise to avoid Server Side Includes.
Running the program
-------------------
Once everything is configured, you are ready to run the program. Simply run HTML_gen.sh from the command line
(you may need to chmod +x HTML_gen.sh first) and wait for it to finish. It outputs a lot of text, most of it
purely informational; you can redirect it to a logfile using > logfile.txt if you really feel the need.
To have it run automatically, add it to your crontab.
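
A crontab entry for this could be put together roughly as follows. Treat it as a sketch: the /opt/sitemapgen
install location and the daily 03:00 schedule are assumptions picked for the example, so adjust both to suit.

```shell
# Hypothetical install location; adjust to wherever HTML_gen.sh lives.
SCRIPT_DIR="/opt/sitemapgen"

# Build a crontab line that runs the generator daily at 03:00 and logs its output.
cron_line="0 3 * * * cd $SCRIPT_DIR && ./HTML_gen.sh > $SCRIPT_DIR/logfile.txt 2>&1"
echo "$cron_line"

# To install it, append the line to your existing crontab:
#   (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

The install command is left commented out so that nothing changes in your crontab until you have checked the
line it prints.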