Over the last few years, this Web site has grown pretty big. Managing it manually isn't difficult, but is prone to error. To help alleviate this problem, I've written a few small site management utilities. You might find them useful.
The utilities are Perl scripts, so they should be portable. All except one take as input a list of files, so you can use them on either one page or an entire site. The scripts rely on several Unix command line utilities such as find, stty, and touch; I use Cygwin's DOS versions.
Most of these utilities expect a very specific page format, and rely on information stored in <META> tags or comments. All expect dates to be in ISO format: yyyy-mm-dd. (December 7, 1941 in ISO format is 1941-12-07.) More documentation of formats can be found in the source code of each utility.
The "big site scrubber" is actually three Perl scripts, called from one DOS batch file.
(I know, I know: I'm slowly creating yet another HTML preprocessor.)
This DOS batch file takes two parameters: a directory dir and a number of days N. It makes a list of all the files with extension html in dir and its subdirectories that have been modified within the last N days. These files are then run through the scripts expire.pl, updateFooterDates.pl, and expireWhatsNewItems.pl. A list of the changes made is written to the screen and to an output file.
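The file-selection step can be sketched as follows. This is an illustration in Python, not the batch file itself (which relies on find); the function name `recently_modified_html` and its `now` parameter are made up for the example.

```python
import os
import time

def recently_modified_html(root, days, now=None):
    """Return sorted paths under root ending in .html modified in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".html"):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) >= cutoff:
                    matches.append(path)
    return sorted(matches)
```

Each path returned would then be handed to the three scripts in turn.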
Expires out-of-date text & tags. This is useful for automatically removing out-of-date "new" and "updated" images. The utility searches for specially formatted comment pairs [ <!-- EXPIRE yyyy-mm-dd -->text to expire<!-- /EXPIRE --> ] and removes the comments and enclosed text if the date has passed.
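Here is a minimal sketch of that expiry logic, written in Python rather than the Perl of the real script. It assumes the comment format shown above, and it assumes "passed" means strictly earlier than today; the actual utility may treat the expiry date itself differently. The names `EXPIRE_RE` and `expire` are hypothetical.

```python
import re
from datetime import date

# Matches <!-- EXPIRE yyyy-mm-dd --> ... <!-- /EXPIRE -->, possibly spanning lines.
EXPIRE_RE = re.compile(
    r"<!--\s*EXPIRE\s+(\d{4})-(\d{2})-(\d{2})\s*-->(.*?)<!--\s*/EXPIRE\s*-->",
    re.DOTALL,
)

def expire(html, today=None):
    """Remove EXPIRE comment pairs (and the text between them) whose date has passed."""
    today = today or date.today()

    def replace(match):
        year, month, day = (int(g) for g in match.group(1, 2, 3))
        if date(year, month, day) < today:  # assumption: strictly before today
            return ""
        return match.group(0)  # not yet expired: leave untouched

    return EXPIRE_RE.sub(replace, html)
```

Blocks with a future date are left in place, comments and all, so they can expire on a later run.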
Expires old images. This is useful for automatically removing out-of-date "new" and "updated" images. The utility searches for <IMG> tags; those with an EXPIRES="yyyy-mm-dd" attribute whose date has passed are deleted. If an expiring <IMG> tag is surrounded by <A>...</A> tags, those tags are also removed.
Note. Use of expireIMGs.pl is deprecated, due to its requirement of adding a nonstandard EXPIRES attribute to IMG tags. Use expire.pl instead.
Removes out of date items from the "What's New" section of my home page. Details of the section format can be found in comments in the utility's source.
Updates the human-readable "last updated" date at the bottom of a page to match the ISO date in that page's <META NAME="date"> tag.
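The idea can be sketched like this, in Python rather than Perl. Two things are assumed, not taken from the script: that the ISO date sits in a CONTENT attribute of the META tag, and that the footer reads "Last updated d Month yyyy". The names `update_footer_date`, `META_DATE_RE`, and `FOOTER_RE` are hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Assumed tag layout: <META NAME="date" CONTENT="yyyy-mm-dd">
META_DATE_RE = re.compile(
    r'<META\s+NAME="date"\s+CONTENT="(\d{4})-(\d{2})-(\d{2})"\s*>',
    re.IGNORECASE,
)
# Assumed footer layout: "Last updated 28 June 2003"
FOOTER_RE = re.compile(r"Last updated \d{1,2} [A-Z][a-z]+ \d{4}")

def update_footer_date(html):
    """Rewrite the footer date to match the page's META date, if one is present."""
    match = META_DATE_RE.search(html)
    if match is None:
        return html  # no META date: leave the page alone
    year, month, day = (int(g) for g in match.groups())
    readable = f"Last updated {day} {MONTHS[month - 1]} {year}"
    return FOOTER_RE.sub(readable, html)
```

Spelling out the month names in a list keeps the output independent of the machine's locale.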
This script indexes a site based on META keywords. It creates an HTML page that indexes pages based on keywords found in their <META NAME="keywords"> tags. Here's an example index. An individual page's title is taken from its <META NAME="description"> tag. Sorting of keywords and page descriptions is case-insensitive.
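A rough Python sketch of the indexing step follows. It is not the Perl script, and it makes assumptions: keywords are comma-separated, both values live in a CONTENT attribute, and pages arrive as a dictionary of URL to HTML text. The names `meta_content` and `build_index` are invented for the example.

```python
import re
from collections import defaultdict

def meta_content(html, name):
    """Return the CONTENT value of <META NAME="name" ...>, or None if absent."""
    pattern = r'<META\s+NAME="%s"\s+CONTENT="([^"]*)"' % re.escape(name)
    match = re.search(pattern, html, re.IGNORECASE)
    return match.group(1) if match else None

def build_index(pages):
    """Map each keyword to its pages; `pages` is a dict of URL -> HTML text.

    Returns a list of (keyword, [(description, url), ...]) with keywords and
    descriptions both sorted case-insensitively.
    """
    index = defaultdict(list)
    for url, html in pages.items():
        keywords = meta_content(html, "keywords")
        if not keywords:
            continue  # pages without keywords are not indexed
        description = meta_content(html, "description") or url
        for keyword in keywords.split(","):  # assumption: comma-separated list
            keyword = keyword.strip()
            if keyword:
                index[keyword].append((description, url))
    return [
        (keyword, sorted(entries, key=lambda entry: entry[0].lower()))
        for keyword, entries in sorted(index.items(), key=lambda kv: kv[0].lower())
    ]
```

Writing the result out as an HTML page is then a matter of looping over the list. Note that this sketch keeps keywords that differ only in case as separate entries.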
If you maintain local and remote copies of your site and FTP files between them, you might find this script useful. It automates the process. You'll need to create an empty file named synch.$$$ in the root of your local Web directory for this to work.
Validating your Web pages and CSS files is a chore. This script submits all files changed since it was last run to the W3C's online validators, collates the results, and creates a Web page listing which files are valid, which aren't, and what errors were found. This script is designed to run as a CGI, but also works standalone. You'll need to create an empty timestamp file for this to work; see the script's comments for instructions.
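The "changed since it was last run" test is the part that depends on the timestamp file. One way it might work is sketched below in Python; the real script is Perl and its details may differ. The function names and the choice of .html and .css extensions are assumptions, and the submission to the validators is left out entirely.

```python
import os

def changed_since(root, stamp_path, extensions=(".html", ".css")):
    """Return sorted paths under root modified after the timestamp file was last touched."""
    stamp = os.path.getmtime(stamp_path)
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.lower().endswith(extensions) and os.path.getmtime(path) > stamp:
                changed.append(path)
    return sorted(changed)

def touch(stamp_path):
    """Create the timestamp file if needed and set its modification time to now."""
    with open(stamp_path, "a"):
        os.utime(stamp_path, None)
```

A run would call `changed_since`, process the files it returns, and then call `touch` so the next run starts from that point.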
If your browser doesn't want to download these files, you can get all of the above in a Zip file (13,031 bytes), or as text files:
They're not great code, and you'll probably have to modify them, but hey, what do you want for free?
Last updated 28 June 2003
http://www.rdrop.com/~half/General/WebSiteUtilities/index.html
All contents ©2001-2002 Mark L. Irons.