Description
This is a CGI script I have written to check the hyperlinks on a single HTML page and to help weed out dead or unavailable links. This is the first Python program I have written, so there may be a few flaws; if you find any, I would be glad to hear about them.
Requirements
A GNU/Linux or *BSD system having:
- Python interpreter (I have used Python 2.2.3 and 2.3.2): http://www.python.org/
- Web server with CGI support (I have used Apache 2.0.47): http://httpd.apache.org/
- A bit of patience and persistence
Download
Get it here: cgilink-0.2.tar.gz - (9.0K).
I have tested it on GNU/Linux and FreeBSD only, and I'd love to know if you are using it on other operating systems as well.
Installation
Extract the archive. All the files except cgilink.cgi are referred to (by me) as accompanying files. Then:
- Place the accompanying files in a directory accessed by the web server, and make sure you have given proper permissions to the files.
- Edit the file cgilink.cgi and set the value of SERVER_INSTALL_DIR to the place where you have kept the accompanying files on the web server. For example, if you can access the folder where you have placed the accompanying files by browsing to http://myserver/my/folder/accom/files/, SERVER_INSTALL_DIR should be set to "/my/folder/accom/files/".
- Place the cgilink.cgi file in your /cgi-bin/ directory or any other directory from which you can run CGI scripts, and make sure that you have given it executable permissions.
- If that directory is not /cgi-bin/, edit example.html in the accompanying files and, in the form, change "/cgi-bin/cgilink.cgi" to the actual server path where you have placed cgilink.cgi.
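If you are unsure what to put in SERVER_INSTALL_DIR, the relationship between the browsing URL and the setting in the example above can be sketched in a few lines. This is only an illustration, written for a current Python 3 interpreter rather than the Python 2.2/2.3 versions the script was written with; the helper name install_dir_from_url is hypothetical and is not part of cgilink.

```python
from urllib.parse import urlsplit


def install_dir_from_url(url):
    """Return the server path part of a URL, ending in a slash.

    Hypothetical helper for illustration only; cgilink itself just
    expects you to type the resulting string into cgilink.cgi.
    """
    path = urlsplit(url).path
    if not path.endswith("/"):
        path += "/"
    return path


# The URL from the example above gives the value to set in cgilink.cgi.
SERVER_INSTALL_DIR = install_dir_from_url("http://myserver/my/folder/accom/files/")
print(SERVER_INSTALL_DIR)  # prints /my/folder/accom/files/
```

In other words, the setting is the URL with the protocol and server name removed.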
Testing
Browse to example.html in the installed folder. Enter the URL you wish to check, for example http://someserver/somepage.html or http://someserver/somefolder/. In this version, a URL that ends in a directory can be given with or without the trailing slash, and if you omit the protocol, http:// is prepended to the URL automatically.
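The handling of a missing protocol described above could look roughly like the following. This is a minimal sketch in current Python 3, not the code in cgilink.cgi; the function name normalize_url and the exact test for "has a protocol" are my assumptions for the illustration.

```python
import re

# A protocol prefix such as "http://" or "ftp://" at the start of the string.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_url(raw):
    """Prepend http:// when the user left out the protocol.

    Hypothetical helper; the real script may decide this differently.
    """
    url = raw.strip()
    if not _SCHEME_RE.match(url):
        url = "http://" + url
    return url


print(normalize_url("someserver/somepage.html"))       # http://someserver/somepage.html
print(normalize_url("http://someserver/somefolder/"))  # unchanged
```

The sketch only covers the protocol. Accepting a directory without its trailing slash is not shown here, since how the script does that is not described on this page.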
Limitations
It fails when it cannot parse the HTML properly. Other limitations may be present; if you test it and find any, I'd really appreciate hearing about them.
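To show where HTML parsing enters into link checking, here is a small sketch of collecting the links from a page with the standard html.parser module of current Python 3. It is not the parser used in cgilink.cgi, which was written for Python 2.2/2.3; the names LinkCollector and extract_links are hypothetical, and only <a href="..."> links are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, such as <a name="top">.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<p><a href="a.html">A</a> <a href="http://other/b">B</a><a name="x">C</a>'
print(extract_links(page, "http://someserver/dir/"))
```

Each URL in the returned list would then be requested in turn to see whether it is dead or unavailable. As far as I know, this parser tolerates a fair amount of sloppy markup, such as the unclosed <p> in the sample page.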
License
This software is in the PUBLIC DOMAIN. I will not be responsible for any damage/losses caused by the use of this software.
Contact the author
Please mail me your comments, suggestions, and advice on this software at ee03b091 @ ee.iitm.ac.in
Last modified: Sun Jul 29 14:50:17 IST 2007