
18. History

LinkController was originally inspired by MOMspider, and having the MOMspider code available was very useful when I started creating this kit. However, it shares almost no code with MOMspider, other than what has come to it from the LibWWW-Perl library.

Philosophically, the MOMspider heritage is obvious in the wish to handle big jobs efficiently. In practice there are far more differences than similarities, partly because of changes in the Perl language.

I decided to completely separate the exploration of the local infostructure, looking for links to be checked, from the actual checking process. This means that checking can be spread over a large number of days and still run efficiently.

The basic aim of this link checking kit is to handle link checking jobs of any size efficiently. At the bottom end we have checking new pages as they are written. Here we want to use information from previous checks to avoid having to check all of each page every time. At the other end we have massive infostructures (sites) which contain many thousands of links and could not possibly all be checked in one day. For this latter case the aim is to spread the link checking load efficiently across all available low-usage periods.

My primary aim in writing this was not to write the most efficient code for the small-scale case (taking the minimum time to do everything), but rather code which would scale well. If your system can check 1000 links in two days, it will hopefully be able to check almost 7000 links in two weeks. To that end, I'm trying to make sure all data structures which grow with the number of links are kept on disk.
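
The design described above (link discovery kept separate from checking, link state held on disk, and checking spread over many short runs) can be illustrated with a small sketch. This is not LinkController's own code, which is written in Perl; it is a hypothetical Python example, and the function names, the table layout, and the choice of sqlite3 as the on-disk store are all assumptions made for illustration.

```python
import sqlite3
import time


def open_store(path):
    """Open (or create) the on-disk link table. Hypothetical layout."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "url TEXT PRIMARY KEY, "
        "last_checked REAL NOT NULL DEFAULT 0, "
        "status TEXT)"
    )
    return db


def record_links(db, urls):
    """Discovery phase: remember links found in local pages.

    Links that are already known keep their previous check history.
    """
    db.executemany(
        "INSERT OR IGNORE INTO links (url) VALUES (?)",
        [(u,) for u in urls],
    )
    db.commit()


def check_batch(db, check, budget, now=None):
    """Checking phase: check at most `budget` links, least recently
    checked first, and store each result back on disk.

    `check` is any callable that takes a URL and returns a status string.
    """
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT url FROM links ORDER BY last_checked, url LIMIT ?",
        (budget,),
    ).fetchall()
    for (url,) in rows:
        db.execute(
            "UPDATE links SET last_checked = ?, status = ? WHERE url = ?",
            (now, check(url), url),
        )
    db.commit()
    return [url for (url,) in rows]
```

In this sketch, each low-usage period would call check_batch once with whatever budget that period allows. Because the work list lives in the database rather than in the running process, memory use does not depend on the number of links, and a run can stop at any point and the next one carries on from the least recently checked link.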



This document was generated by Michael De La Rue on January, 6 2002 using texi2html