While looking for a lightweight, simple, and extensible network monitoring system, I haven’t come across one that really manages all three (extensible and lightweight together is the killer for most).
So I’ve decided that, for now, the best option is to write my own.
This is the Superintendent Project.
The Idea
I want to be able to get raw stats from the things I need to monitor, and most Linux distros already have great tools for this:
- Process Monitoring. The pidof command makes this dead simple: if pidof apache2 returns a bunch of numbers, it’s running, and if the result is blank, it’s not. Couldn’t be easier.
- Apache Status. You may or may not be aware of this, but if you visit http://localhost/server-status on any machine running Apache2 with the default config, you’ll get a bunch of statistics about Apache2. Turn ExtendedStatus on, and you get even more information.
- Postfix Statistics. Need to know how much mail is in the queue? There’s a built-in command for that (mailq, or postqueue -p).
- Shorewall Status. Run shorewall status. That’s all that needs to be said.
The same approach works for pretty much any program you can think of; if there isn’t a tool, you can always grep the logs.
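Here is a minimal sketch of what interpreting that raw output might look like. It’s written in Python rather than the PHP the Superintendent will use, purely for illustration, and the function names are hypothetical. It assumes pidof prints whitespace-separated PIDs, that the Apache page is requested as http://localhost/server-status?auto (the machine-readable "Key: value" form mod_status serves), and that the log fallback is just a regex count over lines.

```python
import re
from typing import Iterable


def parse_pidof(raw: str) -> list[int]:
    """PIDs printed by pidof; an empty list means the process isn't running."""
    return [int(token) for token in raw.split()]


def parse_server_status_auto(raw: str) -> dict[str, str]:
    """Parse the 'Key: value' lines Apache returns for server-status?auto."""
    stats = {}
    for line in raw.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. a bare hostname) are skipped
            stats[key.strip()] = value.strip()
    return stats


def count_log_matches(lines: Iterable[str], pattern: str) -> int:
    """Count lines matching a regex, roughly what grep -c does."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


if __name__ == "__main__":
    print(parse_pidof("1234 1233\n"))  # [1234, 1233]
    print(parse_pidof(""))             # []
    sample = "Total Accesses: 42\nBusyWorkers: 1\nIdleWorkers: 9\n"
    print(parse_server_status_auto(sample)["IdleWorkers"])  # 9
```

Since a file object iterates line by line, the grep-style fallback could be used as count_log_matches(open(path), r"status=bounced") to count bounced messages in a Postfix log; the log path varies by distro (for example /var/log/mail.log on Debian-style systems), so treat it as an assumption.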
So, what if I made a PHP script that ran all these commands and sent the raw output securely to a remote server? That way, if any of the commands change, none of the clients need to be updated, just the server: the Superintendent.
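The "only the server needs updating" idea can be sketched as a parser registry on the Superintendent side: clients send a mapping of command to raw output, and the server looks up a parser by command name. This is a hypothetical design written in Python for illustration, not the project’s actual code; commands without a registered parser are passed through untouched rather than dropped.

```python
from typing import Callable

# Hypothetical registry: command name -> function that parses its raw output.
# Only this table lives on the Superintendent server, so if a tool's output
# format changes, only this code has to change.
Parser = Callable[[str], object]
PARSERS: dict[str, Parser] = {}


def parser(name: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one command's raw output."""
    def register(fn: Parser) -> Parser:
        PARSERS[name] = fn
        return fn
    return register


@parser("pidof apache2")
def apache_running(raw: str) -> bool:
    # Any PID at all means the process is running.
    return bool(raw.split())


def parse_report(report: dict[str, str]) -> dict[str, object]:
    """Turn a node's {command: raw output} report into parsed values."""
    results: dict[str, object] = {}
    for command, raw in report.items():
        fn = PARSERS.get(command)
        # Unknown commands are kept as raw text.
        results[command] = fn(raw) if fn else raw
    return results
```

Supporting a new tool then means adding one decorated function on the server; nothing changes on the clients.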
How It’s Going To Work
It’s really very simple. A PHP script running on each monitored server will run all of the commands above (and more) and capture their raw output. The script can either daemonize itself or be run by Cron.
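A rough sketch of the client-side collector, again in Python for illustration. The command table and function names are hypothetical. The collector only captures raw stdout and never interprets it, leaving that to the server. run_forever() stands in for the daemon mode; a Cron job would just call collect() once per run.

```python
import subprocess
import time

# Hypothetical command table; a real node would list whatever it needs watched.
COMMANDS = {
    "pidof apache2": ["pidof", "apache2"],
    "postqueue -p": ["postqueue", "-p"],
}


def collect(commands: dict[str, list[str]], timeout: float = 10.0) -> dict[str, str]:
    """Run each command and capture its raw stdout without interpreting it."""
    report = {}
    for name, argv in commands.items():
        try:
            # No check=True: pidof exits non-zero when nothing matches,
            # and empty output is itself meaningful to the server.
            done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            report[name] = done.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool or a hung command is reported, not fatal.
            report[name] = f"error: {exc}"
    return report


def run_forever(interval: float = 60.0) -> None:
    """Daemon-style loop; under Cron, collect() would be called once instead."""
    while True:
        report = collect(COMMANDS)
        # Sending the report to the Superintendent is left out of this sketch.
        print(report)
        time.sleep(interval)


if __name__ == "__main__":
    # One-shot mode, the way Cron would run it.
    print(collect(COMMANDS))
```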
At the specified intervals (assuming Cron is used), all of the output will be posted to an SSL-enabled Superintendent server using PHP’s cURL functions. The server then parses the output, so if a client’s software is updated and its command output changes, only the server needs to be updated, not every client. What happens to the parsed data after that is up to you; in my case, it will be shown on a web-based interface and a wall-mounted monitor.
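The upload step might look something like this, using Python’s urllib in place of PHP’s cURL functions. The JSON payload shape, the node name, and the URL https://superintendent.example/report are all assumptions, and authenticating the client (for example with a shared secret or client certificates) is not shown. For https:// URLs, urlopen() verifies the server’s certificate by default in modern Python 3.

```python
import json
import urllib.request


def build_request(url: str, node: str, report: dict[str, str]) -> urllib.request.Request:
    """Package a node's raw command output as a JSON POST request."""
    body = json.dumps({"node": node, "report": report}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A client would call send(build_request(url, node_name, report)) once per interval; keeping the payload as raw text per command is what lets the parsing stay on the server.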
In addition, the Superintendent server will use several other techniques to monitor each node; it can handle things like ping and response time itself. If no information is received from a node, the node can be assumed down, and backup checks will kick in: checking whether Apache is still responsive, whether the machine responds to ping at all, and possibly even parsing the status page of the server’s datacenter for reported problems.
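On the Superintendent side, the "no information received means assume it’s down" rule and one backup check could be sketched as below. The threshold and function names are hypothetical. A real ping uses ICMP, which usually needs raw sockets and root privileges, so this sketch swaps in a plain TCP connection attempt (for example to Apache on port 80) as the responsiveness check.

```python
import socket
import time
from typing import Optional


def stale_nodes(last_seen: dict[str, float], max_age: float,
                now: Optional[float] = None) -> list[str]:
    """Return nodes whose last report is older than max_age seconds."""
    if now is None:
        now = time.time()
    return sorted(node for node, ts in last_seen.items() if now - ts > max_age)


def port_open(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Backup check: can a TCP connection be opened (e.g. to Apache)?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

Any node returned by stale_nodes() would then get the backup checks, such as port_open(host) against its web server.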
I’m sure I will run into pitfalls in the development of this, but it should be pretty awesome when all the bugs are ironed out. It will be lightweight, cross-platform (as the server accepts raw data), simple, and extensible.