Monitoring Software RAID with Nagios

Create the plugin file, or copy it from a host that already has it:

scp support.rvmgroup.it:/usr/lib/nagios/plugins/check_md /tmp
sudo mv /tmp/check_md /usr/lib/nagios/plugins/check_md
sudo chmod +x /usr/lib/nagios/plugins/check_md

<source lang=perl>
cat <<'EOFile' | sudo tee /usr/lib/nagios/plugins/check_md
#!/usr/bin/env perl

# Get status of Linux software RAID for SNMP / Nagios
#
# Author: Michal Ludvig <michal@logix.cz>
#         http://www.logix.cz/michal/devel/nagios
#
# Simple parser for /proc/mdstat that outputs status of all
# or some RAID devices. Possible results are OK and CRITICAL.
# It could eventually be extended to output WARNING result in
# case the array is being rebuilt or if there are still some
# spares remaining, but for now leave it as it is.
#
# To run the script remotely via SNMP daemon (net-snmp) add the
# following line to /etc/snmpd.conf:
#
# extend raid-md0 /root/parse-mdstat.pl --device=md0
#
# The script result will be available e.g. with command:
#
# snmpwalk -v2c -c public localhost .1.3.6.1.4.1.8072.1.3.2

use strict;
use Getopt::Long;

# Sample /proc/mdstat output:
#
# Personalities : [raid1] [raid5]
# md0 : active (read-only) raid1 sdc1[1]
#       2096384 blocks [2/1] [_U]
#
# md1 : active raid5 sdb3[2] sdb4[3] sdb2[4](F) sdb1[0] sdb5[5](S)
#       995712 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
#       [=================>...] recovery = 86.0% (429796/497856) finish=0.0min speed=23877K/sec
# unused devices: <none>

my $file = "/proc/mdstat";
my $device = "all";

# Get command line options.

GetOptions ('file=s' => \$file,
            'device=s' => \$device,
            'help' => sub { &usage() } );

# Strip leading "/dev/" from --device in case it has been given
$device =~ s/^\/dev\///;

# Return codes for Nagios
my %ERRORS=('OK'=>0,'WARNING'=>1,'CRITICAL'=>2,'UNKNOWN'=>3,'DEPENDENT'=>4);

# This is a global return value - set to the worst result we get overall

my $retval = 0;

my (%active_devs, %failed_devs, %spare_devs);

open FILE, "< $file" or die "Can't open $file : $!";
while (<FILE>) {

       next if ! /^(md\d+)+\s*:/;
       next if $device ne "all" and $device ne $1;
       my $dev = $1;
       my @array = split(/ /);
       for $_ (@array) {
               next if ! /(\w+)\[\d+\](\(.\))*/;
               if ($2 eq "(F)") {
                       $failed_devs{$dev} .= "$1,";
               }
               elsif ($2 eq "(S)") {
                       $spare_devs{$dev} .= "$1,";
               }
               else {
                       $active_devs{$dev} .= "$1,";
               }
       }
       if (! defined($active_devs{$dev})) { $active_devs{$dev} = "none"; }
               else { $active_devs{$dev} =~ s/,$//; }
       if (! defined($spare_devs{$dev}))  { $spare_devs{$dev}  = "none"; }
               else { $spare_devs{$dev} =~ s/,$//; }
       if (! defined($failed_devs{$dev})) { $failed_devs{$dev} = "none"; }
               else { $failed_devs{$dev} =~ s/,$//; }
       $_ = <FILE>;
       /\[(\d+)\/(\d+)\]\s+\[(.*)\]$/;
       my $devs_total = $1;
       my $devs_up = $2;
       my $stat = $3;
       my $result = "OK";
       if ($devs_total > $devs_up or $failed_devs{$dev} ne "none") {
               $result = "CRITICAL";
               $retval = $ERRORS{"CRITICAL"};
       }
       print "$result - $dev [$stat] has $devs_up of $devs_total devices active (active=$active_devs{$dev} failed=$failed_devs{$dev} spare=$spare_devs{$dev})\n";

}
close FILE;
exit $retval;

# =====

sub usage() {
        printf("
Check status of Linux SW RAID

Author: Michal Ludvig <michal\@logix.cz> (c) 2006
        http://www.logix.cz/michal/devel/nagios

Usage: mdstat-parser.pl [options]

 --file=<filename>    Name of file to parse. Default is /proc/mdstat
 --device=<device>    Name of MD device, e.g. md0. Default is \"all\"

");
        exit(1);
}
EOFile
</source>

Make it executable:

sudo chmod +x /usr/lib/nagios/plugins/check_md

The plugin can be invoked with the option --device=/dev/mdX to check a single device, or without it to check all devices.
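Before wiring the plugin into NRPE, it can be tried against a canned file through its --file option. The following is only a sketch: the sample content is taken from the comments in the plugin source (the indentation of the continuation line is an assumption), and SAMPLE is a throwaway temporary file, not something the plugin requires.

```shell
#!/bin/sh
# Sketch: run check_md against a canned mdstat file instead of /proc/mdstat.
# PLUGIN is the path used on this page; SAMPLE is a throwaway temp file.
PLUGIN=/usr/lib/nagios/plugins/check_md
SAMPLE=$(mktemp)
trap 'rm -f "$SAMPLE"' EXIT

# Degraded raid1 array, as in the sample quoted in the plugin comments.
cat > "$SAMPLE" <<'EOF'
Personalities : [raid1] [raid5]
md0 : active (read-only) raid1 sdc1[1]
      2096384 blocks [2/1] [_U]
unused devices: <none>
EOF

if [ -x "$PLUGIN" ]; then
    "$PLUGIN" --file="$SAMPLE" --device=md0
    echo "exit status: $?"
else
    echo "check_md not found at $PLUGIN" >&2
fi
```

With this sample the plugin is expected to report CRITICAL (exit status 2), because [2/1] means that only one of the two members is up.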


Define one NRPE command per array in the local NRPE configuration:

sudoedit /etc/nagios/nrpe_local.cfg
command[check-md0]=/usr/lib/nagios/plugins/check_md --device=/dev/md0
command[check-md1]=/usr/lib/nagios/plugins/check_md --device=/dev/md1

Restart NRPE:

sudo /etc/init.d/nagios-nrpe-server restart
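Once NRPE has been restarted, the new command can be queried from the monitoring server. A minimal sketch follows, assuming check_nrpe is installed under /usr/lib/nagios/plugins on that server. The host name raidhost.example.com and the helper nagios_status_name are hypothetical and only serve the illustration; the exit codes are the ones listed in %ERRORS in the plugin.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its status name
# (same codes as %ERRORS in check_md). Hypothetical helper.
nagios_status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Assumed values: adjust the host name and the check_nrpe path as needed.
RAID_HOST=${RAID_HOST:-raidhost.example.com}
CHECK_NRPE=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}

if [ -x "$CHECK_NRPE" ]; then
    # -H selects the remote host, -c the command name defined in nrpe_local.cfg
    "$CHECK_NRPE" -H "$RAID_HOST" -c check-md0
    rc=$?
    echo "check-md0 -> $(nagios_status_name "$rc")"
fi
```

If the query fails with a connection error rather than a RAID status, the remote NRPE daemon may not list the monitoring server among its allowed hosts.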

References