Differences between revisions 8 and 9
Revision 8 as of 2006-07-03 16:08:38
Size: 3832
Editor: FelixFrank
Comment: info on SNMP traps on Sun X4100
Revision 9 as of 2006-07-07 14:20:10
Size: 4141
Editor: FelixFrank
Comment: more on X4100 SNMP


Email: ffrank@ifh.de

Design decisions

announce_services

  1. Queries for RPM packages. The initial idea was a construct like {{{ map { chomp; $installed{$_} = 1 } `$rpm -qa`; }}} which would have allowed a fast and elegant check like use_service($service_name) if $installed{$package}; This would, however, require exact package names including version numbers in announce_nagios (like firefox-1.0.8-1.4.1.SL3.1.i386). So I chose a different way of retrieving the installed packages: {{{ my $installed = 'XxX'.`$rpm -qa`; $installed =~ s/\n/XxX/g; }}} to allow a more flexible, yet slower query like use_service($service_name) if $installed =~ /XxX$package/;
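The two lookup strategies can be compared side by side in a small sketch. It is not the actual announce_services code: the package list is a hard-coded stand-in for the output of rpm -qa, the sub names installed_exact and installed_prefix are made up for illustration, and the \Q...\E quoting is an addition that keeps package names containing regex metacharacters from being misread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the output of `rpm -qa` (one package per line).
my $rpm_output = "firefox-1.0.8-1.4.1.SL3.1.i386\n"
               . "openssh-server-3.6.1p2-33.30.9\n";

# Variant 1: hash lookup. Fast, but needs the full name including version.
my %installed;
$installed{$_} = 1 for split /\n/, $rpm_output;

sub installed_exact {
    my ($pkg) = @_;
    return exists $installed{$pkg} ? 1 : 0;
}

# Variant 2: one marker-delimited string, matched by the start of the name.
my $installed = 'XxX' . $rpm_output;
$installed =~ s/\n/XxX/g;

sub installed_prefix {
    my ($pkg) = @_;
    # \Q...\E is an addition: it quotes regex metacharacters in the name.
    return $installed =~ /XxX\Q$pkg\E/ ? 1 : 0;
}
```

Note that the second variant matches any package whose name merely starts with the given string, which is the price of not having to spell out version numbers.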

update_nagios

  1. The subs that make changes to cfg-files may appear quite confusing. They were designed after realizing that, originally, each file was looped over in very similar fashion (i.e., heavy redundancy in the script code). The applied model closely resembles functional programming: the filter sub does little more than concatenate sets of input lines into "sections" (which are described by a parameter) and perform a specific action on each such section. The action is chosen by passing the desired sub reference to filter. A swarm of subs defines how all kinds of sections from cfg-files are to be treated. This solution does not improve readability (on the contrary, actually), but it should make changes and additions easy.
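A minimal sketch of that model follows. It is only an illustration of the idea, not the real update_nagios code: the signature of filter (start pattern, end pattern, sub reference), the action sub drop_oldbox and the host name oldbox are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect lines into sections and hand each complete
# section to $action. A section starts at a line matching $start_re and
# ends at a line matching $end_re; all other lines pass through untouched.
sub filter {
    my ($lines, $start_re, $end_re, $action) = @_;
    my (@out, @section);
    my $in_section = 0;
    for my $line (@$lines) {
        $in_section = 1 if !$in_section && $line =~ $start_re;
        if ($in_section) {
            push @section, $line;
            if ($line =~ $end_re) {
                push @out, $action->(@section);   # action decides the output
                @section    = ();
                $in_section = 0;
            }
        }
        else {
            push @out, $line;
        }
    }
    push @out, @section;   # an unterminated section passes through unchanged
    return @out;
}

# Example action: drop any section that names the (made-up) host "oldbox".
sub drop_oldbox {
    my @section = @_;
    return (grep { /host_name\s+oldbox\b/ } @section) ? () : @section;
}
```

A call would then look like filter(\@cfg_lines, qr/^define host\s*\{/, qr/^\}/, \&drop_oldbox); a different treatment of the same kind of section only needs a different sub reference.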

SNMP trapping

v20z

Sun v20z keeps sending mystery traps. Normally, all SP-EVENT traps are supposed to be located under .1.3.6.1.4.1.9237.2.1.1.6 (SP-MasterAgent-MIB::spEvent), but we receive traps with the all-too-short OID of .1.3.6.1.4.1.9237 (SP-MasterAgent-MIB::newisys), which is the beginning of Sun's (?) enterprise tree. The cause might be a bug in trapd2. TODO:

  1. check back with nino and upgrade the original trapd :(

  2. try and debug our solution (./)

  3. apply a workaround: it may well be possible that appending .2.1.1.6 to the OIDs will make the traps more meaningful. Still, no variables were ever received along with a trap like that, and nothing appeared in any logs on the source machines, so the traps may still remain inconclusive :\

conclusion: Some debugging suggested that the traps in question are not truncated or misinterpreted on our side but are indeed sent malformed and inconclusive. Two flavors of too-short OID have been seen so far:

  • .1.3.6.1.4.1.9237
  • .1.3.6.1.4.1.9237.2.1

A workaround similar to the one described in item 3 is still possible, but chances are the traps won't tell us much.
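Such a workaround could be sketched as below. This is an assumption about how it might be done, not something we have deployed: the sub name complete_oid is made up, and it simply extends any OID that is a proper prefix of the spEvent OID to the full spEvent OID before the trap is handed on.

```perl
use strict;
use warnings;

# OID of SP-MasterAgent-MIB::spEvent, as given above.
my $SP_EVENT = '.1.3.6.1.4.1.9237.2.1.1.6';

# Hypothetical helper: if $oid is a too-short prefix of the spEvent OID,
# return the full spEvent OID; otherwise return $oid unchanged.
sub complete_oid {
    my ($oid) = @_;
    return $SP_EVENT
        if length($oid) < length($SP_EVENT)
        && index($SP_EVENT, "$oid.") == 0;   # match on whole components only
    return $oid;
}
```

Both flavors seen so far would be mapped to the spEvent OID, while OIDs from other subtrees are left alone. Since the traps carry no variables, the result may still not say much.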

X4100

Sun X4100 (galaxy) machines are rather mysterious. The traps come from the OID subtree .1.3.6.1.4.1.3183. Few MIBs covering it can be found using Google (and none on the Sun page, AFAIK), and those we found were from Dell and Intel, respectively. The defined traps do seem to follow a certain standard, however, backed by Dell and Intel ([ http://www.dell.com/content/topics/global.aspx/power/en/ps1q03_intel ], pdf: [ http://www.dell.com/downloads/global/vectors/2002_asf.pdf ]).

It appears that Sun went as far as using traps inside that "Wired for Management" OID tree, but the traps themselves seem to be Sun's own invention. We tried provoking a galaxy machine into sending a "Power redundancy degraded" trap (which is in the DELL-ASF-MIB we found online). A trap was generated, but not only did it lack the expected OID, its OID was nowhere in the ASF-MIB at all.

We have not yet found a source of information about the true OIDs of the PETs from Sun machines. IBM also hosts information on PETs: [http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=/diricinfo/fqm0_r_events_pet.html].

[update 2006-07-07]

Apparently, certain voltage warnings are processed OK, while their respective recovery traps are not recognized (so snmptt discards them and nagios is never informed). We also noted other events that were never recognized in the first place and not reported to nagios at all.
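One way to look at the unrecognized traps might be to decode the trap number itself. The sketch below assumes that the Sun machines follow the bit layout of the IPMI PET format for the specific trap number (sensor type, event type, a deassertion flag and an event offset); whether they really do is unverified. The sub names and the sample numbers in the usage note are illustrative only and were not taken from an X4100.

```perl
use strict;
use warnings;

# Assumed layout (IPMI PET format): bits 23..16 sensor type,
# bits 15..8 event type, bit 7 deassertion flag, bits 6..0 event offset.
sub decode_pet {
    my ($specific) = @_;
    my %fields = (
        sensor_type => ($specific >> 16) & 0xFF,
        event_type  => ($specific >> 8)  & 0xFF,
        deassertion => ($specific >> 7)  & 0x01,
        offset      =>  $specific        & 0x7F,
    );
    return \%fields;
}

# Take the specific trap number from the last component of a trap OID.
sub specific_from_oid {
    my ($oid) = @_;
    return $oid =~ /\.(\d+)$/ ? $1 : undef;
}
```

Under that assumption, a warning and its recovery would differ only in the deassertion bit, e.g. 131841 versus 131969 for the same sensor type, event type and offset. A MIB that defines only the first number would explain why the recovery is discarded.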


CategoryHomepage

FelixFrank (last edited 2008-11-03 12:39:23 by FelixFrank)