Differences between revisions 13 and 14
Revision 13 as of 2006-08-07 15:11:07
Size: 5143
Editor: FelixFrank
Comment: still nothing on X4100 traps
Revision 14 as of 2006-09-07 16:20:16
Size: 5430
Editor: FelixFrank
Comment: continuing the Sun-X4100-SNMP drama


Email: ffrank@ifh.de

Design decisions

announce_services

  1. Queries for RPM packages. The initial idea was a construct like {{{ map { chomp; $installed{$_} = 1 } `rpm -qa`; }}} which would have allowed a fast and elegant check like {{{ use_service($service_name) if $installed{$package}; }}}. This would, however, require exact package names including version numbers in announce_nagios (like firefox-1.0.8-1.4.1.SL3.1.i386). So I chose a different way of retrieving the installed packages: {{{ my $installed = 'XxX' . `rpm -qa`; $installed =~ s/\n/XxX/g; }}} which allows a more flexible, yet slower query like {{{ use_service($service_name) if $installed =~ /XxX$package/; }}}.
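A small Perl sketch of the trade-off above. The `rpm -qa` output is replaced by a hard-coded sample so that it runs anywhere, and is_installed is a hypothetical helper, not a sub taken from announce_services. Unlike the one-liner above, it quotes the package name with \Q...\E; without that, characters such as '.' or '+' in a package name act as regex operators.

```perl
use strict;
use warnings;

# Stand-in for `rpm -qa` output (one name-version-release.arch per line),
# hard-coded so that the sketch also runs on machines without rpm.
my $rpm_output = join '', map { $_ . "\n" } qw(
    firefox-1.0.8-1.4.1.SL3.1.i386
    gcc-c++-3.2.3-54.i386
    perl-5.8.0-90.4.i386
);

# Variant 1: hash keyed by complete package strings (exact lookups only).
my %installed;
$installed{$_} = 1 for split /\n/, $rpm_output;

# Variant 2: one 'XxX'-delimited string (prefix lookups by regex).
my $installed = 'XxX' . $rpm_output;
$installed =~ s/\n/XxX/g;

# Hypothetical helper: prefix lookup; \Q...\E makes characters such as
# '.' or '+' in the package name match literally.
sub is_installed {
    my ($package) = @_;
    return $installed =~ /XxX\Q$package\E/ ? 1 : 0;
}

for my $pkg ('firefox', 'gcc-c++', 'mozilla') {
    my $exact = exists $installed{$pkg} ? 1 : 0;
    printf "%-8s exact=%d prefix=%d\n", $pkg, $exact, is_installed($pkg);
}
```

Note that the prefix match is deliberately loose: a query for "perl" is also satisfied by perl-DBI or any other package whose name starts with "perl".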

update_nagios

  1. The subs that make changes to cfg-files may appear quite confusing. They were designed after realizing that, originally, each file was looped over in very similar fashion (i.e., heavily redundant script code). The applied model closely resembles functional programming: the {{{filter}}} sub does little more than concatenate sets of input lines into "sections" (described by a parameter) and perform a specific action on each such section. The action is chosen by passing the desired sub reference to {{{filter}}}. A whole swarm of subs defines how each kind of section from the cfg-files is to be treated. This solution does not exactly improve readability (quite the contrary, actually), but it should make changes and additions easy.
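To illustrate the idea, here is a minimal sketch of such a filter sub. The signature (line list, start and end patterns, action) and the drop_host action are assumptions made up for this example; the real subs in update_nagios may describe sections differently.

```perl
use strict;
use warnings;

# Hypothetical filter(): collects the lines from a start pattern up to an
# end pattern into one "section", passes each section (as an array ref)
# to the $action code ref and puts whatever $action returns into the
# output. Lines outside any section are copied unchanged.
sub filter {
    my ($lines, $start_re, $end_re, $action) = @_;
    my (@out, @section);
    my $inside = 0;
    for my $line (@$lines) {
        $inside = 1 if !$inside && $line =~ $start_re;
        if (!$inside) {
            push @out, $line;
            next;
        }
        push @section, $line;
        if ($line =~ $end_re) {
            push @out, $action->([@section]);
            @section = ();
            $inside  = 0;
        }
    }
    push @out, @section;    # keep an unterminated section as it is
    return @out;
}

# Hypothetical action factory: returns a sub that drops every section
# defining the given host_name and keeps all others.
sub drop_host {
    my ($host) = @_;
    return sub {
        my ($section) = @_;
        my $hit = grep { /^\s*host_name\s+\Q$host\E\s*$/ } @$section;
        return $hit ? () : @$section;
    };
}

my @cfg = (
    "# hosts\n",
    "define host {\n",
    "    host_name  oldbox\n",
    "}\n",
    "define host {\n",
    "    host_name  newbox\n",
    "}\n",
);

my @kept = filter(\@cfg, qr/^\s*define\s+\w+\s*\{/, qr/^\s*\}/,
                  drop_host('oldbox'));
print @kept;
```

Adding a new kind of treatment then only means writing another small action sub; the looping over the file stays in one place.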

SNMP trapping

v20z

Sun v20z machines keep sending mystery traps. All SP-EVENT traps are supposed to be located under .1.3.6.1.4.1.9237.2.1.1.6 (SP-MasterAgent-MIB::spEvent), yet we receive traps with the all-too-short OID .1.3.6.1.4.1.9237 (SP-MasterAgent-MIB::newisys), which is merely the root of Sun's (?) enterprise tree. The cause might be a bug in trapd2. TODO:

  1. check back with nino and upgrade the original trapd :(

  2. try and debug our solution (./)

  3. apply a workaround: appending .2.1.1.6 to the OIDs may well make the traps more meaningful. Still, no variables were ever received with such a trap, and nothing appeared in any logs on the source machines, so the traps may remain inconclusive :\

Conclusion: Some debugging suggested that the traps in question are not truncated or misinterpreted on our side, but are indeed malformed and inconclusive as sent. Two flavors of truncated OID have been seen so far:

  • .1.3.6.1.4.1.9237
  • .1.3.6.1.4.1.9237.2.1

A workaround similar to the one described in item 3 is still possible, but chances are the traps won't tell us much.
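The workaround could look roughly like the following sketch. normalize_oid is a hypothetical helper that would run before a trap is matched against the MIB-derived configuration; mapping .1.3.6.1.4.1.9237.2.1 onto spEvent as well is our extrapolation of item 3, not something the MIB confirms. Returning a flag lets the handler mark such events as reconstructed instead of trusting them.

```perl
use strict;
use warnings;

# OIDs from the v20z notes above.
my $SP_EVENT  = '.1.3.6.1.4.1.9237.2.1.1.6';    # SP-MasterAgent-MIB::spEvent
my @TRUNCATED = (
    '.1.3.6.1.4.1.9237',        # SP-MasterAgent-MIB::newisys
    '.1.3.6.1.4.1.9237.2.1',    # second truncated flavor seen so far
);

# Hypothetical helper: maps a known truncated OID onto spEvent (a guess,
# see item 3) and returns the OID plus a flag telling whether it was
# rewritten. Any other OID is passed through unchanged.
sub normalize_oid {
    my ($oid) = @_;
    for my $short (@TRUNCATED) {
        return ($SP_EVENT, 1) if $oid eq $short;
    }
    return ($oid, 0);
}

for my $oid (@TRUNCATED, $SP_EVENT) {
    my ($fixed, $rewritten) = normalize_oid($oid);
    print "$oid -> $fixed", ($rewritten ? ' (rewritten)' : ''), "\n";
}
```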

v65x

We have MIBs that at least seem to contain usable trap OIDs. However, we haven't found a way to enable SNMP traps yet (i.e., we need to tell the machines where to direct the traps).

[update 2006-08-04]

Andreas found a way to configure the trap sink on the v65x machines. We have had no opportunity to test it so far. Firewall reconfiguration will probably be necessary. The MIBs could not be verified either.

X4100

Sun X4100 (galaxy) machines are rather mysterious. Their traps come from the OID subtree .1.3.6.1.4.1.3183. Few MIBs covering it can be found using Google (and none on the Sun page, AFAIK), and those we found were from DELL and Intel, respectively. The defined traps do seem to follow a standard, however, backed by DELL and Intel ([http://www.dell.com/content/topics/global.aspx/power/en/ps1q03_intel], PDF: [http://www.dell.com/downloads/global/vectors/2002_asf.pdf]).

It appears that Sun went as far as placing its traps inside that "Wired for Management" OID tree, but the trap OIDs themselves seem to be Sun's own invention. We tried provoking a galaxy machine into sending a "Power redundancy degraded" trap (which is defined in the DELL-ASF-MIB we found online). A trap was generated, but it not only lacked the expected OID, its OID did not appear anywhere in the ASF-MIB at all.

We have not yet found a source of information about the true OIDs of the PETs (Platform Event Traps) sent by Sun machines. IBM also hosts information on PETs: [http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=/diricinfo/fqm0_r_events_pet.html].

[update 2006-07-07]

Apparently, certain voltage warnings are processed correctly, while their corresponding recovery traps are not recognized (so snmptt discards them and Nagios is never informed). We also noted other events that were never recognized in the first place and thus were not reported to Nagios at all.
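One way to investigate the unrecognized recovery traps would be to decode the specific-trap number of the PETs. As far as we understand the IPMI Platform Event Trap layout, this 32-bit value packs the sensor type, the event type, an assertion/deassertion bit and the event offset. Whether Sun's X4100 traps really follow that layout is an unverified assumption, as are the sample values below; decode_pet_specific is a hypothetical helper.

```perl
use strict;
use warnings;

# Splits a PET specific-trap number into its fields, ASSUMING the IPMI
# Platform Event Trap layout:
#   bits 23..16 sensor type, bits 15..8 event type,
#   bit 7 event direction (0 = assertion, 1 = deassertion),
#   bits 3..0 event offset.
# Whether Sun's X4100 traps really follow this layout is unverified.
sub decode_pet_specific {
    my ($specific) = @_;
    return {
        sensor_type => ($specific >> 16) & 0xFF,
        event_type  => ($specific >> 8)  & 0xFF,
        deassertion => ($specific >> 7)  & 0x01,
        offset      => $specific & 0x0F,
    };
}

# Hypothetical pair of values that differ only in the direction bit, i.e.
# a warning and its recovery under the assumed layout.
for my $specific (0x00020101, 0x00020181) {
    my $f = decode_pet_specific($specific);
    printf "0x%08X: sensor type 0x%02X, event type 0x%02X, %s, offset %d\n",
        $specific, $f->{sensor_type}, $f->{event_type},
        $f->{deassertion} ? 'deassertion' : 'assertion', $f->{offset};
}
```

If the recovery traps decode to the same sensor and event type as the recognized warnings, differing only in the direction bit, they would simply need their own entries in the snmptt configuration.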

[update 2006-08-07]

We have always had the "SUN PLATFORM MIB", which our MCS contact pointed out to us once again. It is known to be unrelated to the hardware traps. We thoroughly searched another X4100 CD (N1 System Manager) and found a set of MIBs in a package meant for Solaris on x86 (not in the Linux or SPARC version of the same package). Most of those MIBs covered assorted management-related and other information. Nothing on the received hardware traps could be found; the CD and all MIBs on it proved just as useless as the other one.

[update 2006-09-07]

MCS suggested downloading the image of a new X4100 support CD. We found naught but the SUN-PLATFORM-MIB, which is no more helpful than it was half a year ago. Waltraut sent another request, including more details on our problem with the missing OIDs.


CategoryHomepage

FelixFrank (last edited 2008-11-03 12:39:23 by FelixFrank)