Forum Discussion

spiritie
3 years ago

Various Linux distros - SNMP disk OID change

Hi LM Community

I'm having an issue that I've searched these forums and the web for, but I've been unable to find anyone with a solution.

We are monitoring Linux servers with SNMP; to be specific, the DataSources we are having problems with are:
https://www.logicmonitor.com/support/monitoring/os-virtualization/filesystem-monitoring

  • SNMP_Filesystems_Usage
  • SNMP_Filesystems_Status

These two DataSources are the "new ones" for Linux disk status and usage monitoring; we have removed all of the old DataSources described in the article.

The issue is that when the SNMP service or the Linux machine is restarted, snmpd allocates what seem to be random indexes in the hrStorage table to the disks each time.

I've created a support case with LogicMonitor, but it was shrugged off because they hadn't heard of this issue before. I can't believe we are the only ones who have seen this problem.

Example alarm is:

Host: <REDACTED>
Datasource: Filesystem Capacity-/run/snapd/ns
InstanceGroup: @default
Datapoint: StorageNotAccessible
Level: warn
Start: 2021-12-10 15:50:22 CET
Duration: 0h 12m
Value: 1.0
ClearValue: 0.0
Reason: StorageNotAccessible is not = 1: the current value is 0.0

We have seen this on CentOS and Ubuntu 16, 18, and 20. Sometimes it's multiple disks; other times it doesn't happen at all. The fix is to run Active Discovery on the resource again.
I think part of the problem is that the wildcard used is the SNMP index OID, which changes; if the wildcard were the mount point name, this would not have been an issue.

I've partly worked around this by changing the discovery schedule on the DataSources from 1 day to 15 minutes; after that, monitoring recovers on its own.
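
For reference, here's roughly how I confirm the indexes have moved after a restart. This is a minimal sketch assuming the net-snmp command-line tools and v2c read access on localhost (we actually use SNMPv3, so the auth flags would differ):

    # Walk hrStorageDescr (index => mount point) before and after an snmpd
    # restart, then diff the two; any index that moved shows up in the diff.
    snmpwalk -v2c -c public localhost 1.3.6.1.2.1.25.2.3.1.3 > /tmp/before.txt
    sudo systemctl restart snmpd
    sleep 5   # give the agent a moment to re-enumerate storage entries
    snmpwalk -v2c -c public localhost 1.3.6.1.2.1.25.2.3.1.3 > /tmp/after.txt
    diff /tmp/before.txt /tmp/after.txt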

Does anyone have any idea what could be causing this?

Regards.

10 Replies

  • SNMP index OID shuffling is common, and it's why LM uses the WILDALIAS/Instance Name as the unique identifier for an instance, while the WILDVALUE/Instance Value is the index OID. The wildvalue can change without losing an instance and its history. AD needs to run after re-shuffling to make data reporting work correctly, as the wildvalue is used to match up reported data to the instance.

    Generally, this shouldn't happen often enough for it to throw off more than a poll here and there.

    How often are you restarting SNMPd? 

    Making AD run more often is one way to mitigate.
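
    To make the matching concrete: discovery output boils down to wildvalue##wildalias pairs, and a rough shell equivalent of what SNMP AD gathers from hrStorageDescr looks like the sketch below (hypothetical one-liner assuming v2c/public and the net-snmp tools; the actual modules do this internally):

    # Emit "wildvalue##wildalias" pairs: the numeric index is the wildvalue,
    # the storage description (mount point) is the wildalias.
    snmpwalk -v2c -c public -On localhost 1.3.6.1.2.1.25.2.3.1.3 \
      | sed -n 's/^\.1\.3\.6\.1\.2\.1\.25\.2\.3\.1\.3\.\([0-9]*\) = STRING: \(.*\)$/\1##\2/p'

    When the agent reshuffles, only the left side of each pair changes; the instance is keyed on the right side, which is why it survives and just needs its wildvalue refreshed by AD.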
     

  • SNMPD version:
    Package: snmpd
    Version: 5.8+dfsg-2ubuntu2.3
    Priority: optional
    Section: net
    Source: net-snmp
    Origin: Ubuntu

  • Anonymous

    I'll do some testing on this in my environment. At any rate, let's start by clarifying some terminology: the tail end of the OID is what's referred to in SNMP as the object index. I've not heard of snmpd shuffling those particular indices before, but it wouldn't surprise me. The object index is what LogicMonitor stores as the wildvalue.

    I'm not on the same version of snmpd, so let me get things updated and retest. I know that on 5.7, I got the same results before and after an snmpd restart.

  • Anonymous

    Ubuntu 20.04

    sweenig@ubuntu2004vm:~$ sudo apt show snmpd -a
    Package: snmpd
    Version: 5.8+dfsg-2ubuntu2.3
    Priority: optional
    Section: net
    Source: net-snmp
    Origin: Ubuntu
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
    Original-Maintainer: Net-SNMP Packaging Team <pkg-net-snmp-devel@lists.alioth.debian.org>
    Bugs: https://bugs.launchpad.net/ubuntu/+filebug
    Installed-Size: 144 kB
    Pre-Depends: init-system-helpers (>= 1.54~)
    Depends: libc6 (>= 2.4), libsnmp35 (= 5.8+dfsg-2ubuntu2.3), debconf (>= 0.5) | debconf-2.0, adduser, debconf, lsb-base (>= 3.2-13), libsnmp-base
    Suggests: snmptrapd
    Homepage: http://net-snmp.sourceforge.net/
    Download-Size: 56.4 kB
    APT-Manual-Installed: yes
    APT-Sources: http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
    Description: SNMP (Simple Network Management Protocol) agents
     The Simple Network Management Protocol (SNMP) provides a framework
     for the exchange of management information between agents (servers)
     and clients.
     .
     The Net-SNMP agent is a daemon which listens for incoming SNMP
     requests from clients and provides responses.
    
    Package: snmpd
    Version: 5.8+dfsg-2ubuntu2
    Priority: optional
    Section: net
    Source: net-snmp
    Origin: Ubuntu
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
    Original-Maintainer: Net-SNMP Packaging Team <pkg-net-snmp-devel@lists.alioth.debian.org>
    Bugs: https://bugs.launchpad.net/ubuntu/+filebug
    Installed-Size: 144 kB
    Pre-Depends: init-system-helpers (>= 1.54~)
    Depends: libc6 (>= 2.4), libsnmp35 (= 5.8+dfsg-2ubuntu2), debconf (>= 0.5) | debconf-2.0, adduser, debconf, lsb-base (>= 3.2-13), libsnmp-base
    Suggests: snmptrapd
    Homepage: http://net-snmp.sourceforge.net/
    Download-Size: 56.4 kB
    APT-Sources: http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
    Description: SNMP (Simple Network Management Protocol) agents
     The Simple Network Management Protocol (SNMP) provides a framework
     for the exchange of management information between agents (servers)
     and clients.
     .
     The Net-SNMP agent is a daemon which listens for incoming SNMP
     requests from clients and provides responses.

     

    Did an snmpwalk of 1.3.6.1.2.1.25.2.3.1.3:

    Walking OID 1.3.6.1.2.1.25.2.3.1.3 from ubuntu2004vm.local:161 for version v2c with community=public pdu.timeout=5s walk.timeout=5s
    	1 => Physical memory
    	10 => Swap space
    	3 => Virtual memory
    	35 => /run
    	36 => /
    	38 => /dev/shm
    	39 => /run/lock
    	40 => /sys/fs/cgroup
    	6 => Memory buffers
    	7 => Cached memory
    	72 => /boot/efi
    	73 => /run/snapd/ns
    	75 => /run/user/125
    	77 => /run/user/1000
    	8 => Shared memory

     

    Did a `sudo systemctl restart snmpd`.

    Did another snmpwalk of 1.3.6.1.2.1.25.2.3.1.3:

    Walking OID 1.3.6.1.2.1.25.2.3.1.3 from ubuntu2004vm.local:161 for version v2c with community=public pdu.timeout=5s walk.timeout=5s
    	1 => Physical memory
    	10 => Swap space
    	3 => Virtual memory
    	35 => /run
    	36 => /
    	38 => /dev/shm
    	39 => /run/lock
    	40 => /sys/fs/cgroup
    	6 => Memory buffers
    	7 => Cached memory
    	72 => /boot/efi
    	73 => /run/snapd/ns
    	75 => /run/user/125
    	77 => /run/user/1000
    	8 => Shared memory

     

    Maybe there's something different about your setup compared to mine (vanilla Ubuntu; snmpd.conf is just "rocommunity public"). I'm not seeing the shuffling.
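
    If you want to brute-force it on your side, a loop like this (same assumptions: v2c/public against localhost) should catch any restart-triggered shuffle:

    # Restart snmpd repeatedly and flag any run where the index -> name
    # mapping differs from the baseline walk.
    snmpwalk -v2c -c public localhost 1.3.6.1.2.1.25.2.3.1.3 > /tmp/base.txt
    for i in $(seq 1 20); do
        sudo systemctl restart snmpd
        sleep 5
        snmpwalk -v2c -c public localhost 1.3.6.1.2.1.25.2.3.1.3 > /tmp/run.txt
        cmp -s /tmp/base.txt /tmp/run.txt || echo "run $i: indexes changed"
    done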

  • I can find a few references online (via Google) to people having this problem, but they don't have responses or solutions beyond using the mount name as the index. I'm not that great at Linux, but I would guess it might be related to how the kernel detects drives on boot and their order? Perhaps related to UUID/label stuff?

    Are these physical or virtual servers? Perhaps you can only replicate it with real reboots and not just by restarting snmpd.
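
    If it is enumeration-order related, comparing something like this across reboots might show it (just a guess on my part):

    # If the sdX names move between boots while LABEL/UUID stay put, the
    # kernel's detection order is changing even though the mounts look stable.
    lsblk -o NAME,LABEL,UUID,MOUNTPOINT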

  • Anonymous

    I'm using a VM. Forced a full shutdown and cold boot between these walks. 

    $ !snmpwalk ubuntu2004vm.local .1.3.6.1.2.1.25.2.3.1.3
    Walking OID .1.3.6.1.2.1.25.2.3.1.3 from ubuntu2004vm.local:161 for version v2c with community=public pdu.timeout=5s walk.timeout=5s
    	1 => Physical memory
    	10 => Swap space
    	3 => Virtual memory
    	35 => /run
    	36 => /
    	38 => /dev/shm
    	39 => /run/lock
    	40 => /sys/fs/cgroup
    	6 => Memory buffers
    	7 => Cached memory
    	72 => /boot/efi
    	73 => /run/snapd/ns
    	75 => /run/user/125
    	77 => /run/user/1000
    	8 => Shared memory
    
    $ !snmpwalk ubuntu2004vm.local .1.3.6.1.2.1.25.2.3.1.3
    Walking OID .1.3.6.1.2.1.25.2.3.1.3 from ubuntu2004vm.local:161 for version v2c with community=public pdu.timeout=5s walk.timeout=5s
    	1 => Physical memory
    	10 => Swap space
    	3 => Virtual memory
    	35 => /run
    	36 => /
    	38 => /dev/shm
    	39 => /run/lock
    	40 => /sys/fs/cgroup
    	6 => Memory buffers
    	7 => Cached memory
    	72 => /boot/efi
    	73 => /run/snapd/ns
    	75 => /run/user/125
    	77 => /run/user/1000
    	8 => Shared memory

     

    Gotta be something specific to the config or the hardware (or both).

  • 11 hours ago, Mike Moniz said:

    I can find a few references online (via Google) to people having this problem, but they don't have responses or solutions beyond using the mount name as the index. I'm not that great at Linux, but I would guess it might be related to how the kernel detects drives on boot and their order? Perhaps related to UUID/label stuff?

    Are these physical or virtual servers? Perhaps you can only replicate it with real reboots and not just by restarting snmpd.

     

    All VMs. Sometimes the OID indexes change, sometimes not; it's a bit random.

  • 11 hours ago, Michael Rodrigues said:

    SNMP index OID shuffling is common, and it's why LM uses the WILDALIAS/Instance Name as the unique identifier for an instance, while the WILDVALUE/Instance Value is the index OID. The wildvalue can change without losing an instance and its history. AD needs to run after re-shuffling to make data reporting work correctly, as the wildvalue is used to match up reported data to the instance.

    Generally, this shouldn't happen often enough for it to throw off more than a poll here and there.

    How often are you restarting SNMPd? 

    Making AD run more often is one way to mitigate.
     

     

    Not often, to be honest; we mostly see it on reboots related to patching.

    Just last night we updated our snmpd config (we use Puppet) to include the extended disk information, then we restarted the snmpd service. This caused a lot of alarms again across multiple servers.

  • Hi @Stuart Weenig

    Thanks for your detailed analysis; I have a bit more to add:

    Our snmpd config is nothing crazy. Please note we use SNMPv3 (I don't know if this makes any difference to the discovery logic compared to v1/v2c; I think not).

    Here is the snmpd config we use (pushed with Puppet):


    agentaddress udp:161

    # Traditional Access Control
    rocommunity public 127.0.0.1
    rocommunity6 public ::1

    # VACM Configuration
    #       sec.name       source        community
    com2sec notConfigUser  default       public

    com2sec6 notConfigUser  default       public

    #       groupName      securityModel securityName
    group   notConfigGroup v1            notConfigUser
    group   notConfigGroup v2c           notConfigUser

    #       group          context sec.model sec.level prefix read       write notif
    access  notConfigGroup ""      any       noauth    exact  systemview none  none
    #       name          incl/excl  subtree             mask(optional)
    view    systemview    included   .1.3.6.1.2.1.1
    view    systemview    included   .1.3.6.1.2.1.25.1.1

    # System Group
    sysLocation  <REDACTED>
    sysContact  <REDACTED>
    sysServices 72
    sysName <REDACTED>

    ## We do not want annoying "Connection from UDP: " messages in syslog.
    dontLogTCPWrappersConnects no

    # OTHER CONFIGURATION
    rouser <REDACTED> authPriv
    extend diskstats /bin/cat /proc/diskstats
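
    (For what it's worth, the extend output can be sanity-checked with a walk like the one below; NET-SNMP-EXTEND-MIB ships with net-snmp, though with our restricted v2c view above this would really need the v3 user:)

    # Read back the 'extend diskstats' result; nsExtendOutputFull holds the
    # full multi-line output of the configured command.
    snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendOutputFull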

     

     

    The errors in the LM screenshot from last night were from an snmpd restart, and those from today were from an OS reboot.

    Only some of the disks seem to have had their index change.

    If it helps, here is also the content of /etc/fstab:

    LABEL=disk-1    /data/backups/disk-1    xfs    defaults,noatime         0 0
    LABEL=disk-2    /data/backups/disk-2    xfs     defaults,noatime        0 0
    LABEL=disk-3    /data/backups/disk-3    xfs     defaults,noatime        0 0
    LABEL=disk-4    /data/backups/disk-4    xfs     defaults,noatime        0 0
    LABEL=disk-5    /data/backups/disk-5    xfs     defaults,noatime        0 0
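
    Since those filesystems are mounted by label, I suspect systemd mounts them in parallel and the order can vary between boots; a sketch of how I'd check the actual order on the current boot (assuming systemd logs a "Mounted <mountpoint>" message per mount unit):

    # Grep the current boot's journal for the data-disk mount completions.
    journalctl -b | grep 'Mounted /data/backups'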

  • I spun up a vanilla Ubuntu 20.04 server with the same snmpd version (just the latest apt update). I could not trigger this behaviour until I started adding more disks:

    For some reason I wasn't able to trigger this with a service restart on this VM, only with reboots, and some reboots also yielded the same indexes, so it doesn't happen every time.

    *** Added more disks ***
        
        1 => Physical memory
        10 => Swap space
        3 => Virtual memory
        35 => /run
        36 => /
        38 => /dev/shm
        39 => /run/lock
        40 => /sys/fs/cgroup
        6 => Memory buffers
        7 => Cached memory
        70 => /run/snapd/ns
        72 => /run/user/1000
        73 => /data/backups/disk-1
        74 => /data/backups/disk-2
        75 => /data/backups/disk-3
        76 => /data/backups/disk-4
        77 => /data/backups/disk-5
        8 => Shared memory
        
        *** REBOOT *** 
        
        1 => Physical memory
        10 => Swap space
        3 => Virtual memory
        35 => /run
        36 => /
        38 => /dev/shm
        39 => /run/lock
        40 => /sys/fs/cgroup
        6 => Memory buffers
        65 => /data/backups/disk-5
        66 => /data/backups/disk-2
        67 => /data/backups/disk-3
        68 => /data/backups/disk-1
        69 => /data/backups/disk-4
        7 => Cached memory
        75 => /run/snapd/ns
        77 => /run/user/1000
        8 => Shared memory
        
        *** REBOOT *** 

        1 => Physical memory
        10 => Swap space
        3 => Virtual memory
        35 => /run
        36 => /
        38 => /dev/shm
        39 => /run/lock
        40 => /sys/fs/cgroup
        6 => Memory buffers
        65 => /data/backups/disk-4
        67 => /data/backups/disk-3
        68 => /data/backups/disk-1
        7 => Cached memory
        72 => /data/backups/disk-5
        73 => /data/backups/disk-2
        75 => /run/snapd/ns
        77 => /run/user/1000
        8 => Shared memory
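
    Until there's a proper fix, I'm considering cron-ing something like this on a test box so we can pinpoint exactly when the shuffle happens (paths and community string are placeholders):

    # Keep the previous hrStorage walk around and log whenever the index
    # assignment differs from the last check.
    snmpwalk -v2c -c public localhost 1.3.6.1.2.1.25.2.3.1.3 > /tmp/hrstorage.new
    if [ -f /tmp/hrstorage.old ] && ! cmp -s /tmp/hrstorage.old /tmp/hrstorage.new; then
        logger "hrStorage indexes changed since last check"
    fi
    mv -f /tmp/hrstorage.new /tmp/hrstorage.old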