Anybody else disabling Meraki instances by default?
If you are, I’d like to know whether you’re seeing the same problem we are, which LM has tried and failed to troubleshoot: for no apparent reason, a number of Meraki instances disappear from LM during one discovery cycle and reappear during the next. This isn’t normally a big problem, since the instances aren’t set to delete until 30 days have passed. Normally, they’d just show back up with a gap in data. However, in our case we have a business need to have instances disabled on discovery (we charge the customer per Meraki device we actively monitor). This means that instances that have been discovered and enabled for monitoring randomly disappear and reappear as disabled instances. Any customer instance-level properties that were added to the instance are also missing from the rediscovered instance. In the last 3 hours, 3,577 instances (out of somewhere around 18,000) disappeared and reappeared in this way. The problem was so pervasive that I had to write a script to loop through all instances and enable them based on a master billing sheet.
Solved | 317 Views | 14 likes | 11 Comments

PSA: LM wipes good known properties when unknown results occur
I recently found that, thanks to the excellent programming skills of the dev team, properties that were previously autodiscovered can be wiped out when an ephemeral issue produces an unknown (no data) result. A good example is system.ips: if the data has been scanned properly in the past and a blip returns no data, the previous values get overwritten with just the configured IP of the device. That leads to various fun side effects, like NetFlow data not being matched to the device. To make things worse, the “no data” result does not set an internal flag to run a new AD scan sooner, so you have to wait up to 24 hours for the next regularly scheduled scan.

I created a bug ticket requesting that they set that flag and run a new scan as soon as possible, but was basically told to pound sand. My workaround was to use an undocumented API endpoint to trigger a new scan on specified devices, scheduled hourly, so I stop losing NetFlow data. The “solution” I was given was to add a netflow property that hardcodes the needed IP address for each device. It works, but it is a brittle fix and leads to undesirable manual property management.

Beyond that, this issue affects more than NetFlow; that was just the problem that led me to realize what was happening. Other properties that could affect processing routinely get messed up. This class of problem (replacing good data with unknown data) also occurs frequently in modules. For example, a lot of the PowerShell ConfigSource modules lack sufficient error checking, so unknown results replace previously known good results, leading to change thrashing. Or they forget to sort/normalize output, which has a similar effect. The good news is that on those they usually (eventually) listen to me.

Anyone who wants to use my workaround can use this script (or at least the central logic, if you prefer something other than Perl). I still lose data, but the window is smaller.
https://github.com/willingminds/lmapi-scripts/blob/master/lm-action
69 Views | 2 likes | 7 Comments

Potential 187 UI bug
I tried to report this via Engineer chat, but got no response. We have 187 early access. It looks like there may be a critical UI bug relating to the new Instance Group functionality: if you select an instance and then the Instance Group, the UI does NOT update to show the overview graphs etc. Please ping me via email to review the issue in our early access portal on a Zoom meeting.
63 Views | 3 likes | 3 Comments

heads up - property corruption due to unknown results
I wanted to share this with everyone since it bit me recently. Sometimes LM will reset the value of properties like system.ips to just the single IP associated with the resource if something disrupts access to SNMP, even briefly. The problem is that this can impact other features, like NetFlow binding. I am still battling this out with support, but in the meantime I wrote a script to trigger AD for specified devices (I had to use an undocumented endpoint), and I schedule it hourly to limit the damage when this happens (normally AD is triggered once per day unless specific changes occur). My change logs show system.ips resetting fairly often, so the script is definitely helping.

I explained to support that thwacking data because of an unknown result is a bug and that no property should be changed in that situation, but I imagine this bug will be hard to unwind. I also recommended a partial fix where AD could be triggered by a “host up” event to limit the damage, but that is a hack. Avoiding data corruption in the first place is the right fix.

FWIW, the main loop of my script is below (it is in an ancient language, but the logic should be clear :)).

    DEVICE:
    for my $d (@{$devices}) {
        # Only act on devices whose lowercased display name is in %DEVICES.
        if ($DEVICES{lc $d->{displayName}}) {
            if ($ACTION eq "scheduleAutoDiscovery") {
                verbose 1, "scheduling AD for $d->{displayName}\n";
                # POST to the undocumented endpoint to schedule AD for this device.
                if (not $lmapi->post(path => "/device/devices/$d->{id}/scheduleAutoDiscovery")) {
                    warn "ERROR: $ACTION failed\n";
                }
            }
            else {
                die "ERROR: unsupported action: $ACTION\n";
            }
        }
    }

34 Views | 5 likes | 0 Comments
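For anyone who would rather not read Perl, the same loop might look roughly like this in Python. This is only a sketch under assumptions: `schedule_auto_discovery` and the `post` callable are hypothetical names standing in for whatever authenticated LM API client you already use. Only the `/device/devices/{id}/scheduleAutoDiscovery` path and the `displayName`/`id` fields come from the Perl loop above, and because that endpoint is undocumented, its behavior may change.

```python
from typing import Callable, Iterable, List, Mapping


def schedule_auto_discovery(
    devices: Iterable[Mapping],
    wanted_names: Iterable[str],
    post: Callable[[str], bool],
) -> List[str]:
    """Request AD for every device whose displayName is in wanted_names.

    `post` is a hypothetical callable (an assumption, not an LM SDK function)
    that sends an authenticated POST to the given API path and returns True
    on success. Returns the display names of devices whose request failed.
    """
    # Match names case-insensitively, as the Perl loop does with lc().
    wanted = {name.lower() for name in wanted_names}
    failed = []
    for device in devices:
        name = device["displayName"]
        if name.lower() not in wanted:
            continue
        # Path taken from the original script; the endpoint is undocumented.
        path = f"/device/devices/{device['id']}/scheduleAutoDiscovery"
        if not post(path):
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Stand-in client that records paths instead of calling the LM API.
    sent = []

    def fake_post(path: str) -> bool:
        sent.append(path)
        return True

    devices = [
        {"id": 12, "displayName": "Core-SW1"},
        {"id": 13, "displayName": "edge-fw"},
    ]
    failed = schedule_auto_discovery(devices, ["core-sw1"], fake_post)
    print(sent)    # ['/device/devices/12/scheduleAutoDiscovery']
    print(failed)  # []
```

Passing the client in as a callable keeps the selection logic testable without touching a live portal. In real use you would replace `fake_post` with a function wrapping your own authenticated client and run the script hourly, as described above.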