Forum Discussion

boomi
5 years ago

How to stagger datasource instance collection?

I'm working with custom Meraki API datasources and have an issue where the collector can get into a state where all the instances attempt to collect simultaneously, or at least closely enough together that it triggers Meraki's rate limiting, and even my backoff/retry isn't doing the job.

I would think that if the collector script for my custom datasource is set to sleep for a rand() number of seconds, I should be able to avoid this. Any thoughts about what I might be doing wrong, or is there a "LogicMonitor way" I should be handling this?
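As a minimal sketch of the random start delay described above: the helper name staggerDelayMs and its maxDelayMs parameter are hypothetical, and any ceiling passed in is an assumption that has to stay well below whatever script timeout the collector enforces.

```groovy
import java.util.concurrent.ThreadLocalRandom

// Hypothetical helper: pick a random delay in [0, maxDelayMs) milliseconds.
// maxDelayMs must be positive, otherwise nextLong() throws IllegalArgumentException.
long staggerDelayMs(long maxDelayMs) {
  return ThreadLocalRandom.current().nextLong(maxDelayMs)
}
```

In a collection script this would be called once, before the first API request, for example `sleep(staggerDelayMs(30000L))` for an assumed 30-second ceiling. A random delay only spreads requests out on average; two instances can still draw nearly the same value, so it reduces collisions rather than ruling them out.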

Below is what I'm doing to try to account for 429 Rate Limit responses. (Not saying this is 'correct' in any way, but I thought the retry logic should've worked.)

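import groovy.json.JsonSlurper

// getHTTPResponse() is defined elsewhere in the script; from the way it is
// used here it is assumed to return something like an HttpURLConnection.
def getAPIQueryOutput(String api_uri)
{
  def url = 'https://api.meraki.com' + api_uri
  def req = getHTTPResponse(url)

  // Got it the first try
  if(req.responseCode == 200){
    return new JsonSlurper().parseText(req.inputStream.getText('UTF-8'))
  } else if(req.responseCode == 204) {
    // Entry exists but did not have data.
    return null
  } else if(req.responseCode == 400) {
    // Bad request.
    return null
  } else if(req.responseCode == 404) {
    // Whatever we tried to find didn't exist. Return null.
    return null
  }

  // 429 received due to rate limiting, back off and try again (up to 3 retries).
  def count = 1
  while(req.responseCode == 429 && count < 4){
    def backoff = getBackOffMs(count)
    // Wait for an increasing amount of time.
    sleep(backoff)
    req = getHTTPResponse(url)
    count++
  }

  // Still rate limited (or some other error) after the retries: give up.
  // Reading inputStream on an error response would throw instead of returning JSON.
  if(req.responseCode != 200){
    return null
  }

  // A retry succeeded, so parse the body.
  return new JsonSlurper().parseText(req.inputStream.getText('UTF-8'))
}

def getBackOffMs(count){
  // Get a random int between 1 and 3 inclusive.
  def backoff_seed = new Random().nextInt(3) + 1
  // Only whole seconds are possible, so concurrent instances often pick the same wait.
  return backoff_seed * 1000 * count
}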
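One possible refinement of the backoff above, as a sketch rather than a known fix: capped exponential backoff with "full jitter", which draws the wait uniformly from zero up to a growing ceiling, so that instances retrying at the same moment are less likely to pick identical waits. The helper name backoffMs, the 1-second base, and the 30-second cap are all assumptions. The sketch also prefers a Retry-After value when one is supplied; whether Meraki sends that header on a 429 should be checked against its API documentation.

```groovy
import java.util.concurrent.ThreadLocalRandom

// Hypothetical helper: milliseconds to wait before retry number `attempt` (1-based).
// retryAfterHeader is the raw Retry-After value (seconds), or null if the header is absent.
long backoffMs(int attempt, String retryAfterHeader, long baseMs = 1000L, long capMs = 30000L) {
  // Prefer the server's own hint when it is a plain number of seconds.
  if (retryAfterHeader?.isInteger()) {
    return Math.max(0L, retryAfterHeader.toInteger() * 1000L)
  }
  // Otherwise: exponential ceiling (baseMs * 2^(attempt - 1)), capped at capMs.
  int shift = Math.min(Math.max(attempt - 1, 0), 16)
  long ceiling = Math.min(capMs, baseMs << shift)
  // Full jitter: any value from 0 up to and including the ceiling.
  return ThreadLocalRandom.current().nextLong(ceiling + 1)
}
```

Inside the retry loop this could replace the getBackOffMs(count) call with something like `sleep(backoffMs(count, req.getHeaderField('Retry-After')))`, assuming req is an HttpURLConnection. Jitter in milliseconds gives far more distinct wait times than the three whole-second values in the original helper.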

 

  • Anonymous

    There is a more LogicalMonitor way to do it. (See what I did there?)

    I believe we're working on getting information ready to present, but I think we're shifting the way LM monitors Meraki to use the API instead of SNMP (like you have done). I think it breaks things up so that each network is represented as a different device in LM, splitting the monitoring across parallel tasks. That should skirt the 429 problem.

    How urgent is this for you?

  • Ah, already ahead of you there. I came to the same conclusion after attempting a 'single device' model, which just couldn't do it. (If I weren't trying to do API switchport monitoring, it probably would've been fine.)

    I use the MX at each location as the 'anchor' for all of the Meraki monitoring at the location:

     

    But they somehow still manage to sync up. I had the AMP and IPS checks at 4-hour intervals, and after restarting the collector, I started getting buckets of 429s:

     

    And now what's really got me confused ... I set both of those to 5-minute collection intervals about 30 minutes ago, and now everything's fine!