Forum Discussion

Mike_Moniz
Professor
5 years ago

New Office365 monitoring checks, caveats I've noticed

Recently LogicMonitor released a new set of Office 365 monitoring checks. I've built various O365-based checks over the years, mostly against the Exchange portion via the Exchange Shell, but have been looking at implementing GraphAPI support for a while now. I saw that most of the new DataSources use the GraphAPI, so I was really interested in how LM implemented it, since I was never able to get any real-time data from the GraphAPI myself. I reviewed all of them and found that LM wasn't able to get real-time data either.

It's not really clear in the descriptions, but the DataSources with "Reports" in the name (ex. Office365_Reports_OutlookEmailActivity) get the same data you see on the Office 365 Reports page, either the data from the graph itself or the table below it. Microsoft only updates these numbers once a day, and there are frequently several days of lag. So while LogicMonitor might show your organization received 100k emails today, it's actually how many you got perhaps 2 days ago. And the actual lag can change over time; sometimes it's just 1 day, sometimes 3. This might throw you off when you look at the graphs, especially if you see dips in the middle of the week when really it's just lagged weekend data. This also applies to DataSources that aggregate data across the "last 7 days". Some DataSources report this lag via a "reportAge" datapoint, which shows it in seconds (although it always works out to whole days), but not all of them have this datapoint. I don't think LM can do anything about this lag, but I thought I would point it out to the community.
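
If you want to see the lag yourself, here is a rough sketch of how that age can be derived from the report data. This is illustrative only and not LM's actual code; it assumes $rows already holds the parsed CSV rows from one of the Graph reports endpoints, which Microsoft stamps with a "Report Refresh Date" column.

    # Sketch only (not LM's code): measuring the report lag from the data itself.
    # $rows is assumed to hold the parsed CSV from a Graph reports endpoint.
    $latest = $rows | Sort-Object { [datetime]$_.'Report Refresh Date' } -Descending | Select-Object -First 1
    $reportAgeSeconds = [int](((Get-Date) - [datetime]$latest.'Report Refresh Date').TotalSeconds)
    Write-Host "reportAge=$reportAgeSeconds"   # typically shows a lag of one to three days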

Also, some checks collect data for every user or site in an organization and tend to fail if you have thousands of users or a boatload of SP sites: either they time out a lot or they hit a rate limit. I think the script output might also get truncated after ~64k lines of data. I also see some checks will report 0 instead of No Data if the GraphAPI call fails, so you get very bumpy graphs.
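
On the 0-vs-No Data point, the usual pattern is to exit non-zero instead of printing a value when the call fails, so LogicMonitor leaves the poll as No Data rather than recording a bogus 0. A minimal sketch, assuming $uri and $headers (with the bearer token) are already set earlier in the script:

    # Sketch only: fail loudly instead of printing 0 when the Graph call errors out.
    try {
        $response = Invoke-WebRequest -Uri $uri -Headers $headers -UseBasicParsing -ErrorAction Stop
    } catch {
        Write-Host "ERROR: $_"
        Exit 1   # non-zero exit means the output isn't parsed, so LM shows No Data instead of 0
    }
    # ...parse $response.Content and print datapoints only on success...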

I've found several bugs too and have reported them over to LM support.

Thanks!

  • I am trying to get it working currently and not having too much success.

    I got the non-report ones working, minus SharePoint.

    For the Report ones, I configured everything I was told to, but the invoke-webrequest inside of get-graphapi returns nothing for some reason, so it errors out.

    Did you have any similar issues?

  • I got them all working but had to fix several bugs. Here is the list of bugs/fixes I reported (ticket 192561), but LM has not fully validated them, so you might want to wait until LM releases newer versions. I also didn't verify all the data they provide (especially SP and the ConfigSources). A short sketch of a couple of these fixes follows the list.

    • O365 passwords with "$" and perhaps other symbols might have problems because the scripts use double quotes instead of single quotes; change "##OFFICE365.PASS##" to '##OFFICE365.PASS##'
    • Office365_SharepointOnline_SiteStatus appears to have the wrong AppliesTo; it should be hasCategory("Office365") && office365.spoadminsite
    • Office365_SharepointOnline_SiteStatus might show a "More results were found but were not returned..." error; add "-Limit All" to the Get-SPOSite statements.
    • Office365_Reports_OutlookEmailActivity always reports 0 received emails (I wish!). The code has "$count.Receive" instead of "$activity.Receive".
    • Office365_Reports_OutlookEmailActivity and Office365_Reports_MicrosoftTeamsUserActivity get 7 days' worth of data and seem to pick only the oldest counts instead of the latest; other DataSources do a sort and pick the latest, but these do not. You can copy the sort line from the other DataSources.
    • Office365_Reports_MicrosoftTeamsUserActivity might error with "The response content cannot be parsed because the Internet Explorer engine is not available...". Adding "-UseBasicParsing" to the "$query = Invoke-WebRequest" line fixes it.
    • Office365_Reports_OneDriveFileCounts is likely using the wrong API. It's using getOneDriveUsageAccountCounts, but I think it should be the getOneDriveUsageFileCounts API instead. Changing that seems to fix it.
    • The instance-based DataSources have problems with large environments, where they time out a lot and sometimes report "0" instead of "No Data".
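
    To make a couple of the Reports fixes concrete, here is a rough sketch with -UseBasicParsing, the sort, and $activity.Receive applied. It is not a copy of LM's script: I'm assuming the getEmailActivityCounts report here, and that $headers already carries the bearer token from the earlier auth step. The datapoint names at the end are placeholders.

    # Sketch of a Reports-style collection with the fixes above applied (not LM's exact code).
    $url = "https://graph.microsoft.com/v1.0/reports/getEmailActivityCounts(period='D7')"

    # -UseBasicParsing avoids the "Internet Explorer engine is not available" error
    $query = Invoke-WebRequest -Uri $url -Headers $headers -UseBasicParsing

    # The report covers 7 days; sort so the newest row is used instead of the oldest
    $activity = $query.Content | ConvertFrom-Csv |
        Sort-Object { [datetime]$_.'Report Date' } -Descending | Select-Object -First 1

    # Use the sorted row's own counts (the original bug referenced $count.Receive)
    Write-Host "ReceivedEmails=$($activity.Receive)"
    Write-Host "SentEmails=$($activity.Send)"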

    See if any of these help. Otherwise, you might want to try temporarily removing the "| Out-Null" parts to see if there are errors being suppressed, or try adding -Verbose to commands. You might also want to make sure the AzureAD App is set up correctly. I had LM registered as an App a while back, but the directions LM provided look to match my setup.
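
    For example, a throwaway debugging variant of the SharePoint connection (using the same $spo_url and $credential variables the script already has), with the Out-Null dropped and -Verbose added, will surface anything that's being suppressed:

    # Debugging variant only - revert once things work
    Connect-SPOService -Url $spo_url -Credential $credential -Verbose
    $sites = Get-SPOSite -Limit All -Verbose
    Write-Host "Found $($sites.Count) sites"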

  • You might also want to try adding the following line near the top of the scripts. This causes PowerShell to use TLS 1.2 for connections.

    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

     

  • Our SharePoint team pointed out that they keep losing their SP instances for the Office365_SharepointOnline_SiteStatus DataSource, causing dashboards to go blank. I found that, a lot of the time, the API call returns an "ERROR: The remote server returned an error: (503) Server Unavailable" error, and the Active Discovery script just assumes there are no sites and removes all the instances, because it has no error checking. I wrapped the sometimes-failing code in a try/catch block and added some warning/error actions so the whole script can return a non-zero exit code, letting AD know it was a failed attempt and to ignore it. That will hopefully fix the issue.

    try {
        # -ErrorAction "Stop" makes failures (like the 503s) throw instead of being swallowed
        Connect-SPOService -Url $spo_url -Credential $credential -WarningAction "SilentlyContinue" -ErrorAction "Stop" | Out-Null
        $sites = @(Get-SPOSite -Limit ALL -WarningAction "SilentlyContinue" -ErrorAction "Stop")
    } catch {
        # Non-zero exit tells Active Discovery the run failed, so the empty result is ignored
        Write-Host "ERROR: $_"
        Exit 1
    }

    I think the other DataSources may also have a similar problem due to the same lack of error checking.

  • Anonymous

    Hm, that's not expected. The AD script should exit with a non-zero code if there are errors; that is, the developer should make the script exit non-zero when it encounters errors. Only if the script exits with a 0 will the results even be parsed. Looks like that's exactly the correction you made.