497 days and counting........
You might have received an alert saying your linux based device has just rebooted, but you know that it has been up a long time. A switch might have just sent an alert for every interface flapping when they have all been up solidly. The important question to ask here is how long has the device been up? If its been up for 497 days,994 days,1491 days or any multiple of 497 then you are seeing the 497 day bug, that hits almost every linux based device that is up for a good length of time. Anything using a kernel less than 2.6 computes the system uptime based on the internaljiffies counter, which counts the time since boot in units of 10 milliseconds, or jiffies. This counter is a 32-bit counter, which has a maximum value of 2^32, or 4,294,967,296. When the counter reaches this value (after 497 days, 2 hours, 27 minutes, and 53 seconds, or approximately 16 months), it wraps back around to zero and continues to increment. This can result in alerts about reboots that didn’t happen and cause switches to report a flap on all interfaces. Systems that use 2.6 Kernel and properly supply a 64 bit counter will still alert incorrectly when the 64 bit counter wraps. A 32 bit counter can hold4,294,967,295( /4,294,967,295864000/8640000 = 497.1 days) A 64 bit counter can hold18,446,744,073,709,551,615 . (18,446,744,073,709,551,615/8640000 =2135039823346 days or 5849424173 years) Though I expect in 6,000 million years we will all have other things to worry over.49Views0likes5CommentsExport Datetimes for Downtime in a Website Overview Report
For context, I'm a consumer of LogicMonitor csv/excel report exports only - I am required to aggregate my exports outside of LogicMonitor so I can use the availability data from our website endpoint checks to build Tableau Reports. Thus, it would be incredibly helpful to my organization if I could obtain an export (csv/excel) from LogicMonitor that contained all webservices downtime (NOT aggregated and reported as ##h ##m ##s) with datetimes for each period of missed polls (downtime). For example, each line in the spreadsheet contain the endpoint details and the start/stop time. For periods of flapping, each there would be multiple lines for the same endpoint, each with their own start and end times. Knowing when downtime is occurring AND knowing if it occurred during a scheduled maintenance period are essential pieces of information necessary to advance availability reporting for our cloud applications and ensure we maintain our SLA (which discounts application downtime for planned maintenance). I draw out my data daily, via a website overview report that excludes SDT from the reported downtime in a csv export.6Views0likes0CommentsSystemSTARTUpTimeGreaterthan28days
I feel foolish in asking this however I need to this get off my plate. I've been asked to setup a monitor that alerts windows systems that have been up for more than 28days. I thought that I could clone the "WinSystemUptime" DataSource, which will report if a system has been rebooted and modify it to report not that a system has been rebooted but the system has not been rebooted. Any help would be appreciated1View0likes3CommentsUptime - Bug Report
I’d like to re-initiate this bug report. The Uptime resetting counters at 497 days or 469 days (historical) I just had a similar false alarm telling me that my devices rebooted, when they did not. Please have the DEV team review this specific monitor and determine how the system can display 497+ days “uptime” --------------------------------- ________________________________________ SEP 11, 2015 | 01:56PM CDT Original message ________________ wrote: Support team at logic monitor, Is it possible to request adjustment to the "Uptime" data source monitor so that it does not alert when the counter resets from 11111111111111111111111111111111 to 00000000000000000000000000000001 The developer was aware enough of the event cause to code explanation in to the system alert: could the alert be altered to not-alert when the counter resets? - ________________ From: ________________ Sent: Friday, September 11, 2015 2:44 PM To: Subject: SC# Error: 6348 ________________ is reporting it has only been up for 0.43 minutes Hello ________________ , We have received the following monitoring alert and a ticket #6348 has been created to track your issue. An engineer is assigned and is working to resolve this issue. Thank you. We are investagating if the VM really did reboot or if this alert is coming up for a different reason: ________________is reporting it has only been up for 0.43 minutes, as of 2015-09-11 14:28:48 EDT. If this was an unexpected reboot, please investigate the system logs. NOTE: if ________________has been up for 469 days without a reboot, this alert will trigger due to a counter wrap in the host. In this case, you may disregard this alert. (But the host is probably due for an OS update.) For any inquiries please contact our NOC at support@highpoint.com<mailto:support@highpoint.com> or call 1-855-485-8324 (TECH). Regards, ________________ NOC Support Engineer17Views0likes1CommentTime after (Up)time
Time after (Up)time It’s one of the days that an unsung hero gets his chance to make a mark on the world.And who might this be? The floor function , also called the greatest integer function or integer value (Spanier and Oldham, 1987), gives the largest integer less than or equal to x. The floor function is not-normally implemented in the LM perspective – often brushed off, but it got its day when we had a user on chat who was looking on creating a dashboard which has server uptime in days instead of seconds. And so I tried devising a Big Number Widget that would include a virtual datapoint in it with the following calculation to display seconds into days: Fig1. Initial UptimeDays setup without the floor() function. UptimeSeconds/60/60/24 Where UptimeSeconds references a pre-calculated Complex Datapoint from the WinSystemUptime Datasource. Alas – this was not presenting days in a helpful manner. It was displaying days with a decimal point. Quoting from the chat : “that uptime display widget you created would be grand if it showed the days. it says 0,4 at the minute, does that meant 0.4 days?” And so we needed to figure onanother method. A colleague of ours, a wizard from the magical realm of complex datapoints pointed out to us something powerful, that could display seconds, minutes and days even – And, with this, the second hand unwinds... By using the floor() function appended to the original expression – we could actually calculate: The day rounded to the nearest lowest integer. floor(UptimeSeconds/ 60/ 60 /24) For example if we had the result of 2.4 days from the UptimeSeconds/6000/60/24 calculation – it will present us the result as 2 – to the smallest following integer. Fig2. Complex Datapoints for Days, Hours and Minutes with the floor function And on the Hours Datapoint: floor((UptimeSeconds-(days*86400))/3600) Where 86400 represents the number of seconds in a day – 60 * 60 *24 = 86400. and 3600 represents the number of seconds in an hour - 60 * 60 =3600. This calculates the number of total hours (excluding the amount already converted and apportioned into the Days metric) , and again the floor function rounds the number hours to the smallest following integer. For example - 2.4 days- 0.4 is then carried over to the hours, which equates to 9.6 hours, and with the application of the floor function it will reflect 9 hours, with the balance of 0.6 hours to be calculated on the Minutes Datapoint. And lastly on the Minutes Datapoint: floor((UptimeSeconds-(days*86400)-(hours*3600))/60) Where 86400 represents the number of seconds in a day – 60 *60 *24 = 86400. and 3600 represents the number of seconds in an hour - 60 * 60 =3600. and 60 represents the number of seconds in a minute. This does the same again, number of total minutes excluding those previously factored in the days (*24) and and the hours (*60), rounded to the smallest following integer. And therefore, the floor() function has been very useful in this case to catch the Days, Hours and Minutes, have them precisely presented on the Big Number Widget - which the team could display on the dashboard to wait on -- Time after Time. Fig 3. Uptime Days,Hours and Minutes on the Big Number Widget References: https://www.logicmonitor.com/support/datasources/creating-managing-datasources/datapoint-expressions/ http://mathworld.wolfram.com/FloorFunction.html Credits to David Lee for pointing out on the powerful capabilities of the floor function.7Views0likes0Comments