Forum Discussion

Kelemvor
Expert
2 days ago

All my collectors are going down every 24 hours.

Hi,

Starting last Thursday, all my collectors, across two different LM portals, have been going down approximately every 24 hours. Apparently the Watchdog is telling the Agent to reset itself, which causes everything to die for 5-10 minutes.

It's happening on almost every collector we have. Some do it every day, some skip a day here and there. They ALL started doing this last Thursday. We didn't make any global changes and have no idea what the heck happened.

Is anyone else dealing with this?

There are log entries like these, which show that the Watchdog service told the main Agent service to restart itself.

[2025-05-06 20:05:47.876 GMT] [MSG] [CRITICAL] [statusmonitor:::] [StatusListener$1.run:135] Peer request to shutdown, CONTEXT=CAUSE=shutdown cmd, ACTION=quit
[2025-05-06 20:05:47.876 GMT] [MSG] [CRITICAL] [statusmonitor:::] [StatusListener$1.run:151] Shutting self down by quit with 0, CONTEXT=MSG=all sockets closed, System.exit(0) now
[2025-05-06 20:05:47.882 GMT] [MSG] [INFO] [statusmonitor:::] [RestartUtil._reportEvent:236] Reported restart reason successfully, CONTEXT=type=ReceivedShutdown, reason=Collector receives shutdown command from watchdog. Agent will restart.
[2025-05-06 20:05:47.883 GMT] [MSG] [INFO] [statusmonitor:::] [RestartUtil._saveRestartReason:292] Save restart reason successfully, CONTEXT=file=C:\Program Files (x86)\LogicMonitor\Agent\conf\restart.conf

These are the tickets that the restarts generated. They show the restart slipping a few minutes later each day, but otherwise happening almost like clockwork.

I opened a ticket via Chat but was told that something is overloading the agent and that we need to increase the collector sizes. This doesn't really explain what happened last Thursday to start causing the problem, so I'm posting here to ask whether anyone else is having the same issue.

Thanks.

1 Reply

  • It is normal for the collector to restart itself every 24 hours. I also think it cycles its encryption key with the portal at the same time. But that should take only seconds, not long enough to cause any gaps or alerts. So having it take 5-10 minutes to finish the restart is very odd. I've never seen any collector do that before.

    I would suggest reviewing the agent logs on the collector from before and after Thursday, around the time it normally restarts itself, to see if something changed. It's also worth checking the Event/system logs for that period to see if the OS is having problems restarting the service. I would also try manually restarting the LogicMonitor service to see whether it takes minutes to come back up even when done by hand. Finally, I would check whether something else is running at the same time as the 24-hour restart. As you mentioned, the restart drifts a little each day because the collector restarts ~24 hours after it was last started, so perhaps it has shifted into a period of high CPU load? Or perhaps a backup snapshot pauses the VM right at that moment?
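
One way to do the before-and-after log comparison suggested above is to pull the watchdog shutdown timestamps out of the agent log and look at the gaps between them. Here is a minimal Python sketch. It assumes the log lines look like the excerpt in the original post (a `[YYYY-MM-DD HH:MM:SS.mmm GMT]` prefix and the text `Peer request to shutdown`). The function names and the log file path are hypothetical, so point it at whichever agent log on your collector contains those entries.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. [2025-05-06 20:05:47.876 GMT]
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) GMT\]")
# Text of the entry logged when the agent is told to shut down (taken from the excerpt)
SHUTDOWN_MARKER = "Peer request to shutdown"


def shutdown_times(lines):
    """Return the timestamp of every shutdown-request entry found in the lines."""
    times = []
    for line in lines:
        if SHUTDOWN_MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def restart_intervals(times):
    """Return the gaps between consecutive shutdowns as timedelta objects."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def report(path):
    """Print each shutdown after the first, with the time since the previous one."""
    # The path is whatever agent log file you choose to inspect (an assumption here).
    with open(path, encoding="utf-8", errors="replace") as handle:
        stamps = sorted(shutdown_times(handle))
    for stamp, gap in zip(stamps[1:], restart_intervals(stamps)):
        print(f"{stamp:%Y-%m-%d %H:%M:%S}  {gap} since previous shutdown")
```

Calling `report(...)` on logs from before and after that Thursday should show whether the ~24-hour shutdown requests were already there all along (and only the recovery time changed), or whether the shutdowns themselves are new.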