Forum Discussion

Kirby_Timm
13 days ago
Solved

SSLError for HTTPS module

LogicMonitor is telling me that one of my FortiGate firewalls, which uses a self-signed cert for the GUI, is giving a couple of errors.  One error was "days remaining" for the SSL certificate and the other is just a generic "SSLError".  I went into the FortiGate and renewed the self-signed cert for the GUI, and that cleared up the "days remaining" error, but the generic "SSLError" persists.  I'm not sure what's throwing the error, so I'm not sure how to resolve it.  Suggestions?

  • For anyone else who may be looking at this in the future, support has supplied me with a solution that works.
    In LogicMonitor, open Settings -> Collectors -> [locate the Collector in question] -> Collector Configuration -> Wrapper Config.
    Within Wrapper Config, look at all the "wrapper.java.additional.##" entries, where ## is a number starting at 1 and incrementing by 1.  For example, in my environment I had wrapper.java.additional.1 through wrapper.java.additional.28.
    Add 1 to the last wrapper.java.additional.## (so for me it would be wrapper.java.additional.29) and add the following line to the end of your config:
    wrapper.java.additional.29=-Djdk.tls.maxHandshakeMessageSize=50000
    Then click "save and restart" to restart the collector.  This solved the issue.
    The "why" is a bit more iffy.  According to LogicMonitor support:
    It looks like it was basically just buffer overflow protection built in to the Collector. We have a buffer of 32KiB for the handshake response; if the response exceeds that buffer size, we discard it as invalid. To be clear, the following is conjecture, but I figure what's going on here is that most SSL handshake responses are less than 32KiB, so that value was probably chosen arbitrarily as 'good enough' for most cases. Given that the SSL handshake response contains the entire certificate chain, if the chain is long enough it could in theory exceed that buffer size, which I assume is why the developers offered this as a knob to turn in the config. According to some random sources I found online, a typical enterprise certificate chain for an internal server using TLS 1.2 can be 6-10KiB, so 32KiB should be enough in most cases. Looks like in this case it wasn't.
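The "find the last index and add 1" step above can be sketched in a few lines of Python. This is only an illustration that works on the Wrapper Config text after it has been copied out of the portal; the function name `append_jvm_option` and the sample entries are made up for the example, and it is not a LogicMonitor tool or API.

```python
import re

# Matches active entries such as "wrapper.java.additional.28=-Dsome.flag=value".
# Commented-out lines (starting with "#") are deliberately not matched.
ADDITIONAL_RE = re.compile(r"^\s*wrapper\.java\.additional\.(\d+)\s*=", re.MULTILINE)


def append_jvm_option(config_text: str, jvm_option: str) -> str:
    """Return config_text with jvm_option added under the next free index."""
    indexes = [int(m.group(1)) for m in ADDITIONAL_RE.finditer(config_text)]
    next_index = max(indexes, default=0) + 1
    new_line = f"wrapper.java.additional.{next_index}={jvm_option}"
    existing = config_text.rstrip("\n")
    separator = "\n" if existing else ""
    return existing + separator + new_line + "\n"


# Made-up sample; a real Wrapper Config has many more entries.
sample = (
    "wrapper.java.additional.1=-Dexample.first=1\n"
    "wrapper.java.additional.2=-Dexample.second=2\n"
)
updated = append_jvm_option(sample, "-Djdk.tls.maxHandshakeMessageSize=50000")
print(updated)
```

With 28 existing entries, as in the post above, the same function would produce the wrapper.java.additional.29 line. The result still has to be pasted back into Wrapper Config and saved with "save and restart".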

     

22 Replies


  • Well, I went ahead and actually opened a ticket with LogicMonitor support on this.  I've had a ticket open with Fortinet support since last week, but that ticket has kind of gone stale.
    On a related but different note, is there somewhere in LM to see exactly which OID is being queried?

      Mike_Moniz

      There isn't an OID for an HTTP(S) check; it makes an actual HTTPS request using a check that is built directly into the collector code.

      For the collector logs, you can ask the collector to "send logs to LogicMonitor", which will upload them to the portal, where you can then download them without directly accessing the collector. See https://www.logicmonitor.com/support/collectors/collector-management/collector-logging#sc-header-160 for details. Note that the file without a starting number is the latest one. But I find it easier to check the logs directly on the collector if I have access to it: they are easier to view and you can sort by last modified. By default, logs are located at C:\Program Files\LogicMonitor\Agent\logs\ for Windows Collectors and /usr/local/logicmonitor/agent/logs/ for Linux Collectors. wrapper.log is likely the one you want.
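If you do have access to the collector host, the "sort by last modified" tip can be sketched in Python as below. The two directories are the defaults quoted above and may differ on a custom install; the helper names `default_log_dir` and `logs_newest_first` are invented for this example.

```python
import platform
from pathlib import Path
from typing import List, Optional

# Default locations quoted in the post above; a custom install may differ.
DEFAULT_LOG_DIRS = {
    "Windows": Path(r"C:\Program Files\LogicMonitor\Agent\logs"),
    "Linux": Path("/usr/local/logicmonitor/agent/logs"),
}


def default_log_dir() -> Optional[Path]:
    """Return the default log directory for this OS, or None if unknown."""
    return DEFAULT_LOG_DIRS.get(platform.system())


def logs_newest_first(log_dir: Path) -> List[Path]:
    """Return the regular files in log_dir, most recently modified first."""
    files = [p for p in log_dir.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    log_dir = default_log_dir()
    if log_dir is not None and log_dir.is_dir():
        for path in logs_newest_first(log_dir):
            print(path.name)
```

Run on the collector itself, this prints the log file names with the most recently written one first.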

        Kirby_Timm

        I did find this in the wrapper log.  Not sure whether it's helpful:
        [FSMWebpageTask$HttpResponseCallback.failed:958] Caught SSL exception, CONTEXT=host=<IP>, sslErrorReason=unknown., EXCEPTION=javax.net.ssl.SSLProtocolException: The size of the handshake message (37037) exceeds the maximum allowed size (32768)
        I'm not sure how one would adjust the size of the handshake message or how it got wonky to begin with... or if it's just a red herring?
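The numbers in that exception can be pulled out and compared directly. The sketch below is plain Python arithmetic on the log message quoted above, not a LogicMonitor utility; the function name `parse_handshake_overflow` is made up, and the 50000 candidate is the value from the accepted solution at the top of this thread.

```python
import re
from typing import Optional, Tuple

# Pattern for the SSLProtocolException text seen in wrapper.log above.
HANDSHAKE_RE = re.compile(
    r"The size of the handshake message \((\d+)\) "
    r"exceeds the maximum allowed size \((\d+)\)"
)


def parse_handshake_overflow(line: str) -> Optional[Tuple[int, int]]:
    """Return (actual_size, allowed_size) in bytes, or None if no match."""
    match = HANDSHAKE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "EXCEPTION=javax.net.ssl.SSLProtocolException: The size of the handshake "
    "message (37037) exceeds the maximum allowed size (32768)"
)
actual, limit = parse_handshake_overflow(line)
candidate_limit = 50000  # value used in the accepted solution

print(limit == 32 * 1024)         # the limit is exactly 32 KiB
print(actual - limit)             # bytes over the limit
print(candidate_limit >= actual)  # would the raised limit be enough?
```

So the message is not a red herring: the handshake really is larger than the 32768-byte limit, and a limit of 50000 leaves some headroom above 37037.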

  • Well, I've renewed all the internal certs but LogicMonitor is still throwing the error.  I don't understand why.  There was no error with the self-signed cert previously.

      Mike_Moniz

      What does a browser show when you attempt to look at the firewall? Does Poll Now show anything useful?

        Kirby_Timm

        The browser throws an error because it's self-signed, but LogicMonitor never had an issue with that before and doesn't have an issue with my other 5 Fortinet clusters running self-signed certs either.

         

  • Looking at the HTTPS datasource, the SSLError datapoint will alert if the value is 6. According to https://www.logicmonitor.com/support/logicmodules/datasources/data-collection-methods/webpage-httphttps-data-collection that value means "invalid SSL certificate", so it still looks like a cert issue. The HTTPS check should work fine with self-signed certs (it does here), so I would guess the options used when the cert was regenerated are the issue. Perhaps it uses an old algorithm or an invalid subject?

    I would try inspecting the cert details in a browser.

      Kirby_Timm

      It could be because some other built-in certs look like they are expiring next month.  I'll put it through change control, run the commands to update the certs, and see if that resolves it.

      Lewis_Beard

      I wonder if there is any chance his server does a redirect. We had an issue where the SSL Certs datasource was looking at the site configured in LM, but that site actually redirected somewhere else, and the datasource doesn't take the redirect into account. Probably not related; it's just the only thing I have that could help OP. I guess I should have replied to him. :) Oh well, he will surely see the extra post count. :)

        Kirby_Timm

        I don't think this would be the case.  The error wasn't there before I went on PTO, nothing changed while I was out, and the error was there when I came back from PTO.