Forum Discussion
The data is provided as a list of CPUs, each with its own list of P-states, each of which has a numeric value. Part of the reason I wanted to collect all of that data is that the meaning of the numeric values isn't documented, and I wanted to see what I could work out by collecting them for a while and looking at how they changed.
I wasn't able to get the data into LogicMonitor, so I ended up collecting the data manually and figuring out what kind of data it was outside of LM. It turns out, as per my previous update, that each value is an ever-increasing counter, something along the lines of total time spent in each P-state or total number of operations run in that P-state.
(For what it's worth, the reason I'm collecting this data is that I have a recurrent hardware problem that is preventing processors from running at full speed, and this is the only statistic I could find that was reporting the problem accurately.)
Since the data is presented as counters, the numbers are only meaningful when compared with previous numbers. Imagine a situation where a particular CPU has been running at full speed (P-state 0) for a month. Let's say that the number is seconds spent in that P-state (it's probably not that, but it's something like that), and the CPU has four P-states. Ideally, the data for that CPU would be {2592000,0,0,0}. But even when it's acting normally, it still spends some small amount of time in the other states, so it would actually look something like {2525087,21254,19083,26576}. Then something happens and it starts consistently running in P-state 2. After 6 hours, the data now looks like {2525087,21254,38695,28564}. P2 has grown by 19612 and P3 by 1988, which together account for the full 21600 seconds, while P0 and P1 have not increased at all. That is not at all obvious when looking at the raw data without comparing it to previous data.
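As a rough illustration of that comparison, here is a minimal Python sketch that subtracts one sample from the next and shows where the time went. The function names are made up for this example, and it keeps the same assumption as the paragraph above, that the counters behave like seconds spent in each P-state. It does not handle counter resets (for example after a reboot).

```python
def pstate_deltas(previous, current):
    """Return how much each P-state counter grew between two samples."""
    if len(previous) != len(current):
        raise ValueError("both samples must cover the same P-states")
    return [now - before for before, now in zip(previous, current)]


def pstate_shares(deltas):
    """Return each P-state's fraction of the total growth in the interval."""
    total = sum(deltas)
    if total == 0:
        # Nothing changed, so there is nothing to apportion.
        return [0.0] * len(deltas)
    return [d / total for d in deltas]


# The two samples from the example above, taken 6 hours apart.
before = [2525087, 21254, 19083, 26576]
after = [2525087, 21254, 38695, 28564]

deltas = pstate_deltas(before, after)
shares = pstate_shares(deltas)

for plevel, (delta, share) in enumerate(zip(deltas, shares)):
    print(f"P{plevel}: +{delta} ({share:.1%} of the interval)")
```

With the deltas in hand, the zero growth in P0 and P1 stands out immediately, which the raw totals hide.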
I have tried aggregating the data into two datapoints per CPU: totalCount = sum(plevel_counter) and weightedCount = sum((plevel+100)*plevel_counter). I can then construct a complex datapoint that is ((weightedCount/totalCount)-100). (The "100" is there because the smaller the multiplier is, the more it can get overwhelmed by other counts. The extreme of this is when the multiplier would be 0, but even with multipliers of just 1-17, the 17 is way bigger than 1, while 117 is not all that much bigger than 100.) This is a marginally okay approximation, but it sure would be better if LogicMonitor had the real data. (I could get 100% accurate data this way by using "*(max(plevel_counter)+1)^plevel" instead of "+100", but those numbers feel like they'd be far too large for, well, anything. The real values can be at least 14 digits long.)
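To make the arithmetic concrete, here is a small Python sketch of the totalCount / weightedCount calculation applied to the example numbers from earlier in this post. The function name and the `offset` parameter are invented for the example, and it assumes the calculation is fed per-interval increases rather than raw totals. One thing it shows: on paper the offset cancels out, so the result equals the plain weighted mean of the P-state index. Any benefit from the "+100" would have to come from how the monitoring system stores or rounds the intermediate values, which is not something this sketch checks.

```python
def average_pstate(counts, offset=100):
    """Weighted average P-state index: (weightedCount / totalCount) - offset."""
    total_count = sum(counts)
    if total_count == 0:
        return None  # no activity in the interval, so no meaningful average
    weighted_count = sum(
        (plevel + offset) * count for plevel, count in enumerate(counts)
    )
    return weighted_count / total_count - offset


# Increases over the 6-hour interval in the earlier example.
interval_deltas = [0, 0, 19612, 1988]
# Raw counter totals at the end of the same interval.
raw_totals = [2525087, 21254, 38695, 28564]

avg_interval = average_pstate(interval_deltas)           # with the +100 offset
avg_no_offset = average_pstate(interval_deltas, offset=0)  # plain weighted mean
avg_raw = average_pstate(raw_totals)

print(f"interval average P-state: {avg_interval:.3f}")
print(f"same without the offset:  {avg_no_offset:.3f}")
print(f"average over raw totals:  {avg_raw:.3f}")
```

The interval average comes out a little above 2, which matches a CPU stuck mostly in P2, while the same formula over the raw totals stays close to 0 because a month of normal operation swamps the last six hours.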
It does turn out that there is a maximum number of P-states, 16, so I think I could just implement it that way. There's probably a way to deal with that in a complex datapoint.
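If "implement it that way" means a fixed set of 16 datapoints per CPU, one way to picture it is the Python sketch below. This is only a guess at the approach: the datapoint names (p0 through p15) and the choice to pad unused P-states with 0 are assumptions made for the example, not anything LogicMonitor or vCenter prescribes.

```python
MAX_PSTATES = 16  # upper bound on P-states mentioned above


def to_fixed_datapoints(counters, prefix="p"):
    """Map a variable-length counter list onto 16 fixed, named slots."""
    if len(counters) > MAX_PSTATES:
        raise ValueError(f"expected at most {MAX_PSTATES} P-states")
    # Unused P-states are padded with 0, so their deltas are always 0 too.
    padded = list(counters) + [0] * (MAX_PSTATES - len(counters))
    return {f"{prefix}{plevel}": value for plevel, value in enumerate(padded)}


datapoints = to_fixed_datapoints([2525087, 21254, 38695, 28564])
print(datapoints)
```

A CPU with only four P-states then reports p4 through p15 as constant zeros, which drop out of any sum-based calculation.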
I can also talk with our LM admin again to see what he thinks about directly monitoring each ESXi host instead of basing everything on the vCenter server.
(Also sorry about the slow response. I was unknowingly looking directly at my other response, which doesn't show other responses to the main post.)