Aha, the Machine Check Exception facility. I didn't notice that before. You must have that configured. And it looks like it works too (grin). Cool the way it says that Linux can auto-fix the error, and good that it logs. I really am impressed, and must read up more on that ... so much to do (grin).
The posts I saw were saying that if you get inaccurate data in the caches at high temps, it would show up here, or if the power supply was poor, with fluctuating +12V and +5V rails, then again it would show up here. But everything remained solid, so I guess I'll try memtest to put the RAM through some grief, just in case.
EDIT: Ran memtest twice, no problems ... just going to keep an eye out for MCE messages, but I guess it was a one-off, not important, maybe even a false positive ...
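For keeping an eye out without scrolling through the logs by hand, something like the rough Python sketch below might do. It assumes the kernel log has already been captured as text (for example, saved dmesg output), and that MCE-related lines mention "machine check" or "mce" somewhere. The exact wording differs between kernel versions, so the pattern is only a guess to adjust, and the name find_mce_lines is made up for illustration.

```python
import re

# Assumption: MCE-related kernel messages contain "machine check" or the
# standalone word "mce". Wording varies by kernel version; adjust to taste.
MCE_PATTERN = re.compile(r"machine check|\bmce\b", re.IGNORECASE)


def find_mce_lines(log_text):
    """Return the lines of log_text that look MCE-related."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]
```

Feed it the contents of a saved log and print whatever comes back; an empty list means nothing matched the pattern, which is not quite the same as proof that nothing happened.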
Yes, it will inevitably involve a number of contributing factors, all variable, so it's hard to single out any particular one as the prime cause. A good power supply will still be at the mercy of mains fluctuations to a degree, unless it's plugged into one of those UPS units. Another thing I wish I could afford.
I've been lucky in that regard. That tiny URL worked this time too.
I think if it was a definite hardware issue (RAM), then it would likely occur whenever similar conditions existed, and then start to occur regularly without necessarily the extreme conditions being present. A consistent thing that comes across with RAM and temperature, from what I can gather, involves what they call a "bit flip". Possibly that was it.
As you have been stressing the system and it hasn't recurred, you may well be right that it was a one-off. Probably the temp was the last straw that set it off, in combination with the high CPU usage.
It can be a real pain too, so much time more or less wasted trying to unravel error messages and codes. That's OK for the manufacturers; I would expect them to have the codes archived for their own use, and they don't really have a lot of ROM space to devote to it. But as Linux does represent a big change in the computing mindset ... that is ... computing has been brought to the people, so to speak ... it would be nice to see human-readable translations for these things become part of the kernel. Though I do know that it is easier said than done, as it would be a mammoth task. But ... it's great that we have the MCE facility available in any case! I wonder just how often these things have occurred in the past but gone unnoticed, due to the absence of any facility to provide the log output. Being forewarned is at least a help.
As long as there aren't any further log entries, then it probably was just a free radical event, but I wouldn't think it was false. Would have thought that 52°C would have been OK though. It is well within spec. But 52°C is damn hot too, and AMD traces are really thin ... seems at least now you have a danger mark for temp that you can check for.
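Checking against that danger mark could be sketched roughly as below. This assumes the reading comes from a Linux sysfs-style sensor file that reports millidegrees Celsius as plain text; which file holds the CPU temperature varies by board and driver, so no path is given here. The 52 is just the figure from this thread, not a vendor limit, and the function names are made up.

```python
# Assumption: 52 C is the "danger mark" discussed in this thread,
# not an official limit for any particular CPU.
DANGER_MARK_C = 52.0


def parse_millidegrees(raw):
    """Convert a sysfs-style reading such as '52000' (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def at_or_over_mark(temp_c, mark_c=DANGER_MARK_C):
    """True if the temperature has reached the chosen danger mark."""
    return temp_c >= mark_c
```

Read the sensor file's contents, pass them through parse_millidegrees, and log a warning whenever at_or_over_mark returns True; that way any later MCE entry can be lined up against the temperature at the time.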
I've got MCE configured as a module that's not loaded, so I'll have to change it to a built-in now.
A long compile in the background, something like an X or libc6 compile, works as a good de facto system/RAM test too.
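To see at a glance whether the MCE options are built in or modules, a small Python sketch like the one below could read the kernel config text. It relies only on the usual config file format ("CONFIG_FOO=y", "CONFIG_FOO=m", "# CONFIG_FOO is not set"). Matching on "_MCE" in the option name is my own assumption, since the option names differ between kernel versions, and mce_options is a made-up name. Where the config file lives (the .config in the source tree, or a copy under /boot) depends on the distro.

```python
import re

# Kernel config lines look like "CONFIG_FOO=y", "CONFIG_FOO=m",
# or "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def mce_options(config_text):
    """Map each option with "_MCE" in its name to its value ('y', 'm', ...) or 'n'."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m and "_MCE" in m.group(1):
            found[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m and "_MCE" in m.group(1):
            # Assumption: report "is not set" as 'n' for readability.
            found[m.group(1)] = "n"
    return found
```

Anything that comes back as 'm' is a module that has to be loaded before it does anything, which is the situation described above; 'y' means it is built in.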
NB: Had to log in again to post this, on a page opened in a tab from an existing login ...
Strange how that happens occasionally.
Humpty Dumpty Was Pushed !