kernel: mce: [Hardware Error]: Machine check events logged
localhost.localdomain kernel: mce: [Hardware Error]: Machine check events logged
An error has occurred in a memory module.
Machine Check Exceptions (MCE)
x86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches or in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect, or other internal errors. Possible causes include cosmic radiation, unstable power supplies, cooling problems, broken hardware, running systems out of specification, or bad luck.
Most errors can be corrected by the CPU using internal error correction mechanisms. Uncorrected errors cause machine check exceptions, which may kill processes or panic the machine. A small number of corrected errors is usually not a cause for worry, but a large number can indicate future failure.
When a corrected or recovered error happens, the x86 kernel writes a record describing the MCE into an internal ring buffer available through the /dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes them into a human-readable format, and prints them on the standard output or optionally into the system log.
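Since the kernel also leaves a "Machine check events logged" line in the system log, a quick way to see how often this happens is to count those lines per host. The following is only a minimal sketch: the function name `count_mce_lines` and the regular expression are illustrative assumptions based on the log line shown at the top of this note, not part of mcelog.

```python
import re
from collections import Counter

# Assumed pattern, modeled on the log line quoted at the top of this note:
# "localhost.localdomain kernel: mce: [Hardware Error]: Machine check events logged"
# The host name in front of "kernel:" is treated as optional.
MCE_LINE = re.compile(
    r"(?:(?P<host>\S+)\s+)?kernel: mce: \[Hardware Error\]: "
    r"Machine check events logged"
)


def count_mce_lines(lines):
    """Count 'Machine check events logged' messages per host.

    `lines` is any iterable of log lines (hypothetical helper, not an
    mcelog API). Lines without a host prefix are counted as 'unknown'.
    """
    counts = Counter()
    for line in lines:
        match = MCE_LINE.search(line)
        if match:
            counts[match.group("host") or "unknown"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "localhost.localdomain kernel: mce: [Hardware Error]: Machine check events logged",
        "localhost.localdomain sshd[1]: session opened",
    ]
    for host, n in count_mce_lines(sample).items():
        print(host, n)
```

To use it on a real system, pass an open log file to `count_mce_lines`; the log file location depends on the distribution and is not covered here. A count that keeps growing is the "large number" case described above and is worth investigating with mcelog itself.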
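MCE is a logging mechanism used to report host hardware problems.
Common causes of MCE errors
- Memory errors or ECC (Error Correction Code) problems
- Insufficient cooling or processor overheating
- System bus errors
- Cache errors in the processor or other hardware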
References
- mcelog