Sunday, 28 August 2016

Fixing an overheating Lenovo X230 laptop

Over the past month I've been hitting excessive thermal heating on my laptop and kidle_inject has been kicking in to try and stop the CPU from overheating (melting!).  A quick double-check with older kernels showed me that this issue was not thermal/performance regression caused by software - instead it was time to clean my laptop and renew the thermal paste.

After some quick research, I found that Artic MX-4 Thermal Compound provided an excellent thermal conductivity rating of 8.5W/mK so I ordered a 4g sample as well as a can of pressurized gas cleaner to clean out dust.

The X230 has an excellent hardware maintenance manual, and following the instructions I stripped the laptop right down so I could pop the heat pipe contacts off the CPU and GPU.  I carefully cleaned off the old dry and cracked thermal paste and applied about 0.2g of MX-4 thermal compound to the CPU and GPU and re-seated the heat pipe.  With the pressurized gas I cleaned out the fan and airways to maximize airflow over the heatpipe.   The entire procedure took about an hour to complete and for once I didn't have any screws left over after re-assembly!

I normally take photos of the position of components during the strip down of a laptop for reference in case I cannot figure out exactly how parts are meant to fix on the re-assembly phase.  In this case, the X230 maintenance manual is sufficiently detailed so I didn't take any photos this time.

I'm glad to report that my X230 is now no-longer overheating. Heat is being effectively pumped away from the CPU and GPU and one can feel the additional heat being pushed out of the laptop.  Once again I can fully max out the CPU and GPU without passive thermal cooling mechanisms being kicked into action, so I've now got 100% of my CPU performance back again; as good as new!

Now and again I see laptop overheating bugs being filed in LaunchPad.  While some are legitimate issues with broken software, I do wonder if the majority of issues with the older laptops is simply due to accumulation of dust and/or old and damaged thermal paste.

Saturday, 30 July 2016

Scanning the Linux kernel for error messages

The Linux kernel contains lots of error/warning/information messages; over 130,000 in the current 4.7 kernel.  One of the tests in the Firmware Test Suite (FWTS) is to find BIOS/ACPI/UEFI related kernel error messages in the kernel log and try to provide some helpful advice on each error message since some can be very cryptic to the untrained eye.

The FWTS kernel error log database is currently approaching 800 entries and I have been slowly working through another 800 or so more relevant and recently added messages.  Needless to say, this is taking a while to complete.  The hardest part was finding relevant error messages in the kernel as they appear in different forms (e.g. printk(), dev_err(), ACPI_ERROR() etc).

In order to scrape the Linux kernel source for relevant error messages I hacked up the kernelscan parser to find error messages and dump these to stdout.  kernelscan can scan 43,000 source files (17,900,000 lines of source) in under 10 seconds on my Lenovo X230 laptop, so it is relatively fast.

I also have been using kernelscan to find spelling mistakes in kernel messages and I've been punting trivial fixes upstream to fix these.  These mistakes are small and petty, but I find it a little irksome when I see the kernel emit a message that contains a typo or spelling mistake - it just looks a bit unprofessional.

I've created a kernelscan snap (which was really easy and fast to do using scancraft), so it is now available Ubuntu.  The source code is also available from the kernel team git web at

The code is designed to only parse kernel source, and it is a very rough and ready parser designed for speed;  fundamentally, it is a big quick hack.  When I get a few spare minutes I will try and see if there is any correlation between the number of error messages with the size of the kernel over the various releases.

Wednesday, 29 June 2016

What's new in stress-ng 0.06.07?

Since my last blog post about stress-ng, I've pushed out several more small releases that incorporate new features and (as ever) a bunch more bug fixes.  I've been eyeballing gcov kernel coverage stats to find more regions in the kernel where stress-ng needs to exercise.   Also, testing on a range of hardware (arm64, s390x, etc) and a range of kernels has eeked out some bugs and helped me to improve stress-ng.  So what's new?

New stressors:
  • ioprio  - exercises ioprio_get(2) and ioprio_set(2) (I/O scheduling classes and priorities)
  • opcode - generates random object code and executes this, generating and catching illegal instructions, bus errors,  segmentation  faults,  traps and floating  point errors.
  • stackmmap - allocates a 2MB stack that is memory mapped onto a temporary file. A recursive function works down the stack and flushes dirty stack pages back to the memory mapped file using msync(2) until the end of the stack is reached (stack overflow). This exercises dirty page and stack exception handling.
  • madvise - applies random madvise(2) advise settings on pages of a 4MB file backed shared memory mapping.
  • pty - exercise pseudo terminal operations.
  • chown - trivial chown(2) file ownership exerciser.
  • seal - fcntl(2) file SEALing exerciser.
  • locka - POSIX advisory locking exerciser.
  • lockofd - fcntl(2) F_OFD_SETLK/GETLK open file description lock exerciser.
Improved stressors:
  • msg: add in IPC_INFO, MSG_INFO, MSG_STAT msgctl calls
  • vecmath: add more ops to make vecmath more demanding
  • socket: add --sock-type socket type option, e.g. stream or seqpacket
  • shm and shm-sysv: add msync'ing on the shm regions
  • memfd: add hole punching
  • mremap: add MAP_FIXED remappings
  • shm: sync, expand, shrink shm regions
  • dup: use dup2(2)
  • seek: add SEEK_CUR, SEEK_END seek options
  • utime: exercise UTIME_NOW and UTIME_OMIT settings
  • userfaultfd: add zero page handling
  • cache:  use cacheflush() on systems that provide this syscall
  • key:  add request_key system call
  • nice: add some randomness to the delay to unsync nicenesses changes
If any new features land in Linux 4.8 I may add stressors for them, but for now I suspect that's about it for the big changes for stress-ng for the Ubuntu Yakkey 16.10 release.

Wednesday, 4 May 2016

Some extra features landing in stress-ng 0.06.00

Recently I've been adding a few more features into stress-ng to get improve kernel code coverage.   I'm currently using a kernel built with gcov enabled and using the most excellent lcov tool to collate the coverage data and produce some rather useful coverage charts.

With a gcov enabled kernel, gathering coverage stats is a trivial process with lcov:

 sudo apt-get install lcov  
 sudo lcov --zerocounters  
 stress-ng --seq 0 -t 60  
 sudo lcov -c -o  
 sudo genhtml -o html  

..and the html output appears in the html directory.

In the latest 0.06.00 release of stress-ng, the following new features have been introduced:

  • af-alg stressor, added skciphers and rngs
  • new Translation Lookaside Buffer (TLB) shootdown stressor
  • new /dev/full stressor
  • hdd stressor now works through all the different hdd options if --maximize is used
  • wider procfs stressing
  • added more keyctl commands to the key stressor
  • new msync stressor, exercise msync of mmap'd memory back to file and from file back to memory.
  • Real Time Clock (RTC) stressor (via /dev/rtc and /proc/driver/rtc)
  • taskset option, allowing one to run stressors on specific CPUs (affinity setting)
  • inotify stressor now also exercises the FIONREAD ioctl()
  • and some bug fixes found when testing stress-ng on various architectures.
The --taskset option allows one to keep stress-ng stressors bound to specific CPUs, for example, to run 5 CPU stressors tied to CPUs 1, 3, 5, 6 and 7:

 stress-ng --taskset 1,3,5-7 --cpu 5  

..thanks to Jim Rowan (Intel) for the CPU affinity ideas.

stress-ng 0.06.00 will be landing in Ubunty Yakkety soon, and also in my power utilities PPA ppa:colin-king/white