My ThinkPad Dies With 10-30% Battery Remaining

Many ThinkPad batteries shipped in recent months have bad firmware that causes the machines to read the battery charge incorrectly. This can make the ThinkPads shut off unexpectedly and can eventually leave the batteries unable to charge at all. If your battery’s FRU number (printed on the inside edge of the battery) is any of the following, you very probably have a faulty battery: 42T4708, 42T4714, 42T4737, 42T4757, 42T4797, 42T4803, 42T4783, 42T4789, 42T4831, 42T4807, 42T4815, 42T4839, 42T4848, 42T4849, 42T4850, 42T4851, 42T4852, 42T4853, 42T4854, 42T4855, 42T4856, 42T4857, 42T4858, 42T4929, 42T4933, 42T4937, 42T4939, or 45N1039. These batteries shipped with the following ThinkPad models: L410, L412, L510, L512, SL410, SL510, T410, T420s, T420si, T430s, T430si, T510, W510, X1, X100e, Edge 13″ (Machine types: 0196, 0197, 0492), Edge 14″, Edge 15″, Edge E30 (Machine types: 0196, 0197, 0492), Edge E40, Edge E50, Edge E220s, and Edge E420s. The machine type can be found on the bottom of the machine or inside the battery bay.

To be clear, here is what’s going on: these batteries run software (firmware) that misreports their charge state to the computer. This is clearly, without question, a hardware problem. Let’s get that straight before going any further. If you own an affected machine, as I do, your warranty covers this. Panasonic screwed up by putting bad firmware in these batteries. Lenovo screwed up by using those batteries in ThinkPads.

Lenovo has issued an update for Windows machines here (accessed 2-27-13). It writes new firmware to the batteries, mostly if not entirely resolving the issue.

But I Don’t Run Windows

If you’re like me and run a Debian-based Linux machine (I run Debian Wheezy), you can find much of the information about your battery with commands such as:

$ cat /sys/class/power_supply/BAT0/manufacturer
Panasonic
$ cat /sys/class/power_supply/BAT0/model_name
45N1039

Depending on your distro, you’ll find this information in /proc/acpi or (more reliably) in /sys/class/power_supply. There is usually a good amount of information there; it is what your system actually uses when it displays the battery’s charge and health. For instance, on my Debian machine, I can clearly see an issue just by comparing the battery’s current energy with its design capacity. Does something here look fishy?

$ cat /sys/class/power_supply/BAT0/energy_now
47620000
$ cat /sys/class/power_supply/BAT0/energy_full_design
43290000

These values are in microwatt-hours (µWh), so my battery is designed to hold 43290 mWh, yet it claims to currently hold 47620 mWh. In other words, the battery reports a charge 10% above its design capacity (47620 / 43290 ≈ 1.10). Oops.

Watt-hours measure energy (power over time), while amp-hours measure charge (current over time); since multiplying amps by volts gives watts, the two differ by a factor of the battery voltage. We can get the capacities in milliamp-hours instead with the acpi utility (named for the Advanced Configuration and Power Interface) as follows.

$ acpi -V | grep mAh
Battery 0: design capacity 3464 mAh, last full capacity 3811 mAh = 100%
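Because energy is charge times voltage, the two sets of numbers can be cross-checked. In this worked relation the roughly 12.5 V figure is inferred from the readings above, not read from the battery:

```latex
\begin{aligned}
E = Q\,V \quad &\Rightarrow\quad V \approx \frac{43290\ \mathrm{mWh}}{3464\ \mathrm{mAh}} \approx 12.5\ \mathrm{V},\\
\frac{3811\ \mathrm{mAh}}{3464\ \mathrm{mAh}} \approx 1.10 \quad &\text{and}\quad \frac{47620\ \mathrm{mWh}}{43290\ \mathrm{mWh}} \approx 1.10 .
\end{aligned}
```

Both views show the same 10% excess over design.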

Again, oops. OK, so we know something weird is going on with the battery. Before I found the firmware fix, I tried to diagnose this a bit myself. I used fwts (the Firmware Test Suite), which walked me through unplugging and plugging in my machine. Over about five minutes, it tested how the battery charged and discharged and compared those numbers with what the battery was reporting. Here’s the output.

$ sudo fwts battery
 
Battery Tests.
----------------------------------------------------------------------------------------------------
Test 1 of 1: Check batteries.
This test reports which (if any) batteries there are in the system. In addition, for charging or
discharging batteries, the test validates that the reported 'current capacity' properly increments
/decrements in line with the charge/discharge state. This test also stresses the battery state
reporting codepath in the ACPI BIOS, and any warnings given by the ACPI interpreter will be
reported.
Found 1 batteries.
Test battery 'BAT0'.
Got 105 interrupt(s) on GPE gpe11.
Got 105 interrupt(s) on GPE gpe_all.
Got 105 SCI interrupt(s).
PASSED: Test 1, Detected ACPI battery events.
PASSED: Test 1, Detected ACPI event for battery BAT0.
FAILED [MEDIUM] BatteryNotDischarging: Test 1, Battery BAT0 claims it is discharging but no charge
is used.
Got 131 interrupt(s) on GPE gpe11.
Got 131 interrupt(s) on GPE gpe_all.
Got 131 SCI interrupt(s).
PASSED: Test 1, Detected ACPI battery events.
PASSED: Test 1, Detected ACPI event for battery BAT0.
FAILED [MEDIUM] BatteryNotCharging: Test 1, Battery BAT0 claims it's charging but no charge is added
Please ignore this error with a new battery
FAILED [LOW] BatteryZeroCycleCount: Test 1, System firmware may not support cycle count interface or
it reports it incorrectly for battery BAT0.
Test battery 'BAT0' downward trip point.
Got 75 interrupt(s) on GPE gpe11.
Got 75 interrupt(s) on GPE gpe_all.
Got 75 SCI interrupt(s).
FAILED [HIGH] BatteryNoEvents: Test 1, Did not detect any ACPI battery events.
FAILED [HIGH] BatteryNoEvents: Test 1, Could not detect ACPI events for battery BAT0.
Test battery 'BAT0' upwards trip point.
Got 69 interrupt(s) on GPE gpe11.
Got 69 interrupt(s) on GPE gpe_all.
Got 69 SCI interrupt(s).
FAILED [HIGH] BatteryNoEvents: Test 1, Did not detect any ACPI battery events.
FAILED [HIGH] BatteryNoEvents: Test 1, Could not detect ACPI events for battery BAT0.
 
====================================================================================================
4 passed, 7 failed, 0 warnings, 0 aborted, 0 skipped, 0 info only.
====================================================================================================
 
4 passed, 7 failed, 0 warnings, 0 aborted, 0 skipped, 0 info only.
 
Test Failure Summary
====================================================================================================
 
Critical failures: NONE
 
High failures: 2
 battery: Did not detect any ACPI battery events.
 battery: Could not detect ACPI events for battery BAT0.
 
Medium failures: 2
 battery: Battery BAT0 claims it is discharging but no charge is used.
 battery: Battery BAT0 claims it's charging but no charge is added
 
Low failures: 1
 battery: System firmware may not support cycle count interface or it reports it incorrectly for battery BAT0.
 
Other failures: NONE
 
Test           |Pass |Fail |Abort|Warn |Skip |Info |
---------------+-----+-----+-----+-----+-----+-----+
battery        |    4|    7|     |     |     |     |
---------------+-----+-----+-----+-----+-----+-----+
Total:         |    4|    7|    0|    0|    0|    0|
---------------+-----+-----+-----+-----+-----+-----+

So, time to call Lenovo. Here’s where things got a bit interesting.

Warranty Through Lenovo

You can’t really expect a service representative to know the difference between firmware and drivers, and you really can’t expect a service representative to know what a monolithic kernel is. Anyway, I expected to be told to “update your power management drivers,” which is just a fancy way of telling me to get the firmware update … assuming I use Windows. And that’s exactly what I was told. When I mentioned that I don’t use Windows and that this is a hardware issue, not a driver issue, things got a bit heated.

First, they offered to send me a copy of Windows on a DVD. I wasn’t interested: I can use Windows, but I haven’t installed it on my personal machines in seven years. I was then told, and I quote:

“We do not support Linux. Please call Linux to resolve this issue.” ~Lenovo support

I told them that I might as well call my dining room table. That caused a few moments of awkward silence. To make a long story short, at this point the representative was clearly talking with a supervisor, as I could hear parts of their conversation. I restated that this was a hardware issue: simply booting into the BIOS and running the machine there would trigger the problem. And while I think it’s awesome that Lenovo has a Windows-based solution, my warranty doesn’t say that I have to use Windows for the hardware itself to function correctly.

This all went down last Friday. My new battery, a Sanyo-produced FRU 45N1037, arrived in the mail today. Some internet searching shows that this one has solid firmware.


This problem was posted on the GAP (Groups, Algorithms, Programming) Forum some time ago. Roughly a week later, this partial solution was posted.

Knowing that GAP is an interpreted language running on top of a C kernel, I thought I might be able to do better in C. After prototyping the problem in Python, my fears were realized: the numbers in the orbit are HUGE. So I decided to write the code in C using the GMP library and OpenMP. The main idea is that the orbit can be computed in parallel, with one core stepping forwards and the other backwards until the two orbit points agree. Not only does this cut the computation time roughly in half (compared with a non-parallel C/GMP solution), but it should also run much faster than GAP. And indeed, it does. My post to the forum announcing the first known solution can be found here. The largest number in the orbit has 76,785 digits.

C/GMP/OpenMP Code

Here is the code. It requires the GNU MP Bignum Library (GMP, installed separately from GCC) and OpenMP (supported by GCC). Core 0 steps backwards with g^-1 while core 1 steps forwards with g. Once the two orbit points are within 20 digits of each other, core 0 stops and only core 1 continues, comparing the two points whenever their lengths differ by at most 5 digits. (There’s no makefile. I’ll give compiler instructions below.)

// Jason B. Hill
// Jason.B.Hill@Colorado.edu
// www.jasonbhill.com
 
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
#include <omp.h>
 
//#define p 32
//#define p 736
//#define p 25952
#define p 173176
 
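/*****************************************************************************/
/* Transpositions a,b,c and compositions g,g^-1                              */
/*****************************************************************************/

/* Each of a, b, c swaps pairs of integers and fixes all others, so each is
   its own inverse. g applies a, then b, then c; g^-1 undoes them in reverse.
   GMP predicates return "non-zero" for true, so they are tested directly
   rather than compared with 1. */

/* a: 2k <-> 2k+1 */
void a(mpz_t omega) {
    if(mpz_even_p(omega)) mpz_add_ui(omega, omega, 1);
    else mpz_sub_ui(omega, omega, 1);
}

/* b: 5k <-> 5k+4; other residues mod 5 are fixed */
void b(mpz_t omega) {
    if(mpz_congruent_ui_p(omega, 0, 5)) mpz_add_ui(omega, omega, 4);
    else if(mpz_congruent_ui_p(omega, 4, 5)) mpz_sub_ui(omega, omega, 4);
}

/* c: 4k+1 <-> 6k; everything else is fixed. The cases cannot overlap,
   since 4k+1 is odd and 6k is even. */
void c(mpz_t omega) {
    if(mpz_congruent_ui_p(omega, 1, 4)) {
        mpz_sub_ui(omega, omega, 1);
        mpz_divexact_ui(omega, omega, 4);
        mpz_mul_ui(omega, omega, 6);
    } else if(mpz_congruent_ui_p(omega, 0, 6)) {
        mpz_divexact_ui(omega, omega, 6);
        mpz_mul_ui(omega, omega, 4);
        mpz_add_ui(omega, omega, 1);
    }
}

/* g = c . b . a */
void g(mpz_t omega) {
    a(omega);
    b(omega);
    c(omega);
}

/* g^-1 = a . b . c */
void ginv(mpz_t omega) {
    c(omega);
    b(omega);
    a(omega);
}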
 
/*****************************************************************************/
/* Main                                                                      */
/*****************************************************************************/
 
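int main(void) {
    unsigned long       n = p;            /* starting point of the orbit */
    unsigned long long  i0 = 0;           /* backward steps (core 0, g^-1) */
    unsigned long long  i1 = 0;           /* forward steps (core 1, g) */
    unsigned long long  last_report = 0;  /* total at the last periodic report */
    int                 th_id;
    _Bool               sstop = 0;        /* set once the two points meet */
    size_t              s = 0;            /* largest digit length seen so far */
    size_t              c0, c1;           /* digit lengths (may be 1 too big) */
    mpz_t               omega0, omega1;   /* omega0 = g^-i0(n), omega1 = g^i1(n) */

    omp_set_num_threads(2);

    mpz_init(omega0);
    mpz_init(omega1);

    mpz_set_ui(omega0, n);
    mpz_set_ui(omega1, n);

    c0 = mpz_sizeinbase(omega0, 10);
    c1 = mpz_sizeinbase(omega1, 10);

    #pragma omp parallel private(th_id) \
    shared(omega0,omega1,s,sstop,c0,c1,i0,i1,last_report)
    {
        th_id = omp_get_thread_num();

        /* One forward step first; otherwise omega0 == omega1 == n
           would end the search immediately. */
        if(th_id == 1) {
            g(omega1);
            i1++;
            if(mpz_cmp(omega0,omega1)==0) sstop = 1;
        }

        #pragma omp barrier

        while(!sstop) {
            /* size_t subtraction wraps around when c1 > c0, so compute
               the gap explicitly instead of via abs(). */
            size_t gap = (c0 > c1) ? c0 - c1 : c1 - c0;

            if(th_id == 0) {
                /* Core 0 steps backwards only while the points are far apart. */
                if(gap > 20) {
                    ginv(omega0);
                    c0 = mpz_sizeinbase(omega0, 10);
                    i0++;
                }
            } else if(th_id == 1) {
                /* Core 1 always steps forwards and compares only when the
                   lengths are close (core 0 should be idle by then). */
                if(gap > 5) {
                    g(omega1);
                    c1 = mpz_sizeinbase(omega1, 10);
                    i1++;
                } else {
                    if(mpz_cmp(omega0,omega1)==0) sstop = 1;
                    else {
                        g(omega1);
                        c1 = mpz_sizeinbase(omega1, 10);
                        i1++;
                    }
                }
            }

            #pragma omp flush(sstop,c0,c1,i0,i1)

            /* Core 0 also reports progress. */
            if(th_id == 0) {
                unsigned long long total = i0 + i1;

                if(c0 > s) {
                    s = c0;
                    printf("Core 0: digit length increased to %zu\n", s);
                    printf("iterations: %llu (core0) %llu (core1) %llu (total)\n\n",
                           i0, i1, total);
                }
                if(c1 > s) {
                    s = c1;
                    printf("Core 1: digit length increased to %zu\n", s);
                    printf("iterations: %llu (core0) %llu (core1) %llu (total)\n\n",
                           i0, i1, total);
                }
                /* Report every 10^8 iterations, once each: core 0 spins
                   here while core 1 finishes alone. */
                if(total % 100000000 == 0 && total != last_report) {
                    last_report = total;
                    printf("digit length: %zu\n", s);
                    printf("iterations: %llu (core0) %llu (core1) %llu (total)\n\n",
                           i0, i1, total);
                }
            }
        }
    }

    printf("total iterations: %llu\n", i0+i1);
    mpz_clear(omega0);
    mpz_clear(omega1);

    return 0;
}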

Using GCC with OpenMP and GMP, one can compile the code on a multicore machine with, for example, the following command.

gcc -O3 -o length-residue-orbit-omp length-residue-orbit-omp.c -lgmp -fopenmp

Success!

The result, total iterations: 47610700792, came back in roughly 3.1 days on a 2.9 GHz third-generation Intel Core i7 (i7-3520M).