Base32 Decoding in JavaScript

When working on the Google Authenticator web page, I realized that I needed to base32 decode the secret given from Google to get to the raw bytes. For those unaware, Base32 encoding is a mechanism to represent arbitrary binary data (1s and 0s) into an alphanumeric representation that is more convenient for transport (typing, etc.). Base32 encoding is exactly like hexadecimal or octal, where binary data is represented using different characters (0-9, A-F in hex, 0-7 in octal) except Base32 encoding uses even more alphanumeric characters (2-7, A-Z) to reduce the space required to represent the same number of bits. Looking around I couldn't find a well tested implementation of this, so I decided to write my own. I used this opportunity to explore unit testing a JavaScript method using QUnit. Let's start with gathering the test cases. The RFC has some test vectors, so obviously those will be included:
test("Test Vectors", function () {
    // Base32Decode should correctly decode the test vectors from the RFC
    strictEqual(Base32Decode("").length, 0, "Base32Decode should return an empty array for the empty string");
    ok(compareUint8ArrayToString(Base32Decode("MY======"), "f"), "Base32Decode should return 'f' for 'MY======'");
    ok(compareUint8ArrayToString(Base32Decode("MZXQ===="), "fo"), "Base32Decode should return 'f' for 'MZXQ===='");
    ok(compareUint8ArrayToString(Base32Decode("MZXW6YQ="), "foob"), "Base32Decode should return 'foob' for 'MZXW6YQ='");
    ok(compareUint8ArrayToString(Base32Decode("MZXW6YTB"), "fooba"), "Base32Decode should return 'fooba' for 'MZXW6YTB'");
    ok(compareUint8ArrayToString(Base32Decode("MZXW6YTBOI======"), "foobar"), "Base32Decode should return 'foobar' for 'MZXW6YTBOI======'");
});

Obviously these tests won't pass until we have a working Base32 decoder. The decoder is fairly straight-forward for inputs that don't have padding (i.e., the number of bytes are multiples of 40). In that case, you simple map the bits per the RFC:

|__A__|__B__|__C__|__D__|__E__|__F__|__G__|__H__|__I__|__J__|__K__|__L__|__M__|__N__|__O__|__P__|__Q__|__R__|__S__|__T__|__U__|__V__|__W__|__X__|__Y__|__Z__|__2__|__3__|__4__|__5__|__6__|__7__|
|00000|00001|00010|00011|00100|00101|00110|00111|01000|01001|01010|01011|01100|01101|01110|01111|10000|10001|10010|10011|10100|10101|10110|10111|11000|11001|11010|11011|11100|11101|11110|11111

The RFC goes into detail about what cases are possible with padding, etc. but I'll leave that as an exercise to the reader. I could have made the code smaller, but I wanted to be clear and follow the RFC as closely as possible. Here is the implementation:

var Base32Decode = function (base32EncodedString) {
    /// Decodes a base32 encoded string into a Uin8Array, note padding is not supported
    /// The base32 encoded string to be decoded
    /// The Unit8Array representation of the data that was encoded in base32EncodedString
    if (!base32EncodedString && base32EncodedString !== "") {
        throw "base32EncodedString cannot be null or undefined";
    }

    if (base32EncodedString.length * 5 % 8 !== 0) {
        throw "base32EncodedString is not of the proper length. Please verify padding.";
    }

    base32EncodedString = base32EncodedString.toLowerCase();
    var alphabet = "abcdefghijklmnopqrstuvwxyz234567";
    var returnArray = new Array(base32EncodedString.length * 5 / 8);

    var currentByte = 0;
    var bitsRemaining = 8;
    var mask = 0;
    var arrayIndex = 0;

    for (var count = 0; count < base32EncodedString.length; count++) {
        var currentIndexValue = alphabet.indexOf(base32EncodedString[count]);
        if (-1 === currentIndexValue) {
            if ("=" === base32EncodedString[count]) {
                var paddingCount = 0;
                for (count = count; count < base32EncodedString.length; count++) {
                    if ("=" !== base32EncodedString[count]) {
                        throw "Invalid '=' in encoded string";
                    } else {
                        paddingCount++;
                    }
                }

                switch (paddingCount) {
                    case 6:
                        returnArray = returnArray.slice(0, returnArray.length - 4);
                        break;
                    case 4:
                        returnArray = returnArray.slice(0, returnArray.length - 3);
                        break;
                    case 3:
                        returnArray = returnArray.slice(0, returnArray.length - 2);
                        break;
                    case 1:
                        returnArray = returnArray.slice(0, returnArray.length - 1);
                        break;
                    default:
                        throw "Incorrect padding";
                }
            } else {
                throw "base32EncodedString contains invalid characters or invalid padding.";
            }
        } else {
            if (bitsRemaining > 5) {
                mask = currentIndexValue << (bitsRemaining - 5);
                currentByte = currentByte | mask;
                bitsRemaining -= 5;
            } else {
                mask = currentIndexValue >> (5 - bitsRemaining);
                currentByte = currentByte | mask;
                returnArray[arrayIndex++] = currentByte;
                currentByte = currentIndexValue << (3 + bitsRemaining);
                bitsRemaining += 3;
            }
        }
    }

    return new Uint8Array(returnArray);
};

I've added more tests around padding and other specifics you can find at the source below, but enjoy a live demo converting base32 encoded strings to hexadecimal:

You can find the source for both the tests and the actual decode on github here: Base32Decode in JavaScript.
Also, if you'd like, you can run the tests directly from your browser via this link.

No comments: