The FixedBitNotation class is for general-purpose binary-to-text conversion with arbitrary encodings. You can use it to handle variants of many encodings, such as Base64 or Base32.
Most binary-to-text encoding schemes use a fixed number of bits (up to 6) of binary data to generate each encoded character. The algorithms used for these encodings are very similar, so I set out to write a single algorithm that handles them all. (Note that Ascii85 does not work this way; it uses four bytes to generate five encoded characters, and each character is not derived from a fixed number of bits.)
These encodings are usually used to represent data in a notation that is safe for transport, but as the following examples show, there are other uses.
How it works
First, create an instance. The constructor accepts five arguments:
integer
$bitsPerCharacter (required) - This is an integer specifying the number of bits from the raw binary string to use for each encoded character. The practical range is 1 to 6; you may use up to 8, but you will have to provide a base character string ($chars) that is at least pow(2, $bitsPerCharacter) characters long. So even with 7 bits per character you need to specify a value for $chars that is 128 characters long, which exceeds the number of printable ASCII characters.
The output's radix relates to the value of $bitsPerCharacter as follows:
1: base-2 (binary)
2: base-4
3: base-8 (octal)
4: base-16 (hexadecimal)
5: base-32
6: base-64
7: base-128
8: base-256
string
$chars (optional) - This is a string that specifies the base alphabet to use in your notation. As explained above, the string length of $chars is related to the value of $bitsPerCharacter. If $chars is not long enough for $bitsPerCharacter, $bitsPerCharacter will be reduced to the greatest value supported by $chars. The default value of $chars is "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-,".
boolean
$rightPadFinalBits (optional) - This boolean determines how to handle the bits in the last encoded character when the number of bits remaining is less than $bitsPerCharacter. If TRUE, empty bits will be added on the right as needed to fill the quota. If FALSE (the default), they will be on the left. For most content transfer encoding schemes you will set this to TRUE.
boolean
$padFinalGroup (optional) - It's common to encode characters in groups. For example, Base64 (which is based on 6 bits per character) converts 3 raw bytes into 4 encoded characters. If not enough bytes remain at the end, the final group will be padded with "=" to complete a group of 4 characters, and the encoded character length is always a multiple of 4. Some programs rely on the padding for decoding; FixedBitNotation does not.
string
$padCharacter (optional) - If $padFinalGroup is TRUE, this is the character to use. The default is "=".
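To make the roles of $bitsPerCharacter, $chars, and $rightPadFinalBits concrete, here is a minimal sketch of fixed-bit encoding. It is not the class's actual implementation: fixedBitEncodeSketch() is a hypothetical helper written for illustration, it builds an intermediate string of "0" and "1" characters for readability rather than speed, and it throws on a short alphabet instead of reducing the bit count the way the class is described as doing.

```php
<?php
// Hypothetical helper for illustration only; not part of FixedBitNotation.
function fixedBitEncodeSketch(string $raw, int $bits, string $chars, bool $rightPadFinalBits = false): string
{
    // Assumption: reject a short alphabet rather than lowering $bits.
    if ($bits < 1 || $bits > 8 || strlen($chars) < (1 << $bits)) {
        throw new InvalidArgumentException('$chars needs at least pow(2, $bits) characters');
    }

    // Expand every raw byte into eight "0"/"1" characters.
    $bitString = '';
    for ($i = 0, $n = strlen($raw); $i < $n; $i++) {
        $bitString .= str_pad(decbin(ord($raw[$i])), 8, '0', STR_PAD_LEFT);
    }

    // Consume $bits bits at a time and map each group to one character.
    $encoded = '';
    for ($i = 0, $n = strlen($bitString); $i < $n; $i += $bits) {
        $group = substr($bitString, $i, $bits);
        if ($rightPadFinalBits) {
            // Fill a short final group with zero bits on the right.
            $group = str_pad($group, $bits, '0', STR_PAD_RIGHT);
        }
        // bindec() reads a short group as if it were padded on the left.
        $encoded .= $chars[bindec($group)];
    }
    return $encoded;
}

// Octal, final bits padded on the left
$octal = fixedBitEncodeSketch('encode this', 3, '01234567');
// 312671433366214510072150322711

// RFC 4648 Base32 alphabet, final bits padded on the right, no "=" padding
$base32 = fixedBitEncodeSketch('encode this', 5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', true);
// MVXGG33EMUQHI2DJOM
```

The 88 bits of 'encode this' leave a single bit for the last octal digit. Padded on the left it becomes 001 (the final "1" above); padded on the right it would become 100, so the last digit would be "4".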
The encode() method accepts one argument:
string
$rawString (required) - This is the string that you want to encode.
The decode() method accepts three arguments:
string
$encodedString (required) - This is the string that you want to decode.
boolean
$caseSensitive (optional) - Whether to decode in a case-sensitive manner. The default is TRUE.
boolean
$strict (optional) - If TRUE, NULL will be returned if $encodedString contains an undecodable character (which may include whitespace; see below about handling whitespace). If FALSE (the default), unknown characters are simply ignored.
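The effect of $strict can be illustrated with a small decoding sketch. This is an assumption-laden illustration, not the class's code: fixedBitDecodeSketch() is a hypothetical helper, it only handles input whose final bits were padded on the right, and it leaves out the $caseSensitive option.

```php
<?php
// Hypothetical helper for illustration only; not part of FixedBitNotation.
// Assumes the final bits were padded on the right when encoding.
function fixedBitDecodeSketch(string $encoded, int $bits, string $chars, bool $strict = false): ?string
{
    if ($bits < 1 || $bits > 8 || strlen($chars) < (1 << $bits)) {
        throw new InvalidArgumentException('$chars needs at least pow(2, $bits) characters');
    }

    // Only the first pow(2, $bits) characters of $chars carry a value.
    $alphabet = substr($chars, 0, 1 << $bits);

    $bitString = '';
    for ($i = 0, $n = strlen($encoded); $i < $n; $i++) {
        $value = strpos($alphabet, $encoded[$i]);
        if ($value === false) {
            if ($strict) {
                return null; // undecodable character in strict mode
            }
            continue; // otherwise ignore unknown characters
        }
        $bitString .= str_pad(decbin($value), $bits, '0', STR_PAD_LEFT);
    }

    // Convert complete bytes only; leftover pad bits are discarded.
    $raw = '';
    for ($i = 0, $n = strlen($bitString); $i + 8 <= $n; $i += 8) {
        $raw .= chr(bindec(substr($bitString, $i, 8)));
    }
    return $raw;
}

$base32Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Non-strict: the "=" padding and the line break are skipped.
$lenient = fixedBitDecodeSketch("MVXGG33E\r\nMUQHI2DJOM======", 5, $base32Chars);
// encode this

// Strict: the line break is not in the alphabet, so NULL is returned.
$rejected = fixedBitDecodeSketch("MVXGG33E\r\nMUQHI2DJOM", 5, $base32Chars, true);
```

In this sketch the pad character counts as an unknown character, so strict decoding of padded input would need the padding stripped first, for example with rtrim($encoded, '=').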
When to use FixedBitNotation
Use it when you want to use an encoding for which PHP does not provide a built-in function. PHP provides the base64_encode() and base64_decode() functions, but if you need to use a modified alphabet, you can either use strtr() to translate the base64_encode() output, or you can specify your own alphabet with FixedBitNotation.
To encode a string with modified Base64 for URLs and filenames, where the "+" and "/" are replaced with "-" and "_", you would do:
<?php
$modifiedBase64 = new FixedBitNotation(6, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', TRUE, TRUE);
$encoded = $modifiedBase64->encode("encode this \xBF\xC2\xBF");
// ZW5jb2RlIHRoaXMgv8K_
?>
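For comparison, the strtr() alternative mentioned above can be sketched with built-in functions only. The variable names are illustrative; the input is the same string used in the FixedBitNotation example, which happens to need no "=" padding.

```php
<?php
$raw = "encode this \xBF\xC2\xBF";

// Encode with the built-in function, then swap "+" and "/" for "-" and "_".
$encoded = strtr(base64_encode($raw), '+/', '-_');
// ZW5jb2RlIHRoaXMgv8K_

// Reverse the translation before handing the string to base64_decode().
$decoded = base64_decode(strtr($encoded, '-_', '+/'), true);
```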
PHP does not provide any Base32 encode or decode functions. By setting $bitsPerCharacter to 5 and specifying your desired alphabet in $chars, you can handle any variant of Base32:
<?php
// RFC 4648 Base32 alphabet
$base32 = new FixedBitNotation(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', TRUE, TRUE);
$encoded = $base32->encode('encode this');
// MVXGG33EMUQHI2DJOM======
?>
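The six "=" characters in that output follow from the group size. Assuming the grouping rule of RFC 4648 style encodings, a group is the smallest number of characters that covers a whole number of bytes, which is 8 / gcd(8, $bitsPerCharacter) characters. The helper below is a hypothetical sketch of that arithmetic, not the class's own padding code.

```php
<?php
// Hypothetical helper for illustration only; not part of FixedBitNotation.
function padFinalGroupSketch(string $encoded, int $bits, string $padCharacter = '='): string
{
    if ($bits < 1 || $bits > 8) {
        throw new InvalidArgumentException('$bits must be between 1 and 8');
    }

    // Greatest common divisor of 8 and $bits (Euclid's algorithm).
    $a = 8;
    $b = $bits;
    while ($b !== 0) {
        [$a, $b] = [$b, $a % $b];
    }

    // 5 bits per character: 8 characters per group; 6 bits: 4 characters.
    $charsPerGroup = intdiv(8, $a);

    $remainder = strlen($encoded) % $charsPerGroup;
    if ($remainder === 0) {
        return $encoded; // already a whole number of groups
    }
    return $encoded . str_repeat($padCharacter, $charsPerGroup - $remainder);
}

// 18 Base32 characters need 6 pad characters to reach a multiple of 8.
$padded = padFinalGroupSketch('MVXGG33EMUQHI2DJOM', 5);
// MVXGG33EMUQHI2DJOM======
```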
Octal notation:
<?php
$octal = new FixedBitNotation(3);
$encoded = $octal->encode('encode this');
// 312671433366214510072150322711
?>
A convenient way to go back and forth between binary notation and a real binary string:
<?php
$binary = new FixedBitNotation(1);
$encoded = $binary->encode('encode this');
// 0110010101101110011000110110111101100100011001010010000001110100011010000110100101110011
$decoded = $binary->decode($encoded);
// encode this
?>
PHP has its own fixed-bit notation that it uses to generate session identifiers. The default for $chars (see above) matches the alphabet PHP uses. The session.hash_bits_per_character php.ini configuration option accepts a value from 4 to 6. Since 4 results in standard hexadecimal, you don't need this class to emulate PHP's session IDs, but you do for 5 and 6. With the raw_output parameter of PHP's hashing functions, you can create unique IDs of the exact same form by choosing $bitsPerCharacter and setting $rightPadFinalBits to FALSE (the default):
<?php
// Generate a value that follows the form:
// session.hash_function = 0
// session.hash_bits_per_character = 5
$notate5bpc = new FixedBitNotation(5);
$id = $notate5bpc->encode(md5(uniqid(mt_rand(), TRUE), TRUE));
// q3c8n4vqpq11i0vr6ucmafg1h3
?>
<?php
// Generate a value that follows the form:
// session.hash_function = 1
// session.hash_bits_per_character = 6
$notate6bpc = new FixedBitNotation(6);
$id = $notate6bpc->encode(sha1(uniqid(mt_rand(), TRUE), TRUE));
// 7Hf91mVc,q-9W1VndNNh3evVN83
?>
(Let's not make this a discussion of the randomness of rand(), mt_rand() or uniqid(); that's not the point.)
I use the above technique to generate unique IDs for all kinds of things, or any time I want a hash digest in a notation other than hexadecimal. For some uses, the decode() method is valuable for converting notated hash digests back into their raw binary form for efficient data storage.
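The length of a notated digest follows directly from the digest size and the bits per character: it is the bit count divided by $bitsPerCharacter, rounded up. The sketch below uses a hypothetical helper, encodedLengthSketch(), to show why the two IDs above are 26 and 27 characters long, compared with 32 and 40 characters in hexadecimal.

```php
<?php
// Hypothetical helper: number of encoded characters for $rawBytes raw bytes.
function encodedLengthSketch(int $rawBytes, int $bits): int
{
    // Integer form of ceil($rawBytes * 8 / $bits).
    return intdiv($rawBytes * 8 + $bits - 1, $bits);
}

$md5Bytes  = strlen(md5('', true));  // 16 raw bytes
$sha1Bytes = strlen(sha1('', true)); // 20 raw bytes

$lengths = [
    'md5, 4 bits (hex)'  => encodedLengthSketch($md5Bytes, 4),  // 32
    'md5, 5 bits'        => encodedLengthSketch($md5Bytes, 5),  // 26
    'sha1, 4 bits (hex)' => encodedLengthSketch($sha1Bytes, 4), // 40
    'sha1, 6 bits'       => encodedLengthSketch($sha1Bytes, 6), // 27
];
```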
I've also found this FixedBitNotation class useful for creating promotion codes or auto-generated passwords with a carefully chosen alphabet. Whenever you generate codes that will be read and typed in by users, you should use distinct symbols that are not easily confused with others. With a full alphabet it's possible to inadvertently form offensive words. I like to use capital letters and omit vowels and zero. This leaves 30 alphanumeric characters, but we need 32 to use 5 bits per character, so two characters will be used twice in the base alphabet. I accept this because the result doesn't need to be reversible, and even with the bias toward two of the characters, the character distribution is well balanced. Keep the character bias in mind when choosing an output length that makes your codes sufficiently hard to guess.
<?php
// Generate an eight-character password
$pwEncoder = new FixedBitNotation(5, '123456789BCDFGHJKLMNPQRSTVWXYZHZ');
$password = substr($pwEncoder->encode(md5(uniqid(mt_rand(), TRUE), TRUE)), 0, 8);
// HW42NMCP
?>
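The character bias can be quantified. The sketch below counts how often each symbol appears in the 32-character alphabet and computes the entropy per output character, assuming each 5-bit group of the input is uniformly random. The variable names are illustrative.

```php
<?php
$chars = '123456789BCDFGHJKLMNPQRSTVWXYZHZ';

// How many of the 32 slots each distinct symbol occupies.
$counts = array_count_values(str_split($chars));
$distinct = count($counts); // 30 distinct symbols; "H" and "Z" occupy two slots each

// Shannon entropy per character, in bits.
$entropyPerChar = 0.0;
foreach ($counts as $count) {
    $p = $count / strlen($chars);
    $entropyPerChar -= $p * log($p, 2);
}
// 4.875 bits, slightly less than the 5 bits of an unbiased 32-symbol alphabet

// Upper bound for an eight-character code under the same assumption.
$codeEntropy = 8 * $entropyPerChar; // 39 bits
```

Under that assumption, an eight-character code loses one bit in total compared with an unbiased alphabet (39 bits instead of 40).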
When not to use FixedBitNotation
Do not use FixedBitNotation when there is a native PHP function to suit your needs. If you're using Base64 encoding with the standard alphabet, use base64_encode() and base64_decode(); they're faster. For that reason, you might even prefer the strtr() suggestion I mentioned earlier for handling Base64 or hexadecimal with a modified alphabet.
Rather than using FixedBitNotation for encoding and decoding hexadecimal (like the binary example above), consider using bin2hex() and pack(); they're about 20 times faster:
<?php
$encoded = bin2hex('encode this'); // 656e636f64652074686973
$decoded = pack('H*', $encoded); // encode this
?>
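The same built-in functions combine with strtr() when hexadecimal needs a modified alphabet. In this sketch the replacement alphabet "ABCDEFGHIJKLMNOP" is an arbitrary, hypothetical choice made only to show the translation in both directions.

```php
<?php
$standardHex = '0123456789abcdef';
$customHex   = 'ABCDEFGHIJKLMNOP'; // hypothetical 16-character alphabet

// Encode to standard hexadecimal, then translate to the custom alphabet.
$encoded = strtr(bin2hex('encode this'), $standardHex, $customHex);
// GFGOGDGPGEGFCAHEGIGJHD

// Translate back to standard hexadecimal before unpacking.
$decoded = pack('H*', strtr($encoded, $customHex, $standardHex));
// encode this
```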
Finally, please understand that this is not encryption. Do not use this class to secure your data.
Handling whitespace
Variations of some content transfer encoding schemes specify a fixed or maximum line length. To add line endings to your encoded output, you can use chunk_split() or wordwrap(). To handle whitespace with decode(), you can simply set $strict to FALSE (the default) to ignore all characters that are not part of the base alphabet. But if you want to set $strict to TRUE, you can use str_replace() on the encoded string before trying to decode:
<?php
// Remove line breaks from encoded data before decoding
$encoded = str_replace(array("\r", "\n"), '', $encoded);
$decoded = $fbnInstance->decode($encoded, TRUE, TRUE);
?>
<?php
// Remove whitespace from encoded data before decoding
$encoded = str_replace(array(" ", "\t", "\r", "\n", "\0", "\x0B"), '', $encoded);
$decoded = $fbnInstance->decode($encoded, TRUE, TRUE);
?>
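The full round trip, adding line endings with chunk_split() and removing them again before decoding, can be sketched as follows. To keep the example self-contained it uses the built-in Base64 functions as a stand-in for a FixedBitNotation instance; the 76-character line length and the sample input are assumptions chosen for illustration.

```php
<?php
$raw = str_repeat('encode this ', 10); // 120 bytes, 160 Base64 characters

// Break the encoded output into lines of at most 76 characters.
// chunk_split() also appends the line ending after the last chunk.
$wrapped = chunk_split(base64_encode($raw), 76, "\r\n");

// Remove line breaks from encoded data before decoding.
$unwrapped = str_replace(array("\r", "\n"), '', $wrapped);
$decoded = base64_decode($unwrapped, true);
```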