Encode a String to Base64 With Java

Base64 is a method of encoding every 3 bytes of input into 4 bytes of output; it is commonly used to encode photos or audio to send in emails (though the days of 7-bit transmission lines are mostly over), and a way to hide webpage authentication (usernames and passwords) from casual snooping. Here is an example of how to code a Base64 encoder in Java, a multi-platform programming language. This example, and the test coding string, is borrowed from the Wikipedia article.

Steps

  1. Enter Info and Name file
  2. Start up your editor, such as Notepad or vi, and enter the preliminaries, such as the class declaration and known constants. Name the file Base64.java.
  3. Those constant values are specified, so the article states, in the relevant RFCs. It is generally a good idea read all relevant RFCs before beginning coding.

  • Treating characters as bytes means that multibyte characters, such as Japanese or Chinese, will not be correctly encoded. Therefore, we must use the getBytes() method of String in order to convert the Unicode characters of the current locale to bytes before the encoding begins. But, for example, if you're working on a Japanese document in an American English locale, you will need to specify the locale for the output of getBytes(), such as getBytes("UTF-8").
  • Let's find out how many padding bytes are needed. Java's modulo operator, %, comes in handy here. Let's also declare the subroutine name and parameters while we're at it.
  • Now we use that value to null-pad the input. Note that if no padding is needed, none is added, since we take the modulus of 3 a second time, turning a 3 into a 0. </ol>
  • Now we get to the meat: packing three bytes at a time into a 24-bit integer, then extracting out 6-bit indices into the coding string. These numbers aren't magic: 24 divides into 6 exactly 4 times, and 6 bits can hold values from 0 to 63, which can index into any value in the 64-byte coding string. </ol>
      1. Lastly, we package the output, after padding it, by inserting CRLFs at the required 76-byte boundaries, using a separate subroutine for clarity.
      1. We can, if desired, add a main routine for testing purposes. This is usually a good idea before posting one's code for public consumption.
      1. Here is the finished module:
      1. Let's compile it, using javac, gcj, likes, or the like; and test, using the Hobbes quote from the Wikipedia article:
      Here's the result:
      1. It matches exactly! That either means both programs are wrong, or they're both more or less right. At this point you might want to revisit the Wikipedia article, and read the linked RFCs to see if we missed anything.

    Tips

    • Don't feel that you always have to understand something completely before coding. Things get clearer as you go along.
    • Java is okay as a general purpose language, and for devices like cell phones may be a programmer's only option; but you might find Javascript's or Python's syntax to be more concise and powerful. Different languages have their own strengths and weaknesses.
    • Try writing the companion decode() method for this module!
    • While reading relevant RFCs is necessary for production code, the information overload can be overwhelming; sometimes the best way is to skim them, code according to what you understand, and then go back and check the functionality point-by-point against the RFC's mandatory requirements.

    Related Articles

    Sources and Citations