Characters and Strings


Characters: ASCII and Unicode formats

We have already talked a bit about characters. The primitive data type char represents a single character, such as a letter, digit, punctuation mark, or blank.

ASCII

In many computer languages, characters are stored in ASCII (American Standard Code for Information Interchange) format. When you save a file as "plain text", it is often stored using ASCII or an encoding that extends it.

ASCII format uses 1 byte per character

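1 byte gives only 256 possible characters: 128 standard characters (the ASCII set proper, which needs only 7 bits) and 128 non-standard ones that vary from system to system. Page 357 lists the standard characters, or see the table on the web.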
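
Since each character is stored as a number, casting between char and int shows the code behind a character. The following is a minimal sketch; the class name AsciiCodes and the helpers codeOf and charOf are made up for illustration.

```java
// Sketch: look at the numeric codes behind a few characters.
// AsciiCodes, codeOf and charOf are hypothetical names, not part of the Java API.
class AsciiCodes {
    // Returns the numeric code of a character.
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the character that has the given numeric code.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));  // 65
        System.out.println(codeOf('a'));  // 97
        System.out.println(codeOf('0'));  // 48
        System.out.println(codeOf(' '));  // 32
        System.out.println(charOf(66));   // B
    }
}
```

Note that the digit '0' has code 48, not 0: a character and the number it looks like are different things.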

Unicode

In order to accommodate more characters (e.g. from other natural languages), a newer code called Unicode was introduced.

Unicode format uses 2 bytes per character

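which allows for 2^16 = 65,536 (about 65K) different characters. The downside is that a Unicode text file stored this way takes up twice as much space as the same text in ASCII. (Unicode has since grown beyond this limit; Java stores the additional characters as pairs of char values.) See charts for examples.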

All characters that have an ASCII representation have the same code in Unicode (except that it is stored in 2 bytes).

Java uses the Unicode format.
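
A small sketch of what this means in practice: a Java char is 16 bits wide, and any character can be written with a \u escape followed by its four-digit hexadecimal code. The class UnicodeChars and the helper isAscii are hypothetical names used only for this example.

```java
// Sketch: Java chars are 16-bit Unicode values.
// UnicodeChars and isAscii are hypothetical names for illustration.
class UnicodeChars {
    // True if the character falls within the 128 standard ASCII codes.
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);   // 16 (bits in one char)

        char a = '\u0041';                    // hexadecimal 41 = decimal 65
        System.out.println(a == 'A');         // true: same code as in ASCII

        char eAcute = '\u00e9';               // e with an acute accent
        System.out.println((int) eAcute);     // 233
        System.out.println(isAscii(a));       // true
        System.out.println(isAscii(eAcute));  // false
    }
}
```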


Strings

To create a String, one can write either

String name = "Alice";

or

String name = new String("Alice");

The above declarations are not completely the same (see the discussion later about comparing Strings).

A String is stored as an array (we'll study this in the next chapter), i.e. as a sequence of characters. Each character has an index that can be used to reference that character.

The first character is always at index 0. For example, with name set to "Alice" as above:

char z = name.charAt(3); // set z equal to 'c'

Note that characters might be blanks:

String day = " Wednesday";
char d = day.charAt(0); // set d = ' '
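
Because the indices run from 0 to length() - 1, a for loop can visit every character in turn. Here is a minimal sketch; the class CharIndexDemo and the helper count are invented for this example.

```java
// Sketch: visit each character of a String by index.
// CharIndexDemo and count are hypothetical names for illustration.
class CharIndexDemo {
    // Counts how many times target occurs in text by checking every index.
    static int count(String text, char target) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String name = "Alice";
        // Prints 0: A, 1: l, 2: i, 3: c, 4: e (one per line).
        for (int i = 0; i < name.length(); i++) {
            System.out.println(i + ": " + name.charAt(i));
        }
        System.out.println(count(" Wednesday", 'e'));  // 2
        System.out.println(count(" Wednesday", ' '));  // 1 (the leading blank)
    }
}
```

Calling charAt with an index outside this range (for example name.charAt(5) for "Alice") throws a StringIndexOutOfBoundsException.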

The String class has a number of other useful methods. Take a moment to look up String in the API reference.

Some Public Methods:

Also look up: concat, compareTo, indexOf, valueOf, etc.
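
As a quick tour, the sketch below calls a handful of these methods on a sample string. It is only a sampler, not a complete list; the class StringMethodsDemo and the helper comesBefore are hypothetical names.

```java
// Sketch: a few commonly used String methods.
// StringMethodsDemo and comesBefore are hypothetical names for illustration.
class StringMethodsDemo {
    // True if a comes before b in dictionary (Unicode) order.
    static boolean comesBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String name = "Alice";
        System.out.println(name.length());                   // 5
        System.out.println(name.indexOf('c'));               // 3
        System.out.println(name.indexOf('z'));               // -1 (not found)
        System.out.println(name.substring(1, 3));            // li
        System.out.println(name.concat(" Smith"));           // Alice Smith
        System.out.println(name.toUpperCase());              // ALICE
        System.out.println(String.valueOf(42));              // 42 (as a String)
        System.out.println(comesBefore("apple", "banana"));  // true
    }
}
```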


Comparing Strings

Comparing Strings is tricky. There are several different ways of doing it depending on what you want to compare and how the String is stored.

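Given strings s1 and s2, one can compare using, for example, s1.equals(s2), which checks whether the two strings contain the same characters, or s1 == s2, which checks whether the two variables refer to the same String object.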

Can you predict what the output of the following will be?

String n1 = "cat";
String n2 = "cat";
String n3 = new String("cat");
String n4 = n1;
String n5 = " cat"; 

// ********* compare n1 and n2
if (n1.equals(n2)) System.out.println("n1 equals n2");
else System.out.println("n1 does not equal n2");

if (n1 == n2) System.out.println("n1 == n2");
else System.out.println("n1 != n2");

// ********* compare n1 and n3
if (n1.equals(n3)) System.out.println("n1 equals n3");
else System.out.println("n1 does not equal n3");

if (n1 == n3) System.out.println("n1 == n3");
else System.out.println("n1 != n3");

// ********* compare n1 and n4
if (n1.equals(n4)) System.out.println("n1 equals n4");
else System.out.println("n1 does not equal n4");

if (n1 == n4) System.out.println("n1 == n4");
else System.out.println("n1 != n4");

// ********* compare n1 and n5
if (n1.equals(n5)) System.out.println("n1 equals n5");
else System.out.println("n1 does not equal n5");

if (n1 == n5) System.out.println("n1 == n5");
else System.out.println("n1 != n5");


Anonymous Strings

When you declare a String as

String n = new String("a unique string");

you are always creating a new String object. However, when you create a string

String n = "an anonymous string";

you are creating a new object only if there is not already an anonymous String object (a string literal) with the same characters; otherwise the existing object is reused.

Usually what you are interested in is whether or not two strings contain the same sequence of characters. In this case, using the equals() method will always work regardless of how the string is declared.
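
The difference can be seen in a short sketch that compares a literal, a second identical literal, and a String made with new. The class StringCompareDemo and the helpers sameCharacters and sameObject are hypothetical names chosen for this example.

```java
// Sketch: equals() compares characters, == compares object identity.
// StringCompareDemo, sameCharacters and sameObject are hypothetical names.
class StringCompareDemo {
    // Content comparison: true when both strings hold the same characters.
    static boolean sameCharacters(String a, String b) {
        return a.equals(b);
    }

    // Reference comparison: true only when a and b are the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String n1 = "cat";              // anonymous (literal) string
        String n2 = "cat";              // same letters: reuses n1's object
        String n3 = new String("cat");  // always a brand new object

        System.out.println(sameCharacters(n1, n2));  // true
        System.out.println(sameObject(n1, n2));      // true
        System.out.println(sameCharacters(n1, n3));  // true
        System.out.println(sameObject(n1, n3));      // false
    }
}
```

In both cases equals() reports true, which is why it is the safer choice when only the characters matter.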


Working with Strings: StringBuffer Class

Strings are said to be immutable, meaning they can't be changed. However, you can set a string reference to point to a new String:

String d1 = "Saturday";
d1 = "Wednesday"; // d1 now refers to a different String; "Saturday" itself was not modified

If you concatenate, the result is a new String and the original is left unchanged:

String n1 = "Willamette";
String n2 = n1.concat(" University"); // concat returns
               // a new string but doesn't change n1
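
One way to convince yourself of this is to print the original after calling methods on it. The sketch below assumes nothing beyond the standard String class; the class ImmutableDemo and the helper unchangedAfterConcat are hypothetical names.

```java
// Sketch: String methods return new Strings and never alter the original.
// ImmutableDemo and unchangedAfterConcat are hypothetical names.
class ImmutableDemo {
    // Concatenates suffix onto s, then reports whether s still has its old value.
    static boolean unchangedAfterConcat(String s, String suffix) {
        String before = s;
        s.concat(suffix);          // result deliberately thrown away
        return s.equals(before);
    }

    public static void main(String[] args) {
        String n1 = "Willamette";
        String n2 = n1.concat(" University");
        System.out.println(n1);    // Willamette
        System.out.println(n2);    // Willamette University

        n1.toUpperCase();          // result thrown away, n1 is untouched
        System.out.println(n1);    // Willamette

        System.out.println(unchangedAfterConcat(n1, "!"));  // true
    }
}
```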

What do you do if you want to manipulate the actual characters in a String? Use a StringBuffer.

StringBuffer name = new StringBuffer("willamette");
name.setCharAt(0,'W');
name.append(" University");
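
Once the editing is done, toString() turns the StringBuffer back into an ordinary String. The following sketch shows the round trip; the class StringBufferDemo and the helper capitalize are hypothetical names used for illustration.

```java
// Sketch: edit characters in a StringBuffer, then convert back to a String.
// StringBufferDemo and capitalize are hypothetical names.
class StringBufferDemo {
    // Returns a copy of text with its first character in upper case.
    static String capitalize(String text) {
        if (text.isEmpty()) {
            return text;
        }
        StringBuffer buf = new StringBuffer(text);
        buf.setCharAt(0, Character.toUpperCase(buf.charAt(0)));
        return buf.toString();
    }

    public static void main(String[] args) {
        StringBuffer name = new StringBuffer("willamette");
        name.setCharAt(0, 'W');             // changes the buffer itself
        name.append(" University");         // also changes the buffer itself
        System.out.println(name);           // Willamette University
        System.out.println(name.length());  // 21

        String result = name.toString();    // back to an immutable String
        System.out.println(result.equals("Willamette University"));  // true

        System.out.println(capitalize("wednesday"));  // Wednesday
    }
}
```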


String Exercises:

