Representing text
Lecture #3: Data representation
Counting symbols
We can represent most "plain" textual documents as a sequence of symbols (i.e., ignoring mark-up such as boldface, page layout, etc.). The symbols themselves can be represented as sequences of bits: 7 or 8 bits per symbol is the norm, which gives 2^7 = 128 or 2^8 = 256 possible symbols (also called "characters").
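As a rough illustration of the counting argument, the sketch below computes how many distinct symbols fit in a given number of bits and writes a short text as fixed-width bit patterns. The function names `symbols_for_bits` and `to_bit_strings` are made up for this example, and using Python's built-in `ord` as the symbol numbering is an assumption rather than something the lecture prescribes.

```python
def symbols_for_bits(n_bits: int) -> int:
    """Number of distinct symbols representable with n_bits bits."""
    return 2 ** n_bits


def to_bit_strings(text: str, n_bits: int = 8) -> list[str]:
    """Write each symbol of text as a fixed-width string of n_bits bits.

    Assumption: symbols are numbered by Python's ord(); a symbol whose
    number does not fit in n_bits is rejected instead of truncated.
    """
    limit = symbols_for_bits(n_bits)
    bits = []
    for ch in text:
        code = ord(ch)
        if code >= limit:
            raise ValueError(f"{ch!r} (code {code}) does not fit in {n_bits} bits")
        bits.append(format(code, f"0{n_bits}b"))
    return bits


print(symbols_for_bits(7))        # 128
print(symbols_for_bits(8))        # 256
print(to_bit_strings("Hi", 7))    # ['1001000', '1101001']
```

Each extra bit doubles the number of available symbols, which is why going from 7 to 8 bits moves the limit from 128 to 256.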
The ASCII code
The American Standard Code for Information Interchange (ASCII) uses 7 bits per symbol to represent English letters, Arabic numerals, punctuation and a number of control characters (e.g., "end-of-transmission", "carriage return", "bell", etc.). Several other codes were used historically (e.g., EBCDIC on IBM computers), but are only rarely used today.
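To make the 7-bit code concrete, here is a small sketch that looks up ASCII codes with Python's built-in `str.encode("ascii")`, `ord` and `chr`. The helper name `ascii_codes` is invented for this example; the numeric values are those of the standard ASCII table.

```python
def ascii_codes(text: str) -> list[int]:
    """Return the ASCII code of each symbol in text.

    str.encode("ascii") raises UnicodeEncodeError if text contains a
    symbol outside the 128 ASCII characters.
    """
    return list(text.encode("ascii"))


codes = ascii_codes("Hello!")
print(codes)                         # [72, 101, 108, 108, 111, 33]
print(all(c < 2 ** 7 for c in codes))  # True: every code fits in 7 bits

# Control characters are ordinary codes too, just without a printed shape.
print(ord("\a"))   # 7  (bell)
print(ord("\r"))   # 13 (carriage return)

# chr goes the other way: from a code back to its symbol.
print(chr(65), chr(97), chr(48))     # A a 0
```

A symbol outside the code, such as an accented letter, makes `ascii_codes` raise an error, which reflects the fact that ASCII covers only English letters, digits, punctuation and control characters.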