|
Assigned: Mon 14 Feb 2000
Due: Mon 21 Feb 2000
In this lab we will explore how the arithmetic of various number bases and the methods of data representation described in lectures can be used to determine the best or most concise way to store various forms of data. Questions like these are faced by graphic designers, web page designers, sound engineers and communications technicians on a daily basis. An understanding of some of these issues will help you understand the needs of these disciplines in your programming; it will also help you to develop intuitions about more general data storage and complexity analysis issues.
As with Lab #2, this one is intended to be largely a "paper and pencil" exercise. You may still use any calculators or other computer tools that you find useful, but you should make sure that you understand the underlying concepts and can do these kinds of calculations by hand if necessary (e.g., on an exam).
Instead of a demo, you should hand in a paper version of your answers by the due date.
A bit is the smallest unit of binary data representation, i.e., a
single "zero" or "one", representing a distinction between two different things
(or choices, or alternatives). Eight bits gathered together constitute a
byte; this is usually enough to represent a single letter or symbol
(the ASCII code really uses only 7 bits). Since a byte a byte is eight bits,
it can be represented by a pair of hexadecimal digits, since each hex digit is
equivalent to four bits. Sometimes the four-bit unit is known as a
nibble, but only when we really want to push the limits of cuteness
On a certain computer system texts are stored as eight-bit ASCII codes
(with a leading zero) and the standard system font consists of a 5 x 7
fixed-width grid for each character. For the sake of uniformity, the
system stores a single 5 x 7 grid for all of the 128 ASCII characters,
even though some control codes and the like are not printable.
The system font includes a built-in pixel's worth of vertical spacing
between lines of text, i.e., when a text is rendered on-screen as a
bit-mapped graphic, no extra space need be left between lines.
For longer messages, the first method will clearly result in a more
compact transmission, since sending the font information separately
will save space. But for shorter messages, the "raw" bit-mapped
graphic may actually be smaller than the symbolically-encoded form.
What, if any, is the "break-even" points, i.e., how many symbols can
we send before it is worth switching to the font+symbol form?
The chart below represents a scaled-up version of a black-and-white
bit-map graphic. The following table of hex digits describes the
content of the graphic as a series of bit values, zero for white and
one for black. The lower half of the graphic has already been filled
in for you, using the last four rows of hex values. Fill in the upper
half of the graphic using the first four rows of hex values.
Assume that we have a graphic represented as a color bit-map of size
30 x 40 pixels and with a color depth of 16 bits. Answer the following
questions about file sizes assuming no overhead for file headers and the
like (i.e., just count pixel data or, where relevant, color pallette
tables).
Recall from lecture that a run-length encoding is a technique used to
compress data by representing repeated values (in a "run") as a pair
of values, the first of which is a run length (the number of repetitions)
and the second of which is the repeated element or value. Note that a
run-length encoding can be used in any context where a series of
(possibly-repeated) values is used (e.g., graphics, sound, text, etc.).
A reminder on terminology
.
Exercise #1: Representing text
Note: in this chart, the hex value for a given character is to
the immediate right of that character.
50 65 6E 67 75 69 6E 20 70 6F 77 65 72 21
Exercise #2: Fonts and texts
Exercise #3: Simple graphics
C 0 0 0 0
A 4 8 0 0
A A E E A
C E 8 8 A
8 8 8 8 4
8 6 8 E 4
0 0 0 0 8
0 0 0 0 8
Exercise #4: Color graphics
Exercise #5: Run-length encodings
00 00 00 00 00 FF FF 00 A1 A1 A1 A1 00 00 00 00 00 A1 A1 FF FF FF 00 00
1 00 1 A1 1 00 1 B2 2 C0 1 FF 1 A1 2 32