Thursday, December 3, 2009

Parallel Lives

Plutarch was a Greco-Roman historian and biographer, best known for his work "Parallel Lives", a series of biographies of famous Greeks and Romans arranged in pairs to illuminate their virtues and vices. I want to try a similar approach, pairing some citizens of the digital audio and digital image worlds. But the vices and virtues apply only to computer users and developers because technology, like science, is neutral.

Let me start before the computer era. At some point in the past, painting was the only way to fix an image on a 2D surface. Sound, on the other hand, was impossible to fix in a literal way; it could only be written down in a symbolic musical language, the same one used today: the musical score. Photography and the phonograph appeared in the 19th century. From that point on there were two approaches to capturing images: painting and photography. Music has a similar duality: it is still possible to “draw” scores, and also to take snapshots of musical performances, to be preserved and later reproduced literally. Of course, there are non-artistic uses of photographic and phonographic recordings as well.

Let's stop here for a moment. After the birth of photography, nobody really has any doubt about the future of the plastic arts. Today we all enjoy paintings, children learn to draw and paint, and architects and engineers use technical drawings in their daily jobs. And there is also photography, with its artistic and non-artistic branches. We have specific tools on our computers for all the tasks related to images: Krita and Karbon14, GIMP and Inkscape, Photoshop and Illustrator. For each bitmap image program there is a vector drawing counterpart. There is also CAD software, of course, and ray tracing. More about this later.

Bitmap (raster) graphics are digital representations of 2D images. A digital bitmap is a matrix of pixels (dots), each pixel representing the color (and sometimes the transparency) of a point. Resolution, or pixel density, is measured in pixels per unit of length. Common resolutions are around 100 dots per inch for display devices and 600 dpi for printers. We can calculate the size in bits of a bitmap picture if we know the pixel size in bits (the color depth) and the width and height of the picture in pixels. For instance, a true color image of 50x20 pixels, at 24 bits per pixel, weighs exactly 50*20*24 = 24000 bits = 3000 bytes. It is not possible to make a similar calculation for a vector image, as its size depends on the complexity and detail of the drawing, not on its dimensions.
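
The arithmetic above can be written as a tiny Python helper. This is only a sketch: the function name `bitmap_size_bits` is my own invention, and it counts raw pixel data only, ignoring any file header or row padding that a real format such as BMP would add.

```python
def bitmap_size_bits(width_px, height_px, bits_per_pixel):
    """Size of the raw, uncompressed pixel data, in bits."""
    return width_px * height_px * bits_per_pixel


# The 50x20 true color example from the text.
bits = bitmap_size_bits(50, 20, 24)
print(bits, "bits =", bits // 8, "bytes")  # 24000 bits = 3000 bytes
```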

Digital audio is represented in PCM (pulse-code modulation) format as a stream of samples. Each sample is a measurement of the signal (for instance, the one coming from a microphone) taken at equal intervals of time. The sampling frequency, which is the resolution in time, is measured in samples per unit of time, for instance in Hz (samples per second). Common sampling frequencies are 44100 Hz for CD-quality recordings, or 96000 Hz for higher quality. We can calculate the size of an audio recording if we know the number of channels, the sample size, the sampling frequency and the duration. For instance, a recording of 1 second of monaural (1 channel) sound using 24-bit samples at 44100 Hz takes 24*44100*1*1 = 1058400 bits = 132300 bytes.
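
To check this calculation against a real container, here is a small sketch using Python's standard `wave` module. The helper name `pcm_size_bytes` is hypothetical, and the "recording" is just one second of silence written to memory, which is enough to show that the payload size depends only on the format parameters.

```python
import io
import wave


def pcm_size_bytes(channels, bits_per_sample, sample_rate_hz, seconds):
    """Size of uncompressed PCM audio data, in bytes."""
    return channels * bits_per_sample * sample_rate_hz * seconds // 8


size = pcm_size_bytes(channels=1, bits_per_sample=24,
                      sample_rate_hz=44100, seconds=1)

# Write one second of 24-bit mono silence into an in-memory WAV file.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(3)          # 3 bytes = 24 bits per sample
    w.setframerate(44100)
    w.writeframes(bytes(size))  # all-zero samples

# Read the audio data back and compare its length with the prediction.
buf.seek(0)
with wave.open(buf, "rb") as r:
    frames = r.readframes(r.getnframes())

print(size, len(frames))
```

The WAV file itself is slightly larger than the audio data, because the container adds a small header in front of the samples.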

Both bitmap graphics and PCM digital audio recordings are usually compressed to reduce their size. Some compression methods also discard the less relevant information: JPEG and MP3 are examples of lossy compression. Uncompressed formats include BMP and WAV. PNG and FLAC are examples of lossless compression.
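
What "lossless" means can be illustrated with Python's standard `zlib` module, which implements DEFLATE, the compression method used inside PNG. This is not a PNG encoder, only a sketch of the round trip; the all-white 50x20 test image is an assumption made up for the example.

```python
import zlib

# Hypothetical test image: 50x20 pixels, 24 bits per pixel, all white.
raw = b"\xff" * (50 * 20 * 3)

compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

print(len(raw), "bytes raw,", len(compressed), "bytes compressed")
print("identical after round trip:", restored == raw)
```

A lossy codec such as JPEG or MP3 would not pass that final comparison: the decoded data is close to the original, but not identical.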

Vector graphics are stored in several file formats. A modern standard is SVG, based on XML. There is also an XML-based music representation, called MusicXML. A very common music format, part of the MIDI standard, is SMF (Standard MIDI File), which uses the .MID filename extension. Vector graphics need to be rendered into bitmap graphics before being displayed, using a graphics rendering library such as Cairo. The same happens with MIDI music, which is rendered into audio by a synthesizer.

Vector graphics are usually schematic, not very photorealistic. But there are ray tracing programs that produce photorealistic renderings from a symbolic, textual scene description. Something similar happens with MIDI rendering using samplers, which can produce orchestral music with a high level of realism. Many MIDI synthesizers, however, leave a very characteristic electronic timbre.

When you render an empty vector graphic (say, a plain white surface) into a bitmap, you realize that the size of the rendered bitmap doesn't depend on the image contents, while the size of the original vector file does. The same goes for MIDI. If you create a MIDI file of John Cage's composition 4'33'' (which is complete silence), it will take only a few bytes. Its rendering into uncompressed digital audio weighs tens of megabytes at CD quality, the same size taken by 4 minutes and 33 seconds of a jazz song.
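
Here is a rough sketch of that comparison in Python. It builds a minimal Standard MIDI File by hand, containing only a tempo event and an end-of-track marker placed 4 minutes and 33 seconds later, and compares it with the size of the rendered audio. The choices of 480 ticks per quarter note, 120 beats per minute and 16-bit stereo at 44100 Hz are my assumptions, and whether a given player honors the trailing silence is not guaranteed.

```python
import struct


def vlq(n):
    """Encode n as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


SECONDS = 4 * 60 + 33          # 4'33''
TICKS_PER_QUARTER = 480        # assumed time division
USEC_PER_QUARTER = 500000      # assumed tempo: 120 beats per minute

ticks = SECONDS * 1000000 * TICKS_PER_QUARTER // USEC_PER_QUARTER

events = (
    b"\x00\xff\x51\x03" + USEC_PER_QUARTER.to_bytes(3, "big")  # set tempo
    + vlq(ticks) + b"\xff\x2f\x00"                             # end of track
)
header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_QUARTER)
track = b"MTrk" + struct.pack(">I", len(events)) + events
midi_bytes = header + track

# Rendered audio, assuming 16-bit stereo at 44100 Hz.
pcm_bytes = SECONDS * 44100 * 2 * 2

print(len(midi_bytes), "bytes of MIDI")
print(pcm_bytes, "bytes of uncompressed audio")
```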

Now, about transformations. The name Scalable Vector Graphics already hints at one of its strengths: scalability. You can resize a vector graphic without losing quality, in contrast to resizing a bitmap. In digital audio, the only dimension is time. You can change the speed, the tempo, of a musical composition in MIDI format without losing any quality. Or you can transpose a song, lowering or raising the pitch of its notes. With digital audio you can perform those transformations (time stretching and pitch shifting) using algorithms based on the FFT (fast Fourier transform), but they usually create artifacts. You can do more transformations in MIDI: mute one instrument or replace it with another, or adjust the volume of an instrument or even of individual notes.
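
The reason these transformations are lossless on symbolic music is that they only change numbers in a list of notes. The following sketch uses a made-up `Note` structure (not a real MIDI library) with positions measured in beats, so that the tempo is applied only when converting to seconds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    start_beats: float
    length_beats: float
    pitch: int  # MIDI note number (60 = middle C)


def transpose(notes, semitones):
    """Shift every note by a number of semitones (no 0-127 range check)."""
    return [replace(n, pitch=n.pitch + semitones) for n in notes]


def duration_seconds(notes, bpm):
    """Total playing time at a given tempo in beats per minute."""
    end_beats = max(n.start_beats + n.length_beats for n in notes)
    return end_beats * 60.0 / bpm


melody = [Note(0, 1, 60), Note(1, 1, 62), Note(2, 2, 64)]

up = transpose(melody, 2)
print(transpose(up, -2) == melody)    # transposing back restores the original

print(duration_seconds(melody, 120))  # same notes, faster tempo
print(duration_seconds(melody, 60))   # same notes, slower tempo
```

Nothing is resampled or approximated here, which is why there are no artifacts: the synthesizer simply renders the modified notes from scratch.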

There are also deep differences in computer support for these technologies. Digital audio is very well supported, like bitmap graphics. Vector graphics are also very well supported, and the adoption of the SVG standard has been very beneficial for graphical user interfaces. MIDI is, by contrast, really poorly supported, in both hardware and software. Modern graphics cards include 2D and 3D acceleration, graphics libraries like Cairo and Qt4 take real advantage of these hardware enhancements, and graphics acceleration is being increasingly adopted by most computer manufacturers and software vendors. Audio hardware interfaces, on the other hand, have been dropping MIDI support, which was never very good (remember the old cheesy sound of MIDI files?). Both Windows and Mac OS X include a software synthesizer ready to be used out of the box, including the Roland SoundCanvas (lite) SoundFont, which is not a beast but has reasonable quality. Meanwhile, Linux distros don't include an easy-to-use software synthesizer, and they don't seem to care about that. And what about MIDI software? OSSv4 dropped MIDI support. The ALSA sequencer is very good, but needs more good applications.