Tuesday 21 June 2011

The 16-byte frontier: extreme results from extremely small programs.

While mainstream software has been getting bigger and more bloated year after year, the algorithmic artists of the demoscene have been following the opposite route: building ever smaller programs to generate ever more impressive audiovisual show-offs.

The traditional competition categories for size-limited demos are 4K and 64K, limiting the size of the stand-alone executable to 4096 and 65536 bytes, respectively. However, as development techniques have gone forward, the 4K size class has adopted many features of the 64K class, or as someone summarized it a couple of years ago, "4K is the new 64K". There are development tools and frameworks specifically designed for 4K demos. Low-level byte-squeezing and specialized algorithmic beauty have given way to high-level frameworks and general-purpose routines. This has moved a lot of "sizecoding" activity into more extreme categories: 256B has become the new 4K. For a fine example of a modern 256-byter, see Puls by Rrrrola.



The next hexadecimal order of magnitude down from 256 bytes is 16 bytes. Yes, there are some 16-byte demos, but this size class has not yet established its status on the scene. At the time of writing this, the smallest size category in the pouet.net database is 32B. What's the deal? Is the 16-byte limit too tight for anything interesting? What prevents 16B from becoming the new 256B?

Perhaps the most important platform for "bytetros" is MS-DOS, using the no-nonsense .COM format that has no headers or mandatory initialization at all. Also, in .COM files we only need a couple of bytes to obtain access to most of the vital things such as the graphics framebuffer. At the 16-byte size class, however, these "couples of bytes" quickly fill up the available space, leaving very little room for the actual substance. For example, here's a disassembly of a "TV noise" effect (by myself) in fifteen bytes:
addr  bytes     asm
0100 B0 13 MOV AL,13H
0102 CD 10 INT 10H
0104 68 00 A0 PUSH A000H
0107 07 POP ES
0108 11 C7 ADC DI,AX
010A 14 63 ADC AL,63H
010C AA STOSB
010D EB F9 JMP 0108H

The first four lines, summing up to a total of eight bytes, initialize the popular 13h graphics mode (320x200 pixels with 256 colors) and set the segment register ES to point in the beginning of this framebuffer. While these bytes would be marginal in a 256-byte demo, they eat up a half of the available space in the 16-byte size class. Assuming that the infinite loop (requiring a JMP) and the "putpixel" (STOSB) are also part of the framework, we are only left with five (5) bytes to play around with! It is possible to find some interesting results besides TV noise, but it doesn't require many hours from the coder to get the feeling that there's nothing more left to explore.

What about other platforms, then? Practically all modern mainstream platforms and a considerable portion of older ones are out of the question because of the need for long headers and startup stubs. Some platforms, however, are very suitable for the 16-byte size class and even have considerable advantages over MS-DOS. The hardware registers of the Commodore 64, for example, are more readily accessible and can be manipulated in quite unorthodox ways without risking compatibility. This spares a lot of precious bytes compared to MS-DOS and thus opens a much wider space of possibilities for the artist to explore.

So, what is there to be found in the 16-byte possibility space? Is it all about raster effects, simple per-pixel formulas and glitches? Inferior and uglier versions of the things that have already made in 32 or 64 bytes? Is it possible to make a "killer demo" in sixteen bytes? A recent 23-byte Commodore 64 demo, Wallflower by 4mat of Ate Bit, suggests that this might be possible:



The most groundbreaking aspect in this demo is that it is not just a simple effect but appears to have a structure reminiscent of bigger demos. It even has an end. The structure is both musical and visual. The visuals are quite glitchy, but the music has a noticeable rhythm and macrostructure. Technically, this has been achieved by using the two lowest-order bytes of the system timer to calculate values that indicate how to manipulate the sound and video chip registers. The code of the demo follows:
* = $7c
ora $a2
and #$3f
tay
sbc $a1
eor $a2
ora $a2
and #$7f
sta $d400,y
sta $cfd7,y
bvc $7c

When I looked into the code, I noticed that it is not very optimized. The line "eor $a2", for example, seems completely redundant. This inspired me to attempt a similar trick within the sixteen-byte limitation. I experimented with both C-64 and VIC-20, and here's something I came up with for the VIC-20:
* = $7c
lda $a1
eor $9004,x
ora $a2
ror
inx
sta $8ffe,x
bvc $7c

Sixteen bytes, including the two-byte PRG header. The visual side is not that interesting, but the musical output blew my mind when I first started the program in the emulator. Unfortunately, the demo doesn't work that well in real VIC-20s (due to an unemulated aspect of the I/O space). I used a real VIC-20 to come up with good-sounding alternatives, but this one is still the best I've been able to find. Here's an MP3 recording of the emulator output (with some equalization to silent out the the noisy low frequencies).

And no, I wasn't the only one who was inspired by Wallflower. Quite soon after it came out, some sceners came up with "ports" to ZX Spectrum (in 12 or 15 bytes + TAP header) and Atari XL (17 bytes of code + 6-byte header). However, I don't think they're as good in the esthetic sense as the original C-64 Wallflower.

So, how and why does it work? I haven't studied the ZX and XL versions, but here's what I've figured out of 4mat's original C-64 version and my VIC-20 experiment:

The layout of the zero page, which contains all kinds of system variables, is quite similar in VIC-20 and C-64. On both platforms, the byte at the address $A2 contains a counter that is incremented 60 times per second by the system timer interrupt. When this byte wraps over (every 256 steps), the byte at the address $A1 is incremented. This happens every 256/60 = 4.27 seconds, which is also the length of the basic macrostructural unit in both demos.

In music, especially in the rhythms and timings of Western pop music, binary structures are quite prominent. Oldschool homecomputer music takes advantage of this in order to maximize simplicity and efficiency: in a typical tracker song, for example, four rows comprise a beat, four beats (16 rows) comprise a bar, and four bars (64 rows) comprise a pattern, which is the basic building block for the high-level song structure. The macro-units in our demos correspond quite well to tracker patterns in terms of duration and number of beats.

The contents of the patterns, in both demos, are calculated using a formula that can be split into two parts: a "chaotic" part (which contains additions, XORs, feedbacks and bit rotations), and an "orderly" part (which, in both demos, contains an OR operation). The OR operation produces most of the basic rhythm, timbres and rising melody-like elements by forcing certain bits to 1 at the ends of patterns and smaller subunits. The chaotic part, on the other hand, introduces an unpredictable element that makes the output interesting.

It is almost a given that the outcomes of this approach are esthetically closer to glitch art than to the traditional "smooth" demoscene esthetics. Like in glitching and circuit-bending, hardware details have a very prominent effect in "Wallflower variants": a small change in register layout can cause a considerable difference in what the output of a given algorithm looks and sounds like. Demoscene esthetics is far from completely absent in "Wallflower variants", however. When the artist chooses the best candidate among countless of experiments, the judgement process strongly favors those programs that resemble actual demos and appear to squeeze a ridiculous amount of content in a low number of bytes.

When dealing with very short programs that escape straightforward rational understanding by appearing to outgrow their length, we are dealing with chaotic systems. Programs like this aren't anything new. The HAKMEM repository from the seventies provides several examples of short audiovisual hacks for the PDP-10 mainframe, and many of these are adaptations of earlier PDP-1 hacks, such as Munching Squares, dating back to the early sixties. Fractals, likewise producing a lot of detail from simple formulas, also fall under the label of chaotic systems.

When churning art out of mathematical chaos, be that fractal formulas or short machine-code programs, it is often easiest for the artist to just randomly try out all kinds of alternatives without attempting to understand the underlying logic. However, this easiness does not mean that there is no room for talent, technical progress or rational approach in the 16-byte size class. Random toying is just a characteristic of the first stages of discovery, and once a substantial set of easily discoverable programs have been found, I'm sure that it will become much more difficult to find new and groundbreaking ones.

Some years ago, I made a preliminary design for a virtual machine called "Extreme-Density Art Machine" (or EDAM for short). The primary purpose of this new platform was to facilitate the creation of extremely small demoscene productions by removing all the related problems and obstacles present in real-world platforms. There is no code/format overhead; even an empty file is a valid EDAM program that produces a visual result. There will be no ambiguities in the platform definition, no aspects of program execution that depend on the physical platform. The instruction lengths will be optimized specifically for visual effects and sound synthesis. I have been seriously thinking about reviving this project, especially now that there have been interesting excursions to the 16-byte possibility space. But I'll tell you more once I have something substantial to show.

Friday 17 June 2011

We need a Pan-Hacker movement.

Some decades ago, computers weren't nearly as common as they are today. They were big and expensive, and access to them was very privileged. Still, there was a handful of people who had the chance to toy around with a computer in their leisure time and get a glimpse of what a total, personal access to a computer might be like. It was among these people, mostly students in MIT and similar facilities, where the computer hacker subculture was born.

The pioneering hackers felt that computers had changed their life for the better and therefore wanted to share this new improvement method with everyone else. They thought everyone should have an access to a computer, and not just any kind of access but an unlimited, non-institutionalized one. Something like a cheap personal computer, for example. Eventually, in the seventies, some adventurous hackers bootstrapped the personal computer industry, which led to the the so-called "microcomputer revolution" in the early eighties.

The era was filled with hopes and promises. All kinds of new possibilities were now at everyone's fingertips. It was assumed that programming would become a new form of literacy, something every citizen should be familiar with -- after all, using a computer to its fullest potential has always required programming skill. "Citizens' computer courses" were broadcasted on TV and radio, and parents bought cheap computers for their kids to ensure a bright future for the next generation. Some prophets even went far enough to suggest that personal computers could augment people's intellectual capacities or even expand their consciousnesses in the way how psychedelic drugs were thought to do.

In the nineties, however, reality stroke back. Selling a computer to everyone was apparently not enough for automatically turning them into superhuman creatures. As a matter of fact, digital technology actually seemed to dumb a lot of people down, making them helpless and dependent rather than liberating them. Hardware and software have become ever more complex, and it is already quite difficult to build reliable mental models about them or even be aware of all the automation that takes place. Instead of actually understanding and controlling their tools, people just make educated guesses about them and pray that everything works out right. We are increasingly dependent on digital technology but have less and less control over it.

So, what went wrong? Hackers opened the door to universal hackerdom, but the masses didn't enter. Are most people just too stupid for real technological awareness, or are the available paths to it too difficult or time-consuming? Is the industry deliberately trying to dumb people down with excessive complexity, or is it just impossible to make advanced technology any simpler to genuinely understand? In any case, the hacker movement has somewhat forgotten the idea of making digital technology more accessible to the masses. It's a pity, since the world needs this idea now more than ever. We need to give common people back the possibility to understand and master the technology they use. We need to let them ignore the wishes of the technological elite and regain the control of their own lives. We need a Pan-Hacker movement.

What does "Pan-Hacker" mean? I'll be giving three interpretations that I find equally relevant, emphasizing different aspects of the concept: "everyone can be a hacker", "everything can be hacked" and "all hackers together".

The first interpretation, "everyone can be a hacker", expands on the core idea of oldschool hackerdom, the idea of making technology as accessible as possible to as many as possible. The main issue is no longer the availability of technology, however, but the way how the various pieces of technology are designed and what kind of user cultures are formed around them. Ideally, technology should be designed so that it invites the user to seize the control, play around for fun and gradually develop an ever deeper understanding in a natural way. User cultures that encourage users to invent new tricks should be embraced and supported, and there should be different "paths of hackerdom" for all kinds of people with all kinds of interests and cognitive frameworks.

The second interpretation, "everything can be hacked", embraces the trend of extending the concept of hacking out of the technological zone. The generalized idea of hacking is relevant to all kinds of human activities, and all aspects of life are relevant to the principles of in-depth understanding and hands-on access. As the apparent complexity of the world is constantly increasing, it is particularly important to maintain and develop people's ability to understand the world and all kinds of things that affect their lives.

The third interpretation, "all hackers together", wants to eliminate the various schisms between the existing hacker subcultures and bring them into a fruitful co-operation. There is, for example, a popular text, Eric S. Raymond's "How To Become A Hacker", that represents a somewhat narrow-minded "orthodox hackerdom" that sees the free/open-source software culture as the only hacker culture that is worth contributing to. It frowns upon all non-academic hacker subcultures, especially the ones that use handles (such as the demoscene, which is my own primary reference point to hackerdom). We need to get rid of this kind of segregation and realize that there are many equally valid paths suitable for many kinds of minds and ambitions.

Now that I've mentioned the demoscene, I would like to add that all kinds of artworks and acts that bring people closer to the deep basics of technology are also important. I've been very glad about the increasing popularity of chip music and circuit-bending, for example. The Pan-Hacker movement should actively look for new ways of "showing off the bits" to different kinds of audiences in many kinds of diverse contexts.

I hope my writeup has given someone some food of thought. I would like to elaborate my philosophy even further and perhaps do some cartography on the existing "Pan-Hacker" activity, but perhaps I'll return to that at some later time. Before that, I'd like to hear your thoughts and visions about the idea. What kind of groups should I look into? What kind of projects could Pan-Hacker movement participate in? Is there still something we need to define or refine?

Monday 6 June 2011

Ancient binary symbolism and why it is relevant today

It is a well-known fact the human use of binary strings (or even binary numbers, see Pingala) predates electronics and automatic calculators by thousands of years.

Divination was probably the earliest human application for binary arrays. There are several systems in Eurasia and Africa that assign fixed semantics to bitstrings of various lengths. The Chinese I Ching gives meanings to the 3- and 6-bit arrays, while the systems used in the Middle East, Europe and Africa tend to prefer groups of 4 and 8 bits.

These systems of binary mysticism have been haunting me for quite many years already. As someone who has been playing around with bits since childhood, I have found the idea of ancient archetypal meanings for binary numbers very attractive. However, when studying the actual systems in order to find out the archetypes, I have always encountered a lot of noise that has blocked my progress. It has been a little bit frustrating: behind the noise, there are clear hints of an underlying logic and an original protosemantics, but whenever I have tried to filter out the noise, the solution has escaped my grasp.

Recently, however, I finally came up with a solution that satisfies my sense of esthetics. I even pixelled a set of "binary tarot cards" for showing off the discovery:


For a more complete summary, you may want to check out this table that contains a more elaborate set of meanings for each array and also includes all the traditional semantics I have based them on.

Of course, I'm not claiming that this is some kind of a "proto-language" from which all the different forms of binary mysticism supposedly developed. It is just an attempt to find an internally consistent set of meanings that match the various traditional semantics as closely as possible.

Explanation

In my analysis, I have translated the traditional binary patterns into modern Leibnizian binary numbers using the following scheme:

This is the scheme that works best for I Ching analysis. The bits on the bottom are considered heavier and more significant, and they change less frequently, so the normal big-endian reading starts from the bottom. The "yang" line, consisting of a single element, maps quite naturally to the binary "1", especially given that both "yang" and "1" are commonly associated with activity.

I have drawn each "card picture" based on the African shape of the binary array (represented as rows of one or two stones). I have left the individual "stones" clearly visible so that the bitstrings can be read out from the pictures alone. Some of the visual associations are my own, but I have also tried to use traditional associations (such as 1111=road/path, 0110=crossroads, 1001=enclosure) as often as they feel relevant and universal enough.

In addition to visual associations, the traditional systems have also formed semantics by opposition: if the array 1111 means "journey", "change" and "death", its inversion 0000 may obtain the opposite meanings: "staying at home", "stability" and "life". The visual associations of 0000 itself no longer matter as much.

The two operations used for creating symmetry groups are inversion and mirroring. These can be found in all families of binary divination: symmetric arrays are always paired with their inversions (e.g. 0000 with 1111), and asymmetric arrays with their reversions (e.g. 0111 with 1110).

Because of the profound role of symmetry groups, I haven't represented the arrays in a numerical order but in a 4x4 arrangement that emphasizes the mutual relationships via inversion and mirroring. Each of the rows in the "binary tarot" picture represents a group with similar properties:
  • The top row contains the four symmetrical arrays (which remain the same when mirrored).
  • The second row contains the arrays for which mirroring and inversion are equivalent.
  • The two bottom rows represent the two groups whose members can be derived from each other solely by mirroring and inversion.
The semantics within each group are interrelated. For example, the third row ("up", "in", "out", "down") can be labelled "the directions". In order to emphasize this, I have chosen a pair of dichotomies for each row. For example, the row of the directions uses the dichotomies "far-near" and "horizontal-vertical", and the array called "up" combines the poles "far"+"vertical". All the dichotomies can be found in my summary table.

The arrays in the top two groups have an even parity while those on the bottom two groups have an odd parity. This difference is important at least in Al-Raml and related systems, where the array getting the role of a "judge" in a divination table must have an even parity; otherwise there is an error in the calculation.

The members of each row can be derived from one another by eXclusive-ORing them with a symmetrical array (0000, 1111, 0110 or 1001). For this reason, I have also organized the arrangement as a XOR table.

The color schemes used in the card pictures are based on the colors in various 16-color computer palettes and don't carry further symbolism (even though 0010 happens to have the meaning of "red" in Al-Raml and Geomancy as well). Other than that, I have abstained from any modern technological connections.

But why?

Our subjective worlds are full of symbolism that brings various mental categories together. We associate numbers, letters, colors and even playing cards to various real-world things. We may have superstitions about them or give them unique personalities. Synesthetics even do this involuntarily, so I guess it is quite a basic trait for the human mind.

Binary numbers, however, have remained quite dry in this area. We don't really associate them with anything else, so they remain alien to us. Even experts who are constantly dealing with binary technology prefer to hide them or abstract them away. This alienation combined to the increasing role of digitality in our lives is the reason why I think there should be more exposure for the various branches of binary symbolism.

In many cultures, binary symbolism has attained a role so central that people base their conceptions of the world on it. A lot of traditional Chinese cosmology is basically commentary of I Ching. The Yoruba of West Africa use the eight-bit arrays of the Ifa system as "hash codes" to index their whole oral tradition. Some other West African peoples -- the Fon and the Ewe -- extend this principle far enough to give every person an eight-bit "kpoli" or "life sign" at their birth.

I guess the best way to bring some binary symbolism to our modern technological culture might be using it in art. Especially the kind of art such as pixel art, chip music and demoscene productions that embrace the bits, bringing them forward instead of hiding them. This is still just a meta-level idea, however, and I can't yet tell how to implement in it practice. But once I've progressed with it, I'll let you know for sure!

Thursday 2 June 2011

What should big pixels look like?

There has been some fuss recently about a new algorithm that vectorizes pixel art. And yes, judging from the example pictures, this algorithm by Johannes Kopf and Dani Lischinski indeed seems to produce results superior to the likes of hq*x and scale*x or mainstream vectorization algorithms. Let me duplicate the titular example for reference:
Impressive, yes, but as in case with all such algorithms, the first question that came to my mind was: "But does it manage dithering and antialiasing?". The paper explicitly answers this question: no.

All the depixelization algorithms so far have been succesful only with a specific type of pixel art. Pixel art of a cartoonish style that has clear lines and not too many details. This kind of pixel art may have been mainstream in Japan, but in the Western sphere, especially in Europe, there has been a strong tradition of optimalism: the tendency of maximizing the amount of detail and shading within the limited grid of pixels. An average pixel artwork on the Commodore 64 or the ZX Spectrum has an extensive amount of careful manual dithering. If we wish to find a decent general-purpose pixel art depixelization algorithm, it would definitely need to take care of that.

I once experimented by writing an undithering filter that attempts to smooth out dithering while keeping non-dithering-related pixels intact. The filter works as follows:
  • Flag a pixel as a dithering candidate if it differs enough from its cardinal neighborhood (no more than one of the four neighbors are more similar to the reference pixel than the neighbor average).
  • Extend the area of dither candidates: flag a pixel if at least five of its eight neighbors are flagged. Repeat until no new pixels are flagged.
  • For each flagged pixel, replace its color with the weighed average of all the flagged pixels within the surrounding 3x3 rectangle.
Would it be possible to improve the performance of a depixelization algorithm by first piping the picture thru my undithering filter? Let's try out. Here is an example of how the filter manages with a fullscreen C-64 multicolor-mode artwork (from the demoscene artist Frost of Panda Design) and how the results are scaled by the hq4x algorithm:

The undithering works well enough within the smooth areas, and hq4x is even able to recognize the undithered areas as gradients and smooth them a little bit further. However, when looking at the border between the nose and the background, we'll notice careful manual antialiasing that even adds some lonely dithering pixels to smooth out the staircasing. My algorithm doesn't recognize these lonely pixels as dithering, and neither does it recognize the loneliest pixels in the outskirts of dithered gradients as dithering. It is a difficult task to algorithmically detect whether a pixel is intended to be a dithering pixel or a contour/detail pixel. Detecting antialiasing would be a totally different task, requiring a totally new set of processing stages.

There seems to be still a lot of work to do. But suppose that, some day, we will discover the ultimate depixelization algorithm. An image recognition and rerendering pipeline that succesfully recognizes and interprets contours, gradients, dithering, antialiasing and everything else in all conceivable cases, and rerenders it in a high resolution and color without any distracting artifacts. Would that be the holy grail? I wouldn't say so.

The case is that we already have the ultimate depixelization algorithm -- the one running in the visual subsystem of the human brain. It is able to fill in amazing amounts of detail when coping with low amounts of reliable data. It handles noisiness and blurriness better than any digital system. It can extrapolate very well from low-complexity shapes such as silhouette drawings or groups of blurry dots on a CRT screen.

A fundamental problem with the "unlimited resolution" approach of pixel art upscaling is that it attempts to fill in details that aren't there -- a task in which the human brain is vastly superior. Replacing blurry patterns with crisp ones can even effectively turn off the viewer's visual imagination: a grid of blurry dots in the horizon can be just about anything, but if they get algorithmically substituted by some sort of crisp blobs, the illusion disappears. I think it is outright stupid to waste computing resources and watts for something that kills the imagination.

The reason why pixel art upscaling algorithms exist in the first place is that sharp rectangular pixels (the result of nearest-neighbor upscaling) look bad. And I have to agree with this. Too easily recognizable pixel boundaries distract the viewer from the art. The scaling algorithms designed for video scaling partially solve this problem with their interpolation, but the results are still quite bad for the opposite reason -- because there is no respect for the nature of the individual pixel.

When designing a general-purpose pixel art upscaling algorithm, I think the best route would go somewhere between the "unlimited resolution" approach and the "blurry interpolation" approach. Probably something like CRT emulation with some tasteful improvements. Something that keeps the pixels blurry enough for the visual imagination to work while still keeping them recognizable and crisp enough so that the beauty of the patterns can be appreciated.

Nevertheless, I was very fascinated by the Kopf-Lischinski algorithm, but not because of how it would improve existing art, but for its potential of providing nice, organic and blobby pixels to paint new art with. A super-low-res pixel art painting program that implements this kind of algorithm would make a wonderful toy and perhaps even vitalize the pixel art scene in a new and refreshing way. Such a vitalization would also be a triumph for the idea of Computationally Minimal Art which I have been advocating.