on the edge

Greg Black

gjb at gbch dot net

If you’re not living life on the edge, you’re taking up too much space.

Sat, 18 Sep 2004

Flawed technical papers

In the early 1980s, I read a lot of doctoral theses: some because they were written by friends who wanted to know what I thought, some because I thought they might be interesting, and some for other reasons that we can gloss over for now. Mostly, they were pretty boring; some, written by friends in fields that I knew next to nothing about, were fairly incomprehensible; and a surprising number were just plain bad. Since then, life has been kind to me, and over the past twenty years I have read only the rare thesis, out of interest in its subject matter.

Recent investigations into programming languages, however, have prompted me to read some more. I get quite irritated when I find silly mistakes or, worse still, material that is plainly wrong in a thesis. After all, theses are not rushed out; they are the result of a great deal of work and considerable review by others before being published. Here’s an example of a silly mistake that should have been fixed early on. It’s taken from Making reliable distributed systems in the presence of software errors, Joe Armstrong’s doctoral thesis on Erlang:

Strings […] are written as doubly quoted lists of characters, this is syntactic sugar for a list of the integer ASCII codes for the characters in the string, thus for example, the string "cat" is shorthand for [97,99,116].

Leaving aside the poor punctuation, the thing that just slaps you in the face is that the integers listed there cannot possibly be right, and you don’t need to know the ASCII codes by heart to see it. Lower-case letters have consecutive codes in alphabetical order, so the code for “c” must be larger than the code for “a”, yet the list starts with the smaller number. In fact, [97,99,116] spells “act”; “cat” is [99,97,116]. So why didn’t somebody tell him to fix this? It seems ironic that a dissertation with such a title, especially one marked “final version (with corrections)”, should contain such obvious errors. There are others, but I won’t belabour the point here, as I have another example to discuss.

To be fair, this next target is not a doctoral thesis but an honours paper; still, it’s available online and is recommended from time to time by people who are normally discerning. The paper is In Search of the Ideal Programming Language. I first came across it following a recommendation from a contributor to a site that specialises in programming languages.

Early in the paper, the author provides some code examples to support his contention that C and Pascal are superior to Java in terms of simple expressiveness. The code implements the famous “Hello, world!” program. I don’t know Java or Pascal well enough to comment authoritatively, although the examples look similar to what I remember; but I do know C, and his example code for this tiny program is plain wrong. Considering that the correct version is well known, that’s inexcusable. Ironically, once the C is corrected, it’s not at all clear that it’s significantly superior to the Java version. In any case, a wrong example does little to support his argument.

Later, we see that this early praise of C was not serious. He goes to great lengths to criticise elements of C that are so utterly trivial that they don’t deserve discussion; most often, what he demonstrates is merely that he does not understand C. There is a lengthy section that purports to show that C’s lack of a string type is an impediment to somebody writing, e.g., a word processor, when anybody writing such a thing in C would simply use a string library that provided whatever facilities seemed useful. Considering that our author lists easy integration with extension libraries as an essential feature of any useful language, it’s odd that he doesn’t seem to recognise a situation in which one would be used.

Towards the end of the paper, he talks about portability and gives examples of code that might give different results when compiled with different compilers. Leaving aside for now the question of whether his specific examples are “undefined” or “implementation-defined”, the real point is that only mad people would write code in this way. If a human reading the code can’t possibly guess what the programmer had in mind, the code is simply bad, and discussion of what a compiler might do with it is irrelevant.

The important concept that seems to have been completely missed in this long paper is that programming is a discipline, and programmers have to learn to approach it in such a way that their code is understandable by humans, regardless of the programming language they choose. While some languages are certainly better suited to some kinds of software than others, it’s also true that pretty much anything can be written in any general purpose language, where “general purpose” rules out things like awk, in which you can’t write an operating system, or Pascal, in which you can’t write any real program.

At the end, the author asserts that no current language offers the facilities that a programming language should offer, although he stops short of specifying what his ideal language would look like. This is an odd conclusion, for two reasons. On the one hand, he seems to ignore the fact that a great deal of software has been developed successfully with the tools we have (admittedly with some classic disasters along the way); and on the other he completely fails to discuss many serious languages that have been developed to address some of his concerns. Some of those languages might be a bit new for his 1997 paper, but most of the modern languages were known before Java hit the headlines; and some important languages which have been around forever, such as the Lisp family, do not even get mentioned in passing. This is a very strange paper, in my opinion, and it’s hard to understand why people would point to it as a starting point for an investigation into programming languages.

As a final note, I did not read either of these papers in full. The Erlang paper is something that I’ll come back to and read fully, because the content seems useful and reasonably well written, even if the silly mistakes detract from it. I’m not likely to return to the ideal language paper, because it seems to have insufficient merit on any level to be worth the time. So I may have missed some details, but probably not enough to make me revise my thoughts here.

Posted at: 11:17 Path: /software/discuss