May 11, 2007 will

The smell of C

A recent post on reddit.com sparked a debate on the following snippet of C code.

#include <stdio.h>

int main() { int i = 5; i = ++i + ++i; printf("%d", i); return 0; }

The denizens of reddit have been musing over the output of the code: is it 13 or 14? It can be either, depending on the compiler used, or something else entirely, because the behavior is undefined (i is modified more than once without an intervening sequence point), as another denizen noted. Sometimes I wonder if the C language was designed purely to provide challenging job interview questions.

Even though I was a C/C++ programmer for a lot longer than I have been a Python programmer, the entire concept of a valid language construct being undefined is offensive to me. Seriously. It's like month-old milk; it makes me screw my face up and turn away in disgust. In contrast, Python smells like a mixture of freshly cut grass and new car smell.

As enamored with Python as I am, it still sits on top of C (at least CPython does) and inherits a little of its unpredictable nature. I believe there are some inconsistencies across the various Python platforms, but I can't think of them right now. Most of them were floating-point related, IIRC, because CPython has little choice but to defer to the floating-point implementation of the platform. I wonder if there is a web page somewhere documenting runtime differences between platforms and Python implementations. A small and concise web page, I hope!

On the subject of smells: what do other languages smell like? I reckon assembler smells kind of like ammonia.

Jack Diederich

Python has undefined features as well. id('Hi') == id('Hi') works or doesn't depending on the implementation. Heck, id('Hi') == id('hello') can be true!

Paddy3118

Assembler can indeed smell rosy. I do remember thinking of how well thought out the PDP11 assembler code was (later forming the base of the Motorola 68000 assembler instruction set).

Will


"Python has undefined features as well. id('Hi') == id('Hi') works or doesn't depending on the implementation. Heck, id('Hi') == id('hello') can be true!"

I can understand the disparity of id('Hi') == id('Hi'); I guess it depends on whether the string is interned or not. But are you certain about id('Hi') == id('hello')? The help says the id is guaranteed to be unique among simultaneously existing objects. :-\

Will

Oh, I get it. The string passed to the first id call may have been cleaned up before the second id call is performed, so its id is free to be reused.
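To make the point above concrete, here is a minimal sketch. It assumes only the documented guarantee that ids are unique among simultaneously existing objects; whether the ids of two short-lived temporaries coincide is left to the implementation, so the sketch prints that result without promising a value. The variable names are illustrative.

```python
import sys

# Objects that exist at the same time are guaranteed to have distinct ids.
first = object()
second = object()
distinct_while_alive = id(first) != id(second)
print(distinct_while_alive)

# Each temporary below may be freed before the next one is created, so
# the two ids may or may not coincide, depending on the implementation.
ids_of_temporaries_match = id(object()) == id(object())
print(ids_of_temporaries_match)

# Explicit interning removes the ambiguity for strings: sys.intern
# returns the one shared object for equal strings.
a = sys.intern("".join(["H", "i"]))
b = sys.intern("".join(["H", "i"]))
print(a is b)
```

Holding a reference to each object for as long as its id is needed is what makes the comparison meaningful.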

Andrew Dalke

Try searching docs.python.org for the word "undefined". Some hits expose problems in the underlying C code (e.g., strange 'seek' values); others concern library implementations. Some, though, are undefined aspects of the language.

"While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined."

"Formfeed characters occurring elsewhere in the leading whitespace have an undefined effect (for instance, they may reset the space count to zero)."

"super is undefined for implicit lookups using statements or operators such as "super(C, self)[name]""

This one is for "implementation defined":

"If the transformed name is extremely long (longer than 255 characters), implementation defined truncation may happen."

As I recall, some others came up during work on the then-named JPython. For example, a function object is not required to have a way to get the byte code.

James

Take away the increment operations (which, it should be noted, now exist in languages other than C) and you have this basic case:

i = write_to_file(line1) + write_to_file(line2)

Where write_to_file returns an integer error code (or maybe the number of bytes written). Is line1 or line2 written first? Most popular languages don't define evaluation order.

Luke Plant

With regard to the 'id' examples: surely that is to do with non-determinism rather than being 'undefined'. For example:

datetime.now() == datetime.now()

rand() == rand()

Not being able to predict what will happen is not the same as behaviour being undefined.

Jason

The usual reason given is that A) not defining a specific order allows the compiler greater freedom in optimization, and B) you shouldn't be writing order-dependent things on the same line anyways; it's just bad coding style (with the exception of && and ||.)

If I remember correctly, Python does define an order (left-to-right?) for function calls, but as Andrew pointed out there are still a lot of other things left undefined.

Andrew Dalke

James: "Most popular languages don't define evaluation order"

Python does. Left-to-right in binary ops. See http://docs.python.org/ref/evalorder.html .
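A small sketch of that guarantee, reusing the write_to_file example from earlier in the thread. The function here is a hypothetical stand-in that records its argument in a list instead of touching a file, so the call order can be inspected afterwards.

```python
# Records the order in which the "writes" happen.
written = []

def write_to_file(line):
    """Hypothetical stand-in: logs the line and returns its length."""
    written.append(line)
    return len(line)

# Python evaluates the operands of + from left to right,
# so the call for line1 always runs before the call for line2.
total = write_to_file("line1\n") + write_to_file("line2\n")

print(written)
print(total)
```

The equivalent C expression carries no such guarantee, because the order in which the two operands of + are evaluated is unspecified there.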

Perl5 does not, I think, but I found this quote from Larry Wall: "The fact that Perl 5 doesn't define it is merely an oversight, brought on no doubt by a lack of oversight. But as you point out it can deduced by observation of all the various implementations of Perl. "

Java specified left-to-right, says http://java.sun.com/docs/books/jls/second_edition/html/expressions.doc.html .

Ruby seems to be left-to-right but I can't find a definitive page to that effect.

C#, also left-to-right when there is equal precedence. http://www.awprofessional.com/articles/article.asp?p=25322&seqNum=19&rl=1

Visual Basic, however, does not guarantee the order, says http://vb.mvps.org/tips/truth.asp . "Order of expression execution based on order of appearance is not guaranteed."

Bill

Assembler smells of water, of course. C has a musty smell, like an old sheet. C++ is like sticking your head into the kitchen on a Sunday afternoon... Yorkshire puddings, roast potatoes. Python whiffs of Ferrero Rocher.

Jack Diederich

Luke: Andrew addressed Python's determinism, and that is the reason for the id() == id() ambiguity. Python guarantees the left id call will happen first, but it is implementation dependent whether the object passed to that call is cleaned up before the right call runs. If it is cleaned up, the left object's id will be "free" and could be reused for the right object (as Will pointed out).

Bill: If C++ smelled like Yorkshire pudding I wouldn't have switched to Python. Heck, I would switch to any product (languages, cars, toasters) that smelled of roast beef and Yorkshire pudding.

Will

Close, but I reckon C++ is more like sticking your head into the kitchen oven; it causes dizziness / nausea and there is a risk of spontaneous death - plus overtones of baked-in lard. :p

Calvin Spealman

There can certainly be some cases where you aren't sure about things involved in Python, but nothing undefined in the sense that C can be. I can counter the examples given.

“While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined.”

- While the list is being sorted, any inspection or mutation could only be occurring in some other thread. Multiple threads are, by design, nondeterministic.

“Formfeed characters occurring elsewhere in the leading whitespace have an undefined effect (for instance, they may reset the space count to zero).”

- This is caused by improper formatting of a text file. If the file is not formatted properly, you can't expect magic it's-OK-ness.

“super is undefined for implicit lookups using statements or operators such as "super(C, self)[name]"”

- Undefined? I actually don't agree. At least, not by the term "undefined" as used in this post. Implicit lookups like this are looked up on the type of the object in question, which in this case is the super builtin type. The methods are undefined in the sense that the type does not define them, so they don't exist and you can't look them up. This is not "undefined" as in not knowing the behavior.

“If the transformed name is extremely long (longer than 255 characters), implementation defined truncation may happen.”

- This is about private name mangling. The mangled names should be considered an implementation detail; you should never use or try to create the names manually, so any implementation-specific differences are completely irrelevant.

Andrew Dalke

"While the list is being sorted, any inspection or mutation could only be occurring in some other thread."

It could be done in the comparison function passed to sort. For example, """x= [1,3,2,4]; x.sort(lambda a,b: x.__setitem__(0, x[1]))""". When I try it I get "IndexError: list index out of range".
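For readers on Python 3, where sort no longer accepts a comparison function as a positional argument, the same single-threaded inspection can be sketched with a key function. This is only an illustration: the name spy_key is made up, and what the key function sees is exactly the part the documentation leaves undefined. The CPython documentation notes that its implementation makes the list appear empty for the duration of the sort.

```python
x = [1, 3, 2, 4]
lengths_seen = []

def spy_key(value):
    # Inspect the list while it is being sorted; no second thread is involved.
    lengths_seen.append(len(x))
    return value

x.sort(key=spy_key)

print(x)             # the sort itself still completes
print(lengths_seen)  # what the list looked like from inside the sort
```

On CPython the recorded lengths are expected to be zeros rather than 4, which matches the IndexError reported above for x[1].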

"This is caused by improper formatting of a text file." If it's wrong ("improper"), then I expect the parser to throw an exception or otherwise indicate a parse error. It does not, hence either there is a bug in the parser or it's not improper formatting.

Regarding "super" - I don't understand super well enough to know what that meant. I left it there as an example where the spec uses "undefined."
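A minimal sketch of what that sentence in the docs seems to describe, assuming a hypothetical Base class that defines __getitem__. An explicit method call through super finds Base's method, while the subscript operator is looked up on the super type itself, which defines no __getitem__.

```python
class Base:
    def __getitem__(self, name):
        return "Base item: " + name

class C(Base):
    def explicit(self, name):
        # Explicit attribute lookup: super forwards this to Base.
        return super(C, self).__getitem__(name)

    def implicit(self, name):
        # Implicit lookup via the [] operator: checked on type(super_object).
        return super(C, self)[name]

c = C()
print(c.explicit("spam"))

try:
    c.implicit("spam")
    implicit_result = "worked"
except TypeError:
    implicit_result = "TypeError"
print(implicit_result)
```

In CPython the operator form fails with a TypeError, which fits the reading that the operation is simply not defined on the super type, not that its behavior is unpredictable.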

Regarding variable names, the C spec makes a distinction between "undefined", "unspecified" and "implementation defined". This behavior is explicitly listed as "implementation defined". Many of the ambiguities in C are also "implementation defined" vs. "undefined."

As for "you should never use or try to create the names manually", you misunderstood the meaning there. If I create the method named "__a" + "b"*252, which is 255 characters long, then the parser will transform that name to insert the class name. This transformed name is longer than 255 characters, and I didn't "create the name[s] manually".
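The transformation can be observed directly. In this sketch the class names Spam and C and the use of exec to build the long method name are illustrative choices. The length of the stored long name is printed without assuming a value, since truncation is exactly the implementation-defined part.

```python
class Spam:
    def __secret(self):
        return 42

# The compiler stores the method under the mangled name _Spam__secret.
mangled_short = [n for n in vars(Spam) if n.endswith("__secret")]
print(mangled_short)

# The 255-character name from the comment above; exec is used only
# because the identifier is too long to type out comfortably.
long_name = "__a" + "b" * 252
namespace = {}
exec("class C:\n    def " + long_name + "(self):\n        return 1\n", namespace)

# Find the name the implementation actually stored, and report its length.
stored = [n for n in vars(namespace["C"]) if n.startswith("_C__a")]
print(len(long_name), [len(n) for n in stored])
```

Nothing in the second half writes a mangled name by hand; the name longer than 255 characters comes entirely from the compiler's transformation.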

Michael

Python actually smells of peaches and sassafras. Lisp of strawberry, watermelon and kind-bud. C has the semi-pleasant smell of old books, while C++ reeks of dirty socks. Java smells like a jail cell, while JavaScript smells surprisingly like strawberry, watermelon and old books...