I was asked on Twitter why Python uses 0-based indexing, with a link to a new (fascinating) post on the subject (http://exple.tive.org/blarg/2013/10/22/citation-needed/).
I recall thinking about it a lot; ABC, one of Python's predecessors,
used 1-based indexing, while C, the other big influence, used 0-based.
My first few programming languages (Algol, Fortran, Pascal) used 1-based
or variable-based. I think that one of the issues that helped me decide
was slice notation.
Let's first look at use cases. Probably the
most common use cases for slicing are "get the first n items" and "get
the next n items starting at i" (the first is a special case of that for
i == the first index). It would be nice if both of these could be
expressed as without awkward +1 or -1 compensations.
Using
0-based indexing, half-open intervals, and suitable defaults (as Python
ended up having), they are beautiful: a[:n] and a[i:i+n]; the former is
long for a[0:n].
Using 1-based indexing, if you want a[:n] to
mean the first n elements, you either have to use closed intervals or
you can use a slice notation that uses start and length as the slice
parameters. Using half-open intervals just isn't very elegant when
combined with 1-based indexing. Using closed intervals, you'd have to
write a[i:i+n-1] for the n items starting at i. So perhaps using the
slice length would be more elegant with 1-based indexing? Then you could
write a[i:n]. And this is in fact what ABC did -- it used a different
notation so you could write a@i|n.(See http://homepages.cwi.nl/~steven/abc/qr.html#EXPRESSIONS.)
But
how does the index:length convention work out for other use cases? TBH
this is where my memory gets fuzzy, but I think I was swayed by the
elegance of half-open intervals. Especially the invariant that when two
slices are adjacent, the first slice's end index is the second slice's
start index is just too beautiful to ignore. For example, suppose you
split a string into three parts at indices i and j -- the parts would be
a[:i], a[i:j], and a[j:].
So that's why Python uses 0-based indexing.
To anyone who prefers 1-based indexing: you are the same people who screwed up the calendar, starting at year 1 and calling the 1900s the 20th century (and arguing that the year 2001 ought to be the start of the next millennium). Haven't you done enough damage? :-)
ReplyDeleteYou seriously think it makes more sense to have a year 0? If the FIRST year isn't year 1, then you end up with the SECOND year being year 1. Yeah, that makes sense. NOT.
DeleteTime has the notation of 1:?? in the second hour of the day. What's your issue with the years again?
DeleteDijkstra had a monograph about this: https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html
ReplyDeleteAlso I should point out that the correct first number is zero, and kids should be taught to count "0, 1, 2, ...". It fixes a lot of problems that you get if you start a 1. In this light, there's no argument at all for starting indexing at 1.
As I pointed out in my G+ post, Dijkstra did not consider start:size as an option. Sadly, that's the way C++ seems to be going (https://www.cilkplus.org/tutorial-array-notation).
ReplyDeleteThat Cilk thing goes against the entire STL by not using .start() .end() half-closed intervals. What were these guys thinking, or were they even thinking at all.
DeleteY combinator's thread on this topic mention that in 1985 Inside Macintosh explained why graphics coordinates are 0-based: https://news.ycombinator.com/item?id=6601515 (search for "Macintosh").
ReplyDeleteIf you count objects 0, 1, 2 you wind up with a value that's smaller than the number of objects. 0-based indexing works because it is not ordinal.
ReplyDeleteYour mention of graphics coordinates reminds of another index holy war about the location of the origin: top-left or bottom-left?
ReplyDeleteI like ot's description of indices as boundaries rather than labels. I think it gives both the 0th and 1st camps something to hang their hats on :).
Except the indices as boundaries analogy is broken. :-(
ReplyDelete"abcdefghijklmnop"[:3:-1]
Note that it ends in e, not in d as it should if you consider 3 as a boundary.
When I first encountered this, I thought it was a bug. Boundaries metaphor was something I deeply embraced.
Maybe it could stay that way? Guido, can slice semantics with negative stride be changed?
I agree that slices with negative strides are confusing. We can't change them (it would break too much code) but you can usually avoid them by using the reversed() function instead, e.g. ''.join(reversed("abcdefghijklmnop"[3:])). Still not ideal, but how often do you need a string backwards (apart from palindrome puzzles :-)?
ReplyDeleteYou've seen my code on CheckIO. More often than you think. :-P
ReplyDeleteBut seriously... not only strings, any indexable iterables have same problem. And you haven't seem to have problems with breaking code (especially without any confirmed examples) before. At least in Py4K? :-)
Over on python-ideas we're discussing negative strides now. The best way to use them is to also use negative indexes. E.g. a[::-1] == a[-1:-1-len(a):-1].
ReplyDeleteThis is an old post and interesting, but I think there is a hardware angle you aren't considering in indexing. I'm a EE prof, worked at IBM and Motorola previously. Hardware wants zero indexing. Think about a logical decoder (heart of SRAM memory driving word lines for instance). If you have a four bit decoder, you have options of 0 to 15 as the input so you drive out wordline0 to wordline15. 16 doesn't fit. Also, in a reverse FOR loop with a terminating value of zero, you can use the Z bit (zero flag - NAND of all index bits) for free. Many early assembly languages had instructions for this to avoid an explicit check for termination. Interesting to hear the CS perspective.
ReplyDeleteHi Eric, thanks for adding that point. In high school I built and used simple digital logic and in college I used CDC assembler, so I'm well aware of the hardware preference for 0-based indexing. This is also of course where C's 0-base indexing comes from. But Python, priding itself in being a high-level language, needs to be able to motivate its position in this matter without reference to the implementation.
ReplyDelete