Strings hiding in plain sight

It's not often I come across something in Python that surprises me, especially in something as mundane as string operations. But I guess Python still has a trick or two up its sleeve.

Have a look at this string:

>>> s = "A"

How many possible sub-strings are in s? To put it another way, how many values of x are there for which the expression x in s is true?

Turns out it is 2.

Yes, 2.

>>> "A" in s
True
>>> "" in s
True

The empty string is in the string "A". In fact, it's in all the strings.

>>> "" in "foo"
True
>>> "" in ""
True
>>> "" in "here"
True
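One way to see why: slicing a string between two equal indices always gives an empty string, and a string of length n has n + 1 such positions (before, between and after its characters). Here is a small sketch. The helper name empty_slices is made up for illustration:

```python
def empty_slices(s):
    """Return every zero-length slice s[i:i], for i = 0 .. len(s)."""
    # A string of length n has n + 1 places an empty slice can sit:
    # before the first character, between each pair, and after the last.
    return [s[i:i] for i in range(len(s) + 1)]

for s in ("A", "foo", ""):
    slices = empty_slices(s)
    # Every one of those slices equals "", which is why "" in s is always True.
    print(repr(s), len(slices), all(piece == "" for piece in slices), "" in s)
```

For "A" that gives two positions, which lines up with the two sub-strings counted above ("" and "A" itself).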

Turns out the empty string has been hiding everywhere in my code.

Not a complaint; I'm sure the rationale is completely sound. And it turned out to be quite useful. I had a couple of lines of code that looked something like this:

if character in ('', '\n'):
    do_something(character)

In essence I wanted to know if character was an empty string or a newline. But knowing the empty string thang, I can replace it with this:

if character in '\n':
    do_something(character)

Which has exactly the same effect, but I suspect is a few nanoseconds faster.

Don't knock it. When you look after the nanoseconds, the microseconds look after themselves.
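A quick sanity check that the two tests agree is to run them side by side over a few inputs. The sample strings here are arbitrary picks, so treat this as a spot check rather than a proof:

```python
samples = ["", "\n", "a", " ", "\r", "\n\n", "ab"]

for character in samples:
    in_tuple = character in ('', '\n')  # explicit: empty string or newline
    in_string = character in '\n'       # relies on "" being in every string
    print(repr(character), in_tuple, in_string)
    assert in_tuple == in_string
```

The equivalence only holds for string inputs: 1 in ('', '\n') is simply False, while 1 in '\n' raises a TypeError.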

import timeit

char = ''
for _ in range(10):
    start = timeit.default_timer()
    for _ in range(100000):
        if char in '\n':
            pass
    print('string:{}'.format(timeit.default_timer() - start))

    start = timeit.default_timer()
    for _ in range(100000):
        if char in ('', '\n'):
            pass
    print('set :{}'.format(timeit.default_timer() - start))

import timeit

char = ''
for _ in range(10):
    start = timeit.default_timer()
    for _ in range(1000000):
        '' in '\n'
        '\n' in '\n'
        'f' in '\n'
    print('string:{}'.format(timeit.default_timer() - start))

    start = timeit.default_timer()
    for _ in range(1000000):
        '' in ('', '\n')
        '\n' in ('', '\n')
        'f' in ('', '\n')
    print('set :{}'.format(timeit.default_timer() - start))
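The same comparison can be written more compactly with timeit.timeit, which runs a statement a fixed number of times and returns the total elapsed time in seconds. This is a sketch rather than a benchmark. Timings vary by machine and Python version, so no results are shown, and the helper name compare is made up:

```python
import timeit

def compare(char, number=1_000_000):
    """Time `char in '\\n'` against `char in ('', '\\n')` for one input."""
    # globals= lets the timed statements see this particular `char` value.
    env = {"char": char}
    as_string = timeit.timeit("char in '\\n'", globals=env, number=number)
    as_tuple = timeit.timeit("char in ('', '\\n')", globals=env, number=number)
    return as_string, as_tuple

# Try the empty string, a newline and an ordinary character.
for char in ("", "\n", "f"):
    as_string, as_tuple = compare(char)
    print(f"{char!r:6} string: {as_string:.4f}s  tuple: {as_tuple:.4f}s")
```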

Adam Barnes —

I'd argue this is an anti-feature.

Looking at if character in '\n':, I'd consider "" matching a bug.

Man, this is a nice comment box so far. If I have to sign up to Disqus or something, I'm gonna be mad.


Anon Anon —

Why not just do if character == '\n':? Then it won't match an empty string.

Also, did you have to sign up to Disqus? I guess I'll find out by replying.
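For reference, here is how the three spellings treat the two inputs under discussion. It is a small sketch, and the helper name checks is made up:

```python
def checks(character):
    """Return (==, in str, in tuple) results for one input."""
    return (
        character == '\n',        # strict equality: only the newline matches
        character in '\n',        # substring test: "" matches as well
        character in ('', '\n'),  # tuple membership: "" listed explicitly
    )

for character in ("", "\n"):
    print(repr(character), checks(character))
```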


Anonymous —

It's set theory, not an anti-feature.


Will McGugan —

It is odd-looking. But these edge cases tend to be well thought out in Python by some very smart people, so I'm guessing there is some solid thinking behind it.

The comment system is home-grown. Did it work well? There may be some glitches left.


Terry Jan Reedy —

String containment is about substrings (slices), NOT about single characters: (sub in string) == any(sub == string[i:i + len(sub)] for i in range(len(string) - len(sub) + 1)).
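That definition can be turned into a small function and checked against the built-in operator. This is only an illustration, and the name contains is made up. In CPython the real str.__contains__ is implemented in C with a more efficient search algorithm than this brute-force loop:

```python
def contains(string, sub):
    """Brute-force substring test: does `sub` equal any slice of `string`?"""
    n = len(sub)
    # There are len(string) - n + 1 starting positions for a slice of length n.
    # When sub is "", n == 0, so every position (including the end) matches.
    return any(sub == string[i:i + n] for i in range(len(string) - n + 1))

cases = [("A", ""), ("A", "A"), ("", ""), ("foo", "oo"), ("foo", "x"), ("", "a")]
for string, sub in cases:
    # The brute-force version should agree with the built-in `in` every time.
    assert contains(string, sub) == (sub in string)
```

The + 1 in the range is what makes "" in "" come out True: there is exactly one (empty) slice to compare against.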


mborus —

Interesting find. Thanks for posting it. Personally, I'd prefer the readability of ('', '\n'), because it makes clear that you're checking for an empty string, which is not obvious when using the string version.

I also have a problem with your assumption:

Which has exactly the same effect, but I suspect is a few nanoseconds faster. Don't knock it. When you look after the nanoseconds, the microseconds look after themselves.

Did you measure the speed improvement or guess?

I did a small test (Python 3.5.2, 32-bit) and the behaviour is inconsistent. With the code below, exact matches to the set are slightly faster than checking the string.

Will McGugan —

Good point! You're right, I didn't test it.

Your test doesn't try all the possible inputs, which may perform differently. I tweaked it to try the empty string, a newline, and another character.

It does look like the string version is a tiny bit faster (and I do mean tiny).

You're definitely right about the readability. I'd only use this in a very tight loop and with a comment.

BTW the ('', '\n') is just a tuple. Didn't occur to me to try with a set.
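For completeness, a set version of the check could look like this. It is a sketch: the names EMPTY_OR_NEWLINE and is_empty_or_newline are hypothetical, and nobody in the thread measured whether a frozenset beats a two-element tuple here, so no speed claim is made:

```python
# Build the set once, outside any loop; membership is then a hash lookup.
EMPTY_OR_NEWLINE = frozenset(('', '\n'))  # hypothetical constant name


def is_empty_or_newline(character):
    # True for "" or a newline; same result as character in ('', '\n')
    # for str inputs (unhashable inputs would raise TypeError here).
    return character in EMPTY_OR_NEWLINE
```

A named constant also answers the readability concern: the empty string is listed explicitly, so nobody has to remember that "" is in every string.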

Albert Hopkins —

I actually did know about the "empty string is contained in every string" thing, but for me

if character in ('', '\n'):

is clearer to the reader than

if character in '\n':

because, although the latter works and may be a few "nanoseconds" faster, the former makes it clear to the reader that the effect is intentional.

Will McGugan —

No argument there.

I used it in a tight CPU-bound loop, with a comment. Still feels a little dirty...

Inyeol Lee —

As an elaboration, guess HOW MANY empty strings there are in 'abc'.

>>> 'abc'.replace('', '_')
'_a_b_c_'
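The count follows the same rule as slicing: a string of length n contains the empty string at n + 1 positions. A short sketch using only str.count and str.replace (the optional third argument of replace caps how many matches are replaced):

```python
s = 'abc'
# "" matches at each of the len(s) + 1 positions around and between characters.
print(s.count(''))            # 4
print(s.replace('', '_'))     # _a_b_c_
# Limiting replace to the first two matches leaves the tail untouched.
print(s.replace('', '_', 2))  # _a_bc
```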

Will McGugan —

That is peculiar. Replacing the empty string with something else actually makes more empty strings.
