Skip to content
February 4, 2012 / steve8

A quick look at pypy

As a lover of Python, I was excited to learn of efforts to make it run at speeds in the same class of C or Java.

Many programers consider python vs java as a battle of ease-of-programming vs speed…
Does pypy offer python programs the speed of statically typed languages?
Let’s throw a couple programs at it and see what happens:

I used the last version of a program built up in a previous article:

Python 3/2.7 :

from __future__ import division,print_function
import time
t0=time.time()
prime_list = []
maxx = 10**7
for n in range(2, maxx+1):
	isprime = True
	for p in prime_list:
		if n % p is 0:
			isprime = False
			break
	if isprime == True:
		prime_list.append(n)
def smallestPrimeFactor(n):
	for p in prime_list:
		if n % p == 0:
			return p
prime_count={p : 0 for p in prime_list}
for n in range(2, maxx + 1):
	prime_count[smallestPrimeFactor(n)] += 1
keys = []
values = []
in_sets = []
out_sets = []
for p in sorted(prime_count):
	keys.append([p])
	values.append(prime_count[p])
while len(keys) > 1:
	min1_value = min(values)
	min1_index = values.index(min1_value)
	min1_key = keys[min1_index]
	del values[min1_index]
	del keys[min1_index]
	min2_value = min(values)
	min2_index = values.index(min2_value)
	min2_key = keys[min2_index]
	del keys[min2_index]
	del values[min2_index]
	in_sets.append(set(min1_key + min2_key))
	out_sets.append(set(min1_key))
	keys.append(min1_key + min2_key)
	values.append(min1_value + min2_value)
#now that we have a partition answer-key (mapping of in_sets to out_sets)
#let's increment through the entire range of integers and compute the algorithm's performance
print("done with partition answer-key, time elapsed =", int(1000*(time.time()-t0)), "ms")
question_history = []
for n in range(2, maxx + 1):
	prime_subset = set(prime_list)
	question_count = 0
	while True:
		question_count += 1
		checkset = out_sets[in_sets.index(prime_subset)]
		if smallestPrimeFactor(n) in checkset:
			prime_subset = checkset
		else:
			prime_subset = prime_subset-checkset
		if len(prime_subset) == 1:
			#print( "number =", n, " smallest factor (>1) =", list(prime_subset)[0], " questions asked =", question_count)
			question_history.append(question_count)
			break
print( "worst case   =", max(question_history), "questions" )
print( "best case    =", min(question_history), "questions" )
print( "average case =", sum(question_history), "/", (maxx-1), "~=",sum(question_history)/(maxx-1), "questions" )
print( "time elapsed =", int((time.time() - t0)*1000)/1000,"secs")

The Results with a small N (maxx = 10^4):

$ python3 -V
Python 3.2.2
$ python3 primes.py
done with partition answer-key, time elapsed = 388 ms
worst case = 13 questions
best case = 1 questions
average case = 37473 / 9999 ~= 3.747674767476748 questions
time elapsed = 5.21 secs
$
$ python -V
Python 2.7.2
$ python primes.py
done with partition answer-key, time elapsed = 317 ms
worst case = 13 questions
best case = 1 questions
average case = 37473 / 9999 ~= 3.74767476748 questions
time elapsed = 4.477 secs
$
$ ./pypy-1.7/bin/pypy -V
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49)
[PyPy 1.7.0 with GCC 4.0.1]
$ ./pypy-1.7/bin/pypy primes.py
done with partition answer-key, time elapsed = 6043 ms
worst case = 14 questions
best case = 1 questions
average case = 37474 / 9999 ~= 3.74777477748 questions
time elapsed = 43.68 secs

Forgetting about performance for a moment, pypy yields a different result!
If it can’t imitate Python, in a black-box sense… then it’s a bit of a failure in my eyes.
Just for a sanity check I also installed pypy-1.7 on a windows box:

PS C:> .pypy-1.7pypy -V
Python 2.7.1 (930f0bc4125a, Nov 27 2011, 11:58:57)
[PyPy 1.7.0 with MSC v.1500 32 bit]
PS C:> .pypy-1.7pypy primes.py
done with partition answer-key, time elapsed = 3628 ms
worst case = 14 questions
best case = 1 questions
average case = 37474 / 9999 ~= 3.74777477748 questions
time elapsed = 26.488 secs

failed again, pypy-1.7 is the current shipping/stable release, let’s try the linux version:

# pypy -V
Python 2.7.1 (?, Nov 22 2011, 08:37:28)
[PyPy 1.7.0 with GCC 4.6.2]
# pypy primes.py
done with partition answer-key, time elapsed = 2951 ms
worst case   = 14 questions
best case    = 1 questions
average case = 37474 / 9999 ~= 3.74777477748 questions
time elapsed = 20.856 secs

again a failure, but before wiping pypy from my mind, I decided to check a nightly build to see if it’s also broken:

$ ./pypy-nightly/bin/pypy -V
Python 2.7.1 (e3fbdc682ed9, Jan 25 2012, 02:00:17)
[PyPy 1.8.1-dev0 with GCC 4.2.1]
$ ./pypy-nightly/bin/pypy primes.py
done with partition answer-key, time elapsed = 218 ms
worst case = 13 questions
best case = 1 questions
average case = 37473 / 9999 ~= 3.74767476748 questions
time elapsed = 6.486 secs
$

Success, it looks as if the latest nightly can at least imitate the functionality of normal python on this program.

[Carl Friedrich Bolz pointed out the issue in the comments], but in short It’s not really pypy’s fault, it may not imitate normal python properly, but it was behavior that normal python shouldn’t have, or more exactly: A programmer shouldn’t rely on any pythons having:
using ‘is’ as if it was ‘==’, even for “0 is 0″… see comments for details…

Looking at speed, It’s disappointing… While it is faster with the first part of the program, It doesn’t impress on the whole.. Let’s give it a larger workload and see what happens. (I’m including comparisons to an analogous java program from the same article)

python3 python2 pypy-nightly pypy-1.7 java
lines of code 64 64 64 64 121
chars of code 2086 2086 2086 2086 3980
10^5:
part 1: time (secs) 18.3 11.0 4.0 3.1 1.3
total time (secs) 368 329 391 277 136
10^6:
part 1: time (secs) 1234.0 1060.8 262.4 232.4 91.3
total time (secs) 38877 39107 42273 33011 18285

Well, pypy-1.7 is very fast at the first part of the program… closer to java speed than CPython. For the entire duration of the program, pypy-1.7 is a bit faster than Cpython, but not in a league with java.

Another test is with the code from this puzzle.
for your convenience, here’s the code:

import time
def incementlistx(listb7):
    idx = 0
    go = True
    listb7[idx]+=1
    while go:
        go=False
        if listb7[idx]==7:
            if idx == 0:
                listb7[idx]=1
            else:
                listb7[idx]=0
            if len(listb7) 5001008001005:
                    keep=False
                    break
python3 python2 pypy-1.7 pypy-nightly
time (secs) 5594 3257 604 698

It’s faster here, significantly so. This is what I expected with pypy from the start.

versions (unless otherwise specified):

python3:
3.2.2 (default, Nov 21 2011, 16:50:59)
[GCC 4.6.2]

python2:
2.7.2 (default, Nov 21 2011, 17:25:27)
[GCC 4.6.2]

pypy-1.7:
Python 2.7.1 (?, Nov 22 2011, 08:37:28)
[PyPy 1.7.0 with GCC 4.6.2]

pypy-nightly:
2.7.1 (c9343ef21049, Jan 28 2012, 02:54:12)
[PyPy 1.8.1-dev0 with GCC 4.4.3]

java:
java version "1.7.0_147-icedtea"
OpenJDK Runtime Environment (IcedTea7 2.0) (ArchLinux build 7.b147_2.0-5-x86_64)
OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)

Looks like PyPy offers significant speed gains in certain situations… It might be worth testing out with your program.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: