A First Date With Python

I started playing Endless Sky recently. It has a lot to recommend it; if you like 4X games, it's worth checking out. One of its features is a fairly rich array of ships and components that you can mix and match to suit your taste. I wanted to experiment with different configurations, and I couldn't find an outfitting tool. So, I set about building my own. One of the game's other nice features is that it is completely free and open source. This is super helpful, because I can just use the actual data files rather than having to spend a ton of time exploring and reverse engineering the stats on everything.

Why Python

The files are organized in a lightweight heirarchical tree structure, where parent-child relationships are described by indentation levels. Here's a snippet of one of those trees to illustrate what I mean.

ship "Combat Drone"
	sprite "ship/combat drone"
	licenses
		Navy
	attributes
		category "Drone"
		"cost" 83000
		weapon
			"blast radius" 5
			"shield damage" 50
			"hull damage" 25
			"hit force" 75
	outfits
		"Beam Laser"
		"X1700 Ion Thruster"
		"X1200 Ion Steering"
		
    description "Combat drones are pilotless attack ships used primarily by the Republic Navy."

The way the data is organized, where indentation is structurally meaningful, put me in mind of Python. Python should also make it relatively easy to build this into a portable desktop app that would run on any system that also runs the game. Plus, learning new things is half the fun.

Getting Ready

This is my absolute first outing with Python. While I'm building some basic familiarity with the syntax and the standard library, I want to have some small contained problems to work with. Project Euler is a good place to start for that kind of thing. I've only done some of the beginning problems, but the one which I found most educational was Problem 7. The task is to find the 10,001st prime number. Since these are still small numbers from a mathematics point of view, I didn't have to do anything exotic; a Sieve of Eratosthenese worked just great. You may not know it by that name, but the technique is the same one as you use to find prime numbers by hand. You have a sequence of numbers you're going to test, and a list of known prime numbers. For each of the numbers you want to test, you try dividing it by all of the prime numbers in your list. If any of them divide evenly, then your test number is not prime and you can move on to the next one. If none of them divide evenly, then you have found a new prime; add it to the list. There are some simple optimizations you can make, but the basic algorithm performs very well as long as the numbers you're testing are small enough to fit in your system's maximum integer size (8 bytes on most systems, or a bit more than 9 quintillion, or 18 quintillion if you can use an unsigned type).

The way I would usually implement a Sieve of Eratosthenese is to have one list of numbers, and then starting from the beginning do the trial division on every later number in the list, and remove the ones that fail. That method eliminates any duplicate tests and saves on memory by not storing two lists. This turns out not to work well with Python. Python has some very robust mechanisms for manipulating and iterating on lists, but they also make it intentionally difficult to alter the list mid-iteration. Another thing that Python has is the range() function. It returns a memory efficient iterable object. So instead of iterating over an existing list, we'll instead iterate over a dynamic sequence, and build a list of primes as we go. I didn't try to do an actual comparisson, but my sense is that this method will perform slightly better for smaller numbers and slightly worse for larger ones. It also saves a bit of unnecessary work of you're looking for the Nth prime rather than all primes less than X. You can see my actual solution on github.

The other problem from Project Euler that I thought was very helpful was #8, which is to find the 13 digit sequence which has the largest product from a 1,000 digit number. I think this problem actually starts out as a text manipulation problem. I start by treating the 1,000 digit number as a string, and split the string on zeroes. Because any sequence that contains a 0 will have a product of 0, so they can be discarded. Then I also discard any sequence which is not at least 13 digits long. The last bit of not-really-math was to break up sequences longer than 13 digits. For instance, if I needed 4 digit sequences, 123456 would become [1234, 2345, 3456]. In many of these transformations, Python's slice syntax and list comprehension were extremely helpful. Slice syntax is like typical array syntax on steroids, and allows you to specify ranges and sequences from a list with arbitrary start and end points. List comprehension allows for even more advanced selection from a list, or even constructing one from scratch. So for instance, the code to discard all of the sequences that were too short to consider is sets = [s for s in sets if len(s) >= 4] where sets was already a list. Of course you can do exactly the same thing in other languages, but I'm finding this syntax to be very readable, and I like that I can use the same iteration mechanism as everywhere else.

We Should Do This Again Sometime

The original motivation for this was to build a tool for Endless Sky. I'll have another post about that soon. But I plan to keep working with Python, which is what this post has been about. I like how easy it is to work with lists, and since strings are lists of characters, it's easy to work with text too. I like that Python has real classes and can be properly object oriented. Basically, I like Python enough to keep using it.