Python is another scripting language - two things about it caught my eye. First, the GUI module included with it works on the Macintosh (unlike with Perl!) and second, it is supposedly very popular with people at Google. Since I am a big fan of Google, I figured I would give it a look.
Slowly, slowly I'm learning Python. And this evening (morning?) I finished my first really useful Python script (from my point of view). It duplicates what my clipsort.pl Perl script does - but does it better. Why better? Because it includes dialog boxes for picking and saving the file and it can be turned into an app that I can run like any other applet. Very cool.
Because the source is kinda long I will put it on its own page python_dom_sort. The big thing for me with this applet is that I used the Document Object Model (light) to manipulate data in an XML document. It was much easier than I expected. While learning DOM was a tad confusing, it was faster than writing a parser to create my own funky XML object tree. Python seems so powerful because the modules that come with it… are well… very powerful.
The source code isn't that long.
What also makes Python powerful is that it is built for manipulating lists. With surprisingly few lines and a rich grammar you can cycle through lists of things, filtering out what you don't want and keeping just the pieces that you do.
What I did was create a simple Main that provides the file dialogs from EasyDialogs, a Mac-specific UI package. I guess I could have used Tkinter (and maybe one day I will), but these worked so easily that they reminded me of AppleScript, so I used them. Besides, it's not like I can take this script to another box. It's designed to work with iMovie files which… will only appear on a Mac!
The script calls getElementsByTagName() to return a list of array objects, and works with tuples that contain 4 items (the name of the dict object, a ref to the dict object, and copies of the refs to the dict's siblings - previous and next). What made all this trickier is that the Python parser treats newlines '\n' as a text element. Since these are sprinkled throughout the document, I had to do two things.
This created a few side effects but a couple great list iterators were:
slots = [s for s in range(len(sortRoot.childNodes)) if sortRoot.childNodes[s].nodeType == 1]
dlist = [d for d in sortRoot.childNodes if d.nodeType == 1]
The first line is a classic filtered loop. It builds a list one element at a time from the values s takes on. The purpose of this loop is to find the index value for every dict node in the array (skipping the dreaded text elements). As s needs to be an integer, we build the integer count with the command range(len(sortRoot.childNodes)), which basically builds a list of numbers from 0 up to (but not including) the number of childNodes our sort root has. Then for each value of s, we check to see if that childNode has a node type of 1 (it's a DOM Element, NOT a text element). When the loop is finished, slots contains a list of integers which, used as indexes into sortRoot.childNodes, would produce all the dict references in the array.
The dlist group does something very similar in that it loops through all the childNodes of sortRoot but only puts ones into the list that are DOM Elements (thus skipping the text elements).
That's some pretty efficient stuff, creating a loop, building a list, and testing for membership all in one line. Very nice.
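To see the text-element problem in isolation, here is a minimal sketch using xml.dom.minidom. The sample XML and the names SAMPLE and doc are made up for illustration; this is not the real iMovie file or the full script, just the two filters applied to a tiny array.

```python
from xml.dom import minidom, Node

# Hypothetical stand-in for the real document: an <array> holding two <dict>
# elements, with newlines between them just like a hand-formatted XML file.
SAMPLE = """<array>
<dict><key>name</key><string>clip B</string></dict>
<dict><key>name</key><string>clip A</string></dict>
</array>"""

doc = minidom.parseString(SAMPLE)
sortRoot = doc.getElementsByTagName("array")[0]

# The parser keeps each newline as a text node, so the array has 5 children,
# not 2: text, dict, text, dict, text.
print(len(sortRoot.childNodes))  # 5

# Node.ELEMENT_NODE is the named constant for the value 1.
slots = [s for s in range(len(sortRoot.childNodes))
         if sortRoot.childNodes[s].nodeType == Node.ELEMENT_NODE]
dlist = [d for d in sortRoot.childNodes if d.nodeType == Node.ELEMENT_NODE]

print(slots)                       # [1, 3]
print([d.tagName for d in dlist])  # ['dict', 'dict']
```

Using Node.ELEMENT_NODE instead of the bare 1 is only a readability choice; the comparison is the same.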
I wrote a script called findDups.py to locate all the duplicates in an archive of photographs. It's pretty intense so a new article is here: pictureTools
One feature of SGMLParser that looked kinda neat is that you can define a method in a subclass and the parent class will call it. Huh?
Basically, you can create a master object class that will call functions which are defined by its descendants. You can do this:
class myObj:
    def __init__(self):
        self.reset()

    def reset(self):
        self.x = 2

    def process(self, method_name):
        # Look the method up by name; a descendant may have defined it.
        try:
            method = getattr(self, method_name)
        except AttributeError:
            # No such attribute anywhere in the class hierarchy.
            self.dummy(method_name)
            return
        if method:
            method()
        else:
            self.unknownMethod()

    def unknownMethod(self):
        print('Unknown Method')

    def dummy(self, name):
        print('Dummy Called: %s' % name)

class myDesc(myObj):
    def reset(self):
        self.y = 3

    def foo(self):
        print('Foo got called!')

>>> o = myDesc()
>>> o.process('foo')
Foo got called!
Basically, the base class provides a mechanism (using getattr()) to let future classes derive functions (add methods) and have those methods called from the base class.
Because Python treats function names as variables (practically everything is a reference) you can create function names on the fly and use getattr() to call those functions.
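Here is a small sketch of building a method name on the fly and looking it up with getattr(). The start_ prefix mimics the naming style SGMLParser uses for its tag handlers; the class names Dispatcher and LinkCounter and the method names handle and unknown are hypothetical, chosen just for this example.

```python
class Dispatcher:
    """Base class: knows how to dispatch, but defines no tag handlers."""

    def handle(self, tag):
        # Build the method name at run time, e.g. 'a' -> 'start_a'.
        # The third argument to getattr() is returned if nothing is found.
        method = getattr(self, "start_" + tag, None)
        if method is None:
            return self.unknown(tag)
        return method()

    def unknown(self, tag):
        return "unknown tag: %s" % tag


class LinkCounter(Dispatcher):
    """Descendant: adds a handler that the base class finds by name."""

    def __init__(self):
        self.links = 0

    def start_a(self):
        self.links += 1
        return "saw a link"


counter = LinkCounter()
print(counter.handle("a"))  # saw a link
print(counter.handle("p"))  # unknown tag: p
print(counter.links)        # 1
```

Passing a default to getattr() avoids the try/except, which keeps the base class short; either style works.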
One of the most common bugs in programming is the “off-by-one” bug. Either you didn't count enough of x, or you counted one too many of x. Python seems to take a different approach. Its core idea seems to be built around two types of lists: lists, which can be changed, and tuples, which are lists that cannot be changed.
lists = mutable (changeable)
tuples = immutable (unchangeable)
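A quick sketch of the difference (the clip names are made up): a list accepts changes in place, while assigning into a tuple raises a TypeError.

```python
# A list can grow and have its items replaced.
clips = ['intro', 'beach', 'credits']
clips.append('outtakes')
clips[0] = 'title'
print(clips)  # ['title', 'beach', 'credits', 'outtakes']

# A tuple refuses item assignment.
frozen = ('intro', 'beach', 'credits')
try:
    frozen[0] = 'title'
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(tuple_changed)  # False
print(frozen)         # ('intro', 'beach', 'credits')
```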
They have a simple but powerful construct that iterates over lists. You should almost never iterate yourself (think of it as manual search) but if you use filters, indexes, finds, and ins, you can do almost anything to a list of objects without having to worry about the dreaded off by one problem.
In fact, to make this work, they even have a function called range() which converts a number into a list (so that it can be consistently iterated over).
Example:
Instead of
for ($i = 0; $i <= $#v; $i++) { $sum += $v[$i]; }
Python would do something like:
for val in v: sum += val
That may be too trivial an example, but a better one from a script I recently wrote is kind of interesting.
myFiles = os.listdir(os.getcwd())
needsFixn = [f for f in myFiles if '%' in f]
The first line is pretty straightforward: we get a list of files from the os module via its listdir function. os.getcwd() makes sure we are looking in the current directory.
The second line contains the magic. It creates a list… from an iterator (f for f in myFiles), but it also filters that list as it's created by the test if '%' in f. Basically, if the file name string f contains a percent sign, then we need to fix the file name. The new list needsFixn contains only file names that need to be fixed.
Notice there is no loop counter, no incrementing of an index, no end test. We just walk through each element in the list and test it against a condition, adding it to the new list if the condition is true, or tossing it out (skipping) if it fails. That's some pretty crazy stuff, but from what I can tell very Pythonic.
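To make the comparison concrete, here is the same filter written both ways. The file names are invented so the sketch runs anywhere without touching a real directory.

```python
# Hypothetical directory listing, standing in for os.listdir(os.getcwd()).
my_files = ['report.xls', 'Q1%20Budget.xls', 'notes.txt', 'Team%20List.doc']

# The list comprehension: no counter, no end test.
needs_fixing = [f for f in my_files if '%' in f]

# The index-based version, where an off-by-one mistake could creep in.
explicit = []
for i in range(len(my_files)):
    if '%' in my_files[i]:
        explicit.append(my_files[i])

print(needs_fixing)              # ['Q1%20Budget.xls', 'Team%20List.doc']
print(explicit == needs_fixing)  # True
```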
When I download spreadsheets and such from Microsoft Web Outlook, the spaces always seem to be changed to %20. Very annoying. (Love Microsoft's cross-platform compatibility.) So here's a quick way in Python to fix those. It's cumbersome and eventually I'd like to write a simple macro to fix it, but here it is.
Select the file and press the ENTER key, this selects and edits the file name. Copy the name, then switch to IDLE or the Python Shell and paste it into:
>>> '<command-v>'.replace('%20',' ')
For example:
>>> 'This%20File%20is%20messed%20up'.replace('%20',' ')
'This File is messed up'
This little script takes advantage of the fact that any string is an object.
Cool
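One possible shape for the "simple macro" mentioned above is sketched here. It is only a sketch under assumptions: the function name fix_percent20 is hypothetical, it only handles %20 (not other percent escapes), and it skips a file rather than overwrite an existing one.

```python
import os

def fix_percent20(directory):
    """Rename files in `directory`, replacing '%20' with a space.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for name in os.listdir(directory):
        if '%20' not in name:
            continue
        new_name = name.replace('%20', ' ')
        src = os.path.join(directory, name)
        dst = os.path.join(directory, new_name)
        if os.path.exists(dst):
            # Assumption: never clobber a file that already has the clean name.
            continue
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed

# Example use (renames files in the current directory, so try it on a copy):
# for old, new in fix_percent20(os.getcwd()):
#     print('%s -> %s' % (old, new))
```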