So, I’m taking a class this year on high performance computing, and I figured I might as well kill two birds with one stone: write some blog posts, and also get some studying done. Let’s get to it!
What Is OpenMP?
OpenMP is an API for working with shared memory parallel computers. Essentially everyone now owns one of these machines, as any multi-core machine is a shared memory parallel machine. What it isn’t is a tool for programming distributed memory systems (like a Beowulf cluster), and it isn’t primarily a GPU programming tool either, although versions 4.0 and later of the specification do add offloading to accelerator devices.
OpenMP is one of the fastest and easiest ways to squeeze extra performance out of modern multicore CPUs.
How to Set Up OpenMP?
Unlike some parallel tools (I’m looking at you CUDA 2 years ago), OpenMP is ridiculously easy to set up. If you are running a Debian-like system, it is just:
apt-get install libgomp1
And that’s it! All you need to do now is compile your code, as you normally would, with gcc and the -fopenmp flag.
How easy is that?
In the next post, we will write some simple C code using OpenMP.
One of the projects in my list of stuff I’ll get around to is making a 3D unprinter: a machine that can melt a thermoplastic object down and extrude it back into filament. McMaster has this cool course called Sustainable Future, and part of the course is for the students to do a real world project involving sustainability. I pitched the idea to the class, and I’ve got a team of 4 students now working with me to build one! We’re blogging here, and we’ve set up a GitHub repo here. Watch our progress; we should have a good prototype by December.
I have always had a problem with the concept of intellectual property. The great western tradition of post-Enlightenment values has always placed the free flow of art and ideas on a pedestal, as a sacrosanct cornerstone of a just society. That the ideas living in our heads and flowing from our lips were the domain of no king, pope, or policeman is one of the most important cultural norms that has emerged from the Enlightenment into modern liberal democracies. The legal constructs associated with intellectual property, in my evaluation, cannot be reconciled with this. A corpuscle of information cannot be at once free to be spoken or expressed and also be the property of some individual or corporation. Information theory, the fantastic work pioneered by Claude Shannon, only swells my distaste for intellectual property. We know now that with simple coding, all information is reducible to a common binary form. Film, print, music, photography: all is merely a collection of ordered bits. This makes the idea of owning information all the more ridiculous, as the process can be just as easily reversed: a song can be represented by a string of Shakespeare quotations, a movie can be rendered in musical score. As an illustration of this, I’ve written a short program that takes any file and converts it to a long, rambling nonsense-poem. Poetry as Piracy.
Making the Wordlists
The first step is generating a set of words to use to generate our poems, categorized by their grammatical type. To do this, I downloaded the English Wiktionary. I then used grep, sed, and awk to split it into plain lists of words: nouns, past tense verbs, present participle verbs, and adjectives. I then shuffled these lists, and trimmed them down so that their length was a power of 2. I didn’t need to do this, but it simplified the work slightly. In the end, I was left with 17 bits worth of information stored in each noun (131,072 words), 13 bits in each past-tense verb (8192), 13 bits in each present-participle verb (8192), and 15 bits for each adjective (32,768).
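The shuffle-and-trim step can be sketched in a few lines of Python. This is only an illustration, not the grep/sed/awk pipeline I actually used: the function name trim_to_power_of_two, the fixed seed, and the synthetic "noun" list are all made up for the example.

```python
import random


def trim_to_power_of_two(words, seed=0):
    """Shuffle a word list and keep the largest power-of-two prefix.

    Returns (trimmed_words, bits), where len(trimmed_words) == 2 ** bits,
    so every word in the trimmed list can stand for exactly `bits` bits.
    """
    if not words:
        raise ValueError("word list is empty")
    shuffled = list(words)
    # A fixed seed keeps the shuffle reproducible (an assumption made here
    # so that an encoder and a decoder would agree on the word order).
    random.Random(seed).shuffle(shuffled)
    bits = len(shuffled).bit_length() - 1  # floor(log2(len))
    return shuffled[: 2 ** bits], bits


# Synthetic stand-in for a real noun list (hypothetical data).
nouns = ["noun%d" % i for i in range(150000)]
trimmed, bits = trim_to_power_of_two(nouns)
print(len(trimmed), bits)
```

With 150,000 made-up nouns, this keeps 131,072 of them, which is 17 bits per noun.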
I then decided on two rough sentence skeletons:
The ADJECTIVE NOUN PAST-VERBED the ADJECTIVE NOUN.
ADJECTIVE NOUN is PRESENT-VERBING the ADJECTIVE NOUN.
Each of those sentences can store 77 bits of information (15 + 17 + 13 + 15 + 17). A 1 Mb file, for example, will require roughly 10,000 sentences, or about a novelette’s worth of words. If that 1 Mb file were a copyrighted song, you would not in fact have the freedom to print and distribute your nice new novel (not that you would want to, it would be random nonsense).
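Here is the arithmetic behind those numbers as a small Python sketch. The names are made up for the example, and I am assuming that "1 Mb" means 1,000,000 bits; under that reading the exact count is a little under 13,000 sentences, the same ballpark as the rough figure above.

```python
import math

# Bits carried by each word type, taken from the word list sizes above.
BITS = {"ADJECTIVE": 15, "NOUN": 17, "PAST-VERBED": 13, "PRESENT-VERBING": 13}

# Word slots in the two sentence skeletons, in reading order.
PAST_SKELETON = ["ADJECTIVE", "NOUN", "PAST-VERBED", "ADJECTIVE", "NOUN"]
PRESENT_SKELETON = ["ADJECTIVE", "NOUN", "PRESENT-VERBING", "ADJECTIVE", "NOUN"]


def bits_per_sentence(skeleton):
    """Total number of bits stored by the word choices in one sentence."""
    return sum(BITS[slot] for slot in skeleton)


def sentences_needed(num_bits, sentence_bits=77):
    """Number of sentences needed to hold num_bits, rounding up."""
    return math.ceil(num_bits / sentence_bits)


print(bits_per_sentence(PAST_SKELETON), bits_per_sentence(PRESENT_SKELETON))
print(sentences_needed(1_000_000))  # assuming 1 Mb = 10**6 bits
```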
Encoding the File
Now, 77 bits is a bit awkward. Just choosing between the two sentence types gives me 1 bit of information. I also get punctuation at the end. If I end each sentence with either a period, exclamation mark, two exclamation marks, or three exclamation marks, that gets me an extra two bits of information. This gets me up to 80 bits per sentence, or 10 bytes. I can now easily encode my data as nonsense poetry! I use the first bit to select which tense of verb, the next two decide if I get a period or exclamation series, and the rest determine the sentence itself. If my file’s length isn’t a multiple of 10 bytes, I simply add an additional line at the end:
All that remains are NUM memories and NUM regrets.
Here the first NUM is the base-10 representation of the remaining bytes, and the second NUM is the number of bytes remaining (needed because a long string of leading zeros would get truncated in converting to decimal).
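To make the bit layout concrete, here is a minimal Python sketch of splitting one 10-byte block into its fields and of writing the leftover line. The field widths come from the description above, but the order of the word fields, the big-endian byte order, and all the function names are assumptions for the example, not necessarily what my program does.

```python
import re

# Field layout for one 80-bit block, most significant bits first.
# Widths are from the text; the ordering of the word fields is assumed.
FIELDS = [
    ("sentence_type", 1),  # selects the verb tense / sentence skeleton
    ("punctuation", 2),    # ".", "!", "!!" or "!!!"
    ("adjective_1", 15),
    ("noun_1", 17),
    ("verb", 13),
    ("adjective_2", 15),
    ("noun_2", 17),
]
BLOCK_BYTES = 10  # 1 + 2 + 77 bits = 80 bits

TAIL = re.compile(r"^All that remains are (\d+) memories and (\d+) regrets\.$")


def split_block(block):
    """Split a 10-byte block into a dict of integer field values."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected exactly %d bytes" % BLOCK_BYTES)
    value = int.from_bytes(block, "big")
    remaining = 8 * BLOCK_BYTES
    fields = {}
    for name, width in FIELDS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


def join_fields(fields):
    """Inverse of split_block: pack the field values back into 10 bytes."""
    value = 0
    for name, width in FIELDS:
        value = (value << width) | fields[name]
    return value.to_bytes(BLOCK_BYTES, "big")


def encode_tail(rest):
    """Write the final line for the bytes that do not fill a whole block."""
    return "All that remains are %d memories and %d regrets." % (
        int.from_bytes(rest, "big"), len(rest))


def decode_tail(line):
    """Recover the leftover bytes; the count restores any leading zeros."""
    match = TAIL.match(line)
    if match is None:
        raise ValueError("not a tail line: %r" % line)
    return int(match.group(1)).to_bytes(int(match.group(2)), "big")


print(split_block(bytes(range(10))))
print(encode_tail(b"\x00\x00\x07"))
```

Each field value is then used as an index into the matching word list. The tail example shows why the second number is needed: three bytes with two leading zeros would otherwise come back as a single byte.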
Decoding the File
Decoding the file is as simple as just reading in each line, checking what sentence type it is, and what the punctuation at the end is, and returning it to the original binary form!
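A rough Python sketch of the first half of decoding, recognising the sentence type and the punctuation, might look like this. It assumes every word is a single token with no spaces or trailing punctuation, that 0 means the past-tense skeleton, and that the punctuation codes run from a period up to three exclamation marks; those choices and the name parse_line are mine for the example.

```python
PUNCTUATION = [".", "!", "!!", "!!!"]  # list index = 2-bit value (assumed order)


def parse_line(line):
    """Return (sentence_type, punctuation_code, words) for one sentence.

    sentence_type is 0 for the past-tense skeleton and 1 for the
    present-tense one (an assumed convention). words holds the five
    chosen words in reading order, exactly as written in the line.
    """
    line = line.strip()
    body = line.rstrip(".!")
    # Raises ValueError if the ending is not one of the four allowed forms.
    punctuation_code = PUNCTUATION.index(line[len(body):])
    tokens = body.split()
    if len(tokens) != 7:
        raise ValueError("unrecognised sentence: %r" % line)
    if tokens[0] == "The" and tokens[4] == "the":
        # The ADJECTIVE NOUN PAST-VERBED the ADJECTIVE NOUN
        sentence_type = 0
        words = tokens[1:4] + tokens[5:7]
    elif tokens[2] == "is" and tokens[4] == "the":
        # ADJECTIVE NOUN is PRESENT-VERBING the ADJECTIVE NOUN
        sentence_type = 1
        words = tokens[0:2] + [tokens[3]] + tokens[5:7]
    else:
        raise ValueError("unrecognised sentence: %r" % line)
    return sentence_type, punctuation_code, words


print(parse_line("The red dog chased the blue cat!!"))
print(parse_line("Red dog is chasing the blue cat."))
```

Looking each word up in its word list gives back the index it was chosen by, and packing those indices together with the two codes reproduces the original 10 bytes.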
I’ve been warned that I sometimes veer too far in the direction of toolmaker away from the standard path followed by most scientists. Try as I might, I cannot seem to avoid finding the process of doing science nearly as interesting as the goal of getting that science done. And so, my mind has been orbiting around a problem I suspect is endemic amongst all physicists, if not all scientists. That problem, captured so nicely by this PhD comic, is that of filesystem cruft. Science, being at its core an experimental art, produces for every successful idea a whole panoply of failed experiments, mistakes, and generally messed-up crap. Being paranoid creatures consumed by our own fears, along with the awareness that serendipity has been a cornerstone of great work, we are loath to sweep these ill-fated children of the mind into the trash where they (mostly) belong. And so those of us who rely on computers for most of our day-to-day work end up with home directories filled to the brim with old scripts, corrupted data files, a dozen different versions of the same list of values, and other digital detritus. And this situation makes for errors, confusion, thousand-yard stares, anal leakage, and other evils too foul to discuss in polite company. Just looking at my /home directory on my workstation at the University, I have more than 100,000 files sitting around, waiting for me to stare at them for a quarter hour trying to remember what they were for.
Inspired by a reddit image post (which I cannot for the life of me find again), I decided to take a series of photos of the sunset from my parents’ house at Cedar-by-the-Sea, Vancouver Island. I took many photos over the course of several hours using a digital camera fixed in position on a tripod.
I thought it would look good to blend the images one into the other, so I wrote a quick Python script using the Python Imaging Library. The script blends consecutive images using linear interpolation. An artistic choice to make was how wide the blended regions should be. I tried everything from relatively thin blending regions:
To almost completely blended images:
In the end, however, I decided that what looked the best was actually to have no blending, but rather sharp boundaries between the images. This accentuates the effect I was going for, which was to show the changing light over time. Blending the images together lessens the effect, rather than enhancing it as I had hoped. I plan to get the finished product printed and framed:
Here’s the code for the script I used (apologies for quick-and-dirtiness):
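import sys

from PIL import Image


def imageblend(imdir, numimages=5, blendwidth=0):
    """Stitch vertical strips of im1.jpg..imN.jpg, blending across each seam.

    imdir must end with a path separator, since filenames are appended to it.
    """
    if blendwidth % 2 != 0:
        raise Exception('blendwidth not even')
    im = Image.open(imdir + "im1.jpg")
    (width, height) = im.size
    for i in range(1, numimages):
        imnum = i + 1
        # Column where the previous image hands over to image number imnum.
        # Integer division keeps the pixel coordinates whole numbers.
        centre = i * width // numimages - 1
        im_i = Image.open(imdir + 'im%d.jpg' % imnum)
        # Blend one pixel column at a time, ramping linearly between images.
        for x in range(blendwidth):
            col_ind = centre - (blendwidth // 2) + x + 1
            # Crop boxes exclude their right and lower edges, so use the
            # full height rather than height - 1 to keep the bottom row.
            col_box = (col_ind, 0, col_ind + 1, height)
            col_o = im.copy().crop(col_box)
            col_i = im_i.copy().crop(col_box)
            col = Image.blend(col_o, col_i, float(x) / blendwidth)
            im.paste(col, col_box)
        # Everything to the right of the blend region comes from the newer image.
        rest_box = (centre + blendwidth // 2 + 1, 0, width, height)
        rest = im_i.copy().crop(rest_box)
        im.paste(rest, rest_box)
    im.save(imdir + "im_output.jpg")


def main():
    # The image directory is the first command-line argument.
    imdir = sys.argv[1]
    imageblend(imdir)


if __name__ == '__main__':
    main()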
So, as you are all fully aware, I have been silent for the past few weeks. Moving across the country can do that to you. Now that I am no longer living out of boxes, expect a rapid catchup as I make up the posts I missed.
I’ll be keeping track at the bottom of my posts.
On my first day of University five years ago, whilst shopping for dorm supplies, I bought this large beer-and-hockey themed piggy bank. Since then, I have deposited any coins smaller than a quarter (i.e. pennies, nickels, and dimes). Today, just 2 weeks shy of moving to the US to start grad school, it was time to finally cash in.
Before counting, I wanted to see if I could reasonably estimate how much money there would be. I considered doing a “random sample” approach, counting the value of a small portion and scaling up to the full weight. Unfortunately, the only means at my disposal to weigh the samples was a bathroom scale intended for weighing people in 0.1kg increments, so I didn’t think this would be accurate enough. Instead I weighed the entire piggy-bank (which came out to an impressive 4.2kg) and made some simple estimations of what the relative proportions of the coins would be as so:
I just got back from the University of Calgary’s fantastic Rothney Astrophysical Observatory. Since there is a new moon in Calgary, we have had late night open houses yesterday, today, and one tomorrow from 10PM until 2AM. Since I am exhausted, let me show you the awesome picture of the beautiful tendrils of cool dust in the Eagle Nebula we were able to capture using the 16″ Clark-Milone Telescope:
Well, I was hoping to make a more interesting post today, but seem to have lost the route through my cluttered mind to get to the synapses that store my GitHub private key passphrase. So, in an attempt to keep up the quantity if not the quality of my blogging, let me turn to this dire state of mnemonic affairs. I can’t remember my passwords very well anymore. I’ve always been leery of password storage utilities, but I think I need to rethink them. I keep my machines with full disk encryption, and commit those passwords well into my memory, so I should be somewhat secure, right? Talk me into this, dear readers, or tell me the path of folly I am embarking upon.