
Bespoke Blog

~ Science! Culture! Computational Engines!

Tag Archives: computer science

Accidental Pilish: Unintentionally Constrained Writing in English Literature

26 Friday Mar 2010

Posted by nfitzgerald in computer science, python

Tags

computer science, constrained writing, gutenberg, language, nlp, Pilish, writing

Background:

This post is a little late for Pi Day, but it’s never a bad time for discourse related to everyone’s favourite mathematical constant. Twas on Pi Day of this year that I somehow came across this site, which describes the Constrained Writing task of Pilish, in which the length of each word in letters corresponds to the digits of pi:

The first word in this sentence has 3 letters, the next word 1 letter, the next word 4 letters, and so on, following the first fifteen digits of the number π.  A longer example is this poem with ABAB rhyme scheme from Joseph Shipley’s 1960 book Playing With Words:

But a time I spent wandering in gloomy night;

Yon tower, tinkling chimewise, loftily opportune.

Out, up, and together came sudden to Sunday rite,

The one solemnly off to correct plenilune.

Michael Keith, the author of the above website, has created several works in Pilish, including a full-length book covering the first 10,000 digits of pi!

Trying to write under such constraints can feel extremely awkward, but this made me wonder: how often do strings of words adhering to the constraints of Standard Pilish occur unintentionally? After all, with the amount of text out there – the sheer rate at which words are being put together by people all over the world every second of every day – it is to be expected that these things should occur with some frequency p > 0. Such is the Law of Large Numbers.

In order to determine this, I would need a large data set. Luckily, such things are readily available. I settled upon the Project Gutenberg ebook catalog – specifically the union of the July 2006 DVD (17,000 books) and the March 2007 Science Fiction Bookshelf CD (most of PG’s Sci-Fi titles). Altogether, this gave me almost 9GB of text. I later discovered that this contained many duplicates, but it’s still a hell of a lot of words!

Next I hacked together a small Python script which would find, for each file, the longest string of Standard Pilish. Code for this can be checked out from my SVN repository: http://svn.nfitz.net/pilish
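For readers without access to the repository, here is a minimal sketch of what such a search might look like. It is not the script from the SVN repository. It assumes the Standard Pilish encoding (a word of 1 to 9 letters gives one digit, a 10-letter word gives the digit 0, and a longer word gives two digits), assumes that a run must start at the first digit of pi, and ignores punctuation and sentence boundaries. The names PI_DIGITS, word_digits and longest_pilish_run are made up for illustration.

```python
import re

# First 40 digits of pi; far more than any accidental run is likely to need.
PI_DIGITS = "3141592653589793238462643383279502884197"


def word_digits(word):
    """Encode one word as Standard Pilish digits (assumed rules, see above)."""
    n = sum(ch.isalpha() for ch in word)  # count letters only, skip apostrophes
    if n == 10:
        return "0"
    return str(n)  # 1-9 letters -> one digit, 11+ letters -> two digits


def longest_pilish_run(text):
    """Return (digits_matched, words) for the longest run matching pi from its start."""
    words = [w for w in re.findall(r"[A-Za-z']+", text)
             if any(ch.isalpha() for ch in w)]
    codes = [word_digits(w) for w in words]
    best_len, best_words = 0, []
    for start in range(len(words)):
        matched, end = 0, start
        # Extend the run while the next word's digits continue the expansion of pi.
        while end < len(words) and PI_DIGITS.startswith(codes[end], matched):
            matched += len(codes[end])
            end += 1
        if matched > best_len:
            best_len, best_words = matched, words[start:end]
    return best_len, best_words


if __name__ == "__main__":
    sample = "Please may I have a drink? Certainly, Mr. Holmes; help yourself."
    print(longest_pilish_run(sample))
```

This brute-force scan restarts at every word, which is fine here because accidental runs are short; running it over a whole catalog would just mean calling longest_pilish_run once per file.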

Results:

Somewhat disappointingly, the longest Pilish string found in any book covered only 8 digits of pi. The vast majority of books had a longest Pilish string of around 3-5 words. See the histogram below (note the logarithmic scale on the y-axis).

Five books achieved this 8-digit benchmark, listed below, with the section of Pilish text bolded:

  • The Winning of Barbara Worth, by Harold B Wright (1872-1944)

Dismounting and throwing the reins over his horse’s head he came to her smiling, sombrero in hand. “Buenas dias, Senorita. Please may I have a drink?”

“Certainly, Mr. Holmes; help yourself.” She pointed to the olla hanging in the shade of the ramada.

  • Humphrey Bold- A Story of the Times of Benbow, by Herbert Strang

I was weary of the humdrum life of idling on shore or aimless sailing up and down the channel. The admiral’s was a peaceful mission, and no fighting was expected, but I felt a great curiosity to behold new scenes.

  • Captain Cook’s Journal During the First Voyage Round the World, by James Cook (1728-1779)

And I have a great Objection to firing with powder only amongst People who know not the difference, for by this they would learn to despise fire Arms and think their own Arms superior, and if ever such an Opinion prevailed they would certainly attack you, the Event of which might prove as unfavourable to you as them.

  • Lectures on Modern History, by Baron John Emerich Edward Dalberg Acton (1834-1902)

One was part of the empire, the other was enclosed in Poland, and they were separated by Polish territory. They did not help each other, and each was a source of danger for the other. They could only hope to exist by becoming stronger. That has been, for two centuries and a half, a fixed tradition at Berlin with the rulers and the people. They could not help being aggressive, and they worshipped the authority that could make them successful aggressors.

  • Anne Bradstreet and Her Time, by Helen Campbell (1839-1918)

With the most ambitious of the longer poems–“The Four Monarchies”– and one from which her readers of that day probably derived the most satisfaction, we need not feel compelled to linger. To them its charm lay in its usefulness. There were no sinful fancies; no trifling waste of words, but a good, straightforward narrative of things it was well to know, and Tyler’s comment upon it will be echoed by every one who turns the appallingly matter-of-fact pages…

That last one is the only one of the five to contain a word of double-digit length, which covers two digits of pi at once (‘straightforward’ = 15 letters = ‘15’).
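The five runs can be checked by counting letters. The sketch below does this for phrases picked out by hand from the excerpts above; which words form each run is my reading of the excerpts, so treat the phrase list as an assumption. It uses the same assumed Standard Pilish encoding (10 letters gives 0, longer words give two digits), and the names pilish_digits and RUNS are illustrative only.

```python
PI_PREFIX = "31415926"  # the first eight digits of pi


def pilish_digits(phrase):
    """Concatenate the Standard Pilish digits of every word in a phrase."""
    digits = []
    for word in phrase.split():
        n = sum(ch.isalpha() for ch in word)  # letters only
        if n:
            digits.append("0" if n == 10 else str(n))
    return "".join(digits)


# Phrases identified by hand from the five excerpts (an assumption, see above).
RUNS = {
    "The Winning of Barbara Worth": "may I have a drink Certainly Mr Holmes",
    "Humphrey Bold": "but I felt a great curiosity to behold",
    "Captain Cook's Journal": "And I have a great Objection to firing",
    "Lectures on Modern History": "and a half a fixed tradition at Berlin",
    "Anne Bradstreet and Her Time": "but a good straightforward narrative of things",
}

if __name__ == "__main__":
    for title, phrase in RUNS.items():
        print(title, pilish_digits(phrase), pilish_digits(phrase) == PI_PREFIX)
```

Note that the last phrase reaches eight digits with only seven words, because ‘straightforward’ contributes both the 1 and the 5.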

Future Work:

I would like to do a similar analysis of an even larger dataset of more modern language. One possibility is a full archive of Wikipedia. I wonder what the longest string of unintentional Pilish ever produced is?

Another interesting question is how the maximum length of Pilish sections in a document scales with the length of the document, and how well this can be modelled with a simple statistical model such as a Markov Chain.
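As a starting point for that question, here is a rough sketch of an even simpler baseline than a Markov chain: it treats word lengths as independent draws from the empirical length distribution of some sample text. Under that (strong) assumption, the chance that a given position starts a run of k digits is just the product of the matching length probabilities, and the expected number of such runs in N words is roughly N times that. The function names are hypothetical, and the sketch ignores words of 11 or more letters, which would cover two digits at once.

```python
import re
from collections import Counter

PI_DIGITS = "31415926535897932384"  # first 20 digits of pi


def length_probs(text):
    """Empirical distribution of word lengths (in letters) for a sample text."""
    lengths = [sum(ch.isalpha() for ch in w)
               for w in re.findall(r"[A-Za-z']+", text)]
    lengths = [n for n in lengths if n > 0]
    counts = Counter(lengths)
    return {n: c / len(lengths) for n, c in counts.items()}


def run_probability(probs, k):
    """P(a given position starts a k-digit run), assuming independent word lengths."""
    p = 1.0
    for d in PI_DIGITS[:k]:
        n = 10 if d == "0" else int(d)  # a 10-letter word stands for the digit 0
        p *= probs.get(n, 0.0)
    return p


def expected_runs(probs, k, n_words):
    """Approximate expected number of k-digit runs in a text of n_words words."""
    return n_words * run_probability(probs, k)
```

Feeding length_probs a large sample of real text and comparing expected_runs against the observed counts would show how much structure the independence assumption misses; a first-order Markov chain over word lengths would be the natural next step.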

OH-EM-GEE: An Epoch to Remember

10 Tuesday Feb 2009

Posted by nfitzgerald in computer science

Tags

computer science, time, UNIX

Drop what you’re doing and pay attention: The Interwebs have just informed me of something spectacular. Just over 70 hours from this moment, UNIX time will read 1234567890. Watch the countdown here. As far as I can figure, this is the last “cool” time we will see before the “Unix Millennium Bug” in 2038.
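To see what those numbers mean on a calendar, here is a small sketch that converts both timestamps to UTC. It assumes the usual reading of the 2038 problem, namely that the largest value a signed 32-bit counter of seconds can hold is 2**31 - 1; the helper name utc_from_unix is made up for illustration.

```python
from datetime import datetime, timezone


def utc_from_unix(seconds):
    """Convert seconds since 1970-01-01 00:00:00 UTC to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


cool_time = utc_from_unix(1234567890)
last_32bit_second = utc_from_unix(2**31 - 1)  # largest signed 32-bit value

print(cool_time.isoformat())          # 2009-02-13T23:31:30+00:00
print(last_32bit_second.isoformat())  # 2038-01-19T03:14:07+00:00
```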
