
Bespoke Blog

~ Science! Culture! Computational Engines!


Accidental Pilish: Unintentionally Constrained Writing in English Literature

Friday 26 Mar 2010

Posted by nfitzgerald in computer science, python


Tags

computer science, constrained writing, gutenberg, language, nlp, Pilish, writing

Background:

This post is a little late for Pi Day, but it’s never a bad time to talk about everyone’s favourite mathematical constant. ’Twas on Pi Day of this year that I somehow came across this site, which describes Pilish, a form of constrained writing in which the number of letters in each successive word matches the successive digits of pi:

The first word in this sentence has 3 letters, the next word 1 letter, the next word 4 letters, and so on, following the first fifteen digits of the number π. A longer example is this poem, with an ABAB rhyme scheme, from Joseph Shipley’s 1960 book Playing With Words:

But a time I spent wandering in gloomy night;

Yon tower, tinkling chimewise, loftily opportune.

Out, up, and together came sudden to Sunday rite,

The one solemnly off to correct plenilune.
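The rules are simple enough to check mechanically. Below is a minimal Python sketch (the names `word_digits`, `pilish_digits` and `is_pilish` are hypothetical, not taken from my SVN code) that verifies Shipley's poem against the leading digits of pi. It assumes that only letters count toward a word's length, so punctuation and apostrophes are ignored, and it applies the Standard Pilish convention that a 10-letter word stands for 0 and a longer word stands for two digits.

```python
# First 31 digits of pi: the poem has 31 words, each encoding one digit.
PI_DIGITS = "3141592653589793238462643383279"

POEM = """But a time I spent wandering in gloomy night;
Yon tower, tinkling chimewise, loftily opportune.
Out, up, and together came sudden to Sunday rite,
The one solemnly off to correct plenilune."""


def word_digits(word):
    """Digits a single word encodes under Standard Pilish.

    Assumption: only alphabetic characters count, so punctuation
    and apostrophes are ignored ("night;" has 5 letters).
    """
    n = sum(ch.isalpha() for ch in word)
    if n == 0:
        return ""  # a token with no letters, e.g. a stray dash
    # 1-9 letters -> that digit; 10 letters -> "0";
    # more than 10 letters -> two digits (15 letters -> "15").
    return "0" if n == 10 else str(n)


def pilish_digits(text):
    """Concatenate the digits encoded by every whitespace-separated word."""
    return "".join(word_digits(w) for w in text.split())


def is_pilish(text, digits=PI_DIGITS):
    """True if the words of text spell out the leading digits of pi."""
    encoded = pilish_digits(text)
    return bool(encoded) and digits.startswith(encoded)


print(pilish_digits(POEM))  # 3141592653589793238462643383279
print(is_pilish(POEM))      # True
```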

Michael Keith, the author of the above website, has created several works in Pilish, including a full-length book covering the first 10,000 digits of pi!

Trying to write under such constraints can feel extremely awkward, but it made me wonder: how often do strings of words that satisfy the constraints of Standard Pilish (the rule set in which a 10-letter word stands for the digit 0 and a word of more than 10 letters stands for two consecutive digits) occur unintentionally? After all, given the sheer amount of text out there, and the rate at which people all over the world put words together every second of every day, we should expect such strings to turn up with some nonzero frequency p > 0. Such is the Law of Large Numbers.
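A back-of-the-envelope version of this argument, as a minimal sketch under a deliberately crude assumption: treat each word's length as an independent draw from a fixed distribution estimated from sample text. The chance that a given word starts a k-digit run is then the product of the probabilities of each required word length, and multiplying by the number of starting positions gives the expected number of runs. The function names are hypothetical, and the model only considers words of 1 to 9 letters, ignoring the 10-letter and two-digit rules.

```python
from collections import Counter
from math import prod  # Python 3.8+


def length_distribution(text):
    """Estimate P(a word has n letters) from sample text (letters only)."""
    lengths = [sum(ch.isalpha() for ch in w) for w in text.split()]
    lengths = [n for n in lengths if n > 0]
    counts = Counter(lengths)
    return {n: c / len(lengths) for n, c in counts.items()}


def run_probability(dist, digits):
    """P(k consecutive words have lengths equal to the given digits).

    Assumes word lengths are independent and identically distributed,
    and only considers single-digit words (1-9 letters).
    """
    return prod(dist.get(int(d), 0.0) for d in digits)


def expected_runs(dist, digits, n_words):
    """Expected number of positions in an n_words text that start a run
    matching digits, under the same independence assumption."""
    positions = max(n_words - len(digits) + 1, 0)
    return positions * run_probability(dist, digits)
```

Because every factor in the product is below 1, the probability of a run shrinks geometrically as it gets longer, while the expected count grows linearly with the amount of text; that is the intuition behind expecting some accidental Pilish in a large enough corpus.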

To find out, I needed a large data set. Luckily, such things are readily available. I settled on the Project Gutenberg ebook catalog, specifically the union of the July 2006 DVD (17,000 books) and the March 2007 Science Fiction Bookshelf CD (most of PG’s Sci-Fi titles). Altogether, this gave me almost 9GB of text (I later discovered it contained many duplicates, but that’s still a hell of a lot of words!)

Next I hacked together a small Python script that finds, for each file, the longest string of Standard Pilish. The code can be checked out from my SVN repository: http://svn.nfitz.net/pilish
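The repository code isn't reproduced here, so what follows is only a minimal sketch of one way to do the search, not my original script. It assumes a file is split on whitespace, that only letters count toward a word's length, and that a run must begin at the leading 3 of pi; `longest_pilish_run` and the other names are hypothetical. For each starting word it extends the run for as long as the next word's digit(s) continue to match pi, which stays cheap because runs are so short in practice.

```python
# First 50 digits of pi, far more than any accidental run needs.
PI_DIGITS = "31415926535897932384626433832795028841971693993751"


def word_digits(word):
    """Standard Pilish digits for a word; only letters are counted."""
    n = sum(ch.isalpha() for ch in word)
    if n == 0:
        return ""
    return "0" if n == 10 else str(n)  # 15 letters -> "15"


def longest_pilish_run(text, digits=PI_DIGITS):
    """Return (digits_covered, words) for the longest run of consecutive
    words whose lengths spell out the leading digits of pi."""
    codes = [(w, word_digits(w)) for w in text.split()]
    codes = [(w, d) for w, d in codes if d]  # drop punctuation-only tokens
    best_len, best_words = 0, []
    for start in range(len(codes)):
        pos, end = 0, start
        # Extend while this word's digit(s) match pi at position pos.
        while end < len(codes) and digits.startswith(codes[end][1], pos):
            pos += len(codes[end][1])
            end += 1
        if pos > best_len:
            best_len = pos
            best_words = [w for w, _ in codes[start:end]]
    return best_len, best_words


EXCERPT = ('Please may I have a drink?" '
           '"Certainly, Mr. Holmes ; help yourself."')
n, words = longest_pilish_run(EXCERPT)
print(n, " ".join(words))  # 8 may I have a drink?" "Certainly, Mr. Holmes
```

Run over a whole book, the same function reports the per-file maximum that the results below are built from.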

Results:

Somewhat disappointingly, the longest Pilish string in any book covered just 8 digits of pi. The vast majority of books had a longest Pilish string of around 3 to 5 words. See the histogram below (note the logarithmic scale on the y-axis).

Five books achieved this 8-digit benchmark, listed below, with the section of Pilish text bolded:

  • The Winning of Barbara Worth, by Harold B. Wright (1872-1944)

Dismounting and throwing the reins over his horse’s head he came to her smiling, sombrero in hand. “Buenas dias, Senorita. Please may I have a drink?”

“Certainly, Mr. Holmes ; help yourself.” She pointed to the olla hanging in the shade of the ramada.

  • Humphrey Bold: A Story of the Times of Benbow, by Herbert Strang

I was weary of the humdrum life of idling on shore or aimless sailing up and down the channel. The admiral’s was a peaceful mission, and no fighting was expected, but I felt a great curiosity to behold new scenes.

  • Captain Cook’s Journal During the First Voyage Round the World, by James Cook (1728-1779)

And I have a great Objection to firing with powder only amongst People who know not the difference, for by this they would learn to despise fire Arms and think their own Arms superior, and if ever such an Opinion prevailed they would certainly attack you, the Event of which might prove as unfavourable to you as them.

  • Lectures on Modern History, by Baron John Emerich Edward Dalberg Acton (1834-1902)

One was part of the empire, the other was enclosed in Poland, and they were separated by Polish territory. They did not help each other, and each was a source of danger for the other. They could only hope to exist by becoming stronger. That has been, for two centuries and a half, a fixed tradition at Berlin with the rulers and the people. They could not help being aggressive, and they worshipped the authority that could make them successful aggressors.

  • Anne Bradstreet and Her Time, by Helen Campbell (1839-1918)

With the most ambitious of the longer poems–“The Four Monarchies”– and one from which her readers of that day probably derived the most satisfaction, we need not feel compelled to linger. To them its charm lay in its usefulness. There were no sinful fancies; no trifling waste of words, but a good, straightforward narrative of things it was well to know, and Tyler’s comment upon it will be echoed by every one who turns the appallingly matter-of-fact pages…

That last one is the only one of the five to contain a word with a double-digit number of letters, which therefore covers two digits of pi (‘straightforward’ = 15 letters = ‘1’, ‘5’).

Future Work:

I would like to do a similar analysis on an even larger dataset of more modern writing. One possibility is a full archive of Wikipedia. I wonder: what is the longest string of unintentional Pilish ever produced?

Another interesting question is how the maximum length of Pilish sections in a document scales with the length of the document, and how well this can be modelled with a simple statistical model such as a Markov chain.
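As a possible starting point, here is a minimal sketch of a first-order Markov chain over word lengths (one assumption about what a "simple" model could look like; all names are hypothetical). It estimates transition probabilities between consecutive word lengths from sample text and scores how likely a given digit sequence is to appear as a run of word lengths. Like the other sketches it only considers single-digit words, ignoring the 10-letter and two-digit rules of Standard Pilish.

```python
from collections import Counter, defaultdict


def word_lengths(text):
    """Letter counts of each word, dropping punctuation-only tokens."""
    lengths = (sum(ch.isalpha() for ch in w) for w in text.split())
    return [n for n in lengths if n > 0]


def fit_markov(lengths):
    """Fit a first-order Markov chain over word lengths by counting.

    Returns (p_start, p_trans): the overall length distribution, used for
    the first word of a run, and P(next length | current length).
    """
    counts = Counter(lengths)
    total = sum(counts.values())
    p_start = {n: c / total for n, c in counts.items()}
    trans = defaultdict(Counter)
    for a, b in zip(lengths, lengths[1:]):
        trans[a][b] += 1
    p_trans = {}
    for a, row in trans.items():
        row_total = sum(row.values())
        p_trans[a] = {b: c / row_total for b, c in row.items()}
    return p_start, p_trans


def run_probability(p_start, p_trans, digits):
    """P(consecutive words have lengths equal to digits) under the chain."""
    seq = [int(d) for d in digits]
    if not seq:
        return 1.0  # empty product
    p = p_start.get(seq[0], 0.0)
    for a, b in zip(seq, seq[1:]):
        p *= p_trans.get(a, {}).get(b, 0.0)
    return p
```

If the probability of matching k digits falls off roughly geometrically in k under such a model, the expected number of k-digit runs in an N-word document is about N times that probability, so the longest run would be expected to grow only roughly with the logarithm of N. Comparing that prediction with the per-book maxima would be one way to test the fit.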
