
[Image: slacker avatar]

This is my slacker avatar, to be graded “easy.”



Laura and I finally had a moment to regroup post-Chicago. Both of us were excited by the warm reception our “strange” work received at MLA. We decided to continue the journey and, of course, to increase the strangeness. To that end, I ran some new Google searches today – my favorite being “natural language grammatical parsers”. This search brought me to Stanford’s Natural Language Processing Group (“SNLP”). If you decided to pass on the link, here’s the key info:

“…algorithms that allow computers to process and understand human languages…”

Does that sound awesome or what!?

What is sooo exciting about this phrase (to me) is the potential it suggests for deeper semantic visualization.

SNLP incorporates “…innovative probabilistic and machine learning approaches to NLP…”; this includes the ability to train the system, which is pretty spectacular!! I suspect in some circles this is old news, but to me the possibilities, in regard to my and Laura’s continued pursuit of strangeness, are mind-boggling.

Of course, that being said, the software will take some time to understand and integrate into the existing work. Here’s an online version of the parser.


• Please Note: The Stanford parser returns a very abbreviated shorthand of the parts of speech (or perhaps it has been too long since I sat in an English class). Here’s the decoder:
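For anyone who wants a quick reference while playing with the parser’s output: it uses the Penn Treebank tag set, and a small hand-picked decoder (just a subset of the full list, not anything from the parser itself) might look like this:

```java
import java.util.Map;

public class TagDecoder {
    // A small subset of Penn Treebank part-of-speech tags --
    // the shorthand the Stanford parser emits.
    static final Map<String, String> TAGS = Map.of(
        "NN",  "noun, singular",
        "NNS", "noun, plural",
        "VB",  "verb, base form",
        "VBD", "verb, past tense",
        "JJ",  "adjective",
        "RB",  "adverb",
        "DT",  "determiner",
        "IN",  "preposition or subordinating conjunction"
    );

    public static String decode(String tag) {
        return TAGS.getOrDefault(tag, "unknown tag: " + tag);
    }

    public static void main(String[] args) {
        System.out.println("NN -> " + decode("NN"));
        System.out.println("JJ -> " + decode("JJ"));
    }
}
```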

Here is an excerpt from a paper I gave at MLA about Ira’s visualization experiments:

            Digital artist Ira Greenberg has created for the Poetess archive several types of visualizations using XML-encoded documents.  The text we used is Felicia Hemans’s poem “Domestic Affections.”  The first visualization is a word fountain.  Based on word count, this dynamic picture alphabetizes the words and shoots up water: the higher the water, the greater the number of times that word appears in the poem.  The highest water spurts correspond to words such as “the” and “a,” so we are really interested in the middle range.  If one scrolls over a spurt, the word appears on top.

[still from word fountain]

As is visible here, in “poetess” poetry, “death” is prominent: “Domestic Affections” describes a woman’s desire to have her genius recognized as such, allegedly with fatal consequence.

[water fountain]

Here you see the cursor discovering in higher spurts of water “she, serene, shade, shadowy, seraph-dreams,” a rather nice list of attributes for what she needs to be and why death might therefore be needed to neutralize a woman’s intellect.

            Another visualization created by Greenberg is what I call an elocutionary diagram.


What you see here is of course a still from an animated visualization.  Here a worm-like creature crawls through the poem – through “Domestic Affections” – pausing at each punctuation mark for an amount of time determined by the mark’s relative punctuating force.  A comma generates a curvature in the arthropod’s body; an exclamation point causes it to stop for a longer period of time, and so its tail crashes into its head, generating a series of large V shapes.  Here we see, in other words, the poem’s elocutionary force pictured as movement through time.  Why do Vs cluster and clatter here, why gently rolling comma-lulls there?

            Finally Greenberg developed a color wheel visualization based on positive (red), negative (blue), and neutral (gray) terms.

[color wheel]

Concentric rings moving out represent each line in the poem from beginning to end.  Here we see its emotional ups and downs, how they cluster, contrast, compare.  It is interesting to see that our perception of the color gray or neutrality is affected by the colors surrounding it, just as a peaceful moment after being almost hit by a bus differs from a peaceful moment when one wakes up after pleasant dreams.  One sees here not only where positive terms dominate and where sadness prevails, but how such alterations affect one another.

Paper / PowerPoint

I’ve created two (VERY) simple semantic visualizations based on a search for terms defined as positive or negative. I was originally planning on dynamically generating word lists using WordNet or some other dictionary API. However, good-old has a much wider and deeper word well (including returns from WordNet). I looked into programmatically parsing the returned URL (which I may eventually do), but for now have generated the word lists manually (I know, I know, this is admitting some defeat). The visualizations plot a linear and then a radial gradient based on lines containing the positive or negative terms. I keep track of the number of positive/negative terms, should a line contain multiple terms (some do). Each line (or concentric ring) overlaps its neighbors and is translucent, allowing some optical color mixing. Arbitrarily, red is positive, blue is negative, and gray is neutral.
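The line-scoring step behind those gradients could be sketched roughly like this – a minimal version, with tiny made-up word lists standing in for the much longer manually compiled ones:

```java
import java.util.Set;

public class LineScorer {
    // Hypothetical, hand-picked term lists; the real lists were
    // compiled manually from a larger source and are far longer.
    static final Set<String> POS = Set.of("love", "joy", "serene", "bliss");
    static final Set<String> NEG = Set.of("death", "grief", "shade", "woe");

    // Returns {posCount, negCount} for one line of the poem;
    // a line may contain multiple terms, so both are tallied.
    public static int[] score(String line) {
        int pos = 0, neg = 0;
        for (String w : line.toLowerCase().split("[^a-z']+")) {
            if (POS.contains(w)) pos++;
            if (NEG.contains(w)) neg++;
        }
        return new int[]{pos, neg};
    }

    public static void main(String[] args) {
        int[] s = score("Her serene joy turned to grief and death");
        System.out.println("pos=" + s[0] + " neg=" + s[1]); // pos=2 neg=2
    }
}
```

A line scoring zero on both counts would be painted the neutral gray; otherwise the counts weight the red or blue of that line’s band or ring.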


Linear Visualization

Radial Visualization

The next visualization (yes, yes, it’s a day late) plots an array of protobits (pixels) based on all the characters in the poem (including spaces). The syntactic elements are the colored pixels in their actual locations in the poem. The poem is read (so to speak) by an arthropod-esque bot that moves across the characters. The arthropod’s motion is affected by the respective syntactic elements it crosses. Any characters the arthropod head touches are displayed in the bottom right of the window. The syntactic elements are also displayed in the center and remain there until the next element is reached. The arthropod, built as a series of interconnected springs, is a metaphor for the stream of reading that is affected by syntax, as well as its own inertia.
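The punctuation-to-motion mapping could be as simple as a lookup table of relative “punctuating force” – these particular weights are invented for illustration (the real sketch tunes them by eye):

```java
import java.util.Map;

public class PunctuationForce {
    // Hypothetical relative pause weights for each mark,
    // roughly ordered by elocutionary force.
    static final Map<Character, Integer> FORCE = Map.of(
        ',', 1,  ';', 2,  ':', 2,  '.', 3,  '?', 4,  '!', 5
    );

    // Number of animation frames the reading bot pauses at this
    // character (0 = no pause; plain letters don't slow it down).
    public static int pauseFrames(char c) {
        return FORCE.getOrDefault(c, 0) * 10;
    }

    public static void main(String[] args) {
        System.out.println("comma pauses " + pauseFrames(',') + " frames");
        System.out.println("bang pauses  " + pauseFrames('!') + " frames");
    }
}
```

A longer pause is what lets the spring-built body pile into itself – the tail catching up with the stalled head.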

Link to syntactic visualization

I created my first visualization today for the project. Keeping things simple, I plotted word usage count as a particle graph. (I also settled on the term protoBits.)

Link to Visualization

My process: all the words in the poem were sorted alphabetically, and duplicate words were counted. I plotted the unique words along the x-axis and stacked the duplicate occurrences along the y-axis. Each particle initially occupies a unique position, but to keep things more interesting I made them dynamic and added a random jitter to their x-position upon impact with the ground. Although there is acceleration along the y-axis, there is no gravity/friction, so the system never stabilizes. Moving the mouse over a particle reveals the word plotted. The higher particle columns represent the more common words. Particles turn orange once they’ve been rolled over.
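The counting step underneath this is straightforward – something along these lines (a sketch, not the actual Processing code), where the alphabetical index supplies x and the count supplies the column height:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordUsage {
    // Sort words alphabetically (TreeMap keeps keys ordered) and
    // count duplicates: each unique word's index becomes its x
    // position, its count the height of its particle column.
    public static TreeMap<String, Integer> count(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String w : text.toLowerCase().split("[^a-z']+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> c = count("the cloud and the stream and the sky");
        int x = 0;
        for (Map.Entry<String, Integer> e : c.entrySet()) {
            System.out.println("x=" + x++ + " word=" + e.getKey()
                               + " count=" + e.getValue());
        }
    }
}
```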

My goal will be to create a unique visualization (of increasing complexity) each day leading up to the conference (so please stop by tomorrow 😉).

I’ve been able to get the WordNet API integrated into a simple Java app. One amusing side note: I got stuck for a day trying to get the WordNet .jar file to run in my Java app. After a few hours of unsuccessful Googling, I picked up my own book, in which I had explained (to myself) how to solve the problem. So what I originally thought would be the more time-consuming and challenging parts of the project – parsing and semantic relationships – have been (at least initially) fairly straightforward. The larger challenge that looms before me is what the heck I’m going to do with all this data.

The problem is not actually what to do, but rather what to do in the next 2 weeks, prior to MLA. I wish I could just explore this material without the burden of deadline. This was supposed to be how I was going to spend my sabbatical this fall–yeah, right!

My thoughts about the visualization process today are to begin with single cell creatures and work my way up. I’ve been thinking about a name for these fundamental organisms: microbots, micro-protobytes, microbytes, protobits, protobots. My thought for these initial creatures is single pixels that bounce in 1 dimension: distance = word usage. I know this is fairly boring, but I feel like I need to begin simply and fundamentally. I will post a few Processing sketches of these initial tests next.

It’s time this blog was resuscitated.

Fortunately I have something to write about, as I am beginning a very interesting collaboration with Laura on the visualization of 18th-century Romantic poetry – a subject I am severely ignorant about. Here is a recent note I sent to Laura:

Sent Dec 12, 2007

… Some initial thoughts I want to share:

1. I’ve been thinking about and working on parsing:
Thus far I’ve been able to input the poem and generate some relatively simple statistical data about overall syntax and word usage (i.e., number of occurrences of terms). I could (and will) parse deeper and collect phoneme groups, prefixes, suffixes, etc. as well. In addition, I really want more semantic “meat”, so I’ve downloaded WordNet (a “lexical database for the English language” developed at Princeton). WordNet should (I’m hoping) allow me to query all terms against a simplified semantic interface. For example, I would like to be able to identify any term that relates to birth or death or love or hate, etc. This seems the only logical way to approach mapping semantics. Of course, once I collect buckets of terms based on these more general concepts, finer semantic filtering could occur recursively (man, that sounds pretentious; put it on the poster “fer sure”!). For example, all the terms that semantically connect to birth could be further separated: giving forth of an idea, creating a life-form, heritage, lineage, noun vs. verb, etc., etc.
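The bucket idea might reduce to something like this – with tiny invented seed lists standing in for real WordNet queries (in the actual pipeline each bucket would be filled by walking WordNet’s synonym and hypernym links out from the seed concept):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SemanticBuckets {
    // Hypothetical seed lists standing in for WordNet lookups;
    // the real buckets would come from querying the database.
    static final Map<String, List<String>> BUCKETS = Map.of(
        "birth", List.of("birth", "born", "cradle", "lineage"),
        "death", List.of("death", "grave", "tomb", "perish"),
        "love",  List.of("love", "beloved", "tender", "devotion")
    );

    // Returns the general concepts a term falls under
    // (a single term may land in several buckets).
    public static List<String> conceptsFor(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : BUCKETS.entrySet()) {
            if (e.getValue().contains(term.toLowerCase())) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("grave  -> " + conceptsFor("grave"));
        System.out.println("tender -> " + conceptsFor("tender"));
    }
}
```

The recursive refinement mentioned above would then just be the same operation applied within a bucket, with finer concepts as the keys.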

If time permits (hah!), it would be good to find some other dictionary APIs; for example, aural data (relating to phonemes), etymology, etc.

Once all this mess of data is collected and statistics are generated, I’ll connect the data to a visualization tool. For now, I’m thinking about using my protobyte forms as a sort of conceptual armature (genus, perhaps?). I would love to have the poem visualizations/protobytes motile in 3D (ultimately evolving) – poetry creating virtual life!!!