
Category Archives: Code and Natural Language

Laura and I finally had a moment to regroup post-Chicago. Both of us were excited by the warm reception(s) we received for our “strange” work at MLA. We decided to continue the journey, but of course to increase the strangeness. To that end, I ran some new Google searches today, my favorite being: “natural language grammatical parsers”. This search brought me to Stanford’s Natural Language Processing Group (“SNLP”). If you decided to pass on the link, here’s the key info:

“…algorithms that allow computers to process and understand human languages…”

Does that sound awesome or what!?

What is sooo exciting about this phrase (to me) is the potential it suggests for deeper semantic visualization.

SNLP incorporates “…innovative probabilistic and machine learning approaches to NLP…”; this includes the ability to train the system, which is pretty spectacular!! I suspect in some circles this is old news, but to me the possibilities, in regard to my and Laura’s continued pursuits of strangeness, are mind-boggling.

Of course, that being said, the software will take some time to understand and integrate into the existing work. Here’s an online version of the parser.

Ira

• Please Note: The Stanford parser returns a very abbreviated shorthand of the parts of speech (or perhaps it has been too long since I sat in an English class). Here’s the decoder: the tags come from the Penn Treebank tag set (e.g., NN = singular noun, VBD = past-tense verb, JJ = adjective).
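For anyone who wants to poke at the parser from code, here’s a minimal sketch of calling it from a Processing/Java program. It follows the pattern of the library’s own demo code, so treat the model path and the parse() convenience method as assumptions about recent releases rather than gospel:

// assumes the Stanford parser jars and the englishPCFG model are on the classpath
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

void setup() {
  LexicalizedParser lp =
    LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
  Tree parse = lp.parse("A slumber did my spirit seal");
  parse.pennPrint(); // prints the bracketed tree with the shorthand tags (NP, VBD, etc.)
}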

I’ve created two (VERY) simple semantic visualizations based on a search for terms defined as positive or negative. I was originally planning on dynamically generating word lists using WordNet or some other dictionary API. However, good old dictionary.com has a much wider and deeper word well (including returns from WordNet). I looked into programmatically parsing the returned dictionary.com URL (which I may eventually do), but for now I’ve generated the word lists manually (I know, I know, this is admitting some defeat). The visualizations plot a linear and then a radial gradient based on lines containing the positive or negative terms. I keep track of the number of positive/negative terms, should a line contain multiple terms (some do). Each line (or concentric ring) overlaps its neighbors and is translucent, allowing some optical color mixing. Arbitrarily, red is positive and blue is negative. The gray is neutral. (A sketch of the linear approach follows the links below.)

Links:

Linear Visualization

Radial Visualization
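Here’s roughly how the linear version works, as a minimal sketch; the word lists are hypothetical stand-ins for the real dictionary.com lists, and it assumes a poem.txt file in the sketch’s data folder:

String[] posWords = { "love", "bright", "sweet" }; // stand-ins for the real lists
String[] negWords = { "death", "dark", "bitter" };

void setup() {
  size(400, 400);
  noStroke();
  String[] poem = loadStrings("poem.txt");
  float bandHeight = (float)height / poem.length;
  for (int i = 0; i < poem.length; i++) {
    int pos = countTerms(poem[i], posWords);
    int neg = countTerms(poem[i], negWords);
    // red = positive, blue = negative, gray = neutral; low alpha for optical mixing
    if (pos == 0 && neg == 0) {
      fill(127, 60);
    }
    else {
      fill(255 * pos / (pos + neg), 0, 255 * neg / (pos + neg), 60);
    }
    // each band overlaps its neighbors
    rect(0, i * bandHeight, width, bandHeight * 1.5);
  }
}

int countTerms(String line, String[] terms) {
  int count = 0;
  String lower = line.toLowerCase();
  for (int i = 0; i < terms.length; i++) {
    if (lower.indexOf(terms[i]) != -1) count++;
  }
  return count;
}

The radial version is the same idea with concentric ellipses of decreasing diameter instead of horizontal bands.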

The next visualization (yes, yes it’s a day late) plots an array of protobits (pixels) based on all the characters in the poem (including spaces). The syntactic elements are the colored pixels in their actual location in the poem. The poem is read (so to speak) by an arthropod-esque bot that moves across the characters. The arthropod’s motion is affected by the respective syntactic elements it crosses. Any characters the arthropod head touches are displayed in the bottom right of the window. The syntactic elements are also displayed in the center and remain there until the next element is reached. The arthropod, built as a series of interconnected springs, is a metaphor for the stream of reading that is affected by syntax, as well as its own inertia.

Link to syntactic visualization
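Since the spring-chain body is the mechanical heart of the piece, here’s a stripped-down sketch of that one idea (all parameters hypothetical). The head here just follows the mouse, where the real arthropod crawls across the poem’s characters:

int segments = 12;
float[] x = new float[segments];
float[] vx = new float[segments];
float stiffness = 0.1;
float damping = 0.9;

void setup() {
  size(400, 100);
  for (int i = 0; i < segments; i++) {
    x[i] = i * 20;
  }
}

void draw() {
  background(255);
  x[0] = mouseX; // the head; syntax would perturb this in the real piece
  ellipse(x[0], height/2, 10, 10);
  for (int i = 1; i < segments; i++) {
    float target = x[i-1] + 20;           // each segment trails its neighbor
    vx[i] += (target - x[i]) * stiffness; // spring force
    vx[i] *= damping;                     // the body's inertia decays slowly
    x[i] += vx[i];
    ellipse(x[i], height/2, 10, 10);
  }
}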

I created my first visualization today for the project. Keeping things simple, I plotted word usage count as a particle graph. (I also settled on the term protoBits.)

Link to Visualization

My process: All the words in the poem were sorted alphabetically, and duplicate words were counted. I plotted the unique words along the x-axis and their usage counts along the y-axis. Each particle initially occupies a unique position, but to keep things more interesting I made them dynamic and added a random jitter to their x-position upon impact with the ground. Although there is acceleration along the y-axis, there is no friction, so the system never stabilizes. Moving the mouse over a particle reveals the word plotted. The higher particle columns represent the more common words. Particles turn orange once they’ve been rolled over.
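The bounce-and-jitter behavior is simple to reproduce; here’s a one-particle sketch of it (all values hypothetical):

float x, y, vy;
float gravity = 0.4;

void setup() {
  size(400, 400);
  x = width/2;
}

void draw() {
  background(255);
  vy += gravity; // acceleration along the y-axis
  y += vy;
  if (y > height) { // impact with the ground
    y = height;
    vy *= -1; // no friction, so it never settles
    x += random(-3, 3); // random jitter to the x-position
  }
  ellipse(x, y, 5, 5);
}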

My goal is to try to create a unique visualization (of increasing complexity) each day leading up to the conference (so please stop by tomorrow ;-) ).

I’ve been able to get the WordNet API integrated into a simple Java app. One amusing side note is that I got stuck for a day trying to get the WordNet .jar file to run in my Java app. After a few hours of unsuccessful Googling, I picked up my own book, in which I explained (to myself) how to solve the problem. So what I originally thought would be the more time-consuming and challenging parts of the project (parsing and semantic relationships) have been, at least initially, fairly straightforward. The larger challenge that looms before me is what the heck I’m going to do with all this data.

The problem is not actually what to do, but rather what to do in the next two weeks, prior to MLA. I wish I could just explore this material without the burden of a deadline. This was supposed to be how I was going to spend my sabbatical this fall. Yeah, right!

My thoughts about the visualization process today are to begin with single-cell creatures and work my way up. I’ve been thinking about a name for these fundamental organisms: microbots, micro-protobytes, microbytes, protobits, protobots. My thought for these initial creatures is single pixels that bounce in one dimension: distance = word usage. I know this is fairly boring, but I feel I need to begin simply and fundamentally. I will post a few Processing sketches of these initial tests next.

It’s time this blog was resuscitated.

Fortunately I have something to write about, as I am beginning a very interesting collaboration with Laura on the visualization of 18th-century Romantic poetry, a subject about which I am severely ignorant. Here is a recent note I sent to Laura:

Sent Dec 12, 2007

… Some initial thoughts I want to share:

1. I’ve been thinking and working on parsing:
Thus far I’ve been able to input the poem and generate some relatively simple statistical data about overall syntax and word usage (i.e. number of occurrences of terms). I could (and will) parse deeper and collect phoneme groups, prefixes, suffixes, etc as well. In addition, I really want more semantic “meat”, so I’ve downloaded WordNet ( a “lexical database for the English Language” developed at Princeton). WordNet should (I’m hoping) allow me to query all terms against a simplified semantic interface. For example, I would like to be able to identify any term that relates to birth or death or love or hate, etc. This seems the only logical way to approach mapping semantics. Of course, once I collect buckets of terms based on these more general concepts, finer semantic filtering could occur recursively (man that sounds pretentious-put it on the poster “fer sure”!). For example, all the terms that semantically connect to birth, could be further separated–giving forth of an idea, creating a life-form, heritage, lineage, noun vs verb, etc., etc.

If time permits (hah!) it would be good to find some other dictionary APIs; for example, aural data (relating to phonemes), etymology, etc.

Once all this mess of data is collected and statistics are generated, I’ll connect the data to a visualization tool. For now, I’m thinking about using my protobyte forms as a sort of conceptual armature (genus perhaps?). I would love to have the poem visualizations/protobytes motile in 3D (ultimately evolving): poetry creating virtual life!!!
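As a concrete sketch of the kind of WordNet query described in point 1 above, here’s what it might look like from a Processing/Java sketch using MIT’s JWI library, which is just one possible Java API (not necessarily the one used here); the dictionary path is also an assumption:

// assumes the JWI jar is available and a local WordNet install at the path below
import edu.mit.jwi.*;
import edu.mit.jwi.item.*;

void setup() {
  try {
    IDictionary dict = new Dictionary(new java.io.File("/usr/local/WordNet-3.0/dict"));
    dict.open();

    // look up the first noun sense of "birth" and print its gloss
    IIndexWord idx = dict.getIndexWord("birth", POS.NOUN);
    IWordID wordID = idx.getWordIDs().get(0);
    ISynset synset = dict.getWord(wordID).getSynset();
    println("gloss: " + synset.getGloss());

    // climb one level of the hypernym chain -- one way to bucket a poem's
    // terms under broader concepts like birth, death, love, hate
    for (ISynsetID sid : synset.getRelatedSynsets(Pointer.HYPERNYM)) {
      println("hypernym: " + dict.getSynset(sid).getWords().get(0).getLemma());
    }
    dict.close();
  }
  catch (Exception e) {
    e.printStackTrace();
  }
}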

For our first brownbag in our self-directed Visualization Seminar, Ira gave us a chapter of From Complexity to Creativity: Computational Models of Evolutionary, Autopoietic and Cognitive Dynamics, by Ben Goertzel: Ch. 9, “Fractals and Sentence Production,” about using the L-system (Lindenmayer system) for creating sentences. I was really happy about both the chapter’s argument and the L-system itself, because they both evade expressivist theories of how language works, you know, that 19th-c. idea probably traceable to John Stuart Mill that thoughts and feelings are this inchoate stuff that you shove into language, words envisioned as containers (or trash bins, really, if you are thinking catharsis). These theories have been overturned by deconstruction, or less obscurely, by Lakoff and Johnson, Metaphors We Live By.

In any case, this kind of programming system, because it is recursive and because its bracketing allows environmental effects to be incorporated into a recursive growth system, generates the beautiful tree-like structures that go with fractals (as in Benoit Mandelbrot’s work). It seems perfect for generating sentence structures as well. I want to use it in creating my novels of various genres that will ultimately be CAVE novels, not in any mimetic sense; the graphics will be produced by the words but not as pictures of what the words say.
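For the curious, an L-system is just a string-rewriting rule applied over and over. Here’s a minimal sketch with a hypothetical rule, where the brackets mark pushed and popped state, the same mechanism that could model embedded clauses:

String axiom = "F";

String rewrite(String s) {
  String out = "";
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (c == 'F') {
      out += "F[+F]F[-F]F"; // the production rule
    }
    else {
      out += c; // brackets and turn symbols pass through unchanged
    }
  }
  return out;
}

void setup() {
  String s = axiom;
  for (int gen = 1; gen <= 3; gen++) {
    s = rewrite(s);
    println("generation " + gen + ": " + s.length() + " symbols");
  }
  println(s.substring(0, 60) + "..."); // a peek at the recursive structure
}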

This was great fun — y’all come next time!!

The (pretty bad) poem below will execute in Processing. I tried (with my very limited capabilities) to illustrate an example of what I'm calling "supertext", where the source code is semantically coded and also executable. I'd be very happy to collaborate with someone more literate than myself on this. I could probably sling the code and conceptualize some visuals, if you could handle the wordsmithing.

// paste everything below into Processing and hit the run arrow (OSX: cmd + r, WIN: ctrl + r)

/* hunger
Ira Greenberg
original "puff" code October 22, 2005
revised "hunger" May 23, 2006
*/

// head of the beast
float heaving;
float ascension;
float anxiety = .7;
float hope = .9;
int darkness = 0;
int heavens;
int dirt;

// body of the beast
int flesh = 2000;
float[] guts = new float[flesh];
float[] blood = new float[flesh];
float[] girth = new float[flesh];
float[] heft = new float[flesh];
float[] fate = new float[flesh];
float[] compulsivity = new float[flesh];
float[] tenderness = new float[flesh];
color[] phlegm = new color[flesh];

void setup(){
  size(400, 400);
  heavens = width;
  dirt = height;
  background(255);
  noStroke();
  // begin in the center
  heaving = heavens/2;
  ascension = dirt/2;

  // fill body
  for (int i = 0; i < flesh; i++){
    girth[i] = random(-7, 7);
    heft[i] = random(-4, 4);
    compulsivity[i] = random(-9, 9);
    tenderness[i] = random(16, 40);
    phlegm[i] = color(255, 50 + random(-70, 70), 30, 3);
  }
  frameRate(30); // was framerate() in pre-1.0 Processing
}

void draw(){
  background(darkness);

  // purpose
  for (int i = 0; i < flesh; i++){
    fill(phlegm[i]);
    if (i == 0){
      guts[i] = heaving + sin(radians(fate[i]))*girth[i];
      blood[i] = ascension + cos(radians(fate[i]))*heft[i];
    }
    else{
      guts[i] = guts[i-1] + cos(radians(fate[i]))*girth[i];
      blood[i] = blood[i-1] + sin(radians(fate[i]))*heft[i];

      // wrenching
      if (guts[i] >= heavens - tenderness[i]/2 || guts[i] <= tenderness[i]/2){
        girth[i] *= -1;
        tenderness[i] = random(1, 40);
        compulsivity[i] = random(-13, 13);
      }
      if (blood[i] >= dirt - tenderness[i]/2 || blood[i] <= tenderness[i]/2){
        heft[i] *= -1;
        tenderness[i] = random(1, 40);
        compulsivity[i] = random(-9, 9);
      }
    }
    // creation
    ellipse(guts[i], blood[i], tenderness[i], tenderness[i]);
    // divinity
    fate[i] += compulsivity[i];
  }

  // mind wandering
  heaving += anxiety;
  ascension += hope;

  // hope's edge
  if (heaving >= heavens - tenderness[0]/2 || heaving <= tenderness[0]/2){
    anxiety *= -1;
  }
  if (ascension >= dirt - tenderness[0]/2 || ascension <= tenderness[0]/2){
    hope *= -1;
  }
}

Wow!!! “Phatic”, “zeugma”, “syllepsis”. Laura obviously spent her time in Ithaca much more productively than I did (too much turpentine sniffing). And she even does her homework!

“Linguistic power comes not just from the connotative dimension but also from its performativity…performativity is unlimited, dependent upon uptake and context, but those aren’t extraneous exactly – they can be coded in the linguistic production itself…”

I’m not sure I fully understand “performativity”. My point is not that natural language is limited in its potential to describe, express, etc. Obviously rich, complex worlds have been built in words. But, in comparison to mathematical language, these worlds are fuzzy (in a good way). When we say or write anything, I don’t believe the signification can ever be fully known. However, the expression 2 + 2 = 4 can (perhaps) never be unknown. The former is dynamic and mutable, the latter static and immutable. I am not passing a value judgment on either of these systems. We can of course, as Laura suggested, code in more context, but as we add specificity we simply approach the infinite (Zeno’s paradox).

“While it is true that in English you can say "I love my skates" and "I love my mother," it really only seems to be the case (or is only true of syntactic rules) that the verb "love" doesn't have a declared datatype for its object.”

This is precisely what I think I was trying to say. The concept of a declared (immutable) datatype is foreign to natural language, right? We can use other explicit structures to build a context of meaning, but ultimately any datatype abstraction needs to be subordinate to a dynamic emergence; language needs elbow room. This is what I meant by “semantic expansion”. From a coding perspective, datatypes (classes) in object-oriented programming are static constructs that enforce encapsulation and contractual communication. In a pure OOP system, everything would be an object (based on a datatype). “Love” would be forced to choose its type, although, through inheritance, the possibility does exist for “Love” to be of multiple types. Regardless, some discrete datatype/object binding is required*.
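Here’s a toy sketch of that point (all class names hypothetical): “love” must declare what it accepts, though inheritance lets many types satisfy the declaration:

interface Lovable { }
class Skates implements Lovable { }
class Mother implements Lovable { }

void love(Lovable object) {
  println("I love my " + object.getClass().getSimpleName());
}

void setup() {
  love(new Skates()); // compiles only because Skates declares itself Lovable
  love(new Mother()); // same contract, different type
}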

“Arthur Quinn says that ‘the simplest definition of a figure of speech is an intended deviation from ordinary usage,’ an intentional mistake, and that's what your ‘I love my skates, and I love my mother’ (I'm rewriting it to make a point) would be if they appeared in the same sentence. The sentence is a specific kind of mistake…”

A discussion on the notion of “mistake” would make another worthy post (if anyone’s sitting on the sidelines ready to jump in). I might argue (probably very foolishly) that there are ONLY mistakes in natural language and no mistakes in mathematical language. When I taught painting (prior to selling out), I described painting as a series of near misses. I guess I’m thinking of mistake as deviation from intention; thus every human gesture is a small (or larger) mistake. Mathematically we could prove this, referring back to Zeno, but that would be damn boring. In math, until something is proven, it remains unproven; there is no figure-of-speech territory. Coding does offer some mistake territory, as I tried to illustrate with my fuzzy polygon program, based on random number generation.
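For anyone who hasn’t seen it, the fuzzy polygon idea can be sketched in a few lines. This is a guess at its approach (jitter each vertex with random offsets every frame, and let translucent strokes accumulate), not the original program:

int sides = 5;
float radius = 120;

void setup() {
  size(400, 400);
  noFill();
  stroke(0, 40); // translucent strokes pile up into fuzz
}

void draw() {
  translate(width/2, height/2);
  beginShape();
  for (int i = 0; i < sides; i++) {
    float theta = TWO_PI * i / sides;
    // every vertex is a near miss of its ideal position
    vertex(cos(theta) * radius + random(-8, 8), sin(theta) * radius + random(-8, 8));
  }
  endShape(CLOSE);
}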

“Barthes's S/Z is really a program that codes Balzac's short story "Sarrasine." That text demonstrates that the program for generating the story — really the program for generating any natural sentence in all its connotative and performative grandeur — would have to be so much longer than the sentence or story itself, and I'm not sure any of it would ever be generalizable to other sentences or stories, which is why such coding would be a worthless endeavor, as was my attempt to write an XSL transform to write Wordsworth's poem "A Slumber Did My Spirit Seal."

This last point I agree with. Using code as a mimetic or transformative tool is usually more work than it’s worth. However, using code as a primary generative medium offers unique and fresh possibilities outside the domains of natural and mathematical languages. Because code has access to both the rigid precision of mathematical language and the narrative fuzziness of natural language, it offers (I still think) possibilities for a new (whole-brain) integration, especially needed at our esteemed (disciplinarily biased) institutions of higher learning.

* Some languages, such as Java, rely on late binding: the method invoked on an object is resolved dynamically at runtime, based on the object’s actual type rather than the variable’s declared type. This supports polymorphism, promoting a high level of object abstraction.
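A minimal illustration of that footnote (class names hypothetical): the variable’s declared type is fixed at compile time, but the method that actually runs is chosen at runtime from the object’s real type:

class Love {
  String kind() {
    return "love in general";
  }
}

class FilialLove extends Love {
  String kind() {
    return "love of a mother";
  }
}

void setup() {
  Love love = new FilialLove(); // declared Love, bound to a FilialLove
  println(love.kind()); // prints "love of a mother"
}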

. . . first, as assigned by John.  I built a game / interactive fiction in Inform 7, and you can see the results here.

Second, Ira's assignment: I generated the triangle, the polygon, and the fuzzy polygon, which IS beautiful. But somehow I can't get from it to language, partly because I'm stuck on some of the things you are saying about language. I want to lay out my thinking about the differences/similarities between natural language and code, based on your posting.

Linguistic power comes not just from the connotative dimension but also from its performativity.  I say "I love you" to different people to whom it means different things, but I also do different things when I say it: it can serve a phatic function, express an obsession, enact insecurity, compensate, even wound somebody — performativity is unlimited, dependent upon uptake and context, but those aren’t extraneous exactly – they can be coded in the linguistic production itself (they are, in the hands of Jane Austen, e.g., incredibly clear).

While it is true that in English you can say "I love my skates" and "I love my mother," it really only seems to be the case (or is only true of syntactic rules) that the verb "love" doesn't have a declared datatype for its object. Arthur Quinn says that "the simplest definition of a figure of speech is 'an intended deviation from ordinary usage,'" an intentional mistake, and that's what your "I love my skates, and my mother" (I'm rewriting it to make a point) would be if they appeared in the same sentence. The sentence is a specific kind of mistake, often labeled zeugma, but it's really "syllepsis," I think, and the most famous example of it is Alexander Pope's line about Belinda, who is in danger: Belinda may either "stain her honour, or her new brocade." That mistake is funny because it violates rules of decorum (I'm not sure whether they are rules about connotation or rules about performance). The performative effect, however, is to make us think about Belinda: she is clearly a ninny, someone for whom staining a dress and losing her chastity are acts of the same magnitude. And your sentence "I love my skates, and my mother" similarly tells us something about you, which you of course recognize with the parentheses and the wink!

Rules can be written to express the performative effect: you could, I sincerely believe, make a Jane Austen game (a game about psychological realism). If "skates" were entered into the game coded "thing [datatype] lovedObject [variableName]" while "Mom" were coded "person lovedObject," your program wouldn't ever substitute skates for Mom, or would do so only if you called a function "syllepsis." Language is only baggier than code if you don't take into account all that it is doing at any given moment, all of which can be coded.

Barthes's S/Z is really a program that codes Balzac's short story "Sarrasine." That text demonstrates that the program for generating the story (really the program for generating any natural sentence in all its connotative and performative grandeur) would have to be so much longer than the sentence or story itself, and I'm not sure any of it would ever be generalizable to other sentences or stories, which is why such coding would be a worthless endeavor, as was my attempt to write an XSL transform to write Wordsworth's poem "A Slumber Did My Spirit Seal."
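The datatype rule translates almost literally into code. Here's a toy version (names hypothetical throughout), where "love" accepts only a Person unless the figure of speech is invoked by name:

class Thing {
  String name;
  Thing(String n) {
    name = n;
  }
}

class Person extends Thing {
  Person(String n) {
    super(n);
  }
}

void love(Person p) {
  println("I love my " + p.name);
}

// the "intentional mistake": a figure of speech as an explicit override
void syllepsis(Thing a, Thing b) {
  println("I love my " + a.name + " and my " + b.name);
}

void setup() {
  Person mom = new Person("mother");
  Thing skates = new Thing("skates");
  love(mom);
  // love(skates); // would not compile: skates is not a Person
  syllepsis(skates, mom); // the deviation, called by name
}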
