
Category Archives: Interdiscipline

Laura and I finally had a moment to regroup post-Chicago. Both of us were excited by the warm reception(s) we received for our “strange” work at MLA. We decided to continue the journey, but, of course, to increase the strangeness. To that end, I ran some new Google searches today – my favorite being: “natural language grammatical parsers”. This search brought me to Stanford’s Natural Language Processing Group (“SNLP”). If you decide to pass on the link, here’s the key info:

“…algorithms that allow computers to process and understand human languages…”

Does that sound awesome or what!?

What is sooo exciting about this phrase (to me) is the potential it suggests for deeper semantic visualization.

SNLP incorporates “…innovative probabilistic and machine learning approaches to NLP…”; this includes the ability to train the system, which is pretty spectacular!! I suspect in some circles this is old news, but to me the possibilities, in regard to my and Laura’s continued pursuit of strangeness, are mind-boggling.

Of course, that being said, the software will take some time to understand and integrate into the existing work. Here’s an online version of the parser.
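For the code-curious, here’s roughly what calling the parser from Java looks like. This is a minimal sketch based on the current Stanford API and its demo code; it assumes the parser and model jars are on the classpath, and the sample line is just a placeholder:

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.trees.Tree;

public class ParseLine {
  public static void main(String[] args) {
    // load the bundled English PCFG grammar (this path lives inside the models jar)
    LexicalizedParser lp = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
    String line = "Season of mists and mellow fruitfulness";  // any poem line
    Tokenizer<CoreLabel> tok = PTBTokenizer.factory(new CoreLabelTokenFactory(), "")
        .getTokenizer(new StringReader(line));
    List<CoreLabel> words = tok.tokenize();
    Tree parse = lp.apply(words);  // full phrase-structure tree
    parse.pennPrint();             // prints the tree with those shorthand POS tags
  }
}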

Ira

• Please Note: The Stanford parser returns a very abbreviated shorthand of the parts of speech (or perhaps it has been too long since I sat in an English class). Here’s the decoder:

I’ve created two (VERY) simple semantic visualizations based on a search for terms defined as positive or negative. I was originally planning on dynamically generating word lists using WordNet or some other dictionary API. However, good old dictionary.com has a much wider and deeper word well (including returns from WordNet). I looked into programmatically parsing the returned dictionary.com URL (which I may eventually do), but for now have generated the word lists manually (I know, I know, this is admitting some defeat). The visualizations plot a linear and then a radial gradient based on lines containing the positive or negative terms. I keep track of the number of pos/neg terms, should a line contain multiple terms (some do). Each line (or concentric ring) overlaps its neighbors and is translucent, allowing some optical color mixing. Arbitrarily, red is positive and blue is negative. The gray is neutral.

Links:

Linear Visualization

Radial Visualization
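For anyone wondering how the linear version might be wired up, here’s a stripped-down sketch of the idea in Processing-flavored Java. The word lists and poem file name are hypothetical stand-ins (the real lists came from dictionary.com, as noted above):

import processing.core.PApplet;

public class LinearSentiment extends PApplet {

  String[] poem;  // one entry per line of the poem
  // hypothetical stand-ins for the hand-built word lists
  String[] pos = { "love", "light", "joy", "sweet" };
  String[] neg = { "death", "dark", "grief", "cold" };

  public void settings() {
    size(400, 600);
  }

  public void setup() {
    poem = loadStrings("poem.txt");  // assumes poem.txt in the data folder
    noStroke();
    background(255);
    float h = (float) height / poem.length;
    for (int i = 0; i < poem.length; i++) {
      int score = score(poem[i]);
      float a = 40 + 40 * min(abs(score), 3);  // more terms = stronger band
      if (score > 0)      fill(255, 0, 0, a);  // red = positive
      else if (score < 0) fill(0, 0, 255, a);  // blue = negative
      else                fill(128, 40);       // gray = neutral
      // each band is taller than its slot, so neighbors overlap and mix
      rect(0, i * h - h * 0.25f, width, h * 1.5f);
    }
  }

  // net count of positive minus negative terms in one line
  int score(String line) {
    String s = line.toLowerCase();
    int n = 0;
    for (String w : pos) if (s.contains(w)) n++;
    for (String w : neg) if (s.contains(w)) n--;
    return n;
  }

  public static void main(String[] args) {
    PApplet.main("LinearSentiment");
  }
}

The radial version is the same idea with concentric translucent rings instead of horizontal bands.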

The next visualization (yes, yes, it’s a day late) plots an array of protobits (pixels) based on all the characters in the poem (including spaces). The syntactic elements are the colored pixels in their actual locations in the poem. The poem is read (so to speak) by an arthropod-esque bot that moves across the characters. The arthropod’s motion is affected by the respective syntactic elements it crosses. Any characters the arthropod head touches are displayed in the bottom right of the window. The syntactic elements are also displayed in the center and remain there until the next element is reached. The arthropod, built as a series of interconnected springs, is a metaphor for the stream of reading, which is affected by syntax as well as its own inertia.

Link to syntactic visualization
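Here’s a bare-bones sketch of the spring mechanics behind the arthropod, minus all the text handling. The head speed is a constant here; in the actual piece it would be modulated by the syntactic elements the bot crosses:

import processing.core.PApplet;

public class SpringBot extends PApplet {

  // a chain of nodes connected by simple springs: the head gets dragged
  // across the "line of text", the body follows with its own inertia
  int n = 8;
  float[] x = new float[n], y = new float[n];
  float[] vx = new float[n], vy = new float[n];
  float rest = 20;      // rest length between segments
  float k = 0.1f;       // spring stiffness
  float damping = 0.9f;
  float headSpeed = 2;  // constant here; the real bot varies this per syntax tag

  public void settings() {
    size(600, 200);
  }

  public void setup() {
    for (int i = 0; i < n; i++) {
      x[i] = 50 - i * rest;
      y[i] = height / 2f;
    }
  }

  public void draw() {
    background(255);
    x[0] += headSpeed;  // head crawls along the line, with a slight wiggle
    y[0] = height / 2f + 10 * sin(frameCount * 0.1f);
    if (x[0] > width + n * rest) x[0] = -n * rest;
    for (int i = 1; i < n; i++) {
      float dx = x[i - 1] - x[i], dy = y[i - 1] - y[i];
      float d = dist(x[i - 1], y[i - 1], x[i], y[i]);
      if (d == 0) continue;
      float f = k * (d - rest);  // Hooke's law along the segment
      vx[i] = (vx[i] + f * dx / d) * damping;
      vy[i] = (vy[i] + f * dy / d) * damping;
      x[i] += vx[i];
      y[i] += vy[i];
    }
    stroke(0);
    for (int i = 1; i < n; i++) line(x[i - 1], y[i - 1], x[i], y[i]);
    fill(0);
    for (int i = 0; i < n; i++) ellipse(x[i], y[i], 8, 8);
  }

  public static void main(String[] args) {
    PApplet.main("SpringBot");
  }
}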

I created my first visualization today for the project. Keeping things simple, I plotted word usage count as a particle graph. (I also settled on the term protoBits.)

Link to Visualization

My process: All the words in the poem were sorted alphabetically, and duplicate words were counted. I plotted the unique words along the x-axis and their duplicate counts along the y-axis. Each particle initially occupies a unique position, but to keep things more interesting I made them dynamic and added a random jitter to their x-position upon impact with the ground. Although there is acceleration along the y-axis, there is no gravity/friction, so the system never stabilizes. Moving the mouse over a particle reveals the word plotted. The higher particle columns represent the more common words. Particles turn orange once they’ve been rolled over.
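The counting step, stripped to its essentials, looks something like this in plain Java (the file name is a placeholder, and the real sketch feeds the counts into the particle system rather than printing them):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class WordCounts {
  public static void main(String[] args) throws Exception {
    // assumes the poem lives in poem.txt next to the class
    String text = new String(Files.readAllBytes(Paths.get("poem.txt")));
    // a TreeMap keeps the unique words in alphabetical order (the x-axis)
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (String w : text.toLowerCase().split("[^a-z']+")) {
      if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
    }
    // each entry becomes one particle column: x = index, y = usage count
    int x = 0;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.println(x++ + "\t" + e.getKey() + "\t" + e.getValue());
    }
  }
}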

My goal is to try to create a unique visualization (of increasing complexity) each day leading up to the conference (so please stop by tomorrow 😉).

I’ve been able to get the WordNet API integrated into a simple Java app. One amusing side note: I got stuck for a day trying to get the WordNet .jar file to run in my Java app. After a few hours of unsuccessful Googling, I picked up my own book, in which I explained (to myself) how to solve the problem. So what I originally thought would be the more time-consuming and challenging parts of the project–parsing and semantic relationships–have been (at least initially) fairly straightforward. The larger challenge that looms before me is what the heck I’m going to do with all this data.

The problem is not actually what to do, but rather what to do in the next 2 weeks, prior to MLA. I wish I could just explore this material without the burden of a deadline. This was supposed to be how I was going to spend my sabbatical this fall. Yeah, right!

My thoughts about the visualization process today are to begin with single-cell creatures and work my way up. I’ve been thinking about a name for these fundamental organisms: microbots, micro-protobytes, microbytes, protobits, protobots. My thought for these initial creatures is single pixels that bounce in 1 dimension: distance = word usage. I know this is fairly boring, but I feel like I need to begin simply and fundamentally. I will post a few Processing sketches of these initial tests next.
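In the meantime, here’s a first, deliberately dumb sketch of the idea: single pixels bouncing in one dimension, with peak height proportional to (hypothetical stand-in) usage counts:

import processing.core.PApplet;

public class ProtoBits extends PApplet {

  // hypothetical usage counts for a handful of words
  int[] usage = { 1, 4, 2, 7, 3, 5, 1, 2 };
  float[] y, vy;
  float g = 0.2f;  // constant downward acceleration

  public void settings() {
    size(320, 240);
  }

  public void setup() {
    y = new float[usage.length];
    vy = new float[usage.length];
    strokeWeight(4);
  }

  public void draw() {
    background(0);
    stroke(255);
    for (int i = 0; i < usage.length; i++) {
      vy[i] += g;
      y[i] += vy[i];
      if (y[i] >= height) {  // hit the ground: relaunch
        y[i] = height;
        // launch speed chosen so the bounce height is proportional to usage
        vy[i] = -sqrt(2 * g * usage[i] * 25);
      }
      point((i + 1) * width / (usage.length + 1f), y[i]);
    }
  }

  public static void main(String[] args) {
    PApplet.main("ProtoBits");
  }
}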

It’s time this blog was resuscitated.

Fortunately I have something to write about, as I am beginning a very interesting collaboration with Laura on the visualization of 18th-century Romantic poetry, a subject I am severely ignorant about. Here is a recent note I sent to Laura:

Sent Dec 12, 2007

… Some initial thoughts I want to share:

1. I’ve been thinking and working on parsing:
Thus far I’ve been able to input the poem and generate some relatively simple statistical data about overall syntax and word usage (i.e. number of occurrences of terms). I could (and will) parse deeper and collect phoneme groups, prefixes, suffixes, etc. as well. In addition, I really want more semantic “meat”, so I’ve downloaded WordNet (a “lexical database for the English Language” developed at Princeton). WordNet should (I’m hoping) allow me to query all terms against a simplified semantic interface. For example, I would like to be able to identify any term that relates to birth or death or love or hate, etc. This seems the only logical way to approach mapping semantics. Of course, once I collect buckets of terms based on these more general concepts, finer semantic filtering could occur recursively (man, that sounds pretentious; put it on the poster “fer sure”!). For example, all the terms that semantically connect to birth could be further separated: giving forth of an idea, creating a life-form, heritage, lineage, noun vs. verb, etc., etc. (A sketch of this kind of WordNet query follows at the end of this note.)

If time permits (hah!) it would be good to find some other dictionary APIs; for example, aural data (relating to phonemes), etymology, etc.

Once all this mess of data is collected and statistics are generated, I’ll connect the data to a visualization tool. For now, I’m thinking about using my protobyte forms as sort of a conceptual armature (genus perhaps?). I would love to have the poem visualizations/protobytes motile in 3D (ultimately evolving): poetry creating virtual life!!!
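To make the WordNet idea above concrete, here is a minimal sketch of that kind of query. I’m using MIT’s JWI binding as one (hypothetical) choice of Java WordNet library; the install path is an assumption, and this just dumps glosses and one level of hypernyms for “birth”:

import java.io.File;
import java.util.List;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.ISynset;
import edu.mit.jwi.item.ISynsetID;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.item.Pointer;

public class SemanticQuery {
  public static void main(String[] args) throws Exception {
    // path to a local WordNet install's dict directory (an assumption)
    IDictionary dict = new Dictionary(new File("/usr/local/WordNet-3.0/dict"));
    dict.open();

    IIndexWord idx = dict.getIndexWord("birth", POS.NOUN);
    if (idx == null) return;  // term not in WordNet
    for (IWordID wid : idx.getWordIDs()) {
      IWord word = dict.getWord(wid);
      ISynset synset = word.getSynset();
      System.out.println(synset.getGloss());
      // walk one level up the hypernym ("is-a") links: the hook for
      // grouping poem terms under broader concepts like birth/death
      List<ISynsetID> hypernyms = synset.getRelatedSynsets(Pointer.HYPERNYM);
      for (ISynsetID sid : hypernyms) {
        System.out.println("  -> " + dict.getSynset(sid).getWords().get(0).getLemma());
      }
    }
    dict.close();
  }
}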

Thought folks might find this free webtool interesting.

activeCollab is an easy to use, web based, open source collaboration and project management tool. Set up an environment where you, your team and your clients can collaborate on active projects using a set of simple, functional tools. 100% free!

Yes, I copied the above from the website.

look at

http://www.activecollab.com/

lisa

Last night at a 4th of July gathering, I met a friendly and thoughtful Botanist. His research involves invasive plants. He was using historical satellite imagery to help identify the effects of invasive plant density on the surrounding flora ecology. Part of his challenge was in actually identifying the plant density within the satellite imagery, which was often below the forest canopy. We spoke a little about the image analysis problem, which of course got me all hot and bothered. In my typical manic and undisciplined fashion, I began asking lots of questions about the imaging algorithms, etc., and I felt him pulling back some. He said he himself wouldn’t be doing the imaging stuff, but that a co-investigator (a Geographer) would be dealing with that. I felt we had had our first fight. (Although we had only known each other for about 5 minutes.) I seem to have this effect on certain people, which got me thinking about another recent conversation I had with a Geologist friend, who does do a lot of his own image analysis work. This friend loves speaking about the algorithmic parts of his research, which involves ground water flow. I asked the Geologist if he now prefers the computer flow modeling work to the actual Geology. He very honestly said he did. (But I could tell it was a guilty pleasure.)

This seems to be a growing problem in many disciplines (at least the perception of one); the tools and processes of analysis and production are often as fascinating as (or even more fascinating than) the original inquiry. I don’t really see this as a “losing the forest for the trees” sort of problem that should justify a fear or guilt response, but rather see it as indicative of a much deeper and more fundamental problem, namely “technoprocessguiltophobia”. (OK, so this is a silly word that I made up.) At the core of technoprocessguiltophobia is the belief that “serious” work should be the byproduct of focus, intent and, most importantly, prioritization. Getting lost in a process, such as creating some imaging algorithm, is not the priority when you are focused on the “big picture” (plant ecology), and there are others who are experts in this sort of thing: regardless of whether you’re fascinated by a (tactical) process, it’s essential to remain focused on the goals of your (strategic) research and stay the course. However, what probably got you interested in your original discipline to begin with was this very sort of digressive fascination (if you’re lucky).

Our evolving digital tools are fascinating. They are also decadent, often silly, expensive and very time-sucking. But of course the technology is really us, so arguably technophobia is a strange sort of self-loathing (another blog post here?). The same urge to understand the structure of a leaf in the 18th century is what’s driving us to model it with bits in the 21st century, which is also the same urge to develop the tools to model it. Only we seem to give value to certain activities that have historical (romantic) precedence–so a naturalist’s traditional process of looking, touching, measuring, etc. seems more benign (guilt-free) than a bunch of detached bit shifting happening in a computer. I am in no way suggesting that the naturalist should stop going to nature and doing fieldwork. However, I am strongly recommending that the naturalist let herself get lost and fascinated in the process, even if the process falls outside of her current expertise or seems digressive to the original research mission. (Obviously this might not work for junior faculty in the current system, myself included.) Ultimately, I would argue that the sense of play (re)introduced into her process, through process-focus, will invigorate her research, broaden her perspective and help eradicate the disciplinary silos strangling universities (especially the segments that don’t have access to huge government/industry funding streams).

Well, back to working on my very digressive (and currently pretty buggy) 3D rendering engine, something 8 years of classical painting training prepared me well for?

This is from a letter I sent to the faculty of the Interactive Media Studies program at Miami, regarding the development of an IMS research mission statement.

A point that I would like to see stressed in the mission is the deeper fundamental impact that computation brings to media studies. I think it is easy to lose the forest for the trees here—aided in large part by the software industry. The tendency, both in and out of academia, is to see computation (the glass-half-full view) as a facilitating and democratizing tool/force, which in itself is not a bad thing. However, I think this somewhat superficial perspective misses the much more significant potential of computation as a distinctive medium and even alternative "intelligence"*. The tool/force perspective relies on an industrial-age paradigm: technology enhances, frees, empowers, etc. Computation fits neatly within this continuum as another incremental step toward full automation. Again, this is a valid and useful signification. However, it also seems to me overly egocentric: the individual remains in control of the machine; ideally it serves his/her wishes (ultimately completely).

In contrast, computation can be a much less agreeable and cooperative agent. As a tool, it is arguably highly inefficient. Consider the actual costs of system development, operations, training, deployment and maintenance, vis-à-vis work productivity. Of course current human demand for "toys" makes these numbers work, but if we try to separate the fulfillment of actual human needs from wants, I wonder how productive computer technology really is (yeah, yeah, I know this is wimpy lefty thinking). Considering computation as a medium offers a significant break from the older productivity model. As a medium, computation offers universal mutability; it can model/process/analyze/generate visual, aural, tactile, kinetic, textual, etc. data; it can take the form of (perhaps) everything. Thus, when we segment into digital media, digital humanities, etc., we are expressing a bias based on older disciplinary boundaries rather than any limit inherent within the medium. This is something to consider seriously. And filtering further into digital video, 3D, multimedia, etc. seems highly problematic.

A current problem is how to get a literal grip on/in/around the medium. The software industry has stepped in to categorize/granularize the medium for us, and make a whole lot of moola in the process. They have been very effective in confusing the mechanism with the medium. Epistemology is not a high priority in the engineering process, so our software tools don't ask why, only how, and we keep buying up the stuff, even if most of us never use 9/10ths of the features in these bloated tools; yet we dutifully upgrade every cycle. I would argue that to stop this cycle and get a "grip" on the medium we need greater fluency in the actual computation medium, not simply facility in manipulating commercial software applications. And this is best achieved through developing programming literacy. I believe IMS should be at the forefront of this: not to train computer scientists, but rather to provide essential education. If we want our students to be able to parse, interpret, analyze, etc., shouldn't they have that proficiency in perhaps the single most dominant and controlling medium in their lives? And obviously I think IMS research should blaze a path in this area. Let me stress again that this is not about low-level computer-science-based research, but rather about fluency in the computation medium and work/research that reflects this fluency and hopefully helps define our emerging field.

* I'll offer some additional half-baked thoughts on "alternative intelligence" in a future post.

I just got John Maeda's book Creative Code: it's dedicated to Muriel Cooper (you mentioned her, Ira).  The dedication to her quotes something she either said or wrote:

"I was convinced that the line between reproduction tools and design would blur when information became electronic and that the lines between designer and artist, author and designer, professional and amateur would also dissolve."

Could Cooper be right?  If not, is it because levels of complexity, or aesthetic problems, prevent any genuine intermingling?  If so, why would the material of the expressive medium (electricity?) make such a difference?