I’ve created two (VERY) simple semantic visualizations based on a search for terms defined as positive or negative. I was originally planning on dynamically generating word lists using WordNet or some other dictionary API. However, good old dictionary.com has a much wider and deeper word well (including returns from WordNet). I looked into programmatically parsing the page returned by the dictionary.com URL (which I may eventually do), but for now I have generated the word lists manually (I know, I know, this is admitting some defeat). The visualizations plot a linear and then a radial gradient based on the lines containing the pos or neg terms. I keep track of the number of pos/neg terms in case a line contains multiple terms (some do). Each line (or concentric ring) overlaps its neighbors and is translucent, allowing some optical color mixing. The colors are arbitrary: red is pos, blue is neg, and gray is neutral.
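Here is a minimal sketch of the counting and coloring step, written in plain Java rather than as a Processing sketch so it runs on its own. The class name, the tiny word lists, the 50% alpha and the way counts are mixed into a color are all assumptions for illustration, not the code behind the actual visualizations.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SentimentLines {
    // Placeholder word lists (assumption); the hand-built lists are much longer.
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("joy", "love", "bright"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("grief", "hate", "dark"));

    /** Returns {positiveCount, negativeCount} for one line of text. */
    static int[] countTerms(String line) {
        int pos = 0;
        int neg = 0;
        // Split on anything that is not a letter or an apostrophe.
        for (String word : line.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (POSITIVE.contains(word)) pos++;
            if (NEGATIVE.contains(word)) neg++;
        }
        return new int[] {pos, neg};
    }

    /** Maps the counts to a translucent ARGB color: red = pos, blue = neg, gray = neutral. */
    static int colorFor(int pos, int neg) {
        int alpha = 0x80; // roughly 50% opacity, so overlapping lines mix (assumed value)
        if (pos == 0 && neg == 0) {
            return (alpha << 24) | 0x808080;
        }
        int total = pos + neg;
        int red = 255 * pos / total;  // share of positive terms on the line
        int blue = 255 * neg / total; // share of negative terms on the line
        return (alpha << 24) | (red << 16) | blue;
    }

    public static void main(String[] args) {
        String line = "Love and joy outlast the dark"; // made-up sample line
        int[] counts = countTerms(line);
        System.out.printf("pos=%d neg=%d color=%08X%n",
                counts[0], counts[1], colorFor(counts[0], counts[1]));
    }
}
```

Each line of the text would get one such color, drawn as a horizontal band for the linear version or as a ring for the radial one.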
I’ve been able to get the WordNet API integrated into a simple Java app. One amusing side note is that I got stuck for a day trying to get the WordNet .jar file to run in my Java app. After a few hours of unsuccessful Googling, I picked up my own book, in which I explained (to myself) how to solve the problem. So what I originally thought would be the more time-consuming and challenging parts of the project–parsing and semantic relationships–have been (at least initially) fairly straightforward. The larger challenge that looms before me is what the heck I’m going to do with all this data.
The problem is not actually what to do, but rather what to do in the next 2 weeks, prior to MLA. I wish I could just explore this material without the burden of a deadline. This was supposed to be how I was going to spend my sabbatical this fall–yeah, right!
My thought about the visualization process today is to begin with single-celled creatures and work my way up. I’ve been thinking about a name for these fundamental organisms: microbots, micro-protobytes, microbytes, protobits, protobots. My idea for these initial creatures is single pixels that bounce in 1 dimension: distance = word usage. I know this is fairly boring, but I feel like I need to begin simply and fundamentally. I will post a few Processing sketches of these initial tests next.
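As a rough sketch of such a creature, here is a single pixel bouncing along one axis. I am reading "distance = word usage" as "the pixel travels as many pixels as the word occurs in the poem", which is only one possible interpretation. The class and field names are made up, and this is plain Java rather than one of the Processing sketches.

```java
class Microbyte {
    final int range;   // travel distance in pixels, assumed equal to the word's occurrence count
    int position = 0;  // current offset along the single axis
    int direction = 1; // +1 moving out, -1 moving back

    Microbyte(int wordCount) {
        // Guard against a zero range so the pixel always has somewhere to go.
        this.range = Math.max(1, wordCount);
    }

    /** Advances one pixel and reverses direction at either end of the range. */
    void step() {
        position += direction;
        if (position <= 0 || position >= range) {
            direction = -direction;
        }
    }

    public static void main(String[] args) {
        Microbyte m = new Microbyte(3); // a word assumed to occur 3 times
        for (int frame = 0; frame < 8; frame++) {
            m.step();
            System.out.println("frame " + frame + ": position " + m.position);
        }
    }
}
```

In Processing, `step()` would be called once per frame from `draw()`, with the pixel plotted at `position`.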
It’s time this blog was resuscitated.
Fortunately I have something to write about, as I am beginning a very interesting collaboration with Laura on the visualization of 18th-century Romantic poetry, a subject I am severely ignorant about. Here is a recent note I sent to Laura:
Sent Dec 12, 2007
… Some initial thoughts I want to share:
1. I’ve been thinking about and working on parsing:
Thus far I’ve been able to input the poem and generate some relatively simple statistical data about overall syntax and word usage (e.g. the number of occurrences of each term). I could (and will) parse deeper and collect phoneme groups, prefixes, suffixes, etc. as well. In addition, I really want more semantic “meat”, so I’ve downloaded WordNet (a “lexical database for the English Language” developed at Princeton). WordNet should (I’m hoping) allow me to query all terms against a simplified semantic interface. For example, I would like to be able to identify any term that relates to birth or death or love or hate, etc. This seems the only logical way to approach mapping semantics. Of course, once I collect buckets of terms based on these more general concepts, finer semantic filtering could occur recursively (man, that sounds pretentious; put it on the poster “fer sure”!). For example, all the terms that semantically connect to birth could be further separated: giving forth of an idea, creating a life-form, heritage, lineage, noun vs. verb, etc.
If time permits (hah!) it would be good to find some other dictionary APIs, for example for aural data (relating to phonemes), etymology, etc.
Once all this mess of data is collected and statistics are generated, I’ll connect the data to a visualization tool. For now, I’m thinking about using my protobyte forms as a sort of conceptual armature (genus perhaps?). I would love to have the poem visualizations/protobytes motile in 3D (ultimately evolving): poetry creating virtual life!!!
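The word-usage count mentioned in the note can be sketched in a few lines of Java. This is only an illustration under simple assumptions (lowercase everything, split on anything that is not a letter or an apostrophe), not the parser itself. The class name and the sample line are made up, and the semantic WordNet lookups are left out.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordUsage {
    /** Counts how often each word occurs in the text, ignoring case and punctuation. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted alphabetically for readable output
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty first token
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder text, not taken from any of the poems in the project.
        String poem = "The night is dark, the night is long";
        countWords(poem).forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

The resulting map is the kind of "bucket" that could later be filtered by semantic category once each term has been looked up in WordNet.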
Thought folks might find this free web tool interesting.
activeCollab is an easy to use, web based, open source collaboration and project management tool. Set up an environment where you, your team and your clients can collaborate on active projects using a set of simple, functional tools. 100% free!
Yes, I copied the above from the website.
There is a new book by Nancy Armstrong called How Novels Think. It's brilliant, and congruent with recent work by Andrew Elfenbein (in PMLA and elsewhere) which discusses print presentation, the look and feel of early 19th-century texts, as "interface." Armstrong's premise is that, since novels do a certain amount of thinking for us, they are bundles of smart data. Novelistic conventions, then, are basically a software package for making information smart. The really brilliant piece of her argument (it may be obvious, but I still think it is brilliant) is her idea that software packages and data bundles in-form: they form the inside of us (our psyches, our selves) as a means and effect of giving us information.
Armstrong's argument really helps me understand something that John Maeda is worried about in thinking about the computer as the artist's material. In Creative Code, he says that he is worried that software is becoming too complex for people to use as a tool (intuitively, without laboriously reading manuals) while programming is becoming easier at the expense of creativity. I can really understand what he's saying here if I think about software as a set of conventions for a specific type of novel (historical romance or gothic fiction, for example) and so think of the programmers of this software as the artists who come up with new genres, new forms, usable by many other very creative people. Here is Maeda expressing his worry:
Programming tools are increasingly oriented toward fill-in-the-blank approaches of the construction of code . . . . The experience of using the latest software, meanwhile, has made even expert users less likely to dispose of their manuals, as the operation of the tools is no longer self-evident. Can we, therefore, envision a future where software tools are coded less creatively [i.e., a future of impoverished novelistic genres]? Furthermore, will it someday be the case that tools are so complex that they become an obstacle to free-flowing creativity [i.e., that you can't churn out gothic or sci fi]?
Maeda’s own software “Illustrandom” seems to me a beautiful example of something that took complicated rather than fill-in-the-blank programming and rendered software that is pretty intuitive and so will allow creativity to flow.
Also, is it possible to discuss some of Ira's work, Protobytes, as the kind of work that intervenes in Maeda's problematic? Ira, you said that you used bits of code, without thinking it, as a painter might use brush strokes, throwing up bits of it, then seeing what happened?