November 17, 2004
The Grey Video
So I'm scrolling me blogroll while attending the blogging bloggers and the blogs they blog foolishness.
By the way, if you're a blogger, add your blog to Blogdex now. Help a librarian run metrics on the blogging community.
So, I'm scrolling me (what am I, a pirate?) blogroll while attending the blogging bloggers and the blogs they blog foolishness.
And I come across this at Boing Boing.
I'm kinda surprised it took this long for the video mashup to follow the audio.
Here's another good mashup post from Boing Boing.
MeatballWiki and Blogdex
Okay, last ASIST Annual Meeting entry, and guess what it's about.
Blogs, and the blogging bloggers who blog them.
That's right, it's "Beyond the Sandbox: Wikis and Blogs that get work done."
And I'm spent, got no more Infotainment in me.
Go check out:
<rant> <!-- Insert tirade on the self-referential, closed community of the so-called "world wide web" --> </rant>
November 16, 2004
Quick links and cool Infostuff
Okay, I realized I should at least give you the opportunity to go on to learn more about ASIST and its Annual Meeting.
And I have a really cool poster to share that I helped my colleague William Reilly prepare for the SPARC workshop he'll be attending later this week.
So there.
K-Blogs, Wirelessness and the Cult of Personality
Today's ASIST Annual Meeting entry should have appeared much earlier today. I attended the presentation "Blogs for Information Dissemination and Knowledge Management" happily prepared to do what Metametadata do in real time while experts discuss what it is metametadata do. I mean the metameta opportunities were OFF THE CHARTS!
But no reliable wireless connectivity in that particular conference room. ARE YOU FREAKIN' KIDDING ME! Everywhere else in the hotel but conference room Providence II?!!! The presenters couldn't even access the internet to show in real time their own blogging efforts. Without internet access, why even talk about internet technologies? It should be painfully obvious to everyone by now how fundamentally important the basic infrastructure of the internet is and how it should be a common trust supported and enjoyed by all. It shouldn't even be a question, it should just ... be ... there.
So now I have to talk about blogging for oneself vs. for others, and for knowledge capture vs. infotainment, without the benefit of a punditry soundtrack. At least I'm attending "Diffusion of Knowledge in the Field of Digital Library Development: How is the Field Shaped by Visionaries, Engineers and Pragmatists?" In other words, how to develop your own cult of personality, a topic of interest to all bloggers (including myself) who hope to reach a wider audience.
I want to be one of the few lucky ones who get read by everybody. I feel qualified to be an arbiter of "cool." And I'd love people to pay me to pontificate on the future of the information society. Basically, as all readers of this blog are again painfully aware, I'm totally ready for Guruhood. Now I just need to find some people to hang on my every word. I think I've already started to cultivate the appropriately eccentric personality traits. I've stockpiled my somethings to say and now I just need some people to listen to me.
Any takers? (If so, feel free to link to this website!)
WOW, did I use enough emphasis in this article, or what? Don't you just want to sign up for my new lecture series?
November 15, 2004
Meaning and Animal Compassion
Day 2 at the ASIST Annual Meeting is all about Meaning.
First, Sir Tim Berners-Lee, inventor of the world wide web and director of the World Wide Web Consortium, or W3C (located in the Stata Center at MIT), promises us the real world through the magic of the "semantic web." Whereas the protocols that make up the world wide web (TCP/IP, HTTP, HTML and others) represent syntactic agreement for creating and sharing hypertext, the semantic web represents a method for reaching semantic agreement about the content of our hypertextual communications.
The idea is that you define your terms within a specific domain. If you're creating learning objects in a courseware system and you're using Learning Object Metadata and IMS Content Packaging, you define what those schemas mean by "Contributor" or "Manifest". That way, someone else working in an institutional repository that uses Dublin Core and the Metadata Encoding and Transmission Standard (METS) can programmatically and systematically map those terms to "Creator" and "StructMap".
This is accomplished without creating a grand unified Ontology (essentially one big set of definitions for everything that everybody uses). Instead, everyone who adopts Semantic Web technologies warrants that they'll provide their domain-specific definitions, and programs can then interpret these definitions and find equivalencies.
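To make the crosswalk idea a little more concrete, here's a minimal Python sketch, assuming each community simply publishes a table of term equivalences keyed by URI. The example.org URIs are hypothetical placeholders; only the Dublin Core element namespace is real. Real Semantic Web tools would state these equivalences in RDF/OWL (e.g. owl:equivalentProperty) rather than in a Python dict.

```python
# Hypothetical namespaces for the courseware side (LOM / IMS Content Packaging).
LOM = "http://example.org/lom#"
IMSCP = "http://example.org/imscp#"
# Real Dublin Core element namespace; the METS namespace below is a placeholder.
DC = "http://purl.org/dc/elements/1.1/"
METS = "http://example.org/mets#"

# Each domain publishes which of its terms correspond to another domain's terms.
EQUIVALENCES = {
    LOM + "Contributor": DC + "creator",
    IMSCP + "Manifest": METS + "StructMap",
}


def translate(record: dict[str, str]) -> dict[str, str]:
    """Rename each field whose term has a published equivalent; keep the rest as-is."""
    return {EQUIVALENCES.get(term, term): value for term, value in record.items()}


# Made-up sample record from the courseware system.
course_item = {
    LOM + "Contributor": "Jane Doe",
    IMSCP + "Manifest": "imsmanifest.xml",
    LOM + "Keyword": "metadata",
}
repo_item = translate(course_item)
```

Nobody had to agree on one master vocabulary here: the repository only needs the published equivalences, and terms with no known equivalent pass through untouched.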
The definition mechanism is the Uniform Resource Identifier, a glorified URL of the form http://... It is important to remember that URIs aren't the things they define, even in the virtual world. They're still nothing more than names, just as menu entries aren't food and you can't eat them (I'm sure people have tried).
For example, I could define myself via a vCard or even just a plain old vanilla HTML page where I've deposited my curriculum vitae or some contact info. Then whenever I include myself in the metadata for any web content I create, I don't just write in my name; I point to the web page where I've defined myself. The mechanism for pointing is RDF, which expresses every statement as a logical triple of the form Subject-Predicate-Object.
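Here's a tiny sketch of that pointing as Subject-Predicate-Object triples, assuming plain Python tuples rather than a real RDF library. The post URI, the contact-page URI and the fullName predicate are hypothetical; dc:creator is the real Dublin Core element URI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
# Hypothetical URIs: a blog post, and the page where its author has "defined himself".
POST = "http://example.org/blog/2004/11/15/meaning"
AUTHOR_PAGE = "http://example.org/~keeper/contact.html"
FULL_NAME = "http://example.org/vocab#fullName"  # hypothetical predicate

graph = [
    # Point to the author's self-definition instead of typing in a name.
    Triple(POST, DC_CREATOR, AUTHOR_PAGE),
    # The self-definition, in turn, says what that URI is called.
    Triple(AUTHOR_PAGE, FULL_NAME, "The Keeper"),
]


def objects(graph: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


creator = objects(graph, POST, DC_CREATOR)[0]  # a URI, i.e. a name, not the person
name = objects(graph, creator, FULL_NAME)[0]
```

Following the pointer gets you another description, never the person; that's the "menu entries aren't food" point in code form.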
The idea is that you build your collection of information to share with the world from a vantage point somewhere between really large-scale operations and really small-scale ones. You correlate the definitions provided by the systems below you, and you make your own definitions with an eye to their inclusion in larger correlations above you. The same definition activity goes on at all levels. The semantic web is a fractal web, and it realizes the hermetic wisdom As Above, So Below.
You build your data system as a module of modules, incorporating data from different domains. You don't have to set definitions for each data store you tap into, and you expose your data to even larger systems via standards that let those systems do likewise in aggregating you. No one is forced into one semantic standard, but each can understand the others with a minimum of human supervision.
I think there are two illusions in the RDF grand scheme.
The first is the Aristotelian Illusion that we can speak definitively that any one label, even an RDF triple, is the thing being captured. Despite explicit recognition that URIs are not the things they represent, we still interact with triples as if they are. For example, is Gary Marchionini his entry in the NACO authority file maintained by OCLC or the similar record on Amazon? Obviously he's neither, and we should be pointing to some human being in space-time (assuming we share his inertial system). Who has the right to define him? Is it possible to dice up the entire universe into non-competitive domains? I can't imagine it for identity definitions; what about conceptual ones? I can't get past the dread that good old social conflict will subvert all our technological advances. I would really like to understand better how RDF plans to live happily with a myriad of definitions that don't quite match exactly.
The second illusion is that RDF encourages interoperability without gobbling up smaller standards. It attempts to leave semantic standards alone, but it dictates a common syntax. The last time the W3C tried this was HTML, and how long did it take for the major browsers to implement three or four different flavors of HTML and the DOM? And how long did it take before the browsers started playing nice and made it easy for content providers to produce material that worked the same way across all systems? I would love to see the semantic web's game plan for avoiding the same sort of hijacking of the "standard."
The one thing that I do like is that Mr. Berners-Lee seems to be in concert with Ms. Hertz and a lot of others regarding the economies of scale on the internet. RDF and the semantic web propose a solution to bad economies of scale in the standardization and interoperability of data systems on the web. Departmentalizing the job of defining the content on the web lets everyone do their share and gives content providers more responsibility to produce good metadata at the time they create their content. In a post-modern information society it's no longer good enough to dump your data on the world; you've gotta tell us what it means to you. Sounds kind of silly, but it goes a long way towards improving communication and understanding.
Second, Gary Marchionini (not Gerry) presents again, this time in Rhode Island, in a presentation called "Why can't Johnny file?" in which librarians lament the poor information organization behavior of the average user of the new information technologies. It is Gary's contention that we do not need to teach Johnny to file, and that there are better things for humans to be doing than organizing our inboxes and hard drives.
His main argument repeats the thesis of his earlier presentation in Cambridge. Let the machines do what the machines do, let the humans do what the humans do.
I agree with him to a point. And that point is this: It's less important to teach someone to file a file so they can find it later than it is to teach them to ask better questions about the information content of the file.
That said, I have two caveats and, hopefully, some clarification.
First, Gary assumes future advances in technology. I am far more wary of making any bold predictions about advances in automatic indexing of multimedia resources or facial recognition software. I wouldn't design instructional methods or encourage information behavior that relies on a technology that has yet to present itself.
Second, Gary equates filing and other organizing activities with discovery. In fact, there are three reasons to organize information by filing or classifying, and the ability to find it again later is the least of the three. The second most important is to be able to communicate something about the information to others. This information organizing activity is all-pervasive in our society and extends from sharing iPod playlists to writing surveys of the scholarly literature in a particular field.
The thing that we communicate when we share our organizations isn't some grand unified Ontology that we discover. We can all admit that there isn't one master organizing scheme that we can teach to. Rather by sharing our personal organizational efforts with others we share our relationship to information, our worldview, and creating this relationship is the most important thing we do when we file. We need to teach filing and classifying as one part of a larger program to teach people how to develop good relationships with information, how to evaluate it properly, how to find more and better information, how to put it to work for you, and how to relate it to existing personal information stores. We need to teach how to recognize how information affects our thinking, and how our thinking colors the information we encounter. In this respect all filing and classifying can be viewed as interpretation, and we should teach such basic skills as organizing things into simple buckets to see what happens. This is how we teach kids to invent/discover meaning in the world they encounter.
Near the end of the question and answer session following the presentation, a fellow stood up and made the most salient point of the day. The session was titled wrong. We shouldn't teach filing, we should teach organizing. How stuff is stored in a server or hard drive or repository or database isn't what we need to know. But we shouldn't throw the baby out with the bathwater. We do need to know how to intellectually and mentally store stuff away. I would offer that what the fellow meant was that we should teach people how to mark up content, to add metadata that describes for themselves and others the personal and social context in which they interact with information and which captures the relative meaning they found in it.
Okay, Number 3
So the Omaha people would send their young persons at puberty on what white folks ended up calling vision quests. What they were really trying to do was gain the favor of Wakonda (think the Tao rather than Yahweh). The idea was that a sincere child who stood in place for four days with clay rubbed on their head would be more likely to influence Wakonda than a cynical adult. As might be expected, the kids started receiving visitations from different mediums, animate and inanimate. These visitations turned what was originally an event of supplication on behalf of the tribe into a personal appeal to the most important force in the universe for help throughout life, undertaken when a child was "old enough to know sorrow," in other words, old enough to engage in tribal relations as an adult.
This is the prayer that all Omaha know, that they all sing as they stand Non'zhinzhon, "as sleeping," for four days, trying to think the happiest thoughts they can for their future.
Wakonda thethu wahpathin atonhe
Wakonda! here, needy, he stands, and I am he
When the kids come back they can't say anything about the animal for four days, then they can go talk to someone who also was visited by their same medium (and they can talk to no one else, and no one else asks). The elder helps them to understand what happened and what it might mean for their relationship to the tribe.
It is said that the medium "has compassion on" the child, and that this is its motive for impressing its particular form on the child for the rest of the child's life. It has decided to take an interest in the child's future because it pities the child.
I like the idea of going out to gain the favor of some animal or the other. I especially like the no talking about it for four days rule. It seems to really drive the experience home and allow the child to come to terms with it, to properly prepare before talking about it. Once the child has talked it over with an elder the child then goes out and finds the physical expression of that medium and fashions a trophy. If a hawk visited the child, the child would journey until he crossed paths with a hawk, would kill it and preserve the bird as a visible sign of his visitation, the child's most prized possession.
Kids would even become junkies for the process, going through it over and over again until they settled down with a family (if they became holy persons, today called roadmen, they continued the process their whole lives). It was generally frowned upon once one became a respectable member of adult society.
I read all of this in The Omaha Tribe by Alice Fletcher and Francis La Flesche, the bible for all things Omaha. When I put on my white perspective, it seems funny that a whole tribe of Indians with active social and cultural traditions would pore over a government scholarly study for ideas on how to be more Indian. It seems a testament to the horror of any culture being systematically euthanized by another.
When I think Omaha about it, it seems just another story. There are two kinds of stories in the book: the stories of the elders that Alice and Francis recorded, and Alice and Francis's own stories.
I've heard most of the stories in the book from third-party sources like my grandpa and grandma, Charlie Stabler and Elizabeth Sansouci.
Going to pray to Wakonda was just one of the many rites that a child passed through towards becoming an adult and forming an appropriate relationship with the nation and the world. It "required a voluntary effort by which, through the rite of fasting and prayer, the man came into direct and personal relations with the supernatural and realized within himself the forceful power of the union of the seen with the unseen," as Alice says.
Seems like the child is finding meaning in the universe and finding ways to share that meaning. Seems Alice and Francis are also finding meaning and sharing it.
The totem is metadata and it's shared; in fact, it's the only part of the experience that is shared with everybody. And it needs no one definition.
November 14, 2004
Whoever has the best library wins
Coming to you straight out of the American Society for Information Science and Technology Annual Meeting in lovely downtown Providence, R.I., is everyone's favorite semi-structured data source, Mr. Metametadata himself, the Keeper!
Thank you, thank you. I'm very delighted to be here. I just have two things to say.
A very bright person, J. C. Hertz, who makes a living applying game design principles to information systems for major products, services and learning systems (she's principal of a company called Joystick Nation, Inc.) just made some statements I took issue with in her plenary session.
These statements are (paraphrased):
any system that requires people to add their own metadata is doomed to fail
My discomfort with this statement is merely linguistic or epistemological. She spoke of "accidental metadata" and offered the classic example of an image caption as accidental rather than, I don't know, maybe intentional? I'm not sure I understand the "accidental" domain. Does she mean information that can be mined from the data object itself? Does she mean contextualizing information that people add as they position their data objects in some information space? Does she mean information that is the result of interaction with data objects? All of these are candidate activities for generating metadata byproducts. In fact, the product of some of these activities really wants to be both primary data and metadata at the same time.
She most strongly made a distinction between people providing unasked for information as they positioned their data in an information space versus information provided as a result of the information space dropping a cataloging form in front of people and saying you have to fill this in before you can submit your data object. I think that in either case there is an expectation on the part of the owners/participators in an information space for a provider of a data object to also provide metadata. My response to Ms. Hertz is that there just isn't social momentum for making that expectation explicit. There still are strong unspoken social pressures in the information culture that keep most people from dumping a bunch of uncontextualized data into the pool. What metadata specialists do is find ways to work with data providers to take advantage of these pressures to formalize "accidental" metadata offerings to the providers' benefit.
For example, the Metadata Services Unit in the MIT Libraries provides metadata application services for MIT OpenCourseWare (OCW). The Unit creates Learning Object Metadata records for OCW resources. MIT OCW is in many respects an electronic publication, and like other publications it provides image captions. These captions contextualize the image and provide copyright information. There is no official publisher's rule that says you have to caption your images, but most people do. The caption is metadata that just plain makes the image more effective as an information resource. It's in the interest of MIT OCW to provide this caption, and they've mandated its existence in their style guide. They are, like most data providers moving to electronic information delivery, unfamiliar with XML, semistructured data and metadata records. It is natural for them to write a caption into a web page right where they embed the image. This benefits viewers of the web page and also means there's some text about the image that gets indexed by the search engine. The caption in the HTML makes the image more easily discovered. It is the role of the Metadata Services Unit to find this metadata OCW has provided and get it into a Learning Object Metadata record. Why duplicate this information in both data and metadata? Because now, when you search for images and get a list of them back, you get the caption included in the list. It is much easier to capture the caption in a metadata record field and then display that field on the search results page. If you doubt that this is easier than grabbing the caption straight off the HTML page, go try to make sense of the keyword-in-context part of a Google results list. The two metadata record fields presented in a search results page are General.Title and General.Description.
Because the Unit put the caption in the description field, patrons have access to that information as they decide whether or not to actually view the item. Once again, metadata (same info, different wrapper) aids information retrieval.
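Here's a rough Python sketch of that harvesting step, using the standard library's html.parser. It assumes a hypothetical page template where the caption sits in the next element with class="caption" after the image (nothing here is quoted from the OCW style guide or the Unit's actual workflow), and it borrows the image's alt text as a stand-in title. General.Title, General.Description and Technical.Location are real LOM element names, but the record here is a flat dict, not LOM XML.

```python
from html.parser import HTMLParser


class CaptionHarvester(HTMLParser):
    """Pair each <img> with the text of the next element whose class is "caption".

    Assumption: the page template marks captions with class="caption".
    """

    def __init__(self):
        super().__init__()
        self.pending = None      # (src, alt) of the latest image still awaiting a caption
        self.caption_tag = None  # tag name of the caption element we are inside, if any
        self.depth = 0           # nesting depth of caption_tag, to find its real end
        self.buffer = []
        self.found = []          # list of (src, alt, caption)

    def handle_starttag(self, tag, attrs):
        if self.caption_tag is not None:
            if tag == self.caption_tag:
                self.depth += 1
            return
        attrs = dict(attrs)
        if tag == "img":
            self.pending = (attrs.get("src"), attrs.get("alt") or "")
        elif self.pending and "caption" in (attrs.get("class") or "").split():
            self.caption_tag, self.depth, self.buffer = tag, 1, []

    def handle_data(self, data):
        if self.caption_tag is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.caption_tag is not None and tag == self.caption_tag:
            self.depth -= 1
            if self.depth == 0:
                caption = " ".join("".join(self.buffer).split())  # collapse whitespace
                self.found.append((*self.pending, caption))
                self.caption_tag, self.pending = None, None


def lom_records(html_text):
    """Build minimal LOM-style records: alt text as title, caption as description."""
    harvester = CaptionHarvester()
    harvester.feed(html_text)
    harvester.close()
    return [
        {"General.Title": alt, "General.Description": caption, "Technical.Location": src}
        for src, alt, caption in harvester.found
    ]


# Made-up sample page, not real OCW content.
page = (
    '<p><img src="fig1.jpg" alt="Stress-strain curve"></p>'
    '<div class="caption">Figure 1: Stress-strain curve for mild steel. '
    "<i>Image courtesy of</i> the author.</div>"
)
records = lom_records(page)
```

The caption text is exactly what the page author already wrote; the only new work is moving it into a field a results page can display.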
What this seems to point out to me is that there has always been a fuzzy line between data and metadata, and that the information explosion precipitated by hypertext and networking technologies has thrown this confusion into high relief. The more stuff that gets into the information space, the faster we seem to need to produce even more. This means that we start demanding more for our metadata buck; it must do more for less effort. When folks start drawing clear lines where none existed before, it's often because they're circling the wagons and abandoning any effort that doesn't seem to realize a return. In this kind of environment there really should not be any metadata "accidents". It's the responsibility of information science professionals to make data providers aware of this and help them form a stronger, more cost-effective relationship with metadata rather than attempt to abandon it altogether.
libraries represent a well-organized, well-funded, well-mandated effort that doesn't scale well
This rounds out the arguments against human analysis and metadata production for the mass of data on the internet. Basically, data providers are lazy and data organizers can't afford to throw enough librarians at the task. I think that both are misunderstandings of the current information economy. We've seen that data providers do provide metadata intentionally and that there are methods to formalize this relationship, cementing the social motivation to provide metadata and realizing an economy.
We can also find the funds to catalog the internet, we've just been looking in the wrong place. Before networking and hypertext it was expensive to get a bunch of data into the world. For ease of distribution a bunch of ideas would aggregate (think gravity) into a book, an easy way to share a lot of information in one package.
It's no longer so expensive to send a bunch of ideas out there so they tend to avoid aggregation (think dark energy pushing infobits apart). The semantic density of the ASIST conference proceedings book seems galactic in scale next to this humble weblog post.
Now this same semantic density scale correlates with the scale of effort on the part of the author. It takes a book author much longer than the 30 minutes or so I've spent throwing off this ephemera, and the scale of reward for the author is similar. But this scale of effort does not hold for the librarian who might catalog either one on its way into a library or digital repository. It takes the same 15 to 20 minutes to catalog a book as it does one weblog entry.
This is what Ms. Hertz means. There's enough information in books and serials to justify an international collaborative effort amongst governments, institutions of higher learning and information professionals to provide a sophisticated level of information organization.
The information space that is the internet is screaming for this same level of organization, but the justification for a similar effort on the part of librarians seems absurd. However, if you look at the scale of effort and reward on the part of data providers, it seems apparent that the creators and aggregators of this content could and should shoulder this burden, even if it means hiring the librarians away from Universities and Public Libraries. Librarians need to focus on marketing their service offerings to the data providers wherever they are. This is our entry point, how we get to catalog the world wide web. It's likely to happen piecemeal at first. Especially considering the pluralization of data providers in the current information economy. Don't shy away from any project, no matter how small. We also need to find the right social leverage points to encourage good information organization behavior on the part of those who are producing the mass of content, again no matter how small.
Okay, things to say number 2.
Search Wars, Killer Apps and Why Librarians WILL Eventually Rule the World from the ASIST Special Interest Group: Information Architecture listserv (Thanks Boniface Lau)
It seems that the theme of this post is the old adage from Brown and Duguid (The Social Life of Information), "Social change must precede technological change".
I think that librarians get this and are working very hard to improve their ability to effect the needed social change. I think that the computer and internet technology communities are lagging behind. Their response to the need to improve the quality of metadata that accompanies electronic resources is not to leverage the social pressures that will prompt folks to provide this information, but to find some algorithmic, programmatic solution to pull it out of whatever a human sees fit to deposit. If you expect people to be lazy they will. If you expect more, they'll surprise you.
I don't think that you can process the amounts of data we're now seeing through human effort alone; neither can you do it by machine alone. Even Google employs an army of librarians to tweak the algorithms and improve the results of programmatic analysis. I just heard a great presentation at MIT by Gerry Marchionini from UNC Libraries about automatic generation of metadata. He thinks that we should be educating a whole generation of young people to be more savvy at information retrieval and interaction, including algorithm maintenance and development. He thinks that we can very easily develop more sophisticated information resource users with higher expectations for information systems, and he promotes something he calls Human Computer Information Retrieval, the collaboration between man and machine to solve the problem of interaction with growing amounts of data.
Did you notice that the article reference is from a listserv sponsored by the group throwing the conference I'm attending?
And, yes I noticed that I'm blogging about information technologies like blogging from a conference about information technologies like blogging (I'm also blogging about the conference itself, from the conference itself) and that this is some sort of metametageekiness. As you should have realized long before now, I'm very comfortable with metameta events and environs, and geekiness.