March 24, 2005
So Yahoo buys flickr with which to beat Google
I went looking for the older thread on folksonomies at Many 2 Many where Clay Shirky and Lou Rosenfeld duke it out, but heck, every freaking single post at Many 2 Many these days covers this topic, so just go read them all. Especially look for the articles on "Emergent Semantics" and "One World, Two Maps".
I think the thing that annoys me the most about "folksonomies" is the confusion of lay people (and professionals!) around what users are actually doing when they "tag" images at flickr. They are not, as the Cnet article suggests, "classifying". The whole notion of the community of flickr users regulating a common semantic vocabulary à la Wikipedia is absurd. There is no more wisdom in the flickr crowd than in the collective mass of us google users and pagerank spammers, only a deep abiding interest in porn and other lowest common denominators. Even flickr's peer pressure technology threatens to get as ugly as that high school popularity contest you've been repressing all these years.
If you talk to the flickr guys, especially the President of Ludicorp Stewart Butterfield, they will describe for you a very different, very personal "labelling" effort, much closer to what you see at Google's gmail. People are not collaboratively creating and defining buckets into which they deposit their content so that the unknown, average user can later find images via the shared semantic vocabulary. When you spell out this misconception it really takes on its full ridiculousness. By that I mean, who but librarians actually takes an interest in the information retrieval needs of their fellow web denizens?
What people are mostly doing at flickr is attempting to attach labels to their content that will most closely match search terms they themselves will use later to find their own stuff. Butterfield describes it as (from Cory Doctorow's ETECH notes):
Stewart: It's not really categorization on Flickr -- it's about letting users remember. If I add the "Norma" tag to pix of my mom, whose name is Norma, I don't think it goes into the Norma category. The unfortunate thing about the term "folksonomy" is that it implies that it's a replacement for categorization. People categorize things by noting what they do or don't have: mammals have hair and live babies; does it have property a? then it's a whatever.
As Butterfield says, it's a very personal effort. What gets created out of this rather large, unappreciated effort is a large number of personal external memory systems. Individual users know that they'll forget the specifics of the image, but they'll want to be able to retrieve it, so they try to anticipate what words they might use later to search for the item. These words may have absolutely nothing to do with any kind of description of the content, format, or any of the other traditional cataloging facets. They are likely not to be the same words that anyone else would use. And yet, many of the pundits, experts and gurus are convinced that a "semantics" will "emerge." Y'know, just like it did at Google.
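A minimal sketch of what such a personal external memory system amounts to, in Python (the tags, file names and function names here are invented for illustration, not anything flickr actually does):

```python
# A personal external-memory system: each user maps their own
# idiosyncratic search terms to their own stuff. No shared vocabulary
# is created or consulted. (Tags and photo names are made up.)
from collections import defaultdict

tag_index = defaultdict(set)

def tag(photo, *tags):
    """Attach the words I expect to search with later."""
    for t in tags:
        tag_index[t.lower()].add(photo)

def find(term):
    """Later retrieval: only works if I can guess my own past words."""
    return tag_index.get(term.lower(), set())

tag("IMG_4072.jpg", "norma", "thanksgiving")  # "Norma" is mom, not a category
tag("IMG_4103.jpg", "norma", "porch")

print(sorted(find("norma")))  # ['IMG_4072.jpg', 'IMG_4103.jpg']
```

Note that "norma" describes nothing about the image to anyone else; it only has to match what its owner will type next month.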
The average uninformed user has a snowball's chance in hell of conducting decent research in flickr, or at Google (this may be the only point on which Michael Gorman and I agree). The most they can do is enjoy serendipity. Folks like Clay Shirky who claim folksonomies to be the wave of the future make me almost as upset as Google itself, which is guilty of turning the world wide web into a high school popularity contest and convincing people to settle for "good enough." I can't decide who is worse, Google for convincing people to settle for the least because it takes zero effort, or Shirky for telling librarians to quit because the Google driven economics won't support their effort.
An example of "good enough" thinking, again from Butterfield (from Cory Doctorow's ETECH notes):
Stewart: The objective of tags shouldn't be to exhaustively cover the field -- we'll have a million photos of Tokyo, and if the TOKYO tag only gets you 400k of them, it's OK. You're only going to look at 20 of them anyway.
This is why academics have problems with Wikipedia. Their students, the ones using Wikipedia to conduct research, get the wrong idea about how you go about the Scientific Method, and how you conduct exhaustive research in any given subject area.
And I wish I could have gone to the ETech and SXSW conferences; it seems like everybody in the technology biz is preoccupied with metadata these days. I wonder when the cool kids will find my weblog.
January 11, 2005
folksonomies vs. controlled vocabularies
At Many 2 Many, a group weblog on social software to which Clay Shirky contributes, Shirky and Lou Rosenfeld seem to enjoy a disagreement on the merits of folksonomies vs. controlled vocabularies for organizing I'm not entirely sure what. I have to admit I agree with Lou Rosenfeld that we should be thinking about metadata ecologies and collaborative metadata creation between content creators and professional content organizers. But the proponents of folksonomies have been quicker to recognize the dynamic nature of metadata, to the deserved chagrin of professional catalogers, who should well know that no catalog record or controlled vocabulary is ever "finished."
Shirky claims that hiring professional catalogers to maintain controlled vocabularies is "not extensible to the majority of cases where tagging is needed." I wish I knew what these cases were. I suspect he means cases in a domain in which Rosenfeld, Information Architects and Librarians would be wise to follow Shirky's lead, namely the World Wide Web.
It seems to me that a lot of information organization happens over networks and on web servers and has as its object 'born digital content', yet isn't part of the world wide web. My occupation is to find the born digital content that is the output of the fulfillment of a major educational institution's teaching and research missions and organize it to do five things:
Help visitors find their way through the digital collections
Organize complex and compound electronic objects into effective educational resources
Satisfy copyright interests and manage access to the collections
Preserve collections over time and across new technologies
Manage digital production workflows, interactions and responsibilities
It seems to me that no folksonomy could accomplish all this all by itself, but several controlled vocabularies do. I work with Learning Objects (whatever those are), lecture notes, problem sets, datasets, slides, movies, structured text documents, audio files, and more. This amounts to a not insignificant amount of material, some of which eventually might make it to the world wide web, some not.
I think that the world wide web is the point where Messrs. Shirky and Rosenfeld start talking past each other. Shirky is right: all the Vengeful Info-Ninjas in the world couldn't tackle the task of cataloging the web in a meaningful and cost effective way. But the world wide web isn't everything. Neither is the world wide web as big as we think. Mostly it's comprised of a lot of little things that aren't really related at all.

Google has the right idea about searching the morass. The answer seems to me (and this is hard for a librarian to admit) to be to throw computational cycles at it. As long as the web remains basically hypertext with no inherent organization, this is probably the winning strategy. And the rise of search as the next great tech industry war seems to confirm Google's hunch.

But even Google recognizes that there's a lot more to search than all the html in the world. In fact, html might not be the most important thing. I think that the creators at the bottom of the bottom-up web have a lot of other kinds of information that is valuable to them, and I think that the next great contest will be to see who can provide the tools that allow individuals and small confederations thereof to search their own collections of information. Much of this isn't stored on web servers and accessible via hypertext protocols. Most of it is stored in closed repositories or even, heaven forbid, non-digital. A lot of it is in the hands of professional catalogers and other scholarly information vendors, and there is a great amount of value in their aggregation and organization of these hidden (to the web, and for now Google) information sources. If there were no value to the dark web, if the web world weren't interested in vetted, scholarly material, then why Google Scholar? Why Google Print? Why all the hubbub about Wikipedia?
I do not mean to discredit the great and meaningful information store that is the world wide web. And I admit controlled vocabularies are not the way to go to organize it. Proponents of folksonomies are correct to recognize that everyone is going to have to do their fair share. It means those with good information organization skills have to help those without. This is my message to fearful librarians who think Google will replace them. Our job was never to be the gatekeepers. We must be mindful of our mission to teach others to have a healthy, ongoing relationship with information.
I've been cataloging electronic teaching material via IEEE's Learning Object Metadata and IMS Content Packaging for five years now. If there's one thing I can say about non-MARC, non-AACR2 metadata for digital objects, it's this: the metadata for any one object is never static, never DONE! No matter how many taxonomies you throw at it.
We've got a project to take a static web publication of every classroom teaching resource that can be digitized and package it in xml for delivery to an institutional repository and for migration back to the learning management system from whence it came. Once we've got IMS Content Packages we hope to be able to more easily ship the objects around to the multitude of information warehouses on campus that were formerly silos with little or no communication between them. We started with the idea that we might be able to identify learning objects and push them to our patrons. We ended up realizing that we couldn't decide for our patrons, in their ofttimes unique contexts, what is a learning object.

Instead we're focusing on allowing the descriptive and structural metadata for each object to change and grow over time. We're working to discover how to enable whatever course content lifecycle our end-users can dream up. This means letting (and asking, which is so much harder) our end-users provide the contextual metadata, and capturing it in a fashion aware of that context and its applicability, or lack thereof, in other contexts. If one system organizes courses by session, while another deposits all resources into buckets by kind, how can we deliver the content to a patron in a way that allows him to choose one organization over the other, or to start with one and smoothly move the data to another? These are the questions we seek to answer, knowing that some of the metadata we write will go away, some will remain, and new information will accrete on the content as it lives its life in our electronic teaching systems. This new lifecycle would not be available to these digital teaching resources were it not for the organizational effort of professionals; neither is it possible without the cooperation of one's intended user group.
On one of my projects I developed an information model to capture metadata that would satisfy copyright requirements, populate the search index, and provide information necessary for preservation. Many controlled vocabularies were used, as well as good AACR2 principles wherever possible. The workflow that was put into place involves receipt of material from faculty and handoff to a team of surrogate authors contracted in India who scrub the documents and provide an initial pass of metadata. The metadata team I lead then provides a professional review. As the project proceeds we have been able to push more and more of what is traditionally a professional librarian's responsibility to individuals who don't even have the benefit of familiarity with the material! And to compound the difficulty, the authoring team adjusts its staffing levels to match publication cycles based on the institution's semester cycle. Every six months there are new faces to train. Naturally, the results have been outstanding. Collaborative metadata creation can work. And controlled vocabularies can be used to enable the sort of folksonomic information organization in "those cases where tagging is needed."
December 14, 2004
This announcement, following so closely on the heels of Google Scholar (via metafilter), might frighten some librarians, who imagine themselves being chased into obsolescence by computer scientists throwing processing power at the web's information organization challenge.
In fact, these efforts should serve notice that Google "gets" information access and retrieval, and that they view themselves as the library's ally, not competitor. All of their services value and build upon our information organization work. I'm going to have to visit my colleagues up the road and find out who's running this project.
November 17, 2004
Meatballwiki and Blogdex
Okay, last ASIST Annual Meeting entry, and guess what it's about.
Blogs, and the blogging bloggers who blog them.
That's right, it's "Beyond the Sandbox: Wikis and Blogs that get work done"
And I'm spent, got no more Infotainment in me.
Go check out:
<rant> <!-- Insert tirade on the self-referential, closed community of the so-called "world wide web" --> </rant>
November 16, 2004
Quick links and cool Infostuff
K-Blogs, Wirelessness and the Cult of Personality
Today's ASIST Annual Meeting entry should have appeared much earlier today. I attended the presentation "Blogs for Information Dissemination and Knowledge Management" happily prepared to do what Metametadata do in real time while experts discuss what it is metametadata do. I mean the metameta opportunities were OFF THE CHARTS!
But no reliable wireless connectivity in that particular conference room. ARE YOU FREAKIN' KIDDING ME! Everywhere else in the hotel but conference room Providence II?!!! The presenters couldn't even access the internet to show in real time their own blogging efforts. Without internet access, why even talk about internet technologies? It should be painfully obvious to everyone by now how fundamentally important the basic infrastructure of the internet is and how it should be a common trust supported and enjoyed by all. It shouldn't even be a question, it should just ... be ... there.
So now I have to talk about blogging for oneself vs. for others and for knowledge capture vs. infotainment without the benefit of a punditry soundtrack. I mean at least I'm attending "Diffusion of Knowledge in the Field of Digital Library Development: How is the Field Shaped by Visionaries, Engineers and Pragmatists?" In other words how to develop your own cult of personality, a topic of interest to all bloggers (including myself) who hope to reach a wider audience.
I want to be one of the few lucky ones who get read by everybody. I feel qualified to be an arbiter of "cool." And I'd love people to pay me to pontificate on the future of the information society. Basically, as all readers of this blog are again painfully aware, I'm totally ready for Guruhood. Now I just need to find some people to hang on my every word. I think I've already started to cultivate the appropriately eccentric personality traits. I've stockpiled my somethings to say and now I just need some people to listen to me.
Any takers? (If so, feel free to link to this website!)
WOW, did I use enough emphasis in this article, or what. Don't you just want to sign up for my new lecture series?
November 15, 2004
Meaning and Animal Compassion
Day 2 at the ASIST Annual Meeting is all about Meaning.
First, Sir Tim Berners-Lee, inventor of the world wide web and director of the World Wide Web Consortium, or W3C (located in the Stata Center at MIT), promises us the real world through the magic of the "semantic web." Whereas the protocols that make up the world wide web, TCP/IP, HTTP, html and others, represent syntactic agreement for creating and sharing hypertext, the semantic web represents a method for reaching semantic agreement about the content of our hypertextual communications.
The idea is that you define your terms within a specific domain. If you're creating learning objects in a courseware system and you're using the Learning Object Metadata and IMS Content Packaging schemas, you define what those schemas mean by a "Contributor" or "Manifest". That way, someone else in an institutional repository system which uses Dublin Core Metadata and the Metadata Encoding and Transmission Standard can programmatically and systematically map those to "Creator" and "StructMap".
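Mechanically, that mapping is just a crosswalk table. A minimal sketch, assuming a made-up record shape (the element names are drawn from the respective standards; the function and record format are my own invention):

```python
# Crosswalk from LOM / IMS-CP element names to their DC / METS
# equivalents. The element names come from the standards; the
# (schema, field) tuple record format is a made-up illustration.
CROSSWALK = {
    ("LOM", "Contributor"): ("DC", "Creator"),
    ("IMSCP", "Manifest"): ("METS", "StructMap"),
}

def translate(record):
    """Map one repository's field names onto another's,
    passing unmapped fields through unchanged."""
    out = {}
    for key, value in record.items():
        out[CROSSWALK.get(key, key)] = value
    return out

lom_record = {("LOM", "Contributor"): "S. Butterfield"}
print(translate(lom_record))  # {('DC', 'Creator'): 'S. Butterfield'}
```

The interesting part of the semantic web proposal is that nobody has to hand-maintain this table for every pair of schemas; the mappings are meant to be discovered from each domain's published definitions.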
This is accomplished without the creation of a grand unified Ontology (essentially one big set of definitions for everything that everybody uses). Instead everyone who adopts Semantic Web technologies warrants that they'll provide their domain specific definitions, and then programs can interpret these definitions and find equivalences.
The definition mechanism is the Uniform Resource Identifier: a glorified url of the form http://... It is important to remember that URIs aren't the things they define, even in the virtual world. They're still nothing more than names. Just like menu entries aren't food and you can't eat them (I'm sure people have tried).
For example, I could define myself via a vCard or even just a plain old vanilla html page where I've deposited my curriculum vitae or just some contact info. Then whenever I include myself in the metadata for any web content I create, I don't just write in my name, I point to the web page where I've defined myself. The mechanism for pointing is RDF, which is just a logical triple of the form Subject-Predicate-Object.
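A hedged sketch of what that pointing looks like, with triples modeled as plain tuples (the example.org URIs are invented placeholders; the dc:creator and foaf:name predicates are real vocabulary terms):

```python
# An RDF statement is a Subject-Predicate-Object triple. Instead of
# writing my name as a string into the photo's metadata, I point at
# the URI where I've defined myself. (example.org URIs are made up.)
triples = [
    # (subject, predicate, object)
    ("http://example.org/photos/42",
     "http://purl.org/dc/elements/1.1/creator",
     "http://example.org/people/keeper#me"),
    ("http://example.org/people/keeper#me",
     "http://xmlns.com/foaf/0.1/name",
     "The Keeper"),
]

def objects(subject, predicate):
    """Follow one hop: all objects asserted for a subject+predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Who created the photo? Not a name string -- a pointer to a definition.
creator = objects("http://example.org/photos/42",
                  "http://purl.org/dc/elements/1.1/creator")[0]
print(objects(creator, "http://xmlns.com/foaf/0.1/name"))  # ['The Keeper']
```

The second lookup is the whole trick: a program that has never heard of me can follow the creator pointer to my self-definition and pull out whatever it needs.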
The idea is that you build your collection of information to share with the world from a viewpoint in between the really big scale operations and the really small scale ones. You correlate the definitions provided under you, and you make your definitions with an eye to their inclusion in larger correlations. The same definition activity goes on at all levels. The semantic web is a fractal web, and it realizes the hermetic wisdom As Above, So Below.
You build your data system as a module of modules, incorporating data from different domains. You don't have to set definitions for each data store you tap into, and you expose your data to even larger systems via standards that allow the larger system to do likewise in aggregating you. No one is forced to one semantic standard, but each can understand the others with a minimum of human supervision.
I think there are two illusions in the RDF grand scheme.
The first is the Aristotelian Illusion that we can speak definitively, that any one label, even an RDF triple, is the thing being captured. Despite explicit recognition that URIs are not the things they represent, we still interact with triples as if they are. For example, is Gary Marchionini his entry in the NACO authority file maintained by OCLC, or the similar record on Amazon? Obviously he's neither, and we should be pointing to some human being in space-time (assuming we share his inertial system). Who has the right to define him? Is it possible to dice up the entire universe into non-competitive domains? I can't imagine it for identity definitions; what about conceptual ones? I can't get past the dread that good old social conflict will subvert all our technological advances. I would really like to understand better how RDF plans to live happily with a myriad of definitions that don't quite match exactly.
The second illusion is that RDF encourages interoperability without gobbling up smaller standards. It attempts to leave semantic standards alone, but it dictates a common syntax. The last time the W3C tried this was html, and how long did it take for the major browsers to implement three or four different flavors of html and the DOM? And how long did it take before the browsers started playing nice and made it easy for content providers to produce material that worked the same way across all systems? I would love to see the semantic web's game plan for avoiding the same sort of hijacking of the "standard."
The one thing that I do like is that Mr. Berners-Lee seems to be in concert with Ms. Hertz and a lot of others regarding the economies of scale on the internet. RDF and the semantic web propose a solution to bad economies of scale regarding standardization and interoperability of data systems on the web. Departmentalizing the job of defining the content on the web lets everyone do their share and gives more responsibility to the content provider to produce good metadata at the time that they create their content. In a post-modern information society it's no longer good enough to dump your data on the world, you've gotta tell us what it means to you. Sounds kind of silly, but it goes a long way towards improving communication and understanding.
Secondly, Gary Marchionini (not Gerry) presents again, this time in Rhode Island in a presentation called, "Why can't Johnny file?" in which librarians lament the poor information organization behavior of the average user of the new information technologies. It is Gary's contention that we do not need to teach him to file. That there are better things for humans to be doing than organizing our inboxes and hard drives.
His main argument repeats the thesis of his earlier presentation in Cambridge. Let the machines do what the machines do, let the humans do what the humans do.
I agree with him to a point. And that point is this: It's less important to teach someone to file a file so they can find it later than it is to teach them to ask better questions about the information content of the file.
That said I have two caveats and hopefully some clarification.
First, Gary assumes future advances in technology. I am far more wary of making any bold predictive statements about the advance of automatic indexing of multimedia resources or facial recognition software. I wouldn't design instructional methods or encourage information behaviour that relies on a technology that has yet to present itself.
Second, Gary equates filing and other organization activities with discovery. In fact, there are three reasons to organize information by filing or classifying and the ability to find it again later is the least of the three. The second most important of the three is to be able to communicate something about the information to others. This information organizing activity is all-pervasive in our society and extends from sharing ipod playlists to writing surveys of the scholarly literature in a particular field.
The thing that we communicate when we share our organizations isn't some grand unified Ontology that we discover. We can all admit that there isn't one master organizing scheme that we can teach to. Rather by sharing our personal organizational efforts with others we share our relationship to information, our worldview, and creating this relationship is the most important thing we do when we file. We need to teach filing and classifying as one part of a larger program to teach people how to develop good relationships with information, how to evaluate it properly, how to find more and better information, how to put it to work for you, and how to relate it to existing personal information stores. We need to teach how to recognize how information affects our thinking, and how our thinking colors the information we encounter. In this respect all filing and classifying can be viewed as interpretation, and we should teach such basic skills as organizing things into simple buckets to see what happens. This is how we teach kids to invent/discover meaning in the world they encounter.
Near the end of the question and answer session following the presentation a fellow stood up and made the most salient point of the day. The session was titled wrong. We shouldn't teach filing, we should teach organizing. How stuff is stored in a server or hard drive or repository or database isn't what we need to know. But we shouldn't throw the baby out with the bathwater. We do need to know how to intellectually and mentally store stuff away. I would offer that what the fellow meant was that we should teach people how to mark up content, to add metadata that describes for themselves and others the personal and social context in which they interact with information and which captures the relative meaning which they found in it.
Okay, Number 3
So the Omaha people would send their young persons at puberty on what white folks ended up calling vision quests. What they were really trying to do was gain the favor of Wakonda (think the Tao rather than Yahweh). The idea was that a sincere child who stood in place for four days with clay rubbed on their head would be more likely to influence Wakonda than a cynical adult. As might be expected, the kids started receiving visitations from different mediums, animate and inanimate. These visitations turned what was originally an event of supplication on behalf of the tribe into a personal appeal to the most important force in the universe for help throughout life, undertaken when a child was "old enough to know sorrow," in other words to engage in tribal relations as an adult.
This is the prayer that all Omaha know, that they all sing as they stand Non'zhinzhon, "as sleeping," for four days, trying to think the happiest thoughts they can for their future.
Wakonda thethu wahpathin atonhe
Wakonda! here, needy, he stands, and I am he
When the kids come back they can't say anything about the animal for four days, then they can go talk to someone who also was visited by their same medium (and they can talk to no one else, and no one else asks). The elder helps them to understand what happened and what it might mean for their relationship to the tribe.
It is said that the medium "has compassion on" the child, that this is its motive for impelling its particular form on the child for the rest of the child's life. It has decided to take an interest in the child's future because it pities the child.
I like the idea of going out to gain the favor of some animal or the other. I especially like the no talking about it for four days rule. It seems to really drive the experience home and allow the child to come to terms with it, to properly prepare before talking about it. Once the child has talked it over with an elder the child then goes out and finds the physical expression of that medium and fashions a trophy. If a hawk visited the child, the child would journey until he crossed paths with a hawk, would kill it and preserve the bird as a visible sign of his visitation, the child's most prized possession.
Kids would even become junkies for the process, going through it over and over again until they settled down with a family (if they became holy persons, today called roadmen, they continued the process their whole lives). It was generally frowned upon once one became a respectable member of adult society.
I read all of this from the Omaha Indian book by Alice Fletcher and Joseph La Flesche. The bible for all things Omaha. When I put on my white perspective, it seems funny that a whole tribe of Indians with active social and cultural traditions would pore over a government scholarly study for ideas on how to be more Indian. It seems a testament to the horror of any culture being systematically euthanized by another.
When I think Omaha about it, it seems just another story. There are two kinds of stories in the book. The stories of the elders that Alice and Joseph recorded, and Alice and Joseph's own stories.
I've heard most of the stories in the book from third party sources like my grandpa and grandma Charlie Stabler and Elizabeth Sansouci.
Going to pray to Wakonda was just one of the many rites that a child passed through towards becoming an adult and forming an appropriate relationship with the nation and the world. It, "required a voluntary effort by which, through the rite of fasting and prayer, the man came into direct and personal relations with the supernatural and realized within himself the forceful power of the union of the seen with the unseen," as Alice says.
Seems like the child is finding meaning in the universe and finding ways to share that meaning. Seems Alice and Joseph are also finding meaning and sharing it.
The totem is metadata, and it's shared; in fact, it's the only part of the experience that is shared with everybody. And it needs no one definition.
November 14, 2004
Whoever has the best library wins
Coming to you straight out of the American Society for Information Science and Technology Annual Meeting in lovely downtown Providence R.I. is everyone's favorite semi-structured data source, Mr. metametadata himself, the Keeper!
Thank you, thank you. I'm very delighted to be here. I just have two things to say.
A very bright person, J. C. Hertz, who makes a living applying game design principles to information systems for major products, services and learning systems (she's principal of a company called Joystick Nation, Inc.) just made some statements I took issue with in her plenary session.
These statements are (paraphrased):
any system that requires people to add their own metadata is doomed to fail
My discomfort with this statement is merely linguistic or epistemological. She spoke of "accidental metadata" and provided the classic example of an image caption as accidental rather than, I don't know, maybe intentional? I'm not sure I understand the "accidental" domain. Does she mean information that can be mined from the data object itself? Does she mean contextualizing information that people add as they position their data objects in some information space? Does she mean information that is the result of interaction with data objects? All of these are candidate activities for generating metadata byproducts. In fact the product of some of these activities really wants to be both primary data and metadata at the same time.
She most strongly made a distinction between people providing unasked for information as they positioned their data in an information space versus information provided as a result of the information space dropping a cataloging form in front of people and saying you have to fill this in before you can submit your data object. I think that in either case there is an expectation on the part of the owners/participators in an information space for a provider of a data object to also provide metadata. My response to Ms. Hertz is that there just isn't social momentum for making that expectation explicit. There still are strong unspoken social pressures in the information culture that keep most people from dumping a bunch of uncontextualized data into the pool. What metadata specialists do is find ways to work with data providers to take advantage of these pressures to formalize "accidental" metadata offerings to the providers' benefit.
For example, the Metadata Services Unit in the MIT Libraries provides metadata application services for MIT OpenCourseWare. The Unit creates Learning Object Metadata records for OCW resources. MIT OCW is in many respects an electronic publication. Like other publications they provide image captions. These captions contextualize the image and provide copyright information. There is no official publisher's rule that says you have to caption your images, but most people do. The caption is metadata that just plain makes the image more effective as an information resource. It's in the interest of MIT OCW to provide this caption, and they've mandated its existence in their style guide.

They are, like most data providers moving to electronic information delivery, unfamiliar with xml, semistructured data and metadata records. It is natural for them to write a caption into a web page right where they embed the image. This provides a benefit to viewers of the web page and also means there's some text about the image that gets indexed for the search engine. The caption in the html makes the image more easily discovered. It is the role of the Metadata Services Unit to find this metadata OCW has provided and get it into a Learning Object Metadata record.

Why duplicate the existence of this information in data and metadata? Because now, when you search for images and get a list of them back, you get the caption included in the list. It is much easier to capture the caption in a metadata record field and then display that field on the search results page. If you doubt that this is easier than grabbing the caption straight off the html page, go try to make sense of the keyword-in-context part of a google results list. The two metadata record fields presented in a search results page are General.Title and General.Description.
Because the Unit put the caption in the description field patrons have access to that information as they're making a decision whether or not to actually view the item. Once again, metadata (same info, different wrapper) aids information retrieval.
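To make the caption-into-record idea concrete, here's a minimal sketch in Python. The field names echo the LOM General.Title and General.Description fields mentioned above, but the function names, the record structure and the example URL are all hypothetical, not the actual OCW workflow:

```python
# Sketch: capture an image caption (already written into the html by the
# data provider) as fields of a metadata record, then reuse those fields
# on a search results page. Field names follow LOM's General category;
# everything else here is illustrative.

def make_lom_record(title, caption, url):
    """Wrap the same info the caption already carries in a record."""
    return {
        "General.Title": title,
        "General.Description": caption,  # the caption, lifted out of the html
        "Technical.Location": url,
    }

def results_line(record):
    """Display title + description straight from record fields -- far
    easier than re-scraping the caption out of html at search time."""
    return f"{record['General.Title']} -- {record['General.Description']}"

# A hypothetical OCW-style image resource:
rec = make_lom_record(
    title="Lecture 3: Beam diagrams",
    caption="Shear and moment diagrams for a simply supported beam.",
    url="http://ocw.example.edu/beam.gif",
)
print(results_line(rec))
```

Same information, different wrapper: the caption lives once in the page for readers and search engines, and once in the record for the results list.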
What this seems to point out to me is that there has always been a fuzzy line between data and metadata, and that the information explosion precipitated by hypertext and networking technologies has brought this confusion into high relief. The more stuff that gets into the information space the faster we seem to need to produce even more. This means that we start demanding more for our metadata buck; it must do more for less effort. When folks start drawing clear lines where none existed before it's often because they're drawing in the wagons and abandoning any effort that doesn't seem to realize a return. In this kind of environment there really should not be any metadata "accidents". It's the responsibility of information science professionals to make data providers aware of this and help them form a stronger, more cost-effective relationship with metadata rather than attempt to abandon it altogether.
libraries represent a well-organized, well-funded, well-mandated effort that doesn't scale well
This rounds out the arguments against human analysis and metadata production for the mass of data on the internet. Basically, data providers are lazy and data organizers can't afford to send enough librarians at the task. I think that both are misunderstandings of the current information economy. We've seen that data providers do provide metadata intentionally and there are methods to formalize this relationship, cementing the social motivation to provide metadata and realizing an economy.
We can also find the funds to catalog the internet, we've just been looking in the wrong place. Before networking and hypertext it was expensive to get a bunch of data into the world. For ease of distribution a bunch of ideas would aggregate (think gravity) into a book, an easy way to share a lot of information in one package.
It's no longer so expensive to send a bunch of ideas out there so they tend to avoid aggregation (think dark energy pushing infobits apart). The semantic density of the ASIST conference proceedings book seems galactic in scale next to this humble weblog post.
Now this same semantic density scale correlates with the scale of effort on the part of the author. It takes a book author much longer than the 30 minutes or so I've spent throwing off this ephemera. And the scale of reward is similar for the author. But this scale of effort does not hold for the librarian who might catalog either one on its way to a library or digital repository. It takes the same 15 to 20 minutes to catalog a book as it does one weblog entry.
This is what Ms. Hertz means. There's enough information in books and serials to justify an international collaborative effort amongst governments, institutions of higher learning and information professionals to provide a sophisticated level of information organization.
The information space that is the internet is screaming for this same level of organization, but the justification for a similar effort on the part of librarians seems absurd. However, if you look at the scale of effort and reward on the part of data providers, it seems apparent that the creators and aggregators of this content could and should shoulder this burden, even if it means hiring the librarians away from Universities and Public Libraries. Librarians need to focus on marketing their service offerings to the data providers wherever they are. This is our entry point, how we get to catalog the world wide web. It's likely to happen piecemeal at first. Especially considering the pluralization of data providers in the current information economy. Don't shy away from any project, no matter how small. We also need to find the right social leverage points to encourage good information organization behavior on the part of those who are producing the mass of content, again no matter how small.
Okay, things to say number 2.
Search Wars, Killer Apps and Why Librarians WILL Eventually Rule the World from the ASIST Special Interest Group: Information Architecture listserv (Thanks Boniface Lau)
It seems that the theme of this post is the old adage from Brown and Duguid (The Social Life of Information), "Social change must precede technological change".
I think that librarians get this and are working very hard to improve their ability to effect the needed social change. I think that the computer and internet technology communities are lagging behind. Their response to the need to improve the quality of metadata that accompanies electronic resources is not to leverage the social pressures that will prompt folks to provide this information, but to find some algorithmic, programmatic solution to pull it out of whatever a human sees fit to deposit. If you expect people to be lazy they will. If you expect more, they'll surprise you.
I don't think that you can process the amounts of data we're now seeing through human effort alone, and neither can you do it by machine alone. Even google employs an army of librarians to tweak the algorithms and improve the results of programmatic analysis. I just heard a great presentation at MIT by Gary Marchionini from UNC about automatic generation of metadata. He thinks that we should be educating a whole generation of young people to be more savvy at information retrieval and interaction, including algorithm maintenance and development. He thinks that we can very easily develop more sophisticated information resource users with higher expectations for information systems, and he promotes something he calls Human Computer Information Retrieval, the collaboration between man and machine to solve the problem of interaction with growing amounts of data.
Did you notice that the article reference is from a listserv sponsored by the group throwing the conference I'm attending?
And, yes I noticed that I'm blogging about information technologies like blogging from a conference about information technologies like blogging (I'm also blogging about the conference itself, from the conference itself) and that this is some sort of metametageekiness. As you should have realized long before now, I'm very comfortable with metameta events and environs, and geekiness.
October 07, 2004
Rawbrick's TV Audio Commentaries
As much as I say I'm not supposed to post to the old blog, I'm not showing any signs of stopping soon, or fixing the new weblog either. I know how rawbrick feels about laziness and projects on the periphery. Of course, I've only talked about all the cataloging I'd like to do to this point, she went out and did something about it.
I mean she spent a couple of days, bless her heart.
I had the good fortune to meet Rawbrick when she was in town for some Ex Libris conference. There should be more young librarians like her. Makes you really see the power of METADATA.
September 09, 2004
Metadata, It's The New Black
Now go read this article about the Flickr service that features prominently in the metafilter story. Be sure to notice the sweet use of metadata (keywords, essentially) to organize and discover photos. Tufte would be so proud of the visual display of the 150 most popular keywords. Classification from the ground up! And people are actually participating! And the keyword list isn't degenerating into an unwieldy mess because people are availing themselves of the opportunity at the point of applying each keyword to verify its conformance to the collectively developed vocabulary! All of you friends of mine out there posting your photos to various other services need to immediately switch to Flickr. Go! Do it Now!
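That display of the 150 most popular keywords is, at bottom, just a frequency count over everyone's tags, scaled for display. A rough sketch of the idea (sample tags and the sizing formula are mine, not Flickr's):

```python
from collections import Counter

# Each photo carries free-form keywords; the "most popular" display is
# just a frequency count over all of them. Sample data is hypothetical.
photo_tags = [
    ["cat", "cute", "macro"],
    ["sunset", "beach"],
    ["cat", "sleeping"],
    ["beach", "cat", "sand"],
]

counts = Counter(tag for tags in photo_tags for tag in tags)

# Take the N most popular and scale them, tag-cloud style: the more a
# keyword is used, the bigger it renders.
for tag, n in counts.most_common(3):
    size = 10 + 4 * n  # an arbitrary font-size scaling, for illustration
    print(f"{tag}: used {n} times, display at {size}px")
```

No vocabulary control anywhere in there; the "classification" emerges entirely from what people happen to type.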
Now check out the Metametadata flickr photo, with keywords.
Now read Rawbrick's take on collaborative metadata use and possible models for incorporating library training. Notice how the confirmation service Flickr has cooked into its metadata production process seems to try to programmatically occupy the middle ground Rawbrick defines. This is the stuff that gets me excited. I share Rawbrick's desire to see this space occupied by folks like us and love to see projects recognize the necessity. I think that this sort of thing is the true future of librarianship. Just a bunch of total geeks who can't get enough of information organization monitoring the infernet, making sure everyone is having a good time. Of course metafilter, flickr and other internet community services are totally beating librarians to the punch.
Yep, Librarians will one day rule the world. And the business community is starting to catch on (via Catalogablog). Of course they're making it overly complicated. I'm not sure an ur-element set is even feasible, much less warranted. This is almost sacrilege coming from a professionally trained librarian. We librarians love our standards and our master lists, but I don't think that this is the road to interoperability. I think that rdf and the semantic web will wind up not trying to make a master list of element definitions to which each individually developed metadata set will have to map. Rather semantic agreement will happen at a much lower level of communication.
September 08, 2004
They're starting to believe
That the internet and intellectual property can be free and the world won't come to an end. In fact, the internet wants to be free or I wouldn't be keeping this here weblog.
So when is this gonna make it to Cambridge?
We've got a good start going with these.
Too bad they're nearly all pay-to-play.
Go here to set up an account and look for the free ones.
With MIT completely wireless and Harvard due to get there any day now I naturally feel as if I should be able to connect anywhere, anytime without ever having to plug in.
More reading material on the subject.
July 14, 2004
The Semantic Web
My first post back from an unannounced and unforeseen leave (did anyone even notice?) is job related. Sorry about the long absence, I was kinda busy getting married and reaching another milestone on the OpenCourseWare project - 701 courses published and catalogued.
Anyway, here's an Introduction to the Semantic Web by my colleague Stefano Mazzocchi (thanks Catalogablog). Stefano and I are both working on MIT projects surrounding DSpace, OpenCourseWare, and the application of XML and the Semantic Web to the creation, storage, sharing and use of electronic educational resources.
Stefano's project, SIMILE, is an application of Semantic Web technologies to create a faceted browser and other sweet tools to search disparate and distributed repositories of content.
The two projects I work on, OpenCourseWare and CWSpace are applications of XML to the organization and storage of educational resources to facilitate sharing, retrieval and storage. I'm firmly in the XML camp and have expressed to him the viewpoints he presents from there. I'm still not sure I get the Semantic Web, but this is as good an introduction as you're going to get. Personally, I'm excited by N3
June 01, 2004
I miss typography
This is what I'm talking about:
I miss Rare Book Bibliography, and literary circles and handpresses. The Fu Crew needs to put together a compendium and give a reading. I know every single one of you either writes or draws.
And, yes I am extremely jealous of McSweeney's. Have you seen their latest comic book volume?
May 15, 2004
I've spent this early evening helping Chhavi Sachdev get her own weblog up and running. Check her out at http://www.chhavisachdev.com/blog/.
So far we've downloaded Wordpress and installed it on her server. We've just finished editing her stylesheet and templates.
At some point I'll fix the links in her flash files to fit her preferred site architecture.
She's got an awesome airport hub in her house, so she, her roommate (my colleague from MIT Libraries) and I are all working wirelessly on our laptops.
April 15, 2004
The Thinkin' Man's Rap (Nerd Hop)
Okay, Check it, two rap websites. One Old Skool, the other decidedly New.
First, The Foundation
The history of Rap's Old School as told by interviews one Jayquan conducted with the artists themselves.
Especially check the Kool Moe Dee interview.
I've currently got Kool Moe Dee, KRS-One and the Getto Boys on heavy rotation. Gotta go get me some of that Mellie Mel.
And, now for something completely different.
Warning: Smart people rapping, with like words and stuff, like written down and everything.
The mp3's okay, the lyrics are better than they sound.
Gotta check the comments.
You just gotta.
April 13, 2004
Metadata Rules! (And so do nerds!)
Just a little model of my favorite metadata scheme, the IEEE Standard for Learning Object Metadata, or 1484.12.1 (Click on picture)
Check out its majesty.
Man, so cool.
April 10, 2004
From the Uber-nerd files: Metacrap
Just call me The Vengeful Info-Ninja.
I'm sorry, you're just gonna have to read the article to find out why.
Yes, I understand that I'm devoting an entire entry to people and articles that don't share my belief in the value of what I do. Our conflict lies at the junction of the public web (very shallow) with the "deep web" (huge silos of data, datasets, journal articles, learning objects). These articles discuss the application of metadata to the public web, where Google has correctly chosen to leverage the semantic connections (link choices) of its users because the lowest common denominator rules. If porn comes up highest on a search, it's because that accurately reflects what the majority of people are searching for with that search string. I'd be willing to bet that Google has to significantly change its approach as it gains access to academic and similar repositories of content resulting from intellectual inquiry.
Other sites of interest:
Finally, The Semantic Web
March 26, 2004
I've been having trouble staying on top of all the good stuff I want to share with you. So I've decided that I need to spend some time talking about that trouble.
So I have no idea why I'm having so much trouble. True to form, I just decided to throw some organization at it.
There are three major messages that I'd like to share with you.
- Damn it feels good to be an uber nerd. A sort of brain dump in which I compare my information dissemination project with other knowledge organization programmes and provide a window into my ridiculously over-organized mind.
- Let's all wear blank badges. In which I try to convince everyone to join a routing list of my complete collection of Invisibles comics. Seriously, many of the major brain dump factoids originated as counter-cultural references in this comic.
- Giving you your very own Personal Digital Library. In which I detail my efforts through the website, implementation of cocoon and savvy use of xml and xslt to develop a personal digital asset manager. Wouldn't you all like a quick and easy way to electronically manage your music, movie and book collections?
Okay, so maybe the trouble is that these major themes are quite personal and important to me, and the required effort per syllable to publish them properly is discouraging. You know me, "if it's not worth doing right (read perfectly) it's not worth doing." So to counteract this inertia, I've tried to break the info down into its absolutely discrete units, then I'm just going to throw them up and let you sort through them yourself. The only categorical organization to this sub-publication of metametametadata will be through the titles of the entries. Otherwise the existing categories will be applied, but I'm not making categories for the three major themes.
Ok. Let's get started.
Here's the link to that discrete organization I was talking about. Bet you didn't even know the log was having so much difficulty as to warrant the organizational effort.
So how many times can I say so in a blog entry? It's kind of like how it's really, really annoying when public speakers say "um" all the time, but we totally need them to say it, so that we can catch up with their ideas, and they totally need to say it, to allow time for their brain to send coherent linguistic organizations of those ideas to their mouths. We really, really need it in conversation, but when you listen to a public speaker, or worse a radio dj, go um, um, um it drives you mad. Anyway it's so annoying. And what's up with the two different spellings of Oh-Kay? Something's wrong with my brain today. I need to, like, defragment.
March 18, 2004
Sorry, but my professional life is creeping into the log with this entry. The above image is a link to the best presentation of what I do professionally. I support programs like the Learning Activities/Learning Objects Repository at Wesleyan University. LOLA is an attempt to create a "refertory" of digital teaching resources. Refertory means that the project does not store the objects on its own servers. Rather, LOLA is just a catalog of metadata records with links to the actual objects at individual instructor or institutional websites. It's a clearing house where professors can find teaching material, share the way they make particular use of objects, and evaluate the object's effectiveness in the classroom.
For LOLA I provided an implementation of IEEE Learning Object Metadata (LOM). This meant picking a set of elements that completely and accurately describe an "Object," and another set that describes an "Assignment." Through conceptual models, data models, xml bindings and definition of best practices for metadata creation, I defined all possible relationships between "Objects" and "Assignments," and designed user interactions for creating a LOLA object (entering metadata), searching the catalog of LOLA objects, and providing feedback on LOLA objects in the form of assignments. Here's a picture of the Data model I provided to give you a better idea.
In the data model, the two big circles define the two types of metadata records, objects and assignments. The gray sausage looking things are individual metadata elements or fields in those records, and the green squares represent user interaction screens (webpages) where visitors enter catalog information or view that same information. So a LOLA object to the user is a collection of five webpages (including assignment) that provide useful information about the digital resource. The metadata elements in the System interaction screens are provided programmatically by the LOLA system behind-the-scenes; the user never sees them.
The cool thing (to me, anyway) about LOLA is that an object in the "refertory" is neither a physical thing, nor even a digital thing. It's just an instance of metadata (including a URL) that points to a thing!
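That idea, an "object" that is nothing but a metadata instance with a URL, is easy to sketch. The field names below are illustrative, not the actual LOLA element set:

```python
from dataclasses import dataclass, field

@dataclass
class RefertoryObject:
    """A LOLA-style 'object': no file is stored here, only metadata plus
    a pointer to the real resource hosted elsewhere. (Field names are
    hypothetical simplifications of a LOM implementation.)"""
    title: str
    description: str
    location: str  # URL of the actual object, on an instructor's or institution's site
    assignments: list = field(default_factory=list)  # feedback on classroom use

# A hypothetical entry in the catalog:
obj = RefertoryObject(
    title="Interactive beam-bending applet",
    description="Applet demonstrating shear and moment in beams.",
    location="http://instructor.example.edu/beams/applet.html",
)
obj.assignments.append("Used in week 3 of Statics as a homework warm-up.")

# All the catalog ever holds is this record pointing at the thing:
print(obj.title, "->", obj.location)
```

Delete the record and the resource still exists; delete the resource and the record points at nothing. The catalog traffics purely in metadata.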
Kinda explains why I call this weblog metametametadata.
February 28, 2004
More on the Music Industry
Links to a couple of articles from John Dvorak at PC Magazine.
I can provide a data point to verify Dvorak's conjecture regarding the real reason for the downturn in music sales.
During the Napster and Kazaa heyday I spent $40/month on music. I learned about all the music I bought from these file sharing programs. I then went and bought cds because I didn't like the quality of what was available, and I couldn't trust the information associated with the song. I used to find a song and then verify the album and artist info at Amazon (to check all the possible remixes). Sometimes I would then purchase the album from Amazon. When the RIAA started suing people, I stopped using all file sharing software and subsequently stopped buying music. I can't remember buying a single music album in the entire calendar year 2003.
Recently, the legal digital music sellers have actually produced a service worthy of interest. I use Itunes. Why? The interface. Apple just makes things that are easy and fun to use. But the selection at the music store sucks. Despite the limited availability of stuff to download I still find music that interests me. How? The shared music feature. People on a local area network can share the contents of their Itunes libraries with each other. This is exactly what Napster was, access to music other people burned off their purchased cds. Smart move, Itunes. Living in college dorms I learn of groups like Manu Chao and then find myself back in brick-and-mortar music stores, back on Amazon looking for more information and buying cds again. I've spent more in the last two months on music than I have in the last two years. All thanks to Itunes, thanks to the sweet interface and the reasonable allowance of music sharing.
For those of you who follow this log on a regular basis, I'd like to tie this explanation to two earlier discussions, Do You Live the Dijalog Lifestyle? and Grey Tuesday. The problem is one of information access, something with which those of us living the digital media lifestyle are intimately familiar. The big five labels of the RIAA have established channels for disseminating information via hype and know what's going to happen when they rev up their marketing machines. This creates a controlled situation in which they can make their money. They're slow and unwilling to recognize and take advantage of new mediums for the dissemination of information (which file sharing really is, it's not primarily about dissemination of product) because they lose control. They have a harder time hedging their bets that new talent will sell. Hence, the ridiculous response to DJ Dangermouse's Grey Album. If they were to take the money and run, they would be admitting that money could be made without the need for the bloated production, development and marketing budgets that allow artists to make $1 per cd that costs the consumer $18.
It's just a sad, sad state of affairs when Apple knows the future of the music industry better than the RIAA and builds a compelling, revenue generating product. At least the RIAA can be dragged kicking and screaming into the digital music world. Now if they would just stop the completely pointless exercise of suing people for helping them generate profit.
February 27, 2004
Do You Live the Dijalog Lifestyle?
By now it should come as no surprise to all 12 of you that regularly read these pages that I pay attention to what goes on at xml.com.
And now it seems xml.com is paying attention to what goes on at metametadata.net (with a little help)
Kendall Clark has begun a new column in which he seems to have drawn a perfect picture of me and wants to discuss it in depth.
I was even moved to comment on the column and steer folks back here to share in the metametameta craziness.
Reading the column I'm reminded of Yellow River's obsession with his ipod and the stacks and stacks of cds he's got all around his desk.
February 19, 2004
Effective Information Food
So conversation with The Kenj has identified a need for The Keeper to shorten his posts.
Apparently, The Keeper has unreal expectations for his readers concerning their information gathering behavior in an electronic environment.
In other words, y'all skim.
So, The Keeper'll be brief, or here's a quickie for ya.
From the digest of the ASIST SIGIA listserv, on websites (not this one!) that are the most effective from an Information Architecture perspective:
Apparently, the IA folks love the "usability" and "intuitiveness" of the "interaction."
And of course, as soon as someone recommended a British site for food and cooking recommendations, hilarity ensued.
Anyway, that's all The Keeper has to say about that. I bet The Kenj will even read this post and follow all the links.
February 11, 2004
Two posts in one day! It's like a record.
Okay so I totally redesigned the website. I know, you're all like there's a website? Isn't this it?
No, ladies and gentlemen, this very much is not it. This is just a part of it.
Metametadata is my website about things in my life. The weblog is quickly (and enjoyably) becoming a collaborative effort.
Since metametadata captures information about the record capturing information about the object, and I am that record, I think that this log should from now on be called metametametadata, to recognize that it is capturing information about the capture of information about the record about the object, follow? Knew you did.
This way the log can be our thing and still a part of my thing.
So that's its name.
At the website check out all four sections. At the contribute page notice that kenj and I are listed. If you too would like to be listed as a contributor to the site (and anyone who has made the least comment is already a contributor), then you need to think up a cool description for the date. For example: This date represents the last time x bathed.
You'll all get to be validators.
The only links on the site that shouldn't work are the booklists on the identifier page. Those are next up to bat and will be blogged about, then it's a note about making a derive, then a note about the metametadataschema (once I fill in that section).
Okay kids, knock yourselves out. And while you're at it check out the website I built for work.
February 05, 2004
Okay, here's a quickie.
Working with metadata in libraries, I follow the field of endeavor recently cobbled together under the ill-defined title Information Architecture (see links on the extended entry page). One of the ways I stay 'plugged in' to the IA community is by keeping track of the email list for the ASIST SIG-IA (American Society of Information Science and Technology) (Special Interest Group-Information Architecture).
The folks who regularly post to that list never cease to amuse me.
Anyway, the reason for this post is to send you to this website, http://www.mcmaster.com/, presented to the list as an answer to the following two questions,
- How many top-level categories on a web-site is best.
- How many levels of links are best.
Defying conventional wisdom, this site works!
Quoting the poster:
"as many as needed" is usually the answer. for example, my husband thinks
this site < http://www.mcmaster.com> is the best thing since sliced bread,
and points out how great the homepage is. Now that would rate as clutter to
most designers.. but not to the geeky happy free-wheeling scientist whose
dream of being able to buy one .05 millimeter ball bearing, or a sheet of 24
carat gold mesh can come true
Metal Cam-and-Groove Hose Couplings!
Straps & Hangers!
Lubricants & Penetrants!
Drivers & Knockout Punches!
Rotary Motion Vibrators!
Standard and Slug-Buster Round Knockout Punches and Sets!
Information Architecture links:
February 03, 2004
A Public Service Announcement
From the "I really don't have enough ways to waste my time" newsdesk.
--Some websites/blogs that I follow and an introduction to RSS.
RSS (Really Simple Syndication) is a metadata scheme that defines a standard way to format the contents of a website or weblog (using xml) so that any number of different programs can access and understand this content. When RSS is universally adopted by the publishers of websites and weblogs, a program that can read RSS will be able to find the content of any site/log, get that content and republish it however it wants.
Programs that read RSS (called syndicators or aggregators) come in the usual two flavors. They are either web-based or client-based. The difference is just in the display medium. Web-based programs transform the xml into html. Client programs have their own graphical user interface (GUI), into which they plug the xml content.
Most well-run blogs (including this one) offer an RSS version of their content. To see this blog's RSS, just click on the link "Syndicate this Site (XML)". If you look at the rdf for this weblog, the rest of my website will start to make more sense. I'm applying xml to format the display of information, in addition to its intended use to format the organization of information. You can make your own decision as to how successful I am.
Once you install a syndicator, or sign up online for a web-based aggregator, then all you have to do is travel once to all the blogs you want to follow (including this one) and gather the http addresses for the RSS xml (you can cut and paste this from the address line). Then whenever you run the syndicator it will read these files (which are automatically updated by each blog's content management system) and prepare a list of the current entries/articles.
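The core of what a syndicator does on each run is small. Here's a minimal sketch in Python using the standard library's xml parser on an RSS 2.0 style fragment (this log's own feed is rdf-flavored RSS, and the feed titles and URLs below are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 feed, like the file a blog's content management
# system regenerates automatically with each new entry.
feed = """<rss version="2.0">
  <channel>
    <title>metametametadata</title>
    <item><title>Effective Information Food</title>
          <link>http://example.net/2004/02/food</link></item>
    <item><title>Do You Live the Dijalog Lifestyle?</title>
          <link>http://example.net/2004/02/dijalog</link></item>
  </channel>
</rss>"""

# What an aggregator does: read the xml, pull out the current entries,
# and republish them however it wants -- here, a plain list.
root = ET.fromstring(feed)
entries = [(item.findtext("title"), item.findtext("link"))
           for item in root.iter("item")]
for title, link in entries:
    print(f"{title} -- {link}")
```

In a real aggregator the only extra steps are fetching each feed URL over http and remembering which entries you've already seen.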
In the future when we all have multiple personality disorder and our own reality tv show, everyone will be able to speak the Advanced Really-simple Syndicated Expression language (ARSE). Implants will enhance our ability to code our thoughts in ARSE, allowing us to publish our unvarnished (or varnished) mental states (with video!) straight to the infernet where we will all use our ability to speak ARSE to aggregate each other's thoughts by sorting through the noise on the radiation band specifically designated for mass telepathic communications.
We're talking everyone knowing what everyone else knows in real-time, the collective subconscious, learning through osmosis.
We're all gonna be syndicated, baby.
In the meantime, here are some aggregators that will have to do. I like Amphetadesk.
--In the newly established Metametadata tradition, the following are not links to the actual aggregators, but links to lists of links to the aggregators (you know, for context). Some of the links are other blogs that you may want to aggregate. Isn't aggregating fun!
Or, how about articles about aggregators?
And now for the blogs/feeds I aggregate.
- Gothamist -- http://www.gothamist.com/index.rdf
- Itunes 10 New Releases -- http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=10/rss.xml
- Lockergnome's Technology News -- http://www.lockergnome.com/lockergnome.xml
- Montague Institute Review -- http://www.montaguelab.com/digest.xml
- Broog: Alien Film Critic -- Despite (because of?) his vast alien intelligence Broog does not syndicate his criticisms, you must travel to his webpage to be brainwashed (and eaten?)
Librarians, these folks do it way better than I do at the moment.
- Rawbrick -- http://www.rawbrick.net/remaindered.xml
- Bookslut -- http://www.bookslut.com/blog/index.rdf
- The Shifted Librarian -- http://www.theshiftedlibrarian.com/rss.xml
And you absolutely have to go to this site, bookmark it, learn it, live it, love it, you know.
January 12, 2004
Okay, so the site is created using a big table, which isn't ideal. Ideally instead of faking the xml, each bit of content would just be an xml file, or better yet rdf. Then I'd just use an xslt to render the page in xhtml, but that's in phase two.
But that's enough about the site. If you want to see how a proper personal weblog/website is done by a professional librarian, check out rawbrick.
January 11, 2004
So I've redone the site, here's the link. It's still a work in progress, but already it's gimmicky, geeky and it breaks every rule of good user-oriented design (well maybe not every rule).
I hope at least, that you all have a chuckle at the use of xml tags as a display mechanism.