January 19, 2005
Why couldn't they just build a massive catapult?
Oh lord,
First there was this:
You'll just have to read this whole page of the thread
Then there was this:
And now we have more at this thread:
Here's the first one I found: You do not simply walk into Mordor, at Metafilter
January 11, 2005
folksonomies vs. controlled vocabularies
folksonomies + controlled vocabularies
At Many 2 Many, a group weblog on social software to which Clay Shirky contributes, Shirky and Lou Rosenfeld seem to enjoy a disagreement on the merits of folksonomies vs. controlled vocabularies for organizing I'm not entirely sure what. I have to admit I agree with Lou Rosenfeld that we should be thinking about metadata ecologies and collaborative metadata creation between content creators and professional content organizers. But the proponents of folksonomies have been quicker to recognize the dynamic nature of metadata, to the deserved chagrin of professional catalogers, who should well know that no catalog record or controlled vocabulary is ever "finished."
Shirky claims that hiring professional catalogers to maintain controlled vocabularies is "not extensible to the majority of cases where tagging is needed." I wish I knew what these cases were. I suspect he means cases in a domain in which Rosenfeld, Information Architects and Librarians would be wise to follow Shirky's lead, namely the World Wide Web.
It seems to me that a lot of information organization happens over networks and on web servers and has as its object 'born digital content', yet isn't a part of the world wide web. My occupation is to find the born digital content that is the output of a major educational institution's teaching and research missions and organize it to do five things:
Help visitors find their way through the digital collections
Organize complex and compound electronic objects into effective educational resources
Satisfy copyright interests and manage access to the collections
Preserve collections over time and across new technologies
Manage digital production workflows, interactions and responsibilities
It seems to me that no folksonomy could accomplish all this all by itself, but several controlled vocabularies do. I work with Learning Objects (whatever those are), lecture notes, problem sets, datasets, slides, movies, structured text documents, audio files, and more. This amounts to a not insignificant amount of material, some of which eventually might make it to the world wide web, some not.
I think that the world wide web is the point where Messrs. Shirky and Rosenfeld start talking past each other. Shirky is right: all the Vengeful Info-Ninjas in the world couldn't tackle the task of cataloging the web in a meaningful and cost-effective way. But the world wide web isn't everything. Neither is the world wide web as big as we think. Mostly it's composed of a lot of little things that aren't really related at all. Google has the right idea about searching the morass. The answer, it seems to me (and this is hard for a librarian to admit), is to throw computational cycles at it. As long as the web remains basically hypertext with no inherent organization, this is probably the winning strategy. And the rise of search as the next great tech industry war seems to confirm Google's hunch. But even Google recognizes that there's a lot more to search than all the html in the world. In fact, html might not be the most important thing.
I think that the creators at the bottom of the bottom-up web have a lot of other kinds of information that is valuable to them, and I think that the next great contest will be to see who can provide the tools that give individuals, and small confederations thereof, the ability to search their own collections of information. Much of this isn't stored on web servers and accessible via hypertext protocols. Most of it is stored in closed repositories or is even, heaven forbid, non-digital. A lot of it is in the hands of professional catalogers and other scholarly information vendors, and there is a great amount of value in their aggregation and organization of these hidden (to the web, and for now Google) information sources. If there were no value to the dark web, if the web world weren't interested in vetted, scholarly material, then why Google Scholar? Why Google Print? Why all the hubbub about Wikipedia?
I do not mean to discredit the great and meaningful information store that is the world wide web. And I admit controlled vocabularies are not the way to go to organize it. Proponents of folksonomies are correct to recognize that everyone is going to have to do their fair share. It means those with good information organization skills have to help those without. This is my message to fearful librarians who think Google will replace them. Our job was never to be the gatekeepers. We must be mindful of our mission to teach others to have a healthy, ongoing relationship with information.
I've been cataloging electronic teaching material via IEEE's Learning Object Metadata and IMS Content Packaging for five years now. If there's one thing I can say about non-MARC, non-AACR2 metadata for digital objects, it's this: the metadata for any one object is never static, never DONE! No matter how many taxonomies you throw at it.
We've got a project to take a static web publication of every classroom teaching resource that can be digitized and package it in XML for delivery to an institutional repository and for migration back to the learning management system whence it came. Once we've got IMS Content Packages we hope to be able to more easily ship the objects around to the multitude of information warehouses on campus that were formerly silos with little or no communication between them. We started with the idea that we might be able to identify learning objects and push them to our patrons. We ended up realizing that we couldn't decide for our patrons, in their ofttimes unique contexts, what is a learning object. Instead we're focusing on allowing the descriptive and structural metadata for each object to change and grow over time. We're working to discover how to enable whatever course content lifecycle our end-users can dream up. This means letting (and asking, which is so much harder) our end-users provide the contextual metadata, and capturing it in a fashion aware of that context and its applicability, or lack thereof, in other contexts. If one system organizes courses by session, while another deposits all resources into buckets by kind, how can we deliver the content to a patron in a way that allows him to choose one organization over the other, or to start with one and smoothly move the data to another? These are the questions we seek to answer, knowing that some of the metadata we write will go away, some will remain, and new information will accrete on the content as it lives its life in our electronic teaching systems. This new lifecycle would not be available to these digital teaching resources were it not for the organizational effort of professionals; nor is it possible without the cooperation of one's intended user group.
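To make the session-versus-kind question a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the Resource class, its field names, and the sample course are invented for illustration and are not our actual information model or anything defined by IMS Content Packaging. The only point is that if the contextual metadata (which session, what kind) travels with each object, either organization can be derived on demand rather than baked into the package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical teaching resource carrying its own contextual metadata."""
    title: str
    kind: str      # e.g. "lecture notes", "problem set"
    session: int   # the class meeting the resource was used in


def group_by(resources, key):
    """Group resource titles by one metadata field (named by `key`)."""
    groups = defaultdict(list)
    for resource in resources:
        groups[getattr(resource, key)].append(resource.title)
    return dict(groups)


# Invented sample data, for illustration only.
resources = [
    Resource("Week 1 notes", "lecture notes", 1),
    Resource("Problem set 1", "problem set", 1),
    Resource("Week 2 notes", "lecture notes", 2),
]

by_session = group_by(resources, "session")  # one system's view
by_kind = group_by(resources, "kind")        # another system's view
```

Neither view is the "real" one in this sketch; the patron picks a key and the same objects fall into different buckets. The hard part, which the code does not touch, is getting that contextual metadata written down in the first place.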
On one of my projects I developed an information model to capture metadata that would satisfy copyright requirements, populate the search index, and provide information necessary for preservation. Many controlled vocabularies were used, as well as good AACR2 principles wherever possible. The workflow that was put into place involves receipt of material from faculty, and handoff to a team of surrogate authors contracted in India who scrub the documents and provide an initial pass of metadata. The metadata team I lead then provides a professional review. As the project proceeds we have been able to push more and more of what is traditionally a professional librarian's responsibility to individuals who don't even have the benefit of familiarity with the material! And to compound the difficulty, that authoring team adjusts its staffing levels to match publication cycles based on the institution's semester cycle. Every six months there are new faces to train. Naturally, the results have been outstanding. Collaborative metadata creation can work. And controlled vocabularies can be used to enable the sort of folksonomic information organization in "those cases where tagging is needed."
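One way to picture a controlled vocabulary working alongside free tagging is the rough Python sketch below. It is an assumption-laden illustration, not a description of the workflow above: the vocabulary entries, the normalize and reconcile helpers, and the sample tags are all made up. Free-form tags that match a known variant are mapped to a preferred term, and anything unrecognized is set aside for a professional to review rather than thrown away.

```python
def normalize(tag):
    """Lowercase a tag and collapse runs of whitespace."""
    return " ".join(tag.lower().split())


# Hypothetical controlled vocabulary: known variant -> preferred term.
VOCAB = {
    "lecture notes": "Lecture notes",
    "notes": "Lecture notes",
    "problem set": "Problem sets",
    "homework": "Problem sets",
}


def reconcile(tags, vocab=VOCAB):
    """Split free tags into preferred terms and tags needing human review."""
    matched, review = [], []
    for tag in tags:
        term = vocab.get(normalize(tag))
        if term is None:
            review.append(tag)       # unknown tag: keep it for the cataloger
        elif term not in matched:
            matched.append(term)     # known variant: record the preferred term once
    return matched, review


matched, review = reconcile(["Homework", "  NOTES ", "exam tips", "problem  set"])
```

In this sketch the review list is where the collaboration happens: a tag that keeps turning up there is a candidate for the vocabulary, which is one small way a controlled vocabulary never gets "finished."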