May 03, 2005
Tools in Personal Environments: A Taste of New Technologies
by Megan Fox.
I'm blogging from another library conference. This time it's the NEASIST Spring Program, Syndicate, Aggregate, Communicate: New Web Tools in Real Applications for Libraries, Companies and Regular Folk. Check out the Program Weblog.
My notes on the first presentation:
See the outline and bibliography here.
The Toys, er, Tools
Tools are getting cooler and are finally starting to converge. Phones are no longer just phones and PDAs are no longer just calendars. The new devices are starting to combine more than two functions (phone, messaging, mp3 player, web browsing, personal organizer). The words of the day are "handheld" and "mobile." This is what our young users have and how they want to access the library.
How are librarians using these tools?
Distribution of information (hours, circulation, news, events, policies via small format web pages, text messaging)
Some dissemination of library collections (ebooks, audio)
Reference (real-time messaging)
Tours, Tutorials and more!
Blogs
Blogs are on the scene. (Cover of BusinessWeek)
Everybody's blogging: librarians, vendors, OCLC.
Blogs are not just personal diaries.
In addition to use by individuals as a professional development tool, libraries as organizations are starting to use weblogs. Some are starting to call these weblogs "Enterprise blogs".
"Enterprise blogs" are used internally in the library for information distribution, project management, knowledge management, and content management.
Blogs are also being used by libraries to share information with their patrons (see tools section for list of information).
RSS
Really Simple Syndication or Rich Site Summary or RDF (Resource Description Framework) Site Summary.
RSS is a reformatting of website/weblog content in XML for easy syndication.
When used with an aggregator, RSS simplifies the process of scanning the morass of information resources on the world wide web. If supplemented by instant messaging for immediate communication, it may be able to replace the widespread (mis?)use of email.
Like blogs, first useful to librarians as a professional development tool.
Libraries are also using RSS to enhance services, again in the area of sharing important information with their patrons (see tools section).
RSS allows library patrons to personalize their information exchange with the library (selecting and customizing the information feeds to which they subscribe).
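To make the "reformatting in XML" point concrete, here is a minimal sketch of what an aggregator does with a feed. The feed itself, the library name, the URLs and the category names are all made up for illustration; only the element names (channel, item, title, link, category) come from RSS 2.0. The filter on category is one simple way a patron might personalize what they see.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for an imaginary library; every title and URL is invented.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://library.example.org/</link>
    <description>Hours, events, and new acquisitions</description>
    <item>
      <title>Extended hours during finals</title>
      <link>http://library.example.org/news/finals-hours</link>
      <category>hours</category>
    </item>
    <item>
      <title>New e-book collection available</title>
      <link>http://library.example.org/news/ebooks</link>
      <category>collections</category>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml, wanted_category=None):
    """Return (title, link) pairs, optionally limited to one category."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = []
    for item in channel.findall("item"):
        category = item.findtext("category")
        if wanted_category is None or category == wanted_category:
            items.append((item.findtext("title"), item.findtext("link")))
    return items


if __name__ == "__main__":
    # A patron who only cares about opening hours.
    for title, link in read_items(SAMPLE_FEED, wanted_category="hours"):
        print(title, "->", link)
```

A real aggregator would fetch the feed over HTTP on a schedule and remember which items it has already shown; this sketch only covers the parsing step.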
Podcasting
An application of RSS to publish audio. A podcast can be thought of as a blog with audio and RSS. Patrons can subscribe to the podcast and get updates when new audio files are available. The podcast reader will automatically download the audio files to the patron's mp3 player when they sync it with their computer. In this fashion the internet is being used to syndicate regular audio programs as an alternative to radio.
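A rough sketch of the "blog with audio and RSS" idea: in RSS 2.0 the audio file rides along in an enclosure element on each item, and the podcast reader picks out the enclosures it has not fetched yet. The feed, URLs and the new_audio helper below are assumptions for illustration, not any particular podcast client's behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical podcast feed; the talks and URLs are invented.
PODCAST_FEED = """<rss version="2.0">
  <channel>
    <title>Example Library Talks</title>
    <item>
      <title>Talk 1</title>
      <enclosure url="http://library.example.org/audio/talk1.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Talk 2</title>
      <enclosure url="http://library.example.org/audio/talk2.mp3"
                 length="2345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only announcement</title>
    </item>
  </channel>
</rss>"""


def new_audio(feed_xml, already_downloaded):
    """List enclosure URLs in feed order that have not been fetched yet."""
    urls = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # an ordinary blog post with no audio attached
        url = enclosure.get("url")
        if url not in already_downloaded:
            urls.append(url)
    return urls


if __name__ == "__main__":
    seen = {"http://library.example.org/audio/talk1.mp3"}
    print(new_audio(PODCAST_FEED, seen))
```

The actual download and the sync to the mp3 player are left out; the point is only that the subscription mechanism is plain RSS.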
Wiki
A collaborative tool for writing web sites and pages. Part of what is called the "Read/Write" Web. This means giving editorial control to multiple persons in a group. Wikis represent collaborative memory, and a great content and project management tool.
Instant Messaging (IM) and Text Messaging
The "Millennials" IM far more than they use email or any of the other technologies.
Libraries are starting to use IM to communicate with patrons (often for the first time). Many IM Reference desks have recently opened. Before Libraries implement this service they need to think through the policy questions. IM can be demanding and needs to supplement, not replace other librarian/patron interactions.
Toolbars/Deskbars
These are ridiculously useful and are part of a trend of integrating internet/web services with desktop applications other than the browser window. Why travel to google.com when you can very quickly engage the Google search engine with many fewer clicks?
Desktop Search
The Web Search vendors are also producing applications that will index your hard drive and improve your ability to search for files/applications (and bits of information in files/applications) on your own hard drive.
Bookmarklet
Bookmarks with a tiny bit of JavaScript to do more than just retrieve HTML pages. It might also search the page/database. Bookmarklets make our lives easier.
Social Bookmarking
New tools to allow you to store your bookmarks on a public server and catalog them with your own tags and share these bookmarks (and tags) with others.
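As a sketch of the data behind a social bookmarking service: each bookmark is a (user, URL, tags) record, and sharing amounts to indexing everyone's records by tag. The users, URLs and tags below are invented, and by_tag is a hypothetical helper, not any real service's API.

```python
from collections import defaultdict

# Hypothetical bookmarks: (user, url, tags chosen by that user).
BOOKMARKS = [
    ("alice", "http://example.org/rss-intro", {"rss", "syndication"}),
    ("bob", "http://example.org/rss-intro", {"rss", "toread"}),
    ("bob", "http://example.org/wiki-howto", {"wiki"}),
]


def by_tag(bookmarks):
    """Build a shared index: tag -> set of (user, url) pairs."""
    index = defaultdict(set)
    for user, url, tags in bookmarks:
        for tag in tags:
            index[tag].add((user, url))
    return index


if __name__ == "__main__":
    index = by_tag(BOOKMARKS)
    print(sorted(index["rss"]))     # both users tagged the same page "rss"
    print(sorted(index["toread"]))  # a purely personal tag
```

Note that nothing in the structure distinguishes a descriptive tag like "rss" from a personal reminder like "toread".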
Folksonomies/Tagging
Megan mentioned that Folksonomies put the power to organize information resources in the hands of the public who supposedly are most familiar with the content and also familiar with how they will search for the content. Is this true? Are patrons aware of their organization and search habits?
Megan mentioned that folks think that calling Folksonomies "Tagging" diminishes them. That there is something in Folksonomies that transcends untrained individuals adding keywords to their public information resources.
? Is the distinction between Folksonomies vs. Tagging real? Have Folksonomies ever really happened?
Metametadata's Comments
It seems like the early adoption of all these tools is undertaken by brave pioneers who employ them for professional development, but the real usefulness of these tools comes when libraries themselves employ them as new mechanisms for exchanging information with users.
It seems like the benefits to libraries that these new tools provide in common are:
- New avenues for disseminating important information to the patron
- New ways to communicate in real time with patrons
- New ways to collaboratively organize and produce information/collections within the library
April 26, 2005
NEASIST Spring Program Weblog
In the true spirit of metametadata, the producers present the following PSA:
The New England chapter of the American Society for Information Science and Technology (NEASIST) is preparing a spring forum on new social communication tools.
Syndicate, Aggregate, Communicate:
New Web Tools in Real Applications for Libraries, Companies and Regular Folk
Tuesday, 3 May 2005, 9am-4pm
Providence College, Providence, RI
http://www.asis.org/Chapters/neasis/pc/programs/20050503.html
The NEASIST program committee has prepared an impressive line up of speakers to discuss all the latest cool toys (wikis, flickr, im, podcasting, weblogs). In support of the event the committee has also prepared forays into many of the tools including a flickr account and a weblog on this very domain! We will be podcasting all the presentations at the metametadata.net/neasist weblog!
Please go visit http://www.metametadata.net/neasist to learn more about the program and the tools.
For those of you scoring at home, this is a weblog entry about a weblog about a program about weblogs!
April 08, 2005
Redefining the Role of Catalogers in the Age of the Semantic Web
Last presentation of the day, and probably the most interesting (by which we (who's we?) mean the least practical (meaningful?)).
Semantic Web
There's a new one coming through from the W3C, wait what the heck is the old one?
Machine-Understandable Data (as opposed to Machine-Readable Data)
Not machine storage, index and retrieval
But machines mimicking the operations of someone who understands what the data is.
Moving beyond the notions of word frequency in documents, link ranking and boolean queries.
Semantic Web depends on:
- XML - eXtensible Markup Language
- RDF - Resource Description Framework
- OWL - Web Ontology Language
XML
XML allows data elements to be encoded according to their semantic meaning (not just their physical format like html). XML separates meaning from format, content from design.
XML provides access to the deep web, providing web-accessible structure for data contained in databases.
XML allows web pages to be both dynamic and cataloged. XML is a switching language, a translation language.
XML means multiple, overlapping markup standards.
RDF
RDF is an XML-based model for making [Aristotelian] statements about resources.
RDF is subject-predicate-object triples
"Resource A has an attribute with the value B"
Subject = Resource A
Predicate = has an attribute with the value
Object = B
You'll notice that the above three identifiers are themselves triples (the predicate in this case is = (wait, this comment is a triple (you get the picture))).
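For anyone who prefers code to Aristotle, the subject-predicate-object idea can be sketched as a list of three-part tuples plus a pattern matcher. This is a toy, not an RDF library: the Triple class, the match function and every example.org URI below are assumptions made up for the illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs; note that a person gets a URI just like a document does.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/melville"
TITLE = "http://example.org/terms/title"
CREATOR = "http://example.org/terms/creator"
NAME = "http://example.org/terms/name"

TRIPLES = [
    Triple(BOOK, TITLE, "Moby-Dick"),
    Triple(BOOK, CREATOR, AUTHOR),
    Triple(AUTHOR, NAME, "Herman Melville"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Follow the arrows: book -> creator -> name.
    creator = match(TRIPLES, subject=BOOK, predicate=CREATOR)[0].obj
    print(match(TRIPLES, subject=creator, predicate=NAME)[0].obj)
```

Because the object of one triple can be the subject of another, chaining matches is what makes the collection behave like a web rather than a table.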
Assumptions behind RDF:
- RDF can be expressed in XML; it is not, however, dependent on XML [RDF just sucks in XML]
- Everything--not just websites--can be assigned a Uniform Resource Identifier (URI), not necessarily a Uniform Resource Locator (URL) (even people, ideas, emotions, values)
- Information organization works best when organized from the ground up rather than the top down.
- Information organization is a matter of making "statements" about resources that preserve context and make sense within the information system
[My problems with RDF
Can we make Aristotelian Statements "that preserve context and make sense?"
Is there an end to the recursion of triples?
Who controls the dictionaries?
Who creates/controls the semantic equivalencies?]
A Semantic Web is really a web of triples
And looks like a web when you display it.
Subjects and Objects are Entities
Predicates are arrows
Predicates are the elements?
OWL
Ontologies define the predicates/arrows/elements
In CS, an ontology is a machine-readable expression of a shared conceptual framework.
Defining meaningful entities and their relationships with each other.
Ontologies are usually expressed as a combination of a classification scheme and controlled vocabularies.
Ontologies are designed to link similar concepts in different namespaces.
Ontologies are designed to increase a search agent's ability to interpret data in a different domain.
Ontologies link namespaces via equivalencies.
Certain processes of inferential logic are possible upon properly prepared RDF encoded documents linked via Ontologies.
This logic leads to proof and trust.
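A very rough sketch of what "linking namespaces via equivalencies" buys a search agent. This is plain Python, not OWL: the prefixes (lib:, web:, shop:), the equivalence pairs and the records are all invented, and the only inference performed is treating equivalence as symmetric and transitive.

```python
from collections import defaultdict

# Hypothetical statements that two predicates mean the same thing.
EQUIVALENCES = [
    ("lib:author", "web:creator"),
    ("web:creator", "shop:writer"),
    ("lib:title", "web:name"),
]

# Hypothetical (subject, predicate, object) records from three domains.
RECORDS = [
    ("doc1", "lib:author", "Ranganathan"),
    ("doc2", "shop:writer", "Ranganathan"),
    ("doc3", "lib:title", "Ranganathan"),
]


def equivalence_class(term, pairs):
    """All terms reachable from `term` through equivalence statements."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)  # equivalence runs in both directions
        neighbours[b].add(a)
    seen = {term}
    todo = [term]
    while todo:
        current = todo.pop()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                todo.append(other)
    return seen


def search(records, predicate, value, pairs):
    """Find subjects whose predicate is equivalent to the one asked for."""
    wanted = equivalence_class(predicate, pairs)
    return [s for s, p, o in records if p in wanted and o == value]


if __name__ == "__main__":
    # Asking in the web: vocabulary finds records from lib: and shop: too.
    print(search(RECORDS, "web:creator", "Ranganathan", EQUIVALENCES))
```

Nobody ever stated that lib:author equals shop:writer; the agent infers it. Whether you trust that inference depends entirely on who wrote the equivalence pairs, which is the question raised above about who controls them.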
Why should libraries care about the Semantic Web?
How libraries currently use the WWW.
Libraries are using the web (most prominently via OAI PMH) as an irrigation system for swift transfer.
Detailed metadata within an otherwise closed domain are simplified and made available to other places. These other places also simplify their more detailed and otherwise closed domain specific metadata.
What's it like outside the library?
- The world is full of experts
- The world is full of enthusiasts
- Human culture is saturated with complex, nuanced, important relationships
What can libraries bring to the table?
Information resource description, organization and access is what we do (we ask the right questions that no one else thinks of)
Information evaluation (collection development, reference)
We do not realize what a rich, sophisticated body of theory on bibliographic relationships we have [Ranganathan]
A speculation on what bibliographic representation might look like in the Semantic Web (see slides)
Separate truly local information from non-local bibliographic information, author information, work [I assume in the FRBR sense] information and related works.
Cataloger Repositioning
We shift our cataloging role from taking the time to write out the limited amount of information we are able to write to finding reliable information and linking to it via RDF. We would create/control the semantic equivalencies, based on dictionaries we trust (we're good at finding trustworthy information). We would control the semantic web because we would take the time to build it. [Does this subvert the ground up approach? Is it feasible?]
Our mind is full of ideas and these ideas depend on documents. The intensity of our ideas is dependent upon repeatedly encountering these ideas in documents. Things accrue meaning as you encounter them again, and again, and again.
What we do in libraries is make it possible to bump into texts again. We make it possible that you will encounter a document more than once, having seen something you like, you can put your hands on it again. Ideas can build upon each other in part because of the work in libraries. This role of the library will not go away in the near future.
What is the purpose of the catalog (Lubetzky)
To provide access to entities that accrue from the objects on our shelves
authors (and all their works)
works (and all their editions)
We can give to the Semantic Web (FRBR), an intellectual structure upon which we can build these meaningful semantic structures.
Work =>> Expression =>> Manifestation =>> Item
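The FRBR chain can be sketched as nested records, which also shows how "authors (and all their works)" and "works (and all their editions)" fall out of the structure. The class fields and the sample data are assumptions for illustration; FRBR itself defines many more attributes and relationships than this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # the physical copy on the shelf


@dataclass
class Manifestation:
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    author: str
    expressions: List[Expression] = field(default_factory=list)


def all_items(work):
    """Walk Work -> Expression -> Manifestation -> Item."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Hypothetical holdings: one work, two expressions, three copies.
WORK = Work(
    title="Example Novel",
    author="Example Author",
    expressions=[
        Expression("English", [
            Manifestation("Publisher A", 1990, [Item("0001"), Item("0002")]),
        ]),
        Expression("French", [
            Manifestation("Publisher B", 1995, [Item("0003")]),
        ]),
    ],
)

if __name__ == "__main__":
    print([item.barcode for item in all_items(WORK)])
```

Only the Item level is truly local; everything above it could be shared and linked to rather than retyped, which is the repositioning argued for above.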
Questions from the Audience
Can we really define the dictionaries/Do we really need to?
Two competing notions
Complexity and Emergence
Emergent Semantics
The Web came out of the scientific community enamored of the belief that:
If you want structure you have to be able to step back and let the interested parties duke it out.
Information Organization came out of the library community with a low tolerance of the mess that will accompany an emergent structure.
The $64,000 question is Can The Semantics Emerge?
Someone needs to do the study and find out: does meaningful, shared semantics emerge?
Complexity Theory from AI
Three situations posited.
1. A Change, nothing happens
2. A Change, all hell breaks loose
3. A Change, A Shift, the community continues to cohere in a new location
Maybe the library community is the part that holds everything together in times of change. Maybe we need to introduce enough of the concern for order and consistency to mix well with the excitement of the new. We hold the line and make the Semantic Web useful.
Managing the Metadata Morass: Applying Cataloging Skills Beyond the Traditional Catalog
Boston College Library Catalogers
BC Libraries is project oriented, even traditional cataloging (monographs and serials). They’re giving up being on top of the print, by shifting to looking at all of the initiatives, old and new, as projects and then prioritizing projects. Suddenly the electronic initiatives seem more important.
7 professional catalogers, 8 support staff catalogers
?What about training and adjusting staff to the new electronic projects? Baby steps, intense time consuming documentation for first project, then experience takes over by third project.
BC Libraries Technical Services is getting into consulting, working with other people’s data and providing a labor pool of copy catalogers.
Their philosophy is to maintain a seat at the table for all digital production/information organization projects, inside and outside the library. They don’t say no.
[Where are they finding the money to pay people to stay at the table and provide a labor pool? It’s good to see MIT Libraries are not the only ones interested in data outside the library’s collection, and their movement to consulting validates the Metadata Services experiment, but we’re moving away from production to consulting only. Where are they finding the money to maintain any kind of meaningful level of production?]
Most of their services and databases are off the shelf (only the Electronic Resources Management database is a do it yourself project)
ERMdb (homegrown, web-based, Perl, Linux, MySQL, for lib staff only, an info hub for other systems/databases generating reports)
In building the ERMdb they followed the DLF ERM best practices, but designed their own metadata schema and cross-walked it to the DLF ERM schema, also designed UI, functional requirements and workflow
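A cross-walk between a local schema and a standard one boils down to a field-to-field mapping plus a decision about what to do with fields that have no counterpart. The sketch below is an assumption-laden toy: the field names on both sides are invented and are not the actual ERMdb or DLF ERM element names.

```python
# Hypothetical mapping: local field name -> target schema field name.
CROSSWALK = {
    "resource_title": "title",
    "vendor": "provider_name",
    "license_end": "license_end_date",
}


def crosswalk(record, mapping):
    """Translate a record; return (converted, unmapped) so nothing is lost."""
    converted = {}
    unmapped = {}
    for field_name, value in record.items():
        if field_name in mapping:
            converted[mapping[field_name]] = value
        else:
            unmapped[field_name] = value  # flag for a cataloger to review
    return converted, unmapped


if __name__ == "__main__":
    local_record = {
        "resource_title": "Example Journal Package",
        "vendor": "Example Vendor",
        "license_end": "2006-06-30",
        "staff_note": "renewal under review",
    }
    converted, unmapped = crosswalk(local_record, CROSSWALK)
    print(converted)
    print(unmapped)
```

Returning the unmapped fields instead of silently dropping them matters: the leftovers are usually where the locally important information lives.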
DigiTool (Digital Asset Management System, an Ex Libris product, Oracle DB). It’s still in pilot mode, attempting to create a space for electronic objects that would be on the shelf if they were physical. Concerned with technical and preservation metadata. Concerned with sustainability over time. Technical and preservation metadata is intimately associated with descriptive metadata and the object itself. ?Whence the support and sustainability for the system?
We will continue to host the systems and servers they have now and be prepared to migrate the systems forward at the right time. They’ve committed to sustaining the collections, no matter how they make them available. So metadata mediated interoperability of description and storage is key.
Right now they’re focusing on cataloging, cleaning up records and ensuring the possibility of metadata transformation and exchange. DigiTool is incorporating EAD and METS (used for page turning). Of course one of the first projects was slides for professors.
The BC speaker confirms that JPEG 2000 will embed metadata in its header file.
Different data sources create wholly different kinds of metadata. They describe objects in different ways and have different relationships with their objects. Catalogers are required to reconcile different approaches in one system like DigiTool.
The catalogers have a good seat at the DigiTool table, ensuring robust metadata creation, interoperability amongst multiple metadata schemes and the incorporation of multiple object formats.
The Cataloging Group has a workforce (how, why?)
Digital Commons (Institutional Repository, A Turnkey System, Remote, Out of the Box, a ProQuest product). A pilot is up for ’04-’05. Populated with dissertations. Still defining rules. What metadata? The cataloger provided the name authority control question. They have a full time cataloger to provide metadata for all objects in IR. Is the eScholarship Manager a cataloger? Also provided knowledge of OPAC and the connection between records in OPAC and IR.
?Who runs the IR?
FAST: A Subject Headings Schema Designed for the 21st Century
Welcome to the OCLC presentation. [ed. note - Must find new introduction]
OCLC Research Objectives
- Data Mining
- Understanding Users
Research Areas of Interest to Me
- Knowledge Organization and Semantic Web
- Authority Control
- Metadata Schema Transformation
FAST is a collaborative research project between OCLC, LoC, ALA/SAC. FAST stands for Faceted Application of Subject Terminology.
Google is the 21st Century
Latent Semantic Indexing and Page Rank
Library Catalogs are the 19th Century
LCC, DDC, LCSH, Card Catalogs, Controlled Vocabularies
Library catalog technologies have been "digitized" and all are expensive compared to Google.
The question is which technology do we apply to get to the grey literature.
FAST is attempting to position itself as the answer, somewhere in between these two technologies and leveraging the positives from both.
FAST is a new approach to Subject Vocabularies
It's cheaper and easier to use than LCSH for electronic objects and compatible with a variety of the new metadata schemas
It is simple in structure and syntax
Usable by non-catalogers in non-library environments
Is designed for semantic interoperability
Is an adaptation of an existing schema
LCSH is the obvious choice for the usual reasons (established, supported), but sucks for cataloging electronic, web-delivered resources. It was designed for pre-coordinated card catalogs.
One of the problems is that LCSH has rules for creating new headings that aren't established. Of the more than 8.5 million distinct topical headings in WorldCat:
over 3 million of the headings are not established, but valid and used in multiple bibliographic records,
over 5 million are not established, but valid and used only once,
only 100K are established.
This is as ridiculous as it sounds. The rules are human derived and hard to explain to a computer. They allow for a proliferation of headings, which defeats the purpose of grouping similar items under common headings.
FAST contains far fewer established topical headings (about 400K).
FAST normalizes the form of heading for machine encoding.
?Are FAST headings defined via RDF triples?
FAST will use the MARC 21 authority format.
FAST has 8 facets:
- Topical
- Geographic
- Form (Genre)
- Chronological
- Personal Names (Names as subjects)
- Corporate Names (Names as subjects)
- Conference Uniform Titles
- Meetings Uniform Titles
FAST will keep general or "X" subject divisions
[ed. - Genre lists are always inadequate]
FAST is still hierarchical, but loses specificity.
FAST facilitates both pre- and post-coordination.
FAST is available as an OCLC SiteSearch database
The authority file is in beta
FAST enables cool faceted search and browsing technologies (not semantic web).
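To see why facets help machines, here is a toy sketch of pulling a pre-coordinated, LCSH-style string apart into separate facets that can then be combined at search time (post-coordination). This is emphatically not OCLC's algorithm: the lookup tables, the digit-based guess for chronological terms and the to_facets helper are all assumptions made up for the example, and only four of the eight facets are covered.

```python
import re

# Tiny hypothetical lookup tables; a real system would use authority files.
GEOGRAPHIC = {"France", "United States"}
FORM = {"Periodicals", "Maps"}


def to_facets(heading):
    """Split a 'A--B--C' heading and sort each part into a facet."""
    facets = {"topical": [], "geographic": [], "chronological": [], "form": []}
    for part in (p.strip() for p in heading.split("--")):
        if part in GEOGRAPHIC:
            facets["geographic"].append(part)
        elif part in FORM:
            facets["form"].append(part)
        elif re.search(r"\d{2}", part):
            facets["chronological"].append(part)  # crude: looks like a date
        else:
            facets["topical"].append(part)
    return facets


def matches(facets, **wanted):
    """Post-coordinate: every requested facet value must be present."""
    return all(value in facets[name] for name, value in wanted.items())


if __name__ == "__main__":
    record = to_facets("France--History--19th century--Periodicals")
    print(record)
    print(matches(record, geographic="France", form="Periodicals"))
```

Once the parts live in separate fields, a search for French periodicals no longer depends on anyone having built exactly that string in exactly that order.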
Conference Blogging
So I'm at another Information Science Conference, and guess what, I am making blogging.
Welcome to Metadata and Meaning: Creating the 21st Century Catalog brought to you all the way live for '05 by the New England Technical Services Librarians (NETSL). This is one of my favorite organizations, mostly because I like to pronounce their acronyms (thanks Walt!). Go ahead, try it ... NETSL.
Sharing a few notes with the world is the perfect way to fill the dead time between presentations while avoiding awkward conversations with your colleagues.
This is my second conference at which I'm blogging (the first was the '04 ASIST Annual Conference, #1, #2, #3, #4, #5).
I'm planning to volunteer to blog the upcoming Spring NEASIST program Syndicate, Aggregate, Communicate: New Web Tools in Real Applications for Libraries, Companies and Regular Folk. Conveniently this program will be on technologies like making blogging and the conference organizers are mad to apply as many of the technologies to the event itself.
What I'll be doing today.
1. Bringing you an important PSA
2. Updating a work blog (I can't link to it as it's on a testing server and not yet ready for primetime).
3. Keeping running diaries of presentations I attend, recording anything interesting I hear.
March 24, 2005
So Yahoo buys flickr with which to beat Google
Cnet article on Yahoo's acquisition of flickr
Metafilter thread on same topic
Cory Doctorow's notes from the Folksonomy forum at the 2005 Emergent Technology Conference (Etech) (Doctorow writes for boing boing)
Many 2 Many's notes on the Folksonomy forum at Etech
I went looking for the older thread on folksonomies at Many 2 Many where Clay Shirky and Lou Rosenfeld duke it out, but heck, every freaking single post at Many 2 Many these days covers this topic, so just go read them all. Especially look for the articles on "Emergent Semantics" and "One World, Two Maps".
I think the thing that annoys me the most about "folksonomies" is the confusion of lay people (and professionals!) around what users are actually doing when they "tag" images at flickr. They are not, as the Cnet article suggests, "classifying". The whole notion of the community of flickr users regulating a common semantic vocabulary à la Wikipedia is absurd. There is no more wisdom in the flickr crowd than in us collective google users and pagerank spammers, only a deep abiding interest in porn and other lowest common denominators. Even flickr's peer pressure technology threatens to get as ugly as that high school popularity contest you've been repressing all these years.
If you talk to the flickr guys, especially the President of Ludicorp Stewart Butterfield, they will describe for you a very different, very personal "labelling" effort, much closer to what you see at Google's gmail. People are not collaboratively creating and defining buckets into which they deposit their content so that the unknown, average user can later find images via the shared semantic vocabulary. When you spell out this misconception it really takes on its full ridiculousness. By that I mean, who but librarians actually takes an interest in the information retrieval needs of their fellow web denizens?
What people are mostly doing at flickr is attempting to attach labels to their content that will most closely match search terms they themselves will use later to find their own stuff. Butterfield describes it as (from Cory Doctorow's ETECH notes):
Stewart: It's not really categorization on Flickr -- it's about letting users remember. If I add the "Norma" tag to pix of my mom, whose name is Norma, I don't think it goes into the Norma category. The unfortunate thing about the term "folksonomy" is that it implies that it's a replacement for categorization. People categorize things by noting what they do or don't have: mammals have hair and live babies; does it have property a? then it's a whatever.
As Butterfield says, it's a very personal effort. What gets created out of this rather large, unappreciated effort is a large number of personal external memory systems. Individual users know that they'll forget the specifics of the image, but they'll want to be able to retrieve it, so they try to anticipate what words they might use later to search for the item. These words may have absolutely nothing to do with any kind of description of the content, format, or any of the other traditional cataloging facets. They are likely not to be the same words that anyone else would use. And yet, many of the pundits, experts and gurus are convinced that a "semantics" will "emerge." Y'know, just like it did at Google.
The average uninformed user has a snowball's chance in hell of conducting decent research in flickr, or at Google (this may be the only point on which Michael Gorman and I agree). The most they can do is enjoy serendipity. Folks like Clay Shirky who claim folksonomies to be the wave of the future make me almost as upset as Google itself, which is guilty of turning the world wide web into a high school popularity contest and convincing people to settle for "good enough." I can't decide who is worse, Google for convincing people to settle for the least because it takes zero effort, or Shirky for telling librarians to quit because the Google driven economics won't support their effort.
An example of "good enough" thinking, again from Butterfield (from Cory Doctorow's ETECH notes):
Stewart: The objective of tags shouldn't be to exhaustively cover the field -- we'll have a million photos of Tokyo, and if the TOKYO tag only gets you 400k of them, it's OK. You're only going to look at 20 of them anyway.
This is why academics have problems with Wikipedia. Their students, the ones using Wikipedia to conduct research, get the wrong idea about how you go about the Scientific Method, and how you conduct exhaustive research in any given subject area.
And I wish I could have gone to the Etech and SXSW conferences, seems like everybody in the technology biz is preoccupied by metadata these days. I wonder when the cool kids will find my weblog.
January 11, 2005
folksonomies vs. controlled vocabularies
folksonomies + controlled vocabularies
At Many 2 Many, a group weblog on social software to which Clay Shirky contributes, Shirky and Lou Rosenfeld seem to enjoy a disagreement on the merits of folksonomies vs. controlled vocabularies for organizing I'm not entirely sure what. I have to admit I agree with Lou Rosenfeld that we should be thinking about metadata ecologies and collaborative metadata creation between content creators and professional content organizers. But the proponents of folksonomies have been quicker to recognize the dynamic nature of metadata, to the deserved chagrin of professional catalogers who should well know that no catalog record or controlled vocabulary is ever "finished."
Shirky claims that hiring professional catalogers to maintain controlled vocabularies is "not extensible to the majority of cases where tagging is needed." I wish I knew what these cases were. I suspect he means cases in a domain in which Rosenfeld, Information Architects and Librarians would be wise to follow Shirky's lead, namely the World Wide Web.
It seems to me that a lot of information organization happens over networks and on web servers and has as its object 'born digital content', but yet isn't a part of the world wide web. My occupation is to find the born digital content that is the output of the fulfillment of a major educational institution's teaching and research missions and organize it to do five things:
Help visitors find their way through the digital collections
Organize complex and compound electronic objects into effective educational resources
Satisfy copyright interests and manage access to the collections
Preserve collections over time and across new technologies
Manage digital production workflows, interactions and responsibilities
It seems to me that no folksonomy could accomplish all this all by itself, but several controlled vocabularies do. I work with Learning Objects (whatever those are), lecture notes, problem sets, datasets, slides, movies, structured text documents, audio files, and more. This amounts to a not insignificant amount of material, some of which eventually might make it to the world wide web, some not.
I think that the world wide web is the point where Messrs. Shirky and Rosenfeld start talking past each other. Shirky is right, all the Vengeful Info-Ninjas in the world couldn't tackle the task of cataloging the web in a meaningful and cost effective way. But the world wide web isn't everything. Neither is the world wide web as big as we think. Mostly it's comprised of a lot of little things that aren't really related at all. Google has the right idea about searching the morass. The answer, it seems to me (and this is hard for a librarian to admit), is to throw computational cycles at it. As long as the web remains basically hypertext with no inherent organization this is probably the winning strategy. And the rise of search as the next great tech industry war seems to confirm Google's hunch. But even Google recognizes that there's a lot more to search than all the HTML in the world. In fact, HTML might not be the most important thing. I think that the creators at the bottom of the bottom up web have a lot of other kinds of information that is valuable to them and I think that the next great contest will be to see who can provide the tools to allow individuals and small confederations thereof the ability to search their own collections of information. Much of this isn't stored on web servers and accessible via hypertext protocols. Most of it is stored in closed repositories or even, heaven forbid, non-digital. A lot of it is in the hands of professional catalogers and other scholarly information vendors and there is a great amount of value to their aggregation and organization of these hidden (to the web, and for now Google) information sources. If there were no value to the dark web, if the web world weren't interested in vetted, scholarly material, then why Google Scholar? Why Google Print? Why all the hubbub about Wikipedia?
I do not mean to discredit the great and meaningful information store that is the world wide web. And I admit controlled vocabularies are not the way to go to organize it. Proponents of folksonomies are correct to recognize that everyone is going to have to do their fair share. It means those with good information organization skills have to help those without. This is my message to fearful librarians who think Google will replace them. Our job was never to be the gateholders. We must be mindful of our mission to teach others to have a healthy, ongoing relationship with information.
I've been cataloging electronic teaching material via IEEE's Learning Object Metadata and IMS Content Packaging for five years now. If there's one thing I can say about non-MARC, non-AACR2 metadata for digital objects, it's this: the metadata for any one object is never static, never DONE! No matter how many taxonomies you throw at it.
We've got a project to take a static web publication of every classroom teaching resource that can be digitized and package it in XML for delivery to an institutional repository and for migration back to the learning management system from whence it came. Once we've got IMS Content Packages we hope to be able to more easily ship the objects around to the multitude of Information warehouses on campus that were formerly silos with little or no communication between them. We started with the idea that we might be able to identify learning objects and push them to our patrons. We ended up realizing that we couldn't decide for our patrons, in their ofttimes unique contexts, what is a learning object. Instead we're focusing on allowing the descriptive and structural metadata for each object to change and grow over time. We're working to discover how to enable whatever course content lifecycle our end-users can dream up. This means letting (and asking, which is so much harder) our end-users provide the contextual metadata and capturing it in a fashion aware of that context and its applicability, or lack thereof, in other contexts. If one system organizes courses by session, while another deposits all resources into buckets by kind, how can we deliver the content to a patron that allows him to choose one organization over the other or to start with one and smoothly move the data to another? These are the questions we seek to answer, knowing that some of the metadata we write will go away, some remain and new information will accrete on the content as it lives its life in our electronic teaching systems. This new lifecycle would not be available to these digital teaching resources were it not for the organizational effort of professionals, neither is it possible without the cooperation of one's intended user group.
On one of my projects I developed an information model to capture metadata that would satisfy copyright requirements, populate the search index, and provide information necessary for preservation. Many controlled vocabularies were used as well as good AACR2 principles wherever possible. The workflow that was put into place involves receipt of material from faculty, and handoff to a team of surrogate authors contracted in India who scrub the documents and provide an initial pass of metadata. The metadata team I lead then provides a professional review. As the project proceeds we have been able to push more and more of what is traditionally a professional librarian's responsibility to individuals who don't even have the benefit of familiarity with the material! And to compound the difficulty, that authoring team adjusts its staffing levels to match publication cycles based on the institution's semester cycle. Every six months there are new faces to train. Naturally, the results have been outstanding. Collaborative metadata creation can work. And controlled vocabularies can be used to enable the sort of folksonomic information organization in "those cases where tagging is needed."
December 14, 2004
Project Ocean
I can't wait to find out this morning if John Battelle's got the real scoop on Google's latest collaboration with Academic Libraries/Repositories. (found via Boing Boing)
This announcement, following so closely on the heels of Google Scholar (via metafilter), might frighten some librarians, who imagine themselves being chased into obsolescence by computer scientists throwing processing power at the web's information organization challenge.
In fact, these efforts should serve notice that Google "gets" information access and retrieval, and that they view themselves as the library's ally, not competitor. All of their services value and build upon our information organization work. I'm going to have to visit my colleagues up the road and find out who's running this project.
November 17, 2004
Meatballwiki and Blogdex
Okay, last ASIST Annual Meeting entry, and guess what it's about.
Blogs, and the blogging bloggers who blog them.
That's right, it's "Beyond the Sandbox: Wikis and Blogs that get work done"
And I'm spent, got no more Infotainment in me.
Go check out:
<rant> <!-- Insert tirade on the self-referential, closed community of the so-called "world wide web" --> </rant>
November 16, 2004
Quick links and cool Infostuff
Okay, I realized I should at least give you the opportunity to go on to learn more about ASIST and its Annual Meeting.
And I have a really cool poster to share that I helped my colleague William Reilly prepare for the SPARC workshop he'll be attending later this week.
So there.
K-Blogs, Wirelessness and the Cult of Personality
Today's ASIST Annual Meeting entry should have appeared much earlier today. I attended the presentation "Blogs for Information Dissemination and Knowledge Management" happily prepared to do what Metametadata do in real time while experts discuss what it is metametadata do. I mean the metameta opportunities were OFF THE CHARTS!
But no reliable wireless connectivity in that particular conference room. ARE YOU FREAKIN' KIDDING ME! Everywhere else in the hotel but conference room Providence II?!!! The presenters couldn't even access the internet to show in real time their own blogging efforts. Without internet access, why even talk about internet technologies? It should be painfully obvious to everyone by now how fundamentally important the basic infrastructure of the internet is and how it should be a common trust supported and enjoyed by all. It shouldn't even be a question, it should just ... be ... there.
So now I have to talk about blogging for oneself vs. for others and for knowledge capture vs. infotainment without the benefit of a punditry soundtrack. I mean at least I'm attending "Diffusion of Knowledge in the Field of Digital Library Development: How is the Field Shaped by Visionaries, Engineers and Pragmatists?" In other words how to develop your own cult of personality, a topic of interest to all bloggers (including myself) who hope to reach a wider audience.
I want to be one of the few lucky ones who get read by everybody. I feel qualified to be an arbiter of "cool." And I'd love people to pay me to pontificate on the future of the information society. Basically, as all readers of this blog are again painfully aware, I'm totally ready for Guruhood. Now I just need to find some people to hang on my every word. I think I've already started to cultivate the appropriately eccentric personality traits. I've stockpiled my somethings to say and now I just need some people to listen to me.
Any takers? (If so, feel free to link to this website!)
WOW, did I use enough emphasis in this article, or what. Don't you just want to sign up for my new lecture series?
November 15, 2004
Meaning and Animal Compassion
Day 2 at the ASIST Annual Meeting is all about Meaning.
First, Sir Tim Berners-Lee, inventor of the world wide web and director of the World Wide Web Consortium, or W3C (located in the Stata Center at MIT), promises us the real world through the magic of the "semantic web." Whereas the protocols that make up the world wide web, TCP/IP, HTTP, html and others, represent syntactic agreement for creating and sharing hypertext, the semantic web represents a method for reaching semantic agreement about the content of our hypertextual communications.
The idea is that you define your terms within a specific domain. If you're creating learning objects in a courseware system and you're using the Learning Object Metadata and IMS Content Packaging system you define what those schemas mean by a "Contributor" or "Manifest". That way, someone else in an institutional repository system which uses Dublin Core Metadata and the Metadata Encoding and Transmission Standard can programmatically and systematically map those to "Creator" and "StructMap".
This is accomplished without the creation of a grand unified Ontology (essentially one big set of definitions for everything that everybody uses). Instead everyone who adopts Semantic Web technologies warrants that they'll provide their domain specific definitions and then programs can interpret these definitions and find equivalencies.
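To make the mapping idea a little more concrete, here is a minimal sketch of a crosswalk between the two vocabularies. Only the term pairs ("Contributor" to "Creator", "Manifest" to "StructMap") come from the example above; the dictionary layout, the function name and the sample record are my own assumptions, and a real Semantic Web tool would discover the equivalencies from published definitions rather than from a hand-written table.

```python
# Hypothetical crosswalk between two metadata vocabularies.
# The term pairs are from the example in the text; everything else
# (names, record layout) is assumed for illustration.
LOM_IMS_TO_DC_METS = {
    "Contributor": "Creator",   # Learning Object Metadata -> Dublin Core
    "Manifest": "StructMap",    # IMS Content Packaging -> METS
}


def translate_record(record, crosswalk):
    """Rename the keys of a record using the crosswalk.

    Terms with no known equivalent are passed through unchanged,
    since no grand unified ontology is assumed.
    """
    return {crosswalk.get(term, term): value for term, value in record.items()}


# A made-up courseware record.
courseware_record = {
    "Contributor": "A. Faculty Member",
    "Manifest": "imsmanifest.xml",
    "Title": "Lecture 1",
}

repository_record = translate_record(courseware_record, LOM_IMS_TO_DC_METS)
print(repository_record)
```

Notice that "Title" has no entry in the table and simply passes through. Nobody had to agree on a master list first; each side only had to say what its own terms mean.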
The definition mechanism is the Uniform Resource Identifier. A glorified url of the form http://... It is important to remember that uri's aren't the things they define, even in the virtual world. They're still nothing more than names. Just like menu entries aren't food and you can't eat them (I'm sure people have tried).
For example, I could define myself via a vCard or even just a plain old vanilla html page where I've deposited my curriculum vitae or just some contact info. Then whenever I include myself in the metadata for any web content I create, I don't just write in my name, I point to the web page where I've defined myself. The mechanism for pointing is RDF, which is just a logical triple of the form Subject-Predicate-Object.
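Here is a rough sketch of what such a triple looks like if you write it down as plain data. The two example.org addresses are made up for illustration, and so are the Triple class and the lookup function; the Dublin Core "creator" URI is the only real identifier in it. Real RDF has its own serializations and toolkits, so take this as the shape of the idea and nothing more.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Subject-Predicate-Object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs standing in for "the page where I've defined myself"
# and "a piece of web content I created".
ME = "http://example.org/people/keeper"
POST = "http://example.org/weblog/semantic-web-notes"

# Dublin Core's "creator" element, identified by its URI.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples = [
    # Instead of writing my name into the metadata, point at my URI.
    Triple(POST, DC_CREATOR, ME),
]


def objects_of(subject, predicate, store):
    """Return every object asserted for a subject/predicate pair."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects_of(POST, DC_CREATOR, triples))
```

The creator of the post comes back as a URI, not as a name. It is still only a name for me, which is exactly the menu-entry caveat above.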
The idea is that you build your collection of information to share with the world from the viewpoint of in between really big scale operations and really small scale ones. You correlate the definitions provided under you and you make your definitions with an eye to their inclusion in larger correlations. The same definition activity goes on at all levels. The semantic web is a fractal web and it realizes the hermetic wisdom As Above, So Below.
You build your data system as a module of modules, incorporating data from different domains. You don't have to set definitions for each data store you tap into and you expose your data to even larger systems via standards that allow the larger system to do likewise in aggregating you. No one is forced to one semantic standard, but each can understand each other with a minimum of human supervision.
I think there are two illusions in the RDF grand scheme.
The first is the Aristotelian Illusion that we can speak definitively that any one label, even an rdf triple, is the thing being captured. Despite explicit recognition that uri's are not the things they represent we still interact with triples as if they are. For example, is Gary Marchionini his entry in the NACO authority file maintained by OCLC or the similar record on Amazon? Obviously he's neither, and we should be pointing to some human being in space-time (assuming we share his inertial system). Who has the right to define him? Is it possible to dice up the entire universe into non-competitive domains? I can't imagine it for identity definitions, what about conceptual? I can't get past the dread that good old social conflict will subvert all our technological advances. I would really like to understand better how RDF plans to live happily with a myriad of definitions that don't quite match exactly.
The second illusion is that RDF encourages interoperability without gobbling up smaller standards. It attempts to let semantic standards alone, but it dictates a common syntax. The last time the W3C tried this was html, and how long did it take for the major browsers to implement three or four different flavors of html and the DOM? And how long did it take before the browsers started playing nice and made it easy for content providers to produce material that worked the same way across all systems? I would love to see the semantic web's game plan for avoiding the same sort of hijacking of the "standard."
The one thing that I do like is that Mr. Berners-Lee seems to be in concert with Ms. Hertz and a lot of others regarding the economies of scale on the internet. RDF and the semantic web propose a solution to bad economies of scale regarding standardization and interoperability of data systems on the web. Departmentalizing the job of defining the content on the web lets everyone do their share and gives more responsibility to the content provider to produce good metadata at the time that they create their content. In a post-modern information society it's no longer good enough to dump your data on the world, you've gotta tell us what it means to you. Sounds kind of silly, but it goes a long way towards improving communication and understanding.
Secondly, Gary Marchionini (not Gerry) presents again, this time in Rhode Island in a presentation called, "Why can't Johnny file?" in which librarians lament the poor information organization behavior of the average user of the new information technologies. It is Gary's contention that we do not need to teach him to file. That there are better things for humans to be doing than organizing our inboxes and hard drives.
His main argument repeats the thesis of his earlier presentation in Cambridge. Let the machines do what the machines do, let the humans do what the humans do.
I agree with him to a point. And that point is this: It's less important to teach someone to file a file so they can find it later than it is to teach them to ask better questions about the information content of the file.
That said I have two caveats and hopefully some clarification.
First, Gary assumes future advances in technology. I am far more wary of making any bold predictive statements about the advance of automatic indexing of multimedia resources or facial recognition software. I wouldn't design instructional methods or encourage information behaviour that relies on a technology to present itself.
Second, Gary equates filing and other organization activities with discovery. In fact, there are three reasons to organize information by filing or classifying and the ability to find it again later is the least of the three. The second most important of the three is to be able to communicate something about the information to others. This information organizing activity is all-pervasive in our society and extends from sharing ipod playlists to writing surveys of the scholarly literature in a particular field.
The thing that we communicate when we share our organizations isn't some grand unified Ontology that we discover. We can all admit that there isn't one master organizing scheme that we can teach to. Rather by sharing our personal organizational efforts with others we share our relationship to information, our worldview, and creating this relationship is the most important thing we do when we file. We need to teach filing and classifying as one part of a larger program to teach people how to develop good relationships with information, how to evaluate it properly, how to find more and better information, how to put it to work for you, and how to relate it to existing personal information stores. We need to teach how to recognize how information affects our thinking, and how our thinking colors the information we encounter. In this respect all filing and classifying can be viewed as interpretation, and we should teach such basic skills as organizing things into simple buckets to see what happens. This is how we teach kids to invent/discover meaning in the world they encounter.
Near the end of the question and answer session following the presentation a fellow stood up and made the most salient point of the day. The session was titled wrong. We shouldn't teach filing, we should teach organizing. How stuff is stored in a server or hard drive or repository or database isn't what we need to know. But we shouldn't throw the baby out with the bathwater. We do need to know how to intellectually and mentally store stuff away. I would offer that what the fellow meant was that we should teach people how to mark up content, to add metadata that describes for themselves and others the personal and social context in which they interact with information and which captures the relative meaning which they found in it.
Okay, Number 3
So the Omaha people would send their young persons at puberty on what white folks ended up calling vision quests. What they were really trying to do was gain the favor of Wakonda (think the Tao rather than Yahweh). The idea was that a sincere child who stood in place for four days with clay rubbed on their head would be more likely to influence Wakonda than a cynical adult. As might be expected, the kids started receiving visitations from different mediums, animate and inanimate. These visitations turned what was originally an event of supplication on behalf of the tribe into a personal appeal to the most important force in the universe for help throughout life, undertaken when a child was "old enough to know sorrow," in other words to engage in tribal relations as an adult.
This is the prayer that all Omaha know, that they all sing as they stand Non'zhinzhon, "as sleeping," for four days, trying to think the happiest thoughts they can for their future.
Wakonda thethu wahpathin atonhe
Wakonda! here, needy, he stands, and I am he
When the kids come back they can't say anything about the animal for four days, then they can go talk to someone who also was visited by their same medium (and they can talk to no one else, and no one else asks). The elder helps them to understand what happened and what it might mean for their relationship to the tribe.
It is said that the medium "has compassion on" the child, that this is its motive for impelling its particular form on the child for the rest of the child's life. It has decided to take an interest in the child's future because it pities the child.
I like the idea of going out to gain the favor of some animal or the other. I especially like the no talking about it for four days rule. It seems to really drive the experience home and allow the child to come to terms with it, to properly prepare before talking about it. Once the child has talked it over with an elder the child then goes out and finds the physical expression of that medium and fashions a trophy. If a hawk visited the child, the child would journey until he crossed paths with a hawk, would kill it and preserve the bird as a visible sign of his visitation, the child's most prized possession.
Kids would even become junkies for the process, going through it over and over again until they settled down with a family (if they became holy persons, today called roadmen, they continued the process their whole lives). It was generally frowned upon once one became a respectable member of adult society.
I read all of this from the Omaha Indian book by Alice Fletcher and Joseph La Fleshe. The bible for all things Omaha. When I put on my white perspective, it seems funny that a whole tribe of Indians with active social and cultural traditions would pore over a government scholarly study for ideas on how to be more Indian. It seems a testament to the horror of any culture being systematically euthanized by another.
When I think Omaha about it, it seems just another story. There are two kinds of stories in the book. The stories of the elders that Alice and Joseph recorded, and Alice and Joseph's own stories.
I've heard most of the stories in the book from third party sources like my grandpa and grandma Charlie Stabler and Elizabeth Sansouci.
Going to pray to Wakonda was just one of the many rites that a child passed through towards becoming an adult and forming an appropriate relationship with the nation and the world. It, "required a voluntary effort by which, through the rite of fasting and prayer, the man came into direct and personal relations with the supernatural and realized within himself the forceful power of the union of the seen with the unseen," as Alice says.
Seems like the child is finding meaning in the universe and finding ways to share that meaning. Seems Alice and Joseph are also finding meaning and sharing it.
The totem is metadata and it's shared, in fact, it's the only part of the experience that is shared with everybody. And it needs no one definition.
November 14, 2004
Whoever has the best library wins
Coming to you straight out of the American Society for Information Science and Technology Annual Meeting in lovely downtown Providence R.I. is everyone's favorite semi-structured data source, Mr. metametadata himself, the Keeper!
Thank you, thank you. I'm very delighted to be here. I just have two things to say.
A very bright person, J. C. Hertz, who makes a living applying game design principles to information systems for major products, services and learning systems (she's principal of a company called Joystick Nation, Inc.) just made some statements I took issue with in her plenary session.
These statements are (paraphrased):
any system that requires people to add their own metadata is doomed to fail
My discomfort with this statement is merely linguistic or epistemological. She spoke of "accidental metadata" and provided the classic example of an image caption as accidental rather than, I don't know, maybe intentional? I'm not sure I understand the "accidental" domain. Does she mean information that can be mined from the data object itself? Does she mean contextualizing information that people add as they position their data objects in some information space? Does she mean information that is the result of interaction with data objects? All of these are candidate activities for generating metadata byproducts. In fact the product of some of these activities really wants to be both primary data and metadata at the same time.
She most strongly made a distinction between people providing unasked for information as they positioned their data in an information space versus information provided as a result of the information space dropping a cataloging form in front of people and saying you have to fill this in before you can submit your data object. I think that in either case there is an expectation on the part of the owners/participators in an information space for a provider of a data object to also provide metadata. My response to Ms. Hertz is that there just isn't social momentum for making that expectation explicit. There still are strong unspoken social pressures in the information culture that keep most people from dumping a bunch of uncontextualized data into the pool. What metadata specialists do is find ways to work with data providers to take advantage of these pressures to formalize "accidental" metadata offerings to the providers' benefit.
For example, the Metadata Services Unit in the MIT Libraries provides metadata application services for MIT OpenCourseWare. The Unit creates Learning Object Metadata records for OCW resources. MIT OCW is in many respects an electronic publication. Like other publications they provide image captions. These captions contextualize the image and provide copyright information. There is no official publisher's rule that says you have to caption your images, but most people do. The caption is metadata that just plain makes the image more effective as an information resource. It's in the interest of MIT OCW to provide this caption and they've mandated its existence in their style guide. They are, like most data providers moving to electronic information delivery, unfamiliar with xml, semistructured data and metadata records. It is natural for them to write a caption into a web page right where they embed the image. This provides a benefit to viewers of the web page and also means there's some text about the image that gets indexed for the search engine. The caption in the html makes the image more easily discovered. It is the role of the Metadata Services Unit to find this metadata OCW has provided and get it into a Learning Object Metadata record. Why duplicate the existence of this information in data and metadata? Because now, when you search for images and get a list of them back, you get the caption included in the list. It is much easier to capture the caption in a metadata record field and then display that field on the search results page. If you doubt that this is easier than grabbing the caption straight off the html page go try to make sense of the keyword in context part of a google results list. The two metadata record fields presented in a search results page are General.Title and General.Description.
Because the Unit put the caption in the description field patrons have access to that information as they're making a decision whether or not to actually view the item. Once again, metadata (same info, different wrapper) aids information retrieval.
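For the curious, here is a minimal sketch of the caption-harvesting step. It is not the Unit's actual tooling. The page markup (an img tag followed by a paragraph with class "caption"), the sample text, and the choice to use the image file name as the title are all assumptions I've made for illustration; only the two field names, General.Title and General.Description, come from the description above.

```python
from html.parser import HTMLParser


class CaptionFinder(HTMLParser):
    """Collect (image src, caption text) pairs from a web page.

    Assumes each caption is a <p class="caption"> that follows its <img>.
    Real pages would need rules matched to their own style guide.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._last_src = None
        self._in_caption = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self._last_src = attrs.get("src")
        elif tag == "p" and attrs.get("class") == "caption":
            self._in_caption = True
            self._text = []

    def handle_data(self, data):
        if self._in_caption:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_caption:
            self._in_caption = False
            # Collapse line breaks and runs of spaces into single spaces.
            caption = " ".join("".join(self._text).split())
            self.pairs.append((self._last_src, caption))


def to_lom_fields(src, caption):
    """Build the two fields shown on a search results page.

    Using the file name as the title is an assumption for this sketch.
    """
    return {
        "General.Title": (src or "").rsplit("/", 1)[-1],
        "General.Description": caption,
    }


# A made-up fragment of a course page.
page = """
<img src="images/lecture1_fig2.jpg">
<p class="caption">Figure 2. Free-body diagram of a pendulum.
   Image courtesy of the instructor.</p>
"""

finder = CaptionFinder()
finder.feed(page)
records = [to_lom_fields(src, caption) for src, caption in finder.pairs]
print(records)
```

Once the caption sits in its own field, displaying it next to a search hit is a lookup rather than a scraping job. That is the whole argument for keeping the same info in a different wrapper.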
What this seems to point out to me is that there has always been a fuzzy line between data and metadata, and that the information explosion precipitated by hypertext and networking technologies has brought this confusion into high relief. The more stuff that gets into the information space the faster we seem to need to produce even more. This means that we start demanding more for our metadata buck, it must do more for less effort. When folks start drawing clear lines where none existed before it's often because they're drawing in the wagons and abandoning any effort that doesn't seem to realize a return. In this kind of environment there really should not be any metadata "accidents". It's the responsibility of information science professionals to make data providers aware of this and help them form a stronger, more cost-effective relationship with metadata rather than attempt to abandon it altogether.
libraries represent a well-organized, well-funded, well-mandated effort that doesn't scale well
This rounds out the arguments against human analysis and metadata production for the mass of data on the internet. Basically, data providers are lazy and data organizers can't afford to throw enough librarians at the task. I think that both are misunderstandings of the current information economy. We've seen that data providers do provide metadata intentionally and there are methods to formalize this relationship, cementing the social motivation to provide metadata and realizing an economy.
We can also find the funds to catalog the internet, we've just been looking in the wrong place. Before networking and hypertext it was expensive to get a bunch of data into the world. For ease of distribution a bunch of ideas would aggregate (think gravity) into a book, an easy way to share a lot of information in one package.
It's no longer so expensive to send a bunch of ideas out there so they tend to avoid aggregation (think dark energy pushing infobits apart). The semantic density of the ASIST conference proceedings book seems galactic in scale next to this humble weblog post.
Now this same semantic density scale correlates with the scale of effort on the part of the author. It takes a book author much longer than the 30 minutes or so I've spent throwing off this ephemera. And the scale of reward is similar for the author. But this scale of effort does not hold for the librarian who might catalog either on their way to a library or digital repository. It takes the same 15 to 20 minutes to catalog a book as it does one weblog entry.
This is what Ms. Hertz means. There's enough information in books and serials to justify an international collaborative effort amongst governments, institutions of higher learning and information professionals to provide a sophisticated level of information organization.
The information space that is the internet is screaming for this same level of organization, but the justification for a similar effort on the part of librarians seems absurd. However, if you look at the scale of effort and reward on the part of data providers, it seems apparent that the creators and aggregators of this content could and should shoulder this burden, even if it means hiring the librarians away from Universities and Public Libraries. Librarians need to focus on marketing their service offerings to the data providers wherever they are. This is our entry point, how we get to catalog the world wide web. It's likely to happen piecemeal at first. Especially considering the pluralization of data providers in the current information economy. Don't shy away from any project, no matter how small. We also need to find the right social leverage points to encourage good information organization behavior on the part of those who are producing the mass of content, again no matter how small.
Okay, things to say number 2.
Search Wars, Killer Apps and Why Librarians WILL Eventually Rule the World from the ASIST Special Interest Group: Information Architecture listserv (Thanks Boniface Lau)
It seems that the theme of this post is the old adage from Brown and Duguid (The Social Life of Information), "Social change must precede technological change".
I think that librarians get this and are working very hard to improve their ability to effect the needed social change. I think that the computer and internet technology communities are lagging behind. Their response to the need to improve the quality of metadata that accompanies electronic resources is not to leverage the social pressures that will prompt folks to provide this information, but to find some algorithmic, programmatic solution to pull it out of whatever a human sees fit to deposit. If you expect people to be lazy they will. If you expect more, they'll surprise you.
I don't think that you can process the amounts of data we're now seeing through human effort alone, neither can you do it by machine alone. Even google employs an army of librarians to tweak the algorithms and improve the results of programmatic analysis. I just heard a great presentation at MIT by Gerry Marchionini from UNC Libraries about automatic generation of metadata. He thinks that we should be educating a whole generation of young people to be more savvy at information retrieval and interaction, including algorithm maintenance and development. He thinks that we can very easily develop more sophisticated information resource users with higher expectations for information systems and he promotes something he calls Human Computer Information Retrieval, the collaboration between man and machine to solve the problem of interaction with growing amounts of data.
Did you notice that the article reference is from a listserv sponsored by the group throwing the conference I'm attending?
And, yes I noticed that I'm blogging about information technologies like blogging from a conference about information technologies like blogging (I'm also blogging about the conference itself, from the conference itself) and that this is some sort of metametageekiness. As you should have realized long before now, I'm very comfortable with metameta events and environs, and geekiness.
October 07, 2004
Rawbrick's TV Audio Commentaries
As much as I say I'm not supposed to post to the old blog, I'm not showing any signs of stopping soon, or fixing the new weblog either. I know how rawbrick feels about laziness and projects on the periphery. Of course, I've only talked about all the cataloging I'd like to do to this point, she went out and did something about it.
I mean she spent a couple of days, bless her heart.
I had the good fortune to meet Rawbrick when she was in town for some Ex Libris conference. There should be more young librarians like her. Makes you really see the power of METADATA.
September 09, 2004
Metadata, It's The New Black
Okay, start with this post from metafilter about how moblogs are beating traditional news sources to the story.
Now go read this article about the Flickr service that features prominently in the metafilter story. Be sure to notice the sweet use of metadata (keywords, essentially) to organize and discover photos. Tufte would be so proud of the visual display of the 150 most popular keywords. Classification from the ground up! And people are actually participating! And the keyword list isn't degenerating into an unwieldy mess because people are availing themselves of the opportunity at the point of applying each keyword to verify its conformance to the collectively developed vocabulary! All of you friends of mine out there posting your photos to various other services need to immediately switch to Flickr. Go! Do it Now!
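If you want to see how little machinery the "most popular keywords" list needs, here is a toy sketch. It is emphatically not how Flickr does it; I have no idea what they run. The photo names, the tags, the 0.8 similarity cutoff and both function names are made up for illustration. The second function imitates the check-at-the-point-of-tagging idea by suggesting existing keywords that look like the one you just typed.

```python
from collections import Counter
import difflib

# Made-up photos and the keywords people applied to them.
photo_tags = {
    "photo1": ["boston", "library", "cat"],
    "photo2": ["library", "books"],
    "photo3": ["boston", "library"],
}

# Count how often each keyword is used across all photos.
tag_counts = Counter(tag for tags in photo_tags.values() for tag in tags)


def most_popular(counts, n=150):
    """Return the n most used keywords, most popular first."""
    return [tag for tag, _ in counts.most_common(n)]


def suggest_existing(new_tag, counts, cutoff=0.8):
    """Suggest keywords already in the shared vocabulary that closely
    resemble a newly typed one, so the tagger can choose to conform."""
    return difflib.get_close_matches(new_tag.lower(), list(counts), n=3, cutoff=cutoff)


print(most_popular(tag_counts, n=2))
print(suggest_existing("libary", tag_counts))
```

The vocabulary here is whatever people have already typed. No committee, no master list, just a nudge toward what your neighbors are calling things.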
Now check out the Metametadata flickr photo, with keywords.
Now read Rawbrick's take on collaborative metadata use and possible models for incorporating library training. Notice how the confirmation service Flickr has cooked into its metadata production process seems to try to programmatically occupy the middle ground Rawbrick defines. This is the stuff that gets me excited. I share Rawbrick's desire to see this space occupied by folks like us and love to see projects recognize the necessity. I think that this sort of thing is the true future of librarianship. Just a bunch of total geeks who can't get enough of information organization monitoring the infernet, making sure everyone is having a good time. Of course metafilter, Flickr and other internet community services are totally beating librarians to the punch.
Your reward for making it this far. The early chronicles of a young keeper. (thanks OK/Cancel)
Yep, Librarians will one day rule the world. And the business community is starting to catch on (via Catalogablog). Of course they're making it overly complicated. I'm not sure an ur-element set is even feasible, much less warranted. This is almost sacrilege coming from a professionally trained librarian. We librarians love our standards and our master lists, but I don't think that this is the road to interoperability. I think that rdf and the semantic web will wind up not trying to make a master list of element definitions to which each individually developed metadata set will have to map. Rather semantic agreement will happen at a much lower level of communication.
July 14, 2004
The Semantic Web
My first post back from an unannounced and unforeseen leave (did anyone even notice?) is job related. Sorry about the long absence, I was kinda busy getting married and reaching another milestone on the OpenCourseWare project - 701 courses published and catalogued.
Anyway, here's an Introduction to the Semantic Web by my colleague Stefano Mazzocchi (thanks Catalogablog). Stefano and I are both working on MIT projects surrounding DSpace, OpenCourseWare, and the application of XML and the Semantic Web to the creation, storage, sharing and use of electronic educational resources.
Stefano's project, SIMILE, is an application of Semantic Web technologies to create a faceted browser and other sweet tools to search disparate and distributed repositories of content.
The two projects I work on, OpenCourseWare and CWSpace, are applications of XML to the organization and storage of educational resources to facilitate sharing, retrieval and storage. I'm firmly in the XML camp and have expressed to him the viewpoints he presents from there. I'm still not sure I get the Semantic Web, but this is as good an introduction as you're going to get. Personally, I'm excited by N3.
May 11, 2004
Games for Infogeeks
Some sneaky librarian type got the idea that they could get average joe web user to help classify images. So they went about turning it into a game. Somehow, they managed to make indexing enjoyable! I've found the game to actually be pretty good practice for my job. Scary, eh?
And if you're ever stuck and don't know how to describe an image, check out this guy's Topical Word Lists (via boingboing). You know, just in case you needed to call someone an ailurophile.
And just because I know that all this indexing will make you hungry, here's how you can get your unix on and your grub on. Have a Pizza Party via a command line pizza orderator (via boingboing). No geek should be without it. I guess Domino's has figured out its market. After a long day of shell scripting you just gotta have some pie.
May 03, 2004
Why You Should Fall to Your Knees and Worship a Librarian
I just created a blogroll (check the menu bar on the right). FYI, a blogroll is a list of links to other blogs that I currently read through my RSS aggregator (bloglines).
While constructing the roll I came across this site, so I added it and thought I'd give it its own post. Enjoy.
The Librarian Avengers (via The Shifted Librarian)
Addendum: I've used Wikipedia to provide encyclopedia references for unfamiliar terms in this entry and am going to try to continue this practice in future entries. Wikipedia is cool (you can find out about it at its entry about itself). Especially check out its self-healing systems.
April 23, 2004
The MRML
Just in case you aren't as afraid of The Info-ninja as Michael Moore, an advance release of what we've currently got in development. You know, they've got four more detectives working on the case. They got us working in shifts.
--------------------------
A Guide to MRML
The Mind Reading Markup Language (MRML /mur'mul/) is a proprietary
extension of the HyperText Markup Language. This document, all MRML
tags, and any ideas you come up with while reading this information are
the exclusive property of the authors. This is an open specification
that will be expanded as mind control technology is refined.
MRML tags can be embedded into any regular HTML document. They are
completely invisible to all browsers. No one will ever know you are
using them.
NOTE: MRML is not case sensitive. <fraud> is equivalent to <Fraud> or
<frAUD>.
Basic Markup Tags
The following MRML tags are used to read the client's mind for certain
kinds of thoughts and emotions about the contained text.
<BRAINSCAN>
Brainscan performs a light scan of the client's thoughts which may
include perceptions of their current environment. The Brainscan tag is
an invaluable tool for establishing a user's identity as it is much more
reliable than checking REMOTE_HOST or USER_ID variables.
<THOUGHTSUCK>
Thoughtsuck performs a deeper scan of the client's thoughts which may
include details of significant events within the past 24 hours.
<DEEPFEARS>subject</DEEPFEARS>
Probes the client's mind for their fears about the contained subject.
<DEEPERFEARS>subject</DEEPERFEARS>
Probes the client's mind for their deeper, more repressed fears about
the contained subject.
<DEEPESTFEARS>subject</DEEPESTFEARS>
Probes the client's mind for their deepest, most repressed fears about
the contained subject. WARNING: This tag cannot be exported outside the
US borders.
<SEXTHOUGHTS>subject</SEXTHOUGHTS>
Probes the client's mind for sexual thoughts about the contained
subject. Use of the Sexthoughts tag will likely be deprecated with the
release of the Freud and Jung specifications.
Freud and Jung Tags
Specifications for Freudian id, ego, and superego tags and Jungian
symbolic tags are not currently available.
The HYPNOTIZE Tag
Current technology enables very primitive mind control using MRML tags.
The so-called Brainwashing tags are delimited by the special
<!--HYPNOTIZE></HYPNOTIZE--> pair. Suggestions within the Hypnotize area
are completely invisible to all clients, but it is extremely important
that the tags are placed correctly. You are liable for any mental
damages inflicted by improperly placed tags.
HYPNOTIZE Area Tags
Within the <!--HYPNOTIZE></HYPNOTIZE--> area the following tags may be
used:
<SUGGEST>text</SUGGEST>
Used for mild, easily acceptable suggestions. For stronger suggestions
use PROGRAM and BELIEVE. Some examples:
It is warm for this time of year.
You need to upgrade your computer.
Republican policies aren't all that bad.
<MEME>text</MEME>
Information that you want the client to pass on to friends and
correspondents.
Some popular memes:
Chain letters
Urban legends
Cool ascii graphics
<FORGET>text</FORGET>
Things you want the client to forget. It may be desirable to have the
client forget the URL of your MRML documents.
<PROGRAM>text</PROGRAM>
Programs the client with a strong post-hypnotic suggestion. There are a
few optional arguments. <PROGRAM TIMES=3 INTERVAL="1 hour" DELAY="2 days">
will trigger the suggestions three times at a one hour interval two days
hence. <PROGRAM INTERVAL="after every meal" FOREVER> could trigger the
suggestion brush your teeth for an indefinite time.
Within the <PROGRAM></PROGRAM> area the following tags may be used:
<BELIEVE>text</BELIEVE>
Explicit thoughts to be planted in the client's mind. Beware of
contradictory programming! Try to remove previous conceptions before
reprogramming.
To reprogram someone that thinks that Pepsi is better than Coke:
1.<BELIEVE>You have no opinions about the relationship between
Coke and Pepsi.</BELIEVE>
2.<BELIEVE>Coke is better than Pepsi.</BELIEVE>
3.<BELIEVE>You are thirsty.</BELIEVE>
<BUY HREF=url>
Encourages the client to buy products on the World Wide Web. The
optional ITEM=item argument may be used.
<PASSWORD=keyword or phrase>
A convenient way to access the client's mind for future programming
sessions. Should only be used with a secure client-server connection.
Clarification: The BLINK Tag
The <BLINK> tag used in many popular browsers is not a MRML specified
tag.
Running Your Own MRML Server
The software needed to run a MRML-oriented server is freely available to
all persons and institutions. To obtain a copy of the server software a
representative for you or your company must attend a special 3-day
training session where they will be given the MRML software package.
These training sessions are currently being held twice a month at an
undisclosed location; requests for private training sessions will be
considered.
April 13, 2004
Metadata Rules! (And so do nerds!)
Just a little model of my favorite metadata scheme, the IEEE Standard for Learning Object Metadata, or 1484.12.1 (Click on picture)
Check out its majesty.
Man, so cool.
April 10, 2004
From the Uber-nerd files: Metacrap
Just call me The Vengeful Info-Ninja.
I'm sorry, you're just gonna have to read the article to find out why.
The article is Metacrap, by Cory Doctorow, co-administrator of boingboing
Yes, I understand that I'm devoting an entire entry to people and articles that don't share my belief in the value of what I do. Our conflict lies at the junction of the public web (very shallow) with the "deep web" (huge silos of data, datasets, journal articles, learning objects). These articles discuss the application of metadata to the public web, where Google has correctly chosen to leverage the semantic connections (link choices) of its users because the lowest common denominator rules. If porn comes up highest on a search it's because that's an accurate reflection of what the majority of people are searching for with that search string. I'd be willing to bet that Google has to significantly change its approach as it gains access to academic and similar repositories of content resulting from intellectual inquiry.
Other sites of interest:
Rawbrick on the Semantic Web (One) (Two)
Caveat Lector on the Semantic Web (One) (Two) (Three)
Clay Shirky on The Semantic Web
Finally, The Semantic Web
March 26, 2004
Update!
I've been having trouble staying on top of all the good stuff I want to share with you. So I've decided that I need to spend some time talking about that trouble.
So I have no idea why I'm having so much trouble. True to form, I just decided to throw some organization at it.
There are three major messages that I'd like to share with you.
- Damn it feels good to be an uber nerd. A sort of brain dump in which I compare my information dissemination project with other knowledge organization programmes and provide a window into my ridiculously over-organized mind.
- Let's all wear blank badges. In which I try to convince everyone to join a routing list of my complete collection of Invisibles comics. Seriously, many of the major brain dump factoids originated as counter-cultural references in this comic.
- Giving you your very own Personal Digital Library. In which I detail my efforts through the website, implementation of cocoon and savvy use of xml and xslt to develop a personal digital asset manager. Wouldn't you all like a quick and easy way to electronically manage your music, movie and book collections?
Okay, so maybe the trouble is that these major themes are quite personal and important to me, and the required effort per syllable to publish them properly is discouraging. You know me, "if it's not worth doing right (read perfectly) it's not worth doing." So to counteract this inertia, I've tried to break the info down into its absolutely discrete units, then I'm just going to throw them up and let you sort through them yourself. The only categorical organization to this sub-publication of metametametadata will be through the titles of the entries. Otherwise the existing categories will be applied, but I'm not making categories for the three major themes.
Ok. Let's get started.
Here's the link to that discrete organization I was talking about. Bet you didn't even know the log was having so much difficulty as to warrant the organizational effort.
So how many times can I say so in a blog entry, it's kind of like how it's really, really annoying when public speakers say "um" all the time, but we totally need them to say it, so that we can catch up with their ideas, and they totally need to say it, to allow time for their brain to send coherent linguistic organizations of those ideas to their mouths. We really, really need it in conversation, but when you listen to a public speaker, or worse a radio dj, go um, um, um it drives you mad. Anyway it's so annoying. And what's up with the two different spellings of Oh-Kay? Something's wrong with my brain today. I need to like, defragment.
March 18, 2004
LOOOOLA, LO-LO-LO-LO-LOOOOLA
And, Learning Objects at Wesleyan.
Sorry, but my professional life is creeping into the log with this entry. The above image is a link to the best presentation of what I do professionally. I support programs like the Learning Activities/Learning Objects Repository at Wesleyan University. LOLA is an attempt to create a "refertory" of digital teaching resources. Refertory means that the project does not store the objects on its own servers. Rather, LOLA is just a catalog of metadata records with links to the actual objects at individual instructor or institutional websites. It's a clearing house where professors can find teaching material, share the way they make particular use of objects, and evaluate the object's effectiveness in the classroom.
For LOLA I provided an implementation of IEEE Learning Object Metadata (LOM). This meant picking a set of elements that completely and accurately describe an "Object," and another set that describes an "Assignment." Through conceptual models, data models, xml bindings and definition of best practices for metadata creation, I defined all possible relationships between "Objects" and "Assignments," and designed user interactions for creating a LOLA object (entering metadata), searching the catalog of LOLA objects, and providing feedback on LOLA objects in the form of assignments. Here's a picture of the Data model I provided to give you a better idea.
In the data model, the two big circles define the two types of metadata records, objects and assignments. The gray sausage looking things are individual metadata elements or fields in those records, and the green squares represent user interaction screens (webpages) where visitors enter catalog information or view that same information. So a LOLA object to the user is a collection of five webpages (including assignment) that provide useful information about the digital resource. The metadata elements in the System interaction screens are provided programmatically by the LOLA system behind-the-scenes; the user never sees them.
The cool thing (to me, anyway) about LOLA is that an object in the "refertory" is neither a physical thing, nor even a digital thing. It's just an instance of metadata (including a URL) that points to a thing!
Kinda explains why I call this weblog metametametadata.
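For the code-minded, the refertory idea can be sketched in a few lines of Python. This is only an illustration, not LOLA's actual implementation: the class names and fields (title, description, url, instructor, evaluation) are made up for the example and are not the LOM elements the project uses. The point it shows is that the catalog holds metadata and a link, never the object itself, and that assignments hang off the object record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assignment:
    """How one instructor used an object, plus their evaluation (hypothetical fields)."""
    instructor: str
    description: str
    evaluation: str


@dataclass
class LearningObjectRecord:
    """A catalog entry: metadata and a URL pointing at the real resource."""
    title: str
    description: str
    url: str  # the object lives at this address, not in the catalog
    assignments: List[Assignment] = field(default_factory=list)


class Refertory:
    """A catalog of records that refer to objects stored elsewhere."""

    def __init__(self) -> None:
        self._records: List[LearningObjectRecord] = []

    def add(self, record: LearningObjectRecord) -> None:
        self._records.append(record)

    def search(self, term: str) -> List[LearningObjectRecord]:
        """Case-insensitive match against title and description only."""
        term = term.lower()
        return [
            r for r in self._records
            if term in r.title.lower() or term in r.description.lower()
        ]
```

Searching returns records, and each record's url is what actually leads to the teaching material on an instructor's or institution's own site.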
February 28, 2004
More on the Music Industry
Links to a couple of articles from John Dvorak at PC Magazine.
Ode to Napster, Music's Last Hope
I can provide a data point to verify Dvorak's conjecture regarding the real reason for the downturn in music sales.
During the Napster and Kazaa heyday I spent $40/month on music. I learned about all the music I bought from these file sharing programs. I then went and bought cds because I didn't like the quality of what was available, and I couldn't trust the information associated with the song. I used to find a song and then verify the album and artist info at Amazon (to check all the possible remixes). Sometimes I would then purchase the album from Amazon. When the RIAA started suing people, I stopped using all file sharing software and subsequently stopped buying music. I can't remember buying a single music album in the entire calendar year 2003.
Recently, the legal digital music sellers have actually produced a service worthy of interest. I use Itunes. Why? The interface. Apple just makes things that are easy and fun to use. But the selection at the music store sucks. Despite the limited availability of stuff to download I still find music that interests me. How? The shared music feature. People on a local area network can share the contents of their Itunes libraries with each other. This is exactly what Napster was, access to music other people burned from their purchased cds. Smart move Itunes. Living in college dorms I learn of groups like Manu Chao and then find myself back in brick-and-mortar music stores, back on Amazon looking for more information and buying cds again. I've spent more in the last two months on music than I have in the last two years. All thanks to Itunes, thanks to the sweet interface and the reasonable allowance of music sharing.
For those of you who follow this log on a regular basis, I'd like to tie this explanation to two earlier discussions, Do You Live the Dijalog Lifestyle? and Grey Tuesday. The problem is one of information access, something with which those of us living the digital media lifestyle are intimately familiar. The big five labels of the RIAA have established channels for disseminating information via hype and know what's going to happen when they rev up their marketing machines. This creates a controlled situation in which they can make their money. They're slow and unwilling to recognize and take advantage of new mediums for the dissemination of information (which file sharing really is, it's not primarily about dissemination of product) because they lose control. They have a harder time hedging their bets that new talent will sell. Hence, the ridiculous response to DJ Dangermouse's Grey Album. If they were to take the money and run, they would be admitting that money could be made without the need for the bloated production, development and marketing budgets that allow artists to make $1 per cd that costs the consumer $18.
It's just a sad, sad state of affairs when Apple knows the future of the music industry better than the RIAA and builds a compelling, revenue generating product. At least the RIAA can be dragged kicking and screaming into the digital music world. Now if they would just stop the completely pointless exercise of suing people for helping them generate profit.
February 27, 2004
Do You Live the Dijalog Lifestyle?
By now it should come as no surprise to all 12 of you that regularly read these pages that I pay attention to what goes on at xml.com.
And now it seems xml.com is paying attention to what goes on at metametadata.net (with a little help)
Kendall Clark has begun a new column in which he seems to have drawn a perfect picture of me and wants to discuss it in depth.
I was even moved to comment on the column and steer folks back here to share in the metametameta craziness.
Reading the column I'm reminded of Yellow River's obsession with his ipod and the stacks and stacks of cds he's got all around his desk.
February 11, 2004
The Website
Two posts in one day! It's like a record.
Okay so I totally redesigned the website. I know, you're all like there's a website? Isn't this it?
No, ladies and gentlemen, this very much is not it. This is just a part of it.
Metametadata is my website about things in my life. The weblog is quickly (and enjoyably) becoming a collaborative effort.
Since metametadata captures information about the record capturing information about the object, and I am that record, I think that this log should from now on be called metametametadata, to recognize that it is capturing information about the capture of information about the record about the object, follow? Knew you did.
This way the log can be our thing and still a part of my thing.
So that's its name.
At the website check out all four sections. At the contribute page notice that kenj and I are listed. If you too would like to be listed as a contributor to the site (and anyone who has made the least comment is already a contributor), then you need to think up a cool description for the date. For example: This date represents the last time x bathed.
You'll all get to be validators.
Cool, eh?
The only links on the site that shouldn't work are the booklists on the identifier page. Those are next up to bat and will be blogged about, then it's a note about making a derive, then a note about the metametadataschema (once I fill in that section).
Okay kids, knock yourselves out. And while you're at it check out the website I built for work.
February 03, 2004
A Public Service Announcement
From the "I really don't have enough ways to waste my time" newsdesk.
--Some websites/blogs that I follow and an introduction to RSS.
RSS (Really Simple Syndication) is a metadata scheme that defines a standard way to format the contents of a website or weblog (using xml) so that any number of different programs can access and understand this content. When RSS is universally adopted by the publishers of websites and weblogs, a program that can read RSS will be able to find the content of any site/log, get that content and republish it however it wants.
Programs that read RSS (called syndicators or aggregators) come in the usual two flavors. They are either web-based or client-based. The difference is just in the display medium. Web-based programs transform the xml into html. Client programs have their own graphical-user-interface (GUI), into which they plug the xml content.
Most well-run blogs (including this one) offer an RSS version of their content. To see this blog's RSS, just click on the link "Syndicate this Site (XML)". If you look at the rdf for this weblog, the rest of my website will start to make more sense. I'm applying xml to format the display of information, in addition to its intended use to format the organization of information. You can make your own decision as to how successful I am.
Once you install a syndicator, or sign up online for a web-based aggregator, then all you have to do is travel once to all the blogs you want to follow (including this one) and gather the http addresses for the RSS xml (you can cut and paste this from the address line). Then whenever you run the syndicator it will read these files (which are automatically updated by each blog's content management system) and prepare a list of the current entries/articles.
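To make the aggregator idea concrete, here is a minimal sketch in Python using only the standard library. It is not how Amphetadesk or bloglines actually work. The feed addresses and contents below are invented, and the feeds are held as strings so the example runs without a network connection; a real aggregator would fetch each address first. It also assumes plain RSS 2.0 (rss/channel/item); RDF-flavored feeds use xml namespaces and would need extra handling.

```python
import xml.etree.ElementTree as ET

# Invented feeds standing in for the RSS xml a blog would publish.
SAMPLE_FEEDS = {
    "http://example.org/blog-a/rss.xml": (
        "<rss version='2.0'><channel><title>Blog A</title>"
        "<item><title>First post</title><link>http://example.org/blog-a/1</link></item>"
        "<item><title>Second post</title><link>http://example.org/blog-a/2</link></item>"
        "</channel></rss>"
    ),
    "http://example.org/blog-b/rss.xml": (
        "<rss version='2.0'><channel><title>Blog B</title>"
        "<item><title>Hello</title><link>http://example.org/blog-b/hello</link></item>"
        "</channel></rss>"
    ),
}


def read_feed(xml_text):
    """Return (channel title, [(item title, item link), ...]) for one RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="(untitled)")
    entries = [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]
    return feed_title, entries


def aggregate(feeds):
    """Read every subscribed feed and build one combined list of current entries."""
    listing = []
    for _address, xml_text in feeds.items():
        feed_title, entries = read_feed(xml_text)
        for title, link in entries:
            listing.append((feed_title, title, link))
    return listing
```

Calling aggregate(SAMPLE_FEEDS) gives one flat list of (blog, entry title, link) tuples, which is roughly what an aggregator shows you each time it runs.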
In the future when we all have multiple personality disorder and our own reality tv show, everyone will be able to speak the Advanced Really-simple Syndicated Expression language (ARSE). Implants will enhance our ability to code our thoughts in ARSE, allowing us to publish our unvarnished (or varnished) mental states (with video!) straight to the infernet where we will all use our ability to speak ARSE to aggregate each other's thoughts by sorting through the noise on the radiation band specifically designated for mass telepathic communications.
We're talking everyone knowing what everyone else knows in real-time, the collective subconscious, learning through osmosis.
We're all gonna be syndicated, baby.
In the meantime, here are some aggregators that will have to do. I like Amphetadesk.
--In the newly established Metametadata tradition, the following are not links to the actual aggregators, but links to lists of links to the aggregators (you know, for context). Some of the links are other blogs that you may want to aggregate. Isn't aggregating fun!
- http://www.hebig.org/blogs/archives/main/000877.php
- http://blogspace.com/rss/readers
- http://www.ourpla.net/cgi-bin/pikie.cgi?RssReaders
- http://rss.lockergnome.com/resources/
Or, how about articles about aggregators?
- http://www.wired.com/news/infostructure/0,1377,60053,00.html
- http://www.onlinemag.net/nov02/OnTheNet.htm
And now for the blogs/feeds I aggregate.
- Gothamist -- http://www.gothamist.com/index.rdf
- Itunes 10 New Releases -- http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=10/rss.xml
- Lockergnome's Technology News -- http://www.lockergnome.com/lockergnome.xml
- Montague Institute Review -- http://www.montaguelab.com/digest.xml
- Broog: Alien Film Critic -- Despite (because of?) his vast alien intelligence Broog does not syndicate his criticisms, you must travel to his webpage to be brainwashed (and eaten?)
Librarians, these folks do it way better than I do at the moment.
- Rawbrick -- http://www.rawbrick.net/remaindered.xml
- Bookslut -- http://www.bookslut.com/blog/index.rdf
- The Shifted Librarian -- http://www.theshiftedlibrarian.com/rss.xml
And you absolutely have to go to this site, bookmark it, learn it, live it, love it, you know.
January 12, 2004
Metametadata.net (cont.)
Okay, so the site is created using a big table, which isn't ideal. Ideally instead of faking the xml, each bit of content would just be an xml file, or better yet rdf. Then I'd just use an xslt to render the page in xhtml, but that's in phase two.
But that's enough about the site. If you want to see how a proper personal weblog/website is done by a professional librarian, check out rawbrick.
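For the curious, the phase-two idea (each bit of content kept as xml, then rendered to xhtml) can be roughed out in Python. Two loud caveats: this is a stand-in, not XSLT, since Python's standard library has no XSLT processor, and the entry/title/date/body element names are invented for the example rather than taken from the site. It just shows the separation between stored content and rendered page.

```python
import xml.etree.ElementTree as ET

# A hypothetical content file; the element names are assumptions for illustration.
ENTRY_XML = (
    "<entry>"
    "<title>Metametadata.net</title>"
    "<date>2004-01-11</date>"
    "<body>The site has been redone.</body>"
    "</entry>"
)


def render_entry(xml_text):
    """Turn one xml content entry into an xhtml fragment (a hand-rolled stand-in for an xslt)."""
    entry = ET.fromstring(xml_text)
    div = ET.Element("div", {"class": "entry"})
    ET.SubElement(div, "h2").text = entry.findtext("title", default="")
    ET.SubElement(div, "p", {"class": "date"}).text = entry.findtext("date", default="")
    ET.SubElement(div, "p").text = entry.findtext("body", default="")
    return ET.tostring(div, encoding="unicode")
```

With a real stylesheet the mapping would live in the xslt file instead of in render_entry, so the look of the page could change without touching either the content or the program.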
January 11, 2004
Metametadata.net
So I've redone the site, here's the link. It's still a work in progress, but already it's gimmicky, geeky and it breaks every rule of good user-oriented design (well maybe not every rule).
I hope at least, that you all have a chuckle at the use of xml tags as a display mechanism.
January 08, 2004
First entry
To state the painfully obvious, this is the first entry of this journal and a test of its functionality.
It is still very much a work in progress.
My goal in writing here is to record, admittedly for myself, interesting bits of data that I encounter. By interesting I mean those bits that seem to want to cohere into things called information. I intend for this space to supplement and organize the notes I keep in a Moleskine.
I call this log metametadata because I am a digital librarian and a metadata specialist. I provide data about data every waking minute of my life. Hence, the Moleskine, then www.metametadata.net. It seems appropriate to me that I at least attempt to make use of the latest telecommunications technology in my lifelong pursuit of that most elusive of objects, knowledge.
In other words I am trying to find out what the hell is really going on.
I hope that librarians get the joke in this log's title. There is a widely used metadata scheme for educational materials, appropriately called the Learning Object Metadata (LOM) Standard (IEEE standard 1484.12.1). This scheme consists of metadata elements in nine categories, the third of which is "Metametadata" and has the following explanation in the standard:
This category describes this metadata record itself (rather than the learning object that this record describes).
This category describes how the metadata instance can be identified, who created this metadata instance, how, when, and with what references.
NOTE:--This is not the information that describes the learning object itself.
