April 2005 Archives


April 26, 2005

The Time Traveler Convention

Spread the word, not through space, but time.

The event of the forever is Saturday, May 7th, at MIT!

For more info: The Official Time Traveler Convention FAQ

Remember, you only need one time traveler convention.

Posted by MetaMetadata at 12:40 PM | Comments (0) | TrackBack

NEASIST Spring Program Weblog

In the true spirit of metametadata, the producers present the following PSA:

The New England chapter of the American Society of Information Science and Technology (NEASIST) is preparing a spring forum on new social communication tools.

Syndicate, Aggregate, Communicate:
New Web Tools in Real Applications for Libraries, Companies and Regular Folk
Tuesday, 3 May 2005, 9am-4pm
Providence College, Providence, RI

http://www.asis.org/Chapters/neasis/pc/programs/20050503.html

The NEASIST program committee has prepared an impressive lineup of speakers to discuss all the latest cool toys (wikis, flickr, im, podcasting, weblogs). In support of the event the committee has also prepared forays into many of the tools including a flickr account and a weblog on this very domain! We will be podcasting all the presentations at the metametadata.net/neasist weblog!

Please go visit http://www.metametadata.net/neasist to learn more about the program and the tools.

For those of you scoring at home, this is a weblog entry about a weblog about a program about weblogs!

Posted by MetaMetadata at 12:11 PM | Comments (0) | TrackBack

April 20, 2005

Nerdcore Hiphop

This video is nerdlicious.

DJ Format featuring Abdominal & D-Sisive

And the mefi post responsible for bringing this to our attention (lots more goodies inside).

Update: Even more NERDCORE (it's real hip hop, it's just smart)

Go here for more fun videos.

Especially look for "Vicious battle raps".

And for heaven's sake, check out MC Frontalot. Nobody fronts more than MC Frontalot.

Posted by MetaMetadata at 11:11 AM | Comments (3) | TrackBack

April 08, 2005

Redefining the Role of Catalogers in the Age of the Semantic Web

Last presentation of the day, and probably the most interesting (by which we (who's we?) mean the least practical (meaningful?)).

Slides available on web

Semantic Web

There's a new one coming through from the W3C, wait what the heck is the old one?

Machine-Understandable Data (as opposed to Machine-Readable Data)

Not machine storage, index and retrieval

But machines mimicking the operations of someone who understands what the data is.

Moving beyond the notions of word frequency in documents, link ranking and boolean queries.

Semantic Web depends on:

  • XML - eXtensible Markup Language
  • RDF - Resource Description Framework
  • OWL - Web Ontology Language


XML

XML allows data elements to be encoded according to their semantic meaning (not just their physical format like html). XML separates meaning from format, content from design.

XML provides access to the deep web, providing web-accessible structure for data contained in databases.

XML allows web pages to be both dynamic and cataloged. XML is a switching language, a translation language.

XML means multiple, overlapping markup standards.
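To make the "meaning, not format" point concrete, here is a minimal Python sketch using the standard library's XML parser. The element names (`book`, `title`, `author`) are my own assumptions for illustration and do not come from any particular schema or from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names are made up for this sketch.
record_xml = """
<book>
  <title>Prolegomena to Library Classification</title>
  <author>Ranganathan, S. R.</author>
</book>
"""

book = ET.fromstring(record_xml)

# Because the tags name what each value *is*, a program can ask for it directly.
title = book.findtext("title")
author = book.findtext("author")

# The same content can then be poured into any presentation format.
html = f"<p><em>{title}</em> by {author}</p>"
print(html)
```

The markup says nothing about italics or paragraphs; those are decided only at the last step, which is the separation of content from design described above.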

RDF

RDF is an XML-based model for making [Aristotelian] statements about resources.

RDF is subject-predicate-object triples

"Resource A has an attribute with the value B"

Subject = Resource A

Predicate = has an attribute with the value

Object = B

You'll notice that the above three identifiers are themselves triples (the predicate in this case is = (wait, this comment is a triple (you get the picture)))
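For the code-minded, the triple above can be sketched in a few lines of plain Python. This is only an illustration: the `Triple` type, the `objects_of` helper and the example.org URIs are assumptions of mine, not anything shown in the presentation, and a real project would more likely reach for an RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, invented for illustration only.
RESOURCE_A = "http://example.org/resource/A"
HAS_VALUE = "http://example.org/terms/hasAttributeValue"
CREATED_BY = "http://example.org/terms/createdBy"

statement = Triple(RESOURCE_A, HAS_VALUE, "B")

# A collection of such statements is already a tiny graph.
graph = {
    statement,
    Triple(RESOURCE_A, CREATED_BY, "http://example.org/person/1"),
}


def objects_of(graph, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


print(objects_of(graph, RESOURCE_A, HAS_VALUE))
```

Note that the object of the second statement is itself a URI, so it could be the subject of further statements. That is where the web comes from.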

Assumptions behind RDF:

  1. RDF can be expressed in XML; it is not, however, dependent on XML [RDF just sucks in XML]
  2. Everything--not just websites--can be assigned a Uniform Resource Identifier (URI), not necessarily a Uniform Resource Locator (URL) (even people, ideas, emotions, values)
  3. Information organization works best when organized from the ground up rather than the top down.
  4. Information organization is a matter of making "statements" about resources that preserve context and make sense within the information system

[My problems with RDF

Can we make Aristotelian Statements "that preserve context and make sense?"

Is there an end to the recursion of triples?

Who controls the dictionaries?

Who creates/controls the semantic equivalencies?]

A Semantic Web is really a web of triples

And looks like a web when you display it.

Subjects and Objects are Entities

Predicates are arrows

Predicates are the elements?

OWL

Ontologies define the predicates/arrows/elements

In CS, an ontology is a machine-readable expression of a shared conceptual framework.

Defining meaningful entities and their relationships with each other.

Ontologies are usually expressed as a combination of a classification scheme and controlled vocabularies.

Ontologies are designed to link similar concepts in different namespaces.

Ontologies are designed to increase a search agent's ability to interpret data in a different domain.

Ontologies link namespaces via equivalencies.

Certain processes of inferential logic are possible upon properly prepared RDF encoded documents linked via Ontologies.

This logic leads to proof and trust.
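A toy version of "linking namespaces via equivalencies" might look like the Python below. It is a sketch under stated assumptions: `dc:creator` and `dc:subject` are real Dublin Core terms, but the `lib:` namespace, the `doc:` identifiers and both helper functions are hypothetical, and a plain lookup table stands in for what an actual OWL reasoner would do.

```python
# Hypothetical equivalences between a made-up "lib:" namespace and Dublin Core.
EQUIVALENT = {
    "lib:author": "dc:creator",
    "lib:heading": "dc:subject",
}

triples = [
    ("doc:1", "dc:creator", "Lubetzky"),
    ("doc:2", "lib:author", "Lubetzky"),
    ("doc:2", "lib:heading", "Cataloging"),
]


def normalize(triples, equivalent):
    """Rewrite each predicate to its declared equivalent, if it has one."""
    return [(s, equivalent.get(p, p), o) for s, p, o in triples]


def subjects_with(triples, predicate, obj):
    """Find subjects that carry the given predicate/object pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


merged = normalize(triples, EQUIVALENT)
print(subjects_with(merged, "dc:creator", "Lubetzky"))
```

Before the equivalence is applied, a search on `dc:creator` finds only `doc:1`; afterwards it finds both documents. Whoever writes the `EQUIVALENT` table is the person my "who controls the equivalencies?" question is about.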

Why should libraries care about the Semantic Web?

How libraries currently use the WWW.

Libraries are using the web (most prominently via OAI PMH) as an irrigation system for swift transfer.

Detailed metadata within an otherwise closed domain are simplified and made available to other places. These other places also simplify their more detailed and otherwise closed domain specific metadata.
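For anyone who hasn't seen OAI-PMH up close, harvesting starts with a plain HTTP request. The sketch below only builds such a request URL. `ListRecords` and the `oai_dc` (simple Dublin Core) metadata prefix are part of the protocol, while the repository address and the `list_records_url` helper are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical repository address; real OAI-PMH base URLs vary by institution.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
```

Asking for `oai_dc` is the "simplified" part: every repository has to offer simple Dublin Core, whatever richer metadata it keeps at home.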

What's it like outside the library?

  1. The world is full of experts
  2. The world is full of enthusiasts
  3. Human culture is saturated with complex, nuanced, important relationships

What can libraries bring to the table?

Information resource description, organization and access is what we do (we ask the right questions that no one else thinks of)

Information evaluation (collection development, reference)

We do not realize what a rich, sophisticated body of theory we have on bibliographic relationships [Ranganathan]

A speculation on what bibliographic representation might look like in the Semantic Web (see slides)

Separate truly local information from non-local bibliographic information, author information, work [I assume in the FRBR sense] information and related works.

Cataloger Repositioning

We shift our cataloging role from taking the time to write out the limited amount of information we are able to write to finding reliable information and linking to it via RDF. We would create/control the semantic equivalencies, based on dictionaries we trust (we're good at finding trustworthy information). We would control the semantic web because we would take the time to build it. [Does this subvert the ground up approach? Is it feasible?]

Our mind is full of ideas and these ideas depend on documents. The intensity of our ideas is dependent upon repeatedly encountering these ideas in documents. Things accrue meaning as you encounter them again, and again, and again.

What we do in libraries is make it possible to bump into texts again. We make it possible that you will encounter a document more than once, having seen something you like, you can put your hands on it again. Ideas can build upon each other in part because of the work in libraries. This role of the library will not go away in the near future.

What is the purpose of the catalog (Lubetzky)

To provide access to entities that accrue from the objects on our shelves

authors (and all their works)

works (and all their editions)

We can give the Semantic Web an intellectual structure (FRBR) upon which we can build these meaningful semantic structures.

Work =>> Expression =>> Manifestation =>> Item
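The chain above nests naturally as data. Here is a rough Python sketch of it; the four class names follow FRBR, but the attributes I hang on each level (`language`, `publisher`, `barcode` and so on), the `all_items` helper and the sample values are placeholders of my own, not the FRBR attribute set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # a single copy on a shelf (or server)


@dataclass
class Manifestation:
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    author: str
    expressions: List[Expression] = field(default_factory=list)


def all_items(work):
    """Walk Work -> Expression -> Manifestation -> Item and collect the copies."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Placeholder values only.
work = Work("Example Work", "Example Author", [
    Expression("en", [Manifestation("Example Press", 2005, [Item("0001"), Item("0002")])]),
    Expression("fr", [Manifestation("Exemple Editions", 2006, [Item("0003")])]),
])
print([item.barcode for item in all_items(work)])
```

Truly local information lives only at the `Item` level, which matches the speaker's suggestion to separate local from non-local bibliographic information.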

Questions from the Audience

Can we really define the dictionaries/Do we really need to?

Two competing notions

Complexity and Emergence

Emergent Semantics

The Web came out of the scientific community enamored of the belief that:

If you want structure you have to be able to step back and let the interested parties duke it out.

Information Organization came out of the library community with a low tolerance of the mess that will accompany an emergent structure.

The $64,000 question is Can The Semantics Emerge?

Someone needs to do the study and find out: does meaningful, shared semantics emerge?

Complexity Theory from AI

Three situations posited.

1. A Change, nothing happens

2. A Change, all hell breaks loose

3. A Change, A Shift, the community continues to cohere in a new location

Maybe the library community is the part that holds everything together in times of change. Maybe we need to introduce enough of the concern for order and consistency to mix well with the excitement of the new. We hold the line and make the Semantic Web useful.

Posted by MetaMetadata at 01:49 PM | Comments (1) | TrackBack

Wah Lum Kung Fu 35th Anniversary

Hosted by my school, Boston Wah Lum! If you're anywhere near New England on the 30th, please come and support Wah Lum Tam Tui Northern Praying Mantis Kung Fu. The number of masters demonstrating at this celebration is staggering.

Posted by MetaMetadata at 01:31 PM | Comments (0) | TrackBack

Managing the Metadata Morass: Applying Cataloging Skills Beyond the Traditional Catalog

Boston College Library Catalogers

BC Libraries is project oriented, even traditional cataloging (monographs and serials). They’re giving up being on top of the print, by shifting to looking at all of the initiatives, old and new, as projects and then prioritizing projects. Suddenly the electronic initiatives seem more important.

7 professional catalogers, 8 support staff catalogers

?What about training and adjusting staff to the new electronic projects? Baby steps, intense time consuming documentation for first project, then experience takes over by third project.

BC Libraries Technical Services is getting into consulting, working with other people’s data and providing a labor pool of copy catalogers.

Their philosophy is to maintain a seat at the table for all digital production/information organization projects, inside and outside the library. They don’t say no.

[Where are they finding the money to pay people to stay at the table and provide a labor pool? It’s good to see MIT Libraries are not the only ones interested in data outside the libraries collection and their movement to consulting validates the Metadata Services experiment, but we’re moving away from production to consulting only. Where are they finding the money to maintain any kind of meaningful level of production?]

Most of their services and databases are off the shelf (only the Electronic Resources Management database is a do it yourself project)

ERMdb (homegrown, web-based, perl, linux, mysql, for lib staff only, an info hub for other systems/databases generating reports)

In building the ERMdb they followed the DLF ERM best practices, but designed their own metadata schema and cross-walked it to the DLF ERM schema, also designed UI, functional requirements and workflow
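At its simplest, a crosswalk like the one described is a field-to-field mapping. The Python sketch below shows the idea only: every field name on both sides is a placeholder I invented, since I don't have BC's schema or the DLF ERM element names in front of me, and the `crosswalk` helper is hypothetical.

```python
# Placeholder field names on both sides; neither set is taken from the real schemas.
CROSSWALK = {
    "resource_title": "erm_title",
    "vendor": "erm_provider",
    "license_end": "erm_license_end_date",
}


def crosswalk(record, mapping):
    """Rename mapped fields and report the ones the mapping does not cover."""
    converted = {mapping[k]: v for k, v in record.items() if k in mapping}
    unmapped = sorted(k for k in record if k not in mapping)
    return converted, unmapped


local_record = {
    "resource_title": "Example Database",
    "vendor": "Example Vendor",
    "staff_note": "renew in spring",
}
converted, unmapped = crosswalk(local_record, CROSSWALK)
print(converted)
print(unmapped)
```

Reporting the unmapped fields matters as much as the renaming, because those are the places where a homegrown schema says something the target schema cannot.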

DigiTool (Digital Asset Management System, an exLibris product, Oracle DB) It’s still in pilot mode, attempting to create a space for electronic objects that would be on the shelf if they were physical. Concerned with technical and preservation metadata. Concerned with Sustainability over time. Technical and Preservation metadata is intimately associated with descriptive metadata and the object itself. ?Whence the support and sustainability for the system?

We will continue to host the systems and servers they have now and be prepared to migrate the systems forward at the right time. They’ve committed to sustaining the collections, no matter how they make them available. So metadata mediated interoperability of description and storage is key.

Right now they’re focusing on cataloging, cleaning up records and ensuring the possibility of metadata transformation and exchange. DigiTool is incorporating EAD and METS (used for page turning). Of course one of the first projects was slides for professors.

The BC speaker confirms that jpeg2000 will embed metadata in its header file.

Different data sources create wholly different kinds of metadata. They describe objects in different ways and have different relationships with their objects. Catalogers are required to reconcile different approaches in one system like DigiTool.

The catalogers have a good seat at the DigiTool table to ensure robust metadata creation, interoperability amongst multiple metadata schemes and incorporating multiple object formats.

The Cataloging Group has a workforce (how, why?)

Digital Commons (Institutional Repository, A Turnkey System, Remote, Out of the Box, a ProQuest product). A pilot is up for ’04-’05. Populated with dissertations. Still defining rules. What metadata? The cataloger provided the name authority control question. They have a full time cataloger to provide metadata for all objects in IR. Is the eScholarship Manager a cataloger? Also provided knowledge of OPAC and the connection between records in OPAC and IR.

?Who runs the IR?

Posted by MetaMetadata at 12:48 PM | Comments (0) | TrackBack

FAST: A Subject Headings Schema Designed for the 21st Century

Welcome to the OCLC presentation. [ed. note - Must find new introduction]

OCLC Research Objectives

  • Data Mining
  • Understanding Users

Research Areas of Interest to Me

  • Knowledge Organization and Semantic Web
  • Authority Control
  • Metadata Schema Transformation

FAST is a collaborative research project between OCLC, LoC, ALA/SAC. FAST stands for Faceted Application of Subject Terminology.

Google is the 21st Century

Latent Semantic Indexing and Page Rank

Library Catalogs are the 19th Century

LCC, DDC, LCSH, Card Catalogs, Controlled Vocabularies

Library catalog technologies have been "digitized" and all are expensive compared to Google.

The question is which technology do we apply to get to the grey literature.

FAST is attempting to position itself as the answer, somewhere in between these two technologies and leveraging the positives from both.

FAST is a new approach to Subject Vocabularies

It's cheaper and easier to use than LCSH for electronic objects and compatible with a variety of the new metadata schemas

It is simple in structure and syntax

Usable by non-catalogers in non-library environments

Is designed for semantic interoperability

Is an adaptation of an existing schema

LCSH is the obvious choice for the usual reasons (established, supported), but sucks for cataloging electronic, web-delivered resources. It was designed for pre-coordinated card catalogs.

One of the problems is that LCSH has rules for creating new headings that aren't established. Of the more than 8.5 million distinct topical headings in WorldCat:

over 3 million of the headings are not established, but valid and used in multiple bibliographic records,

over 5 million are not established, but valid and used only once,

only 100K are established.

This is as ridiculous as it sounds. The rules are human derived and hard to explain to a computer. They allow for a proliferation of headings, which defeats the purpose of grouping similar items under common headings.
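To put those numbers in proportion, here is the back-of-the-envelope arithmetic in Python. The inputs are the rounded figures quoted in the talk ("over" and "more than" make them lower bounds), so the percentages are approximate at best; the variable names are mine.

```python
# Rounded figures as quoted in the talk; treat them as approximate lower bounds.
total_headings = 8_500_000
established = 100_000
unestablished_reused = 3_000_000
unestablished_single_use = 5_000_000

established_share = established / total_headings
reused_share = unestablished_reused / total_headings
single_use_share = unestablished_single_use / total_headings

print(f"established:            {established_share:.1%}")
print(f"not established, reused: {reused_share:.1%}")
print(f"used only once:          {single_use_share:.1%}")
```

On these figures, established headings come to roughly 1.2 percent of the distinct topical headings in use, while well over half of all headings appear in exactly one record.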

FAST contains far fewer established topical headings (about 400K).

FAST normalizes the form of heading for machine encoding.

?Are FAST headings defined via RDF triples?

FAST will use the MARC 21 authority format.

FAST has 8 facets:

  1. Topical
  2. Geographic
  3. Form (Genre)
  4. Chronological
  5. Personal Names (Names as subjects)
  6. Corporate Names (Names as subjects)
  7. Conferences/Meetings
  8. Uniform Titles

FAST will keep general or "X" subject divisions

[ed. - Genre lists are always inadequate]

FAST is still hierarchical, but loses specificity.

FAST facilitates both pre- and post-coordination.
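Post-coordination just means the facets are stored separately and combined at search time. A minimal Python sketch, assuming made-up records and a hypothetical `facet_search` helper (this is not how OCLC implements FAST, only an illustration of the idea):

```python
# Hypothetical records: each facet holds its own short headings.
records = [
    {"id": 1, "topical": {"Cataloging"}, "geographic": {"Massachusetts"}, "form": {"Congresses"}},
    {"id": 2, "topical": {"Cataloging"}, "geographic": {"Rhode Island"}, "form": {"Handbooks"}},
    {"id": 3, "topical": {"Metadata"}, "geographic": {"Massachusetts"}, "form": {"Congresses"}},
]


def facet_search(records, **wanted):
    """Return ids of records matching every requested facet value."""
    return [
        r["id"]
        for r in records
        if all(value in r.get(facet, set()) for facet, value in wanted.items())
    ]


print(facet_search(records, topical="Cataloging"))
print(facet_search(records, topical="Cataloging", geographic="Massachusetts"))
```

Each extra facet narrows the result, and no cataloger had to anticipate the combination in advance, which is the contrast with a pre-coordinated heading string.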

FAST is available as an OCLC SiteSearch database

The authority file is in beta

FAST enables cool faceted search and browsing technologies (not semantic web).

Posted by MetaMetadata at 09:57 AM | Comments (0) | TrackBack

Conference Blogging

So I'm at another Information Science Conference, and guess what, I am making blogging.

Welcome to Metadata and Meaning: Creating the 21st Century Catalog brought to you all the way live for '05 by the New England Technical Services Librarians (NETSL). This is one of my favorite organizations, mostly because I like to pronounce their acronyms (thanks Walt!). Go ahead, try it ... NETSL.

Sharing a few notes with the world is the perfect way to fill the dead time between presentations while avoiding awkward conversations with your colleagues.

This is my second conference at which I'm blogging (the first was the '04 ASIST Annual Conference, #1, #2, #3, #4, #5).

I'm planning to volunteer to blog the upcoming Spring NEASIST program Syndicate, Aggregate, Communicate: New Web Tools in Real Applications for Libraries, Companies and Regular Folk. Conveniently this program will be on technologies like making blogging and the conference organizers are mad to apply as many of the technologies to the event itself.

What I'll be doing today.

1. Bringing you an important PSA

2. Updating a work blog (I can't link to it as it's on a testing server and not yet ready for primetime).

3. Keeping running diaries of presentations I attend, recording anything interesting I hear.

Posted by MetaMetadata at 09:46 AM | Comments (1) | TrackBack