Strange Strands

13 Sep 2006

Taxonomy of Documents

I have a lot of notes, essays, articles, specifications, designs, ideas, drafts, code, thoughts, and scribblings that I haven't yet published because there's no suitable framework for doing so. The problem is organising them: I don't like to mix pieces of writing that are a) on different subjects, or b) in different stages of completion. Whenever I've tried to fix this before, I've divided things up into as few directories as possible because of my dislike of classifying things. But now I think a much more discrete form of classification is necessary, with a lot of directories for a lot of files, and so I'm starting to work out the most important characteristics of this morass of documentation.

So far I'm classifying along the following guidelines: a) whether the input is in Text or HTML, b) the quality of the input (on a scale from * to ****), c) whether the content is about technology or about other things. There are also a couple of separate factors: i) very rough drafts, which don't come into this taxonomy, and ii) essay series that are loosely connected but have a more central driving theme. So far I've basically been coming up with directories that fulfill each of the possible criteria: Text or HTML? *, **, ***, or ****? Technology or Other? Some of them overlap, so it's nice seeing which barriers I don't mind breaking; others I've been keeping staunchly separate. One interesting thing is that I haven't created any superfluous directories yet.
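To get a feel for how many directories the three main criteria could produce if none of them overlapped, here is a minimal sketch in Python. The naming pattern (format, star count, subject joined by hyphens) and the function name are hypothetical, chosen only for illustration; the scheme itself doesn't fix any directory names beyond the ones mentioned in the text.

```python
from itertools import product

# The three criteria from the taxonomy.
FORMATS = ("text", "html")
QUALITIES = ("*", "**", "***", "****")
SUBJECTS = ("technology", "other")


def candidate_directories():
    """Return one hypothetical directory name per combination of criteria."""
    names = []
    for fmt, quality, subject in product(FORMATS, QUALITIES, SUBJECTS):
        # Use the number of stars in the name, since "*" is awkward in paths.
        stars = len(quality)
        names.append(f"{fmt}-{stars}star-{subject}")
    return names


if __name__ == "__main__":
    for name in candidate_directories():
        print(name)
```

With two formats, four quality levels, and two subjects, that is sixteen candidates at most; since some of the criteria overlap in practice, the real number of directories would be smaller.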

One test of whether this classification works is whether existing documents fit into it well, and so far it seems that they do. But of course there's always the question of where to put a document that fits into more than one of the categories you've defined, or one which shifts categories. Same old problem. For example, I had defined a directory for archiving pretty low quality old notes on particular subjects ("topic"), and a directory for higher quality encyclopaedic entries ("about"), until I realised that two of the articles I'd pegged for the latter (an X-SAMPA table, and some notes about the songs of Robert Johnson) were actually more suited to the former, except that I'd been using that as a kind of archival place for existing stuff that didn't really fit elsewhere. It's a bit like the adverb of my classification scheme: the catch-all for whatever doesn't belong anywhere else.

I'm also a little unsure about mixing Text and HTML, which always seems like a shoddy thing to do, but is absolutely necessary for a specifications directory, since most specifications are in HTML, except for Internet-Drafts and RFCs, which are in Text. And of course there's the possibility of PEPs and the like, which might even need yet another category.

Until this is worked out to a high degree of satisfaction, I can't publish any of this stuff, so it's fairly important. I've only really just started to figure out this very broad and discrete classification system; the only other part of it that I've worked out to a medium degree of satisfaction is a kind of dated directory ("YYYY") for all the roughest of the files, the bits of ephemera that I accrue from shell program output and so on. That kind of stuff might be publishable long before any of the more polished stuff, which makes sense: the less polished something is, the less I have to worry about the manner of its publication.

Strange Strands, Taxonomy of Documents, by Sean B. Palmer
Archival URI: http://inamidst.com/strands/doctaxon
