Notes & Todos: The PIM Saga

In 2003, I spent a great deal of my development time chasing the Personal Information Management furphy. I started out (Phase I) working on basic applications for taking notes: the first, b.py, was a rudimentary shell-like interface that worked surprisingly well. The second, n.py, was an RDF-API-backed extension of b.py, which ultimately led to the redesign of my Eep line of RDF software and hence onto Pyrple. These first two notes programs had very basic data structures, and focussed on simplicity as much as possible, centered around my day-to-day use case.

As time went on, however, I started to develop new, subtler approaches. I have two documents from that period (Phase II) which I reproduce here: they're putative essays on the direction that I could've taken for b.py and n.py's successor. The first document, General Data Structure, discusses interface, backend, searching, diff streams, and categorization all very briefly. The second document, Small Python Program Design Notes, tries to investigate a more intuitive and object-oriented method for scribing information.

Phase III consisted of divergencies from the line-based "suck information in" manifesto, and resulted in several applications such as Through-The-Web-Editing wikis, the blogbot/noets weblog powered from IRC (and subsequently jotbot the next year), simple lexical scanners, &c. Eventually I realised that what I was mostly searching for was a way in which the computer could organise the information I enter into it for me based solely on its inherent properties, but that that's impossible. What would be better is a way to entice me to record simple information more, and think about how it ought to be structured given the limitations of current storage systems. As William Loughborough said, effectively signalling the completion of Phase III: "The clutter is inherent to the organism!"

In September 2003 I published Opus 59 (Are You Taking Notes?) and Opus 61 (More Metanotes), which briefly discuss some ideas along the same kind of lines as the Phase II stuff.

For published Phase III thinking, there's the noets.py source, but most especially the RDFe - Schema-Aware RDF Editor ideas of taking as much burden off of the user as possible. Apart from those two main programs and the Through-The-Web-Editing wiki, I'm rather embarrassed by most of this stuff now. Not that it was a waste of time, since in the process I investigated quite a few existing notes programs and approaches, but information organisation is not a simple thing and depends on many factors. For example, I still find taking notes on paper much more expressive (and there's a large thread on that at c2, Is Anything Better Than Paper?), but it's not greppable. I still use open windows and logs as todo items, yet tav thought it necessary to build in a hundred ranking/dating/status features when I challenged him to add a todo function to his IRC bot in under an hour, and of course he failed because of that overcomplexity. And one of the primary motivational factors for taking down the information in the first place is whether other people are interested and can contribute to furthering the idea, which is very much a social problem and not a technical one.

(The following design document is from 2003-05, and Phase II of the general notes saga.)

General Data Structure

The notes program is built around a GUI, so as to be able to maintain state, though a command line alternative should probably be provided too. wxPython should be used for the GUI, and it should be made accessible.

In general, this notes program works by taking data input from the user in a similar manner to a bash command line or an HTML form: you type or paste the data, then press enter to submit it, and it is stored by the system. Submitting new information changes the state of the system. As to *how* the information changes the state, this is controlled by the input parsers. The input parsers are basically a series of customized regular expressions working on datatypes.

When an item is entered, it is parsed according to the regexps, which map onto a function in the actual API. The most common of these will be @add, which will add an item to the store.
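As a minimal sketch of how such regexp-to-function dispatch might look (this is illustrative only, not the code of b.py or n.py; the names submit, PARSERS, and search, and the "?" search prefix, are all invented for the example):

```python
import re

store = []  # stand-in for the real item store


def add(content):
    """Hypothetical @add: append a new item to the store."""
    item = {'content': content}
    store.append(item)
    return item


def search(query):
    """Hypothetical @search: return items whose content contains the query."""
    return [item for item in store if query in item['content']]


# Ordered (pattern, function) pairs; the first pattern that matches wins.
# The "?" prefix for searches is an assumption made for this example.
PARSERS = [
    (re.compile(r'^\?\s*(?P<query>.+)$'), search),
    (re.compile(r'^(?P<content>.+)$'), add),  # fallback: plain text is a note
]


def submit(line):
    """Parse one line of input and call the API function it maps to."""
    for pattern, function in PARSERS:
        match = pattern.match(line.strip())
        if match:
            # Named groups in the regexp become the function's arguments.
            return function(**match.groupdict())
    raise ValueError('no parser matched: %r' % line)
```

Because the table is ordered, more specific patterns go first and the catch-all @add rule goes last, which matches the expectation that adding is the most common action.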

Items

An item is simply a set of binary data with a single subject. It could feasibly be exported as RDF, but it is instead stored internally as a more efficient data structure. The Python equivalents of these data structures are basically dictionaries: they're all property/value pairs. For example:-

{'uid': 'sj9', 
 'content': 'this is a test note', 
 'timestamp': '2003-05-23T22:10:30Z'}

Each property/slot can be either an internally defined property, or a private, externally defined one (these start with an underscore: _). In the example, three public properties are used.

Each property must use one of the standard datatypes. Example datatypes are: shortdate, date, token, string, number, period delimited, comma delimited. Each of the datatypes will have various methods for parsing them. Like hopefully as much as possible of the program, the datatypes will be customizable and possibly even extensible, but it is expected that the defaults will be most helpful. Extensions would be difficult without messing about with the code.
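One way the datatype table might be sketched is as a dictionary pairing each datatype name with a regexp and a converter. The patterns below are guesses at what each datatype name could mean, and DATATYPES and parse_value are names made up for this example; the design does not define any of them.

```python
import re
from datetime import datetime

# Each datatype pairs a regexp (to recognise a value) with a converter.
# All patterns here are assumptions, not definitions from the design.
DATATYPES = {
    'date': (re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$'),
             lambda s: datetime.strptime(s, '%Y-%m-%dT%H:%M:%SZ')),
    'shortdate': (re.compile(r'^\d{4}-\d{2}-\d{2}$'),
                  lambda s: datetime.strptime(s, '%Y-%m-%d').date()),
    'number': (re.compile(r'^-?\d+(\.\d+)?$'), float),
    'token': (re.compile(r'^\S+$'), str),
    'comma delimited': (re.compile(r'^[^,]+(,[^,]+)*$'),
                        lambda s: [part.strip() for part in s.split(',')]),
    'string': (re.compile(r'^.*$', re.DOTALL), str),
}


def parse_value(datatype, text):
    """Validate text against a datatype's regexp, then convert it."""
    pattern, convert = DATATYPES[datatype]
    if not pattern.match(text):
        raise ValueError('%r is not a valid %s' % (text, datatype))
    return convert(text)
```

Keeping the table as plain data is what would make the defaults customizable: a user could swap a pattern without touching parse_value, though adding a genuinely new datatype still means writing a converter.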

Addressing: Searching

Items are located by queries/searches. For example, one might want to search for all items from 2003-05 with UIDs starting with "s". This query would probably be something such as:-

timestamp:2003-05*;uid:s* (";" delimited: must be quoted)

But these queries aren't just to enable one to search for data more easily: they actually address sets of items in the system. Therefore, this addressing technique can be used to provide metadata to items, for example.
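A rough sketch of queries-as-addresses, assuming that "*" is a shell-style wildcard and that every clause must match (the design states neither). The functions parse_query, select, and annotate are hypothetical, and quoting of ";" inside a pattern is not handled here.

```python
from fnmatch import fnmatchcase


def parse_query(query):
    """Split 'prop:pattern;prop:pattern' into a list of (prop, pattern)."""
    clauses = []
    for clause in query.split(';'):
        # Split on the first colon only, so patterns may contain colons.
        prop, pattern = clause.split(':', 1)
        clauses.append((prop.strip(), pattern.strip()))
    return clauses


def select(items, query):
    """Return the items addressed by the query (all clauses must match)."""
    clauses = parse_query(query)
    return [item for item in items
            if all(prop in item and fnmatchcase(str(item[prop]), pattern)
                   for prop, pattern in clauses)]


def annotate(items, query, prop, value):
    """Attach a property to every item that the query addresses."""
    for item in select(items, query):
        item[prop] = value
```

With this, annotate(items, 'timestamp:2003-05*;uid:s*', '_tag', 'may') would add a private _tag property to exactly the set of items the query addresses.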

Augmentations

Data needs to be changed, but destroying data is not good. Therefore, a diff format will be specified and one can use a special property to store the diff format. Hmm. And perhaps the diff property will be the only overwritable (actually: appendable) property.

{'content': 'blargh', 
 'diff': 'content/gh/gh!/;content/blargh/splodge/'}

Though this raises two questions: a) what happens when something like a UID is overwritten? What does it mean for a UID to get augmented? b) can diffs change themselves? What does that mean? So, perhaps diffs should only be allowed to modify the builtin "content" property.
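To make the diff idea concrete, here is a sketch that reads each step as a sed-like prop/old/new/ substitution, applied in order and replacing only the first occurrence. That reading is an assumption, since the diff format was never specified; parse_diff and current_view are invented names. It follows the suggestion above of only allowing diffs on "content".

```python
def parse_diff(diff):
    """Split 'prop/old/new/;prop/old/new/' into (prop, old, new) triples."""
    steps = []
    for step in diff.split(';'):
        # A well-formed step ends in '/', leaving an empty trailing field.
        prop, old, new, trailing = step.split('/')
        if trailing:
            raise ValueError('malformed diff step: %r' % step)
        steps.append((prop, old, new))
    return steps


def current_view(item, diffable=('content',)):
    """Return a copy of item with its diff steps applied in order.

    The stored item is never modified, so no data is destroyed.
    """
    view = dict(item)
    diff = item.get('diff')
    steps = parse_diff(diff) if diff else []
    for prop, old, new in steps:
        if prop not in diffable:
            raise ValueError('property %r cannot be diffed' % prop)
        view[prop] = view[prop].replace(old, new, 1)
    return view
```

Under these assumptions the example item above would read as "splodge!": the first step turns "blargh" into "blargh!", and the second replaces "blargh" with "splodge".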

Classes of Data

Items have various subclasses. It is required that each item have a uid (possibly the uid property should be preflected by the empty string) and a timestamp, so the base class is:-

:Item a rdfs:Class; :hasProp :uid, :timestamp .

Property ranges are specified in terms of datatypes. These should probably be fixed for the builtin properties... hmm. Oh, and the diffability of properties should be some form of metadata for those properties.

(The following design document is from 2003-06, and Phase II of the general notes saga.)

Small Python Program Design Notes

These are some design notes for a small Python note taking program for things such as rough notes, todo items, people's contact information, journal entries, and so on. Things are stored as items, which are basically described by triples. Identifiers are all local, but can be mapped to URIs using the builtin :uri property.

Other builtin properties: :seq (all items must have exactly one), :uid (@@ not strictly necessary: the subject is the UID...), :timestamp (all items must have exactly one: this marks the time when the item was added to the database). Item is synonymous with Resource (for now), but with the above restrictions, making it a sub-class. Builtin properties and classes (:Diff, :Item, :Class, :Property etc.) start with colons. User defined ones must not contain a colon.

The :content builtin property is synonymous with rdf:value. It should be used for textual entries, which in turn will be the warp and weft of the program's data, usually.

Things are stored using the following format:-

:?<uid>:<version> :seq <seq>; :timestamp <timestamp>; other... .

The syntax for entry is...

<values>? ( <prop> <values> (';' <prop> <values>)* )? (':' <content>)?

The first value denotes the classes for the item. The :called builtin property can be used to give a user defined UID to an item, otherwise a generated hex number is used.

Values are comma separated. A value must not contain a single comma followed by a space: to quote a single comma followed by a space, use two commas. Same for semi-colon and space, and colon and space, since these are used to delimit property and values, and the content from the rest.
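The doubled-delimiter quoting rule can be sketched with a single regexp split. This is only one possible reading of the rule; split_values is a hypothetical helper, and runs of three or more delimiters are left undefined.

```python
import re


def split_values(text, delimiter=','):
    """Split on 'delimiter + space', treating a doubled delimiter as a quote.

    'a, b'  -> two values; 'a,, b' -> the single value 'a, b'.
    """
    d = re.escape(delimiter)
    # Split at a delimiter followed by a space, unless the delimiter is
    # itself preceded by another delimiter (the quoted form).
    parts = re.split(r'(?<!%s)%s ' % (d, d), text)
    quoted = delimiter * 2 + ' '
    plain = delimiter + ' '
    return [part.replace(quoted, plain).strip() for part in parts]
```

For instance, split_values('Bob B. Bobbington, Bobbington,, Bob') yields the two values 'Bob B. Bobbington' and 'Bobbington, Bob', and the same function called with ';' as the delimiter separates property/value groups.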

The content bit at the end obviously adds content as a :content property of the item being added.

:seq cannot be overridden. :timestamp can be overridden, but only one will be set, so the last one will be used. :content can likewise be set as a normal property.

Diffs are basically copies of the old version with augmentations. Any new property overrides any old value(s?) for the same property. To include previous old values, use `inherit, and to delete use `del.
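A sketch of that versioning rule, assuming items are held as a mapping from property to a list of values and that `inherit and `del appear as ordinary values in the new entry. The function new_version and this representation are illustrative assumptions, not part of the design.

```python
INHERIT = '`inherit'
DELETE = '`del'


def new_version(old, changes):
    """Return a copy of old (property -> list of values) with changes applied."""
    new = {prop: list(values) for prop, values in old.items()}
    for prop, values in changes.items():
        if prop == ':seq':
            raise ValueError(':seq cannot be overridden')
        if DELETE in values:
            new.pop(prop, None)  # `del drops the property entirely
            continue
        merged = []
        for value in values:
            if value == INHERIT:
                merged.extend(old.get(prop, []))  # splice in the old values
            else:
                merged.append(value)
        new[prop] = merged  # new values override the old ones
    return new
```

Since the old version is copied rather than edited, every earlier version stays available, and a property not mentioned in the changes simply carries over.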

The :meta builtin property takes a comma separated list of tokens that say stuff about an item. For example, using d to mark the item as "finished", and p to mark the item as "public".

Example of entries:-

Person, Friend name Bob B. Bobbington, Bobbington,, Bob; :called bob

Note: this is an example note

Todo for `bob; do splunge

Note that datatypes are all prefixed with the backtick, `, character. They are then parsed according to regexp. You can use long dates, short dates, UIDs, and so forth.

This might be better, though:-

bob :type Friend/Colleague
     name "Bob B. Bobbington", "Bobbington, Bob"

Note: this is an example note

Todo for `bob; do splunge

bob { type: Friend, Colleague; name: "Bob B. Bobbington", "Bobbington, Bob"; }
[] { type: Note; content: "... ugh"; }

or use any quote form that one wants, like in s!/!?! substitutions:-

Person, Friend name "Bob B. Bobbington", `Bobbington, Bob`; :called bob
Note: "this is an example note"
Todo for bob; do "splunge"

Person,Friend(bob) name "Bob B. Bobbington", `Bobbington, Bob`
Note: "this is an example note"
Todo for bob; do "splunge"

Person(bob) :class Friend; name "Bob B. Bobbington", `Bobbington, Bob`
Note: "this is an example note"
Todo for bob; do "splunge"

bob a Person Friend; name "Bob B. Bobbington" "Bobbington, Bob"
Note "this is an example note"
Todo for bob; do "splunge"

( class value ) | ( subj pred objts ( ";" pred objts )* )

Sean B. Palmer