Thunderbird kung-fu: querying the index for messages

Note: the following is not for the fainthearted.

Problem: you are not satisfied by the built-in search user interface of Mozilla Thunderbird 3.1, and you need to extract some informations as raw data.

Solution: use the internal sqlite database created by the Thunderbird search engine.
Open a shell, and type (here ehr3bn8y is the name of my profile, your will be different):

sqlite3 ~/.thunderbird/ehr3bn8y.default/global-messages-db.sqlite

now let’s explore a little bit the format of the database:

.tables
.schema messagesText
.schema messagesText_content

Here is the list of the tables:

while this is the format of the messages table:

CREATE TABLE messages (id INTEGER PRIMARY KEY, folderID INTEGER, messageKey INTEGER, conversationID INTEGER NOT NULL, date INTEGER, headerMessageID TEXT, deleted INTEGER NOT NULL default 0, jsonAttributes TEXT, notability INTEGER NOT NULL default 0);

and this is the messagesText_content:

CREATE TABLE 'messagesText_content'(docid INTEGER PRIMARY KEY, 'c0subject', 'c1body', 'c2attachmentNames', 'c3author', 'c4recipients');

Note that the date field in the messages table is in microseconds elapsed from the Unix epoch, to convert:

date = DATE(1970;1;1) + unix_epoch_ms/86400000000
unix_epoch_ms = (date - DATE(1970;1;1)*86400000000

for example this query will extract to the file authors.txt all the people who sent me mails from January 1st 2009 until today:

.output authors.txt
select distinct messagesText_content.c3author from (messages inner join messagesText_content on (messagesText_content.docid = messages.id)) where messages.date > '1230768000000000';

Enjoy !

P.S.: some tables such as messagesText etc. are virtual, and if you try to query them with sqlite3, you’ll get the error “Error: unknown tokenizer: mozporter“. The workaround is suggested here (you may need to change the number 18B785B7 to whatever you get from the 2st command):

SELECT hex(fts3_tokenizer('porter'));
SELECT fts3_tokenizer('mozporter',X'18B785B7');