Changeset

2854:687b19cad4f5

mod_storage_xmlarchive/README: Add description of how data is stored
author Kim Alvefur <zash@zash.se>
date Thu, 28 Dec 2017 22:30:56 +0100
parents 2853:a844d1535c4d
children 2855:7713cd4fff8f
files mod_storage_xmlarchive/README.markdown
diffstat 1 files changed, 24 insertions(+), 0 deletions(-) [+]
--- a/mod_storage_xmlarchive/README.markdown	Fri Dec 08 21:14:10 2017 +0100
+++ b/mod_storage_xmlarchive/README.markdown	Thu Dec 28 22:30:56 2017 +0100
@@ -63,3 +63,27 @@
 Where `$DIR` is `to` or `from`, `$STORE` is e.g. `archive` or `archive2`
 for MAM and `muc_log` for MUC logs. Finally, `$JID` is the JID of the
 user or MUC room to be migrated, which can be repeated.
+
+Data structure
+==============
+
+Data is split into three kinds of files, and messages are grouped by day.
+Prosody's `util.datamanager` is used, so all special characters in these
+filenames are escaped and the files reside under `hostname/store` in
+Prosody's data directory, commonly `/var/lib/prosody`.
+
+`username.list`
+:   A list of dates in `YYYY-MM-DD` format.
+
+`username@YYYY-MM-DD.list`
+:   Index containing metadata for messages stored on that day.
+
+`username@YYYY-MM-DD.xml`
+:   Messages in textual XML format, separated by newlines.
+
+This makes it fairly simple and fast to find messages by timestamp.
+Queries that are not time-based but limited to a specific contact may
+be expensive, as potentially the entire archive will be read.
+
+Each archive ID is of the form `YYYY-MM-DD-random`, making lookups by
+archive ID just as simple as time-based queries.
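
The layout described in this changeset suggests that an archive ID alone is enough to work out which per-day files to open. The following is a minimal sketch of that idea in Python, not the module's actual code (the module itself is written for Prosody). The function names are hypothetical, the shape of the `random` suffix is assumed to be any non-empty string, and the filename escaping done by `util.datamanager` is deliberately not reproduced here.

```python
import re
from datetime import date

# Assumption: an archive ID is "YYYY-MM-DD-" followed by a non-empty suffix.
ARCHIVE_ID = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)$")


def date_from_archive_id(archive_id):
    """Return the day encoded in the leading part of an archive ID."""
    match = ARCHIVE_ID.match(archive_id)
    if match is None:
        raise ValueError("not of the form YYYY-MM-DD-random: %r" % archive_id)
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    # date() rejects impossible days such as 2017-02-30.
    return date(year, month, day)


def files_for_archive_id(username, archive_id):
    """Name the three kinds of files relevant to one archive ID.

    The names are unescaped; a real store would escape special
    characters and place the files under hostname/store.
    """
    day = date_from_archive_id(archive_id).isoformat()
    return {
        "dates": "%s.list" % username,
        "index": "%s@%s.list" % (username, day),
        "messages": "%s@%s.xml" % (username, day),
    }
```

For example, `files_for_archive_id("romeo", "2017-12-28-abc123")` names `romeo@2017-12-28.list` as the index and `romeo@2017-12-28.xml` as the message file, so only a single day's files need to be read. A query filtered by contact has no such shortcut, which is why it may end up scanning every date listed in `romeo.list`.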