Re: Compressed Pristines (Custom Format?)
From: Markus Schaber <m.schaber_at_3s-software.com>
Date: Fri, 23 Mar 2012 16:49:45 +0000
Hi,
> -----Original Message-----
Maybe xz (lzma2) is the algorithm to look at. It usually offers a better compression ratio for the CPU time spent, and decompression is nearly as fast as gz.
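As a quick illustration (my own sketch, not part of the mail: module choice and sample data are assumptions), Python ships both algorithms, so the trade-off is easy to try locally; the `lzma` module produces `.xz` containers with LZMA2 as the default filter:

```python
# Hedged sketch: compare gzip and xz (LZMA2) on a redundant sample payload.
# Sample data and sizes are illustrative only; ratios depend on the input.
import gzip
import lzma

data = b"pristine file contents " * 1000  # highly redundant sample payload

gz = gzip.compress(data)
xz = lzma.compress(data)  # default container is .xz with the LZMA2 filter

# Both round-trip losslessly.
assert gzip.decompress(gz) == data
assert lzma.decompress(xz) == data
print(len(data), len(gz), len(xz))
```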
> I also like the fact that the pristine files are opaque and don't
So maybe a developer tool for (un)packing pristine archives should be created.
> Another point raised by Markus is to store "common pristine" files and
This point is independent of the compression of the pristine store. Both a working-copy-local and a common pristine store can profit from compression.
> Sqlite may be used as Branko has suggested. I'm not opposed to this. It
Maybe a distinct pristine.db would be better than putting them in wc.db, but I'm not sure about that.
> My main concern is that
Usually, sqlite re-uses free space within the same database rather efficiently.
> The solution is to "vacuum" wc.db, but that depends on its size,
"svn cleanup" would be a good opportunity.
I just had another idea, we could store the metadata in the SQLite database:
In the wc.db, in the pristine table, store four columns[1]: filename, offset, length, algorithm.
"filename" denotes the container file name. Payload files are first concatenated, then compressed, then put into the container. Offset and length are byte-offsets in the decompressed bytestream. "Algorithm" denotes the compression algorithm, with one value reserved for uncompressed storage. If a container grows beyond a specific limit, a new file is created.
The main advantage of storing the metadata in SQLite is that we do not need to invent any new file format.
Some other positive aspects (some of them are clearly also possible using your original proposal):
- This allows applying concatenation and compression orthogonally, on a container-by-container basis:
- By reserving a special length value (like -1 or SQL NULL) for "look at the file on disk", we can quickly upgrade existing working copies without touching the pristine files at all.
- "debuggability" is somehow given:
- As most current decompressors for gz and lzma transparently decompress streams that were "first compressed, then concatenated", we could even try to exploit transfer encodings (like transparent gz compression in HTTP) which might already deliver us compressed files.
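The last point is easy to verify with Python's gzip module, which reads all members of a multi-member stream as one (the payloads below are my own sample data):

```python
# Concatenated gzip members decompress transparently as a single stream.
import gzip

part_a = gzip.compress(b"first pristine ")
part_b = gzip.compress(b"second pristine")

# "First compressed, then concatenated" container:
container = part_a + part_b
assert gzip.decompress(container) == b"first pristine second pristine"
```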
The disadvantage is clearly that we need a few more bytes when storing that metadata in the SQLite database instead of in our own file. But in my eyes, these few bytes do not outweigh the overhead of inventing our own metadata storage format, including correct synchronization, transaction safety, etc., which SQLite already provides reliably.
Best regards
Markus Schaber
[1] Plus the additional columns like ref_count, checksum, etc., which are needed by svn but are not of interest for this discussion.
--
___________________________
We software Automation.
3S-Smart Software Solutions GmbH
Markus Schaber | Developer
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-0 | Fax +49-831-54031-50
Email: m.schaber@3s-software.com | Web: http://www.3s-software.com
CoDeSys internet forum: http://forum.3s-software.com
Download CoDeSys sample projects: http://www.3s-software.com/index.shtml?sample_projects
Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915

Received on 2012-03-23 17:50:30 CET
This is an archived mail posted to the Subversion Dev mailing list.