Re: Compressed Pristines (Summary)
From: Markus Schaber <m.schaber_at_3s-software.com>
Date: Mon, 2 Apr 2012 09:30:38 +0000
First, thanks for your great summary. I'll throw in just my 2 cents below.
> From: Ashod Nakashian [mailto:ashodnakashian_at_yahoo.com]
Were any of those tests actually executed on a file system that supports something like "block suballocation", "tail merging" or "tail packing"?
Today I was rather surprised to find that the pristine subdirectory of one of our main projects, which contains 726 MB of data, occupies 759 MB on disk. That is an overhead of about 4.5% due to block-size rounding (according to the Explorer "Properties" dialog of Windows 7 on an NTFS file system).
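For anyone who wants to repeat that measurement without Explorer, here is a minimal Python sketch that compares the logical size of all files below a directory with an estimate of their on-disk size. It assumes a fixed cluster size (4096 bytes is only a guess, check your volume) and it ignores any suballocation or resident-file tricks the file system may apply, so it gives an upper bound rather than the real figure. The function names are mine, not anything from Subversion.

```python
import os


def allocated_size(logical_size, cluster_size=4096):
    """Round a file size up to a whole number of clusters."""
    clusters = -(-logical_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def rounding_overhead(sizes, cluster_size=4096):
    """Return (logical_total, allocated_total, overhead_fraction)."""
    sizes = list(sizes)  # allow generators, we iterate twice
    logical = sum(sizes)
    allocated = sum(allocated_size(s, cluster_size) for s in sizes)
    overhead = (allocated - logical) / logical if logical else 0.0
    return logical, allocated, overhead


def file_sizes(root):
    """Yield the size in bytes of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.getsize(os.path.join(dirpath, name))
```

Something like `rounding_overhead(file_sizes(".svn/pristine"))` would then give the totals for one working copy (the path is just an example).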
AFAICS, "modern" file systems increasingly support that kind of feature, so we should at least think about how much effort we want to put into the "packing" part of the problem if it is likely to vanish (or at least be drastically reduced) in the future. My concern is that storing small pristines in their own SQLite database will also bring some overhead of a similar magnitude, due to SQLite metadata, the necessary primary key column, and indexing.
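The SQLite side of that comparison can be measured rather than guessed. The sketch below stores each blob in its own row of a throwaway database file and reports the payload size next to the resulting file size. The table name, the two-column schema and the fake 40-character keys are assumptions for illustration only, not the schema anyone has proposed in this thread.

```python
import os
import sqlite3
import tempfile


def sqlite_overhead(blobs):
    """Store one blob per row and return (payload_bytes, db_file_bytes)."""
    blobs = list(blobs)
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # Hypothetical schema: a text key (standing in for a checksum)
        # plus the file content. The primary key creates an index.
        conn.execute(
            "CREATE TABLE pristine ("
            "checksum TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO pristine VALUES (?, ?)",
            ((format(i, "040x"), blob) for i, blob in enumerate(blobs)),
        )
        conn.commit()
        conn.close()
        return sum(len(b) for b in blobs), os.path.getsize(path)
    finally:
        os.remove(path)
```

Feeding it the contents of the small files from a real pristine store, and comparing the result with the block-rounding overhead of the same files, would show which of the two costs is actually larger on a given setup.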
Additionally, the simple and efficient way of storing pristines in a SQLite database (one blob per file) prevents us from exploiting inter-file redundancies during compression. Adding a packing layer on top of SQLite, on the other hand, leads to both high complexity and a large average blob size, and large blobs are probably handled more efficiently by the FS directly.
To cut it short: I'll "take" whatever solution emerges, but my gut feeling tells me that we should use plain files as containers, instead of using SQLite.
The other aspects (grouping similar files into the same container before compression, applying a size limit for containers, and storing uncompressible files in uncompressed containers) are fine as discussed.
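To make those three aspects concrete, here is a rough Python sketch using zlib. It groups files by extension as a stand-in for "similar" (which assumes the original file names are available, something a checksum-named pristine store does not give you for free), closes a container once adding a file would exceed a size limit, and keeps a container uncompressed when compression gains less than a threshold. The limit, the threshold and the function names are all placeholders, and a real container would also need an index of member offsets, which is left out here.

```python
import os
import zlib


def pack_containers(files, limit=4 * 1024 * 1024):
    """Group (name, data) pairs into containers of at most `limit` raw bytes.

    Files are sorted by extension so that similar files end up adjacent.
    A single file larger than `limit` gets a container of its own.
    """
    ordered = sorted(files, key=lambda f: (os.path.splitext(f[0])[1], f[0]))
    containers, current, size = [], [], 0
    for name, data in ordered:
        if current and size + len(data) > limit:
            containers.append(current)
            current, size = [], 0
        current.append((name, data))
        size += len(data)
    if current:
        containers.append(current)
    return containers


def encode_container(members, min_gain=0.05):
    """Return (is_compressed, payload) for one container.

    The members are concatenated and compressed as one stream, which is
    what lets zlib exploit redundancy between files. If the result is not
    at least `min_gain` smaller, the raw bytes are stored instead.
    """
    raw = b"".join(data for _name, data in members)
    packed = zlib.compress(raw, 9)
    if len(packed) <= len(raw) * (1 - min_gain):
        return True, packed
    return False, raw
```

The plain-file variant would simply write each payload to its own file on disk. The SQLite variant would store the same payload as one blob per container, which is where the large average blob size comes from.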
I'll try to run some statistics using publicly available projects on an NTFS file system, just for comparison.
http://msdn.microsoft.com/en-us/library/windows/desktop/ee681827%28v=vs.85%29.aspx claims tail packing support for NTFS.
http://en.wikipedia.org/wiki/Block_suballocation claims support for BtrFS, ReiserFS, Reiser4, FreeBSD UFS2.
And AFAIR, XFS has a similar feature. Sadly, Ext[2,3,4] are not on that list yet, but rumor has it that Ext4 is to be replaced by BtrFS in the long run.
--
We software Automation.
3S-Smart Software Solutions GmbH
Markus Schaber | Developer
Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax +49-831-54031-50
Email: firstname.lastname@example.org | Web: http://www.3s-software.com
CoDeSys internet forum: http://forum.3s-software.com
Download CoDeSys sample projects: http://www.3s-software.com/index.shtml?sample_projects
Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915

Received on 2012-04-02 11:31:18 CEST
This is an archived mail posted to the Subversion Dev mailing list.