
Re: Compressed Pristines (Summary)

From: Ashod Nakashian <ashodnakashian_at_yahoo.com>
Date: Tue, 10 Apr 2012 00:57:55 -0700 (PDT)

Hi Justin,

Sorry for the late reply, and thanks for your notes. I should say right off the bat that the design doc is outdated in terms of what we plan to do as a first implementation, although as a proposal for a packed file format I think it's still mostly valid, apart from a few notes and improvements (such as 64-bit file sizes) that are missing or invalid (regardless of whether we ultimately implement it).

Please see my notes and comments inline.

>________________________________
> From: Justin Erenkrantz <justin_at_erenkrantz.com>
>To: Ashod Nakashian <ashodnakashian_at_yahoo.com>
>Cc: Greg Stein <gstein_at_gmail.com>; "dev_at_subversion.apache.org" <dev_at_subversion.apache.org>
>Sent: Friday, April 6, 2012 10:19 AM
>Subject: Re: Compressed Pristines (Summary)
>
>On Wed, Apr 4, 2012 at 1:28 PM, Ashod Nakashian
><ashodnakashian_at_yahoo.com> wrote:
>> I feel this is indeed what we're closing in on, at least for an initial working demo. But I'd like to hear more agreements before committing to this path. I know some did show support for this approach, but it's hard to track them in the noise.
>>
>> So to make it easier, let's either voice support for this suggestion and commit to an implementation, or voice objection with at least reasons and possibly an alternative action. Silence is passive agreement, so the onus is on those opposing ;-)
>
>I just read the Google doc - glad to see progress here - a few comments:
>
>First off, if I understand correctly, I do have to say that I'm not at
>all a fan of having a large pristine file spread out across multiple
>on-disk compressed pack files.  I don't think that makes a whole lot
>of sense - I think it'd be simplest (when we hit the heuristic to put
>it on-disk rather than in SQLite) to keep it to just one file.  I
>don't get why we'd want to have a big image pristine file (say a PSD
>file) split out into say 20 smaller files on disk.  Why?  It just
>seems we're going to introduce a lot of complexity for very little
>return.

The straightforward design is to have a single large pack file, but in practice this is very problematic. You can still find FSes that barf on multi-GB files, but even setting that aside, consider the overhead of removing a pristine file and shifting all the bytes that follow it: it's extreme. To avoid it, we'd need to track holes in the file (and eat the wasted space on disk) and, even worse, do heavy lifting to fit new/modified pristines into holes where they might not fit! In other words, we'd have to write a complex FS inside a single file, keep its size on disk small (to justify this feature!), and do housekeeping as fast as possible (shifting GBs on disk because there's a largish hole at the beginning of the file has a real cost).

My solution is to split the pack files such that each one is small enough to fit in memory and be written to disk in sub-second time. This way, 1) holes in these files can be eliminated completely and swiftly by rewriting the whole pack, and 2) even if we do keep holes, they can never grow too large.
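
To make the housekeeping concrete, here's a rough sketch of what removing an entry from such a small pack would look like (plain C; pack_entry_t and pack_remove_entry are illustrative names I made up, not actual WC code):

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical in-memory view of one pack entry. */
    typedef struct {
      const char *checksum;   /* key of the pristine           */
      const char *data;       /* compressed bytes              */
      size_t len;             /* length of the compressed data */
    } pack_entry_t;

    /* Rewrite the pack at PATH, keeping every entry except the one
       whose checksum matches DEAD.  Because a pack is capped at a few
       MBs, this is a cheap, hole-free compaction rather than a
       byte-shifting exercise on a multi-GB file. */
    static int
    pack_remove_entry(const char *path,
                      const pack_entry_t *entries, size_t n_entries,
                      const char *dead)
    {
      /* Real code would write a temp file and rename() it into place
         for atomicity; elided here for brevity. */
      FILE *out = fopen(path, "wb");
      size_t i;

      if (!out)
        return -1;

      for (i = 0; i < n_entries; i++)
        if (strcmp(entries[i].checksum, dead) != 0)
          fwrite(entries[i].data, 1, entries[i].len, out);

      return fclose(out);
    }

The worst case is a full rewrite of a few MBs, not shifting GBs.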

>  The whole point of stashing the small files directly into
>SQLite's pristine.db is to make the small files SQLite's problem and
>not the on-disk FS (and reduce sub-block issues) - with that in place,
>I think we're not going to need to throw multiple files into the same
>pack file.  It'll just get too confusing, IMO, to keep track of which
>file offsets to use.  (For a large file that already hits the size
>trigger, we know that - worst case scenario - we might lose one FS
>block.  Yawn.)  We can make the whole strategy simpler if we follow
>that.

I'm a bit confused here. You seem to be assuming that we'll use SQLite for small files and the FS for larger ones. However, that's not in the proposal; it's what we've agreed on on this list. We aren't going to implement both, at least not for now. What we're going to do is simply push small pristines into pristine.db and in-place compress the larger ones on disk (as a first implementation we'll probably even leave the names the same and change nothing other than passing the disk I/O through compressed streams). Beyond that, we'll probably experiment with packing, but it's a bit soon to worry about that. Any research or help is more than welcome, though!
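
Just to illustrate what "passing the disk I/O through compressed streams" could look like, here's a sketch (assuming svn_stream_compressed() from svn_io.h is the wrapper we'd use; write_compressed_pristine is my illustrative name, not actual WC code):

    #include "svn_error.h"
    #include "svn_io.h"

    /* Sketch: install a pristine by writing through a zlib-backed
       stream.  The on-disk name stays the same; only the bytes on
       disk change. */
    static svn_error_t *
    write_compressed_pristine(const char *pristine_abspath,
                              svn_stream_t *contents,
                              apr_pool_t *scratch_pool)
    {
      svn_stream_t *disk;

      /* Open the pristine file for writing, as we do today. */
      SVN_ERR(svn_stream_open_writable(&disk, pristine_abspath,
                                       scratch_pool, scratch_pool));

      /* Wrap it so everything written is deflated on the way down. */
      disk = svn_stream_compressed(disk, scratch_pool);

      /* Copy the content through; svn_stream_copy3() also closes
         both streams. */
      return svn_error_trace(svn_stream_copy3(contents, disk,
                                              NULL, NULL,
                                              scratch_pool));
    }

Reading back is symmetric: wrap the readable stream with the same call and inflate transparently.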

>
>I'm with Greg in thinking that we don't need the pack index files -
>but, I think I'll go further and reiterate that I think that there
>should just be a 1:1 correspondence between the pack file and the
>pristine file.  What's the real advantage of having multiple large
>pristines in one pack file (and that we constantly *append* to)?  And,
>with append FS ops with multiple files in one pack file, we rely on
>our WC/FS locking strategy to be 100% perfect or we have a hosed pack
>file.  Ugh.  I think it just adds unnecessary complexity.  I think
>it'll be far simpler to have 1:1 correspondence with a pack file to a
>single large pristine.  We'll have enough complexity already to just
>find the small files sitting in SQLite rather than on-disk.

It's all relative! Saying "multiple large pristines in one pack file" assumes too much. I find it better to first define/compute a pack file size that satisfies our requirements (my crude math puts it on the order of a few MBs - see the proposal doc); the largest pristine that can share a pack file with another then follows automatically. Anything smaller can share a pack file with others and hence (by our definition!) isn't "too large". Larger ones are "really large" (again by our definition) and will be compressed alone on disk (whether there will be a pack header or not is hopefully not up for debate just yet) - in practice, these files end up in-place compressed as a result.
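
In code terms the whole routing decision falls out of that one constant; roughly (all numbers and names here are illustrative - see the doc for the actual math):

    #include <apr.h>

    #define PACK_FILE_CAP  (4 * 1024 * 1024)  /* "a few MBs"          */
    #define SQLITE_CUTOFF  (16 * 1024)        /* small -> pristine.db */

    typedef enum { STORE_SQLITE, STORE_SHARED_PACK, STORE_ALONE } store_t;

    /* Where does a pristine of COMPRESSED_SIZE bytes live? */
    static store_t
    classify_pristine(apr_off_t compressed_size)
    {
      if (compressed_size < SQLITE_CUTOFF)
        return STORE_SQLITE;       /* SQLite's problem, not the FS's  */
      if (compressed_size < PACK_FILE_CAP)
        return STORE_SHARED_PACK;  /* not "too large": can share      */
      return STORE_ALONE;          /* "really large": alone, in-place */
    }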

As for the assumption that we only/constantly append to a pack file, that's unfounded. Files may be removed from the pristine store upon svn up. Even when they aren't, a file modification is reasonably implemented as remove+add: the file's size might change, so we need the same housekeeping as removing one file and adding an unrelated one. Granted, there is room for improvement here, but knowing that it's the same pristine file modified slightly doesn't give us much information of practical use.

>
>Given that 1:1 correspondence, I do think that the original file-size
>and the complete checksum should be stored in the custom pack file
>on-disk.  It'll make it so that we could easily validate whether the
>pack file is corrupt or not by using file-size (as a first-order
>check) and checksum (as second-order).  The thinking here is that if
>the checksum is not in the file contents, but only in the file name
>(or the pristine.db), the file system could very easily lose the
>filename (hello ext3/4!) - this would allow us to verify the integrity
>of the pack file and reassociate it if it gets dis-associated.  This
>is less of an issue with the client as it can always refetch - but, if
>the server code ends up using the same on-disk format (as hinted in
>the Google Doc)...then, I think this is important to have in the file
>format from the beginning.
>
>I definitely think that we should store the full 64-bit length
>svn_filesize_t and not be cute and assume no one has a file larger
>than 1TB.

All welcome notes. We'll get back to these issues when we have a working version we can play and experiment with. There are certainly many things to worry about, and perhaps even more tempting points to toy with. To be pragmatic (and productive) I want to focus on getting the simplest working implementation that can justify this feature (i.e. one that produces real disk savings without too much complexity or performance loss). But points taken.
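
For the record, the header you describe is cheap to carry. Something along these lines (a hypothetical layout, not a committed format):

    #include <apr.h>

    /* Hypothetical per-pristine header in the pack file.  The full
       64-bit size gives a cheap first-order integrity check; the
       embedded checksum gives a second-order check and lets us
       re-associate the file if the FS loses its name. */
    typedef struct pack_header_t
    {
      apr_uint64_t original_size;  /* uncompressed length, 64 bits  */
      unsigned char sha1[20];      /* checksum of uncompressed data */
    } pack_header_t;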

>
>I'll disagree with Greg for now and say that it's probably best to
>just pack everything and not try to be cute about not packing certain
>file types - I think that's a premature optimization right now.  I
>think the complexity of having a mixed pristine collection with some
>files packed and some files unpacked is odd (and some files in SQLite
>and some files on-disk).  Maybe end up adding a no-op compression type
>to the file format (IOW, tell gzip to do a no-op inflate via
>Z_NO_COMPRESSION).  Maybe.  I just doubt it's worth the additional
>complexity though.  ("Is this pristine file compressed?"  "I don't
>know."  Argh!)  Making those assumptions based on file extensions or
>even magic bits can be a bit awkward - case in point is PDF...some
>times it'll compress well, some times it won't.  So, best off just
>always compressing it.  =)

Agreed. I think it's reasonable to attempt a no-compression type, but it should be abstracted away in the compression layer, not higher up. I also agree it's a premature optimization, so we should tackle it only once we have a working stack.
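
For illustration, with plain zlib the no-op type costs almost nothing to support, because Z_NO_COMPRESSION still emits a well-formed zlib stream (stored blocks), so the read side always inflates and never has to ask "is this one compressed?" (sketch only; compress_pristine is an illustrative name):

    #include <zlib.h>

    /* Deflate LEN bytes of BUF into OUT.  Level 0 produces stored
       blocks but still a valid zlib stream, so callers above this
       layer never see the difference. */
    static int
    compress_pristine(const unsigned char *buf, uLong len,
                      unsigned char *out, uLongf *out_len,
                      int incompressible)
    {
      int level = incompressible ? Z_NO_COMPRESSION
                                 : Z_DEFAULT_COMPRESSION;
      return compress2(out, out_len, buf, len, level);
    }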

-Ash

>
>My $.02.  -- justin
>