
Re: Fwd: Subversion Compressed Pristines Design

From: Ashod Nakashian <ashodnakashian_at_yahoo.com>
Date: Thu, 22 Mar 2012 01:44:42 -0700 (PDT)

I've combined issues from separate comments and removed the comments from Google Docs to make things concise. Converted to plain text for convenience.

>________________________________
> From: Greg Stein <gstein_at_gmail.com>
>To: dev_at_subversion.apache.org
>Sent: Thursday, March 22, 2012 12:25 AM
>Subject: Fwd: Subversion Compressed Pristines Design
>
>
>Again, pulling commentary up out of the HTML:
>
>* svn_filesize_t: Subversion has been designed for 64-bit file sizes. The pristine store should be able to store files of that size. I see no reason to argue against a concept that has been part of its core design. You state "improve performance", but unless/until you can demonstrate that storing 8 bytes impacts performance, then I'll call it "premature optimization". I have a hard time believing that reading 8 vs 5 bytes from a file has any noticeable impact on performance. If you *really* believe it will impact performance, then store 2 bytes for the length. Reserve the high bit, giving you 32k length for size. If the high bit is set, then read one more byte divided as 3/5 bits. The 3 bits tells you how many more bytes to incorporate into the size, and the 5 bits are more size content. That will give you 2 bytes in the case where most files are <32k, and 3 bytes for most others (20 bits == 1 meg). And files up to svn_filesize_t are possible (since you can max out 76 bits). Oh, and we don't need 96-bit sizes; again: svn was designed around 64-bit; we cannot expand to 96, so I dunno where that comment came from.
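
A minimal sketch of the variable-length size encoding described above, for illustration only. The quoted text does not fix the byte order or which bits of the size go where, so the layout here is an assumption: the low 15 bits go in the first two bytes (big-endian, high bit as the continuation flag), the next 5 bits go in the low part of the third byte, and the remaining bits follow as 0 to 7 big-endian bytes. The names `encode_size` and `decode_size` are hypothetical.

```python
MAX_BITS = 15 + 5 + 7 * 8  # 76 bits, matching the figure in the text


def encode_size(n):
    """Encode a non-negative size using the assumed 2/3/3+k byte layout."""
    if n < 0 or n >= 1 << MAX_BITS:
        raise ValueError("size out of range")
    if n < 1 << 15:
        # Common case: the size fits in 15 bits, high bit stays clear.
        return n.to_bytes(2, "big")
    head = 0x8000 | (n & 0x7FFF)           # flag bit + low 15 bits
    mid5 = (n >> 15) & 0x1F                # next 5 bits of the size
    rest = n >> 20                         # whatever is left above 20 bits
    extra = (rest.bit_length() + 7) // 8   # 0..7 additional bytes
    third = (extra << 5) | mid5            # 3-bit count, 5-bit payload
    return head.to_bytes(2, "big") + bytes([third]) + rest.to_bytes(extra, "big")


def decode_size(buf):
    """Return (size, bytes_consumed) for a buffer produced by encode_size."""
    head = int.from_bytes(buf[:2], "big")
    n = head & 0x7FFF
    if not head & 0x8000:
        return n, 2
    third = buf[2]
    extra = third >> 5
    n |= (third & 0x1F) << 15
    n |= int.from_bytes(buf[3:3 + extra], "big") << 20
    return n, 3 + extra


if __name__ == "__main__":
    for size in (1000, 500_000, 10 * 1024 ** 3):
        encoded = encode_size(size)
        print(size, len(encoded), decode_size(encoded)[0] == size)
```

Under this layout a reader must inspect the first two bytes before it knows how long the field is, which is the parsing cost that a fixed-size field avoids.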

Let me clarify. There is a trade-off between fixed-size and variable-size structures. The former is easier/faster to read, because you don't need to do any parsing. The latter is potentially more compact on disk, but requires parsing and/or lookup tables (git uses var-size and lookup tables). My original design was 40 bytes / entry due to file-size being 64 bits and pack ID being 32 bits. The reason I say a shorter size is faster is that I'm thinking in terms of reading all (or a large part) of such an index file. 5 bytes x 1M files = 5MB of data. See below for why we might need to read the full index.

Having said that, let's suspend this path and consider your next point, which looks more promising. I like the idea of having the index entries in sqlite, *provided it's not significantly slower*.

>
>
>* wc.db: by the time we try to read PS, we already have wc.db open and have read a row out of PRISTINE. There is near-zero impact to read more data for that pristine out of wc.db. I doubt that removing pristine information from wc.db is possible, since we need the cross-referencing with other WC state. Thus, we'll always have the database read, and can always fetch everything needed for the PS data (without the need for index files).

It's true that we've already read the row, and getting the relevant information in that read is probably faster than reading from a separate file on disk. I'd completely agree to moving the index file into wc.db if we avoid any operations that span *all* entries. You see, if for some operation we need a map of all pack store files and their contents (for fast lookup, or to find the best-fit pack store for a new/modified file), then we need one of 3 methods:

1) Read all pristine rows from wc.db and construct the necessary lookup table(s).
2) Store separate table(s) in wc.db with the necessary information (but again we need to construct lookup tables).
3) Store this info in a separate file (the index file) such that reading it restores these lookup tables (yes, the entries, if fixed-size, can be sorted and kept sorted at all times, so lookup becomes trivial).
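
To make method 1 concrete, here is a rough sketch of rebuilding a per-pack lookup table by scanning every row. The table `pristine_demo` and its `pack_id` column are hypothetical stand-ins invented for this example; they are not the actual wc.db schema.

```python
import sqlite3
from collections import defaultdict


def load_pack_map(conn):
    """Scan all rows and group (checksum, size) pairs by pack ID."""
    packs = defaultdict(list)
    query = "SELECT checksum, size, pack_id FROM pristine_demo ORDER BY checksum"
    for checksum, size, pack_id in conn.execute(query):
        packs[pack_id].append((checksum, size))
    return dict(packs)


# Hypothetical table, created in memory purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pristine_demo ("
    " checksum TEXT PRIMARY KEY,"
    " size INTEGER NOT NULL,"
    " pack_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO pristine_demo VALUES (?, ?, ?)",
    [("aaa", 100, 0), ("bbb", 2000, 0), ("ccc", 50, 1)],
)

pack_map = load_pack_map(conn)
# Bytes stored per pack, the kind of summary a best-fit search would need.
pack_usage = {pid: sum(size for _, size in rows) for pid, rows in pack_map.items()}
```

The cost of this method is that the whole table is read and the map rebuilt whenever an operation needs the full picture.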

My approach was #3, and as such I made the entries fixed-size and small (for fast reads, small memory footprint, high cache-hit rate, etc.). We can make the file size a full 64 bits and still have better performance than with the first 2 approaches.
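
A minimal sketch of why sorted fixed-size entries make lookup trivial: entry number i always starts at byte i * entry_size, so a binary search can run directly on the raw index bytes with no parsing pass. The entry layout used here (20-byte SHA-1 key, 32-bit pack ID, 64-bit size) and the names `build_index` and `lookup` are assumptions for illustration, not the layout from the design doc.

```python
import hashlib
import struct

# Hypothetical entry: 20-byte key, 32-bit pack ID, 64-bit size (32 bytes, no padding).
ENTRY = struct.Struct(">20sIQ")


def build_index(entries):
    """Pack (key, pack_id, size) tuples into one buffer, sorted by key."""
    return b"".join(ENTRY.pack(k, p, s) for k, p, s in sorted(entries))


def lookup(index, key):
    """Binary-search the raw index bytes; return (pack_id, size) or None."""
    lo, hi = 0, len(index) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, pack_id, size = ENTRY.unpack_from(index, mid * ENTRY.size)
        if k == key:
            return pack_id, size
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None


if __name__ == "__main__":
    keys = [hashlib.sha1(bytes([i])).digest() for i in range(5)]
    index = build_index((k, i, i * 1000) for i, k in enumerate(keys))
    print(len(index), lookup(index, keys[2]))
```

With 32-byte entries as assumed here, 1M files would need about 32MB of index, so the entry size chosen directly sets how much data a full read has to pull in.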

>
>* I'm unclear on what you mean by "no limitations", as your design doc clearly states 16MB file size limits, and 64k of those files. Thus, 1TB of content. Now... elsewhere, you also state that a single (pack) file may exceed the normal limit, for purposes of keeping an entire pristine within a single file. These two design points are a bit contradictory.

Seems I must change the wording in the doc. The 16MB is the cutoff size at which to split a pack and start writing into another one. There is no reason why a pack file can't grow to TBs in size while still having 64k of them. But there must be a cutoff size, otherwise splitting will never occur. The algorithm will basically split the packs until it reaches a maximum number of splits, and at that point (where we've deemed more pack files to be counterproductive) the existing pack files will have to grow instead. It's just that I've tentatively chosen a 16MB cutoff and 64k splits as "good starting points" and added that benchmarking will ultimately decide these values (although I think 64k files is a nice limit to try to preserve, as higher numbers may require us to use a directory tree to avoid cluttering the pristine folder beyond hope - and yes, some file systems do perform rather poorly on directories that have a very large number of files).

Of course we can make the cutoff larger, but then working with a single pack store will be more time-consuming (due to compression, fragmentation, etc.), which will affect either speed or space. It's best to keep the pack files small and have a large number of them, then grow them as necessary, rather than having large pack files and splitting, say, at 2GB limits. That's because most WCs are actually less than 2GB and splitting wouldn't benefit the majority.
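
One possible reading of the split-then-grow policy, as a rough sketch. The best-fit rule, the fallback of growing the smallest pack, and the name `choose_pack` are assumptions made for illustration; the 16MB cutoff and 64k pack count are the tentative values mentioned above.

```python
CUTOFF = 16 * 1024 * 1024   # tentative split threshold (16MB)
MAX_PACKS = 64 * 1024       # tentative cap on the number of pack files


def choose_pack(pack_sizes, new_size, cutoff=CUTOFF, max_packs=MAX_PACKS):
    """Pick the pack for a new pristine and update pack_sizes in place.

    pack_sizes is a list of bytes used per pack. Returns the chosen index.
    """
    # Best fit: the fullest pack that still stays within the cutoff.
    best = None
    for i, used in enumerate(pack_sizes):
        if used + new_size <= cutoff and (best is None or used > pack_sizes[best]):
            best = i
    if best is None:
        if len(pack_sizes) < max_packs:
            # Still allowed to split: start a new pack, even if the file
            # alone exceeds the cutoff, so a pristine is never split up.
            pack_sizes.append(0)
            best = len(pack_sizes) - 1
        else:
            # Split limit reached: grow the smallest existing pack instead.
            best = min(range(len(pack_sizes)), key=pack_sizes.__getitem__)
    pack_sizes[best] += new_size
    return best
```

With these defaults the packs hold 16MB x 64k = 1TB before any of them has to grow past the cutoff, which is where the 1TB figure comes from; beyond that point total capacity is bounded only by how large individual packs may grow.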

>
>
>* In any case... 1TB is certainly not large enough for the pristine content store. I've personally witnessed 10GB working copies. I will bet others on this list have seen larger. 10GB is just two orders of magnitude less than 1TB. We need better future proofing. I'd like to see a pristine store that can accommodate 1PB, minimum. The current store has no limit, so if one is to be applied, then it better be very generous.

I like the 1PB minimum. But see above on the "limitation issue".

Cheers!

-Ash

>
>
>Cheers,
>-g
>
Received on 2012-03-22 09:45:21 CET

This is an archived mail posted to the Subversion Dev mailing list.