Fwd: Subversion Compressed Pristines Design

From: Greg Stein <gstein_at_gmail.com>
Date: Thu, 22 Mar 2012 03:17:12 -0400

I'm not going to try and reply in-line on an HTML email :-P ... but let me
pull the comments up here to address:

* svn_filesize_t: Subversion has been designed for 64-bit file sizes. The
pristine store should be able to store files of that size. I see no reason
to argue against a concept that has been part of its core design. You state
"improve performance", but unless/until you can demonstrate that storing 8
bytes impacts performance, then I'll call it "premature optimization". I
have a hard time believing that reading 8 vs 5 bytes from a file has any
noticeable impact on performance. If you *really* believe it will impact
performance, then store 2 bytes for the length. Reserve the high bit,
giving you 32k length for size. If the high bit is set, then read one more
byte divided as 3/5 bits. The 3 bits tells you how many more bytes to
incorporate into the size, and the 5 bits are more size content. That will
give you 2 bytes in the case where most files are <32k, and 3 bytes for
most others (20 bits == 1 meg). And files up to svn_filesize_t are possible
(since you can max out 76 bits). Oh, and we don't need 96-bit sizes; again:
svn was designed around 64-bit; we cannot expand to 96, so I dunno where
that comment came from.
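
A minimal sketch of the variable-length size encoding described above, in
Python for brevity. The message does not fix the byte order or bit
placement, so the layout below is an assumption: a big-endian 16-bit word
holding the flag and the low 15 bits, a third byte with the count in its top
3 bits and size bits 15..19 in its low 5 bits, and the remaining bits in
extra bytes, least significant first. The names encode_size and decode_size
are hypothetical.

```python
from typing import Tuple

# svn_filesize_t is a signed 64-bit value, so this is the largest size.
MAX_SIZE = 2**63 - 1


def encode_size(size: int) -> bytes:
    """Encode a file size in 2 bytes, or 3 to 10 bytes when the flag is set."""
    if size < 0 or size > MAX_SIZE:
        raise ValueError("size out of range for a signed 64-bit value")
    low15 = size & 0x7FFF          # bits 0..14 live in the first two bytes
    rest = size >> 15
    if rest == 0:
        return low15.to_bytes(2, "big")      # high bit clear: 2 bytes total
    mid5 = rest & 0x1F             # bits 15..19 live in the third byte
    high = rest >> 5               # bits 20 and up go into the extra bytes
    n_extra = (high.bit_length() + 7) // 8   # 0..7, fits in 3 bits
    head = (0x8000 | low15).to_bytes(2, "big")
    third = bytes([(n_extra << 5) | mid5])
    return head + third + high.to_bytes(n_extra, "little")


def decode_size(buf: bytes) -> Tuple[int, int]:
    """Return (size, number of bytes consumed) from the start of buf."""
    if len(buf) < 2:
        raise ValueError("need at least 2 bytes")
    word = int.from_bytes(buf[:2], "big")
    size = word & 0x7FFF
    if not word & 0x8000:
        return size, 2
    if len(buf) < 3:
        raise ValueError("flag set but third byte missing")
    third = buf[2]
    n_extra = third >> 5
    if len(buf) < 3 + n_extra:
        raise ValueError("truncated size field")
    size |= (third & 0x1F) << 15
    size |= int.from_bytes(buf[3:3 + n_extra], "little") << 20
    return size, 3 + n_extra
```

Under this layout, sizes below 32k take 2 bytes, sizes below 1 meg take 3,
and the largest svn_filesize_t value takes 9.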

* wc.db: by the time we try to read the pristine store (PS), we already have
wc.db open and have read a row out of PRISTINE. There is near-zero impact to
read more data for
that pristine out of wc.db. I doubt that removing pristine information from
wc.db is possible, since we need the cross-referencing with other WC state.
Thus, we'll always have the database read, and can always fetch everything
needed for the PS data (without the need for index files).
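
To illustrate the point, here is a rough sketch of keeping pack location
data in the same row that is already being read, rather than in a separate
index file. It uses Python's sqlite3 module with an in-memory database. The
table is a simplified stand-in for PRISTINE, not the real wc.db schema, and
the pack_file, pack_offset and packed_size columns, as well as the function
names, are hypothetical.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified pristine table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE pristine (
            checksum    TEXT NOT NULL PRIMARY KEY,
            size        INTEGER NOT NULL,
            -- hypothetical columns replacing a separate pack index file
            pack_file   TEXT,
            pack_offset INTEGER,
            packed_size INTEGER
        )
        """
    )
    return db


def record_pristine(db, checksum, size, pack_file, pack_offset, packed_size):
    """Store where a compressed pristine lives inside a pack file."""
    db.execute(
        "INSERT INTO pristine VALUES (?, ?, ?, ?, ?)",
        (checksum, size, pack_file, pack_offset, packed_size),
    )


def locate_pristine(db, checksum):
    """One row read yields both the usual metadata and the pack location."""
    return db.execute(
        "SELECT pack_file, pack_offset, packed_size, size"
        " FROM pristine WHERE checksum = ?",
        (checksum,),
    ).fetchone()
```

A single SELECT on the checksum returns everything needed to seek into the
pack file; a lookup for an unknown checksum returns None.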


---------- Forwarded message ----------
From: ashodnakashian (Google Docs) <
Date: Thu, Mar 22, 2012 at 02:12
Subject: Subversion Compressed Pristines Design
To: gstein_at_gmail.com

ashodnakashian added comments to Subversion Compressed Pristines
*Greg Stein*

svn_filesize_t is a signed 64-bit value. 5 bytes is insufficient.
*ashodnakashian*

This is something I debated for a while (with myself). At first I used the
full 64-bits, but I found it excessive. Really, a single file *that* large?
Notice that this is the size of a *single* pristine file. 5 bytes is good
for 1TB. We can easily make this 6 bytes (good for 256TB files) but I'd
like to conserve space to improve performance. Let's discuss this further.

Note: We can always have version 2 of this format that supports files of
96-bit length in the future, nothing prevents that.
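
The capacities quoted in this comment can be checked with a few lines of
Python. This is only an arithmetic sketch, assuming an unsigned fixed-width
length field and binary units (1 TB taken as 2**40 bytes); the function name
is hypothetical.

```python
def capacity_in_tib(n_bytes: int) -> int:
    """Number of distinct sizes an unsigned n-byte field holds, in TiB."""
    return 2 ** (8 * n_bytes) // 2**40


five_byte_field = capacity_in_tib(5)   # 2**40 bytes
six_byte_field = capacity_in_tib(6)    # 2**48 bytes
# svn_filesize_t is signed, so only 63 of its 64 bits carry magnitude.
signed_64_bit = 2**63 // 2**40
```

With these definitions a 5-byte field covers 1 TiB, a 6-byte field covers
256 TiB, and a signed 64-bit size covers 8 EiB (8388608 TiB).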
*Greg Stein*
Pack Index Files

I see no need for the index files when we have the wc.db sqlite database
handy. That database has everything about the pristines right now, so I'd
recommend just extending the columns in the pristine table, as appropriate.
*Greg Stein*

Note: it may be possible to use this stuff on the server side. Using a
separate database or pack index might make that easier. I don't recall the
server requirements around sqlite databases (think: concurrency
*ashodnakashian*

Regarding your first point, wc.db may be used. It'll be significantly
slower, but we can certainly use it. Let's discuss this further.

As for the possibility of using this on the server, YES! Absolutely.
Especially since the structure of the files can support multiple revisions
and a layered approach. However, since that isn't fully designed, and, more
importantly, to limit the scope of this feature and avoid feature creep, I
purposefully left that out. But I'll add a note for posterity.
*Greg Stein*
Candidate Entropy-Compression Libraries

As mentioned above, please add Snappy.
*ashodnakashian*

*Marked as resolved*
*Greg Stein*

We keep those values in the database. No need for PackFS to worry about

See 'translated_size' and 'last_mod_time' columns.
*ashodnakashian*

Moving the index file into the database is certainly worth discussing
further. Let's take it on the dev-list.

Received on 2012-03-22 08:17:49 CET

This is an archived mail posted to the Subversion Dev mailing list.