
RE: Compressed Pristines (Call for Vote)

From: Bert Huijben <bert_at_qqmail.nl>
Date: Tue, 27 Mar 2012 12:14:24 +0200

> -----Original Message-----
> From: Branko Čibej [mailto:brane_at_xbc.nu] On Behalf Of Branko Cibej
> Sent: Tuesday, 27 March 2012 2:56
> To: Subversion Development
> Subject: Re: Compressed Pristines (Call for Vote)
>
> [This should really be on the dev@ list; I mis-addressed my response.]
>
> On 26.03.2012 17:48, Ashod Nakashian wrote:
> >>
> >> I recommended a two-stage approach to the implementation because
> >> (packing and) compressing the pristines is only half the story, there
> >> are a number of interactions with the rest of the system that need to
> >> be ironed out, and it makes no sense to take on the most complex part
> >> of the storage format first.
> > I see. This isn't what I gathered previously.
>
> I don't /know/ what interactions there are, but it would certainly help
> to find out (by running the test suite with the simplest possible
> compressed-pristine implementation, for example).
>
> >> As others have pointed out, the only case where compression and
> >> packing may cause trouble is working copies with a lot of binary,
> >> compressed files (.jars, various image formats, etc.). That's a problem
> >> that needs solving regardless of the compression algorithm or pack
> >> format.
> > This can be handled with a not-compressed flag and storing the data as-is.
>
> You're jumping a couple of steps here, the first two obviously being (a)
> identify which file formats should not be compressed, and (b) figure out
> how to detect such files reasonably reliably. Using svn:mime-type as the
> determinator is not the answer, because I can set that to anything I
> like regardless of actual file contents. My advice here is to
> incorporate the flag in the wc.db, but not do the actual detection and
> special handling in the first release.
>
> > My only issue with the database-as-storage is that it won't work for
> > large files and if we store large files directly on disk, then we'll
> > have to split the storage, which isn't really great.
>
> It is, in fact, really great. Filesystems are typically much better at
> storing large files than any contrived packed-file format. Databases are
> typically much better at storing small files than most filesystems. To
> get the most benefit for the least effort, use both.
>
> > Consider a file just around the cut-off for storing in a database that
> > gets moved between the database and disk between modifications that
> > results in the file crossing the cut-off.
>
> Then don't make the cutoff a hard limit. For example, if a file is
> already on disk, don't store it in the database until it shrinks to
> below 75% of the limit. Conversely, don't push it out to the filesystem
> until it grows to perhaps even 200% of the limit.
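
For reference, the soft cutoff quoted above amounts to hysteresis around the limit. A rough sketch, assuming a single size limit in bytes; the function name, the constants and the plain cutoff for files that are not stored yet are illustrative assumptions and are not taken from any Subversion code:

```python
# Hypothetical labels for the two storage back ends discussed in the thread.
DB = "database"
FS = "filesystem"


def choose_store(size, current, limit, shrink_factor=0.75, grow_factor=2.0):
    """Decide where a blob of `size` bytes should live.

    `current` is where the blob lives now (DB, FS, or None if it is new).
    The 75% and 200% factors are the figures suggested in the quoted mail;
    everything else here is an assumption made for illustration.
    """
    if current == FS:
        # Already on disk: only move into the database once it has
        # shrunk clearly below the limit.
        return DB if size < shrink_factor * limit else FS
    if current == DB:
        # Already in the database: only push out to disk once it has
        # grown well past the limit.
        return FS if size > grow_factor * limit else DB
    # Not stored yet: fall back to a plain cutoff (an assumption).
    return DB if size <= limit else FS
```

With a limit of 1000 bytes, a 900-byte file that is already on disk stays on disk, and a 1500-byte file that is already in the database stays in the database, so small changes around the limit do not cause it to bounce between the two.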

A file that is only keyed by its SHA-1 will never grow or shrink. I don't think we have to discuss this part right now.

We moved away from using a pristine file per working copy file in 1.7 and I don't think we want to move back to that system.

When a file changes, it becomes a new file with a new SHA-1. The alternative would be to open up a completely new design in which we also store just the changes for certain files.
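
To illustrate why a pristine keyed only by its SHA-1 never grows or shrinks, here is a minimal sketch of a content-addressed store. It is not the actual Subversion pristine store code: `PristineStore` and its methods are made-up names, and an in-memory dict stands in for whatever backs the real store.

```python
import hashlib


class PristineStore:
    """Toy content-addressed store: the key is the SHA-1 of the content."""

    def __init__(self):
        self._blobs = {}  # hex SHA-1 -> bytes

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = PristineStore()
old_key = store.put(b"hello\n")
new_key = store.put(b"hello world\n")  # the "modified" file is a new entry
```

Here `old_key` and `new_key` differ, and the blob stored under `old_key` is untouched: a modification adds a new entry and never resizes an existing one.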

        Bert
Received on 2012-03-27 12:15:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.
