Re: Optional/compressed text bases (was: Re: [Reminder] Subversion a mentor for Google Summer of Code)

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2006-05-08 21:47:21 CEST

Qi Fred writes:
> On 5/8/06, Peter N. Lundblad <peter@famlundblad.se> wrote:
>
> > I assume these options will determine which method gets used when
> > checking out? Do you imagine the user being able to switch existing
> > working copies?
>
>
> Sure, some users may use an svn client that supports compressed copies when
> they create their initial checkouts, but others may not. Switching means the
> user can upgrade the client smoothly without redoing the checkout. Another
> problem is that decompression can be very time-consuming, and users may want
> to work with the original text bases. So a mechanism that supports switching
> is necessary.

We don't need to worry about the UI details right now, but it seems clear
that you intend some extra command or something to manage this state. Correct?

> Thanks. This is very useful. I am not clear on how deltas are generated and
> whether minimal deltas are used in commits. Do you mean that we need not
> modify any server code if the client sends the full text as a delta in a
> commit?

Exactly. The delta is a series of instructions: copy from the delta
source, copy from the target generated thus far and insert new data.
The server will take this, construct the fulltext and generate its own
delta, often based on another revision than the WC one. I think
having to send the whole new text to the server if you choose to
eliminate text bases is an acceptable trade-off, at least initially.
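
As a rough illustration of the three instruction kinds described above, here
is a minimal Python sketch. The tuple encoding and the names apply_delta and
fulltext_delta are made up for this example; this is not Subversion's actual
svndiff format or API. It only shows why a "fulltext delta" needs no text
base on the client: it is a single insert-new-data instruction.

```python
# Hypothetical instruction encoding, for illustration only:
#   ("source", offset, length)  copy bytes from the delta source (the text base)
#   ("target", offset, length)  copy bytes from the target generated thus far
#   ("new", data)               insert literal new data


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target text from a source text and a list of instructions."""
    target = bytearray()
    for op in ops:
        kind = op[0]
        if kind == "source":
            _, offset, length = op
            target += source[offset:offset + length]
        elif kind == "target":
            _, offset, length = op
            # Copy one byte at a time so the copy may overlap the bytes it
            # is producing (this lets a short pattern be repeated).
            for i in range(length):
                target.append(target[offset + i])
        elif kind == "new":
            target += op[1]
        else:
            raise ValueError(f"unknown instruction: {kind!r}")
    return bytes(target)


def fulltext_delta(new_text: bytes):
    """A delta that ignores the source entirely: one insert of the whole text."""
    return [("new", new_text)]


if __name__ == "__main__":
    base = b"hello world"
    ops = [("source", 0, 6), ("new", b"svn"), ("target", 0, 1)]
    print(apply_delta(base, ops))
    # No text base needed: the source can be empty.
    print(apply_delta(b"", fulltext_delta(b"entire new file contents")))
```

The receiver of a fulltext delta can reconstruct the file with an empty
source, which is why the server side would not need to change in this sketch.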

> > > If a file without a cached text base has been modified and is to
> > > be committed, there are three (or more) potential working cycles:
> > >
> > > 1) abort and warn the user
> >
> > That's not good. This makes the feature pretty useless except for
> > read-only working copies...
> > > 2) temporarily download the base revision
> > >
> > Could as well send a fulltext delta to the server.
>
>
> It would be better if the server accepted compressed deltas.

So, we have a 100 MB file. Are you suggesting that downloading that
file, just to be able to upload a delta, is better than just uploading
the whole new text? Or are you just suggesting that the new text
should be compressed? If the latter, then that's already the case, so
you don't need to worry about that.

> > > 3) make Subversion work without cached text bases
> > > - split large binary files into small blocks, for example, 32KB
> > > - store locally the very short message digests of all blocks
> > > - detect changes by comparing digests of corresponding blocks
> > > - send only the changed blocks to the server or request and
> > > download only the changed blocks to the client.
> > > - generate deltas and commit changes (on server or client side).
> >
> > What happens when someone inserts one byte near the beginning of the
> > file? We need an rsync-like algorithm if we want to do this. I think
> > this is an optional optimization. People will need to trade disk
> > usage (storing text bases) versus network usage.
>
>
> The average performance is better than the two previous suggestions.
> To optimize the worst case would be time consuming, and I am not sure whether
> the time is enough within the Summer of Code limitation.

Sorry. I don't follow the above. What I'm saying is that your
proposed algorithm won't work if (part of) the file is shifted away
from its original location, for example by inserting or removing some
bytes. I think that's very common, so I don't think that's only the
worst case.
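
To make the shifting problem concrete, here is a small Python sketch of the
fixed-block digest comparison. The function names, the use of SHA-1, and the
block size parameter are assumptions chosen for the example, not part of any
existing Subversion code. An in-place overwrite touches one block, but a
one-byte insertion at the start moves every block boundary, so every block is
reported as changed.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # 32KB, as in the proposal


def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    """Digest each fixed-size block of data (the last block may be shorter)."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Indices of blocks whose digests differ, or that exist on one side only."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    count = max(len(old_d), len(new_d))
    return [
        i for i in range(count)
        if i >= len(old_d) or i >= len(new_d) or old_d[i] != new_d[i]
    ]


if __name__ == "__main__":
    # 4096 bytes of non-repeating data, compared in 256-byte blocks.
    original = b"".join(i.to_bytes(4, "big") for i in range(1024))

    overwritten = original[:10] + b"\xff" + original[11:]  # same length
    shifted = b"X" + original                              # one byte inserted

    print(len(changed_blocks(original, overwritten, 256)), "block(s) changed")
    print(len(changed_blocks(original, shifted, 256)), "block(s) changed")
```

An rsync-like scheme avoids this by searching for matching blocks at every
byte offset rather than only at fixed block boundaries.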

When I said "optional optimization", I meant that it is optional for
you to implement (or whoever gets to do it). The feature works fine
without it.

> > > All the above working cycles solve the problem introduced by disabling
> > > the caching of text bases. The first one can be easily implemented, but
> > > introduces inconvenient manual operations. The latter two cycles
> > > require modifications in both the client and server sides. The
> > > problem of the second one is the heavy load of transmission during a
> > > commit. Since the contents of large files change seldom, the second
> > > cycle is feasible. The third one concerns the collision of message
> > > digest algorithms. There is a report that different contents give
> > > same MD5 digests (http://eprint.iacr.org/2004/199.pdf). But
> > > collisions have not been found in SHA-1 algorithm. Some
> > > investigations should be done to avoid collisions. I prefer to
> > > implement the third working model.
> > >
> > I'm no expert in this area, but I'm pretty sure the collisions concern
> > the cryptographic uses of MD5, so I don't think we need to worry about
> > that. Others may want to comment here.
>
>
> I would like to use the MD5 algorithm, but there is a risk that some files
> are not correctly committed to the server.

There is another problem here, which is that the client may not detect
a modification if the original and the modified file happen to have
the same checksum. I think this risk is so small that we shouldn't
worry about it; our current timestamp-based modification detection
heuristic fails for real... And if this really happens, the
work-around would be to check out a working copy *with* text bases and
do the commit from there. Note that this is *not* about data corruption.

> > > A special property, 'svn:text-base', is suggested to be added. This
> > > property indicates the way Subversion stores the text base of
> > > the corresponding file. Its value can be one of the following:
> >
> > As I said above, this shouldn't be versioned. You may need to extend
> > the .svn/entries file, though.
>
>
> This suggestion is good. Is there a user interface to access the
> .svn/entries file in the current Subversion client? I think we need a new
> command for users to access this file.

There's no interface to manipulate entries directly, and there
shouldn't be :-) What we need is an interface to manipulate this
particular state, including fetching/removing/(un)compressing the text base.

Regards,
//Peter

Received on Mon May 8 21:47:56 2006