
Re: Another working copy library

From: Peter Lundblad <plundblad_at_google.com>
Date: 2007-01-19 11:44:24 CET

Hi,

I have real concerns with this new WC implementation idea, but I still think it
is something we should do, in a slightly different way than you propose...

David Anderson writes:

> The first is that this is a 2.0 feature. I disagree for several reasons.
>
> First of all, if it wasn't clear in my initial mail and followups, I
> intend libsvn_wc_sqlite to follow the API that libsvn_wc has. There
> may be a few new functions required (for example the call to sever a
> subtree from the wc), and possibly some changes to the libsvn_wc API,
> which would have to be backwards-compatible changes. However, as
> noted, some of the semantics that some users have come to rely on
> (known location of text-bases, using .svn as the basis for knowing if
> a directory is under version control...) would be broken by
> libsvn_wc_sqlite, which is why it cannot entirely replace libsvn_wc
> before 2.0.
>
We never promised that the internal format would be the same throughout
1.x, so people relying on that are doomed. We've had several updates
to the internal format, including the ones for 1.4. The API needs to stay
compatible, though.

> That, in my opinion, is not a strong enough reason to push off
> addressing wc issues until 2.0. For that matter, we still don't know

So, we know we have some big-project users that are being hurt by our
current performance. I am also pretty sure there are lots of not-so-big
project users that are quite happy with the current implementation.
The question is to what extent a certain user category should be allowed
to add costs for the project (see below about maintenance costs).

> exactly what we want and don't want in 2.0, nor when we will start
> working on it. I am not prepared to put this off indefinitely until
> the hypothetical time where we decide that we've had it with 1.x. That
> watershed moment isn't clearly defined enough for me to rely on it.
>
OTOH, we are the ones who can start defining a roadmap for 2.0 if we want to.
Starting things that are more or less 2.0 things now (WC redesign,
FS redesign, ...) without a clear vision about 2.0 will cause us to carry
the old compatibility baggage forever. Most (all?) users will use
the new stuff and the old stuff will bitrot, but still be halfheartedly
maintained for theoretical compatibility. I am not stopping you from designing
a new WC format (in fact I want to do the opposite), but we do need to
consider the big picture.

> IOW, I think this is as much a compatibility break as FSFS was for BDB
> backend users, because the implicit promises that you could use BDB
> tools to mess with the filesystem went away. People have the choice
> whether or not to switch, and can make an informed decision based on
> the various tradeoffs.
>
Theoretically this could be the case. The big difference, though, is that
for the filesystem, there was a much cleaner API abstraction.
I just went through the 150 K svn_wc.h to see whether my feeling was right,
and I am still convinced that the WC API has accumulated so much low-level
cruft that trying to reimplement it in a completely different format,
with different tradeoffs and requirements, will be a real challenge.

First of all, how do we know that the new library is compatible with the
old one? This is a very complex piece of code and some of the APIs
have really quirky semantics. I guess a solution to this would be to write
a real test suite for the WC library and not just rely on regression tests
of the svn commandline client for that.
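
As a rough sketch of that idea (not a real Subversion test suite), the same
checks can be run against every implementation of a shared interface. All
names below (DictWC, ListWC, check_add_then_query, run_conformance) are
hypothetical stand-ins and do not correspond to any libsvn_wc function:

```python
class DictWC:
    """Hypothetical stand-in for one working-copy implementation."""

    def __init__(self):
        self._versioned = set()

    def add(self, path):
        self._versioned.add(path)

    def is_versioned(self, path):
        return path in self._versioned


class ListWC:
    """Hypothetical second implementation with different internal storage."""

    def __init__(self):
        self._paths = []

    def add(self, path):
        if path not in self._paths:
            self._paths.append(path)

    def is_versioned(self, path):
        return path in self._paths


def check_add_then_query(wc):
    """One behavioural check that only uses the shared interface."""
    wc.add("trunk/foo.c")
    return wc.is_versioned("trunk/foo.c") and not wc.is_versioned("trunk/bar.c")


def run_conformance(factories, checks):
    """Run every check against a fresh instance of every implementation."""
    return {(f.__name__, c.__name__): c(f()) for f in factories for c in checks}


results = run_conformance([DictWC, ListWC], [check_add_then_query])
```

Any entry in results that is False would point at a behavioural difference
between the two libraries, independent of the command-line client.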

Another thing is that the API is rather specific to the current way
of working. Would the adm directory locking mechanism be appropriate for
this type of format? The svn_wc_entry_t struct exposes a lot of
implementation details that it shouldn't, and the entries caching in
memory is actually *hardcoded* in the API. There are other examples of
things I'd rather not be constrained by in a new WC format design.

I suggest we instead create a new API and then try to implement it
in terms of the old one. The old API will essentially be frozen and our
own code will thereby start using the new one. I admit that this might
cause some tasks to be suboptimally implemented for the old WC and we
need, to some extent, to take that into account when designing the API.
OTOH, if users complain about perf regressions, we can always recommend
they switch to the new WC format without forcing them to ;)
New features may have to be reflected in the old WC library, but we can add
new private (__ named) APIs to not expose new APIs outside our own libs.
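
A minimal sketch of that layering, assuming a narrow new API wrapped around
a frozen old one. OldWC, NewWCOnOld and their methods are invented for
illustration only; they are not real libsvn_wc calls:

```python
class OldWC:
    """Stand-in for the frozen old API (hypothetical, not real libsvn_wc)."""

    def __init__(self):
        self._entries = {}

    def read_entries(self, dir_path):
        # Old style: callers get the whole entries table of a directory.
        return dict(self._entries.get(dir_path, {}))

    def write_entry(self, dir_path, name, revision):
        self._entries.setdefault(dir_path, {})[name] = revision


class NewWCOnOld:
    """Hypothetical new, narrower API implemented in terms of the old one."""

    def __init__(self, old):
        self._old = old

    def set_revision(self, path, revision):
        dir_path, _, name = path.rpartition("/")
        self._old.write_entry(dir_path, name, revision)

    def get_revision(self, path):
        dir_path, _, name = path.rpartition("/")
        # Possibly suboptimal: loads a whole directory for a single lookup.
        return self._old.read_entries(dir_path).get(name)


wc = NewWCOnOld(OldWC())
wc.set_revision("trunk/foo.c", 42)
```

Callers only ever see set_revision and get_revision, so a second
implementation on a different storage format could be swapped in behind
the same interface.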

> Which brings me to the maintenance burden issue that was raised. As I
> have said, I am willing to put changesets where my mouth is and make
> this happen. It seems a few other people are also willing to help out
> doing this. I am very much against needless maintenance burdens, but I

So, you are talking about doing the initial implementation, which is not the
same as maintenance. The old code also needs some care, but who will want
to put effort into that when the new shiny lib is available? When adding
new features, we need to consider whether to make them available for those
who didn't switch yet.

We now have 4 RA layers, 2 FS backends and we would then have 2 WC libs as
well. This becomes 16 combinations for testing, and when most people
switch to the new lib, things that don't work in the old one will probably
turn up later in the dev cycle, leading to problems. These are real costs
for a complex part of our code base that we shouldn't underestimate.
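
The arithmetic behind that figure can be spelled out by enumerating the
matrix. The component names in the lists are assumptions added for
illustration (only libsvn_wc and libsvn_wc_sqlite are named in this thread):

```python
from itertools import product

# Assumed component names; only the counts (4, 2, 2) come from the text.
ra_layers = ["ra_local", "ra_svn", "ra_dav", "ra_serf"]
fs_backends = ["bdb", "fsfs"]
wc_libs = ["libsvn_wc", "libsvn_wc_sqlite"]

# Every (RA layer, FS backend, WC library) combination to test.
test_matrix = list(product(ra_layers, fs_backends, wc_libs))
print(len(test_matrix))  # 4 * 2 * 2 = 16
```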

I am ready to accept this extra cost in order to go forward, but as I said
before, if we do this, we really need to start focusing on 2.0 to not
keep this cost forever.

> Regarding the proposed storage mechanism: I'm with Justin on the not
> storing anything outside the working copy by default. I like my
> self-contained working copies that I can delete without "detaching"
> them or some such nonsense. However, there are cases where storing the

We once hard-coded the location of admin data, which turned out to be a bad
thing for certain dev environments on certain platforms. Let's not redo
that mistake...

> want stuff in ~/.subversion, then put the metadata there and symlink,
> and you're golden.

...or for that matter encourage people to mess with the metadata (location)
by hand. Also, let's be a bit more user friendly than that.
>
> However, I think that libsvn_wc_sqlite with tree crawls would still be
> a huge improvement over plain libsvn_wc, even without using an
...

I think you are right, but guessing is no substitute for profiling.
A prototype implementation will tell for sure.

So, what am I trying to say? I am positive about this idea in general, but
I think we need to consider it a part of a movement towards 2.0.
I think we should create a completely new API and start using it in our
code and create a wrapper implementing the new API in terms of the existing
libsvn_wc API.

> I'll start drawing up a more detailed design tonight, to flesh out the
> general ideas.
>
Cool! Looking forward to reading it.

Regards,
//Peter

Received on Fri Jan 19 11:45:04 2007

This is an archived mail posted to the Subversion Dev mailing list.
