Re: Another working copy library

From: Ph. Marek <philipp.marek_at_bmlv.gv.at>
Date: 2007-01-17 11:53:30 CET

On Wednesday 17 January 2007 08:36, David Anderson wrote:
> I've been kicking the thought around for a while now, so I'll get it
> out here in the open.
>
> I think we all know about the "organic" growth of libsvn_wc.
...
> The ones I can think of right now:
>
> - Having to crawl the entire tree on most operations ...
> - Storing metadata all over the place. ...
> - Text-base storage. ...
> - Doesn't play well with ... `grep -v .svn` to filter out ...

> So, I want to break libsvn_wc.
...
> I think that libsvn_wc_sqlite addresses the issues I pointed out at
> the beginning of this mail: tree crawls are minimized,
If you can get the indices in a sequential chunk, it may be worth it.
Otherwise you'll get the hard disk seeking around, too (although less, I admit).

> inode count goes way down,
That's for sure.

> commandline tools don't find text-base dupes all over
> the place,
As long as you store them in a BLOB or similar, why not? grep happily looks
into binary files, just to give you the filename.
And if you use "grep --exclude *.svn*" it doesn't matter if there's one file
or 1000.

> and we have a clear internal API where we can handle the
> text-base storage problem cleanly. And, hopefully, most operations are
> reduced to an SQL select statement, which can be blindingly fast if
> the database is indexed properly.
"If the database is indexed properly". So it will have some
storage-space cost.
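
To make that trade-off concrete, here is a minimal sketch using Python's
sqlite3 module. The table name "entries", its columns, and the index name
"entries_path" are made up for illustration and are not taken from any
proposed libsvn_wc_sqlite schema. It builds the same table twice, once with
an index on the path column, and reports the file size and the query plan
for each.

```python
import os
import sqlite3
import tempfile


def build_db(path, with_index):
    """Create a toy metadata table (hypothetical schema) with 5000 rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE entries (path TEXT, revision INTEGER, md5 TEXT)")
    rows = [("trunk/dir%d/file%d.c" % (i % 50, i), i, "%032x" % i)
            for i in range(5000)]
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    if with_index:
        conn.execute("CREATE INDEX entries_path ON entries (path)")
    conn.commit()
    return conn


def query_plan(conn):
    """Return SQLite's plan description for a lookup by path."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT revision FROM entries WHERE path = ?",
        ("trunk/dir1/file1.c",)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)


def compare_index_cost():
    """Map 'plain' and 'indexed' to (file size in bytes, query plan)."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, with_index in (("plain", False), ("indexed", True)):
            db_path = os.path.join(tmp, name + ".db")
            conn = build_db(db_path, with_index)
            plan = query_plan(conn)
            conn.close()
            result[name] = (os.path.getsize(db_path), plan)
    return result
```

Calling compare_index_cost() should show the indexed lookup using
entries_path instead of scanning the table, and the indexed file being
larger, because the index stores a second copy of every path.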

I don't want to be seen as outright against that idea - it surely has its
merits. I just don't know whether it makes sense to store multiple GB in a
database, when there's a filesystem available. It feels a bit like
file-in-database-on-nfs-mount-on-loopback-mounted-file-on-samba-share, if you
get what I mean ;-)

I think that, when such a big thing is being done, it may be good to break a
bit more -- don't store local text bases.
That saves us 50% of storage space and grep is happy.
Use partial MD5 hashes to check for modifications (like fsvs does), and if
there's an RA call "ra_get_file_ranges", fsvs would be happy, too :-)
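
A rough sketch of the partial-hash idea, in Python. This is not fsvs's
actual scheme: the fixed block size and the function names are assumptions
made for illustration. The working copy keeps one MD5 per block instead of
a full text base, and a later comparison tells which blocks changed, so
only those ranges would have to be fetched from the repository.

```python
import hashlib


def block_digests(data, block_size):
    """Return one MD5 hex digest per fixed-size block of data."""
    return [hashlib.md5(data[off:off + block_size]).hexdigest()
            for off in range(0, len(data), block_size)]


def changed_blocks(stored_digests, data, block_size):
    """Return indices of blocks whose digest differs from the stored one.

    Blocks that exist on only one side (file grew or shrank) count as
    changed.
    """
    current = block_digests(data, block_size)
    changed = []
    for i in range(max(len(stored_digests), len(current))):
        if (i >= len(stored_digests) or i >= len(current)
                or stored_digests[i] != current[i]):
            changed.append(i)
    return changed
```

A real implementation would use a much larger block size and read the
file in chunks rather than holding it in memory.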

Or, to not have such a big change, simply define an alternate storage
container; in fsvs speak the "Working copy Administrative Area" (WAA), and
use the MD5 of the files as an index there.
That would additionally allow sharing the text-bases across multiple
check-outs, which is a nice benefit.
If the directories are set apart, a grep won't look there.
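
As an illustration of such a container, here is a small Python sketch of a
store that files each text base under its MD5. The class name, the
two-character fan-out directory, and the method names are hypothetical;
this is neither fsvs's WAA layout nor anything in libsvn_wc.

```python
import hashlib
import os


class TextBaseStore:
    """Toy content-addressed store: one file per distinct MD5."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, digest):
        # Fan out on the first two hex digits to keep directories small.
        return os.path.join(self.root, digest[:2], digest)

    def put(self, data):
        """Store data if it is not present yet; return its MD5 hex digest."""
        digest = hashlib.md5(data).hexdigest()
        path = self._path(digest)
        if not os.path.exists(path):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest):
        """Return the stored bytes for a digest."""
        with open(self._path(digest), "rb") as f:
            return f.read()

    def count(self):
        """Number of distinct text bases currently stored."""
        return sum(len(files) for _, _, files in os.walk(self.root))
```

Two check-outs pointing at the same store root and calling put() with
identical file contents get the same digest back, and the content is
written only once.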

(I don't want to sound like 'look what fsvs does better' -- fsvs is not
intended for source control -- but I believe that it's got a few things right.
[If it didn't, I'd change it, too :-])

Regards,

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jan 17 11:53:30 2007
