
Re: [RFC] FSFS filesystem options (long, sorry)

From: Ph. Marek <philipp.marek_at_bmlv.gv.at>
Date: 2007-03-05 10:45:34 CET

On Monday 05 March 2007 08:55, Malcolm Rowe wrote:
> Well, I was really looking for comments about the option concept
Having options is good: that way you can get what you need. In some cases
recompiling is not possible (or very hard!), and the default may not work.

> , but with regards to the number of files in a directory: if the filesystem
> supports it _well_, it really is the most efficient option. Most don't,
> alas.
>
> (FSFS doesn't ever do a readdir(), so it's not quite as bad as you make
> out, but the large number of files doesn't help lookups generally).
It's not so much about readdir() or the API - just reading the directory and
giving whatever entries were just found is not the problem.

I'm talking about *humans* trying to have a look there.

Normally that's not needed - but in an *abnormal* situation it helps a lot
to just do a "cd ; ls" and see where you want to go.
If you're under stress and a simple "ls" doesn't work ... that's bad.
And I'm not even thinking about using some graphical interface - just imagine
the KDE repository (with currently ~640k revisions!), and browsing there with
konqueror! You'll never see any files, because before the data could be read
you'll get a change event on the directory, and it gets loaded again ...

Many thousands of files in a directory may be possible (and scale) in the
filesystem, but they don't work for humans.

> > BTW: If you do some changes in FSVS, how about issue 2286
> > (http://subversion.tigris.org/issues/show_bug.cgi?id=2286) "Identical
> > files should share storage space in repository"? Pretty please :-) ?
>
> Yes, that's something else I've looked at. This would be an especially
> good idea for feature branches, with their frequent merge-from-trunk
> patterns, and it'd also help increase cache locality.
>
> It's not easy, though: you need to determine the delta base for a file,
> accept the new file and write out a delta (just in case it's unique),
> then quickly look up any matching representations and ditch the delta
> you've just written out. Oh, and when you commit, somehow update the
> MD5 index without disturbing other readers.
>
> I'm not saying it's impossible, but it's pretty hard.
What I'd like to add here: as the merge-branch is on its way now, this
could be much easier. If we copied r6, never changed the file, and now merge
the changes from r12, we could relay that to the repository (or do the merge
there?) by saying "just hardlink to r12 of file X, but keep history from this
path" - there's even less work to do for the repository, and a lot of space
saved.
Of course, other identical data should be merged, too.
(But: that could even be done asynchronously, i.e. by a cron job ... just fetch
the list of files changed since the last run, get their MD5s, and do
a "svnadmin link path/to/file/1@12 path/to/other/file@51" ... although that
should work on something better than single files).
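
A minimal sketch of the duplicate-finding half of that cron-job idea, under
stated assumptions: the changed files are already available as an in-memory
mapping from "path@rev" to content, and `find_duplicates` is a hypothetical
helper, not part of Subversion. The "svnadmin link" step above is only a
proposal, so the sketch merely prints what would be linked.

```python
import hashlib
from collections import defaultdict


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def find_duplicates(contents: dict[str, bytes]) -> list[list[str]]:
    """Group "path@rev" keys by the MD5 of their content.

    Returns only the groups with more than one member, i.e. the
    candidates for sharing storage.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path in sorted(contents):
        by_digest[md5_of(contents[path])].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Hypothetical input: files changed since the last run.
changed = {
    "path/to/file/1@12": b"same bytes\n",
    "path/to/other/file@51": b"same bytes\n",
    "path/to/unique@7": b"different bytes\n",
}

for group in find_duplicates(changed):
    keep, *others = group
    for other in others:
        # Placeholder for the proposed (non-existent) "svnadmin link" step.
        print(f"would link {other} -> {keep}")
```

A real job would compare the candidate contents byte for byte before linking,
since an equal MD5 is only a strong hint, not proof of identical data.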

Or maybe we should address the data by its hash - "Is the file with SHA-1 YYY
in the repository? No? Here you have it".
But that's the git way of doing things ...
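
The "ask by hash first" exchange can be sketched as follows. This is a toy
in-memory model, not how Subversion or git store data; `ContentStore` and
`send_file` are hypothetical names introduced only for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory store keyed by the SHA-1 of the content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        """Answer "is the file with this SHA-1 in the repository?"."""
        return digest in self._blobs

    def put(self, data: bytes) -> str:
        """Store data under its SHA-1; identical data is stored once."""
        digest = hashlib.sha1(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def __len__(self) -> int:
        return len(self._blobs)


def send_file(store: ContentStore, data: bytes) -> bool:
    """Transfer data only if the store lacks it.

    Returns True if the content had to be sent, False if it was
    already present.
    """
    digest = hashlib.sha1(data).hexdigest()
    if store.has(digest):
        return False
    store.put(data)
    return True


store = ContentStore()
print(send_file(store, b"hello\n"))  # first copy is transferred
print(send_file(store, b"hello\n"))  # identical content is skipped
print(len(store))                    # only one blob is kept
```

The point of the design is that identical files share storage automatically,
because the hash is the address and no separate lookup index is needed.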

Regards,

Phil

Received on Mon Mar 5 10:45:52 2007

This is an archived mail posted to the Subversion Dev mailing list.
