Re: Entries caching & Performance

From: Kirby C. Bohling <kbohling_at_birddog.com>
Date: 2002-11-26 01:39:10 CET

On Mon, 2002-11-25 at 17:57, Philip Martin wrote:
> Brandon Ehle <behle@pipedreaminteractive.com> writes:
>
> > pilchie on IRC revealed that we are currently doing caching of
> > entries reading only on commits and status. Is anyone working on
>
> That's not correct, entries are cached for all operations. At present
> operations that modify the working copy will repeatedly *write* the
> entries file, but it does not get read repeatedly. Writing happens
> repeatedly because a) that's how it worked originally, and b) if it
> didn't then an interrupted operation would lose all its modifications.
> (I know interrupted checkouts cannot be restarted, but that's a bug
> that needs to be fixed.)
>

        Is there any compelling reason not to break the entries out into
separate files in a directory (say .svn/entry_dir/), with global and/or
directory properties stored in .svn/dir_entries or some such? That's a
bad naming convention, but I'm not really that creative.
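
        Roughly the layout I have in mind (names purely illustrative,
not a real proposal for the on-disk format):

        .svn/entry_dir/foo.c     one small file per entry
        .svn/entry_dir/bar.h
        .svn/dir_entries         this directory's own entry and props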

        It'd match the API a little better. On imports and checkouts, the net
change to the entries file is small: remove the trailing '>', append the
newest entry, and put the '>' back after it.

        Instead of doing that, the entire file gets rewritten every time. So
the amount of entry data written ends up being O(n^2) in the number of
entries. Along the way, a lot of time is also spent translating back
and forth between XML and the entry structures.
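
        To put a number on that (my own back-of-the-envelope, assuming
one full rewrite of the entries file per entry added): adding entry k
rewrites the k-1 entries already present plus the new one, so

        total entries written = 1 + 2 + ... + n = n(n+1)/2 = O(n^2)

for a directory of n entries, versus n single-entry writes if each
entry lived in its own file.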

        Through the API you write a single entry at a time, but read all of
them in at once (write_entry vs. read_entries). At least as of a month
or so ago, when I last looked, the API let you write a single entry at
a time, but because of the storage mechanism and atomicity constraints
you had to process every entry in that directory and then write all of
them back out using the atomic file write calls. If each entry were in
a separate file, writing an entry would cost time proportional to the
number of properties of that entry, rather than to the number of
properties of all the files in the directory. That would remove the
performance penalty for directories with a large number of files or
sub-directories.
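
        To make that concrete, here's a rough C sketch of what a
per-entry atomic write could look like. This is not the actual svn_wc
code or API; the paths, names, and helper are hypothetical, and it
leans on POSIX rename() being atomic:

/* Hypothetical sketch only, not the actual Subversion API.  Writes
 * one entry's XML to .svn/entry_dir/<name> by writing a temp file and
 * rename()ing it into place, so an interrupted write leaves either
 * the old entry or the new one, never a half-written file. */
#include <stdio.h>

static int
write_entry_atomically (const char *adm_dir,  /* e.g. ".svn" */
                        const char *name,     /* entry name */
                        const char *xml)      /* serialized entry */
{
  char final_path[1024], tmp_path[1024];
  FILE *fp;

  snprintf (final_path, sizeof (final_path),
            "%s/entry_dir/%s", adm_dir, name);
  snprintf (tmp_path, sizeof (tmp_path), "%s.tmp", final_path);

  fp = fopen (tmp_path, "w");
  if (fp == NULL)
    return -1;
  if (fputs (xml, fp) == EOF)
    {
      fclose (fp);
      remove (tmp_path);
      return -1;
    }
  if (fclose (fp) == EOF)
    {
      remove (tmp_path);
      return -1;
    }

  /* rename() is atomic on POSIX filesystems, and its cost is
     proportional to this one entry, not to every entry in the
     directory. */
  if (rename (tmp_path, final_path) != 0)
    {
      remove (tmp_path);
      return -1;
    }
  return 0;
}

Reading a directory's entries back would become a readdir() loop over
.svn/entry_dir/: still O(n), but touching n small files instead of one
big one, which is the tradeoff that would need measuring.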

        When I looked at it, this appeared to be the best optimization I could
find for speeding up imports and checkouts, which are painfully slow.
When I was playing with a few of my own personal code bases, it sure
seemed too slow to be usable on a daily basis.

        Thanks,
                Kirby

-- 
Real Programmers view electronic multimedia files with a hex editor.