
Re: Eliminating multiple file-reads of text-base on updated targets

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2005-12-06 10:45:53 CET

On 12/6/05, Peter N. Lundblad <peter@famlundblad.se> wrote:
> On Tue, 6 Dec 2005, Erik Huelsmann wrote:
>
> > I've found an optimization which would fit nicely in the 1.4
> > 'libsvn_wc optimizing release': libsvn_wc calls svn_io_file_checksum
> > on files it just read or wrote - meaning it reads them again to
> > calculate the checksum.
>
> Do you know how much this costs performance-wise? Yes, it might be a
> reasonable optimization, but it would be good to have the whole picture if
> possible.

Absolutely no idea, but I guess it depends a lot on what your working
copy looks like: small and medium-sized files will probably just pay
the cost of accessing the filesystem, but large files (hundreds of
MBs) will suffer badly if they need to be read several times.

> > - creating an svn_stream_checksummed, like svn_subst_stream_translated.
> > - let the stream raise an error on close if the stream checksum
> >   doesn't match a given checksum value (with an error message text
> >   passed to it on creation)
>
> Would it be better to just let the stream provide the checksum and let the
> caller take care of the error handling itself? I see this passing of
> error messages down the calls as a little ugly.

Yes, that's exactly why I posted about it.

> > - create svn_stream_checksummed_get_sum() to retrieve the checksum
> >   from the checksummed stream
>
> Just provide a pointer where to store the checksum on close instead? The
> function above wouldn't be type-safe, would it?

Storing it through a pointer passed to the stream will also work. Heh.
Thanks for the idea; I never thought of it.
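
To make the idea concrete, here is a minimal sketch in Python rather than
C, and not the libsvn API: a wrapper stream hashes the data as it is read
and publishes the digest when it is closed, leaving the comparison and
any error handling to the caller. The names ChecksummedReader and
read_all_with_checksum are hypothetical, and the caller-owned dict stands
in for the pointer discussed above.

```python
import hashlib
import io


class ChecksummedReader:
    """Hypothetical wrapper that hashes bytes as they pass through read()."""

    def __init__(self, inner, result):
        self._inner = inner
        self._md5 = hashlib.md5()
        # Caller-owned dict; an assumption standing in for an out-pointer.
        self._result = result

    def read(self, size=-1):
        chunk = self._inner.read(size)
        self._md5.update(chunk)  # hash in the same pass as the read
        return chunk

    def close(self):
        # Publish the digest on close; the caller decides what a mismatch means.
        self._result["md5"] = self._md5.hexdigest()
        self._inner.close()


def read_all_with_checksum(stream, chunk_size=8192):
    """Read the whole stream once, returning (data, md5 hex digest)."""
    result = {}
    reader = ChecksummedReader(stream, result)
    data = bytearray()
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        data.extend(chunk)
    reader.close()
    return bytes(data), result["md5"]
```

The digest only covers bytes that were actually read through the
wrapper, so a caller that stops early gets the checksum of a prefix.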

> BTW, this is the big part of issue #1882.

Thanks for finding the issue :-) I knew there was an issue about
something like it, but it's only a related problem in this case: the
issue is about the calculation of the target data stream.
My problem is that we calculate the MD5 of the target stream, write
the target stream to disk, forget the MD5 and need to recalculate the
MD5 later on... (But that's part 2. What I presented here is part 1.)
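
As a rough illustration of keeping the MD5 instead of forgetting it, the
sketch below (Python, not the actual update_editor.c code) hashes each
chunk as it is written and returns the digest, so nothing has to be read
back from disk. The function name write_with_checksum is hypothetical.

```python
import hashlib
import os
import tempfile


def write_with_checksum(path, chunks):
    """Write chunks to path; return the MD5 hex digest of the bytes written."""
    md5 = hashlib.md5()
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            md5.update(chunk)  # hash while writing; no second pass over the file
    return md5.hexdigest()
```

A caller would store the returned digest alongside the file's metadata
rather than recomputing it from the written file later.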

Part 2 involves update_editor.c.

bye,

Erik.
Received on Tue Dec 6 10:47:12 2005

This is an archived mail posted to the Subversion Dev mailing list.
