Re: [PROPOSAL] Drop XML from .svn/entries

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2006-04-07 23:08:20 CEST

Justin Erenkrantz writes:
> On 4/7/06, Peter N. Lundblad <peter@famlundblad.se> wrote:
> > Since the new format reduced the CPU time by about 35%, I think the
> > performance gain is a combination of I/O and CPU, especially when much
> > of the entries files is in the OS cache, which is the case when you
> > perform multiple operations in a row. But I don't think this should
> > be very hard to try, so I'll give it a shot to get some perspective.
>
> There might be some opportunities to make our XML parsing a bit faster still.
>
> A few things looking at read_entries:
>
> 1) APR_BUFFERED isn't turned on, so any small reads or writes will
> cause extra libc/syscall activity. Combined with...

Will we normally have small reads here? We read into a reasonably
large buffer, so I'd think turning on APR_BUFFERED would just be
another memory copy operation (or does it read directly into large
buffers? I haven't checked.)

> 2) No need to call read_full in the do loop. However many bytes it
> reads at a time is fine by us. read_full would cause multiple I/O ops
> of a non-optimal chunk size - the better strategy is to just process
> what we got from one libc/syscall invocation instead of forcing the
> return to be CHUNK_SIZE (16k).

I understand this for pipes and sockets, but not for regular files. I
don't think we normally see short reads on regular files unless we are
at EOF. Or what am I missing?
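
One way to check the short-read question on a given system is a minimal sketch like the one below. It uses plain POSIX read() rather than the APR calls in read_entries, and the helper name count_short_reads is hypothetical. The loop processes whatever each read() returns, and counts how often a short return was followed by more data (that is, a short read that was not the last one before EOF).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define CHUNK_SIZE (16 * 1024)   /* same chunk size as quoted above */

/* Hypothetical helper, not Subversion code.  Read PATH with plain
   read() calls, accepting however many bytes each call returns.
   Store the byte total in *TOTAL and the number of short reads that
   were followed by more data in *SHORT_READS.
   Return 0 on success, -1 on error. */
static int
count_short_reads(const char *path, size_t *total, size_t *short_reads)
{
  char buf[CHUNK_SIZE];
  int pending_short = 0;
  int fd;

  assert(path != NULL && total != NULL && short_reads != NULL);
  *total = 0;
  *short_reads = 0;

  fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;

  for (;;)
    {
      ssize_t n = read(fd, buf, sizeof(buf));

      if (n < 0)
        {
          if (errno == EINTR)
            continue;             /* interrupted, just retry */
          close(fd);
          return -1;
        }
      if (n == 0)
        break;                    /* EOF */

      /* The previous read was short, yet more data arrived: that
         short read did not happen at EOF. */
      if (pending_short)
        (*short_reads)++;
      pending_short = ((size_t) n < sizeof(buf));

      *total += (size_t) n;       /* "process" the bytes we got */
    }

  close(fd);
  return 0;
}
```

If *SHORT_READS stays at zero for the entries files in a working copy, then a read_full loop and a plain read loop issue the same number of syscalls for them.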

> It *might* be worth it to see if doing a mmap of the entries file
> would help - that would make the parsing faster as we can avoid a
> buffer copy entirely. Solaris, FreeBSD, and a few other OSes have a
> madvise option (MADV_SEQUENTIAL) that works with mmap that tells it as
> we read the memory location we can discard the previously read bytes
> and we will be fetching the next bytes. This isn't insignificant in some
> other usage patterns like this.
>
> Your recent changes to handle_start_tag to use a scratch pool helps
> save memory, but we could/should avoid the strdups that come with
> svn_xml_make_att_hash. It might just be worth teaching
> svn_wc__atts_to_entry how to handle the atts array directly to avoid
> the scratch pool altogether.
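
For reference, the mmap idea quoted above could look roughly like the sketch below. It uses raw POSIX mmap()/madvise() rather than APR, the helper name count_byte_mmap is hypothetical, and counting a byte merely stands in for the real parsing work. MADV_SEQUENTIAL is only a hint, so its return value is ignored, and the call is compiled out where the constant is not defined.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not Subversion code.  Count occurrences of
   byte C in PATH by scanning a read-only mapping of the file, so no
   intermediate read buffer is filled.  Return -1 on error. */
static long
count_byte_mmap(const char *path, char c)
{
  struct stat st;
  long count = 0;
  off_t i;
  char *map;
  int fd;

  assert(path != NULL);

  fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  if (fstat(fd, &st) != 0)
    {
      close(fd);
      return -1;
    }
  if (st.st_size == 0)
    {
      close(fd);
      return 0;                   /* mmap() rejects a zero length */
    }

  map = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);                      /* the mapping outlives the descriptor */
  if (map == MAP_FAILED)
    return -1;

#ifdef MADV_SEQUENTIAL
  /* Advisory only: we will walk the mapping front to back. */
  (void) madvise(map, (size_t) st.st_size, MADV_SEQUENTIAL);
#endif

  for (i = 0; i < st.st_size; i++)
    if (map[i] == c)
      count++;

  munmap(map, (size_t) st.st_size);
  return count;
}
```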

Yeah, looking at a profile shows that moving attributes into hashes
and looking them up is significant.
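
As an illustration of handling the atts array directly, here is a minimal sketch. It relies only on the layout expat passes to a start-element handler: alternating name and value pointers terminated by NULL. The helper name find_att is hypothetical and is not an existing Subversion function. No hash is built and nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Subversion code.  ATTS has the layout
   expat hands to a start-element handler: name, value, name, value,
   ..., NULL.  Return the value for NAME, or NULL if it is absent.
   The result points into ATTS; nothing is allocated or copied. */
static const char *
find_att(const char **atts, const char *name)
{
  assert(name != NULL);

  if (atts == NULL)
    return NULL;

  for (; atts[0] != NULL && atts[1] != NULL; atts += 2)
    if (strcmp(atts[0], name) == 0)
      return atts[1];

  return NULL;
}
```

The pointers are only valid while the handler runs, so the caller would still have to copy the values it wants to keep in the entry, but only those. With the handful of attributes an entry carries, a linear scan like this may well be cheaper than building and probing a hash, though that would need measuring.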

Still, we're spending lots of time in libexpat, according to oprofile.

Thanks,
//Peter

Received on Fri Apr 7 23:08:51 2006