On Mon, Feb 07, 2011 at 09:23:23PM +0100, Johan Corveleyn wrote:
> I've been wondering about the question "how about storing/buffering
> the entire query results in memory?" Would this really be a problem,
> even for very large working copies?
>
> I have a quite large working copy checked out here (trunk of an old
> backup of our work repository): 68,806 files, 9,868 folders (not
> counting .svn area of course). Ok, it's not huge, there are certainly
> much larger ones out there, but it's not small either. For comparison:
> svn trunk is only 1,818 files, 227 folders.
>
> I just note that wc.db of this large wc is only 62 MB (the one from
> svn trunk is only 1.6 MB). And this is a working copy with mostly
> .java files, all with 3 properties set (svn:eol-style, svn:keywords
> and cvs2svn:cvs-rev (yeah, this was an old migration experiment, we
> dropped this one when we did the final migration)).
>
> So I'm thinking that any query results will take a maximum of 62 MB of
> memory, and usually a lot less (most queries won't be reading the
> entire db). Or is this too naive?
The sqlite temp_store pragma is a global setting.
It not only applies to proplist, but also to other queries using
temporary tables.
We're trying to devise a general approach to better using wc.db
based on the proplist example. Other commands may need a similar
approach. So we can't just look at proplist to make a decision.
How much data will svn status pull into memory? How much data will
future APIs that we haven't even designed yet pull into memory?
> The above solution with SQLite temp tables seems like a good approach.
> And it would be great if the "file-backed" temp tables would be
> "almost as fast" as the "in-memory" temp tables (with the PRAGMA
> setting). But if the file-backed temp tables would be significantly
> slower, I would consider taking the "in-memory" route.
I think we should use the default (PRAGMA temp_store = FILE) unless
we encounter a serious problem with it. I believe that this was changed
to MEMORY because that helps with the ridiculous amount of queries we're
doing at the moment. But with the handful of queries we're issuing with
Branko's patch, and the amount of data pulled into a temporary table,
we're more likely to hit the problem where temporary tables don't fit
into memory. When that happens, we fail if temp_store is MEMORY,
and we get a bit slower if temp_store is FILE. Getting a bit slower
is better than failing.
I also think that we shouldn't be changing sqlite default settings unless
there is a very good reason. We're (mostly?) not database developers, so
we're probably better off relying on defaults chosen by sqlite devs.
> Or, even better: make it configurable for the user (client-side config
> file, or something wc-specific (?)), so he can make the choice between
> memory and speed for himself.
That would be interesting to add, and would allow our users to help
us find out whether we might want to flip the default to MEMORY in
the future (e.g. for Subversion 1.8).
> A lot of wc databases out there will be
> so small that the user will hardly notice the memory increase.
All we'd be doing is allowing sqlite to flush data to disk if needed.
Even with a temporary table backed by a file, most operations happen in
memory. Either in buffers managed by sqlite or the operating system's
buffer cache (until sqlite does an fsync). So for small databases it
shouldn't make a difference.
Received on 2011-02-07 21:52:16 CET