On Sun, Oct 5, 2008 at 3:41 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> Greg Stein wrote on Fri, 3 Oct 2008 at 03:50 -0700:
>> On Thu, Oct 2, 2008 at 11:39 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> >> - place the checksum directly into this table, rather than a "text_id"
>> >> field. due to operational issues, the text_id may not be a satisfiable
>> >> foreign key (we may not want to insert a row).
>> > Can you explain this scenario? When do you see TEXT_ID not a valid
>> > foreign key *and* CHECKSUM a valid foreign key (at the same time)?
>> > Before this change, the schema was "id PRIMARY KEY, checksum NOT NULL".
>> CHECKSUM might not be valid, but it is a known quantity when the row
>> is inserted into NODE. Say, during a checkout, I get all the metadata
>> for the file, but I haven't started to fetch it. So we create a row in
>> NODE. Later, as we begin the fetching of the content, then we insert a
>> row into TEXT_BASE.
> IOW, the foreign key won't be enforced by sqlite, in this case.
That was my plan, but per our conversation on IRC, if we insert into
TEXT_BASE as soon as we know the checksum, then we can establish a
foreign-key relationship and a hard integrity constraint. Since the
system is already designed to support a row in TEXT_BASE with no
corresponding filesystem content, we may as well store the knowledge
of the checksum while we have it.
Another outcome of that conversation was: we'll definitely stick to
CHECKSUM as the primary key. Since that is unique, and there are no
(appreciable) performance gains to using a plain numeric ID for the
key... then we'll go for clarity/simplicity and stick to CHECKSUM.
>> If we kept it as an ID, then we'd have to create a row in TEXT_BASE
>> well before any intent of actually fetching that. Now, that said, the
>> TEXT_BASE.STORED_SIZE would be NULL, so we already know it is not
>> there or is incomplete. So that might actually be fine.
> So, the question is whether to create the TEXT_BASE row before or after
> fetching the file itself.
As above... we'll insert immediately.
> Suppose you check out two wc's of the same repos at the same time. If
> the TEXT_BASE row was updated immediately, then as soon as the first wc
> fetched its list of checksums of files to download, the second wouldn't
> download any of them;
Not really. The second WC would note that all of the STORED_SIZE
columns are NULL, meaning that the content is not present, or got
partially downloaded, or who knows what. But for all intents and
purposes, it does not have it. Thus, for safety, the second WC should
go ahead and request all that content. (It has no idea that another
process is doing it at the same time; coulda been last week, for all
it knows.)
> whereas, if TEXT_BASE is only updated after
> download completes, then it's possible for both wc's to download the
> same (identical) file simultaneously -- unnecessarily.
For safety, yes: they'll both download it. Sure, maybe that could be
optimized, but it is no worse than today :-)
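The decision each client makes can be sketched like this, with an illustrative schema and a hypothetical helper name:

```python
import sqlite3

def content_is_present(conn, checksum):
    """A missing row, or a NULL STORED_SIZE, means the content is absent
    or only partially downloaded, so the caller must (re)fetch it."""
    row = conn.execute(
        "SELECT STORED_SIZE FROM TEXT_BASE WHERE CHECKSUM = ?",
        (checksum,)).fetchone()
    return row is not None and row[0] is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEXT_BASE (CHECKSUM TEXT PRIMARY KEY, STORED_SIZE INTEGER)")
conn.execute("INSERT INTO TEXT_BASE VALUES ('abc', NULL)")   # fetch in progress
conn.execute("INSERT INTO TEXT_BASE VALUES ('def', 1024)")   # fully stored
```

Both concurrent checkouts see NULL and both fetch; redundant but safe.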
>> It seems more problematic to fetch the content first, then insert a
>> row. That's just kind of a gut feel... I don't have immediate reasons
>> why offhand.
> (see above: I also prefer inserting the row first, but haven't good
> reasons either)
I figured this out the other day. The scenario is an interrupted
download. There are two choices here: add a row, then grab the content
(RTC), or grab the content and add a row (CTR).
In RTC: if the content is never downloaded, or partially downloaded,
then the row will have NULL for the STORED_SIZE. When you read the
database, you'll immediately know that you don't have valid content.
If some content is present, you could actually checksum it to find
that you have the content, but it just didn't get recorded into the
database.
In CTR: the row is missing, so the content on disk is unknown. Maybe
all there, or maybe a partial file. You have no information.
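A sketch of the RTC ordering, using an interrupt flag to simulate a torn download (names and schema are illustrative):

```python
import os
import sqlite3
import tempfile

def fetch_pristine(conn, checksum, data, pristine_dir, interrupt=False):
    """Row-then-content: insert the row first, fetch, then record the
    size. A NULL STORED_SIZE is the 'download incomplete' marker."""
    conn.execute("INSERT OR IGNORE INTO TEXT_BASE (CHECKSUM) VALUES (?)",
                 (checksum,))
    path = os.path.join(pristine_dir, checksum)
    with open(path, "wb") as f:
        # Simulate a crash mid-download by writing only half the data.
        f.write(data[: len(data) // 2] if interrupt else data)
    if interrupt:
        return  # interrupted: STORED_SIZE stays NULL
    conn.execute("UPDATE TEXT_BASE SET STORED_SIZE = ? WHERE CHECKSUM = ?",
                 (len(data), checksum))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEXT_BASE (CHECKSUM TEXT PRIMARY KEY, STORED_SIZE INTEGER)")
pdir = tempfile.mkdtemp()
fetch_pristine(conn, "good", b"hello world", pdir)
fetch_pristine(conn, "torn", b"hello world", pdir, interrupt=True)
```

After the interrupted fetch, the row exists with a NULL size, so a later reader knows the content is unusable; under CTR there would be no row and no information at all.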
>> > Most of the time the pristine files don't change under our feet. (When
>> > they do, they're corrupted.) So we can skip the size check and always
>> > do only the more expensive check (calculating the checksum). And if we
>> > skip the size check, we won't need stored_size in the database.
>> We can do a quick apr_stat() to see if the file is present and the
>> right size, and have a high confidence that it is usable.
> I wouldn't trust the size alone: it's too easy to run 'find | xargs sed'
> that doesn't modify the sizes. That's why I assumed we would checksum
> the base file before declaring it usable.
Well... I don't think we go to that extent today, so I'm not prepared
to distrust the text base to that level. Shoot... if you think the
text base is *that* fragile, then you're gonna have to worry about
race conditions. "Oh hey. The text base checksum still matches. Good.
<CORRUPTION> Now, let me go ahead and use it." Ooops.
So yeah. If the stored size matches what is on disk, then I'm prepared
to trust it.
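The quick-stat idea in Python terms, with a hypothetical helper standing in for the apr_stat() check (this is not the actual svn_wc__db_pristine_check()):

```python
import os
import tempfile

def pristine_looks_usable(path, stored_size):
    """Cheap usability check: trust the pristine if it exists on disk
    and its size matches the STORED_SIZE recorded in the database."""
    if stored_size is None:
        return False          # never fully fetched
    try:
        return os.stat(path).st_size == stored_size
    except FileNotFoundError:
        return False          # file missing entirely

demo = os.path.join(tempfile.mkdtemp(), "pristine")
with open(demo, "wb") as f:
    f.write(b"some pristine text")   # 18 bytes
```

One stat per file, no reading of content: high confidence at low cost.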
Note that svn_wc__db_pristine_check() *does* have a mode that will
re-run a checksum. We can use that mode from "svn cleanup" or similar
operations.
>> If the stored_size is not even present, then the file is not going to
>> be there, or it has only been partially fetched.
> I agree that it's useful for the partially-fetched case. But in that case
> a boolean would work as well (just store an arbitrary non-NULL value at
> the point you would store the size).
If the text base's size changes, then we have a fast-path exit to
"corrupt", so hey... SIZE DOES MATTER.
> The rest is pure logic: since I don't trust the size as corruption
> indicator and since I think a boolean would work as well as the size for
> detecting the partial-fetch case, therefore I am not yet convinced that
> we need to store the size at all.
> (You're welcome to convince me otherwise, of course :))
If the size is the same, then we can always run a checksum. But we can
skip the checksum if whatever happened to the text base has changed
the size. If we want to go nuts on corruption detection, then we could
also store a timestamp.
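The two-stage check reads like this as a sketch (hypothetical function, sha1 assumed as the checksum):

```python
import hashlib
import os
import tempfile

def check_pristine(path, stored_size, expected_sha1):
    """Size mismatch is a cheap fast-path to 'corrupt'; only a matching
    size warrants the expensive full checksum."""
    if os.stat(path).st_size != stored_size:
        return "corrupt"               # fast path: no checksumming needed
    with open(path, "rb") as f:
        actual = hashlib.sha1(f.read()).hexdigest()
    return "ok" if actual == expected_sha1 else "corrupt"

path = os.path.join(tempfile.mkdtemp(), "base")
data = b"pristine content"            # 16 bytes
with open(path, "wb") as f:
    f.write(data)
sha1 = hashlib.sha1(data).hexdigest()
```

The stored size buys the fast path; the checksum catches same-size corruption when you choose to pay for it.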
>> > The metadata location (per-dir, wc root, etc.) is typically per-wc, not
>> > per-versioned-dir. As the code is, we'll just set a key in the hash for
>> > every single directory in the wc, recursively. But we could store the
>> > information (whether the pristine dir is stored in-place or at the wc
>> > root) just for the wc root and let deeper dirs inherit its setting.
>> Whenever an operation hits a subdirectory, we don't want to scan
>> upwards to find the wc-root. We want to just *go* to where the data
>> is. Thus, we have pdh.base_dir.
> I thought we wouldn't need to scan because each dir would know what wc
> it is part of. (If we're going to put metadata at wc-root by default,
How would it know that?
In the default case, a versioned dir will NOT have a .svn
subdirectory. Also, there will be no central datastore. Everything
will be wc-root. So given an arbitrary directory, how do you determine
whether it is versioned at ALL, let alone where the wcroot is? Answer:
you traverse up the directory tree.
During a single svn command, you might hit some file in a subdirectory
as one of the arguments. The fact that it is "under" one of the
previously-discovered wcroots is NOT sufficient. You still have to
scan upwards to find a switched root, or an svn:external or somesuch.
> having a cheap way to find it seems natural.) And the scan would be in
> the hash, not in the disk (except for the first scan). (Lastly, isn't
> there a recursion a few layers above us that we can take advantage of?)
Once you fill the hash, then yes: it'll be fast. And as you smack new
directories, as they traverse upwards, you may run into a
previously-known directory and terminate your scan.
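The upward scan with a memoizing hash might look like this sketch (names are illustrative, not the pdh code):

```python
import os
import tempfile

def find_wcroot(path, known, is_wcroot):
    """Walk up from PATH until we find a wc root, or a directory whose
    answer is already cached in KNOWN; cache every step on the way out."""
    seen = []
    cur = os.path.abspath(path)
    root = None
    while True:
        if cur in known:
            root = known[cur]      # hit a previously-known directory
            break
        if is_wcroot(cur):
            root = cur
            break
        parent = os.path.dirname(cur)
        if parent == cur:          # hit the filesystem root: not versioned
            break
        seen.append(cur)
        cur = parent
    for p in seen + [cur]:         # lazily fill the hash for next time
        known[p] = root
    return root

top = tempfile.mkdtemp()
deep = os.path.join(top, "A", "B", "C")
os.makedirs(deep)
known = {}
root = find_wcroot(deep, known, lambda d: d == top)
```

The first lookup hits the disk all the way up; every directory it passed is then in the hash, so later lookups under the same wcroot terminate on the first cached ancestor.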
>> I also didn't envision proactively setting subdir values, but lazily
>> filling them in. Today, the code will create a new pdh structure since
>> we're doing per-versioned-dir base_dir values.
> Lazy filling or not, by the end of the checkout you'll have a hash key
> for every versioned dir in the wc just checked out... right? Couldn't
> memory usage, etc., become significant?
We've previously stated that an in-memory map of directories (and even
files! ugh) is okay. Take a look at the commit process some time: it
holds all that crap in memory. It is *never* okay to store a file's
contents in memory, but we've been okay with metadata.
In the long run, we'll have a bunch of directory paths mapping to a
single pdh structure, so it "shouldn't be too bad".
>> And note that we can't just have db.wc_root ... any given operation
>> could cover *multiple* wc roots (think of svn:externals or switched
>> subtrees).
> open_many() allows that too, no?
Correct. open_many() is intended to (in one pass) find all the common
pdh structures. Future requests into the API might need to look up
more, but that depends on how you use the API.
There is an open question: maybe just open a db handle, and then as
you start using it, it will open datastores as necessary, rather than
trying to do all that up front. It might simplify the API. (I don't
have a good argument right now for opening all the datastores up
front.)
>> Note that if the result is HOME or GIVEN_PATH, then all the hash
>> values would point to the same PDH structure. Well... at least for
>> a given WC. I envision an eventual possibility of different configs
>> for different WCs. For example, a WC on an NFS mount might be WC_ROOT,
>> while your personal WCs are all GIVEN_PATH.
> The part that surprises me is "eventual". Are you saying we won't be able
> to operate on metadata-at-.svn-dirs and on metadata-at-wc-root wc's in the
> same operation ('svn up wc1 wc2') or in subsequent operations under the
> same environment ('svn up wc1; svn up wc2') by default?
In one operation, some of the datastores may be wcroot, and others
centralized. I don't know what the commands will look like or the
configs or anything to get that kind of granularity, which is why I
said "eventual". The APIs definitely support metadata-per-wc to be
located anywhere, and to mix/match that at will.
Received on 2008-10-05 17:57:51 CEST