[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commits: r33357, r33358

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Fri, 3 Oct 2008 09:39:09 +0300 (Jerusalem Daylight Time)

Took me a few days to get to these...

r33357:
> Continued evoluation of the metadata schema.
>
> * subversion/libsvn_wc/wc-metadata.sql3:
> (NODE, NODE_CHANGES):
> - move the actual_size field to this table, from the TEXT_BASE table.
> two different nodes may share the same text base, but apply different
> EOL and/or keywords translations to it, thus having different sizes
> within the WORKING/ACTUAL trees.
> - place the checksum directly into this table, rather than a "text_id"
> field. due to operational issues, the text_id may not be a satisfiable
> foreign key (we may not want to insert a row).

Can you explain this scenario? When do you see TEXT_ID not a valid
foreign key *and* CHECKSUM a valid foreign key (at the same time)?
Before this change, the schema was "id PRIMARY KEY, checksum NOT NULL".

> thus, the ideal foreign
> key is "checksum" which *may* exist in the TEXT_BASE table.
> (TEXT_BASE):
> - renamed from BASE_TEXT. we use "text base" more often.
> - drop the "id" field, and key off of checksum instead.
> - compression is now optional (rather than planning on a particular
> value to mean "no compression").
> - stored_size is now optional to detect when the pristine has been
> properly stored (or not).
>

> -CREATE TABLE BASE_TEXT (
> - id INTEGER PRIMARY KEY AUTOINCREMENT,
> -
> +CREATE TABLE TEXT_BASE (
> /* ### the hash algorithm (MD5 or SHA-1) is encoded in this value */
> - checksum TEXT NOT NULL,
> -
> - /* ### enumerated values specifying type of compression */
> - compression INTEGER NOT NULL,
> + checksum TEXT NOT NULL PRIMARY KEY,
>

Okay. A certain text-base may appear more than once (if we change the
hash algorithm used), but this won't break anything. (The row may be
shared, so we'll have to create a new one rather than just modify the
CHECKSUM column's value.)

> + /* ### used to verify the pristine file is "proper". NULL if unknown,
> + ### and (thus) the pristine copy is incomplete/unusable. */
> + stored_size INTEGER,

Do we need this?

Most of the time the pristine files don't change under our feet. (When
they do, they're corrupted.) So we can skip the size check and always
do only the more expensive check (calculating the checksum). And if we
skip the size check, we won't need stored_size in the database.

?

> - refcount INTEGER NOT NULL
> + refcount INTEGER NOT NULL

This value can be derived from the other tables, can't it? (See? Who
said whitespace-only changes were useless?)

> );

r33358:
> --- trunk/subversion/libsvn_wc/wc_db.h 2008/09/30 04:39:20 33357
> +++ trunk/subversion/libsvn_wc/wc_db.h 2008/09/30 04:58:37 33358
> +/* ### @a new_refcount may be NULL */
> +svn_error_t *
> +svn_wc__db_pristine_decref(int *new_refcount,
> + svn_wc__db_pdh_t *pdh,
> + svn_checksum_t *checksum,

So, sometimes 'svn_checksum_t *checksum' is the primary key of the
operation, and sometimes it just describes a file passed in another arg.
Personally I'd find it easier to read if it were named differently
depending on its role -- e.g., here it might be named 'svn_checksum_t
*text_base_id'. That conveys more information than calling it 'checksum'.

> /*
> * wc_db.c : manipulating the administrative database
..
> static svn_error_t *
> get_pristine_fname(const char **path,
> svn_wc__db_pdh_t *pdh,
> svn_checksum_t *checksum,
> svn_boolean_t create_subdir,
> apr_pool_t *result_pool,
> apr_pool_t *scratch_pool)
> {
> const char *hexdigest = svn_checksum_to_cstring(checksum, scratch_pool);
> char subdir[3] = { 0 };
>
> /* We should have a valid checksum and (thus) a valid digest. */
> SVN_ERR_ASSERT(hexdigest != NULL);
>
> /* Get the first two characters of the digest, for the subdir. */
> subdir[0] = hexdigest[0];
> subdir[1] = hexdigest[1];
>

*nod* Works for any non-zero length of hexdigest[]. Nice :)

> svn_error_t *
> svn_wc__db_open_many(svn_wc__db_t **db,
> const apr_array_header_t *paths,
> svn_config_t *config,
> apr_pool_t *result_pool,
> apr_pool_t *scratch_pool)
> {
...
> for (i = 0; i < paths->nelts; ++i)
> {
> const char *path = APR_ARRAY_IDX(paths, i, const char *);
> svn_node_kind_t kind;
> svn_boolean_t special;
> svn_wc__db_pdh_t *pdh;
>
> /* If the file is special, then we need to refer to the encapsulating

Contrary to the comment, the boolean variable 'special' is not used in
the function.

> directory instead, rather than resolving through a symlink to a
> file or directory. */
> SVN_ERR(svn_io_check_special_path(path, &kind, &special, scratch_pool));
>
...
> }
>
>
> svn_error_t *
> svn_wc__db_pristine_dirhandle(svn_wc__db_pdh_t **pdh,
> svn_wc__db_t *db,
> const char *dirpath,
> apr_pool_t *result_pool,
> apr_pool_t *scratch_pool)
> {
> /* ### need to fix this up. we'll probably get called with a subdirectory
> ### of some dirpath that we opened originally. that means we probably
> ### won't have the subdir in the hash table. need to be able to
> ### incrementally grow the hash of per-dir structures. */
>
> *pdh = apr_hash_get(db->dir_data, dirpath, APR_HASH_KEY_STRING);
>
> return SVN_NO_ERROR;
> }

The metadata location (per-dir, wc root, etc.) is typically per-wc, not
per-versioned-dir. As the code is, we'll just set a key in the hash for
every single directory in the wc, recursively. But we could store the
information (whether the pristine dir is stored in-place or at the wc
root) just for the wc root and let deeper dirs inherit its setting.

Also, for the values, why use strings? An enum should cover the 2-3
most common cases ("in ./.svn", "in wc root" (we know where it is),
"in ~/.subversion").

>
>
> svn_error_t *
> svn_wc__db_pristine_decref(int *new_refcount,
> svn_wc__db_pdh_t *pdh,
> svn_checksum_t *checksum,
> apr_pool_t *scratch_pool)
> {
> return svn_error__malfunction(__FILE__, __LINE__, "Not implemented.");
> }

Interesting spelling of the error handling.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-10-03 08:39:28 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.