[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[PATCH] checksums for text bases (early review opportunity)

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-04-12 01:46:10 CEST

This isn't working yet, but its general shape is pretty clear, so I'm
posting it now for people to review & comment early.

Currently it can seg fault. The problem seems to occur in the xml
quoting code, because I mistakenly thought that APR's Md5 digests were
the long format (sixteen 7-bit chars or whatever it is). Apparently
they're not, and I need to learn how to convert to a friendlier format
for storage in the .svn/entries file anyway (independent of xml
escaping problem). Does this require apr_xlate.h, or is there some
easier way?

-K

--------------------------------------------------------------------------
Record a checksum for text base at commit time.

[THIS CHANGE IS NOT WORKING YET. FOR EAGER REVIEWERS ONLY.]

This is part of issue #549, but does not close the issue. We still
have to use the checksum. There are two occasions to use it:

   - On commit (to guarantee that the svndiff data sent is calculated
     against the right data)

   - On checkout/update (to guarantee that the svndiff received gets
     applied against the right data)

The commit case is an entirely client-side change, and imho should be
considered part of issue #549's milestone. The checkout/update case
should not, on the other hand: it requires changes to server, client,
and protocol, and anyway should be done at the same time as issue #649
(fs checksums), so the server doesn't recompute checksums every time
it updates a client.

* subversion/include/svn_wc.h
  (svn_wc_entry_t): Add new `checksum' field.
  (svn_wc_process_committed): Tweak documentation.

* subversion/libsvn_wc/entries.h
  (SVN_WC__ENTRY_ATTR_CHECKSUM): Remove comment about "not used", heh.
  (svn_wc__atts_to_entry): Redocument.

* subversion/libsvn_wc/entries.c
  (svn_wc__atts_to_entry, write_entry): Handle checksum.

* subversion/libsvn_wc/adm_ops.c
  (svn_wc_process_committed): Compute and store a checksum for the
  text base. Remove duplicate doc string, but preserve some of it as a
  comment inside the function. Do stricter checking of error type
  when trying to open an adm file, rather than assuming that only one
  kind of error could possibly happen. Change variable to `logtags',
  not `logtag', since it can (and usually does) hold multiple xml
  elements.

* subversion/include/svn_xml.h
  (svn_xml_make_open_tag): Redocument.

* subversion/include/svn_io.h, subversion/libsvn_subr/io.c
  (svn_io_file_checksum): New function.

Index: ./subversion/include/svn_io.h
===================================================================
--- ./subversion/include/svn_io.h
+++ ./subversion/include/svn_io.h Thu Apr 11 18:42:22 2002
@@ -199,6 +199,13 @@
                                            apr_pool_t *pool);
 
 
+/* Store an MD5 checksum of FILE's contents in *CHECKSUM_P. Allocate
+ *CHECKSUM_P in POOL, and use POOL for any temporary allocation. */
+svn_error_t *svn_io_file_checksum (const char **checksum_p,
+ const char *file,
+ apr_pool_t *pool);
+
+
 /* Return a POSIX-like file descriptor from FILE.
 
    We need this because on some platforms, notably Windows, apr_file_t
Index: ./subversion/include/svn_wc.h
===================================================================
--- ./subversion/include/svn_wc.h
+++ ./subversion/include/svn_wc.h Thu Apr 11 18:32:25 2002
@@ -177,6 +177,12 @@
   apr_time_t text_time; /* last up-to-date time for text contents */
   apr_time_t prop_time; /* last up-to-date time for properties */
 
+ /* Checksum. Optional; can be NULL for backwards compatibility. */
+ svn_stringbuf_t *checksum; /* Checksum for the untranslated base file.
+ (In theory, this is never longer
+ than MD5_DIGESTSIZE bytes, but no
+ need to encode that in the type.) */
+
   /* "Entry props" */
   svn_revnum_t cmt_rev; /* last revision this was changed */
   apr_time_t cmt_date; /* last date this was changed */
@@ -560,7 +566,7 @@
 
 /* This is of type `svn_ra_close_commit_func_t'.
 
- Bump each committed PATH to NEW_REVNUM, one at a time, after a
+ Bump a successfully committed absolute PATH to NEW_REVNUM after a
    commit succeeds. REV_DATE and REV_AUTHOR are the (server-side)
    date and author of the new revision; one or both may be NULL.
 
Index: ./subversion/include/svn_xml.h
===================================================================
--- ./subversion/include/svn_xml.h
+++ ./subversion/include/svn_xml.h Thu Apr 11 17:31:13 2002
@@ -211,11 +211,16 @@
 void svn_xml_make_header (svn_stringbuf_t **str, apr_pool_t *pool);
 
 
-/* Makes an XML open tag named TAGNAME. Varargs are used to specify a
- NULL-terminated list of alternating const char *Key and
- svn_stringbuf_t *Val.
+/* Store a new xml tag TAGNAME in *STR.
 
- STYLE must be one of the enumerated styles in svn_xml_open_tag_style. */
+ If STR is NULL, allocate *STR in POOL; else append the new tag to
+ *STR, allocating in STR's pool
+
+ Take the tag's attributes from varargs, a NULL-terminated list of
+ alternating const char *Key and svn_stringbuf_t *Val. Do
+ xml-escaping on each Val.
+
+ STYLE is one of the enumerated styles in svn_xml_open_tag_style. */
 void svn_xml_make_open_tag (svn_stringbuf_t **str,
                             apr_pool_t *pool,
                             enum svn_xml_open_tag_style style,
Index: ./subversion/libsvn_wc/entries.c
===================================================================
--- ./subversion/libsvn_wc/entries.c
+++ ./subversion/libsvn_wc/entries.c Thu Apr 11 16:47:38 2002
@@ -335,6 +335,14 @@
       }
   }
 
+ /* Checksum. */
+ {
+ entry->checksum = apr_hash_get (atts, SVN_WC__ENTRY_ATTR_CHECKSUM,
+ APR_HASH_KEY_STRING);
+ if (entry->checksum)
+ *modify_flags |= SVN_WC__ENTRY_MODIFY_CHECKSUM;
+ }
+
   /* Setup last-committed values. */
   {
     svn_stringbuf_t *cmt_datestr, *cmt_revstr;
@@ -868,6 +876,13 @@
       const char *timestr = svn_time_to_nts (entry->prop_time, pool);
       apr_hash_set (atts, SVN_WC__ENTRY_ATTR_PROP_TIME, APR_HASH_KEY_STRING,
                     svn_stringbuf_create (timestr, pool));
+ }
+
+ /* Checksum */
+ if (entry->checksum)
+ {
+ apr_hash_set (atts, SVN_WC__ENTRY_ATTR_CHECKSUM,
+ entry->checksum->len, entry->checksum->data);
     }
 
   /* Last-commit Stuff */
Index: ./subversion/libsvn_wc/entries.h
===================================================================
--- ./subversion/libsvn_wc/entries.h
+++ ./subversion/libsvn_wc/entries.h Thu Apr 11 16:19:52 2002
@@ -48,7 +48,7 @@
 #define SVN_WC__ENTRY_ATTR_KIND "kind"
 #define SVN_WC__ENTRY_ATTR_TEXT_TIME "text-time"
 #define SVN_WC__ENTRY_ATTR_PROP_TIME "prop-time"
-#define SVN_WC__ENTRY_ATTR_CHECKSUM "checksum" /* ### not used */
+#define SVN_WC__ENTRY_ATTR_CHECKSUM "checksum"
 #define SVN_WC__ENTRY_ATTR_SCHEDULE "schedule"
 #define SVN_WC__ENTRY_ATTR_COPIED "copied"
 #define SVN_WC__ENTRY_ATTR_COPYFROM_URL "copyfrom-url"
@@ -77,11 +77,12 @@
                                     apr_pool_t *pool);
 
 
-/* Create a new entry from the attributes hash ATTS. Use POOL for all
- allocations, EXCEPT that **NEW_ENTRY->attributes will simply point
- to the ATTS passed to the function -- this hash is not copied into
- a new pool! MODIFY_FLAGS are set to the fields that were
- explicitly modified by the attribute hash. */
+/* Set *NEW_ENTRY to a new entry, taking attributes from ATTS.
+ Allocate the entry itself in POOL, but don't copy attributes into
+ POOL. Instead, (*NEW_ENTRY)->attributes and any allocated members
+ in *NEW_ENTRY will refer directly to ATTS and its values.
+
+ Set MODIFY_FLAGS to reflect the fields that were present in ATTS. */
 svn_error_t *svn_wc__atts_to_entry (svn_wc_entry_t **new_entry,
                                     apr_uint32_t *modify_flags,
                                     apr_hash_t *atts,
Index: ./subversion/libsvn_wc/adm_ops.c
===================================================================
--- ./subversion/libsvn_wc/adm_ops.c
+++ ./subversion/libsvn_wc/adm_ops.c Thu Apr 11 18:26:12 2002
@@ -192,17 +192,6 @@
 }
 
 
-
-/* Process an absolute PATH that has just been successfully committed.
-
- Specifically, its working revision will be set to NEW_REVNUM; if
- REV_DATE and REV_AUTHOR are both non-NULL, then three entry values
- will be set (overwritten): 'committed-rev', 'committed-date',
- 'last-author'.
-
- If RECURSE is true (assuming PATH is a directory), this post-commit
- processing will happen recursively down from PATH.
-*/
 svn_error_t *
 svn_wc_process_committed (svn_stringbuf_t *path,
                           svn_boolean_t recurse,
@@ -213,9 +202,15 @@
 {
   svn_error_t *err;
   apr_status_t apr_err;
- svn_stringbuf_t *log_parent, *logtag, *basename;
+ svn_stringbuf_t *log_parent, *logtags, *basename;
   apr_file_t *log_fp = NULL;
   char *revstr = apr_psprintf (pool, "%ld", new_revnum);
+ const char *checksum = NULL;
+
+ /* Set PATH's working revision to NEW_REVNUM; if REV_DATE and
+ REV_AUTHOR are both non-NULL, then set the 'committed-rev',
+ 'committed-date', and 'last-author' entry values; and set the
+ checksum if a file. */
 
   /* Write a log file in the adm dir of path. */
 
@@ -225,10 +220,12 @@
   err = svn_wc__open_adm_file (&log_fp, log_parent, SVN_WC__ADM_LOG,
                                (APR_WRITE | APR_APPEND | APR_CREATE),
                                pool);
- if (err)
+ if (err && (err->apr_err == APR_ENOTDIR))
     {
       /* (Ah, PATH must be a file. So create a logfile in its
- parent instead.) */
+ parent instead.) */
+
+ svn_stringbuf_t *tmp_text_base;
 
       svn_error_clear_all (err);
       svn_path_split (path, &log_parent, &basename, pool);
@@ -239,10 +236,26 @@
                                       (APR_WRITE|APR_APPEND|APR_CREATE),
                                       pool));
 
+ /* We know that the new text base is sitting in the adm tmp area
+ by now, because the commit succeeded. */
+ tmp_text_base = svn_wc__text_base_path (path, TRUE, pool);
+
+ /* It would be more efficient to compute the checksum as part of
+ some other operation that has to process all the bytes anyway
+ (such as copying or translation). But that would make a lot
+ of other code more complex, since the relevant copy and/or
+ translation operations happened elsewhere, a long time ago.
+ If we were to obtain the checksum then/there, we'd still have
+ to somehow preserve it until now/here, which would result in
+ unexpected and hard-to-maintain dependencies. Ick.
+
+ So instead we just do the checksum from scratch. Ick. */
+ svn_io_file_checksum (&checksum, tmp_text_base->data, pool);
+
       /* Oh, and recursing at this point isn't really sensible. */
       recurse = FALSE;
     }
- else
+ else if (! err)
     {
       /* PATH must be a dir */
       svn_stringbuf_t *pdir;
@@ -267,18 +280,21 @@
       tmp_entry.revision = new_revnum;
       SVN_ERR (svn_wc__entry_modify (pdir, basename, &tmp_entry,
                                      SVN_WC__ENTRY_MODIFY_REVISION, pool));
- }
+ }
+ else /* got an unexpected error, return it */
+ return err;
 
- logtag = svn_stringbuf_create ("", pool);
+ logtags = svn_stringbuf_create ("", pool);
 
   /* Append a log command to set (overwrite) the 'committed-rev',
- 'committed-date', and 'last-author' attributes in the entry.
+ 'committed-date', 'last-author', and possibly `checksum'
+ attributes in the entry.
 
      Note: it's important that this log command come *before* the
      LOG_COMMITTED command, because log_do_committed() might actually
      remove the entry! */
   if (rev_date && rev_author)
- svn_xml_make_open_tag (&logtag, pool, svn_xml_self_closing,
+ svn_xml_make_open_tag (&logtags, pool, svn_xml_self_closing,
                            SVN_WC__LOG_MODIFY_ENTRY,
                            SVN_WC__LOG_ATTR_NAME, basename,
                            SVN_WC__ENTRY_ATTR_CMT_REV,
@@ -287,13 +303,14 @@
                            svn_stringbuf_create (rev_date, pool),
                            SVN_WC__ENTRY_ATTR_CMT_AUTHOR,
                            svn_stringbuf_create (rev_author, pool),
+ checksum ? SVN_WC__ENTRY_ATTR_CHECKSUM : NULL,
+ checksum ? checksum : NULL,
                            NULL);
 
-
   /* Regardless of whether it's a file or dir, the "main" logfile
      contains a command to bump the revision attribute (and
      timestamp.) */
- svn_xml_make_open_tag (&logtag, pool, svn_xml_self_closing,
+ svn_xml_make_open_tag (&logtags, pool, svn_xml_self_closing,
                          SVN_WC__LOG_COMMITTED,
                          SVN_WC__LOG_ATTR_NAME, basename,
                          SVN_WC__LOG_ATTR_REVISION,
@@ -301,7 +318,7 @@
                          NULL);
 
 
- apr_err = apr_file_write_full (log_fp, logtag->data, logtag->len, NULL);
+ apr_err = apr_file_write_full (log_fp, logtags->data, logtags->len, NULL);
   if (apr_err)
     {
       apr_file_close (log_fp);
Index: ./subversion/libsvn_subr/io.c
===================================================================
--- ./subversion/libsvn_subr/io.c
+++ ./subversion/libsvn_subr/io.c Thu Apr 11 18:43:09 2002
@@ -28,6 +28,7 @@
 #include <apr_strings.h>
 #include <apr_thread_proc.h>
 #include <apr_portable.h>
+#include <apr_md5.h>
 #include "svn_types.h"
 #include "svn_path.h"
 #include "svn_string.h"
@@ -385,6 +386,60 @@
     *different_p = FALSE;
   else
     *different_p = TRUE;
+
+ return SVN_NO_ERROR;
+}
+
+
+svn_error_t *
+svn_io_file_checksum (const char **checksum_p,
+ const char *file,
+ apr_pool_t *pool)
+{
+ char *checksum = NULL;
+ struct apr_md5_ctx_t context;
+ apr_file_t *f = NULL;
+ apr_status_t apr_err;
+ unsigned char digest[MD5_DIGESTSIZE];
+ char buf[BUFSIZ]; /* What's a good size for a read chunk? */
+
+ /* ### The apr_md5 functions return apr_status_t, but they only
+ return success, and really, what could go wrong? So below, we
+ ignore their return values. */
+
+ apr_md5_init (&context);
+
+ apr_err = apr_file_open (&f, file, APR_READ, APR_OS_DEFAULT, pool);
+ if (apr_err)
+ return svn_error_createf
+ (apr_err, 0, NULL, pool,
+ "svn_io_file_checksum: error opening '%s' for reading", file);
+
+ do {
+ apr_size_t len = BUFSIZ;
+
+ apr_err = apr_file_read (f, buf, &len);
+
+ if ((! (APR_STATUS_IS_SUCCESS (apr_err))) && (apr_err != APR_EOF))
+ return svn_error_createf
+ (apr_err, 0, NULL, pool,
+ "svn_io_file_checksum: error reading from '%s'", file);
+
+ apr_md5_update (&context, buf, len);
+
+ } while (apr_err != APR_EOF);
+
+ apr_err = apr_file_close (f);
+ if (apr_err)
+ return svn_error_createf
+ (apr_err, 0, NULL, pool,
+ "svn_io_file_checksum: error closing '%s'", file);
+
+ checksum = apr_pcalloc (pool, (MD5_DIGESTSIZE + 1));
+ apr_md5_final (digest, &context);
+ memcpy (checksum, digest, MD5_DIGESTSIZE);
+
+ *checksum_p = checksum;
 
   return SVN_NO_ERROR;
 }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Apr 12 01:41:31 2002

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.