[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] checksums for text bases (early review opportunity)

From: <cmpilato_at_collab.net>
Date: 2002-04-12 17:43:41 CEST

Karl Fogel <kfogel@newton.ch.collab.net> writes:

> This isn't working yet, but its general shape is pretty clear, so I'm
> posting it now for people to review & comment early.
>
> Currently it can seg fault. The problem seems to occur in the xml
> quoting code, because I mistakenly thought that APR's Md5 digests were
> the long format (sixteen 7-bit chars or whatever it is). Apparently
> they're not, and I need to learn how to convert to a friendlier format
> for storage in the .svn/entries file anyway (independent of xml
> escaping problem). Does this require apr_xlate.h, or is there some
> easier way?

Here, try this on for size (there were a number of boo-boos in your
code that caused segfaults, it was only a matter of which was the
first you hit while running the code :-) and I've changed the spec a
bit to claim that the 'checksum' member of svn_wc_entry_t is actually
a base64-encoded version of the MD5 checksum. Of course, I did the
encoding as well.

--------------------------------------------------------------------------
Record a checksum for text base at commit time. A graphic novel by
Karl Fogel <kfogel@collab.net> (Foreward written by C. Michael Pilato
<cmpilato@collab.net>)

--
This is part of issue #549, but does not close the issue.  We still
have to use the checksum.  There are two occasions to use it:
   - On commit (to guarantee that the svndiff data sent is calculated
     against the right data)
   - On checkout/update (to guarantee that the svndiff received gets
     applied against the right data)
The commit case is an entirely client-side change, and probably should
be considered part of issue #549's milestone.  The checkout/update
case should not, on the other hand: it requires changes to server,
client, and protocol, and anyway should be done at the same time as
issue #649 (fs checksums), so the server doesn't recompute checksums
every time it updates a client.
* subversion/include/svn_wc.h
  (svn_wc_entry_t): Add new `checksum' field.
  (svn_wc_process_committed): Tweak documentation.
  (struct svn_wc_close_commit_baton): Removed (obsolete).
* subversion/libsvn_wc/entries.h 
  (SVN_WC__ENTRY_ATTR_CHECKSUM): Remove comment about "not used", heh.
  (svn_wc__atts_to_entry): Redocument.
* subversion/libsvn_wc/entries.c
  (svn_wc__atts_to_entry, write_entry, fold_entry): Handle checksum.
* subversion/libsvn_wc/adm_ops.c
  (svn_wc_process_committed): Remove duplicate doc string, but
  preserve some of it as a comment inside the function.  Do stricter
  checking of error type when trying to open an adm file, rather than
  assuming that only one kind of error could possibly happen.  Change
  variable to `logtags', not `logtag', since it can (and usually does)
  hold multiple xml elements.
* subversion/include/svn_xml.h
  (svn_xml_make_open_tag): Redocument.
* subversion/include/svn_io.h, subversion/libsvn_subr/io.c
  (svn_io_file_checksum): New function.
Index: ./subversion/include/svn_io.h
===================================================================
--- ./subversion/include/svn_io.h
+++ ./subversion/include/svn_io.h	Fri Apr 12 09:54:23 2002
@@ -199,6 +199,14 @@
                                            apr_pool_t *pool);
 
 
+/* Store a base64-encoded MD5 checksum of FILE's contents in
+   **CHECKSUM_P.  Allocate CHECKSUM_P in POOL, and use POOL for any
+   *temporary allocation. */
+svn_error_t *svn_io_file_checksum (svn_stringbuf_t **checksum_p,
+                                   const char *file,
+                                   apr_pool_t *pool);
+
+
 /* Return a POSIX-like file descriptor from FILE.
 
    We need this because on some platforms, notably Windows, apr_file_t
Index: ./subversion/include/svn_wc.h
===================================================================
--- ./subversion/include/svn_wc.h
+++ ./subversion/include/svn_wc.h	Fri Apr 12 09:28:44 2002
@@ -177,6 +177,10 @@
   apr_time_t text_time;          /* last up-to-date time for text contents */
   apr_time_t prop_time;          /* last up-to-date time for properties */
 
+  /* Checksum.  Optional; can be NULL for backwards compatibility. */
+  svn_stringbuf_t *checksum;     /* base64-encoded checksum for the
+                                    untranslated base file. */
+
   /* "Entry props" */
   svn_revnum_t cmt_rev;          /* last revision this was changed */
   apr_time_t cmt_date;           /* last date this was changed */
@@ -545,22 +549,7 @@
 
 /*** Commits. ***/
 
-/* The RA layer needs 3 functions when doing a commit: */
-
-/* Publically declared, so libsvn_client can pass it off to the RA
-   layer to use with any of the next three functions. */
-struct svn_wc_close_commit_baton
-{
-  /* The "prefix" path that must be prepended to each target that
-     comes in here.  It's the original path that the user specified to
-     the `svn commit' command. */
-  svn_stringbuf_t *prefix_path;
-};
-
-
-/* This is of type `svn_ra_close_commit_func_t'.
-
-   Bump each committed PATH to NEW_REVNUM, one at a time, after a
+/* Bump a successfully committed absolute PATH to NEW_REVNUM after a
    commit succeeds.  REV_DATE and REV_AUTHOR are the (server-side)
    date and author of the new revision; one or both may be NULL.
 
Index: ./subversion/include/svn_xml.h
===================================================================
--- ./subversion/include/svn_xml.h
+++ ./subversion/include/svn_xml.h	Fri Apr 12 08:14:56 2002
@@ -211,11 +211,16 @@
 void svn_xml_make_header (svn_stringbuf_t **str, apr_pool_t *pool);
 
 
-/* Makes an XML open tag named TAGNAME.  Varargs are used to specify a
-   NULL-terminated list of alternating const char *Key and
-   svn_stringbuf_t *Val.
+/* Store a new xml tag TAGNAME in *STR.
 
-   STYLE must be one of the enumerated styles in svn_xml_open_tag_style. */
+   If STR is NULL, allocate *STR in POOL; else append the new tag to
+   *STR, allocating in STR's pool
+
+   Take the tag's attributes from varargs, a NULL-terminated list of
+   alternating const char *Key and svn_stringbuf_t *Val.  Do
+   xml-escaping on each Val.
+
+   STYLE is one of the enumerated styles in svn_xml_open_tag_style. */
 void svn_xml_make_open_tag (svn_stringbuf_t **str,
 			    apr_pool_t *pool,
 			    enum svn_xml_open_tag_style style,
Index: ./subversion/libsvn_wc/entries.c
===================================================================
--- ./subversion/libsvn_wc/entries.c
+++ ./subversion/libsvn_wc/entries.c	Fri Apr 12 10:03:44 2002
@@ -335,6 +335,14 @@
       }
   }
 
+  /* Checksum. */
+  {
+    entry->checksum = apr_hash_get (atts, SVN_WC__ENTRY_ATTR_CHECKSUM,
+                                    APR_HASH_KEY_STRING);
+    if (entry->checksum)
+      *modify_flags |= SVN_WC__ENTRY_MODIFY_CHECKSUM;
+  }
+
   /* Setup last-committed values. */
   {
     svn_stringbuf_t *cmt_datestr, *cmt_revstr;
@@ -870,6 +878,11 @@
                     svn_stringbuf_create (timestr, pool));
     }
 
+  /* Checksum */
+  if (entry->checksum)
+    apr_hash_set (atts, SVN_WC__ENTRY_ATTR_CHECKSUM, APR_HASH_KEY_STRING,
+                  entry->checksum);
+
   /* Last-commit Stuff */
   if (SVN_IS_VALID_REVNUM (entry->cmt_rev))
     apr_hash_set (atts, SVN_WC__ENTRY_ATTR_CMT_REV, APR_HASH_KEY_STRING,
@@ -1071,6 +1084,12 @@
   if (modify_flags & SVN_WC__ENTRY_MODIFY_SCHEDULE)
     cur_entry->schedule = entry->schedule;
 
+  /* Checksum */
+  if (modify_flags & SVN_WC__ENTRY_MODIFY_CHECKSUM)
+    cur_entry->checksum = entry->checksum
+                          ? svn_stringbuf_dup (entry->checksum, pool) 
+                          : NULL;
+  
   /* Copy-related stuff */
   if (modify_flags & SVN_WC__ENTRY_MODIFY_COPIED)
     cur_entry->copied = entry->copied;
Index: ./subversion/libsvn_wc/entries.h
===================================================================
--- ./subversion/libsvn_wc/entries.h
+++ ./subversion/libsvn_wc/entries.h	Fri Apr 12 08:14:56 2002
@@ -48,7 +48,7 @@
 #define SVN_WC__ENTRY_ATTR_KIND          "kind"
 #define SVN_WC__ENTRY_ATTR_TEXT_TIME     "text-time"
 #define SVN_WC__ENTRY_ATTR_PROP_TIME     "prop-time"
-#define SVN_WC__ENTRY_ATTR_CHECKSUM      "checksum"     /* ### not used */
+#define SVN_WC__ENTRY_ATTR_CHECKSUM      "checksum"
 #define SVN_WC__ENTRY_ATTR_SCHEDULE      "schedule"
 #define SVN_WC__ENTRY_ATTR_COPIED        "copied"
 #define SVN_WC__ENTRY_ATTR_COPYFROM_URL  "copyfrom-url"
@@ -77,11 +77,12 @@
                                     apr_pool_t *pool);
 
 
-/* Create a new entry from the attributes hash ATTS.  Use POOL for all
-   allocations, EXCEPT that **NEW_ENTRY->attributes will simply point
-   to the ATTS passed to the function -- this hash is not copied into
-   a new pool!  MODIFY_FLAGS are set to the fields that were
-   explicitly modified by the attribute hash.  */
+/* Set *NEW_ENTRY to a new entry, taking attributes from ATTS.
+   Allocate the entry itself in POOL, but don't copy attributes into
+   POOL.  Instead, (*NEW_ENTRY)->attributes and any allocated members
+   in *NEW_ENTRY will refer directly to ATTS and its values.
+
+   Set MODIFY_FLAGS to reflect the fields that were present in ATTS. */
 svn_error_t *svn_wc__atts_to_entry (svn_wc_entry_t **new_entry,
                                     apr_uint32_t *modify_flags,
                                     apr_hash_t *atts,
Index: ./subversion/libsvn_wc/adm_ops.c
===================================================================
--- ./subversion/libsvn_wc/adm_ops.c
+++ ./subversion/libsvn_wc/adm_ops.c	Fri Apr 12 10:12:44 2002
@@ -192,17 +192,6 @@
 }
 
 
-
-/* Process an absolute PATH that has just been successfully committed.
-   
-   Specifically, its working revision will be set to NEW_REVNUM;  if
-   REV_DATE and REV_AUTHOR are both non-NULL, then three entry values
-   will be set (overwritten):  'committed-rev', 'committed-date',
-   'last-author'. 
-
-   If RECURSE is true (assuming PATH is a directory), this post-commit
-   processing will happen recursively down from PATH. 
-*/
 svn_error_t *
 svn_wc_process_committed (svn_stringbuf_t *path,
                           svn_boolean_t recurse,
@@ -213,9 +202,15 @@
 {
   svn_error_t *err;
   apr_status_t apr_err;
-  svn_stringbuf_t *log_parent, *logtag, *basename;
+  svn_stringbuf_t *log_parent, *logtags, *basename;
   apr_file_t *log_fp = NULL;
   char *revstr = apr_psprintf (pool, "%ld", new_revnum);
+  svn_stringbuf_t *checksum = NULL;
+
+  /* Set PATH's working revision to NEW_REVNUM; if REV_DATE and
+     REV_AUTHOR are both non-NULL, then set the 'committed-rev',
+     'committed-date', and 'last-author' entry values; and set the
+     checksum if a file. */
 
   /* Write a log file in the adm dir of path. */
 
@@ -228,7 +223,9 @@
   if (err)
     {
       /* (Ah, PATH must be a file.  So create a logfile in its
-         parent instead.) */      
+         parent instead.) */
+
+      svn_stringbuf_t *tmp_text_base;
 
       svn_error_clear_all (err);
       svn_path_split (path, &log_parent, &basename, pool);
@@ -239,10 +236,26 @@
                                       (APR_WRITE|APR_APPEND|APR_CREATE),
                                       pool));
 
+      /* We know that the new text base is sitting in the adm tmp area
+         by now, because the commit succeeded. */
+      tmp_text_base = svn_wc__text_base_path (path, TRUE, pool);
+
+      /* It would be more efficient to compute the checksum as part of
+         some other operation that has to process all the bytes anyway
+         (such as copying or translation).  But that would make a lot
+         of other code more complex, since the relevant copy and/or
+         translation operations happened elsewhere, a long time ago.
+         If we were to obtain the checksum then/there, we'd still have
+         to somehow preserve it until now/here, which would result in
+         unexpected and hard-to-maintain dependencies.  Ick.
+
+         So instead we just do the checksum from scratch.  Ick. */
+      svn_io_file_checksum (&checksum, tmp_text_base->data, pool);
+
       /* Oh, and recursing at this point isn't really sensible. */
       recurse = FALSE;
     }
-  else
+  else if (! err)
     {
       /* PATH must be a dir */
       svn_stringbuf_t *pdir;
@@ -267,18 +280,21 @@
       tmp_entry.revision = new_revnum;
       SVN_ERR (svn_wc__entry_modify (pdir, basename, &tmp_entry, 
                                      SVN_WC__ENTRY_MODIFY_REVISION, pool));
-    }
+    } 
+  else   /* got an unexpected error, return it */
+    return err;
 
-  logtag = svn_stringbuf_create ("", pool);
+  logtags = svn_stringbuf_create ("", pool);
 
   /* Append a log command to set (overwrite) the 'committed-rev',
-     'committed-date', and 'last-author' attributes in the entry.
+     'committed-date', 'last-author', and possibly `checksum'
+     attributes in the entry.
 
      Note: it's important that this log command come *before* the
      LOG_COMMITTED command, because log_do_committed() might actually
      remove the entry! */
   if (rev_date && rev_author)
-    svn_xml_make_open_tag (&logtag, pool, svn_xml_self_closing,
+    svn_xml_make_open_tag (&logtags, pool, svn_xml_self_closing,
                            SVN_WC__LOG_MODIFY_ENTRY,
                            SVN_WC__LOG_ATTR_NAME, basename,
                            SVN_WC__ENTRY_ATTR_CMT_REV,
@@ -289,11 +305,18 @@
                            svn_stringbuf_create (rev_author, pool),
                            NULL);
 
+  if (checksum)
+    svn_xml_make_open_tag (&logtags, pool, svn_xml_self_closing,
+                           SVN_WC__LOG_MODIFY_ENTRY,
+                           SVN_WC__LOG_ATTR_NAME, basename,
+                           SVN_WC__ENTRY_ATTR_CHECKSUM,
+                           checksum,
+                           NULL);
 
   /* Regardless of whether it's a file or dir, the "main" logfile
      contains a command to bump the revision attribute (and
      timestamp.)  */
-  svn_xml_make_open_tag (&logtag, pool, svn_xml_self_closing,
+  svn_xml_make_open_tag (&logtags, pool, svn_xml_self_closing,
                          SVN_WC__LOG_COMMITTED,
                          SVN_WC__LOG_ATTR_NAME, basename,
                          SVN_WC__LOG_ATTR_REVISION, 
@@ -301,7 +324,7 @@
                          NULL);
 
 
-  apr_err = apr_file_write_full (log_fp, logtag->data, logtag->len, NULL);
+  apr_err = apr_file_write_full (log_fp, logtags->data, logtags->len, NULL);
   if (apr_err)
     {
       apr_file_close (log_fp);
Index: ./subversion/libsvn_subr/io.c
===================================================================
--- ./subversion/libsvn_subr/io.c
+++ ./subversion/libsvn_subr/io.c	Fri Apr 12 10:30:14 2002
@@ -28,11 +28,13 @@
 #include <apr_strings.h>
 #include <apr_thread_proc.h>
 #include <apr_portable.h>
+#include <apr_md5.h>
 #include "svn_types.h"
 #include "svn_path.h"
 #include "svn_string.h"
 #include "svn_error.h"
 #include "svn_io.h"
+#include "svn_base64.h"
 #include "svn_pools.h"
 #include "svn_private_config.h" /* for SVN_CLIENT_DIFF */
 
@@ -389,6 +391,64 @@
   return SVN_NO_ERROR;
 }
 
+
+svn_error_t *
+svn_io_file_checksum (svn_stringbuf_t **checksum_p,
+                      const char *file,
+                      apr_pool_t *pool)
+{
+  struct apr_md5_ctx_t context;
+  apr_file_t *f = NULL;
+  apr_status_t apr_err;
+  unsigned char digest[MD5_DIGESTSIZE];
+  char buf[BUFSIZ];  /* What's a good size for a read chunk? */
+  svn_stringbuf_t *md5str;
+
+  /* ### The apr_md5 functions return apr_status_t, but they only
+     return success, and really, what could go wrong?  So below, we
+     ignore their return values. */
+
+  apr_md5_init (&context);
+
+  apr_err = apr_file_open (&f, file, APR_READ, APR_OS_DEFAULT, pool);
+  if (apr_err)
+    return svn_error_createf
+      (apr_err, 0, NULL, pool,
+       "svn_io_file_checksum: error opening '%s' for reading", file);
+  
+  do { 
+    apr_size_t len = BUFSIZ;
+
+    apr_err = apr_file_read (f, buf, &len);
+
+    if ((! (APR_STATUS_IS_SUCCESS (apr_err))) && (apr_err != APR_EOF))
+      return svn_error_createf
+        (apr_err, 0, NULL, pool,
+         "svn_io_file_checksum: error reading from '%s'", file);
+
+    apr_md5_update (&context, buf, len);
+
+  } while (apr_err != APR_EOF);
+
+  apr_err = apr_file_close (f);
+  if (apr_err)
+    return svn_error_createf
+      (apr_err, 0, NULL, pool,
+       "svn_io_file_checksum: error closing '%s'", file);
+
+  apr_md5_final (digest, &context);
+  md5str = svn_stringbuf_ncreate (digest, MD5_DIGESTSIZE, pool);
+  *checksum_p = svn_base64_encode_string (md5str, pool);
+  
+  /* ### Our base64-encoding routines append a final newline if any
+     data was created at all, so let's hack that off. */
+  if ((*checksum_p)->len)
+    {
+      (*checksum_p)->len--;
+      (*checksum_p)->data[(*checksum_p)->len] = 0;
+    }
+  return SVN_NO_ERROR;
+}
 
 
 /*** Permissions and modes. ***/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Apr 12 17:46:30 2002

This is an archived mail posted to the Subversion Dev mailing list.