
Re: dir/file plus more (was: Re: Are svn_fs_dir_t and svn_fs_file_t worth it?)

From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2000-11-09 04:25:15 CET

> *) svn_fs__file_from_skel() appears to copy the file contents into memory.
> This isn't going to scale to multi-megabyte (or gigabyte!) files. It
> would be good to have a way that directly maps from DB3 to a seek/read
> function. The ideal interface for me will allow me to seek to a point in
> the "stream" and then read "n" bytes from it. Preferably, the read would
> not allocate memory (say, if DB3 mmap'd the record, then I'd just get a
> pointer into that mmap).
>
> Basically, I'm looking at a case where we have a gigabyte file stored
> into Subversion. The client requests bytes 100-120 and bytes
> 2000000-2000100. For optimum Apache behavior, I could get just those
> bytes without any memory allocation beyond what mmap does, or at most
> a read from a file descriptor into allocated memory of just the
> right size.
>
> What kinds of mechanisms does DB3 support for reading large content? Can
> you give me a pointer to the doc/API? (so that I can be a bit more
> intelligent in my request here)

Berkeley DB does have mechanisms for reading and writing partial
records; I've included the page from the manual below. However, these
don't really solve the problem. If we have a billion-byte file stored
as a delta against another billion-byte file, we need our delta
application routines to support random access. That should be fun.

Berkeley DB Reference Guide: Access Methods

                    Partial record storage and retrieval

It is possible to both store and retrieve parts of data items in all
Berkeley DB access methods. This is done by specifying the DB_DBT_PARTIAL
flag to the DBT structure passed to the Berkeley DB interface.

The DB_DBT_PARTIAL flag works together with two elements of the DBT
structure, dlen and doff. The value of dlen is the number of bytes of the
record in which the application is interested. The value of doff is the
offset from the beginning of the data item where those bytes start.

For example, if the data item were ABCDEFGHIJKL, a doff value of 3 would
indicate that the bytes of interest started at D, and a dlen value of 4
would indicate that the bytes of interest were DEFG.

When retrieving a data item from a database, the dlen bytes starting doff
bytes from the beginning of the record are returned, as if they comprised
the entire record. If any or all of the specified bytes do not exist in the
record, the retrieval is still successful and the existing bytes or nul
bytes are returned.
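
As a rough illustration of the retrieval rule, here is a minimal sketch in
plain C that models a partial read on an in-memory record. It does not call
Berkeley DB; model_partial_get and its parameters are hypothetical names
chosen for this example, with doff and dlen mirroring the DBT elements
described above. One assumption to note: when the requested range runs past
the end of the record, the sketch returns only the bytes that exist and does
not model the "nul bytes" case the manual mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a partial read; NOT the Berkeley DB API.
   Copies up to dlen bytes of rec, starting at offset doff, into out.
   Returns the number of bytes copied (0 if doff is at or past the end).
   The caller must supply an out buffer of at least dlen bytes. */
size_t model_partial_get(const char *rec, size_t rec_len,
                         size_t doff, size_t dlen, char *out)
{
    size_t n;

    assert(rec != NULL && out != NULL);

    if (doff >= rec_len)
        return 0;               /* nothing exists at that offset */

    n = rec_len - doff;         /* bytes available from doff onward */
    if (n > dlen)
        n = dlen;               /* never return more than requested */

    memcpy(out, rec + doff, n);
    return n;
}
```

With the record ABCDEFGHIJKL, a doff of 3 and a dlen of 4, the function
copies DEFG into out and returns 4, matching the example in the text.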

When storing a data item into the database, the dlen bytes starting doff
bytes from the beginning of the specified key's data record are replaced by
the data specified by the data and size structure elements. If dlen is
smaller than size, the record will grow, and if dlen is larger than size,
the record will shrink. If the specified bytes do not exist, the record will
be extended using nul bytes as necessary, and the store call will still
succeed.

The following are various examples of the put case for the DB_DBT_PARTIAL
flag. In all examples, the initial data item is 20 bytes in length:

ABCDEFGHIJ0123456789

  1. size = 20
          doff = 0
          dlen = 20
          data = abcdefghijabcdefghij

          Result: The 20 bytes at offset 0 are replaced by the 20 bytes of data,
          i.e., the entire record is replaced.

          ABCDEFGHIJ0123456789 -> abcdefghijabcdefghij

  2. size = 10
          doff = 20
          dlen = 0
          data = abcdefghij

          Result: The 0 bytes at offset 20 are replaced by the 10 bytes of data,
          i.e., the record is extended by 10 bytes.

          ABCDEFGHIJ0123456789 -> ABCDEFGHIJ0123456789abcdefghij

  3. size = 10
          doff = 10
          dlen = 5
          data = abcdefghij

          Result: The 5 bytes at offset 10 are replaced by the 10 bytes of data.

          ABCDEFGHIJ0123456789 -> ABCDEFGHIJabcdefghij56789

  4. size = 10
          doff = 10
          dlen = 0
          data = abcdefghij

          Result: The 0 bytes at offset 10 are replaced by the 10 bytes of data,
          i.e., 10 bytes are inserted into the record.

          ABCDEFGHIJ0123456789 -> ABCDEFGHIJabcdefghij0123456789

  5. size = 10
          doff = 2
          dlen = 15
          data = abcdefghij

          Result: The 15 bytes at offset 2 are replaced by the 10 bytes of data.

          ABCDEFGHIJ0123456789 -> ABabcdefghij789

  6. size = 10
          doff = 0
          dlen = 0
          data = abcdefghij

          Result: The 0 bytes at offset 0 are replaced by the 10 bytes of data,
          i.e., the 10 bytes are inserted at the beginning of the record.

          ABCDEFGHIJ0123456789 -> abcdefghijABCDEFGHIJ0123456789

  7. size = 0
          doff = 0
          dlen = 10
          data = ""

          Result: The 10 bytes at offset 0 are replaced by the 0 bytes of data,
          i.e., the first 10 bytes of the record are discarded.

          ABCDEFGHIJ0123456789 -> 0123456789

  8. size = 10
          doff = 25
          dlen = 0
          data = abcdefghij

          Result: The 0 bytes at offset 25 are replaced by the 10 bytes of data,
          i.e., 10 bytes are inserted into the record past the end of the current
          data.

          ABCDEFGHIJ0123456789 -> ABCDEFGHIJ0123456789\0\0\0\0\0abcdefghij
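
The eight put cases above can be checked against a small model. The
following is a sketch in plain C, assuming only the rules stated in the
manual text: the dlen bytes at doff are replaced by the size bytes of data,
and a doff past the end of the record is filled with nul bytes. It does not
call Berkeley DB; model_partial_put is a hypothetical name, and the function
builds a new buffer rather than updating a stored record in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory model of a partial store; NOT the Berkeley DB API.
   Returns a newly allocated record in which the dlen bytes at offset doff
   of rec are replaced by the size bytes of data. Stores the new length in
   *new_len. Returns NULL on allocation failure. The caller frees the
   result. Overflow of doff + dlen is not handled in this sketch. */
char *model_partial_put(const char *rec, size_t rec_len,
                        size_t doff, size_t dlen,
                        const char *data, size_t size,
                        size_t *new_len)
{
    /* Bytes of the old record kept in front of the replaced range. */
    size_t head = doff < rec_len ? doff : rec_len;
    /* Nul padding needed when doff lies past the end of the record. */
    size_t pad = doff > rec_len ? doff - rec_len : 0;
    /* First old byte that survives after the replaced range. */
    size_t tail_start = doff + dlen < rec_len ? doff + dlen : rec_len;
    size_t tail = rec_len - tail_start;
    size_t total = head + pad + size + tail;
    char *out;

    assert(rec != NULL && data != NULL && new_len != NULL);

    out = malloc(total ? total : 1);
    if (out == NULL)
        return NULL;

    memcpy(out, rec, head);
    memset(out + head, 0, pad);
    memcpy(out + head + pad, data, size);
    memcpy(out + head + pad + size, rec + tail_start, tail);

    *new_len = total;
    return out;
}
```

For case 3, calling the function on ABCDEFGHIJ0123456789 with doff = 10,
dlen = 5 and data = abcdefghij produces the 25-byte record
ABCDEFGHIJabcdefghij56789; case 8 (doff = 25) produces a 35-byte record
with five nul bytes between the old data and the new.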


Copyright Sleepycat Software
Received on Sat Oct 21 14:36:14 2006

This is an archived mail posted to the Subversion Dev mailing list.