[This should really be on the dev@ list; I mis-addressed my response.]
On 26.03.2012 17:48, Ashod Nakashian wrote:
>> I recommended a two-stage approach to the implementation because
>> (packing and) encrypting the pristines is only half the story, there
>> are a number of interactions with the rest of the system that need to
>> be ironed out, and it makes no sense to take on the most complex part
>> of the storage format first.
> I see. This isn't what I gathered previously.
I don't /know/ what interactions there are, but it would certainly help
to find out (by running the test suite with the simplest possible
compressed-pristine implementation, for example).
>> As others have pointed out, the only cases where compression and
>> packing may cause trouble is working copies with a lot of binary,
>> compressed files (.jars, various image formats, etc.) That's a problem
>> that needs solving regardless of the compression algorithm or pack
> This can be handled with a not-compressed flag and storing the data as-is.
You're jumping a couple of steps here. The first two, obviously, are:
a) identifying which file formats should not be compressed, and
b) figuring out how to detect such files reasonably reliably. Using
svn:mime-type as the deciding factor is not the answer, because I can
set it to anything I like regardless of the actual file contents. My
advice here is to incorporate the flag in the wc.db, but not to do the
actual detection and special handling in the first release.
> My only issue with the database-as-storage is that it won't work for
> large files and if we store large files directly on disk, then we'll
> have to split the storage, which isn't really great.
It is, in fact, really great. Filesystems are typically much better at
storing large files than any contrived packed-file format. Databases are
typically much better at storing small files than most filesystems. To
get the most benefit for the least effort, use both.
> Consider a file just around the cut-off for storing in a database that gets moved between the database and disk between modifications that results in the file crossing the cut-off.
Then don't make the cutoff a hard limit. For example, if a file is
already on disk, don't store it in the database until it shrinks to
below 75% of the limit. Conversely, don't push it out to the filesystem
until it grows to perhaps even 200% of the limit.
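The hysteresis idea above fits in a few lines: a file only migrates between stores once it clears a band around the cutoff, so a file hovering near the limit never bounces back and forth on every edit. The 75% and 200% factors are the example numbers from the text; the cutoff itself is an invented value.

```python
CUTOFF = 16 * 1024     # illustrative size limit, not a real SVN constant
SHRINK_FACTOR = 0.75   # disk -> database only below 75% of the limit
GROW_FACTOR = 2.0      # database -> disk only above 200% of the limit

def choose_store(size: int, currently_on_disk: bool) -> bool:
    """Return True if the pristine should live on disk after this change."""
    if currently_on_disk:
        # Stay on disk unless the file has shrunk well below the limit.
        return size >= CUTOFF * SHRINK_FACTOR
    # Stay in the database unless the file has grown well past the limit.
    return size > CUTOFF * GROW_FACTOR
```

A file sitting exactly at the cutoff stays wherever it already is, which is the whole point.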
> It's not as clean as having a file-format that handles all cases for us. Sure it'll take longer to get release-quality code, but
> it'd be the correct way of doing things,
Oh nonsense. "Correct" is what works best in a given situation and is
most maintainable, not what generates the best research papers.
> won't feel hackish and will serve us in the long-run better.
I dislike hand-waving arguments, so you'll have to substantiate this one
about serving us better in the long run to convince me. All I see here
is an acute case of NIH syndrome.
> My only similar solid example is Git. I don't know how much effort went into their system, but I'm sure we can do this and do it right. If I were to choose between 6 months to release without packed-files and 12-15 months with them, I'd choose the latter. At least that's how I see it.
The point is that you can have a working, stable implementation using
off-the-shelf code (filesystem and database) in a few weeks, and that
does not stop you from then inventing a packed-file format that is
nevertheless friendly to deletions.
(Just for the record though: If you can do that and make it perform
better than SQLite in less than a year, I'll eat my hat.)
Received on 2012-03-27 02:56:17 CEST