Re: Blue-sky idea: Representation reuse

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-10-12 19:10:50 CEST

On Sat, 2002-10-12 at 09:11, mark benedetto king wrote:
> Yes, it has been done, but AFAIK, there are the "sound" O(n^2) approaches,
> and the "practical" O(n log n) approaches.

I'm still not sure how you get n log n. Even if you can reduce the file
contents to a hash with nice similarity properties, how do you find
pairs of hashes which are similar? Isn't that just O(n^2) with a
smaller constant?
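
To make the concern concrete, here is a minimal sketch of the brute-force
search, assuming each file has already been reduced to a small integer
fingerprint whose Hamming distance tracks similarity. The names
`hamming` and `similar_pairs` and the sample fingerprints are
hypothetical, not anything in Subversion. Every pair gets compared, so
the work is n*(n-1)/2 comparisons regardless of how cheap each one is.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def similar_pairs(fingerprints, max_distance):
    """Brute-force search: compare every pair of fingerprints.

    Returns (pairs, comparisons). The comparison count is always
    n*(n-1)/2, which is the quadratic cost in question.
    """
    pairs = []
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(fingerprints), 2):
        comparisons += 1
        if hamming(a, b) <= max_distance:
            pairs.append((i, j))
    return pairs, comparisons


# Hypothetical 8-bit fingerprints for five files.
fingerprints = [0b10110010, 0b10110011, 0b01001100, 0b01001110, 0b11111111]
pairs, comparisons = similar_pairs(fingerprints, max_distance=1)
print(pairs, comparisons)
```

Hashing shrinks the cost of each comparison (a few machine words instead
of whole files), but in this sketch it does nothing to reduce the number
of comparisons.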

> What if the repository were stored on a compressed filesystem?
> This could (IMO) give us similar total storage savings as
> a shared-representation model, without any additional code or
> testing.

A compressing filesystem would only exploit local similarities; if you
check in the same file ten times, you'll still get ten compressed copies
of the file. It could only produce similar total storage savings by
numerical coincidence.
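
A rough illustration of the difference, using zlib as a stand-in for
whatever a compressing filesystem does per file (an assumption; real
filesystems vary) and a hypothetical content-hash lookup as a stand-in
for shared representations:

```python
import hashlib
import zlib


def per_file_compressed_size(files):
    """Compress each file independently, as a per-file compressor would."""
    return sum(len(zlib.compress(data)) for data in files)


def shared_compressed_size(files):
    """Store each distinct content once (keyed by SHA-1), then compress it."""
    unique = {}
    for data in files:
        key = hashlib.sha1(data).hexdigest()
        unique.setdefault(key, data)
    return sum(len(zlib.compress(data)) for data in unique.values())


# Hypothetical file contents, checked in ten times unchanged.
content = b"".join(
    b"line %d of a hypothetical source file\n" % i for i in range(200)
)
files = [content] * 10

per_file = per_file_compressed_size(files)
shared = shared_compressed_size(files)
print(per_file, shared)
```

With ten identical files the per-file total is exactly ten times the
shared total: each copy compresses well on its own, but nothing is
shared between copies.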

(Compressing plaintexts ourselves is probably a lot faster than storing
our whole database in a compressing filesystem. Plus, as far as I can
tell, compressing filesystems just aren't that prevalent outside of
Windows.)

Received on Sat Oct 12 19:11:34 2002
