
Re: Blue-sky idea: Representation reuse

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-10-12 19:10:50 CEST

On Sat, 2002-10-12 at 09:11, mark benedetto king wrote:
> Yes, it has been done, but AFAIK, there are the "sound" O(n^2) approaches,
> and the "practical" O(n log n) approaches.

I'm still not sure how you get n log n. Even if you can reduce the file
contents to a hash with nice similarity properties, how do you find
pairs of hashes which are similar? Isn't that just O(n^2) with a
smaller constant?
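
To make the question concrete, here is a minimal sketch of the naive
search, assuming each file has already been reduced to a small integer
fingerprint and that "similar" means a small Hamming distance. The
fingerprint values, the hamming() helper, and the max_distance threshold
are all hypothetical and not anything in Subversion. Every pair gets
compared, so the work grows as n(n-1)/2 however cheap each comparison is.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")


def similar_pairs(fingerprints, max_distance):
    """Compare every pair of fingerprints; return the close pairs and the
    number of comparisons performed (always n * (n - 1) / 2)."""
    comparisons = 0
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(fingerprints), 2):
        comparisons += 1
        if hamming(a, b) <= max_distance:
            pairs.append((i, j))
    return pairs, comparisons


# Hypothetical 8-bit fingerprints for five files.
fingerprints = [0b10110010, 0b10110011, 0b01001100, 0b01001101, 0b11111111]
pairs, comparisons = similar_pairs(fingerprints, max_distance=1)
print(pairs, comparisons)
```

Shrinking the fingerprints makes each hamming() call cheaper, but the
loop still visits every pair.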

> What if the repository were stored on a compressed filesystem?
> This could (IMO) give us similar total storage savings as
> a shared-representation model, without any additional code or
> testing.

A compressing filesystem would only exploit local similarities; if you
check in the same file ten times, you'll still get ten compressed copies
of the file. It could only produce similar total storage savings by
numerical coincidence.
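
A rough sketch of the difference, assuming zlib stands in for whatever
the filesystem does per file and a SHA-1-keyed dictionary stands in for
a shared-representation store (both are illustrative assumptions, not
how Subversion or any particular filesystem works):

```python
import hashlib
import zlib


def per_file_compressed_size(files):
    """Total bytes when each file is compressed independently."""
    return sum(len(zlib.compress(data)) for data in files)


def shared_representation_size(files):
    """Total bytes when identical contents are stored only once."""
    store = {}
    for data in files:
        key = hashlib.sha1(data).hexdigest()
        if key not in store:
            store[key] = zlib.compress(data)
    return sum(len(blob) for blob in store.values())


# Ten check-ins of the same (made-up) file contents.
contents = b"".join(b"line %d of some source file\n" % i for i in range(200))
files = [contents] * 10

independent = per_file_compressed_size(files)
shared = shared_representation_size(files)
print(independent, shared)
```

Independent compression pays for the file ten times over; the shared
store pays for it once, whatever the compression ratio happens to be.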

(Compressing the plaintexts ourselves would probably be a lot faster
than storing our whole database on a compressing filesystem. Plus, as
far as I can tell, compressing filesystems just aren't that prevalent
outside of Windows.)

Received on Sat Oct 12 19:11:34 2002

This is an archived mail posted to the Subversion Dev mailing list.
