
Re: Another request for obliterate...

From: Tim Hill <tim_at_realmsys.com>
Date: 2005-04-19 19:41:13 CEST

Yes, I agree my prune will not handle your case. I feel that prune is
used primarily for the "it shouldn't be in there at all (or anymore)"
type of case, while your scenario is more to do with trimming the
overall repository, presumably for space constraints (if storage
*really* were free and infinite, would it be needed?). So far as I can
see, you either need, as you say, to eliminate an entire revision, which
introduces a new concept into svn, or support the concept of "holes" in
the revision history of a file. Others here have suggested using a
0-length file as a placeholder for such a hole, and I guess that would
work. But it's pretty odd.

On a more speculative note, I'm also not totally certain about saving
space, given that SVN also stores binary files as deltas. If rev 20 in your
example contained a huge build tool (binary), presumably it is still
needed in revs 19 and 24. So the binary storage "hit" in rev 20 is
probably very small indeed. I just ran a quick test on a sandbox
repo, and indeed the cost of the binary file was small (I edited a large
BMP file). Now, I'm sure there are pathological cases where a trivial
binary edit can upset the svn binary differ and cause a big hit, but
again, isn't this a bit unusual? Shouldn't svn be optimized for the more
common cases?

--Tim
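For illustration, the difference between prune and removing individual revisions can be modeled on the foo.c history from David's message. This is a toy sketch, not Subversion code: `prune`, `rmversion`, and `content_at` are hypothetical names, the history is a plain dict mapping revision number to build description, and `content_at` encodes just one possible answer (the newest surviving revision) to the open question of what a "hole" should contain.

```python
# Toy model of one file's history: revision number -> description.
# Not how Subversion stores data; both commands modeled here are hypothetical.
FOO_C_HISTORY = {18: "Good Build", 19: "Release 1.0", 20: "Bad Build",
                 24: "Good Build", 27: "Bad Build", 28: "Bad Build",
                 31: "Good Build"}


def prune(history, rev):
    """Proposed 'prune': drop the path at rev and at every later revision."""
    return {r: d for r, d in history.items() if r < rev}


def rmversion(history, revs):
    """Requested 'rmversion': drop only the listed revisions, leaving holes."""
    return {r: d for r, d in history.items() if r not in revs}


def content_at(history, rev):
    """One possible (assumed) rule for reading inside a hole: return the
    newest surviving revision at or before rev, or None if there is none."""
    earlier = [r for r in history if r <= rev]
    return max(earlier) if earlier else None


pruned = prune(FOO_C_HISTORY, 20)
holed = rmversion(FOO_C_HISTORY, {20, 27, 28})
print(sorted(pruned))          # [18, 19]
print(sorted(holed))           # [18, 19, 24, 31]
print(content_at(holed, 27))   # 24
```

Pruning at 20 necessarily loses 24 and 31. Removing {20, 27, 28} keeps them, but leaves holes that any lookup must resolve somehow.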

Weintraub, David wrote:

> The prune idea is basically what I was thinking of as an "obliterate
> obliterate" command. It starts with the most recent revision and
> prunes all of the versions backwards. It will not allow you to prune
> if the file is not in the latest version of the directory, or if
> there are any copies of it (URL links) in another directory. I would
> also limit it to pruning no more than "X" versions of the
> archive. By limiting the "obliterate obliterate", you simplify the
> implementation and make sure you're not removing anything "interesting".
>
> This will not work with the "rmversion obliterate" command I was
> referring to. Let's say the history of file foo.c looks like this:
>
> Version  Description
> =======  ===========
> 18       Good Build
> 19       Release 1.0
> 20       Bad Build
> 24       Good Build
> 27       Bad Build
> 28       Bad Build
> 31       Good Build
>
> I want to remove versions 20, 27 and 28, but I want to keep versions
> 18, 19, 24, and 31. In a few weeks, I will also be removing versions
> 18, 24, and 31, but I never want to remove version 19. Under your
> scenario, I can only prune completely backwards. That is, if I want to
> remove version 20, I would have to remove versions 24 and 31 too
> (which I don't want to do).
>
> I realize that this is a very difficult task to program into
> Subversion, and I am not expecting anything soon. There are a lot of
> issues to work out. For example, if I remove versions 27 and 28 of this
> file, what is in archive versions 27 and 28? Should it be the same file
> that was in version 24? Maybe I shouldn't be able to rmversion a
> single file, but have to rmversion an entire archive so there won't be
> a version 27 and 28 of the archive. As long as those versions of the
> archive aren't linked to anywhere else, I might be able to
> assume these particular versions of the archive are not interesting.
>
> Of course, the funny thing is in ClearCase, I labeled all of the
> builds -- including the bad ones. After all, how do I know a build is
> bad until it is built and tested by our team? If a build ended up
> being bad, I deleted the label which allowed me to rmversion the built
> binaries. If I want to duplicate this in Subversion, I would have to
> have some way to "uncopy" a tag, or be able to let the rmversion
> command know it is okay to remove a particular version even if it is
> copied to the tag directory.
>
> That would make this type of command even more difficult to
> implement. Meanwhile, I now have a reason why binaries should not be
> stored in Subversion.
>
>
> -----Original Message-----
> *From:* Tim Hill [mailto:tim@realmsys.com]
> *Sent:* Monday, April 18, 2005 6:39 PM
> *To:* Weintraub, David
> *Cc:* 'Subversion Users'
> *Subject:* Re: Another request for obliterate...
>
> Good points. The more I think about this the more I feel that a
> "prune" command is really the best compromise. Something like:
>
> svnadmin prune REPOPATH -r REV PATH ...
>
> Prunes the specified path(s) starting at the specified revision
> from the specified repository. Each path (file or directory) is
> obliterated from the specified revision, and all subsequent
> revisions, including all branches made at or after the specified
> revision. The change is permanent. Pruning at the revision where
> the file was originally added to the repository will obliterate
> all traces of the file. Pruning at a branch revision will
> obliterate all traces of the file on that branch.
>
> My gut feeling is that this will accommodate 80% of users' needs but
> keep the model simple enough so that it can actually be used
> without leading to disasters.
>
> I've also seen lots of shops where entire build toolchains are
> checked in. The rationale here is usually broken, and consists of
> either (a) over-loading the SCC system as a backup system or (b) a
> fabled "we need to be able to reconstruct our build environment".
> Of course, in the latter case, you need more than just the
> toolchain (think OS etc etc.).
>
> Incidentally (OT), I now use virtual machines as a way to maintain
> build environments. Just ZIP the whole thing up and put it on optical
> media -- VM, OS, tools, etc etc.
>
> --Tim
>
>
>
> Weintraub, David wrote:
>
>> I vote for an obliterate command, but we are talking about two
>> separate commands "obliterate" and "remove version (rmver)":
>>
>> * I've been a CM admin for about 15 years, and I find that users
>> ask me to obliterate a file about 3 or 4 times per year,
>> mostly new files that were accidentally added and contained
>> sensitive information. I've never "obliterated" files with
>> substantial histories, and I'd probably refuse if a user requested
>> that I do -- especially if it involves stuff that was released
>> (either internally or externally).
>>
>> * I find "rmver" a bit more useful. In ClearCase, developers
>> don't develop off of the head of the trunk (called /main in
>> ClearCase). Instead, they create a branch and do their
>> development work on that branch. Once they've determined that
>> their code works and it is stable, they would merge their work
>> onto the head of the trunk. (In ClearCase, the trunk was supposed
>> to be always stable and releasable.)
>>
>> If you look at a version tree of a file, you'll see dozens of
>> branches merging in and out of the /main branch. To clean up this
>> mess, many places have a policy of removing "dead" development
>> branches and versions. You still have most of the versioning
>> information since you're not deleting anything off of the main
>> trunk. You're only deleting old versions that even the developers
>> no longer care about. This speeds up many of the scripts we use
>> (imagine how long the "blame" command would take if you have a file
>> with 20 versions on the main trunk, and hundreds of versions on a
>> dozen different side branches) and speeds ClearCase up a bit too.
>> However, it doesn't save very much room since you're only storing
>> the deltas.
>>
>> * It is extremely common -- despite what people may claim "best
>> practice" states -- to put binaries of compiled programs in your
>> archive. This gives you a single place where developers can get
>> precompiled libraries to develop against, it gives everyone a
>> single location for a release that is guaranteed to be valid, and you know
>> that the System Admins are backing this up on a daily basis.
>>
>> The problem is that binaries take up a ton of room. If you're
>> building every single day, and each build contains 10 to 20
>> gigabytes of binary data, you'll fill up a network disk area no
>> matter how big it is. We are constantly removing old versions of
>> libraries and executables that were never released. Our policy
>> was to remove all binaries from any "bad build", anything from a
>> "good build" over two weeks old, and keep any binaries from an
>> actual release until those binaries are no longer supported.
>>
>> I would like both versions of the "obliterate" command
>> (obliterate and rmversion), but then I'd also like a million
>> dollars and a pony. The "obliterate obliterate" command might be
>> easy to implement if we simply put restrictions on what it can
>> obliterate. Maybe a file that has only one version of itself, is
>> not on any branches or labels, and is still in the HEAD of the
>> trunk. Maybe something that is in no more than "X" versions of
>> the archive where "X" is a fairly small number. If you make a
>> booboo and accidentally put in a file that shouldn't be in the
>> archive, you can ask the CM to obliterate it, but you better ask
>> pretty quick. Even with those restrictions, it would cover about
>> 98% of the need for obliterate.
>>
>> The "rmversion obliterate" command is much, much harder to
>> implement for reasons I outlined before. You are going to have
>> side effects, and must determine how you handle those side
>> effects before you even dream about coding. In ClearCase, we
>> could not (at least easily) remove a version of a file that had a
>> branch coming out of it, or had a label on it or was
>> "interesting" in any other way.
>>
>> But then, ClearCase versioned files and not the entire archive,
>> so doing a "rmversion" had limited side effects. And, these side
>> effects were well understood. In Subversion, where the whole
>> archive is versioned, the effects are much larger and more
>> unpredictable. For example, how could I make sure I am not
>> accidentally removing an "interesting" version of a file? That
>> is, a version of the file with a tag/label on it or a file that
>> is at the root of a branch. In ClearCase, we would prune dead
>> branches and remove all of those versions. But, we didn't want to
>> remove versions of files with labels (tags) on them or files that
>> are used for work that is being actively developed.
>>
>> In Subversion, there is no difference between a branch and a tag
>> except for what exists in between the ears of the CM. How can we
>> make sure Subversion knows that a particular version of the file
>> we want to remove isn't "interesting"?
>>
>> Right now, I am going to discourage my company from versioning
>> binary files. We will store binaries on a share and just hope
>> that the SysAdmin is backing up those areas on a daily basis. As
>> long as we are only storing deltable files, disk space won't be a
>> major problem.
>>
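Two of the policies in David's message can be written as simple predicates. This is a hedged sketch: Subversion has no such commands. The `Build` and `PathHistory` records, the build kinds ("bad", "good", "release"), and the `max_revisions` parameter (standing in for his "X") are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Build:
    """Hypothetical record of one stored build."""
    name: str
    kind: str               # "bad", "good", or "release" (assumed labels)
    built_on: date
    supported: bool = True  # only consulted for releases


def should_remove(build: Build, today: date,
                  max_age: timedelta = timedelta(weeks=2)) -> bool:
    """Retention policy from the thread: drop bad builds, drop good builds
    older than two weeks, keep releases until they are unsupported."""
    if build.kind == "bad":
        return True
    if build.kind == "good":
        return today - build.built_on > max_age
    if build.kind == "release":
        return not build.supported
    raise ValueError(f"unknown build kind: {build.kind!r}")


@dataclass
class PathHistory:
    """Hypothetical summary of one path's history in the repository."""
    revisions: list[int]           # revisions in which the path changed
    copied_to_branch_or_tag: bool
    in_head_of_trunk: bool


def may_obliterate(h: PathHistory, max_revisions: int = 1) -> bool:
    """Restricted 'obliterate obliterate' check: few revisions, never
    copied to a branch or tag, and still present in HEAD of the trunk."""
    return (len(h.revisions) <= max_revisions
            and not h.copied_to_branch_or_tag
            and h.in_head_of_trunk)


today = date(2005, 4, 19)
print(should_remove(Build("r20", "bad", date(2005, 4, 18)), today))   # True
print(should_remove(Build("r24", "good", date(2005, 4, 10)), today))  # False
print(may_obliterate(PathHistory([42], False, True)))                 # True
```

The default `max_revisions=1` matches the "only one version of itself" case. Raising it models the looser "no more than X versions" variant.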
Received on Tue Apr 19 19:46:18 2005

This is an archived mail posted to the Subversion Users mailing list.
