Re: Every Version of Every File in a Repository

From: Branko Čibej <brane_at_wandisco.com>
Date: Wed, 08 Oct 2014 00:41:01 +0200

On 07.10.2014 22:36, Andreas Mohr wrote:
> Hi,
>
> That's certainly a somewhat tough one.
>
>
> I will get tarred and feathered here for my way of trying to solve this,
> and possibly even rightfully so, but... ;)

Well, I certainly won't skin you alive for suggesting this; but ... I
would imagine that "git svn fetch" has to essentially do just what the
OP doesn't want to do, i.e., successively retrieve each revision of
every file in the Subversion repository to populate the Git repository.
There's not much chance this would be faster than just doing the same
with Subversion, especially since, once you're done, you /still/ have to
scan the files in the resulting Git repo.

Going back to the original question ...

> Aside from the brute-force method of checking out the entire repository
> starting at revision 1 , performing a scan, updating to the next revision,
> and repeating until I reach the head, I don’t know of a way to do this.

This is, in fact, likely to be (almost) the most efficient way to do
this, since you can just use the existing Subversion client to deal with
the repository contents and version discrepancies.
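
For reference, the brute-force loop could be scripted roughly like this.
It is only a sketch: the head revision is passed in by hand, and scan()
is a hypothetical callback standing in for whatever scanning tool is
used; only the "svn checkout" and "svn update" commands are real.

```python
import subprocess


def revision_commands(url, wc_dir, head):
    """Yield one svn argv list per revision, from 1 through head."""
    # The first revision needs a fresh checkout into the working copy.
    yield ["svn", "checkout", "--quiet", "-r", "1", url, wc_dir]
    # Every later revision is reached by updating that working copy.
    for rev in range(2, head + 1):
        yield ["svn", "update", "--quiet", "-r", str(rev), wc_dir]


def scan_all_revisions(url, wc_dir, head, scan):
    """Step the working copy through each revision and scan it."""
    commands = revision_commands(url, wc_dir, head)
    for rev, argv in enumerate(commands, start=1):
        subprocess.run(argv, check=True)  # stop on the first svn failure
        scan(rev, wc_dir)                 # hypothetical scanning callback
```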

But there is an alternative that might be more efficient in your case:
Create a dumpstream of the repository using "svnadmin dump",
non-incremental and not using deltas, then pipe the stream to a custom
tool that extracts the file contents from the stream and either writes them
to disk, or passes them to your scanning tool in some other way.

The reason why this could be faster than the checkout+repeated update is
that you do not have the overhead of a working copy, directory tracking,
property handling, etc. etc., and you can probably save on disk space by
keeping the file contents around only as long as they're being scanned.
It does mean that your custom tool will have to parse the dumpfile
format, but that's really not so hard: the format is quite simple, and
there are a number of example scripts that do that in our repository.
Another alternative is to use our API directly, possibly through one of
the bindings, to get file contents straight from the repository; but I
suspect it's harder than parsing the dump file.
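
To give an idea of how little parsing the dumpfile route needs, here is
a minimal sketch in Python. It assumes a dump made without --deltas and
read as a binary stream; the name iter_file_contents is made up for
illustration, and nodes that carry no text (such as directories and
deletes) are simply skipped.

```python
def iter_file_contents(stream):
    """Yield (revision, path, data) for each file text in a dump stream."""
    revision = None
    while True:
        # A record starts with "Key: value" lines closed by a blank line.
        headers = {}
        while True:
            line = stream.readline()
            if not line:
                return                      # end of the dump stream
            line = line.rstrip(b"\n")
            if not line:
                if headers:
                    break                   # blank line ends the headers
                continue                    # padding between records
            key, _, value = line.partition(b": ")
            headers[key.decode("ascii")] = value.decode("utf-8")

        # The body (properties, then text) is Content-length bytes long.
        body = stream.read(int(headers.get("Content-length", "0")))

        if "Revision-number" in headers:
            revision = int(headers["Revision-number"])
            continue
        if "Node-path" not in headers or "Text-content-length" not in headers:
            continue                        # no file text in this record
        if headers.get("Text-delta") == "true":
            raise ValueError("dump uses deltas; re-dump without --deltas")

        # The text follows the property block inside the body.
        prop_len = int(headers.get("Prop-content-length", "0"))
        text_len = int(headers["Text-content-length"])
        yield revision, headers["Node-path"], body[prop_len:prop_len + text_len]
```

A small driver script could loop over iter_file_contents(sys.stdin.buffer)
and hand each chunk of data to the scanner, with the output of
"svnadmin dump" piped into it. Each file's contents then only need to
live in memory (or in a temporary file) for as long as the scan takes.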

-- Brane
Received on 2014-10-08 00:41:33 CEST
