
Re: Every Version of Every File in a Repository

From: Andreas Mohr <andi_at_lisas.de>
Date: Tue, 7 Oct 2014 22:36:12 +0200

Hi,

On Tue, Oct 07, 2014 at 03:03:13PM -0500, JT.Miller_at_L-3com.com wrote:
> Is there a way to check out every version of a file in a repository? We
> just had a requirement levied to perform a scan of every file in a
> repository. The scan tool must have each file in a stand-alone format.
> Thus, I need a way to extract every version of every file within a
> repository.
>
> Aside from the brute-force method of checking out the entire repository
> starting at revision 1, performing a scan, updating to the next revision,
> and repeating until I reach the head, I don’t know of a way to do this.

That's certainly a somewhat tough one.

I will get tarred and feathered here for my way of trying to solve this,
and possibly even rightfully so, but... ;)

OK, here goes:
you could run git-svn on your repo,
then get a list of all files that ever existed via http://stackoverflow.com/a/12090812,
then for each such file do a git log --all --something --someveryshortformat
to get all its revisions,
then do a
file_content=$(git show <revision>:./path/to/file)
(alternatively do git show ... > $TMPDIR/mytmp, since that ought to be more
reliable for largish files),
then scan that
(but ideally you'd be able to pipe the git show stream directly into your scan tool).
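
The per-file loop above might be sketched roughly as follows. This is only
an illustration under assumptions: it is run inside the git-svn clone, the
path is given relative to the repository root, SCAN_CMD is a placeholder for
the real scan tool (which must read from stdin), and
"git log --all --format=%H -- <path>" is just one possible concrete form of
the placeholder log invocation.

```shell
#!/bin/sh
# Sketch: stream every committed version of one path into a scanner.
# SCAN_CMD is a placeholder for the real scan tool; it must read stdin.
SCAN_CMD=${SCAN_CMD:-cat}

scan_all_versions_of() {
    path=$1
    # One commit hash per line for every commit (on any ref) touching $path.
    git log --all --format=%H -- "$path" |
    while read -r commit; do
        # Skip commits in which the path does not exist (e.g. a deletion).
        git cat-file -e "$commit:$path" 2>/dev/null || continue
        echo "== $commit:$path" >&2
        # Pipe the blob straight into the scanner; no temp file, no checkout.
        git show "$commit:$path" | $SCAN_CMD
    done
}
```

It would be called once per path from the "all files ever" list, e.g.
SCAN_CMD=myscanner scan_all_versions_of path/to/file
(where myscanner is a made-up name). Note that a path-limited git log
applies history simplification, so this may not visit every commit on every
side branch.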

That ought to give you a scan result for *all* revisions of *all* files
in *all* branches of your repo (you might want to decorate things with a
"uniq" applied at some place or another, to ensure that you're indeed
not doing wasteful duplicate processing of certain items).
OK possibly scratch the "*all* branches" part, since this may require
some extra effort in the case of git-svn...

However, this high-level lookup solution
might be both rather crude and much less precise
than a parse-each-object kind of solution at the git plumbing level, if that is
possible (and I'd very much guess it is).
Hmm, that could be a git rev-list (possibly combined with something like
git diff-tree to list the changed files for each commit),
and AFAICS, when given --all, it works globally (i.e., on the global commit graph,
rather than on specific "human-tagged" branch names). So that mode of operation, once successfully scripted,
ought to be *a lot* better than the "list all files, then rev-log each file" algo.
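
One possible plumbing-level variant, sketched under assumptions: rather than
listing changed files per commit, it enumerates every distinct blob reachable
from any ref via "git rev-list --all --objects" and scans each blob exactly
once, which also takes care of the "uniq" concern, since identical content
shares one blob hash. SCAN_CMD is again a placeholder for the real scan tool
reading stdin, and the %(rest) format atom needs a reasonably recent git.

```shell
#!/bin/sh
# Sketch: scan every distinct file content (blob) in the repository once.
# SCAN_CMD is a placeholder for the real scan tool; it must read stdin.
SCAN_CMD=${SCAN_CMD:-cat}

# Print "<blob-sha> <path>" for every blob reachable from any ref.
list_all_blobs() {
    git rev-list --all --objects |
    # rev-list prints commits, trees and blobs; ask cat-file for the type
    # and pass the path through via %(rest).
    git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
    sed -n 's/^blob //p'
}

scan_all_blobs() {
    list_all_blobs |
    while read -r sha path; do
        echo "== $sha $path" >&2
        git cat-file blob "$sha" | $SCAN_CMD
    done
}
```

The path printed for a blob is only the first one rev-list found it under,
so if the scan report must name every commit/path combination, a per-commit
listing is still needed in addition.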

And you could then safety check your algorithm
by having it spit out a full list of all commit hash / file combos
(this happens to be the same list which you would then feed into git show,
entry by entry),
and then try hard to figure out a way
to pick a repo-side file version which accidentally is NOT contained in that list
--> algo error!
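
A rough sketch of producing such a list, under the assumption that
"rev-list for the commits, diff-tree for the files changed in each" is an
acceptable way to build it (the function name is made up for illustration):

```shell
#!/bin/sh
# Sketch: print one "<commit>:<path>" line for every file version that was
# added or modified in any commit on any ref.
list_commit_path_pairs() {
    git rev-list --all |
    while read -r commit; do
        # --root: treat the first commit as a diff against an empty tree.
        # -m: also show changes in merge commits (once per parent).
        # --diff-filter=ACMRT: skip deletions, which have no content to show.
        git diff-tree --root -m -r --no-commit-id --name-only \
            --diff-filter=ACMRT "$commit" |
        sed "s|^|$commit:|"
    done | sort -u
}
```

Each output line has the form that git show accepts, so the same list can
drive the scan. For spot checks from the Subversion side,
"git svn find-rev r<N>" should map an SVN revision number to the
corresponding commit hash. Paths containing unusual characters may come out
quoted by git, so those would need extra care.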

Oh, and BTW: all this *without* having to do a filesystem-based checkout
(i.e., working copy modification)
of any repo item, even once.
(i.e., this is actually going *against* your initially stated "requirement" of
"Is there a way to check out every version of a file in a repository?",
and rightfully so ;)

HTH,

Andreas Mohr
Received on 2014-10-07 22:36:42 CEST

This is an archived mail posted to the Subversion Users mailing list.
