
Re: [PATCH]: Automatic log file removal

From: Kirby C. Bohling <kbohling_at_birddog.com>
Date: 2002-02-04 23:06:44 CET

> On Mon, 4 Feb 2002, Greg Stein wrote:
>
>
>>On Mon, Feb 04, 2002 at 12:02:44PM -0500, Daniel Berlin wrote:
>>

<snip>

>
>>4) people might want to *keep* those logs; they can be used to replay the
>> database back to a fixed point, or they can be shipped off to another
>> machine to replicate the database (thanks to Kirby for these ideas)
>>
> Except, it's really not as feasible as it sounds to keep the logs on the
> same machine, due to disk space issues (they really just add up too
> quickly).
>

gzip -9 gave 10-to-1 compression on the logs I had (bzip2 did better, but
I don't remember the ratio), which makes it reasonable to keep a lot of
logs around. Eight months of development took 80MB; the extra copies of
the repositories in the .svn directories take up 10% of that on my project,
and I normally have 3-4 trees checked out at any one time.
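The compression claim is easy to sanity-check. A minimal sketch, using a repetitive dummy file in place of a real Berkeley DB log, so the exact ratio is illustrative only:

```shell
# Simulate a repetitive log file and measure what gzip -9 does to it.
# Real BDB log ratios depend on the data, but structured logs compress well.
LOG=$(mktemp)
for i in $(seq 1 5000); do
    echo "txn $i: modify /trunk/file.c"
done > "$LOG"
ORIG=$(wc -c < "$LOG")
gzip -9 "$LOG"                      # replaces $LOG with $LOG.gz
COMP=$(wc -c < "$LOG.gz")
echo "original: $ORIG bytes, compressed: $COMP bytes"
```

Anything with that much repeated structure compresses far better than 5 to 1, which is what makes archiving months of logs affordable.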

> Replication is your answer here, and it doesn't require keeping the log
> files on your main machine.
> Why?
> Because you can set up a replicated client that is a log file only
> client, specifically for this purpose.

Don't keep them all around forever, but with most databases you keep all
the logs from the last couple of hot/cold backups just for safety's sake,
so that if a catastrophic database or *USER* error occurs, you can recover
to a point shortly before the error. Granted, with this application the
only entity that touches the database is Subversion, where in theory the
user errors get removed from the application during testing. Watch out
for the database errors... :-)

> People who need to keep log files in case a disk blows up are *much* more
> likely to do this, and simply have this machine archive the log files
> somewhere, than to risk running out of disk space on an important server
> that may be doing other tasks.
> In fact, specifically, this is one of the reasons replication was added
> to Berkeley DB.
> The only reason to keep absolutely every log file is if you can't lose a
> second's worth of work after your disk blows up completely.
> And, BTW, it also assumes you aren't in some kind of redundant array where
> a disk blowing up is just fine, which, if you can't afford a second's
> worth of loss, you probably already have.
>

That's all good information; I was unaware of the mirror/replicate
functionality in Berkeley DB. From the docs I saw, if you do hot backups
you need the logs for the correct time period to get the database
consistent and recover forward in time. Waiting 60 seconds before
removing them might be bad. I will have to play with BDB to find out.

I know Oracle best, and the drill there is: switch log files, do a hot
backup, switch log files. With the log files from just before and after
that, you get recovery to any point in time after the hot backup finished.
I presume it is a similar procedure. Not having an archived log pretty
much means no recovery past the missing log file.
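For Berkeley DB the ordering is analogous: copy the database files first, then the log files, and run catastrophic recovery (db_recover -c) on the restored copy. A sketch of just the copy ordering, with plain files standing in for a real db/ directory (paths are illustrative; no actual BDB tools are invoked):

```shell
# Hot-backup ordering sketch: database files first, then logs, oldest first.
# $SRC stands in for a repository's db/ directory. On a real repository you
# would use db_archive to list the files and db_recover -c on the restore.
SRC=$(mktemp -d); DST=$(mktemp -d)
printf 'table data' > "$SRC/strings"            # a database file
printf 'log record' > "$SRC/log.0000000001"     # a log written during backup
cp "$SRC"/strings "$DST"/                       # 1. database files first
for log in $(ls "$SRC"/log.* | sort); do        # 2. then logs, in order
    cp "$log" "$DST"/
done
ls "$DST"
```

The point of the ordering is that any updates made while the database files were being copied are captured in the logs copied afterward, so replaying them makes the backup consistent.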

> Keeping every log file by default is solving a problem that is already
> solved in multiple ways.
>
> It's something users are going to complain about.
> Especially since users include more than just people who want to use the
> server portion. There will be plenty of people who use it for local
> repositories who will be completely turned off by this behavior.
>

I complained, so I understand. It might be good to explain up front
what is going on, and how to address it, in the docs (it might be there
and I missed it). I reviewed all the docs reasonably closely and didn't
spot it. Several people on the list politely explained it to me.

>
>>I'm perfectly happy to put a log-killer.sh script into /trunk/tools that
>>people can run (feed it a list of repositories to keep clean).
>>Administrators can use that script if they choose.
>>
> As you wish.
> I still think it's the wrong answer for the majority of people.
> It's one thing to have it configurable, defaulting to not keeping them or
> keeping them for some time interval. But requiring a separate script just
> adds another step most people have to take to set up subversion.
>

*nod*

Do something that allows reasonable recovery expectations, document it,
and allow the user to modify the parameters to get exactly what they
want. Create an svnadmin shrink/scrub/etc... command rather than a
script, and explain what it is doing in the man page. Explain that it
cleans up after the database and you DO NOT lose revision history.
People read man pages; they ignore scripts in the tools directory, in my
experience (well, I do *grin*).

	Take the code that was the body of this thread and stick it in svnadmin;
make it take several options, including compress or not, maybe a
percentage of repository size that log files can occupy, and how old the
oldest log to keep around is. If that is acceptable, tell me and I'll
submit a patch for it. A user can fire and forget with that command
using cron, or put it in a post-commit hook as suggested on the list.
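As a sketch of what the age-based knob might do under the hood (the 7-day cutoff and paths are invented for illustration, and this is not a real svnadmin interface; GNU touch/find are assumed):

```shell
# Policy sketch: compress (rather than delete) log files past an age cutoff.
# $REPO_DB simulates a repository's db/ directory.
REPO_DB=$(mktemp -d)
touch "$REPO_DB/log.0000000001" "$REPO_DB/log.0000000002"
touch -d '10 days ago' "$REPO_DB/log.0000000001"   # make one log look old
# compress everything older than 7 days; recent logs stay untouched
find "$REPO_DB" -name 'log.*' -mtime +7 -exec gzip -9 {} \;
ls "$REPO_DB"
```

A size-based variant would sort the logs oldest-first and compress from the front of the list until the directory falls under the percentage threshold.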

        Kirby

Received on Sat Oct 21 14:37:04 2006

This is an archived mail posted to the Subversion Dev mailing list.
