
Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Fri, 24 Dec 2010 15:43:00 +0100

On 24.12.2010 14:25, Stefan Fuhrmann wrote:
> On 23.12.2010 03:44, Gavin Beau Baumanis wrote:
>> Hi Johan,
>>
>> I was intrigued by your requirement to create a large file for testing.
>> I remember from a really long time ago when I learnt C, that we used
>> a specific algorithm for creating "natural" and "random" text.
>> With some help from Mr. Google I found out about Markov chains, which
>> look promising. I can't remember if that was what I learned about or
>> not, but it looks like it might prove helpful nonetheless.
>>
>> A little further Googling and I found this specific post on
>> Stack Overflow:
>> http://stackoverflow.com/questions/1037719/how-can-i-quickly-create-large-1gb-textbinary-files-with-natural-content
>>
>> No idea if it is going to help you specifically or not... but there
>> are quite a few ideas in the comments:
>> * Obtain a copy of the first 100MB from wikipedia - for example.
>>
> You might try some recent Linux tarball (~400MB).
> It should be
> * mainly but probably not entirely text
> * very close to typical real-world data (large config file
>   sections, lots of source code, maybe some binary /
>   UTF-16 data)
> * accessible to everybody for independent testing etc.
>
> Just an idea ;)
> -- Stefan^2.
>
... you may import many versions (including the RCs)
of it to form a deep history.
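
For the Markov chain idea Gavin mentions above, here is a minimal sketch
of how such a generator might look. It is not taken from the linked post
or from anything on the branch; the function names (build_chain,
generate_words, write_test_file) and all parameters are made up for
illustration, and the sample text is assumed to contain more words than
the chain order.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate_words(chain, seed=0):
    """Yield words endlessly by walking the chain, restarting at dead ends."""
    if not chain:
        raise ValueError("sample text is too short for the chosen order")
    rng = random.Random(seed)      # fixed seed so test files are reproducible
    keys = sorted(chain)           # stable ordering, independent of hashing
    state = rng.choice(keys)
    while True:
        followers = chain.get(state)
        if not followers:          # dead end: jump to a random known state
            state = rng.choice(keys)
            followers = chain[state]
        word = rng.choice(followers)
        yield word
        state = state[1:] + (word,)


def write_test_file(path, sample_text, target_bytes, order=2, seed=0,
                    words_per_line=12):
    """Write generated lines to `path` until at least `target_bytes` are written."""
    chain = build_chain(sample_text.split(), order)
    written = 0
    line = []
    # newline="\n" keeps the byte count identical on every platform
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for word in generate_words(chain, seed):
            line.append(word)
            if len(line) == words_per_line:
                text = " ".join(line) + "\n"
                out.write(text)
                written += len(text.encode("utf-8"))
                line = []
                if written >= target_bytes:
                    break
    return written
```

Feeding it a few hundred KB of real source code or prose as sample_text
and calling it with a different seed (or a slightly edited sample) per
revision would give files that differ in a semi-natural way, though a
real tarball is probably still closer to real-world data.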

-- Stefan^2.
Received on 2010-12-24 15:44:34 CET

This is an archived mail posted to the Subversion Dev mailing list.
