Re: CString to CStringA conversion and other questions :)

From: Oto BREZINA <otik_at_printflow.eu>
Date: Mon, 20 Aug 2012 20:40:49 +0200

On 2012-08-20 18:44, Stefan Küng wrote:
>>>> 3.
>>>> Throughout the code I have seen a lot of *ptr++ based algorithms
>>>> (sometimes written in a way that feels unnatural to me). Are those
>>>> quicker than ones based on ptr[]?
>>>> According to a quite old optimization guide, ptr[] could be faster
>>>> because of fewer increment instructions and simpler cache management,
>>>> but that was around '97.
>>>> Do you have any real performance tests/data? I tried to run my own, but
>>>> I was unable to start the performance tests as I'm not an admin ... will
>>>> try again later.
>>> ptr incrementations can be faster than index based access, usually when
>>> using std containers.
>> What about char * and wchar_t *?
>> Like in CheckUnicodeType, or in the "fill in the lines into the array"
>> part of Load. The data can be quite big.
>> The most important parts are the access instruction (*ptr vs ptr[i]) and
>> the number of cache misses. But the easiest way is to measure it on big data.
> I'm wondering here: why do you want to change that part of the code?
> Does it not work? Is it too slow?
I'm NOT about to rewrite CheckUnicodeType; I was just wondering, when I read
it, whether *ptr++ is faster.
In fact I was a little bit about to - I wrote a simple UTF8 validator some
time ago, and seeing your implementation with a lot of nested ifs, ++ etc., I
was wondering what the pros and cons are compared with my more state-based
implementation.
Everything stalled at the Performance Analysis tool.
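
A minimal sketch of how the two loop styles could be compared without the
Performance Analysis tool. The function names and the timing helper are
hypothetical and not taken from the TortoiseMerge sources; the loop simply
counts '\n' bytes. With an optimizing compiler both variants often end up as
very similar machine code, so only a measurement on real, large data can
answer the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Counts '\n' bytes by walking a pointer through the buffer (*ptr++ style).
std::size_t CountNewlinesPtr(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::size_t count = 0;
    for (const char* end = data + size; data != end; )
    {
        if (*data++ == '\n')
            ++count;
    }
    return count;
}

// Same loop written with ptr[i] indexing.
std::size_t CountNewlinesIdx(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (data[i] == '\n')
            ++count;
    }
    return count;
}

// Hypothetical helper: runs fn once over buf, stores its result and
// returns the elapsed wall-clock time in microseconds.
template <typename Fn>
long long TimeMicroseconds(Fn fn, const std::vector<char>& buf, std::size_t& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = fn(buf.data(), buf.size());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

To get usable numbers, the buffer should be much larger than the CPU caches
and each variant should be run several times in an optimized build.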

In CheckUnicodeType you go through the array twice just to verify whether it
is UTF8. It could also be extended with UTF16BE/LE detection - statistically
based, so not 100% accurate, and it would cost some processing. For now you
need a BOM for UTF16 files.
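
One possible shape of such a statistical check, as a sketch only: in mostly
Latin text, UTF16LE has a zero byte at the odd offsets and UTF16BE at the even
offsets. The name GuessUtf16 and the thresholds (30% zero bytes on one side,
at most a tenth as many on the other) are assumptions chosen for illustration,
not values from the existing code.

```cpp
#include <cstddef>

enum class Utf16Guess { None, LittleEndian, BigEndian };

// Counts zero bytes at even and odd offsets and guesses the byte order.
// This is a heuristic: text that is mostly non-Latin will not be detected.
Utf16Guess GuessUtf16(const unsigned char* data, std::size_t size)
{
    const std::size_t units = size / 2;   // number of complete 16-bit units
    std::size_t zeroEven = 0;
    std::size_t zeroOdd = 0;
    for (std::size_t i = 0; i < units; ++i)
    {
        if (data[2 * i] == 0) ++zeroEven;
        if (data[2 * i + 1] == 0) ++zeroOdd;
    }
    // Assumed threshold: more than 30% of the units have a zero byte on one
    // side, and the other side has at most a tenth as many zero bytes.
    const std::size_t threshold = units * 3 / 10;
    if (zeroOdd > threshold && zeroEven * 10 <= zeroOdd)
        return Utf16Guess::LittleEndian;
    if (zeroEven > threshold && zeroOdd * 10 <= zeroEven)
        return Utf16Guess::BigEndian;
    return Utf16Guess::None;
}
```

The counting fits into a single pass, so it could in principle share the loop
that already scans the buffer for UTF8 validity.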

It started with a simple task:
The last line of a file should have no newline on it. In the current
implementation you keep that information in the attribute m_bReturnAtEnd. This
makes some editing at the end of the file quite hard. For example, you can
add/remove a newline on the last line using CTRL+Enter, but this is usually
not applied to the edited file. Keeping this attribute up to date seems to be
a hard task with a lot of possible bugs. But even though I was able to remove
this attribute and add the last line (when needed), it did not appear in the
views... It seems that you use line data from the diff, where the empty last
line is missing. I'll come back to this later.
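
The idea of dropping the flag can be sketched as follows: if a text that ends
with a newline is stored with an explicit empty last line, the line array
alone describes the file and no separate attribute is needed. SplitLines and
JoinLines are hypothetical helpers that only handle '\n'; the real Load/Save
code deals with several EOL types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits on '\n'. A text ending in '\n' yields a final empty line, so the
// "newline at end of file" state is carried by the line list itself.
std::vector<std::string> SplitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    for (;;)
    {
        const std::string::size_type pos = text.find('\n', start);
        if (pos == std::string::npos)
        {
            lines.push_back(text.substr(start));
            break;
        }
        lines.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return lines;
}

// Inverse of SplitLines: a newline is written between lines, never after
// the last one.
std::string JoinLines(const std::vector<std::string>& lines)
{
    std::string text;
    for (std::size_t i = 0; i < lines.size(); ++i)
    {
        if (i != 0)
            text += '\n';
        text += lines[i];
    }
    return text;
}
```

With this representation, adding or removing the final newline is the same
operation as adding or removing the empty last line in the editor.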

At that time I found that there is quite a lot of code duplication in Load
and Save, and the duplicates are not identical... which makes the code harder
to read and maintain. Plus the UTF32 encoding is missing.

Another motivation was to do something simple before starting to work on
editing in multiple views.

>
>>>> 5.
>>>> Do you have any specific reason not to support UTF32, or are there just
>>>> too few use cases?
>>> Is there even a tool/app/whatever that writes such files?
>>> I've never seen such a file myself.
>>> So why implement something that won't be used?
>> I quite agree. If this is the only reason, I would implement it. I guess
>> this format is really rare, and if it is used at all, then on Linux. But it
>> makes me feel that the application is unfinished whenever I read that UTF32
>> is treated as binary ...
>> Adding support for UTF32 means writing a load filter and a write filter
>> (x2: BE, LE) ... should be easy ...
>> It could be a good new feature for 1.8.
> not really: while the svn diff lib doesn't support even utf16, it
> doesn't break either for those because it skips over the null bytes.
> But utf32 wouldn't work - too many null bytes in a normal text.
So TMerge does a diff directly on UTF16 files?
Is a UTF8 temp file created only if the encodings differ (e.g. UTF16 vs. UTF8),
or never?
Can that simply be enforced for UTF32 files?

From CDiffData::Load it seems that UTF16 files are saved as UTF8 in a
temp file for diff purposes. Am I right?
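
As a rough sketch of what the load side of a UTF32 filter could look like, the
following converts a UTF32 byte buffer (BOM already stripped) to UTF8, which
could then be written to the temp file. AppendUtf8 and Utf32ToUtf8 are
hypothetical names; replacing invalid code points with U+FFFD is an assumption
made here, not existing TortoiseMerge behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one code point as UTF8. Surrogates and values above U+10FFFF
// are replaced with U+FFFD (assumed policy).
void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes UTF32 bytes in the given byte order. Trailing bytes that do not
// form a complete 4-byte unit are ignored.
std::string Utf32ToUtf8(const unsigned char* data, std::size_t size, bool bigEndian)
{
    std::string out;
    for (std::size_t i = 0; i + 4 <= size; i += 4)
    {
        const std::uint32_t b0 = data[i];
        const std::uint32_t b1 = data[i + 1];
        const std::uint32_t b2 = data[i + 2];
        const std::uint32_t b3 = data[i + 3];
        const std::uint32_t cp = bigEndian
            ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
            : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
        AppendUtf8(out, cp);
    }
    return out;
}
```

The write filter would be the reverse: decode UTF8 to code points and emit
four bytes per code point in the requested byte order.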

>>>> Do you have any reason not to support other EOLs? According to
>>>> http://en.wikipedia.org/wiki/Newline only NEL seems to be questionable.
>>> actually, yes: the svn diff library doesn't support them, so supporting
>>> them in TMerge makes no sense. We could split the lines there, but the
>>> diffing engine would treat those as one line and so the diff would be
>>> shown wrong.
>> Ok, sounds reasonable.
>>
>> There is as much use for those EOLs as for UTF32, so no big deal, but:
>> If I understand correctly, the diff is made on temp files in UTF8 format.
>> In case other EOLs are used, we can convert them to EOL_AUTOLINE and then
>> make the diff.
> That would work for *showing* the diff. But when saving edited content,
> you would save the converted EOLs.
Of course we would save the ones we loaded. We only need to let the upper
layer, e.g. CDiffData::Load, know that the diff needs enforced UTF8; on an
enforced UTF8 save, all non-standard EOLs can be converted to AUTO. We'll lose
a little bit of the diff this way, though.

So a temp UTF8 file is needed for UTF16/32, exotic EOLs, and differing
encodings (ASCII and UTF8) ...
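
A sketch of the conversion step for the temp file, assuming the text has
already been decoded to code points. The set of "exotic" terminators (VT, FF,
NEL, LS, PS) follows the Wikipedia list referenced above; the function names
and the choice of '\n' as the target are assumptions. The `changed` flag is
what the upper layer could use to know that the temp file no longer matches
the original EOLs.

```cpp
#include <string>

// True for the less common line terminators: VT, FF, NEL, LS and PS.
bool IsExoticEol(char32_t c)
{
    return c == 0x000B || c == 0x000C || c == 0x0085 ||
           c == 0x2028 || c == 0x2029;
}

// Returns a copy of text with every exotic terminator replaced by '\n'.
// CR, LF and CRLF are left untouched. 'changed' reports whether any
// replacement happened.
std::u32string NormalizeExoticEols(const std::u32string& text, bool& changed)
{
    changed = false;
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text)
    {
        if (IsExoticEol(c))
        {
            out += U'\n';
            changed = true;
        }
        else
        {
            out += c;
        }
    }
    return out;
}
```

Only the temp copy used for diffing would go through this step; the in-memory
lines keep the EOLs that were loaded, so saving writes them back unchanged.
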
>> This leads me to another question:
>> 6.
>> When saving to enforced UTF8, the BOM was not saved for anything but
>> UTF8BOM - was this intentional? Can the BOM be omitted for all
>> UTF8-enforced files? Correct? This was implemented in r23192.
> There's an option in the settings to save files as utf8 even if they're
> detected as ANSI.
> You're now saving those with a BOM, which isn't what we did before. In
> that case you should write the file without a BOM (always without BOM if
> possible, only if the file had one when loading, then we write the BOM too).
r23193 reads: ((!bSaveAsUTF8)&&(m_UnicodeType ==
CFileTextLines::UTF8BOM))
That should mean: if NOT bSaveAsUTF8 and the type is UTF8BOM, then save the
BOM.

Is bSaveAsUTF8 only for user requests, or for diff purposes too?
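
Written out as a small stand-alone sketch, the quoted condition behaves like
this. The enum is a hypothetical stand-in for the CFileTextLines unicode type
and EncodeForSave is an invented helper; only the boolean expression itself
comes from the revision quoted above.

```cpp
#include <string>

// Hypothetical stand-in for the unicode type stored in CFileTextLines.
enum class UnicodeType { Ascii, Utf8, Utf8Bom, Utf16LE, Utf16BE };

// Mirrors the quoted condition: a BOM is written only when UTF8 is not
// being enforced and the type is UTF8BOM.
bool ShouldWriteUtf8Bom(UnicodeType type, bool saveAsUtf8)
{
    return !saveAsUtf8 && type == UnicodeType::Utf8Bom;
}

// Prepends the three-byte UTF8 BOM when the condition above holds.
std::string EncodeForSave(const std::string& utf8Text, UnicodeType type, bool saveAsUtf8)
{
    std::string out;
    if (ShouldWriteUtf8Bom(type, saveAsUtf8))
        out = "\xEF\xBB\xBF";
    out += utf8Text;
    return out;
}
```

Under this reading, a file of type UTF8BOM loses its BOM whenever the save is
enforced as UTF8, which is why it matters whether bSaveAsUTF8 is also set for
the diff temp files.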

>> 7.
>> If I read the code correctly, in CFileTextLines::CheckLineEndings you
>> detect EOL_LFCR, but in CFileTextLines::Load this one is decoded as EOL_LF
>> and EOL_CR or EOL_CRLF.
>> Is this intentional?
>> I guess EOL_LFCR is too rare to be really wanted, but then why detect it
>> in Check? This makes all EOL_AUTOLINE EOL_LFCR.
> I don't understand what you mean here.
> In Load(), the line endings are checked by calling CheckLineEndings(),
> there's no separate detection.
CheckLineEndings can detect EOL_LFCR; Load cannot. Is this what you want?
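
To illustrate the mismatch, here is a sketch of a single scanner that
recognises all four terminators. If detection and loading both used one
routine like this, they could not disagree about LFCR. EolCounts and CountEols
are hypothetical and do not reproduce the logic of either existing function.

```cpp
#include <cstddef>
#include <string>

struct EolCounts
{
    std::size_t lf = 0;
    std::size_t cr = 0;
    std::size_t crlf = 0;
    std::size_t lfcr = 0;
};

// Greedy left-to-right scan: a two-byte terminator (CRLF or LFCR) is
// matched before a single CR or LF.
EolCounts CountEols(const std::string& text)
{
    EolCounts n;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const char c = text[i];
        const char next = (i + 1 < text.size()) ? text[i + 1] : '\0';
        if (c == '\r' && next == '\n')
        {
            ++n.crlf;
            ++i;   // skip the second byte of the pair
        }
        else if (c == '\n' && next == '\r')
        {
            ++n.lfcr;
            ++i;
        }
        else if (c == '\r')
        {
            ++n.cr;
        }
        else if (c == '\n')
        {
            ++n.lf;
        }
    }
    return n;
}
```

The ambiguity shows up with input such as "\n\r\n": this scanner reports one
LFCR followed by one LF, while a reader that does not know LFCR sees LF
followed by CRLF.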

>
> Stefan
>

-- 
Oto ot(ik) BREZINA - 오토, mob: +421 903 653 470
Printflow s.r.o, tel +421 2 4488 1086, Bratislava, Slovakia, EU

If I toppost I do it because:
  * I don't have time to edit out irrelevant context and signatures
  * I expect you to remember the context for my email messages
  * I want you do the work to figure out what I said
  * My time is more important than your time
------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessageId=2999679
Received on 2012-08-20 20:42:00 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.
