Convert encoding where output cannot represent characters
Moderator: General Moderators
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
The usual bottlenecks are disk I/O and network delays, so large performance gains can often be had by looking at those two areas first. Tools such as ptrace and strace are very useful for finding where the bottlenecks might be.
http://www.schlossnagle.org/~george/tal ... e%20pdf%22
There was a PDF slide deck about using ptrace a few weeks back, but I lost the link.
Well, you could convert your string from the source to the target charset using the //IGNORE option, then convert it back and look for differences. Every character missing from the double-converted string should be encoded as an HTML entity.
Not an elegant solution, of course. I would prefer the ability to set a //TRANSLIT callback.
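The round-trip idea above can be sketched in Python, where `errors="ignore"` plays the same role as iconv's //IGNORE suffix (the function name and the `latin-1` target are just illustrative choices, not anything from the thread):

```python
def html_escape_unrepresentable(text, target="latin-1"):
    """Replace characters the target charset cannot represent with HTML entities."""
    # Round trip: drop unrepresentable characters (like //IGNORE), then
    # walk the original against the result to spot what went missing.
    round_tripped = text.encode(target, errors="ignore").decode(target)
    out = []
    it = iter(round_tripped)
    pending = next(it, None)
    for ch in text:
        if ch == pending:
            out.append(ch)
            pending = next(it, None)
        else:
            # This character was dropped by the conversion: emit an entity.
            out.append(f"&#{ord(ch)};")
    return "".join(out)

print(html_escape_unrepresentable("café \u2615"))  # café &#9749;
```

As it happens, Python already ships this behavior as a built-in error handler: `"café \u2615".encode("latin-1", "xmlcharrefreplace")` produces the same entities in one step, without the double conversion.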
- Ollie Saunders
- DevNet Master
- Posts: 3179
- Joined: Tue May 24, 2005 6:01 pm
- Location: UK
Those of you who think you know about performance might be able to help Astions.
- Ambush Commander
- DevNet Master
- Posts: 3698
- Joined: Mon Oct 25, 2004 9:29 pm
- Location: New Jersey, US
I did more thinking about the technique, and it only works nicely for fixed-length encodings (especially 8-bit, ASCII-compatible ones). For everything else you have to implement a character gobbler (I'm sure that's not the real term) for each encoding you want to support, which is almost as bad as having to set up lookup tables.
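A quick way to see why byte-at-a-time scanning only lines up with characters in fixed-length, single-byte encodings (a Python sketch; the two encodings here are just illustrative examples):

```python
text = "café"

# In a single-byte encoding, one byte is one character, so a byte-wise
# scan lands on character boundaries.
latin1 = text.encode("latin-1")
print(len(latin1))  # 4 bytes for 4 characters

# In a variable-length encoding like UTF-8, 'é' takes two bytes, so a
# byte-wise scan would split characters mid-sequence unless you implement
# a per-encoding decoder (the "character gobbler" mentioned above).
utf8 = text.encode("utf-8")
print(len(utf8))  # 5 bytes for 4 characters
```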