HTML in language strings
Posted: Mon Jan 05, 2009 10:53 am
by AlexC
Hey,
Is HTML in language strings (which we're using Gettext for all our translation goodness) bad practice? I can't make my mind up as to whether it is or not, and the Gettext manual doesn't seem too sure, either (http://www.gnu.org/software/gettext/manual/gettext.html: "HTML markup, however, is common enough that it's probably ok to use in translatable strings.").
A few examples of what we've run into:
Code:
<?php
printf( _('<span id="foobar">Hello,</span> %1$s!'), $foo );
echo _('foobar car <a href="http://example.com">zomg</a>');
?>
How can we avoid this, or do we not need to?
Regards,
(I had no idea where to put this, btw)
Re: HTML in language strings
Posted: Mon Jan 05, 2009 5:06 pm
by alex.barylski
Is HTML in language strings (which we're using Gettext for all our translation goodness) bad practice?
Yes. What if you port to Windows desktop applications? HTML tags won't likely render.
Not only that, but they are separate resources. What if you fail to boldify the text in a French translation?
There "might" be instances where it's absolutely required but I don't think I have ever encountered such a situation.
I would do something like:
Code:
<strong>#LANG_CODE_1#</strong>#LANG_CODE_2#
Cheers,
Alex
Re: HTML in language strings
Posted: Tue Jan 06, 2009 1:15 am
by AlexC
Yes. What if you port to Windows desktop applications? HTML tags won't likely render.
That's kind of a weird argument/point. If there are a few language strings with HTML in them currently, then surely I am already using vast amounts of HTML (as it is a web application), so why would it ever be ported to that?

Re: HTML in language strings
Posted: Tue Jan 06, 2009 8:29 am
by alex.barylski
Perhaps a weird reason but it is a reason. This is where you as a developer need to make an executive decision I guess.
If you know absolutely without a doubt it'll never be ported, include HTML.
Are you going to let your users edit the language code? Will you filter the language codes to prevent XSS? Are you prepared to include HTML in all of your translations? What if (instead of porting) you decide sometime in the future to use XXXHTML instead of XHTML? Are you prepared to modify your language tables AND the HTML templates?
Personally I keep HTML out of anything but HTML template files -- which are alternative syntax PHP scripts if anything.
Re: HTML in language strings
Posted: Tue Jan 06, 2009 8:48 am
by AlexC
Perhaps a weird reason but it is a reason
It's not a reason; it makes no logical sense to convert an entire dedicated web application to something that is not. Even if I *was*, I think the fact that there is HTML in language strings would be the least of my worries, since all the other code would need to be ported and would most likely include different language strings. I know 100% that this application will always be a web application.
Are you going to let your users edit the language code? Will you filter the language codes to prevent XSS? Are you prepared to include HTML in all of your translations? What if (instead of porting) you decide sometime in the future to use XXXHTML instead of XHTML? Are you prepared to modify your language tables AND the HTML templates?
The language strings would be edited by translators and released by us as language packs. HTML is allowed in them regardless; however, they will of course be reviewed by us, so any malicious content would be found (these would be translators brought in to translate the application, not your average Joe editing through the application, just to make things clear).
HTML within the language strings would not be much (I'm not talking about full-blown tables or complex HTML); it would just be small amounts such as in the examples given.
Personally I keep HTML out of anything but HTML template files -- which are alternative syntax PHP scripts if anything.
As do I, however these language strings are causing an issue with that. How would you get around this?
Re: HTML in language strings
Posted: Tue Jan 06, 2009 6:38 pm
by alex.barylski
As do I, however these language strings are causing an issue with that. How would you get around this?
By keeping HTML out of the language strings.
If you rely on HTML you have introduced a concrete dependency on HTML. If you used an intermediate format like wiki/bbcode markup you would make the dependency a little more abstract/generic or whatever, for lack of a better term.
Instead of using <strong> use *bold* and instead of <u> use _underline_
Then have a filter go over the language strings and replace those with the HTML, XHTML or XXXHTML or whatever your desired output may be.
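A minimal sketch of such a filter, assuming only the two markers mentioned above (*bold* and _underline_). The function name render_inline_markup is made up for illustration, and a real filter would need to cope with things like underscores inside ordinary words:

```php
<?php
// Hypothetical helper: converts *bold* and _underline_ markers into HTML.
function render_inline_markup($text)
{
    // Escape first so stray HTML in a translation is displayed, not rendered.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // *bold* becomes <strong>bold</strong>
    $html = preg_replace('/\*([^*]+)\*/', '<strong>$1</strong>', $html);

    // _underline_ becomes <u>underline</u>
    $html = preg_replace('/_([^_]+)_/', '<u>$1</u>', $html);

    return $html;
}

echo render_inline_markup('This is *important* and _underlined_'), "\n";
```

Switching the output format later would then mean changing the two replacement strings in one place rather than touching every translation.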
Alternatively you could partition strings into distinct parts when there is markup involved:
Code:
This is a <strong>string</strong> which relied on <h3>HTML</h3>
Would become:
CODE_1 = This is a
CODE_2 = string
CODE_3 = which relied on
CODE_4 = HTML
And the HTML template would look like
Code:
CODE_1 <strong>CODE_2</strong> CODE_3 <h3>CODE_4</h3>
Codes would then be replaced in a post-intercepting filter. Personally I hate using codes, though; I'd rather just use gettext inline.
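A rough sketch of what that filter could look like, assuming the language table is a plain PHP array keyed by code. The $strings array and the replace_language_codes function are hypothetical names, not part of any library:

```php
<?php
// Hypothetical language table, using the codes from the example above.
$strings = array(
    'CODE_1' => 'This is a',
    'CODE_2' => 'string',
    'CODE_3' => 'which relied on',
    'CODE_4' => 'HTML',
);

// Hypothetical filter: swaps every known code in the rendered template.
function replace_language_codes($template, array $strings)
{
    // strtr() tries the longest keys first, so CODE_10 is never
    // mistaken for CODE_1 followed by "0".
    return strtr($template, $strings);
}

$template = 'CODE_1 <strong>CODE_2</strong> CODE_3 <h3>CODE_4</h3>';
echo replace_language_codes($template, $strings), "\n";
```

strtr() with an array also never re-scans text it has already substituted, so a translation that happens to contain the text of another code is left alone.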
Cheers,
Alex
Re: HTML in language strings
Posted: Wed Jan 07, 2009 1:32 am
by AlexC
Splitting the string up is what we currently do (or did) to get around it, it just seems 'hackish' and makes life harder for the translators. Some languages, iirc, have different words depending on what context they are in and what is before/after them, so splitting them up may result in broken language sentences - if you see what I'm getting at?
Personally I hate using codes, though; I'd rather just use gettext inline.
Same (though I do see advantages to them), and as we're using Gettext + Launchpad for our translations it's best to keep the language strings in English so that translators can just download the .po/pot files and start translating the English version.
What I did try (and sort of works) is something like this:
Code:
<?php printf( t('foobar car %1$szomg%2$s'), '<a href="http://example.com">', '</a>' ); ?>
Again, it just seems to have the potential to confuse the user: is the string 'zomg' or 'szomg'?
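One possible way around that ambiguity is to use named placeholders instead of printf-style ones. This is only a sketch: fill_placeholders and the {link_start}/{link_end} names are invented for illustration, and in the real application the message would still be passed through the translation function first:

```php
<?php
// Hypothetical helper: fills {name} placeholders in a translated string.
function fill_placeholders($message, array $values)
{
    $pairs = array();
    foreach ($values as $name => $value) {
        $pairs['{' . $name . '}'] = $value;
    }
    return strtr($message, $pairs);
}

// The first argument would normally be wrapped in the translation call.
echo fill_placeholders(
    'foobar car {link_start}zomg{link_end}',
    array(
        'link_start' => '<a href="http://example.com">',
        'link_end'   => '</a>',
    )
), "\n";
```

The translator sees 'foobar car {link_start}zomg{link_end}', where the braces make it clear where the word ends and the placeholder begins.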
Re: HTML in language strings
Posted: Wed Jan 07, 2009 2:34 pm
by alex.barylski
If you have a dedicated server, use intl; its functions are better than sprintf.
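Something along these lines, as a sketch only. MessageFormatter::formatMessage is the real intl call; the format_message wrapper and its plain-substitution fallback are assumptions added so the snippet also runs where the extension is missing (the fallback does not understand plurals or ICU quoting rules):

```php
<?php
// Hypothetical wrapper: formats with intl when available, else falls back.
function format_message($locale, $pattern, array $args)
{
    if (class_exists('MessageFormatter')) {
        $result = MessageFormatter::formatMessage($locale, $pattern, $args);
        if ($result !== false) {
            return $result;
        }
    }

    // Fallback for servers without intl: plain {0}, {1}, ... substitution.
    $pairs = array();
    foreach ($args as $index => $value) {
        $pairs['{' . $index . '}'] = $value;
    }
    return strtr($pattern, $pairs);
}

echo format_message(
    'en_US',
    'foobar car {0}zomg{1}',
    array('<a href="http://example.com">', '</a>')
), "\n";
```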
Re: HTML in language strings
Posted: Wed Jan 07, 2009 3:39 pm
by AlexC
We do, however this application is distributed for anyone to download and use.
Re: HTML in language strings
Posted: Wed Jan 07, 2009 3:48 pm
by alex.barylski
Then you have to wait until 5.3 becomes commonplace... I think that is the version which supports it without having to install from PECL.
Emulating it would be a daunting task... maybe look into Zend_Locale and its language classes; they might have emulated it, or something close to it.
Re: HTML in language strings
Posted: Wed Jan 07, 2009 3:49 pm
by allspiritseve
AlexC wrote:Splitting the string up is what we currently do (or did) to get around it, it just seems 'hackish' and makes life harder for the translators.
My vote is to keep the HTML in the strings. I'm not worried about any so-called "dependency on html". I don't think we're getting rid of HTML any time soon, just look at how long it takes for browsers to adopt HTML & CSS standards. If for some reason in the future you need to convert to another format, it seems like it'd be just as easy to translate HTML as BBCode. The most important thing for me though, as a bilingual student, is that you make the job easiest for your translators.

Re: HTML in language strings
Posted: Wed Jan 07, 2009 5:54 pm
by alex.barylski
My vote is to keep the HTML in the strings. I'm not worried about any so-called "dependency on html". I don't think we're getting rid of HTML any time soon, just look at how long it takes for browsers to adopt HTML & CSS standards. If for some reason in the future you need to convert to another format, it seems like it'd be just as easy to translate HTML as BBCode.
Look at Smarty for instance...it used HTML tags for its modules...then XHTML compliance and such became important, now the module developers had to go and update modules.
If you used <b> instead of the SEO-friendly <strong>, you need to change not just the templates but the language files too?
Separation of concerns...I'm pednatic about...some others aren't.
Re: HTML in language strings
Posted: Wed Jan 07, 2009 6:05 pm
by allspiritseve
PCSpectra wrote:pednatic
Oh, the irony...
In all seriousness though... is it really that big of a deal to change your template files AND your language files? It seems like a simple search and replace would do the trick. If that saves him jumping through hoops now, why optimize prematurely?
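For the <b> to <strong> case, the search and replace could be as small as the sketch below. It assumes the language strings are available as a PHP array and that the tags carry no attributes; upgrade_bold_tags is a made-up name:

```php
<?php
// Hypothetical helper: rewrites <b> tags to <strong> in a list of strings.
function upgrade_bold_tags(array $strings)
{
    $search  = array('<b>', '</b>');
    $replace = array('<strong>', '</strong>');

    // str_replace() accepts an array subject and keeps its keys.
    return str_replace($search, $replace, $strings);
}

$updated = upgrade_bold_tags(array(
    'greeting' => '<b>Hello</b>, world<br>',
));
echo $updated['greeting'], "\n";
```

Because the closing bracket is part of the search term, tags such as <br> are not touched.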
Re: HTML in language strings
Posted: Wed Jan 07, 2009 9:18 pm
by alex.barylski
In all seriousness though... is it really that big of a deal to change your template files AND your language files? It seems like a simple search and replace would do the trick. If that saves him jumping through hoops now, why optimize prematurely?
Nothing is a big deal when you're talking small scale
As long as the controller-model-view code is minimal, keeping it in a single function isn't bad either...

Re: HTML in language strings
Posted: Wed Jan 07, 2009 9:36 pm
by allspiritseve
PCSpectra wrote:As long as the controller-model-view code is minimal, keeping it in a single function isn't bad either...

I don't think it's quite the same thing... with everything in a single function you're almost definitely going to run into maintenance problems, and soon... with HTML, maybe he'll sorta run into minor problems in 10 years, MAYBE, if something better comes along and actually gets widespread adoption, but what's the worst he'd have to do to fix it? Search and replace a bunch of files. With MVC, you can't automate well-separated code. It's gotta be done right early on.