Are messages part of a language
Posted: Tue Feb 06, 2007 7:03 pm
by Ambush Commander
An essential part of any multilingual system is the ability to load system messages in different languages. Rather than hard-coding "Error: Core Reactor Meltdown" throughout your application, you should be able to use a little key like "error-reactor-meltdown" which will go and retrieve the appropriate message in French or Swahili, depending on the locale. We'll put other L10N necessities such as date and number formatting to the side for a moment.
Translations for these system messages are usually provided by means of a PHP file with oodles of strings stuffed in an associative array. While this is workable for translators, it isn't for site admins who want to customize text. Thus, there is often a database layer over the regular translations that lets people without write access to the webserver change system messages to things like "ERROR CORE MELTDOWN OHNOHZORZ!!!"
It may be reasonably asserted, then, that the whole subject of messages, including their retrieval from multiple data sources, their caching, parameter substitution, etc., is a complicated issue that merits its own class, which we shall term Messages or Translation.
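To make the layering concrete, here is a minimal sketch of such a class. Everything in it is hypothetical: the class name Messages, the get() method and the $1-style placeholders are assumptions chosen for illustration, and plain arrays stand in for the database and PHP-file sources.

```php
<?php
// Hypothetical sketch; none of these names come from an existing library.
class Messages
{
    private $sources;

    // $sources: arrays of key => message, highest priority first
    // (e.g. admin overrides from a database, then the shipped defaults).
    public function __construct(array $sources)
    {
        $this->sources = $sources;
    }

    // Returns the first message found for $key, with $1, $2, ... replaced
    // by the given parameters. Falls back to the key itself if no source has it.
    public function get($key, array $params = array())
    {
        $text = $key;
        foreach ($this->sources as $source) {
            if (isset($source[$key])) {
                $text = $source[$key];
                break;
            }
        }

        $replacements = array();
        foreach (array_values($params) as $i => $value) {
            $replacements['$' . ($i + 1)] = (string) $value;
        }

        return $replacements ? strtr($text, $replacements) : $text;
    }
}

// Stand-ins for the two layers described above.
$defaults  = array('error-reactor-meltdown' => 'Error: Core Reactor Meltdown in sector $1');
$overrides = array('error-reactor-meltdown' => 'ERROR CORE MELTDOWN OHNOHZORZ!!! (sector $1)');

$messages = new Messages(array($overrides, $defaults));
echo $messages->get('error-reactor-meltdown', array(7)), "\n";
```

Because the override layer is listed first, the admin's text wins; removing it from the list would bring the shipped default back without touching the calling code.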
Messages, however, aren't the only game in town: a language has to support things like dates, numbers, and genitive noun forms to be truly accurate: messages are only a part (albeit a very large one) of the mix. So, I pose to you a question of dependencies:
Is the language dependent on the messages? Or are the messages just a component of the language? Or are they two separate entities that have dependencies on each other? Or is there a third gateway to these two objects? (The association is not always one to one: there may be multiple languages on one page.)
Some examples: MediaWiki contains a MessageCache which is used for retrieving messages. This object considers the Language object only one of many places where a message could be squirreled away. The PEAR packages Translation2 and I18Nv2 appear to be blithely unaware of one another (even though the former is dependent on the latter). I am not currently aware of any other translation packages.
I'm looking for something light, thank you very much, since this will be used for an error reporting system.

Posted: Tue Feb 06, 2007 7:10 pm
by Christopher
You might want to take a look at what they are doing in the Zend Framework with Locale, Translate, System, etc. and how they interrelate. There have been a number of very long design discussions there.
Posted: Tue Feb 06, 2007 7:22 pm
by Ambush Commander
I took a peek, but the code is incomprehensible and documentation nowhere to be found. The project proposals seem to imply that each component (Date, Currency, Translate) is an entity in itself, and that they share a common thread in that they all use Zend_Locale. This does not help me very much, as I intend to bind only language-related behaviors tightly together, since it will be a very minimal system.
Posted: Tue Feb 06, 2007 8:50 pm
by wei
Posted: Tue Feb 06, 2007 9:01 pm
by Ambush Commander
It is on the subject of messages and parameter binding, but it doesn't say anything architecturally speaking. I suppose, however, more examples would be welcome. If someone who understands the Zend Framework setup would like to speak up, I'm all ears.
Posted: Tue Feb 06, 2007 9:08 pm
by wei
well,
Ambush Commander wrote: Is the language dependent on the messages? Or are the messages just a component of the language? Or are they two separate entities that have dependencies on each other? Or is there a third gateway to these two objects? (The association is not always one to one: there may be multiple languages on one page)
i don't understand what you are trying to say there, maybe some definitions of the terms would help.
Posted: Tue Feb 06, 2007 9:15 pm
by Ambush Commander
Okay. Let's see...
Language - representation of a written language. Contains data and behavior associated with the language; these are usually translated messages and language-specific formatting.
Messages - one part of a language, they are strings like "Log out" and "Be careful of alligators", only translated to a certain language. A message usually is in a language (i.e. it has a property of being English or French). However, messages are often aggregated inside a language object, so that the Language contains all the default message strings.
If this was the end of the story, things wouldn't be so bad: just have an array of phrases => messages inside the language object. However, messages don't necessarily have to come from the language object: they may be loaded from a database or XML file, they may be cached in memory or in a serial file, which are really out of the scope of a lowly Language object.
Posted: Tue Feb 06, 2007 9:26 pm
by wei
well, maybe to follow ICU's naming, your "language" is like a "locale" in ICU, because 1 "natural language" may be used in many places (e.g. english is used in many places and each place may have different variants of number/date formatting conventions).
http://icu.sourceforge.net/userguide/locale.html
Not everyone uses the term Locale, MS .NET uses the term Culture (a little different though).
Message formatting is not only locale sensitive, it is message sensitive as well, that is, the formatting/translation of the message depends on the message and locale. E.g. (plural forms in English, no such things as plurals in Chinese)
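As a rough illustration of why formatting depends on both the message and the locale, the sketch below picks a plural form per language. The function names pluralIndex() and formatCount() and the two rules shown are assumptions made for this example; real plural rules cover many more languages and cases.

```php
<?php
// Hypothetical plural rules keyed by language code.
// Returns an index into the list of forms supplied with the message.
function pluralIndex($lang, $n)
{
    switch ($lang) {
        case 'zh':
            return 0;                 // Chinese: one form for every count
        case 'en':
        default:
            return $n == 1 ? 0 : 1;   // English: singular vs. plural
    }
}

// Picks the right form for $n and substitutes the number into it.
function formatCount($lang, array $forms, $n)
{
    $index = min(pluralIndex($lang, $n), count($forms) - 1);
    return sprintf($forms[$index], $n);
}

echo formatCount('en', array('%d file', '%d files'), 1), "\n";
echo formatCount('en', array('%d file', '%d files'), 3), "\n";
echo formatCount('zh', array('%d 个文件'), 3), "\n";
```

The message supplies the forms and the locale supplies the rule, so neither side can do the formatting on its own.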
Posted: Tue Feb 06, 2007 9:32 pm
by Ambush Commander
Yep. It's quite complicated.
wei wrote: e.g. english is used in many places and each place may have different variants of number/date formatting conventions
True, but practically speaking it doesn't make much sense to make a distinction between American and British spelling: imagine the redundancy!
ICU's method of action seems to be setting up locale aware functions, and then passing the current locale into them via a Locale object. This is opposed to a polymorphic approach where the functions are actually bound to the locale, so no locale passing is necessary. ICU's approach is more procedural in nature.
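The two styles can be put side by side in a short sketch. This is not ICU's API: formatNumberFor(), Language and LanguageFr are hypothetical names, and the separator values are assumptions chosen for the example.

```php
<?php
// Style 1 (procedural): a locale-aware function that is handed the locale.
function formatNumberFor($locale, $number)
{
    // Hypothetical separator table: decimal point, thousands separator.
    $separators = array(
        'en' => array('.', ','),
        'fr' => array(',', ' '),
    );
    $pair = isset($separators[$locale]) ? $separators[$locale] : $separators['en'];
    return number_format($number, 2, $pair[0], $pair[1]);
}

// Style 2 (polymorphic): the behavior is bound to the language object,
// so callers never pass a locale around.
class Language
{
    protected $decimalPoint = '.';
    protected $thousandsSeparator = ',';

    public function formatNumber($number)
    {
        return number_format($number, 2, $this->decimalPoint, $this->thousandsSeparator);
    }
}

class LanguageFr extends Language
{
    protected $decimalPoint = ',';
    protected $thousandsSeparator = ' ';
}

echo formatNumberFor('fr', 1234.5), "\n";

$lang = new LanguageFr();
echo $lang->formatNumber(1234.5), "\n";
```

Both calls produce the same string; the difference is only in where the locale knowledge lives.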
Posted: Tue Feb 06, 2007 9:37 pm
by wei
Ambush Commander wrote: it doesn't make much sense to make a distinction between American and British spelling: imagine the redundancy!
one of the goals of localization is to use specific terms and phrases for that locale, so not only spelling.
the reason for passing in the locale (not always the current locale) is because of the need to fallback to invariant locales if necessary. E.g. a missing translation for "en_AU" locale will try to fall back to the translation for "en".
ICU is to be used for php 6 i think.
Posted: Tue Feb 06, 2007 9:50 pm
by Ambush Commander
wei wrote: one of the goals of localization is to use specific terms and phrases for that locale, so not only spelling.
True, but even then, the necessity of such distinctions is tenuous. Even massively multilingual projects like MediaWiki haven't found it necessary to make such a distinction. Admittedly, this is not the case for some languages, such as Chinese, although when changes become that large one must wonder whether or not they are still the same language. I digress: locale probably is the more precise term, but language is more accessible.
wei wrote: the reason for passing in the locale (not always the current locale) is because of the need to fallback to invariant locales if necessary. E.g. a missing translation for "en_AU" locale will try to fall back to the translation for "en".
If carefully done, this sort of fallback functionality can also be implemented for the locale objects I've described above. It's actually quite elegant.
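One way this might look, as a sketch only: each language object holds a reference to a parent and defers to it when a message is missing. FallbackLanguage and getMessage() are hypothetical names, and the message arrays are made up for the example.

```php
<?php
// Hypothetical sketch of fallback between language objects.
class FallbackLanguage
{
    private $code;
    private $messages;
    private $parent;

    // $parent is another FallbackLanguage to consult for missing keys, or null.
    public function __construct($code, array $messages, $parent = null)
    {
        $this->code = $code;
        $this->messages = $messages;
        $this->parent = $parent;
    }

    // Looks in this language first, then walks up the parent chain.
    // Returns null when no language in the chain has the key.
    public function getMessage($key)
    {
        if (isset($this->messages[$key])) {
            return $this->messages[$key];
        }
        if ($this->parent !== null) {
            return $this->parent->getMessage($key);
        }
        return null;
    }
}

$en   = new FallbackLanguage('en', array('colour' => 'color', 'logout' => 'Log out'));
$enAU = new FallbackLanguage('en_AU', array('colour' => 'colour'), $en);

echo $enAU->getMessage('colour'), "\n"; // found in en_AU
echo $enAU->getMessage('logout'), "\n"; // falls back to en
```

The regional object only has to carry the strings that differ, which keeps the redundancy small.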
Yep, ICU's coming to PHP6, but that's too long.

Posted: Wed Feb 07, 2007 12:22 am
by alex.barylski
Interesting...although I'm not sure I follow...
All I do is load an INI file with printf placeholders and pass that string to sprintf() when interpolation is required...
Yes it could be jazzed up a bit...but...I kinda like language translation streamlined...so I keep this minimal...
I actually keep language files modular just like the components of my app, so I don't load an INI file with 10000 records for translation and only use 50 or 100...
What more do you want? Pardon me if you mentioned this already...but I have only briefly skimmed over your message(s)....
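The INI-plus-sprintf() approach described in this post might look roughly like the following. The keys and strings are made up for the example, and an inline string stands in for the INI file (parse_ini_file() would read a real one).

```php
<?php
// Hypothetical language file contents; normally this would live on disk.
$ini = <<<'INI'
error-reactor-meltdown = "Error: core reactor meltdown in sector %d"
greeting = "Hello, %s"
INI;

// parse_ini_string() returns an associative array of key => value.
$strings = parse_ini_string($ini);

// Interpolate only when a message actually has placeholders.
echo sprintf($strings['error-reactor-meltdown'], 7), "\n";
echo sprintf($strings['greeting'], 'Ambush Commander'), "\n";
```

Splitting the strings into one such file per component keeps each load small, at the cost of having to know which file a key lives in.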
Posted: Wed Feb 07, 2007 4:04 am
by Jenk
After only a brief ponder, I think I would approach it in pretty much the same way as Hockey, though I think I would add a datalayer to allow the admin to change the messages as they see fit (or perhaps add extra languages)
re: Messages vs Locale/Language, I think Zend may have the right track, in that keeping them bound within the (pseudo) namespace of Zend_Locale, rather than tightly binding them will be better for them, because not everyone (such as yourself, if I read your post right) will want both at the same time.
I would certainly not bind both together. In fact I would go as far as saying the only link between the two should be the choice of language, and nothing else.
Re: Are messages part of a language
Posted: Wed Feb 07, 2007 2:53 pm
by jmut
Ambush Commander wrote:....Rather than hard-coding, "Error: Core Reactor Meltdown" throughout your application, you should be able to use a little key like "error-reactor-meltdown" which will go and retrieve the appropriate message in French or Swahili, depending on the locale.....
I would rather write the full string of translation rather than key (as in gettext).
First, you (as a developer) see what the full error is, not some weird error code. Next, you can easily add dynamic data to it. There are tools that tell you which strings are translated and which are not, so the whole translation process should be pretty easy.
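A sketch of the full-string idea, using a plain array in place of a real gettext catalogue: the source text is the lookup key, and an untranslated string simply comes back unchanged. The helper t() and the French text are assumptions made for this example; this is not the gettext extension itself.

```php
<?php
// Hypothetical catalogue: English source string => translation.
$catalogue = array(
    'Error: core reactor meltdown in sector %d'
        => 'Erreur : fusion du coeur du réacteur, secteur %d',
);

// Returns the translation if one exists, otherwise the source string.
function t(array $catalogue, $source)
{
    return isset($catalogue[$source]) ? $catalogue[$source] : $source;
}

// The developer reads the full message in the code and adds dynamic data with sprintf().
echo sprintf(t($catalogue, 'Error: core reactor meltdown in sector %d'), 7), "\n";

// No translation yet: the source string is used as-is.
echo t($catalogue, 'Log out'), "\n";
```

Note that editing the English wording changes the key, so the existing translation would no longer be found until the catalogue is updated.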
Posted: Wed Feb 07, 2007 2:58 pm
by Ambush Commander
alex.barylski wrote: All I do is load an INI file with printf placeholders and pass that string to sprintf() when interpolation is required...
Yes it could be jazzed up a bit...but...I kinda like language translation streamlined...so I keep this minimal...
I actually keep language files modular just like the components of my app, so I don't load an INI file with 10000 records for translation and only use 50 or 100...
Right. Your system focuses solely on message substitution and not other localization related functions. As for modular language files, I don't think splitting them up will be necessary for my purposes (there shouldn't be that many).
Jenk wrote: Messages vs Locale/Language, I think Zend may have the right track, in that keeping them bound within the (pseudo) namespace of Zend_Locale, rather than tightly binding them will be better for them, because not everyone (such as yourself, if I read your post right) will want both at the same time.
I would certainly not bind both together. In fact I would go as far as saying the only link between the two should be the choice of language, and nothing else.
Perhaps, but if you go the other route you end up amalgamating all the different language functionalities together. Let me explain. The use case is:
Code:
$locale = new Locale();
$locale->setLocale('EN');
$money = new Currency('$23');
echo $money->getCurrency(Locale::FR);
First of all, how the heck does Currency know about $locale? I smell a singleton, which is smelly, so we'd refactor this as:
Code:
$locale = new Locale();
$locale->setLocale('EN');
$money = new Currency('$23', $locale);
echo $money->getCurrency(Locale::FR);
The next trouble: we're calling getCurrency while passing a language code. This is all fine and dandy, but what do the innards of getCurrency look like?
Code:
function getCurrency($locale) {
    switch ($locale) {
        case 'EN':
            // format for EN
            // ...
    }
}
This is bad: we don't want all the language currency formatting in one class. Maybe if we separated them...
Code:
function getCurrency($locale) {
    $fmt = $this->getCurrencyFormatter($locale);
    // convert it to the proper currency, of course, but that's got its own problems
    return $fmt->format($this->currency);
}
An object, then, for each currency format. Presumably, the same would have to apply to dates, number formatting, etc., with inheritance used to prevent duplication. Which is a lot of objects!
Let's take Currency, Measure and Dates out of the mix for a moment: given the nature of my library, I won't have to deal with them. This leaves me with number formatting, as well as a few other miscellaneous tidbits like default encoding for the language, text directionality, and a little bit of meta-data on the language. It just appears to me that stuffing it all in one class would be more convenient, without getting in the way of future expansion. Am I missing something?
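For illustration, the "all in one class" option could be as small as the sketch below. The class and property names (MinimalLanguage, $direction and so on) are hypothetical, and the German and Hebrew values are assumptions chosen for the example.

```php
<?php
// Hypothetical single class holding the language tidbits listed above.
class MinimalLanguage
{
    public $code = 'en';
    public $encoding = 'UTF-8';   // default encoding for the language
    public $direction = 'ltr';    // text directionality
    public $decimalPoint = '.';
    public $thousandsSeparator = ',';

    public function formatNumber($number, $decimals = 0)
    {
        return number_format($number, $decimals, $this->decimalPoint, $this->thousandsSeparator);
    }
}

// A language only overrides what differs from the defaults.
class MinimalLanguageDe extends MinimalLanguage
{
    public $code = 'de';
    public $decimalPoint = ',';
    public $thousandsSeparator = '.';
}

class MinimalLanguageHe extends MinimalLanguage
{
    public $code = 'he';
    public $direction = 'rtl';
}

$de = new MinimalLanguageDe();
echo $de->formatNumber(1234567.891, 2), "\n";

$he = new MinimalLanguageHe();
echo $he->direction, "\n";
```

If currency or date formatting were needed later, they could be added as further methods or split out into their own objects at that point.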
jmut wrote: I would rather write the full string of translation rather than key (as in gettext).
First, you (as a developer) see what the full error is, not some weird error code. Next, you can easily add dynamic data to it. There are tools that tell you which strings are translated and which are not, so the whole translation process should be pretty easy.
Well, I would hope that the error code is descriptive enough to glean its meaning from that. I'm not sold on the full string approach because it's brittle: the slightest change to the message and you need a new key.