Are messages part of a language

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

An essential part of any multilingual system is the ability to load system messages in different languages. Rather than hard-coding "Error: Core Reactor Meltdown" throughout your application, you should be able to use a little key like "error-reactor-meltdown" which will go and retrieve the appropriate message in French or Swahili, depending on the locale. We'll put other L10N necessities such as date and number formatting to the side for a moment.

Translations for these system messages are usually provided by means of a PHP file with oodles of strings stuffed in an associative array. While this is workable for translators, it isn't for site admins who want to customize text. Thus, there is often a database layer over the regular translations that lets people without write access to the web server change system messages to things like "ERROR CORE MELTDOWN OHNOHZORZ!!!"

It may reasonably be asserted, then, that the whole subject of messages, including their retrieval from multiple data sources, their caching, parameter substitution, etc., is a complicated issue that merits its own class, which we shall term Messages or Translation.
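
To make the idea concrete, here is a minimal sketch of what such a class might look like. Everything in it is an assumption made for illustration: the class name Messages, the methods addSource() and get(), and the $1-style placeholders are not taken from MediaWiki, PEAR, or any other library, and a plain array stands in for the database layer.

```php
<?php
// Hypothetical sketch: Messages, addSource() and get() are illustrative names only.
class Messages
{
    // List of key => text arrays, highest priority first.
    private $sources = array();

    // Register a source; sources added later take priority over earlier ones.
    public function addSource(array $messages)
    {
        array_unshift($this->sources, $messages);
    }

    // Find the key in the first source that has it, then substitute $1, $2, ...
    public function get($key, array $params = array())
    {
        foreach ($this->sources as $source) {
            if (isset($source[$key])) {
                $replace = array();
                foreach (array_values($params) as $i => $value) {
                    $replace['$' . ($i + 1)] = $value;
                }
                return $replace ? strtr($source[$key], $replace) : $source[$key];
            }
        }
        return "<$key>"; // visible marker for a missing message
    }
}

$messages = new Messages();
// Defaults, as they might come from a translator-maintained PHP file.
$messages->addSource(array(
    'error-reactor-meltdown' => 'Error: Core Reactor Meltdown in sector $1',
    'log-out'                => 'Log out',
));
// Overrides, standing in for rows an admin saved in a database.
$messages->addSource(array(
    'error-reactor-meltdown' => 'ERROR CORE MELTDOWN IN $1 OHNOHZORZ!!!',
));

echo $messages->get('error-reactor-meltdown', array('7G')), "\n";
echo $messages->get('log-out'), "\n";
```

The admin override wins for the first key, while 'log-out' falls through to the default file. Caching is left out entirely; it would presumably sit in front of the source lookup.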

Messages, however, aren't the only game in town: a language has to support things like dates, numbers, and genitive noun forms to be truly accurate: messages are only a part (albeit a very large one) of the mix. So, I pose to you a question of dependencies:

Is the language dependent on the messages? Or are the messages just a component of the language? Or are they two separate entities that have dependencies on each other? Or is there a third gateway to these two objects? (The association is not always one to one: there may be multiple languages on one page)

Some examples: MediaWiki contains a MessageCache which is used for retrieving messages. This object considers the Language object only one of many places where a message could be squirreled away. The PEAR packages Translation2 and I18Nv2 appear to be blithely unaware of one another (even though the former is dependent on the latter). I am not currently aware of any other translation packages.

I'm looking for something light, thank you very much, since this will be used for an error reporting system. ;-)
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

You might want to take a look at what they are doing in the Zend Framework with Locale, Translate, System, etc. and how they interrelate. There have been a number of very long design discussions there.
(#10850)
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

I took a peek, but the code is incomprehensible and documentation is nowhere to be found. The project proposals seem to imply that each component (Date, Currency, Translate) is an entity unto itself, and that they share a common thread in that they implement Zend_Locale. This does not help me very much, as I intend to bind only the language-related behaviors tightly together, since it will be a very minimal system.
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Post by wei »

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

It is on the subject of messages and parameter binding, but it doesn't say anything architecturally speaking. I suppose, however, more examples would be welcome. If someone who understands the Zend Framework setup would like to speak up, I'm all ears.
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Post by wei »

Well,
Ambush Commander wrote: Is the language dependent on the messages? Or are the messages just a component of the language? Or are they two separate entities that have dependencies on each other? Or is there a third gateway to these two objects? (The association is not always one to one: there may be multiple languages on one page)
I don't understand what you are trying to say there; maybe some definitions of the terms would help.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Okay. Let's see...

Language - representation of a written language. Contains data and behavior associated with the language; these are usually translated messages and language-specific formatting.

Messages - one part of a language. They are strings like "Log out" and "Be careful of alligators", only translated to a certain language. A message usually has a language (i.e. it has a property of being English or French). However, messages are often aggregated inside a language object, so that the Language contains all the default message strings.

If this were the end of the story, things wouldn't be so bad: just have an array of phrases => messages inside the language object. However, messages don't necessarily have to come from the language object: they may be loaded from a database or XML file, or they may be cached in memory or in a serialized file, all of which is really out of the scope of a lowly Language object.
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Post by wei »

Well, to follow ICU's naming, your "language" is like a "locale" in ICU, because one natural language may be used in many places (e.g. English is used in many places, and each place may have different number and date formatting conventions).

http://icu.sourceforge.net/userguide/locale.html

Not everyone uses the term Locale, MS .NET uses the term Culture (a little different though).

Message formatting is not only locale-sensitive, it is message-sensitive as well; that is, the formatting and translation of a message depend on both the message and the locale. E.g. English has plural forms, while Chinese has no such thing as plurals.
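
As a rough illustration of that last point, the sketch below lets each locale declare its own plural forms for one message, plus a rule that picks a form from a count. The function name formatCount(), the array layout, and the $1 placeholder are assumptions made up for this example, not ICU's API.

```php
<?php
// Hypothetical sketch: the data layout and formatCount() are illustrative only.
// Each locale lists the forms of one message and a rule mapping a count to a form index.
$catalog = array(
    'en' => array(
        'forms' => array('$1 file', '$1 files'),
        'rule'  => function ($n) { return $n == 1 ? 0 : 1; },
    ),
    'zh' => array(
        'forms' => array('$1 个文件'), // a single form: no plural inflection
        'rule'  => function ($n) { return 0; },
    ),
);

// Pick the form for this count in this locale, then substitute the count.
function formatCount(array $catalog, $locale, $n)
{
    $entry = $catalog[$locale];
    $index = call_user_func($entry['rule'], $n);
    return str_replace('$1', (string) $n, $entry['forms'][$index]);
}

echo formatCount($catalog, 'en', 1), "\n";
echo formatCount($catalog, 'en', 3), "\n";
echo formatCount($catalog, 'zh', 3), "\n";
```

The same key needs two strings in English and one in Chinese, which is why the choice of string cannot be made by the locale alone or by the message alone.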
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Yep. It's quite complicated.
wei wrote: e.g. English is used in many places, and each place may have different number and date formatting conventions
True, but practically speaking it doesn't make much sense to make a distinction between American and British spelling: imagine the redundancy!

ICU's method of action seems to be setting up locale aware functions, and then passing the current locale into them via a Locale object. This is opposed to a polymorphic approach where the functions are actually bound to the locale, so no locale passing is necessary. ICU's approach is more procedural in nature.
wei
Forum Contributor
Posts: 140
Joined: Wed Jul 12, 2006 12:18 am

Post by wei »

Ambush Commander wrote: it doesn't make much sense to make a distinction between American and British spelling: imagine the redundancy!
One of the goals of localization is to use specific terms and phrases for that locale, so it's not only spelling.

The reason for passing in the locale (not always the current locale) is the need to fall back to a more general locale if necessary. E.g. a missing translation for the "en_AU" locale will fall back to the translation for "en".
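
A minimal sketch of that kind of fallback, assuming locale codes are joined with underscores as in "en_AU". The helper names localeChain() and translate() and the sample strings are invented for this example and do not come from ICU.

```php
<?php
// Hypothetical helper: turns "en_AU" into array("en_AU", "en").
function localeChain($locale)
{
    $chain = array($locale);
    while (($pos = strrpos($locale, '_')) !== false) {
        $locale = substr($locale, 0, $pos);
        $chain[] = $locale;
    }
    return $chain;
}

// Returns the first translation found along the chain, or null if none exists.
function translate(array $translations, $locale, $key)
{
    foreach (localeChain($locale) as $candidate) {
        if (isset($translations[$candidate][$key])) {
            return $translations[$candidate][$key];
        }
    }
    return null;
}

// Made-up sample data: en_AU overrides one string and inherits the rest.
$translations = array(
    'en'    => array('log-out' => 'Log out', 'greeting' => 'Hello'),
    'en_AU' => array('greeting' => "G'day"),
);

echo translate($translations, 'en_AU', 'greeting'), "\n";
echo translate($translations, 'en_AU', 'log-out'), "\n";
```

Only the strings that actually differ need to be stored for "en_AU"; everything else is inherited from "en", which keeps regional variants from duplicating the whole message file.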

ICU is to be used in PHP 6, I think.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

wei wrote: one of the goals of localization is to use specific terms and phrases for that locale, so not only spelling.
True, but even then, the necessity of such distinctions is tenuous. Even massively multilingual projects like MediaWiki haven't found it necessary to make such a distinction. Admittedly, this is not the case for some languages, such as Chinese, although when changes become that large one must wonder whether or not they are still the same language. I digress: locale probably is the more precise term, but language is more accessible.
wei wrote: the reason for passing in the locale (not always the current locale) is because of the need to fallback to invariant locales if necessary. E.g. a missing translation for "en_AU" locale will try to fall back to the translation for "en".
If carefully done, this sort of fallback functionality can also be implemented for the locale objects I've described above. It's actually quite elegant.

Yep, ICU's coming to PHP 6, but that's too long to wait. ;-)
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Interesting... although I'm not sure I follow...

All I do is load an INI file with printf placeholders and pass that string to sprintf() when interpolation is required...

Yes, it could be jazzed up a bit... but I kinda like language translation streamlined, so I keep this minimal...

I actually keep language files modular, just like the components of my app, so I don't load an INI file with 10000 records for translation and only use 50 or 100... :P

What more do you want? Pardon me if you mentioned this already... but I have only briefly skimmed over your message(s)....
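
For readers unfamiliar with the approach, a rough sketch of it follows. The keys and strings are made up, and an inline string parsed with parse_ini_string() stands in for a per-module file on disk (which would be loaded with parse_ini_file() instead); none of this is the poster's actual code.

```php
<?php
// Stand-in for one module's language file; the keys and texts are hypothetical.
$ini = <<<'INI'
welcome = "Welcome back, %s!"
unread = "You have %d unread messages."
INI;

// Parse the INI text into a key => format string array.
$strings = parse_ini_string($ini);

// Interpolate only when a message actually has placeholders.
echo sprintf($strings['welcome'], 'Alex'), "\n";
echo sprintf($strings['unread'], 50), "\n";
```

Keeping one such file per application module means a request only parses the handful of strings it is likely to use.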
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

After only a brief ponder, I think I would approach it in pretty much the same way as Hockey, though I think I would add a data layer to allow the admin to change the messages as they see fit (or perhaps add extra languages).

Re: Messages vs Locale/Language, I think Zend may be on the right track: keeping them loosely grouped within the (pseudo) namespace of Zend_Locale, rather than tightly binding them, will be better, because not everyone (such as yourself, if I read your post right) will want both at the same time.

I would certainly not bind both together. In fact, I would go as far as saying the only link between the two should be the choice of language, and nothing else.
jmut
Forum Regular
Posts: 945
Joined: Tue Jul 05, 2005 3:54 am
Location: Sofia, Bulgaria

Re: Are messages part of a language

Post by jmut »

Ambush Commander wrote:....Rather than hard-coding, "Error: Core Reactor Meltdown" throughout your application, you should be able to use a little key like "error-reactor-meltdown" which will go and retrieve the appropriate message in French or Swahili, depending on the locale.....
I would rather write the full string of the translation than a key (as in gettext).
First, you (as a developer) see what the full error is, not some weird error code. Next, you can easily add dynamic data to it. There are tools that tell you which strings are translated and which are not, so the whole translation process should be pretty easy.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

alex.barylski wrote: All I do is load an INI file with printf placeholders and pass that string to sprintf() when interpolation is required...

Yes, it could be jazzed up a bit... but I kinda like language translation streamlined, so I keep this minimal...

I actually keep language files modular, just like the components of my app, so I don't load an INI file with 10000 records for translation and only use 50 or 100...
Right. Your system focuses solely on message substitution and not other localization related functions. As for modular language files, I don't think splitting them up will be necessary for my purposes (there shouldn't be that many).
Jenk wrote: Messages vs Locale/Language, I think Zend may have the right track, in that keeping them bound within the (pseudo) namespace of Zend_Locale, rather than tightly binding them will be better for them, because not everyone (such as yourself, if I read your post right) will want both at the same time.

I would certainly not bind both together. In fact I would stretch as far as saying the only link between the two should be the choice of language, and nothing else.
Perhaps, but if you go the other route you end up amalgamating all the different language functionalities together. Let me explain, the use case is:

Code: Select all

$locale = new Locale();
$locale->setLocale('EN');
$money = new Currency('$23');
echo $money->getCurrency(Locale::FR);
First of all, how the heck does Currency know about $locale? I smell a singleton, which is smelly, so we'd refactor this as:

Code: Select all

$locale = new Locale();
$locale->setLocale('EN');
$money = new Currency('$23', $locale);
echo $money->getCurrency(Locale::FR);
The next trouble: we're calling getCurrency while passing a language code. This is all fine and dandy, but what do the innards of getCurrency look like?

Code: Select all

function getCurrency($locale) {
  switch ($locale) {
    case 'EN':
      // format for EN
      break;
    // ... one case per supported locale
  }
}
This is bad: we don't want all the language currency formatting in one class. Maybe if we separated them...

Code: Select all

function getCurrency($locale) {
  $fmt = $this->getCurrencyFormatter($locale);
  // convert it to the proper currency, of course, but that's got its own problems
  return $fmt->format($this->currency);
}
A formatter object, then, for each locale. Presumably, the same would have to apply to dates, number formatting, etc., with inheritance used to prevent duplication. That is a lot of objects!

Let's take Currency, Measure and Dates out of the mix for a moment, since my library won't have to deal with them. This leaves me with number formatting, as well as a few other miscellaneous tidbits like default encoding for the language, text directionality, and a little bit of meta-data on the language. It just appears to me that stuffing it all in one class would be more convenient, without getting in the way of future expansion. Am I missing something?
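
To show what "stuffing it all in one class" could look like, here is a small sketch. The class names, properties, and the formatNumber() method are assumptions invented for this illustration, and the French separators are simplified to a plain space and a comma.

```php
<?php
// Hypothetical sketch: one class per language bundles metadata and formatting.
class Language
{
    public $code = 'en';
    public $direction = 'ltr';
    public $encoding = 'UTF-8';
    protected $decimalPoint = '.';
    protected $thousandsSeparator = ',';

    // The formatting behavior is bound to the object, so no locale is passed in.
    public function formatNumber($number, $decimals = 0)
    {
        return number_format($number, $decimals, $this->decimalPoint, $this->thousandsSeparator);
    }
}

// Subclasses override only what differs from the defaults.
class LanguageFr extends Language
{
    public $code = 'fr';
    protected $decimalPoint = ',';
    protected $thousandsSeparator = ' '; // simplified: a plain space
}

class LanguageHe extends Language
{
    public $code = 'he';
    public $direction = 'rtl';
}

$languages = array(new Language(), new LanguageFr(), new LanguageHe());
foreach ($languages as $language) {
    echo $language->code, ' ', $language->direction, ' ',
         $language->formatNumber(1234567.891, 2), "\n";
}
```

Calling code never switches on a locale code; it just asks whichever Language object it was handed. The cost is that every new kind of behavior (dates, currency) grows the same class, which is the trade-off being debated here.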
jmut wrote: I would rather write the full string of translation rather than key (as in gettext).
First you (as a developer) see what the full error is, not some weird error code. Next you can easily add dynamic data to it. There are tools that tell you which string is translated and which is not, so the whole translation process should be pretty easy.
Well, I would hope that the error code is descriptive enough to glean its meaning from that. I'm not sold on the full string approach because it's brittle: the slightest change to the message and you need a new key.