A question about language...

Posted: Sat Feb 16, 2008 1:00 am
by alex.barylski
Two actually:

1) How do you store language files? Are the language codes stored in a central file or spread out across several INI, XML, CSV, etc. files? What do your language codes look like, and do you follow any kind of convention?

2) Currently I store my language translation tables in a directory called i18n - which no longer makes sense, as my locale code has been moved into its own directory and lookup translation tables just don't belong in a directory called i18n...

Should I rename the directory 'langs' or 'trans' - what directory name would make most sense to you as an outsider looking in?

Re: A question about language...

Posted: Sat Feb 16, 2008 6:46 am
by Ollie Saunders
Hmm, I must say I thought you were referring to programming languages when I first started to read this. So that might be a concern with naming it langs. human_langs would get round that problem but it's not as snappy. Alternatively you could go with multilingualization or M17N (I had to use Google to get the correct spelling for that one) which, as a word, is a bit more specific.
1) How do you store language files? Are the language codes stored in a central file or spread out across several INI, XML, CSV, etc. files? What do your language codes look like, and do you follow any kind of convention?
I must confess to never having done this before, but I would be tempted to suggest serialised arrays. They are probably the fastest thing for PHP to load, and all you need is a simple key => value mapping. Anything more complex, such as XML, is just going to be extra weight, both in terms of performance and in having to write extra code to extract the data.
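A minimal sketch of the serialised-array idea, assuming one file per language. The keys, strings, and file name are made up for illustration and do not come from anyone's actual project:

```php
<?php
// Hypothetical translation table; keys and strings are made up for illustration.
$translations = array(
    'greeting' => 'Bonjour',
    'farewell' => 'Au revoir',
);

// Write the table once (for example at build or deploy time).
$file = sys_get_temp_dir() . '/fr.ser'; // hypothetical file name
file_put_contents($file, serialize($translations));

// Load it on each request: one file read and one unserialize() call.
$loaded = unserialize(file_get_contents($file));

echo $loaded['greeting'], "\n";
```

The trade-off is that the serialised file is awkward to edit by hand, so it probably works best as a cache generated from a friendlier source format.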

Re: A question about language...

Posted: Sat Feb 16, 2008 6:59 am
by Maugrim_The_Reaper
Hi Hockey,

I store to a lang directory which is often further split by category. For example I need translations for views, forms, validation errors, etc. So I tend to have 2-3 Translation Containers per application request. I'd like to narrow that down further by aggregating translations at runtime to one object - but for now it's not really required. I utilise caching where possible - both on the Translation lookups and the end page views.

As to format - I use gettext and/or TMX (XML). These are two industry standards, so you'll find them quite portable, gettext in particular being available on practically any Linux machine. I've transitioned to Zend_Translate in PHP primarily because it's a high-quality component without a real matching alternative - it is so good that it implements Gettext natively to work around some of its flaws in PHP. Yes, I work on the ZF personally, but in this case it's not a reimplementation of an existing work - it's truly well done and quite original. It also supports another XML standard, XLIFF, as well as databases, PHP arrays, XMLTM, CSV (if seriously silly), Qt, etc. Note that caching finally makes an XML format very viable.

Re: A question about language...

Posted: Sat Feb 16, 2008 11:53 am
by Christopher
I use templates for all text, so I create template sub-directories using the language code in each template directory. That way I can set language code globally and the entire application grabs templates from the right sub-dirs. It is essentially a response level setting with almost no overhead. I just showed this in a separate thread:

Code:

templates/
template/EN/
template/ES/
template/FR/
This not only makes the code trivial, but you can also give the translators access to only the template directory or provide a web-based editor -- and they deal with the content. I really don't like to do content. ;)
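A rough sketch of how a global language code could select the template sub-directory. The function name, the supported-language list, the default language, and the .php extension are all assumptions for illustration, not the code from the other thread:

```php
<?php
// Hypothetical helper: builds a template path from a global language code.
function template_path($baseDir, $langCode, $name)
{
    // Whitelist the language code so user input cannot escape the directory.
    $supported = array('EN', 'ES', 'FR');
    $lang = strtoupper($langCode);
    if (!in_array($lang, $supported, true)) {
        $lang = 'EN'; // assumed default language
    }
    return rtrim($baseDir, '/') . '/' . $lang . '/' . $name . '.php';
}

echo template_path('templates', 'es', 'home'), "\n";
```

Because the language is resolved in one place, the rest of the application only ever asks for a template by name.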

Here is the other thread:

viewtopic.php?f=1&t=78905

Re: A question about language...

Posted: Sat Feb 16, 2008 2:08 pm
by alex.barylski
Interesting to see how we all differ slightly in our methods...mine is similar to yours...

Although, I don't think serialized arrays are your best choice. For two reasons:

1) I would almost put money on the fact that parsing the JSON file (or whatever serialized arrays/objects are stored as) is more time consuming than parsing a simple INI file.

2) It doesn't lend itself nicely to parties interested in editing the language files. You would need to build a custom tool around your language files - whereas if you used CSV, any spreadsheet could be used to edit your language codes.

Similar to Maugrim's approach...I modularize my language files...so I may at any one time have to load several different language files to replace placeholders.
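One way the modular approach might look, as a sketch only. The module contents and the {key} placeholder syntax are assumptions, and parse_ini_string() needs PHP 5.3 or later (with real files, parse_ini_file() would do the same job):

```php
<?php
// Two hypothetical language modules in INI form; in practice these would be
// separate files read with parse_ini_file().
$formsIni  = "submit = \"Envoyer\"\ncancel = \"Annuler\"\n";
$errorsIni = "required = \"Ce champ est obligatoire\"\n";

// Merge the modules needed for this request into one lookup table.
$table = array_merge(parse_ini_string($formsIni), parse_ini_string($errorsIni));

// Turn each key into a {key} placeholder and substitute in one pass.
$pairs = array();
foreach ($table as $key => $text) {
    $pairs['{' . $key . '}'] = $text;
}
$html = strtr('<button>{submit}</button> <span>{required}</span>', $pairs);

echo $html, "\n";
```

Note that array_merge() lets a later module silently override an earlier one if they share a key, so prefixing keys per module may be worth considering.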

Re: A question about language...

Posted: Sat Feb 16, 2008 2:12 pm
by Ollie Saunders
1) I would almost put money on the fact that parsing the JSON file (or whatever serialized arrays/objects are stored as) is more time consuming than parsing a simple INI file.
Serialized arrays are not stored in JSON. It's a native PHP format and it's very fast, pretty much the fastest thing for PHP.

Re: A question about language...

Posted: Sat Feb 16, 2008 4:21 pm
by alex.barylski
My next question:

I have just read some more articles on building multiple language support - it sounds like PHP has a lot of issues with locale and language, and PHP 6 is the lifeguard on duty. :P

Anyways, up until now I stored the charset in the HTML <meta> tag, but I have replaced it with header() in PHP instead - apparently the recommended way, since the <meta> tag is then just redundant and can cause issues...

What about language specification? Where should I declare this?

Code:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Some title</title>
    <meta http-equiv="content-language" content="en" />
Is it enough to declare the language code in the <html> tag using the lang attribute - is the <meta> tag again redundant? I would think it is redundant, because if it's placed after the <title> tag then how is the browser to know how to render the <title> text? I think the <meta> tag is an archaic technique and best replaced by headers or by the attribute on the <html> tag explicitly.

Can you set language codes from PHP using header() as well?

Code:

header('Content-Language: en');
The only issue I see in using the header over the lang attribute is that some bots may only look for HTML attributes or meta tags when categorizing web sites in their search cache...
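Content-Language is a standard HTTP response header, so it can be sent with header() like any other. A small sketch of driving both the headers and the lang attributes from the same variables, so they cannot drift apart (the values and variable names are assumed for illustration):

```php
<?php
// One source of truth for language and charset (values assumed for illustration).
$lang    = 'en';
$charset = 'UTF-8';

// Build the header lines from the same variables used in the markup.
$headers = array(
    'Content-Type: text/html; charset=' . $charset,
    'Content-Language: ' . $lang,
);

// headers_sent() guards against output that has already started.
if (!headers_sent()) {
    foreach ($headers as $line) {
        header($line);
    }
}

// The same $lang value feeds the lang attributes on <html>.
$htmlOpen = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="' . $lang . '" lang="' . $lang . '">';
echo $htmlOpen, "\n";
```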

Opinions? Advice?

Cheers :)

Re: A question about language...

Posted: Sun Feb 17, 2008 4:37 am
by Ollie Saunders
The advice I read for dealing with character encoding (a similar and in many ways related issue) is to give the browser as many chances as possible. Go with all of them but for god's sake be consistent.

Also have you thought about content negotiation? Do you know what it is? It involves the Accept-Language request header.
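For anyone who has not met it, here is a simplified sketch of language negotiation. The function name is hypothetical, it only compares primary subtags, and it ignores the * wildcard, so treat it as an illustration rather than a complete implementation of the HTTP rules:

```php
<?php
// Hypothetical helper: pick the best supported language from an
// Accept-Language header value such as "fr-CA,fr;q=0.8,en;q=0.5".
function negotiate_language($header, array $supported, $default)
{
    $best  = $default;
    $bestQ = 0.0;
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        // Compare on the primary subtag only ("fr-CA" becomes "fr").
        $subtags = explode('-', strtolower(trim($pieces[0])));
        $primary = $subtags[0];
        $q = 1.0; // quality defaults to 1 when no q parameter is given
        if (isset($pieces[1]) && preg_match('/q\s*=\s*([0-9.]+)/', $pieces[1], $m)) {
            $q = (float) $m[1];
        }
        // Keep the supported language with the highest quality value.
        if ($q > $bestQ && in_array($primary, $supported, true)) {
            $best  = $primary;
            $bestQ = $q;
        }
    }
    return $best;
}

echo negotiate_language('fr-CA,fr;q=0.8,en;q=0.5', array('en', 'es', 'fr'), 'en'), "\n";
```

In a real request the header value would come from $_SERVER['HTTP_ACCEPT_LANGUAGE'], which may be missing, so a default is still needed.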