Html Extraction For Language Translation

Posted: Thu Oct 21, 2010 5:29 pm
by neophyte
Here's the situation: we have language strings in our database mixed with HTML.

We need to extract the language strings from between the tags and present them to the translator for translation.

Then, before we save the translated content, we need to replace the old language strings with the new ones.

Whatever this extraction tool is, it has to replace the original language with the new language and save it without modifying the tags already there.

I've tried DOM but it's a bit fickle and alters the HTML. Does anyone have a suggestion for a library that could be of use here?

If there's a whole different approach, that'd be great too.
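
For illustration, here is a minimal sketch of one tag-preserving approach that avoids DOM entirely: split the stored string on tags with `preg_split`, translate only the text chunks, and glue everything back together. The function names (`split_html`, `extract_strings`, `apply_translations`) and the sample data are hypothetical, and the tag pattern is deliberately naive: it assumes no `>` inside attribute values and does not special-case script, style, or comment content.

```php
<?php
// Hypothetical helpers, not from the original post.

// Split HTML into alternating text and tag chunks.
// With one capture group and PREG_SPLIT_DELIM_CAPTURE, even indexes
// hold text and odd indexes hold the tags, copied verbatim.
function split_html($html)
{
    return preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Return the translatable strings, keyed by their chunk index.
function extract_strings(array $chunks)
{
    $strings = array();
    foreach ($chunks as $i => $chunk) {
        // Even index = text chunk; skip whitespace-only text.
        if ($i % 2 === 0 && trim($chunk) !== '') {
            $strings[$i] = $chunk;
        }
    }
    return $strings;
}

// Put translated strings back at the same chunk positions.
function apply_translations(array $chunks, array $translations)
{
    foreach ($translations as $i => $text) {
        $chunks[$i] = $text;
    }
    return implode('', $chunks);
}

// Made-up sample data.
$html    = '<p class="intro">Hello <b>world</b></p>';
$chunks  = split_html($html);
$strings = extract_strings($chunks);   // what the translator would see

$result = apply_translations($chunks, array(2 => 'Bonjour ', 4 => 'monde'));
echo $result, "\n";
```

Because the tags are never parsed or re-serialized, they come back byte for byte. The text chunks are raw HTML text, so any entities in them (such as `&amp;`) reach the translator unchanged and would need to be kept valid in the translation.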

Re: Html Extraction For Language Translation

Posted: Thu Oct 21, 2010 5:51 pm
by VladSun
Post what you've tried so far.

Re: Html Extraction For Language Translation

Posted: Thu Oct 21, 2010 5:56 pm
by neophyte
I've tried DOM.

I took the content, and built a recursive array of dom node values. The theory being that if I can wind it up like this consistently and wind it out with the same result, I should be able to replace them in order.
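
The "replace them in order" idea can also be sketched without a recursive array, by asking XPath for the non-blank text nodes in document order. This is an assumed variation for illustration, not the code from this thread; the sample markup and translations are made up.

```php
<?php
// Made-up sample data.
$html = '<div><p>Hello <b>world</b></p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);   // note: wraps the fragment in <html><body>
$xpath = new DOMXPath($doc);

// Collect non-blank text nodes first, so that modifying them later
// cannot disturb the iteration over the query result.
$nodes = array();
foreach ($xpath->query('//text()[normalize-space(.) != ""]') as $node) {
    $nodes[] = $node;
}

// Extraction: one string per text node, in document order.
$strings = array();
foreach ($nodes as $node) {
    $strings[] = $node->nodeValue;
}

// Replacement: the n-th translation goes into the n-th text node.
$translations = array('Bonjour ', 'monde');
foreach ($nodes as $i => $node) {
    if (isset($translations[$i])) {
        $node->nodeValue = $translations[$i];
    }
}

$div = $doc->getElementsByTagName('div')->item(0);
echo $div->textContent, "\n";
```

This keeps extraction and replacement aligned as long as both passes see the same markup. Serializing the result still goes through DOM, so the tags may come back normalized, and `loadHTML` needs an encoding hint for non-ASCII content.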

Re: Html Extraction For Language Translation

Posted: Thu Oct 21, 2010 5:57 pm
by VladSun
neophyte wrote:I've tried DOM.

I took the content, and built a recursive array of dom node values. The theory being that if I can wind it up like this consistently and wind it out with the same result, I should be able to replace them in order.
I meant - PHP code :)

Re: Html Extraction For Language Translation

Posted: Thu Oct 21, 2010 6:38 pm
by neophyte

Code:

protected function _parse(&$replacements = array(), $top_node = null)
{
    // Start from the document root on the first (non-recursive) call.
    $top_node = empty($top_node) ? $this->_get_root_node() : $top_node;
    $array = array();

    if ($top_node->hasChildNodes()) {
        foreach ($top_node->childNodes as $subNode) {
            // Only text nodes with non-whitespace content are translatable.
            if ($subNode->nodeType == XML_TEXT_NODE
                && strlen(trim($subNode->wholeText)) >= 1
            ) {
                // When replacements are supplied, overwrite the whole
                // text node with the next one, in document order.
                if (!empty($replacements)) {
                    $subNode->replaceData(0, $subNode->length, array_shift($replacements));
                }
                $array[$subNode->nodeName][] = $subNode->nodeValue;
            }

            // Recurse into every child so nested text is collected too.
            $array[] = $this->_parse($replacements, $subNode);
        }
    }

    // Leaf nodes have nothing to report.
    return empty($array) ? null : $array;
}
Here is the bit where I'm building a recursive array.

Re: Html Extraction For Language Translation

Posted: Thu Oct 21, 2010 6:40 pm
by VladSun
What's the input data so I can replay and debug?