Html Extraction For Language Translation

neophyte
DevNet Resident
Posts: 1537
Joined: Tue Jan 20, 2004 4:58 pm
Location: Minnesota

Html Extraction For Language Translation

Post by neophyte »

Here's the situation: we have language strings in our database mixed with HTML.

We need to be able to extract the text from between the tags and present it to the translator for translation.

Then, before we save the translated content, we need to replace the old language strings with the new ones.

Whatever this extraction tool is, it has to replace the original language with the new language and save it without modifying the tags already there.

I've tried DOM but it's a bit fickle and alters the HTML. Does anyone have a suggestion for a library that could be of use here?

Or if there's a whole different approach, that'd be great.
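
One possible direction, offered only as a sketch: split the markup on its tags and treat the tags as opaque strings, so they are never parsed or re-serialized. The function names below (`split_segments`, `extract_strings`, `apply_translations`) are hypothetical, not from any library, and the simple `<[^>]*>` pattern assumes no `>` characters inside attribute values, comments, or script blocks.

```php
<?php
/** Split HTML into alternating text/tag segments; tags are kept verbatim. */
function split_segments(string $html): array
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured tags land at odd indexes
    // and the text between them at even indexes.
    return preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
}

/** Return the translatable strings, keyed by their segment index. */
function extract_strings(array $segments): array
{
    $strings = array();
    foreach ($segments as $i => $segment) {
        // Even indexes hold text; skip empty or whitespace-only pieces.
        if ($i % 2 === 0 && trim($segment) !== '') {
            $strings[$i] = $segment;
        }
    }
    return $strings;
}

/** Put translations back by segment index and rebuild the markup. */
function apply_translations(array $segments, array $translations): string
{
    foreach ($translations as $i => $text) {
        $segments[$i] = $text;
    }
    return implode('', $segments);
}

// Made-up sample input for illustration.
$html = '<p class="intro">Hello <b>world</b></p>';
$segments = split_segments($html);
$strings = extract_strings($segments);
$translated = apply_translations($segments, array(2 => 'Bonjour ', 4 => 'le monde'));
echo $translated, "\n";
```

Because the tag segments are copied back untouched, joining the segments without any translations reproduces the input exactly. The trade-off is that this is not a real HTML parser, so unusual markup would need a stricter tokenizer.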
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Html Extraction For Language Translation

Post by VladSun »

Post what you've tried so far.
There are 10 types of people in this world, those who understand binary and those who don't

Re: Html Extraction For Language Translation

Post by neophyte »

I've tried DOM.

I took the content and built a recursive array of DOM node values. The theory is that if I can wind it up like this consistently and wind it back out with the same result, I should be able to replace the strings in order.
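
The wind-up / wind-out idea can be sketched more compactly with an XPath query in place of manual recursion. This is a minimal illustration under assumptions, not the code used in the project: `text_nodes` is a hypothetical helper, the sample markup is made up, and `saveHTML()` may still normalize the markup on output, which is the behaviour described above.

```php
<?php
/** Hypothetical helper: return the non-blank text nodes in document order. */
function text_nodes(DOMDocument $doc): array
{
    $xpath = new DOMXPath($doc);
    // normalize-space() filters out whitespace-only text nodes.
    return iterator_to_array($xpath->query('//text()[normalize-space()]'), false);
}

$doc = new DOMDocument();
// These flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML(
    '<div><p>Hello <b>world</b></p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// "Wind up": collect the strings to hand to the translator.
$strings = array();
foreach (text_nodes($doc) as $node) {
    $strings[] = $node->nodeValue;
}

// "Wind out": write the translations back in the same order.
$translations = array('Bonjour ', 'le monde');
foreach (text_nodes($doc) as $i => $node) {
    $node->nodeValue = $translations[$i];
}

echo $doc->saveHTML();
```

Both passes rely on the same query returning the nodes in the same order, so the replacement list must have exactly one entry per extracted string.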

Re: Html Extraction For Language Translation

Post by VladSun »

neophyte wrote: I've tried DOM.

I took the content and built a recursive array of DOM node values. The theory is that if I can wind it up like this consistently and wind it back out with the same result, I should be able to replace the strings in order.

I meant the PHP code :)

Re: Html Extraction For Language Translation

Post by neophyte »

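Code:

    /**
     * Recursively collect the value of every non-blank text node under $top_node.
     * If $replacements is non-empty, each text node visited is overwritten with
     * the next replacement string, in document order.
     */
    protected function _parse(&$replacements = array(), $top_node = null)
    {
        $top_node = ($top_node === null) ? $this->_get_root_node() : $top_node;
        $array = array();

        if ($top_node->hasChildNodes()) {
            foreach ($top_node->childNodes as $subNode) {
                // Only text nodes that contain more than whitespace are translatable.
                if ($subNode->nodeType === XML_TEXT_NODE
                    && trim($subNode->wholeText) !== ''
                ) {
                    if (!empty($replacements)) {
                        // Replace the node's full contents; $subNode->length avoids
                        // the hard-coded 1000000 character count.
                        $subNode->replaceData(0, $subNode->length, array_shift($replacements));
                    }
                    $array[$subNode->nodeName][] = $subNode->nodeValue;
                }

                // Descend into every child; childless nodes return null.
                $array[] = $this->_parse($replacements, $subNode);
            }
        }

        return empty($array) ? null : $array;
    }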
Here is the bit where I'm building a recursive array.

Re: Html Extraction For Language Translation

Post by VladSun »

What's the input data, so I can reproduce and debug it?