matching double tags (equal)

Posted: Wed Apr 30, 2008 6:13 am
by mcog_esteban
Hello.
I need to transform something like:

Code: Select all

<tag><tag><a/><b/></tag></tag>
into

Code: Select all

<tag><a/><b/></tag>
I read that using a backreference is the best approach.
Can someone help me use preg_replace with the correct regex?

Edit:
Using this regex solves almost half of my problem, but I can't get the first "<" and last ">" of the matched text:

Code: Select all

 
<?php
 
$str  = '<CompanyAddress><CompanyAddress><BuildingNumber>Comapany address value</BuildingNumber>';
$str .= '<StreetName>Comapany address value</StreetName><AddressDetail>Comapany address value</AddressDetail>';
$str .= '<City>Comapany address value</City><PostalCode>Comapany address value</PostalCode>';
$str .= '<Region>Comapany address value</Region><Country>Comapany address value</Country></CompanyAddress></CompanyAddress>';
 
$regex2 = '/\\b(\\w+)\\b\\W+\\1/';
 
preg_match($regex2, $str, $matches);
 
var_dump($matches);
 
?>
 
This is what I get:

Code: Select all

 
array(2) {
  [0]=>
  string(30) "CompanyAddress><CompanyAddress"
  [1]=>
  string(14) "CompanyAddress"
}
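
A note on the missing brackets: `\b(\w+)\b` captures only word characters, so the match begins at the first letter of the tag name and "<" and ">" can never be part of group 1. A minimal sketch, assuming the goal is simply to see the whole doubled opening tag, puts the brackets into the pattern itself. The shortened `$str` below is an illustrative stand-in, not the full string from the post.

```php
<?php
// Shortened, illustrative input; not the full string from the post.
$str = '<CompanyAddress><CompanyAddress><City>value</City></CompanyAddress></CompanyAddress>';

// Matching "<" and ">" literally makes them part of the overall match;
// group 1 still holds only the tag name, which the backreference reuses.
$found = preg_match('/<(\w+)>\s*<\1>/', $str, $matches);

echo $matches[0], "\n"; // the doubled opening tag, brackets included
echo $matches[1], "\n"; // the tag name alone
```

Keeping the tag name in its own group leaves the backreference `\1` usable for the closing tags as well.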
 

Re: matching double tags (equal)

Posted: Wed Apr 30, 2008 3:58 pm
by VladSun
Interesting problem :)

Code: Select all

$html = "<tag1><tag1><tag1><tag2>bold</tag2></tag1></tag1></tag1>";
 
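// Repeat until a pass changes nothing: $1 = opening tag, $4 = inner content, $6 = one closing tag.
while (($temp = preg_replace('/(<(\w+)?>)(\1)(.*)(<\/\2>)(<\/\2>)/', '$1$4$6', $html)) != $html)
    $html = $temp;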
 
print($html);

Re: matching double tags (equal)

Posted: Thu May 01, 2008 4:17 pm
by prometheuzz
Here's another way:

Code: Select all

#!/usr/bin/php
<?php
$text = "<tag> <tag><a/><b/></tag>   </tag> </tag>";
$text = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $text);
echo "$text\n";
?>

Re: matching double tags (equal)

Posted: Fri May 02, 2008 4:51 am
by mcog_esteban
Hi, thank you all.
The problem turned out to be more difficult than expected. I forgot to mention that I could have double tags as children of other tags, e.g.:

Code: Select all

<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>
But I had help from a friend and now the problem is solved.
If anyone is interested, here's the code snippet:

Code: Select all

perl -pe 'BEGIN{undef$/}sub a{$_=shift;$a=s!(<([^>]+)>)\s*\1((?:.|\n)*?)?</\2>\s*</\2>!$1$3</$2>!g;if($a){a($_)}else{$_}};a($_)' input_file
Maybe someone can convert this to PHP.
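
One possible PHP port of the one-liner above, as a sketch rather than a tested drop-in. The function name `collapseDoubleTags` is made up for illustration, and the Perl recursion is replaced by a loop that repeats the substitution until a pass makes no replacement.

```php
<?php
// collapseDoubleTags() is a hypothetical helper name, not part of any library.
function collapseDoubleTags(string $xml): string
{
    // Same pattern as the Perl one-liner; the "s" modifier lets "." match
    // newlines, which stands in for the (?:.|\n) construct.
    $pattern = '~(<([^>]+)>)\s*\1(.*?)</\2>\s*</\2>~s';

    do {
        $result = preg_replace($pattern, '$1$3</$2>', $xml, -1, $count);
        if ($result === null) {
            // preg_replace() returns null on a PCRE error; keep the last good text.
            break;
        }
        $xml = $result;
    } while ($count > 0);

    return $xml;
}

$in = '<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>';
echo collapseDoubleTags($in), "\n";
// prints: <tag1><tag2>value</tag2><tag3>value</tag3></tag1>
```

The first pass collapses the outer pair and later passes reach the doubled children, which is why the substitution has to be repeated.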

Re: matching double tags (equal)

Posted: Fri May 02, 2008 5:18 am
by prometheuzz
mcog_esteban wrote:Hi, thank you all.
The problem turned out to be more difficult than expected. I forgot to mention that I could have double tags as children of other tags, e.g.:

Code: Select all

<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>
...
What should the output of that be?
The code snippet I posted earlier produces the following:

Code: Select all

<tag1><tag2>value</tag2><tag3>value</tag3></tag1>

Re: matching double tags (equal)

Posted: Fri May 02, 2008 6:06 am
by mcog_esteban
Exactly that, but I hadn't tried your piece of code. I'm glad it worked. Like I said, I was working on this with a friend who works mostly with Perl and regular expressions. I only got back to the forum today (1st May is a national holiday) and noticed that, besides VladSun, you had also posted.
Your snippet is a lot more readable, which is good since I'm not a Perl user.

Re: matching double tags (equal)

Posted: Fri May 02, 2008 6:32 am
by prometheuzz
mcog_esteban wrote: Exactly that, but I hadn't tried your piece of code. I'm glad it worked. Like I said, I was working on this with a friend who works mostly with Perl and regular expressions. I only got back to the forum today (1st May is a national holiday) and noticed that, besides VladSun, you had also posted.
Your snippet is a lot more readable, which is good since I'm not a Perl user.
I'm no Perl programmer either, but that code snippet looks overly obfuscated to me.
This Perl snippet does exactly the same replacement as the PHP version I posted:

Code: Select all

#!/usr/bin/perl -w
$text = "<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>";
print "$text\n";
$text =~ s/(<[^>]+>)(\s*\1)+/$1/g;
print "$text\n";
 
# output:
# <tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>
# <tag1><tag2>value</tag2><tag3>value</tag3></tag1>

Re: matching double tags (equal)

Posted: Fri May 02, 2008 6:46 am
by mcog_esteban
Hmm... your code works best if the text to replace is on a single line.
If I format the XML first (to make it readable; otherwise it is just one huge line) and then replace the double tags, some formatting is lost.

Re: matching double tags (equal)

Posted: Fri May 02, 2008 7:24 am
by prometheuzz
mcog_esteban wrote: Hmm... your code works best if the text to replace is on a single line.
If I format the XML first (to make it readable; otherwise it is just one huge line) and then replace the double tags, some formatting is lost.
Multiple lines work fine when testing on my machine: newline characters like '\n' and also '\r\n' get "eaten" by the '\s'.
Can you post an example where it fails?
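
A small sketch of what happens with indented input, using a made-up two-level example rather than the real file. The `\s*` swallows the line break and indentation between the two copies of a tag, so duplicates are removed across lines, but the lines that remain keep the indentation of the deeper level. That may be the formatting loss described above.

```php
<?php
// Illustrative indented input with mixed line endings ("\r\n" and "\n").
$text = "<tag1>\r\n    <tag1>\n        <a/>\n    </tag1>\n</tag1>\n";

// Same replacement as in the thread: collapse a tag followed by copies of itself.
$text = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $text);

// The duplicates are gone, but <a/> is still indented by eight spaces
// and </tag1> by four, as if the removed level were still there.
echo $text;
```

Removing the double tags first and beautifying afterwards avoids the leftover indentation.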

Re: matching double tags (equal)

Posted: Fri May 02, 2008 8:17 am
by mcog_esteban
The XML shows fine in a browser like Firefox, but if I open the file in an editor, e.g. UltraEdit, I see that the formatting is lost after some point.
It doesn't really matter that much.

XML produced, beautified, and then tags removed:

Code: Select all

<Root>
    <AuditFile>
        <HeaderFile>
            <AuditFileVersion>Some value</AuditFileVersion>
            <CompanyID>Some value</CompanyID>
            <TaxRegistrationNumber>Some value</TaxRegistrationNumber>
            <TaxAccountingBasis>Some value</TaxAccountingBasis>
            <CompanyName>Some value</CompanyName>
            <CompanyAddress>
                <BuildingNumber>Comapany address value</BuildingNumber>
                <StreetName>Comapany address value</StreetName>
                <AddressDetail>Comapany address value</AddressDetail>
                <City>Comapany address value</City>
                <PostalCode>Comapany address value</PostalCode>
                <Region>Comapany address value</Region>
                <Country>Comapany address value</Country>
            </CompanyAddress>
        <FiscalYear>Some value</FiscalYear>
        <StartDate>Some value</StartDate>
        <EndDate>Some value</EndDate>
        <CurrencyCode>Some value</CurrencyCode>
        <DateCreated>Some value</DateCreated>
        <ProductID>Some value</ProductID>
        <ProductVersion>Some value</ProductVersion>
        <HeaderComment>Some value</HeaderComment>
        <Telephone>Some value</Telephone>
        <Fax>Some value</Fax>
        <Email>Some value</Email>
        <WebSite>Some value</WebSite>
    </HeaderFile>
<MasterFiles>
    <GeneralLedgers/>
    <Customers/>
    <Suppliers/>
    <Products/>
    <TaxTables/>
</MasterFiles>
<GeneralLedgerEntries>
<NumberOfEntries/>
<TotalDebit/>
<TotalCredit/>
<Journals/>
</GeneralLedgerEntries>
<SourceDocuments>
    <SalesInvoices>
        <NumberOfEntries/>
        <TotalDebit/>
        <TotalCredit/>
        <Invoices>
            <Invoice>
                <InvoiceNo>IA10000001</InvoiceNo>
                <Period/>
                <InvoiceDate>2008-21-01</InvoiceDate>
                <InvoiceType>Venda a dinheiro</InvoiceType>
                <SystemEntryDate>2008-21-01T17:46:52</SystemEntryDate>
                <TransactionID/>
                <CustomerID/>
                <ShipTo/>
                <ShipFrom/>
                <Line>
                    <LineNumber>1</LineNumber>
                    <OrderReferences/>
                    <ProductCode/>
                    <ProductDescription/>
                    <Quantity>1</Quantity>
                    <UnitOfMeasure/>
                    <UnitPrice>26</UnitPrice>
                    <TaxPointDate>2008-21-01</TaxPointDate>
                    <References/>
                    <Description>Servi&#xE7;os Internet (alojamento/acesso)</Description>
                    <DebitAmount/>
                    <CreditAmount/>
                    <Tax/>
                    <SettlementAmount/>
                    </Line>
                    <DocumentTotals>
                        <TaxPayable/>
                        <NetTotal/>
                        <GrossTotal/>
                        <Currency/>
                        <Settlement/>
                        </DocumentTotals>
                        </Invoice>
                        <Invoice>
                        <InvoiceNo>IA10000002</InvoiceNo>
                        <Period/>
                        <InvoiceDate>2008-21-01</InvoiceDate>
                        <InvoiceType>Venda a dinheiro</InvoiceType>
                        <SystemEntryDate>2008-21-01T17:46:56</SystemEntryDate>
                        <TransactionID/>
                        <CustomerID/>
                        <ShipTo/>
                        <ShipFrom/>
                        <Line>
....
 
XML produced, tags removed, and then beautified:

Code: Select all

 
<Root>
    <AuditFile>
        <HeaderFile>
            <AuditFileVersion>Some value</AuditFileVersion>
            <CompanyID>Some value</CompanyID>
            <TaxRegistrationNumber>Some value</TaxRegistrationNumber>
            <TaxAccountingBasis>Some value</TaxAccountingBasis>
            <CompanyName>Some value</CompanyName>
            <CompanyAddress>
                <BuildingNumber>Comapany address value</BuildingNumber>
                <StreetName>Comapany address value</StreetName>
                <AddressDetail>Comapany address value</AddressDetail>
                <City>Comapany address value</City>
                <PostalCode>Comapany address value</PostalCode>
                <Region>Comapany address value</Region>
                <Country>Comapany address value</Country>
            </CompanyAddress>
            <FiscalYear>Some value</FiscalYear>
            <StartDate>Some value</StartDate>
            <EndDate>Some value</EndDate>
            <CurrencyCode>Some value</CurrencyCode>
            <DateCreated>Some value</DateCreated>
            <ProductID>Some value</ProductID>
            <ProductVersion>Some value</ProductVersion>
            <HeaderComment>Some value</HeaderComment>
            <Telephone>Some value</Telephone>
            <Fax>Some value</Fax>
            <Email>Some value</Email>
            <WebSite>Some value</WebSite>
        </HeaderFile>
        <MasterFiles>
            <GeneralLedgers/>
            <Customers/>
            <Suppliers/>
            <Products/>
            <TaxTables/>
        </MasterFiles>
        <GeneralLedgerEntries>
            <NumberOfEntries/>
            <TotalDebit/>
            <TotalCredit/>
            <Journals/>
        </GeneralLedgerEntries>
        <SourceDocuments>
            <SalesInvoices>
                <NumberOfEntries/>
                <TotalDebit/>
                <TotalCredit/>
                <Invoices>
                    <Invoice>
                        <InvoiceNo>IA10000001</InvoiceNo>
                        <Period/>
                        <InvoiceDate>2008-21-01</InvoiceDate>
                        <InvoiceType>Venda a dinheiro</InvoiceType>
                        <SystemEntryDate>2008-21-01T17:46:52</SystemEntryDate>
                        <TransactionID/>
                        <CustomerID/>
                        <ShipTo/>
                        <ShipFrom/>
                        <Line>
                            <LineNumber>1</LineNumber>
                            <OrderReferences/>
                            <ProductCode/>
                            <ProductDescription/>
                            <Quantity>1</Quantity>
                            <UnitOfMeasure/>
                            <UnitPrice>26</UnitPrice>
                            <TaxPointDate>2008-21-01</TaxPointDate>
                            <References/>
                            <Description>Servi&#xE7;os Internet (alojamento/acesso)</Description>
                            <DebitAmount/>
                            <CreditAmount/>
                            <Tax/>
                            <SettlementAmount/>
                        </Line>
                        <DocumentTotals>
                            <TaxPayable/>
                            <NetTotal/>
                            <GrossTotal/>
                            <Currency/>
                            <Settlement/>
                        </DocumentTotals>
                    </Invoice>
.....
This is how UltraEdit shows both files.

Re: matching double tags (equal)

Posted: Fri May 02, 2008 9:04 am
by prometheuzz
It is unclear to me if you still have a question.
If you do, which code snippet are you using: the Perl or the PHP one? And how are you using it? A bit more detail, please.

Thanks.

Re: matching double tags (equal)

Posted: Fri May 02, 2008 9:09 am
by mcog_esteban
I'm using your code now.

Code: Select all

 
<?php
$text = file_get_contents('out.xml');
$text = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $text);
file_put_contents('out.xml', $text);
?>
 
There's no problem. I just run your code after creating the XML and it works fine.
Thank you again.

Re: matching double tags (equal)

Posted: Fri May 02, 2008 9:15 am
by prometheuzz
mcog_esteban wrote: ...
There's no problem. I just run your code after creating the XML and it works fine.
Thank you again.
Ah, ok.
You're welcome!