matching double tags (equal)
Posted: Wed Apr 30, 2008 6:13 am
by mcog_esteban
Hello.
I need to transform something like:
into
I read that using a backreference is the best approach.
Can someone help me use preg_replace with the correct regex?
Edit:
This regex solves about half of my problem, but I can't capture the first "<" and the last ">" of the matched text:
Code: Select all
<?php
$str = '<CompanyAddress><CompanyAddress><BuildingNumber>Comapany address value</BuildingNumber>';
$str .= '<StreetName>Comapany address value</StreetName><AddressDetail>Comapany address value</AddressDetail>';
$str .= '<City>Comapany address value</City><PostalCode>Comapany address value</PostalCode>';
$str .= '<Region>Comapany address value</Region><Country>Comapany address value</Country></CompanyAddress></CompanyAddress>';
$regex2 = '/\\b(\\w+)\\b\\W+\\1/';
preg_match($regex2, $str, $matches);
var_dump($matches);
?>
This is what I get:
Code: Select all
array(2) {
[0]=>
string(30) "CompanyAddress><CompanyAddress"
[1]=>
string(14) "CompanyAddress"
}
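A likely reason the brackets go missing: '\w+' only matches word characters and '\b' is just a boundary, so the leading "<" and the trailing ">" sit outside the match. Below is a minimal sketch that puts the brackets into the pattern itself. The pattern and the shortened input string are illustrative assumptions, and this only shows the matching step, not the full replacement.

```php
<?php
// Shortened version of the string from the post above.
$str = '<CompanyAddress><CompanyAddress><City>value</City></CompanyAddress></CompanyAddress>';

// Illustrative pattern: "<name>" followed by the same "<name>" again.
// The angle brackets are part of the pattern, so they end up in $matches[0].
$regex = '/<(\w+)>\s*<\1>/';
preg_match($regex, $str, $matches);
var_dump($matches);
```

Here $matches[0] holds the doubled opening tag including its brackets, and $matches[1] holds the bare tag name.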
Re: matching double tags (equal)
Posted: Wed Apr 30, 2008 3:58 pm
by VladSun
Interesting problem
Code: Select all
$html = "<tag1><tag1><tag1><tag2>bold</tag2></tag1></tag1></tag1>";
while (($temp = preg_replace('/(<(\w+)?>)(\1)(.*)(<\/\2>)(<\/\2>)/', '$1$4$6', $html)) != $html) {
    $html = $temp;
}
print($html);
Re: matching double tags (equal)
Posted: Thu May 01, 2008 4:17 pm
by prometheuzz
Here's another way:
Code: Select all
#!/usr/bin/php
<?php
$text = "<tag> <tag><a/><b/></tag> </tag> </tag>";
$text = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $text);
echo "$text\n";
?>
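To make the pattern easier to read, here is the same regex spelled out with the x (extended) flag so each part can carry a comment. This is only an annotated sketch; the '~' delimiter and the variable names are my own choices, not part of the post above.

```php
<?php
// Same regex as above, written with the x flag so whitespace is ignored
// and "#" starts a comment inside the pattern.
$pattern = '~
    (<[^>]+>)   # group 1: one complete tag, opening or closing
    (\s*\1)+    # the identical tag again, one or more times, whitespace allowed in between
~x';

$text = "<tag> <tag><a/><b/></tag> </tag> </tag>";
$result = preg_replace($pattern, '$1', $text);
echo "$result\n"; // prints <tag><a/><b/></tag>
```

Because '<[^>]+>' matches closing tags as well, one pattern collapses both the doubled opening and the doubled closing tags. The flip side is that any run of identical adjacent tags is collapsed, including legitimate ones such as two consecutive '<br/>' elements.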
Re: matching double tags (equal)
Posted: Fri May 02, 2008 4:51 am
by mcog_esteban
Hi, thank you all.
The problem turned out to be more difficult than expected. I forgot to mention that doubled tags can also appear as children of other tags, e.g.:
Code: Select all
<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>
But I had help from a friend and the problem is now solved.
In case anyone is interested, here's the code snippet:
Code: Select all
perl -pe 'BEGIN{undef$/}sub a{$_=shift;$a=s!(<([^>]+)>)\s*\1((?:.|\n)*?)?</\2>\s*</\2>!$1$3</$2>!g;if($a){a($_)}else{$_}};a($_)' input_file
Maybe someone can convert this to PHP.
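A rough PHP port of the Perl one-liner could look like the sketch below. The function name is made up, the recursion is replaced by a loop that runs until no substitution happens, and the s flag stands in for Perl's '(?:.|\n)'. Treat it as an untuned sketch rather than a drop-in replacement.

```php
<?php
// Hypothetical function name; a rough port of the Perl one-liner.
function collapse_double_tags(string $xml): string
{
    // (<([^>]+)>)    group 1: the opening tag, group 2: its name
    // \s*\1          the same opening tag again
    // (.*?)          group 3: the content (the s flag lets . match newlines)
    // </\2>\s*</\2>  the doubled closing tag
    $pattern = '~(<([^>]+)>)\s*\1(.*?)</\2>\s*</\2>~s';

    do {
        $result = preg_replace($pattern, '$1$3</$2>', $xml, -1, $count);
        if ($result === null) {
            throw new RuntimeException('preg_replace failed');
        }
        $xml = $result;
    } while ($count > 0); // repeat until a pass makes no substitution

    return $xml;
}

$input = '<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>';
$output = collapse_double_tags($input);
echo $output, "\n"; // prints <tag1><tag2>value</tag2><tag3>value</tag3></tag1>
```

The loop is needed because the first pass only removes the outermost doubled pair; the nested pairs become visible to the pattern on the next pass.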
Re: matching double tags (equal)
Posted: Fri May 02, 2008 5:18 am
by prometheuzz
mcog_esteban wrote:
Hi, thank you all.
The problem turned out to be more difficult than expected. I forgot to mention that doubled tags can also appear as children of other tags, e.g.:
Code: Select all
<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>
...
What should the output of that be?
The code snippet I posted earlier produces the following:
Code: Select all
<tag1><tag2>value</tag2><tag3>value</tag3></tag1>
Re: matching double tags (equal)
Posted: Fri May 02, 2008 6:06 am
by mcog_esteban
Exactly that. I hadn't tried your code yet, but I'm glad it works. As I said, I was working on this with a friend who mostly uses Perl and regular expressions. When I got back to the forum today (1 May is a national holiday), I noticed that besides VladSun, you had also posted.
Your snippet is a lot more readable, which is good since I'm not a Perl user.
Re: matching double tags (equal)
Posted: Fri May 02, 2008 6:32 am
by prometheuzz
mcog_esteban wrote:
Exactly that. I hadn't tried your code yet, but I'm glad it works. As I said, I was working on this with a friend who mostly uses Perl and regular expressions. When I got back to the forum today (1 May is a national holiday), I noticed that besides VladSun, you had also posted.
Your snippet is a lot more readable, which is good since I'm not a Perl user.
I'm no Perl programmer either, but that code snippet looks overly obfuscated to me.
This Perl snippet does exactly the same replacement as the PHP version I posted:
Code: Select all
#!/usr/bin/perl -w
$text = "<tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>";
print "$text\n";
$text =~ s/(<[^>]+>)(\s*\1)+/$1/g;
print "$text\n";
# output:
# <tag1><tag1><tag2><tag2>value</tag2></tag2><tag3><tag3>value</tag3></tag3></tag1></tag1>
# <tag1><tag2>value</tag2><tag3>value</tag3></tag1>
Re: matching double tags (equal)
Posted: Fri May 02, 2008 6:46 am
by mcog_esteban
Hmm... your code works best when the input text is on a single line.
If I format the XML first (to make it readable; otherwise it's just one huge line) and then remove the doubled tags, some of the formatting is lost.
Re: matching double tags (equal)
Posted: Fri May 02, 2008 7:24 am
by prometheuzz
mcog_esteban wrote:
Hmm... your code works best when the input text is on a single line.
If I format the XML first (to make it readable; otherwise it's just one huge line) and then remove the doubled tags, some of the formatting is lost.
Multiple lines work fine when I test on my machine: newline characters such as '\n' and '\r\n' get "eaten" by the '\s'.
Can you post an example where it fails?
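A small sketch of what "formatting is lost" might mean here, using a made-up indented input: the match swallows the line break and indentation between the doubled tags, so the XML stays well-formed, but the surviving lines keep their old, deeper indentation.

```php
<?php
// Made-up, pretty-printed input with a doubled <a> element.
$xml = "<a>\n  <a>\n    <b>value</b>\n  </a>\n</a>\n";

// '\s*' consumes the newline and spaces between the two identical tags.
$out = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $xml);
echo $out;
// Output:
// <a>
//     <b>value</b>
//   </a>
```

The inner <b> is still indented as if it were two levels deep, and the closing </a> no longer lines up with the opening <a>. Running the replacement before pretty-printing avoids this.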
Re: matching double tags (equal)
Posted: Fri May 02, 2008 8:17 am
by mcog_esteban
The XML displays fine in a browser like Firefox, but if I open the file in an editor such as UltraEdit, I see that the formatting is lost after some point.
It doesn't really matter that much.
XML produced, beautified, then tags removed:
Code: Select all
<Root>
<AuditFile>
<HeaderFile>
<AuditFileVersion>Some value</AuditFileVersion>
<CompanyID>Some value</CompanyID>
<TaxRegistrationNumber>Some value</TaxRegistrationNumber>
<TaxAccountingBasis>Some value</TaxAccountingBasis>
<CompanyName>Some value</CompanyName>
<CompanyAddress>
<BuildingNumber>Comapany address value</BuildingNumber>
<StreetName>Comapany address value</StreetName>
<AddressDetail>Comapany address value</AddressDetail>
<City>Comapany address value</City>
<PostalCode>Comapany address value</PostalCode>
<Region>Comapany address value</Region>
<Country>Comapany address value</Country>
</CompanyAddress>
<FiscalYear>Some value</FiscalYear>
<StartDate>Some value</StartDate>
<EndDate>Some value</EndDate>
<CurrencyCode>Some value</CurrencyCode>
<DateCreated>Some value</DateCreated>
<ProductID>Some value</ProductID>
<ProductVersion>Some value</ProductVersion>
<HeaderComment>Some value</HeaderComment>
<Telephone>Some value</Telephone>
<Fax>Some value</Fax>
<Email>Some value</Email>
<WebSite>Some value</WebSite>
</HeaderFile>
<MasterFiles>
<GeneralLedgers/>
<Customers/>
<Suppliers/>
<Products/>
<TaxTables/>
</MasterFiles>
<GeneralLedgerEntries>
<NumberOfEntries/>
<TotalDebit/>
<TotalCredit/>
<Journals/>
</GeneralLedgerEntries>
<SourceDocuments>
<SalesInvoices>
<NumberOfEntries/>
<TotalDebit/>
<TotalCredit/>
<Invoices>
<Invoice>
<InvoiceNo>IA10000001</InvoiceNo>
<Period/>
<InvoiceDate>2008-21-01</InvoiceDate>
<InvoiceType>Venda a dinheiro</InvoiceType>
<SystemEntryDate>2008-21-01T17:46:52</SystemEntryDate>
<TransactionID/>
<CustomerID/>
<ShipTo/>
<ShipFrom/>
<Line>
<LineNumber>1</LineNumber>
<OrderReferences/>
<ProductCode/>
<ProductDescription/>
<Quantity>1</Quantity>
<UnitOfMeasure/>
<UnitPrice>26</UnitPrice>
<TaxPointDate>2008-21-01</TaxPointDate>
<References/>
<Description>Serviços Internet (alojamento/acesso)</Description>
<DebitAmount/>
<CreditAmount/>
<Tax/>
<SettlementAmount/>
</Line>
<DocumentTotals>
<TaxPayable/>
<NetTotal/>
<GrossTotal/>
<Currency/>
<Settlement/>
</DocumentTotals>
</Invoice>
<Invoice>
<InvoiceNo>IA10000002</InvoiceNo>
<Period/>
<InvoiceDate>2008-21-01</InvoiceDate>
<InvoiceType>Venda a dinheiro</InvoiceType>
<SystemEntryDate>2008-21-01T17:46:56</SystemEntryDate>
<TransactionID/>
<CustomerID/>
<ShipTo/>
<ShipFrom/>
<Line>
....
XML produced, tags removed, then beautified:
Code: Select all
<Root>
<AuditFile>
<HeaderFile>
<AuditFileVersion>Some value</AuditFileVersion>
<CompanyID>Some value</CompanyID>
<TaxRegistrationNumber>Some value</TaxRegistrationNumber>
<TaxAccountingBasis>Some value</TaxAccountingBasis>
<CompanyName>Some value</CompanyName>
<CompanyAddress>
<BuildingNumber>Comapany address value</BuildingNumber>
<StreetName>Comapany address value</StreetName>
<AddressDetail>Comapany address value</AddressDetail>
<City>Comapany address value</City>
<PostalCode>Comapany address value</PostalCode>
<Region>Comapany address value</Region>
<Country>Comapany address value</Country>
</CompanyAddress>
<FiscalYear>Some value</FiscalYear>
<StartDate>Some value</StartDate>
<EndDate>Some value</EndDate>
<CurrencyCode>Some value</CurrencyCode>
<DateCreated>Some value</DateCreated>
<ProductID>Some value</ProductID>
<ProductVersion>Some value</ProductVersion>
<HeaderComment>Some value</HeaderComment>
<Telephone>Some value</Telephone>
<Fax>Some value</Fax>
<Email>Some value</Email>
<WebSite>Some value</WebSite>
</HeaderFile>
<MasterFiles>
<GeneralLedgers/>
<Customers/>
<Suppliers/>
<Products/>
<TaxTables/>
</MasterFiles>
<GeneralLedgerEntries>
<NumberOfEntries/>
<TotalDebit/>
<TotalCredit/>
<Journals/>
</GeneralLedgerEntries>
<SourceDocuments>
<SalesInvoices>
<NumberOfEntries/>
<TotalDebit/>
<TotalCredit/>
<Invoices>
<Invoice>
<InvoiceNo>IA10000001</InvoiceNo>
<Period/>
<InvoiceDate>2008-21-01</InvoiceDate>
<InvoiceType>Venda a dinheiro</InvoiceType>
<SystemEntryDate>2008-21-01T17:46:52</SystemEntryDate>
<TransactionID/>
<CustomerID/>
<ShipTo/>
<ShipFrom/>
<Line>
<LineNumber>1</LineNumber>
<OrderReferences/>
<ProductCode/>
<ProductDescription/>
<Quantity>1</Quantity>
<UnitOfMeasure/>
<UnitPrice>26</UnitPrice>
<TaxPointDate>2008-21-01</TaxPointDate>
<References/>
<Description>Serviços Internet (alojamento/acesso)</Description>
<DebitAmount/>
<CreditAmount/>
<Tax/>
<SettlementAmount/>
</Line>
<DocumentTotals>
<TaxPayable/>
<NetTotal/>
<GrossTotal/>
<Currency/>
<Settlement/>
</DocumentTotals>
</Invoice>
.....
This is how UltraEdit shows both files...
Re: matching double tags (equal)
Posted: Fri May 02, 2008 9:04 am
by prometheuzz
It is unclear to me if you still have a question.
If you do, which code snippet are you using, the Perl or the PHP? And how are you using it? A bit more detail, please.
Thanks.
Re: matching double tags (equal)
Posted: Fri May 02, 2008 9:09 am
by mcog_esteban
I'm using your code now.
Code: Select all
<?php
$text = file_get_contents('out.xml');
$text = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $text);
file_put_contents('out.xml', $text);
?>
There's no problem; I just run your code after creating the XML and it works fine.
Thank you again.
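One caveat with reading and writing the same file: preg_replace() returns null on error, and writing that back would leave out.xml empty. A slightly more defensive sketch is below; the function name and the exception handling are illustrative assumptions, not something from this thread.

```php
<?php
// Hypothetical helper: name and error handling are illustrative only.
function dedupe_tags_in_file(string $path): void
{
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $result = preg_replace('/(<[^>]+>)(\s*\1)+/', '$1', $text);
    if ($result === null) {
        // Do not overwrite the file if the regex engine reported an error.
        throw new RuntimeException('Regex failed, error code ' . preg_last_error());
    }

    file_put_contents($path, $result);
}

// Usage, assuming out.xml exists:
// dedupe_tags_in_file('out.xml');
```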
Re: matching double tags (equal)
Posted: Fri May 02, 2008 9:15 am
by prometheuzz
mcog_esteban wrote:
...
There's no problem; I just run your code after creating the XML and it works fine.
Thank you again.
Ah, ok.
You're welcome!