[solved] regex issue; counting subpatterns..?

nordin
Forum Newbie
Posts: 2
Joined: Sun Jun 13, 2004 5:36 pm

Post by nordin »

hey,
i've got a little problem on my hands here, perhaps one of you has a solution..

basically the scenario is as follows;
i have a number of csv files (variable row and column count), all of which have to be parsed and slapped into a standard html table for presentation. so far so good, no problems here.

the problem, however, is that not all fields have values, resulting in a massive bunch of empty table cells, which i'd like to get rid of - as well as set colspans for the remaining cells -with- values to fill out the space and fix the flow a bit..

long story short, i have this:

<tr>
   <td>something</td><td></td><td></td><td></td>
</tr>
<tr>
   <td></td><td>something</td><td></td><td>meep</td>
</tr>
<tr>
   <td></td><td></td><td></td><td>blah</td>
</tr>
- and i need to get it into something like this:

<tr>
   <td colspan='4'>something</td>
</tr>
<tr>
   <td></td><td colspan='2'>something</td><td>meep</td>
</tr>
<tr>
   <td colspan='3'></td><td>blah</td>
</tr>
now, i was thinking along the lines of using a preg_replace for this, searching for two or more empty cells, optionally one non-empty cell followed by one or more empty cells, and replacing them with a single cell with a colspan - however this would require counting subpatterns, i think, and i frankly don't have a clue how to do that.. just recently started getting into regexes, and while they are definitely extremely useful, they sure as hell cause a rather nasty headache..
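A plain preg_replace cannot put "how many times did this group repeat" into its replacement string, but preg_replace_callback can, because the callback receives the whole matched run and can count the cells in it. A minimal sketch of that idea, covering only runs of empty cells (not the merge into a preceding non-empty cell); the function name collapse_empty_runs is made up for illustration:

```php
<?php
// Hypothetical helper, not from the thread: collapse each run of two or more
// empty cells into a single empty cell whose colspan is the length of the run.
function collapse_empty_runs($html)
{
    return preg_replace_callback(
        // One empty cell followed by at least one more empty cell.
        '#<td>\s*</td>(?:\s*<td>\s*</td>)+#i',
        function ($m) {
            // $m[0] holds the whole run; count the empty cells inside it.
            $count = preg_match_all('#<td>\s*</td>#i', $m[0]);
            return '<td colspan="' . $count . '"></td>';
        },
        $html
    );
}

echo collapse_empty_runs('<td></td><td></td><td></td><td>blah</td>'), "\n";
// prints: <td colspan="3"></td><td>blah</td>
```

A single empty cell is left alone, since the pattern needs at least two in a row.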

so, anyone have an idea on how to do this? if i can simply solve this with a nice little regex that'd be preferable, but if someone has a radically different solution that'd work i'm all ears.
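One radically different option is to skip the regex and work out the colspans while the table is being built, when each row is still an array of field values (for instance as returned by fgetcsv). The sketch below assumes exactly that; render_row is a hypothetical name, and the folding rule is inferred from the wanted output above: empty fields widen the cell before them, and leading empty fields fold into one empty cell.

```php
<?php
// Hypothetical helper, not from the thread: render one parsed CSV row as a
// <tr>, folding each run of empty fields into the colspan of the cell before it.
function render_row(array $fields)
{
    $cells = array(); // each entry is array(text, span)
    foreach ($fields as $field) {
        $value = trim((string)$field);
        $last = count($cells) - 1;
        if ($value === '' && $last >= 0) {
            // Empty field: widen the previous cell instead of adding a new one.
            $cells[$last][1]++;
        } else {
            // Non-empty field, or the very first field of the row.
            $cells[] = array($value, 1);
        }
    }

    $html = '';
    foreach ($cells as $cell) {
        list($text, $span) = $cell;
        $attr = $span > 1 ? ' colspan="' . $span . '"' : '';
        $html .= '<td' . $attr . '>' . htmlspecialchars($text) . '</td>';
    }
    return '<tr>' . $html . '</tr>';
}

echo render_row(array('something', '', '', '')), "\n";
// prints: <tr><td colspan="4">something</td></tr>
echo render_row(array('', 'something', '', 'meep')), "\n";
// prints: <tr><td></td><td colspan="2">something</td><td>meep</td></tr>
echo render_row(array('', '', '', 'blah')), "\n";
// prints: <tr><td colspan="3"></td><td>blah</td></tr>
```

Doing it at this stage means the HTML never contains the empty cells in the first place, so there is nothing to clean up afterwards.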

:)
Last edited by nordin on Thu Jun 17, 2004 5:14 pm, edited 1 time in total.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I cooked this up:

<?php

$text = '<tr> 
   <td>something</td><td></td><td></td><td></td> 
</tr> 
<tr> 
   <td></td><td>something</td><td></td><td>meep</td> 
</tr> 
<tr> 
   <td></td><td color="23" colspan=3>something</td><td></td><td>meep</td> 
</tr> 
<tr> 
   <td></td><td colspan="3" bgcolor="2">something</td><td></td><td>meep</td> 
</tr> 
<tr> 
   <td></td><td color="123" colspan="3" bgcolor="2">something</td><td></td><td>meep</td> 
</tr> 
<tr> 
   <td></td><td></td><td></td><td>blah</td> 
</tr>';

if(!defined('_DEBUG_'))
	define('_DEBUG_',0);
	
if(1 || _DEBUG_)
	echo $text;

function replacer($m)
{
	if(_DEBUG_)
	{
		echo "\n\n";
		print_r($m);
	}

	$parts = preg_split('#(<\s*?td\s*?>\s*?<\s*?/\s*?td\s*?>)#is',$m[0], -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

	if(_DEBUG_)
		print_r($parts);
	
	$previous = -1;
	$counter = 0;
	foreach($parts as $key => $part)
	{
		if(preg_match('#<\s*?td\s*?>\s*?<\s*?/\s*?td\s*?>#is',$part))
		{
			$counter++;
			$parts[$key] = '';
		}
		else
		{
			if($counter > 0)
			{
				if($previous == -1)
				{
					if($counter == 1)
						$parts[0] = '<td></td>';
					else
						$parts[0] = '<td colspan="'.$counter.'"></td>';
				}
				elseif(preg_match('#<\s*?td[^>]*?\s+?colspan\s*?=[^\d]*?(\d+?)[^>]*?>#is', $parts[$previous], $match))
				{
					$num = (int)$match[1] + $counter;
					$parts[$previous] = preg_replace('#(<\s*?td[^>]*?\s+?colspan\s*?=[^\d]*?)\d+?([^>]*?>)#is','${1}'.$num.'\\2',$parts[$previous]);
				}
				else
				{
					$parts[$previous] = preg_replace('#(<\s*?td[^>]*?)(>)#is', '\\1 colspan="'.($counter+1).'"\\2', $parts[$previous]);
				}
			}
			$counter = 0;
			$previous = $key;
		}
	}
	
	if($counter > 0)
	{
		if($previous == -1)
			$previous = 0;
		if(preg_match('#<\s*?td[^>]*?\s+?colspan\s*?=[^\d]*?(\d+?)[^>]*?>#is', $parts[$previous], $match))
		{
			$num = (int)$match[1] + $counter;
			$parts[$previous] = preg_replace('#(<\s*?td[^>]*?\s+?colspan\s*?=[^\d]*?)\d+?([^>]*?>)#is','${1}'.$num.'\\2',$parts[$previous]);
		}
		else
		{
			$parts[$previous] = preg_replace('#(<\s*?td[^>]*?)(>)#is', '\\1 colspan="'.($counter+1).'"\\2', $parts[$previous]);
		}
	}

	if(_DEBUG_)
	{
		print_r($parts);
		echo "\n\n";
	}
	
	$ret = implode('',$parts);
	
	if(empty($ret))
	{
		if($counter == 1)
			return '<td></td>';
		else
			return '<td colspan="'.$counter.'"></td>';
	}
	else
		return $ret;
}

header('Content-type: text/plain');
$replaced = preg_replace_callback('#(<\s*?td[^>]*?>[^\s]*?<\s*?/\s*?td\s*?>)?((<\s*?td\s*?>\s*?<\s*?/\s*?td\s*?>)+)#is','replacer',$text);

if(_DEBUG_)
	echo "\n\n\n\n\n\n\n";
echo $replaced;

?>
outputs

<tr>
   <td>something</td><td></td><td></td><td></td>
</tr>
<tr>
   <td></td><td>something</td><td></td><td>meep</td>
</tr>
<tr>
   <td></td><td color="23" colspan=3>something</td><td></td><td>meep</td>
</tr>
<tr>
   <td></td><td colspan="3" bgcolor="2">something</td><td></td><td>meep</td>
</tr>
<tr>
   <td></td><td color="123" colspan="3" bgcolor="2">something</td><td></td><td>meep</td>
</tr>
<tr>
   <td></td><td></td><td></td><td>blah</td>
</tr>






<tr>
   <td colspan="4">something</td>
</tr>
<tr>
   <td></td><td colspan="2">something</td><td>meep</td>
</tr>
<tr>
   <td></td><td color="23" colspan=4>something</td><td>meep</td>
</tr>
<tr>
   <td></td><td colspan="4" bgcolor="2">something</td><td>meep</td>
</tr>
<tr>
   <td></td><td color="123" colspan="4" bgcolor="2">something</td><td>meep</td>
</tr>
<tr>
   <td colspan="3"></td><td>blah</td>
</tr>
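The step that bumps an existing colspan (rows three to five in the output) can be looked at on its own. A minimal sketch, assuming a hypothetical helper widen_cell that is not part of the code above. It computes the new number inside a callback, which also avoids needing the '${1}' form that replacer() uses to keep the backreference from running into the digits that follow it:

```php
<?php
// Hypothetical helper, not from the thread: add $extra columns to the colspan
// of an opening <td ...> tag, creating the attribute if it is missing.
function widen_cell($openTag, $extra)
{
    // Group 1: attribute name, "=" and an optional quote. Group 2: the number.
    $pattern = '#(\bcolspan\s*=\s*["\']?)(\d+)#i';
    if (preg_match($pattern, $openTag)) {
        return preg_replace_callback($pattern, function ($m) use ($extra) {
            return $m[1] . ((int)$m[2] + $extra);
        }, $openTag, 1);
    }
    // No colspan yet: the cell spans one column, so the result is 1 + $extra.
    return preg_replace('#>$#', ' colspan="' . (1 + $extra) . '">', $openTag, 1);
}

echo widen_cell('<td color="23" colspan=3>', 1), "\n";
// prints: <td color="23" colspan=4>
echo widen_cell('<td>', 2), "\n";
// prints: <td colspan="3">
```

Quoted and unquoted attribute values both keep their original quoting, since only the digits are replaced.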
Last edited by feyd on Thu Jun 17, 2004 7:15 pm, edited 1 time in total.
Post by nordin »

hey, thanks :)
didn't expect a big pile of code, more along the lines of a couple of hints, but i don't really mind, heh. might even learn something from it.

anyways, this seems to work pretty damn well.. there are a couple of minor issues, mainly due to some oddities in the source data i suspect, but i don't think it's anything i can't iron out on my own.

8)