stryderjzw wrote: Sorry for the delay, I had exams.
No problem - I expect these live TDD sessions to take some time.
I was getting some undefined index errors (I'd recommend E_ALL error reporting) but a minor change got everything running green again:
Code: Select all
class PatternIterator {
    var $haystack;
    var $pattern;

    function PatternIterator($hay) {
        $this->haystack = $hay;
    }

    function setPattern($pattern) {
        $this->pattern = $pattern;
    }

    // Returns the next match and removes it from the haystack,
    // or false when nothing (or only an empty string) matches.
    function next() {
        if (preg_match($this->pattern, $this->haystack, $result)) {
            if (strlen($result[0]) > 0) {
                // limit of 1: strip only the first occurrence
                $this->haystack = preg_replace(
                    $this->pattern,
                    '',
                    $this->haystack,
                    1);
                return $result[0];
            }
        }
        return false;
    }
}
stryderjzw wrote: Probably not the most elegant solution, but it appears to work for the tests. So, if it works for the tests, then it should be good, no?
Absolutely. The green bar is all that really matters, within reason. If you've got a deadline to meet you might not always have time to perfect the class, but you'll always have the unit tests to refactor against later.
You'll notice I took the trigger_error lines out. Does it still work? This is actually quite a cute thing about testing: if you've got a hunch that you can make a change but you're not really sure, you can just go ahead and experiment. If everything stays green you were right; if not, it only takes a moment to undo. Without the tests you wouldn't have the same instant feedback and you'd have to sit down and think it through properly. Sometimes you get away with it; sometimes you don't.
Using preg_replace_callback is another solution:
Code: Select all
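class PatternIterator
{
    var $_haystack;
    var $_pattern;
    var $bite = false;

    function PatternIterator($haystack)
    {
        $this->_haystack = $haystack;
    }

    function setPattern($pattern)
    {
        $this->_pattern = $pattern;
    }

    // Removes the first match from the haystack and returns it,
    // or returns false if the pattern does not match.
    function next()
    {
        $this->bite = false;
        $this->_haystack = preg_replace_callback(
            $this->_pattern,
            array(&$this, '_setBite'),
            $this->_haystack,
            1);
        return $this->bite;
    }

    // Callback: remembers the match and replaces it with nothing.
    function _setBite($matches)
    {
        $this->bite = $matches[0];
        return '';
    }
}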
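To see what the callback version does to the haystack on each call, here is a rough standalone sketch of the same idea as a plain function. The name biteFirstMatch is made up for illustration, and the closure assumes PHP 5.3 or later rather than the PHP 4 style used in the class above.

```php
<?php
// Hypothetical helper (not part of the class above): removes the first
// match of $pattern from $haystack, which is passed by reference, and
// returns the matched text, or false if nothing matched.
function biteFirstMatch(&$haystack, $pattern)
{
    $bite = false;
    $haystack = preg_replace_callback(
        $pattern,
        function ($matches) use (&$bite) {
            $bite = $matches[0];   // remember what was bitten off
            return '';             // and replace it with nothing
        },
        $haystack,
        1                          // limit: only the first match
    );
    return $bite;
}

$haystack = 'foo bar foo';
$first  = biteFirstMatch($haystack, '/foo/');   // 'foo', haystack is now ' bar foo'
$second = biteFirstMatch($haystack, '/foo/');   // 'foo', haystack is now ' bar '
$third  = biteFirstMatch($haystack, '/foo/');   // false, haystack unchanged
```

Because the match is replaced with an empty string, the haystack shrinks on every call, so any offsets refer to the shortened string rather than the original.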
stryderjzw wrote: I see how we're adding tests to add functionality to our class, but when do we know to stop?
You probably won't think of everything on the first pass. Tests never provide 100% proof, but I think you can get as close to that as the amount of effort you're prepared to put in allows. At least you have a formal document which describes the behaviour an implementation must exhibit. You can always add further constraints later as new things come to mind. I find that the discipline of testing makes me think about the details much more carefully than I otherwise might have, so even the first-pass test case is a huge improvement on none at all.
One general point would be to make sure you test the class with the full range of values it could receive in the wild. If a method takes an integer parameter, I'd usually test with 0 and 1, since these can sometimes be special cases, and after that with any other number. Very often you're looking for the simplest way to get an inductive proof.
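As a rough illustration of the 0, 1, "any other number" idea, here is a small sketch. Both describeCount and check are invented for this example; check just stands in for whatever assertion your test framework provides.

```php
<?php
// Hypothetical function under test: formats a count with a noun.
// 0 and 1 are the special cases; everything else takes the general path.
function describeCount($count, $noun)
{
    if ($count === 0) {
        return 'no ' . $noun . 's';
    }
    if ($count === 1) {
        return 'one ' . $noun;
    }
    return $count . ' ' . $noun . 's';
}

// Hypothetical mini-assertion standing in for a real test framework.
function check($actual, $expected)
{
    if ($actual !== $expected) {
        trigger_error(
            'expected ' . var_export($expected, true) .
            ', got ' . var_export($actual, true),
            E_USER_WARNING);
        return false;
    }
    return true;
}

check(describeCount(0, 'result'), 'no results');   // special case
check(describeCount(1, 'result'), 'one result');   // special case
check(describeCount(7, 'result'), '7 results');    // any other number
```

If 0, 1 and one arbitrary larger value all behave, that is usually a reasonable stand-in for "works for every n" without testing every n.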
In case you're interested, here's the class and test case for my own PatternIterator (a bit of a work in progress, so it might still have a bug or two). I needed a couple more features: the class will be used to parse search terms, and I need the offsets in order to rank results by word order later. preg_replace_callback is no good for that, but you can iterate through pattern matches with preg_match if you keep track of the current offset and feed that back into the function.
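Stripped of everything else, the offset-tracking idea might look something like the sketch below. The function name findAllWithOffsets is made up for illustration; the only library features used are the PREG_OFFSET_CAPTURE flag and the offset parameter of preg_match.

```php
<?php
// Hypothetical helper: collects every match of $pattern in $haystack
// together with its byte offset, by feeding the end of each match back
// into preg_match() as the starting offset for the next search.
function findAllWithOffsets($pattern, $haystack)
{
    $found  = array();
    $offset = 0;
    $length = strlen($haystack);
    while ($offset <= $length
        && preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset)) {
        $text     = $matches[0][0];   // the matched string
        $position = $matches[0][1];   // where it starts in $haystack
        $found[]  = array($text, $position);
        // Step past the match; move at least one byte so an empty
        // match cannot cause an endless loop.
        $offset = $position + max(strlen($text), 1);
    }
    return $found;
}

$results = findAllWithOffsets('/foo/', 'foo bar foo');
// $results is array(array('foo', 0), array('foo', 8))
```

preg_match_all with PREG_OFFSET_CAPTURE would return the same data in one call; the loop form is what lets an iterator hand back one match at a time.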
The class is also set up so it can chew through found matches at the same time as it's iterating over the main target string, i.e.:
Code: Select all
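function _mainLoop()
{
    $this->_chewer->setPattern(..etc..);
    // compare against false explicitly so a token of '0' doesn't end the loop
    while(($token = $this->_chewer->next()) !== false) {
        $this->_findInToken($token);
    }
}
function _findInToken($token)
{
    // switch the iterator over to the token...
    $this->_chewer->chewToken($token);
    $this->_chewer->setPattern(..etc..);
    while(($foo = $this->_chewer->next()) !== false) {
        // do something with $foo
    }
    // ...and back to the main haystack, where it left off
    $this->_chewer->chewHaystack();
}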
Code: Select all
class TestOfPatternIterator extends UnitTestCase
{
    function TestOfPatternIterator()
    {
        $this->UnitTestCase();
    }
    function testWithNoPattern()
    {
        $string = 'foo';
        $pattern = '';
        $it =& new PatternIterator($string);
        $this->assertIdentical($it->next(), false);
        $this->assertError('Empty regular expression');
    }
    function testReferencedTarget()
    {
        $string = 'foo';
        $pattern = '/foo/';
        $expected_offset = 0;
        $it =& new PatternIterator($string);
        $it->setPattern($pattern);
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->getOffset(), $expected_offset);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        // the match is blanked out with spaces, not removed
        $this->assertIdentical($string, str_pad('', 3));
    }
    function testCopiedTarget()
    {
        $string = 'foo';
        $pattern = '/foo/';
        $it =& new PatternIterator($string, true);
        $it->setPattern($pattern);
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($string, 'foo');
    }
    function testMultipleMatches()
    {
        $string = 'foo foo';
        $pattern = '/foo/';
        $expected_offset_0 = 0;
        $expected_offset_1 = 4;
        $it =& new PatternIterator($string);
        $it->setPattern($pattern);
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->getOffset(), $expected_offset_0);
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->getOffset(), $expected_offset_1);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        $this->assertIdentical($string, str_pad('', 7));
    }
    function testMultiplePatterns()
    {
        $string = 'foo bar';
        $pattern_0 = '/foo/';
        $pattern_1 = '/bar/';
        $expected_offset_0 = 0;
        $expected_offset_1 = 4;
        $it =& new PatternIterator($string);
        $it->setPattern($pattern_0);
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->getOffset(), $expected_offset_0);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        $it->setPattern($pattern_1);
        $this->assertIdentical($it->next(), 'bar');
        $this->assertIdentical($it->getOffset(), $expected_offset_1);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        $this->assertIdentical($string, str_pad('', 7));
    }
    // just a note for documentation
    function testLookingPatternDoesNotReplaceSuffixOrPrefix()
    {
        $string = '"foo"';
        $pattern = '/(?<=")foo(?=")/';
        $expected_offset = 1;
        $it =& new PatternIterator($string);
        $it->setPattern($pattern);
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->getOffset(), $expected_offset);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        // only 'foo' is blanked out; the quotes survive
        $this->assertIdentical($string, '"' . str_pad('', 3) . '"');
    }
    #!! untested in copy mode
    // usage example in TermsParser, method: _findOrTerms()
    function testFindInSubstring()
    {
        $haystack = 'foo "hello world" "dial 999" bar';
        $it =& new PatternIterator($haystack);
        $it->setPattern('/"[^"]+"/');
        $token = $it->next();
        $it->chewToken($token);
        $it->setPattern('/[a-zA-Z]+/');
        $this->assertIdentical($it->next(), 'hello');
        $this->assertIdentical($it->getOffset(), 5);
        $this->assertIdentical($it->next(), 'world');
        $this->assertIdentical($it->getOffset(), 11);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        $it->chewHaystack();
        $token = $it->next();
        $it->chewToken($token);
        $it->setPattern('/[a-zA-Z]+/');
        $this->assertIdentical($it->next(), 'dial');
        $this->assertIdentical($it->getOffset(), 19);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        $it->chewHaystack();
        $it->setPattern('/[\w]+/');
        $this->assertIdentical($it->next(), 'foo');
        $this->assertIdentical($it->getOffset(), 0);
        $this->assertIdentical($it->next(), 'bar');
        $this->assertIdentical($it->getOffset(), 29);
        $this->assertIdentical($it->next(), false);
        $this->assertIdentical($it->getOffset(), null);
        // note that the token will always be replaced by empty space in the
        // haystack, even if its contents have not been completely chewed
        $this->assertIdentical($haystack, str_pad('', 32));
        $this->assertNotIdentical($haystack,
            str_pad('', 24) . '999' . str_pad('', 5));
    }
}
Code: Select all
class PatternIterator
{
    var $_match_after_offset = 0;
    var $_current_match_length = 0;
    var $true_offset;
    var $_haystack;
    var $_pattern;
    var $_token_offset = 0;
    /*
        param (string) $haystack
        param (boolean) $copy: by default, the haystack is passed by ref
                               and pattern matches will be replaced with the
                               corresponding number of spaces; $copy === true
                               leaves the original haystack unaltered
    */
    function PatternIterator(&$haystack, $copy = false) // implements Iterator. Probably.
    {
        if( !$copy) {
            $this->_haystack = $this->_target_ref =& $haystack;
        } else {
            $this->_haystack = $this->_target_ref = $haystack;
        }
    }
    /*
        param (string) $pattern
    */
    function setPattern($pattern)
    {
        $this->_pattern = $pattern;
        $this->_reset();
    }
    /*
        return (string/false)
    */
    function next()
    {
        if(preg_match(
            $this->_pattern,
            $this->_haystack,
            $matches,
            PREG_OFFSET_CAPTURE,
            $this->_match_after_offset)) {
            $match = $matches[0][0];
            $match_offset = $matches[0][1];
            $this->_update($match, $match_offset);
            return $match;
        } else {
            // no more matches: write the blanked-out haystack back to the target
            $this->true_offset = null;
            $this->_target_ref = $this->_haystack;
            return false;
        }
    }
    function _update($match, $match_offset)
    {
        $this->_current_match_length = strlen($match);
        $this->_removeMatchFromHaystack($match_offset);
        // the next search starts just after this match
        $this->_match_after_offset =
            $this->_current_match_length + $match_offset;
        $this->true_offset = $match_offset + $this->_token_offset;
    }
    function _removeMatchFromHaystack($offset)
    {
        // overwrite the match with spaces so later offsets stay valid
        $this->_haystack =
            substr($this->_haystack, 0, $offset) .
            str_pad('', $this->_current_match_length) .
            substr($this->_haystack, $offset + $this->_current_match_length);
    }
    /*
        return (int/null) offset of the last match in the original haystack
    */
    function getOffset()
    {
        return $this->true_offset;
    }
    function chewToken($token)
    {
        if( !isset($this->_cache)) {
            $this->_cache = array();
            $this->_cache['haystack'] = $this->_haystack;
            $this->_cache['pattern'] = $this->_pattern;
            $this->_cache['current_offset'] = $this->_match_after_offset;
        }
        $this->_haystack = $token;
        $this->_token_offset =
            $this->_match_after_offset - $this->_current_match_length;
        $this->_reset();
    }
    function chewHaystack()
    {
        if(isset($this->_cache)) {
            $this->_token_offset = 0;
            $this->_haystack =& $this->_cache['haystack'];
            $this->_pattern = $this->_cache['pattern'];
            $this->_match_after_offset = $this->_cache['current_offset'];
            unset($this->_cache);
        } else {
            // ??
            trigger_error('no cache set');
        }
    }
    function _reset()
    {
        $this->_match_after_offset = 0;
        $this->_current_match_length = 0;
        $this->true_offset = null;
    }
}