Gimme a name : URI and friends
Posted: Sun Aug 06, 2006 5:41 pm
by Ambush Commander
I need a name for a class that takes a URI and returns either false (the URI isn't valid), the original version (the URI was fine), or an amended version (slight normalization was needed).
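A minimal sketch of the contract described above, in PHP since that is what the thread's code uses. `URIDef` is just the working name from this post, and the checks inside `validate()` (trim whitespace, require a scheme, lowercase it) are hypothetical placeholders, not the real validation rules.

```php
<?php
// Hypothetical sketch of the tri-state contract: false = invalid,
// same string = already fine, different string = normalized copy.
class URIDef
{
    public function validate($uri)
    {
        $uri = trim($uri); // placeholder normalization: strip surrounding whitespace
        // Placeholder check: require a scheme such as "http:" or "mailto:".
        if (!preg_match('/^([a-zA-Z][a-zA-Z0-9+.\-]*):(.*)$/s', $uri, $m)) {
            return false;
        }
        // Schemes are case-insensitive, so lowercase it (RFC 3986, section 3.1).
        return strtolower($m[1]) . ':' . $m[2];
    }
}

$def = new URIDef();
var_dump($def->validate('http://example.com/'));  // string(19) "http://example.com/"
var_dump($def->validate(' HTTP://example.com/')); // string(19) "http://example.com/"
var_dump($def->validate('no scheme here'));       // bool(false)
```

Because valid results are strings, callers should test the return value with `=== false` rather than a loose comparison.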
I would rather not use "URIValidator", which is too long and leaves out the fixing aspect of the class. I might go with "URIDef", in accordance with some other classes named the same way (AttrDef and ChildDef), but "Definition" doesn't imply it will do anything (which might mean that the other ones are misnamed too). Also, not having URI in the name would be helpful, since people are always mixing up URL and URI.
And once that's been picked, I need a name for a hierarchy of classes that this "URIDef" class will defer processing to based on the scheme of the URI (you know, like URIScheme_http, URIScheme_ftp, etc.) Once again, preferably without URI in the name. "Scheme" is a little too vague however...
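One way the deferral could work, sketched under assumptions: each scheme gets a subclass named `URIScheme_<scheme>` (the naming floated above), and a hypothetical `validate_by_scheme()` function picks the subclass from the text before the first colon. The per-scheme rules are placeholders.

```php
<?php
// Hypothetical base class: one subclass per scheme.
abstract class URIScheme
{
    // Receives the scheme-specific part (everything after "scheme:").
    // Returns false, or the (possibly amended) scheme-specific part.
    abstract public function validate($rest);
}

class URIScheme_http extends URIScheme
{
    public function validate($rest)
    {
        // Placeholder rule: http URIs must have an authority ("//host...").
        return strncmp($rest, '//', 2) === 0 ? $rest : false;
    }
}

class URIScheme_mailto extends URIScheme
{
    public function validate($rest)
    {
        // Placeholder rule: require an "@" somewhere in the address.
        return strpos($rest, '@') !== false ? $rest : false;
    }
}

function validate_by_scheme($uri)
{
    $pos = strpos($uri, ':');
    if ($pos === false) {
        return false;                       // no scheme at all
    }
    $scheme = strtolower(substr($uri, 0, $pos));
    $class  = 'URIScheme_' . $scheme;
    // Check the scheme's characters before building a class name from it.
    if (!preg_match('/^[a-z][a-z0-9+.\-]*$/', $scheme) || !class_exists($class)) {
        return false;                       // malformed or unsupported scheme
    }
    $handler = new $class();
    $rest = $handler->validate(substr($uri, $pos + 1));
    return $rest === false ? false : $scheme . ':' . $rest;
}
```

Unknown schemes are rejected here; whether they should instead pass through untouched is a policy decision the sketch does not settle.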
While we're on the topic, it would be nice to know whether or not you think the retrieval and loading of the scheme classes should be handled by "URIDef" or separated into a distinct "URISchemaRegistry".
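If lookup and loading are separated out, the registry's only job is to find a scheme handler and build it once. The sketch below calls the class `URISchemeRegistry` and feeds it a map of factory callables; both are assumptions for illustration (a real version might include files or rely on an autoloader instead).

```php
<?php
// Hypothetical registry: finds scheme handlers and caches them after first use.
class URISchemeRegistry
{
    private $schemes = array();  // scheme name => handler object, or false if unknown
    private $factories;          // scheme name => callable that builds a handler

    public function __construct(array $factories)
    {
        $this->factories = $factories;
    }

    // Returns the handler for $scheme, building it on first request, or false.
    public function getScheme($scheme)
    {
        $scheme = strtolower($scheme);
        if (!array_key_exists($scheme, $this->schemes)) {
            $this->schemes[$scheme] = isset($this->factories[$scheme])
                ? call_user_func($this->factories[$scheme])
                : false;
        }
        return $this->schemes[$scheme];
    }
}

$registry = new URISchemeRegistry(array(
    'http' => function () { return new stdClass(); }, // stand-in handler object
));
$handler = $registry->getScheme('HTTP'); // built now, reused on later calls
```

Keeping this out of the validator also means the validator can be unit-tested against a registry preloaded with fake schemes.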
Finally, it would be nice to know whether or not this URIDef should inherit from AttrDef, which maintains exactly the same interface. The only difference is context: AttrDef is only called for validation of HTML attributes, while URIDef may be called by the CSSDef too. My reasoning is that since they're so complicated, we ought to just give them their own hierarchies.
It's funny, but the naming issue is the only thing that's stopping me from actually coding the class.

Maybe I should just use the ones I put up and deal with it.
Posted: Sun Aug 06, 2006 5:51 pm
by feyd
URIne
URIRef(erence)
URICheck/URICk
URIFix
URIRepair
URIHeal
URIScheme

Protocol?
Posted: Sun Aug 06, 2006 6:07 pm
by Ambush Commander
URIne
Oh dear...
URIScheme

Protocol?
Well, that's interesting that you should mention it, because if I got one thing from reading the RFC, it's that scheme != protocol. A few examples: http:// largely relies on the HTTP protocol, but it also requires DNS for domain names. news: is protocol-independent. Same with mailto:.
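PHP's built-in `parse_url()` illustrates the distinction: it finds a scheme in all three kinds of URI, but only the `http` one has a host that would need a DNS lookup. The example URIs are made up.

```php
<?php
// Only the http URI has a host component; for mailto: and news: URIs,
// everything after the scheme ends up in the path.
$uris = array('http://www.example.com/', 'mailto:joe@example.com', 'news:comp.lang.php');
foreach ($uris as $uri) {
    $parts = parse_url($uri);
    $host  = isset($parts['host']) ? $parts['host'] : '(none)';
    echo $parts['scheme'], ': host = ', $host, "\n";
}
```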
URICheck/URICk
URIFix
URIRepair
URIHeal
Hmm... "Check" might work nicely. The other three, however, imply that the URI is broken and needs to be fixed (which is not always the case, or even possible). (I'm really adamant about first impressions because if I don't find a really good solution, I'm sticking with Def and disambiguating inside the file.)
URIRef(erence)
Not sure I understand.
Posted: Sun Aug 06, 2006 6:13 pm
by feyd
URIRef for shortest, URIReference for completeness. I go for descriptive names. I don't really care how long they get provided they make sense (and I couldn't make a shorter name for whatever reason.)
To me, URIValidator could easily work.
Re: Gimme a name : URI and friends
Posted: Sun Aug 06, 2006 6:22 pm
by Christopher
Ambush Commander wrote:I need a name for a class that takes a URI and returns either false (the URI isn't valid), the original version (the URI was fine), or an amended version (slight normalization was needed).
That sounds more like a troublesome function with overloaded return values. Perhaps two classes would make more sense: a UrlValidator/UrlCheck to see if the URL is valid, and a UrlFormat/UrlFix to "normalize" it. Although that word "normalize" gives me the feeling that that method should be in the class that actually needs the "normalized" value -- perhaps private as well.
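A rough sketch of the two-class split being suggested, using the names proposed in the post. The rules (accept only http/https URLs with a host, lowercase the scheme) are placeholders.

```php
<?php
// Hypothetical sketch of the two-class split; names follow the post.
class UrlValidator
{
    // Placeholder rule: only http(s) URLs with a host are valid.
    public function isValid($url)
    {
        $parts = parse_url($url);
        return is_array($parts)
            && isset($parts['scheme'], $parts['host'])
            && in_array(strtolower($parts['scheme']), array('http', 'https'), true);
    }
}

class UrlFormat
{
    // Placeholder normalization: lowercase the (case-insensitive) scheme.
    public function format($url)
    {
        return preg_replace_callback('/^[a-zA-Z][a-zA-Z0-9+.\-]*:/', function ($m) {
            return strtolower($m[0]);
        }, $url);
    }
}

$validator = new UrlValidator();
$formatter = new UrlFormat();
$url = 'HTTP://www.example.com/';
if ($validator->isValid($url)) {
    $url = $formatter->format($url);   // now 'http://www.example.com/'
}
```

The trade-off is that each URL is examined twice, once by each class.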
Posted: Sun Aug 06, 2006 6:33 pm
by Ambush Commander
URIRef for shortest, URIReference for completeness. I go for descriptive names. I don't really care how long they get provided they make sense (and I couldn't make a shorter name for whatever reason.)
Aside: I like descriptive names, but not to the point where they won't fit in a single line after adding tabs for conditionals/classes (although at that point, you might need a factory for them). What I meant was that I couldn't see how "reference" had anything to do with the task at hand. (sorry, working kind of slow today).
That sounds more like a troublesome function with overloaded return values. Perhaps two classes would make more sense: a UrlValidator/UrlCheck to see if the URL is valid, and a UrlFormat/UrlFix to "normalize" it. Although that word "normalize" gives me the feeling that that method should be in the class that actually needs the "normalized" value -- perhaps private as well.
Well, it may not always be as simple as normalization. The URI might need munging, filtering, replacement with a dud value, etc. Normalization would be the common use case, but I'd want to be flexible enough to let the user do anything they please. Plus, in the end, this all gets spit back out, so why not turn "http://www.google.com" into "http://www.google.com/"?
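Adding the trailing slash is one of the scheme-based normalizations in RFC 3986 (section 6.2.3): for http, an empty path is equivalent to "/". A sketch, with `add_empty_path_slash()` as a hypothetical helper name:

```php
<?php
// For http(s) URIs with an empty path, insert "/" before any query or fragment.
function add_empty_path_slash($uri)
{
    // Group 1: scheme plus authority; group 2 (optional): "?query" or "#fragment".
    if (preg_match('~^(https?://[^/?#]*)([?#].*)?$~is', $uri, $m)) {
        return $m[1] . '/' . (isset($m[2]) ? $m[2] : '');
    }
    return $uri; // path already present, or not http(s): leave unchanged
}
```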
As for the overloaded return values, to the caller, they make perfect sense. Essentially, we end up with an assoc array of attributes, and we look up the appropriate AttrDef based on attribute name and element name, and then execute member function validate() (funny naming, eh?) If it returns false, we straight out remove the attribute. Otherwise, we replace the attribute with whatever it returns. In HTML, there seems to be a difference between a non-existent attribute and a blank attribute value.
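The caller side described above can be sketched like this. The definitions are plain callables standing in for AttrDef objects (an assumption for brevity). The strict `=== false` check matters: `'' == false` is true in PHP, so a loose comparison would drop a blank attribute instead of keeping it.

```php
<?php
// false removes the attribute; any string (even '') replaces the value.
function apply_attr_defs(array $attr, array $defs)
{
    foreach ($attr as $name => $value) {
        $result = isset($defs[$name]) ? call_user_func($defs[$name], $value) : false;
        if ($result === false) {
            unset($attr[$name]);      // invalid or unknown attribute: drop it
        } else {
            $attr[$name] = $result;   // keep the original or amended value
        }
    }
    return $attr;
}

$defs = array(
    'href'  => function ($v) { return $v === '' ? false : $v; }, // stand-in URI check
    'title' => 'strval',                                        // any text, even blank
);
$out = apply_attr_defs(
    array('href' => 'http://www.google.com/', 'title' => '', 'onclick' => 'x()'),
    $defs
);
// $out keeps href and the blank title; onclick has no definition and is removed.
```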
Splitting it up might improve readability, but there's a sort of premature optimization at play (I know, I know, root of all evil): this function is going to get called every time someone references a URI. A cache (which I will be adding later) may help, but in the end, it's going to be used a lot, and it's not such a big readability problem. I just have to be sure to use assertIdentical() in the unit tests.
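A cache could wrap the validator without changing its interface. This sketch (the `CachingValidator` name is hypothetical) assumes the result depends only on the input string; if validation also depends on `$config` or other context, that would have to become part of the cache key.

```php
<?php
// Hypothetical caching wrapper with the same validate() interface.
// Assumes the wrapped validator's result depends only on the input string.
class CachingValidator
{
    private $inner;
    private $cache = array();   // input string => cached result

    public function __construct($inner)
    {
        $this->inner = $inner;
    }

    public function validate($uri)
    {
        // array_key_exists() rather than isset(), so a cached null result
        // also counts as a hit.
        if (!array_key_exists($uri, $this->cache)) {
            $this->cache[$uri] = $this->inner->validate($uri);
        }
        return $this->cache[$uri];
    }
}
```

This cache has no size limit and grows with every distinct URI, which may or may not be acceptable for a single request.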
To me, URIValidator could easily work.
Hrmm... maybe...
Posted: Sun Aug 06, 2006 8:56 pm
by Christopher
Ambush Commander wrote:As for the overloaded return values, to the caller, they make perfect sense. Essentially, we end up with an assoc array of attributes, and we look up the appropriate AttrDef based on attribute name and element name, and then execute member function validate() (funny naming, eh?) If it returns false, we straight out remove the attribute. Otherwise, we replace the attribute with whatever it returns. In HTML, there seems to be a difference between a non-existent attribute and a blank attribute value.
Sounds like you are doing something like:
When you probably should be doing:
Code:
if ($thingy->isValid()) {
    $url = $thingy->getFormatted();
}
The latter is clearer about what is actually going on.
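The two styles are not mutually exclusive. A hypothetical adapter can run the single-pass `validate()` once and expose the `isValid()`/`getFormatted()` style on top of it, so the caller reads clearly without validating twice. `ValidatedValue` and the stand-in `PassThroughDef` are illustrative names only.

```php
<?php
// Hypothetical adapter: validates once, then answers isValid()/getFormatted().
class ValidatedValue
{
    private $result;

    public function __construct($validator, $value)
    {
        // The tri-state validate() runs exactly once here.
        $this->result = $validator->validate($value);
    }

    public function isValid()
    {
        return $this->result !== false;
    }

    public function getFormatted()
    {
        if ($this->result === false) {
            throw new LogicException('getFormatted() called on an invalid value');
        }
        return $this->result;
    }
}

// Stand-in validator for the example: rejects empty strings, passes others through.
class PassThroughDef
{
    public function validate($value)
    {
        return $value === '' ? false : $value;
    }
}

$thingy = new ValidatedValue(new PassThroughDef(), 'http://example.com/');
if ($thingy->isValid()) {
    $url = $thingy->getFormatted();
}
```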
Posted: Sun Aug 06, 2006 9:01 pm
by Ambush Commander
Looks more like:
Code:
// Excerpt from a method: $d_defs (the global attribute definitions),
// $config and $accumulator are assumed to be set up earlier in it.
foreach ($tokens as $key => $token) {
    // only start and empty tokens carry attributes
    if ($token->type !== 'start' && $token->type !== 'empty') continue;

    // DEFINITION CALL
    $defs = $this->definition->info[$token->name]->attr;
    $attr = $token->attributes;

    // do global transformations
    // DEFINITION CALL
    foreach ($this->definition->info_attr_transform as $transform) {
        $attr = $transform->transform($attr);
    }

    // do local transformations
    // DEFINITION CALL
    foreach ($this->definition->info[$token->name]->attr_transform as $transform) {
        $attr = $transform->transform($attr);
    }

    foreach ($attr as $attr_key => $value) {
        // call the definition
        if ( isset($defs[$attr_key]) ) {
            if (!$defs[$attr_key]) {
                // explicitly disabled for this element
                $result = false;
            } else {
                $result = $defs[$attr_key]->validate($value, $config, $accumulator);
            }
        } elseif ( isset($d_defs[$attr_key]) ) {
            // fall back to the global definition
            $result = $d_defs[$attr_key]->validate($value, $config, $accumulator);
        } else {
            $result = false;
        }

        // put the results into effect
        if ($result === false || $result === null) {
            unset($attr[$attr_key]);
        } elseif (is_string($result)) {
            // simple substitution
            $attr[$attr_key] = $result;
        }
        // we'd also want slightly more complicated substitution,
        // although we're not sure how colliding attributes would
        // resolve
    }

    // commit changes
    // could interfere with flyweight implementation
    $tokens[$key]->attributes = $attr;
}
Posted: Sun Aug 06, 2006 9:08 pm
by Christopher
Good grief!
Posted: Sun Aug 06, 2006 9:14 pm
by Ambush Commander
Well it's complicated logic!
Here's what it does:
Take the array $tokens and iterate through all tokens that have attributes (start and empty). During that iteration:
Apply global transformation: transformations that apply to all elements (ex. copy lang to xml:lang).
Apply local transformation: element-specific transformation. (ex. <p align="center"> to <p style="text-align:center;">)
Then, iterate through each attribute and validate it. If the element's validator for that attribute is explicitly set to false, remove the attribute outright. If the element's validator is set, run it normally. If the element has no validator, try the global validator. If that isn't set either, remove the attribute.
That's it! If it's hard to understand... well... refactor time!
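The same three steps as a standalone, simplified function. Transforms and definitions are plain callables here rather than the real transform and AttrDef objects, and the example transforms (copy `lang` to `xml:lang`, turn `align` into a `text-align` style) only mimic the examples given above.

```php
<?php
// Hypothetical, simplified version of the steps above.
function process_attributes($element, array $attr, array $global_transforms,
    array $local_transforms, array $element_defs, array $global_defs)
{
    // 1. Global transformations, applied to every element.
    foreach ($global_transforms as $transform) {
        $attr = call_user_func($transform, $attr);
    }
    // 2. Local transformations, specific to this element.
    if (isset($local_transforms[$element])) {
        foreach ($local_transforms[$element] as $transform) {
            $attr = call_user_func($transform, $attr);
        }
    }
    // 3. Validate each attribute: element definition first, then global.
    $defs = isset($element_defs[$element]) ? $element_defs[$element] : array();
    foreach ($attr as $name => $value) {
        if (array_key_exists($name, $defs)) {
            // An explicit false means "never allowed on this element".
            $result = $defs[$name] === false ? false : call_user_func($defs[$name], $value);
        } elseif (isset($global_defs[$name])) {
            $result = call_user_func($global_defs[$name], $value);
        } else {
            $result = false; // no definition anywhere: remove
        }
        if ($result === false || $result === null) {
            unset($attr[$name]);
        } elseif (is_string($result)) {
            $attr[$name] = $result;
        }
    }
    return $attr;
}

// Example transforms, mimicking the ones described in the post.
$copy_lang = function ($a) {
    if (isset($a['lang']) && !isset($a['xml:lang'])) {
        $a['xml:lang'] = $a['lang'];
    }
    return $a;
};
$align_to_style = function ($a) {
    if (isset($a['align'])) {
        $a['style'] = 'text-align:' . $a['align'] . ';';
        unset($a['align']);
    }
    return $a;
};

$out = process_attributes(
    'p',
    array('align' => 'center', 'lang' => 'en', 'onclick' => 'x()'),
    array($copy_lang),
    array('p' => array($align_to_style)),
    array('p' => array('onclick' => false)), // explicitly disallowed on <p>
    array('lang' => 'strval', 'xml:lang' => 'strval', 'style' => 'strval')
);
// lang, xml:lang and style survive; onclick is removed.
```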
Posted: Sun Aug 06, 2006 9:43 pm
by Ollie Saunders
Sorry, this may seem pernickety, but:
Code:
$this->definition->info_attr_transform or $this->definition->infoAttrTransform
I know which I prefer.
In fact, I've started writing all my code to the Zend Framework Coding Standards. I've found them excellent.
Oh, and I've no idea what your code is doing.
And my stance on the name is: UrlTest if you are only performing tests against it, UrlClean if you are tidying or removing, and UrlFix if you are likely to perform more aggressive transformations on it, such as adding stuff. Descriptive names are worth promoting, but most people know ref == reference; it's when people start calling things CTDProc and curPro that you need to be worried. Also, as a heads-up, because I made this mistake a lot of times: class names should always be the superclass followed by the subclass. I had a class Text and then SmallText and LargeText; that is wrong, it should be Text, TextSmall and TextLarge. Even if you already know that, it helps to be told from time to time, because it's really easy to do.
Posted: Sun Aug 06, 2006 9:53 pm
by Ambush Commander
Sorry, this may seem pernickety, but:
Code:
$this->definition->info_attr_transform or $this->definition->infoAttrTransform
I know which I prefer.
The main thing is consistency. My rule is camelcaps for methods and classes and underscores for variables (member and non-member) and class namespacing.
I'll consider it though.
In fact I've started writing all my code to the Zend Framework Coding Standards. I've found it excellent.
I borrowed Zend Framework's directory structure for my code, although Zend and I disagree on which unit tester to use (SimpleTest for me). The only code guidelines I've read before are PEAR's... and that was a long time ago. I'll check it out.
And my stance on the name is: UrlTest if you are only performing tests against it, UrlClean if you are tidying or removing, and UrlFix if you are likely to perform more aggressive transformations on it, such as adding stuff.
See, see, the URL vs. URI stuff again! Sigh... also interesting how you don't capitalize them. Anyway, the problem is it's all three. The class doesn't have to fix the URI; that's an extra feature. A necessary extra feature, but an extra feature nonetheless. Maybe they should be separated... (thinks of speed)
Also, as a heads-up, because I made this mistake a lot of times: class names should always be the superclass followed by the subclass. I had a class Text and then SmallText and LargeText; that is wrong, it should be Text, TextSmall and TextLarge. Even if you already know that, it helps to be told from time to time, because it's really easy to do.
I do Text, Text_Small, and Text_Large, with the directory layout like Text.php, Text/Small.php and Text/Large.php. Occasionally I'll break this rule, if the class's hierarchy is big enough to merit its own name.
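The underscore-to-directory convention is what makes a simple autoloader possible. A sketch, assuming the class files sit under the same directory as the loader (`class_to_path()` is a hypothetical helper name):

```php
<?php
// PEAR-style mapping: underscores become directory separators,
// so Text_Small lives in Text/Small.php.
function class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

// Register it so classes load on first use (assumes files live under __DIR__).
spl_autoload_register(function ($class) {
    $file = __DIR__ . '/' . class_to_path($class);
    if (is_file($file)) {
        require $file;
    }
});
```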
Posted: Sun Aug 06, 2006 10:21 pm
by Ollie Saunders
also interesting how you don't capitalize them
Yeah, that's part of the ZF Coding Standards. I actually know the coding standards better than some people at Zend now, because I've found numerous pieces of standards-breaking code in ZF itself, hehe.
I do Text, Text_Small, and Text_Large, with the directory layout like Text.php, Text/Small.php and Text/Large.php. Occasionally I'll break this rule, if the class's hierarchy is big enough to merit its own name
Yeah, I would do that if I could be bothered to use "Zend framework's directory structure for my code". Also, for now I don't think my hierarchy is big enough.
*ole looks at his list of 34 files*
hmmm.
Posted: Sun Aug 06, 2006 11:30 pm
by Christopher
Ambush Commander wrote:Well it's complicated logic!
I think that the data should be encapsulated differently and the objects should do their own work. The final loops should be more like:
Code:
while ($def = $defs->next()) {
    if ($def->isValid($attr_key)) {
        $tokens[$key]->attributes[$attr_key] = $def->getFormatted($attr_key);
    }
}
Posted: Mon Aug 07, 2006 12:25 am
by Luke
I would definitely go with URIne