#!/perl/bin/perl -w
#--------------------------------------#
# HERE IS Q7 2004. I'M SURE KEITH CAN  #
# TRIM IT DOWN A BIT. I'VE TAKEN ALL   #
# POSSIBILITIES REGARDING POSITIONS OF #
# WHITESPACE INTO ACCOUNT IN THE LINKS #
#--------------------------------------#
%args = (); # Collects the command line arguments
$i = 0;
foreach $filenames (@ARGV) {
    $args{$i} = $filenames;
    if (-e $filenames) { # Only open file if it exists (otherwise see error msg below)
        my %links = (); #
        my %outer = (); # Our counters we'll add to for counting links
        my %inner = (); #
        open(DATA, "< $filenames");
        while (<DATA>) { # Loop over lines in each file
            while (m/<\s*a\s+href\s*=\s*"([^"]+)"\s*>/gi) { # Pattern match for full url including #abc
                $links{$1}++; # Count each url for the current file in a hash
            }
        }
        foreach $keys (keys %links) { # Loop over all the collected urls
            if ($keys =~ m{(\w+://[^\#]+)(\#[^\#]*)?}gi || $keys =~ m{(/?[^/\#]+/[^\#]*)+(\#[^\#]*)?}gi) { # Pattern match for outer refs
                ++$outer{$1};
            } elsif ($keys =~ m{^\#} || $keys =~ m{^$filenames\#}gi) { # Pattern match for self refs
                ++$self_refer;
            } elsif ($keys =~ m{^([^\#/]+)(\#[^\#]*)?}gi) { # Pattern match for inner refs
                ++$inner{$1};
            }
        }
        print ("\nFile: $filenames -- self refers $self_refer times\n"); # Display output (filename and self refer count)
        print (" Inner refs:\n");
        foreach my $keys (keys %inner) {
            print (" ($inner{$keys}) $keys\n"); # Display results for inner refs
        }
        print (" Outer refs:\n");
        foreach my $keys (keys %outer) {
            print (" ($outer{$keys}) $keys\n"); # Display results for outer refs
        }
        close(DATA);
    } else {
        print ("\n\nFile: $filenames does not exist in this directory\n"); # Error msg if file doesn't exist
    }
    ++$i;
    $self_refer = 0; # Reset self refer counter for next loop
}
__END__
The problem with my code is that any IDENTICAL links are not seen by the regexp and the counters are not incremented.
The question is as follows:
A web designer wants a tool which will summarise the links in a set of HTML documents. It should take a number of filenames on the command line, then analyse and summarise each file in turn. Links inside the documents should be categorised in three ways: as self references, as references to other files in the set being analysed, or as 'external' references to other files or websites. Links of the form url_string#text point to specific parts of a document, but the #text part should be ignored in this analysis. The tool should present information for each file as shown in the example below. The file prac-one.html self-refers zero times, has two links to another file in the analysis set, then three links to one external URL and one link to another.
File: prac-one.html -- self refers 0 times
Inner refs:
(2) misc-info.html
Outer refs:
(3) http://www.dur.ac.uk/~higs/pc-hugs.html
(1) seg01-code/GUI.lhs
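As a rough illustration of the categorisation the question describes (separate from the exam answer above, and not a definitive solution), here is a minimal sketch. The helper names classify_link and summarise are made up for this example, the list of hrefs is invented to match the sample output, and it assumes that an "inner" reference means a target whose name exactly matches one of the filenames given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify one href relative to the file it was found in
# and the set of files being analysed. Returns the category ('self', 'inner'
# or 'outer') and the target with any #text part removed.
sub classify_link {
    my ($href, $current, $in_set) = @_;
    (my $target = $href) =~ s/#.*\z//s;   # ignore the #text part
    return ('self',  $current) if $target eq '' || $target eq $current;
    return ('inner', $target)  if $in_set->{$target};   # assumption: exact filename match
    return ('outer', $target);
}

# Hypothetical helper: tally a list of hrefs. Every occurrence is counted,
# so identical links add up instead of collapsing into one.
sub summarise {
    my ($current, $files, @hrefs) = @_;
    my %in_set  = map { $_ => 1 } @$files;
    my %summary = (self => 0, inner => {}, outer => {});
    for my $href (@hrefs) {
        my ($kind, $target) = classify_link($href, $current, \%in_set);
        if ($kind eq 'self') { $summary{self}++ }
        else                 { $summary{$kind}{$target}++ }
    }
    return \%summary;
}

# Invented input chosen to match the example in the question.
my @hrefs = (
    'misc-info.html', 'misc-info.html#top',
    ('http://www.dur.ac.uk/~higs/pc-hugs.html') x 3,
    'seg01-code/GUI.lhs',
);
my $s = summarise('prac-one.html', ['prac-one.html', 'misc-info.html'], @hrefs);

print "File: prac-one.html -- self refers $s->{self} times\n";
for my $kind (qw(inner outer)) {
    print ' ', ucfirst($kind), " refs:\n";
    print "  ($s->{$kind}{$_}) $_\n" for sort keys %{ $s->{$kind} };
}
```

With the invented input above, the counts should come out as in the example: 0 self references, 2 for misc-info.html, 3 for the dur.ac.uk URL and 1 for seg01-code/GUI.lhs. The point of the sketch is only that each link is tallied once per occurrence after the #text part has been stripped.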
Can anyone debug this without totally rewriting my code? (It was my actual answer when I sat the exam.)
Just in case this looks at all dodgy, it's not. I post in here all the time, and this is a question from the July 2004 paper we sat this year. See http://library.dur.ac.uk/ to get a copy of the paper (services => past exam papers => computer science => 2004 papers => Logic, Grammar, and Software Tools, or something to that effect).
Thanks in advance.