Anyone wanna debug a perlscript? Durham Uni, UK Past exam :)

Posted: Sat Aug 07, 2004 9:57 pm
by Chris Corbyn
Here's the code (in full) - it doesn't increment the counters for identical urls (read below to see full problem):

Code:

#!/perl/bin/perl -w

#--------------------------------------#
# HERE IS Q7 2004. I'M SURE KEITH CAN  #
# TRIM IT DOWN A BIT. I'VE TAKEN ALL   #
# POSSIBLITIES REGARDING POSITIONS OF  #
# WHITESPACE INTO ACCOUNT IN THE LINKS #
#--------------------------------------#

%args = (); # Collects the command line arguments
$i = 0;
foreach $filenames (@ARGV) {
	$args{$i} = $filenames;
	if (-e $filenames) { # Only open file if it exists (otherwise see error msg below)
		my %links = (); #
		my %outer = (); # Our Counters we'll add to for counting links
		my %inner = (); #
		open(DATA, "< $filenames");
		while (<DATA>) { # Loop over lines in each file
			while (m/<\s*a\s+href\s*=\s*"([^"]+)"\s*>/gi) { # Pattern match for full url including #abc
				$links{$1}++; # Store all urls for current file in array
			}
		}
		foreach $keys (keys %links) {  # Loop over all the collected urls
			if ($keys =~ m{(\w+://[^\#]+)(\#[^\#]*)?}gi || $keys =~ m{(/?[^/\#]+/[^\#]*)+(\#[^\#]*)?}gi) { # Pattern match for outer refs
				++$outer{$1};
			} elsif ($keys =~ m{^\#} || $keys =~ m{^$filenames\#}gi) { # Pattern match for self refs
				++$self_refer;
			} elsif ($keys =~ m{^([^\#/]+)(\#[^\#]*)?}gi) { # Pattern match for inner refs
				++$inner{$1};
			}
		}
		print ("\nFile: $filenames -- self refers $self_refer times\n"); # Display output (filename and self refer count)
		print ("   Inner refs:\n");
		foreach my $keys (keys %inner) {
			print ("        ($inner{$keys}) $keys\n");  # Display results for inner refs
		}
		print ("   Outer refs:\n");
		foreach my $keys (keys %outer) {
			print ("        ($outer{$keys}) $keys\n"); # Display results for outer refs
		}
		close(DATA);
	} else {
		print ("\n\nFile: $filenames does not exist in this directory\n"); # Error msg if file doesn't exist
	}
	++$i;
	$self_refer = 0; # Reset self refer counter for next loop
}

__END__
This is the task that was set for it (it's a past exam question from Durham University, UK, but we've finished the exam and we're debating how to do it :-) )

The question is as follows:
A web designer wants a tool which will summarise the links in a set of HTML documents. It should take a number of filenames on the command line, then analyse and summarise each file in turn. Links inside the documents should be categorised in three ways: as self references, as references to other files in the set being analysed, or as 'external' references to other files or websites. Links of the form url_string#text point to specific parts of a document, but the #text part should be ignored in this analysis. The tool should present information for each file as shown in the example below. The file prac-one.html self-refers zero times, has two links to another file in the analysis set, then three links to one external URL and one link to another.

File: prac-one.html -- self refers 0 times
   Inner refs:
        (2) misc-info.html
   Outer refs:
        (3) http://www.dur.ac.uk/~higs/pc-hugs.html
        (1) seg01-code/GUI.lhs

The problem with my code is that any IDENTICAL links are not seen by the regexp and the counters are not incremented.

Can anyone debug it without totally rewriting my code? (It was my actual answer when I sat the exam.)

Just in case this looks at all dodgy, it's not. I post in here all the time and this is a question from the July 2004 paper we sat this year. See http://library.dur.ac.uk/ to get a copy of the paper (services => past exam papers => computer science => 2004 papers => Logic, Grammar, and Software Tools, or something to that effect).

thanks in advance :-)

Posted: Sun Aug 08, 2004 3:48 am
by timvw
First you build links as ('url1' => 2, 'url2' => 9)

And then you loop through links, and for each key you add 1 to inner, outer or self. That increment should not be 1 but $links{$keys}, the number of times the url occurred. For 'url1' you would need to add 2.

So you need to change 3 lines in your code ;)
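
To make the difference concrete, here is a minimal sketch (not the exam script itself) that runs both kinds of increment over the example tally above. The names %buggy and %fixed are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tally, as the first loop would build it: url => times seen.
my %links = ('url1' => 2, 'url2' => 9);

my (%buggy, %fixed);
foreach my $key (keys %links) {
    ++$buggy{$key};                # original: adds 1 per distinct url
    $fixed{$key} += $links{$key};  # fix: adds the number of occurrences
}

foreach my $key (sort keys %links) {
    print "$key: buggy=$buggy{$key} fixed=$fixed{$key}\n";
}
# prints:
# url1: buggy=1 fixed=2
# url2: buggy=1 fixed=9
```

Since += on a hash element that does not exist yet starts from zero without a warning, no separate initialisation is needed for the counters.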

Posted: Sun Aug 08, 2004 5:49 am
by Chris Corbyn
Thanks, you're right.

We changed it to

Code:

#!/perl/bin/perl -w

#--------------------------------------#
# HERE IS Q7 2004. I'M SURE KEITH CAN  #
# TRIM IT DOWN A BIT. I'VE TAKEN ALL   #
# POSSIBLITIES REGARDING POSITIONS OF  #
# WHITESPACE INTO ACCOUNT IN THE LINKS #
#--------------------------------------#

%args = (); # Collects the command line arguments
$i = 0;
foreach $filenames (@ARGV) {
	$args{$i} = $filenames;
	if (-e $filenames) { # Only open file if it exists (otherwise see error msg below)
		my %links = (); #
		my %outer = (); # Our Counters we'll add to for counting links
		my %inner = (); #
		open(DATA, "< $filenames");
		while (<DATA>) { # Loop over lines in each file
			while (m/<\s*a\s+href\s*=\s*"([^"]+)"\s*>/gi) { # Pattern match for full url including #abc
				++$links{$1}; # Store all urls for current file in array
			}
		}
		foreach $keys (keys %links) {  # Loop over all the collected urls
			if ($keys =~ m{(\w+://[^\#]+)(\#[^\#]*)?}gi || $keys =~ m{(/?[^/\#]+/[^\#]*)+(\#[^\#]*)?}gi) { # Pattern match for outer refs
				if (!exists $outer{$1}) {
					$outer{$1} = 0;
				}
				$outer{$1} = ($outer{$1}+$links{$keys});
			} elsif ($keys =~ m{^\#} || $keys =~ m{^$filenames\#}gi) { # Pattern match for self refs
				if (!$self_refer) {
					$self_refer = 0;
				}
				$self_refer = ($self_refer+$links{$keys});
			} elsif ($keys =~ m{^([^\#/]+)(\#[^\#]*)?}gi) { # Pattern match for inner refs
				if (!exists $inner{$1}) {
					$inner{$1} = 0;
				}
				$inner{$1} = ($inner{$1}+$links{$keys});
			}
		}
		print ("\nFile: $filenames -- self refers $self_refer times\n"); # Display output (filename and self refer count)
		print ("   Inner refs:\n");
		foreach my $keys (keys %inner) {
			print ("        ($inner{$keys}) $keys\n");  # Display results for inner refs
		}
		print ("   Outer refs:\n");
		foreach my $keys (keys %outer) {
			print ("        ($outer{$keys}) $keys\n"); # Display results for outer refs
		}
		close(DATA);
	} else {
		print ("\n\nFile: $filenames does not exist in this directory\n"); # Error msg if file doesn't exist
	}
	++$i;
	$self_refer = 0; # Reset self refer counter for next loop
}

__END__
And it works now :-)
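
For anyone wanting to check the counting without real HTML files, here is a rough sketch that feeds in a hand-made list of links mirroring the exam's prac-one.html example. The list @hrefs, the variable $file, and the simplified rule that any url containing a slash counts as outer are all assumptions for the test, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical links, chosen to mirror the exam's prac-one.html example.
my $file  = 'prac-one.html';
my @hrefs = (
    'misc-info.html',
    'misc-info.html#notes',
    'http://www.dur.ac.uk/~higs/pc-hugs.html',
    'http://www.dur.ac.uk/~higs/pc-hugs.html',
    'http://www.dur.ac.uk/~higs/pc-hugs.html#intro',
    'seg01-code/GUI.lhs',
);

# Tally identical urls, as the first loop of the script does.
my %links;
$links{$_}++ for @hrefs;

my $self = 0;
my (%inner, %outer);
foreach my $url (keys %links) {
    (my $target = $url) =~ s/#.*//;    # ignore the #text part
    if ($target eq '' || $target eq $file) {
        $self += $links{$url};             # self reference
    } elsif ($target =~ m{/}) {            # assumed rule: a slash means outer
        $outer{$target} += $links{$url};
    } else {
        $inner{$target} += $links{$url};   # plain filename: inner
    }
}

print "\nFile: $file -- self refers $self times\n";
print "   Inner refs:\n";
print "        ($inner{$_}) $_\n" for sort keys %inner;
print "   Outer refs:\n";
print "        ($outer{$_}) $_\n" for sort keys %outer;
```

With this input the totals come out as 0 self refs, 2 for misc-info.html, 3 for the pc-hugs.html url and 1 for seg01-code/GUI.lhs, which matches the example in the question. Comparing $target with eq rather than interpolating the filename into a regexp also avoids the dots in the filename being treated as wildcards.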