Using awk to list links in a file

Name: grep2
Date: May 8, 2007 at 08:33:19 Pacific
OS: MS DOS
CPU/Ram: 80386/28M
Product: NEC
Comment:

Struggling with this one...
I have an HTML file, e.g.:
<html>
link
link2
link3
</html>

and I wish to create an awk script that will take this HTML file and print a list of the links it contains, as well as the frequency of each link (the number of times it occurred).
Example output:

www.google.com 2
http://computing.net 1

How would I go about starting this script, presumably making the '

Response Number 1
Name: James Boothe
Date: May 8, 2007 at 09:55:31 Pacific
Reply:

In the main body, for each line that is not empty (Number of Fields greater than zero), the script below accumulates into an array called "kount" the number of occurrences of the first word on that line.

At the END, it prints the contents of the array, and pipes the printed output into a sort command.

You will need to beef up the logic concerning which lines should be accumulated. As coded, it will create entries in the array for the html headers and footers also.

To print the links left justified in a minimum 44-character field, use %-44s instead of %44s.

awk '
# Count the first word of every non-empty line.
NF > 0 { kount[$1]++ }
END {
    for (i in kount)
        printf "%44s %5d\n", i, kount[i]
}
' my.html | sort # -n
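
One possible way to "beef up the logic" is sketched below. It assumes, as in the simplified sample file, that each link sits alone at the start of a line and that any line whose first word begins with "<" is markup to be skipped. The function name count_first_words and the choice to sort by count are illustrative assumptions, not part of the original reply.

```shell
# Hypothetical helper: count the first word of each line, ignoring blank
# lines and lines that start with a tag such as <html> or </html>.
count_first_words() {
  awk '
  NF > 0 && $1 !~ /^</ { kount[$1]++ }   # skip blanks and tag lines
  END {
    for (i in kount)
      printf "%-44s %5d\n", i, kount[i]  # %-44s left-justifies the link
  }
  ' "$@" | sort -k2,2nr                  # most frequent link first
}
```

It could be called as count_first_words my.html. Sorting on the second field numerically puts the most frequent links at the top; drop the sort options to order by link text instead.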


Response Number 2
Name: ghostdog
Date: May 8, 2007 at 20:34:41 Pacific
Reply:

You are on the right track, but a typical HTML file doesn't really look like that. Links may be anywhere on a line, not just at $1. You might want to show an example HTML file.


Response Number 3
Name: grep2
Date: May 8, 2007 at 23:13:51 Pacific
Reply:

Does the regular expression need to be something like...

awk '/a href/{ sub (/.*a href = "/, ""); sub(/".*/,""); print }'

to allow for links appearing anywhere on a line?
Remember, if I have <\a href = "..."\> link <\/a>, I want to print the link itself, e.g. "...", and the number of times that particular link occurs. (N.B. I used '\' to show the anchor tag because this forum turns the tag into a hyperlink; I'm trying to mimic what awk will see, which is just the source code.)
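
A minimal sketch of one way to do this follows. It is not from the thread: it assumes double-quoted, lowercase href attributes, allows optional spaces around the "=", and loops with match() so that several links on one line are all counted. The function name count_links and the sample file contents are made up for illustration.

```shell
# Hypothetical sample input, written to a temp file so the sketch runs as-is.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
<a href = "www.google.com">link</a> text <a href="http://computing.net">link2</a>
<a href="www.google.com">link3</a>
</html>
EOF

count_links() {
  awk '
  {
    line = $0
    # Consume every href="..." on the line, not just the first.
    while (match(line, /href[ \t]*=[ \t]*"[^"]*"/)) {
      attr = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)   # continue after this match
      sub(/^href[ \t]*=[ \t]*"/, "", attr)    # strip the leading href="
      sub(/"$/, "", attr)                     # strip the closing quote
      if (attr != "") kount[attr]++
    }
  }
  END {
    for (url in kount)
      printf "%s %d\n", url, kount[url]
  }
  ' "$@" | sort
}

count_links "$sample"
rm -f "$sample"
```

Unlike the two sub() calls in the one-liner above, which keep at most one link per line and require exactly "a href = " with single spaces, the loop handles any spacing and any number of links per line. It still does not handle single-quoted or unquoted attribute values, uppercase HREF, or tags split across lines; a real HTML parser would be more robust for arbitrary pages.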


