Hi,
I have a file, let's say InputList, which contains a list of files:
/home/kgeorg/nishana/survey_pages/common/AdvancedSearchPage.jsp
/home/kgeorg/nishana/survey_pages/common/Footer.jsp
/home/kgeorg/nishana/survey_pages/common/Header.jsp
etc. I have to open each file in InputList, parse it, and get the information stored in the meta tags.
For example:
<Title>ABC Personal Finance, Credit Cards, Business Services, and Travel Services</title><META NAME ="KEYWORDS" CONTENT="credit card, express, Credit Cards, express financial advisors, travel and entertainment, express card, hotel reservations, aexpress travel, express credit card, express cards, express financial, financial advisor">
<META NAME ="DESCRIPTION" CONTENT="ABC offers individuals online access to its world-class Card, Financial, and Travel services, including financial advice, retirement planning, air and hotel reservations and more.">
I have to extract the title and the contents of the KEYWORDS and DESCRIPTION meta tags. Could anyone help me out with a shell script using AWK/SED to do this?
Thanks,
Nishana

experimental:
[code]awk 'BEGIN{IGNORECASE=1}   # IGNORECASE needs gawk
# Assumes the title and each meta tag sit on lines of their own.
/<Title>/,/<\/Title>/{
gsub("<Title>|</Title>", "");
title=$0
}
/<META NAME ="KEYWORDS" CONTENT=/,/">/ {
gsub(/<META NAME ="KEYWORDS" CONTENT=/,"")   # drop the tag prefix
gsub(/<|>|"/,"")                             # drop leftover brackets and quotes
keywords=$0
}
/<META NAME ="DESCRIPTION" CONTENT=/,/">/ {
gsub(/<META NAME ="DESCRIPTION" CONTENT=|<|>|"/,"")
description=$0
}
END {
print "Title : " title
print "Keywords : " keywords
print "Description: " description
}' "file1"
[/code]

edited:
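[code]awk 'BEGIN{IGNORECASE=1}   # IGNORECASE needs gawk
# Work on a copy of the line so that several tags can share one line.
# Assumes each tag opens and closes on the same line.
/<Title>/ {
t=$0
sub(/.*<Title>/, "", t)      # drop everything up to the opening tag
sub(/<\/Title>.*/, "", t)    # drop the closing tag and what follows
title=t
}
/<META NAME ="KEYWORDS" CONTENT="/ {
k=$0
sub(/.*<META NAME ="KEYWORDS" CONTENT="/, "", k)
sub(/".*/, "", k)            # cut at the closing quote
keywords=k
}
/<META NAME ="DESCRIPTION" CONTENT="/ {
d=$0
sub(/.*<META NAME ="DESCRIPTION" CONTENT="/, "", d)
sub(/".*/, "", d)            # cut at the closing quote
description=d
}
END {
print "Title : " title
print "Keywords : " keywords
print "Description: " description
}' "file1"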
[/code]
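
The scripts above handle a single hard-coded "file1", while the question asks for every file listed in InputList. Here is a rough, untested-on-your-data sketch of how the loop could look. The names extract_meta, process_list and between are made up for this example, and it assumes one path per line in InputList and that each tag opens and closes on the same line. It lowercases a copy of each line instead of relying on IGNORECASE, so it should not need gawk.

```shell
#!/bin/sh
# extract_meta FILE (hypothetical helper): print title, keywords and
# description for one page. Assumes every tag fits on a single line.
extract_meta() {
    printf 'File       : %s\n' "$1"
    awk '
    # Return the text of "line" between the first match of "start" and the
    # next match of "end". Matching is done on a lowercased copy, so the
    # patterns must be lowercase; the returned text keeps its original case.
    function between(line, start, end,    low, rest, lowrest) {
        low = tolower(line)
        if (!match(low, start)) return ""
        rest    = substr(line, RSTART + RLENGTH)
        lowrest = substr(low,  RSTART + RLENGTH)
        if (match(lowrest, end)) rest = substr(rest, 1, RSTART - 1)
        return rest
    }
    {
        t = between($0, "<title>", "</title>")
        if (t != "") title = t
        k = between($0, "<meta name *= *\"keywords\" +content *= *\"", "\"")
        if (k != "") keywords = k
        d = between($0, "<meta name *= *\"description\" +content *= *\"", "\"")
        if (d != "") description = d
    }
    END {
        print "Title      : " title
        print "Keywords   : " keywords
        print "Description: " description
    }' "$1"
}

# process_list LISTFILE (hypothetical helper): run extract_meta on every
# path named in LISTFILE, one path per line.
process_list() {
    while IFS= read -r page; do
        [ -n "$page" ] || continue          # ignore blank lines
        if [ ! -r "$page" ]; then
            echo "skipping unreadable file: $page" >&2
            continue
        fi
        extract_meta "$page"
    done < "$1"
}
```

To use it, save the functions in a script and add a final line such as process_list InputList. Tags that wrap across several lines are not handled; for pages like that, a real HTML parser would probably be safer than awk.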
