I am writing an HTML crawling script to quickly download data from a very large and badly indexed site.
The website shows the links for the first and last pages of the database's results, which look like
www.database.com/whatever/whatever-1.html for the first page
www.database.com/whatever/whatever-x.html for the last page (where x stands for an unknown number)
I want to read the last page's link and cut off everything but that last number, then compose all the page links from 1 (which of course will be the first page) to x using a simple loop and a counter.
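To make the loop part concrete, here is a minimal sketch of what I have in mind, assuming a POSIX shell. The function name page_links, the base prefix and the value 3 for x are placeholders made up for illustration; the real x would come from the last page's link.

```shell
# page_links BASE LAST: print BASE-1.html through BASE-LAST.html, one per line.
# Both arguments are hypothetical placeholders for this sketch.
page_links() {
  base=$1
  last=$2
  i=1
  while [ "$i" -le "$last" ]; do
    printf '%s-%s.html\n' "$base" "$i"
    i=$((i + 1))
  done
}

# Example with a made-up last page number of 3.
page_links 'www.database.com/whatever/whatever' 3
```

Each printed line could then be handed to whatever tool does the actual downloading.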
My problem is isolating that number x with a regex (awk or sed). Both the number (be it 2, 3 or 4 digits long) and the link may vary in content and length from time to time.
I need a regex which recognizes the last number (not just the last digit) in a line and isolates it.
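One possible shape for this, as a sketch only: it assumes sed with POSIX basic regular expressions and a line where the last number is preceded by at least one non-digit character (as in the links above). The function name last_number and the sample link are made up for illustration.

```shell
# last_number LINE: print the last run of digits found in LINE.
# The greedy .* swallows as much as it can, and the [^0-9] right before the
# capture group forces the group to start at the beginning of the final
# number, so the whole number is kept rather than only its last digit.
# With -n and the p flag, nothing is printed when the line has no match.
last_number() {
  printf '%s\n' "$1" | sed -n 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1/p'
}

# Hypothetical last-page link.
last_number 'www.database.com/whatever/whatever-123.html'
```

A line that starts directly with its only number would not match this pattern, since the [^0-9] needs a character in front of the digits.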
I know it's kid's stuff... but I AM a kid.
Thank you in advance.