Solved: How to extract links from HTML?

Microsoft Windows xp professional w/serv...
August 22, 2014 at 18:38:18
Specs: WINDOWS SERVER 2012, WINDOWS 8, 9.6GHZ/4GB
Hi,
I am trying to write a program that will automatically download someone's Facebook pictures.
So far it is going well.
I am writing it in Windows PowerShell.
I use the Invoke-WebRequest command to log in to Facebook, open someone's photos page, and download that page into a variable.

This succeeds, and I use the .Content property of the response to see the HTML code of the web page.
This works great.

However, I got stuck here.
I can see the picture links inside the HTML. I can find one manually, copy it from the console, and paste it into another Invoke-WebRequest command, which downloads the picture. I then write the bytes to a new file with a .jpg extension, and the picture is created.

But the problem is that I want to do this automatically,
and I do not know how.
So my question is: how do I filter out only the links from an HTML file using PowerShell?

In other words, I can download a web page as HTML, but I need a filter that will extract only the links from the page.

Do you know how I can do this?
Thanks.

message edited by SYOBSYOT




#1
August 22, 2014 at 21:06:11
✔ Best Answer
Wget might help, unless you want to stick to one platform:

GNU Wget 1.9.1, a non-interactive network retriever.
Usage: WGET [OPTION]... [URL]...

...
  -i, --input-file=FILE   download URLs found in FILE.
  -F, --force-html        treat input file as HTML.
  -B, --base=URL          prepends URL to relative links in -F -i file.


Recursive accept/reject:
  -A, --accept=LIST       comma-separated list of accepted extensions.
  -R, --reject=LIST       comma-separated list of rejected extensions.



#2
August 23, 2014 at 11:35:46
I did not read this help description in detail, but as far as I can tell, this program can download a list of URLs that are in a file.
But as I understand it, the URLs have to be written as a list.
So this can help when the URLs have already been extracted from the HTML file; it does not help me extract links from the HTML.

And I have already written such a function in PowerShell.
It takes a data file and downloads every item listed in it.
That file contains only links, nothing else.
I think this program works the same way.
It can download an already extracted array of links, but my PowerShell function can do that too. So I will not use this program.
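
For readers following along, a function of the kind described above might look roughly like this. It is only a sketch, not the poster's actual code: the name Save-UrlList, its parameters, and the picture0001.jpg naming scheme are all hypothetical, and it assumes the data file holds one URL per line and that no login session is needed for the downloads.

```powershell
# Hypothetical helper: downloads every URL listed in a plain-text file, one per line.
function Save-UrlList {
    param(
        [Parameter(Mandatory)] [string] $ListPath,      # text file with one URL per line
        [Parameter(Mandatory)] [string] $OutputFolder   # folder that receives the downloads
    )

    # Create the output folder if it does not exist yet.
    if (-not (Test-Path -LiteralPath $OutputFolder)) {
        New-Item -ItemType Directory -Path $OutputFolder | Out-Null
    }

    $index = 0
    foreach ($line in Get-Content -LiteralPath $ListPath) {
        $url = $line.Trim()
        if ($url -eq '') { continue }    # skip blank lines

        $index++
        # Name files by position; the .jpg extension is an assumption about the content.
        $target = Join-Path $OutputFolder ('picture{0:D4}.jpg' -f $index)

        # -OutFile writes the response body straight to disk as raw bytes.
        Invoke-WebRequest -Uri $url -OutFile $target
    }
}
```

It would be called as, for example, Save-UrlList -ListPath .\links.txt -OutputFolder .\pictures (both paths are placeholders). If the pictures are only visible when logged in, the same -WebSession value used for the login request would presumably have to be passed to Invoke-WebRequest here as well.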

But I am really close to finishing this final part of the program.
I managed to filter the links, but in some cases they contain disallowed characters that have been substituted with other characters, so it does not always work.
I will fix it soon, so this thread will no longer be needed. ;)
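
One possible way to do the filtering step, sketched under some assumptions: the function name Get-HtmlLink and its parameters are hypothetical, the links are assumed to sit in quoted href or src attributes, and the "substituted" characters are assumed to be HTML entities such as &amp;amp; standing in for &amp;. If the page escapes URLs some other way (for example inside script blocks), this will not catch them.

```powershell
# Hypothetical helper: returns the decoded href/src values found in an HTML string.
function Get-HtmlLink {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [string] $Extension = ''    # optional filter such as '.jpg'
    )

    # Match href="..." or src="..." with either quote style; group 2 holds the URL.
    $pattern = '(?i)\b(?:href|src)\s*=\s*(["''])(.*?)\1'

    foreach ($match in [regex]::Matches($Html, $pattern)) {
        # Attribute values are entity-encoded in HTML (for example & appears as &amp;),
        # so decode them before using the URL in a new request.
        $url = [System.Net.WebUtility]::HtmlDecode($match.Groups[2].Value)

        # Compare the path part only, so a query string after .jpg does not hide it.
        $path = ($url -split '[?#]')[0]
        if ($Extension -eq '' -or $path.EndsWith($Extension, [System.StringComparison]::OrdinalIgnoreCase)) {
            $url
        }
    }
}
```

With a response stored in $page, Get-HtmlLink -Html $page.Content -Extension '.jpg' would list only the .jpg links. The object returned by Invoke-WebRequest also has Links and Images properties that hold the parsed anchors and image tags, which may be enough without any regular expression.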

message edited by SYOBSYOT


