Computing.Net
web crawler robots
Name: patcoola
Date: April 18, 2006 at 01:41:56 Pacific
OS: Windows XP Pro
CPU/RAM: P4m 2 GHz / 1 GB DDR2 RAM
Comment:
The Google crawler and other crawlers are crawling my website under both the domain name and the host's user-account URL. Because of the duplicate URLs, the Google crawler discards the pages it crawled. How do I set up a robots.txt or .htaccess file so that crawlers skip the host user-account URL and only crawl the domain URL?
Example:
http://myhost.com/myusername/   // bad
http://www.mydomian.com/        // good
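robots.txt alone probably can't solve this: crawlers only request robots.txt from the root of a host, so for the myhost.com URLs they read http://myhost.com/robots.txt, which belongs to the host, not a file inside /myusername/. One common alternative is a permanent (301) redirect from every non-canonical host name to the domain, so crawlers end up with a single URL per page. The sketch below assumes the server is Apache with mod_rewrite available, that the host allows rewrite rules in .htaccess, and that the .htaccess file sits in the account's web directory (the folder served both as /myusername/ on myhost.com and as the root of www.mydomian.com). The domain is the example from the question; substitute your own.

```apache
# Hypothetical .htaccess in the account's web directory (assumed to be served
# both as http://myhost.com/myusername/ and as http://www.mydomian.com/).
# Assumes Apache with mod_rewrite enabled and AllowOverride permitting rewrites.
RewriteEngine On

# Match any request that did NOT arrive on the canonical host name
# (this also catches mydomian.com without the www).
RewriteCond %{HTTP_HOST} !^www\.mydomian\.com$ [NC]

# Send a permanent (301) redirect to the same page on the canonical host.
# In .htaccess context the pattern only sees the path relative to this
# directory, so /myusername/page.html on myhost.com becomes
# http://www.mydomian.com/page.html. The query string is passed along unchanged.
RewriteRule ^(.*)$ http://www.mydomian.com/$1 [R=301,L]
```

With this in place, a request for http://myhost.com/myusername/index.html should answer with a 301 pointing to http://www.mydomian.com/index.html, while requests already on www.mydomian.com pass through untouched. If the host has mod_rewrite disabled, or turns off the FollowSymLinks/SymLinksIfOwnerMatch option that rewrite rules need in .htaccess, the server may return an error instead, so it is worth testing on one page first.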