Question about htaccess, robots.txt and crawl

February 9, 2011 at 18:56:59
Specs: Vista Ultimate, E6550, 4GB DDR2
I have a couple of questions.

1) How long after I register a domain can I expect the search engines to index my site?

2) I do not have any robots.txt or any htaccess files on my server. Is that okay? Are they required for search engines to crawl my site?




February 9, 2011 at 20:48:17
1.) That's a loaded question and a step into the giant realm of Search Engine Optimization. Just to get you started, you should submit your homepage to Google, Yahoo, and Bing.

An even better idea is to sign up for the webmaster tools provided by those three organizations. Once signed up, you can submit sitemaps in XML format (I suggest a compressed version) and get very helpful reports for optimizing your site. Google and Yahoo are pretty quick to begin crawling, but Bing seems to take a while.

2.) You aren't required to have a robots.txt or .htaccess file for search engines to crawl your site, but you will find them to be very helpful. You should use a .htaccess file for canonicalization purposes. To a search engine, two different addresses for the same site (for example, with and without the www prefix) count as two separate websites, so your search ranking ends up split between them.
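
To illustrate why a missing robots.txt does not block crawling, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name crawl_allowed, the example.com URLs, and the /private/ rule are made up for illustration, and real crawlers may interpret rules with their own extensions.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    # An empty list stands in for a missing or empty robots.txt file.
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# No rules at all, so nothing is blocked.
print(crawl_allowed([], "http://example.com/portfolio.php"))

# A hypothetical file that hides one directory from every crawler.
rules = ["User-agent: *", "Disallow: /private/"]
print(crawl_allowed(rules, "http://example.com/portfolio.php"))
print(crawl_allowed(rules, "http://example.com/private/notes.php"))
```

With no rules every page is treated as crawlable, so the file only matters once there is something you want to keep crawlers away from.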


February 10, 2011 at 00:04:13
Thanks very much,
That was a very handy reply and I'm going to bookmark your reply with the links you gave me for further use. Very useful, much appreciated. Thanks.



February 13, 2011 at 04:21:06
Hello again,
I'm asking some more questions.
I submitted the site to google and it is getting listed now. This is the site:
The aim is to get it listed on the first page of Google results for search phrases like "waterproofing goa" or "water proofing goa".

This is another website I made for my neighbor: and this website does show up on the first page of Google results when a user searches for "ayurveda goa".

I have asked the proprietor of Tithi Enterprises to give me at least half a page of content on the importance of waterproofing, the damage caused by water on cement, rusting of reinforced steel, etc. I am awaiting content from him. Until then, I intend to use the site to learn as much as I can because it offers PHP 5.2+ and unlimited MySQL databases.

So a few more questions:

1) How long would it take to get listed if I didn't submit the site to any of the search engines? Months, or longer?

2) I have opted for the Linux hosting package, and I put all the files in the public_html folder. There is also a folder named www which is located in the parent folder of public_html (one level up from the public_html folder), and this www folder seems to be automatically duplicating the files I put into public_html. Is this a feature offered by the web host as a backup?

3) On a few occasions I have found that even though I have uploaded a PHP file, after a while it seems that the older version of the file is being used on the server. It is almost like there is some caching going on at the server, and sometimes this caching doesn't work properly, especially with PHP files. Any comments or explanation about this would be helpful.

4) Finally, when I go to google and type I get the result to the portfolio page. But when I go to google and type I get no results. Why is this happening? Is it because Google's indexing is not complete?

5) If I put the meta tags keywords and description into <?php echo "<meta name=\"keywords\" content=\".. various keywords ..\" "; ?> tags, will the search engine crawlers be able to crawl the meta tags or not?

BTW, Tithi is the proprietor's daughter's name. The content on the home page is written by me, but the content on the portfolio page was provided by him and I have to go through it slowly and make it sound more professional.



February 13, 2011 at 10:27:11
Hey Sarosh. I'll do my best to answer your questions and I'm happy to assist.

1.) If you didn't manually submit the website, it would take until a search robot follows a link from another website to yours. It might have been days, months, years, or never, so it is a good thing you submitted it yourself. I strongly suggest creating a site map in .xml or .xml.gz format and submitting that to Google Webmaster Tools. Search engines love site maps, and it will help the robot crawl all your web pages. You can even suggest to the robot how frequently to crawl each page.

If you were using a CMS like WordPress, you could use a plugin like "Google XML Sitemaps" which will automatically create them for you.
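
If you are not using a CMS, a sitemap is simple enough to generate with a short script. The following is only a rough sketch using Python's standard library: the function names and the example.com URLs are placeholders I made up, and the changefreq value is the crawl-frequency hint mentioned above, which search engines are free to ignore.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as UTF-8 bytes) from (url, changefreq) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Hint for how often the page changes, e.g. "daily" or "monthly".
        ET.SubElement(url, "changefreq").text = changefreq
    body = ET.tostring(urlset, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


def compress_sitemap(xml_bytes):
    """Return the gzip-compressed form, suitable for saving as sitemap.xml.gz."""
    return gzip.compress(xml_bytes)


# Placeholder pages; replace with the real URLs of the site.
pages = [
    ("http://example.com/", "weekly"),
    ("http://example.com/portfolio.php", "monthly"),
]
sitemap = build_sitemap(pages)
print(sitemap.decode("utf-8"))
print(len(compress_sitemap(sitemap)), "bytes when gzipped")
```

You would save the output as sitemap.xml (or the compressed bytes as sitemap.xml.gz) in the site root and submit that address in the webmaster tools.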

2.) I've seen websites where 'www' was the root directory and websites where 'public_html' was the root, but I've never seen one be a subdirectory of the other. I think you'll be better off asking your web host that question.

3.) I've never encountered this problem with PHP pages, but I'd say it has something to do with the duplicates you mentioned in the previous question.

4.) The index.php page is a file that you identify as the homepage. Google recognizes it and removes the /index.php from the search page. If you searched for your site as your site comes up at the top of the SERP (Search Engine Results Page).

If you created a page called index.htm, you'd probably see that page as your homepage instead of the index.php. Web servers have a list of priorities to identify which file should be the homepage. The file names are usually 'index' or 'default' and the extensions typically include .htm .html .shtml .php. If you wanted both an index.htm and index.php, you could use the .htaccess file to specify which one you want as the homepage.

In your scenario, you searched for a specific page (portfolio.php) and then searched for the homepage.
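
The homepage priority list described above can be pictured with a small Python sketch. This only models the idea and is not how any particular web server is implemented. The function name and the priority order are assumptions for illustration; on Apache the real order is set by the DirectoryIndex directive, which is what you would put in the .htaccess file.

```python
def pick_index_file(files_in_dir, priority):
    """Return the first name in `priority` that exists in the directory, else None."""
    available = set(files_in_dir)
    for name in priority:
        if name in available:
            return name
    return None


# Hypothetical default order; the real one depends on the server configuration.
default_priority = ["index.html", "index.htm", "index.shtml", "index.php", "default.htm"]

files = ["index.php", "index.htm", "portfolio.php"]

# With this default order, index.htm is served before index.php.
print(pick_index_file(files, default_priority))

# Overriding the order (the effect of setting it yourself) puts index.php first.
print(pick_index_file(files, ["index.php", "index.htm"]))
```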

5.) I don't see a problem with that because it returns valid HTML, so search robots will interpret it fine. You did forget to close the meta tag, so make sure you do that.

<?php echo "<meta name=\"keywords\" content=\".. various keywords ..\" /> "; ?>

The method I use is to enclose PHP strings in single quotes and HTML attributes in double quotes, which avoids the backslash escapes. So I would have written it like this:

<?php echo '<meta name="keywords" content=".. various keywords .." /> '; ?>

You could also have closed the PHP tag, written the HTML line, and then reopened the PHP tag:

yada yada ?><meta name="keywords" content=".. various keywords .." /><?php yada yada

It's just personal preference and development style.

It's worth noting that Google is no longer using meta keywords to improve your ranking on SERPs. For years, people have been abusing meta keywords to improve rankings so Google omitted that factor.

I still (and properly) use meta keywords on my site for those not searching on Google, but don't spend too much time modifying them. The page content is far more important, and search engines recognize that.

Good luck with the Tithi Enterprises project and your future web development endeavors. I love learning web dev, so you're welcome to pick my brain anytime.

Apologies if I don't respond to your reply immediately. I don't check this site daily.


February 14, 2011 at 09:52:02
Thanks for reminding me that Google does not rely on the keywords; I had forgotten about that. I now remember an article that said, "with Google, content is king".

This is the first time I am focusing on search results. It is very interesting to see how the site gets listed as I change or add content to it.

I wanted to use the site mainly to practice some real online php/MySql database programming, but it is also very interesting to see how the content and design of the site affects the search results ratings.

BTW, this might be a false observation, but I am getting different rankings when I search for "tithi enterprises" on my computer and different rankings when I search for it from my BB mobile browser.

Just to summarize about the search bots, would it be safe to say that the search bots only view the source code of the sites? They do not actually get to access the php content? What if I use classes and IDs like "tithi" "waterproofing" in the CSS file? Or create javascript variables named "tithi", "waterproofing", etc, etc?

Thanks for offering to help, I will post here again if I run into any difficulties. Appreciate your co-operation.



February 16, 2011 at 20:20:32
Hello yet again,
Some good news!
When I type "waterproofing goa" or "water proofing goa" in I am getting as the no. 2 result.
Can someone else please confirm this before I start over-praising myself for a job well done?



February 18, 2011 at 11:27:18
Hello Sarosh,

One thing I learned about Search Engine Optimization is not to code your website for search spiders. Code it for your readers. Don't try to trick them. You may get away with it for a while, but you could get deleted from SERPs if you get caught. With that being said, don't intentionally use keywords as variables, identifiers, or class names. If you end up doing so as your own coding style, that's fine. Just don't have tricking search spiders in mind.

PHP is a server-side language, which means end-users (including robots) cannot see the code behind it. They can see anything client-side like JavaScript, just like you can view a page's source code. You can use a tool like this SEO Text Browser to view your site as a search spider would:
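
To get a rough feel for what a text-only view of a page contains, here is a small Python sketch built on the standard html.parser module. It is a simplification and not how any real search spider works. The class name, function name, and sample page are made up for illustration. The input is the HTML the server sends after PHP has already run, so no PHP code can appear in it.

```python
from html.parser import HTMLParser


class TextView(HTMLParser):
    """Collect the text a text-only browser would show, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_view(html):
    """Return the visible text of an HTML document as one line."""
    parser = TextView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page as delivered to the client.
served_html = """<html><head><title>Example Site</title>
<script>var greeting = "hello";</script></head>
<body><h1>Our Services</h1><p>Contact us.</p></body></html>"""

print(text_view(served_html))
```

The JavaScript is still present in the page source that the client receives; this sketch just leaves it out of the readable text, which is the part your visitors actually see.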

I'm not entirely sure why you get different search results on your mobile device. I don't have a web enabled cell phone, so I don't bother with mobile site design and search results. I predict specifically designed services for mobile devices won't last long because these devices are becoming more powerful and able to interpret a web page like a normal PC.

I just checked those queries on Google India and didn't see your website on the front page, or within the top-7 locations on the map. Because Google's algorithm is so complex and much of it is a secret, I cannot provide you with a concrete answer why our results differ. I would guess that because I'm from a different part of the world, Google didn't think your page would be relevant to my search.

Did you do your searches while logged out of iGoogle? If you're logged in, the results may not have been organic because of prior search history, marking a search result as a favorite, or inputting your location.


February 22, 2011 at 00:44:08
As of 22nd February, when I type "waterproofing goa" in then my site is the no1 result.
But if I type "waterproofing goa" in then the site is somewhere on page 7.
I have read about page rank fluctuations and one possible reason being duplicate content on the site, so I have taken down duplicate content.
Another suspicion I have is the use of the word "tithi" which is probably being marked as adult content if it is split up as "*** hi".

I have submitted the site to google maps and it is pending manual review.
If you can do a search on as well as for "waterproofing goa" and confirm the results, that would help.

The site was no1 on 2 times, but then it goes back to page 7.


BTW, I am using google webmasters on the site and have submitted an xml sitemap too. At first there was "No Data" for most of the fields, but now it is filling up slowly. When I submitted the sitemap there were 4 urls submitted and 1 url in site index.
Today there are 0 URLs in the site index. I have not done anything to cause this change. I think the Google ranking game is a whole different ball game, and as you said, I will just focus on making the site and not worry about Google rankings. Googlebot is going to do what it wants anyway, not what we want it to do.


February 27, 2011 at 22:34:51
I just checked with that query and the site didn't show up within the 7 map locations or 10 websites on the first SERP. I don't pay close attention to page rank, but it's my understanding that it updates rather infrequently, perhaps every few months. How your site ranks on SERPs changes regularly as it expands and Google believes it would be of interest to the searcher.

I don't think Google is penalizing you for that term because they've incorporated other cultures into their searches. In addition, your site isn't based around adult materials and situations. I don't know why one of your pages stopped being indexed, but I did do a search and noticed 3 of your pages are indexed.

I agree with your last line. Just make sure you: 1.) Code for the readers, 2.) Provide good navigation throughout your pages so your readers and spiders can follow, 3.) Update your sitemap as you add new pages and resubmit it to Google Webmaster Tools each time, 4.) Use Webmaster Tools to find problems with your site like crawl errors and HTML suggestions, 5.) Have fun developing your website.


February 27, 2011 at 23:33:56
Thank you,
It was indeed fun. BTW, you were right about me being logged into google. I should have looked into that earlier. I get different rankings when I am logged into google and different rankings when I am logged out of google.
So far Webmaster Tools has not displayed any errors or any HTML suggestions. But the "search queries" and impressions keep changing. I am checking Google Webmaster Tools almost every day, more because it is interesting than because of the rankings.

The location on the map is still "pending".

It was quite interesting to do this first hand. Earlier I was only focusing on making the page and trying to learn some PHP and MySQL. Now the site is listed on Google and Yahoo, but not yet on Bing.
I appreciate your contribution, you made it more interesting with your guidance. I will post back whenever I have something interesting to report:)



February 28, 2011 at 16:44:44
I'm happy to hear that and look forward to hearing from you again. =)

Apologies if I don't respond to your reply immediately. I don't check this site daily, but you're welcome to PM me as a reminder.
