
    hogcrawler

    What is hogcrawler?

    It's the webcrawler/spider for www.hogdex.com.


    Is it polite?

    Of course, we're Canadian! We want to be good netizens as we crawl. Here's what we do:

    • Only crawl in the wee hours of the morning (Toronto time, of course)
    • Never hit the same host more than once a minute - typically less frequently.
    • Only do a shallow crawl of any site.
    • No retries.
    • Only request "text/html" and "text/plain" pages, i.e. we don't waste your bandwidth downloading images, etc.
    • Avoid CGI scripts and ASP and PHP pages with parameters
    • Obey ROBOTS meta tag NOINDEX, NOFOLLOW and NONE requests
    • Obey the robots.txt file. We obey requests to avoid directories and file types, and we look for our name: "hogcrawler". Presently we don't parse the rarely used time parameters. We haven't seen any time parameters used at the Toronto sites we have crawled, and we suspect they would be used to tell crawlers to crawl at night and not too frequently, which is stuff we do anyway.
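
    A few of the rules above can be sketched in Python. This is only an illustration, not hogcrawler's actual code: the names HostThrottle, is_crawlable_url and robots_meta_policy are made up for this example, and the test for what counts as a "CGI script" (a /cgi-bin/ path or a .cgi extension) is an assumption.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urlsplit

MIN_HOST_INTERVAL = 60.0  # seconds: never hit the same host more than once a minute


class HostThrottle:
    """Remember the last hit per host and refuse requests that come too soon."""

    def __init__(self, min_interval=MIN_HOST_INTERVAL, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the throttle can be tested without sleeping
        self.last_hit = {}

    def may_fetch(self, url):
        host = urlsplit(url).hostname
        now = self.clock()
        last = self.last_hit.get(host)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; no retry is scheduled
        self.last_hit[host] = now
        return True


def is_crawlable_url(url):
    """Skip CGI scripts, and ASP/PHP pages that carry query parameters."""
    parts = urlsplit(url)
    path = parts.path.lower()
    # Assumption: a CGI script is anything under /cgi-bin/ or ending in .cgi.
    if "/cgi-bin/" in path or path.endswith(".cgi"):
        return False
    if parts.query and path.endswith((".asp", ".php")):
        return False
    return True


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_meta_policy(html):
    """Return (may_index, may_follow) for a page, honouring NOINDEX, NOFOLLOW and NONE."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow
```

    A crawler built this way would call is_crawlable_url and may_fetch before each request, and robots_meta_policy on each page it downloads before indexing it or following its links.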


    What are you doing with my content?

    The same kinda thing as every other search engine:

    • We are only indexing the words
    • We definitely don't have any plans to repurpose your fine content.
      e.g. we would never even think of passing off your news headlines as our own - shudder.


    Why should I let hogcrawler crawl my site?

    It's your choice, of course. But the main reasons are free advertising and more hits. As with any other search engine, if a hogdex user searches for something your site writes about, the user will see pages from your site and will probably click on the link and go to your site.


    How can I get hogcrawler to crawl my site?

    We are just a little search engine, and we don't want to index every Toronto site, so we are indexing only the major Toronto sites. We do the newspapers, other high-profile sites, and sites they point to.


    Can my webcrawler crawl hogdex?

    Sure, just obey our robots.txt file.
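
    If your crawler is written in Python, the standard library's urllib.robotparser can do the checking for you. The sketch below is only an illustration: the robots.txt text is a made-up sample, not the actual hogdex file, and "examplebot" is a placeholder crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only (not hogdex's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch(user_agent, url)
```

    In a real crawler you would fetch the site's robots.txt once, pass its text to allowed() along with your crawler's name, and skip any URL for which it returns False.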
