Ocelli is a Web crawler owned and operated by GlobalSpec®, the leading specialized search engine and information resource for the engineering community. Ocelli's mission is to find and index Web pages for The Engineering Web℠ from GlobalSpec, a unique slice of the World Wide Web focusing solely on engineering and technical content.

GlobalSpec respects your Web site and your ability to control how it is used. Ocelli implements:

  • Robots.txt exclusion protocol, allowing Webmasters to control which portions of their Web sites are visited by a crawler (User-agent: Ocelli)
  • HTML Robots META tags, allowing precise page-level control over whether robots may index pages and/or follow links from pages
  • Webmaster-selectable crawl delays, to control page download frequency

By default, Ocelli introduces a 2.5-second delay between the end of one page download and the start of the next. If this places a burden on your Web site, please use the feedback form to request a longer delay for your site.

Feedback

Please send us your feedback about Ocelli.

Frequently Asked Questions

  1. Why is Ocelli crawling my Web site if it has no engineering content?
    Most likely because one or more engineering Web sites link to your site. If Ocelli finds no engineering content on your Web site, it will not return.

  2. How can I keep Ocelli from spidering my Web site?

    Through the robots.txt exclusion protocol and/or HTML robots META tags (a META tag example follows the robots.txt examples below). Alternatively, you may use our feedback form to request that GlobalSpec stop crawling your Web site. To restrict Ocelli from crawling your entire Web site:

    User-agent: Ocelli
    Disallow: /

    To restrict Ocelli from crawling any page whose URL begins with http://yoursite/private:

    User-agent: Ocelli
    Disallow: /private
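
    For page-level control, you can instead place a robots META tag in the <head> of an individual page. The minimal example below uses the standard robots META tag; note that, unlike the robots.txt rules above, the generic "robots" name applies to every crawler that honors META tags, not only Ocelli:

    <!-- ask compliant crawlers not to index this page or follow its links -->
    <meta name="robots" content="noindex, nofollow">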


  3. Does Ocelli support pattern matching in Robots.txt files?

    Yes, Ocelli supports some pattern matching in Robots.txt files. This is an extension of the Robots.txt exclusion standard, so other crawlers may not support it.

    • '*' matches any sequence of characters
    • '$' indicates the end of a URL

    To restrict Ocelli from crawling dynamic pages (URLs containing the '?' character):

    User-agent: Ocelli
    Disallow: /*?

    To restrict Ocelli from requesting any PDF documents (ending in .pdf):

    User-agent: Ocelli
    Disallow: /*.pdf$


  4. Ocelli is downloading too many pages from my Web site, too quickly. How can I control this?

    You can specify how quickly Ocelli downloads pages from your Web site with the Crawl-delay instruction in robots.txt (see the next question for details), or you may use the feedback form to request a suitable crawl delay in seconds.


  5. Does Ocelli obey the Crawl-Delay instruction in robots.txt?

    Yes. An entry in robots.txt similar to the example below instructs Ocelli to wait twenty seconds between successive requests to your Web site.

    User-agent: Ocelli
    Crawl-delay: 20
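
    Crawl-delay is placed inside a User-agent record, so it can sit alongside any Disallow rules in that record. For example, pairing the twenty-second delay with the /private rule from question 2:

    User-agent: Ocelli
    Crawl-delay: 20
    Disallow: /private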


  6. Why is your crawler requesting URLs that don't exist on my website?

    Ocelli follows links to pages on your Web site. If Ocelli is attempting to crawl pages that don't exist, there may be outdated links on your Web site or on an external Web site that need updating. If you believe Ocelli is erroneously downloading content, please use the feedback form to report the problem, including the relevant URL and/or log information.