<div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">After searching the access log as well, I found that the reference to our server came from one of our older sites. After grepping the entire site, I found that we actually do have links to *.jsp URLs :(<br>
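<div>For anyone who wants to repeat that kind of search, here is a minimal sketch in Python rather than grep. It assumes the old site is a directory tree of text files; <code>find_jsp_links</code> and <code>scan_site</code> are hypothetical names made up for this example, and the pattern only catches quoted href/src attributes, so it may miss links built by scripts.</div>

```python
import os
import re

# Matches quoted href/src values whose path ends in .jsp,
# optionally followed by a query string.
JSP_LINK = re.compile(
    r"""(?:href|src)\s*=\s*["']([^"']*\.jsp(?:\?[^"']*)?)["']""",
    re.IGNORECASE,
)


def find_jsp_links(text):
    """Return every .jsp link target found in a chunk of markup."""
    return JSP_LINK.findall(text)


def scan_site(root):
    """Yield (file path, link) for each .jsp link under the directory root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files that are not UTF-8
            with open(path, encoding="utf-8", errors="replace") as fh:
                for link in find_jsp_links(fh.read()):
                    yield path, link
```

<div>Calling <code>scan_site</code> on the document root of the old site and printing each pair should list the pages that still point at *.jsp URLs.</div>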
<br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Thank you.<br></div></div><div class="gmail_extra"><br clear="all"><div><div dir="ltr"><b><font face="arial, helvetica, sans-serif">--<br>Rabin</font></b><br>
</div></div>
<br><br><div class="gmail_quote">On Wed, May 21, 2014 at 12:55 AM, E.S. Rosenberg <span dir="ltr"><<a href="mailto:esr@g.jct.ac.il" target="_blank">esr@g.jct.ac.il</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><div><div><div><div><div>host 66.249.79.57<br>57.79.249.66.in-addr.arpa domain name pointer <a href="http://crawl-66-249-79-57.googlebot.com" target="_blank">crawl-66-249-79-57.googlebot.com</a>.<br><br>
</div>This suggests that it is indeed Googlebot. Why would they put GCE on Googlebot hosts? The risk of getting their bots blocked would be far too high.<br>
</div>I would guess that either<br></div>- They were testing a new crawler that is misbehaving (likely, but less so I think, since they should/would test it locally first)<br></div>- Some other site contains a link to content you don't have, or someone (intentionally) ran a Google search query that triggered the crawler to try said URI.<br>
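<div>The reverse lookup above only shows what the PTR record claims. A more careful check also resolves that hostname forward and confirms it maps back to the same IP. Below is a minimal sketch of such a forward-confirmed reverse DNS check, assuming IPv4 only and assuming that crawler hostnames end in googlebot.com or google.com; the function names are hypothetical, and the resolver functions are passed as parameters only so the logic can be exercised without network access.</div>

```python
import socket

# Assumption: hostname suffixes accepted as belonging to Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """Return True if a PTR hostname ends with one of the accepted suffixes."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    try:
        # Step 1: PTR lookup, the same thing `host 66.249.79.57` does.
        hostname = reverse(ip)[0]
        if not hostname_is_google(hostname):
            return False
        # Step 2: resolve the PTR name forward and collect its addresses.
        addresses = forward(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
    # Step 3: the claimed IP must be among the forward results.
    return ip in addresses
```

<div>The forward step matters because whoever controls the reverse zone for an IP can make its PTR record say anything, while only the owner of the googlebot.com zone can make that name resolve back to the IP.</div>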
<br></div>Regards,<br></div>Eliyahu - אליהו<br></div><div class="gmail_extra"><br><br><div class="gmail_quote"><div class="">2014-05-20 12:14 GMT+03:00 Rabin Yasharzadehe <span dir="ltr"><<a href="mailto:rabin@rabin.io" target="_blank">rabin@rabin.io</a>></span>:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">Good point, thank you.<br></div>
</div><div><div class="h5"><div class="gmail_extra">
<br clear="all"><div><div dir="ltr"><b><font face="arial, helvetica, sans-serif">--<br>
Rabin</font></b><br></div></div><div><div>
<br><br><div class="gmail_quote">On Tue, May 20, 2014 at 10:23 AM, shimi <span dir="ltr"><<a href="mailto:linux-il@shimi.net" target="_blank">linux-il@shimi.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div>On Tue, May 20, 2014 at 10:15 AM, Rabin Yasharzadehe <span dir="ltr"><<a href="mailto:rabin@rabin.io" target="_blank">rabin@rabin.io</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div style="font-family:tahoma,sans-serif">I have installed fail2ban on one of my servers and created a set of rules to block requests that (from my point of view) look like probing attempts. <br>
<br></div><div style="font-family:tahoma,sans-serif">One of the rules is to block on sight any request to *.jsp, which I don't have on this server. <br><br></div><div style="font-family:tahoma,sans-serif">
Today I got a mail about a blocked IP which belongs to Google (based on whois).<br></div><div style="font-family:tahoma,sans-serif"><span style="font-family:courier new,monospace"># whois 66.249.79.57</span><br>
</div><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif">Can anyone tell me why Googlebot would request something I don't have any reference to on my site? <br><br></div>
</div></blockquote></div></div><div><br>The ".." does look strange; I think Googlebot always uses canonical URLs in general... <br><br></div><div>Just a note: the fact that there's no reference on your site (if that is indeed a fact...) does NOT mean that there isn't such a reference on some other site on the Internet...<br>
<br></div><div>Note that Google also has GCE (Google Compute Engine), and I would assume the netblocks for GCE would also say "Google"... maybe it's a crawler which is not really Googlebot, but rather an impersonator running through GCE...<span><font color="#888888"><br>
<br></font></span></div><span><font color="#888888"><div>-- Shimi<br></div></font></span></div><br></div></div>
</blockquote></div><br></div></div></div>
<br></div></div><div class="">_______________________________________________<br>
Linux-il mailing list<br>
<a href="mailto:Linux-il@cs.huji.ac.il" target="_blank">Linux-il@cs.huji.ac.il</a><br>
<a href="http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il" target="_blank">http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il</a><br>
<br></div></blockquote></div><br></div>
</blockquote></div><br></div>