Googlebot searching for ".../bin/en.jsp"
E.S. Rosenberg
esr+linux-il at g.jct.ac.il
Wed May 21 01:00:48 IDT 2014
Replying to all; I didn't notice this was linux-il.
$ host 66.249.79.57
57.79.249.66.in-addr.arpa domain name pointer crawl-66-249-79-57.googlebot.com.
This suggests that it is indeed Googlebot. Why would they put GCE on
Googlebot hosts? The risk of getting their bots blocked would be far too high.
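The check above can be scripted. What follows is only a sketch, and it goes one
step beyond the `host` lookup: since a PTR record is controlled by whoever owns
the address block, it also resolves the returned name forward and checks that
it maps back to the same IP. The function name, the suffix list, and the
injectable lookup parameters are my own assumptions for illustration, not
something taken from fail2ban or from Google.

```python
import socket

# Assumed set of hostname suffixes for Google crawlers; adjust as needed.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    and that hostname resolves forward to the same IP.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default real DNS is used.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or lookup failure: cannot be verified.
        return False

    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # Forward-confirm: the name must map back to the original address.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Something like this could be run against a banned IP before deciding whether
to keep or lift the ban; calling is_verified_googlebot("66.249.79.57") with the
default lookups performs the same reverse query as the `host` command above.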
I would guess that either:
- They were testing a new crawler that is misbehaving (likely, but less so I
think, since they should/would test it locally first), or
- Some other site contains a link to the content you don't have, or someone
(intentionally) made a Google search query that triggered the crawler to
try said URI.
Regards,
Eliyahu - אליהו
2014-05-20 12:14 GMT+03:00 Rabin Yasharzadehe <rabin at rabin.io>:
> Good point, thank you.
>
>
> *-- Rabin*
>
>
> On Tue, May 20, 2014 at 10:23 AM, shimi <linux-il at shimi.net> wrote:
>
>> On Tue, May 20, 2014 at 10:15 AM, Rabin Yasharzadehe <rabin at rabin.io> wrote:
>>
>>> I have installed fail2ban on one of my servers, and created a set of
>>> rules to block some requests that (from my point of view) look like probing
>>> attempts.
>>>
>>> One of the rules is to block on site any request to *.jsp, which I don't
>>> have on this server.
>>>
>>> Today I got a mail about a blocked IP which belongs to Google (based on
>>> whois).
>>> # whois 66.249.79.57
>>>
>>> Can anyone tell me why Googlebot would search for something I don't
>>> have any reference to on my site?
>>>
>>>
>> The ".." does look strange; I think Googlebot always uses canonical URLs
>> in general...
>>
>> Just a note: The fact that there's no reference in your site (if that is
>> indeed a fact...) - does NOT say that there isn't such a reference in any
>> other site on the Internet...
>>
>> Note that Google also has GCE - I would assume the netblocks for GCE
>> would also say "Google"... maybe it's a crawler which is not really
>> Googlebot, but rather an impersonator running through GCE...
>>
>> -- Shimi
>>
>>