Googlebot searching for ".../bin/en.jsp"

Rabin Yasharzadehe rabin at rabin.io
Wed May 21 01:26:02 IDT 2014


After searching the access log as well, I found that the reference to our
server came from one of our older sites. After grepping the entire site, I
found that we actually do have links to *.jsp URLs :(

Thank you.


*--Rabin*
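Rabin's fix amounts to grepping the old site's pages for *.jsp links. The same search as a minimal Python sketch: it assumes the site is a directory tree of static .html/.htm/.php files, and the function names, regex, and suffix list are illustrative, not anything from the thread. The regex only inspects href/src attributes, so links assembled in JavaScript or server-side code would be missed.

```python
import re
from pathlib import Path

# Matches the value of an href/src attribute that points at a .jsp URL,
# optionally followed by a query string. \b rejects look-alikes such as .jspx.
JSP_LINK = re.compile(
    r"""(?:href|src)\s*=\s*["']?([^"'\s>]*\.jsp\b(?:\?[^"'\s>]*)?)""",
    re.IGNORECASE,
)


def jsp_links_in_text(text):
    """Return (line_number, url) pairs for every .jsp link in an HTML string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in JSP_LINK.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def find_jsp_links(root, suffixes=(".html", ".htm", ".php")):
    """Walk a site directory and yield (path, line_number, url) for .jsp links."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, url in jsp_links_in_text(text):
                yield path, lineno, url
```

For example, `for path, lineno, url in find_jsp_links("/path/to/oldsite"): print(f"{path}:{lineno}: {url}")`, where the directory is a placeholder. A plain `grep -rn '\.jsp' /path/to/oldsite` is the quicker equivalent; the regex version only filters out text that mentions .jsp outside a link.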


On Wed, May 21, 2014 at 12:55 AM, E.S. Rosenberg <esr at g.jct.ac.il> wrote:

> host 66.249.79.57
> 57.79.249.66.in-addr.arpa domain name pointer
> crawl-66-249-79-57.googlebot.com.
>
> This suggests that it is indeed Googlebot. Why would they put GCE on
> Googlebot hosts? That would be far too high a risk of getting the bots blocked.
> I would guess that either
> - They were testing a new crawler that is misbehaving (possible, but less
> likely I think, since they would test it locally first)
> - Some other site contains a link to the content you don't have, or someone
> (intentionally) made a query to the Google search engine which triggered
> the crawler to try that URI.
>
> Regards,
> Eliyahu - אליהו
>
>
> 2014-05-20 12:14 GMT+03:00 Rabin Yasharzadehe <rabin at rabin.io>:
>
>> Good point, thank you.
>>
>>
>> *-- Rabin*
>>
>>
>> On Tue, May 20, 2014 at 10:23 AM, shimi <linux-il at shimi.net> wrote:
>>
>>> On Tue, May 20, 2014 at 10:15 AM, Rabin Yasharzadehe <rabin at rabin.io> wrote:
>>>
>>>> I have installed fail2ban on one of my servers and created a set of
>>>> rules to block some requests that (from my point of view) look like probing
>>>> attempts.
>>>>
>>>> One of the rules blocks on sight any request for *.jsp, which I
>>>> don't have on this server.
>>>>
>>>> Today I got a mail about a blocked IP which belongs to Google (based on
>>>> whois):
>>>> # whois 66.249.79.57
>>>>
>>>> Can anyone tell me why Googlebot would search for something I don't
>>>> have any reference to on my site?
>>>>
>>>>
>>> The ".." does look strange; I think Googlebot generally uses canonical
>>> URLs...
>>>
>>> Just a note: The fact that there's no reference in your site (if that is
>>> indeed a fact...) - does NOT say that there isn't such a reference in any
>>> other site on the Internet...
>>>
>>> Note that Google also has GCE (Google Compute Engine), and I would assume
>>> the netblocks for GCE also say "Google"... maybe it's not really
>>> Googlebot at all, but rather an impersonator running on GCE...
>>>
>>> -- Shimi
>>>
>>>
>>
>> _______________________________________________
>> Linux-il mailing list
>> Linux-il at cs.huji.ac.il
>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>>
>>
>
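E.S. Rosenberg's `host` lookup is the first half of the check Google documents for verifying Googlebot: a reverse (PTR) lookup whose name ends in googlebot.com or google.com, followed by a forward lookup of that name that must return the original IP. The forward step matters because whoever controls an address block also controls its PTR records. A minimal Python sketch, assuming IPv4 and the system resolver through the standard `socket` module; the function names and the suffix tuple are illustrative, and the domain list should be checked against Google's current documentation.

```python
import socket
from typing import Callable, Iterable

# Domains Google documents for Googlebot hosts (assumption: verify against
# Google's current crawler-verification guidance before relying on it).
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup; socket.gethostbyaddr returns (hostname, aliases, addresses)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(name: str) -> Iterable[str]:
    """A-record lookup; socket.gethostbyname_ex returns (name, aliases, IPv4 list)."""
    return socket.gethostbyname_ex(name)[2]


def is_verified_googlebot(
    ip: str,
    ptr: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """True only if ip's PTR name is a Google crawler host that resolves back to ip."""
    try:
        host = ptr(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # A matching PTR name alone proves nothing; the forward lookup must
    # confirm that the name really maps back to the same address.
    if not host.endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

With the default resolvers, `is_verified_googlebot("66.249.79.57")` performs both lookups live; the `ptr` and `forward` parameters exist so the logic can be exercised with stub resolvers. A check like this could be wrapped in a small script for fail2ban's `ignorecommand` jail option, if the installed fail2ban version supports it, so that verified crawler addresses are not banned.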