<div dir="ltr"><br><div class="gmail_quote">2009/9/14 Shachar Shemesh <span dir="ltr">&lt;<a href="mailto:shachar@shemesh.biz">shachar@shemesh.biz</a>&gt;</span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div dir="ltr" bgcolor="#ffffff" text="#000000">

Hi all,<br>

<br>

One of my clients is having a weird problem, and I&#39;m pretty much at my

wit&#39;s end as to what to do about it.<br>

<br>

The site is called &quot;Tzofit&quot; (at <a href="http://tzofit.co.il" target="_blank">tzofit.co.il</a>), and is an index and

publisher for Zimmers. When you search Google for &quot;צימרים&quot; the site

appears on the second page, and when you search Google for &quot;צופית&quot; it

is the first result. In both cases, you cannot miss it - Google

displays the site&#39;s title and summary as Japanese!<br>

<br>

Now here&#39;s where it gets really strange. While the main site is

proclaimed to be in Japanese, all the deep links are in Hebrew. If you

ask to see the Google cache, the site appears in Hebrew. If you search

for its address directly (<a href="http://tzofit.co.il" target="_blank">tzofit.co.il</a>), the site appears with correct

title and summary. The only explanation I have is that this is a Google

index bug.<br>

<br>

The problem is that even if that is the case, I cannot see what I can

do about it. I tried to ask about it on the Google forums

(<a href="http://www.google.com/support/forum/p/Web+Search/thread?tid=08c423ea40d5c1ab&amp;hl=en" target="_blank">http://www.google.com/support/forum/p/Web+Search/thread?tid=08c423ea40d5c1ab&amp;hl=en</a>),

but, as expected, got no replies. On the other hand, I did not manage

to find anything wrong with the actual page.<br>

<br>

Trying to translate the Japanese text, using Google Translate, back to

English seems to show that the text translates, but not into coherent

sentences. Then again, looking at the raw encoding, this does not

appear to be Hebrew interpreted with the wrong encoding (or am I

missing something?)<br>

<br>

If anyone has any clue, it would be much appreciated.<br><br></div></blockquote><div><br> I would try the following:<br><ul><li>Remove the extra newlines from the beginning of the document. An XML document should begin with its XML declaration. Maybe leading newlines are valid, I never checked, but documents usually don&#39;t begin that way, so why do it... :)</li>

<li>In an HTML document, you define the language inside the opening html tag, with lang=&quot;he&quot;. The meta tag that does this is redundant, and I would assume Google likes the html attribute better.</li><li>The newlines in the file appear to be DOS-style. Maybe you want to try running the file through dos2unix.</li>

<li>It could be this windows-1255 thing - maybe try putting iso-8859-8-i there - or even better, switch to utf-8 altogether. &quot;everybody loves utf-8&quot; :)<br></li></ul><br>These are my ideas...<br><br>HTH,<br><br>-- Shimi<br>
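<br>P.S. To make the suggestions above concrete, here is a rough sketch of how the top of the page might look after those changes. I am assuming the page is XHTML 1.0 Transitional (I did not check the actual doctype) and that the file itself is re-saved as UTF-8 with Unix newlines; the title text is only a placeholder. If the XML declaration is kept, nothing may come before it, not even a blank line:<br>
<pre>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;!-- Assumption: XHTML 1.0 Transitional; keep whatever doctype the page really uses. --&gt;
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot;
  &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt;
&lt;!-- Language and direction are declared on the html element itself. --&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; lang=&quot;he&quot; xml:lang=&quot;he&quot; dir=&quot;rtl&quot;&gt;
&lt;head&gt;
  &lt;!-- Must match the encoding the file is actually saved in. --&gt;
  &lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=utf-8&quot; /&gt;
  &lt;!-- Placeholder title, not the site&#39;s real one. --&gt;
  &lt;title&gt;...&lt;/title&gt;
&lt;/head&gt;
</pre>
Declaring utf-8 only helps if the bytes on disk really are UTF-8, and the Content-Type header sent by the web server should agree with the meta tag, since the HTTP header takes precedence over it.<br>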

</div></div></div>