Server config: Mistakes with character encoding - part 2
Know what your server talks!
Know its config and defaults, that is. If you've put a static HTML file on your server, encoded as (say) UTF-8, and you just can't get it displayed the right way ... look at your server's config.
Here's why.
Many webservers send a content-type HTTP header by default along with static files such as HTML pages. For example, Apache by default sends this one:
Content-Type: text/html; charset=iso-8859-1
If a content-type HTTP header is being sent, it doesn't matter whether you've included a meta-tag in your HTML file, like this one:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15">
Common browsers will prefer information that's sent via HTTP headers over information that's embedded as meta-tags in the document itself.
You might well have put the correct meta-tag onto your page (e.g. declaring that this very document is utf-8 encoded) ... but if the server sends a content-type HTTP header along with it that declares it as latin-1, your browser will stick to the HTTP header.
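To make that precedence concrete, here's a small Python sketch. It is a simplification, not how any particular browser is implemented: real browsers also look at byte order marks and do some sniffing. The function names are made up for this example, and the meta-tag lookup is a naive regex rather than a real HTML parser.

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def charset_from_meta(html):
    """Return the charset declared in a <meta> tag, or None (naive regex scan)."""
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type, html):
    """Simplified rule: the HTTP header wins, the meta tag is only a fallback."""
    return charset_from_header(content_type) or charset_from_meta(html)
```

With Apache's default header from above and a page that declares utf-8 in its meta-tag, effective_charset() returns the header's iso-8859-1. The meta-tag only counts when the header carries no charset at all.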
Among the top candidates: Rails' page caching
This particular mishap might well catch you when you're used to configuring your Ruby on Rails applications but start using Rails' page caching mechanism for the first time. Page caching is the only one of Rails' caching mechanisms where the webserver, not Rails, sends the HTTP headers.
When you've configured Rails to e.g. talk utf-8 to the world, but haven't paid attention to the headers your server is sending, they'll most probably differ from the Rails headers.
That is: the first response, sent by Rails, will be fine - the browser will know it is utf-8 and display everything correctly. Every subsequent request for the same page will be answered by the webserver with the same data but preceded by a different content-type header, so the browser will try to display it as latin-1
...
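What that looks like on screen can be reproduced in a few lines of Python. This is only an illustration of the effect: the render() helper is hypothetical and simply decodes the same bytes with whatever charset was declared, which is roughly what the browser ends up doing.

```python
def render(body, declared_charset):
    """Decode the response bytes using the charset the HTTP header declared."""
    return body.decode(declared_charset)


# The bytes are identical in both cases: the page as written out in utf-8.
body = "café".encode("utf-8")

first_response = render(body, "utf-8")        # header sent by Rails
cached_response = render(body, "iso-8859-1")  # header sent by the webserver

print(first_response)   # café
print(cached_response)  # cafÃ©
```

The data never changed between the two responses, only the header did. That's why the cached page shows the typical "Ã©" garbage wherever a non-ASCII character should be.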
How to check this?
To find out what HTTP headers are sent for a given file you can just use curl:
curl -I http://rubyonrails.org
... will do the job.
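If you'd rather script the check, something along these lines should work with Python's standard library. Treat it as a sketch: head() and describe() are names invented here, and unlike curl -I, urllib follows redirects, so you may end up looking at the headers of the final URL.

```python
import urllib.request
from email.message import Message


def head(url):
    """Send a HEAD request, roughly what `curl -I` does, and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers


def describe(headers: Message) -> str:
    """Summarise the Content-Type header and the charset it declares."""
    content_type = headers.get("Content-Type", "(no Content-Type header)")
    charset = headers.get_content_charset() or "(no charset declared)"
    return f"{content_type} -> charset: {charset}"
```

Calling print(describe(head("http://rubyonrails.org"))) then prints the content-type the server sends together with the declared charset, if there is one.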
Oh, and if you can't use curl, here's an online HTTP header sniffer ... there are various other tools out there, too.
How to fix this?
It's usually pretty trivial to change your webserver's config to send your stuff with a different content-type header. For Apache, here's a bunch of useful information about overriding the default encoding with a .htaccess file. In your .htaccess file you could use, for example:
<Files "*.htm">
  ForceType "text/html; charset=UTF-8"
</Files>
This page contains a howto about configuring Mongrel's MIME types. Basically, you add some MIME types to a YAML file like config/mongrel_mime.yml:
---
.htm: text/html; charset=UTF-8
.html: text/html; charset=UTF-8
.txt: text/plain; charset=UTF-8
... and then specify this file when you start Mongrel, like this:
mongrel_rails start -m config/mongrel_mime.yml [...]