Friday, March 27, 2009

I18n and Haskell

The first things I tried to write in Haskell were GUI programs using Gtk and several console tools. That is the opposite of the usual way of learning Haskell: most Haskellers learn the language by solving hard algorithmic problems. So my first problem was not understanding monads, but using UTF-8 in the code and creating multilingual interfaces.

The first of those problems seems to have been solved already (though I'd like to see native UTF-8 support in Haskell), but the second has not. Today I'll try to fill that gap in the internationalization (also known as i18n) of Haskell programs.

The approach I'll describe is based on the GNU gettext utilities. All my experience with them comes from internationalizing Python applications, so I have adapted that experience to the Haskell world.

Let's start with an example. Suppose that we want to make the following program multilingual:

module Main where

import System.IO

main = do
  putStrLn "Please enter your name:"
  name <- getLine
  putStrLn $ "Hello, " ++ name ++ ", how are you?"

Following the recommendations of the GNU gettext manual on preparing strings for translation, prepare the strings and wrap them in a 'translation' function '__':

module Main where

import System.IO
import Text.Printf

-- Placeholder: returns every string unchanged for now.
__ = id

main = do
  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name

We will return to the definition of '__' a bit later; for now it is just the identity function (id), which returns its argument unchanged.

The next step is to generate a POT file (a template which contains all the strings that need to be translated). For Python, C, C++ and Scheme there is the xgettext utility, but it doesn't support Haskell. So I created a simple utility that does the same thing for Haskell files: hgettext. You can find it on Hackage.

Now, from the directory that contains your project, run this command:

hgettext -k __ -o messages.pot Main.hs

It will gather all the strings wrapped in the function '__' from Main.hs and write them to messages.pot.
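To give a feel for what such an extraction step involves, here is a deliberately naive sketch. It is not how hgettext itself is implemented, and the name extractStrings is made up for this illustration. It scans the source text for the keyword followed by a space and a string literal, so it will also match occurrences inside comments or longer identifiers, which a real tool has to avoid.

```haskell
module Main where

import Data.List (isPrefixOf, tails)

-- | Naive extraction (hypothetical helper, not part of hgettext):
-- collect every string literal that directly follows the keyword and
-- a single space, e.g.  __ "some text".
-- 'reads' decodes the literal the way Haskell would, escapes included.
extractStrings :: String -> String -> [String]
extractStrings keyword source =
  [ literal
  | rest <- tails source
  , (keyword ++ " \"") `isPrefixOf` rest
  , (literal, _) <- take 1 (reads (drop (length keyword + 1) rest))
  ]

main :: IO ()
main = do
  -- A one-line stand-in for the contents of Main.hs.
  let source = "main = putStrLn (__ \"Please enter your name:\")"
  mapM_ putStrLn (extractStrings "__" source)
```

Each extracted string would then become one msgid entry in the template.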

Now look at the resulting POT file:


# Translation file

msgid ""
msgstr ""

"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-13 06:05-0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: Main.hs:0
msgid "Please enter your name:"
msgstr ""

#: Main.hs:0
msgid "Hello, %s, how are you?\n"
msgstr ""

We are interested in the last part of this file -- the parts beginning with #: Main.hs:.... Each is followed by a pair of lines beginning with msgid and msgstr. msgid is the original text from the code, and msgstr is the translated string. Each language should have its own translation file. I will create two translations: German and English.
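To make the msgid/msgstr pairing concrete, here is a minimal sketch that reads such pairs into a lookup table. The names Catalog, parseCatalog and translate are invented for this example; gettext itself works on compiled binary files, not on PO text, and this sketch ignores multi-line strings, plural forms and message contexts.

```haskell
module Main where

import Data.List (stripPrefix)
import qualified Data.Map as Map
import Text.Printf (printf)

-- | Maps each original string (msgid) to its translation (msgstr).
type Catalog = Map.Map String String

-- | Extract the quoted text from a line such as:  msgid "Hello"
-- 'reads' decodes escapes like a Haskell string literal, which is
-- close enough to PO syntax for this sketch.
field :: String -> String -> Maybe String
field key line = do
  rest <- stripPrefix (key ++ " ") line
  case reads rest of
    [(text, "")] -> Just text
    _            -> Nothing

-- | Pair every msgid line with the msgstr line that follows it.
-- The header entry (empty msgid) is skipped.
parseCatalog :: String -> Catalog
parseCatalog = Map.fromList . go . lines
  where
    go (a : b : rest)
      | Just k <- field "msgid" a
      , Just v <- field "msgstr" b
      , not (null k) = (k, v) : go rest
    go (_ : rest) = go rest
    go []         = []

-- | Like gettext, fall back to the original string when there is no
-- translation (a missing entry or an empty msgstr).
translate :: Catalog -> String -> String
translate catalog msg =
  case Map.lookup msg catalog of
    Just t | not (null t) -> t
    _                     -> msg

main :: IO ()
main = do
  let german = parseCatalog (unlines
        [ "#: Main.hs:0"
        , "msgid \"Hello, %s, how are you?\\n\""
        , "msgstr \"Hallo, %s, wie geht es Ihnen?\\n\""
        ])
  printf (translate german "Hello, %s, how are you?\n") "Bond"
```

The fallback in translate mirrors what happens with an unfilled template: every msgstr is empty, so the program keeps printing the original strings.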

To create a PO file for a specific locale we use the msginit utility.
To generate the German translation template, run:

msginit --input=messages.pot --locale=de.UTF-8

And for English translations run:

msginit --input=messages.pot --locale=en.UTF-8

If we look at the generated files (en.po and de.po), we will see that the English translation is already completely filled in; only the German PO file needs to be edited. So we fill it in with the following strings:

#: Main.hs:0
msgid "Please enter your name:"
msgstr "Wie heißen Sie?"

#: Main.hs:0
msgid "Hello, %s, how are you?\n"
msgstr "Hallo, %s, wie geht es Ihnen?\n"

Now we have to create the directories where these translations will be placed. By default, translation files live under /usr/share/locale/, but you are free to choose a different place. Run:

mkdir -p {de,en}/LC_MESSAGES

This will create two sub-directories, 'de' and 'en', in the current directory, each containing LC_MESSAGES. Now we use the msgfmt tool to compile our PO files into MO files (binary translation files):

msgfmt --output-file=en/LC_MESSAGES/hello.mo en.po
msgfmt --output-file=de/LC_MESSAGES/hello.mo de.po
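The directory names matter because gettext builds the path to a message file from the locale directory, the locale name, the category and the domain, and it falls back from a full locale name such as de_DE.UTF-8 to shorter ones such as de. The sketch below only illustrates that layout; localeCandidates and moPath are hypothetical helpers, and the fallback order shown is a simplification (locale modifiers such as @euro are ignored).

```haskell
module Main where

import Data.List (nub)
import System.FilePath ((</>), (<.>))

-- | Candidate locale names, most specific first (simplified).
-- "de_DE.UTF-8" is split into language "de", territory "_DE" and
-- codeset ".UTF-8"; missing parts are simply empty.
localeCandidates :: String -> [String]
localeCandidates locale =
  nub [lang ++ terr ++ code, lang ++ terr, lang ++ code, lang]
  where
    (base, code) = break (== '.') locale
    (lang, terr) = break (== '_') base

-- | Path of the message file for one domain and one locale name.
moPath :: FilePath -> String -> String -> FilePath
moPath dir domain locale =
  dir </> locale </> "LC_MESSAGES" </> domain <.> "mo"

main :: IO ()
main =
  -- List the files that could serve the "hello" domain for a German user.
  mapM_ (putStrLn . moPath "." "hello") (localeCandidates "de_DE.UTF-8")
```

The last candidate is the de/LC_MESSAGES/hello.mo file created above, which is why a single 'de' directory is enough for users running under de_DE.UTF-8.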

Ok, now the preparatory tasks are done. The final step is to modify the code to support the internationalization:

module Main where

import System.IO
import Text.Printf
import Text.I18N.GetText
import System.Locale.SetLocale
import System.IO.Unsafe

-- Look up the translation of a string in the current text domain.
__ :: String -> String
__ = unsafePerformIO . getText

main = do
  -- Initialize the locale and tell gettext where the translations are.
  setLocale LC_ALL (Just "")
  bindTextDomain "hello" "."
  textDomain "hello"

  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name

Here we added three initialization calls:

setLocale LC_ALL (Just "")
bindTextDomain "hello" "."
textDomain "hello"

You'll have to install the setlocale package to get the first function; called with (Just ""), it sets the current locale from the environment. The next two functions tell gettext to take the "hello.mo" message file from the given locale directory (I set it to ".", but in the general case this directory should come from the package configuration).

The final step is to define the function '__'. It simply calls getText from the module Text.I18N.GetText. Its type is String -> IO String, so I used unsafePerformIO to make it simpler to use. I wrote the GetText library myself, so maybe in the future it will be possible to implement a version of getText that works outside the IO monad.
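One way to avoid unsafePerformIO altogether, sketched here as an alternative design rather than as part of the GetText library, is to pass the translation function to the code that needs it. The names Translator, greeting and german are invented for this example, and the German table is hard-coded only to keep the sketch self-contained.

```haskell
module Main where

import Text.Printf (printf)

-- | A translator is any pure function from original text to translated text.
type Translator = String -> String

-- | Pure: the result depends only on the translator and the name.
greeting :: Translator -> String -> String
greeting tr name = printf (tr "Hello, %s, how are you?") name

-- | Hypothetical hard-coded translator standing in for a real catalog.
german :: Translator
german "Hello, %s, how are you?" = "Hallo, %s, wie geht es Ihnen?"
german other                     = other

main :: IO ()
main = do
  putStrLn (greeting id "Bond")      -- no translation
  putStrLn (greeting german "Bond")  -- German
```

The price is an extra argument on every function that produces user-visible text, which is exactly the convenience the global '__' buys with unsafePerformIO.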

Now you can build and try the program in different locales:

user> ghc --make Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking Main ...

user> LC_ALL=en_US.UTF-8 ./Main
Please enter your name:
Bond
Hello, Bond, how are you?

user> LC_ALL=de_DE.UTF-8 ./Main
Wie heißen Sie?
Bond
Hallo, Bond, wie geht es Ihnen?

user>

That's all :) Really, it was much simpler than writing this blog entry. I hope this article will be helpful to you.

PS: hgettext is on Hackage now

PPS: Thanks to Michael Thompson, who corrected my poor English :)

7 comments:

  1. Hey, that's great!

    Perhaps a solution to the unsafe IO would be to make an IO function to run at startup that pre-caches all the translations and returns a pure object (a finite map of the translations), which would be passed to the translation function to lookup individual strings in the rest of the code.

  2. Conrad,

    In this case, we'll have to pass this translation function to every function that needs translation. I don't see how we could use a global translation function (outside the IO monad).

    Consider the following example. Suppose we have the function

    translate :: String -> String

    It should be pure (if we don't want to use unsafePerformIO).
    So the call

    translate "Hello"

    should return the same string every time, regardless of the environment and context. But this is not true for gettext: it returns different strings according to the current locale settings.

  3. Hi!

    It's nice to see this mini-tutorial on the web and have Haskell bindings for gettext. Thanks, Василь!

    Just one comment. Please do not split a phrase into two. Word order differs in different languages. The right way would be to translate “Hello, %s, how are you?” as a whole.

    Could you adjust your example accordingly?

  4. Сергей,

    Thank you for your comment.

    The GNU gettext manual has a useful section about how to prepare strings for translation: http://www.gnu.org/software/gettext/manual/gettext.html#Preparing-Strings.
    I should have followed these guidelines in my post.

    I've updated the post and the wiki.

  5. You could consider embedding the translation files directly into the source code using Template Haskell. I've written a library to do this called file-embed (http://hackage.haskell.org/package/file-embed). Then looking up values would truly be a pure operation.

  6. Michael,

    You wrote a nice lib. It reminded me of how I used to include resource files in win32 executables; with Haskell this is much simpler :)

    But this approach has several disadvantages for i18n. For example, translators would have to install a Haskell compiler to write translation files and merge them into the program. I also want to keep the GNU gettext translation pipeline, so that translators will not be confused by our programs :).

    But for creating a simple standalone binary your approach is the right thing, so I think I will add it to the library.

    PS: could you give me a link to the "best Template Haskell tutorial ever", if it exists of course :), because I have tried to learn TH several times without any success.

  7. Can somebody help me, please?
    When I try to run:
    hgettext -k __ -o messages.pot Main.hs

    I get the error: hgettext is not recognized as an internal or external command...

    Thanks a lot!
