Saturday, March 28, 2009

hgettext on Hackage

I now have access to Haskell.org and Hackage, so you can get the hgettext package from Hackage by typing:

cabal install --global hgettext

There is also a wiki page on Haskell.org about the internationalization of Haskell applications.

From now on, only the wiki and Hackage will contain the most recent versions of the hgettext library and its documentation.

Friday, March 27, 2009

I18n and Haskell


The first things I tried to code in Haskell were UI programs with Gtk and several console tools. This is the opposite of the usual way of learning Haskell: Haskellers mostly learn the language by solving hard algorithmic tasks. So my first problem was not understanding monads, but how to use UTF-8 in the code and how to create multilingual interfaces.

The first of those problems seems to have been solved already (though I'd like to see native UTF-8 support in Haskell), but the second has not. Today I'll try to fill this gap in the internationalization (also known as i18n) of Haskell programs.

The approach I'll talk about is based on the GNU gettext utilities. All my experience with gettext comes from internationalizing Python applications, so I adapted that experience to the Haskell world.

Let's start with an example. Suppose that we want to make the following program multilingual:

module Main where

import System.IO

main :: IO ()
main = do
  putStrLn "Please enter your name:"
  name <- getLine
  putStrLn $ "Hello, " ++ name ++ ", how are you?"

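Following the gettext recommendations for preparing strings, wrap each user-visible string in a 'translation' function '__'. The greeting is now built with printf from a single format string instead of by concatenation, so that the translator sees the whole sentence: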

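module Main where

import System.IO
import Text.Printf

-- Placeholder translation function: returns its argument unchanged for now.
__ :: String -> String
__ = id

main :: IO ()
main = do
  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name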

We will return to the definition of '__' a bit later; for now it is just the identity function (id).
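Why translate a whole format string instead of concatenated fragments? The pure sketch below replaces gettext with a hypothetical in-memory table (`catalog`, built with Data.Map from the containers package). Like gettext, its '__' falls back to the original text when a translation is missing or empty. The names `catalog`, `greet` and `greetConcat` are illustrative assumptions, not part of hgettext.

```haskell
import qualified Data.Map as Map
import Text.Printf (printf)

-- Hypothetical in-memory catalog standing in for a compiled .mo file.
catalog :: Map.Map String String
catalog = Map.fromList
  [ ("Please enter your name:", "Wie heißen Sie?")
  , ("Hello, %s, how are you?\n", "Hallo, %s, wie geht es Ihnen?\n")
  ]

-- Look a message up; like gettext, return the original text when the
-- translation is missing or empty.
__ :: String -> String
__ msgid = case Map.lookup msgid catalog of
  Just msgstr | not (null msgstr) -> msgstr
  _                               -> msgid

-- The whole sentence is one msgid, so a translator may move %s anywhere.
greet :: String -> String
greet name = printf (__ "Hello, %s, how are you?\n") name

-- With concatenation the translator only ever sees the fragments
-- "Hello, " and ", how are you?\n", and the word order is fixed by the code.
greetConcat :: String -> String
greetConcat name = __ "Hello, " ++ name ++ __ ", how are you?\n"
```

Here `greet "Bond"` yields the German sentence, while `greetConcat "Bond"` stays English because none of its fragments is in the catalog, and even if they were, a language with a different word order could not be served.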

The next step is to generate a POT file (a template containing all the strings that need to be translated). For Python, C, C++ and Scheme there is the xgettext utility, but it doesn't support Haskell, so I created a simple utility that does the same thing for Haskell files: hgettext. You can find it on Hackage.

Now, from the directory that contains your project, run this command:

hgettext -k __ -o messages.pot Main.hs

It gathers all strings passed to the function '__' in Main.hs and writes them to messages.pot.

Now look at the resulting POT file:

# Translation file

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-13 06:05-0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: Main.hs:0
msgid "Please enter your name:"
msgstr ""

#: Main.hs:0
msgid "Hello, %s, how are you?\n"
msgstr ""

We are interested in the last part of this file: the entries beginning with '#: Main.hs:...'. Each of these comments is followed by a pair of lines beginning with msgid and msgstr: msgid is the original text from the code, and msgstr is its translation. Each language should have its own translation file. I will create two translations: German and English.
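The entry format is simple enough to generate by hand. As an illustration (not hgettext's actual implementation), here is a sketch that renders extracted messages as POT entries. The `Message` type and the function names are hypothetical, and the header block shown above is omitted.

```haskell
import Data.List (intercalate)

-- One extracted message: source file, line number and the original text.
data Message = Message
  { msgFile :: FilePath
  , msgLine :: Int
  , msgText :: String
  }

-- Escape a string the way PO files expect (C-like escapes).
escapePo :: String -> String
escapePo = concatMap esc
  where
    esc '\\' = "\\\\"
    esc '"'  = "\\\""
    esc '\n' = "\\n"
    esc '\t' = "\\t"
    esc c    = [c]

-- Render one entry: a reference comment, the msgid and an empty msgstr.
renderEntry :: Message -> String
renderEntry (Message file line text) = unlines
  [ "#: " ++ file ++ ":" ++ show line
  , "msgid \"" ++ escapePo text ++ "\""
  , "msgstr \"\""
  ]

-- Entries are separated by a blank line, as in the file above.
renderPot :: [Message] -> String
renderPot = intercalate "\n" . map renderEntry
```

Note that a real newline in the Haskell string becomes the two characters `\n` in the file, which is why the msgid above reads "Hello, %s, how are you?\n".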

To create a PO file for a specific locale, use the msginit utility.
To generate the German translation file, run:

msginit --input=messages.pot --locale=de.UTF-8

And for the English translation, run:

msginit --input=messages.pot --locale=en.UTF-8

If we look at the generated files (en.po and de.po), we will see that the English translation is already completely filled in, so only the German PO file needs to be edited. Fill it with the following strings:

#: Main.hs:0
msgid "Please enter your name:"
msgstr "Wie heißen Sie?"

#: Main.hs:0
msgid "Hello, %s, how are you?\n"
msgstr "Hallo, %s, wie geht es Ihnen?\n"
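To check what a PO file actually contains, here is a deliberately minimal reader. It is a sketch that only understands single-line msgid/msgstr pairs like the ones above: multi-line strings, plural forms, msgctxt and CRLF line endings are not handled, and the function names are hypothetical. Use the gettext tools for real work.

```haskell
import Data.List (isPrefixOf)

-- Undo the common C-style escapes used inside PO strings.
unescapePo :: String -> String
unescapePo ('\\' : 'n' : rest)  = '\n' : unescapePo rest
unescapePo ('\\' : 't' : rest)  = '\t' : unescapePo rest
unescapePo ('\\' : '"' : rest)  = '"' : unescapePo rest
unescapePo ('\\' : '\\' : rest) = '\\' : unescapePo rest
unescapePo (c : rest)           = c : unescapePo rest
unescapePo []                   = []

-- Extract the string from a line such as: msgid "Hello"
quoted :: String -> String -> Maybe String
quoted keyword line
  | prefix `isPrefixOf` line
  , length line > length prefix
  , last line == '"'
  = Just (unescapePo (init (drop (length prefix) line)))
  | otherwise = Nothing
  where
    prefix = keyword ++ " \""

-- Pair each msgid line with the msgstr line right after it, skipping
-- comments, blank lines and the header entry (whose msgid is empty).
parsePo :: String -> [(String, String)]
parsePo = filter (not . null . fst) . go . lines
  where
    go (a : b : rest)
      | Just k <- quoted "msgid" a
      , Just v <- quoted "msgstr" b
      = (k, v) : go rest
    go (_ : rest) = go rest
    go [] = []
```

Applied to the German entries above, it returns the two (original, translation) pairs, with the escaped `\n` turned back into a real newline.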

Now we have to create the directories where these translations will be placed. System-wide translation files normally live in /usr/share/locale/, but you are free to choose a different place. Run:

mkdir -p {de,en}/LC_MESSAGES

This creates two subdirectories, 'de' and 'en', each containing an LC_MESSAGES directory, in the current directory. Now we use the msgfmt tool to compile our PO files into MO files (binary translation files):

msgfmt --output-file=en/LC_MESSAGES/hello.mo en.po
msgfmt --output-file=de/LC_MESSAGES/hello.mo de.po
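The MO layout is documented in the GNU gettext manual: a 28-byte header (magic number 0x950412de, format revision, number of strings N, offset of the table of original strings, offset of the table of translations, size and offset of an optional hash table), followed by two tables of (length, offset) descriptors that point at NUL-terminated strings. The sketch below writes and reads that layout for illustration only. It assumes little-endian files, omits the hash table and does almost no validation; `encodeMo` and `decodeMo` are hypothetical names, and msgfmt remains the tool to use.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, shiftR, (.|.))
import Data.List (sortOn)
import Data.Word (Word32)

-- Encode a 32-bit word in little-endian byte order.
le32 :: Word32 -> B.ByteString
le32 w = B.pack [fromIntegral (w `shiftR` s) | s <- [0, 8, 16, 24]]

-- Read a little-endian 32-bit word starting at a byte offset.
readLe32 :: B.ByteString -> Int -> Word32
readLe32 bs off =
  foldr (\i acc -> acc `shiftL` 8 .|. fromIntegral (B.index bs (off + i))) 0 [0 .. 3]

-- Write a minimal MO file: header, table of originals, table of
-- translations, then the NUL-terminated strings. Originals are sorted
-- because gettext binary-searches them; the hash table size is 0.
encodeMo :: [(B.ByteString, B.ByteString)] -> B.ByteString
encodeMo pairs = B.concat (header : map descriptor descs ++ strings)
  where
    sorted = sortOn fst pairs
    n = length sorted
    origTable = 28                   -- the header is seven 32-bit words
    transTable = origTable + 8 * n   -- each descriptor is (length, offset)
    dataStart = transTable + 8 * n
    allStrings = map fst sorted ++ map snd sorted
    -- each string is followed by a NUL byte that its length does not count
    offsets = scanl (\o s -> o + B.length s + 1) dataStart allStrings
    descs = zip (map B.length allStrings) offsets
    descriptor (len, off) = le32 (fromIntegral len) <> le32 (fromIntegral off)
    strings = [s <> B.singleton 0 | s <- allStrings]
    header = B.concat (map le32
      [ 0x950412de                 -- magic number
      , 0                          -- file format revision
      , fromIntegral n             -- number of strings
      , fromIntegral origTable     -- offset of the original-string table
      , fromIntegral transTable    -- offset of the translation table
      , 0                          -- hash table size (no hash table)
      , fromIntegral dataStart ])  -- hash table offset (unused here)

-- Read the (original, translation) pairs back from a little-endian file.
-- A truncated or corrupt file can make B.index throw.
decodeMo :: B.ByteString -> Maybe [(B.ByteString, B.ByteString)]
decodeMo bs
  | B.length bs < 28 || word 0 /= 0x950412de = Nothing
  | otherwise = Just [(str (o + 8 * i), str (t + 8 * i)) | i <- [0 .. n - 1]]
  where
    word = readLe32 bs
    n = fromIntegral (word 8) :: Int
    o = fromIntegral (word 12) :: Int
    t = fromIntegral (word 16) :: Int
    -- a descriptor at offset d holds the string's length, then its offset
    str d = B.take (fromIntegral (word d)) (B.drop (fromIntegral (word (d + 4))) bs)
```

Because the originals are stored sorted, gettext can find a msgid by binary search even when the hash table is absent.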

OK, the preparatory tasks are done. The final step is to modify the code to support internationalization:

module Main where

import System.IO
import System.IO.Unsafe (unsafePerformIO)
import System.Locale.SetLocale
import Text.I18N.GetText
import Text.Printf

-- Translate a message via gettext; unsafePerformIO hides the IO of getText.
__ :: String -> String
__ = unsafePerformIO . getText

main :: IO ()
main = do
  setLocale LC_ALL (Just "")
  bindTextDomain "hello" "."
  textDomain "hello"

  putStrLn (__ "Please enter your name:")
  name <- getLine
  printf (__ "Hello, %s, how are you?\n") name

Here we added three initialization lines:

setLocale LC_ALL (Just "")
bindTextDomain "hello" "."
textDomain "hello"

You'll need to install the setlocale package for the first function, which sets the current locale from the environment (passing Just "" makes it read the LC_ALL, LC_* and LANG variables). The next two functions tell gettext to look for the "hello.mo" message file under the given locale directory (I set it to "." here, but in general this directory should come from the package configuration).
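A 'de' directory is enough even when the program runs under a locale such as de_DE.UTF-8, because gettext tries progressively less specific locale names. The following is a simplified model of that search order, not gettext's real code: actual gettext also tries a normalized codeset such as "utf8" and honours the LANGUAGE variable, and the function names here are hypothetical.

```haskell
import Data.List (nub)

-- Split a locale name such as "de_DE.UTF-8" into language ("de"),
-- territory ("_DE") and codeset (".UTF-8"). A trailing modifier such as
-- "@euro" is simply dropped in this sketch.
splitLocale :: String -> (String, String, String)
splitLocale loc = (lang, terr, codeset)
  where
    withoutModifier = takeWhile (/= '@') loc
    (lang, rest) = break (`elem` "_.") withoutModifier
    (terr, codeset) = break (== '.') rest

-- Locale names to try, from most to least specific (simplified).
candidates :: String -> [String]
candidates loc = nub [lang ++ terr ++ codeset, lang ++ terr, lang ++ codeset, lang]
  where
    (lang, terr, codeset) = splitLocale loc

-- Catalog files probed for a directory (as passed to bindTextDomain),
-- a locale name and a text domain.
catalogPaths :: FilePath -> String -> String -> [FilePath]
catalogPaths dir loc domain =
  [dir ++ "/" ++ l ++ "/LC_MESSAGES/" ++ domain ++ ".mo" | l <- candidates loc]
```

For the example in this post, `catalogPaths "." "de_DE.UTF-8" "hello"` ends with "./de/LC_MESSAGES/hello.mo", the file msgfmt produced above.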

The final step is to define the function '__'. It simply calls getText from the module Text.I18N.GetText. Its type is String -> IO String, so I used unsafePerformIO to make it easier to use. I wrote the GetText library myself, so maybe in the future it will be possible to implement a version of getText that works outside the IO monad.

Now you can build the program and try it in different locales (the locales must be installed on your system):

user> ghc --make Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking Main ...

user> LANG=en_US.UTF-8 ./Main
Please enter your name:
Bond
Hello, Bond, how are you?

user> LANG=de_DE.UTF-8 ./Main
Wie heißen Sie?
Bond
Hallo, Bond, wie geht es Ihnen?

user>

That's all :) Really, it was much simpler than writing this blog entry. I hope this article will be helpful to you.

PS: hgettext is on Hackage now

PPS: Thanks to Michael Thompson, who corrected my poor English :)

Less code - more functionality

Yesterday I tried to implement a "simple" function. It should take Haskell source code and return a list of all the arguments passed to the function

abc :: String -> String

For example, for this piece of code:

xy = (abc "hello") ++ (abc "world")

main = do putStrLn (abc "hi")
          putStrLn xy
          putStrLn (abc "bye")

the output should be:

["hello", "world", "hi", "bye"]


Haskell has a library, Language.Haskell.Parser (from the haskell-src package), for parsing Haskell source files, but its output has a very complex structure. For example, the code above is represented like this:


(HsModule (SrcLoc {srcFilename = "", srcLine = 1, srcColumn = 1})
(Module "Main") (Just [HsEVar (UnQual (HsIdent "main"))]) []
[HsPatBind (SrcLoc {srcFilename = "", srcLine = 1, srcColumn = 1})
(HsPVar (HsIdent "xy")) (HsUnGuardedRhs (HsInfixApp (HsParen (HsApp (HsVar (UnQual
(HsIdent "abc"))) (HsLit (HsString "hello")))) (HsQVarOp (UnQual (HsSymbol "++")))
(HsParen (HsApp (HsVar (UnQual (HsIdent "abc"))) (HsLit (HsString "world")))))) [],
HsPatBind (SrcLoc {srcFilename = "", srcLine = 3, srcColumn = 1})
(HsPVar (HsIdent "main")) (HsUnGuardedRhs (HsDo [HsQualifier (HsApp (HsVar
(UnQual (HsIdent "putStrLn"))) (HsParen (HsApp (HsVar (UnQual (HsIdent "abc")))
(HsLit (HsString "hi"))))),HsQualifier (HsApp (HsVar (UnQual (HsIdent "putStrLn")))
(HsVar (UnQual (HsIdent "xy")))),HsQualifier (HsApp (HsVar (UnQual (HsIdent "putStrLn")))
(HsParen (HsApp (HsVar (UnQual (HsIdent "abc"))) (HsLit (HsString "bye")))))])) []])


Ugghhhh, it looks terrible. The straightforward way to solve my task is to write a bunch of functions that walk through all these datatypes until they find something like
(HsApp (HsVar (UnQual (HsIdent "abc"))) (HsLit (HsString s)))
Maybe it is simpler to search through the Haskell code with regular expressions?

No. Let me introduce Template Haskell: I haven't used it yet, but I've heard that it is a very powerful part of Haskell. It works like C++ templates or macros, i.e. during program compilation. However, on Haskell-Cafe Neil Mitchell pointed me to the Uniplate generic library instead. Without exploring it deeply or understanding how it works, I wrote a single list comprehension that solves my problem:


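import Data.Generics.Uniplate.Data (universeBi)
import Language.Haskell.Parser (ParseResult(..), parseModule)
import Language.Haskell.Syntax
getParamList hscode = [x | ParseOk m <- [parseModule hscode],
  HsApp (HsVar (UnQual (HsIdent "abc"))) (HsLit (HsString x)) <- universeBi m]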


A really brilliant result. Even a beginner Haskeller can easily understand what is happening here.
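Uniplate's Data-based traversals build on GHC's Data class from base. To show the idea without extra packages, here is a rough sketch of a universeBi-like function written directly with gmapQ and cast, applied to a small hypothetical Expr type rather than haskell-src's real HsExp. Uniplate itself is more efficient; this naive version visits every node, including each Char of every String.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, gmapQ)
import Data.Typeable (Typeable, cast)

-- Hypothetical toy AST, much smaller than haskell-src's HsExp.
data Expr
  = Var String
  | Lit String
  | App Expr Expr
  | Paren Expr
  | Infix Expr String Expr
  | Do [Expr]
  deriving (Show, Data)

-- Every value of type b inside x (including x itself), in pre-order:
-- roughly what uniplate's universeBi returns.
universeOf :: (Data a, Typeable b) => a -> [b]
universeOf x = maybe id (:) (cast x) (concat (gmapQ universeOf x))

-- The same list-comprehension trick as above, on the toy AST.
abcArgs :: Data a => a -> [String]
abcArgs code = [s | App (Var "abc") (Lit s) <- universeOf code]
```

The pattern in the comprehension fixes the element type to Expr, so `cast` keeps only Expr nodes and the match silently skips everything else, including the outer `putStrLn` applications.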

PS: It is amazing how fast and helpful the Haskell community is. Thank you guys :)