Thursday, October 22, 2009

Working with databases in Haskell. Part 2

Here I want to describe how I deal with tagged document storage in Haskell, using the relational database sqlite3 (through HDBC). Yes, I know, that relational database work very poor with document-oriented and tagged data, but for a little amount of data it could be the proper way to implement storage.

The following code works with databases in the same way as described in my previous post, and written in literate Haskell style, so you can easily save it in the some_file.lhs, load it in ghci and play with database.

Task description

Let’s suppose we have to store documents with following properties:

  • author — the string, representing the author
  • title — document’s title
  • body — the content of the document
  • list of tags — tags, assigned to the documents

All parameters will be Strings for simplicity, and we will not put any restrictions to the tags format.

The most obvious way to store these data is to use single table with separate fields for each property. But how to deal with tags? Document could have any amount of tags, or have nothing. We couldn’t create a table with infinite columns. Other solution, is to store all tags in one text field, and represent them as comma-separated lists, but in this case we have to put some restrictions to the tags (e.g. they shouldn’t hold commas) and each query involves different string matching and string updating, which should be done on client side, and those not efficient.

But there is the third way. We, along with the database gurus, will use many-to-many relation to represent tagged documents.

Begin to code


> import Database.HDBC
> -- we will use sqlite3 driver here
> import Database.HDBC.Sqlite3
> import Control.Monad (liftM, when)
> -- all our database code will be enclosed in ReaderT monad
> -- see previous post for details
> import Control.Monad.Reader
> import Data.Maybe
> import Control.Applicative

Define the internal Haskell types, from/to which we will serialize data:


> type TagList = [String]
>
> data Document = Doc {
> author :: String
> , title :: String
> , content :: String
> } deriving (Eq, Show)
>
> -- Our transaction type
> type Transact a = ReaderT Connection IO a

Define some helper functions to work with database and the function that runs our transaction monad


> -- Get database connection object
> conn :: Transact Connection
> conn = ask
>
> -- Query the database without result (update queries)
> query :: String -> [SqlValue] -> Transact ()
> query q v = do
> c <- conn
> liftM (const ()) $ liftIO $ run c q v
>
> -- Query the database with returning result
> query' :: String -> [SqlValue] -> Transact [[SqlValue]]
> query' q v = do
> c <- conn
> liftIO $ quickQuery' c q v
>
> -- Run the transaction
> runTransact :: String -> Transact a -> IO a
> runTransact dbname io = do
> c <- connectSqlite3 dbname
> runReaderT withCommit c
> where
> withCommit = do
> result <- io
> conn >>= liftIO . commit
> return result

OK, all preparations are done, now start implementing the core of our document data-store.

In order to implement many-to-many relation, we have to create three tables: one for documents, one for tags and one is relation between tags and documents.

Creating databases


> -- Create tables
> createDocumentsTbl =
> query "CREATE TABLE documents ( \
> \docid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, \
> \author TEXT NOT NULL, \
> \title TEXT NOT NULL, \
> \content TEXT NOT NULL, \
> \UNIQUE (author, title, content))" []
>
> createTagsTbl =
> query "CREATE TABLE tags ( \
> \tagid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, \
> \tag TEXT NOT NULL UNIQUE )" []
>
> createDocTagsTbl =
> query "CREATE TABLE doctags ( \
> \docid INTEGER, \
> \tagid iNTEGER, \
> \PRIMARY KEY (docid, tagid), \
> \FOREIGN KEY (docid) REFERENCES documents (docid), \
> \FOREIGN KEY (tagid) REFERENCES tags (tagid))" []
>
> -- Checks whether tables created, and create them otherwise
> initializeDb = do
> tables <- conn >>= liftIO . getTables
> mapM_ (\(tn,cf) -> when (tn `notElem` tables) cf)
> [("documents", createDocumentsTbl)
> ,("tags", createTagsTbl)
> ,("doctags", createDocTagsTbl)]

Now you can load this file to the ghci and execute:

runTransact "test.db" initializeDb

then check the database with sqlite3 test.db command and see, that all tables were successfully created.

Low-level access to the databases

Lets start implementing basic functions to work with our documents:


> -- Adds new tag to the database, do nothing if tag exists
> addTag tag =
> query "INSERT OR IGNORE INTO tags (tag) VALUES (?)"
> [toSql tag]
>
> -- Finds tag by name, returns id or Nothing
> findTag tag =
> liftM (idListToMaybe) $
> query' "SELECT tagid FROM tags WHERE tag = ?" [toSql tag]
>
> -- Return tag name by id
> getTag tagid =
> liftM (fmap (fromSql . head) . listToMaybe) $
> query' "SELECT tag FROM tags WHERE tagid = ?" [tagid]
>
> -- Remove tag from database
> removeTagById tagid =
> query "DELETE FROM tags WHERE tagid = ?" [tagid]
>
> -- Find document by name, author and content. Return docid or Nothing
> findDocument doc =
> liftM (idListToMaybe) $
> query' "SELECT docid FROM documents \
> \WHERE author = ? AND title = ? AND content = ?" $
> docToSqlValue doc [author, title, content]
>
> -- Adds new document document to the database
> addDocument doc =
> query "INSERT OR IGNORE INTO documents (author, title, content) \
> \VALUES (?,?,?)" $ docToSqlValue doc [author, title, content]
>
> -- Return document by id
> getDocument docid =
> liftM (listToDocument) $
> query' "SELECT author, title, content FROM documents \
> \WHERE docid = ?" [docid]
>
> -- Remove document from the database
> removeDocumentById docid =
> query "DELETE FROM documents WHERE docid = ?" [docid]
>
> -- Adds link between document and tag
> addLink docid tagid =
> query "INSERT OR IGNORE INTO doctags VALUES (?, ?)" $
> [docid, tagid]
>
> -- Remove link between document and tag
> removeLinkById docid tagid =
> query "DELETE FROM doctags WHERE docid = ? AND tagid = ?" $
> [docid, tagid]
>
> -- Get tags for document, returns list of tagids
> getDocumentTagsById docid =
> liftM (map head) $
> query' "SELECT tagid FROM doctags WHERE docid = ?" [docid]
>
> -- Get document ids for the tag id
> getTaggedDocumentsById tagid =
> liftM (map head) $
> query' "SELECT docid FROM doctags WHERE tagid = ?" [tagid]
>

High-level functions to work with data

So the basic functions, that create separate entities in the database are implemented, now we could use that functions to combine high-level operations:


> -- Adds new tag and return its id, or simply return id if tag exists
> -- we used fromJust here, because tag should exists
> addTag' tag = addTag tag >>
> (liftM fromJust $ findTag tag)
>
> -- Adds new document and its id, or simply return id if document exists
> addDocument' doc = addDocument doc >>
> (liftM fromJust $ findDocument doc)
>
> -- Adds complete document with tags
> addCompleteDocument doc tags = do
> docid <- addDocument' doc
> tagids <- mapM addTag' tags
> mapM_ (addLink docid) tagids
> return docid
>
> -- Get document tag names
> getDocumentTagNames docid = do
> tags <- getDocumentTagsById docid
> liftM catMaybes $ mapM getTag tags

And so on. By the way, last function getDocumentTagNames we could also implement as a single SQL query, using JOIN:


> -- Similar to `getDocumentTagNames` but implemented
> -- through SQL JOINs
> getDocumentTagNames' docid = do
> liftM (map (fromSql . head)) $
> query' "SELECT tag \
> \FROM tags LEFT JOIN doctags USING (tagid) \
> \WHERE docid = ?" [docid]

And finally, some helper functions, used before:


> -- Converts select result into Just v or Nothing
> idListToMaybe = fmap head . listToMaybe
>
> -- Converts Document structure to SqlValue list,
> -- the second argument - list of fields to convert
> docToSqlValue doc = map (toSql . ($ doc))
>
> -- Converts the list of SqlValues to the document
> listToDocument = fmap (toDoc . map fromSql) . listToMaybe
> where
> toDoc [a,b,c] = Doc a b c
> toDoc _ = Doc "" "" ""

Conclusion

I have just demonstrated how is easy to work with databases in Haskell, and also showed the basic principles of storing tagged documents, which is very important in the WEB (and often in desktop) applications.

The code is clearer than the code on Visual Basic or Delphi (from my memoirs) and simpler than code on Python. And I think this proves that Haskell is very good for real world applications.

No comments:

Post a Comment