Monday, July 11, 2011

The Toilet Paper Entrepreneur. Not a review

Many years ago I was fascinated with ideas shared by Robert Kiyosaki and others, that everyone could be a millionaire. The only is needed is to think that you’re already a millionaire and action accordingly.

That was easy. Only several hours a day, dreaming how would I spend my millions on yachts, helicopters, restaurants and all my dreams become true soon.

Yes this is bullshit, but for first-year student this looked very cool along with the fact, that book authors looked rich and successful people. By the way, I haven't spent much time on practicing this, because I was busy with other things, that fascinated me more - programming, studying, family. But I still know some people who believe in this, and still sometimes invite me to "dream together".

Ok, most of them, who believed in the easy money grove, went for work, gathering money, playing with children and ensured that there is no easy money. Each work or business requires much of efforts, nerves etc. If you want to get more money, than prepare to greater responsibility, or greater risk or both. Anyway, you'll have less time to sleep or to play with kids.

"But what about these lucky guys, from all these books? What about Page and Brin? Zukerberg? Other millionaires?" You'll ask. The answer is: "They're lucky, dude". Yes, they belong to the small number of population, we kindly name "lucky pants". I'm very happy these people exist and do cool stuff I'm using, but they couldn't act as examples. They're represent only 0.0001% of all businesses. Next 2-5% are ordinary businesses with ordinary income and 95% businesses are failed, or in a way of failing.

Anyway, I like to read books and when "The Toilet Paper Entrepreneur" comes through my eyes, I’ve decided to look on it.

To be shortly, this book left concurrent feelings. The good are that it is not big and reads easily. Also the author tells about hard way to success, contrary with previous dreaming-based methods. He also give there some fresh tips on how to start business without initial funding, how to keep away from angels and VCs on startup. There are also many tips on how to spend less or no money on advertising, web page, conference calls etc.

In the same time Mike presented a new population of entrepreneurs - TPE. But I don’t wannabe TPE, I just want to do what I like and get paid for it. And I don't think this book will create new move - TPE move.

Also, appealing to the lucky businesses and "The Secret" movie, left feeling that this book continues series of "Homebrew millionaire. Made easy" books, about which I have strong disagreement. Yep, believing in success should be present in all entrepreneurs, but pragmatic view should dominate.

I don't miss I've read this book; it has many interesting advices, and tends to be real. Also it prepares you to a hard way if you'll choose to be an entrepreneur. And not only toilet paper entrepreneur :)

Sunday, April 17, 2011

Installing gtk2hs 0.12 for the latest Haskell Platform on the Windows 7

This post only comments the original tutorial from haskell.org and describes extra action, one should perform to successfully install the Gtk bindings to Haskell without MinGW.


So, steps, you should perform:



  1. Read the original documentation on how to install Haskell for Windows


  2. Download all required files, mentioned im this document


  3. Install the Gtk to the location without spaces in the filenames. So the default location (C:\Program Files\Gtk+) won't work.


  4. Run command-line prompt, fill the INCLUDE and PKG_CONFIG_PATH with correct paths to Gtk+ and libxml2.


  5. run cabal install --ghc-option=-DCABAL_VERSION_MINOR=10 gtk



This is not all.


Now, some packages fails to register with error:


Registering cairo-0.12.0...
setup.exe: internal error: unexpected package db stack: [UserPackageDB]

This is because of bug (or mistake?) in Gtk2hsSetup.hs. So for these packages you should unpack them (e.g. cabal unpack cairo), move to the unpacked directory and change the following code in the Gtk2hsSetup.hs (line 199 of the file):


#if CABAL_VERSION_CHECK(1,10,0)
installedPkgInfo pkg lbi inplace [packageDb]
#else
installedPkgInfo pkg lbi inplace packageDb
#endif

to


#if CABAL_VERSION_CHECK(1,10,0)
installedPkgInfo pkg lbi inplace (withPackageDB lbi)
#else
installedPkgInfo pkg lbi inplace packageDb
#endif

Than, in the package root directory type:


cabal install --ghc-option=-DCABAL_VERSION_MINOR=10

Other errors I've faced with, and the solutions


Error in compiling pango package, the compiler couldn't find glib/glib.h file.


In the file pango-0.12.0\Graphics\Rendering\Pango\Struts.hsc change


#include <glib/glib.h>

to


#include <glib.h>

Error in compiling gio package, some names aren't defined (wmDriveStopButton, DriveStartStopType, DriveStartFlags)


This is because in the module exports these names are present, but defined in a body with macro:


#if GLIB_CHECK_VERSION(2,22,0)
...
#endif

My solution, is to wrap the definitions of these exports in the same macro. Fortunately this will not break the compilation.


After all these steps you should have a working installation of the Gtk2Hs. To check this, you could build and run examples from the gtk-0.12.0\demo directory.

Saturday, May 29, 2010

Released new version of ProjectTemplate. Now on Windows.

Recently I've uploaded new, more powerful version of my ProjectTemplate tool. Now it could create from templates single files, and don't require the project directory to be created. More info in the README file.

Also I've added support for Windows. It is not so beauty as for Linux platforms (on Windows it works without realine functionality). The Windows binaries are stored on Google Code.

Ironically, but ProjectTemplate helps me more in C++ coding, not in Haskell ;)

Tuesday, March 9, 2010

On creating Haskell projects. Cabal init.

In comments to my previous article on reddit I read about new feature in the cabal-install package (version 0.8.0) - cabal init. This is more convenient successor for mkcabal and I glad it now seamlessly integrated to this powerful project management tool.

This tool also supports rich command line interface, to avoid entering some common properties (such as license, version etc.) every time.

Friday, March 5, 2010

ProjectTemplate - a template-based tool for creating Haskell projects

Yesterday I created 5 projects, and found many boilerplate code on preparing cabal files, Setup.hs and directory structure. So I decided to simplify project creation for haskellers and created ProjectTemplate.

I was inspired by the Don’s mkcabal tool, and created the similar dialog-based application.

mkcabal, created by Don, is very simple tool, and it helps, when you start using Haskell, but this tool hasn’t any configuration and personalization of Haskell project, so I every time after creating project with mkcabal make some changes in the directory structure and file names.

ProjectTemplate is the template-based project creation tool. It was developed for ease creation of Haskell projects, but now I see, it could ease creation different directory structures, based on templates (for example web sites).

Templates are directories with files it them, which will be copied to the newly created project. Each template could contain complex directory hierarchy. Each file/directory name could be the plain name, or template name. Template name is created from the double underscore and parameter name, and in the project directory is substituted to the parameter value.

For example __ProjectName.cabal file with ProjectName parameter set to test, will be substituted to test.cabal. The same is true for directories.

ProjectTemplate also fills template files, where substitutes parameters to their values. The template file should have .template extension, which will be cut of in the destination file. In the template file parameters are enclosed in $ character. The example cabal file, with template parameters could look like:

name:                $ProjectName$
version: 0.1
synopsis:
description:
category: $Category$
license: BSD3
license-file: LICENSE
author: $Author$
maintainer: $Maintainer$
build-depends: base
build-type: Simple

executable: $ProjectName$
main-is: src/Main.hs
ghc-options:

ProjectTemplate defines following parameters: ProjectName, Day, Month, Year. All other parameters will be asked to user during the project creation.

You could look at sample templates in the ProjectTemplate repository, to better understanding how this tool works.

Monday, February 1, 2010

Using PageRank to calculate HackageRank

Previous article was introductory to the PageRank algorithm. Today I want to show one of the practical use of previous library.

Objectives

Hackage grows every day. Because of its centralized structure and the cabal tool, it is easy to install packages with all dependencies with single command.

But I always wonder, what packages are most important, and obviously present at almost all PC’s with Haskell. Sometimes looks strange, when simple library with several functions depends on more than 5 packages. Even when I have installed most popular packages, because my Haskell environment haven’t changed last year. Also I always have troubles with choosing the right library for Web server, or for work with database.

So I decided to apply PageRank algorithm to the Hackage. PageRank shows the probability of the random surfer to visit single page. Similarly HackageRank must show the probability for single package to be present on the random PC of the random haskeller.

The code

> module Main where
>
> -- use distribution package to read dependencies
> import Distribution.PackageDescription
> import Distribution.PackageDescription.Parse
> import Distribution.PackageDescription.Configuration
> import Distribution.Verbosity
> import Distribution.Version
> import Distribution.Package
>
> -- file utils, to enumerate directories
> import System.Directory
> import System.FilePath
> import System.Posix.Files
>
> import System.Environment
>
> import Control.Monad
> import Control.Applicative
> import Control.Arrow
>
> import qualified Data.ByteString.Char8 as B
>
> import qualified Data.Map as M
> import Data.Map (size, keys, fromListWith, toList)
>
> import qualified Data.Set as S
> import Data.Set (empty, union, fromList, toList)
>
> import Data.Maybe
> import Data.List (sortBy)
>
> import Text.Printf
>
> import Rank

Gathering all cabal files

All packages descriptions could be downloaded from the Hackage.

The path to the unpacked archive will be send through command line argument.

> isDir = liftA isDirectory . getFileStatus
> notDir = liftA not . isDir
>
> directoryListing p = (map (p </>) . filterUpperDirs) <$>
> getDirectoryContents p
> where
> filterUpperDirs = filter (`notElem` [".", ".."])
>
> cabalFiles root = do
> dir <- directoryListing root
> directories <- filterM isDir dir
> files <- filterM notDir dir
> subfiles <- concat <$> mapM cabalFiles directories
> return (files ++ subfiles)

Read package names and dependencies

I pack all strings into byte strings, to reduce memory usage (the PageRank algorithm itself heavily uses memory).

> packName pkg = let (PackageName pkgName) = packageName pkg
> in B.pack pkgName
>
> dependencyName (Dependency (PackageName name) _) = B.pack name
>
> packageDependencies pkg =
> let directDeps = buildDepends pkg
> allBuildInfoDeps =
> concatMap (uncurry (++) . (buildTools &&& pkgconfigDepends))
> (allBuildInfo pkg)
> in map dependencyName (directDeps ++ allBuildInfoDeps)
>
> parsePackageFile cabal = do
> pkg <- flattenPackageDescription <$> readPackageDescription silent cabal
> let deps = packageDependencies pkg
> return ((packName pkg, S.fromList deps) : [(x, S.empty) | x <- deps])
>

Generate the list of all packages and their dependencies

enumeratePackages simply maps each package name to integer value. Previous versions of PageRank algorithm run out of memory, when I tried to use strings in vertex names.

> -- we have a list of elements in the form of: [(name, deps set)]
> -- the name could be repeated in the list, so let's convert it to
> -- map, and combine all sets
> packages root = do
> largeList <- concat <$> (mapM parsePackageFile =<< cabalFiles root)
> return (M.fromListWith (S.union) (filter (not . B.null . fst) largeList))
>
> nameToIdx pkgsMap = M.fromList (zip (M.keys pkgsMap) ([0..]::[Int]))
> idxToName pkgsMap = M.fromList (zip ([0..]::[Int]) (M.keys pkgsMap))
>
> enumeratePackages pkgsMap =
> let indexes = nameToIdx pkgsMap
> getIndex name = fromJust (name `M.lookup` indexes)
> enumerator (n, d) = (getIndex n, map getIndex d)
> in map enumerator (M.toList (M.map S.toList pkgsMap))

Decode indexes to package names

Last function just simply decodes the integer values to the corresponding packages names.

> decodePackages pkgsMap ranks = 
> let names = idxToName pkgsMap
> getName idx = fromJust (idx `M.lookup` names)
> in map (\(i, r) -> (getName i, r)) ranks
>

Main function

> main :: IO ()
> main = do
> -- take the packages directory and number of iterations
> (dir:iters:_) <- getArgs
> -- read packages to the map
> packagesMap <- packages dir
> let hackageRankList = pageRankList 0.85
> (fromGraph (enumeratePackages packagesMap))
> lastRank = decodePackages packagesMap
> (fromRankGraph (hackageRankList !! (read iters)))
> sortedRank = sortBy (\(_, r1) (_, r2) -> compare r2 r1) lastRank
> -- printing rank to the screen
> let formatString (n, r) = printf "%10.5f - %s\n" r (B.unpack n)
> mapM_ formatString sortedRank

Some results

Here the Top20 of packages, by HackageRank:

 470.84702 - base
90.73473 - syb
68.48688 - ghc-prim
57.82218 - integer
57.56195 - integer-gmp
57.55739 - rts
57.52356 - integer-simple
45.63995 - array
34.83558 - mtl
34.49726 - bytestring
33.62577 - containers
20.91788 - directory
17.19401 - haskell98
16.18344 - old-time
14.68120 - filepath
14.65891 - old-locale
13.04511 - unix
13.03026 - parsec
11.40593 - random
10.29807 - process

Hm. Doesn’t tell anything :) Let’s try to rank packages, which provides interface to sqlite database:

   0.41222 - HDBC-sqlite3
0.33478 - hsql-sqlite
0.33478 - hsql-sqlite3
0.33024 - bindings-sqlite3
0.31262 - sqlite
0.29853 - haskelldb-hdbc-sqlite3
0.29853 - haskelldb-hsql-sqlite
0.29853 - haskelldb-hsql-sqlite3
0.29853 - hsSqlite3

Much better, now I know, that best tool to work with sqlite is HDBC.

I think, there could be another uses of these calculations, waiting for your ideas.

The complete list of HackageRank calculated for Jan 31, 2010 you can download here.

Sunday, January 31, 2010

Using Haskell to calculate PageRank

PageRank algorithm was originally developed by the Google’s founders for ranking pages by their links. But the idea of ranking, based on references to the items could be used not only in Web, but in other different fields, which have directed graph structure. This article, and simultaneously Literate Haskell code, shows how to calculate page rank with Haskell.

PageRank algorithm

PageRank algorithm uses random surfer model. It calculates the probability of visiting concrete page by some surfer, who randomly clicks on different links on the page.

Originally PageRank for page A is the:

PR(A)=PR(P1)/LC(P1)+PR(P2)/LC(P2)+...+PR(Pn)/LC(Pn)

where LC(Pi) — is number of outgoing links from Pi, and Pi is all pages, which links to the A.

When page hasn’t outgoing links, that we suppose, that it connects to every other page in the Web.

Also PageRank includes damping factor — a probability, that random surfer will continue clicking links on each step. Usually damping factor equals to 0.85.

The previous PageRank formula with damping factor D will look:

PR(A)=(1-D)+D(PR(P1)/LC(P1)+PR(P2)/LC(P2)+...+PR(Pn)/LC(Pn))

Because, the link graph have cycles (i.e. PR(A) could indirectly depend on PR(A)), than calculating on page rank is iterative — on each step we recalculate whole graph, getting it closer to real PageRank value.

Thus the algorithm should look like:

  1. Initialize PR(Ai) for all Ai with arbitrary value, simply 1

  2. Calculate PR(Ai) for all Ai

  3. Repeat step 2 specified times, or until the error will be relatively small

Implementation

> module Main where
>
> import System.Random
> import Control.Monad
> import Control.Applicative
> import Data.Maybe
> import System.Environment
>
> import qualified Data.IntMap as M
> import qualified Data.IntSet as S

Convert input data to more convenient structure

Our initial input data will have the form:

[(vertex, [vertexes to which current vertex links])]

vertexes, are integers, for more efficient implementation.

> -- Each vertex has its rank, number of outbound links and 
> -- set of inbound links
> data VertexProps = VertexProps {
> vertexRank :: Double,
> numOutboundLinks :: Int,
> inboundLinks :: S.IntSet
> } deriving Show
>
> isDangling (VertexProps _ 0 _) = True
> isDangling _ = False
>
> -- Our graph is map between vertex index and it properties
> type RankGraph = M.IntMap VertexProps
>
> fromGraph :: [(Int, [Int])] -> RankGraph
> fromGraph gr =
> let depPairs (v, l) = (v, VertexProps 1 (length l) S.empty) :
> [(x, VertexProps 0 0 (S.singleton v)) |
> x <- l]
> depList = concatMap depPairs gr
> catVertexes (VertexProps r1 l1 s1) (VertexProps r2 l2 s2) =
> VertexProps (r1+r2) (l1+l2) (s1 `S.union` s2)
> in M.fromListWith catVertexes depList
>

Implement single-step PageRank algorithm

> pageRank :: Double -> RankGraph -> RankGraph
> pageRank df gr = M.mapWithKey transformRank gr
> where transformRank v (VertexProps r l s) =
> VertexProps (calcRank s) l s
> -- calculate rank for a single vertex
> calcRank s = (1 - df) + df * (danglingScore + (S.fold incomR 0 s))
> -- sum of all linkVotes for vertixe vrt
> incomR vrt r = r + (linkVote vrt)
> linkVote v = let lvf (VertexProps r l _) = r / (fromIntegral l)
> in maybe 0 lvf (v `M.lookup` gr)
> -- calculate the dangling score
> -- sum all dangling page ranks and divide them to graph size
> danglingPages = filter (isDangling . snd) (M.toList gr)
> danglingScore = (sum (map (vertexRank . snd) danglingPages)) /
> (fromIntegral (M.size gr))
>
> pageRankList df gr = iterate (pageRank df) gr
>
> -- Convert from RankGraph to list [(id, rank)]
> fromRankGraph = map (\(v, r) -> (v, vertexRank r)) . M.toList

Testing

Let’s test our creation. I create simple main program, that will read graph size from the command-line and calculate rank with 20 iterations. To force evaluation, we’ll just print last element from the result of the last iteration. Here I only interested in algorithm’s efficiency.

Generate test graph

> createTestGraph :: Int -> Int -> IO [(Int, [Int])]
> createTestGraph numPages maxLinks =
> sequence [(,) p <$> outboundLinks p | p <- [1..numPages]]
> where
> outboundLinks p = do
> outLinksNum <- randomRIO (0, maxLinks)
> sequence [randomRangeNot p (1, numPages) |
> x <- replicate outLinksNum 0]
>
> randomRangeNot p (a, b) = do
> v <- randomRIO (a, b - 1)
> return (if v == p then b else v)
> main :: IO ()
> main = do
> (gSize:gLinks:_) <- getArgs
> rank <- pageRankList 0.85 . fromGraph <$>
> createTestGraph (read gSize) (read gLinks)
> putStrLn ("Graph size: " ++ gSize ++ "x" ++ gLinks)
> putStrLn ("20th iteration result: " ++
> (show (M.toList (rank !! 20) !! ((read gSize) - 1))))

My algorithm works 7 seconds to calculate 20 iterations of PageRank for a graph with 10000 vertexes. Not bad, but I will be glad to reduce the memory usage because 150MB is too big for the tiny set of data.