|author||Thomas Letan <email@example.com>||2020-02-04 18:13:38 +0100|
|committer||Thomas Letan <firstname.lastname@example.org>||2020-02-04 18:13:38 +0100|
Initial commit with previous content and a minimal theme
Diffstat (limited to 'site/posts/extensible-type-safe-error-handling.org')
1 files changed, 392 insertions, 0 deletions
diff --git a/site/posts/extensible-type-safe-error-handling.org b/site/posts/extensible-type-safe-error-handling.org
new file mode 100644
@@ -0,0 +1,392 @@
+<h1>Extensible Type-Safe Error Handling in Haskell</h1>
+#+TOC: headlines 2
+A colleague of mine introduced me to the benefits of [[https://crates.io/crates/error-chain][~error-chain~]], a crate which
+aims to implement /“consistent error handling”/ for Rust. I found the overall
+design pretty convincing, and in his use case, the crate really makes error
+handling clearer and more flexible. I knew /pijul/ uses ~error-chain~ too, but I
+never had the occasion to dig into it.
+At the same time, I have read quite a lot about /extensible effects/ in
+Functional Programming, for an academic article I have submitted to
+[[http://www.fm2018.org][Formal Methods 2018]][fn:fm2018]. In particular, the [[https://hackage.haskell.org/package/freer][freer]] package provides a very
+nice API to define monadic functions which may use well-identified effects. For
+instance, we can imagine that ~Console~ identifies the functions which may print
+to the standard output and read from the standard input. A function
+~askPassword~ which displays a prompt and gets the user password would have this
+type signature:
+askPassword :: Member Console r => Eff r ()
+Compared to ~IO~, ~Eff~ allows for meaningful type signatures. It becomes easier
+to reason about function composition, and you know that a function which
+lacks a given effect in its type signature will not be able to use it. As a
+predictable drawback, ~Eff~ can become burdensome to use.
+Basically, when my colleague showed me his Rust project and how he was using
+~error-chain~, the question popped up. *Can we use an approach similar to ~Eff~
+to implement a Haskell-flavoured ~error-chain~?*
+Spoiler alert: the answer is yes. In this post, I will dive into the resulting
+API, leaving for another time the details of the underlying
+implementation. Believe me, there is plenty to say. If you want to have a look
+already, the current implementation can be found on [[https://github.com/lethom/chain][GitHub]].
+In this article, I will use several “advanced” GHC pragmas. I will not explain
+each of them, but I will /try/ to give some pointers for the reader who wants to
+[fn:fm2018] If the odds are in my favour, I will have plenty of occasions to write
+more about this topic.
+* State of the Art
+This is not an academic publication, and my goal was primarily to explore the
+arcana of the Haskell type system, so I might have skipped the proper study of
+the state of the art. That being said, I have written programs in Rust and
+** Starting Point
+In Rust, ~Result<T, E>~ is the counterpart of ~Either E T~ in
+Haskell[fn:either]. You can use it to wrap either the result of a
+function (~T~) or an error encountered during this computation (~E~).
+Both ~Either~ and ~Result~ are used in order to achieve the same end, that is
+writing functions which might fail.
+On the one hand, ~Either E~ is a monad. It works exactly as ~Maybe~ (returning
+an error acts as a shortcut for the rest of the function), but gives you the
+ability to specify /why/ the function has failed. To deal with effects, the
+~mtl~ package provides ~ExceptT~ (known as ~EitherT~ in older versions of the
+~either~ package, and in the rest of this article), a transformer version of
+~Either~ to be used in a monad stack.
+On the other hand, the Rust language provides the ~?~ syntactic sugar, to
+achieve the same thing. That is, both languages provide you the means to write
+potentially failing functions without the need to care locally about failure. If
+your function ~B~ uses a function ~A~ which might fail, and you want ~B~ to fail
+whenever ~A~ fails, it becomes trivial.
+Out of the box, neither ~EitherT~ nor ~Result~ is extensible. The functions must
+use the exact same ~E~, or errors must be converted manually.
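+To make this limitation concrete, here is a minimal sketch using only ~base~.
+The error types ~ParseError~, ~MathError~ and ~AppError~, as well as the
+functions ~parseInt~, ~safeDiv~ and ~divideStrings~, are hypothetical names
+introduced for the illustration. Because the two helpers fail with different
+types, each call has to be converted by hand into a common wrapper.

```haskell
import Data.Bifunctor (first)

-- Two unrelated, hypothetical error types.
newtype ParseError = ParseError String deriving (Eq, Show)
data MathError = DivisionByZero deriving (Eq, Show)

-- A hand-written wrapper, needed because `Either e` fixes a single error type.
data AppError
  = FromParse ParseError
  | FromMath MathError
  deriving (Eq, Show)

parseInt :: String -> Either ParseError Int
parseInt s = case reads s of
  [(n, "")] -> Right n
  _         -> Left (ParseError ("not an integer: " ++ s))

safeDiv :: Int -> Int -> Either MathError Int
safeDiv _ 0 = Left DivisionByZero
safeDiv x y = Right (x `div` y)

-- Every call must be converted manually into the common error type.
divideStrings :: String -> String -> Either AppError Int
divideStrings a b = do
  x <- first FromParse (parseInt a)
  y <- first FromParse (parseInt b)
  first FromMath (safeDiv x y)
```

+Adding a third kind of failure means adding a constructor to ~AppError~ and
+touching every caller that pattern matches on it, which is the cost the rest of
+this article tries to avoid.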
+[fn:either] I wonder if they deliberately chose to swap the two type arguments.
+** Handling Errors in Rust
+Rust and the ~error-chain~ crate provide several means to overcome this
+limitation. In particular, Rust has the ~Into~ and ~From~ traits to ease the
+conversion from one error to another. Among other things, the ~error-chain~
+crate provides a macro to easily define a wrapper around many error types,
+basically your own and the ones defined by the crates you are using.
+I see several drawbacks to this approach. First, it is extensible only if you
+take the time to modify the wrapper type each time you want to consider a new
+error type. Second, you can either use one error type or every error
+However, the ~error-chain~ crate provides a way to solve a very annoying
+limitation of ~Result~ and ~Either~. When you “catch” an error, after a given
+function returns its result, it can be hard to determine where the error is
+coming from. Imagine you are parsing a very complicated source file, and the
+error you get is ~SyntaxError~ with no additional context. How would you feel?
+~error-chain~ solves this by providing an API to construct a chain of errors,
+rather than a single value.
+my_function().chain_err(|| "a message with some context")?;
+The ~chain_err~ function makes it easier to put a given error back in its
+context, which makes it possible to write more meaningful error messages for
+* The ResultT Monad
+~ResultT~ is an attempt to bring together the extensible power of ~Eff~ and
+the chaining of errors of ~chain_err~. I will admit that, for the latter, the
+current implementation of ~ResultT~ is probably less powerful, but to be honest
+I mostly cared about the “extensible” thing, so it is not very surprising.
+This monad is an alternative neither to monad stacks à la ~mtl~ nor to the
+~Eff~ monad. In its current state, it aims to be a more powerful and flexible
+version of ~EitherT~.
+As often in Haskell, the ~ResultT~ monad can be parameterised in several ways.
+data ResultT msg (err :: [*]) m a
+- ~msg~ is the type of messages you can stack to provide more context to error
+- ~err~ is a /row of errors/[fn:row], it basically describes the set of errors
+ you will eventually have to handle
+- ~m~ is the underlying monad stack of your application, knowing that ~ResultT~
+ is not intended to be stacked itself
+- ~a~ is the expected type of the computation result
+[fn:row] You might have noticed ~err~ is of kind ~[*]~. To write such a thing,
+you will need the [[https://www.schoolofhaskell.com/user/konn/prove-your-haskell-for-great-safety/dependent-types-in-haskell][DataKinds]] GHC pragma.
+** ~achieve~ and ~abort~
+The two main monadic operations which come with ~ResultT~ are ~achieve~ and
+~abort~. The former allows for building the context, by stacking so-called
+messages which describe what you want to do. The latter allows for bailing on a
+computation and explaining why.
+achieve :: (Monad m)
+ => msg
+ -> ResultT msg err m a
+ -> ResultT msg err m a
+~achieve~ should be used for ~do~ blocks. You can use ~<?>~ to attach a
+contextual message to a given computation.
+The type signature of ~abort~ is also interesting, because it introduces the
+~Contains~ typeclass (i.e., the counterpart of ~Member~ for ~Eff~).
+abort :: (Contains err e, Monad m)
+ => e
+ -> ResultT msg err m a
+This reads as follows: /“you can abort with an error of type ~e~ if and only if
+the row of errors ~err~ contains the type ~e~.”/
+For instance, imagine we have an error type ~FileError~ to describe
+filesystem-related errors. Then, we can imagine the following function:
+readContent :: (Contains err FileError, MonadIO m)
+ => FilePath
+ -> ResultT msg err m String
+We could leverage this function in a given project, for instance to read its
+configuration files (for the sake of the example, it has several configuration
+files). This function can use its own type to describe ill-formed description
+parseConfiguration :: (Contains err ConfigurationError, MonadIO m)
+ => String
+ -> String
+ -> ResultT msg err m Configuration
+To avoid repeating ~Contains~ when the row of errors needs to contain several
+elements, we introduce ~:<~[fn:top] (read /subset or equal/):
+getConfig :: ( '[FileError, ConfigurationError] :< err
+ , MonadIO m)
+ => ResultT String err m Configuration
+getConfig = do
+  achieve "get configuration from ~/.myapp directory" $ do
+    f1 <- readContent "~/.myapp/init.conf"
+            <?> "fetch the main configuration"
+    f2 <- readContent "~/.myapp/net.conf"
+            <?> "fetch the net-related configuration"
+    parseConfiguration f1 f2
+You might see, now, why I say ~ResultT~ is extensible. You can use two functions
+with totally unrelated errors, as long as the caller advertises that with
+~Contains~ or ~:<~.
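+To make the role of ~Contains~ more concrete, here is a minimal sketch of how
+such a membership constraint can be encoded. It is not the implementation used
+by the library: ~OneOf~, ~inject~, ~abortE~ and ~depth~ are hypothetical names,
+~abortE~ is a simplified stand-in for ~abort~ (no messages, no underlying
+monad), and the kind ~[*]~ is written ~[Type]~ as recent versions of GHC prefer.

```haskell
{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE GADTs                 #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeOperators         #-}

import Data.Kind (Type)

-- A value of exactly one of the types listed in the row `err`.
data OneOf (err :: [Type]) where
  Here  :: e -> OneOf (e ': err)
  There :: OneOf err -> OneOf (e ': err)

-- `Contains err e` holds when the type `e` appears somewhere in `err`.
class Contains (err :: [Type]) e where
  inject :: e -> OneOf err

-- Either `e` is the head of the row...
instance {-# OVERLAPPING #-} Contains (e ': err) e where
  inject = Here

-- ...or it has to be found in the tail.
instance Contains err e => Contains (e' ': err) e where
  inject = There . inject

-- Simplified stand-in for `abort`: fail with any error the row contains.
abortE :: Contains err e => e -> Either (OneOf err) a
abortE = Left . inject

data FileError = FileNotFound deriving (Eq, Show)
data ConfigurationError = IllFormed deriving (Eq, Show)

-- Position in the row of the error which has been raised.
depth :: OneOf err -> Int
depth (Here _)     = 0
depth (There rest) = 1 + depth rest
```

+With these definitions, ~abortE IllFormed~ typechecks against the row
+~'[FileError, ConfigurationError]~, whereas aborting with an error type absent
+from the row is rejected at compile time because no instance of ~Contains~ can
+be found.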
+[fn:top] If you are confused by ~:<~, it is probably because you were not aware
+of the [[https://ocharles.org.uk/blog/posts/2014-12-08-type-operators.html][TypeOperators]] extension before. Maybe it was for the best. :D
+** Recovering by Handling Errors
+Monads are traps: you can only escape them by playing by their
+rules. ~ResultT~ comes with ~runResultT~.
+runResultT :: Monad m => ResultT msg '[] m a -> m a
+This might be surprising: we can only escape from ~ResultT~ if we do not
+use /any errors at all/. In fact, ~ResultT~ forces us to handle errors before
+~ResultT~ provides several functions prefixed by ~recover~. Their type
+signatures can be a little confusing, so we will dive into the simplest one:
+recover :: forall e m msg err a.
+ (Monad m)
+ => ResultT msg (e ': err) m a
+ -> (e -> [msg] -> ResultT msg err m a)
+ -> ResultT msg err m a
+~recover~ allows for /removing/ an error type from the row of errors. To do
+that, it requires an error handler which determines what to do with the
+error raised during the computation and the stack of messages at that
+time. Using ~recover~, a function may use more errors than advertised in its
+type signature, but we know by construction that in such a case, it handles
+these errors so that it is transparent for the function user. The type of the
+handler is ~e -> [msg] -> ResultT msg err m a~, which means the handler /can
+raise errors if required/. ~recoverWhile msg~ is basically a synonym for
+~achieve msg $ recover~. ~recoverMany~ allows for doing the same with a row of
+errors, by providing as many functions as required. Finally, ~recoverManyWith~
+simplifies ~recoverMany~: you can provide only one function tied to a given
+typeclass, on the condition that the handled errors implement this typeclass.
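+The interplay between ~achieve~, ~recover~ and ~runResultT~ can be illustrated
+with a toy model. This is a sketch written for this explanation, not the actual
+implementation: the representation of ~ResultT~ as a wrapped ~Either~, the
+~OneOf~ type, and ~abortHead~ (a restricted ~abort~ which only raises the error
+at the head of the row) are all assumptions.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE EmptyCase      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Functor.Identity (Identity, runIdentity)
import Data.Kind (Type)

-- A value of exactly one of the types listed in the row `err`.
data OneOf (err :: [Type]) where
  Here  :: e -> OneOf (e ': err)
  There :: OneOf err -> OneOf (e ': err)

-- Toy model: either a result, or an error with its stack of messages.
newtype ResultT msg (err :: [Type]) m a =
  ResultT { unResultT :: m (Either ([msg], OneOf err) a) }

instance Functor m => Functor (ResultT msg err m) where
  fmap f (ResultT mx) = ResultT (fmap (fmap f) mx)

instance Monad m => Applicative (ResultT msg err m) where
  pure = ResultT . pure . Right
  ResultT mf <*> ResultT mx = ResultT $ do
    ef <- mf
    case ef of
      Left failure -> pure (Left failure)
      Right f      -> fmap (fmap f) mx

instance Monad m => Monad (ResultT msg err m) where
  ResultT mx >>= k = ResultT $ do
    ex <- mx
    case ex of
      Left failure -> pure (Left failure)
      Right a      -> unResultT (k a)

-- Restricted `abort`: raises the error at the head of the row.
abortHead :: Monad m => e -> ResultT msg (e ': err) m a
abortHead e = ResultT (pure (Left ([], Here e)))

-- Push a message on the stack if the wrapped computation fails.
achieve :: Monad m => msg -> ResultT msg err m a -> ResultT msg err m a
achieve msg (ResultT mx) = ResultT $ do
  ex <- mx
  pure $ case ex of
    Left (ctx, e) -> Left (msg : ctx, e)
    Right a       -> Right a

-- Handle the head of the row; any other error is passed along untouched.
recover :: Monad m
        => ResultT msg (e ': err) m a
        -> (e -> [msg] -> ResultT msg err m a)
        -> ResultT msg err m a
recover (ResultT mx) handler = ResultT $ do
  ex <- mx
  case ex of
    Right a                -> pure (Right a)
    Left (ctx, Here e)     -> unResultT (handler e ctx)
    Left (ctx, There rest) -> pure (Left (ctx, rest))

-- With an empty row, no error value can exist, so escaping is total.
runResultT :: Monad m => ResultT msg '[] m a -> m a
runResultT (ResultT mx) = do
  ex <- mx
  case ex of
    Right a              -> pure a
    Left (_, impossible) -> case impossible of {}

data FileError = FileNotFound FilePath deriving Show

-- Fails with a `FileError`, which is then handled, leaving an empty row.
example :: ResultT String '[] Identity String
example =
  recover
    (achieve "read configuration" (abortHead (FileNotFound "init.conf")))
    (\(FileNotFound path) ctx -> pure (path ++ " missing while: " ++ unwords ctx))

result :: String
result = runIdentity (runResultT example)
```

+In this model, ~recover~ is the only way to shrink the row, and ~runResultT~
+only accepts the empty row, so a computation cannot leave ~ResultT~ with an
+unhandled error.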
+Using ~recover~ and its siblings often requires helping the Haskell type
+system a bit, especially if we use lambdas to define the error handlers. Doing
+that is usually achieved with the ~Proxy a~ datatype (where ~a~ is a phantom
+type). I would rather use the TypeApplications[fn:tap] pragma.
+recoverManyWith @[FileError, NetworkError] @DescriptiveError
+  (do x <- readFromFile f
+      y <- readFromNetwork socket
+      printToStd x y)
+The ~DescriptiveError~ typeclass can be seen as a dedicated ~Show~, to give a
+textual representation of errors. It is inspired by the macros of ~error-chain~.
+We can start from an empty row of errors, and allow ourselves to
+use more errors thanks to the ~recover*~ functions.
+[fn:tap] The [[https://medium.com/@zyxoas/abusing-haskell-dependent-types-to-make-redis-queues-safer-cc31db943b6c][TypeApplications]] pragma is probably one of my favourites. When I
+use it, it feels almost as if I were writing some Gallina.
+* ~cat~ in Haskell using ResultT
+~ResultT~ only cares about error handling. The rest of the work is up to the
+underlying monad ~m~. That being said, nothing forbids us from providing a
+fine-grained API for, e.g., filesystem-related functions. From an error handling
+perspective, the functions provided by Prelude (the standard library of Haskell)
+are pretty poor, and the documentation is not really precise regarding the kind
+of errors we can encounter while using them.
+In this section, I will show you how we can leverage ~ResultT~ to *(i)* define an
+error-centric API for basic file management functions and *(ii)* use this API to
+implement a ~cat~-like program which reads a file and prints its content in the
+** (A Lot Of) Error Types
+We could have one sum type to describe in the same place all the errors we can
+find, and later use the pattern matching feature of Haskell to determine which
+one has been raised. The thing is, this is already the job done by the row of
+errors of ~ResultT~. Besides, this means that we could raise an error for not
+being able to write something into a file in a function which /opens/ a file.
+Because ~ResultT~ is intended to be extensible, we should rather define several
+types, so we can have a fine-grained row of errors. Of course, too many types
+will become burdensome, so this is yet another time where we need to find the
+newtype AlreadyInUse = AlreadyInUse FilePath
+newtype DoesNotExist = DoesNotExist FilePath
+data AccessDeny = AccessDeny FilePath IO.IOMode
+data EoF = EoF
+data IllegalOperation = IllegalRead | IllegalWrite
+To be honest, this is a bit too much for real life, but we are in a blog post
+here, so we should embrace the potential of ~ResultT~.
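+As an illustration, here is how two of these error types could be given a
+textual description through the ~DescriptiveError~ typeclass mentioned earlier.
+The shape of the typeclass (a single ~describe~ method returning a ~String~) and
+the wording of the messages are assumptions made for this sketch.

```haskell
import System.IO (IOMode (..))

-- Assumed shape of the typeclass: a single method returning a description.
class DescriptiveError e where
  describe :: e -> String

-- Two of the error types defined above, repeated to keep the sketch
-- self-contained.
newtype DoesNotExist = DoesNotExist FilePath
data AccessDeny = AccessDeny FilePath IOMode

instance DescriptiveError DoesNotExist where
  describe (DoesNotExist path) = "file " ++ path ++ " does not exist"

instance DescriptiveError AccessDeny where
  describe (AccessDeny path mode) =
    "cannot open " ++ path ++ " in " ++ show mode
```

+A single handler constrained by ~DescriptiveError~ can then report any of
+these errors without knowing which concrete type it received.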
+** Filesystem API
+By reading the [[https://hackage.haskell.org/package/base/docs/System-IO.html][System.IO]] documentation, we can infer what the type
+signatures of our functions should look like. I will not discuss their actual
+implementation in this article, as this requires me to explain how ~IO~ deals
+with errors itself (and this article is already long enough for my taste). You
+can have a look at
+[[https://gist.github.com/lethom/c669e68e284a056dc8c0c3546b4efe56][this gist]] if you are interested.
+openFile :: ( '[AlreadyInUse, DoesNotExist, AccessDeny] :< err
+ , MonadIO m)
+ => FilePath -> IO.IOMode -> ResultT msg err m IO.Handle
+getLine :: ('[IllegalOperation, EoF] :< err, MonadIO m)
+ => IO.Handle
+ -> ResultT msg err m Text
+closeFile :: (MonadIO m)
+ => IO.Handle
+ -> ResultT msg err m ()
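+For the curious reader, the following sketch hints at how exceptions thrown by
+~IO~ can be turned into dedicated error values, using ~try~ from
+~Control.Exception~ and the predicates of ~System.IO.Error~. It is not the code
+of the gist: for brevity, the errors are collapsed into one hypothetical sum
+type ~OpenError~ and returned in a plain ~Either~ rather than in ~ResultT~, and
+~classify~, ~openFileE~ and ~sample~ are names made up for the example.

```haskell
import Control.Exception (try)
import System.IO (Handle, IOMode (..), openFile)
import System.IO.Error
  ( doesNotExistErrorType
  , isAlreadyInUseError
  , isDoesNotExistError
  , isPermissionError
  , mkIOError
  )

-- Hypothetical sum type standing in for the row of errors of `openFile`.
data OpenError
  = AlreadyInUse FilePath
  | DoesNotExist FilePath
  | AccessDeny FilePath IOMode
  | Other IOError

-- Map an `IOError` to a dedicated error using the standard predicates.
classify :: FilePath -> IOMode -> IOError -> OpenError
classify path mode e
  | isAlreadyInUseError e = AlreadyInUse path
  | isDoesNotExistError e = DoesNotExist path
  | isPermissionError e   = AccessDeny path mode
  | otherwise             = Other e

-- Catch the exception raised by `openFile`, if any, and classify it.
openFileE :: FilePath -> IOMode -> IO (Either OpenError Handle)
openFileE path mode = do
  r <- try (openFile path mode)
  pure (either (Left . classify path mode) Right r)

-- Classifying a hand-made "does not exist" error, without touching the disk.
sample :: OpenError
sample =
  classify "missing.txt" ReadMode
    (mkIOError doesNotExistErrorType "openFile" Nothing (Just "missing.txt"))
```

+In a ~ResultT~-based version, each branch of ~classify~ would call ~abort~ with
+its own error type instead of building a constructor of a shared sum type.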
+** Implementing ~cat~
+We can use the ~ResultT~ monad, its monadic operations and our functions to deal
+with the file system in order to implement a ~cat~-like program. I tried to
+comment on the implementation to make it easier to follow.
+cat :: FilePath -> ResultT String err IO ()
+cat path =
+  -- We will try to open and read this file to mimic
+  -- `cat` behaviour.
+  -- We advertise that in case something goes wrong
+  -- during the process.
+  achieve ("cat " ++ path) $ do
+    -- We will recover from a potential error,
+    -- but we will abstract away the error using
+    -- the `DescriptiveError` typeclass. This way,
+    -- we do not need to give one handler per error
+    -- type.
+    recoverManyWith @[Fs.AlreadyInUse, Fs.DoesNotExist, Fs.AccessDeny, Fs.IllegalOperation]
+                    @Fs.DescriptiveError
+      (do f <- Fs.openFile path Fs.ReadMode
+          -- `repeatUntil` works like `recover`, except
+          -- it repeats the computation until the error
+          -- actually happens.
+          -- I could not have used `getLine` without
+          -- `repeatUntil` or `recover`, as `EoF` is not
+          -- in the row of errors allowed by
+          -- `recoverManyWith`.
+          repeatUntil @(Fs.EoF)
+                      (Fs.getLine f >>= liftIO . print)
+                      (\_ _ -> liftIO $ putStrLn "%EOF")
+          Fs.closeFile f)
+      printErrorAndStack
+  where
+    -- Using the `DescriptiveError` typeclass, we
+    -- can print both the stack of Strings which form
+    -- the context, and the description of the generic
+    -- error.
+    printErrorAndStack e ctx = do
+      liftIO . putStrLn $ Fs.describe e
+      liftIO $ putStrLn "stack:"
+      liftIO $ print ctx
+The type signature of ~cat~ tells us that this function handles any error it
+might encounter. This means we can use it anywhere we want… in another
+computation inside ~ResultT~ which might raise errors completely unrelated to
+the file system, for instance. Or! We can use it with ~runResultT~, escaping the
+~ResultT~ monad (only to fall into the ~IO~ monad, but this is another story).
+For once, I wanted to write about the /result/ of a project, instead of /how it
+is implemented/. Rest assured, I do not want to skip the latter. I need to clean
+up the code a bit before bragging about it.