Initial commit with previous content and a minimal theme

author: Thomas Letan <lthms@soap.coffee> 2020-02-04 18:13:38 +0100
committer: Thomas Letan <lthms@soap.coffee> 2020-02-04 18:13:38 +0100
commit: 9754a53fdc14f6ee4cf00f851cda68d69889bdcd (patch)
tree: 0af2ae9e1488f22b56e92cb6b17bb82e2f919bd3 /site/posts/extensible-type-safe-error-handling.org
1 files changed, 392 insertions, 0 deletions
diff --git a/site/posts/extensible-type-safe-error-handling.org b/site/posts/extensible-type-safe-error-handling.org
new file mode 100644
index 0000000..115b3e8
--- /dev/null
+++ b/site/posts/extensible-type-safe-error-handling.org
@@ -0,0 +1,392 @@
+#+BEGIN_EXPORT html
+<h1>Extensible Type-Safe Error Handling in Haskell</h1>
+<time datetime="2018-02-04">2018-02-04</time>
+#+END_EXPORT
+
+#+OPTIONS: toc:nil
+#+TOC: headlines 2
+
+A colleague of mine introduced me to the benefits of [[https://crates.io/crates/error-chain][~error-chain~]], a crate which
+aims to implement /“consistent error handling”/ for Rust. I found the overall
+design pretty convincing, and in his use case, the crate really makes its error
+handling clearer and flexible. I knew /pijul/ uses ~error-chain~ to, but I never
+had the occasion to dig more into it.
+
+At the same time, I have read quite a lot about /extensible effects/ in
+Functional Programming, for an academic article I have submitted to
+[[http://www.fm2018.org][Formal Methods 2018]][fn:fm2018]. In particular, the [[https://hackage.haskell.org/package/freer][freer]] package provides a very
+nice API to define monadic functions which may use well-identified effects. For
+instance, we can imagine that ~Console~ identifies the functions which may print
+to and read from the standard output. A function ~askPassword~ which displays a
+prompt and get the user password would have this type signature:
+
+#+BEGIN_SRC haskell
+askPassword :: Member Console r => Eff r ()
+#+END_SRC
+
+Compared to ~IO~, ~Eff~ allows for meaningful type signatures. It becomes easier
+to reason about function composition, and you know that a given function which
+lacks a given effect in its type signature will not be able to use them. As a
+predictable drawback, ~Eff~ can become burdensome to use.
+
+Basically, when my colleague showed me its Rust project and how he was using
+~error-chain~, the question popped out. *Can we use an approach similar to ~Eff~
+to implement a Haskell-flavoured ~error-chain~?*
+
+Spoiler alert: the answer is yes. In this post, I will dive into the resulting
+API, leaving for another time the details of the underlying
+implementation. Believe me, there is plenty to say. If you want to have a look
+already, the current implementation can be found on [[https://github.com/lethom/chain][GitHub]].
+
+In this article, I will use several “advanced” GHC pragmas. I will not explain
+each of them, but I will /try/ to give some pointers for the reader who wants to
+learn more.
+
+[fn:fm2018] If the odds are in my favour, I will have plenty of occasions to write
+more about this topic.
+
+* State of the Art
+
+This is not an academic publication, and my goal was primarily to explore the
+arcane of the Haskell type system, so I might have skipped the proper study of
+the state of the art. That being said, I have written programs in Rust and
+Haskell before.
+
+** Starting Point
+
+In Rust, ~Result<T, E>~ is the counterpart of ~Either E T~ in
+Haskell[fn:either]. You can use it to model to wrap either the result of a
+function (~T~) or an error encountered during this computation (~E~).
+Both ~Either~ and ~Result~ are used in order to achieve the same end, that is
+writing functions which might fail.
+
+On the one hand, ~Either E~ is a monad. It works exactly as ~Maybe~ (returning
+an error acts as a shortcut for the rest of the function), but gives you the
+ability to specify /why/ the function has failed. To deal with effects, the
+~mtl~ package provides ~EitherT~, a transformer version of ~Either~ to be used
+in a monad stack.
+
+On the other hand, the Rust language provides the ~?~ syntactic sugar, to
+achieve the same thing. That is, both languages provide you the means to write
+potentially failing functions without the need to care locally about failure. If
+your function ~B~ uses a function ~A~ which might fail, and want to fail
+yourself if ~A~ fails, it becomes trivial.
+
+Out of the box, neither ~EitherT~ nor ~Result~ is extensible. The functions must
+use the exact same ~E~, or errors must be converted manually.
+
+[fn:either] I wonder if they deliberately choose to swap the two type arguments.
+
+** Handling Errors in Rust
+
+Rust and the ~error-chain~ crate provide several means to overcome this
+limitation. In particular, it has the ~Into~ and ~From~ traits to ease the
+conversion from one error to another. Among other things, the ~error-chain~
+crate provides a macro to easily define a wrapper around many errors types,
+basically your own and the one defined by the crates you are using.
+
+I see several drawbacks to this approach. First, it is extensible if you take
+the time to modify the wrapper type each time you want to consider a new error
+type. Second, either you can either use one error type or every error
+type.
+
+However, the ~error-chain~ package provides a way to solve a very annoying
+limitation of ~Result~ and ~Either~. When you “catch” an error, after a given
+function returns its result, it can be hard to determine from where the error is
+coming from. Imagine you are parsing a very complicated source file, and the
+error you get is ~SyntaxError~ with no additional context. How would you feel?
+
+~error-chain~ solves this by providing an API to construct a chain of errors,
+rather than a single value.
+
+#+BEGIN_SRC rust
+my_function().chain_err(|| "a message with some context")?;
+#+END_SRC
+
+The ~chain_err~ function makes it easier to replace a given error in its
+context, leading to be able to write more meaningful error messages for
+instance.
+
+* The ResultT Monad
+
+The ~ResultT~ is an attempt to bring together the extensible power of ~Eff~ and
+the chaining of errors of ~chain_err~. I will admit that, for the latter, the
+current implementation of ~ResultT~ is probably less powerful, but to be honest
+I mostly cared about the “extensible” thing, so it is not very surprising.
+
+This monad is not an alternative to neither Monad Stacks a la mtl nor to the
+~Eff~ monad. In its current state, it aims to be a more powerful and flexible
+version of ~EitherT~.
+
+** Parameters
+
+As often in Haskell, the ~ResultT~ monad can be parameterised in several ways.
+
+#+BEGIN_SRC haskell
+data ResultT msg (err :: [*]) m a
+#+END_SRC
+
+- ~msg~ is the type of messages you can stack to provide more context to error
+  handling
+- ~err~ is a /row of errors/[fn:row], it basically describes the set of errors
+  you will eventually have to handle
+- ~m~ is the underlying monad stack of your application, knowing that ~ResultT~
+  is not intended to be stacked itself
+- ~a~ is the expected type of the computation result
+
+[fn:row] You might have notice ~err~ is of kind ~[*]~. To write such a thing,
+you will need the [[https://www.schoolofhaskell.com/user/konn/prove-your-haskell-for-great-safety/dependent-types-in-haskell][DataKinds]] GHC pragmas.
+
+** ~achieve~ and ~abort~
+
+The two main monadic operations which comes with ~ResultT~ are ~achieve~ and
+~abort~. The former allows for building the context, by stacking so-called
+messages which describe what you want to do. The latter allows for bailing on a
+computation and explaining why.
+
+#+BEGIN_SRC haskell
+achieve :: (Monad m)
+        => msg
+        -> ResultT msg err m a
+        -> ResultT msg err m a
+#+END_SRC
+
+~achieve~ should be used for ~do~ blocks. You can use ~<?>~ to attach a
+contextual message to a given computation.
+
+The type signature of ~abort~ is also interesting, because it introduces the
+~Contains~ typeclass (e.g., it is equivalent to ~Member~ for ~Eff~).
+
+#+BEGIN_SRC haskell
+abort :: (Contains err e, Monad m)
+      => e
+      -> ResultT msg err m a
+#+END_SRC
+
+This reads as follows: /“you can abort with an error of type ~e~ if and only if
+the row of errors ~err~ contains the type ~e~.”/
+
+For instance, imagine we have an error type ~FileError~ to describe
+filesystem-related errors. Then, we can imagine the following function:
+
+#+BEGIN_SRC haskell
+readContent :: (Contains err FileError, MonadIO m)
+            => FilePath
+            -> ResultT msg err m String
+#+END_SRC
+
+We could leverage this function in a given project, for instance to read its
+configuration files (for the sake of the example, it has several configuration
+files). This function can use its own type to describe ill-formed description
+(~ConfigurationError~).
+
+#+BEGIN_SRC haskell
+parseConfiguration :: (Contains err ConfigurationError, MonadIO m)
+                   => String
+                   -> String
+                   -> ResultT msg err m Configuration
+#+END_SRC
+
+To avoid repeating ~Contains~ when the row of errors needs to contains several
+elements, we introduce ~:<~[fn:top] (read /subset or equal/):
+
+#+BEGIN_SRC haskell
+getConfig :: ( '[FileError, ConfigurationError] :< err
+             , MonadIO m)
+             => ResultT String err m Configuration
+getConfig = do
+  achieve "get configuration from ~/.myapp directory" $ do
+    f1 <- readContent "~/.myapp/init.conf"
+              <?> "fetch the main configuration"
+    f2 <- readContent "~/.myapp/net.conf"
+              <?> "fetch the net-related configuration"
+
+    parseConfiguration f1 f2
+#+END_SRC
+
+You might see, now, why I say ~ResultT~ is extensible. You can use two functions
+with totally unrelated errors, as long as the caller advertises that with
+~Contains~ or ~:<~.
+
+[fn:top] If you are confused by ~:<~, it is probably because you were not aware
+of the [[https://ocharles.org.uk/blog/posts/2014-12-08-type-operators.html][TypeOperators]] before. Maybe it was for the best. :D
+
+** Recovering by Handling Errors
+
+Monads are traps, you can only escape them by playing with their
+rules. ~ResultT~ comes with ~runResultT~.
+
+#+BEGIN_SRC haskell
+runResultT :: Monad m => ResultT msg '[] m a -> m a
+#+END_SRC
+
+This might be surprising: we can only escape out from the ~ResultT~ if we do not
+use /any errors at all/. In fact, ~ResultT~ forces us to handle errors before
+calling ~runResultT~.
+
+~ResultT~ provides several functions prefixed by ~recover~. Their type
+signatures can be a little confusing, so we will dive into the simpler one:
+
+#+BEGIN_SRC haskell
+recover :: forall e m msg err a.
+           (Monad m)
+        => ResultT msg (e ': err) m a
+        -> (e -> [msg] -> ResultT msg err m a)
+        -> ResultT msg err m a
+#+END_SRC
+
+~recover~ allows for /removing/ an error type from the row of errors, To do
+that, it requires to provide an error handler to determine what to do with the
+error raised during the computation and the stack of messages at that
+time. Using ~recover~, a function may use more errors than advertised in its
+type signature, but we know by construction that in such a case, it handles
+these errors so that it is transparent for the function user. The type of the
+handler is ~e -> [msg] -> ResultT msg err m a~, which means the handler /can
+raise errors if required/. ~recoverWhile msg~ is basically a synonym for
+~achieve msg $ recover~. ~recoverMany~ allows for doing the same with a row of
+errors, by providing as many functions as required. Finally, ~recoverManyWith~
+simplifies ~recoverMany~: you can provide only one function tied to a given
+typeclass, on the condition that the handling errors implement this typeclass.
+
+Using ~recover~ and its siblings often requires to help a bit the Haskell
+type system, especially if we use lambdas to define the error handlers. Doing
+that is usually achieved with the ~Proxy a~ dataype (where ~a~ is a phantom
+type). I would rather use the TypeApplications[fn:tap] pragma.
+
+#+BEGIN_SRC haskell
+recoverManyWith @[FileError, NetworkError] @DescriptiveError
+    (do x <- readFromFile f
+        y <- readFromNetwork socket
+        printToStd x y)
+    printErrorAndStack
+#+END_SRC
+
+The ~DecriptiveError~ typeclass can be seen as a dedicated ~Show~, to give
+textual representation of errors. It is inspired by the macros of ~error_chain~.
+
+We can start from an empty row of errors, and allows ourselves to
+use more errors thanks to the ~recover*~ functions.
+
+[fn:tap] The [[https://medium.com/@zyxoas/abusing-haskell-dependent-types-to-make-redis-queues-safer-cc31db943b6c][TypeApplications]] pragmas is probably one of my favourites. When I
+use it, it feels almost like if I were writing some Gallina.
+
+* ~cat~ in Haskell using ResultT
+
+~ResultT~ only cares about error handling. The rest of the work is up to the
+underlying monad ~m~. That being said, nothing forbids us to provide
+fine-grained API for, e.g. Filesystem-related functions. From an error handling
+perspective, the functions provided by Prelude (the standard library of Haskell)
+are pretty poor, and the documentation is not really precise regarding the kind
+of error we can encounter while using it.
+
+In this section, I will show you how we can leverage ~ResultT~ to *(i)* define an
+error-centric API for basic file management functions and *(ii)* use this API to
+implement a ~cat~-like program which read a file and print its content in the
+standard output.
+
+** (A Lot Of) Error Types
+
+We could have one sum type to describe in the same place all the errors we can
+find, and later use the pattern matching feature of Haskell to determine which
+one has been raised. The thing is, this is already the job done by the row of
+errors of ~ResultT~. Besides, this means that we could raise an error for being
+not able to write something into a file in a function which /opens/ a file.
+
+Because ~ResultT~ is intended to be extensible, we should rather define several
+types, so we can have a fine-grained row of errors. Of course, too many types
+will become burdensome, so this is yet another time where we need to find the
+right balance.
+
+#+BEGIN_SRC haskell
+newtype AlreadyInUse = AlreadyInUse FilePath
+newtype DoesNotExist = DoesNotExist FilePath
+data AccessDeny = AccessDeny FilePath IO.IOMode
+data EoF = EoF
+data IllegalOperation = IllegalRead | IllegalWrite
+#+END_SRC
+
+To be honest, this is a bit too much for the real life, but we are in a blog post
+here, so we should embrace the potential of ~ResultT~.
+
+** Filesystem API
+
+By reading the [[https://hackage.haskell.org/package/base-4.9.1.0/docs/System-IO.html][System.IO]] documentation, we can infer what our functions type
+signatures should look like. I will not discuss their actual implementation in
+this article, as this requires me to explain how `IO` deals with errors itself
+(and this article is already long enough to my taste). You can have a look at
+[[https://gist.github.com/lethom/c669e68e284a056dc8c0c3546b4efe56][this gist]] if you are interested.
+
+#+BEGIN_SRC haskell
+openFile :: ( '[AlreadyInUse, DoesNotExist, AccessDeny] :< err
+            , MonadIO m)
+         => FilePath -> IOMode -> ResultT msg err m Handle
+#+END_SRC
+
+#+BEGIN_SRC haskell
+getLine :: ('[IllegalOperation, EoF] :< err, MonadIO m)
+        => IO.Handle
+        -> ResultT msg err m Text
+#+END_SRC
+
+#+BEGIN_SRC haskell
+closeFile :: (MonadIO m)
+          => IO.Handle
+          -> ResultT msg err m ()
+#+END_SRC
+
+** Implementing ~cat~
+
+We can use the ~ResultT~ monad, its monadic operations and our functions to deal
+with the file system in order to implement a ~cat~-like program. I tried to
+comment on the implementation to make it easier to follow.
+
+#+BEGIN_SRC haskell
+cat :: FilePath -> ResultT String err IO ()
+cat path =
+  -- We will try to open and read this file to mimic
+  -- `cat` behaviour.
+  -- We advertise that in case something goes wrong
+  -- the process.
+  achieve ("cat " ++ path) $ do
+    -- We will recover from a potential error,
+    -- but we will abstract away the error using
+    -- the `DescriptiveError` typeclass. This way,
+    -- we do not need to give one handler by error
+    -- type.
+    recoverManyWith @[Fs.AlreadyInUse, Fs.DoesNotExist, Fs.AccessDeny, Fs.IllegalOperation]
+                    @(Fs.DescriptiveError)
+      (do f <- Fs.openFile path Fs.ReadMode
+          -- `repeatUntil` works like `recover`, except
+          -- it repeats the computation until the error
+          -- actually happpens.
+          -- I could not have used `getLine` without
+          -- `repeatUntil` or `recover`, as it is not
+          -- in the row of errors allowed by
+          -- `recoverManyWith`.
+          repeatUntil @(Fs.EoF)
+              (Fs.getLine f >>= liftIO . print)
+              (\_ _ -> liftIO $ putStrLn "%EOF")
+          closeFile f)
+      printErrorAndStack
+    where
+      -- Using the `DescriptiveError` typeclass, we
+      -- can print both the stack of Strings which form
+      -- the context, and the description of the generic
+      -- error.
+      printErrorAndStack e ctx = do
+        liftIO . putStrLn $ Fs.describe e
+        liftIO $ putStrLn "stack:"
+        liftIO $ print ctx
+#+END_SRC
+
+The type system of ~cat~ teaches us that this function handles any error it
+might encounter. This means we can use it anywhere we want… in another
+computation inside ~ResultT~ which might raise errors completely unrelated to
+the file system, for instance. Or! We can use it with ~runResultT~, escaping the
+~ResultT~ monad (only to fall into the ~IO~ monad, but this is another story).
+
+* Conclusion
+
+For once, I wanted to write about the /result/ of a project, instead of /how it
+is implemented/. Rest assured, I do not want to skip the latter. I need to clean
+up a bit the code before bragging about it.
author	Thomas Letan <lthms@soap.coffee>	2020-02-04 18:13:38 +0100
committer	Thomas Letan <lthms@soap.coffee>	2020-02-04 18:13:38 +0100
commit	9754a53fdc14f6ee4cf00f851cda68d69889bdcd (patch)
tree	0af2ae9e1488f22b56e92cb6b17bb82e2f919bd3 /site/posts/extensible-type-safe-error-handling.org