#+BEGIN_EXPORT html
<h1>Extensible Type-Safe Error Handling in Haskell</h1>

<p>This article was originally published on <span
id="original-created-at">February 04, 2018</span>.</p>
#+END_EXPORT

#+TOC: headlines 2

#+BEGIN_EXPORT html
<div id="history">site/posts/ExtensibleTypeSafeErrorHandling.org</div>
#+END_EXPORT

A colleague of mine introduced me to the benefits of [[https://crates.io/crates/error-chain][~error-chain~]], a crate which
aims to implement /“consistent error handling”/ for Rust. I found the overall
design pretty convincing, and in his use case, the crate really makes error
handling clearer and more flexible. I knew /pijul/ uses ~error-chain~ too, but I
never had the occasion to dig into it.

At the same time, I have read quite a lot about /extensible effects/ in
Functional Programming, for an academic article I have submitted to
[[http://www.fm2018.org][Formal Methods 2018]][fn:fm2018]. In particular, the [[https://hackage.haskell.org/package/freer][freer]] package provides a very
nice API to define monadic functions which may use well-identified effects. For
instance, we can imagine that ~Console~ identifies the functions which may print
to the standard output and read from the standard input. A function
~askPassword~ which displays a prompt and gets the user password would have this
type signature:

#+BEGIN_SRC haskell
askPassword :: Member Console r => Eff r ()
#+END_SRC

Compared to ~IO~, ~Eff~ allows for meaningful type signatures. It becomes easier
to reason about function composition, and you know that a function which lacks a
given effect in its type signature will not be able to use it. As a predictable
drawback, ~Eff~ can become burdensome to use.

Basically, when my colleague showed me his Rust project and how he was using
~error-chain~, the question popped up. *Can we use an approach similar to ~Eff~
to implement a Haskell-flavoured ~error-chain~?*

Spoiler alert: the answer is yes. In this post, I will dive into the resulting
API, leaving for another time the details of the underlying
implementation. Believe me, there is plenty to say. If you want to have a look
already, the current implementation can be found on [[https://github.com/lethom/chain][GitHub]].

In this article, I will use several “advanced” GHC pragmas. I will not explain
each of them, but I will /try/ to give some pointers for the reader who wants to
learn more.

[fn:fm2018] If the odds are in my favour, I will have plenty of occasions to write
more about this topic.

* State of the Art

This is not an academic publication, and my goal was primarily to explore the
arcana of the Haskell type system, so I might have skipped the proper study of
the state of the art. That being said, I have written programs in Rust and
Haskell before.

** Starting Point

In Rust, ~Result<T, E>~ is the counterpart of ~Either E T~ in
Haskell[fn:either]. You can use it to wrap either the result of a function (~T~)
or an error encountered during this computation (~E~). Both ~Either~ and
~Result~ are used to achieve the same end, that is, writing functions which
might fail.

On the one hand, ~Either E~ is a monad. It works exactly as ~Maybe~ (returning
an error acts as a shortcut for the rest of the function), but gives you the
ability to specify /why/ the function has failed. To deal with effects, the
~either~ package provides ~EitherT~, a transformer version of ~Either~ to be
used in a monad stack (~mtl~ exposes the equivalent ~ExceptT~).
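
To make the shortcut behaviour concrete, here is a minimal sketch which only
relies on ~base~. The ~ParseError~ type and the three functions are made up for
the occasion; they are not part of any library discussed in this article.

#+BEGIN_SRC haskell
import Text.Read (readMaybe)

-- Hypothetical error type: it tells the caller /why/ things went wrong.
data ParseError
  = NotANumber String
  | DivisionByZero
  deriving (Show, Eq)

parseInt :: String -> Either ParseError Int
parseInt s = maybe (Left (NotANumber s)) Right (readMaybe s)

safeDiv :: Int -> Int -> Either ParseError Int
safeDiv _ 0 = Left DivisionByZero
safeDiv x y = Right (x `div` y)

-- The first `Left` aborts the rest of the `do` block.
divideStrings :: String -> String -> Either ParseError Int
divideStrings a b = do
  x <- parseInt a
  y <- parseInt b
  safeDiv x y

main :: IO ()
main = do
  print (divideStrings "10" "2")    -- Right 5
  print (divideStrings "10" "zero") -- Left (NotANumber "zero")
  print (divideStrings "10" "0")    -- Left DivisionByZero
#+END_SRC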

On the other hand, the Rust language provides the ~?~ syntactic sugar to
achieve the same thing. That is, both languages provide you the means to write
potentially failing functions without the need to care locally about failure. If
your function ~B~ uses a function ~A~ which might fail, and you want ~B~ to fail
whenever ~A~ fails, it becomes trivial.

Out of the box, neither ~EitherT~ nor ~Result~ is extensible. The functions must
use the exact same ~E~, or errors must be converted manually.
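
The following sketch illustrates what “converted manually” means with plain
~Either~. All the names (~FileError~, ~ConfigError~, ~AppError~ and the
functions) are hypothetical. The point is that the caller has to introduce a
wrapper type and lift each error into it by hand.

#+BEGIN_SRC haskell
import Data.Bifunctor (first)

-- Two unrelated, hypothetical error types.
newtype FileError = FileNotFound FilePath deriving Show
newtype ConfigError = MissingKey String deriving Show

-- The wrapper we are forced to write to use both in one function.
data AppError
  = AppFile FileError
  | AppConfig ConfigError
  deriving Show

-- A fake file system with one known file, to keep the sketch pure.
readContent :: FilePath -> Either FileError String
readContent "app.conf" = Right "port=8080"
readContent path       = Left (FileNotFound path)

lookupPort :: String -> Either ConfigError String
lookupPort content = case break (== '=') content of
  ("port", '=' : value) -> Right value
  _                     -> Left (MissingKey "port")

-- Each call has to be converted explicitly with `first`.
getPort :: FilePath -> Either AppError String
getPort path = do
  content <- first AppFile (readContent path)
  first AppConfig (lookupPort content)

main :: IO ()
main = do
  print (getPort "app.conf")   -- Right "8080"
  print (getPort "other.conf") -- Left (AppFile (FileNotFound "other.conf"))
#+END_SRC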

[fn:either] I wonder if they deliberately chose to swap the two type arguments.

** Handling Errors in Rust

Rust and the ~error-chain~ crate provide several means to overcome this
limitation. In particular, Rust has the ~Into~ and ~From~ traits to ease the
conversion from one error to another. Among other things, the ~error-chain~
crate provides a macro to easily define a wrapper around many error types,
basically your own and the ones defined by the crates you are using.

I see several drawbacks to this approach. First, it is extensible only if you
take the time to modify the wrapper type each time you want to consider a new
error type. Second, you can either use one error type or every error type.

However, the ~error-chain~ crate provides a way to solve a very annoying
limitation of ~Result~ and ~Either~. When you “catch” an error, after a given
function returns its result, it can be hard to determine where the error comes
from. Imagine you are parsing a very complicated source file, and the
error you get is ~SyntaxError~ with no additional context. How would you feel?

~error-chain~ solves this by providing an API to construct a chain of errors,
rather than a single value.

#+BEGIN_SRC rust
my_function().chain_err(|| "a message with some context")?;
#+END_SRC

The ~chain_err~ function makes it easier to put a given error back in its
context, which allows for writing more meaningful error messages, for
instance.

* The ResultT Monad

~ResultT~ is an attempt to bring together the extensible power of ~Eff~ and
the chaining of errors of ~chain_err~. I will admit that, for the latter, the
current implementation of ~ResultT~ is probably less powerful, but to be honest
I mostly cared about the “extensible” thing, so it is not very surprising.

This monad is an alternative neither to monad stacks à la ~mtl~ nor to the
~Eff~ monad. In its current state, it aims to be a more powerful and flexible
version of ~EitherT~.

** Parameters

As often in Haskell, the ~ResultT~ monad can be parameterised in several ways.

#+BEGIN_SRC haskell
data ResultT msg (err :: [*]) m a
#+END_SRC

- ~msg~ is the type of messages you can stack to provide more context to error
  handling
- ~err~ is a /row of errors/[fn:row], it basically describes the set of errors
  you will eventually have to handle
- ~m~ is the underlying monad stack of your application, knowing that ~ResultT~
  is not intended to be stacked itself
- ~a~ is the expected type of the computation result

[fn:row] You might have noticed ~err~ is of kind ~[*]~. To write such a thing,
you will need the [[https://www.schoolofhaskell.com/user/konn/prove-your-haskell-for-great-safety/dependent-types-in-haskell][DataKinds]] GHC pragma.

** ~achieve~ and ~abort~

The two main monadic operations which come with ~ResultT~ are ~achieve~ and
~abort~. The former allows for building the context, by stacking so-called
messages which describe what you want to do. The latter allows for bailing on a
computation and explaining why.

#+BEGIN_SRC haskell
achieve :: (Monad m)
        => msg
        -> ResultT msg err m a
        -> ResultT msg err m a
#+END_SRC

~achieve~ should be used for ~do~ blocks. You can use ~<?>~ to attach a
contextual message to a given computation.
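
To give an intuition of how messages pile up, here is a deliberately simplified
sketch. It is /not/ the implementation of ~ResultT~: I assume a single error
type and no underlying monad, and I model a failing computation as ~Either~
carrying the stack of messages alongside the error.

#+BEGIN_SRC haskell
-- Simplified stand-in for `ResultT`: one error type `e`, no monad `m`.
type Result msg e a = Either ([msg], e) a

-- Fail with an empty context.
abort :: e -> Result msg e a
abort e = Left ([], e)

-- On failure, push `msg` on top of the context; otherwise do nothing.
achieve :: msg -> Result msg e a -> Result msg e a
achieve msg (Left (ctx, e)) = Left (msg : ctx, e)
achieve _   ok              = ok

-- Same as `achieve`, with the message on the right-hand side.
(<?>) :: Result msg e a -> msg -> Result msg e a
(<?>) = flip achieve

-- A fake file system with one known file, to keep the sketch pure.
readContent :: FilePath -> Result String String String
readContent "init.conf" = Right "verbose=true"
readContent path        = abort ("no such file: " ++ path)

getConfig :: Result String String (String, String)
getConfig =
  achieve "get configuration" $ do
    f1 <- readContent "init.conf" <?> "fetch the main configuration"
    f2 <- readContent "net.conf"  <?> "fetch the net-related configuration"
    pure (f1, f2)

main :: IO ()
main = print getConfig
-- Left (["get configuration","fetch the net-related configuration"],"no such file: net.conf")
#+END_SRC

The outermost message ends up at the head of the list, so the context reads
from the most general goal to the most specific one.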

The type signature of ~abort~ is also interesting, because it introduces the
~Contains~ typeclass (i.e., the equivalent of ~Member~ for ~Eff~).

#+BEGIN_SRC haskell
abort :: (Contains err e, Monad m)
      => e
      -> ResultT msg err m a
#+END_SRC

This reads as follows: /“you can abort with an error of type ~e~ if and only if
the row of errors ~err~ contains the type ~e~.”/

For instance, imagine we have an error type ~FileError~ to describe
filesystem-related errors. Then, we can imagine the following function:

#+BEGIN_SRC haskell
readContent :: (Contains err FileError, MonadIO m)
            => FilePath
            -> ResultT msg err m String
#+END_SRC

We could leverage this function in a given project, for instance to read its
configuration files (for the sake of the example, it has several configuration
files). The parsing function can use its own error type (~ConfigurationError~)
to describe ill-formed configurations.

#+BEGIN_SRC haskell
parseConfiguration :: (Contains err ConfigurationError, MonadIO m)
                   => String
                   -> String
                   -> ResultT msg err m Configuration
#+END_SRC

To avoid repeating ~Contains~ when the row of errors needs to contain several
elements, we introduce ~:<~[fn:top] (read /subset or equal/):

#+BEGIN_SRC haskell
getConfig :: ( '[FileError, ConfigurationError] :< err
             , MonadIO m)
             => ResultT String err m Configuration
getConfig = do
  achieve "get configuration from ~/.myapp directory" $ do
    f1 <- readContent "~/.myapp/init.conf"
              <?> "fetch the main configuration"
    f2 <- readContent "~/.myapp/net.conf"
              <?> "fetch the net-related configuration"

    parseConfiguration f1 f2
#+END_SRC

You might see, now, why I say ~ResultT~ is extensible. You can use two functions
with totally unrelated errors, as long as the caller advertises that with
~Contains~ or ~:<~.
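
For the curious reader, here is one way ~Contains~ and ~:<~ /could/ be encoded.
This is a sketch under my own assumptions, not the code of the library: the
~OneOf~ open sum, the ~inject~ method and the pure ~Result~ synonym (no
messages, no underlying monad) are all hypothetical names.

#+BEGIN_SRC haskell
{-# LANGUAGE ConstraintKinds       #-}
{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE GADTs                 #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE TypeOperators         #-}

import Data.Kind (Constraint, Type)

-- Hypothetical open sum: a value of exactly one type of the row.
data OneOf (row :: [Type]) where
  Here  :: e -> OneOf (e ': row)
  There :: OneOf row -> OneOf (e ': row)

-- `Contains row e` holds when `e` appears somewhere in `row`.
class Contains (row :: [Type]) e where
  inject :: e -> OneOf row

instance {-# OVERLAPPING #-} Contains (e ': row) e where
  inject = Here

instance Contains row e => Contains (e' ': row) e where
  inject = There . inject

-- `xs :< row` expands to one `Contains` constraint per element of `xs`.
type family (xs :: [Type]) :< (row :: [Type]) :: Constraint where
  '[]       :< row = ()
  (x ': xs) :< row = (Contains row x, xs :< row)

-- Pure stand-in for `ResultT`.
type Result err a = Either (OneOf err) a

abort :: Contains err e => e -> Result err a
abort = Left . inject

newtype FileError = FileError FilePath
newtype ConfigurationError = ConfigurationError String

readContent :: Contains err FileError => FilePath -> Result err String
readContent "init.conf" = Right "verbose"
readContent path        = abort (FileError path)

parseConfiguration :: Contains err ConfigurationError
                   => String -> Result err Bool
parseConfiguration "verbose" = Right True
parseConfiguration other     = abort (ConfigurationError other)

getConfig :: ('[FileError, ConfigurationError] :< err)
          => FilePath -> Result err Bool
getConfig path = readContent path >>= parseConfiguration

main :: IO ()
main =
  -- The caller picks the concrete row; the order does not matter.
  case getConfig "missing.conf" :: Result '[ConfigurationError, FileError] Bool of
    Right verbose                      -> print verbose
    Left (Here (ConfigurationError c)) -> putStrLn ("bad configuration: " ++ c)
    Left (There (Here (FileError p)))  -> putStrLn ("cannot read " ++ p)
-- prints "cannot read missing.conf"
#+END_SRC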

[fn:top] If you are confused by ~:<~, it is probably because you were not aware
of the [[https://ocharles.org.uk/blog/posts/2014-12-08-type-operators.html][TypeOperators]] pragma before. Maybe it was for the best. :D

** Recovering by Handling Errors

Monads are traps: you can only escape them by playing by their
rules. ~ResultT~ comes with ~runResultT~.

#+BEGIN_SRC haskell
runResultT :: Monad m => ResultT msg '[] m a -> m a
#+END_SRC

This might be surprising: we can only escape from ~ResultT~ if we do not
use /any errors at all/. In fact, ~ResultT~ forces us to handle errors before
calling ~runResultT~.

~ResultT~ provides several functions prefixed by ~recover~. Their type
signatures can be a little confusing, so we will dive into the simplest one:

#+BEGIN_SRC haskell
recover :: forall e m msg err a.
           (Monad m)
        => ResultT msg (e ': err) m a
        -> (e -> [msg] -> ResultT msg err m a)
        -> ResultT msg err m a
#+END_SRC

~recover~ allows for /removing/ an error type from the row of errors. To do
that, it requires an error handler which determines what to do with the error
raised during the computation and the stack of messages at that time. Using
~recover~, a function may use more errors than advertised in its type
signature, but we know by construction that in such a case, it handles these
errors so that it is transparent for the function user. The type of the handler
is ~e -> [msg] -> ResultT msg err m a~, which means the handler /can raise
errors if required/.

~recoverWhile msg~ is basically a synonym for ~achieve msg $ recover~.
~recoverMany~ allows for doing the same with a row of errors, by providing as
many functions as required. Finally, ~recoverManyWith~ simplifies
~recoverMany~: you can provide only one function tied to a given typeclass, on
the condition that the handled errors implement this typeclass.
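
To see why removing the head of the row is enough to eventually leave the
monad, here is a pure sketch of ~recover~ and of a ~runResultT~-like function.
It rests on assumptions of mine: errors are stored in a hypothetical open sum
~OneOf~, there is no underlying monad, and the actual implementation may well
differ.

#+BEGIN_SRC haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE EmptyCase      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Kind (Type)

-- Hypothetical open sum: a value of exactly one type of the row.
data OneOf (row :: [Type]) where
  Here  :: e -> OneOf (e ': row)
  There :: OneOf row -> OneOf (e ': row)

-- Pure stand-in for `ResultT`: a context and an error, or a result.
newtype Result msg (err :: [Type]) a
  = Result (Either ([msg], OneOf err) a)

-- Handle the first error type of the row; pass the others along.
recover :: Result msg (e ': err) a
        -> (e -> [msg] -> Result msg err a)
        -> Result msg err a
recover (Result (Right x))                 _       = Result (Right x)
recover (Result (Left (ctx, Here e)))      handler = handler e ctx
recover (Result (Left (ctx, There other))) _       = Result (Left (ctx, other))

-- With an empty row, no error can exist: `OneOf '[]` has no value.
runResult :: Result msg '[] a -> a
runResult (Result (Right x))              = x
runResult (Result (Left (_, impossible))) = case impossible of {}

newtype DoesNotExist = DoesNotExist FilePath

failing :: Result String '[DoesNotExist] String
failing =
  Result (Left (["read the configuration"], Here (DoesNotExist "net.conf")))

main :: IO ()
main =
  putStrLn . runResult $
    recover failing $ \(DoesNotExist path) ctx ->
      Result (Right ("recovered from missing " ++ path
                     ++ " while: " ++ unwords ctx))
-- prints "recovered from missing net.conf while: read the configuration"
#+END_SRC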

Using ~recover~ and its siblings often requires helping the Haskell type
system a bit, especially if we use lambdas to define the error handlers. Doing
that is usually achieved with the ~Proxy a~ datatype (where ~a~ is a phantom
type). I would rather use the TypeApplications[fn:tap] pragma.

#+BEGIN_SRC haskell
recoverManyWith @[FileError, NetworkError] @DescriptiveError
    (do x <- readFromFile f
        y <- readFromNetwork socket
        printToStd x y)
    printErrorAndStack
#+END_SRC
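
If you have never compared the two styles, the following sketch shows the same
function written once with ~Proxy~ and once for ~TypeApplications~. It is
unrelated to ~ResultT~, and ~nameWithProxy~ and ~nameWithApp~ are made-up
names.

#+BEGIN_SRC haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications    #-}

import Data.Proxy (Proxy (..))
import Data.Typeable (Typeable, typeRep)

-- Proxy style: the type is selected by a dummy value argument.
nameWithProxy :: Typeable a => Proxy a -> String
nameWithProxy p = show (typeRep p)

-- TypeApplications style: the type is selected with `@`.
nameWithApp :: forall a. Typeable a => String
nameWithApp = nameWithProxy (Proxy :: Proxy a)

main :: IO ()
main = do
  putStrLn (nameWithProxy (Proxy :: Proxy Int)) -- Int
  putStrLn (nameWithApp @Bool)                  -- Bool
#+END_SRC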

The ~DescriptiveError~ typeclass can be seen as a dedicated ~Show~, to give
textual representation of errors. It is inspired by the macros of ~error-chain~.
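
As an illustration, such a typeclass could look like the sketch below. Only
the names ~DescriptiveError~ and ~describe~ appear in this article; the
constructors of ~FileError~ and ~NetworkError~, the instances and the ~report~
helper are assumptions made for the example.

#+BEGIN_SRC haskell
class DescriptiveError e where
  describe :: e -> String

-- Hypothetical error types, for the sake of the example.
newtype FileError = FileError FilePath
data NetworkError = Timeout | ConnectionRefused

instance DescriptiveError FileError where
  describe (FileError path) = "cannot read file " ++ path

instance DescriptiveError NetworkError where
  describe Timeout           = "the connection timed out"
  describe ConnectionRefused = "the connection was refused"

-- One handler for every error which implements `DescriptiveError`.
report :: DescriptiveError e => e -> [String] -> String
report e ctx = unlines (describe e : "stack:" : map ("  " ++) ctx)

main :: IO ()
main = do
  putStr (report (FileError "net.conf") ["get configuration"])
  putStr (report Timeout ["fetch the news", "open the socket"])
#+END_SRC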

We can start from an empty row of errors, and allow ourselves to
use more errors thanks to the ~recover*~ functions.

[fn:tap] The [[https://medium.com/@zyxoas/abusing-haskell-dependent-types-to-make-redis-queues-safer-cc31db943b6c][TypeApplications]] pragma is probably one of my favourites. When I
use it, it feels almost as if I were writing some Gallina.

* ~cat~ in Haskell using ResultT

~ResultT~ only cares about error handling. The rest of the work is up to the
underlying monad ~m~. That being said, nothing forbids us from providing a
fine-grained API for, e.g., filesystem-related functions. From an error handling
perspective, the functions provided by Prelude (the standard library of Haskell)
are pretty poor, and the documentation is not really precise regarding the kind
of errors we can encounter while using them.

In this section, I will show you how we can leverage ~ResultT~ to *(i)* define an
error-centric API for basic file management functions and *(ii)* use this API to
implement a ~cat~-like program which reads a file and prints its content to the
standard output.

** (A Lot Of) Error Types

We could have one sum type to describe in the same place all the errors we can
find, and later use the pattern matching feature of Haskell to determine which
one has been raised. The thing is, this is already the job done by the row of
errors of ~ResultT~. Besides, this means that we could raise an error for being
not able to write something into a file in a function which /opens/ a file.

Because ~ResultT~ is intended to be extensible, we should rather define several
types, so we can have a fine-grained row of errors. Of course, too many types
will become burdensome, so this is yet another time where we need to find the
right balance.

#+BEGIN_SRC haskell
newtype AlreadyInUse = AlreadyInUse FilePath
newtype DoesNotExist = DoesNotExist FilePath
data AccessDeny = AccessDeny FilePath IO.IOMode
data EoF = EoF
data IllegalOperation = IllegalRead | IllegalWrite
#+END_SRC

To be honest, this is a bit too much for real life, but we are in a blog post
here, so we should embrace the potential of ~ResultT~.

** Filesystem API

By reading the [[https://hackage.haskell.org/package/base-4.9.1.0/docs/System-IO.html][System.IO]] documentation, we can infer what our functions' type
signatures should look like. I will not discuss their actual implementation in
this article, as this would require me to explain how ~IO~ deals with errors
itself (and this article is already long enough for my taste). You can have a
look at [[https://gist.github.com/lethom/c669e68e284a056dc8c0c3546b4efe56][this gist]] if you are interested.

#+BEGIN_SRC haskell
openFile :: ( '[AlreadyInUse, DoesNotExist, AccessDeny] :< err
            , MonadIO m)
         => FilePath -> IO.IOMode -> ResultT msg err m IO.Handle
#+END_SRC

#+BEGIN_SRC haskell
getLine :: ('[IllegalOperation, EoF] :< err, MonadIO m)
        => IO.Handle
        -> ResultT msg err m Text
#+END_SRC

#+BEGIN_SRC haskell
closeFile :: (MonadIO m)
          => IO.Handle
          -> ResultT msg err m ()
#+END_SRC
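
Without going into the details of the gist, the core idea can be sketched with
the predicates of ~System.IO.Error~. This is /not/ the code of the gist: to
stay free of ~ResultT~, I collapse the errors into a single, hypothetical
~OpenError~ sum type and return a plain ~Either~, which is precisely what the
row of errors lets us avoid.

#+BEGIN_SRC haskell
import Control.Exception (try)
import qualified System.IO as IO
import System.IO.Error
  ( isAlreadyInUseError
  , isDoesNotExistError
  , isPermissionError
  )

-- Hypothetical sum type, only here to keep the sketch self-contained.
data OpenError
  = AlreadyInUse FilePath
  | DoesNotExist FilePath
  | AccessDeny FilePath IO.IOMode
  | Unexpected IOError
  deriving Show

-- Catch the `IOError` thrown by `openFile` and classify it.
openFileE :: FilePath -> IO.IOMode -> IO (Either OpenError IO.Handle)
openFileE path mode = do
  result <- try (IO.openFile path mode) :: IO (Either IOError IO.Handle)
  pure $ case result of
    Right h -> Right h
    Left e
      | isAlreadyInUseError e -> Left (AlreadyInUse path)
      | isDoesNotExistError e -> Left (DoesNotExist path)
      | isPermissionError e   -> Left (AccessDeny path mode)
      | otherwise             -> Left (Unexpected e)

main :: IO ()
main = do
  -- Assumption: this path does not exist on the machine running the sketch.
  r <- openFileE "/this/path/should/not/exist" IO.ReadMode
  case r of
    Left err -> putStrLn ("failed: " ++ show err)
    Right h  -> IO.hClose h
#+END_SRC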

** Implementing ~cat~

We can use the ~ResultT~ monad, its monadic operations and our functions to deal
with the file system in order to implement a ~cat~-like program. I tried to
comment on the implementation to make it easier to follow.

#+BEGIN_SRC haskell
cat :: FilePath -> ResultT String err IO ()
cat path =
  -- We will try to open and read this file to mimic
  -- `cat` behaviour.
  -- We advertise that, in case something goes wrong
  -- in the process.
  achieve ("cat " ++ path) $ do
    -- We will recover from a potential error,
    -- but we will abstract away the error using
    -- the `DescriptiveError` typeclass. This way,
    -- we do not need to give one handler by error
    -- type.
    recoverManyWith @[Fs.AlreadyInUse, Fs.DoesNotExist, Fs.AccessDeny, Fs.IllegalOperation]
                    @(Fs.DescriptiveError)
      (do f <- Fs.openFile path Fs.ReadMode
          -- `repeatUntil` works like `recover`, except
          -- it repeats the computation until the error
          -- actually happens.
          -- I could not have used `getLine` without
          -- `repeatUntil` or `recover`, as `EoF` is not
          -- in the row of errors allowed by
          -- `recoverManyWith`.
          repeatUntil @(Fs.EoF)
              (Fs.getLine f >>= liftIO . print)
              (\_ _ -> liftIO $ putStrLn "%EOF")
          Fs.closeFile f)
      printErrorAndStack
    where
      -- Using the `DescriptiveError` typeclass, we
      -- can print both the stack of Strings which form
      -- the context, and the description of the generic
      -- error.
      printErrorAndStack e ctx = do
        liftIO . putStrLn $ Fs.describe e
        liftIO $ putStrLn "stack:"
        liftIO $ print ctx
#+END_SRC

The type signature of ~cat~ tells us that this function handles any error it
might encounter. This means we can use it anywhere we want… in another
computation inside ~ResultT~ which might raise errors completely unrelated to
the file system, for instance. Or! We can use it with ~runResultT~, escaping the
~ResultT~ monad (only to fall into the ~IO~ monad, but this is another story).

* Conclusion

For once, I wanted to write about the /result/ of a project, instead of /how it
is implemented/. Rest assured, I do not want to skip the latter. I need to clean
up the code a bit before bragging about it.