From 70731164b51f44b57044f01d111b1a7bd30298f9 Mon Sep 17 00:00:00 2001 From: Thomas Letan Date: Thu, 19 Mar 2020 21:51:21 +0100 Subject: Add a new post about Clight and its semantics --- .gitignore | 1 + site/posts/ClightIntroduction.v | 368 ++++++++++++++++++++++++++++++++++++++++ site/posts/index.org | 5 + 3 files changed, 374 insertions(+) create mode 100644 site/posts/ClightIntroduction.v diff --git a/.gitignore b/.gitignore index 7cb6118..985d6e8 100644 --- a/.gitignore +++ b/.gitignore @@ -12,6 +12,7 @@ build/ .lia.cache site/posts/StronglySpecifiedFunctionsProgram.html site/posts/MiniHTTPServer.html +site/posts/ClightIntroduction.html site/posts/StronglySpecifiedFunctions.html site/posts/RewritingInCoq.html site/posts/Ltac101.html diff --git a/site/posts/ClightIntroduction.v b/site/posts/ClightIntroduction.v new file mode 100644 index 0000000..eaae907 --- /dev/null +++ b/site/posts/ClightIntroduction.v @@ -0,0 +1,368 @@ +(** * A Study of Clight and its Semantics *) +(* begin hide *) +From Coq Require Import List. +Import ListNotations. +(* end hide *) +(** CompCert is a certified C compiler which comes with a proof of semantics + preservation. What this means is the following: the semantics of the C code + you write is preserved by CompCert compilation passes up to the generated + machine code. + + I have been interested in CompCert for quite some times now. Today (##March 18, 2020##), I have challenged myself to study + Clight and its semantics. This write-up is the result of this challenge, + written as I was progressing. + + #
# *) + +(** ** Installing CompCert *) + +(** CompCert has been added to <>, and as a consequence can be very easily + used as a library for other Coq developments. A typical use case is for a + project to produce Clight (the high-level AST of CompCert), and to benefit + from CompCert proofs after that. + + Installing CompCert is as easy as + +<< +opam install compcert +>> + + Once <> terminates, the <> namespace becomes available. In + addition, several binaries are now available if you have correctly set your + <> environment variable. For instance, <> takes a C file as + an argument, and generates a Coq file which contains the Clight generated by + CompCert. *) + +(** ** Problem Statement *) + +(** Our goal for this first write-up is to prove that the C function + +<< +int add (int x, int y) { + return x + y; +} +>> + + returns the expected result, that is <>. The <> tool + generates (among other things) the following AST (note: I have modified it + in order to improve its readability). *) + +From compcert Require Import Clight Ctypes Clightdefs AST + Coqlib Cop. + +Definition _x : ident := 1%positive. +Definition _y : ident := 2%positive. + +Definition f_add : function := + {| fn_return := tint + ; fn_callconv := cc_default + ; fn_params := [(_x, tint); (_y, tint)] + ; fn_vars := [] + ; fn_temps := [] + ; fn_body := Sreturn + (Some (Ebinop Oadd + (Etempvar _x tint) + (Etempvar _y tint) + tint)) + |}. + +(** The fields of the [function] type are pretty self-explanatory (as it is + often the case in CompCert’s ASTs as far as I can tell for now). + + Identifiers in Clight are ([positive]) indices. The [fn_body] field is of + type [statement], with the particular constructor [Sreturn] whose argument + is of type [option expr], and [statement] and [expr] look like the two main + types to study. The predicates [step1] and [step2] allow for reasoning + about the execution of a [function], step by step (hence the name). It + appears that <> generates Clight terms using the function call + convention encoded by [step2]. To reason about a complete execution, it + appears that we can use [star] (from the [Smallstep] module) which is + basically a trace of [step]. These semantics are defined as predicates (that + is, they live in [Prop]). They allow for reasoning about + state-transformation, where a state is either + + - A function call, with a given list of arguments and a continuation + - A function return, with a result and a continuation + - A [statement] execution within a [function] + + We import several CompCert modules to manipulate _values_ (in our case, + bounded integers). *) + +From compcert Require Import Values Integers. +Import Int. + +(** Putting everything together, the lemma we want to prove about [f_add] is + the following. *) + +Lemma f_add_spec (env : genv) + (t : Events.trace) + (m m' : Memory.Mem.mem) + (v : val) (x y z : int) + (trace : Smallstep.star step2 env + (Callstate (Ctypes.Internal f_add) + [Vint x; Vint y] + Kstop + m) + t + (Returnstate (Vint z) Kstop m')) + : z = add x y. + +(** ** Proof Walkthrough *) + +(** We introduce a custom [inversion] tactic which does some clean-up in + addition to just perform the inversion. *) + +Ltac smart_inv H := + inversion H; subst; cbn in *; clear H. + +(** We can now try to prove our lemma. *) + +Proof. + +(** We first destruct [trace], and we rename the generated hypothesis in order + to improve the readability of these notes. *) + + smart_inv trace. + rename H into Hstep. + rename H0 into Hstar. + +(** This generates two hypotheses. + +<< +Hstep : step1 + env + (Callstate (Ctypes.Internal f_add) + [Vint x; Vint y] + Kstop + m) + t1 + s2 +Hstar : Smallstep.star + step2 + env + s2 + t2 + (Returnstate (Vint z) Kstop m') +>> + + In other words, to “go” from a [Callstate] of [f_add] to a [Returnstate], + there is a first step from a [Callstate] to a state [s2], then a succession + of steps to go from [s2] to a [Returnstate]. + + We consider the single [step], in order to determine the actual value of + [s2] (among other things). To do that, we use [smart_inv] on [Hstep], and + again perform some renaming. *) + + smart_inv Hstep. + rename le into tmp_env. + rename e into c_env. + rename H5 into f_entry. + +(** This produces two effects. First, a new hypothesis is added to the context. + +<< +f_entry : function_entry1 + env + f_add + [Vint x; Vint y] + m + c_env + tmp_env + m1 +>> + + Then, the [Hstar] hypothesis has been updated, because we now have a more + precise value of [s2]. More precisely, [s2] has become + +<< +State + f_add + (Sreturn + (Some (Ebinop Oadd + (Etempvar _x tint) + (Etempvar _y tint) + tint))) + Kstop + c_env + tmp_env + m1 +>> + + Using the same approach as before, we can uncover the next step. *) + + smart_inv Hstar. + rename H into Hstep. + rename H0 into Hstar. + +(** The resulting hypotheses are + +<< +Hstep : step2 env + (State + f_add + (Sreturn + (Some + (Ebinop Oadd + (Etempvar _x tint) + (Etempvar _y tint) + tint))) + Kstop c_env tmp_env m1) t1 s2 +Hstar : Smallstep.star + step2 + env + s2 + t0 + (Returnstate (Vint z) Kstop m') +>> + + An inversion of [Hstep] can be used to learn more about its resulting + state… So let’s do just that. *) + + smart_inv Hstep. + rename H7 into ev. + rename v0 into res. + rename H8 into res_equ. + rename H9 into mem_equ. + +(** The generated hypotheses have become + +<< +res : val +ev : eval_expr env c_env tmp_env m1 + (Ebinop Oadd + (Etempvar _x tint) + (Etempvar _y tint) + tint) + res +res_equ : sem_cast res tint tint m1 = Some v' +mem_equ : Memory.Mem.free_list m1 + (blocks_of_env env c_env) + = Some m'0 +>> + + Our understanding of these hypotheses is the following + + - The expression [_x + _y] is evaluated using the [c_env] environment (and + we know thanks to [binding] the value of [_x] and [_y]), and its result + is stored in [res] + - [res] is cast into a [tint] value, and acts as the result of [f_add] + + The [Hstar] hypothesis is now interesting + +<< +Hstar : Smallstep.star + step2 env + (Returnstate v' Kstop m'0) t0 + (Returnstate (Vint z) Kstop m') +>> + + It is clear that we are at the end of the execution of [f_add] (even if Coq + generates two subgoals, the second one is not relevant and easy to + discard). *) + + smart_inv Hstar; [| smart_inv H ]. + +(** We are making good progress here, and we can focus our attention on the [ev] + hypothesis, which concerns the evaluation of the [_x + _y] expression. We + can simplify it a bit further. *) + + smart_inv ev; [| smart_inv H]. + rename H4 into fetch_x. + rename H5 into fetch_y. + rename H6 into add_op. + +(** In a short-term, the hypotheses [fetch_x] and [fetch_y] are the most + important. + +<< +fetch_x : eval_expr env c_env tmp_env m1 (Etempvar _x tint) v1 +fetch_y : eval_expr env c_env tmp_env m1 (Etempvar _y tint) v2 +>> + + The current challenge we face is to prove that we know their value. At this + point, we can have a look at [f_entry]. This is starting to look familiar: + [smart_inv], then renaming, etc. *) + + smart_inv f_entry. + clear H. + clear H0. + clear H1. + smart_inv H3; subst. + rename H2 into allocs. + +(** We are almost done. Let’s simplify as much as possible [fetch_x] and + [fetch_y]. Each time, the [smart_inv] tactic generates two suboals, but only + the first one is relevant. The second one is not, and can be discarded. *) + + smart_inv fetch_x; [| inversion H]. + smart_inv H2. + smart_inv fetch_y; [| inversion H]. + smart_inv H2. + +(** We now know the values of the operands of [add]. The two relevant hypotheses + that we need to consider next are [add_op] and [res_equ]. They are easy to + read. + +<< +add_op : sem_binarith + (fun (_ : signedness) (n1 n2 : Integers.int) + => Some (Vint (add n1 n2))) + (fun (_ : signedness) (n1 n2 : int64) + => Some (Vlong (Int64.add n1 n2))) + (fun n1 n2 : Floats.float + => Some (Vfloat (Floats.Float.add n1 n2))) + (fun n1 n2 : Floats.float32 + => Some (Vsingle (Floats.Float32.add n1 n2))) + v1 tint v2 tint m1 = Some res +>> + + - [add_op] is the addition of [Vint x] and [Vint y], and its result is + [res]. + +<< +res_equ : sem_cast res tint tint m1 = Some (Vint z) +>> + + - [res_equ] is the equation which says that the result of [f_add] is + [res], after it has been cast as a [tint] value + + Surprisingly, neither [add_op] nor [res_equ] can be reduced easily… After + some digging, I have found that this is because both rely on [Archi.ptr64], + which is basically opaque as far as reduction strategies are concerned. + + Now, there are probably more elegant (idiomatic) way to deal with this + issue, but a working approach is to unfold until [Archi.ptr64] appears, then + use the [Archi.splitlong_ptr32] lemma to replace it with its value + ([false]). *) + + unfold sem_binarith in *. + cbn in *. + unfold sem_cast in *. + cbn in *. + rewrite Archi.splitlong_ptr32 in *; auto. + +(** We can now simplify [add_op] and [res_equ], and this allows us to + conclude. *) + + smart_inv add_op. + smart_inv res_equ. + reflexivity. +Qed. + +(** ** Conclusion *) + +(** The main difficulty of this exercise was the opaqueness of [Archi.ptr64]; + this is surprising, since it means my struggle was technical, not + conceptual. On the contrary, the definitions of Clight are easy to + understand, and the ##CompCert documentation## + is very pleasant to read. + + Understanding Clight and its semantics can be very interesting if you are + working on a language that you want to translate into machine code. However, + proving functional properties of a given C snippet using only CompCert can + quickly become cumbersome. From this perspective, the ##VST## project is very + interesting, as its main purpose is to provide tools to reason about Clight + programs more easily. *) diff --git a/site/posts/index.org b/site/posts/index.org index 64363a0..6b2e304 100644 --- a/site/posts/index.org +++ b/site/posts/index.org @@ -19,6 +19,11 @@ Coq is a formal proof management system which provides a pure functional language with nice dependent types together with an environment for writing machine-checked proofs. +- [[./ClightIntroduction.html][A Study of Clight and its Semantics]] :: + Clight is a “simplified” C AST used by CompCert, the certified C compiler. In + this write-up, we prove a straighforward functional property of a small C + function, as an exercise to discover the Clight semantics. + - [[./MiniHTTPServer.html][Implementing and Certifying a Web Server in Coq]] :: An explanation on how to write an almost pure Coq, and working (albeit minimal) HTTP server. -- cgit v1.2.3