Formal Verification of Smart Contracts

Hi!

I’ve been taking a look at the Formal Verification that Maker uses and trying to learn more about KEVM and the K language in general.

I’m trying to start (small) and take a look at a simple proof in mkr-authority’s bytes4 support, but I have a few questions regarding the code:

1. Rule for setRoot proof (magic number is bitwise not of 2^160) - how is this number calculated?

rule 115792089237316195423570985007226406215939081747436879206741300988257197096960 &Int X => 0
  requires #rangeAddress(X)
Lemmas for bytes4 support
syntax TypedArg ::= #bytes4( Int )

I found one of the lemmas provided some more information regarding the calculation of the bytes4 support in the comments - which is calculated from: (2^32 - 1) * 2^224. How does this derivation work?

syntax Int ::= "maxBytes4"
 rule maxBytes4 => 115792089210356248756420345214020892766250353992003419616917011526809519390720 [macro] /* (2^32 - 1) * 2^224 */

2. minUint256 value in lemmas

I noticed the minUint256 gets used when checking the #getValue lemma. Is the value of a minimum uint256 a constant that’s made accessible by KEVM/K?

syntax Int ::= #getValue ( TypedArg ) [function]
rule #getValue(#bytes4( DATA )) => DATA
  requires minUInt256 <=Int DATA andBool DATA <=Int maxBytes4

Thanks!
Natalie

6 Likes

Hi Natalie, let me know if the answers below clear things up at all :slightly_smiling_face: :

  1. So for this rule:
rule 115792089237316195423570985007226406215939081747436879206741300988257197096960 &Int X => 0
  requires #rangeAddress(X)

This tells K how to simplify a certain operation that occurs in the Solidity-generated bytecode, which is this bitwise “and” of the constant 1157…6960 and some EVM word X. So the constant originally comes out of the code. It happens, the magic number here is the bitwise not of 2^160 - 1 as a 256-bit word (yes, the description is missing a “- 1”, which is maybe the motivation for the question…good catch, I will correct that). So what the bytecode is trying to do is throw away the lower-order 160 bits of X. What the rule then says is that if X (represented in K as a plain integer) is already known to be in the range of an address (an Ethereum address is 160 bits), then this expression (1157...6960 &Int X) rewrites to 0 (i.e. no bits set). This might seem very obvious, but sometimes K needs help with obvious things :).

  1. Now for this rule:
syntax Int ::= "maxBytes4"
 rule maxBytes4 => 115792089210356248756420345214020892766250353992003419616917011526809519390720 [macro] /* (2^32 - 1) * 2^224 */

It’s just defining a constant. As I mentioned above, in K we represent EVM words with plain integers, so this is supposed to be the largest valid value for an integer representing a bytes4 data type. In Solidity, a bytesN data type stores its data in the highest-order N bytes of a word. So the maximum K integer that corresponds to a valid bytes4 value is constructed by setting the highest-order 32 bits of a 256 bit word. The (2^32 - 1) * 2^224 expression just describes this mathematically–(2^32 - 1) is a word with the lowest-order 32 bits set; multiplying by 2^224 shifts these bits to the left so that highest-order 32 bits are set (224 + 32 = 256, so the 32nd bit moves to be the 256th bit, etc).

  1. minUInt256, along with many other constants, is defined in KEVM:
    https://github.com/dapphub/evm-semantics/blob/0bbee05fb3ce15e3313604ed3679e2c08d8e4c35/evm-types.md
    (note that klab uses a fork of evm-semantics, the original evm-semantics repo is here)
2 Likes

seth is a great tool to help with this (you can install over at dapp.tools

[email protected]:~$ seth --to-hex 115792089237316195423570985007226406215939081747436879206741300988257197096960
0xffffffffffffffffffffffff0000000000000000000000000000000000000000
[email protected]:~$ seth --to-hex 115792089210356248756420345214020892766250353992003419616917011526809519390720
0xffffffff00000000000000000000000000000000000000000000000000000000
3 Likes

Thank so much for the quick response @Kurt_Barry & @Mariano_Conti! Definitely clears up my questions about the lemmas. I’ll definitely check out seth as well!

I have a few additional questions regarding the proof for the canCall specs:

behaviour canCall-true of MkrAuthority
interface canCall(address src, address dst, bytes4 sig)
...
iff
  VCallValue == 0
if
  (src == Root) or 
  (sig == #asWord(#padRightToWidth(32, #take(4, #abiCallData("burn", #address(0), #uint256(0))))) or 
  sig == #asWord(#padRightToWidth(32, #take(4, #abiCallData("burn", #uint256(0)))))) or
  (sig == #asWord(#padRightToWidth(32, #take(4, #abiCallData("mint", #address(0), #uint256(0))))) and May == 1)
  sig <= 115792089210356248756420345214020892766250353992003419616917011526809519390720 
  sig >= 0

returns 1
  1. It seems like the contents inside an iff section represent preconditions for a proof to start running. Does the VCallValue==0 mean that the tx must not have been sent with Ether (ie: similar to a check for tx.value==0)?

  2. Is it fair to say that the (sig == #asWord(#padRightToWidth(32, #take(4, #abiCallData("burn", #address(0), #uint256(0))))) essentially:

    • #abiCallData - pads the arguments sent into the function
    • #take (N, WS) - truncates and/or pads depending on the size of the data to N=4
    • padRightToWidth(N, WS) - after truncating, ensures that the result of #take(4..) is 32 bytes.
      • From the name, it sounds like its purpose is to add additional padding (if the lengths are not equal) - but the documentation seems to suggest that it’s a check of length equality. If the output of #take().. doesn’t match the expected length of N, does the padRightToWidth add additional zeroes for padding?
1 Like
  1. Yes, VCallValue == 0 states that 0 ETH was sent. iff does not quite behave how you conjecture–how iff conditions are used (simplifying slightly) is that klab, for each act, creates a pass spec where the iff conditions are true, and a fail spec where they are false. The fail spec of course asserts that the function reverts, whereas the pass spec asserts that it does not revert (along with other conditions defined in the spec like storage reads and rewrites, return values, etc).

  2. Here’s what each piece does in the RHS expressions of the sig comparisons:

  • #abiCallData : computes the Solidity call data as a K byte array (see definition here).
  • #take (N, WS) : just takes the first N bytes (in the spec, this is 4, so it’s picking out the function selector); you can see how it’s defined here.
  • #padRightToWidth (N, WS) : pads to the right with zero bytes, up to the specified width (N); defined here.
  • #asWord : converts an array of bytes to an EVM word (so, in the K representation, just an integer between 0 and (2^256 - 1) inclusive), defined here.
2 Likes

Ah, very interesting. Yeah, I took a brief glance at the KEVM whitepaper and I noticed the distinction between the fail and the pass spec, but I didn’t realize that’s how it worked. Will definitely have to play around with it.

Got it - thanks so much for those links and the short description of the functions. Helps a lot!! :slight_smile: I really appreciate the explanation - thank you @Kurt_Barry!

2 Likes

Hmm… I’m still sort of trying to understand the RHS. I’ve sort of psuedo-coded (pseudo-mathed?) it out, but having a bit of trouble seeing how this works:

Step 1: #abiCallData:
According to the formula in the EVM semantics link, #abiCallData = F1 : F2 : F3 : F4 : T1 : ... : T32 : V1 : ... : V32, where:

  • F1 : F2 : F3 : F4 = keccak("burn",#address(0),#uint256(0))
  • T1 : ... : T32 = bytes(#address(0))
  • V1 : ... V32 = bytes(#UINT256(0))

abiCallDataResult = [keccak("burn",#address(0),#uint256(0),bytes(#address(0)),bytes(#uint256(0))]

Step 2: #take(4, abiCallDataResult):
From the description above and the description in the semantics, pulls out the function name which means the result should be:

  • #take(4, abiCallDataResult) = "burn"

Though I’m a bit confused as to whether the result of #take(4, abiCallDataResult) is equal to the plaintext function selector ('burn') or equivalent to the keccak of the burn function selector (keccak256('burn'))?

EDIT: After going through this flow a bit more, is N=4 chosen because want to specifically check lemmas only on bytes4, which would result in the bytes of addr(0) and uint256(0) being dropped?

(Sidenote: For simplicity, I’m referring to this result as takeResult in future steps.)

Step 3: padRightToWidth(32, takeResult)
If the function pads 0’s to the right, up to 32 bytes, then would the result of this be:

padRightResult = #padRightToWidth(32, `takeResult`)
padRightResult = [takeResult] + [bytes(0)]*(offset to make this length 32 bytes)

Why would the bytes array need to be padded with zeroes up to 32 bytes? Does this step need to be done to convert the array into an EVM word? On that note, is an EVM word essentially a bytes32 var in Solidity?

Step 4: #asWord
The padRightResult is a bytes array, so the #asWord converts this to an EVM word. And then when K parses the word, it pulls it in as an integer value, which is checked on the bounds of (0, maxBytes4). Is this something that happens under the hood where if K reads in a word and the result of that is an integer?

Thanks! :slight_smile: