ESM Governance Permission Revocation Bug Postmortem

Start: 2019-11-18 16:00:04 UTC (MCD launch)

End: 2021-04-26 14:02:08 UTC

Authors: Kurt Barry (kmbarry1)

Status: Resolved

Summary: The ESM collects MKR and, if a threshold is exceeded, allows anyone to trigger Emergency Shutdown. The purpose of the ESM is to mitigate two serious scenarios:

  1. a critical bug that is too dangerous to share knowledge of widely or attempt to fix; and,
  2. a governance attack.
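The threshold mechanism described in the summary can be sketched as a toy Python model. It is an illustration only, not the deployed contract: the class name `ToyESM`, the method names `join` and `fire`, and the per-depositor bookkeeping are assumptions made for this example. The sketch also assumes that reaching the threshold exactly is enough to trigger shutdown.

```python
class ToyESM:
    """Toy model of a module that collects MKR and gates shutdown on a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold      # MKR required before shutdown can be triggered
        self.total = 0                  # MKR collected so far
        self.deposits = {}              # per-depositor totals (hypothetical bookkeeping)
        self.shutdown_triggered = False

    def join(self, depositor, amount):
        """Record an MKR deposit from `depositor`."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.deposits[depositor] = self.deposits.get(depositor, 0) + amount
        self.total += amount

    def fire(self):
        """Callable by anyone; succeeds only once enough MKR has been collected."""
        if self.shutdown_triggered:
            raise RuntimeError("already fired")
        if self.total < self.threshold:   # assumption: meeting the threshold suffices
            raise RuntimeError("threshold not reached")
        self.shutdown_triggered = True
```

In this model the caller of `fire` needs no special permission, which mirrors the "allows anyone to trigger" property: the only gate is the amount of MKR collected.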

The ESM that MCD launched with was suitable for the former, but it overlooked an important consideration for the latter. Namely, a malicious entity in control of the governance mechanism could steal all collateral even after shutdown, via the privileged access of the Governance Security Module (GSM) to the core accounting contract (the Vat). The governance delay would have to pass before this became possible; however, with the 48-hour delay that was in place when the bug was discovered, Emergency Shutdown would not have reached completion before the malicious entity became capable of acting.
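The timing argument reduces to a comparison between the governance delay and the time shutdown needs to complete. Below is a minimal sketch, assuming both are measured in hours from the same moment (that is, the malicious governance action is scheduled at the time shutdown is triggered). The 48-hour delay comes from the text above; the function name and the shutdown durations in the usage lines are hypothetical placeholders, since the actual completion time is not stated here.

```python
def governance_can_act_before_completion(gsm_delay_hours, shutdown_hours):
    """Return True if a governance action scheduled when shutdown is triggered
    becomes executable before shutdown has completed."""
    return gsm_delay_hours < shutdown_hours


GSM_DELAY_HOURS = 48  # delay in place when the bug was discovered

# Hypothetical shutdown durations, for illustration only.
print(governance_can_act_before_completion(GSM_DELAY_HOURS, 72))  # True
print(governance_can_act_before_completion(GSM_DELAY_HOURS, 24))  # False
```

The first case corresponds to the vulnerable situation described above: the delay expires while shutdown is still in progress.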

Impact: Some engineering time was required to discuss and fix the bug.

Root Causes: The primary cause was an incomplete specification of the ESM’s behavior. Time pressure around launch and internal controversies contributed to this oversight. A particular point of confusion was that different individuals had different ideas about the purpose of the ESM: though the repository README always mentioned both use cases, concerns at the time of deployment were generally focused more on bugs than on malicious governance. Allowing governance intervention after shutdown is desirable in the “bug” scenario, as it allows governance to work around bugs in the shutdown mechanism itself; this took focus off the danger of such an ability in a “malicious governance” scenario. The ultimate length of the GSM delay was also uncertain: some thought it would eventually be longer than the time required for Emergency Shutdown to complete.

Trigger: Use of the ESM in a “malicious governance” scenario.

Resolution: The ESM code was modified to revoke governance permissions on sensitive modules after shutdown. The updated ESM was deployed and integrated into the live system in the executive spell that added Liquidations 2.0. This spell executed on 2021-04-26 at 14:02 UTC, fully resolving the issue.
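The effect of the fix can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the deployed code: `ToyModule`, `ToyESM`, the `wards`/`deny` naming, the `seize` action, and the `revoke_on_fire` flag are all illustrative choices for this example, and only a single sensitive module is modelled.

```python
class ToyModule:
    """Toy stand-in for a sensitive module guarded by an authorization set."""

    def __init__(self, authorized):
        self.wards = set(authorized)   # callers allowed to make privileged calls
        self.collateral = 1_000        # hypothetical balance at risk

    def _require_auth(self, caller):
        if caller not in self.wards:
            raise PermissionError(f"{caller} is not authorized")

    def deny(self, caller, who):
        """Privileged: remove `who` from the authorization set."""
        self._require_auth(caller)
        self.wards.discard(who)

    def seize(self, caller, amount):
        """Privileged: stands in for any action that could move collateral."""
        self._require_auth(caller)
        self.collateral -= amount


class ToyESM:
    """Toy shutdown module that optionally revokes governance access when fired."""

    def __init__(self, module, governance, revoke_on_fire):
        self.module = module
        self.governance = governance
        self.revoke_on_fire = revoke_on_fire
        self.fired = False

    def fire(self):
        # Threshold check omitted; assume enough MKR has already been collected.
        self.fired = True
        if self.revoke_on_fire:
            self.module.deny("esm", self.governance)


def governance_can_seize_after_shutdown(revoke_on_fire):
    """Fire the toy ESM, then check whether governance can still act."""
    module = ToyModule(authorized={"governance", "esm"})
    ToyESM(module, "governance", revoke_on_fire).fire()
    try:
        module.seize("governance", 1_000)
        return True
    except PermissionError:
        return False


print(governance_can_seize_after_shutdown(revoke_on_fire=False))  # True
print(governance_can_seize_after_shutdown(revoke_on_fire=True))   # False
```

With revocation disabled, the model reproduces the original weakness: governance keeps its privileged access after shutdown. With revocation enabled, the same call is rejected.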

Detection: Discovered during internal discussions around the necessary upgrade of the End and ESM modules as part of MIP-45 (Liq 2.0) implementation.

Action Items:

Action Item | Type | Owner (GH handle) | Ticket
Create guidelines for specifying new modules and designing property-based tests. | prevent | kmbarry1 | #596

Lessons Learned

What Went Well

  • The bug was discovered before it became relevant in practice.
  • The bug was discovered through internal review processes.

What Went Wrong

  • The ESM’s intended behavior wasn’t fully specified.
  • The system was significantly vulnerable to malicious governance attacks for an extended period: from launch on 2019-11-18 until the fix on 2021-04-26, roughly 17 months.

Timeline

All times UTC.

2019-11-18: MCD system launched with the flawed ESM.

2021-03: internal discussions around replacing the ESM as part of Liquidations 2.0 reveal the issue (exact date uncertain).

2021-04-26 14:02: the spell installing the new ESM is cast, fully mitigating the issue.

Supporting Information

Forum post disclosing the issue and fix:


I’m glad we have such a great team behind us working every day to make the MKR ecosystem more secure. Let’s understand that nothing man-made is perfect, and that’s what allows us to improve every day.

Cheer up guys and keep the improvements coming.
