Revisiting GSM Delay & Dark Spell Social Layer Discussion

The March market collapse was the focal point for most of the past month. Now that things have cooled down, it’s time to revisit the GSM & Dark Spell discussion that we started back in February.

Link to the original “Dark Fix” discussion. To better fit Maker nomenclature, specifically the spell, we’ll refer to the fix going forward as the “dark spell.”

As many of you are aware, the GSM delay was reduced from 24 to 4 hours during the downturn. We’ll start the discussion tomorrow on how we want to proceed with any changes to the GSM delay.

The second topic of discussion is the dark spell social layer. Three methods were proposed back in February. We’d like to walk through those methods again, start collecting feedback on the community’s preferred solution, and begin formalizing the process.

Link to tomorrow’s slides can be found here.

Discussion will start during the G&R meeting and overflow will continue in this thread. Looking forward to starting this discussion again!

The Three Models

(1) Major MKR Holders Model

The developer team reaches out to a pre-selected committee of MKR holders and shares the details about the critical vulnerability and patch. The selected committee of MKR holders would then sign a transaction confirming support for the proposed dark spell solution.

Advantages

  • Dev team (current or EPC) interacts directly with pre-selected committee.
  • Number of people with knowledge of the bug kept to a minimum.

Disadvantages

  • High trust required that the dev team / MKR holder committee does the right thing.
  • MKR holder committee may not have the technical acumen or feel comfortable vouching for proposed fix.
  • Potential conflicts of interest.

Implementation Details

  • Criteria for the MKR holder committee
    • Number of members?
    • When to select new committee?
  • Iterate with the community to define a formal document outlining the steps to be followed.
  • Decide avenue of communication with large MKR holders.

(2) Independent Auditor Attestation Model

An agreement is signed with an independent auditor (IA) in advance to perform an audit when the need arises. When a critical vulnerability is detected, the dev team reaches out to the pre-approved auditor who will vet and vouch for the dark spell solution.

Advantages

  • Dev team (current or EPC) interacts with dedicated, specialized auditor with contractual agreement in place.
  • Community + MKR holder input on the process.
  • Number of people with knowledge of the bug kept to a minimum.

Disadvantages

  • The dev team and the IA are fully entrusted authorities over entire system and all of its assets.
  • Costs of having attestation agreement with auditor.
  • Centralized, with Maker governance oversight.

Implementation Details

  • Criteria for the IA relationship:
    • What do we want from an auditor?
    • How to select the auditor(s)?
  • Obtain community input on criteria to select IA.
  • Community vets and approves IA, and can replace or amend IA relationship with governance vote.
  • More flexibility to design a process. For example, the community, and not just MKR holders, selects one or more independent auditors at the time of bug discovery to examine the fix alongside a pre-approved list of auditors.

(3) Community Appointed Trusted Party Model

When the development team is ready with the bug fix implementation, the community will be asked to appoint an independent trusted party to attest to the validity of the vulnerability and the dark spell solution.

Advantages

  • Dev team (current or EPC) informs the community which then spearheads the efforts to determine a trusted party (technical details of bug are not revealed).
  • Largely community driven.

Disadvantages

  • Publicly communicating that a bug exists in the system.
  • Community may take extended time deliberating on a solution.
  • Reactionary (there would be no independent attestor in place).
  • No agreements in place beforehand.

Implementation Details

  • Iterate with community to define formal process to follow once a bug has been identified.
  • This is a reactive process. The dev team will communicate that a bug has been discovered and will work with the community to find a resolution that works for the particular situation.

Thanks for providing these slides in advance @wil, much appreciated.

@MakerMan I know you’re working on a lot of other stuff, but this sort of issue is basically what you have worked on for a living, right? Very curious about your opinion.

With a committee of MKR holders, I would want them to be public about their involvement in potential dark fixes. Otherwise I see too many opportunities for insider trading. Actually, even with them being public, there are still many opportunities for profiting from that information, so it seems a bit tricky.

I am a little confused by the community-driven solution presented here, but what I was envisioning is that we could potentially vote on individuals in the community to whom the information would be disclosed in a dark fix scenario. I guess ideally we would want to vote for people with strong technical backgrounds and an active presence in the community who don’t hold a ton of MKR. One issue is that we can’t guarantee that those people would be around once a dark fix emergency actually happens.

Either way, MKR or community committee, having a plan B for the social side seems like a good idea. Maybe that’s where the independent auditors could come in? I am assuming the devs have relationships with auditors who could fill that role.

Also we should do a practice dark fix to test the technical aspect and whatever other mechanisms we eventually settle on. Thanks for the presentation Wil!

Just adding my comments in the CC side bar here as a placeholder.

I really liked the 3 choices offered: independent auditor, community group, and MKR-backed group.

First point is that this should be written up as a MIP.

Regarding issues, trust, and mechanics… Think about MKR stake bonding + MKR governance promise backing on issues.

Since this requires significant community trust (it did in the past and will continue to do so in the future), I think the greatest trust is placed in MKR. I think an interesting solution to the trust issue is to have entities who want to do this work put up a MKR bond of some reasonable size, which can be slashed to cover claims against them, in exchange for access to this kind of Maker community work paid with a commensurate fee (this isn’t just very demanding, time-critical work; a real insurance bond has to be posted to access the market on top of everything else).

This MKR bond can come from the community, or by some form of promise delegation. Whether and how much to actually stake vs. promise as trust support is a different issue.

There are issues here which we have already discussed.

  1. Secrecy is required in all aspects of discovery, implementation, testing, and deployment. An organization doesn’t want to have advertised a potential exploit before fixing it.
  2. Oftentimes there is an urgent time element (this breaks down into: the exploit is being used NOW vs. it could be used at any time, or under X, Y, Z conditions).
  3. 1+2 usually lead to a huge trust component which often necessitates forms of insurance to satisfy the failure of trust elements.

I think it would be important for the system to have at least two independent bonded parties, each either putting up MKR or bonded by some MKR holder, so that on this important issue at least we have no fewer than 2 independent parties signing off.
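As a rough illustration of the "no fewer than 2 independent bonded parties" rule, here is a minimal Python sketch. The `Attestor` type, the `min_bond` parameter, and the idea of counting distinct names as independence are all assumptions made for illustration; nothing here reflects an actual Maker contract or process.

```python
from dataclasses import dataclass

MIN_SIGNERS = 2  # the post asks for no fewer than 2 independent parties


@dataclass(frozen=True)
class Attestor:
    name: str          # hypothetical identifier for an independent party
    bonded_mkr: float  # MKR staked directly or pledged on the party's behalf
    approved: bool     # whether this party signed off on the dark spell


def fix_is_attested(attestors, min_bond):
    """Return True when enough distinct, sufficiently bonded parties approved."""
    signers = {
        a.name
        for a in attestors
        if a.approved and a.bonded_mkr >= min_bond
    }
    return len(signers) >= MIN_SIGNERS
```

An under-bonded party's approval simply does not count toward the quorum, which is one way to express that sign-off only carries weight when a bond stands behind it.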

Finally, as I said, while this may not be ideal, I think this whole thing should be written up as a formal MIP. We can talk about this, but I know a lot of people are going to be focusing on ‘how do we trust A, B, or C in this process?’ and on how governance would even vote on such a thing. My point with the bonding above is that if we can get something like 50K+ MKR to back multiple parties on this, the governance part is simply a formalization of the work: it is effectively a done deal supported by governance/MKR holders, and the vote will typically be only a formality.

Remember the CC where the Foundation said we needed this ‘dark fix’? Everyone was like: well, what do we need it for, what is the risk we are looking at? And really no questions could be answered regarding any details. Everyone basically had to trust the Foundation, and in the end the MKR voting for it did. I think with a kind of bonding stake amount plus a MKR voting support delegation we could literally solve the actual ‘governance mechanics’ and ‘trust issue’ directly, as well as can be done with the current governance configuration with the above 3 constraints applied.

I think this will and should inspire a lot of serious governance discussion because this issue of MKR doing stuff directly connects to trust in the system and what this means when Maker foundation is formally dissolved.

That’s all I have at the moment; I may think of more.

I think an interesting solution to the trust issue is to have entities who want to do this work put up a MKR bond of some reasonable size, which can be slashed to cover claims against them, in exchange for access to this kind of Maker community work paid with a commensurate fee (this isn’t just very demanding, time-critical work; a real insurance bond has to be posted to access the market on top of everything else).

Help me understand this better:

  1. Multiple MKR holders stake X tokens.
  2. This group of MKR holders has oversight on the governance process to vet an auditor set.
  3. Independent auditors attest that there is a vulnerability and the suggested patch fixes it.
  4. Critical bug is fixed.
  5. Independent auditors are paid for their services.
  6. Staked MKR holders receive commensurate compensation for the risk they’ve taken on in this role.

Abstracting away the stringent details, I’m trying to better understand the compensation breakdown for people who stake and how they will be incentivized to act given a critical bug is discovered.

Adding onto the follow ups from @wil

  • Under what conditions do the MKR holders lose their stake?
  • When we say bond here, do we mean bond in the I.O.U. sense?
  • When/How are the auditors paid?
  • Who determines the “commensurate compensation” for the bond issuer?

It would be good to summarise the three options in the initial post of this topic so that people don’t have to dig out the pros and cons from the slides.

I updated the initial post.

To keep the discussion going, I’ll create a signal request for the GSM delay. I wanted some initial thoughts on the parameters. Options:

  • Keep as-is at 4 hours
  • Increase by X hours (presumably we can schedule this or the next in next week’s executive?)
  • Return to 24 hours

Just trying to get some feedback before I type up the request.

My 2 wei for you then:

  1. Let’s go back to 24hrs for now.
  2. Option 2 (IA) sounds like the path of least resistance while following acceptable best practices to me. Very much the lesser of three weevils, but that’s what governance comes down to in the end, imo :wink:

Awesome, we reduced to 4h to have better response time in that urgent situation, so going back to 24h now makes sense to me.

Honestly I’d like to see something like 72 hours as an ideal. But agree, I’d prefer 24 hours to 4 hours.

Given everything that has happened lately within and outside of Maker, our ability to adjust to events is pretty important. Taking 3 days for any change to manifest in the protocol seems somewhat extreme in the present state, especially given the fluidity of the global markets. Like, war (I guess I mean more war cough… Iran cough…) could break out at any point really. Who knows how that would impact Maker.

I agree. We survived the fallout from Black Thursday as well as we did due in no small part to the ability to make rapid changes in the face of complete chaos. Imagine what the next one could be like if there are no changes to the system for 3 days.

Would be pretty heartbreaking to watch the protocol fail in slow motion because governance can’t execute a spell. Does the community really think we are out of the woods?

The circuit breaker is outside the GSM delay. It would also give us a good incentive to make sure all the auction parameters are set optimally.

I agree though, I think. 3 days is a long-term target, not an immediate one.

It would be a good motivator but it could be argued that the auction parameters were optimal… until we had a minor economic apocalypse and a global pandemic.

The circuit breaker is also an untested (in the wild) tool. It was a last-minute concoction from my understanding (someone tell me otherwise if it’s been in development for a while), and shouldn’t be considered a dependable solution to anything.

I’ve posted the signal request:

Please take a moment to vote!

I am honestly confused here regarding GSM delay.

Why are we not including a discussion of my suggestion to have a MKR seasoning period before MKR can enter governance, and of whether this kind of approach could work?

Can someone who has an idea ELI5 to the rest of us what we are being protected against by this GSM delay that a ‘MKR seasoning approach’ can’t solve? Why is instant access to all MKR control functions (including the ES, btw) so bloody important that we then have to apply a GSM to pause the changes by MKR governance, leaving the system exposed to what I will call an ES attack by 50K wild MKR?

Wouldn’t a better approach be to:

  • Have some idea of whether MKR in governance is acting appropriately and can be ‘trusted’ within governance to steward the system properly for all MKR holders. (this may include people who are voting within governance to stand up and be accounted for in some polling mechanism and tracking of MKR entering the system).
  • Have some control over a timeframe that wild MKR has to season in a wallet to limit entrance to governance and therefore effectively limit all governance attacks to ones that ‘take time’ to execute.
  • Set the GSM to a reasonably short delay (if not eliminate it completely) before the HAT can be moved and spells set to execute.
  • Allow only MKR in governance to access MKR control functions including ES.
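The seasoning idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Deposit` type, the `seasoning_started_at` timestamp, and the `seasoning_days` parameter are hypothetical names, and the real governance contract tracks none of this today.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Deposit:
    amount_mkr: float
    seasoning_started_at: int  # Unix timestamp when the MKR began seasoning


def seasoned_voting_weight(deposits, now, seasoning_days):
    """Sum only the MKR that has completed the full seasoning period."""
    cutoff = now - seasoning_days * SECONDS_PER_DAY
    return sum(d.amount_mkr for d in deposits if d.seasoning_started_at <= cutoff)
```

Under this rule, a large block of freshly acquired MKR carries no governance weight until the seasoning period has elapsed, so any governance attack necessarily "takes time" to execute.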

There is an addendum here that I thought of recently: the GSM could basically be somewhat dynamic. If, say, MKR at the level of the last hat enters the system, the GSM is automatically put up to 24 hours, even with the seasoning rule applied. But if MKR in governance has not changed by any significant amount over, say, the past 30 days, lower the GSM literally to zero.

In this kind of model Maker potentially gets the best of both worlds.
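To make the dynamic GSM rule concrete, here is a hedged Python sketch. The post only specifies the two extremes (24 hours and zero); the `base_delay_hours` fallback for everything in between and the `quiet_threshold` used to decide what counts as "no significant change" are assumptions added for illustration.

```python
def dynamic_gsm_delay_hours(
    governance_mkr_now,
    governance_mkr_30d_ago,
    hat_mkr,
    base_delay_hours=4.0,   # assumed fallback; not specified in the post
    quiet_threshold=0.01,   # assumed: a change within 1% counts as "no change"
):
    """Pick a GSM delay from how much MKR entered governance over 30 days."""
    inflow = governance_mkr_now - governance_mkr_30d_ago
    if inflow >= hat_mkr:
        # New MKR on the order of the last hat arrived: maximum caution.
        return 24.0
    if abs(inflow) <= quiet_threshold * governance_mkr_30d_ago:
        # Governance MKR is essentially unchanged: no delay.
        return 0.0
    return base_delay_hours
```

Note that in this sketch MKR that entered governance more than 30 days ago no longer counts as inflow, so the delay can fall back to zero once a large deposit has aged past the window.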

I am going to add something regarding circuit breakers, btw. Why are we not including a minting circuit breaker that has no delay, like the auction/liquidation one? From what I can see, if we stop auctions and minting of DAI we effectively ‘halt’ the system in total and close the surface area for attack pretty nicely. The only other attack is a collateral extraction attack, so if we are paranoid we could and should put a collateral withdrawal circuit breaker in the system. That pretty much gives the HAT the ability to literally freeze the system in a single spell (stop auctions, minting, and collateral withdrawal), effectively halting the system without ESing it.

This would take significant technical work on the chief, which will probably require MIPs and a lot of convincing. Any new code on the chief also invalidates any old audits. There appear to be several changes people want to the voting system. I don’t know what is safer from a software perspective, incremental changes or large updates, but either way we should organize all these changes people want and see how they can work in concert.

The dynamic GSM seems overly complex and not necessarily safe. The attackers would just need to wait 30 days after locking their wild MKR and could then wreak havoc without a delay.

I really like the non ES emergency freeze options.

I pretty much agree: tying a dynamic GSM to MKR deposited is definitely not the best KISS solution.

There really needs to be a whole discussion regarding how the community trusts MKR in the governance contract to actually carry out system actions, and what a reasonable governance delay is when the Foundation dissolves. This whole idea of 50K+ MKR even waiting 30 days to execute what they want is basically backstopped by someone catching the ‘badHAT’ within the GSM delay period.

This reminded me that if the Foundation dissolves, the community is going to need a Maker operations group that watches the system like a hawk 24/7/365, with 1-deep redundancy, and can flag to important MKR holders that action is required. This all literally has to happen within the GSM delay (which I agree we probably need).

IF we can effectively halt the system or relevant portions until we can square away a goodHAT governance action then basically we only need to practice (via firealarm kinds of drills) in what time frames we can muster MKR to vote. Literally this would be a kind of ‘firealarm’ executive that we simply put up to see how long it takes for our governance MKR to act and use this data to fine tune the GSM delay.
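One simple way to turn firealarm drill data into a GSM delay is sketched below in Python. The rule used here (slowest observed response, padded by a safety factor and rounded up to whole hours) is purely an assumption for illustration; the post does not say how the data should be used, and `safety_factor` is a hypothetical parameter.

```python
import math


def suggested_gsm_delay_hours(drill_response_hours, safety_factor=1.5):
    """Round the slowest drill response, padded by a margin, up to whole hours."""
    if not drill_response_hours:
        raise ValueError("need at least one drill result")
    return math.ceil(max(drill_response_hours) * safety_factor)
```

Using the slowest response rather than the average is a deliberately conservative choice: the delay has to cover the worst case in which governance MKR is slow to muster.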

What do we have left in the basket of tricks if someone gains control of the HAT?

I have it as:

Delayed by GSM

  • HAT control and spell execution.
  • HAT override spell (has to be caught and governance-executed within the GSM, btw).

Immediate

  • Basically one auction circuit breaker, which requires governance to execute.
  • ES the system, which is effectively immediate and irrevocable.

All of these actions necessarily require MKR to act via governance and construct/execute spells, except ES.

Now it might be interesting to have a feature where even 1000-5000 MKR could execute one of the halt spells. This way we just need ‘semi-trusted’ agents that can halt pieces of the system at any time. If the halt was found by the community to be improper or malicious, just slash the MKR, re-enable, and move on. In fact, this kind of control would be a perfect thing to test operational response on.

A firealarm test can be someone with the required MKR triggering a stop on liquidations, and then seeing how long it takes for someone else to fire up an executive or to send the alert for governance response, and how long it takes to evaluate and re-enable. In the end, if this was determined to be a firealarm test designed by the risk or operations team, the MKR is restored and the data is collected to assess governance response.

This is purely an operational characterization…

We put in a liquidation circuit breaker in literally what, days? I really think we should seriously consider the ability to halt DAI minting as well as collateral removal, and to have this action be triggerable by some amount of MKR that could get slashed after the fact (i.e. it stays in a contract until governance restores or slashes it).
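The "stays in a contract until governance restores or slashes" lifecycle can be sketched as a small state machine. This Python sketch is only illustrative: `HaltBond`, `BondState`, and `min_bond_mkr` are hypothetical names, and no such mechanism is assumed to exist in the protocol.

```python
from enum import Enum


class BondState(Enum):
    ESCROWED = "escrowed"
    RESTORED = "restored"
    SLASHED = "slashed"


class HaltBond:
    """MKR posted to trigger a halt, held until governance rules on it."""

    def __init__(self, poster, amount_mkr, min_bond_mkr):
        if amount_mkr < min_bond_mkr:
            raise ValueError("bond is below the minimum required to halt")
        self.poster = poster
        self.amount_mkr = amount_mkr
        self.state = BondState.ESCROWED

    def resolve(self, halt_was_justified):
        """Apply the governance decision and return the MKR paid back."""
        if self.state is not BondState.ESCROWED:
            raise RuntimeError("bond already resolved")
        if halt_was_justified:
            self.state = BondState.RESTORED
            return self.amount_mkr
        self.state = BondState.SLASHED
        return 0.0
```

For example, `HaltBond("alice", 1000.0, 1000.0).resolve(True)` returns the full bond, while resolving with `False` slashes it. A firealarm drill would follow the restored path.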

BTW: The incentive for reporting and acting on these tests can be that the reporter or system informer basically gets a fee for performing a system response function. Eventually this stuff could and should be somewhat automated. MKR holders with governance MKR would also submit a way to be contacted, so that once a system alert is triggered, software basically automates the propagation of the ‘system alert’ to relevant governance participants to act. Given we will probably need a lot of governance participation and fast response, I honestly think we need to consider compensation for governance participants performing these operations monitoring and response functions; they will by necessity be doing a lot of work, potentially at any hour or day of the year.