MIP40c3-SP34: Add Data Insights Core Unit Budget (DIN-001)

Preamble

MIP40c3-SP#: 34
Author(s): Tomek Mierzwa (@tmierzwa)
Contributors: Arran Kitson (@Arran)
Tags: core-unit, cu-din-001, budget
Status: RFC
Date Applied: 2021-09-07
Date Ratified: 

Sentence Summary

This subproposal adds the budget for the Data Insights Core Unit.

Specification

Motivation

We are proposing this budget so that the Data Insights Core Unit is able to succeed in its mandate: to provide free and permissionless datasets with contextualized and enriched MCD Protocol data, and to continuously support and empower other members of the DAO and Community in the fields of data analytics and data science.

Core Unit ID

DIN-001

Roadmap

The following diagram shows the proposed roadmap for the Data Insights Core Unit for the next 9 months. This roadmap will be reviewed monthly and the backlog will be kept public.

Budget considerations

This budget secures:

  • a dedicated team of:
    • CU Facilitator (25%)
    • Two Data Engineers (100%)
    • One Data Analyst (100%)
    • One Community Manager (100%)
    • One Front-End Designer (50%)
    • One Product Manager (10%)
    • Financial Controller (10%)
  • source data:
    • a fixed cost for sourcing on-chain and off-chain data
    • it covers all data sourcing and preprocessing costs (blockchain nodes, external API subscriptions, decoding, integration, quality assurance, etc.)
  • data storage and processing infrastructure (estimated and reviewed quarterly):
    • AWS infrastructure (including but not limited to: EC2, S3, ECS, SES)
    • Snowflake database subscription (including Reader Accounts)
  • community empowerment costs:
    • taking part in conferences / events (the monthly amount is aggregated and fully spent on relevant events; if not used, it is reallocated to community grants)
    • grants for community data analysts / data scientists (the monthly amount is aggregated and fully spent on individual grants)
    • protocol data hackathons (the monthly amount is aggregated and fully spent on logistics and prizes)

The distribution of budget across these components is shown below:

| Cost component | Amount |
| --- | --- |
| Team costs | DAI 41,750.00 |
| - CU Facilitator (25%) | DAI 3,500.00 |
| - Two Data Engineers (100%) | DAI 24,000.00 |
| - One Front-End Designer (50%) | DAI 4,000.00 |
| - One Community Manager (100%) | DAI 8,000.00 |
| - One Product Manager (10%) | DAI 1,250.00 |
| - One Financial Controller (10%) | DAI 1,000.00 |
| Data costs | DAI 25,000.00 |
| - Decoded source data costs | DAI 25,000.00 |
| Infrastructure costs | DAI 7,000.00 |
| - AWS infrastructure | DAI 2,000.00 |
| - Snowflake subscription | DAI 5,000.00 |
| Community empowerment costs | DAI 9,000.00 |
| - Taking part in relevant conferences / events | DAI 2,000.00 |
| - Grants for community data analysts / data scientists | DAI 5,000.00 |
| - Protocol data hackathons | DAI 2,000.00 |

The total budget requested for the first 9 months is 82,750.00 DAI per month.
After this period, the budget will be revised on the basis of the actual backlog, documented maintenance costs and other needs.

Interim budget

Since leaving the Foundation at the end of May 2021 our team has continued to maintain and update existing datasets (e.g. vaults history, governance actions history, liquidations history, pricing history), GUIs and APIs (e.g. MCDState.info, MCDGov.info) that are being used by other Core Units (e.g. Risk CU, GovAlpha CU). We have also updated EthTx to support Goerli at the request of Protocol Engineering.

We also respond to a lot of ad-hoc queries and answer data-related requests from other CUs and community members. Currently one of our data engineers is almost fully utilized by these activities to support the DAO. This also generates substantial costs for maintaining the required AWS and Snowflake infrastructure.

Since June 2021 all of this has been provided to the DAO from our own budget, in order to ensure continuity for the Core Units that use our datasets between us leaving the Foundation and a proposal working its way through governance. We also understand that this proposal is likely to need at least two more months to be formally accepted by governance, during which time we will continue to maintain the datasets and infrastructure.

The current monthly cost of maintaining existing products at a minimal level of support during June to October 2021 (compared to the full budget request above) is DAI 19,500.00 / month and includes:

| Cost component | Amount |
| --- | --- |
| Team costs | DAI 12,750.00 |
| - CU Facilitator (10%) | DAI 1,500.00 |
| - One Data Engineer (80%) | DAI 10,000.00 |
| - One Product Manager (10%) | DAI 1,250.00 |
| Data costs | DAI 5,000.00 |
| - Decoded source data costs | DAI 5,000.00 |
| Infrastructure costs | DAI 1,750.00 |
| - AWS infrastructure | DAI 1,000.00 |
| - Snowflake subscription | DAI 750.00 |

We have investigated alternative routes to cover these costs with a few of the Core Units, and the conclusion of these discussions was that we should add the amount to our budget proposal.

Therefore we ask for an additional one-time payment of 5 months * DAI 19,500.00 = DAI 97,500.00 to cover our costs for June-October 2021. It would be added to the first monthly payment if this CU and budget proposal is accepted by governance.

The regular monthly budget would then commence from 1 November 2021.


Sir, request for clarification: is that 82,750 DAI PER MONTH, or per 9 MONTHS?

And if I understand this correctly, you want to be REIMBURSED for costs pertaining to June-October 2021 expenses, correct?

Thank you in advance!


It’s per month: “This MIP adds a Data Insights Core Unit provided as a service from Token Flow Insights with a total monthly budget of 82,750.00 DAI.”


Thanks @blimpa for answering this :slight_smile:

Yes indeed, we are asking for:

  • a monthly budget of 82,750.00 DAI for the first 9 months, which is 9 * 82,750.00 DAI = 744,750.00 DAI in total

plus

  • a one-time reimbursement of Jun-Oct 2021 costs (5 months * 19,500.00 DAI = 97,500.00 DAI).

I will correct the budget subproposal to make it crystal clear.


@tmierzwa,

Thank you for this proposal. It is interesting, I am sure you and your team are very good with numbers.

From my point of view there are two small issues.

  1. I would like to see the demand side. It is not difficult, it looks like this.

CU RWA: for this CU we in DIN do this…
CU Oracles: for this CU we deliver …
CU PE: …and this team cannot function without our…

etc etc. Should be a breeze since you came from the Foundation.

  2. I would like to direct your attention to makerburn.com, upper right quadrant, where you can see we are currently running an operational loss. The appetite for a 7-person number-crunching team costing DAI 82,000 per month and with no prospect of being a profit center might be limited. The interim team size, however, could possibly be more realistic.

Hi @Planet_X, many thanks for your questions.

Some time ago we created a list of the areas of our cooperation with CUs (current and discussed).
Let me copy it here, and I’d love other Facilitators to step into this discussion if they have any comments.


Protocol Engineering CU

Done:

  • provided Goerli and Rinkeby support for EthTx

Discussed:

  • smart contract analytics (call frequency, gas consumption, reverts)
  • execution simulation for transactions not mined yet
  • state diffs and state reads reporting
  • L2 rollups/proofs decoding on L1
  • L2 transactions execution decoding
  • multichain liquidity tracking tool

Growth CU

Done:

  • ad-hoc queries for grouping vault owners according to vault age / activity

Discussed:

  • support for a grants campaign for the most loyal / active vault owners
  • live dashboard with the major protocol information for partners

Risk CU

Done:

  • MCDState dataset / GUI / API - detailed vaults history with time machine functionality
  • Liquidations 2.0 dataset / API - detailed auctions history
  • Pricing history for collaterals (market / Oracles)

Discussed:

  • new, better GUI for MCDState and Liquidations 2.0 dataset

SES CU

Done:

  • ongoing support in data roles recruitment (test assignments / interviews)

Discussed:

  • ad-hoc provisioning / analysis of on-chain data

Governance & DUX CU

Done:

  • MCDGov dataset / GUI / API - detailed history of governance actions (executives/polls)
  • various ad-hoc queries, e.g. the number of new vaults / users over time
  • revealing and hardening the poll results calculation algorithm
  • ongoing support for Snapshot off-chain voting integration

Discussed:

  • Delegate Contracts support in MCDGov
  • MKR token tracker with detailed insight about usage and ownership
  • restoring ‘on-chain effects’ analysis for spells with dedicated EthTx endpoint

Potentially:

  • switching from the old Spock infrastructure to TF Ethereum Data Warehouse

StarkNet Engineering CU

In progress:

  • Gas usage analytics for major protocol contracts to understand how fees reduction could bring value to MKR holders and protocol users
  • Detailed pricing history for collaterals

RWA CU

Discussed:

  • potential migration of financial reporting from Dune to TF datasets

The list above is a quick brain dump and for sure is not complete.


Please note that a very important part of our proposal is the continuous delivery of free protocol data to external consumers (community members, other protocols, partners, scientists, etc.).

A nice example can be the research project of The University of Chicago which aims to quantitatively “stress test” DAI and its robustness to large market events, collateral composition shifts, and so on. We’re providing the historical data on MKR and DAI to do this study.

We believe that empowering external data users is very important for the protocol’s security, sustainability and transparency.


Regarding your second comment about the limited appetite for a 7-person number-crunching team…

  • our proposal assumes approx. 5 FTE (some roles are part time)
  • crunching numbers is much more important in hard times than in times of prosperity

Best
Tomek


Yes, this would be fantastic! We have a lot of data needs at DUX, and we have had some issues with our current ETL process. Our team is mostly front-end coders, not data scientists, so it’s helpful to have a core unit like Tokenflow to collaborate with. The on-chain effects endpoint is one important component in the verifiability of the governance process. We are focused on finding ways to improve participation by presenting data to our users to help them make informed decisions when voting.


I can confirm that the Risk Core Unit utilizes vault activity and liquidations data provided by @tmierzwa 's team. This data is already highly integrated with our models. The reason we think this CU makes sense is that we can’t imagine decoding on-chain data ourselves on top of building complicated risk models and behaviour analytics. We’d probably need to employ an extra on-chain data specialist, but this would be costly and there are not many who know MakerDAO’s technical infrastructure well.

On the other hand, I can also say our particular CU needs only data, not dashboards themselves. But this may not be true for every CU, because I don’t know if every single one of them has a development team such as ours.


Wanted to just comment from GovAlpha’s perspective.

The MCDGov dataset has been a useful reference tool. Most recently we’ve been using it to confirm delegates’ votes while compiling metrics. Without it, we would be directly examining transactions, which would definitely take longer and be more error-prone.

There have been a few times I’ve asked for specific data / tables from @tmierzwa, both on GovAlpha’s behalf, or to organise things for others. Every time this happens, @tmierzwa is unfailingly polite, receptive and generally awesome to work with. He asks relevant questions, and always helps to clarify requirements before starting the work.

This is another item that has been useful, and that I believe will need further work in the future. Because the polling system result calculation happens off-chain, any front-end will need to implement its own version of the results algorithm. Without a detailed, public and correct reference implementation, this is not possible.

It’s also worth noting that I initially asked for a specification of the algorithm. @tmierzwa delivered that, along with a reference implementation in Python, correctly determining that having a known-good implementation would make future implementations easier, something I should have realized myself.
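
For illustration only, here is a minimal sketch of what such a reference implementation might cover, assuming a simple single-choice poll in which each address’s vote is weighted by its MKR balance and only its latest vote counts. This is a hypothetical example, not the actual MakerDAO polling algorithm or @tmierzwa’s implementation (which also handles ranked options and other details):

```python
from collections import defaultdict
from decimal import Decimal

def tally_single_choice_poll(votes):
    """Tally a hypothetical single-choice, MKR-weighted poll.

    `votes` is an iterable of (voter_address, option_id, mkr_weight) tuples.
    Only each address's latest vote is counted; earlier votes are overwritten.
    Returns (totals, winner) where totals maps option_id -> summed MKR weight.
    """
    latest = {}
    for voter, option, weight in votes:
        latest[voter] = (option, Decimal(weight))

    totals = defaultdict(Decimal)
    for option, weight in latest.values():
        totals[option] += weight

    winner = max(totals, key=totals.get) if totals else None
    return dict(totals), winner

# Example with made-up data: the first voter changes their vote to option 2.
votes = [
    ("0xaaa...", 1, "120.5"),
    ("0xbbb...", 2, "300.0"),
    ("0xaaa...", 2, "120.5"),
]
print(tally_single_choice_poll(votes))  # ({2: Decimal('420.5')}, 2)
```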

I’ve been a little more hands-off with this element. @prose11 and @Elihu have been heading it up. So far as I’m aware, @tmierzwa and co have been present at several meetings, and helped to clarify the requirements around the snapshot ‘voting strategies’ (how snapshot counts votes).

This would be great to have in the future, in a fully featured form. It will help reduce the burden on governance and give voters further reassurance that a spell is doing what they expect it to do.


In the future, I’m confident that having governance data available and accessible will allow us to make more targeted and effective changes to the governance system with the aim of improving participation.

Up until this point, we have lacked easy access to participation metrics, meaning that even if we were to take some action to try to improve participation, we would have no idea as to its effectiveness. The Data Insights CU can provide this data, allowing us to judge the effectiveness of interventions in a way that is currently not available to us.

Pursuing these avenues without the Data Insights CU is still possible, but I suspect it would be more expensive, take longer, and ultimately be of lower quality.

Thus far, it has been easier to find someone that can effectively create a front-end or dashboard than it has been to find someone that can source good data. I’m not concerned about the front-end side for GovAlpha either, especially since the Data Insights CU has also delivered functional front end dashboards in several instances up to this point.


TL;DR: Up until now, the Data Insights CU has been fantastic to work with. Having their support available should allow us to take a more evidence-based approach to changes to governance processes in the future.


Thank you @LongForWisdom for your very kind words :slight_smile:

You mentioned me personally several times in your reply. But it would be unfair to ignore the other members of our team who did most of the work behind the scenes. Especially @piotr.klis, who manages the ETL pipelines and GUIs for the Maker Protocol datasets. Also, every other Token Flow engineer contributes to our products, and we believe that today the team itself is our greatest asset.


Hello @tmierzwa, just a few questions here. Can you please provide color on how you intend to prioritize across the 7 core units that you have already supported? It sounds like you have been doing ad hoc services for all core units, which cost MakerDAO 19,500 DAI a month (hence the requested 97,500 DAI reimbursement to cover costs for June-October 2021), and now the expense is increasing to 82,750 DAI per month. I was wondering if you can specify what more MakerDAO will get, and how you will avoid stretching your team too thin (overextending yourselves)?

Also, trying to get more clarity on your services: does your CU offer any actual novel products now, or are you writing/bootstrapping analytics products with the requested budget? Do you sell data feeds, or just analytics products? If so, will you provide MakerDAO with those services as part of this onboarding proposal to become a Maker Core Unit?

Thank you in advance Tomasz!

On behalf of the PECU, I can confirm that we have worked closely and successfully with the proposed Data Insights Team. This has involved Goerli support, which was critical for moving our testnet infrastructure off of Kovan. Similarly, their data analysis has been important for understanding and interpreting price movements during volatile market conditions. Likewise, as liquidity moves from L1 to L2, we have been working closely to use their tooling to decode L2 transactions, make optimizations and track liquidity movements. In summary, the Data Insights Team have always been great to work with for obtaining the information that helps us make data-driven decisions.


Let me express my support for this proposed CU. Proven product and expertise.


Hello @ElProgreso, thank you for the questions. Breaking them out:

can you please provide color on how you intend to prioritize across the 7 core units that you have already supported?

We have good experience of working with multiple stakeholders and use agile processes to manage our backlog of tasks and prioritize what will be worked on in current and upcoming sprints.

We would involve all Core Units in quarterly planning to prioritize our activities, and also invite the CUs to bi-weekly sprint review meetings where we discuss progress, go through any blockers and potential resource or prioritization conflicts, and also get feedback on increments developed during the sprints.

We also propose that we make our backlog and sprint boards public at least at a high level so that the wider community can see what we are working on, with appropriate labeling to show the CU origin of the request and the agreed priority.

I was wondering if you can specify what more MakerDAO will get, and how you will avoid stretching your team too thin (overextending yourselves)?

Since leaving the Foundation at the end of May we have been doing the minimum necessary to “keep the lights on”: keeping MCDGov, MCDState and Liquidations live, fixing bugs, and responding to urgent ad-hoc requests such as adding Goerli support to EthTx for Protocol Engineering. In the list in my earlier response to @Planet_X here, these are basically the “Done” tasks.

This has taken almost one data engineer full time, some of my time and some additional time from the team, as well as the cost of running the infrastructure.

After approval we would be able to expand the team to be able to include the items in the “Discussed” category in the list above, and the activities more generally described in our MIP39 submission. Our resource and budget estimate is based on a sufficient team to accomplish this with some headroom to respond to the inevitable urgent requests, bugs and situations that need to be resolved.

In terms of not stretching ourselves too thin, growing the team and the agile and transparency processes described above should help us to manage this.

Does your CU offer any actual novel products now, or are you writing/bootstrapping analytics products with the requested budget? Do you sell data feeds, or just analytics products? If so, will you provide MakerDAO with those services as part of this onboarding proposal to become a Maker Core Unit?

So far, we have been keeping existing products on life support rather than developing new ones.

Our main focus is on the data itself (and supporting documentation), but we have developed GUIs where we believe there is value for the community, and where there isn’t another team using the data to build a community-focused GUI for that dataset. MCDState is a good example: the Risk CU uses the data feed via API and builds their own models on top, but we built the MCDState GUI so that the wider community also has an easy way to access the analytics and visualizations that the data allows. We would expect this pattern to continue in the future.
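
As a purely hypothetical illustration of this pattern, the sketch below shows how a downstream consumer such as the Risk CU might pull vault history over an HTTP API and build its own analytics on top. The endpoint URL, parameters and response fields are invented for the example and are not the actual MCDState API:

```python
import requests
import pandas as pd

# Hypothetical endpoint -- not the real MCDState API.
BASE_URL = "https://api.example.com/mcdstate"

def fetch_vault_history(ilk: str, date_from: str, date_to: str) -> pd.DataFrame:
    """Fetch hypothetical vault history for one collateral type as a DataFrame."""
    resp = requests.get(
        f"{BASE_URL}/vaults/history",
        params={"ilk": ilk, "from": date_from, "to": date_to},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes the endpoint returns a JSON list of row objects.
    return pd.DataFrame(resp.json())

# A consumer could then layer its own models on top, for example:
# df = fetch_vault_history("ETH-A", "2021-06-01", "2021-09-01")
# df.groupby("vault_id")["collateralization_ratio"].min()
```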

With the increased budget, as mentioned above, we would expand the current products to cover the wider list of activities in our MIP39 submission and the specific tasks already in discussion with other Core Units.
