
A grant-funded project that lets data scientists use DeFi data from different subgraphs.

Keyko proposed a grant-funded project that allows data scientists to consume DeFi data from different subgraphs in a structured way, directly through an API, without having to write any queries.


The proposal from Keyko was selected and we received a grant from The Graph Foundation to build the solution.


Keyko built a Python package to aggregate data from several DeFi subgraphs of different protocols.


How can we use The Graph to obtain data from different DeFi protocols and create a standardized dataset?


Obtaining and analyzing data from a blockchain is becoming easier. However, converting this data into a homogeneous and standardized dataset can be complex and tedious if we want to use several DeFi protocols as sources.


From our experience, having common and standard data models, such as FpML, has made dealing with data easier: the inputs of several applications are transformed into a common model where the field names reflect the same content and can be found at the same path.


The problem we faced when using The Graph as a data source for our DeFi datasets is that we always needed to perform transformations on the data, create specific mappers for each protocol, and embed queries inside the notebooks. This made the code more complicated, shifting the focus to technical details rather than the data, which was our main interest.


Getting data


As an example, we can see the different queries that we need to make to get the data from two protocols, Aave and Compound.


The underlying data that we want to obtain has the same meaning. We want to get the data related to borrows, and we are interested in the same fields (user who made the request, date of the request, amount and currency involved).


If we look at the queries that we have to make for this, we can observe quite a few differences between protocols.


To obtain this data in Aave, we would have to write:


borrows {
  user {
    id
  }
  timestamp
  amount
  reserve {
    symbol
  }
}

While for Compound the query would be:



borrowEvents {
  borrower
  blockTime
  amount
  underlyingSymbol
}

Same information, but organized differently and with different field names. If we want to save it in a CSV to feed a model or in a database, we will need to perform different transformations.
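To make the difference concrete, here is a small sketch (the field names follow the two queries above; the values are made up) showing that each protocol needs its own transformation to arrive at the same record:

```python
# Hypothetical rows, shaped like the Aave and Compound responses above.
aave_row = {"user": {"id": "0xabc"}, "timestamp": 1650000000,
            "amount": "100", "reserve": {"symbol": "DAI"}}
compound_row = {"borrower": "0xabc", "blockTime": 1650000000,
                "amount": "100", "underlyingSymbol": "DAI"}

# Aave nests the user address and asset symbol one level deep...
aave_record = {"user": aave_row["user"]["id"],
               "timestamp": aave_row["timestamp"],
               "amount": float(aave_row["amount"]),
               "currency": aave_row["reserve"]["symbol"]}

# ...while Compound uses flat fields with different names.
compound_record = {"user": compound_row["borrower"],
                   "timestamp": compound_row["blockTime"],
                   "amount": float(compound_row["amount"]),
                   "currency": compound_row["underlyingSymbol"]}

assert aave_record == compound_record  # same information, different sources
```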


The solution


Our proposal was to create a common model for the entities that protocols have in common (borrows, deposits, liquidations, swaps…) and transform all the different inputs into a common object that has the same variable names and units, creating a standard way to use this data.


Defining a common model


In our example, to define a common model that uses the same names regardless of those assigned by each protocol, we created a class with the following structure:
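A minimal sketch of what such a common-model class might look like, assuming a Borrow entity with the user, timestamp, amount, and currency fields discussed above (the names here are illustrative, not the package's actual API):

```python
from dataclasses import dataclass, asdict


@dataclass
class Borrow:
    """Protocol-agnostic representation of a borrow event (illustrative)."""
    protocol: str   # e.g. "aave" or "compound"
    user: str       # address of the account that borrowed
    timestamp: int  # Unix time of the event
    amount: float   # amount borrowed, in units of the asset
    currency: str   # symbol of the borrowed asset

    def to_dict(self) -> dict:
        # Serialize to a plain dict, ready to be dumped as JSON
        # or appended to a data frame row by row.
        return asdict(self)
```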


This class holds the values that we want to save in our dataset, plus a to_dict() function that serializes the object to JSON, making it easy to load into a data frame.


Loading the data into a Dataset


Once we have established a common model for saving the data, we can query several protocols and transform their different responses into a single structure. With this approach, we end up with common objects that share the same field names, which makes them easier to load into a data frame or database for analysis.
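As a sketch of this loading step, assuming hypothetical per-protocol mapper functions that emit the common structure (the sample data and function names are illustrative, not the package's actual API):

```python
import csv
import io

# Hypothetical responses, shaped like the two queries shown earlier.
aave_rows = [
    {"user": {"id": "0xabc"}, "timestamp": 1650000000,
     "amount": "100", "reserve": {"symbol": "DAI"}},
]
compound_rows = [
    {"borrower": "0xdef", "blockTime": 1650000100,
     "amount": "250", "underlyingSymbol": "USDC"},
]


def map_aave_borrow(row: dict) -> dict:
    # Aave nests the user address and asset symbol one level deep.
    return {"protocol": "aave", "user": row["user"]["id"],
            "timestamp": row["timestamp"], "amount": float(row["amount"]),
            "currency": row["reserve"]["symbol"]}


def map_compound_borrow(row: dict) -> dict:
    # Compound uses flat fields with different names.
    return {"protocol": "compound", "user": row["borrower"],
            "timestamp": row["blockTime"], "amount": float(row["amount"]),
            "currency": row["underlyingSymbol"]}


# One common structure, regardless of the source protocol.
records = ([map_aave_borrow(r) for r in aave_rows]
           + [map_compound_borrow(r) for r in compound_rows])

# From here it is trivial to dump a CSV or build a data frame.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)
```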


If we look at the data, we can see that it’s organized in the same way. This allows the data scientist to focus only on analyzing this data, without having to deal with all the technical aspects. They can get a data frame and start analyzing it.


Next steps


We worked on creating this common model for the different entities shared across various protocols, incorporating as many protocols as possible. We then released an open-source Python package, installable from PyPI, that lets users call the appropriate functions to retrieve this data through a standard API.


Read the full case study here