Compacting ERC20 Logs
I have zero time, so I herein produce a very quick take on something I’ve been thinking about for a long time. This is more a thought experiment than anything else.
Fact: Very, very, very many of the transactions on the Ethereum blockchain are simple ERC20 token transfers.
So what?
What If ERC20 Tokens were Native to the Chain?
Every token transfer that appears on the chain uses the standard ERC20 token transfer function (or transferFrom, which, as we see below, can be safely ignored).
The transfer function is defined in the ERC20 standard as:
function transfer(address _to, uint256 _value)
which translates into a four-byte signature of 0xa9059cbb.
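To make the layout of such a call concrete, here is a minimal Python sketch that splits the input field of a transfer call into its parts. The helper name decode_transfer_input and the example calldata are made up for illustration; only the four-byte signature and the (address, uint256) argument layout come from the standard.

```python
TRANSFER_SELECTOR = "a9059cbb"  # four-byte signature of transfer(address,uint256)


def decode_transfer_input(input_hex):
    """Split the `input` field of a transfer(address,uint256) call into its parts.

    Returns None when the input is not a plain transfer call.
    """
    data = input_hex[2:] if input_hex.startswith("0x") else input_hex
    # 4-byte selector + two 32-byte words = 8 + 64 + 64 hex characters
    if len(data) != 8 + 64 + 64 or data[:8].lower() != TRANSFER_SELECTOR:
        return None
    to_word = data[8:72]       # recipient address, left-padded to 32 bytes
    value_word = data[72:136]  # uint256 amount
    return {"to": "0x" + to_word[24:], "value": int(value_word, 16)}


# Hypothetical calldata: send 1,000 base units to a made-up address.
example = "0xa9059cbb" + "00" * 12 + "11" * 20 + format(1000, "064x")
print(decode_transfer_input(example))
```

Everything after the first four bytes is just the recipient and the amount, each padded to 32 bytes.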
A typical ERC20 transfer, as retrieved from the RPC, looks something like this:

And, the log generated by a transfer looks like this:

If you look closely at this there’s a huge amount of redundant information.
Coloring Stuff Makes it More Obvious
Looking at that same gobbledygook with the bytes colored to make it clearer gives:

We see…

for the transaction and…

for the log. Notice anything?
Every piece of the token transfer’s receipt is already present in the transaction data except status.
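The overlap can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: the transaction is a direct, top-level call to transfer on a token that follows the standard and emits exactly one Transfer event, and the dictionary keys (from, to, input) mirror the field names the RPC typically returns. The function name rebuild_transfer_log and the sample addresses are hypothetical.

```python
# topic0 of the standard ERC20 event Transfer(address,address,uint256)
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def strip0x(s):
    return s[2:] if s.startswith("0x") else s


def pad32(hex_value):
    """Left-pad a hex value (such as a 20-byte address) to a 32-byte word."""
    return "0x" + strip0x(hex_value).lower().rjust(64, "0")


def rebuild_transfer_log(tx):
    """Rebuild the Transfer log using nothing but fields of the transaction."""
    data = strip0x(tx["input"])
    to_word, value_word = data[8:72], data[72:136]
    return {
        "address": tx["to"],                 # token contract = recipient of the tx
        "topics": [
            TRANSFER_TOPIC,                  # fixed for every Transfer event
            pad32(tx["from"]),               # sender of the tx = sender of the tokens
            "0x" + to_word,                  # recipient, taken from the calldata
        ],
        "data": "0x" + value_word,           # amount, taken from the calldata
    }


# Hypothetical transaction with made-up addresses.
tx = {
    "from": "0x" + "aa" * 20,
    "to": "0x" + "cc" * 20,
    "input": "0xa9059cbb" + "00" * 12 + "bb" * 20 + format(5, "064x"),
}
print(rebuild_transfer_log(tx))
```

The one thing the sketch cannot produce is whether the transfer succeeded, which is exactly the status field mentioned above.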
Here’s an Idea
What if we removed the redundant information from the events that the standard requires and turned the log for transfers into a special case native to the chain?
The log for a transfer could look something like this, for example:

In other words, all we really need to know about an ERC20 token transfer is whether or not it succeeded. All the other information is already included in the transaction.
Obvious Objections
It’s too hard to change now. All the dApps would break.
This is true. A lot of things would break. But I’m suggesting we think forward to the next 50 years, not the last five.
It’s too hard. All the client code would have to be modified.
This is also true. See my previous response and see below where I make a quick back-of-the-envelope calc on how much space this might save.
It slows down queries.
This is also true. It would slow down queries because the node software is highly optimized for delivering log-related queries.
Currently, a dApp sends the transaction and then queries for the log, which is probably why the redundant information is included. Presumably, though, the dApp was the source of the transaction, so it already has the information needed to reconstruct the full log.
In the case of an after-the-fact, off-chain scrape, the transactional information is most likely already available as well, because many off-chain scrapes scan the blocks and the transactions before scanning particular logs.
Benefits
The size of data stored by a node is decreased.
Even if this doesn’t make it into the protocol level by becoming a special case native primitive, the observation leads us to conclude that we could greatly reduce the size of the data on the machine’s hard drive. We’ve calculated an estimate below using very rough back-of-the-envelope calculations.
The number of total bytes transferred over the wire is cut in half.
The amount of space “on the wire” is infinite, isn’t it? What does high traffic even mean?
It means there are too many bytes trying to jam their way onto the wire. So every little bit counts, and this could lower the number of bytes going over the wire significantly.
Smaller data means more “regular people” can run nodes.
The whole goal of everything I do is to make running a local node easier. One of the biggest complaints about running a node is how much disk space it takes up. This would lower that amount and thereby allow more people to run more nodes.
How Much Space Might This Save?
We ran the following command from the TrueBlocks command-line tool chifra:
chifra blocks 1756978-1757000 | grep input | cut -c1-26
This produced a file (file.txt) with data from around 24,000 transactions, extracting only the input fields. This represents about 200 blocks randomly sampled across blocks between 3,000,000 and 13,000,000.
That data looks like this:
"input": "0x",
"input": "0x18cbafe5"
"input": "0xc9807539"
"input": "0xab834bab"
... plus 24,375 more rows ...
Not amazing, but fairly interesting.
We ran the following command against that data file and found 1,578 different four-byte codes in the 24,379 records with 10 of them showing more than 100 transactions with that four-byte.
cat file.txt | sort | uniq -c | sort -n
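For readers who would rather not chain shell tools, the same tally can be sketched in Python. This is a rough equivalent of the sort and uniq pipeline, not what we actually ran. The function name count_selectors is made up, and the sample rows are the four shown above plus one repeated "0x" row added so that the counts are not all 1.

```python
from collections import Counter


def count_selectors(lines):
    """Count how often each four-byte code appears in lines like '"input": "0x18cbafe5"'."""
    counts = Counter()
    for line in lines:
        line = line.strip().rstrip(",")
        if not line.startswith('"input"'):
            continue
        value = line.split(":", 1)[1].strip().strip('"')
        counts[value[:10]] += 1  # "0x" plus eight hex digits, or a bare "0x"
    return counts


sample = [
    '"input": "0x",',
    '"input": "0x18cbafe5"',
    '"input": "0xc9807539"',
    '"input": "0xab834bab"',
    '"input": "0x",',  # repeated row, added for illustration only
]

# Least frequent first, like `sort | uniq -c | sort -n`.
for selector, n in sorted(count_selectors(sample).items(), key=lambda kv: kv[1]):
    print(n, selector)
```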
Using the Ethereum Four Byte Directory, we find (for 10 most frequently appearing functions) this information:
Count   Four-Byte    Signature
------------------------------------------------------------------
  105   0x23b872dd   transferFrom(address,address,uint256)
  111   0x202ee0ed   submit(uint256,int256)
  117   0x6ea056a9   sweep(address,uint256)
  127   0xef343588   trade(uint256[8],address[4],...)
  191   0x38ed1739   swapExactTokensForTokens(uint256,uint256...)
  195   0x18cbafe5   swapExactTokensForETH(uint256,uint256,...)
  352   0x7ff36ab5   swapExactETHForTokens(uint256,address[],...)
  486   0x095ea7b3   approve(address,uint256)
 7485   0xa9059cbb   transfer(address,uint256)
10343   0x           (straight up ETH transfer)
Or, stated as percentages of the 19,512 transactions in these ten rows:
Percent   Four-Byte    Signature
------------------------------------------------------------------
 0.54%    0x23b872dd   transferFrom(address,address,uint256)
 0.57%    0x202ee0ed   submit(uint256,int256)
 0.60%    0x6ea056a9   sweep(address,uint256)
 0.65%    0xef343588   trade(uint256[8],address[4],...)
 0.98%    0x38ed1739   swapExactTokensForTokens(uint256,uint256...)
 1.00%    0x18cbafe5   swapExactTokensForETH(uint256,uint256,...)
 1.80%    0x7ff36ab5   swapExactETHForTokens(uint256,address[],...)
 2.49%    0x095ea7b3   approve(address,uint256)
38.36%    0xa9059cbb   transfer(address,uint256)
53.01%    0x           (straight up ETH transfer)
So 91% of the transactions in these ten rows (about 73% of all 24,379 transactions we sampled) were either a straight-up transfer of ETH or an ERC20 token transfer.
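The percentages depend on what you divide by, so here is a small Python sketch that recomputes them from the counts in the table, both relative to the ten most frequent functions and relative to all 24,379 sampled records. The short labels are just shorthand for the signatures above.

```python
# Counts copied from the table of the ten most frequent four-byte codes.
counts = {
    "transferFrom": 105,
    "submit": 111,
    "sweep": 117,
    "trade": 127,
    "swapExactTokensForTokens": 191,
    "swapExactTokensForETH": 195,
    "swapExactETHForTokens": 352,
    "approve": 486,
    "transfer": 7485,
    "ETH transfer (0x)": 10343,
}
TOTAL_SAMPLED = 24379  # every record in file.txt, including the long tail

top_ten_total = sum(counts.values())

for name, n in counts.items():
    share_top = 100 * n / top_ten_total
    share_all = 100 * n / TOTAL_SAMPLED
    print(f"{name:26s} {share_top:6.2f}% of top ten  {share_all:6.2f}% of sample")

simple = counts["transfer"] + counts["ETH transfer (0x)"]
print(f"simple transfers: {100 * simple / top_ten_total:.2f}% of top ten, "
      f"{100 * simple / TOTAL_SAMPLED:.2f}% of sample")
```

Either way, plain ETH and ERC20 transfers are the large majority of what we sampled.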
Hand Waving
We ran the following command against the same set of blocks:
chifra blocks --raw 3000000-13000000:50000 | jq | grep size
and summed the result to find that the 200 blocks we sampled take up 5,132,592 bytes (5 MB) on the hard drive. Extending that out across the 13,800,429 blocks at the time of this writing, we get an estimated size for just the blocks alone of (5,132,592 / 200) * 13,800,429 = 354,159,857,410 bytes or about 350 GB for the block data alone.
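The extrapolation is just an average block size multiplied by the number of blocks. A quick Python check of the arithmetic, using only the numbers quoted above (the variable names are mine):

```python
SAMPLE_BYTES = 5_132_592    # summed size of the sampled blocks
SAMPLE_BLOCKS = 200         # number of blocks in the sample
CHAIN_BLOCKS = 13_800_429   # blocks on the chain at the time of writing

avg_block_bytes = SAMPLE_BYTES / SAMPLE_BLOCKS
estimated_bytes = avg_block_bytes * CHAIN_BLOCKS

print(f"average block size: {avg_block_bytes:,.2f} bytes")
print(f"estimated block data: {estimated_bytes:,.0f} bytes "
      f"(about {estimated_bytes / 1e9:.0f} GB)")
```

This treats the sample average as representative of every block, which is a big simplification given how much block sizes have grown over time.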
A very rough guess is that there is as much log data (which isn’t stored as part of the blocks) as there is block data, and if we add 350 GB to 350 GB we get 700 GB, which is the same order of magnitude as the known chain size (2 TB).
So, let’s use 350 GB as the size of just the logs.
Extending 350 GB * .3836 * .1 (because we can decrease the size of a transfer log to 1/10 its current size) we get about 13.4 GB. Is that a lot? Not really….
If we replaced all the transfer logs with a simple boolean showing success or failure and picked up the remainder of the data from the transaction that spawned the transfer, we could decrease the size of the data on the hard drive by about 15 GB, or 1% of the total (assuming 1.5 TB in total).
Conclusion: Not worth the effort!
→[Correction — 12/30/2021]
I made a mistake in the above calc. It should have used a value of .9, not .1, since we are decreasing the size to 1/10 its original size. So 350 GB * .3836 * .9 would save 120.834 GB. That’s actually quite a lot, so the conclusion is different. Might be worth it.
→[Correction — 12/30/2021]
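The difference between the two calculations is easy to see when the size that remains and the size that is saved are computed side by side. A small Python sketch using the same hand-waved inputs (350 GB of logs, 38.36% of them transfers, each shrinking to 1/10 its size, 1.5 TB in total):

```python
LOGS_GB = 350              # rough guess for the size of all logs
TRANSFER_SHARE = 0.3836    # share of transfer(address,uint256) from the table
REMAINING_FRACTION = 0.1   # a compacted transfer log is 1/10 its current size
CHAIN_GB = 1500            # assumed total chain size (1.5 TB)

transfer_logs_gb = LOGS_GB * TRANSFER_SHARE
remaining_gb = transfer_logs_gb * REMAINING_FRACTION    # what is still stored
saved_gb = transfer_logs_gb * (1 - REMAINING_FRACTION)  # what is no longer stored

print(f"transfer logs today:   {transfer_logs_gb:.3f} GB")
print(f"after compaction:      {remaining_gb:.3f} GB")
print(f"saved:                 {saved_gb:.3f} GB")
print(f"saved, share of chain: {100 * saved_gb / CHAIN_GB:.1f}%")
```

Multiplying by .1 gives what is left over, while multiplying by .9 gives what is saved.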
Support Our Work
TrueBlocks is totally self-funded from our own personal funds and a few grants such as The Ethereum Foundation (2018), Consensys (2019), Moloch DAO (2021), and most recently Filecoin/IPFS (2021).
If you like this article or you simply wish to support our work go to our GitCoin grant https://gitcoin.co/grants/184/trueblocks. Donate to the next matching round. We get the added benefit of a larger matching grant. Even small amounts have a big impact.
If you’d rather, feel free to send any token to our public Ethereum address at trueblocks.eth or 0xf503017d7baf7fbc0fff7492b751025c6a78179b.