How Accurate is EtherScan?

Why build an 18-decimal place accurate ledger if it doesn’t balance?

[Accompanying Video]

I had a call this morning with a cryptocurrency accountant. He’s a wonderful fellow. One of those people who can happily wade through thousands of rows of a spreadsheet trying to get the digits to behave themselves. He’s a man after my own heart.

This accountant — call him Mr. Green — makes a good living helping people do their crypto-taxes. He’s busier than ever. He tells me, and here I quote, “Nothing ever balances.”

How is this possible? Isn’t Ethereum supposed to be an 18-decimal-place accurate ledger? Doesn’t everyone using Ethereum have the same data?

The answer, of course, is that Ethereum always does balance.

Why then is Mr. Green having trouble? It’s because he’s not using the Ethereum data directly. He’s using OPD (other people’s data). Very few of us use data straight from the chain. Almost all of us get our Ethereum data from an API or a website. In the Ethereum ecosystem, this means either Infura or EtherScan.

I, like Mr. Green, want those annoying little digits to behave themselves. I’m becoming increasingly concerned, especially as smart contracts become more and more complicated, about the fact that “Nothing ever balances.”

Think before Tweeting…

Of course, this was met with crypto-twitter skepticism, so I’m writing this article as a way of showing my readers what I meant by that comment.

What is an Appearance?

We define an “appearance” as either: (a) the use of the address in one of the common “address” fields of a transaction (such as to, from, eventEmmitter, or contractAddress) or (b) its use as data in the input or eventData fields of the transaction, or (c) its subsequent use in any smart contract invocation.

In other words, given the byte string representing an address, an address’s appearance list is a list of every transaction where that byte string shows up. Sounds simple.

It’s not!

To try to solve this, TrueBlocks builds an index of every appearance we can find on the chain. To test our code, we use EtherScan’s APIs. You’ll understand how surprised we were this week when we discovered that TrueBlocks consistently finds more appearances than EtherScan.
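To make the definition concrete, here is a minimal Python sketch of checks (a) and (b). It is not how TrueBlocks builds its index. The dictionary layout, the field names, and the sample transaction are assumptions made up for illustration, and check (c) is left out because it requires traces.

```python
# Hypothetical field names; a real node or API may name these differently.
ADDRESS_FIELDS = ("to", "from", "contractAddress")


def normalize(hex_string: str) -> str:
    """Lower-case a hex string and drop a leading 0x, if present."""
    lowered = hex_string.lower()
    return lowered[2:] if lowered.startswith("0x") else lowered


def appears_in_fields(tx: dict, address: str) -> bool:
    """Type (a): the address is the value of one of the address fields."""
    target = normalize(address)
    return any(normalize(tx.get(field) or "") == target for field in ADDRESS_FIELDS)


def appears_in_data(tx: dict, address: str) -> bool:
    """Type (b): the address bytes show up inside the input or the event data.

    A plain substring search can report false positives when the 40 hex
    characters match by coincidence, so treat a hit as a candidate only.
    """
    target = normalize(address)
    blobs = [tx.get("input") or ""]
    for log in tx.get("logs") or []:
        blobs.extend(log.get("topics") or [])
        blobs.append(log.get("data") or "")
    return any(target in normalize(blob) for blob in blobs)


def is_appearance(tx: dict, address: str) -> bool:
    return appears_in_fields(tx, address) or appears_in_data(tx, address)


# A made-up transaction: the address is not the sender or the recipient,
# but it is passed as a 32-byte, left-padded argument in the input data.
ADDR = "0x" + "ab" * 20
tx = {
    "from": "0x" + "11" * 20,
    "to": "0x" + "22" * 20,
    "input": "0xa9059cbb" + "0" * 24 + ADDR[2:] + "0" * 64,
}

print(appears_in_fields(tx, ADDR), appears_in_data(tx, ADDR))
```

In this sketch the sample address is found only by the data scan, which is the kind of appearance that a lookup based on the address fields alone would miss.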


EtherScan APIs

Quite some time ago we built a tool called ethslurp that makes use of these endpoints:

Given an address, ethslurp returns all the transactions where EtherScan says that address appears.

This week we built a shell script, fromES, that successively calls each of these five endpoints and assembles the results into a single file and then compares those results against TrueBlocks.

We ran tests against 99 randomly chosen addresses. For example, the command:

fromES 0x91c5fa6872f3a93b999843eaf06eb34a18a69a12

produces these results:

Notice that the five EtherScan endpoints deliver 31, 0, 26, 2, and 0 records respectively. Summed, in the line labeled ‘all’, this totals 59 records.

At first blush, we were concerned that this was more records than TrueBlocks returns (40). What we discovered was that EtherScan’s five endpoints include duplicate records. We altered our shell script to remove the duplicates producing the line ‘sorted uniq’.
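The de-duplication step can be sketched in a few lines of Python. This is not the fromES script itself. It assumes each record carries a block number and a transaction index under the hypothetical keys blockNumber and transactionIndex, and the sample records are made up.

```python
def merge_unique(record_lists):
    """Merge several lists of records, drop duplicates, and sort the result.

    Each record is identified by (block number, transaction index), which
    is an assumption of this sketch.
    """
    seen = set()
    for records in record_lists:
        for record in records:
            key = (int(record["blockNumber"]), int(record["transactionIndex"]))
            seen.add(key)
    return sorted(seen)


# Made-up results from two endpoints; one record is reported by both.
normal = [
    {"blockNumber": "100", "transactionIndex": "3"},
    {"blockNumber": "101", "transactionIndex": "0"},
]
internal = [
    {"blockNumber": "101", "transactionIndex": "0"},
    {"blockNumber": "99", "transactionIndex": "7"},
]

merged = merge_unique([normal, internal])
print(len(normal) + len(internal), "records in,", len(merged), "unique")
```

Comparing the raw sum against the unique count is the same check that separates the ‘all’ line from the ‘sorted uniq’ line.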

Looked at it this way, TrueBlocks finds one extra record not found by EtherScan. It turns out to be transaction #8156524.14 (that is, the 14th transaction of block 8,156,524).
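Finding the extra record amounts to a set difference between the two lists. A minimal Python sketch follows, assuming appearances are written in the block.index form used above. Apart from 8156524.14, the entries are made up for illustration.

```python
def parse_appearance(text):
    """Turn 'block.index' (for example '8156524.14') into a pair of ints."""
    block, _, index = text.partition(".")
    return int(block), int(index)


def only_in_first(first, second):
    """Return the appearances present in first but absent from second."""
    return sorted(set(first) - set(second))


# Toy lists: the first two entries are invented, the third is from the text.
trueblocks = [parse_appearance(s) for s in ("100.3", "101.0", "8156524.14")]
etherscan = [parse_appearance(s) for s in ("100.3", "101.0")]

print(only_in_first(trueblocks, etherscan))
print(only_in_first(etherscan, trueblocks))
```

Running the difference in both directions matters: one direction shows what the first source finds that the second misses, and the other confirms whether the reverse ever happens.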

TrueBlocks provides a few other tools that allow us to see the details of any Ethereum transaction. The getTrace tool:

getTrace 8156524.14

returns

which I admit is a bit crazy looking, but it does contain the byte string for the given address (colored pink). It turns out that this transaction ended in a revert. This is probably why EtherScan doesn’t find it.

EtherScan obviously finds all transactions of type (a) mentioned above. It finds most type (b) transactions as well (EtherScan calls these internal transactions). If this transaction had not reverted, it would probably have been an internal transaction, because the address is used as data as opposed to being one of the address-related fields.

This happens frequently on EtherScan. Of the 99 addresses we tested, TrueBlocks found more transactions than EtherScan did for 85 addresses. For not a single address did EtherScan identify a transaction that TrueBlocks did not find. 85–0. I call that significant.

But Is It Relevant?

Why build an 18-decimal place ledger that the entire world comes to agreement on and spends way more on than any previous computing system if we’re going to throw away data? Explain that to me, please.

Analyzing the Differences

As a quick summary, the missing transactions appear to fall into at least the following types:

  • Input data of both errored and completed transactions
  • Log topics of both errored and completed transactions
  • Log data of both errored and completed transactions
  • Output data of deeply embedded traces
  • Uncle mining rewards

An example of the first type is this transaction:

https://etherscan.io/tx/0xa4a96ca16373239fd679711b05bcbdc138bc40a5bb2a085799c23bbaf5fd2a3a

which does not appear in the internal transaction list of the address:

0x28f4a17f8a99ab90c1a401b85d694b2c0ea40c4b

However, that address is clearly in the input data of the transaction.

Future Work

Help us continue our work. Visit our GitCoin grant page here: https://gitcoin.co/grants/184/trueblocks, and donate today.

Or, if you’d rather not expose yourself to scrutiny, and you’d still like to donate, send ETH to 0xB97073B754660BB356DfE12f78aE366D77DBc80f.


Blockchain Enthusiast, Founder TrueBlocks, LLC and Philadelphia Ethereum Meetup, MS Computer Science UPenn