Our approach to indexing EVM events

Ethereum Virtual Machine (EVM)-compatible blockchains have a neat way for off-chain services to learn about on-chain activity: events.

Below, we explain what they are and how to index them using Rust, with alloy handling the heavy lifting of interacting with the node and tokio enabling parallelization.

Introduction to Events

In Solidity, you can declare an event like this:

event Transfer(address indexed from, address indexed to, uint256 value);

and emit it with:

emit Transfer(msg.sender, to, amount);

You can emit events for anything you can imagine: token transfers, price updates, permission upgrades, etc. When something important happens in your contract, you emit an event.

From an EVM perspective, events are emitted with the LOG0, LOG1, LOG2, LOG3, and LOG4 opcodes. They all behave the same; the suffix indicates the number of topics you can index.

Marking a field as indexed means you can filter by it. You could achieve the same result by fetching every event and filtering yourself, but that would waste a significant amount of resources. The node does it for you.

Our Transfer event uses the LOG3 opcode: two indexed topics, from and to, plus one extra topic for the event signature, which we will explain later on.

As you can infer, you can index up to three fields, which corresponds to the LOG4 opcode.

You may be wondering: why events? Here's why:

  • They are very cheap to emit in terms of gas, far cheaper than writing to persistent storage.
  • They don't bloat the state trie (the Merkle Patricia trie), which is why they are cheaper.
  • Each block header includes a bloom filter over its logs, which allows high-speed queries when an event is not present.

Fundamentals of event indexing

Any EVM-compatible node, such as reth or geth, exposes an RPC endpoint named eth_getLogs. The only parameters you need to provide to this call are:

  • The address of the contract. It can't be an EOA address, as an EOA can't contain or execute code.
  • The fromBlock and toBlock range you want to search events in.
  • The so-called signature of the event, which is the keccak256 hash of the event name and the type of each argument: keccak256("Transfer(address,address,uint256)")
  • The indexed topics, if you want to apply a filter. Use null for any position you don't want to filter on.

There is a great tool called cast, part of foundry. It's a kind of Swiss army knife that, among many other things, can calculate the signature of any event:

cast keccak "Transfer(address,address,uint256)"

Which outputs:

0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef

Putting it all together, we can get the Transfer events emitted between blocks 22361834 and 22362602 on the USDC contract where to was the zero address, i.e. burns. Note that address topics must be left-padded to 32 bytes and that block numbers are provided in hexadecimal format.

curl https://reth-ethereum.ithaca.xyz/rpc \
  -X POST \
  -H "Content-Type: application/json" \
  --data '{"method":"eth_getLogs","params":[{"address": "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48", "fromBlock": "0x15536ea", "toBlock": "0x15539ea", "topics": ["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef", null, "0x0000000000000000000000000000000000000000000000000000000000000000"]}],"id":1,"jsonrpc":"2.0"}'
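The padding and hex-encoding rules used in the request above can be captured in two small Rust helpers (a sketch; the helper names are ours):

```rust
// Hypothetical helpers for building eth_getLogs parameters (names are ours).

/// Encode a block number as the "0x"-prefixed hex quantity the RPC expects.
fn to_hex_quantity(block: u64) -> String {
    format!("{:#x}", block) // e.g. 22361834 -> "0x15536ea"
}

/// Left-pad a 20-byte address (with or without the 0x prefix) to a
/// 32-byte topic value, as required by the topics filter.
fn address_to_topic(addr: &str) -> String {
    let hex = addr.trim_start_matches("0x").to_lowercase();
    format!("0x{:0>64}", hex)
}

fn main() {
    assert_eq!(to_hex_quantity(22361834), "0x15536ea");
    println!("{}", address_to_topic("0x0000000000000000000000000000000000000000"));
}
```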

Indexing with Rust and Alloy

Now that we know the basics, let's see how to build a high-performance indexer for a given contract's events.

First and foremost, we recommend using alloy. It's a great Rust library that abstracts away deserialization, event signatures, types, and all the nitty-gritty so that we can focus on what's important. In May 2025, they announced their v1.0.0 release.

While alloy does a big part of the heavy lifting for us, there are other concerns we have to handle ourselves and adapt to the use case:

  • Backfill events from contract deployment to the latest head.
  • Heavily parallelized requests.
  • Pick the block ranges from and to for each request.
  • Scale horizontally across multiple RPC endpoints.
  • Handle backpressure to prevent your memory from growing unbounded.
  • Order events for safer processing.
  • Use of middleware for failures and rate limits.
  • Batch inserts to allow faster processing.
  • Handle reorgs to safely index events in real time.
  • Polling vs subscription.

Backfilling

Most of the time, indexing begins at the block where the contract was deployed. We refer to backfilling as indexing the events that occurred in the past, either because we just started the indexer or because it was offline for some time and has to catch up.

Ideally, you should provide the deployment block so you avoid indexing from the genesis block (block 0). Thanks to the bloom filter, scanning empty ranges is fast, but skipping them still saves time.

Parallelizing

The second important item is parallelizing. A common naive approach in indexing is running one request at a time: we request logs for blocks 0-999; when that finishes, 1000-1999; when that finishes, 2000-2999. That's very slow.

We rely on the tokio crate to run multiple requests simultaneously. Be aware that too many parallel requests can overload your node, and its performance may even worsen.
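As a minimal illustration of the idea, here is a sketch using std threads and a stubbed fetch_logs instead of tokio and a real RPC call (names are ours):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stub standing in for an eth_getLogs call; pretend one event per block.
fn fetch_logs(range: (u64, u64)) -> usize {
    (range.1 - range.0 + 1) as usize
}

/// Fetch many ranges with a bounded number of workers. In the real indexer
/// this is tokio tasks; std threads demonstrate the same structure.
fn fetch_parallel(ranges: Vec<(u64, u64)>, workers: usize) -> usize {
    let queue = Arc::new(Mutex::new(ranges));
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        handles.push(thread::spawn(move || loop {
            // Hold the lock only while taking the next pending range.
            let range = { queue.lock().unwrap().pop() };
            match range {
                Some(r) => tx.send(fetch_logs(r)).unwrap(),
                None => break, // no more work
            }
        }));
    }
    drop(tx); // close the channel once all workers finish
    let total: usize = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    let ranges = vec![(0, 999), (1000, 1999), (2000, 2999)];
    println!("total events: {}", fetch_parallel(ranges, 2));
}
```

Capping the worker count is what keeps the node from being overloaded.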

Block ranges

Then it's also essential to choose the block range: the number of blocks you ask for in every request. There is no universal rule, but more than 10,000 is definitely too much.

It's essential to understand your use case: indexing a contract that emits tons of events is not the same as indexing one that emits almost none. A range of 1,000 blocks is often a good choice; it works well when there are numerous events and moves quickly when there are none.
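Splitting the full span into fixed-size ranges is a few lines of Rust (a sketch; the function name is ours):

```rust
/// Split the inclusive span [from, to] into chunks of at most `size` blocks,
/// ready to be handed out to parallel eth_getLogs requests.
fn chunk_ranges(from: u64, to: u64, size: u64) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    let mut start = from;
    while start <= to {
        let end = to.min(start + size - 1); // last chunk may be shorter
        ranges.push((start, end));
        start = end + 1;
    }
    ranges
}

fn main() {
    // 0..=2500 with a range of 1000 blocks per request.
    assert_eq!(
        chunk_ranges(0, 2500, 1000),
        vec![(0, 999), (1000, 1999), (2000, 2500)]
    );
}
```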

There is an interesting approach that uses a binary search: you aim to fetch a given number of events in each request, increasing the range when you are below that target and decreasing it when you are above. We will save that for another post.

Scale horizontally

To index events even faster, we can scale horizontally. This means using multiple RPC endpoints instead of a single one. With this, the bottleneck will be your downlink bandwidth and how fast you can process the events on your side.

There are numerous ways to schedule requests across all nodes. You could probably write a PhD thesis on this, but a simple round robin goes a long way. Simple and effective.
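A minimal round-robin scheduler can be sketched like this (illustrative endpoint names; not the code from our indexer):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Rotates through a fixed list of RPC endpoints.
struct RoundRobin {
    endpoints: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn new(endpoints: Vec<String>) -> Self {
        Self { endpoints, next: AtomicUsize::new(0) }
    }

    /// Pick the next endpoint; the atomic counter makes this safe to call
    /// from many tasks concurrently without a lock.
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        &self.endpoints[i % self.endpoints.len()]
    }
}

fn main() {
    let rr = RoundRobin::new(vec!["rpc-a".into(), "rpc-b".into()]);
    assert_eq!(rr.pick(), "rpc-a");
    assert_eq!(rr.pick(), "rpc-b");
    assert_eq!(rr.pick(), "rpc-a"); // wraps around
}
```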

Handle backpressure

It's also essential to handle backpressure properly so that your indexer's memory doesn't grow unbounded. It will depend on your ETL (Extract, Transform, Load) pipeline, so tune it accordingly. In most cases, event indexing resembles an ETL pipeline where events are:

  • Extracted from a source: the node, via an RPC endpoint.
  • Transformed: derived fields calculated, cleanup done, etc.
  • Loaded: the resulting data stored in a database.

The Transform or Load stages can be slower than the Extract stage. In that case, more data comes in than goes out, and your indexer's memory can grow without bound.

Backpressure can be handled with a bounded queue or buffer that allows a maximum of q elements. When that limit is reached, the event indexing pauses briefly until the queue drains.
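That pause-when-full behavior is exactly what a bounded channel gives you. A minimal sketch with std's sync_channel (the function name is ours):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Push `n` items through a bounded channel with `capacity` slots and
/// return how many the consumer processed. `send` blocks when the buffer
/// is full, which is precisely the backpressure behavior we want: the
/// producer can never run more than `capacity` items ahead.
fn run_pipeline(n: u64, capacity: usize) -> u64 {
    let (tx, rx) = sync_channel::<u64>(capacity);
    let producer = thread::spawn(move || {
        for block in 0..n {
            tx.send(block).unwrap(); // pauses when `capacity` items queued
        }
    });
    let processed = rx.iter().count() as u64; // consumer drains the queue
    producer.join().unwrap();
    processed
}

fn main() {
    // 100 items flow through a queue of q = 4 without unbounded buffering.
    assert_eq!(run_pipeline(100, 4), 100);
}
```

With tokio, a bounded `mpsc` channel plays the same role in async code.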

Order events

Parallelizing vertically and horizontally is excellent for fast indexing, but it has some associated problems: tasks can complete in a different order, and you must reorder the events accordingly.

Let's see an example. You send two parallel requests:

  • task1: get events for blocks 0-999.
  • task2: get events for blocks 1000-1999.

However, task2 can finish before task1 for many reasons. Here we have multiple approaches. One is to write whatever finishes first, regardless of order.

The other approach is to wait for task1 before processing task2. This lets you process the events in order, but you will spend time waiting whenever responses arrive out of order. Neither approach is better; it depends on the use case.
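The wait-for-order approach can be sketched with a small reordering buffer (a sketch assuming fixed-size ranges; names are ours):

```rust
use std::collections::BTreeMap;

/// Buffers out-of-order range results and releases them in block order.
/// Ranges are identified by their starting block and have a fixed size.
struct Reorderer {
    range_size: u64,
    next_start: u64,
    pending: BTreeMap<u64, Vec<String>>, // range start -> its events
}

impl Reorderer {
    fn new(first_start: u64, range_size: u64) -> Self {
        Self { range_size, next_start: first_start, pending: BTreeMap::new() }
    }

    /// Accept a finished range; return all events that are now contiguous
    /// with everything already released.
    fn push(&mut self, start: u64, events: Vec<String>) -> Vec<String> {
        self.pending.insert(start, events);
        let mut ready = Vec::new();
        // Drain the buffer as long as the next expected range is present.
        while let Some(events) = self.pending.remove(&self.next_start) {
            ready.extend(events);
            self.next_start += self.range_size;
        }
        ready
    }
}

fn main() {
    let mut r = Reorderer::new(0, 1000);
    // task2 (blocks 1000-1999) finishes first: held back.
    assert!(r.push(1000, vec!["e2".into()]).is_empty());
    // task1 arrives: both ranges are released, in order.
    assert_eq!(r.push(0, vec!["e1".into()]), vec!["e1".to_string(), "e2".to_string()]);
}
```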

Middleware and retries

Using middleware is also a great choice, and alloy ships with it. The most used is a retry middleware. Imagine one of your nodes goes offline: your request will fail, and without a retry mechanism your indexer will crash, even though the error is most likely recoverable. This middleware sits between your code and the node, "in the middle", and lets you configure retry logic.

For example, you can retry up to 10 times with an exponential backoff starting at a few seconds.

This retry logic is something you can implement yourself, but we highly recommend using an out-of-the-box middleware, such as RetryBackoffLayer.
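For illustration, here is roughly what such retry logic looks like when hand-rolled (a sketch; in practice, reach for the middleware instead):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible operation with exponential backoff. This is the shape
/// of what a retry middleware does for you on every RPC request.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    initial_backoff: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut backoff = initial_backoff;
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_retries => return Err(e), // give up
            Err(_) => {
                attempt += 1;
                sleep(backoff);
                backoff *= 2; // exponential backoff: 1x, 2x, 4x, ...
            }
        }
    }
}

fn main() {
    // Simulated flaky RPC call: fails twice, then succeeds.
    let mut calls = 0;
    let result = retry_with_backoff(10, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("node offline") } else { Ok(42) }
    });
    assert_eq!(result, Ok(42));
    assert_eq!(calls, 3);
}
```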

Batch inserts

Most likely, you will be inserting the events in some persistent storage. Depending on your use case, it's worth checking whether your database or the tool you're using allows for batch insertions.

This can speed up processing and improve performance by orders of magnitude. Inserting 1,000 events one by one is significantly slower than performing a batch insert of 1,000 events at once.

In some cases, you may need to "read your own writes" at insertion time; for example, inserting event 2 may depend on event 1. If so, make sure your batching strategy allows reading your writes; otherwise, you won't be able to read them until the batch is committed.
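A batching buffer can be sketched like this (names are ours; the flushed field stands in for a database receiving a bulk insert):

```rust
/// Accumulates events and writes them out in batches of `batch_size`,
/// turning N round-trips into roughly N / batch_size.
struct BatchWriter {
    batch_size: usize,
    buffer: Vec<String>,
    flushed: Vec<Vec<String>>, // stand-in for the database
}

impl BatchWriter {
    fn new(batch_size: usize) -> Self {
        Self { batch_size, buffer: Vec::new(), flushed: Vec::new() }
    }

    fn insert(&mut self, event: String) {
        self.buffer.push(event);
        if self.buffer.len() >= self.batch_size {
            self.flush(); // batch is full: write it out
        }
    }

    /// One round-trip per batch instead of one per event.
    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            self.flushed.push(std::mem::take(&mut self.buffer));
        }
    }
}

fn main() {
    let mut w = BatchWriter::new(3);
    for i in 0..7 {
        w.insert(format!("event{}", i));
    }
    w.flush(); // don't forget the trailing partial batch
    assert_eq!(w.flushed.len(), 3); // batches of 3, 3, and 1
}
```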

Handle reorgs

There are also reorgs, the root cause of many bugs. A reorg occurs when your node's fork-choice rule switches its view of the canonical chain to a different branch. In practice, this means it discards previously accepted blocks and adopts a new fork that the protocol considers heavier. Each consensus protocol has its own rule: PoW follows the chain with the most accumulated work, while Ethereum's PoS follows the fork with the most attestation weight.

This means that you could index an event from a block that later on is reorged. And maybe after the reorg that event is not there anymore. So your indexer is wrong! It indexed something that never happened!

Depending on the use case, various approaches can be employed to solve this problem. If you genuinely need real-time indexing, staying up to date with the latest head, you must handle reorgs. Most nodes will notify you when one occurs.

You can also check that each new block links to the previous one: newHead.parentHash must match lastHead.hash.
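That parent-hash check is a one-liner once you track the last head you saw (a sketch; the struct and field names are ours, mirroring the newHeads payload):

```rust
/// Minimal view of a chain head, as delivered by a newHeads subscription.
#[derive(Clone)]
struct Head {
    number: u64,
    hash: String,
    parent_hash: String,
}

/// Returns true if `new_head` extends `last_head`. False signals a reorg
/// (or a gap) that the indexer must handle before continuing.
fn extends(last_head: &Head, new_head: &Head) -> bool {
    new_head.parent_hash == last_head.hash && new_head.number == last_head.number + 1
}

fn main() {
    let a = Head { number: 100, hash: "0xaaa".into(), parent_hash: "0x999".into() };
    let b = Head { number: 101, hash: "0xbbb".into(), parent_hash: "0xaaa".into() };
    let b_reorged = Head { number: 101, hash: "0xccc".into(), parent_hash: "0xfff".into() };
    assert!(extends(&a, &b));
    assert!(!extends(&a, &b_reorged)); // parent hash mismatch: reorg detected
}
```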

Another approach is to index a few blocks behind the head, like 2 or 3. That introduces a delay of roughly 30 seconds; you can still hit reorgs, but the chances are much lower.

The last approach is to index only finalized blocks. This means a delay of around 13 minutes from the head, but it gives you a very high guarantee that no reorg will affect your data, so your indexer will be fine.

Polling vs Subscription

Last but not least, when it comes to indexing, there are two common approaches on how to stay up to date with the chain:

  • Polling-based. You ask the node every x seconds whether there is a new block. If there is, you fetch it.
  • Subscription-based. You subscribe to new blocks, often using websockets. The moment a new block is added, you are notified immediately.

Each approach has pros and cons. With polling, you won't be notified exactly when a new block is available; you will have a worst-case delay of x. With subscriptions, you are notified immediately.

But the subscription-based model has cons too: the connection may drop, or you may be briefly offline, during which you won't receive blocks. Be very careful and constantly check that each new block number is the previous one plus 1. This catches a bunch of bugs and ensures you are not missing blocks.

So the best approach is a mix of both: rely on subscriptions to get the latest block, but always keep polling as a backup plan.
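Either way, the are-we-missing-blocks check boils down to computing the gap between the last processed block and the latest head (a sketch; the function name is ours):

```rust
/// Given the last processed block and the latest head (from a poll or a
/// subscription reconnect), return the blocks we still need to fetch.
fn missing_blocks(last_processed: u64, latest_head: u64) -> Vec<u64> {
    (last_processed + 1..=latest_head).collect()
}

fn main() {
    // Normal case: the head advanced by exactly one block.
    assert_eq!(missing_blocks(100, 101), vec![101]);
    // The subscription dropped a few blocks: fetch them all.
    assert_eq!(missing_blocks(100, 104), vec![101, 102, 103, 104]);
    // Nothing new yet.
    assert!(missing_blocks(100, 100).is_empty());
}
```

When the returned gap is larger than one block, fall back to a backfill over that range before resuming.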

Case Study: Indexing Ethereum Deposit Contract

Here is an example of how we tackle indexing. We use Rust for performance, alloy to interact with the node easily, and tokio to massively parallelize requests.

GitHub - BilinearLabs/beacon-contract-indexer: Indexes Ethereum’s Beacon Deposit Contract

In this example, we index the Ethereum deposit contract, processing around 40k blocks per second on mainnet.

We use a producer-consumer architecture, where the producer continuously spawns new tasks that hit the RPC node. We define a maximum number of concurrent requests to prevent the node from being overwhelmed.

To handle backpressure, we define a maximum queue size. When full, the producer stops generating more events until a slot becomes available. This is a safety measure to prevent memory from growing without bounds, but it would be a rare event since events are consumed at a faster rate.

For this use case, we wanted to process blocks in order, so the consumer stores the blocks in a key-value buffer. When the next block range becomes available, it's pulled from the buffer and processed.

Need help with blockchain indexing?

Whether you’re building a real‑time analytics dashboard, on‑chain data warehouse, or bespoke smart‑contract indexer, Bilinear Labs has you covered.

📧 Get in touch: contact@bilinearlabs.io

| Twitter | Linkedin | Github |