Chapter 5: Bitcoin Mining
This chapter is all about mining. We’ve already seen quite a bit about miners and how Bitcoin relies on them — they validate every transaction, they build and store all the blocks, and they reach a consensus on which blocks to include in the block chain. We also have already seen that miners earn some reward for doing this, but we still have left many questions unanswered. Who are the miners?
How did they get into this? How do they operate? What's the business model like for miners? What impact do they have on the environment? In this chapter, we will answer all of these questions.
Do you want to get into Bitcoin mining? If you do, we’re not going to completely discourage you, but beware that Bitcoin mining bears many similarities to gold rushes. Historical gold rushes are full of stories of young people rushing off to find fortune, and many of them lose everything they have. A few strike it rich, but even those that do generally endure lots of hardship along the way. Flocking to a gold rush isn’t easiest way to get rich, and Bitcoin mining is starting to look like a similar proposition. As we’ll see in this section, mining is by no means a get-rich-quick scheme.
But first, let’s look at the technical details. To be a Bitcoin miner, you have to join the Bitcoin network and connect to other nodes. Once you’re connected, there are six tasks to perform:
We can classify the steps that a miner must take into two categories. Some tasks — validating transactions and blocks — help the Bitcoin network and are fundamental to its existence. These tasks are the reason that the Bitcoin protocol requires miners in the first place. Other tasks — the race to find blocks and profit —- aren’t necessary for the Bitcoin network itself but are intended to incentivize miners to perform the essential steps. Of course, both of these are necessary for Bitcoin to function as a currency, since miners need an incentive to perform the critical steps.
Finding a valid block. Let’s return to the question of finding a nonce that makes your block valid. In Chapter 3 we saw that there are two main hash-based structures. There's the block chain where each block header points to the previous block header in the chain, and then within each block there's a Merkle tree of all of the transactions included in that block.
The first thing that you do as a miner is you assemble all the transactions that you have from your pending transaction pool into a Merkle tree. You then create a block with a header that points to the previous block. In the block header, there’s a 32 bit nonce field, and you keep trying different nonces looking for one that that causes the block’s hash to be under the target — roughly, begin with the required number of zeros. A miner may begin with a nonce of 0 and successively increment it by one in search of a nonce that makes the block valid. See Figure 5.1.
Figure 5.1: Finding a valid block. In this example, the miner tries a nonce of all 0s. It does not produce a valid hash output, so the miner would then proceed to try a different nonce.
In most cases you'll try every single possible 32-bit value for the nonce and none of them will produce a valid hash. At this point you're going to have to make further changes. Notice in Figure 5.1 that there’s an additional nonce in the coinbase transaction that you can change as well. After you've exhausted all possible nonces for the block header, you'll change the extra nonce in the coinbase transaction — say by incrementing it by one — and then you'll start searching nonces in the block header once again.
When you change the nonce parameter in the coinbase transaction, the entire Merkle tree of transactions has to change (See Figure 5.2). So the change of the coinbase nonce will propagate all the way up, and since you’ll have to update all the hashes, changing the extra nonce in the coinbase transaction is much more expensive than changing the nonce in the block header. For this reason, miners spend most of the time changing the nonce in the block header and only change the coinbase nonce when they have exhausted all of the 232 nonces in the block header.
Figure 5.2: Changing a nonce in the coinbase transaction propagates all the way up the Merkle tree.
The vast, vast majority of nonces that you try aren't going to work, but if you stay at it long enough you'll eventually find the right combination of the extra nonce in the coinbase transaction and the nonce in the block header that produce a block with a hash under the target. When you find this, you want to announce it as quickly as you can and hope that you can profit from it.
Is everyone solving the same puzzle? You may be wondering: if every miner just increments the nonces as we described, aren’t all miners solving the exact same puzzle? Won’t the fastest miner always win? The answer is no! Firstly, it’s unlikely that miners will be working on the exact same block as each miner will likely include a somewhat different set of transactions and in a different order. But more importantly, even if two different miners were working on a block with identical transactions, the blocks would still differ. Recall that in the coinbase transaction, miners specify their own address. This change will propagate up causing all the Merkle hashes to change ensuring that no two miners are hashing the same inputs. |
Difficulty. Exactly how difficult is it to find a valid block? As of March 2015, the mining difficulty target (in hexadecimal) is:
0000000000000000172EC0000000000000000000000000000000000000000000
so the hash of any valid block has to be below this value. In other words only one in about 267 nonces that you try will work, and that’s a really huge number. One approximation for it that you would think about is it's about the population of the earth squared. So, if every person on Earth was themselves their own planet Earth with seven billion people on it the total number of people would be close to this number.
Determining the difficulty. The mining difficulty changes every 2016 blocks. It is adjusted based on how efficient the miners were over the period of the previous 2016 blocks according to this formula:
Two weeks is the amount of time it would take to mine 2016 if a block were created exactly every 10 minutes. So the effect of this formula is to scale the difficulty to maintain the property that blocks should be found by the network on average about once every ten minutes. There’s nothing special about 2 weeks, but it’s a good trade-off. If the period were much shorter, the difficulty might fluctuate due to random variations in the number of blocks found in each period. If the period were much higher, the network’s hash power might get too far out of balance with the difficulty.
Each Bitcoin miner independently computes the difficulty and will only accept blocks that meet the difficulty that they computed. Miners who are on different branches might not compute the same difficulty value, but any two miners mining on top of the same block will agree on what the difficulty should be. This allows consensus to be reached.
You can see in Figure 5.3 that over time the mining difficulty keeps increasing. It's not necessarily a steady linear increase or an exponential increase, but it depends on activity in the market. Things like how many new miners are joining, which in turn may be affected by the current exchange rate of Bitcoin, affect the mining difficulty. Generally, as more and more miners come online, blocks are found faster, and the difficulty is increased so that it again takes ten minutes to find a block.
In Figure 5.3 you can see that in the red line on the graph there's a step function of difficulty even though the overall network hash is growing smoothly. The discrete step results from the fact that the difficulty is only adjusted every 2016 blocks.
Another way to view this is to view how long it takes to find a block on average. Figure 5.4 shows how many seconds elapse between consecutive blocks in the block chain. You can see that this gradually goes down, jumps up and then gradually goes down again. Of course what's happening is that every
2016 blocks the difficulty resets and the average block time goes back up to about ten minutes. Over the next period the difficulty stays unchanged, but more and more miners come online. Since the hash power has increased but the difficulty has not, blocks are found more quickly until the difficulty is again adjusted after 2016 blocks, or about two weeks.
Figure 5.3: Mining difficulty over time (mid-2014). Note that the y-axis begins at 80,000 TH/s.
Figure 5.4 : Time to find a block (early 2014). Note that the y-axis begins at 460 seconds.
Even though the goal was for a block to be found every ten minutes on average, it's actually close to about every nine minutes, and at the end of the two week cycle it will get down to around eight minutes. This behavior is during a period of rapid hash rate increase. If the hash rate isn’t increasing as fast, the average time to find a block will be more stable.
There have been decreases in difficulty a few times in Bitcoin’s history, small in magnitude compared to its increases. One proposed scenario for Bitcoin’s collapse is a “death spiral” in which a dropping exchange rate makes mining unprofitable for some miners, causing an exodus, in turn causing the price to drop further. While there have been no catastrophic declines of mining power so far, there’s no inherent reason why difficulty must keep increasing.
We've mentioned that the computation that miners have to do is very difficult. In this section, we’ll discuss why it is so computationally difficult and take a look at the hardware that miners use to facilitate this computation.
What exactly is this difficult computation that miners are working on? They are computing many, many SHA-256 hashes. We've discussed hash functions and we've mentioned SHA-256 in particular. SHA-256 is a general purpose cryptographic hash function that’s part of a bigger family of functions that was standardized in 2001. SHA-256 was a good choice as this was strongest cryptographic hash function available at the time when Bitcoin was designed. It is possible that it will become less secure over the lifetime of Bitcoin, but for now it remains secure. It did come out of the NSA, which has led to some conspiracy theories, but it's generally considered to be a very strong hash function.
Sidebar. Although SHA-256 is generally considered to be cryptographically secure, its replacement, the SHA-3 family, has already been picked. SHA-3 is in the final stages of standardization today, but it wasn't available at the time Bitcoin was designed. |
A closer look at SHA-256. Figure 5.5 shows more detail about what actually goes on in a SHA-256 computation. While we don't need to know all of the details to understand how Bitcoin works, we’ll give a high level overview so you have a general idea of the task that miners are solving.
SHA-256 maintains 256 bits of state. The state is split into eight 32-bit words which makes it very optimized for 32-bit hardware, and in each round some bitwise tweaks that are applied to some of those words. Then a number of words in the state are taken — some with these tweaks applied — and added together mod 32. The result of all of these additions is wired over to the first word of the state and the entire state shifts over.
Figure 5.5 is just one round of the SHA-256 compression function, and a complete computation of SHA-256 does this for 80 iterations. During each round, there are slightly different constants applied so that every iteration isn't exactly the same.
Figure 5.5 : The structure of SHA-256. This is one round of the compression function.
So the task for miners is compute this function. To do this, they need to be able to deal with 32-bit words, do 32-bit modular addition, and also be able to do some bitwise logic. Remember that miners are racing each other so they will want to do this as fast as possible.
As we will see shortly, Bitcoin actually requires SHA-256 to be applied twice to a block in order to get the hash that is used by the nodes. This is a quirk of Bitcoin, and the reason for the double application are not fully specified and seemingly unnecessary, but at this point, it’s just something that miners have to deal with.
CPU mining. The first generation of mining was all done on general purpose computers — that is general purpose central processing units (CPUs). In fact, CPU mining was as simple as running the code shown in Figure 5.6. That is, miners simply searched over nonces in a linear fashion, computed SHA 256 in software and checked if the result was a valid block. Also, notice in the code that as we mentioned, SHA-256 is applied twice.
HDR[kNoncePos]++;
IF (SHA256(SHA256(HDR)) < (65535 << 208) / DIFFICULTY)
return;
}
Figure 5.6 : CPU mining pseudocode.How fast will this run on a general purpose computer? On a high end desktop PC you can compute about 20 million hashes per second (MH/s). At that speed, it would take you over a hundred thousand years on average at the early-2015 difficulty level to find a block. We talked about how mining was going to be a difficult slog, and if you're mining on a general purpose PC today it's a really, really big hill to get up because it's going to take you about 300,000 years on average to find a block. CPU mining is no longer profitable with the current difficulty. For the last few years, anyone trying mine on a CPU probably doesn’t understand how Bitcoin works and were probably pretty disappointed that they never made any money doing it.
GPU mining. The second generation began when people started to get frustrated with how slow their CPUs were and instead used their graphics card, or graphics processing unit (GPU).
Almost every modern computer has a GPU built in for high performance graphics. They’re designed to have high throughput, and also high parallelism, both of which are very useful for Bitcoin mining.
Bitcoin mining can be parallelized because you can compute multiple hashes at the same time with different nonces. In 2010, a language called OpenCL was released. OpenCL is a general purpose language to do things other than graphics on a GPU. It's a high level-language, but over time people started tweaking the code even further to run more quickly on specific graphics cards. This paved the way for Bitcoin mining on GPUs.
Mining with graphics cards has some nice properties. For one thing, they're easily available, and they're easy for amateurs to set up. You can order graphics cards online or buy them at most big consumer electronics stores. They’re the most accessible high-end hardware that's available to most people. They also have some properties that make them specifically good for Bitcoin mining. They're designed for parallelism so they have a lot of Arithmetic Logic Units (ALUs) in them that you can use in parallel to do different SHA-256 computations, and some of them also have specific instructions to do bitwise operations that work out quite nicely for SHA-256. They also have the property that you can drive many graphics cards from one motherboard and CPU. So you could take your one computer and attach multiple graphics cards to it.
Most graphics cards can also be overclocked which is a property that gamers demand so you can run them faster than they're actually designed for, if you want to take on the risk. And with Bitcoin mining, it might be a good idea to run the chip much faster than it was designed for even if you introduce some errors in the process. For example, say you can run your graphics card 50 percent faster but doing so will increase the error in the SHA-256 computation to 30 percent of the time. If an invalid solution is erroneously declared valid by the graphics card — something that would happen rarely — you can always double-check it on your CPU. On the other hand, if a valid solution is erroneously missed, you’d never know. But if your speed increase from overclocking can overcome the decrease in output due to errors, you’d still come out ahead. There’s a term called goodput that measures this, which is simply the product of throughput and success rate. In the above example, the throughput is 1.5x compared to not overclocking, whereas the success rate is 0.7x. The product is 1.05, which means overclocking increases the goodput by 5%. People have spent a long time optimizing exactly how much they should overclock a given chip and what errors it would introduce.
As we said earlier, you can control multiple GPUs from a single CPU, and people began taking advantage of this. They would use multiple graphics cards together for mining, and you began to see some really interesting home-brewed setups like this one shown in Figure 5.7. This was still in the early days of Bitcoin when miners were still mostly hobbyists who didn't know a lot about running a modern data center, but they came up with some quite ingenious designs for how to pack many graphics cards into a small place and keep them cool.
Figure 5.7: A home-built rack of GPUs used for Bitcoin mining. You can also see the fans that they used to build their cooling system. Source: LeonardH, cryptocurrenciestalk.com.
Disadvantages of GPU mining. GPU mining has some disadvantages. GPUs have a lot of hardware built into them for doing video that doesn’t get used by miners. Specifically, they have floating point units that aren’t used at all in SHA-256, and these are wasted when using them for mining. GPUs also don't have the greatest cooling characteristics when you put a lot of them next to one another.
They’re not designed to run side by side as they are in the picture; they're designed to be on one graphics card box doing graphics for one computer.
GPUs can also have a fairly large power draw, so a lot of electricity is being used relative to a computer. Another disadvantage initially was that you had to either build your own board or buy expensive boards to house multiple graphics cards.
On a really high-end graphics card with aggressive tuning you might get as high as 200 MH/s , or 200 million hashes per second, an order of magnitude better than you would be doing with a CPU. But even with that improved performance, and even if you're really aggressive and used one hundred GPUs together, it would still take you over 300 years on average to find a block at the early-2015 difficulty level. Due to this lack of performance, GPU mining is basically dead.
FPGA mining. Around 2011 some miners started to use FPGAs or Field Programmable Gate Arrays. That's around the same time that the first implementation of Bitcoin mining came out in Verilog, a hardware design language that’s used to program FPGAs. The rationale behind FPGAs is to try to get close to the performance characteristics of custom hardware while also allowing the owner of the card to customize it or reconfigure it “in the field.” This lies in contrast to a chip which is made in a factory and does the same thing forever.
Figure 5.8: A home-built rack of FPGAs. Although you don't see the cooling setup pictured here, a rack like this would need a cooling system.
FPGAs offer better performance than graphics cards, particularly on some of the “bit fiddling” operations. These are easy to specify on an FPGA, and cooling is also easier with FPGAs. You're also wasting less of the card than you would be a graphics card. As with GPUs, you can pack many of these together and drive them from one central unit, and this is exactly what people began to do (see Figure 5.8). Overall, it was possible build a big array of FPGAs more neatly and cleanly than you could with graphics cards.
If you were using an FPGA and using it well, you might get up to a GH/s, or one billion hashes per second. This is certainly a large performance gain over CPUs and GPUs, but even if you had a hundred boards together, each with a 1 GH/s throughput, it would still take you about 50 years on average to find a Bitcoin block at the early-2015 difficulty level.
Despite the performance gain, the days of FPGA mining were quite limited. Firstly, they were being driven harder for Bitcoin mining — by being on all the time and overclocked — than a lot of consumer grade FPGAs were really designed for. Because of this, people found errors and malfunctions in their FPGAs as they were mining. It also turned out to be difficult to optimize the 32 bit addition step which is critical in doing SHA-256. FPGAs are also less accessible to people. You can't buy an FPGA at most stores, and there are few people who know how to program an FPGA or how to set them up.
Even though FPGAs improved performance, the cost-per-performance was only marginally improved over GPUs. FPGA mining was a rather short-lived phenomenon. Whereas GPU mining dominated for about a year or so, the days of FPGA mining were far more limited — lasting only a few months.
At this point you might be wondering that if all of these solutions are so intractable today, what are people actually using? This brings us to ASIC mining.
ASIC mining. Mining today is dominated by Bitcoin ASICs, or application-specific integrated circuits. These are chips that were designed, built, and optimized for the sole purpose of mining Bitcoins.
There are a few big vendors that that sell these to consumers. There is a good deal of variety in the ASICs that you can buy. You can choose between slightly bigger and more expensive models, more compact models, as well as models with varying performance and energy consumption claims.
Designing ASICs requires a lot of expertise and their lead-time is also quite long. Nevertheless, Bitcoin ASICs were designed and produced surprisingly quickly. In fact, analysts have said that this may be the fastest turnaround time in the history of integrated circuits for specifying a problem and turning it around to have a working chip in people's hands. On the flip side, the first few generations of Bitcoin ASICs were quite buggy, and most of them didn't