Chapter 3: Mechanics of Bitcoin
This chapter is about the mechanics of Bitcoin. Whereas in the first two chapters, we’ve talked at a relatively high level, now we’re going to delve into detail. We’ll look at real data structures, real scripts, and try to learn the details and language of Bitcoin in a precise way to set up everything that we want to talk about in the rest of this book. This chapter will be challenging because a lot of details will be flying at you. You’ll learn the specifics and the quirks that make Bitcoin what it is.
To recap where we left off last time, the Bitcoin consensus mechanism gives us an append-only ledger, a data structure that we can only write to. Once data is written to it, it’s there forever. There’s a decentralized protocol for establishing consensus about the value of that ledger, and there are miners who perform that protocol and validate transactions. Together they make sure that transactions are well formed, that they aren’t already spent, and that the ledger and network can function as a currency. At the same time, we assumed that a currency existed to motivate these miners. In this chapter we’ll look at the details of how we actually build that currency, to motivate the miners that make this whole process happen.
3.1 Bitcoin transactions
Let’s start with transactions, Bitcoin’s fundamental building block. We’re going to use a simplified model of a ledger for the moment. Instead of blocks, let’s suppose individual transactions are added to the ledger one at a time.
Figure 3.1 an account-based ledger
How can we build a currency on top of such a ledger? The first model you might think of, which is actually the mental model many people have for how Bitcoin works, is that you have an account-based system. You can add some transactions that create new coins and credit them to somebody. And then later you can transfer them. A transaction would say something like “we’re moving 17 coins from Alice to Bob”, and it will be signed by Alice. That’s all the information about the transaction that’s contained in the ledger. In Figure 3.1, after Alice receives 25 coins in the first transaction and then transfers 17 coins to Bob in the second, she’d have 8 Bitcoins left in her account.
The downside to this way of doing things is that anyone who wants to determine if a transaction is valid will have to keep track of these account balances. Take another look at Figure 3.1. Does Alice have the 15 coins that she’s trying to transfer to David? To figure this out, you’d have to look backwards in time forever to see every transaction affecting Alice, and whether or not her net balance at the time that she tries to transfer 15 coins to David is greater than 15 coins. Of course we can make this a little bit more efficient with some data structures that track Alice’s balance after each transaction. But that’s going to require a lot of extra housekeeping besides the ledger itself.
Because of these downsides, Bitcoin doesn’t use an account-based model. Instead, Bitcoin uses a ledger that just keeps track of transactions similar to ScroogeCoin in Chapter 1.
Figure 3.2 a transaction-based ledger, which is very close to Bitcoin
Transactions specify a number of inputs and a number of outputs (recall PayCoins in ScroogeCoin). You can think of the inputs as coins being consumed (created in a previous transaction) and the outputs as coins being created. For transactions in which new currency is being minted, there are no coins being consumed (recall CreateCoins in ScroogeCoin). Each transaction has a unique identifier. Outputs are indexed beginning with 0, so we will refer to the first output as “output 0”.
Let’s now work our way through Figure 3.2. Transaction 1 has no inputs because this transaction is creating new coins, and it has an output of 25 coins going to Alice. Also, since this is a transaction where new coins are being created, no signature is required. Now let’s say that Alice wants to send some of those coins over to Bob. To do so, she creates a new transaction, transaction 2 in our example. In the transaction, she has to explicitly refer to the previous transaction where these coins are coming from. Here, she refers to output 0 of transaction 1 (indeed the only output of transaction 1), which assigned 25 bitcoins to Alice. She also must specify the output addresses in the transaction.
In this example, Alice specifies two outputs, 17 coins to Bob, and 8 coins to Alice. And, of course, this whole thing is signed by Alice, so that we know that Alice actually authorizes this transaction.
Change addresses. Why does Alice have to send money to herself in this example? Just as coins in ScroogeCoin are immutable, in Bitcoin, the entirety of a transaction output must be consumed by another transaction, or none of it. Alice only wants to pay 17 bitcoins to Bob, but the output that she owns is worth 25 bitcoins. So she needs to create a new output where 8 bitcoins are sent back to herself. It could be a different address from the one that owned the 25 bitcoins, but it would have to be owned by her. This is called a change address.
Efficient verification. When a new transaction is added to the ledger, how easy is it to check if it is valid? In this example, we need to look up the transaction output that Alice referenced, make sure that it has a value of 25 bitcoins, and that it hasn’t already been spent. Looking up the transaction output is easy since we’re using hash pointers. To ensure it hasn’t been spent, we need to scan the block chain between the referenced transaction and the latest block. We don’t need to go all the way back to the beginning of the block chain, and it doesn’t require keeping any additional data structures (although, as we’ll see, additional data structures will speed things up).
Consolidating funds. As in ScroogeCoin, since transactions can have many inputs and many outputs, splitting and merging value is easy. For example, say Bob received money in two different transactions – 17 bitcoins in one, and 2 in another. Bob might say, I’d like to have one transaction I can spend later where I have all 19 bitcoins. That’s easy — he creates a transaction with the two inputs and one output, with the output address being one that he owns. That lets him consolidate those two transactions.
Joint payments. Similarly, joint payments are also easy to do. Say Carol and Bob both want to pay David. They can create a transaction with two inputs and one output, but with the two inputs owned by two different people. And the only difference from the previous example is that since the two outputs from prior transactions that are being claimed here are from different addresses, the transaction will need two separate signatures — one by Carol and one by Bob.
Transaction syntax. Conceptually that’s really all there is to a Bitcoin transaction. Now let’s see how it’s represented at a low level in Bitcoin. Ultimately, every data structure that’s sent on the network is a string of bits. What’s shown in Figure 3.3 is very low-level, but this further gets compiled down to a compact binary format that’s not human-readable.
Figure 3.3 An actual Bitcoin transaction.
As you can see in Figure 3.3, there are three parts to a transaction: some metadata, a series of inputs, and a series of outputs.
And then there’s a funny line that looks like what we want to be the recipient address. Each output is supposed to go to a specific public key, and indeed there is something in that field that looks like it’s the hash of a public key. But there’s also some other stuff that looks like a set of commands. Indeed, this field is a script, and we’ll discuss this presently.
Each transaction output doesn’t just specify a public key. It actually specifies a script. What is a script, and why do we use scripts? In this section we’ll study the Bitcoin scripting language and understand why a script is used instead of simply assigning a public key.
The most common type of transaction in Bitcoin is to redeem a previous transaction output by signing with the correct key. In this case, we want the transaction output to say, “this can be redeemed by a signature from the owner of address X.” Recall that an address is a hash of a public key. So merely specifying the address X doesn’t tell us what the public key is, and doesn’t give us a way to check the signature! So instead the transaction output must say: “this can be redeemed by a public key that hashes to X, along with a signature from the owner of that public key.” As we’ll see, this is exactly what the most common type of script in Bitcoin says.
OP_DUP OP_HASH160 69e02e18... OP_EQUALVERIFY OP_CHECKSIGFigure 3.4. an example Pay-to-PubkeyHash script, the most common type of output script in Bitcoin
But what happens to this script? Who runs it, and how exactly does this sequence of instructions enforce the above statement? The secret is that the inputs also contain scripts instead of signatures. To validate that a transaction redeems a previous transaction output correctly, we combine the new transaction’s input script and the earlier transaction’s output script. We simply concatenate them, and the resulting script must run successfully in order for the transaction to be valid. These two scripts are called scriptPubKey and scriptSig because in the simplest case, the output script just specifies a public key (or an address to which the public key hashes), and the input script specifies a signature with that public key. The combined script can be seen in Figure 3.5.
Bitcoin scripting language. The scripting language was built specifically for Bitcoin, and is just called ‘Script’ or the Bitcoin scripting language. It has many similarities to a language called Forth, which is an old, simple, stack-based, programming language. But you don’t need to understand Forth to understand Bitcoin scripting. The key design goals for Script were to have something simple and compact, yet with native support for cryptographic operations. So, for example, there are special-purpose instructions to compute hash functions and to compute and verify signatures.
The scripting language is stack-based. This means that every instruction is executed exactly once, in a linear manner. In particular, there are no loops in the Bitcoin scripting language. So the number of instructions in the script gives us an upper bound on how long it might take to run and how much memory it could use. The language is not Turing-complete, which means that it doesn’t have the ability to compute arbitrarily powerful functions. And this is by design — miners have to run these scripts, which are submitted by arbitrary participants in the network. We don’t want to give them the power to submit a script that might have an infinite loop.
<sig><pubKey>
-------------–
OP_DUP
OP_HASH160
<pubKeyHash?>
OP_EQUALVERIFY
OP_CHECKSIG
Figure 3.5. To check if a transaction correctly redeems an output, we create a combined script by appending the scriptPubKey of the referenced output transaction (bottom) to the scriptSig of the redeeming transaction (top). Notice that <pubKeyHash?> contains a ‘?’. We use this notation to indicate that we will later check to confirm that this is equal to the hash of the public key provided in the redeeming script.
There are only two possible outcomes when a Bitcoin script is executed. It either executes successfully with no errors, in which case the transaction is valid. Or, if there’s any error while the script is executing, the whole transaction will be invalid and shouldn’t be accepted into the block chain.
The Bitcoin scripting language is very small. There’s only room for 256 instructions, because each one is represented by one byte. Of those 256, 15 are currently disabled, and 75 are reserved. The reserved instruction codes haven’t been assigned any specific meaning yet, but might be instructions that are added later in time.
Many of the basic instructions are those you’d expect to be in any programming language. There’s basic arithmetic, basic logic — like ‘if’ and ‘then’ — , throwing errors, not throwing errors, and returning early. Finally, there are crypto instructions which include hash functions, instructions for signature verification, as well as a special and important instruction called CHECKMULTISIG that lets you check multiple signatures with one instruction. Figure 3.6 lists some of the most common instructions in the Bitcoin scripting language.
The CHECKMULTISIG instruction requires specifying n public keys, and a parameter t, for a threshold. For this instruction to execute validly, there have to be at least t signatures from t out of n of those public keys that are valid. We’ll show some examples of what you’d use multisignatures for in the next section, but it should be immediately clear this is quite a powerful primitive. We can express in a compact way the concept that t out of n specified public keys must sign in order for the transaction to be valid.
Incidentally, there’s a bug in the multisignature implementation, and it’s been there all along. The CHECKMULTISIG instruction pops an extra data value off the stack and ignores it. This is just a quirk of the Bitcoin language and one has to deal with it by putting an extra dummy variable onto the stack.
The bug was in the original implementation, and the costs of fixing it are much higher than the damage it causes, as we’ll see later in Section 3.5. At this point, this bug is considered a feature in Bitcoin, in that it’s not going away.
OP_DUP |
Duplicates the top item on the stack |
OP_HASH160 |
Hashes twice: first using SHA-256 and then RIPEMD-160 |
OP_EQUALVERIFY |
Returns true if the inputs are equal. Returns false and marks the transaction as invalid if they are unequal |
OP_CHECKSIG |
Checks that the input signature is a valid signature using the input public key for the hash of the current transaction |
OP_CHECKMULTISIG |
Checks that the k signatures on the transaction are valid signatures from k of the specified public keys. |
Figure 3.6 a list of common Script instructions and their functionality.
Executing a script. To execute a script in a stack-based programming language, all we’ll need is a stack that we can push data to and pop data from. We won’t need any other memory or variables. That’s what makes it so computationally simple. There are two types of instructions: data instructions and opcodes. When a data instruction appears in a script, that data is simply pushed onto the top of the stack. Opcodes, on the other hand, perform some function, often taking as input data that is on top of the stack.
Now let’s look at how the Bitcoin script in Figure 3.5 is executed. Refer to Figure 3.7, where we show the state of the stack after each instruction. The first two instructions in this script are data instructions — the signature and the public key used to generate that signature. These were specified by the recipient in the scriptSig component, or the input script. As we mentioned, when we see a data instruction, we just push it onto the stack. The rest of the script is the part that was specified by the sender — the scriptPubKey component.
First we have the duplicate instruction, OP_DUP, so we just push a copy of the public key onto the top of the stack. The next instruction is OP_HASH160, which tells us to pop the top value, compute its cryptographic hash, and push the result onto the top of the stack. When this instruction finishes executing, we will have replaced the public key on the top of the stack with its hash.
Figure 3.7 Execution of a Bitcoin script. On the bottom, we show the instruction in the script. Data instructions are denoted with surrounding angle brackets, whereas opcodes begin with “OP_”. On the top, we show the stack just after that instruction has been executed.
Next, we’re going to do one more push of data onto the stack. Recall that this data was specified by the sender of the coins. It is the hash of the public key that the sender specified had to be used to generate the signature to redeem these coins. At this point, there are two values at the top of the stack. There is the hash of the public key, as specified by the sender, and the hash of the public key that was used by the recipient when trying to claim the coins.
At this point we’ll run the EQUALVERIFY command, which checks that the two values at the top of the stack are equal. If they aren’t, an error will be thrown, and the script will stop executing. But in our example, we’ll assume that they’re equal, that is, that the recipient of the coins used the correct public key. That instruction will consume those two data items that are at the top of the stack, And the stack now contains two items — a signature and the public key that was used for that signature.
We’ve already checked that this public key was the public key that is required, and now we have to check if the signature is valid. This is a great example of where the Bitcoin scripting language is built with cryptography in mind. Even though it’s a fairly simple language in terms of logic, there’s some quite powerful instructions in there, like this “OP_CHECKSIG” instruction. This single instruction pops those two values off of the stack, and does the entire signature verification in one go.
But what is this actually a signature of? What was the input to the signature function? It turns out there’s only one thing you can sign in Bitcoin — an entire transaction. So the “CHECKSIG” instruction pops the two values, the public key and signature, off the stack, and verifies that is a valid signature for the entire transaction using that public key. Now we’ve executed every instruction in the script, and there’s nothing left on the stack. Provided there weren’t any errors, the output of this script will simply be true indicating that the transaction is valid.
What’s used in practice. In theory, Script lets us specify, in some sense, arbitrary conditions that must be met in order to spend coins. But, as of today, this flexibility isn’t used very heavily. If we look at the scripts that have actually been used in the history of Bitcoin so far, the vast majority, 99.9 percent, are exactly the same script, which is in fact the script that we used in our example. As we saw, this script just specifies one public key and requires a signature for that public key in order to spend the coins.
There are a few other instructions that do get some use. MULTISIG gets used a little bit as does a special type of script called Pay-to-Script-Hash which we’ll discuss shortly. But other than that, there hasn’t been much diversity in terms of what scripts get used. This is because Bitcoin nodes, by default, have a whitelist of standard scripts, and they refuse to accept scripts that are not on the list. This doesn’t mean that those scripts can’t be used at all; it just makes them harder to use. In fact this distinction is a very subtle point which we’ll return to in a bit when we talk about the Bitcoin peer-to-peer network.
Proof of burn. A proof-of-burn is a script that can never be redeemed. Sending coins to a proof-of-burn script establishes that they have been destroyed since there’s no possible way for them to be spent. One use of proof-of-burn is to bootstrap an alternative to Bitcoin by forcing people to destroy Bitcoin in order to gain coins in the new system. We’ll discuss this in more detail in Chapter 10. Proof-of-burn is quite simple to implement: the OP_RETURN opcode throws an error if it’s ever reached. No matter what values you put before OP_RETURN, that instruction will get executed eventually, in which case this script will return false.
Because the error is thrown, the data in the script that comes after OP_RETURN will not be processed. So this is an opportunity for people to put arbitrary data in a script, and hence into the block chain. If, for some reason, you want to write your name, or if you want to timestamp and prove that you knew some data at a specific time, you can create a very low value Bitcoin transaction. You can destroy a very small amount of currency, but you get to write whatever you want into the block chain, which should be kept around forever.
Pay-to-script-hash. One consequence of the way that Bitcoin scripts works is that the sender of coins has to specify the script exactly. But this can sometimes be quite a strange way of doing things. Say, for example, you’re a consumer shopping online, and you’re about to order something. And you say, “Alright, I’m ready to pay. Tell me the address to which I should send my coins.” Now, say that the company that you’re ordering from is using MULTISIG addresses. Then, since the one spending the coins has to specify this, the retailer will have to come back and say, “Oh, well, we’re doing something fancy now. We’re using MULTISIG. We’re going to ask you to send the coins to some complicated script.” You might say, “I don’t know how to do that. That’s too complicated. As a consumer, I just want to send to a simple address.”
In response to this problem, there’s a really clever hack in Bitcoin. Instead of having the sender specify the entire script, the sender can specify just a hash of the script that is going to be needed to redeem those coins. So the sender just needs to specify a very simple script which just hashes the top value on the stack, and checks to see if it’s equal to the required redemption script. The receiver of those coins needs to specify as a data value, the value of the script whose hash the sender specified. After this happens, a special second step of validation is going to occur. That top data value from the stack is going to be reinterpreted as instructions, and then it’s going to be executed a second time as a script.
So we see there were two stages that happened here. First there was this traditional script which checked that the redemption script had the right hash. And then the redemption script will be de-serialized, and run as a script itself. And here’s where the actual signature check is going to happen. This is called pay-to-script-hash in Bitcoin and it is often abbreviated as P2SH. It is an alternative to the normal mode of operation, which is pay to a public key.
Getting support for P2SH was quite complicated since it wasn’t