The Core Issue: Keeping Bitcoin Core Secure

Bitcoin Magazine

Bitcoin Core functions as the backbone for a monetary network securing over two trillion dollars in value. The stakes are immense, and large portions of the codebase can harbor high-impact bugs. The consensus engine, the peer-to-peer (p2p) message processing code, and the cryptographic libraries are all areas where a vulnerability could enable theft, grind the network to a halt, or fundamentally undermine trust in the system. Unlike traditional financial software backed by insurance and legal remedies, Bitcoin’s security relies entirely on the quality of its code and the processes that maintain that quality.

The approach to security in Bitcoin Core is not formally defined, but rather an evolving set of practices that have improved over time. Review processes have become more thorough, testing infrastructure has been expanded significantly, and the project as a whole has become more conservative and deliberate about changes to the software. This slower pace is itself a security measure, reducing the risk of introducing new bugs through hasty modifications.

This piece examines several key aspects of how Bitcoin Core approaches security:

the disclosure policy for handling discovered vulnerabilities
the extensive fuzzing infrastructure that hunts for bugs
the broader testing toolkit that catches issues before they reach production

These practices work together, not as a grand unified strategy, but as complementary layers of defense that have developed as the project has matured.

Vulnerability Disclosure Process

Bitcoin Core as a software project provides no automatic update functionality for the software it ships, as a protective measure for its users against its developers, and all released binaries can be verified to match the published source code through reproducible builds. Node runners are responsible for deciding which version of the software to run and when to upgrade. In the context of security vulnerabilities, this presents a serious dilemma. Fixes need to be open source for the review process before a release can be made, yet full disclosure must be delayed to allow users reasonable time to update, given that once a vulnerability’s details are published, attackers can exploit it.

Historically, the project’s public disclosure of security-critical vulnerabilities, whether reported externally or discovered by contributors, has been inadequate. This led to a situation where many users perceived Bitcoin Core as never having bugs, a dangerous and inaccurate perception to have. Roughly a year and a half ago, motivated by these issues, the project revised and formalized its handling of security issues into a comprehensive disclosure policy and advisory process. The goals were to provide more transparency, set clear expectations for security researchers (providing them with an incentive to find and responsibly disclose vulnerabilities), better communicate the risks of running outdated versions, and make security bugs available to the wider group of contributors after disclosure to help learn from and prevent future ones.

Policy

All vulnerabilities should be reported to security@bitcoincore.org (see SECURITY.md for details). When reported, a vulnerability will be assigned a severity category. We differentiate between four classes of vulnerabilities:

Critical: Bugs that threaten the fundamental security and integrity of the entire Bitcoin network. These are bugs that allow for coin theft at the protocol level, the creation of coins outside of the specified issuance schedule, or permanent, network-wide chain splits.

High: Bugs with a significant impact on affected nodes or the network. These are typically exploitable remotely under default configurations and can cause widespread disruption.

Medium: Bugs that can noticeably degrade the network’s or a node’s performance or functionality, but are limited in their scope or exploitability. These might require special conditions to trigger, such as non-default settings, or result in service degradation rather than a complete node failure.

Low: Bugs that are challenging to exploit or have a minor impact on a node’s operation. They might only be triggerable under non-default configurations or from the local network, and do not pose an immediate or widespread threat.

Low severity vulnerabilities will be disclosed 2 weeks after the release of a major version containing the fix. Medium and High severity vulnerabilities will be disclosed 2 weeks after the last affected release goes End of Life (approximately a year after a major version containing the fix was first released).

A pre-announcement will be made two weeks prior to releasing the details of a vulnerability. This pre-announcement will coincide with the release of a new major version and contain the number of fixed vulnerabilities and their severity levels.

Critical bugs are not considered in the standard policy, as they would most likely require an ad-hoc procedure. Also, a bug may not be considered a vulnerability at all. Any reported issue may also be considered serious, yet not require embargo.
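The timing rules above can be sketched as a small date calculation. This is only an illustration of the policy as described in this piece, not tooling the project uses; the function name, the `Schedule` type, and the example dates are all hypothetical.

```python
from datetime import date, timedelta
from typing import NamedTuple

# Two-week window between the pre-announcement and the publication of details.
EMBARGO = timedelta(weeks=2)


class Schedule(NamedTuple):
    pre_announcement: date
    disclosure: date


def disclosure_schedule(severity: str, fix_release: date, last_affected_eol: date) -> Schedule:
    """Return hypothetical pre-announcement and disclosure dates for one bug.

    fix_release: release date of the major version containing the fix.
    last_affected_eol: date the last affected release goes End of Life.
    """
    severity = severity.lower()
    if severity == "low":
        trigger = fix_release
    elif severity in ("medium", "high"):
        trigger = last_affected_eol
    elif severity == "critical":
        # Critical bugs fall outside the standard policy.
        raise ValueError("critical bugs are handled ad hoc")
    else:
        raise ValueError(f"unknown severity: {severity!r}")
    disclosure = trigger + EMBARGO
    return Schedule(pre_announcement=disclosure - EMBARGO, disclosure=disclosure)


if __name__ == "__main__":
    # Made-up dates, purely for illustration.
    print(disclosure_schedule("low", date(2024, 10, 2), date(2025, 10, 1)))
    print(disclosure_schedule("high", date(2024, 10, 2), date(2025, 10, 1)))
```

In this sketch the pre-announcement always lands on the triggering release or End of Life date, which matches the statement that it coincides with a new major version.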

When a vulnerability is reported to the project, it is first verified and assessed by Bitcoin Core’s “Security Team”, a small group of long-term contributors with a track record of finding or fixing security bugs, and assigned one of the four severity levels described above. If confirmed as serious, a fix is developed and thoroughly tested in private. The fix is then submitted as a pull request just like any other code change, but the PR description and discussion obfuscate the true nature of the fix. It might be framed as a refactoring, performance improvement, or hardening against potential issues. This allows the fix to go through normal code review while keeping the vulnerability details private.

This approach involves real tradeoffs, and it is a genuinely difficult balancing act to maintain. Critics might argue it’s paternalistic or that it concentrates too much power in the hands of a few developers who know about vulnerabilities before the public. These concerns deserve serious consideration, but the alternative of immediate public disclosure could be catastrophic. Publishing vulnerability details before most users have updated essentially provides attackers with both the target list (unupdated nodes) and the weapon (exploit code).

Fuzzing Infrastructure

Fuzzing is a testing technique that feeds randomized, malformed, or unexpected inputs to software to find bugs. The basic idea is to continuously generate and mutate test cases automatically, feed them to the program, and watch for unexpected behavior such as crashes, hangs, or logic bugs. Modern fuzzers use evolutionary algorithms to learn which inputs trigger interesting code paths, then mutate those inputs to explore deeper into the program. It’s an effective way to find edge-case bugs at a rate that manual testing or code review could not realistically match.
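The loop described above can be sketched in a few lines. This is a toy illustration, not how Bitcoin Core's fuzz tests are built: `toy_parser` is a hypothetical target that reports which branches it reached (real fuzzers get this from compiler instrumentation), and the "crash" is simulated with an exception.

```python
import random


def toy_parser(data: bytes) -> set:
    """Hypothetical target: returns the branch ids reached, raises on the 'bug'."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("B"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("T"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("C"):
                raise RuntimeError("simulated crash")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random edit: insert, overwrite, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target, iterations: int, seed: int = 0):
    """Return a crashing input, or None if none was found within the budget."""
    rng = random.Random(seed)
    corpus = [b""]   # inputs worth mutating further
    seen = set()     # branch ids reached so far
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            coverage = target(candidate)
        except Exception:
            return candidate
        if not coverage <= seen:
            # The input reached something new, so keep it for later mutation.
            seen |= coverage
            corpus.append(candidate)
    return None


if __name__ == "__main__":
    print(fuzz(toy_parser, 200_000))
```

Keeping inputs that reach new branches is what lets the loop find the three-byte trigger step by step instead of having to guess all three bytes at once.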

Because the fuzzer provides the inputs for this testing, the developer can’t directly assert expected outcomes (e.g., input A must yield output B). Instead, they make assertions about general properties the software should maintain. This is extremely valuable, as it allows us to build broader confidence in the desired behavior by testing properties such as preventing the node from crashing or ensuring the coin supply never inflates beyond what is expected.
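As a rough illustration of asserting a property rather than a specific output, the sketch below checks that the cumulative coin supply never exceeds the cap, whatever block height is chosen. The constants follow Bitcoin's well-known issuance schedule, but the functions are hypothetical Python stand-ins written for this example, not Bitcoin Core code, and a seeded random generator plays the role of the fuzzer.

```python
import random

COIN = 100_000_000              # satoshis per bitcoin
HALVING_INTERVAL = 210_000      # blocks between subsidy halvings
MAX_MONEY = 21_000_000 * COIN   # supply cap in satoshis


def block_subsidy(height: int) -> int:
    """Subsidy in satoshis for the block at the given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return (50 * COIN) >> halvings


def supply_after(height: int) -> int:
    """Total subsidy issued by blocks 0..height inclusive."""
    full_eras, rest = divmod(height + 1, HALVING_INTERVAL)
    total = 0
    for era in range(min(full_eras, 64)):
        total += HALVING_INTERVAL * block_subsidy(era * HALVING_INTERVAL)
    total += rest * block_subsidy(full_eras * HALVING_INTERVAL)
    return total


def check_supply_property(height: int) -> None:
    """Properties that must hold for any height the fuzzer picks."""
    assert supply_after(height) <= MAX_MONEY
    assert block_subsidy(height + 1) <= block_subsidy(height)


def run(iterations: int = 10_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(iterations):
        check_supply_property(rng.randrange(0, 20_000_000))


if __name__ == "__main__":
    run()
    print("supply property held for all sampled heights")
```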

Due to the critical need for correctness, robustness, and security, Bitcoin Core makes extensive use of fuzzing, with various approaches. Fuzz testing efforts have been ramping up throughout Bitcoin Core’s history. The earliest mentions of very primitive fuzzing date all the way back to 2012, and a simple fuzzing framework was integrated in 2016. That framework has evolved into today’s comprehensive one, with over 200 individual fuzz tests covering critical components and functions of the codebase.

Unlike standard unit tests, fuzz tests do not have a defined “pass” point, i.e. you don’t run them once and get a “passed” or “failed” status in return. Because fuzzing is an ongoing random process, any statements about the results (when no flaws are found) can only be probabilistic. A fuzz test may run for 5000 hours without finding a bug, yet the next 5000 hours might uncover one. Consequently, to be effective, fuzz tests must be executed continuously. While Bitcoin Core leans on Google’s oss-fuzz infrastructure to run its fuzz tests, it also heavily invests in building out its own, with several contributors continuously fuzzing with their own setups. As an example, Brink’s infrastructure alone provides more than 1 million CPU hours per year to fuzzing Bitcoin Core.

While the Bitcoin Core repository has numerous fuzz tests at the component/function level, several external projects employ distinct fuzzing strategies. Cryptofuzz, now retired, focused on differentially fuzzing libsecp256k1 and other cryptographic code. For non-cryptographic code, such as serialization primitives, consensus logic, and wallet descriptor parsing, the project bitcoinfuzz uses a Bitcoin-specific differential fuzzing approach. A full-system fuzzing methodology is also being developed with Fuzzamoto, mainly aimed at finding bugs that arise from complicated interactions between different parts of the codebase when they run together as a complete system.
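Differential fuzzing, in general terms, means feeding the same input to two independent implementations and flagging any disagreement. The sketch below shows the idea on Bitcoin's CompactSize integer encoding. It is a generic illustration and does not reflect how Cryptofuzz or bitcoinfuzz are implemented; all function names are made up for this example, and the "buggy" encoder contains a deliberate off-by-one.

```python
import random
import struct


def encode_compact_size_a(n: int) -> bytes:
    """Implementation A: explicit range checks with struct."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def encode_compact_size_b(n: int) -> bytes:
    """Implementation B: the same format, written independently."""
    if n < 253:
        return n.to_bytes(1, "little")
    for marker, width in ((0xFD, 2), (0xFE, 4), (0xFF, 8)):
        if n < 1 << (8 * width):
            return bytes([marker]) + n.to_bytes(width, "little")
    raise ValueError("value does not fit in 64 bits")


def encode_compact_size_buggy(n: int) -> bytes:
    """Deliberately wrong: 0xFD is emitted as a single byte."""
    if n <= 0xFD:
        return bytes([n])
    return encode_compact_size_a(n)


BOUNDARIES = [0, 0xFC, 0xFD, 0xFFFF, 0x10000, 0xFFFFFFFF, 0x100000000, 2**64 - 1]


def candidate_values(count: int, seed: int = 0):
    """Yield boundary values first, then random values of varying bit width."""
    rng = random.Random(seed)
    yield from BOUNDARIES
    for _ in range(count):
        yield rng.getrandbits(rng.randrange(1, 65))


def differential_check(impl_a, impl_b, inputs):
    """Return the first input on which the implementations disagree, else None."""
    for value in inputs:
        if impl_a(value) != impl_b(value):
            return value
    return None


if __name__ == "__main__":
    print(differential_check(encode_compact_size_a, encode_compact_size_b, candidate_values(5_000)))
    print(differential_check(encode_compact_size_a, encode_compact_size_buggy, candidate_values(100)))
```

Neither implementation has to be trusted as the reference. A disagreement only says that at least one of them is wrong, which is usually enough to start an investigation.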

Hundreds, if not thousands, of bugs have been found by fuzzing in released Bitcoin Core versions or pull requests throughout the years (obviously not all of them security relevant), highlighting the effectiveness and importance of fuzzing. A recently published high severity example is CVE-2024-35202, a remotely reachable crash bug found through fuzzing that could have enabled an attacker to crash all publicly reachable nodes. The discovery involved refactoring the compact block relay logic, extracting it into its own isolated and testable module and writing a fuzz test for it.

Quality Assurance

While fuzzing is highlighted above, the project employs various additional testing methodologies on a day-to-day basis, to further minimize the risk of issues reaching production code.

Bitcoin Core has hundreds of unit tests. These tests are designed to verify the expected behavior of small, isolated pieces of code, such as individual functions or classes. For instance, unit tests are used to verify the behavior of the proof-of-work verification function. These tests involve providing edge-case inputs to the function and testing whether the resulting outputs meet expectations.
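A unit test of this kind might look roughly like the sketch below. The `check_proof_of_work` function here is a simplified, hypothetical stand-in that works on plain integers; it is not Bitcoin Core's actual function, and `POW_LIMIT` is an illustrative constant.

```python
import unittest

# Illustrative "easiest allowed target" for this toy example.
POW_LIMIT = (1 << 224) - 1


def check_proof_of_work(block_hash: int, target: int, pow_limit: int = POW_LIMIT) -> bool:
    """Simplified check: the target must be in range and the hash must not exceed it."""
    if target <= 0 or target > pow_limit:
        return False
    return block_hash <= target


class ProofOfWorkTests(unittest.TestCase):
    def test_hash_below_target_is_valid(self):
        self.assertTrue(check_proof_of_work(block_hash=5, target=10))

    def test_hash_equal_to_target_is_valid(self):
        self.assertTrue(check_proof_of_work(block_hash=10, target=10))

    def test_hash_just_above_target_is_invalid(self):
        self.assertFalse(check_proof_of_work(block_hash=11, target=10))

    def test_zero_target_is_rejected(self):
        self.assertFalse(check_proof_of_work(block_hash=0, target=0))

    def test_target_above_limit_is_rejected(self):
        self.assertFalse(check_proof_of_work(block_hash=0, target=POW_LIMIT + 1))


if __name__ == "__main__":
    unittest.main()
```

Each case pins down one boundary of the function, so a failing test points directly at the condition that regressed.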

Functional tests, on the other hand, test one or more Bitcoin Core instances as a whole, verifying behavior at a higher system level by using the external interfaces of the software (e.g. RPCs, p2p messages) to simulate potential real-world scenarios. Such a test could, for example, spin up a small network of nodes, submit a transaction to one of them (e.g. using the wallet RPCs), and then verify whether all nodes in the test eventually observe and accept the transaction. Bitcoin Core historically lacked significant code modularity, a characteristic that persists in several areas. Consequently, the project has leaned more on functional testing than on unit testing, since unit testing often requires refactoring code in advance to isolate the target code.
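The shape of such a propagation test can be sketched with a toy in-memory model. This does not use Bitcoin Core's functional test framework or real node processes; `ToyNode`, its fee-based validity rule, and the helper names are assumptions made purely for illustration.

```python
class ToyNode:
    """Hypothetical stand-in for a node: a mempool plus a list of peers."""

    def __init__(self, name: str):
        self.name = name
        self.mempool = set()
        self.peers = []

    def connect(self, other: "ToyNode") -> None:
        self.peers.append(other)
        other.peers.append(self)

    def submit(self, tx: dict) -> bool:
        # Toy "mempool acceptance": reject invalid or already known transactions.
        if tx["fee"] < 0 or tx["txid"] in self.mempool:
            return False
        self.mempool.add(tx["txid"])
        # Toy "relay": announce the transaction to every peer.
        for peer in self.peers:
            peer.submit(tx)
        return True


def build_line_network(size: int) -> list:
    """Connect nodes in a line: node0 - node1 - ... - node(size-1)."""
    nodes = [ToyNode(f"node{i}") for i in range(size)]
    for left, right in zip(nodes, nodes[1:]):
        left.connect(right)
    return nodes


def test_transaction_propagates() -> None:
    nodes = build_line_network(3)
    assert nodes[0].submit({"txid": "aa", "fee": 1000})
    assert all("aa" in node.mempool for node in nodes)


def test_invalid_transaction_is_not_relayed() -> None:
    nodes = build_line_network(3)
    assert not nodes[0].submit({"txid": "bb", "fee": -1})
    assert all(not node.mempool for node in nodes)


if __name__ == "__main__":
    test_transaction_propagates()
    test_invalid_transaction_is_not_relayed()
    print("all toy functional tests passed")
```

Even in this toy, a failure of the first test would not say whether acceptance or relay is at fault, which is the diagnostic weakness of system-level tests.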

Each testing methodology has its strengths and weaknesses. Unit tests are often fast to execute and are good at pinpointing where a bug is located, as their scope is small and well defined. However, by definition, they won’t detect bugs that only manifest from the interaction of multiple units. This is where the functional tests shine, as they put the full system under test. That comes at the cost of execution speed, since they have to set up and tear down node instances on each test run. They are also much worse at indicating to the developer where a bug is located. Looking at the example above, if the transaction propagation test fails (i.e. the transaction did not propagate to all nodes), it is harder to tell which components of the system are buggy. It could be a bug in the mempool acceptance logic, the networking code, the RPCs used to create the transaction, or any of the other components involved. No single method is best; it is the combination of all methodologies that forges a piece of software with the highest likelihood of functioning correctly.

All tests are run within the CI on every PR and every push to the master branch. All unit, functional and fuzz tests (running previously generated inputs) are run across a matrix of different host operating systems, CPU architectures and various bug detection mechanisms, such as the sanitizers (Address, Thread, Undefined, Memory) and valgrind to catch common C++ bug classes relating to memory safety and undefined behavior.
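Running fuzz tests on previously generated inputs amounts to replaying a saved corpus as a regression suite. A minimal sketch of that idea follows, assuming a corpus stored as one file per input; the directory layout, function names, and toy target are hypothetical and do not describe Bitcoin Core's CI scripts.

```python
import tempfile
from pathlib import Path


def replay_corpus(target, corpus_dir) -> list:
    """Run the target once on every saved input and collect the failures."""
    failures = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            target(path.read_bytes())
        except Exception as exc:
            failures.append((path.name, repr(exc)))
    return failures


def demo() -> list:
    """Build a throwaway two-file corpus and replay it against a toy target."""

    def target(data: bytes) -> None:
        if data.startswith(b"BTC"):
            raise RuntimeError("simulated crash")

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "input-ok").write_bytes(b"hello")
        Path(tmp, "input-crash").write_bytes(b"BTC!")
        return replay_corpus(target, tmp)


if __name__ == "__main__":
    for name, error in demo():
        print(f"{name}: {error}")
```

Unlike open-ended fuzzing, a replay like this is deterministic and finishes quickly, which makes it suitable for running on every PR.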

Bitcoin Core evolved incrementally from the original client Satoshi released, with contributors coming and going over time, and as such contains a lot of legacy code. Refactoring existing code to simplify and isolate it has been, and still is, a large part of the work being done in the project. Whether it is the Kernel, a new p2p feature, performance improvements, or preparation for putting more tests into place, all of it requires refactoring. Opinions on when and how to refactor are divided, however, as it can be a double-edged sword. While refactoring refreshes context for those involved, uncovers bugs, and usually enables more testing, it can also be scary to touch code that no one understands anymore, and doing so may introduce new bugs. Both the functional tests and other system-level testing strategies (such as Fuzzamoto, mentioned in the fuzzing section above) are ways to derisk refactoring efforts, as tests at that layer require little to no refactoring upfront.

Prior to major releases, as an additional testing strategy, the project produces a testing guide for users, developers, and the community as a whole to manually test established and new features. The guide also serves as a call to action: users are encouraged to exercise the software with their typical usage and verify that their normal workflows remain functional.

This piece is the Letter from the Editor featured in the latest Print edition of Bitcoin Magazine, The Core Issue. We’re sharing it here as an early look at the ideas explored throughout the full issue.

This post The Core Issue: Keeping Bitcoin Core Secure first appeared on Bitcoin Magazine and is written by Niklas Gögge.