The DAO soft-fork attempt was difficult. Not only did it turn out that we had underestimated its side effects on the consensus protocol (i.e. a DoS vulnerability), but we also managed to introduce a data race into the rushed implementation that was a ticking time bomb. It wasn't ideal, and even though the issue was averted at the last moment, the fast-approaching hard-fork deadline looked eerily bleak, to say the least. We needed a new strategy…
The stepping stone towards this was an idea borrowed from Google (courtesy of Nick Johnson): writing a detailed postmortem of the event, aiming to assess the root causes of the issue, focusing solely on the technical aspects and on appropriate measures to prevent recurrence.
Technical solutions scale and last; blaming people doesn't. ~ Nick
From the perspective of this blog post, the postmortem yielded an interesting discovery. The soft-fork code inside [go-ethereum](https://github.com/ethereum/go-ethereum) looked solid from every angle: a) it was thoroughly covered by unit tests with a 3:1 test-to-code ratio; b) it had been thoroughly reviewed by six foundation developers; c) it was even manually tested live on a private network… Yet a fatal data race remained, one that could have caused a severe network outage.
It turned out that the bug could only surface in a network consisting of multiple nodes, multiple miners, and multiple blocks being mined simultaneously. Even when all of these conditions held, there was only a small chance of it occurring. Unit tests cannot catch it, code reviewers may or may not catch it, and manual testing is unlikely to catch it. Our conclusion was that development teams need more tools to run reproducible tests covering the intricate interplay of multiple nodes in a concurrent, networked scenario. Without such a tool, manually checking the various edge cases is cumbersome; and without running those checks continuously as part of the development workflow, rare errors become impossible to detect in time.
And thus, hive was born…
What is hive?
Ethereum has grown to the point where testing implementations has become a major burden. Unit tests are fine for checking various implementation quirks, but validating that a client conforms to some baseline quality, or that clients play nicely together in a multi-client environment, is anything but simple.
Hive is meant to serve as an easily expandable test harness where anyone can add tests (be they simple validations or network simulations) in whatever programming language suits them, and hive should be able to run those tests against all potential clients simultaneously. As such, the harness is intended for black-box testing: no client-specific internal details or state can be tested or inspected; instead, the emphasis is on adherence to official specifications and on behaviour under various circumstances.
Most importantly, hive is designed from the ground up to work as part of any client's CI workflow!
How does hive work?
Hive's body and soul is [docker](https://www.docker.com/). Every client implementation is a docker image; every validation suite is a docker image; and every network simulation is a docker image. Hive itself is an all-encompassing docker image. This is a very powerful abstraction…
Since Ethereum clients are docker images in hive, client developers can assemble the best possible environment for their client to run in (in terms of dependencies, tooling, and configuration). Hive will spin up as many instances as needed, each running in its own Linux system.
Similarly, since the test suites validating Ethereum clients are docker images, the test writer can use whatever programming environment they are most familiar with. Hive ensures that the client is running before it starts the tester, which can then verify that the particular client conforms to some desired behaviour.
Lastly, network simulations are again defined by docker images, but unlike simple tests, simulators not only execute code against a running client; they can also start and stop clients at will. These clients run in the same virtual network and can connect to each other freely (or as dictated by the simulator container), forming a private Ethereum network on demand.
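To make the black-box idea concrete, here is a minimal sketch (in Go) of what a validator program inside such a docker image could look like. The `HIVE_CLIENT_RPC` environment variable and the default endpoint are illustrative assumptions rather than hive's actual interface; the only real API used is the standard `eth_blockNumber` JSON-RPC call.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// The harness is assumed to expose the client's HTTP-RPC endpoint via an
	// environment variable; both the variable name and the default are illustrative.
	endpoint := os.Getenv("HIVE_CLIENT_RPC")
	if endpoint == "" {
		endpoint = "http://127.0.0.1:8545"
	}
	// eth_blockNumber is part of the standard JSON-RPC API, so the check stays black-box.
	payload, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []interface{}{},
	})
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		fmt.Println("client unreachable:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var reply struct {
		Result string `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil || reply.Result == "" {
		fmt.Println("malformed eth_blockNumber reply")
		os.Exit(1)
	}
	fmt.Println("client is alive at block", reply.Result)
}
```

Packaged into a docker image, a program like this can be pointed at any client container and pass or fail based purely on externally observable behaviour, which is exactly the contract hive enforces.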
How did hive aid the fork?
Hive is not a replacement for unit testing or thorough code review. All currently employed practices remain essential for a clean implementation of any feature. What hive can provide is validation beyond what is feasible from an average developer's perspective: running extensive tests that require complex execution environments, and checking networking corner cases that would take hours to set up by hand.
In the case of the DAO hard-fork, beyond all the consensus and unit tests, we needed above all to ensure that nodes partitioned cleanly into two subsets at the networking level: one supporting and one opposing the fork. This was essential because it is impossible to predict what adverse effects running two competing chains within a single network might have, especially from the minority's perspective.
As such, we implemented three specific network simulations in hive:
- The first to verify that miners running the full Ethash DAGs generate the correct extra-data fields in their blocks, for both pro-forkers and no-forkers, even in the face of naive spoofing attempts (a minimal sketch of this check follows the list).
- The second to verify that a network consisting of mixed pro-fork and no-fork nodes/miners correctly splits in two when the fork block arrives, and maintains the split afterwards.
- The third to check that, given an already forked network, newly joining nodes can sync, fast sync, and light sync to the chain of their choice.
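As an illustration of the core check in the first simulation, below is a hedged sketch that verifies the pro-fork extra-data marker over the standard JSON-RPC API. The endpoint variable and the hard-coded mainnet fork block are assumptions made for the example (the actual simulation ran on a private network with its own fork block); the `0x64616f2d686172642d666f726b` ("dao-hard-fork") marker over the ten blocks starting at the fork is the behaviour implemented in go-ethereum.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// daoForkExtra is the "dao-hard-fork" marker that pro-fork miners embed into the
// extra-data field of the fork block and the nine blocks following it.
const daoForkExtra = "0x64616f2d686172642d666f726b"

// blockExtra fetches a block by number over the standard JSON-RPC API and
// returns its extra-data field.
func blockExtra(endpoint string, number uint64) (string, error) {
	payload, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "id": 1, "method": "eth_getBlockByNumber",
		"params": []interface{}{fmt.Sprintf("0x%x", number), false},
	})
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var reply struct {
		Result struct {
			ExtraData string `json:"extraData"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil {
		return "", err
	}
	return reply.Result.ExtraData, nil
}

func main() {
	endpoint := os.Getenv("HIVE_CLIENT_RPC") // illustrative; a simulator would know each node's endpoint
	forkBlock := uint64(1920000)             // mainnet value; a private test network would use its own

	for offset := uint64(0); offset < 10; offset++ {
		extra, err := blockExtra(endpoint, forkBlock+offset)
		if err != nil {
			fmt.Println("rpc failure:", err)
			os.Exit(1)
		}
		if extra != daoForkExtra {
			fmt.Printf("block %d: unexpected extra-data %s\n", forkBlock+offset, extra)
			os.Exit(1)
		}
	}
	fmt.Println("pro-fork extra-data present on all 10 post-fork blocks")
}
```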
Still, an interesting question remains: did hive actually catch any errors, or did it merely serve as extra confirmation that everything was fine? The answer is both. Hive caught three fork-unrelated bugs in Geth, but it also greatly aided Geth's hard-fork development by continuously providing feedback on how the changes affected network behaviour.
There was some criticism of the go-ethereum team for taking its time with the hard-fork implementation. Hopefully people will now see what we were up to, alongside implementing the fork itself. All in all, I believe hive turned out to play quite an important role in the cleanliness of this transition.
What is the future of hive?
The Ethereum GitHub organization already features [4 test tools](https://github.com/ethereum?utf8=%E2%9C%93&query=test), with at least one EVM benchmarking tool in the works in an external repository. They are not being used to their full potential: they have lots of dependencies, generate lots of junk, and are very complicated to use.
With hive, we aim to aggregate all the scattered tests into one universal client validator that has minimal dependencies, can be extended by anyone, and can run as part of a client developer's daily CI workflow.
We welcome anyone to contribute to the project, be that adding new clients to validate, validators to test with, or simulators to uncover interesting networking issues. In the meantime, we will keep polishing hive itself, adding support for running benchmarks as well as mixed-client simulations.
With a bit more work, we may even add support for running hive in the cloud, allowing it to run network simulations at a much more interesting scale.