Vitalik: The Limits to Blockchain Scalability

Eth Chinese station · 2021-06-19

Author: Vitalik Buterin

Special thanks to Felix Lange, Martin Swende, Marius van der Wijden and Mark Tyneway for feedback and review.

How far can we push blockchain scalability? Can we really, as Elon Musk wishes, "speed up block time tenfold, increase block size tenfold and cut fees a hundredfold" without leading to extreme centralization and violating the essential properties that make a blockchain a blockchain? If not, how far can we go? What if we change the consensus algorithm? And more importantly, what if we introduce features like ZK-SNARKs or sharding? A sharded blockchain can in theory keep adding more shards, so can we really keep doing that forever?

As it turns out, there are important and quite subtle technical factors that limit blockchain scalability, whether sharding is used or not. In many cases there are solutions, but even with the solutions there are limits. This post will walk through many of these issues.

If you just raise the parameters, the problems look like they are solved. But what do we give up in exchange?


It is crucial for blockchain decentralization that ordinary users can run a node

Imagine it is two in the morning and you get an emergency call from someone on the other side of the world who helps run your mining pool (or staking pool). Starting about 14 minutes ago, your pool and a few others split off from the chain, while the network still shows 79% of the hashpower on the other side. According to your node, the majority chain's blocks are invalid: there is a balance error, where a block appears to have wrongly assigned 4.5 million extra coins to an unknown address.

An hour later, you are in a chat room with the two other small pools that were caught off guard like you, along with some block explorers and exchanges, and you see someone post a link to a tweet that opens with "Announcing a new on-chain sustainable development fund".

By morning, the argument has spread all over Twitter and on one community forum that does not censor the discussion. But by then a significant share of the 4.5 million coins has been converted into other assets on-chain, along with billions of dollars of DeFi transactions. 79% of the consensus nodes, as well as all the major block explorers and light-wallet endpoints, are following the new chain. Perhaps the new developer fund will pay for some development, or perhaps all of it will simply be pocketed by the leading pools, exchanges and their cronies. But however it turns out, the fund is for all practical purposes a fait accompli, and ordinary users have no way to fight back.

Maybe there will even be a movie about it. Maybe it will be funded by MolochDAO or some other organization.

Can this happen on your blockchain? The elites of your blockchain community, including pools, block explorers and hosted node providers, are probably quite well coordinated; most likely they are all in the same Telegram channels and WeChat groups. If they really want to make a sudden change to the protocol rules for their own benefit, they probably have the ability to do so. The Ethereum blockchain once fully resolved a consensus failure within ten hours; if your blockchain has only one client implementation and you only need to push a code change to a few dozen nodes, coordinating a client change can be done even faster. The only reliable way to make this kind of coordinated social attack fail is "passive defense", and that strength comes from a decentralized group: the users.

Imagine how the story would have played out if users had been running nodes that verify the chain (whether directly or through other indirect techniques) and automatically reject blocks that break the protocol rules, even if more than 90% of miners or stakers support those blocks.

If every user ran a verifying node, the attack would fail quickly: a few pools and exchanges would fork off and look rather foolish in the process. But even if only some users ran verifying nodes, the attacker would not get a clean victory. Instead, the attack would cause chaos, with different users seeing different versions of the chain. At worst, the ensuing market panic and likely persistent chain split would sharply reduce the attacker's profits. The mere prospect of having to navigate such a drawn-out conflict would by itself deter most attacks.

Hasu on this point:

"Let's be clear about one thing: the reason we can resist malicious protocol changes is the culture of users validating the blockchain, not PoW or PoS."


Suppose your community has 37 node operators and 80000 passive listeners who only check signatures and block headers; then the attacker wins. If everyone runs a node, the attacker fails. We do not know exactly what the threshold is for herd immunity against coordinated attacks, but one thing is absolutely clear: more nodes is good, fewer is bad, and we definitely need more than a few hundred.

So what is the upper limit on how much work we can ask a full node to do?

To allow as many users as possible to run a full node, we will focus on ordinary consumer hardware. Dedicated hardware is easy to buy and does lower the barrier for some full nodes, but the scalability improvement it brings is in fact much smaller than we tend to think.

A full node's ability to process a large number of transactions is limited mainly by three factors:

  • Computing power: on the premise of safety, what share of the CPU can we devote to running a node?
  • Bandwidth: given realistic network connections, how many bytes can a block contain?
  • Storage: how much disk space can we ask users to dedicate to storage? Also, how fast must it be to read? (That is, is an HDD enough, or do we need an SSD?)

Many of the mistaken ideas that a blockchain can be massively scaled with "simple" techniques come from overly optimistic estimates of these numbers. Let's look at the three factors in turn:

Computing power

  • Wrong answer: 100% of the CPU can be used for block verification
  • Right answer: about 5-10% of the CPU can be used for block verification

There are four main reasons why the limit is so low (a back-of-envelope sketch follows the list):

  • We need a margin of safety to cover the possibility of DoS attacks (transactions crafted by an attacker to exploit weaknesses in the code take longer to process than regular transactions)
  • Nodes need to be able to catch up with the chain after going offline. If I drop offline for a minute, I should be able to catch up within a few seconds
  • Running a node should not drain the battery quickly, and it should not slow down every other application
  • Nodes also have other non-block-production work to do, mostly verifying transactions and responding to incoming transactions and requests on the p2p network
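To make the numbers concrete, here is a minimal back-of-envelope sketch. The 5-10% share comes from the text; the 12-second block interval is an assumption, borrowed from the bandwidth example later in this post.

```python
# Back-of-envelope: what "only 5-10% of a CPU for block verification" implies.
# Assumption: one block roughly every 12 seconds (as in the bandwidth example below).
slot_seconds = 12
cpu_share_low, cpu_share_high = 0.05, 0.10

budget_low = slot_seconds * cpu_share_low    # 0.6 s of CPU time per block
budget_high = slot_seconds * cpu_share_high  # 1.2 s of CPU time per block
print(f"{budget_low:.1f}-{budget_high:.1f} seconds of CPU time per block")
```

In other words, a worst-case block has to be verifiable in roughly a second or less of CPU time on consumer hardware.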

Note that until recently, most explanations for "why only 5-10%?" focused on a different problem: because PoW block times are random, long block verification times increase the risk of several blocks being created at the same time. There are many ways to fix this problem, for example Bitcoin NG, or simply using PoS proof of stake. But these do not fix the other four problems, so they do not deliver the big scalability gains that many people expected.

Parallelism is not a silver bullet either. Often, even clients of seemingly single-threaded blockchains are already parallelized: signatures can be verified by one thread while execution is done by other threads, and a separate thread handles transaction pool logic in the background. And the closer all threads get to 100% usage, the more energy running a node consumes, and the lower the safety margin against DoS.
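As a rough illustration only (this is a toy sketch, not any real client's code), the kind of pipelining described above can look like the following; every function body is a simplified stand-in:

```python
import queue
import threading
import time

sig_queue = queue.Queue()   # blocks waiting for signature checks
exec_queue = queue.Queue()  # blocks whose signatures passed, waiting for execution

def check_signatures():
    # Stand-in for verifying every signature in a block on its own thread.
    while True:
        block = sig_queue.get()
        time.sleep(0.01)  # pretend this is signature verification work
        exec_queue.put(block)

def execute_blocks():
    # Stand-in for running the state transition on another thread.
    while True:
        block = exec_queue.get()
        time.sleep(0.02)  # pretend this is execution work
        print("imported block", block["number"])

def handle_mempool():
    # Stand-in for the background thread that processes incoming transactions.
    while True:
        time.sleep(0.005)  # pretend to validate and gossip a transaction

for target in (check_signatures, execute_blocks, handle_mempool):
    threading.Thread(target=target, daemon=True).start()

for n in range(3):  # push a few blocks through the pipeline
    sig_queue.put({"number": n})
time.sleep(0.5)
```

Even with this structure, the total CPU budget is shared across all of these threads, which is exactly why pushing every core to 100% leaves no headroom.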

Bandwidth

  • Wrong answer: if we have 10 MB blocks every 2-3 seconds, then most users have networks faster than 10 MB/sec, so of course they can handle those blocks
  • Right answer: maybe we can handle 1-5 MB blocks every 12 seconds, but even that is hard

Nowadays we often hear widely quoted statistics about how much bandwidth internet connections offer: figures of 100 Mbps or even 1 Gbps are common. But there is a large difference between the advertised bandwidth and the actual bandwidth you can expect, for several reasons (a rough conversion is sketched after the list):

  1. "Mbps" means "millions of bits per second"; a bit is 1/8 of a byte, so you need to divide the advertised number of bits by 8 to get the number of bytes.
  2. Network operators, like all other companies, often lie.
  3. There are always multiple applications sharing the same internet connection, so a node cannot hog the entire bandwidth.
  4. P2P networks inevitably introduce overhead: nodes usually end up downloading and re-uploading the same block multiple times (not to mention that transactions are broadcast through the mempool before being packaged into a block).
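Here is a rough sketch of that conversion. The 100 Mbps figure is from the list above; the discount factors for marketing, sharing and p2p overhead are purely illustrative assumptions, not measurements:

```python
# Step 1 of the list above: converting an advertised "Mbps" figure into bytes.
advertised_mbps = 100
mb_per_sec_on_paper = advertised_mbps / 8      # 12.5 MB/s, before any real-world losses

# Steps 2-4 are harder to quantify: overstated marketing numbers, other applications
# sharing the connection, and p2p re-download/re-upload overhead all cut into this,
# and many users have far less than 100 Mbps to begin with. Purely as an illustration,
# if each of those three factors costs a (hypothetical) 50%, only ~1.6 MB/s remains:
usable_mb_per_sec = mb_per_sec_on_paper * 0.5 * 0.5 * 0.5
print(f"{mb_per_sec_on_paper:.1f} MB/s advertised -> ~{usable_mb_per_sec:.1f} MB/s usable")
```

Blocks also have to propagate across the whole p2p network well within the slot time, not merely be downloaded once, which helps explain why the realistic answer above is a few MB per 12-second slot rather than 10 MB per second.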

When Starkware ran an experiment in 2019, they published the first 500 kB blocks after the transaction data gas cost reduction, and some nodes were actually unable to handle blocks of that size. The ability to handle large blocks has improved since then and will keep improving. But whatever we do, we still cannot take the average bandwidth in MB/sec, convince ourselves that a 1-second latency is acceptable, and conclude that we can handle blocks of that size.

Storage

  • Wrong answer: 10 TB
  • Right answer: 512 GB

As you might guess, the main argument here is the same as elsewhere: the difference between theory and practice. In theory, we can buy an 8 TB solid state drive on Amazon (you really do need an SSD or NVMe; an HDD is too slow for storing the blockchain state). In practice, the laptop I am using to write this post has 512 GB, and if you make people buy their own hardware, many of them will get lazy (or cannot afford an 800-dollar 8 TB SSD) and use a centralized service instead. And even if the blockchain fits on a storage device, heavy activity can quickly wear the disk out and force you to buy a new one.

A poll among a group of blockchain protocol researchers about how much disk space everyone has. Small sample size, I know, but still...


In addition, the storage size determines how long it takes for a new node to come online and start participating in the network. Any data that existing nodes must store is data that a new node must download. This initial sync time (and bandwidth) is also a major obstacle to users being able to run nodes. While writing this post, syncing a new geth node took me about 15 hours. If Ethereum usage grew by 10x, syncing a new geth node would take at least a week, and it would be more likely to simply saturate the node's internet connection. This matters even more during an attack, when a successful response likely requires users who have never run a node before to spin up new ones.
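The arithmetic behind "at least a week" is simple; the sketch below just assumes sync time grows roughly linearly with the amount of data to download, starting from the 15-hour figure above:

```python
current_sync_hours = 15            # reported geth sync time above
usage_multiplier = 10              # hypothetical 10x growth in usage / data to download

estimated_hours = current_sync_hours * usage_multiplier  # assumes roughly linear scaling
print(f"~{estimated_hours} hours, i.e. about {estimated_hours / 24:.1f} days")  # ~6.2 days
```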

Interaction effects

On top of this, there are interaction effects between these three types of cost. Because databases use tree structures internally to store and retrieve data, the cost of fetching data from a database grows with the logarithm of the database's size. In fact, because the top level (or the top few levels) of the tree can be cached in RAM, the disk access cost is proportional to the logarithm of the database's size as a multiple of the size of the data cached in RAM.

Don't take this diagram too literally; different databases work in different ways, and often the in-memory part is just a single (but large) layer (see the LSM trees used in leveldb). But the basic principle is the same.


For example, if the cache is 4 GB, and we assume that each layer of the database is 4x larger than the previous one, then Ethereum's current ~64 GB state would require ~2 disk accesses. But if the state size grows 4x to ~256 GB, this increases to ~3 accesses. Hence, a 4x increase in the gas limit can actually translate into roughly a 6x increase in block verification time. The effect may be even larger: a full hard disk often takes longer to read and write than a disk with free space.
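A small sketch of that arithmetic, using exactly the numbers from the example above (a 4 GB cache and each layer 4x larger than the previous one):

```python
import math

def disk_accesses(state_gb, cache_gb=4, layer_growth=4):
    # With the top of the tree cached in RAM, reading a leaf costs roughly
    # log_{layer_growth}(state_size / cache_size) disk accesses.
    return math.log(state_gb / cache_gb, layer_growth)

accesses_now = disk_accesses(64)    # ~2 accesses for today's ~64 GB state
accesses_4x = disk_accesses(256)    # ~3 accesses if the state grows 4x

# 4x more work per block, and each state read now costs ~1.5x more:
slowdown = 4 * (accesses_4x / accesses_now)
print(accesses_now, accesses_4x, slowdown)  # ~2.0 ~3.0 ~6.0
```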

What does this mean for Ethereum?

On the Ethereum blockchain today, running a node is already a challenge for many users, although it is still possible on regular hardware at least (I just synced a node on my laptop while writing this post!). Hence, we are close to hitting bottlenecks. The issue core developers worry about most is storage size. Thus, the valiant efforts currently being made to address computation and data bottlenecks, and even changes to the consensus algorithm, are unlikely to allow a large increase in the gas limit. Even solving Ethereum's biggest outstanding DoS vulnerability would only allow the gas limit to be raised by about 20%.

For the storage size problem, the only solutions are statelessness and state expiry. Statelessness allows a class of nodes to verify the chain without maintaining persistent storage. State expiry deactivates state that has not been accessed recently, and users must manually provide proofs to revive it. Both of these paths have been researched for a long time, and a proof-of-concept implementation of statelessness has already started. The combination of these two improvements can greatly ease these concerns and open up room for a significant gas limit increase. But even after statelessness and state expiry are implemented, the gas limit could perhaps only be safely raised by about 3x before the other limitations kick in.

Another possible medium-term solution is using ZK-SNARKs to verify transactions. ZK-SNARKs would ensure that ordinary users do not have to personally store the state or verify blocks, although they would still need to download all the data in blocks to protect against data unavailability attacks. In addition, even though attackers could not force invalid blocks through, if running a consensus node becomes too hard there is still the risk of coordinated censorship attacks. Hence, ZK-SNARKs cannot increase node capacity without limit, but they can still increase it by a large amount (perhaps 1-2 orders of magnitude). Some blockchains are exploring this approach on layer 1; Ethereum gets the benefit through layer 2 protocols (also called ZK rollups), such as zkSync, Loopring and Starknet.

What happens after sharding?

Sharding fundamentally gets around the above limitations, because it decouples the data contained on the blockchain from the data that a single node needs to process and store. Instead of nodes verifying blocks by personally downloading and executing them, they use advanced mathematical and cryptographic techniques to verify blocks indirectly.

As a result, a sharded blockchain can safely have a very high level of throughput that a non-sharded blockchain cannot achieve. This does require a lot of cryptographic cleverness to create efficient substitutes for naive full validation that can still reject invalid blocks, but it can be done: the theory already has a foundation, and proofs of concept based on draft specifications are already under way.


Ethereum plans to adopt quadratic sharding, where total scalability is limited by the fact that a node must be able to process both a single shard and the beacon chain, and the beacon chain must perform a fixed amount of management work for each shard. If shards are too big, nodes can no longer process a single shard; if there are too many shards, nodes can no longer process the beacon chain. The product of these two constraints forms the upper bound.
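A toy sketch of why this design is "quadratic" (the numbers are purely illustrative assumptions):

```python
# If one node can handle N units of work per slot, then a single shard is bounded by
# ~N units of work (a node must follow it), and the number of shards is bounded by ~N
# (the beacon chain does a fixed amount of work per shard, and a node must follow the
# beacon chain too). Total capacity therefore scales roughly like N * N = N^2.
node_capacity = 50                    # hypothetical units of work per node per slot
per_shard_capacity = node_capacity    # constraint 1: a node must keep up with one shard
max_shards = node_capacity            # constraint 2: a node must keep up with the beacon chain
total_capacity = per_shard_capacity * max_shards
print(total_capacity)                 # 2500 == node_capacity ** 2
```

Doubling what a single node can handle therefore roughly quadruples total capacity, which is where the "quadratic" comes from.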

You can imagine going further with cubic sharding or even exponential sharding, and data availability sampling in such designs would certainly become much more complex, but it can be done. Ethereum, however, is not going beyond quadratic, because the extra scalability gains from going from shards of transactions to shards of shards of transactions cannot be realized while keeping the other risks at acceptable levels.

So what are these risks?

Minimum number of users

Conceivably, a non-sharded chain can keep running as long as even a single user is willing to participate. Sharded blockchains are not like this: no single node can process the whole chain, so enough nodes are needed to jointly run the blockchain. If each node can process 50 TPS and the chain can process 10000 TPS, then the chain needs at least 200 nodes to survive. If the chain at any point has fewer than 200 nodes, then nodes may stop being able to stay in sync, or nodes may stop detecting invalid blocks, or a number of other bad things may happen, depending on how the node software is set up.

In practice, because of the need for redundancy (including for data availability sampling), the safe minimum is several times higher than the naive "chain TPS divided by node TPS"; for the example above, let's call it 1000 nodes.
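As a toy calculation (the throughput numbers are the ones from the example above; the redundancy factor is an illustrative assumption):

```python
chain_tps = 10_000   # total throughput of the sharded chain
node_tps = 50        # throughput one node can process
redundancy = 5       # assumed extra factor for redundancy / data availability sampling

naive_minimum = chain_tps // node_tps      # 200 nodes just to cover the raw work
safe_minimum = naive_minimum * redundancy  # ~1000 nodes, as in the text
print(naive_minimum, safe_minimum)
```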

If the capacity of a sharded blockchain grows 10x, the minimum user count also grows 10x. Now you might ask: why don't we start with a lower capacity and increase it as users accumulate, since that is what we actually need, and then reduce the capacity if the user count falls back?

There are a few problems with this:

  1. A blockchain itself cannot reliably detect how many unique users it has, so some kind of governance would be needed to detect and set the shard count. Governance over capacity limits can easily become a source of division and conflict.
  2. What if many users suddenly and unexpectedly drop offline at the same time?
  3. Increasing the minimum number of users required for a fork to start makes it harder to defend against hostile takeovers.

A minimum user count of 1,000 is almost certainly fine. A minimum user count of 1 million, on the other hand, is certainly not. Even a minimum user count of 10,000 is arguably starting to get risky. Hence, it seems hard to justify a sharded blockchain with more than a few hundred shards.

Historical retrievability

An important property of blockchains that users truly cherish is permanence. Digital assets stored on a company's servers will stop existing within 10 years when the company goes bankrupt or when maintaining that ecosystem is no longer profitable. An NFT on Ethereum, by contrast, is permanent.

Yes, by 2372 people will still be able to download and look at your CryptoKitties.


But once blockchain capacity gets too high, it becomes harder to store all of that data, until at some point there is a large risk that some piece of history will end up being stored by... nobody.

Quantifying this risk is easy. Take the blockchain's data capacity in MB/sec and multiply by ~30 to get the amount of data stored per year in TB. The current sharding plan has a data capacity of about 1.3 MB/sec, so about 40 TB/year. If that is increased 10x, it becomes 400 TB/year. If we want the data to be accessible not just in principle but conveniently, we also need metadata (for example, decompressing rollup transactions), which takes us to 4 PB per year, or 40 PB after a decade. The Internet Archive uses 50 PB. So that is a reasonable upper bound on how large a sharded blockchain can safely get.
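The same arithmetic as a sketch (all figures come from the paragraph above; the metadata overhead is the implied ~10x factor between 400 TB and 4 PB):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # ~31.5 million, hence the "multiply by ~30" rule of thumb

def tb_per_year(mb_per_sec):
    return mb_per_sec * SECONDS_PER_YEAR / 1e6   # MB per year -> TB per year

base = tb_per_year(1.3)          # ~41 TB/year for the current sharding plan
scaled_10x = tb_per_year(13)     # ~410 TB/year with 10x the capacity
with_metadata = scaled_10x * 10  # ~4 PB/year including metadata (implied ~10x overhead)
over_decade = with_metadata * 10 # ~40 PB after ten years, vs ~50 PB at the Internet Archive
print(round(base), round(scaled_10x), round(with_metadata), round(over_decade), "(TB)")
```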

So it seems that in both of these dimensions, the Ethereum sharding design is already quite close to reasonable maximum safe values. The constants can be increased a little, but not by much.

Conclusion

There are two ways to try to scale a blockchain: fundamental technical improvements and simply raising the parameters. At first, raising the parameters sounds very attractive: if you do the math on a napkin, it is easy to convince yourself that a consumer laptop can process thousands of transactions per second, with no need for ZK-SNARKs, rollups or sharding. Unfortunately, there are many subtle reasons why this approach is fundamentally flawed.

Computers running blockchain nodes cannot spend 100% of their CPU power validating the chain; they need a large safety margin against unexpected DoS attacks, they need spare capacity for tasks like processing transactions in the mempool, and users do not want a node running on their computer to make it unusable for any other application at the same time. Bandwidth has limits too: a 10 MB/s connection does not mean you can process a 10 MB block every second! Maybe a 1-5 MB block every 12 seconds. The same goes for storage. Raising the hardware requirements for running a node and restricting node operation to a small group of dedicated operators is not a solution. For a blockchain to be decentralized, it is crucial that ordinary users can run nodes and that there is a culture in which running a node is a common activity.

Fundamental technical improvements, however, can work. Currently, the main bottleneck in Ethereum is storage size, and statelessness and state expiry can fix this and allow an increase of at most about 3x, but not more, because we want running a node to become easier than it is today. Sharded blockchains can scale much further, because no single node in a sharded blockchain needs to process every transaction. But even sharded blockchains have limits to capacity: as capacity goes up, the minimum safe user count goes up, and the cost of archiving the chain (and the risk that data is lost if nobody archives it) goes up. But we don't have to worry too much: these limits are high enough that we can probably process over a million transactions per second with the full security of a blockchain. But doing this without sacrificing the decentralization that makes blockchains so valuable is going to take work.

Link to the original text: https://vitalik.ca/general/2021/05/23/scaling.html

Copyright notice: this article was produced by [Eth Chinese station]. Please include the original link when reposting, thank you. https://netfreeman.com/2021/06/20210604195029101a.html