Problems and challenges of blockchain storage

User 9624935 2022-04-03 05:11:04

The Distributed Storage Summit was held in Berlin on August 23, 2019. It brought together IPFS, Sia, Storj, Ethereum Swarm, Arweave, Filecoin, and other mainstream blockchain storage projects, which made it a rare event.

This article is a write-up of the opening talk that Jacob Eberhardt gave at the conference, interpreted with the help of figures. Given the depth and scope of the original content (from data storage to databases), I believe it will give readers some food for thought.

The content is divided into two parts. The first part is about decentralized storage.

Storage can be understood as a kind of memory: a memory for retrievable data. Concretely, it means storing files, that is, storing uninterpreted blocks of data.

The simplest example is a client that stores data in its local file system. This scheme has some risks:

  • The local file system becomes a single point of failure
  • When other clients request data, the local file system becomes a bottleneck
  • The client itself is responsible for data security

Because of the limitations of local storage, centralized storage was proposed. Such a scheme can be remote or local. A single storage provider hides the details behind an API, and the client performs CRUD operations through that API, as shown in the figure above.

In this centralized storage scheme, the system has the following roles:

  • Client: uploads and downloads data by connecting to the server, like the Uploader and Downloader in the figure above.
  • Server: the storage provider, which hides the details behind a simple API.
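
The client/provider split can be sketched in a few lines of Python. This is only an in-memory illustration under assumed names: the class `StorageProvider` and its four methods are hypothetical and do not correspond to any real provider's API.

```python
class StorageProvider:
    """Hypothetical centralized provider: a simple CRUD API hides the details."""

    def __init__(self):
        # A plain dict stands in for the (much more complex) real back end.
        self._blobs = {}

    def create(self, key: str, data: bytes) -> None:
        if key in self._blobs:
            raise KeyError(f"{key!r} already exists")
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]

    def update(self, key: str, data: bytes) -> None:
        if key not in self._blobs:
            raise KeyError(f"{key!r} does not exist")
        self._blobs[key] = data

    def delete(self, key: str) -> None:
        del self._blobs[key]


provider = StorageProvider()
provider.create("report.txt", b"v1")      # the Uploader role
provider.update("report.txt", b"v2")
downloaded = provider.read("report.txt")  # the Downloader role
provider.delete("report.txt")
```

Every request goes through the single `provider` object, which is exactly why this design is easy to use and also why the provider is a single point of trust.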

A typical example of centralized storage is Amazon S3. It shows some characteristics of centralized storage:

  • Eleven nines (99.999999999%) of durability
  • 22 US dollars per TB per month
  • Client-side encryption and server-side encryption
  • Approximately linear capacity scaling by adding hardware
  • A complex back-end system behind a simple API
  • Trusted reads and writes: availability, durability, and security
  • Economic incentives keep Amazon's behavior compliant
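
To get a feel for the two figures in the list, here is a small arithmetic sketch. It assumes the eleven nines are read as a per-object, per-year figure and that the price is a flat 22 US dollars per TB per month; the function names and the example quantities (10 million objects, 100 TB) are chosen only for illustration.

```python
from fractions import Fraction


def failure_probability(nines: int) -> Fraction:
    """Failure probability implied by a figure with `nines` nines (3 -> 99.9%)."""
    return Fraction(1, 10 ** nines)


def monthly_cost_usd(terabytes: int, usd_per_tb_month: int = 22) -> int:
    """Flat-rate monthly cost; the default price is the figure quoted above."""
    return terabytes * usd_per_tb_month


# Assumption: the probability applies independently to each object per year.
objects = 10_000_000
expected_failures_per_year = failure_probability(11) * objects

print(float(expected_failures_per_year))  # 0.0001
print(monthly_cost_usd(100))              # 2200
```

Under these assumptions, a customer with 10 million objects would expect one failure roughly every 10,000 years, which is the bar a decentralized system is being measured against.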

In a decentralized storage architecture, one client can connect to multiple servers (storage providers).

In a decentralized storage system there are no clients and servers, only nodes and peer nodes:

  • Node: the counterpart of the client in centralized storage, except that it can also provide storage services, like the Uploader, Downloader, and Storage Provider in the figure above.
  • Peer node: the counterpart of the server in centralized storage, except that there can be multiple storage providers, and these providers can upload and download data as well as provide storage services, as shown in the figure above.

Systems with a decentralized storage architecture already exist, such as BitTorrent (2001) and IPFS (2015):

  • BitTorrent is a peer-to-peer file-sharing network. A torrent file contains checksums and links to trackers (a tracker can refer the client to seeders).
  • IPFS is a content-addressed, peer-to-peer decentralized file system. Users decide whether to pin files and cache them locally.
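
Content addressing, the idea behind both torrent checksums and IPFS addresses, can be illustrated with a short sketch. It is a simplification: the `ContentStore` class is hypothetical, and a bare SHA-256 hex digest is used as the address, whereas real IPFS uses CIDs built on multihash and splits files into linked blocks.

```python
import hashlib


class ContentStore:
    """Hypothetical content-addressed store: the address is the hash of the data."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Anyone can verify the data, even if it came from an untrusted peer.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("block does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello, decentralized world")
same_address = store.put(b"hello, decentralized world")  # identical content
retrieved = store.get(address)
```

Because the address depends only on the content, identical data always maps to the same address, and a downloader does not need to trust the peer that served it.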

BitTorrent and IPFS are typical decentralized storage systems without incentives; that is, they are fully decentralized but give participants no incentive. In such a system, a peer can shut down at any time and its files are lost, a client node's request may be refused, and a symmetric participation model is expected (hence the leeching and free-riding problems of peer-to-peer networks). In short, such a system offers no guarantees of availability, durability, or performance.

Blockchain storage was proposed to solve the lack of incentives in decentralized storage systems. In fact, blockchain storage can be understood as decentralized storage with incentives; the basic model is shown in the figure above. Because each specific storage operation between peers still follows the client-server model, the figure uses client and server labels, and the arrows between client and server are bidirectional.

In such a system, cryptoeconomic protocols guarantee the required properties of the storage system, and a blockchain supports these protocols. There are two different goals to consider when designing a blockchain storage system:

  • Decentralized storage services for end users: a storage contract is established between the client and the storage provider, and the client pays according to a specific storage period and SLA.
  • Permanent data archiving: the protocol guarantees that data is stored permanently and will never be lost.

The challenges of centralized storage still exist in blockchain storage, mainly in the non-incentive system:

  • What the cryptoeconomic protocols need to guarantee: durability, availability, overhead, ...
  • Challenges specific to the non-incentive system: security, scalability, performance, user experience, ...

For simplicity, the cryptoeconomic protocols are called the protocol part of blockchain storage, and the non-incentive system is called the storage part.

Durability (persistence) is the probability that data will be preserved permanently. If a storage provider fails, the data must not be lost. The traditional approaches are replication and erasure coding. Are there other ways to maintain durability?
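
The trade-off between replication and erasure coding can be made concrete with a small probability sketch. It assumes, purely for illustration, that each storage provider fails independently with probability `p` over some period; the function names and the example parameters are not from the talk.

```python
from math import comb


def loss_replication(p: float, copies: int) -> float:
    """Data is lost only if every one of the `copies` replicas fails."""
    return p ** copies


def loss_erasure(p: float, n: int, k: int) -> float:
    """Data is split into n fragments, any k of which can rebuild it.

    It is lost when more than n - k fragments fail.
    """
    return sum(
        comb(n, failed) * p ** failed * (1 - p) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


p = 0.1                              # assumed per-provider failure probability
print(loss_replication(p, 3))        # 3 full copies: 3x storage overhead
print(loss_erasure(p, n=6, k=3))     # 6 fragments, any 3 suffice: 2x overhead
```

With these example numbers both schemes lose data roughly 0.1% of the time, but the erasure-coded layout needs only twice the original storage instead of three times. Real providers do not fail independently, so the figures are indicative at best.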

Availability is the probability that the system responds successfully when it is called. Data should remain accessible even if a storage provider fails. Here are some challenges:

  • How can the availability of data be ensured without retrieving it?
  • How should the SLA be defined?
  • The cost of data retrieval: payment channel protocols for paying fees before retrieval, and their cryptoeconomics.
  • Can eleven nines of availability be achieved? Is there demand for it?
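
One simple way to check that a provider still holds a file without downloading it is a precomputed challenge-response audit. The sketch below is an illustration of that general idea only, not the proof system of Filecoin or any other project; the function names are hypothetical.

```python
import hashlib
import os


def make_challenges(data: bytes, count: int):
    """Client side, before upload: precompute (nonce, expected answer) pairs."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + data).digest()
        challenges.append((nonce, expected))
    return challenges


def respond(stored_data: bytes, nonce: bytes) -> bytes:
    """Provider side: a correct answer requires hashing the full data."""
    return hashlib.sha256(nonce + stored_data).digest()


data = os.urandom(1024)
challenges = make_challenges(data, count=3)  # client keeps only this small list

nonce, expected = challenges.pop()           # each challenge is used once
honest_ok = respond(data, nonce) == expected
cheater_ok = respond(b"", nonce) == expected  # provider that dropped the data
```

The client stores a few dozen bytes per challenge instead of the file itself. The scheme has clear limits: the number of audits is fixed in advance, and it shows only that someone could produce the data at challenge time, so it does not by itself rule out a provider that fetches the data from elsewhere on demand.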

One challenge in the protocol part is attacks on durability and availability:

  • Sybil attack: a storage provider forges multiple storage identities; that is, it actually stores only one copy but collects storage fees for multiple copies.
  • Outsourcing attack: a storage provider claims to store data but does not actually store it; only when a client requests the data does it quickly fetch it from other storage providers.
  • Generation attack: a storage provider claims to store a large amount of data but does not actually store it; only when a client requests the data does it run a program to generate the data on the fly. Incentive layers such as Filecoin's, which rely on storing large amounts of data, are at higher risk from generation attacks.

Another question: how can we ensure that a copy of the data is an independent copy?

The main challenges in the incentive part are:

  • How can malicious participants be detected reliably? For example, a client requests data and the storage provider delivers it, but the client claims it never received the data.
  • How should the incentive engine be chosen? A blockchain or some other engine? There is a trade-off between trust and performance.
  • How can we ensure that the protocol works correctly? Game-theoretic proofs and incentive-based arguments.
  • How should the incentive scheme be designed?

Another challenge in the protocol part is overhead.

  • Things that are free in centralized storage incur extra overhead in a cryptoeconomic protocol, such as blockchain transaction fees and coordination costs.
  • Centralized storage has only a few large providers. Will there be enough competitors in the market?
  • Use existing hardware: end users' unused storage space has an extremely low marginal cost.
  • How can it compete with today's storage prices?

One challenge in the storage part is security.

  • Large storage providers employ many security experts. How can decentralized storage providers be protected? What are the main risks: data loss, data theft, DDoS attacks?
  • Key management: the server does not need to encrypt (encryption happens on the client side), so how does the client manage its keys? Keys can be lost; is a key recovery process needed?
  • Should encryption be the default? If storage providers are untrusted, how can encrypted data be shared?

Another challenge in the storage part is scalability.

  • Centralized storage scales linearly. Can decentralized storage scale linearly too? Where is the bottleneck that prevents it? The blockchain?
  • Centralized storage can handle petabytes of data. How much data is currently stored in decentralized storage? What are the theoretical limits? As the amount of data grows, which storage properties degrade? Latency, for example?

Another challenge in the storage part is performance.

  • Latency. Centralized storage is well connected to the Internet backbone. What about decentralized storage systems? Can we control the physical location of stored data to meet latency requirements? Cloud storage can place data close to applications to reduce latency; can the same be done in decentralized storage?
  • Throughput. Centralized storage providers usually reassemble the data behind the API before serving it. Can parallel block retrieval and client-side reassembly improve latency in decentralized storage systems?
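
Parallel block retrieval with client-side reassembly can be sketched as follows. Everything here is a stand-in: the three `providers` are in-memory dictionaries, `fetch` replaces what would be a network request, and the round-robin placement of blocks is an arbitrary choice for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


original = bytes(range(256)) * 4          # 1024 bytes of example data
blocks = split_blocks(original, 100)

# Hypothetical providers: each holds some blocks, keyed by block index.
providers = [dict(), dict(), dict()]
for index, block in enumerate(blocks):
    providers[index % len(providers)][index] = block


def fetch(index: int) -> bytes:
    """Stand-in for a network request to the provider holding this block."""
    return providers[index % len(providers)][index]


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, ready to be concatenated.
    fetched = list(pool.map(fetch, range(len(blocks))))

reassembled = b"".join(fetched)           # client-side reassembly
```

With real network requests, the total time would be bounded by the slowest block rather than the sum of all transfers, which is where the potential gain comes from.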

Another challenge in the storage part is the user experience. Centralized storage is usually reduced to a simple API. For decentralized storage:

  • What must users do before they can use the system? Sync the blockchain? Obtain a token? Install a wallet?
  • How are stored files embedded in applications? Websites? DApps?

Other questions:

  • How tightly are the blockchain and the storage system coupled? For example, Proof-of-Spacetime in Filecoin; what about Ethereum Swarm?
  • Does participation raise legal issues? What if a storage provider stores illegal data uploaded by a customer? How can compliance be achieved? How can the system be made compatible with GDPR?
  • How is data updated? When data is updated, must the change be renegotiated for all copies of the data? Or is it stored as a new file? Storing it as a new file is a huge overhead!

Another major challenge in storage is database systems.

From decentralized storage to a decentralized database system, there is still a long way to go:

  • Structured, mutable data: it must be updatable, and the interface is more complex than plain CRUD.
  • The incentive protocol must guarantee that processing is correct; guaranteeing integrity through checksums is not enough.
  • Query support: distributed joins and schema denormalization.

As with traditional database systems, the system design is constrained by the CAP theorem.

In summary: in 2007, P2P file sharing accounted for 50% of Internet bandwidth; in 2018, BitTorrent download traffic accounted for 3% of Internet traffic. Is decentralization valuable to all end users? Or are decentralized solutions better than centralized ones only in certain respects, such as being cheaper or more secure?

The above is the main content of Jacob Eberhardt's talk. Many of these questions served directly as the topics of the day's panels. None of them received a perfect answer, and I believe they will remain research directions for the various projects over the next few years.

Copyright notice: this article was created by [User 9624935]. Please include a link to the original when reposting. Thank you.