Blockchain and shared database

Qian Weining, Jin Cheqing, Shao Qifeng, Zhou Aoying

School of Data Science and Engineering, East China Normal University, Shanghai 200062

Abstract: Blockchain enables decentralized, highly trusted ledger management and has successfully supported Bitcoin and other financial applications. In essence, blockchain is trusted data management in an environment that is not fully trusted. It is decentralized, tamper-proof, strongly consistent, and guarantees integrity. At the same time, blockchain offers weak data management functionality and low performance. This paper compares blockchain with traditional data management technology and analyzes three typical blockchain applications outside the financial field. On that basis, it explores new research issues for blockchain and argues for researching and developing domain-oriented shared database systems. Such systems support mission-critical business and the business models of the sharing economy, and they may even be implemented in the manner of the sharing economy themselves.
Keywords: blockchain; shared database; data management

Citation format of the paper:

Qian Weining, Jin Cheqing, Shao Qifeng, Zhou Aoying. Blockchain and shared database. Big Data [J], 2018, 4(1): 36-45

QIAN W N, JIN C Q, SHAO Q F, ZHOU A Y. Blockchain and sharing database. Big Data Research [J], 2018, 4(1): 36-45

1 Blockchain

The Bitcoin paper signed by “Satoshi Nakamoto” was published on October 31, 2008. Since then, cryptocurrency has shown that a large-scale, decentralized distributed ledger can be built. On October 22, 2014, a Shengbao Bank seminar was held at the British Library. Several speakers there agreed that the truly interesting technology behind the Bitcoin trend is the blockchain. At almost the same time, the Ethereum project was released. It is regarded as representative of “Blockchain 2.0” technology, in contrast to Bitcoin's “Blockchain 1.0”. The Hyperledger project followed in 2015. Today, blockchain is a supporting technology for a large number of applications.
Blockchain technology is developing rapidly, and blockchain systems, platforms, and applications keep emerging. Most of them nevertheless share the following five characteristics.
First, they all have a chain structure, as shown in Figure 1. Data or transaction information is organized into blocks, and a series of blocks forms a chain. Each block stores a digital signature (digest) of its predecessor block, and this is how the links between blocks are built and maintained. Transactions are organized sequentially within each block, and blocks are chained to one another. Together, these two structures accurately record the transaction flow and provide the function of a ledger.
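The chain structure described above can be illustrated with a minimal Python sketch. Each block stores the SHA-256 digest of its predecessor. The names `Block`, `append_block`, `verify_chain`, and `GENESIS_PREV` are hypothetical and chosen for this example only. SHA-256 over a JSON serialization is an assumption that stands in for whatever digest or signature scheme a real platform uses.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical minimal block: ordered transactions plus the predecessor's digest."""
    index: int
    prev_hash: str
    transactions: list = field(default_factory=list)

    def digest(self) -> str:
        # Deterministic serialization so that every node computes the same digest.
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash, "tx": self.transactions},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


GENESIS_PREV = "0" * 64  # placeholder predecessor digest for the first block


def append_block(chain: list, transactions: list) -> Block:
    """Link a new block to the chain by embedding the digest of the current last block."""
    prev_hash = chain[-1].digest() if chain else GENESIS_PREV
    block = Block(index=len(chain), prev_hash=prev_hash, transactions=list(transactions))
    chain.append(block)
    return block


def verify_chain(chain: list) -> bool:
    """Check that every block still stores the digest of the block before it."""
    for i, block in enumerate(chain):
        expected = chain[i - 1].digest() if i > 0 else GENESIS_PREV
        if block.prev_hash != expected:
            return False
    return True


chain = []
append_block(chain, [{"from": "A", "to": "B", "amount": 5}])
append_block(chain, [{"from": "B", "to": "C", "amount": 2}])
append_block(chain, [{"from": "C", "to": "A", "amount": 1}])
print(verify_chain(chain))  # True

chain[0].transactions[0]["amount"] = 500  # tamper with an early transaction
print(verify_chain(chain))  # False: block 1 no longer matches block 0's digest
```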

Figure 1 The chain structure of blockchain

Second, blockchain is tamper-proof. Within a block, a Merkle tree or one of its variants is commonly used to generate summary information, which serves to check that the block's content is correct. Each subsequent block also contains the signature of the block before it. As a result, a block effectively summarizes all the information since the beginning of the chain, and this summary can be used to check whether any earlier information has been tampered with. To modify a transaction already recorded in the blockchain, one would therefore have to modify the block that contains it and every block after it. This usually requires a huge amount of computation or the cooperation of a large number of nodes in the system, which makes tampering difficult in practice.
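The following Python sketch shows how a Merkle root summarizes a block's transactions, so that changing any single transaction changes the root. The function names are hypothetical. The sketch uses single SHA-256 and pairs the last node with itself on odd levels. Bitcoin uses the same pairing convention but applies SHA-256 twice, and other platforms use different variants.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> str:
    """Compute a binary Merkle root over transaction strings.

    Leaves are hashes of the transactions, and each parent is the hash of its two
    children concatenated. On a level with an odd number of nodes, the last node
    is duplicated so that it pairs with itself.
    """
    if not transactions:
        return sha256(b"").hex()
    level = [sha256(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


root = merkle_root(["A->B:5", "B->C:2", "C->A:1"])
tampered = merkle_root(["A->B:500", "B->C:2", "C->A:1"])
print(root != tampered)  # True: one changed transaction changes the block summary
```

Only the root needs to be stored in the block header. A node can then check whether the block content is intact by recomputing the root from the transactions.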
Third, storage is distributed and decentralized, with no dependence on a single central node. The blockchain is stored locally on multiple nodes, and every update must be applied synchronously to all replicas. Depending on requirements, the nodes can use different distributed consensus protocols, such as proof of work (PoW), practical Byzantine fault tolerance (PBFT), Byzantine fault-tolerant Paxos, or proof of stake (PoS). The decentralized architecture removes the single point of failure and strengthens both robustness and tamper resistance. However, the distributed consensus protocol also causes high data modification latency and low system throughput.
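To show why proof of work makes updates expensive, the following Python sketch searches for a nonce whose SHA-256 digest starts with a given number of zero hex digits. This is a simplification. Real systems such as Bitcoin compare the hash with a numeric target, and the header format here is hypothetical.

```python
import hashlib


def proof_of_work(header: str, difficulty: int) -> tuple:
    """Find a nonce such that SHA-256(header + nonce) starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def check_work(header: str, nonce: int, difficulty: int) -> bool:
    """Verification takes a single hash, however long the search took."""
    digest = hashlib.sha256(f"{header}{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


header = "height=42|prev=00ab|merkle=9f3c"  # hypothetical block header
nonce, digest = proof_of_work(header, 4)
print(check_work(header, nonce, 4))  # True
```

Each additional required zero digit multiplies the expected search effort by 16, while verification stays cheap. This asymmetry protects the chain, and it is also a source of the latency and throughput costs described above.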
Fourth, the blockchain underlying Bitcoin supports only simple transaction records and queries, but most newer blockchain platforms support smart contracts. A smart contract is “a set of promises, specified in digital form, including protocols within which the parties perform on these promises”. Smart contracts are often written in a Turing-complete general-purpose programming language or in a dedicated language, and they define the platform's complex business logic. An incorrectly implemented smart contract can cause serious security problems for the whole system.
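To make the idea of a smart contract concrete, the following Python sketch models a hypothetical escrow contract as a state machine. It is written in plain Python rather than in any real contract language. The class `EscrowContract`, its states, and its methods are illustrative assumptions and are not taken from any platform.

```python
from dataclasses import dataclass


class ContractError(Exception):
    pass


@dataclass
class EscrowContract:
    """Hypothetical escrow: the buyer deposits the price, confirms delivery, and funds go to the seller.

    Each method checks who is calling and what state the contract is in before
    changing anything. Leaving out such a check is the kind of implementation
    error that can turn into a security hole.
    """
    buyer: str
    seller: str
    price: int
    state: str = "CREATED"
    balance: int = 0

    def deposit(self, caller: str, amount: int) -> None:
        if caller != self.buyer or self.state != "CREATED":
            raise ContractError("only the buyer may deposit, and only once")
        if amount != self.price:
            raise ContractError("deposit must equal the agreed price")
        self.balance = amount
        self.state = "FUNDED"

    def confirm_delivery(self, caller: str) -> int:
        if caller != self.buyer or self.state != "FUNDED":
            raise ContractError("only the buyer may confirm a funded contract")
        paid, self.balance = self.balance, 0
        self.state = "RELEASED"
        return paid  # amount released to the seller


contract = EscrowContract(buyer="alice", seller="bob", price=100)
contract.deposit("alice", 100)
print(contract.confirm_delivery("alice"))  # 100
print(contract.state)  # RELEASED
```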
Fifth, current blockchain technologies and systems are closely tied to financial applications, such as cryptocurrency, distributed ledgers, document management, initial coin offerings (ICOs) and crowdfunding, and charity. Blockchain technology provides trusted ledger management in a distributed, peer-to-peer manner.
Much work has already explored the basic theory, implementation methods, and application models of blockchain. This paper surveys blockchain technology from the perspective of data management. Starting from three blockchain applications, it then discusses the needs and challenges of blockchain technology research.

2 The essence of data management

Before discussing data management in blockchain, we first briefly introduce the core issues of data management.
In a broad sense, data management covers data acquisition, storage, processing, and utilization. Data management tasks are usually performed by a database management system (DBMS) and related tools. Relational database theory was born in the 1970s. Since then, relational database management systems (RDBMSs) have offered good usability, versatility, and performance across all kinds of data management, especially in mission-critical applications. As a result, they have become the first, and often the only, choice for a large number of data management tasks. As the RDBMS industry grew, database theory and a series of database technologies developed rapidly, including storage, indexing, query execution, query optimization, transaction processing, and concurrency control.
The core problems of data management are the following: data modeling and processing methods, the implementation and management of data management tasks, system performance optimization and its implementation, and system operation and maintenance.

2.1 Data model abstraction

Data model management is an important task of data management. A data model comprises data structures, data operations, and data integrity constraints. The data model provides an abstraction, and this abstraction is what allows a data management system to serve different applications. Through it, the system offers create, read, update, and delete operations over data in a uniform form.
The most widely used data model is the relational model. It is grounded in set theory and mathematical logic, and it uses the widely accepted SQL language for data definition, data manipulation, data control, and transaction control. SQL is a declarative language. Compared with a procedural language, it simplifies the writing of database applications for developers.
Because RDBMSs have been so successful, “data management” often means data management with an RDBMS. The data models used in practical data management systems are often extensions of the relational model. One example is the object-relational model, which adds user-defined types (UDTs), user-defined functions (UDFs), triggers, and similar features to the relational model.
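The contrast between declarative and procedural access can be shown with Python's built-in `sqlite3` module. The `account` table and its rows are hypothetical. The SQL query states only what result is wanted. The loop spells out how to compute the same result step by step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account (owner, balance) VALUES (?, ?)",
    [("alice", 120), ("bob", 40), ("carol", 300), ("dave", 75)],
)

# Declarative: say WHAT is wanted; the DBMS decides how to scan, filter, and sort.
declarative = conn.execute(
    "SELECT owner FROM account WHERE balance > 50 ORDER BY balance DESC"
).fetchall()

# Procedural: the program spells out HOW, step by step.
procedural = []
for owner, balance in conn.execute("SELECT owner, balance FROM account"):
    if balance > 50:
        procedural.append((balance, owner))
procedural.sort(reverse=True)
procedural = [(owner,) for _, owner in procedural]

print(declarative)  # [('carol',), ('alice',), ('dave',)]
print(declarative == procedural)  # True
```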

2.2 Data processing abstraction

A data model is an abstraction of data, and a transaction is an abstraction of the data processing flow. In an RDBMS, transactions are also written in SQL. Transactions must satisfy transaction semantics, namely the ACID properties: atomicity, consistency, isolation, and durability. Transactions are what allow a data management system to handle mission-critical applications, typified by bookkeeping and reservations. With transactions, the system processes data correctly and efficiently (with low latency and high throughput) while making full use of the hardware.
To implement transaction processing, a DBMS provides concurrency control and recovery mechanisms. Concurrency control mainly ensures consistency and isolation, and recovery mainly guarantees atomicity and durability. A DBMS often has to keep multiple copies of data consistent, for example between storage nodes or between disk and cache. These copies live in a relatively trusted system environment. Their consistency requirements therefore differ from those addressed by the distributed consensus mechanisms of blockchain.
In the recovery mechanism, a database log typically records data operations together with transaction commit and abort operations. In form, the database log resembles the sequentially recorded transaction flow of a blockchain. The difference is that the storage medium of a DBMS log is trusted, so signatures are generally not used to protect the whole log sequence against tampering. In addition, a DBMS usually uses its log only for database recovery. In many blockchain platforms, by contrast, the transaction records are the only data, so they are also what queries run against.
A DBMS offers transactions through various interfaces. A common form is the stored procedure: a prewritten transaction program stored on the server. A client calls it, the server executes it, and the result is returned to the client.
Client programs are written in procedural languages, while the database server accesses data as sets through a declarative language. This gap is known as the “impedance mismatch”. To bridge it, a DBMS usually provides a cursor facility, which lets a client program interact with the database server one record at a time.
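Atomicity can be demonstrated with Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The table, the `CHECK` constraint, and the `transfer` helper are hypothetical illustrations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates happen or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM account"))


transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 30}

try:
    transfer(conn, "alice", "bob", 500)  # violates the CHECK constraint on alice's row
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 30}: bob's credit was rolled back as well
```

The credit to `bob` runs first and succeeds. The debit from `alice` then fails, and the whole transaction is undone, so no money appears from nowhere.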
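As a simplified illustration of log-based recovery, the following Python sketch rebuilds state from a redo log. It redoes only the writes of committed transactions. The sketch assumes a no-steal policy, in which uncommitted writes never reach disk, so no undo pass is needed. Production recovery algorithms such as ARIES are considerably more involved. The log record format is hypothetical. Unlike a blockchain, nothing here can detect a record that has been rewritten.

```python
def recover(log):
    """Rebuild database state from a redo log after a crash (simplified sketch).

    Log records are tuples:
      ("BEGIN", txid) | ("WRITE", txid, key, value) | ("COMMIT", txid) | ("ABORT", txid)
    Only writes of transactions with a COMMIT record are redone. Writes of aborted
    or unfinished transactions are ignored, which preserves atomicity.
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    state = {}
    for rec in log:  # replay in log order so later committed writes win
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state


log = [
    ("BEGIN", 1), ("WRITE", 1, "x", 10), ("COMMIT", 1),
    ("BEGIN", 2), ("WRITE", 2, "x", 99), ("WRITE", 2, "y", 5), ("ABORT", 2),
    ("BEGIN", 3), ("WRITE", 3, "y", 7),  # crash before T3 commits
]
print(recover(log))  # {'x': 10}
```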
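The cursor pattern can be shown with Python's `sqlite3` module. The `receipt` table is hypothetical. The query returns a set of rows, and the cursor hands them to procedural client code one record at a time through `fetchone()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipt (id INTEGER PRIMARY KEY, weight REAL)")
conn.executemany("INSERT INTO receipt (weight) VALUES (?)", [(12.5,), (8.0,), (20.25,)])

cur = conn.cursor()
cur.execute("SELECT id, weight FROM receipt ORDER BY id")  # set-oriented, declarative request
total = 0.0
while True:
    row = cur.fetchone()  # next record, or None once the result set is exhausted
    if row is None:
        break
    rec_id, weight = row
    total += weight  # arbitrary per-record procedural logic would go here
cur.close()
print(total)  # 40.75
```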

2.3 Independence and transparency

The interfaces a DBMS provides are declarative, and the system itself decides how to carry them out. To do so, it uses a three-level schema with two levels of mapping between the view (external schema), the conceptual model (schema), and the physical model (internal schema), as shown in Figure 2. When the storage organization of data or the application requirements change, only the corresponding schema mapping needs to be modified, and the rest of the system stays unchanged. This reduces the development and maintenance costs of both the system and its applications.
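Logical data independence can be sketched with a SQL view in `sqlite3`. The view plays the role of the external schema. When the conceptual schema is reorganized, only the view definition (the external/conceptual mapping) is rewritten, and the application query stays the same. All table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema, version 1: a single table.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'alice', 'Shanghai')")
# External schema: the application only ever queries this view.
conn.execute("CREATE VIEW customer_v AS SELECT id, name, city FROM customer")


def app_query(conn):
    """The application's query; its text never changes below."""
    return conn.execute("SELECT name, city FROM customer_v WHERE id = 1").fetchone()


before = app_query(conn)

# Conceptual schema, version 2: the city moves into a separate table.
conn.executescript("""
    CREATE TABLE address (customer_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO address SELECT id, city FROM customer;
    CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer_core SELECT id, name FROM customer;
    DROP VIEW customer_v;
    DROP TABLE customer;
    -- Only the mapping (the view definition) is rewritten.
    CREATE VIEW customer_v AS
        SELECT c.id, c.name, a.city
        FROM customer_core AS c JOIN address AS a ON a.customer_id = c.id;
""")

after = app_query(conn)
print(before == after == ("alice", "Shanghai"))  # True
```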

Figure 2 Three-level schema and two-level mapping of data management

2.4 Performance

A DBMS provides data independence and transparency, and in doing so it shields application developers from the details of query and transaction execution. It also takes on most of the performance optimization work, and performance is a key issue in DBMS data management. Bruce Lindsay, one of the main developers of System R, one of the earliest RDBMSs, held that system performance is the most important thing in the database world. Modern DBMSs optimize both the plans and the execution of queries and transactions through caching, indexing, query execution, query optimization, concurrency control, and other techniques, as shown in Figure 3. In recent years, large main memories, high-speed networks, and multi-core/many-core processors have advanced rapidly. Modern DBMSs therefore also improve performance through in-memory databases, distributed data storage, and parallel execution of queries and transactions.
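The following `sqlite3` sketch shows the optimizer taking over a performance decision. The application query text never changes. After an index is created, `EXPLAIN QUERY PLAN` reports that the index is used instead of a full scan. The table, the data, and the index name are hypothetical. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, account TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tx (account, amount) VALUES (?, ?)",
    [(f"acct{i % 1000}", i) for i in range(10_000)],
)

query = "SELECT amount FROM tx WHERE account = ?"


def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("acct7",))]


before_plan = plan(conn, query)
print(before_plan)  # e.g. ['SCAN tx']; wording varies by SQLite version
conn.execute("CREATE INDEX idx_tx_account ON tx (account)")
after_plan = plan(conn, query)
print(after_plan)  # e.g. ['SEARCH tx USING INDEX idx_tx_account (account=?)']
```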

Figure 3 Overview of the functional architecture of a data management system

2.5 Tools and programming interfaces

Beyond data schema management, queries, and transactions, management and operation and maintenance tools for the DBMS are also an important aspect of data management. Jim Gray, winner of the 1998 Turing Award, held that ease of use and ease of management are important goals of a data management system. In addition, with the development of Internet technology in recent years, applications have grown ever larger, and they involve more and more subsystems and data sources. Data integration has therefore also become an important aspect of data management, and it requires dedicated tools that work alongside the DBMS.

3 Blockchain as a data management system

From a data management perspective, blockchain is a trusted data management system with chained storage, built on a peer-to-peer network. Comparing blockchain with traditional data management systems helps uncover new research issues in the basic theory and implementation methods of blockchain data management systems. The comparison also helps identify new applications for this kind of data management system and suggests how existing technologies and systems can be adapted to them.

3.1 Technical comparison

Table 1 lists the main similarities and differences between blockchain and traditional RDBMSs. First, both have a sequentially organized chain structure, but they use it differently. In a blockchain, the chain structure is how the data itself is stored. In an RDBMS, the log is mainly used for data recovery. A blockchain does not store the current state of the database separately. In an RDBMS, by contrast, the current snapshot of the database is the basis for optimization techniques such as indexing and query processing.
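Because a blockchain keeps only the transaction flow, the current state (here, account balances) has to be derived by replaying the ledger. The following Python sketch uses a hypothetical ledger format. Answering even a single balance query this way costs a pass over all transactions. An RDBMS instead keeps the current state as a table that is updated in place and can be indexed.

```python
ledger = [  # hypothetical transaction flow, oldest first
    {"from": "mint", "to": "alice", "amount": 100},
    {"from": "alice", "to": "bob", "amount": 30},
    {"from": "bob", "to": "carol", "amount": 10},
]


def replay(ledger):
    """Derive current balances by applying every transaction in order."""
    balances = {}
    for tx in ledger:
        balances[tx["from"]] = balances.get(tx["from"], 0) - tx["amount"]
        balances[tx["to"]] = balances.get(tx["to"], 0) + tx["amount"]
    return balances


print(replay(ledger)["bob"])  # 20
```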


Second, RDBMSs usually provide only a certain degree of hardware fault tolerance and do not support tamper resistance. Tamper resistance is the key feature by which blockchain ensures data trustworthiness in peer-to-peer networks.
Third, blockchains, and public chains in particular, are completely decentralized and built on peer-to-peer networks. Some consortium chains organize their nodes as a main chain with side chains. Even so, the mechanisms with which blockchains are implemented assume that there is no central node. Traditional data management systems, by contrast, are strongly centralized, and their central node is trusted. This has a direct consequence for how each ensures data consistency. A blockchain system runs a distributed consensus algorithm for all of its data, whereas a distributed DBMS usually uses such algorithms only to maintain metadata. This is the main reason for the huge performance gap between the two.
Fourth, current major blockchain platforms do not provide schema management for the data they hold, so data can be accessed only through procedural application programming interfaces (APIs). Without declarative interfaces, applications for complex data management tasks are hard to develop. The lack of such interfaces has also become a barrier to interaction and integration between blockchain systems and existing data management systems.
In addition, smart contracts are similar to stored procedures and triggers in RDBMSs. Notably, many large mission-critical applications usually avoid triggers and stored procedures, in order to keep performance high and legacy code maintainable.
Finally, blockchain and traditional RDBMSs serve different applications. In the financial field, blockchain is taking on more and more trusted data management tasks that span departments, institutions, organizations, and even industries.
Blockchain and RDBMSs differ not only in architecture, functionality, and implementation technology but also in performance. Table 2 shows the data access throughput of the better-performing current blockchain platforms [16]. According to data from the Transaction Processing Performance Council (TPC), throughput on the TPC-C benchmark can reach nearly 50,000 TPS (transactions per second). Note that the TPC-C workload is far more complex than the queries and transactions that current blockchain platforms can support. Compared with RDBMSs, then, blockchain has a performance disadvantage. This limits its adoption in the many mission-critical applications that must sustain high load.


3.2 Domain oriented data management system

The design, implementation, and application development of traditional data management systems follow a “one size fits all” logic. In other words, the DBMS is general-purpose and applicable to any (structured) data management task in any field. The rise and development of the relational database industry also rested on this guiding idea. In 2005, Stonebraker questioned this idea. Ten years later, Stonebraker, winner of the 2014 Turing Award, clearly declared that traditional DBMSs are no longer suitable for every application. One reason is that the rapid development of new hardware has overturned the assumptions on which traditional DBMS research and development was based. Another is that applications have become so diverse that no single system's optimization can balance all features and performance metrics.
Subsequently, Carey, another important database scholar, proposed a more constructive guiding idea: “one size fits a bunch”. Under this idea, a dedicated data management system is designed for the specific requirements of a specific area. For example, high-throughput transaction processing needs NewSQL systems, and online analytical processing (OLAP) needs column-store databases. Text search needs retrieval systems, massive and streaming data processing needs stream processing systems, and the management and processing of information networks needs graph databases, to name but a few.
Blockchain is a dedicated data management system designed to meet the trusted bookkeeping requirements of cryptocurrency applications. This raises two questions. Is blockchain also applicable to other trusted data management tasks? And how can ideas from blockchain technology be borrowed to solve trusted data management problems in broader or other fields?

4 Application and discussion

4.1 Application 1: Blockchain-based intelligent warehouse receipt management system

In 2016, an intelligent warehouse receipt management system based on blockchain was developed. It addresses problems common in pledging steel warehouse receipts, such as fake warehouse receipts and double pledging. The system provides trusted management across stages such as warehouse receipt generation, circulation, and trading, and Figure 4 shows its application architecture. This is a typical consortium chain application. Multiple cooperating nodes (institutions) use the blockchain to jointly manage warehouse receipt data together with trading and circulation information. Unlike in Bitcoin, nodes on this chain operate on and use the information in different ways. The chain includes the owner of the warehouse receipt (the shipper), the manager (the warehouse), the regulator (the supervising company), and the inquirer (the financial institution). It also includes nodes playing various other roles in pledging, circulating, and trading warehouse receipts.

Figure 4 Application architecture of the blockchain-based intelligent warehouse receipt management system

The system manages data schemas based on the structural characteristics of warehouse receipt data. Participants often need to combine on-chain warehouse receipt information with information in their local databases before analyzing and processing it. Building on the blockchain, the system therefore implements integrated query processing over on-chain and off-chain data.
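One way to picture integrated on-chain/off-chain querying is the following sketch, which is not the system's actual design. On-chain records are exposed as a relation, so that a single declarative query can join them with a participant's local database. The record fields, status values, and table names are hypothetical, and the on-chain rows are assumed to have already been read from the ledger.

```python
import sqlite3

# Hypothetical on-chain warehouse receipt records, assumed already read from the ledger.
on_chain = [
    ("WR-001", "shipper-A", "pledged"),
    ("WR-002", "shipper-B", "issued"),
    ("WR-003", "shipper-A", "transferred"),
]

conn = sqlite3.connect(":memory:")
# Off-chain: a participant's local database with details that are not stored on the chain.
conn.execute("CREATE TABLE local_stock (receipt_id TEXT PRIMARY KEY, steel_grade TEXT, tons REAL)")
conn.executemany(
    "INSERT INTO local_stock VALUES (?, ?, ?)",
    [("WR-001", "Q235", 120.0), ("WR-002", "Q345", 80.0), ("WR-003", "Q235", 45.5)],
)
# Expose the on-chain records as a relation so that one query can span both sources.
conn.execute("CREATE TEMP TABLE chain_receipt (receipt_id TEXT PRIMARY KEY, owner TEXT, status TEXT)")
conn.executemany("INSERT INTO chain_receipt VALUES (?, ?, ?)", on_chain)

rows = conn.execute(
    """
    SELECT c.owner, SUM(s.tons)
    FROM chain_receipt AS c JOIN local_stock AS s ON s.receipt_id = c.receipt_id
    WHERE c.status != 'issued'
    GROUP BY c.owner
    """
).fetchall()
print(rows)  # [('shipper-A', 165.5)]
```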

4.2 Application 2: Data circulation

Safe House is the data circulation cloud service platform of Shanghai Youkede (UCloud) Information Technology Co., Ltd. In Safe House, data can be shared securely and then analyzed and processed. All data access and data processing in Safe House are regulated and audited, and only the results of data processing can be “taken out” of it. The blockchain records the data entering and leaving the system, together with the processing steps, for later audit and analysis.
Current blockchain technology cannot yet support all of Safe House's recording and regulatory requirements. On the one hand, data analysis involves many machine learning and artificial intelligence algorithms, which are far more complex than transaction processing. How to record such processing, and how to audit it afterward, needs further exploration. On the other hand, auditing the data processing in Safe House is essentially a backtracking query over the processing history of data items. Current blockchain platforms still offer only weak support for backtracking queries.

4.3 Application 3: Government governance

The government holds a great deal of high-quality data, and with it can govern accurately and in a timely way. In recent years, several large and medium-sized Chinese cities have successfully used traffic monitoring, social media, pedestrian, and cycling data for urban planning and urban management.
Government governance depends not only on data from the government's own functional departments but also on data from businesses and society. This data must be shared and used on a unified, regulated platform, and blockchain is a natural choice for implementing such a platform. Cryptocurrency is decentralized, but government governance may be multi-centered or weakly centered, and its nodes can be trusted to some extent. The blockchain architecture and consensus mechanism therefore need to be tailored to government governance. The same applies to data storage, schema management, and query and transaction processing techniques.

4.4 Discussion

Following the “one size fits a bunch” idea, we can see that current blockchain systems designed for financial applications do not suit every field. We believe blockchain technology deserves further exploration in the following three respects.
The first is the architecture of trusted data management systems for weakly trusted application environments that are weakly centered or multi-centered. Many mission-critical applications run in such environments. Moreover, the organizational structure of society and the regulatory requirements of government departments together limit the scope of application of fully decentralized systems. This calls for rethinking the structure of blockchain itself and developing systems better suited to these scenarios.
For a multi-centered architecture, Piazza is a useful reference. Piazza is a prototype peer-to-peer data management system developed at the University of Washington. In Piazza, each node maintains its own data, and data schemas may differ between nodes. Schema mappings are maintained between a node's database and those of its neighbor nodes. A query posed at one node can be translated through these mappings and propagated over the peer-to-peer network to reach data at other nodes. This organization is more flexible than the full replication of data at every node in current blockchains. Piazza's data management mechanisms do, however, lack support for tamper resistance and transaction processing, and much work remains to be explored and tried.
The second is system performance. Blockchain systems have great room for performance optimization across the board, from distributed consensus and transaction processing to data storage organization, indexing, querying, and analytical processing. Moreover, almost all applications place high performance demands on blockchain.
Finally, the chain structure naturally preserves the history of data, yet current blockchain systems still offer only weak support for backtracking queries. Backtracking is necessary for blockchain applications such as auditing and supervision. We therefore believe that efficient, flexible backtracking query mechanisms are of great significance for broadening the application scenarios of blockchain. Here, backtracking covers not only transaction history but also the processes of data analysis and processing, such as machine learning.
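The Piazza-style idea of translating a query through schema mappings can be sketched in Python. Each hypothetical `Peer` stores rows under its own attribute names and keeps a one-to-one attribute mapping with each neighbor. A query is answered locally, then translated and forwarded to neighbors, and the answers are translated back into the asking peer's schema. The real Piazza mapping language and query reformulation algorithm are considerably richer. This toy version handles only equality conditions and attribute renaming.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer that stores its own rows under its own schema."""
    name: str
    rows: list  # dicts keyed by this peer's attribute names
    links: dict = field(default_factory=dict)  # neighbor name -> (Peer, our attr -> their attr)

    def connect(self, other, mapping):
        """Record a one-to-one attribute mapping with a neighbor, in both directions."""
        self.links[other.name] = (other, dict(mapping))
        other.links[self.name] = (self, {theirs: ours for ours, theirs in mapping.items()})

    def query(self, conditions, visited=None):
        """Answer an equality query locally, then translate it for each neighbor and forward it."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        results = [r for r in self.rows if all(r.get(k) == v for k, v in conditions.items())]
        for neighbor, mapping in self.links.values():
            if neighbor.name in visited or not all(k in mapping for k in conditions):
                continue  # already asked, or the query cannot be expressed in its schema
            translated = {mapping[k]: v for k, v in conditions.items()}
            back = neighbor.links[self.name][1]  # neighbor's attr -> our attr
            for row in neighbor.query(translated, visited):
                results.append({back[k]: v for k, v in row.items() if k in back})
        return results


peer_a = Peer("A", [{"pid": 1, "city": "Shanghai"}])
peer_b = Peer("B", [{"no": 7, "town": "Shanghai"}, {"no": 8, "town": "Hangzhou"}])
peer_a.connect(peer_b, {"pid": "no", "city": "town"})
print(peer_a.query({"city": "Shanghai"}))
# [{'pid': 1, 'city': 'Shanghai'}, {'pid': 7, 'city': 'Shanghai'}]
```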
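A minimal sketch of a backtracking (as-of) query follows, assuming a hypothetical append-only store keyed by block height. Every write is kept together with its height, so no past state is ever overwritten. A per-key binary search answers the question “what was the value after block h?” without replaying the whole chain. This covers only the history of data values, not the provenance of machine learning processing that the text also calls for.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Hypothetical append-only key-value store supporting as-of (backtracking) queries."""

    def __init__(self):
        self._heights = defaultdict(list)  # key -> ascending block heights of its writes
        self._values = defaultdict(list)   # key -> values aligned with _heights

    def put(self, key, value, height):
        hs = self._heights[key]
        if hs and height < hs[-1]:
            raise ValueError("writes must arrive in non-decreasing block height")
        hs.append(height)
        self._values[key].append(value)

    def get_as_of(self, key, height):
        """Value of `key` as it stood after block `height`, or None if not yet written."""
        i = bisect.bisect_right(self._heights.get(key, []), height)
        return self._values[key][i - 1] if i > 0 else None

    def history(self, key):
        """Full lineage of `key` as (height, value) pairs, oldest first."""
        return list(zip(self._heights.get(key, []), self._values.get(key, [])))


store = VersionedStore()
store.put("WR-001/owner", "shipper-A", 10)
store.put("WR-001/owner", "bank-X", 25)
store.put("WR-001/owner", "shipper-A", 40)
print(store.get_as_of("WR-001/owner", 30))  # bank-X
print(store.get_as_of("WR-001/owner", 5))   # None
print(store.history("WR-001/owner"))  # [(10, 'shipper-A'), (25, 'bank-X'), (40, 'shipper-A')]
```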

5 Shared database

Blockchain partially solves the problem of establishing trust without a center in financial applications. In broader application scenarios, an important research issue is how to build trust without relying on the credit of a central party.
With the development of Internet technology, more and more fields first share data online and then share virtual or physical objects offline, in order to use resources rationally and increase their value. The rapid rise of bike sharing illustrates this process fully, as do the many problems of business management, governance, and user behavior that it later exposed. The vigorous development of Internet applications such as bike sharing shows that China has reached the world's forefront in business model innovation. Whether such business model innovation can be turned into a driving force for scientific and technological innovation is a mark of a country's innovation capability. New data management technologies are needed to give strong support to the daily operation of enterprises and to effective urban governance.
A system that supports these new data management requirements can be called a “shared database (sharing database)”. It should support core business (mission-critical applications) and the business models of the sharing economy. It may even be a database of the sharing-economy era that is itself implemented in the manner of the sharing economy. Blockchain has demonstrated that such a system can be designed and implemented for specific application domains. In many more areas, however, shared database systems similar to blockchain are needed to solve the problem of trusted data management.
Shared databases should follow the “one size fits a bunch” idea and be closely integrated with their domains and applications. Unlike traditional data management systems, shared databases will take various forms:
- For applications involving “people, goods, and events”, they should provide complete transaction processing mechanisms and integrated data acquisition and management.
- For complex data, they should provide management of structured data models and schemas.
- For applications involving data analysis, they should provide rich temporal and backtracking query support.
- For applications involving audits of data processing, they should build on logs to understand and record transactions, statistical processing, and even machine learning processes and their results.
- Their architecture should also match the application: decentralized, weakly centered, or multi-centered.
Informatization is the foundation of business development and reform. It often pioneers reform and can even lead application innovation. We believe that shared databases will develop rapidly along with the sharing economy, just as blockchain has driven the evolution of financial technology (FinTech).
The authors have declared that no competing interests exist.