The application of blockchain in data circulation

Yan Shu, Qing Sude, Wei Kai

China Academy of Information and Communications Technology, Beijing 100191

Abstract: The circulation of big data is a key link in creating value from data, but data circulation currently faces many problems. Blockchain is a distributed ledger technology characterized by decentralization and tamper resistance, and it can address some of the pain points in data circulation. This paper introduces the characteristics of blockchain technology; explains the main ideas behind using blockchain to transform authorization evidence storage, to trace data provenance, and to implement smart contracts; and outlines an overall architecture for data circulation. On this basis, it presents application examples of blockchain-based data circulation abroad and introduces several other new data circulation technologies.

Keywords: blockchain; data circulation; smart contract

doi:10.11959/j.issn.2096-0271.2018001

Citation: Yan Shu, Qing Sude, Wei Kai. The application of blockchain in data circulation[J]. Big Data Research, 2018, 4(1): 3-12.

YAN S, QING S D, WEI K. Application of blockchain in data circulation[J]. Big Data Research, 2018, 4(1): 3-12.

1 Introduction

Data circulation refers to the act of transferring data, as the object of exchange, between a data provider and a data demander according to certain circulation rules. In the process, the data leaves its original usage scenario and is put to a different purpose. As the resource value of data gains recognition and the big data industry chain matures, the demand for data circulation in China is becoming increasingly urgent. Whether through sharing or trading, data circulation moves data from where it is generated to where it is applied, optimizes resource allocation, and is becoming an important link in releasing the value of data [1,2].

Depending on how the data will be used, data in circulation usually includes both raw data and derived data produced by processing, and covers different types of data resource commodities such as data application programming interfaces (APIs), data reports, raw data packages, technical algorithms, and data applications. To meet different usage requirements, data circulation service providers usually adopt different circulation modes: "intermediary" services, which only establish a circulation or service relationship between provider and demander; "collection, production and sales" services, in which the service provider collects data itself and sells it externally; and "processing" services, in which data is further processed into valuable derived data or applications that are then offered externally. In these modes the data provider and the data demander are usually not the same entity [3].

However, data circulation in China, especially in the data trading industry, still has serious problems: data privacy protection is a prominent concern, data ownership has yet to be precisely defined, there is no consensus on standards for the various stages of data circulation, and illegal data transactions are rampant. In addition, with the official implementation of the Cybersecurity Law of the People's Republic of China, trafficking in illegal data is now formally punishable. With supporting regulations and standards still incomplete, many enterprises have postponed or scaled back their data circulation business to avoid risk. The data circulation industry faces its greatest challenge in recent years and urgently needs new support from new technologies.

Blockchain technology, which has emerged in recent years, offers a technical approach to these problems. This paper introduces several key technologies in blockchain applications and outlines an overall architecture for realizing data circulation with blockchain.

2 Blockchain Technology Overview

Blockchain is a distributed shared database based on cryptography. In essence, a reliable database is maintained collectively in a decentralized, trustless manner. Under this scheme, every node in the system keeps a complete copy of the current blockchain. At regular intervals the system selects the node that performs fastest and best in that period to do the bookkeeping. Using cryptographic techniques for computation and verification, that node packs all valid transactions from the period, together with the "digital signature" of the current blockchain, into a new block and links the new block to the end of the current blockchain copy, forming a new copy. Every change to the blockchain is broadcast across the system so that the copies held throughout the network are updated. Blockchain technology thus has four characteristics (distributed peer-to-peer operation, trustlessness, collective maintenance, and tamper resistance) and is therefore a natural fit for data circulation scenarios.

Broadly speaking, blockchain is a new distributed infrastructure and computing paradigm that uses a chained block data structure to verify and store data, distributed node consensus algorithms to generate and update data, cryptography to secure data transmission and access, and smart contracts composed of automated script code to program and manipulate data. In a narrow sense, blockchain is a chained data structure formed by linking data blocks in chronological order, a distributed ledger that cryptography protects against tampering and forgery. Since blockchain emerged, the global financial system has taken the lead in testing and refining the technology. Today, by combining distributed ledgers and tamper resistance with technologies such as smart contracts, blockchain can address real, complex business scenarios.

A blockchain platform can be divided into five layers: the network layer, consensus layer, data layer, smart contract layer, and application layer. To make data tamper-resistant, blockchain introduces a chain structure built from blocks. Blockchain platforms differ in the details of this data structure but are broadly the same. Taking Bitcoin as an example, each block consists of a block header and a block body. The block body stores the transactions that have occurred since the previous block; the block header stores the hash of the previous block (preblockHash), a random number (nonce), the Merkle root, and so on. Blockchain relies on two hash structures to ensure that data cannot be tampered with: the Merkle tree and the linked list of blocks. Figure 1 shows the data structure of the Bitcoin blockchain [4].

Figure 1 The two hash structures of Bitcoin
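
As a rough illustration of the two hash structures, the following Python sketch builds a Merkle root over the transactions of a block and links blocks through the previous-block hash. It is a simplification under stated assumptions: the names Block, merkle_root, and chain_is_valid are illustrative only, and the header omits fields such as the timestamp and difficulty target and does not follow Bitcoin's real serialization.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash function Bitcoin uses for blocks and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: List[bytes]) -> bytes:
    """Hash the transactions pairwise, level by level, until one root remains."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256d(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class Block:
    """Hypothetical, simplified block: header fields plus the block body."""
    prev_block_hash: bytes     # hash of the previous block (the block list)
    transactions: List[bytes]  # block body, summarized by the Merkle root
    nonce: int = 0

    def block_hash(self) -> bytes:
        # Simplified header: previous hash + Merkle root + nonce.
        header = (
            self.prev_block_hash
            + merkle_root(self.transactions)
            + self.nonce.to_bytes(8, "big")
        )
        return sha256d(header)


def chain_is_valid(blocks: List[Block]) -> bool:
    """Check that every block points at the hash of its predecessor."""
    return all(
        blocks[i].prev_block_hash == blocks[i - 1].block_hash()
        for i in range(1, len(blocks))
    )


if __name__ == "__main__":
    genesis = Block(b"\x00" * 32, [b"tx-a", b"tx-b", b"tx-c"])
    second = Block(genesis.block_hash(), [b"tx-d"])
    print(chain_is_valid([genesis, second]))  # True
    genesis.transactions[0] = b"tx-forged"    # tamper with an old transaction
    print(chain_is_valid([genesis, second]))  # False
```

Changing any transaction changes the Merkle root of its block and therefore the block hash, so the previous-block hash stored in the following block no longer matches.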

3 Using blockchain to solve key problems in data circulation

Blockchain technology can improve key aspects of data circulation such as authorization evidence storage and data traceability, and it also enables new transaction methods such as smart contracts.

3.1 Using blockchain to transform authorization evidence storage

For a long time, because data circulation parties, data producers, and users have been separate from one another, secondary data transactions could not be audited or controlled and the authenticity of an authorization could not be verified in real time. Data transaction authorization has made little technical progress and usually follows the traditional mode shown in Figure 2. In this mode, users transact and grant authorization one-to-one, case by case, through intermediaries such as data providers or data trading institutions.

Figure 2 Traditional mode of user authorization evidence storage in data transactions

In the traditional mode, authorization evidence can be altered at will and therefore lacks credibility. Because liability must be stipulated for each relationship, every application has to sign a separate agreement with every data source company. In addition, querying authorization records requires a separately developed interface, which is often neglected. And because authorization is bound to the business process, it is difficult for users to join or leave.

Blockchain technology brings a new breakthrough in this area. In the blockchain mode, the complete authorization and authentication process is shown in Figure 3. The user signs an electronic agreement authorizing the data provider. The data provider first stores the evidence locally through its application system and then uploads the authorization information to the authorization information chain. The application system executes the on-chain code, initiates an on-chain query, and records the authorization information in a block. When a data demander submits a data request, it initiates an authentication transaction on the chain to confirm whether the user has granted authorization. The authentication node on the chain then returns the authorization information, and if authorization exists, the corresponding data is returned.

Figure 3 Blockchain mode of user authorization evidence storage in data transactions

The blockchain mode avoids the defects of the traditional mode. Any node can record authorization information, and once recorded it cannot be changed. Multiple parties can share authorization records in real time, so queries are efficient. In addition, authorization is decoupled from the business process, so participants can join and leave at any time.
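
The authorization flow of the blockchain mode can be sketched as follows. This is a minimal in-memory stand-in for the authorization information chain, not the interface of any real platform: the names AuthorizationRecord, AuthorizationChain, and serve_request are hypothetical, consensus among nodes is omitted, and only a digest of the signed agreement is recorded while the agreement itself stays with the data provider.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AuthorizationRecord:
    user_id: str
    provider_id: str
    agreement_digest: str  # SHA-256 of the signed agreement kept locally
    prev_hash: str         # hash of the preceding record

    def record_hash(self) -> str:
        payload = json.dumps(
            [self.user_id, self.provider_id, self.agreement_digest, self.prev_hash]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuthorizationChain:
    """Append-only, hash-linked list of authorization records (illustrative)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._records: List[AuthorizationRecord] = []

    def record_authorization(self, user_id: str, provider_id: str, agreement: bytes) -> str:
        prev = self._records[-1].record_hash() if self._records else self.GENESIS
        digest = hashlib.sha256(agreement).hexdigest()
        record = AuthorizationRecord(user_id, provider_id, digest, prev)
        self._records.append(record)
        return record.record_hash()

    def is_authorized(self, user_id: str, provider_id: str) -> bool:
        return any(
            r.user_id == user_id and r.provider_id == provider_id
            for r in self._records
        )

    def links_are_intact(self) -> bool:
        """Recompute the hash links to detect an altered record."""
        prev = self.GENESIS
        for record in self._records:
            if record.prev_hash != prev:
                return False
            prev = record.record_hash()
        return True


def serve_request(
    chain: AuthorizationChain,
    store: Dict[str, dict],
    user_id: str,
    provider_id: str,
) -> Optional[dict]:
    """Return the user's data only if an authorization is recorded on the chain."""
    if not chain.is_authorized(user_id, provider_id):
        return None
    return store.get(user_id)
```

Because the query goes to the shared chain rather than to a provider-specific interface, any party can check an authorization in the same way, which is the decoupling of authorization from business logic described above.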

Take ZTE as an example. It has launched a blockchain-based platform for sharing and exchanging electronic licenses. Drawing on the equal-rights and co-construction features of blockchain technology, the platform collects data comprehensively on the principle of joint construction and sharing. Trusted license information is published through the blockchain, and the digital signature of the original issuing department makes the data tamper-resistant and trustworthy. Using the asymmetric encryption of blockchain technology, each record is encrypted individually (each citizen's information has its own private decryption key) to prevent information leakage. The platform accelerates the implementation of "Internet + government services" and the construction of a unified "one network" for government services, and it enables cross-regional collection, fast retrieval, and application of electronic license information at the city level. In the future it is to be extended into a province-wide electronic license sharing and exchange platform, forming a trusted open platform for government information sharing at the provincial and municipal levels and ensuring that data is shared, opened, and secured across government departments.

3.2 Using blockchain for data traceability

The openness, autonomy, and decentralization of blockchain technology make it well suited to tracing the provenance of data transactions (circulation). Many experts now hold that data + blockchain = data assets. The value of blockchain lies in giving data a certain degree of "uniqueness": once uniquely identified data is attached to the blockchain for trading, the difficulty of tracing data is naturally resolved.

In the blockchain network, multiple nodes participate in data calculation and recording , And verify the validity of their information with each other , It can be used for information anti-counterfeiting , It also provides a traceable path . Through information on the chain , The transaction information of each block constitutes a complete transaction list , It's an unforgettable record of every transaction . When the user has questions about the value of a block , It's easy and accurate to backtrack transactions , Then we can judge the historical transaction records [5].

Blockchain-based supply chain control and traceability technology is currently advancing rapidly. For example, a supply chain management and traceability scheme can be designed by combining blockchain, Bitcoin-related technology, and multi-signature technology. The entities within the supply chain are divided into "role entities", "product entities", and "permission entities", and hierarchical wallet technology is used to distribute entity keys. A tree-structured coding system is built on hierarchical wallet technology, a decentralized permission control mechanism is built on blockchain transactions, and a mechanism for recording and verifying ownership transfer information is established, yielding a new approach to supply chain control and traceability with blockchain.

Food safety, for instance, has long been a major public concern. Take grain as an example: the real output of Wuchang rice is little more than 1 million tons, yet more than 15 million tons of rice is sold on the market as Wuchang rice, so it is hard for ordinary consumers to buy the genuine product. In April 2017, ChainNova (Smart Chain) teamed up with a well-known farm in northern China to build an agricultural blockchain application based on Hyperledger Fabric 1.0, using a public key infrastructure (PKI) system to authenticate user identities. The application digitally signs each submitted request and uploads it to the blockchain, ensuring that the data is authentic and cannot be tampered with. Combined with the Internet of Things, big data, and other technologies, it connects on-chain and off-chain data across the whole process, enabling traceability and quality assurance for rice grown on 12.96 million mu of black soil and delivering high-quality rice to consumers.

3.3 Data transactions based on smart contracts

The smart contract was first proposed in 1994 by the computer scientist and cryptographer Nick Szabo. It is a computer program that automatically executes the terms of a contract: pre-written program code identifies and evaluates data obtained from outside, and when the conditions set in the program are met, the system is triggered to execute the corresponding contract terms automatically, completing the transaction and the transfer of smart assets. After the concept was proposed, however, it lay dormant for lack of a platform on which such contracts could be executed.

With the emergence of blockchain technology, smart contracts have again attracted attention and research. The distributed ledger architecture of blockchain technology spans the business layer (e.g., assets), the application layer (e.g., smart contracts), the middleware layer (e.g., distributed transaction consensus), and the underlying technology layer (the underlying network). Smart contracts can be stored, verified, and executed at the application layer, and they have therefore become an important feature of blockchain applications.

Take data trading as an example. Code is attached to an asset and run on the blockchain, making it a resource shared across the whole network; external data then triggers execution of the smart contract, which determines how the data asset flows, is allocated, or is transferred within the network. The object of a smart contract is not limited to data: it can be tangible property such as cars and houses, or intangible property such as equity, notes, and digital currency.

A smart contract is not only defined by code but also enforced by code, so it executes fully automatically and cannot be interfered with, and the parties do not need to trust each other. This is exactly what data trading requires. Data trading institutions can establish rules, express the contract as code, implement payment on the chain, and raise the level of automated trading.
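
To make the idea of a contract that is both defined and enforced by code more concrete, the sketch below models a data sale as a small state machine written in Python. It is only an illustration under assumptions: real smart contracts run on a blockchain platform in its own contract language, and the class DataSaleContract, its fields, and the balances dictionary standing in for on-chain accounts are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataSaleContract:
    """Hypothetical escrow-style contract for one data sale."""
    seller: str
    buyer: str
    price: int
    data_digest: str           # SHA-256 of the data set both parties agreed on
    balances: Dict[str, int]   # stand-in for on-chain account balances
    state: str = "created"     # created -> funded -> settled

    def deposit(self) -> None:
        """The buyer locks the payment in the contract."""
        if self.state != "created":
            raise RuntimeError("payment was already deposited")
        if self.balances.get(self.buyer, 0) < self.price:
            raise ValueError("buyer has insufficient funds")
        self.balances[self.buyer] -= self.price
        self.state = "funded"

    def deliver(self, data: bytes) -> bool:
        """Release the payment only if the delivered data matches the digest."""
        if self.state != "funded":
            return False
        if hashlib.sha256(data).hexdigest() != self.data_digest:
            return False
        self.balances[self.seller] = self.balances.get(self.seller, 0) + self.price
        self.state = "settled"
        return True


if __name__ == "__main__":
    dataset = b"derived dataset v1"
    accounts = {"buyer": 100, "seller": 0}
    contract = DataSaleContract(
        "seller", "buyer", 30, hashlib.sha256(dataset).hexdigest(), accounts
    )
    contract.deposit()
    print(contract.deliver(b"something else"))  # False: digest mismatch
    print(contract.deliver(dataset))            # True: payment released
    print(accounts)                             # {'buyer': 70, 'seller': 30}
```

Neither party decides whether payment is released; the condition written into the code does, which is why the two sides do not need to trust each other.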

4 Overall architecture for data circulation using blockchain

The use of blockchain technology for data circulation is outlined here in five layers: the network switching layer, consensus mechanism layer, data storage layer, smart contract layer, and data circulation layer. The five layers are shown in Figure 4.

Figure 4 The five layers of blockchain-based data circulation

At the network switching layer, under the Chinese national standard Basic Requirements for Information System Security Classified Protection, the technical architecture of a permissionless chain (also called a public chain) does not meet the relevant national requirements for physical access control, network security, service performance, and reliable system operation. The permissionless architecture is therefore unsuited to the classified protection of information systems, and only a permissioned chain (also called a consortium chain or private chain) can be deployed, with secure and reliable communication ensured through dedicated-line access, a virtual private network (VPN), or similar means. For identity authentication, an industry or regional e-commerce certification authority (CA) is used for authentication and access control. For example, an enterprise trading financial big data must be certified by the China Financial Certification Authority (CFCA) under the People's Bank of China when it joins the network, and a big data trading institution in Shanghai must be certified by a body such as Shanghai Digital Certificate Authority Co., Ltd. (SHECA), which was established with the authorization of the Shanghai municipal government. The network as a whole is peer-to-peer: the failure, departure, or arrival of multiple nodes, and even the presence of malicious nodes, does not affect the stability and security of the system.

At the consensus mechanism layer, blockchain consensus mechanisms fall into two types according to the consensus process. The first is probabilistic consensus, in which finality is reached only in an engineering sense; examples are the proof of work (PoW) and proof of stake (PoS) mechanisms. The second is deterministic consensus, in which a block is final as soon as agreement is reached; examples are Byzantine fault tolerance (BFT) and variants of the related algorithms, such as practical Byzantine fault tolerance (PBFT). As noted above, a permissioned chain should be used, and in that setting a deterministic consensus algorithm is preferable wherever possible. The two types of consensus algorithm are compared in Table 1.

Table 1 Comparison of consensus algorithms
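
The difference between the two types can be made tangible with a little arithmetic. The Python sketch below is not taken from the paper: it uses the standard result that BFT-style protocols with n nodes tolerate at most f Byzantine nodes where n >= 3f + 1, and the simplified catch-up probability from the Bitcoin white paper for an attacker holding a fraction q of the hash power who is z blocks behind. The function names are illustrative.

```python
def max_faulty(n: int) -> int:
    """Largest number f of Byzantine nodes tolerated with n nodes (n >= 3f + 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest quorum such that any two quorums share an honest node.

    Equals 2f + 1 when n = 3f + 1.
    """
    return (n + max_faulty(n)) // 2 + 1


def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind (simplified gambler's-ruin model for PoW)."""
    if not 0.0 <= q <= 1.0 or z < 0:
        raise ValueError("q must be in [0, 1] and z must be non-negative")
    if q >= 0.5:
        return 1.0
    return (q / (1.0 - q)) ** z


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, max_faulty(n), quorum_size(n))
    for z in (1, 3, 6):
        print(z, catch_up_probability(0.1, z))
```

In the deterministic case a block is final once a quorum has agreed, whereas in the probabilistic case the chance of reversal only shrinks as more blocks are added and never reaches zero, which is why confirmation there is a matter of engineering practice.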

At the data storage layer, every data transaction must be tracked continuously. A transaction-based model, the unspent transaction output (UTXO) model, is therefore adopted in place of the double-entry bookkeeping model used by ordinary commercial banks, because it is better suited to controlling and tracing data transactions. In addition, the chained block data structure interlocks historical transaction information, and asymmetric cryptography provides public-key and private-key encryption and decryption, which helps to confirm data rights and put data on the chain before it is traded.
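
A minimal sketch of why the UTXO model lends itself to control and traceability is given below in Python. It is an illustration under assumptions rather than a description of any particular platform: the names Output, Transaction, and UtxoLedger are hypothetical, signature verification is replaced by passing the signer's name, and each output represents a right to one data asset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

OutPoint = Tuple[str, int]  # (transaction id, output index)


@dataclass(frozen=True)
class Output:
    owner: str
    asset: str  # identifier of the data asset this output represents


@dataclass(frozen=True)
class Transaction:
    txid: str
    inputs: Tuple[OutPoint, ...]  # outputs consumed by this transaction
    outputs: Tuple[Output, ...]   # new outputs created by this transaction


class UtxoLedger:
    def __init__(self) -> None:
        self.utxos: Dict[OutPoint, Output] = {}
        self.transactions: Dict[str, Transaction] = {}

    def apply(self, tx: Transaction, signer: str) -> None:
        """Validate and apply a transaction; `signer` stands in for a signature check."""
        if tx.txid in self.transactions:
            raise ValueError("duplicate transaction id")
        if len(set(tx.inputs)) != len(tx.inputs):
            raise ValueError("an input is listed twice")
        for outpoint in tx.inputs:
            output = self.utxos.get(outpoint)
            if output is None:
                raise ValueError("input is unknown or already spent")
            if output.owner != signer:
                raise PermissionError("signer does not own the input")
        for outpoint in tx.inputs:
            del self.utxos[outpoint]  # each output can be spent only once
        for index, output in enumerate(tx.outputs):
            self.utxos[(tx.txid, index)] = output
        self.transactions[tx.txid] = tx

    def trace(self, outpoint: OutPoint) -> List[str]:
        """Follow the first input of each transaction back to the issuing one."""
        path: List[str] = []
        txid = outpoint[0]
        while True:
            path.append(txid)
            tx = self.transactions[txid]
            if not tx.inputs:
                return path  # reached a transaction that issued the asset
            txid = tx.inputs[0][0]
```

Because every output names the transaction that created it and can be consumed only once, the history of a data asset is a chain of transactions that can be walked back to its origin, and an attempt to transfer the same right twice is rejected.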

At the smart contract layer, smart contracts execute independently in isolated sandboxes and cross-check one another, so permission control for classified and graded data management can be expressed and implemented precisely in code. This helps to map the hierarchical relationships among government commissions, offices, and bureaus, and it supports cooperation among multiple institutions across levels, departments, regions, platforms, and industries. Privacy protection relies mainly on the sandbox environment in which each smart contract runs independently: no one other than the data licensor and the stakeholders can access the data, and access strictly follows the viewing permissions set by the smart contract, which protects data privacy to a certain extent. Further privacy technologies are described in Section 6. Automatic execution of smart contracts also enables automated trading of derived data, which helps with problems such as the difficulty of pricing data, billing accurately, and matching transactions.

At the data circulation layer, which relies on the mechanisms of the network switching, consensus mechanism, data storage, and smart contract layers, data providers can encrypt and transmit their data more easily than on a traditional data circulation platform and can manage the confirmation of data rights conveniently. Because the data circulation platform handles only encrypted data, it can more easily demonstrate that it has not misused the data, and uncontrolled copying of circulated data is avoided. Using the traceability of the blockchain system, data owners can track where their data goes, and every use of the data requires authorization and verification by the data owner, giving the data provider independent control. Data demanders can trace the source of the data, which assures its authenticity and improves the accuracy and validity of data analysis.

It is worth noting that, to enhance the reliability and security of data circulation, blockchain research, development, and design should pay attention to the following points. First, choose a suitable consensus algorithm according to the timeliness requirements of data circulation. Second, choose a compliant personal information protection algorithm, making full use of the technical advantages of blockchain to protect personal information strictly. Third, choose a suitable blockchain deployment mode: within the permissioned-chain category, determine the most appropriate mode according to the characteristics and security requirements of the data being circulated. Fourth, combine encryption algorithms with the classification and grading of data, applying encryption algorithms of different cost to data of different levels and sensitivities so as to improve the overall efficiency of data circulation.

5 Application examples of blockchain-based data circulation

Blockchain has made important progress in the circulation and sharing of electronic medical record data in the medical and health field. In the United States, electronic medical record (EMR) sharing led by the Office of the National Coordinator for Health Information Technology has entered the application stage and is briefly introduced here [6,7].

The EMR sharing blockchain model is highly scalable. Each block records the patient's unique identifier, the encrypted medical record, and its timestamp. To improve data access efficiency, metadata such as the data format is also recorded in the block in the form of tags. All medical data is stored in a repository called a data lake, which can hold many types of data (including images and documents); the stored data is encrypted and protected with digital signatures. The data in the data lake is of great value for data analysis.

When a medical institution issues an EMR, a digital signature is generated automatically so that the qualification of the issuing institution can be verified, and the data is then encrypted and transmitted to the data lake. At the same time, a record containing the patient's unique identifier is generated from the data lake on the EMR sharing blockchain, and the patient is notified. The process is shown in Figure 5.

Figure 5 Information in the EMR sharing blockchain and the data lake

Access to the uploaded data is strictly controlled. Users decide who may access and modify their data, and they can see when and by whom the information was accessed.

Researchers at the Massachusetts Institute of Technology in the United States have also proposed a new system, named MedRec, that uses blockchain technology to manage EMR information in a decentralized way. Stakeholders such as data researchers and public health institutions join the network as "miners" and in return obtain anonymized health data relevant to their research. This "mining" process can be regarded as a kind of PoW mechanism, and also as an application of the "currency" mechanism to data circulation.

In fact, since the announcement of the European Union's General Data Protection Regulation (GDPR), data protection requirements have become ever stricter. More research institutions and enterprises have begun to explore blockchain for data circulation, and the combination of blockchain with technologies such as cloud computing and security auditing has attracted wide attention.

6 Other data security solutions

As a shared ledger technology, blockchain must be combined with other data security technologies, such as zero-knowledge proofs and secure multi-party computation (SMC), in order to achieve data isolation.

6.1 Zero-knowledge proof

Zero-knowledge proofs were proposed by Goldwasser et al. in the early 1980s. A zero-knowledge proof allows a prover to convince a verifier that an assertion is correct without giving the verifier any useful information.

The prover and the verifier share the same function or set of values. The general process of a zero-knowledge proof is as follows:

● The prover sends the verifier a random value satisfying certain conditions; this value is called the "commitment";

● The verifier sends the prover a random value satisfying certain conditions; this value is called the "challenge";

● The prover performs a secret computation and sends the result to the verifier; this result is called the "response";

● The verifier checks the "response". If verification fails, the prover does not possess the claimed "knowledge" and the process ends. Otherwise, the process returns to the first step and is repeated t times.

If the verification succeeds every time, the verifier accepts that the prover possesses the knowledge. Throughout the process the verifier obtains no information about that knowledge, so the prover's privacy is protected.
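
The commitment, challenge, and response steps can be illustrated with the Schnorr identification protocol, in which the "knowledge" is a secret exponent x behind a public value y = g^x mod p. The paper does not name a specific protocol, so this choice is an assumption made for illustration. The Python sketch uses deliberately tiny, insecure parameters, and the function names are hypothetical. Schnorr's protocol is zero-knowledge against an honest verifier, which is enough to show the structure of the exchange.

```python
import secrets

# Toy parameters (insecure, for illustration only): G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def keygen():
    """Return (secret x, public y) with y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1  # x in 1 .. Q-1
    return x, pow(G, x, P)


def commit():
    """Prover: choose random r and send the commitment t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def challenge() -> int:
    """Verifier: send a random challenge c."""
    return secrets.randbelow(Q)


def respond(x: int, r: int, c: int) -> int:
    """Prover: compute the response s = r + c*x mod Q using the secret x."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Verifier: accept if G^s == t * y^c mod P."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_protocol(x_claimed: int, y: int, rounds: int = 20) -> bool:
    """Repeat the three-step exchange; stop at the first failed verification."""
    for _ in range(rounds):
        r, t = commit()
        c = challenge()
        s = respond(x_claimed, r, c)
        if not verify(y, t, c, s):
            return False
    return True
```

With these toy parameters a prover who does not know x passes a single round only when the challenge happens to be 0, that is with probability 1/11, which is why the exchange is repeated; with realistic parameter sizes a single round already gives a negligible cheating probability.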

6.2 Secure multi-party computation

Secure multi-party computation addresses privacy-preserving collaborative computation among a group of mutually distrusting participants. It must guarantee the independence of the inputs and the correctness of the computation without revealing any input value to the other participants. In general, a secure multi-party computation problem is to compute an arbitrary probabilistic function of arbitrary inputs over a distributed network in which each participant holds one input; the network must guarantee input independence and computational correctness, and apart from each participant's own input and output it must reveal no information from which other inputs or outputs could be derived [8].

Secure multi-party computation can be summarized by the following mathematical model. In a distributed network there are $n$ mutually distrusting participants $P_1, P_2, \ldots, P_n$, and each participant $P_i$ holds a secret input $x_i$. Together they need to evaluate the function $F: (x_1, x_2, \ldots, x_n) \to (y_1, y_2, \ldots, y_n)$, where $y_i$ is the output received by $P_i$. During the computation of $F$, no participant $P_i$ may learn anything, beyond what follows from its own output $y_i$, about the input of any other participant $P_j$ ($j \neq i$).
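
A simple instance of this model is a secure sum, where F returns the total of all inputs to every participant. The Python sketch below uses additive secret sharing and is only illustrative: it assumes participants follow the protocol honestly (the semi-honest setting), simulates all parties in one process, and the names share, secure_sum, and MODULUS are not from the paper.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # public modulus; the true sum is assumed to be smaller


def share(secret: int, n: int) -> List[int]:
    """Split `secret` into n random shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(inputs: List[int]) -> int:
    """Simulate n participants computing the sum of their private inputs."""
    n = len(inputs)
    # distributed[i][j] is the share that participant i sends to participant j.
    distributed = [share(x, n) for x in inputs]
    # Each participant j adds up the shares it received and publishes
    # only this partial sum, which on its own looks uniformly random.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Adding the published partial sums reveals the total and nothing else.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    print(secure_sum([3, 5, 9]))  # 17
```

Each participant sees only random-looking shares of the other inputs, so it learns nothing beyond the output itself, which matches the requirement stated in the model above.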

Because a general-purpose secure multi-party computation protocol is independent of the computing task (it can compute any function), there is no need to consider task-specific security properties or the external runtime environment, which gives it a distinctive advantage for securing today's complex applications.

7 Conclusion

By establishing a public ledger that is recorded jointly by all users in the network, blockchain ensures that information is authentic and cannot be tampered with. These characteristics make it an effective tool for solving the problems of data circulation. However, the performance bottlenecks and latency of blockchain are becoming increasingly apparent, and beyond these technical problems, its construction cost and its integration with existing systems also constrain its future development. Whether blockchain can play a greater role in data circulation will depend on many factors.