Privacy-Preserving Distributed Deep Learning Based on Secret Sharing

Jia Duan (a), Jiantao Zhou (a, corresponding author, jtzhou@um.edu.mo), Yuanman Li (b)

(a) State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, China
(b) College of Electronics and Information Engineering, Shenzhen University, China

Information Sciences 527 (2020) 108-127. Received 20 September 2019; revised 21 March 2020; accepted 23 March 2020; available online 26 March 2020. doi: 10.1016/j.ins.2020.03.074. © 2020 Elsevier Inc. All rights reserved.

Keywords: Deep neural network; Distributed deep learning; Secure multi-party computation; Privacy preserving; Secret sharing

Abstract

Distributed deep learning (DDL) naturally provides a privacy-preserving solution that enables multiple parties to jointly learn a deep model without explicitly sharing their local datasets. However, the existing privacy-preserving DDL schemes still suffer from severe information leakage and/or lead to a significant increase in communication cost. In this work, we design a privacy-preserving DDL framework such that all the participants can keep their local datasets private with low communication and computational cost, while still maintaining the accuracy and efficiency of the learned model. By adopting an effective secret sharing strategy, we allow each participant to split the intervening parameters in the training process into shares and upload an aggregation result to the cloud server. We show theoretically that the local dataset of a particular participant is well protected against the honest-but-curious cloud server as well as the other participants, even in the challenging case that the cloud server colludes with some participants. Extensive experimental results are provided to validate the superiority of the proposed secret sharing based distributed deep learning (SSDDL) framework.

1. Introduction

Recently, deep neural network (DNN) architectures have achieved impressive performance across a wide variety of fields, such as face recognition [32,37], machine translation [8,11], object detection [26,36], and object classification [14,19]. As the size of datasets increases, the computational intensity and memory demands of deep learning grow proportionally. Although significant advances have been made in recent years in GPU hardware, network architectures, and training methods, large-scale DNN training often takes an impractically long time on a single machine. Additionally, many accuracy-improving strategies in deep learning, such as scaling up the model parameters [31], utilizing complex models [9], and training on large-scale datasets [21], are also significantly constrained by the available computational power.

Fortunately, the distributed deep learning (DDL) framework provides a practicable and efficient solution for learning over large-scale datasets, especially when some datasets belong to different owners (and hence cannot be shared directly). To solve complex and time-consuming learning problems, DDL utilizes data parallelism and/or model parallelism [10].
In model parallelism, different participants in the distributed system are responsible for the computations in different parts (e.g., layers) of a single network. In data parallelism, different participants hold a complete copy of the model, and each participant locally possesses a portion of the dataset. All the participants aim to learn an accurate deep model by sharing the training information (model parameters, gradients, etc.). These two parallelism strategies are not mutually exclusive: one can utilize model parallelism (the model split across GPUs) within each participant, and data parallelism among the participants. Some representative DDL schemes are included in [10,18,22] and the references therein.

Meanwhile, many privacy-preserving protocols have recently been suggested that leverage a public third party to perform various computations, e.g., solving linear equations [7,42], linear programming [40,41,44], keyword search over outsourced data [15,39], cross-media retrieval [17,43], fair watermarking [46], nonnegative matrix factorization (NMF) [12], and deep learning [1,16,23,29,34]. Among these protocols, the ones concerning privacy-preserving deep learning are the most relevant to our work. According to the privacy issues arising in different phases of the network, these works can be grouped into two categories: one deals with privacy issues in the training phase [1,27,29,34,45,47,48], and the other preserves privacy in the testing phase [2,6,16,23].

Due to the confinement of the training data to its owner, it is natural to build a privacy-preserving deep learning framework on top of the DDL architecture. The first privacy-preserving DDL scheme was proposed in [34], where the gradients shared between the cloud server and the participants are protected via a differential privacy mechanism. Unfortunately, Phong et al. [29] found that the shared gradients cause severe information leakage of the local training data. To prevent this leakage, they proposed to encrypt the shared gradients with additively homomorphic encryption (HE) [38]. More recently, Phong et al. [28,30] devised a new privacy-preserving DDL framework that shares encrypted network weights rather than encrypted gradients. Furthermore, Ma et al. [24] suggested a privacy-preserving multi-party deep learning framework incorporating ElGamal encryption [5] and Diffie-Hellman key exchange [35].

In this work, we propose a privacy-preserving SSDDL framework that enables multiple learning participants and one cloud server to collaboratively train an accurate DNN model with low communication and computational cost. To protect the input privacy, an effective secret sharing scheme is adopted. Specifically, instead of sharing gradients, each participant splits its own gradients into shares and distributes them to the other participants. Upon receiving the shares, each participant calculates an aggregation result and uploads it to the cloud server for updating the global network parameters. We show theoretically that the local dataset of a particular participant is well protected against the honest-but-curious cloud server as well as the other participants, even in the challenging case that the cloud server colludes with some participants. Extensive experimental results are provided to validate the superiority of the proposed SSDDL framework.
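To make the share-and-aggregate idea concrete, the following is a minimal Python sketch of additive secret sharing over a prime field. The three-party setting, the modulus, the fixed-point encoding of real-valued gradients, and the helper names are illustrative choices of ours, not the exact protocol presented in Section 4.

```python
import secrets

P = 2**61 - 1      # a public prime modulus (illustrative choice)
SCALE = 10**6      # fixed-point scaling for real-valued gradients

def split(value, n):
    """Split an integer in [0, P) into n additive shares modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three participants, each holding one (scaled) local gradient value.
gradients = [0.125, -0.300, 0.075]
encoded = [round(g * SCALE) % P for g in gradients]

# Each participant splits its gradient and distributes the shares;
# participant i then aggregates the i-th share received from everyone.
all_shares = [split(e, 3) for e in encoded]
aggregates = [sum(all_shares[p][i] for p in range(3)) % P for i in range(3)]

# The cloud server sums the uploaded aggregates to obtain the gradient sum.
total = sum(aggregates) % P
total = total - P if total > P // 2 else total   # decode signed fixed-point
print(total / SCALE)   # -0.1 == 0.125 - 0.300 + 0.075
```

Because every participant only ever observes uniformly random shares, the individual gradients stay hidden and only the aggregate is revealed after summation.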
The rest of this paper is organized as follows. Section 2 reviews and analyzes the state-of-the-art privacy-preserving DDL frameworks. Some preliminaries are given in Section 3. Our proposed privacy-preserving SSDDL framework is presented in Section 4. In Section 5, we provide a detailed security analysis of the proposed scheme under the designed security experiments. Section 6 offers the experimental results, and we conclude in Section 7.

2. Review and analyses of the privacy-preserving DDL frameworks

In this section, before giving our analyses of the existing privacy-preserving DDL frameworks, we first provide the details of the state-of-the-art frameworks.

2.1. Review of the privacy-preserving DDL frameworks

Deep learning aims to extract complex features from high-dimensional data and use them to build a model that maps inputs to outputs. Usually, deep learning architectures are constructed as multi-layer networks, so that more abstract features are computed as nonlinear functions of lower-level ones. A traditional multi-layer DNN architecture consists of one input layer, one output layer, and l hidden layers, as illustrated in Fig. 1. Each hidden layer is connected to the output of the previous layer. For simplicity, we consider a single layer of the DNN; the expression extends to the other layers as well. Let $x \in \mathbb{R}^{1 \times u}$ denote an input vector, and let $h \in \mathbb{R}^{1 \times v}$ denote the output of the layer. Then, the computation of a single layer can be expressed as

$h = f(xW + b)$,  (1)

where $W \in \mathbb{R}^{u \times v}$ is the weight matrix, $b \in \mathbb{R}^{1 \times v}$ is the row vector of biases, and f denotes the activation function, such as the sigmoid, rectified linear unit (ReLU), softplus, hyperbolic tangent, or max-out.

Fig. 1. The traditional architecture of a deep neural network.

The learning task is cast as an optimization problem, which determines the weight variables (W and b) by minimizing a pre-defined cost function over a training dataset. The cost function is evaluated over all the data in the training dataset, or over a subset (mini-batch). In practice, stochastic gradient descent (SGD), which computes the gradient over an extremely small subset of the whole dataset, is a commonly used technique for solving the optimization problem of deep learning. Let $\Theta$ be the flattened vector of all the weight variables (consisting of W and b in each layer) of the deep network. Then, the update rule of SGD can be formulated as

$\Theta := \Theta + \Delta G$,  (2)

where $\Delta G = -\alpha \cdot G$ describes the parameter change, G denotes the gradients of the parameters, and $\alpha$ represents the learning rate. The learning rate can be fixed or adaptively determined, as presented in [13]. It should also be noted that each weight variable in $\Theta$ is updated independently of the others.
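As an illustration, here is a minimal NumPy sketch of the single-layer computation in (1) and the SGD update in (2). The layer sizes, learning rate, and the stand-in random gradient are placeholder values for demonstration only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(42)

# Eq. (1): one layer h = f(x W + b), with u = 4 inputs and v = 3 outputs.
x = rng.random((1, 4))                  # input row vector
W = rng.normal(scale=0.1, size=(4, 3))  # weight matrix
b = np.zeros((1, 3))                    # bias row vector
h = relu(x @ W + b)

# Eq. (2): Theta := Theta + dG, with dG = -alpha * G.
alpha = 0.01                                     # learning rate
theta = np.concatenate([W.ravel(), b.ravel()])   # flattened parameter vector
G = rng.normal(size=theta.shape)                 # stand-in mini-batch gradient
theta = theta - alpha * G    # each component is updated independently
```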
Based on the SGD in traditional deep learning, Shokri et al. [34] proposed a privacy-preserving DDL framework called distributed selective stochastic gradient descent (DSSGD). Specifically, several participants collaboratively train a deep model with an honest-but-curious cloud server. The architecture of the DNN is fixed and known to all the participants and the cloud server. Each participant trains the model on its own dataset and shares the gradients with the cloud server. The cloud server is responsible for maintaining the global (model) parameters and sharing them with the participants. Inspired by selective SGD, each participant uploads only a fraction of its gradients, rather than all of them, so that the communication cost can be lowered without affecting the model accuracy. In addition, Shokri et al. [34] suggested protecting the input data privacy by adopting a differential privacy strategy, at the cost of degraded model accuracy.

Although DSSGD was claimed to provide excellent efficiency, accuracy, and data privacy, Phong et al. [29] recently pointed out its unsatisfactory privacy properties. It was observed that the gradients of the first layer can completely leak the information of the input data. Hence, gradients shared in plaintext between the cloud server and the participants cause severe information leakage of the local training data, defeating the ultimate goal of privacy preservation. To remedy this drawback, Phong et al. [29] suggested a simple countermeasure: encrypting the shared gradients with additively HE. Specifically, all the participants negotiate a public key pk and a secret key sk for performing additively HE. The secret key sk is kept confidential from the cloud server, but needs to be shared among all the participants. In each round of local training, each participant encrypts its gradients with the public key pk and sends the encrypted version to the cloud server, which updates the model parameters by applying (2) in the encrypted domain. Using the secret key sk, the participants can easily decrypt the global parameters. Under such a framework, it was claimed that the model parameters and the gradients remain private from the cloud server.

Furthermore, Phong et al. [30] proposed a privacy-preserving DNN scheme that shares weights instead of gradients. Specifically, they presented two kinds of network architectures: the Server-aided Network Topology (SNT) and the Fully-connected Network Topology (FNT). In an SNT system, similar to [29], a symmetric key is shared among all the participants but kept secret from the cloud server. Each participant trains the weights over its own dataset locally and encrypts the weights with the symmetric key. The encrypted weights are then uploaded to the cloud server for synchronization. In an FNT system, there is no centralized cloud server; all the participants train and transfer their weights one by one in a fixed order. Due to the absence of the cloud server, the symmetric encryption is abandoned.

Bonawitz et al. [4] designed a failure-robust protocol that privately shares the gradients between the participants and the cloud server. Specifically, each participant adds double masks to its gradient and uploads the masked gradient to the cloud server. Upon receiving enough masked gradients, the cloud server aggregates them and removes the masks with the corresponding keys. Shamir's t-out-of-n secret sharing scheme is adopted to handle participants that drop out during the interactions. Under such a framework, it was proved that the joint view of the server and any set of t-1 participants leaks no information about the inputs of the other participants.
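The following toy Python sketch illustrates only the pairwise-masking idea behind this protocol. For brevity it omits the second (self) mask, the key agreement used to derive the masks, and the Shamir-based recovery for dropped participants described in [4]; the modulus and all names are our own illustrative choices.

```python
import random

M = 2**32    # public modulus for masked aggregation (illustrative)
n = 3
inputs = [7, 11, 4]    # each participant's private value

# Every pair (i, j) with i < j agrees on a random pairwise mask
# (derived via key agreement in [4]; here we simply sample it).
pair_mask = {(i, j): random.randrange(M)
             for i in range(n) for j in range(i + 1, n)}

def masked_input(i):
    y = inputs[i]
    for j in range(n):
        if i < j:
            y = (y + pair_mask[(i, j)]) % M   # add masks toward higher ids
        elif j < i:
            y = (y - pair_mask[(j, i)]) % M   # subtract masks from lower ids
    return y

uploads = [masked_input(i) for i in range(n)]
# Each pairwise mask appears once with + and once with -, so the masks
# cancel in the sum and the server learns only the aggregate.
print(sum(uploads) % M)   # 22 == 7 + 11 + 4
```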
2.2. Some additional analyses on the existing privacy-preserving DDL schemes

As described above, the previous works [29] and [34] provide two efficient privacy-preserving DDL frameworks in which the input (local dataset) is protected against the cloud server. However, privacy issues still exist in these frameworks. Specifically, Phong et al. [29] found that the k-th component of the input data $x = (x_1, x_2, \ldots, x_u)$ can be recovered from the gradients of the k-th weight variable $W_k$ and the bias b in the first layer. Furthermore, in the strategy of [34], the gradient of the weight variable $W_k$ in the first layer can be considered as proportional to the input $x_k$. In this case, the adversary can utilize the gradients to easily produce a related input by scaling. Hence, sharing the gradients in plaintext is insecure in the DDL framework.

Fig. 2. The privacy issues in the DDL framework.

In the work [29], although the additively HE provides confidentiality of the gradients against the cloud server, the computational cost is dramatically increased. Furthermore, the scheme cannot protect the privacy of the local dataset against the other participants. For instance, suppose a participant A is curious about the local dataset of a participant B, and assume that A can eavesdrop on the communication channel between the cloud server and B. As shown in Fig. 2, the participants A and B hold the same key pair (pk, sk) for the additively HE scheme. The secret key sk is known to A as well, so A can decrypt the eavesdropped ciphertexts, recover the gradients of B, and thereby infer B's local training data.
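The first-layer leakage described above is easy to reproduce. The following NumPy sketch, assuming a hypothetical first layer with a sigmoid activation and a squared loss, shows how plaintext first-layer gradients reveal the input exactly: each column of dL/dW is the input scaled by the corresponding component of dL/db.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = 5, 3
x = rng.random((1, u))            # private input of a participant
W = rng.normal(size=(u, v))
b = np.zeros((1, v))
t = rng.random((1, v))            # arbitrary training target

z = x @ W + b
h = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
delta = (h - t) * h * (1 - h)     # dL/dz for L = 0.5 * ||h - t||^2

dW = x.T @ delta   # dL/dW, shape (u, v): dW[k, j] = x[k] * delta[j]
db = delta         # dL/db, shape (1, v): db[j] = delta[j]

# An adversary holding the plaintext gradients recovers x exactly
# (using any column j with a nonzero bias gradient db[0, j]):
x_recovered = dW[:, 0] / db[0, 0]
assert np.allclose(x_recovered, x.ravel())
```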