Privacy-preserving analytics

In this blog post, we explore the capabilities and possibilities of privacy-preserving analytics on sensitive data enabled by the KRAKEN marketplace platform. As we reported in a previous post, [Secure computation on sensitive data in KRAKEN](https://www.krakenh2020.eu/index.php/blog/secure-computation-sensitive-data-kraken), the project's ambition is to offer a secure and privacy-friendly marketplace platform that enables operating on sensitive data and evaluating various analytics on it.

Multi-party computation

Modern cryptography includes many protocols that can seem counterintuitive, and secure, privacy-preserving computation on data is one of them. While it seems impossible to perform analytics on data that is never revealed to you, Multi-Party Computation (MPC) allows exactly this: a decentralized system of computation services (MPC nodes) is deployed such that the sensitive data is not revealed to any single node, yet jointly the nodes can evaluate any desired function on the data. While this sounds incredibly powerful, a concerned reader might be doubtful and ask about the limitations, security assumptions and possible attacks on such a system. If the MPC nodes can compute any function on the data, could they not, for example, also reveal the data itself?

Luckily, such doubts were considered in the design of MPC protocols, and the systems have been analyzed with rigorous mathematical proofs, so the limitations of MPC are quite well understood.


The main security assumption in MPC is that not all of the MPC nodes are malicious (or not a majority of them, depending on the protocol). To expose or steal the data, or to fake a computation on it, all of the MPC nodes would need to collude. This means that as long as at least one of them (or a majority of them in the weaker security model) has no intention of violating the privacy of the data, the data is secure. In the KRAKEN marketplace platform, we will deploy multiple MPC nodes, each on a different partner's server. The users of the platform will be able to publish data for MPC computations, and even if one of the MPC node providers is hacked or would like to spy on the data, it is mathematically impossible for that provider to gain any information. The buyers will be able to request analytics on the provided data, assuming that they buy the desired computation. They can be sure that the result is correct (or detect an inconsistency), again even if, say, one of the nodes is compromised. Once malicious behavior is detected, an MPC node can be excluded.
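To make the "all nodes would need to collude" idea concrete, here is a minimal sketch of additive secret sharing among three nodes. It is an illustration only, not the protocol deployed in KRAKEN: the modulus, the function names `share` and `reconstruct`, and the example values are assumptions made for this sketch, and it offers no protection against nodes that deviate from the protocol.

```python
import secrets

# Hypothetical modulus for this sketch; real MPC frameworks choose their own field.
MODULUS = 2**61 - 1

def share(value, n_nodes=3):
    """Split value into n_nodes additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_nodes - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine the shares; this only works when every node contributes."""
    return sum(shares) % MODULUS

# Two data owners split their private values among the three nodes.
alice_shares = share(52_000)
bob_shares = share(61_000)

# Each node adds the two shares it holds, without ever seeing the inputs.
node_sums = [(a + b) % MODULUS for a, b in zip(alice_shares, bob_shares)]

total = reconstruct(node_sums)
print(total)  # 113000
```

Any two of the three shares are uniformly random numbers, so a subset of nodes learns nothing about the inputs; only the combination of all three reveals the sum.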

Implementation in KRAKEN and practical aspects

The KRAKEN platform aims to enable as broad a variety of privacy-preserving analytics as possible. Many MPC protocols and their implementations target a specific use case, possibly with a single function that can be evaluated. On the other hand, there has been major progress on schemes supporting general-purpose computation, in both theory and implementation. We base our MPC implementation on SCALE-MAMBA, a well-known and tested library. It allows implementing an almost arbitrary computation in a Python-like (or Rust-like) language, which is then compiled into instructions for the MPC nodes. We have implemented and plan to offer various analytics to choose from, ranging from basic statistics to more advanced ones, such as linear regression.


Importantly, evaluating secure analytics on private data is computationally more demanding than non-private computation, which can be a limiting factor. Nevertheless, a lot of research has gone into making the protocols as performant as possible. We tested evaluating various functions on private data to better understand the practicality of MPC. We observed that, using MPC schemes with honest-majority security assumptions and a group of 3 servers located in Europe and connected over a WAN, we can compute basic statistics (such as mean values and standard deviations) on datasets with thousands of entries in a matter of minutes. We even went as far as testing computations as demanding as machine learning and neural networks, and found that these are feasible only if the buyers are willing to wait days for the answers.
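The following sketch shows how a mean and a variance could be computed in a three-node, honest-majority setting using Shamir secret sharing. It is a simplified illustration, not the SCALE-MAMBA code used in KRAKEN: the field size, node identifiers, helper names and the small dataset are assumptions. It also opens the squared shares directly, whereas real protocols re-randomize and reduce the degree first, and it assumes all nodes follow the protocol.

```python
import secrets

PRIME = 2**61 - 1      # hypothetical field size for this sketch
NODE_IDS = (1, 2, 3)   # evaluation points assigned to the three nodes

def share(value, degree=1):
    """Shamir-share value: node i receives f(i) for a random f with f(0) = value."""
    coeffs = [value % PRIME] + [secrets.randbelow(PRIME) for _ in range(degree)]
    return [sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
            for x in NODE_IDS]

def reconstruct(shares):
    """Lagrange-interpolate f(0) from the three nodes' shares (degree <= 2)."""
    secret = 0
    for i, (xi, yi) in enumerate(zip(NODE_IDS, shares)):
        num, den = 1, 1
        for j, xj in enumerate(NODE_IDS):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

data = [12, 15, 11, 18, 14]
shared = [share(v) for v in data]  # shared[k][i] is node i's share of data[k]

# Local work on each node: add its shares, and add the squares of its shares.
# Squaring a degree-1 share yields a degree-2 share, which 3 nodes can still open.
sum_shares = [sum(col) % PRIME for col in zip(*shared)]
sq_shares = [sum(s * s for s in col) % PRIME for col in zip(*shared)]

n = len(data)
total = reconstruct(sum_shares)
total_sq = reconstruct(sq_shares)
mean = total / n
variance = total_sq / n - mean ** 2
print(mean, variance)  # 14.0 6.0
```

Only the two aggregates (the sum and the sum of squares) are opened; the individual entries stay hidden behind the shares. The network round trips needed for such openings, and for multiplications in more complex functions, are a large part of why WAN latency dominates the running time in practice.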

Authenticity of the data

The value of certain datasets can depend heavily on the ability of the sellers to guarantee their authenticity. For example, educational data is worthless if it is not signed by a university. But this presents a challenge to every system that wants to offer privacy-preserving analytics: how can MPC nodes verify the authenticity of the data if they each see only a partial view of it? We came up with two solutions: either the verification is also part of the MPC protocol (and hence treated as a privacy-preserving computation), with the signatures also shared among the MPC nodes, or we include another cryptographic tool, Zero-Knowledge Proofs (ZKP). We chose the latter, since recent progress in the area makes it possible to efficiently and non-interactively create and verify claims without revealing any private information. The ZKP paradigm has had a major impact on many blockchain technologies in recent years, and we were able to use it for the privacy-preserving dataset authenticity check.
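To give a feel for what a non-interactive zero-knowledge proof looks like, here is a textbook Schnorr proof of knowledge made non-interactive with the Fiat-Shamir transform. This is not the construction used in KRAKEN for dataset authenticity, which the post does not detail; it only illustrates the general idea of convincing a verifier of a claim (here, "I know the secret key behind this public key") without revealing the secret. The group parameters are deliberately tiny toy values and all names are assumptions for this sketch.

```python
import hashlib
import secrets

# Toy group for readability only: P = 2*Q + 1, and G generates the subgroup of order Q.
# Real deployments use groups of cryptographic size.
P, Q, G = 2039, 1019, 4

def challenge(*values):
    """Fiat-Shamir challenge: hash the public transcript into Z_Q."""
    digest = hashlib.sha256("|".join(map(str, values)).encode()).digest()
    return int.from_bytes(digest, "big") % Q

def prove(secret):
    """Prove knowledge of secret with public = G^secret mod P, without revealing it."""
    public = pow(G, secret, P)
    nonce = secrets.randbelow(Q)
    commitment = pow(G, nonce, P)
    c = challenge(G, public, commitment)
    response = (nonce + c * secret) % Q
    return public, (commitment, response)

def verify(public, proof):
    """Check G^response == commitment * public^c mod P, recomputing c from the transcript."""
    commitment, response = proof
    c = challenge(G, public, commitment)
    return pow(G, response, P) == commitment * pow(public, c, P) % P

secret_key = secrets.randbelow(Q - 1) + 1
public_key, proof = prove(secret_key)
print(verify(public_key, proof))
```

The verifier needs only the public key and the proof, and no interaction with the prover, which is the property that makes such proofs convenient to attach to a dataset.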



We were able to design, implement and test a practical privacy-preserving analytics system in KRAKEN. The main technologies we based our system on are secure multi-party computation for privacy preservation and (non-interactive) zero-knowledge proofs for authenticity claims. The users of the KRAKEN marketplace will be able to use it and compute on datasets that will remain private.