Enabling Privacy Mechanisms in Apache Storm


Analyzing data as it is streamed into a real-time computation system has gained traction and is particularly useful in use cases such as the dynamic optimization of telecom networks. Such analysis draws on large volumes of data, i.e., Big Data. This poses privacy risks, however: the data usually contains personal data, and not only an organization's core applications but also other applications want to access the Big Data pool. In such cases, the goal is to control access to the data and to be able to impose conditions on that access when the data is personal.

This paper achieves this goal by contributing a privacy policy framework that controls which data in a real-time computation system such as Apache Storm can be accessed, and under which conditions. It allows customers to specify privacy policies on how their private data is to be treated, and organizations to specify their policies according to law and their business needs. Additionally, it can check that programs submitted to the real-time computation system can access only those data streams that have been approved by the organization running the system.
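
The submission check described above can be illustrated with a minimal sketch. It is not the framework's actual implementation and does not use the Apache Storm API; the names StreamPolicy and check_submission, the stream and application names, and the "anonymize" condition are all hypothetical and chosen only for illustration. The sketch assumes that each stream carries a policy listing the applications approved by the organization, plus the conditions that apply when the stream contains personal data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamPolicy:
    """Hypothetical policy attached to one data stream (illustrative only)."""
    stream: str
    contains_personal_data: bool
    approved_apps: frozenset = field(default_factory=frozenset)
    conditions: tuple = ()  # conditions imposed when the stream holds personal data


def check_submission(app, requested_streams, policies):
    """Check a submitted program against the stream policies.

    Returns (allowed, conditions, denied): whether every requested stream is
    approved for `app`, the conditions attached to each approved stream, and
    the list of streams that were refused.
    """
    conditions = {}
    denied = []
    for stream in requested_streams:
        policy = policies.get(stream)
        # A stream without a policy, or not approved for this app, is refused.
        if policy is None or app not in policy.approved_apps:
            denied.append(stream)
            continue
        # Conditions only apply to streams that contain personal data.
        conditions[stream] = policy.conditions if policy.contains_personal_data else ()
    return (not denied, conditions, denied)


# Assumed example policies: one non-personal stream, one personal stream.
policies = {
    "cell-load": StreamPolicy("cell-load", False, frozenset({"optimizer", "billing"})),
    "subscriber-location": StreamPolicy(
        "subscriber-location", True, frozenset({"optimizer"}), ("anonymize",)
    ),
}

ok, conditions, denied = check_submission(
    "optimizer", ["cell-load", "subscriber-location"], policies
)
```

In this sketch a program is accepted only if every stream it requests is approved for it, which mirrors the requirement that submitted programs access only approved data streams; a real deployment would derive the requested streams from the submitted program itself rather than from a hand-written list.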