Large-Scale Anonymization at Telefónica Germany powered by Apache Flink
Together with Konstantin Knauf.
Today, Telefónica Germany’s Data Anonymization Platform (DAP) relies on Apache Flink to process billions of records in near real time every day, but it has not always been that way. After briefly introducing the platform and some of its use cases, we will share our experience consolidating DAP’s Big Data stack (Storm, MapReduce, Pig, Hive, …) in Apache Flink. About one year ago, we started a complete overhaul of the application, which consists of about 50 distributed components. The goals were to increase performance, simplify the design, and add features that had newly become possible. During the talk, we will try to shed some light on questions such as: Why did we go for Apache Flink? Which issues did we solve? Which problems did we have to work around? Which features are we using, and which are we waiting for?