In the past I handled some of my data workflows with cron, especially ETL jobs. At first it was easy and natural, but it didn't scale well and eventually caused trouble.
Imagine a huge WiFi network with hundreds of access points (APs) and thousands of users. The network is managed by a controller, which has full visibility of every event, such as a user connecting or disconnecting. In my case, the controller reports all events to a Hadoop Distributed File System (HDFS) cluster in raw format.
Continue reading “Service Provider WiFi analysis with Spark DataFrames”