Example of MongoDB’s Map Reduce

One of the projects I’m working on is a platform to store all the accounting information generated by a mobile network with ~140K users. Although the system has several collections, the one I need to focus on is named dayly, which contains ~57 million documents with the following simplified schema:

db.dayly.findOne()
{
  'ts': ISODate('2014-03-01T00:00:00.000Z'),
  'msisdn': '346xxxxxxxx',
  'imei' : '01355200xxxxxxxx',
  // ... many additional fields
}

My goal is to discover all the mobile devices (identified by the imei field) that have been used by each user (aka msisdn) in the last 14 months. For each user-device pair, I’d like to know:

  • The date of the first detection: since
  • The date of the last detection: last
  • The number of days on which this pair has been detected: count

Let’s get started!

Map

var map = function () {
  emit(
    // key
    this.msisdn,
    // value
    {
      devices: [
        {
          // remove the last 2 digits (sw version)
          imei: this.imei.substr(0, 14),
          since: this.ts,
          last: this.ts,
          count: 1
        }
      ]
    }
  );
};
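
To check what the map function emits without touching the database, it can be run in plain JavaScript against a single sample document. This is only a minimal sketch: runMap and the sample document are hypothetical and exist just for illustration, and emit is stubbed locally because MongoDB only provides it while a mapReduce job is running.

```javascript
// Same logic as the map function above; inside mapReduce,
// `this` is the document currently being processed.
var map = function () {
  emit(this.msisdn, {
    devices: [{
      imei: this.imei.substr(0, 14), // drop the last 2 digits (sw version)
      since: this.ts,
      last: this.ts,
      count: 1
    }]
  });
};

// Hypothetical helper: calls mapFn on one document and collects the emits.
function runMap(mapFn, doc) {
  var emitted = [];
  // MongoDB supplies emit() during mapReduce; here it is a local stub.
  globalThis.emit = function (key, value) {
    emitted.push({ key: key, value: value });
  };
  mapFn.call(doc);
  delete globalThis.emit;
  return emitted;
}

// Made-up sample document following the simplified schema.
var sample = {
  ts: new Date('2014-03-01T00:00:00.000Z'),
  msisdn: '34600000000',
  imei: '0135520000000012'
};

var out = runMap(map, sample);
console.log(JSON.stringify(out, null, 2));
```

Each input document produces exactly one emit, keyed by msisdn, with a one-element devices array whose imei is cut to 14 characters.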

Reduce

var reduce = function (msisdn, emitted) {
  // imei -> aggregated entry for the current msisdn
  var table = {};

  // loop over all the emitted values of the
  // current msisdn
  for (var i = 0; i < emitted.length; i++) {
    var devices = emitted[i].devices;

    // loop over all the devices
    for (var k = 0; k < devices.length; k++) {
      var imei = devices[k].imei;
      if (table.hasOwnProperty(imei)) {
        // -- Existing imei

        // Increment the count field
        table[imei].count += devices[k].count;

        // Update the since field if necessary
        if (devices[k].since < table[imei].since)
          table[imei].since = devices[k].since;

        // Update the last field if necessary
        if (devices[k].last > table[imei].last)
          table[imei].last = devices[k].last;
      } else {
        // -- New imei
        table[imei] = devices[k];
      }
    }
  }

  // Prepare the output
  var ret = { devices: [] };
  for (var key in table) {
    ret.devices.push(table[key]);
  }

  return ret;
};
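
MongoDB may call the reduce function several times for the same key, feeding the output of an earlier call back in as one of the values. That is why the reduced value keeps the same { devices: [...] } shape that map emits. The sketch below checks this property in plain JavaScript on made-up values. The val and normalize helpers and the sample data are hypothetical, and the reduce function is a compact restatement of the one above that copies entries instead of reusing the input objects.

```javascript
// Compact restatement of the reduce logic above.
var reduce = function (msisdn, emitted) {
  var table = {};
  emitted.forEach(function (value) {
    value.devices.forEach(function (d) {
      var cur = table[d.imei];
      if (!cur) {
        // Copy, so that the input values are never modified.
        table[d.imei] = { imei: d.imei, since: d.since, last: d.last, count: d.count };
        return;
      }
      cur.count += d.count;
      if (d.since < cur.since) cur.since = d.since;
      if (d.last > cur.last) cur.last = d.last;
    });
  });
  return { devices: Object.keys(table).map(function (k) { return table[k]; }) };
};

// Hypothetical helper: builds a value shaped like the ones map emits.
function val(imei, day) {
  var ts = new Date(day + 'T00:00:00Z');
  return { devices: [{ imei: imei, since: ts, last: ts, count: 1 }] };
}

var a = val('01355200000000', '2015-01-01');
var b = val('01355200000000', '2015-01-02');
var c = val('35205207000000', '2016-02-22');

// Reducing everything at once...
var oneShot = reduce('34600000000', [a, b, c]);
// ...should match reducing a partial result together with the rest.
var partial = reduce('34600000000', [a, b]);
var twoStep = reduce('34600000000', [c, partial]);

// Hypothetical helper: sort by imei before comparing, since order may differ.
function normalize(v) {
  return JSON.stringify(v.devices.slice().sort(function (x, y) {
    return x.imei < y.imei ? -1 : 1;
  }));
}

console.log(normalize(oneShot) === normalize(twoStep)); // true
```

If the two results differed, the function would not be safe to use as a reducer, because the final output would depend on how MongoDB happened to batch the values.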

Execution

The execution was pretty slow: it took almost 5 hours. The job ran on a single live system with a pretty outdated MongoDB version, so under these circumstances it was good enough 🙂

db.dayly.mapReduce(map,reduce,
  { out: 'device_history' })
  
{
  'result' : 'device_history',
  'timeMillis' : 17727591, // ~4.9 hours!!
  'counts' : {
      'input' : 57279244,
      'emit' : 36333446,
      'reduce' : 134762,
      'output' : 136277
  },
  'ok' : 1
}
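
The counters in that output are easier to read after a bit of arithmetic. The following is only a back-of-the-envelope sketch over the numbers shown above; the variable names are illustrative and not part of MongoDB.

```javascript
// Figures copied from the mapReduce output above.
var stats = {
  timeMillis: 17727591,
  counts: { input: 57279244, emit: 36333446, reduce: 134762, output: 136277 }
};

// Elapsed time in hours.
var hours = stats.timeMillis / (1000 * 60 * 60);

// Average number of input documents processed per second.
var docsPerSecond = stats.counts.input / (stats.timeMillis / 1000);

// Average number of emitted values per output document (one per msisdn).
var emitsPerUser = stats.counts.emit / stats.counts.output;

console.log(hours.toFixed(2));          // 4.92
console.log(Math.round(docsPerSecond)); // 3231
console.log(emitsPerUser.toFixed(1));   // 266.6
```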

Result example

db.device_history.findOne()
{
  '_id' : '346xxxxxxxx',
  'value' : {
      'devices' : [
          {
              'imei' : '01355200xxxxxx',
              'count' : 386,
              'since' : ISODate('2015-01-01T00:00:00Z'),
              'last' : ISODate('2016-02-21T00:00:00Z')
          },
          {
              'imei' : '35205207xxxxxx',
              'count' : 8,
              'since' : ISODate('2016-02-22T00:00:00Z'),
              'last' : ISODate('2016-02-29T00:00:00Z')
          }
      ]
  }
}
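
Since each document in device_history holds one array per user, it can be handy to flatten it into one row per user-device pair, for instance to find the device a user was last seen with. This is a minimal sketch in plain JavaScript over the sample document above; toRows and latestDevice are hypothetical helper names, and the masked values are kept as they are.

```javascript
// Sample document, copied from the findOne() output above.
var doc = {
  _id: '346xxxxxxxx',
  value: {
    devices: [
      { imei: '01355200xxxxxx', count: 386,
        since: new Date('2015-01-01T00:00:00Z'),
        last: new Date('2016-02-21T00:00:00Z') },
      { imei: '35205207xxxxxx', count: 8,
        since: new Date('2016-02-22T00:00:00Z'),
        last: new Date('2016-02-29T00:00:00Z') }
    ]
  }
};

// Hypothetical helper: one flat row per user-device pair.
function toRows(d) {
  return d.value.devices.map(function (dev) {
    return {
      msisdn: d._id,
      imei: dev.imei,
      since: dev.since,
      last: dev.last,
      count: dev.count
    };
  });
}

// Hypothetical helper: the device with the most recent `last` date.
function latestDevice(d) {
  return d.value.devices.reduce(function (best, dev) {
    return dev.last > best.last ? dev : best;
  });
}

console.log(toRows(doc).length);     // 2
console.log(latestDevice(doc).imei); // 35205207xxxxxx
```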