Knowledge is power – Littlefinger (Game of Trones)

If in this bussiness the handsets are the Knights, and the brands are the Lords, who is winning the battle? Despite the joke, the mobile handset market awareness is crucial for a mobile service provider in order to increase the sales, enhance the customer satisfaction and reduce the churn. The days when the customers only obtain their handsets from the provider are gone, now there are many other Lords on the battle (retail, amazon, ebay, etc). There is only one way to success: detect where the market is going, learn, and decide in consecuence.

The dataset

My aim is to keep this post focused on data analysis, so how I managed to get this dataset is out of the scope of this post, as it would fit on the big data field.  Please take in mind that I cannot share the actual results so this post will only contain anonymized data, but it will not affect my data analysis approach.

My source is a set of 19 monthly text files with the following structure:

YYYYMM|MSISDN     |IMEI           |NDAYS|BRAND  |MODEL
201501|34999888666|354379060000001|21   |APPLE  |IPHONE 5
201501|34999888777|854379050000001|30   |SAMSUNG|GALAXY S3 MINI
  • MSISDN is the phone number of the user.
  • IMEI is the unique number which identifies a mobile device.
  • NDAYS counts how many days the user has used the phone in the given month.

I’m going to use python for the analysis; specifically pandas and numpy for data manipulation and basic exploration and matplotlib for quick visualization.

My goals

Focusing on the most popular handset brands I’d like to achieve 2 goals:

  1. To discover the historical brand share evolution.
  2. To figure out the level of loyalty of each brand and relevant migrations between brands.

Both goals will assist in the selection of vendors on which invest more resources on, such as portfolio designing, marketing budget, logistics, etc.

Importing the dataset

# Iter over the last months and import the dataframes
# (imports and previous stuff are omitted)
dataframes = []
for dt in rrule.rrule(\
    rrule.MONTHLY,dtstart=start,until=end):
    filename = 'MONTHLY/%s.txt' % dt.strftime('%Y%m')
    temp_df = pd.read_csv(filename, delimiter='|')
    dataframes.append(temp_df)

# Create the unique dataframe
big_df = pd.concat(dataframes)

Brands share

Let’s start by getting the number of monthly handsets of the top 5 brands.

# get the top 5 brands
top_vendors = big_df.BRAND.value_counts().head(5).index

# get the brand share dataframe
brand_counts=\
    big_df[big_df.BRAND.isin(top_vendors)]\
    [['YYYYMM','BRAND','IMEI']]\
    .groupby(['YYYYMM','BRAND']).count()\
    .unstack()                                         

# get pretty column names by dropping the first level
brand_counts.columns = brand_counts.columns.droplevel()
brand_counts.head(2)
share1
Monthly amount of handsets of the top brands.

Now let’s get the relative dataframe by dividing each row by the monthly total:

# Divide each row by the monthly totals
brand_share = 100.0*brand_counts.div(\
            big_df.groupby('YYYYMM').IMEI.count(),\
            axis=0)

# Add the 'OTHERS' column to aggregate the share not covered
# by the top 5 brands
brand_share['OTHERS'] = 100 - brand_share.sum(axis=1)
brand_share.head(2)
share2
Monthly top brand share dataframe.

Finally I’ll plot a stacked bar plot to show the evolution:

plt.style.use('bmh')
brand_share.plot(kind='bar', stacked=True, figsize=[16,6])
plt.legend(loc='upper center', ncol=6)
plt.title(u'TOP#5 BRAND SHARE EVOLUTION')
plt.xlabel('Month')
plt.ylabel('% share')
share
Share evolution of TOP#5 brands.

Some insights:

  • BRAND#1 keeps holding a privileged position, but shows a negative trend by having lost a 4% of share.
  • BRAND#2 has have a sustained growth and has doubled it’s share.
  • BRAND#3 is clearly getting into trouble as it has lost almost the 50% of his share, and it continues loosing…
  • BRAND#4 and BRAND#5 have shown a soft growth and seem to have a stable position.

Brand loyalty

This time the analysis is more tricky so I will need to perform 4 steps (or intermediate dataframes):

  1. device_history: for each (MSISDN,IMEI) tuple stores the first and last months and the total amount of days.
  2. renewed: basically is the device_history filtered and sorted to only have records from users which have renewed once.
  3. brand_movements: aggregation of renewed to get one row for each renewal, and two columns: the source and the destination brands
  4. brand_mvnt_matrix: relative relationship within each source and destination brands.
device_history
device_history = big_df\
        .groupby(['MSISDN', 'IMEI']).agg({
            'YYYYMM': {
                'FIRST_SEEN': np.min,
                'LAST_SEEN': np.max},
            'NDAYS': {'NDAYS': np.sum},
            'BRAND': { 'BRAND': 'first' }
        })

# get pretty column names by dropping the first level
device_history.columns = device_history.columns.droplevel()

# convert the multi level index into columns for further analysis
device_history.reset_index(inplace=True)

device_history.head(3)
history
Device history dataframe.
renewed

The next thing I’d like to do is to focus on users which have renewed their handset once, so they appear twice in the history dataframe. To achieve this, I’m going to add a new column with the count of handsets used by a user, filter by it and then sort the results.

phones_by_msisdn = device_history\
    .groupby('MSISDN')\
    .IMEI\
    .agg({'NHANDSETS': 'count'})\
    .reset_index()

# add the new column to the history
device_history = \
    pd.merge(device_history,phones_by_msisdn,on='MSISDN')

# filter users wich have renewed once
# and sort by MSISDN and FIRST_SEEN
renewed = device_history\
    [device_history.NHANDSETS == 2]\
    .sort_values(['MSISDN','FIRST_SEEN'])
renewed.head(2)
history2
Device history dataframe of users which have renewed once.
brand_movements
brand_movements = renewed.groupby(['MSISDN']).agg({
      'BRAND': {
            'MOVEMENT': lambda s: s.str.cat(sep='=>'),
            'SRC': lambda s: s.iloc[0],
            'DST': lambda s: s.iloc[1]
        }
    })
brand_movements.columns = brand_movements.columns.droplevel()
brand_movements.head(3)
movements
Handset renewal dataframe.
brand_mvnt_matrix
brand_mvnt_matrix = phone_movements.groupby('SRC').DST\
   .value_counts().to_frame().unstack()
brand_mvnt_matrix.columns = \
    brand_mvnt_matrix.columns.droplevel() 

# convert to relative
brand_mvnt_matrix = 100*brand_mvnt_matrix.div(\
    brand_mvnt_matrix.sum(axis=1),\
    axis=0)

# show the top brands sub-matrix
brand_mvnt_matrix.loc[top_vendors, top_vendors]
MATRIX
Brand movements matrix.

The final matrix shows the migration between brands, and if we focus on the diagonal, the level of loyalty of each brand.

loyalty
Level of loyalty  of the top 5 brands.

Conclusion

Despite the simplicity of the analysis, it has shown high value information about the handset brands. Some data driven decisions may be:

  • To bet on BRAND#2 – is showing a stable growth sustained by sales and the customer satisfaction with the 2nd best loyalty ratio.
  • To do not put any effort on BRAND#3: it has only kept the 8% of its users and has lost the half of his share.

With a pretty similar aproach it would be easy to discover for each origin smartphone which are the most popular renewed handsets. This information could be used as a new input for a handset recommender system designed to increase the ratio of users renewing their handsets with my company.

 

 

 

 

Advertisements

One thought on “Smartphone brand loyalty and handset renewal analysis

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s