Starbucks Gives Away Free Money!!!

Eric Kleen
10 min read · Mar 22, 2021

Files and code for this project can be viewed on my GitHub repository.

PROJECT DEFINITION

Project Overview

The final requirement for the Udacity Data Science nanodegree is selecting and completing a capstone project which demonstrates the application of skills learned throughout the course. For my project I have chosen to analyze the Starbucks data relating to various promotional offers. The data contains information about the offers, the customers, and the interactions between the two.

As a Starbucks customer, I was interested in understanding how they determine the best rewards to offer and how that process might be improved. So when I initially explored the data, I was surprised to find that Starbucks allows customers to receive award dollars for promotions they were not even aware of. Since the purpose of a promotion is to incentivize new business, these occurrences simply result in Starbucks giving away free money.

My natural inclination for process improvement took over and I decided to focus this project on attempting to predict whether or not a specific offer proposed to a particular customer type would result in “free money” and therefore help Starbucks to improve their marketing campaigns.

Problem Statement

Using the portfolio, profile, and transcript datasets provided, develop a machine learning classifier model which will predict the likelihood that an offer will result in award dollars paid which did not incentivize new business.

Metrics

Because the goal of this project is to predict the outcome of a specific offer-customer interaction, it’s logical to use model accuracy as the metric by which to measure model performance. For this project, model accuracy will be calculated using the ‘accuracy_score’ function of the sklearn.metrics module.
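As a quick illustration of the metric, here is a minimal sketch using made-up labels (both arrays are purely hypothetical):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions, for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Accuracy is simply the fraction of predictions that match the true labels
print(accuracy_score(y_true, y_pred))  # 0.8 (4 of 5 correct)
```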

Data Exploration & Visualization

The data for this project was received in 3 separate .json files. The portfolio file contains data relating to the various offers that Starbucks is currently utilizing, the profile data describes individual customers, and the transcript file contains transactional information and offer events.

Portfolio Data

The Portfolio dataset contains 6 columns of string, integer, and list data.

  • id (string) —unique offer id
  • offer_type (string) — type of offer (e.g. BOGO, discount, informational)
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings) — media method(s) (e.g. web, email, mobile, social)

Starbucks is currently offering 10 unique promotions to its customers: 4 BOGO, 4 discount, and 2 informational.

The difficulty value describes the minimum amount that the customer is required to spend in order to successfully complete the offer. Difficulty is a discrete value of $0 for informational offers or $5, $7, $10, or $20 for BOGO and discount offers.

Rewards are discrete values of $0 for informational offers or $2, $3, $5, or $10 for BOGO and discount offers.

Duration describes how many days a customer has to complete the offer from the day it was received. Most offers have a duration of 7 days, but some have discrete values of 3, 4, 5, or 10 days.

The Channels column describes which type of media will be used to deliver the promotional offer. A single offer may use one to four channels.

Profile Data

The Profile dataset contains 5 columns of string and numeric data.

  • id (string) — customer id
  • gender (string) — gender of the customer (‘M’, ‘F’, or ‘O’ for other)
  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • income (float) — customer’s income

There are 17,000 unique customers contained in the Profile dataset. Each row contains a unique id and values for age and membership date. However, over 2,000 records are missing gender and income information.

Sample of Profile data

Male customers outnumber female customers by about 30%.

The age column contains a default value of 118 in over 2,000 rows, which does not represent real customer data. After removing these rows, customer age ranges from 18 to 101 with an average age of 54.

The became_member_on column includes the date that the customer created an online account. The chart below shows how this has developed from July 2013 to July 2018.

Membership Dates During 7/29/13–7/26/18

Starbucks customers’ incomes range from $30,000 to $120,000 with an average value of $65,000.

Transcript Data

The Transcript dataset contains 4 columns of string, numeric, and dictionary data.

  • person (str) — customer id
  • event (str) — description (transaction, offer received, offer viewed, etc.)
  • value (dict of strings) — either an offer id or a transaction amount, depending on the record
  • time (int) — time in hours since start of test. The data begins at time t=0

There are over 300K total rows in this dataset. Approximately half of the rows relate to transactional processes and the other half indicate the status of a specific offer/customer combination.

Sample of Transcript Data

The ‘value’ column contains key-value pairs of either transaction amount or offer id/reward value.

Sample of Transaction Data
Sample of Offer Status Data

Methodology

Data Preprocessing

Cleaning the Portfolio dataset consisted of renaming the ‘id’ column to ‘offer_id’ and parsing the ‘offer_type’ and ‘channels’ columns into individual columns for each value with a binary indicator (1=Yes, 0=No). This resulted in a new dataset with 11 columns instead of 6.
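A minimal sketch of this step using pandas (the file path is an assumption; see the repository for the actual code):

```python
import pandas as pd

# Load the raw portfolio data (path assumed)
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)

portfolio = portfolio.rename(columns={'id': 'offer_id'})

# One binary column per offer type (bogo, discount, informational)
portfolio = pd.concat([portfolio, pd.get_dummies(portfolio['offer_type'])], axis=1)

# One binary column per delivery channel
for channel in ['web', 'email', 'mobile', 'social']:
    portfolio[channel] = portfolio['channels'].apply(lambda c: int(channel in c))

# Drop the original columns, leaving 11 in total
portfolio = portfolio.drop(columns=['offer_type', 'channels'])
```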

To clean the Profile dataset, I removed 2,175 rows with incomplete customer information (missing ‘gender’ and ‘income’ plus the default ‘age’ value). These rows cannot be used to describe a customer and therefore are not useful for this analysis. Then the ‘id’ column was renamed to ‘person_id’ to allow for joining with other datasets. Next, the ‘gender’ column was parsed into separate columns for each value, and the ‘age’, ‘became_member_on’, and ‘income’ columns were used to create separate group columns with a binary indicator (1=Yes, 0=No). For age, customers were grouped in 25-year intervals, and for membership time, customers were grouped by the year they became a member.
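A sketch of the profile cleaning, again assuming pandas; the exact bin edges and column names here are my illustration, not necessarily those in the repository:

```python
import pandas as pd

profile = pd.read_json('data/profile.json', orient='records', lines=True)  # path assumed

# Drop rows with no demographic data (these also carry the default age of 118)
profile = profile.dropna(subset=['gender', 'income'])
profile = profile.rename(columns={'id': 'person_id'})

# One binary column per gender value ('M', 'F', 'O')
profile = pd.concat([profile, pd.get_dummies(profile['gender'])], axis=1)

# Group age in 25-year intervals and membership date by year, then encode;
# income groups (not shown) follow the same pattern
age_group = pd.cut(profile['age'], bins=[0, 25, 50, 75, 100, 125])
member_year = profile['became_member_on'] // 10000  # e.g. 20170212 -> 2017
profile = pd.concat([profile,
                     pd.get_dummies(age_group, prefix='age'),
                     pd.get_dummies(member_year, prefix='member')], axis=1)
```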

To clean the Transcript dataset, I first had to parse the key-value pairs in the ‘value’ column. While inspecting the data in this column, I noticed that some rows contained a key “offer id” and some contained “offer_id”, so I renamed all of these keys to “offer_id” to normalize the data and make it compatible with the Portfolio dataset for joining. Once this was done, all key-value pairs were parsed and new columns created using the key as the column name. Finally, I renamed the ‘person’ column to ‘person_id’ to make this field compatible with the Profile dataset for joining.
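One way to implement this, assuming pandas (path and column names as above):

```python
import pandas as pd

transcript = pd.read_json('data/transcript.json', orient='records', lines=True)  # path assumed

# Normalize the inconsistent "offer id" key to "offer_id"
transcript['value'] = transcript['value'].apply(
    lambda d: {k.replace('offer id', 'offer_id'): v for k, v in d.items()})

# Expand each key-value pair into its own column (offer_id, amount, reward)
parsed = pd.json_normalize(transcript['value'].tolist())
transcript = pd.concat([transcript.drop(columns='value'), parsed], axis=1)

transcript = transcript.rename(columns={'person': 'person_id'})
```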

The final preprocessing step involved using the 3 datasets above to form the Offers dataset needed for the machine learning model. First, the Transcript dataset was divided into 2 datasets, Transactions and Offers, using the values in the ‘event’ column. Next, the Offers dataset needed to be flattened to one row per unique offer sequence (person + offer + received time). I started by creating 3 datasets from the Offers dataset (Received, Viewed, and Completed) based on the value in the ‘event’ column. Next I joined the Received and Viewed datasets, assuming that a viewed offer was associated with the offer received nearest to it in time (an offer must be received before it can be viewed). From this I got the view time of the offer and added it to the Received dataset.

I then performed a similar process using Received offers and Completed offers to get the completed time of each unique offer and the reward value for the offer and added this to the Received dataset.
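The actual matching code is in the repository; as an illustration, the nearest-preceding-receipt join can be expressed with pandas merge_asof (column names follow the sketches above):

```python
import pandas as pd

# Split the offer events (transcript parsed as above)
received = transcript[transcript['event'] == 'offer received'].sort_values('time')
viewed = (transcript[transcript['event'] == 'offer viewed']
          .rename(columns={'time': 'view_time'})
          .sort_values('view_time'))

# For each view, find the latest matching receipt at or before the view time
matched = pd.merge_asof(viewed, received,
                        left_on='view_time', right_on='time',
                        by=['person_id', 'offer_id'],
                        direction='backward')
```

The same pattern, run against the Completed events, yields the completion time and reward for each received offer.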

Next, I added 2 more columns to indicate whether or not the offer had been viewed and whether or not it had been completed.

Finally, I added a column indicating whether or not a completed offer had been viewed prior to being completed. The column is called ‘invalid_reward’ and 1 indicates that the reward was paid to the customer even though the customer had never viewed the offer. A 0 indicates that either the reward paid was valid or the offer was not completed.
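A sketch of the flag logic; the ‘view_time’ and ‘complete_time’ column names are my own, chosen to match the steps above:

```python
import numpy as np

# offers has one row per (person, offer, received time); event-time columns
# are NaN when that event never occurred
offers['viewed'] = offers['view_time'].notna().astype(int)
offers['completed'] = offers['complete_time'].notna().astype(int)

# A reward is "invalid" when the offer was completed but never viewed first
offers['invalid_reward'] = np.where(
    (offers['completed'] == 1)
    & (offers['view_time'].isna() | (offers['view_time'] > offers['complete_time'])),
    1, 0)
```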

Summarizing the final Offers dataset reveals that 30% of all reward dollars were paid to customers who were not even aware that there was a promotional offer.
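For reference, that share can be computed directly from the flags above (‘reward’ is assumed to hold the dollars paid on completed offers):

```python
# Fraction of reward dollars paid on offers the customer never viewed
paid = offers.loc[offers['completed'] == 1, 'reward'].sum()
invalid = offers.loc[offers['invalid_reward'] == 1, 'reward'].sum()
print(invalid / paid)  # about 0.30, per the summary above
```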

Implementation

With the data preprocessing completed, the next step was to develop a machine learning model. For this project I chose scikit-learn’s Gradient Boosting Classifier (GBC) because I have had good luck with this algorithm on previous projects and I am familiar with some of its tuning parameters.

Using the Offers dataset, I created the Features (X) and Results (y) datasets and split both into Training and Test sets using the default value of 75% Training, 25% Test. Next I fit the GBC object using the Training sets and compared the predicted results to the actual results using the “accuracy_score” method. The GBC object, with default settings, achieved an accuracy of 85%.
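In code, this baseline fit looks roughly like the following (feature selection details are in the repository; the random seed is an illustrative choice):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# X holds the offer/customer features, y the invalid_reward labels;
# train_test_split defaults to a 75%/25% split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbc = GradientBoostingClassifier()  # default settings
gbc.fit(X_train, y_train)

print(accuracy_score(y_test, gbc.predict(X_test)))  # about 0.85, per the result above
```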

Model Evaluation and Validation

In an attempt to improve upon the results of the GBC with default settings, I used GridSearchCV to perform cross-validation over multiple values of various parameters. I chose ‘learning_rate’ (default=0.1), ‘max_depth’ (default=3), and ‘max_features’ (default=None), as I believe these to be significant to the performance of the model. For each parameter, I created a list of values to iterate through to try to improve model performance.
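A sketch of the search (the candidate lists shown are examples, not the exact grids I searched):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Example candidate values for each tuned parameter
param_grid = {
    'learning_rate': [0.05, 0.1, 0.25, 0.5],
    'max_depth': [2, 3, 4],
    'max_features': [0.25, 0.5, 0.75, None],
}

grid = GridSearchCV(GradientBoostingClassifier(), param_grid,
                    cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_)
```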

From the grid search I learned that the optimal value for ‘learning_rate’ is closer to 0.5 and the optimal value for ‘max_depth’ is 2. However, because the optimal value for ‘max_features’ sat at the bottom of the range I provided, it was not evident whether this was truly the optimal value, so I adjusted my range and re-ran the search.

This time, the optimal value for ‘max_features’ was slightly higher than 0.25, safely inside the range I provided, so I am confident it is a genuine optimum. Finally, using these optimized parameters, I re-fit the GBC object and again calculated the accuracy score. This time the model achieved an accuracy of 86%, a slight improvement over the default settings.

Justification

Additional testing is needed to determine whether GBC is the best algorithm to use and to further optimize the parameters. However, with an 86% accuracy achieved rather quickly, I believe Starbucks would be interested in a model like this. Using this model, the Marketing Dept can better target offers, steering them away from customers who are likely to collect rewards without ever seeing the offer. Because the purpose of a promotion is to generate new business and therefore revenue, it makes sense to minimize the “free money” spent on customers who aren’t even aware of the offer’s existence.

Conclusion

Reflection

This was a very interesting project that involved quite a few aspects of the Data Science course. The emphasis on data preparation was intense; however, according to most data scientists, that is consistent with the “real world”. The datasets were simple enough for a short-term project, yet still required a decent amount of manipulation.

I found the most challenging aspect to be handling the unordered nature of the offer events. Multiple offers with the same id could be sent to the same customer, so it became difficult to sort out which offer had been viewed or completed. If I worked at Starbucks, I would want to uniquely identify each offer instance with some sort of tag to ensure that the view/complete times align with the correct offer.

Improvement

I enjoyed building the ML model, but ran out of time to really experiment with other models due to the length of time spent on preprocessing.
