
The case study scenario:
In this fictitious scenario, I’m a junior data analyst at Bellabeat, a leading wellness technology company, tasked with uncovering valuable insights from smart device usage data. The goal is to identify trends and opportunities to enhance Bellabeat’s marketing strategy and product offerings.
Key Questions:
1. What are the current trends in smart device usage?
2. How can these trends be applied to Bellabeat’s target audience?
3. What specific marketing strategies can leverage these insights to drive growth?
Data Source: I’ll work with a Fitbit Fitness Tracker dataset from Kaggle, which provides detailed information on users’ daily activity, steps, and heart rate.
Deliverables:
· Executive Summary: A concise overview of the analysis, key findings, and recommendations.
· Data Exploration: A detailed exploration of the Fitbit dataset, including data cleaning, transformation, and visualization.
· Trend Identification: Identification of key trends in smart device usage, such as activity patterns, sleep habits, and heart rate variability.
· Marketing Strategy Recommendations: Specific recommendations for Bellabeat’s marketing team to capitalize on these trends, including targeted campaigns, product enhancements, and partnerships.
Data Analysis Process for the Bellabeat Case Study
1. Ask:
· Define the Business Problem: Clearly articulate the business problem and the specific insights needed. In this case, the problem is understanding how consumers use smart devices and how Bellabeat can leverage this knowledge to improve its marketing strategy.
· Formulate Research Questions: Develop specific questions to guide the analysis, such as:
o What are the key trends in smart device usage?
o How do Bellabeat users interact with their devices compared to general smart device users?
o What are the opportunities for Bellabeat to enhance its product offerings and marketing strategies based on these insights?
2. Prepare:
· Data Acquisition: Gather the necessary data, which in this case includes the Fitbit dataset and potentially additional data sources like Bellabeat’s internal user data.
· Data Cleaning: Clean the data to remove inconsistencies, errors, and missing values.
· Data Integration: Combine different data sources into a unified dataset, if necessary.
3. Process:
· Data Transformation: Transform the data into a suitable format for analysis, such as creating new variables or aggregating data.
· Feature Engineering: Create new features relevant to the analysis, such as calculating average daily steps or sleep duration.
4. Analyze:
· Descriptive Analysis: Summarize the data using measures like mean, median, mode, and standard deviation.
· Inferential Analysis: Use statistical techniques to draw conclusions about the population based on the sample data, such as hypothesis testing and confidence intervals.
· Predictive Analysis: Build models to predict future trends or user behavior.
5. Share:
· Data Visualization: Create visualizations (charts, graphs, dashboards) to communicate the findings effectively.
· Report Writing: Prepare a clear and concise report summarizing the analysis, key findings, and recommendations.
· Presentation: Present the findings to the Bellabeat executive team, using a combination of visuals and storytelling techniques.
6. Act:
· Decision Making: Use the insights from the analysis to inform business decisions, such as product development, marketing campaigns, and customer engagement strategies.
· Implementation: Work with the relevant teams to implement the recommended actions.
· Monitoring and Evaluation: Continuously monitor the impact of the implemented strategies and make adjustments as needed.
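To make the Analyze step above a little more concrete, here is a minimal base R sketch of a descriptive summary and a confidence interval. The `steps` vector is invented for illustration and is not taken from the Fitbit dataset.

```r
# Hypothetical daily step counts, for illustration only (not Fitbit data).
steps <- c(4200, 7800, 10250, 6100, 9300, 12040, 5500, 8700)

# Descriptive analysis: centre and spread of the sample.
steps_mean   <- mean(steps)
steps_median <- median(steps)
steps_sd     <- sd(steps)

# Inferential analysis: 95% confidence interval for the mean,
# taken from a one-sample t-test.
steps_ci <- t.test(steps, conf.level = 0.95)$conf.int

print(c(mean = steps_mean, median = steps_median, sd = steps_sd))
print(steps_ci)
```

With real data, the same calls would be applied to a column such as `daily_activity$TotalSteps`.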
Ask:
· Business Problem:
o How can Bellabeat leverage data-driven insights to improve customer engagement and satisfaction?
o How can Bellabeat identify opportunities for product innovation and expansion?
o How can Bellabeat optimize its marketing strategies to reach its target audience more effectively?
· Research Questions:
o What are the key factors influencing user behavior and device usage patterns?
o How can Bellabeat identify and segment its user base based on their preferences and needs?
o What are the opportunities to create personalized experiences and targeted marketing campaigns?
o How can Bellabeat measure the impact of its marketing initiatives and product features?
Deliverable:
· A clear and concise statement of the business problem and research questions.
· A documented plan for data acquisition, cleaning, and preparation.
· A detailed analysis plan outlining the specific techniques and tools to be used.
Prepare:
● Where is your data stored?
I’ll use the public data below, which explores smart device users’ daily habits:
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
The Fitbit Data Tracker is stored in this Bellabeat_Case_Study folder.
● How is the data organized? Is it in long or wide format?
The Fitbit Data 4.12.16 folder has 18 CSV files, each in either long or wide format.
Since most of the daily values are already in dailyActivity_merged, I’ll use only three of the files: dailyActivity_merged, sleepDay_merged, and heartrate_seconds_merged.
● Are there issues with bias or credibility in this data?
Does your data ROCCC?
Here are some possible issues:
· Limited Sample Size: The dataset includes only about thirty Fitbit users, which may not reflect the broader population Bellabeat is targeting.
· Selection Bias: The users who chose to share their Fitbit data could be a specific demographic or have different fitness habits than Bellabeat’s target audience.
· Lack of Context: The data might not include information like user demographics, lifestyle habits, or motivations, which could be valuable for understanding user behavior.
● How are you addressing licensing, privacy, security, and accessibility?
The datasets are in the public domain.
Process:
● What tools are you choosing and why?
I’m using R for all data processing and analysis tasks. R is a powerful statistical programming language with a wide range of packages specifically designed for data manipulation, cleaning, and visualization. Packages like tidyverse (including dplyr, tidyr, and ggplot2) and caret are particularly useful for data cleaning, transformation, and modeling.
First, I need to install and load the tidyverse, a collection of packages (including dplyr and tidyr) for data manipulation and visualization.
# Install and load the tidyverse
install.packages("tidyverse")
library(tidyverse)
● Have you ensured your data’s integrity?
To ensure data integrity, I’ve implemented the following steps:
1. Data Import and Inspection:
1. Used read.csv() to import the data into R.
2. Employed summary(), str(), and head() functions to get an overview of the data.
I’ll load my three CSV files into R:
# Create a dataframe named 'daily_activity' and read in one
# of the CSV files from the dataset.
daily_activity <- read.csv("dailyActivity_merged.csv")
# Create another dataframe for the sleep data.
sleep_day <- read.csv("sleepDay_merged.csv")
# Create another dataframe for the heart rate data.
heartrate_seconds <- read.csv("heartrate_seconds_merged.csv")
2. Data Cleaning:
1. Missing Value Handling: Used functions like is.na() and complete.cases() to identify and handle missing values (e.g., imputation or removal).
2. Outlier Detection: Utilized box plots, histograms, and statistical methods (e.g., Z-scores) to identify outliers.
3. Data Formatting: Ensured consistent data types and formats using functions like as.numeric(), as.factor(), and as.Date().
3. Data Consistency Checks:
1. Verified data consistency across different variables and time periods.
2. Used logical comparisons and conditional statements to identify inconsistencies.
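The missing-value and outlier checks listed above could look something like the following base R sketch. The `activity` data frame is a toy stand-in with invented values, and the 1.5 × IQR rule is just one common choice for flagging outliers, not necessarily the rule used in this analysis.

```r
# Toy stand-in for daily_activity; all values are invented for illustration.
activity <- data.frame(
  Id         = c(1, 1, 2, 2, 3),
  TotalSteps = c(8000, NA, 12000, 500, 45000),
  Calories   = c(2100, 1900, 2600, 1500, 4000)
)

# Missing values: count NAs per column and list the incomplete rows.
na_per_column   <- colSums(is.na(activity))
incomplete_rows <- which(!complete.cases(activity))

# Outliers: flag values more than 1.5 * IQR beyond the quartiles
# (the same rule a box plot uses for its whiskers).
q     <- quantile(activity$TotalSteps, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * (q[2] - q[1])
is_outlier <- !is.na(activity$TotalSteps) &
  (activity$TotalSteps < q[1] - fence | activity$TotalSteps > q[2] + fence)

print(na_per_column)
print(incomplete_rows)
print(activity[is_outlier, ])
```

Whether a flagged row is an error or a genuinely active day is a judgment call that depends on context.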
● What steps have you taken to ensure that your data is clean?
· Removing Duplicates: Used distinct() to eliminate duplicate rows.
· Handling Missing Values: Imputed missing values using methods like mean, median, or mode imputation, or removed rows with excessive missing data.
· Outlier Treatment: Replaced outliers with more reasonable values or removed them, depending on the context.
· Data Normalization: Scaled numerical variables to a common range using techniques like min-max scaling or standardization.
· Data Transformation: Transformed variables as needed (e.g., log transformation for skewed data).
· Verify Data Types: Ensure data types are correct (e.g., numerical for numerical data, categorical for categorical).
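As a rough illustration of the cleaning steps above, the sketch below removes duplicates, imputes a missing value with the median, and applies min-max scaling using dplyr. The `sleep` tibble and the `bed_scaled` column are hypothetical and exist only for this example.

```r
library(dplyr)

# Toy stand-in for sleep_day; all values are invented for illustration.
sleep <- tibble(
  Id                 = c(1, 1, 2, 3),
  TotalMinutesAsleep = c(420, 420, NA, 300),
  TotalTimeInBed     = c(450, 450, 480, 360)
)

sleep_clean <- sleep %>%
  distinct() %>%  # drop exact duplicate rows
  mutate(
    # Median imputation for missing sleep minutes.
    TotalMinutesAsleep = if_else(
      is.na(TotalMinutesAsleep),
      median(TotalMinutesAsleep, na.rm = TRUE),
      TotalMinutesAsleep
    ),
    # Min-max scaling of time in bed to the range [0, 1].
    bed_scaled = (TotalTimeInBed - min(TotalTimeInBed)) /
      (max(TotalTimeInBed) - min(TotalTimeInBed))
  )

print(sleep_clean)
```

Imputation changes the data, so it is worth recording how many values were filled in and by which method.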
● How can you verify that your data is clean and ready to analyze?
· Data Visualization: Created various visualizations (histograms, box plots, scatter plots) to identify patterns, anomalies, and potential issues.
· Statistical Tests: Used statistical tests (e.g., normality tests, correlation analysis) to assess data distribution and relationships.
· Domain Knowledge: Leveraged my understanding of the domain to validate the data and identify inconsistencies.
● Have you documented your cleaning process so you can review and share those results?
· Code Comments: Added clear and concise comments within the R script to explain each step.
· Data Dictionary: Created a data dictionary to document variable names, data types, and descriptions.
· Version Control: Used Git to track changes and collaborate with others.
· Data Visualization: Created visualizations to illustrate the data cleaning process and its impact on the data.
By following these steps and leveraging R’s powerful data analysis capabilities, I can ensure that the data is clean, reliable, and ready for analysis.
Analyze:
● How should you organize your data to perform analysis on it?
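To align my datasets, I’m aggregating the per-second readings in heartrate_seconds into one summary row per user per day, in line with the daily ‘daily_activity’ and ‘sleep_day’ datasets:
library(lubridate)  # provides mdy_hms() and as_date()
heartrate_hourly <- heartrate_seconds %>%
  # Time is read in as text such as "4/12/2016 7:21:17 AM"; parse it first
  mutate(Time = mdy_hms(Time)) %>%
  group_by(Id, date = as_date(Time)) %>%
  summarize(
    average_hr = mean(Value),
    min_hr = min(Value),
    max_hr = max(Value),
    .groups = "drop"
  )
The aggregated result is stored as heartrate_hourly; despite the name, each row is a daily summary for one user.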
# Now let's take a look at the data. The head() function displays the first few rows of a dataset. By default, it shows the first 6 rows.
head(daily_activity)
head(sleep_day)
head(heartrate_hourly)
All three datasets can be matched on the user “Id” and a date column (ActivityDate, SleepDay, and date, respectively). The daily_activity dataset has very useful columns that track daily steps, levels of activity, sedentary minutes, and daily calories burned. The sleep_day dataset has daily total minutes asleep and total time in bed. The heartrate_hourly dataset has the average, minimum, and maximum heart rate in beats per minute (BPM) for each user and day.
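Because each dataset identifies a user and a day, they can be combined for cross-metric analysis. The following is a minimal sketch of such a join, assuming the date columns have already been parsed to Date. The toy tibbles reuse the ActivityDate and SleepDay column names from this write-up, but their values are invented.

```r
library(dplyr)

# Toy stand-ins for daily_activity and sleep_day; values are invented.
activity <- tibble(
  Id           = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps   = c(8000, 9500, 12000)
)
sleep <- tibble(
  Id                 = c(1, 2),
  SleepDay           = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(420, 380)
)

# A left join keeps every activity day; days without a sleep record get NA.
activity_sleep <- activity %>%
  left_join(sleep, by = c("Id" = "Id", "ActivityDate" = "SleepDay"))

print(activity_sleep)
```

A left join is used here so that users who did not wear the device at night are not silently dropped from the activity analysis.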
# How many observations (rows) are there in each dataframe?
nrow(daily_activity)
940 rows
nrow(sleep_day)
413 rows
nrow(heartrate_hourly)
334 rows
# What are some summary statistics I'd want to know about each data frame to use for my analysis?
# For the daily activity dataframe:
daily_activity %>%
select(TotalSteps, TotalDistance, SedentaryMinutes, Calories) %>%
summary()
# For the sleep day dataframe:
sleep_day %>%
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
# For the heartrate hourly dataframe:
heartrate_hourly %>%
select(average_hr, min_hr, max_hr) %>%
summary()
● Has your data been properly formatted?
I’ll use str() to confirm data types for each dataset:
str(daily_activity)
str(heartrate_hourly)
str(sleep_day)
· It looks like the dates in daily_activity and sleep_day are stored as characters, which should be converted to a Date or POSIXct format for analysis.
· For heartrate_hourly, the date column should be a Date (not character) so that time-based calculations and joins with the other datasets work.
Convert Dates:
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format = "%m/%d/%Y")
sleep_day$SleepDay <- as.POSIXct(sleep_day$SleepDay, format = "%m/%d/%Y %I:%M:%S %p")
# heartrate_hourly has no Time column after aggregation; its per-day column is date
heartrate_hourly$date <- as.Date(heartrate_hourly$date)
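Before converting whole columns, it can help to test the format strings on a couple of sample values. This is a small base R sketch. The sample strings follow the formats used above, and `tz = "UTC"` is an assumption added here so the result does not depend on the local time zone.

```r
# Sample strings in the two date formats used by the Fitbit files.
activity_date_raw <- "4/12/2016"
sleep_day_raw     <- "4/12/2016 12:00:00 AM"

activity_date <- as.Date(activity_date_raw, format = "%m/%d/%Y")

# %I is the 12-hour clock and %p is the AM/PM marker (locale-dependent).
sleep_day <- as.POSIXct(sleep_day_raw,
                        format = "%m/%d/%Y %I:%M:%S %p",
                        tz = "UTC")

print(class(activity_date))
print(format(sleep_day, "%Y-%m-%d %H:%M"))

# A failed parse returns NA instead of raising an error, so check explicitly.
print(anyNA(c(activity_date, as.Date(sleep_day))))
```

Running the same `anyNA()` check on the converted columns is a quick way to catch rows that did not match the expected format.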
Share:
Once you have completed your analysis, create your data visualizations. The visualizations should clearly communicate your high-level insights and recommendations.
Guiding questions
● Were you able to answer the business questions?
● What story does your data tell?
● How do your findings relate to your original question?
● Who is your audience? What is the best way to communicate with them?
● Can data visualization help you share your findings?
● Is your presentation accessible to your audience?
Key tasks
1. Determine the best way to share your findings.
2. Create effective data visualizations.
3. Present your findings.
4. Ensure your work is accessible.
Deliverable
I’ll be using the ggplot2 package for data visualization. It is part of the tidyverse, so library(tidyverse) already loads it; the two lines below are only needed if ggplot2 is used on its own.
install.packages("ggplot2")
library(ggplot2)
● What trends or relationships did you find in the data?
I’ll make some visualizations to help answer the question.
First, the daily_activity dataset, using a scatter plot of Total Steps against Sedentary Minutes:
ggplot(data = daily_activity, aes(x = TotalSteps, y = SedentaryMinutes)) +
  geom_point() +
  labs(x = "Total Steps", y = "Sedentary Minutes", title = "Total Steps vs. Sedentary Minutes")
As the number of total steps increases, the number of sedentary minutes generally decreases. This suggests that more active people tend to spend less time sedentary.
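A visual trend like this can be backed up with a correlation coefficient. The sketch below uses invented values so that it runs on its own; with the real data, the two vectors would be `daily_activity$TotalSteps` and `daily_activity$SedentaryMinutes`, and the result may well be weaker than in this toy example.

```r
# Invented values for illustration only; not taken from the Fitbit data.
total_steps       <- c(2000, 4500, 7000, 9800, 12500, 15000)
sedentary_minutes <- c(1200, 1100, 1010, 950, 820, 760)

# Pearson correlation: -1 is a perfect negative linear relationship,
# 0 is no linear relationship, and +1 is a perfect positive one.
r <- cor(total_steps, sedentary_minutes, method = "pearson")

print(round(r, 2))
```

Reporting the coefficient alongside the plot makes it clearer how strong the relationship really is.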
Possible Marketing Recommendations:
· Promote Active Lifestyle: Emphasize the benefits of an active lifestyle, such as weight management, improved mood, and reduced stress.
· Encourage Regular Movement: Promote short bursts of activity throughout the day, such as taking breaks from sedentary activities and incorporating exercise into daily routines.
· Track Sedentary Time: Encourage users to track their sedentary time and set goals to reduce it.
Now the daily_activity dataset again, using a scatter plot of Total Steps against Calories:
ggplot(data = daily_activity, aes(x = TotalSteps, y = Calories)) +
  geom_point() +
  labs(x = "Total Steps", y = "Calories", title = "Total Steps vs. Calories")
There appears to be a positive correlation between total steps and calories burned. As the number of total steps increases, the number of calories burned also increases. For a given number of steps, there is a range of calories burned. This suggests that factors other than step count, such as metabolic rate, activity intensity, and body composition, influence calorie expenditure.
Possible Marketing Recommendations:
· Personalized Fitness Plans: Leverage data to create tailored workout plans based on individual calorie burn goals.
· Activity Tracking and Insights: Encourage users to track their daily activity and provide insights into calorie expenditure.
· Calorie Tracking and Goal Setting: Help users set and track calorie goals to support weight management and overall health.
Next I’ll take a look at the sleep_day dataset, using a scatter plot of Total Minutes Asleep against Total Time in Bed:
ggplot(data = sleep_day, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) +
  geom_point() +
  labs(x = "Total Minutes Asleep", y = "Total Time in Bed", title = "Total Minutes Asleep vs. Total Time in Bed")
As the total number of minutes asleep increases, the total time in bed also increases. This is expected, as the time spent in bed should generally be longer than the actual sleep duration. The data points are clustered around a diagonal line, indicating a strong positive linear relationship between the two variables.
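The gap between time in bed and time asleep can be summarized per night as a sleep-efficiency ratio. The following dplyr sketch shows one way to compute it. The `efficiency` and `minutes_awake` columns are names introduced here for illustration, and the data values are invented.

```r
library(dplyr)

# Toy stand-in for sleep_day; values are invented for illustration.
sleep <- tibble(
  Id                 = c(1, 2, 3),
  TotalMinutesAsleep = c(420, 300, 390),
  TotalTimeInBed     = c(450, 400, 400)
)

sleep_efficiency <- sleep %>%
  mutate(
    # Share of time in bed actually spent asleep (1 = asleep the whole time).
    efficiency    = TotalMinutesAsleep / TotalTimeInBed,
    # Minutes spent in bed but awake.
    minutes_awake = TotalTimeInBed - TotalMinutesAsleep
  )

print(sleep_efficiency)
```

Points far above the diagonal in the scatter plot correspond to nights with low efficiency in this table.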
Possible Marketing Recommendations:
· Personalized Sleep Insights: Provide users with personalized insights into their sleep efficiency, highlighting areas for improvement.
· Smart Sleep Tracking: Emphasize Bellabeat’s ability to track sleep stages and provide insights into sleep quality.
· Sleep Goals: Encourage users to set personalized sleep goals and track their progress.
Next, the heartrate_hourly dataset, using a histogram of the average heart rate:
ggplot(heartrate_hourly, aes(x = average_hr)) +
  geom_histogram(binwidth = 5, color = "black", fill = "lightblue") +
  labs(x = "Average Heart Rate (BPM)", y = "Frequency", title = "Average Heart Rate Frequency")
Most daily average heart rates seem to fall within the 60-80 BPM range, which sits inside the 60-100 BPM range generally considered normal for a resting adult. Because these values are daily averages rather than resting measurements, they are only a rough indicator of cardiovascular health. A few data points at the higher end of the range could be outliers or represent days with intense physical activity. A small portion of users have higher averages, which might be due to stress, anxiety, or underlying health conditions.
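To put a number on how many values fall in each range, the averages can be grouped into bands. This base R sketch uses invented heart rates, and the band boundaries are an assumption chosen to mirror the ranges discussed above.

```r
# Invented daily average heart rates (BPM), for illustration only.
average_hr <- c(58, 64, 71, 77, 83, 96, 104)

# Group into bands; right = FALSE makes each band include its lower bound,
# so 60 falls in "60-80" and 80 falls in "80-100".
hr_band <- cut(
  average_hr,
  breaks = c(-Inf, 60, 80, 100, Inf),
  labels = c("below 60", "60-80", "80-100", "above 100"),
  right  = FALSE
)

# Share of observations in each band.
band_share <- prop.table(table(hr_band))

print(table(hr_band))
print(round(band_share, 2))
```

With the real data, `average_hr` would be replaced by `heartrate_hourly$average_hr`.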
Possible Marketing Recommendations:
· Personalized Fitness: Provide tailored workout plans based on real-time heart rate data.
· Holistic Health: Emphasize the connection between heart health, sleep quality, and overall well-being.
· Data-Driven Insights: Offer personalized insights and recommendations based on heart rate data and user behavior.
Act:
Note: An important thing to remember is that the Fitbit dataset includes data from a small group of Fitbit users (30), potentially not reflecting the broader population Bellabeat targets. I recommend that Bellabeat explore internal and even more external datasets to help with the actions that I recommend.
Decision Making
Based on the insights gained from the data analysis, Bellabeat can make informed decisions to enhance its product offerings and marketing strategies:
· Product Development:
o Advanced Features: Consider adding features like stress monitoring, advanced sleep analysis, and personalized fitness coaching.
o Improved User Experience: Enhance the user interface and user experience of the Bellabeat app.
o Durable and Stylish Designs: Continue to invest in sleek and durable designs to appeal to a wider audience.
· Marketing Campaigns:
o Targeted Marketing: Use data-driven insights to target specific demographics and user segments with tailored marketing messages.
o Social Media Marketing: Leverage social media platforms to engage with users, share health and wellness tips, and run contests and promotions.
o Influencer Partnerships: Collaborate with fitness and wellness influencers to promote Bellabeat and its benefits.
Three Marketing Recommendations for Bellabeat:
1. Personalized Wellness Experiences: Leverage data to provide tailored recommendations and insights to users, such as personalized workout plans, sleep tips, and stress management techniques.
2. Community Building: Foster a strong online community where users can connect, share experiences, and motivate each other.
3. Influencer Partnerships: Collaborate with fitness and wellness influencers to promote Bellabeat and showcase its benefits.
Thank you to everyone who spent time reading my Case Study.
