How Can A Wellness Technology Company Play It Smart?

Step 1: Scope

Company Background

Bellabeat, founded by Urška Sršen and Sando Mur in 2013, produces health-focused smart products for women. They collect data on activity, sleep, stress, and reproductive health, empowering women with insights. The company rapidly expanded, using digital marketing extensively. In 2016, the co-founders sought to analyze smart device usage data to inform their marketing strategy for continued growth.

Business Task

Analyze FitBit Fitness Tracker usage data to understand consumer behavior and inform Bellabeat's marketing strategy.

Objectives

What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?

Deliverables

A clear summary of the business task
A description of all data sources used
Documentation of any cleaning or manipulation of data
A summary of your analysis
Supporting visualizations and key findings
Your top high-level content recommendations based on your analysis

Step 2: Prepare

Data Details

The publicly available FitBit Fitness Tracker Data, found on Kaggle in 18 CSV files, originates from a distributed survey conducted through Amazon Mechanical Turk between March 12, 2016, and May 12, 2016, involving 30 consenting FitBit users who submitted their personal tracker data.

This data encompasses five key categories:

Recorded physical activity in minutes
Heart rate
Sleep monitoring
Daily activity
Step counts

Data Limitations

Possible drawbacks of data collected from distributed surveys include selection and non-response bias, self-reporting inaccuracies, limited control, privacy concerns, and difficulty in verifying responses. The sample size is small: the file analyzed here contains only 33 participants. No demographics are provided, so the participants' sex and age are unknown. The data was collected in 2016 and may not reflect current usage patterns.

Data Selection

The following file is selected for analysis.

dailyActivity_merged.csv

Step 3: Process

Python is being used to prepare and process the data in a Jupyter Notebook.

# Import packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt

# Import dataset
daily_activity = pd.read_csv("data/dailyActivity_merged.csv")

# Explore dataset
daily_activity.head()

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories
0	1503960366	4/12/2016	13162	8.50	8.50	1.88	0.55	6.06	25	13	328	728	1985
1	1503960366	4/13/2016	10735	6.97	6.97	1.57	0.69	4.71	21	19	217	776	1797
2	1503960366	4/14/2016	10460	6.74	6.74	2.44	0.40	3.91	30	11	181	1218	1776
3	1503960366	4/15/2016	9762	6.28	6.28	2.14	1.26	2.83	29	34	209	726	1745
4	1503960366	4/16/2016	12669	8.16	8.16	2.71	0.41	5.04	36	10	221	773	1863

daily_activity.tail()

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	SedentaryActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories
935	8877689391	5/8/2016	10686	8.110000	8.110000	1.08	0.20	6.80	0.00	17	4	245	1174	2847
936	8877689391	5/9/2016	20226	18.250000	18.250000	11.10	0.80	6.24	0.05	73	19	217	1131	3710
937	8877689391	5/10/2016	10733	8.150000	8.150000	1.35	0.46	6.28	0.00	18	11	224	1187	2832
938	8877689391	5/11/2016	21420	19.559999	19.559999	13.22	0.41	5.89	0.00	88	12	213	1127	3832
939	8877689391	5/12/2016	8064	6.120000	6.120000	1.82	0.04	4.25	0.00	23	1	137	770	1849

# Check the number of rows and columns in the dataset
daily_activity.shape

(940, 15)

# Count the duplicate rows in the dataset
daily_activity.duplicated().sum()

# Check for missing values
missing_values_count = daily_activity.isnull().sum()

missing_values_count[:]

Id                          0
ActivityDate                0
TotalSteps                  0
TotalDistance               0
TrackerDistance             0
LoggedActivitiesDistance    0
VeryActiveDistance          0
ModeratelyActiveDistance    0
LightActiveDistance         0
SedentaryActiveDistance     0
VeryActiveMinutes           0
FairlyActiveMinutes         0
LightlyActiveMinutes        0
SedentaryMinutes            0
Calories                    0
dtype: int64

# Count the number of unique IDs

unique_id = len(pd.unique(daily_activity["Id"]))

print('This dataset contains ' + str(unique_id) + ' unique participants.')

This dataset contains 33 unique participants.
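The participant count alone does not show whether every participant contributed the same number of days. A minimal sketch of how records per participant could be checked is below. The frame `sample` is a toy stand-in for `daily_activity`: the two Ids come from the tables above, but the number of rows per Id is made up for illustration.

```python
import pandas as pd

# Toy stand-in for daily_activity (row counts per Id are illustrative only)
sample = pd.DataFrame({
    "Id": [1503960366] * 3 + [8877689391] * 2,
    "ActivityDate": ["4/12/2016", "4/13/2016", "4/14/2016", "5/11/2016", "5/12/2016"],
})

# Number of distinct participants
n_participants = sample["Id"].nunique()

# Number of daily records contributed by each participant
days_per_id = sample.groupby("Id")["ActivityDate"].count()

print(n_participants)
print(days_per_id)
```

Running the same two calls on `daily_activity` would show which participants have fewer records than the collection window allows.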

# Show the dataset information
daily_activity.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Id                        940 non-null    int64
 1   ActivityDate              940 non-null    object
 2   TotalSteps                940 non-null    int64
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64
 11  FairlyActiveMinutes       940 non-null    int64
 12  LightlyActiveMinutes      940 non-null    int64
 13  SedentaryMinutes          940 non-null    int64
 14  Calories                  940 non-null    int64
dtypes: float64(7), int64(7), object(1)
memory usage: 110.3+ KB

# Convert ActivityDate to datetime64 dtype and format to yyyy-mm-dd
daily_activity["ActivityDate"] = pd.to_datetime(daily_activity["ActivityDate"], format="%m/%d/%Y")

daily_activity

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	LoggedActivitiesDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	SedentaryActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories	Weekday
0	1503960366	2016-04-12	13162	8.500000	8.500000	0.0	1.88	0.55	6.06	0.00	25	13	328	728	1985	2
1	1503960366	2016-04-13	10735	6.970000	6.970000	0.0	1.57	0.69	4.71	0.00	21	19	217	776	1797	3
2	1503960366	2016-04-14	10460	6.740000	6.740000	0.0	2.44	0.40	3.91	0.00	30	11	181	1218	1776	4
3	1503960366	2016-04-15	9762	6.280000	6.280000	0.0	2.14	1.26	2.83	0.00	29	34	209	726	1745	5
4	1503960366	2016-04-16	12669	8.160000	8.160000	0.0	2.71	0.41	5.04	0.00	36	10	221	773	1863	6
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
935	8877689391	2016-05-08	10686	8.110000	8.110000	0.0	1.08	0.20	6.80	0.00	17	4	245	1174	2847	0
936	8877689391	2016-05-09	20226	18.250000	18.250000	0.0	11.10	0.80	6.24	0.05	73	19	217	1131	3710	1
937	8877689391	2016-05-10	10733	8.150000	8.150000	0.0	1.35	0.46	6.28	0.00	18	11	224	1187	2832	2
938	8877689391	2016-05-11	21420	19.559999	19.559999	0.0	13.22	0.41	5.89	0.00	88	12	213	1127	3832	3
939	8877689391	2016-05-12	8064	6.120000	6.120000	0.0	1.82	0.04	4.25	0.00	23	1	137	770	1849	4

940 rows × 16 columns
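Passing an explicit `format` makes the date conversion strict, so a malformed value would raise an error. A small sketch of a more forgiving variant is below. It assumes that counting bad values is preferable to stopping, and the series `raw` is a made-up example with one deliberately malformed entry.

```python
import pandas as pd

# Two date strings in the file's m/d/Y format plus one deliberately malformed value
raw = pd.Series(["4/12/2016", "5/12/2016", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Count how many values failed to parse
n_bad = parsed.isna().sum()

print(parsed)
print(n_bad)
```

A non-zero count would be a signal to inspect the raw strings before continuing.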

# Get the day of the week from ActivityDate and create a new column.
# dt.dayofweek is 0 for Monday, so shift by one to make Sunday index 0.
day_names = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
daily_activity['Weekday'] = ((daily_activity['ActivityDate'].dt.dayofweek + 1) % 7).map(lambda x: day_names[x])

daily_activity.head()

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories	Weekday
0	1503960366	2016-04-12	13162	8.50	8.50	1.88	0.55	6.06	25	13	328	728	1985	Tuesday
1	1503960366	2016-04-13	10735	6.97	6.97	1.57	0.69	4.71	21	19	217	776	1797	Wednesday
2	1503960366	2016-04-14	10460	6.74	6.74	2.44	0.40	3.91	30	11	181	1218	1776	Thursday
3	1503960366	2016-04-15	9762	6.28	6.28	2.14	1.26	2.83	29	34	209	726	1745	Friday
4	1503960366	2016-04-16	12669	8.16	8.16	2.71	0.41	5.04	36	10	221	773	1863	Saturday

# Create new column "TotalMins" containing sum of total minutes.
daily_activity["TotalMins"] = daily_activity["VeryActiveMinutes"] + daily_activity["FairlyActiveMinutes"] + daily_activity["LightlyActiveMinutes"] + daily_activity["SedentaryMinutes"]
daily_activity["TotalMins"].head(5)

0    1094
1    1033
2    1440
3     998
4    1040
Name: TotalMins, dtype: int64
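Since a day has 1,440 minutes, `TotalMins` can serve as a rough sanity check. The sketch below uses the five values printed above. Reading the gap as time when the device was not worn is an assumption: the dataset does not say whether missing minutes are non-wear time or minutes logged elsewhere (for example as sleep).

```python
import pandas as pd

MINUTES_PER_DAY = 24 * 60  # 1440

# TotalMins values copied from the output above
total_mins = pd.Series([1094, 1033, 1440, 998, 1040], name="TotalMins")

# No day should exceed 1440 tracked minutes
within_day = bool((total_mins <= MINUTES_PER_DAY).all())

# Days with fewer than 1440 tracked minutes, and the size of the gap
partial_days = total_mins[total_mins < MINUTES_PER_DAY]
untracked = MINUTES_PER_DAY - total_mins

print(within_day)
print(untracked.tolist())
```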

daily_activity.head()

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories	Weekday
0	1503960366	2016-04-12	13162	8.50	8.50	1.88	0.55	6.06	25	13	328	728	1985	Tuesday
1	1503960366	2016-04-13	10735	6.97	6.97	1.57	0.69	4.71	21	19	217	776	1797	Wednesday
2	1503960366	2016-04-14	10460	6.74	6.74	2.44	0.40	3.91	30	11	181	1218	1776	Thursday
3	1503960366	2016-04-15	9762	6.28	6.28	2.14	1.26	2.83	29	34	209	726	1745	Friday
4	1503960366	2016-04-16	12669	8.16	8.16	2.71	0.41	5.04	36	10	221	773	1863	Saturday

# Create new column "TotalHours" by converting minutes to hours, rounded to the nearest whole hour
daily_activity["TotalHours"] = round(daily_activity["TotalMins"] / 60)

daily_activity.head()

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories	Weekday	TotalMins
0	1503960366	2016-04-12	13162	8.50	8.50	1.88	0.55	6.06	25	13	328	728	1985	Tuesday	1094
1	1503960366	2016-04-13	10735	6.97	6.97	1.57	0.69	4.71	21	19	217	776	1797	Wednesday	1033
2	1503960366	2016-04-14	10460	6.74	6.74	2.44	0.40	3.91	30	11	181	1218	1776	Thursday	1440
3	1503960366	2016-04-15	9762	6.28	6.28	2.14	1.26	2.83	29	34	209	726	1745	Friday	998
4	1503960366	2016-04-16	12669	8.16	8.16	2.71	0.41	5.04	36	10	221	773	1863	Saturday	1040

Step 4: Analyze

Pull statistics and perform calculations for further analysis

# Pull general statistics
daily_activity.describe()

	Id	ActivityDate	TotalSteps	TotalDistance	TrackerDistance	LoggedActivitiesDistance	VeryActiveDistance	ModeratelyActiveDistance	LightActiveDistance	SedentaryActiveDistance	VeryActiveMinutes	FairlyActiveMinutes	LightlyActiveMinutes	SedentaryMinutes	Calories	TotalMins	TotalHours
count	9.400000e+02	940	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000	940.000000
mean	4.855407e+09	2016-04-26 06:53:37.021276672	7637.910638	5.489702	5.475351	0.108171	1.502681	0.567543	3.340819	0.001606	21.164894	13.564894	192.812766	991.210638	2303.609574	1218.753191	20.313830
min	1.503960e+09	2016-04-12 00:00:00	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	2.000000	0.000000
25%	2.320127e+09	2016-04-19 00:00:00	3789.750000	2.620000	2.620000	0.000000	0.000000	0.000000	1.945000	0.000000	0.000000	0.000000	127.000000	729.750000	1828.500000	989.750000	16.000000
50%	4.445115e+09	2016-04-26 00:00:00	7405.500000	5.245000	5.245000	0.000000	0.210000	0.240000	3.365000	0.000000	4.000000	6.000000	199.000000	1057.500000	2134.000000	1440.000000	24.000000
75%	6.962181e+09	2016-05-04 00:00:00	10727.000000	7.712500	7.710000	0.000000	2.052500	0.800000	4.782500	0.000000	32.000000	19.000000	264.000000	1229.500000	2793.250000	1440.000000	24.000000
max	8.877689e+09	2016-05-12 00:00:00	36019.000000	28.030001	28.030001	4.942142	21.920000	6.480000	10.710000	0.110000	210.000000	143.000000	518.000000	1440.000000	4900.000000	1440.000000	24.000000
std	2.424805e+09	NaN	5087.150742	3.924606	3.907276	0.619897	2.658941	0.883580	2.040655	0.007346	32.844803	19.987404	109.174700	301.267437	718.166862	265.931767	4.437283

Observations

Average Steps and Distance: On average, users recorded 7,638 steps per day, which corresponds to approximately 5.5 kilometers. This average falls below the recommended activity level. According to the CDC, adult females are advised to aim for at least 10,000 steps or approximately 8 kilometers per day to realize the benefits of improved general health, weight loss, and fitness. (Source: Medical News Today article)

Predominance of Sedentary Time: Sedentary minutes make up about 81% of the average tracked minutes per day. On average, users logged approximately 991 sedentary minutes, equivalent to about 16.5 hours per day.

Average Calories Burned: The average calorie burn for users is estimated at 2,303 calories, which is roughly equivalent to 0.6 pounds. It's important to note that a detailed interpretation of calorie burn is influenced by various factors, including age, weight, exercise, hormones, and daily calorie intake. These specific factors are not known, making it challenging to provide a more detailed analysis.
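The arithmetic behind these observations can be checked directly from the `describe()` output. The short sketch below copies the relevant means from that table. The 10,000-step goal is the target cited in the observation, and treating the distance as kilometers follows the text rather than anything stated in the file.

```python
# Means copied from the describe() output above
mean_steps = 7637.910638
mean_distance = 5.489702        # assumed to be kilometers, as in the text
mean_sedentary_mins = 991.210638

STEP_GOAL = 10_000  # daily target cited in the observation

# Convert sedentary minutes to hours and measure the gap to the step goal
sedentary_hours = mean_sedentary_mins / 60
step_shortfall = STEP_GOAL - mean_steps

print(f"{mean_distance:.1f} km per day")
print(f"{sedentary_hours:.1f} sedentary hours per day")
print(f"{step_shortfall:.0f} steps below the goal")
```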

Step 5: Share

Create the visualizations and communicate findings.

# Calculate the frequency of each day
# (dt.dayofweek is 0 for Monday, so shift by one to match labels that start on Sunday)
weekday_counts = (daily_activity['ActivityDate'].dt.dayofweek + 1) % 7
day_names = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
weekday_counts = weekday_counts.value_counts().reindex(range(7)).fillna(0)

# Create the bar chart
plt.style.use('default')
plt.figure(figsize=(6, 4))
plt.bar(weekday_counts.index, weekday_counts, width=0.6, color='purple')
plt.xticks(range(7), [day_names[i] for i in weekday_counts.index])

# Annotations
plt.xlabel("Day of the Week")
plt.ylabel("Frequency")
plt.title("App usage by day of the week")
plt.grid(False)
plt.show()

Observations

Frequency of FitBit App Usage Across the Week: The histogram provides insights into the frequency of FitBit app usage based on days of the week.

Midweek Peak (Tuesday to Friday): Users tend to track their activity most often during the midweek, particularly from Tuesday to Friday. On other days they may simply have forgotten to track.

Weekend and Monday Decline: Conversely, the frequency of app usage drops after Friday and stays lower through the weekend and Monday. This decline suggests a reduced emphasis on activity tracking during these days.

The histogram highlights patterns in FitBit app usage, with midweek days from Tuesday to Friday standing out as the period when users are most engaged with the app, while a decline is observed during the weekend and on Monday.
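One caveat when reading raw weekday counts: the collection window itself does not contain every weekday equally often. The sketch below counts calendar occurrences of each weekday between the first and last `ActivityDate` shown in the tables above. `records_per_calendar_day` is a hypothetical helper, not part of the original notebook, that divides observed records by those occurrences.

```python
import pandas as pd

# Collection window taken from the first and last ActivityDate in the tables above
window = pd.date_range("2016-04-12", "2016-05-12", freq="D")

# How many times each weekday falls inside the window
calendar_counts = window.day_name().value_counts()

def records_per_calendar_day(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: observed records divided by calendar occurrences."""
    observed = dates.dt.day_name().value_counts()
    return (observed / calendar_counts).reindex(calendar_counts.index)

print(len(window))
print(calendar_counts)
```

Weekdays that occur more often in the window collect more records even if every participant logs every day, so a normalized count such as `records_per_calendar_day(daily_activity["ActivityDate"])` may give a fairer comparison.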

# First, keep only days with more than 200 recorded steps
filtered_data = daily_activity[daily_activity['TotalSteps'] > 200]

# Group by 'Id' and count occurrences of 'ActivityDate'
grouped_data = filtered_data.groupby('Id')['ActivityDate'].count().reset_index()

# Define a function to map 'ActivityDate' to 'Usage'
def map_to_usage(activity_count):
    if 1 <= activity_count <= 14:
        return "Low Use"
    elif 15 <= activity_count <= 21:
        return "Moderate Use"
    elif 22 <= activity_count <= 31:
        return "High Use"
    return None

# Apply the mapping function
grouped_data['Usage'] = grouped_data['ActivityDate'].apply(map_to_usage)

# Convert 'Usage' to a categorical variable with specified order
usage_order = ['Low Use', 'Moderate Use', 'High Use']
grouped_data['Usage'] = pd.Categorical(grouped_data['Usage'], categories=usage_order, ordered=True)

# Rename 'ActivityDate' to 'DaysUsed'
grouped_data = grouped_data.rename(columns={'ActivityDate': 'DaysUsed'})

# Group by 'Usage' (observed=False keeps every category and avoids the pandas FutureWarning)
daily_use = grouped_data.groupby('Usage', observed=False).count()

# Display the counts of each 'Usage' category
print(daily_use)

              Id  DaysUsed
Usage
Low Use        2         2
Moderate Use   7         7
High Use      24        24

# Prepare usage counts for pie chart plotting
usage_counts = daily_use['Id'].reset_index()
usage_counts.columns = ['Usage', 'Count']

# Plot a pie chart
colors = ['#ff9999', '#66b3ff', '#99ff99']
plt.figure(figsize=(8, 6))
plt.pie(usage_counts['Count'], labels=usage_counts['Usage'], colors = colors, autopct='%1.1f%%', startangle=140)
plt.title('Usage Types Distribution')
plt.axis('equal')

plt.show()

Observations

Distribution of Usage Categories: The pie chart shows how participants are distributed across the "Usage" categories, which are based on the number of days with more than 200 recorded steps.

High Use (72.7%): "High Use" (22 to 31 days) is the dominant category, covering 24 of the 33 participants.

Moderate Use (21.2%): "Moderate Use" (15 to 21 days) accounts for 7 participants.

Low Use (6.1%): "Low Use" (1 to 14 days) is the least common category, with 2 participants.

These categories describe how often the device was used, not how active the participants were.
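As an alternative to the `map_to_usage` function, the same thresholds could be expressed with `pd.cut`. This is a sketch rather than the notebook's method. The `days_used` values are hypothetical and chosen to sit on the bin edges, while the category counts (2, 7, 24) are the ones printed above.

```python
import pandas as pd

# Hypothetical DaysUsed values placed on the bin boundaries
days_used = pd.Series([4, 14, 15, 21, 22, 31], name="DaysUsed")

# Bins are right-inclusive: (0, 14], (14, 21], (21, 31]
usage = pd.cut(days_used, bins=[0, 14, 21, 31],
               labels=["Low Use", "Moderate Use", "High Use"])

# Percentage share of each category, from the counts printed above
counts = pd.Series({"Low Use": 2, "Moderate Use": 7, "High Use": 24})
shares = (counts / counts.sum() * 100).round(1)

print(usage.tolist())
print(shares)
```

`pd.cut` returns an ordered categorical, so the separate `pd.Categorical` conversion would not be needed with this approach.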

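# Merge 'daily_activity' with the per-user 'grouped_data' to add usage type information
# ('daily_use' holds category counts, so its 'Id' column does not contain participant IDs)
daily_activity_usage = pd.merge(daily_activity, grouped_data, on='Id', how='left')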

# Calculate 'day' as the abbreviated day name
daily_activity_usage['day'] = daily_activity_usage['ActivityDate'].dt.strftime('%a')

# Calculate 'total_minutes_worn' as the sum of activity minutes
daily_activity_usage['total_minutes_worn'] = (
    daily_activity_usage['SedentaryMinutes'] +
    daily_activity_usage['LightlyActiveMinutes'] +
    daily_activity_usage['FairlyActiveMinutes'] +
    daily_activity_usage['VeryActiveMinutes']
)

# Define a function to format 'total_minutes_worn' in HH:MM:SS
def format_minutes(minutes):
    hours, remainder = divmod(minutes, 60)
    return f"{hours:02d}:{remainder:02d}:00"

# Apply the formatting function to 'total_minutes_worn'
daily_activity_usage['total_hours'] = daily_activity_usage['total_minutes_worn'].apply(format_minutes)

# Display the first 6 rows of 'daily_activity_usage'
print(daily_activity_usage.head(6))

           Id ActivityDate  TotalSteps  TotalDistance  TrackerDistance  \
0  1503960366   2016-04-12       13162           8.50             8.50
1  1503960366   2016-04-13       10735           6.97             6.97
2  1503960366   2016-04-14       10460           6.74             6.74
3  1503960366   2016-04-15        9762           6.28             6.28
4  1503960366   2016-04-16       12669           8.16             8.16
5  1503960366   2016-04-17        9705           6.48             6.48

   LoggedActivitiesDistance  VeryActiveDistance  ModeratelyActiveDistance  \
0                       0.0                1.88                      0.55
1                       0.0                1.57                      0.69
2                       0.0                2.44                      0.40
3                       0.0                2.14                      1.26
4                       0.0                2.71                      0.41
5                       0.0                3.19                      0.78

   LightActiveDistance  SedentaryActiveDistance  ...  LightlyActiveMinutes  \
0                 6.06                      0.0  ...                   328
1                 4.71                      0.0  ...                   217
2                 3.91                      0.0  ...                   181
3                 2.83                      0.0  ...                   209
4                 5.04                      0.0  ...                   221
5                 2.51                      0.0  ...                   164

   SedentaryMinutes  Calories    Weekday  TotalMins TotalHours  daysused  day  \
0               728      1985    Tuesday       1094       18.0       NaN  Tue
1               776      1797  Wednesday       1033       17.0       NaN  Wed
2              1218      1776   Thursday       1440       24.0       NaN  Thu
3               726      1745     Friday        998       17.0       NaN  Fri
4               773      1863   Saturday       1040       17.0       NaN  Sat
5               539      1728     Sunday        761       13.0       NaN  Sun

   total_minutes_worn total_hours
0                1094    18:14:00
1                1033    17:13:00
2                1440    24:00:00
3                 998    16:38:00
4                1040    17:20:00
5                 761    12:41:00

[6 rows x 22 columns]
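A left merge fills unmatched rows with NaN without raising an error, which is what the `daysused` column in the output above shows. The sketch below illustrates one way to make such mismatches visible, using the `validate` and `indicator` parameters of `pd.merge`. Both frames are toy data: the Ids and step counts come from the tables above, while the `DaysUsed` and `Usage` values are made up.

```python
import pandas as pd

# Toy daily records (Ids and steps taken from the tables above)
activity = pd.DataFrame({
    "Id": [1503960366, 1503960366, 8877689391],
    "TotalSteps": [13162, 10735, 8064],
})

# Toy per-user table (DaysUsed and Usage values are illustrative only)
per_user = pd.DataFrame({
    "Id": [1503960366, 8877689391],
    "DaysUsed": [30, 31],
    "Usage": ["High Use", "High Use"],
})

# validate raises if 'Id' is not unique on the right side;
# indicator adds a '_merge' column saying where each row came from
merged = pd.merge(activity, per_user, on="Id", how="left",
                  validate="many_to_one", indicator=True)

# Rows that found no partner in the per-user table
unmatched = int((merged["_merge"] != "both").sum())

print(merged)
print(unmatched)
```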

# Calculate the daily mean steps and reset the index to create a 'steps_hour' with daily step averages
steps_hour = daily_activity_usage.groupby('day').agg(mean_steps=('TotalSteps', 'mean')).round().reset_index()

# Define the order of days for the 'day' column, starting with Sunday
day_order = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
steps_hour['day'] = pd.Categorical(steps_hour['day'], categories=day_order, ordered=True)

# Fill in missing days with 0 mean_steps
steps_hour = steps_hour.set_index('day').reindex(day_order).fillna(0).reset_index()

# Display the first few rows of 'steps_hour'
print(steps_hour)

   day  mean_steps
0  Sun      6933.0
1  Mon      7781.0
2  Tue      8125.0
3  Wed      7559.0
4  Thu      7406.0
5  Fri      7448.0
6  Sat      8153.0

# Create days and mean_steps series
days = steps_hour['day']
mean_steps = steps_hour['mean_steps']

# Define the custom order for days of the week starting with Sunday
custom_order = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# Reorder the 'days' column based on the custom order
days = days.astype(pd.CategoricalDtype(categories=custom_order, ordered=True))

# Create a color gradient for the bars
colors = plt.cm.Purples(np.interp(mean_steps, (0, 10000), (0, 1)))

# Create the bar chart with the custom order
plt.figure(figsize=(10, 8))
bars = plt.bar(days, mean_steps, color=colors, edgecolor="darkblue", linewidth=0.1)

# Set title and labels
plt.title("Average Steps by Day", fontsize=16)
plt.ylabel("Mean Steps", fontsize=14)

# Set y-axis limits and ticks
plt.ylim(0, 10000)
plt.yticks(np.arange(0, 10001, 2000), fontsize=12)

# Create a color legend scaled to the same 0 to 10,000 range used for the bar colors
sm = plt.cm.ScalarMappable(cmap=plt.cm.Purples, norm=plt.Normalize(vmin=0, vmax=10000))
sm.set_array([])
cbar = plt.colorbar(sm, ax=plt.gca(), orientation='horizontal', pad=0.1)
cbar.set_label("Mean Steps", fontsize=12)

# Show the plot
plt.tight_layout()
plt.show()

Observations

Peak Activity Days: The highest average step counts occur on Saturday (8,153), closely followed by Tuesday (8,125) and then Monday (7,781). Users tend to be most active on Saturday and in the early part of the workweek.

Weekday Activity Decline: Average steps are lower from Wednesday to Friday (7,559, 7,406, and 7,448), which indicates a dip in activity in the second half of the workweek.

Sunday as the Quietest Day: Sunday has the fewest steps (6,933). This aligns with the conventional understanding of Sunday as a day of rest when physical activity tends to be lower compared to other days of the week.

Overall Activity Pattern: In summary, activity peaks on Saturday and early in the workweek, dips from Wednesday to Friday, and is lowest on Sunday, which may be associated with rest and relaxation for most individuals.

# Plot scatter plot
plt.style.use("default")
plt.figure(figsize=(8,6)) # specify size of the chart
plt.scatter(daily_activity.TotalSteps, daily_activity.Calories,
            alpha = 0.8, c = daily_activity.Calories,
            cmap = "Spectral")

# Add annotations and visuals
# (2303 and 7637 are the means from describe(), not the medians)
MeanCalories = 2303
MeanSteps = 7637

plt.colorbar(orientation = "vertical")
plt.axvline(MeanSteps, color = "Blue", label = "Mean steps")
plt.axhline(MeanCalories, color = "Red", label = "Mean calories burned")
plt.xlabel("Steps taken")
plt.ylabel("Calories burned")
plt.title("Calories burned for every step taken")
plt.grid(True)
plt.legend()
plt.show()

Observations

The scatter plot analysis highlights the positive correlation between steps and calories burned, a trend of increasing calorie burn intensity within a certain step range, and the presence of outliers that may result from various factors, including data irregularities and user behavior changes.

Positive Correlation: The analysis of the scatter plot indicates a positive correlation between the number of steps taken and calories burned. As the number of steps increases, the calories burned also tend to increase, which is a notable trend.

Intensity of Calories Burned: The data shows that the intensity of calories burned tends to increase when users are in the range of more than zero to 15,000 steps. Beyond 15,000 steps, the rate at which calories are burned appears to decrease.

Identification of Outliers: Several outliers were identified within the data. These include instances where zero steps were taken, resulting in minimal or zero calories burned. Additionally, there is one observation where more than 35,000 steps were taken, but the calories burned were less than 3,000.

Possible Reasons for Outliers: These outliers could be attributed to natural variations in the data, changes in the user's activity patterns, or potential errors in data collection. Errors may include miscalculations, data contamination, or human errors in recording the information.
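The strength of the relationship can also be put into a number with the Pearson correlation coefficient, and zero-step days can be listed with a simple filter. The sketch below only illustrates the calls: it uses the first five rows shown in the tables above, which all belong to one participant, so the resulting coefficient says nothing about the full dataset.

```python
import pandas as pd

# First five rows of TotalSteps and Calories from the tables above
sample = pd.DataFrame({
    "TotalSteps": [13162, 10735, 10460, 9762, 12669],
    "Calories": [1985, 1797, 1776, 1745, 1863],
})

# Pearson correlation between steps and calories
r = sample["TotalSteps"].corr(sample["Calories"])

# Days on which no steps were recorded (candidates for non-wear days)
zero_step_days = sample[sample["TotalSteps"] == 0]

print(round(r, 2))
print(len(zero_step_days))
```

Applied to `daily_activity`, the same two lines would quantify the trend seen in the scatter plot and show how many zero-step records sit among the outliers.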

# Calculate total of individual minutes column
very_active_mins = daily_activity["VeryActiveMinutes"].sum()
fairly_active_mins = daily_activity["FairlyActiveMinutes"].sum()
lightly_active_mins = daily_activity["LightlyActiveMinutes"].sum()
sedentary_mins = daily_activity["SedentaryMinutes"].sum()

# Plot pie chart
slices = [very_active_mins, fairly_active_mins, lightly_active_mins, sedentary_mins]
labels = ["Very active minutes", "Fairly active minutes", "Lightly active minutes", "Sedentary minutes"]
colors = ['#99ff99', '#66b3ff', '#ffcc99', '#ff9999']
explode = [0, 0, 0, 0.1]
plt.style.use("default")
plt.pie(slices, labels = labels,
        colors = colors,
        explode = explode, autopct = "%1.1f%%")
plt.title("Percentage of Activity in Minutes")
plt.tight_layout()
plt.show()

Observations

Understanding App Usage

Predominance of Sedentary Minutes (81.3%): The pie chart prominently illustrates that the largest segment, accounting for 81.3% of the distribution, corresponds to sedentary minutes. This observation suggests that users primarily employ the FitBit app for logging everyday activities, such as their daily commute, inactive movements (such as transitioning between spots), or running errands.

Limited Fitness Tracking: In contrast, the data reveals a minimal use of the app for fitness tracking activities, exemplified by the minor percentages of fairly active (1.1%) and very active (1.7%) activity segments. This trend raises concerns as it appears to be inconsistent with the primary objective of the FitBit app, which is to encourage and facilitate fitness-oriented tracking.

The significant predominance of sedentary minutes highlights the practical, everyday use of the app, while the underutilization of fitness tracking features suggests the need for strategies to promote and encourage fitness-related app usage.
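The percentages in the pie chart can be reproduced from the `describe()` output, because every row carries equal weight and the ratio of the column means therefore equals the ratio of the column sums. The sketch below copies the four means from that table.

```python
# Mean minutes per day, copied from the describe() output
mean_minutes = {
    "Very active": 21.164894,
    "Fairly active": 13.564894,
    "Lightly active": 192.812766,
    "Sedentary": 991.210638,
}

# Share of each category as a percentage of all tracked minutes
total = sum(mean_minutes.values())
shares = {label: round(100 * value / total, 1)
          for label, value in mean_minutes.items()}

print(shares)
```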

Step 6: Recommendations

To encourage more frequent use of the app, it's essential to convey to users that it offers more than just tracking sports activities. The objective is to help users realize that frequent use leads to better data collection, which, in turn, provides more insights and actionable recommendations for their well-being.

Understanding User Behavior: The company should conduct further research into why the 'Low Use' and 'Moderate Use' groups are not wearing their devices regularly. One plausible explanation could be that users perceive these devices as useful primarily during exercise or physical activities. By understanding these perceptions, the company can tailor its approach to increase overall usage.

Marketing Strategy: The marketing strategy should emphasize the holistic benefits of the device. Users should be educated on how the integration of both 'wellness' and 'sports' elements can provide a comprehensive understanding of their well-being and lifestyle habits.

Gamification: Product designers can work on introducing gamification features within the app and the devices. Gamification can create incentives for users to engage more frequently. For instance, setting goals, challenges, and rewards could motivate users to use the app consistently.

Seamless Integration: Enhancing the integration and syncing process between the app and the devices is crucial. A smoother experience can reduce barriers to usage. Similar to GoPro's successful differentiation, the company should strive to create a user-friendly and comprehensive app design.

Socialization Features: To further engage users, the company could consider adding socialization features to the app. This could include the ability to connect with friends, join communities, or share achievements. Social interactions can enhance motivation and accountability, encouraging users to utilize the app regularly.

Additionally, the company should invest in creating more inclusive studies in the future. These studies can provide a more comprehensive analysis of user behavior, leading to more effective recommendations and improvements in app design and functionality.

acorvin/Bellabeat-Case-Study
