Intermediate Google Sheets Notes in DataCamp

Intermediate Google Sheets

After finishing the three courses on DataCamp, I’ve officially completed the Intermediate

Google Sheets track—and it’s been a deep dive into the kind of spreadsheet magic that turns raw data into real insight.

This track wasn’t just about formulas. It sharpened my ability to:

🧠 Apply introductory statistics directly in Sheets for smarter decision-making

⚠️ Understand error and uncertainty, and how cognitive biases sneak into data work

📊 Build marketing dashboards that are not only dynamic but also clean and client-ready

Each module—from statistical analysis to dashboard design—reinforced my love for modular, scalable systems. Im sharing my notes from the Intermediate Google Sheets track to help others learn and revisit key concepts. I hope this post serves as a useful reference for anyone exploring spreadsheet skills- now and also in the future.

I. Introduction to Statistics in Google Sheets

1. Getting to Know Your Data

Welcome to the course

Averages reduces information

Mean = sum of all observation/number of observations

Median - middle number of the dataset

Half the number are less than the median
Half the number are above the median

How far from average?

Variance

Measures how dispersed a dataset is
Smaller variance indicates a dataset is less spread
Large differences between data points increase the variance
=VARP(range)

Standard Deviation

Square root of variance
=STDEVP(range)

Quartile()

Quartile 4: The maximum value in the data
Quartile 3: 75% of the data is less than the third quartile
Quartile 2: The median of the data
Quartile 1: The smallest values 25% of the data

Standardizing Data

Variables are sometimes measured on different scales
Makes it harder to compare
Easier to misinterpret variable importance
Solution: standardize your data
All variables on the same scale
Z-scores also known as standard scores

2. Statistical Data Visualization

Visualizing Distribution
Scatterplots

Line Estimate - calculates both the slope & the intercept of two variables using the least-squares method.

Linest(rangeX, rangeY)

CORREL() - solve the correlation of the given data
SLOPE() - will return the slope of a trend line or linear regression representing the linear change in one unit to another.
INTERCEPT() - returns the value where the trendline will intersect the y-axis.

Bar Charts

3. Statistical Hypothesis Testing

Central to Stats: Sampling

What is a population?

An entire distribution of observations/events
Costly and time consuming to work with
Better to "sample" the population

Sample

A subset from a population
Meant to represent the population
The larger the sample size, the closer the statistics of the sample will emulate the statistics of the population.

Central Limit Theorem

If a sample size from an independent random variable is large enough, then the sampling distribution will be normal or nearly normal

Hypothesis Testing

Hypothesis: Any testable claim

Null Hypothesis

Represented the status quo (accepted fact)
Represented by Ho

Alternate/Research Hypothesis

Challenger Statement
Represented by H1

Removing subjectivity in a test

Hypothesis test use "test statistics" to verify hypothesis
Example: t-test (In Google Sheets: T.test(range1, range2, tails, type)

Produces a "p-value"
p-value: Probability that the results you see are due to chance/error
Choose a p-value cutoff (Example: 1% or 5%)
If p-value is less than the cutoff, REJECT the null Hypothesis

T.Test(range1, range2, tail,type)

If testing only greater than or less than:

The Test has 1 tail

If the Ho operator is Equals

2 tails, as the values can be above or below the mean

If measuring the same observations at different times

Type = 1

If measuring different observations with same variance

Type = 2

If measuring different observations with different variances

Type = 3

Hypothesis Testing with the Z-test

=Z.Test(range1, testStatistic, StDev)

Needs 1 range
Needs a test statistics (i.e. population mean)
Used with bigger datasets (n>30)
Fail to Reject Ho if the -value is greater than 0.05
Reject Ho if the p-value is less than 0.05

Hypothesis Testing with the Chi-squared test

The difference in the new group stems from random sampling (Ho)
There is in fact a meaningful difference
For a chi-squared to be useful:

Data has to be in groups (e.g: "Old treatment", "New treatment")
Avoid really small expected values (<5)

CHITEST (observed_range, expected_range)

FAIL TO REJECT Ho if the p-value is greater than 0.05
REJECT Ho if the p-value is less than 0.05

4. Case Study: Dating Profile Analysis

Dating Data
Visuals & Distributions
Tipping the scale to positive correlation

Correlation = 1

Positive: as one var so does the other #

Correlation = 0

No relationship among variables

Correlation = -1

Negative or opposite relationship

More complex relationships

II. Error and Uncertainty in Google Sheets

1. Defining Error, Uncertainty, and Risk

Defining Error and Uncertainty

Errors in Judgment

How or why things happened
Biases, assumptions, psychology

Errors in Statistics

Measures of variation or uncertainty
Incorrect conclusions about the data

Types of Statistical Errors

Type 1 Error

"False Positive"

Sampling
Data Quality
Design

Type 2 Error

"False Negative"

Sampling
Variation
Measurement

City of Seattle Open Data portal - Sources of Open Dataset

IF Functions

Unique() function

Returns distinct values

Counts of Unique Values

Countif() function

range
criterion

Other IF Functions

AVERAGEIF()
SUMIF()

Multiple IF conditions

Countifs()

Pairs of ranges and criteria
Allows multiple criteria
Different criteria in same range

Averageifs()

Range
Pairs of ranges in criteria
Allows multiple criteria

Correlation

Relationship between two groups
Correlation coefficient (r)

Varies between -1 and 1
Positive: Increase/decrease together
Negative: Opposite increases/decreases
Zero: No relationship

Evaluating strength of relationships

Weak

-0.3 to 0.3

Moderate

-0.3 to -0.7 or 0.3 to 0.7

Strong

-0.7 to -1.0 or 0.7 to 1.0

2. Making Accurate Predictions

Weighted Average vs Linear models

Linear Model

Lines of "best fit"
Easily generalizable
Reasonable predictions from novel data
May oversimplify relationship

Weighted Averages

Simple to compute
Good predictors for interim measures
Bad at generalizing

Average.weighted(values, weights)

values : data to average
weights : proportions

Advanced Prediction Strategies

Linear Models

Predict Unobserved values
Predict Using Other Measures
Independent Variable

Causes some change

Forecasting

Forecast() function = (x, data_y, data_x)

x : value to use to predict
data_y : actual outcomes
data_x : predictors

Predicted vs. Actual

Compare forecasts to observations

prediction - observation

Absolute Deviation
Mean absolute deviation

Average() of absolute deviation
Useful for comparing predictions

Statistical Significance

<5% probability due to chance
Not 0%
T.test() - Likelihood that differences are significant

tails = 1: Upper or lower distribution
tails = 2: Upper and lower distribution

type

1: Paired-samples
2: Two-sample (equal variance)
3: Two-sample (unequal variance)

Confidence levels

Probability due to chance

3. Poking holes in Predictions

Defining Risk

Exposure to Danger

Likelihood - how frequently occurring

Consequences

Severity

Defining Outliers

Outside normal range

Skew data
Low likelihood
(Potentially) severe

SORTing outliers

SORT() function

range: cells to sort
sort_column: columns to sort by
is_ascending: whether to sort low to high

FILTERing Outliers

Filter() function

range: cells to filter
condition1: filter to apply

>,<=, etc

Sparklines

Sparkline() function

date: The cell(s) to plot
options: Options to configure the plot

charttype

bar, line, column, etc

max

upper limit

Risk
Noise

Noise- Unexplained Variation
Random Numbers

Multiply predictions
=Randbetween(low, high)

4. Case study: Should you change your bakery's menu?

III. Marketing Analytics in Google Sheets

1. Data Validation for Clean Data Entry

Importance of Clean Data Entry

Validation Message

Appears for valid cells
Provides more criteria
Optional in Data Validation settings

Invalid Message

Appears in flagged cells
Explains why the cells is flagged

Data Validations: Dropdowns

Two types of dropdown validations

1 . List from a range

Options generated from a selected range
Better for larger and/or often modified lists

2. List of items

List entered manually in settings
Better for a short list

Text Validation: URL and email validity

URL validity

Criteria: 'is valid url'
Flagged if URL structure is invalid

Email validity

Criteria: 'is valid email'
Flagged if email address is invalid

Data Validation: Checkboxes

Checkbox

Adds interactive checkboxes to cells
Great for cells with binary values
Cell value based on checkbox status
Custom cell values for checked and unchecked boxes

2. What are regular expressions?

What are regular expressions?

A special search pattern
Made a sequence of characters
Also known as 'regex'

When are they used?

To search within strings
Commonly used when filtering by categories

Regular_expression = .*[u|U]sers

.* - matches any number of characters preceding the word 'users'
u|U - matches either the lower or upper case variation of letter 'u'
[u|U]sers - matched term must contain one of the items within the brackets

Basic Regular Expression Characters:

Wildcards

. -> matches any character (Example: d.g -> matches both 'dog' and 'dig')
.* -> matches 0 or more times (Example: .* -> matches all the letters in 'dog'
? -> matches 0 or 1 time (Example: dogs? -> matches either 'dog' or 'dogs'
+ -> matches 1 or more times (Example: dog.+ -> matches 'dogs' but not 'dog'
x|y -> matches x OR y (Example: dog|cat -> matches either 'dog' or 'cat'
\ -> escapes any special character (Example: who\? -> matches 'who?')

Anchors

^x -> the start of a string (Example: ^T -> matches 'The dog likes to dig')
x$ -> the end of a string (Example: g$ -> matches 'The dog likes to dig')

Groups

[x|X] -> matches either 'x' or 'X' (Example: [d|D]og -> matches either 'dog' or 'Dog')
{x} -> matches x number of times (Example: .{2} -> matches 'Do' in 'Dog')

Test a string using REGEXMATCH

=REGEXMATCH (string to test, regular expression)
Great for testing existence of a string within a string

Returns a boolean (True or False)

Often used filtering tables

Creating subtables using Filter()

Useful when aggregating specific categories

SUM()
Average()

Filter with REGEXMATCH()

=Filter(range, REGEXMATCH (string to test, regular expression)
Example: =filter(A2:E29,REGEXMATCH(B2:B29, "Competitors?"))

Aggregate with Regular Expression

=SUM(FILTER(range to sum, REGEXMATCH(string to test, regular expression)))
Example: =sum(filter(D2:D29,REGEXMATCH(B2:B29,"Python Brand")))

Modify a string using REGEXEXTRACT and REGEXREPLACE

=REGEXREPLACE(string, regular expression, replacement string)

Returns: original string and replacement string

Excludes the matched part of the string

Often used to clean up categorical data
Example:
.[u|U]sers : Replaces the word 'Users' after the strings R and Python
Enter ' ' to delete the word entirely
=REGEXREPLACE(`string`,`regular expression`, ' ')

=REGEXEXTRACT( string, regular expression)

Returns: matched portion of the original string
Great for extracting a specific portion of diffrent strings

Parenthesis () indicate the group of characters to extract

To extract Python Users

(.*).Users returns Python

(.*).[u|U]sers would extract 'Python' and 'R' from the campaign names

(.*) matches the part of the string to extract
[u|U]sers is the part of the strings to match, but not extract

USING REGEXREPLACE() and REGEXEXTRACT()

To replace only a certain categorical string:

=IF(REGEXMATCH(test, regex), REGEXREPLACE (string, regex, replacement string) if False)

To extract only a certain categorical string:

=IF(REGEXMATCH(test, regex), REGEXEXTRACT (string, regex) if False)

3. Visualize the Data with Charts

Visualizing CTR Trends

Line Chart

Show basic trend of each line
May be the most recognizable of all trend charts
Great for any number of trends

Area Chart

Shaded under the curve
Great for showing relative differences between trends
Will get messy if too many trends are plotted on a single area chart

Visualizing cost trends

Stacked Area Chart

Aggregate of all trends
Show contribution by each trend
CTR does not work with this since it is an average

Visualizing ad group performance with column & bar charts

Column charts and bar charts are very similar

The only difference is the axis in which the labels reside

Normal bar/column charts

Great for visualizing 1 or 2 ad group metrics at a time

Stacked bar / column charts

Effective at showing a few ad group metrics at a time

100% stacked bar / column charts (with columns / rows switched)

Provide insight into percent make up of the total for each metric

Evaluating campaigns with pie & scatter plots

Goal Completions

Important user engagements that measure the effectiveness of a campaign

Pie/Doughnut Charts

Display relative makeup of a total

Scatter Charts

Great for determining the relative relationship between two metrics

Bubble Charts

Makes the chart more informative by adding another dimension (size)

Build a Digital Marketing Dashboard

Dashboards are used to relay information in a timely manner
Typically includes only basic, easy-to-understand information
Usually made up of charts, but may contain data tables as well
Used for decision making:

Determining a campaign budget based on performance
Allocating resources to fix poor performing ad groups

Dashboard charts review: doughnut/pie charts

Display relative make up total
Same as a pie chart except it has a hole in the middle

Dashboard charts review: stacked column/bar chart

Visualize a few metrics at a time, as a single bar
Bar chart is the same as a column chart except the labels are on the y-axis

Dashboard charts review: scatter/bubble chart

Visualize the relationship between metrics
A scatter chart is the same as a bubble chart, except it does not have weights

3. Visualize the Data with Charts

Data Validation Type: List of Items

Work well for a small number of items
Best choice if the list remains constant
Provides an easy to-use dropdown menu

Data Validation Type: List from range

Works well for a large number of items
Best choice if the list changes often
Provides an easy-to-use dropdown menu

Building a regex table

Filter Tables))

=Filter(range , (REGEXMATCH(string to test, regular expression))

Visualize the data

Requirements of the dashboard

Must be easy to understand

Be understood by a broad range of audiences

Contain key metrics

Only the important metrics that are often used in making key decisions

It must get to the point, while still delivering important information

Building a simple dashboard

Filter with a dropdown
Included the filtered table
Add a pie chart to percent make up of impressions
View campaigns with a stacked column chart
Analyze cost with a bubble chart

Cleaning up the visuals

Common Reporting Issues

Messy Charts
Too Many Charts

Avoid adding too many charts to a dashboard

Adding a dynamic dropdown makes for a better experience

How to make a dashboard dynamic?

Step 1: Add Data Validation Dropdown
Step 2: Create the filtered table and charts

Example formula:
=IF(
len(H3) > 1,
IF(
H2 = "",
FILTER(A1:F29, REGEXMATCH(B1:B29, H3)),
FILTER(A1:F29, REGEXMATCH(B1:B29, H3), REGEXMATCH(A1:A29, H2))
),
FILTER(A2:F29, REGEXMATCH(A2:A29, ".+"))
)

Step 3: Link chart to the filtered table

Link the filtered table to each of the charts in the dashboard.
When the table changes, the charts will reflect the change.
Be sure to test out a few filter combinations

Project MJ

Project MJ

Puzzle.io -Accounting Software

Intermediate Google Sheets Notes in DataCamp