Intermediate Google Sheets
After finishing the three courses on DataCamp, I’ve officially completed the Intermediate
Google Sheets track—and it’s been a deep dive into the kind of spreadsheet magic that turns raw data into real insight.
This track wasn’t just about formulas. It sharpened my ability to:
🧠Apply introductory statistics directly in Sheets for smarter decision-making
⚠️ Understand error and uncertainty, and how cognitive biases sneak into data work
📊 Build marketing dashboards that are not only dynamic but also clean and client-ready
Each module—from statistical analysis to dashboard design—reinforced my love for modular, scalable systems. Im sharing my notes from the Intermediate Google Sheets track to help others learn and revisit key concepts. I hope this post serves as a useful reference for anyone exploring spreadsheet skills- now and also in the future.
I. Introduction to Statistics in Google Sheets
1. Getting to Know Your Data
- Welcome to the course
- Averages reduces information
- Mean = sum of all observation/number of observations
- Median - middle number of the dataset
- Half the number are less than the median
- Half the number are above the median
- How far from average?
- Variance
- Measures how dispersed a dataset is
- Smaller variance indicates a dataset is less spread
- Large differences between data points increase the variance
- =VARP(range)
- Standard Deviation
- Square root of variance
- =STDEVP(range)
- Quartile()
- Quartile 4: The maximum value in the data
- Quartile 3: 75% of the data is less than the third quartile
- Quartile 2: The median of the data
- Quartile 1: The smallest values 25% of the data
- Standardizing Data
- Variables are sometimes measured on different scales
- Makes it harder to compare
- Easier to misinterpret variable importance
- Solution: standardize your data
- All variables on the same scale
- Z-scores also known as standard scores
2. Statistical Data Visualization
- Visualizing Distribution
- Scatterplots
- Line Estimate - calculates both the slope & the intercept of two variables using the least-squares method.
- Linest(rangeX, rangeY)
- CORREL() - solve the correlation of the given data
- SLOPE() - will return the slope of a trend line or linear regression representing the linear change in one unit to another.
- INTERCEPT() - returns the value where the trendline will intersect the y-axis.
- Bar Charts
3. Statistical Hypothesis Testing
- Central to Stats: Sampling
- What is a population?
- An entire distribution of observations/events
- Costly and time consuming to work with
- Better to "sample" the population
- Sample
- A subset from a population
- Meant to represent the population
- The larger the sample size, the closer the statistics of the sample will emulate the statistics of the population.
- Central Limit Theorem
- If a sample size from an independent random variable is large enough, then the sampling distribution will be normal or nearly normal
- Hypothesis Testing
- Hypothesis: Any testable claim
- Null Hypothesis
- Represented the status quo (accepted fact)
- Represented by Ho
- Alternate/Research Hypothesis
- Challenger Statement
- Represented by H1
- Removing subjectivity in a test
- Hypothesis test use "test statistics" to verify hypothesis
- Example: t-test (In Google Sheets: T.test(range1, range2, tails, type)
- Produces a "p-value"
- p-value: Probability that the results you see are due to chance/error
- Choose a p-value cutoff (Example: 1% or 5%)
- If p-value is less than the cutoff, REJECT the null Hypothesis
- T.Test(range1, range2, tail,type)
- If testing only greater than or less than:
- The Test has 1 tail
- If the Ho operator is Equals
- 2 tails, as the values can be above or below the mean
- If measuring the same observations at different times
- Type = 1
- If measuring different observations with same variance
- Type = 2
- If measuring different observations with different variances
- Type = 3
- Hypothesis Testing with the Z-test
- =Z.Test(range1, testStatistic, StDev)
- Needs 1 range
- Needs a test statistics (i.e. population mean)
- Used with bigger datasets (n>30)
- Fail to Reject Ho if the -value is greater than 0.05
- Reject Ho if the p-value is less than 0.05
- Hypothesis Testing with the Chi-squared test
- The difference in the new group stems from random sampling (Ho)
- There is in fact a meaningful difference
- For a chi-squared to be useful:
- Data has to be in groups (e.g: "Old treatment", "New treatment")
- Avoid really small expected values (<5)
- CHITEST (observed_range, expected_range)
- FAIL TO REJECT Ho if the p-value is greater than 0.05
- REJECT Ho if the p-value is less than 0.05
4. Case Study: Dating Profile Analysis
- Dating Data
- Visuals & Distributions
- Tipping the scale to positive correlation
- Correlation = 1
- Positive: as one var so does the other #
- Correlation = 0
- No relationship among variables
- Correlation = -1
- Negative or opposite relationship
- More complex relationships
II. Error and Uncertainty in Google Sheets
1. Defining Error, Uncertainty, and Risk
- Defining Error and Uncertainty
- Errors in Judgment
- How or why things happened
- Biases, assumptions, psychology
- Errors in Statistics
- Measures of variation or uncertainty
- Incorrect conclusions about the data
- Types of Statistical Errors
- Type 1 Error
- "False Positive"
- Sampling
- Data Quality
- Design
- Type 2 Error
- "False Negative"
- Sampling
- Variation
- Measurement
- City of Seattle Open Data portal - Sources of Open Dataset
- IF Functions
- Unique() function
- Returns distinct values
- Counts of Unique Values
- Countif() function
- range
- criterion
- Other IF Functions
- AVERAGEIF()
- SUMIF()
- Multiple IF conditions
- Countifs()
- Pairs of ranges and criteria
- Allows multiple criteria
- Different criteria in same range
- Averageifs()
- Range
- Pairs of ranges in criteria
- Allows multiple criteria
- Correlation
- Relationship between two groups
- Correlation coefficient (r)
- Varies between -1 and 1
- Positive: Increase/decrease together
- Negative: Opposite increases/decreases
- Zero: No relationship
- Evaluating strength of relationships
- Weak
- -0.3 to 0.3
- Moderate
- -0.3 to -0.7 or 0.3 to 0.7
- Strong
- -0.7 to -1.0 or 0.7 to 1.0
2. Making Accurate Predictions
- Weighted Average vs Linear models
- Linear Model
- Lines of "best fit"
- Easily generalizable
- Reasonable predictions from novel data
- May oversimplify relationship
- Weighted Averages
- Simple to compute
- Good predictors for interim measures
- Bad at generalizing
- Average.weighted(values, weights)
- values : data to average
- weights : proportions
- Advanced Prediction Strategies
- Linear Models
- Predict Unobserved values
- Predict Using Other Measures
- Independent Variable
- Causes some change
- Forecasting
- Forecast() function = (x, data_y, data_x)
- x : value to use to predict
- data_y : actual outcomes
- data_x : predictors
- Predicted vs. Actual
- Compare forecasts to observations
- prediction - observation
- Absolute Deviation
- Mean absolute deviation
- Average() of absolute deviation
- Useful for comparing predictions
- Statistical Significance
- <5% probability due to chance
- Not 0%
- T.test() - Likelihood that differences are significant
- tails = 1: Upper or lower distribution
- tails = 2: Upper and lower distribution
- type
- 1: Paired-samples
- 2: Two-sample (equal variance)
- 3: Two-sample (unequal variance)
- Confidence levels
- Probability due to chance
3. Poking holes in Predictions
- Defining Risk
- Exposure to Danger
- Likelihood - how frequently occurring
- Consequences
- Severity
- Defining Outliers
- Outside normal range
- Skew data
- Low likelihood
- (Potentially) severe
- SORTing outliers
- SORT() function
- range: cells to sort
- sort_column: columns to sort by
- is_ascending: whether to sort low to high
- FILTERing Outliers
- Filter() function
- range: cells to filter
- condition1: filter to apply
- >,<=, etc
- Sparklines
- Sparkline() function
- date: The cell(s) to plot
- options: Options to configure the plot
- charttype
- bar, line, column, etc
- max
- upper limit
- Risk
- Noise
- Noise- Unexplained Variation
- Random Numbers
- Multiply predictions
- =Randbetween(low, high)
4. Case study: Should you change your bakery's menu?
III. Marketing Analytics in Google Sheets
1. Data Validation for Clean Data Entry
- Importance of Clean Data Entry
- Validation Message
- Appears for valid cells
- Provides more criteria
- Optional in Data Validation settings
- Invalid Message
- Appears in flagged cells
- Explains why the cells is flagged
- Data Validations: Dropdowns
- Two types of dropdown validations
- 1 . List from a range
- Options generated from a selected range
- Better for larger and/or often modified lists
- 2. List of items
- List entered manually in settings
- Better for a short list
- Text Validation: URL and email validity
- URL validity
- Criteria: 'is valid url'
- Flagged if URL structure is invalid
- Email validity
- Criteria: 'is valid email'
- Flagged if email address is invalid
- Data Validation: Checkboxes
- Checkbox
- Adds interactive checkboxes to cells
- Great for cells with binary values
- Cell value based on checkbox status
- Custom cell values for checked and unchecked boxes
2. What are regular expressions?
- What are regular expressions?
- A special search pattern
- Made a sequence of characters
- Also known as 'regex'
- When are they used?
- To search within strings
- Commonly used when filtering by categories
- Regular_expression = .*[u|U]sers
- .* - matches any number of characters preceding the word 'users'
- u|U - matches either the lower or upper case variation of letter 'u'
- [u|U]sers - matched term must contain one of the items within the brackets
- Basic Regular Expression Characters:
- Wildcards
- . -> matches any character (Example: d.g -> matches both 'dog' and 'dig')
- .* -> matches 0 or more times (Example: .* -> matches all the letters in 'dog'
- ? -> matches 0 or 1 time (Example: dogs? -> matches either 'dog' or 'dogs'
- + -> matches 1 or more times (Example: dog.+ -> matches 'dogs' but not 'dog'
- x|y -> matches x OR y (Example: dog|cat -> matches either 'dog' or 'cat'
- \ -> escapes any special character (Example: who\? -> matches 'who?')
- Anchors
- ^x -> the start of a string (Example: ^T -> matches 'The dog likes to dig')
- x$ -> the end of a string (Example: g$ -> matches 'The dog likes to dig')
- Groups
- [x|X] -> matches either 'x' or 'X' (Example: [d|D]og -> matches either 'dog' or 'Dog')
- {x} -> matches x number of times (Example: .{2} -> matches 'Do' in 'Dog')
- Test a string using REGEXMATCH
- =REGEXMATCH (string to test, regular expression)
- Great for testing existence of a string within a string
- Returns a boolean (True or False)
- Often used filtering tables
- Creating subtables using Filter()
- Useful when aggregating specific categories
- SUM()
- Average()
- Filter with REGEXMATCH()
- =Filter(range, REGEXMATCH (string to test, regular expression)
- Example: =filter(A2:E29,REGEXMATCH(B2:B29, "Competitors?"))
- Aggregate with Regular Expression
- =SUM(FILTER(range to sum, REGEXMATCH(string to test, regular expression)))
- Example: =sum(filter(D2:D29,REGEXMATCH(B2:B29,"Python Brand")))
- Modify a string using REGEXEXTRACT and REGEXREPLACE
- =REGEXREPLACE(string, regular expression, replacement string)
- Returns: original string and replacement string
- Excludes the matched part of the string
- Often used to clean up categorical data
- Example:
.[u|U]sers : Replaces the word 'Users' after the strings R and Python
Enter ' ' to delete the word entirely
=REGEXREPLACE(`string`,`regular expression`, ' ') - =REGEXEXTRACT( string, regular expression)
- Returns: matched portion of the original string
- Great for extracting a specific portion of diffrent strings
- Parenthesis () indicate the group of characters to extract
- To extract Python Users
- (.*).Users returns Python
- (.*).[u|U]sers would extract 'Python' and 'R' from the campaign names
- (.*) matches the part of the string to extract
- [u|U]sers is the part of the strings to match, but not extract
- USING REGEXREPLACE() and REGEXEXTRACT()
- To replace only a certain categorical string:
- =IF(REGEXMATCH(test, regex), REGEXREPLACE (string, regex, replacement string) if False)
- To extract only a certain categorical string:
- =IF(REGEXMATCH(test, regex), REGEXEXTRACT (string, regex) if False)
3. Visualize the Data with Charts
- Visualizing CTR Trends
- Line Chart
- Show basic trend of each line
- May be the most recognizable of all trend charts
- Great for any number of trends
- Area Chart
- Shaded under the curve
- Great for showing relative differences between trends
- Will get messy if too many trends are plotted on a single area chart
- Visualizing cost trends
- Stacked Area Chart
- Aggregate of all trends
- Show contribution by each trend
- CTR does not work with this since it is an average
- Visualizing ad group performance with column & bar charts
- Column charts and bar charts are very similar
- The only difference is the axis in which the labels reside
- Normal bar/column charts
- Great for visualizing 1 or 2 ad group metrics at a time
- Stacked bar / column charts
- Effective at showing a few ad group metrics at a time
- 100% stacked bar / column charts (with columns / rows switched)
- Provide insight into percent make up of the total for each metric
- Evaluating campaigns with pie & scatter plots
- Goal Completions
- Important user engagements that measure the effectiveness of a campaign
- Pie/Doughnut Charts
- Display relative makeup of a total
- Scatter Charts
- Great for determining the relative relationship between two metrics
- Bubble Charts
- Makes the chart more informative by adding another dimension (size)
- Pie
- Build a Digital Marketing Dashboard
- Dashboards are used to relay information in a timely manner
- Typically includes only basic, easy-to-understand information
- Usually made up of charts, but may contain data tables as well
- Used for decision making:
- Determining a campaign budget based on performance
- Allocating resources to fix poor performing ad groups
- Dashboard charts review: doughnut/pie charts
- Display relative make up total
- Same as a pie chart except it has a hole in the middle
- Dashboard charts review: stacked column/bar chart
- Visualize a few metrics at a time, as a single bar
- Bar chart is the same as a column chart except the labels are on the y-axis
- Dashboard charts review: scatter/bubble chart
- Visualize the relationship between metrics
- A scatter chart is the same as a bubble chart, except it does not have weights
3. Visualize the Data with Charts
- Data Validation Type: List of Items
- Work well for a small number of items
- Best choice if the list remains constant
- Provides an easy to-use dropdown menu
- Data Validation Type: List from range
- Works well for a large number of items
- Best choice if the list changes often
- Provides an easy-to-use dropdown menu
- Building a regex table
- Filter Tables))
- =Filter(range , (REGEXMATCH(string to test, regular expression))
- Visualize the data
- Requirements of the dashboard
- Must be easy to understand
- Be understood by a broad range of audiences
- Contain key metrics
- Only the important metrics that are often used in making key decisions
- It must get to the point, while still delivering important information
- Building a simple dashboard
- Filter with a dropdown
- Included the filtered table
- Add a pie chart to percent make up of impressions
- View campaigns with a stacked column chart
- Analyze cost with a bubble chart
- Cleaning up the visuals
- Common Reporting Issues
- Messy Charts
- Too Many Charts
- Avoid adding too many charts to a dashboard
- Adding a dynamic dropdown makes for a better experience
- How to make a dashboard dynamic?
- Step 1: Add Data Validation Dropdown
- Step 2: Create the filtered table and charts
- Example formula:
=IF(
len(H3) > 1,
IF(
H2 = "",
FILTER(A1:F29, REGEXMATCH(B1:B29, H3)),
FILTER(A1:F29, REGEXMATCH(B1:B29, H3), REGEXMATCH(A1:A29, H2))
),
FILTER(A2:F29, REGEXMATCH(A2:A29, ".+"))
)
- Step 3: Link chart to the filtered table
- Link the filtered table to each of the charts in the dashboard.
- When the table changes, the charts will reflect the change.
- Be sure to test out a few filter combinations
0 Comments