DataCamp: A Guide to Achieving Data Analyst Associate Certification

Introduction

Pursuing this certification can be a rewarding journey, filled with opportunities to expand your knowledge and skills in data analysis. In this blog, I will share my personal experience and provide a comprehensive guide to help you navigate the process of obtaining the Data Analyst Associate Certification.

How Did I Get There?

I applied for a scholarship with Data Engineering Pilipinas and, fortunately, I was selected as one of the scholars to gain access to DataCamp. Previously, I was also a scholar with the Department of Information and Communications Technology (DICT), where I enjoyed a 12-month access to Coursera. During that time, I earned my professional certificates in Google Data Analytics and Google Project Management.

Associate Data Analyst in SQL - Career Track

Before attempting the Data Analyst Associate Certification Exam, I completed a career track that includes eleven brief courses and two projects. This track is estimated to take about 39 hours to finish.

The career track includes the following:

Introduction to SQL
Intermediate SQL
Analyzing Students’ Mental Health - Project
Joining Data in SQL
Data Manipulation in SQL
PostreSQL Summary Stats and Window Functions
Functions for Manipulating Data in PostreSQL
Introduction to Statistics
Exploratory Data Analysis in SQL
Data Driven Decision Making in SQL
Understanding Data Visualization
Data Communication Concepts
Analyzing Motorcycle Part Sales - Project

Associate Data Analyst in SQL - Project

I am presenting my project results to showcase them as part of my portfolio.

A. Analyzing Students’ Mental Health

Project Description

Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

Explore the students data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

-- Find the number of international students and their average scores by length of stay, in descending order of length of stay

SELECT stay,

COUNT(*) AS count_int,

ROUND(AVG(todep), 2) AS average_phq,

ROUND(AVG(tosc), 2) AS average_scs,

ROUND(AVG(toas), 2) AS average_as

FROM students

WHERE inter_dom = 'Inter'

GROUP BY stay

ORDER BY stay DESC;

B. Analyzing Motorcycle Part Sales

Project Description

Working on behalf of a company that sells motorcycle parts, you'll dig into their data to help them understand their revenue streams. You'll build up a query to find out how much net revenue they are generating across their product lines, segregating by date and warehouse.

You're working for a company that sells motorcycle parts, and they've asked for some help in analyzing their sales data!

They operate three warehouses in the area, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfer as payment methods. However, each payment type incurs a different fee.

The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. You have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.

Find out how much Wholesale net revenue each product_line generated per month per warehouse in the dataset.

The query should be saved as revenue_by_product_line using the SQL cell provided, and contain the following:

product_line,
month: displayed as 'June', 'July', and 'August',
warehouse, and
net_revenue: the sum of total minus the sum of payment_fee.

The results should be sorted by product_line and month, followed by net_revenue in descending order.

-- Start coding here

SELECT product_line,

TO_CHAR(date, 'Month') AS month,

warehouse,

sum(total - payment_fee) AS net_revenue

FROM sales

WHERE client_type = 'Wholesale'

GROUP BY 1,2,3

Data Analyst Associate Certification Exam

The Data Analyst Associate Certification Exam includes both theoretical and practical components. This is a timed assessment consisting of 60 theoretical questions to be completed in 2 hours, and 4 practical questions that must be finished within 4 hours.

Theory

In my situation, most of the questions pertain to SQL, statistics, and data visualization. I scored 94, while the passing score was 84. I successfully passed the exam.

Practical Exam

The practical exam consisted of 4 practical question and should answer it correctly. The following are my output and served it as part of my portfolio.

Project Description: Grocery Store Sales

FoodYum is a grocery store chain that is based in the United States.

Food Yum sells items such as produce, meat, dairy, baked goods, snacks, and other household food staples.

As food costs rise, FoodYum wants to make sure it keeps stocking products in all categories that cover a range of prices to ensure they have stock for a broad range of customers.

Task 1

In 2022 there was a bug in the product system. For some products that were added in that year, the year_added value was not set in the data. As the year the product was added may have an impact on the price of the product, this is important information to have.

Write a query to determine how many products have the year_added value missing. Your output should be a single column, missing_year, with a single row giving the number of missing values.

-- Write your query for task 1 in this cell

SELECT COUNT(product_id) AS missing_year

FROM products

WHERE year_added IS NULL

Task 2

Given what you know about the year added data, you need to make sure all of the data is clean before you start your analysis. The table below shows what the data should look like.

Write a query to ensure the product data matches the description provided. Do not update the original table.

This is my first response; however, the output contains an error.

-- Write your query for task 2 in this cell

WITH products_v0 AS

(SELECT

product_id

, CASE WHEN product_type IS NOT NULL THEN product_type

ELSE 'Unknown' END as product_type

, CASE WHEN brand IS NOT NULL THEN brand

ELSE 'Unknown' END as brand

, ROUND(CAST(TRIM(REPLACE(weight,'grams','')) AS NUMERIC),2)

AS weight_0

--Later for the median

, ROUND (CAST(price AS NUMERIC), 2) as price_0 --Later for the median

, CASE WHEN average_units_sold IS NOT NULL THEN average_units_sold

ELSE 0 END as average_units_sold

, CASE WHEN year_added IS NOT NULL THEN year_added

ELSE 2022 END AS year_added

, CASE WHEN UPPER(stock_location) IS NOT NULL THEN UPPER(stock_location)

ELSE 'Unknown' END stock_location

FROM products)

, products_v1 as

(SELECT

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY weight_0) AS Weight_median

, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price_0) AS Price_median

FROM products_v0)

select

product_id

, product_type

, brand

, CASE WHEN weight_0 IS NOT NULL THEN weight_0

ELSE (

SELECT weight_median FROM products_v1

)

END AS weight

, CASE WHEN price_0 IS NOT NULL THEN price_0

ELSE (

SELECT price_median FROM products_v1

)

END AS price

, average_units_sold

, year_added

, stock_location

from products_v0

-----

This is the correct answer.

--Correct Answer

SELECT product_id,

product_type,

REPLACE(brand, '-', 'Unknown') AS brand,

ROUND(REPLACE(weight::TEXT, 'grams', '')::NUMERIC, 2) AS weight,

ROUND(price::numeric, 2) AS price,

average_units_sold,

COALESCE(year_added, '2022') AS year_added,

UPPER(stock_location) AS stock_location

FROM products;

Task 3

To find out how the range varies for each product type, your manager has asked you to determine the minimum and maximum values for each product type.

Write a query to return the product_type, min_price and max_price columns.

-- Write your query for task 3 in this cell

SELECT

product_type

, MIN(price) as min_price

, MAX(price) as max_price

FROM products

GROUP BY 1

Task 4

The team want to look in more detail at meat and dairy products where the average units sold was greater than ten.

Write a query to return the product_id, price and average_units_sold of the rows of interest to the team.

-- Write your query for task 4 in this cell

SELECT product_id

, price

, average_units_sold

FROM products

WHERE product_type IN ('Meat', 'Dairy')

AND average_units_sold > 10

Conclusion

In conclusion, the journey to achieving the Data Analyst Associate Certification is both challenging and rewarding, as demonstrated by the comprehensive preparation and successful completion of various projects and exams. The experience, which included being a scholar with access to DataCamp, laid a solid foundation in SQL, statistics, and data visualization. Completing a career track with practical projects equipped me with real-world skills. This achievement not only enhances my portfolio but also broadens my expertise in data analytics.

Project MJ

Project MJ

Puzzle.io -Accounting Software

DataCamp: A Guide to Achieving Data Analyst Associate Certification

Introduction

How Did I Get There?

Related Link

Associate Data Analyst in SQL - Career Track