Puzzle.io -Accounting Software

DataCamp: A Guide to Achieving Data Analyst Associate Certification

Data Camp: Data Analyst Certification

Introduction

Pursuing this certification can be a rewarding journey, filled with opportunities to expand your knowledge and skills in data analysis. In this blog, I will share my personal experience and provide a comprehensive guide to help you navigate the process of obtaining the Data Analyst Associate Certification.

How Did I Get There?

I applied for a scholarship with Data Engineering Pilipinas and, fortunately, I was selected as one of the scholars to gain access to DataCamp. Previously, I was also a scholar with the Department of Information and Communications Technology (DICT), where I enjoyed a 12-month access to Coursera. During that time, I earned my professional certificates in Google Data Analytics and Google Project Management.

Related Link

Google Data Analytics Certification Review -Coursera Journey
Google Project Management Certification Review - Coursera Journey

Associate Data Analyst in SQL - Career Track 

Before attempting the Data Analyst Associate Certification Exam, I completed a career track that includes eleven brief courses and two projects. This track is estimated to take about 39 hours to finish.

The career track includes the following:

  • Introduction to SQL
  • Intermediate SQL
  • Analyzing Students’ Mental Health - Project
  • Joining Data in SQL
  • Data Manipulation in SQL
  • PostreSQL Summary Stats and Window Functions
  • Functions for Manipulating Data in PostreSQL
  • Introduction to Statistics
  • Exploratory Data Analysis in SQL
  • Data Driven Decision Making in SQL
  • Understanding Data Visualization
  • Data Communication Concepts
  • Analyzing Motorcycle Part Sales - Project

Associate Data Analyst in SQL - Project

I am presenting my project results to showcase them as part of my portfolio.

A. Analyzing Students’ Mental Health

Project Description

Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

Explore the students data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

-- Find the number of international students and their average scores by length of stay, in descending order of length of stay

SELECT stay, 

       COUNT(*) AS count_int,

       ROUND(AVG(todep), 2) AS average_phq, 

       ROUND(AVG(tosc), 2) AS average_scs, 

       ROUND(AVG(toas), 2) AS average_as

FROM students

WHERE inter_dom = 'Inter'

GROUP BY stay

ORDER BY stay DESC;

B. Analyzing Motorcycle Part Sales

Project Description

Working on behalf of a company that sells motorcycle parts, you'll dig into their data to help them understand their revenue streams. You'll build up a query to find out how much net revenue they are generating across their product lines, segregating by date and warehouse.

You're working for a company that sells motorcycle parts, and they've asked for some help in analyzing their sales data!

They operate three warehouses in the area, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfer as payment methods. However, each payment type incurs a different fee.

The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. You have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.

Find out how much Wholesale net revenue each product_line generated per month per warehouse in the dataset.

The query should be saved as revenue_by_product_line using the SQL cell provided, and contain the following:

  • product_line,
  • month: displayed as 'June', 'July', and 'August',
  • warehouse, and
  • net_revenue: the sum of total minus the sum of payment_fee.

The results should be sorted by product_line and month, followed by net_revenue in descending order.

-- Start coding here

SELECT product_line,

 TO_CHAR(date, 'Month') AS month,

 warehouse,

 sum(total - payment_fee) AS net_revenue

FROM sales

WHERE client_type = 'Wholesale'

GROUP BY 1,2,3

Data Analyst Associate Certification Exam

The Data Analyst Associate Certification Exam includes both theoretical and practical components. This is a timed assessment consisting of 60 theoretical questions to be completed in 2 hours, and 4 practical questions that must be finished within 4 hours.

Theory

In my situation, most of the questions pertain to SQL, statistics, and data visualization. I scored 94, while the passing score was 84. I successfully passed the exam.

Practical Exam

The practical exam consisted of 4 practical question and should answer it correctly. The following are my output and served it as part of my portfolio. 

Project Description: Grocery Store Sales

FoodYum is a grocery store chain that is based in the United States.

Food Yum sells items such as produce, meat, dairy, baked goods, snacks, and other household food staples.

As food costs rise, FoodYum wants to make sure it keeps stocking products in all categories that cover a range of prices to ensure they have stock for a broad range of customers.

Task 1

In 2022 there was a bug in the product system. For some products that were added in that year, the year_added value was not set in the data. As the year the product was added may have an impact on the price of the product, this is important information to have.

Write a query to determine how many products have the year_added value missing. Your output should be a single column, missing_year, with a single row giving the number of missing values.

-- Write your query for task 1 in this cell

SELECT COUNT(product_id) AS missing_year

FROM products

WHERE year_added IS NULL

Task 2

Given what you know about the year added data, you need to make sure all of the data is clean before you start your analysis. The table below shows what the data should look like. 

Write a query to ensure the product data matches the description provided. Do not update the original table. 

This is my first response; however, the output contains an error.

-- Write your query for task 2 in this cell

WITH products_v0 AS

(SELECT 

 product_id

, CASE WHEN product_type IS NOT NULL THEN product_type

 ELSE 'Unknown' END as product_type

, CASE WHEN brand IS NOT NULL THEN brand

 ELSE 'Unknown' END as brand

, ROUND(CAST(TRIM(REPLACE(weight,'grams','')) AS NUMERIC),2) 

 AS weight_0

 --Later for the median

, ROUND (CAST(price AS NUMERIC), 2) as price_0 --Later for the median

, CASE WHEN average_units_sold IS NOT NULL THEN average_units_sold

 ELSE 0 END as average_units_sold

, CASE WHEN year_added IS NOT NULL THEN year_added

 ELSE 2022 END AS year_added

, CASE WHEN UPPER(stock_location) IS NOT NULL THEN UPPER(stock_location)

 ELSE 'Unknown' END stock_location

FROM products)


, products_v1 as

(SELECT 

 PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY weight_0) AS Weight_median

, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price_0) AS Price_median

FROM products_v0)


select 

 product_id

 , product_type

 , brand

 , CASE WHEN weight_0 IS NOT NULL THEN weight_0

 ELSE (

 SELECT weight_median FROM products_v1

 ) 

 END AS weight

 , CASE WHEN price_0 IS NOT NULL THEN price_0

 ELSE (

 SELECT price_median FROM products_v1

 ) 

 END AS price

 , average_units_sold

 , year_added

 , stock_location

 from products_v0

-----

This is the correct answer.

 --Correct Answer

 SELECT product_id, 

 product_type,

 REPLACE(brand, '-', 'Unknown') AS brand,

 ROUND(REPLACE(weight::TEXT, 'grams', '')::NUMERIC, 2) AS weight,

 ROUND(price::numeric, 2) AS price,

 average_units_sold,

 COALESCE(year_added, '2022') AS year_added,

 UPPER(stock_location) AS stock_location

FROM products;

Task 3

To find out how the range varies for each product type, your manager has asked you to determine the minimum and maximum values for each product type.

Write a query to return the product_type, min_price and max_price columns.

-- Write your query for task 3 in this cell

SELECT

product_type

, MIN(price) as min_price

, MAX(price) as max_price

FROM products

GROUP BY 1

Task 4

The team want to look in more detail at meat and dairy products where the average units sold was greater than ten.

Write a query to return the product_id, price and average_units_sold of the rows of interest to the team.

-- Write your query for task 4 in this cell

SELECT product_id

, price

, average_units_sold

FROM products

WHERE product_type IN ('Meat', 'Dairy')

AND average_units_sold > 10

Conclusion

In conclusion, the journey to achieving the Data Analyst Associate Certification is both challenging and rewarding, as demonstrated by the comprehensive preparation and successful completion of various projects and exams. The experience, which included being a scholar with access to DataCamp, laid a solid foundation in SQL, statistics, and data visualization. Completing a career track with practical projects equipped me with real-world skills. This achievement not only enhances my portfolio but also broadens my expertise in data analytics.

Post a Comment

0 Comments