Data Science Explorer

SQL Code Analysis 본문

SQL

SQL Code Analysis

grace21110 2023. 10. 10. 17:29
반응형

Let's analyze the following code: 

SELECT * 
FROM (
	SELECT 
		* 
		, ROW_NUMBER() OVER(PARTITION BY department ORDER BY RET_RATIO DESC) AS RNK 
	FROM (
		SELECT 
				department
				, A.product_id
				, SUM(reordered) / COUNT(*) AS RET_RATIO
		FROM order_products__prior A 
		LEFT 
		JOIN products B 
		ON A.product_id = B.product_id 
		LEFT 
		JOIN departments C
		ON B.department_id = C.department_id
		GROUP BY 1, 2
	) A
	WHERE product_id IN (
		SELECT product_id 
		FROM (
			SELECT 
				product_id
				, SUM(reordered) AS TOTAL
			FROM
				order_products__prior
			GROUP BY 1 
		) A
		WHERE TOTAL >= 10
	)
) BASE
WHERE RNK <= 10
;

1. Innermost Subquery (Calculating Repurchase Rates):

SELECT 
    department
    , A.product_id
    , SUM(reordered) / COUNT(*) AS RET_RATIO
FROM order_products__prior A 
LEFT JOIN products B ON A.product_id = B.product_id 
LEFT JOIN departments C ON B.department_id = C.department_id
GROUP BY 1, 2
  • In this part, we are calculating the repurchase rates (RET_RATIO) for each product within each department.
  • We use JOIN operations to link the order_products__prior, products, and departments tables to obtain information about the products and their departments.
  • The data is grouped by department and product_id, and the repurchase rate is calculated as the sum of reordered items divided by the count of all items for each product.
  • This subquery calculates the repurchase rates for all products within all departments.

 

2. Middle Subquery (Filtering Products by Order Quantity):

SELECT product_id 
FROM (
    SELECT 
        product_id
        , SUM(reordered) AS TOTAL
    FROM order_products__prior
    GROUP BY 1 
) A
WHERE TOTAL >= 10
  • This part filters out products based on a minimum total order quantity (TOTAL) threshold of 10 or more.
  • The subquery calculates the total order quantity for each product by summing the reordered items for each product_id.
  • It filters the products to include only those with a total order quantity of 10 or more.

 

3. Outer Query (Assigning Rankings and Final Filtering):

SELECT * 
FROM (
    -- ...
) BASE
WHERE RNK <= 10;
  • In the outer query, we assign rankings to the products within each department based on their repurchase rates. The ROW_NUMBER() window function is used to assign these rankings, and PARTITION BY department ensures that rankings are assigned separately for each department.
  • The products with higher repurchase rates receive lower rank numbers (sorted in descending order).
  • The final WHERE RNK <= 10 filter selects products with a rank of 10 or less within each department. This effectively limits the result set to the top 10 products with the highest repurchase rates in each department.

 


The code is referred a book called SQL로 맛보는 데이터 전처리 분석(노수영 저). 

https://product.kyobobook.co.kr/detail/S000001934242

'SQL' 카테고리의 다른 글

[Basic Grammar] SQL SELECT Statement  (0) 2023.10.12
SQL Code Analysis  (0) 2023.10.11
SQL Code Analysis  (0) 2023.10.10
SQL Code Analysis  (0) 2023.10.06
SQL Code Analysis  (0) 2023.10.06