Data Science Explorer

SQL Code Analysis 본문

SQL

SQL Code Analysis

grace21110 2023. 10. 11. 17:19
반응형

This SQL code appears to be generating a new column called "quantile" for each row in the result set. The quantile column will have the value "Quantile_1" for rows that meet a specific condition. Let's break down the code step by step:

SELECT 
	*, 
    CASE WHEN RNK <= (SELECT COUNT(DISTINCT user_id)
FROM 
	(SELECT 
		user_id
		, COUNT(DISTINCT order_id) F 
	 FROM orders
     GROUP BY 1
) A) / 10 THEN 'Quantile_1' END quantile
FROM (
	SELECT 
		*
		, ROW_NUMBER() OVER(ORDER BY F DESC) RNK
	FROM (
		SELECT 
			user_id
			, COUNT(DISTINCT order_id) AS F
		FROM 
			orders
		GROUP BY 1
	) A
) A 
;

1. The innermost query:

SELECT 
    user_id,
    COUNT(DISTINCT order_id) AS F
FROM 
    orders
GROUP BY 1
  • This query counts the number of distinct order_ids for each user_id in the "orders" table and assigns it an alias "F." The result will be a list of user_ids and their corresponding F values.

 

2. The middle query (aliased as "A"):

SELECT 
    *,
    ROW_NUMBER() OVER (ORDER BY F DESC) AS RNK
FROM (
    -- Innermost query here
) A
  • The ROW_NUMBER() function is applied to this result set, which assigns a row number to each row based on the descending order of the "F" values.
  • The result set will include all columns from the subquery and an additional column "RNK," indicating the row number.

 

3. The outermost query:

SELECT 
    *,
    CASE WHEN RNK <= (SELECT COUNT(DISTINCT user_id)
                      FROM (
                          -- The innermost query from step 1
                      ) A) / 10 THEN 'Quantile_1' END AS quantile
FROM (
    -- The middle query from step 2
) A
  • The subquery (SELECT COUNT(DISTINCT user_id) FROM ...) counts the number of distinct user_ids from the result of the innermost query in step 1. It calculates the total number of unique users.
  • The CASE statement assigns the value "Quantile_1" to the "quantile" column for rows where the "RNK" is less than or equal to one-tenth of the total number of unique users. This is essentially dividing the rows into 10 quantiles, and "Quantile_1" is assigned to the first quantile.

 

** Kind reminder:

Here is the order of query; FROM WHERE GROUP BY HAVING SELECT ORDER BY** 

 

 

The code is referred a book called SQL로 맛보는 데이터 전처리 분석(노수영 저). 

https://product.kyobobook.co.kr/detail/S000001934242

'SQL' 카테고리의 다른 글

[Basic Grammar] SQL WHERE Clause  (0) 2023.10.12
[Basic Grammar] SQL SELECT Statement  (0) 2023.10.12
SQL Code Analysis  (0) 2023.10.10
SQL Code Analysis  (0) 2023.10.10
SQL Code Analysis  (0) 2023.10.06