Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 |
Tags
- train/test
- Default X points
- For loops
- variables
- break
- Github
- iterates
- error
- MySQL
- Text mining
- start exercise
- continue
- multiple lines
- matplotlib.pyplot
- PANDAS
- polynomial regression
- line color
- PROJECT
- Text Analytics
- AS
- self parameter
- SQL
- Else
- Python
- machine learning
- matplotlib
- __init__
- pie charts
- data distribution
- line width
Archives
- Today
- Total
Data Science Explorer
SQL Code Analysis 본문
반응형
Let's analyze the following code:
SELECT *
FROM (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY department ORDER BY RET_RATIO DESC) AS RNK
FROM (
SELECT
department
, A.product_id
, SUM(reordered) / COUNT(*) AS RET_RATIO
FROM order_products__prior A
LEFT
JOIN products B
ON A.product_id = B.product_id
LEFT
JOIN departments C
ON B.department_id = C.department_id
GROUP BY 1, 2
) A
WHERE product_id IN (
SELECT product_id
FROM (
SELECT
product_id
, SUM(reordered) AS TOTAL
FROM
order_products__prior
GROUP BY 1
) A
WHERE TOTAL >= 10
)
) BASE
WHERE RNK <= 10
;
1. Innermost Subquery (Calculating Repurchase Rates):
SELECT
department
, A.product_id
, SUM(reordered) / COUNT(*) AS RET_RATIO
FROM order_products__prior A
LEFT JOIN products B ON A.product_id = B.product_id
LEFT JOIN departments C ON B.department_id = C.department_id
GROUP BY 1, 2
- In this part, we are calculating the repurchase rates (RET_RATIO) for each product within each department.
- We use JOIN operations to link the order_products__prior, products, and departments tables to obtain information about the products and their departments.
- The data is grouped by department and product_id, and the repurchase rate is calculated as the sum of reordered items divided by the count of all items for each product.
- This subquery calculates the repurchase rates for all products within all departments.
2. Middle Subquery (Filtering Products by Order Quantity):
SELECT product_id
FROM (
SELECT
product_id
, SUM(reordered) AS TOTAL
FROM order_products__prior
GROUP BY 1
) A
WHERE TOTAL >= 10
- This part filters out products based on a minimum total order quantity (TOTAL) threshold of 10 or more.
- The subquery calculates the total order quantity for each product by summing the reordered items for each product_id.
- It filters the products to include only those with a total order quantity of 10 or more.
3. Outer Query (Assigning Rankings and Final Filtering):
SELECT *
FROM (
-- ...
) BASE
WHERE RNK <= 10;
- In the outer query, we assign rankings to the products within each department based on their repurchase rates. The ROW_NUMBER() window function is used to assign these rankings, and PARTITION BY department ensures that rankings are assigned separately for each department.
- The products with higher repurchase rates receive lower rank numbers (sorted in descending order).
- The final WHERE RNK <= 10 filter selects products with a rank of 10 or less within each department. This effectively limits the result set to the top 10 products with the highest repurchase rates in each department.
The code is referred a book called SQL로 맛보는 데이터 전처리 분석(노수영 저).
'SQL' 카테고리의 다른 글
[Basic Grammar] SQL SELECT Statement (0) | 2023.10.12 |
---|---|
SQL Code Analysis (0) | 2023.10.11 |
SQL Code Analysis (0) | 2023.10.10 |
SQL Code Analysis (0) | 2023.10.06 |
SQL Code Analysis (0) | 2023.10.06 |