Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 |
Tags
- Text mining
- start exercise
- pie charts
- MySQL
- machine learning
- For loops
- error
- Python
- AS
- SQL
- Else
- Github
- line color
- continue
- train/test
- data distribution
- line width
- matplotlib.pyplot
- Default X points
- Text Analytics
- PANDAS
- multiple lines
- break
- matplotlib
- variables
- PROJECT
- self parameter
- iterates
- __init__
- polynomial regression
Archives
- Today
- Total
Data Science Explorer
SQL Code Analysis 본문
반응형
This SQL code appears to be generating a new column called "quantile" for each row in the result set. The quantile column will have the value "Quantile_1" for rows that meet a specific condition. Let's break down the code step by step:
SELECT
*,
CASE WHEN RNK <= (SELECT COUNT(DISTINCT user_id)
FROM
(SELECT
user_id
, COUNT(DISTINCT order_id) F
FROM orders
GROUP BY 1
) A) / 10 THEN 'Quantile_1' END quantile
FROM (
SELECT
*
, ROW_NUMBER() OVER(ORDER BY F DESC) RNK
FROM (
SELECT
user_id
, COUNT(DISTINCT order_id) AS F
FROM
orders
GROUP BY 1
) A
) A
;
1. The innermost query:
SELECT
user_id,
COUNT(DISTINCT order_id) AS F
FROM
orders
GROUP BY 1
- This query counts the number of distinct order_ids for each user_id in the "orders" table and assigns it an alias "F." The result will be a list of user_ids and their corresponding F values.
2. The middle query (aliased as "A"):
SELECT
*,
ROW_NUMBER() OVER (ORDER BY F DESC) AS RNK
FROM (
-- Innermost query here
) A
- The ROW_NUMBER() function is applied to this result set, which assigns a row number to each row based on the descending order of the "F" values.
- The result set will include all columns from the subquery and an additional column "RNK," indicating the row number.
3. The outermost query:
SELECT
*,
CASE WHEN RNK <= (SELECT COUNT(DISTINCT user_id)
FROM (
-- The innermost query from step 1
) A) / 10 THEN 'Quantile_1' END AS quantile
FROM (
-- The middle query from step 2
) A
- The subquery (SELECT COUNT(DISTINCT user_id) FROM ...) counts the number of distinct user_ids from the result of the innermost query in step 1. It calculates the total number of unique users.
- The CASE statement assigns the value "Quantile_1" to the "quantile" column for rows where the "RNK" is less than or equal to one-tenth of the total number of unique users. This is essentially dividing the rows into 10 quantiles, and "Quantile_1" is assigned to the first quantile.
** Kind reminder:
Here is the order of query; FROM WHERE GROUP BY HAVING SELECT ORDER BY**
The code is referred a book called SQL로 맛보는 데이터 전처리 분석(노수영 저).
'SQL' 카테고리의 다른 글
[Basic Grammar] SQL WHERE Clause (0) | 2023.10.12 |
---|---|
[Basic Grammar] SQL SELECT Statement (0) | 2023.10.12 |
SQL Code Analysis (0) | 2023.10.10 |
SQL Code Analysis (0) | 2023.10.10 |
SQL Code Analysis (0) | 2023.10.06 |