Transpose 1s and 0s into separate rows in Snowflake - snowflake-cloud-data-platform

I'm trying to load a file and transpose each character of a value into its own row.
The DAYS column holds strings such as 11010011, and I need to transpose them into a vertical format.
Below is the sample input.
I'm trying to get the expected output shown below.
Can you please help me with this in Snowflake? I appreciate your help.

Replace '1' with '1,' and '0' with '0,', then trim the trailing comma. You can then use SPLIT_TO_TABLE to turn the comma-delimited string into rows:
with SOURCE_DATA as
(
select COLUMN1::int as FACTORY
,COLUMN2::int as YEAR
,COLUMN3::string as DAYS
from (values
(01,2021,'01010100100101010001'),
(99,2021,'00100111010101011010')
)
)
select FACTORY, YEAR, SEQ as SOURCE_ROW, INDEX as POSITION_IN_STRING, VALUE as WORKING_DAY
from SOURCE_DATA, table(split_to_table(trim(replace(replace(DAYS,'1','1,'),'0','0,'),','),',')) D
;
Abbreviated output:
FACTORY YEAR SOURCE_ROW POSITION_IN_STRING WORKING_DAY
1 2021 1 1 0
1 2021 1 2 1
1 2021 1 3 0
1 2021 1 4 1
1 2021 1 5 0
The SPLIT_TO_TABLE table function returns metadata columns (SEQ and INDEX) alongside VALUE. Change the sample to SELECT * to see them; they may be useful for your requirements.
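To make the REPLACE / TRIM / SPLIT_TO_TABLE steps concrete, here is a small Python sketch (plain Python, not Snowflake code) that mimics them on the sample strings from the answer. The helper name `transpose_days` is hypothetical; the 1-based position imitates the INDEX column of SPLIT_TO_TABLE via `enumerate(..., start=1)`.

```python
# Sample rows from the answer's SOURCE_DATA CTE: (FACTORY, YEAR, DAYS)
SOURCE_DATA = [
    (1, 2021, "01010100100101010001"),
    (99, 2021, "00100111010101011010"),
]


def transpose_days(rows):
    """Hypothetical helper: one output row per character of DAYS."""
    result = []
    for factory, year, days in rows:
        # Mirror REPLACE(REPLACE(DAYS, '1', '1,'), '0', '0,')
        delimited = days.replace("1", "1,").replace("0", "0,")
        # Mirror TRIM(..., ','): strip commas from both ends
        delimited = delimited.strip(",")
        # Mirror SPLIT_TO_TABLE(..., ','): 1-based position plus the piece itself
        for position, value in enumerate(delimited.split(","), start=1):
            result.append((factory, year, position, value))
    return result


rows = transpose_days(SOURCE_DATA)
for row in rows[:5]:
    print(row)
```

Because every character becomes one row, the delimiter trick only exists so that SPLIT_TO_TABLE sees single characters; in Python alone, `enumerate(days, start=1)` would give the same pairs directly.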

Related

SQL Server query problem: example is in an Excel sheet picture

Please see the following picture. I want to convert this Excel formula to SQL Server.
In the Excel sheet:
Row M N
15 1 0
16 3 1
17 5 2
18 8 4
19 9 4
N16 = IF(M16-M15<=1, N15, M16-M15-1+N15)
Please see the screenshot for reference:
Given your tags, this can be done with LAG followed by a running total.
For each row, first calculate the difference between M and the previous row's M (using LAG); I call this Dif_Last_M. This mirrors the 'M16-M15' part of your formula. Then:
If Dif_Last_M is <= 1, add 0 to the running total (so the running total stays the same as on the previous row).
Otherwise (Dif_Last_M > 1), add Dif_Last_M minus 1 to the running total.
Here is the code, assuming your source table is called #Temp and has an ID column that defines the sort order:
WITH M_info AS
(SELECT ID, M, (M - LAG(M, 1) OVER (ORDER BY ID)) AS Dif_Last_M
FROM #Temp
)
SELECT ID,
M,
SUM(CASE WHEN Dif_Last_M > 1 THEN Dif_Last_M - 1 ELSE 0 END) OVER (ORDER BY ID) AS N
FROM M_info;
And here are the results
ID M N
1 1 0
2 3 1
3 5 2
4 8 4
5 9 4
6 12 6
7 13 6
Here is a db<>fiddle with the above. It also includes additional queries showing:
The result from the CTE
The values used in the running total
Note that while it is possible to do this with recursive CTEs, they tend to have performance problems (fundamentally, they are loops). So it is better, performance-wise, to avoid recursive CTEs where possible.
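The equivalence between the Excel recurrence and the LAG-plus-running-total formulation can be checked with a short Python sketch (plain Python rather than T-SQL; both function names are hypothetical). It uses the M values from the results table and assumes, as in the sheet, that the first N is 0.

```python
# M values from the results table above
M = [1, 3, 5, 8, 9, 12, 13]


def excel_n(m_values):
    """Row-by-row translation of N16 = IF(M16-M15<=1, N15, M16-M15-1+N15), first N = 0."""
    n = [0]
    for prev, cur in zip(m_values, m_values[1:]):
        diff = cur - prev
        n.append(n[-1] if diff <= 1 else n[-1] + diff - 1)
    return n


def lag_running_total(m_values):
    """Same result via Dif_Last_M (M - LAG(M)) and a running SUM of the CASE expression."""
    # LAG returns NULL on the first row; None plays that role here
    dif_last_m = [None] + [cur - prev for prev, cur in zip(m_values, m_values[1:])]
    total, n = 0, []
    for d in dif_last_m:
        # CASE WHEN Dif_Last_M > 1 THEN Dif_Last_M - 1 ELSE 0 END (NULL falls to ELSE)
        total += d - 1 if d is not None and d > 1 else 0
        n.append(total)
    return n


print(excel_n(M))
print(lag_running_total(M))
```

Each step only needs the previous row's M, which is why a window function is enough and no recursion is required.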

Using group by in Proc SQL for SAS

I am trying to summarize my data set using PROC SQL, but I get repeated rows in the output. A simplified version of my code is:
PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT patid,ndc,fill_mon,
COUNT(dea) AS n_dea,
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx
GROUP BY patid,ndc,fill_mon;
QUIT;
Some sample output is:
Obs Patid Ndc FILL_mon n_dea DEDUCT
3815 33003605204 00054465029 2000-05 2 0
3816 33003605204 00054465029 2000-05 2 0
12257 33004361450 00406035701 2000-06 2 0
16564 33004744098 00603128458 2000-05 2 0
16565 33004744098 00603128458 2000-05 2 0
16566 33004744098 00603128458 2000-06 2 0
16567 33004744098 00603128458 2000-06 2 0
46380 33008165116 00406035705 2000-06 2 0
85179 33013674758 00406035801 2000-05 2 0
89248 33014228307 00054465029 2000-05 2 0
107514 33016949900 00406035805 2000-06 2 0
135047 33056226897 63481062370 2000-05 2 0
213691 33065594501 00472141916 2000-05 2 0
215192 33065657835 63481062370 2000-06 2 0
242848 33066899581 60432024516 2000-06 2 0
As you can see, there are repeated rows in the output, for example obs 3815 and 3816. I have seen others with a similar problem, but the answers didn't work for me.
The PROC CONTENTS output for the dataset is:
The SAS System 5
17:01 Thursday, December 3, 2015
The CONTENTS Procedure
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 210
First Data Page 1
Max Obs per Page 1360
Obs in First Data Page 1310
Number of Data Set Repairs 0
Filename /home/zahram/optum/rx_4.sas7bdat
Release Created 9.0401M2
Host Created Linux
Inode Number 424673574
Access Permission rw-r-----
Owner Name zahram
File Size (bytes) 13828096
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
3 FILL_mon Num 8 YYMMD. Fill month
2 Ndc Char 11 $11. $20. Ndc
1 Patid Num 8 19. Patid
4 n_dea Num 8
5 tot_DEDUCT Num 8
Sort Information
Sortedby Patid Ndc FILL_mon
Validated YES
Character Set ASCII
Sort Option NODUPKEY
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.08 seconds
cpu time 0.01 seconds
I'll guess that you have a format on a variable, most likely the date. PROC SQL does not group by the formatted values: it groups by the underlying values but still displays them formatted, so different dates in the same month appear as duplicates. Your PROC CONTENTS output confirms this, since FILL_mon has the YYMMD. format. You can get around this by converting the variable to a character variable with PUT and grouping on that:
PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT patid,ndc, put(fill_mon, yymmd.) as fill_month,
COUNT(dea) AS n_dea,
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx
GROUP BY patid,ndc, calculated fill_month;
QUIT;
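The effect described above can be reproduced outside SAS with a small Python sketch. The IDs and dates below are hypothetical, and `yymmd` is only a rough stand-in for the SAS YYMMD. format (year and month, hyphen-separated), not a SAS API.

```python
from collections import Counter
from datetime import date

# Hypothetical prescription fills: (patid, ndc, fill date)
FILLS = [
    ("P1", "NDC1", date(2000, 5, 3)),
    ("P1", "NDC1", date(2000, 5, 17)),
    ("P1", "NDC1", date(2000, 6, 9)),
]


def yymmd(d):
    """Rough stand-in for the SAS YYMMD. format: show only year and month."""
    return d.strftime("%Y-%m")


# GROUP BY on the raw date: each distinct day forms its own group ...
raw_groups = Counter((patid, ndc, d) for patid, ndc, d in FILLS)
# ... but printed through the format, two of those groups look identical
displayed = [(patid, ndc, yymmd(d), n) for (patid, ndc, d), n in raw_groups.items()]

# GROUP BY on the formatted character value, like put(fill_mon, yymmd.)
month_groups = Counter((patid, ndc, yymmd(d)) for patid, ndc, d in FILLS)

for row in displayed:
    print("raw group:", row)
for key, n in month_groups.items():
    print("month group:", key, n)
```

Grouping on the raw values yields three groups, two of which print as 2000-05; grouping on the formatted string collapses them into one.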

Running counts of records and sum of max() values within a date range based on specified intervals in T-SQL

Sample data (assume year_month_record is the first day of the month and has the datetime data type):
location item year_month_record type visits1 visits2
ABC111 11JF445553 2014-01 sales 3 5
ABC111 11JF445553 2014-02 sales 3 6
ABC111 11JF445553 2014-03 sales 2 8
ABC111 11JF445553 2014-04 sales 2 4
ABC111 22WZ777814 2014-02 sales 3 5
ABC111 55RR342013 2014-01 nsales 1 2
For the given sample data, I need to count how many times records with the same location and item appear within specified intervals. In addition, I need to take the maximum value for each specified interval/time frame and sum it up by location, item_number and type.
The output should look something like this:
location year_month_record length_months type count_unique_visits sum_max_visits1 sum_max_visits2
ABC111 2014-01 3 sales 4 6 13
ABC111 2014-02 3 sales 4 6 12
ABC111 2014-03 3 sales 2 4 12
ABC111 2014-04 3 sales 1 2 4
ABC111 2014-01 3 nsales 1 1 2
Notes on calculating visits1/visits2 above:
Example for output record 1: max(visits1) of item 11JF445553 = 3, plus max(visits1) of item 22WZ777814 = 3, so the sum is 6 (item 55RR342013 is excluded because it has a different type). All records whose maximums are summed fall within the specified length_months of 3 months, i.e. 2014-01 through 2014-03.
A new type starts a new grouping.
Additional notes:
count_unique_visits is the count of records within the date range
length_months is defined prior to execution and can be hardcoded
The date range is the current year_month_record plus length_months; e.g. year_month_record 2014-01 with length_months = 3 covers 01/2014 through 03/2014
I've tried creating a recursive CTE to select the count and max, but I'm doing something wrong.
Basically, I need to be able to recursively grab a count and the max visits1/visits2 for a given interval.
Starting with 01/2014, it would need to look for the max(visits1/2) over the next three months (basically, 01/2014 - 04/2014) and return those. For 02/2014, it would use the range 02/2014 through 05/2014 and return the max there as well. It would continue like this throughout the recordset. The interval would be 3 months, but I could then copy the query, replace it with 6 months, and so on.
Closing this topic to ask a more targeted/specific question.
Any help would be appreciated.
You can use a combination of a grouping subquery followed by a CROSS APPLY subquery:
DECLARE @len int = 3
SELECT grp.*, SUM(ca.cuv) count_unique_visits, SUM(ca.visits1) sum_max_visits1, SUM(ca.visits2) sum_max_visits2
FROM
-- one row per location / month / type
(SELECT v.location, v.year_month_record, v.type
FROM Visits v
GROUP BY v.location, v.year_month_record, v.type) grp CROSS APPLY
-- per item: count and maxima within [year_month_record, year_month_record + @len months)
(SELECT COUNT(*) cuv, MAX(visits1) visits1, MAX(visits2) visits2
FROM Visits ca_v
WHERE ca_v.location = grp.location AND grp.type = ca_v.type AND ca_v.year_month_record >= grp.year_month_record AND
ca_v.year_month_record < DATEADD(month, @len, grp.year_month_record)
GROUP BY ca_v.item
) ca
GROUP BY grp.location, grp.year_month_record, grp.type
ORDER BY grp.type DESC, grp.year_month_record
You can see the results in this SQLFiddle.
NOTE: As I wrote in the comment on the original question, I suspect there is a mistake in the requested output. If not, please explain.
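The grouping-plus-CROSS APPLY logic can be traced on the question's sample data with a Python sketch. This is plain Python, not T-SQL, and `windowed_summary` and `month_index` are hypothetical names: the set of anchors plays the role of the grouping subquery, the per-item aggregation plays the CROSS APPLY, and the final sums match the outer SUM().

```python
from collections import defaultdict

# Sample data from the question: (location, item, year, month, type, visits1, visits2)
VISITS = [
    ("ABC111", "11JF445553", 2014, 1, "sales", 3, 5),
    ("ABC111", "11JF445553", 2014, 2, "sales", 3, 6),
    ("ABC111", "11JF445553", 2014, 3, "sales", 2, 8),
    ("ABC111", "11JF445553", 2014, 4, "sales", 2, 4),
    ("ABC111", "22WZ777814", 2014, 2, "sales", 3, 5),
    ("ABC111", "55RR342013", 2014, 1, "nsales", 1, 2),
]


def month_index(year, month):
    # Count months from a fixed origin so that adding length_months is integer math
    return year * 12 + (month - 1)


def windowed_summary(rows, length_months):
    """Hypothetical helper mirroring the grouping subquery + CROSS APPLY."""
    # Grouping subquery: distinct (location, year_month_record, type)
    anchors = sorted({(loc, month_index(y, m), typ) for loc, _, y, m, typ, _, _ in rows})
    summary = []
    for loc, start, typ in anchors:
        # CROSS APPLY: collect each item's rows in [start, start + length_months)
        per_item = defaultdict(list)
        for r_loc, item, y, m, r_typ, v1, v2 in rows:
            in_window = start <= month_index(y, m) < start + length_months
            if r_loc == loc and r_typ == typ and in_window:
                per_item[item].append((v1, v2))
        # Outer SUM(): add up COUNT(*), MAX(visits1), MAX(visits2) across items
        count = sum(len(vals) for vals in per_item.values())
        sum_max1 = sum(max(v1 for v1, _ in vals) for vals in per_item.values())
        sum_max2 = sum(max(v2 for _, v2 in vals) for vals in per_item.values())
        label = f"{start // 12}-{start % 12 + 1:02d}"
        summary.append((loc, label, typ, count, sum_max1, sum_max2))
    return summary


for row in windowed_summary(VISITS, 3):
    print(row)
```

With a 3-month window, the sketch gives (4, 6, 13) for 2014-02 and (2, 2, 8) for 2014-03 instead of the requested (4, 6, 12) and (2, 4, 12), which is consistent with the suspected mistake in the requested output.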

Condense multiple rows to single row with counts based on unique values in sqlite

I am trying to condense a table which contains multiple rows per event to a smaller table which contains counts of key sub-events within each event. Events are defined based on unique combinations across columns.
As a specific example, say I have the following data involving customer visits to various stores on different dates with different items purchased:
cust date store item_type
a 1 Main St 1
a 1 Main St 2
a 1 Main St 2
a 1 Main St 2
b 1 Main St 1
b 1 Main St 2
b 1 Main St 2
c 1 Main St 1
d 2 Elm St 1
d 2 Elm St 3
e 2 Main St 1
e 2 Main St 1
a 3 Main St 1
a 3 Main St 2
I would like to restructure the data to a table that contains a single line per customer visit on a given day, with appropriate counts. I am trying to understand how to use SQLite to condense this to:
Index cust date store n_items item1 item2 item3 item4
1 a 1 Main St 4 1 3 0 0
2 b 1 Main St 3 1 2 0 0
3 c 1 Main St 1 1 0 0 0
4 d 2 Elm St 2 1 0 1 0
5 e 2 Main St 2 2 0 0 0
6 a 3 Main St 2 1 1 0 0
I can do this in Excel for this trivial example (begin with SUMPRODUCT(customer * date) as suggested here, followed by a cumulative sum on this column to generate Index, then COUNTIF and COUNTIFS to generate the desired counts).
Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.
Sadly, my SQLite kung-fu is weak.
I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.
When I tried a more basic approach, starting by generating a unique index:
CREATE UNIQUE INDEX ui ON t(cust, date);
I get:
Error: indexed columns are not unique
I would greatly appreciate any help with where to start. Many thanks in advance!
To create one result record for each unique combination of column values, use GROUP BY.
The number of records in the group is available with COUNT.
To count specific item types, use a boolean expression like item_type=x, which returns 0 or 1, and sum this over all records in the group:
SELECT cust,
date,
store,
COUNT(*) AS n_items,
SUM(item_type = 1) AS item1,
SUM(item_type = 2) AS item2,
SUM(item_type = 3) AS item3,
SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
date,
store
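To try the query, here is a minimal Python sketch using the standard-library sqlite3 module with the question's sample data. The table name t matches the question; the ORDER BY is an addition so the rows come back in the same order as the desired output (GROUP BY alone does not guarantee an order), and `enumerate` supplies the Index column.

```python
import sqlite3

# Sample data from the question: (cust, date, store, item_type)
ROWS = [
    ("a", 1, "Main St", 1), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2),
    ("b", 1, "Main St", 1), ("b", 1, "Main St", 2), ("b", 1, "Main St", 2),
    ("c", 1, "Main St", 1),
    ("d", 2, "Elm St", 1), ("d", 2, "Elm St", 3),
    ("e", 2, "Main St", 1), ("e", 2, "Main St", 1),
    ("a", 3, "Main St", 1), ("a", 3, "Main St", 2),
]

# The answer's query, plus ORDER BY for a stable output order
QUERY = """
SELECT cust,
       date,
       store,
       COUNT(*) AS n_items,
       SUM(item_type = 1) AS item1,
       SUM(item_type = 2) AS item2,
       SUM(item_type = 3) AS item3,
       SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
         date,
         store
ORDER BY date, cust
"""


def condense(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (cust TEXT, date INTEGER, store TEXT, item_type INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for index, row in enumerate(condense(ROWS), start=1):
    print(index, *row)
```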

Total field in Crosstab query in SQL Server 2008

I got this result using the answer given by bluefeet:
Equipt BSL AQ
TFP 3 2
TM 1 0
VCB 18 6
VCD 5 8
The query was:
SELECT Equipt, [BSL] AS BSL, [AQ] AS AQ
FROM
(
SELECT Equipt, Shed
FROM PunctualityMain WHERE Date >= '4/1/2012' AND Date <= '4/30/2012' AND classification = 'Loco'
) x
PIVOT
(
COUNT(Shed)
FOR Shed IN ([BSL], [AQ])
) p
Is it possible to add a total field to the above script, like in an Access crosstab query?
Equipt BSL AQ TTL
TFP 3 2 5
TM 1 0 1
VCB 18 6 24
VCD 5 8 13
You should be able to add the TTL field by including a new column that adds the pivoted columns together:
SELECT Equipt, [BSL] AS BSL, [AQ] AS AQ, ([BSL] + [AQ]) as TTL
FROM
(
SELECT Equipt, Shed
FROM PunctualityMain
WHERE Date >= '4/1/2012' AND Date <= '4/30/2012'
AND classification = 'Loco'
) x
PIVOT
(
COUNT(Shed)
FOR Shed IN ([BSL], [AQ])
) p
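SQLite has no PIVOT operator, so the Python sketch below swaps in conditional aggregation (SUM of a boolean) to show the same idea: the total column is simply the sum of the per-shed counts. The rows are hypothetical, chosen to reproduce the TFP and TM lines of the result, and the date/classification filter is omitted.

```python
import sqlite3

# Hypothetical (Equipt, Shed) rows reproducing the TFP and TM lines of the result
ROWS = [("TFP", "BSL")] * 3 + [("TFP", "AQ")] * 2 + [("TM", "BSL")]

# Conditional aggregation stands in for PIVOT; TTL adds the two pivoted counts
QUERY = """
SELECT Equipt,
       SUM(Shed = 'BSL') AS BSL,
       SUM(Shed = 'AQ') AS AQ,
       SUM(Shed = 'BSL') + SUM(Shed = 'AQ') AS TTL
FROM PunctualityMain
GROUP BY Equipt
ORDER BY Equipt
"""


def crosstab_with_total(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE PunctualityMain (Equipt TEXT, Shed TEXT)")
        conn.executemany("INSERT INTO PunctualityMain VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in crosstab_with_total(ROWS):
    print(*row)
```

Adding the pivoted columns is safe here because a count for a missing combination comes back as 0, not NULL, as the TM row with AQ = 0 in the question's result shows.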
