I'm trying a different approach. I have a single dataset of 10,000 observations, 1,000 per ID. I would like to run each ID from that dataset through PROC SURVEYSELECT: the first ID runs through the procedure, then the loop moves on to the second ID, and so on, outputting one dataset per ID. How can I do that, if it is possible?
Three ways:
Option 1: Call execute
Create a distinct list of IDs and run call execute on each one.
proc sql noprint;
create table all_ids as
select distinct id
from have
;
quit;
data _null_;
set all_ids;
call execute(cat('
proc surveyselect data=have out=want_', id, ' sampsize=100;',
' where id = ', id, ';
run;')
);
run;
Option 2: Loop with a macro
Create a distinct macro list of IDs and loop through using a macro.
%macro survey(sampsize=100);
proc sql noprint;
select distinct id
into :all_ids separated by ' '
from have
;
quit;
%do i = 1 %to %sysfunc(countw(&all_ids.) );
%let id = %scan(&all_ids., &i.);
proc surveyselect data=have out=want_&id. sampsize=&sampsize.;
where id = &id.;
run;
%end;
%mend;
%survey;
Option 3: Strata
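This will not give you an individual dataset for each ID, but you can easily stratify by ID and get a sample of 100 from each ID in a single output dataset. Note that the strata statement expects the input dataset (have) to be sorted by id, so run proc sort first if it is not.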
proc surveyselect data=have out=want sampsize=100;
strata id;
run;
Related
I have a catalog DB that stores the names of other DBs. These DBs contain the same schema and tables. Now I want to extract all the DB names from the catalog DB and query a specific table in all of those DBs.
Here is an example:
catalog DB name: CatalogDB
schema name: schemaExp
table name: tableExp
CatalogDB contains a list of otherDBs, e.g., otherDB1, otherDB2, otherDBXYZ, etc.
So I can get all the other DB names by
select DBName
from CatalogDB;
I can query the table in otherDB1 using the following query
select *
from otherDB1.schemaExp.tableExp;
I want to query the same tableExp in all the other DBs. How can I do that?
EDIT: I am not interested in combining the tables, since the table content can get updated. Is it possible to query the catalog DB, put the returned DB names in a parameter, and then run a query that selects from each DB in that parameter?
So if you wanted to select row count for "all those tables" you could:
create table test.test.cat_tab(cat_db string, cat_schema string, cat_table string, other_dbs array);
create database test_a;
create schema test_a.test;
create table test_a.test.table_exp(val int);
insert into test_a.test.table_exp values (1),(2);
create database test_b;
create schema test_b.test;
create table test_b.test.table_exp(val int);
insert into test_b.test.table_exp values (3);
insert into test.test.cat_tab select 'test', 'test', 'table_exp', array_construct('test_a','test_b');
Then dynamically count the rows across those data-driven tables:
declare
counts int;
total_counts int := 0;
begin
let c1 cursor for select f.value::text ||'.'|| cat_schema ||'.'|| cat_table as fqn from test.test.cat_tab,table(flatten(input=>other_dbs)) f where cat_table = 'table_exp';
for record in c1 do
let str := 'select count(*) from ' || record.fqn;
execute immediate str;
select $1 into counts from table(result_scan(last_query_id()));
total_counts := total_counts + counts;
end for;
return total_counts;
end;
anonymous block
3
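The cursor query above builds each fully qualified name by concatenating the database, schema, and table parts, and the loop then concatenates that name into a statement. As a rough illustration of that string-building step (not the answer's own code), here is a minimal sketch in plain JavaScript, meant to run outside Snowflake, for example under Node.js. The helper name expandCatalogRow and the row shape are hypothetical, and the identifier check is an added precaution that assumes all names are unquoted identifiers.

```javascript
// Unquoted identifiers: a letter or underscore first, then letters,
// digits, underscores, or dollar signs. Quoted identifiers are not handled.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_$]*$/;

// Hypothetical helper: expand one catalog row into fully qualified names.
function expandCatalogRow(row) {
  const parts = [row.catSchema, row.catTable].concat(row.otherDbs);
  for (const part of parts) {
    if (!IDENTIFIER.test(part)) {
      throw new Error("Refusing to build SQL from unexpected identifier: " + part);
    }
  }
  return row.otherDbs.map(function (db) {
    return db + "." + row.catSchema + "." + row.catTable;
  });
}

// Same sample data as the cat_tab row inserted above.
const fqns = expandCatalogRow({
  catSchema: "test",
  catTable: "table_exp",
  otherDbs: ["test_a", "test_b"],
});

// One count statement per table, like the string built inside the loop.
const countStatements = fqns.map(function (fqn) {
  return "select count(*) from " + fqn;
});
console.log(countStatements.join("\n"));
```

Checking the name parts before concatenating them is a design choice for this sketch: the catalog table is data, and anything stored in it ends up inside the executed SQL text.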
If you want to select from all those tables with a union all:
declare
sql string := '';
res resultset;
begin
let c1 cursor for select f.value::text ||'.'|| cat_schema ||'.'|| cat_table as fqn from test.test.cat_tab,table(flatten(input=>other_dbs)) f where cat_table = 'table_exp';
for record in c1 do
if (sql <> '') then
sql := sql || ' union all ';
end if;
sql := sql || 'select * from ' || record.fqn;
end for;
res := (execute immediate :sql);
return table(res);
end;
gives:
VAL
1
2
3
Let's say I have the following monthly tables with table names formatted such that the number after the underscore refers to the month. What I want to do is to combine these 12 tables into one without having to write 10-30 insert/union all statements
table_1
table_2
table_3
table_4
table_5
table_6
table_7
table_8
table_9
table_10
table_11
table_12 -- (only 12 in this instance but could be as many as 36)
My current approach is to first create the master table with data from table_1.
create temporary table master_table_1_12 as
select * -- * to keep it simple for this example
from table_1;
Then use variables such that I can simply keep hitting the run button until it errors out with "table_13 does not exist"
set month_id=(select max(month_id) from master_table_1_12) + 1;
set table_name=concat('table_',$month_id);
insert into master_table_1_12
select *
from identifier($table_name);
Note: All monthly tables have a month_id column
Sure, it saves some space on the console (compared to multiple inserts), but I still have to run it 12 times. Are Snowflake Tasks something I could use for this? I couldn't find a fitting example in their documentation to code that up, but if anyone has had success with that, or with a JavaScript-based SP for a problem like this, please enlighten me.
Here's a stored procedure that will insert into master_table_1_12 from selects on table_1 through table_12. Modify as required:
create or replace procedure FILL_MASTER_TABLE()
returns string
language javascript
as
$$
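// Main body: insert from table_1 through table_12 and total the row counts.
var rows = 0;
for (var i = 1; i <= 12; i++) {
    rows += insertRows(i);
}
return rows + " rows inserted into master_table_1_12.";
// End of main function

// Function declarations are hoisted, so they can follow the return statement.
function insertRows(i) {
    var sql =
        `insert into master_table_1_12
         select *
         from table_${i};`;
    return doInsert(sql);
}

// Runs one INSERT and returns the row count it reports.
function doInsert(queryString) {
    var stmt = snowflake.createStatement({sqlText: queryString});
    var rs = stmt.execute();
    // An INSERT returns a single row whose first column is the number of rows inserted.
    rs.next();
    return rs.getColumnValue(1);
}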
$$;
call fill_master_table();
By the way, if you don't have any processing to do and just need to consolidate the tables, you can do something like this:
insert into master_table_1_12
select * from table_1
union all
select * from table_2
union all
select * from table_3
union all
select * from table_4
union all
select * from table_5
union all
select * from table_6
union all
select * from table_7
union all
select * from table_8
union all
select * from table_9
union all
select * from table_10
union all
select * from table_11
union all
select * from table_12
;
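If the number of monthly tables varies (12 here, possibly up to 36), the long statement above can be generated rather than typed. What follows is a minimal sketch in plain JavaScript, runnable outside Snowflake; the function name buildConsolidationInsert and its parameters are made up for illustration, and it assumes the tables are numbered consecutively.

```javascript
// Hypothetical helper: build one INSERT ... SELECT ... UNION ALL statement
// covering prefix + firstMonth through prefix + lastMonth.
function buildConsolidationInsert(target, prefix, firstMonth, lastMonth) {
  if (!Number.isInteger(firstMonth) || !Number.isInteger(lastMonth) || firstMonth > lastMonth) {
    throw new Error("Expected integer months with firstMonth <= lastMonth");
  }
  const selects = [];
  for (let month = firstMonth; month <= lastMonth; month++) {
    selects.push("select * from " + prefix + month);
  }
  // join() places "union all" only between the selects, never after the last one.
  return "insert into " + target + "\n" + selects.join("\nunion all\n") + ";";
}

const sql = buildConsolidationInsert("master_table_1_12", "table_", 1, 12);
console.log(sql);
```

The resulting string could be passed as sqlText to snowflake.createStatement inside a stored procedure, in the same way the procedure earlier in this answer runs its INSERT statements.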
Could you create a view on top of these 12 tables? The view would be a union of all of them.
Based on the comments below, I have elaborated on my answer. Please try this approach. It should perform better when your tables are large, since keeping the data partitioned into separate tables improves performance. This is based on real experience.
CREATE TABLE SALES_2000 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2001 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2002 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2003 (REGION VARCHAR, UNITS_SOLD NUMBER);
INSERT INTO SALES_2000 VALUES('ASIA', 25);
INSERT INTO SALES_2001 VALUES('ASIA', 50);
INSERT INTO SALES_2002 VALUES('ASIA', 55);
INSERT INTO SALES_2003 VALUES('ASIA', 65);
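-- UNION ALL keeps every row; plain UNION would also remove duplicate rows across the tables
CREATE VIEW ALL_SALES AS
SELECT * FROM SALES_2000
UNION ALL
SELECT * FROM SALES_2001
UNION ALL
SELECT * FROM SALES_2002
UNION ALL
SELECT * FROM SALES_2003;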
SELECT * FROM ALL_SALES WHERE UNITS_SOLD = 25;
I ended up creating a UDF that spits out a create view statement and a stored procedure that executes it to create a temporary view. I work with tables following specific naming convention, so you might have to tweak this solution a little for your use case. The separation of UDF and stored proc actually helps with that as you'd mostly need to tweak the SQL UDF. I am sharing a simplified version of what I actually have in the interest of keeping it representative of the tables I listed in my question.
SQL UDF FOR GENERATING A CREATE VIEW STATEMENT
create or replace function sandbox.public.define_view(table_pattern varchar, start_month varchar, end_month varchar)
returns table ("" varchar) as
$$
with cte1(month_id) as
(select start_month::int + row_number() over (order by 1) - 1
from table(generator(rowcount=> end_month::int - start_month::int + 1)))
,cte2(month_id,statement) as
(select 0,
concat('create or replace temporary view master_',
split_part(table_pattern,'.',-1),
start_month,
'_',
end_month,
' as ')
union all
select month_id,
concat('select * from ',
table_pattern,
month_id,
case when month_id=end_month::int then ';' else ' union all ' end)
from cte1)
select listagg(statement, '\n') within group (order by month_id) as create_view_statement
from cte2
$$;
PROCEDURE FOR EXECUTING THE OUTPUT OF THE UDF ABOVE
create or replace procedure sandbox.public.create_view(TABLE_PATTERN varchar, START_MONTH varchar,END_MONTH varchar)
returns varchar not null
language Javascript
execute as caller
as
$$
sql_command = 'select * from table(sandbox.public.define_view(:1, :2, :3))';
var stmt = snowflake.createStatement({sqlText: sql_command ,binds: [TABLE_PATTERN, START_MONTH, END_MONTH]}).execute();
stmt.next();
var ddl = stmt.getColumnValue(1);
var run=snowflake.createStatement({sqlText: ddl}).execute();
run.next();
var message=run.getColumnValue(1);
return "Temporary " + message;
$$;
USAGE DEMO
set table_pattern ='sandbox.public.table_';
set start_month ='1';
set end_month = '12';
set master_view='master_'||split_part($table_pattern,'.',-1)||$start_month||'_'||$end_month;
call create_view($table_pattern, $start_month, $end_month);
select top 100 *
from identifier($master_view);
I would like to combine tables starting with the same name into one table.
For example let's say I have a database with tables 'EXT_ABVD', 'EXT_ADAD','EXT_AVSA','OTHER', and I want to combine all tables beginning with 'EXT_', I would want the result of
select col1 ,col2 from EXT_ABVD
union all
select col1 ,col2 from EXT_ADAD
union all
select col1 ,col2 from EXT_AVSA;
I would like to do this on a regular basis (daily for example), and every time this runs there may be new tables starting with 'EXT_'. I don't want to update the union_all query manually.
I am new to Snowflake and don't know how can I do that? Can I use a script inside Snowflake?
Given these tables:
CREATE TABLE TEST_DB.PUBLIC.EXT_ABVD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAQ (col1 INTEGER, col2 INTEGER);
A view like this could be dynamically created:
CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS
SELECT * FROM TEST_DB.PUBLIC.EXT_ABVD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAQ
Using this Procedure:
create or replace procedure TEST_DB.PUBLIC.CREATE_UNION_VIEW(TBL_PREFIX VARCHAR)
returns VARCHAR -- return final create statement
language javascript
as
$$
// build query to get tables from information_schema
var get_tables_stmt = "SELECT Table_Name FROM TEST_DB.INFORMATION_SCHEMA.TABLES \
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_NAME LIKE '"+ TBL_PREFIX + "%';"
var get_tables_stmt = snowflake.createStatement({sqlText:get_tables_stmt });
// get result set containing all table names
var tables = get_tables_stmt.execute();
// to control if UNION ALL should be added or not
// this could likely be handled more elegantly but i don't know JavaScript :)
var row_count = get_tables_stmt.getRowCount();
var rows_iterated = 0;
// define view name
var create_statement = "CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS \n";
// loop over result set to build statement
while (tables.next()) {
rows_iterated += 1;
// we get values from the first (and only) column in the result set
var table_name = tables.getColumnValue(1);
// this will obviously fail if the column count doesnt match
create_statement += "SELECT * FROM TEST_DB.PUBLIC." + table_name
// add union all to all but last row
if (rows_iterated < row_count){
create_statement += "\n UNION ALL \n"
}
}
// create the view
var create_statement = snowflake.createStatement( {sqlText: create_statement} );
create_statement.execute();
// return the create statement as text
return create_statement.getSqlText();
$$
;
Which we would call like this: CALL CREATE_UNION_VIEW('EXT_A');
This is just a basic example so logic for column counts, schemas etc. likely needs to be added. But given this I think you will be able to figure out how to deal with result sets, parameters and statements.
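On the comment in the procedure about handling UNION ALL more elegantly: one common alternative is to collect the SELECT statements in an array and let join() put the separator between them, which removes the need for row_count and rows_iterated. A minimal sketch in plain JavaScript, runnable outside Snowflake; buildUnionViewDdl is a hypothetical helper, and the tableNames array stands in for the names read from the result set.

```javascript
// Hypothetical helper: build the CREATE VIEW text from a list of table names.
function buildUnionViewDdl(viewName, tableNames) {
  if (tableNames.length === 0) {
    throw new Error("No tables matched the prefix, so there is nothing to union");
  }
  const selects = tableNames.map(function (name) {
    return "SELECT * FROM TEST_DB.PUBLIC." + name;
  });
  // join() adds the separator only between elements.
  return "CREATE OR REPLACE VIEW " + viewName + " AS \n" + selects.join("\n UNION ALL \n");
}

const ddl = buildUnionViewDdl("TEST_DB.PUBLIC.union_view", ["EXT_ABVD", "EXT_ADAD", "EXT_ADAQ"]);
console.log(ddl);
```

Inside the procedure, the array would be filled in the while (tables.next()) loop by pushing tables.getColumnValue(1). The explicit empty-list check is an addition in this sketch; without it, a prefix that matches nothing would produce an incomplete CREATE VIEW statement.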
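Edit: See here for how to set up a task that would run a procedure on a daily basis. Note that a newly created task is suspended until it is resumed with ALTER TASK ... RESUME. The most basic version would in this case look like this: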
create or replace task create_union_task
warehouse = COMPUTE_WH
schedule = '1440 minute' -- once every day
as
CALL CREATE_UNION_VIEW('EXT_A');
The only way you can achieve this currently is via a Snowflake Stored Procedure.
You don't specify how you want to consume the result of the query, but a convenient way is via a VIEW. So the Stored Procedure has to generate a VIEW definition containing the query in your question.
I have to run a report every morning, but cannot do so until all the tables that I am querying have been updated for that day.
I want to make a macro that does not continue with SAS processing until all of the tables have been updated. There is one table listing all of the tables in SSDM and the date and time they were updated. For simplicity, I will call this table Info; its column names are tablename and dateupdated. The tables I will be using are table1, table2 and table3 out of n tables.
%macro Updated;
proc sql;
create table Data_ready as
select
tablename,
dateupdated,
case when dateupdated=today() then 'Ready'
else 'Not Ready'
end as 'Status'
from Info
where tablename in (table1, table2, ..., tablen)
quit;
%if count(Data_ready.Status = 'Ready') ne count(Data_ready.tablename) %then %do;
proc sql;
drop table work.Data_ready
;quit;
sleep(60*30,1);
%end;
%else %do;
proc print data=Data_ready;
run;
%end
%mend;
*here I will have the rest of the code to produce the report knowing that the information is up to date
Is there a way I can do this with a do while or do until? I have been trying to figure out some kind of macro, but I am running into problems with making sure all tables are updated before going forward. Thanks in advance.
Here is some sample code (untested) that uses DICTIONARY.TABLES to examine the modification time stamp of a dataset, and counts how many of those correspond to today(). A try_limit is also used to prevent infinite waiting.
%macro wait_for_all_today (libname=);
%local today_count all_count;
%local try try_limit try_wait_s;
%local rc;
%let try = 0;
%let try_limit = 10;
%let try_wait_s = 60;
%do %until (&today_count = &all_count or &try > &try_limit);
%let try = %eval (&try + 1);
%if &try > 1 %then %do;
%let rc = %sysfunc(sleep(&try_wait_s, 1));
%end;
proc sql noprint;
select count(*), sum(today()=datepart(moddate))
into :all_count, :today_count
from dictionary.tables
where libname = "%sysfunc(upcase(&libname))"
and memtype = "DATA"
;
quit;
%* at this point today_count and all_count
%* have values that will be used in the UNTIL evaluation;
%end;
%if &today_count ne &all_count %then %do;
%put ERROR: Not all data sets in Library &libname were updated today. Waited a bunch of times;
%abort cancel;
%end;
%mend;
I created a database with NBA player statistics just to practice SQL and SSRS. I am new to working with stored procedures, but I created the following procedure that should (I think) allow me to specify the team and number of minutes.
CREATE PROCEDURE extrapstats
--Declare variables for the team and the amount of minutes to use in calculations
@team NCHAR OUTPUT,
@minutes DECIMAL OUTPUT
AS
BEGIN
SELECT p.Fname + ' ' + p.Lname AS Player_Name,
p.Position,
--Creates averages based on the number of minutes per game specified in @minutes
(SUM(plg.PTS)/SUM(plg.MP))*@minutes AS PTS,
(SUM(plg.TRB)/SUM(plg.MP))*@minutes AS TRB,
(SUM(plg.AST)/SUM(plg.MP))*@minutes AS AST,
(SUM(plg.BLK)/SUM(plg.MP))*@minutes AS BLK,
(SUM(plg.STL)/SUM(plg.MP))*@minutes AS STL,
(SUM(plg.TOV)/SUM(plg.MP))*@minutes AS TOV,
(SUM(plg.FT)/SUM(plg.MP))*@minutes AS FTs,
SUM(plg.FT)/SUM(plg.FTA) AS FT_Percentage,
(SUM(plg.FG)/SUM(plg.MP))*@minutes AS FGs,
SUM(FG)/SUM(FGA) as Field_Percentage,
(SUM(plg.[3P])/SUM(plg.MP))*@minutes AS Threes,
SUM([3P])/SUM([3PA]) AS Three_Point_Percentage
FROM PlayerGameLog plg
--Joins the Players and PlayerGameLog tables
INNER JOIN Players p
ON p.PlayerID = plg.PlayerID
AND TeamID = @team
GROUP BY p.Fname, p.Lname, p.Position, p.TeamID
ORDER BY PTS DESC
END;
I then tried to use the SP by executing the query below:
DECLARE @team NCHAR,
@minutes DECIMAL
EXECUTE extrapstats @team = 'OKC', @minutes = 35
SELECT *
When I do that, I encounter this message:
Msg 263, Level 16, State 1, Line 5
Must specify table to select from.
I've tried different variations of this, but nothing has worked. I thought the SP specified the tables from which to select the data.
Any ideas?
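Declaring the stored procedure parameters with the OUTPUT clause means their values will be returned by the stored procedure to the caller. However, you are using them as input parameters, so remove the OUTPUT clause from both parameters and try again.
Also remove the SELECT * after your EXECUTE statement. It is not required, because the stored procedure returns the data from its own SELECT statement, and a SELECT * with no FROM clause is what raises the "Must specify table to select from" error. One more thing to check: NCHAR declared without a length holds a single character, so 'OKC' would be truncated to 'O'; declaring the parameter as NCHAR(3) avoids that.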