Okay, so here is the first question on the assignment. I just don't know where to start with the problem. If anyone could help me get started, I'd probably be able to figure the rest out. Thanks.
Set two variable values as follows:
@minEnrollment = 10
@maxEnrollment = 20
Determine the number of courses with enrollments between the values assigned to @minEnrollment and @maxEnrollment. If there are courses with enrollments between these two values, display a message in the form
There is/are __ class(es) with enrollments between __ and __.
If there are no classes within the defined range, display a message in the form
There are no classes with an enrollment between __ and __ students.
And here is the database to use:
CREATE TABLE Faculty
(Faculty_ID INT PRIMARY KEY IDENTITY,
LastName VARCHAR (20) NOT NULL,
FirstName VARCHAR (20) NOT NULL,
Department VARCHAR (10) SPARSE NULL,
Campus VARCHAR (10) SPARSE NULL);
INSERT INTO Faculty VALUES ('Brown', 'Joe', 'Business', 'Kent');
INSERT INTO Faculty VALUES ('Smith', 'John', 'Economics', 'Kent');
INSERT INTO Faculty VALUES ('Jones', 'Sally', 'English', 'South');
INSERT INTO Faculty VALUES ('Black', 'Bill', 'Economics', 'Kent');
INSERT INTO Faculty VALUES ('Green', 'Gene', 'Business', 'South');
CREATE TABLE Course
(Course_ID INT PRIMARY KEY IDENTITY,
Ref_Number CHAR (5) CHECK (Ref_Number LIKE '[0-9][0-9][0-9][0-9][0-9]'),
Faculty_ID INT NOT NULL REFERENCES Faculty (Faculty_ID),
Term CHAR (1) CHECK (Term LIKE '[A-C]'),
Enrollment INT NULL DEFAULT 0 CHECK (Enrollment < 40));
INSERT INTO Course VALUES ('12345', 3, 'A', 24);
INSERT INTO Course VALUES ('54321', 3, 'B', 18);
INSERT INTO Course VALUES ('13524', 1, 'B', 7);
INSERT INTO Course VALUES ('24653', 1, 'C', 29);
INSERT INTO Course VALUES ('98765', 5, 'A', 35);
INSERT INTO Course VALUES ('14862', 2, 'B', 14);
INSERT INTO Course VALUES ('96032', 1, 'C', 8);
INSERT INTO Course VALUES ('81256', 5, 'A', 5);
INSERT INTO Course VALUES ('64321', 2, 'C', 23);
INSERT INTO Course VALUES ('90908', 3, 'A', 38);
Your request is how to get started, so I'm going to focus on that instead of any specific code.
Start by getting the results that are being asked for, then move on to formatting them as requested.
First, work with the Course table and your existing variables, @minEnrollment = 10 and @maxEnrollment = 20, to get the list that meets the enrollment requirements. Hint: WHERE and BETWEEN. (The Faculty table you have listed doesn't factor into this at all.) After you're sure you have the right results in that list, use the COUNT function to get the number you need for your answer, and assign that value to a new variable.
Now, to the output. IF your COUNT variable is >0, CONCATenate a string together using your variables to fill in the values in the sentence you're supposed to write. ELSE, use the variables to fill in the other sentence.
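That said, purely to illustrate the shape of those two steps (count first, then IF/ELSE on the count), a rough T-SQL sketch might look like the following; treat every line as something to verify yourself rather than a finished answer:
-- Sketch only: variable names come from the assignment, everything else needs checking.
DECLARE @minEnrollment INT = 10;
DECLARE @maxEnrollment INT = 20;
DECLARE @courseCount INT;
-- Step 1: count the courses in range (WHERE + BETWEEN, Course table only).
SELECT @courseCount = COUNT(*)
FROM Course
WHERE Enrollment BETWEEN @minEnrollment AND @maxEnrollment;
-- Step 2: build the required sentence from the variables.
IF @courseCount > 0
    PRINT CONCAT('There is/are ', @courseCount, ' class(es) with enrollments between ', @minEnrollment, ' and ', @maxEnrollment, '.');
ELSE
    PRINT CONCAT('There are no classes with an enrollment between ', @minEnrollment, ' and ', @maxEnrollment, ' students.');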
Part of the problem is, you've actually got 3 or so questions in your post. So instead of trying to post a full answer, I'm instead going to try to get you started with each of the subquestions.
Subquestion #1 - How to assign variables.
You'll need to do some googling on 'how to declare a variable in SQL' and 'how to set a variable in SQL'. This one won't be too hard.
Subquestion #2 - How to use variables in a query
Again, you'll need to google how to do this - something like 'How to use a variable in a SQL query'. You'll find this one is pretty simple as well.
Subquestion #3 - How to use IF in SQL Server.
Not to beat a dead horse, but you'll need to google this. However, one thing I would like to note: I'd test this one first. Ultimately, you're going to want something that looks like this:
IF 1 = 1 -- note, this is NOT the correct syntax (on purpose.)
STUFF
ELSE
OTHERSTUFF
And then switch it to:
IF 1 = 2 -- note, this is NOT the correct syntax (on purpose.)
STUFF
ELSE
OTHERSTUFF
... to verify that the 'STUFF' happens when the condition is true, and that it otherwise does the 'OTHERSTUFF'. Only after you've gotten it down should you try to integrate it with your query (otherwise you'll get frustrated not knowing what's going on, and it'll be tougher to test).
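For reference, once you have looked up the real syntax, a standalone test can be as small as this (PRINT is just a stand-in so you can see which branch ran):
-- Flip 1 = 1 to 1 = 2 to confirm the ELSE branch fires instead.
IF 1 = 1
    PRINT 'STUFF';
ELSE
    PRINT 'OTHERSTUFF';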
One step at a time. Let me give you some help:
Set two variable values as follows: @minEnrollment = 10, @maxEnrollment = 20
Translated to SQL, this would look like:
Declare @minEnrollment integer = 10
Declare @maxEnrollment integer =15
Declare @CourseCount integer = 0
Determine the number of courses with enrollments between the values assigned to @minEnrollment and @maxEnrollment.
Now you have to query your tables to determine the count:
SET @CourseCount = (SELECT Count(Course_ID) from Courses where Enrollment > @minEnrollment
This doesn't answer your questions exactly (ON PURPOSE). Hopefully you can spot the mistakes and fix them yourself. The other answers gave you some helpful hints as well.
I am moving a query from SQL Server to Snowflake. Part of the query creates a pivot table. The pivot table part works fine (I have run it in isolation, and it pulls the numbers I expect).
However, the parts of the query that follow rely on the pivot table, and those parts fail. Some of the fields come back as a string type, and I believe the problem is that Snowflake is having trouble converting string data to numeric data. I have tried CAST and TRY_TO_DOUBLE/TRY_TO_NUMBER, but these just return 0.
I will put the code down below, and I appreciate any insight as to what I can do!
CREATE OR REPLACE TEMP TABLE ATTR_PIVOT_MONTHLY_RATES AS (
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership')) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE('Member Adds')) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE('Member Attrition')) AS Member_Attrition,
((ZEROIFNULL(CAST('Starting Membership' AS FLOAT))
+ ZEROIFNULL(CAST('Member Adds' AS FLOAT))
+ ZEROIFNULL(CAST('Member Attrition' AS FLOAT)))-ZEROIFNULL(CAST('Starting Membership' AS FLOAT)))
/ZEROIFNULL(CAST('Starting Membership' AS FLOAT)) AS "% Change"
FROM
(SELECT * FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL) PT)
I realize this is a VERY big question with a lot of moving parts... So my main question is: how can I successfully convert the data to a numeric type so that the formulas in the second half of the query work?
Thank you so much for reading through it all!
EDITED TO SHORTEN THE QUERY AND REMOVE UNNEEDED SYNTAX
What I have tried: CAST(), TRY_TO_DOUBLE(), TRY_TO_NUMBER(). I have also tried putting the field names (Starting Membership, Member Adds) in both single and double quotation marks.
Unless you are quoting your field names in this post just to highlight them for some reason, the way you've written this query would indicate that you are trying to cast a string value to a number.
For example:
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership'))
This is simply trying to cast a string literal value of Starting Membership to a double. This will always be NULL. And then your ZEROIFNULL() function is turning your NULL into a 0 (zero).
Without seeing the rest of your query that defines the column names, I can't provide you with a correction, but try using field names, not quoted string values, in your query and see if that gives you what you need.
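For instance, assuming the pivot really does produce a column literally named Starting Membership, the fix is just the quoting (single quotes make a string literal, double quotes an identifier):
SELECT
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership')) AS literal_version, -- always 0: the literal never converts
ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")) AS column_version   -- converts the actual pivoted value
FROM ATTR_PIVOT;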
Your first mistake is that all your single-quoted column names are being treated as strings/text/char.
For example, take your inner SELECT:
with ATTR_PIVOT(id, studentname) as (
select * from values
(1, 'student_a'),
(1, 'student_b'),
(1, 'student_c'),
(2, 'student_z'),
(2, 'student_a')
)
SELECT *
FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL
There is no "Starting Membership" column referenced here at all, only a string literal that is never NULL, so we get all the rows:
ID  STUDENTNAME
1   student_a
1   student_b
1   student_c
2   student_z
2   student_a
So you need to change 'Starting Membership' -> "Starting Membership", and so on for every quoted column name.
As Mike mentioned, the 0 results come from TRY_TO_DOUBLE always failing on the literal and returning NULL, which ZEROIFNULL then turns into zero.
Now, with real "string" values in real named columns:
with ATTR_PIVOT(Market, Coverage_Mo, "Starting Membership", "Member Adds", "Member Attrition") as (
select * from values
(1, 10 ,'student_a', '23', '150' )
)
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE("Member Adds")) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE("Member Attrition")) AS Member_Attrition
FROM ATTR_PIVOT
WHERE "Starting Membership" IS NOT NULL
We get what we would expect:
MARKET  COVERAGE_MO  STARTING_MEMBERSHIP  MEMBER_ADDS  MEMBER_ATTRITION
1       10           0                    23           150
I hope this is the right place to ask a question related to the Snowflake database.
I would like to know how to replace a string column value with 0 if it is null.
So far I have tried the NVL function, but that didn't work.
CREATE TABLE EMP(ENAME VARCHAR(10));
INSERT INTO EMP VALUES('JACK');
INSERT INTO EMP VALUES(null);
SELECT NVL(ENAME,0) FROM EMP;
Error: Numeric value 'JACK' is not recognized
Any help would be appreciated.
Thanks,
Nawaz
SQL is strongly typed. The output type of NVL is being inferred to be a NUMBER, so the actual query looks something like
SELECT NVL(ENAME::NUMBER, 0) FROM EMP;
You should decide what type your output should be. If you want strings, then you will need to pass NVL a string, like
SELECT NVL(ENAME, '0') FROM EMP;
If you want integers, you will need to convert the strings to integers safely. For example, if you want non-integer values to become NULL and then 0, you can use
SELECT NVL(TRY_TO_INTEGER(ENAME), 0) FROM EMP;
I have a data set with 6 columns and 4.5 million rows. I want to go through the entire data set, compare the value of the last column with the value of the first column for every row, and append to each row all the rows whose last-column value matches that row's first-column value. The first and last columns are indexed, but neither is an integer.
I asked the same question on Stack Overflow and received a good answer based on numpy and arraying the data, but I am afraid it is too slow for a rather big data set.
Let's assume this is my data set (in the real data set, the first and last elements are not integers):
x = [['2', 'Jack', '8'],['1', 'Ali', '2'],['4' , 'sgee' , '1'],
['5' , 'gabe' , '2'],['100' , 'Jack' , '6'],
['7' , 'Ali' , '2'],['8' , 'nobody' , '20'],['9' , 'Al', '10']]
the result should look something like this:
[['2', 'Jack', '8', '1', 'Ali', '2', '5' , 'gabe' , '2','7' , 'Ali' , '2'],
['1', 'Ali', '2', '4' , 'sgee' , '1'],
['8' , 'nobody' , '20', '2', 'Jack', '8']]
I think I can use indexing to make the process faster, but my knowledge of databases is very limited. Does anybody have a solution (using indexes or any other tool)?
The numpy solution for this question is here:
How to compare two columns from the same data set?
Here is the link to a sample of the real data in SQLite: https://drive.google.com/open?id=11w-o4twH-hyRaX8KKvFLL6dQtkTKCJky
A potential SQL-based solution could go as follows (I'm using your big sample DB as a reference):
To make my proposed solution efficient I would do the following:
Create an index on the last column, and a partial index on the first column that excludes rows where the first and last columns are equal. The partial index is optional, so you may remove it if you think it causes a problem; but if you do, you should create a full index on column 0 instead. All three are shown here for completeness.
CREATE INDEX [index_my_tab_A] ON [tab]([0]);
CREATE INDEX [index_my_tab_B] ON [tab]([5]);
CREATE INDEX [index_my_tab_AB] ON [tab]([0]) where [0] != [5];
ANALYZE;
Then I would take advantage of join behavior to generate the listing you need to produce the result you are after. By joining the table to itself you can get multiple return rows for each row considered.
SELECT * from tab t1
JOIN tab t2 on t2.[5] = t1.[0]
WHERE t1.[0] != t1.[5]
AND t2.[5] != 'N/A' -- Optional
ORDER by t1.[0];
Running that SQL against your big sample database (After ANALYZE step had completed) took 0.2 seconds on my machine. It produced three rows that matched which I presume to be correct.
It may not be immediately obvious what the resulting table means, so here is the result the above query gives when run against the small sample from your original post (the SQL was modified ever so slightly to deal with the reduced number of columns). It is equivalent to your original desired result:
1 Ali 2 4 sgee 1
2 Jack 8 1 Ali 2
2 Jack 8 5 gabe 2
2 Jack 8 7 Ali 2
8 Nobody 20 2 Jack 8
All you would have to do is run through this resulting list and combine the rows to produce the list you specified. The general idea here is to add the second trio of entries to the first trio of entries until the first trio of entries changes but only include the first trio of entries once.
So starting with the first line you would combine the Ali trio with the sgee trio giving you ['1', 'Ali', '2', '4' , 'sgee' , '1']
You would then combine the three Jack rows, giving ['2', 'Jack', '8', '1', 'Ali', '2', '5', 'gabe', '2', '7', 'Ali', '2'].
Then the final row combines to form ['8', 'nobody', '20', '2', 'Jack', '8'].
This matches the three arrays you specified (although not in the same order).
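If you would rather have SQLite do that combining step for you as well, a group_concat() variant of the same join might work, something along these lines (a sketch, not tested against your full data; you would still need to split the concatenated text back into lists in your own code):
-- Sketch only: assumes the table is named tab with columns [0] .. [5] as in the query above;
-- only three of the six columns are concatenated here to keep it short.
SELECT t1.[0], t1.[1], t1.[5],
       group_concat(t2.[0] || ',' || t2.[1] || ',' || t2.[5], ' | ') AS matched_rows
FROM tab t1
JOIN tab t2 ON t2.[5] = t1.[0]
WHERE t1.[0] != t1.[5]
  AND t2.[5] != 'N/A'   -- optional, as above
GROUP BY t1.[0], t1.[1], t1.[5]
ORDER BY t1.[0];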
Note: Your original question did not indicate what result you expected when the first and last columns match within the same row, e.g. [3, George, 3], so the WHERE clause eliminates two kinds of entries. I noticed in your big sample data that there were many rows where col 0 and col 5 were the same, so the WHERE clause removes those rows from consideration. The second thing I noticed was that many rows have 'N/A' in col 5, so I removed those from consideration too.
In my table I've got a column facebook where I store Facebook data (comment count, share count, etc.), and it's an array. For example:
{{total_count,14},{comment_count,0},{comment_plugin_count,0},{share_count,12},{reaction_count,2}}
Now I'm trying to SELECT rows whose facebook total_count is between 5 and 10. I've tried this:
SELECT * FROM pl where regexp_matches(array_to_string(facebook, ' '), '(\d+).*')::numeric[] BETWEEN 5 and 10;
But I'm getting an error:
ERROR: operator does not exist: numeric[] >= integer
Any ideas?
There is no need to convert the array to text and use regexp. You can access a particular element of the array, e.g.:
with pl(facebook) as (
values ('{{total_count,14},{comment_count,0},{comment_plugin_count,0},{share_count,12},{reaction_count,2}}'::text[])
)
select facebook[1][2] as total_count
from pl;
total_count
-------------
14
(1 row)
Your query may look like this:
select *
from pl
where facebook[1][2]::numeric between 5 and 10
Update: You could avoid the troubles described in the comments if you used the word null instead of empty strings ('').
with pl(id, facebook) as (
values
(1, '{{total_count,14},{comment_count,0}}'::text[]),
(2, '{{total_count,null},{comment_count,null}}'::text[]),
(3, '{{total_count,7},{comment_count,10}}'::text[])
)
select *
from pl
where facebook[1][2]::numeric between 5 and 10
id | facebook
----+--------------------------------------
3 | {{total_count,7},{comment_count,10}}
(1 row)
However, it would be unfair to leave your problems without an additional comment. The case is suitable as an example for the lecture How not to use arrays in Postgres. You have at least a few better options. The most performant and natural is to simply use regular integer columns:
create table pl (
...
facebook_total_count integer,
facebook_comment_count integer,
...
);
If for some reason you need to separate this data from others in the table, create a new secondary table with a foreign key to the main table.
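For example, a sketch of that secondary table (the table and column names here are hypothetical, and the main table is assumed to be pl with an integer primary key id):
-- Hypothetical secondary table holding only the facebook counters.
create table pl_facebook (
    pl_id                integer primary key references pl (id),
    total_count          integer,
    comment_count        integer,
    comment_plugin_count integer,
    share_count          integer,
    reaction_count       integer
);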
If for some mysterious reason you have to store the data in a single column, use the jsonb type, example:
with pl(id, facebook) as (
values
(1, '{"total_count": 14, "comment_count": 0}'::jsonb),
(2, '{"total_count": null, "comment_count": null}'::jsonb),
(3, '{"total_count": 7, "comment_count": 10}'::jsonb)
)
select *
from pl
where (facebook->>'total_count')::integer between 5 and 10
hstore can be an alternative to jsonb.
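For completeness, a minimal hstore sketch of the same filter, assuming the hstore extension is installed (CREATE EXTENSION hstore):
with pl(id, facebook) as (
values
(1, 'total_count=>14, comment_count=>0'::hstore),
(2, 'total_count=>NULL, comment_count=>NULL'::hstore),
(3, 'total_count=>7, comment_count=>10'::hstore)
)
select *
from pl
where (facebook->'total_count')::integer between 5 and 10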
All these ways are much easier to maintain and much more efficient than your current model. Time to move to the bright side of power.
I want a function (MS SQL Server 2016) that can recursively call itself to traverse a tree and return that traversal as a single JSON value. I have a working chunk of code, shown below, but I'd like to do something other than the clunky JSON_MODIFY I've used. Unfortunately, I can't find a way to make it work without it. If you comment out the line of code with the JSON_MODIFY and uncomment the next line, you'll see what I mean.
Is there a better solution?
DROP TABLE dbo.Node;
GO
DROP FUNCTION dbo.NodeList;
GO
CREATE TABLE dbo.Node (
NodeId INT NOT NULL ,
ParentNodeId INT NULL ,
NodeName NVARCHAR(MAX)
);
GO
INSERT dbo.Node(NodeId, ParentNodeId, NodeName)
VALUES (1, NULL, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 3, 'D'), (5, 3, 'E');
GO
CREATE FUNCTION dbo.NodeList(@ParentNodeId INT) RETURNS NVARCHAR(MAX)
AS BEGIN
DECLARE @JsonOut NVARCHAR(MAX) = (
SELECT n.NodeId ,
n.NodeName ,
JSON_MODIFY(dbo.NodeList(n.NodeId), '$.x', '') AS Children
-- dbo.NodeList(n.NodeId) AS Children
FROM dbo.Node n
WHERE ISNULL(n.ParentNodeId, -1) = ISNULL(@ParentNodeId, -1)
FOR JSON AUTO
) ;
RETURN @JsonOut;
END;
GO
PRINT dbo.NodeList(NULL);
GO
The output with the JSON_MODIFY is exactly what I want...
[{"NodeId":1,"NodeName":"A","Children":[{"NodeId":2,"NodeName":"B"},
{"NodeId":3,"NodeName":"C","Children":[{"NodeId":4,"NodeName":"D"},
{"NodeId":5,"NodeName":"E"}]}]}]
... but without it, it all goes wrong ...
[{"NodeId":1,"NodeName":"A","Children":"[{\"NodeId\":2,\"NodeName\":\"B\"},
{\"NodeId\":3,\"NodeName\":\"C\",\"Children\":\"
[{\\\"NodeId\\\":4,\\\"NodeName\\\":\\\"D\\\"},
{\\\"NodeId\\\":5,\\\"NodeName\\\":\\\"E\\\"}]\"}]"}]
Thanks in advance for any ideas.
Two things:
1) The JSON_MODIFY() doesn't actually generate exactly what I want: with the original code, JSON_MODIFY adds an element named 'x' with the value '' to each Children value.
2) Although not a perfect answer, it is much cleaner to replace JSON_MODIFY() with JSON_QUERY() (specifically with JSON_QUERY(dbo.NodeList(...), '$')). Not only does it make more sense this way, it also gets rid of the extra 'x' artifact.
SQL Server obviously knows that whatever comes back from JSON_MODIFY isn't really just an NVARCHAR (despite its signature), but I can't find any other way to make it treat an NVARCHAR as JSON.
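For reference, the cleaner variant described in point 2 changes only the Children expression inside the function, roughly like this:
DECLARE @JsonOut NVARCHAR(MAX) = (
SELECT n.NodeId ,
n.NodeName ,
-- JSON_QUERY tells FOR JSON that the nested value is already JSON, so it is not re-escaped.
JSON_QUERY(dbo.NodeList(n.NodeId), '$') AS Children
FROM dbo.Node n
WHERE ISNULL(n.ParentNodeId, -1) = ISNULL(@ParentNodeId, -1)
FOR JSON AUTO
) ;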