How to identify missing weeks using SQL code? - sql-server

I am trying to identify the specific weeks that are missing from my data. I have multiple states with different date ranges and would like to output all of the missing weeks for each state. I'm not sure where to begin with SQL that would identify the missing weeks. Any help on this would be greatly appreciated. BTW I'm using SQL Server 2016.
Thanks!
Sample Data:
State  WeekEndingDate  Sales
ID     7/5/2015        125000
ID     12/13/2015      127263
IN     8/20/2016       126589
IN     8/27/2016       124568
IN     10/15/2016      119654
MI     01/02/2017      105687
MI     02/05/2017      145962
An example of my desired output would be:
MI     01/09/2017      136589
MI     01/16/2017      125641
MI     01/23/2017      145769
MI     01/30/2017      135697
IN     09/03/2016      145693
and so on....

This is what's commonly known as a "gaps and islands" problem.
To solve, I strongly suggest you create a date table. Then, assuming that you have established that your week ending day is always Saturday, you can include that date table in the query.
-- Weeks in the range with no matching row come back with NULL State and Sales.
select MD.State, DT.WeekEndingDate, MD.Sales
from DateTable DT
left outer join MyData MD on DT.WeekEndingDate = MD.WeekEndingDate
where DT.WeekEndingDate >= '2016-08-01' and DT.WeekEndingDate <= '2016-08-31'
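The idea of joining against a full list of expected weeks and keeping the rows with no match can be sketched per state. This is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, it builds the expected weeks with a recursive CTE instead of a permanent date table, it assumes each state's weeks fall every 7 days from that state's earliest date, and it uses only the IN and MI rows from the sample data (rewritten as ISO dates).

```python
import sqlite3

# IN and MI rows from the question, with dates rewritten in ISO format.
rows = [
    ("IN", "2016-08-20", 126589),
    ("IN", "2016-08-27", 124568),
    ("IN", "2016-10-15", 119654),
    ("MI", "2017-01-02", 105687),
    ("MI", "2017-02-05", 145962),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (State TEXT, WeekEndingDate TEXT, Sales INTEGER)")
conn.executemany("INSERT INTO MyData VALUES (?, ?, ?)", rows)

query = """
WITH RECURSIVE bounds AS (
    -- first and last week present for each state
    SELECT State,
           MIN(WeekEndingDate) AS first_week,
           MAX(WeekEndingDate) AS last_week
    FROM MyData
    GROUP BY State
),
expected AS (
    -- every 7th day from the first week up to the last week, per state
    SELECT State, first_week AS WeekEndingDate, last_week FROM bounds
    UNION ALL
    SELECT State, date(WeekEndingDate, '+7 days'), last_week
    FROM expected
    WHERE date(WeekEndingDate, '+7 days') <= last_week
)
SELECT e.State, e.WeekEndingDate
FROM expected e
LEFT JOIN MyData d
       ON d.State = e.State
      AND d.WeekEndingDate = e.WeekEndingDate
WHERE d.State IS NULL          -- keep only the weeks with no data row
ORDER BY e.State, e.WeekEndingDate
"""

missing_weeks = conn.execute(query).fetchall()
for state, week in missing_weeks:
    print(state, week)
```

For MI this lists 2017-01-09 through 2017-01-30, which matches the desired output in the question. On SQL Server the same shape should carry over, with the date arithmetic and CTE syntax adjusted to T-SQL.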


Why Hibernate HSQL Concat is not working for MSSQL?

I have Hibernate 5.3.1 in a project that connects to different engines (MySQL, Oracle, PostgreSQL and MS SQL), so I can't use native queries.
Let's say I have 3 records in a table that all have the same datetime, but I need to group them only by date (not time), for example 2019-12-04.
I execute this query:
SELECT
CONCAT(year(tx.date_), month(tx.date_), day(tx.date_)),
iss.code,
COUNT(tx.id)
FROM
tx_ tx
JOIN
issuer_ iss
ON
tx.id_issuer = iss.id
GROUP BY
CONCAT(year(tx.date_), month(tx.date_), day(tx.date_)), iss.code
But when I test it against SQL Server 2017, instead of returning 20191204 it returns 2035. In Oracle and MySQL it works fine.
Does anyone have any idea why this happens? I've tried different approaches, like using + instead of CONCAT, but the result is the same.
I've also tried extracting the parts separately (without CONCAT), and they are returned correctly. The problem is that I need to group by the complete date.
And just for the record, the field is declared as datetime2 in the database.
How about simply adding them instead of using CONCAT?
(year(tx.date_)*10000 + month(tx.date_)*100 + day(tx.date_)*1) AS datenum
Thus, try this:
SELECT
CAST((year(tx.date_)*10000 + month(tx.date_)*100 + day(tx.date_)*1) AS string) AS datenum,
iss.code
FROM tx_ tx
JOIN issuer_ iss
ON tx.id_issuer = iss.id
GROUP BY year(tx.date_), month(tx.date_), day(tx.date_), iss.code
Thanks for the hint Gert Arnold gave me. I hadn't realized that in MSSQL the query was adding the parts as if they were numbers.
Finally, I managed to make it work in all 4 RDBMSs by casting to string first:
SELECT
CONCAT(CAST(year(tx.date_) AS string), CAST(month(tx.date_) AS string), CAST(day(tx.date_) AS string)),
iss.code
FROM
tx_ tx
JOIN
issuer_ iss
ON
tx.id_issuer = iss.id
GROUP BY
CONCAT(year(tx.date_), month(tx.date_), day(tx.date_)), iss.code
I also tried casting to TEXT, but that throws an exception in MySQL.
Why use concat() to begin with?
Assuming Hibernate takes care of converting the non-standard year(), month() and day() functions, then the following should work on any DBMS
SELECT year(tx.date_), month(tx.date_), day(tx.date_), iss.code
FROM tx_ tx
JOIN issuer_ iss ON tx.id_issuer = iss.id
GROUP BY year(tx.date_), month(tx.date_), day(tx.date_), iss.code
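The 2035 reported in the question is exactly what you get when the three date parts are summed as integers. A small Python sketch of the arithmetic (no database involved, variable names are illustrative) also shows why the weighted sum from the first answer is a safer grouping key than concatenating unpadded parts:

```python
from datetime import date

d = date(2019, 12, 4)

# Parts added as plain integers: this reproduces the 2035 from the question.
summed = d.year + d.month + d.day

# Weighted sum from the first answer: each part gets its own digit positions.
datenum = d.year * 10000 + d.month * 100 + d.day

# Concatenating the parts as unpadded strings.
concatenated = f"{d.year}{d.month}{d.day}"

# A different date that produces the same unpadded string.
other = date(2019, 1, 24)
other_concatenated = f"{other.year}{other.month}{other.day}"

print(summed, datenum, concatenated, other_concatenated)
```

Because 2019-12-04 and 2019-01-24 both concatenate to the same unpadded string, grouping on the separate year, month and day columns (or on the weighted sum) avoids merging distinct dates.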

How to get the result of CONNECT_BY_ISCYCLE and CONNECT_BY_ISLEAF in snowflake without using them?

I need to run hierarchical queries and get the results of CONNECT_BY_ISCYCLE and CONNECT_BY_ISLEAF, but these features are supported in Oracle, not in Snowflake.
What are the alternative ways to implement the functionality of CONNECT_BY_ISCYCLE and CONNECT_BY_ISLEAF in Snowflake, given that these keywords are not supported there?
I wonder whether you have taken a look at the following Snowflake features:
https://docs.snowflake.net/manuals/user-guide/queries-hierarchical.html#using-connect-by-or-recursive-ctes-to-query-hierarchical-data
Yes, I took a look there. I also took a look at https://docs.snowflake.net/manuals/sql-reference/constructs/connect-by.html where it clearly says that these features are not supported in Snowflake.
I was trying the code block below to find an alternative, but I am getting a variety of errors in Snowflake.
person_vertex as (
select
emp_number,
user_id
from person
),
person_edges as (
select
supervisor_emp_number,
emp_number
from person
where supervisor_emp_number is not null
),
select
pv.emp_number emp_id_pk,
level,
CONNECT_BY_ROOT pv.emp_number AS root,
concat(SYS_CONNECT_BY_PATH(pv.emp_number,':'),':') as path,
-- CONNECT_BY_ISCYCLE AS iscyclic, ------------------- no idea how to implement this
-- CONNECT_BY_ISLEAF as isleaf ------------------- i tried below block, but it is not working
case
when (pe.supervisor_emp_number in (select emp_number from pv)) then 0
else 1
end AS isleaf
from person_vertex pv
left join person_edges pe on pv.emp_number = pe.emp_number
connect by prior A.emp_number = A.supervisor_emp_number
start with A.supervisor_emp_number is null
Any help with this block is really appreciated.
Thanks.
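One possible direction, offered only as a sketch, is a recursive CTE that carries the path of visited employees along with each row. A leaf is then a row with no subordinate outside its own path, and a cycle is a row with a subordinate that already appears in its path. The example below runs on SQLite through Python's sqlite3 module, not on Snowflake; the edge rows, the starting employee 1, and the column names lvl, is_cycle and is_leaf are made up for illustration, and instr would need to be swapped for the equivalent string-search function in Snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_edges (supervisor_emp_number INTEGER, emp_number INTEGER)"
)
# Hypothetical edges: 1 supervises 2 and 3, 2 supervises 4,
# and 4 points back at 2, which forms a cycle.
conn.executemany(
    "INSERT INTO person_edges VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (4, 2)],
)

query = """
WITH RECURSIVE walk(emp_number, lvl, path) AS (
    SELECT 1, 1, ':1:'                      -- assumed starting employee
    UNION ALL
    SELECT e.emp_number, w.lvl + 1, w.path || e.emp_number || ':'
    FROM walk w
    JOIN person_edges e ON e.supervisor_emp_number = w.emp_number
    -- do not step onto an employee that is already on the path
    WHERE instr(w.path, ':' || e.emp_number || ':') = 0
)
SELECT w.emp_number,
       w.lvl,
       w.path,
       -- 1 when some subordinate is already an ancestor of this row
       EXISTS (SELECT 1 FROM person_edges e
               WHERE e.supervisor_emp_number = w.emp_number
                 AND instr(w.path, ':' || e.emp_number || ':') > 0) AS is_cycle,
       -- 1 when no subordinate outside the path exists
       NOT EXISTS (SELECT 1 FROM person_edges e
                   WHERE e.supervisor_emp_number = w.emp_number
                     AND instr(w.path, ':' || e.emp_number || ':') = 0) AS is_leaf
FROM walk w
ORDER BY w.path
"""

hierarchy = conn.execute(query).fetchall()
for row in hierarchy:
    print(row)
```

Here employee 4 is flagged as both a cycle row and a leaf, because its only subordinate (2) is already its ancestor. Whether that matches Oracle's behaviour in every edge case is not verified here.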

How to find misspellings in data

I am trying to find the misspellings in the TOWN__C field. The data looks something like the sample below. There is no specific pattern: the misspelling can be at the beginning, in the middle, or at the end, and its length can vary too.
I am using SQL Server Management Studio to execute the queries. I used SUBSTRING along with a left outer join to find duplicates, but that does not return only the misspellings, so I still need to go through the data manually.
Data ->
Achampet
ACHEMPET
AGIA
AGIYA
ASHOK NAGAR
ASHOKNAGAR
ASHOKNAGER
SQL query which I am using ->
Select distinct(T3.TOWN__C)
From (Select T1.Sub_Str, Count(T1.Sub_Str) as Y
From (SELECT TOWN__C, SUBSTRING(TOWN__C, 1, 3) as Sub_Str
FROM [SALESFORCE].[dbo].[Outlet Master] group by TOWN__C)T1
Group by T1.Sub_Str having count(*)> 1)T2
Left outer join
[SALESFORCE].[dbo].[Outlet Master]T3
On T2.Sub_Str = SUBSTRING(T3.TOWN__C, 1, 3)
Order by T3.TOWN__C
Is there a way to find out all such cases using SQL or Excel or anything else?
Here's an example using SOUNDEX, to try to locate values where multiple spellings have been used for "similar" names:
declare @t table (town varchar(35) not null)
insert into @t(town) values
('Achampet'),
('ACHEMPET'),
('AGIA'),
('AGIYA'),
('ASHOK NAGAR'),
('ASHOKNAGAR'),
('ASHOKNAGER'),
('Downtown'),
('DOWNTOWN'),
('DownTown')
select
    v.*
from
    (select
        *,
        MIN(town) OVER (PARTITION BY town_sound) as minTown,
        MAX(town) OVER (PARTITION BY town_sound) as maxTown
    from
        @t
    cross apply
        (select SOUNDEX(REPLACE(town,' ','')) as town_sound) u
    ) v
where minTown != maxTown
Note that this doesn't return "downtown" where the only variations are in capitalization, but does return all of the values in your given sample data, which I assume were all meant to be found as possible misspellings.
Also note that SOUNDEX has had a chequered history and under older versions of SQL Server it was usually recommended that a "better" soundex be implemented as a UDF. You should be able to find versions of that with a simple search, if required.
Note, also, that Soundex was specifically designed around English pronunciation. Again, you may be able to find a better tailored function as a UDF for specific other languages.
Results:
town          town_sound  minTown       maxTown
------------- ----------  ------------- ------------
AGIA          A200        AGIA          AGIYA
AGIYA         A200        AGIA          AGIYA
ASHOK NAGAR   A225        ASHOK NAGAR   ASHOKNAGER
ASHOKNAGAR    A225        ASHOK NAGAR   ASHOKNAGER
ASHOKNAGER    A225        ASHOK NAGAR   ASHOKNAGER
Achampet      A251        Achampet      ACHEMPET
ACHEMPET      A251        Achampet      ACHEMPET
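To make the grouping logic concrete outside the database, here is a minimal Python sketch of the same idea: compute a Soundex-style code per town (spaces ignored), group by code, and keep only groups with more than one distinct spelling when case is ignored. The soundex function below is a simplified implementation written for this example, not SQL Server's own, so results may differ on inputs beyond this sample.

```python
from collections import defaultdict

# Letter-to-digit table used by Soundex; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Simplified Soundex: first letter plus up to three digits, zero padded."""
    letters = [c for c in name.upper() if c.isalpha()]  # drops spaces
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated digit is kept
    return (result + "000")[:4]


towns = ["Achampet", "ACHEMPET", "AGIA", "AGIYA", "ASHOK NAGAR",
         "ASHOKNAGAR", "ASHOKNAGER", "Downtown", "DOWNTOWN", "DownTown"]

groups = defaultdict(list)
for town in towns:
    groups[soundex(town)].append(town)

# Keep groups whose members differ by more than capitalization.
suspects = {code: names for code, names in groups.items()
            if len({n.upper() for n in names}) > 1}

for code, names in sorted(suspects.items()):
    print(code, names)
```

As in the SQL version, the three "Downtown" variants share a code but are not reported, because they differ only in capitalization.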

How to get count data column in SQL server between different dates and each date has time range?

Please, I need help with the correct SQL syntax for the following:
SELECT count(CreateDateTime)
FROM tblINFO
WHERE
(CreateDateTime BETWEEN '2016-03-22 06:59:00' AND '2016-03-22 14:59:00')
OR
(CreateDateTime BETWEEN '2016-04-14 06:59:00' AND '2016-04-14 14:59:00')
I tried this code, but it only counts CreatedDateTime for those two days between the given times. I also need to count the days that fall between those two days, within the same time range.
Thank you.
I could interpret your question two ways. The first is that you want the created date to be on either of those days, in which case you would do this:
SELECT count(CreatedDateTime)
FROM tblINFO
WHERE
(createdDateTime BETWEEN '2015-03-22 06:59:00' AND '2015-03-22 14:59:00')
OR
(createdDateTime BETWEEN '2015-04-14 06:59:00' AND '2015-04-14 14:59:00')
The alternative would be that you want the created date to be between the start and end of those two dates, in which case you'd have this;
SELECT count(CreatedDateTime)
FROM tblINFO
WHERE createdDateTime BETWEEN '2015-03-22 06:59:00' AND '2015-04-14 14:59:00'
If you're after everything between those times on any day between 2015-03-22 and 2015-04-14 then you'll want something like this;
SELECT count(CreatedDateTime)
FROM tblINFO
WHERE
(CONVERT(DATE,createdDateTime) BETWEEN '2015-03-22' AND '2015-04-14')
AND
(CONVERT(TIME,createdDateTime) BETWEEN '06:59:00' AND '14:59:00')
And if you want this to show you the count for all days but split by day, you'll want this;
SELECT
CONVERT(Date,CreatedDateTime) Date
,count(CreatedDateTime) Volume
FROM tblINFO
WHERE
(CONVERT(DATE,createdDateTime) BETWEEN '2015-03-22' AND '2015-04-14')
AND
(CONVERT(TIME,createdDateTime) BETWEEN '06:59:00' AND '14:59:00')
GROUP BY CONVERT(Date,CreatedDateTime)
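The per-day query above can be tried out on a handful of rows. The sketch below is only an approximation under stated assumptions: it runs on SQLite through Python's sqlite3 module, so date() and time() stand in for CONVERT(DATE, ...) and CONVERT(TIME, ...), and the sample timestamps are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblINFO (CreatedDateTime TEXT)")
conn.executemany(
    "INSERT INTO tblINFO VALUES (?)",
    [
        ("2015-03-22 07:30:00",),  # first day, inside the time window
        ("2015-03-22 18:00:00",),  # first day, outside the time window
        ("2015-04-01 06:59:00",),  # day in between, exactly at the lower bound
        ("2015-04-01 12:00:00",),  # day in between, inside the time window
        ("2015-04-01 23:15:00",),  # day in between, outside the time window
        ("2015-04-14 14:59:00",),  # last day, exactly at the upper bound
        ("2015-04-20 08:00:00",),  # outside the date range
    ],
)

query = """
SELECT date(CreatedDateTime) AS Day,
       COUNT(*)              AS Volume
FROM tblINFO
WHERE date(CreatedDateTime) BETWEEN '2015-03-22' AND '2015-04-14'
  AND time(CreatedDateTime) BETWEEN '06:59:00' AND '14:59:00'
GROUP BY date(CreatedDateTime)
ORDER BY Day
"""

per_day = conn.execute(query).fetchall()
for day, volume in per_day:
    print(day, volume)
```

Both BETWEEN bounds are inclusive, so rows stamped exactly 06:59:00 or 14:59:00 are counted.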
If I understood your requirement correctly, then this may help you:
SELECT count(CreatedDateTime)
FROM tblINFO
WHERE (createdDateTime BETWEEN '2015/03/22 6:59:00' AND '2015/03/22 14:59:00' )
OR(createdDateTime BETWEEN '2015/04/14 6:59:00' AND '2015/04/14 14:59:00' )
OR(createdDateTime BETWEEN '2015/03/22 6:59:00' AND '2015/04/14 14:59:00' )

SQL Server SUM/Group/Window Function

Good day,
I have a table as follows.
What I would love to do is add a new column called "New Net" that will tabulate/summarize (any way possible) by CovID/PolicyNo/CovYear/Positive(Negative) values.
In the example below the new column would look like this.
In short, what we are trying to do is sum up all the values in each group, place that total in only the first row of the group, and zero out all the others. Any help or pointers would be appreciated with this. I have tried SQL Server window functions and standard SUM/GROUP BY.
This should meet your expectations:
SELECT PolicyNo ,
CovID ,
CovYear ,
p ,
net,
CASE WHEN ROW_NUMBER()OVER(PARTITION BY CovID, PolicyNo, CovYear, net ORDER BY PolicyNo) = 1 THEN net ELSE 0 END AS NewNet
FROM dbo.test1;
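If the goal is to show the group total on the first row, rather than that row's own net value, one option is to combine ROW_NUMBER() with SUM() OVER the same partition. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 module, the sample rows are invented, the positive/negative split is derived from the sign of net, and the id column is a hypothetical ordering column added so that "first row" is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test1 (id INTEGER, PolicyNo TEXT, CovID INTEGER, "
    "CovYear INTEGER, net INTEGER)"
)
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "P1", 10, 2016, 100),
        (2, "P1", 10, 2016, 50),
        (3, "P1", 10, 2016, -30),
        (4, "P1", 10, 2016, -20),
        (5, "P1", 10, 2017, 70),
    ],
)

query = """
SELECT id,
       net,
       CASE
           -- first row of each CovID/PolicyNo/CovYear/sign group gets the total
           WHEN ROW_NUMBER() OVER (
                    PARTITION BY CovID, PolicyNo, CovYear,
                                 CASE WHEN net >= 0 THEN 1 ELSE 0 END
                    ORDER BY id) = 1
           THEN SUM(net) OVER (
                    PARTITION BY CovID, PolicyNo, CovYear,
                                 CASE WHEN net >= 0 THEN 1 ELSE 0 END)
           ELSE 0
       END AS NewNet
FROM test1
ORDER BY id
"""

new_net = conn.execute(query).fetchall()
for row in new_net:
    print(row)
```

With these rows, the positive 2016 values total 150 on the first positive row, the negative ones total -50 on the first negative row, and every other row in a group shows 0.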
