More efficient SQL apportionment view - would a function be better? - sql-server

I'm writing a view that apportions costs across different sales departments, and I'm wondering if there's a better way to do this.
Let's say I've got some code that outputs this...
Dept Type Oct Nov Dec
DeptA SalesA 10000 20000 5000
DeptA SalesB 4000 2000 8200
DeptA SalesC 6000 7000 4000
DeptB SalesA 12000 4000 6333
DeptB SalesB 8445 3880 4500
DeptB SalesC 8700 8740 6500
General Costs1 890 5874 138
General Costs2 545 547 320
General Costs3 2674 354 214
and I want to apportion 'Costs1' across the sales departments and report back on 'SalesB' from 'DeptA', like this...
Oct Nov Dec
SalesB 4000.00 2000.00 8200.00
Apportioned cost1 152.94 499.59 17.98
(where 'Apportioned cost1' = Costs1 * (DeptA.SalesB / Total sales)).
The code that I've written so far to do this looks like this...
select
Dept,
Type,
[Oct-15],
[Nov-15],
[Dec-15]
from ProfitandLoss
where Type like 'SalesB%'
and Dept like 'DeptA'
union
select
'Apportioned Costs1' as 'Dept',
'' as 'Type',
round(((round((select sum([Oct-15]) from ProfitandLoss where Type = 'Costs1'),2))* (round((select sum([Oct-15]) from ProfitandLoss where Type like 'SalesB%' and Dept like 'DeptA'),2) / round((select sum([Oct-15]) from ProfitandLoss where type like 'Sales%'),2))),2) as 'Oct-15',
round(((round((select sum([Nov-15]) from ProfitandLoss where Type = 'Costs1'),2))* (round((select sum([Nov-15]) from ProfitandLoss where Type like 'SalesB%' and Dept like 'DeptA'),2) / round((select sum([Nov-15]) from ProfitandLoss where type like 'Sales%'),2))),2) as 'Nov-15',
round(((round((select sum([Dec-15]) from ProfitandLoss where Type = 'Costs1'),2))* (round((select sum([Dec-15]) from ProfitandLoss where Type like 'SalesB%' and Dept like 'DeptA'),2) / round((select sum([Dec-15]) from ProfitandLoss where type like 'Sales%'),2))),2) as 'Dec-15'
from ProfitandLoss
Due to the number of SELECT statements within the query, this currently takes 43 seconds to run, and that is just for one department over three months. I need to run this for 34 departments over 12 months! Is there a more efficient way to do this? Would a function that stores each department's percentage of total sales, which I can then refer to in the query instead of sub-querying multiple times for individual numbers, be any quicker?
Also assume that the costs are not apportioned evenly. For example, 'Costs2' is only split between DeptA.SalesA and DeptB.SalesA, so I need to make specific apportionments rather than dividing everything by the total.
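One way to avoid the repeated scalar subqueries is to state the apportionment rules as data and let a single join-and-aggregate pass do the arithmetic. The sketch below is a minimal illustration under stated assumptions. It runs the sample figures through Python's built-in sqlite3 module purely so that it is runnable here. The month columns are renamed Oct15/Nov15/Dec15, and the CostAllocation table (which sales lines share each cost) is hypothetical. The CTE and joins are standard SQL and should translate to a SQL Server view, but that has not been tested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProfitandLoss (Dept TEXT, Type TEXT, Oct15 REAL, Nov15 REAL, Dec15 REAL);
-- Hypothetical rules table: one row per sales line that shares a given cost.
CREATE TABLE CostAllocation (Cost TEXT, Dept TEXT, Type TEXT);
""")

sales = [
    ("DeptA", "SalesA", 10000, 20000, 5000),
    ("DeptA", "SalesB", 4000, 2000, 8200),
    ("DeptA", "SalesC", 6000, 7000, 4000),
    ("DeptB", "SalesA", 12000, 4000, 6333),
    ("DeptB", "SalesB", 8445, 3880, 4500),
    ("DeptB", "SalesC", 8700, 8740, 6500),
]
costs = [
    ("General", "Costs1", 890, 5874, 138),
    ("General", "Costs2", 545, 547, 320),
    ("General", "Costs3", 2674, 354, 214),
]
conn.executemany("INSERT INTO ProfitandLoss VALUES (?, ?, ?, ?, ?)", sales + costs)

# Costs1 is shared by every sales line; Costs2 only by the two SalesA lines.
rules = [("Costs1", dept, typ) for dept, typ, *_ in sales]
rules += [("Costs2", "DeptA", "SalesA"), ("Costs2", "DeptB", "SalesA")]
conn.executemany("INSERT INTO CostAllocation VALUES (?, ?, ?)", rules)

APPORTION_SQL = """
WITH basis AS (   -- total sales of the lines that share each cost
    SELECT a.Cost,
           SUM(s.Oct15) AS Oct15, SUM(s.Nov15) AS Nov15, SUM(s.Dec15) AS Dec15
    FROM CostAllocation a
    JOIN ProfitandLoss s ON s.Dept = a.Dept AND s.Type = a.Type
    GROUP BY a.Cost
)
SELECT a.Cost, a.Dept, a.Type,
       c.Oct15 * s.Oct15 / b.Oct15 AS Oct15,
       c.Nov15 * s.Nov15 / b.Nov15 AS Nov15,
       c.Dec15 * s.Dec15 / b.Dec15 AS Dec15
FROM CostAllocation a
JOIN ProfitandLoss s ON s.Dept = a.Dept AND s.Type = a.Type
JOIN ProfitandLoss c ON c.Dept = 'General' AND c.Type = a.Cost
JOIN basis b ON b.Cost = a.Cost
"""

# Key each result row by (cost, dept, sales type) for easy lookup.
apportioned = {
    (cost, dept, typ): (oct15, nov15, dec15)
    for cost, dept, typ, oct15, nov15, dec15 in conn.execute(APPORTION_SQL)
}
```

Because every share is computed against the same basis total, the shares of each cost add back up to the cost itself. Adding a department or changing a split then means inserting rows into the rules table rather than editing the query. In SQL Server, a basis total of zero would need explicit handling (for example with NULLIF) to avoid a divide-by-zero error.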

Related

Oracle database is returning date as 1951 instead of 2051

Hi, I am using an input field in the UI to enter a date, entered as 02/01/2051. It is saved in the database as 01-FEB-51; the column's datatype is DATE. When I fetch it, it is returned as 01-feb-1951. Below is the query I am using:
select to_date(LN_MAT_DT,'dd/mm/YYYY') from Emp ;
Can someone please help with this?
It is about the format mask you use and the difference between RRRR and YYYY. Have a look at the following example:
SQL> select to_date('01.02.51', 'dd.mm.yy') date_yy,
to_date('01.02.51', 'dd.mm.rr') date_rr,
--
to_date('01.02.1951', 'dd.mm.yyyy') date_19_yyyy,
to_date('01.02.1951', 'dd.mm.rrrr') date_19_rrrr,
--
to_date('01.02.2051', 'dd.mm.yyyy') date_20_yyyy,
to_date('01.02.2051', 'dd.mm.rrrr') date_20_rrrr
from dual;
DATE_YY DATE_RR DATE_19_YY DATE_19_RR DATE_20_YY DATE_20_RR
---------- ---------- ---------- ---------- ---------- ----------
01.02.2051 01.02.1951 01.02.1951 01.02.1951 01.02.2051 01.02.2051
SQL>
What you should do is use a four-digit year with the YYYY format mask.
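The difference between the two-digit masks in the output above comes down to how the century is chosen. As a rough sketch in Python (not Oracle code), YY takes the century of the current year, while RR picks the century that keeps the result close to the current year. The function names and the explicit current_year parameter are assumptions made for illustration.

```python
def expand_yy(two_digit_year: int, current_year: int) -> int:
    """YY: always use the century of the current year."""
    return (current_year // 100) * 100 + two_digit_year


def expand_rr(two_digit_year: int, current_year: int) -> int:
    """RR: shift the century when the two years fall in different half-centuries."""
    century = (current_year // 100) * 100
    current_in_low_half = current_year % 100 < 50
    given_in_low_half = two_digit_year < 50
    if given_in_low_half and not current_in_low_half:
        century += 100  # e.g. '05' entered in 1998 becomes 2005
    elif not given_in_low_half and current_in_low_half:
        century -= 100  # e.g. '51' entered in 2024 becomes 1951
    return century + two_digit_year
```

With a current year such as 2024, expand_yy(51, 2024) gives 2051 and expand_rr(51, 2024) gives 1951, which matches the DATE_YY and DATE_RR columns shown above.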
Since LN_MAT_DT is a DATE datatype, you need to_char, not to_date, i.e.:
select to_char(LN_MAT_DT,'dd/mm/YYYY') from Emp ;

SQL Server - extract dates from strings in several formats

I've inherited quite a mess of a database table column called DOB, of type nvarchar - here is just a sample of the data in this column:
DOB: 1998-09-04US
Sex: M Race: White Year of Birth: 1950
12/31/00
January 5th, 1998
Date of Birth: 12/19/1938
AGE; 46
DOB: 11-24-1967
May 31, 1942, Split, Croatia
DOB:   12/28/1986
D.O.B.31-OCT-92
D.O.B.: January 8, 1973
31/07/1974 (44 years old)
Date Of Birth: 08/01/1979
78  (DOB: 12/09/1940)
1961 (56 years old)
12/31/1985 (PRIMARY)
DOB: 05/27/67
8-Jun-43
9/9/78
12/31/84 0:00
NA
Birth Year 2018
nacido el 29 de junio de 1959
I am trying to determine whether there is any way to extract the dates from these fields, with so many varying formats, without using something like RegEx patterns for every single possible variation in this column.
The resulting extracted data would look like this:
1998-09-04
1950
12/31/00
January 5th, 1998
12/19/1938
11-24-1967
May 31, 1942
12/28/1986
31-OCT-92
January 8, 1973
31/07/1974
08/01/1979
12/09/1940
1961
12/31/1985
05/27/67
8-Jun-43
9/9/78
12/31/84
NA
2018
29 de junio de 1959
While it may be a complete pipe dream, I was wondering if this could be accomplished with SQL, with some kind of "if it looks like a date, attempt to extract it" method. And if not out-of-the-box, perhaps with a helper extension or plugin?
It is possible, but there are potential pitfalls. This will certainly have to be expanded and maintained.
This is a brute-force pattern match where the longest matching pattern is selected.
Example:
Select ID
,DOB
,Found
From (
Select *
,Found = substring(DOB,patindex(PatIdx,DOB),PatLen)
,RN = Row_Number() over (Partition By ID Order by PatLen Desc)
From #YourTable A
Left Join (
Select *
,PatIdx = '%'+replace(replace(Pattern, 'A', '[A-Z]'), '0', '[0-9]') +'%'
,PatLen = len(Pattern)
From #FindPattern
) B
on patindex(PatIdx,DOB)>0
) A
Where RN=1
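The same longest-pattern-wins idea can be sketched outside the database. The Python below is only an illustration of the technique: the contents of #FindPattern are not shown in the answer, so the PATTERNS list is a hypothetical stand-in, with 0 meaning any digit and A meaning any letter, as in the query above.

```python
import re

# Hypothetical pattern list; the real #FindPattern rows are not shown in the answer.
PATTERNS = ["0000-00-00", "00/00/0000", "00-00-0000", "00-AAA-00", "00/00/00", "0000"]


def to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a 0/A template into a regex, escaping every other character."""
    parts = []
    for ch in pattern:
        if ch == "0":
            parts.append("[0-9]")
        elif ch == "A":
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def extract_date(text: str):
    """Return the match for the longest pattern found in text, or None."""
    for pattern in sorted(PATTERNS, key=len, reverse=True):
        match = to_regex(pattern).search(text)
        if match:
            return match.group(0)
    return None
```

For example, extract_date("DOB: 1998-09-04US") picks out 1998-09-04, and a value such as "NA" yields None. Values with spelled-out month names (January 5th, 1998) are not covered by this short list and would need further patterns, which is the maintenance cost mentioned above.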

Vars to cases & retain variable/value labels Tableau setup - restructure data for Tableau, flip data

I am flipping my survey data so I can use it in Tableau. Here is example data in SPSS (keep in mind that each variable has value & variable labels).
ID age rate1 rate2 rate3 mr_1 mr_2 mr_3 ...
1 35 8 3 2 1 2
2 40 2 2 3 2
3 41 6 3 5 2 3
4 43 3 3 1
Where rate1-3 are 3 rating questions. Mr_1 to mr_3 is a multiple-response, check-all-that-apply question (What is your ethnicity? 1=White, 2=Hispanic, 3=Black).
I flip the data using this:
VARSTOCASES
/MAKE answer FROM age rate1 rate2 rate3 mr_1 mr_2 mr_3
/INDEX=Index1(7)
/KEEP= All
/NULL=KEEP.
Results look like this:
ID Index1 answer
1 1 35
1 2 8
1 3 3
1 4 2
1 5 1
...
...
...
Which works just fine when connecting this to Tableau. However, what I want is more than just Index1 as an identifier to each variable that has been flipped. What I want is this (Var, VarLab, ValueLabel are just String variables):
ID Var VarLab answer ValueLabel
1 'age' 'What is your age?' 35 '35'
1 'rate1' 'Rate food' 8 '8'
1 'rate2' 'Rate wait time' 3 '3'
1 'rate3' 'Rate bathroom' 2 '2'
1 'mr_1' 'Ethnicity' 1 'White'
1 'mr_2' 'Ethnicity' 2 'Hispanic'
...
...
...
As you can see, I retained the variable label, value label, and the variable name itself for each flipped variable. This is the ideal Tableau setup as Tableau requires "tall" datasets. Also, I can use either the string or numeric representation of the response. Lastly, I no longer need to edit aliases inside of Tableau. Any ideas how to accomplish this? Perhaps this will require python or macro? Any ideas are greatly appreciated.
Thanks!
You need to use OMS to read the dictionary into two datasets: one for variable labels and one for value labels.
Then you can match your restructured dataset to the variable labels by variable name, and then match it to the value labels by variable name and value.
Run this to get the two datasets (BEFORE you restructure, of course):
DATASET DECLARE varlab.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
/DESTINATION FORMAT=SAV OUTFILE='varlab' VIEWER=YES.
DATASET DECLARE vallab.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
/DESTINATION FORMAT=SAV OUTFILE='vallab' VIEWER=YES.
display dictionary.
omsend.
Now restructure and match files (after renaming the proper variables for matching in the two new datasets).
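The restructure-then-match logic can be sketched on plain Python dictionaries to show what the two lookups do. This is not SPSS syntax and not the OMS output format. The sample respondent and labels are taken from the question, the 'Ethnicity' label for mr_3 is inferred from the question's description, and the function and variable names are assumptions.

```python
# One wide row from the question; None marks an unanswered item.
wide_rows = [
    {"ID": 1, "age": 35, "rate1": 8, "rate2": 3, "rate3": 2,
     "mr_1": 1, "mr_2": 2, "mr_3": None},
]

# Lookup 1: variable name -> variable label.
var_labels = {
    "age": "What is your age?", "rate1": "Rate food", "rate2": "Rate wait time",
    "rate3": "Rate bathroom", "mr_1": "Ethnicity", "mr_2": "Ethnicity",
    "mr_3": "Ethnicity",
}

# Lookup 2: (variable name, value) -> value label.
value_labels = {("mr_1", 1): "White", ("mr_2", 2): "Hispanic", ("mr_3", 3): "Black"}


def to_tall(rows, variables):
    """Flip each wide row into one tall row per variable and attach both labels."""
    tall = []
    for row in rows:
        for var in variables:
            answer = row.get(var)
            if answer is None:
                label = ""  # keep unanswered items, like /NULL=KEEP
            else:
                # Fall back to the number as text when no value label exists.
                label = value_labels.get((var, answer), str(answer))
            tall.append({
                "ID": row["ID"], "Var": var, "VarLab": var_labels.get(var, ""),
                "answer": answer, "ValueLabel": label,
            })
    return tall


tall_rows = to_tall(wide_rows, ["age", "rate1", "rate2", "rate3", "mr_1", "mr_2", "mr_3"])
```

The first lookup is keyed by variable name only and the second by variable name plus value, which mirrors the two match steps described above.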
This is the solution based on the other answer using OMS, with a few other things added.
It flips the vars you want and converts any other var you want to string.
dataset close all.
new file.
get file 'C:\Users\nicholas\Desktop\testFile.sav'.
************************************************************************************************
TABLEAU SETUP
**********************************************.
insert file="C:/Users/nicholas/Desktop/Type2syntax.sps".
!toString vars = visitorType.
!flipAndMatch vars = rate1 rate2 rate3 mr_1 mr_2 mr_3.
exe.
*CATEGORIZE FLIPPED VARS.
String filter (a150).
!groupingBy 'Rating satis' rate1 rate2 rate3.
!groupingBy 'MR with' mr_1 mr_2 mr_3.
save outfile 'C:\Users\nicholas\Desktop\OtherTableauTest2.sav'.
"C:/Users/nicholas/Desktop/Type2syntax.sps" is :
* Encoding: UTF-8.
save outfile 'C:\Users\nicholas\Desktop\tempSav.sav'.
DATASET DECLARE varlab.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
/DESTINATION FORMAT=SAV OUTFILE='varlab' VIEWER=YES.
DATASET DECLARE vallab.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Values']
/DESTINATION FORMAT=SAV OUTFILE='vallab' VIEWER=YES.
display dictionary.
omsend.
DATASET ACTIVATE varlab.
rename variables var1= varName / label = Question.
alter type varName (a20).
alter type Question (a1000).
sort cases by varName.
SAVE OUTFILE='C:\Users\nicholas\Desktop\varlabsTemp.sav'
/keep varName Question.
DATASET ACTIVATE vallab.
rename variables var1=varName / var2 = AnswerNumb / Label = AnswerText.
alter type varName (a20).
alter type AnswerText (a120).
sort cases by varName AnswerNumb.
SAVE OUTFILE='C:\Users\nicholas\Desktop\vallabsTemp.sav'
/keep varName AnswerNumb AnswerText.
dataset close all.
new file.
get file 'C:\Users\nicholas\Desktop\tempSav.sav'.
compute UNIQUE_ID = $casenum.
DEFINE !toString (vars=!CMDEND)
!DO !var !IN (!vars)
!LET !varDelete=!CONCAT("Delete", !var)
rename variables !var = !varDelete.
String !var (a120).
compute !var = valuelabels(!varDelete).
exe.
delete variables !varDelete.
!DOEND
!ENDDEFINE.
DEFINE !groupingBy (!POSITIONAL !TOKENS(1)
/!POSITIONAL !CMDEND)
!DO !var !IN (!2)
!LET !varString=!CONCAT("'", !var,"'")
if varName eq !varString filter eq !1.
!DOEND
exe.
!ENDDEFINE.
DEFINE !flipAndMatch (vars=!CMDEND)
VARSTOCASES
/MAKE AnswerNumb FROM !vars
/INDEX=VarName (AnswerNumb)
/KEEP=ALL
/NULL=KEEP.
EXECUTE.
sort cases by varName AnswerNumb.
alter type varName (a20).
match files files*
/table='C:\Users\nicholas\Desktop\vallabsTemp.sav'
/by varName AnswerNumb.
match files files*
/table='C:\Users\nicholas\Desktop\varlabsTemp.sav'
/by varName.
if AnswerText eq '' AnswerText = string(AnswerNumb, f).
!ENDDEFINE.
Output looks something like this. I didn't flip age or visitorType, but I certainly could have.
UNIQUE_ID VarName AnswerNumb AnswerText Question filter age VisitorType
1 'rate1' 8 '8' 'Rate food' 'Rating group' 35 'Overnight Visitor'
1 'rate2' 3 '3' 'Rate wait time' 'Rating group' 35 'Overnight Visitor'
1 'rate3' 2 '2' 'Rate bathroom' 'Rating group' 35 'Overnight Visitor'
1 'mr_1' 1 'White' 'Ethnicity' 'MR group' 35 'Overnight Visitor'
...

Adding together similar columns into one column SQL

I'm trying to show the counts of a table of records based on a trackid.
A few of the entries in my ToTransaction column are very similar: Toshi-A,Toshi-B,Toshi-C, Tosan, Toki, Toto
What I want to do in my query is to show all the Toshi's in one row, while still giving Tosan, Toki, and Toto their own rows.
Route ToTransaction Count
F43 Toshi 100
F43 Tosan 200
F43 Toki 75
F43 Toto 125
Instead of
Route ToTransaction Count
F43 Toshi-A 35
F43 Toshi-B 25
F43 Toshi-C 22
F43 Toshi-D 18
F43 Tosan 200
F43 Toki 75
F43 Toto 125
SELECT Route, ToTransaction, count(TrackID) as 'Count' from TestDB
group by Route, ToTransaction
Try this one:
SELECT IF(SUBSTR(ToTransaction,1,5)="Toshi","Toshi",ToTransaction) as "Trans",
COUNT(TrackID) as "Count" from TestDB
GROUP BY Trans;
If you always use "-" as the separator and the number of characters before the "-" varies, you can use substring_index:
For MySQL:
SELECT substring_index(ToTransaction,'-',1) as "Trans",
COUNT(TrackID) as "Count" from TestDB
GROUP BY 1;
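For MSSQL (untested, but it should work):
SELECT CASE WHEN CHARINDEX('-',ToTransaction) > 1
THEN LEFT(ToTransaction,CHARINDEX('-',ToTransaction)-1)
ELSE ToTransaction
END as "Trans",
COUNT(TrackID) as "Count" from TestDB
-- SQL Server does not accept a column alias in GROUP BY, so the expression is repeated
GROUP BY CASE WHEN CHARINDEX('-',ToTransaction) > 1
THEN LEFT(ToTransaction,CHARINDEX('-',ToTransaction)-1)
ELSE ToTransaction
END;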
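To check the prefix-grouping logic against the counts in the question, here is a small sketch that runs an equivalent query in Python's built-in sqlite3 module. SQLite is used only to make the example runnable, and its instr/substr functions stand in for CHARINDEX/LEFT. Route is added to the SELECT and GROUP BY because the desired output in the question includes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestDB (TrackID INTEGER PRIMARY KEY, Route TEXT, ToTransaction TEXT)"
)

# Rebuild the per-value counts from the question as individual rows.
counts = {"Toshi-A": 35, "Toshi-B": 25, "Toshi-C": 22, "Toshi-D": 18,
          "Tosan": 200, "Toki": 75, "Toto": 125}
rows = [("F43", name) for name, n in counts.items() for _ in range(n)]
conn.executemany("INSERT INTO TestDB (Route, ToTransaction) VALUES (?, ?)", rows)

GROUPED_SQL = """
SELECT Route,
       CASE WHEN instr(ToTransaction, '-') > 1
            THEN substr(ToTransaction, 1, instr(ToTransaction, '-') - 1)
            ELSE ToTransaction
       END AS Trans,
       COUNT(TrackID) AS Cnt
FROM TestDB
GROUP BY Route,
         CASE WHEN instr(ToTransaction, '-') > 1
              THEN substr(ToTransaction, 1, instr(ToTransaction, '-') - 1)
              ELSE ToTransaction
         END
"""

# Map (Route, Trans) to its count.
grouped = {(route, trans): cnt for route, trans, cnt in conn.execute(GROUPED_SQL)}
```

The four Toshi variants collapse into a single Toshi row of 100, while Tosan, Toki and Toto keep their own rows.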

Grouping results to get unique rows after multiple joins

Disclaimer: I don't have full control over the db schema, so please don't judge the data structure or the naming conventions :)
I am doing this large query with multiple joins:
SELECT TOP 30
iss.iss_lKey as IssueId,
iss.iss_sName as IssueName,
con.con_lKey as ContainerId,
con.con_sName as ContainerName,
sto.sto_lKey as StoryId,
sto.sto_sName as StoryName,
sto.sto_Guid as StoryGuid,
sto.sto_sByline as Byline,
sto.sto_created_dWhen as StoryCreatedDate,
sto.sto_deadline_dWhen as StoryDeadline,
sto.sto_lType as StoryType,
sto.sto_sct_lKey as StoryCategory,
sto.sto_created_use_lKey as CreatedBy,
sfv.sfv_tValue as FieldValue,
sf.sfe_lKey as StoryFieldId,
sf.sfe_sCaption as StoryFieldCaption,
sre.sre_lIndex as RevisionIndex
FROM tStory30 sto
JOIN tContainer30 con ON sto.sto_con_lKey = con.con_lKey
JOIN tIssue30 iss ON con.con_iss_lKey = iss.iss_lKey
LEFT OUTER JOIN tStoryRevision30 sre ON sre.sre_sto_lKey = sto.sto_lKey
LEFT OUTER JOIN tStoryField30 sf ON sre.sre_lKey = sf.sfe_sre_lKey
LEFT OUTER JOIN tStoryFieldValue30 sfv ON sfv.sfv_sfe_lKey= sf.sfe_lKey
WHERE sre.sre_lIndex = 0
AND (sto.sto_sName LIKE '%' + #0 + '%'
OR sfv.sfv_tValue LIKE '%' + #0 + '%')";
What I really need is only one row per StoryId, including the FieldValue that matched, if there was any. I am currently grouping in code to produce the output, but that prevents me from paging the results.
from r in items
group r by new { r.StoryId, r.ContainerId, r.IssueId }
into storyGroup
select new {
storyGroup.Key.StoryId,
storyGroup.Key.ContainerId,
storyGroup.Key.IssueId,
Hits = storyGroup.ToList()
}
Is there any way to achieve this kind of grouping in SQL, so that I could then page the result properly (using ROW_NUMBER() OVER)?
Also, I am aware that this is bad practice and that I should use full-text search. We plan to set up a Solr instance or use the full-text options in SQL Server. This is a first attempt to get something going.
EDIT
Trying to explain in words what I am trying to achieve:
For context, our app is a CMS for magazine editors/publishers.
For a given magazine, they have many issues.
Each issue has many containers (a sort of logical article group).
In each container you have several stories.
A story can have 0 or many revisions.
The fields of a story are stored by revision (many fields per revision),
and a field has a field value.
I need to retrieve the stories that have a given text in the name or in a field value of the first revision (that's the WHERE revisionIndex = 0).
But I also need to retrieve associated data for each story (issue ID and name, container ID and name, and so on).
The difficult part is probably retrieving one of the field values that matched the search. I don't need all of them, just one.
Hope this helps!
EDIT Sample data searching for "test". I simplified the columns to make it easier to understand.
Row | IssueId | IssueName | ContainerId | StoryId | FieldValue
1 | 11 | IssueName A | 394 | 868 | Test Marsupilami bla bla youpi
2 | 40 | IssueName B | 6 | 631 | story save test
3 | 40 | IssueName B | 6 | 666 | test story
4 | 4 | IssueName c | 30 | 846 | test abs
5 | 4 | IssueName c | 30 | 846 | absc test
6 | 4 | IssueName c | 30 | 846 | hello test
I am able to get the row number in SQL Server with my query, but as you can see, I get the same story multiple times. In this case, I would simply like the following result:
Row | IssueId | IssueName | ContainerId | StoryId | FieldValue
1 | 11 | IssueName A | 394 | 868 | Test Marsupilami bla bla youpi
2 | 40 | IssueName B | 6 | 631 | story save test
3 | 4 | IssueName c | 30 | 846 | test abs
If a story has "test" in the story name, then I am OK with a null value in the FieldValue column. Which field value is selected doesn't matter much.
This is a digression but are you aware that you have converted a left join to an inner join?
LEFT OUTER JOIN tStoryRevision30 sre ON sre.sre_sto_lKey = sto.sto_lKey
LEFT OUTER JOIN tStoryField30 sf ON sre.sre_lKey = sf.sfe_sre_lKey
LEFT OUTER JOIN tStoryFieldValue30 sfv ON sfv.sfv_sfe_lKey= sf.sfe_lKey
WHERE sre.sre_lIndex = 0
Try this instead:
LEFT OUTER JOIN tStoryRevision30 sre ON sre.sre_sto_lKey = sto.sto_lKey
AND sre.sre_lIndex = 0
LEFT OUTER JOIN tStoryField30 sf ON sre.sre_lKey = sf.sfe_sre_lKey
LEFT OUTER JOIN tStoryFieldValue30 sfv ON sfv.sfv_sfe_lKey= sf.sfe_lKey
(I would have done this in a comment, but it is easier to see the code change here.)
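The effect is easy to reproduce on a toy schema. The sketch below uses Python's built-in sqlite3 module with two hypothetical tables (story and revision, not the real tStory30/tStoryRevision30) to show that filtering the outer-joined table in WHERE discards the unmatched rows, while the same filter in the ON clause keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for the story and revision tables.
CREATE TABLE story (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE revision (story_id INTEGER, idx INTEGER);
INSERT INTO story VALUES (1, 'has revision 0'), (2, 'no revisions'), (3, 'only revision 1');
INSERT INTO revision VALUES (1, 0), (3, 1);
""")

# Filter in WHERE: rows where r.idx is NULL fail the test, so the join acts like an inner join.
filter_in_where = conn.execute("""
    SELECT s.id, r.idx
    FROM story s
    LEFT JOIN revision r ON r.story_id = s.id
    WHERE r.idx = 0
    ORDER BY s.id
""").fetchall()

# Filter in ON: every story is kept; stories without a revision 0 get NULL for r.idx.
filter_in_on = conn.execute("""
    SELECT s.id, r.idx
    FROM story s
    LEFT JOIN revision r ON r.story_id = s.id AND r.idx = 0
    ORDER BY s.id
""").fetchall()
```

Here filter_in_where contains only story 1, whereas filter_in_on contains all three stories, with None in place of the revision index for stories 2 and 3.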
