Multiplying a singular part within a string based on an integer - sql-server

I'm currently building a recursive CTE to rebuild the folder/file structure within a database.
i.e., ID, ParentID, Folder/File Name, Hierarchy level.
The recursive UNION ALL joins on ParentId = ID and builds up a string concatenation of the Folder/File names.
I added a line break in between, as that is the end result my client desires.
Now they'd also like some tabs to denote the depth of the folder/file (which I would add with CHAR(9)).
I just can't figure out how to increase the number of tabs based on the hierarchy level.
So:
ParentID | ID | Name | Hierarchy Level | Path in # | PathName
1 | 1 | Main | 0 | 0 | Main
1 | 2 | Alpha | 1 | 1.2 | Main -Linebreak- -Tab- Alpha
1 | 3 | Beta | 1 | 1.3 | Main -Linebreak- -Tab- Beta
2 | 4 | Gamma | 2 | 1.2.4 | Main -Linebreak- -Tab- Alpha -Linebreak- -Tab- -Tab- Gamma
3 | 5 | Delta | 2 | 1.3.5 | Main -Linebreak- -Tab- Beta -Linebreak- -Tab- -Tab- Delta
3 | 6 | Epsilon | 2 | 1.3.6 | Main -Linebreak- -Tab- Beta -Linebreak- -Tab- -Tab- Epsilon
5 | 7 | Zeta | 3 | 1.3.5.7 | Main -Linebreak- -Tab- Beta -Linebreak- -Tab- -Tab- Delta -Linebreak- -Tab- -Tab- -Tab- Zeta
I can't seem to figure out how to make the number of tabs equal to the hierarchy level.
Could someone help me with this?
Cheers,
Casper
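
In T-SQL, REPLICATE(CHAR(9), n) is the usual way to repeat a tab n times, so the recursive member can prepend one tab per level. A minimal sketch only, assuming a hypothetical Folders table with Id, ParentId and Name columns (adjust the names and the root condition to the real schema):

WITH Tree AS
(
    SELECT  Id, ParentId, Name,
            0 AS HierarchyLevel,
            CAST(Name AS NVARCHAR(MAX)) AS PathName
    FROM    Folders
    WHERE   ParentId IS NULL              -- root row(s); change to e.g. ParentId = Id if the root references itself

    UNION ALL

    SELECT  f.Id, f.ParentId, f.Name,
            t.HierarchyLevel + 1,
            t.PathName
              + CHAR(13) + CHAR(10)                       -- line break
              + REPLICATE(CHAR(9), t.HierarchyLevel + 1)  -- one tab per hierarchy level
              + f.Name
    FROM    Folders f
    INNER JOIN Tree t ON f.ParentId = t.Id
                     AND f.Id <> f.ParentId               -- guard against a self-referencing root
)
SELECT Id, ParentId, Name, HierarchyLevel, PathName
FROM   Tree;

The key piece is REPLICATE(CHAR(9), t.HierarchyLevel + 1): the tab count grows with the level, which matches the one-tab/two-tab/three-tab pattern in the table above.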

Related

I want to get all projects that were active during some period, in Google Data Studio (GDS / Looker Studio)

I have a list of projects:
id | startDate | endDate | Name | Status
1 | 09/23/2022 | 10/23/2022 | Project 1 | Close
2 | 09/23/2022 | | Project 2 | Active
3 | 09/23/2022 | 01/24/2023 | Project 3 | Close
4 | 01/24/2023 | 01/27/2023 | Project 4 | Close
5 | 01/24/2023 | | Project 5 | Active
6 | 01/30/2023 | | Project 6 | Active
and I want to get all projects that were active during the previous week.
For example, with a date_range control of 01/23/2023 - 01/29/2023, the correct result would be: Project 2, Project 3, Project 4, Project 5.
How can I implement this in Google Data Studio (Looker Studio)?
Thank you in advance.
I can't see how to do it in GDS, since the date range control is tied to a single date field per row.
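
In SQL terms this is an interval-overlap test: a project was active during the week if it started on or before the week's end and had not ended before the week's start. Purely as a sketch, with assumed table and column names matching the list above (the real implementation depends on the Looker Studio data source):

-- Sketch only: projects active at any point between 01/23/2023 and 01/29/2023.
SELECT Name
FROM   projects
WHERE  startDate <= '2023-01-29'                       -- started on or before the week's end
  AND  (endDate IS NULL OR endDate >= '2023-01-23');   -- not ended before the week's start

Against the sample data this returns Projects 2, 3, 4 and 5, matching the expected result. In Looker Studio the same condition would have to be expressed as a calculated field or pushed into the data source query, since the date range control filters on a single date dimension.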

How should I format/set up my dataset/dataframe? And factor -> numeric problems

New to R and new to this forum; I tried searching, and I hope I don't embarrass myself by failing to spot previous answers.
I have my data, and I intend to fit some kind of GLMMs in the end, but that's far away in the future; first I'm going to run some simple GLMs/LMs to learn what I'm doing.
First, about my data:
I have data sampled from 2 "general areas" on opposite sides of the country.
In these general areas there are roughly 50 trakts placed (in a grid, with a random starting point).
Trakts have been revisited each year for a duration of 4 years.
A trakt contains 16 sample plots; I intend to work at the trakt level, so I use the means of the 16 sample plots for each trakt.
2 x 4 x 50 = 400 rows (the actual number is 373 rows after removing trakts where not enough plots could be sampled due to terrain etc.).
The data in my Excel file is currently laid out like this:
rows = trakts
columns = the measured variables
I have 8-10 columns I want to use.
A short example of how the data looks now:
V1 - predictors, 4 different columns
V2 - response variable = proportional data, 1-4 columns depending on which hypothesis I end up testing
The GLMM in the end would look something like (V2 ~ V1 + V1 + V1, (area, year)).
Area Year Trakt V1 V2
A 2015 1 25.165651 0
A 2015 2 11.16894652 0.1
A 2015 3 18.231 0.16
A 2014 1 3.1222 N/A
A 2014 2 6.1651 0.98
A 2014 3 8.651 1
A 2013 1 6.16416 0.16
B 2015 1 9.12312 0.44
B 2015 2 22.2131 0.17
B 2015 3 12.213 0.76
B 2014 1 1.123132 0.66
B 2014 2 0.000 0.44
B 2014 3 5.213265 0.33
B 2013 1 2.1236 0.268
How should I get started on this?
8 different files?
Nested by trakt (do I start nesting now, or later when I'm doing the GLMMs)?
I load my data into R through the read.table function.
If I run sapply(dataframe, class), V1 and V2 are factors and everything else is integer; if I run sapply(dataframe, mode), everything is numeric.
So, finally, to my actual problems: I have been trying to do normality tests (only tried Shapiro-Wilk so far) but I keep getting errors that imply my data is not numeric.
Also, when I run a normality test, do I only run one column and evaluate it before moving on to the next column, or should I run several columns? The entire dataset?
Should I in my case run independent normality tests for each of my areas and years?
I hope this didn't end up too cluttered.
Best regards

NEXT Function, or test if following Row Group is hidden

I'm using ReportBuilder 2.0 / SQL Server 2008.
I have a report that uses visibility settings on the row groups which results in some row group headings being hidden, which in turn makes report totals seem incorrect. I can't change the visibility settings (for business reasons); what I'm looking for is a way to test EITHER for hidden items, OR for apparently incorrect totals. Consider the following dataset:
ItemCode | SubPhaseCode | SubPhase | BidItem | XTDPrice
1 | 1 | Water | Utility 1 | 5000
2 | 1 | Water | Utility 2 | 4000
3 | 2 | Electrical | Utility 3 | 75000
4 | 2 | Electrical | Utility 3 | 75000
5 | 2 | Electrical | Utility 3 | 100000
6 | 2 | Electrical | Utility 4 | 2500
7 | 2 | Electrical | Utility 4 | 2500
8 | 2 | Electrical | Utility 4 | 5064
9 | 2 | Electrical | Utility 5 | 3000
10 | 2 | Electrical | Utility 5 | 3000
11 | 2 | Electrical | Utility 5 | 5796
12 | 3 | Gas | Utility 6 | 60000
13 | 3 | Gas | Utility 6 | 60000
14 | 3 | Gas | Utility 6 | 61547
15 | 4 | Other | Utility 7 | 6000
16 | 4 | Other | Utility 7 | 7000
There are 3 Row Groups on the report, one for SubPhaseCode ("Group1"), and two for BidItem("Group2" and "DetailsGroup"):
Link to Design View Screenshot
The Row Visibility property for Group1 (SubPhaseCode) is:
=IIF(Fields!SubPhaseCode.Value = 3, true, false)
This results in the heading for the SubPhase "Gas" being hidden. This means that, when the report is run, I get something like the following:
Total 475407
Water 9000
-Utility 1 5000
-Utility 2 4000
Electrical 271860
-Utility 3 250000
-Utility 4 10064
-Utility 5 11796
-Utility 6 181547
Other 13000
-Utility 7 13000
The fact that SubPhase 3 ("Gas") is hidden results in 2 apparent errors:
1) The sum for "Electrical" (271860) appears incorrect for the 4 items below it (because there should be another row heading above "Utility 6")
2) The total of 475407 appears incorrect for the 3 groups below it (9000 + 271860 + 13000).
What I am looking for is a way to change the formatting of the headings (especially the Group Headings) if the numbers below them apparently don't add up. I understand how to implement conditional formatting and have done this for the Total. I am unclear how this could be implemented for the Row Group.
I would basically need some kind of a test, for each Row Heading, to see if the following heading would be hidden, according to the rules. This sounds to me like a "NEXT" function, which I know doesn't exist.
Other searches have indicated that I might need to add the desired data to the dataset or modify the underlying SP. Just wondering if there are any simpler solutions.
Thanks much for the help!
I'd avoid summing the hidden SubPhase's values in the SUM() for the group/total.
Try:
=SUM(IIF(Fields!SubPhaseCode.Value=3,0,Fields!XTDPrice.Value))
Let me know if this helps.
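The question also mentions adding the needed data to the dataset or modifying the underlying SP; if you go that route, one option is to have the query itself flag the rows whose SubPhase heading the report will hide, so that totals and conditional formatting can key off the flag instead of hard-coding SubPhaseCode = 3 in several expressions. A sketch only, with an assumed source table name:

-- Sketch only (hypothetical table name): expose a flag for rows whose
-- SubPhase heading the report hides, for use in totals/formatting expressions.
SELECT  ItemCode,
        SubPhaseCode,
        SubPhase,
        BidItem,
        XTDPrice,
        CASE WHEN SubPhaseCode = 3 THEN 1 ELSE 0 END AS IsHiddenSubPhase
FROM    dbo.BidItems;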

SSIS: Inserting records into a table in the same order as in the flat file

I have a flat file that looks like the first set below, and a table with an auto-incrementing primary key field. Using SSIS, how can I guarantee that when I import the data it keeps the record order specified in the flat file? I'm assuming that when SSIS reads the file it will keep that order as it inserts into the database. Is this true?
In File:
RecordType | Amount
5 1.00
6 2.00
6 3.00
5 .5
6 1.5
7 .8
5 .5
In a Database Table
ID | RecordType | Amount
1 5 1.00
2 6 2.00
3 6 3.00
4 5 .5
5 6 1.5
6 7 .8
7 5 .5
Just to be safe, I'd add a Sort Transformation to your SSIS package; you can choose which column to sort on and how it's sorted. This should ensure it reads the data in the order you want.
The order doesn't matter in a table. It only matters in a query.
In my experience it will always load in the order of the input file if you are using an autoincrement ID that is also the clustered index.
Here is a similar discussion that has a couple ideas. Particularly preprocessing the file or using a script component as the source. You may want to take one of those routes because the fact that it may behave the way you want by default does not mean it always will.
http://www.sqlservercentral.com/Forums/Topic1300952-364-1.aspx
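Whichever route you take, remember that the only defined order when reading the data back out is the one you ask for; for example (table name is hypothetical):

-- Row order is a property of the query, not the table: always state it explicitly.
SELECT ID, RecordType, Amount
FROM   dbo.ImportedRecords
ORDER  BY ID;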

db optimization: computing rank

This question asks how to select a user's rank by his id.
id name points
1 john 4635
3 tom 7364
4 bob 234
6 harry 9857
The accepted answer is
SELECT uo.*,
(
SELECT COUNT(*)
FROM users ui
WHERE (ui.points, ui.id) >= (uo.points, uo.id)
) AS rank
FROM users uo
WHERE id = #id
which makes sense. I'd like to understand what the performance tradeoffs would be between this approach, modifying the db structure to store a calculated rank (I guess that would require massive changes every time there's a rank change), or any other approaches that I'm too much of a newb to think of. I'm a db noob.
The performance tradeoff would basically be what you described:
If you modified the structure to store a rank, queries would be very, very simple and fast. However, this would require some overhead any time "points" changed, as you'd have to verify that the rank hasn't changed. If the ranking had changed, you'd have to do multiple updates.
This causes more work (with the potential for bugs) at every update/insert. The tradeoff is very fast reads. If your typical usage is very few modifications compared to millions of reads, AND you have found this query to be a bottleneck, it might be worth considering reworking this. However, I would avoid the added complexity and the maintainability headaches unless you have truly found this to be a problem, since the current solution requires less storage and is very flexible.
The link you reference is a MySQL question. If the original database had been Oracle, the accepted answer would be to use an analytic function, which scales very nicely:
SQL> select id, name, points from users order by id
2 /
ID NAME POINTS
---------- ---------- ----------
1 john 4635
3 tom 7364
4 bob 234
6 harry 9857
8 algernon 1
9 sebastian 234
10 charles 888
7 rows selected.
SQL> select name, id, points, rank() over (order by points)
2 from users
3 /
NAME ID POINTS RANK()OVER(ORDERBYPOINTS)
---------- ---------- ---------- -------------------------
algernon 8 1 1
bob 4 234 2
sebastian 9 234 2
charles 10 888 4
john 1 4635 5
tom 3 7364 6
harry 6 9857 7
7 rows selected.
SQL> select name, id, points, dense_rank() over (order by points desc)
2 from users
3 /
NAME ID POINTS DENSE_RANK()OVER(ORDERBYPOINTSDESC)
---------- ---------- ---------- -----------------------------------
harry 6 9857 1
tom 3 7364 2
john 1 4635 3
charles 10 888 4
bob 4 234 5
sebastian 9 234 5
algernon 8 1 6
7 rows selected.
SQL>
Doesn't the WHERE portion of that query internally require reading the entire table? I understand about premature optimization; academically, though, it seems this wouldn't scale beyond a few thousand rows.
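On the full-scan concern: it depends on indexing. A composite index covering the two columns used in the comparison would at least let the correlated COUNT(*) be answered from the index rather than from the table rows, though how well the optimizer uses it for the row-constructor comparison varies by MySQL version. A sketch only, for the users table above:

-- Covering index for the correlated rank subquery on (points, id).
CREATE INDEX idx_users_points_id ON users (points, id);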
