Group column value depending on another column value - sql-server

I have two columns extracted from our database, and I would like to group one of them (Norms) depending on the value in the other (Directive).
The norms are connected to certain directives, but this relation exists only in an Excel file, so I am trying to build a view that implements it.
The data looks like this:
Directive | Norms
CE, 650, No Flicker | CEI EN 60598-2-22
CE, 650, No Flicker | CEI EN 60598-2-2
CE, 650, No Flicker | CEI EN 60598-1
CE, 650, No Flicker | EN 62471
CE, 650, No Flicker | 2009/125/CE
CE, 650, No Flicker | 874/2012/CE
CE, 650, No Flicker | 2014/30/EU
CE, 650, No Flicker | 2014/35/EU
CE, 650, No Flicker | ISO 15714
CE, 650, No Flicker | EN 60335-2-65
and I would like to generate something like this:
Directive | Norms
CE | CEI EN 60598-2-22, CEI EN 60598-2-2, CEI EN 60598-1
650 | 2009/125/CE, 874/2012/CE, 2014/30/EU, 2014/35/EU
No Flicker | EN 62471, ISO 15714, EN 60335-2-65
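The desired grouping can be sketched in Python. This is only an illustration: `NORM_TO_DIRECTIVE` is a hypothetical stand-in for the relation kept in the Excel file, with contents read off the desired output above, and `group_norms` is a name made up for this example. In SQL Server the same idea would presumably be a join against a mapping table followed by a string aggregation such as `STRING_AGG` (available from SQL Server 2017).

```python
from collections import defaultdict

# Hypothetical mapping (norm -> directive); in practice this would be
# loaded from the Excel file that holds the relation.
NORM_TO_DIRECTIVE = {
    "CEI EN 60598-2-22": "CE",
    "CEI EN 60598-2-2": "CE",
    "CEI EN 60598-1": "CE",
    "2009/125/CE": "650",
    "874/2012/CE": "650",
    "2014/30/EU": "650",
    "2014/35/EU": "650",
    "EN 62471": "No Flicker",
    "ISO 15714": "No Flicker",
    "EN 60335-2-65": "No Flicker",
}


def group_norms(norms, mapping):
    """Group norms under their directive, keeping the input order."""
    grouped = defaultdict(list)
    for norm in norms:
        # Norms missing from the mapping are collected under "Unknown".
        grouped[mapping.get(norm, "Unknown")].append(norm)
    return {directive: ", ".join(items) for directive, items in grouped.items()}


norms = [
    "CEI EN 60598-2-22", "CEI EN 60598-2-2", "CEI EN 60598-1", "EN 62471",
    "2009/125/CE", "874/2012/CE", "2014/30/EU", "2014/35/EU",
    "ISO 15714", "EN 60335-2-65",
]
result = group_norms(norms, NORM_TO_DIRECTIVE)
for directive, joined in result.items():
    print(f"{directive} | {joined}")
```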

Related

Efficient data retention policy other than time in timescaledb

I have a hypertable which looks like this:
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
state | text | | | | extended | | |
device | text | | | | extended | | |
time | bigint | | not null | | plain | | |
Indexes:
"device_state_time" btree ("time")
Triggers:
ts_insert_blocker BEFORE INSERT ON "device_state" FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_4_2_chunk
Access method: heap
I have 100k devices, each sending its state at different intervals. For example, device1 sends its state every second, device2 every day, device3 every 5 days, and so on. I MUST keep at least the 10 latest states for each device, so I can't really use the default data retention policy provided by Timescale.
Is there any way to achieve this efficiently other than manually selecting the latest 10 entries for each device and deleting the rest?
Thanks!
That sounds like a corner case, because chunks are time-based. Can you categorize these devices in advance?
If you still want to use retention policies, maybe you can insert data into different hypertables based on the insert interval.
For example, Promscale uses one table per metric, which lets users redefine the retention policy for every metric.
Whether this works depends on how you read the data later; fragmenting it across several hypertables may make reading harder.
Also, consider experimenting with the optional arguments of hypertable creation; maybe you can get something out of partitioning_func and time_partitioning_func.
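As a baseline for comparison, the manual approach mentioned in the question (keep the latest 10 states per device, delete the rest) can be sketched in plain Python. The function name `rows_to_delete` and the `cutoff` parameter are assumptions made for this illustration; in the database itself the same selection would presumably be expressed with a window function such as `ROW_NUMBER() OVER (PARTITION BY device ORDER BY time DESC)`.

```python
from collections import defaultdict


def rows_to_delete(rows, cutoff, keep=10):
    """Return rows older than `cutoff` that are not among the `keep`
    most recent rows of their device.

    rows: iterable of (device, time, state) tuples, time as an integer.
    """
    by_device = defaultdict(list)
    for row in rows:
        by_device[row[0]].append(row)

    deletable = []
    for device_rows in by_device.values():
        # Newest first, so the first `keep` rows are always protected.
        device_rows.sort(key=lambda r: r[1], reverse=True)
        deletable.extend(r for r in device_rows[keep:] if r[1] < cutoff)
    return deletable


# device1 reports often (15 rows), device2 rarely (3 rows).
sample = [("device1", t, "on") for t in range(1, 16)]
sample += [("device2", t, "off") for t in range(1, 4)]
old_rows = rows_to_delete(sample, cutoff=100)
print(sorted(r[1] for r in old_rows))
```

A device with fewer than `keep` rows never loses anything, no matter how old its rows are, which is the property a purely time-based retention policy cannot guarantee.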

Combining fields in Google Data Studio

I have a CSV file of the form (unimportant columns hidden)
player,game1,game2,game3,game4,game5,game6,game7,game8
Example data:
Alice,0,-10,-30,-60,-30,-50,-10,30
Bob,10,20,30,40,50,60,70,80
Charlie,20,0,20,0,20,0,20,0
Derek,1,2,3,4,5,6,7,8
Emily,-40,-30,-20,-10,10,20,30,40
Francine,1,4,9,16,25,36,49,64
Gina,0,0,0,0,0,0,0,0
Hank,-50,50,-50,50,-50,50,-50,50
Irene,-20,-20,-20,50,50,-20,-20,-20
I am looking for a way to make a Data Studio view where I can see a chart of all the results of a certain player. How would I make a custom field that combines the data from game1 to game8 so I can make a chart of it?
| Name | Scores |
|----------|---------------------------------|
| Alice | [0,-10,-30,-60,-30,-50,-10,30] |
| Bob | [10,20,30,40,50,60,70,80] |
| Charlie | [20,0,20,0,20,0,20,0] |
| Derek | [1,2,3,4,5,6,7,8] |
| Emily | [-40,-30,-20,-10,10,20,30,40] |
| Francine | [1,4,9,16,25,36,49,64] |
| Gina | [0,0,0,0,0,0,0,0] |
| Hank | [-50,50,-50,50,-50,50,-50,50] |
| Irene | [-20,-20,-20,50,50,-20,-20,-20] |
The goal of the resulting chart would be something like this, where game1 is the first point and so on.
If this is not possible, how would I best represent the data so what I am looking for can work in Data Studio? I currently have it implemented in a Google Sheet, but the issue is there's no way to make views, so when someone selects a row it changes for everyone viewing it.
If you have two game files as data sources, I guess you want to combine them by name, right?
You can do that with the data blending option; I think it is under Resource > Manage blends.
There you can create a blended data source that joins the files on the name.
You can also add both score fields, with different labels.
Here is some documentation about it: https://support.google.com/datastudio/answer/9061420?hl=en
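On the question of how best to represent the data: one alternative, which is not what the answer above describes, is to reshape the CSV from one column per game into one row per player and game. A long table like that can be charted with the game number as the dimension and the score as the metric. The sketch below assumes the column names from the question; `wide_to_long` is a hypothetical helper, and only two of the players are included for brevity.

```python
import csv
import io

WIDE_CSV = """player,game1,game2,game3,game4,game5,game6,game7,game8
Alice,0,-10,-30,-60,-30,-50,-10,30
Bob,10,20,30,40,50,60,70,80
"""


def wide_to_long(wide_text):
    """Turn one row per player into one row per (player, game)."""
    reader = csv.DictReader(io.StringIO(wide_text))
    long_rows = []
    for record in reader:
        for column, value in record.items():
            # Only the gameN columns carry scores; other columns are skipped.
            if column.startswith("game"):
                long_rows.append({
                    "player": record["player"],
                    "game": int(column[len("game"):]),
                    "score": int(value),
                })
    return long_rows


long_rows = wide_to_long(WIDE_CSV)
for row in long_rows[:3]:
    print(row)
```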

Matching and labeling

Good morning, all. I'm trying to write some of my first scripts and I'm having a difficult time. I'm trying to match data from one file to another and add a label to the matching row in the original.
I'm using two different data sources to accomplish this, and there are tens of thousands of rows to match. I'm trying to take a column of zip codes in data source one, match it to the same zip codes in data source two, and add a new column to data source one labeling the location. See the example below.
Data Source One:
| A     | B |
| 13329 | X |
| 22193 | X |
| 13211 | X |
Data Source Two:
| A     | B          |
| 13211 | Syracuse   |
| 22193 | D.C. Metro |
| 13329 | Utica Rome |
New Data Source One:
| A     | B | C          |
| 13329 | X | Utica-Rome |
| 22193 | X | D.C. Metro |
| 13211 | X | Syracuse   |
New Data Source One is the desired end state. Some rows will have no matching label; those can be labeled N/A or NA (either is fine). I hope I have explained the problem and the desired result well enough. Please help.
This operation is more commonly called a join than "matching and labeling".
join <(sort DS1) <(sort DS2)
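Note that `join` without further options only prints rows that have a match in both files, so unmatched zip codes would be dropped rather than labeled N/A. A minimal Python sketch of the same join that keeps every row of data source one is shown below; `label_rows` and the sample zip code 99999 are made up for this illustration.

```python
def label_rows(source_one, source_two, missing="N/A"):
    """Left join: keep every row of source_one and append the label
    found in source_two, or `missing` when the zip code has no match."""
    lookup = {zip_code: label for zip_code, label in source_two}
    return [
        (zip_code, value, lookup.get(zip_code, missing))
        for zip_code, value in source_one
    ]


source_one = [("13329", "X"), ("22193", "X"), ("13211", "X"), ("99999", "X")]
source_two = [
    ("13211", "Syracuse"),
    ("22193", "D.C. Metro"),
    ("13329", "Utica Rome"),
]
labeled = label_rows(source_one, source_two)
for row in labeled:
    print(" | ".join(row))
```

Building the lookup dictionary once keeps the work roughly linear in the number of rows, which matters with tens of thousands of zip codes.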

CARTO - Performing cluster analysis on specific selection

Is there a way to perform cluster analysis on a selection of a layer in CARTO? For example, if I had data points throughout the U.S. and wanted to find the clusters of points in San Francisco, could I feasibly do (pseudo-SQL ahead):
SELECT ST_ClusterWithin(geom) FROM table
WHERE city = 'San Francisco'
Or am I better off just splitting layers by city and then performing analysis on each layer in CARTO? I realize this option may not be ideal for ease of updating data across the layers. Any help is appreciated, thank you.
You can use the Filter by column value analysis to extract a selection of your table and then perform the cluster analysis on that analysis node. You can even drag out the original dataset source to create another layer and perform your analysis again on a different selection.
                         +---------+
                         | dataset |
            +------------+---------+----------+
            |                                 |
+-----------v------------+      +-------------v--------+
| name = "San Francisco" |      |  name = "New York"   |
+-----------+------------+      +-------------+--------+
            |                                 |
            |                                 |
 +----------v---------+           +-----------v-------+
 |  cluster analysis  |           | cluster analysis  |
 +--------------------+           +-------------------+
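The filter-then-cluster flow in the diagram can be illustrated outside CARTO with a small Python sketch. It is not CARTO's implementation: `cluster_within` is a hypothetical helper that links points lying within a given planar distance of each other, which is the general idea behind a distance-based clustering such as `ST_ClusterWithin`, and the coordinates are invented sample data.

```python
from math import hypot


def cluster_within(points, distance):
    """Group points so that each one lies within `distance` of at
    least one other member of its cluster (single linkage)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if hypot(x1 - x2, y1 - y2) <= distance:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


rows = [
    {"city": "San Francisco", "xy": (0.0, 0.0)},
    {"city": "San Francisco", "xy": (0.5, 0.0)},
    {"city": "San Francisco", "xy": (10.0, 10.0)},
    {"city": "New York", "xy": (0.2, 0.1)},
]
# Step 1: filter by column value. Step 2: cluster only the selection.
sf_points = [row["xy"] for row in rows if row["city"] == "San Francisco"]
sf_clusters = cluster_within(sf_points, distance=1.0)
print(sf_clusters)
```

Because the filter runs first, the New York point cannot join a San Francisco cluster even though its sample coordinates are close to one.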

Limit X-Axis AFTER applying grouping/series values

I have a dataset that lists article names along with the user who viewed each one.
eg.
Article | User
Article1 | ABT
Article1 | ABT
Article2 | ABT
Article1 | MLH
Article2 | MLH
Article2 | MLH
and I have a dataset that aggregates this into counts, so that the data looks like this:
Article | User | Count
Article1 | ABT | 2
Article2 | ABT | 1
Article1 | MLH | 1
Article2 | MLH | 2
So you can see, I'm just counting the views for each article grouped by the user.
I want to present this in a stacked bar chart, so that the Article is the x-axis and the user is the series, so I can see the popularity of a given article and also see the popularity by user for that article.
This works fine, and I have this already, but I want to restrict the number of articles displayed. I will end up having over 100 articles to display, so I'd like to restrict the chart to the top 10-20 articles, but in the same stacked format. I can't just "TOP N" the dataset, as that could lose series data for a given article.
eg.
Article | User | Count
Article1 | ABT | 100
Article2 | ABT | 98
Article1 | MLH | 10
Article2 | MLH | 2
Putting a "TOP 2" on this would lose series data for the MLH visits to each article.
Is there a way to restrict the X-axis after the data has been prepared for rendering in the chart? Or is there another solution I've completely missed?
Add a filter to the Category Group (Article):
Expression: =Count(Fields!ArticleName.Value)
Operator: TOP N
Value: 10
This returns the top 10 articles for that category after all grouping is applied.
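The effect of filtering at the category-group level, rather than on the dataset rows, can be sketched in Python. This is an illustration only, not how the report engine works internally: `top_n_articles` is a hypothetical helper, and it ranks articles by their summed counts, which corresponds to counting the raw view rows per article.

```python
from collections import Counter


def top_n_articles(rows, n):
    """Keep all series rows of the n articles with the highest totals."""
    totals = Counter()
    for article, _user, count in rows:
        totals[article] += count
    keep = {article for article, _total in totals.most_common(n)}
    # Filtering by article, not by row, preserves every user's series.
    return [row for row in rows if row[0] in keep]


rows = [
    ("Article1", "ABT", 100),
    ("Article2", "ABT", 98),
    ("Article1", "MLH", 10),
    ("Article2", "MLH", 2),
    ("Article3", "ABT", 5),
    ("Article3", "MLH", 1),
]
top_rows = top_n_articles(rows, 2)
for row in top_rows:
    print(row)
```

Here the MLH rows for Article1 and Article2 survive, whereas a row-level "TOP 2" on the dataset would have kept only the two ABT rows.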
