Building index for specific value


I have a table that keeps inventory information for products in stores on a daily basis. It looks like this:
|------------|-----------|---------|-----------------|
| Date | ProductId | StoreId | InventoryOnHand |
|------------|-----------|---------|-----------------|
| 2017-10-11 | 348 | 121 | 2 |
| 2017-10-11 | 110 | 200 | 0 |
| 2017-10-11 | 254 | 587 | -2 |
| 2017-10-12 | 311 | 875 | 26 |
| 2017-10-12 | 954 | 364 | 15 |
| 2017-10-12 | 348 | 121 | 0 |
| 2017-10-12 | 441 | 121 | 7 |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
|------------|-----------|---------|-----------------|
Most of my queries have a condition like WHERE InventoryOnHand > 0, and I need to speed these queries up.
Therefore, I want to build an index that separates the values in the InventoryOnHand column by whether they are greater than 0 or not.
A filtered index does not solve my problem, because with a filtered index all values greater than 0 would be indexed, which increases the index size. I only need to know whether a value is greater than 0 or not.
That is, I want to build an index that only applies when the condition is InventoryOnHand > 0. Is there any way to do this in SQL Server?
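
One point that may help here: in a filtered (partial) index, the filter predicate and the key columns are separate things, so the index keys do not have to include InventoryOnHand at all. The sketch below illustrates this idea with SQLite's partial indexes through Python's sqlite3 module, which is a stand-in and not SQL Server. The index name ix_inventory_in_stock and the choice of (ProductId, StoreId) as key columns are assumptions made for illustration only.

```python
import sqlite3

# Sample rows copied from the table in the question.
ROWS = [
    ("2017-10-11", 348, 121, 2),
    ("2017-10-11", 110, 200, 0),
    ("2017-10-11", 254, 587, -2),
    ("2017-10-12", 311, 875, 26),
    ("2017-10-12", 954, 364, 15),
    ("2017-10-12", 348, 121, 0),
    ("2017-10-12", 441, 121, 7),
]


def build_db():
    """Create an in-memory table with a partial index on in-stock rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Inventory ("Date" TEXT, ProductId INTEGER,'
        " StoreId INTEGER, InventoryOnHand INTEGER)"
    )
    conn.executemany("INSERT INTO Inventory VALUES (?, ?, ?, ?)", ROWS)
    # The WHERE clause decides which rows get an index entry. The keys are
    # ProductId and StoreId (an assumed choice), not InventoryOnHand.
    conn.execute(
        "CREATE INDEX ix_inventory_in_stock"
        " ON Inventory (ProductId, StoreId)"
        " WHERE InventoryOnHand > 0"
    )
    return conn


def in_stock(conn):
    """Return (ProductId, StoreId) pairs that have stock on hand."""
    return conn.execute(
        "SELECT ProductId, StoreId FROM Inventory"
        " WHERE InventoryOnHand > 0 ORDER BY ProductId, StoreId"
    ).fetchall()
```

SQL Server's filtered indexes use the same shape (key columns followed by a WHERE predicate), so it may be worth checking whether an index keyed on the columns the queries actually return, filtered on InventoryOnHand > 0, is small enough.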

Related

What should be the minimum ratio of dead tuples for a table to be considered for VACUUM FULL in Postgres?

I am a developer looking for advice on optimisation and maintenance of a Postgres database.
I am currently investigating commands that help clean up/defragment the DB and release some space back to the filesystem, as the DB's disk usage is growing quickly. I found that "VACUUM FULL" can help release the space used by dead tuples. However, I could not find information on how many dead tuples, or what percentage, there should be before we consider running this command.
Currently we have two tables in a Nextcloud Postgres database which have dead tuples. Also, the total relation size for these tables is higher than the disk usage reported by the \dt+ command. I am providing the stats below. Please advise whether they are eligible for "VACUUM FULL" based on the given stats.
###########################################
Disk space usage per table (\dt+ command)
###########################################
Schema | Name | Type | Owner | Size | Description
--------+-----------------------------+-------+----------+------------+-------------
public | oc_activity | table | XXXXXXXX | 4796 MB |
public | oc_filecache | table | XXXXXXXX | 127 MB |
#################################
oc_activity total relation size
#################################
SELECT pg_size_pretty( pg_total_relation_size('oc_activity') )
----------------
pg_size_pretty
----------------
9666 MB
########################################
Additional stats for oc_activity table
########################################
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-------+------------+-------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
yyyyy | public | oc_activity | 272 | 1046966870 | 4737 | 57914604 | 1548217 | 0 | 325585 | 0 | 11440511 | 940192 | 268430 | | | | 2023-02-15 10:01:36.657028+00 | 0 | 0 | 0 | 3
###################################
oc_filecache total relation size
###################################
SELECT pg_size_pretty( pg_total_relation_size('oc_filecache') )
----------------
pg_size_pretty
----------------
541 MB
#########################################
Additional stats for oc_filecache table
#########################################
SELECT * FROM pg_stat_all_tables WHERE relname='oc_filecache'
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-------+------------+--------------+----------+--------------+------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
zzzzz | public | oc_filecache | 104541 | 28525391484 | 1974398333 | 2003365293 | 43575 | 695612 | 39541 | 348823 | 461510 | 19418 | 4069 | | 2023-02-16 10:46:15.165442+00 | | 2023-02-16 16:25:32.568168+00 | 0 | 8 | 0 | 33

There is no hard rule. I personally would consider a table uncomfortably bloated if the pgstattuple extension showed that less than a third or a quarter of the table is user data and the rest is dead tuples and empty space.
Rather than regularly running VACUUM (FULL) (which means downtime), you should strive to fix the problem that causes the table bloat in the first place.
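
As a rough first look before installing pgstattuple, the counters already posted in the question give a dead-tuple share. The sketch below is only arithmetic on n_live_tup and n_dead_tup; it ignores empty space inside pages, which pgstattuple does measure, so it is an approximation and not the measure described above. The function name dead_tuple_ratio is made up for this example.

```python
def dead_tuple_ratio(n_live_tup, n_dead_tup):
    """Share of dead tuples among all tuples, per the statistics counters."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


# (n_live_tup, n_dead_tup) copied from the pg_stat_all_tables output
# in the question.
STATS = {
    "oc_activity": (11440511, 940192),
    "oc_filecache": (461510, 19418),
}

if __name__ == "__main__":
    for name, (live, dead) in STATS.items():
        print(f"{name}: {dead_tuple_ratio(live, dead):.1%} dead tuples")
```

By these counters, dead tuples make up about 7.6% of oc_activity and about 4.0% of oc_filecache, so the tuple counts alone do not show the level of bloat described above; free space inside the tables would still need to be checked with pgstattuple.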

Multi-dimensional data structure management in R

I have a question about data organisation and the best approach to simplifying some multi-layered data. In short, I have about 10 replicate small wood beams (BeamID, ~10), each subjected to about 10 different treatments (TreatID, ~10). Each beam is load tested, which produces a series of Load values with the corresponding Displacement (ranging from 10 to 50 rows per test; I have code that corrects for disparities in row length). Each wood beam is tested multiple times (Rep, ~10).
My plan was to lump all this data into a 5-D array:
Array[Load, Deflection, BeamID, TreatID, Rep]
This way, I should be able to plot the load~deflection curves for a given BeamID, TreatID, for all Reps by using Array[ , ,1,1, ], right? So the hypothetical output for Array[ , ,1,1,1], would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 1 |
| 6.35 | 10.5 | 1 |
| 12.7 | 20.8 | 1 |
| 19.05 | 45.3 | 1 |
| 25.4 | 75.2 | 1 |
+------------+--------+-----+
And Array[ , ,1,1,2] would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 2 |
| 7.3025 | 12.075 | 2 |
| 14.605 | 23.92 | 2 |
| 21.9075 | 52.095 | 2 |
| 29.21 | 86.48 | 2 |
+------------+--------+-----+
Or I think I could keep it as a simpler, 'melted' dataframe, which would have columns for Load and Deflection, and BeamID, TreatID, and Rep would be repeated for each row of the test output.
+------------+--------+-----+--------+---------+
| Deflection | Load | Rep | BeamID | TreatID |
+------------+--------+-----+--------+---------+
| 0 | 0 | 1 | 1 | 1 |
| 6.35 | 10.5 | 1 | 1 | 1 |
| 12.7 | 20.8 | 1 | 1 | 1 |
| 19.05 | 45.3 | 1 | 1 | 1 |
| 25.4 | 75.2 | 1 | 1 | 1 |
| 0 | 0 | 2 | 1 | 1 |
| 7.3025 | 12.075 | 2 | 1 | 1 |
| 14.605 | 23.92 | 2 | 1 | 1 |
| 21.9075 | 52.095 | 2 | 1 | 1 |
| 29.21 | 86.48 | 2 | 1 | 1 |
+------------+--------+-----+--------+---------+
However, with the latter, I'm not sure how I could easily and discretely pull out all the Rep test values for a specific BeamID and TreatID, especially since I use a linear model to fit a 3rd-order polynomial to a specific test to extract the slope of the curves. Having it as a continuous dataframe means I'd have to specify starting and stopping points for the linear model, correct?
Thoughts, suggestions? Am I headed in the right direction in using a 5-D array? R is a new programming language for me, so please pardon my misunderstandings.
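
On the 'melted' layout: rows can be selected by the values of the ID columns, so no start or stop positions are needed. The sketch below shows the idea in plain Python, not R, using the rows from the table above; the function names split_tests and curves_for are made up for this example. In R, the equivalent would be subsetting or splitting the data frame on BeamID, TreatID and Rep.

```python
from collections import defaultdict

# Rows copied from the 'melted' table in the question:
# (Deflection, Load, Rep, BeamID, TreatID)
ROWS = [
    (0, 0, 1, 1, 1),
    (6.35, 10.5, 1, 1, 1),
    (12.7, 20.8, 1, 1, 1),
    (19.05, 45.3, 1, 1, 1),
    (25.4, 75.2, 1, 1, 1),
    (0, 0, 2, 1, 1),
    (7.3025, 12.075, 2, 1, 1),
    (14.605, 23.92, 2, 1, 1),
    (21.9075, 52.095, 2, 1, 1),
    (29.21, 86.48, 2, 1, 1),
]


def split_tests(rows):
    """Group (Deflection, Load) points by (BeamID, TreatID, Rep)."""
    tests = defaultdict(list)
    for deflection, load, rep, beam_id, treat_id in rows:
        tests[(beam_id, treat_id, rep)].append((deflection, load))
    return dict(tests)


def curves_for(tests, beam_id, treat_id):
    """All Rep curves for one beam and treatment, keyed by Rep."""
    return {
        rep: points
        for (beam, treat, rep), points in tests.items()
        if (beam, treat) == (beam_id, treat_id)
    }
```

Each value returned by curves_for is one complete test, so a model can be fitted per test without tracking row ranges, and tests of different lengths need no padding.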

SQL Database Constraint | Multi-table Constraint

I need to create 2 database constraints, each of which spans two different tables.
1. The total score of the four quarters equals the total score of the game the quarters belong to.
2. The total points of all the players equal the score of that team in the game.
Here is what my tables look like.
quarter table
+------+--------+--------+--------+
| gNum | Period | hScore | aScore |
+------+--------+--------+--------+
| 1 | 1 | 13 | 18 |
| 1 | 2 | 12 | 19 |
| 1 | 3 | 23 | 31 |
| 1 | 4 | 32 | 18 |
| | | Total | Total |
| | | 80 | 86 |
+------+--------+--------+--------+
Game Table
+-----+--------+--------+--------+
| gID | hScore | lScore | tScore |
+-----+--------+--------+--------+
| 1 | 86 | 80 | 166 |
+-----+--------+--------+--------+
Player Table
+-----+------+--------+--------+
| pID | gNum | Period | Points |
+-----+------+--------+--------+
| 1 | 1 | 1 | 20 |
| | | 2 | 20 |
| | | 3 | 20 |
| | | 4 | 20 |
+-----+------+--------+--------+
So essentially, I think I need to use CHECK to make sure that the players' points = the score of their team (i.e. hScore, aScore), and also that hScore and aScore = the total score in the Game table.
I was thinking of creating a foreign key column on one of the tables and setting up constraints on that. Would this be the best way of going about it?
Thanks
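
Rule 1 can at least be written down as a query that lists the games breaking it, which could then be run as a periodic check or from a trigger. The sketch below does that with SQLite through Python's sqlite3 module, using the sample rows above; the function names and the primary/foreign key declarations are assumptions, and nothing here is specific to the database engine in the question. Rule 2 is left out because the Player table as shown has no column saying which team a player belongs to.

```python
import sqlite3


def build_db():
    """In-memory copy of the game and quarter tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE game (
            gID INTEGER PRIMARY KEY,
            hScore INTEGER, lScore INTEGER, tScore INTEGER
        );
        CREATE TABLE quarter (
            gNum INTEGER REFERENCES game (gID),
            Period INTEGER, hScore INTEGER, aScore INTEGER,
            PRIMARY KEY (gNum, Period)
        );
        INSERT INTO game VALUES (1, 86, 80, 166);
        INSERT INTO quarter VALUES
            (1, 1, 13, 18), (1, 2, 12, 19), (1, 3, 23, 31), (1, 4, 32, 18);
        """
    )
    return conn


def games_violating_rule_1(conn):
    """Return gIDs whose quarter scores do not add up to game.tScore."""
    rows = conn.execute(
        """
        SELECT g.gID
        FROM game AS g
        LEFT JOIN quarter AS q ON q.gNum = g.gID
        GROUP BY g.gID, g.tScore
        HAVING COALESCE(SUM(q.hScore + q.aScore), 0) <> g.tScore
        ORDER BY g.gID
        """
    )
    return [row[0] for row in rows]
```

With the sample data the quarter scores add up to 166, which matches tScore, so the function returns an empty list.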

SQL Server : Islands And Gaps

I'm struggling with an "Islands and Gaps" issue. This is for SQL Server 2008 / 2012 (we have databases on both).
I have a table which tracks "available" Serial-#'s for a Pass Outlet, i.e., Bus Passes, Admissions Tickets, Disneyland Tickets, etc. Those Serial-#'s are VARCHAR and can be any combination of numbers and characters, of any length up to the maximum of the defined column, which is VARCHAR(30). And this is where I'm mightily struggling with the syntax/design of a VIEW.
The table (IM_SER) which contains all this data has a primary key consisting of:
ITEM_NO...VARCHAR(20),
SERIAL_NO...VARCHAR(30)
In many cases... particularly with different types of the "Bus Passes" involved, those Serial-#'s could easily track into the TENS of THOUSANDS. What is needed... is a simple view in SQL Server... which simply outputs the CONSECUTIVE RANGES of Available Serial-#'s...until a GAP is found (i.e. a BREAK in the sequences). For example, say we have the following Serial-#'s on hand, for a given Item-#:
123
124
125
139
140
ABC123
ABC124
ABC126
XYZ240003
XYY240004
In my example above, the output would be displayed as follows:
123 -to- 125
139 -to- 140
ABC123 -to- ABC124
ABC126 -to- ABC126
XYZ240003 -to- XYZ240004
In total, there would be 10 Serial-#'s...but since we're outputting the sequential ranges...only 5-lines of output would be necessary. Does this make sense? Please let me know...and, again, THANK YOU!...Mark

This should get you started. The fun part will be determining whether there are gaps: you will have to handle each serial format a little differently.
-- Note: LAG and LEAD require SQL Server 2012 or later.
select x.item_no, x.s_format, x.s_length, x.serial_no,
       LAG(x.serial_no) OVER (PARTITION BY x.item_no, x.s_format, x.s_length
                              ORDER BY x.item_no, x.s_format, x.s_length, x.serial_no) PreviousValue,
       LEAD(x.serial_no) OVER (PARTITION BY x.item_no, x.s_format, x.s_length
                               ORDER BY x.item_no, x.s_format, x.s_length, x.serial_no) NextValue
from
(
    -- Classify each serial by its length and by whether it contains digits, letters, or both.
    select item_no, serial_no,
           len(serial_no) as S_LENGTH,
           case
               WHEN PATINDEX('%[0-9]%', serial_no) > 0 AND
                    PATINDEX('%[a-z]%', serial_no) = 0 THEN 'NUMERIC'
               WHEN PATINDEX('%[0-9]%', serial_no) > 0 AND
                    PATINDEX('%[a-z]%', serial_no) > 0 THEN 'ALPHANUMERIC'
               ELSE 'ALPHA'
           end as S_FORMAT
    from table1
) x
order by item_no, s_format, s_length, serial_no
http://sqlfiddle.com/#!3/5636e2/7
| item_no | s_format | s_length | serial_no | PreviousValue | NextValue |
|---------|--------------|----------|-----------|---------------|-----------|
| 1 | ALPHA | 4 | ABCD | (null) | ABCF |
| 1 | ALPHA | 4 | ABCF | ABCD | (null) |
| 1 | ALPHANUMERIC | 6 | ABC123 | (null) | ABC124 |
| 1 | ALPHANUMERIC | 6 | ABC124 | ABC123 | ABC126 |
| 1 | ALPHANUMERIC | 6 | ABC126 | ABC124 | (null) |
| 1 | ALPHANUMERIC | 9 | XYY240004 | (null) | XYZ240003 |
| 1 | ALPHANUMERIC | 9 | XYZ240003 | XYY240004 | (null) |
| 1 | NUMERIC | 3 | 123 | (null) | 124 |
| 1 | NUMERIC | 3 | 124 | 123 | 125 |
| 1 | NUMERIC | 3 | 125 | 124 | 139 |
| 1 | NUMERIC | 3 | 139 | 125 | 140 |
| 1 | NUMERIC | 3 | 140 | 139 | (null) |
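
The remaining gap-detection step can be sketched outside SQL to show the logic. The example below is Python, not T-SQL, and it assumes that every serial is an optional prefix followed by a run of digits and that only the trailing digits count upward; the question does not confirm either assumption. Within one (prefix, digit-width) group, the number minus its position stays constant across a consecutive run, which is the usual islands-and-gaps trick. The function names split_serial and serial_ranges are made up for this example.

```python
import re
from itertools import groupby


def split_serial(serial):
    """Split a serial into (prefix, trailing number, digit width), or None."""
    match = re.fullmatch(r"(.*?)(\d+)", serial)
    if match is None:  # no trailing digits: handled as a single-item range
        return None
    prefix, digits = match.groups()
    return prefix, int(digits), len(digits)


def serial_ranges(serials):
    """Collapse serials into (first, last) pairs of consecutive runs."""
    keyed, singles = [], []
    for serial in serials:
        parts = split_serial(serial)
        if parts is None:
            singles.append(serial)
        else:
            prefix, number, width = parts
            keyed.append((prefix, width, number, serial))
    keyed.sort()

    ranges = []
    for _, group in groupby(keyed, key=lambda k: (k[0], k[1])):
        # number - position is constant inside one consecutive run
        for _, run in groupby(enumerate(group), key=lambda p: p[1][2] - p[0]):
            run = list(run)
            ranges.append((run[0][1][3], run[-1][1][3]))
    ranges.extend((s, s) for s in sorted(singles))
    return ranges
```

With the serials exactly as listed in the question, XYZ240003 and XYY240004 have different prefixes, so this sketch reports them as two separate single-item ranges.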

Transform ranged data in an Access table

I have a table in an Access database, as below:
Name | Range | X | Y | Z
------------------------------
A | 100-200 | 1 | 2 | 3
A | 200-300 | 4 | 5 | 6
B | 100-200 | 10 | 11 | 12
B | 200-300 | 13 | 14 | 15
C | 200-300 | 16 | 17 | 18
C | 300-400 | 19 | 20 | 21
I am trying to write a query that converts this into the following format.
Name | X_100_200 | Y_100_200 | Z_100_200 | X_200_300 | Y_200_300 | Z_200_300 | X_300_400 | Y_300_400 | Z_300_400
A | 1 | 2 | 3 | 4 | 5 | 6 | | |
B | 10 | 11 | 12 | 13 | 14 | 15 | | |
C | | | | 16 | 17 | 18 | 19 | 20 | 21
After trying for a while, the best method I could come up with is to write a bunch of short queries that select the data for each Range and then put them together again using a Union query. The problem is that this example shows 3 columns (X, Y and Z), but I actually have many more. Access is starting to strain under the amount of SQL I have come up with.
Is there a better way to achieve this?

The answer was simple: just use the Access Pivotview. I'm finding it hard to export the results to Excel, though.
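
For anyone who needs the same reshaping outside Access, for instance on the way to Excel, the transformation itself is small. The sketch below is plain Python on the sample rows from the question and is not an Access feature; the function name pivot and the VALUE_COLUMNS tuple are made up for this example. It builds one record per Name with keys of the form X_100_200.

```python
# Rows copied from the question: (Name, Range, X, Y, Z)
ROWS = [
    ("A", "100-200", 1, 2, 3),
    ("A", "200-300", 4, 5, 6),
    ("B", "100-200", 10, 11, 12),
    ("B", "200-300", 13, 14, 15),
    ("C", "200-300", 16, 17, 18),
    ("C", "300-400", 19, 20, 21),
]
VALUE_COLUMNS = ("X", "Y", "Z")


def pivot(rows, value_columns=VALUE_COLUMNS):
    """Return {Name: {"X_100_200": 1, ...}}; missing ranges are absent."""
    wide = {}
    for name, range_label, *values in rows:
        suffix = range_label.replace("-", "_")
        record = wide.setdefault(name, {})
        for column, value in zip(value_columns, values):
            record[f"{column}_{suffix}"] = value
    return wide
```

Adding more value columns only means extending VALUE_COLUMNS, so the amount of code does not grow with the number of columns the way the Union-query approach does.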
