I have read about the possibility to create wide tables (30,000 columns) in SQL Server (1)
But how do I actually save the result of a select statement (one that has 1024+ columns) into a wide table?
Because if I do:
Select *
Into wide_table
From (
**Select statement with 1024+ columns**
) b
I get: CREATE TABLE failed because column 'c157' in table 'wide_table' exceeds the maximum of 1024 columns.
And, will I be able to query that table and all its columns in a regular manner?
Thank you for your help!
You are right that you are allowed to create a table with 30,000 columns, but you can SELECT or INSERT 'only' 4,096 columns in one statement.
So, in the case of a SELECT, you would need to fetch the columns in parts or concatenate the results. None of this seems practical, easy, or efficient in terms of performance.
If you are going to have that many columns, it may be better to UNPIVOT the data and normalize it further.
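As a rough sketch of that idea (the RowId and c1...cN column names are hypothetical, and the unpivoted columns must share a data type), UNPIVOT turns repeating value columns into attribute/value rows:
SELECT RowId, ColumnName, ColumnValue
FROM wide_table
UNPIVOT (
    ColumnValue FOR ColumnName IN (c1, c2, c3 /* ..., cN */)
) AS u;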
For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a DataTable; I already encountered an OutOfMemoryException. But I have to check that everything I need right now is in the DataTable. So I take the Guids I want to check (they come in chunks of 1000) and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE ... IN can make full use of an index on [Group], if it can use one at all. However, if you had a second table containing the GUID values, and that column were indexed, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.
I have 10 tables, of which 4 tables have 99 columns and 6 tables have 100 columns. I have to combine them using UNION ALL. When executing the SQL query I get the error below:
Msg 205, Level 16, State 1, Line 6
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
I understand the reason for the error is the mismatched number of columns. I tried using NULL AS Column100 but I still get the same error.
Please, can anyone suggest how to use * and UNION ALL in a SQL query?
Thanks.
If the extra column happens to be at the beginning or end and the other columns are in exactly the same order, then you can add the column manually:
select t99.*, 't99' as col
from t99
union all
select t100.*
from t100;
But really, is it that hard to list the columns? An explicit column list is much less prone to error. And, it will work regardless of where the 100th column appears.
You can get the list in SQL Server Management Studio by clicking on the table name. You can also run a query such as:
select column_name
from information_schema.columns
where table_name = 't99';
And then use the column names to construct the query (I often use a spreadsheet for this purpose).
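Alternatively, here is a small sketch that builds the comma-separated column list for you (this assumes SQL Server 2017+ for string_agg):
select string_agg(quotename(column_name), ', ')
           within group (order by ordinal_position)
from information_schema.columns
where table_name = 't99';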
UNION requires that the columns before and after it MATCH.
You cannot do a union of 99 columns and then 100 columns. You have to either provide a dummy value for the 100th column that does not exist in that table, or leave that column out of the other select.
So add this to the select on the smaller table:
NULL AS missing_column_name
Or list all the common columns by hand, omitting columns that do not exist in both.
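A minimal sketch of the NULL-padding approach (the t99/t100 table and col1...col100 column names are hypothetical):
SELECT col1, col2, /* ..., */ col99, NULL AS col100
FROM t99
UNION ALL
SELECT col1, col2, /* ..., */ col99, col100
FROM t100;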
Hi everyone. I have a couple of queries for some reports in which each query pulls data from 35+ tables. Each table has almost 100K records. All the queries use UNION ALL, for example:
;With CTE
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table3 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table4 WHERE Some_Condition
.
.
. And so on
)
SELECT col1, col2, col3 FROM CTE
ORDER BY col3 DESC
So far I have only tested this query on the dev server, and I can see it takes its time to return the results. These 35+ tables are not related to each other, and this is the only way I can think of to get all the desired data into one result set.
Is there a better way to do this kind of query?
If this is the only way to go for this kind of query, how can I improve its performance by making any changes, if possible?
My Opinion
I don't mind having a few dirty reads in this report. I was thinking of using query hints with NOLOCK, or setting the transaction isolation level to READ UNCOMMITTED.
Will any of this help?
Edit
Every table has 5-10 bit columns, each with a corresponding date column, and my condition for each SELECT statement is something like
WHERE BitColumn = 1 AND DateColumn IS NULL
Suggestion By Peers
Filtered Index
CREATE NONCLUSTERED INDEX IX_Table_Column
ON TableName(BitColumn)
WHERE BitColumn = 1
Filtered Index with Included Column
CREATE NONCLUSTERED INDEX fIX_IX_Table_Column
ON TableName(BitColumn)
INCLUDE (DateColumn)
WHERE DateColumn IS NULL
Is this the best way to go? Or any suggestions, please?
There are lots of things that can be done to make it faster.
If I assume you need to do these UNIONs, then you can speed up the query by:
Caching the results, for example,
Can you create an indexed view from the whole statement? Or are there lots of different WHERE conditions, so there would be lots of indexed views? Be aware that this will slow down modifications (INSERT, etc.) on those tables
Can you cache it in a different way? Maybe in the middle layer?
Can it be recalculated in advance?
Make a covering index. The leading columns are the columns from the WHERE clause, with all other columns from the query as included columns (see the sketch after this list)
Note that a covering index can also be filtered, but a filtered index isn't used if the WHERE clause in the query has variables/parameters that can potentially take a value not covered by the filtered index (i.e., the row isn't covered)
ORDER BY will cause a sort
If you can cache it, then it's fine: no sort will be needed (it's cached already sorted)
Otherwise, the sort is CPU bound (and I/O bound if it doesn't fit in memory). To speed it up, do you use a fast collation? The performance difference between the slowest and fastest collations can be as much as 3x. For example, SQL_EBCDIC280_CP1_CS_AS, SQL_Latin1_General_CP1251_CS_AS, and SQL_Latin1_General_CP1_CI_AS are among the fastest collations. However, it's hard to make recommendations without knowing the collation characteristics you need
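Picking up the covering-index point above, a rough sketch for one source table (the index name is hypothetical; the key and included columns borrow the WHERE condition and select list from the question):
CREATE NONCLUSTERED INDEX IX_Table1_Covering
ON Table1 (BitColumn, DateColumn)
INCLUDE (col1, col2, col3);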
Network
'network packet size' for the connection that does the SELECT should be the maximum value possible, 32,767 bytes, if the result set (number of rows) will be big. This can be set on the client side, e.g., in the connection string if you use .NET and SqlConnection. This will minimize the CPU overhead of sending data from the SQL Server and will improve performance on both sides, client and server. It can boost performance even by tens of percent if the network was the bottleneck
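For example, a sketch of an ADO.NET connection string with the packet size raised (the server and database names are hypothetical; Packet Size is the relevant keyword):
Server=myServer;Database=myDb;Integrated Security=true;Packet Size=32767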
Use the shared memory endpoint if the client runs on the SQL Server machine; otherwise use TCP/IP, for the best performance
General things
As you said, using the READ UNCOMMITTED isolation level will improve the performance
...
You probably can't make changes beyond rewriting the query, etc., but just in case: adding more memory if it isn't sufficient now, or using the SQL Server 2014 in-memory features :-), would surely help.
There are way too many things that could be tuned but it's hard to point out the key ones if the question isn't very specific.
Hope this helps a bit
Well, you haven't given any statistics or sample run times for any execution, so it is not possible to guess what is slow and whether it is really slow. How much data is in the result set? It might just be that returning 100K rows in the result takes its time. If a result set of 10,000 rows takes 5 minutes, then yes, definitely something can be looked at. So if you have a sample query, the number of rows in the result, and the time taken for a couple of executions with different WHERE conditions, post that; it will help us compare results.
BTW, do not use a CTE; just use a regular inner and outer SELECT. Make sure tempdb is configured properly, and that its MDF and LDF files are not left at the default 10% autogrowth. By some trial and error you will come to know how much the log and tempdb grow for a variety of range queries, and based on that you should set the initial and increment sizes of the tempdb MDF and LDF. For the covered filtered index, the included columns should be col1, col2, and col3, not the date column, unless the date is also in the select list.
How frequently does the data in the original 35 tables get updated? If at most once per day, or if they all get updated at almost the same time, then indexed views can be a possible solution. But if the original tables get updated more than once a day, or at arbitrary times that don't line up, then do not think about indexed views.
If disk space is not an issue, as a last resort try and test performance using a trigger on each of the 35 tables. Create a new table to hold the final results you expect from this select query, then create insert/update/delete triggers on each of the 35 tables that check the conditions and, only if they match, replay the same insert/update/delete against the new table (see the sketch below). Yes, you will need a column in the new table that identifies which table the data came from. Because the date column is nullable, you do not get the full advantage of an index on it, as mostly you are looking for WHERE Date IS NULL.
If the only query you ever run against the new table is WHERE Date IS NULL, then do not even bother to create that column; just create the bit columns and the other columns col1, col2, col3, etc. If you give a real example of your query and explain the actual tables, other details can be worked out later.
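A minimal sketch of that trigger idea for one table (the trigger and ReportResults names are hypothetical; only the INSERT case is shown):
CREATE TRIGGER trg_Table1_CopyMatches ON Table1
AFTER INSERT
AS
BEGIN
    -- copy only the rows that satisfy the report's condition,
    -- tagging them with the source table's name
    INSERT INTO ReportResults (SourceTable, col1, col2, col3)
    SELECT 'Table1', col1, col2, col3
    FROM inserted
    WHERE BitColumn = 1 AND DateColumn IS NULL;
END;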
The query hints or the isolation level are only going to help you if blocking occurs.
If you don't mind dirty reads and there are locks during execution, it could be a good idea.
The key question is how much data matches the WHERE clause you need to use (WHERE BitColumn = 1 AND DateColumn IS NULL).
If the subset it filters is small compared with the total number of rows, then use an index on both columns, BitColumn and DateColumn, including the columns from the select clause to avoid "Page Lookup" operations in your query plan:
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(BitColumn, DateColumn)
INCLUDE (col1, col2, col3)
Of course, the space needed for that covering index depends on the data types of the fields involved and the number of rows that satisfy WHERE BitColumn = 1 AND DateColumn IS NULL.
After that I recommend using a view instead of a CTE:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE Some_Condition
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE Some_Condition
.
.
.
)
By doing that, your query plan should look like 35 small index scans. But if most of the data satisfies the WHERE clause of your index, the performance is going to be similar to scanning the 35 source tables, and the solution won't be worth it.
But you say "Every Table has 5-10 Bit columns and a Corresponding Date column..", so I think it is not going to be a good idea to make an index per bit column.
If you need to filter using different bit columns and different date columns, use a computed column in your table:
ALTER TABLE Table1 ADD ComputedFilterFlag AS
CAST(
CASE WHEN BitColumn1 = 1 AND DateColumn1 IS NULL THEN 1 ELSE 0 END +
CASE WHEN BitColumn2 = 1 AND DateColumn2 IS NULL THEN 2 ELSE 0 END +
CASE WHEN BitColumn3 = 1 AND DateColumn3 IS NULL THEN 4 ELSE 0 END
AS tinyint)
I recommend you use the value 2^(X-1) for condition X (BitColumnX = 1 AND DateColumnX IS NULL). That will allow you to filter using any combination of those criteria.
By using the value 3 you can locate all rows that satisfy both the (Bit1, Date1) and (Bit2, Date2) conditions. Any combination of conditions has its corresponding ComputedFilterFlag value, because ComputedFilterFlag acts as a bitmap of conditions.
If you have 8 or fewer different filters, you should use tinyint to save space in the index and decrease the I/O operations needed.
Then use an index over the ComputedFilterFlag column:
CREATE NONCLUSTERED INDEX IX_[Choose an IndexName]
ON TableName(ComputedFilterFlag)
INCLUDE (col1, col2, col3)
And create the view:
CREATE VIEW [Choose a ViewName]
AS
(
Select col1, col2, col3 FROM Table1 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
UNION ALL
Select col1, col2, col3 FROM Table2 WHERE ComputedFilterFlag IN [Choose the Target Filter Value set]--(1, 3, 5, 7)
.
.
.
)
By doing that, your index covers all the conditions and your query plan should look like 35 small index seeks.
But this is a tricky solution; maybe refactoring your table schema could produce simpler and faster results.
You'll never get real-time results from a UNION ALL query over many tables, but I can tell you how I got a little speed out of a similar situation. Hopefully this will help you out.
You can actually run all of them at once with a little bit of coding and ingenuity.
Create a global temporary table instead of a common table expression, and don't put any keys on the global temporary table; they will just slow things down. Then start all the individual queries, each inserting into the global temporary table. I've done this a hundred or so times manually, and it's faster than a union query because you get a query running on each CPU core. The tricky part is the mechanism to determine when the individual queries have finished; you're on your own for that piece, hence I do these manually. A sketch follows below.
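A minimal sketch of that approach (the ##Results name and column types are hypothetical); each INSERT runs from its own connection so the loads execute concurrently:
CREATE TABLE ##Results (col1 int, col2 int, col3 datetime);

-- session 1:
INSERT INTO ##Results SELECT col1, col2, col3 FROM Table1 WHERE Some_Condition;
-- session 2:
INSERT INTO ##Results SELECT col1, col2, col3 FROM Table2 WHERE Some_Condition;
-- ...one session per source table...

-- once every session has finished:
SELECT col1, col2, col3 FROM ##Results ORDER BY col3 DESC;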
On SQL Server, executing the following SQL Statement:
SELECT 1,2,3
will return
(no column name) (no column name) (no column name)
1 2 3
Note that the columns don't have names and the number of columns is not definite (it can have 1 column or it can also have > 100 columns).
My question is - Does anybody know of a simple approach so I can get the following result:
(no column name)
1
2
3
What I'm really trying to do is come up with SQL similar to the statement below. I wish I could execute it as it is, but of course we know that the Select 1,2,3 won't work; we have to somehow transform it into a table with the values in rows.
SELECT *
FROM NORTHWIND.DBO.CUSTOMERS
WHERE EMPLOYEEID IN (Select 1,2,3); -- Select 1,2,3 will not work
Currently I'm thinking of creating a user-defined function that returns a table by iterating through each column and dynamically creating multiple SQL statements combined by UNION, similar to: SELECT 1 Col1 UNION SELECT 2 UNION SELECT 3. I'm not a fan of dynamic SQL and looping procedures in my queries, as they can be expensive to process, especially for an application with expected usage of 1000+ requests per minute. There is also the concern of SQL injection attacks with dynamic SQL once I start using strings instead of integer values. I'm also trying to avoid temporary tables, as they can be even more expensive to process.
Any ideas? Can we use UNPIVOT without the need for looping through the indefinite number of columns and dynamically creating the SQL text to execute it and transform the columnar values into rows? What about Common Table Expressions?
Get rid of the select and just specify a list of values:
SELECT * FROM NORTHWIND.DBO.CUSTOMERS
WHERE EMPLOYEEID IN (1,2,3);
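If you really do need the values as a rowset (to join against, say), a table value constructor is one minimal sketch (SQL Server 2008+):
SELECT * FROM NORTHWIND.DBO.CUSTOMERS
WHERE EMPLOYEEID IN (SELECT v FROM (VALUES (1), (2), (3)) AS t(v));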
I have a table which has a bunch of columns but the two relevant ones are:
Due_Amount MONEY
Bounced_Due_Amount MONEY
I have a SQL query like the following
SELECT * FROM table WHERE (Due_Amount > 0 OR Bounced_Due_Amount > 0)
Would the best index to put on this table for SQL Server 2008 be one index that includes both columns, or should I put a separate index on each column?
An index can't be used with an OR like that. Try this:
SELECT * FROM table WHERE Due_Amount > 0
UNION ALL
SELECT * FROM table WHERE Bounced_Due_Amount > 0
--use "UNION" if Due_Amount and Bounced_Due_Amount could both be > 0 in the same row
Have an index on Due_Amount and another on Bounced_Due_Amount.
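A minimal sketch of those two single-column indexes (the index names are hypothetical):
CREATE INDEX IX_table_Due_Amount ON [table] (Due_Amount);
CREATE INDEX IX_table_Bounced_Due_Amount ON [table] (Bounced_Due_Amount);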
It might be better to redesign your table. Without knowing your business logic or table, I'm going to guess that you could have a "Bounced" Y/N or 1/0 char/bit column and just a "Due_Amount" column. Add an index on that "Due_Amount" and the query would just be:
SELECT * FROM table WHERE Due_Amount > 0
You could still differentiate between a bounced and a non-bounced row. This will not work if you need to have both a bounced and a non-bounced due amount at the same time.
My guess is that you would be better off with an index on each individual column. Having it on both won't help any more than having it on just the first column unless you have other queries that would use the compound index.
Your best bet is to try the query with an index on one column, an index on the other column, and two indexes - one on each column. Do some tests with each (on real data, not test data) and see which works best. Take a look at the query plans to understand why.
Depending on the specific data (both size and cardinality) SQL Server may end up using one, both, or possibly even neither index. The only way to know for sure is to test them each.
Technically, you can have an index on a persisted computed column and use the computed column instead of the OR condition in the query, see Creating Indexes on Computed Columns:
alter table [table] add Max_Due_Amount as
case
when Due_Amount > Bounced_Due_Amount then Due_Amount
else Bounced_Due_Amount
end
persisted;
go
create index idxTableMaxDueAmount on [table] (Max_Due_Amount);
go
SELECT * FROM table WHERE Max_Due_Amount > 0;
But in general I'd recommend using the UNION approach like KM suggested.
Specifically for this query, it would be best to create an index on both columns in the order they are used in the where clause. Otherwise the index might not be used.