Summarizing rows based on column values - sql-server

I am trying to combine rows into summary rows based on the values in certain columns.
Is there a way to do this with GROUP BY and aggregate functions, or do I need to give up and use a cursor? If so, what would that cursor look like?

SELECT Route
,MIN(From_Milepost) AS From_MilePost
,MAX(To_Milepost) AS To_MilePost
,Pavement_Condition
FROM [yourTable]
GROUP BY Route
,Pavement_Condition
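If the goal is to merge only adjacent segments that share a condition, a cursor isn't needed. The plain GROUP BY above collapses every "Good" stretch on a route into one range, even when a "Poor" stretch sits in between. A common set-based technique for this is "gaps and islands": the difference between two ROW_NUMBER() sequences stays constant within each unbroken run of the same condition. A minimal sketch, assuming a hypothetical #Segments table whose segments are contiguous and ordered by From_Milepost:

```sql
-- Hypothetical sample: one route whose 'Good' pavement is interrupted by a 'Poor' stretch.
IF OBJECT_ID(N'tempdb..#Segments') IS NOT NULL DROP TABLE #Segments;
CREATE TABLE #Segments (
    Route              varchar(10),
    From_Milepost      decimal(6, 2),
    To_Milepost        decimal(6, 2),
    Pavement_Condition varchar(10)
);
INSERT INTO #Segments VALUES
    ('R1', 0.00, 1.00, 'Good'),
    ('R1', 1.00, 2.00, 'Good'),
    ('R1', 2.00, 3.00, 'Poor'),
    ('R1', 3.00, 4.00, 'Good');

-- Within a route, the overall sequence number minus the per-condition sequence
-- number is constant along each unbroken run ("island") of one condition.
WITH Numbered AS (
    SELECT Route, From_Milepost, To_Milepost, Pavement_Condition,
           ROW_NUMBER() OVER (PARTITION BY Route ORDER BY From_Milepost)
         - ROW_NUMBER() OVER (PARTITION BY Route, Pavement_Condition ORDER BY From_Milepost) AS Island
    FROM #Segments
)
-- Island numbers can repeat across conditions, so group by the condition as well.
SELECT Route,
       MIN(From_Milepost) AS From_Milepost,
       MAX(To_Milepost)   AS To_Milepost,
       Pavement_Condition
FROM Numbered
GROUP BY Route, Pavement_Condition, Island
ORDER BY Route, MIN(From_Milepost);
```

With the sample rows this returns three ranges (Good 0-2, Poor 2-3, Good 3-4), whereas the plain GROUP BY returns two overlapping ones (Good 0-4, Poor 2-3).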

Related

Select Columns only if greater than 1

I have about 70 columns in total, each containing either a 1 or a 0. I'm trying to select only the columns that have a value greater than 0. What's the best way to do that? Thank you.
I'm not sure what you're trying to accomplish, but SQL doesn't really have a good mechanism for selecting columns based on their values. However, if you convert the columns to rows using UNPIVOT, you can then use a basic WHERE clause to filter the rows.
https://technet.microsoft.com/en-us/library/ms177410%28v=sql.105%29.aspx
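A minimal UNPIVOT sketch, assuming a hypothetical #Flags table with three of the 0/1 columns (the real table would list all ~70 in the IN clause, or build that list with dynamic SQL). Each flag becomes a (FlagName, FlagValue) row, so an ordinary WHERE clause can filter it. Note that UNPIVOT requires every listed column to have the same data type.

```sql
-- Hypothetical sample table standing in for the ~70 flag columns.
IF OBJECT_ID(N'tempdb..#Flags') IS NOT NULL DROP TABLE #Flags;
CREATE TABLE #Flags (RowID int PRIMARY KEY, Flag1 int, Flag2 int, Flag3 int);
INSERT INTO #Flags VALUES (1, 1, 0, 0), (2, 0, 0, 1);

-- Each (RowID, flag column) pair becomes its own row, then WHERE keeps the non-zero ones.
SELECT u.RowID, u.FlagName, u.FlagValue
FROM #Flags
UNPIVOT (FlagValue FOR FlagName IN (Flag1, Flag2, Flag3)) AS u
WHERE u.FlagValue > 0;

-- Variation: which flag columns are set on at least one row?
SELECT DISTINCT u.FlagName
FROM #Flags
UNPIVOT (FlagValue FOR FlagName IN (Flag1, Flag2, Flag3)) AS u
WHERE u.FlagValue > 0;
```

With the sample data, the first query returns Flag1 for row 1 and Flag3 for row 2; the second returns Flag1 and Flag3.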

How to group rows (based on CustomerID) using Pivot in SSIS?

I am practicing SSIS and am currently working on the Pivot transformation. Here's what I am working on.
I created a data source (table name: Pivot) with the following data.
Using SSIS, I created a package that pivots the data into the following columns:
PersonID --- Product1 --- Product2 --- Product3.
Here's where I am at: I was able to write the pivoted data to a text file, but the output is not grouped by PersonID.
My current output is:
As you can see, the transformation does not group the rows based on
SetKey(PersonID : PivotUsage =1)
The output I am hoping to get is:
where the data is grouped by PersonID.
What am I missing here?
Edit:
Going back to the example I was following, I reordered the input data as follows.
Does the input data need to be in this order/pattern every time? Most of the examples I came across follow a similar pattern.
Yes, the input data needs to be sorted by whatever you're pivoting on:
To pivot data efficiently, which means creating as few records in the
output dataset as possible, the input data must be sorted on the pivot
column. If the data is not sorted, the Pivot transformation might
generate multiple records for each value in the set key, which is the
column that defines set membership. For example, if the dataset is
pivoted on a Name column but the names are not sorted, the output
dataset could have more than one row for each customer, because a
pivot occurs every time that the value in Name changes.
That's a direct quote from the Pivot Transformation documentation on MSDN.
When I first read this answer, I thought the sorted column should be the one with PivotUsage=2 in the pivot, since that's what I understood the pivot column to be. However, what finally worked for me was to sort by the column with PivotUsage=1 (the set key). It's the column I would group by if writing the SQL by hand.
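To make the sorting concrete, here is a sketch of a source query that sorts on the set key before the rows reach the Pivot transformation, alongside the equivalent T-SQL PIVOT for comparison. The #PivotSource table and its Product/Quantity columns are hypothetical stand-ins, since the layout of the questioner's Pivot table isn't shown; a Sort transformation in the data flow would serve the same purpose as the ORDER BY.

```sql
-- Hypothetical stand-in for the questioner's Pivot table, deliberately unsorted.
IF OBJECT_ID(N'tempdb..#PivotSource') IS NOT NULL DROP TABLE #PivotSource;
CREATE TABLE #PivotSource (PersonID int, Product varchar(20), Quantity int);
INSERT INTO #PivotSource VALUES
    (1, 'Product1', 5), (2, 'Product1', 3), (1, 'Product2', 7),
    (2, 'Product3', 4), (1, 'Product3', 2);

-- Source query for the SSIS Pivot transformation: sorted on the set key
-- (the PivotUsage = 1 column), so each PersonID arrives as one contiguous run.
SELECT PersonID, Product, Quantity
FROM #PivotSource
ORDER BY PersonID;

-- For comparison: the T-SQL PIVOT operator groups by PersonID regardless of input order.
SELECT PersonID, [Product1], [Product2], [Product3]
FROM (SELECT PersonID, Product, Quantity FROM #PivotSource) AS s
PIVOT (SUM(Quantity) FOR Product IN ([Product1], [Product2], [Product3])) AS p
ORDER BY PersonID;
```

The PIVOT query yields one row per PersonID (two rows for this sample), which is the shape the SSIS package should also produce once its input is sorted.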

Comparing two rows in SQL Server

Scenario
A very large query returns many fields from multiple joined tables.
Some records seem to be duplicated.
You run some checks and some grouping, and focus on a couple of records for further investigation.
Still, there are too many fields to check each value by hand.
Question
Is there any built-in function that compares two records, returning TRUE if the records match, otherwise FALSE and the set of not matching fields?
The CHECKSUM function should help identify matching rows:
SELECT CHECKSUM(*) FROM YourTable
Maybe this is what you are looking for:
SELECT <<ColumnList>>, COUNT(*) AS DuplicateCount
FROM YourTable
GROUP BY <<ColumnList>>
HAVING COUNT(*) > 1
Building on the suggestion provided by Podiluska, to find the records that are duplicated:
SELECT CHECKSUM(*)
FROM YourTable
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
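I would suggest using the HASHBYTES function to compare rows. It is better than CHECKSUM, which returns only a 32-bit integer and is therefore much more likely to report two different rows as equal.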
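A sketch of the HASHBYTES approach, assuming hypothetical #Rows columns and SQL Server 2012 or later (for CONCAT and the SHA2_256 algorithm). HASHBYTES takes a single string or binary input, so the columns are concatenated with a separator first.

```sql
-- Hypothetical sample: rows 1 and 2 are identical apart from the key, row 3 differs.
IF OBJECT_ID(N'tempdb..#Rows') IS NOT NULL DROP TABLE #Rows;
IF OBJECT_ID(N'tempdb..#Hashed') IS NOT NULL DROP TABLE #Hashed;
CREATE TABLE #Rows (ID int, FirstName varchar(50), LastName varchar(50), Phone varchar(20));
INSERT INTO #Rows VALUES
    (1, 'John', 'Williams', '+123456789'),
    (2, 'John', 'Williams', '+123456789'),
    (3, 'John', 'Williams', '+999999999');

-- One SHA2_256 hash per row over every column except the key. The '|' separator keeps
-- 'ab' + 'c' distinct from 'a' + 'bc'; CONCAT turns NULL into '', so NULL and an
-- empty string hash the same here.
SELECT ID,
       HASHBYTES('SHA2_256', CONCAT(FirstName, '|', LastName, '|', Phone)) AS RowHash
INTO #Hashed
FROM #Rows;

-- Rows whose hash occurs more than once are (almost certainly) identical.
SELECT h.ID, h.RowHash
FROM #Hashed AS h
WHERE h.RowHash IN (SELECT RowHash FROM #Hashed GROUP BY RowHash HAVING COUNT(*) > 1)
ORDER BY h.ID;
```

For the sample, rows 1 and 2 are returned and row 3 is not.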
What about using ROW_NUMBER() partitioned by all the columns, and then selecting all the rows where rn is 2 or greater? This is not a slow method, it gives exact results, and it returns the full data of each duplicated row. I would go with this method instead of relying on hashing techniques.
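A sketch of the ROW_NUMBER approach, with hypothetical column names and sample rows. PARTITION BY treats NULLs as equal, so two rows that are both NULL in the same column still count as duplicates, and every column of each extra copy is returned.

```sql
-- Hypothetical sample with two duplicated rows (one of them containing a NULL).
IF OBJECT_ID(N'tempdb..#Dupes') IS NOT NULL DROP TABLE #Dupes;
CREATE TABLE #Dupes (FirstName varchar(50), LastName varchar(50), Phone varchar(20));
INSERT INTO #Dupes VALUES
    ('John', 'Williams', '+123456789'),
    ('John', 'Williams', '+123456789'),
    ('Joyce', 'Nancy', NULL),
    ('Joyce', 'Nancy', NULL),
    ('Rodney', NULL, '+124568937');

-- Number the rows within each group of identical values; rn >= 2 marks the extra copies.
WITH Numbered AS (
    SELECT FirstName, LastName, Phone,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, Phone
                              ORDER BY (SELECT NULL)) AS rn
    FROM #Dupes
)
SELECT FirstName, LastName, Phone, rn
FROM Numbered
WHERE rn >= 2;
```

For the sample this returns one extra copy of the John Williams row and one of the Joyce Nancy row.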

SQL Server Select Query

I have to write a query to get the following data as the result.
I have four columns in my database. ID is not null, all others can have null values.
EMP_ID EMP_FIRST_NAME EMP_LAST_NAME EMP_PHONE
1 John Williams +123456789
2 Rodney +124568937
3 Jackson +124578963
4 Joyce Nancy
Now I have to write a query which returns the columns which are not null.
I do not want to specify the column name in my query.
I mean, I want to use SELECT * FROM TABLE WHERE - and add the filter, but I do not want to specify the column name after the WHERE clause.
This question may be foolish but correct me wherever necessary. I'm new to SQL and working on a project with c# and sql.
I do not want to use the column names because I have more than 250 columns and 1500 rows. If I select any row, at least one column will have a null value. I want to select the row, but the columns that are null for that particular row should not appear in the result.
Please advise. Thank you in advance.
Regards,
Vinay S
Every row returned from a SQL query must contain exactly the same columns as the other rows in the set. There is no way to select only those columns which do not return null unless all of the results in the set have the same null columns and you specify that in your select clause (not your where clause).
Following up on Anders Abels's comment on your question, you could avoid a good deal of the query complexity by separating your data into tables that each serve a common purpose (this is called normalization).
For example, you could put names in one table (Employee_ID, First_Name, Last_Name, Middle_Name, Title), places in another (Address_ID, Address_Name, Street, City, State), relationships in another, and then tiny 2-4 column tables that link them all together. Structuring your data this way avoids duplicating individual facts, such as "who is John Williams's supervisor and how do I contact that person."
Your question reads:
I want to get all the columns that don't have a null value.
And at the same time:
But I don't want to specify column names in the WHERE clause.
These are conflicting goals. Your only option is to use the sys.tables and sys.columns catalog views to build a series of dynamic SQL statements. In the end, this is going to be more work than just writing one query by hand the first time.
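A rough sketch of the dynamic SQL route, assuming a temporary #Employee table built from the sample in the question. Because it is a temp table, its columns are read from tempdb.sys.columns; for a permanent table you would query sys.columns with OBJECT_ID(N'dbo.YourTable') instead (the table name is a placeholder). It probes each column of one row for NULL, then builds a SELECT list from the columns that have values, so the result still only makes sense one row at a time.

```sql
-- Hypothetical sample table based on the question's columns.
IF OBJECT_ID(N'tempdb..#Employee') IS NOT NULL DROP TABLE #Employee;
CREATE TABLE #Employee (
    EMP_ID         int NOT NULL,
    EMP_FIRST_NAME varchar(50) NULL,
    EMP_LAST_NAME  varchar(50) NULL,
    EMP_PHONE      varchar(20) NULL
);
INSERT INTO #Employee VALUES (1, 'John', 'Williams', '+123456789'), (4, 'Joyce', 'Nancy', NULL);

DECLARE @EmpID int = 4;                  -- the row to display
DECLARE @SelectList nvarchar(max) = N'';
DECLARE @Col sysname, @Probe nvarchar(max), @HasValue bit;

-- Walk the table's columns in definition order (temp tables live in tempdb).
DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM tempdb.sys.columns
    WHERE object_id = OBJECT_ID(N'tempdb..#Employee')
    ORDER BY column_id;
OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @Col;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Ask whether this column is non-NULL for the chosen row.
    SET @HasValue = 0;
    SET @Probe = N'SELECT @HasValue = CASE WHEN ' + QUOTENAME(@Col)
               + N' IS NULL THEN 0 ELSE 1 END FROM #Employee WHERE EMP_ID = @EmpID;';
    EXEC sys.sp_executesql @Probe, N'@EmpID int, @HasValue bit OUTPUT',
         @EmpID = @EmpID, @HasValue = @HasValue OUTPUT;
    -- Keep only the columns that have a value.
    IF @HasValue = 1
        SET @SelectList += CASE WHEN @SelectList = N'' THEN N'' ELSE N', ' END + QUOTENAME(@Col);
    FETCH NEXT FROM col_cursor INTO @Col;
END;
CLOSE col_cursor;
DEALLOCATE col_cursor;

-- Build and run the final SELECT with only the non-NULL columns.
DECLARE @Sql nvarchar(max) = N'SELECT ' + @SelectList + N' FROM #Employee WHERE EMP_ID = @EmpID;';
EXEC sys.sp_executesql @Sql, N'@EmpID int', @EmpID = @EmpID;
```

For EMP_ID 4 the generated list is [EMP_ID], [EMP_FIRST_NAME], [EMP_LAST_NAME]; EMP_PHONE is left out because it is NULL for that row.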
You can do this with a dynamic PIVOT / UNPIVOT approach, assuming your version of SQL Server supports it (you'll need SQL Server 2005 or better), which would be based on the concepts found in these links:
Dynamic Pivot
PIVOT / UNPIVOT
Effectively, you'll select a row, unpivot its columns into rows, filter out the NULL entries, and then pivot the remaining rows back into a single row. It's going to be ugly and complex code, though.
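The first half of that approach can be sketched as follows, reusing the sample columns from the question in a hypothetical #EmployeeSample table. UNPIVOT requires all listed columns to share one data type, so each is converted to varchar(50). UNPIVOT also drops NULL values on its own, which takes care of the filtering step. Pivoting the remaining rows back into one row would then need the dynamic PIVOT from the first link, because the column list differs from row to row.

```sql
-- Hypothetical sample table based on the question's columns.
IF OBJECT_ID(N'tempdb..#EmployeeSample') IS NOT NULL DROP TABLE #EmployeeSample;
CREATE TABLE #EmployeeSample (
    EMP_ID         int NOT NULL,
    EMP_FIRST_NAME varchar(50) NULL,
    EMP_LAST_NAME  varchar(50) NULL,
    EMP_PHONE      varchar(20) NULL
);
INSERT INTO #EmployeeSample VALUES (1, 'John', 'Williams', '+123456789'), (4, 'Joyce', 'Nancy', NULL);

-- Turn the columns of one row into (ColumnName, ColumnValue) rows.
-- All UNPIVOT columns must have the same type, hence the conversions.
SELECT u.ColumnName, u.ColumnValue
FROM (
    SELECT CONVERT(varchar(50), EMP_ID)         AS EMP_ID,
           CONVERT(varchar(50), EMP_FIRST_NAME) AS EMP_FIRST_NAME,
           CONVERT(varchar(50), EMP_LAST_NAME)  AS EMP_LAST_NAME,
           CONVERT(varchar(50), EMP_PHONE)      AS EMP_PHONE
    FROM #EmployeeSample
    WHERE EMP_ID = 4
) AS s
UNPIVOT (ColumnValue FOR ColumnName IN (EMP_ID, EMP_FIRST_NAME, EMP_LAST_NAME, EMP_PHONE)) AS u;
-- UNPIVOT silently skips NULL values, so EMP_PHONE does not appear for EMP_ID 4.
```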

insert values from one table into another table with different design in sql server

I'm back with another (possibly) silly question. Sorry.
I have a pretty complicated query that joins 4 tables and computes the sum of a column based on two other columns from two of the tables. The result returned looks like this:
Image http://eternalvinay.iocleicester.com/blahblah.png
Now I want the results to look like the right-hand side of the image. The number of rows per month/year might change, though it's 4 for now.
I am creating a temporary table as:
CREATE TABLE #TmpTable (id int IDENTITY, AnsSum float, AnsMonth int, AnsYear int)
to store the values from the image (table 1). However, I can't figure out how to convert those rows into the format required by table 2.
So, any hints on this, please?
Thanks so much.
P.S. I tried Google and related questions here, with no luck.
P.P.S. I am not expecting an exact answer either; I am quite interested in learning new things, so a push in the right direction, or a pointer to where I can learn to do this, would be great too!
You could use CROSS APPLY to get all the values in a comma-delimited format in a single column, instead of four different columns. The problem is that this "4" is not fixed: it may increase or decrease, so it is not advisable to have these as columns.
SELECT DISTINCT A.AnsMonth, A.AnsYear, STUFF(C.DerivedColumn, 1, 1, '') AS DerivedColumn
FROM #TmpTable A
CROSS APPLY
(
-- Concatenate ',<sum>' for every row in the same month/year; STUFF removes the leading comma
SELECT ',' + CAST(B.AnsSum AS varchar(30)) FROM #TmpTable B
WHERE A.AnsMonth = B.AnsMonth
AND A.AnsYear = B.AnsYear
ORDER BY B.id
FOR XML PATH('')
) AS C (DerivedColumn)
You will get the sums for month 5 of 2010 (6.0000, 1.000, 8.0000, 5.0000 in the image) in one column, and likewise for every other month/year. You could then use this as a table to query any particular month.
Hope this helps
So you have normalized data and you want to pivot the result set to create repeating groups.
You could use PIVOT but you'd need some other attribute in your base table to define the four columns.
I would recommend not pivoting this query in SQL. Just run the query against the database and get four rows per month/year, then write code in your application to aggregate the results by month/year.
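If you do want to pivot in SQL, ROW_NUMBER() can supply the extra attribute this answer mentions: it numbers the sums within each month/year, and PIVOT turns positions 1 to 4 into columns. A sketch using the questioner's #TmpTable columns, with illustrative sample values and assuming the id column reflects the desired left-to-right order:

```sql
IF OBJECT_ID(N'tempdb..#TmpTable') IS NOT NULL DROP TABLE #TmpTable;
CREATE TABLE #TmpTable (id int IDENTITY, AnsSum float, AnsMonth int, AnsYear int);
-- Illustrative data: four sums for May 2010 and two for June 2010.
INSERT INTO #TmpTable (AnsSum, AnsMonth, AnsYear)
VALUES (6, 5, 2010), (1, 5, 2010), (8, 5, 2010), (5, 5, 2010), (2, 6, 2010), (3, 6, 2010);

-- ROW_NUMBER() provides the missing "column number" attribute within each month/year;
-- PIVOT then spreads positions 1-4 across four columns.
SELECT AnsYear, AnsMonth, [1] AS Sum1, [2] AS Sum2, [3] AS Sum3, [4] AS Sum4
FROM (
    SELECT AnsYear, AnsMonth, AnsSum,
           ROW_NUMBER() OVER (PARTITION BY AnsYear, AnsMonth ORDER BY id) AS Position
    FROM #TmpTable
) AS s
PIVOT (MAX(AnsSum) FOR Position IN ([1], [2], [3], [4])) AS p
ORDER BY AnsYear, AnsMonth;
```

Months with fewer than four sums get NULL in the unused columns, and a fifth sum would be silently dropped unless [5] is added to the IN list. That fixed column list is the weakness that makes doing the pivot in the application attractive.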

Resources