How to transpose columns to rows in sql server - sql-server

I'm trying to take a raw data set that adds a new column for each new batch of data and convert it to a more traditional table structure. The idea is to have the script pull each column name (the date), put it into a new column, and then stack each date's values on top of each other.
Example
Store 1/1/2013 2/1/2013
XYZ INC $1000 $2000
To
Store Date Value
XYZ INC 1/1/2013 $1000
XYZ INC 2/1/2013 $2000
thanks

There are a few different ways that you can get the result that you want.
You can use a SELECT with UNION ALL:
select store, '1/1/2013' date, [1/1/2013] value
from yourtable
union all
select store, '2/1/2013' date, [2/1/2013] value
from yourtable;
You can use the UNPIVOT operator (the columns being unpivoted must all share the same data type):
select store, date, value
from yourtable
unpivot
(
value
for date in ([1/1/2013], [2/1/2013])
) un;
Finally, if you are on SQL Server 2008 or later, you can use CROSS APPLY with a VALUES clause:
select store, date, value
from yourtable
cross apply
(
values
('1/1/2013', [1/1/2013]),
('2/1/2013', [2/1/2013])
) c (date, value)
All three queries give the same result:
| STORE | DATE | VALUE |
|---------|----------|-------|
| XYZ INC | 1/1/2013 | 1000 |
| XYZ INC | 2/1/2013 | 2000 |

Depending on the details of the problem (e.g. source format, number and variability of dates, how often you need to perform the task), it may well be much easier to use some other language to parse the data and either reformat it or insert it directly into the final table.
That said, if you're interested in a pure SQL solution, it sounds like you're looking for some dynamic pivot (here, unpivot) functionality. The keywords to search for are dynamic SQL and UNPIVOT. The details vary based on which RDBMS you're using and on the exact specs of the initial data set.
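To make the "dynamic SQL plus unpivot" idea concrete, here is a minimal sketch that builds the UNPIVOT statement from a list of column names. It is written in Python only for illustration; inside SQL Server the same string would typically be assembled in T-SQL and executed dynamically. The helper names (quote_name, build_unpivot_sql) are made up for this example, and the table and column names are taken from the question.

```python
from typing import Sequence


def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier, doubling any ']' (the same idea as T-SQL's QUOTENAME)."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_unpivot_sql(table: str, key_column: str, date_columns: Sequence[str]) -> str:
    """Return an UNPIVOT statement that covers every column in date_columns."""
    if not date_columns:
        raise ValueError("at least one date column is required")
    column_list = ", ".join(quote_name(c) for c in date_columns)
    return (
        f"select {quote_name(key_column)}, [date], [value]\n"
        f"from {quote_name(table)}\n"
        f"unpivot ([value] for [date] in ({column_list})) un;"
    )


# In practice the column list would be read from the table's metadata.
sql = build_unpivot_sql("yourtable", "store", ["1/1/2013", "2/1/2013"])
print(sql)
```

The point of the sketch is that only the IN (...) list changes when new date columns appear, so that is the only part that needs to be generated.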

I would use a scripting language (Perl, Python, etc.) to generate an INSERT statement for each date column you have in the original data and transpose it into a row keyed by Store and Date. Then run the inserts into your normalized table.
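A rough sketch of that approach in Python, assuming the raw data has already been read into a header list and a list of rows, and that the first column is always the store. An in-memory SQLite database stands in for the real target database, and the table name store_values is invented for the example.

```python
import sqlite3


def unpivot(header, rows):
    """Yield one (store, date, value) tuple per date column of each wide row."""
    date_columns = header[1:]  # assumption: the first column is the store
    for row in rows:
        store, values = row[0], row[1:]
        for date, value in zip(date_columns, values):
            yield store, date, value


header = ["Store", "1/1/2013", "2/1/2013"]
rows = [["XYZ INC", 1000, 2000]]

conn = sqlite3.connect(":memory:")  # stand-in for the real target database
conn.execute("create table store_values (store text, date text, value integer)")
# Parameterized inserts, one per (store, date) pair.
conn.executemany(
    "insert into store_values (store, date, value) values (?, ?, ?)",
    unpivot(header, rows),
)
result = conn.execute(
    "select store, date, value from store_values order by date"
).fetchall()
print(result)
```

Because the date columns are taken from the header, a new date column in the raw data needs no change to the script.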

Related

Preventing SQL injection in a report generator with custom formulas

For my customers, I am building a custom report generator, so they can create their own reports.
The concept is this: In a control table, they fill in the content of report columns. Each column can either consist of data from DIFFERENT DATA SOURCES (=tables), or of a FORMULA.
Here is a reduced sample how this looks:
Column | Source | Year | Account | Formula
----------------------------------------------
col1 | TAB1 | 2015 | SALES | (null)
col2 | TAB2 | 2014 | SALES | (null)
col3 | FORMULA | (null) | (null) | ([col2]-[col1])
So col1 and col2 get data from tables tab1 and tab2, and col3 calculates the difference.
A stored procedure then creates a dynamic SQL, and delivers the report data.
The resulting SQL query looks like this:
SELECT
(SELECT sum(val) from tab1 where Year=2015 and Account='SALES') as col1,
(SELECT sum(val) from tab2 where Year=2014 and Account='SALES') as col2,
(
(SELECT sum(val) from tab1 where Year=2015 and Account='SALES')
-
(SELECT sum(val) from tab2 where Year=2014 and Account='SALES')
) as col3 ;
In reality it is far more complex, because there are more parameters, and I'm using coalesce(), etc.
My main headache is the formulas. While they give users a very flexible tool, they are totally vulnerable to SQL injection.
Just wanted to know if there is some simple way to check a parameter for a possible SQL injection.
Otherwise I think that I need to limit the flexibility of the system for normal users, and only "super users" get access to the full flexible reports.
Not really. Many injections rely on comments (to comment out the rest of the regular statement), so you could check for comment markers (-- and /*) and the ; character (end of statement).
On the other hand, if you allow your users to put anything into the filters, what stops someone from writing a filter such as 1 = (select password from users where username = 'admin') to provoke an error message like "Error converting 'ReallyStrongPassword' to integer"?
Furthermore, judging by your queries, I'd guess that performance will be a much bigger problem than injection (the query reads tab1 and tab2 twice, instead of only once as it would if you wrote it 'regularly').
Edit:
You could also check the filter parameter for SQL keywords such as select, update, delete, exec ... to harden your code and queries.
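As a variation on the keyword check, here is a hedged sketch of the opposite approach: instead of blacklisting keywords, accept a formula only if every token is on a whitelist. This is not what the question's stored procedure does; it is an illustration in Python, and it assumes that column references always look like [colN] and that formulas only need numbers, + - * / and parentheses. The function name is_safe_formula is made up.

```python
import re

# A token is a [colN] reference, a number, an arithmetic operator, or a parenthesis.
TOKEN = re.compile(r"\s*(\[col\d+\]|\d+(?:\.\d+)?|[-+*/()])")


def is_safe_formula(formula: str) -> bool:
    """Return True only if the formula is made of whitelisted tokens,
    contains no SQL comment markers, and has balanced parentheses."""
    text = formula.strip()
    if not text:
        return False
    # The operators - / * are allowed, so comment markers must be rejected explicitly.
    if "--" in text or "/*" in text or "*/" in text:
        return False
    pos, depth = 0, 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return False  # anything off the whitelist: keywords, quotes, ';', '='
        token = match.group(1)
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth < 0:
                return False
        pos = match.end()
    return depth == 0


print(is_safe_formula("([col2]-[col1])"))
print(is_safe_formula("1 = (select password from users where username = 'admin')"))
```

This only checks the vocabulary, not that the formula is valid arithmetic, so a malformed formula can still fail when the generated SQL runs. It does mean that select, quotes and semicolons never reach the dynamic SQL.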

Dynamic pivot table sql server 2008

I would like to do data mining, but my data is not in a usable shape.
my table structure is something like:
date customerid age residence prosubsclassid productid
----------------------------------------------------------------------------
21.11.2001 123232323 a b 2099 23232322
amount asset sales
------------------------
4 34 56
Now I have to show the data in this way:
prosubsclassid 130207 130208 130209
------ ------ ------
1413232 1 3 1
3435545 2 1 2
3534344 3 1 sum(amount)
Column(customerid)
I want to convert my data to this tabular form.
There is no automatic way to do this. SQL Server supports PIVOT, but the columns still need to be specified.
Depending on whether you want a text report or a table with dynamic columns, I would group the data by date and prosubsclassid and then use a cursor to build the output.
If you want a dynamic table, build a dynamic SQL query based on the grouped data and run it with EXEC.
If you want a text report, just concatenate the string data the way you want it, one line at a time, into a temp table with a single text column, and when you are done, select from the table.
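The "build a dynamic query from the grouped data" idea can be sketched as follows. This is an illustration in Python against an in-memory SQLite database, not SQL Server code: it generates one SUM(CASE ...) column per distinct value instead of using PIVOT with EXEC. It assumes the column headers in the desired output are customerid values and the cells are sum(amount), which is only one reading of the question; the sample rows and the table name sales are invented.

```python
import sqlite3


def quote(identifier):
    """Double-quote an identifier so data values can be used as column names."""
    return '"' + str(identifier).replace('"', '""') + '"'


def pivot(conn, table, row_key, pivot_column, value_column):
    """Return (header, rows) with one summed column per distinct pivot value."""
    pivot_values = [row[0] for row in conn.execute(
        f"select distinct {quote(pivot_column)} from {quote(table)} order by 1")]
    if not pivot_values:
        raise ValueError("nothing to pivot")
    # The values are bound as parameters; only the column aliases are generated.
    cases = ", ".join(
        f"sum(case when {quote(pivot_column)} = ? then {quote(value_column)} end)"
        f" as {quote(v)}"
        for v in pivot_values)
    sql = (f"select {quote(row_key)}, {cases} from {quote(table)} "
           f"group by {quote(row_key)} order by {quote(row_key)}")
    cursor = conn.execute(sql, pivot_values)
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customerid integer, prosubsclassid integer, amount integer)")
conn.executemany("insert into sales values (?, ?, ?)", [
    (130207, 1413232, 1), (130208, 1413232, 2),
    (130208, 1413232, 1), (130207, 3435545, 2),
])
header, data = pivot(conn, "sales", "prosubsclassid", "customerid", "amount")
print(header)
print(data)
```

Combinations with no rows come back as NULL (None in Python); wrap the SUM in coalesce(..., 0) if zeros are preferred.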

Equivalent to PostgreSQL array() / array_to_string() functions in Oracle 9i

I'm hoping to return a single row with a comma separated list of values from a query that returns multiple rows in Oracle, essentially flattening the returned rows into a single row.
In PostgreSQL this can be achieved using the array and array_to_string functions like this:
Given the table "people":
id | name
---------
1 | bob
2 | alice
3 | jon
The SQL:
select array_to_string(array(select name from people), ',') as names;
Will return:
names
-------------
bob,alice,jon
How would I achieve the same result in Oracle 9i?
Thanks,
Matt
Tim Hall has the definitive collection of string aggregation techniques in Oracle.
If you're stuck on 9i, my personal preference would be to define a custom aggregate (there is an implementation of string_agg on that page) such that you would have
SELECT string_agg( name )
FROM people
But you have to define a new STRING_AGG function. If you need to avoid creating new objects, there are other approaches but in 9i they're going to be messier than the PostgreSQL syntax.
In 10g I definitely prefer the COLLECT option mentioned at the end of Tim's article.
The nice thing about that approach is that the same underlying function (that accepts the collection as an argument), can be used both as an aggregate and as a multiset function:
SELECT deptno, tab_to_string(CAST(MULTISET(SELECT ename FROM emp
WHERE deptno = dept.deptno) AS t_varchar2_tab), ',') FROM dept
However in 9i that's not available. SYS_CONNECT_BY_PATH is nice because it's flexible, but it can be slow, so be careful of that.

SQL make rows into columns, PIVOT maybe

I have an MS SQL Server with a database for an E-commerce storefront.
This is some of the tables I have:
Products:
Id | Name | Price
ProductAttributeTypes: -Color, Size, Format
Id | Name
ProductAttributes: --Red, Green, 12x20 cm, Mirrored
Id | ProductAttributeTypeId | Name
Orders:
Id | DateCreated
OrderItems:
Id | OrderId | ProductId
OrderItemsToProductAttributes: --Relates an OrderItem to its product and selected attributes
OrderItemId | ProductAttributeId | ProductAttributeTypeId | ProductId
I want to select from the OrderItems table, to see which items have been purchased.
To see which variants (ProductAttributes) were selected, I want those as "dynamic" columns in the result set.
So the resultset should look like this:
OrderItemId | ProductId | ProductName | Color | Size | Format
1234 123 Mount. Bike Red 2x20 Mirror
I don't know if PIVOT is the thing to use? I'm not using any aggregate functions, so I guess not...
Are there any SQL ninjas who can help me out?
If you are using SQL Server 2005 or 2008 you can use the PIVOT operator. See here.
In the example below the OrderAttributes set will look like:
OrderItemId AttName AttValue
----- ------ -----
100 Color Red
100 Size Small
101 Color Blue
101 Size Small
102 Color Red
102 Size Small
103 Color Blue
103 Size Large
The final results after the PIVOT will be:
OrderItemId Size Color
----- ------ -----
100 Small Red
101 Small Blue
102 Small Red
103 Large Blue
WITH OrderAttributes(OrderItemId, AttName, AttValue)
AS (
SELECT
OrderItemId,
pat.Name AS AttName,
pa.Name AS AttValue
FROM OrderItemsToProductAttributes x
INNER JOIN ProductAttributes pa
ON x.ProductAttributeId = pa.id
INNER JOIN ProductAttributeTypes pat
ON pa.ProductAttributeTypeId = pat.Id
)
SELECT AttrPivot.OrderItemId,
[Size] AS [Size],
[Color] AS Color
FROM OrderAttributes
PIVOT (
MAX([AttValue])
FOR [AttName] IN ([Color],[Size])
) AS AttrPivot
ORDER BY AttrPivot.OrderItemId
There is a way to build the columns dynamically (i.e. the Color and Size columns), as can be seen here. Make sure the compatibility level of your database is set higher than SQL Server 2000's (that is, 90 or above), or you will get strange errors.
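For anyone who wants to check what the PIVOT is doing, here is a small Python sketch that performs the same reshaping on the OrderAttributes rows listed above. It is only an illustration of the logic, not a replacement for the query; the function name pivot_attributes is made up. When no column list is given, the columns are discovered from the data, which is the part a dynamic PIVOT has to do in SQL.

```python
from collections import defaultdict

# The OrderAttributes rows from the example above.
order_attributes = [
    (100, "Color", "Red"), (100, "Size", "Small"),
    (101, "Color", "Blue"), (101, "Size", "Small"),
    (102, "Color", "Red"), (102, "Size", "Small"),
    (103, "Color", "Blue"), (103, "Size", "Large"),
]


def pivot_attributes(rows, columns=None):
    """Turn (OrderItemId, AttName, AttValue) rows into one tuple per OrderItemId."""
    if columns is None:
        # Discover the attribute names from the data itself.
        columns = sorted({att_name for _, att_name, _ in rows})
    by_item = defaultdict(dict)
    for order_item_id, att_name, att_value in rows:
        current = by_item[order_item_id].get(att_name)
        # Mirror MAX(AttValue): keep the largest value if a name repeats.
        if current is None or att_value > current:
            by_item[order_item_id][att_name] = att_value
    return [
        (item_id, *(attrs.get(column) for column in columns))
        for item_id, attrs in sorted(by_item.items())
    ]


pivoted = pivot_attributes(order_attributes, ["Size", "Color"])
for row in pivoted:
    print(row)
```

The MAX in the SQL version plays the same role as the "keep the largest" branch here: PIVOT requires an aggregate even when each OrderItemId has at most one value per attribute.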
In the past, I've created physical tables for read purposes only. The structure you have above is GREAT for storage, but terrible for reporting.
So you could do the following:
Write a script (that is scheduled nightly) or a trigger (on data change) that does the following tasks:
First, you would dynamically go through each Product and build a static table "Product_[ProductName]"
Then go through each ProductAttributeTypes for each product and create/update/delete a physical column on the corresponding Product table.
Then, fill that table with the proper values based on OrderItemsToProductAttributes and ProductAttributes
This is just a rough idea. Make sure you are storing OrderID in the "Static"/"Flattened" tables. And make sure you do everything else you need to do. But after that, you should be able to start pulling from those flattened tables to get the data you need.
Pivot is your best bet, but what I did for reporting purposes, and to make it work well with SSIS is to create a view, which then has this query:
SELECT [InputSetID], [InputSetName], CAST([470] AS int) AS [Created By], CAST([480] AS datetime) AS [Created], CAST([479] AS int) AS [Updated By], CAST([460] AS datetime)
AS [Updated]
FROM (SELECT st.InputSetID, st.InputSetName, avt.InputSetID AS avtID, avt.AttributeID, avt.Value
FROM app.InputSetAttributeValue avt JOIN
app.InputSets st ON avt.InputSetID = st.InputSetID) AS p PIVOT (MAX(Value) FOR AttributeID IN ([470], [480], [479], [460])) AS pvt
Then I can just interact with the view, but, I have a trigger on the table that any new dynamic attributes must be added to, which recreates this view, so I can assume the view is always correct.

Detecting Correlated Columns in Data

Suppose I have the following data:
OrderNumber | CustomerName | CustomerAddress | CustomerCode
1 | Chris | 1234 Test Drive | 123
2 | Chris | 1234 Test Drive | 123
How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perfectly? I'm thinking that Sql Server data mining is probably the right tool for the job, but I don't have too much experience with that.
Thanks in advance.
UPDATE:
By "correlate", I mean in the statistics sense, that whenever column a is x, column b will be y. In the above data, The last three columns correlate with each other, and the first column does not.
The input of the operation would be the name of the table, and the output would be something like :
Column 1 | Column 2 | Certainty
CustomerName | CustomerAddress | 100%
CustomerAddress | CustomerCode | 100%
There is a 'functional dependency' test built in to the SQL Server Data Profiling component (which is an SSIS component that ships with SQL Server 2008). It is described pretty well on this blog post:
http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx
I have played a little bit with accessing the data profiler output via some (under-documented) .NET APIs and it seems doable. However, since my requirement dealt with the distribution of column values, I ended up going with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and the output viewer.
What do you mean by correlate? Do you just want to see if they're equal? You can do that in T-SQL by joining the table to itself:
select distinct
case when a.OrderNumber < b.OrderNumber then a.OrderNumber
else b.OrderNumber
end as FirstOrderNumber,
case when a.OrderNumber < b.OrderNumber then b.OrderNumber
else a.OrderNumber
end as SecondOrderNumber
from
MyTable a
inner join MyTable b on
a.CustomerName = b.CustomerName
and a.CustomerAddress = b.CustomerAddress
and a.CustomerCode = b.CustomerCode
-- skip rows matched with themselves
and a.OrderNumber <> b.OrderNumber
This would return you:
FirstOrderNumber | SecondOrderNumber
1 | 2
Correlation is defined on metric spaces, and your values are not metric.
This will give you the fraction of customers whose customerAddress is not uniquely determined by customerName:
SELECT AVG(CAST(imperfect AS float)) -- cast so AVG does not do integer division
FROM (
SELECT
customerName,
CASE
-- a name that maps to a single address is a perfect match
WHEN COUNT(DISTINCT customerAddress) <= 1
THEN 0
ELSE 1
END AS imperfect
FROM orders
GROUP BY
customerName
) q
Substitute other columns instead of customerAddress and customerName into this query to find discrepancies between them.
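The same substitution can be done for every pair of columns at once. Below is a hedged Python sketch of that idea, run on the two sample rows from the question. The "certainty" here is defined as the fraction of distinct values of the first column that map to exactly one value of the second column; that definition is an assumption made for this example, and the function name dependency_strength is made up.

```python
from collections import defaultdict
from itertools import permutations


def dependency_strength(rows, determinant, dependent):
    """Fraction of distinct determinant values that map to exactly one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    if not seen:
        return 0.0
    consistent = sum(1 for values in seen.values() if len(values) == 1)
    return consistent / len(seen)


# The sample data from the question.
rows = [
    {"OrderNumber": 1, "CustomerName": "Chris",
     "CustomerAddress": "1234 Test Drive", "CustomerCode": 123},
    {"OrderNumber": 2, "CustomerName": "Chris",
     "CustomerAddress": "1234 Test Drive", "CustomerCode": 123},
]

# Check every ordered pair of columns.
report = {
    (a, b): dependency_strength(rows, a, b)
    for a, b in permutations(rows[0], 2)
}
for (a, b), certainty in report.items():
    print(f"{a} -> {b}: {certainty:.0%}")
```

Note that a unique column such as OrderNumber trivially determines every other column, so pairs that start from a key should be ignored when reading the report.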
