Dynamic PIVOT with varchar columns - sql-server

I'm trying to pivot rows into columns. I basically have lots of lines where every N rows corresponds to one row in the table I'd like to produce as a result set. I'll give a short example:
I have a table structure like this:
Keep in mind that I removed lots of rows to simplify this example. Every 6 rows corresponds to 1 row in the result set, which I would like to look like this:
All columns are varchar types (that's why I couldn't get it done with PIVOT)
The number of columns is dynamic; it equals the number of rows in the source table
Likewise, the number of rows (table rows in the result set) is dynamic

(Not really an answer, but it's what I've got.)
This is a name/value pair table, right? Your query will require something that identifies which "set" of rows is associated with one another. Without something like this, I don't see how the query can be written. The key factor is that you must never assume that data will be returned from SQL (Server, at least) in any particular order. How the data is stored internally generally, but not always, determines how it is returned when order is not specified.
Another consideration: what if (when?) a row is missing -- say, Product 4 has no Price B column? That would break a simple "every six rows" rule. "Start fresh with every new Code row" would cause problems if a Code is missed or when (not if) data is not returned in the anticipated order.
If you have some means of grouping items, let us know in an updated question, but otherwise I don't think this one is particularly solvable.

I actually did it.
I wrote a SQL WHILE loop based on the number of columns registered for the result set. This way I could build a dynamic SQL clause for N columns based on the values read. In the end I just inserted the result set into a temp table, and voilà.
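For reference, the same idea can also be done with dynamic SQL and PIVOT; a sketch, assuming a hypothetical source table SourceData(RowGroup, ColName, ColValue) where RowGroup identifies which rows belong together (all names here are assumptions, not from the question):

```sql
-- Build the column list from the data, then pivot name/value rows.
-- MAX works on varchar, which sidesteps the "all columns are varchar" issue.
DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME(ColName)
    FROM SourceData
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '');

SET @sql = N'
SELECT RowGroup, ' + @cols + N'
FROM SourceData
PIVOT (MAX(ColValue) FOR ColName IN (' + @cols + N')) AS p;';

EXEC sp_executesql @sql;
```

Because the column list is built at run time, this handles a dynamic number of columns the same way the WHILE loop does.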
Thanks anyways!

Related

With what logic are results retrieved when an aggregate function is applied to columns containing text data?

I was retrieving some values from a few sets of tables of a database to build a dataset. Each column of that dataset depends on the value of a column holding the primary key.
So the problem is here...
There is a column holding textual data in the dataset which affects the values of the columns next to it.
Let's consider them as col1 and col2.
I was trying to get the values of col1 by using aggregate functions like MAX and MIN, and they were giving me correct results for a set of primary keys. After some time, when the primary keys change or I apply this logic to another dataset from the same database, it does not give me correct values.
I think it works perfectly for columns having 2 values like 'A' or 'B', but the moment the number of values increases to 3 or more it stops working correctly.
Is there a solution for this?
MAX and MIN, when applied to text, essentially sort the whole column alphabetically (according to the collation of the column or the database) and then take the last (MAX) or first (MIN) value respectively.
I've never encountered a problem with this as an algorithm; it has always correctly chosen the alphabetically latest or earliest value regardless of the count of values. But bear in mind that collation will affect things; different languages sort differently, and you should closely consider how a culture sorts its alphabet when applying MIN or MAX to text.
Also remember that MIN, MAX and grouping operations mix row data up, so you don't get to keep other data from the same row when it is part of a grouping operation. If you want e.g. "the latest row as defined by textcolumn1, plus all the other data from that row", you'd probably need to use ROW_NUMBER() OVER (ORDER BY textcolumn1 DESC) and then pick the row where the output of ROW_NUMBER was 1.
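A minimal sketch of that ROW_NUMBER pattern (the table and column names here are made up for illustration):

```sql
-- Keep every column from the row holding the alphabetically latest value
-- of textcolumn1 per key. MyTable and KeyCol are hypothetical names.
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY KeyCol
                              ORDER BY textcolumn1 DESC) AS rn
    FROM MyTable
)
SELECT *
FROM ranked
WHERE rn = 1;
```

Note the DESC: descending order makes row number 1 the "latest" value; drop it to get the earliest instead.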

Database design: ordered set

task_set is a table with two columns (id, task):
id task
1 shout
2 bark
3 walk
4 run
Assume there is another table with two columns (employee, task_order).
task_order is an ordered set of tasks, for example (2,4,3,1).
Generally the task_order is unchanged, but sometimes tasks may be inserted or deleted, e.g. (2,4,9,3,1) or (2,4,1).
How should such a database be designed? I mean, how do you realize the ordered set?
If, and ONLY if, you don't need to search inside the task_set column, or update one of its values (i.e. change 4,2,3 to 4,2,1), keeping that column as a delimited string might be an easy solution.
However, if you ever plan on searches or updates for specific values inside the task_set, then you had better normalize that structure into a table that holds employee id, task id, and task order.
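A sketch of that normalized structure (table and column names are assumptions): one row per (employee, task), with an explicit ordering column.

```sql
CREATE TABLE employee_task (
    employee_id int NOT NULL,
    task_id     int NOT NULL REFERENCES task_set (id),
    task_order  int NOT NULL,
    PRIMARY KEY (employee_id, task_id)
);

-- The ordered set (2,4,3,1) for a hypothetical employee 7 becomes:
INSERT INTO employee_task (employee_id, task_id, task_order)
VALUES (7, 2, 1), (7, 4, 2), (7, 3, 3), (7, 1, 4);

-- Read it back in order:
SELECT task_id
FROM employee_task
WHERE employee_id = 7
ORDER BY task_order;
```

One common refinement is to number the positions with gaps (10, 20, 30, ...) so an insertion in the middle doesn't force renumbering every following row.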

Is it a good idea to include flags in Fact table

The transactional fact table of one of the star schemas needs to answer questions like "is the first application the final application?" This is associated with one of the business processes.
Is it a good idea to keep this as part of the fact table, with a column named
IsFirstAppLastFlag?
There are not enough flags to create a separate dimension. Also, this (calculated) flag is essential in report writing. In this context, should we keep it in a dimension or in the fact table?
I assume junk dimensions are meant for those flags/low-cardinality columns which are not otherwise useful but can be kept inside a dimension?!
This will depend on your own needs but if you like the purest view of the fact table then the answer is no, these fields should not be included in your fact table.
The fact table should include dimension keys, degenerate dimension keys, and facts.
IsStatusOne, IsStatusTwo, etc. are attributes and, as you rightly suggest, would be well suited to a junk dimension in the absence of a more suitable dimension for them to belong to; e.g., IsWeekDay would be suited to the "Date" dimension table.
You may start off with only a few "Is" attributes in your fact table but over time you may need more and more of these attributes, you will look back and possibly wish you created a junk dimension.
Performance:
Interestingly, if you are using bit columns for your flags, then there is little storage difference between using 8 bit flags in your fact table and having one tinyint dimension key. However, when your flags are more verbose or have multiple status values, you should use the junk dimension to improve performance on the fact table: less storage, less memory, more rows per page, etc.
Personally, I would junk them
That seems fine, as long as it is an attribute of the fact, not of one of the dimensions. In some cases I think you might have a slowly changing dimension in which it would be more appropriately placed.
I would be concerned that this plan might require updates on the fact table, for example if you were intending to flag that a particular fact was the most recent for a customer. If that was the case it might be better to keep a transaction number in the fact table, and a "most recent transaction number" in the dimension table, and provide an indexing method to effectively retrieve the most recent per-customer.
You can use a junk dimension.
Instead of creating several dimensions with few rows, you can create one dimension with all possible combinations of values; then you add just one foreign key to your fact table.
You can populate your junk dimension with a query like the one below.
WITH cteFlags AS
(
    SELECT 'N' AS Value
    UNION ALL
    SELECT 'Y'
)
SELECT
    Flag1.Value AS Flag1,
    Flag2.Value AS Flag2,
    Flag3.Value AS Flag3
FROM
    cteFlags AS Flag1
    CROSS JOIN cteFlags AS Flag2
    CROSS JOIN cteFlags AS Flag3
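To turn that cross join into an actual dimension table, something along these lines works (the table name DimJunkFlags and the surrogate key are assumptions, not from the answer):

```sql
-- Hypothetical junk dimension table with a surrogate key.
CREATE TABLE DimJunkFlags (
    JunkKey tinyint IDENTITY(1,1) PRIMARY KEY,
    Flag1   char(1) NOT NULL,
    Flag2   char(1) NOT NULL,
    Flag3   char(1) NOT NULL
);

WITH cteFlags AS
(
    SELECT 'N' AS Value
    UNION ALL
    SELECT 'Y'
)
INSERT INTO DimJunkFlags (Flag1, Flag2, Flag3)
SELECT Flag1.Value, Flag2.Value, Flag3.Value
FROM cteFlags AS Flag1
CROSS JOIN cteFlags AS Flag2
CROSS JOIN cteFlags AS Flag3;
```

Each fact row then stores only the single JunkKey instead of the individual flag columns.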

T-SQL: query which joins with all dependent tables and produce cartesian product

I have a bunch of tables which refer to some number of other tables (zero, one, two or more).
My example tables might contain following columns:
Id | StatementTable1Id | StatementTable2Id | Value
where StatementTable1 will contain following columns:
Id | Name | Label
I wish to get all possible combinations and join all of them.
I found this link very useful (a query which produces information about dependencies).
I would imagine my code as follows:
Prepare list of tables which I wish to query.
Query link for all my tables and save results into temporary table.
Check maximum number of dependent tables. Prepare query template - for example if maximum number of dependent tables is equal two:
Select
Id, '%Table1Name%' as Table1Name,
'%StatementLabelTable1%' as StatementLabelTable1,
'%Table2Name%' as Table2Name,
'%StatementLabelTable2%' as StatementLabelTable2, Value"
Use cursor - for each dependent table replace appropriate part with dependent table name and label of elements within it.
When all dependent tables have been used - replace all remaining columns with empty string.
add "UNION ALL" and proceed to next table
Run query
Could you tell me if there's any easier or better way?
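For the dependency-gathering step, the information can be read from SQL Server's catalog views; a sketch (these views and functions are standard SQL Server, but the column aliases are my own):

```sql
-- List each foreign key with its referencing and referenced tables/columns.
SELECT
    fk.name                               AS ForeignKeyName,
    OBJECT_NAME(fkc.parent_object_id)     AS ReferencingTable,
    COL_NAME(fkc.parent_object_id,
             fkc.parent_column_id)        AS ReferencingColumn,
    OBJECT_NAME(fkc.referenced_object_id) AS ReferencedTable,
    COL_NAME(fkc.referenced_object_id,
             fkc.referenced_column_id)    AS ReferencedColumn
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
    ON fk.object_id = fkc.constraint_object_id;
```

The result of this query is what you would save into the temporary table before building the per-table query templates.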
What you've listed there sounds like what you'll need to do if you don't know the column details ahead of time. There will likely be some trial and error to get the details correct, but it's a good plan to start with.
That being said, why on earth would you want to do such a thing? It sounds like you need to narrow down your requirements on what data is actually needed. Otherwise, as you add data to your database, this query and resulting data set is going to quickly become quite unwieldy (these data sets are the kinds you hear about becoming daily "door-stop reports"; no one uses them, but they never remember why it was created, so they keep running the report, and just use it as a door-stop).

Creating a Non-Int and Non-Guid Unique Identifier

I'm looking for a way for SQL Server to generate a unique identifier that is not an incrementing int or a GUID.
The unique ID can be a combination of letters and numbers, with no other characters, and as previously mentioned must be unique.
ie AS93K239DFAK
And if possible must always start with AS or end with an K
It would be nice if this unique ID could be generated automatically on insert, the way GUIDs and IsIdentity = Yes do. It can be a random number; it is not predetermined in the app.
Is doing something like this possible, or does it have to be generated application-side?
From comments, it sounds like you would be OK with using an IDENTITY field and padding it with 0s and adding a prefix/suffix. Something like this should work:
1 - Add an IDENTITY field which will be auto-incremented
2 - Add a calculated field in the table with the definition of:
[InvoiceNo] AS ('AS' + RIGHT(('000000000' + CAST(idfield AS varchar(9))), 9) + 'FAK')
This will give you invoiceno in the format of:
AS000000001FAK
AS000000002FAK
...
AS000995481FAK
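Put together, a table sketch of that answer (table and column names are illustrative):

```sql
CREATE TABLE Invoice (
    IdField   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    -- Computed invoice number: 'AS' + zero-padded identity + 'FAK'
    InvoiceNo AS ('AS' + RIGHT('000000000'
                  + CAST(IdField AS varchar(9)), 9) + 'FAK'),
    Amount    money NOT NULL
);
```

Because InvoiceNo is computed from the IDENTITY value, it is generated automatically on insert and is guaranteed unique without any application-side logic.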
I've never seen a randomly generated invoice number. Most of them are usually a combination of multiple identifying fields. For example, one segment might be the CompanyID, another might be the InvoiceID, and a third might be a date value.
For example, AS-0001-00005-K or AS-001-00005-021712-K, which would stand for CompanyId 1, Invoice #5, generated on 2/17/12
You said in a comment that you don't want to let the company know a count of how many past invoices there are, and this way they won't know the count except for how many invoices they have received, which is a value they should know anyways.
If you're concerned about giving away how many companies there are, use an alpha company code instead, so your end result looks like AS-R07S-00005-K or ASR07S00005K
So you can do it this way, just don't expect it to perform well.
(1) populate a big massive table with some exhaustive set of invoice values - which should be at least double the number of invoices you think you'll ever need. Populate the data in random order in advance.
(2) create a stored procedure that pulls the next invoice off the pile, and then either deletes it or marks it as taken.
But, be sure that this solution makes sense for your business. In many countries it is actually law for invoice numbers to be sequential. I'm guessing we're not really talking about invoices, but wanted to make sure it's at least considered.
What is so confusing about the random part being unique? If you have a two-digit invoice number there can only be 100 unique values (00-99). A GUID has 2 to the power 128 values and is statistically unique. If you use 8 characters of a GUID, then with even 1 million invoices you have a fair chance of getting a collision. With 1 million invoices, if you use 12 characters of a GUID, you have a very good chance of NOT getting a collision. If you use 16 characters of a GUID, then you are pretty much statistically unique if you have fewer than 1 billion invoices. I would use 12 characters, but check against actual values for uniqueness; then you only have a lottery-level chance of getting a collision.
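A sketch of that idea in T-SQL (the 'AS' prefix and 'K' suffix come from the question; checking the result against existing values remains your responsibility):

```sql
-- Take 12 hex characters of a GUID and wrap them in the required
-- prefix/suffix. The variable name is arbitrary.
DECLARE @id varchar(15);
SET @id = 'AS'
        + LEFT(REPLACE(CONVERT(char(36), NEWID()), '-', ''), 12)
        + 'K';
SELECT @id;  -- a different random value each time
```

A unique index on the column plus a retry on the rare collision would make this safe in practice.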
How are you inserting these new invoices to the table? A straight up batch insert or are you doing some business logic/integrity checks in a stored procedure first and 'creating' the invoices one by one?
In the second case, you could easily build a unique ID in the procedure. You could store a seed number in a table and take the number from there then cast it as a varchar and append the alphanumeric characters, you can then increment the seed. This also gives you the option of creating a gap between unique IDs if you needed to import some records into the gap at a later date.
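A minimal sketch of that seed-table approach (all object names here are assumptions):

```sql
CREATE TABLE InvoiceSeed (NextNo int NOT NULL);
INSERT INTO InvoiceSeed (NextNo) VALUES (1);
GO

CREATE PROCEDURE GetNextInvoiceNo
    @InvoiceNo varchar(15) OUTPUT
AS
BEGIN
    DECLARE @n int;
    -- Read and increment the seed in a single atomic UPDATE.
    UPDATE InvoiceSeed
    SET @n = NextNo, NextNo = NextNo + 1;
    SET @InvoiceNo = 'AS'
                   + RIGHT('000000000' + CAST(@n AS varchar(9)), 9)
                   + 'K';
END
```

Because the read and the increment happen in one UPDATE statement, two concurrent callers cannot receive the same number.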
