Insert characters in SSIS - sql-server

I have a requirement where I need to insert special characters for a particular column after the interval 2,3,3
example
ABCDEFGHIJKL -> AB-CDE-FGH-IJKL
I know I need to use Derived column in SSIS 2012 but I am stuck with the expression. I would really appreciate if anyone can help me out with the correct expression

The logic will be that you need to split your string at a given ordinal position into two pieces and then concatenate those pieces back together with your special character.
A derived column is going to be a bit ugly because the language doesn't have the power of the .net libraries. We'll make heavy use and abuse of SUBSTRING to get the job done
SUBSTRING(MyCol, 1, 2) + "-" + SUBSTRING(MyCol, 3, LEN(MyCol) - 2)
That applies the first special character logic. To simplify matters, I would add this column as MyColStep1 to the data flow. I then add a second Derived Column task that uses the above logic but instead uses MyColStep1 as the input. Taking this approach will make it much, much easier to debug (since you can attach a data viewer on the output path of each component).
SUBSTRING(MyColStep1, 1, 6) + "-" + SUBSTRING(MyColStep1, 7, LEN(MyColStep1) - 6)
Etc.
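The same split-and-rejoin logic, sketched outside SSIS in Python purely as an illustration. The function names are hypothetical, and the positions 2, 6 and 10 assume a 12-character input like the one in the question, with each position counted on the output of the previous step.

```python
def insert_at(text, position, separator="-"):
    """Split text after `position` characters and rejoin with the separator."""
    return text[:position] + separator + text[position:]


def insert_separators(value):
    # Each step works on the output of the previous one, so the
    # positions already account for separators inserted earlier.
    step1 = insert_at(value, 2)    # AB-CDEFGHIJKL
    step2 = insert_at(step1, 6)    # AB-CDE-FGHIJKL
    step3 = insert_at(step2, 10)   # AB-CDE-FGH-IJKL
    return step3


print(insert_separators("ABCDEFGHIJKL"))
```

Keeping one step per intermediate value mirrors the suggestion of one Derived Column per step: each intermediate result can be inspected on its own.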

Related

How to compare a part of a string using like function SQL

I'm not sure if like function can be used to compare strings or if there is another function to achieve this but this is my case, I have the below part:
R71-14-40000-ATN-LH-D-PF, for the third segment (40000) which is the length; the first 3 digits are the integer part and last 2 digits are the decimals.
I would like to get all parts from the DB where the length (third segment) is equal to or greater than that value. For example, if I use the above part I should get the yellow values and omit the other ones (the values can also be R71-14-50000-ATN-LH-D-PF, R71-14-55000-ATN-LH-D-PF, R71-14-60000-ATN-LH-D-PF, not only ones that start with 4, etc).
I tried this PartNum like '%R71-14-%-ATN-LH-D-PF%' but I get all parts no matter its third segment value
You can use a substring, I think:
where substring(col, 8, 5) >= substring('R71-14-40000-ATN-LH-D-PF', 8, 5)
Some databases use substr() rather than substring().
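A small runnable sketch of the substring comparison, using Python's built-in sqlite3 module as a stand-in database (so it uses substr() rather than substring()). The table name parts and the sample rows are made up for the illustration. The comparison is on text, which is only safe here because the third segment is fixed-width and zero-padded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (PartNum TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO parts VALUES (?)",
    [
        ("R71-14-03550-ATN-LH-D-PF",),
        ("R71-14-30000-ATN-LH-D-PF",),
        ("R71-14-40000-ATN-LH-D-PF",),
        ("R71-14-55000-ATN-LH-D-PF",),
        ("R71-14-60000-ATN-LH-D-PF",),
    ],
)

reference = "R71-14-40000-ATN-LH-D-PF"
rows = conn.execute(
    # Characters 8..12 hold the five-digit third segment (1-based positions).
    "SELECT PartNum FROM parts "
    "WHERE substr(PartNum, 8, 5) >= substr(?, 8, 5) "
    "ORDER BY PartNum",
    (reference,),
).fetchall()
matches = [row[0] for row in rows]
print(matches)
```

Note that this filter only looks at the third segment; if the other segments also have to match, an extra condition on them would be needed.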
Using a more restrictive LIKE pattern such as
PartNum LIKE 'R71-14-4____-ATN-LH-D-PF'
would answer the particular query for "values with the 3rd-segment starting with a 4". It could also be ..14-4%-ATN.., although I chose the _ match-exactly-one wildcard for explicitness of a fixed 3rd-segment length (5); it's also easier for the engine to match against.
Expanding this to "equal to or greater than 4" under this fixed-width data can then be done by choosing a 3rd segment starting with a 4, or 5, or 6..
PartNum LIKE 'R71-14-[456789]____-ATN-LH-D-PF'
This works in SQL Server, although there might be slight variations in different RDBMS implementations. This approach is lexical, which works fine on single-character integer values even though it does not use numeric comparison. SQL Server also supports character negations that can be useful - see the documentation for the specific RDBMS.
The leading and trailing % are not needed per the shown data. Using a leading % can also be very detrimental to index usage.
The trailing % makes more sense if not caring about the remaining segments,
PartNum LIKE 'R71-14-[456789]____-%'
And if needing to only care about the 3rd-segment,
PartNum LIKE '___-__-[456789]____-%'
PartNum LIKE '___-__-[456789]%' -- or even this
Note the difference from the original query (..14-%-ATN..), which matches all values as expected. This is because it does not add any restrictions to the 3rd-segment value.
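To try these patterns without a SQL Server instance, here is a rough emulation in Python. It is only an approximation: the helper name like is made up, and it maps % to * and _ to ? so that the standard fnmatch module can do the matching. Bracket sets such as [456789] work the same way in both, but negation is written [^...] in SQL Server and [!...] in fnmatch, so negated sets are not covered.

```python
from fnmatch import fnmatchcase


def like(value, pattern):
    """Approximate a SQL Server LIKE match (no negated sets, no escapes)."""
    translated = pattern.replace("%", "*").replace("_", "?")
    return fnmatchcase(value, translated)


parts = [
    "R71-14-30000-ATN-LH-D-PF",
    "R71-14-40000-ATN-LH-D-PF",
    "R71-14-55000-ATN-LH-D-PF",
]

# The original pattern puts no restriction on the 3rd segment.
print([p for p in parts if like(p, "%R71-14-%-ATN-LH-D-PF%")])

# The bracket set restricts the first digit of the 3rd segment.
print([p for p in parts if like(p, "R71-14-[456789]____-ATN-LH-D-PF")])
```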

SQL Server: decimal(38,35) column values are truncated when cast to float

We have a column in SQL Server where we need to keep all the digits, as it is a transaction table. The column's data type is decimal(38,35), and SQL Server pads the value with zeroes: 1.2369 is displayed as 1.236900000000000000... The zeroes can be removed by casting to float, like
"select cast(cast('1.2369'as decimal(38,35))as float)". That drops all the trailing zeroes, but when we use the same expression on a longer decimal value such as 1.236597879646479444896645, it cuts off the trailing digits and keeps only about 15 digits. If anybody knows the reason for this, please help me. Thank you.
Note: The values are always dynamic.
Because float has a precision of 15 digits.
See documentation: float and real
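The effect is not specific to SQL Server. A quick illustration in Python, whose float is the same 8-byte IEEE 754 double that SQL Server uses for float(53); the variable names are only for this sketch.

```python
from decimal import Decimal

original = "1.236597879646479444896645"

as_float = float(original)      # binary double: digits beyond roughly 15-17 are lost
as_decimal = Decimal(original)  # exact decimal: every digit is kept

round_trip = repr(as_float)
print(round_trip)
print(as_decimal)
```

The float prints with far fewer digits than the input, while the decimal value prints exactly what was put in.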
To format a DECIMAL(38,35) without insignificant zeroes, use an explicit FORMAT string, e.g.
SELECT FORMAT(1.23690000000000000, '0.' + REPLICATE('#', 35))
gives 1.2369 (SQL Server 2012 and up). Note, however, that the resulting type is a string, not a number, and so this should only ever be done as the final step (and only if your client software isn't smart enough to format the values on its own). While you're calculating with it, there is either no need to cut off digits, or else you need to be explicit about it by converting to the appropriate DECIMAL (e.g. 1.2369 fits in a DECIMAL(5, 4)). SQL Server can't do this automatically because it doesn't know what kind of precision you're going for in your calculations, but it is definitely something you must take into account, because combining DECIMALs of different scale and precisions can give unexpected results when you're close to the edge.
If you want to remove trailing 0s, recognize that this is now string formatting, not any numeric processing. So we'll convert it to a string and then trick RTRIM into doing the job for us:
select REPLACE(
RTRIM(
REPLACE(
CONVERT(varchar(40),convert(decimal(38,35),'1.2069'))
,'0',' '))
,' ','0')
As I said in a comment though, it's usually more appropriate to put these presentation concerns in your presentation layer, rather than doing it down in the database - especially if whatever is consuming this result set wants to work with this data numerically. In that case, leave it alone - the trailing zeroes are only there because that's how Management Studio chooses to format decimals for display.
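If the trimming does end up in the presentation layer, the same replace / right-trim / replace trick carries over directly. A minimal Python sketch, with a hypothetical function name, assuming the value arrives as a string with no spaces in it:

```python
def strip_trailing_zeros(text):
    """Turn zeros into spaces, trim spaces on the right, turn spaces back into zeros."""
    return text.replace("0", " ").rstrip().replace(" ", "0")


# 1.2069 padded out to 35 decimal places, as a decimal(38,35) would display it.
padded = "1.2069".ljust(2 + 35, "0")
print(strip_trailing_zeros(padded))
```

As with the T-SQL version, a whole number such as 1.000 comes out as "1." with a dangling decimal point, so that case may need extra handling.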

Is it better to maintain 3 small columns or 1 large column in a Table?

Three small number columns [Number(1)] >>
OptionA | 0/1
OptionB | 0/1
OptionC | 0/1
or one larger string column [Varchar2(29)] >>
Options | OptionA=0/1|OptionB=0/1|OptionC=0/1
I'm not sure about the way database handles tables, but I think that maintaining three columns as Number(1) is better than one column as Varchar2(29) !
-EDIT-
Let me explain the situation a bit more:
I am working on a common framework where all incoming/outgoing requests/responses are tracked; these interactions can be channeled to a DB/File/JMS. All the configuration is loaded from a table which has a column that corresponds to the output type. Currently I'm using "DB=1|FILE=1|JMS=0" as the value of that column so that later, if anyone wants to add this for their module, they can easily understand what is going on. In my code I've written simple logic which splits the string by "|", and then I use the exclusive or operator to switch between choices using a switch case.
Everything is already done, but I don't like the idea that one large column is better than three small ones; three columns would also remove the string splitting I'm doing.
-EDIT-
I finally got it clarified, there may be a situation where we have to add more options; in that case if we add the data column wise, it will result in modifying the table + changing the entity + adding more if's n all; on the other hand I ended up making an enum out of it in a simple bit wise logic to switch between options; this way, I need to modify the enum & add a new handler for the new option & then we are good to go.
Using a single column to store multiple pieces of data is probably the worst thing you can do in a database.
Violating first normal form has at least the following disadvantages:
More difficult to query. OptionA = 1 and OptionB = 1 and OptionC = 0 versus substr(options, 9, 1) = '1' and substr(options, 19, 1) = '1' and substr(options, 29, 1) = '0'.
Less flexible. What happens when you need to add another option? Adding a new column is easy. Adding a new format could mess up old queries. For example, if someone tries to read OptionC with substr(options, -1, 1). (Although this is a good reason to use a 3rd option - a separate table.)
No type safety. This can be a very subtle and tricky problem. Let's say you write substr(options, 9, 1) = 1 instead of substr(options, 9, 1) = '1'. If anyone ever gets the format wrong, a single value could ruin lots of queries. Or worse, it only intermittently crashes a small number of queries, because the access paths keep changing. (Although you can prevent this with a check constraint.)
Slower queries. Normally the amount of work done in an expression or condition isn't a significant cost for a query. But adding a lot of unnecessary string manipulation can make a difference.
Less optimizing. Oracle can only build efficient query plans if it can understand your data. For example, let's say that OptionA is "0" 99.9% of the time. When you filter OptionA = 0, Oracle can use a histogram to make a very accurate prediction about the number of rows returned. But for substr(options, 9, 1) = '1' you'll only get a wild guess. If you have complicated queries using this column you may spend a lot of time trying to "fix" the cardinality estimates. (Although maybe expression statistics could help with this?)
There are times when denormalizing is a good idea. For example, if you have terabytes of data, and compress the table, the single column may take up less space. (But if you're trying to save space, why not use a format like "000" instead?).
If there really is a good reason for this, it definitely needs to be documented. Perhaps add a comment on the column.
For a start, if I am reading your question right, you want each of the options to have one of just two possible values, correct?
If so then you could:
have a separate integer (or boolean) column for each option
have an options column that is a string of 1's and 0's, one digit for each options e.g. "001"
use an 'options' column that is an integer and use a bit value for each options, e.g. optionA == options & 1, optionB == options & 2 etc.
some databases have a bit vector data type which you could use. For mysql there is the BIT data type, which can store bit strings up to 64 bits long.
There will be a trade-off between code complexity and efficiency for each of these. Ask yourself, how much of the machine's time or storage will be saved by employing each of these options? And how much of your time will be saved?
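To make the integer bit-value option concrete, here is a minimal sketch in Python. The flag names DB, FILE and JMS are taken from the question's example string; the class name and the stored value are only for illustration.

```python
from enum import IntFlag


class Output(IntFlag):
    DB = 1    # bit 0
    FILE = 2  # bit 1
    JMS = 4   # bit 2


# Value that would be written to the single integer column.
stored = int(Output.DB | Output.FILE)

# Reading it back: each option is a bit test, no string splitting needed.
options = Output(stored)
print(stored)
print(Output.DB in options)
print(Output.JMS in options)
```

Adding a new option means adding one enum member with the next power of two. The cost is that the column is no longer readable at a glance, and filtering on a single option in SQL needs a bitwise expression.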
In this instance the 3 column approach is the one I would recommend. Not only does it keep things simple in terms of extracting data, but you can also set values against all 3 columns independently rather than being limited to one VarChar2 field. If you opt for the single VarChar2 column, it is fairly simple to extract the info you need using substr or a similar function; this isn't heavy work for an Oracle db, but it does put extra work on the server which is not necessary.

number combination algorithm

Write a function that given a string of digits and a target value, prints where to put +'s and *'s between the digits so they combine exactly to the target value. Note there may be more than one answer, it doesn't matter which one you print.
Examples:
"1231231234",11353 -> "12*3+1+23*123*4"
"3456237490",1185 -> "3*4*56+2+3*7+490"
"3456237490",9191 -> "no solution"
If you have an N digit value, there are N-1 possible slots for the + or * operators. So with brute force, there are 3^(N-1) possibilities. Testing all of these is inefficient.
BUT
Your examples are all 10 digits. 3^9 = 19683, so brute force is FINE! No need to get any fancier.
So all you need to do is iterate through all 19683 cases, each time building a string for that case, and evaluating the expression. Evaluating the expression is a straightforward task. Iterating is straightforward (just use an incrementing value, you can read the state of the first slot by (i%3), which gives you "no operator" "+" or "*", the state of the second slot is (i/3)%3, the state of the third slot is (i/9)%3 and so on.)
Even with crude parsing code, CPUs are fast.
The brute force option starts becoming ugly after about 20 digits, and you'd have to switch to be more clever.
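A minimal Python sketch of the brute-force idea described above, reading the state of each slot from the base-3 digits of a counter. The function names are made up, the input is assumed to be a non-empty string of digits, and multi-digit numbers with a leading zero (such as "04") are simply read as integers.

```python
def evaluate(expr):
    """Evaluate digits joined by + and * with the usual precedence."""
    total = 0
    for term in expr.split("+"):
        product = 1
        for factor in term.split("*"):
            product *= int(factor)
        total += product
    return total


def solve_brute_force(digits, target):
    """Return one expression that hits the target, or None if there is none."""
    choices = ("", "+", "*")  # no operator, plus, times
    slots = len(digits) - 1
    for i in range(3 ** slots):
        parts = [digits[0]]
        state = i
        for digit in digits[1:]:
            parts.append(choices[state % 3])  # state of the current slot
            parts.append(digit)
            state //= 3                       # move on to the next slot
        expr = "".join(parts)
        if evaluate(expr) == target:
            return expr
    return None


print(solve_brute_force("1231231234", 11353))
```

Because more than one answer may exist, the expression printed can differ from the one in the examples while still evaluating to the target.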
If this is for the gaming programmer position, do not use the brute force approach. I did that but failed this a couple of years ago. Later heard from someone inside that dynamic programming approach is the one that gets the job.
This can be solved either by backtracking or by dynamic programming.
The "cleverer" approach (using dynamic programming) is basically this:
For each substring of the original string, figure out all possible values it can create. (e.g. in your first example "12" can become either 1+2=3 or 1*2=2, or stay as 12 with no operator)
There may be a lot of different combinations, but many of them will be duplicates. (Also, you should ignore all combinations that are greater than the target).
Thus, when you add a "+" or a "*", you can envision it as combining two substrings of the string. (and since you have the possible values for each substring, you can see if such a combination is possible)
These values can be generated similarly: try splitting the substring in all possible ways, and combine the different values in each half of the substring.
The total number of "states", then, is something like |S|^2 * target - for your example case, it's worse than the brute-force method. But if you had a string of length 1000 and a target of say 5000, then the problem would be solvable with dynamic programming.
Google Code Jam had an extended version of this problem last year (in Round 1C), called Ugly Numbers. You can visit that link and click "Contest Analysis" for some approaches to that problem, when extended to large numbers of digits.
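For the backtracking route mentioned at the start of this answer, a rough Python sketch follows. It is not the dynamic-programming formulation: it walks the string left to right, tracking the sum of the finished terms and the product of the term being built, and gives up on a branch once the finished sum alone exceeds the target. The function name is made up, and the pruning assumes a non-negative target and a non-empty digit string.

```python
def solve_backtracking(digits, target):
    """Return one expression that hits the target, or None if there is none."""
    n = len(digits)

    def search(pos, total, term, expr):
        # total: sum of completed terms, term: product of the current term
        if total > target:
            return None  # completed terms are never negative, so this cannot recover
        if pos == n:
            return expr if total + term == target else None
        for end in range(pos + 1, n + 1):
            text = digits[pos:end]
            number = int(text)
            # '+' closes the current term and starts a new one.
            found = search(end, total + term, number, expr + "+" + text)
            if found is not None:
                return found
            # '*' extends the current term.
            found = search(end, total, term * number, expr + "*" + text)
            if found is not None:
                return found
        return None

    for end in range(1, n + 1):
        first = digits[:end]
        found = search(end, 0, int(first), first)
        if found is not None:
            return found
    return None


print(solve_backtracking("3456237490", 1185))
```

The current term is deliberately not pruned against the target, because a later multiplication by 0 can still bring it back down.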

How can I find odd house numbers in MS SQL?

The problem is I need to ignore the stray Letters in the numbers: e.g. 19A or B417
Take a look here:
Extracting Numbers with SQL Server
There are several hidden gotchas that are explained pretty well in the article.
It depends on how much data you're dealing with, but doing that in SQL is probably going to be slow. Not everyone will agree with me here, but I think all data processing should be done in application code.
I would just take the rows you want, and filter it in the application you're dealing with.
The easiest thing to do here would be to create a CLR function which takes the address. In the CLR function, you would take the first part of the address (assuming it is the house number), which should be delimited by whitespace.
Then, replace any non-numeric characters with an empty string.
You should have a string representing an integer at that point which you can pass to the Parse method on the Int32 class to produce an integer, which you can then check to see if it is odd.
I recommend a CLR function (assuming you are using SQL Server 2005 and above, and can set the compatibility level of the database) here because it's easier to perform string manipulations in .NET than it is in T-SQL.
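The steps described above, sketched in Python rather than as an actual CLR function, just to show the logic. The function names are hypothetical, and the sketch assumes the house number is the first whitespace-delimited part of the address.

```python
def house_number(address):
    """Return the house number as an int, or None if there is none."""
    parts = address.split()
    if not parts:
        return None
    # Keep only the digits of the first token, so "19A" and "B417" both work.
    digits = "".join(ch for ch in parts[0] if ch in "0123456789")
    return int(digits) if digits else None


def is_odd_house(address):
    number = house_number(address)
    return number is not None and number % 2 == 1


for address in ["19A Main St", "B417 Elm Rd", "22 High St", "Main St"]:
    print(address, is_odd_house(address))
```

Addresses that do not start with the house number are treated as having no number at all, so they never count as odd.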
Assuming [Address] is the column with the address in it...
Select Case Cast(Substring(Reverse(Address), PatIndex('%[0-9]%',
Reverse(Address)), 1) as Integer) % 2
When 0 Then 'Even'
When 1 Then 'Odd' End
From Table
I've been through this drill before. The best alternative is to add a column to the table or to a subsidiary joinable table that stores the inferred numerical value for the purpose. Then use iterative queries to set the column repeatedly until you get sufficient accuracy and coverage. You'll end up encountering stuff like "First, Third," "451a", "1200 South 19th Blvd East", and worse.
Then filter new and edited records as they occur.
As usual, UDF's should be avoided as being slow and (comparatively) less debuggable.
