Convert varchar to integer from substring - sql-server

Using Microsoft SQL Server 2019.
I have parsed several columns out of a large text field and need to use the parsed data as a multiplier for other fields.
Here is a single parsed column, mostly for context:
SUBSTRING(curv.curv_data,
CHARINDEX('PctUsage', curv.curv_data, CHARINDEX('||1', curv.curv_data)) + 9,
(CHARINDEX(')())', curv.curv_data, CHARINDEX('PctUsage', curv.curv_data, CHARINDEX('||2', curv.curv_data))) - 9) - CHARINDEX('PctUsage', curv.curv_data, CHARINDEX('||1', curv.curv_data) + 9)) PctUsage1
Here is the output.
PctUsage1
5
3.5
0.5
6.5
5
9
1
1.3
1.5
25.
4
0.0
1.2
So, given there are values such as 0 and 0.5, I am having a hard time turning this column into something useful. The end result I would like is to turn these into usable percentages; for example, 5 turns into .05, and so on.
I am going to use the SQL above in a sub-select whose result will then be used as a multiplier for my working hours field. Here is an example of that field:
Working_hours
51.600006
0.000000
20.799993
8.399999
28.799999
86.027731
5.199999
This is the line I would like to be able to use:
PctUsage1*Working_hours curved_hours,
Working_hours is already numeric, but PctUsage1 is a varchar.
Thank you in advance for the help

Okay, so I think I have a solution. Sorry for any trouble you have gone to; writing out my problem really helped me work through it. I used a CASE expression with TRY_CAST, which helped sort out the different problems I was having when using only one conversion. The last problem was making sure my multiplier is correct, and for that I had to use CHARINDEX with an embedded CASE expression:
case
    when try_cast(PctUsage1 as int) is null then
        case
            when CHARINDEX('.', PctUsage1) = 3 then cast(replace(PctUsage1, '.', '') as Decimal(5,2)) / 100
            when CHARINDEX('.', PctUsage1) = 2 then cast(replace(PctUsage1, '.', '') as Decimal(5,2)) / 1000
        end
    else try_cast(PctUsage1 as decimal(5,2)) / 100
end PctUsage1_converted,
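The CASE expression above is, in effect, dividing the number by 100. The following Python sketch mirrors that arithmetic on the sample values from the question. The helper names `case_conversion` and `simple_conversion` are hypothetical, and `int()` only stands in for TRY_CAST. It shows that parsing the text as a decimal and dividing by 100 gives the same results for these inputs. It also shows that the CASE version goes wrong once a value has two or more digits after the point: "12.25" becomes 12.25 instead of 0.1225. In T-SQL, the analogous one-step expression would presumably be something like TRY_CAST(PctUsage1 AS decimal(9,4)) / 100. That is an untested suggestion, not part of the original answer.

```python
from decimal import Decimal
from typing import Optional

# Sample PctUsage1 strings taken from the question's output.
SAMPLES = ["5", "3.5", "0.5", "6.5", "5", "9", "1", "1.3",
           "1.5", "25.", "4", "0.0", "1.2"]


def case_conversion(pct: str) -> Optional[Decimal]:
    """Hypothetical Python mirror of the CASE expression above."""
    try:
        int(pct)  # stands in for TRY_CAST(PctUsage1 AS int) succeeding
        return Decimal(pct) / 100
    except ValueError:
        pos = pct.find(".") + 1  # CHARINDEX is 1-based and returns 0 if absent
        digits = Decimal(pct.replace(".", ""))
        if pos == 3:
            return digits / 100
        if pos == 2:
            return digits / 1000
        return None  # a CASE with no matching WHEN and no ELSE yields NULL


def simple_conversion(pct: str) -> Decimal:
    """Hypothetical helper: parse the text as a decimal and divide by 100."""
    return Decimal(pct) / 100


# Both conversions agree on every sample value from the question.
for s in SAMPLES:
    assert case_conversion(s) == simple_conversion(s), s

# Multiplying by a Working_hours value, as in PctUsage1*Working_hours.
print(simple_conversion("5") * Decimal("51.600006"))
```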

Related

Calculate working days between two dates: null value returned

I'm trying to figure out the number of working days between two dates. The table (dfDates) is laid out as follows:
Key  StartDateKey  EndDateKey
1    20171227      20180104
2    20171227      20171229
I have another table (dfDimDate) with all the relevant date keys and whether the date key is a working day or not:
DateKey   WorkDayFlag
20171227  1
20171228  1
20171229  1
20171230  0
20171231  0
20180101  0
20180102  1
20180103  1
20180104  1
I'm expecting the following result:
Key  WorkingDays
1    6
2    3
So far (I realise this isn't complete enough to get me the above result), I've written this:
workingdays = []
for i in range(0, len(dfDates)):
    value = dfDimDate.filter((dfDimDate.DateKey >= dfDates.collect()[i][1]) & (dfDimDate.DateKey <= df.collect()[i][2])).agg({'WorkDayFlag': 'sum'})
    workingdays.append(value.collect())
However, only null values are being returned. Also, I've noticed this is very slow and took 54 seconds before it errored.
I think I understand what the error is about but I'm not sure how to fix it. Also, I'm not sure how to optimise the command so it runs faster. I'm looking for a solution in pyspark or spark SQL (whichever is easiest).
Many thanks,
Carolina
Edit: The error below was resolved thanks to a suggestion from @samkart, who said to put the agg after the filter:
AnalysisException: Resolved attribute(s) DateKey#17075 missing from sum(WorkDayFlag)#22142L in operator !Filter ((DateKey#17075 <= 20171228) AND (DateKey#17075 >= 20171227)).;
A possible and simple solution:
from pyspark.sql import functions as F
dfDates \
.join(dfDimDate, dfDimDate.DateKey.between(dfDates.StartDateKey, dfDates.EndDateKey)) \
.groupBy(dfDates.Key) \
.agg(F.sum(dfDimDate.WorkDayFlag).alias('WorkingDays'))
That is, first join the two datasets to link each date range in dfDates with all the dfDimDate rows that fall inside it (dfDates.StartDateKey <= dfDimDate.DateKey <= dfDates.EndDateKey).
Then group the joined dataset by Key and sum WorkDayFlag, which counts the working days in each range.
In the solution you proposed, the loop runs on the driver and launches separate Spark jobs for every row, so you are not taking advantage of the parallelism that Spark offers. This should be avoided when possible, especially for large datasets.
Apart from that, dfDates.collect() is called twice in every iteration of the for-loop, collecting the same data again each time, which slows things down further.
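To check the join-then-aggregate logic on the sample data without a Spark session, here is a plain-Python sketch. It is not PySpark, and the names `DATE_RANGES`, `DIM_DATE` and `working_days` are hypothetical. It relies on the YYYYMMDD integer keys sorting in date order, which is what makes the between-condition valid. Like an inner join, it drops a Key whose range matches no dates.

```python
# Sample rows from the question: (Key, StartDateKey, EndDateKey)
DATE_RANGES = [(1, 20171227, 20180104), (2, 20171227, 20171229)]

# DateKey -> WorkDayFlag, as in dfDimDate
DIM_DATE = {
    20171227: 1, 20171228: 1, 20171229: 1, 20171230: 0, 20171231: 0,
    20180101: 0, 20180102: 1, 20180103: 1, 20180104: 1,
}


def working_days(date_ranges, dim_date):
    """Range join (StartDateKey <= DateKey <= EndDateKey), then sum per Key."""
    result = {}
    for key, start, end in date_ranges:
        flags = [flag for date_key, flag in dim_date.items()
                 if start <= date_key <= end]
        if flags:  # an inner join produces no row for a Key with no matches
            result[key] = sum(flags)
    return result


print(working_days(DATE_RANGES, DIM_DATE))  # {1: 6, 2: 3}
```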

This string has a value I need to extract, and it's usually at the end of the string

Can someone help me, please?
E.g.:
1 x E10,Day rate per hour,1
1 x E10
1 x E2A,"As E2 but with wireless roomstat, also assumes power available within 2m (this code only)",2.5
I have thousands of rows like the samples above and would like to extract the last figure, i.e. the value after the last comma, reading from the right.
How can I extract that figure from the string using a SQL script?
For the three samples above, I would like to see something like this:
1
(this should be blank or 0, since the expected figure is not present in this case)
2.5
I think I need to use SUBSTRING and CHARINDEX but cannot get it working.
Thanks in advance for your help
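Here is a solution using CHARINDEX, REVERSE and RIGHT to extract the values. Replace [column] with your column name; COLUMN is a reserved word in T-SQL, hence the brackets. IIF requires SQL Server 2012 or later.
IIF(CHARINDEX(',', REVERSE([column])) > 0, RIGHT([column], CHARINDEX(',', REVERSE([column])) - 1), '')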
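The expression works by finding the first comma in the reversed string, which is the last comma in the original, and keeping everything to its right. Here is a small Python sketch of the same steps, applied to the three samples from the question. `last_field` is a hypothetical helper name, and the string slicing only stands in for CHARINDEX, REVERSE and RIGHT. Commas inside the quoted text of the third sample do not matter, because only the last comma is used.

```python
def last_field(value: str) -> str:
    """Return the text after the last comma, or '' if there is no comma."""
    # CHARINDEX(',', REVERSE(value)): 1-based position, 0 when not found
    pos = value[::-1].find(",") + 1
    if pos > 0:
        # RIGHT(value, pos - 1): keep the last pos - 1 characters
        return value[len(value) - (pos - 1):]
    return ""


samples = [
    "1 x E10,Day rate per hour,1",
    "1 x E10",
    '1 x E2A,"As E2 but with wireless roomstat, also assumes power '
    'available within 2m (this code only)",2.5',
]
for s in samples:
    print(repr(last_field(s)))
```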

DT_DBTIMESTAMP2 having only 3 digits

I'm using (DT_DBTIMESTAMP2,7)GETDATE() in an SSIS Derived Column Transformation, and the table column is datetime2(7).
Even though I set a fractional-second scale of 7 digits in both, only 3 digits seem to come through.
For example, I expected something like '2018-05-02 16:45:15.6192346', but I get '2018-05-02 16:45:15.6190000'.
I need the extra fractional-second digits because I'd like to pick out the latest record among duplicates using the timestamp, and I realized that a 3-digit fractional-second scale is not enough for this purpose.
Apart from the Derived Column Transformation and the table columns, is there any other required setting in the SSIS package? Any advice would be appreciated.
GETDATE() returns a datetime; you should use SYSDATETIME() instead. See the documentation.
Edit:
As noted by Larnu, you are probably using the SSIS expression GETDATE, rather than the SQL expression GETDATE that I assumed. The point is more or less the same, though. GETDATE returns a DT_DBTIMESTAMP, where "The fractional seconds have a maximum scale of 3 digits." (Source).
Although this is almost the same as what HoneyBadger has said, I'm expanding a little, as the OP isn't using the GETDATE() expression in SQL Server. The value 2018-05-02 16:45:15.619 could never be returned by GETDATE() (Transact-SQL), as that function is only accurate to 1/300th of a second. Thus the final digit can only ever be 0, 3 or 7 (technically 0, 3.333… and 6.666… milliseconds; the final digit is 7 because 6.666… is rounded up).
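The 0/3/7 pattern follows from the tick length of 1000/300 = 10/3 milliseconds. The Python check below (`millisecond_last_digits` is a hypothetical helper) walks one second's worth of 1/300 s ticks and rounds each to whole milliseconds. It models only the arithmetic described above, not SQL Server's internal storage format. It shows why a value ending in 9, such as .619, cannot come from GETDATE().

```python
from fractions import Fraction


def millisecond_last_digits() -> set:
    """Last digit of the rounded millisecond value for each 1/300 s tick."""
    digits = set()
    for tick in range(300):  # 300 ticks per second
        ms = round(Fraction(1000 * tick, 300))  # exact tick length is 10/3 ms
        digits.add(ms % 10)
    return digits


print(sorted(millisecond_last_digits()))  # [0, 3, 7]
```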
In SSIS, the GETDATE() expression returns a value of type DT_DBTIMESTAMP. According to the documentation:
A timestamp structure that consists of year, month, day, hour, minute, second, and fractional seconds. The fractional seconds have a maximum scale of 3 digits.
Thus, the last 4 digits are lost. Unfortunately, I don't believe there is a function in SSIS that returns the current date and time with the precision you require. So if you need this level of precision, you'll likely need to use an expression in SQL Server that provides it, such as SYSDATETIME(), which HoneyBadger recommended.

Migration conversion of float data to decimal data

I am migrating data from one table to a new table. The old table uses FLOAT, and in the new table I am using DECIMAL as the field attribute.
I am using the following statement, which works fine:
CAST(OLD_COLUMN_NAME AS DECIMAL(9,2)) AS 'NEW_COLUMN_NAME'
That works like a charm until I hit a bump in the road. The old column is defined as float, null, and the new column is defined as decimal(5,5). I understand that decimal(5,5) requires all 5 digits to be after the decimal point. I'm just wondering if there is any way to handle this problem of moving data from a float column to a decimal column.
The input data from the old field is varied and looks like this: 5, 0.5, 0.5, 0.75, 2, and so forth.
The error I am receiving is:
Msg 8115, Level 16, State 6, Line 8
Arithmetic overflow error converting float to data type numeric.
The statement has been terminated.
The code runs against a SQL Server 2005 database using SQL Server 2008. Not sure this matters, but I thought I would include this information.
Could someone shed some light on how to address this data conversion issue? Thank you!
decimal(5, 5) only has a range of -0.99999 to +0.99999,
but you say you are trying to put 5 and 2 into it. Do you need decimal(10, 5)?
If not, you have indeed been left with a bit of a conundrum.
I mention the following as a point of interest rather than a serious suggestion, though, who knows, it may be useful!
If the column is nullable, one way to fulfill your boss's requirements would be:
create table #t (j int,i Decimal(5, 5) null)
set ansi_warnings off
set arithabort off
insert into #t values(1,0.2345678)
insert into #t values(2,10)
insert into #t values(3,0.455464)
select * from #t
Output
Arithmetic overflow occurred.
j i
----------- ------------------------------
1 0.23457
2 NULL
3 0.45546
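I would recommend doing something like the following, which rounds to 5 decimal places before casting. Note that this only avoids the overflow for values that round to less than 1 in magnitude; values such as 5 or 2 will still overflow.
Cast(Round(field, 5) As Decimal(5, 5))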
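Rounding only removes excess decimal places; it does not make room for an integer part. Below is a rough Python model of the cast. `to_decimal_5_5` is a hypothetical helper, and Python's ROUND_HALF_UP is an approximation of T-SQL ROUND on the float input. It returns None where SQL Server would overflow, which corresponds to the NULL you get with ANSI_WARNINGS and ARITHABORT off. With the values from the question and the temp-table example, 0.2345678 and 0.455464 fit. By contrast, 5, 2, 10 and even 0.999996 (which rounds up to 1.00000) do not.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def to_decimal_5_5(value: float) -> Optional[Decimal]:
    """Approximate CAST(ROUND(value, 5) AS decimal(5, 5)).

    Returns None where SQL Server would overflow (NULL with ANSI_WARNINGS
    and ARITHABORT off, an error otherwise).
    """
    # Round to 5 decimal places, half away from zero.
    rounded = Decimal(str(value)).quantize(Decimal("0.00001"),
                                           rounding=ROUND_HALF_UP)
    if abs(rounded) >= 1:  # decimal(5, 5) only holds -0.99999 .. 0.99999
        return None
    return rounded


for v in [0.2345678, 10, 0.455464, 5, 0.5, 0.75, 2, 0.999996]:
    print(v, to_decimal_5_5(v))
```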
Does your boss understand that decimal(5,5) won't allow anything to the left of the decimal point?
Ask your boss what he/she wants in plain English. If your boss is really asking you to store numbers greater than .99999 but is insisting specifically on "decimal(5,5)", then clearly they don't understand what they're asking for. The first "5" in decimal(5,5) is the precision: the total number of digits in the number, before and after the decimal point combined. The second "5" is the scale: the number of digits after the decimal point.

How do I get back a 2-digit representation of a number in SQL Server 2000?

I have a table on SQL Server 2000 with a numeric column, and I need the SELECT to return 01, 02, 03, ...
It currently returns 1, 2, 3, ..., 10, 11, ...
Thanks.
Does this work?
SELECT REPLACE(STR(mycolumn, 2), ' ', '0')
From http://foxtricks.blogspot.com/2007/07/zero-padding-numeric-value-in-transact.html
This sort of question is about the interface to the database. Really the database should return the data and your application can reformat it if it wants the data in a particular format. You shouldn't do this in the database, but out in the presentation layer.
John's answer works and is generalizable to any number of digits, but I would be more comfortable with:
select case when mycolumn between 0 and 9 then '0' + ltrim(str(mycolumn)) else ltrim(str(mycolumn)) end
Where n is an integer between 0 and 99:
select right('0'+ltrim(str(n)),2)
or
select right(str(100+n),2)
But I like John's answer best, since it has a single point of specification for the target width. I posted these because they are also common idioms that might work better in other situations or languages.
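The three T-SQL idioms in this thread agree for integers 0 to 99. The Python sketch below mirrors each one so they can be compared side by side. The function names are hypothetical. Python's `str` stands in for T-SQL STR only for non-negative integers; STR's rounding of non-integers and its behaviour for values that don't fit are not modelled.

```python
def pad_replace_str(n: int) -> str:
    """REPLACE(STR(n, 2), ' ', '0'): right-justify to width 2, spaces -> zeros."""
    return str(n).rjust(2).replace(" ", "0")


def pad_right_zero(n: int) -> str:
    """RIGHT('0' + LTRIM(STR(n)), 2): prefix a zero, keep the last 2 chars."""
    return ("0" + str(n))[-2:]


def pad_right_hundred(n: int) -> str:
    """RIGHT(STR(100 + n), 2): add 100, keep the last 2 chars."""
    return str(100 + n)[-2:]


# All three idioms agree for every integer from 0 to 99.
for n in range(100):
    assert pad_replace_str(n) == pad_right_zero(n) == pad_right_hundred(n)

print([pad_replace_str(n) for n in (1, 2, 3, 10, 11)])  # ['01', '02', '03', '10', '11']
```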
