I have data coming from a flat file source. The column "unit_Price" contains the following data:
Unit_Price
00465797857
Precisely, I have to convert this value to 465.797857.
In the SSIS package, I converted it to DT_R8 and multiplied by 0.000001, so I got the value 465.797857 in the staging table. (In the staging table I set the datatype for this column to float.)
(The datatype in the original table is numeric(11,6).) While moving the data from staging to the original table, it rounds the value off to 465.798000. I tried setting the datatype to float and real, but it rounds the value off in every case.
Is there any way I can get the value into the original table exactly as it is in the staging table (i.e. 465.797857)?
Thanks
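One thing worth checking is where the implicit conversion happens on the staging-to-target move; making the cast explicit rules it out. A minimal T-SQL sketch, with hypothetical table names dbo.Staging and dbo.Original (numeric(11,6) comfortably holds 465.797857, so an explicit cast should not round):

-- Hypothetical table names; the explicit cast prevents the load
-- from choosing its own float-to-numeric conversion
INSERT INTO dbo.Original (Unit_Price)
SELECT CAST(Unit_Price AS numeric(11,6))
FROM dbo.Staging;

Alternatively, the derived column could cast straight to the target type with (DT_NUMERIC,11,6) instead of leaving the value as DT_R8.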
I'm really struggling to get 0.00 values saved down to a .csv as 0.00 and not .00 (opened in Notepad++, so it's not an Excel issue).
Other posts recommended changing all the outputs to string; this did not help, as the leading zeros are still truncated.
Converting to currency via DT_CY (done both in the mapping and in a derived column), oddly I just got 0 rather than .00 or 0.00, though a colleague believed he'd fixed it with his conversion.
I've used derived columns to format dates, but that's not helping to get my leading zeros out.
An older Stack Overflow post suggested this method in a new derived string column:
[Price] < 1
    ? ([Price] >= 0
        ? "0" + (DT_WSTR,18)[Price]
        : [Price] > -1
            ? "-0" + (DT_WSTR,18)(-1 * [Price])
            : (DT_WSTR,18)[Price])
    : (DT_WSTR,18)[Price]
When I put that in, it complains about the conversion between integer and string.
That post was from way back in 2010. I'm using VS 2019 and all my stored procedure datatypes are decimal(18,2).
I can't really change the stored procedure at this stage so looking for an SSIS solution.
Any tips would be appreciated, thanks.
When I add a data viewer between the Derived Column and the Flat File Destination, it shows the column I'm working with as 0.0000 (converted to DT_CY), but it writes to the file as 0. All other values like 7.56 appear as they should. Perplexing...
Resolved it: I made sure the source columns were output as numeric(18,2), then derived the column to string using the expression I posted in the question, output it as string, and set the column to string in the flat file destination's column details (albeit slow to action if you have tons of columns).
All works and happy :)
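As a side note, if a script component is an option, formatting the value in .NET sidesteps the cast gymnastics; a minimal VB.NET sketch, with hypothetical input column Price (decimal) and output column PriceText (DT_WSTR):

' The "0.00" format string forces a leading zero and two decimals,
' so 0 writes out as "0.00" and 7.56 stays "7.56"
If Not Row.Price_IsNull Then
    Row.PriceText = Row.Price.ToString("0.00")
End If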
I am using a flat file data provider in SSIS to import data from an external system. I don't have any control over the file, it is pushed out on a weekly basis, and I pick it up from a common folder.
The first two columns of the CSV are dates. Part of the way through the file, the date format has changed from date to numeric as follows:
Service_Date, Event_Datetime
2018-04-30,2018-04-30 21:18
43220,43220.92412
As you can see, the format changed from date to numeric. Other date columns not shown here have also changed.
Obviously, this is breaking the data flow task.
Aside from going into Excel and saving the CSV with the proper column format, is there any way SSIS can convert on the fly so that the job doesn't fail and require manual intervention?
These data values (43220, 43220.92412) are called date serials; you can get the date value back in several ways:
(1) Using a derived column
You can convert this column to float, then to datetime, using a derived column:
(DT_DATE)(DT_R8)[dateColumn]
References
convert Excel Date Serial Number to Regular Date
Is there a better way to parse [Integer].[Integer] style dates in SSIS?
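As a quick sanity check outside SSIS, T-SQL can do the same conversion; note that SQL Server's datetime day zero is 1900-01-01, while the OLE/Excel serial counts 1900-01-01 as day 1 and includes the non-existent 1900-02-29, hence the offset of 2:

-- 43220.92412 is an OLE date serial; subtracting 2 aligns it with
-- SQL Server's epoch and yields 2018-04-30 22:10:44
SELECT CAST(43220.92412 - 2 AS datetime);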
(2) Using a script component
You can use the DateTime.FromOADate() function, for example (code in VB.NET):
If Row.ServiceDate_IsNull = False AndAlso Not String.IsNullOrEmpty(Row.ServiceDate) Then
    Dim dblTemp As Double
    If Double.TryParse(Row.ServiceDate, dblTemp) Then
        ' Numeric value: treat it as an OLE Automation date serial
        Row.OutputDate = DateTime.FromOADate(dblTemp)
    Else
        ' Otherwise parse it as a normal date string
        Row.OutputDate = Date.Parse(Row.ServiceDate)
    End If
End If
Reference
SSIS Script Task - VB - Date is extracting as INT instead of date string
I was able to solve the problem using a variation of the derived column. This expression catches a column that is obviously formatted as a date and converts it directly; otherwise it converts the date serial to a float first, then to a date:
FINDSTRING(Date_Service,"-",1) != 0 ? (DT_DATE)Date_Service : (DT_DATE)(DT_R8)Date_Service
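The same guard works for Event_Datetime and the other affected date columns, since FINDSTRING simply tests for the "-" that only the yyyy-MM-dd format contains.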
I have a table that I'm inserting GPS coordinates (lat/lon) on each record.
table schema is like: (Id, Time, Lat, Lon)
Is it possible to calculate distance of two continuous records using Calculated columns?
Something so that the schema becomes like this: (Id, Time, Lat, Lon, Distanceof( ID -1, ID ))
Note: I know how to calculate the distance between two points, but I don't know if it's possible to access other rows' data in a calculated column.
It is impossible to directly use values of other rows in calculated column definition, but one can create a user-defined function and use it:
CREATE FUNCTION dbo.CalcDistance(@prev_row_id INT, @row_id INT)
RETURNS float
AS
...
and then define the calculated column expression as dbo.CalcDistance(Id - 1, Id).
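The question says the distance formula itself is already known; purely for illustration, here is a minimal sketch of what the function body could look like using the haversine formula (the table name dbo.GpsLog is hypothetical, and consecutive Ids are assumed):

CREATE FUNCTION dbo.CalcDistance(@prev_row_id INT, @row_id INT)
RETURNS float
AS
BEGIN
    DECLARE @lat1 float, @lon1 float, @lat2 float, @lon2 float;
    SELECT @lat1 = Lat, @lon1 = Lon FROM dbo.GpsLog WHERE Id = @prev_row_id;
    SELECT @lat2 = Lat, @lon2 = Lon FROM dbo.GpsLog WHERE Id = @row_id;
    IF @lat1 IS NULL RETURN NULL;  -- first record, or a gap in the Ids
    -- Haversine formula; result in kilometres (mean Earth radius ~6371 km)
    RETURN 2 * 6371 * ASIN(SQRT(
        POWER(SIN(RADIANS(@lat2 - @lat1) / 2), 2) +
        COS(RADIANS(@lat1)) * COS(RADIANS(@lat2)) *
        POWER(SIN(RADIANS(@lon2 - @lon1) / 2), 2)));
END

Because the function reads the table, the resulting calculated column cannot be persisted or indexed.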
You can create a function Distanceof for calculating the distance, and then select it per row:
SELECT Id, [Time], Lat, Lon, Distanceof(Lat, Lon, radius) FROM the table.
It will calculate for each row, but it will be time-consuming if you are doing many rows.
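If a view or query (rather than a true calculated column) is acceptable, a window function avoids the per-row lookups entirely; a sketch using LAG, again with the hypothetical table name dbo.GpsLog:

-- Pull the previous row's coordinates alongside each row,
-- then feed both pairs into whatever distance function you use
SELECT Id, [Time], Lat, Lon,
       LAG(Lat) OVER (ORDER BY [Time]) AS PrevLat,
       LAG(Lon) OVER (ORDER BY [Time]) AS PrevLon
FROM dbo.GpsLog;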
I have a tricky problem trying to find an efficient way of ordering a set of objects (~1000 rows) that contain a large (~5 million) number of indexed data points. In my case I need a query that allows me to order the table by a specific datapoint. Each datapoint is a 16-bit unsigned integer.
I am currently solving this problem by using a large array:
Object Table:
id serial NOT NULL,
category_id integer,
description text,
name character varying(255),
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
data integer[],
GIST index:
CREATE INDEX object_rdtree_idx
ON object
USING gist
(data gist__intbig_ops)
This index is not currently being used when I do a select query, and I am not certain it would help anyway.
Each day the array field is updated with a new set of ~5 million values
I have a webserver that needs to list all objects ordered by the value of a particular data point:
Example Query:
SELECT name, data[3916863] as weight FROM object ORDER BY weight DESC
Currently, it takes about 2.5 seconds to perform this query.
Question:
Is there a better approach? I am happy for the insertion side to be slow as it happens in the background, but I need the select query to be as fast as possible. In saying this, there is a limit to how long the insertion can take.
I have considered creating a lookup table where every value has its own row - but I'm not sure how the insertion/lookup time would be affected by this approach, and I suspect entering 1000+ records with ~5 million data points each as individual rows would be too slow.
Currently inserting a row takes ~30 seconds which is acceptable for now.
Ultimately I am still on the hunt for a scalable solution to the base problem, but for now I need this solution to work, so this solution doesn't need to scale up any further.
Update:
I was wrong to dismiss having a giant table instead of an array: while insertion time massively increased, query time is reduced to just a few milliseconds.
I am now altering my generation algorithm to only save a datum if it is non-zero and has changed since the previous update. This has reduced insertions to just a few hundred thousand values, which only takes a few seconds.
New Table:
CREATE TABLE data
(
    object_id integer,
    data_index integer,
    value integer
);
CREATE INDEX index_data_on_data_index
ON data
USING btree
("data_index");
New Query:
SELECT name, coalesce(value, 0) AS weight
FROM objects
LEFT OUTER JOIN data
    ON data.object_id = objects.id AND data_index = 7731363
ORDER BY weight DESC
Insertion Time: 15,000 records/second
Query Time: 17ms
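One further tweak that may be worth trying: a composite index covering both the filter and the join key, so a single index scan can serve the query (a hedged suggestion; whether the planner uses it depends on the data):

-- One index can satisfy both the data_index filter and the object_id join
CREATE INDEX index_data_on_data_index_object_id
    ON data (data_index, object_id);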
First of all, do you really need a relational database for this? You do not seem to be relating some data to some other data. You might be much better off with a flat-file format.
Secondly, your index on data is useless for the query you showed. You are querying for a datum (a position in your array) while the index is built on the values in the array. Dropping the index will make the inserts considerably faster.
If you have to stay with PostgreSQL for other reasons (bigger data model, MVCC, security) then I suggest you change your data model: make the data column a bytea and set its storage explicitly (ALTER TABLE object ALTER COLUMN data SET DATA TYPE bytea, then ALTER TABLE object ALTER COLUMN data SET STORAGE EXTERNAL; converting the existing integer[] data would need a USING clause or a reload). Since the data column is about 4 x 5 million = 20MB per row it will be stored out-of-line anyway, but if you explicitly set it, then you know exactly what you have.
Then create a custom function in C that fetches your data value "directly" using the PG_GETARG_BYTEA_P_SLICE() macro and that would look somewhat like this (I am not a very accomplished PG C programmer so forgive me any errors, but this should help you on your way):
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

// Function get_data_value() -- Get a 4-byte value from a bytea
// Arg 0: bytea* The data
// Arg 1: int32 The position of the element in the data, 1-based
PG_FUNCTION_INFO_V1(get_data_value);

Datum
get_data_value(PG_FUNCTION_ARGS)
{
    int32  element = PG_GETARG_INT32(1) - 1;     // second argument, make 0-based
    bytea *data = PG_GETARG_BYTEA_P_SLICE(0,     // first argument
                      element * sizeof(int32),   // offset into the data
                      sizeof(int32));            // fetch just the required 4 bytes
    PG_RETURN_INT32(*((int32 *) VARDATA(data))); // skip the varlena header
}
The PG_GETARG_BYTEA_P_SLICE() macro retrieves only a slice of data from the disk and is therefore very efficient.
There are some samples of creating custom C functions in the docs.
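For completeness, the compiled function still has to be declared to PostgreSQL before SQL can call it; a minimal sketch, where the shared library name is illustrative:

-- Register the C function; the library path/name depends on your install
CREATE FUNCTION get_data_value(bytea, integer) RETURNS integer
    AS '$libdir/get_data_value', 'get_data_value'
    LANGUAGE C STRICT IMMUTABLE;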
Your query now becomes:
SELECT name, get_data_value(data, 3916863) AS weight FROM object ORDER BY weight DESC;
I have a date dimension and a fact table. I have two measures:
WorkOrdersCount:=
count(
FactWorkOrderLifeCycle[Clientsid]
)
and
WorkOrdersLastYearCount:=
CALCULATE(
count(FactWorkOrderLifeCycle[Clientsid]),
SAMEPERIODLASTYEAR(DimDate[FullDate])
)
WorkOrdersCount is simple and works fine. I thought WorkOrdersLastYearCount would be simple as well, but I now realize I don't understand SAMEPERIODLASTYEAR().
My date dimension has a DateSID column containing an integer representation of date as YYYYMMDD. It has two recordkeeping rows with SIDs of -1 and -2 for unknown and TBD dates. I'm only using the -1 row in this solution. The data is stored in a SQL Server table and the FullDate column is a "date" type. The actual value is 1900-01-01.
My fact, FactWorkOrderLifecycle, has a field called InvoicedDateSID that can have a null value which I replace with -1.
No errors are thrown in Visual Studio or when processing the OLAP database, but upon referencing the column in a pivot table I get the following error:
ERROR - CALCULATION ABORTED: Calculation error in measure
'FactWorkOrderLifeCycle'[WorkOrdersLastYearCount]: An invalid numeric
representation of a date value was encountered.
Things I've tried (not all make sense):
changed SID values to positive integers
changed date value in dimdate to 9999-12-31 instead of 1900-01-01 when I saw that DAX dates might start at 1900-03-01
adding other dimensions to the pivot first to see if the formula calculates correctly at all.
I'm a DAX noob and I'm not sure how to troubleshoot this. Any help is appreciated!
Make sure your calendar table is indeed using a Date data type.
Remove any time component of your dates.
Make sure there are no gaps and no duplicates in your Calendar table.
Make sure you are using fields from your Calendar table on the pivot, and NOT date related fields from your data table.
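In this particular case, the 1900-01-01 rows backing the -1/-2 SIDs are the most likely culprit: time-intelligence functions such as SAMEPERIODLASTYEAR expect the date column to be contiguous and duplicate-free, and placeholder rows break that. As a hedged sketch (assuming the placeholder rows keep their negative DateSID values), you could exclude them from the dates the function sees:

WorkOrdersLastYearCount :=
CALCULATE (
    COUNT ( FactWorkOrderLifeCycle[Clientsid] ),
    SAMEPERIODLASTYEAR (
        CALCULATETABLE (
            VALUES ( DimDate[FullDate] ),  -- distinct dates only
            DimDate[DateSID] > 0           -- drop the -1/-2 placeholder rows
        )
    )
)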