I have two separate tables for date and time in one application. The date table stores the date in a "datetime" column, and the time table stores the time part as a varchar.
The IDs from both tables are stored in the transaction table to represent each row's date and time.
I am having trouble querying the transaction table for a particular datetime value.
Date table
ID_DAT DATE_DAT (smalldatetime)
20000101 01/01/2000 0:00
20000102 02/01/2000 0:00
20000103 03/01/2000 0:00
20000104 04/01/2000 0:00
20000105 05/01/2000 0:00
20000106 06/01/2000 0:00
20000107 07/01/2000 0:00
20000108 08/01/2000 0:00
20000109 09/01/2000 0:00
20000110 10/01/2000 0:00
Time Table
ID_TIM HOUR_TIM MINUTE_TIM STRING_TIM (varchar)
0 0 0 00:00
1 0 1 00:01
2 0 2 00:02
3 0 3 00:03
4 0 4 00:04
5 0 5 00:05
6 0 6 00:06
7 0 7 00:07
8 0 8 00:08
9 0 9 00:09
10 0 10 00:10
Transaction data sample (id may not match with master provided)
SEQNUM ID_DAT ID_TIM ORIGINAL_VALUE_PER
2495089 20130424 30 10.0000000000
2495089 20130424 60 12.0000000000
2495089 20130424 90 15.0000000000
2495089 20130424 120 20.0000000000
2495089 20130424 150 24.0000000000
2495089 20130424 180 28.0000000000
2495089 20130424 210 34.0000000000
Now I want to query the transaction data, let's say everything after 03:30 on a particular day.
Please guide me on how I can achieve this.
Thanks
Something vaguely like this:
SELECT *
FROM TRANSACTION_TABLE TT
JOIN DATE_TABLE DT ON TT.ID_DAT = DT.ID_DAT
JOIN TIME_TABLE TM ON TT.ID_TIM = TM.ID_TIM
WHERE (DT.DATE_DAT > '10/19/2013' AND DT.DATE_DAT < '10/21/2013')
   -- last day: only rows before 03:30
   OR (DT.DATE_DAT = '10/21/2013' AND (TM.HOUR_TIM < 3 OR (TM.HOUR_TIM = 3 AND TM.MINUTE_TIM < 30)))
   -- first day: only rows at or after 03:30
   OR (DT.DATE_DAT = '10/19/2013' AND (TM.HOUR_TIM > 3 OR (TM.HOUR_TIM = 3 AND TM.MINUTE_TIM >= 30)))
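The same idea can be sketched outside the database by turning each (ID_DAT, ID_TIM) pair into one datetime and comparing against a cutoff. This is only an illustration in Python: it assumes ID_DAT is a yyyymmdd number and ID_TIM is the number of minutes after midnight, which is what the sample rows suggest but is not confirmed. The helper name to_datetime is made up for this example.

```python
from datetime import datetime, timedelta

def to_datetime(id_dat: int, id_tim: int) -> datetime:
    """Combine a yyyymmdd date key and a minutes-after-midnight time key.

    Assumption: ID_TIM counts minutes after midnight (ID 10 -> 00:10).
    """
    day = datetime.strptime(str(id_dat), "%Y%m%d")
    return day + timedelta(minutes=id_tim)

# (ID_DAT, ID_TIM, ORIGINAL_VALUE_PER) rows from the transaction sample
transactions = [
    (20130424, 30, 10.0),
    (20130424, 60, 12.0),
    (20130424, 90, 15.0),
    (20130424, 120, 20.0),
    (20130424, 150, 24.0),
    (20130424, 180, 28.0),
    (20130424, 210, 34.0),
]

cutoff = datetime(2013, 4, 24, 3, 30)   # 03:30 on the day of interest
end_of_day = datetime(2013, 4, 25)      # exclusive upper bound

# Keep rows at or after 03:30 on that day
selected = [
    row for row in transactions
    if cutoff <= to_datetime(row[0], row[1]) < end_of_day
]
print(selected)  # [(20130424, 210, 34.0)]
```

In the sample only the 03:30 row itself qualifies; use a strict comparison if rows exactly at the cutoff should be excluded.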
Please see my data below:
data finance;
input id loan1 loan2 loan3 assets home$ type;
datalines;
1 93000 98000 45666 . new 1
1 98000 45678 98765 67 old 2
1 55000 56764 435371 54 new 1
2 7000 6000 7547 57 new 1
4 67333 87444 98666 34 old 1
4 98000 68777 986465 23 new 1
5 4555 334 652 12 new 1
5 78999 98999 80000 34 new 1
5 889 989 676 3 new 1
;
data finance1;
set finance;
if loan1<80000 then conc = 'level1';
if loan2 <80000 and home='new' then borrowcap = 'high';
run;
I would like the following dataset. Although there are initially multiple rows for each ID, I want a single row per ID: if any of the rows for an ID produced level1 or high, that value should be captured in that one row.
data finance;
input id conc$ borrowcap$;
datalines;
1 level1 high
2 level1 high
4 level1
5 level1 high
;
Any help is appreciated!
Use a RETAIN statement to carry a value from any row forward within each ID. Then use a BY statement together with `if last.id` to keep only one row per ID.
data finance;
input id loan1 loan2 loan3 assets home$ type;
datalines;
1 93000 98000 45666 . new 1
1 98000 45678 98765 67 old 2
1 55000 56764 435371 54 new 1
2 7000 6000 7547 57 new 1
4 67333 87444 98666 34 old 1
4 98000 68777 986465 23 new 1
5 4555 334 652 12 new 1
5 78999 98999 80000 34 new 1
5 889 989 676 3 new 1
;
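data finance1;
set finance;
by id;                          /* finance must be sorted by id */
length conc borrowcap $8;
retain conc borrowcap;          /* keep values across rows */
if first.id then call missing(conc,borrowcap);   /* reset for each new id */
if loan1<80000 then conc='level1';
if loan2<80000 and home='new' then borrowcap = 'high';
if last.id;                     /* output only the last row per id */
run;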
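For readers less familiar with the DATA step, the retain / first.id / last.id pattern can be sketched in Python with itertools.groupby. This is only an illustration of the control flow, not SAS semantics in full; the tuple layout (id, loan1, loan2, home) is a simplification of the sample rows.

```python
from itertools import groupby
from operator import itemgetter

# (id, loan1, loan2, home) taken from the sample data, already sorted by id
rows = [
    (1, 93000, 98000, "new"),
    (1, 98000, 45678, "old"),
    (1, 55000, 56764, "new"),
    (2, 7000, 6000, "new"),
    (4, 67333, 87444, "old"),
    (4, 98000, 68777, "new"),
    (5, 4555, 334, "new"),
    (5, 78999, 98999, "new"),
    (5, 889, 989, "new"),
]

summary = []
for id_, group in groupby(rows, key=itemgetter(0)):  # plays the role of BY id
    conc, borrowcap = "", ""                         # reset at first.id
    for _, loan1, loan2, home in group:
        if loan1 < 80000:
            conc = "level1"                          # value is kept for later rows
        if loan2 < 80000 and home == "new":
            borrowcap = "high"
    summary.append((id_, conc, borrowcap))           # emitted once, at last.id

print(summary)
# [(1, 'level1', 'high'), (2, 'level1', 'high'), (4, 'level1', 'high'), (5, 'level1', 'high')]
```

With these sample rows, ID 4 also ends up with high, because its second row has loan2 = 68777 and home = new.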
I am attempting to return day of the week (i.e. Monday = 1, Tuesday = 2, etc) based on a date column ("Posting_date"). I tried a for loop but got it wrong:
# First date of table was a Sunday (31 March 2019) => so counter starts at 6
posting_df3['Day'] = (posting_df3['Posting_date'] - dt.datetime(2019,3,31)).dt.days.astype('int16')
# Start counter on the right date (31 March 2019 is a Sunday)
count = 7
for x in posting_df3['Day']:
    if count != 7:
        count = 1
    else:
        count = count + 1
    posting_df3['Day'] = count
Not sure if there are other ways of doing this. Here is a sample of my data:
level_0 Posting_date Reservation date Book_window ADR Day
0 9 2019-03-31 2019-04-01 -1 156.00 0
1 25 2019-04-01 2019-04-01 0 152.15 1
2 11 2019-04-01 2019-04-01 0 149.40 1
3 42 2019-04-01 2019-04-01 0 141.33 1
4 45 2019-04-01 2019-04-01 0 159.36 1
... ... ... ... ... ... ...
4278 739 2020-02-21 2019-04-17 310 253.44 327
4279 739 2020-02-22 2019-04-17 310 253.44 328
4280 31 2020-03-11 2019-04-01 345 260.00 346
In the final output, the Day column should return 7 for 2019-03-31 since it is a Sunday,
1 for 2019-04-01 since it is a Monday, and so on.
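You can do it this way. `dt.weekday` returns Monday = 0 through Sunday = 6, so adding 1 gives Monday = 1 through Sunday = 7: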
df['weekday']=pd.to_datetime(df['Posting_date']).dt.weekday+1
Input
level_0 Posting_date Reservation_date Book_window ADR Day
0 9 3/31/2019 4/1/2019 -1 156.00 0
1 25 4/1/2019 4/1/2019 0 152.15 1
2 11 4/1/2019 4/1/2019 0 149.40 1
3 42 4/1/2019 4/1/2019 0 141.33 1
4 45 4/1/2019 4/1/2019 0 159.36 1
Output
level_0 Posting_date Reservation_date Book_window ADR Day weekday
0 9 3/31/2019 4/1/2019 -1 156.00 0 7
1 25 4/1/2019 4/1/2019 0 152.15 1 1
2 11 4/1/2019 4/1/2019 0 149.40 1 1
3 42 4/1/2019 4/1/2019 0 141.33 1 1
4 45 4/1/2019 4/1/2019 0 159.36 1 1
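The same numbering is available in the standard library, which can be handy for checking a few dates by hand. A minimal sketch: date.isoweekday() returns Monday = 1 through Sunday = 7 directly, so no +1 is needed. The three dates are taken from the sample table.

```python
from datetime import date

# A few posting dates from the sample data
posting_dates = [date(2019, 3, 31), date(2019, 4, 1), date(2020, 2, 21)]

# isoweekday(): Monday = 1 ... Sunday = 7
days = [d.isoweekday() for d in posting_dates]
print(days)  # [7, 1, 5]
```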
I want to get the daily accumulation of the precipitation variable with the timeAverage function from the openair package, so I tried this:
CovPrec <- read.table("CovPrec", header = TRUE, sep = ";",
stringsAsFactors = FALSE, dec = ",", na.strings = "NA")
date <- ajuste_tiempos(CovPrec)
Met_CovPrec <- cbind(date, CovPrec[-c(3,4)])
Met_CovPrec$date <- as.POSIXct(strptime(Met_CovPrec$date,
format = "%d/%m/%Y %H:%M", tz = "GMT"))
Met_CovPrec_prom_day <- timeAverage(Met_CovPrec, avg.time = "day", statistic = "sum")
But the sum is applied to every column of the data frame, not just to the VALUE column:
The original data frame CovPrec:
MS_NR SS_NR DATE HOUR VALUE
1 13095010 240 1/01/2014 0:00:00 NA
2 13095010 240 1/01/2014 0:10:00 NA
3 13095010 240 1/01/2014 0:20:00 NA
4 13095010 240 1/01/2014 0:30:00 NA
5 13095010 240 1/01/2014 0:40:00 NA
6 13095010 240 1/01/2014 0:50:00 NA
The result Met_CovPrec_prom_day:
date MS_NR SS_NR VALUE
1 2014-01-01 00:00:00 1885681440 34560 0
2 2014-01-02 00:00:00 1885681440 34560 0
3 2014-01-03 00:00:00 1885681440 34560 0
4 2014-01-04 00:00:00 1885681440 34560 28
5 2014-01-05 00:00:00 1885681440 34560 2
6 2014-01-06 00:00:00 1885681440 34560 0
Thank you for your answers
"timeAverage" always applies the statistic to every column of the data frame it is given. Pass it only the columns you need, for example:
Met_CovPrec_prom_day <- timeAverage(Met_CovPrec[, c("date", "VALUE")], avg.time = "day", statistic = "sum")
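To make the point concrete, here is what "sum only the value column per day" amounts to, sketched in plain Python rather than R. The readings are invented for illustration (the original sample is all NA), and missing values are treated as 0, which is an assumption of this sketch and not a statement about how timeAverage handles NA.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, VALUE) pairs; None stands in for NA
readings = [
    ("1/01/2014 0:00", None),
    ("1/01/2014 0:10", 0.5),
    ("1/01/2014 0:20", 1.5),
    ("2/01/2014 0:00", 2.0),
]

daily_total = defaultdict(float)
for stamp, value in readings:
    day = datetime.strptime(stamp, "%d/%m/%Y %H:%M").date()
    # Only the value column is accumulated; station IDs never enter the sum
    daily_total[day] += value or 0.0

for day, total in sorted(daily_total.items()):
    print(day, total)
```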
I am new to R and I've been stuck on this. I have the data set below, in which I created a new list variable called 'amountOfTxn_array' that contains three numeric values in sequential order. These are the transaction amounts from Jan to Mar. My objective is to create new variables from this list by iterating over each element of 'amountOfTxn_array'.
> head(myData_05_Array)
Index accountID amountOfTxn_array
1:00 8887 c(36.44, 75.00,185.24)
2:00 13462 c(639.45,656.10,237.00)
3:00 47249 c(0, 24, 2012)
4:00 49528 c(1189.20,2326.26,1695.89)
5:00 57201 c(24.67, 0.00, 0.00)
6:00 57206 c(0.00, 661.98,2957.68)
str(myData_05_Array)
Classes ‘data.table’ and 'data.frame': 3176 obs. of 4 variables:
$ accountID : int 8887 13462 47249 49528 57201 57206 58522 79073 80465 81032 ...
$ amountOfTxn_200501: num 36.4 639.5 0 1189.2 24.7 ...
$ amountOfTxn_200502: num 75 656 24 2326 0 ...
$ amountOfTxn_200503: num 185 237 2012 1696 0 ...
$ amountOfTxn_array :List of 3176
Example code for creating a new variable is provided below: I would like to tag 1 if a value in the array is greater than 100 and 0 otherwise. When I run the example code, I get the error "Error: (list) object cannot be coerced to type 'double'". Is there a solution for this? I would highly appreciate any response.
Thanks!
> for(i in 1:3)
+ {
+ if(myData_05_Array$amountOfTxn_array[i] > 100){
+ myData_05_Array$testArray[i] <- 1
+ }
+ else{
+ myData_05_Array$testArray[i] <- 0
+ }
+ }
Error: (list) object cannot be coerced to type 'double'
What I am expecting as the output is as follows:
amountOfTxn_testArray
c(0, 0, 1)
c(1, 1, 1)
c(0, 0, 0)
c(1, 1, 1)
c(0, 0, 0)
c(0, 1, 1)
"Doing calculations for 24 columns is quite cumbersome"
Aha! Welcome to the dplyr world:
library(dplyr)
#generate dummy data
df <- read.table(text='Index accountID Jan Feb March
1:00 8887 36.44 75.00 185.24
2:00 13462 639.45 656.10 237.00
3:00 47249 0 24 2012
4:00 49528 1189.20 2326.26 1695.89
5:00 57201 24.67 0.00 0.00
6:00 57206 0.00 661.98 2957.68', header=TRUE, stringsAsFactors=FALSE)
Mutate columns by column index:
# the dot (.) refers to the column currently being transformed
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
df %>% mutate_at(3:5, ~ as.numeric(. > 100))
Mutate columns by predefined names:
changeVars <- c("Jan", "Feb", "March")
df %>% mutate_at(.vars = changeVars, ~ as.numeric(. > 100))
Mutate columns if some condition is met:
df %>% mutate_if(is.double, ~ as.numeric(. > 100))
output:
Index accountID Jan Feb March
1 1:00 8887 0 0 1
2 2:00 13462 1 1 1
3 3:00 47249 0 0 1
4 4:00 49528 1 1 1
5 5:00 57201 0 0 0
6 6:00 57206 0 1 1
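The underlying operation is just an element-wise threshold test applied to every monthly amount. A minimal Python sketch of that logic, using the six sample accounts; the dictionary layout is an assumption made for this illustration and is not how the data.table is stored.

```python
# accountID -> [Jan, Feb, March] amounts from the sample data
amounts = {
    8887: [36.44, 75.00, 185.24],
    13462: [639.45, 656.10, 237.00],
    47249: [0, 24, 2012],
    49528: [1189.20, 2326.26, 1695.89],
    57201: [24.67, 0.00, 0.00],
    57206: [0.00, 661.98, 2957.68],
}

# 1 if the amount is greater than 100, otherwise 0, for every month
flags = {
    account: [int(value > 100) for value in values]
    for account, values in amounts.items()
}

for account, row in flags.items():
    print(account, row)
```

Because the comparison runs per element, the same code works unchanged for 24 monthly columns.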
I am attempting to normalize data using SSIS in the following format:
SerialNumber Date R01 R02 R03 R04
-------------------------------------------
1 9/25/2011 9 6 1 2
1 9/26/2011 4 1 3 5
2 9/25/2011 7 3 2 1
2 9/26/2011 2 4 10 6
Each "R" column represents a reading for an hour. R01 is 12:00 AM, R02 is 1:00 AM, R03 is 2:00 AM and R04 is 3:00 AM. I would like to transform the data and store it in another table in this format (line breaks for readability):
SerialNumber Date Reading
-----------------------------------------
1 9/25/2011 12:00 AM 9
1 9/25/2011 1:00 AM 6
1 9/25/2011 2:00 AM 1
1 9/25/2011 3:00 AM 2
1 9/26/2011 12:00 AM 4
1 9/26/2011 1:00 AM 1
1 9/26/2011 2:00 AM 3
1 9/26/2011 3:00 AM 5
2 9/25/2011 12:00 AM 7
2 9/25/2011 1:00 AM 3
2 9/25/2011 2:00 AM 2
2 9/25/2011 3:00 AM 1
2 9/26/2011 12:00 AM 2
2 9/26/2011 1:00 AM 4
2 9/26/2011 2:00 AM 10
2 9/26/2011 3:00 AM 6
I am using the unpivot transformation in an SSIS 2008 package to accomplish most of this but the issue I am having is adding the hour to the date based on the column of the value I am working with. Is there a way to accomplish this in SSIS? Keep in mind that this is a small subset of data of around 30 million records so performance is an issue.
Thanks for the help.
Create an SSIS package, add a new Data Flow Task and configure it (Edit...):
Add a new data source.
Add an UNPIVOT component that turns the R01 to R04 columns into rows. The expression below assumes its pivot key column is named Type.
Add a DATA CONVERSION component.
Add a DERIVED COLUMN component.
For the NewData derived column you can use this expression: DATEADD("HOUR",(Type == "R01" ? 0 : (Type == "R02" ? 1 : (Type == "R03" ? 2 : 3))),Date). The «boolean_expression» ? «when_true» : «when_false» operator works like the IIF() function (from VBA/VB) and is used here to calculate the number of hours to add: 0 hours for "R01", 1 hour for "R02", 2 hours for "R03", and otherwise 3 hours (for "R04").
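To check the expected shape of the output independently of SSIS, the unpivot plus hour offset can be sketched in a few lines of Python. This is only a reference for the transformation logic, not a replacement for the package; it assumes the R columns are in hour order, so the column position gives the number of hours to add.

```python
from datetime import datetime, timedelta

# (SerialNumber, Date, R01, R02, R03, R04) from the sample table
wide_rows = [
    (1, "9/25/2011", 9, 6, 1, 2),
    (1, "9/26/2011", 4, 1, 3, 5),
    (2, "9/25/2011", 7, 3, 2, 1),
    (2, "9/26/2011", 2, 4, 10, 6),
]

long_rows = []
for serial, day, *readings in wide_rows:
    base = datetime.strptime(day, "%m/%d/%Y")
    # R01 -> +0 hours, R02 -> +1 hour, R03 -> +2 hours, R04 -> +3 hours
    for hour_offset, reading in enumerate(readings):
        long_rows.append((serial, base + timedelta(hours=hour_offset), reading))

for row in long_rows[:4]:
    print(row)
```

Each input row yields one output row per R column, so the four sample rows become sixteen.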