GQL Query (python) to retrieve data using timestamp - google-app-engine

I have a table in Google Datastore that holds n values in n columns, and one of them is a timestamp.
The timestamp property is defined like this, inside the table class (Java):
@Persistent
private Date timestamp;
The table is like this:
id | value | timestamp
----------------------------------------------------------
1 | ABC | 2014-02-02 21:07:40.822000
2 | CDE | 2014-02-02 22:07:40.000000
3 | EFG |
4 | GHI | 2014-02-02 21:07:40.822000
5 | IJK |
6 | KLM | 2014-01-02 21:07:40.822000
The timestamp column was added to the table later, so some rows do not have a timestamp value.
Using Python on Google App Engine, I'm trying to build an API that returns the number of rows whose timestamp is >= a given value.
For example:
-- This is just an example
SELECT * FROM myTable WHERE timestamp >= '2014-02-02 21:07:40.822000'
I've written this code in Python:
import sys
...
import webapp2
from google.appengine.ext import db

class myTable(db.Model):
    value = db.StringProperty()
    timestamp = datetime.datetime

class countHandler(webapp2.RequestHandler):
    def get(self, tablename, timestamp):
        table = db.GqlQuery("SELECT __key__ FROM " + tablename + " WHERE timestamp >= :1", timestamp)
        recordsCount = 0
        for p in table:
            recordsCount += 1
        self.response.out.write("Records count for table " + tablename + ": " + str(recordsCount))

app = webapp2.WSGIApplication([
    ('/count/(.*)/(.*)', countHandler)
], debug=True)
I've successfully deployed it and can call it, but for some reason it always returns:
Records count for table myTable: 0
I suspect the problem is the data type of the timestamp. Any ideas? What type should it be declared as?
Thank you!

Your problem (as also discussed in the comments) seems to be that you are passing a string to the GqlQuery parameters.
To filter your query by datetime, you need to pass a datetime object as the query parameter, so convert the string first with datetime.datetime.strptime.
Small example:
import datetime

# Not sure how your timestamps are formatted, but supposing they are strings
# such as 2014-02-02 21:07:40.822000:
timestamp = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S.%f")
table = db.GqlQuery("SELECT __key__ FROM " + tablename + " WHERE timestamp >= :1", timestamp)
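The sketch below is plain Python, runnable outside App Engine. It shows the conversion step and the count that the filter should produce on the sample table from the question. The helper names parse_timestamp and count_since and the list of accepted formats are assumptions for illustration. In the real handler, the parsed value would be passed to db.GqlQuery as above. Rows without a timestamp are skipped because Datastore leaves entities that lack a filtered property out of the results.

```python
import datetime

# Accepted input formats (an assumption): with and without fractional seconds.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_timestamp(raw):
    """Turn a string such as '2014-02-02 21:07:40.822000' into a datetime."""
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(raw, fmt)
        except ValueError:
            pass  # try the next format
    raise ValueError("unrecognised timestamp: %r" % raw)


# Local stand-in for the sample table; None marks rows stored before the
# timestamp property was added.
ROWS = [
    (1, "ABC", parse_timestamp("2014-02-02 21:07:40.822000")),
    (2, "CDE", parse_timestamp("2014-02-02 22:07:40.000000")),
    (3, "EFG", None),
    (4, "GHI", parse_timestamp("2014-02-02 21:07:40.822000")),
    (5, "IJK", None),
    (6, "KLM", parse_timestamp("2014-01-02 21:07:40.822000")),
]


def count_since(rows, cutoff):
    """Count rows whose timestamp is >= cutoff, skipping rows with no timestamp."""
    return sum(1 for _, _, ts in rows if ts is not None and ts >= cutoff)


print(count_since(ROWS, parse_timestamp("2014-02-02 21:07:40.822000")))  # 3
```

Rows 1, 2 and 4 qualify. Row 6 is a month earlier, and rows 3 and 5 have no timestamp at all.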

Related

Store a list of values as a string when creating a table in snowflake

I am trying to create a table with 5 columns. Column 2 (PROGRESS) is a comma-separated list (e.g. 1,2,3,4), but when I try to create this column as a STRING, VARIANT, or VARCHAR, Snowflake does not accept it. Any advice on how I can load a comma-separated list from a CSV into a single column? I tried importing the data as TSV, XML, and JSON files, with no success.
create or replace TABLE AD_HOC.TEMP.NEW_DATA (
VISITOR_ID VARCHAR(16777216),
PROGRESS VARCHAR(16777216),
DATE DATETIME,
ROLE VARCHAR(16777216),
FIRST_VISIT DATETIME
)COMMENT='Interaction data'
;
Goal:
VISITOR_ID | PROGRESS | DATE | ROLE | FIRST_VISIT
111 | [1,2,3] | 1/1/2022 | OWNER | 1/1/2021
123 | [1] | 1/2/2022 | ADMIN | 2/2/2021
23321 | [1,2,3,4] | 2/22/2022 | USER | 3/12/2021
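I encoded the column in Python and then loaded the data into Snowflake:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# doc_data is the DataFrame loaded from the CSV; each PROGRESS entry must be a
# list of values, and it becomes one 0/1 column per distinct value.
mlb = MultiLabelBinarizer()
df = doc_data.join(pd.DataFrame(mlb.fit_transform(doc_data.pop('PROGRESS')),
                                columns=mlb.classes_,
                                index=doc_data.index))
df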
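MultiLabelBinarizer.fit_transform expects each row to be a collection of labels. Given a raw string such as "1,2,3", it treats every character, including the comma, as a label, so the values need to be split first. The plain-Python sketch below shows the same one-hot idea without scikit-learn. The function name binarize_progress and the assumption that the labels are integers are illustrative only.

```python
def binarize_progress(values):
    """One-hot encode comma-separated PROGRESS strings such as '1,2,3'.

    Returns (classes, matrix): the sorted distinct labels and one 0/1 row
    per input string, in the same spirit as MultiLabelBinarizer.
    """
    # Split each string into a set of labels, ignoring blanks and spaces.
    label_sets = [{p.strip() for p in v.split(",") if p.strip()} for v in values]
    # Assumption: labels are integers, so sort them numerically.
    classes = sorted(set().union(*label_sets), key=int)
    matrix = [[1 if c in labels else 0 for c in classes] for labels in label_sets]
    return classes, matrix


classes, matrix = binarize_progress(["1,2,3", "1", "1,2,3,4"])
print(classes)  # ['1', '2', '3', '4']
print(matrix)   # [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
```

With pandas, splitting the column first, e.g. doc_data['PROGRESS'].str.split(','), before calling fit_transform should have the same effect.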

Adding multiple records from a string

I have a string of email addresses, for example "a@a.com; b@a.com; c@a.com".
My database is:
record | flag1 | flag2 | emailaddress
--------------------------------------------------------
1 | 0 | 0 | a@a.com
2 | 0 | 0 | b@a.com
3 | 0 | 0 | c@a.com
What I need to do is parse the string and, if an address is not in the database, add it.
Then, return a string of just the record numbers that correspond to the email addresses.
So, if the call is made with "A@a.com; c@a.com; d@a.com", the routine would add "d@a.com", then return "1,3,4", corresponding to the records that match the email addresses.
What I am doing now is calling the database once per email address to look it up and confirm it exists (adding it if it doesn't), then looping through the addresses again from my PowerShell app to collect the record numbers one by one.
There has to be a way to pass all of the addresses to SQL at the same time, right?
I have it working in PowerShell, but slowly.
I'd love a single response from SQL containing just the record number for each email address, as shown above, i.e. "1,2,4" etc.
My PowerShell code is:
$EmailList2 = $EmailList.split(";")
# Let's get the ID # for each email address.
foreach ($x in $EmailList2)
{
    $data = exec-query "select Record from emailaddresses where emailAddress = @email" -parameter @{email=$x.trim()} -conn $connection
    if ($($data.Tables.record) -gt 0)
    {
        $ResponseNumbers = $ResponseNumbers + "$($data.Tables.record), "
    }
}
$ResponseNumbers = $($ResponseNumbers + "XX").replace(", XX", "")
return $ResponseNumbers
You'd have to do this in two steps: first INSERT the new values, then use a SELECT to get the record numbers back. This answer uses DelimitedSplit8K (not DelimitedSplit8K_LEAD) because you're still using SQL Server 2008. On that note, I strongly suggest looking at upgrade paths soon, as you have about 6 weeks of support left.
You can use the function to split the values and then INSERT/SELECT appropriately:
DECLARE @Emails varchar(8000) = 'a@a.com;b@a.com;c@a.com';

WITH Emails AS(
    SELECT LTRIM(RTRIM(DS.Item)) AS Email
    FROM dbo.DelimitedSplit8K(@Emails,';') DS)
INSERT INTO dbo.YourTable (emailaddress) --I don't know what the other columns' values should be, so have excluded them
SELECT E.Email
FROM Emails E
     LEFT JOIN dbo.YourTable YT ON YT.emailaddress = E.Email
WHERE YT.emailaddress IS NULL;

SELECT YT.record
FROM dbo.YourTable YT
     JOIN dbo.DelimitedSplit8K(@Emails,';') DS ON LTRIM(RTRIM(DS.Item)) = YT.emailaddress;
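The same two-step idea (insert the missing addresses, then fetch all record numbers in one query) can be sketched end to end with Python's built-in sqlite3 module. This is SQLite, not SQL Server. INSERT OR IGNORE and COLLATE NOCASE are SQLite features that stand in for the LEFT JOIN insert and for SQL Server's usual case-insensitive collation. The table layout is copied from the question, and the function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """In-memory copy of the question's table with records 1-3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emailaddresses ("
        " record INTEGER PRIMARY KEY,"
        " flag1 INTEGER DEFAULT 0,"
        " flag2 INTEGER DEFAULT 0,"
        " emailaddress TEXT COLLATE NOCASE UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO emailaddresses (emailaddress) VALUES (?)",
        [("a@a.com",), ("b@a.com",), ("c@a.com",)],
    )
    return conn


def record_numbers(conn, email_list):
    """Add unknown addresses, then return their record numbers as '1,3,4'."""
    emails = [e.strip() for e in email_list.split(";") if e.strip()]
    if not emails:
        return ""
    # Step 1: insert every address; the UNIQUE NOCASE constraint makes
    # INSERT OR IGNORE skip the ones that already exist.
    conn.executemany(
        "INSERT OR IGNORE INTO emailaddresses (emailaddress) VALUES (?)",
        [(e,) for e in emails],
    )
    # Step 2: fetch all matching record numbers in a single query.
    placeholders = ",".join("?" * len(emails))
    rows = conn.execute(
        "SELECT record FROM emailaddresses WHERE emailaddress IN (%s) ORDER BY record"
        % placeholders,
        emails,
    ).fetchall()
    return ",".join(str(r[0]) for r in rows)


conn = make_demo_db()
print(record_numbers(conn, "A@a.com; c@a.com; d@a.com"))  # 1,3,4
```

There is one INSERT batch and one SELECT, instead of a lookup per address. The result is ordered by record number rather than by input position.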

How to import column value in Cassandra like one having such values "13/01/09 23:13"?

Query:
CREATE TABLE IF NOT EXISTS "TEMP_tmp".temp (
"Date_Time" timestamp,
PRIMARY KEY ("Date_Time")
);
The CSV contains values such as "13/01/09 23:13".
Error: Failed to import 1 rows: ParseError - Failed to parse 13/01/09 23:13 : invalid literal for long() with base 10: '13/01/09 23:13', given up without retries.
What data type should I use?
The default cqlsh timestamp format is year-month-day hour:min:sec+timezone.
Example:
2017-02-01 05:28:36+0000
You can either change your dates to the format above or change the format in the cqlshrc file.
See the answer to "custom cassandra / cqlsh time_format".
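If you choose to rewrite the file, a small Python pre-processing step can convert the column before you run COPY FROM. This is a sketch under several assumptions. The question does not say whether 13/01/09 means 13 January 2009 or 2013-01-09, so the day/month/year pattern below is a guess. The values are treated as UTC, and the timestamp is assumed to be the first CSV column. The helper names are hypothetical.

```python
import csv
import datetime
import io

# Assumption: day/month/two-digit-year; use "%y/%m/%d %H:%M" if year comes first.
SOURCE_FORMAT = "%d/%m/%y %H:%M"
# cqlsh's default timestamp layout; values are assumed to be UTC.
CQLSH_FORMAT = "%Y-%m-%d %H:%M:%S+0000"


def to_cqlsh_timestamp(value, source_format=SOURCE_FORMAT):
    """Reformat one timestamp string into the layout cqlsh parses by default."""
    parsed = datetime.datetime.strptime(value.strip(), source_format)
    return parsed.strftime(CQLSH_FORMAT)


def convert_csv(src, dst, column=0, source_format=SOURCE_FORMAT):
    """Rewrite one timestamp column of a CSV stream so COPY FROM can read it."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        row[column] = to_cqlsh_timestamp(row[column], source_format)
        writer.writerow(row)


src = io.StringIO("13/01/09 23:13\n")
dst = io.StringIO()
convert_csv(src, dst)
print(dst.getvalue(), end="")  # 2009-01-13 23:13:00+0000
```

For real files, open both with newline="" and pass the file objects to convert_csv. Then import the converted file with COPY ... FROM.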
cqlsh displays a stored timestamp in this format, e.g. 2017-02-01 08:28:21+0000. For example, if I store a timestamp in the table you described, "TEMP_tmp".temp:
cassandra@cqlsh> INSERT INTO TEMP_tmp.temp ("Date_Time") VALUES ( toTimestamp(now()));
cassandra@cqlsh> SELECT * FROM TEMP_tmp.temp;
Date_Time
--------------------------
2017-02-01 09:14:29+0000
If we copy all the data to a CSV file:
cassandra@cqlsh> COPY Temp_tmp.temp TO 'temp.csv';
temp.csv will contain:
2017-02-01 09:14:29+0000
If we truncate the table:
cassandra@cqlsh> TRUNCATE TABLE TEMP_tmp.temp;
cassandra@cqlsh> SELECT * FROM TEMP_tmp.temp;
Date_Time
--------------------------
Then, if we import temp.csv:
cassandra@cqlsh> COPY Temp_tmp.temp FROM 'temp.csv';
Using 1 child processes
Starting copy of Temp_tmp.temp with columns [Date_Time].
Processed: 1 rows; Rate: 1 rows/s; Avg. rate: 1 rows/s
1 rows imported from 1 files in 0.746 seconds (0 skipped).
If you want a custom date/time format, follow Ashraful Islam's answer on your question.

Hive query, better option to self join

I am working with a Hive table that is set up as follows:
id (Int), mapper (String), mapperId (Int)
A single ID can have multiple mapperIds, one per mapper, as in the example below:
ID (1) mapper(MAP1) mapperId(123)
ID (1) mapper(MAP2) mapperId(1234)
ID (1) mapper(MAP3) mapperId(12345)
ID (2) mapper(MAP2) mapperId(10)
ID (2) mapper(MAP3) mapperId(12)
I want to return the mapperIds associated with each unique ID as a single row per ID. For the example above, I would want:
1, 123, 1234, 12345
2, null, 10, 12
The mapper strings are known, so I was thinking of doing a self join for every mapper string I am interested in, but is there a better solution?
If the assumption that the mapper column is distinct for a given ID is correct, you could collect the mapper and mapperid columns into a map using Brickhouse's collect function. You can clone the Brickhouse repository and build the jar with Maven.
Query:
add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select id
      ,id_map['MAP1'] as mapper1
      ,id_map['MAP2'] as mapper2
      ,id_map['MAP3'] as mapper3
from (
      select id
            ,collect(mapper, mapperid) as id_map
      from some_table
      group by id
     ) x;
Output:
| id | mapper1 | mapper2 | mapper3 |
------------------------------------
| 1  | 123     | 1234    | 12345   |
| 2  | NULL    | 10      | 12      |
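The Brickhouse query builds one map per id and then looks up each known mapper key. A lookup of a key that an id does not have returns NULL. The plain-Python sketch below mirrors that logic on the sample rows to show the expected shape of the result. The function name pivot and the fixed MAPPERS tuple are illustrative assumptions and are not part of Hive or Brickhouse.

```python
from collections import defaultdict

MAPPERS = ("MAP1", "MAP2", "MAP3")  # the known mapper strings


def pivot(rows, mappers=MAPPERS):
    """Turn (id, mapper, mapperId) rows into one (id, id1, id2, ...) tuple per id."""
    by_id = defaultdict(dict)
    for id_, mapper, mapper_id in rows:
        by_id[id_][mapper] = mapper_id  # assumes one row per (id, mapper)
    # dict.get returns None for a missing key, like id_map['MAP1'] giving NULL.
    return [(id_,) + tuple(by_id[id_].get(m) for m in mappers) for id_ in sorted(by_id)]


rows = [
    (1, "MAP1", 123), (1, "MAP2", 1234), (1, "MAP3", 12345),
    (2, "MAP2", 10), (2, "MAP3", 12),
]
print(pivot(rows))  # [(1, 123, 1234, 12345), (2, None, 10, 12)]
```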

Scala Slick Lifted Date GroupBy

I'm using Scala 2.10 with Slick 1.0.0 and trying to do a lifted query.
I have a table, "Logins", from which I'm loading rows and grouping them by a Timestamp column. However, when I attempt the groupBy, I run into an issue when I try to format the Timestamp field to extract only the day portion, so that objects from the same day are grouped together.
Given the objects:
id | requestTimestamp
1 | Jan 1, 2013 01:02:003
2 | Jan 1, 2013 03:04:005
3 | Jan 1, 2013 05:06:007
4 | Jan 2, 2013 01:01:001
I'd like the database to return the rows grouped by day. For the sake of brevity, the formatted timestamps map to ids as follows (in practice the ids would be lists of objects):
Jan 1, 2013 -> (1, 2, 3)
Jan 2, 2013 -> (4)
I've got the following slick table object:
import java.sql.Timestamp

private implicit object Logins extends Table[(Int, Timestamp)]("LOGINS") {
  def id = column[Int]("ID", O.PrimaryKey)
  def requestTimeStamp = column[Timestamp]("REQUESTTIMESTAMP", O.NotNull)
  def * = id ~ requestTimeStamp
}
The following Query method:
val q = for {
  l <- Logins if (l.id >= 1 && l.id <= 4)
} yield l

val dayGroupBy = new java.text.SimpleDateFormat("MM/dd/yyyy")
val q1 = q.groupBy(l => dayGroupBy.format(l.requestTimeStamp))

db.withSession {
  q1.list
}
However, instead of getting the expected grouping, I get an exception on the line where I attempt the groupBy:
java.lang.IllegalArgumentException: Cannot format given Object as a Date
Does anyone have any suggestions on properly grouping by Timestamps out of the database?
Timestamp and Date are not the same thing! Try converting the Timestamp to human-readable text using Calendar or SimpleDateFormat.
Not so sure about the second one, though!
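A likely cause of the exception is that, in Slick's lifted embedding, l.requestTimeStamp inside groupBy is a query column rather than a java.sql.Timestamp value. SimpleDateFormat.format throws IllegalArgumentException ("Cannot format given Object as a Date") for anything that is not a Date or a Number. One workaround is to fetch the rows first and group them by calendar day in application code. The sketch below shows that grouping in Python rather than Scala. Reading the sample times from the question as 01:02:03 and so on is an assumption.

```python
import datetime
from collections import OrderedDict


def group_by_day(rows):
    """Group (id, timestamp) pairs by calendar day, keeping first-seen day order."""
    groups = OrderedDict()
    for id_, ts in rows:
        groups.setdefault(ts.date(), []).append(id_)
    return groups


logins = [
    (1, datetime.datetime(2013, 1, 1, 1, 2, 3)),
    (2, datetime.datetime(2013, 1, 1, 3, 4, 5)),
    (3, datetime.datetime(2013, 1, 1, 5, 6, 7)),
    (4, datetime.datetime(2013, 1, 2, 1, 1, 1)),
]
for day, ids in group_by_day(logins).items():
    print(day.strftime("%b %d, %Y"), "->", ids)
# Jan 01, 2013 -> [1, 2, 3]
# Jan 02, 2013 -> [4]
```

Grouping in memory is only reasonable when the filtered row set is small. For large tables, a database-side date function would be needed instead.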
