NDB Query Chaining oddity - Order + Inequality? - google-app-engine

I just spent a few hours debugging something that seems odd. However, I can't tell if it is a bug or if I'm just doing something wrong. Short version: it seems that if I have an inequality filter on an NDB query AND an order, they must be chained in the same expression.
Note: All the data shown below was created in order, even though I was futzing with the dates in the datastore so some do not have microseconds.
#Q1. Yields the correct results:
q = BlogPost.query().filter(BlogPost.published_date > PUBLISHED_DATE_MIN).order(-BlogPost.published_date)
for p in q.fetch(1000):
    print "%s - %s" % (p.published_date, p.title)
# 2014-03-02 21:49:25 - First
# 2014-03-01 22:51:14.998963 - Should be 2nd
# 2014-03-01 21:49:54.273152 - Should be Third
Here is what I initially had. Note that the order is on a separate line:
q = BlogPost.query().filter(BlogPost.published_date > PUBLISHED_DATE_MIN)
q.order(-BlogPost.published_date)
for p in q.fetch(1000):
    print "%s - %s" % (p.published_date, p.title)
# 2014-03-01 21:49:54.273152 - Should be Third
# 2014-03-01 22:51:14.998963 - Should be 2nd
# 2014-03-02 21:49:25 - First
The NDB query appears to be unordered (or reverse ordered). However, when I then remove the inequality filter, I get:
q = BlogPost.query() #.filter(BlogPost.published_date > PUBLISHED_DATE_MIN)
q.order(-BlogPost.published_date)
for p in q.fetch(1000):
    print "%s - %s" % (p.published_date, p.title)
# 2014-03-02 21:49:25 - First
# 2014-03-01 22:51:14.998963 - Should be 2nd
# 2014-03-01 21:49:54.273152 - Should be Third
I am seeing this behavior in the SDK console as well as in the remote console, and on the actual appspot instance when I deploy my code. Is this a bug (not likely), or is there something I am missing?

When you run q.order(-BlogPost.published_date), it creates and returns a new query, which you aren't assigning to anything.
You want to have:
q = q.order(-BlogPost.published_date)
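NDB Query objects are immutable: filter() and order() each return a new Query rather than modifying the one they are called on. A minimal sketch (using a hypothetical Query class, not the real NDB API) of why the discarded return value loses the ordering:

```python
# Minimal sketch of an immutable query builder (hypothetical class, not the
# real NDB API): each call returns a NEW object, so the result must be
# reassigned or it is silently lost.
class Query:
    def __init__(self, filters=(), orders=()):
        self.filters = filters
        self.orders = orders

    def filter(self, f):
        # Returns a new Query; self is left untouched.
        return Query(self.filters + (f,), self.orders)

    def order(self, o):
        # Also returns a new Query; calling this without assignment is a no-op.
        return Query(self.filters, self.orders + (o,))

q = Query().filter("published_date > MIN")
q.order("-published_date")      # result discarded: q is unchanged
assert q.orders == ()

q = q.order("-published_date")  # reassigned: the ordering is kept
assert q.orders == ("-published_date",)
```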

Related

Skip one iteration and adding conditions Python (Basic Question)

I have been reading a text file into an object and creating a list from its contents.
textfile:
Activity Time Location
Football 8-9 Pitch
Basketball 9-10 Gym
Lunch 11-12 Home
Read 13-14 Library
Swim 14-15 Pool
openTime = 6
closeTime = 15
come = int(input('When do you want to come?'))
leave = int(input('When do you want to leave?'))
# endtime_of_activity and startTimeofactivity is equal to the startingtime
# of each activity and the end time of each activity in the textfile
# (taken from a list that I have been splitting).
for i in range(len(my_list)):
    item = my_list[i]
    if (i == 1):
        continue
    if closeTime <= come <= endtime_of_activity and startTimeofactivity < leave <= closeTime:
        print(item.activities)
My question: As you can see in the textfile, there are some activities appearing at different times, for example football between 8 and 9. With the code I want to be able to skip the second element (basketball), as the code is doing; however, I also want the if statement below the "continue" to work. If I type that I'm coming at 8 and leaving at 12, I want all the activities (excluding the second one) to show. This works for me when I'm doing a regular for-loop without skipping the second activity, i.e. when I'm just writing for i in my_list and then adding the condition. But when I'm running the code above, it shows me all the activities (except basketball) independently of when I choose to come and leave. What have I missed? How could I write the code better?
If you want to skip a certain activity, simply test whether the activity to be printed is the same one and skip it if it is:
with open("f.txt", "w") as f:
    f.write(("Activity Time Location\nFootball 8-9 "
             "Pitch\nBasketball 9-10 Gym\nLunch 11-12"
             " Home\nRead 13-14 Library\nSwim 14-15"))

data = []
with open("f.txt") as f:
    for line in f:
        act, time, what = (line.strip().split(" ") + ["", "", ""])[:3]
        if data:
            try:
                time = list(map(int, time.split("-")))
            except ValueError:
                continue  # invalid time: skip row
        data.append([act, time, what])

header, *data = data

openTime = 6
closeTime = 15
come = 8
leave = 12
skip = set(["Basketball"])

fmt = "{:<15}" * len(header)
print(fmt.format(*header))
for act, (start, stop), what in data:
    if act in skip:
        continue
    if start >= come and stop <= leave and openTime <= come and closeTime >= leave:
        print(fmt.format(act, f"{start}-{stop}", what))
Output:
Activity Time Location
Football 8-9 Pitch
Lunch 11-12 Home
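The combined time-window condition in the loop above can be factored into a small helper to make the intent explicit (a sketch; fits is a hypothetical name, not part of the original code):

```python
def fits(start, stop, come, leave, open_time=6, close_time=15):
    # An activity is shown only when it lies entirely inside the visit
    # window [come, leave], and the visit itself is within opening hours.
    return (start >= come and stop <= leave
            and open_time <= come and leave <= close_time)

assert fits(8, 9, come=8, leave=12)        # Football: shown
assert not fits(13, 14, come=8, leave=12)  # Read: ends after leaving
```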

How to check df rows that has a difference between 2 columns and then send it to another table to verify information

I’m very new to python and am trying really hard these last few days on how to go through a df row by row, and check each row that has a difference between columns dQ and dCQ. I just said != 0 since there could be a pos or neg value. Now if this is true, I would like to check in another table whether certain criteria are met. I'm used to working in R, where I could store the df into a variable and call upon the column name, I can't seem to find a way to do it in python. I posted all of the code I’ve been playing with. I know this is messy, but any help would be appreciated. Thank you!
I've tried installing different packages that wouldn't work, I tried making a for loop (I failed miserably), maybe a function? I’m not sure where to even look. I've never learned Python, I’m really doing my best watching videos online and reading on here.
import pyodbc
import pymysql
import pandas as pd
import numpy as np

conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};"
                      "Server=***-***-***.****.***.com;"
                      "Database=****;"
                      "Trusted_Connection=no;"
                      "UID=***;"
                      "PWD=***")

# cur = conn.cursor()
# cur.execute("SELECT TOP 1000 tr.dQ, po.dCQ, tr.dQ - po.dCQ as diff "
#             "FROM [IP].[dbo].[vT] tr (nolock) "
#             "JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
#             "WHERE tr.dQ != po.dCQ")
# query = cur.fetchall()

query = ("SELECT TOP 100 tr.dQ, po.dCQ/*, tr.dQ - po.dCQ as diff */ "
         "FROM [IP].[dbo].[vT] tr (nolock) "
         "INNER JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
         "WHERE tr.dQ != po.dCQ")
df = pd.read_sql(query, conn)
# print(df[2,])

cursor = conn.cursor(pymysql.cursors.DictCursor)
cursor.execute("SELECT TOP 100 tr.dQ, po.dCQ/*, tr.dQ - po.dCQ as diff */ "
               "FROM [IP].[dbo].[vT] tr (nolock) "
               "INNER JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
               "WHERE tr.dQ != po.dCQ")
result_set = cursor.fetchall()
for row in result_set:
    print("%s, %s" % (row["name"], row["category"]))

# if df[3] != 0:
#     diff = df[1] - df[2]
#     print(diff)
# else:
#     exit
# cursor = conn.cursor()
# for row in cursor.fetchall():
#     print(row)
#
# for record in df:
#     if record[1] != record[2]:
#         print(record[3])
#     else:
#         record[3] = record[1]
#         print(record)
# df['diff'] = np.where(df['dQ'] != df["dCQ"])
I expect some sort of notification that there's a difference in row xx, and now it will check in table vP to verify we received this data's details. I believe i can get to this point, if i can get the first part working. Any help is appreciated. I'm sorry if this question is not clear, i will do my best to answer any questions someone may have. Thank you!
One solution could be to make a new column where you store the result of the diff between df[1] and df[2]. One note first: it might be more precise to either name your columns when you make the df and then reference them with df['name1'] and df['name2'], or use df.iloc[:,1] and df.iloc[:,2]. Also note that column numbers start at zero, so these refer to the second and third columns in the df. The reason to use iloc with the colons is to state explicitly that you want all rows and columns number 1 and 2. Otherwise, with df[1] or df[2], if your df were transposed that might actually refer to what you think of as the index. Now, on to a solution.
You could try
df['diff']=df.iloc[:,1]-df.iloc[:,2]
df['diff_bool']=np.where(df['diff']==0,False, True)
or you could combine this into one method
df['diff_bool'] = np.where(df.iloc[:,1] - df.iloc[:,2] == 0, False, True)
This will create a column in your df that says if there is a difference between columns one and two. You don't actually need to loop through row by row because pandas functions work like matrix math, so df.iloc[:,1]-df.iloc[:,2] will apply the subtraction row by row automatically.
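As a runnable sketch with made-up data (the column names dQ and dCQ are taken from the question; the values are invented):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"dQ": [10, 5, 7], "dCQ": [10, 3, 9]})

# Vectorized: no row-by-row loop needed.
df["diff"] = df["dQ"] - df["dCQ"]
df["diff_bool"] = np.where(df["diff"] == 0, False, True)

# Rows that need follow-up in the other table:
mismatched = df[df["diff_bool"]]
print(mismatched.index.tolist())  # -> [1, 2]
```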

un-R behaviour of ud.convert()

Here is a data.frame df in which I want to convert the value column to g/g (grams/gram). The first two entries are in ug/mg (micrograms/milligram) and the last two are in ng/mg (nanograms/milligram). However, the ud.convert() function only seems to consider the first unit entry it encounters (i.e. ug/mg) and then converts all value entries from that unit, ignoring the change in units in row #3.
require(udunits2)
df = data.frame(
value = rep(1,4),
unit = c(rep('ug/mg', 2), rep('ng/mg', 2)),
stringsAsFactors = FALSE
)
df$value2 = ud.convert(df$value, df$unit, 'g/g')
df
# value unit value2
# 1 1 ug/mg 0.001
# 2 1 ug/mg 0.001
# 3 1 ng/mg 0.001
# 4 1 ng/mg 0.001
Every other R function I can think of does such an operation for each row. Consider paste() or substr():
paste(df$value, df$unit, 'g/g', sep = '---')
substr(df$unit,1,2)
In my opinion this is a very un-R behavior of ud.convert() and should be changed or at least a warning should be given. Or am I overlooking something? The conversion happens in the C-function R_ut_convert. Unfortunately, I don't know any C to propose a change ;)

Netbeans regex - Find and Replace (Ctrl + H)

I'm exploring regex, but I haven't been able to achieve exactly what I want yet. I'm using NetBeans and I need to change every strncpy(..., sizeof(x)) to strncpy(..., sizeof(x) - 1), i.e., add "- 1" after the closing parenthesis of sizeof(x).
An example should be:
strncpy(data->error, t_result[ID(data->modulo)].status, sizeof(data->error)); //need below
strncpy(data->error, t_result[ID(data->modulo)].status, sizeof(data->error) - 1);
Here is the regex:
(strncpy\(.*?sizeof\([^)]*\))
(strncpy\(.*?sizeof\([^)]*\)) Capture the following into capture group 1
strncpy\( Matches strncpy( literally
.*? Matches any character any number of times, but as few as possible
sizeof\( Matches sizeof( literally
[^)]* Matches any character except ) any number of times
\) Matches ) literally
Replacement $1 - 1
Result in:
strncpy(data->error, t_result[ID(data->modulo)].status, sizeof(data->error) - 1);
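If it helps to verify the pattern outside NetBeans, the same substitution can be sketched with Python's re module; note that re.sub uses \1 for the group reference where NetBeans uses $1:

```python
import re

# Same pattern as above: capture up to and including sizeof(...)'s close paren.
pattern = r"(strncpy\(.*?sizeof\([^)]*\))"
line = "strncpy(data->error, t_result[ID(data->modulo)].status, sizeof(data->error));"

fixed = re.sub(pattern, r"\1 - 1", line)
print(fixed)
# strncpy(data->error, t_result[ID(data->modulo)].status, sizeof(data->error) - 1);
```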

"Backward" pagination with cursors in db.* with use not documented tricks following Guido example for ndb.*

Will the backward-cursor code below remain compatible?
This is an advanced question, since it relates to an undocumented feature of db.* in Google App Engine; it may even go against some documentation of cursor()!
https://developers.google.com/appengine/docs/python/datastore/queryclass?hl=pl#Query_fetch
According to the documentation, "... future invocation of the same query ..." will not give a backward-cursor feature, so I inverted the direction of the query to go backward.
I studied Guido's example for ndb.* and found that a reverse cursor can be implemented in db.* using some of the tricks that ndb.* uses for cursor.reversed() (+/-)!
Backward pagination with cursor is working but missing an item
So the query is DeleteMe.all().order('rank') and the inverted one is DeleteMe.all().order('-rank'). I assume that we could apply some filters before the sort and it would still use the same index scan.
Below is code doing a real backward cursor in db.*, which I want to confirm will remain compatible, if that is possible.
messages = []

class DeleteMe(db.Model):
    rank = db.IntegerProperty()

db.delete(DeleteMe.all(keys_only=True))
for rank in range(50):
    e = DeleteMe(rank=rank)
    e.put()

messages.append('forward +5')
q = DeleteMe.all().order('rank')
r = q.fetch(5)
for e in r:
    messages.append(e.rank)
endCursor = q.cursor()
messages.append('end to %s' % endCursor)

messages.append('forward +5')
startCursor = endCursor
messages.append('start from %s' % startCursor)
q = DeleteMe.all().order('rank').with_cursor(startCursor)
r = q.fetch(1)
# 1st trick
backwardCursor = q.cursor()
messages.append('backward cursor %s' % backwardCursor)
q.with_cursor(backwardCursor)
r.extend(q.fetch(5 - 1))
for e in r:
    messages.append(e.rank)
endCursor = q.cursor()
messages.append('end to %s' % endCursor)

messages.append('backward +5')
startCursor = backwardCursor
messages.append('modified start from %s' % startCursor)
# 2nd trick
q = DeleteMe.all().order('-rank').with_cursor(startCursor)
r = q.fetch(5)
for e in r:
    messages.append(e.rank)
endCursor = q.cursor()
messages.append('end to %s' % endCursor)
And result of working backward cursor:
forward +5
0
1
2
3
4
end to E-ABAOsB8gEEcmFua_oBAggE7AGCAhxqCGVyZXN0MjRochALEghEZWxldGVNZRjNnQYMFA==
forward +5
start from E-ABAOsB8gEEcmFua_oBAggE7AGCAhxqCGVyZXN0MjRochALEghEZWxldGVNZRjNnQYMFA==
backward cursor E-ABAOsB8gEEcmFua_oBAggF7AGCAhxqCGVyZXN0MjRochALEghEZWxldGVNZRjOnQYMFA==
5
6
7
8
9
end to E-ABAOsB8gEEcmFua_oBAggJ7AGCAhxqCGVyZXN0MjRochALEghEZWxldGVNZRjSnQYMFA==
backward +5
modified start from E-ABAOsB8gEEcmFua_oBAggF7AGCAhxqCGVyZXN0MjRochALEghEZWxldGVNZRjOnQYMFA==
4
3
2
1
0
end to E-ABAOsB8gEEcmFua_oBAggA7AGCAhxqCGVyZXN0MjRochALEghEZWxldGVNZRjJnQYMFA==
This is a little long, but it needs to be. Please give me a hint as to whether such use of db.* is safe, since it allows very fast paging.
I don't think this is how you're supposed to do it. When using a backward cursor you should also use a reverse query -- in your case that would mean order('-rank').
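The principle behind this can be simulated without the datastore: treat a cursor as a position in the ranked result list, and page backward by running the reversed-order query from the mirrored position. This is a plain-Python sketch of the idea, not the GAE API:

```python
# Plain-Python sketch: a cursor is modeled as a list index, and paging
# backward means querying the REVERSED ordering from the mirrored position.
ranks = list(range(50))  # stands in for DeleteMe entities ordered by rank

def fetch(ordered, cursor, n):
    """Return (page, new_cursor) for a 'query' with the given ordering."""
    page = ordered[cursor:cursor + n]
    return page, cursor + len(page)

forward = ranks           # order('rank')
backward = ranks[::-1]    # order('-rank')

page, cur = fetch(forward, 0, 5)
print(page)               # [0, 1, 2, 3, 4]

# Paging back from here: mirror the cursor into the reversed ordering.
back_cur = len(ranks) - cur
prev_page, _ = fetch(backward, back_cur, 5)
print(prev_page)          # [4, 3, 2, 1, 0]
```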