I want something similar to this: if fileObj.is_file() == True: but for a dataset.
I want to check whether a date exists before I select it.
y_begin = 2007
y_end = 2020
begin_date = '05-01'
end_date = '09-31'
ds_so_merge = None
for y in range(y_begin, y_end + 1):
    begin = str(y) + '-' + begin_date
    end = str(y) + '-' + end_date
    # !!! here: check if the date exists and, if not, try the following date !!!
    ds_so = dataset.sel(time=slice(begin, end))
    if ds_so_merge is None:
        ds_so_merge = ds_so
    else:
        ds_so_merge = ds_so.merge(ds_so_merge)
You can check whether a coordinate contains a specific value with value in coord, just as you could with a numpy array or a pandas index.
Since you're using slices, another option is to pull all elements that match the slice criteria and then select the first matched element.
Something like the following should work:
first_time_matching_slice = dataset.sel(time=slice(begin, end)).isel(time=0)
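Both ideas can be tried without a dataset at hand. The following is only a sketch on a plain pandas DatetimeIndex, which the answer names as behaving the same way for membership tests. The sample dates and the helper first_time_in_range are made up for illustration and are not part of xarray.

```python
import pandas as pd

# Hypothetical time axis with a gap: 2007-05-01 and 2007-05-02 are missing.
times = pd.DatetimeIndex(["2007-04-30", "2007-05-03", "2007-05-04", "2007-09-30"])


def first_time_in_range(index, begin, end):
    """Return the first timestamp within [begin, end], or None if there is none."""
    begin, end = pd.Timestamp(begin), pd.Timestamp(end)
    matching = index[(index >= begin) & (index <= end)]
    return matching[0] if len(matching) else None


# Membership test: is the exact start date present on the axis?
has_exact_start = pd.Timestamp("2007-05-01") in times

# Fallback: the first date that actually exists inside the requested window.
first_available = first_time_in_range(times, "2007-05-01", "2007-09-30")
```

Returning None for an empty window is a choice made for this sketch. With xarray, calling isel(time=0) on an empty selection raises an error instead, so the result of the slice may need a length check first.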
Stackoverflow supports table markdown. For example, to display a table like this:
N_NATIONKEY  N_NAME     N_REGIONKEY
0            ALGERIA    0
1            ARGENTINA  1
2            BRAZIL     1
3            CANADA     1
4            EGYPT      4
You can write code like this:
|N_NATIONKEY|N_NAME|N_REGIONKEY|
|---:|:---|---:|
|0|ALGERIA|0|
|1|ARGENTINA|1|
|2|BRAZIL|1|
|3|CANADA|1|
|4|EGYPT|4|
It would save a lot of time to generate the Stackoverflow table markdown automatically when running Snowflake queries.
The following stored procedure accepts either a query string or a query ID (it will auto-detect which it is) and returns the table results as Stackoverflow table markdown. It will automatically align numbers and dates to the right, strings, arrays, and objects to the left, and other types default to centered. It supports any query you can pass to it. It may be a good idea to use $$ to terminate the string passed into the procedure in case the SQL contains single quotes. You can create the procedure and test it using this script:
create or replace procedure stackoverflow_table("queryOrQueryId" string)
returns string
language javascript
execute as caller
as
$$
const MAX_ROWS = 50; // Set the maximum row count to fetch. Tables in markdown larger than this become hard to read.
var [rs, i, c, row, props] = [null, 0, 0, 0, {}];
if (!queryOrQueryId || queryOrQueryId == 0){
queryOrQueryId = `select * from table(result_scan(last_query_id())) limit ${MAX_ROWS}`;
}
queryOrQueryId = queryOrQueryId.trim();
if (isUUID(queryOrQueryId)){
rs = snowflake.execute({sqlText:`select * from table(result_scan('${queryOrQueryId}')) limit ${MAX_ROWS}`});
} else {
rs = snowflake.execute({sqlText:`${queryOrQueryId}`});
}
props.columnCount = rs.getColumnCount();
for(i = 1; i <= props.columnCount; i++){
props["col" + i + "Name"] = rs.getColumnName(i);
props["col" + i + "Type"] = rs.getColumnType(i);
}
var table = getHeader(props);
while(rs.next()){
row = "|";
for(c = 1; c <= props.columnCount; c++){
row += escapeMarkup(rs.getColumnValueAsString(c)) + "|";
}
table += "\n" + row;
}
return table;
//------ End main function. Start of helper functions.
function escapeMarkup(s){
s = s.replace(/\\/g, "\\\\");
s = s.replace(/\|/g, '\\|');
s = s.replace(/\s+/g, " ");
return s;
}
function getHeader(props){
var s = "|";
for (var i = 1; i <= props.columnCount; i++){
s += props["col" + i + "Name"] + "|";
}
s += "\n";
for (var i = 1; i <= props.columnCount; i++){
switch(props["col" + i + "Type"]) {
case 'number':
s += '|---:';
break;
case 'string':
s += '|:---';
break;
case 'date':
s += '|---:';
break;
case 'json':
s += '|:---';
break;
default:
s += '|:---:';
}
}
return s + "|";
}
function isUUID(str){
const regexExp = /^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$/gi;
return regexExp.test(str);
}
$$;
-- Usage type 1, a simple query:
call stackoverflow_table($$ select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5 $$);
-- Usage type 2, a query ID:
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
call stackoverflow_table($quid);
Edit: Based on Fieldy's helpful feedback, I modified the procedure code to allow passing null or 0 or a blank string '' as the parameter. This will use the last query ID and is a helpful shortcut. It also adds a constant to the code that will limit the returns to a set number of rows. This limit will be applied when using query IDs (or sending null, '', or 0, which uses the last query ID). The limit is not applied when the input parameter is the text of a query to run to avoid syntax errors if there's already a limit applied, etc.
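To make the formatting rules easier to follow outside Snowflake, here is a minimal Python sketch of the same logic: the alignment row from getHeader and the escaping from escapeMarkup. The function names and the sample rows are illustrative only. Treating None as an empty string is an assumption of this sketch, not something the procedure above does.

```python
import re

# Alignment markers mirroring getHeader(): numbers and dates right-aligned,
# strings and JSON left-aligned, anything else centered.
ALIGNMENT = {"number": "---:", "date": "---:", "string": ":---", "json": ":---"}


def escape_markup(value):
    """Escape backslashes and pipes, then collapse whitespace runs to one space."""
    text = "" if value is None else str(value)  # assumption: NULL becomes ""
    text = text.replace("\\", "\\\\").replace("|", "\\|")
    return re.sub(r"\s+", " ", text)


def to_markdown(names, types, rows):
    """Build a pipe table from column names, column types, and row tuples."""
    header = "|" + "|".join(names) + "|"
    divider = "|" + "|".join(ALIGNMENT.get(t, ":---:") for t in types) + "|"
    body = ["|" + "|".join(escape_markup(v) for v in row) + "|" for row in rows]
    return "\n".join([header, divider] + body)


table = to_markdown(
    ["N_NATIONKEY", "N_NAME", "N_REGIONKEY"],
    ["number", "string", "number"],
    [(0, "ALGERIA", 0), (1, "ARGENTINA", 1)],
)
print(table)
```

Escaping the backslash before the pipe matters: doing it the other way round would double the backslashes that were just added for the pipes.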
Greg Pavlik's JavaScript stored procedure solution made me wonder if this would be any easier with the new Python language support in stored procedures. This is currently a public-preview feature.
The Python Snowpark API supports returning a result as a Pandas dataframe, and Pandas supports returning a dataframe in Markdown format, via the tabulate package. Here's the stored procedure.
CREATE OR REPLACE PROCEDURE markdown_table(query_id VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','pandas','tabulate', 'regex')
HANDLER = 'markdown_table'
EXECUTE AS CALLER
AS $$
import pandas as pd
import tabulate
import regex
def markdown_table(session, queryOrQueryId = None):
    # No argument: use the last query; UUID: use that query ID; otherwise run the text as a query
    if(queryOrQueryId is None):
        pandas_result = session.sql("""Select * from table(result_scan(last_query_id()))""").to_pandas()
    elif(bool(regex.match("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", queryOrQueryId))):
        pandas_result = session.sql(f"""select * from table(result_scan('{queryOrQueryId}'))""").to_pandas()
    else:
        pandas_result = session.sql(queryOrQueryId).to_pandas()
    return pandas_result.to_markdown()
$$;
Which you can use as follows:
-- Usage type 1, use the result from the query run immediately preceding the stored procedure call
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
call markdown_table(NULL);
-- Usage type 2, pass in a query_id
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
select $quid;
call markdown_table($quid);
-- Usage type 3, provide a query string to the stored procedure call
call markdown_table('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5');
The table can also be written without the leading and trailing pipes:
N_NATIONKEY|N_NAME|N_REGIONKEY
--|--|--
0|ALGERIA|0
1|ARGENTINA|1
2|BRAZIL|1
3|CANADA|1
4|EGYPT|4
which gives the following, so it can be a simpler solution:
N_NATIONKEY  N_NAME     N_REGIONKEY
0            ALGERIA    0
1            ARGENTINA  1
2            BRAZIL     1
3            CANADA     1
4            EGYPT      4
I grab the result table, paste it into Notepad++, replace each tab (\t) with a pipe and a space (| ), and then insert the header marker line by hand. I sometimes replace empty null results with the text null so the results make more sense. The form you use, with the start and end pipes, gets around the need for that.
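The manual steps described here (tabs to pipes, a hand-inserted marker line, empty cells shown as null) are easy to script. This is only a sketch of that workflow. The function name tsv_to_markdown and the sample text are made up, and it assumes the copied result is tab-separated with the header on the first line.

```python
def tsv_to_markdown(tsv_text, null_text="null"):
    """Convert tab-separated text (header on the first line) to a pipe table."""
    lines = [line.split("\t") for line in tsv_text.strip("\n").split("\n")]
    header, rows = lines[0], lines[1:]
    out = ["|".join(header), "|".join("--" for _ in header)]
    for row in rows:
        # Show empty cells as visible text so the columns stay readable.
        out.append("|".join(cell if cell != "" else null_text for cell in row))
    return "\n".join(out)


copied = "N_NATIONKEY\tN_NAME\tN_REGIONKEY\n0\tALGERIA\t0\n1\t\t1"
print(tsv_to_markdown(copied))
```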
DBeaver IDE supports "data export as markdown" and "advanced copy as markdown" out-of-the-box:
Output:
|R_REGIONKEY|R_NAME|R_COMMENT|
|-----------|------|---------|
|0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
|1|AMERICA|hs use ironic, even requests. s|
|2|ASIA|ges. thinly even pinto beans ca|
|3|EUROPE|ly final courts cajole furiously final excuse|
|4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
It is rendered as:
R_REGIONKEY  R_NAME       R_COMMENT
0            AFRICA       lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1            AMERICA      hs use ironic, even requests. s
2            ASIA         ges. thinly even pinto beans ca
3            EUROPE       ly final courts cajole furiously final excuse
4            MIDDLE EAST  uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
id Weather
1 {{KS,'S'},{MO,'S'},{CA,'S'}}
I am trying to update 'S' to 'W' for all of KS, MO and CA.
I am executing the query below and it throws an error:
UPDATE table
SET Weather[][2] ='W' where id=1;
expected output
id Weather
1 {{KS,'W'},{MO,'W'},{CA,'W'}}
I think it is possible using an ordinary UPDATE.
Let's try to select the items we need to replace:
-- first, second and third item from array and convert to string
SELECT array_to_string(weather[1:1], ',') AS KS,
array_to_string(weather[2:2], ',') AS MO,
array_to_string(weather[3:3], ',') AS CA
FROM test_tbl;
Result: "KS,'S'","MO,'S'","CA,'S'"
Now we can select the records to update by simply comparing a string with the array item (converted to a string):
SELECT * FROM test_tbl
-- KS,'S'
WHERE array_to_string(weather[1:1], ',') = concat('KS,', quote_literal('S'))
-- MO,'S'
-- array_to_string(weather[2:2], ',') = concat('MO,', quote_literal('S'))
-- CA,'S'
-- array_to_string(weather[3:3], ',') = concat('CA,', quote_literal('S'))
OK. Now we just need to rebuild the array from its parts together with the new item.
UPDATE test_tbl
-- generate first item + other items
SET weather = string_to_array(concat('KS,', quote_literal('W')), ',')::varchar[] || weather[2:]
WHERE array_to_string(weather[1:1], ',') = concat('KS,', quote_literal('S'));
UPDATE test_tbl
-- first item + generate second + other items
SET weather = weather[1:1] || string_to_array(concat('MO,', quote_literal('W')), ',')::varchar[] || weather[3:]
WHERE array_to_string(weather[2:2], ',') = concat('MO,', quote_literal('S'));
UPDATE test_tbl
-- first + second items, generate third + other items
SET weather = weather[1:2] || string_to_array(concat('CA,', quote_literal('W')), ',')::varchar[] || weather[4:]
WHERE array_to_string(weather[3:3], ',') = concat('CA,', quote_literal('S'));
Result: {{KS,'W'},{MO,'W'},{CA,'W'}}. Hope this helps
I believe that normalizing your data would be the best solution, but if it is not an option for you, try this function:
CREATE OR REPLACE FUNCTION change_array(p TEXT[][]) RETURNS TEXT[][] AS $$
DECLARE row record; res TEXT[][];
DECLARE i INT :=0;
BEGIN
LOOP
EXIT WHEN i = array_length(p,1);
i:=i+1;
IF p[i:i][1:2] <@ ARRAY[['KS','S']] THEN
res := res || ARRAY[['KS','W']];
ELSEIF p[i:i][1:2] <@ ARRAY[['MO','S']] THEN
res := res || ARRAY[['MO','W']];
ELSEIF p[i:i][1:2] <@ ARRAY[['CA','S']] THEN
res := res || ARRAY[['CA','W']];
ELSE res := res || p[i:i];
END IF;
END LOOP;
RETURN res;
END;
$$ LANGUAGE plpgsql ;
Testing with your example..
SELECT change_array(ARRAY[['KS','S'],['MO','S'],['CA','S'],['XX','S']]);
change_array
-------------------------------
{{KS,W},{MO,W},{CA,W},{XX,S}}
(1 row)
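For readers who want to see the row-by-row rule in isolation, here is a small Python sketch of what the function does to each pair. It is not a replacement for the SQL. The parameter names and their defaults are chosen for illustration only.

```python
def change_array(pairs, codes=("KS", "MO", "CA"), old="S", new="W"):
    """Return a copy of pairs where [code, old] becomes [code, new] for listed codes."""
    result = []
    for code, value in pairs:
        if code in codes and value == old:
            result.append([code, new])
        else:
            # Anything else is carried over unchanged, like the ELSE branch.
            result.append([code, value])
    return result


weather = [["KS", "S"], ["MO", "S"], ["CA", "S"], ["XX", "S"]]
print(change_array(weather))
```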
For
A=[100;300;1000;240]
and
B=cell(8,1)
I have the following results stored in a B
[100]
[300]
[1000]
[240]
[100;300;240]
[100;1000]
[300;1000]
[100;300;1000]
I want to print these to display the output as :
choose first
choose second
choose third
choose fourth
choose first or second or fourth
choose first or third
.
.
etc
Basically, from the array A=[100;300;1000;240], I want each value inside it to be represented by a string, and not by one variable. Any idea how to do this?
Note:
For my code, I want the user to input their own numbers in array A, and hence the length of A is variable and can be more than 4. The size of cell B also changes according to a formula, so it is not always fixed at size 8.
I would also appreciate a simple code, nothing too complex (unless necessary) as I don't have professional knowledge with matlab. A simpler code can help me understand and learn.
For the mapping I would just use a containers.Map object:
index_to_string = containers.Map(keySet,valueSet)
where
keySet = 1:20
valueSet = {'first'; 'second'; ...; 'twentieth'}
If A is available before printing, you can use the same valueSet, just cut it down to the size of A.
index_to_string = containers.Map(A,valueSet(1:length(A)))
Example:
G = cell(size(B))
for i = 1:length(B)
out1 = 'choose ';
if length(B{i}) == 1
out1 = [out1, index_to_string(B{i})];
else
temp = B{i}
for j=1:(length(temp)-1)
out1 = [out1, index_to_string(temp(j)), ' or ' ];
end
out1 = [out1, index_to_string(temp(end))];
end
G{i} = out1
end
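The same mapping idea can be sketched in Python for comparison: each value in A is mapped to the ordinal word for its position, and each group in B is joined with " or ". This is an illustration only. The ORDINALS list is cut off at eight entries here, and the sketch assumes the values in A are unique and that A is no longer than that list.

```python
# Ordinal words by position; extend the list if A can be longer than this.
ORDINALS = ["first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eighth"]


def describe(choices, results):
    """Return one 'choose ...' line per group in results."""
    # Map each value in choices to the ordinal word for its position.
    position = {value: ORDINALS[i] for i, value in enumerate(choices)}
    return ["choose " + " or ".join(position[v] for v in group)
            for group in results]


A = [100, 300, 1000, 240]
B = [[100], [300], [1000], [240], [100, 300, 240], [100, 1000]]
for line in describe(A, B):
    print(line)
```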
Here's how I'd do it
function IChooseYouPikachu(Choices, Results)
% put in A for choices and B for results
%simple boolean to indicate whether a choice has been made already
answerChosen = 0;
for k = 1:length(Results)
Response = 'choose';
for m = 1:length(Choices)
if any(Results{k} == Choices(m))
if answerChosen
Response = [Response ' or ' NumToOrd(m)];
else
answerChosen = 1;
Response = [Response ' ' NumToOrd(m)];
end
end
end
fprintf('%s\n',Response);
answerChosen = 0;
end
function ordinal = NumToOrd(number)
switch number
case 1, ordinal = 'first';
case 2, ordinal = 'second';
case 3, ordinal = 'third';
case 4, ordinal = 'fourth';
otherwise, ordinal = 'out of index';
end
This answer is entirely based on JaredS's answer. I have just clarified your doubts.
Write this in some m-file.
Choices=A; Results=B;
%simple boolean to indicate whether a choice has been made already
answerChosen = 0;
for k = 1:length(Results)
Response = 'choose';
for m = 1:length(Choices)
if any(Results{k} == Choices(m))
if answerChosen
Response = [Response ' or ' NumToOrd(m)];
else
answerChosen = 1;
Response = [Response ' ' NumToOrd(m)];
end
end
end
fprintf('%s\n',Response);
answerChosen = 0;
end
Please write the following function in a separate file and put it in the same directory as the previous m-file. Otherwise you will get an error saying: "Undefined function 'NumToOrd' for input arguments of type 'double'."
function ordinal = NumToOrd(number)
switch number
case 1, ordinal = 'first';
case 2, ordinal = 'second';
case 3, ordinal = 'third';
case 4, ordinal = 'fourth';
otherwise, ordinal = 'out of index';
end
I'm having a problem and I can't find any information about it.
I define a field in my model like this:
class Dates(ndb.Model):
...
date = ndb.DateTimeProperty(required = True) # I want to store date and time
...
Later I try a query (now I want all the dates for a day; I don't mind the time):
kl = Dates.query(ndb.AND(Dates.date.year == year,
Dates.date.month == month,
Dates.date.day == day),
ancestor = customer.key).fetch(keys_only = True)
dates = ndb.get_multi(kl)
But I get this error log:
AttributeError: 'DateTimeProperty' object has no attribute 'year'
I don't know why. I've tried Dates.date() == date, Dates.date == date (<-DateTime obj), ...
My DB is still empty, but I suppose this doesn't matter because I'll never have dates for every possible day.
Does anybody know why? Should I go with GQL instead?
You can use "range" queries for this. See example below.
import datetime
date = datetime.datetime.strptime('02/19/2013', '%m/%d/%Y')
kl = Dates.query(
    ndb.AND(Dates.date >= date,
            Dates.date < date + datetime.timedelta(days=1)))
This will match all datetimes that fall on 02/19/2013.
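The range is a half-open interval: midnight of the day is included and midnight of the next day is excluded. The bounds can be checked with plain datetime objects and no datastore. In this sketch the helper name day_bounds and the sample timestamps are made up for illustration.

```python
import datetime


def day_bounds(year, month, day):
    """Return (start, end) so that start <= t < end covers exactly one day."""
    start = datetime.datetime(year, month, day)
    return start, start + datetime.timedelta(days=1)


start, end = day_bounds(2013, 2, 19)

samples = [
    datetime.datetime(2013, 2, 18, 23, 59, 59),  # just before the day
    datetime.datetime(2013, 2, 19, 0, 0, 0),     # first instant of the day
    datetime.datetime(2013, 2, 19, 23, 59, 59),  # last second of the day
    datetime.datetime(2013, 2, 20, 0, 0, 0),     # first instant of the next day
]

# The same two comparisons the range query uses as filters.
matches = [t for t in samples if start <= t < end]
```

Using < on the upper bound rather than <= with 23:59:59 avoids missing values that carry microseconds.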
What you are trying to achieve is not really possible, because you can only query for the whole date and not for some parts of it.
To achieve what you are trying to do, I would suggest adding a few more properties to your model:
class Dates(ndb.Model):
...
date = ndb.DateTimeProperty(required=True)
date_year = ndb.IntegerProperty()
date_month = ndb.IntegerProperty()
date_day = ndb.IntegerProperty()
...
You could update these values on every save, or you could use Model Hooks to do it automagically. Your new query then becomes:
kl = Dates.query(ndb.AND(Dates.date_year == year,
Dates.date_month == month,
Dates.date_day == day),
ancestor=customer.key).fetch(keys_only=True)
dates = ndb.get_multi(kl)
Use a DateProperty. Then you can use a simple == query:
>>> import datetime
>>> from google.appengine.ext.ndb import *
>>> class D(Model):
...     d = DateProperty()
...
>>> d = D(d=datetime.date.today())
>>> d.put()
Key('D', 9)
>>> d
D(key=Key('D', 9), d=datetime.date(2013, 2, 20))
>>> D.query(D.d == datetime.date.today()).fetch()
[D(key=Key('D', 9), d=datetime.date(2013, 2, 20))]
I expanded @Guido van Rossum's code snippet to include the < and > comparisons and timedelta for calculations, mostly for my own satisfaction:
import datetime
from datetime import timedelta
from google.appengine.ext.ndb import *
class D(Model):
    d = DateProperty()
now = datetime.date.today()
date1 = now-timedelta(+500)
date2 = now-timedelta(+5)
d1 = D(d=now)
d2 = D(d=date1)
d3 = D(d=date2)
d1.put()
d2.put()
d3.put()
date2 = now-timedelta(+50)
result1 = D.query(D.d == now).fetch(4)
result2 = D.query(D.d > date2).fetch(2)
result3 = D.query(D.d < date2).fetch(2)
result4 = D.query(D.d >= date2, D.d <= now).fetch(2)
print result1
print "+++++++"
print result2
print "+++++++"
print result3
print "+++++++"
print result4
Is there any easier way to select and sort by weight?
fetchCount = 1000
date1 = datetime.datetime.utcnow().date()
entries = GqlQuery("SELECT * FROM Entry WHERE category = :category and date >= :datetime ORDER BY date, weight DESC", category = category, datetime = date1).fetch(fetchCount)
if entries is not None:
    # Sort entries ( lazy way for now ).
    sort = True
    while sort:
        sort = False
        for i in range(0, len(entries)-1):
            if entries[i].weight < entries[i + 1].weight:
                e = entries[i + 1]
                entries[i + 1] = entries[i]
                entries[i] = e
                sort = True
Solved by:
entries = GqlQuery("SELECT * FROM Entry WHERE category = :category and date > :datetime ORDER BY date, weight DESC", category = category, datetime = date1).fetch(fetchCount)
entries = sorted(entries, key=lambda x: x.weight, reverse=True)
since there is no other way at the moment.
It's a limitation of the datastore that if you use an inequality filter (e.g. date >= :datetime), that property must also be your first ordering key. Also, you can only have inequalities on one property per query. So, in your case you have no choice but to sort them in memory. The sorted() call in the other answer is perfect.
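One detail worth knowing about the in-memory approach: Python's sorted() is stable, so entries with equal weight keep the date order the query returned them in. This can be seen with a stand-in for the model. In the sketch below the Entry namedtuple and the sample values are hypothetical, not the real datastore class.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for the datastore Entry model; only the fields used here.
Entry = namedtuple("Entry", ["title", "date", "weight"])

day = datetime.date(2013, 2, 20)

# Assume the query already returned these ordered by date.
entries = [
    Entry("a", day, 2),
    Entry("b", day, 5),
    Entry("c", day + datetime.timedelta(days=1), 5),
    Entry("d", day + datetime.timedelta(days=2), 1),
]

# Heaviest first; ties ("b" and "c") stay in their original date order.
by_weight = sorted(entries, key=lambda e: e.weight, reverse=True)
titles = [e.title for e in by_weight]
```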