Use of Maps in Flink SQL - apache-flink

If you register a table and one of its fields is a map (extra in this case“) the following statement works just fine:
SELECT f1, f2 FROM customers WHERE extra['sportPrefs'] = 'Football';
Now try to reference a key that does not exist in the map.
SELECT f1, f2 FROM customers WHERE extra['moviePrefs'] = 'Action';
You will get an NPE and the job exits. This would be OK if there was a way to check whether a particular key exists in a map. Unfortunately, I have not found a way. Check for IS NOT NULL does not work. Without this feature maps in Flink SQL are pretty useless. What am I missing? Thanks!

What you do describe was a bug that is described here.
It will be fixed in the next Flink version 1.5.0 that will be released next month.

Related

Preveiw app - inability to select and execute more than 1 sql statement

Started using the 'Preview App' late last week.
In the classic UI, we can highlight multiple rows and execute them all, like this for example...
update table.blah set updated_ts = current_timestamp()::timestamp_ntz, value = 1 where account_id = 5;
update table.blah set updated_ts = current_timestamp()::timestamp_ntz, value = 1 where account_id = 6;
In the new app.snowflake UI, we get the error...
000006 (0A000): Multiple SQL statements in a single API call are not supported; use one API call per statement instead.
This is a major annoyance. Is this something that is planned to be changed in a future release?
Thanks,
Craig
This is one of the current limitations of the new UI for Snowflake. Remember, it is only in Preview. You will see more and more functionality from the old interface be moved into the new interface over time. I am sure this is one of them.

Entity Framework: Max. number of "subqueries"?

My data model has an entity Person with 3 related (1:N) entities Jobs, Tasks and Dates.
My query looks like
var persons = (from x in context.Persons
select new {
PersonId = x.Id,
JobNames = x.Jobs.Select(y => y.Name),
TaskDates = x.Tasks.Select(y => y.Date),
DateInfos = x.Dates.Select(y => y.Info)
}).ToList();
Everything seems to work fine, but the lists JobNames, TaskDates and DateInfos are not all filled.
For example, TaskDates and DateInfos have the correct values, but JobNames stays empty. But when I remove TaskDates from the query, then JobNames is correctly filled.
So it seems that EF can only handle a limited number of these "subqueries"? Is this correct? If so, what is the max. number of these "subqueries" for a single statement? Is there a way to work around these issue without having to make more than one call to the database?
(ps: I'm not entirely sure, but I seem to remember that this query worked in LINQ2SQL - could it be?)
UPDATE
I'm getting crazy about this. I tried to repro the issue from ground up using a fresh, simple project (to post the entire piece of code here, not only an oversimplified example) - and I found I wasn't able to repro it. It still happens within our existing code base (apparently there's more behind this problem, but I cannot share this closed code base, unfortunately).
After hours and hours of playing around I found the weirdest behavior:
It works great when I don't SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; before calling the LINQ statement
It also works great (independent of the above) when I don't use a .Take() to only get the first X rows
It also works great when I add an additional .Where() statements to cut the the number of rows returned from SQL Server
I didn't find any comprehensible reason why I see this behavior, but I started to look at the SQL: Although EF generates the exact same SQL, the execution plan is different when I use READ UNCOMMITTED. It returns more rows on a specific index in the middle of the execution plan, which curiously ends in less rows returned for the entire SQL statement - which in turn results in the missing data, that is the reason for my question to begin with.
This sounds very confusing and unbelievable, I know, but this is the behavior I see. I don't know what else to do, I don't even know what to google for at this point ;-).
I can fix my problem (just don't use READ UNCOMMITTED), but I have no idea why it occurs and if it is a bug or something I don't know about SQL Server. Maybe there's some "magic max number of allowed results in sub-queries" in SQL Server? At least: As far as I can see, it's not an issue with EF itself.
A little late, but does calling ToList() on each subquery produce the required effect?
var persons = (from x in context.Persons
select new {
PersonId = x.Id,
JobNames = x.Jobs.Select(y => y.Name.ToList()),
TaskDates = x.Tasks.Select(y => y.Date).ToList(),
DateInfos = x.Dates.Select(y => y.Info).ToList()
}).ToList();

Rails 3, ActiveRecord, PostgreSQL - ".uniq" command doesn't work?

I have following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I update the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
so the error is gone... In MySQL .uniq works, in PostgreSQL not. Exist any alternative?
As the error states for SELECT DISTINCT, ORDER BY expressions must appear in select list.
Therefore, you must explicitly select for the clause you are ordering by.
Here is an example, it is similar to your case but generalize a bit.
Article.select('articles.*, RANDOM()')
.joins(:users)
.where(:column => 'whatever')
.order('Random()')
.uniq
.limit(15)
So, explicitly include your ORDER BY clause (in this case RANDOM()) using .select(). As shown above, in order for your query to return the Article attributes, you must explicitly select them also.
I hope this helps; good luck
Just to enrich the thread with more examples, in case you have nested relations in the query, you can try with the following statement.
Person.find(params[:id]).cars.select('cars.*, lower(cars.name)').order("lower(cars.name) ASC")
In the given example, you're asking all the cars for a given person, ordered by model name (Audi, Ferrari, Porsche)
I don't think this is a better way, but may help to address this kind of situation thinking in objects and collections, instead of a relational (Database) way.
Thanks!
I assume that the .uniq method is translated to a DISTINCT clause on the SQL. PostgreSQL is picky (pickier than MySQL) -- all fields in the select list when using DISTINCT must be present in the ORDER_BY (and GROUP_BY) clauses.
It's a little unclear what you are attempting to do (a random ordering?). In addition to posting the full SQL sent, if you could explain your objective, that might be helpful in finding an alternative.
I just upgraded my 100% working and tested application from 3.1.1 to 3.2.7 and now have this same PG::Error.
I am using Cancan...
#users = User.accessible_by(current_ability).order('lname asc').uniq
Removing the .uniq solves the problem and it was not necessary anyway for this simple query.
Still looking through the change notes between 3.1.1 and 3.2.7 to see what caused this to break.

Quickly finding the last item in a database Cakephp

I just inherited some cakePHP code and I am not very familiar with it (or any other php/serverside language). I need to set the id of the item I am adding to the database to be the value of the last item plus one, originally I did a call like this:
$id = $this->Project->find('count') + 1;
but this seems to add about 8 seconds to my page loading (which seems weird because the database only has about 400 items) but that is another problem. For now I need a faster way to find the id of the last item in the database, is there a way using find to quickly retrieve the last item in a given table?
That's a very bad approach on setting the id.
You do know that, for example, MySQL supports auto-increment for INT-fields and therefore will set the id automatically for you?
The suggested functions getLastInsertId and getInsertId will only work after an insert and not always.
I also can't understand that your call adds 8 seconds to your siteload. If I do such a call on my table (which also has around 400 records) the call itself only needs a few milliseconds. There is no delay the user would notice.
I think there might be a problem with your database-setup as this seems very unlikely.
Also please have a look if your database supports auto-increment (I can't imagine that's not possible) as this would be the easiest way of adding your wanted functionality.
I would try
$id = $this->Project->getLastInsertID();
$id++;
The method can be found in cake/libs/model/model.php in line 2768
As well as on this SO page
Cheers!
If you are looking for the cakePHP3 solution to this you simply use last().
ie:
use Cake\ORM\TableRegistry;
....
$myrecordstable=Tableregistry::get('Myrecords');
$myrecords=$myrecordstable->find()->last();
$lastId = $myrecords->id;
....

SQL Query Notifications and GetDate()

I am currently working on a query that is registered for Query Notifications. In accordance w/ the rules of Notification Serivces, I can only use Deterministic functions in my queries set up for subscription. However, GetDate() (and almost any other means that I can think of) are non-deterministic. Whenever I pull my data, I would like to be able to limit the result set to only relevant records, which is determined by the current day.
Does anyone know of a work around that I could use that would allow me to use the current date to filter my results but not invalidate the query for query notifications?
Example Code:
SELECT fcDate as RecordDate, fcYear as FiscalYear, fcPeriod as FiscalPeriod, fcFiscalWeek as FiscalWeek, fcIsPeriodEndDate as IsPeriodEnd, fcPeriodWeek as WeekOfPeriod
FROM dbo.bFiscalCalendar
WHERE fcDate >= GetDate() -- This line invalidates the query for notification...
Other thoughts:
We have an application controls table in our database that we use to store application level settings. I had thought to write a small script that keeps a record up to date w/ teh current smalldatetime. However, my join to this table is failing for notificaiton as well and I am not sure why. I surmise that it has something to do w/ me specifitying a text type (the column name), which is frustrating.
Example Code 2:
SELECT fcDate as RecordDate, fcYear as FiscalYear, fcPeriod as FiscalPeriod, fcFiscalWeek as FiscalWeek, fcIsPeriodEndDate as IsPeriodEnd, fcPeriodWeek as WeekOfPeriod
FROM dbo.bFiscalCalendar
INNER JOIN dbo.xApplicationControls ON fcDate >= acValue AND acName = N'Cache_CurrentDate'
Does anyone have any suggestions?
EDIT: Here is a link on MSDN that gives the rules for Notification Services
As it turns out, I figured out the solution. Basically, I was invalidating my query attempts because I was casting a value as a DateTime which marks it as Non-Deterministic. Even though you don't specifically call out a cast but do something akin to:
RecordDate = 'date_string_value'
You still end up w/ a Date Cast. Hopefully this will help out someone else who hits this issue.
This link helped me quite a bit.
http://msdn.microsoft.com/en-us/library/ms178091.aspx
A good way to bypass this is simply to create a view that just says "SELECT GetDate() AS Now", then use the view in your query.
EDIT : I see nothing about not using user-defined functions (which is what I've used the 'view today' bit in). So can you use a UDF in the query that points at the view?

Resources