Slick: Difficulties working with Column[Int] values - database

I have a followup to another Slick question I recently asked (Slick table Query: Trouble with recognizing values) here. Please bear with me!! I'm new to databasing and Slick seems especially poor on documentation. Anyway, I have this table:
object Users extends Table[(Int, String)]("Users") {
def userId = column[Int]("UserId", O.PrimaryKey, O.AutoInc)
def userName = column[String]("UserName")
def * = userId ~ userName
}
Part I
I'm attempting to query with this function:
def findByQuery(where: List[(String, String)]) = SlickInit.dbSlave withSession {
val q = for {
x <- Users if foo((x.userId, x.userName), where)
} yield x
q.firstOption.map { case(userId, userName) =>
User(userId, userName)}
}
where "where" is a list of search queries //ex. ("userId", "1"),("userName", "Alex")
"foo" is a helper function that tests equality. I'm running into a type error.
x.userId is of type Column[Int]. How can one manipulate this as an Int? I tried casting, ex:
foo(x.userId.asInstanceOf[Int]...)
but am also experiencing trouble with that. How does one deal with Slick return types?
Part II
Is anyone familiar with the casting function:
def * = userId ~ userName <> (User, User.unapply _)
? I know there have been some excellent answers to this question, most notably here: scala slick method I can not understand so far and a very similar question here: mapped projection with companion object in SLICK. But can anyone explain why the compiler responds with
<> method overloaded
for that simple line of code?

Let's start with the problem:
val q = for {
x <- Users if foo((x.userId, x.userName), where)
} yield x
See, Slick transforms Scala expressions into SQL. To be able to transform conditions, as you want, into SQL statement, Slick requires some special types to be used. The way these types works are actually part of the transformation Slick performs.
For example, when you write List(1,2,3) filter { x => x == 2 } the filter predicate is executed for each element in the list. But Slick can't do that! So Query[ATable] filter { arow => arow.id === 2 } actually means "make a select with the condition id = 2" (I am skipping details here).
I wrote a mock of your foo function and asked Slick to generate the SQL for the query q:
select x2."UserId", x2."UserName" from "Users" x2 where false
See the false? That's because foo is a simple predicate that Scala evaluates into the Boolean false. A similar predicate done in a Query, instead of a list, evaluates into a description of a what needs to be done in the SQL generation. Compare the difference between the filters in List and in Slick:
List[A].filter(A => Boolean):List[A]
Query[E,U].filter[T](f: E => T)(implicit wt: CanBeQueryCondition[T]):Query[E,U]
List filter evaluates to a list of As, while Query.filter evaluates into new Query!
Now, a step towards a solution.
It seems that what you want is actually the in operator of SQL. The in operator returns true if there is an element in a list, eg: 4 in (1,2,3,4) is true. Notice that (1,2,3,4) is a SQL list, not Tuple like in Scala.
For this use case of the in SQL operator Slick uses the operator inSet.
Now comes the second part of the problem. (I renamed the where variable to list, because where is a Slick method)
You could try:
val q = for {
x <- Users if (x.userId,x.userName) inSet list
} yield x
But that won't compile! That's because SQL doesn't have Tuples the way Scala has. In SQL you can't do (1,"Alfred") in ((1,"Alfred"),(2, "Mary")) (remember, the (x,y,z) is the SQL syntax for lists, I am abusing the syntax here only to show that it's invalid -- also there are many dialects of SQL out there, it is possible some of them do support tuples and lists in a similar way.)
One possible solution is to use only the userId field:
val q = for {
x <- Users if x.userId inSet list2
} yield x
This generates select x2."UserId", x2."UserName" from "Users" x2 where x2."UserId" in (1, 2, 3)
But since you are explicitly using user id and user name, it's reasonable to assume that user id doesn't uniquely identify a user. So, to amend that we can concatenate both values. Of course, we need to do the same in the list.
val list2 = list map { t => t._1 + t._2 }
val q2 = for {
x <- Users if (x.userId.asColumnOf[String] ++ x.userName) inSet list2
} yield x
Look the generated SQL:
select x2."UserId", x2."UserName" from "Users" x2
where (cast(x2."UserId" as VARCHAR)||x2."UserName") in ('1a', '3c', '2b')
See the above ||? It's the string concatenation operator used in H2Db. H2Db is the Slick driver I am using to run your example. This query and the others can vary slightly depending on the database you are using.
Hope that it clarifies how slick works and solve your problem. At least the first one. :)

Part I:
Slick uses Column[...]-types instead of ordinary Scala types. You also need to define Slick helper functions using Column types. You could implement foo like this:
def foo( columns: (Column[Int],Column[String]), values: List[(Int,String)] ) : Column[Boolean] = values.map( value => columns._1 === value._1 && columns._2 === value._2 ).reduce( _ || _ )
Also read pedrofurla's answer to better understand how Slick works.
Part II:
Method <> is indeed overloaded and when types don't work out the Scala compiler can easily become uncertain which overload it should use. (We should get rid of the overloading in Slick I think.) Writing
def * = userId ~ userName <> (User.tupled _, User.unapply _)
may slightly improve the error message you get. To solve the problem make sure that the Column types of userId and userName exactly correspond to the member types of your User case class, which should look something like case class User( id:Int, name:String ). Also make sure, that you extend Table[User] (not Table[(Int,String)]) when mapping to User.

Related

How to get all vector ids from Milvus2.0?

I used to use Milvus1.0. And I can get all IDs from Milvus1.0 by using get_collection_stats and list_id_in_segment APIs.
These days I am trying Milvus2.0. And I also want to get all IDs from Milvus2.0. But I don't find any ways to do it.
milvus v2.0.x supports queries using boolean expressions.
This can be used to return ids by checking if the field is greater than zero.
Let's assume you are using this schema for your collection.
referencing: https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py
as of 3/8/2022
fields = [
FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
FieldSchema(name="random", dtype=DataType.DOUBLE),
FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
Remember to insert something into your collection first... see the pymilvus example.
Here you want to query out all ids (pk)
You cannot currently list ids specific to a segment, but this would return all ids in a collection.
res = hello_milvus.query(
expr = "pk >= 0",
output_fields = ["pk", "embeddings"]
)
for x in res:
print(x["pk"], x["embeddings"])
I think this is the only way to do it now, since they removed list_id_in_segment

Django query filter using large array of ids in Postgres DB

I want to pass a query in Django to my PostgreSQL database. When I filter my query using a large array of ids, the query is very slow and goes up to 70s.
After looking for an answer I saw this post which gives a solution to my problem, simply change the ARRAY [ids] in IN statement by VALUES (id1), (id2), ....
I tested the solution with a raw query in pgadmin, the query goes from 70s to 300ms...
How can I do the same command (i.e. not using an array of ids but a query with VALUES) in Django?
I found a solution building on #erwin-brandstetter answer using a custom lookup
from django.db.models import Lookup
from django.db.models.fields import Field
#Field.register_lookup
class EfficientInLookup(Lookup):
lookup_name = "ineff"
def as_sql(self, compiler, connection):
lhs, lhs_params = self.process_lhs(compiler, connection)
rhs, rhs_params = self.process_rhs(compiler, connection)
params = lhs_params + rhs_params
return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params
This allows to filter like this:
MyModel.objects.filter(id__ineff=<list-of-values>)
The trick is to transform the array to a set somehow.
Instead of (this form is only good for a short array):
SELECT *
FROM tbl t
WHERE t.tbl_id = ANY($1);
-- WHERE t.tbl_id IN($1); -- equivalent
$1 being the array parameter.
You can still pass an array like you had it, but unnest and join. Like:
SELECT *
FROM tbl t
JOIN unnest($1) arr(id) ON arr.id = t.tbl_id;
Or you can keep your query, too, but replace the array with a subquery unnesting it:
SELECT * FROM tbl t
WHERE t.tbl_id = ANY (SELECT unnest($1));
Or:
SELECT * FROM tbl t
WHERE t.tbl_id IN (SELECT unnest($1));
Same effect for performance as passing a set with a VALUES expression. But passing the array is typically much simpler.
Detailed explanation:
IN vs ANY operator in PostgreSQL
How to use ANY instead of IN in a WHERE clause with Rails?
Optimizing a Postgres query with a large IN
Is this an example of the first thing you're asking?
relation_list = list(ModelA.objects.filter(id__gt=100))
obj_query = ModelB.objects.filter(a_relation__in=relation_list)
That would be an "IN" command because you're first evaluating relation_list by casting it to a list, and then using it in your second query.
If instead you do the exact same thing, Django will only make one query, and do SQL optimization for you. So it should be more efficient that way.
You can always see the SQL command you'll be executing with obj_query.query if you're curious what's happening under the hood.
Hope that answers the question, sorry if it doesn't.
I had lots of trouble to make the custom lookup 'ineff' work.
I may have solved it, but would love some validation from Django and Postgres experts.
1) Using it 'directly' on a ForeignKey field (ModelB)
ModelA.objects.filter(ModelB__ineff=queryset_ModelB)
Throws the following exception:
"Related Field got invalid lookup: ineff"
ForeignKey fields cannot be used with custom lookups.
A similar issue is reported here:
Custom lookup is not being registered in Django
2) Using it 'indirectly' on the pk field of related model (ModelB.id)
ModelA.objects.filter(ModelB__id__ineff=queryset_ModelB.values_list('id', flat=True))
Throws the following exception:
"can only concatenate list (not "tuple") to list"
Looking at Django Traceback, I noticed that rhs_params is a tuple.
Yet we try to add it to lhs_params (a list) in our custom lookup.
Hence I changed:
params = lhs_params + rhs_params
into:
params = lhs_params + list(rhs_params)
3) I then got a Postgres error (at least I had passed Django ORM)
"function unnest(uuid) does not exist"
"HINT: No function matches the given name and argument types. You might need to add explicit type casts."
I apparently solved it by changing the sql:
from:
return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params
to:
return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params
Hence my final as_sql method looks like this:
def as_sql(self, compiler, connection):
lhs, lhs_params = self.process_lhs(compiler, connection)
rhs, rhs_params = self.process_rhs(compiler, connection)
params = lhs_params + list(rhs_params)
return "%s IN (SELECT unnest(ARRAY(%s)))" % (lhs, rhs), params
It seems to work, and is indeed faster than in__ (tested with EXPLAIN ANALYZE in Postgres).
But I would love to have some validation from experts, perhaps Erwin Brandstetter?
Thanks for your input.

Generate sql query by anorm, with all nulls except one

I developing web application with play framework 2.3.8 and scala, with complex architecture on backend and front-end side. As backend we use MS SQL, with many stored procedures, and called it by anorm. And here one of the problems.
I need to update some fields in database. The front end calls play framework, and recive name of the field, and value. Then I parse, field name, and then I need to generate SQL Query for update field. I need assign null, for all parameters, except recived parameter. I try to do it like that:
def updateCensusPaperXX(name: String, value: String, user: User) = {
DB.withConnection { implicit c =>
try {
var sqlstring = "Execute [ScXX].[updateCensusPaperXX] {login}, {domain}"
val params = List(
"fieldName1",
"fieldName2",
...,
"fieldNameXX"
)
for (p <- params){
sqlstring += ", "
if (name.endsWith(p))
sqlstring += value
else
sqlstring += "null"
}
SQL(sqlstring)
.on(
"login" -> user.login,
"domain" -> user.domain,
).execute()
} catch {
case e: Throwable => Logger.error("update CensusPaper04 error", e)
}
}
}
But actually that doesn't work in all cases. For example, when I try to save string, it give's me an error like:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near 'some phrase'
What is the best way to generate sql query using anorm with all nulls except one?
The reason this is happening is because when you write the string value directly into the SQL statement, it needs to be quoted. One way to solve this would be to determine which of the fields are strings and add conditional logic to determine whether to quote the value. This is probably not the best way to go about it. As a general rule, you should be using named parameters rather than building a string to with the parameter values. This has a few of benefits:
It will probably be easier for you to diagnose issues because you will get more sensible error messages back at runtime.
It protects against the possibility of SQL injection.
You get the usual performance benefit of reusing the prepared statement although this might not amount to much in the case of stored procedure invocation.
What this means is that you should treat your list of fields as named parameters as you do with user and domain. This can be accomplished with some minor changes to your code above. First, you can build your SQL statement as follows:
val params = List(
"fieldName1",
"fieldName2",
...,
"fieldNameXX"
)
val sqlString = "Execute [ScXX].[updateCensusPaperXX] {login}, {domain}," +
params.map("{" + _ + "}").mkString{","}
What is happening above is that you don't need to insert the values directly, so you can just build the string by adding the list of parameters to the end of your query string.
Then you can go ahead and start building your parameter list. Note, the parameters to the on method of SQL is a vararg list of NamedParameter. Basically, we need to create Seq of NamedParameters that covers "login", "domain" and the list of fields you are populating. Something like the following should work:
val userDomainParams: Seq[NamedParameter] = (("login",user.login),("domain",user.domain))
val additionalParams = params.map(p =>
if (name.endsWith(p))
NamedParameter(p, value)
else
NamedParameter(p, None)
).toSeq
val fullParams = userDomainParams ++ additionalParams
// At this point you can execute as follows
SQL(sqlString).on(fullParams:_*).execute()
What is happening here is that you building the list of parameters and then using the splat operator :_* to expand the sequence into the varargs needed as arguments for the on method. Note that the None used in the NamedParameter above is converted into a jdbc NULL by Anorm.
This takes care of the issue related to strings because you are no longer writing the string directly into the query and it has the added benefit eliminating other issues related with writing the SQL string rather than using parameters.

Google App Engine: IN filter and argument position

I have a table where one of the columns contains a list. I want to know if it is possible to select all rows where the list contains a specific element.
More concretely, I have a guests column containing a list of strings and I want to know if a specific guest string is part of this list. I would like to write a query like this:
q = TableName.gql('WHERE :g IN guests', g=guest)
It seems, however, that I can't put variables in this position. For instance, this query (where ownerid is a string and not a string list) is also disallowed:
q = TableName.gql('WHERE :g = ownerid', g=guest)
I seem to have to write it this way:
q = TableName.gql('WHERE ownerid = :g', g=guest)
Thus I have the following questions:
How can I construct a query that gets rows where a list-cell contains a specific member?
Are arguments for GQL queries restricted to the right-hand side of operators? What is the restriction?
I am using Google App Engine with Python 2.7. Thanks!
You have misunderstood what the IN operator is for. It is not for querying against a repeated field: you just use the normal = for that. IN is for querying against a list of values: eg guest IN [1, 2, 3, 4]. Your query should be:
q = TableName.gql('WHERE guests = :g', g=guest)
or better, since GQL doesn't give you anything that the standard DB syntax doesn't:
q = TableName.all().filter('guests =', guest)

T-SQL IsNumeric() and Linq-to-SQL

I need to find the highest value from the database that satisfies a certain formatting convention. Specifically, I would like to find the highest value that looks like
EU999999 ('9' being any digit)
select max(col) will return something like 'EUZ...' for instance that I want to exclude.
The following query does the trick, but I can't produce this via Linq-to-SQL. There seems to be no translation for the isnumeric() function in SQL Server.
select max(col) from table where col like 'EU%'
and 1=isnumeric(replace(col, 'EU', ''))
Writing a database function, stored procedure, or anything else of that nature is far down the list of my preferred solutions, because this table is central to my app and I cannot easily replace the table object with something else.
What's the next-best solution?
Although ISNUMERIC is missing, you could always try the nearly equivalent NOT LIKE '%[^0-9]%, i.e., there is no non-digit in the string, or alternatively, the string is empty or consists only of digits:
from x in table
where SqlMethods.Like(x.col, 'EU[0-9]%') // starts with EU and at least one digit
&& !SqlMethods.Like(x.col, '__%[^0-9]%') // and no non-digits
select x;
Of course, if you know that the number of digits is fixed, this can be simplified to
from x in table
where SqlMethods.Like(x.col, 'EU[0-9][0-9][0-9][0-9][0-9][0-9]')
select x;
You could make use of the ISNUMERIC function by adding a method to a partial class for the DataContext. It would be similar to using a UDF.
In your DataContext's partial class add this:
partial class MyDataContext
{
[Function(Name = "ISNUMERIC", IsComposable = true)]
public int IsNumeric(string input)
{
throw new NotImplementedException(); // this won't get called
}
}
Then your code would use it in this manner:
var query = dc.TableName
.Select(p => new { p.Col, ReplacedText = p.Col.Replace("EU", "") })
.Where(p => SqlMethods.Like(p.Col, "EU%")
&& dc.IsNumeric(p.ReplacedText) == 1)
.OrderByDescending(p => p.ReplacedText)
.First()
.Col;
Console.WriteLine(query);
Or you could make use MAX:
var query = dc.TableName
.Select(p => new { p.Col, ReplacedText = p.Col.Replace("EU", "") })
.Where(p => SqlMethods.Like(p.Col, "EU%")
&& dc.IsNumeric(p.ReplacedText) == 1);
var result = query.Where(p => p.ReplacedText == query.Max(p => p.ReplacedText))
.First()
.Col;
Console.WriteLine("Max: {0}, Result: {1}", max, result);
Depending on your final goal it might be possible to stop at the max variable and prepend it with the "EU" text to avoid the 2nd query that gets the column name.
EDIT: as mentioned in the comments, the shortcoming of this approach is that ordering is done on text rather than numeric values and there's currently no translation for Int32.Parse on SQL.
As you said, there is no translation for IsNumeric from LINQ to SQL. There are a few options, you already wrote database function and stored procedure down. I like to add two more.
Option 1: You can do this by mixing LINQ to SQL with LINQ to Objects, but when you've got a big database, don't expect great performance:
var cols = (from c in db.Table where c.StartsWith("EU") select c).ToList();
var stripped = from c in cols select int.Parse(c.Replace("EU", ""));
var max = stripped.Max();
Option 2: change your database schema :-)
My suggestion is to fall back to in-line SQL and use the DataContext.ExecuteQuery() method. You would use the SQL query you posted in the beginning.
This is what I have done in similar situations. Not ideal, granted, due to the lack of the type-checking and possible syntax errors, but simply make sure it is included in any unit tests. Not every possible query is covered by the Linq syntax, hence the existence of ExecuteQuery in the first place.

Resources