How do I search a "Property Bag" table in SQL? - sql-server

I have a basic "property bag" table that stores attributes about my primary table "Card." So when I want to start doing some advanced searching for cards, I can do something like this:
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE dbo.CardProperty.IdPrp = 3 AND dbo.CardProperty.Value = 'Fiend'
INTERSECT
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE (dbo.CardProperty.IdPrp = 10 AND (dbo.CardProperty.Value = 'Wind' OR dbo.CardProperty.Value = 'Fire'))
What I need to do is to extract this idea into some kind of stored procedure, so that ideally I can pass in a list of property/value combinations and get the results of the search.
Initially this is going to be a "strict" search meaning that the results must match all elements in the query, but I'd also like to have a "loose" query so that it would match any of the results in the query.
I can't quite seem to wrap my head around this one. My previous version of this was to do generate some massive SQL query to execute with a lot of AND/OR clauses in it, but I'm hoping to do something a little more elegant this time. How do I go about doing this?

it seems to me that you have an EAV model here.
if you're using sql server 2005 and up i'd suggest you use XML datatype for this:
http://weblogs.sqlteam.com/mladenp/archive/2006/10/14/14032.aspx
makes searching and stuff much easier with built in xml querying capabilities.
if you can't change your model then look at this:
http://weblogs.sqlteam.com/davidm/articles/12117.aspx

Related

How to preload with Gorm

I am hitting a road block with preloading and associations
type Entity struct {
ID uint `gorm:"primary_key"`
Username string
Repositories []*Repository `gorm:"many2many:entity_repositories"`
}
type Repository struct {
ID uint `gorm:"primary_key"`
Name string
Entities []*Entity `gorm:"many2many:entity_repositories"`
}
With small users numbers the preloading is fine using the below
db.Preload("Repositories").Find(&list)
Also tried
db.Model(&User{}).Related(&Repository{}, "Repositories").Find(&list)
The preload seems to a select * entities and then a inner join using a SELECT * FROM "repositories" INNER JOIN "entity_repositories" ON "entity_repositories"."repository_id" = "repositories"."id" WHERE ("entity_repositories"."entity_id" IN ('1','2','3','4','5','6','7','8','9','10'))
As the number of user's increases this is no longer maintainable as it hits a sqlite limit (dev). I've tried numerous permutations! .. Realistically i guess i just want it to do something like
SELECT entities.*, repositories.*
FROM entities
JOIN entity_repositories ON entity_repositories.entity_id = entities.id
JOIN repositories ON repositories.id = entity_repositories.repository_id
ORDER BY entities.id
And fill in the model for me ..
Am I doing something obviously wrong or?
Unfortunately that's just the way GORM handles preloading.
go-pg has slightly better queries, but doesn't have the same functionality as GORM. It still will do multiple queries in some cases.
I would honestly recommend just using query building with raw SQL, especially if you know what your models will look like at compile time. I ended up going with this approach in my project, despite the fact that I didn't know that my models were going to look like.
Can try the Joins(), which can perform JOIN syntax in SQL

Combine parts of different tables [duplicate]

I'm kind of new to writing sql and I have a question about joins. Here's an example select:
select bb.name from big_box bb, middle_box mb, little_box lb
where lb.color = 'green' and lb.parent_box = mb and mb.parent_box = bb;
So let's say that I'm looking for the names of all the big boxes that have nested somewhere inside them a little box that's green. If I understand correctly, the above syntax is another way of getting the same results that we could get by using the 'join' keyword.
Questions: is the above select statement efficient for the task it's doing? If not, what is a better way to do it? Is the statement syntactic sugar for a join or is it actually doing something else?
If you have links to any good material on the subject I'd gladly read it, but since I don't know exactly what this technique is called I'm having trouble googling it.
You are using implicit join syntax. This is equivalent to using the JOIN keyword but it is a good idea to avoid this syntax completely and instead use explicit joins:
SELECT bb.name
FROM big_box bb
JOIN middle_box mb ON mb.parent_box = bb.id
JOIN little_box lb ON lb.parent_box = mb.id
WHERE lb.color = 'green'
You were also missing the column name in the join condition. I have guessed that the column is called id.
This type of query should be efficient if the tables are indexed correctly. In particular there should be foreign key constraints on the join conditions and an index on little_box.color.
An issue with your query is that if there are multiple green boxes inside a single box you will get duplicate rows returned. These duplicates can be removed by addding DISTINCT after SELECT.

SQL Select on Records that Meet Criteria from Another Table

I've got a very complex query and trying to give a simple example of one of the sub-tables I'm having problems with, if you need more information or context please let me know.
I've posed a CSV file with some sample data here:
https://drive.google.com/open?id=0B4xdnV0LFZI1dzE5S29QSFhQSmM
We make cakes, and 99% of our cakes are made by us. The 1% is when we have a cake delivered to us from a subcontractor and we 'Receive' and 'Audit' it.
What I wanted to do was to write something like this:
SELECT
Cake.Cake
Instruction.Cake_Instruction_Key
Steps
FROM
Cake
Join Instruction
ON Cake.Cake_Key = Instruction.Cake_Key
JOIN Steps
ON Instruction.Step_Key = Steps.Step_Key
WHERE
MIN(Steps.Step_Key) = 1
This fails because you can't have an aggregate in the WHERE clause.
The desired results would be:
Cake C 13 Receive
Cake C 14 Audit
Cake D 15 Receive
Cake D 16 Audit
Thank you in advance for your help!
Take a look at the HAVING keyword:
https://msdn.microsoft.com/en-us/library/ms180199.aspx
It works more or less the same as the WHERE clause but for aggregate functions after the GROUP BY clause.
Beware however this can be slow. You should try filtering down the number of records as much as possible in the WHERE and even consider using a tempory table to aggregate the data into first.
What you're talking about is the GROUP BY/HAVING clause, so in your case you would need to add something like
GROUP BY Cake.Cake, Instruction.Cake_Instruction_Key, Steps
HAVING MIN(Steps.Step_Key) = 1

Performance issue with django exclude

I have a Django 1.8 application, and I am using an MsSQL database, with pyodbc as the db backend (using "django-pyodbc-azure" module).
I have the following models:
class Branch(models.Model):
name = models.CharField(max_length=30)
startTime = models.DateTimeField()
class Device(models.Model):
uid = models.CharField(max_length=100, primary_key=True)
type = models.CharField(max_length=20)
firstSeen = models.DateTimeField()
lastSeen = models.DateTimeField()
class Session(models.Model):
device = models.ForeignKey(Device)
branch = models.ForeignKey(Branch)
start = models.DateTimeField()
end = models.DateTimeField(null=True, blank=True)
I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:
sessionCount = Session.objects.filter(branch=branch)
.exclude(device__in=badDevices)
.filter(end__gte=F('start')+timedelta(minutes=30)).count()
badDevices is a pre-filled list of device ids with around 60 items.
badDevices = ['id-1', 'id-2', ...]
This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 miliseconds.
I printed the generated sql for this queryset, and tried it in my database client. There, both versions executed in around 250 miliseconds.
This is the generated SQL:
SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end]
FROM [session]
WHERE ([session].[branch_id] = my-branch-id AND
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND
DATEPART(dw, [session].[start]) = 1
AND [session].[end] IS NOT NULL AND
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))
So, using the exclude in database level doesn't seem to be affecting the query performance, but in django, the query runs 6 times slower if I add the exclude part. What could be causing this?
The general issue seems to be that django is doing some extra work to prepare the exclude clause. After that step and by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the django side that could cause such a significant delay.
In your case, one thing that might be causing this is some kind of pre-processing of badDevices. If, for instance, badDevices is a QuerySet then django might be executing the badDevices query just to prepare the actual query's SQL. Possibly something similar might be happening in the case where device has a non-default primary key.
The other thing might delay the SQL preparation is of course django-pyodbc-azure. Maybe it's doing something strange while compiling the query and it becomes a bottleneck.
This is all wild speculation though, so if you're still having this issue then post the Device and Branch models as well, the exact content of badDevices and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.
EDIT: I think it must be the Device.uid field. Possibly django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:
Replace device__in with device_id__in, device__pk__in and device__uid__in and check each one again. Maybe a more explicit query will be easier for django to translate into SQL. You can even try replacing branch with branch_id, just in case.
If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:
# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])
# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])
If neither works, then most likely the problem is with the whole query and not just exclude. There are some more options in that case but try the above first and I will update my answer later if necessary.
Just want to share a similar problem that I had with MySQL and exclude clauses performance and how it was fixed.
When running the exclude clause, the list with the "in" lookup was actually a Queryset that I got using values_list method. Checking the exclude query executed by MySQL, the "in" objects were not values but actually another query. This behavior was impacting performance on specific large queries.
To fix that, instead of passing the queryset, I flat it out in a python list of values. By doing that, each value is passed as an argument inside the in lookup and the performance was really improved.

Hitting the 2100 parameter limit (SQL Server) when using Contains()

from f in CUSTOMERS
where depts.Contains(f.DEPT_ID)
select f.NAME
depts is a list (IEnumerable<int>) of department ids
This query works fine until you pass a large list (say around 3000 dept ids) .. then I get this error:
The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I changed my query to:
var dept_ids = string.Join(" ", depts.ToStringArray());
from f in CUSTOMERS
where dept_ids.IndexOf(Convert.ToString(f.DEPT_id)) != -1
select f.NAME
using IndexOf() fixed the error but made the query slow. Is there any other way to solve this? thanks so much.
My solution (Guids is a list of ids you would like to filter by):
List<MyTestEntity> result = new List<MyTestEntity>();
for(int i = 0; i < Math.Ceiling((double)Guids.Count / 2000); i++)
{
var nextGuids = Guids.Skip(i * 2000).Take(2000);
result.AddRange(db.Tests.Where(x => nextGuids.Contains(x.Id)));
}
this.DataContext = result;
Why not write the query in sql and attach your entity?
It's been awhile since I worked in Linq, but here goes:
IQuery q = Session.CreateQuery(#"
select *
from customerTable f
where f.DEPT_id in (" + string.Join(",", depts.ToStringArray()) + ")");
q.AttachEntity(CUSTOMER);
Of course, you will need to protect against injection, but that shouldn't be too hard.
You will want to check out the LINQKit project since within there somewhere is a technique for batching up such statements to solve this issue. I believe the idea is to use the PredicateBuilder to break the local collection into smaller chuncks but I haven't reviewed the solution in detail because I've instead been looking for a more natural way to handle this.
Unfortunately it appears from Microsoft's response to my suggestion to fix this behavior that there are no plans set to have this addressed for .NET Framework 4.0 or even subsequent service packs.
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=475984
UPDATE:
I've opened up some discussion regarding whether this was going to be fixed for LINQ to SQL or the ADO.NET Entity Framework on the MSDN forums. Please see these posts for more information regarding these topics and to see the temporary workaround that I've come up with using XML and a SQL UDF.
I had similar problem, and I got two ways to fix it.
Intersect method
join on IDs
To get values that are NOT in list, I used Except method OR left join.
Update
EntityFramework 6.2 runs the following query successfully:
var employeeIDs = Enumerable.Range(3, 5000);
var orders =
from order in Orders
where employeeIDs.Contains((int)order.EmployeeID)
select order;
Your post was from a while ago, but perhaps someone will benefit from this. Entity Framework does a lot of query caching, every time you send in a different parameter count, that gets added to the cache. Using a "Contains" call will cause SQL to generate a clause like "WHERE x IN (#p1, #p2.... #pn)", and bloat the EF cache.
Recently I looked for a new way to handle this, and I found that you can create an entire table of data as a parameter. Here's how to do it:
First, you'll need to create a custom table type, so run this in SQL Server (in my case I called the custom type "TableId"):
CREATE TYPE [dbo].[TableId] AS TABLE(
Id[int] PRIMARY KEY
)
Then, in C#, you can create a DataTable and load it into a structured parameter that matches the type. You can add as many data rows as you want:
DataTable dt = new DataTable();
dt.Columns.Add("id", typeof(int));
This is an arbitrary list of IDs to search on. You can make the list as large as you want:
dt.Rows.Add(24262);
dt.Rows.Add(24267);
dt.Rows.Add(24264);
Create an SqlParameter using the custom table type and your data table:
SqlParameter tableParameter = new SqlParameter("#id", SqlDbType.Structured);
tableParameter.TypeName = "dbo.TableId";
tableParameter.Value = dt;
Then you can call a bit of SQL from your context that joins your existing table to the values from your table parameter. This will give you all records that match your ID list:
var items = context.Dailies.FromSqlRaw<Dailies>("SELECT * FROM dbo.Dailies d INNER JOIN #id id ON d.Daily_ID = id.id", tableParameter).AsNoTracking().ToList();
You could always partition your list of depts into smaller sets before you pass them as parameters to the IN statement generated by Linq. See here:
Divide a large IEnumerable into smaller IEnumerable of a fix amount of item

Resources