I have a Visit model and I'm fetching the data I want like this:
$app_visits = Visit::select([
'start',
'end',
'machine_name'
])->where('user_id', $chosen_id)->get();
But I want to add points for every visit. Every visit has an interaction, but there's no visit_id (because of another system I cannot add one).
Last developer left it like that:
$interactions = Interaction::where([
'machine_name' => $app_visit->machine_name,
])->whereBetween('date', [$app_visit->start, $app_visit->end])->get();
$points = 0;
foreach ($interactions as $interaction) {
$points += (int)$interaction->app_stage;
}
$app_visits[$key]['points'] = $points;
But I really don't like it, as it's slow and messy. I just want to add a 'points' sum to the first query, so that I touch the database only once.
Edit, as someone asked for the database structure:
visit:
|id | start | end | machine_name | user_id
interaction:
|id | time | machine_name | points
You can use a few things in Eloquent. Probably the most useful for this case is select(DB::raw(sql...)), as you will have to add a bit of raw SQL to retrieve a count.
For example:
return $query
->join(...)
->where(...)
->select(DB::raw(
'COUNT(DISTINCT res.id) AS count'
))
->groupBy(...);
Failing that, I'd just replace the eloquent with raw sql. We've had to do that a fair bit, as our data sets are massive, and eloquent model building has proven a little slow.
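For what it's worth, here is a rough sketch of what such a raw query could look like, written against SQLite from Python so it can be run standalone. It assumes the column names from the structure posted in the question (interaction.time and interaction.points, rather than the date and app_stage used in the old loop), and the function name visits_with_points is made up for the example. The identifier quoting is SQLite/standard SQL style and would need adapting for MySQL.

```python
import sqlite3

# Hypothetical schema mirroring the two tables described in the question.
SCHEMA = """
CREATE TABLE visit (
    id INTEGER PRIMARY KEY, "start" TEXT, "end" TEXT,
    machine_name TEXT, user_id INTEGER
);
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY, "time" TEXT, machine_name TEXT, points INTEGER
);
"""

# One round trip: each visit row carries the summed points of the
# interactions on the same machine that fall inside the visit's time window.
# LEFT JOIN + COALESCE keeps visits without interactions, with 0 points.
VISITS_WITH_POINTS = """
SELECT v."start", v."end", v.machine_name,
       COALESCE(SUM(i.points), 0) AS points
FROM visit AS v
LEFT JOIN interaction AS i
       ON i.machine_name = v.machine_name
      AND i."time" BETWEEN v."start" AND v."end"
WHERE v.user_id = ?
GROUP BY v.id, v."start", v."end", v.machine_name
ORDER BY v.id
"""


def visits_with_points(conn, user_id):
    """Return (start, end, machine_name, points) for one user's visits."""
    return conn.execute(VISITS_WITH_POINTS, (user_id,)).fetchall()
```

The same SELECT could be passed to the query builder through DB::raw() pieces or run as a plain statement; which one is faster on a large data set is something to measure rather than assume.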
Update, now that you've added the structure: why not just add a relation to Interaction based upon machine_name (or even a custom method using raw SQL that calculates the points), and use Visit::with('interaction.visitPoints')->... ?
Take a look at DB instead of Eloquent:
https://laravel.com/docs/5.6/queries
For more complex and efficient queries.
There is also a possibility to use raw SQL with this facade.
I am just starting out with nosql databases, and OrientDB in particular. My prior experience is with relational databases, mostly with SQL Server.
In MSSQL, I can do something like this:
SELECT s.url, p.title
FROM Site s
JOIN Page p ON s.Id = p.SiteId
And it will give me a table of all pages, along with the url of the site they belong to:
url | title
----------------------------
site1.com | page1
site1.com | page2
site2.com | page1
Now, in OrientDb, my understanding is that you should use a link for a one-way relationship, or an edge for a bidirectional relationship. Since I want to know what pages belong to a site, as well as what site a particular page belongs to, I have decided to use an edge in this case. The Site and Page classes/vertices are already created in similar fashion, but I can't figure out how to get a similar result set. From the documentation (https://orientdb.com/docs/2.2/SQL.html):
OrientDB allows only one class (classes are equivalent to tables in
this discussion) as opposed to SQL, which allows for many tables as
the target. If you want to select from 2 classes, you have to execute
2 sub queries and join them with the UNIONALL function
Their example SELECT FROM E, V then becomes SELECT EXPAND( $c ) LET $a = ( SELECT FROM E ), $b = ( SELECT FROM V ), $c = UNIONALL( $a, $b ), but that's not what I want. That results in something along the lines of
url | title
----------------------------
site1.com |
site1.com |
site2.com |
| page1
| page2
| page1
How would I go about creating the original result set, like in MSSQL?
Additional consideration: My training and experience with MSSQL dictate that database operations should be done in the database, rather than in application code. For example, I could have done one database call to get the s.url and s.id fields, then a second call to get the p.title and p.SiteId fields, and then matched them up in application code. The reason I avoid this is that multiple database calls are less efficient time-wise than returning the extra/redundant information (in my example, site1.com is returned twice).
Is this perhaps not the case for OrientDB, or even graph/NoSQL databases in general? Should I instead be making two separate calls to get all of the data I need, i.e. SELECT FROM Site WHERE Url = "site1.com" followed by SELECT EXPAND(OUT("HasPages")) FROM Site WHERE Name = "site1.com"?
Thank you
Try this:
select Url, out("HasPages").title as title from Site unwind title
Hope it helps
Regards
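To illustrate what the unwind part is doing there, as far as I understand it: out("HasPages").title gives one list of titles per Site, and unwinding that field turns each list element into its own row. The snippet below is only a plain-Python imitation of that flattening step on made-up records, not OrientDB code, and it does not model how OrientDB treats empty or missing collections.

```python
def unwind(records, field):
    """Yield one copy of each record per element of its list-valued field."""
    for record in records:
        values = record.get(field)
        if not isinstance(values, (list, tuple)):
            # Non-collection values pass through unchanged in this sketch.
            yield dict(record)
            continue
        # A record with an empty list yields no rows in this sketch.
        for value in values:
            row = dict(record)
            row[field] = value
            yield row


# Shape of the result before unwinding: one row per site, titles as a list.
per_site = [
    {"Url": "site1.com", "title": ["page1", "page2"]},
    {"Url": "site2.com", "title": ["page1"]},
]

for row in unwind(per_site, "title"):
    print(row["Url"], "|", row["title"])
```

The flattened rows have the same shape as the url/title table from the MSSQL join in the question.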
In a legacy project we had issues where if a developer would forget a project_id in the query condition, rows for all projects would be shown - instead of the single project they are meant to see. For example for "Comments":
comments [id, project_id, message ]
If you forget to filter by project_id you would see all projects. Sometimes this is caught by tests, sometimes not, but I would rather prevent it: the dev should see "WRONG/Empty" straight away!
To get around this, the product manager is insisting on separate tables for comments, like this:
project1_comments [id,message]
project2_comments [id,message]
Here if you forgot the project/table name, if something were to still pass tests and got deployed, you would get nothing or an error.
However the difficulty is then with associated tables. Example "Files" linked to "Comments":
files [ id, comment_id, path ]
3, 1, files/foo/bar
project1_comments
id | message
1 | Hello World
project2_comments
id | message
1 | Bye World
This then turns into a database per project, which seems overkill.
Another possibility, how to add a Behaviour on the Comments model to ensure any find/select query does include the foreign key, eg - project_id?
Many thanks in advance.
"In a legacy project we had issues where if a developer would forget a project_id in the query condition"
CakePHP generates the join conditions based upon the associations you define for the tables. They are applied automatically when you use contain(), and it's unlikely a developer would make such a mistake with CakePHP.
"To get around this, the product manager is insisting on separate tables for comments, like this:"
Don't do it. Seems like a really bad idea to me.
"Another possibility, how to add a Behaviour on the Comments model to ensure any find/select query does include the foreign key, eg - project_id?"
The easiest solution is to just forbid all direct queries on the Comments table.
class Comments extends Table {
public function find($type = 'all', $options = [])
{
throw new \Cake\Network\Exception\ForbiddenException('Comments can not be used directly');
}
}
Afterwards only Comments read via an association will be allowed (associations always have valid join conditions), but think twice before doing this as I don't see any benefits in such a restriction.
You can't easily restrict direct queries on Comments to only those that contain a project_id in the where clause. The problem is that where clauses are an expression tree, and you'd have to traverse the tree and check all the different kinds of expressions. It's a pain.
What I would do is restrict Comments so that project_id has to be passed as an option to the finder.
$records = $Comments->find('all', ['project_id' => $project_id])->all();
What the above does is pass $project_id as an option to the default findAll method of the table. We can then override that method and force project_id as a required option for all direct comment queries.
public function findAll(Query $query, array $options)
{
    // Refuse direct queries that are not scoped to a project.
    $project_id = Hash::get($options, 'project_id');
    if (!$project_id) {
        throw new ForbiddenException('project_id is required');
    }
    return $query->where(['project_id' => $project_id]);
}
I don't see an easy way to do the above via a behavior, because the where clause contains only expressions by the time the behavior is executed.
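The idea itself (the finder refuses to run unless it is given the scoping key) is not specific to CakePHP. As a language-neutral illustration, here is a minimal sketch in Python on top of sqlite3, using the project_id column from the question's comments table. The names CommentsRepository and MissingScopeError are invented for this example.

```python
import sqlite3


class MissingScopeError(Exception):
    """Raised when comments are queried without a project scope."""


class CommentsRepository:
    """Only way to read comments in this sketch; always scoped by project."""

    def __init__(self, conn):
        self.conn = conn

    def find_all(self, project_id=None):
        # Fail loudly instead of silently returning rows for every project.
        if project_id is None:
            raise MissingScopeError("project_id is required")
        return self.conn.execute(
            "SELECT id, message FROM comments WHERE project_id = ? ORDER BY id",
            (project_id,),
        ).fetchall()
```

A developer who forgets the scope gets an exception the first time the code runs, which is the "WRONG/Empty straight away" behaviour the question asks for, without splitting the table per project.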
I have a Django 1.8 application, and I am using an MsSQL database, with pyodbc as the db backend (using "django-pyodbc-azure" module).
I have the following models:
class Branch(models.Model):
    name = models.CharField(max_length=30)
    startTime = models.DateTimeField()
class Device(models.Model):
    uid = models.CharField(max_length=100, primary_key=True)
    type = models.CharField(max_length=20)
    firstSeen = models.DateTimeField()
    lastSeen = models.DateTimeField()
class Session(models.Model):
    device = models.ForeignKey(Device)
    branch = models.ForeignKey(Branch)
    start = models.DateTimeField()
    end = models.DateTimeField(null=True, blank=True)
I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:
sessionCount = (
    Session.objects.filter(branch=branch)
    .exclude(device__in=badDevices)
    .filter(end__gte=F('start') + timedelta(minutes=30))
    .count()
)
badDevices is a pre-filled list of device ids with around 60 items.
badDevices = ['id-1', 'id-2', ...]
This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 milliseconds.
I printed the generated SQL for this queryset, and tried it in my database client. There, both versions executed in around 250 milliseconds.
This is the generated SQL:
SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end]
FROM [session]
WHERE ([session].[branch_id] = my-branch-id AND
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND
DATEPART(dw, [session].[start]) = 1
AND [session].[end] IS NOT NULL AND
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))
So, using the exclude at the database level doesn't seem to affect query performance, but in Django the query runs 6 times slower if I add the exclude part. What could be causing this?
The general issue seems to be that django is doing some extra work to prepare the exclude clause. After that step and by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the django side that could cause such a significant delay.
In your case, one thing that might be causing this is some kind of pre-processing of badDevices. If, for instance, badDevices is a QuerySet then django might be executing the badDevices query just to prepare the actual query's SQL. Possibly something similar might be happening in the case where device has a non-default primary key.
The other thing that might delay the SQL preparation is of course django-pyodbc-azure. Maybe it's doing something strange while compiling the query and it becomes a bottleneck.
This is all wild speculation though, so if you're still having this issue then post the Device and Branch models as well, the exact content of badDevices and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.
EDIT: I think it must be the Device.uid field. Possibly django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:
Replace device__in with device_id__in, device__pk__in and device__uid__in and check each one again. Maybe a more explicit query will be easier for django to translate into SQL. You can even try replacing branch with branch_id, just in case.
If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:
# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])
# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])
If neither works, then most likely the problem is with the whole query and not just exclude. There are some more options in that case but try the above first and I will update my answer later if necessary.
Just want to share a similar problem that I had with MySQL and exclude clauses performance and how it was fixed.
When running the exclude clause, the list passed to the "in" lookup was actually a QuerySet that I got using the values_list method. Checking the exclude query executed by MySQL, the "in" objects were not values but another query. This behavior was hurting performance on specific large queries.
To fix that, instead of passing the queryset, I flattened it into a Python list of values. That way, each value is passed as an argument inside the in lookup, and performance improved considerably.
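To make the difference concrete without needing a Django project, here is a small sqlite3 sketch of the two query shapes: one where the exclusion list is a nested query, and one where the ids are fetched first and sent as plain parameters. The table layout loosely follows the models in the question; the type = 'bad' marker and both function names are made up. In Django terms the second shape roughly corresponds to wrapping the values_list(..., flat=True) queryset in list() before passing it to exclude().

```python
import sqlite3

# Hypothetical way of identifying the devices to exclude.
BAD_DEVICES = "SELECT uid FROM device WHERE type = 'bad'"


def count_with_nested_query(conn, branch_id):
    """The exclusion list is itself a query, so the database runs a subquery."""
    sql = ("SELECT COUNT(*) FROM session "
           "WHERE branch_id = ? AND device_id NOT IN (" + BAD_DEVICES + ")")
    return conn.execute(sql, (branch_id,)).fetchone()[0]


def count_with_flat_list(conn, branch_id):
    """The exclusion list is fetched once and sent as plain parameter values."""
    bad_ids = [uid for (uid,) in conn.execute(BAD_DEVICES)]
    sql = "SELECT COUNT(*) FROM session WHERE branch_id = ?"
    if bad_ids:  # an empty IN () list is not portable SQL, so skip the clause
        sql += " AND device_id NOT IN (" + ", ".join("?" * len(bad_ids)) + ")"
    return conn.execute(sql, [branch_id, *bad_ids]).fetchone()[0]
```

Both return the same count; which one is faster depends on the database and the data, so it is worth timing both on the real tables.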
I want to paginate a union query in CakePHP 3.0.0. By using a custom finder, I have it working almost perfectly, but I can't find any way to get limit and offset to apply to the union, rather than either of the subqueries.
In other words, this code:
$articlesQuery = $articles->find('all');
$commentsQuery = $comments->find('all');
$unionQuery = $articlesQuery->unionAll($commentsQuery);
$unionQuery->limit(7)->offset(7); // nevermind the weirdness of applying this manually
produces this query:
(SELECT {article stuff} ORDER BY created DESC LIMIT 7 OFFSET 7)
UNION ALL
(SELECT {comment stuff})
instead of what I want, which is this:
(SELECT {article stuff})
UNION ALL
(SELECT {comment stuff})
ORDER BY created DESC LIMIT 7 OFFSET 7
I could manually construct the correct query string like this:
$unionQuery = $articlesQuery->unionAll($commentsQuery);
$sql = $unionQuery->sql();
$sql = "($sql) ORDER BY created DESC LIMIT 7 OFFSET 7";
but my custom finder method needs to return a \Cake\Database\Query object, not a string.
So,
Is there a way to apply methods like limit() to an entire union query?
If not, is there a way to convert a SQL query string into a Query object?
Note:
There's a closed issue that describes something similar to this (except using paginate($unionQuery)) without a suggestion of how to overcome the problem.
Apply limit and offset to each subquery?
scrowler kindly suggested this option, but I think it won't work. If limit is set to 5 and the full result set would be this:
Article 9 --|
Article 8 |
Article 7 -- Page One
Article 6 |
Article 5 --|
Article 4 --|
Comment 123 |
Article 3 -- Here be dragons
Comment 122 |
Comment 121 --|
...
Then the query for page 1 would work, because (the first five articles) + (the first five comments), sorted manually by date, and trimmed to just the first five of the combined result would result in articles 1-5.
But page 2 won't work, because the offset of 5 would be applied to both articles and comments, meaning the first 5 comments (which weren't included in page 1), will never show up in the results.
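A quick way to convince yourself of this is to simulate both strategies on the listing above. The sketch below is plain Python with made-up timestamps chosen to reproduce that ordering; the function names are only for illustration.

```python
def newest_first(rows):
    """Sort (created, label) pairs from newest to oldest."""
    return sorted(rows, key=lambda row: row[0], reverse=True)


def page_of_union(articles, comments, limit, offset):
    """LIMIT/OFFSET applied once, to the combined result (what is wanted)."""
    merged = newest_first(articles + comments)
    return [label for _, label in merged[offset:offset + limit]]


def page_per_subquery(articles, comments, limit, offset):
    """LIMIT/OFFSET applied to each side first, then merged and trimmed."""
    part = (newest_first(articles)[offset:offset + limit]
            + newest_first(comments)[offset:offset + limit])
    return [label for _, label in newest_first(part)[:limit]]


# Made-up timestamps that reproduce the ordering in the listing above.
articles = [(100, "Article 9"), (99, "Article 8"), (98, "Article 7"),
            (97, "Article 6"), (96, "Article 5"), (95, "Article 4"),
            (93, "Article 3"), (90, "Article 2"), (89, "Article 1")]
comments = [(94, "Comment 123"), (92, "Comment 122"), (91, "Comment 121"),
            (88, "Comment 120"), (87, "Comment 119"), (86, "Comment 118"),
            (85, "Comment 117")]

for page in (1, 2):
    offset = (page - 1) * 5
    print("page", page, "union:       ", page_of_union(articles, comments, 5, offset))
    print("page", page, "per subquery:", page_per_subquery(articles, comments, 5, offset))
```

Page 1 comes out the same either way, but on page 2 the per-subquery version skips Comments 123, 122 and 121 entirely, because they were consumed by the offset without ever being shown.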
Applying these clauses directly to the query returned by unionAll() is not possible AFAIK; it would require changes to the API that make the compiler aware of where to put the SQL, be it via options, a new type of query object, or whatever.
Query::epilog() to the rescue
Luckily it's possible to append SQL to queries using Query::epilog(), be it raw SQL fragments
$unionQuery->epilog('ORDER BY created DESC LIMIT 7 OFFSET 7');
or query expressions
$unionQuery->epilog(
$connection->newQuery()->order(['created' => 'DESC'])->limit(7)->offset(7)
);
This should give you the desired query.
It should be noted that, according to the docs, Query::epilog() expects either a string or a concrete \Cake\Database\ExpressionInterface implementation in the form of a \Cake\Database\Expression\QueryExpression instance, not just any ExpressionInterface implementation. So theoretically the latter example is invalid, even though the query compiler works with any ExpressionInterface implementation.
Use a subquery
It's also possible to use the union query as a subquery. This makes things easier when using the pagination component: you don't have to take care of anything other than building and injecting the subquery, since the paginator component can simply apply the order/limit/offset to the main query.
/* @var $connection \Cake\Database\Connection */
$connection = $articles->connection();
$articlesQuery = $connection
->newQuery()
->select(['*'])
->from('articles');
$commentsQuery = $connection
->newQuery()
->select(['*'])
->from('comments');
$unionQuery = $articlesQuery->unionAll($commentsQuery);
$paginatableQuery = $articles
->find()
->from([$articles->alias() => $unionQuery]);
This could of course also be moved into a finder.
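For reference, the SQL shape that the subquery approach is aiming for can be tried out in isolation. The sketch below uses Python's sqlite3 with a deliberately tiny, hypothetical schema (articles and comments each having just id and created); feed_page and the feed alias are names invented for the example.

```python
import sqlite3

# The union is wrapped as a derived table so that ORDER BY, LIMIT and OFFSET
# apply to the combined rows rather than to either SELECT.
PAGE_OF_FEED = """
SELECT kind, id, created
FROM (
    SELECT 'article' AS kind, id, created FROM articles
    UNION ALL
    SELECT 'comment' AS kind, id, created FROM comments
) AS feed
ORDER BY created DESC
LIMIT ? OFFSET ?
"""


def feed_page(conn, limit, offset):
    """Return one page of the combined, newest-first feed."""
    return conn.execute(PAGE_OF_FEED, (limit, offset)).fetchall()
```

Because the outer query is an ordinary SELECT, anything that knows how to add order/limit/offset to a normal query can paginate it, which is what makes this variant convenient with a paginator.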
Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store.
My current database schema is as follows:
item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description
item_synonyms: id | item_name_id | name | error(BOOL)
Notes: error indicates a wrong spelling (eg. "Ericson"). description and picture of the item_names table are "globals" that can optionally be overridden by "local" description and picture fields of the items table (in case the store owner wants to supply a different picture for an item). common helps separate unique item names ("Jimmy Joe's Cheese Pizza" from "Cheese Pizza")
I think the bright side of this schema is:
Optimized searching & Handling Synonyms: I can query the item_names & item_synonyms tables using name LIKE %QUERY% and obtain the list of item_name_ids that need to be joined with the items table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")
Autocompletion: Again, a simple query to the item_names table. I can avoid the usage of DISTINCT and it minimizes number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson")
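To make the search path concrete, here is roughly what I mean, sketched with Python's sqlite3 so it can be run standalone. It uses the table and column names from the first schema; the function name search_items is made up, and the query is one possible way of writing it, not necessarily the best one.

```python
import sqlite3

# Collect matching item_name ids from both the names and the synonyms,
# then join them with the per-store items.
SEARCH_ITEMS = """
SELECT i.id, n.name, i.price
FROM items AS i
JOIN item_names AS n ON n.id = i.item_name_id
WHERE n.id IN (
    SELECT id FROM item_names WHERE name LIKE :pattern
    UNION
    SELECT item_name_id FROM item_synonyms WHERE name LIKE :pattern
)
ORDER BY i.id
"""


def search_items(conn, query):
    """Return (item id, canonical name, price) for items matching the query."""
    # Note: '%' or '_' typed by the user act as wildcards here; not escaped.
    return conn.execute(SEARCH_ITEMS, {"pattern": f"%{query}%"}).fetchall()
```

A misspelled search such as "Ericson" finds nothing in item_names but does hit the synonym row, and so still resolves to the canonical item name.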
The down side would be:
Overhead: When inserting an item, I query item_names to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; accounts for possible erroneous submissions). And updating is the combination of both.
Weird Item Names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This would perhaps be the prime reason I'm tempted to go for a schema like this:
items: id | name | picture | price | description
(... with item_names and item_synonyms as utility tables that I could query)
Is there a better schema you would suggest?
Should item names be normalized for autocomplete? Is this probably what Facebook does for "School", "City" entries?
Is the first schema or the second better/optimal for search?
Thanks in advance!
References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT
EDIT: In the event of 2 items being entered with similar names, an Admin who sees this simply clicks "Make Synonym" which will convert one of the names into the synonym of the other. I don't require a way to automatically detect if an entered name is the synonym of the other. I'm hoping the autocomplete will take care of 95% of such cases. As the table set increases in size, the need to "Make Synonym" will decrease. Hope that clears the confusion.
UPDATE: To those who would like to know what I went ahead with... I've gone with the second schema but removed the item_names and item_synonyms tables in hopes that Solr will provide me with the ability to perform all the remaining tasks I need:
items: id | name | picture | price | description
Thanks everyone for the help!
The requirements you state in your comment ("Optimized searching", "Handling Synonyms" and "Autocomplete") are not things that are generally associated with an RDBMS. It sounds like what you're trying to solve is a searching problem, not a data storage and normalization problem. You might want to start looking at some search architectures like Solr.
Excerpted from the Solr feature list:
Faceted Searching based on unique field values, explicit queries, or date ranges
Spelling suggestions for user queries
More Like This suggestions for given document
Auto-suggest functionality
Performance Optimizations
If there were more attributes exposed for mapping, I would suggest using a fast search index system. No need to set aliases up as the records are added, the attributes simply get indexed and each search issued returns matches with a relevance score. Take the top X% as valid matches and display those.
Creating and storing aliases seems like a brute-force, labor intensive approach that probably won't be able to adjust to the needs of your users.
Just an idea.
One thing that comes to mind is sorting the characters in the name and in each synonym, throwing away all whitespace. This is similar to the solution for finding all anagrams of a word. The end result is the ability to quickly find similar entries. As you pointed out, all synonyms should converge on one single term, or name. The search is then performed against the synonyms, again using the sorted input string.
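A minimal sketch of that idea in Python follows. Lowercasing is an extra assumption on my part (it was not part of the description above), and the function names and the sample synonym table are invented for the example.

```python
def sorted_key(text):
    """Drop all whitespace, lowercase, and sort the remaining characters."""
    return "".join(sorted("".join(text.split()).lower()))


def build_index(synonyms):
    """Map each synonym's sorted key to its canonical item name."""
    return {sorted_key(synonym): name for synonym, name in synonyms.items()}


def lookup(index, user_input):
    """Return the canonical name for the input, or None if nothing matches."""
    return index.get(sorted_key(user_input))


# Hypothetical synonym table: synonym -> canonical item name.
index = build_index({
    "X10": "Sony Ericsson Xperia X10",
    "Sony Ericsson": "Sony Ericsson Xperia X10",
})

print(lookup(index, "x 10"))
print(lookup(index, "Sony Ericson"))
```

This copes with spacing, case and reordered characters, but not with a missing letter ("Sony Ericson" has a different key than "Sony Ericsson"), and two genuinely different names that happen to be anagrams of each other would collide.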