PocketBase: filter data by a multiple-relation field - database

I have a collection where the field "users" contains the ids of two users.
How can I search for the record that contains both of these user ids?
I tried
users = ["28gjcow5t1mkn7q", "frvl86sutarujot"]
but it still doesn't work.

This is a relation, so the field points to another collection and can hold multiple, non-unique results; the collection you are looking at is the dataset. You can query the whole dataset from a script with
// you can also fetch all records at once via getFullList
const records = await pb.collection('tlds').getFullList(200 /* batch size */, {
    sort: '-created',
});
I suggest you look into the pocketbase/js-sdk repository on GitHub.
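A minimal sketch of how a "contains both ids" filter string could be built, written in Python for illustration. It assumes PocketBase's filter syntax, where `~` is the contains operator and `&&` is logical AND, and it assumes that `~` matches ids stored in a multiple-relation field; check both against the PocketBase docs for your version. The helper name `build_contains_all_filter` is hypothetical.

```python
def build_contains_all_filter(field, ids):
    """Build a filter expression requiring `field` to contain every id in `ids`."""
    # One "contains" clause per id, joined with logical AND.
    clauses = [f'{field} ~ "{record_id}"' for record_id in ids]
    return " && ".join(clauses)


filter_expr = build_contains_all_filter(
    "users", ["28gjcow5t1mkn7q", "frvl86sutarujot"]
)
print(filter_expr)
```

The resulting string would then be passed as the `filter` option next to `sort` in the query shown above.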

Related

database design eloquent problem with query

I need to create an app that manages soccer match sheets.
I currently have a table that stores each match with both teams:
match :
-id
-dt_match
-club_home_id
-club_visitor_id
Each team has a sheet that lists its players.
So I created the table match_sheet to store the sheets of both teams.
match_sheet :
-id
-match_id
To store the players on each sheet I created the table match_sheet_player
match_sheet_player:
-id
-match_sheet_id
-player_id
Now I need to display in my view only the matches that have both sheets, and I don't know how to achieve that.
The first query I wrote is this:
$matchs_sheets = MatchSheet::all();
$matchs = Match::whereIn('id', $matchs_sheets->pluck('match_id'))->orderByDesc('dt_match')->paginate(5);
But this returns the match even if there is only one sheet instead of both. I really need to show the match only if both sheets exist.
Update:
Here is my data for match_sheet:
There are two records with 1659, which is the id of the match. So I would like to show only match 1659 and not 1649, because there is only one record for that match.
Assuming your model relationships are set up correctly, you can ask Laravel to return only the matches whose related model has a count of exactly 2, using has(). For instance:
$matches = Match::whereIn('id', $ids)->has('matchSheets', '=', 2)...
Your relationship should be set up as e.g. this:
// on Match model
public function matchSheets()
{
    return $this->hasMany(MatchSheet::class);
}
// on MatchSheet model
public function match()
{
    return $this->belongsTo(Match::class);
}
Docs here: https://laravel.com/docs/5.6/eloquent-relationships#one-to-many - I really recommend reading through them, they'll save you huge amounts of time eventually!
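To make the counting logic behind has() concrete, here is a small sketch in plain Python (not Laravel). It counts sheets per match and keeps only the matches with exactly two. The sample rows are hypothetical and only mirror the ids mentioned in the question.

```python
from collections import Counter

# Hypothetical match_sheet rows as (sheet id, match_id) pairs.
match_sheets = [(1, 1649), (2, 1659), (3, 1659)]


def matches_with_sheet_count(sheets, required=2):
    """Return the match ids that have exactly `required` sheets."""
    counts = Counter(match_id for _, match_id in sheets)
    return sorted(m for m, n in counts.items() if n == required)


complete_matches = matches_with_sheet_count(match_sheets)
print(complete_matches)
```

In the real application the database does this counting, so only the qualifying matches are loaded.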

Is there a built-in function to get all unique values in an array field, across all records?

My schema looks like this:
var ArticleSchema = new Schema({
    ...
    category: [{
        type: String,
        default: ['general']
    }],
    ...
});
I want to go through all records and find all unique values for this field across all of them. The result will be sent to the front end by a service, for look-ahead search when tagging articles.
We could iterate through every record, go through each array value and check it against the values already found, but this would be O(n^2).
Is there an existing function or another way that has better performance?
You can use the distinct function to get the unique values across all category array fields of all documents:
Article.distinct('category', function(err, categories) {
    // categories is an array of the unique category values
});
Put an index on category for best performance.
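As an illustration of what distinct computes, here is a sketch in plain Python over in-memory documents (hypothetical sample data, not the Mongoose API). Collecting values into a set takes one pass over all array values, so even the manual approach does not have to be quadratic.

```python
# Hypothetical documents shaped like the Article schema above.
articles = [
    {"title": "a", "category": ["general", "news"]},
    {"title": "b", "category": ["news", "sports"]},
    {"title": "c", "category": ["general"]},
]


def distinct_values(documents, field):
    """Collect the unique values of an array field across all documents."""
    seen = set()
    for doc in documents:
        # A missing field contributes nothing.
        seen.update(doc.get(field, []))
    return sorted(seen)


categories = distinct_values(articles, "category")
print(categories)
```

In practice it is still better to let the database do this, since it avoids transferring every document to the application.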

Django Query Optimisation

I am currently working on a telecom analytics project and am new to query optimisation. Showing the result in the browser takes a full minute, although only 45,000 records are accessed. Could you please suggest ways to reduce the time it takes to show results?
I wrote the following query to find the average call duration for the people in an age group:
sigma=0
popn=len(Demo.objects.filter(age_group=age))
card_list=[Demo.objects.filter(age_group=age)[i].card_no
           for i in range(popn)]
for card in card_list:
    dic=Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma+=dic['duration__sum']
avgDur=sigma/popn
The above code is inside a for loop that iterates over the age groups.
The model is as follows:
class Demo(models.Model):
    card_no=models.CharField(max_length=20,primary_key=True)
    gender=models.IntegerField()
    age=models.IntegerField()
    age_group=models.IntegerField()
class Fact_table(models.Model):
    pri_key=models.BigIntegerField(primary_key=True)
    card_no=models.CharField(max_length=20)
    duration=models.IntegerField()
    time_8bit=models.CharField(max_length=8)
    time_of_day=models.IntegerField()
    isBusinessHr=models.IntegerField()
    Day_of_week=models.IntegerField()
    Day=models.IntegerField()
Thanks
Try this:
from django.db.models import Sum

demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum'] or 0  # aggregate returns None when no rows match
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for-loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you run, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes the models easier to use, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
class Fact_table(models.Model):
    # on_delete is a required argument since Django 2.0
    card = models.ForeignKey(Demo, related_name='facts', on_delete=models.CASCADE)
    ...
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print("Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg']))
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
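The single-query idea can be tried without Django. The sketch below uses Python's built-in sqlite3 module with hypothetical tables `demo` and `fact` and made-up rows. The JOIN plus GROUP BY statement is only an approximation of the kind of SQL the annotate() call leads to, not the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo (id INTEGER PRIMARY KEY, card_no TEXT UNIQUE, age_group INTEGER);
CREATE TABLE fact (id INTEGER PRIMARY KEY,
                   card_id INTEGER REFERENCES demo(id),
                   duration INTEGER);
INSERT INTO demo VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
INSERT INTO fact VALUES (1, 1, 10), (2, 1, 20), (3, 2, 30), (4, 3, 40);
""")

# One query: join calls to people, then average the durations per age group.
rows = conn.execute("""
SELECT d.age_group, AVG(f.duration)
FROM demo AS d
LEFT JOIN fact AS f ON f.card_id = d.id
GROUP BY d.age_group
ORDER BY d.age_group
""").fetchall()

for age_group, duration_avg in rows:
    print("Age group: %s - Average duration: %s" % (age_group, duration_avg))
```

Note that this averages over calls, which is also what Avg('facts__duration') does; the question's original code divides the total duration by the number of people instead.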

Is there an algorithm or pattern to merge several rows of the same record into one row?

Due to some unknown fault, every time I have sync'ed my Nokia's Contacts with my Outlook Contacts, via Nokia Suite, each contact on the phone gets added to Outlook again. I now have up to four copies of some contacts in Outlook. Some have different fields populated in different duplicates.
What I want to do is import my contacts from CSV into a database table or in-memory object collection, merge the properties of all copies of each 'unique' record into one record, and then import the result back into an empty Contacts folder in Outlook. Is there any elegant way to do this, either in plain C#, LINQ, or T-SQL?
Or do I just loop through all copies (rows) of the first column, and copy any values found into versions of that column that are blank or less up to date, then carry on iterating onto the second column through to the last?
My strategy would be to first group all rows on some key like new { FirstName, LastName } or EMail (I don't know what your data looks like). Now you have groups of rows that all belong to the same person. You now need to merge them (using any algorithm you like). You could either choose the newest one, or merge individual attributes like this:
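from r in rows
group r by r.EMail into g
select new {
    EMail = g.Key,
    // FirstOrDefault returns null instead of throwing when every copy is blank
    DateOfBirth = g.Select(x => x.DateOfBirth).Where(x => x != null).FirstOrDefault(),
    ...
}
In this example I'm picking the first non-null value for DateOfBirth; which copy supplies it depends on row order, so the choice is effectively arbitrary.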
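The same group-then-merge strategy can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are dictionaries as `csv.DictReader` would produce, the e-mail address is the grouping key, and the field names and sample contacts are hypothetical. For each field, the first non-blank value found among the copies wins.

```python
def merge_duplicates(rows, key):
    """Merge rows sharing the same `key` value, keeping the first non-blank value per field."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for field, value in row.items():
            # Only fill a field that is still missing or blank in the merged record.
            if target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


contacts = [
    {"email": "ann@example.com", "name": "Ann", "phone": "", "birthday": None},
    {"email": "ann@example.com", "name": "Ann", "phone": "555-0100", "birthday": None},
    {"email": "bob@example.com", "name": "Bob", "phone": None, "birthday": "1980-01-01"},
    {"email": "ann@example.com", "name": "", "phone": "555-0199", "birthday": "1975-05-05"},
]

merged_contacts = merge_duplicates(contacts, "email")
for contact in merged_contacts:
    print(contact)
```

To prefer the most up-to-date copy instead, sort the rows by a modification timestamp (newest first) before merging, if the export contains one.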

About indexes of GAE datastore

I have the following model in my GAE app.
class User(db.Model):
    school_name = db.StringProperty(indexed=True)
    country = db.StringProperty(indexed=True)
    city = db.StringProperty(indexed=True)
    sex = db.StringProperty(indexed=True)
    profession = db.StringProperty(indexed=True)
    joined_date = db.DateTimeProperty(indexed=True)
I want to filter the users by combinations of these fields. The results should show the most recently joined users first, which I suppose means every query ends with an order operation, like this:
User.all().filter('country =','US').filter('profession =','SE').order('-joined_date')
User.all().filter('school_name =','AAA').filter('profession =','SE').order('-joined_date')
....
User.all().filter('sex =','Female').filter('profession =','HR').order('-joined_date')
The number of field combinations would be C(5,1)+C(5,2)+...+C(5,5) = 31.
My question is: to implement this, do I need to create indexes for all 31 cases in Google App Engine? Or can you suggest another way to implement it?
Note: C(n,k) is the combination formula; see http://en.wikipedia.org/wiki/Combination
Thanks in advance!
You have several options:
Create all 31 indexes, as you suggest.
Do the sorting in memory. Without a sort order, all your queries can be executed with the built-in merge-join strategy, and so you won't need any indexes at all.
Restrict queries to those that are more likely, or those that eliminate most of the non-matching results, and perform additional filtering in memory.
Put all your data in a ListProperty for indexing as "key:value" strings, and filter only on that. You will need to create multiple indexes with different occurrence counts on that field (eg, indexing it once, twice, etc), and it will result in the same number of index entries, but fewer custom indexes used.
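The second option (sorting in memory) can be sketched as follows. This is plain Python over hypothetical sample records, not the datastore API. In a real app the equality filters would run in the datastore without a sort order, and only the sorting step would happen in application code.

```python
from datetime import datetime

# Hypothetical records standing in for the entities a filtered query would return.
users = [
    {"name": "u1", "country": "US", "profession": "SE", "joined_date": datetime(2012, 1, 5)},
    {"name": "u2", "country": "US", "profession": "HR", "joined_date": datetime(2012, 3, 1)},
    {"name": "u3", "country": "US", "profession": "SE", "joined_date": datetime(2012, 2, 10)},
    {"name": "u4", "country": "UK", "profession": "SE", "joined_date": datetime(2012, 4, 1)},
]


def filter_then_sort(records, **equals):
    """Apply equality filters, then sort the matches by joined_date, newest first."""
    matches = [
        r for r in records
        if all(r.get(field) == value for field, value in equals.items())
    ]
    return sorted(matches, key=lambda r: r["joined_date"], reverse=True)


recent_us_engineers = filter_then_sort(users, country="US", profession="SE")
print([u["name"] for u in recent_us_engineers])
```

This trades index storage for memory and CPU at query time, so it fits best when each filter combination returns a modest number of results.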
