I'm working on a web application for a farmer, a permaculturist to be more precise, which means he handles various seeds and plants. The purpose of the application is to store data about his garden in a database. Once enough data is gathered, the application is to provide a framework for analyzing it.
For now I'm developing the first feature: storing data in the database and displaying it on web pages.
Let's focus on the main topic of this question.
The garden has several fields. A field can contain several plants. Plants can have several states through time (seed, plant, flower, harvestable).
When a plant reaches a state, we need to store specific information:
the state
the date when the state was observed (and other time-related data)
the quantity (from the seed state to the flower state, we can have losses for many reasons)
So far so good, nothing fancy.
NOW, a plant can grow in one field until a specific state, then be planted in another field until it is harvested.
For instance, 12 carrots growing in tray n°3 from the seed state to the germination state.
At the germination state, 2 carrots didn't make it. The farmer now intends to resume growing his carrots not in tray n°3 but in field n°1.
In a model, let's say "state_plant_table", you would have 2 entries:
carrots - 12 - seeds - tray n°3
carrots - 10 - germ - field n°1
You might see it coming.
Let's say now that... there isn't enough room in field n°1 for the 10 carrots; only 8 can fit. So he just puts the 2 left over in the field beside it, field n°2.
We now have
carrots - 12 - seeds - tray n°3
carrots - 8 - germ - field n°1
carrots - 2 - germ - field n°2
NOW, on display we would show an HTML table for each field, tray or whatever. When you click on a field, you get the detail of every plant rooted in it.
For field n°1 we would have:
carrots - 8
For field n°2 we would have:
carrots - 2
And, unfortunately, for tray n°3, we would have:
carrots - 12
But we should have 0 (if 0 => exclude from display, of course).
I'm struggling with the theoretical design of my process right now... any tips, hints or suggestions are welcome!
I have thought about a "parent" quantity and "child" quantities, where the initial quantity would be stored in "plant_table" as the "parent" quantity and the "child" quantities would be stored in "state_plant_table": the quantity is more linked to the state in which it is observed than to the plant itself.
I feel like this is the right way, but I can't quite push the reasoning to its conclusion either.
Reasoning with "parent" and "children" was one of the correct approaches.
There are actually 3 kinds of quantity to store:
quantity of plants observed at a certain state (quantite_etat)
quantity of plants actually in the field (quantite)
quantity of plants from the same parent (quantite_lot)
models.py
from django.db import models

class EtatLot(models.Model):
    id = models.AutoField(primary_key=True)
    id_lot = models.ForeignKey('Lot', on_delete=models.PROTECT, verbose_name='Lot', related_name='lot_parent', null=True, blank=True)
    etat = models.CharField(verbose_name='État', max_length=50, null=True, blank=True)
    quantite = models.PositiveSmallIntegerField(verbose_name='Quantité', null=True, blank=True)
    quantite_etat = models.PositiveSmallIntegerField(verbose_name='Quantité relatée', null=True, blank=True)

class Lot(models.Model):
    id = models.AutoField(primary_key=True)
    quantite_lot = models.PositiveSmallIntegerField(verbose_name='Quantité', null=True, blank=True)
The quantity displayed is the one from quantite. The quantity used to analyze data is the one from quantite_etat.
Example
Let's say we have 12 cauliflowers in field n°1 and the farmer wants to plant 10 of them in field n°2.
Within the database we have:
Lot table
id   quantite_lot
1    12

EtatLot table
id   id_lot   etat   quantite   quantite_etat
1    1        Seed   12         12
2    1        Germ   12         12
At the end of the operation, we should have this:

Lot table
id   quantite_lot
1    2

EtatLot table
id   id_lot   etat   quantite   quantite_etat   field
1    1        Seed   12         12              field n°1
2    1        Germ   2          12              field n°1
3    1        Plan   10         10              field n°2
For this operation, quantite_lot is irrelevant. However, I store it in order to do a stock check: you cannot plant more plants than you have.
This is how I arrived at the table above:
get the quantity from the last child of lot_parent (quantite)
update this value with the difference between it and the value entered by the farmer in the form, and update quantite_lot in the parent as well
store the form value in quantite and quantite_etat of the entry that is about to be added to the database
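A minimal sketch of those three steps with the Django ORM (the function name, the transaction wrapper and the explicit stock-check exception are mine; the field/location attribute from the example is omitted because it is not part of the models above):

from django.db import transaction

@transaction.atomic
def avancer_etat(lot, nouvel_etat, quantite_form):
    # Stock check: you cannot plant more plants than you have.
    if quantite_form > lot.quantite_lot:
        raise ValueError("Not enough plants left in this lot")
    # 1. Get the quantity from the last child of the lot (quantite).
    dernier_etat = lot.lot_parent.order_by('-id').first()
    # 2. Update it with the difference, and update quantite_lot on the parent.
    dernier_etat.quantite -= quantite_form
    dernier_etat.save()
    lot.quantite_lot -= quantite_form
    lot.save()
    # 3. Store the form value in quantite and quantite_etat of the new entry.
    return EtatLot.objects.create(id_lot=lot, etat=nouvel_etat,
                                  quantite=quantite_form, quantite_etat=quantite_form)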
We have two data tables, item and itemMeta. Each table has CRUD APIs, and the data are related one-to-one.
<item table in A server>
id name created_at
------------------------
1 a_text 2022-08-23
2 b_text 2022-08-23
3 c_text 2022-08-23
4 d_text 2022-08-23
5 e_text 2022-08-23
...
xxxx hello_text 2022-08-23
...
<itemMeta table in B server>
id itemId price created_at
--------------------------------
1 1 10 2022-08-23
1 11 110 2022-08-23
1 24 420 2022-08-23
1 4 130 2022-08-23
1 5 1340 2022-08-23
....
yyyy xxxx 500 2022-08-23
....
When I want to make an endpoint like
/search-with-item-meta?search=o_text&page=4&sort=highprice-to-lowprice
I should call items with the search text, call itemMeta with the price sort information, and then match the two datasets by unique id.
But the item table has no price, the itemMeta table has no title, and there is pagination on top. Unfortunately, the two tables are in different DBs in separate places, so they have to be called through their APIs.
A simple fix would be to add a price field to item and a title field to itemMeta. But that is not clean, and I'm worried about keeping the two tables in sync and about pagination.
How can I solve these issues?
We use a PostgreSQL DB with TypeORM and NestJS.
I am writing this answer based on MySQL.
Ensure that you have defined the relationship between the entities, then create a query builder and join the two tables in your items repository like below:
relationalQuery() {
    return itemRepository.createQueryBuilder("items")
        .leftJoinAndSelect("items.itemMeta", "itemMeta")
}
Now we need a function that takes all the parameters, applies the filters, and returns a paginated response:
async findAll(page: number, limit: number, search: string, sort: string, queryBuilder: SelectQueryBuilder<Item>) {
    // Brackets and SelectQueryBuilder are imported from 'typeorm'
    queryBuilder = queryBuilder.take(limit).skip((page - 1) * limit)
    if (sort) {
        let order: 'ASC' | 'DESC' = 'ASC'
        if (sort === 'highprice-to-lowprice') {
            order = 'DESC'
        }
        queryBuilder.addOrderBy('itemMeta.price', order)
    }
    if (search) {
        // Wrap the search condition in brackets so it composes with other filters
        queryBuilder.andWhere(new Brackets((qb) => {
            qb.orWhere('items.name LIKE :name', { name: `%${search}%` })
        }))
    }
    const [items, totalItems] = await queryBuilder.getManyAndCount()
    const totalPages = Math.ceil(totalItems / limit)
    return {
        items,
        totalItems,
        totalPages,
    }
}
Calling this function:
const query = relationalQuery()
const result = await findAll(1, 15, "o_text", "highprice-to-lowprice", query)
This is a demo of how to implement it. You can add column-wise filtering, sorting and searching. Then you have to pass an object for sort like:
sort_by: {
    name: "ASC",
    id: "DESC",
    "itemMeta.price": "DESC"
}
You have to make a customized pagination function like the one above, where you break down all the columns and operations for your own case.
For example, I have 3 different entities:
#action = eat,run,walk
#person = Michael, John, Fred
#emotion = angry,sad,happy
I want to count the user-entered action and person entities.
If the bot recognizes
entities['action'].size() + entities['person'].size() > 2
Any other way to achieve this?
To account for one of the entities not being recognized, you can use the ternary operator <Expression> ? <what_to_do_when_true> : <what_to_do_when_false>.
So, in your example the condition would look like this:
((entities['action'] != null ? entities['action'].size() : 0) + (entities['person'] != null ? entities['person'].size() : 0)) > 2
When one of the entities is not recognized (null), it will be counted as 0.
I've been reading AQL Graph Operations and Graphs, and have found no concrete example or performance explanation for the SQL-Traverse use case.
E.g.:
If I have a collection Users, which has a company relation to collection Company;
collection Company has a location relation to collection Location;
collection Location is either a city, country, or region, and has city, country, region relations to itself.
Now, I would like to query all users who belong to companies in Germany or in the EU.
SELECT from Users where Users.company.location.city.country.name="Germany";
SELECT from Users where Users.company.location.city.parent.name="Germany";
or
SELECT from Users where Users.company.location.city.country.region.name="europe";
SELECT from Users where Users.company.location.city.parent.parent.name="europe";
Assuming that Location.name is indexed, can I have the two queries above executed with O(n), with n being the number of documents in Location (O(1) for graph traversal, O(n) for index scanning)?
Of course, I could just save regionName or countryName directly in company, as these cities and countries are in the EU and, unlike ... other places, probably won't change, but what if... you know what I mean (kidding, what if I have other use cases which require constant updates)
I'm going to explain this using the ArangoDB 2.8 Traversals.
We create these collections to match your schema using arangosh:
db._create("countries")
db.countries.save({_key:"Germany", name: "Germany"})
db.countries.save({_key:"France", name: "France"})
db.countries.ensureHashIndex("name")
db._create("cities")
db.cities.save({_key: "Munich"})
db.cities.save({_key: "Toulouse")
db._create("company")
db.company.save({_key: "Siemens"})
db.company.save({_key: "Airbus"})
db._create("employees")
db.employees.save({lname: "Kraxlhuber", cname: "Xaver", _key: "user1"})
db.employees.save({lname: "Heilmann", cname: "Vroni", _key: "user2"})
db.employees.save({lname: "Leroy", cname: "Marcel", _key: "user3"})
db._createEdgeCollection("CityInCountry")
db._createEdgeCollection("CompanyIsInCity")
db._createEdgeCollection("WorksAtCompany")
db.CityInCountry.save("cities/Munich", "countries/Germany", {label: "beautiful South near the mountains"})
db.CityInCountry.save("cities/Toulouse", "countries/France", {label: "crowded city at the mediteranian Sea"})
db.CompanyIsInCity.save("company/Siemens", "cities/Munich", {label: "darfs ebbes gscheits sein? Oder..."})
db.CompanyIsInCity.save("company/Airbus", "cities/Toulouse", {label: "Big planes Ltd."})
db.WorksAtCompany.save("employees/user1", "company/Siemens", {employeeOfMonth: true})
db.WorksAtCompany.save("employees/user2", "company/Siemens", {veryDiligent: true})
db.WorksAtCompany.save("employees/user3", "company/Eurocopter", {veryDiligent: true})
In AQL we would write this query the other way around.
We start with the constant time FILTER on the indexed attribute name and start our traversals from there on.
Therefore we filter for the country "Germany":
db._explain("FOR country IN countries FILTER country.name == 'Germany' RETURN country ")
Query string:
FOR country IN countries FILTER country.name == 'Germany' RETURN country
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR country IN countries /* hash index scan */
5 ReturnNode 1 - RETURN country
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash countries false false 66.67 % [ `name` ] country.`name` == "Germany"
Optimization rules applied:
Id RuleName
1 use-indexes
2 remove-filter-covered-by-index
Now that we have our well-filtered start node, we do a graph traversal in the reverse direction. Since we know that employees are exactly 3 steps away from the start vertex, and we're not interested in the path, we only return the 3rd layer:
db._query("FOR country IN countries FILTER country.name == 'Germany' FOR v IN 3 INBOUND country CityInCountry, CompanyIsInCity, WorksAtCompany RETURN v")
[
{
"cname" : "Xaver",
"lname" : "Kraxlhuber",
"_id" : "employees/user1",
"_rev" : "1286703864570",
"_key" : "user1"
},
{
"cname" : "Vroni",
"lname" : "Heilmann",
"_id" : "employees/user2",
"_rev" : "1286729095930",
"_key" : "user2"
}
]
Some words about this query's performance:
We locate Germany using a hash index in constant time -> O(1)
Based on that, we traverse m paths, where m is the number of employees in Germany; each of them can be traversed in constant time
-> O(m) at this step
Return the result in constant time -> O(1)
All combined, we need O(m), where we expect m to be smaller than the n (the number of documents in Location) used in your SQL-Traverse.
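For completeness, a minimal sketch of running the same query from Python with the python-arango driver (the driver choice, host and credentials are assumptions, not part of the answer above):

from arango import ArangoClient  # pip install python-arango

client = ArangoClient(hosts='http://localhost:8529')
db = client.db('_system', username='root', password='')  # hypothetical credentials

# Same traversal as above: constant-time filter on the indexed name,
# then walk exactly 3 steps inbound to reach the employees.
cursor = db.aql.execute(
    '''
    FOR country IN countries
      FILTER country.name == @name
      FOR v IN 3 INBOUND country CityInCountry, CompanyIsInCity, WorksAtCompany
        RETURN v
    ''',
    bind_vars={'name': 'Germany'},
)
for employee in cursor:
    print(employee['lname'], employee['cname'])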
I have a test database which logs data from when a store logs onto a store portal and how long it stays logged on.
Example:
(just for visualizing purposes - not actual database)
Stores
Id Description Address City
1 Candy shop 43 Oxford Str. London
2 Icecream shop 45 Side Lane Huddersfield
Connections
Id Store_Ref Start End
1 2 2011-02-11 09:12:34.123 2011-02-11 09:12:34.123
2 2 2011-02-11 09:12:36.123 2011-02-11 09:14:58.125
3 1 2011-02-14 08:42:10.855 2011-02-14 08:42:10.855
4 1 2011-02-14 08:42:12.345 2011-02-14 08:50:45.987
5 1 2011-02-15 08:35:19.345 2011-02-15 08:38:20.123
6 2 2011-02-19 09:08:55.555 2011-02-19 09:12:46.789
I need to get various data from the database. I've already gotten the max and average connection duration. (So it's probably very self-evident that...) I also need some information about which connection lasted the least. I of course immediately thought of the Min() function of LINQ, but as you can see, the database also includes connections that started and ended instantly. Therefore, that data isn't actually "valid" for data analysis.
So my question is how to get the minimum value, but if the value = 0, get the next lowest value.
My LINQ query so far (which uses the Min() function):
var min = from connections in Connections
join stores in Stores
on connections.Store_Ref equals stores.Id
group connections
by stores.Description into groupedStores
select new
{
Store_Description = groupedStores.Key,
Connection_Duration = groupedStores.Min(connections =>
(SqlMethods.DateDiffSecond(connections.Start, connections.End)))
};
I know that it's possible to get the valid values through multiple queries and/or statements, but I was wondering if it's possible to do it all in just one query, since my program expects LINQ queries to be returned and my preference is to keep the program as "light" as possible.
If you have a great/simple method to do so, please share it. Your contribution is much appreciated! :)
What if you add, before the select new, a let clause for the duration of the connection with something like:
let duration = SqlMethods.DateDiffSecond(connections.Start, connections.End)
And then add a where clause
where duration != 0
var min = from connections in Connections.Where(c => SqlMethods.DateDiffSecond(c.Start, c.End) > 0)
join stores in Stores
on connections.Store_Ref equals stores.Id
group connections
by stores.Description into groupedStores
select new
{
Store_Description = groupedStores.Key,
Connection_Duration = groupedStores.Min(connections =>
(SqlMethods.DateDiffSecond(connections.Start, connections.End)))
};
Try this. By filtering out the "0" values you will get the right result, at least that is my thought.
Include a where clause before calculating the Min value:
groupedStores.Where(conn => SqlMethods.DateDiffSecond(conn.Start, conn.End) > 0)
             .Min(conn => SqlMethods.DateDiffSecond(conn.Start, conn.End))
I have a table named stats
player_id team_id match_date goal assist
1 8 2010-01-01 1 1
1 8 2010-01-01 2 0
1 9 2010-01-01 0 5
...
I would like to know when a player reaches a milestone (e.g. 100 goals, 100 assists, 500 goals...).
I would also like to know when a team reaches a milestone.
I want to know which player or team reaches 100 goals first, second, third...
I thought of using triggers with tables to accumulate the totals.
The player_accumulator (and team_accumulator) tables would be:
player_id total_goals total_assists
1 3 6
team_id total_goals total_assists
8 3 1
9 0 5
Each time a row is inserted into the stats table, a trigger would insert/update the player_accumulator and team_accumulator tables.
This trigger could also check whether a player or team has reached a milestone, using a milestone table containing the numbers:
milestone
100
500
1000
...
A player_milestone table would contain the milestones reached by each player:
player_id stat milestone date
1 goal 100 2013-04-02
1 assist 100 2012-11-19
Is there a better way to implement a "milestone"?
Is there an easier way, without triggers?
I'm using PostgreSQL
I'd just count all goals and assists of the player who scores, and of the team that scores.
Like this on client side (in pseudocode):
function insert_stat(player_id, team_id, match_date, goals, assists)
{
    if (goals>0) {
        player_goals_before = query('select sum(goal) from stats where player_id=?', player_id);
        team_goals_before = query('select sum(goal) from stats where team_id=?', team_id);
    }
    if (assists>0) {
        player_assists_before = query('select sum(assist) from stats where player_id=?', player_id);
        team_assists_before = query('select sum(assist) from stats where team_id=?', team_id);
    }
    query("insert into stats (player_id, team_id, match_date, goal, assist)\n"
         +"values (?, ?, ?, ?, ?)", player_id, team_id, match_date, goals, assists);
    if (goals>0) {
        // milestone_crossed(before, after) is true when some milestone m
        // (100, 500, 1000, ...) satisfies before < m <= after
        if ( milestone_crossed(player_goals_before, player_goals_before+goals) ) {
            alert("player " + player_id + " reached milestone!")
        }
        if ( milestone_crossed(team_goals_before, team_goals_before+goals) ) {
            alert("team " + team_id + " reached milestone!")
        }
    }
    // etc. (same checks for assists)
}
Do not maintain a milestone table, as this denormalizes the database. I think this is a premature optimization. Only when the above is really not fast enough (for example, when stats has more than a few thousand rows per player_id or team_id) should you think about maintaining a milestone table.
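For reference, a concrete sketch of that client-side check in Python with psycopg2 (the driver choice and the milestone_crossed helper are mine, not part of the answer above; table and column names are from the question):

import psycopg2  # hypothetical driver choice for PostgreSQL

MILESTONES = [100, 500, 1000]

def milestone_crossed(before, after):
    # True when some milestone m satisfies before < m <= after.
    return any(before < m <= after for m in MILESTONES)

def insert_stat(conn, player_id, team_id, match_date, goals, assists):
    with conn.cursor() as cur:
        # Total goals before the insert (coalesce in case the player is new).
        cur.execute("select coalesce(sum(goal), 0) from stats where player_id = %s",
                    (player_id,))
        player_goals_before = cur.fetchone()[0]
        cur.execute("insert into stats (player_id, team_id, match_date, goal, assist) "
                    "values (%s, %s, %s, %s, %s)",
                    (player_id, team_id, match_date, goals, assists))
        if milestone_crossed(player_goals_before, player_goals_before + goals):
            print("player", player_id, "reached a milestone!")
    conn.commit()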