I have a little program that collects local news headlines from all over a country. It should collect the top headline every day in an array, and once it has more than 5 headlines it should remove the oldest one and add the newest one at the top.
Here's the table:
CREATE TABLE place (
    name text PRIMARY KEY,
    coords text,
    headlines json[]
);
The headlines array basically just holds json objects with a timestamp and a headline property, which would be upserted like this:
insert into place VALUES ('giglio','52.531677;13.381777',
ARRAY[
'{"timestamp":"2012-01-13T13:37:27+00:00","headline":"costa concordia sunk"}'
]::json[])
ON CONFLICT ON CONSTRAINT place_pkey DO
UPDATE SET headlines = place.headlines || EXCLUDED.headlines
But obviously, as soon as it hits 5 elements in the array, it will just keep adding onto it. So is there a way to add these headlines and limit them to 5?
Alternative Solution:
insert into place VALUES ('giglio','52.531677;13.381777',
ARRAY[
'{"timestamp":"2012-01-13T13:37:27+00:00","headline":"costa concordia sunk"}'
]::json[])
ON CONFLICT ON CONSTRAINT place_pkey DO
UPDATE SET headlines = place.headlines[0:4] || EXCLUDED.headlines
RETURNING *
So is there a way to add these headlines and limit them to 5?
I believe yes.
You can define a max array size (see section 8.15.1 here: https://www.postgresql.org/docs/current/arrays.html#ARRAYS-DECLARATION) like this:
headlines json[5]
But the current implementation of Postgres does not enforce it (it's still good to do for future compatibility and a proper data model definition).
So I'd try whether a CHECK constraint is of any help here:
headlines json[5] CHECK (array_length(headlines, 1) < 6)
This should give you a basic consistency check. From here there are two ways to continue (which seem out of scope for this question):
Catch the PG exception in your app layer, clean up the data, and try inserting it again.
Implement a function in your DB schema that attempts the insert and cleans up.
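For the second option, a rough sketch of what such a function might look like (untested; it just wraps the upsert from the question and keeps at most 5 elements, and the function and parameter names are made up for illustration):

CREATE OR REPLACE FUNCTION add_headline(p_name text, p_coords text, p_headline json)
RETURNS void AS $$
BEGIN
    -- upsert one headline and trim the stored array to at most 5 entries
    INSERT INTO place VALUES (p_name, p_coords, ARRAY[p_headline])
    ON CONFLICT ON CONSTRAINT place_pkey DO
    UPDATE SET headlines = place.headlines[0:4] || EXCLUDED.headlines;
END;
$$ LANGUAGE plpgsql;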
Here's how I ended up doing it:
insert into place VALUES ('giglio','52.531677;13.381777',
ARRAY[
'{"timestamp":"2012-01-13T13:37:27+00:00","headline":"costa concordia sunk"}'
]::json[])
ON CONFLICT ON CONSTRAINT place_pkey DO
UPDATE SET headlines = place.headlines[0:4] || EXCLUDED.headlines
RETURNING *
For an explanation of EXCLUDED, see https://www.postgresql.org/docs/9.5/sql-insert.html
Hi everyone, this is my first question.
I'm working on a dataset from patients who underwent urine analysis.
Every row refers to a single Patient ID, and every Request ID can refer to different types of urine analysis (aspect, colour, number of erythrocytes, bacteria, and so on).
I've added an image to let you understand my dataset.
I'd like to reshape it so that one request = one row, with all the tests done in the same request on the same row.
After that I want to merge it with another df that I reshaped by Request ID (because the first df was missing a "long result" column, which I downloaded from another software in use in our hospital).
I've tried:
df_pivot = df.pivot(index='Id Richiesta', columns = 'Nome Analisi Elementare', values = 'Risultato')
df_pivot.reset_index(inplace=True)
After that I want to do: df_merge = pd.merge(df_pivot, df, how='left', on='Id Richiesta')
I tried this once with another dataset, but there I had to drop_duplicates for another purpose, and it worked.
But this time I have to analyse all the features.
What can I do? Is there no other way than dropping the duplicates?
Thank you for any help! :)
I've studied my data further and discovered 1 duplicate of bacteria for the same request ID (1 in almost 8 million entries...).
df[df[['Id Richiesta', 'Id Analisi Elementare', 'Risultato']].duplicated()]
Then I visualized all the rows referring to that 'Id Richiesta' and kept the last one (they were the same).
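For reference, a rough sketch of the whole pipeline, assuming the column names shown above (df_long stands in for the second dataframe with the "long result" column; that name is made up). pivot_table with an aggfunc such as 'first' also tolerates the occasional duplicate, unlike df.pivot, which raises an error on duplicated index/column pairs:

import pandas as pd

# drop the lone duplicate, keeping the last occurrence
df = df.drop_duplicates(subset=['Id Richiesta', 'Id Analisi Elementare', 'Risultato'], keep='last')

# one request = one row, one column per type of test
df_pivot = df.pivot_table(index='Id Richiesta',
                          columns='Nome Analisi Elementare',
                          values='Risultato',
                          aggfunc='first')
df_pivot = df_pivot.reset_index()

# add the "long result" data from the second dataframe
df_merge = pd.merge(df_pivot, df_long, how='left', on='Id Richiesta')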
Thank you and sorry.
Please tell me if I should delete this question.
I have this app where there is a Games table and a Players table, and they share an n:n association.
This association is mapped in Phoenix through a GamesPlayers schema.
What I'm trying to do is actually quite simple: I'd like there to be an adjustable limit on how many players are allowed per game.
If you need more details, carry on reading, but if you already know an answer feel free to skip the rest!
What I've Tried
I've taken a look at adding check constraints, but without much success. Here's roughly what the check constraint would have to look like:
create constraint("games_players", :limit_players, check: "count(players) <= player_limit")
The problem here is that the check syntax is invalid, and I don't think there actually is a valid way to achieve this using this call.
I've also looked into adding a trigger to the Postgres database directly in order to enforce this (something very similar to what this answer proposes), but I am very wary of directly fiddling with the DB since I should only be using ecto's interface.
Table Schemas
For the purposes of this question, let's assume this is what the tables look like:
Games

Property     | Type
id           | integer
player_limit | integer

Players

Property | Type
id       | integer

GamesPlayers

Property  | Type
game_id   | references(Games)
player_id | references(Players)
As I mentioned in my comment, I think the cleanest way to enforce this is via business logic inside the code, not via a database constraint. I would approach this using a database transaction, which Ecto supports via Ecto.Repo.transaction/2. This will prevent any race conditions.
In this case I would do something like the following:
begin the transaction
perform a SELECT query counting the number of players in the given game; if the game is already full, abort the transaction, otherwise, continue
perform an INSERT query to add the player to the game
complete the transaction
In code, this would boil down to something like this (untested):
import Ecto.Query
alias MyApp.Repo
alias MyApp.GamesPlayers

@max_allowed_players 10

def add_player_to_game(player_id, game_id, opts \\ []) do
  max_allowed_players = Keyword.get(opts, :max_allowed_players, @max_allowed_players)

  case is_game_full?(game_id, max_allowed_players) do
    false ->
      %GamesPlayers{
        game_id: game_id,
        player_id: player_id
      }
      |> Repo.insert!()

    # Raising an error causes the transaction to fail
    true ->
      raise "Game #{inspect(game_id)} full; cannot add player #{inspect(player_id)}"
  end
end

defp is_game_full?(game_id, max_allowed_players) do
  current_players =
    from(r in GamesPlayers,
      where: r.game_id == ^game_id,
      select: count(r.id)
    )
    |> Repo.one()

  current_players >= max_allowed_players
end
I am currently working on a telecom analytics project and am a newbie in query optimisation. It takes a full minute to show the results in the browser, while just 45,000 records are accessed. Could you please suggest ways to reduce the time it takes to show the results?
I wrote the following query to find the average call duration for an age group:
sigma = 0
popn = len(Demo.objects.filter(age_group=age))
card_list = [Demo.objects.filter(age_group=age)[i].card_no
             for i in range(popn)]
for card in card_list:
    dic = Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma += dic['duration__sum']
avgDur = sigma / popn
The above code is inside a for loop that iterates over the age groups.
Model is as follows:
class Demo(models.Model):
card_no=models.CharField(max_length=20,primary_key=True)
gender=models.IntegerField()
age=models.IntegerField()
age_group=models.IntegerField()
class Fact_table(models.Model):
pri_key=models.BigIntegerField(primary_key=True)
card_no=models.CharField(max_length=20)
duration=models.IntegerField()
time_8bit=models.CharField(max_length=8)
time_of_day=models.IntegerField()
isBusinessHr=models.IntegerField()
Day_of_week=models.IntegerField()
Day=models.IntegerField()
Thanks
Try that:
sigma = 0
demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you use, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
card_no = models.SlugField(max_length=20, unique=True)
...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
class Fact_table(models.Model):
card = models.ForeignKey(Demo, related_name='facts')
...
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
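For example (a hypothetical sketch; the card number is made up):

demo = Demo.objects.get(card_no='1234567890')  # some existing Demo row
demo_calls = demo.facts.all()                  # every Fact_table row that points at this Demo
call_count = demo.facts.count()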
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print "Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg'])
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
In ExtJS, we can use record.get('id') to get the value of the primary key of the selected row. How do I get the values of other columns (say, a column named name or t_id)? In my case, alerting record.get('id') gives the exact value, whereas alerting record.get('t_id') shows undefined.
Thanks.
Update:
I am getting the result for record.get('name'). Only the foreign key t_id is not working.
You need to check a few things, @ejo.
1. As @Daemon said, check whether you have defined 't_id' in your store fields or in the model (see the sketch below).
2. Check whether you are sending the 't_id' value from the backend.
3. If you use the grid.getView option, check whether t_id has been mapped to the grid.
Most importantly, please post your code so that we can find the problem.
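For the first point, a minimal sketch of what the model definition could look like (model and field names are illustrative, not taken from your code):

Ext.define('MyApp.model.Item', {
    extend: 'Ext.data.Model',
    fields: ['id', 'name', 't_id']  // 't_id' must be listed here, otherwise record.get('t_id') returns undefined
});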
Yeah! The foreign keys in grid are always missing in my experience. Please try defining a hidden field for that foreign key(not tested yet):
column :t_id do |c|
c.hidden = true
end
I have two tables:
OutputPackages (master)
|PackageID|
OutputItems (detail)
|ItemID|PackageID|
OutputItems has an index called 'idxPackage' set on the PackageID column. ItemID is set to auto increment.
Here's the code I'm using to insert masters/details into these tables:
//fill packages table
for i := 1 to 10 do
begin
  Package := TfPackage(dlgSummary.fcPackageForms.Forms[i]);
  if Package.PackageLoaded then
  begin
    with tblOutputPackages do
    begin
      Insert;
      FieldByName('PackageID').AsInteger := Package.ourNum;
      FieldByName('Description').AsString := Package.Title;
      FieldByName('Total').AsCurrency := Package.Total;
      Post;
    end;

    //fill items table
    for ii := 1 to 10 do
    begin
      Item := TfPackagedItemEdit(Package.fc.Forms[ii]);
      if Item.Activated then
      begin
        with tblOutputItems do
        begin
          Append;
          FieldByName('PackageID').AsInteger := Package.ourNum;
          FieldByName('Description').AsString := Item.Description;
          FieldByName('Comment').AsString := Item.Comment;
          FieldByName('Price').AsCurrency := Item.Price;
          Post; //this causes the primary key exception
        end;
      end;
    end;
  end;
end;
This works fine as long as I don't mess with the MasterSource/MasterFields properties in the IDE. But once I set them and run this code, I get an error that says I've got a duplicate primary key 'ItemID'.
I'm not sure what's going on - this is my first foray into master/detail, so something may be setup wrong. I'm using ComponentAce's Absolute Database for this project.
How can I get this to insert properly?
Update
Ok, I removed the primary key constraint in my db, and I see that for some reason the autoincrement feature of the OutputItems table isn't working as I expected. Here's how the OutputItems table looks after running the above code:
ItemID|PackageID|
1 |1 |
1 |1 |
2 |2 |
2 |2 |
I still don't see why all the ItemID values aren't unique.... Any ideas?
Does using insert rather than append on the items table behave any differently? My guess here is that the append on the detail "sees" an empty dataset, so the auto-increment logic starts at one, the next record two, etc even though those values have already been assigned... just to a different master record.
One solution I used in the past was to create a new table named UniqueNums that persisted the next available record id number. Each time I needed a number, I would lock that table, increment the value and write it back, then unlock and use it. This might get you around the specific issue you are having.
First of all, the ideas of auto-increment and setting IDs in code clash, in my opinion. The clear path is to generate the key yourself in code. Especially with multi-user apps that require master/detail inserts, it is hard to impossible to get the right key inserted for the detail.
So generate an ID in code. When designing the table, set the ID field as the primary key but without auto increment. If I'm not mistaken, Append is used for the operation.
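A rough sketch of that idea (untested; GetNextItemID is a hypothetical helper that could read MAX(ItemID) + 1 or a counter table like the UniqueNums one suggested above):

NextItemID := GetNextItemID; // hypothetical helper returning the next free ItemID
with tblOutputItems do
begin
  Append;
  FieldByName('ItemID').AsInteger := NextItemID; // assign the key yourself instead of relying on auto-increment
  FieldByName('PackageID').AsInteger := Package.ourNum;
  // ...remaining fields as in the original code...
  Post;
end;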
Also, you seem to iterate while the visual controls are enabled (Item.Activated)? But the operation is a batch process by nature. For GUI performance you should consider disabling the connected db controls and then executing the operation. Being in the master/detail scope, this may be why the two other cursors are not iterating as expected.
Have you tried to replace Append/Insert with Edit?
And skip the "FieldByName('PackageID').AsInteger := Package.ourNum;" line.
I think that the M/D relationship automatically appends the detail records as needed, and also sets the detail table's primary keys.
That may also be the reason for the duplicate primary key error. The record is already created by the M/D-relationship when you try to Append/Insert another one.