Question about PostgreSQL performance tables - database

I am working on redesigning a service to optimize it and I don't know which option to choose for a table.
I have two tables with many entries.
Table 1: logs (10M rows)
Table 2: user_agents (2M rows)
logs
+----+-----+---------------+-----+
| id | ... | user_agent_id | ... |
+----+-----+---------------+-----+
user_agents
+----+------+
| id | name |
+----+------+
Currently my logs table has a user_agent_id column that points directly to the user agent associated with the logs of a user's visit.
When a new visitor sends logs, before saving them in the database I check whether the user agent is already in the user_agents table; if it is not, I add it.
As long as the logs are not saved in the database, the user does not have access to the page they want.
I would like to optimize write speed on the logs table, and I would like to know whether splitting the data into two tables makes a significant difference to write time.
The alternative would be to put a user_agent column back into the logs table.
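Function to get the id / insert the user_agent:
public function get_ua_id($headers)
{
    // Fall back to an empty string when the header is missing.
    $name = isset($headers['HTTP_USER_AGENT']) ? utf8_encode($headers['HTTP_USER_AGENT']) : '';
    // Look the row up by name, inserting it first if it does not exist yet.
    return ListUA::firstOrCreate(['name' => $name]);
}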
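For comparison, here is a minimal sketch of the same get-or-create step done with a unique constraint instead of a separate existence check. It is only an illustration: SQLite (via Python's sqlite3 module) stands in for PostgreSQL so that it runs anywhere, and the sample user agent strings are made up. In PostgreSQL the comparable statement would be INSERT ... ON CONFLICT (name) DO NOTHING.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_agents (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)


def get_ua_id(conn, name):
    """Return the id for a user agent name, inserting the row if it is new."""
    # The UNIQUE constraint turns the insert into a no-op when the name exists.
    conn.execute("INSERT OR IGNORE INTO user_agents (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM user_agents WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


# Hypothetical sample values.
first = get_ua_id(conn, "Mozilla/5.0 (example)")
second = get_ua_id(conn, "Mozilla/5.0 (example)")
other = get_ua_id(conn, "curl/8.0 (example)")
```

The point of the design is that uniqueness is enforced by the database, so two concurrent visitors with the same new user agent cannot create duplicate rows.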

Related

Is it possible to create a repeating table in SSRS Report based on data from SQL database?

I have created a Powerapp which is used to audit schools and the data saves to my SQL database. I have designed a report in SSRS to display the findings of the audit. The SQL table, shown below, stores the items in each room that were audited (i.e. desks, pcs, shelves etc) plus the name of the room and whether any actions need to take place. I need my report to display one table per room with the items down the left hand side and the name of the room as a title. This should be repeated for each room. There may be a different number of rooms in each report so this will be varied. I've included a screenshot of what the table needs to look like. When I create the table, I can only get the room names down the left hand side in one table and the items across the top. Please help.
Too long for a comment so I'll have to reply here.
Your data is not in a format that is particularly suited to this. I can't see how you can determine 'Compliant' from the data you have shown in your screenshots, although it may be that you have not shown everything you have available.
However, I would start by looking into the T-SQL UNPIVOT operator to get your data into a more normalised format. Using UNPIVOT you could turn your data into something like:
AuditID | Room      | Item  | Present
------------------------------------------
3019    | Reception | PC    | True
3019    | Reception | Desks | True
3019    | Class 1   | PC    | False
3019    | Class 1   | Desks | True
You can obviously extend this to include all pertinent data.
Once you have your data in this format, create a tablix with 'Item' and 'Present' columns only. You will have a 'detail' row group at this point. Right-click the row group and add a parent group, and set this group to be grouped by Room.
This will give you the basic layout. From there you can add some padding or blank rows to the room group, or even page breaks.
If you cannot get past the UNPIVOT step, then I suggest you post a new question specifically on that topic and return here once you have the data in the correct format.
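To make the reshaping concrete, here is a small sketch of what an unpivot does, written in plain Python rather than T-SQL. The wide column names (PC, Desks) are assumptions based on the sample output above, not the real table definition.

```python
# Hypothetical wide rows: one row per room, one column per audited item.
wide_rows = [
    {"AuditID": 3019, "Room": "Reception", "PC": True, "Desks": True},
    {"AuditID": 3019, "Room": "Class 1", "PC": False, "Desks": True},
]


def unpivot(rows, id_columns, item_columns):
    """Turn each item column into its own (ids..., Item, Present) row."""
    long_rows = []
    for row in rows:
        for item in item_columns:
            record = {col: row[col] for col in id_columns}
            record["Item"] = item
            record["Present"] = row[item]
            long_rows.append(record)
    return long_rows


long_rows = unpivot(wide_rows, ["AuditID", "Room"], ["PC", "Desks"])
for r in long_rows:
    print(r["AuditID"], r["Room"], r["Item"], r["Present"], sep=" | ")
```

Each wide row becomes one long row per item, which is the shape the grouped tablix needs.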

How to structure DynamoDB correctly to have nested or linked properties?

I am getting started with DynamoDB and I am trying to figure out the correct way to structure the following:
I have a user, and each user can have multiple pictures (S3 links and some metadata), with no limit on the number.
Whenever I fetch a user I would retrieve all their pictures; there would be no reads of a single picture. Each time a user uploads a picture, I need to store it for that user.
In MongoDB I would have created an array called pictures holding objects with each picture's data. Is this also the correct approach in DynamoDB?
Here is how you can do it with DynamoDB:
| pk | sk |
| user1 | metadata | age:24 | name: Jon| ...
| user1 | picture#1234#id1 | url:abc.com/xyz.jpeg | ... some other metadata
| user1 | picture#2456#id2 | url:abc.com/xyz.jpeg | ... some other metadata
| user1 | picture#4567#id3 | url:abc.com/xyz.jpeg | ... some other metadata
where 1234, 2456, etc. represent the epoch time at which the picture was uploaded.
Now you can run queries like:
Get me all the pictures in chronological order.
Select * from table where pk=user1 and sk starts with picture#
Get me all the pictures uploaded between or after certain dates.
Select * from table where pk=user1 and sk starts with picture# and sk>picture#1234
Get me the user details.
Select * from table where pk=user1 and sk=metadata
This ensures a few things:
No race condition or lock while uploading a picture.
You can decide how many images you want to fetch, and you read only what you need instead of loading all of them and then filtering.
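Here is a rough sketch of why this key layout supports those queries. It is a plain-Python simulation of a single partition kept sorted by sort key, not the DynamoDB API; the helper names and URLs are made up. Two things worth noting: the epoch value should be zero-padded so that string order matches chronological order, and a real DynamoDB Query takes a single condition on the sort key, so the date-range case would typically be expressed as one range condition rather than a prefix plus a comparison.

```python
from bisect import bisect_left, bisect_right


def picture_sk(epoch_seconds, picture_id):
    """Build a sort key; zero-padding keeps string order chronological."""
    return f"picture#{epoch_seconds:010d}#{picture_id}"


# Stand-in for the partition pk = "user1", sorted by sort key.
items = sorted(
    [
        ("metadata", {"name": "Jon", "age": 24}),
        (picture_sk(2456, "id2"), {"url": "abc.com/b.jpeg"}),
        (picture_sk(1234, "id1"), {"url": "abc.com/a.jpeg"}),
        (picture_sk(4567, "id3"), {"url": "abc.com/c.jpeg"}),
    ],
    key=lambda item: item[0],
)
sort_keys = [sk for sk, _ in items]


def query_range(low, high):
    """Return items whose sort key lies in [low, high], in key order."""
    return items[bisect_left(sort_keys, low):bisect_right(sort_keys, high)]


# All pictures, oldest first ("~" sorts after digits and letters in ASCII).
all_pictures = query_range("picture#", "picture#~")
# Pictures uploaded at or after epoch 2000.
recent = query_range(picture_sk(2000, ""), "picture#~")
```

Because pictures are separate items, an upload is a single write of a new item and never rewrites the user's metadata item.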

Another way to build database structure

I have to optimize my little-big database, because it's too slow, maybe we'll find another solution together.
First of all let's talk about data that are stored in the database. There are two objects: users and let's say messages
Users
There is something like that:
+----+---------+-------+-----+
| id | user_id | login | etc |
+----+---------+-------+-----+
| 1 | 100001 | A | ....|
| 2 | 100002 | B | ....|
| 3 | 100003 | C | ....|
|... | ...... | ... | ....|
+----+---------+-------+-----+
There is no problem with this table. (Don't be afraid of id and user_id. user_id is used by another application, so it has to be here.)
Messages
The second table is where the problems are. Each user has messages like this, for example:
+----+---------+------+----+
| id | user_id | from | to |
+----+---------+------+----+
| 1 | 1 | aab | bbc|
| 2 | 2 | vfd | gfg|
| 3 | 1 | aab | bbc|
| 4 | 1 | fge | gfg|
| 5 | 3 | aab | gdf|
|... | ...... | ... | ...|
+----+---------+------+----+
There is no need to edit messages, but there should be a way to update the list of messages for a user. For example, an external service sends all of a user's messages to the db and the list has to be updated.
And the most important thing is that there are about 30 million users and the average user has 500+ messages. Another problem is that I have to search through the from field and count the number of matches. I designed a simple SQL query with a join, but it takes too much time to get the data.
So... it's quite a big amount of data. I decided not to use RDS (I used PostgreSQL) and to move to a database like ClickHouse.
However, I ran into a problem: ClickHouse, for example, doesn't support the UPDATE statement.
To resolve this issue I decided to store each user's messages as one row. So the Messages table should look like this:
Here I'd like to store messages in JSON format
{"from":"aaa", "to":"bbe"}
{"from":"ret", "to":"fdd"}
{"from":"gfd", "to":"dgf"}
||
\/
+----+---------+----------+------+ And there I'd like to store the
| id | user_id | messages | hash | <= hash of the messages.
+----+---------+----------+------+
I think that full-text search inside the messages column will save some time and resources.
Do you have any ideas? :)
In ClickHouse, the best approach is to store data in one "big flat table".
So, you store every message in a separate row.
15 billion rows is OK for ClickHouse, even on a single node.
Also, it's reasonable to have each user's attributes directly in the messages table (pre-joined), so you don't need to do JOINs. This is suitable if user attributes are not updated.
These attributes will have repeated values for each of a user's messages. That's OK because ClickHouse compresses data well, especially repeated values.
If users' attributes are updated, consider storing the users table in a separate database and using the 'External dictionaries' feature to join it.
If a message is updated, just don't update it in place. Write another row with the modified message to the table instead and leave the old message as is.
It's important to have the right primary key for your table. You should use a table from the MergeTree family, which constantly reorders data by primary key and so keeps range queries efficient. The primary key is not required to be unique. For example, you could define the primary key as just (from) if you frequently filter on "from = ..." and these queries must be processed quickly.
Or you could use user_id as the primary key if queries by user id are frequent and must be processed as fast as possible, but then queries with a predicate on 'from' will scan the whole table (mind that ClickHouse does full scans efficiently).
If you need fast lookups by many different attributes, you could just duplicate the table with different primary keys. Typically the table will compress well enough that you can afford to keep the data in a few copies, each ordered differently for different range queries.
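To illustrate the "write another row instead of updating" idea, here is a small sketch in plain Python, not ClickHouse SQL. The version column and the sample rows are assumptions added for the illustration: an update is appended as a new row with a higher version, and reads keep only the newest row per message.

```python
# Append-only log: an "update" is a new row with a higher version number.
# Columns: (user_id, message_id, sender, recipient, version)
rows = [
    (1, 1, "aab", "bbc", 1),
    (1, 4, "fge", "gfg", 1),
    (2, 2, "vfd", "gfg", 1),
    (1, 4, "aab", "gfg", 2),  # corrected copy of message 4
]


def latest_rows(rows):
    """Keep only the highest-version row for each (user_id, message_id)."""
    latest = {}
    for row in rows:
        key = (row[0], row[1])
        if key not in latest or row[4] > latest[key][4]:
            latest[key] = row
    return list(latest.values())


def count_from(rows, sender):
    """Count current messages whose 'from' field equals sender."""
    return sum(1 for row in latest_rows(rows) if row[2] == sender)


matches = count_from(rows, "aab")
```

The old version of message 4 stays in the data but is ignored at read time, so no UPDATE statement is needed.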
First of all, with such a big dataset, the from and to columns should be integers, if possible, as their comparison is faster.
Second, you should consider creating proper indexes. As each user has relatively few records (about 500, out of roughly 15 billion in total), it should give you a huge performance benefit.
If everything else fails, consider using partitions:
https://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
In your case they would be dynamic and would hinder first-time inserts immensely, so I would consider them only as a last, if very efficient, resort.
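As a rough sketch of the indexing advice, the snippet below builds the messages table, indexes the searched column, and counts matches. SQLite (through Python's sqlite3) stands in for PostgreSQL so that it is self-contained, and the column is called sender here because FROM is a reserved word in SQL; both the name and the index name are assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; "sender" replaces "from".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, sender TEXT, recipient TEXT)"
)
conn.executemany(
    "INSERT INTO messages (user_id, sender, recipient) VALUES (?, ?, ?)",
    [
        (1, "aab", "bbc"),
        (2, "vfd", "gfg"),
        (1, "aab", "bbc"),
        (1, "fge", "gfg"),
        (3, "aab", "gdf"),
    ],
)
# Index the column used in the WHERE clause so lookups can avoid a full scan.
conn.execute("CREATE INDEX idx_messages_sender ON messages (sender)")

count = conn.execute(
    "SELECT COUNT(*) FROM messages WHERE sender = ?", ("aab",)
).fetchone()[0]

# Ask the planner how it would run the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM messages WHERE sender = ?",
    ("aab",),
).fetchall()
print(count, plan)
```

In PostgreSQL the equivalent check would be running EXPLAIN on the query to confirm that the index is actually used.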

Database logic for user achievements

I have a table in my database that stores users, for example:
userID | userName | email | password | wins | losses | exp
Now I want the user to be able to get achievements in my game, like "win 5 games in a row", and I obviously want that progress stored in the database (Google App Engine) so it is not lost when the user exits the client. Example of an achievement table:
achievmentID | achievmentTitle | description | reward
Now how would I go about saving achievement progress for each user in the best manner? I need to save both progress (like 3/5 games in a row won) and whether the achievement is completed or not.
The product is for Android/iOS and uses Google App Engine (Datastore) as the database.
The way you set up your table would not be very efficient. In my mind, to be the most efficient, you would add a new column to your users table (not a new table, just a column after 'exp') and store a sort of key for achievements in it. For example, you could give each achievement an ID (which you would keep track of somewhere, like a note in Notepad).
Then, when they get that achievement, you would store "123/", and if they got another achievement, it would become something like "123/461/".
Then you could write a script that splits these IDs apart to see which achievements have been completed.
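A minimal sketch of that script, assuming the field format described above ("123/461/"). The function names are made up. Note that this encoding only records which achievements are complete; partial progress such as 3/5 would need to be stored separately.

```python
def parse_completed(field):
    """Split a value such as "123/461/" into a set of achievement IDs."""
    return {int(part) for part in field.split("/") if part}


def add_completed(field, achievement_id):
    """Append an ID to the field unless it is already recorded."""
    if achievement_id in parse_completed(field):
        return field
    return f"{field}{achievement_id}/"


field = ""
field = add_completed(field, 123)
field = add_completed(field, 461)
field = add_completed(field, 123)  # duplicate, ignored
completed = parse_completed(field)
```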

Designing a schedule in a sports database

I will try to be as specific as possible, but I am having trouble conceptualizing the problem. As a hobby I am trying to design an NFL database that takes raw statistics and stores them for future evaluation in fantasy league analysis. One of the primary things I want to see is whether certain players/teams perform well against specific teams, and which defenses are susceptible to either the pass or the run. The issue I am having is trying to design the schedule/event table. My current model is as follows.
TEAMS
TeamID, Team
SCHEDULE
ScheduleID, TeamID, OpponentID, Season, Week, Home_Away, PointsFor, PointsAgainst
In this scenario I will be duplicating every game, but when I use an event table where I use TeamAway and TeamHome I find my queries impossible to run since I have to query both AwayTeam and HomeTeam to find the event for a specific team.
In general though I cannot get a query to work where I have two relationships from a table back to one table, even in the schedule table my query does not work.
I have also considered dropping the team table and just storing NE, PIT, etc. for the Team and Opponent fields so I do not have to deal with the cross-relationships back to the team table.
How can I design this so I am not running queries for TeamID = OpponentID AND TeamID?
I am doing this in MS Access.
Edit
The issue I am having is that when I query two tables, Team (TeamID, Team) and Event (TeamHomeID, TeamAwayID), which have relationships built between TeamID - TeamHomeID and TeamID - TeamAwayID, I have trouble building the query in MS Access.
The SQL would look something like:
SELECT Teams.ID, Teams.Team, Event.HomeTeam
FROM Teams INNER JOIN (Event INNER JOIN Result ON Event.ID = Result.EventID)
ON (Teams.ID = Result.LosingTeamID) AND (Teams.ID = Result.WinningTeamID)
AND (Teams.Team = Event.AwayTeam) AND (Teams.Team = Event.HomeTeam);
It was looking for teams that had IDs of both the losing team and the winning team (which does not exist).
I think I might have fixed this problem. I didn't realize that the relationships in the database design are only defaults, and that within the Query builder I could change the joins on which a particular query is built. I discovered this by deleting all the AND portions of the SQL statement returned, and was then able to return the names of all winning teams.
This is an interesting concept - and good practice.
First off - it sounds like you need to narrow down exactly what kind of data you want so you know what to store. I mean, hell, what about storing the weather conditions?
I would keep Team, but I would also add City (because Teams could switch cities).
I would keep Games (Schedule) with columns GameID, HomeTeamID, AwayTeamID, ScheduleDate.
I would have another table Results with columns ResultID, GameID, WinningTeamID, LosingTeamID, Draw (Y/N).
Data could look like
TeamID | TeamName | City
------------------------
1      | PATS     | NE
2      | PACKERS  | GB
GameID | HomeTeamID | AwayTeamID | ScheduleDate | Preseason
-----------------------------------------------------------
1      | 1          | 2          | 1/1/2016     | N
ResultID | GameID | WinningTeamID | LosingTeamID | Draw
------------------------------------------------------------
1        | 1      | 1             | 2            | N
Given that, you could pretty easily give the W/L/D for any scheduled game and date, and you could easily SUM a team's wins, their wins when they were home or away, during preseason or regular season, their wins against a particular team, etc.
I guess if you wanted to get really technical you could even create a Season table that stores SeasonID, StartDate, EndDate. This would just make sure you were 100% able to tell what games were played in which season (between what dates) and from there you could infer weather statistics, whether or not a player was out during that time frame, etc.
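On the original problem of having two relationships from one table back to Teams: the usual way around it is to join Teams more than once, giving each copy its own alias. Here is a sketch using the sample data above. SQLite (through Python's sqlite3) stands in for MS Access, so the exact join syntax in Access may differ, and the alias names are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MS Access; names follow the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (TeamID INTEGER PRIMARY KEY, TeamName TEXT, City TEXT);
CREATE TABLE Games (GameID INTEGER PRIMARY KEY, HomeTeamID INTEGER,
                    AwayTeamID INTEGER, ScheduleDate TEXT, Preseason TEXT);
CREATE TABLE Results (ResultID INTEGER PRIMARY KEY, GameID INTEGER,
                      WinningTeamID INTEGER, LosingTeamID INTEGER, Draw TEXT);
INSERT INTO Teams VALUES (1, 'PATS', 'NE'), (2, 'PACKERS', 'GB');
INSERT INTO Games VALUES (1, 1, 2, '2016-01-01', 'N');
INSERT INTO Results VALUES (1, 1, 1, 2, 'N');
""")

# Join Teams once per role (home, away, winner), each under its own alias.
# The WHERE clause finds a team's games whether it played home or away.
rows = conn.execute("""
SELECT g.GameID, home.TeamName, away.TeamName, winner.TeamName
FROM Games AS g
JOIN Teams AS home ON home.TeamID = g.HomeTeamID
JOIN Teams AS away ON away.TeamID = g.AwayTeamID
JOIN Results AS r ON r.GameID = g.GameID
JOIN Teams AS winner ON winner.TeamID = r.WinningTeamID
WHERE g.HomeTeamID = ? OR g.AwayTeamID = ?
""", (2, 2)).fetchall()
```

Each alias matches exactly one foreign key, so the query never asks for a single team row that is both the home and the away team, which is what made the earlier query return nothing.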
