MongoDB query vs. engine resources - database

I learned some web programming over the last 2 years, but I know nothing about computer hardware and how it actually deals with my functions, so sorry if my question is silly.
I built an online pilot logbook. I use MongoDB for my database, and I have one schema for users and another for flights, where each flight is one entry.
I have an overview page where I can see the total flight time for the last week, month, 90 days and 365 days. So my function queries the database for all flights where the user id is XXX and the date is within the last 7 days, 30 days, 90 days and 365 days.
So my question is: every time I load this page, my function goes over each entry in the database 4 times to check whether it fits my query. Does that take a lot of resources?
I know that my case is not big, with only a few thousand entries in my database, but I want to build something efficient and know whether I should care about this, or whether it is such a small part of all the computation in a full web app that it doesn't really matter.
Thanks
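For illustration, the four totals described above can also be computed in one pass: fetch the user's flights from the last 365 days once, then bucket them into the four windows in application code. This is only a minimal sketch; the `date` and `minutes` field names are assumptions, since the actual schema is not shown in the question.

```python
from datetime import datetime, timedelta

# Window lengths in days, taken from the overview page described above.
WINDOWS = (7, 30, 90, 365)

def flight_time_totals(flights, now):
    """Return {window_days: total_minutes} in a single pass over `flights`.

    `flights` is an iterable of dicts with hypothetical keys
    "date" (datetime) and "minutes" (int); the real schema may differ.
    """
    totals = {days: 0 for days in WINDOWS}
    for flight in flights:
        age = now - flight["date"]
        for days in WINDOWS:
            # A flight counts toward every window it is recent enough for.
            if timedelta(0) <= age <= timedelta(days=days):
                totals[days] += flight["minutes"]
    return totals

now = datetime(2024, 6, 1)
flights = [
    {"date": now - timedelta(days=3), "minutes": 60},
    {"date": now - timedelta(days=20), "minutes": 90},
    {"date": now - timedelta(days=200), "minutes": 120},
]
totals = flight_time_totals(flights, now)
```

With a few thousand entries either approach should be cheap. In MongoDB, a compound index on the user id and date fields would typically let the query read only that user's recent flights instead of checking every entry in the collection.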

Related

MongoDB -- TTL / Expire on Individual Fields?

I have some database of ~2 billion documents and ~8 TB which I store for 90 days before dropping the documents. However, several of these fields contain much more data than the rest, and I only need them for a shorter time, say 30 days. After 30 days, I want to clear the fields out to free up space, before archiving the document entirely later on.
It doesn't seem that MongoDB has native functionality for TTL on individual fields.
The database is both write and read heavy.
I'm thinking about writing a script that queries Mongo every minute with something like:
timestamp: $gt (now - 30 days - 1 hour) AND $lt (now - 30 days), and then does an updateMany to write "" to these fields.
So essentially run a script every minute with a rolling window of one hour (just to ensure no documents escape) and doing an updateMany.
Is this a decent approach? Are there any design considerations I should be aware of when addressing this problem?
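The rolling-window cleanup described above can be sketched as a function that builds the filter and update documents. This is an illustration under assumptions, not a tested production job: the `timestamp` field name comes from the pseudo-query in the question, while `payload` and `raw_headers` are hypothetical names for the large fields.

```python
from datetime import datetime, timedelta

def build_clear_fields_op(now, fields, retention_days=30, window_hours=1):
    """Build the (filter, update) pair for the rolling-window cleanup.

    Matches documents whose `timestamp` lies between
    (now - retention - window) and (now - retention), and blanks `fields`.
    """
    upper = now - timedelta(days=retention_days)
    lower = upper - timedelta(hours=window_hours)
    query = {"timestamp": {"$gt": lower, "$lt": upper}}
    update = {"$set": {name: "" for name in fields}}
    return query, update

# "payload" and "raw_headers" are hypothetical field names.
query, update = build_clear_fields_op(
    datetime(2024, 6, 1, 12, 0), ["payload", "raw_headers"]
)
# With a driver such as PyMongo this pair would be passed to
# collection.update_many(query, update); no database is contacted here.
```

An index on `timestamp` would presumably be needed so that each run touches only the one-hour window rather than scanning all ~2 billion documents.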

What is the best way to compare 35 availability timeslots between two parties?

This is for a ruby app, but the problem is more general.
Party 1 requires party 2 to be available for work; however, rather than specific timings, the availability is more general:
Monday to Sunday
Morning
Afternoon
Evening
Overnight
Overnight- awake
In other words, 7 days with 5 time slots, creating 35 matching points.
The simplest way would be to add 35 columns, with Boolean for both parties and then see if the respective columns match. We can measure the degree to which they match via a count.
Given that this needs to be done many times, comparing party 1's required availability against hundreds, possibly thousands, of party 2 offered availabilities, I'm concerned that this will be very slow.
I would appreciate opinions on the above approach to the problem:
Is there a better way to store this matrix/array of required/offered availability that would make the comparison faster?
Any thoughts on where this calculation should happen: DB, controller, or helper?
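One alternative to 35 Boolean columns is to pack the 35 slots into a single integer and compare with a bitwise AND. The sketch below is in Python rather than Ruby, since the problem is general; the day abbreviations and function names are made up for illustration.

```python
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
SLOTS = ("Morning", "Afternoon", "Evening", "Overnight", "Overnight-awake")

def slot_bit(day, slot):
    """Map a (day, slot) pair to one of the 35 bit positions."""
    return 1 << (DAYS.index(day) * len(SLOTS) + SLOTS.index(slot))

def to_mask(pairs):
    """Pack an iterable of (day, slot) pairs into a single integer."""
    mask = 0
    for day, slot in pairs:
        mask |= slot_bit(day, slot)
    return mask

def match_count(required, offered):
    """Number of required slots that the other party also offers."""
    return bin(required & offered).count("1")

required = to_mask([("Mon", "Morning"), ("Mon", "Evening"), ("Sat", "Overnight")])
offered = to_mask([("Mon", "Morning"), ("Sat", "Overnight"), ("Sun", "Afternoon")])
matched = match_count(required, offered)
fully_covered = (required & offered) == required
```

Because 35 bits fit in a 64-bit integer column, the same AND comparison could in principle run either in the database or in application code; which is faster would need to be measured.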

Data inaccuracy

In my new job we have an old program that is, I think, 10 to 15 years old but still in use. I am working on renewing the system, which has a major problem in its old data.
The program is part of a payment system. It allows the original payment to be split at the paying time.
When it splits the payment it updates the original record and keeps the original date on the new record. It keeps the original value of the last operation in a separate field.
$1000 original split into two $500 payments --> by adding a new $500 record and updating the original row to a $500 payment, keeping $1000 as the original.
$500 split into $300 and $200 --> by adding a new $200 record and updating the original row to a $300 payment; now the original is updated to $500 instead of $1000.
and so on.
The following image contains an example based on a real case, with two original payments of 1000 and 600.
Whoever made the program did not use transactions, so sometimes the new record is not added (that's how the problem was discovered, but 15 years too late).
How can I find the affected customers among 4.5 million records?
Is there a way to find the real original amount from the original field in the image? (I know that the answer might be no).
The database is oracle and the program was developed on oracle forms.
Thank you.
Edit: a step-by-step example in a spreadsheet:
https://docs.google.com/spreadsheets/d/1I9jOlCeiVuGdNlgXpiF_-Ic0e-cqaalrpUCJIUM5oAk/edit?usp=sharing
The problem is that the date field does not keep the time, only the date. If the customer made several transactions on the same day, the error becomes hard to detect. It can only be detected if the customer made only one transaction that day, and even then it has to be reviewed case by case. That's hard for years of work. Unfortunately, they may have to take a loss because of bad programming.
I will provide all the table fields tomorrow for better understanding.
Thank you for the replies.

How to store total visits statistics for user history efficiently?

I'm maintaining a system where users create something called "books" that are accessed by other users.
I need a convenient (good performance) way to store events in database where users visit these books to later display graphs with statistics. The graphs need to demonstrate a history where the owner of the book can see which days in the week, and at which times there is more visiting activity (all over the months).
Using ERD (Entity-Relationship-Diagram), I can produce the following Conceptual Model:
At first the problem seems to be solved, as we have a very simple situation here. This will give me a table with 3 fields. One will be the occurrence of the visit event, and the other 2 will be foreign keys. One represents the user, while the other represents which book was visited. In short, every record in this table will be a visit:
However, given that a user can average about 10 to 30 book visits per day, and with a system of 100,000 users, this table could gain many gigabytes of new records in a single day. I'm not the most experienced person in good database performance practices, but I'm pretty sure that this is not the solution.
Even though I do a cleanup on the database to delete old records, I need to keep a record history of the last 2 months of visits (at least).
I've been looking for a way to solve this for days, and I have not found anything yet. Could someone help me, please?
Thank you.
Note: I'm using PostgreSQL 9.X, and the system is written in Java.
As mentioned in the comments, you might be overestimating data size. Let's do the math. 100k users at 30 books/day at, say, 30 bytes per record.
(100_000 * 30 * 30) / 1_000_000 # => 90 megabytes per day
Even if you add index size and some amount of overhead, this is still a few orders of magnitude lower than "many gigabytes per day".
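For the graphs described in the question (activity by day of the week and time of day), the raw visit rows can be reduced to counts per weekday and hour. A minimal sketch of that bucketing, assuming the visit timestamps have already been loaded; the function name is hypothetical, and this is not the asker's schema or code.

```python
from collections import Counter
from datetime import datetime

def visits_by_weekday_hour(visit_times):
    """Count visits per (weekday, hour); weekday 0 is Monday."""
    return Counter((t.weekday(), t.hour) for t in visit_times)

visits = [
    datetime(2024, 6, 3, 9, 15),   # Monday, 09h
    datetime(2024, 6, 3, 9, 45),   # Monday, 09h
    datetime(2024, 6, 10, 9, 5),   # Monday one week later, 09h
    datetime(2024, 6, 5, 20, 30),  # Wednesday, 20h
]
histogram = visits_by_weekday_hour(visits)
```

At most 7 x 24 = 168 buckets result per book, so the chart data stays small no matter how many raw visit rows are kept.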

Printing the names of all the people greater than age 18?

This was a pretty good question that was posed to me recently. Suppose we have a hypothetical (insert your favorite data storage tool here) database that consists of the names, ages and addresses of all the people residing on this planet. Your task is to print out the names of all the people whose age is greater than 18 within an HTML table. How would you go about doing that? Let's say that hypothetically the population is growing at the rate of 1200 per second and the database is updated accordingly (don't ask how). What would be your strategy to print the names of all these people and their addresses in an HTML table?
Storing the ages in a DB table sounds like a recipe for trouble to me - it would be impossible to maintain. You would be better off storing the birth dates, then building an index on that column/attribute.
You have to get an initial dump of the table for display. Just calculate the date 18 years ago (let's say D0) and use a query for any person born earlier than that.
Use DB triggers to receive notifications about deaths, so that you can remove them from the table immediately.
Since people only get older (unfortunately?), you can use ranged queries to get new additions (i.e. people who have turned 18 since you last queried the table). E.g. if you want to update the display the next day, you issue a query for the people who were born on day D0 + 1 only - no need to request the whole table again.
You could even prefetch the people who reach 18 years of age the next day, keep the entries in memory, and add them to the display at the exact moment they reach that age.
BTW, even with 2 KB of data for each person, you get an 18 TB database (assuming 50% overhead). Any slightly beefed up server should be able to handle this kind of DB size. On the other hand, the thought of a 12 TB HTML table terrifies me...
Oh, and beware of timezone and DST issues - time is such a relative thing these days...
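The cutoff date D0 and the incremental range query described in this answer can be sketched as follows. This is a simplified illustration that ignores the timezone and DST issues just mentioned; the function names are made up, and treating someone as included on their 18th birthday is an assumption.

```python
from datetime import date

def years_before(day, years):
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def adult_cutoff(today, age=18):
    """Latest birth date of anyone who has reached `age` by `today` (D0)."""
    return years_before(today, age)

def newly_adult_range(last_run, today, age=18):
    """Birth-date range (exclusive start, inclusive end) of people who
    reached `age` after `last_run` and up to `today`."""
    return adult_cutoff(last_run, age), adult_cutoff(today, age)

d0 = adult_cutoff(date(2024, 3, 15))
start, end = newly_adult_range(date(2024, 3, 15), date(2024, 3, 16))
```

The initial dump would select birth dates on or before `d0`; each later refresh only needs birth dates after `start` and up to `end`, which an index on the birth date column can serve as a range scan.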
I don't see what the problem is. You don't have to worry about new records being added at all, since none of them will be included in your query unless that query takes 18 or more years to run. If you have an index on age, and presumably any DB technology sufficient to handle that much data and 1200 inserts a second updates indexes on insert, it should just work.
In the real world, using existing technologies or something like them, I would create a snapshot once a day and run queries on that read-only snapshot, which would not include records for the current day. That table would certainly be good enough for this query, and most others.
Are you forced to aggregate all of the entries into one table?
It would be simpler if you were to create a table for each age group (only around 120 tables would be needed) and just insert the inputs into those, as it's computationally simpler to look over 120 tables when you insert an entry than to look over 6,000,000,000 when looking for entries.

Resources