I am programming a database (DB) in Lazarus as a project and, to increase the complexity, have not downloaded any additional libraries. The software is intended to let customers check the availability of hiring a villa and then make a booking if the villa is available.
Declaring records and writing them to a file acts as a table for my DB. Reading, writing, and deleting the records in my files has been achieved successfully, and I now move on to the point where I use this data to make bookings.
I have 3 tables:
Clients
Villas
Bookings
Now, my problem comes in with the Bookings table. How do I make my application know that a villa has already been booked for the period in which a new booking is being made? (Basically, double booking shouldn't be allowed.) As mentioned, so far I can only read, write, and delete records in my tables, and I am now moving on to the booking stage. Please ask if further info is needed.
I'm thinking of using TCalendar, but I have no idea how to program with it or even whether that is the simplest way of doing it. Any tips, please?
You have to solve two distinct problems:
Implement logic (not visible to the user) that determines whether a requested booking is available or whether the villa is booked for some or all of the requested period.
Implement some kind of visual display for the user to see when a villa is booked.
TCalendar would only help you with the second part, which is the least interesting part (because you don't really need a visual interface; you could simply pop up a message that says "Villa Not Available").
To write the logic that will tell you whether a booking is available, you will need to refine your data model (or, if you've done that already, explain it to us in more detail). Specific questions you need to address are:
Do you issue bookings against individual villas ("I want to book the Butterfly House for one week starting July 1") or against an inventory of identical villas ("I want to book one of your two-bedroom villas for one week starting July 1")?
You need to decide how you're going to store the booking information. When I've tried a task like this in the past, I've found it easiest to store a record for each night of each reservation and then, for a request, run a separate check for each requested night to see whether I can satisfy that individual night. A request for which I can satisfy all nights is bookable.
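To make that concrete, here is a minimal sketch of the per-night check. It is plain Python rather than Lazarus/Pascal, and the villa name and record layout are made-up examples; the same logic translates directly to records read from your files.

    from datetime import date, timedelta

    # Assumed layout: one record per villa per night already booked.
    # booked_nights maps a villa id to the set of nights taken so far.
    booked_nights = {
        "butterfly_house": {date(2016, 7, 1), date(2016, 7, 2), date(2016, 7, 3)},
    }

    def nights(check_in, check_out):
        """All nights covered by a stay: check-in inclusive, check-out exclusive."""
        d = check_in
        while d < check_out:
            yield d
            d += timedelta(days=1)

    def is_available(villa_id, check_in, check_out):
        """A request is bookable only if every individual night is free."""
        taken = booked_nights.get(villa_id, set())
        return all(night not in taken for night in nights(check_in, check_out))

    if is_available("butterfly_house", date(2016, 7, 3), date(2016, 7, 5)):
        print("Villa available - write the booking records")
    else:
        print("Villa Not Available")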
I use SQL Server as a database, but I want to know in general, regardless of the database I use.
I had a scenario on an auction website where I always wanted to have the highest price that a user entered. I added an attribute named "Highest Price" to the item entity (the item that is being sold) and checked, whenever a user entered a higher price, that it would be updated.
So my question is: is it faster to do this, or to have a separate bids table and search it every time I insert a new record?
My direct answer to your question (as you put it in the title) would definitely be searching, as writing takes longer than reading.
So, here are your two options for your example (that is, if you want to actually identify the user that made the highest bid, which I guess you do):
Use the "bids" table.
Advantages: Allows you to keep history of bids, adds coherence (totally makes sense linking users and items in a table).
Disadvantages: Way slower, as finding the maximum price requires searching with MAX(); plus, you will need to write every entry, regardless of its price.
Add two attributes to the table that contains the items, referring to the highest price and the user ID that put that price.
Advantages: Faster, you select directly one row and you only write given certain conditions.
Disadvantages: Doesn't look too coherent linking the user to an item, rather than to a bid.
Common to both options: you write data to two columns, although with different intensity.
If you absolutely need the highest speed possible, go for the second one, but, especially when you're engineering software, you must also take coherence into account.
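As a minimal sketch of the two options side by side, here they are using SQLite from Python purely for illustration; the table and column names are assumptions, and the same statements translate to SQL Server or any other database.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT,
                            highest_price REAL, highest_bidder INTEGER);
        CREATE TABLE bids  (id INTEGER PRIMARY KEY, item_id INTEGER,
                            user_id INTEGER, price REAL);
        INSERT INTO items (id, name, highest_price, highest_bidder)
            VALUES (1, 'Painting', 0, NULL);
    """)

    def place_bid(item_id, user_id, price):
        # Option 1: always write the bid (keeps history, needs a search to read back).
        conn.execute("INSERT INTO bids (item_id, user_id, price) VALUES (?, ?, ?)",
                     (item_id, user_id, price))
        # Option 2: denormalised columns on the item, written only when the bid is higher.
        conn.execute("""UPDATE items SET highest_price = ?, highest_bidder = ?
                        WHERE id = ? AND highest_price < ?""",
                     (price, user_id, item_id, price))

    place_bid(1, 101, 50.0)
    place_bid(1, 102, 75.0)
    place_bid(1, 103, 60.0)   # lower bid: stored in bids, items row untouched

    # Option 1 read: search the bids table; Option 2 read: single-row lookup.
    print(conn.execute("""SELECT user_id, price FROM bids
                          WHERE item_id = 1 ORDER BY price DESC LIMIT 1""").fetchone())
    print(conn.execute("SELECT highest_bidder, highest_price FROM items WHERE id = 1").fetchone())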
I'm designing a database for tracking stock transactions and real-time portfolio holdings. The system should be able to pull historical positions on a custom time-period basis (e.g. end-of-day holdings). My current design includes a transaction table and a real-time position table; when each transaction is booked into the database, it automatically triggers an update of the real-time position table. I'm planning to use PostgreSQL and the TimescaleDB extension for the transaction table.
I'm somewhat confused about how to implement the historical-holdings function, since the historical holdings at a certain timestamp t can be derived by aggregating all transactions with timestamp <= t. Should I use a separate table to record the historical holdings, or simply do the aggregation? I am also considering using binary files to store snapshots of the real-time positions at the end of each day to support historical position look-ups.
I have little experience with database design, so any advice/help is appreciated.
This question is lacking detail, so my answer is general.
One thing you could do is have two tables: one for the detailed data and one for the aggregation. Then you calculate one record for the latter from the former every day. Use partitioning for the detail table to be able to get rid of old data easily.
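As a rough sketch of that aggregation step, here is the idea in plain Python; the transaction layout is an assumption, and in practice this would be a scheduled INSERT ... SELECT into the aggregated table.

    from collections import defaultdict
    from datetime import datetime

    # Assumed transaction layout: (timestamp, symbol, signed quantity).
    transactions = [
        (datetime(2021, 3, 1, 10, 0),  "AAPL", 100),
        (datetime(2021, 3, 1, 15, 30), "AAPL", -40),
        (datetime(2021, 3, 2, 11, 0),  "MSFT",  25),
    ]

    def holdings_at(t):
        """Historical holdings at timestamp t = sum of all transactions with timestamp <= t."""
        positions = defaultdict(int)
        for ts, symbol, qty in transactions:
            if ts <= t:
                positions[symbol] += qty
        return dict(positions)

    # End-of-day snapshots that would be written into the aggregated (snapshot) table.
    print(holdings_at(datetime(2021, 3, 1, 23, 59, 59)))   # {'AAPL': 60}
    print(holdings_at(datetime(2021, 3, 2, 23, 59, 59)))   # {'AAPL': 60, 'MSFT': 25}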
You can also use the same (partitioned) table for both, if the data structure allows it. Then you calculate a new aggregated record every day, drop the partition for that day and extend the partition boundary for the “aggregated” partition.
Consider carefully if you need the TimescaleDB extension. If it offers benefits you know you need, go for it. If not, do without it. It is always nice to have few dependencies. Just because you are storing time series data doesn't mean you need TimescaleDB.
We have a feature request where a given user's choice of a certain entity has to be recorded; this would be an integer that is incremented every time the user chooses that specific entity. This information is later used to order the entities whenever the user retrieves all of their associated entities.
Assuming that we are limited to SQL Server as our persistent storage, and that we could have millions of users where each user could have anywhere between 1 and 10 of those entities, performance could quickly become an issue.
One option is to have an append-only log table where the user ID and entity ID are saved every time a user chooses an entity; on the query side, we get all the records and group by entity ID. This could quickly cause the table to grow very large.
The other would be a table with three columns (user ID, entity ID, and count), where we increment the count every time the user chooses the entity.
The above two options are what I have been able to come up with.
I would like to know if there are any other options and also what would be the performance implications of the above solutions.
Thanks
If this were 2001, performance could be an issue. If you are using SQL Server 2012 or 2016, then I'd say either option is good. Ints or bigints index well, and the performance hit would be insignificant. You could also store the data in an XML or JSON varchar field, but I would go with your first option. Use bigints, and make sure you use indexes no matter what you do.
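To make the recommended first option concrete, here is a small sketch using SQLite from Python purely for illustration; the table and column names are assumptions, and on SQL Server the IDs would be INT or BIGINT with an index on (user_id, entity_id).

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Append-only log: one row per choice.
        CREATE TABLE choice_log (user_id INTEGER, entity_id INTEGER);
        CREATE INDEX ix_choice_log ON choice_log (user_id, entity_id);
    """)

    def record_choice(user_id, entity_id):
        conn.execute("INSERT INTO choice_log (user_id, entity_id) VALUES (?, ?)",
                     (user_id, entity_id))

    for entity in (7, 3, 7, 7, 3, 9):
        record_choice(1, entity)

    # Query side: group by entity and order by how often it was chosen.
    rows = conn.execute("""
        SELECT entity_id, COUNT(*) AS times_chosen
        FROM choice_log
        WHERE user_id = ?
        GROUP BY entity_id
        ORDER BY times_chosen DESC
    """, (1,)).fetchall()
    print(rows)   # [(7, 3), (3, 2), (9, 1)]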
I have recently been tasked with making an account system for all of our products, much like that of a Windows account across Microsoft's products. However, one of the requirements is that we are able to easily check for accounts with the same information across them so that we can detect traded accounts, sudden and possibly fraudulent changes of information, etc.
When thinking of a way to solve this problem, I thought we could reduce the redundancy of our data while we're at it. It might help us save some storage space and processing time, since in the end we're just going to be processing the data set into what I explain below.
A bit of background on how this is set up right now:
An account table just contains an id and a username
A profile table contains a reference to an account and references to separate pieces of profile data: names, mailing addresses, email addresses
A name table contains an id and the first, last, and middle name of an individual
An address table contains data about an address
An email address table contains an id and the mailbox and domain of an email address
A profile record is what relates the unique pieces of profile data (shared across many accounts) to a specific account. If there are fifty people named "John Smith", then there is only one "John Smith" record in the names table. If a user changes any piece of their information, the profile record is soft deleted and a new one is created. This is to facilitate change tracking.
After profiling, I have noticed that creating constraints like UNIQUE(FirstName, MiddleName, LastName) is pretty painful in terms of record insertion. Is that simply the price we're going to have to pay or is there a better approach?
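For reference, here is a simplified sketch of the schema described above, expressed with SQLite from Python only so that it is runnable; the table and column names are assumptions.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, username TEXT UNIQUE);

        -- Unique pieces of profile data, shared across many accounts.
        CREATE TABLE name (
            id INTEGER PRIMARY KEY,
            first_name TEXT, middle_name TEXT, last_name TEXT,
            UNIQUE (first_name, middle_name, last_name)  -- the constraint that slows inserts
        );

        -- Profile relates shared data to one account; soft-deleted when it changes.
        CREATE TABLE profile (
            id INTEGER PRIMARY KEY,
            account_id INTEGER REFERENCES account(id),
            name_id INTEGER REFERENCES name(id),
            deleted_at TEXT   -- NULL while the profile record is current
        );
    """)

    # Fifty accounts named "John Smith" would all reference this single name row.
    conn.execute("INSERT INTO name (first_name, middle_name, last_name) VALUES (?, ?, ?)",
                 ("John", None, "Smith"))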
Having records for two people named John Smith is not redundancy, but a necessity.
The approach you've suggested is far from optimal. "The profile record is soft deleted and a new one is created. This is to facilitate change tracking." Deleting and re-inserting will cause problems with dependent records in other tables.
There are much easier methods to track changes; search for third-party tools.
As for the tables you've created, it's not necessary to split the data into so many tables. Why don't you merge the Name and Account tables? Are both the Address and Email address tables necessary?
I have concluded my research and decided that this approach is fine if insert performance is not critical. In cases where it is critical, increasing data redundancy within reason is an acceptable trade-off.
The solution described in my question is adequate for my performance needs; in our model, storage is considered more expensive than insertion time.
We are looking to create software that receives log files from a large number of devices; we expect around 20 million log rows a day (about 2 KB per log line).
I have developed a lot of software, but never with such a large quantity of input data. The data needs to be searchable, sortable, and groupable by source IP, destination IP, alert level, etc.
It should also combine similar log entries ("occurred 6 times", etc.).
Any ideas and suggestions on what type of design, database and general thinking around this would be much appreciated.
UPDATE:
Found this presentation; it seems like a similar scenario. Any thoughts on it?
http://skillsmatter.com/podcast/cloud-grid/mongodb-humongous-data-at-server-density
I see a couple of things you may want to consider.
1) Message queue: drop each log line on a queue and let another part of the system (a worker) take care of it when time permits (a small sketch follows this list).
2) NoSQL: Redis, MongoDB, Cassandra.
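Here is a small in-process sketch of the queue/worker idea using Python's standard library; a real deployment would use a proper message broker, and the log lines are made up.

    import queue
    import threading

    log_queue = queue.Queue()

    def worker():
        """Picks log lines off the queue and stores/indexes them when time permits."""
        while True:
            line = log_queue.get()
            if line is None:          # sentinel: shut the worker down
                break
            # ... parse the line and write it to the datastore here ...
            print("stored:", line)
            log_queue.task_done()

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    # The receiving side only has to drop lines on the queue and return immediately.
    for line in ("dev-42 E100 alert=3", "dev-77 E100 alert=1"):
        log_queue.put(line)

    log_queue.join()
    log_queue.put(None)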
I think your real problem will be querying the data, not storing it.
You will also probably need a scalable solution; some NoSQL databases are distributed, which you may need.
Check this out, it might be helpful
https://github.com/facebook/scribe
A web search on "Stackoverflow logging device data" yielded dozens of hits.
Here is one of them. The question asked may not be exactly the same as yours, but you should get dozens of interesting ideas from the responses.
I'd base many decisions on how users will most often select subsets of data -- by device? by date? by source IP? You want to keep indexes to a minimum and use only those you need to get the job done.
For low-cardinality columns where indexing overhead is high yet the value of using an index is low, e.g. alert-level, I'd recommend a trigger to create rows in another table to identify rows corresponding to emergency situations (e.g. where alert-level > x) so that alert-level itself would not have to be indexed, and yet you could rapidly find all high-alert-level rows.
Since users are updating the logs, you could move handled/managed rows older than 'x' days out of the active log and into an archive log, which would improve performance for ad-hoc queries.
For identifying recurrent problems (same problem on same device, or same problem on same ip address, same problem on all devices made by the same manufacturer, or from the same manufacturing run, for example) you could identify the subset of columns that define the particular kind of problem and then create (in a trigger) a hash of the values in those columns. Thus, all problems of the same kind would have the same hash value. You could have multiple columns like this -- it would depend on your definition of "similar problem" and how many different problem-kinds you wanted to track, and on the subset of columns you'd need to enlist to define each kind of problem. If you index the hash-value column, your users would be able to very quickly answer the question, "Are we seeing this kind of problem frequently?" They'd look at the current row, grab its hash-value, and then search the database for other rows with that hash value.
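As a sketch of that hashing idea (plain Python; the choice of key columns here is just one example of what might define "the same kind of problem"):

    import hashlib

    def problem_hash(row, key_columns=("device_id", "error_code")):
        """Hash the subset of columns that defines one kind of problem.
        Rows with the same hash are 'the same problem' and can be counted together."""
        key = "|".join(str(row[c]) for c in key_columns)
        return hashlib.sha1(key.encode("utf-8")).hexdigest()

    log_rows = [
        {"device_id": "dev-42", "source_ip": "10.0.0.5", "error_code": "E100", "alert_level": 3},
        {"device_id": "dev-42", "source_ip": "10.0.0.9", "error_code": "E100", "alert_level": 5},
        {"device_id": "dev-77", "source_ip": "10.0.0.5", "error_code": "E100", "alert_level": 1},
    ]

    # The first two rows hash identically (same device, same error), so a query on an
    # indexed hash column finds every recurrence of that problem very quickly.
    for row in log_rows:
        print(problem_hash(row), row["device_id"], row["error_code"])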