Is there any master/detail best practice to limit detail lookups when quickly enumerating master records?

We currently have a master/detail solution that issues a request to the detail service every time a master record is selected. When you move through the master records quickly, this produces a burst of requests for details that are no longer relevant; the only relevant request is the last one in the queue.
Is there a best practice for postponing these service calls in this fast-browsing scenario?

Related

How to query when data is spread across different microservices?

I'm new to microservices architecture and I'm facing this problem:
I have a platform where basically Users manage the accounting of their Clients.
I have one microservice in charge of the security. This one manages which Users have access to which Clients.
Then I have another microservice that manages the Invoices of the Clients.
One of the functions here would be: given a User is logged, list all the Invoices of all the Clients that the User has access to.
For that, I thought I should ask the Security microservice for the list of Clients the User has access to, and then query the Invoice database, filtering by all of those clients.
The problem is that I end up with a horrible query, something like:
SELECT * FROM Invoice WHERE clientId IN (CLI1, CLI2, CLI3, ...) -- Potentially 200 clients
I thought about keeping a copy of the User-Client relation in the Invoice database, or having both microservices share the same database. But neither option convinces me, as I have more microservices that may face the same problem, which would lead either to a lot of duplicated data or to a big monolithic database.
Is there a better way to do this?
Thanks in advance!
In general, direct database access is restricted across services, and keeping information you do not own tends to be a tiresome process, because you will always be fighting to keep that data in sync with its intended source of truth.
So the only real option is the one you have already mentioned in the question.
You end up with what looks like a horrible query.
But is it horrible only when you write it, or does it actually lag in performance?
It depends. If you are using MySQL, you can always turn the long IN list into a join over a subquery or a staged list of ids (see the sketch below).
But joins have their own cost.
Even then, it is worth checking whether the subquery approach actually gives you the performance you expect.
If it is feasible, you can explore other databases that are better optimized for queries like this.
In the worst case you can copy the data, but then you have to put in a lot of effort to keep the copies in sync.
None of these choices is straightforward, and there are trade-offs you will need to make a call on.
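As a rough sketch of the join-over-a-staged-list idea (MySQL syntax; everything except Invoice and clientId is a hypothetical name):

-- Stage the client ids returned by the security service in a temporary table,
-- then join against it instead of building a very long IN list.
CREATE TEMPORARY TABLE accessible_client (
    client_id VARCHAR(20) PRIMARY KEY
);

-- The application inserts the ids it received from the security service,
-- batching the inserts if the list is large.
INSERT INTO accessible_client (client_id) VALUES ('CLI1'), ('CLI2'), ('CLI3');

-- The join can use the index on Invoice.clientId, and the SQL text stays the
-- same size no matter how many clients the user can access.
SELECT i.*
FROM Invoice AS i
INNER JOIN accessible_client AS ac ON ac.client_id = i.clientId;

Whether this beats the plain IN list depends on the engine and the list size, so it is worth measuring both.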

Have multiple table copies in databases for easy join queries, or do data association in the program? [closed]

In a large system that uses multiple databases, e.g.:
db_trade used for trade information storage
db_fund used for user account storage
db_auth used for authentication and authorization
In this case user_info is common information. The trade system and fund system UIs need to display trade or account information together with user info; for better performance, the SQL query needs a LEFT JOIN against user_info.
I want to know how to design this in a larger system:
Perform the data association in the program?
Sync the user_info table into every database?
There are pros and cons to each approach:
The normalized approach stores each piece of data exactly once, and is better from a data integrity perspective. This is the traditional approach used in relational database design. For example, in a banking system you would probably not keep the current account balance in more than one place, right? Because then when you change it in one place, the other one becomes inconsistent, which may lead to wrong business decisions.
The denormalized approach allows you to store multiple copies of the same data in different places, and is better for performance. This is the approach generally recommended for big data and NoSQL database design. An example where this makes sense: suppose you are designing a chat system, and you need to display messages next to the name of the message author. You will probably prefer to store the display name next to the message, and not just the user ID, so that you don't need to do an expensive join every time you display messages.
If you denormalize, you need to take care of data integrity at the application level. First, you need to make sure that you're clear about what's the source of truth. It's ok to have multiple copies of the user_info ready to be fetched with low latency, but there should be one place where the most correct and up-to-date user info can be found. This is the master table for user info. The other copies of user info should derive from it. So you must decide which one of the databases in your design is the Master of user info.
In the end, you have to make a tradeoff between consistency and performance (which is closely related to availability).
If user_info doesn't change a lot, and you have lots of queries and lots of users, and performance is your main concern - go with the denormalized approach and sync the user_info table into every database. Your application will have to keep those tables as consistent as you need, either by database-level replication or by some application logic.
If you must have strongly consistent views of the user_info in every query (which is not a typical situation), you may need to sacrifice performance and keep all user info in one location.
Generally, big data systems sacrifice consistency in favor of performance and availability.
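As a small illustration of the denormalized chat example above (table and column names are hypothetical):

-- The author's display name is copied onto every message row, so rendering a
-- conversation is a single-table read with no join against the user table.
CREATE TABLE chat_message (
    message_id   BIGINT       PRIMARY KEY,
    room_id      BIGINT       NOT NULL,
    author_id    BIGINT       NOT NULL,  -- still stored; points at the source of truth
    author_name  VARCHAR(100) NOT NULL,  -- denormalized copy, refreshed from the master user table
    body         TEXT         NOT NULL,
    sent_at      TIMESTAMP    NOT NULL
);

-- Displaying a room's messages never touches the user table:
SELECT author_name, body, sent_at
FROM chat_message
WHERE room_id = 42
ORDER BY sent_at;

The price is that a display-name change has to be propagated to chat_message (or tolerated as stale), which is exactly the consistency trade-off described above.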

SQL Server switching live database

A client has one of my company's applications which points to a specific database and tables within the database on their server. We need to update the data several times a day. We don't want to update the tables that the users are looking at in live sessions. We want to refresh the data on the side and then flip which database/tables the users are accessing.
What is the accepted way of doing this? Do we have two databases and rename the databases? Do we put the data into separate tables, then rename the tables? Are there other approaches that we can take?
Based on the information you have provided, I believe your best bet would be partition switching. I've included a couple of links for you to check out, because it's much easier to direct you to a source that already explains it well. There are several approaches to partition switching you can take.
Links: Microsoft and Cathrine Wilhelmsen's blog
Hope this helps!
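A minimal T-SQL sketch of the switch pattern those links describe, using hypothetical table names and a non-partitioned live table (the same SWITCH statement also works per partition):

-- Load the refreshed data into dbo.Report_Staging, which must have the same
-- schema, indexes and filegroup as the live table, then switch it in.
BEGIN TRAN;

-- Move the current live rows into an empty "old" table (a metadata-only change)...
ALTER TABLE dbo.Report SWITCH TO dbo.Report_Old;

-- ...and switch the freshly loaded staging table in. No rows are physically
-- copied, so readers see the new data as soon as the transaction commits.
ALTER TABLE dbo.Report_Staging SWITCH TO dbo.Report;

COMMIT TRAN;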
I think I understand what you're saying: if the user is on a screen, you don't want it updating with new information while they're viewing it; it should only update when they pull up a new screen after the new data has been loaded? Correct me if I'm wrong. And Mike's question is also a good one: how is this data being fed to the users? Possibly there's a way to pause that while the new data is being loaded. There are more elegant ways to load data, such as partitioning the table, using a staging table, replication, having the users view snapshots, etc. But we need to know what you mean by 'live sessions'.
Edit: with the additional information you've given me, partition switching could be the answer. The process takes virtually no time; it just changes the pointers from the old records to the new ones. The only issue is that you have to partition on something partitionable, like a date or timestamp, to differentiate old and new data. It's also an Enterprise Edition feature, and I'm not sure which edition you're running.
Possibly a better thing to look at is Read Committed Snapshot Isolation. It will ensure that your users only see new data after it's committed; it provides a statement-level consistent view of the data and has minimal concurrency issues, though there is more overhead in TempDB. Here are some resources for more research:
http://www.databasejournal.com/features/mssql/snapshot-isolation-level-in-sql-server-what-why-and-how-part-1.html
https://msdn.microsoft.com/en-us/library/tcbchxcb(v=vs.110).aspx
Hope this helps and good luck!
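Turning RCSI on is a single database-level option; the database name below is illustrative:

-- Switch the database's default READ COMMITTED behaviour to row versioning.
-- The statement needs (near-)exclusive access to the database to take effect,
-- and the row versions are kept in TempDB, which is the overhead mentioned above.
ALTER DATABASE MyAppDb
SET READ_COMMITTED_SNAPSHOT ON;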
The question details are a little vague, so to clarify:
What is a live session? Is it a session in the application itself (with app code managing its own connections to the database), or is it a low-level connection-per-user/session situation? Are users just running reports, or are they actively reading from and writing to the database during the session? When is a session over, and how do you know?
Some options:
1) Pull all the data into the client for the entire session.
2) Use read committed snapshot or partitions as mentioned in the other answers (however, this requires careful setup for your queries and increases the requirements on the database).
3) Use a replica database for all queries and pause/resume replication when necessary (updating the data should be faster than your current process, but it could still take a while depending on volume and complexity).
4) Use a replica database and automate a backup/restore from the master (this might be the quickest option, depending on the overall size of your database).
5) Use multiple replica databases with replication or backup/restore, and then switch the connection string (this allows you to update the master constantly, then update a replica and switch over at a certain, predictable time).

DB Strategy for inserting into a high read table (Sql Server)

Looking for strategies for a very large table whose data is maintained for reporting and historical purposes, while only a very small subset of that data is used in daily operations.
Background:
We have Visitor and Visits tables which are continuously updated by our consumer facing site. These tables contain information on every visit and visitor, including bots and crawlers, direct traffic that does not result in a conversion, etc.
Our back-end site allows management of the visitors (leads) from the front-end site. Most of the management occurs on a small subset of our visitors (visitors that become leads). The vast majority of the data in our Visitor and Visit tables is maintained only for reporting-type functionality rather than day-to-day operations. This is NOT an indexing problem; we have done all we can with indexing and with keeping our indexes clean, small, and unfragmented.
ps: We do not currently have the budget or expertise for a data warehouse.
The problem:
We would like the system to be more responsive to our end users when they are querying, for instance, the list of their assigned leads. Currently the query is against a huge data set of mostly irrelevant data.
I am pondering a few ideas. One involves new tables and a fairly major re-architecture; I'm not asking for help on that. The other involves creating redundant data (for instance a Visitor_Archive and a Visitor_Small table), where the larger Visitor and Visit tables exist for inserts and history/reporting, and the smaller Visitor_Small table exists for managing leads: sending a lead an email, getting a lead's phone number, pulling my list of leads, etc.
The reason I am reaching out is that I would love opinions on the best way to keep the Visitor_Archive and the Visitor_Small tables in sync...
Replication? Can I use replication to replicate only data with a certain column value (FooID = x)?
Any other strategies?
It sounds like your table is a perfect candidate for partitioning. Since you didn't mention it, I'll briefly describe it, and give you some links, in case you're not aware of it.
Partitioning divides the rows of a table/index across multiple physical or logical devices, and is specifically meant to improve the performance of data sets where you only need a known subset of the data at any given time. A partitioned table still behaves as one table (you don't need to reference partitions or anything in your queries), but SQL Server is able to perform several optimizations on queries that only involve one partition of the data. In fact, in Designing Partitions to Manage Subsets of Data, the AdventureWorks examples pretty much match your exact scenario.
I would do a bit of research, starting here and working your way down: Partitioned Tables and Indexes.
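A rough T-SQL sketch of what that could look like here, partitioning on a hypothetical IsLead flag so the small working set of leads sits in its own partition (in practice a date or status column may be a better key; all names are illustrative):

CREATE PARTITION FUNCTION pfVisitorIsLead (BIT)
    AS RANGE LEFT FOR VALUES (0);   -- partition 1: IsLead = 0, partition 2: IsLead = 1

CREATE PARTITION SCHEME psVisitorIsLead
    AS PARTITION pfVisitorIsLead ALL TO ([PRIMARY]);

CREATE TABLE dbo.Visitor (
    VisitorId  BIGINT   NOT NULL,
    IsLead     BIT      NOT NULL,
    CreatedAt  DATETIME NOT NULL,
    -- ...remaining visitor columns...
    CONSTRAINT PK_Visitor PRIMARY KEY CLUSTERED (VisitorId, IsLead)
) ON psVisitorIsLead (IsLead);

-- Lead-management queries filter on IsLead = 1 and are eligible for partition
-- elimination, so they never have to scan the bulk of the historical data.
SELECT VisitorId, CreatedAt
FROM dbo.Visitor
WHERE IsLead = 1;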
Simple solution: create a separate, denormalized table with all the fields you need in it. Create a stored procedure that updates this table on your schedule, and a SQL Agent job to call the SP.
Index the table as you see how it's queried.
If you need to purge history, create another table to hold it, plus another SP to populate it and clean out the main report table.
You may end up with multiple report tables - that's OK; space is cheap these days.
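A rough sketch of that stored procedure (all object and column names are made up for illustration); the SQL Agent job then simply runs EXEC dbo.RefreshLeadReport on a schedule:

-- Rebuilds the small, denormalized report table from the large live tables.
CREATE PROCEDURE dbo.RefreshLeadReport
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRAN;

    -- TRUNCATE is transactional in SQL Server; doing the truncate and reload in
    -- one transaction means readers never see a half-loaded table (they briefly
    -- block instead).
    TRUNCATE TABLE dbo.LeadReport;

    INSERT INTO dbo.LeadReport (VisitorId, Email, Phone, LastVisitAt)
    SELECT v.VisitorId, v.Email, v.Phone, MAX(vi.VisitedAt)
    FROM dbo.Visitor AS v
    INNER JOIN dbo.Visit AS vi ON vi.VisitorId = v.VisitorId
    WHERE v.IsLead = 1
    GROUP BY v.VisitorId, v.Email, v.Phone;

    COMMIT TRAN;
END;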

Effective strategy for leaving an audit trail/change history for DB applications?

What are some strategies that people have had success with for maintaining a change history for data in a fairly complex database? One of the applications that I frequently use and develop for could really benefit from a more comprehensive way of tracking how records have changed over time. For instance, right now records can have a number of timestamp and modified-user fields, but we currently don't have a scheme for logging multiple changes, for instance if an operation is rolled back. In a perfect world, it would be possible to reconstruct the record as it was after each save, etc.
Some info on the DB:
Needs to have the capacity to grow by thousands of records per week
50-60 Tables
Main revisioned tables may have several million records each
Reasonable amount of foreign keys and indexes set
Using PostgreSQL 8.x
One strategy you could use is MVCC, multi-version concurrency control. In this scheme, you never update any of your tables; you only do inserts, maintaining version numbers for each record. This has the advantage of providing an exact snapshot from any point in time, and it also completely sidesteps the update lock problems that plague many databases.
But it makes for a huge database, and every select requires an extra clause to pick the current version of a record.
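A minimal PostgreSQL-flavoured sketch of the insert-only idea, using a hypothetical customer table purely for illustration:

-- Every change is a new row with the next version number; rows are never updated.
CREATE TABLE customer_version (
    customer_id BIGINT    NOT NULL,
    version_no  INTEGER   NOT NULL,
    name        TEXT      NOT NULL,
    email       TEXT      NOT NULL,
    valid_from  TIMESTAMP NOT NULL DEFAULT now(),
    PRIMARY KEY (customer_id, version_no)
);

-- "Updating" the email copies the latest version forward with the new value.
INSERT INTO customer_version (customer_id, version_no, name, email)
SELECT customer_id, version_no + 1, name, 'new.address@example.com'
FROM customer_version
WHERE customer_id = 42
ORDER BY version_no DESC
LIMIT 1;

-- The "extra clause" mentioned above: reads must pick the current version.
SELECT *
FROM customer_version
WHERE customer_id = 42
ORDER BY version_no DESC
LIMIT 1;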
If you are using Hibernate, take a look at JBoss Envers. From the project homepage:
The Envers project aims to enable easy versioning of persistent JPA classes. All you have to do is annotate your persistent class, or the properties of it that you want to version, with @Versioned. For each versioned entity, a table will be created which holds the history of changes made to the entity. You can then retrieve and query historical data without much effort.
This is somewhat similar to Eric's approach, but probably much less effort. I don't know what language/technology you use to access the database, though.
In the past I have used triggers to construct db update/insert/delete logging.
Each time one of those actions is performed on a specific table, you could insert a record into a logging table that keeps track of the action, which DB user did it, a timestamp, the table it was performed on, and the previous value.
There is probably a better answer, though, as I think this would require you to capture the value before the actual delete or update is performed. But you could use this to do rollbacks.
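A small PostgreSQL sketch of that trigger approach (the customer table and its columns are hypothetical). Note that inside a row-level UPDATE/DELETE trigger the OLD record is available directly, so no separate caching step is needed:

-- Audit table: one row per UPDATE or DELETE, capturing who, when, what, and the previous values.
CREATE TABLE customer_audit (
    audit_id    BIGSERIAL PRIMARY KEY,
    changed_at  TIMESTAMP NOT NULL DEFAULT now(),
    changed_by  TEXT      NOT NULL DEFAULT current_user,
    operation   TEXT      NOT NULL,   -- 'UPDATE' or 'DELETE'
    customer_id BIGINT    NOT NULL,
    old_name    TEXT,
    old_email   TEXT
);

CREATE OR REPLACE FUNCTION audit_customer() RETURNS trigger AS $$
BEGIN
    INSERT INTO customer_audit (operation, customer_id, old_name, old_email)
    VALUES (TG_OP, OLD.customer_id, OLD.name, OLD.email);
    RETURN NULL;   -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_customer_audit
AFTER UPDATE OR DELETE ON customer
FOR EACH ROW EXECUTE PROCEDURE audit_customer();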
The only problem with using triggers is that they add performance overhead to every insert/update/delete. For higher scalability and performance, you want to keep each database transaction as short as possible; auditing via triggers increases the time required for the transaction and, depending on the volume, may cause performance issues.
Another way is to explore whether the database provides a way of mining its "redo" logs, as is the case in Oracle. The redo logs are what the database uses to recreate the data if it fails and has to recover.
Similar to a trigger (or even in combination with one), you can have every transaction fire a logging event asynchronously and have another process (or just a thread) actually handle the logging. There are many ways to implement this depending on your application. I suggest having the application fire the event so that it does not cause unnecessary load on the original transaction (which can otherwise lead to locks from cascading audit logs).
In addition, you may be able to improve the performance of the primary database by keeping the audit database in a separate location.
I use SQL Server, not PostgreSQL, so I'm not sure if this will work for you, but Pop Rivett had a great article on creating an audit trail here:
Pop rivett's SQL Server FAQ No.5: Pop on the Audit Trail
Build an audit table, then create a trigger for each table you want to audit.
Hint: use Codesmith to build your triggers.
