Joining SQL Query data with Rest Service data on the fly - sql-server

I need to merge data from an MS SQL Server database and a REST service on the fly. I have been asked not to store the data permanently in the database, because it changes periodically (caching would be OK, I believe, as long as the cache time is adjustable).
At the moment, I query for data and then pull the joined data from a memory cache. If the data is not in the cache, I call the REST service and store the result in the cache.
This can be cumbersome and slow. Are there any patterns, applications or solutions that would help me solve this problem?
My thought is I should move the cached data to a database table which would speed up joins and have the application periodically refresh the data in the database table. Any thoughts?

You can try Denodo. It can connect to multiple data sources and has a built-in caching feature.
http://www.denodo.com/en
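The approach described in the question (look up joined data in a cache, call the REST service on a miss, keep the cache time adjustable) can be sketched roughly as follows. This is only a minimal in-process illustration: the names TTLCache, join_rows, fetch_remote and the customer_id key field are hypothetical, and the rows are assumed to be plain dictionaries already read from SQL Server.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire after an adjustable number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # entry is still fresh
        value = loader(key)  # miss or expired: call the remote service
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


def join_rows(rows, cache, fetch_remote, key_field="customer_id"):
    """Attach remote data to each SQL row; the service is called once per distinct key."""
    joined = []
    for row in rows:
        remote = cache.get_or_load(row[key_field], fetch_remote)
        joined.append({**row, **remote})
    return joined
```

Because repeated keys are served from the cache, the REST service is hit once per distinct key per TTL window rather than once per row. Moving the cached data into a database table, as the question suggests, would follow the same expiry logic with the join done in SQL instead.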

Related

Synapse and on-prem tempdb usage

I've built several pipelines with several copy data activities in Synapse. They get data from different sources (SQL, CSV, REST API) and sink it to an on-premises SQL Server.
The pipelines run smoothly and the data is processed correctly, so no issues there. What I'm facing right now is that the on-prem tempdb eats a lot of memory, and it's not clear to me when this space gets flushed or becomes available again. With most copy data activities I'm using the upsert function, so the checkbox 'Use TempDB' becomes available.
As soon as the table is synchronized I no longer need the temp table, of course. Does Synapse or SQL Server automatically truncate this database, or is this a setting on the tempdb? I've already shrunk the tempdb, but that only helps temporarily, of course...
Let me know what the best option is here. Thanks in advance! :)

Load balancer and multiple instance of database design

The current single application server can handle about 5,000 concurrent requests. However, the user base will be in the millions, so I may need two application servers to handle the requests.
So the design is to add a load balancer, in the hope that it will handle over 10,000 concurrent requests. However, each user's data is stored in one single database. If the design has two or more servers, should I do the following?
Have two database instances
Sync the two databases in real time
Is this correct?
If so, will the sync process lower the performance of the servers? Database replication seems costly.
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So, tl;dr: put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
The best option could be to synchronize a standby node with data from the active node. This is cost-effective, since it can be achieved with an open-source relational database (e.g. MariaDB).
Do not store computable results and statistics that can easily be calculated at run time; this may help reduce the data size.
If historical data is not needed urgently for queries, it can be written to a text file in a format that is easy to import into the database (e.g. .csv).
Data objects that are updated very often can be kept in an in-memory database as key-value pairs; use a scheduled task to perform batch updates/inserts to the relational database to achieve persistence.
Implement retry logic for the database batch update tasks to handle DB downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data in memory from the database, refreshing the changing part either periodically or via an API.
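The batch update with retry logic mentioned above could look something like the sketch below. It assumes a single-threaded scheduled task, a plain dictionary as the in-memory key-value store, and SQLite standing in for the relational database; flush_with_retry, make_sqlite_writer and the kv table are hypothetical names used only for illustration.

```python
import sqlite3
import time


def flush_with_retry(pending, write_batch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Write all pending key/value pairs as one batch, retrying on failure.

    Returns True and empties `pending` on success. Returns False and leaves
    `pending` untouched if every attempt fails, so the next scheduled run
    can try again.
    """
    if not pending:
        return True
    batch = list(pending.items())
    for attempt in range(attempts):
        try:
            write_batch(batch)
        except Exception:  # broad on purpose in this sketch: DB down, network error, ...
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
            continue
        for key, _ in batch:
            pending.pop(key, None)
        return True
    return False


def make_sqlite_writer(conn):
    """Build a write_batch function that upserts into a table kv(key PRIMARY KEY, value)."""

    def write_batch(batch):
        with conn:  # one transaction per batch: commit on success, roll back on error
            conn.executemany("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", batch)

    return write_batch
```

A scheduler would call flush_with_retry(pending, writer) at a fixed interval. If updates can arrive from other threads while a flush is running, the dictionary would need a lock or a swap-and-flush scheme, which this sketch leaves out.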

How should I use Redis as a cache for SQL Server?

I have got some tabular data that due to unrelated issues is proving too slow to get out of SQL Server in realtime. As we get more users this will only get worse so I am thinking of using Redis as a front-end cache to store users' tabular pageable data.
This data could become stale after about 10 minutes, after which time I would like to get the record set again and put it in Redis.
The app is a .NET MVC app. I was thinking that when the user logs into the app this data gets pulled out of the database (takes around 10 seconds) and put into Redis ready to be consumed by the MVC client. I would put an expiry on that data and then when it becomes stale it will get refetched from the SQL Server database.
Does this all sound reasonable? I'm a little bit scared that:
The user could get to the page before the data is in Redis
If Redis goes down or does not respond, I need to ensure that the ViewModel can be filled directly from SQL Server without Redis being there
I would go for the ServiceStack Redis implementation; here are all the details required. Redis is particularly good at caching compared to other NoSQL stores. But if you have a read- and write-heavy application, I would strongly suggest looking at a NoSQL database as the database, combined with SQL Server. That will help with scalability.
Please let me know if any further details are required. You just need to run a NuGet command and you are almost up and running.
You could use something like Memcached to store cached pages in memory.
You can set a validity of 10 minutes on a cached object. After that the cache will automatically remove the object.
Your actual repository would have to do these steps:
1. Check the cache for the data you want; if it is there, great, use it
2. If the cached data doesn't exist, go to SQL Server to retrieve it
3. Update the cache with the data returned from SQL Server
I've used the Enyim client before. It works great. https://github.com/enyim/EnyimMemcached
I might also use something like Quartz to schedule a background task to prime the cache. http://quartznet.sourceforge.net/
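The three repository steps above, together with the question's concern about the cache being unavailable, can be sketched as follows. This is not the API of Redis, ServiceStack or Enyim: the cache argument is assumed to be any object exposing get(key) and set(key, value, ttl_seconds), and get_pages and load_from_sql are hypothetical names.

```python
import logging

log = logging.getLogger("page_cache")


def get_pages(key, cache, load_from_sql, ttl_seconds=600):
    """Cache-aside read that still works when the cache server is unreachable."""
    # 1. Check the cache; an unreachable cache is treated like a miss.
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except Exception:
        log.warning("cache read failed, falling back to the database", exc_info=True)

    # 2. Not cached (or cache down): go to the database.
    value = load_from_sql(key)

    # 3. Update the cache; a failure here must not break the request.
    try:
        cache.set(key, value, ttl_seconds)
    except Exception:
        log.warning("cache write failed, serving uncached data", exc_info=True)
    return value
```

The default of 600 seconds matches the 10 minute staleness window from the question. A background job that primes the cache would call the same function ahead of the user, which also narrows the window in which a user reaches the page before the data is cached.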

Data Warehouse Best Practice: Intra-day DW Loads and Reporting

We have intra-day Data Warehouse loads throughout the day (using SSIS, SQL Server 2005).
The reporting is done through Business Objects (XI 3.1 WebI).
We are not currently facing any issues, but what are the best practices for intra-day Data Warehouse loads when reporting runs against the same database at the same time?
thanks,
Amrit
Not sure if I understood you correctly, but I guess that the two main problems you may be facing are:
data availability: your users may want to query data that you have temporarily removed because you're refreshing it (this depends on your data loading approach).
performance: reporting may be affected by the data loading processes.
If your data is partitioned, I think a partition-switch-based data load would be a nice approach.
You perform the data load on a staging partition that contains the data that you're reloading (while the data warehouse partition is still available with all the data for the users). Then, once you have finished loading the data into your staging partition, you can immediately switch the partitions between staging and the data warehouse. This will solve the data availability problem and could help reduce the performance one (if, for instance, your staging partition is on a different hard drive than the data warehouse).
More info on partitioned data loads and other data loading techniques here:
http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
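The load-into-staging-then-switch idea can be illustrated with a small sketch. Note that this is an analogue, not SQL Server partition switching: it uses SQLite and swaps whole tables by renaming them inside one transaction. The table names sales and sales_staging and the function reload_via_staging are hypothetical, and the connection is assumed to be opened with isolation_level=None so that transactions are controlled explicitly.

```python
import sqlite3


def reload_via_staging(conn, rows):
    """Load `rows` into a staging table, then swap it with the live table.

    Assumes `conn` was opened with isolation_level=None (autocommit), so the
    explicit BEGIN/COMMIT below is the only transaction involved.
    """
    # Slow part: build the staging copy while readers keep using `sales`.
    conn.execute("DROP TABLE IF EXISTS sales_staging")
    conn.execute("CREATE TABLE sales_staging (day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_staging (day, amount) VALUES (?, ?)", rows)

    # Fast part: swap the names atomically, so readers see either old or new data.
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE sales RENAME TO sales_old")
        conn.execute("ALTER TABLE sales_staging RENAME TO sales")
        conn.execute("DROP TABLE sales_old")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the live table as it was
        raise
```

The point carried over from the answer is that the expensive load happens away from the table users query, and only the short swap touches it.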

Pattern for very slow DB Server

I am building an ASP.NET MVC site where I have a fast dedicated server for the web app, but the database is stored on a very busy MS SQL Server used by many other applications.
Even though the web server is very fast, the application's response time is slow, mainly because of the slow responses from the DB server.
I cannot change the DB server, as all data entered in the web application needs to end up there (for backup reasons).
The database is used only by the web app, and I would like to find a caching mechanism where all the data is cached on the web server and the updates are sent to the DB asynchronously.
It is not important for me to have immediate consistency between the data read from the DB and the data inserted (think of reading questions on StackOverflow: newly inserted questions do not need to show up immediately after insertion).
I thought of building an in-between WCF service that would exchange and sync the data between the slow DB server and a local one (maybe SQLite or SQL Server Express).
What would be the best pattern for this problem?
What is your bottleneck? Reading data or Writing data?
If you are concerned about reading data, using a memory-based data caching mechanism like memcached would be a performance booster, as most of the biggest mainstream web sites do. Scaling facebook hi5 with memcached is a good read. Implementing application-side page caches would also reduce the queries made by the application, resulting in lower DB load and better response times. But this will not have much effect on the database server's load, as your database server has other heavy users.
If writing data is the bottleneck, implementing some kind of asynchronous middleware storage service seems like a necessity. If you have both fast and slow data storage available to the frontend server, going with a lightweight database such as MySQL or PostgreSQL (maybe not that lightweight ;) ) and using your real database as a replication slave for your site is a good choice for you.
I would do what you are already considering. Use another database for the application and only use the current one for backup-purposes.
I had this problem once, and we decided to go for a combination of data warehousing (i.e. pulling data from the database every once in a while and storing this in a separate read-only database) and message queuing via a Windows service (for the updates.)
This worked surprisingly well, because MSMQ ensured reliable message delivery (updates weren't lost) and the data warehousing made sure that data was available in a local database.
It still will depend on a few factors though. If you have tons of data to transfer to your web application it might take some time to rebuild the warehouse and you might need to consider data replication or transaction log shipping. Also, changes are not visible until the warehouse is rebuilt and the messages are processed.
On the other hand, this solution is scalable and can be relatively easy to implement. (You can use integration services to pull the data to the warehouse for example and use a BL layer for processing changes.)
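The split described above (reads served from local data, updates delivered to the slow database through a queue) can be sketched in-process as follows. This is only an illustration of the pattern: Python's queue.Queue is not durable the way MSMQ is, so queued updates would be lost if the process died, and WriteBehind and apply_update are hypothetical names.

```python
import queue
import threading


class WriteBehind:
    """Serve reads from a local copy and push writes to a slow store in the background."""

    def __init__(self, apply_update):
        self._apply = apply_update  # function(key, value) that writes to the slow database
        self._queue = queue.Queue()
        self.local = {}             # local copy used for reads
        self.failed = []            # updates that could not be delivered
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, key, value):
        self.local[key] = value        # visible to local reads immediately
        self._queue.put((key, value))  # delivered to the slow database later

    def read(self, key):
        return self.local.get(key)

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # sentinel placed by close()
                    return
                self._apply(*item)
            except Exception:
                self.failed.append(item)  # a durable queue would redeliver instead
            finally:
                self._queue.task_done()

    def close(self):
        """Flush all queued updates and stop the worker."""
        self._queue.put(None)
        self._queue.join()
        self._worker.join()
```

The caller returns as soon as write() has updated the local copy, so the slow server's latency no longer shows up in the response time. The trade-off is the one already noted in the answer: changes reach the backing database only once the queued messages are processed.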
There are many replication techniques that should give you proper results. By installing a SQL Server instance on the 'web' side of your configuration, you'll have the choice between:
Making snapshot replications from the web side (publisher) to the database-server side (subscriber). You'll need a paid edition of SQL Server on the web server. I have never worked with this kind of configuration, but it might use a lot of the web server's resources at scheduled synchronization times.
Making merge (or transactional, if required) replication between the database-server side (publisher) and the web side (subscriber). You can then use the free edition of SQL Server and schedule the synchronization process to run according to your tolerance for potential loss of data if the web server goes down.
I wonder if you could improve it by adding an MDF file on your web side instead of dealing with a server at another IP...
Just add a SQL Server 2008 Express Edition file and try it. As long as you don't exceed 4 GB of data you will be OK. Of course there are more restrictions, but just for the speed of it, why not try?
You should also consider the network switches involved. If the DB server is talking to a number of web servers then it may be constrained by the network connection speed. If they are only connected via a 100 Mbps network switch then you may want to look at upgrading that too.
The WCF service would be a very poor engineering solution to this problem. Why make your own when you can use the standard SQL Server connectivity mechanisms to ensure the data is transferred correctly? Log shipping will send the data across at selected intervals.
This way, you get the fast local sql server, and the data is preserved correctly in the slow backup server.
You should investigate the slow SQL Server though; the performance problem could have nothing to do with its load, and more to do with the queries and indexes you're asking it to work with.
