Large data sets and SQL Server - sql-server

I have some devices that log data to a table every second. Each device writes 16 records per second, so as the number of devices grows, the table will reach billions of records. I'm using SQL Server, and at that size even a simple record-count query can take seconds to execute.
There are situations where we need historical data, mostly as hourly averages, so we were processing the raw data each hour and condensing it into hourly summaries, leaving only 16 records per device per hour. But now there is a requirement to fetch all records between arbitrary time ranges and process them, so we need to access the raw data again.
Currently I use SQL Server. Can you please suggest alternative approaches, or ways to deal with big data either in SQL Server or in some other database?

I don't think that's too much for SQL Server. For starters, please see the links below.
https://msdn.microsoft.com/en-us/library/ff647793.aspx?f=255&MSPPError=-2147217396
http://sqlmag.com/sql-server-2008/top-10-sql-server-performance-tuning-tips
http://www.tgdaily.com/enterprise/172441-12-tuning-tips-for-better-sql-server-performance
Make sure your queries are tuned properly and make sure the tables are indexed properly.
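For a device-log table like the one described, the single biggest win is usually a clustered index that matches the time-range access pattern. A minimal sketch (the table and column names here are assumptions, not the asker's actual schema):

```sql
-- Hypothetical schema; adjust names and types to the real table.
CREATE TABLE dbo.DeviceLog (
    DeviceId  int          NOT NULL,
    LoggedAt  datetime2(3) NOT NULL,
    ChannelId tinyint      NOT NULL,  -- the 16 readings per second per device
    Value     float        NOT NULL
);

-- Clustering on (DeviceId, LoggedAt) keeps each device's readings physically
-- contiguous, so "all records for device X between T1 and T2" is a narrow
-- range seek instead of a scan over billions of rows.
CREATE CLUSTERED INDEX CIX_DeviceLog_Device_Time
    ON dbo.DeviceLog (DeviceId, LoggedAt);

-- For the slow record counts: an approximate count from metadata is instant,
-- unlike SELECT COUNT(*) which scans the table.
SELECT SUM(p.row_count) AS approx_rows
FROM sys.dm_db_partition_stats AS p
WHERE p.object_id = OBJECT_ID('dbo.DeviceLog')
  AND p.index_id IN (0, 1);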

Related

SQL table Indexes and Spotfire performance

I have a Spotfire project that references several large SQL Server tables (one has 700,000 rows with 200 columns, another 80,000,000 rows with 10 columns, and a few others that are much smaller by comparison). Currently I use information links with prompts to narrow down the data before loading it into Spotfire. I still sometimes have issues with RAM usage creeping up and random CPU spikes after the data has loaded.
My questions are if I add indexes to the SQL tables:
Will the amount of RAM/CPU usage by spotfire get better (lower)?
Will it help speed up the initial data load time?
Should I even bother?
I'm using SQL Server 2016 and Tibco Spotfire Analyst 7.7.0 (build version 7.7.0.39)
Thanks
If you add indexes without a logical reason, they can actually make your system slower, because every INSERT, UPDATE, and DELETE also has to maintain each index. You can ignore this warning if your DB holds mostly static data that you rarely change.
You need to understand which parts of your queries consume the most resources, then create indexes accordingly.
Following URLs will help you:
https://www.liquidweb.com/kb/mysql-performance-identifying-long-queries/
https://www.eversql.com/choosing-the-best-indexes-for-mysql-query-optimization/
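Since this question is about SQL Server 2016, the equivalent first step there is to find the heaviest cached queries and index for those specifically. A sketch using the standard plan-cache DMVs (the `TOP (5)` cutoff is an arbitrary choice):

```sql
-- Top 5 cached statements by total logical reads (a rough proxy for I/O cost).
SELECT TOP (5)
    qs.total_logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```

Once you know which statements dominate, index the columns in their WHERE/JOIN clauses rather than indexing speculatively.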

Database structure for 2000 data points per second with real-time display?

I am working on a web-based monitoring project. There are nearly 50 sensors sampling at 50 Hz, and all the raw sensor data must be stored in the database. That means nearly 2,500 data points per second to handle, about 200 million per day, and the data must be kept for at least three years. My job is to build a web server that shows real-time and historical sensor data from the database. Some display delay is acceptable.
Which database should we choose for this application, SQL Server or Oracle? Can these databases sustain this many I/O transactions per second?
How should the database structure be designed for real-time and historical data? My idea is to use two databases: one stores real-time data, the other historical data. Incoming data is first stored in the real-time database, and at a set time (e.g. 23:59:59 every day) a SQL transaction transfers it to the historical database. Real-time displays then read the real-time database, and historical views read the historical one. Is this feasible? And how do I determine when to transfer the data? I think one day may be too long for this much data.
How should the time information be stored? A data point arrives every 20 milliseconds, and storing one timestamp per value makes the database expand enormously, since the datetime type in SQL Server 2008 takes 8 bytes. Storing one timestamp for every 50 values reduces the volume, but then displaying the data costs time to unpack each value from its group of 50. How do I balance database size against read efficiency?
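The transfer step proposed in the question could be sketched like this (table and column names are assumptions). Moving the rows in bounded batches from a scheduled job keeps locking and transaction-log growth under control, and each statement below is atomic on its own:

```sql
-- Move one batch of rows from the real-time table to the historical table.
-- Run from a job and repeat until @@ROWCOUNT = 0; the OUTPUT..INTO clause
-- makes the delete and the archive insert a single atomic operation.
DELETE TOP (50000)
FROM dbo.RealtimeData
OUTPUT deleted.SensorId, deleted.SampleTime, deleted.Value
INTO dbo.HistoricalData (SensorId, SampleTime, Value);
```

Batching like this also answers the "one day is too long" concern: the job can run every few minutes instead of once at midnight, with each run moving only what has accumulated.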

How to Improve SQL Server 2012 Timeout Issues - Large Database

We work for a small company that cannot afford to pay for a SQL DBA or consultancy.
What started as a small project has now become a full-scale system with a lot of data.
I need someone to help me sort out performance improvements. I realise no-one will be able to nail this issue completely, but I just want to make sure I have covered all the bases.
OK, the problem is basically that we are experiencing timeouts with our queries on cached data. I have increased the timeout in the C# code, but I can only go so far before it becomes ridiculous.
The current setup is a database that has data inserted every 5-10 seconds, constantly! During this process we populate tables from CSV files. Overnight we run data-caching processes that reduce the load on the "inserted" tables. Originally we were able to condense 10+ million rows into, say, 400,000 rows, but as users want more filtering we had to include more data rows, which has of course grown the cached tables from 400,000 to 1-3 million rows.
On my SQL development server (which does not have data inserted every 5 seconds) it used to take 30 seconds to run queries on a data-cache table with 5 million rows; with indexing and some improvements it's now 17 seconds. The live server runs SQL Server Standard and used to take 57 seconds, now 40 seconds.
We have 15+ instances running with same number of databases.
So far we have outlined the following ways of improving the system:
Indexes on some of the data-cache tables - but the database is now bloated and the overnight processes have slowed down.
Increased CommandTimeout
Moved databases to SSD
Likely further improvements:
Move the CSV files to another hard disk, not the same SSD the SQL Server databases reside on.
Possibly use filegroups for indexes or cached tables - not sure if SQL Server Standard supports this.
Enterprise edition with table partitioning - the customer may pay for this, but we certainly can't afford it.
As I said, I'm looking for rough guidelines and realise no-one may be able to fix this issue completely. We are a small team and no-one has deep SQL Server experience. The customer wants answers and we've tried everything we know. Incidentally, they had a small-scale version in Excel and said they found no issues, so why are we?!
Hope someone can help.
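One pattern worth trying for the overnight caching described above is to make it incremental: keep a high-water mark and fold in only the rows inserted since the last run, instead of rebuilding the cache tables from scratch. A rough sketch with assumed table and column names:

```sql
-- Assumed: RawInserts has an ever-increasing identity Id, and CachedSummary
-- holds one row per GroupKey with running totals.
DECLARE @lastId bigint = (SELECT LastProcessedId FROM dbo.CacheWatermark);
DECLARE @maxId  bigint = (SELECT MAX(Id) FROM dbo.RawInserts);

MERGE dbo.CachedSummary AS t
USING (
    SELECT GroupKey, COUNT_BIG(*) AS Cnt, SUM(Amount) AS Total
    FROM dbo.RawInserts
    WHERE Id > @lastId AND Id <= @maxId
    GROUP BY GroupKey
) AS s
ON t.GroupKey = s.GroupKey
WHEN MATCHED THEN
    UPDATE SET t.RowCount = t.RowCount + s.Cnt,
               t.Total    = t.Total + s.Total
WHEN NOT MATCHED THEN
    INSERT (GroupKey, RowCount, Total)
    VALUES (s.GroupKey, s.Cnt, s.Total);

UPDATE dbo.CacheWatermark SET LastProcessedId = @maxId;
```

The overnight job then touches only the new 5-10-second inserts from that day, which also shrinks the window in which it competes with the constant insert load.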

Warehouse PostgreSQL database architecture recommendation

Background:
I am developing an application that lets users generate many different reports. The data is stored in PostgreSQL and has a natural unique group key, so data under one group key is totally independent of data under other group keys. Reports are built using only one group key at a time, so all queries use a "WHERE groupKey = X" clause. The data in PostgreSQL is updated intensively by parallel processes that add data to different groups, but I don't need real-time reports; one update per 30 minutes is fine.
Problem:
There are already about 4 GB of data, and I found that some reports take significant time to generate (up to 15 seconds) because they need to query not one table but 3-4 of them.
What I want to do is to reduce the time it takes to create a report without significantly changing the technologies or schemes of the solution.
Possible solutions
What I was thinking about this is:
Splitting the database into several databases, one per group key. Then I would get rid of WHERE groupKey = X (though I have an index on that column in each table), and the number of rows to process each time would be significantly smaller.
Creating a read-only replica. Then I would have to sync the data, for example once per 15 minutes, using PostgreSQL's replication mechanism (can I actually do that, or do I have to write custom code?).
I don't want to change the database to NoSQL because I would have to rewrite all the SQL queries, and I don't want to. I might switch to another SQL database with column-store support if it is free and runs on Windows (sorry, I don't have a Linux server, but might get one if I have to).
Your ideas
What would you recommend as the first simple steps?
Two thoughts immediately come to mind for reporting:
1). Set up some summary (aka "aggregate") tables that hold precomputed results of the queries your users are likely to run, e.g. a table containing the counts and sums grouped by the various dimensions. This can be automated: a db function (or script) run via your job scheduler of choice refreshes the data every N minutes.
2). Regarding replication, if you are using Streaming Replication (PostgreSQL 9+), the changes in the master db are replicated to the slave databases (hot standby = read only) for reporting.
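In PostgreSQL, point 1 can be done with a materialized view (available since 9.3) refreshed on the 30-minute schedule the asker already accepts. A sketch with assumed table and column names:

```sql
-- Precomputed per-group, per-hour aggregates for the reports.
CREATE MATERIALIZED VIEW report_summary AS
SELECT group_key,
       date_trunc('hour', created_at) AS hour,
       count(*)    AS row_count,
       sum(amount) AS total
FROM   events
GROUP  BY group_key, date_trunc('hour', created_at);

-- Required for CONCURRENTLY below, and useful for the reports themselves.
CREATE UNIQUE INDEX ON report_summary (group_key, hour);

-- Run every 30 minutes from cron or pg_cron; CONCURRENTLY (9.4+) rebuilds
-- the view without blocking readers.
REFRESH MATERIALIZED VIEW CONCURRENTLY report_summary;
```

Reports then query `report_summary WHERE group_key = X` instead of joining the 3-4 base tables each time.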
Tune the report queries: use EXPLAIN, and avoid procedural code when you can do it in pure SQL.
Tune the server: memory, disk, processor. Take a look at the server config.
Upgrade your Postgres version.
Run VACUUM (and ANALYZE).
Of these four, only one will require significant changes in the application.
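For the EXPLAIN suggestion, here is what that looks like against a hypothetical report query (the table and columns are placeholders for the asker's schema):

```sql
-- ANALYZE runs the query and reports actual row counts and timings per plan
-- node; BUFFERS shows how much data each node read. The key thing to check
-- is whether the planner uses the index on group_key or falls back to a
-- sequential scan over the whole table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.group_key, sum(r.amount)
FROM   report_rows AS r
WHERE  r.group_key = 42
GROUP  BY r.group_key;
```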

Fast interaction with database

I'm developing a project which gets some data from a hardware every 100 milliseconds. I'm using Visual Studio 2010 and C#. The data size is about 50KB in each round. The customer wants to log all the data in the database for statistical purposes.
I'd prefer SQL Server 2005+ since I'm familiar with it, and the project should be done in about 15 days; it's a small project.
Is this a reasonable rate for data of this size to be inserted into the db? Do you suggest any general approaches to speed up the interaction (plain SQL commands, EF, or other technologies that could improve speed)?
If this is way too fast for SQL Server to handle, what would you suggest instead that:
1. has a quick learning curve;
2. can accept queries for statistical data;
3. satisfies my speed requirements.
I'm thinking about System.Data.SQLite if SQL Server is a no-go, but I don't know about its learning curve or speed.
500 KB per second is nothing. I work with SQL databases that do gigabytes per second; it all depends on the hardware and server configuration underneath, but if you were to run this on a standard office desktop, you will be fine. You only need to start thinking about new hardware at around 20 MB per second or more.
For the second part of your question: since you are using C#, I suggest you use SQL Server 2008 and a table-valued parameter (TVP). Buffer the data in the application, in a DataSet or DataTable, until you have say 10K rows, then call a proc that does the insert, passing it the DataTable as the parameter. This saves hundreds or thousands of ad-hoc inserts.
Hope this is clear; if not, ask and I will try to explain further.
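The TVP approach described above, sketched in T-SQL (the type, table, and column names are assumptions; on the C# side you pass the buffered DataTable as a SqlParameter with SqlDbType.Structured):

```sql
-- Table type whose shape matches the rows buffered in the application.
CREATE TYPE dbo.SensorReadingType AS TABLE (
    ReadAt  datetime2(3)   NOT NULL,
    Payload varbinary(MAX) NULL
);
GO

CREATE PROCEDURE dbo.InsertSensorReadings
    @rows dbo.SensorReadingType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based insert per ~10K-row buffer instead of thousands of
    -- single-row INSERT statements.
    INSERT INTO dbo.SensorReading (ReadAt, Payload)
    SELECT ReadAt, Payload
    FROM @rows;
END;
```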
50 KB every 100 milliseconds is 500 KB a second. These days networks run at gigabit speeds (many megabytes per second) and hard drives can cope with hundreds of MB per second. 500 KB is a tiny amount of data, so I'd be most surprised if SQL Server couldn't handle it.
If you have a slow network connection to the server or some other problem that means it struggles to keep up, then you can try various strategies to improve things. Ideas might be:
Buffer the data locally (and/or on the server) and write it into the database with a separate thread/process. If you're not continually logging 24 hours a day, then a slow server would catch up when you finish logging. Even if you are logging continuously, this would smooth out any bumps (e.g. if your server has periods of "busy time" where it is doing so much else that it struggles to keep up with the data from your logger)
Compress the data that is going to the server so there's less data to send/store. If the packets are similar you may find you can get huge compression ratios.
If you don't need everything in each packet, strip out anything "uninteresting" from the data before uploading it.
Possibly batching the data might help - by collecting several packets and sending them all at once you might be able to minimise transmission overheads.
Just store the data directly to disk and use the database just to index the data.
... So I'd advise writing a prototype and seeing how much data you can chuck at your database over your network before it struggles.
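As a sketch of the batching idea from the list above (the table here is assumed), sending one multi-row INSERT per flush instead of one statement per packet cuts round trips considerably:

```sql
-- Multi-row VALUES needs SQL Server 2008+. One round trip carries several
-- packets; wrapping a few such statements in one transaction also amortises
-- the commit cost.
BEGIN TRANSACTION;
INSERT INTO dbo.PacketLog (CapturedAt, Payload)
VALUES ('2024-01-01 00:00:00.000', 0x0102),
       ('2024-01-01 00:00:00.100', 0x0304),
       ('2024-01-01 00:00:00.200', 0x0506);
COMMIT TRANSACTION;
```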
