SQL Server table design with non-fixed columns - sql-server

I need your help designing a table.
I have some group tables, and I need to load data into them from XML files that contain column names and data.
Each column name is actually the index of a main column, such as activity_col1, activity_col2 and so on, and the set of columns is not fixed: the same kind of file may contain 1000 column values one time and only 10 another. There is also a defined maximum, so no file will contain more than
2000 columns per group.
So I need to design a table that handles this as well as possible, and I also need to aggregate the column values. The files contain minute-level data; I need to store it in a minute table and then aggregate that data into hour, day, week and month tables.
If I create the maximum number of columns in every table, data will not arrive for all of them every time, so most of the values will be NULL and the design does not seem good.
If I instead insert the column names as rows in a column_name column, with each value in a values column, then aggregation becomes a tedious task for me and will impact performance.
Please suggest.

One option would be EAV, but it is more complicated to build, query and insert into, and readability is very low.
Since you require a schema-less design allowing a practically unlimited number of columns, your best bet is probably a NoSQL solution, even though the weaknesses of EAV relative to relational databases also apply to NoSQL alternatives.
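For illustration, here is a minimal EAV-style sketch of the minute-level table and an hourly rollup. It assumes SQL Server, and every table and column name here is made up for the example:

CREATE TABLE ActivityMinute (
    GroupId      int           NOT NULL,
    MeasureTime  smalldatetime NOT NULL,  -- minute-level timestamp
    ColumnIndex  smallint      NOT NULL,  -- the N in activity_colN, 1..2000
    ColumnValue  decimal(18,4) NULL,
    CONSTRAINT PK_ActivityMinute PRIMARY KEY (GroupId, MeasureTime, ColumnIndex)
);

-- The hourly aggregation then becomes a single GROUP BY rather than per-column code:
SELECT GroupId,
       ColumnIndex,
       DATEADD(hour, DATEDIFF(hour, 0, MeasureTime), 0) AS MeasureHour,
       SUM(ColumnValue) AS HourValue
FROM ActivityMinute
GROUP BY GroupId, ColumnIndex, DATEADD(hour, DATEDIFF(hour, 0, MeasureTime), 0);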
Also take a look here: Benefits of NoSQL
Recommendations (in order of priority):
Choose EAV if you are using a relational database: this is where you turn either the whole table, or a portion of it (in another table), on its side. It is a good choice if you already have a relational database in-house that you cannot easily move away from.
Choose NoSQL if the kind of DBMS does not matter to you. It is very flexible and fast, although not all of the report writers out there support this style of storage. There are many NoSQL database implementations; the one that seems most popular right now is MongoDB.
And the last option, which I do not recommend:
Choose standard tables with XML columns if you do not need to query the extra data and just want it stored and retrieved as plain text for occasional use.
I hope this is helpful for you :)

Related

Azure Database Large Table Group By Performance

I'm looking for design and/or index recommendations for the problem listed below.
I have a couple of denormalized tables in an Azure S1 Standard (20 DTU) database. One of those tables has ~20 columns and a million rows. My application requirements mean I need to support sub-second (or at least close to it) querying of this table by any combination of columns in my WHERE clause, as well as sub-second (or at least close to it) querying of the DISTINCT values in each column.
In order to picture the use case behind this, here is an example. Imagine you were using an HR application that allowed you to search for employees and view employee information. The employee table might have 5 columns and millions of rows. The application allows you to filter by any column, and provides an interface to allow this. Therefore, the underlying SQL queries that must be made are:
A GROUP BY (or DISTINCT) query for each column, which provides the interface with the available filter options
A general employee search query, that filters all rows by any combination of filters
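Roughly, the two query shapes involved would look like the following (the employee table and column names are only illustrative):

-- 1) One query per filterable column, to populate the filter options:
SELECT Department
FROM dbo.Employee
GROUP BY Department;

-- 2) The search itself, with an arbitrary combination of filters
--    (@Department, @Location supplied by the application):
SELECT EmployeeId, FirstName, LastName, Department, Location
FROM dbo.Employee
WHERE Department = @Department
  AND Location = @Location;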
In order to solve performance issues on the first set of queries, I've implemented the following:
Index columns with a large variety of values
Full-Text index columns that require string matching (So CONTAINS querying instead of LIKE)
Do not index columns with a small variety of values
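As a hedged sketch of the full-text piece (the catalog, index, key and column names below are assumptions, not from the original post):

CREATE FULLTEXT CATALOG EmployeeCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.Employee (Notes) KEY INDEX PK_Employee;

-- CONTAINS can use the full-text index, unlike a leading-wildcard LIKE:
SELECT EmployeeId, LastName
FROM dbo.Employee
WHERE CONTAINS(Notes, '"manag*"');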
In order to solve the performance issues on the second query, I've implemented the following:
Forcing the front end to use pagination, implemented using SELECT * FROM table OFFSET 0 ROWS FETCH NEXT n ROWS ONLY, and ensuring the order by column is indexed
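For reference, the paging query might look like the following; note that OFFSET/FETCH requires an ORDER BY, ideally on an indexed column (names are illustrative, and @PageNumber/@PageSize come from the application):

SELECT EmployeeId, FirstName, LastName, Department
FROM dbo.Employee
WHERE Department = @Department
ORDER BY LastName, EmployeeId        -- indexed sort column, tie-broken by the key
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;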
Locally, this seemed to work fine. Unfortunately, an Azure Standard database doesn't have the same performance as my local machine, and I'm seeing issues. Specifically, the columns I am not indexing (the ones with a very small set of distinct values) are taking 30+ seconds to query. Additionally, while the paging is initially very quick, the query takes longer and longer the higher I increase the offset.
So I have two targeted questions, but any other advice or design suggestions would be most welcome:
How bad is it to index every column in the table? Know that the table does need to be updated, but the columns that I update won't actually be part of any filters or WHERE clauses. Will the indexes still need to be rebuilt on update? You can also safely assume that the table will not see any inserts/deletes, except for once a month when the entire table is truncated and rebuilt from scratch.
In regards to the paging getting slower and slower the deeper I get, I've read this is expected, but the performance becomes unacceptable at a certain point. Outside of making my clustered column the sort by column, are there any other suggestions to get this working?
Thanks,
-Tim

SQLite performance advice for .net

I am using SQLite in my application. The scenario is that I have stock market data and each company is a database with 1 table. That table stores records which can range from couple thousand to half a million.
Currently, when I update the data in real time, I open a connection, check whether that particular data exists or not, and if not, insert it and close the connection. This is done in a loop, and each database (representing a company) is updated. The number of records inserted is low and is not the problem. But is the process okay?
An alternate way is to have 1 database with many tables (each company can be a table) and each table can have a lot of records. Is this better or not?
You can expect around 500 companies. I am coding in VS 2010. The language is VB.NET.
The optimal organization for your data is to make it properly normalized, i.e., put all data into a single table with a company column.
This is better for performance because the table- and database-related overhead is reduced.
Queries can be sped up with indexes, but what indexes you need depends on the actual queries.
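A minimal SQLite sketch of that layout, with made-up table and column names, where INSERT OR IGNORE replaces the check-then-insert round trip:

CREATE TABLE StockQuote (
    Company   TEXT NOT NULL,
    QuoteTime TEXT NOT NULL,   -- ISO-8601 date/time string
    Price     REAL NOT NULL,
    PRIMARY KEY (Company, QuoteTime)
);

-- The primary key enforces uniqueness, so no separate existence check is needed:
INSERT OR IGNORE INTO StockQuote (Company, QuoteTime, Price)
VALUES ('ACME', '2012-01-05T10:30:00', 12.34);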
I did something similar, with similarly sized data, in another field. It depends a lot on your indexes. Ultimately, separating each large table was best (one table per file, each file representing a cohesive unit, in your case one company). You also gain the advantage of each company table having the same name, versus having x tables with different names that share the same schema (and no sanitizing of company names is required to create new tables).
Internally, other DBMSs often keep at least one file per table in their internal structure; SQL is thus just a layer of abstraction above that. SQLite (despite its creators' boasting) is meant for small projects, and querying larger data models gets more finicky to make it work well.

How to manage millions/billions of small values in a "database"

I have an application that will generate millions of date/type/value entries. We don't need to do complex queries, only, for example, get the average value per day of type X between dates A and B.
I'm sure a normal DB like MySQL isn't the best at handling this sort of thing; is there a better system suited to this sort of data?
EDIT: The goal is not to say that a relational database cannot handle my problem, but to find out whether another type of database (key/value store, NoSQL, document-oriented, ...) would be better adapted to what I want to do.
If you are dealing with a simple table such as:
CREATE TABLE myTable (
[DATE] datetime,
[TYPE] varchar(255),
[VALUE] varchar(255)
)
Creating an index, probably on TYPE, DATE, VALUE in that order, will give you good performance on the query you've described. Use an explain plan (or whatever the equivalent is on the database you're working with) to review the performance metrics. Also, set up a scheduled task to defragment that index regularly; the frequency will depend on how often inserts, deletes and updates occur.
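As a sketch against the sample table above (the index name is invented, and the syntax assumes SQL Server 2008 or later, so the date truncation and casts may differ on other databases):

CREATE INDEX IX_myTable_Type_Date_Value ON myTable ([TYPE], [DATE], [VALUE]);

-- "Average value per day of type X between date A and B"
-- (@A and @B supplied by the application; VALUE is varchar in the sample table, hence the cast):
SELECT CAST([DATE] AS date) AS [Day],
       AVG(CAST([VALUE] AS float)) AS AvgValue
FROM myTable
WHERE [TYPE] = 'X'
  AND [DATE] >= @A AND [DATE] < @B
GROUP BY CAST([DATE] AS date);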
As far as an alternative persistence store (i.e. NoSQL) goes, you don't gain anything. NoSQL shines when you want schema-less storage; in other words, you don't know the entity definitions ahead of time. But from what you've described, you have a very clear picture of what you want to store, which lends itself well to a relational database.
Now, possibilities for scaling over time include partitioning, and splitting each TYPE into a separate table. The partitioning piece could be done by type and/or date. It really depends on the nature of the queries you're dealing with (if you typically query for values within the same year, for instance) and on what your database offers in that regard.
MS SQL Server and Oracle offer the concept of partitioned tables and indexes.
In short: you group your rows by some value, e.g. by year and month. Each group is then accessible almost as a separate table with its own index, so you can list, summarize and edit the February 2011 sales without touching all of the rows. Partitioned tables complicate the database, but in the case of extremely long tables they can lead to significantly better performance.
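A minimal SQL Server sketch of monthly partitioning, using made-up names and boundary dates (the real function, scheme and filegroups would depend on your data):

CREATE PARTITION FUNCTION pfByMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');

CREATE PARTITION SCHEME psByMonth
AS PARTITION pfByMonth ALL TO ([PRIMARY]);

-- Same shape as the sample table above, but physically split by month:
CREATE TABLE myPartitionedTable (
    [DATE]  datetime,
    [TYPE]  varchar(255),
    [VALUE] varchar(255)
) ON psByMonth([DATE]);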
Based on cost you can choose either MySQL or SQL Server, but in either case you have to be clear about what you want to achieve with the database; if it is just for storage, then any RDBMS can handle it.
You could store the data as fixed-length records in a file.
Do a binary search on the file (opened for random access) to find your start and end records, then sum the appropriate field, for the given condition, over all records between your start index and end index into the file.

Database which increases every month; which design strategy should I use?

I have a database that increases every month. The schema remains the same, so I think I should use one of these two methods:
Use only one table: new data is appended to it and identified by a date column. The growth is about 20,000 rows per month, but in the long term I think searching and analyzing this data could become a problem.
Dynamically create one table per month; the table name indicates which data it contains (for example, Usage-20101125). This forces us to use dynamic SQL, but in the long term it seems fine.
I must confess that I have no experience designing this kind of database. Which one should I use in the real world?
Thank you so much
20 000 rows per month is not a lot. Go with your first option. You didn't mention which database you'll be using, but SQL Server, Oracle, Sybase and PostgreSQL, to name just a few, can handle millions of rows comfortably.
You will need to investigate a proper maintenance plan, including indexing and statistics, but that will come with lots of reading and experience.
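A minimal sketch of the first option (SQL Server flavour assumed, all names invented): one table, a date column, and an index on that column, so month-by-month analysis stays a plain query instead of dynamic SQL over monthly tables.

CREATE TABLE UsageHistory (
    UsageId   int IDENTITY(1,1) PRIMARY KEY,
    UsageDate date NOT NULL,
    Quantity  int  NOT NULL
);

CREATE INDEX IX_UsageHistory_UsageDate ON UsageHistory (UsageDate);

-- Analysis for a given month is an ordinary range query:
SELECT UsageDate, SUM(Quantity) AS TotalQuantity
FROM UsageHistory
WHERE UsageDate >= '2010-11-01' AND UsageDate < '2010-12-01'
GROUP BY UsageDate;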
Look into partitioning your table.
That way you can physically store the data on different disks for performance while logically it would be one table so your database stays well designed.

How to design this database?

I have to design a database to store log data, but I have no prior experience with this. My table contains about 19 columns (about 500 bytes per row) and grows by up to 30,000 new rows daily. My app must be able to query this table effectively.
I'm using SQL Server 2005.
How can I design this database?
EDIT: the data I want to store includes several types: datetime, string, short and int. NULL cells are about 25% of the total :)
However else you'll do lookups, a logging table will almost certainly have a timestamp column. You'll want to cluster on that timestamp first to keep inserts efficient. That may also mean always constraining your queries to specific date ranges, so that the selectivity on your clustered index is good.
You'll also want indexes for the fields you'll query on most often, but don't jump the gun here. You can add the indexes later. Profile first so you know which indexes you'll really need. On a table with a lot of inserts, unwanted indexes can hurt your performance.
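As a hedged sketch of the clustering idea (a cut-down table; the real one has ~19 columns, and all names here are invented, with @Start/@End supplied by the application):

CREATE TABLE EventLog (
    LogTime  datetime          NOT NULL,
    LogId    int IDENTITY(1,1) NOT NULL,
    Severity smallint          NOT NULL,
    Source   varchar(100)      NULL,
    Message  varchar(400)      NULL
);

-- Clustering on the timestamp keeps inserts appending to the end of the table:
CREATE CLUSTERED INDEX CIX_EventLog_LogTime ON EventLog (LogTime, LogId);

-- Constrain queries to a date range so the clustered index does the work:
SELECT LogTime, Severity, Source, Message
FROM EventLog
WHERE LogTime >= @Start AND LogTime < @End;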
Well, given the description you've provided all you can really do is ensure that your data is normalized and that your 19 columns don't lead you to a "sparse" table (meaning that a great number of those columns are null).
If you'd like to add some more data (your existing schema and some sample data, perhaps) then I can offer more specific advice.
Throw an index on every column you'll be querying against.
Huge amounts of test data and execution plans (in Query Analyzer) are your friends here.
In addition to the comment on sparse tables, you should index the table on the columns you wish to query.
Alternatively, you could test it using the profiler and see what the profiler suggests in terms of indexing based on actual usage.
Some optimisations you could make:
Cluster your data based on the most likely look-up criteria (e.g. clustered primary key on each row's creation date-time will make look-ups of this nature very fast).
Assuming that rows are written one at a time (not in batch) and that each row is inserted but never updated, you could code all select statements to use the "with (NOLOCK)" option. This will offer a massive performance improvement if you have many readers as you're completely bypassing the lock system. The risk of reading invalid data is greatly reduced given the structure of the table.
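A minimal sketch of those two points, with invented table and column names:

-- Clustered primary key on the creation date-time (plus a tiebreaker column):
CREATE TABLE AppLog (
    CreatedAt datetime          NOT NULL,
    LogId     int IDENTITY(1,1) NOT NULL,
    EventType varchar(50)       NOT NULL,
    Details   varchar(400)      NULL,
    CONSTRAINT PK_AppLog PRIMARY KEY CLUSTERED (CreatedAt, LogId)
);

-- Readers bypass locking entirely (dirty reads accepted on an insert-only log):
SELECT CreatedAt, EventType, Details
FROM AppLog WITH (NOLOCK)
WHERE CreatedAt >= '20110201' AND CreatedAt < '20110301';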
If you're able to post your table definition I may be able to offer more advice.
