Can we get the benefits of partitioning a SQL Server 2010 table when we use Entity Framework as the data layer?
The table will receive about 10,000 records per day and will be partitioned by the created date (e.g., records older than 30 days vs. newer ones).
I'm not very skilled in SQL Server, so perhaps I'm wrong, but I believe table partitioning should be transparent to queries (if we are talking about a partition function defined on the table): common queries should still work, and should even perform better if partitioning is configured correctly. So in the case of database-first design, EF should not have any problem with this, because it still works against a single logical table. If you mean manual partitioning by creating a new table each month, then that is a big problem with EF and you will need stored procedures to access those tables.
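Purely as illustration, a minimal partition setup of the kind this answer assumes might look like the sketch below. The table, column, and boundary values are placeholders, not from the question, and in practice the boundary would be maintained as a sliding window.

CREATE PARTITION FUNCTION pfCreatedDate (datetime2)
    AS RANGE RIGHT FOR VALUES ('2024-01-01');          -- placeholder boundary (e.g. "30 days ago")

CREATE PARTITION SCHEME psCreatedDate
    AS PARTITION pfCreatedDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.AuditRecord
(
    AuditRecordId int IDENTITY(1,1) NOT NULL,
    CreatedDate   datetime2 NOT NULL,
    Payload       nvarchar(400) NULL,
    CONSTRAINT PK_AuditRecord
        PRIMARY KEY CLUSTERED (AuditRecordId, CreatedDate)
) ON psCreatedDate (CreatedDate);

-- EF keeps querying dbo.AuditRecord as one logical table; SQL Server performs
-- partition elimination when CreatedDate appears in the WHERE clause.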
I have a table with 339 million rows and twenty-one columns, of which seventeen are varchar(100), two are integers, one is a float, and one is a DateTime. It is in an Azure SQL database. The table has no indexes other than the primary key constraint. I aim to copy this table to a new database and delete the old one. My approach is to save the data to Delta Lake and use an Azure Data Factory pipeline to load it into the new database. I have used this approach several times before when migrating tables to new databases.
However, I am met with a strange problem. In the old database the table is about 80 GB in total, yet on the new database only 54% of the data (182 million rows) already fills 276 GB. There is no DBA on my team to help me with this. What could possibly be causing it? I hope I have included all the information that could help with this issue.
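As a starting point for diagnosing this, the space taken by in-row data, row-overflow data, and LOB data can be compared on both databases. A minimal sketch, assuming a placeholder table name dbo.BigTable:

EXEC sp_spaceused N'dbo.BigTable';   -- rows, reserved, data, index_size, unused

SELECT  index_id,
        SUM(row_count)                               AS [rows],
        SUM(reserved_page_count) * 8 / 1024          AS reserved_mb,
        SUM(in_row_data_page_count) * 8 / 1024       AS in_row_mb,
        SUM(row_overflow_used_page_count) * 8 / 1024 AS row_overflow_mb,
        SUM(lob_used_page_count) * 8 / 1024          AS lob_mb
FROM    sys.dm_db_partition_stats
WHERE   object_id = OBJECT_ID(N'dbo.BigTable')
GROUP BY index_id;

A much larger row-overflow or LOB figure on the new database would suggest, for example, that the varchar(100) columns were mapped to wider types during the copy, though that is only one possible explanation.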
I have a large database that will use partitioned columnstore tables in a redesign. Is it possible to specify the partition in the generated SQL with Entity Framework Core 2.2?
This is for an Azure SQL Hyperscale database with a table that currently contains about 3 billion rows. When stored procedures are used to execute the requests, performance is excellent, but if the partition range is not specified in the query, performance is less than optimal. I am hoping to move away from the inline SQL we currently use in the application layer and move to Entity Framework Core. Being able to specify the partition for the tenant is our only blocker at the moment.
This is an example WHERE clause from the stored procedure:
SELECT @Range = $PARTITION.TenantRange(@InputTenantId)
SELECT ..... FROM xxx WHERE $PARTITION.TenantRange(TenantId) = @Range
The query above gives excellent performance; I am hoping I can specify the partition in the same way using Entity Framework.
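For reference, the stored-procedure pattern described above might look roughly like this end to end; the procedure and table names below are placeholders (the question only shows "xxx"), and the column list is elided as in the question:

CREATE OR ALTER PROCEDURE dbo.GetRowsForTenant
    @InputTenantId int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Range int;

    -- Resolve the tenant's partition number once...
    SELECT @Range = $PARTITION.TenantRange(@InputTenantId);

    -- ...then filter on it so only that partition is touched.
    SELECT *          -- column list elided
    FROM   dbo.TenantData
    WHERE  $PARTITION.TenantRange(TenantId) = @Range;
END;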
I'm a (very) junior analyst responsible for setting up a SQL Server data warehouse (DWH) which hosts data from our CRM for reporting purposes.
The current CRM uses uniqueidentifiers (GUIDs) for all keys in its SQL Server database, and some of the tables have 8 million+ rows. In our reporting software (QlikView) I can swap the GUIDs for ints and take an 800 MB data file down to 90 MB, which is excellent; however, I'd like to perform this logic in the DWH if possible, to make it faster and a little cleaner.
My issue is that I have no idea how to do this while maintaining the FK links to other tables. I have considered maintaining a staging table of GUIDs and their associated numeric IDs, but this seems inefficient, and writing some arbitrary numeric ID into the PK column of the destination table feels like a terrible idea.
The DWH import works as follows: USPs on the source DB perform SELECTs which are executed by an SSIS package, and the output is placed in tables of the same name in the [Staging] schema of the DWH. From there, the transform is performed by USPs on the DWH, also executed by the same SSIS package, which handles execution order and multi-threading. Whatever implementation I come up with will need to be compatible with this architecture (done within USPs that potentially run asynchronously).
I'm very much a SQL noob, so please link documentation where necessary, or at least describe answers in a Google-friendly way.
1. Is the removal of the GUIDs the major cause of the possible shrink to 90 MB? And do you not need the GUIDs to process the report?
2. Do you strip the relationships and join almost all tables into as few tables as possible when creating the staging tables?
If the answer to both 1 and 2 is yes, then you do not need the GUIDs and simply need a unique int column instead.
I suggest that, in the SELECT used when creating/inserting the staging table, you use ROW_NUMBER() to replace the GUID column with a unique int column. This only works if you recreate the staging table each time the SSIS package runs.
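A minimal sketch of that, with placeholder object names (Staging.Contact, ContactGuid) and assuming SQL Server 2016+ for DROP TABLE IF EXISTS:

DROP TABLE IF EXISTS Staging.ContactInt;                            -- staging table is rebuilt on every load

SELECT  ROW_NUMBER() OVER (ORDER BY c.ContactGuid) AS ContactId,    -- surrogate int key replacing the GUID
        c.FirstName,
        c.LastName
INTO    Staging.ContactInt
FROM    Staging.Contact AS c;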
If instead you are inserting data into an already existing staging table when the SSIS package runs, you can just create an auto-increment (IDENTITY) primary key column. When you insert data into the staging table, leave that column out of the insert, so it automatically generates a unique int value for each row.
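A sketch of that second option, again with placeholder names; the IDENTITY column is simply left out of the INSERT so SQL Server assigns the next int itself:

CREATE TABLE Staging.ContactInt
(
    ContactId int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- auto-incrementing surrogate key
    FirstName nvarchar(100) NOT NULL,
    LastName  nvarchar(100) NOT NULL
);

INSERT INTO Staging.ContactInt (FirstName, LastName)   -- ContactId deliberately not listed
SELECT FirstName, LastName
FROM   Staging.Contact;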
I'm going to use a single table to aggregate historical data about our (very big) virtual infrastructure. The table will have 15 to 30 fields, and I estimate 500 to 1,000 records a day.
Why a single table? A couple of reasons:
Data is extracted to CSV using PowerShell scripts. A bulk load into a single table is then very easy and fast (see the sketch after this list).
I will use the table to connect Excel and report through pivot tables. For that, a single table is perfect (otherwise I would have to create views).
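Purely for illustration, the bulk load mentioned in the first bullet might look like this; the table name, file path, and CSV options are assumptions:

BULK INSERT dbo.VmHistory
FROM 'C:\exports\vm_history.csv'
WITH (
    FIRSTROW = 2,            -- skip the CSV header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK                  -- allows a minimally logged load into the single target table
);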
Now my question:
If I'm planning to build cubes on top of this table in the future, is the single-table choice a bad solution?
Do cubes rely on relational (multi-table) databases, or can they easily be built on top of a single-table database?
Thanks for any suggestions.
I can't tell you specifically about SQL Server Analysis Services, but for OLAP you typically use denormalized and aggregated data. That means fewer tables than in a normal relational scenario. And since your data volume is not really big (about 365k rows/year, which is small even for OLAP), I don't see any problem with using a single table for your data.
We have a SQL 2005/2008 database that has a table with a computed column. We're using the computed column as a discriminator in NHibernate so having it in the database is proving to be very useful.
In order to gain the benefits of faster integration tests, I'd like to be able to run our integration tests against an in-memory database such as SQLite or SQL CE. But I don't think either of those supports computed columns.
Are there any other solutions to my problem? I have complete access to the database and can modify it if there's a better solution available. I've seen this post that suggests using a view instead of a computed column; is this the best alternative?
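For context, a minimal sketch of the view-based idea, with placeholder names (dbo.Animal, an AnimalType code, a Discriminator expression); the expression moves out of the table and into a view, so the base table stays portable to engines without computed-column support:

-- Current shape (placeholder expression): discriminator as a computed column
-- Discriminator AS (CASE WHEN AnimalType = 1 THEN 'Cat' ELSE 'Dog' END)

-- View-based alternative: plain base table plus a view that adds the expression.
CREATE TABLE dbo.Animal
(
    AnimalId   int IDENTITY(1,1) PRIMARY KEY,
    AnimalType int NOT NULL
);
GO

CREATE VIEW dbo.AnimalWithDiscriminator
AS
SELECT  AnimalId,
        AnimalType,
        CASE WHEN AnimalType = 1 THEN 'Cat' ELSE 'Dog' END AS Discriminator
FROM    dbo.Animal;
GO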
What I did was add the computed column to the DataTable when loading the table from SQL CE. I stored the definition of the computed DataColumn in a "configuration" table in the database. I was able to do complex calculations that depended on a "chain" of tables, where each table performed a simpler piece of a more complex calculation (the last table in the chain contained the results). I used SQL CE because one of the five tables contained 15 million rows, which was too much data for ADO.NET's in-memory DataSets. (I had a requirement to run the calculations locally on the client before posting to the server.)