I'm optimizing my Oracle database.
I'm confused about the I/O performance difference between writing 10 concurrent requests to one table and writing 10 concurrent requests to 10 tables.
If I have 10 types of data that could all be stored in one table, which approach gives the best insert performance: one table or 10 tables?
Does anybody know?
Performance tuning is a large topic, and questions can't really be answered with so little information to begin with. But I'll try to provide some basic pointers.
If I got it right, you are mainly concerned with insert performance into what is currently a single table.
The first step should be to find out what is actually limiting your performance. Let's consider some scenarios:
disk I/O: Disks are slow, so get the fastest disks you can; that might well mean SSDs. Put them in a RAID that is tuned for performance ("striping" is the key word, as far as I know). Of course SSDs fail, just as your HDs do, so you want to plan for that. HDs are also faster when they aren't completely full (I never really checked that). Partitioned tables might help as well (see below). But most of the time we can reduce the I/O load, which is far more efficient than more and faster hardware ...
contention on locks (of primary keys for example).
Partitioned tables and indexes might be a solution (see the sketch after this list). A partitioned table is logically one table (you can select from it and write to it just like a normal table), but internally the data is spread across multiple segments; a partitioned index is the same idea applied to an index. This can help because the index underlying a unique key gets locked when a new value is added, so that two sessions can't insert the same value. If the values are spread over n index partitions, that can reduce contention on such locks. Partitions can also be spread over different tablespaces/disks, so you spend less time waiting on the physical side.
time to verify constraints: If you have constraints on the table, they need time to do their job. If you do batch inserts, consider deferred constraints; they are only checked at commit time instead of on every insert. If you are careful with your application, you can even disable them and re-enable them afterwards without validation. This is fast, but of course you have to be really, really sure the constraints actually hold. And of course make sure your constraints have all the indexes they need to perform well.
talking about batch inserts: if you are doing those, you might want to look into direct load: http://docs.oracle.com/cd/A58617_01/server.804/a58227/ch_dlins.htm (I think this is the Oracle 8 version; I'm sure there is updated documentation somewhere).
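To make the partitioning, deferrable-constraint, and direct-path ideas above concrete, here is a minimal, hedged Oracle SQL sketch. All object names (events, events_staging, events_pk, and so on) are made up, and the exact options should be checked against your Oracle version.

-- Hypothetical hash-partitioned table: inserts are spread across 8 segments,
-- which can reduce contention on hot data and index blocks.
CREATE TABLE events (
  event_id   NUMBER        NOT NULL,
  event_type VARCHAR2(30)  NOT NULL,
  payload    VARCHAR2(4000),
  created_at DATE DEFAULT SYSDATE
)
PARTITION BY HASH (event_id) PARTITIONS 8;

-- Local (per-partition), non-unique index; Oracle needs a non-unique index
-- to police a DEFERRABLE primary key constraint.
CREATE INDEX events_pk_ix ON events (event_id) LOCAL;

-- Deferrable primary key: uniqueness is checked at COMMIT instead of per row.
ALTER TABLE events
  ADD CONSTRAINT events_pk PRIMARY KEY (event_id)
  DEFERRABLE INITIALLY IMMEDIATE
  USING INDEX events_pk_ix;

-- Hypothetical staging table standing in for wherever your batch rows come from.
CREATE TABLE events_staging AS SELECT * FROM events WHERE 1 = 0;

-- Defer constraint checking for a batch load in this session ...
SET CONSTRAINTS ALL DEFERRED;

-- ... and use a direct-path (APPEND) insert for the bulk load. Note that Oracle
-- may silently fall back to a conventional insert depending on constraints and triggers.
INSERT /*+ APPEND */ INTO events (event_id, event_type, payload)
SELECT event_id, event_type, payload FROM events_staging;

COMMIT;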
To wrap it up: without knowing where exactly your performance problem is, there is no way to tell how to fix it. So find out where your problem is, then come back with a more precise question.
Good day,
In my Java web application I have a table with 107 columns. It is also a parent table with many child tables. In production this table currently has more than 10 million rows.
Since last year, the Java web application keeps hitting slowness issues. After checking and debugging, we found that the slowness happens while updating or selecting data from this table.
Every time this issue occurs, I take the select or update query and run it through the db2advis command, and every time the result says that >99% improvement is possible by applying the recommended indexes. Adding those indexes does solve the slowness issue.
So by now 7~8 indexes have already been applied to this table. Today I was told about a slowness issue again. After checking, I found it is again a slow select statement on this table joined to another table. As before, I ran the db2advis command and the result was again >99% improvement and a few recommended indexes.
However, I am starting to ask myself: are all these solutions good solutions? If there is another slowness issue in the future, should I apply the same solution again?
Also, every db2advis result includes a section of unused existing indexes with a list of DROP INDEX statements, and those are indexes I added previously. I believe this is because those indexes are not relevant to the current query given to db2advis? So can I ignore this? Or will these existing indexes affect performance?
As I understand it, indexes also have disadvantages, especially for insert and update statements.
Additionally, the system owner has a policy of keeping the data for at least 7 years, so the owner is not going to do any housekeeping on the database.
I would like to ask for advice: other than adding indexes and rewriting queries, is there any other way to overcome this issue?
This answer contains general advice about levers that may be available to you.
Your situation happens in many companies that are subject to regulatory requirements for multi-year online data retention.
When the physical data model is not designed to exploit range partitioning for easily rolling out old data (without deletes), performance can degrade over time, especially when business or legal changes affect data distributions.
Your question is not about programming, but instead it is about performance management, and that is a big topic.
Because of that reason, your question may be more suitable for dba.stackexchange.com. This stackoverflow website is intended for more specific programming questions.
Always focus on the whole workload, not only a single query. A "good solution" for one query may be bad for another aspect of functionality.
Adding one index can speed up one query but negatively impact other insert/update/delete activities, as you mention.
Companies that have a non-production environment with the same (or higher) volumes of data and matching distributions can exploit such environments for performance measurement, especially if they have a realistic test workload generator and instrumentation for profiling.
Separately, keep in mind the importance of designing statistics collection properly. Sometimes column-group statistics can have a big impact on index selection, even for existing indexes; other times distribution statistics can greatly help dynamic SQL, and statistical views can help with many problems. So before adding new indexes, always consider whether other techniques can help, especially if the join columns are already indexed correctly and foreign-key indexes are present, but for some reason the Db2 optimiser is ignoring the indexes.
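As a hedged illustration of collecting column-group and distribution statistics, here is a RUNSTATS sketch to run from the Db2 command line (or wrapped in SYSPROC.ADMIN_CMD); the schema, table, and column names are placeholders, and the exact syntax should be checked against your Db2 version:

-- Distribution statistics plus a column-group statistic on (CUST_ID, STATUS),
-- with detailed index statistics; adjust the names to your schema.
RUNSTATS ON TABLE MYSCHEMA.BIGTABLE
  ON ALL COLUMNS AND COLUMNS ((CUST_ID, STATUS))
  WITH DISTRIBUTION AND DETAILED INDEXES ALL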
If the Db2 index lastused column (in syscat.indexes) shows that an index is never used or used extremely rarely, then you should investigate why the index was created, and why some queries that might be expected to benefit from that specific index are ignoring the index. Sometimes, it's necessary to reorder the columns in the index to ensure that the highest selectivity columns are at the lowest ordinal position.
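For example, a quick way to check that column (the schema and table names are placeholders):

-- Indexes on the table that have never been used, or not for a long time, sort first.
SELECT INDSCHEMA, INDNAME, LASTUSED
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'MYSCHEMA'
  AND TABNAME   = 'BIGTABLE'
ORDER BY LASTUSED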
There are other levers you can adjust, such as MQTs, MDC, optimisation profiles (hints), registry settings, and optimisation levels, but the starting point is a good data model and good measurements.
I have seen that manually splitting a complicated query into a chain of the kind select [...] into #t and then select [...] from #t [...] can prove visibly faster. It may have been caused by a bad design.
But nevertheless, is it possible for an SQL engine/optimizer to use such temporary tables itself if it considers them useful?
Also, if the kind of manual splitting I described above results in a significant performance increase, does it imply that the design of the original view/underlying tables is significantly flawed?
Using a chain of temp tables does NOT prove a bad design. It also doesn't guarantee better performance. You should use temp tables wisely: sometimes they improve performance, and sometimes they only increase disk reads/writes.
One obvious use of a temp table in this kind of scenario is storing data in a temp table to avoid executing a slow query multiple times. In other words, if you execute an expensive query in several places in your code, consider storing its result in a temp table and using the stored data instead of re-running the query each time (see the sketch below).
Another case might be making the code more understandable and easier to read (if performance is not an issue).
One other case is using a temp table to be able to index your data: you store the data in a temp table and then create an index on it to get better read performance.
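A minimal T-SQL sketch combining the two points above; the table, column, and index names are invented:

-- Materialize the expensive aggregation once into a temp table ...
SELECT o.CustomerId,
       SUM(o.Amount) AS TotalAmount
INTO   #CustomerTotals
FROM   dbo.Orders AS o
GROUP BY o.CustomerId;

-- ... optionally index it for the lookups that follow ...
CREATE CLUSTERED INDEX IX_CustomerTotals ON #CustomerTotals (CustomerId);

-- ... and reuse the stored result instead of re-running the expensive query.
SELECT c.Name, t.TotalAmount
FROM dbo.Customers AS c
JOIN #CustomerTotals AS t ON t.CustomerId = c.CustomerId;

DROP TABLE #CustomerTotals;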
In general, you should not use a temp table for a query that executes only once in your code.
Finally, SQL Server uses temp tables for its own purposes, not necessarily in the way you may want to store your data. So don't count on SQL Server to do automatically what you may want to do with a temp table.
What's a fast way to query large amounts of data (between 10,000 and 100,000 rows; it will get bigger in the future, maybe 1,000,000+) spread across multiple tables (20+), involving left joins and aggregate functions (SUM, MAX, COUNT, etc.)?
My solution would be to make one table that contains all the data I need and have triggers that update this table whenever one of the other tables gets updated. I know that triggers aren't really recommended, but this way I take the load off the querying. Or do one big update every night.
I've also tried views, but once they start involving left joins and calculations they're way too slow and time out.
Since your question is too general, here's a general answer...
The path you're taking right now is optimizing a single query / a single issue. Sure, it might solve the issue you have right now, but it's usually not very good in the long run (not to mention the cumulative cost of maintenance of such a thing).
The common path is to create an 'analytics' database: a real-time copy of your production database that you query for all your reports. This analytics database can eventually even become a full-blown DWH, but you'll probably start with a simple real-time replication (or nightly replication, or whatever) and work from there...
As I said, the question/problem is too broad to be answered in a couple of paragraphs; these are only some of the guidelines...
Need a bit more detail, but I can already suggest this:
Use WITH (NOLOCK); this can slightly improve the speed.
Reference: Effect of NOLOCK hint in SELECT statements
Use indexes on the table columns you filter and join on, to fetch data fast.
Reference: sql query to select millions record very fast
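As a hedged sketch of both suggestions (the table, column, and index names below are invented):

-- NOLOCK reads without taking shared locks; note that it permits dirty reads.
SELECT s.ProductId,
       SUM(s.Quantity) AS TotalQuantity
FROM dbo.Sales AS s WITH (NOLOCK)
WHERE s.SaleDate >= '2015-01-01'
GROUP BY s.ProductId;

-- An index covering the filter, grouping, and aggregated columns used above.
CREATE NONCLUSTERED INDEX IX_Sales_SaleDate_ProductId
    ON dbo.Sales (SaleDate, ProductId)
    INCLUDE (Quantity);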
Given the following requirements of a persistent key/value store:
Only fetch, insert, and full iteration of all values (for exports) are required
No deleting values or updating values
Keys are always the same size
Code embedded in the host application
And given this usage pattern:
Fetches are random
Inserts and fetches are interleaved with no predictability
Keys are random, and inserted in random order
What is the best on-disk data structure/algorithm given the requirements?
Can a custom implementation exceed the performance of LSM-based (Log Structured Merge) implementations (i.e. leveldb, rocksdb)?
Would a high performance custom implementation for these requirements also be considerably simpler in implementation?
While it might be possible to build a custom implementation with better performance for your requirements, a well-configured RocksDB should be able to beat most such custom implementations in your case. Here is how I would configure RocksDB:
First, since you don't have updates and deletes, it's best to compact all data into big files in RocksDB. Because RocksDB sorts the keys in a customizable order, having a few big files gives you faster read performance, as binary search across a few big files is faster than across many small files. Typically, big files hurt the performance of compaction, the process of reorganizing large parts of the data in RocksDB so that (1) multiple updates to a single key are merged, (2) deletions are executed to free disk space, and (3) the data is kept sorted. However, since you don't have updates and deletes, big files give you fast read and write performance.
Second, specify enough bits per key for the bloom filter; this allows you to avoid most I/Os when you are likely to issue queries for keys that do not exist in RocksDB.
So a read request goes like this. First, RocksDB compares the query key with the key ranges of the big files to identify which file(s) might contain it. Then, for the file(s) whose key range covers the query key, it computes the bloom bits of the query key to check whether the key may exist in those files. If so, a binary search inside each such file is triggered to locate the matching data.
As for the scan operation, since RocksDB internally sorts data in a user-customizable order, scans can be done very efficiently using RocksDB's iterator.
More information about RocksDB basics can be found here.
A query on a non-indexed column will result in O(n) because it has to search the entire table. Adding an index allows for O(log n) due to binary search. Is there a way for databases to achieve O(1) using the same technique a hash table uses (or perhaps another way) if you are searching on a unique key?
Hash-based indices are supported by some rdbms under some conditions. For example, MySQL supports the syntax CREATE INDEX indexname USING HASH ON tablename (cols…) but only if the named table is stored in memory, not if it is stored on disk. Clustered tables are a special case.
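For instance, a minimal MySQL sketch; the table and column names are made up, and the hash index is honoured here only because the table uses the MEMORY engine:

-- Hash indexes are supported for MEMORY (HEAP) tables in MySQL.
CREATE TABLE session_cache (
    session_id CHAR(32) NOT NULL,
    user_id    INT      NOT NULL
) ENGINE = MEMORY;

CREATE UNIQUE INDEX idx_session USING HASH ON session_cache (session_id);

-- Equality lookups on session_id can then use the hash index (roughly O(1));
-- range scans on it cannot, which is one reason B-trees remain the default.
SELECT user_id FROM session_cache WHERE session_id = 'abc123';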
I guess the main reason against widespread use of hash indices in rdbms is the fact that they scale poorly. Since disk I/O is expensive, a very sparsely filled index will require lots of I/O for little gain in information. Therefore you would prefer a rather densely populated index (e.g. keep the filled portion between ⅓ and ⅔ at all times), which can be problematic in terms of hash collisions. But even more problematic: as you insert values, such a dense index might become too full, and you'd have to increase the size of the hash table fairly often. Doing so means completely rebuilding the hash table. That's an expensive operation, which requires lots of disk I/O and will likely block all concurrent queries on that table. Not something to look forward to.
A B-tree, on the other hand, can be extended without too much overhead. Even increasing its depth, which is the closest analogue to extending the hash table size, can be done more cheaply and is required less often. Since B-trees tend to be shallow, and since disk I/O tends to outweigh anything you do in memory, they are still the preferred solution for most practical applications. Not to mention the fact that they provide cheap access to ranges of values, which isn't possible with a hash.