Is the oft-quoted handles limit of 10,000 for both GDI and users objects, or for each?
In other words, is the limit 10,000, or 20,000?
I've inherited a large application that creates many Windows Forms, each with a large number of controls, and I think I'm hitting the handle limit: I'm getting an OutOfMemoryException. I just need to know what the actual limit is in total, to better understand why this is happening.
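One way to confirm whether handles really are the problem is to watch the "GDI objects" and "USER objects" columns in Task Manager, or query them programmatically. Below is a minimal sketch using Python's ctypes and the Win32 GetGuiResources API (purely illustrative; the same call works from .NET via P/Invoke):

```python
# Diagnostic sketch: query the current process's GDI and USER object counts
# via the Win32 GetGuiResources API (Windows only).
import ctypes

GR_GDIOBJECTS = 0   # count of GDI objects
GR_USEROBJECTS = 1  # count of USER objects

def gui_resource_counts():
    """Return (gdi_count, user_count) for the current process."""
    hproc = ctypes.windll.kernel32.GetCurrentProcess()  # pseudo-handle
    gdi = ctypes.windll.user32.GetGuiResources(hproc, GR_GDIOBJECTS)
    user = ctypes.windll.user32.GetGuiResources(hproc, GR_USEROBJECTS)
    return gdi, user

if __name__ == "__main__":
    gdi, user = gui_resource_counts()
    print(f"GDI objects: {gdi}, USER objects: {user}")
```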
What is the maximum size limit of a SQLite db file containing one table on Android devices? Is there any limit on the number of columns inside a table?
The answers to these questions, and others, can be found in the official documentation: Limits In SQLite.
File size, as far as SQLite is concerned, will more than likely be constrained by the underlying file system rather than by SQLite's theoretical limit of 140 terabytes (281 TB as of version 3.33.0 - see the update below). The restriction on SQLite's side is the maximum number of pages, which defaults to 1073741823 but can be as large as 2147483646, as per :-
Maximum Number Of Pages In A Database File
SQLite is able to limit the size of a database file to prevent the database file from growing too large and consuming too much disk space. The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to 1073741823, is the maximum number of pages allowed in a single database file. An attempt to insert new data that would cause the database file to grow larger than this will return SQLITE_FULL.
The largest possible setting for SQLITE_MAX_PAGE_COUNT is 2147483646. When used with the maximum page size of 65536, this gives a maximum SQLite database size of about 140 terabytes.
The max_page_count PRAGMA can be used to raise or lower this limit at run-time.
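For a quick way to inspect or change this from code, here is a minimal sketch using Python's built-in sqlite3 module (the database file name and the 2048-page value are just illustrative):

```python
# Sketch: inspect and lower the page-count limit of a SQLite database at
# run-time using the max_page_count PRAGMA (values here are illustrative).
import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file

# Current limit (defaults to 1073741823 pages unless changed).
print(conn.execute("PRAGMA max_page_count").fetchone()[0])

# Lower the limit; inserts that would push the file past this many pages
# will fail with "database or disk is full" (SQLITE_FULL).
conn.execute("PRAGMA max_page_count = 2048")

# Page size in bytes; max file size is roughly max_page_count * page_size.
print(conn.execute("PRAGMA page_size").fetchone()[0])
conn.close()
```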
Maximum Number Of Rows In A Table
The theoretical maximum number of rows in a table is 2^64 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 140 terabytes will be reached first. A 140 terabytes database can hold no more than approximately 1e+13 rows, and then only if there are no indices and if each row contains very little data.
Maximum Database Size
Every database consists of one or more "pages". Within a single database, every page is the same size, but different databases can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 2147483646 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 1.4e+14 bytes (140 terabytes, or 128 tebibytes, or 140,000 gigabytes or 128,000 gibibytes).
This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.
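As a quick back-of-the-envelope check of those figures (plain arithmetic, nothing SQLite-specific):

```python
# Rough check of the maximum-database-size arithmetic quoted above.
max_page_size = 65536            # bytes, largest allowed SQLite page size

old_max_pages = 2147483646       # SQLITE_MAX_PAGE_COUNT ceiling before 3.33.0
new_max_pages = 4294967294       # ceiling as of 3.33.0 (see update below)

print(old_max_pages * max_page_size)  # ~1.4e+14 bytes ~ 140 TB / 128 TiB
print(new_max_pages * max_page_size)  # ~2.8e+14 bytes ~ 281 TB / 256 TiB
```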
Update (1 June 2021)
As of SQLite version 3.33.0 (not yet included with Android) the maximum page count has been doubled, so the theoretical maximum database size is now 281 TB. As per :-
Maximum Number Of Pages In A Database File
SQLite is able to limit the size of a database file to prevent the database file from growing too large and consuming too much disk space. The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to 1073741823, is the maximum number of pages allowed in a single database file. An attempt to insert new data that would cause the database file to grow larger than this will return SQLITE_FULL.
The largest possible setting for SQLITE_MAX_PAGE_COUNT is 4294967294. When used with the maximum page size of 65536, this gives a maximum SQLite database size of about 281 terabytes.
The max_page_count PRAGMA can be used to raise or lower this limit at run-time.
Maximum Database Size
Every database consists of one or more "pages". Within a single database, every page is the same size, but different databases can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 2.8e+14 bytes (281 terabytes, or 256 tebibytes, or 281,474 gigabytes or 256,000 gibibytes).
This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.
However, other limits may be of concern, so it is suggested that the document linked to above is studied.
The default maximum number of columns is 2000; you can change this at compile time to a maximum of 32767, as per :-
Maximum Number Of Columns
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper bound on:
- The number of columns in a table
- The number of columns in an index
- The number of columns in a view
- The number of terms in the SET clause of an UPDATE statement
- The number of columns in the result set of a SELECT statement
- The number of terms in a GROUP BY or ORDER BY clause
- The number of values in an INSERT statement
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it at compile time to values as large as 32767. On the other hand, many experienced database designers will argue that a well-normalized database will never need more than 100 columns in a table.
In most applications, the number of columns is small - a few dozen. There are places in the SQLite code generator that use algorithms that are O(N²) where N is the number of columns. So if you redefine SQLITE_MAX_COLUMN to be a really huge number and you generate SQL that uses a large number of columns, you may find that sqlite3_prepare_v2() runs slowly.
The maximum number of columns can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_COLUMN,size) interface.
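If you want to adjust the column limit at run-time from Python rather than C, here is a sketch; Connection.getlimit()/setlimit() require Python 3.11 or later and wrap the sqlite3_limit() interface quoted above:

```python
# Sketch: lower the column limit at run-time (Python 3.11+).
import sqlite3

conn = sqlite3.connect(":memory:")

print(conn.getlimit(sqlite3.SQLITE_LIMIT_COLUMN))   # usually 2000 by default
conn.setlimit(sqlite3.SQLITE_LIMIT_COLUMN, 100)     # cap at 100 columns

# Creating a table with more than 100 columns now fails.
cols = ", ".join(f"c{i} INTEGER" for i in range(101))
try:
    conn.execute(f"CREATE TABLE too_wide ({cols})")
except sqlite3.OperationalError as e:
    print("rejected:", e)
conn.close()
```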
I am experiencing extremely slow performance of Google Cloud Datastore queries.
My entity structure is very simple:
calendarId, levelId, levelName, levelValue
There are only about 1,400 records, yet the query takes 500 ms to 1.2 seconds to return the data. Another query, on a different entity, also takes 300-400 ms for just 313 records.
I am wondering what might be causing such delay. Can anyone please give some pointers regarding how to debug this issue or what factors to inspect?
Thanks.
You are experiencing expected behavior. You shouldn't need to fetch that many entities when presenting a page to a user. Gmail doesn't show you 1000 emails; it shows you 25-100 based on your settings. You should fetch a smaller number (e.g., the first 100) and implement some kind of paging to allow users to see the other entities.
If this is backend processing, then you will simply need that much time to process entities, and you'll need to take that into account.
Note that you generally want to fetch your entities in large batches, and not one by one, but I assume you are already doing that based on the numbers in your question.
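For example, here is a rough sketch of cursor-based paging with the google-cloud-datastore Python client (the kind name "CalendarLevel" and the page size are assumptions, not taken from your code):

```python
# Sketch: page through entities with cursors instead of fetching them all.
from google.cloud import datastore

client = datastore.Client()

def fetch_page(page_size=100, start_cursor=None):
    query = client.query(kind="CalendarLevel")       # hypothetical kind
    it = query.fetch(limit=page_size, start_cursor=start_cursor)
    page = list(next(it.pages))                      # one page of entities
    return page, it.next_page_token                  # cursor for next request

entities, cursor = fetch_page()
# Hand `cursor` back to the client so the next request can resume from it.
```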
Not sure if this will help but you could try packing more data into a single entity by using embedded entities. Embedded entities are not true entities, they are just properties that allow for nested data. So instead of having 4 properties per entity, create an array property on the entity that stores a list of embedded entities each with those 4 properties. The max size an entity can have is 1MB, so you'll want to pack the array to get as close to that 1MB limit as possible.
This will lower the number of true entities and I suspect this will also reduce overall fetch time.
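A hedged sketch of what that packing might look like with the Python client (the kind name and batch size are assumptions; you'd tune the batch so the serialized parent entity stays safely under the 1 MB limit):

```python
# Sketch: pack many small records into one parent entity as an array of
# embedded entities.
from google.cloud import datastore

client = datastore.Client()

def pack_levels(calendar_id, rows, batch_size=500):
    """rows: iterable of dicts with levelId, levelName, levelValue."""
    rows = list(rows)
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        parent = datastore.Entity(key=client.key("CalendarLevelBatch"))
        parent["calendarId"] = calendar_id
        parent["levels"] = [datastore.Entity() for _ in chunk]  # embedded
        for embedded, row in zip(parent["levels"], chunk):
            embedded.update(row)
        client.put(parent)
```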
I'm looking for a space-efficient key-value mapping/dictionary/database which satisfies certain properties:
Format: The keys will be represented by http(s) URIs. The values will be variable length binary data.
Size: There will be 1-100 billion unique keys (average length 60-70 bytes). Values will initially only be a few tens of bytes but might eventually grow to tens of kilobytes in size (perhaps even more if I decide to store multiple versions). The total size of the data will be measured in terabytes or petabytes.
Hardware: The data will have to be distributed across multiple machines. This distribution should ensure that all URIs from a particular domain end up on the same machine. Furthermore, data on a machine will have to be distributed between the RAM, SSD, and HDD according to how frequently it is accessed. Data will have to be shifted around as machines are added or removed from the cluster. Replication is not needed initially, but might be useful later.
Access patterns: I need both sequential and (somewhat) random access to the data. The sequential access will be from a low-priority batch process that continually scans through the data. Throughput is much more important than latency in this case. Ideally, the iteration will proceed lexicographically (i.e. dictionary order). The random accesses arise from accessing the URIs in an HTML page; I expect that most of these will point to URIs from the same domain as the page and hence will be located on the same machine, while others will be located on different machines. I anticipate needing at most 100,000 to 1,000,000 in-memory random accesses per second. The data is not static. Reads will occur one or two orders of magnitude more often than writes.
Initially, the data will be composed of 100 million to 1 billion URLs with several tens of bytes of data per URL. It will be hosted on a small number of cheap commodity servers with 10-20 GB of RAM and several TB of hard drives. In this case, most of the space will be taken up storing the keys and indexing information. For this reason, and because I have a tight budget, I'm looking for something which will allow me to store this information in as little space as possible. In particular, I'm hoping to exploit the common prefixes shared by many of the URIs. In this way, I believe it might be possible to store the keys and index in less space than the total length of the URIs.
I've looked at several traditional data structures (e.g. hash-maps, self-balancing trees (e.g. red-black, AVL, B), tries). Only the tries (with some tricks) seem to have the potential for reducing the size of the index and keys (all the others store the keys in addition to the index). The most promising option I've thought of is to split URIs into several components (e.g. example.org/a/b/c?d=e&f=g becomes something like [example, org, a, b, c, d=e, f=g]). The various components would each index a child in subsequent levels of a tree-like structure, kind of like a filesystem. This seems profitable as a lot of URIs share the same domain and directory prefix.
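To make that concrete, here is a toy sketch of the splitting idea using plain nested Python dicts (no compression, persistence, or distribution - just the lookup structure):

```python
# Toy sketch of the URI-splitting idea: each URI is broken into components
# (host labels, then path segments, then query terms) and the components
# index successive levels of a nested dict, so a shared domain and directory
# prefix is stored only once.
from urllib.parse import urlsplit

_LEAF = object()  # sentinel key under which the value is stored

def uri_components(uri):
    parts = urlsplit(uri)
    comps = parts.hostname.split(".")                    # example, org
    comps += [seg for seg in parts.path.split("/") if seg]  # a, b, c
    comps += [t for t in parts.query.split("&") if t]       # d=e, f=g
    return comps

def trie_put(trie, uri, value):
    node = trie
    for comp in uri_components(uri):
        node = node.setdefault(comp, {})
    node[_LEAF] = value

def trie_get(trie, uri):
    node = trie
    for comp in uri_components(uri):
        node = node.get(comp)
        if node is None:
            return None
    return node.get(_LEAF)

trie = {}
trie_put(trie, "http://example.org/a/b/c?d=e&f=g", b"\x01\x02")
print(trie_get(trie, "http://example.org/a/b/c?d=e&f=g"))  # b'\x01\x02'
```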
Unfortunately, I don't know much about the various database offerings. I understand that a lot of them use B-trees to index the data. As I understand it, the space required by the index and keys exceeds the total length of the URLs.
So, I would like to know if anyone can offer some guidance as to any data structures or databases that can exploit the redundancy in the URIs to save space. The other stuff is less important, but any help there would be appreciated too.
Thanks, and sorry for the verbosity ;)
What is the maximum number of records within a single custom object in salesforce.com?
There does not seem to be a limit indicated in https://login.salesforce.com/help/doc/en/limits.htm
But of course, there has to be a limit of some kind. E.g., could 250 million records be stored in a single salesforce.com custom object?
As far as I'm aware the only limit is your data storage, you can see what you've used by going to Setup -> Administration Setup -> Data Management -> Storage Usage.
In one of the Orgs I work with I can see one object has almost 2GB of data for just under a million records, and this accounts for a little over a third of the storage available. Your storage space depends on your Salesforce Edition and number of users. See here for details.
I've seen the performance issue as well, though after about 1-2M records the performance hit appears magically to plateau, or at least it didn't appear to significantly slow down between 1M and 10M. I wonder if orgs are tier-tuned based on volume... :/
But regardless of this, there are other challenges which make it less than ideal for big data. Even though they've increased the SOQL governor limit to permit up to 50 million records to be retrieved in one call, you're still strapped with a 200,000 line execution limit in Apex and a 10K DML limit (per execution thread). These can be bypassed through Batch Apex, yet this has limitations as well. You can only execute 250K batches in 24 hours and only have 5 batches running at any given time.
So... the moral of the story seems to be that even if you managed to get a billion records into a custom object, you really can't do much with the data at that scale anyway. Therefore, it's effectively not the right tool for that job in its current state.
2-cents
LaceySnr is correct. However, there is an inverse relationship between the number of records for an object and performance. Any part of the system that filters on that object will be impacted, such as views, reports, SOQL queries, etc.
It's hard to talk specific numbers since salesforce has upwards of a dozen server clusters, each with their own performance characteristics. And there's probably a lot of dynamic performance management that occurs regularly. But, in the past I've seen performance issues start to creep in around 2M records. One possible remedy is you can ask salesforce to index fields that you plan to filter on.
Here's the situation. Multi-million user website. Each user's page has a message section. Anyone can visit a user's page, where they can leave a message or view the last 100 messages.
Messages are short pieces of text with some extra metadata. Every message has to be stored permanently; the only thing that must be real-time quick is writing and reading messages (people use it as chat). A count of messages will be read very often to check for changes. Periodically, it's OK to archive off the old messages (those beyond the last 100), but they must remain accessible.
Currently it's all in one big DB table, and contention between people reading the message lists and sending more updates is becoming an issue.
If you had to re-architect the system, what storage mechanism / caching would you use? What kind of computer science concepts could be applied here (e.g. collections, list access, etc.)?
Some general thoughts, not particular to any specific technology:
Partition the data by user ID. The idea is that you can uniformly divide the user space into distinct partitions of roughly the same size, using an appropriate hashing function. Ultimately, each partition belongs on a separate machine; however, even using different tables/databases on the same machine will eliminate some of the contention. Partitioning limits contention, helps with load distribution, and opens the door to scaling out "linearly" in the future.
When picking a hashing function to partition the records, look for one that minimizes the number of records that will have to be moved should partitions be added/removed.
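Consistent hashing is one such function; here is a rough sketch (the shard names and replica count are arbitrary):

```python
# Sketch of consistent hashing for the user-ID partitioning described above:
# each partition owns many points on a hash ring, and a user maps to the
# first partition point clockwise from the hash of their ID. Adding or
# removing a partition then only remaps the users nearest its points.
import bisect
import hashlib

class HashRing:
    def __init__(self, partitions, replicas=100):
        self._ring = []  # sorted list of (point, partition)
        for p in partitions:
            for i in range(replicas):
                self._ring.append((self._hash(f"{p}:{i}"), p))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def partition_for(self, user_id):
        idx = bisect.bisect(self._points, self._hash(str(user_id)))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.partition_for(12345))   # e.g. 'shard-b'
```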
Like many other applications, we can assume usage of the service follows a power-law curve: a few of the user pages cause most of the traffic, followed by a long tail. A caching scheme can take advantage of that; the steeper the curve, the more effective caching will be. Given the short messages, if each page shows 100 messages, and each message is 100 bytes on average, you could fit about 100,000 of the hottest pages in 1 GB of RAM cache. Those cached pages could be written lazily to the database. Out of 10 million users, 100,000 is in the ballpark for making a difference.
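A minimal in-process sketch of such a cache, sized from that estimate (a real deployment would more likely use something like memcached or Redis):

```python
# Minimal LRU cache for the rendered "last 100 messages" pages. Capacity
# follows the rough estimate above: ~10 KB per page, ~100,000 pages per 1 GB.
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._pages = OrderedDict()   # user_id -> list of recent messages

    def get(self, user_id):
        page = self._pages.get(user_id)
        if page is not None:
            self._pages.move_to_end(user_id)   # mark as recently used
        return page

    def put(self, user_id, messages):
        self._pages[user_id] = messages
        self._pages.move_to_end(user_id)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)    # evict least recently used
```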
Partition the web servers, possibly using the same hashing scheme. This lets you hold separate RAM caches without contention. The potential benefit is increasing the cache size as the number of users grows.
If appropriate for your environment, one approach for ensuring new messages are eventually written to the database is to place them in a persistent message queue, right after placing them in the RAM cache. The queue suffers no contention, and helps ensure messages are not lost upon machine failure.
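A sketch of that write path, assuming the PageCache above and using an append-only file as a stand-in for a real persistent queue:

```python
# Sketch of the write path: update the RAM cache immediately, append the
# message to a durable queue, and let a background worker drain the queue
# into the database. The JSON-lines file is only a stand-in for a real
# persistent queue (Kafka, RabbitMQ, SQS, ...).
import json, time

# queue_file = open("outbox.jsonl", "a")  # opened once per web process

def post_message(cache, queue_file, user_id, text):
    message = {"user": user_id, "text": text, "ts": time.time()}

    page = cache.get(user_id) or []
    page = ([message] + page)[:100]        # keep only the latest 100 in RAM
    cache.put(user_id, page)

    queue_file.write(json.dumps(message) + "\n")
    queue_file.flush()                      # durability before acknowledging
```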
One simple solution could be to denormalize your data and store pre-calculated aggregates in a separate table, e.g. a MESSAGE_COUNTS table with a column for the user ID and a column for their message count. When the main messages table is updated, re-calculate the aggregate.
It's just shifting the bottleneck from one place to another, but it might move it somewhere that's less of a burden.
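For illustration, a sketch of that approach (SQLite syntax via Python here; the schema is hypothetical):

```python
# Sketch of the denormalized counter: bump MESSAGE_COUNTS in the same
# transaction as the insert, so readers can check for changes with a cheap
# single-row lookup instead of COUNT(*) over the big messages table.
import sqlite3

conn = sqlite3.connect("messages.db")  # hypothetical schema for illustration
conn.executescript("""
CREATE TABLE IF NOT EXISTS messages (user_id INTEGER, body TEXT);
CREATE TABLE IF NOT EXISTS message_counts (
    user_id INTEGER PRIMARY KEY,
    message_count INTEGER NOT NULL
);
""")

def add_message(user_id, body):
    with conn:  # single transaction: insert + counter update stay in sync
        conn.execute("INSERT INTO messages (user_id, body) VALUES (?, ?)",
                     (user_id, body))
        conn.execute("""
            INSERT INTO message_counts (user_id, message_count) VALUES (?, 1)
            ON CONFLICT (user_id) DO UPDATE
            SET message_count = message_count + 1
        """, (user_id,))

add_message(42, "hello")
print(conn.execute("SELECT message_count FROM message_counts WHERE user_id = ?",
                   (42,)).fetchone()[0])  # 1
```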