Is there a restriction on using DimensionDeleteAllElements() in TM1 that prevents it from working in tandem with a dimension update process called from the same TI that contains the DimensionDeleteAllElements() call?
I have a TI process which deletes all elements of a dimension using DimensionDeleteAllElements() and subsequently rebuilds it by calling another TI process which updates the dimension with elements from the database. This serves to weed out unnecessary elements.
After this TI executes successfully, I can see that the elements have been wiped from the dimension, but the dimension never gets rebuilt. According to the tm1server log, however, the secondary TI that updates the dimension with database elements completes its execution normally. Also, running the dimension update TI manually works fine and updates the dimension with elements from the database.
Should I copy the contents of the dimension update process into this TI instead of calling it?
Let me state this plainly... you should emphatically NOT, under any circumstances, be doing what you are doing.
The general consensus amongst TM1 experts is that except in very, very exceptional cases (such as creating a reference dimension which is not used in any cubes), DimensionDeleteAllElements() is too dangerous to be used. (Example 1, Example 2.) If the TI process fails part way through you can lose your elements. Lose your elements, and you lose your data.
You haven't specified the tab on which you're making that call, but let me explain how a metadata update (currently) works. (It works a bit differently with newer functions like DimensionElementInsertDirect, or with the REST API, which is stateless, but for the purposes of this exercise the explanation still applies.)
Any changes that you make to a dimension in the Prolog or Metadata tabs will cause a copy of the dimension to be made in memory.
After the last row of the data source (if any) is processed on the Metadata tab or, if there is no data source, after execution passes beyond the Metadata tab on its way from the Prolog, the copies of the changed dimensions will be checked for integrity and, if they pass that check, will be registered as replacements for the original dimension objects.
Until the second of those things happens, however, the rest of the system does not know about the copy of the dimension. It is similar to a private object that only the TI process itself knows about.
So what is happening in your case is this:
Your first process executes the DimensionDeleteAllElements command. This causes a copy of the dimension to be created and all of the elements to be removed.
I would guess that you are calling the second process on your Prolog tab. (I would hope it's not the Metadata tab, otherwise you'll be executing the call once for every row in your record source, if any.)
When that process is called it will do the rebuild of the dimension. It will do this by creating its own copy of the dimension in memory, quite separate from the first process' one, updating that copy, then registering it as the new dimension once it passes its own Metadata tab.
Control will then return to the Prolog of the first process which, you may remember, still has its own copy of the dimension in memory, one which now has no elements. Once the first process passes the end of its own Metadata tab, it will do the integrity check (an absence of elements does not cause that check to fail) and register that dimension copy as the updated dimension, thus obliterating (or more precisely overwriting) the changes that the second process made.
The solution? If you are going to be calling DimensionDeleteAllElements (and you generally shouldn't) then you must do it in the Prolog of the same process that rebuilds the dimension. In that way the element deletion and the re-addition of the elements from the data source happens to the same copy of the dimension, and the resulting dimension is then registered.
You should not ever be removing N or S elements that contain data in cubes. These should never be "unnecessary elements" to be "weeded out". Doing so can cause hard to explain changes in cube values (since data vanishes with the elements) which is toxic from an auditing point of view.
C level elements are a different matter. If your intent is to remove all of those and allow the current hierarchy to be rebuilt from the source, it would be best to just iterate through the dimension elements (backwards) using the DimSiz and DimNm functions, and using the DType function to return the element type so that you can identify and delete consolidations. This is obviously done in the Prolog.
I have a multi-dimensional array of strings (of arbitrary length, though generally no more than 5 or 6 characters) which needs to be random-read capable. My issue is that I cannot create this array in any programming language that keeps or loads the entire array in memory, since the array would far exceed my 32 GB of RAM.
All values in this database are interrelated, so if I break the database up into smaller, more manageable pieces, I will need every "piece" with related data in it loaded in order to do computations with the held values. That would mean loading the entire database, so we're back to square one.
The array dimensions are: [50,000] * [50,000] * [2] * [8] (I'll refer to this structure as X*Y*Z*M)
The array needs to be infinitely resizable on the X, Y, and M dimensions, though the M dimension would very rarely be changed, so a sufficiently high upper-bound would be acceptable.
While I do have a specific use case for this, this is meant to be a more general and open-ended question about dealing with huge multi-dimensional arrays - what methods, structures, or tricks would you recommend to store and index the values? The array itself clearly needs to live on disk somewhere, as it is far too large to keep in memory.
I've of course looked into the basic options like an SQL database or a static file directory structure.
SQL doesn't seem like it would work, since there is an upper-bound limitation on column widths and it only supports tables with columns and rows - SQL doesn't seem to support the kind of multidimensionality that I require. Perhaps there's another DBMS designed for things like this that someone can recommend?
The static file structure seemed to work when I first created the database, but after shutting the PC down and everything being lost from the read cache, the PC will no longer read from the disk. It returns zero on every read and doesn't even attempt to actually read the files. Any attempt to enumerate the database's contents (right-clicking Properties on the directory) instantly BSODs the entire PC. There are just too many files and directories; Windows can't handle it.
I'm working on a project in which I want all documents in a "pool" to be returned by searching for any element in the pool.
So, for instance, let's say we have 3 pools, each with varying documents labeled by letter:
Pool 1: A, B, C
Pool 2: D
Pool 3: E, F, G, H
When I search for A, I want to get A, B, and C. When I search for C, I also want to get A, B, and C.
If I add a document I, and it satisfies the criteria for Pools 1 and 2, then Pools 1 and 2 should be merged, and a search for any of A, B, C, D, or I should return all of them.
I know how to do this inefficiently (create a new document with each element as key, then update all documents on each insertion), but I was wondering if there was a better way?
Thanks in advance
I think that with something as abstract as data, particularly database documents, a good visualization helps with conceptualizing the problem. Try viewing this problem from the perspective of attempting to maintain a set of trees of depth no more than 1. Specifically, each document is a leaf and the "rules" that determine which ones are part of the "pool" are the root (i.e. the root is the subset of labels that can be a leaf).
Now, what you're saying you want to do is to be able to add a new leaf. If this leaf is able to connect to more than one root, then those roots should be merged, which means updating what the root is and pointing every leaf from the affected trees to this new root.
Otherwise, what you end up with is the need to jump around from the new leaf to each of the roots it connects to and then to every other leaf. But each other leaf could potentially also be connected to other roots, which means you could be jumping around like this an arbitrary number of times. This is a non-ideal situation.
In order for this query to be efficient, you need to decide what these "roots" are going to be and update those accordingly. You may, for instance, decide to keep a "pool" document and merge these "pools" together as needed, e.g. by having a labels field that is an array of labels to be included in the pool. Merging is then just a matter of merging the arrays themselves. Alternatively, you could use a common ObjectId (not necessarily attached to any particular document) and use this value as a sort of "pseudo root node" in place of having documents. There are a number of options you could explore. In general, however, you should try to reduce any examining of field values for individual documents down to a single value check (e.g. don't keep arrays of other "related" labels in each document!).
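To make that concrete, here is a minimal pymongo sketch of the "pseudo root" variant, where every document carries a single pool value and merging pools is a single multi-document update. The collection name, field names and helper functions are illustrative assumptions on my part, not something prescribed by MongoDB:

    from bson import ObjectId
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    docs = client["example_db"]["docs"]
    docs.create_index("label", unique=True)
    docs.create_index("pool")  # makes "return everything in the pool" a cheap indexed query

    def search(label):
        # Return every document in the same pool as `label`.
        doc = docs.find_one({"label": label})
        if doc is None:
            return []
        return list(docs.find({"pool": doc["pool"]}))

    def insert(label, matching_labels):
        # Insert `label` and merge the pools of every label it matched.
        pools = docs.distinct("pool", {"label": {"$in": matching_labels}})
        target = pools[0] if pools else ObjectId()   # reuse an existing pool id or mint a new one
        if len(pools) > 1:                           # pools collided: fold the rest into `target`
            docs.update_many({"pool": {"$in": pools[1:]}}, {"$set": {"pool": target}})
        docs.insert_one({"label": label, "pool": target})

The point of the layout is that both the lookup and the merge touch a single indexed field (pool), so nothing ever fans out to per-document checks.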
Regardless of your approach, keep these tree structures in mind, consider what it means to traverse the nodes in terms of MongoDB queries, and determine how you want to traverse the nodes in order to 1) ensure that the number of "hops" you need between nodes stays constant, and 2) ensure that you can efficiently and reliably merge those roots without risk of data loss.
Finally, if you're finding that your update queries are too slow, then you're probably running into indexing problems. With proper indexes, updates on collections with even millions of documents shouldn't take any time at all. Additionally, if you're not doing a multi update and are instead running an individual update for each document, then your updates are badly written because you'll be running into O(n) search time and network overhead, which will slow your updates down to a crawl.
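Continuing the sketch above (same docs collection; old_pool and new_pool stand for two existing pool ids), the contrast looks roughly like this:

    # One multi-document update: the server does the work in a single round trip.
    docs.update_many({"pool": old_pool}, {"$set": {"pool": new_pool}})

    # Per-document loop: one query plus one network round trip per document,
    # which is what drags large merges down to a crawl.
    for d in docs.find({"pool": old_pool}):
        docs.update_one({"_id": d["_id"]}, {"$set": {"pool": new_pool}})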
There are two parts of a TI process that confound me to no end.
This process allegedly creates new dimensions for a cube (using attributes of some element) without a data source. But all I can see is that it creates the dimension name and right away moves on to adding an element to this dimension. How is that even possible, unless someone already created a dimension of that name, which is very unlikely? (Screenshot below)
Creating dimension without data source
This process is also said to add these newly created dimensions to an existing cube. How can that be performed? How will the existing data in that cube accommodate the new dimensions?
This process allegedly creates new dimensions for a cube
No, it doesn't, nor does it claim to. The commentary in the code doesn't say anything about creating a dimension; it says Create Dimension Name. That is, it is just working out what the dimension name should be for use in the DimensionElementInsert function. The attribute provides the base name for a dimension that should already exist. (Though it's something of an article of faith, given that the DimensionExists function isn't called at any point. Of course, given the complete absence of error handling in TI, there isn't much you could do about it even if it didn't exist.) The section of code above the one you have highlighted does NOT attempt to create a dimension - the DimensionCreate function is not called anywhere here - it simply parses the attribute value, character by character, replacing any spaces with underscores (after sticking rp_ in front of it) to get the correct dimension name.
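If it helps, that parsing step boils down to something like this (a rough Python rendering rather than the actual TI code; the example value is made up, the rp_ prefix comes from the screenshot):

    def dimension_name_from_attribute(attr_value: str) -> str:
        # Prefix with 'rp_' and swap spaces for underscores - the same result the
        # TI code arrives at by walking the attribute value character by character.
        return "rp_" + attr_value.replace(" ", "_")

    # e.g. dimension_name_from_attribute("Reporting Currency") -> "rp_Reporting_Currency"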
Another attribute defines what the top element in the dimension should be. If that element does not exist, the code that you have highlighted creates it.
The comment by Nick McDermaid is correct; you CANNOT add dimensions to an existing cube. You can export the data, destroy the cube, build a new cube with the same name but with extra dimension(s), and import the old data into it, but that's different. And the import process would need to have some code to select the appropriate element(s) of the new dimension(s) to use when writing the data.
Isn't that why they add new elements to the measures dimension instead, when there's a need to add more dimensionality to a cube?
Measures dimensions do not exist, as such, in TM1. A dimension of a cube can be flagged as a "measure" dimension for communication with other systems that may need one, but that flag has no impact within TM1 itself. For convenience, the last dimension of a cube is often referred to as "the measures dimension", but it has no significance beyond being a convenient name for identifying the dimension that holds the metrics that are stored in the cube.
More importantly, dimensions are dimensions, elements are elements. When you add elements to a dimension you are NOT changing the dimensionality of the cube. (You may, and probably will, be changing the sparsity, but that's a completely different concept.) The only way to do that is by adding new dimensions to the cube which, as noted above, you can't actually do; you are instead destroying the old cube and replacing it with a new one which just happens to have the same name and a different number of dimensions. Given that doing so will trash every single slice, active form, view etc. that was ever written for the cube, it's not something that is, or should be, done very often in practice.
How do you go about collecting and storing data which was not part of the initial database and software design? For example, if you've come up with a points system, you have to collect the points for every user who has already registered. For new users that would be easy, because the changes to the business logic will account for the points system... but what about the old ones?
In general, how does one deal with data, which should have been there from the beginning, but wasn't? Writing manual queries to collect the missing pieces? Using crons?
Well, you are asking for something that is by definition not possible, I think.
deal with data which should have been there from the beginning, but wasn't?
Because if you are able to deduce the number of points from the existing data in the database, then there is obviously no missing data. Storing the points separately would merely make them redundant (still a fine option in case you need it for performance).
For example: Stack Overflow rewards consecutive visits. Let's say they did not do that from the start. If they were already logging the date of each visit, you can recalculate the points, so no data is missing.
So if that is not possible, you need another solution: either get the data from other sources (parse a web server log, for instance) or get the business to draft some extra business rules for determining the default values for the existing users (difficult in this particular example).
Writing manual queries to collect the missing pieces? Using crons?
I would populate that in a conversion script, or even in a dedicated conversion application if it's very complex.
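To illustrate the kind of conversion script I mean, here is a minimal sketch built on the consecutive-visits example above. The schema (users and visits tables, a points column) and the "points = longest streak" rule are assumptions purely for illustration:

    import sqlite3
    from datetime import date, timedelta

    def longest_streak(visit_dates):
        # Length of the longest run of consecutive visit days.
        days = sorted(set(visit_dates))
        best = run = 1 if days else 0
        for prev, cur in zip(days, days[1:]):
            run = run + 1 if cur - prev == timedelta(days=1) else 1
            best = max(best, run)
        return best

    conn = sqlite3.connect("app.db")
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]
    for user_id in user_ids:
        dates = [date.fromisoformat(d) for (d,) in
                 conn.execute("SELECT visit_date FROM visits WHERE user_id = ?", (user_id,))]
        conn.execute("UPDATE users SET points = ? WHERE id = ?",
                     (longest_streak(dates), user_id))
    conn.commit()

You run it once as part of deploying the new feature; from then on the regular business logic keeps the points up to date.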
This question is in the context of Core Data, but if I am not mistaken, it applies equally well to a more general SQL case.
I want to maintain an ordered table using Core Data, with the possibility for the user to:
reorder rows
insert new lines anywhere
delete any existing line
What's the best data model to do that? I can see two ways:
1) Model it as an array: I add an int position property to my entity
2) Model it as a linked list: I add two one-to-one relations, next and previous from my entity to itself
1) makes it easy to sort, but painful to insert or delete as you then have to update the position of all objects that come after
2) makes it easy to insert or delete, but very difficult to sort. In fact, I don't think I know how to express a Sort Descriptor (SQL ORDER BY clause) for that case.
Now I can imagine a variation on 1):
3) add an int ordering property to the entity, but instead of having it count one by one, have it count 100 by 100 (for example). Then inserting is as simple as finding any number between the ordering of the previous and next existing objects. The expensive renumbering only has to occur when the holes have been filled. Making that property a float rather than an int makes it even better: it's almost always possible to find a new float midway between two floats.
Am I on the right track with solution 3), or is there something smarter?
If the ordering is arbitrary i.e. not inherent in the data being modeled, then you have no choice but to add an attribute or relationship to maintain the order.
I would advise the linked list since it is easiest to maintain. I'm not sure what you mean by a linked list being difficult to sort, since you most likely won't be sorting on an arbitrary order anyway. Instead, you will just fetch the topmost instance and walk your way down.
Ordering by a divisible float attribute is a good idea. You can create a near-infinite number of intermediate indexes just by subtracting the lower existing index from the higher existing index, dividing the result by two, and then adding that result to the lower index.
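Just to show the arithmetic (plain Python, names are mine; the same idea applies whether the attribute lives in Core Data or in a SQL column):

    def order_between(lower, higher):
        # Midpoint between two existing order values.
        return lower + (higher - lower) / 2.0

    rows = [("A", 100.0), ("B", 200.0)]               # (item, order) pairs spaced 100 apart
    rows.append(("C", order_between(100.0, 200.0)))   # C gets 150.0
    rows.sort(key=lambda r: r[1])                     # A, C, B

    # Caveat: a double has roughly 52 bits of mantissa, so after enough repeated
    # splits of the same gap the midpoint collides with an endpoint and a one-off
    # renumbering pass (re-spacing everything 100 apart again) is needed.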
You can also combine a divisible index with a linked list if you need an ordering for tables or the like. The linked list would make it easy to find existing indexes, and the divisible index would make it easy to sort if you needed to.
Core Data resists this kind of ordering because it is usually unnecessary. You don't want to add something to the data model unless it is necessary to simulate the real-world object, event or condition that the model describes. Usually, ordering/sorting is not inherent to the model but merely needed by the UI/view. In that case, you should put the sorting logic in the controller between the model and the view.
Think carefully before adding ordering to model when you may not need it.
Since iOS 5 you can (and should) use NSOrderedSet and its mutable subclass. → Core Data Release Notes for OS X v10.7 and iOS 5.0
See the accepted answer at How can I preserve an ordered list in Core Data.