Adding dimensions to existing cube in TM1 - cognos-tm1

There are two parts of a TI process that confound me to no end.
This process allegedly creates new dimensions for a cube (using attributes of some element) without a data source. But all I can see is that it creates the dimension name and right away moves on to adding an element to this dimension. How is that even possible, unless someone already created a dimension of that name, which is very unlikely? (Screenshot below)
Creating dimension without data source
This process is also said to add these newly created dimensions to an existing cube. How can that be performed? How will the existing data in that cube accommodate the new dimensions?

This process allegedly creates new dimensions for a cube
No, it doesn't, nor does it claim to. The commentary in the code doesn't say anything about creating a dimension; it says Create Dimension Name. That is, it is just working out what the dimension name should be for use in the DimensionElementInsert function. The attribute provides the base name for a dimension that should already exist. (Though it's something of an article of faith, given that the DimensionExists function isn't called at any point. Of course, given the complete absence of error handling in TI, there isn't much you could do about it even if it didn't exist.) The section of code above the one you have highlighted does NOT attempt to create a dimension - the DimensionCreate function is not called anywhere here - it simply parses the attribute value, character by character, replacing any spaces with underscores (after sticking rp_ in front of it) to get the correct dimension name.
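For illustration only, the transformation that section of code performs amounts to the following, sketched here in C++ rather than TI and with a hypothetical function name; the real process does the same thing character by character with TI string handling:

#include <string>

// Sketch of the name derivation described above: put "rp_" in front of
// the attribute value and replace any spaces with underscores.
std::string deriveDimensionName(const std::string& attributeValue)
{
    std::string name = "rp_" + attributeValue;
    for (char& c : name)
    {
        if (c == ' ')
        {
            c = '_';
        }
    }
    return name; // e.g. "Gross Margin" -> "rp_Gross_Margin"
}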
Another attribute defines what the top element in the dimension should be. If that element does not exist, the code that you have highlighted creates it.
The comment by Nick McDermaid is correct; you CANNOT add dimensions to an existing cube. You can export the data, destroy the cube, build a new cube with the same name but with extra dimension(s), and import the old data into it, but that's different. And the import process would need to have some code to select the appropriate element(s) of the new dimension(s) to use when writing the data.
Isn't that why they add new elements to the measure dimension instead when there's a need for adding more dimensionality to a cube?
Measures dimensions do not exist, as such, in TM1. A dimension of a cube can be flagged as a "measure" dimension for communication with other systems that may need one, but that flag has no impact within TM1 itself. For convenience the last dimension of a cube is often referred to as "the measures dimension", but it has no significance beyond being a convenient name for identifying the dimension that holds the metrics that are stored in the cube.
More importantly, dimensions are dimensions, elements are elements. When you add elements to a dimension you are NOT changing the dimensionality of the cube. (You may, and probably will, be changing the sparsity, but that's a completely different concept.) The only way to do that is by adding new dimensions to the cube which, as noted above, you can't actually do; you are instead destroying the old cube and replacing it with a new one which just happens to have the same name and a different number of dimensions. Given that doing so will trash every single slice, active form, view etc. that was ever written for the cube, it's not something that is, or should be, done very often in practice.

Related

Which is better in performance: a dynamic array determined at run-time vs a class structure in VBA?

I have created a spreadsheet userform that allows users to input information, do surface-level data verification, and other things. The problem I run into is that part of it involves adding all the items in a submission (the input this is used for could relate to one or more items, sometimes in the high 30s), so I had to create an array and have the array write to the new sheet. This is terribly slow.
I've been learning about classes and thinking that they may provide a suitable alternative to an array, since each item has exactly the same information needed (QTY, UOM, Type, etc.). I was thinking of doing 2 classes: one member class for the item, and another that acts as the collection and composes the member objects.
My question is whether the performance of doing this would be better than using an array. Are arrays generally the better way of handling collections of data that need to be read in and written out to a sheet?

Dimension values not visible

I have created a sample SSAS cube, which has one dimension and a measure.
I have linked the dimension and measure using the Dimension Usage tab. The scenario is that not all dimension values are present in the fact table.
I have deployed and processed the cube.
When I browse only the dimension, the attribute does not show all the values that are in the database; it shows only the values that are in the fact table.
I found various links suggesting a full process of the dimension, but it didn't work.
Whether or not a dimension value appears in the measure, I want to see all the values when I browse only the dimension.
Is there a non-empty property that I am missing here? Please let me know.
Isn't there a 'show empty cells' option in Excel and SSMS? I think you right-click to enable it.

Deleting elements in a dimension and rebuilding them in TM1

Is there a restriction for using DimensionDeleteAllElements() in TM1 wherein it can't work in tandem with a dimension update process that's called from the TI which houses DimensionDeleteAllElements()?
I've a TI which deletes all elements of a dimension using DimensionDeleteAllElements() and subsequently rebuilds it by calling another TI process which updates the dimension with elements from the database. This serves to weed out unnecessary elements.
After successful execution of this TI, I find that the elements are wiped out of the dimension. But the dimension fails to get rebuilt. However, according to the tm1server log, the secondary TI that updates the dimension with database elements completes its execution normally. Also, running the dimension update TI manually works fine and updates the dimension with elements from the database.
Should I put the contents of the dimension update process directly into this TI instead of calling it?
Let me state this plainly... you should emphatically NOT, under any circumstances, be doing what you are doing.
The general consensus amongst TM1 experts is that except in very, very exceptional cases (such as creating a reference dimension which is not used in any cubes), DimensionDeleteAllElements() is too dangerous to be used. (Example 1, Example 2.) If the TI process fails part way through you can lose your elements. Lose your elements, and you lose your data.
You haven't specified the tab on which you're making that call but let me explain how a metadata update (currently) works. (It works a bit differently with the new functions like DimensionElementInsertDirect, or the new Restful API which is stateless, but for the purposes of this exercise it still applies.)
Any changes that you make to a dimension in the Prolog or Metadata tabs will cause a copy of the dimension to be made in memory.
After the last row of the datasource (if any) is processed on the Metadata tab or, if there is no datasource, after execution passes from the Prolog through the Metadata tab, the copies of the changed dimensions will be checked for integrity and, if they pass that check, will be registered as replacements for the original dimension objects.
Before the second of those things happens, however, the rest of the system does not know about the copy of the dimension. The copies are similar to private objects that only the TI process itself knows about.
So what is happening in your case is this:
Your first process executes the DimensionDeleteAllElements command. This causes a copy of the dimension to be created and all of the elements to be removed.
I would guess that you are calling the second process on your Prolog tab. (I would hope it's not the Metadata tab, otherwise you'll be executing the call once for every row in your record source, if any.)
When that process is called it will do the rebuild of the dimension. It will do this by creating its own copy of the dimension in memory, quite separate from the first process' one, updating that copy, then registering it as the new dimension once it passes its own Metadata tab.
Control will then return to the Prolog of the first process which, you may remember, still has its own copy of the dimension in memory, one which now has no elements. Once the first process passes the end of its own Metadata tab, it will do the integrity check (an absence of elements does not cause that check to fail) and register that dimension copy as the updated dimension, thus obliterating (or more precisely overwriting) the changes that the second process made.
The solution? If you are going to be calling DimensionDeleteAllElements (and you generally shouldn't) then you must do it in the Prolog of the same process that rebuilds the dimension. In that way the element deletion and the re-addition of the elements from the data source happens to the same copy of the dimension, and the resulting dimension is then registered.
You should not ever be removing N or S elements that contain data in cubes. These should never be "unnecessary elements" to be "weeded out". Doing so can cause hard to explain changes in cube values (since data vanishes with the elements) which is toxic from an auditing point of view.
C level elements are a different matter. If your intent is to remove all of those and allow the current hierarchy to be rebuilt from the source, it would be best to just iterate through the dimension elements (backwards) using the DimSiz and DimNm functions, and using the DType function to return the element type so that you can identify and delete consolidations. This is obviously done in the Prolog.
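As an illustration of why that loop runs backwards (the real code would use DimSiz, DimNm, DType and the element deletion function; the C++ below is only an analogy over an indexed collection), deleting an entry shifts the positions of everything after it, so counting down from the end is the safe direction:

#include <vector>

// Analogy of the backwards deletion pattern: walk from the last index to
// the first, removing consolidated ('C') entries as they are found.
void deleteConsolidations(std::vector<char>& elementTypes) // 'N', 'S' or 'C'
{
    for (std::size_t i = elementTypes.size(); i > 0; --i)
    {
        if (elementTypes[i - 1] == 'C')
        {
            elementTypes.erase(elementTypes.begin() + (i - 1));
        }
    }
}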

Iterating and storing thousands/millions of objects efficiently

I am working on a simulation where I need to be able to handle thousands, potentially millions, of objects updating every loop.
All of the objects need to have their logic function called (AI).
But the location of the object determines how detailed the logic will be. For example:
[working with 100 objects to keep it simple]
All objects have a location (x,y).
20 objects are 500 points away from a 'point of interest' location.
50 objects are 500 points from the 20 objects (1000 points away).
30 objects are within 100 points from the point of interest.
Now say this was a detailed city simulation with the objects being virtual citizens.
At 6pm it's time for everyone to go home from their jobs and sleep.
So we iterate through all citizens, but I'm wanting them to do different things.
The furthest away objects (50): go home from their job and sleep until morning.
The closer objects (20): go home from their job, have a bite to eat, then sleep until morning.
The closest objects (30): go home from their job, have a bite to eat, brush teeth, then sleep until morning.
As you can see the closer they are to the point of interest the more detailed the logic becomes.
I am trying to work out what the best and most performance efficient way to iterate through all objects would be.
This would be relatively easy with a handful of objects, but as this needs to handle at least 500,000 objects efficiently, I need some advice.
Also, I'm not sure if I should iterate through all objects every loop, or whether it would be better to iterate through the closest objects every loop but only iterate through further away objects every 10 loops?
With the additional requirement of needing the objects to interact with other objects close to them, I have been thinking the best way to do this might be to organise them in a quadtree, but I'm not sure. It seems as though quadtrees are more for static content, but the objects I'm dealing with, as mentioned, have a location and are required to move to other locations.
Am I going down the right track of thinking? Or is there a 'better' way?
I am also working in C++ if anyone thinks it's relevant.
Any advice would be greatly appreciated.
NOTE:
The point of interest changes regularly, think of it as a camera view.
Objects are created and destroyed dynamically.
If you want to quickly select objects within a certain radius of a particular point, then a quad-tree or just a simple square grid will help.
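A minimal sketch of the simple square grid idea (the cell size, key scheme and container choices here are assumptions, not recommendations of specific values):

#include <cstdint>
#include <unordered_map>
#include <vector>

// Bucket object ids by grid cell so that a radius query only has to look
// at the handful of cells overlapping the search area.
struct SquareGrid
{
    int cellSize = 256; // arbitrary; tune to the typical query radius

    std::unordered_map<std::uint64_t, std::vector<int>> cells;

    std::uint64_t key(int x, int y) const
    {
        // Pack the two cell coordinates into one key (assumes non-negative
        // coordinates; use floor division if negatives are possible).
        return (static_cast<std::uint64_t>(x / cellSize) << 32) |
               static_cast<std::uint32_t>(y / cellSize);
    }

    void insert(int id, int x, int y)
    {
        cells[key(x, y)].push_back(id);
    }
};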
If your problem is how to store millions of objects to make iteration through them efficient, then you could probably use a column-based technique, where instead of having 1 million objects each with 5 fields, you have 5 arrays of 1 million elements each. In this case each object is just an index in the range 0 .. 999999. So, for example, say you want to store 1 million objects of the following structure:
struct resident
{
    int x;
    int y;
    int flags;
    int age;
    int health; // This is a computer game, right?
};
Then, instead of declaring resident residents [1000000] you declare 5 arrays:
int resident_x [1000000];
int resident_y [1000000];
int resident_flags [1000000];
int resident_age [1000000];
int resident_health [1000000];
And then, instead of, say, residents [n].x you use resident_x [n]. Such a way of storing objects may be faster when you need to iterate through all objects of the same type and do something with a couple of fields in each object (with the same set of fields in each object).
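As a usage sketch of that point, an update pass that only touches two of the five columns above never pulls the other three into the cache (the threshold of 80 below is just an arbitrary example value):

// Iterate over just the columns the update needs; the access pattern
// stays sequential, which is the main benefit of the layout above.
void agePopulation()
{
    for (int i = 0; i < 1000000; i++)
    {
        resident_age [i] += 1;
        if (resident_age [i] > 80)
            resident_health [i] -= 1;
    }
}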
You need to break the problem down into "classes", just like in the real world. Each person's class is computed from the distance. So lower-class people are far away and upper-class people are close. Or more correctly, "far class", "nearish class" and "here class", or whatever you want to name them.
1) Make an array with one slot for each class. Each slot will hold a "linked list" of the people in that class. When a person's class changes (social climbers), it is very quick to move the object to another list.
2) So put everybody into the proper classes and iterate only the classes close to you, as sketched below. In a proper scenario there are objects which are too far away to care about, so you can put those back to disk and only reload them when you get nearer.
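A minimal sketch of those two steps, assuming three classes and using std::list so that moving a person between classes is a constant-time splice (the names and fields are illustrative only):

#include <array>
#include <cstddef>
#include <list>

struct Person { int id; int x; int y; };

enum DistanceClass { Here = 0, Nearish = 1, Far = 2 };
const std::size_t ClassCount = 3;

// One list per class; iterate buckets[Here] every loop and the others
// less often, moving people between lists as their distance changes.
std::array<std::list<Person>, ClassCount> buckets;

void reclassify(std::list<Person>& from,
                std::list<Person>::iterator person,
                std::list<Person>& to)
{
    // splice relinks the node in place; nothing is copied or reallocated.
    to.splice(to.end(), from, person);
}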
There are a few questions embedded in there:
-How to deal with large quantities of objects? If there is a constant number of fixed objects, you may be able to simply create an array of them, as long as you have sufficient memory. If you need to dynamically create and destroy them, you put yourself at risk for memory leaks without careful handling of destroyed objects. At a certain point, you may ask yourself whether it is better to use another application, such as a database, to store your objects, and perform just the logic in your C++ code. Databases will provide additional functionality that I will highlight.
-How to find objects within a given distance of others. This is a classic problem for geographic information systems (GIS); it sounds like you are trying to operate a simple GIS to store your objects and their attributes, so it is applicable. It takes computation power to test SQRT((X-x)^2+(Y-y)^2), the distance formula, on every point. Instead, it is common to use a 'windowing function' to extract a square containing all the points you want, and then search within this to find points that lie specifically within a given radius (see the first sketch after this list). Some databases are optimized to perform a variety of GIS functions, including returning points within a given radius, or returning points within some other geometry like a polygon. Otherwise you'll have to program this functionality yourself.
-Tree storage of objects. This can improve speed, but you will hit a tradeoff if the objects are constantly moving around, wherein the tree has to be restructured often. It all depends on how often things move versus how often you want to do calculations on them.
-AI code. If you're trying to do AI on millions of objects, this may be your biggest use of performance, rather than the methodology used to store and search the objects. You're right in that simpler code for points farther away will increase performance, as will executing the logic less often for far away points. This is sometimes handled using Monte Carlo analysis, where the logic will be performed on a random subset of points during any given iteration, and you could have the probability of execution decrease as distance from the point of interest increases (see the second sketch below).
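A sketch of the windowing idea from the distance point above (plain C++ rather than a database, and comparing squared distances so no SQRT is needed):

#include <cmath>
#include <vector>

struct Point { double x; double y; int id; };

// Cheap square window test first; exact circle test only on survivors.
std::vector<int> withinRadius(const std::vector<Point>& pts,
                              double cx, double cy, double r)
{
    std::vector<int> result;
    const double r2 = r * r;
    for (const Point& p : pts)
    {
        if (std::fabs(p.x - cx) > r || std::fabs(p.y - cy) > r)
            continue; // outside the bounding square
        const double dx = p.x - cx;
        const double dy = p.y - cy;
        if (dx * dx + dy * dy <= r2) // compare squared distances, no sqrt
            result.push_back(p.id);
    }
    return result;
}

And a sketch of the Monte Carlo idea from the AI point, where the chance of running an object's logic on a given iteration falls off with its distance (the falloff constant is an arbitrary tuning choice):

#include <cmath>
#include <random>

// Probability 1 at the point of interest, roughly 0.1 at distance 1000.
bool shouldUpdate(double distance, std::mt19937& rng)
{
    const double p = std::exp(-distance / 434.0);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng) < p;
}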
I would consider using a Linear Quadtree with Morton Encoding / Z-Order indexing. You can further optimize this structure by using a Bit Array to represent nodes that contain data and very quickly perform calculations.
I've done this extremely efficiently in the browser using Javascript and I can traverse through 67 million nodes in sub-seconds. Once I've narrowed it down to the region of interest, I look up the data in a different structure. All of it still in milliseconds. I'm using this for spatial vector animation.
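For what it's worth, here is a minimal sketch of the Morton / Z-order encoding step for 16-bit coordinates (the bit-spreading constants are the standard ones; how you index the resulting codes is up to you):

#include <cstdint>

// Spread the 16 bits of a coordinate apart so they occupy the even bit
// positions of a 32-bit word.
std::uint32_t spreadBits(std::uint32_t v)
{
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Interleave x (even bits) and y (odd bits). Nearby (x, y) pairs tend to
// get nearby codes, which is what makes a linear quadtree work well with
// a sorted array or bit array.
std::uint32_t mortonEncode(std::uint16_t x, std::uint16_t y)
{
    return spreadBits(x) | (spreadBits(y) << 1);
}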

How should I change my Graph structure (very slow insertion)?

This program I'm doing is about a social network, which means there are users and their profiles. The profiles structure is UserProfile.
Now, there are various possible Graph implementations and I don't think I'm using the best one. I have a Graph structure and inside there's a pointer to a linked list of type Vertex. Each Vertex element has a value, a pointer to the next Vertex, and a pointer to a linked list of type Edge. Each Edge element has a value (so I can define weights and whatever else is needed), a pointer to the next Edge, and a pointer to the Vertex owner.
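For reference, the layout you describe sounds roughly like this (the field names are guesses, and UserProfile is assumed to be defined elsewhere):

struct UserProfile;   // defined elsewhere in the project
struct Vertex;

struct Edge
{
    int            value;  // weight "and whatever else is needed"
    struct Edge   *next;   // next edge in this vertex's edge list
    struct Vertex *owner;  // the Vertex this edge is attached to
};

struct Vertex
{
    struct UserProfile *value; // the profile stored in the vertex
    struct Vertex      *next;  // next vertex in the graph's list
    struct Edge        *edges; // head of this vertex's edge list
};

struct Graph
{
    struct Vertex *vertices;   // head of the linked list of vertices
};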
I have 2 sample files with data to process (in CSV style) and insert into the Graph. The first one is the user data (one user per line); the second one is the user relations (for the graph). The first file is quickly inserted into the graph because I always insert at the head, and there are about ~18000 users. The second file takes ages, even though I still insert the edges at the head. The file has about ~520000 lines of user relations and takes between 13-15 minutes to insert into the Graph. I made a quick test and reading the data is pretty quick, instantaneous really. The problem is in the insertion.
This problem exists because I have a Graph implemented with linked lists for the vertices. Every time I need to insert a relation, I need to look up 2 vertices so I can link them together. This is the problem... Doing this for ~520000 relations takes a while.
How should I solve this?
Solution 1) Some people recommended that I implement the Graph (the vertices part) as an array instead of a linked list. This way I have direct access to every vertex and the insertion time will probably drop considerably. But I don't like the idea of allocating an array with [18000] elements. How practical is this? My sample data has ~18000, but what if I need much less or much more? The linked list approach has that flexibility; I can have whatever size I want as long as there's memory for it. But the array doesn't. How am I going to handle such a situation? What are your suggestions?
Using linked lists is good for space complexity but bad for time complexity. And using an array is good for time complexity but bad for space complexity.
Any thoughts about this solution?
Solution 2) This project also demands that I have some sort of data structure that allows quick lookup based on a name index and an ID index. For this I decided to use Hash Tables. My tables are implemented with separate chaining as collision resolution, and when a load factor of 0.70 is reached, I normally recreate the table. I base the next table size on this Link.
Currently, both Hash Tables hold a pointer to the UserProfile instead of duplicating the user profile itself. That would be stupid; changing data would require 3 changes and it's really dumb to do it that way. So I just save the pointer to the UserProfile. The same user profile pointer is also saved as the value in each Graph Vertex.
So, I have 3 data structures, one Graph and two Hash Tables and every single one of them point to the same exact UserProfile. The Graph structure will serve the purpose of finding the shortest path and stuff like that while the Hash Tables serve as quick index by name and ID.
What I'm thinking to solve my Graph problem is to, instead of having the Hash Tables value point to the UserProfile, I point it to the corresponding Vertex. It's still a pointer, no more and no less space is used, I just change what I point to.
Like this, I can easily and quickly lookup for each Vertex I need and link them together. This will insert the ~520000 relations pretty quickly.
I thought of this solution because I already have the Hash Tables and I need to have them, so why not take advantage of them for indexing the Graph vertices instead of the user profiles? It's basically the same thing; I can still access the UserProfile pretty quickly, just go to the Vertex and then to the UserProfile.
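A sketch of that lookup-and-link step under solution 2, using std::unordered_map purely as a stand-in for your own separate-chaining table and reusing the hypothetical Edge/Vertex fields sketched earlier (it assumes both users were already inserted from the first file):

#include <string>
#include <unordered_map>

// The ID index, now pointing at the Vertex instead of the UserProfile.
std::unordered_map<std::string, Vertex*> vertexById;

// Each relation becomes two O(1) lookups plus two head insertions,
// instead of two walks down the vertex linked list.
void addRelation(const std::string& idA, const std::string& idB, int weight)
{
    Vertex *a = vertexById.at(idA);
    Vertex *b = vertexById.at(idB);

    Edge *ab = new Edge{weight, a->edges, b}; // edge a -> b at the head
    a->edges = ab;

    Edge *ba = new Edge{weight, b->edges, a}; // edge b -> a (undirected)
    b->edges = ba;
}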
But, do you see any cons on this second solution against the first one? Or only pros that overpower the pros and cons on the first solution?
Other Solution) If you have any other solution, I'm all ears. But please explain the pros and cons of that solution over the previous 2. I really don't have much time to waste on this right now; I need to move on with this project, so if I'm going to make such a change, I need to understand exactly what to change and whether that's really the way to go.
Hopefully no one fell asleep reading this and closed the browser, sorry for the big testament. But I really need to decide what to do about this and I really need to make a change.
P.S: When answering my proposed solutions, please enumerate them as I did so I know exactly what you are talking about and don't confuse myself more than I already am.
Since the main issue here is speed, I would prefer the array approach (solution 1).
You should, of course, maintain the hash table for the name-index lookup.
If I understood correctly, you only process the data one time. So there is no dynamic data insertion.
To deal with the space allocation problem, I would recommend:
1 - Read the file once, to get the number of vertices.
2 - Allocate that space.
If your data is dynamic, you could implement some simple method to increase the array size in steps of 50%.
3 - In the Edges, substitute your linked list with an array. This array should be dynamically incremented in steps of 50%.
Even with the "extra" space allocated, when you increase the size in steps of 50%, the total size used by the array should only be marginally larger than the size of the linked list.
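A sketch of the growable vertex array with 50% steps (in C++ a std::vector would already handle the regrowth for you, but the manual version makes the policy explicit; the ArrayVertex fields are illustrative only and assumed trivially copyable):

#include <cstdlib>

struct UserProfile;            // defined elsewhere in the project
struct Edge;

struct ArrayVertex             // array version: no 'next' pointer needed
{
    struct UserProfile *profile;
    struct Edge        *edges; // still a list (or array) of edges
};

struct VertexArray
{
    struct ArrayVertex *data;
    int                 count;
    int                 capacity;
};

// Grow by 50% whenever the array is full, so the amortised cost of an
// append stays constant even if the first-pass count was too low.
// (Allocation failure handling omitted for brevity.)
void pushVertex(struct VertexArray *a, struct ArrayVertex v)
{
    if (a->count == a->capacity)
    {
        a->capacity = (a->capacity == 0) ? 16
                                         : a->capacity + a->capacity / 2;
        a->data = (struct ArrayVertex *) std::realloc(
            a->data, a->capacity * sizeof(struct ArrayVertex));
    }
    a->data[a->count++] = v;
}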
I hope I could help.

Resources