How to understand the ARMv8 AArch64 MMU table descriptor format in the diagram?

The diagram below is taken from ARMv8-A Programmer's Guide:
I am a bit confused by the highlighted entry type. Let me state my current understanding first (assume a stage 1 translation at EL3 with a 4KB granule size).
In the first place, it seems that the terms "entry" and "descriptor" are interchangeable in this context.
Then, according to the ARM ARM (e.g. Figure D5-6), there seem to be 3 types of valid descriptors:
D_Table descriptor, which is an entry pointing to the next level translation table. Reaching this type of descriptor means that the translation walk is not complete yet.
D_Block descriptor, which is an entry pointing to a memory "region" which is bigger than the granule size ("page"). This is one of the two cases where a translation walk completes.
D_Page descriptor, which points to a memory "page" of the granule size. This is the second case where a translation walk completes.
There are also several constraints (w.r.t. valid descriptors), namely:
An L0 table can only contain D_Table descriptors, not the other two types (D_Block, D_Page).
L3 can only contain D_Page descriptors.
D_Block can only appear in L1/L2 translation tables.
So back to the diagram above, I don't understand why there is another Table entry for L1 and L2. The 1st line in the diagram already depicts the table descriptor for L0/L1/L2. And even if this is another table descriptor type, why is the middle field marked as "Output block address" (instead of "Next level table address")?

I think I see where you're stuck; this was a pain for me to understand too.
In the first place, it seems that the terms "entry" and "descriptor" are interchangeable in this context.
This is not true. A table descriptor points to another translation table; a table entry (a block or page entry) points to output memory. Since a page entry is never valid above the last level and a table descriptor is never valid at the last level (L3 in your 4KB-granule scenario), ARM is able to re-use the same bits[1:0] = 0b11 encoding to mean two semantically different things. This makes your desire to distinguish D_Table from D_Page somewhat murky, since what PTE[1:0] means is determined by the level the PTE was read from (L0/L1/L2? it's D_Table! L3? it's D_Page!).
This is kind of a pain for us as software people, but it really helps the hardware implementation, since the same decoding logic can be re-used for both table descriptors and block/page entries. If you're an optimistic type, you can instead think of it as ARM letting you use the same format for generating both table links and page entries, which can ease your own implementation a bit.
So back to the diagram above, I don't understand why there is another Table entry for L1 and L2.
So, why are there two rows? It's mainly that although table descriptors and block/page entries share most of the same encoding, entries support things that descriptors do not: namely, everything in the "lower attributes" field (shareability, fine-grained access permissions, the MAIR index, and so on) that doesn't generally make sense to apply hierarchically from a table descriptor (which MAIR index would you apply if L0 says one thing and L1 says another?). And because a block or page entry completes the walk, its address field is the final output address rather than a pointer to another table, which is why the diagram labels it "Output block address" instead of "Next level table address".
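If a concrete decode helps, here is a minimal Python sketch of the point above, assuming the 4KB granule / stage 1 scenario from your question. The example descriptor value is made up; the bit positions shown are the standard lower-attribute fields of a block/page entry, and this is an illustration rather than a complete decoder.

    def classify_descriptor(desc: int, level: int) -> str:
        """Type of a 64-bit stage 1 descriptor fetched at translation level 0-3."""
        if desc & 0b01 == 0:
            return "invalid"
        if desc & 0b10:                       # bits[1:0] == 0b11
            return "D_Table" if level < 3 else "D_Page"
        return "D_Block" if level in (1, 2) else "invalid"   # bits[1:0] == 0b01

    def lower_attributes(desc: int) -> dict:
        """A few of the stage 1 lower attributes carried only by block/page entries."""
        return {
            "AttrIndx": (desc >> 2) & 0b111,  # index into MAIR_ELx
            "NS":       (desc >> 5) & 0b1,    # non-secure bit
            "AP":       (desc >> 6) & 0b11,   # access permissions
            "SH":       (desc >> 8) & 0b11,   # shareability
            "AF":       (desc >> 10) & 0b1,   # access flag
        }

    desc = 0x0000_0000_4000_0703              # made-up descriptor value
    print(classify_descriptor(desc, level=1)) # D_Table: address field = next-level table
    print(classify_descriptor(desc, level=3)) # D_Page:  address field = output page
    print(lower_attributes(desc))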

SCD-2 in data modelling: how do I detect changes?

I know the concept of SCD-2 and I'm trying to improve my skills with it by doing some practice.
I have the following scenario/experiment:
I call a REST API daily to extract information about companies.
In my initial load to the DB everything is new, so everything is very easy.
The next day I call the same REST API, which might return the same companies, but some of them might (or might not) have changes (e.g., they changed their size, their profits, their location, ...)
I know SCD-2 might be really simple if the REST API returned only records with changes, but in this case it may also return records without changes.
In this scenario, how do people detect whether a company's data has changed in order to apply SCD-2? Do they compare all the fields?
Is there any example out there that I can see?
There is no standard SCD-2, nor even a unique concept of it. It is a general term for a large number of possible approaches. The only way is to practice and see what is suitable for your use case.
In any case you must identify the natural key of the dimension and the set of attributes for which you want to keep history.
You may of course make it more elaborate by deciding to use your own surrogate key.
You mentioned that there are two main types of interface for the process:
• You periodically get a full set of the dimension data
• You get the "changes only" (aka a delta interface)
Paradoxically, the former is much simpler to handle than the latter.
First of all, in the full dimensional snapshot the natural key is unique, unlike the delta interface (where you may get several changes for one entity).
Additionally, with a delta interface you have to handle late delivery of changes, or even changes delivered in the wrong order.
The next important decision is whether you expect deletes to occur. This is again trivial with the full interface; for the delta interface you must define some convention for how this information is passed.
A related question is whether a previously deleted entity can be reused (i.e. reappear in the data).
If you support delete/reuse you'll have to think about how to represent them in your dimension table.
In any case you will need some additional columns in the dimension to cover the historical information.
Some implementations use a change_timestamp; others use a validity interval with valid_from and valid_to.
Still other implementations argue that an additional sequence number is required, so you avoid the trap of multiple changes with an identical timestamp.
So you see that before you look for a particular implementation, you need to carefully decide on the options above. For example, the full and the delta interface lead to completely different implementations.
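For the full-snapshot case, the usual answer to "do they compare all the fields?" is: compare only the attributes you decided to historize, typically via a hash of those attributes. A rough Python sketch of that idea (the column names company_id, valid_from, valid_to and the hash column are illustrative assumptions, not a prescribed design):

    import hashlib
    from datetime import date

    TRACKED = ("name", "size", "profits", "location")   # attributes you chose to historize

    def row_hash(row):
        """Hash only the tracked attributes, in a fixed order."""
        payload = "|".join(str(row.get(col, "")) for col in TRACKED)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def apply_scd2(open_rows, history, snapshot, today):
        """open_rows: natural key -> current open version; history: closed versions."""
        for row in snapshot:
            key = row["company_id"]
            new_hash = row_hash(row)
            existing = open_rows.get(key)
            if existing is not None and existing["hash"] == new_hash:
                continue                        # unchanged record in the snapshot, skip it
            if existing is not None:
                existing["valid_to"] = today    # close the current version
                history.append(existing)
            open_rows[key] = {**row, "hash": new_hash,
                              "valid_from": today, "valid_to": None}

    # Day 1 and day 2 snapshots from the (hypothetical) REST API
    open_rows, history = {}, []
    apply_scd2(open_rows, history, [{"company_id": 1, "name": "Acme", "size": "S"}], date(2024, 1, 1))
    apply_scd2(open_rows, history, [{"company_id": 1, "name": "Acme", "size": "M"}], date(2024, 1, 2))
    print(len(history), open_rows[1]["size"])   # 1 closed version, open version has size "M"

In a database you would store the hash alongside the row, so the "did anything change?" test becomes a single equality comparison instead of a field-by-field compare.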

Deleting elements in a dimension and rebuilding them in TM1

Is there a restriction on using DimensionDeleteAllElements() in TM1 that prevents it from working in tandem with a dimension update process called from the TI that contains the DimensionDeleteAllElements() call?
I have a TI which deletes all elements of a dimension using DimensionDeleteAllElements() and subsequently rebuilds it by calling another TI process which updates the dimension with elements from the database. This serves to weed out unnecessary elements.
After this TI executes successfully, I can see that the elements have been wiped out of the dimension, but the dimension fails to get rebuilt. However, according to the tm1server log, the secondary TI that updates the dimension with database elements completes its execution normally. Also, running the dimension update TI manually works fine and updates the dimension with elements from the database.
Should I put the contents of the dimension update process directly in this TI instead of calling it?
Let me state this plainly... you should emphatically NOT, under any circumstances, be doing what you are doing.
The general consensus amongst TM1 experts is that except in very, very exceptional cases (such as creating a reference dimension which is not used in any cubes), DimensionDeleteAllElements() is too dangerous to be used. If the TI process fails part way through you can lose your elements. Lose your elements, and you lose your data.
You haven't specified the tab on which you're making that call but let me explain how a metadata update (currently) works. (It works a bit differently with the new functions like DimensionElementInsertDirect, or the new Restful API which is stateless, but for the purposes of this exercise it still applies.)
Any changes that you make to a dimension in the Prolog or Metadata tabs will cause a copy of the dimension to be made in memory.
After the last row of the datasource (if any) is processed on the Metadata tab or, if there is no datasource, once execution has passed the point where the Metadata tab would run, the copy (or copies) of the changed dimensions will be checked for integrity and, if they pass that check, will be registered as replacements for the original dimension objects.
Before that registration happens, however, the rest of the system does not know about the copy of the dimension. It is similar to a private object that only the TI process itself knows about.
So what is happening in your case is this:
Your first process executes the DimensionDeleteAllElements command. This causes a copy of the dimension to be created and all of the elements to be removed.
I would guess that you are calling the second process on your Prolog tab. (I would hope it's not the Metadata tab, otherwise you'll be executing the call once for every row in your record source, if any.)
When that process is called it will do the rebuild of the dimension. It will do this by creating its own copy of the dimension in memory, quite separate from the first process' one, updating that copy, then registering it as the new dimension once it passes its own Metadata tab.
Control will then return to the Prolog of the first process which, you may remember, still has its own copy of the dimension in memory, one which now has no elements. Once the first process passes the end of its own Metadata tab, it will do the integrity check (an absence of elements does not cause that check to fail) and register that dimension copy as the updated dimension, thus obliterating (or more precisely overwriting) the changes that the second process made.
The solution? If you are going to be calling DimensionDeleteAllElements (and you generally shouldn't) then you must do it in the Prolog of the same process that rebuilds the dimension. In that way the element deletion and the re-addition of the elements from the data source happens to the same copy of the dimension, and the resulting dimension is then registered.
You should not ever be removing N or S elements that contain data in cubes. These should never be "unnecessary elements" to be "weeded out". Doing so can cause hard to explain changes in cube values (since data vanishes with the elements) which is toxic from an auditing point of view.
C level elements are a different matter. If your intent is to remove all of those and allow the current hierarchy to be rebuilt from the source, it would be best to just iterate through the dimension elements (backwards) using the DimSiz and DimNm functions, and using the DType function to return the element type so that you can identify and delete consolidations. This is obviously done in the Prolog.
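If it helps to see why that loop runs backwards, here is a small Python analogy (not TI code; the element names and the consolidation set are invented): deleting by position while walking forward shifts the positions of the items you have not visited yet, whereas walking from the end leaves them untouched, which is exactly why the DimSiz-down-to-1 pattern is the safe one.

    elements = ["Total", "North", "South", "Regions", "East"]
    is_consolidation = {"Total", "Regions"}     # stand-ins for DType(...) = 'C'

    for i in range(len(elements), 0, -1):       # DimSiz .. 1, like the TI loop
        name = elements[i - 1]                  # DimNm is 1-based, hence i - 1 here
        if name in is_consolidation:
            del elements[i - 1]                 # safe: earlier positions are untouched

    print(elements)                             # ['North', 'South', 'East']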

child node in MBR (R-Tree Implementation)

I am new to the R-Tree concept, so sorry if I ask a very basic question about it. I have read some literature on R-Trees to get the basic concept. However, I could not understand the clustering or grouping steps in an MBR. What's bothering me is:
How many points or objects can fit in each MBR? I can see that the number of objects stored in each MBR varies. So is there any condition, procedure, or formula to determine how many objects will be stored in each MBR?
Thanks for your help!
Read the R-tree publication, or a book on index structures.
You fix a page size (because the R-tree is a disk-oriented data structure, this should be something like 8 kB).
If a page gets too empty, it will be removed. If a page is too full, it will be split.
Just like with pretty much any other page-based tree, actually (e.g. B-tree).
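So the number of objects per MBR is not a property of the data; it falls out of the page size you picked and the size of one entry, plus a minimum fill factor. A back-of-the-envelope Python sketch (all the sizes are made-up assumptions):

    # All sizes here are illustrative assumptions, not something the R-tree fixes for you.
    PAGE_SIZE = 8 * 1024           # an 8 kB disk page
    ENTRY_SIZE = 4 * 8 + 8         # one 2-D MBR (4 doubles) plus a child pointer / record id
    PAGE_HEADER = 32               # per-page bookkeeping

    max_entries = (PAGE_SIZE - PAGE_HEADER) // ENTRY_SIZE   # capacity M of one node (one MBR)
    min_entries = max_entries * 40 // 100                   # a typical minimum fill, e.g. 40% of M

    # A node is split when an insert would exceed max_entries, and its entries are
    # merged or reinserted when it drops below min_entries.
    print(max_entries, min_entries)                         # 204 and 81 with these numbers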

Co-worker argues that storing a serialized string of data in a database does not violate 1NF

1NF requires fields to be atomic; that is, each field should represent only a single value.
He says that because he doesn't expect the data to be searchable or readable, it is not in violation of normal form, and that the value represents a single object.
Is he right?
Define "atomic".
The most recent advances in theory reveal that the concept of "atomicity", on which the definition of 1NF (as typically understood) relies, is vague, and probably undefinable altogether.
For example, is a coordinate on a map an "atomic" value? Usually, such a value has clearly visible 'X' and 'Y' components, and the value of those components can be "drawn out of" your "atomic" value. And if something can be "drawn out of" something else, then it is suspect to claim that that "something else" is "atomic" in the usual sense of the word (i.e. not further decomposable).
Is using a value of type "coordinate on a map" then a violation of 1NF, for precisely that reason? That position is hard to maintain.
For such reasons, a single string holding a list of comma-separated values does not formally violate 1NF. That is not to say that actually designing your databases on this basis is a very good idea. Most of the time, it won't be. But formally speaking, it does not violate 1NF (or whatever is left of it).
A string is a single value. The fact that it can be split into smaller strings doesn't mean you are violating 1NF. If you are encoding a lot of information into strings then you may not be taking best advantage of your DBMS features (i.e. the ability to query the data and enforce constraints on it) but that's a different question.
The problem is that a single value can often actually be decomposed into separate values depending on the context (e.g., a varchar can be many char values, and a floating point number can be two separate numbers). If the serialized data is not relevant to the relation that's represented by the table, then it may be considered 1NF.
An Address field can contain a street name and city in a generic ContactInfo table, but the field wouldn't be considered atomic in an Addresses table that would have separate attributes for street name, city, ZIP, etc.
Yes, your co-worker is right.
The current wisdom is that a single value can be arbitrarily complex. It can even be a table. (In Chris Date's books, look for "relation-valued attribute".) Dates and timestamps are single values, but they both have internal structure.
But if a type does have internal structure, the dbms either ignores that internal structure (as SQL does if you SELECT CURRENT_TIMESTAMP) or it provides functions that operate on that internal structure (as SQL does if you SELECT EXTRACT(YEAR FROM CURRENT_TIMESTAMP)).
The key is that the user doesn't have to write any procedural code to manipulate the contents of that internal structure. Either the dbms provides those functions, or a database designer who creates new types provides those functions.
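The same idea expressed in Python rather than SQL, purely as an analogy: a datetime is treated as one value, yet its internal structure is reachable only through operations the type itself provides, so the caller never writes parsing code.

    from datetime import datetime

    ts = datetime(2024, 5, 17, 9, 30)   # one value...
    print(ts.isoformat())               # ...used whole (like SELECT CURRENT_TIMESTAMP)
    print(ts.year, ts.month)            # ...or via accessors the type provides
                                        # (like EXTRACT(YEAR FROM CURRENT_TIMESTAMP))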

Atomicity of field for part numbers

In our internal inventory application, we store three values (in separate fields) that become the printed "part number" in this format: PPP-NNNNN-VVVV (P = Prefix, N = Number, V = version).
So for example, if you have a part 010-00001-01 you know it's version 1 of a part of type "010" (which let's say is a printed circuit board).
So, in the process of creating parts engineering wants to group parts together by keeping the "number" component (the middle 5 digits) the same across multiple prefixes like so:
001-00040-0001 - Overall assembly
010-00040-0001 - PCB
015-00040-0001 - Schematics
This seems problematic and frustrating as it sometimes adds extra meaning to the "number" field (but not consistently since not all parts with the same "number" component are necessarily linked).
Am I being a purist or is this fine? 1NF is awfully vague with regards to atomicity. I think I'm mostly frustrated because of the extra logic to ensure that the next "number" part of the overall part number is valid and available for all prefixes.
There have been a number of enterprises that have foundered, or nearly foundered, on the "part number syndrome". You might be able to find some case studies. DEC part numbers were somewhat mixed up.
The customer is not always right, but the customer is always the customer.
In this case, it sounds to me like engineering is trying to use a single number to model a relationship; I mean the relationship between the overall assembly, the PCB, and the schematics. It's better to model relationships as relations. It allows you more flexibility down the road. You may have a hard time selling engineering on this point.
In my experience, regardless of database normative rules, when the client/customer/user wants something done a certain way, there is most likely a reason for it, and that reason will save them money (in some fashion). Sometimes it will save money by reducing steps, by reducing training costs, or simply because That's The Way It's Always Been. Whatever the reason, eventually you'll end up doing it because they're paying to have it done (unless it violates accounting rules).
In this instance, it sounds like an extra sorting criterion on some queries for reports, and a new 'allocated number' table with an auto-incrementing key. That doesn't sound too bad to me. Ask me sometime about the database report a client VP commissioned strictly to cast data in such a fashion as to make a different VP look bad in meetings (not that he told me that up front).
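To make the 'allocated number' table idea concrete, here is a small Python sketch; the sequence, names, and link structure are invented for illustration (in a real system the sequence would be a table with an auto-incrementing key, and the grouping would live in a relation, as suggested above):

    from itertools import count

    _number_seq = count(start=40)            # stands in for the auto-incrementing 'allocated number' key

    def allocate_number():
        """Reserve the middle NNNNN once, so it is valid and available for every prefix."""
        return next(_number_seq)

    def part_number(prefix, number, version):
        return f"{prefix}-{number:05d}-{version:04d}"

    # One allocation shared by the related parts, plus an explicit link for the grouping.
    n = allocate_number()
    parts = {prefix: part_number(prefix, n, 1)
             for prefix in ("001", "010", "015")}         # assembly, PCB, schematics
    assembly_links = [(parts["001"], parts[p]) for p in ("010", "015")]
    print(parts)           # {'001': '001-00040-0001', '010': '010-00040-0001', '015': '015-00040-0001'}
    print(assembly_links)  # the relationship is stored explicitly, not implied by the shared digits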
