How to perform data lineage using Erwin - database

Data lineage is defined as a kind of data life cycle that includes the data's origins and where it moves over time. This term can also describe what happens to data as it goes through diverse processes. Data lineage can help with efforts to analyze how information is used and to track key bits of information that serve a particular purpose.
I want to know if there is a specific way to perform data lineage using Erwin. I have searched but could not find a place where it clearly says how to perform data lineage. Please help.

Other than the Table editor and its "Where Used" tab, I don't believe there is... it's not going to build the traditional flow chart that other lineage tools create.

ERwin has the ability to natively capture this data lineage using two different features.
First, design layers: these allow a conceptual/logical model to be derived into another logical or physical model.
Or
Go to your ERwin model properties and make sure that the data movement properties of ERwin are enabled. This can be combined with the dimensional modeling features of the ERwin tool.
Once you have done this, you will be able to properly document data movement rules attached to certain tables, and in the column properties of a table there is now the ability to add a source-to-target mapping. That source-to-target mapping adds to the answer provided earlier about looking at the "Where Used" properties of a table.
The data lineage then shows up in the reports that provide the source-to-target mapping; comments on the mapping can be added as well.
To see this graphically, you will need the CA web portal. Once the model is published to the portal, model connections are created and additional semantic, data, and other lineage information becomes visible graphically. This, however, is another product in the ERwin solution suite.

Enabling the Data Movement option under General Model Properties
will allow you to specify the source table/field and a transformation rule for every attribute in your model.
This is the only ERwin option I know of for establishing source-to-target lineage for documentation purposes.
Ref:
https://support.ca.com/cadocs/0/CA%20ERwin%20%20Data%20Modeler%20r7%203%2011-ENU/Bookshelf_Files/HTML/ERwin%20Online%20Help/Set_General_Model_Properties.html
Hope this helps.

Related

Managing multiple datasources in CakePHP

I'm planning to develop a web application in CakePHP that shows information in graphics and cards. I chose CakePHP because the information that we need to show is very structured, so the model approach makes it easier to manage data; also, I have some experience with MVC from ASP.NET and I like how simple it is to use the routing.
So, my problem is that the multiple organizations that could use the app would each have their own database with a different schema than the one we need. I can't just set their connection string in the app.php file because their database won't match my model.
And the organization's datasource might not fit my model for a lot of reasons: the tables don't have the same names, the schema is different, the fields of my entity are spread across separate tables, maybe they have the info in different databases or even in a different DBMS!
I want to know if there's a way to make an interface that achieves this, in such a way that the CakePHP Model/Entity can use data regardless of the source. Do you have any suggestions of how to do that? Does CakePHP have an option to make this possible? Should I use PHP with some kind of markup language like JSON or XML? Maybe MySQL has a utility to transform data from different sources into a view, and I can make CakePHP use the view instead of the table?
In case you have an answer, please be as detailed as you can.
These other options are possible if it's impossible to make the interface:
- Use another framework that can handle this more easily and has the features I mentioned above.
- Make the organizations change their databases so they match my model (I don't like this one, and they probably won't do it).
- Transfer the data into the application's own database.
Additional information:
The data shown in the graphics is about university students. Each university has its own database with its own structure and its own applications using that db; that's why it isn't that easy to change the structure. I just want to make it as easy as possible for any school to configure its own db.
EDIT:
The version is CakePHP 3.2.
An important point is that it doesn't need all CRUD operations, only "reading". Hope that makes the solution easier.
I don't think your "question" can be answered properly; it doesn't contain enough information or details. I guess there is something that will stay the same for all organizations, but their data and business logic will be different. But I'll try.
And the organization's datasource might not fit my model for a lot of reasons: the tables don't have the same names, the schema is different, the fields of my entity are spread across separate tables, maybe they have the info in different databases or even in a different DBMS!
Model is a whole layer, so if you have completely different table schemas, your business logic, which is part of that layer, will be different as well. Simply changing the database connection alone won't help you then. The data needs to be shown in the views too, and those views will then have to differ as well.
So what you could try to do, and what your 2nd image shows, is to implement a layer that contains interfaces and base classes. Then create a CakePHP plugin for each of the organizations that uses these interfaces and base classes, and write some code that will conditionally load the plugin depending on whatever criterion (I guess domain or sub-domain) is checked. You will have to define the intermediate interfaces in a way that lets you access any organization the same way at the API level.
And one technical thing: you can define the connection of a table object in the model layer. Any entity knows about its origin, but you should not implement business logic inside an entity, nor change the connection through an entity.
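For example, in CakePHP 3.x a table class can pick its own datasource; a minimal sketch, assuming a hypothetical 'org_legacy' connection defined in config/app.php:

```php
<?php
// src/Model/Table/StudentsTable.php (hypothetical table class and connection name)
namespace App\Model\Table;

use Cake\ORM\Table;

class StudentsTable extends Table
{
    // Route every query made through this table to the 'org_legacy'
    // datasource defined in config/app.php instead of 'default'.
    public static function defaultConnectionName()
    {
        return 'org_legacy';
    }
}
```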
EDIT: The version is CakePHP 3.2. An important point is that it doesn't need all CRUD operations, only "reading". Hope that makes the solution easier.
If that's true, either use the CRUD plugin (yes, you can use only the R part of it) or write some code, like a class that describes the organization and is used to create your table objects and views on the fly.
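As a rough illustration of the "on the fly" idea (a sketch only; the organization description and all names are invented, the ORM calls are standard CakePHP 3.x):

```php
<?php
use Cake\Datasource\ConnectionManager;
use Cake\ORM\TableRegistry;

// Hypothetical description of one organization, e.g. loaded from a config file.
$org = [
    'connection' => 'org_legacy',                 // datasource key in config/app.php
    'tables'     => ['Students' => 'v_students'], // alias => real table/view name
];

// Build a table object on the fly, pointing at that organization's table or view.
$students = TableRegistry::get('Students', [
    'table'      => $org['tables']['Students'],
    'connection' => ConnectionManager::get($org['connection']),
]);

$rows = $students->find()->limit(10)->toArray();
```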
Overall it's a pretty interesting problem, but IMHO too broad for a simple answer or solution that can be given here. I think this would require some discussion and analysis to find the best solution. If you're interested in consulting you can contact me, check my profile.
I found a way without coding any interface. In fact, it uses some features already included in the DBMS and CakePHP.
In the case that the schema doesn't fit the model, you can create views to match the table names and column names expected by the model. By definition, a view works like a table, so CakePHP queries the same table name and columns and the DBMS does the work.
I made a test with views in MySQL and it worked fine. You can also combine the data from different tables.
MySQL views
SQL Server views.
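If you want to manage those compatibility views from the application side, one option (a sketch only; it assumes the official Migrations plugin, and the view and source table names are invented) is to create them with raw SQL in a migration:

```php
<?php
// config/Migrations/20170101000000_CreateStudentView.php (hypothetical)
use Migrations\AbstractMigration;

class CreateStudentView extends AbstractMigration
{
    public function up()
    {
        // Map the organization's own tables onto the name/columns my model expects.
        $this->execute("
            CREATE VIEW students AS
            SELECT p.person_id AS id,
                   p.full_name AS name,
                   e.faculty   AS faculty
            FROM org_people p
            JOIN org_enrollment e ON e.person_id = p.person_id
        ");
    }

    public function down()
    {
        $this->execute('DROP VIEW students');
    }
}
```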
If the user uses another DBMS, you just change the datasource in app.php and create the views if necessary.
If the data is distributed across different DBMSs, CakePHP lets you set a datasource for each table: you just add it to app.php and reference it in the table class if required.
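For reference, the extra connections live under the 'Datasources' key in config/app.php; a trimmed sketch for CakePHP 3.x, where the hosts, credentials and the 'org_legacy' key are placeholders:

```php
<?php
// config/app.php (excerpt)
return [
    'Datasources' => [
        // The application's own database
        'default' => [
            'className' => 'Cake\Database\Connection',
            'driver'    => 'Cake\Database\Driver\Sqlserver',
            'host'      => 'app-db.example.com',
            'username'  => 'app',
            'password'  => 'secret',
            'database'  => 'app_db',
        ],
        // One organization's database, used only by the tables that need it
        'org_legacy' => [
            'className' => 'Cake\Database\Connection',
            'driver'    => 'Cake\Database\Driver\Mysql',
            'host'      => 'uni-db.example.org',
            'username'  => 'readonly',
            'password'  => 'secret',
            'database'  => 'university',
        ],
    ],
];
```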
Finally, in case you just need the "reading" option, create a user with limited access to the views and only with SELECT privileges.
USING:
CakePHP 3.2
SQL Server 2016
MySQL 5.7

What database abstraction patterns are there?

I am trying to get my head around the common patterns for database abstraction.
So far I've found:
Database Layer
- just a separate class which holds the SQL
- does not conform to any other rules
Data Access Object (DAO)
- like above, but there is a transfer object which represents the columns of the database table
- create, delete, update methods take the filled transfer object as input
- the find methods may take an input like a string (findByName) or an integer (findByAge) but always return lists of transfer objects
Repository
- abstraction of a collection of objects
- closer to the domain model
- I need to read more here
Object Relational Mapper
- tool which gives me an object which is mapped to the database table in the background
- the object represents a row in the table
- a property change of the object leads to an update
Please don't worry too much about my quick explanations of the patterns. I am still in an understanding phase.
But is this list complete or are there other concepts which are missing here?
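To make the DAO bullet points concrete, here is roughly what I mean in PHP (the table, class names and the PDO wiring are just illustrative):

```php
<?php
// Transfer object: a dumb container mirroring the columns of a "users" table.
class UserTO
{
    public $id;
    public $name;
    public $age;
}

// DAO: all SQL for that table lives here, and it only speaks in transfer objects.
class UserDao
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function create(UserTO $user)
    {
        $stmt = $this->pdo->prepare('INSERT INTO users (name, age) VALUES (?, ?)');
        $stmt->execute(array($user->name, $user->age));
    }

    // Returns a list of UserTO objects, never raw rows.
    public function findByAge($age)
    {
        $stmt = $this->pdo->prepare('SELECT id, name, age FROM users WHERE age = ?');
        $stmt->execute(array($age));
        return $stmt->fetchAll(PDO::FETCH_CLASS, 'UserTO');
    }
}
```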
Martin Fowler's "Patterns of Enterprise Application Architecture" is an excellent book, well respected in the community, which documents about fifty design patterns, around half of which are concerned with interacting with databases. It includes Repository, several kinds of DAOs (more or less covering your Database Layer and DAO) and several entire categories of patterns found in object-relational mappers. So there's a good place to start.
It's hard to summarize any more of the content of POEAA in this answer without simply repeating the list of patterns. Fortunately the list can be found at Fowler's web site. Unfortunately the copyright symbol there suggests that I shouldn't just include it here.

What is a good web application SQL Server data mart implementation in ElasticSearch?

Coming from an RDBMS background and trying to wrap my head around Elasticsearch data storage patterns...
Currently in SQL Server, we have a star schema data mart, RecordData. Rows are organized by user ID, a geographic location that pertains to the rest of the searchable record, and a title and description (which are free-text search fields).
I would like to move this over to ElasticSearch, and have read about creating a separate index per user. If I understand this correctly, with this suggestion, I would be creating a RecordData type in each user index, correct? What is a recommended naming convention for user indices that will be simple for Kibana analysis?
One issue I have with this recommendation is: how would you organize multiple web applications on the ES server? Wouldn't you end up with user indices all over the place?
Is it so bad to have one index per application, and a type per SQL Server table?
Since in SQL Server we have other tables for user configuration, keyed by user IDs, I take it that I could then create new ES types in the user indices for configuration. Is this a recommended pattern? I would rather not have two database systems for this web application.
Suggestions welcome, thank you.
I went through the same thing, and there are a few things to take into account.
Data Modeling
You say you use a star schema today. Elasticsearch is typically appropriate for denormalized data, where the totality of the information resides in each document, unlike with a star schema. If you can live with denormalized data, that is fine, but I assume that since you already have a star schema, denormalization is not an option, because you don't want to go and update millions of documents each time a location name changes, for example (if I understand the use case). At least in my use case that wasn't an option.
What are Elasticsearch options for normalized data?
This leads us to think about how to put star-schema-like data into a system like Elasticsearch. There are a few options in the documentation; the main ones I focused on were:
Nested Objects - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html . With nested objects the entire information is kept in a single document, meaning one location and its related users would be in a single document. That may not be optimal because the document will be huge and, again, a change in the location name will require updating the entire document. So this is better, but still not optimal.
Parent - Child Relationship - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html . In this case the location and the user records would be kept in separate types within the same index, similarly to a relational database. This seems to be the right modeling for what we need. The only major issue with this option is the fact that Kibana 4 does not provide ways to manipulate/aggregate documents based on a parent/child relationship as of this writing. So if your main driver for using Elasticsearch is Kibana (this was mine), that kind of eliminates the option. If you want to benefit from Elasticsearch's speed as an engine, this seems to be the desired option for your use case.
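As a rough sketch of the parent/child modelling with the official PHP client (the index, type and field names are invented, and the `_parent` mapping plus the 'string' field type shown here belong to the Elasticsearch 1.x/2.x versions that were current in the Kibana 4 era; later versions changed this):

```php
<?php
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build();

// Locations are parents; user records are children that point at a location.
$client->indices()->create([
    'index' => 'recorddata',
    'body'  => [
        'mappings' => [
            'location' => [
                'properties' => ['name' => ['type' => 'string']],
            ],
            'user_record' => [
                '_parent'    => ['type' => 'location'],
                'properties' => [
                    'user_id'     => ['type' => 'long'],
                    'title'       => ['type' => 'string'],
                    'description' => ['type' => 'string'],
                ],
            ],
        ],
    ],
]);

// Index a child document under its parent location.
$client->index([
    'index'  => 'recorddata',
    'type'   => 'user_record',
    'parent' => 'location-42',
    'body'   => ['user_id' => 7, 'title' => 'Thesis draft', 'description' => '...'],
]);
```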
In my opinion, once you get the data modeling right, all of your questions will be easier to answer.
Regarding the organization of the servers themselves, the way we organize it is by having a separate cluster of 3 Elasticsearch nodes behind a load balancer (all of it hosted on a cloud) and then having all the web applications connect to that cluster using the Elasticsearch API.
Hope that helps.

Best practices for analyzing/reporting database with 'flexible' schema

I have been given a task to create views (Excel, websites, etc., not database 'views') for a SQL Server table with a 'flexible' schema like the one below:
Session(guid) | Key(int) | Value(string)
My first thought is to create a series of 'standard' relational data tables/views that speak to the analysis/reporting requests. They can be either new tables updated by a daemon service that transforms data on a schedule, or just a series of views with deeply nested queries. Then, use SSAS, SSRS, and other established ways to do the analysis and reporting. But I'm totally uncertain whether that's the right line of thinking.
So my questions are:
Is there a terminology for this kind of 'flexible' schema so that I can search for related information?
Do my thoughts make sense or are they totally off?
If my thoughts make sense, should I create views with deep queries or new tables + data transform service?
I would start with an SSAS cube to expose all the values, presuming you can get some descriptive info from the key. The cube might have one measure (count) and three dimensions, one for each of your attributes.
This cube would have little value for end users (too confusing), but I would use it to validate whether any particular data is actually usable before proceeding. I think this is important because usually this data structure masks weak data validation and integrity in the source system.
Once a subject has been validated I would build physical tables via SSIS in preference to views - I find them easier to test and tune.
Finally found the terminology - it's called the entity-attribute-value (EAV) pattern and there are a lot of discussions and resources around it.
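For what it's worth, the usual way to get from EAV rows back into a relational shape is conditional aggregation (pivoting on the key). A rough PHP/PDO sketch, where the key-to-column map, the SessionData table name and the connection details are all invented:

```php
<?php
// Illustrative mapping of EAV integer keys to the report columns we want.
$keyMap = [
    1 => 'browser',
    2 => 'country',
    3 => 'duration_seconds',
];

// One column per key: MAX(CASE WHEN [Key] = n THEN Value END) AS column_name
$columns = [];
foreach ($keyMap as $key => $name) {
    $columns[] = sprintf('MAX(CASE WHEN [Key] = %d THEN Value END) AS %s', $key, $name);
}

$sql = 'SELECT Session, ' . implode(', ', $columns) . '
        FROM SessionData
        GROUP BY Session';

// Run against SQL Server; connection details and the table name are placeholders.
$pdo  = new PDO('sqlsrv:Server=db.example.com;Database=Reporting', 'report', 'secret');
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
```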

Synchronising data entities from different applications

I'm looking for some feedback on the best approach to a problem I've been tasked with. There are two systems with their own databases which store very similar business entities.
For each entity in question there needs to be a synchronization mechanism in place to make sure that changes in one database are delivered to the other when a change occurs, and for the changes to be translated into the destination table structure. This translation means that replication is not an option, but I don't want to start writing bespoke triggers or views etc. to keep them in sync.
Is this something which BizTalk or a similar product could handle after an initial configuration/mapping process? Also, is BizTalk potentially overkill, and are there any other methods which I could employ to achieve this?
Thanks,
Brian.
It depends on the size of the "systems" (tables?) to synchronise.
EAI (Enterprise Application Integration) tools are the general solution for this: connecting two systems which can't interact directly, effectively mapping one business object to another by applying a map to translate one into the other.
But such tools (like webMethods, for example) are enterprise tools; if you only need to synchronise two tables from two systems, an EAI will clearly be overkill.
Anyway, the principles can help you. The EAI approach would be to have a generic business object that matches all of the properties found in both systems for the business objects you want to synchronise. Then you will need some sort of map to translate each application-specific business object to and from your generic business object. Your object should not only describe the business data, but also the operation to perform (create, update, delete data).
Then you need a trigger (or two, if you want to synchronise both ways) to detect when a change happens, and use the map to transform the data your trigger gets into the generic object (with the operation to perform at the other end).
And finally you need an "updater" that will take the specific business object and perform the right operation in the database (insert/update/delete).
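To make that concrete, here is a bare-bones PHP sketch of those three pieces (the entity, its fields and the table names are all made up; the trigger that detects changes and feeds applyChange() is left out):

```php
<?php
// Generic business object: the union of fields from both systems, plus the operation.
class CustomerChange
{
    public $operation; // 'create' | 'update' | 'delete'
    public $data = []; // e.g. ['external_id' => ..., 'name' => ..., 'email' => ...]
}

// One mapper per system translates its own row format to/from the generic object.
interface CustomerMapper
{
    /** @return CustomerChange */
    public function toGeneric(array $sourceRow, $operation);

    /** @return array column => value for the destination table */
    public function fromGeneric(CustomerChange $change);
}

// The "updater": applies a generic change to the destination database.
function applyChange(PDO $destination, CustomerMapper $mapper, CustomerChange $change)
{
    $row = $mapper->fromGeneric($change);

    switch ($change->operation) {
        case 'create':
            $cols  = implode(', ', array_keys($row));
            $marks = implode(', ', array_fill(0, count($row), '?'));
            $destination->prepare("INSERT INTO customers ($cols) VALUES ($marks)")
                        ->execute(array_values($row));
            break;
        case 'update':
            // ... build an UPDATE keyed on external_id, omitted for brevity
            break;
        case 'delete':
            $destination->prepare('DELETE FROM customers WHERE external_id = ?')
                        ->execute([$row['external_id']]);
            break;
    }
}
```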
EAI tools provide connectors to take care of triggering the workflow and updating the database. You will still need to define some mappings in a specific way depending on the EAI used.
EAI tools are a lot more powerful than just synchronising two tables. Connectors come in various types and can interact with various systems (including proprietary ones), various databases, simple formats (XML, text) or specific protocols (FTP, web services, etc.).
EAI tools also ensure that any modification is effectively committed at the end.
Hope it helps.
SQL Server Integration Services could be a cheap candidate for solving the problem (it can connect to databases and data sources other than SQL Server). SSIS is part of all SQL Server installations (with the exception of Express).
There is a nifty tool called "datariver" by the Swiss company Sowatec (where I worked a few years ago; I wasn't involved with this product though, just so you know). It's meant to flow data from sources to sinks (just like a river).
The web site is in German but the guys behind it are happy to answer any of your questions in English by mail.
BizTalk would be an ideal solution for this kind of problem.
What can BizTalk do?
1. Define a schema which represents a common business entity; this is essentially all the fields which need to be kept in sync across the various database tables.
2. Define the flow of communication (orchestrations) and end-points (web services), i.e. which update triggers what changes.
3. Use maps to map the common business entity into the specific data elements required by the databases. Note that BizTalk has built-in adapters to speed up the development process.
If adequate time is spent in the design of this system, the results would be fabulous.
For development purposes, refer to my articles (Google keywords: BizTalk + Karamchetti).
