Expressing data transformation - database

You have two different relational databases.
Your task is to write code to transfer the data from the first database to the second database.
Some tables in the target database have the same structure as the corresponding source tables; transferring these is as simple as "INSERT INTO DbA.TableA (...) SELECT * FROM DbB.TableB".
Some tables in the database you are transferring to have different structures and different purposes. After proper analysis, you understand the relations and you understand the right transformation you need to code.
My question is: how do you express such knowledge? How do you express the transformational relations between two databases? Are there any tools or diagrams?
The best way I know right now is writing a list of the tables of the first database and, for each table, describing how it is to be transformed into the second database. Is it possible to make this more formal/concise/cool?

If you want a toolset and work in the Microsoft database stack, this is exactly what SQL Server Integration Services (SSIS) is used for.
If you want to document the process, you would typically write an interface definition document (IDD). There are many examples online.
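
One way to make the per-table description from the question more formal is to keep each mapping as data and generate the transfer statement from it. The following is only a minimal sketch in Python: the class TableMapping, the table names DbB.Customer and DbA.Client, and all column names and expressions are hypothetical placeholders, not part of any tool mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TableMapping:
    """One source-to-target table mapping (all names are hypothetical)."""
    source: str                                   # table to read from
    target: str                                   # table to write to
    # target column -> SQL expression over the source table's columns
    columns: Dict[str, str] = field(default_factory=dict)
    where: Optional[str] = None                   # optional filter on source rows

    def to_sql(self) -> str:
        """Render the mapping as a single INSERT ... SELECT statement."""
        target_cols = ", ".join(self.columns)
        source_exprs = ", ".join(self.columns.values())
        sql = (f"INSERT INTO {self.target} ({target_cols}) "
               f"SELECT {source_exprs} FROM {self.source}")
        if self.where:
            sql += f" WHERE {self.where}"
        return sql


# The mapping list doubles as documentation: one entry per transformed table.
MAPPINGS = [
    TableMapping(
        source="DbB.Customer",
        target="DbA.Client",
        columns={
            "client_id": "id",
            "full_name": "TRIM(name)",
            "country_code": "UPPER(country)",
        },
        where="deleted = 0",
    ),
]

for mapping in MAPPINGS:
    print(mapping.to_sql())
```

Because the mappings are plain data, the same list could also be rendered as a table for a written interface document; transformations that need more than one statement per table would not fit this simple shape.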

Related

Keeping databases structure in sync

Let's pretend we have several similar websites, each focussing on a separate country. We want to have the same code and the same SQL Server database structures, but the content of each database is different.
Is there an easy way of keeping the databases structurally in sync - adding a column to one will result in it being added to all?
Basically have a group of databases that column/table changes/additions/edits are propagated through?

SQL Server tables connection

I have to connect multiple tables that are part of single or multiple databases. Approximately 10-15 tables in each query have to be connected to generate data for the analysis in SQL Server 2014.
I don't have access to the database diagram or architecture, and these reports have to be sent out weekly. I want to understand how to begin writing these kinds of queries, at both basic and advanced levels, how to identify the relationships between tables, and which advanced techniques I can learn or use, such as CTEs, RANK with PARTITION BY, subqueries, etc.
A rough flow diagram or outline of the approach would be really helpful.

It's very unlikely that owners of those source systems want to be directly queried every time someone runs a report. Since you already have access to SQL Server, I would suggest building a data warehouse with that.
You haven't provided a whole lot of information to go on, but SSIS packages could be created to connect to the source systems and load into your data warehouse. Furthermore, those packages can be scheduled through SQL Server Agent.
As for modeling, again it is difficult with the lack of information, but a star schema, which is a fact table surrounded by dimension (or attribute) tables, generally works great for reporting.
As for figuring out relationships without a diagram, this will have to be done via experimentation and by tying results back to existing reports to make sure your joins aren't dropping records or multiplying them.
Good luck.
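
The join check described above can be sketched as a row-count comparison. This is a minimal illustration using Python's built-in sqlite3 module in place of SQL Server; the tables orders and customers and their contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
    -- customer 20 appears twice (row multiplication), customer 30 is missing
    INSERT INTO customers VALUES (10, 'Ann'), (20, 'Bob'), (20, 'Bob B.');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


JOIN = "FROM orders o JOIN customers c ON c.customer_id = o.customer_id"

base_rows = scalar("SELECT COUNT(*) FROM orders")
joined_rows = scalar(f"SELECT COUNT(*) {JOIN}")
distinct_orders = scalar(f"SELECT COUNT(DISTINCT o.order_id) {JOIN}")

dropped = base_rows - distinct_orders       # orders with no matching customer
duplicated = joined_rows - distinct_orders  # extra rows created by the join

print(f"base={base_rows} joined={joined_rows} "
      f"dropped={dropped} duplicated={duplicated}")
```

In this made-up data the joined row count equals the base row count even though one order is dropped and another is duplicated, which is why comparing the distinct key count as well as the total is worth the extra query.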

MS SQL Server: central database and foreign keys

I am currently developing one project of many to come, which will use its own database and also data from a central database.
Example:
the database "accountancy" with all accountancy package specific tables.
the database "personelladministration" with its specific tables
But we also use data which is general and will be used in all projects like "countries", "cities", ...
So we have put these tables in a separate database called "general"
We come from a db2 environment where we could create foreign keys between databases.
However, we are switching to MS SQL server where it is not possible to put foreign keys between databases.
I have seen that a workaround would be to use triggers, but I'm not convinced that is a clean solution.
Are we doing something wrong in our setup? It seems right to me to put tables with general data in a separate database instead of having a table "countries" in every database, which seems difficult to maintain and inefficient.
What could be a good approach to overcome this?

I would say that countries is not a terrible table to reproduce in multiple databases. I would rather duplicate static data like that than use more elaborate techniques. There is one physical schema per database in SQL Server and the schema cannot be shared. That is why people use replication or triggers for shared data.
I came across this problem a while back. We have one database for authentication; however, those users have to be shared across multiple applications, some of which have their own database.
Here is my question on this topic.
We resorted to replication and a custom Authentication/Registration service agent to keep the data up to date.
Using views, as Sourav_Agasti suggested in his answer, would be the most straightforward approach for static data. You can create views that join data from other databases, including databases on linked servers. Note that indexed views cannot do this, because they require schema binding to objects in the same database.

Create a loopback linked server and then create a view (if required, on each database) which accesses the table in this "central database" through this linked server. There will be a minor performance impact, but the simplicity of the approach more than compensates for it.

SELECTing across multiple DB2 databases in one query

I've run into an issue where I need to query 2 separate databases (same instance) in one query.
I am used to doing this with MySQL, but I'm not sure how to do it with DB2.
In MySQL it would be something like:
SELECT user_info.*, game.*
FROM user_info, second_db.game_stats as game
WHERE user_info.uid = game.uid
So the question is: how do I translate a query like that into DB2 syntax?
Equivalent of this

Is there a reason why you have the tables in a separate database? MySQL doesn't support the concept of schemas, because in MySQL a "schema" is the same thing as a "database". In DB2, a schema is simply a collection of named objects that lets you group them together.
In DB2, a single database is much closer to an entire MySQL server, as each DB2 database can have multiple schemas. With multiple schemas inside the same database, your query can run more or less unchanged from how it is written.
However, if you really have 2 separate DB2 databases (and, for some reason, don't want to migrate to a single database with multiple schemas): You can do this by defining a nickname in your first database.
This requires a somewhat convoluted process of defining a wrapper (CREATE WRAPPER), a server (CREATE SERVER), user mapping(s) (CREATE USER MAPPING) and finally the nickname (CREATE NICKNAME). It is generally easiest to do these tasks using the Control Center GUI because it will walk you through the process of defining each of these.
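
To see why the query can stay more or less unchanged when the second "database" is just another qualifier in front of the table name, here is a minimal sketch using Python's built-in sqlite3 module. This is SQLite, not DB2: an attached SQLite database only plays the role of the second schema here, and the table contents are made up. The table and column names follow the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in: the attached database is addressed by a qualifier,
# much as a schema-qualified table name would be inside one DB2 database.
conn.execute("ATTACH DATABASE ':memory:' AS second_db")
conn.executescript("""
    CREATE TABLE user_info (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE second_db.game_stats (uid INTEGER, score INTEGER);
    INSERT INTO user_info VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO second_db.game_stats VALUES (1, 120), (2, 75);
""")

# Same shape as the query in the question, written with an explicit JOIN.
rows = conn.execute("""
    SELECT user_info.name, game.score
    FROM user_info
    JOIN second_db.game_stats AS game ON user_info.uid = game.uid
    ORDER BY user_info.uid
""").fetchall()

print(rows)
```

Two genuinely separate DB2 databases would still need the wrapper, server, user mapping and nickname steps described above; this sketch only illustrates the qualified-name case.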

Linking tables between databases

I’m after a bit of advice on the best way to go about this in SQL Server 2008 R2 Express. I have a number of applications that are in separate databases on the same server. They are all “plugins” that use a central staff/structure list that will be in a separate database. The application is in the process of being migrated from JET.
What I’m looking for is the best way of all the “plugin” databases being able to see the central database and use those tables in standard queries and views etc.
As I’m using Express, that rules out any replication solution, and so far the only option I can think of is to use triggers or a stored procedure to “push” out all the changes to the plugins. The information needs to be populated on a near enough real-time basis; however, the number of changes will be very small, maybe up to 100 a day, and the biggest table only has about 1000 rows at the moment (the staff names table).
Hopefully that will cover everything, but if anyone needs any more details then just ask.
Thanks

Apologies if I've misunderstood, but from your description it sounds like all these databases are hosted on the same instance of SQL Server - it's your mention of replication that makes me uncertain.
Assuming that's the case, you should be able to replace any copies of tables from the central database which are held in the "plugin" databases with views or synonyms which reference the central tables directly, since SQL Server allows you to make references between databases on the same server using three-part naming (database_name.schema_name.object_name).
For example, if each plugin db has a table StaffNames, you could replace this with a view by dropping the table, then creating a view:
drop table StaffNames
go
create view StaffNames
as
select * from <centraldbname>.<schema - probably dbo>.StaffNames
go
and your code should continue to work seamlessly, as long as permissions are set up.
Alternatively, you could replace all the references to the shared tables in the plugin databases with three-part name references to the central database, but the view method requires less work.
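
The effect of the view method can be tried out without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module as a stand-in: the main connection plays a plugin database, an attached database named central plays the central one, and the rows are invented. One difference is an SQLite restriction rather than anything from SQL Server: SQLite only allows a TEMP view to reference another attached database, whereas the answer above uses an ordinary view with a three-part name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # plays the role of one plugin database
conn.execute("ATTACH DATABASE ':memory:' AS central")
conn.executescript("""
    CREATE TABLE central.StaffNames (staff_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO central.StaffNames VALUES (1, 'Ann'), (2, 'Bob');
    -- TEMP is required by SQLite for a view that reads another database.
    CREATE TEMP VIEW StaffNames AS
        SELECT staff_id, name FROM central.StaffNames;
""")

# Existing plugin code keeps using the unqualified name.
PLUGIN_QUERY = "SELECT name FROM StaffNames ORDER BY staff_id"
before = conn.execute(PLUGIN_QUERY).fetchall()

# A change made in the central table is visible through the view at once,
# with no trigger or push step.
conn.execute("INSERT INTO central.StaffNames VALUES (3, 'Cy')")
after = conn.execute(PLUGIN_QUERY).fetchall()

print(before)
print(after)
```

The plugin query text never changes, which is the property the answer relies on when it says existing code should continue to work.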