I'm absolute new to databases! I've developed a web-app with a postgres database. For evaluating my software, I've done a first field experiment with a fee colleagues. Afterwards there was a lot of bugfixes.
Now I've got a dump after the field experiment. But then I changed my database, renamed two tables, added a few tables and so on. How can I merge my dumped/old database the easy way?
Is there an easy way?
Feel free to ask for some data or everything, that helps.
Greetings,
Tobias
Related
I've been doing some research on this topic for a while now and can't seem to find a similar instance to my issue. I will try and explain everything as best I can, as simply as I can.
The problem is in the title; I am trying to migrate data from an Access database to SQL Server. Typically, this isn't really a hard problem as there exists several import/export tools within SQL Server but I am looking for the best solution. That or some advice/tips as I am somewhat new to database migration. I will now begin to explain my situation.
So I am currently working on migrating data that exists in an Access “database” (database in quotes because I don’t think it is actually a database, you’ll know why in a minute) in an un-normalized form. What I mean by un-normalized is that all of the data is in one table. This table has about 150+ columns and the rows number in the thousands. Yikes, I know; this is what I’ve walked into lol. Anyways, sitting down and sorting through everything, I’ve designed relationships for the data that normalize it nicely in its new home, SQL Server. Enter my predicament (or at least part of it). I have the normalized database set up to hold the data but I’m not sure how to import it, massage/cut it up, and place it in the respective tables I’ve set up.
Thus far I’ve done a bunch of research into what can be done and for starters I have found out about the SQL Server Migration Assistant. I’ve begun messing with it and was able to import the data from Access into SQL Server, but not in the way I wanted. All I got was a straight copy & paste of the data into my SQL Server database, exactly as it was in the Access database. I then learned about the typical practice of setting up a global table/staging area for this type of migration, but I am somewhat of a novice when it comes to using TSQL. The heart of my question comes down to this; Is there some feature in SQL Server (either its import/export tool or the SSMA) that will allow me to send the data to the right tables that already exist in my normalized SQL Server database? Or do I import to the staging area and write the script(s) to dissect and extract the data to the respective normalized table? If it is the latter, can someone please show me some tips/examples of what the TSQL would look like to do this sort of thing. Obviously I couldn’t expect exact scripts from anyone without me sharing the data (which I don’t have the liberty of as it is customer data), so some cookie cutter examples will work.
Additionally, future data is going to come into the new database from various sources (like maybe excel for example) so that is something to keep in mind. I would hate to create a new issue where every time someone wants to add data to the database, a new import, sort, and store script has to be written.
Hopefully this hasn’t been too convoluted and someone will be willing (and able) to help me out. I would greatly appreciate any advice/tips. I believe this would help other people besides me because I found a lot of other people searching for similar things. Additionally, it may lead to TSQL experts showing examples of such data migration scripts and/or an explanation of how to use the tools that exist in such a way the others hadn’t used before or have functions/capabilities not adequately explained in the documentation.
Thank you,
L
First this:
Additionally, future data is going to come into the new database from
various sources (like maybe excel for example)...?
That's what SSIS is for. Setting up SSIS is not a trivial task but it's not rocket science either. SQL Server Management Studio has an Import/Export Wizard which is a easy-to-use SSIS package creator. That will get you started. There's many alternatives such as Powershell but SSIS is the quickest and easiest solution IMO. Especially when dealing with data from multiple sources.
SSIS works nicely with Microsoft Products as data sources (such as Excel and Sharepoint).
For some things too, you can create an MS Access Front-end that interfaces with SQL Server via sql server stored procedures. It just depends on the target audience. This is easy to setup. A quick google search will return many simple examples. It's actually how I learned SQL server 20+ years ago.
Is there some feature in SQL Server that will allow me to send the
data to the right tables that already exist in my normalized SQL
Server database?
Yes and don't. For what you're describing it will be frustrating.
Or do I import to the staging area and write the script(s) to dissect
and extract the data to the respective normalized table?
This.
If it is the latter, can someone please show me some tips/examples of
what the TSQL would look like to do this sort of thing.
When dealing with denormalized data a good splitter is important. Here's my two favorites:
DelimitedSplit8K
PatternSplitCM
In SQL Server 2016 you also have split_string which is faster (but has issues).
Another must have is a good NGrams function. The link I posted has the function attached at the bottom of the article. I have some string cleaning functions here.
The links I posted have some good examples.
I agree with all the approaches mentioned: Load the data into one staging table (possibly using SSIS) then shred it with T-SQL (probably wrapped up in stored procedures).
This is a custom piece of work that needs hand built scripts. There's no automated tool for this because both your source and target schemas are custom schemas. So you'd need to define all that mapping and rules somewhow.... and no SSIS does not magically do this!
It sounds like you have a target schema and mappings between source and target schema already worked out
As an example your first step is to load 'lookup' tables with this kind of query:
INSERT INTO TargetLookupTable1 (Field1,Field2,Field3)
SELECT DISTINCT Field1,Field2,Field3
FROM SourceStagingTable
TargetLookupTable1 should already have an identity primary key defined (which is not mentioned in the above query because it is auto generated)
This is where you will find your first problem. You'll almost definitely find your distinct query just gives you a whole lot of duplicated mispelt data rubbish data. So before you even load your lookup table you need to do data cleansing.
I suggest you clean the data in your source system directly but it depends how comfortable you are with that.
Next step is: assuming your data is all clean and you've loaded a dozen lookup tables in this way..
Now you need to load transactions but you don't know the lookup key that you just generated!
The trick is to pre-include an empty column for this in your staging table to record this
Once you've loaded up your lookup table you can write the key back into the staging table. This query matches back on the fields you used to load the lookup, and writes the key back into the staging table
UPDATE TGT
SET MyNewLookupKey = NewLookupTable.MyKey
FROM SourceStagingTable TGT
INNER JOIN
NewLookupTable
ON TGT.Field1 = NewLookupTable.Field1
AND TGT.Field2 = NewLookupTable.Field2
AND TGT.Field3 = NewLookupTable.Field3
Now you have a column called MyNewLookupKey in your staging table which holds the correct lookup key to load into you transaction table
Ongoing uploads of data is a seperate issue but you might want to investigate an MS Access Data Project (although they are apparently being phased out, they are very handy for a front end into SQL Server)
The thing to remember is: if there is anything ambiguous about your data, for example, "these rows say my car is black but these rows say my car is white", then you (a human) needs to come up with a rule for "disambiguating" it. It can't be done automatically.
So there are quite a number of ways to skin this cat. I don't know much about the "Migration Assistant", but I somehow doubt it's going to make your life easier given what you're trying to do.
I'd just dump the whole denormalized mess into a single big staging table then shred it where you need it using SQL. I know you asked for help with the TSQL, but without having some idea of what the denormalized data is and how you want to re-shape it, all I can do really is suggest you read up on SQL in general (select, from, where, group by, etc).
You could also do the work in SSIS, but ultimately the solution you use is largely going to depend on the nature of how you need to normalize the big denormalized data set. IMHO doing this in SQL is usually the easiest way, but then again when you're a hammer, everything looks like a nail.
As far as future proofing the process, how you import the Access data probably will have little bearing on how you'd import Excel data. If you have a whole lot of different data sources which you'll need to incorporate on a recurring basis, SSIS might be a good choice to invest some time and effort into for the long run. No matter what, incorporating data from a distinct data source takes time and effort. You'll have to do some extra work no matter what. I would weight how frequently you think you'll have to integrate a given data source, and how much effort is involved to massage it into the format you want.
I have a completely different opinion. Because I do both database development and Microsoft's Power BI - - on the PBI side we come across a lot of non-normalized data because a lot of the data is coming in from excel.
My guess is that what is now in Access was an import of something originally began in excel.
Excel Power Query and PBI offers transforms to pivot and unpivot layout. I would use these tools to do that task. Then import the results into SQL.
first of all, I don't know if this is the right platform for this question. I hope it is. This is basically an architectural issue or more specifically a database design issue.
My company has asked me to create a service based website where individual subscribers can log in to their own customizable retail store. One fundamental question related to this requirement is designing the database. As I can understand there are two major approaches
Create a separate database based on a template for each subscriber / client.
Have a single database for all clients and link the tables based on primary key fields.
If any one has experience with the above scenario or can provide any useful insights, please do let me know.
Regards
Romi
separate database
You can put them easy to a other dedicated server
You have to administrate 1,000 databases for 1,000 clients
Your application need to figure out which database have to be used
one database with relationships
You have to administrate only one database
Less additional complexity in your application (tons of configurations etc.)
You can easy JOIN tables over all clients. For statistics or what ever.
I have a general Database Design question: When is it better to create a new Database instead of adding new Tables to an existing? The question is related to design/maintainability andalso performance issues.
Background: we are mirroring the main tables of our customer by importing the data every night into our DB called RM2. Some new (ASP.Net-)Projects need access to this data too, now i'm wondering if i should create new databases for each project or merge their tables with the RM2(current size: 37991.94 MB).
I won't necessarily answer your question, but I'll give you a bunch of other questions to consider as well:
When should I add files to a filegroup in my database? - When files get too big, where "too big" may be a matter of opinion.
When should I add a new filegroup to my database? - When you want to be able to optimize disk usage for different database operations.
When should I add a new schema to my database? - When you have a set of objects that are logically related and may require different default permissions for users.
When should I add a new database to my application? - When you don't need any referential integrity between any of the tables in the two databases. When you don't want to allow any ownership-chained permissions to cross between two sets of objects. When you want to independently backup and restore. When you want different SQL Server recovery models for two sets of data.
I guess that may have answered your question some. ;-)
This isn't a database design question. This is an organisational question. The organisational aspects of this question are much more important than technical questions.
The answer is: whatever makes life easiest for you as developers
For instance, you say:
Some new (ASP.Net-)Projects need access to this data too.
How integrated are these projects with your project? Do you actually share data (or write to the same tables)? For instance, if you make a breaking change to one of your tables, do you need to make changes to the code in the other projects at the same time? (Sometimes really hard to synchronize between two projects).
If you don't actually share data (apart from the customer data, which I assume is effectively read-only), then use separate databases (OR schemas). This makes changes a lot easier to manage.
Another trick is to have in each database a set of views onto the customer data, which lives in elsewhere, in another schema.
So, have a database per project, with views in each database onto the customer data, which lives in a single separate database.
Performance shouldn't really be an issue, unless the databases live on separate machines.
You can get other database for projects if any project want use only self database. But if you want get datacenter of this you should create for it a other shared database + self database for each project.
I've recently asked a question about how suitable a DVCS is for the corporate environment, and that has sparked another question for me.
One of the plus sides to a DVCS seems to be that you can easily branch and try out new things. My problem starts when I begin to think about database changes. I've always found it tricky to get a DB into a VCS and it just sounds like it's going to be even harder with a DVCS.
So, whats the best way to work with databases and a DVCS?
EDIT: I've started looking into Migrator.NET. What do people think of projects like this for easily moving between versions specificaly with experimental branches in your DVCS?
I think the best way to deal with this issue is to work with DB Schemas, not the databases themselves. In this case, each developer would have their own database to develop against.
Here are some of the options available:
Migrations framework within Ruby on Rails.
South for Django, in addition to the schema being defined in the model classes themselves.
Visual Studio 2008 Team System Database Edition for .NET: You define the schema and the tool can do a diff on schema and data to generate scripts to go between different versions of the database.
These may give you some inspiration on how to deal with putting a database in version control. Another benefit that comes when you deal schemas is that you can more readily implement TDD and Continuous Integration (CI). Your TDD/CI environment would be able to build up a new version of the database and then run tests against the newly generated environment.
Version all the scripts you're using to manage your database. If you need to have "in-development" changes to a DB, make them on your personal DB until such time as you "publish" your changes.
Database version control is always the most difficult thing in a multi-developer environment.
Typically each user will have their own DB which is a chimera of some but not all of the DB changes. When they make changes, they'll need to commit their change scripts. This gets really awkward. The core problems seem to stem from database changes affecting many aspects of the system and multiple table changes being dependent on each other - and how to migrate to the new schema from the old schema. Migrating data to a new schema is typically non-trivial. Often you want to default a column when data is copied to the new schema, but NOT default a column in general for INSERT, say. These are typically already difficult in production deployment issues and having to manage the database during development when the database design could be in major flux in the same way as a major deployment is a lot more work than you usually need to be doing in development. Time that could be better spent ensuring that your database is well-designed - constraints, foregin keys, etc.
Because the developers are more likely to step on each other with database changes, we always had a database chokepoint - the developers all developed against the SAME development database and made their changes "live". Then the dev database was version controlled independently. This is not really easy when people are offsite or whatever. Another alternative is to have designated database developers who coordinate changes several developers need to the same table - that doesn't need to be their entire job, but gives you better DB design consistency. Or you can coordinate database revisions so that people become more aware of the DB revs other people are doing and time their changes to wait until a DB rev is available from another developer.
The best way to not put database into VCS in binary form. Period.
If you have text representation of your database and you have special merge tool to resolve conflicts when your database will be changed in different branches -- then you can start thinking about versioning databases. Otherwise it will be constant pain in the ass.
I have been googling a lot and I couldn't find if this even exists or I'm asking for some magic =P
Ok, so here's the deal.
I need to have a way to create a "master-structured" database which will only contain the schemas, structures, tables, store procedures, udfs, etc, everything but real data in SQL SERVER 2005 (if this is available in 2008 let me know, I could try to convince my client to pay for it =P)
Then I want to have several "children" of that master db which implement those schemas, tables, etc but each one has different data.
So when I need to create a new stored procedure or something like that, I just create it on the master database (and of course it's available on its children).
Actually I have several different databases with the same schema and different data. But the problem is to maintain congruency between them. Everytime I create a script to create some SP or add some index or whatever, I have to execute it in every database, and sometimes I could miss one =P
So let's say you have a UNIVERSE (would be the master db) and the universe has SPACES (each one represented by a child db). So the application I'm working on needs to dynamically "clone" SPACES. To do that, we have to create a new database. Nowadays I'm creating a backup of the db being cloned, restoring it as a new one and truncate the tables.
I want to be able to create a new "child" of the "master" db, which will maintain the schemas and everything, but will start with empty data.
Hope it's clear... My english is not perfect, sorry about that =P
Thanks to all!
What you really need is to version-control your database schema.
See do-you-source-control-your-databases
If you use SQL Server, I would recommend dbGhost - not expensive and does a great job of:
synchronizing 2 databases
diff-ing 2 databases
creating a database from a set of scripts (I would recommend this version).
batch support, so that you can upgrade all your databases using a single batch
You can use this infrastructure for both:
rolling development versions to test, integration and production systems
rolling your 'updated' system to multiple production deployments (especially in a hosted environment)
I would write my changes as a sql file and use OSQL or SQLCMD via a batchfile to ensure that I repeatedly executed on all the databases without thinking about it.
As an alternative I would use the VisualStudio Database Pro tools or RedGate SQL compare tools to compare and propogate the changes.
There are kludges, but the mainstream way to handle this is still to use Source Code Control (with all its other attendant benefits.) And SQL Server is increasingly SCC friendly.
Also, for many (most robust) sites it's a per-server issue as much as a per-database issue.
You can put things in master like SPs and call them from anywhere. As far as other objects like tables, you can put them in model and new databases will get them when you create a new database.
However, in order to get new tables to simply pop up in the child databases after being added to the parent, nothing.
It would be possible to create something to look through the databases and script them from a template database, and there are also commercial tools which can help discover differences between databases. You could also have a DDL trigger in the "master" database which went out and did this when you created a new table.
If you kept a nice SPACES template, you could script it out (without data) and create the new database - so there would be no need to TRUNCATE. You can script it out from SQL or an external tool.
Little trivia here. The mssqlsystemresource database works as you describe: is defined once and 'appears' in every database as the special sys schema. Unfortunately the special 'magic' needed to get this working is not available to the user databases. You'll have to use deployment techniques to keep your schema in synk. That is, apply the changes to every database as the other answers already suggested.
In theory, you could put a trigger on your UNIVERSE.sysobjects table (assuming SQL Server), and then you could enumerate master.dbo.sysdatabases to find all the child databases. If you have a special table that indicates it's a child database, you can reference child.dbo.sysobjects to find it.
Make no mistake, it would be difficult to implement. But it's one way you could do it.