I have a desktop (winforms) application that uses a Firebird database as a data store (in embedded mode) and I use NHibernate for ORM. One of the functions we need to support is to be able to import / export groups of data to/from an external file. Currently, this external file is also a database with the same schema as the main database.
I've already got NHibernate set up to look at multiple databases and I can work with two databases at the same time. The problem, however, is copying data between the two databases. I have two copy strategies: (1) copy with all the same IDs for objects [aka import/export] and (2) copy with mostly new IDs [aka duplicate / copy]. I say "mostly new" because there are some lookup items that will always be copied with the same ID.
Copying everything with new IDs is fine, because I'll just have a "CopyForExport" method that creates copies of everything without assigning IDs (or wipes out all the IDs in the object tree).
What is the "best practices" way to handle this situation and to copy data between databases while keeping the same IDs?
Clarification: I'm not trying to synchronize two databases, just exporting a (user-selectable) subset of data for transfer to someone else (who will then import the subset of data into their own database).
Further Clarification: I think I've isolated the problem down to this:
I want to use the ISession.SaveOrUpdate feature of NHibernate, so I set up my entities with an identity generator that isn't "assigned". However, I have a problem when I want to override the generated identity (for copying data between multiple databases in the same process).
Is there a way to use a Guid.Comb or UUID generator, but be able to sometimes specify my own identifier (for transferring to a different database connection with the same schema)?
I found the answer to my own question:
The key is the ISession.Replicate method. This allows you to copy object graphs between data stores and keep the same identifier. To create new identifiers, I think I can use ISession.Merge, but I still have to verify this.
There are a few caveats though: my test class has a reference to the parent object (many-to-one relationship) and I had to make the class non-lazy-loading to get Replicate to work properly. If I didn't have it set to eager load (non lazy load I guess), it would only replicate the object and not the parent object (cascade="all" in my hbm.xml file).
The Java Hibernate docs have a reference to Replicate() (section 10.9), but the NHibernate documentation doesn't.
This makes sense for the Replicate behavior because we want to have fully hydrated entities before transferring them to another data store. What's weird though is that even with both sessions open (one to each data store), it didn't think to hydrate the object when I wanted to replicate it.
You can use FBCopy for this. Just define which tables and columns you want copied and it'll do the job. You can also add an optional WHERE clause for each table, so it only copies the rows you want.
While copying, it maintains the order in which data is exported so that foreign keys do not break. It also supports generators.
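To illustrate the idea of copying a filtered subset in foreign-key order while keeping the original IDs, here is a minimal sketch. It is not how FBCopy or NHibernate's Replicate is implemented; it uses Python's built-in sqlite3 as a stand-in for Firebird, and the `parent`/`child` tables, `COPY_ORDER`, and `copy_rows` are hypothetical names made up for the example.

```python
import sqlite3

# Hypothetical two-table schema shared by the source and target databases.
SCHEMA = """
CREATE TABLE parent (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE child (
    id TEXT PRIMARY KEY,
    parent_id TEXT NOT NULL REFERENCES parent(id),
    name TEXT NOT NULL
);
"""

# Parent tables come first so foreign keys are satisfied on insert.
COPY_ORDER = ["parent", "child"]


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def copy_rows(source, target, where_by_table):
    """Copy selected rows to target, keeping the original primary keys."""
    with target:  # one transaction: all rows are copied or none
        for table in COPY_ORDER:
            where = where_by_table.get(table, "1=1")
            cursor = source.execute(f"SELECT * FROM {table} WHERE {where}")
            placeholders = ", ".join("?" for _ in cursor.description)
            # INSERT OR REPLACE overwrites a row that already has the same key.
            target.executemany(
                f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})",
                cursor.fetchall(),
            )


source, target = open_db(), open_db()
source.execute("INSERT INTO parent VALUES ('p1', 'Group A')")
source.execute("INSERT INTO parent VALUES ('p2', 'Group B')")
source.execute("INSERT INTO child VALUES ('c1', 'p1', 'Item 1')")
source.execute("INSERT INTO child VALUES ('c2', 'p2', 'Item 2')")

# Export only the rows that belong to parent 'p1'.
copy_rows(source, target, {"parent": "id = 'p1'", "child": "parent_id = 'p1'"})
exported_parents = target.execute("SELECT id FROM parent").fetchall()
exported_children = target.execute("SELECT id, parent_id FROM child").fetchall()
```

The WHERE fragments are interpolated into the SQL directly, which is acceptable only because they come from the program itself in this sketch, not from user input.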
I am looking for the best way to automatically detect new files in a S3 bucket and then to load the data into a Snowflake table.
I know this can be achieved using Snowpipe and SNS, SQS notifications set up in AWS but I would like to have a self-contained solution within Snowflake which can be used for multiple data sources.
I want to have a table which is updated with the file names from a S3 bucket and then to load files which have not already been loaded from S3 into Snowflake.
The only way I have found to automatically detect new files from an external S3 stage in Snowflake so far is to use the code below and a task on a set schedule. This lists the file names and then uses result_scan to display the last query as a table.
list @STAGE_NAME;
set qid=last_query_id();
select "name" from table(result_scan($qid));
Does anyone know a better way to automatically detect new files in an external stage from Snowflake? Any help is much appreciated.
Not necessarily better than the way you've already found, but there is an alternative approach to listing the files in an S3 bucket.
If you create an EXTERNAL TABLE over the data in S3, you can then use the METADATA$FILENAME property in a query. If you have a record of which files have already been loaded into Snowflake then you can compare and select the names of the new files and process them.
e.g.
ALTER EXTERNAL TABLE MYSCHEMA.MYEXTERNALTABLE REFRESH;
SELECT DISTINCT
METADATA$FILENAME as filename
FROM
MYSCHEMA.MYEXTERNALTABLE;
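The "compare against a record of loaded files" step can be sketched as an anti-join. The example below is only an illustration of that logic: it uses Python's sqlite3 as a stand-in for Snowflake, and the `staged_files` table (standing in for the file names returned by the stage listing or the external table), the `load_log` table, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staged_files (filename TEXT NOT NULL);  -- stand-in for the stage listing
CREATE TABLE load_log (filename TEXT PRIMARY KEY);   -- files already loaded
""")


def new_files(conn):
    """Return staged file names that have no entry in load_log yet."""
    rows = conn.execute("""
        SELECT DISTINCT s.filename
        FROM staged_files AS s
        LEFT JOIN load_log AS l ON l.filename = s.filename
        WHERE l.filename IS NULL
        ORDER BY s.filename
    """)
    return [name for (name,) in rows]


def mark_loaded(conn, filenames):
    """Record files as loaded so the next run skips them."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO load_log (filename) VALUES (?)",
            [(name,) for name in filenames],
        )


# One file name appears once per row of data, hence the DISTINCT above.
conn.executemany(
    "INSERT INTO staged_files VALUES (?)",
    [("data/a.csv",), ("data/a.csv",), ("data/b.csv",)],
)
conn.execute("INSERT INTO load_log VALUES ('data/a.csv')")

first_run = new_files(conn)   # files still to be loaded
mark_loaded(conn, first_run)  # would happen after a successful load
second_run = new_files(conn)  # nothing left to load
```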
Short Run:
Your approach
You've already found a viable solution, and your concern about the reliability of the last query id function is understandable. Procedures' sessions are isolated and so the last_query_id() function will be isolated to only the statements executed within that procedure. It might be unnecessary to use a procedure, but I personally like that they let you create reusable abstractions.
Another approach
An alternative, if you don't like the approach you're using, would be to create a single table with a single VARIANT data column plus the stage metadata columns, maintained by a single giant pipe, and you could maintain a set of materialized views over that table, which would filter, convert variant fields to columns, and sanitize, as appropriate.
There are some benefits:
simpler: integrating new prefixes for a stage requires only an additional materialized view, not an additional pipe + task
more control: you'd be able to operate directly and automatically on the data in raw form, rather than needing to load into a table and then check it. This means you can perform data quality checks, metadata checks, and sanitization.
maintainable: the use of materialized views over an immutable source means you can at any time change the logic and perform a full backfill with little effort.
Long Run:
Notification Integrations enable Snowflake to listen (and possibly notify in the future, roadmap-gods willing) to external messaging systems. At this moment only Azure is supported, so it won't work for your case, but keep an eye out over the next few months -- I think it's safe to speculate that we will see this feature grow to support AWS, and a more direct and concise manner for implementing your original solution will eventually become available.
I'm intending to use both SQL Server and simple text files to save my data.
Information such as user data is going to be stored in SQL Server. The RSS feed for each user is going to be stored in a folder named with the user ID, and inside this folder I put the files that store the data. Each file can hold only 20 lines; if there are more than 20, I make a new file.
When I need to read this data, I simply open the last file in the user's folder.
I need to know: what are the advantages and disadvantages of this method?
thanx
I would suggest storing the text file data in a table in the database, in either a VARCHAR(8000) or BLOB column.
The advantages of storing it in the database are:
All your data is stored in a single place. It is very easy for you to backup and restore in other place, if required
Database by default comes with concurrency and if you have say multiple users trying to access the same row, same table, database handles it inherently
When you go for files and database kind of hybrid approach, you are going for distributed storage and you have to always make sure that they are consistent
If you want to just store the latest text file content, go for UPDATE. If you want to keep history of earlier text files content, go for SCD Type 2 kind of storage or go for historical table containing previous text file data
Database is a single contained unit and you can do so many things on it like : Transparent data encryption, masking, access control and all security related stuff in a single contained unit. In hybrid approach, you have to manage security in two places.
When all your data is in a single place, and once you have proper indexes, you can write queries and come up with many different reporting use cases using SQL. But if the data is distributed, you have to work out how you will handle the different reporting use cases.
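As a rough sketch of the "UPDATE for latest content, history table for earlier content" option from the list above, the example below keeps the current feed text in one table and archives the previous version in another. It uses Python's sqlite3 as a stand-in for SQL Server; the table names and `save_feed` are hypothetical, and this is a simple history table, not a full SCD Type 2 design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_feed (
    user_id INTEGER PRIMARY KEY,
    content TEXT NOT NULL
);
CREATE TABLE user_feed_history (
    user_id INTEGER NOT NULL,
    content TEXT NOT NULL,
    archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def save_feed(conn, user_id, content, keep_history=True):
    """Store the latest feed text; optionally archive the previous version."""
    with conn:  # both statements commit together or not at all
        if keep_history:
            # Copies the current row (if any) before it is overwritten.
            conn.execute(
                "INSERT INTO user_feed_history (user_id, content) "
                "SELECT user_id, content FROM user_feed WHERE user_id = ?",
                (user_id,),
            )
        conn.execute(
            "INSERT OR REPLACE INTO user_feed (user_id, content) VALUES (?, ?)",
            (user_id, content),
        )


save_feed(conn, 42, "first batch of items")
save_feed(conn, 42, "second batch of items")

latest = conn.execute(
    "SELECT content FROM user_feed WHERE user_id = 42"
).fetchone()[0]
history = [
    content
    for (content,) in conn.execute(
        "SELECT content FROM user_feed_history WHERE user_id = 42"
    )
]
```

Because both writes happen in one transaction, the current row and its history cannot drift apart, which is the consistency problem the hybrid file-plus-database approach would have to solve by hand.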
The question is not quite correct.
You should start by clarifying the requirements for the application. Ask yourself the following questions:
What type of data queries need to be executed (selects, updates, reports)?
How many users will there be? How often will requests come from them? Must data be synchronized across users (concurrency)?
Need of authentication and authorization, localization.
Need for modification history support.
Etc.
Databases usually have all these mechanisms, so you do not have to implement them in your application.
Depending on your application needs you decide what strategy to use for storing the data: by means of database, files, or by both approaches.
Can someone explain to me what the fundamental difference is between Core Data (apparently, a "data store") and a database like SQLite or MySQL?
I am working on writing an iPhone app, and needed a table of static data to display. I thought core data would be a good choice for this, so I got everything set up and functioning as far as the database (i'm sorry - data STORE) went, and then went to try to import my data (it was in an excel file which I exported to CSV). I was thinking it should be a straight forward process like I have done in SQLite and other databases many times, but as it turned out after much research, the only "official" way to do this was to write a parser specifically for my data.
When I asked about this on the Apple Developer forums, the response I got was basically "What kind of idiot are you to think that you could possibly import data directly without having to write code to do it? Core data isn't a database- it's a data STORE!!" For the life of me, though, I can't see the distinction. In every way I have looked at it, core data behaves EXACTLY like a database, with a fancy way of accessing it and enough abstraction that it can use a variety of file formats for actually storing the data. In fact, I was eventually able to import my data using a simple SQLite .import command, so I really don't understand why the concept was so foreign to the responders to my original question.
So what am I missing here? What is so fundamentally different about a data store from a database that makes the concept of simple data importing completely alien to those who know the technology?
Core Data is not simply a means of persisting/storing data to and from disk as is SQL. Core Data's true function is to provide the complete model layer for the Model-View-Controller app design that the Apple API uses. As such, Core Data is primarily an object-graph manager with persistence options tacked onto the side.
An object-graph is a collection of live objects in memory. In Core Data, these are the managed objects. They are called "managed" objects because the managed object context observes the objects constantly making sure they are in the states and relationships that the data model says they should be in.
Core Data does provide persistence options, but exactly what that option is for any particular implementation is largely hidden. You can even use the same data model and managed objects with different persistence methods, sometimes in the same app.
The key difference with SQL is that SQL writes the actual data to disk whereas Core Data serializes live objects. When you look at a sqlite store in Core Data you are looking at objects that have been taken apart and "freeze dried". Obviously, "freeze drying" objects requires a rather specific data format in the sqlite store so the Core Data store uses its own custom schema that is largely the same regardless of details of the store.
That is why you can't just swap in any old SQL file and expect Core Data to import it. The SQL file is rows, tables and columns of data, not the specialized tables, columns and rows used to reconstitute freeze-dried objects.
Since Core Data is first and foremost an object-graph manager, the only supported and reliable means of importing data is to create the object-graph. In the case of an SQL file, that means reading the SQL data using the SQL api and then generating managed objects from that data and then saving them to a persistent store.
That part is more work, but you save time integrating the data into the rest of the app and upgrading the data, and you gain reliability and maintainability.
A dictionary definition gives me:
Databases are data stores, but a data store isn't always a database.
The feature you expected isn't available in some databases either (though it is in most).
A data store can for example store non-relational data.
They should have just pointed you at the Wikipedia article on Core Data.
According to that article, "It allows data organised by the relational entity-attribute model to be serialised into XML, binary, or SQLite stores. The data can be manipulated using higher level objects representing entities and their relationships. Core Data manages the serialised version, providing object lifecycle and object graph management, including persistence. Core Data interfaces directly with SQLite, insulating the developer from the underlying SQL."
I guess it's the fact that "Core Data manages the serialised version" that means you can't import data directly. That is, you probably can't import data directly into SQLite in such a way that Core Data can manage it, although you probably can import data directly into SQLite in some way.
Core Data is not a data store; a data store is one part of Core Data. Core Data is more closely related to an Object Relational Mapping (ORM) tool. Core Data actually has the option of using SQLite for its datastore, but you can also choose XML files, a proprietary format, or write your own datastore.
Not sure how you were able to import your data with a SQL import, shouldn't be compatible with Core Data since Core Data creates a proprietary SQL database schema that contains a ton of metadata.
Maybe it's better to think of Core Data as an "object store" and a database as a "data store". Core Data is good when you have a variety of types of object, with relationships to each other. The familiar example is a company with employees, who have bosses and reports, belong to departments, are assigned to clients, projects, etc., have schedules, go to meetings. Employees can get reassigned, etc. Even the types of relationships defined vary from time to time. That's a more heavyweight process even with Core Data, but Core Data makes it easier than with a raw database.
If you just have "data", and not "objects", it's easier to use a database. For example if you just have a table of the elements with atomic weights, etc., you might want to just use a database.
For your application it sounds like you just have one table. It will be easy to just use SQLite, which is available, so use it if it's more convenient.
On the other hand, iOS SDK has some pre-built features that interact with Core Data. If you use SQLite you don't get those. So you might avoid custom code to import your data but have to write custom code to display your data. Tough luck. When creating software sometimes you have to write code. Weird, I know.
I maintain an application which has many domain entities that draw data from more than one database. The way this normally works is that the entities are loaded from Database A (in which most of their fields are stored). When a property corresponding to data in Database B is called, the entity fires off SQL to Database B to get all the relevant data.
I'm currently using a 'roll-your-own' ORM, which is ugly, but effective (and easy to understand). I've recently started using NHibernate for entities drawn solely from Database A, but I'm wondering how I might use NHibernate for entities drawn from both Databases A and B.
The best way I can think of to do this is as follows. I continue to use a NHibernate-based class library for entities in Database A. Those entities which also need data from Database B expose all their data from Database B in a single class accessed via a property. When this property is called, it invokes the appropriate repository, and the object is returned. The class library for accessing Database B would therefore need to be referenced from the class library for accessing Database A.
Does this make any sense, and is there a more established pattern for this situation (which must be fairly common)?
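In outline, the arrangement described above could look roughly like the sketch below. It is written in Python only to show the shape of the lazy-loading property, not NHibernate itself; `Customer`, `ExtraDetails`, and `DatabaseBRepository` are hypothetical names, and the repository reads from a dictionary instead of a real Database B.

```python
class ExtraDetails:
    """Data that lives in Database B (hypothetical shape)."""

    def __init__(self, notes):
        self.notes = notes


class DatabaseBRepository:
    """Hypothetical repository for Database B; counts queries for the demo."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def get_details(self, entity_id):
        self.query_count += 1  # a real repository would run SQL here
        return ExtraDetails(self._rows[entity_id])


class Customer:
    """Entity loaded from Database A; Database B data is fetched on first use."""

    def __init__(self, entity_id, name, details_repository):
        self.entity_id = entity_id
        self.name = name
        self._details_repository = details_repository
        self._details = None

    @property
    def details(self):
        # Query Database B only once, the first time the property is read.
        if self._details is None:
            self._details = self._details_repository.get_details(self.entity_id)
        return self._details


repo = DatabaseBRepository({7: "prefers email"})
customer = Customer(7, "Acme", repo)

queries_before = repo.query_count  # nothing fetched yet
notes = customer.details.notes     # first access triggers the query
customer.details                   # second access reuses the cached object
queries_after = repo.query_count
```

Passing the repository into the entity keeps the dependency explicit, which matches the point that the Database A library would need a reference to the Database B library.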
Thanks
David
I don't know how well it maps to your situation, or how mature the NHibernate porting for it is at this point, but you might want to look into Shards.
If it doesn't work for you as-is, it might at least supply some interesting patterns to consider.
EDIT (based on comments):
This indeed doesn't seem to map to your situation, as Shards is about horizontal splitting of data.
If you need to split vertically, you'll probably need to define multiple persistence units. Queries and transactions involving both databases will probably get interesting. I'm afraid I can't really help much with this. This question is definitely related though.
I have a database that has lots of data and is all "neat", normalized (within reason - using EAV), and I have stored procedures to access and modify the data.
I also have a WinForms application that users download to search and view this data (no inserts). To make things handy for use and updates, I've been using SQLite to store this data and it works really well.
I'm working on updating the entire process and I was wondering if I should use a denormalized view of the data to ship out to the users, ala the 1 table with all the properties as columns, or continue to use the same schema as the master database?
My initial thoughts are along the lines of :
Denormalized View:
Benefits...
Provides a simple method of querying the data (since I'm not doing a lot of joins, just a bunch of column searching).
Cons...
I'd have to manage a second data access layer. Granted I don't think it will be difficult, but it is still a bit more work.
If a new property is added, I'd have to modify the schema again and accommodate the changes, whereas with the current schema I can simply query the property bag and work from there.
Same Schema:
Pros...
Same layout as master database, so updates are minimal, and I can even use the same queries when building my Data Access Layer since SQLite doesn't support stored procedures.
Cons...
There are a lot of small tables for lookup codes and the like, so I could start running into issues when building the queries and managing it in the DAL.
How should I proceed?
If you develop your application to query views of the data rather than the underlying data itself, you will be able to keep the same database for both scenarios without concern or the need to alter your DAL.
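A minimal sketch of that suggestion, assuming a simplified EAV layout: the normalized tables stay as they are, and a view presents them as one flat row per item for the application to search. The `item`, `item_property`, and `item_flat` names and the `color`/`size` properties are made up for the example; SQLite is used via Python's sqlite3 because that is what the client application ships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item_property (
    item_id INTEGER NOT NULL REFERENCES item(id),
    prop TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (item_id, prop)
);

-- The view presents one row per item with properties pivoted into columns.
CREATE VIEW item_flat AS
SELECT i.id,
       i.name,
       MAX(CASE WHEN p.prop = 'color' THEN p.value END) AS color,
       MAX(CASE WHEN p.prop = 'size' THEN p.value END) AS size
FROM item AS i
LEFT JOIN item_property AS p ON p.item_id = i.id
GROUP BY i.id, i.name;
""")

conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
conn.executemany(
    "INSERT INTO item_property VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "size", "L"), (2, "color", "blue")],
)

# The application queries the view as if it were a single denormalized table.
red_items = conn.execute(
    "SELECT name, color, size FROM item_flat WHERE color = 'red'"
).fetchall()
# A property with no stored value simply comes back as NULL (None).
gadget = conn.execute(
    "SELECT color, size FROM item_flat WHERE name = 'gadget'"
).fetchone()
```

With this layout, adding a property means redefining the view; the underlying tables and the queries the DAL already runs against the view's existing columns stay the same.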