I was reading about ORMs and one of the descriptions I read said that the ORM interacts with database metadata.
Why is this important or relevant?
Metadata, as I understand it, is just a way of describing what the database contains. So, for example, the database might have an internal table that lists which user tables have been created. Why would something like this be useful to an ORM?
What this means is that the ORM maps the schema, or structure, of the database to objects. Typically, this means mapping tables to classes (a User table to a User class) and fields to attributes (an Age field to a User.Age attribute); each record then represents an instance of that class.
The ORM uses the metadata to generate the code used to access the tables. For example, if a column is a date column, it generates the code to deal with that column as a date.
It reads foreign keys and primary keys to build relationships in the code, as well as to generate the proper SQL syntax.
These are just a few of the ways it uses the metadata.
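To make that concrete, here is the kind of standard INFORMATION_SCHEMA query an ORM (or any schema-discovery tool) might run under the hood; the 'User' table name is just a placeholder:

    -- Column names and types the ORM can map to class attributes
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_name = 'User';

    -- Foreign keys the ORM can turn into object relationships
    SELECT tc.table_name, kcu.column_name, ccu.table_name AS referenced_table
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
      ON kcu.constraint_name = tc.constraint_name
    JOIN information_schema.constraint_column_usage ccu
      ON ccu.constraint_name = tc.constraint_name
    WHERE tc.constraint_type = 'FOREIGN KEY';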
I am creating an abstraction of a database schema using object-oriented programming. I have a design issue: should indices be top-level objects (like tables, views, and stored procedures), or should they rather be accessible through a table, like columns? And what about triggers?
I am building a Python package (http://code.google.com/p/fathom/) for database schema discovery. Right now indices are accessed through a table, but I see that some database administration tools show indices as a separate entry in a tree view, just like tables. That's why I am wondering if I am doing it right.
"Indices" are part of a single table like "columns", they are not independent, like a S.P. where the developer can alter o modify several tables.
They are composed by several columns or expressions from a single table.
In the other hand, I agree sometimes its confusing. Many tools put relations among tables as dependant on a single table, and I prefer to think relations as an item of the database, not as a single table, just the opposite of "indices".
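The DDL itself reflects this distinction. In the sketch below (table and column names are made up), the index is declared against exactly one table, while the relation names two:

    -- An index is bound to a single table and its columns/expressions
    CREATE INDEX idx_customer_name ON customer (last_name, first_name);

    -- A relation (foreign key) involves two tables, which is why it arguably
    -- belongs to the database rather than to either single table
    ALTER TABLE invoice
        ADD CONSTRAINT fk_invoice_customer
        FOREIGN KEY (customer_id) REFERENCES customer (id);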
Consider a database server whose job today is to house one database. The database will likely be moved in the future to another database instance that houses multiple databases and schemas.
Let's pretend the app/project is called Invoicer 2.0. The database is called AcmeInvoice. The database holds all the invoice, customer, and product information. Here's a diagram of the actors and their roles and behaviour.
The schema(s) will largely be used to easily assign permissions to roles. The added benefit here is that the objects aren't under dbo, and that the objects & permissions can be ported to another machine in the future.
Question
What conventions do you use when naming the schema?
Is it good form to name the schema the same as the database?
I would think that if your schema name ends up being the same as your database name, then you are just adding redundancy to your database. Find objects in your database that have a common scope or purpose and create a schema to reflect that scope. For example, if you have an entity for Invoices, and you have some supporting lookup tables for invoice states, etc., then put them all in an Invoice schema.
As a general rule of thumb, I would try to avoid using a name that reflects the application name, database name, or other concrete/physical things, because they can change; instead, find a name that conceptually represents the scope of the objects that will go into the schema.
Your comment states that "the schemas will largely be used to easily assign permissions to roles". Your diagram shows specific user types having access to some/all tables or some/all stored procs. I think trying to organize objects conceptually into schemas and organizing them into schemas from a security standpoint are conflicting goals. I am in favour of creating roles in SQL Server to reflect the types of users, and granting those roles access to the specific objects each user type needs, as opposed to granting the role or user access to the schema to build your security framework.
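A minimal sketch of that role-based approach in T-SQL, with made-up object names (ALTER ROLE ... ADD MEMBER requires SQL Server 2012 or later; older versions use sp_addrolemember):

    -- One role per user type, granted only the objects that type needs
    CREATE ROLE InvoiceClerk;
    GRANT SELECT, INSERT ON dbo.Invoice TO InvoiceClerk;
    GRANT EXECUTE ON dbo.CreateInvoice TO InvoiceClerk;

    -- Users are added to the role instead of receiving schema-wide grants
    ALTER ROLE InvoiceClerk ADD MEMBER [DOMAIN\jsmith];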
Why would you name the schema the same as the database? This means all database objects fall under the same schema. If this is the case, why have a schema at all?
Typically, schemas are used to group objects within a common scope of activity or function. For example, given what you've described, you might have an Invoice schema, a Customer schema, and a Product schema. All Invoice-related objects would go into the Invoice schema, all Customer-related objects into the Customer schema, and the same for Products.
We often will use a Common schema as well which includes objects that might be common to our entire application.
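For illustration, a trimmed-down version of that layout in T-SQL (table definitions reduced to keys; all names are examples):

    CREATE SCHEMA Invoice;
    GO
    CREATE SCHEMA Customer;
    GO
    CREATE SCHEMA Common;
    GO

    -- Invoice-related objects, including supporting lookup tables
    CREATE TABLE Invoice.InvoiceHeader (InvoiceId int PRIMARY KEY);
    CREATE TABLE Invoice.InvoiceState  (StateId int PRIMARY KEY, Name nvarchar(50));

    -- Customer-related objects
    CREATE TABLE Customer.Customer (CustomerId int PRIMARY KEY);

    -- Objects common to the entire application
    CREATE TABLE Common.AuditLog (
        EntryId  int IDENTITY PRIMARY KEY,
        LoggedAt datetime2 NOT NULL DEFAULT sysutcdatetime()
    );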
I would call the database AcmeInvoice (or another suitable name) and the schema Invoicer2.
My reasons are as follows: AcmeInvoice means I am grouping all of that application's objects/data together. It can therefore be moved as one unit to other machines (a backup/restore or detach/attach).
The schema would be Invoicer2. Applications change; maybe in the future you will have Invoicer 2.1 (you would create a new schema), or perhaps a reporting module or system (a Reports schema).
I find that the use of schemas allows me to separate the data/procedures in one database into different groups, which makes it easier to administer permissions.
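Schema-level grants are what make that administration cheap; in SQL Server a single statement covers every current and future object in the schema (the role names here are made up):

    -- One grant covers the whole schema, including objects added later
    GRANT SELECT  ON SCHEMA::Invoicer2 TO ReportingRole;
    GRANT EXECUTE ON SCHEMA::Invoicer2 TO AppRole;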
I'm working on a web-based business application where each customer will need to have their own data (think of the basecamphq.com type of model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in code seems like a disaster waiting to happen. I know Oracle has a way to append a WHERE clause to every query based on a login ID, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use (like creating a view of each table for each customer that filters)?
Worst-case scenario, what is the performance/memory overhead of having 1,000 100 MB databases versus a single 1 TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple when each customer has their own database but quite a bit trickier if they share the database with other customers.
You might want to look into adding Veil to your PostgreSQL installation.
Schemas plus inherited tables might work for this: create your master table, then inherit per-customer tables into per-customer schemas, each providing a default for a company ID or name field.
Set the permissions per schema for each customer and set the schema search path per user. Use the same table names in each schema so that the queries remain the same.
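A rough PostgreSQL sketch of that pattern, with a made-up customer named acme; the table definition is a minimal assumption:

    -- Master table holds the common definition
    CREATE TABLE public.orders (
        order_id serial PRIMARY KEY,
        company  text NOT NULL,
        amount   numeric
    );

    -- Per-customer schema; the inherited child table defaults the company field
    CREATE SCHEMA acme;
    CREATE TABLE acme.orders (
        company text NOT NULL DEFAULT 'acme'
    ) INHERITS (public.orders);

    -- The customer's login can only reach its own schema
    CREATE ROLE acme_user LOGIN;
    GRANT USAGE ON SCHEMA acme TO acme_user;
    GRANT ALL ON acme.orders TO acme_user;
    ALTER ROLE acme_user SET search_path = acme;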
I was thinking of putting staging tables, and the stored procedures that update those tables, into their own schema, such that when importing data from SomeTable into the data warehouse, I would run an Initial.StageSomeTable procedure which would insert the data into the Initial.SomeTable table. This way, all the procs and tables dealing with the Initial staging are grouped together. Then I'd have a Validation schema for that stage of the ETL, and so on.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.
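Concretely, the layout I have in mind would look something like this sketch (the column definitions and the dbo.SomeTable source are assumptions):

    CREATE SCHEMA Initial;
    GO
    CREATE SCHEMA Validation;
    GO

    -- The same logical table exists once per ETL stage
    CREATE TABLE Initial.SomeTable    (Id int, Payload nvarchar(max));
    CREATE TABLE Validation.SomeTable (Id int, Payload nvarchar(max));
    GO

    CREATE PROCEDURE Initial.StageSomeTable
    AS
    BEGIN
        -- The dbo.SomeTable source is a placeholder for this sketch
        INSERT INTO Initial.SomeTable (Id, Payload)
        SELECT Id, Payload FROM dbo.SomeTable;
    END;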
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemas to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.
I think it's appropriate enough; it doesn't really matter. You could use another database if you liked, which is actually what we do.
I'm not sure why you would want a Validation schema, though. What are you going to do there?
Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify the schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL Server 2005 and up). Load the data into the first schema, then when it's validated, "swap" the partition into dbo, after first swapping the dbo partition into a "dumpster" schema copy of the table. Net production downtime is measured in seconds, and it's all carefully wrapped in a declared transaction.
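A sketch of that swap using SQL Server partition switching; the Sales table and the staging/dumpster schema names are illustrative:

    -- Assumes staging.Sales, dbo.Sales, and dumpster.Sales are structurally
    -- identical tables on the same partition scheme
    BEGIN TRANSACTION;

    -- Move the current production partition out of the way...
    ALTER TABLE dbo.Sales     SWITCH PARTITION 1 TO dumpster.Sales PARTITION 1;
    -- ...then switch the freshly loaded, validated partition in
    ALTER TABLE staging.Sales SWITCH PARTITION 1 TO dbo.Sales PARTITION 1;

    COMMIT TRANSACTION;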
I am working on creating the necessary views, triggers, and stored procedures to make it easier for people to use Integration Services to copy data to and from our database. The database uses an entity-attribute-value schema, so the foreign key relationships are not always explicitly stated in the schema, but in my views I can hopefully make them more explicit.
So if I have a vehicle entity and I want to copy it, and have all the related parts of the vehicle also be copied, what should I be looking at with the service?
I am not very comfortable with Integration Services, so I may ask for some clarification after responses.
Thank you.
SSIS typically loads a single branch of a dataflow into a table. A branch can split to load multiple tables.
I'd say it would be better to load into a staging table which always matches the required shape for an entity, have the users build their dataflows to populate the staging table, and then use a single INSERT/UPDATE in an Execute SQL task to update your view (via an INSTEAD OF trigger, right?).
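For reference, the INSTEAD OF trigger mentioned above might look roughly like this; the view and EAV table names are assumptions based on the question:

    -- Hypothetical names: dbo.VehicleView over EAV tables dbo.Entity
    -- and dbo.AttributeValue
    CREATE TRIGGER trg_VehicleView_Insert
    ON dbo.VehicleView
    INSTEAD OF INSERT
    AS
    BEGIN
        -- Decompose each flat staged row into entity and attribute-value rows
        INSERT INTO dbo.Entity (EntityId, EntityType)
        SELECT EntityId, 'Vehicle' FROM inserted;

        INSERT INTO dbo.AttributeValue (EntityId, Attribute, Value)
        SELECT EntityId, 'Make', Make FROM inserted
        UNION ALL
        SELECT EntityId, 'Model', Model FROM inserted;
    END;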
Another good possibility is to create a custom data destination component which enforces all your expectations.