I am currently working on a project in F# that takes in data from Excel spreadsheets, determines if it is compatible with an existing table in SQL Server, and then adds the relevant rows to the existing table.
Some of the data I am working with is more specific than the types provided by T-SQL. That is, T-SQL has a type "date", but I need to distinguish between sets of dates that are at the beginning of each month or the end of each month. This same logic applies to many other types as well. If I have types:
Date(Beginning)
Date(End)
they will both be converted to the T-SQL type "date" before being added to the table, therefore erasing some of the more specific information.
In order to solve this problem, I am keeping a log of the serialized types in F#, along with which column number in the SQL Server table they apply to. My question is: is there any way to store this log somewhere internally in SQL Server so that I can access it and compare the serialized types of the incoming data to the serialized types of the data that already exists in the table before making new inserts?
Keeping metadata outside of the DB and maintaining them manually makes your DB "expensive" to manage plus increases the risk of errors that you might not even detect until something bad happens.
If you have control over the table schema, there are at least a couple of simple options. For example, you can add a column that stores the type info. For something simple with just a couple of possible values as you described, just add a new column to store the actual type value. Update the F# code to de-serialize the source into separate DATE and type (BEGINNING/END) values which are then inserted to the table. Simple, easy to maintain and easily consumed.
You could also create a user defined type for each date subtype but that can be confusing to another DBA/dev plus makes it more complicated when retrieving data from your application. This is generally not a good approach.
Yes, you can do that if you want to.
Related
My application has a database table that is used to record the attendance of employees. And the column attedance_status has only three possible values - "present", "absent", "on_leave", and NULL as default.
How do I add it to the database? So far I have come up with two possible ways.
Create another table attendance_status with status_id and status_value and add the above values to it. And then use the id in the application for all SQL queries.
Probably the bad way. Hardcode the values (maybe in a config file) and use it throughout the app's SQL queries.
Am I missing the right way? How should this be approached?
Either will work, but Option 1 will give you flexibility in the event that the requirements change and is the standard data model. I would, however, name my columns a little differently. I would have id, value, name. Then the references become attendance_status.id and attendance_status.value. The third column is available for use in displays or reports or whatever. value is on_leave and name is On leave.
Option 2 works provided the data input point is totally closed. If someone codes new functionality there is the risk that he or she will invent something different to mean the same thing like onLeave.
I have a table, DD, which is a data dictionary, with fields (say):
ColumnID (longint PK), ColumnName (varchar), Datatype (varchar)
I have another table, V, where I have sets of records in the form:
ColumnID (longint FK), ColumnValue (varchar)
I want to be able to convert sets of records from V into another table, Results, where each field will be translated based on the value of DD.Datatype, so that the destination table might be (say):
ColumnID (longint FK), ColumnValue (datetime)
To be able to do this, ISTM that I need to be able to do something like
CONVERT(value of DD.Datatype, V.ColumnValue)
Can anyone give me any clues on whether this is even possible, and if so what the syntax would be? My google-fu has proved inadequate to find anything relevant
You could do something like this with dynamic sql, certainly. As long as you are aware of the limitation that the datatype is a property of the COLUMN in the resultset, and not each cell in the resultset. So all the rows in a given column must have the same datatype.
The only way to accomplish something like CONVERT(value of DD.Datatype, V.ColumnValue) in SQL is with dynamic SQL. That has it's own problems, such as basically needing to use stored procedures to keep queries efficient.
Alternately, you could fetch the datatype metadata with one query, construct a new query in your application, and then query the database again. Assuming you're using SQL Server 2012+, you could also try using TRY_CAST() or TRY_CONVERT(), and writing your query like:
SELECT TRY_CAST(value as VARCHAR(2)) FieldName
FROM table
WHERE datatype = 'VARCHAR' AND datalength = 2
But, again, you've got to know what the valid types are; you can't determine that dynamically with SQL without dynamic SQL. Variables and parameters are not allowed to be used for object or type names. However, no matter what you do, you need to remember that all data in a given column of a result set must be of the same datatype.
Most Entity-Attribute-Value tables like this sacrifice data integrity that strong typing brings by accepting that the data type is determined by the application and not the RDBMS. EAV does not allow you to have your cake (store data without a fixed schema) and eat it, too (enjoy DB enforced strong data typing, not having to typecast strings in the application, etc.).
EAV breaks data normalization pretty badly. It breaks First Normal Form; the most basic rule, and this is just one of the consequences. EAV tables will make querying the data anywhere from awkward to extremely difficult, and you're almost always going to sacrifice performance doing it because the RDBMS is built around the relational model.
That doesn't mean you shouldn't ever use EAV tables. They're relatively great for user defined fields. However, it does mean that they're always going to suck to query and manage. That's just the tradeoff. You broke First Normal Form. Querying and performance are going to suffer consequences of that choice.
If you really want to store your all your data like this, you should look at either storing data as blobs of XML or JSON (SQL Server 2016) -- but that's a general pain to query -- or use a NoSQL data store like MongoDB or Cassandra instead of an SQL RDBMS.
I am new to SSIS.
I have a number of MS access tables to transform to SQL. Some of these tables have datetime fields needed to go under some rules before sitting in respected SQL tables. I want to use Script component that deals with these kind of fields converting them to the desired values.
Since all of these fields need same modification rules, I want to apply the same code base to all of them thus avoiding the code duplication. What would be the best option for this scenario?
I know I can't use the same Script Component and direct all of those datasets outputs to it because unfortunately it doesn't support multi-inputs . So the question is is it possible to apply a set of generic data manipulation rules
on a group of different datasets' fields without repeating the rules. I can use a Script component for each ole db input and apply the same rule on them each. But it would not be an efficient way of doing that.
Any help would be highly appreciated.
SQL Server Integration Services has a specific task to suit this need, called a Data Conversion Transformation. This can be accomplished on the data source or via the task, as noted here.
You can also use the Derived Column transformation to convert data. This transformation is also simple, select an input column and then chose whether to replace this column or create a new output column. Then you apply an expression for the output column.
So why use one over the other?
The Data Conversion transformation (Pictured Below) will take an input, convert the type and provide a new output column. If you use the Derived Column transformation, you get to apply an expression to the data, which allows you to do more complex manipulations on the data.
I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility its self will have no knowledge of how data is kept in the destination (table structure), but I would like to provide writing a SQL statement to return data from the destination using a complex SQL query with multiple join statements. As long as the data is in a standardized format that the utility can recognize (field names) in an ADO query.
What I would like to do is then modify the live data in this ADO Query. However, since there are multiple join statements, I'm not sure if it's possible to do this. I know at least with BDE (I've never used BDE), it was very strict and you had to return all fields (*) and such. ADO I know is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complimentary table which is joined by the same primary key field, one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in Small table). So, a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing these Fields which have no influence on the joins is usually no problem.
Appending is ... you can limit the Append to one of the Tables by
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
inherited;
TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without an Requery you won't get much further.
The better way will be setting Values by Procedure e.g. in BeforePost, Requery and Abort.
If your View would be persistent you would be able to use INSTEAD OF Triggers
Jerry,
I encountered the same problem on FireBird, and from experience I can tell you that it can be made(up to a small complexity) by using CachedUpdates . A very good resource is this one - http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, and therefore this is no longer an issue for me, however still an interesting subject to know.
The new structure of the application consists of attaching multiple "Field Links" on various fields from the original set of data. Each of these links references the original field name and a SQL Statement which is to be executed when that field is being imported. Multiple field links can be on one single field, therefore can execute multiple statements, placing the value in various tables, etc. The end goal was an app which I can easily and repeatedly export a common dataset from an original source to any outside source with different data structures, without having to recompile the app.
However the concept of cached updates was not appealing to me, simply for the fact pointed out in the link in RBA's answer that data can be changed in the database in the mean-time. So I will instead integrate my own method of customizable data pushes.
We have got a .Net Client that calls a Webservice. We want to store the result in a SQL Server database.
I think we have two options here how to store the data, and I am a bit undecided as I can't see the pros and cons clearly: One would be to map the results into database fields. That would require us to have database fields corresponding to each possible result type, e.g. for each "normal" result type as well as those for faults.
On the other hand, we could store the resulting XML and query that via the SQL Server built in XML functions.
Personally, I am comfortable with dealing with both SQL and XML, so both look fine to me.
Are there any big pros and cons and what would I need to consider in terms of database design when trying to store the resulting XML for quite a few different possible Webservice operations? I was thinking about a result table for each operation that we call with different entries for the different possible outcomes / types and then store the XML in the right field, e.g. a fault in the fault field, a "normal" return type in the appropriate field etc.
We use a combination of both. XML for reference and detailed data, and text columns for fields you might search on. Searchable columns include order number, customer reference, ticket number. We just add them when we need them since you can extract them from the XML column.
I wouldn't recommend just the XML. If you store 10.000 messages a day, a query like:
select * from XmlLogging with (nolock) where Response like '%Order12%'
can become slow and interfere with other queries. You also can't display the logging in a GUI because retrieval is too slow.
I wouldn't recommend just the text columns either. If the XML format changes, you'd get an empty column. That's hard to troubleshoot without the XML message. In addition, if you need to "replay" the message stream, that's a lot easier with the XML messages. Few requirements demand replay, but it's really helpful when repairing the fallout of production problems.