Hi I am using RevoScaleR package from Revolution Analytics and
I find it quite odd that the functions that are available for sql server objects are very limited .
for example:
RxSqlServerData does not support querying from a view .
I have a view which I have created from multiple tables and I intend to use this view as my source of data and I could not find anything that can solve my purpose in RevoScaleR .
I can very well create another table ( which I dont want to for many reasons) but I am looking for solutions in RevoScaleR space.
I figured this out ,if someone else stumbles upon this ,I guess it would help them(although its highly unlikely ,the documentation ,if consulted,should make this pretty trivial)
Like RxOdbcData ,RxSqlServerData also accepts table parameter and sqlquery parameter .
Now if you want to use a view or a stored Procedure you can exclude the table parameter and use sqlquery parameter .
keep in mind that both can not be used with each other
As #Bg1850 suggested - searching through the Microsoft/RevoScaleR docs is not very easy, but I think I found something here under Section : "Create New Data Sources" .
Basically, you have to set up a channel to your DB and you can use this as the basis to define different data sources ( = complete table or view or result of a SQL query).
Related
Problem: Junior SQL dev here, working with a SQL Server database where we have many functions that use temp tables to pull data from various tables to populate Crystal reports etc. We had an issue where a user action in our client caused a string to overflow the defined NVARCHAR(100) character limit of the column. As a quick fix, one of our seniors decided on a schema change to set the column definition to NVARCHAR(255), instead of fixing the issue of the the string getting too long. Now, we have lots of these table based functions that are using temp tables referencing the column in question but the temp table variable is defined as 100 instead of 255.
Question: Is there an easy way to find and update all of these functions? Some functions might not reference the table/column in question at all, but some heavily rely on this data to feed reports etc. I know I can right click a table and select "View Dependencies" in SQL Server Management Studio, but this seems very tedious to have to go through all of them and then update our master schema before deploying it to all customers.
I thought about a find and replace if there is a way to script or export the functions but I fear a problem I will run into is one variable in one function might be declared as TransItemDescription NVARCHAR(100) and one might be TransItemDesc NVARCHAR (100). I've heard of people avoiding temp tables maybe because of these issues so maybe there is just bad database design here?
Thus far I've been going through them one at a time using "View Dependencies" in SSMS.
I think the best solution would be to script out the whole database into a single script from SSMS. Then use Notepad++ (or equivalent) to either find:
All occurrences of NVARCHAR(100)
All occurrences of the variable name, e.g. TransItemDescription, TransItemDesc.
Once you have found all occurrences then make a list of all of the functions to be fixed. Then you would still need to do a manual fix to all functions, but once complete the issue should be totally resolved.
I'm following a tutorial on Azure Data Factory migration from Azure SQL to Blob through pipelines. While most of the concepts make sense, the 'Copy Data' query is a bit confusing. I have a background in writing Oracle SQL, but Azure SQL on ADF is pretty different and I'm struggling to find specific technical documentation, probably because it's not widely adopted yet.
Pipeline configuration shown below:
Query is posted below:
SELECT data_source_table.PersonID,data_source_table.Name,data_source_table.Age,
CT.SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION
FROM data_source_table
RIGHT OUTER JOIN CHANGETABLE(CHANGES data_source_table,
#{activity('LookupLastChangeTrackingVersionActivity').output.firstRow.SYS_CHANGE_VERSION})
AS CT ON data_source_table.PersonID = CT.PersonID
WHERE CT.SYS_CHANGE_VERSION <=
#{activity('LookupCurrentChangeTrackingVersionActivity').output.firstRow.CurrentChangeTrackingVersion}
Output to the sink Blob as a result of the 'Copy Data' query:
2,name2,14,4,U
7,name7,51,3,I
8,name8,31,5,I
9,name9,38,6,I
Couple questions I had:
There's a lot of external referencing from other activities in the 'Copy Data' query like #{activity('...').output.firstRow.CurrentChangeTrackingVersion. Is there a way to know the appropriate syntax to referencing external activities? Can't find any good documentation the syntax, like what .firstRow is or what the changetable output looks like. I can't replicate this query in SSMS, which makes it a bit of a black box for me.
SYS_CHANGE_OPERATION appears in the SELECT with no table name prefix. Is this directly querying from the table in SourceDataset? (It points to data_source_table, which has table tracking enabled) My main confusion stems from how table tracking information is stored in the enabled tables. Is there a way to show all the table's tracked changes in SSMS? I see some documentation on what the return values, but it's hard for me to visualize it without seeing it on the table, so an output query of some return values would be nice.
LookupLastChangeTracking activity queries in all rows from a table (which when I checked, is just one row), but LookupCurrentChangeTracking activity uses a CHANGE_TRACKING function to pull the version of the data sink in table_store_ChangeTracking_version. Why does it use a function when the data sink's version is already recorded in table_store_ChangeTracking_version?
Sorry for the many questions, but I can't find any way to make this learning curve a bit less steep. Any guides or resources would be awesome!
There is an article to get the same thing done from the UI and it will help you understand it better .
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal .
1 . These are the Lookup activity ,. very straight forward , please read about them here .
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
2.SYS_CHANGE_OPERATION is a column on data_source_table and so that should be fine . Regarding the details on the how the change tracking (CT) is stored , I am not sure if all the system table are exposed on Azure SQL , but we did had few table on the on-prem version of the SQL which could be queried if needed . But for this exercise I think that will be an over kill .
I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility its self will have no knowledge of how data is kept in the destination (table structure), but I would like to provide writing a SQL statement to return data from the destination using a complex SQL query with multiple join statements. As long as the data is in a standardized format that the utility can recognize (field names) in an ADO query.
What I would like to do is then modify the live data in this ADO Query. However, since there are multiple join statements, I'm not sure if it's possible to do this. I know at least with BDE (I've never used BDE), it was very strict and you had to return all fields (*) and such. ADO I know is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complimentary table which is joined by the same primary key field, one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in Small table). So, a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing these Fields which have no influence on the joins is usually no problem.
Appending is ... you can limit the Append to one of the Tables by
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
inherited;
TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without an Requery you won't get much further.
The better way will be setting Values by Procedure e.g. in BeforePost, Requery and Abort.
If your View would be persistent you would be able to use INSTEAD OF Triggers
Jerry,
I encountered the same problem on FireBird, and from experience I can tell you that it can be made(up to a small complexity) by using CachedUpdates . A very good resource is this one - http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, and therefore this is no longer an issue for me, however still an interesting subject to know.
The new structure of the application consists of attaching multiple "Field Links" on various fields from the original set of data. Each of these links references the original field name and a SQL Statement which is to be executed when that field is being imported. Multiple field links can be on one single field, therefore can execute multiple statements, placing the value in various tables, etc. The end goal was an app which I can easily and repeatedly export a common dataset from an original source to any outside source with different data structures, without having to recompile the app.
However the concept of cached updates was not appealing to me, simply for the fact pointed out in the link in RBA's answer that data can be changed in the database in the mean-time. So I will instead integrate my own method of customizable data pushes.
I have a table with about 30 fields. I current have several stored procedures which access either a (aggregated) view of this table or the table itself. For many of these SPs I would like to assure that the returned records have all the same fields with the same column names. Is there a way to do this where I don't have to change 20 stored procs if I do need to change the output.
My way around it thus far is to provide clients with lists of ID which they then call SP's that return the data however this seems to be slow compared with getting the data in one shot. I have also considered using the formatting stored procs with a cursor from inside the search stored procs but was unsure if that would really buy me a whole lot.
The typical way to define a standardised and consistent data access method across multiple stored procedures in SQL Server to use Views.
Now your problem description seems to suggest that you are already using Views in order to manage your data access. If you are indeed unable to use Views for a specific reason, perhaps you can clarify the nature of your problem further for us.
When using multivalue parameters in sql reporting services is it more appropriate to implement the list filter using a filter on the dataset itself, the data region control or change the actual query that drives the dataset?
SSRS will support any scenario, so then I ask, is there a reason beyond the obvious of why this should be done at one level over another?
It makes sense to me that modifying the query itself and asking the RDBMS to handle the filtering would be most efficient but maybe I am missing something with respect to how the SSRS Data Processing Extension may handle this scenario?
You are correct. The way to go is to pass the parameters through to the database engine.
Reporting Services should only be ideally used to render content. The less data that you need to pass back to the client web browser, the faster the report will render.
You may find my answer to a similar post regarding using mulit-value parameters to be of use.
Passing multiple values for a single parameter in Reporting Services
Hope this helps but please feel free to pose any further questions you may have.
Cheers,
John
Using table-valued UDF is a good approach, but there is still one issue - in case if this function is called in many places of query, and even inside inner select, there can be performance problem. You can resolve this issue using table variable (or temp table eather):
DECLARE #Param (Value INT)
INSERT INTO #Param (Value)
SELECT Param FROM dbo.fn_MVParam(#sParameterString,',')
...
where someColumn IN(SELECT Value FROM #Param)
so function will be called only once.
Othe thing, if you don't use stored procedure, but embedded SQL query instead, you can just put MVP into query:
...
where someColumn IN(#Param)
...
Use the RDBMS to do the main filtering
SSRS provides filtering for the purposes on data driven display and/or dynamic display. Especially useful for sub reports etc