Let's say there is a very large SQL database - hundreds of tables, thousands of fields, millions of records. The database is in Dutch, and I would like to translate the values of certain fields into English for test purposes. The translations don't have to be perfect; they just have to be readable for the testers.
I know that most of the texts are stored in fields named "name" and "description" throughout the database. Alternatively, all fields of type NVARCHAR (any length) in all tables would be candidates for translation. Enumerating all the tables and fields by hand is really too much work and I would like to avoid it, so translating only these fields would be enough.
Is there a way to walk through all the tables and for all the records retrieve the values from the specific fields and replace them with their English translations? Can this be done with SQL only?
The db server doesn't matter - I can mount the database on MSSQL, Oracle or any server of my choice. Translating the texts with Google or some other automatic tool is good enough for the purpose. Of course, this tool must have an API so it can be used automatically.
Does anybody have an experience with similar operations?
In pseudocode, here is what I would do, using Oracle:

Query AllFieldsQuery = new Query(connection, "SELECT table_name, column_name
    FROM user_tab_columns
    WHERE LOWER(column_name) IN ('name', 'description')");
AllFieldsQuery.ExecuteReader();

(The LOWER() is there because Oracle stores unquoted identifiers in uppercase in the data dictionary.)
It gives you something like this :
TABLE_NAME | COLUMN_NAME
Table1 | Name
Table2 | Name
Table2 | Description
........... | ...........
Foreach TableColumnLine
    Query FieldQuery = new Query(connection, "SELECT DISTINCT " + COLUMN_NAME +
        " AS ToTranslate FROM " + TABLE_NAME);
    Create a new parameterized query:
    MyParamQuery = new ParamQuery(connection, "UPDATE " + TABLE_NAME +
        " SET " + COLUMN_NAME + " = #Translated WHERE " + COLUMN_NAME + " = #ToTranslate");
    Foreach ToTranslateLine
        #Translated = GetTranslationFromGoogle(#ToTranslate);
        MyParamQuery.Execute(ToTranslate = #ToTranslate, Translated = #Translated);
    End Foreach ToTranslateLine
End Foreach TableColumnLine
The AllFieldsQuery gives you all the fields in all the tables you need to update.
The FieldQuery gives you all the texts you need to translate.
The MyParamQuery updates each one of these texts.
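To make the idea concrete, here is a minimal runnable sketch of the same loop in Python, using an in-memory SQLite database as a stand-in (SQLite has no user_tab_columns, so the schema is inspected with PRAGMA) and a stub dictionary in place of a real translation API. The table and sample values are made up for illustration:

```python
import sqlite3

def get_translation(text):
    # Stub: a real implementation would call a translation API here.
    return {"naam": "name", "omschrijving": "description"}.get(text, text)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Table1 (name) VALUES (?)",
                 [("naam",), ("omschrijving",)])

# Equivalent of AllFieldsQuery: find every (table, column) pair to translate.
targets = []
for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    for col in conn.execute(f"PRAGMA table_info({table})"):
        if col[1].lower() in ("name", "description"):
            targets.append((table, col[1]))

# Equivalent of FieldQuery + MyParamQuery: translate each distinct value once.
for table, column in targets:
    values = [v for (v,) in conn.execute(f"SELECT DISTINCT {column} FROM {table}")]
    for value in values:
        conn.execute(f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
                     (get_translation(value), value))
    conn.commit()  # one commit per table
```

Translating each distinct value only once (the DISTINCT) also keeps the number of API calls down when the same text appears in many records.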
This may be a very long process; how long depends mainly on the time it takes to get each translation. I would recommend committing at least once per table, and printing a report that tells you, for each table, whether everything went OK. Your tool could then have an exclude function that filters fields and/or tables out of the first query because they were already translated in a previous run.
Enhancement: write a file and execute it at the end of the translation, instead of executing SQL updates directly.
If the process is too long - and it possibly will be - then instead of executing the SQL UPDATEs directly, write all of them to a file, and after the foreach is done, run that file against the database.
Enhancement: multi-threaded version.
A multi-threaded version of your tool could save you some translation time. Put all the values to translate into a synchronized queue, use a pool of threads to translate from that queue, and put the translated messages into another synchronized queue, which is used to write the SQL output file or to launch the updates.
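A minimal sketch of that producer/consumer pipeline, with a stub in place of the real translation call and the UPDATE statements collected in a list instead of written to a file (table and values are made up for illustration):

```python
import queue
import threading

def translate(text):
    # Stub: a real worker would call the translation API here.
    return text.upper()

to_translate = queue.Queue()
translated = queue.Queue()

for value in ["naam", "omschrijving", "adres"]:
    to_translate.put(value)

def worker():
    # Drain the input queue; each worker exits when the queue is empty.
    while True:
        try:
            value = to_translate.get_nowait()
        except queue.Empty:
            return
        translated.put((value, translate(value)))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# After all workers finish, turn the results into UPDATE statements
# for the SQL output file.
updates = []
while not translated.empty():
    original, english = translated.get()
    updates.append(f"UPDATE my_table SET name = '{english}' WHERE name = '{original}';")
```

queue.Queue is already thread-safe, so no extra locking is needed; the pool size (4 here) should be tuned to the latency of the translation API.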
I want to display all audit history data in the MS CRM format.
I have imported all records from the AuditBase table from CRM into a table on another database server.
I want to query this table's records with SQL and present them in the Dynamics CRM format.
Here is what I have done so far:
select
    AB.CreatedOn as [Created On], SUB.FullName [Changed By],
    Value as Event, AB.AttributeMask [Changed Field],
    AB.changeData [Old Value], '' [New Value]
from Auditbase AB
inner join StringMap SM on SM.AttributeValue = AB.Action and SM.AttributeName = 'action'
inner join SystemUserBase SUB on SUB.SystemUserId = AB.UserId
--inner join MetadataSchema.Attribute ar on AB.AttributeMask = ar.ColumnNumber
--inner join MetadataSchema.Entity en on ar.EntityId = en.EntityId and en.ObjectTypeCode = AB.ObjectTypeCode
--inner join Contact C on C.ContactId = AB.ObjectId
where ObjectId = '00000000-0000-0000-0000-000000000000'
order by AB.CreatedOn desc
My problem is that AttributeMask is a comma-separated value that I need to compare with the ColumnNumber field of the MetadataSchema.Attribute table. I also need to work out how to get the New Value from that entity.
I have already checked this link: Sql query to get data from audit history for opportunity entity, but it's not giving me the [New Value].
NOTE: I cannot use "RetrieveRecordChangeHistoryResponse", because I need to show this data on an external webpage from a SQL table (not the CRM database).
Well, basically Dynamics CRM does not create this Audit View (the way you see it in CRM) using SQL query, so if you succeed in doing it, Microsoft will probably buy it from you as it would be much faster than the way it's currently handled :)
But really - the way it works currently, SQL is used only to obtain all the relevant Audit view records (without any matching against attribute metadata or anything else); all the parsing and matching with metadata is then done in the .NET application. The logic is quite complex, and there are so many different cases to handle that I believe recreating it in SQL would require not just a simple "select" query but a really complex procedure. Even that might not be enough, because not everything in CRM is kept in the database - some things are simply compiled into the application's libraries - and it could take one person weeks or even months to accomplish (of course that's my opinion; maybe some T-SQL guru will prove me wrong).
So, I would do it differently - use RetrieveRecordChangeHistoryRequest (which was already mentioned in some answers) to get all the Audit Details (already parsed and ready to use) using some kind of .NET application (probably running periodically, or maybe triggered by a plugin in CRM etc.) and put them in some Database in user-friendly format. You can then consume this database with whatever external application you want.
Also I don't understand your comment:
I can not use "RetrieveRecordChangeHistoryResponse", because i need to
show these data in external webpage from sql table(Not CRM database)
What kind of application cannot call an external service (you can create a custom service; you don't have to use the CRM service) to get some data, but can access an external database? You should not read from the db directly; a better approach would be to prepare a web service that returns the audit you want (using the CRM SDK under the hood) and call this service from the external application. Unless, of course, your external app is only capable of reading databases, not calling custom web services...
It is not possible to reconstruct a complete audit history from the AuditBase tables alone. For the current values you still need the tables that are being audited.
The queries you would need to construct are complex and writing them may be avoided in case the RetrieveRecordChangeHistoryRequest is a suitable option as well.
(See also How to get audit record details using FetchXML on SO.)
NOTE
This answer was submitted before the original question was extended stating that the RetrieveRecordChangeHistoryRequest cannot be used.
As I said in the comments, the Audit table will have the old value and the new value, but not the current value. The current value is only pushed in as the new value when the next update happens.
In your OP query, AB.AttributeMask returns comma (",") separated values and AB.changeData returns tilde ("~") separated values.
I assume you are fine with the "~"-separated values in the Old Value column and want to show the current field values in the New Value column. This is not going to work as-is when multiple fields are enabled for audit. You have to split the AttributeMask value into CRM fields via AttributeView, using ColumnNumber, to get the required result.
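As a rough illustration of that splitting and pairing step, here is a Python sketch; the column numbers, attribute names and values are entirely made up, and a real solution would pull the lookup from AttributeView (in T-SQL you would need a string-split function instead):

```python
# Hypothetical metadata lookup: ColumnNumber -> attribute display name.
attribute_by_column = {2: "First Name", 5: "Phone", 9: "City"}

attribute_mask = "2,5,9,"                  # comma-separated, with trailing comma
change_data = "Jan~0612345678~Amsterdam"   # tilde-separated old values

# Split both lists; drop the empty entry left by the trailing comma.
columns = [int(c) for c in attribute_mask.split(",") if c]
old_values = change_data.split("~")

# Pair each column number with its old value via the metadata lookup.
changed_fields = [
    (attribute_by_column.get(col, f"col {col}"), old)
    for col, old in zip(columns, old_values)
]
```

The mask and the data are positional: the n-th number in AttributeMask corresponds to the n-th "~"-separated value in changeData, which is why the two lists are zipped together.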
I would recommend starting with the reference blog below; once you get the expected result, you can pull the current field value with an extra query, either in SQL or in C# on the front end. You should then concatenate the values with "~" again to maintain the format.
https://marcuscrast.wordpress.com/2012/01/14/dynamics-crm-2011-audit-report-in-ssrs/
Update:
From the above blog, you can tweak the SP query with your fields, then convert the last select statement into a 'select into' to create a new table for your storage.
Modify the stored procedure to fetch the delta based on the last run, then configure a SQL job and schedule it to run every day or so to populate the table.
Then select and display the data the way you want. I did the same in Power BI in under 3 days.
Pros/cons: obviously this requirement is for reporting purposes. Reporting requirements are generally met by mirroring the database (via replication or other means), not by interrupting production users and the async server with injected plugins or ad hoc on-demand service calls. Moreover, you have access to the database and not to CRM Online. Better not to reinvent the wheel; take the available solution forward. This is my humble opinion, based on an internal Microsoft project implementation.
Hello all you helpful folks!
I have been tasked with converting our RDBMS into a graph database for testing purposes. I am using Neo4j and have successfully imported various tables into their appropriate nodes. However, I have run into a slight hiccup when it comes to the department node. Certain departments are partnered with a particular department. Within the RDBMS model this is simply a column named Is_Partner, because the database was originally set up with one partner in mind (hence the whole moving-to-a-graph-database thing).
What I need to do is match all departments with an Is_Partner value of 1 and create a relationship from each of them to a specific partner department (ABBR: 'HR'). I have written the script, and it tells me it's successful, but 0 edits are made...
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Department.csv" AS row
MATCH (partner:Department {DepartmentID: row.DepartmentID})
WHERE row.IS_PARTNER = "1"
MERGE (partner)-[:IS_PARTNER_OF]->(Department{ABBR: 'HR'});
I'm pretty new to Graph Databases, but I know Relational Databases quite well. Any help would be appreciated.
Thank you for your time,
Jim Perry
There are a few problems with your query. If you want to filter on a CSV value, use a WITH statement with a WHERE filter. You also want to MERGE the HR department node separately, and then MERGE the relationship.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Department.csv" AS row
WITH row WHERE row.IS_PARTNER = "1"
MATCH (partner:Department {DepartmentID: row.DepartmentID})
MERGE (dept:Department {ABBR: 'HR'})
MERGE (partner)-[:IS_PARTNER_OF]->(dept);
If it still returns no results/changes, check whether your MATCH statement returns anything, as this is usually the problem:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Department.csv" AS row
WITH row WHERE row.IS_PARTNER = "1"
MATCH (partner:Department {DepartmentID: row.DepartmentID})
RETURN partner
I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility itself will have no knowledge of how data is kept in the destination (table structure), but I would like to support writing a SQL statement that returns data from the destination using a complex SQL query with multiple joins, as long as the data comes back in a standardized format (field names) that the utility can recognize in an ADO query.
What I would like to do is then modify the live data in this ADO query. However, since there are multiple joins, I'm not sure if that's possible. I know that BDE, at least (I've never used BDE), was very strict: you had to return all fields (*) and so on. ADO, I know, is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complementary table which is joined on the same primary key field. One is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in the Small table). So a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing those fields which have no influence on the joins is usually no problem.
Appending is trickier. You can limit the append to one of the tables with:
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
inherited;
TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without a Requery you won't get much further.
The better way would be to set the values in a procedure, e.g. in BeforePost, then Requery and Abort.
If your view were persistent, you would be able to use INSTEAD OF triggers.
Jerry,
I encountered the same problem with Firebird, and from experience I can tell you that it can be done (up to a certain complexity) by using cached updates. A very good resource is this one: http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, and therefore this is no longer an issue for me, however still an interesting subject to know.
The new structure of the application consists of attaching multiple "Field Links" on various fields from the original set of data. Each of these links references the original field name and a SQL Statement which is to be executed when that field is being imported. Multiple field links can be on one single field, therefore can execute multiple statements, placing the value in various tables, etc. The end goal was an app which I can easily and repeatedly export a common dataset from an original source to any outside source with different data structures, without having to recompile the app.
However, the concept of cached updates was not appealing to me, simply because of the fact pointed out in the link in RBA's answer: the data can be changed in the database in the meantime. So I will instead integrate my own method of customizable data pushes.
I am trying to fetch the current timestamp through a database function call in HQL. This means my code looks something like the following:
var currentTimestamp =
session.CreateQuery("SELECT CURRENT_TIMESTAMP()")
.UniqueResult<DateTime>();
This doesn't work. Specifically, it throws a System.NullReferenceException deep inside of the NHibernate HqlParser.cs file. I played around with this a bit, and got the following to work instead:
var currentTimestamp =
session.CreateQuery("SELECT CURRENT_TIMESTAMP() FROM Contact")
.SetMaxResults(1)
.UniqueResult<DateTime>();
Now I have the data I want, but an HQL query I don't. I want the query to represent the question I'm asking -- like my original format.
An obvious question here is "Why are you using HQL?" I know I can easily do this with session.CreateSQLQuery(...), hitting our MySQL 5.1 database directly. This is simply an example of my core problem; the root is that I'm using custom parameter-less HQL functions throughout my code base, and I want integration tests that run these parameter-less HQL functions in as much isolation as possible.
My hack also has some serious assumptions baked in. It will not return a result, for example, if there are no records in the Contact table, or if the Contact table ceases to exist.
The method to retrieve CURRENT_TIMESTAMP() (or any other database function) outside of the context of a database table varies from database to database - some allow it, some do not. Oracle, for example, does not allow a select without a table, so they provide a system table with a single row which is called DUAL.
I suspect that NHibernate is implementing in HQL primarily features which are common across all database implementations, and thus does not implement a table-less select.
I can suggest three approaches:
Create a view that hides the table-less select such as 'create view dtm as select current_timestamp() as datetime'
Follow the Oracle approach and create a utility table with a single row in it that you can use as the table in a select
Create a simple stored procedure which only executes 'select current_timestamp()'
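For instance, option 2 can be sketched like this, with sqlite3 standing in for the real database (in NHibernate you would map the one-row table as an entity so it can appear in the HQL FROM clause); the table name `dual` is borrowed from Oracle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A one-row utility table, playing the role of Oracle's DUAL.
conn.execute("CREATE TABLE dual (dummy TEXT)")
conn.execute("INSERT INTO dual VALUES ('X')")

# The table-less select now becomes an ordinary single-row query,
# so it no longer depends on any application table having records.
(now,) = conn.execute("SELECT CURRENT_TIMESTAMP FROM dual").fetchone()
```

Unlike the `FROM Contact` hack, this query keeps working even if every application table is empty, because the utility table always holds exactly one row.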
I'm having some issue here. Let me explain.
So I was about done with migration of this project and I've decided to run the test suite to make sure the logic was still working as expected. Unfortunately, it didn't... but that's not the issue.
At the end of the suite there was a nice script that executes a delete on the data in 5 tables of our development database. That would be fine if there were also a script to actually repopulate the database...
The good side is that we still have plenty of data in the production environment, so I'm looking for a way, and possibly a tool, to extract the data from these 5 particular tables in production and insert it into the dev environment. There are all sorts of primary and foreign keys between these tables, maybe auto-increment fields (and also A LOT of data); that's why I don't want to do it manually.
Our database is DB2 v9, if it makes any difference. I'm also working with SQuirreL; there might be a plugin, but I haven't found one yet.
Thanks
This is sort of a shot in the dark, as I've never used DB2, but from previous experience my intuition immediately says "try CSV". I'm willing to bet my grandmother you can import/export CSV files with your software (why did I just start thinking of George from Seinfeld?).
This should also leave your FKs and IDs intact. You might have to reset your auto-increment value to whatever is appropriate, if need be. That, of course, would be done after the import.
In addition, CSV files are plain text and very easily manipulated should any quirks show their head.
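To sketch that round trip (sqlite3 standing in for DB2, one made-up toy table), exporting to CSV and re-importing carries the key values over verbatim:

```python
import csv
import io
import sqlite3

# "Production" database with a toy table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
src.executemany("INSERT INTO course VALUES (?, ?)",
                [(7, "Algebra"), (12, "Biology")])

# Export: production -> CSV (io.StringIO stands in for a file on disk).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(src.execute("SELECT course_id, title FROM course"))

# Import: CSV -> dev, primary keys preserved exactly as exported.
dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
buf.seek(0)
rows = [(int(cid), title) for cid, title in csv.reader(buf)]
dev.executemany("INSERT INTO course VALUES (?, ?)", rows)
```

Because the IDs travel with the rows, foreign keys in other exported tables keep pointing at the right records; only auto-increment counters may need resetting afterwards, as noted above.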
Best of luck to you!
Building on Arve's answer, DB2 has a built-in command for importing CSV files:
IMPORT FROM 'my_csv_file.csv'
OF del
INSERT INTO my_table
You can specify a list of columns if they are not in the default order:
IMPORT FROM 'my_csv_file.csv'
OF del
-- 1st, 2nd, 3rd column in CSV
METHOD P(1, 2, 3)
INSERT INTO my_table
(foo_col, bar_col, baz_col)
And you can also specify a different delimiter if it's not comma-delimited. For example, the following specifies a file delimited by |:
IMPORT FROM 'my_csv_file.csv'
OF del
MODIFIED BY COLDEL|
-- 1st, 2nd, 3rd column in CSV
METHOD P(1, 2, 3)
INSERT INTO my_table
(foo_col, bar_col, baz_col)
There are a lot more options. The official documentation is a bit hairy:
DB2 Info Center | IMPORT command
Do you have access to the emulator? There's a function in the emulator that allows you to import CSV files into tables directly.
Frank.
Personally, I am not aware of any automated tools that can "capture" a smaller subset of your production data into a test suite, but in my day I was able to use QMF and some generic queries to do just that. It does require forward planning and analysis of your table structures, parent-child dependencies, referential integrity and other things.
It did take some initial work to do, but once it was done, I was able to use, and re-use these tools to extract several different views of production data for my testing purposes.
If this appeals to you, read on.
On a high-level view, you could do this:
Determine what the key column names are.
Create a "keys" table for them.
Write several queries to look for your test conditions and populate the keys_table.
Once you are satisfied that keys_table holds the subset of keys you need, you can use the tools you created to strip out the data for you.
Write a generic query that joins the keys_table with that of your production tables and export the data into flat files.
Write a proc to do all the extractions / populations for you automatically.
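The steps above can be sketched as follows, with sqlite3 standing in for DB2/QMF and made-up table and column names; the point is the keys-table-driven join, not the specific schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (stud_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Anna"), (2, "Bert"), (3, "Cees")])

# Steps 2-3: a keys table, populated by a query expressing the test condition.
conn.execute("CREATE TABLE keys_table (stud_id INTEGER)")
conn.execute("INSERT INTO keys_table SELECT stud_id FROM students WHERE stud_id < 3")

# Step 5: the generic join that strips out only the keyed subset,
# ready to be exported to a flat file.
subset = conn.execute(
    "SELECT t.* FROM students t JOIN keys_table k ON t.stud_id = k.stud_id "
    "ORDER BY t.stud_id"
).fetchall()
```

Once the keys table exists, the same join pattern works for every production table that shares those key columns, which is what makes the tooling reusable.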
If you have access to QMF (and you probably do in a DB2 shop), you may be able to do something like this:
Determine all of the tables that you need.
Determine the primary indexes for those tables.
Determine any referential integrity requirements for those tables.
Determine Parent - Child relationships between all the tables.
For the lowest level child table (typically the one with most indexes) note all the columns used to identify a unique key.
With the above information, you can create a generic query to strip out a smaller subsection of production data, for #5. In other words, you can create a series of specific queries and populate a small Key table that you create.
In QMF, you can create a generic query like this:
select t.*
from &t_tbl t
, &k_tbl k
where &cond
order by 1, 2, 3
In the proc, you simply pass the table name, keys, and conditions variables. Once the data is captured, you EXPORT the data to some file.
Your EXPORT_TABLE proc would look something like this:
run query1 (&&t_tbl = students_table , &&k_tbl = my_test_keys ,
+ &&cond = (t.stud_id = k.stud_id and t.course_id = k.course_id)
export data to studenttable
run query1 (&&t_tbl = course_table , &&k_tbl = my_test_keys ,
+ &&cond = (t.cour_id = k.cour_id
+ (and t.cour_dt between 2009-01-01 and 2010-02-02)
export data to coursetable
.....
This could capture all the data as needed.
You can then create an IMPORT_TEST proc to do the opposite:
import data from studenttable
save data as student_table (replace = yes
import data from coursetable
save data as course_table (replace = yes
....
It may take a while to create, but at least you would then have a re-useable tool to extract your data.
Hope that helps.