I have text data, and want to get it into AWS - sql-server

I have what is essentially a traditional relational database, consisting of four tables, all related with IDs. Currently this database resides in four tab-delimited text files, in an S3 bucket. Very little, if any, data will ever be added to these tables. It is an unchanging reference database. So it will be exclusively read from, never added to or edited.
I would like to access this database in an Alexa skill. I've built a few skills already, using NodeJS, so I know how that all works. But I'm anxious to learn how to link up a skill with a back-end DB. This skill will need to do SQL SELECT statements against this DB, based-on user-provided parameters, and based on the query filter be able to pull a set of records into an array that can be used by my skill's lambda function.
Each of the current text files holds one of four tables. The largest table is about 35k rows. Whole DB is maybe 5 Mb, 90% of which is one of the four. Like I said, they are all connected with ID columns like a traditional RDBMS. This will not be for commercial purposes. Probably.
I am already familiar with SQL Server, it's the DB I know, and I'm comfortable with SQL Server Express and can whip something up there, but I'm open to learning NoSQL or some other method if it's more appropriate for this use case. And as this is mostly a learning exercise, if something is "just as good", it's good for me to know.
What is my best DB solution?
* NoSQL such as DynamoDB?
* Some sort of MySQL?
* SQL Server?
* Leave them as tab-delimited text and use them from the Lambda function directly?
Thanks, I don't want to start down the wrong road here.

A few options...
S3 Select
S3 Select (in Preview at the time of writing this) "enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases – in many cases you can get as much as a 400% improvement."
DynamoDB
The benefit of using DynamoDB is that there is no need to run a database server -- it is a fully-managed service. While it doesn't support SQL syntax, it is very fast and can suit many use-cases.
In fact, most projects should consider using a NoSQL database like DynamoDB for every situation, unless there is a particular reason to use SQL (such as business reporting).
Cost is based upon storage and provisioned capacity (which can scale-up and down based on demand).
SQL Database
Yes, you can certainly run an SQL database, either through Amazon RDS (Relational Database Service) or on your own EC2 instance (eg MySQL or even Apache Derby. However, you are then paying for the server even when it isn't being used.
Using Microsoft SQL Server is probably too much for your use-case (and more expensive than using an open-source product).
I wonder if you could incorporate SQLite in your app, which would provide SQL capabilities without much overhead?
Do it in memory
5 MB is, quite frankly, not much data. You could simply load all the data into memory and do your manipulations from there. While the load might consume a few cycles, data access will be very quick after that.

Related

Options for loading a bunch of disparate data sets into a 'target' schema

Background
5-10 data sources
Various formats (csv, psv, xml)
Different update schedules (weekly, monthly, quarterly)
Requirements
Only interested in some of the fields from each data source
Want to build a model from the various sources, into a single database (SQL Server)
Current platform/skillset
Azure
SQL Server
Considerations
Minimal code. Hopefully i can do this all via a UI/drag-drop interface.
Automation. Hoping i can drop the files onto a server when it needs to be updated, then "things" kick off (Azure Functions blob/FTP trigger?)
Questions
I haven't done much in the ETL space, but my initial thoughts point to something like SQL Server Integration Services, mainly because that's the only thing i can ever had experience in, ETL-wise.
Now that we have things like Azure Data Factory, SQL Data Warehouse, etc, would that be a better solution? Obviously the answer is "it depends", so what questions do i need to go about asking myself in order to clarify that? Can someone please point me to a good article to get started in this space?
TIA
The main question is where do you want to stage the data.
Many people are talking about Azure Data Lake as a staging area. There are pros and cons to this solution.
The pros are Azure Active Directory Service can be federated with your on premise forest. Once that is done, regular Access Control List can be used to restrict access.
The cons are the fact that you are using premium storage (SSD) which can cost a-lot of money for a small to medium size company.
On the other hand, Azure Blob Storage has been around for a long time. One of the pros is the cost of this storage. A shared access signature (SAS) can be used to let anyone access to the account.
The cons is that the SAS is the key to the whole kingdom. Unlike ADLS, you can not assign privledges at the file.
If you like SQL Server OpenRowSet or Bulk Insert, you are in for a treat. Support for those functions were added earlier this year.
Check out my article on MS SQL TIPS for the details.
As for scheduling, you can use a very simple Power Shell script in Azure Automation to create a hands off process.
Azure Data Factory might be able to do some of these tasks; However, you adding a-lot more complexity than a simple T-SQL statement to load data into a table.
Last but not least, learn to love PowerShell. You can pretty much do any type of file processing with that language and the right .NET components.
Happy coding.
John Miner
The Crafty DBA

SQL Server move data between databases

We have a requirement where we will have to move data between different database instance on regular basis. (For e.g. some customers willing to pay more for the better performance). So this is not going to be one off.
The database tables has referential integrity. Is there a way in which this can be done without rewriting sql script (or some other method) every time we migrate customers data?
I came across this How to move data between multiple database's table while maintaining foreign-key relationships/referential integrity?. However it appears that we have write script every time we migrate data (please correct me if I misunderstood the answer on this thread).
Thanks
Edit:
Both servers are using SQL Server 2012 (same version). Its an Azure SQL Server database.
They are not necessarily linked (no firewall between them)
We are only transferring some data, not the whole database. This is only for certain customers who opted pay more.
The schema are exactly same in both databases.
Preyash - please see the documentation on the Split-Merge tool. The Split-Merge tool enables you do move data between databases, as you have described, based on a sharding key (e.g., customer ID). One modification that you will need for your application is to add a shard map (i.e., a database that understand the global state of which customers resides in which databases).
Have a look into Azure Data Sync. It is much more aligned with your requirements. But you may end up in having another SQL Azure DB to maintain a Hub. Azure data Sync follows hub-spoke pattern and will let you do all flexible directional syncs with a few minutes of syncing gap. It is more simple and can set it up very fast without any scripts and all as you wanted.

what is recommended for a wpf app, sql database or xml

I want to build a wpf app (with mvvm dp) and I need to manage a list of customers (200 customers maximum)
What is the recommended way to handle this? sql (mySql, sqlServer) or other way (xml, excel, access)?
I can see advantage in xml because its not required addition installation
If you've got multiple users on different machines, and those machines have a persistent, shared network connection (they're on the same network, for instance), then you should prefer to use a central database, like (as you mention) SQL Server or MySQL. The latter is free, and the former is available in a free edition (Express) for light workloads.
If you don't need to share data between the different users of your application (you don't mind if their customer data get out of sync), then look at lightweight, embedded databases, like SQLite or SQL Server Compact Edition. Very simple and easy to set up and maintain, but the data stays present on a single machine.
You could build an infrastructure around manual synchronization between the different machines, sort of like how iTunes synchronizes data between your computer and iPhone whenever you plug the latter in, but trust me, that's a lot of work that's best avoided if possible. Databases have a lot of built-in multiple-users-doing-stuff-at-the-same-time logic that you want to leverage if at all possible.
If the only thing you need to persist is a list of 200the items and you don't see there as being any other persistent data requirements in the future, I would say that a flat file is plenty. However, this also depends on how often that small data will be retrieved.
I am a huge proponent of SQL Server, and Express is free. But you don't want to overengineer the problem. Unless I misunderstood your question, that is my evaluation.

How to eager load entire database with EF

My database consists of 5 tables with ~10000 rows combined. It takes ~1Mb in SQL Server CE which is on shared folder. The database itself is hierarchical Country-Region-City-Street-Building. I am using Entity Framework 4.
Because the database is small users are able to explore and edit all 2000 Cities in a WPF ListView. But with every approach I tried so far the GUI is sluggish (because of many database round-trips, with dummy data GUI is lightfast). How can I load entire database into memory with one or few database round-trips?
I tried multiple Include() but I noted great performance penalty as described here
Should I write my own ORM-light? I could also use plain ascii CSV files instead of database but it would obviously exclude concurrency.
Honestly, I've done something like this myself, and the answer for me was to copy the whole database locally and work on it.
If you're looking not only to read but also to write, I'd definitely suggest ditching CE and installing one of the Express versions of Sql Server. They are designed for this kind of situation; CE is not*.
*SP1 is better for concurrent access, but over the network will never be performant for large datasets.
I re-asked this question on Microsoft forum and they were kind to give me some guidance:
Basically my question can be restated as following:
read only once from database on application start
do all subsequent queries from local data, not database (for performance)
write to both context and database each time an entity is being added or deleted.
With plain EF it is not possible because each query goes to the database. This implies that I must read data fast on start and then cache it.
Implementation details:
The best way seems to be using ESQL to import data fast and then cache it, for example using entities not connected to context. From my first experiments it seems to work well.

MS Access Application - Convert data storage from Access to SQL Server

Bear in mind here, I am not an Access guru. I am proficient with SQL Server and .Net framework. Here is my situation:
A very large MS Access 2007 application was built for my company by a contractor.
The application has been split into two tiers BY ACCESS; there is a front end portion that holds all of the Ms Access forms, and then on the back end part, which are access tables, queries, etc., that is stored on a computer on the network.
Well, of course, there is a need to convert the data storage portion to SQL Server 2005 while keeping all of these GUI forms which were built in Ms Access. This is where I come in.
I have read a little, and have found that you can link the forms or maybe even the access tables to SQL Server tables, but I am still very unsure on what exactly can be done and how to do it.
Has anyone done this? Please comment on any capabilities, limitations, considerations about such an undertaking. Thanks!
Do not use the upsizing wizard from Access:
First, it won't work with SQL Server 2008.
Second, there is a much better tool for the job:
SSMA, the SQL Server Migration Assistant for Access which is provided for free by Microsoft.
It will do a lot for you:
move your data from Access to SQL Server
automatically link the tables back into Access
give you lots of information about potential issues due to differences in the two databases
keeps track of the changes so you can keep the two synchronised over time until your migration is complete.
I wrote a blog entry about it recently.
You have a couple of options, the upsizing wizard does a decent(ish) job of moving structure and data from access to Sql. You can then setup linked tables so your application 'should' work pretty much as it does now. Unfortunately the Sql dialect used by Access is different from Sql Server, so if there are any 'raw sql' statements in the code they may need to be changed.
As you've linked to tables though all the other features of Access, the QBE, forms and so on should work as expected. That's the simplest and probably best approach.
Another way of approaching the issue would be to migrate the data as above, and then rather than using linked tables, make use of ADO from within access. That approach is kind of famaliar if you're used to other languages/dev environments, but it's the wrong approach. Access comes with loads of built in stuff that makes working with data really easy, if you go back to use ADO/Sql you then lose many of those benefits.
I suggest start on a small part of the application - non essential data, and migrate a few tables and see how it goes. Of course you back everything up first.
Good luck
Others have suggested upsizing the Jet back end to SQL Server and linking via ODBC. In an ideal world, the app will work beautifully without needing to change anything.
In the real world, you'll find that some of your front-end objects that were engineered to be efficient and fast with a Jet back end don't actually work very well with a server database. Sometimes Jet guesses wrong and sends something really inefficient to the server. This is particular the case with mass updates of records -- in order not to hog server resources (a good thing), Jet will send a single UPDATE statement for each record (which is a bad thing for your app, since it's much, much slower than a single UPDATE statement).
What you have to do is evaluate everything in your app after you've upsized it and where there are performance problems, move some of the logic to the server. This means you may create a few server-side views, or you may use passthrough queries (to hand off the whole SQL statement to SQL Server and not letting Jet worry about it), or you may need to create stored procedures on the server (especially for update operations).
But in general, it's actually quite safe to assume that most of it will work fine without change. It likely won't be as fast as the old Access/Jet app, but that's where you can use SQL Profiler to figure out what the holdup is and re-architect things to be more efficient with the SQL Server back end.
If the Access app was already efficiently designed (e.g., forms are never bound to full tables, but instead to recordsources with restrictive WHERE clauses returning only 1 or a few records), then it will likely work pretty well. On the other hand, if it uses a lot of the bad practices seen in the Access sample databases and templates, you could run into huge problems.
It's my opinion that every Access/Jet app should be designed from the beginning with the idea that someday it will be upsized to use a server back end. This means that the Access/Jet app will actually be quite efficient and speedy, but also that when you do upsize, it will cause a minimum of pain.
This is your lowest-cost option. You're going to want to set up an ODBC connection for your Access clients pointing to your SQL Server. You can then use the (I think) "Import" option to "link" a table to the SQL Server via the ODBC source. Migrate your data from the Access tables to SQL Server, and you have your data on SQL Server in a form you can manage and back up. Important, queries can then be written on SQL Server as views and presented to the Access db as linked tables as well.
Linked Access tables work fine but I've only used them with ODBC and other databases (Firebird, MySQL, Sqlite3). Information on primary or foreign keys wasn't passing through. There were also problems with datatype interpretation: a date in MySQL is not the same thing as in Access VBA. I guess these problems aren't nearly as bad when using SQL Server.
Important Point: If you link the tables in Access to SQL Server, then EVERY table must have a Primary Key defined (Contractor? Access? Experience says that probably some tables don't have PKs). If a PK is not defined, then the Access forms will not be able to update and insert rows, rendering the tables effectively read-only.
Take a look at this Access to SQL Server migration tool. It might be one of the few, if not the ONLY, true peer-to-peer or server-to-server migration tools running as a pure Web Application. It uses mostly ASP 3.0, XML, the File System Object, the Data Dictionary Object, ADO, ADO Extensions (ADOX), the Dictionary Scripting Objects and a few other neat Microsoft techniques and technologies. If you have the Source Access Table on one server and the destination SQL Server on another server or even the same server and you want to run this as a Web Internet solution this is the product for you. This example discusses the VPASP Shopping Cart, but it will work for ANY version of Access and for ANY version of SQL Server from SQL 2000 to SQL 2008.
I am finishing up development for a generic Database Upgrade Conversion process involving the automated conversion of Access Table, View and Index Structures in a VPASP Shopping or any other Access System to their SQL Server 2005/2008 equivalents. It runs right from your server without the need for any outside assistance from external staff or consultants.
After creating a clone of your Access tables, indexes and views in SQL Server this data migration routine will selectively migrate all the data from your Access tables into your new SQL Server 2005/2008 tables without having to give out either your actual Access Database or the Table Contents or your passwords to anyone.
Here is the Reverse Engineering part of the process running against a system with almost 200 tables and almost 300 indexes and Views which is being done as a system acceptance test. Still a work in progress, but the core pieces are in place.
http://www.21stcenturyecommerce.com/SQLDDL/ViewDBTables.asp
I do the automated reverse engineering of the Access Table DDLs (Data Definition Language) and convert them into SQL equivalent DDL Statements, because table structures and even extra tables might be slightly different for every VPASP customer and for every version of VP-ASP out there.
I am finishing the actual data conversion routine which would migrate the data from Access to SQL Server after these new SQL Tables have been created including any views or indexes. It is written entirely in ASP, with VB Scripting, the File System Object (FSO), the Dictionary Object, XML, DHTML, JavaScript right now and runs pretty quickly as you will see against a SQL Server 2008 Database just for the sake of an example.
It takes perhaps 15-20 seconds to reverse engineer almost 500 different database objects. There might be a total of over 2,000 columns involved in this example for the 170 tables and 270 indexes involved.
I have even come up with a way for you to run both VPASP systems in parallel using 2 different database connection files on the same server just to be sure that orders entered on the Access System and the SQL Server system produce the same results before actual cutover to production.
John (a/k/a The SQL Dude)
sales#designersyles.biz
(This is a VP-ASP Demo Site)
Here is a technique I've heard one developer speak on. This is if you really want something like a Client-Server application.
Create .mdb/.mde frontend files distributed to each user (You'll see why).
For every table they need to perform an CRUD, have a local copy in the file in #1.
The forms stay linked to the local tables.
Write VBA code to handle the CRUD from the local tables to the SQL Server database.
Reports can be based off of temp tables created from the SQL Server (Won't be able to create temp tables in mde file I don't think).
Once you decide how you want to do this with a single form, it is not too difficult to apply the same technique to the rest. The nice thing about working with the form on a local table is you can keep a lot of the existing functionality as the existing application (Which is why they used and continue to use Access I hope). You just need to address getting data back and forth to the SQL Server.
You can continue to have linked tables, and then gradually phase them out with this technique as time and performance needs dictate.
Since each user has their own local file, they can work on their local copy of the data. Only the minimum required to do their task should ever be copied locally. Example: if they are updating a single record, the table would only have that record. When a user adds a new record, you would notice that the ID field for the record is Null, so an insert statement is needed.
I guess the local table acts like a dataset in .NET? I'm sure in some way this is an imperfect analogy.

Resources