I have a scenario with Spring Batch where I need to read data from a Microsoft SQL Server database and write it to a Cassandra database.
I'm new to batch processing, and I'm not finding many resources on Google to understand this better.
Could you please share some pointers on this?
Thanks in advance
Your question is very light on detail and a little too open-ended, so I wanted to warn you that there's a chance the community will vote to close it for those reasons.
Based on what you've provided, it sounds like you've got a streaming use case where an app "service" is the source of the data and publishes it on a messaging/event platform, and other systems/services can subscribe to those events.
You can use Kafka or Pulsar as the platform and Cassandra is one of the sinks. If you're interested in trying it out, Astra Streaming is a streaming-as-a-service backed by Pulsar with Astra DB (Cassandra-as-a-service) as the sink.
Astra Streaming and DB have free tiers which don't require a credit card so you can quickly do POCs without having to worry about downloading/installing/configuring clusters.
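That said, if you'd like to start with plain Spring Batch as your question describes, a minimal sketch of a job wiring a JDBC reader to a Cassandra writer might look something like the following. This assumes Spring Boot with Spring Batch 5 and the DataStax Java driver; the keyspace, table, and column names are hypothetical placeholders:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

@Configuration
public class SqlServerToCassandraJob {

    public record Person(int id, String name) {}

    // Reads rows from SQL Server through a plain JDBC cursor.
    @Bean
    public JdbcCursorItemReader<Person> reader(DataSource sqlServerDataSource) {
        return new JdbcCursorItemReaderBuilder<Person>()
                .name("personReader")
                .dataSource(sqlServerDataSource)
                .sql("SELECT id, name FROM dbo.person")
                .rowMapper((rs, rowNum) -> new Person(rs.getInt("id"), rs.getString("name")))
                .build();
    }

    // Writes each chunk to Cassandra; a production job would reuse the
    // prepared statement and consider async or batched execution.
    @Bean
    public ItemWriter<Person> writer(CqlSession session) {
        return items -> {
            PreparedStatement ps = session.prepare(
                    "INSERT INTO demo.person (id, name) VALUES (?, ?)");
            for (Person p : items) {
                session.execute(ps.bind(p.id(), p.name()));
            }
        };
    }

    @Bean
    public Job job(JobRepository jobRepository, PlatformTransactionManager txManager,
                   JdbcCursorItemReader<Person> reader, ItemWriter<Person> writer) {
        Step step = new StepBuilder("copyStep", jobRepository)
                .<Person, Person>chunk(500, txManager)
                .reader(reader)
                .writer(writer)
                .build();
        return new JobBuilder("sqlServerToCassandra", jobRepository).start(step).build();
    }
}
```

The chunk size is the main throughput knob here: it controls how many rows are read and written per transaction.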
As a side note, Astra DB comes with ready-to-use Stargate.io -- a data platform that allows you to connect to Cassandra using REST, GraphQL and JSON/Doc APIs so you can easily build applications on top of Cassandra using APIs. Cheers!
Firstly, I'm new to development, and I currently have a problem with server data filling up rapidly. I'm looking at solutions such as watcher programs to help me detect when the server data is reaching its limit, but I wanted to know if cloud solutions could help in this regard. I also wanted to know whether companies such as Snowflake can help handle fast-growing data, in what way a developer would use it, and whether this approach would be too costly from an enterprise point of view.
I have tried to look up Snowflake's documentation, but I am unable to reach any conclusion as to whether it can help me. I could only find articles about storage saying that they store data by compressing it, but I wanted more clarity on this solution.
Snowflake stores the data using cloud storage services (AWS S3, Google Cloud Storage, or Microsoft Azure), so you can't fill up the server storage under normal conditions (I've never heard of S3 being full in any region).
Check the pricing page to see if it will be costly for you (or not):
https://www.snowflake.com/pricing/
As mentioned in the title, I'd like to know what data sources Snowflake supports. I'm not completely sure how to even approach this question. I know you can create an external stage in the cloud storage of supported cloud providers, but what if I want to load data from an Oracle database, for example? What's the best solution in that case: using the ODBC driver, or something else?
And please feel free to give me any suggestions, or advice on where to continue my research. Also, let me know if any part of my question is unclear so that I can rephrase it :)
Snowflake natively supports Avro, Parquet, CSV, JSON, and ORC. These files are landed in a stage for ingestion: your ELT/ETL tool of choice, or even a home-built application, must land the data in a stage, either internal or external.
The file is then ingested into Snowflake with a COPY command, either automated by said tool or by something like Snowpipe.
We have documentation on Firehose/Kafka pipelines landing data for Snowpipe to ingest, either through AUTO_INGEST notifications (limited to external stages) or by calling our REST API.
All of this is covered in our documentation; simply search for the terms I've mentioned and you'll find plenty of material.
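If it helps to see those pieces together, here is a rough sketch of landing a local file in a named internal stage and running a COPY through the Snowflake JDBC driver. The account, credentials, table, and file names are made up for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class SnowflakeCopyDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "MY_USER");            // placeholder credentials
        props.put("password", "MY_PASSWORD");
        props.put("db", "MY_DB");
        props.put("schema", "PUBLIC");
        props.put("warehouse", "MY_WH");

        // The account identifier is a placeholder.
        String url = "jdbc:snowflake://myaccount.snowflakecomputing.com/";
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {

            // Land the file in a named internal stage. PUT auto-compresses
            // the file to .gz by default.
            stmt.execute("CREATE STAGE IF NOT EXISTS my_stage");
            stmt.execute("PUT file:///tmp/events.csv @my_stage");

            // Ingest the staged file into a table with COPY.
            stmt.execute(
                "COPY INTO events FROM @my_stage/events.csv.gz " +
                "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)");
        }
    }
}
```

Snowpipe automates exactly that COPY step, firing it for you as new files arrive in the stage.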
Multiple existing ETL tools let you define Snowflake as the destination and support a wide variety of sources.
Native Programmatic Interfaces
Snowflake Ecosystem - Data Integration
Could someone explain the benefits/issues with hosting a database in Kubernetes via a persistent volume claim combined with a storage volume over using an actual cloud database resource?
It's essentially a trade-off: convenience vs. control. Take a concrete example: let's say you pay Amazon money to use Athena, which is really just a nicely packaged version of Facebook's Presto that AWS kindly operates for you in exchange for $$$. You could run Presto on EKS yourself, but why would you?
Now, let's say you want or need to use Apache Drill or Apache Impala. Amazon doesn't offer them. Nor does any other big public cloud provider at the time of writing, as far as I know.
Another thought: what if you want to migrate off of AWS? Your data has gravity as well.
Could someone explain the benefits/issues with hosting a database in Kubernetes ... over using an actual cloud database resource?
As a previous excellent answer noted:
It's essentially a trade-off: convenience vs control
In addition to the previous example (Athena), take a look at RDS as well and see what you would need to handle yourself (why would you, as already said):
Automatic backups
Multizone deployments
Snapshots
Engine upgrades
Read replicas
and other bells and whistles that come with a managed service as opposed to a self-hosted/managed one.
But there is more to it than just convenience/control, which this post is trying to shed light on:
Kubernetes adds another layer of abstraction (pods, services...), and depending on how storage is handled (persistent volumes) you have two additional considerations:
Access speed (depending on your use case, this can be negligible or a showstopper).
The storage you have at hand might not be optimized for relational-database I/O (or might restrict your ability to schedule pods efficiently), for the very same reasons you are advised not to run a database on NFS, for example (see the sketch just after this list).
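To make that extra abstraction layer concrete, here is a hedged sketch (using the fabric8 Java client; the storage class name and size are hypothetical) of the persistent volume claim you would be declaring and managing yourself before the database pod even starts. With a managed service, none of this is yours to get wrong:

```java
import io.fabric8.kubernetes.api.model.PersistentVolumeClaim;
import io.fabric8.kubernetes.api.model.PersistentVolumeClaimBuilder;
import io.fabric8.kubernetes.api.model.Quantity;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class PvcDemo {
    public static void main(String[] args) {
        // Declare the claim: name, access mode, storage class, and size
        // are all choices (and failure modes) you now own.
        PersistentVolumeClaim pvc = new PersistentVolumeClaimBuilder()
                .withNewMetadata().withName("mysql-data").endMetadata()
                .withNewSpec()
                    .withAccessModes("ReadWriteOnce")
                    .withStorageClassName("fast-ssd")   // hypothetical storage class
                    .withNewResources()
                        .addToRequests("storage", new Quantity("100Gi"))
                    .endResources()
                .endSpec()
                .build();

        // Submit the claim to the cluster.
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            client.persistentVolumeClaims().inNamespace("default").resource(pvc).create();
        }
    }
}
```

Whether that claim ends up backed by fast local SSD or by network storage with NFS-like characteristics is exactly the consideration in the second point above.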
There are several recent Kubernetes conference talks pointing out that databases are a big no-no on Kubernetes (although this is highly opinionated; we do run average-load MySQL and PostgreSQL databases in k8s), and heavy-load/fast-I/O workloads are somewhat of a challenge to get right on k8s, as opposed to a managed cloud solution where somebody has already fine-tuned everything for you.
In conclusion:
It is all about convenience, control, and capabilities.
I am trying to decide which add-on DB to use with my application when I deploy it on AppHarbor. I have two choices: JustOneDB or Cloudant. I am planning to develop a web and mobile application, which should work with terabytes of data.
I am searching for the easiest solution to deploy my database, without me needing to partition the DB and the tables. I want a DB that can handle a very large amount of data, but takes the burden of sharding and partitioning architecture away from the developer.
I also want a solution that will allow me to easily backup my large database and easily restore it.
From what I've read, Cloudant and JustOneDB are the two most popular ones, and those are available as add-ons on AppHarbor for easy deployment.
I need your recommendations on which one I should go with, the cons and pros of each one. I am developing my application in ASP.NET and C# inside Visual Studio.
There's a recent post on the Cloudant blog about using the MyCouch .Net library with Cloudant databases:
https://cloudant.com/blog/how-to-customize-quorum-with-cloudant-using-mycouch/
Cloudant also offers free hosting up to a $5 bill, and it can use Apache CouchDB's replication if you want to develop locally and sync to the cloud for production/deployment. Multi-master replication isn't something many other databases offer.
Best of luck with your application!
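Worth noting as well: Cloudant speaks CouchDB's HTTP API, so any language with an HTTP client can create databases and documents directly; libraries like MyCouch wrap exactly these calls. A minimal sketch (shown in Java for brevity; the account name, database name, and credentials are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class CloudantDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical account, database, and credentials.
        String base = "https://myaccount.cloudant.com";
        String auth = "Basic " + Base64.getEncoder()
                .encodeToString("myuser:mypassword".getBytes());

        HttpClient http = HttpClient.newHttpClient();

        // Create the database (PUT /dbname)...
        HttpRequest createDb = HttpRequest.newBuilder(URI.create(base + "/feedback"))
                .header("Authorization", auth)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        http.send(createDb, HttpResponse.BodyHandlers.ofString());

        // ...then store a document (POST /dbname with a JSON body).
        HttpRequest createDoc = HttpRequest.newBuilder(URI.create(base + "/feedback"))
                .header("Authorization", auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"rating\": 5, \"comment\": \"works great\"}"))
                .build();
        HttpResponse<String> resp =
                http.send(createDoc, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body()); // {"ok":true,"id":"...","rev":"..."}
    }
}
```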
MyCouch.Cloudant was just released. In addition to the core CouchDB and Cloudant feature support, the MyCouch.Cloudant NuGet package adds support for Searches. More Cloudant-specific features will be added to it over time. It's written in C# and supports .Net40, .Net45 and Windows Store apps.
You will find more info about MyCouch in the GitHub repo.
You should probably also consider MongoDB and RavenDB.
If you're just starting out, your first concern should probably be to find a database that'll let you quickly get started and build the application you have in mind. When the application becomes a success and actually attracts terabytes of data, you can start worrying about how to scale it. If the application is soundly architected, adapting it to use an appropriate datastore should not be a monumental task.
I'm hoping you can help.
I'm looking for a zero-config multi-user database that my WinForms application can easily upload to a web-server folder (together with 1 or 2 classic ASP pages), and I'm looking for some suggestions/recommendations.
The idea is that the database will be used to collect feedback entered by people filling in the ASP pages. The pages will write to the database using JavaScript.
The database will subsequently be downloaded again for processing once the responses are in.
In Summary:
It will mostly run in MS Windows environments.
I have a modest budget for this and do not mind paying for such a database.
No runtime licensing costs.
Should be xcopy-deployable: once uploaded to a website folder, it should be operational.
It should not have a .NET CLR dependency.
It should support a reasonable level of concurrent access. The average respondent count would be around 20-30, but one never knows.
Should be a reasonable size so that uploads/downloads to and from the site will be reasonably fast.
Would appreciate your suggestions/comments
Many thanks
Abz
To clarify - this is a desktop commercial application for feedback management in a vertical market. It uses SQL Server as the backing store.
The application currently provides feedback management from email and paper feedback. I now want to add a web feedback capability. Getting users to make their SQL Servers accessible to a website is not an option at this time, as I want to make getting up and running as painless as possible.
I intend to release a web-based implementation of the software in the near future, but for now I am looking at the above as a pragmatic way to provide web-based feedback collection.
SQLite comes to mind. It meets all of your stated requirements, is open source, and has a liberal license (public domain).
http://sqlite.org/
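As a rough illustration of how simple the single-file model is, here is a sketch using the xerial sqlite-jdbc driver (your WinForms app would follow the same pattern through SQLite's ADO.NET provider; the table and file names are made up):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class FeedbackDb {
    public static void main(String[] args) throws Exception {
        // The whole database is the single file feedback.db: xcopy it up
        // to the server, let the pages write to it, download it later.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:feedback.db")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS feedback (" +
                           "id INTEGER PRIMARY KEY AUTOINCREMENT, " +
                           "respondent TEXT, comment TEXT)");
            }
            // Record one response.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO feedback (respondent, comment) VALUES (?, ?)")) {
                ps.setString(1, "abz");
                ps.setString(2, "great product");
                ps.executeUpdate();
            }
            // Read the collected responses back out.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT respondent, comment FROM feedback")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ": " + rs.getString(2));
                }
            }
        }
    }
}
```

Since everything lives in that one file, "deployment" really is just copying feedback.db to the web folder and pulling it back down once the responses are in.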
I would use a 'normal' database (say MySQL, PostgreSQL, Firebird, etc.) on the server. Instead of copying files to the server, your WinForms application would create custom tables (or even custom databases). After collecting the data, you could just pull it back into your application using plain old SQL.
Why reinvent the wheel? If you want to collect feedback and the like from users of your app, and they are connected to the internet, it might be a better idea (and cheaper in the long term) to use a service like Wufoo. We recently switched from a homegrown setup to Wufoo and are very pleased. Check it out.
Otherwise you might want to take a look at SQLite or Firebird. Both of them are very robust and have ADO.NET providers. Firebird scales from a single user to a full-blown client-server system and has no .NET dependency.
If you really don't want a DB/SQL solution, you could try simple text files: FTP/xcopy the files down and parse them into the back-office server as needed. ASP/VBScript or ASP.NET can create the files to store the basic feedback comments. You'd need to consider security, of course!