Using Cloud SQL and Datastore together in my application - google-app-engine

I would like to build an application that serves a lot of users, so I decided to use Cloud Datastore because it is more scalable, but I also want an interface that lets me inspect my data with complex SQL queries.
So I decided to keep my data in two databases (Cloud Datastore and Cloud SQL): the users of my application will read their data from Datastore, while my interface will use Cloud SQL.
The users will only read data, they will not write to Datastore; my interface would read from Cloud SQL so I can run complex queries, and whenever I want to write or change data, I will update it in both Cloud SQL and Datastore.
What do you think? Is there another suggestion? Thank you.
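A minimal sketch of the dual-write idea described above, assuming Cloud SQL for PostgreSQL (reached with psycopg2) and the google-cloud-datastore client; the table name, entity kind and connection details are placeholders:

    # Hedged sketch: write the same record to both Cloud SQL (PostgreSQL via psycopg2)
    # and Cloud Datastore. Connection details, table and kind names are placeholders.
    import psycopg2
    from google.cloud import datastore

    def save_article(article_id: str, title: str, body: str) -> None:
        # 1) Write to Cloud SQL so the admin interface can run complex SQL queries.
        conn = psycopg2.connect(host="CLOUD_SQL_IP", dbname="app", user="app", password="...")
        with conn, conn.cursor() as cur:
            cur.execute(
                """INSERT INTO articles (id, title, body) VALUES (%s, %s, %s)
                   ON CONFLICT (id) DO UPDATE SET title = EXCLUDED.title, body = EXCLUDED.body""",
                (article_id, title, body),
            )
        conn.close()

        # 2) Write the same data to Datastore, which the application's users read from.
        ds = datastore.Client()
        entity = datastore.Entity(key=ds.key("Article", article_id))
        entity.update({"title": title, "body": body})
        ds.put(entity)

Note that dual writes like this can leave the two stores out of sync if one write fails, so some retry or reconciliation step is worth planning for.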

Related

Where does Snowflake store data like metadata, table data

Where does Snowflake store data such as metadata, table data and all other data? Does it use the public cloud we configured while creating the Snowflake account, and if so, where within that cloud does it keep it? If not, which cloud provider does it use for storage?
Each Snowflake deployment has its own metadata servers. You can find more information on what is used to store metadata here:
https://www.snowflake.com/blog/how-foundationdb-powers-snowflake-metadata-forward/
Based on the additional questions:
The data (micro-partitions) is stored in the object storage service of the same cloud provider (i.e. S3 for AWS, etc.).
Yes, all the data and metadata are stored in the cloud itself where the account is deployed.
Yes, it's deployed on the cloud service linked to the account.
Snowflake consists of the following three layers: Database Storage, Query Processing, and Cloud Services.
https://docs.snowflake.com/en/user-guide/intro-key-concepts.html
Metadata is managed in the Cloud Services layer and is clearly separated from database storage.
Snowflake's core storage feature is immutable micro-partitions: when an update is required, Snowflake doesn't overwrite the original files but writes new copies, and the Cloud Services layer updates the references to them.
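As a small illustration (not from the answers above), metadata questions such as row counts and table sizes are answered from that metadata store rather than by scanning micro-partitions; a hedged example using the snowflake-connector-python package, where the account, credentials and database names are placeholders:

    # Hedged example: metadata queries like this are served by Snowflake's Cloud Services
    # layer; account, credentials and database name below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",   # placeholder account identifier
        user="my_user",
        password="...",
        database="MY_DB",
    )
    try:
        cur = conn.cursor()
        # Table-level metadata (row counts, sizes) comes from the metadata store,
        # not from reading the micro-partitions in object storage.
        cur.execute(
            "SELECT table_name, row_count, bytes "
            "FROM MY_DB.INFORMATION_SCHEMA.TABLES "
            "WHERE table_schema = 'PUBLIC'"
        )
        for table_name, row_count, num_bytes in cur.fetchall():
            print(table_name, row_count, num_bytes)
    finally:
        conn.close()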

Which database to choose in order to store data coming from flat files (CSV, HTML)

I need to design a scalable database architecture to store all the data coming from flat files - CSV, HTML, etc. These files come from Elasticsearch, and most of the scripts are written in Python. This architecture should automate most of the daily manual processing currently done with Excel, CSV and HTML files, and all data will be retrieved from this database instead of being populated into CSV or HTML files.
Database requirements:
The database must perform well for day-to-day data retrieval, and it will be queried by multiple teams.
An ER model and schema will be developed for the data with logical relationships.
The database can be hosted in the cloud.
The database must be highly available and able to retrieve data quickly.
This database will be utilized to create multiple dashboards.
The ETL jobs will be responsible for storing data in the database.
There will be many reads from the database and multiple writes each day, with lots of data coming from Elasticsearch and some of the cloud tools.
I am considering RDS, Azure SQL, DynamoDB, Postgres or Google Cloud. I would like to know which database engine would be the better solution given these requirements. I also want to know how the ETL process should be designed - lambda or kappa architecture.
To store relational data such as CSV and Excel files, you can use a relational database. For flat files like HTML, which don't need to be queried, you can simply use a storage account with any cloud service provider, for example Azure.
Azure SQL Database is a fully managed platform as a service (PaaS) database engine that handles most database management functions such as upgrading, patching, backups, and monitoring without user involvement. Azure SQL Database always runs on the latest stable version of the SQL Server database engine and a patched OS with 99.99% availability. You can restore the database to any point in time. This should be the best choice for storing relational data and running SQL queries.
Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Your HTML files can be stored here.
The ETL jobs can be performed with Azure Data Factory (ADF). It lets you connect almost any data source (including sources outside Azure), transform the stored dataset, and load it into the desired destination. The Data Flow transformations in ADF can handle all the ETL-related tasks.
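As a rough, hedged sketch of the split suggested above - CSV rows into Azure SQL Database, raw HTML into Blob Storage - using pyodbc and the azure-storage-blob package; the connection strings, table and container names are placeholders:

    # Hedged sketch: CSV rows go to Azure SQL Database, raw HTML files go to Blob Storage.
    # Connection strings, table and container names are placeholders.
    import csv

    import pyodbc
    from azure.storage.blob import BlobServiceClient

    SQL_CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=...;Database=...;Uid=...;Pwd=..."
    BLOB_CONN_STR = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

    def load_csv_to_sql(csv_path: str) -> None:
        """Insert each CSV row into a (placeholder) metrics table in Azure SQL Database."""
        with pyodbc.connect(SQL_CONN_STR) as conn, open(csv_path, newline="") as f:
            cur = conn.cursor()
            for row in csv.DictReader(f):
                cur.execute(
                    "INSERT INTO dbo.metrics (name, value) VALUES (?, ?)",
                    row["name"], row["value"],
                )
            conn.commit()

    def archive_html_to_blob(html_path: str) -> None:
        """Store a raw HTML file, unqueried, in a Blob Storage container."""
        service = BlobServiceClient.from_connection_string(BLOB_CONN_STR)
        blob = service.get_blob_client(container="raw-html", blob=html_path)
        with open(html_path, "rb") as f:
            blob.upload_blob(f, overwrite=True)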

What is a good approach to ingest client SQL data passively (client push) using Google Cloud Platform?

Our client doesn't want to let us make any calls against their SQL database (not even to create a replica, etc.). The best solution we have come up with so far is to provision a Google Cloud SQL instance, so we can ask the customer to push its data once a day/week (using the instance's public IP) and then consume the data by pushing it into Google BigQuery.
I have been reading many topics on the web, and my candidate solution is to ask the user to do a weekly ETL -> Cloud SQL -> BigQuery. Is that a good approach?
To sum up, I am looking for recommendations on best/cheapest practices and possible ways to let the user insert data into GCP without exposing their data or my infrastructure.
My cloud provider is Google Cloud and my client uses SQL Server.
We are open to new or similar options (even other providers like Amazon and Azure).
Constraints:
Client will send data periodically (once a day/or week ingestion)
Data finally should be sent and stored in BigQuery
The cost of keeping a Cloud SQL instance in Google is high, since we don't need the allocated CPU/memory and public IP available 24/7 (only a few times a month, e.g. 4 times a month).
The question is missing many details, but how about:
Have the customer create a weekly .csv.
Send the .csv with the new data to GCS.
Load into BigQuery.
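A hedged sketch of that last step - loading the pushed .csv from GCS into BigQuery with the google-cloud-bigquery client; the bucket, object and table names are placeholders:

    # Hedged sketch: load a customer-pushed CSV from a GCS bucket into a BigQuery table.
    # Bucket, object and table names are placeholders.
    from google.cloud import bigquery

    def load_weekly_csv() -> None:
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,          # skip the header row
            autodetect=True,              # let BigQuery infer the schema
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = client.load_table_from_uri(
            "gs://customer-drop-bucket/weekly_export.csv",   # placeholder GCS object
            "my_project.ingest.customer_data",               # placeholder table
            job_config=job_config,
        )
        load_job.result()  # wait for the load to finish
        print(f"Loaded {load_job.output_rows} rows")

This keeps the drop point in GCS rather than in an always-on Cloud SQL instance, so you only pay for the stored files and the load jobs.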

Data pipeline - dumping large files from API responses into AWS, with the final destination being an on-premises SQL Server

I'm new to building data pipelines where dumping files in the cloud is one or more steps in the data flow. Our goal is to store large, raw sets of data from various APIs in the cloud, then pull only what we need (summaries of this raw data) and store that in our on-premises SQL Server for reporting and analytics. We want to do this in the easiest, most logical and robust way. We have chosen AWS as our cloud provider, but since we're in the beginning phases we are not attached to any particular architecture/services. Because I'm no expert with the cloud or AWS, I thought I'd post my thoughts on how we can accomplish our goal and see if anyone has any advice for us. Does this architecture for our data pipeline make sense? Are there any alternative services or data flows we should look into? Thanks in advance.
1) Gather data from multiple sources (using APIs)
2) Dump responses from APIs into S3 buckets
3) Use Glue Crawlers to create a Data Catalog of data in S3 buckets
4) Use Athena to query summaries of the data in S3
5) Store data summaries obtained from Athena queries in on-premises SQL Server
Note: We will program the entire data pipeline using Python (which seems like a good call and easy no matter what AWS services we utilize as boto3 is pretty awesome from what I've seen thus far).
You can use Glue jobs (PySpark) for #4 and #5, and you can automate the flow using Glue triggers.
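A hedged sketch of steps 4-5 as described in the question, using boto3: run an Athena summary query over the cataloged S3 data and fetch the rows so they can be inserted into the on-premises SQL Server. The Glue database, results bucket and query are placeholders:

    # Hedged sketch: run an Athena summary query with boto3 and fetch the result rows
    # so they can be inserted into the on-premises SQL Server. The Glue database,
    # output bucket and query below are placeholders.
    import time

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    def run_summary_query():
        start = athena.start_query_execution(
            QueryString="SELECT source, count(*) AS n FROM api_dumps GROUP BY source",
            QueryExecutionContext={"Database": "raw_catalog"},          # placeholder Glue database
            ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
        )
        query_id = start["QueryExecutionId"]

        # Poll until the query finishes.
        while True:
            state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(2)
        if state != "SUCCEEDED":
            raise RuntimeError(f"Athena query ended in state {state}")

        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        # The first row holds the column headers; each cell's text is under 'VarCharValue'.
        return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows[1:]]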

Databases in Codename One

I'm trying to create a social media type app using Codename One. In order to do this, I was wondering which database service I should use within my application: just the Storage class, SQLite, or MySQL? I expect to be storing significant amounts of data further down the road.
Storage and SQLite are on-device databases: in Storage you would normally keep small amounts of data, mainly for caching and offline use, while SQLite is for larger data sets where you might need more sophisticated queries.
MySQL is a server-side database. If you're building a social app, you can use MySQL on your server to store the data, connect to the server over HTTP/HTTPS, and query it to display on the device.
