I am using Google Sheets to create a database that is connected to Google Data Studio. But the database is growing fast and will soon outgrow Sheets' limits.
I am looking for a cloud service that is simple to use like Sheets, where I can manually add data, do calculations (like formulas in Sheets) and also use Python to update the data there. I also need it to connect to Google Data Studio for visualisation.
I was recommended Firestore, Cloud SQL, and BigQuery, but I still do not understand the difference between them. I am looking for something cheap where I can do the things I mentioned above.
P.S. I am new to SQL, so I would prefer a visual database (like Sheets).
Thank you all!
Sheets is not a database, but you can use it as one. Google Cloud offers other types of database, such as:
Firestore, a document-oriented database, not really similar to a tabular sheet
BigQuery, a very powerful data warehouse and the most similar to Sheets in its design, checks and controls
Cloud SQL, which hosts relational database engines. It is similar to BigQuery but adds the capacity to create constraints (unique values, primary keys, and foreign keys that relate to a value in another table).
However, none of them offers the ease of use of Sheets in terms of graphical interface. The engines are powerful, but they are developer-oriented rather than desktop-user-oriented.
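To make the constraint point concrete, here is a minimal Python sketch. It uses the standard library's sqlite3 module as a local stand-in for a relational engine of the kind Cloud SQL hosts; the table and column names are hypothetical, and the exact SQL dialect on a real Cloud SQL instance (MySQL, PostgreSQL or SQL Server) will differ slightly.

```python
import sqlite3

# In-memory SQLite is only a stand-in for a relational engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables: a primary key, a unique column and a foreign key.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL)"
)

conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 9.99)")


def try_insert(sql):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same email twice violates UNIQUE; customer 42 does not exist, so the foreign key fails.
duplicate_email = try_insert("INSERT INTO customers (id, email) VALUES (2, 'a@example.com')")
orphan_order = try_insert("INSERT INTO orders (customer_id, amount) VALUES (42, 1.00)")
print(duplicate_email)
print(orphan_order)
```

Both inserts are refused by the engine itself, which is the kind of check a spreadsheet will not do for you.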
I have what is essentially a traditional relational database, consisting of four tables, all related with IDs. Currently this database resides in four tab-delimited text files, in an S3 bucket. Very little, if any, data will ever be added to these tables. It is an unchanging reference database. So it will be exclusively read from, never added to or edited.
I would like to access this database in an Alexa skill. I've built a few skills already, using NodeJS, so I know how that all works. But I'm anxious to learn how to link up a skill with a back-end DB. This skill will need to run SQL SELECT statements against this DB, based on user-provided parameters, and, based on the query filter, pull a set of records into an array that can be used by my skill's Lambda function.
Each of the current text files holds one of four tables. The largest table is about 35k rows. The whole DB is maybe 5 MB, 90% of which is one of the four tables. Like I said, they are all connected with ID columns like a traditional RDBMS. This will not be for commercial purposes. Probably.
I am already familiar with SQL Server, it's the DB I know, and I'm comfortable with SQL Server Express and can whip something up there, but I'm open to learning NoSQL or some other method if it's more appropriate for this use case. And as this is mostly a learning exercise, if something is "just as good", it's good for me to know.
What is my best DB solution?
* NoSQL such as DynamoDB?
* Some sort of MySQL?
* SQL Server?
* Leave them as tab-delimited text and use them from the Lambda function directly?
Thanks, I don't want to start down the wrong road here.
A few options...
S3 Select
S3 Select (in Preview at the time of writing this) "enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases – in many cases you can get as much as a 400% improvement."
DynamoDB
The benefit of using DynamoDB is that there is no need to run a database server -- it is a fully-managed service. While it doesn't support SQL syntax, it is very fast and can suit many use-cases.
In fact, most projects should consider using a NoSQL database like DynamoDB for every situation, unless there is a particular reason to use SQL (such as business reporting).
Cost is based upon storage and provisioned capacity (which can scale-up and down based on demand).
SQL Database
Yes, you can certainly run an SQL database, either through Amazon RDS (Relational Database Service) or on your own EC2 instance (e.g. MySQL or even Apache Derby). However, you are then paying for the server even when it isn't being used.
Using Microsoft SQL Server is probably too much for your use-case (and more expensive than using an open-source product).
I wonder if you could incorporate SQLite in your app, which would provide SQL capabilities without much overhead?
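As a rough illustration of the SQLite idea, the sketch below loads tab-delimited text into an in-memory SQLite database and runs a SELECT with a JOIN. It is written in Python with the standard sqlite3 and csv modules purely for illustration (the skill in the question uses NodeJS, where a SQLite binding would play the same role). The table names, column names and sample rows are invented; the real files would be read from S3 or packaged with the function.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two of the four tab-delimited files.
ARTISTS_TSV = "artist_id\tname\n1\tMiles Davis\n2\tNina Simone\n"
ALBUMS_TSV = (
    "album_id\tartist_id\ttitle\n"
    "10\t1\tKind of Blue\n"
    "11\t2\tPastel Blues\n"
    "12\t1\tSketches of Spain\n"
)


def load_tsv(conn, table, text):
    """Create `table` from tab-delimited text whose first row is the header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_tsv(conn, "artists", ARTISTS_TSV)
load_tsv(conn, "albums", ALBUMS_TSV)

# A parameterised SELECT, as the skill would issue from user-provided input.
rows = conn.execute(
    "SELECT albums.title FROM albums"
    " JOIN artists ON artists.artist_id = albums.artist_id"
    " WHERE artists.name = ? ORDER BY albums.title",
    ("Miles Davis",),
).fetchall()
titles = [title for (title,) in rows]
print(titles)
```

Because the data never changes, the SQLite file could also be built once ahead of time and shipped with the function instead of being rebuilt on each cold start.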
Do it in memory
5 MB is, quite frankly, not much data. You could simply load all the data into memory and do your manipulations from there. While the load might consume a few cycles, data access will be very quick after that.
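A minimal sketch of the in-memory approach, again in Python for illustration only: parse the tab-delimited text once, group the rows by an ID column, and answer lookups from the resulting dictionary. The column names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited content; the real text would come from the S3 object.
ALBUMS_TSV = (
    "album_id\tartist_id\ttitle\n"
    "10\t1\tKind of Blue\n"
    "11\t2\tPastel Blues\n"
    "12\t1\tSketches of Spain\n"
)


def index_by(text, key):
    """Parse tab-delimited text once and group its rows by the `key` column."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        index[row[key]].append(row)
    return index


# Built once at module load, so later calls only do dictionary lookups.
ALBUMS_BY_ARTIST = index_by(ALBUMS_TSV, "artist_id")


def albums_for(artist_id):
    """Return the album titles for one artist ID (empty list if unknown)."""
    return [row["title"] for row in ALBUMS_BY_ARTIST.get(artist_id, [])]


print(albums_for("1"))
```

The trade-off is that every "query" shape needs its own index or a scan, whereas an SQL engine gives you ad hoc filters for free.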
Background
5-10 data sources
Various formats (csv, psv, xml)
Different update schedules (weekly, monthly, quarterly)
Requirements
Only interested in some of the fields from each data source
Want to build a model from the various sources, into a single database (SQL Server)
Current platform/skillset
Azure
SQL Server
Considerations
Minimal code. Hopefully I can do this all via a UI/drag-drop interface.
Automation. Hoping I can drop the files onto a server when it needs to be updated, then "things" kick off (Azure Functions blob/FTP trigger?)
Questions
I haven't done much in the ETL space, but my initial thoughts point to something like SQL Server Integration Services, mainly because that's the only thing I have ever had experience in, ETL-wise.
Now that we have things like Azure Data Factory, SQL Data Warehouse, etc., would that be a better solution? Obviously the answer is "it depends", so what questions do I need to ask myself in order to clarify that? Can someone please point me to a good article to get started in this space?
TIA
The main question is where do you want to stage the data.
Many people are talking about Azure Data Lake as a staging area. There are pros and cons to this solution.
The pros are that Azure Active Directory can be federated with your on-premises forest. Once that is done, regular access control lists can be used to restrict access.
The cons are that you are using premium storage (SSD), which can cost a lot of money for a small to medium-sized company.
On the other hand, Azure Blob Storage has been around for a long time. One of the pros is the cost of this storage. A shared access signature (SAS) can be used to give anyone access to the account.
The con is that the SAS is the key to the whole kingdom. Unlike ADLS, you cannot assign privileges at the file level.
If you like SQL Server's OPENROWSET or BULK INSERT, you are in for a treat. Support for those functions was added earlier this year.
Check out my article on MS SQL TIPS for the details.
As for scheduling, you can use a very simple PowerShell script in Azure Automation to create a hands-off process.
Azure Data Factory might be able to do some of these tasks; however, you are adding a lot more complexity than a simple T-SQL statement to load data into a table.
Last but not least, learn to love PowerShell. You can pretty much do any type of file processing with that language and the right .NET components.
Happy coding.
John Miner
The Crafty DBA
I'm looking for a tutorial or something that allows me to learn Presto step by step.
The idea is to start by integrating files and MSSQL, which is my area of knowledge.
Unfortunately, since it is a relatively new area, I didn't find anything more than the Facebook page or the Presto.io page; however, these are not good enough for someone who wants to get to know the big data world from scratch.
I would appreciate your help and/or guidance in this area.
Presto has 2 primary use cases:
querying data stored in a cluster (on Hadoop's HDFS) or in a cloud (e.g. Amazon S3)
data federation, i.e. querying (and joining) data from multiple data sources (e.g. HDFS, S3, traditional RDBMS like PostgreSQL or SQL Server)
As far as SQL Server support is concerned -- Presto supports connecting to SQL Server since https://github.com/prestosql/presto/commit/072440cbb2c8df2a689c4c903dd325013eae41a0.
When it comes to querying files -- Presto uses Hive's Metastore to keep track of metadata (everything besides actually reading the data). Thus the files must reside on HDFS or S3 to be accessible (other cloud data stores like Azure's Blob are, AFAIK, not supported yet).
Our IT manager is asking for my help in deciding where it would be best to save the data: in SharePoint or in SQL Server.
For my part, I don't know much about saving data on a SharePoint server: how it works, how fast it is, how secure it is, etc. I even doubt whether SharePoint is capable of complex database design. As far as I know, SharePoint is not a database server, which is why I have these doubts.
So obviously I would say SQL Server would be my preferred storage, also because I have known SQL Server for a long time. With 3 weeks of exposure to SharePoint vs. 7 years on SQL Server, I don't have enough experience of SharePoint's strengths to decide what to do. So, to be fair to SharePoint, I would like to ask those of you out there who are more experienced with it.
My questions:
1.) Does SharePoint have the ability to store data?
2.) If SharePoint can store data, what are the pros and cons?
3.) Can it cover a complex design such as a relational database design, like SQL Server does?
4.) If you were to develop a SharePoint project, would you choose SQL Server as the backend?
Thanks in advance!
It obviously depends on the application, and complexity of it, who the client or audience is, and how you want to deploy it.
Here are my answers to your questions:
1. Yes
2. Pros:
It provides a UI for updating data.
Cons:
Creating relational structures will be complicated.
Think custom lookup lists, associated with other custom lists.
3. Yes, but I wouldn't try it.
4. SQL Server, but this depends on the project and isn't an entirely technical decision.
Personally, I think given your skillset, you should use SQL Server, if your manager has said it's up to you.
SharePoint itself is built on top of SQL Server and ASP.NET.
Yes. You can create a custom list (basically similar to a table structure), and you can store documents along with their metadata. You can store web pages if you are using it as your publishing (CMS) platform.
It's not supposed to be a relational engine like SQL Server. Pro: versioning, workflow, and, for most cases, a UI that is there to support data input / editing. Con: limitations of the UI with large amounts of data.
To some degree you can relate one list to another field in a different list / document metadata.
See what I said before point 1.
SharePoint offers its own database layer built on top of SQL Server.
A complex object model is provided, and the SQL language API is not available.
Access is by API, REST, and UI List Web Parts with views, NOT SQL, and the database is not accessible except through interfaces.
Deep inside, data is stored in Entity-Attribute-Value triples (specifically: site, web, list, item, state, field, value) such that each value goes into its own record. This is strictly non-tabular.
Maintains a dynamic, end-user-populated metadata dictionary.
As a non-relational layer above a DB, it offers inheritance, multi-type lists, hierarchies, taxonomies, versioning, check in/out and other advanced features missing from a relational model.
Documents may be attached to a list.
Extensive use of GUIDs for identifiers, but this causes problems when moving partial related data between systems.
No referential integrity.
No joining of database tables or lists.
Filtering is more limited than in SQL.
No concept of a schema.
Parts of SharePoint break when restoring from a backup or when published to a separate site.
Rolling new features and data from development to production is problematic and sometimes breaks.
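To illustrate what Entity-Attribute-Value storage means in practice, here is a small Python sketch. The triples are invented and simplified to (item, field, value); they are not SharePoint's actual schema, which per the list above carries more key parts (site, web, list, state and so on).

```python
from collections import defaultdict

# Hypothetical EAV records: one record per value, not one record per row.
TRIPLES = [
    ("item-1", "Title", "Quarterly report"),
    ("item-1", "Owner", "Ana"),
    ("item-2", "Title", "Budget"),
    ("item-2", "Owner", "Ben"),
    ("item-2", "Status", "Draft"),
]


def pivot(triples):
    """Fold one-value-per-record triples into one dict per item."""
    items = defaultdict(dict)
    for item, field, value in triples:
        items[item][field] = value
    return dict(items)


rows = pivot(TRIPLES)
print(rows["item-2"])
```

Note that the two items end up with different sets of fields, which is easy in an EAV layout and is one reason there is no fixed schema to join or constrain against.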
Hope this helps.
SharePoint is obviously not a database server, but in some ways it can work as one.
1.) Yes
2.) You can, but not with designs as complicated as SQL Server allows.
Pros: The interface is what gives SharePoint the edge; the UI gives the user a friendlier way of inputting data.
Cons: As I said, complicated database design is not easy to do.
3.) 100% Yes
4.) I would prefer SharePoint if the application doesn't need a complex data design. Definitely SQL Server for an enterprise-type application.
I know how to create a table in Google Bigtable. But with my constraints I want to create a database and store all the tables inside that database.
Start with the Getting Started documentation for Java or Python. The App Engine environment provides your app with a connection to a single datastore instance - you can't create new datastores for your app, so you'll need to partition your data yourself, inside the datastore.
The datastore also doesn't use 'tables' as you may be used to with a relational database, but instead uses 'entity types' to break data up similarly to tables. The documentation has more information on how it all works.
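One way to picture "partitioning the data yourself" is a naming convention on entity types. The sketch below is a toy in-memory model in plain Python, not the App Engine datastore API; the `store` dictionary, the helper names and the `database__table` prefix scheme are all assumptions made for illustration.

```python
# Toy in-memory stand-in for a single datastore; NOT the App Engine API.
store = {}


def kind_for(database, table):
    """Build an entity-type name that carries a hypothetical 'database' prefix."""
    return f"{database}__{table}"


def put(database, table, key, entity):
    """Store one entity under its prefixed entity type and key."""
    store[(kind_for(database, table), key)] = entity


def query(database, table):
    """Return every entity stored under one prefixed entity type."""
    kind = kind_for(database, table)
    return [entity for (k, _), entity in store.items() if k == kind]


# Two logical "databases" share the one store without colliding.
put("sales", "Customer", 1, {"name": "Ana"})
put("hr", "Customer", 1, {"name": "Ben"})
sales_customers = query("sales", "Customer")
print(sales_customers)
```

The same idea carries over to the real datastore by choosing entity type names (or keys) that encode which logical database a record belongs to.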