I see lots of questions, articles, and answers on using a DynamoDB (NoSQL) database to store metadata for S3 objects. I actually have more experience with relational databases than with NoSQL. Wouldn't a relational database be the better choice for the metadata, given all the different parameters (metadata, relationships) you might want to search on to find an image stored in S3? This is what I would think. Also, when I look at this link, it seems DynamoDB is a bit problematic.
Is DynamoDB suitable as an S3 Metadata index?
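To make the relational idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (s3_object, object_tag), the column names, and the sample keys and tags are all hypothetical and only illustrate searching on several metadata parameters at once; it says nothing about how DynamoDB would compare.

```python
import sqlite3

# Hypothetical schema: one row per S3 object, one row per metadata tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s3_object (
        id         INTEGER PRIMARY KEY,
        bucket     TEXT NOT NULL,
        object_key TEXT NOT NULL,
        UNIQUE (bucket, object_key)
    );
    CREATE TABLE object_tag (
        object_id INTEGER NOT NULL REFERENCES s3_object(id),
        name      TEXT NOT NULL,
        value     TEXT NOT NULL
    );
    CREATE INDEX idx_tag ON object_tag (name, value);
""")

def add_object(bucket, object_key, tags):
    """Register an object and its metadata tags (a dict of name -> value)."""
    cur = conn.execute(
        "INSERT INTO s3_object (bucket, object_key) VALUES (?, ?)",
        (bucket, object_key),
    )
    conn.executemany(
        "INSERT INTO object_tag (object_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in tags.items()],
    )

def find_keys(**criteria):
    """Return the keys of objects that match ALL given tag name/value pairs."""
    sql = "SELECT o.object_key FROM s3_object o"
    params = []
    # One join per criterion, so every criterion must be satisfied.
    for i, (name, value) in enumerate(criteria.items()):
        sql += (f" JOIN object_tag t{i} ON t{i}.object_id = o.id"
                f" AND t{i}.name = ? AND t{i}.value = ?")
        params += [name, value]
    sql += " ORDER BY o.object_key"
    return [row[0] for row in conn.execute(sql, params)]

add_object("photos", "2021/beach.jpg", {"camera": "nikon", "location": "lisbon"})
add_object("photos", "2021/city.jpg", {"camera": "canon", "location": "lisbon"})
add_object("photos", "2022/hills.jpg", {"camera": "nikon", "location": "porto"})

print(find_keys(location="lisbon"))
print(find_keys(camera="nikon", location="lisbon"))
```

The database only stores the bucket and key; the image itself stays in S3 and is fetched by key once the query has found it.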
Related
We are considering using Snowflake. I tried looking into the documentation and Google, but without luck. How does Snowflake query/store data? For example, if I have a CSV file, a database, a data lake ... does it query the sources in real time, or does it replicate the data to Snowflake? If it replicates, how often does it update?
Maybe an introduction to the Snowflake architecture will help you here: https://docs.snowflake.com/en/user-guide/intro-key-concepts.html
Let's split your question into two parts:
How does Snowflake store data? Basically, Snowflake stores data in its own proprietary file format. The files are called micro-partitions, use a hybrid columnar format, and are stored in, for example, S3 if you are using Snowflake on AWS.
How does Snowflake query data? For this, Snowflake uses Virtual Warehouses, which correspond to compute instances of your cloud provider underneath. The warehouses access the files and run the queries against them.
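As a rough illustration of what "hybrid columnar" means, the toy sketch below groups rows into small partitions and stores each partition column by column. This is not Snowflake's actual file format (that is proprietary); the function names, the partition size, and the sample rows are assumptions made up for the example.

```python
from typing import Dict, List

Row = Dict[str, object]

def to_partitions(rows: List[Row], rows_per_partition: int) -> List[Dict[str, list]]:
    """Split rows into partitions; inside each partition, store values per column."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        # Column name -> list of that column's values for this chunk of rows.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        partitions.append(columns)
    return partitions

def column_sum(partitions: List[Dict[str, list]], column: str):
    """Aggregate one column; the other columns are never touched."""
    return sum(sum(partition[column]) for partition in partitions)

rows = [
    {"order_id": 1, "country": "DE", "amount": 10},
    {"order_id": 2, "country": "FR", "amount": 25},
    {"order_id": 3, "country": "DE", "amount": 5},
]

partitions = to_partitions(rows, rows_per_partition=2)
print(partitions[0])
print(column_sum(partitions, "amount"))
```

The point of the layout is that a query touching only "amount" reads only that column's values from each partition, while rows that belong together still live in the same file.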
As web developers, we hear about new technologies every day. Recently I came to know about Elasticsearch, which is used to analyze big volumes of data. My data is in MongoDB; is it possible to use Elasticsearch on it?
MongoDB Atlas has a feature called 'Atlas Search', which implements the Apache Lucene engine. This could be a solution for your search requirements.
See Atlas Search for details
It depends on what you mean by "analyze the big volumes of data". What are your requirements? Don't pay too much attention to marketing slogans. Maybe you can connect Elasticsearch with MongoDB via an ODBC driver. Elasticsearch is a document-oriented NoSQL database, like MongoDB. As usual, both have their pros and cons.
MongoDB is more like a database, i.e. it supports CRUD (Create, Read, Update, Delete) operations and the Aggregation Framework is very powerful.
In Elasticsearch you can store data and analyze or query it. I remember in earlier releases it was not so easy to delete or update existing single documents.
We chose Snowflake as our DWH and we would like to connect different data sources (Salesforce, HubSpot and Zendesk).
Is there a way to extract data from these sources and store it in Snowflake in a staging schema, without having to store the data in cloud storage like S3 and then read it into Snowflake?
Many thanks in advance.
You can use any of the connectors Snowflake provides (ODBC, JDBC, Python, etc.) and any tool that can use one of these connectors. However, they won't perform well compared to the COPY INTO approach, which is optimised for bulk loading.
There are ETL tools, such as Matillion, that use the stage/copy into approach but do it in the background so that it appears that you are loading directly into Snowflake.
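For reference, the stage/COPY INTO pattern can be sketched roughly as below. The function name, table name, and file path are hypothetical, and the cursor is assumed to be something like a cursor from the Snowflake Python connector; the sketch uses the table's own internal stage (@%table), which Snowflake manages itself, so treat it as an assumption that this fits your setup. A small recording cursor stands in for a real connection so the example runs on its own.

```python
def stage_and_copy(cursor, local_csv_path, table):
    """Upload a local CSV to the table's internal stage, then bulk load it.

    `cursor` is anything with an execute(sql) method. Identifiers are
    interpolated directly into the SQL, so only pass trusted values.
    """
    statements = [
        # PUT uploads the local file into the table stage.
        f"PUT file://{local_csv_path} @%{table}",
        # COPY INTO bulk loads the staged file into the table.
        f"COPY INTO {table} FROM @%{table} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
    ]
    for sql in statements:
        cursor.execute(sql)
    return statements


class RecordingCursor:
    """Stand-in for a real cursor: it only records the SQL it is given."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)


cursor = RecordingCursor()
# Hypothetical file and table names, for illustration only.
stage_and_copy(cursor, "/tmp/zendesk_tickets.csv", "staging_zendesk_tickets")
for sql in cursor.executed:
    print(sql)
```

With a real connection, the same two statements would run against Snowflake; the extraction from Salesforce, HubSpot, or Zendesk into the local file is a separate step not shown here.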
What Datastore/Database runs on top of Amazon S3 or S3-compatible storage?
I understand that S3 is object storage and thus not a database, but a database must have something to store its data in. So my question is whether there is a database or datastore that saves its data on Amazon S3 or S3-compatible storage instead of a local file system.
Here are some databases and database-like products that use S3 (or can use S3).
Amazon Athena
S3 Select
Apache HBase
Redshift
Also, if you want some theory, here’s a paper about Building a Database on S3.
This is by no means exhaustive, but it’s probably a good place to start.
Update
Here are some more that aren't AWS owned software.
Cassandra
Hadoop—this isn't a database, but S3 already provides you with key-value storage, and Hadoop can provide you with querying.
s3-db
Ultimately, you need to consider what sort of query functionality you need and what sort of consistency you can tolerate.
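To illustrate the point that S3 already gives you key-value storage, here is a minimal sketch of a key-value wrapper. It assumes a boto3-style S3 client (put_object and get_object are real boto3 client methods); the class name, bucket name, and the in-memory fake client are made up so the example runs without AWS credentials.

```python
import io
import json


class S3KeyValueStore:
    """Tiny key-value layer: each key is an S3 object holding a JSON value."""

    def __init__(self, client, bucket):
        self.client = client  # e.g. boto3.client("s3") in real use
        self.bucket = bucket

    def put(self, key, value):
        body = json.dumps(value).encode("utf-8")
        self.client.put_object(Bucket=self.bucket, Key=key, Body=body)

    def get(self, key):
        response = self.client.get_object(Bucket=self.bucket, Key=key)
        return json.loads(response["Body"].read())


class FakeS3Client:
    """In-memory stand-in that mimics the two boto3 calls used above."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


store = S3KeyValueStore(FakeS3Client(), bucket="example-bucket")
store.put("users/42", {"name": "Ada", "plan": "free"})
print(store.get("users/42"))
```

Note what is missing: there is no way to ask "all users on the free plan" without reading every object, which is exactly the query functionality the products listed above add on top.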
I am working on a project with a great deal of unstructured data. Is there database software or a tool that is suitable for unstructured data? If there are no tools or other software, what database design would I use if MySQL or SQL Server are my only choices?
If you are going to have enough structured data to formulate a key, I'd stick with any DB that supports blobs.
If you're not going to have a structured key, I'd go with something like CouchDB. It allows you to use unstructured keys to store unstructured data.
If you have unstructured keys and you're absolutely stuck with MySQL / SQL Server, you can still accomplish your goal using unstructured data (MySQL, for instance, supports column prefix indexing, where you give it the number of leading characters of a variable-length field to use for the index).
VelocityDB is a high-performance database suitable for handling unstructured data. It is common to create inverted indexes when handling unstructured data. The VelocityDB website and download provide sample code for creating inverted indexes from books, web pages and the entire Wikipedia text.
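For readers unfamiliar with the term, an inverted index maps each word to the set of documents that contain it. The sketch below is a generic in-memory version in plain Python, not VelocityDB's API or its sample code; the function names, document ids, and texts are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


documents = {
    "a": "S3 stores objects",
    "b": "DynamoDB stores items",
    "c": "Objects have metadata",
}
index = build_inverted_index(documents)

print(sorted(search(index, "stores objects")))
print(sorted(search(index, "objects")))
```

A lookup only touches the entries for the query words, so the cost depends on how many documents contain those words rather than on the total size of the text.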