What's the underlying storage engine in Apache IoTDB? [closed] - database

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 14 days ago.
Does the underlying storage engine in Apache IoTDB use another mature storage engine, like RocksDB, LevelDB, or Cassandra? Or does it implement its own storage engine from scratch?

Apache IoTDB does not rely on any existing storage engine. Instead, it has its own implementation of the data store, based on a newly developed file format, TsFile (which is conceptually related to Apache Parquet). More information on the TsFile format can be found here: http://iotdb.apache.org/SystemDesign/TsFile/Format.html
For metadata storage as well, Apache IoTDB relies on its own implementations of well-known algorithms and concepts, such as B-trees, the write-ahead log (WAL), and the Raft protocol (in cluster mode).
A sketch of the storage engine / architecture of Apache IoTDB can be found at http://iotdb.apache.org/SystemDesign/StorageEngine/StorageEngine.html.
TL;DR:
Apache IoTDB does not rely on existing projects; it implements its entire storage engine from scratch, based on a binary file format for mass data storage.

We built Apache IoTDB from scratch :)
The data file is called TsFile (Time series File), which is optimized for time-series data queries.
The IoTDB engine is built on top of TsFile.
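To make the "column-per-series file format" idea concrete, here is a toy sketch in Python. This is not the real TsFile format (which has richer headers, encodings, compression, and indexes); it only illustrates the core layout idea of storing each time series as its own contiguous chunk of (timestamp, value) pairs, so a query for one series can read one chunk instead of scanning row-oriented records.

```python
import struct
from io import BytesIO

def write_series_chunks(series):
    """Write each series as its own contiguous chunk:
    [name-length][name][point-count][t1 v1 t2 v2 ...] per series."""
    buf = BytesIO()
    for name, points in series.items():
        encoded = name.encode("utf-8")
        buf.write(struct.pack(">I", len(encoded)))
        buf.write(encoded)
        buf.write(struct.pack(">I", len(points)))
        for ts, value in points:
            # 8-byte timestamp, 8-byte double value
            buf.write(struct.pack(">qd", ts, value))
    return buf.getvalue()

def read_series_chunks(data):
    """Read the chunks back into a dict of series."""
    buf, series = BytesIO(data), {}
    while True:
        header = buf.read(4)
        if not header:
            break
        name = buf.read(struct.unpack(">I", header)[0]).decode("utf-8")
        count = struct.unpack(">I", buf.read(4))[0]
        series[name] = [struct.unpack(">qd", buf.read(16)) for _ in range(count)]
    return series
```

The real TsFile format documented at the link above adds per-chunk statistics and indexes on top of a layout in this spirit, which is what makes range queries over a single series cheap.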

Related

When should one use Redis as a Primary Database and Elastic Search [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Closed 2 years ago.
I have the following scenario (others may differ): how should we decide between Redis as a persistent primary database and Elasticsearch?
In a microservice, the database gets many more read requests than write requests. My data will also have only 8-10 columns or keys in terms of JSON (a simple data structure).
If my database hardly gets any write requests compared to read requests, why should we not use Redis as a persistent database? I went through the official Redis documentation and found arguments for using it as a persistent database [Goodbye Cache: Redis as a Primary Database].
But I am still not fully convinced to use it as a primary database.
The answer would depend on your application and what it does internally. But assuming you don't need particularly complicated queries to get the data (no complex filtering, for example) and you can fit all your information in memory, I see Redis as a completely valid alternative to a traditional database.
If you want the strongest possible guarantees Redis can offer, you'd want to enable both RDB and AOF persistence options (read https://redis.io/topics/persistence).
The big advantage of a set-up like this is you can trust Redis to improve the throughput of the application, and maintain a very good level of performance over time, even with a growing dataset.
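For reference, enabling both persistence mechanisms is a matter of a few redis.conf directives. The directive names below are the standard ones from redis.conf; the specific thresholds are only example values and should be tuned to your workload.

```conf
# redis.conf -- enable both persistence mechanisms
appendonly yes            # AOF: log every write for durability
appendfsync everysec      # fsync the AOF once per second
save 900 1                # RDB: snapshot if >= 1 change in 900 s
save 300 10               # ... or >= 10 changes in 300 s
save 60 10000             # ... or >= 10000 changes in 60 s
```

With `appendfsync everysec` you can lose at most about one second of writes on a crash; `appendfsync always` is stricter but slower.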

How do keyword research/analysis software work [theoretically]? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 4 years ago.
I have seen a lot of keyword research/analysis applications, such as Market Samurai: Keyword Analysis Tool, and SEMRush keyword tool.
My question is: how can they get stats about those keywords? Are they using a Google API to achieve that?
I fail to see how software not connected to Google's search database can get information about monthly searches, competition, etc.
Thanks.
For Search Volume, Paid Competition, and CPC data, most of these tools get it in one of three ways.
They can get it directly from Google via the AdWords API (this requires "Standard Access" and meeting the RMF requirements).
Another way is to get it from a third party that pulls monthly-updated data from Google, such as GrepWords.
A third way is to build their own models from various third-party data sources, possibly mixed with Google's statistics and other clickstream data, and apply machine-learning algorithms to make predictions that can even rival Google's own data.
For Keyword Difficulty (KD) or Organic Competition scores, all tools provide an estimate of how difficult it might be to rank high organically for a specific keyword. Tools will typically use a combination of techniques. Below is a short list of what they may include:
Search Engine Result Pages (SERP) analysis
Each keyword's SERP density analysis
Analysis of competitors for each keyword
Word difficulty and frequency
Backlink and domain authority analysis of competitors
and many other indicators
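The indicators above are typically blended into a single 0-100 score. As a purely illustrative sketch (the signal names and weights here are invented; no tool publishes its real formula), a difficulty score might be a weighted combination of normalized signals:

```python
def keyword_difficulty(serp_signals):
    """Toy weighted blend of hypothetical SERP signals, each in [0, 1].
    The weights are made up for illustration, not any tool's real formula."""
    weights = {
        "avg_domain_authority": 0.4,   # strength of ranking domains
        "avg_backlink_strength": 0.3,  # backlink profiles of top results
        "serp_density": 0.2,           # how saturated the SERP is
        "word_difficulty": 0.1,        # intrinsic term competitiveness
    }
    score = sum(weights[k] * serp_signals.get(k, 0.0) for k in weights)
    return round(100 * score)  # 0 (easy) .. 100 (hard)
```

A SERP where every top result is a high-authority domain with strong backlinks would score near 100; a sparse SERP of weak pages would score near 0.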
A few tools and where they get their Search Volume and CPC data:
SEMRush uses an algorithm to estimate their traffic (source: spoke with them at a conference in 2016).
Ahrefs uses a third party to get clickstream data and pairs it with data from Google.
MOZ uses a third party to get their Google data and click stream data (source: spoke with team).
KWFinder reports that their data is the same as Google Keyword Planner.
Twinword Ideas actually gets their data directly from Google (source: I work there).

Which databases support scalability and availability? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers. Closed 6 years ago.
What database software can do these?
Scalability via data partitioning, such as consistent hash.
Redundancy for fail over. Both memory cache and disk storage together. Key-value data. Value is document type such as JSON.
Prefers A and P in the CAP theorem.
I heard that Memcached can do all of these, but I am not sure.
Here's details:
Data volume: each key holds a JSON document under 30 KB, and there will be more than 100,000,000 keys.
Data is accessed more than 10K times per second.
Persistence is needed for every key-value data.
No need for transaction.
Development environment is C#, but other languages are ok if the protocol spec is known.
Map reduce is not needed.
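The "scalability via data partitioning, such as consistent hash" requirement from the list above can be sketched in a few lines. This is a minimal illustrative ring (real systems like Cassandra add replication and configurable partitioners on top of this idea): each node is hashed onto a ring at several virtual positions, and a key belongs to the first node clockwise from the key's hash.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: a key maps to the first node
    clockwise from the key's position; virtual nodes smooth the spread."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # bisect finds the next ring position clockwise; wrap around at the end
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for scaling: when a node leaves, only the keys it owned move; keys owned by the surviving nodes stay put.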
This is too short a spec description to choose a database. There are tons of other constraints to consider (data storage volume, data transfer volume, persistence requirements, need for transactions, development environment, map reduce, etc.).
That being said:
Memcached and Redis are in-memory databases, which means you cannot store more than what your machine's memory can hold. This is less true now that distributed capabilities have been added to Redis.
Document databases (such as MongoDB or Microsoft DocumentDB) support everything you list. And you can add Memcached or Redis in front; that's how most people use them.
I would like to add that any SQL database can now deal with JSON, so that works too, with a cache up front if needed.
There are some links of interest for JSON-oriented databases. But once again, that's too short a spec to choose a database.

In terms of back-end and front-end technology, what can GAE do that Web Hosting can't? [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Closed 7 years ago.
I vaguely understand the difference between Google App Engine and a traditional Web Hosting service.
I do understand Google App Engine can scale for a much wider audience, thanks to not having to maintain your own hardware, handling the load-balancing, spreading the data over multiple locations, etc.
But in terms of what can be accomplished when using Python or any of the other supported languages on GAE, can't a Web Hosting service equipped with a LAMP stack (or the like) create dynamic content, store data, and render pages to the browser just the same? Is there some other content / service that developers could provide through GAE?
Examples would be very helpful.
In my mind - all I can picture is that they both serve HTML pages, CSS & JS files, images, videos, music, maybe pull data from a relational database, allow users to upload files to share, etc.
Adding to #Andrei's answer, App Engine is all about Platform as a Service (PaaS). For example, you wrote:
In my mind - all I can picture is that they both serve HTML pages, CSS
& JS files, images, videos, music, maybe pull data from a relational
database, allow users to upload files to share, etc.
And that is all you should have to think about. With App Engine, you don't have to think about which version of operating system it's running, which database version it currently has, which web server, file server, log server, memcache and task queue servers are running, and so on.
Google's engineers keep your servers up and running with the latest versions of each service, and you don't have to do a thing to upgrade or scale up. All the data is backed up in three locations automatically, and protected as thoroughly as Google protects its own data. If hackers want to try and break in, they have to go past Google's defenses first.
So all you have to think about is your code and data, and leave everything else to Google. Compared with standard Web Hosting, where you have to maintain everything yourself, it's a relief to be free from all that extra work. I know, I've done it all before myself.
It's all about two key issues: scalability and maintenance.
Scalability comes into play when you max out your web server, then max out your database server, then max out the cluster of database servers. With App Engine you don't have to think about it. With any other solution you have to be very good and invest a lot of time to make it to each next level.
For example, it's not easy to implement tasks queues that allow any number of front-end instances to schedule task on any number of backend instances involving data from any number of database servers. On App Engine it takes a few lines of code.
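As a rough illustration of that producer/worker pattern (this is a generic Python sketch with threads standing in for front-end and back-end instances, not App Engine's actual `taskqueue` API), the front-end side just enqueues work with one call and the workers drain the queue:

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def worker():
    """Back-end worker: pull tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        results.append(task["payload"] * 2)  # stand-in for real work
        task_queue.task_done()

# Two "back-end instances"
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

# "Front-end" code schedules tasks with a single call each,
# much like App Engine's taskqueue.add(...)
for i in range(5):
    task_queue.put({"payload": i})

task_queue.join()          # wait until every task is processed
for _ in threads:
    task_queue.put(None)   # shut the workers down
for t in threads:
    t.join()
```

The hard part App Engine hides is making this work across many machines, with retries, persistence of queued tasks, and autoscaling of the worker pool.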
In terms of maintenance App Engine eliminates the headache of hardware failure/repair, hardware/network monitoring, OS/web server/database/etc. software updates and patches, data replication - and I only mention the key areas.
The savings can be very significant depending on the scale of your project.

Choosing the correct google cloud data storage strategy for images [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Closed 7 years ago.
I'm writing my first Google Cloud backend for an image-sharing mobile app, and I'm having difficulty understanding which data storage option to go for. The clients will be mobile apps and probably also a web frontend. From what I've read so far, App Engine seems to be a nice infrastructure for this.
I need to handle:
Users
Groups of users
Collections of images (created within a group of users, many shared with all users)
The actual image files
Start small but with an architecture able to support massive upscaling (in terms of users and images)
Would it make sense to store the actual images in Cloud Storage and the metadata about users, groups of users, and collections of images in Datastore or MySQL? In particular, I'm having trouble choosing between Datastore and MySQL.
Any advice would be greatly appreciated; I have very little experience with databases :)
Cheers!
Images (or other big "opaque" gobs of data such as video) on Cloud Storage, with metadata about them in a more structured store, is the classic architecture pattern for all similar use cases.
If you need some features of relational databases, such as JOINs, on your metadata, then Cloud SQL may be what you need for the "more structured store"; however since you're designing from scratch it's usually quite feasible to use a NoSQL store like App Engine's Datastore, with scalability and other advantages pertaining to it (and you do mention "massive upscaling" in the future, so this may be quite relevant to you!-).
Among the advantages of this classic architecture is that object stores like Cloud Storage can give you "serving URLs" to specific objects (images) that you can pass to clients so that the serving of such massive data will be done by Cloud Storage's own servers, without burdening your application's servers. Moreover, as https://cloud.google.com/storage/docs/website-configuration puts it,
Google Cloud Storage behaves essentially like a Content Delivery
Network (CDN) with no work on your part because publicly readable
objects are, by default, cached in the Google Cloud Storage network.
so at least for images that are "shared with all" (and thus you can mark as publicly readable) you'll get a CDN's advantages of low latency "with no work on your part".
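The classic pattern described above can be sketched in a few lines. This is a toy illustration with in-memory dicts standing in for Cloud Storage and Datastore/MySQL; the serving URL format is hypothetical, not the real Cloud Storage URL scheme:

```python
import hashlib

# In-memory stand-ins for the two stores in the pattern.
object_store = {}   # blob_name -> raw bytes   (Cloud Storage role)
metadata_db = {}    # image_id -> metadata record (Datastore/MySQL role)

def upload_image(image_id, data, owner, group, public=False):
    """Store the raw bytes in the object store and a small metadata
    record (including a serving URL) in the structured store."""
    blob_name = hashlib.sha256(data).hexdigest()  # content-addressed name
    object_store[blob_name] = data
    metadata_db[image_id] = {
        "owner": owner,
        "group": group,
        "public": public,
        # hypothetical serving URL, as the object store would hand back
        "url": f"https://storage.example.com/{blob_name}",
    }
    return metadata_db[image_id]

rec = upload_image("img1", b"\x89PNG...", owner="alice", group="friends", public=True)
```

The point of the split: clients fetch the big bytes straight from the object store via the URL, while your application servers only ever touch the small metadata records.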
