Memgraph is an in-memory database. Does that mean that data is lost when I shut down the computer? - graph-databases

Memgraph is an in-memory database. Does that mean that data is lost when I shut down the computer?
Do I need to use the GQLAlchemy library as an on-disk storage solution to ensure data persistence?

The data is not lost when you shut down your computer.
Memgraph uses two mechanisms to ensure data durability:
write-ahead logs (WAL)
periodic snapshots
Snapshots are taken periodically during the entire runtime of Memgraph, at the interval defined by the --storage-snapshot-interval-sec configuration flag. When a snapshot is triggered, the entire data storage is written to disk. Write-ahead logs record every database modification in a file as it happens.
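For example (the interval value here is illustrative), Memgraph can be started with a five-minute snapshot interval:
memgraph --storage-snapshot-interval-sec=300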
If you want to generate a snapshot for the current database state instantly, use the following query:
CREATE SNAPSHOT;
If you are using Docker, check how to specify volumes for data persistence.
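A minimal sketch, assuming the standard memgraph/memgraph image and that data lives under /var/lib/memgraph (the volume name mg_lib is illustrative):
docker run -p 7687:7687 -v mg_lib:/var/lib/memgraph memgraph/memgraph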
Therefore, you don't need to use GQLAlchemy to ensure data persistence. The GQLAlchemy library provides an on-disk storage solution for large properties that are not used in graph algorithms. This is useful when nodes or relationships carry metadata that isn't needed by any of the graph algorithms run in Memgraph but can be fetched afterward. You can check out the how-to guide to learn how to use an SQL database to store node properties seamlessly, as if they were stored in Memgraph.
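As a rough illustration of that pattern, here is a minimal Python sketch assuming GQLAlchemy's SQLitePropertyDatabase and its on_disk field option as shown in the how-to guide (the file name and model are illustrative, and the exact API may differ between versions):

from typing import Optional
from gqlalchemy import Memgraph, SQLitePropertyDatabase, Node, Field

graph = Memgraph()
# Large properties are stored in this SQLite file instead of Memgraph's RAM.
SQLitePropertyDatabase("on_disk_properties.db", graph)

class User(Node):
    id: int = Field(unique=True, exists=True, index=True, db=graph)
    # Kept on disk in SQLite, but fetched as if it were a regular property.
    huge_string: Optional[str] = Field(on_disk=True)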


Related

Regarding the burden on Snowflake's database storage layer

Snowflake has an architecture consisting of the following three layers.
Database storage
Query processing
Cloud services
I understand that in the query processing layer it is possible to create a warehouse for each process, and to scale up and scale out on a per-process basis.
However, when the created warehouses (processes) run in parallel, I am worried about the burden on the database storage.
Even though the query processing layer can be load-balanced, since there is only one database storage layer, wouldn't a lot of parallel processing hit that storage at once and cause errors in the database storage layer?
Sorry if I don't understand the architecture.
The storage is immutable, so query read load is just I/O against the cloud provider's storage layer, which is for all practical purposes infinitely scalable.
When any node updates a table, a new set of file partitions is produced and becomes known, and any warehouse that doesn't have the new partition parts does remote I/O to read them.
The only downside to this pattern is that it does not scale well for transactional write workloads, which is why Snowflake is not targeted at those markets.
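A toy Python sketch of that copy-on-write idea (all names are illustrative, not Snowflake internals): each table version is an immutable set of partition files, a write publishes a new version, and readers keep using whichever version they started with.

from dataclasses import dataclass

@dataclass(frozen=True)
class TableVersion:
    partitions: frozenset  # names of immutable partition files

def update_table(current, replaced, added):
    # A write never mutates existing files; it publishes a new version
    # that drops the replaced partitions and includes freshly written ones.
    return TableVersion(frozenset((current.partitions - replaced) | added))

v1 = TableVersion(frozenset({"part-001", "part-002"}))
v2 = update_table(v1, replaced={"part-002"}, added={"part-003"})
# Readers on v1 still see part-001 and part-002 untouched; a warehouse
# moving to v2 does remote I/O only for part-003, which it has not cached.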

Load balancer and multiple instance of database design

The current single application server can handle about 5,000 concurrent requests. However, the user base will be in the millions, and I may need two application servers to handle the requests.
So the plan is to add a load balancer in the hope of handling over 10,000 concurrent requests. However, the data for all users is currently stored in one single database. If the design moves to two or more servers, shall I do the following?
Have two database instances
Sync the two databases in real time
Is this correct?
However, if so, will the sync process lower the performance of the servers, as database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database (a sketch follows at the end of this answer).
So, tl;dr: put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
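To make the caching suggestion concrete, here is a minimal cache-aside sketch in Python using redis-py; fetch_user_from_db is a hypothetical stand-in for whatever query your application runs against the single database:

import json
import redis  # pip install redis

cache = redis.Redis(host="localhost", port=6379)

def fetch_user_from_db(user_id):
    # Hypothetical stand-in for the real query against the database tier.
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the database is never touched
    user = fetch_user_from_db(user_id)  # cache miss: one database round trip
    cache.setex(key, 300, json.dumps(user))  # keep it for five minutes
    return user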
A cost-effective option could be synchronizing a standby node with data from the active node, which is achievable with an open-source relational database (e.g., MariaDB).
Do not store results and statistics that can easily be computed at run time; this helps reduce the data size.
If historical data is not needed urgently for queries, it can be written to a text file in a format that is easy to import back into the database (e.g., .csv).
Data objects that are updated very often can be kept in an in-memory database as key-value pairs, with a scheduled task performing batch updates/inserts to the relational database for persistence (a sketch of this appears after this list).
Implement retry logic for those batch update tasks to handle database downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data from the database in memory, refreshing the parts that change either periodically or via an API.
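A minimal sketch of the batch-flush-with-retry idea above, using Python's standard sqlite3 as a stand-in relational database and a plain dict as a stand-in in-memory store (all names and the schema are illustrative):

import sqlite3
import time

# Stand-in for the in-memory key-value store (e.g., Redis in production).
hot_counters = {"page:home": 42, "page:about": 7}

def flush_to_db(counters, attempts=3, delay=1.0):
    # Batch-upsert the counters into the relational database, retrying on errors.
    for attempt in range(1, attempts + 1):
        try:
            conn = sqlite3.connect("stats.db")
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS counters (key TEXT PRIMARY KEY, value INTEGER)"
                )
                conn.executemany(
                    "INSERT INTO counters (key, value) VALUES (?, ?) "
                    "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
                    counters.items(),
                )
            conn.close()
            return
        except sqlite3.OperationalError:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay)  # back off, then retry

flush_to_db(hot_counters)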

What is the simplest way to stream Yammer Metrics data into a relational database?

We have started integrating Yammer Metrics in our applications. We want to collect the generated metrics data in a relational database table.
How can this metrics data be streamed to the database continuously?
I have searched the internet and found that Yammer provides built-in Reporter APIs (CSVReporter, GraphiteReporter, etc.) that can stream data to CSV files, Graphite, etc.
We cannot keep appending to CSV or text files because they have to be archived off the server after some time due to space issues.
Once the Yammer Metrics API streams data out to some other place, does it keep a copy in server memory?
We want to keep server memory free once the data has been streamed out to the database.
The metrics stay in memory for a while in every situation, but you need a product like Ganglia or Graphite to store the data long term. These are normally better suited to operations metrics than a relational database is, and they provide reporting add-ons. You'd need some extra code, or to extend the metrics library, to log directly to a database.
Once the data is streamed out, there is no point in holding onto it, so it isn't going to affect your servers if you have it set up correctly.
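Yammer Metrics itself is a Java library (the reporter you would extend is Java code), but the pattern is the same in any language: on a schedule, read the current metric values and insert them into a relational table, keeping nothing in application memory afterwards. A minimal Python sketch, with sqlite3 standing in for your database and read_current_metrics as a hypothetical stand-in for the metrics registry:

import sqlite3
import time

def read_current_metrics():
    # Hypothetical stand-in for polling the metrics registry.
    return {"requests.count": 1024, "requests.mean_rate": 3.5}

conn = sqlite3.connect("metrics.db")
with conn:
    conn.execute("CREATE TABLE IF NOT EXISTS metrics (ts REAL, name TEXT, value REAL)")

for _ in range(3):  # a real reporter would run on a fixed schedule, indefinitely
    now = time.time()
    with conn:  # once this commits, nothing is retained in application memory
        conn.executemany(
            "INSERT INTO metrics (ts, name, value) VALUES (?, ?, ?)",
            [(now, k, v) for k, v in read_current_metrics().items()],
        )
    time.sleep(1)
conn.close()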

In-Memory Database as a Backup for Database Failures

Is an in-memory database a viable backup option for serving read operations in case of database failures? One could insert data into an in-memory database every so often, and if the database server/web server goes down (a rare occurrence), the data in the in-memory database would still be accessible outside the web server.
If you're going to hold your entire database in memory, you might just as well perform all operations there and hold your backup on disk.
No, since a power outage means your database is gone. The same is true if the DB process dies and the OS deallocates all the memory it was using.
I'd recommend a second hard drive, external or internal, and dump the data to that hard drive.
It probably depends on your database usage. For instance, it would be hard for me to imagine Stack Overflow doing this.
On the other hand, not every application is SO. If your database usage is limited, you could take a cue from mobile applications, which accept the fact that a server may not always be available, and treat your web application as though it were a mobile client. See Architecting Disconnected Mobile Applications Using a Service Oriented Architecture.

Does SQLite's memory mode support persistence to a local file?

What is an in-memory database? Is SQLite an in-memory database?
In this mode, does it support persisting data to a local file?
An in-memory database supports all operations and database access syntax, but doesn't actually persist; it's just data structures in memory. This makes it fast, and great for developer experimentation and (relatively small amounts of) temporary data, but it isn't suited to anything where you want the data to persist (persisting the data is what really costs, yet that's the #1 reason for using a database) or where the overall dataset is larger than you can comfortably fit in your available physical memory.
SQLite databases are created coupled to a particular file, or to the pseudo-file “:memory:”, which is used when you want an in-memory database. You can't change the location of a database while it is open, and an in-memory database is disposed of when you close its connection; the only way to persist it is to use queries to pull data out of it and write it somewhere else (e.g., an on-disk database, or some kind of dump file).
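In Python, for instance, the standard sqlite3 module can open such an in-memory database and copy it to disk with the backup API before the connection closes (a minimal sketch; the file and table names are illustrative):

import sqlite3

mem = sqlite3.connect(":memory:")  # exists only inside this process
mem.execute("CREATE TABLE notes (body TEXT)")
mem.execute("INSERT INTO notes VALUES ('remember me')")
mem.commit()

# Copy the entire in-memory database into an on-disk file before closing;
# without this, everything above vanishes when the connection is closed.
disk = sqlite3.connect("notes.db")
mem.backup(disk)
disk.close()
mem.close()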
SQLite supports memory-only databases - it's one of the options. It is useful when persistence is not important, but being able to execute SQL queries quickly against relational data is.
A detailed description of in-memory databases:
https://www.sqlite.org/inmemorydb.html
