Flags in Memgraph for more verbose output

I'm trying to load 10.9M nodes into Memgraph (no edges yet). I have a CSV file with three columns: ID, label and description.
But when loading with:
LOAD CSV FROM "/import-data/nodes.csv" NO HEADER AS row
CREATE (n:BIKG_node {id: row[0], label: row[1], description: row[2]}) ;
the server just crashes (Docker Desktop, 10 GB max). Is there any flag I can give Memgraph to get more verbose output?

Check the log files. Because you are running Memgraph through Docker, you will need to either enter the container using the docker exec command, for example:
docker exec -it CONTAINER_ID bash
or by mounting the log directory as a volume.
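If you also want more verbose logging, Memgraph accepts a log-level flag and can mirror the log to stderr, so you can follow it with docker logs instead of entering the container. The flag names below are as I recall them from the Memgraph docs, and the host import path is a placeholder, so verify both against your setup:

# start Memgraph with verbose logging mirrored to the container output
# (/path/to/import-data is a placeholder for wherever nodes.csv lives)
docker run -p 7687:7687 \
  -v /path/to/import-data:/import-data \
  memgraph/memgraph --log-level=TRACE --also-log-to-stderr

# follow the log output without entering the container
docker logs -f CONTAINER_ID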

Related

Sqlite3 Disk I/O error encountered after a while, but worked after using copy of db

I'm using an sqlite3 database to record data every second. The interface to it is provided by Flask-SQLAlchemy.
This can work fine for a couple of months, but eventually (as the .db file approaches 8 GB), an error prevents any more data from being written to the database:
Failed to commit: (sqlite3.OperationalError) disk I/O error
The journal file does not seem to be the issue here - if I restart the application and use the pragma journal_mode=TRUNCATE, the journal file is created but the disk I/O error persists.
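For reference, that is just the following, executed once after opening the connection:

PRAGMA journal_mode=TRUNCATE;  -- switch the rollback journal to TRUNCATE mode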
Here's the .dbinfo (obtained from sqlite3.exe):
database page size: 1024
write format: 1
read format: 1
reserved bytes: 0
file change counter: 5200490
database page count: 7927331
freelist page count: 0
schema cookie: 12
schema format: 4
default cache size: 0
autovacuum top root: 0
incremental vacuum: 0
text encoding: 1 (utf8)
user version: 0
application id: 0
software version: 3008011
number of tables: 6
number of indexes: 7
number of triggers: 0
number of views: 0
schema size: 5630
data version 2
However this worked:
I made a copy of the .db file (call them app.db and copy.db).
I renamed app.db to orig.db
I renamed copy.db to app.db (so effectively, I swapped it so that the copy becomes the app).
When I started my application again, it was able to write to the app.db file once more! So I could write to a copy I made of the database.
The drive is an SSD (Samsung 850 EVO mSATA). I wonder if that's got something to do with it? Does anyone have any ideas on how I can prevent it from happening again?
EDIT: I've used the sqlite3.exe CLI to execute an INSERT INTO command manually, and this actually completed successfully (and wrote to the disk). However, when I re-ran my Flask-SQLAlchemy interface to write to it, it still came up with the disk I/O error.
UPDATE:
A colleague pointed out that this might be related to another question: https://stackoverflow.com/a/49506243/3274353
I strongly suspect now that this is a filesystem issue - in my system, the database file is being updated constantly alongside some other files which are also being written to.
So to reduce the amount of fragmentation, I'm getting the database to pre-allocate some disk space now, using the answer provided in the aforementioned question: https://stackoverflow.com/a/49506243/3274353
Something like this:
CREATE TABLE t(x);
INSERT INTO t VALUES(zeroblob(500*1024*1024)); -- 500 MB
DROP TABLE t;
To know whether this needs to be done, I use a call to the freelist_count pragma:
PRAGMA schema.freelist_count;
Return the number of unused pages in the database file.
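Putting those together, the check-then-preallocate sequence looks roughly like this (main is the default schema name, 500 MB is just the example size from above, and since auto-vacuum is off here the dropped blob's pages stay on the free list):

PRAGMA main.freelist_count;                     -- if this is small, pre-allocate
CREATE TABLE t(x);
INSERT INTO t VALUES(zeroblob(500*1024*1024));  -- 500 MB of zeroes
DROP TABLE t;                                   -- pages go back to the free list
PRAGMA main.freelist_count;                     -- now reports the freed pages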

How can I use the hstore extension type when migrating OSM data to PostgreSQL?

I am trying to migrate an OpenStreetMap .osm.pbf file to my PostgreSQL database using osm2pgsql on an Ubuntu machine.
When I inspect the data after the process has finished, the tags column is typed as text[] and filled with data like this:
{highway,residential,lit,yes,name,"St. Nicholas Street",name:en,"Saint Nicholas Street",name:mt,"Triq San Nikola"}
Question 1:
Is that hstore-typed data? I can't seem to query it as such. I tried this format:
SELECT * FROM planet_osm_ways WHERE tags ? 'highway' LIMIT 20;
yielding this error:
ERROR: operator does not exist: text[] ? unknown
LINE 1: ...T *, tags as tags FROM planet_osm_ways WHERE tags ? 'highway...
Question 2:
So assuming the error and format above means that the data was simply not saved as an hstore typed data, how can I correct my migration process to fix this?
My process:
Create a new database using pgAdmin 3 on Ubuntu.
Add the hstore extension to the database (pgAdmin).
Add the postgis extension to the database (pgAdmin).
Switch to the postgres user in the terminal:
sudo -u postgres -i
Run the migration command:
osm2pgsql -c -d <DB_NAME> <FILE_PATH> --slim --hstore --multi-geometry
That's it; osm2pgsql does its thing and reports a successful run.
So what am I missing in this process?
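One thing that may explain the error: the ? operator is defined for hstore, not for text[]. In the --slim middle tables such as planet_osm_ways, the tags column is, as far as I know, a plain text[] of alternating keys and values (which matches the sample row above), while the --hstore flag only changes the output tables (planet_osm_point, planet_osm_line, planet_osm_polygon). A sketch to verify against your schema, using the extension's hstore(text[]) constructor to turn that array into an hstore on the fly:

SELECT * FROM planet_osm_ways WHERE hstore(tags) ? 'highway' LIMIT 20;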

How to create a table of 5 GB in HBase for YCSB benchmarking?

I want to benchmark HBase using YCSB. It's my first time using either.
I've gone through some online tutorials, and now I need to create a sample table of size 5 GB. But I don't know how to:
Batch-put a bunch of data into a table
Control the size to be around 5 GB
Could anyone give me some help on that?
I have previously used the HBase PerformanceEvaluation tool to load data into HBase. Maybe it can help you.
hbase org.apache.hadoop.hbase.PerformanceEvaluation
Various options are available for this tool. For your case, you can set the data size to 5 GB.
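For example, something along these lines; the option names vary between HBase versions, so run the class without arguments to print the usage for yours. With the default ~1 KB values, 5 clients x 1,000,000 rows is roughly 5 GB:

# write ~5 GB with 5 client threads, without launching a MapReduce job
hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=1000000 sequentialWrite 5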
This is pretty easy: the default (core) workload uses records of roughly 1 KB each, so to get 5 GB, just use 5,000,000 records.
You can do this by specifying the recordcount parameter in the command line, or creating your own workload file with this parameter inside.
Here's how you would do it on the command line (using the included workload workloada):
./bin/ycsb load hbase12 -P workloads/workloada -p recordcount=5000000
A custom file would look like this:
recordcount=5000000
operationcount=1000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readproportion=0.8
updateproportion=0.2
scanproportion=0
insertproportion=0
And then you just run:
./bin/ycsb load hbase12 -P myWorkload
This will insert all the data into your database.
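Once the load phase has finished, the benchmark itself is the run phase with the same workload file:

./bin/ycsb run hbase12 -P myWorkload

load only inserts the records; run executes the operationcount mix of reads and updates against them.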

Bulk load to HDFS from a Sybase database

I need to load data from Sybase (a production database) to HDFS. Using Sqoop takes a very long time and frequently hits the production database, so I am thinking of creating data files from a Sybase dump and then copying those files to HDFS. Is there any open-source tool available to create the required data files (flat files) from a Sybase dump?
Thanks,
The iq_bcp command-line utility is designed to do this on a per-table basis. You just need to generate a list of tables and iterate through it; a rough loop sketch follows the notes below.
iq_bcp [ [ database_name. ] owner. ] table_name { in | out } datafile
iq_bcp MyDB..MyTable out MyTable.csv -c -t#$#
-c specifies a character (plaintext) output
-t allows you to customize the column delimiter. You will want to use a character or series of characters that does not appear in your extract; e.g. if you have a text column that contains a comma, a CSV will be tricky to import without additional work.
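Here is that per-table loop as a sketch, assuming a tables.txt with one table name per line, the MyDB database from the example above, and an hdfs client on the same host (all of which you would adapt to your environment):

# export each table as a delimited flat file, then push it to HDFS
while read -r TABLE; do
  iq_bcp MyDB.."$TABLE" out "$TABLE".csv -c -t'#$#'   # delimiter quoted for the shell
  hdfs dfs -put "$TABLE".csv /data/sybase/"$TABLE".csv
done < tables.txt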
Sybase IQ: iq_bcp

Export Image column from SQL Server 2000 using BCP

I've been tasked with extracting some data from an SQL Server 2000 database into a flat format on disk. I have little SQL Server experience.
There is a table which contains files stored in an "IMAGE" type column, together with an nvarchar column storing the filename.
It looks like there are numerous types of files stored in the table: Word docs, XLS, TIF, txt, zip files, etc.
I'm trying to extract just one row using BCP, doing something like this:
bcp "select file from attachments where id = 1234" queryout "c:\myfile.doc" -S <host> -T -n
This saves a file, but it is corrupt and I can't open it with Word. When I open the file with Word, I can see a lot of the text, but I also get a lot of un-renderable characters. I have similar issues when trying to extract image files, e.g. TIF; photo software won't open the files.
I presume I'm hitting some kind of character encoding problems.
I've played around with the -C (e.g. trying RAW) and -n options within BCP, but still can't get it to work.
The table in SQL Server has a collation of "SQL_Latin1_General_CP1_CI_AS".
I'm running BCP remotely from my Windows 7 desktop.
Any idea where I'm going wrong? Any help greatly appreciated.
I got this working by changing the default options that BCP prompts for when you invoke the command. The one that made the difference was changing the prefix-length field from 4 to 0:
bcp "select file from attachments where id = 1234" queryout "c:\myfile.doc" -S -T -n
After running this, answer the prompts as follows:
File storage type [image]: I (capital "I")
Prefix length: 0
Length: 0
Field terminator: press Enter (leave it blank)
Save the format information in a file? Y
The exported file is then written to disk.
