How to make sure my data import to Nebula Graph is successful? - graph-databases

I want to import data from a CSV file into Nebula Graph for testing. How can I make sure the data import was successful?

First, the importer prints logs that show the progress of the import, like the following messages:
importer_1 | 2019/12/13 06:58:55 [INFO] statsmgr.go:61: Done(/usr/local/nebula/importer/examples/student.csv): Time(0.03s), Finished(23), Failed(0), Latency AVG(1594us), Batches Req AVG(2509us), Rows AVG(717.33/s)
importer_1 | 2019/12/13 06:58:55 [INFO] reader.go:103: Total lines of file(/usr/local/nebula/importer/examples/follow.csv) is: 4, error lines: 0
importer_1 | 2019/12/13 06:58:55 [INFO] statsmgr.go:61: Done(/usr/local/nebula/importer/examples/student.csv): Time(0.03s), Finished(26), Failed(0), Latency AVG(1454us), Batches Req AVG(2323us), Rows AVG(763.83/s)
importer_1 | 2019/12/13 06:58:55 [INFO] reader.go:103: Total lines of file(/usr/local/nebula/importer/examples/student.csv) is: 3, error lines: 0
importer_1 | 2019/12/13 06:58:55 [INFO] statsmgr.go:61: Done(/usr/local/nebula/importer/examples/follow.csv): Time(0.03s), Finished(26), Failed(0), Latency AVG(1454us), Batches Req AVG(2323us), Rows AVG(760.51/s)
importer_1 | 2019/12/13 06:58:55 [INFO] httpserver.go:37: Shutdown http server listened on 5699
importer_1 | 2019/12/13 06:58:55 [INFO] cmd.go:31: Finish import data, consume time: 0.04s
The Finished and Failed counters will tell you the result of this process.
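If the importer runs under Docker Compose (as the importer_1 prefix above suggests), you can pull out just those summary counters by grepping the service logs; a minimal sketch, assuming the service is named importer:
# Print only the per-file summary lines containing the Finished/Failed counters.
docker-compose logs importer | grep -E 'Finished\([0-9]+\), Failed\([0-9]+\)'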
Second, you can check whether the error data file specified by failDataPath in the YAML configuration contains any failed rows. If that file is empty, all rows of the CSV data were inserted into Nebula Graph successfully.
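You can also check that file directly from the shell; a minimal sketch, assuming failDataPath in the YAML points to ./err/data (adjust the path to your own configuration):
# -s is true only if the file exists and is non-empty,
# i.e. the importer recorded at least one failed row.
if [ -s ./err/data ]; then
    echo "some rows failed to import; inspect ./err/data"
else
    echo "no failed rows recorded"
fi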

Related

After a Helm deployment of a tdengine-3.0.2.4 cluster (1 mnode, 3 dnodes, 3 replicas), the mnode crashes and the cluster cannot be recovered

When using TDengine in a k8s environment, you often run into restarts of the pod hosting the cluster's mnode, as well as cluster reinstallation, so two scenarios were tested.
With "create database test replica 3;" (i.e. three replicas), the TDengine cluster cannot be used normally in either scenario.
Details are as follows:
Environment: tdengine-3.0.2.4 cluster deployed with Helm on k8s (1 mnode, 3 dnodes, 3 replicas).
After deleting the pod tdengine3-0 (the mnode) and waiting for the new pod to come up, checking the database with the client shows the status of tdengine3-1 and tdengine3-2 as offline.
When checking data, the error "DB error: Fail to get table info, error: Sync not leader" is returned.
To simulate the restart of the pod hosting the mnode in the k8s cluster, the steps are as follows:
[root@node01 tdengine]# kubectl delete pod tdengine3-0
pod "tdengine3-0" deleted
[root@node01 tdengine]# kubectl get pod -w|grep tdengine3
tdengine3-0 0/1 Running 0 8s
tdengine3-1 1/1 Running 0 62s
tdengine3-2 1/1 Running 0 2m10s
tdengine3-0 1/1 Running 0 10s
[root@node01 tdengine]# kubectl exec -it tdengine3-0 -- /bin/bash
root@tdengine3-0:~# taos
Welcome to the TDengine Command Line Interface, Client Version:3.0.2.4
Copyright (c) 2022 by TDengine, all rights reserved.
****************************** Tab Completion **********************************
* The TDengine CLI supports tab completion for a variety of items, *
* including database names, table names, function names and keywords. *
* The full list of shortcut keys is as follows: *
* [ TAB ] ...... complete the current word *
* ...... if used on a blank line, display all valid commands *
* [ Ctrl + A ] ...... move cursor to the st[A]rt of the line *
* [ Ctrl + E ] ...... move cursor to the [E]nd of the line *
* [ Ctrl + W ] ...... move cursor to the middle of the line *
* [ Ctrl + L ] ...... clear the entire screen *
* [ Ctrl + K ] ...... clear the screen after the cursor *
* [ Ctrl + U ] ...... clear the screen before the cursor *
**********************************************************************************
Server is Community Edition.
taos> show dnodes;
id | endpoint | vnodes | support_vnodes | status | create_time | note |
=================================================================================================================================================
1 | tdengine3-0.tdengine3.defau... | 2 | 8 | ready | 2023-01-30 17:42:13.682 | |
2 | tdengine3-1.tdengine3.defau... | 2 | 0 | offline | 2023-01-30 17:43:35.428 | status not received |
3 | tdengine3-2.tdengine3.defau... | 2 | 0 | offline | 2023-01-30 17:44:50.947 | status not received |
Query OK, 3 row(s) in set (0.002463s)
taos> show databases;
name |
=================================
information_schema |
performance_schema |
test |
Query OK, 3 row(s) in set (0.002456s)
taos> use test;
Database changed.
taos> select * from demo;
DB error: Fail to get table info, error: Sync not leader (10.288545s)
taos> select * from demo;
DB error: Fail to get table info, error: Sync not leader (10.289980s)

Deploying the TDengine database in k8s, I don't know what this log means. Is it a bug or a platform limitation?

TDengine is deployed in k8s and queried over HTTP; the query statements use functions such as max, min, last, etc. I want to know which parameters are not set properly, what tables and qId mean in the log, and how I can investigate the cause of the "query is over and stop read" messages. Thanks.
The version is 2.4.
The log is as follows:
TAOS_ADAPTER info "| 200 | 19.518869ms | 10.233.94.78 | POST | /rest/sql/indc_point_data_40 " model=web sessionID=8660772
01/30 10:47:50.236770 00000189 TDB 0x7f8779a12c30 LIMIT_READ query is over and stop read. tables=1 qId=0x89f00496e6ac46
01/30 10:47:50.237723 00000200 TDB 0x7f87a5d47860 LIMIT_READ query is over and stop read. tables=1 qId=0x89f00496e6b449
01/30 10:47:50.239013 00000127 TAOS_ADAPTER info "| 200 | 9.334362ms | 10.233.94.37 | POST | /rest/sql/indc_point_data_40 " model=web sessionID=8660775
01/30 10:47:50.243209 00000207 TDB 0x7f87c18036c0 LIMIT_READ query is over and stop read. tables=1 qId=0x89f00496e6ec4e
01/30 10:47:50.245438 00000127 TAOS_ADAPTER info "| 200 | 19.027155ms | 10.233.101.104 | POST | /rest/sql/indc_point_data_40 " sessionID=8660774 model=web
01/30 10:47:50.252404 00000191 TDB 0x7f87519dfde0 LIMIT_READ query is over and stop read. tables=1 qId=0x89f00496e73c57

403 error running data unload with snowsql GET

I'm having issues testing a data unload flow from Snowflake using the GET command to store the files on my local machine.
Following the documentation here, it should be as simple as creating a stage, copying the data I want to that stage, and then running a snowsql command locally to retrieve the files.
I'm on Windows 10, running the following snowsql command to try to unload the data, against a database populated with the test TPC-H data that Snowflake provides:
snowsql -a <account id> -u <username> -q "
USE DATABASE TESTDB;
CREATE OR REPLACE STAGE TESTSNOWFLAKESTAGE;
copy into @TESTSNOWFLAKESTAGE/supplier from SUPPLIER;
GET @TESTSNOWFLAKESTAGE file://C:/Users/<local user>/Downloads/unload;"
All commands run successfully, except for the final GET:
SnowSQL * v1.2.14
Type SQL statements or !help
+----------------------------------+
| status |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
1 Row(s) produced. Time Elapsed: 0.121s
+-------------------------------------------------+
| status |
|-------------------------------------------------|
| Stage area TESTSNOWFLAKESTAGE successfully created. |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.293s
+---------------+-------------+--------------+
| rows_unloaded | input_bytes | output_bytes |
|---------------+-------------+--------------|
| 100000 | 14137839 | 5636225 |
+---------------+-------------+--------------+
1 Row(s) produced. Time Elapsed: 7.548s
+-----------------------+------+--------+------------------------------------------------------------------------------------------------------+
| file | size | status | message |
|-----------------------+------+--------+------------------------------------------------------------------------------------------------------|
| supplier_0_0_0.csv.gz | -1 | ERROR | An error occurred (403) when calling the HeadObject operation: Forbidden, file=supplier_0_0_0.csv.gz |
+-----------------------+------+--------+------------------------------------------------------------------------------------------------------+
1 Row(s) produced. Time Elapsed: 1.434s
This 403 looks like it's coming from the S3 instance backing my Snowflake account, but that's part of the abstracted service layer provided by Snowflake, so I'm not sure where I would have to go to flip auth switches.
Any guidance is much appreciated.
You need to use Windows-style backslashes in your local file path. So, assuming that, per @NickW's point, you are filling in your local user correctly, the format should be like the following:
file://C:\Users\<local user>\Downloads
There are some examples in the documentation for this here:
https://docs.snowflake.com/en/sql-reference/sql/get.html#required-parameters
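Putting that together, a sketch of the corrected command with Windows-style backslashes in the local path, reusing the placeholders from the question:
snowsql -a <account id> -u <username> -q "
USE DATABASE TESTDB;
GET @TESTSNOWFLAKESTAGE file://C:\Users\<local user>\Downloads\unload;"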

Apache Beam 2.1.0 with Google DatastoreIO calls a non-existent Guava Preconditions.checkArgument overload in GAE

When building a Dataflow template that should read from Datastore, I get the following error in the Stackdriver logs (from Google App Engine):
java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;I)V
at
org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read.withQuery(DatastoreV1.java:494) .... my code
This happens on the line where the read from Datastore is constructed. The pom dependency is:
<!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-google-cloud-platform -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
<version>2.1.0</version>
</dependency>
It references:
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>20.0</version>
</dependency>
But this version does not contain a method checkArgument(String string) in class Preconditions, nor does any other version I looked at. As mentioned above, the template should be built inside a GAE flexible environment project and later executed, but the template generation fails.
If I let a main function generate the template locally, it works fine, but as soon as the project is in GAE, it fails.
Any input is highly appreciated.
EDIT: the dependency tree for com.google.guava:
[INFO] xy.company_name.test:bcc.dataflow.project_name:war:0.0.3
[INFO] \- org.apache.beam:beam-runners-google-cloud-dataflow-java:jar:2.1.0:compile
[INFO] +- org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.1.0:compile
[INFO] | \- com.google.cloud.bigdataoss:gcsio:jar:1.4.5:compile
[INFO] | \- (com.google.guava:guava:jar:18.0:compile - omitted for conflict with 20.0)
[INFO] +- org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.1.0:compile
[INFO] | +- io.grpc:grpc-core:jar:1.2.0:compile
[INFO] | | \- (com.google.guava:guava:jar:19.0:compile - omitted for conflict with 20.0)
[INFO] | +- com.google.api:gax-grpc:jar:0.20.0:compile
[INFO] | | +- io.grpc:grpc-protobuf:jar:1.2.0:compile
[INFO] | | | +- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | | | \- io.grpc:grpc-protobuf-lite:jar:1.2.0:compile
[INFO] | | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | | +- com.google.api:api-common:jar:1.1.0:compile
[INFO] | | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | | +- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | | \- com.google.api:gax:jar:1.3.1:compile
[INFO] | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | +- com.google.cloud:google-cloud-core-grpc:jar:1.2.0:compile
[INFO] | | +- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | | \- com.google.protobuf:protobuf-java-util:jar:3.2.0:compile
[INFO] | | \- (com.google.guava:guava:jar:18.0:compile - omitted for conflict with 19.0)
[INFO] | +- com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0:compile
[INFO] | | \- (com.google.guava:guava:jar:18.0:compile - omitted for conflict with 19.0)
[INFO] | +- io.grpc:grpc-all:jar:1.2.0:runtime
[INFO] | | \- io.grpc:grpc-protobuf-nano:jar:1.2.0:runtime
[INFO] | | \- (com.google.guava:guava:jar:19.0:runtime - omitted for duplicate)
[INFO] | +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
[INFO] | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | +- com.google.cloud.bigtable:bigtable-protos:jar:0.9.7.1:compile
[INFO] | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | +- com.google.cloud.bigtable:bigtable-client-core:jar:0.9.7.1:compile
[INFO] | | +- com.google.auth:google-auth-library-appengine:jar:0.6.1:compile
[INFO] | | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | | \- (com.google.guava:guava:jar:19.0:compile - omitted for duplicate)
[INFO] | \- com.google.guava:guava:jar:20.0:compile
[INFO] +- com.google.auth:google-auth-library-oauth2-http:jar:0.7.1:compile
[INFO] | \- (com.google.guava:guava:jar:19.0:compile - omitted for conflict with 20.0)
[INFO] \- com.google.cloud.bigdataoss:util:jar:1.4.5:compile
[INFO] \- (com.google.guava:guava:jar:18.0:compile - omitted for conflict with 20.0)
UPDATE:
After adding
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>20.0</version>
</dependency>
</dependencies>
</dependencyManagement>
and updating a function handling Datastore entities, it seems to work again!
Sorry for the noise; sometimes it just helps to structure the problem, and Stack Overflow is a great help for doing so.
As noted here, that corresponds to Preconditions.checkArgument(boolean, String, int): the Z is a boolean, the Ljava/lang/String; is a String, and the I is an int. This method should exist in Guava 20.0.
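If the error comes back, one way to double-check which Guava version actually wins at build time is the Maven dependency plugin; a minimal sketch using standard maven-dependency-plugin options (not taken from the original post):
# Print only the com.google.guava:guava nodes of the tree, including the
# versions omitted for conflicts, in the same format as the tree output above.
mvn dependency:tree -Dverbose -Dincludes=com.google.guava:guava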

Issue while importing from SQL Server into Hive using Apache Sqoop

When I run the following sqoop command from $SQOOP_HOME/bin, it works fine:
sqoop import --connect "jdbc:sqlserver://ip_address:port_number;database=database_name;username=sa;password=sa#Admin" --table $SQL_TABLE_NAME --hive-import --hive-home $HIVE_HOME --hive-table $HIVE_TABLE_NAME -m 1
But when I run the same command in a loop over different databases from a bash script, as follows:
while IFS='' read -r line || [[ -n $line ]]; do
DATABASE_NAME=$line
sqoop import --connect "jdbc:sqlserver://ip_address:port_number;database=$DATABASE_NAME;username=sa;password=sa#Admin" --table $SQL_TABLE_NAME --hive-import --hive-home $HIVE_HOME --hive-table $HIVE_TABLE_NAME -m 1
done < "$1"
I am passing the database names to my bash script in a text file as a parameter. The Hive table is the same for every database, because I want to append the data from all databases into a single Hive table.
For the first two or three databases it works fine; after that it starts giving the following errors:
15/06/25 11:41:06 INFO mapreduce.Job: Job job_1435124207953_0033 failed with state FAILED due to:
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: The MapReduce job has already been retired. Performance
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: counters are unavailable. To get this information,
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: you will need to enable the completed job store on
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: the jobtracker with:
11:41:06 INFO mapreduce.ImportJobBase:mapreduce.jobtracker.persist.jobstatus.active = true
11:41:06 INFO mapreduce.ImportJobBase: mapreduce.jobtracker.persist.jobstatus.hours = 1
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: A jobtracker restart is required for these settings
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: to take effect.
15/06/25 11:41:06 ERROR tool.ImportTool: Error during import: Import job failed!
I have already restarted my multi-node Hadoop cluster after changing mapred-site.xml with the two parameters mentioned above, i.e.
mapreduce.jobtracker.persist.jobstatus.active = true
mapreduce.jobtracker.persist.jobstatus.hours = 1
I am still facing the same problem. As I have just started learning Sqoop, any help will be appreciated.
