Why does the log always say "No Data Available" while the cube is building?

In the sample case from the Kylin official website, while building the cube, the log of the first step, Create Intermediate Flat Hive Table, always shows "No Data Available" and the status stays "running".
The cube build has been running for more than three hours.
I checked the Hive table kylin_sales and there is data in the table.
I also found that the intermediate flat Hive table kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c
has been created successfully in Hive, but there is no data in it.
hive> show tables;
OK
...
kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c
kylin_sales
...
Time taken: 9.816 seconds, Fetched: 10000 row(s)
hive> select * from kylin_sales;
OK
...
8992 2012-04-17 ABIN 15687 0 13 95.5336 17 10000975 10000507 ADMIN Shanghai
8993 2013-02-02 FP-non GTC 67698 0 13 85.7528 6 10000856 10004882 MODELER Hongkong
...
Time taken: 3.759 seconds, Fetched: 10000 row(s)
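For example, counting rows in the intermediate table returns zero (the count query is illustrative; output abbreviated):
hive> select count(*) from kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c;
...
OK
0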
The deployment environment is as follows:
 
zookeeper-3.4.14
hadoop-3.2.0
hbase-1.4.9
apache-hive-2.3.4-bin
apache-kylin-2.6.1-bin-hbase1x
openssh5.3
jdk1.8.0_144
I deployed the cluster with Docker and created three containers: one master and two slaves.
The Create Intermediate Flat Hive Table step is still running.

"No Data Available" means this step's log has not been captured by Kylin. Usually the log is only recorded once the step has exited (with success or failure); then you will see the content.
In this case it usually indicates the job is pending in Hive, which can happen for many reasons. The simplest way to debug is to watch Kylin's log: you will see the Hive command that Kylin executes, and you can run it manually in a console to reproduce the problem. Also check whether your Hive/Hadoop cluster has enough resources (CPU, memory) to execute such a query.
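For example, a minimal sketch of that debugging loop, assuming a default binary install where Kylin writes to $KYLIN_HOME/logs/kylin.log (paths may differ in a Docker deployment):
# Watch Kylin's log to capture the Hive command of the flat-table step.
tail -f $KYLIN_HOME/logs/kylin.log
# Copy the hive command Kylin prints and run it by hand; the real
# INSERT OVERWRITE ... SELECT statement comes from the log (elided here).
hive -e "INSERT OVERWRITE TABLE kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c SELECT ..."
# If it hangs, check for YARN applications stuck waiting for resources,
# a common cause of a Hive job pending on CPU/memory.
yarn application -list -appStates ACCEPTED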

Related

Is the duration for a Power Apps Dataflow from Azure SQL to Dataverse really this long, and are the error messages really this terrible?

I have a table in an Azure SQL Database which contains approximately 10 columns and 1.7 million rows. The data in each cell is mostly null/varchar(30).
When running a dataflow to a new table in Dataverse, I have two issues:
It takes around 14 hours (around 100k rows or so per hour)
It fails after 14 hours with the great error message (**** is just some entity names I have removed):
Dataflow name,Entity name,Start time,End time,Status,Upsert count,Error count,Status details
****** - *******,,1.5.2021 9:47:20 p.m.,2.5.2021 9:51:27 a.m.,Failed,,,There was a problem refreshing the dataflow. please try again later. (request id: 5edec6c7-3d3c-49df-b6de-e8032999d049).
****** - ,,1.5.2021 9:47:43 p.m.,2.5.2021 9:51:26 a.m.,Aborted,0,0,
Table name,Row id,Request url,Error details
*******,,,Job failed due to timeout : A task was canceled.
Is it really expected that this should take 14 hours :O ?
Is there any verbose logging I can enable to get a friendlier error message?

Inefficiency in a Flink job like: INSERT INTO hive_table SELECT orgId, 2.0, pdate, '02' FROM users LIMIT 10000, where users is a Kafka table

The system should just pick 10,000 messages and finish. Instead it runs forever: it has taken in 78 GB of data and keeps going. I don't know if this is the default behavior. Also, it never commits anything in the sink.
The above is running on Flink 1.12 (Scala 2.12), built against Hive 3.1.2.
The streaming file sink only commits while checkpointing. Perhaps you need to enable and configure checkpointing.
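For instance, a minimal sketch of enabling checkpointing through flink-conf.yaml (the key is a standard Flink option; the 10s interval is illustrative):
# flink-conf.yaml: enable periodic checkpoints so the file sink can commit.
execution.checkpointing.interval: 10s
With an interval configured, the sink finalizes its pending files on each successful checkpoint, which is when committed data starts appearing in the Hive table.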

What causes data on a read-replica to be an old_snapshot and cause conflict?

After encountering the following error several times (on an RDS Postgres instance):
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
I ran (on the hot standby):
SELECT *
FROM pg_stat_database_conflicts;
And found that all the conflicts have to do with confl_snapshot, which is explained in the documentation as:
confl_snapshot: Number of queries in this database that have been canceled due to old snapshots
What might be causing this conflict (being an old snapshot)?
If it helps, here are some of the relevant settings (from running SHOW ALL; on the standby):
hot_standby: on
hot_standby_feedback: off
max_standby_archive_delay: 30s
max_standby_streaming_delay: 1h
old_snapshot_threshold: -1
vacuum_defer_cleanup_age: 0
vacuum_freeze_min_age: 50000000
vacuum_freeze_table_age: 150000000
vacuum_multixact_freeze_min_age: 5000000
vacuum_multixact_freeze_table_age: 150000000
wal_level: replica
wal_receiver_status_interval: 10s
wal_receiver_timeout: 30s
wal_retrieve_retry_interval: 5s
wal_segment_size: 16MB
wal_sender_timeout: 30s
wal_writer_delay: 200ms
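For background (not from the thread): a snapshot conflict means VACUUM on the primary removed row versions that a standby query could still see; with hot_standby_feedback off, the standby never tells the primary which rows it still needs. A hedged sketch of the usual inspection and mitigation (on RDS the setting is changed via a parameter group rather than ALTER SYSTEM):
-- Per-database conflict counters; columns come from the
-- pg_stat_database_conflicts view.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts
ORDER BY confl_snapshot DESC;
-- Common mitigation: let the standby report its oldest snapshot so
-- VACUUM retains those row versions (at the cost of bloat on the primary).
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();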

Speed up sqlFetch()

I am working with an Oracle database and would like to fetch a table with 30 million records.
library(RODBC)
ch <- odbcConnect("test", uid = "test_user", pwd = "test_pwd",
                  believeNRows = FALSE, readOnly = TRUE)
db <- sqlFetch(ch, "test_table")
For 1 million records the process needs 1074.58 sec. Thus, it takes quite a while for all 30 million records. Is there any possibility to speed up the process?
I would appreciate any help. Thanks.
You could try making a system call from the R session to the database's command-line client (for this Oracle setup, sqlplus rather than a MySQL shell) using the system() command. Process your data externally and only load what you need as output.
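A minimal sketch of that approach, assuming Oracle's sqlplus client is on the PATH; the spool path, column list, and filter are illustrative:
# Write a small SQL*Plus script that spools only the needed subset.
script <- c(
  "SET HEADING OFF FEEDBACK OFF PAGESIZE 0 TRIMSPOOL ON",
  "SPOOL /tmp/test_subset.csv",
  "SELECT col1 || ',' || col2 FROM test_table WHERE col1 > 100;",
  "SPOOL OFF",
  "EXIT"
)
writeLines(script, "/tmp/export.sql")
# Run the export outside R's ODBC layer, then read back only the result.
system("sqlplus -S test_user/test_pwd@test @/tmp/export.sql")
db <- read.csv("/tmp/test_subset.csv", header = FALSE)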

What is a viable local database for Windows Phone 7 right now?

I was wondering what is a viable database solution for local storage on Windows Phone 7 right now. Using search I stumbled upon the two threads below, but they are over a few months old. I was wondering if there have been any new developments in databases for WP7, and I didn't find any reviews of the databases mentioned in the links below.
windows phone 7 database
Local Sql database support for Windows phone 7
My requirements are:
It should be free for commercial use
Saving/updating a record should only save the actual record and not the entire database (unlike WinPhone7 DB)
Able to query a table with ~1000 records quickly using LINQ.
Should also work in simulator
EDIT:
Just tried Sterling using a simple test app: It looks good, but I have 2 issues.
Saving 1000 records takes 30 seconds using db.Save(myPerson). Person is a simple class with 5 properties.
Then I discovered there is a db.SaveAsync<Person>(IList) method. This is fine because it doesn't block the current thread anymore.
BUT my question is: Is it safe to call db.Flush() immediately and query the IList that is still being saved (since it takes up to 30 seconds to save the records in synchronous mode)? Or do I have to wait until the BackgroundWorker has finished saving?
Querying these 1000 records with LINQ and a where clause takes up to 14 seconds the first time, while they load into memory.
Is there a way to speed this up?
Here are some benchmark results (unit tests were executed on an HTC Trophy):
-----------------------------
purging: 7,59 sec
creating 1000 records: 0,006 sec
saving 1000 records: 32,374 sec
flushing 1000 records: 0,07 sec
-----------------------------
//async
creating 1000 records: 0,04 sec
saving 1000 records: 0,004 sec
flushing 1000 records: 0 sec
-----------------------------
//get all keys
persons list count = 1000 (0,007)
-----------------------------
//get all persons with a where clause
persons list with query count = 26 (14,241)
-----------------------------
//update 1 property of 1 record + save
persons list with query count = 26 (0,003s)
db saved (0,072s)
You might want to take a look at Sterling - it should address most of your concerns and is very flexible.
http://sterling.codeplex.com/
(Full disclosure: my project)
Try Siaqodb. It is a commercial project and, unlike Sterling, it does not serialize objects and keep everything in memory for queries. Siaqodb can be queried through its LINQ provider, which can efficiently pull only field values from the database without creating any objects in memory, or load/construct only the objects that were requested.
Perst is free for non-commercial use.
You might also want to try Ninja Database Pro. It looks like it has more features than Sterling.
http://www.kellermansoftware.com/p-43-ninja-database-pro.aspx
