Performance Tuning Mystery - sql-server

I have an SSIS package with a performance problem. I have done 4 runs so far.
Run 1 - Run the whole package. It takes 58 seconds. Performance problem replicated.
Run 2 - Run the whole package with logging enabled. 66 seconds.
Task             Duration (s)   Start      End
01-18-Package    66             10:32:26   10:33:32
Task1            2              10:32:26   10:32:28
Task2            1              10:32:28   10:32:29
Task3            2              10:32:29   10:32:31
Task4            1              10:32:31   10:32:32
Task5            1              10:32:31   10:32:32
Data Flow        59             10:32:32   10:33:31
Task 7           1              10:33:31   10:33:32
The bottleneck appears to be the Data Flow.
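The durations in the log can be cross-checked from the start and end times. A minimal Python sketch, assuming the rows are copied by hand from the Run 2 log (the `rows` list and the `seconds_between` helper are illustrative names, not part of SSIS):

```python
from datetime import datetime

# Rows copied from the Run 2 log: (task, start, end).
rows = [
    ("Task1", "10:32:26", "10:32:28"),
    ("Task2", "10:32:28", "10:32:29"),
    ("Task3", "10:32:29", "10:32:31"),
    ("Task4", "10:32:31", "10:32:32"),
    ("Task5", "10:32:31", "10:32:32"),
    ("Data Flow", "10:32:32", "10:33:31"),
    ("Task 7", "10:33:31", "10:33:32"),
]

def seconds_between(start: str, end: str) -> int:
    """Elapsed whole seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

durations = {task: seconds_between(start, end) for task, start, end in rows}
package_total = seconds_between("10:32:26", "10:33:32")
slowest = max(durations, key=durations.get)
share = durations[slowest] / package_total

print(f"{slowest}: {durations[slowest]} s of {package_total} s ({share:.0%})")
```

This only confirms what the log already says: the Data Flow accounts for roughly nine tenths of the package run time.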
Run 3 - Execute the Data Flow on its own using right-click and Execute Task. It takes 8 seconds. What? Running the package with only the Data Flow task, using the play button, gives me 9.6 seconds.
Run 4 - Strip out everything from the package apart from the Data Flow and run with logging. 52 seconds.
Is the problem the Data Flow, or is this a memory issue? What should be my logical next step in this investigation? Logging is not the issue, and the Data Flow on its own is not the problem. There is a lookup in the Data Flow that may use some memory, if that is an issue.
[Find FaultID [33]] Information: The Find FaultID processed 540345
rows in the cache. The processing time was 1.623 seconds. The cache
used 19452420 bytes of memory.
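As a quick sanity check on the lookup cache figures in that log message, the numbers can be put into more readable units. This is plain arithmetic on the two values from the log; the variable names are illustrative:

```python
rows_cached = 540_345      # rows reported by the Find FaultID lookup
cache_bytes = 19_452_420   # cache size reported in the same message

bytes_per_row = cache_bytes / rows_cached
cache_mib = cache_bytes / (1024 * 1024)

print(f"{bytes_per_row:.1f} bytes per row, {cache_mib:.1f} MiB in total")
```

That works out to about 36 bytes per row and under 20 MiB overall, built in 1.623 seconds, so on these numbers alone the lookup cache looks small.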

Related

Is the duration time for Power Apps Dataflow from Azure SQL to Dataverse really slow and error messages this terrible?

I have a table in an Azure SQL Database which contains approximately 10 columns and 1.7 million rows. The data in each cell is mostly null/varchar(30).
When running a dataflow to a new table in Dataverse, I have two issues:
1) It takes around 14 hours (around 100k rows or so per hour).
2) It fails after 14 hours with this great error message (**** is just some entity names I have removed):
Dataflow name,Entity name,Start time,End time,Status,Upsert count,Error count,Status details
****** - *******,,1.5.2021 9:47:20 p.m.,2.5.2021 9:51:27 a.m.,Failed,,,There was a problem refreshing the dataflow. please try again later. (request id: 5edec6c7-3d3c-49df-b6de-e8032999d049).
****** - ,,1.5.2021 9:47:43 p.m.,2.5.2021 9:51:26 a.m.,Aborted,0,0,
Table name,Row id,Request url,Error details
*******,,,Job failed due to timeout : A task was canceled.
Should this really take 14 hours?
Is there any verbose logging I can enable to get a more helpful error message?

"Connection closed" occurs when executing a agent

"Connection closed" occurs when executing a function for data pre-processing.
The data pre-processing is as follows.
1) Import data points for about 30 topics from the database (data for 9 days at 1-minute intervals, 60 * 24 * 9 * 30 = 388,800 values).
2) Convert the data to a pandas DataFrame for pre-processing such as missing-value handling or resampling (this step takes the longest).
3) Data processing.
In the above data pre-processing, the following error occurs.
volttron.platform.vip.rmq_connection ERROR: Connection closed unexpectedly, reopening in 30 seconds.
This error is probably raised by the VOLTTRON platform as part of managing the agent.
Since step 2 takes more than 30 seconds, the error occurs and the VOLTTRON platform automatically restarts the agent.
Because of this, the agent cannot complete the data processing.
Does anyone know how to avoid this?
If this is happening during agent instantiation, I would suggest moving the pre-processing out of the init or configuration steps and into a function with the @Core.receiver("onstart") decorator. This will stop the agent instantiation and configuration steps from timing out. The listener agent's onstart method can be used as an example.

Flink 1.8, parallelism > 1, source never outputs values

I have a cluster with:
1 TaskManager
1 StandaloneJob / JobManager
Config: taskmanager.numberOfTaskSlots: 1
If I set default.parallelism: 4 on a job with the Flink PubSub source, I keep getting this error when starting my "job cluster"/taskmanager:
[analytics-job-cluster-7bd4586ccb-s5hmp job] 2019-05-01 16:22:30,888 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source -> Process -> Timestamps/Watermarks -> app_events (1/4) of job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.
However, if I point the same job at a bunch of files, it works perfectly. What does this mean?
The issue is that you need numberOfTaskSlots to be equal to your parallelism. In this case, with only 1 TaskManager and only 1 task slot, Flink cannot start the job properly because there are simply not enough slots for it. If you set numberOfTaskSlots for the TaskManager equal to the parallelism, it should work.
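The slot arithmetic behind this answer can be sketched in a few lines of Python. This is a simplified model, not Flink code: it assumes default slot sharing, where a job needs as many slots as its highest operator parallelism, and `can_schedule` is a hypothetical helper name.

```python
def can_schedule(task_managers: int, slots_per_task_manager: int, parallelism: int) -> bool:
    """Simplified check: the total number of slots must cover the job's parallelism."""
    total_slots = task_managers * slots_per_task_manager
    return total_slots >= parallelism

# The setup from the question: 1 TaskManager, 1 slot, parallelism 4.
print(can_schedule(1, 1, 4))  # False: not enough slots for all subtasks

# The same cluster after raising taskmanager.numberOfTaskSlots to 4.
print(can_schedule(1, 4, 4))  # True
```

Under this model, adding three more TaskManagers with one slot each would satisfy the check just as well as raising the slot count on the single TaskManager.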

Why does the log always say "No Data Available" when the cube is built?

In the sample case on the Kylin official website, when I build the cube, the log for the first step, Create Intermediate Flat Hive Table, always says No Data Available, and the status is always running.
The cube build has been running for more than three hours.
I checked the Hive table kylin_sales and there is data in the table.
I found that the intermediate flat Hive table kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c has been created successfully in Hive, but there is no data in it.
hive> show tables;
OK
...
kylin_intermediate_kylin_sales_cube_402e3eaa_dfb2_7e3e_04f3_07248c04c10c
kylin_sales
...
Time taken: 9.816 seconds, Fetched: 10000 row(s)
hive> select * from kylin_sales;
OK
...
8992 2012-04-17 ABIN 15687 0 13 95.5336 17 10000975 10000507 ADMIN Shanghai
8993 2013-02-02 FP-non GTC 67698 0 13 85.7528 6 10000856 10004882 MODELER Hongkong
...
Time taken: 3.759 seconds, Fetched: 10000 row(s)
The deploy environment is as follows:
 
zookeeper-3.4.14
hadoop-3.2.0
hbase-1.4.9
apache-hive-2.3.4-bin
apache-kylin-2.6.1-bin-hbase1x
openssh5.3
jdk1.8.0_144
I deployed the cluster through docker and created 3 containers, one master, two slaves.
Create Intermediate Flat Hive Table step is running.
No Data Available means this step's log has not been captured by Kylin yet. Usually the log is recorded only when the step has exited (success or failure), and only then will you see the data.
In this case, it usually indicates that the job is pending in Hive, which can have many causes. The simplest approach is to watch Kylin's log: you will see the Hive command that Kylin executes, and you can then run it manually in a console to reproduce the problem. Please check whether your Hive/Hadoop cluster has enough resources (CPU, memory) to execute such a query.

Apache2: server-status reported value for "requests/sec" is wrong. What am I doing wrong?

I am running Apache2 on Linux (Ubuntu 9.10).
I am trying to monitor the load on my server using mod_status.
There are 2 things that puzzle me (see cut-and-paste below):
1) The CPU load is reported as a ridiculously small number, whereas "uptime" reports a number between 0.05 and 0.15 at the same time.
2) The "requests/sec" is also ridiculously low (0.06) when I know there are at least 10 requests coming in per second right now.
(You can see there are close to a quarter million "accesses" - this sounds right.)
I am wondering whether this is a bug (if so, is there a fix/workaround),
or maybe a configuration error (but I can't imagine how).
Any insights would be appreciated.
-- David Jones
- - - - -
Current Time: Friday, 07-Jan-2011 13:48:09 PST
Restart Time: Thursday, 25-Nov-2010 14:50:59 PST
Parent Server Generation: 0
Server uptime: 42 days 22 hours 57 minutes 10 seconds
Total accesses: 238015 - Total Traffic: 91.5 MB
CPU Usage: u2.15 s1.54 cu0 cs0 - 9.94e-5% CPU load
.0641 requests/sec - 25 B/second - 402 B/request
11 requests currently being processed, 2 idle workers
- - - - -
After I restarted my Apache server, I realized what is going on. The "requests/sec" is calculated over the lifetime of the server. So if your Apache server has been running for 3 months, this tells you nothing at all about the current load on your server. Instead, it reports the total number of requests divided by the total number of seconds of uptime.
It would be nice if there were a way to see the current load on your server. Any ideas?
Anyway, ... answered my own question.
-- David Jones
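The figure on the status page can be reproduced from the uptime and total accesses in the output quoted in the question. A small Python check (variable names are illustrative):

```python
from datetime import timedelta

# Values taken from the server-status output in the question.
uptime = timedelta(days=42, hours=22, minutes=57, seconds=10)
total_accesses = 238_015

lifetime_rate = total_accesses / uptime.total_seconds()
print(f"{lifetime_rate:.4f} requests/sec")  # lifetime average, as reported by mod_status
```

The result rounds to 0.0641, matching the reported value, which supports the reading that this is a lifetime average and not a measure of current load.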
The Apache status value "Total Accesses" is the total access count since the server started; its per-second delta is what we actually mean by "requests per second".
Here is one way to get it:
1) Apache monitor script for zabbix
https://github.com/lorf/zapache/blob/master/zapache
2) Install and configure zabbix agentd
UserParameter=apache.status[*],/bin/bash /path/apache_status.sh $1 $2
3) Zabbix - Create apache template - Create Monitor item
Key: apache.status[{$APACHE_STATUS_URL}, TotalAccesses]
Type: Numeric(float)
Update interval: 20
Store value: Delta (speed per second) --this is the key option
Zabbix will calculate the increment of the Apache request count and store the delta value, which is the "requests per second" figure.
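The same delta calculation can be done without Zabbix. A minimal Python sketch, assuming two samples of the "Total Accesses" counter taken a known number of seconds apart (for example from the machine-readable server-status?auto page). The sample text and the `parse_total_accesses` and `current_rate` helpers are made up for illustration:

```python
def parse_total_accesses(status_text: str) -> int:
    """Extract the 'Total Accesses' counter from server-status?auto style text."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Total Accesses":
            return int(value.strip())
    raise ValueError("Total Accesses not found")

def current_rate(first: str, second: str, interval_seconds: float) -> float:
    """Requests per second between two samples taken interval_seconds apart."""
    delta = parse_total_accesses(second) - parse_total_accesses(first)
    return delta / interval_seconds

# Hypothetical samples taken 20 seconds apart (values invented for the example).
sample_a = "Total Accesses: 238015\nTotal kBytes: 93696"
sample_b = "Total Accesses: 238215\nTotal kBytes: 93780"

print(current_rate(sample_a, sample_b, 20))  # 10.0
```

The counter starts again from zero when Apache restarts, so a negative delta should be treated as a restart and the sample discarded.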
