Neo4j query and data output very slow

I have a physical server (2x 6-core Intel E5 CPUs, 128 GB DDR3 RAM, NVMe SSDs).
The memory settings in the Neo4j config file are:
dbms.memory.heap.initial_size=25g
dbms.memory.heap.max_size=25g
dbms.memory.pagecache.size=40g
dbms.memory.transaction.global_max_size=20g
I have a tree structure. For test purposes I generated a tree where each parent has 15 children, down to a depth of 4 (nodes per depth: 0 => 15, 1 => 225, 2 => 3,375, 3 => ~50k, 4 => ~760k).
If I want to output a lot of nodes, the query gets stuck.
Nodes are persons with only a few attributes:
- id (from Neo4j)
- status
- customerid
- level (depth in the tree)
The nodes are linked in two ways:
:UPLINE pointing from the deeper levels toward the root
:DOWNLINE pointing from the root toward the leaves
My query for getting the tree:
MATCH p = (r:Person {VM:1})-[:DOWNLINE *]->(x)
RETURN DISTINCT nodes(p) AS Customer
LIMIT 5000
Started streaming 5000 records after 2 ms and completed after 42 ms, displaying first 1000 rows.
The query itself streams quickly, but displaying the data takes very long even for 5,000 records: around 30 seconds to render the nodes or show them in a table.
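The original query materializes a path for every prefix of the tree and then deduplicates whole node lists, and the browser then has to render full node objects. A leaner variant (a sketch; the property names are taken from the attribute list above) returns plain values instead, so there is far less to materialize and render:
// Return only the needed properties instead of whole nodes or paths
MATCH (r:Person {VM: 1})-[:DOWNLINE*]->(x)
RETURN x.customerid AS customerid, x.status AS status, x.level AS level
LIMIT 5000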
For a test I created the same tree structure in MySQL and SQL Server databases with this schema:
ID | ManagerID | Status
1  | NULL      | active
2  | 1         | active
...
15 | 1         | active
16 | 2         | inactive
17 | 2         | active
...
If I query that with a recursive CTE, I get faster times than with Neo4j.
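Roughly, the recursive CTE looks like this (a sketch; the table name Persons is assumed, and MySQL spells the opening keyword WITH RECURSIVE):
WITH Tree AS (
    SELECT ID, ManagerID, Status
    FROM Persons
    WHERE ManagerID IS NULL        -- root row
    UNION ALL
    SELECT c.ID, c.ManagerID, c.Status
    FROM Persons AS c
    JOIN Tree AS t ON c.ManagerID = t.ID
)
SELECT * FROM Tree;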
Am I wrong in thinking that Neo4j should be faster at this task?
Thank you for your help!


How to correctly structure an IoT sensor database model?

I'm working on a shop-floor equipment data collection project that aims to analyze production orders both historically and with real-time data (an HMI close to the operator).
Current database status:
Data is extracted from different equipment (with different protocols) and loaded into a SQL Server database with the following structure:
PROCESS table: the main table. Whenever a batch (production unit) is started, a ProcessID is created along with assorted information:
ProcessID | Room  | EquipmentID | BatchID | Program | Operator | Start               | End
209486    | Room1 | 1010        | 985985  | RecipeA | Jim      | 2022.04.05 13:58:02 | 2022.04.05 15:58:02
Equipment family tables: for each equipment family (mixers, ovens, etc.) a table is created in which that family's sensor values (humidity, temperature, speed, etc.) are collected every 5 seconds. Here is an example for the BatchID above, where ProcessID = MixID in the equipment family table dbo.Mixer:
MixID  | EquipmentID | Humidity | Temperature | Speed | DateTime
209486 | 1010        | 2.5      | 70          | 250   | 2022.04.05 13:58:02
209486 | 1010        | 2.6      | 73          | 215   | 2022.04.05 13:58:07
...    | ...         | ...      | ...         | ...   | ...
So, the database is structured as a main PROCESS table plus several equipment family tables that are created as the project develops (dbo.Mixer, dbo.Oven, etc.).
We have the following data flow: SQL Server (source) - RDS server - Power BI.
Problems with the current status & doubts
As the project develops, two problems arise:
MANUAL WORK: adding new tables or new columns (to existing tables) at the source forces manual changes in the RDS server and in Power BI. Every time communication with a new piece of equipment is developed, either a new table is created (for a new equipment family) or a new column is added to an existing table (for a new sensor), since the sensors are columns of the table.
REAL-TIME DATA: the current architecture makes it difficult to implement real-time dashboarding.
With these two big problems in mind, we are currently considering that the new system architecture should be:
SQL Server (source) - Data Lake - Snowflake (data warehouse) - Power BI (or any application).
However, this won't solve the manual work described in 1). For that problem we are looking to restructure the source down to just 2 tables: Process (unchanged) and a new Sensors table. This new table would be a narrow, very tall table holding billions of timestamped rows from all the different equipment sensors (over 60 pieces of equipment), structured as follows:
dbo.Sensors:
ProcessID | EquipmentID | SensorID | SensorValue | DateTime
209486    | 1010        | 1        | 2.5         | 2022.04.05 13:58:02
209486    | 1010        | 2        | 70          | 2022.04.05 13:58:02
209486    | 1010        | 3        | 250         | 2022.04.05 13:58:02
with a corresponding sensor dimension table (which could be created in the data warehouse):
SensorID | EquipmentID | SensorName  | SensorUnit
1        | 1010        | Humidity    | %
2        | 1010        | Temperature | °C
3        | 1010        | Speed       | rpm
So, would it be better to restructure the source and create this giant tall table rather than continue with the current structure? At least it would solve the problem of adding new tables or new columns.
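On the query side, the old wide view can be reconstructed with a join against the dimension table (a sketch against the assumed DDL above); a new sensor then shows up as new rows, with no schema change anywhere downstream:
SELECT s.ProcessID, s.[DateTime], d.SensorName, s.SensorValue, d.SensorUnit
FROM dbo.Sensors AS s
JOIN dbo.SensorDim AS d
  ON d.EquipmentID = s.EquipmentID
 AND d.SensorID    = s.SensorID
WHERE s.ProcessID = 209486;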
On the other hand, the size of this table will be enormous, given that more equipment and more sensors are continually being added.
Hoping someone might point us in the right direction.

How do I check that a rebuild is complete after losing a replica?

I am using OpenEBS with Jiva. I have a MySQL pod running on OpenEBS with 3 replicas. My DB is around 10 GB, with an actual volume size of ~30 GB.
After I lose a replica, a new replica spins up. Assuming that it starts replicating data immediately:
1) How do I know the rebuild is done and the volume is safe again?
2) What is the average time to complete a replica rebuild on AWS (using EBS volumes) per 10 GB of data?
You need to run this command on your openebs-apiserver, for example:
mayactl --namespace stolon --volname my-data-my-service-0-1559496748 volume stats
And you get a result like this:
Executing volume stats...
Portal Details :
---------------
IQN : iqn.2016-09.com.openebs.jiva:my-data-my-service-0-1559496748
Volume : my-data-my-service-0-1559496748
Portal : 10.43.111.28:3260
Size : 70Gi
Replica Stats :
----------------
REPLICA STATUS DATAUPDATEINDEX
-------- ------- ----------------
10.42.7.56 running 1784
10.42.9.13 running 1266322
10.42.3.13 running 1266322
Performance Stats :
--------------------
r/s w/s r(MB/s) w(MB/s) rLat(ms) wLat(ms)
---- ---- -------- -------- --------- ---------
0 22 0.000 0.188 0.000 7.625
Capacity Stats :
---------------
LOGICAL(GB) USED(GB)
------------ ---------
77.834 65.966
From this example you can see from the DATAUPDATEINDEX values that this replica has not caught up yet:
10.42.7.56 running 1784
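A rough way to wait for the rebuild is to poll those stats until the indexes converge (a sketch: it assumes every replica line contains the word "running" with DATAUPDATEINDEX in the third column, as in the output above):
while true; do
  distinct=$(mayactl --namespace stolon --volname my-data-my-service-0-1559496748 volume stats |
             awk '/running/ {print $3}' | sort -u | wc -l)
  # All replicas report the same DATAUPDATEINDEX once the rebuild has caught up
  [ "$distinct" -eq 1 ] && echo "replicas in sync" && break
  sleep 30
done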

SQL Server - Query String Greater Or Equal To

I am attempting to optimise a query in my application that is causing problems as the application scales.
The table contains two columns, FROM and TO, each containing string values. Here is an example:
Row | From | To
1 | AA | Z
2 | B | C
3 | JA | JZ
4 | JM | JZ
The query is passed a name (JOHN) and should return a list of ranges from the table that could contain the name.
select * from Ranges where [From] <= 'JOHN' and [To] >= 'JOHN'
Using the table above this would result in rows 1 and 3 being returned.
The problem I am having is one of query consistency.
All indexes are in place but if I search for JOHN the query returns in 20 milliseconds, whereas MARK returns in 250 milliseconds.
Looking at the query analyzer shows me that JOHN actually matches more rows than MARK, but I'm struggling to understand how or why MARK takes so long.
If the time difference was 20 - 40 milliseconds, I could live with that but 250 is so large a difference that the overall performance of my application is terrible.
Does anybody have any idea how I could narrow down why I get such variance in my queries, OR a better way of storing and searching string ranges (which can contain letters and numbers)?
Many thanks in advance.
EDIT - One thing I forgot to mention: the table contains approximately 15 million rows (it's actually postcodes).
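For reference, the usual starting point for this shape of query is a composite index over both range columns (a sketch; the table and column names are taken from the question):
CREATE INDEX IX_Ranges_From_To ON dbo.Ranges ([From], [To]);

-- The seek can only bound one side ([From] <= 'JOHN'); every row starting before
-- the search value must still be filtered on [To] >= 'JOHN'. The cost therefore
-- tracks how many ranges begin before the name, which is one plausible reason
-- different names give such different timings.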

How Do You Monitor The CPU Utilization of a Process Using PowerShell?

I'm trying to write a PowerShell script to monitor % CPU utilization of a SQL Server process. I'd like to record snapshots of this number every day so we can monitor it over time and watch for trends.
My research online said this WMI query should give me what I want:
Get-WmiObject -Query "SELECT PercentProcessorTime FROM win32_PerfFormattedData_PerfProc_Process WHERE Name='SqlServr'"
When I run the WMI query I usually get a value somewhere between 30-50%
However, when I watch the process in Resource Monitor it usually averages at less than 1% CPU usage
I know the WMI query is simply returning a snapshot of CPU usage rather than an average over a long period, so I know the two aren't directly comparable. Even so, I think the snapshot should usually be less than 1% since the Resource Monitor average is less than 1%.
Does anybody have any ideas on why there is such a large discrepancy? And how I can get an accurate measurement of the CPU usage for the process?
Here is everything I've learned about WMI and performance counters over the last couple of days.
WMI stands for Windows Management Instrumentation. WMI is a collection of classes registered with the WMI system and the Windows COM subsystem. These classes are known as providers and have any number of public properties that return dynamic data when queried.
Windows comes pre-installed with a large number of WMI providers that give you information about the Windows environment. For this question we are concerned with the Win32_PerfRawData* providers and the two wrappers that build off of it.
If you query any Win32_PerfRawData* provider directly you'll notice the numbers it returns are scary looking. That's because these providers give the raw data you can use to calculate whatever you want.
To make it easier to work with the Win32_PerfRawData* providers, Microsoft provides two wrappers that return nicer answers when queried: PerfMon and the Win32_PerfFormattedData* providers.
Ok, so how do we get a process's % CPU utilization? We have three options:
Get a nicely formatted number from the Win32_PerfFormattedData_PerfProc_Process provider
Get a nicely formatted number from PerfMon
Calculate the % CPU utilization for ourselves using Win32_PerfRawData_PerfProc_Process
We will see that there is a bug with option 1, so it doesn't work in all cases, even though it is the answer usually given on the internet.
If you want to get this value from Win32_PerfFormattedData_PerfProc_Process, you can use the query mentioned in the question. It gives you the sum of PercentProcessorTime across all of the process's threads. The problem is that this sum can exceed 100 when there is more than one core, but the property maxes out at 100. So, as long as the sum over the process's threads is below 100, you can get your answer by dividing the process's PercentProcessorTime by the machine's core count.
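A sketch of option 1 (the instance name sqlservr and the core-count lookup are assumptions for illustration):
$cores = (Get-WmiObject Win32_ComputerSystem).NumberOfLogicalProcessors
$proc = Get-WmiObject -Query "SELECT PercentProcessorTime FROM Win32_PerfFormattedData_PerfProc_Process WHERE Name='sqlservr'"
# Normalise by core count; only valid while the summed thread time is below the 100 cap
$proc.PercentProcessorTime / $cores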
If you want to get this value from PerfMon in PowerShell you can use Get-Counter "\Process(SqlServr)\% Processor Time". This will return a number between 0 - (CoreCount * 100).
If you want to calculate this value yourself, the PercentProcessorTime property on the Win32_PerfRawData_PerfProc_Process provider returns the raw CPU time this process has used. Take two snapshots, call them s1 and s2, and compute (s2.PercentProcessorTime - s1.PercentProcessorTime) / (s2.Timestamp_Sys100NS - s1.Timestamp_Sys100NS).
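A sketch of that calculation (multiplying by 100 for a percentage and dividing by the core count to normalise are additions to the formula above):
$query = "SELECT PercentProcessorTime, Timestamp_Sys100NS FROM Win32_PerfRawData_PerfProc_Process WHERE Name='sqlservr'"
$s1 = Get-WmiObject -Query $query
Start-Sleep -Seconds 1
$s2 = Get-WmiObject -Query $query
# Both properties count 100 ns ticks, so the ratio of deltas is the busy fraction
$busy = ($s2.PercentProcessorTime - $s1.PercentProcessorTime) / ($s2.Timestamp_Sys100NS - $s1.Timestamp_Sys100NS)
$cores = (Get-WmiObject Win32_ComputerSystem).NumberOfLogicalProcessors
[math]::Round(100 * $busy / $cores, 2)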
And that is the final word. Hope it helps you.
Your hypothesis is almost correct. A single thread (and a process always has at least one thread) can have at most 100% for PercentProcessorTime, but:
A process can have multiple threads.
A system can have multiple (logical) CPU cores.
Hence here (Intel i7 CPU with hyperthreading on) I have 8 logical cores, and the top 20 threads (filtering out totals) shows (with a little tidying up to make it readable):
PS > gwmi Win32_PerfFormattedData_PerfProc_Thread |
?{$_.Name -notmatch '_Total'} |
sort PercentProcessorTime -desc |
select -first 20 |
ft -auto Name,IDProcess,IDThread,PercentProcessorTime
Name IDProcess IDThread PercentProcessorTime
---- --------- -------- --------------------
Idle/6 0 0 100
Idle/3 0 0 100
Idle/5 0 0 100
Idle/1 0 0 100
Idle/7 0 0 96
Idle/4 0 0 96
Idle/0 0 0 86
Idle/2 0 0 68
WmiPrvSE/7#1 7420 6548 43
dwm/4 2260 6776 7
mstsc/2#1 3444 2416 3
powershell/7#2 6352 6552 0
conhost/0#2 6360 6368 0
powershell/5#2 6352 6416 0
powershell/6#2 6352 6420 0
iexplore/7#1 4560 3300 0
Foxit Reader/1 736 5304 0
Foxit Reader/2 736 6252 0
conhost/1#2 6360 1508 0
Foxit Reader/0 736 6164 0
all of which should add up to something like 800 for the last column.
But note this is all rounded to integers. Compare with the CPU column of Process Explorer (which doesn't round when View | Show Fractional CPU is selected) over a few processes. Note that, much like Win32_PerfFormattedData_PerfProc_Process, its percentage value is normalised by the core count.
A lot of processes are using a few hundreds of thousands of cycles, but not enough to round up to a single percent.
Have you tried Get-Counter?
PS PS:\> Get-Counter "\Processus(iexplor*)\% temps processeur"
Timestamp CounterSamples
--------- --------------
17/07/2012 22:39:25 \\jpbhpp2\processus(iexplore#8)\% temps processeur :
1,5568026751287
\\jpbhpp2\processus(iexplore#7)\% temps processeur :
4,6704080253861
\\jpbhpp2\processus(iexplore#6)\% temps processeur :
0
\\jpbhpp2\processus(iexplore#5)\% temps processeur :
4,6704080253861
\\jpbhpp2\processus(iexplore#4)\% temps processeur :
0
\\jpbhpp2\processus(iexplore#3)\% temps processeur :
0
\\jpbhpp2\processus(iexplore#2)\% temps processeur :
0
\\jpbhpp2\processus(iexplore#1)\% temps processeur :
1,5568026751287
\\jpbhpp2\processus(iexplore)\% temps processeur :
0
Be careful: the counter names depend on your locale. To check them, list the counter sets:
PS PS:\> Get-Counter -ListSet * | where {$_.CounterSetName -contains "processus"}

What is a viable local database for Windows Phone 7 right now?

I was wondering what a viable database solution for local storage on Windows Phone 7 is right now. Using search I stumbled upon the two threads below, but they are a few months old, and I was wondering whether there have been any new developments in databases for WP7. I also didn't find any reviews of the databases mentioned in those links.
windows phone 7 database
Local Sql database support for Windows phone 7
My requirements are:
It should be free for commercial use
Saving/updating a record should only save the actual record and not the entire database (unlike WinPhone7 DB)
Able to quickly query a table with ~1000 records using LINQ.
Should also work in the emulator
EDIT:
Just tried Sterling using a simple test app: it looks good, but I have two issues.
Creating 1000 records takes 30 seconds using db.Save(myPerson). Person is a simple class with 5 properties.
Then I discovered there is a db.SaveAsync<Person>(IList) method. This is fine because it doesn't block the current thread anymore.
BUT my question is: is it safe to call db.Flush() immediately and run a query on the IList that is still being saved (since saving the records takes up to 30 seconds in synchronous mode)? Or do I have to wait until the BackgroundWorker has finished saving?
Querying these 1000 records with LINQ and a where clause takes up to 14 seconds the first time, while they are loaded into memory.
Is there a way to speed this up?
Here are some benchmark results: (Unit tests was executed on a HTC Trophy)
-----------------------------
purging: 7,59 sec
creating 1000 records: 0,006 sec
saving 1000 records: 32,374 sec
flushing 1000 records: 0,07 sec
-----------------------------
//async
creating 1000 records: 0,04 sec
saving 1000 records: 0,004 sec
flushing 1000 records: 0 sec
-----------------------------
//get all keys
persons list count = 1000 (0,007)
-----------------------------
//get all persons with a where clause
persons list with query count = 26 (14,241)
-----------------------------
//update 1 property of 1 record + save
persons list with query count = 26 (0,003s)
db saved (0,072s)
You might want to take a look at Sterling - it should address most of your concerns and is very flexible.
http://sterling.codeplex.com/
(Full disclosure: my project)
Try Siaqodb. It is a commercial project and, unlike Sterling, it does not serialize objects and keep everything in memory for querying. Siaqodb can be queried through a LINQ provider that can efficiently pull even just field values from the database without creating any objects in memory, or load/construct only the objects that were requested.
Perst is free for non-commercial use.
You might also want to try Ninja Database Pro. It looks like it has more features than Sterling.
http://www.kellermansoftware.com/p-43-ninja-database-pro.aspx
