I'm trying out ArangoDB and having some trouble. I successfully imported ~1.3 million documents, and now I'm trying to rearrange the document data in the database, but the following query (run through the Arango shell) slows Arango to a crawl until eventually the shell gives me an error: [ArangoError 2001: Error reading from: 'tcp://127.0.0.1:8529' 'timeout during read']
FOR d IN DocumentCollection
UPDATE d WITH {'uid': d.property1.property2} IN DocumentCollection
Should this query work? Am I doing something wrong? Is there some way to speed it up?
It is (still) working.
You can use the queries module to observe the query in action.
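For example, in arangosh (a minimal sketch; in older 2.x releases the module path is org/arangodb/aql/queries instead):
var queries = require("@arangodb/aql/queries");
queries.current();   // lists currently running queries along with their runtime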
You can make arangosh wait more patiently with the --server.request-timeout option.
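For example (the value is in seconds):
arangosh --server.request-timeout 3600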
The performance problem here is that the whole collection has to be loaded into memory for this operation, since it can't be chunked internally (yet).
If you are able to split that into a series of queries using FILTER and ranges, you'd probably reach your target faster; see the sketch below.
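A sketch of what that could look like, assuming the nested attribute can be filtered by range (@low and @high are bind parameters you'd step through batch by batch):
FOR d IN DocumentCollection
  FILTER d.property1.property2 >= @low && d.property1.property2 < @high
  UPDATE d WITH { uid: d.property1.property2 } IN DocumentCollection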
I want to test query performance. Example:
select * from vw_testrole.
vw_testrole has a lot of joins. Since the data is cached, it returns in less time. I want to see the query plan, and to clear the cache so that I can see the original time taken to execute. How do I do that?
Thanks,
Xi
Some extra info, since you are planning to run some "performance tests" to determine the expected execution time for a query.
The USE_CACHED_RESULT parameter disables the use of cached query results. It doesn't delete the existing caches. If you disable it, you can see the query plan (as you wanted), and your query will be executed each time without checking whether the result is already available from previous runs of the same query. But you should know that Snowflake has multiple caches.
The warehouse cache: As Simeon mentioned in the comment, Snowflake caches recently accessed remote data (on the shared storage) in the local disks of the warehouse nodes. That's not easy to clear; even suspending a warehouse may not delete it.
The metadata cache: If your query accesses very big tables and compile time is long because of accessing metadata (for calculating stats etc.), then this cache can matter a lot. When you re-run the query, it will probably read from the metadata cache and significantly reduce compile time.
The result cache: This is the one you are disabling.
And, running the following commands will not disable it:
ALTER SESSION UNSET USE_CACHED_RESULT=FALSE;
ALTER SESSION UNSET USE_CACHED_RESULT;
The first one will give the error you experienced. The second one will not give an error, but since the parameter's default value is TRUE, unsetting it actually re-enables the result cache. The correct command is:
ALTER SESSION SET USE_CACHED_RESULT=FALSE;
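To verify the setting in your current session, you can check the parameter:
SHOW PARAMETERS LIKE 'USE_CACHED_RESULT' IN SESSION;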
You can stop cached results from being used by setting ALTER SESSION SET USE_CACHED_RESULT=FALSE;
To get the plan of the last query ID, you can use the statement below:
select system$explain_plan_json(last_query_id()) as explain_plan;
I have a Power BI report pulling from SQL Server that needs to be set up for incremental refresh due to the large data pull. As the load is fairly complex (and the Power Query editor is tedious and often breaks folding), I need to use a SQL query (aka a "native query" in PBI speak) while retaining query folding (so that incremental refresh works).
I've been using the nice...
Value.NativeQuery( Source, query, null, [EnableFolding = true])
... trick found here to get that working.
BUT it only seems to work if the native query finishes fairly quickly. When my WHERE clause only pulls data for this year, there's no problem. When I remove the date filter in the WHERE clause (so as not to conflict with the incremental refresh filter), or simply push the year farther back, it takes longer, seemingly causing PBI to determine that:
"We cannot fold on top of this native query. Please modify the native query or remove the 'EnableFolding' option."
The error above comes up after a few minutes, both in the Power Query editor and if I try to "bypass" it by quickly clicking Close & Apply. And unfortunately, the underlying SQL is probably about as good as it gets due to our not-so-great data structures. I've tried tricking PBI's apparent time-out via an OPTION (FAST 1) hint in the script, but it just can't pull anything quickly enough.
I'm now stuck. This seems like a silly barrier, as all I need to do is get that first import to complete; it obviously can fold the query for the shorter loads. How do I work past this?
In retrospect, it's silly that I didn't try this initially: even though the Value.NativeQuery() M step doesn't accept a command time-out option, you can still set one on a preceding Sql.Database() step and it carries forward. I also removed some common table expressions from my query, which were also breaking query folding (not sure why, but it was an easy fix: I saved my complex logic as a view in SQL Server itself and just joined to that). It takes a while to run now, but it doesn't time out!
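A minimal sketch of that arrangement (the server and database names are placeholders, not from the original post, and query is assumed to hold your SQL text):
let
    // CommandTimeout set on the source step carries into the native query below
    Source = Sql.Database("myserver", "mydb", [CommandTimeout = #duration(0, 2, 0, 0)]),
    Result = Value.NativeQuery(Source, query, null, [EnableFolding = true])
in
    Result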
I have a SQL connection to a table on my SQLServer, which I have imported with the following line:
master_table <- RxSqlServerData(etc...)
Then, my goal is to save/import this table using rxImport and save it to a .xdf file, which I have called readTest <- 'read_test.xdf'
The table is quite large, so I have set this in my rxImport:
rxImport(master_table, outFile=readTest, rowsPerRead=100000,reportProgress=1)
However, it has been running for 10 minutes now, and NO progress of rows being read/imported is being printed to the screen. Did I do this correctly? I wanted output similar to the "progress" that is printed when an ML algorithm like RxForest is run.
Thanks.
It's possible that the connection to your SQL Server database is relatively slow; reportProgress will only show progress when a batch of rows is complete. If the rows are relatively large, you could see nothing on the terminal for quite some time.
For best performance with rxImport(), make rowsPerRead the largest size your local machine's memory can handle. This will make progress reports less frequent, but it will give you a faster import time. The only case where this isn't true is when importing an XDF file. A sketch is below.
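Building on the objects from the question (the exact rowsPerRead value here is an assumption; tune it to your available memory):
# Larger batches mean fewer, less frequent progress reports but a faster import;
# reportProgress = 2 prints timings in addition to rows processed
rxImport(master_table, outFile = readTest,
         rowsPerRead = 1000000,  # assumed value; size to fit local memory
         reportProgress = 2,
         overwrite = TRUE)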
Is there any way to stop the build of a materialized view in Cassandra (3.7)?
Background: I created two materialized views A and B (full disclosure: I may have attempted to drop them before the build was complete), and those views seem to be perpetually stuck; any attempt to create another view C on the same table seems to hang. Using nodetool:
nodetool viewbuildstatus <keyspace>.<view>
shows a combination of STARTED and UNKNOWN for A and B, and STARTED for C. Using cql:
select * from system.views_builds_in_progress
all views are listed, but generation_number and last_token have not changed in the last 24 hours (generation_number is in fact null for A).
It's not documented, but nodetool stop actually takes any compaction type, not just the ones listed (and the view build is one of them). So you can simply run:
nodetool stop VIEW_BUILD
Or you can hit JMX directly with the org.apache.cassandra.db:type=CompactionManager mbean's stopCompaction operation.
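If you go the JMX route programmatically, here is a minimal sketch using the standard javax.management client API (assuming Cassandra's default JMX port 7199 and no JMX authentication):
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
    MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
    ObjectName mbean = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
    // stopCompaction takes the compaction type name as a single String argument
    mbs.invoke(mbean, "stopCompaction",
               new Object[]{"VIEW_BUILD"}, new String[]{"java.lang.String"});
}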
All that's really going to do is set a flag for the view builder to stop on its next loop. If it threw an uncaught exception or something, so it's no longer doing anything (worth checking the system/output logs), the stop won't do anything either. In that case it's not really hurting anything though, so you can ignore it and retry. Worst case, restart the node.
I want to show the total number of files in Alfresco on an Alfresco page. How do I find that number in the first place?
So far I haven't found an API for accessing the database; if such an API exists, what should I do next?
You can make a query to SearchService, like this:
SearchParameters params = new SearchParameters();
params.getStores().add(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
params.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
params.setQuery("TYPE:cm\\:content AND PATH:\"/app\\:company_home/st\\:sites/cm\\:test/cm\\:documentLibrary//*\"");
ResultSet result = searchService.query(params);
try {
    System.out.println(result.length());
} finally {
    result.close(); // result sets hold search resources and should always be closed
}
But I'm not sure how optimised it is for performance.
An easy way: you can query all types in the database via the API: localhost:8080/alfresco/service/cmis/query?q={q}
where q is a CMIS Query Language statement for Alfresco. For example, SELECT * FROM cmis:document selects all properties for all documents.
See the CMIS Query Language documentation for more.
In my experience, it is sometimes better to get such information from the database directly.
Just for info: in my current project we have more than 50000 documents in the repo, and I need to get the exact number for monitoring.
Here are a few points in favor of DB queries in some cases:
CMIS is (much) slower (in my case it took ~1-2 seconds per query). As @lightoze suggested, you can use SearchService, but then you get the documents in a ResultSet and afterwards need to call its length method to get their number, which I think is more time-consuming than a SQL call. And in my case I make such calls every 5 minutes.
There is a bug in 5.0.c which limits the result of some queries to 1000 docs.
Here you can find how to connect to the database, and here are some interesting queries, including one for the total number of documents in the repo; a sketch of that kind of query is below.
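As an illustration only (this is my assumption based on Alfresco's standard alf_node/alf_qname schema; verify it against your version before relying on it):
SELECT COUNT(*)
FROM alf_node n
JOIN alf_qname q ON n.type_qname_id = q.id
WHERE q.local_name = 'content';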