How to change MongoDB v3.4 maxWriteBatchSize limit

I am currently using MongoDB v3.4 in a production environment and have to bulk-insert data from another MongoDB of the same version. I am currently achieving this with a tool that tails the oplog of the primary DB and replays those ops in my destination DB. However, sometimes I get an error like "Write batch sizes must be between 1 and 1000. Got 2000" and the process stops. I did some digging and found that the maxWriteBatchSize limit is 1,000 in MongoDB versions before 3.6. Can I bypass or change this limit?

That limit is hardcoded at https://github.com/mongodb/mongo/blob/r3.4.24/src/mongo/s/write_ops/batched_command_request.cpp#L43
const size_t BatchedCommandRequest::kMaxWriteBatchSize = 1000;
In order to change it, you would have to recompile.
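If recompiling is not an option, the usual workaround is to leave the server limit alone and split the writes on the client side so that no single batch exceeds 1,000 operations. A minimal sketch in Python with PyMongo (not part of the original answer; the connection URI, namespace and the insert_in_chunks helper are placeholders for whatever your replication tool uses):
from pymongo import MongoClient, InsertOne

MAX_BATCH = 1000  # maxWriteBatchSize for MongoDB servers older than 3.6

client = MongoClient("mongodb://destination-host:27017")  # placeholder URI
coll = client["mydb"]["mycoll"]                           # placeholder namespace

def insert_in_chunks(docs):
    # Send the documents in slices of at most MAX_BATCH so the server
    # never receives a write batch larger than it allows.
    for i in range(0, len(docs), MAX_BATCH):
        chunk = [InsertOne(d) for d in docs[i:i + MAX_BATCH]]
        coll.bulk_write(chunk, ordered=True)

# docs_from_oplog would come from whatever tool tails the source oplog:
# insert_in_chunks(docs_from_oplog)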

Related

Postgres: is there any row_to_json equivalent that returns values only?

In a project I'm working on, I need to stream potentially large data sets from a Postgres database to the client, for analytics purposes.
The application is built in Rails (irrelevant for this question) and after a bit of research I'm currently able to stream query results by using COPY in Postgres:
COPY (SELECT row_to_json(t) from (#{query}) t) TO STDOUT;
Sources (for those interested):
https://shift.infinite.red/fast-csv-report-generation-with-postgres-in-rails-d444d9b915ab
https://github.com/brianhempel/stream_json_demo
This works, but it yields every row as a JSON object of key-value pairs, e.g.:
["{\"id\":403457,\"email\":\"email403457#example.com\",\"first_name\":\"Firstname403457\",\"last_name\":\"Lastname403457\",\"source\":\"adwords\",\"created_at\":\"2015-08-05T22:43:07.295796\",\"updated_at\":\"2017-01-19T04:48:29.464051\"}"]
In the spirit of minimising the size (in bytes) of the response and especially since this is getting served through the web, I want to return just an array of values for every row, i.e.:
["[403457, \"email403457#example.com\", \"Firstname403457\", \"Lastname403457\", \"adwords\", \"2015-08-05T22:43:07.295796\", \"2017-01-19T04:48:29.464051\"]"]
Is there a way to achieve this within Postgres, even by nesting functions, starting from the query above?
You could create a simple SQL function that converts a row into the desired format:
CREATE FUNCTION row2json(anyelement) RETURNS json
LANGUAGE sql STABLE AS
'SELECT json_agg(z.value) FROM json_each(row_to_json($1)) z';
Then you use that to transform the output:
SELECT row2json(mytab) FROM mytab;
If performance is more important than JSON output, just cast the result to a string:
SELECT CAST(mytab AS text) FROM mytab;
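To tie this back to the COPY-based streaming from the question, the row2json function can be used inside the same COPY ... TO STDOUT call. A rough sketch of the client side in Python with psycopg2 (the question's app is Rails, so this is only an illustration of the same idea; the connection string, table name and output file are placeholders):
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
with conn, conn.cursor() as cur, open("rows.json", "w") as out:
    # Each streamed line is a JSON array of the row's values, produced by
    # the row2json() function defined above.
    cur.copy_expert("COPY (SELECT row2json(t) FROM mytab t) TO STDOUT", out)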

RODBC ERROR: 'Calloc' could not allocate memory

I am setting up a SQL Azure database. I need to write data into the database on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text (more than 4,000 characters). Initially, I imported some data from a CSV file into the SQL Azure database using Microsoft SQL Server Management Studio. I set up the text columns as ntext, because when I tried nvarchar the maximum was 4,000 and some of the values got truncated even though they were only about 1,100 characters long.
In order to append to the database, I first save the records in a temp table for which I have predefined the varTypes:
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload roughly 5,000 records. Isn't that too long? Is this the normal amount of time it should take? If not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).

How to increase the sample size used during schema discovery to 'unlimited'?

I have encountered some errors with the SDP where one of the potential fixes is to increase the sample size used during schema discovery to 'unlimited'.
For more information on these errors, see:
No matched schema for {"_id":"...","doc":{...}
The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
XXXX does not exist in the discovered schema. Document has not been imported
Question:
How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?
These are the steps you can follow to change the sample size. Beware that a larger sample size will increase the runtime of the schema discovery algorithm, and the dashboard gives no indication of progress other than the job remaining in the 'triggered' state for a while.
Verify the specific load has been stopped and the dashboard status shows it as stopped (with or without error)
Find a document https://<account>.cloudant.com/_warehouser/<source> where <source> matches the name of the Cloudant database you have issues with
Note: Check https://<account>.cloudant.com/_warehouser/_all_docs if the document id is not obvious
Substitute "sample_size": null (which scans a sample of 10,000 random documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents in your database where X is a positive integer)
Save the document and trigger a rescan in the dashboard. A new schema discovery run will execute using the defined sample size.
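For reference, here is how the same edit might look over the plain Cloudant HTTP API instead of the document editor. This is a hedged sketch in Python; the account, database name and credentials are placeholders, and the rescan still has to be triggered from the dashboard:
import requests

ACCOUNT = "account"    # placeholder Cloudant account
SOURCE = "source"      # placeholder source database name
AUTH = ("username", "password")

url = "https://{}.cloudant.com/_warehouser/{}".format(ACCOUNT, SOURCE)

# Fetch the warehouser document; it already contains its _rev, which the
# PUT below needs in order to update the existing revision.
doc = requests.get(url, auth=AUTH).json()

# -1 scans all documents; a positive integer scans that many documents;
# null is the default sample of 10,000 random documents.
doc["sample_size"] = -1

requests.put(url, json=doc, auth=AUTH).raise_for_status()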

Can I access sqlite3 using octave?

Is there a way to read and write to sqlite3 from octave?
I'm thinking something along the lines of RODBC in R or the sqlite3 package in python, but for octave.
I looked on Octave-Forge (http://octave.sourceforge.net/packages.php) but could only find the 'database' package, which only supports PostgreSQL.
Details:
OS: Ubuntu 12.04
Octave: 3.6.2
sqlite: 3.7.9
I realise this is an old question, but most answers here seem to miss the point, focusing on whether there exists a bespoke octave package providing a formal interface, rather than whether it is possible to perform sqlite3 queries from within octave at all in the first place.
Therefore I thought I'd provide a practical answer for anyone simply trying to access sqlite3 via octave; it is in fact trivial to do so, I have done so myself many times.
Simply do an appropriate system call to the sqlite3 command (obviously this implies you have an sqlite3 client installed on your system). I find the most convenient way to do so is to use the
sqlite3 database.sqlite < FileContainingQuery > OutputToFile
syntax for calling sqlite3.
Any sqlite3 commands modifying output can be passed together with the query to obtain the output in the desired format.
E.g. here's a toy example plotting a frequency chart from a table which returns appropriate scores and counts in csv format (with headers and runtime stats stripped from the output).
pkg load io % required for csv2cell (used to collect results)
% Define database and Query
Database = '/absolute/path/to/database.sqlite';
Query = strcat(
% Options to sqlite3 modifying output format:
".timer off \n", % Prevents runtime stats printed at end of query
".headers off \n", % If you just want the output without headers
".mode csv \n", % Export as csv; use csv2cell to collect results
% actual query
"SELECT Scores, Counts \n",
"FROM Data; \n" % (Don't forget the semicolon!)
);
% Create temporary files to hold query and results
QueryFile = tempname() ; QueryFId = fopen( QueryFile, 'w' );
fprintf( QueryFId, Query ); fclose( QueryFId);
ResultsFile = tempname();
% Run query
Cmd = sprintf( 'sqlite3 "%s" < "%s" > "%s"', Database, QueryFile, ResultsFile );
[Status, Output] = system( Cmd );
% Confirm query succeeded and if so collect Results
% in a cell array and clean up temp files.
if Status != 0, delete( QueryFile, ResultsFile ); error("Query Failed");
else, Results = csv2cell( ResultsFile ); delete( QueryFile, ResultsFile );
end
% Process Results
Results = cell2mat( Results );
Scores = Results(:, 1); Counts = Results(:, 2);
BarChart = bar( Scores, Counts, 0.7 ); % ... etc
Et voilà!
According to Octave-Forge, the answer is no:
"Interface to SQL databases, currently only postgresql using libpq."
But you can write your own database package using the Octave C++ API together with the SQLite C API.
As you already found out, the new version of the database package (2.0.0) only supports PostgreSQL. However, old versions of the package also supported MySQL and SQLite (the last version with them was 1.0.4).
The problem is that the old database packages do not work with recent Octave and SWIG versions (I think the last Octave release where the database package worked was 3.2.4). Aside from the lack of a maintainer (the package was abandoned for almost 4 years), its use of SWIG was becoming a problem, since it made it harder for other developers to step in. Still, some users tried to fix it, and some partial fixes were made (but never released). See bug #38098 and Octave's wiki page on the database package for reports on making it work with SQLite in Octave 3.6.2.
The new version of the package is a complete restart. It would be great if you could contribute by developing SQLite bindings.
Check out this thread, http://octave.1599824.n4.nabble.com/Octave-and-databases-td2402806.html, which asks the same question regarding MySQL.
In particular, this reply from Martin Helm points the way to using JDBC to connect to any JDBC-supported database:
"Look at the Java bindings in the Octave java package (octave-forge), it is maintained and it works. Java is very strong and easy for database handling. Use that and the JDBC driver for MySQL to connect to MySQL (or, with the appropriate JDBC driver, everything else you can imagine). That is what I do when using DB queries from Octave. Much easier and less indirect than invoking scripts and parsing the output of database queries.
As far as I remember the database package is somehow broken (at least I never was able to use it)."
I know this thread is pretty old, but for anybody else out there looking for a similar solution, this project seems to provide it.
https://github.com/markuman/go-sqlite

Script output to file when using SQL-Developer

I have a SELECT query producing a big output and I want to execute it in SQL Developer and get all the results into a file.
SQL Developer does not allow a result bigger than 5,000 lines, and I have 100,000 lines to fetch...
I know I could use SQL*Plus, but let's assume I want to do this in SQL Developer.
Instead of using Run Script (F5), use Run Statement (Ctrl+Enter). Run Statement fetches 50 records at a time and displays them as you scroll through the results...but you can save the entire output to a file by right-clicking over the results and selecting Export Data -> csv/html/etc.
I'm a newbie SQLDeveloper user, so if there is a better way please let me know.
This question is really old, but posting this so it might help someone with a similar issue.
You can store your query in a query.sql file and run it as a script. Here is a sample query.sql:
spool "C:\path\query_result.txt";
select * from my_table;
spool off;
In Oracle SQL Developer you can then run this script as shown below, and you should get the result in your query_result.txt file.
@"C:\Path\to\script.sql"
Yes, you can increase the limit by changing the setting Tools -> Preferences -> Database -> Worksheet -> 'Max rows to print in a script' (set it as high as you need).
Mike G's answer will work if you only want the output of a single statement.
However, if you want the output of a whole SQL script with several statements, SQL*Plus reports, and some other output formats, you can use the spool command the same way it is used in SQL*Plus.
