Mongo shell: is there a way to execute JavaScript code remotely instead of doing the work on the local machine?

I have a MongoDB instance running on Mongo Atlas and I have a local machine.
I want to execute a script, but I would like this script to be executed on the Mongo instance.
I have tried several things like Robo3T and the mongo shell, and the behaviour is not the one I want.
Suppose we have this script:
print(db.users.find({}).toArray().length);
My users collection has around 30k documents. I deliberately use toArray() to force the creation of a JS array, but I want this array to be created on the MongoDB instance (or close to it), not on the machine where I launched the mongo shell (or Robo3T).
Counting users is obviously not my real use case; if I just wanted the number of users, I would have used .count() and it would have been faster. I only want to illustrate that the code does not run where I want it to run.
Suppose you connect to a remote server over SSH and you have a very poor connection.
If you do something like
wget http://moviedatabase.com/rocky.mp4
which is a 1 TB movie, the download will take the same time whether your own connection is blazing fast or painfully slow: what counts is the bandwidth of the server you are connecting to.
In my example, everything depends on the connection of the machine on which the mongo shell is launched.
If it has a good connection, it will be faster than if it has a bad one.
What is the way to execute JS code "closer" to the MongoDB instance?
How is this behaviour not a problem when you administer a MongoDB instance?
Thanks in advance,
Jerome

It depends on what you are trying to do.
There is no generic context where you can run arbitrary code, but you can store a JavaScript function on the server, which can then be used in $where or mapReduce.
Note that server-side JavaScript can be disabled with the security.javascriptEnabled configuration parameter.
I would expect that Atlas disables this for its free and shared tiers.
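For illustration, storing and calling such a function in the legacy mongo shell looks roughly like this on a deployment where server-side JavaScript is available (the isSenior function and the age field are made up, and support for stored functions inside $where depends on the server version):
// Store the function on the server; it lives in the system.js collection of the current database.
db.system.js.insertOne({
  _id: "isSenior",
  value: function (doc) { return doc.age >= 65; }
});
// Reference it from $where: the predicate is evaluated on the server, next to the data.
db.users.find({ $where: "isSenior(this)" }).count();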

Related

Automate the execution of C# code that uses Entity Framework to process data?

I have code that uses Entity Framework to process data (it retrieves data from multiple tables, then performs operations on it before saving to a SQL database). The code was supposed to run when a button is clicked in an MVC web application that I created. But now the client wants the data treatment to run automatically every day at a set time (like an SSIS package). How do I go about this?
In addition to adding a job scheduler to your MVC application as #Pac0 suggests, here are a couple of other options:
Leave the code in the MVC project and create an API endpoint that you can invoke on some sort of schedule. Give the client a PowerShell script that calls the API and let them take it from there.
Or
Refactor the code into a DLL, or copy/paste it into a console application that can be run on a schedule using the Windows Task Scheduler, SQL Server Agent, or some other external scheduler.
You could use a tool/library that does this for you. I can recommend Hangfire; it works fine (there are others, but I have not tried them).
The example on their homepage is pretty self-explanatory:
RecurringJob.AddOrUpdate(
    () => Console.WriteLine("Recurring!"),
    Cron.Daily);
The above code needs to be executed once when your application has started up, and you're good to go. Just replace the lambda with a call to your method.
Adapt the time parameter to what you need, or better yet make it configurable, because we know customers like to change their minds.
Hangfire needs to create its own database; it usually stays pretty small for this kind of thing. You can also monitor whether the jobs ran well or not and check some useful stats on the Hangfire server.
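To give an idea of the wiring, here is a minimal console-app sketch; the DataTreatment.RunDaily method is a made-up stand-in for the Entity Framework logic, and the storage call assumes the Hangfire.SqlServer package with a connection string named "HangfireDb":
using System;
using Hangfire;

public static class DataTreatment
{
    // Stand-in for the Entity Framework code described in the question.
    public static void RunDaily() => Console.WriteLine("Treating data...");
}

public class Program
{
    public static void Main()
    {
        // Hangfire persists jobs in its own storage (here SQL Server, via the "HangfireDb" connection string).
        GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireDb");

        // Register (or update) the recurring job once at startup; make the schedule configurable in real code.
        RecurringJob.AddOrUpdate(
            "daily-data-treatment",
            () => DataTreatment.RunDaily(),
            Cron.Daily(2)); // every day at 02:00

        // The server polls the storage and runs due jobs for as long as it is alive.
        using (var server = new BackgroundJobServer())
        {
            Console.WriteLine("Hangfire server running, press ENTER to exit.");
            Console.ReadLine();
        }
    }
}
In an ASP.NET application the same registration would go into the startup code rather than Main.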

Do I have to submit jobs to Spark, or can I run them from a client lib?

So I'm learning about Spark and I have a question about how client libs work.
My goal is to do some sort of data analysis in Spark, telling it where the data sources are (databases, CSV files, etc.) to process, and to store the results in HDFS, S3 or some kind of database like MariaDB or MongoDB.
I thought about having a service (an API application) that "tells" Spark what I want to do. The question is: is it enough to set the master configuration to spark://remote-host:7077 at context creation, or should I send the application to Spark with some sort of spark-submit command?
This depends entirely on how your environment is set up. If all paths are linked to your account, you should be able to run one of the two commands below to open a shell and run test commands. The point of having a shell is that it lets you run commands interactively, validate them, learn how to chain commands together, and see what results come out.
Scala
spark-shell
Python
pyspark
Inside of the environment, if everything is linked to Hive tables you can check the tables by running
spark.sql("show tables").show(100,false)
The above command runs a "show tables" against the Spark Hive metastore catalogue and returns all tables you can see (which doesn't mean you can access the underlying data). The 100 means show up to 100 rows, and the false means show full strings rather than truncating them to the first N characters.
As a made-up example, if one of the tables you see is called Input_Table, you can bring it into the environment with the commands below:
val inputDF = spark.sql("select * from Input_Table")
inputDF.count
While you are learning, I would strongly advise against running commands via spark-submit, because you will need to pass the class and JAR, forcing you to edit and rebuild for each test, which makes it hard to figure out how commands will behave without a lot of downtime.
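To come back to the original question: for this kind of client-driven use it is usually enough to point the SparkSession at the remote master when you build it, and spark-submit only becomes necessary once you want to package and ship an application. A rough Scala sketch follows (the master URL and the paths are placeholders, and the driver still runs on your machine, so it needs network access to the cluster):
import org.apache.spark.sql.SparkSession

// Build a session against a remote standalone master (placeholder URL from the question).
val spark = SparkSession.builder()
  .appName("client-driven-analysis")
  .master("spark://remote-host:7077")
  .getOrCreate()

// Read a placeholder CSV source, aggregate it, and write the result back out.
val inputDF = spark.read.option("header", "true").csv("s3a://my-bucket/input.csv")
inputDF.groupBy("some_column").count().write.parquet("s3a://my-bucket/output/")

spark.stop()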

"Unit" Testing Database

I'm running Oracle 11g SE1.
Just wondering if there are any tools that would allow me to test the data integrity of a (mostly read-only) schema. Essentially, what I want to do is have some queries that run every night or so and check whether they return the expected result. For example:
SELECT COUNT(*) FROM PATIENTS WHERE DISEASE = 'Clone-Killing Nanovirus';
Expected result: 59.
How do people normally do such testing?
I've used SQLUnit and written about it here. I don't believe any new development is being done on it but it should accomplish your goal.
SQL Developer (free, as in beer) also has a unit testing framework. I have installed it and that's about it; I want to use it more, but I've been working with BI for the past few years, so there has been no external pressure to learn it.
The tests that you want to create sound pretty simple, so either of those should work well for you. The next step would be to have them run on a schedule (cron, Windows scheduler, etc.), or you can go all in with a continuous integration tool like Atlassian's Bamboo (I haven't used it).
Of course, you could skip the tools altogether and just write scripts that are called from the command line. The fancy version would write the results to a database table so you can easily skin it; the simple version would pipe the results to a text file that you review each day.
Hope this helps.
You could batch up your queries and run a simple Perl script using DBI that runs them, checks the results against an accepted tolerance, and emails you if something doesn't meet the thresholds. I have written this kind of DB-checking code before to make sure items were within thresholds. Perl is a good tool for this sort of thing: the DBI module can connect to your database, you can run some canned queries, and you can easily send yourself an email using the MIME package. http://www.perl.com/pub/1999/10/DBI.html
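A rough sketch of that idea, assuming the DBD::Oracle driver and made-up connection details (the email step with a MIME module is left as a comment):
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Made-up DSN, user and password; adjust for your instance.
my $dbh = DBI->connect("dbi:Oracle:host=dbhost;sid=ORCL", "scott", "tiger",
                       { RaiseError => 1 });

# Each check pairs a query (with bind values) with its expected result.
my @checks = (
    { label    => 'Clone-Killing Nanovirus patient count',
      sql      => 'SELECT COUNT(*) FROM PATIENTS WHERE DISEASE = ?',
      binds    => ['Clone-Killing Nanovirus'],
      expected => 59 },
);

for my $c (@checks) {
    my ($actual) = $dbh->selectrow_array($c->{sql}, undef, @{ $c->{binds} });
    if ($actual != $c->{expected}) {
        # Build and send a MIME email here instead of just printing.
        print "FAILED: $c->{label}: expected $c->{expected}, got $actual\n";
    }
}

$dbh->disconnect;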

Defer connection to datasources defined in application.conf

Say I have a database called "awesome" which is located on a live server and at the same time duplicated on a staging server for testing. My web app is based on Play 2.1.1 using Scala.
So I have these datasources defined in my application.conf file:
db.awesome-test.driver= com.mysql.jdbc.Driver
db.awesome-test.url="jdbc:mysql://127.0.1.1/awesome"
db.awesome-test.user=mr_awesome_tester
db.awesome-test.password=justtesting
db.awesome-live.driver= com.mysql.jdbc.Driver
db.awesome-live.url="jdbc:mysql://127.0.0.1/awesome"
db.awesome-live.user=mr_awesome
db.awesome-live.password=omgthisisawesome
Depending on which environment I am on, I would like to use either DB.withConnection("awesome-test") or DB.withConnection("awesome-live"). I control this via another value in my config: I put e.g. environment=awesome-live in there and then get the respective connection name via Play.configuration.
Now, the problem is that Play apparently attempts to create a DB connection to each datasource defined in the config right away. A) This fails depending on which environment I am on. E.g. on the staging machine I will get something like this (the pic is only a mock-up, of course) because the live DB is not reachable:
...although it is completely unnecessary to try to connect to that DB, because it will never be used in this environment. B) Even if the connection worked, it would of course not be feasible to create two connections (live and testing) when only one of the two is ever needed.
Is there a way to tell Play to defer/postpone creation of the DB connection until it is actually needed (e.g. when DB.getConnection("...") or DB.withConnection("...") or something is called for that datasource)?
I am thinking something like db.awesome-live.deferCreation=true.
Cheers, Alex
I'd say that you have two ways of doing this.
Everything is explained in the Play! documentation: Additional configuration
Specifying alternative configuration file
test.conf
db.awesome.driver= com.mysql.jdbc.Driver
db.awesome.url="jdbc:mysql://127.0.1.1/awesome"
db.awesome.user=mr_awesome_tester
db.awesome.password=justtesting
live.conf
db.awesome.driver= com.mysql.jdbc.Driver
db.awesome.url="jdbc:mysql://127.0.0.1/awesome"
db.awesome.user=mr_awesome
db.awesome.password=omgthisisawesome
In code you always use DB.withConnection("awesome").
Start the application with
$ start -Dconfig.resource=test.conf
or
$ start -Dconfig.resource=live.conf
Overriding specific configuration keys
In your case that means:
$ start -Ddb.awesome-live.deferCreation=true
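With either approach the application code never changes; here is a minimal sketch of the Play 2.1 call site, where "awesome" refers to whatever db.awesome.* block the loaded .conf file provides (the query itself is a placeholder):
import play.api.db.DB
import play.api.Play.current // supplies the implicit Application

// Opens a connection against the datasource named "awesome" from the active configuration.
DB.withConnection("awesome") { connection =>
  val stmt = connection.createStatement()
  val rs   = stmt.executeQuery("SELECT 1") // placeholder query
  rs.next()
}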

What would be the best way to store traceroute results?

One traceroute record may include:
Timestamp, with millisecond resolution.
A variable number of hops.
Each hop contains an IP address, a hostname and an RTT.
The overall result, e.g. successful, network unreachable, timed out.
Thanks.
I would use a database. You could use SQLite if you don't want to run a database server.
More details:
There is this nice little SQLite add-on for Firefox:
https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager
That should help you set things up. I would create a field for each of the values I want to store, perhaps one for the overall result, and a primary key "id" field.
Getting your data into the database would be the least trivial part. If you're running Linux, you could write a bash shell script that captures the output of traceroute and calls a PHP CLI script which inserts the data into the DB. Of course, you can use Python or any other language you like that supports your DB.
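For example, a possible SQLite schema along those lines, with one row per run and one row per hop (table and column names are only a suggestion):
-- One row per traceroute run.
CREATE TABLE traceroute_run (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT    NOT NULL,  -- ISO-8601 timestamp with millisecond resolution
    target     TEXT    NOT NULL,
    outcome    TEXT    NOT NULL   -- e.g. 'successful', 'network unreachable', 'timed out'
);

-- One row per hop, linked to its run.
CREATE TABLE traceroute_hop (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id   INTEGER NOT NULL REFERENCES traceroute_run(id),
    hop_no   INTEGER NOT NULL,
    ip       TEXT,
    hostname TEXT,
    rtt_ms   REAL
);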
