It might look like a dumb question. In fact it's more like a poll: how big is your Sonar database? I need this to estimate the requirements for a virtual machine to host my Sonar instance.
Also:
how big is your team?
how many additional bytes are used in the Sonar database for every new commit?
I would appreciate any help.
I'm assuming that you're referring to Sonar from Codehaus, and not SOund Navigation And Ranging.
From the installation page:
The Sonar web server requires 500 MB of RAM to run efficiently.
In terms of data space and as an indication, on Nemo, the public instance of Sonar, 2 GB of data space are used to analyze more than 6 million LOC with a history of 2 years. For every 1,000 LOC to analyze, the database stores 350 KB of data.
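Taking that 350 KB per 1,000 LOC figure as a rule of thumb, a back-of-the-envelope estimate is straightforward; the sketch below is mine, and the 1.5 million LOC input is just a made-up example, not a figure from the question:

    # Rough Sonar database size estimate based on the quoted figure of
    # ~350 KB of database space per 1,000 lines of code analyzed.
    KB_PER_1000_LOC = 350

    def estimate_db_size_mb(lines_of_code):
        """Approximate database footprint in megabytes."""
        return lines_of_code / 1000 * KB_PER_1000_LOC / 1024

    # Hypothetical 1.5 million LOC codebase -> roughly 500 MB.
    print(round(estimate_db_size_mb(1_500_000)))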
Hello Beautiful People,
We have planned to migrate our existing system to ERPNext, but we've noticed that importing large amounts of data takes a long time: importing and submitting 80,000 Delivery Notes (linked with Sales Orders and Sales Invoices) took 3 days.
I am wondering if there is a better way to import large volumes of data. We are using AWS EC2 (c5n.xlarge, 4 vCPUs, 10.5 GiB RAM); suggestions on increasing server capacity and tuning the configuration are also welcome.
Can we use the ERPNext Data Migration Tool? I have tried to connect it to another MariaDB server with no luck; I don't know how it works.
Thanks in advance for your help
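One thing that might help, sketched below on the assumption that you can run a script from the bench console (the CSV path, column names and item fields are hypothetical, not your actual schema), is to create the documents directly through the Frappe API and commit in batches instead of going through the web importer row by row:

    # Hypothetical bulk-import sketch for Delivery Notes via the Frappe API.
    # Run from `bench --site yoursite console` or via `bench execute`.
    import csv
    import frappe

    BATCH_SIZE = 500  # commit every 500 documents instead of once per row

    def import_delivery_notes(path):
        with open(path, newline="") as f:
            for i, row in enumerate(csv.DictReader(f), start=1):
                doc = frappe.get_doc({
                    "doctype": "Delivery Note",
                    "customer": row["customer"],
                    "items": [{
                        "item_code": row["item_code"],
                        "qty": float(row["qty"]),
                    }],
                })
                doc.insert()
                doc.submit()
                if i % BATCH_SIZE == 0:
                    frappe.db.commit()
        frappe.db.commit()

    import_delivery_notes("/path/to/delivery_notes.csv")

Whether this beats the Data Import tool depends on where the time actually goes (validations, linked-document updates, or the database itself), so treat it as a starting point rather than a guaranteed fix.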
From what I have seen on the web they say:
The amount of disk space you need will depend on how much code you analyze with SonarQube. As an example, SonarCloud, the public instance of SonarQube, has more than 30 million lines of code under analysis with 4 years of history. SonarCloud is currently running on an Amazon EC2 m4.large instance, using about 10 GB of drive space. It handles 800+ projects with roughly 3M open issues. SonarCloud is running on PostgreSQL 9.5 and is using about 15 GB of drive space.
However, it is not very clear. Do I just need to count the number of lines of code we have and make a rough estimate, or what would be the best way to determine the size of the DB we need?
Thank you for your help!
I highly appreciate it.
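For a very rough estimate based only on the SonarCloud figures quoted above (about 15 GB of PostgreSQL storage for roughly 30 million LOC, i.e. around 0.5 GB per million LOC), you can extrapolate like this; the 5 million LOC input below is a placeholder, not a number from the question:

    # Extrapolate database size from the quoted SonarCloud numbers:
    # ~15 GB of PostgreSQL storage for ~30 million LOC over 4 years.
    GB_PER_MILLION_LOC = 15 / 30  # ~0.5 GB per million LOC

    def rough_db_size_gb(lines_of_code):
        """Very rough SonarQube DB size estimate in gigabytes."""
        return lines_of_code / 1_000_000 * GB_PER_MILLION_LOC

    print(rough_db_size_gb(5_000_000))  # hypothetical 5M LOC -> ~2.5 GB

Keep in mind that the real footprint also depends on the length of history and the number of issues, so pad the estimate generously.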
Here is the thing.
I have a report database which is used by Pentaho for generating reports. This DB is running on the same machine as pentaho-server (v7.1).
This report DB is being filled from about 90 other databases spread across the country, and their number is increasing.
Because data-integration is also a Java application, it started to require too much computing power and the Pentaho web app became too slow. What we did was move the fetches to separate machines, where those Java apps now run and load data into the report DB on the web server.
BUT this change did not bring the expected results, even though it decreased the load average on the main machine significantly (from about 70 to about 12).
Postgres itself still drains too much computing power (and is too slow), because there are constantly around 20-30 processes on the other machine feeding the report DB with new data. There are about 90 fetch processes in total, but they never all run at once; then again, there are never fewer than 20 running at once.
I was expecting the new machine where the fetches run to have a high load average, while the web server would have a low load average when no report is being generated.
So my question is: how do I make the fetches use the computing power of the secondary machine when loading data into the primary machine?
(I was also thinking about writing my own script in Python that would do fewer DB operations during the fetch, like the rough sketch below, but that wouldn't solve my problem, just buy me more time.)
I was looking at Citus, but I am not sure if it is exactly what I need, and if it makes sense being used on just 2 machines.
So basically my question is: is there any way to use the computing power of my PC when inserting data into a remote DB?
The more native to Postgres the solution is, the better; ideally without the need for any 3rd-party software.
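For reference, here is a minimal sketch of the kind of Python loader hinted at in the question, assuming psycopg2 and a target table named report_data (the table, columns, connection string and sample rows are all placeholders). The fetching and formatting happen on the secondary machine, and the primary Postgres server only ingests one bulk COPY stream instead of many individual statements:

    # Build the rows on the secondary machine, then stream them to the
    # primary Postgres server with a single COPY, so the remote server
    # does bulk ingestion instead of processing many per-row statements.
    import io
    import psycopg2

    def load_rows(rows, dsn="host=primary dbname=reports user=etl"):
        buf = io.StringIO()
        for r in rows:
            buf.write("\t".join(str(v) for v in r) + "\n")
        buf.seek(0)

        conn = psycopg2.connect(dsn)
        try:
            with conn, conn.cursor() as cur:
                cur.copy_expert(
                    "COPY report_data (branch_id, metric, value) FROM STDIN",
                    buf,
                )
        finally:
            conn.close()

    load_rows([(1, "sales", 1250), (2, "sales", 980)])

COPY is native to Postgres, needs no 3rd-party software on the server side, and moves the per-row formatting work onto the client machine, which seems to be what you are after.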
CouchDB is great; I like its P2P replication functionality, but it's a bit large (because we have to install Erlang) and slow when used in a desktop application.
As I tested on an Intel Core Duo CPU:
12 seconds to load 10,000 docs
10 seconds to insert 10,000 docs, plus another 20 seconds to update the view, so 30 seconds in total
Is there any NoSQL implementation which has the same P2P replication functionality, but is very small like SQLite, and quite fast (1 second to load 10,000 docs)?
Have you tried using Hovercraft and/or the Erlang view server? I had a similar problem and found that staying within the Erlang VM (thereby avoiding excursions to SpiderMonkey) gave me the boost I needed. I did 3 things...
Boosting Queries: Porting your map/reduce functions from JavaScript to "native" Erlang usually gives a tremendous performance boost when querying CouchDB (http://wiki.apache.org/couchdb/EnableErlangViews). Also, managing views is easier because you can call external libs or your own compiled modules (just add them to your ebin dir), reducing the number of uploads you need to do during development.
Boosting Inserts: Using Hovercraft for inserts gives up to a 100x increase in performance (https://github.com/jchris/hovercraft). This was mentioned in the CouchDB book (http://guide.couchdb.org/draft/performance.html).
Pre-Run Views: The last thing you can do for desktop apps is run your views during application startup (say, while the splash screen is showing). The first time a view is run is always the slowest; subsequent runs are faster.
These helped me a lot.
Edmond -
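To illustrate the pre-run views idea in point 3, a warm-up can be as simple as querying each view once at startup; the sketch below uses Python's requests library against CouchDB's HTTP API, and the database, design document and view names are made up:

    # Warm up CouchDB views at application startup by querying each one once,
    # so the slow first index build happens behind the splash screen.
    import requests

    COUCH = "http://127.0.0.1:5984"
    VIEWS = [
        ("mydb", "app", "by_date"),      # placeholder names
        ("mydb", "app", "by_customer"),
    ]

    def warm_views():
        for db, ddoc, view in VIEWS:
            url = f"{COUCH}/{db}/_design/{ddoc}/_view/{view}"
            # limit=1 keeps the response tiny; the index still gets built.
            requests.get(url, params={"limit": 1}, timeout=600)

    warm_views()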
Unfortunately the question doesn't offer enough details about your app requirements, so it's kind of difficult to offer advice. Anyway, I'm not aware of any other storage solution offering similar or more advanced P2P replication.
A couple of questions/comments about your requirements:
what kind of desktop app requires 10,000 inserts/second?
when you say size, what exactly are you referring to?
You might want to take a look at:
Redis
RavenDB
Also check some of the other NoSQL solutions listed on http://nosql.mypopescu.com against your app requirements.
I am learning about the Apache Cassandra database [sic].
Does anyone have any good/bad experiences with deploying Cassandra to less than dedicated hardware like the offerings of Linode or Slicehost?
I think Cassandra would be a great way to scale a web service easily to meet read/write/request load... just add another Linode running a Cassandra node to the existing cluster. Yes, this implies running the public web service and a Cassandra node on the same VPS (which many may take exception to).
Pros of Linode-like deployment for Cassandra:
Private VLAN; the Cassandra nodes could communicate privately
An API to provision a new Linode (and perhaps configure it with a "StackScript" that installs Cassandra and its dependencies, etc.)
The price is right
Cons:
Each host is a VPS and is not dedicated of course
The RAM/cost ratio is not that great once you decide you want 4GB RAM (cf. dedicated at say SoftLayer)
Only 1 disk where one would prefer 2, I suppose (one for the commit log and another for the data files themselves). Probably moot since this is shared hardware anyway.
EDIT: found this which helps a bit: http://wiki.apache.org/cassandra/CassandraHardware
I see that 1 GB is the minimum, but is this a recommendation? Could I deploy with a Linode 720, for instance (say, 500 MB usable by Cassandra)? See http://www.linode.com/
How much RAM you need really depends on your workload: if you are write-mostly you can get away with less; otherwise you will want RAM for the read cache.
You do get more RAM for your money at my employer, Rackspace Cloud: http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing. (Our machines also have RAIDed disks, so people typically see better I/O performance vs EC2. Dunno about Linode.)
Since with most VPSes you pay roughly 2x for the next-size instance, i.e., about the same as adding a second small instance, I would recommend going with fewer, larger instances rather than more, smaller ones, since at small cluster sizes the network overhead is not negligible.
I do know someone using Cassandra on 256 MB VMs, but you're definitely in the minority if you go that small.