Fastest way to import large data to ERPNext

Hello Beautiful People,
We are planning to migrate our existing system to ERPNext, but we've noticed that importing a large amount of data takes a long time: importing and submitting 80,000 Delivery Notes (linked with Sales Orders and Sales Invoices) took 3 days.
I am wondering if there is a better way to import large amounts of data. We are using AWS EC2 (c5n.xlarge, 4 vCPUs, 10.5 GiB RAM); suggestions on increasing server capacity and tuning the configuration are also welcome.
Can we use the ERPNext Data Migration Tool? I have tried to connect it to another MariaDB server with no luck; I don't know how it works.
Thanks in advance for your help
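
For what it's worth, one pattern sometimes used for bulk loads is to skip the web-based Data Import tool and insert documents from a server-side script run in a bench console, committing once per batch instead of once per document. Below is only a minimal sketch under that assumption; the data source, field names, and batch size are illustrative, not taken from this thread.

    # Minimal sketch (assumption: run from a bench console on the ERPNext server).
    # `rows` is whatever iterable your legacy export produces; field names are illustrative.
    import frappe

    def import_delivery_notes(rows, batch_size=200):
        for i, row in enumerate(rows, start=1):
            doc = frappe.get_doc({
                "doctype": "Delivery Note",
                "customer": row["customer"],
                # each item dict carries item_code, qty, against_sales_order, ...
                "items": row["items"],
            })
            doc.insert()
            doc.submit()
            if i % batch_size == 0:
                frappe.db.commit()  # commit per batch rather than per document
        frappe.db.commit()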

Related

Which M.. tier do I need for my app (MongoDB)?

I’m new to MongoDB.
I have an Ionic app for a local restaurant where you have some products which you can order. The app also has a registration feature to create users. There is also an Angular web app where you can add products, look up users, etc.
Both apps are connected to MongoDB. Unfortunately, I don't have any clue which data plan is necessary for deploying these two apps.
Is it maybe better to switch to Firebase?
Can anybody help me please?
Best regards
Basti
Selecting a tier in MongoDB Atlas depends on various factors like data size, IOPS, price, etc. Since this is for a local restaurant, I would assume there will be relatively little traffic to the app; in that case you can go with M10, because that's where MongoDB Atlas really starts to provide valuable features for a database used in a production environment. For a development environment you can try an M5 cluster. Some features you get with M10 or above are:
Dedicated cluster: clusters deploy each mongod process to its own instance, whereas M0, M2, and M5 run in a shared environment, so Atlas will automatically upgrade the cluster to the latest version upon availability, which is not preferred for production apps, as functionality or a package could break with an upgrade.
Queryable backups: you can query a specific continuous backup snapshot, which is really helpful for restoring part of the data instead of the entire dataset backed up a day ago.
Network peering: as most projects nowadays deploy apps on cloud platforms, clusters M10 and above support network peering.
Metrics & Performance Advisor: this is one of the most important benefits of clusters M10 and above. Using alerts, you'll know which kinds of queries are taking too long and how many connections are open at a given time, and you can monitor CPU thresholds and get alerted. Additionally, MongoDB can suggest indexes to create for better performance of queries that fail to use an existing index.
At the end of the day, most other features remain almost the same. In my experience, you usually estimate and pre-pay a certain amount on a MongoDB Atlas account for around 3 years, and you don't get anything back if you haven't used all of it. You can also upgrade and downgrade clusters manually at any time, or have them scale up or down automatically based on incoming traffic.
Ref : cluster-tier

What is the proper way to know the size of DB needed for SonarQube?

From what I have seen on the web they say:
The amount of disk space you need will depend on how much code you analyze with SonarQube. As an example, SonarCloud, the public instance of SonarQube, has more than 30 million lines of code under analysis with 4 years of history. SonarCloud is currently running on an Amazon EC2 m4.large instance, using about 10 GB of drive space. It handles 800+ projects with roughly 3M open issues. SonarCloud runs on PostgreSQL 9.5, which uses about 15 GB of drive space.
However, it is not very clear. Do I just count the number of lines of code we have and make a rough estimate, or what would be the best way to determine the size of the DB we need?
Thank you for your help!
I highly appreciate it.
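
As a rough back-of-the-envelope sketch (my own arithmetic, assuming database size scales roughly linearly with lines of code and ignoring history depth), you can scale the SonarCloud figures quoted above to your own codebase:

    # Quoted figures: ~30 million LOC use ~15 GB of PostgreSQL storage.
    sonarcloud_loc = 30_000_000
    sonarcloud_db_gb = 15
    gb_per_million_loc = sonarcloud_db_gb / (sonarcloud_loc / 1_000_000)  # ~0.5 GB per million LOC

    my_loc = 2_000_000  # illustrative value; replace with your own codebase size
    estimated_db_gb = my_loc / 1_000_000 * gb_per_million_loc
    print(f"Estimated database size: ~{estimated_db_gb:.1f} GB")  # ~1.0 GB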

Web scraping vs Cloud Storage with AWS

My team has run into a design conflict. We are working on a project that involves scraping a year of historical data from Yahoo for all stocks to run some ML analysis on it. The process is unbearably slow, and we're not sure if it's the network or the web scraper. I proposed we use AWS RDS to store the data so we can access it more quickly. However, a team member said that storing the data in the cloud would not solve our latency issue. I rebutted that the data would be organized and stored in a way that makes access significantly faster. He came back with something else, and this went on. Is it true that a cloud DB won't offer any additional speed compared to a scraper? If so, does AWS have a service that allows us to access the stored data faster through another service, almost as if the database were on our own server?
I am not all that familiar with cloud services, but I do understand databases pretty well. So please dumb down the AWS stuff if you wish, and feel free to point me to any duplicates or links that may help me understand this better.
There are lots of good reasons to use RDS as a database, but speeding up your scraping isn't one of them; the database likely isn't your bottleneck.
I have written lots of scrapers over the years, and by far the biggest performance boost will be to have a fast network connection between the scraper machine(s) and the host you are scraping, and even then, using a multi-threaded scraper for each scraping machine will give you another HUGE speed improvement.
Most of the time spent scraping is waiting on the host to return the results to you, not parsing the page and not saving the results to a database.
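
To illustrate the multi-threading point, here is a minimal sketch of a thread-pooled fetcher; the URLs, symbols, and worker count are placeholder assumptions, not part of the original answer.

    # Overlap the time spent waiting on remote hosts by fetching pages concurrently.
    import concurrent.futures
    import requests

    def fetch(url):
        # Almost all of the elapsed time is spent inside this call, waiting on the host.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return url, response.text

    # Placeholder URLs; substitute the pages you actually scrape.
    urls = [f"https://example.com/quote/{symbol}" for symbol in ("AAPL", "MSFT", "GOOG")]

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
        for url, html in pool.map(fetch, urls):
            print(url, len(html))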
A MySQL DB on AWS RDS would be the same as the one that you'd install yourself on some machine. So, it isn't going to be different or slower just because it is in the cloud.
If you scrape some data and process it only once, then there is no point in introducing a DB in between. But if your scraper is slow and you process the scraped data multiple times, then storing it in a DB should improve latencies. That is because the latency of a DB read will be much lower than that of scraping (assuming you design your DB schema properly, your hosts are in the same availability zone, or at least region, as your DB, etc.).
For example, if scraping a webpage takes ~10 s and you process the scraped data twice, it'd take you ~20 s if you don't have a DB. If you have a DB with read latencies of ~500 ms, you'd only take ~11 s.
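
A small sketch of that "scrape once, read many times" pattern, using sqlite3 purely as a local stand-in for an RDS instance; the table layout and values are illustrative assumptions.

    # Store scraped rows once, then let repeated processing read from the DB
    # in milliseconds instead of re-scraping the page in seconds.
    import sqlite3

    conn = sqlite3.connect("prices.db")
    conn.execute("CREATE TABLE IF NOT EXISTS quotes (symbol TEXT, day TEXT, close REAL)")

    def store(symbol, day, close):
        conn.execute("INSERT INTO quotes VALUES (?, ?, ?)", (symbol, day, close))
        conn.commit()

    def load(symbol):
        return conn.execute(
            "SELECT day, close FROM quotes WHERE symbol = ?", (symbol,)
        ).fetchall()

    store("AAPL", "2020-01-02", 296.24)  # illustrative placeholder values
    print(load("AAPL"))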

What Amazon EC2 instance would be suggested for an ecommerce website with Windows and SQL Server?

We are running an eCommerce venture that has around 2000 unique visitors in a day. The total data is around 6 GB as of now.
We are using SQL Server as our database and in the coming months the website may scale up to 10000 users per day.
From this link I deciphered that it would be best to use an M1 instance, but could anyone help? I'm really clueless as to what to purchase from these options.
Note: Our budget is around 170 dollars per month.
EDIT: The number of concurrent users we have had is around 150
I'd try to fit everything in memory. If you can't due to budget, you need to make sure the disk response times are up to par for your expected load. Your application's behavior can vary widely: one visit to a homepage could generate many queries, or maybe you have application caching set up, so it's hard for anyone to just tell you. You should also get solid numbers on your peak number of concurrent users so you can plan for that. You don't mention your current environment, but you can get some numbers on CPU, disk MB reads/writes per second, and memory used to help you pick the right size.
I'd look at the m1.xlarge. That gives you 15 GB of memory to play with. You'll be able to cache all the data you need, have some left over for the OS, and also have some room to grow. CPU probably won't be your issue, but be sure to check your current usage.
If you have some time to spend on it, I'd try setting up JMeter to do some load testing and see how many concurrent users you can max out with one of the cheaper options.
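If JMeter feels like too much setup at first, a crude concurrency probe can be sketched in a few lines of Python; this only illustrates the idea and is no substitute for a proper JMeter test plan, and the URL and user count here are assumptions.

    # Fire N concurrent requests at the site and report response times.
    import concurrent.futures
    import time
    import requests

    URL = "https://shop.example.com/"  # placeholder for your storefront URL
    CONCURRENT_USERS = 150             # matches the peak figure mentioned above

    def timed_get(_):
        start = time.perf_counter()
        requests.get(URL, timeout=30)
        return time.perf_counter() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        durations = list(pool.map(timed_get, range(CONCURRENT_USERS)))

    print(f"avg {sum(durations) / len(durations):.2f}s, worst {max(durations):.2f}s")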
This topic may be better suited to ServerFault.
I'd suggest you look at reserved instances in the heavy utilization category, where you can get interesting discounts if you plan to run it for a year or so.
But with that budget you should be thinking about an m1.medium instance, which might be a little tight for your requirements.

How big is a sonar database?

It might look like a dumb question. In fact it's more like a poll: how big is your Sonar database? I need this to estimate the requirements for a virtual machine to host my Sonar instance.
Also:
how big is your team?
how many additional bytes are used in the Sonar database for every new commit?
I will appreciate any help.
I'm assuming that you're referring to Sonar from Codehaus, and not SOund Navigation And Ranging.
From the installation page:
The Sonar web server requires 500 MB of RAM to run efficiently.
In terms of data space, and as an indication, on Nemo, the public instance of Sonar, 2 GB of data space is used to analyze more than 6 million LOC with a history of 2 years. For every 1,000 LOC analyzed, the database stores 350 KB of data.
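
As a quick sanity check of those figures (my own arithmetic, not from the original answer), 350 KB per 1,000 LOC works out to roughly the 2 GB quoted for Nemo:

    # 350 KB per 1,000 LOC over 6 million LOC.
    loc = 6_000_000
    kb_per_1000_loc = 350
    estimated_gb = loc / 1_000 * kb_per_1000_loc / 1_000_000  # KB -> GB
    print(f"~{estimated_gb:.1f} GB")  # ~2.1 GB, in line with the ~2 GB quoted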
