Creating separate web links for QA and dev regions in Snowflake - database

I am not sure whether Snowflake makes it possible to create separate web links for the QA and dev regions.
We currently have one common link for accessing Snowflake in our company, with the QA and dev databases built inside it. I was wondering whether there is an option to create separate web links: one link for QA and one link for dev.

You can have a "secondary" account set up on a new URL that is part of the same bill, but it really is "another" account.
So the question becomes what value this adds.
With different URLs you can reuse the same SQL verbatim and do not need to alter it per "region". You can also reuse the same user accounts. And if you DDoS a shared endpoint (which used to happen with 100+ connections), you also lose access to the admin control surface needed to make the instance bigger to handle "the increase in load" (this might have changed over the years; we last had this problem in 2017).
Reusing the same account but with prod-x/dev-x/qa-x users/databases/roles means you have just one instance to administer. You do have to use some region-aware software to run or rewrite your SQL.
We did both at my old job. We started with everything in one account and just handled it, but we did DDoS the endpoint and blocked ourselves from making it bigger until we found the tool that was just starting new sessions and running hard queries. So we got a second account (ignoring the extra accounts we already had in different world regions) and planned to do all dev work from that. But when we spun it up, we created some warehouses, and back then the SQL commands didn't set a default auto-off time the way the UI did, and some features were missing in that region, so we walked away from that instance for a month or two and then got a bill with ~15K USD of server charges. Which was unpleasant. Anyway, the dev instance never really got used. (The default for warehouse creation was changed, though.)
For our system, having different accounts was really wasteful. Having data "always loading" and dashboards always loadable for test (and multi-region at that) means always keeping one extra-small running, whereas when they were on the same instance both QA and dev ran on the same instance, and given that the total data load was so tiny, one instance was more than enough.
Which is to say, more instances lead to a lot of waste. If you like waste and extra overhead, go for it. Many people come from a big-iron perspective, where each thing needs its own box to avoid the noisy-neighbor problem, but that is just not an issue here. Just use prefixes, and it's "all separate".
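As a rough illustration of the prefix approach, here is a minimal Python sketch of the kind of "region-aware" rewriting mentioned above. The `{env}` placeholder, the `render_sql` helper, and the `DEV`/`QA`/`PROD` prefixes are all made up for this example and are not anything Snowflake prescribes; the only point is that one SQL template can be rendered per environment before it is sent to the single shared account.

```python
# Minimal sketch: one SQL template, rendered per environment by prefix.
# The {env} placeholder and the prefix names are hypothetical conventions.
ENV_PREFIXES = {"dev": "DEV", "qa": "QA", "prod": "PROD"}


def render_sql(template: str, env: str) -> str:
    """Replace the {env} placeholder with the prefix for the given environment."""
    try:
        prefix = ENV_PREFIXES[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
    return template.replace("{env}", prefix)


# Hypothetical database/schema/table names, for illustration only.
TEMPLATE = "SELECT COUNT(*) FROM {env}_SALES.PUBLIC.ORDERS"

if __name__ == "__main__":
    for env in ("dev", "qa"):
        print(render_sql(TEMPLATE, env))
```

Rejecting unknown environment names up front is a deliberate choice here: a typo should fail loudly rather than run against the wrong database.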

Delphi Solution for data replication between two remote sites loosely connected

I'm using Delphi XE4 Architect (Delphi XE3 is OK as well).
I need to find a smart solution to the following problem,
and I would like to use one of these frameworks: kbmMW, RemObjects SDK / Data Abstract, or RealThinClient.
Currently I have an application using a very simple MSSQL database at site A, which users at site B access through Remote Desktop.
The application sometimes needs to show pictures and display PDFs, but it is mostly text data entry.
There is no particular reason for me to use MSSQL;
it is simply the database that I found already running and populated, and I did not build it myself.
By now it would be complicated to change it.
(The database engine is not important: the application uses no engine-specific features, stored procedures, or triggers.)
Users at site B connect to site A over a very slow network connection,
and occasionally the connection is unavailable for a few hours, up to a full day (this is the major problem).
Unfortunately, the connection cannot be improved, for various reasons.
The database is quite simple. It has many tables that hardly ever change;
about ten tables, however, are updated daily and may be subject to concurrent changes.
Mainly, the records of these tables are locked for update
by a single user, who edits some fields and then saves, releasing the lock.
I would like to get something very different to optimize performance.
Users at site A have higher priority; they are more important because site A is the headquarters.
I would like to have a copy of the site A database at site B,
so that users at site B can work locally and much faster, without using Remote Desktop to connect to site A.
The RDP protocol is not very well optimized, and in any case users cannot work when the connection is down.
Synchronizing updates and record locks between the two databases may not be a big problem.
Basically, when a user at site B acquires a record for editing in database B,
a user at site A should obviously not be able to modify the same record in the site A database.
This should of course also work in the opposite direction.
My big problem is figuring out how best to handle the situation that occurs
when the connection between B and A is unavailable for some hours (while transactions/events keep accumulating at site B).
On a collision, events at site A generally have priority over events at site B.
Users at site B must be able to continue working.
When the connection comes back, the changes should be sent to the database at site A.
Obviously this can result in conflicts, but the conflicting changes made
by site B users can be discarded, or they can be applied through a supervised selective merge,
approved record by record by a site B user.
Well, I hope the scenario is explained clearly enough.
Additional info:
The DB schema is very simple: only tables, no triggers or stored procedures. I can build one as an example, but imagine 10 tables that can be updated concurrently.
The DB is used by a desktop app of the sales department, so it contains highly confidential data.
The remote connection is typically 512 Kbit/s at most, but the main problem is that the connection is sometimes not active
and users at the remote site must be able to work anyway. This is the main focus.
The total data for daily updates is at most 10 MB, compressed, for the DB connections alone. Some other data is synchronized
over the same connection, but it is not part of this job.
I don't want to use MSSQL-specific tools or services (replication and so on), because the DB could change in the future.
Thanks
We do almost exactly this using a Delphi client app, a kbmMW-based Delphi server app, and an MSSQL database (though it used to work quite happily on a DBISAM database too).
We have some tables that only the head office site users are allowed to modify. The smaller tables are transferred in their entirety each time there is a "merge". The larger tables and the transaction type tables all have a date added and/or a date modified field and only those records that have been changed or added in the last 3 weeks or so (configurable) are transferred. This means sites can still update to the latest data even if they have been disconnected for quite some time - we used to have clients in remote places on dubious dial up lines!
We only run the merge routines once or twice a day but it would work equally well on an hourly basis or other time schedule.
At given times of day each site (including head office) "export" their changed/new records to files (eg client dataset tables or similar). These are then zipped up by the application and placed in an "outgoing" folder. The zip file is named based on the location id, date, time etc. The files are transferred by some external means eg via FTP / file share / email etc etc. Each branch office sends/transfers its data files to head office and head office transfers its data to each branch. The files are transferred by whatever means to an "incoming" folder.
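The export step described above could be sketched roughly as follows. This is an illustration in Python rather than Delphi, and it swaps in JSON where we use client dataset files; the `export_changes` name, the record layout (a dict with a `modified` datetime), and the file naming pattern are assumptions for the example, not our actual code.

```python
import json
import zipfile
from datetime import datetime, timedelta
from pathlib import Path


def export_changes(records, location_id, outgoing_dir, now, window_days=21):
    """Zip the records modified within the window into the outgoing folder.

    `records` is an iterable of dicts that each carry a `modified` datetime
    (a hypothetical layout). Returns the path of the zip file written.
    """
    cutoff = now - timedelta(days=window_days)
    changed = [r for r in records if r["modified"] >= cutoff]
    # datetimes are not JSON-serialisable, so store them as ISO strings
    payload = json.dumps(
        [{**r, "modified": r["modified"].isoformat()} for r in changed]
    )
    outgoing = Path(outgoing_dir)
    outgoing.mkdir(parents=True, exist_ok=True)
    # name the file after the location id plus date and time of the export
    zip_path = outgoing / f"{location_id}_{now:%Y%m%d_%H%M%S}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("changes.json", payload)
    return zip_path
```

How the zip then travels (FTP, file share, email) is deliberately left outside the function, since the transport is interchangeable.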
On a regular basis (eg hourly) each location does a check on the incoming folder to see if there is anything new for it to import. If so it adds all the new records, branch locations overwrite the head-office data tables with the new ones and edited records are merged in "somehow". This is the tricky bit. The easiest policy is "head office wins" so all edits are accepted unless there is a conflict in which case the head office version wins. Alternatively you could use "last edited wins" - but then you need to make sure clocks are in sync across locations. The other option is to add conflicting records to some form of "suspense" status and let an end user decide at some point in the future. We do this on one data set. Whichever conflict method you choose you need to record each decision in some form of log table and prompt an administrative level user to check occasionally.
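To make the "head office wins" policy concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each side holds a dict of records keyed by GUID, that each record carries a comparable `modified` value, and that the time of the last successful merge (`last_sync`) is known; a conflict is then a record edited on both sides since that merge. None of these names come from our real system.

```python
def merge_head_office_wins(head_office, branch, last_sync):
    """Merge branch records into head-office records; head office wins conflicts.

    Both inputs map GUID -> record dict with a `modified` value (hypothetical
    layout). Returns (merged, log), where log holds one entry per conflict so
    that an admin can review the decisions later.
    """
    merged = dict(head_office)
    log = []
    for guid, branch_rec in branch.items():
        ho_rec = head_office.get(guid)
        if ho_rec is None:
            # record was added at the branch: always accept it
            merged[guid] = branch_rec
        elif branch_rec["modified"] > last_sync:
            if ho_rec["modified"] > last_sync:
                # edited on both sides since the last merge: head office wins
                log.append(
                    {"guid": guid, "winner": "head_office", "discarded": branch_rec}
                )
            else:
                # only the branch edited it: accept the branch edit
                merged[guid] = branch_rec
    return merged, log
```

Swapping in "last edited wins" or a "suspense" queue would only change the conflict branch; the logging of each decision stays the same.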
When the head office imports data or when data is added at the head office then a field is set to indicate the data is part of the master data. When branches add data this field is empty to indicate it has yet to reach the master set. This helps when branches export their data as they can include all data that doesn't have this field set.
We have found that you can't run the merge interactively as you'll end up never getting any work done and you won't be able to run the merge at night etc. It needs to be fully automated with the ability for an admin user to make adjustments at some point after the fact.
We've been running this approach for several years now on multi-site operations, and once it settled down it has worked pretty much flawlessly. With 2 export/import schedules per day we have found the branch offices run perfectly well and are only ever missing less than a day's worth of transactions. It works well in our scenario, where we don't often have conflicts. Exported data is in the region of 5-10MB, which zips up plenty small enough.
Primary keys are vital! We use a GUID and it hasn't let us down yet.
The choice of database server and n-tier framework are, actually, irrelevant. It's the process that matters here.
"Basically when a user of the Site B acquires edit a record in the database B, obviously a user of the site A should not be able to modify the same record on the database of the site A. This should also work in the opposite direction of course."
I can't see how you're ever going to make this bit work reliably if both sites have their own copy of the database and you're allowing for dropped/non-existent inter-site connections on occasion.

Database access time in Heroku with Play Framework

I am having a problem and I need your help.
I am working with Play Framework v1.2.4 in Java, and my server is deployed on Heroku.
Everything works fine and I can access my databases, but I am having trouble when I do a couple of saves to the database.
I have a method that stores data in the database many times and then sends a notification to a mobile phone. My problem is that the notification arrives before the database has finished saving the data: when it arrives, I request the updated data from the server, and it returns the data without the last update. If I try to update again a few seconds later, the data shows correctly, so I think there is a timing problem with database access.
Ideally, the server would send the notification only once the database has finished saving the data.
I don't know whether this is caused by my using the free version of Heroku, but I want to be sure before purchasing a paid plan.
In general, requests to cloud databases are slower than the same requests on your local machine. Even a simple query that needs just 0.0001 sec on your computer can take as long as 0.5 sec in the cloud. The reason is simple: cloud providers use shared databases plus (geo) replication, which just cannot be compared to a database accessed by only one program on the same machine.
Also keep in mind that free Heroku DB plans don't offer ANY database cache, which means that every query is fetched from the cloud directly.
As we don't know your application, it's hard to say what the bottleneck is. Anyway, you almost certainly have at least 3 ways to solve your problem. They are not alternatives; you will probably need to use (or at least check) all of them.
- Take the risk of a basic paid plan and see how things change with the paid version; maybe it will be good enough for you, maybe not.
- Redesign your application to make fewer queries. For example, instead of sending 10 queries to select 10 different rows, send one query that selects all 10 records at once.
- Use Play's cache API to avoid selecting the same set of data again and again. For example, if you have some categories that rarely change, but you need the category tree for each article, you don't need to fetch the categories from the DB every time. Instead you can store a List of categories in the cache, so you need only one request to fetch the article's content (which can be cached for a short time as well...)
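The "fewer queries" point can be illustrated with a small sketch. It is written in Python with the standard `sqlite3` module standing in for the remote database, only to keep it self-contained; the `article` table and the `fetch_rows_batched` helper are hypothetical, and in a Play 1.x app the same idea would be expressed through your own model queries.

```python
import sqlite3


def fetch_rows_batched(conn, ids):
    """Fetch all requested rows in one round trip instead of one query per id."""
    ids = list(ids)
    if not ids:
        return []
    # one "?" placeholder per id keeps the query parameterised
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, title FROM article WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO article (id, title) VALUES (?, ?)",
        [(i, f"title {i}") for i in range(1, 21)],
    )
    print(fetch_rows_batched(conn, [3, 1, 2]))
```

With a per-query latency in the range mentioned above, ten separate selects pay that latency ten times, while the batched form pays it once.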

simple Solr deployment with two servers for redundancy

I'm deploying the Apache Solr web app in two redundant Tomcat 6 servers,
to provide redundancy and improved availability. At this point, scalability is not an issue.
I have a load balancer that can dynamically route traffic to one server or the other or both.
I know that Solr supports master/slave configuration, but that requires manual recovery if the slave receives updates during the master outage (which it will in my use case).
I'm considering a simpler approach using the ability to reload a core:
- only one of the two servers is receiving traffic at any time (the "active" instance), but both are running,
- both instances share the same index data and
- before re-routing traffic due to an outage, the now active instance is told to reload the index core(s)
Limited testing of failovers with both index reads and writes has been successful. What implications/issues am I missing?
Your thoughts and opinions welcomed.
The simple approach to redundancy you're considering seems reasonable, but you will not be able to use it for disaster recovery unless you can share the data/index to/from a different physical location using your NAS/SAN.
Here are some suggestions:
- Make backups for disaster recovery and test that those backups work, because an index could conceivably become corrupted: there are no checksums happening internally in Solr/Lucene. An index could get wiped, or some records could get deleted and merged away without you knowing it, and backups can be useful for recovering those records/docs at a later time if you need to perform an investigation.
- Before you re-route traffic to the second instance, I would run some queries to load the caches and also to test and confirm that the current index works before it goes online.
- Isolate the updates to one location, process, and thread to ensure transactional integrity in the event of a cutover. It could be difficult to manage consistency, as Solr does not use a vector clock to synchronize updates like some databases do. I personally would keep a copy of all updates, in order, separately from Solr in some other store, just in case a small time window needs to be repeated.
In general, my experience with Solr has been excellent as long as you are not using cutting-edge features and plugins. I have one instance that currently has 40 million docs and an uptime of well over a year with no issues. That doesn't mean you won't have issues, but it gives you an idea of how stable it can be.
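The last suggestion, keeping an ordered copy of all updates outside Solr, could be sketched like this in Python. It is only an illustration under assumptions: the log is a plain JSON-lines file, sequence numbers are assigned by the single writer, and `append_update` and `updates_since` are made-up helper names. Sending the replayed documents back to Solr is left to the caller.

```python
import json
from pathlib import Path


def append_update(log_path, seq, doc):
    """Append one update to the log; `seq` is a writer-assigned sequence number."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"seq": seq, "doc": doc}) + "\n")


def updates_since(log_path, last_applied_seq):
    """Yield logged updates with seq greater than last_applied_seq, in file order."""
    path = Path(log_path)
    if not path.exists():
        return
    with path.open(encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["seq"] > last_applied_seq:
                yield entry
```

After a cutover, replaying `updates_since(log, n)` for the last sequence number known to be in the index covers the "small time window" case without guessing which updates were lost.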
I hardly know anything about Solr, so I don't know the answers to some of the questions that need to be considered with this sort of setup, but I can provide some things for consideration. You will have to consider what sorts of failures you want to protect against and why and make your decision based on that. There is, after all, no perfect system.
Both instances are using the same files. If the files become corrupt or unavailable for some reason (hardware fault, software bug), the second instance is going to fail the same as the first.
On a similar note, are the files stored and accessed in such a way that they are always valid when the inactive instance reads them? Will the inactive instance try to read the files when the active instance is writing them? What would happen if it does? If the active instance is interrupted while writing the index files (power failure, network outage, disk full), what will happen when the inactive instance tries to load them? The same questions apply in reverse if the 'inactive' instance is going to be writing to the files (which isn't particularly unlikely if it wasn't designed with this use in mind; it might for example update some sort of idle statistic).
Also, reloading the indices sounds like it could be a rather time-consuming operation, and service will not be available while it is happening.
If the active instance needs to complete an orderly shutdown before the inactive instance loads the indices (perhaps due to file validity problems mentioned above), this could also be time-consuming and cause unavailability. If the active instance can't complete an orderly shutdown, you're gonna have a bad time.

How to create a reliable mobile service

I have developed a mobile application that makes extensive use of web services. It connects to my shared hosting server to get real-time information. Therefore, making sure the server is up is extremely important. Otherwise I am going to lose customers.
Some background. I have gone through no fewer than 3 hosting providers because they were not very reliable in terms of uptime. My current hosting is way better than those previous three; I have used it for over a year now, and they have a 99.9% uptime guarantee and all, but today I had about 3 hours of downtime. Which is why I am creating this post.
Not all of us small developers can afford expensive dedicated hosting, or have our own servers at home (which is no guarantee against downtime either). In my case, shared hosting for a very reasonable $10-15/month is OK. Except for those few hours when it might be down.
One idea I have for dealing with this is the following: have a second (different) shared hosting account with another provider, and make the app fall back to this second host when my primary host is down. It's very unlikely that both will be down at the same time. I would pay only a few dollars extra per month for this, not 10 times more per month as I would for dedicated hosting.
I am sure I am not the first person in this situation. Has anyone found a good way to deal with this problem that does not require deep pockets? We are, after all, talking only about short periods of downtime on the primary server.
Thanks in advance for your suggestions.
If you are relying on a third-party host and don't want to pay for greater reliability, then a second server is the way to go. Depending on your application and budget you will also need to consider:
- Database access and synchronization
- Hosts in different physical locations
- Multiple domain names and/or load balancing
If you opt to use multiple hosts and switch to a different (backup) host if one (the first) fails then you should aim to always have both (all) always in use. This way you won't get caught out trying/having to switch over to a "backup" server. By always using both (all) you can be sure that they are both (all) always up to date and working.
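On the client side, "always use both" could look something like the Python sketch below. It is a minimal illustration under assumptions: `call_with_failover` is a made-up helper, the host names in the usage note are placeholders, and `request` stands for whatever function your app uses to call a host, assumed here to raise `OSError` when the host is unreachable.

```python
import random


def call_with_failover(hosts, request, rng=random):
    """Try the hosts in random order and return the first successful response.

    `request` is a caller-supplied function (host -> response) that raises
    OSError on connection failure. Shuffling means every host sees regular
    traffic, so a stale or broken "backup" is noticed early.
    """
    order = list(hosts)
    rng.shuffle(order)
    last_error = None
    for host in order:
        try:
            return request(host)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next host
    raise ConnectionError("all hosts failed") from last_error
```

A call such as `call_with_failover(["api-a.example", "api-b.example"], fetch_quotes)` (both names hypothetical) would then succeed as long as at least one host is up.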
If your service is so critical that a couple of hours down time would be unacceptable to your users, then it should be easy to get the users to pay for that kind of reliability. This could fund hosting with a provider who can provide a greater level of up time or a second site. This will also help fund the time and effort to set all this up. ;)

How to gear towards scalability for a start up e-commerce portal?

I want to scale an e-commerce portal based on LAMP. Recently we've seen a huge traffic surge.
What would the steps be (please list them in order) for scaling it?
- Should I consider moving to Amazon EC2 or similar? What potential problems could switching servers cause?
- Do we need to redesign the database? I read that Facebook switched from MySQL to Cassandra. What kind of code changes are required to switch to Cassandra? Would Cassandra be a better option than MySQL?
- Is Hadoop a possibility? I'm not even sure.
- Are there any other things that need to be thought of?
I found this post helpful. This blog has nice articles as well. What I want to know is the list of steps I should consider in scaling this app.
First, I would suggest making sure every resource served by your server sets appropriate cache control headers. The goal is to make sure truly dynamic content gets served fresh every time and any stable or static content gets served from somebody else's cache as much as possible. Why deliver a product image to every AOL customer when you can deliver it to the first and let AOL deliver it to all the others?
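As a rough illustration of that split between static and dynamic content, here is a small Python sketch that picks a `Cache-Control` value per request path. The extension list, the one-day lifetime, and the `cache_control_for` helper are assumptions for the example; on a LAMP stack you would more likely express the same rule in the web server configuration or in PHP. It expects a path without a query string.

```python
import posixpath

# Hypothetical list of extensions treated as static assets.
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js"}


def cache_control_for(path, static_max_age=86400):
    """Return a Cache-Control header value for the given URL path."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        # stable content: let browsers and shared proxies keep it for a day
        return f"public, max-age={static_max_age}"
    # dynamic content: caches must revalidate with the server before reuse
    return "no-cache"
```

For instance, `cache_control_for("/img/product.png")` yields a cacheable value while `cache_control_for("/cart")` does not, which is the behavior the paragraph above asks for.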
If you currently run your webserver and dbms on the same box, you can look into moving the dbms onto a dedicated database server.
Once you have done the above, you need to start measuring the specifics. What resource will hit its capacity first?
For example, if the webserver is running at or near capacity while the database server sits mostly idle, it makes no sense to switch databases or to implement replication etc.
If the webserver sits mostly idle while the dbms chugs away constantly, it makes no sense to look into switching to a cluster of load-balanced webservers.
Take care of the simple things first.
If the dbms is the likely bottle-neck, make sure your database has the right indexes so that it gets fast access times during lookup and doesn't waste unnecessary time during updates. Make sure the dbms logs to a different physical medium from the tables themselves. Make sure the application isn't issuing any wasteful queries etc. Make sure you do not run any expensive analytical queries against your transactional database.
If the webserver is the likely bottle-neck, profile it to see where it spends most of its time and reduce the work by changing your application or implementing new caching strategies etc. Make sure you are not doing anything that will prevent you from moving from a single server to multiple servers with a load balancer.
If you have taken care of the above, you will be much better prepared for making the move to multiple webservers or database servers. You will be much better informed for deciding whether to scale your database with replication or to switch to a completely different data model etc.
1) First thing: measure how many requests per second your most-visited pages can serve. For well-written PHP sites on average hardware it should be in the 200-400 requests per second range. If you are not there, you have to optimize the code by reducing the number of database requests, caching rarely changed data in memcached/shared memory, and using a PHP accelerator. If you are at some 10-20 requests per second, you need to get rid of your bulky framework.
2) Second: if you are still on Apache2, you have to switch to lighttpd or nginx+apache2. Personally, I like the second option.
3) Then move all your static data to a separate server or CDN. Make sure it is served with "expires" headers of at least 24 hours.
4) Only after all these things might you start thinking about going to EC2/Hadoop, building multiple servers, and balancing the load (nginx would also help you there).
After steps 1-3 you should be able to serve some 10,000,000 hits per day easily.
If you need just 1.5-3 times more, I would go for a single more powerful server (8-16 cores, lots of RAM for caching & database).
With step 4 and multiple servers you are on your way to 0.1-1 billion hits per day (but with significantly larger hardware & support expenses).
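To relate the hits-per-day figures to the requests-per-second range from step 1, a quick back-of-the-envelope calculation helps. The Python sketch below does the arithmetic; the peak factor of 3 is purely an assumption for illustration, since real traffic is not spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(hits_per_day):
    """Average requests per second if hits were spread evenly over the day."""
    return hits_per_day / SECONDS_PER_DAY


def peak_rps(hits_per_day, peak_factor):
    """Estimated peak rate, given an assumed ratio of peak to average load."""
    return average_rps(hits_per_day) * peak_factor


if __name__ == "__main__":
    print(round(average_rps(10_000_000), 1))   # about 115.7 requests/second
    print(round(peak_rps(10_000_000, 3), 1))   # about 347.2 with a 3x peak
```

So 10,000,000 hits per day averages out to roughly 116 requests per second, and even with a threefold peak it stays inside the 200-400 range quoted in step 1.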
Find out where issues are happening (or are likely to happen if you don't have them now). Knowing what your biggest resource usage is matters when evaluating any solution. Stick to solutions that will give you the biggest improvement.
Consider:
- Higher-than-needed bandwidth use per user is something you want to address regardless of moving to EC2. It will cost you money either way, so it's worth taking a look at things like this: http://developer.yahoo.com/yslow/
- Don't invest in changing databases if that's a non-issue. Find out first whether that's really the problem; even if you are having issues with the database, it might be a code issue, i.e. hitting the database lots of times per request.
- Unless we are talking about very big numbers, you shouldn't have high CPU usage issues. If you do, find out where they are happening; optimization is worth it where specific code has a high impact on your overall resource usage.
- After making sure the above is reasonable, you might get big improvements with caching, both in bandwidth (making sure browsers/proxies can play their part in caching) and in local resource usage (avoiding re-processing/re-retrieving the same info all the time).
I'm not saying you should go all out with the above, just enough to make sure you won't hit the same issues elsewhere in a very few months. Also enough to find out where your biggest gains are, and whether you will get enough value from any scaling options. This will also allow you to come back and ask questions about specific problems and how these scaling options relate to them.
You should prepare by choosing a flexible framework, and be sure that things are going to change along the way. In some situations it's difficult to predict your users' behavior.
If you have seen an explosion of traffic recently, analyze which pages are the slowest.
You can move to the cloud, but EC2 is not the best-performing option. Again, be sure there's no other optimization you can do.
The database might need a redesign, but I doubt all of it does. Again, look at the problem points.
Both Hadoop and Cassandra are pretty nifty, but they might be overkill.
