I plan to add anti-virus protection to our web application that is being built. I have a concern that even the limited amount of files (PDF files, images, or even unknown binaries) that the user uploads may contain viruses.
Concerns:
The images are shared with other users (exposed to web pages) may contain viruses.
The PDF files that users share with each other may contain viruses.
The API that I build for this web application handles the file upload and this API is the file server as well.
Are there any state-of-the-art approaches to minimize the exposure of users to malware, including techniques in the API or techniques on the client-side (browser)? More specifically, I'm interested in solutions that would scan files in the API itself (backend). The files may be stored in a database or on the file-system.
I definitely searched Github for open-source tools and packages, moreover, ran several searches on Google against terms like "open source anti-virus API", "open-source malware HTTP API", but could not find any. Broader search terms resulted in a huge amount of unrelated results.
A related and outdated question investigates a similar problem, but I'm looking for a solution that would integrate well into a micro-service architecture, like Kubernetes, moreover, I think a canonical answer would be useful from an expert.
There are definitely solutions that can help you and integrate into your web application via an API. Here are a few that I am aware of:
SophosLabs Intelix
Intelix is a threat intelligence platform that provides access via APIs through AWS Marketplace. There are three parts to the service lookups, static analysis and dynamic analysis. Each one will give a more detailed analysis of the file. Combining the three will give you a good protection for your web application.
VirusTotal
VirusTotal is a community that will provide you with aggregated information showing you what various anti-malware vendors will say about your file. While VT is a great service, one thing to watch here is that VT is focused on being a community and therefore files uploaded are shared with others.
Clam AV
Not one that I have personal experience of but Clam AV allow you to spin up a server and then query it using API. There is a tutorial / documentation here.
Others
If you tweak your google search and look for Sandboxes most offer an API for a fee. A couple that come to mind Joe Sandbox, Falcon Sandbox which powers Hybrid Analysis.
As always, be careful of any cloud service that offers you scanning for free. Most of the free tools will share the reports and/or files within their community.
Related
We've narrowed our selection for an ipaas down to the above 3.
Initially we're looking to pass data from a cloud based HR system to Netsuite, and from Netsuite to Salesforce, and sometimes JIRA.
i've come from a Mulesoft background which I think would be too complex for this. On the other hand it seems that Celigo is VERY drag and drop, and there's not much room for modification/customisation.
Of the three, do you have any experience/recommendations? We aren't looking for any code heavy custom APIs, most will just be simple scheduled data transfers but there may be some complexity within the field mapping, and we want to set ourselves up for the future.
I spent a few years removing Celigo from NetSuite and Salesforce. The best way I can describe Celigo is that it is like the old school anti-virus programs which were often worse than the viruses... lol... It digs itself into the end system, making removing it a nightmare.
Boomi does the job, but is very counter-intuitive, and overly complex. You can't do everything from one screen, you can't easily bounce back and forth between tasks/operations/etc. And, sometimes it is very difficult to find where endpoints are used, as they are not always shown in their "where is this used" feature. Boomi has a ton of endpoint connectors pre-built (the most, I believe), but I have not seen an easy way to just create your own. Boomi also has much more functionality than just the integrations, if that is something that may be needed.
Jitterbit, my favorite, is ridiculously simple to use. You can access everything from one main screen, you can connect to anything (as long as it can reach out to the network, or you can reach it via the network - internal or external). Jitterbit has a lot of pre-built endpoint connectors. It is also extremely easy to just create a connection to anything you want. The win with Jitterbit is that it is super easy to use, super easy to learn, it always works, they have amazing support (if you need it). I have worked with Jitterbit the most (about 6 years), and I have never been unable to complete an integration task in less that a couple of day, max.
I have extensive experience with Dell Boomi platform but none with JitterBit or Celigo. Dell Boomi offers very versatile and well supported iPaaS solution. The technical challenges of Boomi are some UI\usability issues (#W3BGUY mentioned the main ones) and the lack of out-of-the-box support for CI/CD and DevOps processes (code management, versioning, deployments etc.)
One more important component to consider here is the pricing of the platform. Boomi does charge their clients yearly connection prices. Connection is defined as a unique combination of URL, username and password. The yearly license costs vary and can range anywhere between ($1,000 - $12,000) per license per year. The price depends greatly on your integration landscape and the discounts provided so I would advise on engaging with vendor early to understand your costs. Would be great to hear from others on pricing for JitterBit and Celigo.
Boomi is also more than just an iPaaS platform. They offer other modules of their platform to customers: API Management, Boomi Flow (workflow and automation module), Master Data Hub (master data management). Some of these modules are well developed and some are in their infancy (API Management).
From my limited experience with MuleSoft platform, I share the OP's sentiments about it being too complex for simple integrations. They do provide great CI/CD and DevOps functionality though if that is something that is needed.
There is not a simple answer to a question like this. One needs to look at multiple aspects of the platform and make a decision based on multitude of factors. I would advise looking at Gartner and Forrester reports for a general guidelines and working out the pricing (initial and recurring) with the vendor.
I have only used Jitterbit, so can only comment on that. It works fine. It is pretty intuitive and easy to use, but does have some flexibility with writing your own queries, defining and mapping file formats, and choosing different transfer protocols.
I've only used the free version (which you need to host somewhere and also is not supported) and it was good enough for production tasks. If you have the luxury of time, I'd say download it and try it out. If it works for you, throw it on a server or upgrade to the cloud version.
One note: Jitterbit uses background services. If you run it locally and then decide to migrate your account to a server, you need to stop those services on your local. Otherwise, it will try to run jobs from both locations and that doesn't turn out well.
Consider checking out Choreo as well. It has a novel simultaneous code + low-code approach for integration development. And provides rich AI support for performance monitoring, debugging, and data mapping.
Disclaimer: I'm a member of the project.
I am starting out with Google Apps Script, but as learning graph is growing exponentially few terms are getting hard to grasp.
Can anyone simply explain me about relationship between GAS and Googles Cloud Platform. How they are related? When there is a need of GCP when working with GAS?
Whats difference between creating project on Developer Console and creating on Cloud console?
Please add comment if need any rectification in question, i'll be happy to do that.
Thank you.
Google Apps Script is part of the G-Suite products, generally not related to the Google Cloud Platform products in ways other than being both Google products.
From the developer/customer prospectve occasionally there may be some marginal interactions between the products, but in a very limited manner - special situations. There might be more interactions on Google's infra side, but they're transparent for developers/customers.
So normally there is no need for GCP when working with GAS (or viceversa) unless it's for one of those very particular interaction situations, specifically documented.
For example
the Cloud Resource Manager is aware of and can deal differently with G-Suite customers. From Acquiring an Organization resource:
G Suite:
The first time a user in your domain creates a project, the Organization resource is automatically created and linked to your
company’s G Suite account. The current project and all future
projects will automatically belong to the organization.
a while ago mapping a Google App Engine application to a custom domain was only possible through G-Suite (known as Google Apps at the time).
It's not uncommon for various documentation/guides, including those G-Suite related (either Google's or from third parties) to reference GCP products, but usually that's because of the convenience of using them, not because they are actually needed.
So you really need to be aware of which family the product/documentation you're using is part of. The GCP products and documentation are a lot better of consistently identifying themselves a being part of GCP.
Finally the Developer Console is just the older, but still widely used name of the Cloud Console, they're the same thing - used for GCP. Unless you're referring to the Admin Console which is used for G-Suite.
I work at an office with some colleagues generating and consuming structured data which can be normally stored in a database. For instance:
- data from several countries: capital, population, currency, ...
- forecasts of the evolution of the population in each country: each year we generate one different time series
We store these data in dozens of Excel files (which is the last version?, where are they stored?, are they in a shared directory?), and we produce lots of document from these data (power point files, other Excel files to make calculations, ...).
I know how to install a mySQL server on Linux, and I could build a web-app to generate and store data, and I could build an API to consume the data. But I wondered if there was any other smarter solution to implement a simple centralized system to store and share information at an office.
Thank you very much.
It may be better to implement a cloud service here instead of working things from scratch - just to save the time and effort. Here are two cloud service solutions with some pros and cons of their features. If anything strikes to be useful, I would recommend looking deeper into them.
Cloud Service 1
If storing and sharing of files is the key point here, using an online storage system like box.com would be a good solution. I personally like box.com better than Dropbox since it seems to have better Admin capabilities when working in teams (I may just be baiased here).
Pros:
there are version histories for uploaded files, and files can be locked so they cannot be downloaded
there are access stats (logs) for each file, so you know when someone viewed or downloaded files
there's an area for Box users to leave comments for each file that is uploaded
Excel/Word/Powerpoint files can be previewed in the browser before
downloading them (and other files as well - personally found that
preview of vector files being very useful)
directories and files are immediately accessable via mobile after uploading
shared links can be generated for each file or directory for users without Box accounts to view and download
Cons:
Users will need a Box account to upload files (even for the shared links)
Users may be more used to the Dropbox UI and may find Box to be have a different UX than expected (althoug the UI is not hard to master at all)
Cloud Service 2
If finding an SQL alternative is the key point here, using an online database platform such as kintone would be a good solution. kintone allows you to build customizable online tables (called "Apps" in kintone) using drag and drop.
Pros:
live graphs can be generated on kintone from the stored data
database tables can be created and updated really easily with just the GUI
you can define table columns (or "Fields") to store attachment files
each row (or "Record") has an area for users to leave comments
each Record has a history feature, so you know who edited what contents and when. Nothing is saved locally on your computer, so the latest data is always online (no conflicts occur)
kintone also has internal forums (or "Spaces") that can be used as an alternative for internal emails.
Apps and Spaces are instantly accessable via mobile
kintone has open REST APIs and JavaScript customization capabilities for any further UI changes or connecting with other sources
Cons:
users need a kintone account to view data or to add data into apps, although there are 3rd party solutions (at a reasonable price) that allow you to do that
it may not be as intuitive as storing data in excel spreadsheets (but that's mainly because everyone's used to excel)
Further questions about cloud services may belong better in the Software Recommendations Community https://softwarerecs.stackexchange.com/
I'm personally good with the kintone APIs, so if you have any further questions related to API (capabilities, limits, possibilities etc), please go ahead to post them here in stackoverflow
I am asking this in very general sense. Both from cloud provider and cloud consumer's perspective. Also the question is not for any specific kind of application (in fact the intention is to know which type of applications/domains can fit into which of the cloud slab -SaaS PaaS IaaS).
My understanding so far is:
IaaS: Raw Hardware (Processors, Networks, Storage).
PaaS: OS, System Softwares, Development Framework, Virtual Machines.
SaaS: Software Applications.
It would be great if Stackoverflower's can share their understanding and experiences of cloud computing concept.
EDIT: Ok, I will put it in more specific way -
Amazon EC2: You don't have control over hardware layer. But you can take your choice of OS image, Dev Framework (.NET, J2EE, LAMP) and Application and put it on EC2 hardware. Can you deploy an applications built with Google App Engine or Azure on EC2?
Google App Engine: You don't have control over hardware and OS and you get a specific Dev Framework to build your application. Can you take any existing Java or Python application and port it to GAE? Or vice versa, can applications that were built on GAE be taken out of GAE and ported to any Application Server like Websphere or Weblogic?
Azure: You don't have control over hardware and OS and you get a specific Dev Framework to build your application. Can you take any existing .NET application and port it to Azure? Or vice versa, can applications that were built on Azure be taken out of Azure and ported to any Application Server like Biztalk?
Good question! As you point out, the different offerings fit into different categories:
EC2 is Infrastructure as a Service; you get VM instances, and do with them as you wish. Rackspace Cloud Servers are more or less the same.
Azure, App Engine, and Salesforce are all Platform as a Service; they offer different levels of integration, though: Azure pretty much lets you run arbitrary background services, while App Engine is oriented around short lived request handler tasks (though it also supports a task queue and scheduled tasks). I'm not terribly familiar with Salesforce's offering, but my understanding is that it's similar to App Engine in some respects, though more specialized for its particular niche.
Cloud offerings that fall under Software as a Service are everything from infrastructure pieces like Amazon's Simple Storage Service and SimpleDB through to complete applications like Fog Creek's hosted FogBugz and, of course, StackExchange.
A good general rule is that the higher level the offering, the less work you'll have to do, but the more specific it is. If you want a bug tracker, using FogBugz is obviously going to be the least work; building one on top of App Engine or Azure is more work, but provides for more versatility, while building one on top of raw VMs like EC2 is even more work (quite a lot more, in fact), but provides for even more versatility. My general advice is to pick the highest level platform that still meets your requirements, and build from there.
This is an excellent question. Full disclosure as I am partial to Azure but have experience with the others.
Where I think Azure stands out from the others is the quick transition from on prem to the cloud. For example -
SQL Azure - change connection string, upload DB, go!
Queues work a lot like MSMQ.
Blobs are pretty much blobs any way you shake them but they scale like crazy.
The table storage component is good because it provides incredible scalability for name/value pairs - but takes some getting used to.
Service Bus is my favorite of the services because it allows for a variety of communications paradigms. Two SB endpoints first try to connect to each other, if they cannot, then they route through the cloud - makes for very secure and scalable processing when firewalls tend to get in the way.
Access control list - paired typically with the service bus to make sure the right people access the right things - think SAML in the cloud.
I hope that helps!
My cloud experience is currently limited to Salesforce.com
For standard business operations and automation it provides a significant number of features that allow us to get apps up and running very quickly. We are particularly benefitting from the following:
Security (Administrators can control access to objects and fields)
Workflow & Approvals
Automatic UI generation
Built in reporting and dashboards
Entire system (including our custom changes) is accessible via web services
Ability to make the data in the system available through public sites (e.g. eCommerce)
Large library of third party apps to solve standard problems
The platform does NOT solve every problem.
I would not use the platform to model a nuclear power station or build the next twitter.
The major points of cloud computing is to save on costs by paying for usage and enable immediate deployment of computing resources.
The costs are not purely x amount of cents per instance per hour. The costs include maintenance, development, administration, etc. The huge benefit of cloud, in my mind is to liberate the customers from having to manage anything that is not within the realm of their core business competency. If I am an insurance business, I want my developers to concentrate on my insurance problems that help solve needs of my claims, rates, etc. I would rather avoid dealing with problems of email servers, file servers, document repositories, and administrating OS patches, service packs, etc.
Thus, in my opinion, the biggest benefits are derived from the SaaS and PaaS cloud offerings. One should go to IaaS only when PaaS or SaaS have serious restrictions to specific needs (i.e. I need to install a set of proprietary COM components and Azure does not support them).
SaaS is good for commodity type of applications that are not the core line of business for the client, but are more of a utility. These are your typical Messaging systems, Portals, Document Repositories, Email systems, CRMs, ERP's, Accounting, etc. etc. etc. Why reinvent the wheel by writing your own when you can customize a well supported third party product.
PaaS is great for core line of business software that supports companies' main business offering. Abstracts clients from having to deal with OS management and lets clients concentrate on the business system development - something that noone else can do for the client.
One can also take advantage of the benefits of PaaS (let's say, Google App Engine) and extend it, at times and if necessary, by pulling out some virtual machines from IaaS providers (e.g. Amazon) to do some number crunching then just send back the output to Google App Engine.
This way, you get the best of both worlds -- you can rapidly develop scalable apps in GAE, then you can always augment it by running any program you want from Amazon virtual machines.
This keeps changing, now Windows Azure also supports VM, so it is also an IaaS provider now.
Now how about Free Amazon EC2 for a year to do a better comparision. Check this out.
http://www.buzzingup.com/2010/10/amazon-announces-free-cloud-services-for-new-developers/
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
What alternatives are there to GAE, given that I already have a good bit of code working that I would like to keep. In other words, I'm digging python. However, my use case is more of a low number of requests, higher CPU usage type use case, and I'm worried that I may not be able to stay with App Engine forever. I have heard a lot of people talking about Amazon Web Services and other sorts of cloud providers, but I am having a hard time seeing where most of these other offerings provide the range of services (data querying, user authentication, automatic scaling) that App Engine provides. What are my options here?
AppScale
AppScale is a platform that allows users to deploy and host their own Google App Engine applications. It executes automatically over Amazon EC2 and Eucalyptus as well as Xen and KVM. It has been developed and is maintained by AppScale Systems. It supports the Python, Go, PHP, and Java Google App Engine platforms.
http://github.com/AppScale/appscale
In the mean time...
...it is amost 2015 and it seems that containers are the way to go forward. Alternatives to GAE are emerging:
Google has released Kubernetes, container scheduling software developed by them to manage GCE containers, but can be used on other clusters as well.
There are some upcoming PaaS on Docker such as
http://deis.io/
http://www.tsuru.io/
even Appscale themselves are supporting Docker
Interesting stuff to keep an eye on.
I don't think there is another alternative (with regards to code portability) to GAE right now since GAE is in a class of its own. Sure GAE is cloud computing, but I see GAE as a subset of cloud computing. Amazon's EC2 is also cloud computing (as well as Joyent Accelerators, Slicehost Slices), but obviously they are two different beasts as well. So right now you're in a situation that requires rethinking your architecture depending on your needs.
The immediate benefits of GAE is that its essentially maintenance free as it relates to infrastructure (scalable web server and database administration). GAE is more tailored to those developers that only want to focus on their applications and not the underlying system.In a way you can consider that developer friendly. Now it should also be said that these other cloud computing solutions also try to allow you to only worry about your application as much as you like by providing VM images/templates. Ultimately your needs will dictate the approach you should take.
Now with all this in mind we can also construct hybrid solutions and workarounds that might fulfill our needs as well. For example, GAE doesn't seem directly suited to this specific app needs you describe. In other words, GAE offers relatively high number of requests, low number of cpu cycles (not sure if paid version will be any different).
However, one way to tackle this challenge is by building a customized solution involving GAE as the front end and Amazon AWS (EC2, S3, and SQS) as the backend. Some will say you might as well build your entire stack on AWS, but that may involve rewriting lots of existing code as well. Furthermore, as a workaround a previous stackoverflow post describes a method of simulating background tasks in GAE. Furthermore, you can look into HTTP Map/Reduce to distribute workload as well.
As of 2016, if you're willing to lump PaaS (platform as a service) and FaaS (function as a service) in the same serverless computing category, then you have a few FaaS options.
Proprietary
AWS Lambda
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.
AWS Step Functions complements AWS Lambda.
AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly. Step Functions is a reliable way to coordinate components and step through the functions of your application. Step Functions provides a graphical console to arrange and visualize the components of your application as a series of steps. This makes it simple to build and run multi-step applications. Step Functions automatically triggers and tracks each step, and retries when there are errors, so your application executes in order and as expected. Step Functions logs the state of each step, so when things do go wrong, you can diagnose and debug problems quickly. You can change and add steps without even writing code
Google Cloud Functions
As of 2016 it is in alpha.
Google Cloud Functions is a lightweight, event-based, asynchronous compute solution that allows you to create small, single-purpose functions that respond to cloud events without the need to manage a server or a runtime environment. Events from Google Cloud Storage and Google Cloud Pub/Sub can trigger Cloud Functions asynchronously, or you can use HTTP invocation for synchronous execution.
Azure Functions
An event-based serverless compute experience to accelerate your development. It can scale based on demand and you pay only for the resources you consume.
Open
Serverless
The Serverless Framework allows you to deploy auto-scaling, pay-per-execution, event-driven functions to any cloud. We currently support Amazon Web Service's Lambda, and are expanding to support other cloud providers.
IronFunctions
IronFunctions is an open source serverless computing platform for any cloud - private, public, or hybrid.
It remains to seen how well FaaS competes with CaaS (container as a service). The former seems more lightweight. Both seem suited to microservices architectures.
I anticipate that functions (as in FaaS) are not the end of the line, and that many years forward we'll see further service abstractions, e.g. test-only development, followed by plain-language scenarios.
Alternatives:
1. AppScale
2. Heroku.
Ref: Alternative for Google AppEngine?
Amazon's Elastic Compute Cloud or EC2 is a good option. You basically run Linux VMs on their servers that you can control via a web interface (for powering up and down) and of course access via SSH or whatever you normally set up...
And as it's a linux install that you control, you can of course run python if you wish.
Microsoft Windows Azure might be worth consideration. I'm afraid I haven't used it so can't say if it's any good and you should bear in mind that it's a CTP at the moment.
Check it out here.
A bit late, but I would give Heroku a go:
Heroku is a polyglot cloud application platform. With Heroku, you
don’t need to think about servers at all. You can write apps using
modern development practices in the programming language of your
choice, back it with add-on resources such as SQL and NoSQL databases,
Memcached, and many others. You manage your app using the Heroku
command-line tool and you deploy code using the Git revision control
system, all running on the Heroku infrastructure.
https://www.heroku.com/about
You may also want to take a look at AWS Elastic Beanstalk - it has a closer equivalence to GAE functionality, in that it is designed to be PaaS, rather than an IaaS (i.e. EC2)
If you're interested in the cloud, and maybe want to create your own for production and/or testing you have to look at Eucalyptus. It's allegedly code compatible with EC2 but open source.
I'd be more interested in seeing how App Engine can be easily coupled with another server used for CPU intensive requests.
TyphoonAE is trying to do this. I haven't tested it, but while it is still in beta, it looks like it's atleast in active development.
The shift to cloud computing is happening so rapidly that you have no time to waste for testing different platforms.
I suggest you trying out Jelastic if you are interested in Java as well.
One of the greatest things about Jelastic is that you do not need to make any changes in the code of your application, except the changes for your application functionality, but not for the reason the chosen platform demands this. With reference to this you do not actually waste your time.The deployment process is just flawless, and you can deploy your .war file anywhere further.Using GAE requires you to modify the app around their system needs. In case if you happen to get working with Java and start looking for a more flexible platform, Jelastic is a compatible alternative.
You can also use Red Hat's Cape Dwarf project, to run GAE apps on top of the Wildfly appserver (previously JBoss) without modification.
You can check it out here:
http://capedwarf.org/