How to store payment card information in case of payment processor migration? - pci-dss

I am working on a subscription-based project and I am looking for a way to store card data in case I ever switch my payment processor to another one. Otherwise I would have to force all users to re-subscribe, which would almost certainly cause a massive loss of revenue (at least initially). By keeping access to the card data, I could "upload" it into the new payment processor and continue operations without an outage.
What would be a good way to approach this that would not bankrupt the project from the get-go (i.e., PCI DSS certification and infrastructure costing hundreds of thousands of dollars)?

Tokenization (specifically tokenization-as-a-service) might be the solution you are looking for. In short, you send the card data to the tokenization provider, who securely stores it and forwards it wherever it is needed. This avoids vendor lock-in with your payment processors (and lets you use multiple processors at any given time). Depending on the provider and how you've implemented it, you can effectively remove a significant portion of your application from PCI scope.
(Full disclosure, I currently work for Basis Theory, which provides tokenization services. There are other companies that offer similar platforms, such as SkyFlow, VeryGoodSecurity, and TokenEx.)
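To make the flow concrete, here is a minimal sketch in Python. The endpoint paths, header names, and field names are placeholders made up for illustration, not any particular vendor's API; the point is only that the raw card data goes straight to the tokenization provider, and your own systems only ever store an opaque token.

```python
import requests

# Hypothetical endpoints, headers, and field names for illustration only;
# check your provider's actual API. The point is that the raw PAN goes
# straight to the tokenization provider, and your side stores only a token.

TOKENIZER_URL = "https://tokenizer.example.com"
API_KEY = "sk_test_placeholder"

def tokenize_card(number: str, exp_month: int, exp_year: int) -> str:
    """Exchange raw card data for an opaque token; persist only the token."""
    resp = requests.post(
        f"{TOKENIZER_URL}/tokens",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"type": "card",
              "data": {"number": number,
                       "exp_month": exp_month,
                       "exp_year": exp_year}},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["id"]

def charge(token: str, amount_cents: int, processor_url: str) -> dict:
    """Have the provider detokenize and forward the charge to a processor.

    Since the provider holds the real card data, migrating processors means
    pointing this call at a different processor_url, not re-collecting cards.
    """
    resp = requests.post(
        f"{TOKENIZER_URL}/proxy",
        headers={"Authorization": f"Bearer {API_KEY}",
                 "X-Forward-To": processor_url},  # hypothetical proxy header
        json={"card_token": token, "amount": amount_cents},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```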

Related

How much data should one nest in a GET API call?

Rolling with a similar example: if you have Companies, companies have Divisions, and divisions have Employees, when you make a GET request for a company how do you decide what data to embed? For example, you could return a Company with nested Divisions and Employees. Or you could make three separate calls.
I've been using the GitHub API a bit and they seem to embed some data provided it's not an array. For example, when you request a repo it may have the owner embedded, but not issues and pull requests. How did they decide on this?
Also, it seems like mileage may vary here depending on your data store (SQL vs. NoSQL).
I found this example, but it's not quite the same.
when you make a GET request for a company how do you decide what data to embed?
I think you're asking something like: "when designing a resource, how do you decide what information belongs in the representation, and what information is linked?"
An answer to that is to pay attention to the fact that caching happens at the resource level -- the target-uri is the cache key. When we invalidate a resource, we invalidate all of its representations.
An implication here is that we want the caching policy of the resource to be favorable for all of the information included in the representation.
Mixing data that changes minute-by-minute with data that has an expected half life of a year makes it difficult to craft a sensible caching policy. So it might make more sense to carve the information into piles with similar life cycles, and have a separate resource for each.
Consider a website like Stack Overflow: the branding (logo, style sheets, images) doesn't change very often; the page contents (questions, answers, comments) change at a much higher cadence. The total network bandwidth consumed is probably considerably lower if you link to the big, slowly changing design elements rather than embedding them.
(Also, once you do use links, you have the ability to move the different resources independently - moving content to a CDN, routing high and low priority requests differently, and so on).
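As a toy illustration of splitting by life cycle, here is a sketch (using Flask, with made-up data) where the slowly changing company representation links to its divisions rather than embedding them, so each resource can carry its own caching policy:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Two resources with life cycles that differ by orders of magnitude get
# separate URIs, so each can carry its own caching policy.

@app.get("/companies/<int:company_id>")
def company(company_id):
    resp = jsonify({
        "id": company_id,
        "name": "Acme",
        # Link to the volatile data instead of embedding it.
        "divisions": f"/companies/{company_id}/divisions",
    })
    resp.headers["Cache-Control"] = "max-age=86400"  # changes rarely: cache a day
    return resp

@app.get("/companies/<int:company_id>/divisions")
def divisions(company_id):
    resp = jsonify([{"id": 1, "name": "R&D"}, {"id": 2, "name": "Sales"}])
    resp.headers["Cache-Control"] = "max-age=60"  # changes often: cache a minute
    return resp
```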

What is the software concept behind traffic maps?

A simple question: what is the name of the concept that, for example, Google Maps and Waze use to discover and indicate traffic jams?
Is that from user information, or something with satellite signals?
There are usually various sources involved and it all depends on which product you're looking at. I might miss a source or two, but these are probably the most important ones:
Location data from mobile phones in the background: Google has its location services in Android that show Google where you are every once in a while. If several people are moving slowly while on a road, it's very likely they are in a traffic jam.
Location data actively shared via mobile phones: Waze is very open about doing this, for example. Whenever you have Waze open, your location is sent to their servers and if several people are slowing down at a certain location, the system assumes there is a traffic jam at that point.
Location data from service providers: many mobile service providers give out aggregated information on where their users are located. This data can be used to estimate how many people are at a demonstration, for example, but also to figure out where traffic is slowing down.
TMC (Traffic Message Channel) via radio: a digital layer (RDS) within the radio signal makes it possible for a traffic management agency to send out alerts for traffic jams, accidents, dangerous weather and so on. This is usually the only source used by built-in navigation systems.
Traffic data from road-side equipment: many countries have a monitoring system for their road network that allows them to see how many vehicles are passing at a certain point and at what speed. This data is sometimes shared by the government with other companies.
Usually the best solution is to use a combination of these sources. Navigation systems usually can't rely on mobile phone data alone, as it would need widespread adoption before most traffic jams could be detected. Some sources require a lot of fine-tuning and complex algorithms to remove the noise in the data.
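As a toy illustration of the phone-based sources above, here is a sketch of the kind of heuristic involved; the thresholds are invented for the example, and real systems do far more smoothing and noise removal:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Ping:
    device_id: str
    segment_id: str   # road segment the ping was map-matched to
    speed_kmh: float

def congested_segments(pings, free_flow_kmh=80.0, ratio=0.4, min_devices=5):
    """Flag segments where enough distinct devices move well below free flow."""
    by_segment: dict[str, list[Ping]] = {}
    for p in pings:
        by_segment.setdefault(p.segment_id, []).append(p)
    jams = []
    for segment, seg_pings in by_segment.items():
        devices = {p.device_id for p in seg_pings}
        avg_speed = mean(p.speed_kmh for p in seg_pings)
        if len(devices) >= min_devices and avg_speed < free_flow_kmh * ratio:
            jams.append(segment)
    return jams
```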

Secure database and webpage against modification

My website makes extremely sensitive information (think of bank account numbers) publicly available through web pages and web services. Customers may modify this information when authenticated with a username and a password.
Any hacking intrusion that would successfully modify the entries of the database, or modify the information displayed on the webpage, would be disastrous, as account numbers might then be incorrect and money could be directed to a malicious bank account.
Do you have any general advice about an architecture that would make such a service as robust as possible? I would not be responsible in case of a weak password, so my main concern is attacks that simply bypass the authentication process and modify the database without triggering any alert on my side; the HTML code of the web page could also be modified directly to show different information...
Thank you
In this case I would make sure to harden the system itself as well as possible. This covers a very broad spectrum: security roles, transaction-based use of the database, logging, and prevention of whole classes of attacks such as SQL injection and cross-site scripting. If the system is that sensitive, also consider certificates and general IP checks (for example, a whitelist of IPs that are allowed to make requests to the system without being instantly refused). Not to mention that your host architecture has to be protected regardless of the security features implemented inside your system (key words: firewalls, user privileges, etc.). During development there should always be automated code-analysis software (like Sonar) running to detect logical errors and the like.
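To make one item above concrete: the standard defense against SQL injection is to never interpolate user input into SQL text, but to use parameterized queries instead. A minimal sketch (using Python's sqlite3 here; every mainstream driver has an equivalent):

```python
import sqlite3

# BAD: f"UPDATE accounts SET number = '{new_account}' ..." invites injection.
# GOOD: bind values as parameters so they are never parsed as SQL.

def update_account_number(conn: sqlite3.Connection,
                          user_id: int, new_account: str) -> None:
    conn.execute(
        "UPDATE accounts SET number = ? WHERE user_id = ?",
        (new_account, user_id),  # values are bound, not spliced into the SQL
    )
    conn.commit()
```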
It could also be a good idea to have a second system just to monitor your primary system's status. This system should log and notify you about the following (a small sketch of the idea comes after the list):
changes made to the system itself (for example, if someone gains access to your business logic and removes authentication logic)
changes made to the database that are not consistent with your primary system's state
suspicious actions: banks, for example, have rules that apply to your account. If you have only ever made payments within Europe and then out of nothing make a huge payment to, let's say, China, you would receive a notification asking you to confirm the payment. The payment would not be triggered without that second confirmation from the customer.
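Here is a minimal sketch of the second point, detecting database changes that did not come through the application. The schema and fingerprint scheme are made up for the example; the idea is that the monitor keeps its own record of per-row checksums, updated only on legitimate changes, and alerts on any mismatch:

```python
import hashlib
import sqlite3

def row_fingerprint(row: tuple) -> str:
    """Checksum of a row's contents, recorded when a legitimate change is made."""
    return hashlib.sha256(repr(row).encode()).hexdigest()

def audit(primary: sqlite3.Connection, known: dict[int, str]) -> list[int]:
    """Return ids of rows whose current contents don't match the fingerprint
    the monitor recorded the last time the application changed them."""
    suspicious = []
    for row in primary.execute("SELECT id, user_id, number FROM accounts"):
        if known.get(row[0]) != row_fingerprint(row):
            suspicious.append(row[0])  # changed outside the application: alert
    return suspicious
```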
In the end, as you already correctly pointed out, you can only harden it as well as possible, never make it "100%" safe (at least in theory). So a good level of security for the total system includes being able to detect unwanted changes, identify the exact changes already made, and have enough information on the overall status of your system to allow a rollback or manual correction of a corrupted state in case it has already happened.
Even after implementing the techniques mentioned, you would have to continuously check for security bugs in the frameworks and libraries you use, and in the system as a whole (for example, by using penetration-testing frameworks that automatically try to corrupt your system).
What I want to show you with my answer is what the comments already suggest: this is a very broad and complex topic, with multiple layers of security concerns that you will have to either study yourself or cover with framework solutions that "ensure" the topic is taken care of (web frameworks, for example, often include basic XSS prevention).
Without wanting to sound harsh: if you have to ask this question on Stack Overflow, you're not really qualified to work on this project.
The financial value of your data sounds like it's enough for an attacker to expend significant resources breaching your defenses - and the consequences of such a breach would be disastrous for your organization and its customers; it could lead to the organization having to close down. You really don't want to be learning about security from strangers on the internet in this case.
One place to start learning is the established standards for managing payment information, often referred to as the "PCI standards"; these provide guidelines covering hardware, software, and processes for organizations that deal with payment details.
There are numerous books on IT security; I like the "Hacking Exposed" series, and "Security Engineering".
You might also bring in specialized IT security consultants; I've worked with a number of these guys, and many of them are very good at helping you engineer security into your solution.

Booking logic and architecture, database sync: Hotels, tennis courts reservation system

Imagine that you want to design a tennis booking system.
You have 5 tennis clubs as partners, with no online API allowing you to check on their side whether a court is booked or not: you have to build this part as well.
Every time a booking is made on their side you want it to be known by our system, probably using a POST request from the tennis partner to our server.
Every time a booking is made on our website, we want to push the booking to their system. The difficulty is that their system needs to be online and accessible from outside. Their IP may change, so we have to use a DNS updater.
In case their system is not available, we still accept the booking and fall back to an asynchronous email with 'confirm booking / reject booking' links sent to the club.
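That push-with-fallback flow might look something like the following sketch (Python; the club endpoint, addresses, and payload shape are placeholders):

```python
import smtplib
from email.message import EmailMessage
import requests

def push_booking(club_url: str, booking: dict, club_email: str) -> None:
    """Push the booking to the club's system; fall back to email if it's down."""
    try:
        requests.post(f"{club_url}/bookings", json=booking,
                      timeout=5).raise_for_status()
    except requests.RequestException:
        # Club system unreachable: keep the booking and ask for confirmation
        # asynchronously instead.
        msg = EmailMessage()
        msg["Subject"] = f"Please confirm booking {booking['id']}"
        msg["From"] = "bookings@example.com"
        msg["To"] = club_email
        msg.set_content(
            f"Court {booking['court']} at {booking['time']}.\n"
            f"Confirm: https://example.com/bookings/{booking['id']}/confirm\n"
            f"Reject:  https://example.com/bookings/{booking['id']}/reject\n"
        )
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
```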
I find the whole process quite complex and was wondering how online hotel booking systems and hotels handle this. Do they all have their data open and online?
The good thing is that the data will grow large and fits nicely into some NoSQL store ;) like CouchDB.
There are several questions here, let me try and address each one...
Since this appears to be an internet application with federated servers, using the implied HTTP protocol makes a lot of sense. This could be done via form POSTs, GETs, or even RESTful submission of some custom data structure. In the end, the exact approach will come down to the size and complexity of the information being communicated. Many architectures employ these approaches and often combine them with encrypted, signed, and/or encoded payloads for security. One shortfall to consider is that they require you to clearly communicate all request/response message formats, field ranges, and variations, since these mechanisms are not really self-describing. On the other hand, these patterns use very common protocols, are easily understood, are easy to implement, and are typically lean on the wire.
In contrast, architectures with very complex structures often choose to use WSDL-based web services. Also driven by common standards, these tend to be self-describing and inherently versionable, although they can take more time and energy to implement. There are a lot of advantages to web services, driven by the many WS-* standards, which may be worth investigating further in your case.
As for the reservation process... many similar architectures will employ an orchestration model such as the following (a sketch comes after the list):
Find open booking spaces
Make a reservation for a booking space. This places an expiring lock on the space while the requestor fills in all required booking information, which mitigates the race condition that could otherwise lead to multiple bookings for the same space
Once all required booking information is received and validated, the booking is confirmed and permanently locked from use by other requestors
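Here is the promised sketch of the expiring-lock step, as a toy in-memory version in Python. A real system would keep the lock in the database (e.g. a row with an expires_at column) so it survives restarts and works across multiple servers:

```python
import time
import uuid

HOLD_SECONDS = 300  # how long a requestor may hold a space while filling forms
_locks: dict[str, tuple[str, float]] = {}  # space_id -> (hold_token, expires_at)
_confirmed: set[str] = set()

def reserve(space_id: str) -> str | None:
    """Place an expiring hold on a space; returns a token, or None if taken."""
    now = time.time()
    held = _locks.get(space_id)
    if space_id in _confirmed or (held and held[1] > now):
        return None  # already booked, or actively held by someone else
    token = uuid.uuid4().hex
    _locks[space_id] = (token, now + HOLD_SECONDS)
    return token

def confirm(space_id: str, token: str) -> bool:
    """Turn a still-valid hold into a permanent booking."""
    held = _locks.get(space_id)
    if held and held[0] == token and held[1] > time.time():
        _confirmed.add(space_id)
        del _locks[space_id]
        return True
    return False  # hold expired or token mismatch: space went back in the pool
```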
As for the NoSQL comment, I can't really say given the amount of information supplied. That said, my instincts tell me a SQL-style DB is completely reasonable for this problem set. I have worked with databases holding many petabytes under very high SLAs. You implied a need for high availability, and SQL-based databases have a few decades of proven support behind them in this area.
Hope this helps.
I think you will find most online hotel reservation systems aren't really online. My experience is that the companies (not the hotels themselves) offering online booking systems also insist that the hotel itself books its rooms through the same system.
Everything works fine as long as connectivity is not an issue - and in the small-motel scenario it eventually will be. Of course, the bigger hotels use the same systems the airlines do, and they have dedicated communication links for the purpose. The reservations are, of course, maintained on one central computer with appropriate backup links, etc.
It is very easy for an individual tennis club to offer its own real-time online booking system using its own database/website with a program like MyCourts. However, once you want to link more than one club's facilities, you really don't have much option other than a centralized server that both the user and the club have to use to reserve facilities.

Document/Image Database Repository Design Question

Question:
Should I write my application to access a database image repository directly, or write a middleware piece to handle document requests?
Background:
I have a custom document imaging and workflow application that currently stores about 15 million documents/document images (90%+ are single-page Group 4 TIFFs; the rest are PDF, Word, and Excel documents). The image repository is a commercial, third-party application that is very expensive and frankly has too much overhead. I just need a system to store and retrieve document images.
I'm considering moving the imaging directly into a SQL Server 2005 database. The indexing information is very limited - basically 2 index fields. It's a life insurance policy administration system so I index images with a policy number and a system wide unique id number. There are other index values, but they're stored and maintained separately from the image data. Those index values give me the ability to look-up the unique id value for individual image retrieval.
The database server is a dual quad-core Windows 2003 box with SAN drives hosting the DB files. The current image repository size is about 650 GB. I haven't done any testing to see how large the converted database will be. I'm not really asking about the database design - I'm working with our DBAs on that aspect. If that changes, I'll be back :-)
The current system to be replaced is obviously a middleware application, but it's a very heavyweight system spread across 3 windows servers. If I go this route, it would be a single server system.
My primary concerns are scalability and performance - heavily weighted toward performance. I have about 100 users, and usage growth will probably be slow for the next few years.
Most users are primarily read users - they don't add images to the system very often. We have a department that handles scanning and otherwise adding images to the repository. We also have a few other applications that receive documents (via FTP) and insert them into the repository automatically as they are received, either with full index information or as "batches" that a user reviews and indexes.
Most (90%+) of the documents/images are very small, < 100 KB and probably < 50 KB, so I believe storing the images directly in the database file will be more efficient than getting SQL Server 2008 and using FILESTREAM.
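For reference, the minimal two-index-field repository described above has a very simple shape. A sketch using Python's sqlite3 as a stand-in for SQL Server (only the schema idea matters here):

```python
import sqlite3

def init(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE IF NOT EXISTS images (
        doc_id    TEXT PRIMARY KEY,   -- system-wide unique id
        policy_no TEXT NOT NULL,      -- policy number
        body      BLOB NOT NULL)""")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_policy ON images(policy_no)")

def store(conn: sqlite3.Connection, doc_id: str,
          policy_no: str, data: bytes) -> None:
    conn.execute("INSERT INTO images VALUES (?, ?, ?)",
                 (doc_id, policy_no, data))
    conn.commit()

def fetch(conn: sqlite3.Connection, doc_id: str) -> bytes:
    (body,) = conn.execute(
        "SELECT body FROM images WHERE doc_id = ?", (doc_id,)).fetchone()
    return body
```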
Oftentimes scalability and performance are ultimately married to each other, in the sense that six months from now management comes back and says "Function Y in Application X is running unacceptably slowly; how do we speed it up?" And all too often the answer is to upgrade the back-end solution. And when it comes to upgrading back ends, it's almost always less expensive to scale out than to scale up in terms of hardware.
So, long story short, I would recommend building a middleware app that specifically handles incoming requests from the user app and then routes them to the appropriate destination. This will sufficiently abstract your front-end user app from the back end storage solution so that when scalability does become an issue only the middleware app will need to be updated.
This is straightforward. Write the application to an interface, use some kind of factory mechanism to supply that interface, and implement that interface however you want.
Once you're happy with your interface, then the application is (mostly) isolated from the implementation, whether it's talking straight to a DB or to some other component.
Thinking ahead a bit on your interface design, while doing bone-stupid, "it's simple, it works here, it works now" implementations, offers a good balance of future-proofing the system without over-engineering it.
It's easy to argue you don't even need an interface at this juncture, just a simple class that you instantiate. But if your contract is well defined (i.e., the interface or class signature), that is what protects you from change (such as redoing the back-end implementation). You can always replace the class with an interface later if you find it necessary.
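A sketch of that advice in Python, with illustrative names; the application codes only against DocumentStore, and a factory picks the implementation:

```python
from abc import ABC, abstractmethod

class DocumentStore(ABC):
    """The contract the application codes against."""
    @abstractmethod
    def get(self, doc_id: str) -> bytes: ...
    @abstractmethod
    def put(self, doc_id: str, data: bytes) -> None: ...

class DatabaseStore(DocumentStore):
    def get(self, doc_id: str) -> bytes:
        raise NotImplementedError  # e.g. SELECT body FROM images WHERE doc_id = ?
    def put(self, doc_id: str, data: bytes) -> None:
        raise NotImplementedError  # e.g. INSERT INTO images ...

class FileSystemStore(DocumentStore):
    def get(self, doc_id: str) -> bytes:
        raise NotImplementedError  # e.g. read a SAN/NAS path derived from doc_id
    def put(self, doc_id: str, data: bytes) -> None:
        raise NotImplementedError

def make_store(config: dict) -> DocumentStore:
    """Swap the back end by changing configuration, not the application."""
    return {"db": DatabaseStore, "fs": FileSystemStore}[config["backend"]]()
```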
As far as scalability, test it. Then you know not only if you may need to scale, but perhaps when as well. "Works great for 100 users, problematic for 200, if we hit 150 we might want to consider taking another look at the back end, but it's good for now."
That's due diligence and a responsible design tactic, IMHO.
I agree with gabriel1836. An added benefit, however, is that you could run a hybrid system for a time, since you aren't going to convert 14 million documents from your proprietary system to your home-grown system overnight.
Also, I would strongly encourage you to store the documents outside of the database. Store them on a file system (local, SAN, NAS - it doesn't matter) and store pointers to the documents in the database.
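A sketch of that approach (Python, with a made-up mount point): the blob lives on the file system and the database row holds only the pointer, which keeps the database small and backups manageable:

```python
import sqlite3
from pathlib import Path

ROOT = Path("/mnt/images")  # placeholder mount point (local, SAN, NAS...)

def store(conn: sqlite3.Connection, doc_id: str,
          policy_no: str, data: bytes) -> None:
    path = ROOT / doc_id[:2] / f"{doc_id}.tif"  # shard directories by id prefix
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    conn.execute("INSERT INTO images (doc_id, policy_no, path) VALUES (?, ?, ?)",
                 (doc_id, policy_no, str(path)))
    conn.commit()

def fetch(conn: sqlite3.Connection, doc_id: str) -> bytes:
    (path,) = conn.execute("SELECT path FROM images WHERE doc_id = ?",
                           (doc_id,)).fetchone()
    return Path(path).read_bytes()
```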
I'd love to know what document management system you are using now.
Also, don't underestimate the effort of replacing the capture (scanning and importing) provided by the proprietary system.
