How many entries can Sobipro handle? - joomla3.0

How many entries can SobiPro handle? I'd like to use it so that my client's real estate clients can post their businesses. Currently we have a little over 10,000 businesses, each with one image and contact info.

The SobiPro multi-directory extension is free, so I'd say give it a shot and see whether it can handle those 10,000 businesses: https://www.sigsiu.net/download/sobipro

Related

Building personalized feed on App Engine

I have been working on a social app. I'll first explain the problems, and then summarize in the questions below.
In the network, there would be channels, and users. Users can subscribe to these channels, and to other users. This way, we have two sources from which posts can be generated.
Now, we can simply keep one Activity model where we record all the actions, their kind, and what they affect. Be it from channels, or from the users. And refer these while creating a feed for each user.
I found a solution given in a talk by Brett Slatkin, which basically suggests using ListProperty to link each post with each subscriber. But Guido suggests not using lists if there are going to be more than 1000 elements. So if a channel has more than 1000 subscribers, this will probably run into problems. Even if this were to work --
I want to rank the posts based on popularity (number of votes and comments) and apply some time decay function, much like Reddit. To do so, I will have to keep the Activity data in memory, and filter and order it by rank for each user. I'll also need to do this periodically, since new activities will keep occurring and old activities will gain or lose value.
The challenge is to keep the data in memory (for processing the feed as well as to keep things fast). I will have to store a copy of each user's feed in persistent storage, but if the order of posts is going to keep changing, how do I keep track of that in the database?
Also: I have kept my options open -- I will move to AWS if I have to.
To summarize:
Is there a better solution to keep track of subscribers without using Lists? Using something like PostID > SubscriberID in one entity would be very, very expensive and inefficient.
If there's any cost-effective and fast solution to the problem above, how do I deal with the next challenge -- which is to generate a personalized feed? (memory issues - unknown size of memcache)
If I can generate a personalized feed (which will be dynamic and keep changing), how do I keep it in the database?
I have gone through several articles and I can probably solve first two problems with AWS, but I am trying to stay away from the manual scaling work. If there is no way, I am willing to move to AWS. Even if I move to AWS, I can't think of a solution to the third problem.
Any thoughts, directions, resources would be helpful! Thanks!
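For illustration, the popularity ranking with time decay described in the question could be sketched roughly as follows. This is only a minimal sketch, not Reddit's actual formula: the exponential half-life, the comment weight, and the names `decayed_score` and `rank_feed` are all assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone


def decayed_score(votes, comments, posted_at, now,
                  half_life_hours=12.0, comment_weight=2.0):
    """Popularity that halves every `half_life_hours` (assumed decay model)."""
    popularity = votes + comment_weight * comments
    age_hours = max((now - posted_at).total_seconds() / 3600.0, 0.0)
    return popularity * 0.5 ** (age_hours / half_life_hours)


def rank_feed(posts, now, limit=50):
    """Return the `limit` highest-scoring posts, best first.

    Each post is a dict with hypothetical keys: votes, comments, posted_at.
    """
    return sorted(
        posts,
        key=lambda p: decayed_score(p["votes"], p["comments"], p["posted_at"], now),
        reverse=True,
    )[:limit]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
posts = [
    {"id": "old", "votes": 10, "comments": 0, "posted_at": now - timedelta(hours=12)},
    {"id": "new", "votes": 4, "comments": 0, "posted_at": now},
]
ranked = rank_feed(posts, now)
```

Because the score depends only on stored counters and the post timestamp, it can be recomputed at read time or in a periodic job, so the ordering itself does not have to be persisted.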

Which database to use for game analytics?

We're building an app that will have a number of games. Kids will learn Math as they play these games. All the user profile data, game data and lessons/ questions data are all being stored in the app and will sync to a MySQL database on the server side.
There is also a lot of event data that we would like to capture and analyze to improve our games. These events could be the start of a lesson, touching a game object, choosing the correct game object but targeting it wrongly, answering correctly but timing out, and so on. We expect this to be hundreds of rows for each game that a kid plays. The data stored will also depend on the type of event.
The database should allow us to analyze the data and answer questions like which games are tough on kids, which lessons are too easy for kids, are kids from some countries finding some of the lessons to be tough, how long are each of these games able to hold the attention of the kid and so on.
Which database would allow us to store so many different types of events, scale to millions of rows a day, and allow for all these kinds of analysis? Given the changing nature of the data model, NoSQL seems to be an obvious choice. But which one would allow us to do all these analyses? Or should we go with Hadoop / Hive?
Thanks in advance.
Although you can do this using Hadoop/Hive, you won't get real-time performance, as Hive is best suited to batch processing. HBase would be a better choice in such a scenario. You could create something like an OLAP data cube whose dimensions are the information you specified, such as session info, info about each kid, etc. Or you could serialize all of this information as JSON objects and store them in HBase cells. You could also store each of these events in individual cells, but that would consume unnecessary space and wouldn't be that efficient when fetching the data back.
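As a rough sketch of the "serialize as JSON and store in a cell" idea, the snippet below uses a plain dict as a stand-in for the table and makes no real HBase client calls. The row key layout (kid, game, zero-padded timestamp) and the function names are assumptions chosen for illustration only.

```python
import json


def make_row_key(kid_id, game_id, timestamp_ms):
    # Zero-pad the timestamp so lexicographic order matches time order.
    return f"{kid_id}#{game_id}#{timestamp_ms:013d}"


def encode_event(event_type, payload):
    # One JSON document per event; the payload shape may differ per event type.
    doc = {"type": event_type, "payload": payload}
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def decode_event(cell):
    return json.loads(cell.decode("utf-8"))


# A dict stands in for the table: row key -> cell value (bytes).
table = {}
key = make_row_key("kid42", "fractions", 1700000000000)
table[key] = encode_event("wrong_target", {"object": "apple", "lesson": 3})
event = decode_event(table[key])
```

Keeping the event type at the top level of the document lets different event types carry different payload fields without any schema change.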
HTH

Freebase: Is it worth it to base my company's entire database on it?

I'm with a company that is building a venue / artist database for live music and recently came across Freebase. It looks very compelling, even if the data isn't there for new, up-and-coming bands. For those of you who have worked with Freebase, I have a couple questions:
Are there downsides to integrating all of the data entry with Freebase? We are not looking to sell or privatize this information.
What are the weaknesses of Freebase, with regards to usability?
Disclosure: I work on Freebase at Google.
The music data in Freebase is one of our strongest areas and is going to continue to get broader and richer as we continue to load more datasets. For example, we import data from MusicBrainz, clean it up and match the topics against existing topics in Freebase to avoid duplicates.
In terms of downsides, you should be prepared to work with a lot of data. For example, Freebase currently has 4 musical artists named "John Smith", which may or may not be useful for your application, but you'll still need to figure out which one(s) map to the John Smith that your users are interested in. We call this "reconciliation", and it's necessary so that your app knows precisely which topics to query the API for.
Since you mentioned music venues I should also point out that while Freebase has a lot of data about places, we don't yet have a geosearch API so you'd need to roll your own if that's something you need.
Since anyone can edit Freebase, you should also consider using as_of_time to protect your site against vandalism.
Freebase is great for developers because you can easily jump in and clean up bad data or add missing topics. However, one area that has always been a challenge is loading large amounts of data from outside of Google. We've built OpenRefine, which allows folks to upload datasets, but these datasets must pass a QA process that takes some time to complete. It's necessary to have these QA processes to maintain the level of quality in Freebase, but they do slow down the loading of large datasets.
I really hope that you choose to make use of Freebase music data to build your company. I know that there are already a number of music startups happily using our data.

Best way to build a DataMart from multiple external systems?

I'm in the planning stages of building a SQL Server DataMart for mail/email/SMS contact info and history. Each piece of data is located in a different external system. Because of this, email addresses do not have account numbers and SMS phone numbers do not have email addresses, etc. In other words, there isn't a shared primary key. Some data overlaps, but there isn't much I can do except keep the most complete version when duplicates arise.
Is there a best practice for building a DataMart with this data? Would it be an acceptable practice to create a key table with a column for each external key? Then, a unique primary ID can be assigned to tie this to other DataMart tables.
Looking for ideas/suggestions on approaches I may not have yet thought of.
Thanks.
The email address or phone number itself sounds like a suitable business key. Typically a "staging" database is used to load the data from multiple sources and then assign surrogate keys and do other transformations.
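A minimal sketch of the staging step, assuming the business key is the pair (contact type, normalized value) and that surrogate keys are plain sequential integers. The `KeyTable` class and the normalization rules are hypothetical and would need to match the real source systems.

```python
from itertools import count


def normalize(kind, value):
    """Normalize a business key so duplicates from different systems match."""
    if kind == "email":
        return value.strip().lower()
    if kind == "sms":
        # Keep digits only, e.g. "+1 (555) 010-0000" -> "15550100000".
        return "".join(ch for ch in value if ch.isdigit())
    return value.strip()


class KeyTable:
    """Maps (kind, business key) pairs to surrogate integer keys."""

    def __init__(self):
        self._next_id = count(1)
        self._ids = {}

    def surrogate_for(self, kind, value):
        key = (kind, normalize(kind, value))
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


keys = KeyTable()
a = keys.surrogate_for("email", "Jane@Example.com ")
b = keys.surrogate_for("email", "jane@example.com")
c = keys.surrogate_for("sms", "+1 (555) 010-0000")
```

The other DataMart tables would then reference only the surrogate key, so a change in how a source system formats its values stays confined to the staging step.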
Are you familiar with data warehouse methods and design patterns? If you don't have previous knowledge or experience then consider hiring some help. BI / data warehouse projects have a very high failure rate and mistakes can be expensive.
Found more information here:
http://en.wikipedia.org/wiki/Extract,_transform,_load#Dealing_with_keys
Well, with no other information to tie the disparate pieces together, your datamart is going to be pretty rudimentary. You'll be able to get the types of data (sms, email, mail), metrics for each type over time ("this week/month/quarter/year we averaged 42.5 sms texts per day, and 8000 emails per month! w00t!"). With just phone numbers and email addresses, your "other datamarts" will likely have to be phone company names, or internet domains. I guess you could link from that into some sort of geographical information (internet provider locations?), or maybe financial information for the companies. Kind of a blur if you don't already know which direction you want to head.
To be honest, this sounds like someone high-up is having a knee-jerk reaction to the "datamart" buzzword coupled with hearing something about how important communication metrics are, so they sent orders on down the chain to "get us some datamarts to run stats on all our e-mails!"
You need to figure out what it is that you or your employer is expecting to get out of this project, and then figure out if the data you're currently collecting gives you a trail to follow to that information. Right now it sounds like you're doing it backwards ("I have this data, what's it good for?"). It's entirely possible that you don't currently have the data you need, which means you'll need to buy it (who knows if you could) or start collecting it, in which case you won't have nice looking graphs and trend-lines for upper-management to look at for some time... falling right in line with the warning dportas gave you in his second paragraph ;)

Booking logic and architecture, database sync: Hotels, tennis courts reservation system

Imagine that you want to design a tennis booking system.
You have 5 tennis clubs as partners, with no online API allowing you to check on their side whether a court is booked or not: you have to build this part as well.
Every time a booking is made on their side, you want our system to know about it, probably via a POST request from the tennis partner to our server.
Every time a booking is made on our website, we want to push the booking to their system. The difficulty is that their system needs to be online and accessible from outside. The IP may change, so we have to use a DNS updater.
In case their system is not available, we still accept the booking and fall back to an async email, sent to the club, with an 'I confirm booking/reject booking' link.
I find the whole process quite complex and was wondering how online hotel booking systems and hotels work together. Do they all have their data open and online?
The good thing is that the data will grow large and fits nicely with some NoSQL store ;) like CouchDB
There are several questions here, let me try and address each one...
Since this appears to be an internet application with federated servers, using HTTP makes a lot of sense. This could be done via form POSTs, GETs, or even RESTful submission of some custom data structure. In the end, the exact approach will come down to the size and complexity of the information being communicated. Many architectures employ these approaches and often combine them with encrypted, signed, and/or encoded payloads for security. One shortfall to consider is that these approaches require you to clearly communicate all request/response message formats, field ranges, and variations, since the mechanisms are not really self-describing. On the other hand, these patterns use very common protocols, are easily understood, are easy to implement, and are typically lean on the wire.
In contrast, architectures with very complex structures often choose to use WSDL-based web services. Also driven by common standards, these tend to be self-describing and inherently versionable, although they can take more time and energy to implement. Web services have many advantages, driven by the various WS-* standards, which may be worth investigating further in your case.
As for the reservation process... many similar architectures will employ an orchestration model such as the following:
Find open booking spaces
Make a reservation for a booking space. This places an expiring lock on the space while the requestor fills in all required booking information, which mitigates race conditions that could lead to multiple bookings for the same space.
Once all required booking information is received and validated, the booking is confirmed and permanently locked from use by other requestors.
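The three steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `CourtBookings` class, its method names, and the 5-minute hold are invented for the example, and a real deployment would keep this state in a database with transactional updates rather than in a single process.

```python
import time


class CourtBookings:
    """Find / reserve (expiring hold) / confirm flow for bookable slots."""

    def __init__(self, hold_seconds=300.0, clock=time.monotonic):
        self._hold_seconds = hold_seconds
        self._clock = clock       # injectable so the expiry can be tested
        self._holds = {}          # slot -> (requestor, expires_at)
        self._confirmed = {}      # slot -> requestor

    def is_open(self, slot):
        """Step 1: a slot is open if it is neither confirmed nor actively held."""
        if slot in self._confirmed:
            return False
        hold = self._holds.get(slot)
        return hold is None or hold[1] <= self._clock()

    def reserve(self, slot, requestor):
        """Step 2: place an expiring hold; fails if someone else holds the slot."""
        if not self.is_open(slot):
            return False
        self._holds[slot] = (requestor, self._clock() + self._hold_seconds)
        return True

    def confirm(self, slot, requestor):
        """Step 3: turn a still-valid hold into a permanent booking."""
        hold = self._holds.get(slot)
        if hold is None or hold[0] != requestor or hold[1] <= self._clock():
            return False
        self._confirmed[slot] = requestor
        del self._holds[slot]
        return True
```

A requestor whose hold has expired must reserve again before confirming, which is what prevents two confirmed bookings for the same slot.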
As for the SQL-style DB comment, I can't really say given the amount of information supplied. That said, my instincts tell me a SQL-style DB is completely reasonable for this problem set. I run databases holding many petabytes under very demanding SLAs. You implied a need for high availability, and SQL-based databases have a few decades of proven support behind them in this area.
Hope this helps.
I think you will find most on-line hotel reservation systems aren't really on-line. My experience is that those companies (not the hotels themselves) offering on-line booking systems also insist that the hotel itself also books their rooms on-line using the same system.
Everything works fine as long as connectivity is not an issue, and in the small-motel scenario it normally will be. Of course, the bigger hotels use the same system the airlines do, and they have dedicated communications links for the purpose. The reservations are maintained on one central computer with appropriate backup links and so on.
It is very easy for individual tennis clubs to offer their own real-time online booking systems using their own database/website, with programs like the ones MyCourts offers. However, once you want to link the facilities of more than one club, you really don't have much option other than a centralized server that both the user and the club have to use to reserve facilities.
