Google Product Feeds Field Length

Trying to construct a database to hold a bunch of product data. The plan is to programmatically generate TXT (text, tab-delimited) files from the data in the database and upload them to Google, Amazon, and Bing (all use Google Product Feeds file formatting).
The problem I'm having right now is that I seem to have to guess at the maximum field length for most of the available fields. I've found several places that tell me which fields are required/optional and what they're used for, but nowhere can I find how long they can be. This is critical to know in a fully automated product feed system, since we don't want to pull data from our warehouse system and INSERT it into our product feed database if it is invalid for the Google product feeds (too long).
Please advise.

Google Shopping will allow up to 72 characters for the title. Description fields are usually unlimited. For anything else you usually wouldn't have anything too long anyway.
There are programs out there that do this already.
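To guard against over-long values in an automated pipeline, one option is to clamp fields before they ever reach the feed database. Below is a minimal Python sketch of that idea; the field names and limits are placeholders (the 72-character title cap comes from the answer above), not official Google values, so check them against the current feed specification.

```python
# Minimal sketch: validate/truncate feed fields before inserting them into the
# feed database. The 72-character title cap follows the answer above; the other
# field names and limits are placeholders, not official Google values.
FIELD_LIMITS = {
    "title": 72,          # per the answer above; verify against the current spec
    "description": None,  # effectively unlimited per the answer above
    "brand": 70,          # hypothetical limit, for illustration only
}

def clean_field(name, value):
    """Trim whitespace and truncate to the configured limit, if any."""
    value = value.strip()
    limit = FIELD_LIMITS.get(name)
    if limit is not None and len(value) > limit:
        value = value[:limit].rstrip()
    return value

def to_feed_row(product, columns=("id", "title", "description", "brand", "price")):
    """Build one tab-delimited feed line from a product dict."""
    return "\t".join(clean_field(col, str(product.get(col, ""))) for col in columns)
```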

Related

Finding image duplicates for products within database workflow

I am looking for a solution to identify duplicate products from their images to speed up my product database workflow.
I accept product listings from many suppliers and have thousands of listings, totalling hundreds of thousands of images. The same product may be stocked by several suppliers. Each supplier may use the same images but with different watermarks or sizes. Each supplier may describe the product slightly differently.
On my website, I only want to list each individual product from one supplier. If I am sent a product that I already have, I want to efficiently identify the duplicate and ignore the new product.
I currently use some Regex and text searching to help me identify duplicates but it's not foolproof and is slow. I have read about hashing each image and searching that way, but my duplicate images aren't exactly the same.
NB. I am using Windows. I do not know Python or Java. I do have a range of technical knowledge but I haven't yet found anything that isn't "first become an expert in Java, then..."
Is there a Windows app or API or something out there that, given a set of images, can return the duplicates?
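One common answer to "my duplicates aren't byte-identical" is a perceptual hash rather than an exact file hash: images that differ only by resizing or a light watermark usually produce hashes within a few bits of each other. The asker doesn't use Python, but purely to illustrate the technique (which existing Windows tools and image libraries also implement), here is a rough dHash sketch using Pillow; the paths and distance threshold are made up for the example.

```python
# Perceptual "difference hash" (dHash) sketch using Pillow. Near-identical
# images (resized, lightly watermarked) tend to differ in only a few bits,
# so duplicates can be found by Hamming distance rather than exact matching.
from PIL import Image

def dhash(path, hash_size=8):
    """Hash built by comparing each pixel to its right-hand neighbour."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size), Image.LANCZOS)
    pixels = list(img.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

# Hypothetical example: two supplier images of (possibly) the same product.
if hamming(dhash("supplier_a/shoe.jpg"), dhash("supplier_b/shoe.jpg")) <= 5:
    print("probable duplicate")
```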

Text File or Database based chat room? Not only messages and not only one-to-one

I'm developing a webchat and I have to decide what strategy to use. I can use either a text file or a database to store the data that come from and go to the users.
Yes, I've seen the other posts where people are talking about it, but I have a few more questions than just "which one is faster" (that's also one of my questions, but not the only one). And I will point out the problem more carefully.
First of all, many users will share the same room and chat with all the others at the same time. Many rooms will be open at the same time (the same user can be in two rooms at once).
In this system, messages are not the only thing that comes and goes. Status (away, online, typing, etc.), updates ("don't talk to me, I had the worst day of my life" or "I'm new here"), who logged out and in, replies to other messages, and the messages themselves are the kinds of traffic that will be there. Somewhere.
I want the Javascript to receive the data in JSON format.
And I want this chat to be used by many, many people at the same time! Each of them will send requests every 5 or 10 seconds.
Some data will be stored in the database for sure (the user sessions, their ids, the users' ids, information about the room, when it was created, etc.), but how to handle the dynamic data flow is still to be decided.
In other words, even if it's not the most complex chat ever developed, it's not the simplest one.
Example of files distribution if strategy chosen is text file:
Path: /root/room/1 (to room 1)
Files: 1000001, 100002, 100003 (users in this room)
Path: /root/room/2
Files: 1000002, 100005, 100010
...
Messages and any other kind of update (access, status, personal update etc) sent in room 1 FROM user 1000001 will be saved in paths /root/room/1/100002 (user 2) and /root/room/1/100003 (user 3) and so on.
Each user will read his own file for that room and erase it after reading, so only new data will be in those files every time they are read.
The data is written in JSON format so no further encoding is needed.
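For illustration, here is a rough sketch (in Python, purely to show the shape of the idea) of the file-based fan-out described above: an event from one user is appended to every other participant's file in the room, and each poll reads the user's own file and clears it. Locking, error handling, and concurrent access are deliberately ignored.

```python
# Sketch of the per-user-file fan-out described above. Paths follow the
# /root/room/<room_id>/<user_id> layout; locking and races are ignored.
import json, os

ROOT = "/root/room"

def broadcast(room_id, sender_id, event, participants):
    """Append one JSON line to every other participant's file in the room."""
    line = json.dumps({"from": sender_id, **event}) + "\n"
    for user_id in participants:
        if user_id == sender_id:
            continue
        with open(os.path.join(ROOT, str(room_id), str(user_id)), "a") as f:
            f.write(line)

def poll(room_id, user_id):
    """Read this user's file for the room and erase it (read-and-clear)."""
    path = os.path.join(ROOT, str(room_id), str(user_id))
    try:
        with open(path, "r+") as f:
            events = [json.loads(line) for line in f if line.strip()]
            f.truncate(0)
        return events
    except FileNotFoundError:
        return []
```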
The questions:
1- Considering the huge amount of data being written to and read from these text files, which is the better option: using the database to save the data and then running queries (SELECT FROM messages (and from updates, and from access, and from anything else) WHERE id > #number) every 5 or 10 seconds for each user in each room, or using the text files? (A sketch of the polling-query approach follows these questions.)
2- If the database is the better option, should I store it in JSON format? I would use a field called JSON, because I wouldn't want to lose the ability to make queries in that database. Or maybe I would create a new table connecting the user, the room and the JSON, and write AGAIN in other tables to keep the tuple format.
3- If files are the better option, would you consider replicating the data in the database (just in case you want to query it or whatever) every time it's saved in the files?
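And here is a hedged sketch of the database alternative from questions 1 and 2: each poll runs a WHERE id > last_seen_id query, and the JSON payload sits in a text column next to queryable columns. Table and column names are illustrative only; SQLite is used just to keep the sketch self-contained.

```python
# Sketch of the polling-query approach: clients remember the last event id
# they saw and ask only for newer rows every 5-10 seconds. Schema is made up.
import json, sqlite3

conn = sqlite3.connect("chat.db")
conn.execute("""CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    room_id INTEGER, sender_id INTEGER, kind TEXT, payload TEXT)""")

def post(room_id, sender_id, kind, data):
    """Store any event (message, status, update) with its JSON payload."""
    conn.execute(
        "INSERT INTO events (room_id, sender_id, kind, payload) VALUES (?, ?, ?, ?)",
        (room_id, sender_id, kind, json.dumps(data)))
    conn.commit()

def poll(room_id, last_seen_id):
    """Return new events for a room plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, sender_id, kind, payload FROM events "
        "WHERE room_id = ? AND id > ? ORDER BY id",
        (room_id, last_seen_id)).fetchall()
    events = [{"id": i, "from": s, "kind": k, **json.loads(p)} for i, s, k, p in rows]
    return events, (rows[-1][0] if rows else last_seen_id)
```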

How to get all seller listings?

I need to get data about all listings in my Amazon Seller Central account.
I checked the available reports that Amazon can generate, and the best is the open listings report. By "best" I mean that it gives more data than the other reports, but it still lacks a lot, for example EANs and photo URLs.
However, the open listings report has one issue: it crops the long titles of the listings and appends a "..." suffix. It also adds a product type string to the titles, like "Title [Electronics]".
So, while the open listings report can be used as "something is better than nothing", I need something more decent.
Is there something else that can be used to accomplish my task, or is that report the best I can get?
You can request a "_GET_MERCHANT_LISTINGS_DATA_" report from Amazon MWS, which will give you the most data, including the photo URLs.
Unfortunately, the title data is still shortened and the product type is still added.
A list of all inventory reports can be found in the documentation.
You could also try using the Products API.
This will give you very detailed data about each product with the correct titles, but you have to send a query to get results instead of getting everything in a single report.
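The "[Product Type]" suffix, at least, can be stripped in post-processing, even though the "..." truncation cannot be undone from the report alone. A small sketch, assuming the report is tab-delimited and the title column is named item-name (check the header row of your actual report, as the column name here is an assumption):

```python
# Post-processing sketch: strip the trailing "[Product Type]" that Amazon
# appends to titles in the tab-delimited listings report. The column name
# "item-name" is an assumption; adjust it to match your report's header.
import csv, re

TYPE_SUFFIX = re.compile(r"\s*\[[^\]]+\]$")   # e.g. "Some Title [Electronics]"

def load_listings(path, title_column="item-name"):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            row[title_column] = TYPE_SUFFIX.sub("", row.get(title_column, ""))
            yield row
```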

What's the best way for developing a large amount of data in a Table View?

I'm a newbie to app development. I'm using Xcode 4.3.2. I'm attempting to develop an app using a tab bar with a table view. In the table view I need to list about 100 cities and info about those 100 cities when the user selects one. Basically, I already have that data about the cities in an Excel spreadsheet.
I can't really find good examples of what I want to achieve. I've heard the terms parsing XML, SQLite, Core Data, database, etc, and I'm not sure if that is what I need to do.
I'd thankfully accept any suggestions.
If the data in the table changes or gets edited, then by using a database you will avoid rolling out a new patch for those minor changes (you just change the values in the db).
If the data stays the same and won't change for a long time, and you plan to patch the application anyway, then you just need a source for that data (the spreadsheet).
For parsing the data, you can use anything. When talking about showing 100 cities, it depends on how big the total data you will be querying is and how fast it needs to be; you just need to benchmark it.
If you are querying about 500k records, need to do some 'figuring out', and it takes too long to load, then transforming your data into XML and parsing that may give you better performance.
You have to at least design your way into what you want to achieve. Check the performance and tweak it to find the decent spot.
Right now I look at it as tackling an unknown problem. Spend some time and build something. This will help you see the potential problems better.
While databases are good, for a few hundred elements you can tolerate inefficiency. If your existing data are in an Excel spreadsheet, the easiest way to get them into your app is to export the Excel spreadsheet to Comma-Separated-Values (CSV), then make your app read CSV files. (If your Excel spreadsheet has multiple worksheets, you'll need to convert each separately.)
How do you parse CSV? See iPhone : How to convert CSV format into NSData or NSString?
You'll end up with arrays of arrays of NSString. You'll probably need to define a new class for your city data, and convert each row in the imported data to one city element.
If you need to know more, posting a few rows from your spreadsheet may help.
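The answer above is iOS-specific, but the row-to-object step it describes is the same idea in any language. As a language-neutral illustration only (the real app would do this in Objective-C with arrays of NSString), here is roughly what the conversion might look like; the column names are invented for the example.

```python
# Illustrative only: turn CSV rows into simple city objects, one per
# spreadsheet row. Column names ("name", "population", "summary") are made up.
import csv
from dataclasses import dataclass

@dataclass
class City:
    name: str
    population: int
    summary: str

def load_cities(path):
    with open(path, newline="", encoding="utf-8") as f:
        return [City(r["name"], int(r["population"]), r["summary"])
                for r in csv.DictReader(f)]

# cities = load_cities("cities.csv")  # then feed this list to the table view
```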

Millions of rows auto-complete field - implementation ideas?

I have a location auto-complete field which offers auto-complete for all countries, cities, neighborhoods, villages, and zip codes. This is part of a location tracking feature I am building for my website. So you can imagine this list will run into the multi-millions of rows; I'm expecting over 20 million at least with all the villages and postal codes. To make the auto-complete work well I will use memcached so we don't always hit the database to get this list. It will be used a lot, as this is the primary feature on the site. But the question is:
Is only 1 instance of the list stored in memcached irrespective of the users pulling the info, or does it need to maintain a separate instance for each? So if, say, 20 million people are using it at the same time, will that differ from just 1 person using the location auto-complete? I am also open to other ideas on how to implement this location auto-complete so it performs well.
Or can I do something like this: when a user logs in, I send them the list in the background anyway, so by the time they reach the auto-complete text field their computer will have it ready to load instantly?
Take a look at Solr (or Lucene itself); using NGram (or EdgeNGram) tokenizers you can get good autocomplete performance on massive datasets.
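To see why this helps, here is a toy sketch of what an EdgeNGram tokenizer effectively does: every name is indexed under all of its prefixes, so an autocomplete query becomes a single exact-match lookup instead of a scan over millions of rows. Solr/Lucene build this at index time, and such an index (whether in Solr or memcached) is one shared copy serving all users, not one instance per user. The sizes and place names below are illustrative.

```python
# Toy edge-n-gram index: each location is stored under all of its prefixes,
# so a prefix query is a single dictionary lookup. Illustrative sizes only.
from collections import defaultdict

def edge_ngrams(term, min_len=2, max_len=15):
    term = term.lower()
    return [term[:i] for i in range(min_len, min(len(term), max_len) + 1)]

index = defaultdict(list)            # prefix -> matching location names

def add_location(name):
    for gram in edge_ngrams(name):
        index[gram].append(name)

for place in ["Amsterdam", "Amstelveen", "Amman", "Antwerp"]:
    add_location(place)

print(edge_ngrams("Amsterdam"))      # ['am', 'ams', 'amst', ...]
print(index["amst"])                 # ['Amsterdam', 'Amstelveen']
```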
