We receive an Excel file from a partner with a list of products. We need to update the table that contains these products with the data from the file. For example, some product information may have been updated, some new products added, and some products removed.
We currently load all of the partner's products into memory and diff them against the products received from the file. The diff contains the list of new products, products to delete, and products to update. Then we apply the updates, inserts and deletes.
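For illustration, the diff step described above can be sketched so that only a key-to-fingerprint map of the stored products is held in memory, rather than the full rows. This is a rough sketch, not our actual code: the `sku` key column, the `row_digest` and `diff_products` names, and the assumption that each key appears at most once in the file are all made up for the example.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def row_digest(row: Row) -> str:
    """Return a stable fingerprint of a row's contents (column order does not matter)."""
    canonical = "\x1f".join(f"{name}={row[name]}" for name in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_products(
    stored_digests: Dict[str, str],  # key -> digest of the row currently in the table
    incoming: Iterable[Row],         # rows read from the partner file, one at a time
    key: str = "sku",                # hypothetical name of the unique key column
) -> Tuple[List[Row], List[Row], List[str]]:
    """Split incoming rows into inserts and updates, and list the keys to delete."""
    remaining = dict(stored_digests)  # keys still here at the end were not in the file
    to_insert: List[Row] = []
    to_update: List[Row] = []
    for row in incoming:
        old_digest = remaining.pop(row[key], None)
        if old_digest is None:
            to_insert.append(row)      # key unknown to the table: new product
        elif old_digest != row_digest(row):
            to_update.append(row)      # key known but contents changed
    to_delete = sorted(remaining)      # stored products missing from the file
    return to_insert, to_update, to_delete
```

The insert and update lists are still collected in memory here to keep the sketch short; they could instead be flushed to the database in batches as the file is read.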
The approach works, especially when there are not many products, but when there are a lot, memory becomes an issue. Has anyone had a similar task, and what approaches did you use? Any other useful advice?
We are also using a lot of 'home-made' libraries for this, so I am wondering whether there are libraries already available for it (googling didn't help me much).
Thanks a lot for any advice!
Best regards, Alex
I am looking for a solution to identify duplicate products from their images to speed up my product database workflow.
I accept product listings from many suppliers and have thousands of listings, totalling hundreds of thousands of images. The same product may be stocked by several suppliers. Each supplier may use the same images but with different watermarks or size. Each supplier may describe the product slightly differently.
On my website, I only want to list each individual product from one supplier. If I am sent a product that I already have, I want to efficiently identify the duplicate and ignore the new product.
I currently use some Regex and text searching to help me identify duplicates but it's not foolproof and is slow. I have read about hashing each image and searching that way, but my duplicate images aren't exactly the same.
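For reference, the kind of hashing that tolerates resized or lightly watermarked copies is usually called a perceptual hash. Below is a minimal sketch of one simple variant (an "average hash"), written in plain Python. It assumes the image has already been decoded into rows of grayscale values (0 to 255) and is at least 8x8 pixels; loading the image files is left out, and the function names are made up for the example.

```python
from typing import List, Sequence


def shrink(pixels: Sequence[Sequence[float]], size: int = 8) -> List[List[float]]:
    """Downscale a grayscale image to size x size by averaging blocks of pixels."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        top, bottom = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            left, right = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Sequence[Sequence[float]], size: int = 8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two images would then be treated as likely duplicates when the Hamming distance between their fingerprints is below some small threshold. The right threshold is something to tune on real data, not something this sketch can tell you.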
NB. I am using Windows. I do not know Python or Java. I do have a range of technical knowledge but I haven't yet found anything that isn't "first become an expert in Java, then..."
Is there a Windows app, API or similar tool that, given a set of images, can return the duplicates?
This is what I already know:
Tables for open invoices: "CustTransOpen" - "CustInvoiceJour"
Now, I need some way to find all the lines that make up every single invoice. I've been researching and found that the tables custInvoiceLine and custInvoiceBackorderLine seem to hold this kind of information, but it's not exactly what I need.
Am I heading in the right direction?
The easy way to find out is to open the form and look at its data sources in the AOT. In this case the form is CustInvoiceJournal, where you can see CustInvoiceJour and CustInvoiceTrans (the lines table) and related tables.
I'm making an API for movies/TV/actors etc. with Web API 2 and SQL Server. The database now has >30 tables, most of them storing data users will be able to edit.
How should I store old versions of entries?
Say someone edits the description, runtime and tagline for an entry (movie) in the movies table.
I'll have a table (movies_old) where I store the editable fields from 'movies', plus who edited the entry and when.
All in the same database. The '???_old' tables have no relationships.
I'm very new to database design. Is there something obviously wrong with this?
To my mind, there are two issues here: what table you store the data in, and what goes in the "historical value" field.
On the first question, there are two obvious options: Store old and new records in the same table, with some sort of indication of which is "current" and which is "history", or have a separate table for history.
The main advantage of one table is that you have a simpler schema. This is especially true if the table contains many fields. If there are two tables, then all the field definitions are duplicated. When you move data from the current table to the history table, you have to copy every field, and if the list of fields changes, or their formats change, you have to remember to update the copy. Any queries that show the history have to read two tables. Etc. But with one table, all that goes away. Converting a record from current to history just means changing the setting of the "is_current" flag or however you indicate it.
The main advantages of two tables are, (a) Access is probably somewhat faster, as you don't have so many irrelevant records to skip over. (b) When reading the current table you don't have to worry about excluding the history records.
Oh, an annoying thing about SQL: In principle you could put a date on each record, and then the record with the latest date is the current one. In practice this is a pain: you usually have to have an inner query to find the latest date, and then feed this back in to an outer query that re-reads the record with that date. (Some SQL engines have ways around this. Postgres, for example.) So in practice, you need an "is_current" flag, probably 1 for current and 0 for history or some such.
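Here is a rough sketch of the one-table approach with an "is_current" flag. It uses Python's built-in sqlite3 only so that it runs anywhere; the table layout, column names and helper functions are invented for the example and are not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE movies (
        row_id     INTEGER PRIMARY KEY,          -- one row per version
        movie_id   INTEGER NOT NULL,             -- identifies the movie across versions
        tagline    TEXT,
        edited_by  TEXT,
        edited_at  TEXT,
        is_current INTEGER NOT NULL DEFAULT 1    -- 1 = current, 0 = history
    )
    """
)


def save_version(conn, movie_id, tagline, edited_by, edited_at):
    """Demote the current row to history and insert the new current row."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE movies SET is_current = 0 WHERE movie_id = ? AND is_current = 1",
            (movie_id,),
        )
        conn.execute(
            "INSERT INTO movies (movie_id, tagline, edited_by, edited_at) VALUES (?, ?, ?, ?)",
            (movie_id, tagline, edited_by, edited_at),
        )


def current_tagline(conn, movie_id):
    """Read the current version without any 'latest date' subquery."""
    row = conn.execute(
        "SELECT tagline FROM movies WHERE movie_id = ? AND is_current = 1", (movie_id,)
    ).fetchone()
    return row[0] if row else None


def tagline_history(conn, movie_id):
    """Return the older versions, oldest first."""
    rows = conn.execute(
        "SELECT tagline FROM movies WHERE movie_id = ? AND is_current = 0 ORDER BY row_id",
        (movie_id,),
    )
    return [r[0] for r in rows]
```

Doing the UPDATE and the INSERT in one transaction matters: otherwise a failure between the two could leave a movie with no current row at all.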
The other issue is what to put in the contents. If you're dealing with short fields, customer number and amount billed and so forth, then the simple and easy thing to do is just store the complete old contents in one record and the complete new contents in the new record. But if you're dealing with a long text block, like a plot synopsis or a review, there could be many small editorial changes. If every time someone fixes a grammar or spelling error, we have a whole new record with the entire 1000 characters, of which 5 characters are different, this could really clutter up the database. If that's the case you might want to investigate ways to store changes more efficiently. May or may not be an issue to you.
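As one possible way to store changes more compactly, you could keep the current text in full and, for each older version, store only the edits needed to get back to it. The sketch below uses Python's standard difflib for this; `make_delta` and `apply_delta` are names invented for the example, and this is just one option, not a recommendation of a particular library or format.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Each entry is (start, end, replacement): replace base[start:end] with replacement.
Delta = List[Tuple[int, int, str]]


def make_delta(base: str, target: str) -> Delta:
    """Record only the spans of base that must change to turn it into target."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    return [
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base: str, delta: Delta) -> str:
    """Rebuild the target text from base plus the stored delta."""
    parts = []
    position = 0
    for start, end, replacement in delta:
        parts.append(base[position:start])  # unchanged text before this edit
        parts.append(replacement)           # the edited span
        position = end
    parts.append(base[position:])           # unchanged tail
    return "".join(parts)
```

With this, the history row for an old version would hold `make_delta(new_text, old_text)` instead of the whole old text. Character-level matching can get slow on very long texts, so this would need measuring before relying on it.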
I know variations on this question have been asked, but I think this is specific enough to merit a new question.
When I receive an updated data feed from Commission Junction I dump it all into a database. Then those products become searchable and selectable for use on our site. However, since all of the data used seems to be fluid, how can I update the products I have saved on the site with the new information? How can I match new info to existing products?
I'm hoping that someone has been doing this long enough to be able to say certain fields rarely change and can generally be relied upon, etc. etc.
Thank you!
I have an application which has several unrelated tables in its db. I'll explain by using an "auto-updating" version of the SO homepage as an example, so let's say I have the tables "users", "comments" and "questions".
The homepage client side needs to periodically poll the server, and get a log of all the new "events" that have happened. I.e., I'd like to display (somehow) the new questions, comments and users that have been added to SO since the last poll.
One way would be to simply keep a variable on the client side containing the last index of each of my tables, send it to the server, and have the server send me the new users, comments and questions.
The problem is, what happens when I add a new type of information, say, votes. Now I have to store another variable on the client-side, and the server has to poll another table. And so on, for every new type of information I keep.
I'm looking for a solution that helps me avoid this.
Another problem - say I'd like to see all the "events" that have happened since last time, but sorted according to when they took place.
One direction I had is to have a single "events" table, which contains the info about when each event happened. I can then poll only this table, and get a list of all the new events that have happened. The problem is that each event is pretty different (a new comment has different columns than a new upvote, etc.) So I'm not sure how to implement this, or if this is even a good idea.
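To make the idea concrete, here is a rough sketch of what I have in mind: the "events" table stores only the type of the event and the id of the row it refers to, so the differing columns stay in their own tables. It is written with Python's sqlite3 just to have something runnable; all table, column and function names are made up.

```python
import sqlite3


def make_db():
    """Create two example content tables plus a single shared events log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments  (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE events (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- also gives the event order
            event_type TEXT NOT NULL,                      -- e.g. 'question', 'comment'
            entity_id  INTEGER NOT NULL                    -- id of the row in its own table
        );
        """
    )
    return conn


def add_question(conn, title):
    """Insert a question and log the matching event in the same transaction."""
    with conn:
        cur = conn.execute("INSERT INTO questions (title) VALUES (?)", (title,))
        conn.execute(
            "INSERT INTO events (event_type, entity_id) VALUES ('question', ?)",
            (cur.lastrowid,),
        )
    return cur.lastrowid


def add_comment(conn, body):
    """Insert a comment and log the matching event in the same transaction."""
    with conn:
        cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
        conn.execute(
            "INSERT INTO events (event_type, entity_id) VALUES ('comment', ?)",
            (cur.lastrowid,),
        )
    return cur.lastrowid


def poll(conn, last_event_id):
    """Return every event newer than the one the client saw last, in order."""
    return conn.execute(
        "SELECT id, event_type, entity_id FROM events WHERE id > ? ORDER BY id",
        (last_event_id,),
    ).fetchall()
```

The client would then keep a single number (the last event id it has seen), and adding votes would mean one more table and one more event type, but no new client-side variable.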
Does anybody have any ideas how I can solve this? This seems like something that would come up a lot, but I don't really have much experience with databases, unfortunately.
Thanks!
It sounds to me like you're trying to future-proof via database design. While this can be done through something like an EAV model, I caution against that because the value it adds tends not to be worth the cost.
Instead you should model the database as closely to reality as possible and not how you intend to use it.
Then use SQL to project the data to how you need it. You can do this with statements that either deliver the metadata that you need,
e.g.
SELECT COUNT(ID), 'Comments' Type
FROM Comments
WHERE lastUpdate > #InputParameter1#
UNION
SELECT COUNT(ID), 'Questions' Type
FROM Questions
WHERE lastUpdate > #InputParameter1#
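To show the query above running end to end, here is a small sketch using Python's sqlite3. The table contents are invented sample rows, `counts_since` is a made-up helper name, and the `#InputParameter1#` placeholder is swapped for a named parameter. UNION ALL is used since the two rows can never be duplicates of each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Comments  (ID INTEGER PRIMARY KEY, lastUpdate TEXT);
    CREATE TABLE Questions (ID INTEGER PRIMARY KEY, lastUpdate TEXT);
    -- Invented sample rows; ISO-8601 timestamps compare correctly as text.
    INSERT INTO Comments  (lastUpdate) VALUES ('2024-01-01T10:00:00'), ('2024-01-02T09:30:00');
    INSERT INTO Questions (lastUpdate) VALUES ('2024-01-02T08:00:00');
    """
)


def counts_since(conn, since):
    """Return {table name: number of rows updated after `since`} in one round trip."""
    rows = conn.execute(
        """
        SELECT COUNT(ID), 'Comments' AS Type FROM Comments WHERE lastUpdate > :since
        UNION ALL
        SELECT COUNT(ID), 'Questions' AS Type FROM Questions WHERE lastUpdate > :since
        """,
        {"since": since},
    ).fetchall()
    return {kind: count for count, kind in rows}
```

Adding a new kind of data, such as votes, would then mean adding one more SELECT to the statement and nothing else on the polling side.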
Or (and this doesn't get used often enough) return more than one result set from your database in one go:
SELECT userId, CommentText
FROM Comments
WHERE lastUpdate > #InputParameter1#;
SELECT userId, Questions, Tags
FROM Questions
WHERE lastUpdate > #InputParameter1#
That said, you will still have to write some code if you add new stuff, but it should be limited to updating your SQL, adding new containers for your data, and then code to display it to the end users and to validate and store it.
Honestly the idea of adding new stuff requiring some work doesn't seem that awful to me.