Clean dataset with many different words for one thing


At the moment I have a dataset of ingredients. The problem is that it is not very clean because it contains many different names for the same thing. Here are a few examples:
Mehl (flour) = Weizenmehl (wheat flour), Mehl Type360
or
Eier (eggs) = Eier, Ei(er), Ei (egg)
I thought about deleting those brackets and writing many if statements that look for things like "Mehl", but then I would also have to look for something like "Dinkel" (spelt), because
Dinkelmehl != Mehl
I could do it, but it would be very laborious because the dataset is big. Are there other methods, maybe with a dictionary or something similar? I hope you can help me, thank you!
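The keyword idea does not need a long chain of if statements. A minimal sketch, assuming the names sit in a pandas Series: keep an ordered list of (pattern, canonical name) rules and return the first match, with specific patterns such as "dinkelmehl" placed before general ones such as "mehl". The patterns, canonical names, and the extra value "Reis" (rice) are illustrative assumptions, not a complete rule set.

```python
import re

import pandas as pd

# Ordered rules: the first matching pattern wins, so specific patterns
# ("dinkelmehl") must come before general ones ("mehl").
# These patterns and canonical names are illustrative assumptions.
RULES = [
    (re.compile(r"dinkelmehl"), "Dinkelmehl"),
    (re.compile(r"mehl"), "Mehl"),
    # Anchored so that words such as "Reis" are not mistaken for eggs.
    (re.compile(r"^ei(\(er\)|er)?$"), "Eier"),
]


def normalize(name: str) -> str:
    """Return the canonical name for an ingredient, or the input unchanged."""
    lowered = name.strip().lower()
    for pattern, canonical in RULES:
        if pattern.search(lowered):
            return canonical
    return name  # no rule matched: keep the original value


ingredients = pd.Series(
    ["Weizenmehl", "Mehl Type360", "Dinkelmehl", "Ei(er)", "Eier", "Ei", "Reis"]
)
normalized = ingredients.map(normalize)
print(normalized.tolist())
# ['Mehl', 'Mehl', 'Dinkelmehl', 'Eier', 'Eier', 'Eier', 'Reis']
```

Values that match no rule pass through unchanged, which makes it easy to list them afterwards and extend the rules.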

Frederick!
Yes, you can use the map method from pandas. First, I would suggest cleaning out special characters (!"#$%&/) and then using map with a dictionary for Eier, Ei, Mehl, Tomaten, etc.
Here is the documentation for map: map in df pandas
Best regards
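A minimal sketch of this clean-then-map approach, assuming a DataFrame with a hypothetical `ingredient` column. `Series.map` with a dict returns NaN for values that are not keys, so `fillna` keeps the cleaned name for anything the dictionary does not cover (here the hypothetical "Zucker"). The dictionary entries are illustrative only.

```python
import re

import pandas as pd

# Hypothetical sample data; the column name "ingredient" is an assumption.
df = pd.DataFrame(
    {"ingredient": ["Weizenmehl", "Mehl Type360", "Ei(er)", "Eier", "Ei",
                    "Dinkelmehl", "Tomaten!", "Zucker"]}
)


def clean(name: str) -> str:
    # Drop bracketed parts such as "(er)", then any remaining special
    # characters (!"#$%&/ ...), then collapse extra whitespace.
    name = re.sub(r"\(.*?\)", "", name)
    name = re.sub(r"[^\w\s]", "", name)
    return " ".join(name.split())


# Lookup table: known variant -> canonical name (illustrative entries).
canonical = {
    "Weizenmehl": "Mehl",
    "Mehl Type360": "Mehl",
    "Ei": "Eier",
    "Eier": "Eier",
    "Dinkelmehl": "Dinkelmehl",  # kept separate on purpose: Dinkelmehl != Mehl
    "Tomaten": "Tomaten",
}

df["clean"] = df["ingredient"].map(clean)
# Unmapped values become NaN, so fall back to the cleaned name.
df["canonical"] = df["clean"].map(canonical).fillna(df["clean"])
print(df["canonical"].tolist())
# ['Mehl', 'Mehl', 'Eier', 'Eier', 'Eier', 'Dinkelmehl', 'Tomaten', 'Zucker']
```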

Related

Is there a way to duplicate every value in an excel array?

I am trying to duplicate all values in an array in my sheet. I have {1,6,14,15} and I want to output {1,1,6,6,14,14,15,15}. I would like to do this exclusively with functions. I have seen the VSTACK function, which seems very useful; however, joining the Insider program seems like a hassle and would make this spreadsheet hard to use on other devices.
I have tried the CONCAT function, but it simply returns 161415161415, which is not helpful to me. The various alternatives to VSTACK all remove duplicates, which is exactly what I don't want. Besides, all of those alternatives are lengthy and hard for me to wrap my head around.

You could use EXPAND() here:
=LET(arr,{1,6,14,15},TOROW(IFERROR(EXPAND(arr,2),arr),,1))
Note that the 2 defines how many times each value is repeated in the output.
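How this works: EXPAND pads the one-row array to 2 rows with #N/A, IFERROR replaces those cells with the original values (the one-row array is broadcast down the rows), and TOROW with its third argument set to 1 reads the grid column by column. The same steps, sketched in Python purely as an illustration of the logic (the function name is hypothetical):

```python
from typing import List, Sequence


def duplicate_by_column_scan(values: Sequence[int], times: int = 2) -> List[int]:
    # EXPAND(arr, times) + IFERROR(..., arr): a grid of `times` identical rows.
    grid = [list(values) for _ in range(times)]
    # TOROW(grid, , 1): read the grid column by column into a single row.
    return [grid[row][col] for col in range(len(values)) for row in range(times)]


print(duplicate_by_column_scan([1, 6, 14, 15]))  # [1, 1, 6, 6, 14, 14, 15, 15]
```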

If you have LET and SEQUENCE:
=LET(ζ,{1,6,14,15},INDEX(ζ,SEQUENCE(,2*COUNTA(ζ),,0.5)))
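This version relies on INDEX truncating fractional positions: SEQUENCE(,8,,0.5) yields 1, 1.5, 2, ..., 4.5, and INDEX reads those as 1, 1, 2, 2, 3, 3, 4, 4. A Python sketch of the same index arithmetic, for illustration only (the function name and the `times` parameter, which generalizes the hard-coded 2, are assumptions):

```python
import math
from typing import List, Sequence


def duplicate_by_fractional_index(values: Sequence[int], times: int = 2) -> List[int]:
    # Like SEQUENCE(, times*COUNTA(values), 1, 1/times): positions 1, 1.5, 2, 2.5, ...
    positions = [1 + k / times for k in range(times * len(values))]
    # INDEX truncates fractional positions (1.5 -> 1); Python lists are 0-based.
    return [values[math.trunc(p) - 1] for p in positions]


print(duplicate_by_fractional_index([1, 6, 14, 15]))  # [1, 1, 6, 6, 14, 14, 15, 15]
```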

Erlang, working with Tuples without BIFs

Is there a way to iterate over tuples or get an element of a tuple without using BIFs? I also need to do this without using any other data type, such as lists or maps.
I did a lot of research but found nothing. :(
Thanks!

Ruby: Hash, Arrays and Objects for storage information

I am learning Ruby by reading a few books, tutorials, forums and so on, so I am brand new to this.
I am trying to develop a stock system so I can learn by doing.
My questions are the following:
I created the following to store transactions (just a few parts of the code):
transactions.push type: "BUY", date: Date.strptime(date.to_s, '%d/%m/%Y'), quantity: quantity, price: price.to_money(:BRL), fees: fees.to_money(:BRL)
One colleague here suggested creating a Transaction class to store this.
So, for the next piece of information I needed to store, I did:
@dividends_from_stock << DividendsFromStock.new(row["Approved"], row["Value"], row["Type"], row["Last Day With"], row["Payment Day"])
Now, FIRST question: which way is better, a Hash in an Array or an Object in an Array? And why?
This @dividends_from_stock is returned by the method 'dividends'.
I want to find all the dividends paid after a specific date:
puts ciel3.dividends.find_all {|dividend| Date.parse(dividend.last_day_with) > Date.parse('12/05/2014')}
I get the following:
#<DividendsFromStock:0x2785e60>
#<DividendsFromStock:0x2785410>
#<DividendsFromStock:0x2784a68>
#<DividendsFromStock:0x27840c0>
#<DividendsFromStock:0x1ec91f8>
#<DividendsFromStock:0x2797ce0>
#<DividendsFromStock:0x2797338>
#<DividendsFromStock:0x2796990>
OK, with this I am able to spot (I think) all the objects that have a date later than 12/05/2014. But (SECOND question) how can I get the 'value' (or other information) stored inside the objects?

It is generally better to define classes. Classes have names, which will help you understand what is going on when your program gets big. You can always see the class of each variable like this: var.class. If you use hashes everywhere, you will be confused because these calls will always return Hash. But if you define classes for things, you will see your class names.
Define methods in your classes that return the information you need. If you define a method called to_s, Ruby will call it behind the scenes on the object when you print it or use it in an interpolation (puts "Some #{var} here").

You probably want a first-class model of some kind to represent the concept of a trade/transaction and a list of transactions that serves as a ledger.
I'd advise steering closer to a database for this instead of manipulating toy objects in memory. Sequel can be a pretty simple ORM if used minimally, but ActiveRecord is often a lot more beginner-friendly and has fewer sharp edges.
Using naked hashes or arrays is good for prototyping and seeing if something works in principle. Beyond that it's important to give things proper classes so you can relate them properly and start to refine how these things fit together.
I'd even start with TransactionHistory being a class derived from Array where you get all that functionality for free, then can go and add on custom things as necessary.
For example, you have a pretty gnarly interface to DividendsFromStock, which could be cleaned up by having the initialize method accept that format of row as-is.
Don't forget to write a to_s or inspect method for any custom classes you want to be able to print or have a look at. These are usually super simple to write and come in very handy when debugging.

Thank you!
I will answer my own question, based on the information provided by tadman and Ilya Vassilevsky (and also B. Seven).
1- It is better to create a class and objects. It will help me organize and debug my code, and keep track of who is who and what each part is doing. It also seems better suited to working with a database.
2- I am a little embarrassed by my question after figuring out the solution. It is far simpler than I thought. It just needed two steps:
willpay = ciel3.dividends.find_all {|dividend| Date.parse(dividend.last_day_with) > Date.parse('10/09/2015')}
willpay.each do |dividend|
  puts "#{ciel3.code} has approved #{dividend.type} on #{dividend.approved} and will pay by #{dividend.payment_day} the value of #{dividend.value.format} per share, for those that had the asset on #{dividend.last_day_with}"
  puts
end

Solr find all ids that start with certain path

I have a number of IDs that look like this:
/content/myProject/path1
/content/myProject/path1/page1
/content/myProject/path2
Now, I want to find all the children of path1, so I do /content/myProject/path1/*.
The problem is that I also get /content/myProject/path2 back. How do I write the correct query?
Thanks,
Peter

Sounds like you are using a generic text tokenizer definition. You may want to look at PathHierarchyTokenizer instead. It is designed to split at the path prefixes, and then you will not need the * at the end.
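Roughly, PathHierarchyTokenizer indexes every prefix of a path as its own token, so a plain term query for /content/myProject/path1 matches that document and everything below it, but not path2. The Python sketch below only emulates that tokenization idea to show why; it is not Solr code, and the helper names are hypothetical.

```python
from typing import Dict, List, Set


def path_prefixes(path: str, delimiter: str = "/") -> List[str]:
    # One token per prefix, e.g. "/a/b/c" -> ["/a", "/a/b", "/a/b/c"].
    parts = [part for part in path.split(delimiter) if part]
    return [delimiter + delimiter.join(parts[: i + 1]) for i in range(len(parts))]


ids = [
    "/content/myProject/path1",
    "/content/myProject/path1/page1",
    "/content/myProject/path2",
]
# Toy "index": document id -> set of tokens produced for it.
index: Dict[str, Set[str]] = {doc_id: set(path_prefixes(doc_id)) for doc_id in ids}

query = "/content/myProject/path1"
matches = [doc_id for doc_id, tokens in index.items() if query in tokens]
print(matches)  # ['/content/myProject/path1', '/content/myProject/path1/page1']
```

Note that path1 itself also matches; if only its children are wanted, the parent id would have to be excluded separately.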

CakePHP philosophical question

So I have a lot of different model types: comments, posts, reviews, etc. I want to join them into one integrated feed. Is there a CakePHP-style way of merging all this data for display, ordered by timestamp?
There are a lot of messy ways to do this but I wonder if there is some standard way which I am missing. Thanks!

Since the items are from different tables, it's difficult to retrieve them sorted together from the database in any case. Depending on how well organized your data is though, something as non-messy as this should do:
$posts = $this->Post->find(...);
$reviews = $this->Review->find(...);
$comments = $this->Comment->find(...);
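$feed = array_merge($posts, $reviews, $comments);
// Each record looks like array('Post' => array(...)); current() returns the inner model array.
usort($feed, function ($a, $b) {
    $a = current($a);
    $b = current($b);
    // Oldest first; swap $a and $b for newest first.
    return strtotime($a['created']) - strtotime($b['created']);
});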

philosophical? lol
No, I don't think there is one, although you could write an afterSave() in app_model: check for the data you're looking for and, if it is found, put it in the Cache. It will probably be messy, but at least it's in one place and doesn't affect performance much.

There's definitely no way to merge the data automatically, but instead of firing separate queries for each relationship, you can grab it all at once using CakePHP's Containable behavior.
