Looping Structure in R for mtcars - loops

I have homework. I am very new to R and skipped an introductory course, which I now realise was maybe not a good idea. I just need help with these questions.
Using the “mtcars” dataset in R, answer the following questions. For each question you need to provide the exact R code in RStudio. You should provide a copy of the plots (make a screenshot of the R plot).
a) Create a scatterplot with any chosen variable with an appropriate title and labels. (You need to add the R code here.)
b) Use a loop structure to show 4 different plots with different colours and symbols. All 4 plots should appear in one window.
I have no idea where to even begin.
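One possible starting point, as a minimal sketch rather than a definitive answer: base R's plot() covers part a), and par(mfrow = c(2, 2)) combined with a for loop covers part b). The choice of variables (wt on the x axis, four other columns on the y axis), the colours, and the plotting symbols are assumptions picked only for illustration.

```r
# mtcars ships with base R (datasets package).
data(mtcars)

# a) One scatterplot with a title and axis labels.
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# b) Four plots in one window, drawn in a loop.
y_vars <- c("mpg", "hp", "qsec", "disp")          # assumed choice of columns
cols   <- c("red", "blue", "darkgreen", "purple") # one colour per plot
syms   <- c(16, 17, 15, 18)                       # one symbol (pch) per plot

op <- par(mfrow = c(2, 2))  # 2 x 2 grid; keep the old settings to restore later
for (i in seq_along(y_vars)) {
  plot(mtcars$wt, mtcars[[y_vars[i]]],
       col = cols[i], pch = syms[i],
       main = paste(y_vars[i], "vs. wt"),
       xlab = "wt", ylab = y_vars[i])
}
par(op)  # restore the previous graphics settings
```

Restoring the graphics settings with par(op) at the end keeps later plots from being squeezed into the same 2 x 2 grid.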

Related

What is the best way to load a large folder of files into Julia to compare the columns of each file?

I'm new to Julia and programming in general, so this is a two-part question. Suppose I have a folder with 3,000 CSV files. Each file is roughly 7,000 x 7. (The number of rows may vary from file to file, but the number of columns is constant.) I am trying to read each of these files into a 3000 x N x M tensor or other data structure in Julia to compare the outputs by column. (This would mostly involve comparing the sum of the lags in each column vector of each file.)
Question 1: What is the most efficient data structure to parse through this data? I would essentially be calculating the max of the sum of the lags of each column for all files. I've been told by a more experienced user that I should be using NamedArrays for this. I was wondering if anyone could provide some insights as to why? Would DataFrames be able to perform similar calculations?
Question 2: Is there an efficient way to read all these files into NamedArrays? I can read these files into DataFrames with Glob using the following code:
using CSV, DataFrames, Glob
Folder = "/Users/Desktop/Data"
Files = glob("*.csv", Folder)
df = DataFrame.(CSV.File.(Files))
But I don't know how to read it into NamedArrays directly. Any insights would be greatly appreciated thanks!

I've got a pipe that consists of 5 pieces, each including 5 properties

Inlet -> front -> middle -> rear -> outlet
Each of those five properties has a value anywhere between 4 and 40. I want to find a match where, when a single property is summed across all the pipe pieces, each sum lands on either a full 10 or a 5 (that is, a multiple of 5). There might be hundreds of different pipe pieces, all with different properties.
So if I have all 5 pieces and their summed properties come out as 54, 51, 23, 71, 37, that is not good and not what I'm looking for.
Instead, 55, 50, 25, 70, 40 would be perfect.
My trouble is that there are so many pieces that it would be insane to do the matching manually, and new ones come up frequently.
I have manually inserted about 100 of these into SQLite already, but they should be easy to convert into Excel or other database formats, so an answer can be related to anything like MySQL or Google Sheets.
I need a calculation that takes every piece into account and results either in "no match" or tells me the id of each piece that is required for a match, and if multiple matches are available, it separates them.
Edit: Even just the math needed to do this kind of calculation would be a lot of help here; I'm not much of a math guy myself. I guess there should be a reference piece I need to use, and then that gets checked against every possible scenario.
If the value you want to verify is in A1, use: =ROUND(A1/5,0)*5
If the pipes may not be shorter than the given values, use =CEILING(A1,5)
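The same check can be sketched outside a spreadsheet. The R snippet below is only an illustration under stated assumptions: the piece ids, the position column, and all property values are made up, and the search simply tries every combination of one piece per position, which grows multiplicatively with the number of pieces. A sum is a multiple of 5 exactly when sum %% 5 == 0, which is the same test as comparing the value to ROUND(A1/5,0)*5.

```r
# Hypothetical pieces: two inlet and two front candidates, one of each other.
pieces <- data.frame(
  id       = c("inA", "inB", "fr1", "fr2", "mid1", "rear1", "out1"),
  position = c("inlet", "inlet", "front", "front", "middle", "rear", "outlet"),
  p1 = c(12, 11, 10,  9, 12, 11, 10),
  p2 = c( 8,  9, 10, 10,  9, 11, 12),
  p3 = c( 5,  4,  5,  5,  4,  6,  5),
  p4 = c(12, 13, 15, 15, 14, 16, 13),
  p5 = c( 8,  5,  8,  8,  7,  9,  8),
  stringsAsFactors = FALSE
)
props     <- c("p1", "p2", "p3", "p4", "p5")
positions <- c("inlet", "front", "middle", "rear", "outlet")

# One column per position, one row per possible combination of piece ids.
by_position <- split(pieces$id, factor(pieces$position, levels = positions))
combos <- expand.grid(by_position, stringsAsFactors = FALSE)

# A combination matches when every property sum is a multiple of 5.
is_match <- apply(combos, 1, function(ids) {
  sums <- colSums(pieces[match(ids, pieces$id), props])
  all(sums %% 5 == 0)
})
matches <- combos[is_match, , drop = FALSE]

if (nrow(matches) == 0) print("no match") else print(matches)
```

With these made-up numbers only the combination inA, fr1, mid1, rear1, out1 qualifies, with sums 55, 50, 25, 70, 40. For hundreds of pieces per position a full enumeration may become too slow, and some form of pruning would be needed.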

Algorithm to match three sorted lists

So I've been trying to solve this for some hours now, but apparently there's still something missing. Maybe I'm thinking the wrong way, but I think it is a very complex problem:
I have three lists with items in a fixed order. To explain the problem, assume they contain items A to Z, mostly in the same order, with some exceptions where items are in different positions. Also, only one list contains all items; the others contain a subset and are missing certain items. A solution for this case would be sufficient, but it is also possible that no list contains all items and the sets only partly overlap. Even better would be an algorithm that solves the problem for multiple (> 3) lists.
So here's the example:
List 1: A B C D E F G H I J
List 2: A C D B F G
List 3: B C D E H F G
Now what I want is to match these three lists to visualize where the sort order is different and where are items that are missing. So the result should be:
List 1: A B C D E F G H I J
List 2: A C D B F G
List 3: B C D E H F G
So I immediately see that List 2 has a B in the wrong position and that A is missing from List 3, which also has H in the wrong position.
I was thinking about storing the result in a CSV to import into Excel. So the rows are:
A,A,
B,,B
C,C,C
...
Now my question is: how do I match the lists that way to generate the CSV output? The language I use is Java. So far I failed with the problem that a list other than the reference list contains items earlier, which appear later in the reference list.
This is by the way a real-world problem.
Any suggestions are appreciated.
There are off-the-shelf tools for solving this problem, such as the Unix tool diff3. Trying to solve it for arbitrary numbers of lists is not advisable unless you are willing to invest a lot of time in developing heuristics, as you are then dealing with the NP-hard general case of the longest common subsequence problem.
If I understand your question correctly, you are essentially trying to solve a multiple sequence alignment problem, which is a well-researched topic within bioinformatics. There are several algorithms for it, some of which are based on the concept of Levenshtein distance (which would solve a two-array version of your problem) - I suggest you start there.
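As a rough illustration of the two-list case mentioned above, here is a minimal longest common subsequence (LCS) sketch in R (the asker works in Java, but the dynamic-programming table is the same in any language). It is not the multiple sequence alignment itself: it only finds the items that keep their relative order between a reference list and one other list, so that whatever is left over can be flagged as moved. The function name lcs is hypothetical.

```r
# Longest common subsequence of two character vectors (dynamic programming).
lcs <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # len[i + 1, j + 1] holds the LCS length of a[1..i] and b[1..j].
  len <- matrix(0L, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      len[i + 1, j + 1] <- if (a[i] == b[j]) {
        len[i, j] + 1L
      } else {
        max(len[i, j + 1], len[i + 1, j])
      }
    }
  }
  # Walk back through the table to recover one LCS.
  out <- character(0)
  i <- n
  j <- m
  while (i > 0 && j > 0) {
    if (a[i] == b[j]) {
      out <- c(a[i], out)
      i <- i - 1
      j <- j - 1
    } else if (len[i, j + 1] >= len[i + 1, j]) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }
  out
}

list1 <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
list2 <- c("A", "C", "D", "B", "F", "G")
list3 <- c("B", "C", "D", "E", "H", "F", "G")

in_order2 <- lcs(list1, list2)       # items of List 2 that follow List 1's order
moved2    <- setdiff(list2, in_order2)  # items of List 2 that are out of place
moved3    <- setdiff(list3, lcs(list1, list3))
```

For the example lists this flags B as out of place in List 2 and H as out of place in List 3, which agrees with the observations in the question. Items missing from a list can be found separately with setdiff(list1, list2).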

How to format data in (a) CSV file(s) so that it can easily be imported in R?

Edit:
So, this format would work:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
1 2 5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But what if I have two columns with multiple values that are linked? Say the column quality has a machine and a quality linked, and the column looks like this:
MachineQuality
[[{1:1224}, {2:3453}], [{1:2242}, {2:4142}]]
Now if I want to split that up like I did with the coordinates of the convexhull I would need 2 rows instead of 1. But wouldn't I need 2 rows for every row that is already in (so 4, because there are already 2 extra for the coordinates) like this:
featureID charge xcoordinate ycoordinate quality1 quality2
1 2 5105.9217 336.125209180674 1224 3453
1 2 5105.9217 336.125209180674 2242 4142
1 2 5108.7642 336.124751115092 1224 3453
1 2 5108.7642 336.124751115092 2242 4142
[...]
Would it have to be like this?
I'm very new to R, my knowledge doesn't go much further than knowing how to make a vector and some simple plots. I'm going to use R for an internship project the next couple of months and during this time I will (hopefully) learn some of the ins and outs of R. However, before I start I need to produce the data that I'm going to do the statistics on. I need to know beforehand how I should format my output CSV data so that I can easily read it in once I start my R analysis.
One thing that I've been asked to do is make a CSV file out of the data so that it can be read in by R. The example CSV files for importing with R that I've seen all look like this
featureID Charge value
1 2 10
2 0 9
However, my data mostly consists of columns in which each entry contains multiple values. To clarify:
As an example, my data consists of "features" that, among other information, have a "convexhull". This convexhull consists of paired x and y coordinates. So what I could have for data is (only showing two coordinates, there can be many)
featureID Charge Convexhull
1 2 [[{'y': '336.125209180674'}, {'x': '5105.9217'}], [{'y': '336.124751115092'}, {'x': '5108.7642'}]]
Is it possible to get this into one CSV file and read it into R correctly (so that the paired x and y coordinates are preserved)? If so, what should the CSV file look like? For example, I've seen examples of CSV files with multiple values that look like this:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But I can't find if this is easily imported by R.
If this is not doable in one CSV file, are the CSV files easily imported independently, with a primary key idea, like database linking?
The only critical things are that you have a unique character separating your data columns and that each column is the same length. As long as the second row in your last example is filled in that will import fine.
You need to consider what you want to do with the data after it's in R to decide how you might want any other special formatting beforehand. But, as long as the column separator is a unique character and the columns are of equal length then it will import.
(You can violate the unique separator requirement if your entries are wrapped in quotes. And if you want to get really fancy you could "import" almost anything. But if someone's asking you to format the data then they probably want a rectangular data.frame compatible layout. They probably want unique values in each column (no columns of points). But that's between you and them.)
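To make the rectangular layout concrete, here is a small sketch of importing the long form from the question with read.csv. The comma separator is an assumption (the question shows space-separated columns), and the text is read from a string here only to keep the example self-contained; a real file would be passed as a path instead.

```r
# Long form: one row per (feature, coordinate pair), every cell filled in.
csv_text <- "featureID,charge,xcoordinate,ycoordinate
1,2,5105.9217,336.125209180674
1,2,5108.7642,336.124751115092
2,0,2434.9217,145.893331325278"

features <- read.csv(text = csv_text)
str(features)

# The paired x and y values stay together on each row, so the hull points
# of one feature can be recovered by grouping on featureID.
hull_points <- split(features[, c("xcoordinate", "ycoordinate")],
                     features$featureID)
hull_points[["1"]]
```

Because each row carries its own featureID and charge, nothing has to be inferred from blank cells when the file is read.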
long vs. wide form. Your last example is known as long form (except all cells should be filled in) and your first example is roughly wide form as discussed on the ?reshape page and illustrated in the examples at the end of that page. You likely want to stick with long form. For an alternative see the reshape2 package.
save & load. Note that if you are only writing it out to read it back in to R later (as opposed to communicating it to some other software) you could use save and load which don't require any change to the object at all.
json. Another possibility, given the form of your example, is that you might want to look at the rjson package.
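For the save and load route mentioned above, a minimal round trip might look like the following. The object name hull and the temporary file path are assumptions made for illustration.

```r
# A small object to store; any R object works the same way.
hull <- data.frame(
  featureID = c(1, 1),
  x = c(5105.9217, 5108.7642),
  y = c(336.125209180674, 336.124751115092)
)

path <- tempfile(fileext = ".RData")  # hypothetical location
save(hull, file = path)               # write the object as-is

original <- hull
rm(hull)                              # simulate a fresh session

load(path)                            # restores the object under its old name
identical(hull, original)
```

Note that load brings the object back under the name it was saved with, so nothing about its structure or column types needs to be re-parsed.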

Saving numbers to file and plotting them, in a loop

Given the following two loops, I want to save G.temp and S.temp to file and plot them at a later stage.
for (p in seq_along(time(S))) {
  G.temp <- window(G, start = p, end = p + 1)
  S.temp <- window(S, start = p, end = p)
  print(max(as.numeric(G.temp) - as.numeric(S.temp)))
}
You may want to read one of the introductions to R to learn about data types into which you can write your temporary results. Then use a standard function like write.table to export it.
Your installation of R came with this introduction to R which will also cover basic plotting questions.
For your second question, data import/export is covered in the R Data Import/Export manual that also came with R.
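As a sketch of the pattern described above: collect the value from each iteration in a vector, export it with write.table, and read it back later for plotting. The series G and S below are made-up stand-ins (the question does not show the real data), and the output file is a temporary file chosen for illustration. The example stores the maximum difference that the original loop prints; the window values themselves could be collected in a list in the same way.

```r
# Hypothetical time series; G has one more observation so that the
# window from p to p + 1 exists for every p.
S <- ts(c(1, 3, 2, 5), start = 1)
G <- ts(c(2, 4, 6, 8, 10), start = 1)

max_diff <- numeric(length(S))  # pre-allocate one slot per iteration
for (p in seq_along(time(S))) {
  G.temp <- window(G, start = p, end = p + 1)
  S.temp <- window(S, start = p, end = p)
  max_diff[p] <- max(as.numeric(G.temp) - as.numeric(S.temp))
}

# Export the results to a file.
out_file <- tempfile(fileext = ".txt")
write.table(data.frame(p = seq_along(max_diff), max_diff = max_diff),
            file = out_file, row.names = FALSE)

# At a later stage: read the file back and plot it.
restored <- read.table(out_file, header = TRUE)
plot(restored$p, restored$max_diff, type = "b",
     xlab = "p", ylab = "max(G.temp - S.temp)")
```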
