What's the best way to display a large amount of data in a Table View? - database

I'm a newbie to app development. I'm using Xcode 4.3.2. I'm attempting to develop an app using a tab bar with a table view. In the table view I need to list about 100 cities and show info about each city when the user selects it. I already have the data about the cities in an Excel spreadsheet.
I can't really find good examples of what I want to achieve. I've heard the terms parsing XML, SQLite, Core Data, database, etc, and I'm not sure if that is what I need to do.
I'd gratefully accept any suggestions.

If the data in the table will change or be edited, then using a database lets you avoid shipping a new patch for those minor changes (you just change the values in the DB).
If the data is static and won't change for a long time, and you plan to patch the application anyway, then you just need a source for that data (the spreadsheet).
For parsing the data you can use almost anything. When talking about showing 100 cities, it depends on how big the total data set you will be querying is and how fast it needs to be; you just need to benchmark it.
If you are querying about 500k records, need to do some 'figuring out', and it takes too long to load, then transforming your data into XML and parsing that may give you better performance.
You have to at least design your way towards what you want to achieve, check the performance, and tweak it to find the sweet spot.
Right now I look at it as tackling an unknown problem. Spend some time and build something; this will help you see the potential problems better.

While databases are good, for a few hundred elements you can tolerate inefficiency. If your existing data are in an Excel spreadsheet, the easiest way to get them into your app is to export the Excel spreadsheet to Comma-Separated-Values (CSV), then make your app read CSV files. (If your Excel spreadsheet has multiple worksheets, you'll need to convert each separately.)
How do you parse CSV? See iPhone : How to convert CSV format into NSData or NSString?
You'll end up with arrays of arrays of NSString. You'll probably need to define a new class for your city data, and convert each row in the imported data to one city element.
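For illustration, here is a minimal sketch of that row-to-object conversion. It's in Python rather than Objective-C, and the three-column layout (name, population, description) is an assumption; the same shape applies whatever CSV parser you end up using:

import csv

class City:
    # Simple value object for one row of the exported spreadsheet.
    def __init__(self, name, population, description):
        self.name = name
        self.population = population
        self.description = description

def load_cities(path):
    cities = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row, if your export has one
        for row in reader:
            # Adjust the indices to match your spreadsheet's columns.
            cities.append(City(row[0], row[1], row[2]))
    return cities

Each City instance then backs one table-view row, and the selected instance supplies the detail screen.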
If you need to know more, posting a few rows from your spreadsheet may help.

Related

Without a database? [Advice needed]

For a school project we need to store a lot of weather data: 8000 rows per second are supposed to be inserted. This data also needs to be queried fast, with filters, groups, sorting and limiting.
But we are not allowed to use an existing database management system like MySQL or Mongo.
Currently we are thinking about creating a new file every second with 8000 lines each. But reading the data back from all those files just to generate a top-10 list would not be fast, so fast querying is the main problem.
Any recommendations are welcome.
EDIT:
We need to store 8000 rows per second containing the following values from XML files we receive. We can already parse and store 8000 of these XML files per second in flat files (CSV style), but we are not able to do any fast querying. An example query would be: get the highest 100 temperatures from the latest 8 million rows.
<MEASUREMENT>
<STN>123456</STN>
<DATE>2009-09-13</DATE>
<TIME>15:59:46</TIME>
<TEMP>-60.1</TEMP>
<DEWP>-58.1</DEWP>
<STP>1034.5</STP>
<SLP>1007.6</SLP>
<VISIB>123.7</VISIB>
<WDSP>10.8</WDSP>
<PRCP>11.28</PRCP>
<SNDP>11.1</SNDP>
<FRSHTT>010101</FRSHTT>
<CLDC>87.4</CLDC>
<WNDDIR>342</WNDDIR>
</MEASUREMENT>
How about using SQLite? You can embed it in your project and tune it for performance.
Also I would check here: https://en.wikipedia.org/wiki/Embedded_database
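A minimal sketch of what that could look like, assuming a single table with a few of the column names from the XML above; batching inserts into transactions is what makes thousands of inserts per second feasible:

import sqlite3

conn = sqlite3.connect("weather.db")
conn.execute("PRAGMA journal_mode = WAL")    # better write throughput
conn.execute("PRAGMA synchronous = NORMAL")  # trade some durability for speed
conn.execute("CREATE TABLE IF NOT EXISTS measurement "
             "(stn INTEGER, date TEXT, time TEXT, temp REAL)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_temp ON measurement(temp)")

def insert_batch(rows):
    # rows: list of (stn, date, time, temp) tuples parsed from the XML.
    # One transaction per batch, not one per row.
    with conn:
        conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)

def top_temperatures(n=100):
    return conn.execute("SELECT stn, date, time, temp FROM measurement "
                        "ORDER BY temp DESC LIMIT ?", (n,)).fetchall()

The index on temp lets the example query ("highest 100 temperatures") avoid a full scan; restricting it to the latest N rows could be done with a subquery over rowid.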
Try out ClickHouse (https://clickhouse.yandex/) :)
It's kinda overkill, but it will work very fast!
EDIT:
You are not allowed to use ANY database? If so, then use flat files.
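If flat files really are the only option, the example query ("highest 100 temperatures") can at least be answered in one streaming pass with a bounded heap, so memory stays proportional to the result size no matter how many files there are. A sketch, assuming CSV files with TEMP in the fourth column:

import csv
import heapq
from pathlib import Path

def top_temperatures(data_dir, n=100):
    heap = []  # min-heap holding only the n hottest rows seen so far
    for path in sorted(Path(data_dir).glob("*.csv")):
        with open(path, newline="") as f:
            for row in csv.reader(f):
                temp = float(row[3])  # TEMP column position is assumed
                if len(heap) < n:
                    heapq.heappush(heap, (temp, row))
                elif temp > heap[0][0]:
                    heapq.heapreplace(heap, (temp, row))
    return sorted(heap, reverse=True)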

Adding new Excel files to MS Access database as they come in

I am in the situation where I have a questionnaire that is basically just a plain excel spreadsheet with two columns:
one column with the questions and
a second column next to it where users can fill in their answers.
Each respondent has been sent a copy of the file and they will email back their files individually over a long time period. I can't wait until I have all the files back; instead I would like to collect (and use) the data in Access as the files come in.
Two questions:
What is the best setup in terms of the manual steps required when a new data file comes in? Can one just save the file in a specific folder and somehow have the column (column B) with responses "automatically" added to the main database? If not fully automatically, what could be done with just a few manual steps involved?
I realize that the shape of the questionnaire is not ideal (variables are in rows, not in columns). What's the best way to deal with that?
Thanks in advance for any pointers!
PS: I'd be open to (simple) alternatives, if Access is not the best choice for this. Analysis of the data will be done in Excel again in the end.
Update, to clarify the questions below:
1) In the short to medium term, we are expecting 50-100 replies. In the long term, it will be more, as people will be asked to send updates when their situation changes; these will have to be added as new entries with a new date attached to them. I.e. it will be a continuous process with a few answers coming in every few weeks.
2) There are 80 questions on the questionnaire.
3) The Excel files come back as email attachments.
4) I was contemplating using Access, as I thought it would a) make things a bit cleaner and less error prone, especially as project managers might change in the future, b) allow for better handling of the data, as it will have to be mashed up and reshaped in different ways for the analysis (e.g. it has to be un-pivoted, which I don't even know if Excel can do), and c) give us more flexibility in the future when it comes to using different tools for analysis, i.e. each tool can just query the database. I am open to other suggestions, including Excel-only solutions, if that makes it easier, though.
5) I envision the base table to have all 80 variables in different columns, and the answers as rows (i.e. each new column that comes with each Excel file will need to be transposed and added as a new row). There will be other data tables with the same primary key as the row identifier in this table.
6) I haven't worked on the analysis part yet, but I know that it will require a lot of reshaping and merging of data sets.
Answer 1 - Questions
You do not provide enough information to allow anyone to give you pointers. Some initial questions:
How many questionnaires are you expecting: 10, 100, 1000?
How many questions are there per questionnaire?
How are the questionnaires reaching you? You say "email back". Does this mean as an attachment or as a table in the body of the email?
You say the data is arriving as Excel files and you intend to do the analysis in Excel. Why are you storing the answers in Access? I am not saying you are wrong to store the results in Access; I just want to be convinced you have a reason.
Have you designed the planned table structure for Access?
Have you designed the structure of the Excel workbook(s) on which you will perform the analysis?
Answer 2
Firstly, I should say that I agree with Mat. I am not an expert on questionnaires but my understanding is that there are companies that will host online questionnaires and provide the results in a convenient form.
Most of the rest of this answer assumes it is too late to consider an online questionnaire or you have, for whatever reason, rejected that approach.
An Access project is, to a degree, self-documenting. You can look at its list of tables and see that Table 1 has columns A, B and C. If created properly you can see the relationships between tables. With an Excel workbook you just have a number of worksheets which can contain anything. There is no automatic documentation.
However, with both Excel and Access the author can create complete documentation that explains each table, worksheet, report and macro. If this project is going to last indefinitely and have a succession of project managers, such documentation will be essential. I can tell you from bitter experience that trying to understand a complex Access project or Excel workbook that you have inherited without proper documentation is at best difficult and at worst impossible.
Don’t even start this unless you plan to create and maintain proper documentation. I do not mean: “We will knock up something when we have finished.” Once it is finished, people will be moving on to their next projects and will have little time for boring stuff like documentation. After-the-event documentation also loses all the decisions and the reasons for those decisions. The next team is left wondering why their predecessors did it that way. The reason will not matter in many cases, but I have seen a product destroyed by a new team removing “unnecessary complexity” they did not understand.
I always kept a notebook in which I recorded what I was doing, and why, during the day. I encouraged my staff to do the same. I insisted on something for the project log every week. The level of detail depends on the project. The question I asked myself was: “If I had just inherited this project, what happened during the last week that I would need to know?” This was in addition to an up-to-date specification for each component.
Sorry, I will get off my hobby-horse.
“In the short to medium term, we are expecting 50-100 replies. In the long term, it will be more, as people will be asked to send updates when their situation changes; these will have to be added as new entries with a new date attached to them.”
If you are going to keep a history of answers then Access will probably be a better repository than Excel. However, who is going to maintain the Access project and the central Excel workbooks? Access does not operate in the same way as Excel. Access VBA is not quite the same as Excel VBA. This will not matter if you are employing professionals experienced in both Access and Excel. But if you are employing amateurs who are picking up the necessary skills on the job then using both Access and Excel will increase what they have to learn and the likelihood that they will get confused.
If there are only 100 people/organisations submitting responses, you could merge responses and maintain one workbook per respondent to create something like:
            Answers -->
Question    1May2014    20Jun2014    7Nov2014
Aaaaaa      aa          bb           cc
Bbbbbb      dd          ee           ff
I am not necessarily recommending an Excel approach but it will have benefits in some circumstances. Personally, unless I was using professional programmers, I would start with an Excel only solution until I knew why I needed Access.
“I envision the base table to have all 80 variables in different columns, and the answers as rows (i.e. each new column that comes with each Excel file will need to be transposed and added as a new row).” I interpret this to mean a row will contain:
Respondent identifier
Date
Answer to Q1
Answer to Q2
...
Answer to Q80.
My Access is very rusty. Is there a way of accessing attribute “Answer to Q(n)” or are you going to need 80 statements to move answers in and out? I hope there is no possibility of new questions. I found updating the database when a row changed a pain. I always favoured small rows such as:
Respondent identifier
Date
Question number
Answer
There are disadvantages to having lots of small rows but I always found the advantages outweighed them.
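To make the transpose step concrete, here is a sketch (Python, with a hypothetical layout where the answers arrive as one column per returned file) of how one questionnaire column un-pivots into those small rows, ready for import:

import csv

def unpivot(respondent_id, date, answers, out_path):
    # answers: the 80 values from column B of one returned
    # questionnaire, in question order.
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        for question_number, answer in enumerate(answers, start=1):
            writer.writerow([respondent_id, date, question_number, answer])

Each call appends 80 narrow rows (respondent, date, question number, answer), which matches the small-row design above and sidesteps the 80-statement problem.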
Hope this helps.

Silverlight Isolated Storage and loading big files

In a Windows Phone 7 application, I would like to query a big XML file (a list of cities) stored using Isolated Storage. If I do it this way, will the whole file be loaded into memory (> 5 MB)? If so, what other solution do I have?
Edit:
More details. I want to use AutoCompleteBox (http://www.jeff.wilcox.name/2008/10/introducing-autocompletebox/), but instead of using a web service (this is fixed data, no need to be online), I want to query a file/database/isolated storage... I have a fixed list of cities. I said in the comments it's 40k, but it finally seems closer to 1k rows.
Instead of using isolated storage for this, would it be an option for you to use a web service... or are you designing your app for an offline approach?
Querying a web service (WCF or a JSON-enabled one) is really simple, and will be easier for you to maintain :)
Rather than having one big file containing all the data, can you not break it down into lots of smaller files (one for each city)?
You could have a separate file to keep an index of them all if need be. Alternatively, depending on the naming of the files, you may be able to use IsolatedStorageFile.GetFileNames to get a list of all files.
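A sketch of that per-city-file layout, in Python for brevity (on the phone you would enumerate the store with IsolatedStorageFile.GetFileNames instead of a directory glob):

from pathlib import Path

def city_names(store_dir):
    # One file per city; the file name itself is the index entry.
    return sorted(p.stem for p in Path(store_dir).glob("*.txt"))

def load_city(store_dir, name):
    # Only the selected city's data is ever read into memory.
    return (Path(store_dir) / (name + ".txt")).read_text(encoding="utf-8")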
I would create my own file format, using, for example, a separator between fields, with one row for each record.
That way you can read your file line-by-line to fill your data structure with these advantages:
no need to pull the whole file into memory
no XML overhead (in a desktop application the extra bytes may not be a problem, but in the phone context a 5 MB file can shrink quite a bit once the tags are gone)
Dumb example:
New York City; 12345
Berlin; 25635
...
EDIT: given that the volume is not that large, you don't need any form of indexing or loading on demand. I would store the cities as stated above (one record per line), load them into a list and use LINQ to select the items you need. This will probably be fast and keep your application very responsive.
In this case, in my opinion, XML is not the best tool for the job. Your structure is very simple and storing in XML would probably double the file size, which is a concern for a mobile device, and would also slow the parsing, also a concern in this case.
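A sketch of that load-and-filter approach (in Python for brevity; the original context is C#/LINQ, but the shape is identical), feeding an autocomplete from the "Name; Code" lines shown above:

def load_cities(path):
    cities = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, code = line.rsplit(";", 1)  # one "Name; Code" record per line
            cities.append((name.strip(), code.strip()))
    return cities

def complete(cities, prefix):
    # The LINQ-style Where(...StartsWith(prefix)) selection, as a filter.
    prefix = prefix.lower()
    return [c for c in cities if c[0].lower().startswith(prefix)]

With roughly 1k rows the whole list fits comfortably in memory and the filter can run on every keystroke without noticeable lag.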

Fastest way to store/retrieve a dictionary - SQL, text file...?

I've got a text file of words and word frequencies. It's very large - theoretically we're talking millions of rows.
I just want to retrieve values from the file, and do it as quickly and efficiently as possible (for a web app, in Django).
My question is: what is the best way to store and retrieve the values? Should I import them into SQL? Or keep the file and use grep? Or put them into a JSON dictionary...? Or some other way?
Would be very grateful for advice!
Putting them in a JSON dictionary would be a bad idea unless you want to load the entire thing into memory when you search through it.
SQL is basically built for this kind of thing, so I would use that. A file and grep would also work, but you wouldn't gain any of the benefits of indexing etc. that SQL would give you.
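Since the question mentions Django (Python), a minimal SQLite sketch of the import-once, index-by-word approach (the tab-separated "word<TAB>count" file layout is an assumption; adjust the split to your actual format):

import sqlite3

conn = sqlite3.connect("frequencies.db")
conn.execute("CREATE TABLE IF NOT EXISTS freq "
             "(word TEXT PRIMARY KEY, count INTEGER)")

def import_file(path):
    with open(path, encoding="utf-8") as f, conn:
        conn.executemany("INSERT OR REPLACE INTO freq VALUES (?, ?)",
                         (line.rstrip("\n").split("\t") for line in f))

def lookup(word):
    # PRIMARY KEY gives an index, so this stays fast at millions of rows.
    row = conn.execute("SELECT count FROM freq WHERE word = ?", (word,)).fetchone()
    return row[0] if row else 0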

Advantages of keeping to a protocol for a data model

The question title is probably not quite right, because part of my question is to try to get a better understanding of the problem.
I am looking for the advantages of making sure that data imported into a database (simple example: an Excel table into an Access database) follows an agreed schema and is also valid against the business requirements.
I have an Excel table containing non-normalised data and an Access database with normalised tables.
The Excel table comes from multiple third parties, none of which stick to the same format as each other or as the database.
Some of the sources also do not supply all the relevant data.
Example of what could be supplied
contact_key, date, contact_title, reject_name, reject_cost, count_of_unique_contact
count_of_unique_contact is derived from distinct contact_titles and should not be imported.
contact_key is sometimes not supplied.
contact_title is sometimes unknown and passed in as "n/a", "name = ??1342", "#N/A" etc.; it's rather random.
reject_name is often misspelled.
Some fields are sometimes not supplied at all, e.g. date and contact_key are missing.
I am trying to find information to help explain the issues with the above.
The issues relate only to incorrect data or fields that make it difficult to have useful data in the database, such as not being able to report a trend in reject costs per month when the date is not supplied. Normalising the Excel file is not an option available to me.
What I want to do is request that the values and fields in the Excel files match the business requirements, and that the format be the same for every third party that sends them, but the request is falling on deaf ears.
I want to explain to the client that inputting fake data and checking for invalid/existing rejects/contacts all the time is wrong, and that doing it is going to fail or at best be difficult without constant maintenance of a poor system.
Does anyone have any information on this problem?
Thanks
This is a common problem; this gets referred to in data processing circles as "garbage in, garbage out". Essentially, what you're running up against is that the data as given is of poor quality; you're correct to recognize that the problem is that it will be hard (if not impossible) to use this data to extract any useful information.
To some extent, this is a problem that should be fixed at the source; whatever your source of your data is, they need to be convinced that the data quality must improve. In the short term, you can sanitize your data; the term refers to removing or cleaning the bad entries to make the remainder of the data (the "good" data) importable into your database. Depending on just what percentage of your data is bad, you may or may not be able to do useful things with the sanitized data once you import it.
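As an illustration of what sanitizing might look like against the example columns from the question (a sketch; the specific rejection rules are assumptions):

import re

BAD_TITLES = {"n/a", "#n/a", "unknown", ""}

def sanitize_row(row):
    # row: dict keyed by the column names from the question.
    # Returns a cleaned row, or None if it is unusable for reporting.
    if not row.get("date"):
        return None  # monthly trends are impossible without a date
    title = (row.get("contact_title") or "").strip().lower()
    if title in BAD_TITLES or re.search(r"\?\?\d+", title):
        row["contact_title"] = None  # keep the row, but flag the field
    row.pop("count_of_unique_contact", None)  # derived value, don't import
    return row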
At some point, since you're not getting traction with management about the quality of the data, you will simply have to show them that the system is not working as intended because the data quality is bad. They'll need to improve their processes to improve the quality of the data you get in. Until then, though, keep pressing for better data; investigate the process of sanitizing the data and see what you can do with what remains. Good luck!
