MS Access - Matching records without single identifier - database

I need to find a way to match records between two tables. The problem is that a single identifier that would make the match very simple isn't available, so I need to make the match based on other information available in the records.
In an elementary school all registered/existing students have a Student ID. It is unique and makes a perfect primary key. However, any new students entering the school for the coming year do not get a Student ID until they are officially registered.
Before the next school year starts the school invites the new incoming students to be part of a pre-registration assessment program to help determine their current level and needs for the coming school year. It is at this point that as much data as possible about each prospective student is gathered. This information is stored in a separate table from the main student information, mostly because there is no official Student ID. The idea is to merge the pre-registration students and their data into the main student information table(s) once they have an official Student ID assigned to them.
My thinking was to assign these new students a temporary ID just to have a unique identifier for them in case there are name duplications.
My question is: how can I match up the temporary IDs with the real IDs once each student is assigned one?
Some information that will be gathered in the pre-registration process will include Last Name, First Name, Middle Name, Grade, with Birthday being another possibility (but isn’t included at this time).
Maybe I’m going about this in the wrong way so any suggestions on offer would be greatly appreciated.

It sounds like you are exporting information from the main Student Information System, running additional processing in Microsoft Access, then ultimately merging it back into the main system. This being the case, you will have to work with the limitations in the export and merge features, and build your matching logic around what is available there.
Plan A: Ideally your Excel export would include some type of primary record identifier from the main system, independent of the Student ID that gets assigned later. (It very likely uses a unique ID internally, even if that is not included in the export file.) You would then use this to match to your records in Microsoft Access.
Plan B: If the primary system does not export a unique identifier, then you will need to come up with your best combination of data to uniquely identify the student. How you do this will depend on how many students you are dealing with, and whether the matched data changes in either system. Full name and birthdate is a fairly common way to do this, if that data is complete in the originating system.
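As a rough illustration of Plan B, here is a minimal Python sketch of a composite match key built from name and birthdate. The function name `match_key` and the normalization rules (strip accents, collapse whitespace, ignore case) are assumptions made for the example, not something either system is known to do.

```python
import unicodedata
from datetime import date


def match_key(last_name: str, first_name: str, birthdate: date) -> tuple[str, str, str]:
    """Build a normalized composite key from name and birthdate (hypothetical helper)."""

    def norm(text: str) -> str:
        # Strip accents, collapse whitespace, and ignore case so that
        # "  José " and "jose" compare equal.
        decomposed = unicodedata.normalize("NFKD", text)
        no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return " ".join(no_accents.split()).casefold()

    return (norm(last_name), norm(first_name), birthdate.isoformat())


key_from_access = match_key(" Garcia ", "José", date(2015, 3, 4))
key_from_export = match_key("GARCIA", "jose", date(2015, 3, 4))
same_student = key_from_access == key_from_export
```

Two different students can still share a name and birthdate, so matches made this way are worth a manual review.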
With the unique identifier established, I would use two queries in Access. The first would be an update query to assign the Student ID to your Access system as soon as it becomes available in the main system. (Search for matching students that have a Student ID in Excel, but not yet in Access.)
The second query would be an append query to add the new students from the main system into Access. (Where the student in Excel does not match any existing student in Microsoft Access.)
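The two queries could look roughly like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in, so the SQL is SQLite syntax and Access SQL will differ in detail. The table names `prereg` and `export`, the column names, and the sample rows are all made up for the example, and the match is on name plus birthdate as in Plan B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the Access table of pre-registration students.
    CREATE TABLE prereg (
        temp_id    INTEGER PRIMARY KEY,
        last_name  TEXT, first_name TEXT, birthdate TEXT,
        student_id TEXT              -- NULL until officially registered
    );
    -- Stand-in for the export pulled from the main system.
    CREATE TABLE export (
        last_name TEXT, first_name TEXT, birthdate TEXT, student_id TEXT
    );
    INSERT INTO prereg (last_name, first_name, birthdate) VALUES
        ('Lee', 'Ana', '2015-03-04'),
        ('Roy', 'Ben', '2015-07-19');
    INSERT INTO export VALUES
        ('Lee', 'Ana', '2015-03-04', 'S1001'),   -- now registered
        ('Kim', 'Cho', '2015-01-30', 'S1002');   -- not yet in prereg
""")

# Query 1 (update): copy the Student ID onto matching rows that lack one.
conn.execute("""
    UPDATE prereg
    SET student_id = (
        SELECT e.student_id FROM export AS e
        WHERE e.last_name = prereg.last_name
          AND e.first_name = prereg.first_name
          AND e.birthdate = prereg.birthdate
          AND e.student_id IS NOT NULL
    )
    WHERE student_id IS NULL
""")

# Query 2 (append): add exported students that match no existing row.
conn.execute("""
    INSERT INTO prereg (last_name, first_name, birthdate, student_id)
    SELECT e.last_name, e.first_name, e.birthdate, e.student_id
    FROM export AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM prereg AS p
        WHERE p.last_name = e.last_name
          AND p.first_name = e.first_name
          AND p.birthdate = e.birthdate
    )
""")

rows = conn.execute(
    "SELECT last_name, student_id FROM prereg ORDER BY temp_id"
).fetchall()
```

After both queries run, the matched student has an ID, the unmatched pre-registration student is still waiting for one, and the student who existed only in the export has been appended.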
Taking this approach, you would pull the Excel export regularly from the main system and run the above queries to keep your Access system updated. Then when you are ready to merge information back into the main system, you could filter on students in Access that have a Student ID assigned. The actual update of data in the main system might be done through an update query, or perhaps an export from Access that includes the Student ID. (Depending on how your main system merges the incoming data.)

The way I would approach this is to merge both tables into a single table of students. This table would have an AutoNumber ID column that refers to the student or prospective student. Then you would have another column in this table for the StudentID which would be assigned at a later point.
Your forms and reports can then filter the data based on the StudentID field to show you either current or prospective students.
Taking this approach means your student data gets entered into one place, and you don't have to worry about trying to repeat information or merge it later. Since a single record represents a single individual, it makes logical sense to me to use a single table.
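A minimal sketch of that single-table layout, using Python's `sqlite3` in place of Access (so `AUTOINCREMENT` plays the AutoNumber role). The table and column names and the sample data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the AutoNumber role
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        student_id TEXT UNIQUE       -- NULL while the student is prospective
    );
    INSERT INTO students (last_name, first_name, student_id) VALUES
        ('Lee', 'Ana', 'S1001'),
        ('Roy', 'Ben', NULL),
        ('Kim', 'Cho', NULL);
""")


def prospective():
    # Prospective students are simply the rows without a StudentID yet.
    query = "SELECT last_name FROM students WHERE student_id IS NULL ORDER BY id"
    return [row[0] for row in conn.execute(query)]


before = prospective()

# Assigning the official ID later is a one-row update; nothing has to be merged.
conn.execute("UPDATE students SET student_id = 'S1002' WHERE id = 2")

after = prospective()
current = [
    row[0]
    for row in conn.execute(
        "SELECT last_name FROM students WHERE student_id IS NOT NULL ORDER BY id"
    )
]
```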

Related

In Access database table, sequential field must be unique, but only when student ID matches between records

I am maintaining an Access Database for use with student admissions. I have a primary table which houses biographical information, and a secondary table which has application information, and allows for multiple applications per student (with each student having a unique student ID; that ID is stored in both tables and is how the applications are matched to the student).
Each application is assigned an "Application Number," and each student can only have one application with a specified number (i.e., student A cannot have two applications numbered "1", but can have 1, 2, and 3).
I would like to create a validation rule of some kind to prevent duplicates, but the whole column is not unique... it's only as it relates to the specified student.
Is there a way to create such a rule, or should I be arranging my data differently? I am open to making changes if it means a more efficient workflow.
I hope this makes sense... I wasn't sure how best to describe this. Thank you for any help.
If you are expecting the user doing the data entry to come up with a valid unique "application number", then the rule you are looking for would be a unique index on both StudentId and ApplicationNumber. (Remember, you can create an index which includes multiple columns.) This would mean that every pair of StudentId and ApplicationNumber must be unique.
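To illustrate the multi-column unique index, here is a small sketch using Python's `sqlite3` rather than Access. The table name, index name, and the helper `add_application` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE applications ("
    " student_id TEXT NOT NULL,"
    " application_number INTEGER NOT NULL)"
)
# The pair (student_id, application_number) must be unique,
# while each column on its own may repeat.
conn.execute(
    "CREATE UNIQUE INDEX ux_student_application"
    " ON applications (student_id, application_number)"
)


def add_application(student_id: str, application_number: int) -> bool:
    """Insert one application; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO applications VALUES (?, ?)",
                (student_id, application_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_application("A", 1),
    add_application("A", 2),
    add_application("B", 1),  # same number, different student: allowed
    add_application("A", 1),  # duplicate pair: rejected
]
```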
However, I should note that requiring the user doing the data entry to come up with a unique application number by themselves is very user-unfriendly.
Consider the following alternatives:
Have the database suggest a unique application number. Or, better yet,
Do not even suggest any number while the application is being filled-in, but instead issue a unique application number at the moment that the application is submitted. Or, even better yet,
Stop storing application numbers in the database, and instead have the database calculate them, only when there is a need to display them, based on user id and date of data entry of the application. (Caveat: if a student has 3 applications, and application #2 gets deleted, then the old application #3 will be renumbered to #2, thus causing confusion. So, this will only work if deletion is disallowed.)
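The last alternative (calculating the number only when it is displayed) might be sketched like this, again with Python's `sqlite3` as a stand-in for Access. The schema and the `entered_on` column are assumptions. The number is derived by counting a student's applications entered up to and including the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (
        id         INTEGER PRIMARY KEY,
        student_id TEXT NOT NULL,
        entered_on TEXT NOT NULL     -- ISO date, so text order is date order
    );
    INSERT INTO applications (student_id, entered_on) VALUES
        ('A', '2024-01-10'),
        ('B', '2024-01-12'),
        ('A', '2024-02-01'),
        ('A', '2024-03-15');
""")

# No number is stored: it is the rank of the row among the same student's
# applications, ordered by entry date (ties broken by id).
NUMBERED = """
    SELECT a.student_id, a.entered_on,
           (SELECT COUNT(*) FROM applications AS b
            WHERE b.student_id = a.student_id
              AND (b.entered_on < a.entered_on
                   OR (b.entered_on = a.entered_on AND b.id <= a.id))
           ) AS application_number
    FROM applications AS a
    ORDER BY a.student_id, application_number
"""

numbered = conn.execute(NUMBERED).fetchall()

# The caveat in action: deleting student A's second application renumbers the third.
conn.execute(
    "DELETE FROM applications WHERE student_id = 'A' AND entered_on = '2024-02-01'"
)
renumbered = conn.execute(NUMBERED).fetchall()
```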

Two separated tables vs One table with two columns

I am creating a Windows Forms application that must control the entry and exit of people in an office building. These people may be visitors or employees, and everybody must use an access card at the building entrance. A card will be programmed temporarily when the person is a visitor, and employees use their own cards. My question is about my database. Is there a way to do this nicely? Every card (whether it belongs to an employee or a visitor) has a strong key that the turnstile uses to identify it; the key comes from the turnstile manufacturer and I can't change it. So, my structure is:
In my database, I have a table where I keep the cards. When someone tries to get inside the building, the turnstile sends my system the date and time of access and the card code. Now I do not know how to separate the employees and visitors. Should I have separate flow tables for employees and visitors? For the employee flow table, I get the employee's card from the card table using the same ID. In the visitor flow table, I need to know which visitor is using the temporary card (the key of a temporary card never changes, so I cannot rely on the key alone). Or should I have a single flow table with a Visitor_ID and a Worker_ID column, one of which will always be null (so I know whether it was an employee or a visitor by which field has a value)?
Can anyone tell me which of the two is more applicable and why?
Employees and visitors are both people. Specifically people that may (or may not) have an access card assigned to them.
I would have one People table that has a foreign key relationship to the AccessCard table. If you only care about whether the person is an employee or visitor, but the information you store is otherwise identical, a boolean column is fine. If your system stores additional information for employees and/or visitors, create an Employees and Visitors table, and have a foreign key relationship from People to each of those.
I would create a single table to store both employees and visitors, then add an extra column for Type (E, V).
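One way the single-table idea could be laid out, sketched with Python's `sqlite3`. All table and column names here are invented. Storing the person on each flow row (not only the card code) is an assumption of this sketch, added so that a reused temporary card stays attributable to the right visitor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE cards (card_code TEXT PRIMARY KEY);
    CREATE TABLE people (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        person_type TEXT NOT NULL CHECK (person_type IN ('E', 'V')),
        card_code   TEXT REFERENCES cards (card_code)  -- NULL when no card is held
    );
    CREATE TABLE flow (
        id          INTEGER PRIMARY KEY,
        person_id   INTEGER NOT NULL REFERENCES people (id),
        card_code   TEXT NOT NULL REFERENCES cards (card_code),
        accessed_at TEXT NOT NULL
    );
    INSERT INTO cards VALUES ('C-100'), ('C-200');
    INSERT INTO people (name, person_type, card_code) VALUES
        ('Eve', 'E', 'C-100'),
        ('Vic', 'V', 'C-200');
""")


def record_access(card_code: str, accessed_at: str) -> None:
    """Log a turnstile event against whoever holds the card right now."""
    row = conn.execute(
        "SELECT id FROM people WHERE card_code = ?", (card_code,)
    ).fetchone()
    if row is None:
        raise LookupError(f"card {card_code} is not assigned to anyone")
    conn.execute(
        "INSERT INTO flow (person_id, card_code, accessed_at) VALUES (?, ?, ?)",
        (row[0], card_code, accessed_at),
    )


record_access("C-100", "2024-05-01T08:00")
record_access("C-200", "2024-05-01T09:30")

# The temporary card is handed to the next visitor.
conn.execute("UPDATE people SET card_code = NULL WHERE name = 'Vic'")
conn.execute(
    "INSERT INTO people (name, person_type, card_code) VALUES ('Val', 'V', 'C-200')"
)
record_access("C-200", "2024-05-02T10:00")

log = conn.execute("""
    SELECT p.name, p.person_type, f.card_code
    FROM flow AS f JOIN people AS p ON p.id = f.person_id
    ORDER BY f.id
""").fetchall()
```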

Finding a suitable data structure for deletion from both lists

This might be deleted, since it involves idea sharing, which is not quite allowed on Stack Overflow, but before that happens, any ideas from solid programmers would be a win for me.
Assume that you have a class Student, stored in the database, and this class has a list property called favoriteTeachers. This list is constantly updated by the system and contains the ids of teachers.
You also have a class Teacher, also stored in the database, which likewise has a list property favouriteStudents. It is again updated constantly and contains the ids of students.
In our system, when a student calls a function (let's say notMyFavoriteTeacher), our system has to apply the changes below;
Delete the given teacher's id from favouriteTeacher list
Delete the student's id from given teacher's favouriteStudent list
I was concerned that the number of rows updated could exhaust the database, so instead of mapping the students to their favorite teachers in a separate table as user_id, teacher_id, I created a column and stored a string that contains the teachers' ids separated by commas (e.g. "1,2,14,4,25"). The same applies to the teacher as well.
However, when we call this function, we face another problem. To perform this operation, you need to convert the string to a list, find the element by linear search, delete it, then convert the list back to a string and push it back to the db. And you have to do the same for the teacher class as well. If we did not use the string method, deletion would be easier, but since we would be handling deletion and addition operations about 2k times a day, I did not think it would be feasible to use separate tables.
My question: to decrease the number of operations, could a data structure be chosen that would increase the efficiency?
Storing a relation as an array in a single column is a violation of first normal form, and should not be done without good reason. Although various forms of denormalization may result in increased efficiency in some cases, I don't see this case being one of those. What's worse, you'll get no help from the database in enforcing referential integrity. And some operations will result in guaranteed row scans: when deleting a teacher, you will have to examine every row of every student to remove the teacher from each student's favorite list. The same goes for deleting a student.
Relational databases are designed and built to link rows to other rows. You need a very good reason to keep them from doing what they're designed to do. You should go ahead and design a proper relational schema, and only if actual measurement shows that it is too slow should you worry about its performance.
First of all, I don't understand your choice of storing ids of favorite teachers/students as comma separated strings, because either in the case of comma separated values or in case of a table with studentId, teacherId structure, you do exactly 2 row updates/deletes (first in the favoriteTeachers table, second in the favoriteStudent table).
But one way of optimizing performance given your current data structure would be to keep the comma-separated strings sorted. I mean, from the moment the rows are created, keep your comma-separated ids in order, like "1, 5, 7, 15". This way, if you convert the string to a list, you can perform a binary search, and the lookup takes O(log n) time instead of O(n).
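A small Python sketch of the sorted-string idea, using the standard `bisect` module for the binary search. The function name `remove_id` is made up for the example.

```python
from bisect import bisect_left


def remove_id(csv_ids: str, target: int) -> str:
    """Remove target from a sorted, comma-separated id string (hypothetical helper)."""
    # Parsing the string into a list is itself a linear pass.
    ids = [int(part) for part in csv_ids.split(",") if part.strip()]
    # Binary search: valid only because the ids are kept sorted.
    pos = bisect_left(ids, target)
    if pos < len(ids) and ids[pos] == target:
        del ids[pos]
    return ", ".join(str(i) for i in ids)


updated = remove_id("1, 5, 7, 15", 7)
unchanged = remove_id("1, 5, 7, 15", 8)
```

Note that only the lookup becomes logarithmic. Splitting the string, deleting from the list, and joining it back are each still linear in the number of ids, so the overall cost of one removal remains O(n).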
You are losing all the benefits provided by any RDBMS by storing it as a list of strings. Create a separate table with Student_id and favorite teacher_id. Apply filtering conditions (either for student or for teacher) before joining it to main tables.
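For comparison, the separate-table design could be sketched as below with Python's `sqlite3`. The table and column names are assumptions. One row per (student, teacher) pair serves both directions, so the notMyFavoriteTeacher operation becomes a single DELETE, and deleting a teacher cleans up through the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE teachers (id INTEGER PRIMARY KEY);
    CREATE TABLE favorites (
        student_id INTEGER NOT NULL REFERENCES students (id) ON DELETE CASCADE,
        teacher_id INTEGER NOT NULL REFERENCES teachers (id) ON DELETE CASCADE,
        PRIMARY KEY (student_id, teacher_id)
    );
    -- Supports "which students favor this teacher?" lookups.
    CREATE INDEX ix_favorites_teacher ON favorites (teacher_id);
    INSERT INTO students VALUES (1), (2);
    INSERT INTO teachers VALUES (10), (20);
    INSERT INTO favorites VALUES (1, 10), (1, 20), (2, 10);
""")


def not_my_favorite_teacher(student_id: int, teacher_id: int) -> None:
    # One row removed; both the student's and the teacher's "lists" are updated.
    with conn:
        conn.execute(
            "DELETE FROM favorites WHERE student_id = ? AND teacher_id = ?",
            (student_id, teacher_id),
        )


not_my_favorite_teacher(1, 10)

# Deleting a teacher removes that teacher's favorites rows via the cascade.
with conn:
    conn.execute("DELETE FROM teachers WHERE id = 20")

remaining = conn.execute(
    "SELECT student_id, teacher_id FROM favorites ORDER BY student_id, teacher_id"
).fetchall()
```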

How can I simplify my database?

I am working on a project in which I generate a unique id for each customer from the first letter of the customer's last name, and I store the customer in a different table depending on that letter: if the customer's name starts with A, all of the customer's information is stored in the Registration_A table. In this way I have created Registration tables up to Z. But retrieving data from such a structure is quite difficult. Can you suggest another method of saving the data so that retrieval becomes more flexible?
Put all of your registration data into one table. There's absolutely no need for you to break it into alphabetical pieces like that unless you have some serious performance issues.
When querying for registration data, use SQL's WHERE clause to narrow down your results.
You have to merge this into one table, Registration, then let the database take care of unique ids. This depends on your database, but searching for PRIMARY KEY or AUTO INCREMENT should give you lots of results.
If you did the splitting for performance reasons, you can add an index on the user's last name.
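Both answers point the same way, and a sketch with Python's `sqlite3` might look like this. The column names and sample rows are invented, and the auto-increment syntax varies by database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table for every customer; the database hands out the unique ids.
    CREATE TABLE registration (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL
    );
    -- The index keeps last-name lookups fast without splitting the table.
    CREATE INDEX ix_registration_last_name ON registration (last_name);
    INSERT INTO registration (last_name, first_name) VALUES
        ('Adams', 'Joy'),
        ('Baker', 'Tom'),
        ('Allen', 'Sue');
""")

# What used to be "everything in Registration_A" is now a WHERE clause.
a_names = [
    row[0]
    for row in conn.execute(
        "SELECT last_name FROM registration"
        " WHERE last_name LIKE 'A%' ORDER BY last_name"
    )
]

# Looking up one customer by last name.
baker = conn.execute(
    "SELECT id, first_name FROM registration WHERE last_name = ?", ("Baker",)
).fetchone()
```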

Need strategy for managing aggregated data during large database table creation

Imagine collecting all of the world's high-school students' grades each month into a single table and in each student's record, you're required to include the final averages for the subject across the student's class, city and country. This can be done in a post-process, but your boss says it has to be done during data collection.
Constraint: the rows are written to a flat file, then bulk-inserted into the new table.
What would be a good strategy or design pattern to hang on to the several hundred thousand averages until the table is done, without adding excessive memory/processing overhead to the JVM or RDBMS? Any ideas will be helpful.
Note: Because the table is used as read-only, we add a clustered index to it on completion.
I'd tell my boss to stop micromanaging.
But seriously, sort the data by class, city, and then country. Then compute the running average for each by keeping a running total and count for class, city, and country. When you encounter a different class, write the class name and average to a file. Do the same for cities and countries, but use a different file for each. Then you can open the sorted data file and the average files and insert rows into the database one by one.
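A rough Python sketch of the running total and count idea. It keeps the averages in dictionaries instead of writing them to files, leaves out the per-subject dimension, and uses made-up field names and grades. It also sorts with country as the outermost key, which is a choice of this sketch so that the rows of each class, city, and country sit next to each other.

```python
from itertools import groupby


def group_averages(rows, key_fields):
    """Yield (key, average) per group; rows must already be sorted by key_fields."""
    for group_key, group in groupby(rows, key=lambda r: tuple(r[f] for f in key_fields)):
        total = count = 0
        for row in group:  # running total and count, one row at a time
            total += row["grade"]
            count += 1
        yield group_key, total / count


rows = [
    {"country": "US", "city": "NYC", "class": "9A", "grade": 80},
    {"country": "US", "city": "NYC", "class": "9A", "grade": 90},
    {"country": "US", "city": "NYC", "class": "9B", "grade": 70},
    {"country": "US", "city": "LA", "class": "9A", "grade": 60},
]
# Sort once so that every grouping level is contiguous.
rows.sort(key=lambda r: (r["country"], r["city"], r["class"]))

class_avgs = dict(group_averages(rows, ("country", "city", "class")))
city_avgs = dict(group_averages(rows, ("country", "city")))
country_avgs = dict(group_averages(rows, ("country",)))
```

Only one group's total and count are held at a time during each pass, which is what keeps the memory overhead small when the rows are streamed from a sorted file.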
If you want to use a framework that will handle all the writing to disk, I would look into using Hadoop for the processing.
