In my project, I need to do lot of data feed testing where I need to compare two sets of data.
First set a.k.a. Old, will have the current version of data (as-is version of data)
Second set a.k.a. New, will have the newer version of data (with the necessary code changes)
The 'New' file can vary by the 'Old' file in terms of: record count, addition/removal of columns, change in column definition. The data in the file normally have about 50-60 columns and can go upto 200k records.
The data is normally pulled from 3 different databases and about 12-15 tables.
I need to essentially do two things:
Make sure the data is matching for the columns which are matching between 'old' and 'new' files
Run database queries to validate for the new columns in 'new' file
Currently, Im doing my validation using VLOOKUP, but I'm trying to see if there are any tool which will help me do this faster.
Thank you guys! Your responses are much appreciated.
Thanks,
Ganesh
Using SQL Server 2012, I've got a table that currently has several hundred-thousand rows, and will grow.
In this table, I've got a nvarchar(30) field that contains Medical Record Number (MRN) values. These values can be just about any alphanumeric value, but are not words.
For Example,
DR-345687
34568523
*45612345;T
My application allows the end user to enter a value, say '456' in the search field. The application would need to return all three of the example records.
Currently, I'm using Entity Framework 5.0, and asking for a field.Contains('456') type of search.
This always takes 3-5 seconds to return since it appears to do a table search.
My question is: Would creating a Full Text Index on this column help performance? I haven't tried it yet because the only copy of the database that I have with lots of data in it is currently in QA trials.
Looking at the documentation for the Full Text Indexes it appears that it is optimized around separate words in the field value, so I am hesitant to take the performance hit to create the index without knowing how it is likely to affect my query performance.
EF won't use the T-SQL keywords needed to access the SQL Server full text index (http://msdn.microsoft.com/en-us/library/ms142571.aspx#queries) so your solution won't fly without more work.
I think you would have to create a SProc to get the data using the FTI and then have EF call this. I have a similar issue and would be interested to know your results.
Andy
A little background
I have a single RRD that exists to hold aggregated values of 1500+ individual RRDs (there are 1500+ devices i am monitoring). I do this so that I do not hit 1500+ RRDs when I am looking to get values from every monitored device that holds the data I am looking for. I am constantly growing this group of monitored devices so I do some xml editing (much like the contrib perl script that adds new datasources to an already existing RRD) to account for my new devices. the update to the RRD happens once an hour.
the RRD was created with this
--step 3600
--start now
DS: [$cabinet-totalw] :GAUGE:7200:U:U"
RRA:AVERAGE:0.5:1:4392
RRA:AVERAGE:0.5:24:366
RRA:AVERAGE:0.5:744:36
RRA:MIN:0.5:24:732
RRA:MAX:0.5:24:732
FYI - $cabinet-totalw is in fact a variable in a for loop. The initial build looped through something like 1300 cabinets. I didn't want to list everything here.
The issue
As a new device is added to the monitored group, the datasource is added to the aggregation RRD file. However, when the update fires, it doesn't actually update the RRD for some unknown reason. when i do this manually updatev exists with a zero. if i look at xport output, i have NAN for any new datasource data. however, all existing datasources seem to update without issue.
At the moment I'm lost as to why this is happening. Things seem to be in order yet the update to the new RRD datasources does not take. even more interesting is that i've added datasources to this file in the past and have had those update without issue. it just seems to be recent updates do not take.
I should also add that lastupdate does in fact show the ... well last update correctly. so i assume its a lack of RRD knowledge on my part?
ADDITION
I wrote a script that grabs the index of the DS i am interested in. I then parse through the output of a rrdtool fetch to find that requested value based on the index per time interval. I found that the values are in fact being updated and stored in the RRD. Interestingly enough, the timestamp is showing 7 mins after an allotted time slot (step is 3600 so data should be stored on the hour). I now believe this to be an interpolation issue?
I found my issue. When i am updating the rrd data in the xml file (after it has been dumped) i was mistakenly adding wrong default values to the ds value and the min/max values. needed to change node values from NaN to 0.0000000000e+00 and min/max values from 0.0000000000e+00 to NaN.
thanks if anyone was trying to help.
I have a huge database (800MB) which consists of a field called 'Date Last Modified' at the moment this field is entered as a text data type but need to change it to a Date/Time field to carry out some queries.
I have another exact same database but with only 35MB of data inside it and when I change the data type it works fine, but when I try to change data type on big database it gives me an error:
Micorosoft Office Access can't change the data type.
There isn't enough disk space or memory
After doing some research some sites mentioned of changing the registry file (MaxLocksPerFile) tried that as well, but no luck :-(
Can anyone help please?
As John W. Vinson says here, the problem you're running into is that Access wants to hold a copy of the table while it makes the changes, and that causes it to exceed the maximum allowable size of an Access file. Compacting and repairing might help get the file under the size limit, but it didn't work for me.
If, like me, you have a lot of complex relationships and reports on the old table that you don't want to have to redo, try this variation on #user292452's solution instead:
Copy the table (i.e. 'YourTable') then paste Structure Only back
into your database with a different name (i.e. 'YourTable_new').
Copy YourTable again, and paste-append the data to YourTable_new.
(To paste-append, first paste, and select Append Data to Existing
Table.)
You may want to make a copy of your Access database at this point,
just in case something goes wrong with the next part.
Delete all data in YourTable using a delete query---select all
fields, using the asterisk, and then run with default settings.
Now you can change the fields in YourTable as needed and save
again.
Paste-append the data from YourTable_new to YourTable, and check
that there were no errors from type conversion, length, etc.
Delete YourTable_new.
One relatively tedious (but straightforward) solution would be to break the big database up into smaller databases, do the conversion on the smaller databases, and then recombine them.
This has an added benefit that if, by some chance, the text is an invalid date in one chunk, it will be easier to find (because of the smaller chunk sizes).
Assuming you have some kind of integer key on the table that ranges from 1 to (say) 10000000, you can just do queries like
SELECT *
INTO newTable1
FROM yourtable
WHERE yourkey >= 0 AND yourkey < 1000000
SELECT *
INTO newTable2
FROM yourtable
WHERE yourkey >= 1000000 AND yourkey < 2000000
etc.
Make sure to enter and run these queries seperately, since it seems that Access will give you a syntax error if you try to run more than one at a time.
If your keys are something else, you can do the same kind of thing, but you'll have to be a bit more tricky about your WHERE clauses.
Of course, a final thing to consider, if you can swing it, is to migrate to a different database that has a little more power. I'm guessing you have reasons that this isn't easy, but with the amount of data you're talking about, you'll probably be running into other problems as well as you continue to use Access.
EDIT
Since you are still having some troubles, here is some more detail in the hopes that you'll see something that I didn't describe well enough before:
Here, you can see that I've created a table "OutputIDrive" similar to what you're describing. I have an ID tag, though I only have three entries.
Here, I've created a query, gone into SQL mode, and entered the appropriate SQL statement. In my case, because my query only grabs value >= 0 and < 2, we'll just get one row...the one with ID = 1.
When I click the run button, I get a popup that tells/warns me what's going to happen...it's going to put a row into a new table. That's good...that's what we're looking for. I click "OK".
Now our new table has been created, and when I click on it, we can see that our one line of data with ID = 1 has been copied over to this new table.
Now you should be able to just modify the table name and the number values in your SQL query, and run it again.
Hopefully this will help you with whatever tripped you up.
EDIT 2:
Aha! This is the trick. You have to enter and run the SQL statements one at a time in Access. If you try to put multiple statements in and run them, you'll get that error. So run the first one, then erase it and run the second one, etc. and you should be fine. I think that will do it! I've edited the above to make it clearer.
Adapted from Karl Donaubauer's answer on an MSDN post:
Switch to immediate window (Ctl + G)
Execute the following statement:
DBEngine.SetOption dbMaxLocksPerFile, 200000
Microsoft has a KnowledgeBase article that addresses this problem directly and describes the cause:
The page locks required for the transaction exceed the MaxLocksPerFile value, which defaults to 9500 locks. The MaxLocksPerFile setting is stored in the Windows registry.
The KnowledgeBase article says it applies to Access 2002 and 2003, but it worked for me when changing a field in an .mdb from Access 2013.
It's entirely possible that in a database of that size, you've got text data that won't convert to a valid Date/Time.
I would suggest (and you may hate me for this) that you export all those prospective date values from "Big" and go through them (perhaps in Excel) to see which ones are not formatted the way you'd expect.
Assuming that the error message is accurate, you're running up against a disk or memory limitation. Assuming that you have more than a couple of gigabytes free on your disk drive, my best guess is that rebuilding the table would put the database (including work space) over the 2 gigabyte per file limit in Access.
If that's the case you'll need to:
Unload the data into some convenient format and load it back in to an empty database with an already existing table definition.
Move a subset of the data into a smaller table, change the data type in the smaller table, compact and repair the database, and repeat until all the data is converted.
If the error message is NOT correct (which is possible), the most likely cause is a bad or out-of-range date in your text-date column.
Copy the table (i.e. 'YourTable') then paste just its structure back into your database with a different name (i.e. 'YourTable_new').
Change the fields in the new table to what you want and save it.
Create an append query and copy all the data from your old table into the new one.
Hopefully Access will automatically convert the old text field directly to the correct value for the new Date/Time field. If not, you might have to clear out the old table and re-append all the data and use a string to date function to convert that one field when you do the append.
Also, if there is an autonumber field in the old table this might not work because there is no way to ensure that the old autonumber values will line up with the new autonumber values that get assigned.
You've been offered a bunch of different ways to get around the disk space error message.
Have you tried adding a new field to your existing table using Date data type and then updating the field with the value the existing string date field? If that works, you can then delete the old field and rename the new one to the old name. That would probably take up less temp space than doing a direct conversion from string to date on a single field.
If it still doesn't work, you may be able to do it with a sceond table with two columns, the first long integer (make it the primary key), the second, date. Then append the PK and string date field to this empty table. Then add a new date field to the existing table, and using a join, update the new field with the values from the two-column table.
This may run into the same problem. It depends on number of things internal to the Jet/ACE database engine over which we have no real control.
I have a table containing user input which needs to be optimized.
I have some ideas about how to solve this but i would really appreciate your input on this. The table that needs optimization is called Value in the structure below.
All tables mentioned below has integer primary keys called Id.
Specs: Ms Sql Server 2008, Linq2Sql, asp.net website, C#.
The current structure looks as follows:
Page -> Field -> FieldControl -> ValueGroup -> Value
Page
A pages is a container for one or more Fields.
Field
A field is a container for one or more FieldControls such as a textbox or dropdown-options.
Relationships: PageId
FieldControl
If a Field is of the type 'TextBox' then a single FieldControl is created for the Field.
If a Field is of the type 'DropDown' then one FieldControl per dropdown option is created for the Field containing the option text.
Relationships: FieldId
ValueGroup
Each time a user fills in Fields within a Page and saves it, a new ValueGroup (Id) is created to keep track of user input that is relevant to that save. When a user wants to
look at a previously filled in form, the valuegroup is used to load the Values into the FieldControls of that previously filled in instance.
Relationships: None
Value
The actual input of a FieldControl. If the user typed 'Hello' in a TextBox then 'Hello' would be stored in a row in this table followed by a reference back to which FieldControl 'Hello' was inputted for. A ValueGroup is linked to values in order to group them to keep track of which save/instance they belong to as described in ValueGroup.
Relationships: ValueGroupId, FieldControlId
The problem
If 100.000 Pages are fully filled in, containing 10 TextBoxes each then we get 100.000 * 10 records in the Values table meaning we quickly reach one million records making it really slow as it is now. The user can create as many different pages with as many different Fields as he/she likes and all these values are stored in the Values table. The way i use this data is by either displaying a gridview with pagination that displays all records for a single Pagetype, or when looking at a specific Page instance (Values grouped by ValueGroupId).
Some ideas that i have:
Good indexing should be very important when optimizing the Values table.
Should i perhaps add a foreign key directly back to Page from Value, ending up with indexing by (Id, PageId, ValueGroup) allowing the gridview to retrieve values that are only relevant for one Page?
Should i look into partitioning the table and if so, how would you recommend that i do this? I was thinking that partitioning by Page, hence getting chunks of values that are only relevant to a certain page would be wise in this case right? How would the script/schema look for something like that where pages could be created/removed at any time by the users.
PS. There should be a badge on this forum for all people that finished reading this long post, and i hope ive made myself clear :)
Just to close this post. Correct indexing solved all performance problems.
This may be slightly off-topic, but why? Is this data that you need to access in real-time, or is it for some later processing? Could you perhaps pack the data into a single row and then unpack it later?
Generic
You say it is slow now and that can be many reasons for that other than the database
like low memory, high CPU, disk fragmentation, network load, sockets problems etc etc.
This should show up on a system monitor
Try for instance Sysinternals (now MS) tool: http://live.sysinternals.com/procexp.exe
But if that is all under control then back to the database.
Database index
One million records is not "that much" and should not be a problem.
An index should do the trick if you don't have any indexes right now.
You should probably set indexes on all tables if you haven't done so already.
I tried to do a database model, is this right:
http://www.freeimagehosting.net/image.php?a39cf99ae5.png
Table structure (?)
Page -> Field -> FieldControl -> ValueGroup -> Value
The table structure looks like it may not be the optimal one but it is hard to say exactly when I don't know how the application works.
Do all tables have the foreign keys of
the table above ?
Is this somewhat similar to your code ?
Pseudo code:
1. Get page info. Gives key "page-id"
2. Get all "Field":s marked with that "page-id".
Gives keys "field-id" & "fieldcontrol-id"
3. Loop trough all fields-id:s and get the FieldControl for each one
4. Loop trough all fields-id:s and get all ValueGroup:s.
Gives a list of "valuegroup-id":s keys
5. Loop trough all ValueGroup:s and get all fields