Hello clever people.
I manage a group workbook; each sheet has time series data, one row per month. I receive Excel sheets updated with an extra row for the latest month's data. After some rudimentary checks, I paste the new sheet over the existing sheet, so the newer data is added lower down the page.
Sometimes a value in a row for an earlier month has changed in the imported sheet, sometimes by accident but often after validation. Obviously, when I paste in the latest sheet, only the most recent value is present. I don't necessarily need the old value, I just need to know it's been changed.
I thought of performing a checksum on each row, before and after - that would do to indicate a change. Any ideas of a straightforward approach?
TIA, Paul
If the values to check are numbers only, you could use Paste Special with the Subtract operation; unchanged cells would become zeros...
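If the rows also contain text, the per-row checksum idea from the question can be sketched outside Excel. This is only a minimal illustration in Python, assuming both versions of the sheet have already been read into lists of tuples and that the first column (the month) identifies a row; the function names and sample data are made up for the example.

```python
import hashlib

def row_checksum(row):
    """Return a short, stable digest for one row of cell values."""
    # Join with a separator unlikely to appear in cell text, so that
    # ("ab", "c") and ("a", "bc") do not produce the same string.
    joined = "\x1f".join(str(cell) for cell in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]

def changed_rows(old_rows, new_rows, key_index=0):
    """Return the keys (e.g. month labels) whose rows differ between versions.

    Rows that exist only in new_rows (the newly added month) are not reported.
    """
    old_sums = {row[key_index]: row_checksum(row) for row in old_rows}
    changed = []
    for row in new_rows:
        key = row[key_index]
        if key in old_sums and old_sums[key] != row_checksum(row):
            changed.append(key)
    return changed

# Hypothetical sample data: February was revised, March is new.
old = [("2023-01", 10.0, 5), ("2023-02", 12.5, 7)]
new = [("2023-01", 10.0, 5), ("2023-02", 12.0, 7), ("2023-03", 9.0, 4)]
print(changed_rows(old, new))
```

Note that the digest is computed from the text form of each cell, so 10 and 10.0 would count as different; normalise the values first if that matters.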
For data analysis purposes, I need to manually add information in the columns right beside an imported range. It generally doesn't cause any issues and works well. However, whenever the imported data shifts rows (i.e. a new row was added in the middle of the original sheet), the manually-added info no longer matches the data: it ends up either one row above or one row below. Basically, it's out of sync with the data it belongs to.
Is there a way to kind of "fixate" the manually-added information to the same row as the imported data? So that if the order changes in the original sheet, it won't mess up the new one.
I've been using the code shared by #Mogsdad here. However, it is only syncing the info on the "key column" and not the rest of the data in the columns after it.
Attaching screenshots for reference:
This is how it usually looks (the third column is the "key")
And this is what happens when the rows in the imported range change:
The code seems to be working, just not for all the columns.
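The general idea behind "fixing" notes to a row is to store them against the key rather than against a row position, and to re-attach them after every import. The sketch below only illustrates that idea in Python; it is not the script referred to above, and the function name, the sample keys and the single note column are assumptions.

```python
def realign_notes(imported_rows, notes_by_key, key_index=2):
    """Return the imported rows with each manual note re-attached by key."""
    realigned = []
    for row in imported_rows:
        # Rows whose key has no note yet get an empty note cell.
        note = notes_by_key.get(row[key_index], "")
        realigned.append(tuple(row) + (note,))
    return realigned

# Hypothetical data: the third column is the key, as in the question.
notes = {"K1": "checked", "K3": "follow up"}
imported = [
    ("Ann", "x", "K1"),
    ("Bob", "y", "K2"),  # newly inserted row, no note yet
    ("Cat", "z", "K3"),
]
for row in realign_notes(imported, notes):
    print(row)
```

Because the lookup is by key, inserting or reordering rows in the source does not move a note away from its data. With several manual columns, the dictionary values would be tuples of notes instead of single strings.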
I'm currently working on an Excel workbook using the following formula to copy all rows from one sheet (Creation_Series_R) to another one, excluding empty rows.
{=IFERROR(INDEX(Creation_Series_R!C:C;SMALL(IF(Creation_Series_R!$C$3:$C$20402<>"";ROW(Creation_Series_R!$C$3:$C$20402));ROW()-ROW(Creation_Series_R!$C$3)+1));"")}
And the formula works very well. However, my proof of concept only had a few rows, and with the final data I need to work on 20400 rows. Add to that the fact that I have 17 columns and 3 similar sheets with similar formulas, and my workbook takes an hour to recalculate every time I input just one value.
This workbook is designed as a way for a client to enter data, and it then reorganizes the data so that it can be imported directly into our software. I have already limited the amount of data the user can enter per workbook (to their very big disappointment), so I can't really reduce it to fewer than 20400 rows (it's only financial data for 100 funds).
Is there a way, maybe even using a macro, to do this more efficiently?
The big block of array formulas is killing your performance (time-wise).
If your data is in column A through Q, then I would use column R as a "helper" column. In R2 insert:
=COUNTA(A2:Q2)
and copy down. The macro would:
AutoFilter column R
Hide all rows showing 0 in column R
Copy the visible rows and paste elsewhere as a block
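The logic of those steps (count the filled cells per row, drop the rows with a count of 0, copy the rest as one block) can be sketched as follows. This is a Python illustration of the logic only, not the VBA macro itself, and it assumes the rows are already available as lists of cell values, with empty cells represented as None or "".

```python
def compact_rows(rows):
    """Keep rows that have at least one non-empty cell, preserving order."""
    def filled_cells(row):
        # Plays the role of the helper column =COUNTA(A2:Q2).
        return sum(1 for cell in row if cell not in (None, ""))

    # Equivalent of filtering out the rows whose helper value is 0.
    return [row for row in rows if filled_cells(row) > 0]

# Hypothetical sample with two empty rows in the middle.
data = [
    ["Fund A", 100, None],
    [None, None, None],
    ["", "", ""],
    ["Fund B", None, 250],
]
print(compact_rows(data))
```

The point of doing it this way is that each row is examined once, whereas the array formula re-scans the whole 20400-row range for every output cell.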
I'm not the best at using spreadsheets, but I've been given a task and it's possible I may be a little out of my depth (I'm more of a web programmer).
I have two sheets:
One called Area A and one called Area B with headings:
Time - Location - Reference
I need to set up a new sheet with these column headings:
Time - Reference - Location - Area
Then make a sortable list (I can do this bit)
The Area A & B sheets will be constantly changing, and this will need to be reflected in the new sheet whenever it is opened (maybe some sort of onload-style event?)
Any ideas on the easiest way to do the above (or if indeed it is doable)? I don't want to be spoon fed, I'd be happy to be pointed in the right direction or to be given some keywords I can Google (I learn better this way).
Many thanks!
This type of data manipulation is something that Excel is not good at, and it is prone to errors.
There are two good ways to do this:
Manually
On sheet "Area A", add a column with the area name, i.e. Area A. Do this for each "data" sheet. Then, manually or via VBA, copy and paste one sheet at a time to your aggregated sheet.
Programmatically using VBA
Loop through each sheet and copy and paste to the aggregated sheet adding a column with the sheet name as you paste.
For either of these methods, the important thing is to build in a few checks on counts at the end to make sure you're not missing any data.
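To make the programmatic approach concrete, here is a rough sketch of the loop in Python rather than VBA. It assumes each sheet has been read into a list of (time, location, reference) rows keyed by sheet name; the function name and sample rows are invented for the illustration.

```python
def combine_areas(sheets):
    """Merge per-area sheets into one list with an extra Area column.

    sheets maps a sheet name to a list of (time, location, reference) rows.
    """
    combined = []
    for area, rows in sheets.items():
        for time, location, reference in rows:
            # Target column order: Time - Reference - Location - Area
            combined.append((time, reference, location, area))

    # Count check at the end: nothing should be lost or duplicated.
    expected = sum(len(rows) for rows in sheets.values())
    if len(combined) != expected:
        raise ValueError(f"expected {expected} rows, got {len(combined)}")
    return combined

# Hypothetical sample data for the two sheets in the question.
sheets = {
    "Area A": [("09:00", "Dock 1", "REF-1"), ("09:30", "Dock 2", "REF-2")],
    "Area B": [("10:00", "Gate 4", "REF-3")],
}
for row in combine_areas(sheets):
    print(row)
```

In a VBA version the same count check is worth keeping, since copying by range makes it easier to miss rows than iterating over them.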
I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually in row 3. I can use an Excel Source to import them, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
have a look:
the links have more details, but I've included some text from the pages (just in case the links go dead)
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
Excel file.
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible during import data from Excel to DB table skip first 6 rows for example?
Also Excel data divided by sections with headers. Is it possible for example to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
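The row-number idea from the quoted answers (number every row, then filter on that number) can be sketched like this. It is a Python illustration of the filtering logic only, not SSIS configuration; the function name and parameters are assumptions made for the example.

```python
def skip_rows(rows, leading=5, every_nth=None):
    """Drop the first `leading` rows and, optionally, every n-th row."""
    kept = []
    # Number the rows from 1, like an identity column with seed/increment 1/1.
    for number, row in enumerate(rows, start=1):
        if number <= leading:
            continue  # description rows at the top of the sheet
        if every_nth and number % every_nth == 0:
            continue  # repeated section header, e.g. every 12th row
        kept.append(row)
    return kept

# Hypothetical sheet of 24 rows, represented by their row numbers.
rows = list(range(1, 25))
print(skip_rows(rows, leading=5, every_nth=12))
```

Here the every-n-th rule is applied to the original row numbers, so rows 12 and 24 are dropped; if the section headers are counted from the first data row instead, the numbering has to start there.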
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed records I needed. Something like this: SELECT F4,
F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4
<> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't contain any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts), and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data, and not repeat the headers multiple times. Most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the Excel spreadsheet. Incidentally, we push really hard to never receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they will be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled; otherwise, a file with the wrong format will fail and be returned to them to fix.
If that doesn't work, may I suggest that you open the file, delete the first two rows and save a text file in a data flow. Then write a data flow that will process the text file. SSIS did a lousy job of supporting Excel and anything you can do to get the file in a different format will make life easier in the long run.
"My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time."
Not entirely correct.
SSIS forces you to use the format, and quite often it does not work correctly with Excel.
If you can't change the format, consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.
I'm learning how to develop SSIS packages for ETL systems this week. One of my first objectives is to discover different ways to import flat files into a database. As this is pretty straightforward for the most part, I've been playing around with different flat files that contain a variety of data.
One issue I ran into today was with an Excel document that contained data in the first row, the header information in the second row and footer information in the last couple of rows. What I want to import into the database is the header and all the rows leading up to the footer. I do not want the first row and I do not want the footer.
My current solution is to create a Data Flow task and, in the advanced settings, set OpenRowset to "Sheet1$A2:I20000". This allows me to open the sheet I want, start at the second row (where my header resides) and then select all other rows between A2 and I20000.
This solution also allows me to read the header information (which I want) and all the rows that follow for import. Unfortunately, it also selects the footer rows and doesn't seem optimized for performance, as the package has to scan a massive range of rows regardless of whether there is data in those rows or not.
The screenshot below contains the Excel sheet that I'm trying to import, based on the MS SQL sample database. The rows I want to remove or ignore are circled with the red box. Everything else not circled is what I want to import.
Any thoughts on how I can ignore the first row, read the second row for my header information, read the rows that follow the header for my data set and then ignore the last couple of rows that I'm deeming as the footer?
Additional Information About This File
The first row will never change.
The header row will never change.
The data set after the header will change values, not data types.
The first column of footer will never change.
The second column of footer will change values, not data types.
The rest of the footer columns will never change.
I figured out the solution to my own question.
I used the Conditional Split as shown in my diagram to filter out the rows I didn't need. For example, I put in a condition that checks whether the first column of data (member_no) is < (less than) a number. If TRUE, the row goes to my OLE DB destination. If FALSE, it goes nowhere. This prevented the "SUM TOTAL" row from being passed to the database.
I also edited my start range to 'Sheet1$A2:I' as opposed to 'Sheet1$A2:I20000'. That way the package scans until there are no more records to scan and stops (I assume).
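The condition used in the split can be illustrated outside SSIS. The sketch below is a Python rendering of the same test, not the SSIS expression itself: a row counts as data only if its first column is a number below some bound, so a footer label such as "SUM TOTAL" falls through. The bound of 1,000,000 and the sample rows are assumptions for the example.

```python
def is_data_row(row, upper_bound=1_000_000):
    """True if the first column (member_no) is a number below upper_bound."""
    try:
        return int(row[0]) < upper_bound
    except (TypeError, ValueError):
        # Footer labels such as "SUM TOTAL" or empty cells are not numbers.
        return False

# Hypothetical rows read from the sheet after the header.
rows = [
    (1, "Smith", 120.50),
    (2, "Jones", 80.00),
    ("SUM TOTAL", None, 200.50),
]
data_rows = [row for row in rows if is_data_row(row)]
print(data_rows)
```

Rows that fail the test are simply discarded here, which corresponds to leaving the FALSE output of the Conditional Split unconnected.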