Comparing two excel files for matching / mismatching rows (long text strings - match() wont work

ahaye

New Member
Joined
Jan 14, 2024
Messages
3
Office Version
  1. 2016
Platform
  1. Windows
Hi folks , long time viewer, first time poster.
At my work, we use excel table for tracking our long term project plan.
I've been tasked to go back through several revisions of the file to flag all changes that were made between each rev. After which, I need to re-add those changes to a different file.
The file initially was setup with a unique ID number filed to label row numbers, but unfortunately many of those updating the file didn't understand the need to update the number sequentially, either not filling it out, or copying the number from the previous row they were duplicating.

Essentially, any time a change is made to an item in our plan, we create a new row in the table, with most the data being duplicated from the original row and then change the value in a column called "status" to superseded.
By way of example, we might have moved a project from 2025 to 2026, no change in scope, cost etc. We would create a new row in the table, copy in all the information from the old row, with two columns being different, the "planned year" would be 2026 and the "status" would be Different Year. We would then change the "status" of the old row to Superseded. We also have a comment filed which would get populated with an explanation of the change. This forms the basis of our change management process to ensure we can defend and explain changes to the plan over long periods of time.

We typically use pivot tables to summarize our plan for any valid rows with specific values in the status field, such as New, Different Year, and In Plan while excluding values like superseded, cancelled etc.
There is a lot of columns in the actual file, and typically in excess of 256 characters of text/number data across all columns per row (if you were to concatenate each column together in a given row).

The file is saved on SharePoint, and each revision gets saved on the server.
I've saved each version of the file I want to review (ie rev 4.12 to 4.11, 4.11 to 4.10 and so on).
I've tried using concatenate to create a unique cell with the combined text of each row's columns in both files to compare against, however nothing seems to be working. The match function is limited in total character count, and v-lookup doesn't seem like a good fit, although I might just be getting fed up with it at this point.

Here is a very brief example of what a typical change might look like. The users are obligated to provide a detailed explanation of the changes they made however that would not usually include enough information to locate the changed & added rows. I've also added a small image of the actual file.

I'm totally out of ideas, anything anyone can suggest would be greatly appreciated! I'm running Excel 2016.
From a knowledge level I'm not well versed in VBA solutions, and I'm not really able to convert this to a macro enabled file, so I'm stuck looking for a solution with the default excel formulas.
Thanks!

Project NameProject YearStatusCostComment
Project 12025Superseded$15,000Project moved to 2026 per reference doc xxxx
Project 12026Different Year$15,000


Oh I should add, I know that the real solution for this type of file is to have this turned into an access database to force people to use it correctly, but the team is frightened to death of ms access haha
 

Attachments

  • excel example.png
    excel example.png
    8.5 KB · Views: 13
Last edited by a moderator:

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
I'm probably the wrong person to comment because I can't directly answer your question. But since no others have appeared, let's try. For example, you could try to create a unique cell content by combining ProjectName + the number of letters in the row or something similar. Your example is too narrow to suggest a precise way to do it.


It would be easier to use VBA to create a checksum from the rows. It is not necessary to change the type of the processed files. You can make a separate workbook where the macros are. Of course, this requires that the use of macros is possible at your work.


My apologies for any quirks, English is not my native language. "So I blame Google translate for all my goofs." :devilish:
 
Upvote 0
I'm probably the wrong person to comment because I can't directly answer your question. But since no others have appeared, let's try. For example, you could try to create a unique cell content by combining ProjectName + the number of letters in the row or something similar. Your example is too narrow to suggest a precise way to do it.


It would be easier to use VBA to create a checksum from the rows. It is not necessary to change the type of the processed files. You can make a separate workbook where the macros are. Of course, this requires that the use of macros is possible at your work.


My apologies for any quirks, English is not my native language. "So I blame Google translate for all my goofs." :devilish:
I've got rudimentary VBA knowledge. Id be willing to try it. I was trained in C++ but never formally in visual basic. most of my background comes from recording macros, and adjusting the outputs to suit my needs.
Ultimately, the challenge I have is, any one of the columns in a single row can be updated, sometimes it may be a clerical update, sometimes it may be a change in year of execution, project manager, cost, etc. I find typically new entries (rows added) are a little more obvious to locate, sometimes as simple as using conditional formatting to highlight duplicates and check through the results. But when its a change to an entry it can be very difficult. For instance, Row 9 could be a modification to supersede Row 45, with a change in D9, E9, L9 (update cost estimate, year of execution and record status) and a change in L45 (changing record status to superseded).
Had the file been setup more like MS Access, with each row having a unique key (ie setting Column A to be sequentially numbered as you add rows) I could have easily used VLOOKUP to compare file 1 to file 2 and find all changes rapidly.

The real final challenge is automating the file so that any added rows in the table get a unique key # added automatically, and prevent duplication errors.
Part of the same issue, not related to this immediate problem (for better context); table columns 'O' and 'M' are automatic formulas which scale our cost estimates to match projected inflation. Users are told the can't over write those two values when copying old data, but quite often people copy a complete row (say A9:Z9) and paste values into the newly create row, breaking the formulas. This is usually an easy fix (just overwrite the formula again) but it would be great if we could just prevent bad habits.

If you have any suggestions for a VBA file compare, I'm all ears to learn!
 
Upvote 0
Here is an example of how to calculate a checksum from rows.

Identical rows must always produce the same checksum and different rows must always produce a different checksum.

This concatenates all columns of a row into text and calculates a checksum from it.

Inside the VBE, Go to Tools -> References, then Select Microsoft XML, v6.0
(Needed for checksum calculation)

The test data should start from cell A1 and end no later than two columns before the R column (which is the P column).
By default, the name of the sheet is Temp
The checksum is written in the R column.

No comparison is performed here yet.

VBA Code:
Option Explicit


Sub TS_Hashing_Range_V5()

Dim wsTemp As Worksheet
Dim ReadDataRNG As Range

On Error GoTo ErrHand: Call TurnOffFeatures

' ***** Here are the variables to define *****
Set wsTemp = ThisWorkbook.Worksheets("Temp")            ' Worksheet to test
Set ReadDataRNG = wsTemp.Range("a1").CurrentRegion      ' Data range
Dim Salt As String: Salt = "Some long text"             ' Salt for hashing
Dim CheckSumColumn As String: CheckSumColumn = "R"      ' Column were checksum is written
Dim coT As Single: coT = Timer()

Dim DataARR As Variant
DataARR = ReadDataRNG.Value2

Dim HashRowSTR As String
Dim HashRowHASH As String

Dim DataARRRows As Long: DataARRRows = UBound(DataARR, 1)
Dim DataARRCols As Long: DataARRCols = UBound(DataARR, 2)

Dim HashARR As Variant
ReDim HashARR(1 To DataARRRows, 1 To 1)

Dim iR As Long, iC As Long

' ***** Calculation of row Hash *****
For iR = 1 To DataARRRows
    HashRowSTR = ""
    
        For iC = 1 To DataARRCols
            HashRowSTR = HashRowSTR & DataARR(iR, iC)
        Next iC
        
    HashRowHASH = Base64_HMACSHA256(HashRowSTR, Salt)
    HashARR(iR, 1) = HashRowHASH
Next iR

wsTemp.Range(CheckSumColumn & 1 & ":" & CheckSumColumn & UBound(HashARR)).Value2 = HashARR
Debug.Print Timer() - coT ' Time spent calculating hash values.


ErrHand:
    Call TurnOnFeatures
    If Err.Number <> 0 Then MsgBox "Something went badly wrong!" & vbCrLf & vbCrLf & "Error number: " & Err.Number & " " & Err.Description

End Sub

Public Function TurnOffFeatures()
    Application.Calculation = xlManual
    Application.EnableEvents = False
    Application.ScreenUpdating = False
    Application.DisplayStatusBar = False
End Function
Public Function TurnOnFeatures()
    Application.Calculation = xlAutomatic
    Application.EnableEvents = True
    Application.ScreenUpdating = True
    Application.DisplayStatusBar = True
End Function

Public Function Base64_HMACSHA256(ByVal sTextToHash As String, ByVal sSharedSecretKey As String)
    Dim asc As Object, enc As Object
    Dim TextToHash() As Byte
    Dim SharedSecretKey() As Byte
    Set asc = CreateObject("System.Text.UTF8Encoding")
    Set enc = CreateObject("System.Security.Cryptography.HMACSHA256")
 
    TextToHash = asc.Getbytes_4(sTextToHash)
    SharedSecretKey = asc.Getbytes_4(sSharedSecretKey)
    enc.Key = SharedSecretKey
 
    Dim bytes() As Byte
    bytes = enc.ComputeHash_2((TextToHash))
    Base64_HMACSHA256 = EncodeBase64(bytes)
    Set asc = Nothing
    Set enc = Nothing
End Function

Private Function EncodeBase64(ByRef arrData() As Byte) As String
 
    'Inside the VBE, Go to Tools -> References, then Select Microsoft XML, v6.0
    '(or whatever your latest is. This will give you access to the XML Object Library.)
 
    Dim objXML As MSXML2.DOMDocument60
    Dim objNode As MSXML2.IXMLDOMElement
 
    Set objXML = New MSXML2.DOMDocument60
 
    ' byte array to base64
    Set objNode = objXML.createElement("b64")
    objNode.DataType = "bin.base64"
    objNode.nodeTypedValue = arrData
    EncodeBase64 = objNode.Text
 
    Set objNode = Nothing
    Set objXML = Nothing
 
End Function


Is row checksum a useful approach to your problem?
 
Upvote 0
Solution
I can’t say for sure if this would have worked out but it is beyond my understanding to implement. I’m going to see if my company can get me a VBA course. In the meantime, I have manually shifted through the data, and I’m going to add a unique row key so that in future v-lookup will correctly operate.
 
Upvote 0

Forum statistics

Threads
1,216,500
Messages
6,131,016
Members
449,615
Latest member
Nic0la

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top