Wednesday, June 24, 2009

VB Macro for deleting or detecting duplicate Outlook Mails

There are so many times that one cant help but wonder at how in spite of all the advances in comfort etc etc, life seems a lot harder than ever... sighhh.

For example, all the haphazard shifting around at office over the last 3 years has resulted in me accumulating a mountain of disorganised backups. Most notable are the outlook backups in PST formats (what are .pst files? They are Outlooks default backup file format extension). I was getting fed-up of helplessly grinding my teeth because of all the duplicate emails in different folders and different PSTs...
..especially when the solution was right in front of my eyes - A simple VB macro!

Finally I took time out to get it done and with a very polished form too as can be seen below screnshots.
To make it easy to carry around, I've stored it as an Excel file.

Download my file or scroll down further to see code to customize.

Excel file containing the macro.

The launched macro!

App in action!
Functionality
The Macro identifies any duplicate items in 2 outlook folders (even those in different PST files) and optionally marks those items.

It does this by simply comparing all the mails (or any Outlook item) in the 2 locations specified and marks the duplicate mails (in both the folders) with a tag of your choice on the Categories attribute of that email. So it only marks duplicate mails and does not do the actual deleting (those who want to automate this step can easily add an extra line in the code to delete instead of mark, but I found it too risky). An option is available to disable even the marking so that you only get the duplicate count in the summary.

Inputs
The actual program inputs are the names of the Outlook PST file and the mail folder of the 2 locations. Of course, if you want to compare 2 folders in the same PST, just provide the same PST name in both sets and choose the different folders.
The other input is the name of the label you want to apply against the Categories attribute for duplicate e-mails.
Finally, there is an option to enable marking of duplicate items.

Steps
Ensure Outlook is running before launching the macro or it will not be able to get the PSTs files loaded in Outlook.
1) Start the Macro.
2) The macro fetches the PST files loaded in your Outlook application and displays the form.
3) Select the PST file in set1. The set1 list gets updated with the folders in this PST file. Select the first folder you need to compare.
4) Similarly, select the PST file in set2. The set2 folder list gets updated with the selected PST file contents. Select the second folder you need to compare here.
5) Next choose whether to Mark the Duplicates or simply show a summary of duplicate sets.
6) Ok Button gets enabled when all the right options are selected.
7) Click on the OK button to start the process! This indexing step of the process may take time for large number of items... a progress bar displays the progress as well as a status bar with basic updates. When the process is done, the status shows the Completed message (and other statistics if you scroll up).
8) Manual step of actual deleting duplicate mails: After running the program, you have to open the Outlook folder that you want to clean up and sort by the Categories column to see those marked in Duplicate like below (the Categories column is hidden by default - this has to be added from the Field Chooser dialog box):
Results showing marked mails under attribute, "Categories"

You can give a quick look to make sure they are really duplicates. Then you simply shift-select those mails and delete! :)

Performance
Efficiency was increased by indexing the 2 folders before identifying duplicates- for 3000 + 1000 emails in the 2 folders, without indexing it took a few hours and still didnt finish (because there would be 3000 * 1000 reading of outlook items)! Now it runs in 5 mins.

Customisation
The current macro only compares the Subject and Creation Date of 2 emails to identify if they are the same (which is actually fairly safe). If you want, you can have more precision by editing the code to check more mail attributes.
Also as mentioned before it is simple to change the code to delete duplicate mails instead of just marking them.

I made the macro portable by embedding it in an Excel document with a launcher button (what you see in the background of the first 2 screens above). The Excel document containing all the code is available here. Of course, you can also create the macro right into your Outlook client itself (you'll have to copy the code manually from the Excel-macro code page to your Outlook-macro code page).

Below is the core code which is called from the form:


Const PST1_NAME = "NewBackup"
Const PST2_NAME = "NewBackup"
Const FOLDER1_NAME = "Inbox"
Const FOLDER2_NAME = "OldInbox"
Const CATEGORY_SEPERATOR = ","
Const FINAL_PROGRESS_ALLOCATED = 20 ' between 1 and 100

Public progressValue 'this holds the percentage completed.
Public progressStatus 'this holds the current status.

' sample with hardcoded psts & folders
Private Sub markDuplicateEmails()
markDuplicates PST1_NAME & SEPERATOR & FOLDER1_NAME, PST2_NAME & SEPERATOR & FOLDER2_NAME, DEFAULT_CATEGORY
End Sub

' actual method which takes dynamic pst\folder source and destination
Public Sub markDuplicates(source, destination, category)
Dim myOlApp, myNameSpace
Set myOlApp = CreateObject("Outlook.Application")
Set myNameSpace = myOlApp.GetNamespace("MAPI")

Dim tmpArray, pst1Name, pst2Name, folder1Name, folder2Name
tmpArray = Split(source, SEPERATOR)
pst1Name = tmpArray(0)
folder1Name = tmpArray(1)
tmpArray = Split(destination, SEPERATOR)
pst2Name = tmpArray(0)
folder2Name = tmpArray(1)

Dim folder1Size, folder2Size
folder1Size = myNameSpace.Folders(pst1Name).Folders(folder1Name).Items.Count
folder2Size = myNameSpace.Folders(pst2Name).Folders(folder2Name).Items.Count

Dim array1() As cstData, array2() As cstData
ReDim array1(folder1Size)
ReDim array2(folder2Size)

Dim outlookItem1, outlookItem2, i, j
Dim theCstmData As Module1.cstData
Dim startTime, endTime

'populate array1
i = -1
startTime = Now
progressStatus = "Indexing set1..."
For Each outlookItem1 In myNameSpace.Folders(pst1Name).Folders(folder1Name).Items
i = i + 1
Set theCstmData.item = outlookItem1
theCstmData.subject = outlookItem1.subject
theCstmData.creationTime = outlookItem1.creationTime
array1(i) = theCstmData
progressValue = 100 * (i / (folder1Size + folder2Size + (folder1Size + folder2Size) * (FINAL_PROGRESS_ALLOCATED / 100)))
DoEvents
Next outlookItem1
progressStatus = "Indexing set1 Complete."
' populate array2
i = -1
progressStatus = "Indexing set2..."
For Each outlookItem2 In myNameSpace.Folders(pst2Name).Folders(folder2Name).Items
i = i + 1
Set theCstmData.item = outlookItem2
theCstmData.subject = outlookItem2.subject
theCstmData.creationTime = outlookItem2.creationTime
array2(i) = theCstmData
progressValue = 100 * ((folder1Size + i) / (folder1Size + folder2Size + (folder1Size + folder2Size) * (FINAL_PROGRESS_ALLOCATED / 100)))
DoEvents
Next outlookItem2
progressStatus = "Indexing set2 Complete."
progressStatus = "Indexing time: " & (Now - startTime) * 60 * 60 * 24

'loop through each item in array1
progressStatus = "Applying Category labels on duplicates..."
For i = 0 To folder1Size - 1
'loop through each item in array 2 comparing each array2Item with current array1item
For j = 0 To folder2Size - 1
' if it is a match mark the item in array2 as duplicate
If array1(i).subject = array2(j).subject And _
array1(i).creationTime = array2(j).creationTime Then
If array1(i).item.Categories = "" Then
array1(i).item.Categories = category
Else
array1(i).item.Categories = array2(j).item.Categories & CATEGORY_SEPERATOR & category
End If
array1(i).item.Save

If array2(j).item.Categories = "" Then
array2(j).item.Categories = category
Else
array2(j).item.Categories = array2(j).item.Categories & CATEGORY_SEPERATOR & category
End If
array2(j).item.Categories = category
array2(j).item.Save
End If
DoEvents
Next j
progressValue = (100 - FINAL_PROGRESS_ALLOCATED) + (FINAL_PROGRESS_ALLOCATED * (i / folder1Size))
Next i
progressStatus = "Total Time: " & (Now - startTime) * 60 * 60 * 24
progressStatus = "All done."
End Sub


Update [25-Jun-09]:
New features
- Option to only count duplicates instead of Marking.
- Now displays even sub-folders recursively
- 2 new status views available to display number of items in selected folder.
- Other performance enhancements
See screenshot below:


Update [03-Jul-09]:
New features
-support for choosing same folder in set1 and set2. This will mark/count duplicate sets of mails in the same folder.

Update [15-Feb-12]:
New features
-Support for identifying Missing mails - choose source and destination folders, and all mails missing in destination will be identified.
-Support for marking labels as well as just counting Duplicates or Missing mails


Note: This was tested on Outlook 2003 but should work on any outlook...

33 comments:

  1. Thanks for the great macro. It didn't work when I chose Set-1 and Set-2 same folder. My all duplicate mails in same folder. Actually I don't need to compare, I just have to put a label for emails that have same subject

    ReplyDelete
  2. Serhat
    I've added support for identifying duplicates in the same folder too!
    Cheers

    ReplyDelete
  3. Sachin, many thanks...
    Have you updated the xls file too?
    Or should I update the macro code myself?

    ReplyDelete
  4. No prob.. thanks for the review,
    Yes the updated macro is in the xls file... & the latest code is in it too

    ReplyDelete
  5. I have seen numerous software in my life. But yesterday I was at the Inet and surprised reason of I found - how to recover corrupt .pst manually. It restored my old damaged emails!! Moreover I knew how this tool solved my issue for free as I kept in mind.

    ReplyDelete
  6. Hi,

    I tried to use the macro in Excel 2007 but got "Compile error : syntax error". When using Debug/Compile in VB, received "User-defined type not defined" on line
    Dim array1() As cstData, array2() As cstData
    I presume that a type library is missing in the references (currently = Visual Basic for Applications; Microsoft Excel 12.0 Object Library; OLE Automation; Microsoft Forms 2.0 Object Library; Microsoft DAO 3.6 Object Library).

    Also download code references - "Download my file" & "The Excel document containing all the code is available here." don't work.

    Thanks in advance,
    Larry

    ReplyDelete
  7. Hi Larry

    Yes, it does require a library but its quite simple and was in the example file.

    As for the file itself, thanks for letting me know - it seems the place where I stored my blog files stopped hosting them.
    I'll find another copy and update it here in a couple of days max.

    Regards
    Sachin

    ReplyDelete
  8. Excel file is back and its the latest version - faster and more features!

    ReplyDelete
  9. Could you add in the ability to loop through all folders in a mailbox? Currently I have over 80 folders and the majority of them contain duplicates to some degree.

    Any help would be great.

    ReplyDelete
  10. I'm pretty much tied up with a few things...
    But you could take a shot yourself - vb macros are easy. You could refer to any only free vba macro tutorial. I picked it up from Excel's built-in help files.

    ReplyDelete
  11. Thank you so much for this!
    Is there a version that will support the 64 bit version of Outlook ver 2010?

    ReplyDelete
  12. Sachin,

    Great Work!!! is there an option to delete duplicate Feeds?

    Regards,
    Naveen...

    ReplyDelete
    Replies
    1. Hi,

      Thanks for sharing code which helps to deleted the duplicate email messages from Outlook. I want to recommend a tool which also help to remove all email item from Outlook i.e. Stellar Phoenix Outlook Duplicate Remover.

      Delete
  13. Thanks for the macro! It saved me a few days.

    However, I've caught an error while indexing my 5-years-old archive. Something like the one mentioned here: http://support.microsoft.com/kb/258527
    I believe there were some emails with expired or corrupted certificates, but the macro just stops. A couple of rows with exception handling would solve the problem. But I suspect that I'm not eligible to make any changes in your code and post it anywhere (there's no license info but I appreciate your work). Would you please contact me for details?

    ReplyDelete
  14. Glad to hear it helped. And would like to resolve the error ur getting but I'm stretched thin on a number of things right now! I'll try and remember to pick this up when I get some time. And in case you have the inclination, feel free to modify, use and republish the code as needed if its for personal use.

    ReplyDelete
    Replies
    1. When I try to run the macro I get a visual basic compile error: can't find project or library

      Delete
  15. I am curious as to why this is not a macro within outlook and then integrated into the toolbar.

    ReplyDelete
    Replies
    1. Any macro can be integrated into the toolbar. Exact steps depend on your version of outlook but can be found in the standard Outlook help docs.

      Delete
  16. Hi! How long do you think it will take to index 20,000 emails in the same folder (inbox)?

    ReplyDelete
    Replies
    1. This varies depending on your machine capacity. The best way is measure the time for some 100 emails. And extrapolate this.

      Delete
  17. Thanks, Sachin! Unfortunately, after completion it said that there were 0 duplicate sets. Any idea why it didn't pick up my duplicate emails? I was comparing emails in the same inbox.

    ReplyDelete
    Replies
    1. I have this same result (marked 0 duplicate sets) - I am not sure why.

      Delete
    2. @Karris, @LaxMom - Perhaps its the way my code identifies duplicate mails? The code checks that the 'Subject' AND the 'Creation Date' of the 2 mails being compared. Could it be that the mails you had expected to be marked as duplicates have different 'Creation dates' for some reason?

      Delete
    3. Unfortunately yes, the duplicates have different creation dates.

      Delete
    4. Well if you are okay with only comparing subjects to identifying duplicates (the risk is that different mails that happen to have the same subject will be considered duplicates), in the macro, just remove the line that checks for creation date.

      So replace this:
      If array1(i).subject = array2(j).subject And _
      array1(i).creationTime = array2(j).creationTime Then


      with this:
      If array1(i).subject = array2(j).subject Then

      Delete
  18. I would like to have a right click menu item to mark a mail as spam and to never get the same mail in inbox again. That is because there are the event invitations circulating in our organization dozens of times each. I get daily mails which I have got already dozens of times and that is frustrating. I just want to see the invitation once, not more.

    How could I do that? Could this macro script be customized to do it?

    ReplyDelete
  19. Hi Sachin,

    I face some problem regarding code in comparing state. I post some data like ...

    array1(i).creationTime = array2(j).creationTime this code return False

    but I manually check this is the same value in like this
    For array one
    ?array1(i).creationTime
    02-Aug-13 8:54:34 AM

    And For array two

    array2(j).creationTime
    02-Aug-13 8:54:34 AM

    Why in this condition return false

    Ragards,

    Abhilash

    ReplyDelete
  20. Sachin (or anyone else in the group that could help).

    I am experiencing the follow problems with the “packaged code”:

    System: Windows XP, MS Excel 2007, MS Outlook 2007.
    Trust Center (currently set most permissive possible)
    I have verified that I have the following Automation libraries: Excel.exe, MSOutl.olb, Dao350.dll Dao360.dll

    Here are the steps I performed:
    1. Download the packaged file “DuplicateAndMissingMailsMarker.xls”.
    2. Start MS Outlook 2007
    3. Open the file “DuplicateAndMissingMailsMarker.xls”in Excel 2007, click the button.
    a. Dialog box from Excel reports: Automation error Unspecified error.
    b. Click OK, VBA editor starts, then I select Debug->Compile Project and get another error
    i. Compile error: Automation error
    ii. The error appears to originate in the CommonModule at “Public xT as TimerX” line.
    iii. I can see the “class module” TimerX under “Class Modules” so not sure what the problem is.
    4. After the automation errors have occurred I try to save the file to a new name as a “macro enabled workbook” and Excel reports the following:
    a. Errors were detected while saving [c:\path to file]. Microsoft Office Excel may be able to save the file by removing or repairing some features. To make the repairs in a new file, click Continue. To cancel saving the file, click Cancel.
    b. I click Continue, enter new filename in dialogue box and then get another dialogue error box: Excel encountered errors during save. However excel was able to minimally save your file to: [path to file in Application Data\Microsoft\Excel\XL70D0.xlsb].
    c. Then it reports: Damage to the file was so extensive that repairs were not possible. Excel attempted to save your formulas and values, but some data may have been lost or corrupted.

    I'd really like to get the code working but I appear to have some permission problem, library problem, or perhaps both. The code looks like the perfect solution to my problem so I’d appreciate any help in getting it working and am willing to share the solution.

    Someone mentioned a Type Library but not sure that’s causing the problem. Any help would be greatly appreciated.

    ReplyDelete
  21. Sachin (or anyone else in the group that could help).

    Not sure if this thread is being followed anymore or not, but I too am experiencing the follow problems with the “packaged code” as "JustMe" above is:

    System: Windows 7, MS Excel 2007, MS Outlook 2007.
    Trust Center (currently set most permissive possible)
    I have verified that I have the following Automation libraries: Excel.exe, MSOutl.olb, Dao350.dll Dao360.dll

    I performed the similar steps as above:
    1. Download the packaged file “DuplicateAndMissingMailsMarker.xls”.
    2. Start MS Outlook 2007
    3. Open the file “DuplicateAndMissingMailsMarker.xls”in Excel 2007, click the button.
    a. Dialog box from Excel reports: Automation error Unspecified error.
    b. Click OK, VBA editor starts, then I select Debug->Compile Project and get another error
    i. Compile error: Automation error
    ii. What I have been able to do is narrow the issue down to the ProgressBar1 on the userform
    iii. When trying to view the properties of the ProgressBar1, I receive the following error: "System Error &H80004005 (-2147467259). Unspecified Error"
    iv. Removed ProgressBar1 and commented out all related code and the code runs without error - now its not finding the duplications, but that's another issue

    ReplyDelete
  22. If you want try any simple way to remove you all duplicates mails from your outlook mail device then you visit Kernel for Outlook duplicates remover tool to remover all over useless and duplicates mails item from your MS outlook. The software support with all version of MS Outlook with (32 Bit and 64 Bit).

    ReplyDelete
  23. Hi Sachin,
    Could you please help me out with macro excel, which deletes emails from folder X automatically where emails are less than 3 days. I would like to schedule this job automatic so that i run the job automatic.

    ReplyDelete
  24. Excellent Product, saved me me a lot of time and space, much appreciated thank you

    ReplyDelete

 
Superblog Directory