LetsDefend — MSHTML Challenge Walkthrough

Maldoc analysis using zipdump.py, re-search.py, & VirusTotal

Aug 25, 2024

Image Credit: https://app.letsdefend.io/challenge/mshtml

Introduction:

Have you ever come across a suspicious document file and wondered if it’s doing something malicious in the background? If so, welcome to another weekly walkthrough — you’ve stumbled on the right blog!

This week, we’re tackling the MSHTML challenge from LetsDefend. Our mission is to analyze four malicious document (maldoc) samples, discover the IP addresses and domains hidden within them, and use that information to figure out which vulnerability (CVE) the threat actor is exploiting.

Throughout this walkthrough, we’ll explore the inner workings of .docx files to find indicators of compromise (IOCs). To do that, we’ll use several tools from Didier Stevens, including zipdump, re-search, and numbers-to-string, to extract the artifacts. Then, we’ll leverage VirusTotal to correlate threat intelligence and determine the exploited CVE. Sounds like a fun time!

Although LetsDefend rates this challenge as Hard, we’ll go through it step-by-step to make it much more accessible. What are we waiting for? Thanks for reading along with me!

Challenge Link: https://app.letsdefend.io/challenge/mshtml

Challenge Scenario:

2021’s 0-Day MSHTML

Question 1: Examining the Employees_Contact_Audit_Oct_2021.docx file, what is the malicious IP in the docx file?

All right, let’s jump right in! But before we go too far down the rabbit hole, let’s check out the Tools folder on the Desktop to see what we have available at our disposal to work through this challenge.

For Question 1, we are going to be performing some analysis on a .docx file. It wouldn’t be much fun if we could simply just open it and find our answer, right?

With that in mind, let’s first get some background about the structure of the document’s format from the SANS Analyzing Malicious Documents cheat sheet:

OOXML document files (.docx, .xlsm, etc.) supported by Microsoft Office are compressed zip archives.

Interesting, since a .docx file is basically a zip archive, let’s go back and see what tool in the Tools folder might be able to help with this task. Maybe we can utilize Didier Stevens’ zipdump.py? According to the SANS cheat sheet, this utility can be used to “examine contents of OOXML file” — it sounds like this might fit the bill, let’s try it!

We’ll use the following syntax to perform a basic analysis on the document:

python3 zipdump.py /root/Desktop/ChallengeFiles/Employees_Contact_Audit_Oct_2021.docx

We can see that zipdump.py lists out all the files contained within the .docx and assigns each one an index number. There are so many to choose from! Let’s start with a broad strokes approach.

After consulting the manual for zipdump.py, we can use the --dumpall (-D) option to dump all these files rather than focus on a specific one for now.
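To make the "a .docx is just a zip archive" idea concrete, here is a minimal Python sketch that lists archive members with an index, roughly in the spirit of the zipdump.py listing. It is not how zipdump.py is implemented. The sample archive is a hypothetical stand-in built in memory, not one of the challenge files, and the helper names are my own.

```python
import io
import zipfile


def build_sample_docx() -> bytes:
    """Build a tiny stand-in for a .docx (hypothetical content, not a challenge file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    return buffer.getvalue()


def list_members(docx_bytes: bytes) -> list[tuple[int, str, int]]:
    """Return (index, name, uncompressed size) for each archive member."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return [
            (index, info.filename, info.file_size)
            for index, info in enumerate(archive.infolist(), start=1)
        ]


members = list_members(build_sample_docx())
for index, name, size in members:
    print(index, name, size)
```

To try it on a real file, you could read the bytes from disk and pass them to list_members instead of the in-memory sample.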

But how will that help us analyze the output? For this, we can pipe the output into another Didier Stevens tool, re-search.py.

re-search.py is a tool that uses regular expressions to search through files. You can use regular expressions from a small builtin library, or provide your own regular expressions.

Combining zipdump and re-search, we’ll use the below command to dump all the indexes in the sample, pipe them into re-search, and then use the included filters to search the output for unique IPv4 addresses:

python3 zipdump.py -D /root/Desktop/ChallengeFiles/Employees_Contact_Audit_Oct_2021.docx | python3 re-search.py -n -u ipv4

Now we’ve located an IP address within the document and found the answer to Question 1!
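If you are curious what the dump-and-search pipeline is doing conceptually, the sketch below reads every member of an archive and collects unique IPv4-looking strings. The regular expression and the octet check are my own simplification, not the pattern that re-search.py ships with, and the sample archive uses 203.0.113.7, an address reserved for documentation, rather than anything from the challenge.

```python
import io
import re
import zipfile

# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted number.
IPV4_PATTERN = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def is_valid_ipv4(candidate: bytes) -> bool:
    """Reject matches such as 999.1.1.1 where an octet exceeds 255."""
    return all(int(octet) <= 255 for octet in candidate.split(b"."))


def unique_ipv4s(archive_bytes: bytes) -> list[str]:
    """Scan every archive member and return unique IPv4 strings in first-seen order."""
    found: list[str] = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            for match in IPV4_PATTERN.findall(archive.read(name)):
                address = match.decode("ascii")
                if is_valid_ipv4(match) and address not in found:
                    found.append(address)
    return found


def build_sample() -> bytes:
    """Hypothetical stand-in archive; 203.0.113.7 is a documentation-only address."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:t>build 999.1.1.1</w:t>")
        archive.writestr(
            "word/_rels/document.xml.rels",
            '<Relationship Target="http://203.0.113.7/a.html"/>'
            '<Relationship Target="http://203.0.113.7/b.html"/>',
        )
    return buffer.getvalue()


ips = unique_ipv4s(build_sample())
print(ips)
```

The de-duplication plays the same role as the unique option in the command above: the address appears twice in the sample but is reported once.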

While we’re at it, let’s do some additional threat intelligence gathering about this IP address on VirusTotal. This could come in handy for later in the challenge…

Question 2: Examining the Employee_W2_Form.docx file, what is the malicious domain in the docx file?

Just as we did for the previous question, we’ll combine zipdump with the filtering capabilities of re-search, this time to locate domains within the dump instead of IPv4 addresses.

Let’s look at the options for re-search again:

https://github.com/DidierStevens/DidierStevensSuite/blob/master/re-search.py

At first glance, the url and url-domain options seem like the best choices to use with re-search. But we’ll hit a snag and not locate any suspicious hits when using these filters. Let’s pivot and try a third option, domaintld, in case the top-level domain is not one that is found with the standard url filter.

python3 zipdump.py -D /root/Desktop/ChallengeFiles/Employee_W2_Form.docx | python3 re-search.py -u -n domaintld

There we go! Using the domaintld filter, we found the below domain in the document and can answer Question 2!
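As a rough illustration of the idea behind a domain filter that also checks the top-level domain, here is a small Python sketch. I am assuming the filter works along these lines; the regular expression and the tiny KNOWN_TLDS set are hypothetical and are not taken from re-search.py, which may behave differently. The sample text is made up and uses a placeholder domain.

```python
import re

# Hypothetical, deliberately tiny TLD list for illustration only.
KNOWN_TLDS = {"com", "net", "org", "xyz", "top"}

# One or more dot-terminated labels followed by an alphabetic final label.
DOMAIN_PATTERN = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+([a-z]{2,63})\b",
    re.IGNORECASE,
)


def unique_domains(text: str) -> list[str]:
    """Return unique domain-like strings whose last label is in KNOWN_TLDS."""
    seen: list[str] = []
    for match in DOMAIN_PATTERN.finditer(text):
        domain = match.group(0).lower()
        if match.group(1).lower() in KNOWN_TLDS and domain not in seen:
            seen.append(domain)
    return seen


sample_text = (
    'Target="mhtml:http://files.example.xyz/a.html" '
    "word/document.xml settings.xml "
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships"'
)
domains = unique_domains(sample_text)
print(domains)
```

Notice that file names such as document.xml are dropped because "xml" is not in the TLD set, while the legitimate schema host still shows up. With real documents you should expect that kind of benign noise next to the interesting hit.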

Question 3: Examining the Work_From_Home_Survey.doc file, what is the malicious domain in the doc file?

Okay, Question 3 has us analyzing a .doc file. Although the extension differs from .docx, let’s approach this question the same way we answered Question 2, using zipdump.py and re-search.py with the domaintld filter:

python3 zipdump.py -D /root/Desktop/ChallengeFiles/Work_From_Home_Survey.doc | python3 re-search.py -n -u domaintld

This seems promising but this domain isn’t long enough to answer Question 3…

Let’s dig more deeply. Instead of using the zipdump.py -D option to dump all the streams, let’s try to analyze them individually. But how do we know which streams to focus on?

Well, let’s do some Google research about the OOXML format to find out more about which stream contains external references like URLs. After some brief searching, we’ll stumble across a reference page on the WordprocessingML file type from officeopenxml.com, which has a very helpful note:

Office Open XML - Anatomy of an OOXML WordProcessingML File (officeopenxml.com)

A WordprocessingML or docx file is a zip file (a package) containing a number of…

With that background information, we are going to focus on stream 10 (-s 10) and dump (-d) the content from this file only using the below command.

python3 zipdump.py /root/Desktop/ChallengeFiles/Work_From_Home_Survey.doc -s 10 -d

This returns a huge blob of output but let’s focus on the highlighted section where we see a Relationship Id. We know from the OOXML specification that this should be the right location to find external links but it seems to be encoded…
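For readers who want to see what "external links live in the relationships part" looks like in practice, here is a minimal sketch that parses a relationships XML document and keeps only the entries marked as external. The sample XML, its rId values, and the example.test target are hypothetical and are not copied from the challenge file.

```python
import xml.etree.ElementTree as ET

# Namespace used by OOXML package relationship parts.
RELS_NAMESPACE = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical relationships part with one internal and one external entry.
SAMPLE_RELS = (
    '<Relationships xmlns="' + RELS_NAMESPACE + '">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/styles" Target="styles.xml"/>'
    '<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/oleObject" '
    'Target="http://example.test/payload.html" TargetMode="External"/>'
    "</Relationships>"
)


def external_targets(rels_xml: bytes) -> list[tuple[str, str]]:
    """Return (Id, Target) for every Relationship with TargetMode="External"."""
    root = ET.fromstring(rels_xml)
    results = []
    for rel in root.findall(f"{{{RELS_NAMESPACE}}}Relationship"):
        if rel.get("TargetMode") == "External":
            results.append((rel.get("Id"), rel.get("Target")))
    return results


targets = external_targets(SAMPLE_RELS.encode("utf-8"))
print(targets)
```

Filtering on TargetMode narrows a large relationships dump down to the handful of entries that point outside the document, which is usually where a remote template or object reference would sit.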

While we could use something like CyberChef to perform some decoding/transformation, let’s stick with the provided utilities and use another of Didier Stevens’ tools: numbers-to-string.py.

numbers-to-string.py is a Python program that reads texts files (as arguments on the commandline, @here files or stdin), extract numbers from these files and converts these to strings.
The first argument of numbers-to-string.py is a Python expression. This Python expression can use variable n that represents each extracted number.

This should allow us to dump the content of this stream and pipe it into numbers-to-string.py to perform the conversion for us.

python3 zipdump.py /root/Desktop/ChallengeFiles/Work_From_Home_Survey.doc -s 10 -d | python3 numbers-to-string.py

There we go! By doing some research and combining two of the included tools, we’ve uncovered a malicious domain within the .doc file!
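The conversion itself is simple enough to sketch in a few lines of Python. This is a simplified approximation of the idea (pull out the decimal numbers and map each one to a character), not the actual logic of numbers-to-string.py. I am assuming the encoded value is a run of decimal character codes, and the sample string with its example.test domain is made up.

```python
import re


def numbers_to_text(data: str) -> str:
    """Extract decimal numbers and turn printable ASCII codes into characters."""
    numbers = [int(token) for token in re.findall(r"\d+", data)]
    # Keep only printable ASCII so stray numbers (like the 6 in rId6) are dropped.
    return "".join(chr(n) for n in numbers if 32 <= n < 127)


# Hypothetical encoded attribute; the codes spell out a placeholder URL.
encoded = (
    'Id="rId6" Target="&#104;&#116;&#116;&#112;&#58;&#47;&#47;'
    "&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#116;&#101;&#115;&#116;"
    '" TargetMode="External"'
)
decoded = numbers_to_text(encoded)
print(decoded)
```

On a full stream dump you would get extra characters from unrelated numbers in the XML, so expect to pick the readable URL out of some surrounding noise.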

Question 4: Examining the income_tax_and_benefit_return_2021.docx, what is the malicious domain in the docx file?

For Question 4, we’re looking for a malicious domain again, so let’s circle back and apply the same process that we used to answer Question 2.

Except instead of using the domaintld option like we used before, let’s see if we get any hits using the url-domain option.

python3 zipdump.py -D /root/Desktop/ChallengeFiles/income_tax_and_benefit_return_2021.docx | python3 re-search.py -n -u url-domain

Hey, we found one unique domain in the output! Let’s check it against VirusTotal to confirm whether it is malicious.

We have a few hits, but we’ll go a step further and check the Relations > Communicating Files tab, where we will see several file hits including one that looks very familiar…

Let’s submit our answer and move on to the final question to determine what common vulnerability all of the sample files exploit.

Question 5: What is the vulnerability the above files exploited?

Okay, last question! To tackle Question 5, we’ll check the file hash of each sample to collect more information from VirusTotal and discover the common vulnerability that each malicious document exploits.

First, to get the hashes, we’ll run the sha256sum command against all the files in the ChallengeFiles directory:

sha256sum *
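If sha256sum is not available on your analysis machine, the same digests can be computed with Python's hashlib. The sketch below hashes every file in a directory. The function names are my own, and the demo runs against a temporary directory containing a single empty placeholder file rather than the challenge samples.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory: Path) -> dict[str, str]:
    """Map each regular file name in the directory to its SHA-256 hex digest."""
    return {
        entry.name: sha256_of_file(entry)
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


# Demo on a throwaway directory with one empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "empty.docx").write_bytes(b"")
    hashes = hash_directory(Path(tmp))
print(hashes)
```

Pointing hash_directory at the ChallengeFiles folder should give the same hex strings that sha256sum prints, ready to paste into a VirusTotal search.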

Then, we can submit each of the hashes to VirusTotal.

https://www.virustotal.com/gui/file/679bbe0c50754853978a3a583505ebb99bce720cf26a6aaf8be06cd879701ff1

https://www.virustotal.com/gui/file/ed2b9e22aef3e545814519151528b2d11a5e73d1b2119c067e672b653ab6855a

https://www.virustotal.com/gui/file/84674acffba5101c8ac518019a9afe2a78a675ef3525a44dceddeed8a0092c69

https://www.virustotal.com/gui/file/d0e1f97dbe2d0af9342e64d460527b088d85f96d38b1d1d4aa610c0987dca745

Did you notice that each one is tagged with the label CVE-2021-40444? I think we have found the answer, but let’s do some additional research on this vulnerability from Microsoft, which describes it as:

A remote code execution vulnerability in MSHTML that affects Microsoft Windows. Microsoft is aware of targeted attacks that attempt to exploit this vulnerability by using specially-crafted Microsoft Office documents.
An attacker could craft a malicious ActiveX control to be used by a Microsoft Office document that hosts the browser rendering engine. The attacker would then have to convince the user to open the malicious document.

While this is just a brief summary, we get the idea that the samples we’ve analyzed are Microsoft Office documents specially-crafted to exploit a Windows MSHTML vulnerability. Between the intelligence we gathered from VirusTotal and the CVE details from Microsoft, we have enough data to answer Question 5!

Conclusion:

Mission accomplished! We’ve successfully analyzed each of the four maldoc samples, found the IP addresses and domains within them, and used those artifacts to figure out which CVE was exploited. Let’s wrap up this investigation.

A big thank you to LetsDefend for another excellent challenge! While I’ve used Didier Stevens’ tools before, I hadn’t had the opportunity to try re-search or numbers-to-string. These tools really helped to speed up the investigation since I didn’t have to pivot to external tools, and they were powerful for parsing the zipdump output. This was a great opportunity to practice with these tools hands-on!

If you found this walkthrough helpful in leveling up your skills or getting you through a tricky question, please give it a clap! Your feedback lets me know that I helped you out on your security journey. We’re in this together! Thanks for the support!

Until next week’s challenge — stay curious and be safe out there!

Tools & References:

SANS Cheat Sheet for Analyzing Malicious Documents: https://www.sans.org/posters/cheat-sheet-for-analyzing-malicious-documents/

Didier Stevens (zipdump.py): https://github.com/DidierStevens/DidierStevensSuite/blob/master/zipdump.py

Didier Stevens (re-search.py): https://blog.didierstevens.com/2021/02/22/re-search-py-and-custom-validations/

VirusTotal: https://www.virustotal.com/gui/ip-address/175.24.190.249/relations

Office Open XML Reference: http://officeopenxml.com/anatomyofOOXML.php

Didier Stevens (numbers-to-string.py): https://github.com/DidierStevens/DidierStevensSuite/blob/master/numbers-to-string.py

VirusTotal (Employee_W2_Form.docx): https://www.virustotal.com/gui/file/679bbe0c50754853978a3a583505ebb99bce720cf26a6aaf8be06cd879701ff1

VirusTotal (Employees_Contact_Audit_Oct_2021.docx): https://www.virustotal.com/gui/file/ed2b9e22aef3e545814519151528b2d11a5e73d1b2119c067e672b653ab6855a

VirusTotal (Work_From_Home_Survey.doc): https://www.virustotal.com/gui/file/84674acffba5101c8ac518019a9afe2a78a675ef3525a44dceddeed8a0092c69

VirusTotal (income_tax_and_benefit_return_2021.docx): https://www.virustotal.com/gui/file/d0e1f97dbe2d0af9342e64d460527b088d85f96d38b1d1d4aa610c0987dca745

Microsoft MSHTML Remote Code Execution Vulnerability: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40444