Blue Team Labs Online — Employee of the Year Challenge Walkthrough

Analyzing a DD disk image with Scalpel and PhotoRec

Drew Arpino
10 min read · Jun 16, 2024
Image Credit: https://blueteamlabs.online/

Introduction:

Welcome to my weekly walkthrough!

Have you ever been curious about recovering deleted data from a disk image file? Well, we’re about to explore data recovery and analysis by tackling the Employee of the Year challenge from Blue Team Labs Online! This is a capture-the-flag-style challenge that has us defenders investigating a DD disk image, searching for lost files, and recovering flags from inside the document structures by leveraging Scalpel and PhotoRec.

So, whether you’re here to learn more about DD file analysis, check out some practical use of file carving tools, or are just looking for a reference walkthrough for the Employee of the Year challenge, you’ve stumbled on the right blog.

Now, let’s put on our detective hats and have some fun with forensics! Thanks for reading along!

Challenge Link: https://blueteamlabs.online/home/challenge/employee-of-the-year-df16bc36f3

Challenge Scenario:

John received the ‘Best Employee of the Year’ award for his hard work at FakeCompany Ltd. Unfortunately, today John deleted some important files (typical John!). It’s your job to recover the deleted files and capture all the flags contained within!

Set up the REMnux Analysis Environment & Extract the Challenge File:

Safety first — It’s always important when working with lab/challenge files from Blue Team Labs Online (or any educational lab/challenge/range) to keep yourself safe by performing these tasks in a dedicated, isolated virtual machine environment. For example, I’m using REMnux for this challenge and walkthrough.

To keep this write-up focused, I’m going to skip a step-by-step setup guide for REMnux. Instead, if you want to set up your own REMnux environment, please follow the directions provided by REMnux directly. I opted for the virtual appliance method.

Okay! Now that we have our virtual environment created, updated, isolated, and snapshotted, we can download and extract our challenge file and get started!

Question 1: What is the text written on the recovered gif image?

Let’s dive right in and get an overview of the .dd file. This is a raw disk image file and we will be working to recover the data deleted by the user.

To start out, we’re going to use the strings command. At a high level, it prints the sequences of printable characters (strings) it finds in the image to the console, which reveals some of the data inside.

strings recoverfiles.dd
Output of the strings command on the .dd image file.
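If you’re curious what strings is doing under the hood, here is a minimal Python sketch of the idea: scan the raw bytes and keep any run of printable ASCII that is at least four characters long. This is a simplification and not the real implementation. The function name extract_strings and the sample bytes are made up for illustration.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical bytes standing in for a small slice of a disk image.
sample = b"\x00\x01NOTES.TXT\x00\xff\xfeab\x00hello world\x07"
print(extract_strings(sample))  # ['NOTES.TXT', 'hello world']
```

The two-character run "ab" is dropped because it is shorter than the minimum length, which is how most of the random noise in a binary file gets filtered out.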

Right away we will see some interesting, relevant strings at the very top of the output. For Question 1 we are going to focus on recovering the .gif image, but how do we extract the information from the image?

We’re going to use the data carving tool, Scalpel. According to the REMnux documentation, Scalpel is used to:

Carve contents out of binary files, such as partitions.

So how do we do this? Well, Scalpel uses a targeted approach, so we need to know what type of file we’re looking for. In this case we know that we need to carve out a GIF file, so we’ll first need to adjust the Scalpel configuration file by uncommenting (removing the #) the relevant lines for GIF files in a text editor. For example, I’ll use Nano.

sudo nano /etc/scalpel/scalpel.conf
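To make the idea of header/footer carving a bit more concrete, here is a rough Python sketch. It assumes the GIF entries in the configuration describe a header of GIF87a or GIF89a, a footer of the bytes 00 3B, and a maximum carve size, which is my understanding of the stock file. The carve_gifs function and the fake data are hypothetical, and the real Scalpel is considerably more sophisticated than this.

```python
from typing import List

GIF_HEADERS = (b"GIF87a", b"GIF89a")  # the two GIF version signatures
GIF_FOOTER = b"\x00\x3b"              # block terminator followed by the GIF trailer


def carve_gifs(data: bytes, max_size: int = 5_000_000) -> List[bytes]:
    """Naively carve header-to-footer spans out of a raw byte buffer."""
    carved = []
    pos = 0
    while pos < len(data):
        # Find the nearest header of either version at or after pos.
        starts = [i for i in (data.find(h, pos) for h in GIF_HEADERS) if i != -1]
        if not starts:
            break
        start = min(starts)
        # Look for the first footer after the header, within max_size bytes.
        end = data.find(GIF_FOOTER, start + 6, start + max_size)
        if end == -1:
            pos = start + 6  # no footer in range: skip this header and keep going
            continue
        carved.append(data[start:end + len(GIF_FOOTER)])
        pos = end + len(GIF_FOOTER)
    return carved


# Hypothetical stand-in for a deleted GIF surrounded by unrelated bytes.
fake_gif = b"GIF89a" + b"\x01\x02\x03" + GIF_FOOTER
image = b"\x00" * 16 + fake_gif + b"\xff" * 16
print(carve_gifs(image) == [fake_gif])  # True
```

Stopping at the first footer match can truncate a real image whose pixel data happens to contain the same two bytes, which is one reason to rely on a purpose-built carver rather than a sketch like this.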

Now that we have selected the GIF file types, we can run Scalpel against the image file to extract any of the matching file types. I made a folder called Recovered that we will use as an output directory.

Let’s try it out!

scalpel -o Recovered/ recoverfiles.dd

It looks like Scalpel was able to carve out one GIF file. Let’s check the Recovered folder and see what it found.

GIF file extracted by Scalpel.

Good job, indeed! Let’s submit the answer and move on to Question 2.

Question 2: Submit Flag1

Since we tested Scalpel for Question 1, why don’t we try a different tool for Question 2?

There is another suggested tool for this challenge, PhotoRec. This is another data recovery tool that we are going to leverage to retrieve files from the disk image. One of the benefits of PhotoRec is that it has many more file types selected by default, so we don’t necessarily need to know what exactly we are looking for. This is going to be critical since the only clue we know to look for is Flag1.

Let’s try PhotoRec out:

sudo photorec recoverfiles.dd

There will be a few screens that will require some input from us, but I just left the default settings.

At the last menu screen, you will need to select an output destination and press C to confirm your choice.

Okay, PhotoRec carved 5 files out of the image. Let’s review them and see what we found:

Contents of the PhotoRec recovery.

Very interesting! The first file we see is the GIF file from Question 1, so starting with PhotoRec would have saved us some time. More important, though, is the .png file. Let’s open it up to find Flag1!

Question 3: Submit Flag2

Now that we found the first flag, let’s keep looking at the files that PhotoRec recovered for us.

Contents of the PhotoRec recovery.

We’ll come back to the .pdf and .mp4 files later in the challenge, so for now let’s focus on the file f0009072.docx.

We don’t have a way of opening the file to view the contents, so we are going to do a little static analysis on the structures of the file itself.

Let’s establish some background theory about the .docx file format first.

According to Microsoft:

The Open XML format (.docx/.xlsx/.pptx) is the default format in all supported versions of Microsoft Office and, unless you have a specific reason to use a different format, it’s the format we recommend using for your Office file

Now the Office Open XML (OOXML) format is essentially structured as a ZIP archive and made up of XML files and other data (files, images, etc.). If we use a tool like oleid we can confirm the container format:

All of this is to say that we need to view the content structure of this file to see what streams are available for us to analyze!
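As a quick illustration of the "a .docx is really a ZIP" idea, here is a small Python sketch using only the standard library. It builds a tiny stand-in archive in memory (not the challenge file), checks the ZIP magic bytes, and lists the members. The helper name list_members and the member contents are hypothetical.

```python
import io
import zipfile
from typing import List


def list_members(blob: bytes) -> List[str]:
    """Confirm the blob is a ZIP container and return its member names."""
    if not zipfile.is_zipfile(io.BytesIO(blob)):
        raise ValueError("not a ZIP container")
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


# Build a tiny stand-in for a .docx; a real one has many more parts.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
blob = buf.getvalue()

print(blob[:4])            # b'PK\x03\x04', the local file header signature
print(list_members(blob))  # ['[Content_Types].xml', 'word/document.xml']
```

Pointing the same helper at the bytes of a recovered .docx should produce a member listing much like the one a dedicated ZIP analysis tool prints.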

If we do some research on this topic, we’ll stumble across a SANS Internet Storm Center diary entry from Didier Stevens whose tool, zipdump.py, we’ll leverage.

Let’s follow the concepts of this research and try running zipdump.py on the .docx file we retrieved:

sudo zipdump.py f0009072.docx

After running zipdump.py, we can view the streams within the .docx file. Let’s focus on index number 5, word/document.xml, which contains the content of the document itself.

Putting all of this together, we’re going to use zipdump.py to dump the stream of word/document.xml for us to examine using the below syntax to select Index 5.

sudo zipdump.py -s 5 -d f0009072.docx

Okay, it’s not pretty when displayed in the console but we are seeing the structure of the document content! Outside of all the formatting, notice the string highlighted in the image above? This looks like a Base64-encoded string, doesn’t it?
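Spotting an encoded string by eye works for a small document, but the same hunt can be sketched in Python: strip the XML tags and look for long runs of Base64-alphabet characters. The function name, the 16-character threshold, and the sample XML below are all assumptions for illustration, and the encoded value is a harmless placeholder rather than the real flag.

```python
import re
from typing import List


def base64_candidates(xml_text: str, min_len: int = 16) -> List[str]:
    """Return substrings of the visible text that look like Base64."""
    # Replace every tag with a space so only the text content remains.
    text = re.sub(r"<[^>]+>", " ", xml_text)
    pattern = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    return pattern.findall(text)


# Hypothetical document.xml fragment with a placeholder encoded string.
sample_xml = (
    "<w:p><w:r><w:t>Quarterly report</w:t></w:r>"
    "<w:r><w:t>ZXhhbXBsZSBwbGFjZWhvbGRlcg==</w:t></w:r></w:p>"
)
print(base64_candidates(sample_xml))  # ['ZXhhbXBsZSBwbGFjZWhvbGRlcg==']
```

One caveat: Word sometimes splits a single piece of text across several runs, so a pattern like this can miss a string that was broken up inside the XML.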

We’re almost there! Let’s test out the theory and try to decode this string in CyberChef. For this challenge, I will use the version built into REMnux, but you can use the online version, too.

We can apply the From Base64 operation to the recipe and input the string we found in the .docx file:

And there we go — we found the 2nd flag!
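If you’d rather stay in the terminal, the same decoding step can be sketched with Python’s standard library. The input here is a placeholder string, not the challenge’s flag, and try_base64 is just a hypothetical helper name.

```python
import base64
import binascii
from typing import Optional


def try_base64(candidate: str) -> Optional[str]:
    """Decode candidate as Base64 text, or return None if it is not valid."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        raw = base64.b64decode(candidate, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


print(try_base64("ZXhhbXBsZSBwbGFjZWhvbGRlcg=="))  # example placeholder
print(try_base64("not base64!"))                   # None
```

Returning None for invalid input makes it easy to run the helper over several suspicious strings and keep only the ones that decode cleanly.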

Question 4: Submit Flag3

For Question 4, let’s turn our attention to the PDF file since we saw it had some text in the preview icon that might give us a clue.

Well, not much to go on here, so let’s see if there is anything to discover in the structure of the PDF. We will use another tool by Didier Stevens, pdf-parser.py, to parse the PDF file for the data objects that make up the document rather than what we saw rendered.

Since we want to find a flag, we can (and will) parse the document and search for a string within the objects. For this challenge, let’s just use grep to clean up the output and simply look for “flag.”

pdf-parser.py f0009040.pdf | grep -i "flag"
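To show roughly what "searching the objects" means, here is a heavily simplified Python sketch. It splits raw PDF bytes on the obj and endobj keywords and reports which object numbers contain a keyword. Unlike a real PDF parser it does not decompress streams or handle edge cases, and the sample bytes and function name are made up.

```python
import re
from typing import List

# "<number> <generation> obj ... endobj" is how indirect objects appear in a PDF body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def objects_containing(pdf_bytes: bytes, keyword: bytes) -> List[int]:
    """Return the object numbers whose raw body contains keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for match in OBJ_RE.finditer(pdf_bytes):
        if needle in match.group(3).lower():
            hits.append(int(match.group(1)))
    return hits


# Hypothetical PDF fragment; the title value is a placeholder.
sample_pdf = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Title (Flag%3A placeholder) >>\nendobj\n"
)
print(objects_containing(sample_pdf, b"flag"))  # [2]
```

Knowing the object number is useful because it tells you where in the document structure the interesting data lives, even when nothing about it is rendered on the page.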

Awesome, we got it! But something looks a little off, doesn’t it? We need to decode this to get a fully readable flag, so let’s jump back into CyberChef again.

It looks like the flag has some URL/Percent encoding which is used to ensure valid characters for transmission over the internet. In CyberChef let’s add the URL Decode operation to the recipe and see if we can grab the flag…
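For reference, percent-decoding is a one-liner in Python’s standard library. The encoded value below is a made-up example in the same style, not the flag from the challenge.

```python
from urllib.parse import unquote

# Hypothetical percent-encoded value: %7B is "{", %20 is a space, %7D is "}".
encoded = "FLAG%7Bexample%20value%7D"
decoded = unquote(encoded)
print(decoded)  # FLAG{example value}
```

Each %XX sequence is simply the hexadecimal value of one byte, so curly braces and spaces are the usual suspects when a flag looks "a little off."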

Question 5: What is the filesystem of the provided disk image?

This is a tricky question to tackle. If we do some research on Google, we’ll find that there is no shortage of suggested methods to locate this information including: blkid, fsck, df, etc.

Unfortunately, none of these commands gave me the answer to Question 5.

We could continue doing some further Google searching, but let’s try to leverage generative AI. I’m going to check with Microsoft Copilot for any methods I might have missed.

According to Copilot, there is a method I hadn’t found yet in my earlier research:

To identify the file system type, use cfdisk:

sudo cfdisk your_file.dd

Let’s be diligent and validate that the information is correct by verifying the provided source link. How to open .DD file to analyze it? — Ask Ubuntu

After a quick overview from the forum link, the information looks accurate! Let’s try it…

Awesome! This additional method gave us the answer to Question 5!
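For the curious, tools generally identify a filesystem by looking for well-known signatures at fixed offsets, and that idea can be sketched in a few lines of Python. The offsets below are the commonly documented ones for NTFS, FAT, and ext2/3/4, measured from the start of the volume. If an image contains a partition table, you would need to seek to the partition first. The function name is hypothetical and the test data is synthetic, so this does not give away the challenge answer.

```python
import struct


def guess_filesystem(volume: bytes) -> str:
    """Guess a filesystem from signature bytes at the start of a volume."""
    if volume[3:11] == b"NTFS    ":          # OEM ID field in the NTFS boot sector
        return "NTFS"
    if volume[82:87] == b"FAT32":            # type label in the FAT32 boot sector
        return "FAT32"
    if volume[54:59] in (b"FAT12", b"FAT16"):  # type label in the FAT12/16 boot sector
        return volume[54:59].decode("ascii")
    # ext superblock starts at byte 1024; its magic number sits 56 bytes in.
    if len(volume) >= 1082 and struct.unpack_from("<H", volume, 1080)[0] == 0xEF53:
        return "ext2/3/4"
    return "unknown"


# Synthetic volumes, built only to exercise the checks above.
fake_ext = bytearray(2048)
struct.pack_into("<H", fake_ext, 1080, 0xEF53)
fake_ntfs = bytearray(512)
fake_ntfs[3:11] = b"NTFS    "

print(guess_filesystem(bytes(fake_ext)))   # ext2/3/4
print(guess_filesystem(bytes(fake_ntfs)))  # NTFS
```

The FAT type labels are informational fields rather than an authoritative indicator, so treat a match there as a strong hint and confirm with a proper tool.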

Question 6: What is the original filename of the recovered mp4 file?

Okay, last question! Let’s focus on the final file that PhotoRec recovered back in Question 2, the MP4 file.

Remember that this isn’t the original name but the name assigned during data carving. We can actually watch the video and see that the content references SBTCertifications. This name rings a bell…

Remember back in Question 1 that we ran the strings command on the .dd file and we saw some interesting file names?

Let’s try looking at the entire recovery image with strings again. We already know there is a ton of output, so let’s just grep for mp4 this time.

strings recoverfiles.dd | grep -i "mp4"
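Here is a rough Python equivalent of that pipeline, with one twist: some filesystems store long file names as 16-bit little-endian text, which a plain ASCII scan can miss, so this sketch looks for both encodings. GNU strings can do something similar with its -e l option. The function name and sample bytes are hypothetical and do not reflect what is actually inside the challenge image.

```python
import re
from typing import List

ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")          # 4+ printable ASCII bytes
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")  # 4+ printable chars as UTF-16LE


def grep_strings(data: bytes, keyword: str) -> List[str]:
    """Return printable runs (ASCII or UTF-16LE) containing keyword, ignoring case."""
    needle = keyword.lower()
    runs = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return [run for run in runs if needle in run.lower()]


# Hypothetical bytes: one ASCII name, one unrelated name, one UTF-16LE name.
sample_image = (
    b"\x00\x01holiday.MP4\x00\x02notes.txt\x00\x03"
    + "clip.mp4".encode("utf-16-le")
    + b"\x00\x00"
)
print(grep_strings(sample_image, "mp4"))  # ['holiday.MP4', 'clip.mp4']
```

If a plain strings search ever comes up empty for a name you are sure exists, checking the 16-bit encoding is a cheap next step.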

And there we go! We found the original filename, which is our final answer! Let’s wrap up this investigation.

Conclusion:

Hey, nice job with the investigation! We successfully analyzed the DD file, located the flags, and recovered John’s files to complete the Employee of the Year challenge! Now that we successfully helped John to recover his data and retain his “Employee of the Year” status, let’s close this case.

A big thank you to Blue Team Labs Online for hosting this awesome challenge! This was a fantastic opportunity to learn about file carving and add some new tools to my tool kit. I also appreciated the depth of this challenge. We not only had to learn how to find and recover the files, but we also had to deep-dive into OOXML and PDF files to locate the flags. Overall, I gained some valuable experience analyzing DD disk images and recovering data. I hope that you had as much fun as I did and learned something new, too!

Thank you so much for reading along and working through this investigation with me. Until next week — stay curious!

Tools & References:

REMnux: https://docs.remnux.org/

Scalpel: https://github.com/sleuthkit/scalpel

PhotoRec: https://www.cgsecurity.org/wiki/PhotoRec

Microsoft File Formats: https://support.microsoft.com/en-us/office/learn-about-file-formats-56dc3b55-7681-402e-a727-c59fa0884b30#:~:text=docx%20file%20is%20an%20Open%20XML%20formatted%20Microsoft%20Word%20document.

Wikipedia Office Open XML: https://en.wikipedia.org/wiki/Office_Open_XML

OLEID: https://github.com/decalage2/oletools/wiki/oleid

Zipdump.py: https://blog.didierstevens.com/2020/07/27/update-zipdump-py-version-0-0-20/

SANS XML Document: https://isc.sans.edu/diary/An+XMLObfuscated+Office+Document+CVE202140444/27860

CyberChef: https://gchq.github.io/CyberChef/

PDF Parser: https://blog.didierstevens.com/programs/pdf-tools/

URL Percent Encoding: https://www.w3schools.com/tags/ref_urlencode.ASP

Microsoft Copilot: https://copilot.microsoft.com/

Ask Ubuntu: https://askubuntu.com/questions/1279716/how-to-open-dd-file-to-analyze-it
