Hunting threats with Pandas 🐼👊 — $MFT Analysis

Improving incident response with data analysis techniques and tools.

Published in

Towards Data Science

6 min read · Aug 12, 2021

It is really exciting to see the direction that, in my opinion, cybersecurity is taking. Incident response, and Threat Hunting even more so, are increasingly tied to processing huge amounts of information.
I believe there are two reasons for this: current devices store and generate far more telemetry, and incidents are becoming more frequent and larger, often involving many computers and even several sites of an organization.
This means that whether we are looking for an attacker inside an organization during Threat Hunting or following an attacker’s tracks in a DFIR exercise, we have to deal with more and more information: millions of events from multiple sources, in different formats… a nightmare.

I have always argued that programming is a basic skill for any cybersecurity analyst. Without at least minimal knowledge, we are doomed to rely on third-party tools that will not always fit our needs. This case is no different: being able to write small scripts or programs that automate data processing makes a big difference.

Python and Pandas to the rescue!

While Python has the ability to surprise programmers who start with it, Pandas has blown me away!
Pandas is a data analysis and manipulation tool. Its creators describe it as fast, powerful and flexible… and they fall short, it’s mind-blowing. 😍
Although they are independent projects, I highly recommend “playing” with Pandas in JupyterLab. In fact, I recommend JupyterLab for almost anything: if you have never used it, you will regret not having tried it sooner.
I recommend installing Anaconda 🐍 to work with JupyterLab, so that we do not end up with a computer full of Python libraries we don’t need. You have to be organized! 😄
To start, once Anaconda is installed, we launch JupyterLab by running this command in the Anaconda console.

C:\>jupyter lab

Once in the Jupyter interface you will be able to create a new notebook with Python 3 and start playing.

MFT analysis 📂

Although I intend to write several posts about ways to use Pandas in an analyst’s daily work, in this one we will look at how to work with an MFT (Master File Table) using Pandas and Python.
To begin with, the MFT is a very valuable asset in many security incidents, as it is the database where the NTFS file system keeps track of all the files and directories created on the storage volume. The situations in which it is worth examining are legion but, in this case, our victim has suffered a ransomware attack.
The MFT is not in a comfortable format to work with, so to make things easier I am going to use Eric Zimmerman’s MFTECmd tool. 🔝
If the MFT is not very big we can analyze it with MS Excel, but the file is often so big that we need something more versatile.

Initially, some variables must be declared and the MFTECmd tool must be run.

import subprocess

# Input $MFT, MFTECmd executable and output location (adjust to your environment)
mft_path = "C:\\kape\\collected\\2021-06-01T151604\\C\\$MFT"
mftexplorer_path = "C:\\MFTExplorer\\MFTECmd.exe"
output_folder = "C:\\Documents\\test"
output_filename = "MyOutputFile.csv"

# Passing the arguments as a list lets subprocess take care of the quoting
command = [mftexplorer_path, "-f", mft_path, "--csv", output_folder, "--csvf", output_filename]
print(command)
output = subprocess.run(command, capture_output=True, text=True).stdout

The result of this execution will be a CSV file in the chosen folder, which is what we need to feed our Panda. 🐼

import pandas as pd 
pd.set_option('display.max_columns', 500)

data = pd.read_csv(output_folder + "\\" + output_filename)
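Since the CSV can be very large, it may help to load only the columns needed for the analysis. The following is a minimal sketch under stated assumptions: the rows are made up and only imitate the layout of the MFTECmd output, and the selection of columns in `wanted` is just an example. It uses the `usecols`, `index_col` and `parse_dates` parameters of `read_csv`, which also convert the dates while reading.

```python
import io
import pandas as pd

# Made-up rows that only imitate the layout of the MFTECmd CSV
csv_text = """EntryNumber,FileName,Extension,FileSize,Created0x10,LastModified0x10
0,$MFT,,262144,2020-01-10 08:00:00.0000000,2020-01-10 08:00:00.0000000
41,report.docx,.docx,18432,2021-05-20 09:15:30.1234567,2021-05-21 10:00:00.0000000
42,notes.txt,.txt,120,2021-05-23 03:10:00.5000000,2021-05-23 03:10:00.5000000
"""

# Example selection of columns; anything not listed here is never loaded
wanted = ["EntryNumber", "FileName", "FileSize", "Created0x10", "LastModified0x10"]

sample = pd.read_csv(
    io.StringIO(csv_text),  # replace with the path to the real CSV file
    usecols=wanted,
    index_col="EntryNumber",
    parse_dates=["Created0x10", "LastModified0x10"],
)
print(sample.dtypes)
```

With a real file, the first argument would be the path built from `output_folder` and `output_filename`.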

At this point we already have a dataframe with all the information from the MFT. To continue working comfortably with the data it is advisable to adjust the types of the fields containing dates.

data.set_index("EntryNumber", inplace=True)

# Convert every timestamp column to a proper datetime type
date_columns = ["Created0x10", "Created0x30", "LastModified0x10", "LastModified0x30", "LastRecordChange0x10", "LastRecordChange0x30"]
for column in date_columns:
    data[column] = pd.to_datetime(data[column], format='%Y-%m-%d %H:%M:%S.%f')

The data is now ready for queries. In this case, the MFT belongs to a computer that suffered a ransomware attack, more specifically Avaddon.
Let’s imagine that we don’t know when the ransomware was executed and we need to find out in order to carry out the investigation.
First of all, let’s use the MFT to see on which days the most files were changed.

# Reduce each record change to its calendar day and count the records per day
dates = data["LastRecordChange0x10"].dt.to_period('d')
s = dates.groupby(dates).size()
s.sort_values(ascending=False).head(10)

With just a few commands we have extracted the dates on which files changed and, after grouping the data, we obtain first the date the operating system was installed and second what appears to be the day of the attack.
If we prefer a more visual view, we can generate a graph of the changes made in the last 2 months, for example.
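The code for that graph is not included here, so the following is only a sketch of one way it could be done, not necessarily the way it was done in the original notebook. The function names `daily_changes` and `plot_daily_changes` are hypothetical, the sample timestamps are made up, and "the last 2 months" is assumed to mean the two months before the most recent record change in the MFT.

```python
import pandas as pd

def daily_changes(data, months=2):
    """Return one row per day with the number of MFT record changes."""
    changes = data["LastRecordChange0x10"].dropna()
    # Assumption: the window ends at the most recent change found in the MFT
    start = changes.max() - pd.DateOffset(months=months)
    recent = changes[changes >= start]
    counts = recent.dt.floor("D").value_counts().sort_index()
    return counts.rename_axis("Day").reset_index(name="Changes")

def plot_daily_changes(daily):
    """Draw the daily counts as a bar chart (requires plotly)."""
    import plotly.express as px  # imported here so the counting works without plotly
    fig = px.bar(daily, x="Day", y="Changes", title="MFT record changes per day")
    fig.show()

# Made-up timestamps, only to illustrate the shape of the result
sample = pd.DataFrame({"LastRecordChange0x10": pd.to_datetime([
    "2021-05-23 10:00:00", "2021-05-23 11:30:00",
    "2021-05-20 08:00:00", "2020-12-01 09:00:00",
])})
daily = daily_changes(sample)
print(daily)
```

In a notebook, `plot_daily_changes(daily_changes(data))` would draw the chart for the real dataframe.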

Another test we can do is to look for the most repeated file names within the MFT over the last 4 months, with the intention of discovering anomalies.

import plotly.express as px
# names_plot is assumed to hold the most repeated file names
fig = px.bar(names_plot, x="FileName", title="Files with more entries")
fig.show()
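How `names_plot` is built is not shown above. As a hedged sketch, one way to obtain such a table is to keep the records changed in the last 4 months and count how often each file name appears. The helper `most_repeated_names`, the `Entries` column and the sample rows are assumptions made for illustration, and the window is measured back from the most recent record change.

```python
import pandas as pd

def most_repeated_names(data, months=4, top=10):
    """Count file names among the records changed in the last `months` months."""
    # Assumption: the window ends at the most recent change found in the MFT
    newest = data["LastRecordChange0x10"].max()
    start = newest - pd.DateOffset(months=months)
    recent = data[data["LastRecordChange0x10"] >= start]
    counts = recent["FileName"].value_counts().head(top)
    return counts.rename_axis("FileName").reset_index(name="Entries")

# Made-up rows, for illustration only
sample = pd.DataFrame({
    "FileName": ["x_readme_.txt", "x_readme_.txt", "x_readme_.txt", "report.docx", "old.log"],
    "LastRecordChange0x10": pd.to_datetime([
        "2021-05-23 10:00:00", "2021-05-23 10:05:00", "2021-05-23 10:06:00",
        "2021-04-02 09:00:00", "2020-01-01 00:00:00",
    ]),
})
names_plot = most_repeated_names(sample)
print(names_plot)
```

A table like this can then be passed to `px.bar` with `x="FileName"` and `y="Entries"`.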

The ransom notes! This ransomware creates ransom notes during its execution and modifies them with each file it encrypts. To see the information in a little more detail we will do the following.

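# regex=False makes this a literal match, so the dot is not treated as a wildcard
readme = data[data["FileName"].str.contains("_readme_.txt", regex=False, na=False)]
readme.sort_values(by="LastModified0x10", ascending=True).head()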

In this case, the appearance of the ransom notes tells us when the incident occurred, but now we are going to try to find out which file could have been the culprit using only the MFT. To do this, we will query the files with the extensions most commonly used by attackers in ransomware cases, but only in the 12 hours prior to the creation of the first ransom note. 🔍

# Time of the earliest ransom note and start of the 12-hour window before it
first_note = readme.sort_values(by="LastModified0x10", ascending=True).iloc[0]["LastModified0x10"]
range_exe = first_note + pd.offsets.Hour(-12)
data_filtered = data[(data['Created0x10'] > range_exe) & (data['Created0x10'] < first_note)]
files = data_filtered[data_filtered["FileName"].str.contains(r"\.exe|\.ps1|\.msi|\.vba", regex=True, na=False)]
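To see the idea of a 12-hour window in isolation, here is a small self-contained sketch on made-up rows. The helper `created_before` is hypothetical, and its pattern is anchored to the end of the file name, which is slightly stricter than the query above and is a design choice of this example only.

```python
import pandas as pd

# Made-up rows, for illustration only
sample = pd.DataFrame({
    "FileName": ["dropper.exe", "setup.msi", "x_readme_.txt", "old_tool.exe", "photo.jpg"],
    "Created0x10": pd.to_datetime([
        "2021-05-23 03:10:00", "2021-05-22 09:00:00", "2021-05-23 09:00:00",
        "2021-03-01 12:00:00", "2021-05-23 08:00:00",
    ]),
})
# In this toy data no file is modified after its creation
sample["LastModified0x10"] = sample["Created0x10"]

def created_before(data, reference, hours=12, pattern=r"\.(?:exe|ps1|msi|vba)$"):
    """Rows created in the `hours` before `reference` whose name matches `pattern`."""
    window_start = reference - pd.Timedelta(hours=hours)
    in_window = (data["Created0x10"] > window_start) & (data["Created0x10"] < reference)
    by_name = data["FileName"].str.contains(pattern, case=False, regex=True, na=False)
    return data[in_window & by_name]

# The earliest ransom note marks the end of the window
notes = sample[sample["FileName"].str.contains("_readme_.txt", regex=False)]
first_note = notes["LastModified0x10"].min()
files = created_before(sample, first_note)
print(files["FileName"].tolist())
```

Only the executable created a few hours before the first note survives the filter; the installer from the previous morning and the old tool fall outside the window.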

Now that we have the file, we have to find out how it got onto the computer, but that is something the MFT cannot tell us 😄. Although we have not gone very deep, we could analyze multiple MFT files, search by name patterns, compare creation and modification dates, search for files with suspicious attributes… a world of possibilities.
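As a small illustration of comparing creation and modification dates, the sketch below flags files whose last modification is earlier than their creation, which on NTFS often indicates a file that was copied or extracted from somewhere else. The rows are made up, and the check is a starting point for a closer look, not proof of malicious activity.

```python
import pandas as pd

# Made-up rows, for illustration only
sample = pd.DataFrame({
    "FileName": ["tool.exe", "report.docx"],
    "Created0x10": pd.to_datetime(["2021-05-23 03:10:00", "2021-05-20 09:00:00"]),
    "LastModified0x10": pd.to_datetime(["2021-04-01 12:00:00", "2021-05-21 10:00:00"]),
})

# Keep the files that were "modified" before they were created on this volume
modified_before_created = sample[sample["LastModified0x10"] < sample["Created0x10"]]
print(modified_before_created["FileName"].tolist())
```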

In future posts I will talk about more advanced cases in which Pandas can help us in our investigations, such as analyzing gigabytes of firewall logs or Windows events.
I hope you liked it and that you are encouraged to include Pandas and Jupyter in your arsenal of hunting tools.

You can find this notebook in this link, enjoy!

See you in the next one and happy hunting!

Written by Luis Francisco Monge Martinez