Hunting Office Macros with Sysmon and Pandas.

7 min readDec 2, 2022

Using Pandas and Jupyter everywhere 🐼🪐

During these last months I have been studying the course SEC530: Defensible Security Architecture and Engineering: Implementing Zero Trust for the Hybrid Enterprise, during this course and certification there are different ways to implement a methodology oriented to “Zero Trust” and among the topics Office macros are mentioned.

It’s funny how when you talk about office macros, those of us in the security field get a shiver down our spine. We all know that there are multiple ways to secure a company against them, but one thing I am sure of is that, if it were up to us, we would all prefer to have macros disabled in our organization.

For better or worse, these decisions have to be very well motivated and, above all, the impact on the company has to be very well measured before any decision can be made.

To audit the use of macros there are not many options if we do not have cloud services such as O365 but we will try it.

Let’s start the adventure.

First try, file extension.📓

The easy approach might be to search our organization for files with macros by searching for the file extension, but it’s never that easy.

One thing that is important to know, and a lot of people don´t know, is that in order for an Office file to contain macros it must have very specific file extensions and other extensions are incompatible with them.

For many years, Microsoft wanted to identify documents containing macros with the letter “m” in the extension as docm, xlsm… and making it impossible to run macros in the new extensions containing the letter “x” at the end as docx, xlsx… but as always with Microsoft, backward compatibility did its thing here.

In the office 97–2003 document format we are allowed to store and run macros while keeping simple extensions like doc, xls… making it impossible for security analysts to know if a document can contain macros or not only using the file extension.

Second try, Sygma rule and Sysmon 💻

After a conversation with other professionals, they suggest me a Sygma rule written by Florian Roth. In this rule is it possible to see that there are 3 libraries that office loads when the a file has VBA macros.

‘*\VBE7.DLL*’
‘*\VBEUI.DLL*’
‘*\VBE7INTL.DLL*’

Researching more I found that there is also another possible detection and it is the registry modification in HKEY_CURRENT_USER\Software\Microsoft\Office\[office_version]\Excel\Security\Trusted Documents\TrustRecords.

This looked promising if you have Sysmon or you can deploy it.

<RuleGroup name="" groupRelation="or">
  <ImageLoad onmatch="include">
    <Rule name="Potential Macro file opened" groupRelation="or">
      <ImageLoaded condition="end with">vbeintl.dll</ImageLoaded>
      <ImageLoaded condition="end with">vbe7.dll</ImageLoaded>
      <ImageLoaded condition="end with">vbeui.dll</ImageLoaded>
    </Rule>
  </ImageLoad>
</RuleGroup>
<RuleGroup name="" groupRelation="or">
  <RegistryEvent onmatch="include">
    <TargetObject name="T1060,RunKey" condition="contains">Documents\TrustRecords</TargetObject>
  </RegistryEvent>
</RuleGroup>

It seemed like a solved case, but when you have to modify a Sysmon configuration file in an organization with thousands of machines, it’s worth double or ten times to check that everything works as expected 😊.

It occurred to me to test the Sygma rule with different office files and test what happened if these files were downloaded or not and these were the results.

It seems that our second approach is not valid neither. Although the VBE7.dll and VBEUI.dll libraries serve to identify files with locally created macros, when a file is downloaded from the internet, these libraries are loaded as well, even if the file does not contain macros.

The culprit for this behavior is the Mark-of-the-Web (MOTW) or the alternate data stream added to files when are downloaded via web browser, which causes the files to be opened in a protected view.

Since this first attempt did not work, I thought I would try to use the same approach but with a broader scope, i.e. look at all the libraries loaded by Excel in each of these executions and observe their differences.

Third and last try, Python and Pandas 🐼 🔥

Foto de Erik Mclean: https://www.pexels.com/es-es/foto/ciudad-carretera-hombre-arte-4065797/

Anyone who has played with Procmon and the loading of libraries will know that this can be a bit tedious as each Excel run loads hundreds of libraries. To help us in this task we have our third protagonist Python with his faithful friend Jupyter 😊.

We will use Sysmon and DLL loading events to find differences between Office executions, this way we could identify differences that could indicate that an open document contained macros or not, independently of its extension.

First the Sysmon rule to monitor the libraries loaded by the Excel file during the opening of a document.

<RuleGroup name="" groupRelation="or">
  <ImageLoad onmatch="include">
    <Rule name="Image Loaded by Excel" groupRelation="or">
      <Image condition="end with">excel.exe</Image>
    </Rule>
  </ImageLoad>
</RuleGroup>

When I was thinking about tests I would like to do, I realized that compare the libraries that loaded each of the use cases was going to be very tedious so I decided to use Python and a Jupyter Notebook to be able to automate and compare results easily and because I have fun with it 😆.

Makeameme https://makeameme.org/meme/PANDAS-PANDAS-EVERYWHERE

To parse the resulting EVTX files from Sysmon we are going to use the PyEvtxParser library, an old acquaintance that was very useful to me in the past when I wrote Grafiki. Here you can find this Notebook on my Github among others.

def evtx_folder_to_dataframes(directory):
    dataframes_list_seven = {}

    for filename in os.listdir(directory):
        f = os.path.join(directory, filename)
        if os.path.isfile(f):
            events_one_five = []
            events_seven = []
            a = open(f, 'rb')
            parser = PyEvtxParser(a)

            for record in parser.records_json():
                event = json.loads(record['data'])
                #Image loaded
                if event["Event"]["System"]["EventID"] == 7:
                    event_list = [event["Event"]["EventData"]["UtcTime"],
                    event["Event"]["System"]["EventID"],
                    event["Event"]["EventData"]["ImageLoaded"],
                    event["Event"]["EventData"]["FileVersion"],
                    event["Event"]["EventData"]["Description"],
                    event["Event"]["EventData"]["Product"],
                    event["Event"]["EventData"]["Company"],
                    event["Event"]["EventData"]["OriginalFileName"],
                    event["Event"]["EventData"]["Hashes"],
                    event["Event"]["EventData"]["Signed"],
                    event["Event"]["EventData"]["Signature"],
                    event["Event"]["EventData"]["SignatureStatus"]]
                    events_seven.append(event_list)
            name = filename.split("\\")[-1].split(".")[-2]

            df_7 = pd.DataFrame.from_records(events_seven,
                                   columns=['UtcTime','EventID', 'ImageLoaded', 'FileVersion', 'Description',
                                            'Product', 'Company', 'OriginalFileName', 'Hashes', 'Signed',
                                           'Signature', 'SignatureStatus'])
            dataframes_list_seven[name] = df_7
            
    return dataframes_list_seven

Once we have our list of events, we’ll turn to Pandas to make it easier to work with the data we have.
Let’s start watching how many unique libraries are loaded in each opening.

df_count = pd.DataFrame(index=list(dataframes_list_seven.keys()), columns=["Count"])
for e in list(dataframes_list_seven):
    df_count["Count"][e] = dataframes_list_seven[e]["ImageLoaded"].nunique()

As we can see in the image, when the document has macros or has been downloaded, the number of libraries is significantly higher.

The next question we ask ourselves is, will there be different libraries between these executions that allow us to determine that a document has macros?

To do this, what we are going to do is to make a comparison between all the measurements, thereby creating a comparative table that will allow us to detect anomalies. In the left part of the table we will have the base of our comparative, that is to say the part “Which libraries contains this execution” and in the upper part “Which does not contain this execution”.

For this we will use this simple function that allows us to make a comparative of DataFrames and we will only keep the ones that are in the first operator.

def compare_df(df_in, df_not_in):
    list = df_in.merge(df_not_in.drop_duplicates(), 
                       on=['ImageLoaded'], 
                       how='left', 
                       indicator=True) 
    list_min = list[list['_merge'] == 'left_only']["ImageLoaded"].unique()
    return(list_min)

df = pd.DataFrame(index=list(dataframes_list_seven.keys()), columns=list(dataframes_list_seven.keys()))

for e in list(dataframes_list_seven.keys()):
    for i in list(dataframes_list_seven.keys()):
        list_min = compare_df(dataframes_list_seven[e], dataframes_list_seven[i])
        df[i][e] = len(list_min)

The results speak for themselves and the first column allows us to draw a quick conclusion, a downloaded document without macros contains at least the same libraries as the rest of the tests 😞 with the exception of Office 1997–2003 documents with macros that contain an additional library.

At this point we can draw the conclusion that this way is not valid for detecting Office documents with macros. A great discovery that can save us from spending much more time implementing rules, analysing results and drawing conclusions on data that we believed to be reliable.

Bonus 🎯

As the idea with Jupyter is to generate reusable documents. We are going to implement a small menu that will help us to see in detail each of the comparisons.

Maybe in this case it is not very relevant but this notebook is easily adaptable to monitor other types of executions detecting for example differences in network connections, registry key modifications, creation of named pipes…

def compare_events():
    import ipywidgets as widgets
    from IPython.display import display, Markdown, clear_output
    pd.set_option('display.max_rows', 500)
    output = widgets.Output()
    dfs = list(dataframes_list_seven.keys())
    columns = dataframes_list_seven[list(dataframes_list_seven.keys())[0]].columns
    in_widget = widgets.Dropdown(
                        options=dfs,
                        description='Events in:',
                        disabled=False)
    not_in_widget = widgets.Dropdown(
                        options=dfs,
                        description='And not in:',
                        disabled=False)
    columns = widgets.Dropdown(
                        options=columns,
                        description='Column:',
                        disabled=False)
    button = widgets.Button(description=f'List')
    display(in_widget, not_in_widget, columns, button, output)

    def _click_function(_):
        with output:
            clear_output()
            list = dataframes_list_seven[in_widget.value].merge(dataframes_list_seven[not_in_widget.value].drop_duplicates(), 
                                                          on=['ImageLoaded'], 
                                                          how='left', 
                                                          indicator=True) 
            list_min = list[list['_merge'] == 'left_only'][columns.value].unique()
            display(len(list_min))
            display(pd.DataFrame(list_min).style.set_properties(**{'text-align': 'left'}))

    button.on_click(_click_function)

With this function we can generate a menu that allows us to see the difference between two of the results taking into account the selected column.

I hope you liked it and that, as I did, you don’t trust my results and you dare to try it by yourselves, if you do it and the results are different, please let me know 😊

See you next time!

Hunting Office Macros with Sysmon and Pandas.

First try, file extension.📓

Second try, Sygma rule and Sysmon 💻

Third and last try, Python and Pandas 🐼 🔥

Bonus 🎯

Written by Luis Francisco Monge Martinez