Credit card skimmers are a constant threat, especially at gas stations and ATMs where point-of-sale equipment is accessible to criminals. The Secret Service recovers 20-30 of these skimmers a week, but this is only a small fraction of the devices in the wild. In 2019, Bhaskar et al. conducted a survey of over 1,000 gas stations across 6 states and found skimmers that exfiltrated their data using Bluetooth and WiFi. While various mobile applications exist for finding Bluetooth skimmers, little work has been done to detect WiFi skimmers, partly because very little data exists about them. Many of the mobile apps that find Bluetooth skimmers rely on a whitelist or blacklist, which limits their effectiveness to a snapshot of the technologies criminals were using when the app was developed. Machine learning makes it possible to build a much more flexible solution, and the PSoC™ 62S2 Wi-Fi BT Pioneer Kit is the WiFi powerhouse that will make it possible.
With gas prices rising by the day, this project aims to protect people when money is already tight. Bhaskar et al. indicate that the number of skimmers using alternatives to Bluetooth, such as WiFi, to exfiltrate data is on the rise. Anything that lets an attacker recover the data from a safe distance, without having to return to the pump or device itself, will be attractive to criminals, so work is needed to find these newer skimmers.
Applying Machine Learning
How can we train a model to detect WiFi credit card skimmers if there is no dataset to work with? Let's brainstorm. What would a WiFi credit card skimmer look like?
- It would be very close to your car when you pull in to fuel up.
- It will not look like a regular WiFi network; the criminals won't want random people connecting to it.
- It may appear as a hidden network to further discourage users from connecting to it.
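To make the traits above concrete, here is a minimal Python sketch of how two of them (hidden SSID, strong signal) could be read off a single scan record. The `ScanResult` record, the helper names, and the -40 dBm cutoff are illustrative assumptions, not part of the SkimScam code. A fixed cutoff like this is exactly the kind of hard-coded rule that the anomaly detection approach described next is meant to avoid; the sketch only names the features.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    """One network seen in a WiFi scan (hypothetical record layout)."""
    ssid: str
    rssi_dbm: int


def is_hidden(result: ScanResult) -> bool:
    """Hidden networks advertise an empty SSID or one made only of NUL bytes."""
    return result.ssid.strip("\x00") == ""


def is_very_close(result: ScanResult, threshold_dbm: int = -40) -> bool:
    """Treat a signal at or above the (assumed) cutoff as 'right next to us'."""
    return result.rssi_dbm >= threshold_dbm


# Made-up scan records, for illustration only
for net in (ScanResult("CoffeeShop", -70), ScanResult("", -35)):
    print(repr(net.ssid), "hidden:", is_hidden(net), "close:", is_very_close(net))
```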
Data about normal WiFi networks is plentiful. One of my favorite datasets is from Kaggle. It features thousands of scans that capture WiFi data such as each network's SSID, signal strength, and MAC address. These networks are all "normal": they were captured in and around an office and are a good baseline for what a regular, everyday WiFi network should look like.
What if we could teach our AI to find the "bad" networks by teaching it what the "good" networks look like? This is called anomaly detection. One great way to accomplish anomaly detection with machine learning is with autoencoders. They let you detect anomalies by training on a dataset with few or no labeled "bad" examples. This works because autoencoders are quite strange conceptually: they give you your input back as an output! If that sounds strange, it should, but what it really means is that if the model recognizes your input, the output will be very similar or nearly identical to it. If it doesn't, the output will be very different. Here's a visual representation of what we're going to make:
Notice that the learned representation looks similar to the original. We want to feed our autoencoder parameters of WiFi networks, and if they come back out drastically different we have an anomalous WiFi network and should alert the user.
In this way, we do not need to maintain a whitelist or blacklist of networks or network parameters! This makes the solution less fragile than the Bluetooth scanning apps.
We will use Keras to train our autoencoder and the Kaggle dataset above as our input data. I chose Keras because models saved in its .h5 format can easily be added to a ModusToolbox project, which will make implementation a snap! I will be using the RSSI values from the dataset as my column of choice, since they indicate the proximity of WiFi networks. A transmitter very close to the user and their vehicle should be flagged as an anomaly.
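As a rough illustration of why RSSI works as a proximity signal, the sketch below uses the log-distance path loss model to turn an RSSI reading into an approximate distance. The reference power at 1 m (-40 dBm) and the path loss exponent (2.7) are assumed values chosen for illustration; real values depend on the transmitter and the environment. The project feeds RSSI values to the model rather than a computed distance, so this is background, not part of the pipeline.

```python
def estimate_distance_m(rssi_dbm, ref_power_dbm=-40.0, path_loss_exponent=2.7):
    """Approximate distance in meters from an RSSI reading.

    Log-distance model: rssi = ref_power - 10 * n * log10(d), with d in meters
    and ref_power the RSSI expected at 1 m. Solving for d gives the line below.
    """
    return 10 ** ((ref_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


# Stronger (less negative) readings map to shorter estimated distances
for rssi in (-35, -55, -75):
    print(f"{rssi} dBm -> about {estimate_distance_m(rssi):.1f} m")
```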
You can view the complete code and data processing steps in this interactive notebook: https://console.paperspace.com/kloeffler/notebook/rc57w66j4c2a0vq
The notebook shows how the dataset can be imported in just a few steps, and you can even train the model on Paperspace using one of their free GPUs in just a few minutes. Let's take a quick look at the model, which is quite simple:
from keras.layers import Input, Dense
from keras.models import Model
# number of neurons in the encoding hidden layer
encoding_dim = 5
# input placeholder
input_data = Input(shape=(1,))
# encoder is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_data)
# decoder is the lossy reconstruction of the input
decoded = Dense(1, activation='sigmoid')(encoded)
# this model maps an input to its reconstruction
autoencoder = Model(input_data, decoded)
# this model maps an input to its encoded representation
encoder = Model(input_data, encoded)
# loss function and optimizer
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# train test split
from sklearn.model_selection import train_test_split
x_train, x_test = train_test_split(rssi_df, test_size=0.1, random_state=42)
# train the model
autoencoder.fit(x_train,
                x_train,
                epochs=50,
                batch_size=256,
                shuffle=True)
autoencoder.summary()
Notice that this is very similar to the above diagram! We take in a single piece of input data, an RSSI value, hence the input shape of (1,). Then we set up an encoder layer and a decoder layer in between. This is the compression step from the diagram, and it results in the "lossy" reconstruction that will indicate whether the model "recognized" what we gave it. We compile the model with the Adadelta optimizer and a binary cross-entropy loss, and hold out 10% of our data as a test set. The random state can of course be set to anything you wish.
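To turn reconstructions into a yes/no alert, one common approach is to compare each input with its reconstruction and flag inputs whose error is unusually large. The sketch below is an assumption about how that step could look, not the notebook's code: the -100 to 0 dBm scaling range, the helper names, and the mean-plus-k-standard-deviations threshold are all hypothetical choices. Scaling is included because a sigmoid output layer can only produce values between 0 and 1, so inputs need to live in that range to be reconstructable.

```python
from statistics import mean, stdev

RSSI_MIN_DBM = -100.0  # assumed weakest reading of interest (hypothetical bound)
RSSI_MAX_DBM = 0.0     # assumed strongest possible reading (hypothetical bound)


def scale_rssi(rssi_dbm):
    """Map an RSSI in dBm onto [0, 1], clipping values outside the assumed range."""
    clipped = min(max(rssi_dbm, RSSI_MIN_DBM), RSSI_MAX_DBM)
    return (clipped - RSSI_MIN_DBM) / (RSSI_MAX_DBM - RSSI_MIN_DBM)


def reconstruction_errors(inputs, reconstructions):
    """Absolute difference between each input and its reconstruction."""
    return [abs(x - y) for x, y in zip(inputs, reconstructions)]


def error_threshold(training_errors, k=3.0):
    """Threshold set k sample standard deviations above the mean training error."""
    return mean(training_errors) + k * stdev(training_errors)


def is_anomalous(error, threshold):
    """Flag an input whose reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up numbers standing in for model output, for illustration only
train_errors = [0.01, 0.02, 0.03]
threshold = error_threshold(train_errors)
new_inputs = [scale_rssi(-72), scale_rssi(-30)]
new_reconstructions = [0.27, 0.31]  # hypothetical reconstructions
for err in reconstruction_errors(new_inputs, new_reconstructions):
    print(f"error={err:.2f} anomalous={is_anomalous(err, threshold)}")
```

With the Keras model above, the reconstructions would come from `autoencoder.predict(...)` on scaled inputs, and the threshold would be computed once from the errors on the training set.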
You can download a pre-trained model below in .h5 format that uses the above parameters. I trained it on 10,000 input examples from the Kaggle dataset; you may get better results by including more, as well as by playing with the training epochs and batch sizes.
Hardware Setup
The hardware setup for this project is straightforward. Unbox your PSoC 6 kit and install ModusToolbox. The first thing you will need to do is update the firmware on your device following the directions you can find here. When you run the software you download on that page, you should get an output that looks like this, indicating you are now ready to load the SkimScam software:
Now, download the SkimScam project available from the GitHub link at the bottom of this page. You can do this by downloading the project as a ZIP or cloning it with Git. Then, in ModusToolbox:
Click the New Application link in the Quick Panel (or, use File > New > ModusToolbox Application). This launches the Project Creator tool.
Pick the PSoC kit from the menu shown in the Project Creator - Choose Board Support Package (BSP) dialog. It will show up like this:
In the Project Creator - Select Application dialog, choose "Import" and then select where you saved the SkimScam repository.
(Optional) Change the suggested New Application Name.
Click Create to complete the application creation process.
If you would like a brief tour of how this repo works and how I imported the model, you can jump down to the next section. If you just want to flash the software and play, read on!
Now, connect your PSoC 6 board with the included USB cable.
Now return to ModusToolbox. In the Quick Panel, scroll down, and click Program (KitProg3_MiniProg4). It will be next to an arrow in a green bubble. SkimScam will be built and start to be loaded onto your board. This process will take several minutes.
In the meantime, get your terminal software set up if you want to see what the scan task is finding around you. I recommend PuTTY. Use Device Manager or similar to figure out what COM port the PSoC 6 is on, and then set the baud rate like so:
Click "Open".
Once the SkimScam software loads, you should see in your terminal window some statistics about the autoencoder model, as well as information about the WiFi networks surrounding you:
The blue LED will illuminate to tell you that WiFi scanning is occurring. If an anomalous WiFi network is flagged by the model, the indicated LED below will glow red:
This project lends itself well to "wardriving". Check out your local gas station; maybe you'll bust a crook!
How to add a Keras model to your project (Optional)
This section is optional, but I wanted to cover briefly how I added the autoencoder from the above notebook to my project. First, I opened the ML Configurator from the Tools menu. I selected the H5 file I downloaded from my notebook and clicked "Generate Source", leaving everything as defaults. It looked like this:
I then checked the validation by clicking "Validate in Desktop", which helped as a sanity check that my model worked:
The next critical thing I found I needed to do was open the Library Manager under Tools and select the ML middleware libraries. They are under the libraries tab:
Click "Update". It will take a few minutes for ModusToolbox to pull in the libraries.
The code in ScanTask.c in the SkimScam repo gives a simple way you can now import and use your model. Something else I found very valuable was reading the code in main.c in Infineon's ML-profiler project. Both show the imports you need and the convenience methods you can use from the middleware library.