Under the Hood: ThreatSlayer’s Supervised Learning AI
Get a better understanding of the tech we’re using to catch bad actors — where it comes from, how it works, and why we’re using it.
Artificial intelligence is all the rage lately, and for good reason. Recent advances in the field are producing impressive results in everything from self-driving cars and automated vacuum cleaners to image and language recognition. Online security is definitely a place where you want to leverage AI to stay ahead of the game. But how did we get here, how does AI work, and why is it useful for this?
Interlock is dedicated to building a safer internet for all. As part of this objective, we have incorporated OpenAI’s technology into our platform to help with analyzing heuristics, creating documentation, and supporting and educating users. This makes Interlock better equipped to offer top-tier security solutions to our customers. We believe that artificial intelligence (AI) will continue to be a key factor in cybersecurity, and we are thrilled to be leading the way in this area. Please note that Interlock users' data will never be accessible to OpenAI.
What We Talk About When We Talk About AI
Let’s start at the beginning. AI is often used as a catch-all term for a number of different disciplines, including robotics, machine learning and its subfields such as artificial neural networks, and natural language processing.
That isn't necessarily wrong, but it’s worth understanding what the term really means. At a basic level, artificial intelligence simply refers to the simulation of human thought by a computer.
The idea goes back to the Church-Turing thesis, formulated in the 1930s. The simple version is that any process we can describe as a precise, step-by-step procedure can be carried out by a computer. So as long as we can give a computer a step-by-step explanation of the way we think about something, it can replicate that process as many times as we like.
More important for us, as the computer runs that process over, and over, and over, we can coach it to make small improvements and “learn” along the way. This lets the computer get really good (sometimes even better than us) at mimicking things we do naturally, like pattern recognition.
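To make “coach it to make small improvements” concrete, here is a minimal Python sketch under a deliberately simple assumption: the computer only has to learn one number, `w`, so that `w * x` matches a few made-up example outputs. On every pass it measures how far off it is and nudges `w` a little in the direction that shrinks the error, a technique called gradient descent. The data, the `train` function and its settings are hypothetical illustrations, not part of ThreatSlayer.

```python
# Minimal sketch: learning by repeated small corrections (gradient descent).
# The example data, train() and its settings are hypothetical illustrations.

# Made-up (input, desired output) pairs; the "right answer" is w = 3.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def train(examples, steps=200, learning_rate=0.01):
    w = 0.0  # start from a poor guess
    for _ in range(steps):
        # Gradient of the mean squared error (w*x - y)^2 with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        # Take a small step in the direction that reduces the error
        w -= learning_rate * grad
    return w

w = train(examples)
print(f"learned w = {w:.3f}")  # prints: learned w = 3.000
```

Each pass only fixes a fraction of the remaining error, which is why the loop has to run many times: the improvement comes from repetition, not from any single step.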
To restate a previous example:
We could have an AI that is built to identify types of fruit. The computer would just have to be trained on a dataset of pictures of fruits along with their corresponding labels (e.g. apple, banana, orange).
Its task is to match the patterns it sees in each picture to the corresponding label, all on its own. After many attempts where we evaluate the results and provide feedback, the computer starts to recognize the visual characteristics of each type of fruit.
Then it can be used to classify new, unseen images of fruits as one of the known types. For instance, when presented with a new picture of a banana, it should correctly identify it as a banana, even though that particular picture wasn’t in the dataset it was trained on.
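Here is a minimal Python sketch of the fruit example. It assumes each picture has already been reduced to two hand-picked numbers, elongation and redness, with made-up values; a real image classifier would learn its own features from the pixels. For simplicity it swaps in the nearest-centroid classifier, one of the simplest supervised methods: it learns the average features of each label and assigns a new example to the closest average, rather than refining itself over many rounds of feedback. All names and values are hypothetical.

```python
import math

# Hypothetical labeled training data: (elongation, redness) -> fruit label.
# The feature values are invented purely for illustration.
training_data = [
    ((1.0, 0.9), "apple"),
    ((1.2, 0.8), "apple"),
    ((3.5, 0.1), "banana"),
    ((3.0, 0.2), "banana"),
    ((1.0, 0.4), "orange"),
    ((1.1, 0.3), "orange"),
]

def train(examples):
    """Learn one average feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total) for label, total in sums.items()}

def predict(centroids, features):
    """Assign a new example to the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

centroids = train(training_data)
# A new, unseen "picture": long and not very red.
print(predict(centroids, (3.2, 0.15)))  # prints: banana
```

The new point never appeared in `training_data`; the model classifies it only by how close it is to what each label looked like on average.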
This is what we call machine learning. Popular applications like ChatGPT use a similar (though far larger-scale) approach to understand and generate text.
Different Types of Machine Learning
Of course, there are many ways to “teach” the machine to do these things. Just for fun, let’s get meta and ask ChatGPT to explain the different types of learning techniques that algorithms like itself are trained with (I’ll only add headers for better readability):
When it comes to Machine Learning, there are four distinct types of algorithms: supervised, semi-supervised, unsupervised, and reinforcement. Let’s take a look at each one more closely.
Supervised Learning
Supervised learning is where the system uses labeled data sets in order to learn from them by recognizing patterns or correlations among different variables. This type of algorithm requires many examples of input/output pairs so that when an unknown input is encountered, the correct output can be predicted.
Some common applications of supervised learning include facial recognition systems, voice recognition software, credit card fraud detection systems, and natural language processing (NLP) programs.
Unsupervised Learning
Unsupervised learning algorithms are used when there is no labeled training data available. In this case, the algorithm must learn by using cluster analysis or other methods to identify patterns in the input data without any help from labels. Common applications include customer segmentation and anomaly detection systems.
Semi-Supervised Learning
Semi-supervised learning combines aspects of both supervised and unsupervised learning. It uses labeled data to gain initial understanding of the system, but then relies on unlabeled data to refine its knowledge.
This type of algorithm is often used when there is an abundance of unlabeled data available and a shortage of labeled data. Examples include object recognition in images and text classification.
Reinforcement Learning
Reinforcement learning, also known as RL, differs from all other ML algorithms because it allows the system to learn through trial and error. The system is rewarded for making correct decisions and penalized for making mistakes, with the goal being to find an optimal solution. Applications of RL include robotics, game playing, and autonomous vehicles.
No matter which type of Machine Learning algorithm you choose to use, remember that it’s important to understand how each one works in order to make sure it’s used correctly and effectively. Good luck!
…Yes, all of the information above is correct. ChatGPT itself was built on a language model pretrained on large amounts of text from across the internet, then fine-tuned using supervised learning and reinforcement learning from human feedback.
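The difference between the first two categories in ChatGPT’s answer is easiest to see side by side. In the minimal Python sketch below, which uses made-up 2-D points, the supervised function is handed the labels and simply averages each labeled group, while the unsupervised one, k-means clustering, has to discover the groups on its own and cannot name them. The data, labels and function names are hypothetical, and the initialization (start from the first point, then add the point farthest from the existing centers) is just one simple choice among many.

```python
import math

# Made-up 2-D points. Only the supervised learner gets to see the labels.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels = ["safe", "safe", "safe", "risky", "risky", "risky"]

def mean(group):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(coords) / len(group) for coords in zip(*group))

def supervised_centroids(points, labels):
    """Supervised: the labels tell us which points belong together."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: mean(group) for label, group in groups.items()}

def kmeans(points, k, iterations=10):
    """Unsupervised: find k groups without any labels (k-means clustering)."""
    # Start with the first point, then repeatedly add the point farthest from all centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

print(supervised_centroids(points, labels))  # groups come with names
print(kmeans(points, k=2))                   # same groups, but unnamed
```

On this toy data both approaches land on the same two group centers; only the supervised one can tell you which group is “risky”.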
Supervised Learning in ThreatSlayer
ThreatSlayer’s Threat Detection Tool makes use of the same basic machine learning principles behind the text you just read, except we’re using them to catch malicious websites. It’s an AI built with supervised learning from the scikit-learn library, and just as ChatGPT suggested, we’ve made sure to use it effectively.
Even within supervised learning, there are different approaches to training the AI depending on your goals. You can train it to sort inputs into categories (classification), to predict a numeric value from the relationships between input variables (regression), or to predict future values from past data (forecasting, which is often handled as regression over time).
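Classification returns a category, as in the fruit example. Here is a minimal Python sketch of the other two, assuming made-up numbers: an ordinary least-squares line fitted to (input, value) pairs is used for regression, and the same fit applied to time steps gives a simple trend forecast. Real forecasting models are usually more sophisticated; `fit_line` and all data here are hypothetical.

```python
# Minimal sketch of regression and forecasting with an ordinary least-squares line.
# fit_line() and all numbers below are hypothetical illustrations.

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Regression: predict a numeric value for an input the model hasn't seen.
inputs = [1, 2, 3, 4]
targets = [3, 5, 7, 9]
intercept, slope = fit_line(inputs, targets)
print(intercept + slope * 5)  # prints: 11.0

# Forecasting: the input is time, and we predict the next step.
weekly_counts = [10, 12, 14, 16]  # made-up counts for weeks 0..3
weeks = list(range(len(weekly_counts)))
b0, b1 = fit_line(weeks, weekly_counts)
print(b0 + b1 * len(weekly_counts))  # forecast for week 4; prints: 18.0
```

The same fitting code serves both tasks; what makes the second one a forecast is only that its input is time and the prediction lies beyond the data seen so far.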
For ThreatSlayer, we are using a decision tree, a particularly effective algorithm that can be used for both classification and regression.
A decision tree is really good at deciding whether a given input falls into a specific category, based on the labeled examples it has learned from. It works by asking a series of yes/no questions about the input’s features and following the answers down to a verdict. In our case, it can home in on key characteristics of a website that may indicate whether it’s malicious.
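A decision tree classifies an input by asking a sequence of yes/no questions about its features. Below is a toy, CART-style learner in Python that picks each question by minimizing Gini impurity. It rests on stated assumptions: the three website features (domain age in days, whether the URL contains a raw IP address, number of hyphens in the domain) and all values are made up, and this is not ThreatSlayer’s model or feature set. In practice you would reach for a library implementation such as scikit-learn’s `DecisionTreeClassifier`, which also uses Gini impurity by default.

```python
from collections import Counter

# Hypothetical features per website: (domain_age_days, has_ip_in_url, hyphens_in_domain).
# All values and labels are made up for illustration.
rows = [(2, 1, 4), (5, 0, 3), (1, 1, 1), (900, 0, 0), (3000, 0, 1), (400, 0, 0)]
labels = ["malicious", "malicious", "malicious", "safe", "safe", "safe"]

def gini(labels):
    """Gini impurity: 0.0 when every label agrees, larger when they are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Try every (feature, threshold) question; keep the one with the lowest weighted impurity."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # this question doesn't separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split until a node is pure or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth == max_depth:
        return majority  # leaf node
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, threshold = split
    goes_left = [row[f] <= threshold for row in rows]
    left = build_tree([r for r, g in zip(rows, goes_left) if g],
                      [lab for lab, g in zip(labels, goes_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, goes_left) if not g],
                       [lab for lab, g in zip(labels, goes_left) if not g], depth + 1, max_depth)
    return (f, threshold, left, right)  # internal node: "is feature f <= threshold?"

def predict(tree, row):
    """Answer one yes/no question per node until reaching a leaf label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

tree = build_tree(rows, labels)
print(tree)                         # (0, 5, 'malicious', 'safe')
print(predict(tree, (3, 1, 2)))     # malicious
print(predict(tree, (1200, 0, 0)))  # safe
```

On this tiny dataset a single question (“is the domain at most 5 days old?”) separates the classes perfectly, so the tree stops after one split; messier real data would produce deeper trees.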
But no matter how good the algorithm is, it doesn’t do much without the right data. We trained ThreatSlayer on large datasets of known crypto scams, phishing links, dangerous URLs, malware, and more.
Just like the fruit example, we made sure that ThreatSlayer became really good — even better than a person — at detecting what a malicious website looks like by testing it and correcting it based on its output.
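Testing a classifier usually means scoring it on labeled examples it was not trained on. Here is a minimal Python sketch of that measurement, assuming a two-label setup (“malicious” vs “safe”) and made-up predictions; the numbers are not ThreatSlayer results. Precision tells you how many flagged sites were really malicious, recall tells you how many malicious sites were caught, and each misclassified example is a candidate for correction.

```python
# Minimal sketch of scoring a classifier on held-out, labeled test examples.
# The labels and predictions are made up; "malicious" is treated as the positive class.

def evaluate(y_true, y_pred, positive="malicious"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # threats caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # safe sites flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # threats missed
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

y_true = ["malicious", "malicious", "safe", "safe", "malicious", "safe"]
y_pred = ["malicious", "safe", "safe", "malicious", "malicious", "safe"]
scores = evaluate(y_true, y_pred)
print(scores)  # accuracy, precision and recall are each 2/3 for this made-up data

# The misclassified examples are the ones to study when correcting the model.
mistakes = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
print(mistakes)  # prints: [1, 3]
```

For threat detection, recall is often the number to watch, since each missed malicious site (like example 1 here) is a threat that got through.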
More importantly, we will keep training it with more and more data as the ThreatSlayer and Bouncer communities grow. Every new threat you report will strengthen the algorithm and make it better at catching and predicting what bad actors are up to.
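One way to picture how each new report strengthens the model is a learner that folds every newly reported, labeled example into what it already knows. The Python sketch below assumes a hypothetical nearest-centroid model over made-up 2-D features, not ThreatSlayer’s actual pipeline (a decision tree, for instance, would typically be retrained on the enlarged dataset instead). It shows a single new report changing how a similar, previously unseen site is classified.

```python
import math

class IncrementalCentroidModel:
    """Hypothetical model that keeps running per-label sums so new reports can be added cheaply."""

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of reports seen

    def add_report(self, features, label):
        """Fold one newly reported, labeled example into the model."""
        total = self.sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        self.counts[label] = self.counts.get(label, 0) + 1

    def centroid(self, label):
        """Average of all examples reported with this label."""
        return tuple(v / self.counts[label] for v in self.sums[label])

    def predict(self, features):
        """Return the label whose average reported example is closest."""
        return min(self.sums, key=lambda label: math.dist(features, self.centroid(label)))

model = IncrementalCentroidModel()
model.add_report((0.0, 0.0), "safe")        # made-up feature values
model.add_report((10.0, 10.0), "malicious")

new_site = (2.0, 7.5)
print(model.predict(new_site))  # prints: safe (nothing like it has been reported yet)

model.add_report((2.0, 8.0), "malicious")  # the community reports a new kind of scam
print(model.predict(new_site))  # prints: malicious
```

Keeping running sums means a report is absorbed in constant time, without reprocessing everything seen before.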
Why Use AI for Threat Detection?
Again, AI is really good at imitating how we think about things. Just like you use your calculator to avoid napkin math every time you split the check at a restaurant, you can use an AI algorithm to sniff out threats for you. It helps you be a little more at ease when browsing, especially when you’re carrying funds in your Web3 wallet.
It also helps that this particular AI is updated with information about new threats from across the internet thanks to an engaged (and incentivized) community. You’re not just off-loading the part of your brain that’s always on the lookout for threats, you’re also leaving it to a hivemind that’s seen and recognized more threats than you would want to spend time on.
We really think that if we’re improving the web with decentralization, we should also take it up a notch in terms of security. AI is the right tool for this.
Don’t miss a beat when it comes to the safety of your funds as you navigate both Web2 and Web3. Follow us on Twitter to stay on top of how we’re improving our AI and, if you still haven’t, download ThreatSlayer today.