Confusion Matrix And Its Importance In Cyber Security

Saurabh Chowdhari
5 min read · Jun 6, 2021

We are living in a world of computers. Most of the data that is dear to us, or that protects our privacy, is online. We use services such as social media, banking, and official work, and all of it is online. Yes, it has made our lives super easy; in a single click we can do many things, and in a single click we can access and store our data online. But this also comes with risk.

As internet usage has grown exponentially, cyber attacks and cyber threats have become huge issues.

What is a Cyber Attack

A cyber attack is a digital or physical attack on servers or computers on a public or private network, in which an attacker with malicious intent seeks, without authorization, to expose, damage, alter, disable, or steal data, or to change the system configuration. Carrying out such an attack is a cybercrime.

Some examples of cyber attacks are:

  • Stealing confidential business data and hacking servers
  • Exposing someone's private information and harassing them
  • Stealing bank details and card details

The IT industry is trying its best to protect the data and protect servers. Many different techniques and applications have been developed to prevent cybercrimes.

Today we will talk about one tool that helps with this: the confusion matrix.

What is a Confusion Matrix

In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.

Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa.

Image source: https://towardsdatascience.com
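
The row and column layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_confusion_matrix` and the sample labels are made up for this post, and the choice of rows as actual classes and columns as predicted classes is one of the two possible conventions.

```python
def build_confusion_matrix(actual, predicted, labels=(1, 0)):
    """Return a nested list: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical labels: 1 = positive class, 0 = negative class
actual    = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

matrix = build_confusion_matrix(actual, predicted)
for label, row in zip((1, 0), matrix):
    print(f"actual={label}: {row}")
```

With the label order `(1, 0)`, the top-left cell counts positives predicted as positive and the bottom-right cell counts negatives predicted as negative.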

Let's go through the four cells of the confusion matrix one by one, along with its two types of error:

  1. True Positive (TP): You predicted positive and it’s true.
  2. True Negative (TN): You predicted negative and it’s true.
  3. False Positive (FP): You predicted positive and it’s false. It is also known as a Type 1 error.
  4. False Negative (FN): You predicted negative and it’s false. It is also known as a Type 2 error.
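
To make the four cases concrete, here is a small Python sketch that names the cell for a single prediction and then tallies a handful of them. The helper `outcome` and the sample pairs are hypothetical, chosen only to show one example of each case.

```python
from collections import Counter

def outcome(actual, predicted):
    """Name the confusion-matrix cell for one (actual, predicted) pair of booleans."""
    if predicted:
        return "TP" if actual else "FP"   # FP is the Type 1 error
    return "FN" if actual else "TN"       # FN is the Type 2 error

# Hypothetical (actual, predicted) pairs
pairs = [(True, True), (False, False), (False, True), (True, False), (True, True)]
counts = Counter(outcome(a, p) for a, p in pairs)
print(dict(counts))
```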

A confusion matrix is a good evaluation tool for a machine learning classification model. It gives simple yet effective performance measures for our model.

Here are some of the most common performance measures we can derive from the confusion matrix.

Accuracy: It gives the overall accuracy of the model, meaning the fraction of the total samples that were correctly classified by the classifier. To calculate accuracy, use the following formula:

(TP+TN)/(TP+TN+FP+FN).

Misclassification Rate: It tells what fraction of predictions were incorrect. It is also known as Classification Error. You can calculate it using

(FP+FN)/(TP+TN+FP+FN)

Precision: It tells what fraction of the samples predicted as positive were actually positive. To calculate precision, use the following formula:

TP/(TP+FP).

Recall: It tells what fraction of all positive samples were correctly predicted as positive by the classifier. It is also known as True Positive Rate (TPR), Sensitivity, Probability of Detection. To calculate Recall, use the following formula:

TP/(TP+FN).

Specificity: It tells what fraction of all negative samples are correctly predicted as negative by the classifier. It is also known as True Negative Rate (TNR). To calculate specificity, use the following formula:

TN/(TN+FP).
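
The five formulas above translate directly into code. The following is a minimal sketch, assuming the four counts are already known; the function name `confusion_metrics` and the sample counts are invented for illustration, and the function does not guard against a zero denominator.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the measures defined above from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called TPR or sensitivity
        "specificity": tn / (tn + fp),   # also called TNR
    }

# Hypothetical counts, not taken from any real dataset
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy and misclassification rate always add up to 1, since every sample is either classified correctly or not.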

Use Case In Cyber Security

Consider a real-world example: a server receives 1000 packets of traffic in 1 hour. As we all know, a machine can never be 100% correct, so let’s check how it did. Our machine learning model evaluated the traffic and predicted, for each packet or transmission, whether it was dangerous to the server. We want to know whether each packet or transmission was good (True/1) or suspicious (False/0), so "good" is the positive class in this example.

In the confusion matrix for this example, our machine learning model predicted 750 packets as safe, and they were actually safe. The model said that 165 packets were suspicious and dangerous, and they really were dangerous, so the machine gave us correct information and we protected ourselves in time.

Next, 20 packets were predicted as dangerous, but they were actually safe. In this case the model raised a false alarm: it called safe data unsafe and caused unnecessary trouble for the security team. Because safe is the positive class here, this is a False Negative, a Type 2 error. Such errors are not very dangerous in the real world, since they still alert us.

Finally, 65 packets were actually dangerous, but the machine predicted that they were good and safe. The packets were actually False (dangerous), yet the model predicted True (safe), so they did not trigger any alarm or notify the security team. This is a False Positive, a Type 1 error, and it is very dangerous to the server: it can have serious consequences for the company or business.
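
As a quick check on this example, the measures defined earlier can be worked out from the four counts. The sketch below takes "safe" as the positive class, matching the True/1 convention used here; the variable names are chosen only for this illustration.

```python
# Counts from the example, taking "safe" as the positive class
tp = 750  # predicted safe, actually safe
tn = 165  # predicted dangerous, actually dangerous
fn = 20   # predicted dangerous, actually safe (false alarm, Type 2)
fp = 65   # predicted safe, actually dangerous (missed attack, Type 1)

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
misclassification_rate = (fp + fn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"total packets          = {total}")
print(f"accuracy               = {accuracy:.3f}")
print(f"misclassification rate = {misclassification_rate:.3f}")
print(f"precision              = {precision:.3f}")
print(f"recall                 = {recall:.3f}")
print(f"specificity            = {specificity:.3f}")
```

The accuracy works out to 0.915, which looks good, but the specificity is only about 0.717: of the 230 packets that were actually dangerous, 65 were passed as safe. This is why the individual cells matter more than the overall accuracy in a security setting.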

So this is how the confusion matrix helps in cyber attack monitoring. The security team gets a lot of help from the confusion matrix and tries to reduce the Type 1 error as much as possible.

I hope I’ve given you a basic understanding of what exactly a confusion matrix is.

Thanks for reading this far. Here is a bonus quote for you.

If you get up in the morning and think the future is going to be better, it is a bright day. Otherwise, it’s not.

Elon Musk

Saurabh Chowdhari

Aspiring machine learning and artificial intelligence engineer. I love to explore the world of computers.