Role of Confusion Matrix for Preventing Cyber Attacks

Ashish Dwivedi
Jun 6, 2021

Cyber attack definition

Simply put, a cyber attack is an attack launched from one or more computers against another computer, multiple computers or networks. Cyber attacks can be broken down into two broad types: attacks where the goal is to disable the target computer or knock it offline, or attacks where the goal is to get access to the target computer’s data and perhaps gain admin privileges on it.

8 types of cyber attack

To achieve those goals of gaining access or disabling operations, a number of different technical methods are deployed by cybercriminals. There are always new methods proliferating, and some of these categories overlap, but these are the terms that you’re most likely to hear discussed.

  1. Malware
  2. Phishing
  3. Ransomware
  4. Denial of service
  5. Man in the middle
  6. Cryptojacking
  7. SQL injection
  8. Zero-day exploits

Cyber Attack on GitHub

On February 28, 2018, the version control hosting service GitHub was hit with a massive denial of service attack, with 1.35 terabits per second (Tbps) of traffic hitting the popular site. Although GitHub was only knocked offline intermittently and managed to beat the attack back entirely in less than 20 minutes, the sheer scale of the assault was worrying; it outpaced the huge attack on Dyn in late 2016, which peaked at 1.2 Tbps.

More troubling still was the infrastructure that drove the attack. While the Dyn attack was the product of the Mirai botnet, which required malware to infect thousands of IoT devices, the GitHub attack exploited servers running the Memcached memory caching system, which can return very large chunks of data in response to simple requests.

Memcached is meant to be used only on protected servers running on internal networks, and generally has little by way of security to prevent malicious attackers from spoofing IP addresses and sending huge amounts of data at unsuspecting victims. Unfortunately, thousands of Memcached servers are sitting on the open internet, and there has been a huge upsurge in their use in DDoS attacks. Saying that the servers are “hijacked” is barely fair, as they’ll cheerfully send packets wherever they’re told without asking questions.

Just days after the GitHub attack, another Memcached-based DDoS assault slammed into an unnamed U.S. service provider with 1.7 Tbps of traffic.

PREVENTION OF CYBER ATTACK

An intrusion detection system (IDS) has the potential to be the front line of defense against cyberattacks and plays an essential role in securing networking resources and infrastructure. The performance of an IDS depends heavily on the data features it uses. Selecting the most informative features from network traffic data, while eliminating redundant and irrelevant ones, is still an open research issue.

The work summarized here aims to identify and benchmark the set of features that can best characterize network traffic for intrusion detection, and it proposes an ensemble approach to do so. As a first step, the approach applies four different feature evaluation measures (correlation, consistency, information, and distance) to select the more crucial features for intrusion detection. Second, it applies a subset combination strategy to merge the output of the four measures and arrive at the candidate feature set. Along with this, a new framework that adopts data analytic lifecycle practices is explored to employ the proposed ensemble for building an effective IDS. The effectiveness of the proposed approach is demonstrated through several experiments on four intrusion detection evaluation datasets, namely KDDCup’99, NSL-KDD, UNSW-NB15, and CICIDS2017.
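The subset combination step can be illustrated with a small sketch. The text does not say which combination rule is used, so the majority vote below is an assumption, and the function name `combine_subsets`, the `min_votes` threshold, and the feature names are hypothetical placeholders rather than results from the study.

```python
from collections import Counter


def combine_subsets(subsets, min_votes=2):
    """Keep every feature selected by at least `min_votes` of the measures.

    Assumption: a simple vote stands in for the combination strategy,
    which the text does not specify.
    """
    votes = Counter(feature for subset in subsets for feature in set(subset))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical output of the four feature evaluation measures.
selected = {
    "correlation": {"duration", "src_bytes", "dst_bytes"},
    "consistency": {"src_bytes", "flag", "dst_bytes"},
    "information": {"src_bytes", "service", "flag"},
    "distance": {"duration", "src_bytes", "count"},
}

features = combine_subsets(selected.values(), min_votes=2)
print(features)  # ['dst_bytes', 'duration', 'flag', 'src_bytes']
```

Raising `min_votes` to 4 would keep only features that all four measures agree on, while setting it to 1 would take the plain union of the subsets.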

Proposed data analytic framework for building IDS

USING THE CONFUSION MATRIX TO EVALUATE AN IDS MODEL

The performance of an IDS is evaluated by its ability to correctly classify a given network traffic data packet as malicious or normal. A good IDS should possess high accuracy and a high detection rate with a low false alarm rate (FAR). In this regard, the current work uses the confusion matrix given in the figure to compute these three metrics as follows:

  • Detection rate (DR): also called the true positive rate, it is the ratio of the number of malicious packets correctly detected by the IDS (TP) to the total number of malicious packets in the testing dataset (TP + FN).
  • False positive rate: also termed the false alarm rate (FAR), it is the ratio of the number of normal packets detected as malicious (FP) to the total number of normal packets in the testing dataset (FP + TN). If this value stays high, it may cause the network administrator to deliberately ignore the system's warnings, which in turn may put the entire network in a dangerous state. Therefore, this metric should be kept as low as possible.
  • Accuracy (ACC): the proportion of correct classifications, that is, malicious packets detected as malicious (TP) plus normal packets detected as normal (TN), out of the total size of the testing dataset (TP + TN + FP + FN).
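The three metrics can be computed directly from the four confusion matrix counts. The sketch below is only illustrative: it treats "malicious" as the positive class, and the function name `ids_metrics` and the example counts are made up for demonstration.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return (detection_rate, false_alarm_rate, accuracy).

    Assumes "malicious" is the positive class:
      tp = malicious packets flagged as malicious
      tn = normal packets passed as normal
      fp = normal packets flagged as malicious
      fn = malicious packets passed as normal
    """
    malicious = tp + fn
    normal = fp + tn
    total = malicious + normal
    # Guard against empty classes to avoid division by zero.
    dr = tp / malicious if malicious else 0.0
    far = fp / normal if normal else 0.0
    acc = (tp + tn) / total if total else 0.0
    return dr, far, acc


# Hypothetical counts for a test set of 1,000 packets.
dr, far, acc = ids_metrics(tp=90, tn=880, fp=20, fn=10)
print(f"DR={dr:.3f} FAR={far:.3f} ACC={acc:.3f}")
# DR=0.900 FAR=0.022 ACC=0.970
```

In this example accuracy looks excellent mostly because normal traffic dominates the test set, which is one reason DR and FAR are reported alongside it.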

SIMPLE WAY OF UNDERSTANDING CONFUSION MATRIX

  1. True Positive (TP): These are the events which were correctly predicted by the model as “occurred = Yes”
  2. True Negative (TN): These are the events which were correctly predicted by the model as “not occurred = No”
  3. False Positive (FP): These are the events which were predicted as “occurred = Yes” but in reality were “not occurred = No”
  4. False Negative (FN): This is the opposite of FP, i.e. events predicted as “not occurred = No” that in reality were “occurred = Yes”
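The four outcomes above can be counted by comparing each prediction with what actually happened. This is a minimal sketch; the function name `confusion_counts` and the label strings are assumptions chosen for the IDS setting, where "malicious" plays the role of “occurred = Yes”.

```python
def confusion_counts(actual, predicted, positive="malicious"):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted "Yes": correct if the event really occurred.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted "No": correct if the event really did not occur.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical ground truth and IDS predictions for six packets.
actual = ["malicious", "normal", "normal", "malicious", "normal", "malicious"]
predicted = ["malicious", "normal", "malicious", "normal", "normal", "malicious"]

print(confusion_counts(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```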

HAPPY LEARNING……
