Generation of Sufficient Cut Points to Discretize Network Traffic Data Sets



The classification accuracy and efficiency of an intrusion detection system (IDS) are largely affected by the discretization methods applied to continuous attributes. Cut generation is one such discretization method: applying a variable number of cuts (in a partition) to the continuous attributes yields different classification accuracies. In this paper, to maximize the accuracy of classifying network traffic data as either ‘normal’ or ‘anomaly’, the proposed algorithm determines the set of cut points for each continuous attribute. Once the appropriate and necessary cut points have been generated, they are mapped into corresponding intervals using the centre-spread encoding technique. The learnt cut points are then applied to the test data set for discretization to achieve maximum classification accuracy.
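To make the workflow in the abstract concrete (learn cut points on training data, map them to intervals, reuse them on test data), here is a minimal Python sketch. It is not the paper's algorithm. Cut selection uses a simple stand-in, namely midpoints between adjacent attribute values whose class labels differ, and the centre-spread encoding is assumed to represent each interval by its midpoint and half-width. All function names, the attribute range, and the sample values are hypothetical.

```python
from bisect import bisect_right


def candidate_cuts(values, labels):
    """Stand-in cut generation (an assumption, not the paper's method):
    place a cut midway between adjacent distinct values whose labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if v1 != v2 and y1 != y2:
            cuts.append((v1 + v2) / 2)
    return cuts


def centre_spread(cuts, lo, hi):
    """Assumed centre-spread encoding: each interval [a, b] becomes
    (centre, spread) = ((a + b) / 2, (b - a) / 2)."""
    edges = [lo] + list(cuts) + [hi]
    return [((a + b) / 2, (b - a) / 2) for a, b in zip(edges, edges[1:])]


def discretize(value, cuts):
    """Index of the interval containing value; a value equal to a cut
    falls into the interval to the right of that cut."""
    return bisect_right(cuts, value)


# Hypothetical training data for one continuous attribute.
train_values = [10, 40, 50, 90, 130]
train_labels = ["normal", "normal", "anomaly", "anomaly", "normal"]

cuts = candidate_cuts(train_values, train_labels)
intervals = centre_spread(cuts, lo=0, hi=200)

# The cut points learnt on training data are reused unchanged on test data.
test_values = [30, 45, 100, 150]
test_bins = [discretize(v, cuts) for v in test_values]
```

The key design point mirrored from the abstract is that `cuts` is computed once from the training set and then applied as-is to the test set, so both sets share the same interval boundaries.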

Tuhin Sharma
Senior Principal Data Scientist

My research interests include AI, NLP and Distributed Computing.