Tackling the Data Classification Issue Through Notorious Neural Networks


Why Is Classification an Issue in Data Mining?

We all know how neural networks work: they are densely connected sets of nodes (neurons) that learn from data and compute some function, which in most cases is a classification. Classification means mapping an input to one of a set of classes, telling us where that input belongs. If, say, we were given Color and Shape as inputs and had to tell which fruit they describe, we would first have to train the neural network so it can classify those inputs into an output class based on what it has learned.
If the Color is Yellow and the Shape is crescent-like, then the neural network must classify the input as a 'Banana'. For that to happen, it needs to have already learned that such inputs belong to the class 'Banana'.
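To make the toy example concrete, here is a minimal sketch of such a classifier using scikit-learn (the colors, shapes, and fruits below are illustrative placeholders only, not data from any study):

```python
# Minimal sketch of the Color/Shape -> fruit example (illustrative data only).
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

# Categorical inputs and the fruit class each (Color, Shape) pair belongs to.
X_raw = [["Yellow", "Crescent"], ["Red", "Round"],
         ["Green", "Round"], ["Yellow", "Round"]]
y = ["Banana", "Apple", "Apple", "Lemon"]

# Neural networks need numeric inputs, so one-hot encode the categories.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(X_raw)

# A small fully connected network learns the input-to-class mapping.
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)

# After training, the network classifies a new (Color, Shape) input.
print(model.predict(encoder.transform([["Yellow", "Crescent"]])))  # expected: ['Banana']
```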

Now consider doing the same thing on a very large data set, and we have a problem. That, in effect, is the issue with larger data sets and why neural networks tend to pose a challenge when it comes to classification in data mining.

In addition, consider the following problem:
  • Neural networks work with weights at each neuron that are updated progressively as learning proceeds. The classification rules developed at every node are therefore lost, because there is no way for us to read them back out of the network. Hence, on their own, neural networks are considered ineffective for finding classification rules.

Proposed Methodology of Finding Classification Rules

Neural networks are usually not considered when it comes to finding classification rules. However, because they are more effective than decision trees in terms of classification accuracy, they are employed in our methodology. Alongside that accuracy, they also withstand a large amount of noise in the data; such robustness is mostly achieved by adding noise during the learning phase of the neural network.
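As an aside, a minimal sketch of that kind of noise injection (one common scheme, not necessarily the exact one used in this methodology) is to augment the training data with copies perturbed by small random noise:

```python
# Sketch: input-noise injection during training, a common way to make a
# network more robust to noisy data (synthetic numeric data for illustration).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
labels = (features[:, 0] - features[:, 2] > 0).astype(int)

# Augment the training set with noisy copies of the original samples.
noisy = features + rng.normal(scale=0.1, size=features.shape)
train_features = np.vstack([features, noisy])
train_labels = np.concatenate([labels, labels])

noise_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
noise_model.fit(train_features, train_labels)
print("accuracy on clean data:", noise_model.score(features, labels))
```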

Hence, to begin, we must build two things into our neural network for it to deliver classification rules with the desired accuracy:
  1. A Rule-Extraction algorithm
  2. Pruning, to save time

The Rule-Extraction algorithm finds clear and concise rules within a neural network by identifying the relationships between the different nodes in the network and then deriving a mathematical formula from them. An image of this is presented below:



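The exact rule-extraction algorithm will be covered in the follow-up post; purely as a sketch of the general idea, one can query a trained network over every discrete input combination and write the answers down as if-then rules. The sketch below reuses the hypothetical `model` and `encoder` from the fruit example above and is not the algorithm from the study:

```python
# Sketch: derive if-then rules by querying the trained fruit network over all
# discrete input combinations (reuses `model` and `encoder` from the earlier
# sketch; a real rule-extraction algorithm would also simplify and merge rules).
from itertools import product

colors = ["Yellow", "Red", "Green"]
shapes = ["Crescent", "Round"]

rules = []
for color, shape in product(colors, shapes):
    # Ask the trained network how it classifies this combination.
    predicted = model.predict(encoder.transform([[color, shape]]))[0]
    rules.append(f"IF Color = {color} AND Shape = {shape} THEN Class = {predicted}")

print("\n".join(rules))
```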
Alongside Rule-Extraction, the other requirement is Pruning. Pruning helps remove weights that would otherwise give the same results or have little effect on the classification accuracy of our neural network. One effective way to do this is to prune the extra nodes once the target classification accuracy has been reached: after training is complete, a set of nodes can be removed without disturbing the remaining weights, while keeping the accuracy of the larger initial network.
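As a rough sketch of the pruning idea (magnitude-based pruning of individual weights rather than whole nodes, again reusing the toy fruit model above rather than the method described in the study), weights with little effect can be zeroed out as long as accuracy is preserved:

```python
# Sketch: magnitude-based pruning of the toy fruit network. Near-zero weights
# are dropped, and each change is kept only if accuracy does not fall.
# (This prunes individual weights; pruning whole nodes follows the same principle.)
import numpy as np

baseline = model.score(X, y)

for layer in model.coefs_:                  # weight matrices of the trained MLP
    original = layer.copy()
    layer[np.abs(layer) < 0.05] = 0.0       # drop near-zero weights in place
    if model.score(X, y) < baseline:        # accuracy dropped: undo this prune
        layer[:] = original

print("accuracy after pruning:", model.score(X, y))
```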



A further, more detailed study of this methodology will be posted later, describing how Pruning followed by Rule-Extraction brings out classification rules comparable to those of decision trees. Another optimization of neural networks can be achieved through:

  • Reducing the time it takes to train the neural network

This can be achieved through a concept known as incremental training. Incremental training works by appending new data sets to pre-existing ones so that the model learns from them and updates itself accordingly. This would let us run rule-extraction algorithms on any database and gather results in near real time, without having to retrain from scratch again and again.
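A minimal sketch of incremental training, using scikit-learn's `partial_fit` as one possible mechanism (the streamed batches here are synthetic and purely illustrative):

```python
# Sketch: incremental training with partial_fit, so each new batch of data
# updates the existing model instead of retraining it from scratch.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])
stream_model = MLPClassifier(hidden_layer_sizes=(16,), random_state=0)

for batch in range(5):
    # Each new batch arrives later and is folded into what the model already knows.
    X_new = rng.normal(size=(100, 4))
    y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)
    # `classes` must list all labels on the first call; later calls just update weights.
    stream_model.partial_fit(X_new, y_new, classes=classes)

# The updated model can serve predictions (and rule extraction) without full retraining.
print(stream_model.predict(rng.normal(size=(3, 4))))
```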







