Classification Technology

Automatic classification is the process of putting a document or piece of text into one or more categories based on its content or appearance.  Assigning categories to documents is a common requirement in environments such as:

  • the mailroom, where inbound documents need to be sorted into the appropriate work queues
  • content management, where assigning categories to documents can facilitate effective storage and retrieval
  • records management, where the document type is an important factor in determining the appropriate retention schedule, and can greatly assist record search and navigation
  • case management, where large quantities of documents (e.g. legal or medical) require indexing by category and relevance

Classifying documents manually is labour-intensive and error-prone.  Automatic classification offers huge benefits, provided that accurate results can be obtained without large setup costs.  First-generation classifiers required manually defined rules (such as the presence of specified keywords) to tell different categories apart.  This approach is feasible when the number of categories is small and the document content fairly static, but it quickly becomes unmanageable as categories are added or content varies.  Second-generation classifiers can be trained by example, but still require manual 'tuning' of parameters to achieve the required accuracy.
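To show why first-generation rules become hard to manage, here is a minimal sketch of a keyword-rule classifier. The category names, keywords and the function `classify_by_rules` are hypothetical and chosen only for illustration; they are not part of any Focal Point product.

```python
# Hypothetical first-generation classifier: every category is defined by a
# hand-written list of keywords. Categories and keywords are illustrative only.
RULES = {
    "invoice": ["invoice", "amount due", "payment terms"],
    "complaint": ["complaint", "dissatisfied", "refund"],
    "change_of_address": ["change of address", "new address", "moving"],
}


def classify_by_rules(text, rules=RULES):
    """Return every category whose keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    matches = [
        category
        for category, keywords in rules.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    # Documents that match no rule fall through to manual handling.
    return matches or ["unclassified"]
```

Even this toy rule set shows the maintenance problem. A complaint that mentions an invoice matches both categories, and a plain substring test such as "moving" also fires on "removing". Every new category or change in wording means revisiting the keyword lists by hand.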

Utilising the latest academic research, Focal Point has built a third-generation classifier that operates fully automatically, learning the distinguishing features of different categories and tuning itself for maximum accuracy with no manual intervention.  With this ability, classification can be deployed in a self-learning mode with no up-front configuration: the system initially watches and learns from manual categorisation, and converts to a fully automated process once it has built up sufficient knowledge of the document categories.
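The self-learning mode described above can be sketched as follows. This is not Focal Point's classifier. It swaps in a plain multinomial naive Bayes model and does no automatic parameter tuning. The switch-over rule (accuracy of silent "shadow" predictions over a sliding window of recent documents) and the names `SelfLearningClassifier`, `observe`, `target_accuracy`, `window` and `min_examples` are hypothetical. They are chosen only to illustrate the "watch, then automate" pattern.

```python
import math
import re
from collections import Counter, defaultdict, deque


def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SelfLearningClassifier:
    """Hypothetical 'watch, then automate' wrapper around multinomial naive Bayes.

    In learning mode, each manually categorised document is first predicted
    silently (shadow mode) and then used as a training example. Once recent
    shadow accuracy reaches `target_accuracy`, the classifier switches to
    automatic mode. All thresholds are illustrative assumptions.
    """

    def __init__(self, target_accuracy=0.95, window=50, min_examples=50):
        self.target_accuracy = target_accuracy
        self.min_examples = min_examples
        self.recent = deque(maxlen=window)       # 1 = shadow prediction correct, 0 = wrong
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()
        self.automatic = False

    def predict(self, text):
        """Return the most probable category, or None before any training."""
        if not self.doc_counts:
            return None
        total_docs = sum(self.doc_counts.values())
        vocab_size = max(len(self.vocabulary), 1)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # log P(category) + sum of log P(word | category), Laplace-smoothed
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best, best_score = category, score
        return best

    def observe(self, text, manual_category):
        """Score a shadow prediction, then learn from the manual category."""
        self.recent.append(int(self.predict(text) == manual_category))
        words = tokenize(text)
        self.word_counts[manual_category].update(words)
        self.doc_counts[manual_category] += 1
        self.vocabulary.update(words)
        trained = sum(self.doc_counts.values())
        # Switch to automatic once enough examples are seen and the
        # sliding-window shadow accuracy meets the target.
        if (trained >= self.min_examples
                and len(self.recent) == self.recent.maxlen
                and sum(self.recent) / len(self.recent) >= self.target_accuracy):
            self.automatic = True
```

In use, each document categorised by a person is passed to `observe`. While `automatic` is false, documents continue to go to manual categorisation. Once it becomes true, `predict` can route documents directly. Measuring accuracy on predictions made *before* each document is learned gives an honest estimate of how the classifier would have performed unaided.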

Focal Point's classification technology is available as part of our Document Analysis SDK, enabling the technology to be integrated into a wide variety of business applications.