Geetha Murugesan, CISA, CRISC, CGEIT, CDPSE, COBIT 2019 Foundation, COBIT 5 Implementor and Assessor, and member of ISACA Emerging Trends Working Group

Geetha is an IT governance, IT security, and IT risk management professional with over twenty-five years’ experience. She has offered consulting, implementation, and advisory services to various organisations in the banking, telecom, health care, manufacturing, government, and insurance sectors while working for one of the largest Indian IT software companies. For the last five years, she has been a regular on-site trainer, conducting training through ISACA HQ for certification exams such as CRISC and CISA at various multinationals. She is a global volunteer with ISACA Global.


Businesses generate a large amount of data every day, which presents both an opportunity and a challenge. On the one hand, data helps companies gain smart insights into people’s opinions about a product or service, and there is an increasing need to analyse emails, product reviews, social media posts, customer feedback, and support tickets. On the other hand, there’s the dilemma of how to process all this data. That’s where text mining plays a major role.

What is text mining?

Text mining (also known as text analysis) is the process of transforming unstructured text into structured data for easy analysis.

In a business context, unstructured text data can include emails, social media posts, chats, support tickets, and surveys. Sorting through all these types of information manually often fails, not only because it’s time-consuming and expensive, but also because it’s inaccurate and impossible to scale.

Text mining is an automatic process that uses natural language processing (NLP) to extract valuable insights from unstructured text. By transforming data into information that machines can understand, text mining automates the process of classifying texts by sentiment, topic, and intent.

Text mining: a powerful tool

A recent statistic from Gartner claims that almost 80% of existing text data is unstructured, meaning it’s not organized in a predefined way, it’s not searchable, and it’s almost impossible to manage. In other words, in its raw form it’s just not useful. Text mining in big data analytics is emerging as a powerful tool for harnessing unstructured textual data: it analyses the text to extract new knowledge and to identify significant patterns and correlations hidden in the data. However, organizing, categorizing, and capturing relevant information from raw data remains a major challenge for companies.

Machine learning is a discipline derived from artificial intelligence (AI), which focuses on creating algorithms that enable computers to learn tasks based on examples. Machine learning models need to be trained with data, after which they can automatically make predictions with a certain level of accuracy. When text mining and machine learning are combined, automated text analysis becomes possible.
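To make "learning from examples" concrete, here is a minimal sketch of a text classifier in Python. It assumes a Naive Bayes approach, which is only one of many possible techniques and is not named in this article. The class name, the training sentences, and the labels are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Tiny multinomial Naive Bayes classifier (illustrative sketch only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of training texts
        self.vocabulary = set()

    def train(self, examples):
        """Learn word frequencies per label from (text, label) pairs."""
        for text, label in examples:
            tokens = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: a real model would need far more examples.
TRAINING_EXAMPLES = [
    ("great product works well", "positive"),
    ("love the fast delivery", "positive"),
    ("terrible support very slow", "negative"),
    ("product broke after a week", "negative"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(TRAINING_EXAMPLES)
print(classifier.predict("great fast delivery"))
print(classifier.predict("terrible slow support"))
```

With only four training sentences the model is a toy, but the pattern is the same at scale: labelled examples go in, and the trained model assigns labels to text it has not seen before.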

Difference between text mining and text analytics

Both text mining and text analytics are intended to solve the same problem (automatically analysing raw text data), but they use different techniques. Text mining identifies relevant information within a text and therefore provides qualitative results. It combines notions of statistics, linguistics, and machine learning to create models that learn from training data and can predict results on new information based on previous experience.

Text analytics, however, focuses on finding patterns and trends across large sets of data, resulting in more quantitative results. Text analytics uses the results of analyses performed by text mining models to create graphs and all kinds of data visualizations.
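As a rough illustration of that hand-off, the Python sketch below takes the kind of per-document labels a text mining model might produce and aggregates them into percentages that could feed a chart. The function name, the topics, and the ticket data are hypothetical.

```python
from collections import Counter, defaultdict


def sentiment_share_by_topic(results):
    """Turn (topic, sentiment) pairs into percentage shares per topic."""
    by_topic = defaultdict(Counter)
    for topic, sentiment in results:
        by_topic[topic][sentiment] += 1

    summary = {}
    for topic, counts in by_topic.items():
        total = sum(counts.values())
        summary[topic] = {
            sentiment: round(100 * n / total, 1) for sentiment, n in counts.items()
        }
    return summary


# Hypothetical output of a text mining model: one pair per support ticket.
mined_results = [
    ("billing", "negative"),
    ("billing", "negative"),
    ("billing", "positive"),
    ("delivery", "positive"),
    ("delivery", "negative"),
    ("billing", "negative"),
]

summary = sentiment_share_by_topic(mined_results)
for topic, shares in summary.items():
    print(topic, shares)
```

The individual labels are the qualitative output of text mining; the percentages per topic are the quantitative view that text analytics adds on top.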

Text mining has proved to be a reliable and cost-effective way to achieve accuracy, scalability, and quick response times. 

How Does Text Mining Work?

Text mining helps to analyse large amounts of raw data and find relevant insights. Combined with machine learning, it can create text analysis models that learn to classify or extract specific information based on previous training. Text mining may seem complicated, but it’s quite simple to get started with these steps:

  1. Gather the data. Prepare an inventory document that lists every place where the data resides. Data can be internal (interactions through chats, emails, surveys, spreadsheets, databases, etc.) or external (information from social media, review sites, news outlets, and any other websites).
  2. Prepare your data. Text mining systems use several NLP techniques, such as tokenization, parsing, lemmatization, stemming, and stop-word removal, to build the inputs of the machine learning model.
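The preparation step can be sketched in a few lines of Python. This is a simplified illustration using only the standard library: the stop-word list is deliberately tiny, and the suffix-stripping function is a crude stand-in for a real stemmer, not an established algorithm. All names here are hypothetical.

```python
import re

# Hypothetical, deliberately small stop-word list; real lists are much longer.
STOP_WORDS = {
    "the", "a", "an", "is", "was", "and", "or", "to", "of", "in", "it", "this",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [token for token in tokens if token not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common English suffixes (illustrative, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem what is left."""
    return [crude_stem(token) for token in remove_stop_words(tokenize(text))]


print(preprocess("The delivery was delayed and the packages arrived damaged"))
# ['delivery', 'delay', 'packag', 'arriv', 'damag']
```

The stems are not always real words ("packag", "arriv"), which is typical of stemming: the goal is to map related word forms to the same token, not to produce readable text.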

Then, it’s time for the text analysis itself. 

Text mining use cases

Text mining technology is now applied to a wide variety of government, research, and business needs for records management and document search. Legal professionals use text mining for e-discovery. Governments and military groups use text mining for national security and intelligence purposes such as cybercrime prevention. Scientific researchers incorporate text mining approaches into efforts to organize large sets of text data (i.e., addressing the problem of unstructured data), to determine ideas communicated through text (e.g., sentiment analysis in social media), and to support scientific discovery in fields such as the life sciences and bioinformatics. In business, applications are used to support competitive intelligence, knowledge management, risk management, enhanced customer service, automated ad placement, business intelligence, content enrichment, etc. Many text mining software packages are marketed for security applications, especially spam filtering and the monitoring and analysis of online plain-text sources such as Internet news, blogs, and social media for national security purposes.

Enterprises are in an environment of uncertainty and ambiguity that requires continuous flexibility, innovation, and investment or reinvestment in data and analytics strategy. Data and analytics are now vital to business strategy, adding significant value to digital transformation initiatives.
