Java is one of the most widely used programming languages, offering powerful libraries and frameworks for various use cases. One such capability involves reading text files and processing the content efficiently. Counting word occurrences in a text file is a common task when working with text data, and in Java, we can use the Collections Framework to implement a solution.
In this tutorial, we'll walk through counting the frequency of words in a text file using Java's Collections Framework. Specifically, we will focus on the HashMap class, which is part of the Java Collections API.
What is the Collections Framework in Java?
The Collections Framework in Java is a set of classes and interfaces that implement commonly reusable collection data structures. Some of the key classes in the framework include ArrayList, LinkedList, HashMap, and HashSet. These classes are ideal for working with groups of data, and they provide built-in methods to efficiently store, retrieve, and manipulate elements.
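As a quick illustration, the snippet below (a minimal standalone sketch, not part of the word-count program) shows two of these classes in action: an ArrayList holding an ordered list of words and a HashMap associating each word with a number.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollectionsDemo {
    public static void main(String[] args) {
        // An ArrayList keeps elements in insertion order and allows duplicates
        List<String> words = new ArrayList<>();
        words.add("hello");
        words.add("world");
        words.add("hello");

        // A HashMap stores key-value pairs with fast lookup by key
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 2);
        counts.put("world", 1);

        System.out.println(words);  // [hello, world, hello]
        System.out.println(counts); // e.g. {world=1, hello=2} (iteration order is not guaranteed)
    }
}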
Steps to Count Word Occurrences in a Text File
Let’s break down the process of counting word occurrences into smaller, manageable steps:
- Read the text file.
- Tokenize the text into words.
- Count the occurrences of each word using a HashMap.
- Display the results.
1. Reading the Text File
The first step is to read the contents of the text file. We can use BufferedReader or Scanner for this purpose. Here, we will use BufferedReader to read the file line by line.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCount {
    public static void main(String[] args) {
        try {
            BufferedReader reader = new BufferedReader(new FileReader("textfile.txt"));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // Just prints the content for now
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In the above code, we use BufferedReader to open the file textfile.txt and read it line by line. Each line is then printed to the console. This is a basic way to confirm that the file is being read correctly before proceeding to the next steps.
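If you are on Java 7 or later, a try-with-resources block closes the reader automatically even when an exception is thrown. The variant below is a small sketch of the same step written that way; textfile.txt is just a placeholder file name.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCountTryWithResources {
    public static void main(String[] args) {
        // try-with-resources closes the reader automatically, even on error
        try (BufferedReader reader = new BufferedReader(new FileReader("textfile.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // Just prints the content for now
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}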
2. Tokenizing the Text into Words
Next, we need to split each line of text into individual words. For this, we can use the split() method of the String class, which allows us to define delimiters for splitting the text.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCount {
    public static void main(String[] args) {
        try {
            BufferedReader reader = new BufferedReader(new FileReader("textfile.txt"));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] words = line.split("\\s+"); // Splitting by whitespace
                for (String word : words) {
                    System.out.println(word); // Printing each word
                }
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In this updated code, each line is split into words using the regular expression "\\s+", which matches one or more whitespace characters (spaces, tabs, and so on). The split() method returns an array of words, which are then printed one by one.
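To see what split() does on its own, here is a tiny self-contained example; the sample sentence is made up purely for illustration.

public class SplitDemo {
    public static void main(String[] args) {
        String line = "Hello   world,  hello Java";

        // "\\s+" treats any run of whitespace as a single delimiter
        String[] words = line.split("\\s+");

        for (String word : words) {
            System.out.println(word); // Prints: Hello, world,, hello, Java (one per line)
        }
    }
}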
3. Counting Word Occurrences Using HashMap
Now, we need to keep track of how many times each word appears in the text. A HashMap is a perfect data structure for this task because it stores key-value pairs: the word itself can be the key, and the value will represent the count of occurrences.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

public class WordCount {
    public static void main(String[] args) {
        HashMap<String, Integer> wordCountMap = new HashMap<>();
        try {
            BufferedReader reader = new BufferedReader(new FileReader("textfile.txt"));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] words = line.split("\\s+");
                for (String word : words) {
                    word = word.toLowerCase().replaceAll("[^a-zA-Z]", ""); // Normalize the word
                    if (!word.isEmpty()) {
                        wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1); // Increment count
                    }
                }
            }
            reader.close();

            // Print the word count
            for (String word : wordCountMap.keySet()) {
                System.out.println(word + ": " + wordCountMap.get(word));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Here, we've introduced a HashMap called wordCountMap to store the word counts. Each word is converted to lowercase and stripped of any non-alphabetical characters, which ensures that "Hello" and "hello" are treated as the same word. The getOrDefault() method retrieves the current count, or returns 0 if the word is not yet in the map, and the incremented value is stored back with put(). Note that the map is declared as HashMap<String, Integer> so that the retrieved count can be incremented as an int.
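As an aside, Map.merge() (available since Java 8) can express the same increment in a single call. The small sketch below shows an equivalent version of just the counting step, using a hard-coded word array for illustration.

import java.util.HashMap;
import java.util.Map;

public class MergeCountDemo {
    public static void main(String[] args) {
        String[] words = {"hello", "world", "hello"};
        Map<String, Integer> wordCountMap = new HashMap<>();

        for (String word : words) {
            // merge() inserts 1 for a new key, or adds 1 to the existing count
            wordCountMap.merge(word, 1, Integer::sum);
        }

        System.out.println(wordCountMap); // e.g. {world=1, hello=2}
    }
}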
4. Displaying the Results
Finally, after counting the occurrences of all the words, we can loop through the wordCountMap to print each word along with its count. The result shows the frequency of each word in the text file.
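The iteration order of a HashMap is not defined, so if you want the output sorted, say by descending count, you can sort the entries before printing. The snippet below is one possible sketch of that extra step, using a small hard-coded map standing in for the wordCountMap built earlier.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedOutputDemo {
    public static void main(String[] args) {
        // A small map standing in for the wordCountMap built earlier
        Map<String, Integer> wordCountMap = new HashMap<>();
        wordCountMap.put("hello", 2);
        wordCountMap.put("world", 1);
        wordCountMap.put("java", 3);

        // Copy the entries into a list and sort by count, highest first
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCountMap.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));

        for (Map.Entry<String, Integer> entry : entries) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}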
Conclusion
In this tutorial, we demonstrated how to count word occurrences in a text file using Java's Collections Framework. By using BufferedReader to read the file and a HashMap to store word counts, we were able to process the text efficiently and display the results. The process can be further extended and optimized, depending on your specific use cases and requirements.
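For example, on Java 8 and later the whole pipeline can also be expressed with the Streams API and Files.lines(). The sketch below is one way to do that, again using textfile.txt as a placeholder file name.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountStreams {
    public static void main(String[] args) {
        try (Stream<String> lines = Files.lines(Paths.get("textfile.txt"))) {
            Map<String, Long> wordCountMap = lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))       // split each line into words
                    .map(word -> word.toLowerCase().replaceAll("[^a-z]", "")) // normalize each word
                    .filter(word -> !word.isEmpty())                          // drop empty tokens
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

            wordCountMap.forEach((word, count) -> System.out.println(word + ": " + count));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}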