Wednesday, June 30, 2010

Find common list elements in Linux with comm

If you have two lists and want to extract the elements they have in common, use the comm command. Note that comm requires both lists to be sorted before they are compared.

Let's say we have the following sorted files, names1 and names2 (shown in that order, separated by a blank line):

Andreas
Cameron
Janica
Reuben
Richard

Andreas
Bonnie
David
Janica
Julia
Mark

Now we extract the lines that appear in both files with comm:

comm -12 names1 names2

This will output:

Andreas
Janica

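By default, comm prints three columns: lines unique to the first file, lines unique to the second file, and lines common to both. The -12 argument combines the options -1 and -2, which suppress the first and second columns respectively. That leaves only the lines that are common to both files.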
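The same column-suppressing options can be combined in other ways. The following is a minimal sketch that recreates the two sample lists in a temporary directory (the directory and the variable names are made up for this illustration) and pulls out each of the three columns on its own:

```shell
# Recreate the two sample lists in a temporary directory (hypothetical location).
tmpdir=$(mktemp -d)
printf '%s\n' Andreas Cameron Janica Reuben Richard > "$tmpdir/names1"
printf '%s\n' Andreas Bonnie David Janica Julia Mark > "$tmpdir/names2"

# Suppress columns 1 and 2: lines present in both files.
common=$(comm -12 "$tmpdir/names1" "$tmpdir/names2")
# Suppress columns 2 and 3: lines found only in names1.
only_first=$(comm -23 "$tmpdir/names1" "$tmpdir/names2")
# Suppress columns 1 and 3: lines found only in names2.
only_second=$(comm -13 "$tmpdir/names1" "$tmpdir/names2")

printf 'Common:\n%s\n\n' "$common"
printf 'Only in names1:\n%s\n\n' "$only_first"
printf 'Only in names2:\n%s\n' "$only_second"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

With the sample data, the second group should contain Cameron, Reuben and Richard, and the third should contain Bonnie, David, Julia and Mark.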



We can also use comm without having the lists in separate files, by making use of Bash process substitution.

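Let's say we want to check which entry names (subdirectories, for example) appear in both of two separate directories. Because ls sorts its output by name by default, the listings can be passed straight to comm: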

comm -12 <(ls /my/first/dir) <(ls /my/second/dir)

We can also use this technique to compare files that are not sorted, by passing each one through the sort command first:

comm -12 <(sort file1) <(sort file2)
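
Process substitution is a Bash feature, so it may not be available if a script has to run under a plain POSIX sh. As a rough sketch of a fallback, the lists can be sorted into temporary files first and then compared. The sample data, the working directory and the .sorted file names below are assumptions made up for this example:

```shell
# Hypothetical unsorted sample data, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' Reuben Andreas Janica > "$workdir/file1"
printf '%s\n' Mark Janica Andreas Bonnie > "$workdir/file2"

# Sort each list into its own temporary file.
sort "$workdir/file1" > "$workdir/file1.sorted"
sort "$workdir/file2" > "$workdir/file2.sorted"

# Compare the sorted copies and keep only the lines common to both.
shared=$(comm -12 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$shared"

# Clean up the temporary files.
rm -rf "$workdir"
```

This does the same work as the one-liner above, at the cost of a few extra files on disk.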