Commandline Puzzler

Paul R. Brown @ 2009-09-25T19:39:39Z

Suppose that you have two files that consist of records, one per line, and you want to ensure that none of the records in the second file appear in the first. How do you do it with only the text-processing command-line tools commonly available on *nix systems?

Comment from Paul Brown @ 2009-09-25T22:22:15Z

You could use diff, but that's not in the spirit of the question.

My solution is:

 cat file1 file1 file2 | sort | uniq -c > out

Now, the lines prefixed by a count of 1 are those only in file2, the lines prefixed by a 2 are only in file1, and those prefixed by a 3 are in both, provided that neither file contains duplicate lines.
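A quick way to see the counts is to run the pipeline on two throwaway files. This is a sketch that assumes a POSIX shell with mktemp available; the sample records and the final awk filter, which pulls out the records present in both files, are illustrative additions rather than part of the original solution.

```shell
set -eu

# Work in a throwaway directory; it is removed when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical sample records, one per line, with no repeats inside a file.
printf '%s\n' apple banana cherry > file1
printf '%s\n' cherry date > file2

# The pipeline from the comment: file1 is listed twice, so each record ends
# up counted 2 (only in file1), 1 (only in file2) or 3 (in both).
cat file1 file1 file2 | sort | uniq -c > out

# Keep only the count-3 lines and strip the leading count column.
in_both=$(awk '$1 == 3 { sub(/^ *[0-9]+ /, ""); print }' out)
echo "$in_both"    # prints: cherry
```

If no record of file2 appears in file1, the awk filter prints nothing.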

Comment from Alexander @ 2009-09-26T19:32:00Z

You need to be sure that the intersection of file1 and file2 is empty. When the records within each file are unique, something like this can be useful:

 if [ $(sort file1 file2 | uniq -d | wc -l) -eq 0 ]
 then
     echo "No record of file2 appears in file1"
 else
     echo "Files intersect"
 fi
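If the records inside a single file might repeat, one possible variation on this check is to deduplicate each file with sort -u before merging. After that step, uniq -d can only report lines that occur in both files. The function name no_common_records is hypothetical, and the sketch assumes a POSIX shell.

```shell
# Hypothetical helper: returns 0 when file1 ($1) and file2 ($2) share no
# record, 1 when they share at least one, and 2 if either file is unreadable.
no_common_records() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # sort -u collapses repeats inside each file, so after merging the two
    # deduplicated streams a line can occur twice only if both files hold it.
    count=$( { sort -u "$1"; sort -u "$2"; } | sort | uniq -d | wc -l )
    # Some wc implementations pad the number with spaces; leaving $count
    # unquoted lets word splitting drop them before the numeric test.
    [ $count -eq 0 ]
}
```

For example: no_common_records file1 file2 && echo "No record of file2 appears in file1"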

Comment from Hen @ 2009-09-29T05:35:14Z

 grep -Fxf file2 file1
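With -F alone, each line of file2 is matched anywhere inside a line of file1. A record such as foo would therefore also be reported for foobar, and an empty line in file2 would match every line. Adding -x restricts matches to whole lines. The following minimal sketch relies on grep's exit status instead of its output. It assumes a grep that supports the POSIX -F, -x, -q and -f options, and the report_overlap function name is hypothetical.

```shell
# Hypothetical helper: prints whether any record of file2 ($2) is also a
# complete line of file1 ($1).
#   -F  treat each line of file2 as a literal string, not a regular expression
#   -x  require a record to match a whole line of file1, not a substring
#   -q  print nothing; only grep's exit status is inspected
#   -f  read the patterns (the records) from file2
report_overlap() {
    grep -Fxq -f "$2" "$1"
    case $? in
        0) echo "Files intersect" ;;
        1) echo "No record of file2 appears in file1" ;;
        *) echo "grep could not read the input" >&2; return 2 ;;
    esac
}
```

Checking the exit status separately keeps a read error (status 2) from being mistaken for "no overlap".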