bash - How can I get the rows that are common to at least two files?
I have 7 test files like the following.
file1:
chr   start   end     strand
chr1  10525   10525   +
chr1  10542   10542   +
chr1  10571   10571   +
chr1  10577   10577   +
chr2  10589   10589   +
chr2  565262  565262  +
chr2  565397  565397  +
chr3  567239  567239  +
chr3  567312  567312  +
chr4  567348  567348  +
How can I get the rows that are common to at least 2 files, in the following format?
chr   start   end     strand  file1  file2  file3  file4  file5  file6  file7
chr1  10525   10525   +       0      1      0      0      0      1      1
chr1  10542   10542   +       1      1      1      1      1      0      0
chr1  10571   10571   +       0      1      0      1      1      0      0
chr3  10577   10577   +       1      1      0      0      0      1      0
chr3  10589   10589   +       0      0      1      0      1      0      1
chr4  565262  565262  +       1      0      0      1      1      1      1
"1" means the row exists in the given file and "0" means it does not. I don't want to show rows that are not common to at least two files.
Using awk:
awk '
FNR == 1 {                       # header line of each file:
    fn[++i] = FILENAME           #   record the file name
    fn[0] = $0                   #   and the file header
    next
}
{                                # every other (data) line:
    if (!seen[$0, FILENAME]++)   #   first time this line appears in this file?
        count[$0]++              #   then count one more file containing it
}
END {
    printf "%s", fn[0]           # header followed by the file names
    for (t = 1; t <= i; t++)
        printf "\t%s", fn[t]
    print ""
    for (line in count) {        # every distinct data line (order is unspecified)
        if (count[line] >= 2) {  # keep it only if it occurs in 2 or more files
            printf "%s", line
            for (j = 1; j <= i; j++) {
                present = ((line, fn[j]) in seen) ? 1 : 0
                printf "\t%d", present   # 1 = row is in file j, 0 = it is not
            }
            print ""
        }
    }
}' file{1..7}
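As a quick cross-check that needs no awk arrays, the qualifying rows can also be listed with standard tools: take the distinct data rows of each file (dropping the header with `tail -n +2`), concatenate them, and let `uniq -c` count in how many files each row occurs. This is a minimal sketch assuming every file has a one-line header and that matching rows are byte-for-byte identical (same whitespace). To stay self-contained it writes three small hypothetical sample files into a temporary directory instead of reading the real file1..file7, and it prints only the file count and the row, not the per-file 0/1 columns.

```shell
#!/usr/bin/env bash
# Sketch: list rows that occur in at least 2 files, prefixed by the number of files.
# The sample files below are hypothetical stand-ins for file1..file7.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf 'chr start end strand\nchr1 10525 10525 +\nchr1 10542 10542 +\n' > file1
printf 'chr start end strand\nchr1 10542 10542 +\nchr2 10589 10589 +\n' > file2
printf 'chr start end strand\nchr1 10542 10542 +\nchr2 10589 10589 +\nchr3 567239 567239 +\n' > file3

result=$(
    for f in file{1..3}; do
        tail -n +2 "$f" | sort -u       # distinct data rows of this file, header dropped
    done | sort | uniq -c | awk '$1 >= 2'  # keep rows seen in 2 or more files
)
printf '%s\n' "$result"
```

Because each file contributes every row at most once, the `uniq -c` count is the number of files containing that row, which is the same criterion the awk answer applies with `count[line] >= 2`.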