linux - Maximum of multiple rows and calculating the average between them using Awk
I want to calculate the maximum of the values in column 8 for the rows starting with 1.000, 1.35, 1.70, ... (incrementing by 0.35) ..., 120 (14 rows each) separately, and subsequently calculate the average of those maximum values using awk. I'd appreciate any help.
1.000 8 .... 0.017947838827838864
1.000 8 .... 0.029306373626373672
1.000 8 .... 0.018125164835164853
... ...
1.350 27 ... 0.0014171428571428946
1.350 27 ... 0.0017828571428571971
1.350 27 ... 0.0017828571428571971
... ...
120.000 28 ... 0.49277503924646787
120.000 28 ... 0.41021689560439561
120.000 29 ... 0.38946329670329682
It isn't hard. Since there are only 3 useful columns in the sample data, I've changed the 8 to 3 in the code below:
awk '$1 != col1 { if (col1 != "") max[col1] = max3; max3 = $3; col1 = $1 }
                { if ($3 > max3) max3 = $3 }
     END        { if (col1 != "") max[col1] = max3
                  for (i in max) { sum += max[i]; num++ }
                  if (num > 0) print sum / num
                }'
The first line deals with changes in column 1. If there was a value for column 1 before (col1 is not empty), it saves the maximum (max3) in the array max, indexed by col1. It then resets col1 to the current value of column 1 and sets the maximum to the current value in $3.
The next line is the 'every line' processing; if the value in column 3 is larger than the previous largest, it records the new maximum.
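The END block first handles the 'change in column 1' case, as in the first block, so that the last group's maximum is saved. It doesn't need to reset any values because there are no more lines of input. The next line computes the sum of the maxima and counts them. The final line prints the average if there was at least 1 value to process.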
Given the sample data, it produces the answer:
0.174621
Clearly, if your data has 8 columns, you'd need to change the threes to eights.
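Rather than editing every 3 into an 8, the column number can be passed in as an awk variable. The following is only a sketch of that idea: the function name avg_of_max and the variable names col, key and cur are made up for illustration and are not part of the original answer. It reads from standard input and, like the script above, assumes rows with the same column 1 value are adjacent.

```shell
# avg_of_max COL: read rows on stdin; print the average of the per-group
# maxima of column COL. Groups are runs of equal values in column 1.
# (Hypothetical helper; names are illustrative.)
avg_of_max() {
    awk -v col="$1" '
        # New group: save the finished group maximum, then start a new one.
        NR == 1 || $1 != key { if (NR > 1) max[key] = cur
                               cur = $col + 0; key = $1 }
        # Every line: keep the larger value (+ 0 forces a numeric compare).
                             { if ($col + 0 > cur) cur = $col + 0 }
        END { if (NR > 0) max[key] = cur
              for (i in max) { sum += max[i]; num++ }
              if (num > 0) print sum / num }'
}
```

With a three-column version of the sample rows (the elided columns dropped, which is an assumption about the data), printf '1.000 8 0.017947838827838864\n1.000 8 0.029306373626373672\n1.350 27 0.0017828571428571971\n120.000 28 0.49277503924646787\n' | avg_of_max 3 prints 0.174621; for the real 8-column file you would call avg_of_max 8 < yourfile instead.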
This code assumes that the data is grouped on column 1, with related entries together. It is possible to avoid that assumption, like this:
awk '{ if (!($1 in max)) max[$1] = $3; if ($3 > max[$1]) max[$1] = $3 }
     END { for (i in max) { sum += max[i]; num++ }
           if (num > 0) print sum / num }'
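This is simpler than the previous version; it looks to see if the value in $3 (or $8 in your version) is larger than the maximum associated with $1, and, if so, stores it. If $1 hasn't been seen before, it sets the maximum to the current value; this avoids the question of what a safe initial value for the maximum would be if the values are ever negative. Note the parentheses in !($1 in max): without them, awk applies ! to $1 first and then tests whether that result is a key in max.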
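To see why seeding the maximum from the first value matters, here is a small sketch of the ungrouped approach run on interleaved keys with all-negative values. The function name avg_of_max_unsorted and the variables col and v are illustrative assumptions, not part of the original answer.

```shell
# avg_of_max_unsorted COL: like the ungrouped script, but with the value
# column passed in; rows for a key may appear anywhere in the input.
# (Hypothetical helper; names are illustrative.)
avg_of_max_unsorted() {
    awk -v col="$1" '
        { v = $col + 0                       # force a numeric value
          # First sighting of the key seeds the maximum; later rows
          # replace it only when they are larger.
          if (!($1 in max) || v > max[$1]) max[$1] = v }
        END { for (i in max) { sum += max[i]; num++ }
              if (num > 0) print sum / num }'
}
```

For example, printf 'a 1 -5\nb 1 -2\na 1 -3\nb 1 -7\n' | avg_of_max_unsorted 3 prints -2.5, the average of the maxima -3 (key a) and -2 (key b). Had the maximum started from an implicit 0, both maxima would have been wrong.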
And in both solutions, if you want the maxima printed, that is easy to do in the END block with a loop such as:
for (i in max) print i, max[i]
Or you can use more ornate print formatting if that suits you. Note that the order in which the keys (the i values) are presented is indeterminate. If the order matters, you have to sort the values, either in awk or with a separate sort process.
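One way to do the sorting with a separate process is to pipe the awk output through sort. This is a sketch only; the function name max_by_key_sorted and the variables col and v are assumptions made for illustration.

```shell
# max_by_key_sorted COL: print "key maximum" pairs, ordered numerically
# by key. (Hypothetical helper; names are illustrative.)
max_by_key_sorted() {
    awk -v col="$1" '
        { v = $col + 0
          if (!($1 in max) || v > max[$1]) max[$1] = v }
        # for (i in max) visits keys in no particular order ...
        END { for (i in max) print i, max[i] }' |
    # ... so sort does the ordering; LC_ALL=C keeps "." as the decimal point.
    LC_ALL=C sort -k1,1n
}
```

For instance, printf '120.000 28 0.5\n1.350 27 0.25\n1.000 8 0.125\n1.350 27 0.75\n' | max_by_key_sorted 3 lists the keys 1.000, 1.350 and 120.000 in that order, each followed by its maximum. GNU awk users could instead control the traversal order inside awk via PROCINFO["sorted_in"], but that is a gawk extension and not portable to other awk implementations.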