datamash is a command-line utility that performs basic numeric, textual, and statistical operations on input data files. It is lightweight and well suited to quick data manipulation and analysis directly from the shell.
To use datamash, specify the operation you want to perform and the column it applies to, add any options you need, and supply the input data on standard input. For example, to calculate the average of a column in a CSV file, you could use the following command:
# datamash -t, mean 1 < input.csv
This command will calculate the mean of the values in the first column of the input.csv file, using a comma as the field delimiter.
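Real CSV files often start with a header row, which would not parse as a number. The following is a minimal sketch of one way to handle that, using datamash's --header-in option to treat the first line as a header. The sample data and the column names (name, score) are made up for illustration.

```shell
# Hypothetical sample data: a two-column CSV with a header row.
# -t, sets the comma as field delimiter; --header-in tells datamash
# that the first input line is a header and not data.
mean=$(printf 'name,score\nalice,10\nbob,20\ncarol,30\n' | datamash -t, --header-in mean 2)

# Mean of 10, 20 and 30, taken from the second column.
echo "$mean"   # prints 20
```

The operation refers to column 2 here because the numbers live in the second field; with --header-in the header line is skipped and is not echoed in the output.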
datamash Command Examples
1. Get max, min, mean and median of a single column of numbers:
# seq 3 | datamash max 1 min 1 mean 1 median 1
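2. Get the mean of a single column of floating-point numbers (datamash reads the decimal separator from the current locale; this example converts "." to "," for a locale that uses a decimal comma):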
# echo -e '1.0\n2.5\n3.1\n4.3\n5.6\n5.7' | tr '.' ',' | datamash mean 1
3. Get the mean of a single column of numbers with a given decimal precision:
# echo -e '1\n2\n3\n4\n5\n5' | datamash -R number_of_decimals_wanted mean 1
4. Get the mean of a single column of numbers ignoring "Na" and "NaN" (literal) strings:
# echo -e '1\n2\nNa\n3\nNaN' | datamash --narm mean 1
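The options from the examples above can be combined in a single call. The sketch below is one possible combination, not the only way to do it: it forces the C locale with LC_ALL=C so that "." is the decimal separator regardless of the system settings, skips NA/NaN entries with --narm, and rounds the result to two decimal places with -R. The input values are made up for illustration.

```shell
# Illustrative input: three floats plus NA/NaN placeholders.
# LC_ALL=C     -> "." is the decimal separator for this one command
# --narm       -> skip NA and NaN entries
# -R 2         -> round numeric output to 2 decimal places
rounded=$(printf '1.0\n2.5\nNA\n3.1\nNaN\n' | LC_ALL=C datamash --narm -R 2 mean 1)

# Mean of the three remaining values (1.0, 2.5, 3.1), rounded to two decimals.
echo "$rounded"
```

printf is used here in place of echo -e because its handling of "\n" is consistent across shells.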