datamash is a lightweight command-line utility that performs basic numeric, textual, and statistical operations on tabular input data. It is useful for quick data manipulation and analysis tasks.
To use datamash, specify the operation you want to perform, the column it applies to, and any necessary options; the data itself is read from standard input. For example, to calculate the average of a column in a CSV file, you could use the following command:
# datamash -t, mean 1 < input.csv
This command will calculate the mean of the values in the first column of the input.csv file, using a comma as the field delimiter.
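As a rough illustration, the sketch below builds a small sample file and runs the same kind of command against it. The file name sample.csv and its contents are made up for this example, and --header-in is added only because this sample file starts with a header row, which datamash would otherwise reject as a non-numeric value.

```shell
# Create a small sample file (made-up data) with a header row.
printf 'score,weight\n10,1\n20,2\n30,3\n' > sample.csv

# -t, sets a comma as the field delimiter; --header-in treats line 1 as a header.
mean=$(datamash -t, --header-in mean 1 < sample.csv)
echo "$mean"   # prints 20 for this sample data
```

If your file has no header row, drop --header-in and the command matches the one shown above.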
If running datamash produces the following error:
datamash: command not found
install the package using the command for your distribution:
Distribution | Command |
---|---|
Debian | apt-get install datamash |
Ubuntu | apt-get install datamash |
Arch Linux | pacman -S datamash |
Kali Linux | apt-get install datamash |
Fedora | dnf install datamash |
macOS (Homebrew) | brew install datamash |
Raspbian | apt-get install datamash |
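
To confirm whether the shell can actually find datamash, before or after installing it, a minimal check could look like the sketch below. It relies only on the shell's command -v builtin and the --version option; the variable name status is just an illustration.

```shell
# command -v succeeds only if the shell can locate the executable.
if command -v datamash >/dev/null 2>&1; then
    status="installed"
    datamash --version | head -n 1   # first line names the installed version
else
    status="missing"
fi
echo "datamash is $status"
```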
datamash Command Examples
1. Get max, min, mean and median of a single column of numbers:
# seq 3 | datamash max 1 min 1 mean 1 median 1
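To make the output format concrete, the sketch below captures the result of the same command in a variable (the name stats is only for illustration). datamash prints one tab-separated value per requested operation, in the order the operations were given.

```shell
# seq 3 produces the numbers 1, 2 and 3, one per line.
stats=$(seq 3 | datamash max 1 min 1 mean 1 median 1)

# Tab-separated result in request order: max=3, min=1, mean=2, median=2
printf '%s\n' "$stats"
```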
2. Get the mean of a single column of float numbers (the decimal separator follows your locale; this example converts "." to "," for locales that use a decimal comma):
# echo -e '1.0\n2.5\n3.1\n4.3\n5.6\n5.7' | tr '.' ',' | datamash mean 1
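If your data uses "." as the decimal separator, one alternative is to run datamash under the C locale instead of rewriting the input. The sketch below assumes a POSIX shell and uses the standard LC_ALL environment variable for that single command only.

```shell
# LC_ALL=C applies to this one datamash invocation and makes "." the decimal point.
mean=$(printf '1.0\n2.5\n3.1\n4.3\n5.6\n5.7\n' | LC_ALL=C datamash mean 1)
echo "$mean"   # prints 3.7 (22.2 divided by 6)
```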
3. Get the mean of a single column of numbers with a given decimal precision:
# echo -e '1\n2\n3\n4\n5\n5' | datamash -R number_of_decimals_wanted mean 1
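As a concrete instance of the command above, the sketch below replaces the placeholder with 2 decimal places. LC_ALL=C is added here only so that the output uses "." regardless of the system locale.

```shell
# The six values sum to 20, so the mean is 3.3333...; -R 2 rounds the output to 2 decimals.
rounded=$(printf '1\n2\n3\n4\n5\n5\n' | LC_ALL=C datamash -R 2 mean 1)
echo "$rounded"   # prints 3.33
```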
4. Get the mean of a single column of numbers ignoring "Na" and "NaN" (literal) strings:
# echo -e '1\n2\nNa\n3\nNaN' | datamash --narm mean 1
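The effect of --narm can be seen by working through a small input by hand. The sketch below uses the uppercase spelling NA together with NaN, the two forms named in datamash's own help text; only the three numeric lines contribute to the result.

```shell
# With --narm, the NA and NaN lines are skipped, leaving 1, 2 and 3.
mean=$(printf '1\n2\nNA\n3\nNaN\n' | datamash --narm mean 1)
echo "$mean"   # prints 2, the mean of 1, 2 and 3
```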