csv-diff is a command-line tool that allows you to compare and view the differences between two CSV (Comma-Separated Values), TSV (Tab-Separated Values), or JSON (JavaScript Object Notation) files. It provides a convenient way to analyze changes, additions, or deletions in the data contained within these files.
Here are the key features and functionalities of csv-diff:
- Comparison of Data: csv-diff matches rows between the two files by the value of a key column (set with --key) rather than by line position, so rows that have only moved are not reported as changes. It identifies added, modified, and removed rows, making it easy to understand the changes in the data.
- Field-Level Comparison: In addition to comparing entire rows, csv-diff can also perform field-level comparison. It compares the values within the same row across the files, allowing you to pinpoint specific changes or discrepancies within individual fields.
- Flexible Input Formats: csv-diff supports various input formats, including CSV, TSV, and JSON. This flexibility allows you to compare files that are formatted differently or use different separators. You can specify the input format while running the tool to ensure accurate comparison.
- Customizable Output: By default, csv-diff prints a human-readable summary of the differences. The --show-unchanged option also lists the unchanged values in rows that have at least one change, and --json emits the differences as JSON so that other programs can consume them.
- Ignoring Fields: Columns such as timestamps or generated identifiers often vary between exports without being relevant to your analysis. Run csv-diff --help to see whether your installed version can exclude columns from the comparison; if it cannot, remove those columns from both files before comparing them.
- Integration with Workflow: csv-diff is designed to be easily integrated into your workflow. It can be used in scripts or command-line pipelines, allowing you to automate the comparison process or incorporate it into your data processing tasks.
By using csv-diff, you can quickly identify and analyze differences between two CSV, TSV, or JSON files. Whether you are comparing different versions of a dataset, tracking changes in data over time, or validating data consistency, csv-diff provides a straightforward and efficient way to view the differences and gain insights into the changes made to the files.
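The keyed row and field comparison described above can be illustrated with a short Python sketch. This is not csv-diff's own implementation; it is a minimal example using only the standard library, and it assumes both files share the same columns. The function names load_rows and diff_rows and the sample data are hypothetical.

```python
import csv
import io


def load_rows(text, key):
    """Parse CSV text into a dict mapping each row's key value to the row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_rows(previous, current):
    """Compare two keyed row dicts and report added, removed, and changed rows."""
    added = [current[k] for k in current if k not in previous]
    removed = [previous[k] for k in previous if k not in current]
    changed = []
    for k in previous:
        if k not in current:
            continue
        # Record only the fields whose values differ: column -> [old, new].
        fields = {
            col: [previous[k][col], current[k].get(col)]
            for col in previous[k]
            if previous[k][col] != current[k].get(col)
        }
        if fields:
            changed.append({"key": k, "changes": fields})
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data: row 1 is modified, row 2 removed, row 3 added.
old = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
new = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"
result = diff_rows(load_rows(old, "id"), load_rows(new, "id"))
print(result)
```

Because rows are looked up by their key value, the order in which they appear in either file does not affect the result.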
csv-diff Command Examples
1. Display a human-readable summary of differences between files using a specific column as a unique identifier:
# csv-diff /path/to/file1.csv /path/to/file2.csv --key=column_name
2. Display a human-readable summary of differences between files that includes unchanged values in rows with at least one change:
# csv-diff /path/to/file1.csv /path/to/file2.csv --key=column_name --show-unchanged
3. Display a summary of differences between files in JSON format using a specific column as a unique identifier:
# csv-diff /path/to/file1.csv /path/to/file2.csv --key=column_name --json
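The JSON output is convenient when csv-diff is called from a script. The following Python sketch shows one way this might look: it runs the third command above and counts the entries in each section of the report. It assumes that csv-diff is installed and on the PATH, and that the report is a JSON object whose top-level values are lists (for example of added, removed, and changed rows); verify that shape against the output of your installed version. The function names and the sample report are hypothetical.

```python
import json
import subprocess


def run_csv_diff(previous_path, current_path, key):
    """Run csv-diff with --json and return the parsed report."""
    completed = subprocess.run(
        ["csv-diff", previous_path, current_path, f"--key={key}", "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if csv-diff exits with an error
    )
    return json.loads(completed.stdout)


def count_changes(report):
    """Count the entries in each list-valued section of the report."""
    return {
        name: len(value)
        for name, value in report.items()
        if isinstance(value, list)
    }


# Hypothetical report, used here so the sketch runs without csv-diff installed.
sample_report = {
    "added": [{"id": "3", "name": "Bailey"}],
    "removed": [],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
}
counts = count_changes(sample_report)
print(counts)
```

In a real pipeline, count_changes(run_csv_diff("/path/to/file1.csv", "/path/to/file2.csv", "column_name")) would replace the sample report, and the script could exit with a non-zero status whenever any count is greater than zero.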