This repository contains a MapReduce program written in Java that finds the maximum temperature in a given dataset. The program uses the Hadoop MapReduce framework to process large amounts of data in parallel on a cluster of commodity hardware.
- Java 8 or higher
- Hadoop 2.7.1 or higher
The input dataset is assumed to be in the following format:
where station_name is a string representing the name of the weather station, year is a string representing the year in yyyy format, and max temperature is a float representing the temperature in Fahrenheit.
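As an illustration of the field types described above, here is a minimal plain-Java sketch of parsing one record. It assumes the three fields are comma-separated, which this README does not state; `RecordParser` and `Reading` are hypothetical names and are not part of this repository.

```java
import java.util.Optional;

public class RecordParser {

    /** One parsed record; the fields mirror station_name, year, and max temperature. */
    public static final class Reading {
        public final String stationName;
        public final String year;
        public final float maxTemperature; // degrees Fahrenheit

        public Reading(String stationName, String year, float maxTemperature) {
            this.stationName = stationName;
            this.year = year;
            this.maxTemperature = maxTemperature;
        }
    }

    /**
     * Parses a single input line.
     * ASSUMPTION: fields are separated by commas; adjust the delimiter to match the real data.
     * Returns Optional.empty() for lines that do not have the expected shape.
     */
    public static Optional<Reading> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            return Optional.empty();
        }
        String station = parts[0].trim();
        String year = parts[1].trim();
        // The year must be exactly four digits (yyyy).
        if (station.isEmpty() || !year.matches("\\d{4}")) {
            return Optional.empty();
        }
        try {
            float temperature = Float.parseFloat(parts[2].trim());
            return Optional.of(new Reading(station, year, temperature));
        } catch (NumberFormatException e) {
            // Temperature is not a valid float: treat the record as malformed.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        parse("StationA,1950,98.6").ifPresent(r ->
                System.out.println(r.stationName + " " + r.year + " " + r.maxTemperature));
    }
}
```

Returning an empty `Optional` for malformed lines lets a caller skip bad records rather than fail the whole run, which is one common choice when processing large datasets.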
To run the MapReduce program, you need to first create a jar file using the following command:
This will create a mapreduce-1.0-SNAPSHOT.jar file in the home directory. You can then run the program using the following command:
$ hadoop jar home/mt.jar MaxTemperature <input_path> <output_path>
where <input_path> is the path to the input dataset and <output_path> is the path to the output directory where the maximum temperature will be written.
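To give a rough idea of what such a job computes, the following plain-Java sketch imitates the map and reduce steps without using any Hadoop classes. It is not the repository's `MaxTemperature` implementation: `MaxTemperatureSketch`, the comma delimiter, and the printed layout are all assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MaxTemperatureSketch {

    /**
     * Map step: emit one (year, temperature) pair for a well-formed line, nothing otherwise.
     * ASSUMPTION: lines look like "station_name,year,max temperature" with comma separators.
     */
    public static List<Map.Entry<String, Float>> map(String line) {
        List<Map.Entry<String, Float>> out = new ArrayList<>();
        String[] parts = line.split(",", -1);
        if (parts.length == 3) {
            try {
                Float temperature = Float.valueOf(parts[2].trim());
                out.add(new SimpleEntry<String, Float>(parts[1].trim(), temperature));
            } catch (NumberFormatException e) {
                // Malformed temperature: skip this record.
            }
        }
        return out;
    }

    /** Reduce step: keep the pair with the highest temperature; returns null if there are no pairs. */
    public static Map.Entry<String, Float> reduce(List<Map.Entry<String, Float>> pairs) {
        Map.Entry<String, Float> best = null;
        for (Map.Entry<String, Float> pair : pairs) {
            if (best == null || pair.getValue() > best.getValue()) {
                best = pair;
            }
        }
        return best;
    }

    /** Runs the map step over every line, then reduces all emitted pairs to a single maximum. */
    public static Map.Entry<String, Float> run(List<String> lines) {
        List<Map.Entry<String, Float>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<String> lines = Arrays.asList(
                "StationA,1950,98.6",
                "StationB,1951,104.2",
                "not a valid record",
                "StationC,1952,101.0");
        Map.Entry<String, Float> max = run(lines);
        // ASSUMPTION: tab-separated "temperature, year" layout; the real output format may differ.
        System.out.println(max.getValue() + "\t" + max.getKey());
    }
}
```

On a real cluster, Hadoop would run the map step in parallel across input splits and shuffle the emitted pairs to reducers; the sketch collapses that into a single in-memory loop.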
The output of the program is a single line containing the maximum temperature and the date on which it occurred. The output is in the following format:
If you have any questions or suggestions, please feel free to contact us.