Please check your marks and contact me as soon as possible if you have any questions. I need to submit the final marks to the school by this Friday.
Your exam mark is scaled by a factor S to account for the difficulty of the exam. Your final mark is computed as 2 * proj_mark * exam_mark * S / (proj_mark + exam_mark * S), i.e., the harmonic mean of your project mark and your scaled exam mark. You should be able to pass if you obtained 30+ marks in the final exam. If you obtained over 60 marks, you may get an HD!
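If you would like to check the formula against your own marks, here is a minimal sketch in Python. The function name `final_mark`, the sample marks, and the value S = 1.2 are made up for illustration only; the real S is not stated here. Returning 0 when both marks are 0 is also an assumption, added only to avoid a division by zero.

```python
def final_mark(proj_mark: float, exam_mark: float, s: float) -> float:
    """Harmonic mean of the project mark and the scaled exam mark.

    Implements 2 * proj_mark * exam_mark * S / (proj_mark + exam_mark * S).
    """
    scaled_exam = exam_mark * s
    denominator = proj_mark + scaled_exam
    if denominator == 0:
        # Assumption: treat "both marks are zero" as a final mark of 0.
        return 0.0
    return 2 * proj_mark * scaled_exam / denominator


# Hypothetical marks and a hypothetical S, for illustration only.
print(f"{final_mark(proj_mark=80, exam_mark=50, s=1.2):.2f}")  # prints 68.57
```

Note that a harmonic mean is pulled towards the lower of the two marks, so a weak exam mark cannot be fully compensated by a strong project mark (and vice versa).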
Please follow the lab instructions to work on AWS.
Test everything on your local machine first. Run your program on AWS only once you are confident that it generates correct results and is efficient enough. 100 dollars should be enough for you to finish project 4.
Remember to terminate the cluster after you finish your jobs!!! Otherwise you may need to pay the extra costs incurred.
BTW, please provide some feedback in myExperience. Any suggestions or comments would be greatly appreciated.
1. The screenshot submitted should contain the cluster information and the runtime, like the figure below:
2. You CANNOT create a global variable to share information across different mappers and reducers. That may work on your local machine, where everything runs in a single process, but it will fail on AWS, where mappers and reducers run in separate processes on different nodes. You need to use the Configuration object to pass the information from the main function to the mappers/reducers.
3. You CANNOT use LSH to do this project, since it can only produce approximate results. This project requires exact set similarity join results. Please follow the slides and the paper introduced during the lecture.
4. As mentioned in the lecture, please briefly describe your optimization techniques in a file named "Optimization.pdf". I've already updated the project specification.
BTW, you can check your marks for project 3 now. Contact me as soon as possible if you have any questions.