Don't forget to fill out the SES survey
  There should be a link on your course's Moodle page


  • Some FAQs about Project 1

    Posted by Xin Cao Friday 15 October 2021, 11:54:41 PM.

    1. I saw some requests in the forum for more test cases, so one more test dataset is provided. If you use Java, a run script is also provided; it packages your Java files and runs the resulting jar file on Hadoop (please change the number of documents and the number of reducers accordingly).

    2. Please use the pre-installed Hadoop at ~/hadoop. Lab 1 only aims to show you how to install and configure Hadoop. After you complete Lab 1, please delete the ~/workdir folder and the corresponding configuration in ~/.bashrc.

    3. Before you run mrjob code on Hadoop, please start both HDFS and YARN. Check that you have configured YARN correctly by following the Lab 1 instructions, which cover two files: $HADOOP_CONF_DIR/mapred-site.xml and $HADOOP_CONF_DIR/yarn-site.xml.

    4. The "\t" means a single tab character, not the two-character string "\t". Because a tab may be displayed as wide as 4 or 8 spaces, the same text can look different in your editor and in the terminal.

    5. Please try your best to debug your code before asking questions. Test your code locally first, and then run it on Hadoop. Note that your code may well generate correct results locally but fail on Hadoop; the cause is most likely the way keys are partitioned across reducers. In mrjob, you can test your mapper first and then your reducer. To test the mapper, write a simple reducer that outputs the key-value pairs it receives unchanged. This shows you whether your mapper sends the key-value pairs to the reducers as expected. Once the mapper is OK, you can proceed to test your reducer.

    6. Variables defined in mapper_init/reducer_init and in mapper/reducer have different scopes. A variable defined in mapper/reducer is visible only within that function call for the current input. A variable defined in mapper_init/reducer_init is visible to all mapper/reducer function calls within the same mapper/reducer task.

    7. It is strongly recommended that you complete the two problems in Lab 4 first, and then work on the project. Otherwise, you are likely to run into many problems while working on the project.
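
    On item 4 above, a minimal Python sketch of the difference between the tab character and the two-character text "\t". The sample line "apple<TAB>3" and the variable names are made up for illustration; they are not taken from the project specification.

```python
# In a normal Python string literal, "\t" is ONE character: the tab.
line = "apple\t3"

# Splitting on the tab character separates the two fields.
key, value = line.split("\t")

# In a raw string the backslash is kept, so r"\t" is TWO characters: '\' and 't'.
literal = r"apple\t3"

if __name__ == "__main__":
    # repr() makes the difference visible regardless of how wide a tab is drawn.
    print(repr(line), len(line))
    print(repr(literal), len(literal))
    # The raw string contains no tab, so splitting on a tab leaves it in one piece.
    print(literal.split("\t"))
```

    Printing with repr() sidesteps the display issue, since the terminal never has to render the tab itself.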
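
    On item 5 above, the following is a plain-Python simulation of the idea, not mrjob or Hadoop code. It pairs a placeholder mapper with a pass-through reducer and shows how the same pairs are split up once there is more than one reducer. The word-count mapper, the toy partition function, and the input lines are all assumptions made for illustration; Hadoop's real partitioner uses a different hash.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for each word; a placeholder for the mapper under test."""
    for word in line.split():
        yield word, 1

def identity_reducer(key, values):
    """Pass every pair through unchanged so the mapper output can be inspected."""
    for value in values:
        yield key, value

def partition(key, num_reducers):
    """Toy deterministic partitioner (NOT Hadoop's actual hash function)."""
    return sum(key.encode("utf-8")) % num_reducers

def run(lines, num_reducers):
    """Return, for each simulated reducer, the pairs it outputs in sorted key order."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            buckets[partition(key, num_reducers)][key].append(value)
    outputs = []
    for bucket in buckets:
        out = []
        for key in sorted(bucket):
            out.extend(identity_reducer(key, bucket[key]))
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    lines = ["a b a", "c b"]
    for n in (1, 2):
        print(n, run(lines, n))
```

    With one reducer, every key arrives at the same place, which resembles a local run. With two, each reducer sees only its own subset of keys, so any reducer logic that assumes it sees all keys will give different results.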
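
    On item 6 above, the scoping rule can be illustrated with a small stand-in class. This is a sketch in plain Python that only mimics the order in which the methods are called; ToyMapperTask, run_task, and lines_seen are hypothetical names and are not part of mrjob.

```python
class ToyMapperTask:
    """Plain-Python stand-in for one mapper task; it is NOT the mrjob API itself."""

    def mapper_init(self):
        # Called once per task: state stored on self persists across mapper() calls.
        self.lines_seen = 0

    def mapper(self, _, line):
        # Called once per input line: `words` is local and recreated on every call.
        words = line.split()
        self.lines_seen += 1
        yield self.lines_seen, len(words)

def run_task(lines):
    """Simulate one task: call mapper_init once, then mapper for each line."""
    task = ToyMapperTask()
    task.mapper_init()
    out = []
    for line in lines:
        out.extend(task.mapper(None, line))
    return out

if __name__ == "__main__":
    # The counter set in mapper_init keeps growing within one task...
    print(run_task(["a b", "c"]))
    # ...but a separate task starts again from its own fresh state.
    print(run_task(["x"]))
```

    The counter is shared only within one simulated task, which mirrors the point that state set in mapper_init/reducer_init is per mapper/reducer, not global across the whole job.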

  • Some tips about the first project

    Posted by Xin Cao Saturday 09 October 2021, 02:06:13 AM.

    Dear All,

    You still have one weekend plus one week to work on the first project.

    1. Lab 4 has already been released. It will help you write code for Project 1, especially on how to use the partitioner and comparator classes in mrjob (if you use Java, the lab gives you some practice in defining a custom partitioner and defining an order for your keys).

    If you do not yet know how to approach the project, please complete the problems in Lab 4 first; you will then have a better idea of how to solve the project problem.

    2. You are allowed to pass the number of documents as an argument in the Python version of the project. To be fair, if you use Java, you are allowed to do so as well. I have updated the project description for the Java version; please download the new document.

    3. I made a mistake on slide 21 of Chapter 3.1 about how to use the partitioner class in mrjob. I have updated that slide; please download the new version as well.



  • Change of the consultation time

    Posted by Xin Cao Saturday 02 October 2021, 11:24:38 PM.

    Dear All,

    Because the previous consultation time clashes with some students' lab sessions, the consultation time will change from 4pm-5pm on Thursday to 12pm-1pm on Tuesday (after the lecture), starting next week.

    The project will be released next Monday, and you will have two weeks to work on it. I suggest that you first complete the lab programming activities and then work on the project.

    Please always remember to download a new version of the lecture slides after each lecture, since I may make some edits (such as correcting typos or adding a few new slides).



COMP9313 21T3 (Big Data Management) is powered by WebCMS3
CRICOS Provider No. 00098G