  • Marks for Assignment 4

    Posted by Xin Cao Monday 28 November 2016, 08:38:58 PM.

    Dear All,

    The "ass4penalty" field only affects late submissions. If you submitted your assignment before the deadline, there is NO penalty.

    The mark for assignment 4 is computed as: ass4mark = (ass4p1mark + ass4p2mark) * ass4penalty.
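
    As a quick illustration of the formula above, here is a minimal sketch in Python. Only the formula comes from this notice: the sample marks and the penalty value of 0.5 are made up, and treating 1.0 as the multiplier for an on-time submission is an assumption.

```python
def ass4_mark(ass4p1mark: float, ass4p2mark: float, ass4penalty: float = 1.0) -> float:
    """Add the two part marks, then apply the late-submission multiplier.

    Assumption: an on-time submission has ass4penalty == 1.0 (no penalty).
    """
    return (ass4p1mark + ass4p2mark) * ass4penalty


# Hypothetical marks, for illustration only.
on_time = ass4_mark(10, 12)      # submitted before the deadline
late = ass4_mark(10, 12, 0.5)    # late, with a made-up multiplier of 0.5
print(on_time, late)
```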

    If you have any doubts about your mark, please contact me as soon as possible. I need to finalise the marks before this Thursday.

    If you have finished all the assignments on your own, you should perform well in the exam and be able to obtain a good final mark.



  • Sample exam questions

    Posted by Xin Cao Thursday 03 November 2016, 04:33:31 AM.

    Dear All,

    As you know, this is the first exam for COMP9313, so I hope you understand that it is not easy to provide many practice questions.

    You can download some sample questions at . I hope the file will be helpful for your exam. Remember that they are just sample questions, and the exam questions will be more complicated and harder to solve. These samples aim to show you how to answer questions in the exam, such as the format of pseudo-code.

    Some other notes below:

    1. The marks for assignment 2 have been released. I am currently off campus, and connecting to the university's VPN to use SMS is extremely slow. I will try my best to release the marks for assignment 3 soon.

    2. Remember to check your AWS bill. Do not run out of credits.

    3. Please complete the surveys in CATEI and myExperience if you haven't done so. Your feedback matters a lot to this course, and I very much appreciate any comments from you!

    Kind regards,


  • More Notes on the Second Problem of Assignment 4

    Posted by Xin Cao Tuesday 01 November 2016, 04:43:32 AM.

    Hi All,

    1. It seems that many of you are confused about the input and output of the second problem. Let me try to make this clearer.

    a). The two graphs are processed separately, not together.

    b). The input is the file given to you, which is the adjacency list of the graph. You should specify the input file in the parameters as "S3://YOUR_BUCKET/problem2/NA.cedge.txt". When you test your code on your local computer, you can use "hdfs://localhost:9000/user/comp9313/input/NA.cedge.txt".

    c). The output is an HDFS folder, which is used to store the final results. For example, on the cluster, if the query node ID is 0 and you specify the name of the folder as "output", the result should be stored in the folder "HDFS_HOME/output/NA0". If you test your code on your local machine, the output is stored in the folder "hdfs://localhost:9000/user/comp9313/output/NA0".

    If you follow the template, the intermediate data generated in each iteration is stored in "HDFS_HOME/outputXXXXX", where "XXXXX" is the system time. However, you can choose your own location for the intermediate data, such as the HDFS tmp folder. This step will not be evaluated.

    DO NOT use S3 to store the intermediate data!!! You need to pay a lot for the data transfer between your cluster and S3!

    One run of your code will only process ONE query node on ONE graph.

    2. You can use either a MapReduce job or the HDFS API to extract the final result.

    3. If you use MapReduce to extract the final result, you will need to pass the query node ID to the reducer (Hint: You can use the configuration object to do this). QueryNodeID, TargetNodeID, and Distance are separated by a space, not "\t" (Hint: In the reducer, you can emit a key-value pair where the key is null and the value is the whole line).

    4. In the final output, please ignore the unreachable nodes (whose distance to the source node is infinite), and sort the remaining nodes by the numeric value of their IDs.
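
    Notes 3 and 4, together with the folder naming in note 1 c), can be sketched as follows. This is a minimal Python sketch of the naming and formatting logic only, not the MapReduce or HDFS code you will submit. The function names, the use of an infinite float to mark unreachable nodes, and the sample distances are assumptions made for illustration. The folder name assumes the graph prefix is the input file name up to its first dot, which matches the "NA0" example.

```python
import math
import posixpath
from typing import Dict, List


def output_folder(output_dir: str, input_path: str, query_node_id: int) -> str:
    """Build the result folder, e.g. 'output/NA0' for NA.cedge.txt and node 0.

    Assumption: the graph prefix is the input file name up to its first dot.
    """
    graph_prefix = posixpath.basename(input_path).split(".", 1)[0]
    return posixpath.join(output_dir, f"{graph_prefix}{query_node_id}")


def format_results(query_node_id: int, distances: Dict[int, float]) -> List[str]:
    """Return 'QueryNodeID TargetNodeID Distance' lines for reachable nodes."""
    lines = []
    # Sort by numeric node ID, so 2 comes before 10 (a string sort would not).
    for target in sorted(distances):
        dist = distances[target]
        if math.isinf(dist):
            continue  # unreachable node: leave it out of the final output
        # Fields are separated by a single space, not a tab.
        lines.append(f"{query_node_id} {target} {dist}")
    return lines


# Hypothetical distances from query node 0; node 5 is unreachable.
sample = {10: 7.5, 2: 3.25, 5: math.inf}
print(output_folder("output", "S3://YOUR_BUCKET/problem2/NA.cedge.txt", 0))
for line in format_results(0, sample):
    print(line)
```

    With the sample data above, the folder is "output/NA0" and the result lines are "0 2 3.25" followed by "0 10 7.5". Node 5 is dropped because its distance is infinite.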



COMP9313 16s2 (Big Data Management) is powered by WebCMS3
CRICOS Provider No. 00098G