Notices

  • FAQs for the final exam (updated during the exam)

    Posted by Xin Cao Tuesday 22 August 2023, 08:04:48 AM, last modified Tuesday 22 August 2023, 11:18:42 AM.

    Dear All,

    This notice is used to publish FAQs for the final exam and will be kept updated during the exam.

    Please do not post public questions on Ed, since that will disturb the other students. Please email me directly.

    1. You can submit multiple times before the due time.

    2. You need to submit a single zip file, including the following documents: a single pdf containing the answers, and two Python scripts for Question 3.

    3. The time zone should be AEST, not AEDT. I am sorry for the typo. Please ignore that. I have updated the exam paper already.

    4. In Question 3.(a), "T3 SensorA 11.0" should be "T3 SensorA 12.0". You do not need to worry about the precision problem.

    5. In Question 4.(ii), there is also a typo: the second hash function should be (7n+2) mod M. I have updated the exam paper already. Note that you need to use WORDS as tokens. The shingles should be unique and have no duplicates.

    6. In Question 2.(a), you need to find the maximum number of characters in a term contained in the news articles from that year. For example, given "2017: The Australian brings you the latest Australian news as well as latest politics", the longest term has length 10. If a term spans two lines, the first line will have no space character at the end. The news articles always start with "YEAR: ", like "2017: ". You need to check the beginning of a line to see if it is a new article.

    7. In Question 6.(a), you can only show the updates of the distances and ignore the paths. This could save you some time.

    8. We have already received several submissions. However, I would still like to give all of you 10 more minutes to prepare your submission. Please manage your time well, since the submission deadline is approaching.
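
    The rules in FAQ 6 can be sketched as follows. This is only an illustration under stated assumptions, not the expected exam answer: the function name longest_term_per_year is hypothetical, the year is assumed to be four digits, and terms are assumed to be separated by whitespace only. A line that does not start with "YEAR: " is appended directly to the current article, so a term split across two lines is rejoined.

```python
import re
from collections import defaultdict

# Assumption: an article starts with a four-digit year followed by ": ".
ARTICLE_START = re.compile(r"^(\d{4}): ")

def longest_term_per_year(lines):
    """Return {year: length of the longest whitespace-separated term}."""
    articles = []  # each entry is [year, article text]
    for raw in lines:
        line = raw.rstrip("\n")  # keep trailing spaces: they mark a term boundary
        match = ARTICLE_START.match(line)
        if match:
            # A new article begins; drop the "YEAR: " prefix from its text.
            articles.append([match.group(1), line[match.end():]])
        elif articles:
            # Continuation line: concatenate without adding a space, so a
            # term that spans two lines becomes one term again.
            articles[-1][1] += line

    longest = defaultdict(int)
    for year, text in articles:
        for term in text.split():
            longest[year] = max(longest[year], len(term))
    return dict(longest)

sample = ["2017: The Australian brings you the latest Australian news "
          "as well as latest politics"]
print(longest_term_per_year(sample))  # {'2017': 10}
```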

    Regards,

    Xin

  • Final Exam Tomorrow

    Posted by Xin Cao Monday 21 August 2023, 08:11:36 PM.

    Dear All,

    A quick reminder that the final exam is scheduled for tomorrow, between 8:00 AM and 12:00 PM (AEST) on August 22nd. We will release the final exam paper on WebCMS and Moodle at 7:55 AM (5 minutes prior to the exam start time) to allow extra time for downloading and uploading. Remember that you are required to submit your exam on Moodle.

    Here are some important guidelines for the exam day:

    • If you have any concerns or questions about the exam questions, please email me (xin.cao@unsw.edu.au) and cc the course admin (siqing.li@unsw.edu.au). Avoid posting queries on the forum to prevent distractions for others.
    • If any updates or clarifications are necessary for the questions, we will communicate them through WebCMS notices and the Exam tab in WebCMS.

    Please allocate 5 minutes towards the end of the exam for submission. Submissions received after 12:00 PM will incur a significant late penalty of 5 marks per minute (unless you have adjustments based on an Equitable Learning Plan). Please manage your time well during the exam. Answer the easy questions first.

    Kindly ensure that you are well-prepared for the exam. I want to emphasize that the exam must be attempted individually. Engaging in group work or using generative AI tools can lead to academic misconduct. For more information, please review the details at this link: https://www.student.unsw.edu.au/plagiarism

    Wishing you all the best of luck in your exams.

    Kind regards,

    Xin

  • Exam time

    Posted by Xin Cao Tuesday 15 August 2023, 04:33:28 PM, last modified Tuesday 15 August 2023, 04:34:06 PM.

    Dear All,

    After a few rounds of communication with the school, I can confirm that the online exam time is 8:00 - 12:00 on 22nd August. We need to start the exam so early because some students also need to take an exam in the afternoon. The school requires that enough time should be given for the break and transportation. The exam paper will be available in both Moodle and WebCMS3.

    We will get all submissions of Project 3 tomorrow (because the longest extension is 7 days), and thus we may not be able to complete the marking before the final exam. I will run an extra consultation this Thursday at 1 pm, to take questions regarding the final exam and Project 3.

    Regards,

    Xin

  • 1-day Extension for Project 2

    Posted by Xin Cao Sunday 16 July 2023, 08:37:49 PM, last modified Sunday 16 July 2023, 09:37:34 PM.

    Dear All,

    I received several emails requesting an extension for Project 2.

    We gave an extension for Project 1 because the installation and configuration of Hadoop are tricky, especially on Mac. However, Spark is much easier to use, and there is no problem with the working environment for Project 2.

    I can grant all of you a 1-day extension (the due time on Moodle is still midnight today). That is the best I can do since the deadline for Project 3 cannot be too close to the final exam.

    Those who can submit Project 2 today will get a 5% bonus mark.

    Project 3 will be released on Tuesday next week.

    Note: Please do your assessment independently. You can discuss the ideas but cannot share the code with each other. We have found some similar submissions in Project 1.

    Regards,

    Xin

  • Project 1 marks released

    Posted by Xin Cao Wednesday 12 July 2023, 11:47:40 AM.

    Dear All,

    The marks of Project 1 have been released. The tutors also provided some feedback to you. If you have any doubt about your mark, please get in touch with the tutor who marked your submission and cc the email to me. We will check the problem for you as soon as possible. The solution is also released on WebCMS3 for your reference.

    Please note that we run your code on Hadoop using the command provided in the project specification. Before contacting the tutors, please ensure you can obtain the correct results by running on Hadoop. In addition, if you have been granted an extension, but received a late submission penalty, please also contact us. We will update your mark accordingly.

    BTW, one test case has been released for Project 2. Please remember that the due date of Project 2 is this Sunday. Project 3 will be released after the deadline for Project 2.

    Regards,

    Xin

  • 3-day Extension for Project 1

    Posted by Xin Cao Tuesday 27 June 2023, 10:53:03 AM.

    Dear All,

    According to feedback from the tutors, some students still have problems configuring Hadoop, and some students only began lab 1 in last week's lab. Considering this, a 3-day extension is given for Project 1. Please submit it before midnight this Thursday. Please come to the in-person labs this week to get some help if you still cannot start HDFS and YARN successfully.

    I need to emphasize again: Before working on the project, please watch the lecture recording about the inverted index construction problem and also complete lab 4. If you could understand how to utilize order inversion and secondary sort, you would be able to complete the task with a single MRStep.

    Since this is an assignment, we cannot help you debug your code. All the hints are already given in the lab solutions. You can debug your code by using the sys.stderr.write() function to record debugging information, which appears in the stderr log files of your MapReduce application under the $HADOOP_HOME/logs/userlogs/ folder. I demonstrated how to do this in the lecture.
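
    A minimal sketch of the stderr approach follows. The function mimics the (key, value) signature of an MRJob mapper but is plain Python, so the function name, the message format, and the word-count logic are illustrative assumptions rather than the project solution. Run locally, the messages go to the terminal's stderr; on Hadoop they end up in the stderr log files mentioned above.

```python
import sys

def mapper(_, line):
    """Plain-Python stand-in for an MRJob mapper method (no `self`)."""
    for word in line.split():
        # Debug output goes to stderr, so it never mixes with the
        # key-value pairs that the job emits.
        sys.stderr.write(f"DEBUG mapper saw word={word!r}\n")
        yield word.lower(), 1

# Local check: the yielded pairs are printed to stdout,
# while the DEBUG lines are written to stderr.
for pair in mapper(None, "Hello Hadoop hello"):
    print(pair)
```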

    Project 2 will be released tomorrow, and I will explain it in tomorrow's lecture.

    Regards,

    Xin



  • This Sunday is the last day to drop courses

    Posted by Xin Cao Sunday 25 June 2023, 09:55:50 PM, last modified Tuesday 27 June 2023, 05:15:02 PM.

    Dear All,

    I hope that you have enjoyed the course so far.

    Just a reminder that after 11:59 on Sunday 25 June, you'll be charged a fee for enrolling in this course.

    If you feel that you have difficulties using Hadoop, doing the lab problems, working on the first project, or you do not quite like the course contents, you can consider dropping the course today. If you feel that you may not have enough time this term, you can consider taking this course in the future. The third project will require you to spend enough effort to get a good mark.

    If you would like to stay in the course, I would appreciate it a lot if you could provide some feedback to me either via email or in the forum (you can do this anonymously in a private post in Ed). I will try my best to improve the teaching of this course in the remaining weeks.

    Some FAQs of Project 1:

    1. You do not need to round the results. Use the provided test case to check the correctness of your solution.

    2. You do not need to achieve a global order when using multiple reducers. With multiple reducers, you will have multiple output files in HDFS. You just need to guarantee the order within each file.

    3. Remove the setting of "mapreduce.job.reduces" from JOBCONF, so that we can test your code with multiple reducers. JOBCONF will get this value from the command automatically.

    4. To make JOBCONF work, you need to run your code on Hadoop (with option -r hadoop). MRJob cannot sort and partition the mapper output when running locally.

    5. When you meet an error, please always first check the logs in $HADOOP_HOME/logs/userlogs. Watch the lecture recording for more detailed steps.

    6. When you yield a key-value pair, MRJob will generate a "\t" in between automatically. You do not need to handle that.

    7. Before working on the project, please watch the lecture recording about the inverted index construction example, and also complete lab 4. If you understand how to utilize order inversion and secondary sort, you will be able to complete the task with a single MRStep.

    8. If you define the steps() function, you need to put "SORT_VALUES = True" outside this function and put JOBCONF inside this function. I am sorry that the previous project code template had a mistake here.

    9. In JOBCONF, you also need to configure 'stream.num.map.output.key.fields' to tell MRJob how many fields you have in your mapper key. Otherwise, MRJob may not be able to work correctly.
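
    FAQ 2 can be illustrated without Hadoop. The sketch below is a plain-Python simulation, not MRJob code: toy_partition is a hypothetical stand-in for Hadoop's partitioner, simulate_shuffle is a made-up helper, and the sample pairs are invented. It shows that each reducer's output is sorted on its own, while the files taken together are not in global order.

```python
from collections import defaultdict

def toy_partition(key, num_reducers):
    # Hypothetical partitioner: deterministic, but NOT Hadoop's real hash.
    return sum(ord(c) for c in key) % num_reducers

def simulate_shuffle(pairs, num_reducers):
    """Assign each pair to a reducer, then sort within each reducer."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[toy_partition(key, num_reducers)].append((key, value))
    # One sorted list per reducer, like one output file per reducer in HDFS.
    return [sorted(partitions[i]) for i in range(num_reducers)]

pairs = [("pear", 1), ("apple", 2), ("fig", 3), ("kiwi", 4), ("banana", 5)]
outputs = simulate_shuffle(pairs, 2)
for i, part in enumerate(outputs):
    print(f"reducer {i}:", part)
```

    With these sample keys, "banana" lands in a different reducer from the other four, so reading the outputs one after another does not give alphabetical order, even though each output is sorted.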

    Regards,

    Xin

  • Consultation changed to Friday

    Posted by Xin Cao Thursday 22 June 2023, 10:56:40 AM, last modified Thursday 22 June 2023, 10:58:02 AM.

    Dear All,

    I am sorry that I could not do the consultation today.

    Because of a sudden sharp stomach pain, I will need to see a GP later today.

    I will be available at 3:30 pm tomorrow. You can either come to my office or join the online consultation session.

    Regards,

    Xin

  • One more test dataset released for project 1

    Posted by Xin Cao Wednesday 21 June 2023, 04:52:49 PM.

    Dear All,

    As mentioned in the lecture, one more test case has already been released in WebCMS3. Please use it to check the correctness of your solution.

    You do not need to achieve the global sort order when running on multiple reducers. Just make sure that the order within each reducer is correct. Obtaining a total order in MRJob is not an easy task.

    Please work on the lab problems first and then code for the project. If you fully understand the lab solutions, you should be able to complete the project using one step with all the design patterns we have learned so far.

    You can submit multiple times in Moodle, and the last submission before the due time will be marked.

    If you still have problems using Hadoop, please contact the teaching team as soon as possible. The submission due date is approaching.

    Regards,

    Xin

  • Please post your questions to Ed

    Posted by Xin Cao Monday 12 June 2023, 11:06:29 PM.

    Dear All,

    As mentioned in the first lecture, we will not use WebCMS3 as the discussion forum this term. Please post your questions to the Ed platform. The link can be found in the first lecture: https://edstem.org/au/join/uB64ta . Please let me know if you have difficulties accessing Ed.

    The discussion forum in WebCMS3 will be deleted this week.

    Regards,

    Xin

  • The first coding project released.

    Posted by Xin Cao Monday 12 June 2023, 10:54:35 PM.

    Dear All,

    The first coding project has been released, and you need to use the knowledge learned in weeks 2 and 3 to complete the project. I will discuss it in more detail in this Wednesday's lecture after we finish learning about MapReduce.

    Labs 2, 3, and 4 aim to help you work on this project. If you can understand the solutions to the lab problems, you should have no problem completing the project on Hadoop MapReduce.

    Considering that today is a public holiday, the deadline will be extended to midnight on Monday in week 5. Please submit a special consideration request if you need extra time for any reason.

    BTW, I have confirmed with the school that the final exam will be online. You will be given 4 hours for the exam.

    Regards,

    Xin

  • This week's consultation: 1pm - 2pm Friday

    Posted by Xin Cao Thursday 08 June 2023, 09:02:37 AM, last modified Thursday 08 June 2023, 09:06:40 AM.

    Dear All,

    I am sorry that I cannot do the consultation today. My daughter was sick and I will take her to the hospital.

    I also see several unread emails in my email box. I am sorry that I could not work much in recent days due to family issues. I will begin to process the emails later today. Thank you for your patience in waiting for the reply.

    Please note that we are not going to use WebCMS3 as the discussion forum this term. Please use the following link to log into Ed and ask your questions: https://edstem.org/au/join/uB64ta

    If you haven't completed the installation and configuration of Hadoop, please come to an in-person lab. It would be much easier for us to help you there.

    Regards,

    Xin

  • Virtual machine image and Lab 1

    Posted by Xin Cao Monday 29 May 2023, 02:33:37 AM.

    Dear All,

    Our first lab will start on Tuesday in Week 1. If your OS is Windows, you will need to use the virtual machine (VirtualBox or VMware). The VM image can be downloaded at:

    https://mega.nz/file/SqIz1Jpb#Ay5ioC4EkiQgZVuVYDUL6hfO2LiBvsJxjTXX1qAnxrg

    https://drive.google.com/file/d/1ymUkS422jiNnEKU2witPb2fIL8wf6eME/view

    Please download the image asap and let me know if there is any problem with downloading the image. Another option is that you can download the Ubuntu OS and VirtualBox from the official website, and then install Ubuntu 22.04 in VirtualBox by yourself.

    Please check that your host OS has enough memory. If it does not, you can decrease the memory allocated to the VM from 8 GB to 4 GB; otherwise, you may encounter some errors.

    If your OS is Linux or Mac, you can install and configure Hadoop on your own laptop directly by following the lab instructions.

    Regards,

    Xin

  • Welcome to COMP9313

    Posted by Xin Cao Monday 29 May 2023, 02:28:31 AM.

    Dear COMP9313 Students,

    Welcome to COMP9313! Please see "Course Outline" in the left panel.

    The course will still be delivered both online and offline in T2. I hope that you will enjoy the course this term.

    Our first lecture is on Tuesday in Week 1. Note that the first lab also starts in Week 1. You will need to set up the Hadoop working environment either using a virtual machine or on your own laptop in the first lab.

    I look forward to seeing you all soon.

    Regards,

    Xin



COMP9313 23T2 (Big Data Management) is powered by WebCMS3
CRICOS Provider No. 00098G