Course Code | COMP9319 |
Course Title | Web Data Compression and Search |
Convenor | Raymond Wong |
Classes | Live Lectures; Pre-recorded Lectures; Timetable for all classes |
Consultations | Consultation time |
Units of Credit | 6 |
Course Website | http://cse.unsw.edu.au/~cs9319 |
Handbook Entry | http://www.handbook.unsw.edu.au/postgraduate/courses/current/COMP9319.html |
Issues | email cs9319@cse.unsw.edu.au |
Student Reps | email stureps@cse.unsw.edu.au ... to raise major issues about the course |
As the amount of Web data increases, it is becoming vital not only to search and retrieve this information quickly, but also to store it compactly. This is especially important for mobile devices, which are becoming increasingly popular. Without loss of generality, within this course, we assume Web data (excluding media content) will be in XML or similar formats (e.g., HTML, JSON).
This course aims to introduce the concepts, theories, and algorithmic issues important to Web data compression and search. The course will also introduce recent developments in Web data optimization, common practice, and applications. The course is composed of the following parts:
If time allows, we may cover optional topics such as streaming algorithms, text analytics, and Web data optimization for mobile devices. The lecture materials will be complemented by two programming assignments and numerous tutorial-type written exercises.
The official prerequisite of this course is COMP2521 / COMP1927 / COMP9024, i.e., Data Structures and Algorithms.
Furthermore, at the start of this course students should be able to:
After completing this course, students will:
There are two programming assignments and one final exam in the course. Written exercises will also be provided and relate to the fundamentals of data compression. The programming assignments will relate to the applications of Web data compression and search techniques and/or their programming.
This course contributes to the development of the following graduate capabilities:
Graduate Capability | Acquired in |
Scholars capable of independent and collaborative enquiry, rigorous in their analysis, critique and reflection, and able to innovate by applying their knowledge and skills to the solution of novel as well as routine problems | Exercises, Programming assignments |
Entrepreneurial leaders capable of initiating and embracing innovation and change, as well as engaging and enabling others to contribute to change | Programming assignments, Class discussions |
Professionals capable of ethical, self-directed practice and independent lifelong learning | Exercises, Provided readings |
Global citizens who are culturally adept and capable of respecting diversity and acting in a socially just and responsible way | Case studies, Class discussions |
There are three primary modes of learning for this course: lectures, written exercises, and programming assignments.
Each week there will be four hours of lectures (up to 2 hrs of a live lecture and approximately 2-3 hrs of pre-recorded lectures) during which concepts and theory, steps, examples, and case studies will be presented. You will get the most benefit from lectures if you attend the live (in-person) lectures and participate in the discussions during them.
There won't be tutorials in this course for 2023T2, but tutorial-type written exercises will be made available. You are expected to work through these yourself. Note: none of these are assessable, but we assume that you will be interested enough in the topics to do them without the need for assessment-based incentives. It is definitely in your own best interest to keep up to date with these exercises. There will be many consultation sessions (in a hybrid tutorial/consultation style) to answer questions about these exercises and the lecture materials.
You must make use of (a) the discussions during the lectures, (b) consultations, (c) Web forums (for short questions), in order to get your questions answered.
There are two programming assignments for this course. They contribute 50% of the overall mark for this course, and are tentatively scheduled as follows:
# | Description | Due | Marks |
1 | Programming assignment 1 (fundamental) | Week 5 | 15% |
2 | Programming assignment 2 (compression and search) | Week 9 | 35% |
Assignments will be completed individually; this means that you should do them yourself without assistance from others, except for asking advice from the Lecturer or Tutor.
As noted above, assignments are the primary vehicle for learning the material in this course. If you don't do them, or simply copy and submit someone else's work, you have wasted a valuable learning opportunity.
Assignments are to be submitted via "give" before the specified time on the due date. Assessment of assignments will be primarily based on how accurately they satisfy the requirements; this means that most of the marks will be based on automatic marking. However, we may also manually examine submitted assignments to determine (a) whether they are written with good style, (b) how closely they satisfied the requirements, if time allows.
The penalty for late submission of assignments is 5% of the worth of the assignment, subtracted from the raw mark for each day late. In other words, earned marks will be lost. For example, suppose an assignment worth 20 marks is marked as 18 but was submitted two days late. The late penalty is 2 marks, so a mark of 16 is awarded. No assignments will be accepted later than 5 days after the original deadline. The exception is an extension granted through special consideration: for example, if UNSW grants you a one-week extension, there will be no late penalty if the assignment is submitted within 7 days after the original deadline. However, no further late submissions will be accepted after these 7 days.
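The late-penalty arithmetic above can be sketched as a small Python function. This is only an illustration of the stated rule, not the marking script used in the course: the function name and parameters are hypothetical, `days_late` is assumed to be already counted in whole days, the floor at zero is an assumption, and special consideration extensions are not modelled.

```python
def late_adjusted_mark(raw_mark, worth, days_late, max_days_late=5):
    """Apply a 5%-of-worth-per-day late penalty to a raw mark.

    Returns None when the submission is later than the cut-off and
    would not be accepted at all.
    """
    if days_late <= 0:
        return raw_mark  # on time: no penalty
    if days_late > max_days_late:
        return None  # too late to be accepted
    # 5% of the assignment's worth for each day late
    penalty = worth * 5 * days_late / 100
    # Assumption: the mark is not allowed to go below zero
    return max(0.0, raw_mark - penalty)


# The worked example from the text: worth 20, marked 18, two days late
print(late_adjusted_mark(18, 20, 2))
```

Under these assumptions the call reproduces the worked example: a penalty of 2 marks on a raw mark of 18.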
The learning foci in this course are primarily lectures (introduction, illustration, discussion), written exercises (theoretical knowledge) and programming assignments (practical knowledge). In 2023T2, consultations will further explain the course materials, go through selected exercise questions, and answer student questions in a group setting. The course will have an emphasis on the fundamental techniques and their advantages / disadvantages in the context of real applications. Students will learn the main contents of the course through lectures. Written exercises and consultations are available to assist students to obtain in-depth understanding of course materials.
Finally, the primary learning focus in this course is the programming assignments, which have been designed to be challenging and relevant (i.e., they aim to strengthen students' understanding and develop practical skills in Web data compression and search).
We will be providing some exercises and we expect you will do them in your own time.
Naturally, sometimes people will "get stuck" on understanding various aspects of the material, so we have several consultations every week where you can ask questions that can be explained in detail. We also have Forums where you can ask short questions (e.g., clarifications) that can be precisely explained, and where we can share these Q&As with other students. If you have a problem that has not been addressed in the lectures / can't be handled easily using the Forums, please attend one of the available consultations. Details on how the consultations run in 2023T2 will be discussed in the first live lecture.
The Student Code of Conduct (Information, Policy) sets out what the University expects from students as members of the UNSW community. As well as the learning, teaching and research environment, the University aims to provide an environment that enables students to achieve their full potential and to provide an experience consistent with the University’s values and guiding principles. A condition of enrolment is that students inform themselves of the University’s rules and policies affecting them, and conduct themselves accordingly.
In particular, students have the responsibility to observe standards of equity and respect in dealing with every member of the University community. Behaviour that is considered in breach of the Student Code Policy as discriminatory, sexually inappropriate, bullying, harassing, invading another’s privacy, or causing any person to fear for their personal safety is serious misconduct and can lead to severe penalties, including suspension or exclusion from UNSW.
If you have any concerns, you may raise them with your Lecturer, or approach the School Ethics Officer, Grievance Officer, or one of the student representatives.
Plagiarism is defined as using the words or ideas of others and presenting them as your own. UNSW and CSE treat plagiarism as academic misconduct, which means that it carries penalties as severe as being excluded from further study at UNSW. There are several on-line sources to help you understand what plagiarism is and how it is dealt with at UNSW. If you haven't done so yet, please take the time to read the full text of
The pages below describe the policies and procedures in more detail:
Make sure that you read and understand these. Ignorance is not accepted as an excuse for plagiarism. In particular, you are also responsible for ensuring that your assignment files are not accessible by anyone but you by setting the correct permissions in your CSE directory and code repository, if using one (in particular, do not put assignment code in a public GitHub repository). Note also that plagiarism includes paying or asking another person to do a piece of work for you and then submitting it as your own work.
Furthermore, reproducing, publishing, posting, distributing or translating any assignment materials such as assignment specifications is an infringement of copyright and will be referred to UNSW Student Conduct and Integrity for action.
Generative tools, such as GitHub Copilot and ChatGPT, based on large language models or other generative artificial intelligence techniques, are becoming popular and are used by some programmers. However, you need a good understanding of the language you are coding in and the systems involved before you can use these tools effectively.
Using these tools to generate code for COMP9319 instead of writing the code yourself will hinder your learning. Therefore, you are not permitted to submit code generated by automatic tools such as GitHub Copilot, ChatGPT, or Google Bard for any COMP9319 assessment task, such as the assignments. Submitting code generated by GitHub Copilot, ChatGPT, Google Bard and similar tools will be treated as plagiarism.
There will be two places where your learning in this course will be assessed: assignments and the final exam.
Much as we dislike conflating the learning aspect of assignments with their assessment aspect, there will be marks for the assignment work. We would rather use the assignments entirely as learning vehicles and have no assessment associated with them, but we suspect that wouldn't result in a satisfactory outcome (i.e., nobody would do them).
In addition to the programming assignments, the final exam is a major assessment in this course and aims to test what you learned about data compression and search during the term. To pass this course, you are required to perform satisfactorily on the final exam even if you do very well on the assignments. In other words, full marks on all the programming assignments (i.e., a total of 50% for the course overall) are not enough on their own to pass the course. To meet the hurdle requirement, you must score better than 40% on the final exam. Note that the hurdle will be enforced after any required scaling. The following formula describes precisely how the final mark will be computed and how the hurdle (satisfactory performance on the exam) will be enforced:
a1     = mark for assignment 1 (out of 15)
a2     = mark for assignment 2 (out of 35)
asgts  = a1 + a2 (out of 50)
exam   = mark for final exam (out of 50)
okEach = exam > 20 (after scaling)
mark   = a1 + a2 + exam
grade  = HD|DN|CR|PS  if mark >= 50 && okEach
       = FL           if mark < 50 && okEach
       = UF           if !okEach
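As a rough illustration, the formula above can be transcribed into Python. The function name `final_grade` is hypothetical, `exam` is assumed to be the mark after any scaling, and the cut-offs that separate HD, DN, CR and PS are not given in this outline, so the sketch returns them as one combined label rather than guessing.

```python
def final_grade(a1, a2, exam):
    """Return (mark, grade) following the course formula.

    a1 is out of 15, a2 is out of 35, and exam is out of 50
    (assumed to be already scaled).
    """
    asgts = a1 + a2          # out of 50
    ok_each = exam > 20      # exam hurdle: better than 40% of 50
    mark = asgts + exam
    if not ok_each:
        return mark, "UF"    # hurdle not met, regardless of total
    if mark >= 50:
        # Band boundaries are not specified in the outline
        return mark, "HD|DN|CR|PS"
    return mark, "FL"


# Full assignment marks still fail the hurdle with exactly 20 on the exam
print(final_grade(15, 35, 20))
```

Note that the hurdle uses a strict comparison, so an exam mark of exactly 20 out of 50 does not satisfy it.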
If your work in this course is affected by unforeseen adverse circumstances, you should apply for Special Consideration. If your request is reasonable and your work has clearly been impacted, then
Note the use of the word "may". None of the above is guaranteed. It depends on you making a convincing case that the circumstances have clearly impacted your ability to work.
UNSW handles special consideration requests centrally (in the Student Lifecycle division), so all special consideration requests must be submitted via the UNSW Special Consideration website.
Special consideration requests must be accompanied by documentation, which will be verified by Student Lifecycle. Do not email the course convenor directly about special considerations.
If you cannot attend the Final Exam because of illness or misadventure, then you must submit a Special Consideration request, with documentation, through MyUNSW within 24 hours of the exam. If your request is reasonable, then you will be awarded a Supplementary Exam.
Note that UNSW expects you to be available to sit Supplementary Exams if required. If you are awarded a Supplementary Exam and do not attend, then your exam mark will be zero.
For further details on special considerations, see the UNSW Student website.
If you are registered with Disability Services, please forward your documentation to the Course Convenor within the first two weeks of the term.
Note that if you attend the exam, but fall ill during it, you most likely will not be awarded a Supplementary Exam.
The following is an approximate guide to the sequence of topics in this course. It is subject to change as the term progresses.
Week | Lectures | Assignments |
1 | Introduction, basic information theory, basic compression | |
2 | More basic compression algorithms | |
3 | Adaptive Huffman; Overview of BWT | a1 released |
4 | Pattern matching and regular expression | |
5 | FM index, backward search, compressed BWT | a1 due; a2 released |
6 | - | |
7 | Suffix tree, suffix array, the linear time algorithm | |
8 | XML overview; XML compression | |
9 | Graph compression; Distributed Web query processing | a2 due |
10 | Optional advanced topics; Course Revision | |
There will be no textbook used in this course. Lecture slides and supplementary readings will be provided and used.
You may find the readings below useful as reference materials:
You will also find your previous textbooks on data structures and/or algorithms useful, in case you need to refer to the fundamentals of data structures and algorithms for text processing.
Last but not least, we will heavily refer to the W3C specs on:
This course is evaluated each session using MyExperience.
The MyExperience evaluation from the last time I taught this course showed that students were satisfied with all aspects of the course. Thus we maintain a similar style and structure for this term. Since this is the first time that we run this course after the pandemic (from totally online back to hybrid/in-person), we will go through the in-depth topics in the recorded lectures and discuss more examples and/or practical considerations in the live lectures. Please note that your feedback is important and will be considered to improve future offerings of this course (e.g., how much content can remain online).
Students are also encouraged to provide informal feedback during the term and let the lecturer know of any problems, as soon as they arise. Suggestions will be listened to very openly, positively, constructively, and thankfully, and every reasonable effort will be made to address them as soon as possible.
Resource created Friday 12 May 2023, 02:32:40 PM, last modified Friday 26 May 2023, 02:32:00 PM.