
Course Details

Course Code COMP6714
Course Title Information Retrieval and Web Search
Convenor Raymond Wong
Classes Lectures:
Wed 13:00 - 15:00 Live Online Wk 1-5,7-10
Anytime Recorded Videos Wk 1-5,7-10

Timetable for all classes
Consultations Consultation time

Units of Credit 6
Course Website http://cse.unsw.edu.au/~cs6714
Handbook Entry http://www.handbook.unsw.edu.au/postgraduate/courses/current/COMP6714.html
Issues email cs6714@cse.unsw.edu.au
Student Reps email stureps@cse.unsw.edu.au ... to raise major issues about the course

Course Summary

This course aims to introduce the concepts, theories, and algorithmic issues important to Information Retrieval. If time allows, the course will also cover some recent topics and common practices. The course is composed of the following parts:

Information Retrieval:

  1. Document modeling
  2. Inverted index construction and compression
  3. Vector space model and ranking methods
  4. Probabilistic and language models
  5. Evaluation methods
  6. Relevance feedback and query expansion.

Web Search:

  1. Web search engine architecture
  2. Web crawling and indexing
  3. Web structure and usage analytics.

The lecture materials will be complemented by a non-programming assignment and a programming project.

Assumed Knowledge

The official prerequisites for this course are COMP9020 and COMP9024 for postgraduates, and (MATH1081 and (COMP1531 or COMP2041)) or (COMP1927 or COMP2521) for undergraduates. That is, we assume you have:

  • experience with procedural programming, and an understanding of a range of data structures (e.g., trees, graphs, hash-tables) and algorithms (e.g., sorting, divide-and-conquer); and
  • knowledge of discrete mathematics, including sets, logic, functions and relations, and graphs and trees.

Furthermore, at the start of this course students should be familiar with Python programming and be able to:

  • produce correct programs in Python, i.e., coding, running, testing, etc.;
  • produce readable code with clear documentation; and
  • appreciate the use of abstraction in computing.

Student Learning Outcomes

After completing this course, students will:

  • understand the whole process of information retrieval and search engines
  • understand various document and retrieval models used in information retrieval
  • understand various indexing and query processing techniques and their variants
  • develop solutions for real problems using existing technologies
  • appreciate the past, present and future of information retrieval and search engine technologies

There is one non-programming assignment, one programming project, and one final exam in the course. Some Python examples and exercises will also be provided, covering the important basics and selected topics. The assignment and project will relate to the knowledge, techniques and/or applications of information retrieval.

This course contributes to the development of the following graduate capabilities:

Graduate Capability | Acquired in
Scholars capable of independent and collaborative enquiry, rigorous in their analysis, critique and reflection, and able to innovate by applying their knowledge and skills to the solution of novel as well as routine problems | Assignment, Project
Entrepreneurial leaders capable of initiating and embracing innovation and change, as well as engaging and enabling others to contribute to change | Project, Class discussions
Professionals capable of ethical, self-directed practice and independent lifelong learning | Examples/Exercises, Provided readings
Global citizens who are culturally adept and capable of respecting diversity and acting in a socially just and responsible way | Case studies and/or Class discussions

Teaching Strategies

There are four primary modes of learning for this course: lectures, non-assessable examples/exercises, a non-programming assignment, and a programming project.

Lectures

Each week there will be up to four hours of lectures (approximately 1-2 hours live online, plus 2-3 hours of pre-recorded videos) during which concepts/theory, steps, examples, and/or case studies will be presented. You will benefit most from the lectures if you attend the live online sessions and participate in the discussions held during them.

There won't be tutorials or labs in this course for 2022T3, but some Python examples/exercises will be made available. You are expected to work through these yourself. Note: none of these are assessable, but we are assuming that you will be interested enough in the topics to actually do them without the need for assessment-based incentives. It is definitely in your own best interest to keep up-to-date with these examples/exercises. There will be many consultation sessions to help and answer questions regarding these examples/exercises and lecture materials. Selected questions from the consultations, if there are enough requests, will be discussed in the live online lectures.

To get your questions answered, you must make use of (a) the discussions during the lectures, (b) consultations, and (c) Web forums (for short questions).

Assignment and Project

There is one non-programming assignment and one programming project for this course. Together they contribute 50% of the overall mark for this course, and are tentatively scheduled as follows:

# Description Due Marks
1 Assignment (non-programming) Week 5 20%
2 Project (programming) Week 9 30%


The assignment and project will be completed individually; this means that you should do them yourself without assistance from others, except for asking for advice from the Lecturer or Tutor. As noted above, the assignment and project are the primary vehicles for learning the material in this course. If you don't do them, or simply copy and submit someone else's work, you have wasted a valuable learning opportunity.

The assignment and project are to be submitted via "moodle" or "give" before the specified time on the due date. Their assessment will be primarily based on how accurately they satisfy the requirements; this means that most of the marks will be based on automatic marking. However, if time allows, we may also manually examine the submitted assignment/project to determine (a) whether it is written with good style and (b) how closely it satisfies the requirements.

The penalty for late submission will be 5% (of the worth of the assessment) subtracted from the raw mark per day of being late. In other words, earned marks will be lost. For example, assume an assignment worth 20 marks is marked as 18, but had been submitted two days late. The late penalty will be 2 marks, resulting in a mark of 16 being awarded. No submissions will be accepted later than 5 days after the deadline.
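The late-penalty rule above can be sketched in Python as follows. This is only an illustrative sketch, not the official marking script: the function name `late_adjusted_mark` and its parameters are hypothetical, lateness is assumed to be counted in whole days, and the floor at zero marks is an assumption not stated in the text.

```python
def late_adjusted_mark(raw_mark, worth, days_late, rate=0.05, cutoff_days=5):
    """Apply the late penalty described above (hypothetical helper).

    raw_mark  -- mark earned before any penalty
    worth     -- total marks the assessment is worth (e.g., 20)
    days_late -- whole days late (assumption: lateness counted in whole days)
    """
    if days_late < 0:
        raise ValueError("days_late cannot be negative")
    if days_late > cutoff_days:
        # No submissions are accepted later than 5 days after the deadline.
        raise ValueError("submission is too late to be accepted")
    # 5% of the worth of the assessment is subtracted per day late.
    penalty = rate * worth * days_late
    # Assumption: the awarded mark never drops below zero.
    return max(0.0, raw_mark - penalty)


# The worked example from the text: worth 20, marked 18, two days late.
print(late_adjusted_mark(18, 20, 2))
```

With the figures from the example, the penalty is 2 marks and the awarded mark is 16.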


Teaching Rationale

The learning foci in this course are primarily the lectures (introduction, illustration, discussion) and the non-programming assignment and programming project (theoretical and practical knowledge). In 2022T3, online consultations will further explain the course materials, go through selected examples and questions, and answer student questions in a group setting. The course emphasises the fundamental techniques and their advantages and disadvantages in the context of real applications. Students will learn the main contents of the course through lectures. Examples and consultations are available to help students gain an in-depth understanding of the course materials.

Finally, the primary learning focus in this course is the programming project, which has been designed to be relatively challenging and relevant (i.e., it aims to strengthen your understanding and develop practical skills in information retrieval). We will also provide some Python examples/exercises, and we expect you to try them out in your own time.

Naturally, people will sometimes "get stuck" on various aspects of the material, so we have many (online) consultations where you can ask questions and have them explained in detail. We also have Forums where you can ask short questions (e.g., clarifications) that can be answered precisely, and where we can share these Q&As with other students. If you have a problem that has not been addressed in the lectures or cannot be handled easily using the Forums, please attend one of the available consultations. Details on how the online consultations run in 2022T3 will be discussed in the first lecture.

Student Conduct

The Student Code of Conduct (Information, Policy) sets out what the University expects from students as members of the UNSW community. As well as the learning, teaching and research environment, the University aims to provide an environment that enables students to achieve their full potential and to provide an experience consistent with the University’s values and guiding principles. A condition of enrolment is that students inform themselves of the University’s rules and policies affecting them, and conduct themselves accordingly.

In particular, students have the responsibility to observe standards of equity and respect in dealing with every member of the University community. Behaviour that is considered in breach of the Student Code Policy as discriminatory, sexually inappropriate, bullying, harassing, invading another’s privacy, or causing any person to fear for their personal safety is serious misconduct and can lead to severe penalties, including suspension or exclusion from UNSW.

If you have any concerns, you may raise them with your Lecturer, or approach the School Ethics Officer, Grievance Officer, or one of the student representatives.

Plagiarism is defined as using the words or ideas of others and presenting them as your own. UNSW and CSE treat plagiarism as academic misconduct, which means that it carries penalties as severe as being excluded from further study at UNSW. There are several on-line sources to help you understand what plagiarism is and how it is dealt with at UNSW:

Make sure that you read and understand these. Ignorance is not accepted as an excuse for plagiarism. In particular, you are also responsible for ensuring that your assignment files are not accessible by anyone but you by setting the correct permissions in your CSE directory and code repository, if using one (in particular, do not put assignment code in a public GitHub repository). Note also that plagiarism includes paying or asking another person to do a piece of work for you and then submitting it as your own work.

UNSW has an ongoing commitment to fostering a culture of learning informed by academic integrity. All UNSW staff and students have a responsibility to adhere to this principle of academic integrity. Plagiarism undermines academic integrity and is not tolerated at UNSW. Plagiarism at UNSW is defined as using the words or ideas of others and passing them off as your own.

If you haven't done so yet, please take the time to read the full text of

The pages below describe the policies and procedures in more detail:

Assessment

Your learning in this course will be assessed in three places: the assignment, the project, and the final exam. Much as we dislike conflating the learning aspect of assignments with their assessment aspect, there will be marks for the assignment and project work. We would rather use the assignments entirely as learning vehicles and have no assessment associated with them, but we suspect that wouldn't result in a satisfactory outcome (i.e., nobody would do them).

In addition to the assignment and the project, the final exam is a major assessment in this course and aims to test what you have learned about information retrieval during the term. To pass this course, you must perform satisfactorily on the final exam, even if you do very well on the assignment and the project. In other words, you will not be able to pass this course on the assignment and the project alone, even with full marks on both (i.e., a total mark of 50% for the course overall). To meet the hurdle requirement, you must score better than 40% on the final exam. Note that the hurdle will be enforced after any required scaling. The following formula describes precisely how the final mark will be computed and how the hurdle (satisfactory performance on the exam) will be enforced:

asgt       = mark for the assignment    (out of 20)
proj       = mark for the project       (out of 30)
exam       = mark for final exam        (out of 50)
okEach     = exam > 20                  (after scaling)
mark       = asgt + proj + exam
grade      = HD|DN|CR|PS  if mark >= 50 && okEach
           = FL           if mark <  50 && okEach
           = UF           if !okEach
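
For readers who prefer code, the formula above can be sketched in Python. This is a minimal illustration only: the function name `final_grade` is hypothetical, the exam mark is assumed to be already scaled, and because the thresholds separating HD, DN, CR and PS are not given in this outline, the sketch returns them as a single "HD|DN|CR|PS" outcome.

```python
def final_grade(asgt, proj, exam):
    """Return (mark, grade) following the formula above (hypothetical helper).

    asgt -- assignment mark, out of 20
    proj -- project mark, out of 30
    exam -- final exam mark, out of 50 (assumed to be after any scaling)
    """
    ok_each = exam > 20          # hurdle: strictly better than 40% of 50
    mark = asgt + proj + exam
    if not ok_each:
        return mark, "UF"        # hurdle not met, regardless of total mark
    if mark < 50:
        return mark, "FL"
    # Bands within the passing grades are not specified in this outline.
    return mark, "HD|DN|CR|PS"


# Full marks on the assignment and project, but nothing on the exam:
print(final_grade(20, 30, 0))    # (50, 'UF')
# Hurdle met and total of at least 50:
print(final_grade(15, 20, 30))   # (65, 'HD|DN|CR|PS')
```

The first call shows the hurdle in action: a total of 50 still yields UF because the exam mark does not exceed 20.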
        

Special Consideration

If your work in this course is affected by unforeseen adverse circumstances, you should apply for Special Consideration. If your request is reasonable and your work has clearly been impacted, then

  • for the assignment / the project, you may be granted an extension
  • for the Final Exam, you may be offered a Supplementary Exam

Note the use of the word "may". None of the above is guaranteed. It depends on you making a convincing case that the circumstances have clearly impacted your ability to work. UNSW handles special consideration requests centrally (in the Student Lifecycle division), so all special consideration requests must be submitted via the UNSW Special Consideration website. Special consideration requests must be accompanied by documentation, which will be verified by Student Lifecycle. Do not email the course convenor directly about special considerations.

If you cannot attend the Final Exam because of illness or misadventure, then you must submit a Special Consideration request, with documentation, through MyUNSW within 24 hours of the exam. If your request is reasonable, then you will be awarded a Supplementary Exam. Note that UNSW expects you to be available to sit Supplementary Exams if required. If you are awarded a Supplementary Exam and do not attend, then your exam mark will be zero. Note that if you attend the Final Exam, but fall ill during it, you most likely will not be awarded a Supplementary Exam.

For further details on special considerations, see the UNSW Student website. If you are registered with Disability Services, please forward your documentation to the Course Convenor within the first two weeks of the term.


Course Schedule

The following is an approximate guide to the sequence of topics in this course. It is subject to change as the term progresses.

Week Topic
1 Introduction, Boolean Retrieval
2 Preprocessing
3 Index Construction
4 Compression
5 Vector Space Model
6 -
7 Evaluation
8 Crawling
9 Link Analysis
10 Optional Topics, Revision

Course Materials/Resources

Lecture slides will be posted on the course website. These slides summarise the major contents and help you understand the materials when you read the textbook later. You definitely need to read the corresponding chapters in the textbook to gain a full understanding of the lectures.

  • [MRS08] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.
  • [CMS09] W. Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice. Pearson. 2009.

Reference books for this course are:

  • [JM19] Dan Jurafsky, James H. Martin, Speech and Language Processing (3rd ed. draft). 2019.
  • [BCC10] Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press. 2010.
  • [BB99] Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison Wesley. 1999.

Useful Resources:

Course Evaluation and Development

This course is evaluated each session using MyExperience. Based on the MyExperience comments from the previous offering, this term the assignment's due date will be brought forward and the course will focus more on applied topics.

Since this is the first time that I am running this course, it will keep a similar style and structure to previous offerings. Your feedback is therefore particularly important and will be considered to improve future offerings of this course.

Students are also encouraged to provide informal feedback during the term and let the lecturer know of any problems, as soon as they arise. Suggestions will be listened to very openly, positively, constructively, and thankfully, and every reasonable effort will be made to address them as soon as possible.

Resource created Sunday 04 September 2022, 03:48:22 PM, last modified Monday 26 September 2022, 10:55:54 PM.


