Week 1 of emergency-online-teaching DB systems

Week 1 Lessons covered: Hashing (Cuckoo, Extendible, Linear) and Buffer Pool Management.

Lessons learned from Assessments

The weekly assessments are a mix of auto-graded questions (e.g. what value causes an infinite loop in a given cuckoo hashing structure) and open-ended questions (e.g. given the pin-counter, can you do away with the dirty bit?). I used Gradescope’s online assignment feature and it worked well. I highly recommend it. With 25 students, I could still read through and grade the open-ended questions, although it did take most of my Sunday morning!
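For readers who have not seen this kind of auto-graded question, here is a minimal sketch of how a cuckoo hashing insert can loop. It is not the assessment itself: the table size of 4, the toy hash functions `h1` and `h2`, and the `max_kicks` cut-off are all assumptions made for illustration. Three keys that share the same slot in both tables can never all be placed, so the evictions go around in a cycle until the cut-off stops them.

```python
def cuckoo_insert(tables, hashes, key, max_kicks=8):
    """Insert key into a two-table cuckoo hash.

    Returns None on success. If max_kicks evictions were not enough
    (a likely cycle), returns the key left without a slot; a real
    system would rebuild the tables with new hash functions.
    """
    which = 0  # index of the table we try next
    for _ in range(max_kicks):
        slot = hashes[which](key)
        # Place the key and pick up whatever was sitting in the slot.
        key, tables[which][slot] = tables[which][slot], key
        if key is None:
            return None  # the slot was empty, nothing was evicted
        which = 1 - which  # the evicted key goes to the other table
    return key


SIZE = 4  # hypothetical table size


def h1(k):
    return k % SIZE  # toy hash function for table 0


def h2(k):
    return (k // SIZE) % SIZE  # toy hash function for table 1


tables = [[None] * SIZE, [None] * SIZE]
hashes = [h1, h2]

# 0, 16 and 32 all hash to slot 0 in both tables: two slots, three keys.
for k in (0, 16, 32):
    leftover = cuckoo_insert(tables, hashes, k)
    print(k, "placed" if leftover is None else f"cycle, {leftover} left out")
```

Without the `max_kicks` bound the third insert would evict keys forever, which is the infinite loop the question asks students to find.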

I found that students came to office hours on Zoom to validate their understanding of how certain algorithms worked. So the auto-graded questions did ensure that students worked through these algorithms individually.

The open-ended questions showed me gaps that need to be addressed. Usually, these questions explored different design tradeoffs. For example, students often drew an analogy between a B+ tree’s fill factor and a hash table’s utilization factor, and chose to split at a rather low utilization threshold of 2/3. It is spring break this week, but next week my in-class Zoom session will target these specific areas, such as why a DBMS would rely on an OS file cache and what side effects or problems could occur.

Re: Academic Integrity: We discussed online academic integrity at length during our Zoom class and I had my students digitally pledge (using a Piazza poll) to maintain online academic integrity. It is definitely worthwhile to do so. Some of it may seem obvious, but there are things that aren’t. For example, ask students not to record Zoom sessions without explicit permission and consent from others, especially if you have students who would not participate if they knew they were being recorded.

Lessons learned from office hours

What would have helped most during office hours was to have back-up worked examples, especially if your weekly unit covers specific data structures or algorithms. One student asked why you need the local depth in extendible hashing. I tried to explain that when a bucket overflows you need to know whether to double the slot directory, and that if local-depth < global-depth you don’t need to. It became clear that it would have been much easier (and less confusing) if I had worked through some ready-made examples on the screen.
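As one possible ready-made example, here is a small sketch of extendible hashing that shows the role of the local depth. The class names, the bucket capacity of 2, and the use of the key’s low-order bits as the hash are assumptions chosen to keep the trace short; it also assumes distinct non-negative integer keys. A bucket split doubles the directory only when the bucket’s local depth equals the global depth.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHashTable:
    def __init__(self, bucket_capacity=2):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]
        self.doublings = 0  # how many times the directory has doubled

    def _slot(self, key):
        # Toy hash: the low global_depth bits of the key itself.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        # Assumes distinct keys; identical keys would never separate.
        while True:
            bucket = self.directory[self._slot(key)]
            if len(bucket.keys) < self.bucket_capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Only one directory slot points here, so we must double.
            self.directory = self.directory + self.directory
            self.global_depth += 1
            self.doublings += 1
        # Now local_depth < global_depth: several slots share the bucket,
        # so it can be split without touching the directory size.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & new_bit):
                self.directory[i] = sibling
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (sibling if k & new_bit else bucket).keys.append(k)


table = ExtendibleHashTable(bucket_capacity=2)

# 0, 2, 4 overflow the bucket for slot 0 while local == global depth.
for key in (0, 2, 4):
    table.insert(key)
print(table.global_depth, table.doublings)  # 2 1

# 1, 3, 5 overflow the other bucket, whose local depth (1) is now
# below the global depth (2): it splits without another doubling.
for key in (1, 3, 5):
    table.insert(key)
print(table.global_depth, table.doublings)  # 2 1
```

The second overflow is the case from the student’s question: without a local depth stored on the bucket, the table could not tell that two directory slots already share it and would have to double unnecessarily.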

Lessons learned from the forum

Students do like to help each other. My students shared links to external materials that helped them understand something better than the videos, assigned reading materials or textbook. They also answered each other’s questions much faster (maybe even better) than I would have.

If you don’t have a class forum, start one!

General student feedback

Some students returned to their home countries to stay with family. Others got caught in between the global travel bans and are stuck (neither at home nor in the UAE!). I received some extension requests, but even those students worked through most of the material and online assessments. I think the students do appreciate/prefer the new format. Students sent “thank you” notes. This made me feel much better about the asynchronous approach, at least for this week.

More weekly updates to come.


Switching to an online DB systems course on short notice

Disclaimer. I have never taught online. With only one week of prep, we were asked to switch to online teaching. I am quickly learning how to do so: figuring out how best to deliver content, how to assess student understanding, and how to work around the many challenges of an online environment. I assume other faculty will be in a similar situation as COVID-19 continues to spread and more countries take precautionary measures such as closing university campuses. Here are the steps I am taking. I hope this blog helps others who are in a similar position and need to teach a Database Systems course. I will also share the lessons I learn as I go through this.

Pre-online status. We have covered the relational model, relational algebra, SQL, database design, normalization and FDs, and have started the database architecture unit. We had just finished access methods. My lecture style encourages lots of student participation with different in-class exercises. My class has 25 students. Labs (SimpleDB labs) and problem sets can be done in pairs.

Key decision point. You need to decide early on how you intend to proceed with your online class: synchronous, asynchronous or a hybrid of both.

Option 1: Synchronous model. You continue to meet during your lecture sessions and deliver your lectures as before in an online environment like Zoom.

Option 2: Asynchronous model. You prepare recorded lectures and engage students through forum discussions.

Option 1 is difficult to run with 25+ students. It isn’t clear how effective the “talking head” is in terms of learning: Will students digitally raise hands and ask clarifying questions? Will students tune out? Option 1 will never mimic your in-person classroom experience. Moreover, with many students potentially returning home to different parts of the world during closures, it is not clear how many can easily Zoom into the classroom. You can record your Zoom sessions, but I am concerned about how likely students are to ask questions or participate if sessions are recorded. Also, for the lecturer, the amount of preparation for a recorded session is much higher than for a regular one. I initially considered this approach under the assumption that the university closure would only last for a month, with a scheduled 1-week spring break anyway. However, after further discussions with other faculty, my optimism waned: it is possible that closures will last for the entire semester, and if some of your students have left the country, they may not be able to return in time. Option 2 involves a fair amount of work to ensure students are posting questions and engaging in online forums, especially if you haven’t established an online posting culture in your class early on.

Option 3: Hybrid model. I opted for this model to get the best of both worlds. I will be using Professor Andy Pavlo’s recorded lectures from his Fall 2019 Database Systems course as well as Joe Hellerstein’s CS186 Berkeley online recorded lectures. Both lecture sets are excellent. They are at the right level for my class and at the right pace. You might have to pick and choose the material and reorder it to better fit your planned syllabus. Andy’s lectures are not segmented by topic, which means that listeners might lose attention or tune out. I am creating online lessons as follows: I am using short video segments on a focused topic (I’m taking parts of Andy’s lectures or using Joe’s segments directly). After each topic, I will ask 1-2 short questions worth a few points. These will contribute to each student’s final grade and will help me assess student learning and participation. I will be including my online lessons in this blog with links to the segmented videos and the post-video questions.

Instead of meeting twice a week, I’m planning to meet once a week to discuss problem areas and conduct some in-session activities (Zoom’s breakout rooms should help with this). I reduced the number of meetings for two reasons: (i) to allow students more time to go through the weekly video lectures and readings and to answer sub-topic questions, and (ii) to give me more time to prepare the weekly lessons and to determine problem areas for in-Zoom discussion from the answers to the sub-topic questions.

Assessment. With two midterms to go, a group project, a lab and another problem set, I am rethinking my assessment strategy.

My current plan is as follows: (i) Replace the group project with students individually writing 2-3 research paper critiques and responding to at least one other student’s critique online. (ii) Replace the midterms with the sub-topic questions, which are spread throughout the entire semester. (iii) Keep the remaining lab and problem set as is. Ultimately, assessment should be in line with your learning objectives, and the form of assessment can change as long as you achieve your objectives.

My goal with the group research project was to expose students to research ideas in DBMS, which I hope to partially achieve with critiques that are easier to do individually and remotely.

While some tools enable online proctoring, it is difficult to administer midterms online. Finally, it is worth noting that students may be stressed, or worried about family that they may not be able to travel to due to travel restrictions or quarantines, or even sick or in isolation. By distributing the weight of the midterms across many questions for each lesson I hope to not disadvantage students who are dealing with a particularly difficult situation.

With labs and problem sets, I advise not switching to a completely different set if you have already started your class. For example, BusTub and DataBass are really cool labs/projects for teaching database systems internals and are auto-graded, but the overhead for students to switch midway to another lab might be overwhelming, and your capacity for remote support and debugging is severely limited. For this semester, I will continue to use SimpleDB as my students have already completed Lab 1.

Tools. I’m using Zoom for once-a-week class discussions and office hours, and Piazza for the class forum.

If you haven’t introduced other tools before the shutdown, don’t go overboard introducing many tools. Stick to only the tools you absolutely need and those that students will actually use.

Sanity. This is not an easy transition, so here are a few tips for your mental sanity.

  1. Don’t be overwhelmed by the tons of resources online and advice on online teaching. Most of it you will not be able to follow/implement in the short time frame, so feel free to ignore it and do what you think makes sense for you.
  2. Keep it positive. I always wanted to try out alternative teaching methods and this might be an opportunity to do so.
  3. Feel free to use existing teaching material when possible. If it helps you achieve your learning objectives, then it doesn’t have to be perfect or equivalent to the experience you provide students in your classroom or in one-on-one meetings.
  4. You are dealing with more than moving to online teaching. Your research labs may also be closing and researchers might be leaving. Try to keep a healthy expectation of what you hope to achieve this semester or even this year. For example, user studies are suspended this month and that will impact my research and ability to publish this cycle. It will also impact some senior year Capstone projects in my lab. I’m ok with that and I will work with the students around this.

Acknowledgements: I would like to thank Nancy Gleason at the Hilary Balon Teaching Center for her advice and on-going sessions that help support NYUAD faculty, Andy Pavlo and Joe Hellerstein for their lectures, and Alexandra Meliou for sharing her materials from a flipped introductory database application class, even if it wasn’t quite a fit for my course.


How to write a critique for a research paper?

Note: this is an active post and I’ll be updating it as I receive feedback or find helpful illustrative examples.

One of my many awesome advisors and teachers, Daniel Abadi, made us write critiques of research papers in his graduate database systems course. It was an excellent exercise as it allowed us to:

  1. think deeply about the work,
  2. create a summary that we can revisit later whenever we need to refresh our memory of the work or even to write the related work section,
  3. think about new research problems or different research perspectives on our current work, and
  4. practice writing (and believe me you need the practice).

If you read a paper and you like it (or dislike it), write a critique! Adrian Colyer has a popular blog where he critiques research papers in systems and ML.


The Case for Redistributing Charitable Crowdfunding Donations

What better time to blog about charity than during Ramadan, the month of giving? In late 2015, we partnered up with LaunchGood, a crowdfunding platform, to study ways to improve the overall success of the different charitable campaigns they support. We decided to tackle the problem from a data-driven perspective: we examined two years’ worth of data on campaigns and donors. Here is a detailed technical report of our key findings.



Managing your advisor: the art of the meeting

An effective student-advisor relationship is the foundation of good academic research. This relationship is often structured around weekly meetings.

As a student, keep in mind that your research problem is your main and only work focus. You are expected to initiate and test out ideas as well as conduct the majority of the creative work (design prototypes, UIs, design experiments, code, think of a proof structure, etc.) and the grunt work (code, prove, conduct experimental runs, etc.).

The advisor is usually your backup, wiser brain. Often, the advisor presents you with the research problems. She will likely guide you through the problem, outline solutions, remind you of the big picture, refer you to papers, make you think of alternative solutions, designs and implementations, unstick you if you find yourself stuck, help you analyze or figure out the experimental data, and so on. The advisor, however, is a busy, multitasking machine, often advising multiple students with varying demands on her time, teaching courses, writing grants, building research networks, serving on conference committees, or dealing with university business. I never appreciated the faculty workload until I became an assistant professor.

The advisor’s brain is thus an expensive resource, which you must manage efficiently. I hope you will find some benefit in these advisor meeting and management tips.