Slow Data in a Hurried Age*

* This is a play on David Mikics's Slow Reading in a Hurried Age. As someone whose research is about building tools for decision-making with data, I find myself at odds with how these tools are often used. Mikics describes how we have lost the joy of reading books because of how hurried our world has become; our reading attention spans are now limited to tweets and other forms of bad writing. I think the hurried age also applies to how we make decisions with data, so below is my attempt at describing the problem. In the future, I will try to propose ways of slowing things down.

Part 1: The Gated Data Nullius

What purpose does data serve?

When we began to build abodes for data, the databases, from cave walls to stone tablets to blue ledgers, to Oracle™s and finally to lakes in the cloud, we started with a simple desire: to keep a record — a shared contract of what happened, when, where and how, and what should happen in the future as of now. All this kept us civil: “this is our story for the record.”

I remember over-stuffed, tattered, beige folders of patient files shuffling around my mother’s clinic: papers of test results, scribbled notes, and pink and yellow carbon-copy prescriptions peeking from every side, held by an invisible force in each folder. Each one was a joint story of human interaction across time, place and technology. As with all stories, much is left out. Tests were done in February; results came back a week later; prescriptions were given in March; more tests; more results; the patient is back. From these fragments, my mother would skilfully fill the narrative gaps. As Ali sat down, she could tell he wasn’t taking his meds, the very ones she prescribed in March. He was giving the meds to his brother, who couldn’t afford them and seemed sicker! But he was getting sicker too, as the record showed, even after accounting for Ali’s confession: he wasn’t fasting that day they ran the tests. Like every day, he couldn’t resist his morning tea with four heaping spoonfuls of sugar. The full story lives off the record. Rich lives, paltry records.

When I grade labs, exams, or papers, I often think of those folders. I submit my grades on Gradescope, thinking here is my feedback for the record, yet fully aware of the richness of in-class conversations, the 1-on-1s, the circumstances of my students and the learning (or lack thereof) beyond what the record shows. My students do the same when they fill in course evaluations: here is what we thought of the class for your record, fully aware of the richness of all our interactions inside and outside the classroom.

But data are not just records! A “database” is an antiquated term. Data doesn’t live in a base. It’s part of a set; it’s pulled from a log; it’s generated from a model. It’s beyond you and me and it doesn’t care for our full stories. How did the database become the data set? Well, a couple of things happened:

First Development: We stopped being deliberate about what we retained. When we tallied inventory in ledgers or kept notes in records, there was a clear sense of limits and costs. Maintaining a record took time, effort and physical space1. There was a visible cost to retention, so we only kept what we needed: we were deliberate. But when data retention ceased to have a visible cost — data centers are hidden from sight, often across continents; once paid, the subscription, licence or storage costs create a perverse sunk cost fallacy, in that we ignore the cost and assume data lives free! — we gave up our deliberations. Keeping track of things with exacting precision no longer requires much, or any, effort: we timestamp everything down to the millisecond2. We track how long a patient spent with the doctor and how long it took to grade one answer versus another. We don’t need to store notes on our interactions; we can, and we will, store every uhm and err.

Second Development: We stopped owning our stories. As the beige folders made their way into my mother’s office, I could see the people in the waiting room perk up: “is that mine?” They felt they owned the folder and the fragments of their story that it held. But who owns data? The minutes it took for a patient to be seen; the diagnostic codes that justify insurance claims for tests, referrals and refills; and so on. Whose stories (if any) are they representing? The physical folders are gone, and with them the invisible force that tied together the doctor, the patient, the nurses, and many others across time and place as they ritually kept record of their many interactions. The insurance company keeps the data and decides our insurance premiums even if we never engaged in “keeping record” with the insurer. The medical group keeps the data and pushes wellness checks on unwell patients. The clinic, the university, the hypermarket, the bank, the city … we don’t see our stories, yet it is data, wrapped around our records, forcing its way into our lives.

We have grown familiar and even comfortable with the disappearance of the ownership of our narratives, even when we commit to writing our own stories. On social media, there are no owned stories, only data. A post generates likes, reposts, and maybe responses. Our posts and responses are stripped of the richness of our lives; they don’t even rise to paltry records, because we don’t try to share the context through which we write our fragments (we can’t) or read others’ (we don’t ask).

From these developments emerges the data nullius: like terra nullius, no man’s land, no individual owns the data. Unlike the commons, we don’t govern it for collective use either. Yet it is kept and gated, with varying degrees of access.

So what purpose does data serve?

Quick takes on critical issues! I don’t mean this cynically. It is just the way it is. Imagine a boardroom, where you or someone else is sitting around representing an institution. A well-meaning concern is posed. Either to prioritize or discredit the concern, someone announces “Show me the data!” The data person3 pipes up: “well we have this data that we can use to answer this question with … (5 minutes of data-analysis jargon go by; words like correlation and causation are thrown around to signal deep knowledge of all things data) … it isn’t exactly this and of course we would have to do further analysis but we can show … (5 more minutes!)” The newly formulated question has nothing to do with the concern, but that question has the right data parts. The data itself is not substantial enough to answer the question, but that is secondary, future work, after the first take is presented at the next boardroom meeting.

The presence of any data (Development 1) drives this behavior. Pushing back is seen as ludicrous. Surely you would want a quick handle, or even an approximate answer, before you invest any further resources into studying an issue or handling it. Resources are constrained. We have no time. There is also a feedback loop at play here: data begets data. Resources are put into the secondary future work to gather whatever more data can be easily gathered, joined, merged, etc. (Development 1). We can now answer even more questions motivated by, but disconnected from, deep concerns. And so we operate in a world of compounded approximate and bad answers to irrelevant questions.

Let me give an example. Across universities, faculty are concerned that using course evaluations to evaluate teaching effectiveness is leading to grade inflation4. Brought to a university’s boardroom, the data person formulates the following irrelevant question: “are student grades correlated with course evaluations?” Some analytical work is of course needed to account for discretized, finite grades, non-normal distributions, non-linearity, etc. All of this is explained, along with the required “correlation is not causation,” to provide the necessary due diligence that assures everyone is aware of the imperfection as they nod to “let’s see what the data says.”5 Yet no matter how much data6 exists, it cannot uncover whether faculty take a more lenient grading approach in fear of poor evaluations. Here are some reasons why:

  • Many faculty-student-administrator-adjective-interactions: If we espouse this concern, grade inflation is probably strongest when we have a large proportion of (i) must-get-an-A students (or perceptions thereof), (ii) risk-averse yet impartial faculty: the combination of holding equity dear yet wanting to get tenured, promoted or renewed, and (iii) data-driven administrators who believe course evals should determine teaching effectiveness. But we could have variations on this: smaller proportions, pragmatic (a-C-would-do) students, risk-tolerant or partial faculty, etc. Moreover, each of these plays into a noisy, person-specific, interaction-specific function of how much additional grade is tacked on. From Development 2, we know that the data doesn’t come with any of that context, even if the record between faculty and student carries it for them.
  • Many other factors: Perhaps grade inflation happens because of many factors that may include course evaluations, changes in knowledge expectations, or faculty actually just getting better at teaching.
  • It’s time-evolving: New faculty might come in with certain cultural norms of grading. Perhaps they were stringent and received poor evals. They sought advice from a colleague who said “just give them good grades.” After some internal moral qualms, the faculty did become lenient and evals did improve. They pass on the same advice to newer faculty. They get tenured and find a new sense of risk tolerance. They make the material more challenging and are more confident docking students, but the evals remain the same. They now claim that evals are not influenced by grades. This is just one complex trajectory, for one faculty member, that explains the interplay of grading norms and evals.

To all these reasons and any more, the data person has rigorous solutions: rigorous methods that are less sensitive to the violation of their assumptions; causal models; data curation (e.g. let’s look at large class sizes taught by multiple senior instructors within the same year — so calculus 101?!). Yet the violation of an assumption should invalidate the result; causal models cannot account for unaccounted causes and often require more data than what is available for robustness; and at what point does the curated data stop generalizing to the entire university? Debating the data person can go on indefinitely.

Let me be clear: I am not making an argument on the validity of the concern; it may be valid, it may not. I am arguing against the presumption that with data, especially what is often readily available, we can figure it out. I’m also pointing out that the emergence of the data nullius makes it increasingly difficult to argue against a fast-data approach. Back in the boardroom, imagine a voice that says: “hold on, that is not what records of grading are meant to be used for; neither are records of course evaluation.” If that voice does emerge, it will quickly be quelled with irrelevant responses: “Oh, we are doing this analysis in the aggregate; we are not violating anyone’s confidentiality.” But it isn’t about confidentiality; the extraction of a data point from the very context it was created within is the problem. And the voice rarely emerges, because we are familiar and comfortable with the disappearance of the ownership of our stories.

Again, these are rough ideas; feel free to comment.

  1. The tangible nature of a record meant it also didn’t travel much or far. ↩︎
  2. A nursery app would ping me the very minute my toddler had a bowel movement! ↩︎
  3. At this point, it is fair for you to ask “Azza, are you the data person?” I’m not the data person but I also plead the 5th 🙂 ↩︎
  4. This is an age-old concern. There are lots of empirical studies that show some evidence of this, and others that show none. I learned that in the US a possible driver for grade inflation was the Vietnam War: to protect failing students from being conscripted, faculty boosted the grades! ↩︎
  5. The presence of empirical studies with other universities does not detract from the data analysis effort: (1) we have the data, so we can do it! (2) we are unique. ↩︎
  6. There might be other ways to determine how prevalent grade inflation might be at an institution and what drives it, but none of them are easy or fast or without problems: qualitative interviews (Too long); surveys (Few answer them, fewer truthfully); logical reasoning (Ah, the philosophers); game-theoretic formulations and simulations (Ah, the economists); randomized controlled trials (Ah, more economists … What? You want to do what?); we build a multi-factorial AI model that predicts grades or course evals and examine the weights assigned to course evals or grades (Get out of here!). ↩︎

A Theory for Technology Design that Empowers

In discussions of technology, we often find a refrain, at least among the more sensible scholars: “Technology is neutral; it is what we do with it that matters.” This makes sense as it stands in contrast to the marching chants of the techno-defeatists: “Technology progresses on, as if it has a mind of its own, and will continue to persevere, survive and grow like other sentient beings, evolving to finally achieve its personhood like humans – the pinnacle of evolution.” Both claims are deficient. Technology is designed, and a designer’s framing colors the personality of a tech even if it never achieves personhood; this framing determines its relationship to us as humans.

Let me grossly simplify and generalize: designers think of the humans for whom they design tech in one of two ways:

  1. Humans are flawed, technology aims to overcome, or
  2. Humans are potent, technology aims to empower.

These diametrically opposed perceptions of humans by designers may appear subtle with respect to a technological solution, but they are not. Technology interacts with humans in amplifying, self-fulfilling loops: a technology rooted in a human flaw will only amplify the flaw, sap us of our potential and leave us weaker and more dependent on the technology itself. A technology rooted in human potential will only amplify our power, leaving us stronger and self-fulfilled. Designers in the first camp are in essence self-haters1: they see weakness in us, hate it and aim to eradicate it. Those in the second camp are empowerers: they celebrate human potential, see our strengths and aim to grow them. As a designer, I can attest that it is easier to design from within the self-hating camp. After all, weaknesses are problems, and we enjoy solving problems. Whether what was identified is indeed a weakness or a problem, rather than a feature, is beside the point once the design process begins.

Two examples ought to make my arguments concrete. A majority of technology is borne out of self-hate, so let’s start with one technology that isn’t.

The Bicycle as Empowering Technology

Riding a bicycle brings me immense joy. A bicycle cannot be borne out of a perception of humans as flawed2. It celebrates our keen sense of balance — who would have thought it wise to stay upright on less than one inch of rolling tyre? It celebrates our capacity to grasp our surroundings as we dart around potholes, pedestrians, or car doors that suddenly open. It celebrates the physical strength of our quads and hams as we climb hills and travel longer distances. A bicycle empowers us to go further because we can. In the self-fulfilling form of any technology: the more you bike, the better you balance, the more acute your reflexes get and the stronger you become. Bicycles bring joy because they let us lead; they extend our powers rather than replace them.

The Gamified Exhibit as Self-Hating Technology

My husband and I recently took our boys (6 & 4) to a children’s exhibit. The beautiful exhibit juxtaposed artworks of stars, planets and galaxies with historical and contemporary artefacts that allowed humans to peer ever more deeply into the skies and space. For children naturally curious and interested in space, the exhibit was bound to leave an everlasting impression of wonder. Except for the self-hater in every designer, who couldn’t trust that the heavens, space, art and history alone can excite a child to wander in awe and learn about the cosmos. Starting with a human flaw — children lack motivation — a technological solution is proposed: let’s gamify this experience! And so begins the design process of complex technology layering that created the following:

On entering the exhibit, two avatars construct a narrative to enlist the children’s help. There has been a system glitch and the databases holding key information about humans and the cosmos are corrupted. Don’t lose hope! If you complete a series of exercises, you can fix the databases.

Accepting the mission earns each child a barcode bracelet that tracks their progress towards fixing the glitches: a jumbled-up digital copy of a centuries-old painting, a disordered compass, mislabeled planetary objects, etc. Our children are ecstatic. As we follow along through the 15 or so stations, we find ourselves explaining one tech glitch after another. There is no time to experience the beauty of any artefact, appreciate its perseverance through time, or consider its meaning. We trudge along: scan, fix, collect points. Hooray, a badge! Then things take a darker turn: competition creeps in. “Mom, let’s do this faster! Dad is already two stations ahead.” “Dad, they have more points! Do it right!” Now we are not only moving through a repair factory line, we are fighting off intensifying feelings of injustice. We are one step away from full catastrophic meltdowns.

Technology did overcome! We completed all the tasks. In order. Two email certificates attest to that. Problem solved.

The self-fulfilling, amplifying effects do not end on exiting the exhibit. The self-hating design is analytically confirmed to be superior: “look, all the tracking data shows success; the kids are going through all the stations; they saw all the things we wanted them to see.” Our kids were robbed of their potential to just enjoy wandering aimlessly through broken clay pots, metal spears, historical gadgets, art and depictions of the cosmos. Now a gamified interest-production layer is required, because museums need to be covered and completed. Aimless wandering, lingering at the one or two things that spiritually speak to you, can only be a flaw that ought to be technologically eradicated.

As I prepare for my Spring course on Techruption, I find myself dwelling on these two design mindsets. Perhaps every now and then I’ll post about a technology that I find emanating from one design camp or the other and perhaps I can make my arguments more nuanced. I would appreciate a conversation on this, so feel free to comment or reach out.

  1. A benefit of a blog post is the ability to state things a bit more controversially and strongly 😉 ↩︎
  2. We can’t really go back to the psyche of the first person who designed a bicycle so we are going to engage in a bit of logical fallacy and affirm the consequent. ↩︎


Evaluation of an emergency online course

I had ambitious plans for the two months of emergency online teaching, including sharing my weekly lesson plans and lessons learned. I haven’t delivered a weekly report as promised, but I did offer occasional updates. So I hope in this wrap-up post to first share some of the main lessons I learned; as many of us have to teach courses online again this fall, some of these lessons may help others better plan for round two! I have already received my course evaluations, so I’ll emphasize the methods that the students appreciated. Second, I would like to put forward a plan of action moving forward. Many of my colleagues described the two-month ordeal as challenging and complicated. While I would use similar words to describe parts of the course, the word that comes to my mind about this experience is illuminating. I teach again in the spring and I hope by then the world returns to some semblance of normality, with in-person classes. My plan moving forward assumes in-person classes are back, but hopes to integrate many of the positive aspects of online learning that I have discovered.

Lessons Learned

The hybrid model works.

I made a drastic decision to switch my teaching model, in the one week of preparation time, to a hybrid one: asynchronous lectures + one live session a week. If you have prepared lecture slides and materials from teaching a course many times before, you understand how this decision might seem like additional and unnecessary overhead. I cannot stress enough how useful it was for learning. Putting myself in my students’ shoes, I understand how after 12-15 minutes of watching an online seminar, I am easily distracted and on to emails, newsfeeds, or other more attention-holding tasks. An online Zoom lecture isn’t much different. You can introduce more breaks and participatory exercises to break the monotony, but once you lose your students, it is very hard to bring them back. In the hybrid model, I would occasionally dive into a specific concept in detail and, within 10 minutes, I could see the video feeds turning off. That said, I was lucky in that I found excellent online videos by Prof. Joe Hellerstein at UC Berkeley and Prof. Andy Pavlo at CMU that covered roughly the same material as my syllabus. If you have the summer to prepare, it might be worthwhile to record your own lectures. You have to keep each video segment short: on average the segments were 5 minutes long, with 10 minutes being the longest video I shared.

“She also picked and provided video lessons each week that were easy to watch and follow along to, and when the video was long, she picked out the relevant segments which made it less intimidating, since I didn’t have to watch a bunch of 1 hour lecture videos.”

Course evaluation, anon student.

Pre-recorded lectures are not enough: you have to back them with a live session once a week and extended office hours. The live session helped me work with the students on exploring real-world problems from different perspectives and dig into more advanced material. A lecture format didn’t quite work here; I found the students most engaged when I introduced a problem, broke them up into groups of 4-6, and asked each group to come up with a solution based on the material covered in the previous week’s module. For example, after the introductory module on transactions, I asked each student group to pick an application they are familiar with and discuss what sorts of anomalies can occur from multiple, interleaving transactions and whether we need a serializable transactional system to support the application. The discussion in each group was lively and instructional.
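To give a concrete flavor of the anomalies the groups hunted for, here is a toy sketch of a lost update (the account balance, deposit amounts and interleaving are invented for illustration): two transactions read the same balance before either writes, so one deposit silently disappears.

```python
balance = 100  # shared account balance

def read():
    return balance

def write(value):
    global balance
    balance = value

# Two transactions each try to deposit 50.
# A serial execution would end at 200.
t1 = read()        # T1 reads 100
t2 = read()        # T2 reads 100, interleaved before T1 writes
write(t1 + 50)     # T1 writes 150
write(t2 + 50)     # T2 writes 150, clobbering T1's deposit

print(balance)     # 150, not 200: T1's update is lost
```

A serializable system would force one transaction's read-modify-write to happen entirely before the other's, which is exactly the question I asked each group to weigh against their chosen application.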

If you choose to cancel one of your weekly classes like I did, I recommend replacing it with office hours. As students take on independent learning, they may still struggle with certain concepts, and a one-on-one can really help them overcome mental blocks. For office hours, I set up the Zoom room and waited. It was a casual set-up: I had my camera turned off and microphone muted most of the time, and I didn’t expect students to turn on their cameras either. Some students joined without any questions and used this time to work individually on the weekly lesson with me around and to hear any Q&A that came about. It was a laid-back, semi-supervised learning session.

Structure and organization are key.

In earlier posts, I attached some samples of the structure of my weekly lesson plan. As the class continued, I found myself adding more text and notes to each week’s module and distinguishing core concepts from optional/advanced/good-to-know ones. My advice here is to stick to a consistent plan and not deviate unless there are exceptional circumstances. Weekly assessments for each lesson were due on Saturday, and the following Monday’s “Patch-the-Gap” lecture focused on any gaps uncovered from grading the assessments. Of the 22 students who evaluated my class, 8 answered “organization” and “structure” to the question “what aspects of the course were most valuable to you?” I made sure to communicate any changes in the course plan by email and in the live session. I discussed how the different class deliverables connected to learning objectives and how any changes in these deliverables did not impact those objectives. I was open about the advantages and limitations of online learning and used structure to ensure that no student was left behind. Having a rough week shouldn’t derail a student completely: keeping each lesson plan as self-contained as possible and allowing students to revisit past material was how I achieved this.

Weekly assessments, labs and problem sets are crucial, …

… not midterms and final exams: I say this knowing that I will alienate some of my colleagues. Personally, I enjoy putting together a challenging midterm. For many of us, midterms and exams are powerful tools that make us dust off our textbooks, read, revisit slides and learn. Understandably, giving up exams does not come easy. I have worked through several reasons for not giving them up myself, and here is what I found after opting for online weekly assessments in lieu of midterms/exams:

  • Myth 1 – Students will share answers and not bother going through the material: I didn’t find this to be the case. Open-ended questions revealed a diversity of student understanding. Even for true/false and multiple-choice questions, there was a diversity of responses that makes me question this assumption. Handling academic integrity concerns is separate from the form of assessment and should be addressed by affirming a positive learning mindset in students. For example, it helps to reiterate to students their pledge to academic integrity. Reminding students of why they take courses in the first place is also effective. I explain to students that I teach material that will help them with real-world problems they may face in their professional lives: individually going through the difficult mental process of learning new material and solving problems themselves makes them better prepared for their professional careers. Taking shortcuts doesn’t.
  • Myth 2 – Everyone gets an A: This is a problematic statement, which I hear often. It implies that online, open-book assessments are somehow trivial and cannot accurately discern the degree of student learning. I found that the weekly assessments allowed students to better gauge their own learning and understanding, more so than closed-book timed examinations, where students attributed poor performance to insufficient time, anxiety, having a bad day or confusing questions. It also helped students learn at their own pace. Underlying this statement is also the incorrect assumption that the purpose of teaching a course is assigning a grade: this often has the effect of making students believe that the purpose of taking a course is getting a grade (or getting an A in particular). Grades, however, are just another form of feedback to students as they continue on their life-long learning journeys. The average grade for this course was a B+: most of the students demonstrated adequate competence with the concepts introduced in the course, and if they need to apply these concepts to future problems, they should have a good idea of what more they need to learn or relearn to do so expertly.

“The weekly assessments really helped to not let the fog and de-motivation of online classes set in.”

Course evaluation, anon student.

“I think the weekly assessments were instrumental in helping me learn and still feel connected to the class and its material.”

Course evaluation, anon student.
  • Myth 3 – You can’t really test problem-solving: The week-long, open-book, multiple-submission assessments kept students on track, as they had to learn and understand the weekly lessons to correctly complete them. When students found a question difficult to answer, they had the opportunity to rewatch the video lectures or review the reading material to provide a more confident response. Having office hours a few days before the assessment deadline helped students start their weekly lessons early, avoiding cramming on the day of the deadline. Having extensive and challenging problem sets and labs allowed students to apply the knowledge they learned across several weekly lessons to bigger problems. As computer scientists, we solve complex problems, and the few minutes we have to answer questions on a timed midterm rarely reflect the reality of the problem-solving process and limit the range of problems we can ask our students to engage with.

“I really enjoyed doing the Labs and Problem Sets. While they were challenging, getting this hands on approach has definitely made me more competent as a CS major and a student in general.”

Course evaluation, anon student.

“Really cool labs with tests, which enabled you to monitor your progress as you go through them, which is not common in other CS classes. I also enjoyed the teamwork dynamic connected with labs as that heavily reminded me of the workplace collaboration aspect of software dev jobs.”

Course evaluation, anon student.
  • Use a tool that helps make online grading and feedback easy. I highly recommend Gradescope. The ability to assign rubrics as you grade and reuse comments speeds up grading and helps you figure out the main patterns in student responses. Giving thorough feedback is difficult when you have a large class size and this tool definitely made it feasible.
  • Responsiveness and availability go a long way: Going online meant that many student questions were posted on the online-class forum. I checked those once a day and responded directly to questions or endorsed correct responses from fellow students.

Thoughts for the Future

This experience has made me consider the following improvements for future iterations of the class, whether online or in-person.

  1. Creating instructional, interactive, and visual notebooks of algorithms, data structures and protocols: There are a variety of online sources that provide some form of animated visualization of different concepts, and my students were exceptional in finding those and sharing them with each other on the class forum; e.g., animations of searches, insertions, and deletions on B+ trees helped students better understand how they work. However, what I found missing from many of these beautiful visualizations is a demonstration of how these artifacts behave in practice. For example, how do you bulk-load a B+ tree, and why is it faster than repeated insertions? À la distill.pub, I would like to invest time into creating visualization notebooks that delve further into actual practice, real-world behavior, and more advanced optimizations and variations. It is time to rethink the textbook as a more interactive and visual medium. In the recovery and distributed transactions unit, students found it difficult to appreciate the delicacy of designing recovery or commit protocols. An interactive notebook here would ideally allow them to introduce failures and step through the protocols, or modify the protocols and understand why they might fail. If this is something you would like to contribute to, please reach out.
  2. Creating a database of continuous online assessment questions: Preparing new weekly assessments every week is time-consuming and error-prone. Students found some of my questions confusing, especially the multiple-choice ones. There is a wide variety of questions online, and I used and modified some of those (thanks to Joe Hellerstein’s edX course). That said, it would be great to build a database of questions designed specifically for regular, online assessment that have the right balance of difficulty and offer a good mix of easy-to-grade and open-ended questions.
  3. Creating recorded video lectures: I did enjoy the reduced stress of not having to lecture twice a week. Despite having taught database systems a number of times before, I still spend 2-3 hours preparing for every hour of lecture each semester. I would much prefer to use this time to create content that students can access any time and at their own pace. As much as I like the external video material I found (there is also Professor Jennifer Widom’s excellent StanfordOnline course), I would like to create material that is more aligned with my lecturing style and my personal weighting of the importance of different topics. I am not sure I have time this year to start working on this. I have always enjoyed the energy of students during lecture, so I would like to start recording myself once in-person classes resume.
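The bulk-loading question in item 1 is the kind of thing such a notebook could make concrete. As a minimal sketch (not tied to any particular implementation, and using a hypothetical fanout of 64), a back-of-envelope cost model already shows the intuition: repeated insertion pays a root-to-leaf traversal per key, while bulk loading sorted keys writes each node exactly once, level by level.

```python
import math

def insert_cost(n, fanout=64):
    """Keys arrive one at a time: each insert descends from the root
    to a leaf, so it pays roughly the current tree height in node
    accesses (splits, which only add cost, are ignored here)."""
    cost = 1  # the first key just creates the root
    for i in range(2, n + 1):
        cost += max(1, math.ceil(math.log(i, fanout)))
    return cost

def bulk_load_cost(n, fanout=64):
    """Keys arrive pre-sorted: pack full leaves left to right, then
    build each upper level bottom-up; every node is written once."""
    cost = n  # write each key into a leaf exactly once
    nodes = math.ceil(n / fanout)
    while nodes > 1:
        cost += nodes  # one write per node at this level
        nodes = math.ceil(nodes / fanout)
    return cost

print(insert_cost(100_000), bulk_load_cost(100_000))
```

Even though the model charitably ignores the extra cost of leaf and internal splits during insertion, bulk loading still comes out well ahead once the tree is more than one level deep; an interactive notebook could let students vary the fanout and watch the gap grow.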


Bring some hope into your online class – weeks 2-4

Online teaching takes a mental toll on students and faculty alike. I miss the face-to-face interactions, and the gallery of 25+ mini mug-shots on Zoom just isn’t the same. What has quickly become apparent is that some students have adapted really well to the online environment. These students appreciate the recorded lectures and the ability to replay, pause, and even speed through them, and find the weekly assessments helpful as a measure of their learning. Others haven’t adapted as well. As one of my students explained: “I am finding it very hard to stay motivated.” I understand how challenging it is to stay motivated with online learning, and I can only imagine how the current circumstances compound the issue.

So I decided to liven things up for Week 4. Instead of the regular Zoom session where I go over some of the material in depth, I invited NYUAD alumni, whom I had previously taught or whose capstones I had supervised, to our Zoom session: nine awesome alumni showed up! It was heart-warming to hear their stories from around the world. They shared professional experiences that spanned a gamut of sectors: consulting, software development and testing, product design, graduate school, real estate and even the army. They shared a message of hope that these circumstances will pass. They calmed some of the seniors whose post-graduation job offers got upended. They shared their strategies for surviving the unstructured world after university: one alum shared that he only signs 6-month contracts and reassesses his achievements and life goals every six months, flexibly changing his career and plans if the past six months made him miserable. Another explained how he used an 8-month employment gap to reinvent his career. They talked about their current jobs and their biggest challenge post-graduation: budgeting! Another explained how “learning, re-learning and reading books” helps him deal with the current situation, as does multi-tasking across a variety of personal-interest projects.

They also reflected on their time at NYUAD and their senior years. They described how OS and DB Systems were intense. “We did everything in Azza’s class: projects, assignments, research, exams, and labs!” and how they regretted not taking the “wood-working” class. They all shared how their experiences at NYUAD made them better professionals and people. I may have intended for the session to cheer up my students, but it ended up lifting my spirits and making me so proud of my former students. More importantly, it made me optimistic about the future of my current students, knowing that NYUAD prepares resilient individuals who adapt well to the changing world.


Switching to an online DB systems course on short notice

Disclaimer. I have never taught online. With only one week of prep, we were asked to switch to online teaching. I am quickly learning how to do so: from figuring out how best to deliver content, to assessing student understanding, to working around the many challenges of an online environment. I assume other faculty will be in a similar situation as COVID-19 continues to spread and more countries take precautionary measures such as closing university campuses. Here are the steps I am taking; I hope this post helps others who are in a similar position and need to teach a database systems course. I will also share the lessons I learn along the way.

Pre-online status. We have covered the relational model, relational algebra, SQL, database design, normalization and FDs, and have started the database architecture unit. We had just finished access methods. My lecture style encourages lots of student participation with different in-class exercises. My class has 25 students. Labs (SimpleDB labs) and problem sets can be done in pairs.

Key decision point. You need to decide early on how you intend to proceed with your online class: synchronous, asynchronous or a hybrid of both.

Option 1: Synchronous model. You continue to meet during your lecture sessions and deliver your lectures as before in an online environment like Zoom.

Option 2: Asynchronous model. You prepare recorded lectures and engage students through forum discussions.

Option 1 is difficult to run with 25+ students. It isn’t clear how effective the “talking head” is in terms of learning: will students digitally raise hands and ask clarifying questions? Will students tune out? Option 1 will never mimic your in-person classroom experience. Moreover, with many students potentially returning home to different parts of the world during closures, it is not clear how many can easily Zoom into the classroom. You can record your Zoom sessions, but I am concerned about how likely students are to ask questions or participate if sessions are recorded. Also, as a lecturer, the amount of preparation for a recorded session is much higher than for a regular one. I initially considered this approach under the assumption that the university closure would only last for a month, with a scheduled 1-week spring break anyway. However, after further discussions with other faculty, my optimism waned: it is possible that closures will last the entire semester, and if some of your students have left the country, they may not be able to return in time. Option 2 involves a fair amount of work to ensure students are posting questions and engaging in online forums, especially if you haven’t established an online posting culture in your class early on.

Option 3: Hybrid model. I opted for this model to get the best of both worlds. I will be using Professor Andy Pavlo’s recorded lectures from his Fall 2019 Database Systems course as well as Joe Hellerstein’s CS186 Berkeley online recorded lectures. Both lecture sets are excellent. They are at the right level and pace for my class. You might have to pick and choose the material and reorder it to better fit your planned syllabus. Andy’s lectures are not segmented by topic, which means that listeners might lose attention or tune out. I am creating online lessons as follows: I use short video segments on a focused topic (taking parts of Andy’s lectures or using Joe’s segments directly). After each topic, I ask 1-2 short questions worth a few points. These will contribute to each student’s final grade and will help me assess student learning and participation. I will be including my online lessons in this blog with links to the segmented videos and the post-video questions.

Instead of meeting twice a week, I’m planning to meet once a week to discuss problem areas and conduct some in-session activities (Zoom’s breakout rooms should help with this). I reduced the number of meetings for two reasons: (i) to allow students more time to go through the weekly video lectures and readings and to answer the sub-topic questions, and (ii) to give myself more time to prepare the weekly lessons and to identify problem areas for in-Zoom discussion from the solutions to the sub-topic questions.

Assessment. With two midterms to go, a group project, a lab and another problem set, I am rethinking my assessment strategy.

My current plan is as follows: (i) Replace the group project with students individually writing 2-3 research paper critiques and responding to at least one other student’s critique online. (ii) Replace the midterms with the sub-topic questions that are spread throughout the entire semester. (iii) Keep the remaining lab and problem set as is. Ultimately, assessment should be in line with your learning objectives, and the form of assessment can change as long as you achieve your objectives.

My goal with the group research project was to expose students to research ideas in DBMS, which I hope to partially achieve with the critiques, since they are easier to do individually and remotely.

While some tools enable online proctoring, it is difficult to administer midterms online. Finally, it is worth noting that students may be stressed, worried about family they cannot reach due to travel restrictions or quarantines, or even sick or in isolation. By distributing the weight of the midterms across many questions for each lesson, I hope not to disadvantage students who are dealing with a particularly difficult situation.

With labs and problem sets, I advise against switching to a completely different set if you have already started your class. For example, BusTub and DataBass are really cool labs/projects for teaching database systems internals and are auto-graded, but the overhead for students of switching midway to another lab might be overwhelming, and your capacity for remote support and debugging is severely limited. For this semester, I will continue to use SimpleDB, as students have already completed Lab 1.

Tools. I’m using Zoom for once-a-week class discussions and office hours, and Piazza for the class forum.

If you haven’t introduced other tools before the shutdown, don’t go overboard introducing many now. Stick to only the tools you absolutely need and those that students will actually use.

Sanity. This is not an easy transition, so here are a few tips for your mental sanity.

  1. Don’t be overwhelmed by the tons of online resources and advice on online teaching. Most of it you will not be able to follow or implement in the short time frame, so feel free to ignore it and do what makes sense for you.
  2. Keep it positive. I always wanted to try out alternative teaching methods and this might be an opportunity to do so.
  3. Feel free to use existing teaching material when possible. If it helps you achieve your learning objectives, it doesn’t have to be perfect or equivalent to the experience you provide students in your classroom or in one-on-one meetings.
  4. You are dealing with more than moving to online teaching. Your research labs may also be closing and researchers might be leaving. Try to keep a healthy expectation of what you hope to achieve this semester or even this year. For example, user studies are suspended this month, which will impact my research and ability to publish this cycle. It will also impact some senior-year capstone projects in my lab. I’m OK with that, and I will work with the students around it.

Acknowledgements: I would like to thank Nancy Gleason at the Hilary Balon Teaching Center for her advice and ongoing sessions that help support NYUAD faculty, Andy Pavlo and Joe Hellerstein for their lectures, and Alexandra Meliou for sharing her materials from a flipped introductory database application class, even though they weren’t quite a fit for my course.


The Case for Redistributing Charitable Crowdfunding Donations

What better time to blog about charity than during Ramadan, the month of giving? In late 2015, we partnered with LaunchGood, a crowdfunding platform, to study ways to improve the overall success of the different charitable campaigns they support. We decided to tackle the problem from a data-driven perspective: we examined two years’ worth of data on campaigns and donors. Here is a detailed technical report of our key findings.
