Practical Big-O

You’ve likely seen a Big-O chart that looks something like the one below.

There are plenty of other sites, such as the Big-O Cheat Sheet or VisuAlgo, that will give you a technical breakdown of how to calculate Big-O time and space complexities.

That is not the goal of this post. I want to take a practical look at what all the fuss is about. Why does any of this even matter?

What’s the Big Deal?

Big-O notation has been around since the late 1800s, but I don’t remember ever getting a Big-O question in a technical interview until recent years. Why has there been such a major shift in interviewing?

Is the efficiency of our code more important now than it was before? I really don’t think so. In fact, with the limited processing power available 20 years ago, I would argue that writing efficient code was even more important then than it is now.

What does the interviewer want to know?

When an interviewer asks Big-O questions, what they really want to know is, can you write efficient code? Can you process large amounts of data quickly?

In reality, I have never worked in a job where I actually analyzed the Big-O complexity and then used a Big-O chart as a point of discussion in deciding how to proceed.

I have, however, worked with huge datasets, that needed to be processed in a short amount of time, and I’ve had to get creative in making that happen.

Not all problems can be solved with a better algorithm!

I’ve done quite a bit of work with data warehousing and data processing through ETL packages.

I once had a client with a relatively short window of opportunity to process a full day’s worth of transactions: tens of thousands of records had to be processed in about a two-hour window.

Each record had to go through a complex set of steps to be fully transformed; processing a single record took about 0.3 seconds.

Example: a day with 50,000 transactions

50,000 * 0.3 => 15,000 seconds
15,000 / 60 => 250 minutes
250 / 60 => ~4.17 hours
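The arithmetic above is easy to sanity-check in code. A quick sketch (the function name is mine, not from the original project):

```python
# Back-of-the-envelope batch-time check, using the numbers from the post.
def batch_hours(records: int, seconds_per_record: float) -> float:
    """Total wall-clock hours to process all records sequentially."""
    return records * seconds_per_record / 3600

print(batch_hours(50_000, 0.3))  # ~4.17 hours at the original speed
print(batch_hours(50_000, 0.2))  # ~2.78 hours even after optimization
```

Even the optimized per-record time still blows past the two-hour window, which is the whole point of the story.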

We optimized the transformation algorithms several times to squeeze the most performance out of each millisecond, and we got the time down to about 200 milliseconds per record (2.8 hours for 50,000 records). But in the end nothing seemed to be enough: the two-hour window was simply too short for the number of records we had to process.

Big-O did NOT come to the rescue!

Scary, right? We had a complexity problem to solve, but an even faster algorithm did not present itself. So we had to find a different solution.

It turns out that the best approach for our scenario was not to try to process all of those records in the two-hour window at all. Processing each record was relatively quick, but combining all of those fractions of a second into a single batch process was an insurmountable task.

But what if we processed each record in real time? Adding 0.2 seconds to each transaction would hardly be noticed by users as they worked throughout the day. In fact, we could even launch a new thread to process the record, so that the user would not be affected at all.
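A minimal sketch of that idea, assuming a Python service; the `transform` function and record shape are hypothetical stand-ins for the real pipeline:

```python
import threading

results = []

def transform(record):
    # Stand-in for the ~0.2 s transformation pipeline described above.
    results.append(record["id"])

def handle_transaction(record):
    # Respond to the user immediately; run the heavy transform off-thread
    # so the interactive request is never slowed down.
    worker = threading.Thread(target=transform, args=(record,), daemon=True)
    worker.start()
    return worker  # returned only so callers/tests can join if needed

worker = handle_transaction({"id": 1})
worker.join()
```

In a production system you would likely reach for a queue or a thread pool rather than one raw thread per transaction, but the principle is the same: move the work off the user’s critical path.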

In essence, we took a synchronous 3+ hour process and spread the work throughout the day.

Then, at the end of the day, we still had to run a short process to perform some final calculations – but this was a very quick step.

Not once during this project did anyone mention Big-O, yet the problem revolved entirely around time and space complexities. Our focus was on meeting the requirements: 1) tens of thousands of records and 2) a two-hour window of opportunity. We could have spent time theorizing about which time complexity we were dealing with…

O(n) — linear in the number of records

But all that would have done is told us what we already knew – the process takes too long for the amount of time we have.

So What?

Although many developers work for companies that deal with millions or billions of records (or more), most of our programs should NEVER deal with all of those records at the same time. Generally speaking, we should only be working with a small subset of the data at any given time.

So if you find that your data processing is taking a long time, the first place I would look is whether you are pulling too many records into memory. (For example, I can’t think of any good reason to ever send thousands of records from an API to a front-end app.)
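One common fix is to page through the data instead of loading it all at once. A sketch under assumed names (`fetch_page` stands in for a real LIMIT/OFFSET-style database or API call):

```python
DATA = list(range(1_237))  # stand-in for a large table

def fetch_page(offset, limit):
    # Stand-in for a real paged database/API query (LIMIT/OFFSET style).
    return DATA[offset:offset + limit]

def process_in_pages(page_size=500):
    # Only one page of records is ever held in memory at a time.
    processed = 0
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        processed += len(page)  # the real transformation would happen here
        offset += page_size
    return processed
```

The memory footprint stays bounded by `page_size` no matter how large the table grows, which is usually what actually matters in day-to-day work.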

The Practical Approach

Don’t get me wrong, some problems require you to optimize every calculation and to know exactly what the time complexities are for the sake of scalability (e.g. home-security notifications or self-driving car sensors). But the vast majority of problems we solve day to day will look something like this:

  1. Find an algorithm to solve the problem
  2. Time the process to get a baseline for your efficiency
  3. Determine if the baseline is good enough or if it needs to be optimized
  4. Optimize the algorithm/process if necessary
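Step 2 can be as simple as wrapping the process in a timer. A minimal sketch, where `process_records` is a placeholder for whatever algorithm step 1 produced:

```python
import time

def process_records(records):
    # Placeholder for the algorithm chosen in step 1.
    return sorted(records)

records = list(range(10_000, 0, -1))

start = time.perf_counter()
process_records(records)
elapsed = time.perf_counter() - start

# Step 2's baseline: the number you compare against after optimizing (step 4).
print(f"baseline: {elapsed:.4f} s")
```

If the baseline already meets your requirements (step 3), you can stop; premature optimization is its own kind of inefficiency.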

So why is Big-O such a big deal in interviews?

Most likely, the interviewer googled the same websites that you did to prepare for your interview. They are asking what they think are good standard interview questions.

When I conduct an interview, I am generally less concerned with whether you know the textbook definitions of the various Big-O complexities than with learning how you think. I want to know how you approach challenges, particularly unexpected ones. Yes, asking Big-O questions gives some insight, but to me that is only one tool. Solving real-world coding problems is much more involved than Big-O.

So when you prepare for interviews, yes, you should prepare for the Big-O questions, but you should also think deeply about WHY you need to optimize your code.
