Most companies aren’t experiencing Big Data or small data problems. They’re experiencing a witching hour of sorts. This is a point in their growth where their data is too big for small data and too small for Big Data. As I teach at companies, I’m finding that as many as 80% of use cases fall into this conundrum.
I’ve taken to calling this space in between small data and Big Data “medium data.” Medium data comes from companies that are looking to move from small data to Big Data, but don’t quite need Big Data scale. The resulting over-engineering is making companies less successful, and Big Data gets the blame.
I’ll give a few empirical examples.
Moving Off NoSQL
I was teaching at a financial company and they told me about a production case. They were using a NoSQL database and were moving it back to an RDBMS.
I took this as a use case in the medium data space because:
- Someone over-engineered or wanted to pad their resume with a NoSQL product in production.
- The person who came along afterward didn’t understand NoSQL and wanted to get it back to a technology they understood.
- The person who architected the system didn’t really understand the pros and cons of the technologies as applied to the use case.
It’s incredibly important to use the right tool for the job, but this case is interesting. If the team were truly using NoSQL for its strengths, they shouldn’t have been able to go back to an RDBMS; the RDBMS shouldn’t have been able to scale to the levels the use case required. Going back would only be possible if the use case sat in this medium data space, where both small data and Big Data technologies are viable.
Startups and Small Companies
Most startups and small companies don’t have Big Data requirements. Usually, their usage of Big Data is based on expected growth; they’re expecting to grow into their Big Data technologies.
This is another place I see medium data. These teams are trying to decide when to transition to Big Data technologies. The transition is especially important for small companies because of the increase in complexity. The team knows it will have to pay the piper sometime, and it’s choosing to pay early.
Why Does It Matter?
Using Big Data technologies for medium data increases complexity dramatically. That complexity shows up architecturally, programmatically, and operationally. It means you’re using technologies that are probably overkill for the problems you’re working on.
For a small company without the proper skills and training, this can cut productivity and increase costs. These costs aren’t just operational; they include paying more for better programmers, architects, and operations personnel.
I’m starting to see some technologies address medium data. Usually, it’s a downward move by a Big Data technology: making it easier to deploy and run a much smaller cluster suited to medium data workloads.