Mastering Analytics Engineering: The Art of Data Modeling and Beyond
A wild Analytics Engineer appeared. Who are they, and what should they do in the real world?
Introduction
Not long ago, with the rise of dbt, a new role appeared - the Analytics Engineer. You can read their post to see their perspective. How I see it might differ from what’s out there in the data community. Feel free to challenge me! Oh, and don’t forget, it’s not only dbt out there anymore. The market is dividing, and there are more options, especially with the pricing changes on dbt Cloud 😉
You can read more about these alternatives in this post. There will be more of these tools! I’ll try to refrain from explicitly naming dbt (even though all my experience comes from this tool) and call it a Transformation Tool instead.
The Role of an Analytics Engineer
You’ve probably heard a lot about “recovering data scientists”: people who were hired as data scientists to do some ML/AI thingies but had to do everything else instead. Putting out fires, building data platforms and data warehouses, providing insights, etc.
At least while I was doing some of the hiring, I noticed some trends in the market:
People were looking for SQL developers to build dbt models.
Hype train - it’s really a data engineering role, but you’d get more applicants by using this title (I was doing this 😕)
One-man band, assuming that the Analytics Engineer can cover all areas of data
Depending on which of these scenarios you were hired under, you might have ended up in a completely different environment. Now that time has passed and my mind has digested what happened in the market, here is what the role would look like in a perfect world.
First, I’d say that an Analytics Engineer should join a team where good data infrastructure foundations already exist and data analysts are digging for insights. This means data is flowing on time, and people are using it, not only collecting it.
The AE fits well by building a Data Warehouse and Data Marts optimised for reading (reporting purposes). So, let’s cover some technical knowledge the AE should have:
Data Modelling - Since this is the role’s specific superpower, a person should know all modelling approaches, their pros and cons, and how to pick the best solution for a specific company.
SQL - as with a Data Engineer, it’s a crucial skill to master for any Data Savvy person, and for this role especially. It’s your bread and butter. Not only that, but you have to know how to tweak a query to make it run blazing fast on any DB (or at least understand the query plan and optimise using it).
Data Quality/Observability/Testing - this is an interesting part. Some of the tools have this built in, and some of it you can add on top. Either way, the AE should have a sharp eye: add checks where trouble might occur, think about anomalies, and have a way to be sure the queries are correct. Why is this interesting, you might ask? Because you might need some testing expertise to build data tests that run in isolation, against fixed inputs, and confirm that the expected values stay the same. dbt tests, at least, check only after the potential damage is done, so you have to roll back some tables and make changes. Having testing know-how would be insanely beneficial.
Optional: some scripting knowledge to extend or add missing functionality in the Transformation Tool. This is optional because the community is huge, and other folks might have the same problems and are more experienced in contributing to the Open Source project.
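To make the "isolated test" idea from the list above a bit more concrete, here is a minimal sketch of how I’d think about it. Everything in it is an assumption for illustration: the raw_orders table, the daily revenue query and the run_model helper are made up, and an in-memory SQLite database stands in for the warehouse. The point is only that the query runs against a tiny fixed input and is compared with the expected output before any real table gets touched.

```python
import sqlite3

# Hypothetical transformation: daily revenue per customer from raw orders.
DAILY_REVENUE_SQL = """
    SELECT customer_id, order_date, SUM(amount_cents) AS revenue_cents
    FROM raw_orders
    WHERE status = 'completed'
    GROUP BY customer_id, order_date
    ORDER BY customer_id, order_date
"""


def run_model(rows):
    """Run the transformation against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_orders ("
            "customer_id INTEGER, order_date TEXT, "
            "amount_cents INTEGER, status TEXT)"
        )
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
        return conn.execute(DAILY_REVENUE_SQL).fetchall()
    finally:
        conn.close()


def test_daily_revenue():
    # Fixed input: small enough to reason about by hand.
    fixture = [
        (1, "2024-01-01", 1000, "completed"),
        (1, "2024-01-01", 500, "completed"),
        (1, "2024-01-02", 700, "cancelled"),  # must be filtered out
        (2, "2024-01-01", 300, "completed"),
    ]
    expected = [(1, "2024-01-01", 1500), (2, "2024-01-01", 300)]
    assert run_model(fixture) == expected


if __name__ == "__main__":
    test_daily_revenue()
    print("daily revenue model: ok")
```

A real setup would run the same kind of check against your actual warehouse dialect, since SQLite will not behave identically, but the shape stays the same: fixed input, expected output, no production tables involved.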
And no, I’m not going to put dashboarding here, even though, in some places, it’s expected. They might know it, but their skill set is more applied to the previous steps of the data flow. Understanding the BI/Dashboarding tools will give them a competitive advantage so they can build data marts optimised for the specific BI tool. Still, it’s not their primary responsibility to create them.
Seniority
There is no Junior Analytics Engineer. You might ask why - it’s a fresh role; to succeed, you must first have been a Junior Data Engineer to learn the ropes, or have walked in a Data Analyst’s shoes. In addition, how do you know what you like and dislike before you try a wider role? Why limit your experience to a narrower field where expectations might crush you? In general, the market is too fresh, and in my eyes, the responsibilities, deliverables, etc. are not that clearly fleshed out. Technically, you can call someone a Junior Analytics Engineer, but they will most likely do the same things as your Junior Data Engineer. I would be interested to hear your thoughts on this topic.
Mid
As you have worked in some other broader data roles before, you already know the ropes, understand the basics, and can easily identify who does what; you know your way around the DWH approaches, and you can create some simple data models. Little by little, you might get some random Data Engineering tasks (just because folks have no clue what the difference is). Based on your previous battle scars, you can identify where potential quality issues might occur and how to stay one step ahead.
Senior
Because of your previous experience, the line between you and a Senior Data Analyst or Senior Data Engineer becomes blurry. You might be thrown into some analysis or dashboarding, or have to own some of the pipelines now.
In a perfect world, you would be involved in creating new data structures, and you’d be one of the stakeholders when talking with data engineers about how to store the data in a more convenient way (e.g. a partitioning key on a column that you noticed users or your own queries filter on). As usual at this level, you might be added to hiring interviews to probe the potential newbies and their skill levels, and you’d mentor other team members.
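As a rough illustration of why the storage layout matters, here is a small sketch that compares the query plan before and after the physical design matches the filter column. Big assumptions here: SQLite has no partitioning, so an index stands in for the partitioning key, and the events table, the event_date column and the helper names are all hypothetical. The exact wording of the plan also varies between SQLite versions, so the sketch only looks at whether the engine scans the whole table or searches by the filter column.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def compare_plans():
    """Show the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (event_date TEXT, user_id INTEGER, payload TEXT)"
        )
        # The filter that users (or your own models) keep applying.
        query = "SELECT COUNT(*) FROM events WHERE event_date = ?"
        before = plan_details(conn, query, ("2024-01-01",))

        # Stand-in for "partition by event_date" in a real warehouse.
        conn.execute("CREATE INDEX idx_events_event_date ON events (event_date)")
        after = plan_details(conn, query, ("2024-01-01",))
        return before, after
    finally:
        conn.close()


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

In an actual warehouse you would read the engine’s own plan output and look for partition pruning instead, but the habit is the same: check the plan for the queries people really run, then bring that evidence to the conversation with the data engineers.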
Team lead
As in other roles, the team lead is just juggling more responsibilities: time management and trying to deliver something while managing the team and its priorities. I believe that it’s not an actual role here. If you have Analytics Engineers - put them into domains and let the domain experts manage them. Having a Technical Lead/Guild Lead is more beneficial. I might be wrong, but from what I saw in the market and how I perceive this role, there can’t be a central Analytics Engineer team. The whole idea of analytics engineers is to build and deliver the DWH and data marts faster for a specific product or department in the company. If you need standards - tackle it from a technical POV with a Staff AE or Guild Lead or whatever you call the next IC level.
Staff
We can call it Staff AE, or you can call it a Data Architect/Data Modeller. They will be driving the whole Data Modelling approach, standards and conventions. This person has to know, down to the smallest detail, what works best where and how to go further. Involvement with Engineering teams is crucial since they’re the ones producing data. I think this role is unique and very useful because Data Engineers focus more on pipelines and coding than on Data Modelling, and this is where I see the most significant value added by the Analytics Engineer. You’re the Data Model Warrior bashing everyone for crappy standards and misalignments in sources; your goal is to champion Data Quality and a unified approach across the whole DWH. We don’t want garbage in our DWH, do we?
Summary
It might feel like I’m bashing the Analytics Engineer role in some areas, but I see it as a narrower version of the Data Engineer role, at least at the Mid level. As I’ve explained, I don’t see value in centralising the Analytics Engineering team; I think the whole idea of this role came with Data Mesh, to give more speed compared to a centralised DE team.
The most significant value comes when the person can grow to the Staff/Data Architect/Data Modeller level, design practical and usable data models, and take the art of Data Modeling to the next level inside the company.