Part 1: Clean data ≠ the solution
‘Data cleaning’ is a term you will run into often as a data professional. In fact, the first step in data analysis is usually cleaning dirty data: nulls, missing values, duplicates, and so on.
As I mature as a data analyst, I have come to believe that dirty data is not always the primary issue. What happens when you have the perfect dataset? (Not sure it exists, but a girl can dream.)
There is a lot of talk around “data maturity”: building systems, dashboards, and pipelines. All of these are necessary, but they don’t mean much if the questions coming in are vague, shifting, or built on assumptions.
For beginners, data cleaning is the foundation of analytics. As you grow, process cleaning becomes your kryptonite. You still clean data, but maturing means realizing that you should clean requests too.
Every analyst I know spends time cleaning datasets, debugging dashboards, automating reports, or tracking down someone who renamed a column from “User ID” to “UID_2023_FINAL.”
So, what if messy data isn’t the issue? What if it’s unclear questions? What if the issue begins at the point of request?
In the last 30 days, I’ve had at least 3 requests where:
The metric had never been defined, the dataset didn’t exist (or wasn’t tracked), and the client just needed “a number” to validate impact.
Cleaning requests may seem far-fetched, but it makes your job easier and shows maturity. You’re cleaning data AND expectations.
So, how do we get here?
In this series, I will document how I achieve data maturity in simple, easy steps.
1. Push for the real question
My first principle of practicing data maturity in any business, or on any team, is simple:
→ START CLEANING EXPECTATIONS
If you work with me, I will always ask you, “What are you actually trying to decide?”
Answer in plain English. I write in English before I write a formula or code. This makes a difference because some bad analysis starts with unclear questions.
The request goes from “Can you give me weekly signups?” to “We’re testing a landing page and want to measure drop-off after form submission.”
“Can you give me weekly signups?” can be vague.
Why do you want weekly signups?
Are you testing a change? Validating an ad?
Looking for drop-offs?
Without this clarity, your analysis might answer the wrong question.
Don’t rush the process. You can be quick and still be steady. Take time to ask your client, supervisor, or boss, “If the number is high, what happens? And if it’s low, what happens?”
Asking the right questions in business is the most important step towards achieving data maturity. Calculating averages and totals adds value when you know precisely what you’re looking for.
What value does “minimum order” have if it doesn’t give you any productive insights?
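To make the difference between the two requests concrete, here is a minimal sketch in Python. The event log, the step names (`form_submitted`, `signup_completed`), and both function names are assumptions made up for illustration, not data from a real project. The same rows answer "weekly signups" and "drop-off after form submission" with two very different numbers.

```python
from collections import Counter
from datetime import date

# Hypothetical event log: (user_id, step, day). All values are invented.
events = [
    ("u1", "form_submitted", date(2024, 1, 1)),
    ("u1", "signup_completed", date(2024, 1, 1)),
    ("u2", "form_submitted", date(2024, 1, 2)),
    ("u3", "form_submitted", date(2024, 1, 3)),
    ("u3", "signup_completed", date(2024, 1, 4)),
    ("u4", "form_submitted", date(2024, 1, 9)),
]


def weekly_signups(events):
    """Count completed signups per ISO (year, week)."""
    counts = Counter()
    for _user, step, day in events:
        if step == "signup_completed":
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


def drop_off_rate(events):
    """Share of users who submitted the form but never completed signup."""
    submitted = {user for user, step, _day in events if step == "form_submitted"}
    completed = {user for user, step, _day in events if step == "signup_completed"}
    if not submitted:
        return None  # nothing to measure yet
    return len(submitted - completed) / len(submitted)


print(weekly_signups(events))  # the answer to the vague request
print(drop_off_rate(events))   # the answer to the real question
```

Neither number is wrong. Only one of them helps with the landing page decision, and you only know which one after asking.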
2. Make peace with ‘no.’
We talk about missing values, incorrect formats, duplicates, and so on, but expectations can be messy too. Sometimes, the data doesn’t exist or wasn’t tracked, fields are missing, or you have what I call “vanity metrics”.
If the data doesn’t exist, say so early. What’s the worst-case scenario? Losing a client, right? Most of the time, they appreciate that I pointed it out, and I suggest a workaround or a future-facing solution (“We can start tracking this moving forward”). But I don’t apologize for missing data that was never captured.
No sir, you can’t solve these problems in Python or Alteryx. You solve them through conversations.
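Code cannot replace that conversation, but it can help you start it early. Below is a minimal sketch, assuming you already know which fields a request needs. The helper `check_request` and the column names are hypothetical.

```python
def check_request(required_fields, available_fields):
    """Return the requested fields that the dataset does not contain.

    Names are compared case-insensitively and with surrounding spaces removed.
    """
    available = {name.strip().lower() for name in available_fields}
    return [f for f in required_fields if f.strip().lower() not in available]


# Hypothetical request and dataset columns, for illustration only.
missing = check_request(
    required_fields=["user_id", "signup_date", "referral_source"],
    available_fields=["User_ID", "Signup_Date", "plan"],
)

if missing:
    print(f"Not tracked yet: {', '.join(missing)}. We can start tracking this moving forward.")
```

Running a check like this before any analysis lets you say "no" on day one instead of in week three.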
3. Define ‘done’ up front.
Is this for a slide deck, a decision, or a dashboard?
Knowing the end use of the data determines how in-depth I go, how pretty it looks, and how often it should be refreshed or updated.
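One way to make "done" explicit is to write it down as a small brief before starting. This is only a sketch: the `DoneBrief` class and its field names are my own assumptions, loosely based on the questions above, not a standard template.

```python
from dataclasses import dataclass, fields


@dataclass
class DoneBrief:
    """Hypothetical checklist for agreeing on 'done' before the work starts."""
    end_use: str = ""   # slide deck, decision, or dashboard
    depth: str = ""     # how in-depth the analysis should go
    polish: str = ""    # how pretty the output needs to look
    refresh: str = ""   # how often it should be refreshed or updated

    def open_questions(self):
        """List the fields nobody has answered yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


brief = DoneBrief(end_use="slide deck", refresh="one-off")
print(brief.open_questions())
```

Any field still listed by `open_questions()` is a conversation to have before writing a query.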
My goal as an experienced analyst is: “less code, more clarity.” AI is writing all the code, and I will admit, better than I do.
I still enjoy a good SQL challenge or Python script, but more than that, I like it when I don’t have to redo an entire analysis because the goal shifted mid-way.
For me, clean questions create clean data.
Concluding part 1:
You can have clean tables and still deliver meaningless analysis, and this has happened to me as a freelancer. You can also have a messy spreadsheet and still find something that changes how someone thinks.
If your data work feels heavy, maybe the problem isn’t in the tools, but in your assumptions.
One impactful thing you might do this month, apart from building great dashboards, is to help your team, business, company, organization, or yourself craft better requests and questions.
See you in Part 2.
Be data-informed, data-driven, but not data-obsessed
Biz and whimsy: https://linktr.ee/amyusifoh
Data analyst ⬩ Spreadsheet advocate ⬩ Freelancer ⬩ Turning data into useful insights
Defining what "done" means is such a huge one. I adopted that a couple of years ago after experiencing it while working with an internal IT department. Having the requirements and the definition of done set up front and signed off on protects you from a 1-month project becoming a 6-month one!