Why Bad Data Management Practices Will Drive Your Team Crazy
Few things are more frustrating to teammates, a team lead, or a manager than bad data management practices. Bad data management practices lead to:
- Collaboration and work transferability problems: Poorly organized data makes collaboration more difficult and more time-consuming, because every person who uses the data or analysis has to spend time trying to understand the structure of the files. It also increases the probability of errors in team projects.
- Inefficiencies and data accuracy/reliability problems: Poorly organized data and analysis make iterating on an analysis more difficult, and make updating an analysis with new data more time-consuming and error-prone. Additionally, a lack of traceability makes quality checking very difficult and less effective at catching errors.
10 tips to improve your data management practices:
- Always keep a paper trail. The only hard-entered (manually typed) values in a spreadsheet should be the raw data. Everything else should be linked to the raw data. This allows every number to be traced and checked easily, and makes iterating on a calculation very easy.
- Document what you have done in a description tab. Create a description tab at the front of an Excel file that describes the purpose of the file, tracks any file updates/iterations, lists data/document inputs, and notes anything about the structure of the file (e.g. color coding or ordering systems) that will help team members quickly understand the layout. Putting this information together takes less than 5 minutes, makes it much easier for other people to pick up your work where you left off, and makes it easier to manage iterations of the analysis. Here is an example of what a good description tab looks like:
- Always keep an untouched copy of your data. Data corruption can ruin a well-executed piece of analysis, so you need to do all you can to ensure that your data is high quality. One part of that is ensuring the data does not get altered during processing. I recommend creating a raw data tab for each data source you are working with in your Excel file and using a color-coding or ordering system to identify all raw data in the file. See the sample description tab above for ideas on how to do this.
- Develop clear file and tab names. All of your teammates should be able to review your files and get a good sense of the purpose of each file just by reading the file name. Similarly, name the tabs within a file so that your teammates can easily navigate it. For larger files, it is good practice to add a table of contents tab as well.
- Make sure that it is easy to pick up your analysis where you left off months later. Business questions resurface from time to time, and there is nothing more frustrating than revisiting a piece of analysis and having to reinvent the wheel.
- Validate your final calculations and make sure they make sense. Check that observation counts add up to the same number that you started with and, if they do not, be able to account for each observation that is excluded from the analysis. Similarly, make sure that aggregate calculations match those in the raw data.
- Anticipate iterations and build flexibility into your model. Spend some time thinking about the structure of your analysis and how it could change. A good rule of thumb is to make the important assumptions flexible, so that each one can be changed in a single place. This will allow you to easily iterate on your analysis down the road and also test the effect of your assumptions.
- Plan out your final charts and tables before you start the analysis. Think about the final deliverable and structure your data processing and analysis in a way that will make it easy to arrive at that deliverable. This can save you lots of time and effort.
- Always save data to a shared drive that is backed up. This will ensure that your work is recoverable even if your computer goes down. It will also allow your colleagues to access your files without having to go through you, which is especially important for part-time employees.
- Always develop an analysis or data processing blueprint before diving into a data task. Don’t analyze data blindly. Doing so is a dangerous practice that often leads to data mining and chasing results that are driven by spurious correlation. Data should be used to test hypotheses.
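If part of your analysis runs in code rather than in a spreadsheet, the validation tip above (observation counts and aggregate totals should tie back to the raw data) can be sketched in Python. This is only an illustration under stated assumptions: the `reconcile` function, the `amount` and `reason` field names, and the sample rows are all hypothetical and would need to be adapted to your own data.

```python
from math import isclose


def reconcile(raw_rows, kept_rows, excluded_rows, value_key="amount"):
    """Return a list of problems; an empty list means the analysis ties back to the raw data.

    All names here (value_key, "reason") are illustrative assumptions.
    """
    problems = []

    # Every raw observation should be either kept or explicitly excluded.
    accounted = len(kept_rows) + len(excluded_rows)
    if accounted != len(raw_rows):
        problems.append(
            f"count mismatch: {len(raw_rows)} raw rows, {accounted} accounted for"
        )

    # Every exclusion should carry a documented reason.
    for row in excluded_rows:
        if not row.get("reason"):
            problems.append(f"excluded row without a reason: {row}")

    # Aggregates recomputed from the split should match the raw total.
    raw_total = sum(r[value_key] for r in raw_rows)
    split_total = sum(r[value_key] for r in kept_rows) + sum(
        r[value_key] for r in excluded_rows
    )
    if not isclose(raw_total, split_total, rel_tol=1e-9, abs_tol=1e-9):
        problems.append(f"total mismatch: raw {raw_total}, split {split_total}")

    return problems


# Hypothetical sample data: three raw observations, one excluded with a reason.
raw = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": 250.0},
    {"id": 3, "amount": -5.0},
]
kept = raw[:2]
excluded = [{"id": 3, "amount": -5.0, "reason": "negative amount"}]

for problem in reconcile(raw, kept, excluded):
    print(problem)
```

With the sample rows above, nothing is printed because every observation is accounted for. Returning a list of problems instead of stopping at the first one is a design choice: it lets a reviewer see everything that fails to tie out in a single pass.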
Don’t fall victim to bad data management practices! The time you invest in maintaining organized, clean and traceable data and analysis files will make you and your team more efficient and accurate over time.
If you are interested in reading more on best practices for data management, I highly recommend reading the Obsessive-Compulsive Data Quality Blog by Jim Harris.