Data preparation can be a tedious and time consuming activity.  However, it is a fundamental part of the data analysis process in order to produce accurate information that can be trusted.

If you are responsible for preparing data for analytics, you have to ask questions based on the expectations and demands of the business.

What is the problem you are looking to solve? To address the concern, you need to ask specific questions to your audience, from the report creators up to the stakeholders.

How do I effectively interrogate expectations from business executives?

Often executives can be vague about the dashboards they need at the end of the data preparation and analysis process. They might ask for things like “month-over-month comparison” or “reports shared online.” It is up to you to translate vague instructions to precise specifications, and you can do this by asking questions that you can use as guidelines.

These questions will assist you to ascertain the essential requirements:

  • Why is this report critical to the business at present?
  • Who is ultimately going to receive and use the report?
  • How will the recipients gain access to the report?
  • When will each report need to be delivered?
  • Who will maintain the report?
  • How many historical records should this report contain?
  • Are there any new data views that need to be incorporated?
  • What formatting and design specifications should be applied to each report?
  • How long do I foresee the process taking? (Is there an urgent need for the report?)

The answers to these questions will translate into the report brief that outlines expectations and deliverables.

Where are the physical locations of the data?

You want to find out where the actual data of the company is stored. It is highly likely that the data you need is housed in different locations – databases of various systems your company uses (ERP, CRM, Accounting, etc.) or multiple spreadsheets in many versions. It is also possible that your company already has a data warehouse for reporting purposes. In any case, you will have to gather, consolidate, and integrate relevant data.

  • Are there one or many sources of data that will need to be accessed?
  • How varied is the data I will be using?
  • What formats of data will I be dealing with for this report?
  • What is the volume of data that I need to deal with?
  • Do I have permission to access the data?
  • Do I have sufficient knowledge of the underlying schema and tables in every database?
  • Do I require all the available data for a granular analysis?
  • How many people will be involved with getting the data ready?

Will I need to transform or manipulate the data?

Data quality and data integrity are the two of the most important factors in reporting and analysis. Stakeholders need to be confident that they are making important decisions based on information that they can trust.

Therefore, you have to make sure that data is complete and accurate. Ideally, reports and analytics should be directly connected to the actual data source in order to have a single source of truth. This can be achieved by using tools that dynamically communicate with data sources to create models. However, when that is not possible, you should ask yourself the following questions:

  • Can the data, in its present form, effectively answer the business question asked of it?
  • How should I go about cleaning the data? Is it a manual or systematic procedure?
  • Do I have permission to manually change the data in its current location? If not, what tools should I use to manipulate data?

What model will provide the information needed?

You want to ensure that different users can connect related fields in various tables to answer any ad-hoc questions. Therefore, you must know exactly what relationship you wish to generate between the different entities in your chosen data model. As multiple users ask different questions, creating a multi-dimensional model that users can slice and dice as they wish—in an “ad hoc” fashion–will provide the most flexibility.

  • What dimensions are relevant to the report viewers?
  • What relationships do the dimensions have with each other?
  • Are there different views of data that the users need?
  • Will it be possible to make changes to the model at a later stage?
  • How easy will it be to add data sources in the future?

How will I connect my data models to my analysis tools?

Most business users prefer to access their reports in some sort of data visualization tool or an Excel report. Connecting your data model to your front-end tool must be considered, in order to have an efficient data analysis process.

How will I know if the end result is accurate?

You want to double check the results before you mark it as complete. To verify the findings, ask yourself these questions:

  • Have all questions been answered?
  • Have you collected enough data to reach a conclusion?
  • How much manipulation did you need to do, and what tool did you use? Check for errors especially if you used a spreadsheet.
  • At first glance, do the numbers seem to make sense?

The Results Are In

These questions form the process of effective data preparation. Always go back to the original objectives and look at your results through that lens. If data preparation and analysis are not done correctly from the start, it will add cost — time, money, resources – in the future. You do not want to spend more (sometimes, much more!) on such an integral part of your business success – so make sure all essential questions are posed and answers are provided, so that you can best help guide your organization forward. Even going through these steps is costly for a business, but an initiative to bring data sources together and connect critical business measures saves time and money in the end.  Connected systems like these can be linked to reports for access to up-to-the-instant data at any given time, which is a world of difference from doing it manually.

Latest posts by melindasantos (see all)