The Challenges of Data Integration: People (Part 3/4)

Ian Hellström | 22 June 2014 | 12 min read

The third part of the four-part series on the challenges of data integration deals with people. I have already hinted at a few people issues in the first and second parts on technical and project management challenges, respectively, but I have not gone into specifics.

Where people work together there will be conflicts. As we shall see, data integration projects can be particularly tricky, as they require dirty data to be ‘smuggled’ over organizational borders into enemy territory.

Decisions: Back and Forth

Once the decision to go ahead with the data integration is made, it is final. Or at least it should be. Some corporations have the annoying tendency to discuss matters over and over, and even to question decisions that have been given the official seal of approval. Such behaviour is detrimental to any project, in particular when people who are not directly involved in it begin to put pressure on people in the project’s inner circle through unofficial channels.

I had to live through one case where data integration had been on the organization’s to-do list for more than a decade, but no one was either up to the task or interested in doing it, because it would not have put a feather in anyone’s cap, let alone brought ultimate glory. The official excuse was that there had never been sufficient manpower (or ‘capacity’ in their parlance), an excuse that happened to be used for everything they did not feel like doing. Interestingly, most of the real work done at that company was outsourced, which technically rendered the argument moot. Anyway, after our team had managed to kick off the project successfully and reach the mid-project review on time and within budget, we hit a brick wall. It felt more like a reinforced-concrete, blast-resistant wall, really. The wall had been built by a single unit that made sure we did not receive timely access to any of their sources or experts. Ordinarily they would have been asked to lead the data integration, but because of their earlier reluctance the new management had not chosen them. Naturally, they were piqued. In all official documents issued by that department and at informal gatherings with the management, to which we were not invited for obvious reasons, they asserted full support and applauded the team’s efforts. Hence, they ensured that we had no leg to stand on when we needed to escalate matters to our sponsor, who gasped incredulously at our claims that they were trying to obstruct the project. As you may have expected, the project was seriously delayed.

At the very least, you need a business case for your data integration project to convince people that the endeavour is necessary. People who live in their own little data worlds often do not understand the need for an all-encompassing data warehouse. Newcomers to these data microcosms may point to alternatives, but they are quickly initiated into the cult or expelled from the biosphere by the data lions.

An apt analogy is a supermarket where some products have correct labels, some have erroneous labels, and others have no labels at all. Whenever you grab a can of Campbell’s tomato soup from a shelf, you have no way of knowing whether it is really Campbell’s tomato soup, tomato soup from another brand, French onion soup, dog food, or even a can of socks. If everyone who goes to that supermarket just happens to know where to find the products they want, there is no real problem. Loyal customers also train new arrivals to navigate the supermarket, because they are comfortable with the status quo and do not wish to learn to live with correct labels. New customers who do not like that arrangement will shop elsewhere. Because nobody complains to the manager about wrong labels, she is highly unlikely to hire someone to relabel everything. Moreover, no one is likely to volunteer, as the task is extremely tedious and all customers are used to the erroneous labels anyway.

If you replace ‘customers’ with ‘business users’ and ‘products’ with ‘data’, you pretty much have the typical situation for a data warehouse, and you can see why an outsider (read: project manager) has a hard time convincing people to switch labels too. Pure data integration, that is, data integration without a business case, is an uphill battle, and any victory will be pyrrhic. With a reason to do data integration, it will still be tough to make people accept and adopt your point of view, but at least there can be no doubt about your motives: you are driven by a business problem in need of a solution, and that solution requires consolidated data sources. You don’t want to shrug ignorantly when someone poses the perfectly legitimate question: “What’s in it for me?”

Ideally, the main customers for the data warehouse are also the main producers and consumers of data. It is incredibly difficult to convince people of a project that they perceive as a waste of time and money, let alone make them spend considerable amounts of time on it and force new tools upon them.

Collaboration vs Competition

Once a decision has been made, you’re either with me or against me, and within a company collaboration takes precedence over competition. Hierarchical organizations typically have more problems with data integration projects, since collaboration between divisions is practically unheard of, and decisions can run contrary to the usual chain of command. At some companies it is tantamount to blasphemy to ask for someone’s assistance if you appear to bypass the corporate ladder.

What is more, transparency hurts. Nobody wants others to shine a light on the skeletons in the closet, or to have their dirty laundry aired in public. Even people who are willing, even eager, to work with you on the project may not like the consequences. As a project manager, it is important not to make a name-and-shame game of data integration. Sometimes you need to ask for more resources to cleanse data from a particular source or department because theirs has more holes than a Swiss cheese and stinks worse than a Pont l’Évêque. In these cases it is to your advantage to repackage the request, for instance by saying their data is more complex. Complexity is hard to quantify, easy to sell, and it sounds a heck of a lot better than chaotic.

With regard to transparency it is important to lead by example: you cannot demand transparency and expose data pains when the project itself is closed off and communication to the outside world is limited. Mind you, you do not have to justify your every move and ask the entire organization for permission to go powder your nose.

Regrettably, office politics still dominate (i.e. impede progress at) some companies or departments, and they can seriously undermine data integration efforts. Passive-aggressive behaviour is probably the most prevalent tactic employed, and it can be pretty darn effective too. Sometimes you don’t know the right questions to ask because you lack the business insight to ask them, which is why cooperative people, or data allies, are so important. “You did not ask, so I did not tell you about it” is not what you want to hear on your project, and it can set you back weeks of work.

There are basically two extreme types of organization where office politics are the order of the day: companies with extremely high turnover and companies with almost no turnover. The former is pretty obvious: if people leave often, the culture is probably not conducive to sustained cooperation, most likely because of bad working conditions and possibly nasty co-workers too. A hostile environment and politics are among the main reasons people resign.

The latter may not be immediately evident, especially since very few companies today are still loyal to their employees, but I came into contact with such an organization on one occasion. The company in question had an astonishingly high retention rate, which everyone simply nodded at because to them the reason was obvious: the wonderful corporate culture. Somewhat amazed by that, I took on the job of project manager on a large-scale data integration project at one of their sites. I became even more astonished when I discovered the true reason for the retention rate: job security at the cost of the culture. Many people had joined the company when they were still young and full of ideas, but by the time they were staring at the sunset of their careers they had become bitter and adroit at backstabbing colleagues, especially anyone with an opinion different from theirs. The most common answer as to why they were so opposed to any change was that it had always been done their particular way and it had worked well so far, hadn’t it? That claim was contradicted on several occasions by clients who were increasingly dissatisfied with the quality of the company’s products and the level of service rendered. From my time with my head inside their IT systems, I could tell that bad data and a complete absence of policies regarding data (i.e. data governance) were the main reasons they were not as nimble as their competitors.

Culture and Leadership

What this particular case demonstrates is that the actual organizational culture can be far removed from the one advertised on the company website. A dysfunctional corporate culture is often the result of years, if not decades, of neglect by a management that is happily unaware of reality, and of people who, when left to their own devices, shape the culture in their own image. At that company there were some efforts at reorganization, but cultural change never occurred: the majority simply ignored the structures imposed.

It is said that leaders can shape the culture of a company, but in many large corporations the established culture determines organizational behaviour and thus the effectiveness of the prevailing leadership styles.

“So how does this relate to data integration?” I hear you ask. Data integration usually brings along change: technical (new systems) and social (new responsibilities and perhaps even new organizational structures). Resistance to change is a common reaction, so the leadership must reassure people about the changes the data warehouse is about to bring. Paul Lawrence writes that participation as a weapon against resistance to change is not always beneficial, and I concur. Participation can lead to too many people being involved; while it is sometimes good to have more people on board, you do not want to increase the chances of mutiny.

Great Expectations

You’ve convinced people that the data integration project is strategically sound, and you’ve nipped the resistance to the change in the bud. Fantastic! But don’t start celebrating yet.

At this point some clients can get excited to an extent that is totally unwarranted, mainly because what you, the project manager, say is different from what they hear. For instance, you explain that the data in the data warehouse is supposed to be clean, consistent, and open to anyone who needs it. What most people hear is that their data problems are over, for good.

In all project communications, you have to be absolutely clear about what data integration means for everyone: data integrity is not something you take care of once and then forget. You have to establish a set of ground rules that describe who owns the data, who ensures that the data is accurate, who or what generates fresh data, who is allowed to use or modify the data, which business processes deal with data, and so on. I shall talk more about data governance in the final post on the challenges of data integration. Suffice it to say for now that your data needs to be managed too: all your efforts will be in vain if no one takes care of the data after you leave.
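
To make the idea of ground rules a little more tangible, here is a minimal sketch of how the questions listed above (who owns the data, who keeps it accurate, what generates it, who may use or modify it) could be recorded per dataset. The class name, field names, and the example roles are all hypothetical and only illustrate the principle; in practice such rules usually live in a governance policy or data catalogue rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataGovernanceRule:
    """Hypothetical ground rules for a single dataset (all names are illustrative)."""

    dataset: str
    owner: str   # who owns the data
    steward: str  # who ensures the data is accurate
    source: str  # who or what generates fresh data
    readers: frozenset = field(default_factory=frozenset)  # who may use the data
    editors: frozenset = field(default_factory=frozenset)  # who may modify the data

    def may_modify(self, role: str) -> bool:
        # Only roles explicitly listed as editors may change the data
        return role in self.editors

    def may_read(self, role: str) -> bool:
        # Anyone allowed to modify the data is implicitly allowed to read it
        return role in self.readers or self.may_modify(role)


customers = DataGovernanceRule(
    dataset="customer_master",
    owner="Sales",
    steward="Data Quality Team",
    source="CRM system",
    readers=frozenset({"Finance", "Marketing"}),
    editors=frozenset({"Sales"}),
)

print(customers.may_read("Finance"))    # True
print(customers.may_modify("Finance"))  # False
```

The point of writing the rules down explicitly, in whatever form, is that the answers no longer depend on who happens to remember them once the project team has left.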

When it comes to expectations, there are also challenges on the opposite side of the spectrum. Some people may have laboured for years on a custom system or report that does exactly what they want it to do. The data warehouse may not have all these features, perhaps because they are not pertinent to the business or lead to data inconsistencies elsewhere, or maybe because you have not got round to them yet. Some bells and whistles are really best left to rust in peace, particularly when they do not pertain to the business case at hand. The data integration project is there to solve a specific business problem, not to create everyone’s dream toy: it has to be fit for purpose. I know it sounds pretty much like children comparing toys on the playground (“Mine can do something yours can’t!”), but I have seen it so many times that I think it deserves an honourable mention.

That brings us to the next expectation that must be managed appropriately: integration vs interpretation. People sometimes expect the data warehouse to store the data in a certain format because they are used to seeing the data in a particular way. The task of data integration is to provide clean, consolidated data, not data that has been manipulated beyond recognition. Interpretation of the data is done by implementing the business logic on top of the ‘new’ data, for instance by defining OLAP cubes or even simple views. If you just pull in data from various sources and store it in more or less the same way, the insights you gain will be fewer and less valuable than when you design the data warehouse from scratch, which forces you to think about how things can be done differently and hopefully better.
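
The separation between integration and interpretation can be illustrated with a view. The sketch below is a hypothetical example using Python’s built-in `sqlite3` module: the table `orders`, its columns, and the rule that only shipped orders count as revenue are all assumptions made for illustration. The consolidated table stores clean records once; one department’s interpretation lives in a view on top of it, so the integrated data is never reshaped to match how a single audience is used to seeing it.

```python
import sqlite3

# Hypothetical consolidated table: one clean, consistent record per order,
# regardless of which source system it originally came from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        region     TEXT NOT NULL,
        amount_eur REAL NOT NULL,
        status     TEXT NOT NULL
    );
    INSERT INTO orders VALUES
        (1, 'North', 120.0, 'shipped'),
        (2, 'North',  80.0, 'cancelled'),
        (3, 'South', 200.0, 'shipped'),
        (4, 'South',  50.0, 'shipped');

    -- Interpretation layer: an assumed business rule (only shipped orders
    -- count as revenue) lives in a view, not in the stored data itself.
    CREATE VIEW revenue_by_region AS
        SELECT region, SUM(amount_eur) AS revenue
        FROM orders
        WHERE status = 'shipped'
        GROUP BY region;
""")

for region, revenue in conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
):
    print(region, revenue)
```

If another department defines revenue differently, it can get its own view over the same `orders` table without anyone touching the integrated data.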

The Finish Line

You’re on the home straight. Both you and the team are probably very tired; the team may even have a serious case of housemaid’s knee from scrubbing data for months on end. Now is not the time to rest, though.

Acceptance tests are critical to the success of the project, and because you have defined all tests, including specific, measurable, assignable/attainable, realistic, timed (SMART) success criteria (you did remember them, didn’t you?), you have the best protection available against end-of-project fatigue. At this pivotal point you must keep your cool, even though it’s deceptively easy to lose it and fall into the them-vs-us trap, in particular when you have had to quell a lot of opposition throughout the project. If an acceptance test fails and you are forced back to the drawing board, even though you had covered all relevant cases in your own tests, you need to protect both the client’s and the team’s interests: you acknowledge that you have to redesign a portion of the system, but that does not imply complete and utter failure. You cannot under any circumstances take sides. If you choose the team, the client feels neglected and your reputation is at stake. If you pick your client, you abandon your team and their motivation will slip.
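
As a rough illustration of what writing SMART success criteria down in advance might look like, the sketch below encodes each criterion with a specific description, a measurable threshold, an assigned owner, and a due date. Everything in it (the class, the metrics, the thresholds, the owners, and the dates) is hypothetical. Whether a criterion is attainable and realistic is a judgement agreed with the client beforehand; the code only checks the parts that can be measured.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SuccessCriterion:
    """One hypothetical SMART acceptance criterion (names and values are illustrative)."""

    description: str  # specific: what exactly is being checked
    threshold: float  # measurable: minimum acceptable value of the metric
    owner: str        # assignable: who signs off on the result
    due: date         # timed: the date by which the criterion must be met

    def passed(self, measured: float, checked_on: date) -> bool:
        # Passes only if the metric meets the threshold on or before the due date
        return measured >= self.threshold and checked_on <= self.due


criteria = [
    SuccessCriterion("Share of customer records matched across sources",
                     0.98, "Sales data steward", date(2014, 9, 30)),
    SuccessCriterion("Share of invoices with a valid cost centre",
                     0.95, "Finance controller", date(2014, 9, 30)),
]
measurements = [0.985, 0.91]  # illustrative values, not real results

for criterion, measured in zip(criteria, measurements):
    status = "PASS" if criterion.passed(measured, date(2014, 9, 15)) else "FAIL"
    print(f"{status}: {criterion.description} "
          f"({measured:.1%} vs {criterion.threshold:.0%}, owner: {criterion.owner})")
```

Agreeing on criteria like these before the finish line means that a failed test points at a specific, owned gap to be fixed rather than at the project as a whole.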

Finally, bugs happen. When they do, expose them for what they are and squash them as quickly as possible, particularly in the early days of the live data warehouse. If you don’t, you run the risk of people mistaking minor errors for a failure of the entire project. Overgeneralization is most prevalent while you are still on thin ice, as the client gradually gains trust in the production system. Find a few supportive data champions to stem the tide of negative experiences that threaten the success of the data warehouse.

In a meeting with a less-than-supportive department at a client, we once showed preliminary results to get initial feedback on the general direction we were heading in. We had developed a completely new tool for operations based on consolidated data, which we wanted to share with them, well aware that the report was still a prototype. A few data points were off because we were still working on the integration logic, which we explained to them, but nothing that would normally invalidate the entire report. They did not care: deviations in the very first chart meant the whole data warehouse had to be wrong, at which point they walked out and never looked at the reports again. Interestingly, another unit at that client loved the new tool and has been using it ever since.

On a different occasion we were more successful, even though it looked as if we were heading down the same road. Our client, the procurement group of a multinational brewing company, did not believe that our methods were in line with their specification, as there were still discrepancies with what they expected to see in our reports, so we described our logic with ample examples to support it. After the presentation, they admitted that we had done everything we should have, but that their data was dirtier than they had assumed, so the specification itself had been off. They accepted that they had to clean up the remaining issues themselves, as only they were in a position to do so; since they had been largely unaware of the problems uncovered by our team, they actually saw it as an opportunity to redesign their business processes from the ground up.


The main challenges when dealing with people on data integration projects are best summarized by listing common responses:

  • ‘What’s in it for me?’
  • ‘I don’t need to explain my data to you.’
  • ‘Your integration is not my problem.’
  • ‘You never asked, that’s why I never said anything about it.’
  • ‘We’ve always done it like that and we are certainly not going to change now.’
  • ‘As long as it does not affect me in any way, whatever.’
  • ‘And if it does affect me, don’t expect assistance any time soon.’
  • ‘Our system can do something yours cannot, so yours is inferior.’
  • ‘You have made a mistake. Therefore, everything must be wrong.’
  • ‘All our data problems are over!’