sales & marketing
By Alan Weber and John Trewolla

What They Don’t Tell You About Data Mining


Data mining is typically approached as a technical challenge. Get a big enough data warehouse and a fast enough computer, load in all the data you can get, and voilà! A successful data mining effort is born.

If the purpose of data mining were to build a big, impressive computer system that your vendors and IT department could be proud of, and brag about to other techies, most data mining efforts would be a complete success. Unfortunately, most corporate data mines end up getting little or no use.

The purpose of data mining should be to build knowledge and understanding in order to make better decisions. The idea is that decisions based on hard, factual data are more reliable. Technical knowledge contributes to the availability of data, but not necessarily to the usefulness or understanding of the data.

Most data warehouses are full of space—and spaces. Literally, most data warehouses (which is where data mining is done) are overdesigned in an attempt to capture everything about everybody. The business challenge of understanding the data is ignored by technical specialists, who instead focus on the technical challenge of storing data nobody understands.

The result is that the burden of using the data is so great that users quickly find reasons to avoid the new system. The burden of capturing and storing data that gets little use leads support people to realize they have better things to do, and they skip the work of loading current, potentially useful data. So, the "data warehouse" spins happily away, storing spaces instead of data.

Data mining is best done in the same way that other mining is done. First, know what you’re looking for before you start. Second, do a high-altitude "fly by" to see if any territory looks like it might hold something interesting. Third, get the permission of the territory-owner to do some preliminary exploration. (You’ll not get far if you’re trespassing on someone’s property in your search!) Fourth, extract some ore samples and test their quality to confirm that the ore is good enough to mine further. (You might be dealing with bad data!) Finally, establish a consistent extraction method so that each time you tap into the ore, you get consistent results. (Don’t shoot down your conclusions with inconsistent data extraction procedures.)
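For readers who want to see what "testing an ore sample" might look like in practice, here is a minimal sketch in Python. It pulls a sample of records and reports what share of each field is blank, meaning empty, missing, or nothing but spaces. The function names, the field names and the sample records are all made up for illustration; they are assumptions, not part of any particular warehouse or product.

```python
import random


def is_blank(value):
    """Treat None, empty strings, and strings of only spaces as blank."""
    return value is None or str(value).strip() == ""


def blank_rate_by_field(rows, sample_size=1000, seed=0):
    """Return {field: fraction of sampled rows in which the field is blank}.

    `rows` is a list of dicts, one per record (a hypothetical layout).
    A fixed seed keeps the sample, and so the result, repeatable.
    """
    rng = random.Random(seed)
    if len(rows) > sample_size:
        sample = rng.sample(rows, sample_size)
    else:
        sample = list(rows)
    if not sample:
        return {}

    # Collect every field name that appears anywhere in the sample.
    fields = sorted({name for row in sample for name in row})

    rates = {}
    for name in fields:
        blanks = sum(1 for row in sample if is_blank(row.get(name)))
        rates[name] = blanks / len(sample)
    return rates


# Made-up records standing in for a sales-support extract.
rows = [
    {"customer": "Acme", "region": "MW", "product_cost": ""},
    {"customer": "Birch", "region": "  ", "product_cost": None},
    {"customer": "Cole", "region": "W", "product_cost": "12.50"},
    {"customer": "Dunn", "region": "E", "product_cost": "   "},
]

rates = blank_rate_by_field(rows)
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} blank")
```

A field that comes back mostly blank is a candidate to drop, or a question to take to the domain expert who owns it. Fixing the seed is one simple way to keep the extraction procedure consistent from one run to the next.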

The best data mining is done by end-user domain experts, not IT staff or programmers. Domain experts are specialists in different areas, like marketing, research, production and so on. Too often, the IT staff does not understand the business relationships between the data elements—and that kind of understanding is crucial to making sense out of the data. Technical understanding is important, but no data mining effort can succeed without involving people with an understanding of the business and the business objectives.

It would be far more cost-effective and agreeable to design a data warehouse around an understanding of what data might prove useful. Doing so would save:
(1) the expense of buying storage
(2) the frustration of entering useless data
(3) concerns about data privacy
(4) the time it takes people to learn to use the information

Involving domain experts at an early stage is crucial. Much of the information in corporate data systems is incorrect or incomplete. For example, many systems used for sales support have a place for product cost. However, costs are often managed by another department and tracked on another system. As a result, it may not be possible to track costs accurately by product. Determining this early on, before decisions are made based on bogus costs, is crucial.

Before undertaking a major data mining effort, ask two important questions:
(1) What problems are we trying to solve?
(2) Who understands the problems?

As you move ahead, stay focused on the problems, involve the people who understand them, and use technology as a means, not an end.

Alan Weber is CEO of DataPlus Millennium LLC, and John Trewolla is principal consultant. They may be reached by phone at 913.432.8311 or by e-mail at alan.weber@dpm2000.com.

 
