Data mining is typically approached as a technical challenge. Get a big
enough data warehouse, with a fast enough computer, load all the data
you can get into it, and voila! A successful data mining effort is born.
If the purpose of data mining were to build a big, impressive computer
system that your vendors and IT department can be proud of, and brag about
to other techies, most data mining efforts would be a complete success.
Unfortunately, most corporate data mines end up getting little or no use.
The purpose of data mining should be to build knowledge and understanding
in order to make better decisions. The idea is that decisions based on
hard, factual data are more reliable. Technical knowledge contributes
to the availability of data, but not necessarily to the usefulness or
understanding of the data.
Most data warehouses are full of space, and spaces. Literally, most
data warehouses (which is where data mining is done) are overdesigned
in an attempt to capture everything about everybody. The business challenge
of understanding the data is ignored by technical specialists, who instead
focus on the technical challenge of storing data nobody understands.
The result is that the burden of using the data is so great that users
quickly find reasons to avoid using the new system. The burden of capturing
and storing data that gets little use causes support people to quickly
realize they have better things to do, and they skip the work of loading
in current, potentially useful data. So, the "data warehouse"
spins happily away, storing spaces instead of data.
Data mining is best done in the same way that other mining is done. First,
know what you're looking for before you start. Second, do a high-altitude
"fly by" to see if any territory looks like it might hold something
interesting. Third, get the permission of the territory-owner to do some
preliminary exploration. (You'll not get far if you're trespassing
on someone's property in your search!) Fourth, extract some ore samples
and test their quality to confirm that the ore is good enough to mine
further. (You might be dealing with bad data!) Finally, establish a consistent
method to extract the data so that each time you tap into
the ore, you get consistent results. (Don't shoot down your conclusions
with inconsistent data extraction procedures.)
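The "ore sample" step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the field names, and the choice of "share of blank values" as the quality test are hypothetical, not part of any particular warehouse product. The fixed seed is there so that repeated runs draw the same sample, in the spirit of a consistent extraction method.

```python
import random


def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def blank_fraction(rows, fields, sample_size=100, seed=0):
    """Estimate the share of blank values per field from a random sample.

    rows   -- list of dicts, one per record (hypothetical layout)
    fields -- field names to test
    seed   -- fixed so the same sample is drawn on every run
    """
    rng = random.Random(seed)
    if len(rows) <= sample_size:
        sample = list(rows)
    else:
        sample = rng.sample(rows, sample_size)
    if not sample:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in sample if is_blank(row.get(field))) / len(sample)
        for field in fields
    }


# Made-up records standing in for a warehouse extract.
records = [
    {"name": "A", "cost": " "},
    {"name": "B", "cost": None},
    {"name": "", "cost": 12.5},
    {"name": "D", "cost": 0},
]
quality = blank_fraction(records, ["name", "cost"])
```

A field that comes back mostly blank is a sign that the territory holds spaces rather than ore, and that it may not be worth mining further.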
The best data mining is done by end-user domain experts, not IT staff
or programmers. Domain experts are specialists in different areas, like
marketing, research, production and so on. Too often, the IT staff does
not understand the business relationships between the data elements, and
that kind of understanding is crucial to making sense out of the data.
Technical understanding is important, but no data mining effort can succeed
without involving people with an understanding of the business and the
business objectives.
How much more cost-effective and agreeable it would be to design a data
warehouse around an understanding of what data might prove useful. It
would save:
(1) the expense of buying storage
(2) the frustration of entering useless data
(3) concerns about data privacy
(4) the time people need to learn to use the information
Involving domain experts at an early stage is crucial. There is much information
in corporate data systems that is incorrect or incomplete. For example,
many systems used for sales support have a place for product cost. However,
costs are managed by another department and tracked on another system.
As a result, it may not be possible to track costs accurately by product.
Determining this early on, before decisions are made based on bogus costs,
is crucial.
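One way a domain expert and a programmer might check the product-cost example together is to compare the cost field in the sales system against the system where costs are actually managed. The sketch below is a minimal illustration; the function name, the dictionary layout, and the tolerance are assumptions made for the example, not a description of any real system.

```python
def reconcile_costs(sales_costs, managed_costs, tolerance=0.01):
    """Compare per-product costs held in two systems.

    Both arguments map a product id to a cost, or to None when the field
    is blank. Returns (missing, mismatched): products with no usable cost
    in the sales system, and products whose two costs differ by more than
    `tolerance`.
    """
    missing, mismatched = [], []
    for product, managed_cost in sorted(managed_costs.items()):
        sales_cost = sales_costs.get(product)
        if sales_cost is None:
            missing.append(product)
        elif abs(sales_cost - managed_cost) > tolerance:
            mismatched.append(product)
    return missing, mismatched


# Made-up figures for illustration only.
sales_system = {"A": 10.0, "B": None, "C": 7.5}
cost_system = {"A": 10.0, "B": 4.0, "C": 9.0, "D": 1.0}
missing, mismatched = reconcile_costs(sales_system, cost_system)
```

If either list is long, cost-by-product figures drawn from the sales system should not be trusted until the people who manage costs have been consulted.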
Before undertaking a major data mining decision, ask two important questions:
(1) What problems are we trying to solve?
(2) Who understands the problems?
As you move ahead, stay focused on the problems, involve the
people who understand them, and use technology as a means, not an end.
Alan Weber is CEO of DataPlus Millennium LLC, and John Trewolla
is principal consultant. They may be reached by phone at 913.432.8311
or by e-mail at alan.weber@dpm2000.com.