Chapter 1 Introduction
Data science is everywhere. Most likely you have heard that data science can do wonders for you, that you will boost your appeal responses, that you will find new major donors, and that you will be a rock star. Although there is some truth in these statements, do not conclude that you need an analytics team. Performing quality analysis is a hard task, and most organizations fail in its execution. Most people are reluctant to change, and selling analytics to such people is difficult (Mankins and Sherer 2014). Even if you’re interested in learning only the hows of data science, you still must understand the whys.
Let’s look at the whys from three different perspectives.
- Need for analytics
- Adoption of analytics
- Success in analytics
1.1 Quick Introduction to Analytics
This field has many names: statistical analysis, quantitative analysis, data mining, machine learning, data analytics, business intelligence, and data science. While some names have fallen out of favor, others are trending.
Regardless of the size of the data, the common goal in all these fields is to learn something from your data. It requires grit and skill, however, to learn something useful and actionable. In recent years, data has grown so rapidly that it has become unmanageable. Plus, management leaders and data professionals have realized the derived value of such data (Amatriain and Basilico 2011).
The growth of available data and the increased need for insights from such data have given birth to specialized tools to manage and store data as well as to learn from it. We will not talk about these specialized tools, such as Hadoop, Hive, Mahout, and other tools that sound like names of animals or diseases, in this book. You can extend the principles and methods discussed here to learn and use such “big data” tools.
1.2 When Do You Need a Data Science Team?
As a believer in discovering insights from data, I am biased: I believe that data-driven decisions will make you and your organization more effective (Brynjolfsson, Hitt, and Kim 2011). The question is not whether you should use data or analytics, but which insights you would find most applicable.
After seeing countless news articles on big data, you may find it easy to believe that you need scores of data scientists or that you need infographics. For simple aggregation of data or calculation of descriptive statistics, you just need an introduction to statistics class.
To do (what I consider) data science, you need advanced knowledge of analytical concepts and, more importantly, you need heightened judgment to reject questions that yield “interesting” yet non-actionable results. These skills come with practice and at a cost.
1.3 Analytics Maturity Model
Before you begin your analytics journey, you must consider the current state of your data and your organization’s appetite for it. If your data is in bad shape due to inconsistent or irrelevant capturing of data elements, you will have a tough time getting something meaningful out of it. Your first priority in such cases should be to shore up your data and build consistent data-capturing practices. Once you regularly capture and retain useful data, you can begin your journey on the analytics path which, according to IBM, looks like Figure 1.1.
Let’s go through these stages:
Ad-hoc: In this stage, you keep data in spreadsheets. Every time you are asked to provide information, you spend a lot of time combining data sources.
Foundational: You have a customer relationship management (CRM) system, supported by a well-designed database. Finding and providing information takes less time. You often provide basic measurable activities.
Competitive: You efficiently utilize your database and reporting solutions. Decision makers access data and reports easily. You accurately measure and report past activities, answering questions like “What happened?”
Differentiating: You generate forward-looking information. The decision makers rely on this type of information to plan for the future. You answer questions like “What will happen?” and “Why did X happen?”
Breakaway: You help automate decision making and/or generate real-time information. The decision makers have the latest information readily available along with recommendations for subsequent steps. You answer questions like “What should we do?”
As proposed in the IBM model, your goal should always be to advance the status from the basics of getting and storing the right data and building the best data practices to measuring objectives and goals.
Although you should always attempt to increase data usage in your decision making, you need to assess whether you need an analytics team.
If you work for a smaller organization (say, an organization with fewer than 1,000 prospects), then, yes, you can improve your solicitation success rates or increase your retention rates using analytics. The cost of doing so with a full-time analyst, however, could be higher than any efficiency you gain.
For example, let’s say the cost of an appeal is $5 per piece and you send this appeal to 1,000 people. That appeal will cost you $5,000. Suppose that, instead, you develop a model that predicts which people are likely to respond. Based on that model, you choose to send the appeal to 600 people. Your cost of sending the mail is now $5 × 600 = $3,000, a total savings of $2,000.
But if these savings came at a cost of a $55,000 employee or a $15,000 model purchased from a vendor, you did not even break even on the money you spent on modeling.
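The break-even arithmetic above can be sketched in a few lines of Python. This is only a rough illustration under simplifying assumptions that are not in the text: every future appeal is the same size, the model always trims the list from 1,000 to 600, and only mailing costs are counted (changes in gift revenue are ignored). The function names are made up for this example.

```python
# Figures from the example in the text.
COST_PER_PIECE = 5         # dollars per mailed appeal
FULL_LIST_SIZE = 1_000     # people mailed without a model
MODELED_LIST_SIZE = 600    # people mailed when a model selects recipients


def mailing_savings(cost_per_piece, full_size, modeled_size):
    """Dollars saved on one appeal by mailing the smaller, model-selected list."""
    return cost_per_piece * (full_size - modeled_size)


def appeals_to_break_even(modeling_cost, savings_per_appeal):
    """Number of comparable appeals needed before savings cover the modeling cost."""
    # Ceiling division on integers: round up to a whole number of appeals.
    return -(-modeling_cost // savings_per_appeal)


savings = mailing_savings(COST_PER_PIECE, FULL_LIST_SIZE, MODELED_LIST_SIZE)
print(f"Savings per appeal: ${savings:,}")

# Assumption: the modeling cost is paid once and nothing else changes.
for label, cost in [("full-time employee", 55_000), ("vendor model", 15_000)]:
    needed = appeals_to_break_even(cost, savings)
    print(f"{label} (${cost:,}): {needed} appeals to break even")
```

Under these assumptions, a small shop would need many appeals of this size before the mailing savings alone pay for the modeling.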
The true value of an analytics person lies less in the models he or she can develop and more in his or her critical thinking. Any analyst should be able to solve a given problem, but good analysts will ask the questions nobody has asked before and provide new solutions to previously undiscovered problems. And that ability is worth acquiring.
1.4 What Will Data Science Do For You
After reading the previous section, you may wonder, “But what about Moneyball?” or “What about all the news of how ‘big data’ is going to save the world?” If you cut through the hype and find some problems in your organization worth solving, yes, analytics can indeed add a lot of value. Let’s look at an example of how analytics can add value.
Let’s say your annual appeal has a response rate of 8%, but then you send your appeal to only a selected population using a predictive model. From this selected population, 16% of the people respond, giving you a lift of 16/8 = 2.
Another way to look at it is by creating a cumulative gains chart, as shown in Figure 1.2. This chart shows the response rates using traditional methods of selecting whom to contact and the response rates using predictive models. Most likely, you will see improvements over your baseline methods.
For example, in this chart you see that if you contact 20% of the population using your baseline method, you will get a response of 20%. With a predictive model, if you contact 20% of the population, you will get a response of 50%. You gain efficiency by contacting people likely to respond as predicted by the models.
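To make lift and cumulative gains concrete, here is a minimal Python sketch. The ten prospects, their model scores, and their responses are invented for illustration only; they are not the data behind Figure 1.2, and the function names are hypothetical.

```python
def lift(model_rate, baseline_rate):
    """Ratio of the response rate with a model to the baseline response rate."""
    return model_rate / baseline_rate


def cumulative_gains(scores, responded, fractions=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Share of all responders reached when contacting the top-scored fraction."""
    # Rank prospects from the highest model score to the lowest.
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    total_responders = sum(ranked)
    gains = {}
    for fraction in fractions:
        cutoff = round(fraction * len(ranked))
        gains[fraction] = sum(ranked[:cutoff]) / total_responders
    return gains


# Response rates from the text, in percent: 16% with the model, 8% without.
print("Lift:", lift(16, 8))

# Made-up data: model score and actual response (1 = responded) per prospect.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, responded)
for fraction, share in gains.items():
    print(f"Contact top {fraction:.0%}: reach {share:.0%} of responders")
```

With random selection, contacting 20% of the list would reach about 20% of the responders; in this toy data, the top-scored 20% contains half of them, which is the kind of gap a cumulative gains chart displays.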
If applied thoughtfully, insights generated using analytics can help your organization do all of these better:
- Find more prospects
- Build a stronger prospect base
- Retain more donors
- Manage the right prospects
- Recommend giving options to your donors
- Recruit gift officers
- Invite people to events
- “Up-sell” online giving
- Create stewardship articles
- Staff the right geographic regions
- Assess campaign readiness
- Scale prospect research
1.5 How to Build a Case for an Analytics Team
The best way to build a case for an analytics team is to report on the return on investment (ROI) on analytics as applied to your organization’s existing problems.
Are there any problems that worry you about the future of your organization? Problems such as how to retain donors, how to find new donors, how to increase giving, how to increase participation rates, how to focus your gift officers’ efforts, how to provide timely information on your prospects to your staff, or how to know that you have enough prospects to reach a campaign goal?
The following is a simple way to document these problems.
Create a table with five columns as shown in Table 1.1. In the first column, list all the problems that worry you about the future of your organization¹. In the second column, record your thoughts about solving those problems.
Take a break.
Go over the list again and add any other information that you can think of. In the third column, make notes of any problems that you think can be solved by data-driven decision making. In the fourth column, make notes of any outcomes such as improved processes, saved time, or increased giving. In the last column, enter estimated savings or earnings.
Once you complete the list, you may find that an outcome of fixing a single problem could be worth thousands, if not millions of dollars to you. If that is the case, congratulations! You just built a strong case for your analytics team.
Summarize all the outcomes, provide the estimated dollar amounts in savings or new income, and present the findings to your management team. When leaders see significant risks or opportunities, they are more likely to invest and support the idea.
Table 1.1
|Problem|Thoughts on solving it|Solvable with data-driven decisions?|Outcome|Estimated savings or earnings|
|---|---|---|---|---|
|Acquire more prospects|Purchase lists|Yes, we can purchase lists based on profiles of our existing donors|1,000 more prospects in the database|$5,000|
|Increase retention|Learn interests of donors|Yes, we can build a dataset of donor interests|Improved response and retention rates|$10,000|
|Shortage of donors|Find new markets|Yes, using geographic modeling, we can find new regions|New opportunities|$500,000|
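As a rough sketch of how the last column could be summarized for management, the Python snippet below totals the estimates from Table 1.1 and compares them with a cost. The $55,000 cost is an assumption borrowed from the employee figure used earlier in the chapter, and the simple ROI formula (net value divided by cost) is one common convention, not a method prescribed by the text.

```python
# Problems and estimated dollar values, copied from Table 1.1.
estimates = [
    ("Acquire more prospects", 5_000),
    ("Increase retention", 10_000),
    ("Shortage of donors", 500_000),
]

# Assumption: one analyst costing $55,000, the figure used earlier in the chapter.
ANALYTICS_COST = 55_000

total_value = sum(value for _, value in estimates)
net_value = total_value - ANALYTICS_COST
roi = net_value / ANALYTICS_COST  # simple ROI: net value per dollar spent

print(f"Total estimated value: ${total_value:,}")
print(f"Net of analytics cost: ${net_value:,}")
print(f"Simple ROI: {roi:.1f}x")
```

The estimates are guesses by design, so it may help to present a range (for example, a cautious and an optimistic total) rather than a single number.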
If your organization is unwilling to invest in a new team or you just don’t have resources to do so (though you have a solid case), just start doing. Doing is far more powerful than saying. Lead with an example. Tackle a reasonable problem and generate an analytical solution to it. Show the results and projected outcomes to a potential consumer of your information. Be very picky about choosing your first consumer. This consumer should be your champion and should be able to communicate the power of analytics to other people, including your leadership.
1.6 Differentiate Actionable from “Interesting”
It is very easy to think that you can apply analytics to every problem (true, you can), but the bigger challenge is separating “interesting” from “actionable.” For example, social network analysis is quite cool and you may apply it to your data to find network graphs. Yes, the network graphs look good and interesting; how to put them into action, however, is a bigger challenge². That is why it is important to think first of the biggest problems or questions that your organization is facing. By solving these problems, you could provide a new direction for your organization. If you think of solutions before the problems, you can forget implementing them because you will have a hard time creating the buy-in and “selling” your solutions.
1.7 Analyst Mindset
If you’re still reading this chapter, I assume that you want to build an analytics team, and I anticipate your next question might be “What type of people do I hire for such positions?” I consider the following qualities, which make up the mindset of analysts, critical for the success of such a team.
1.7.1 Curiosity
Some of the world’s biggest inventions happened because someone was curious about something. It would be very nice if we could describe the problems with all the parameters to our analysts and then ask them to find solutions. You know this: it doesn’t work that way. What worked in college or graduate school hardly works in the professional world. In a school setting, you solve a given problem, whereas, in the professional world, you interpret problems.
Your analysts first and foremost must be curious—curious to ask questions, curious to wonder whether there is a better way of doing things, curious to find information, and curious to talk to people and understand their problems.
1.7.2 Balanced Skepticism
To succeed in this type of role, one needs to have balanced skepticism toward existing practices, available data, current conclusions, and cultural biases³. As O’Neil (2013) suggests in her book On Being a Data Skeptic, “a skeptic is someone who maintains a consistently inquisitive attitude toward facts, opinions, or (especially) beliefs stated as facts.”
Skepticism is further helpful in balancing the belief of “data can solve every problem” with “I don’t know whether data can support that question, but I will find out.”
1.7.3 Persistence
Real-world data is messy. Cleaning and preparing such data takes a lot of time⁴. When you add the learning curve, intricacies, and sheer difficulties of using specialized tools, the whole process no doubt frustrates you. Just when you think you’re ready, your underlying question changes, newer data becomes available, or you are asked for something completely different. To survive through this and still succeed, one needs persistence, and a lot of it. I have seen many talented professionals quit (not only quit projects, but quit their jobs) because they wanted quick results and did not persist through the messiness of our business.
1.7.4 Hunger to Learn and Improve
As data grows, the tools available to gather, manipulate, and analyze data are changing, too. It is challenging to keep up with the latest technology, but practitioners of data science should willingly give up inefficient tools for better ones. Doing so requires regularly reading and learning about the field and picking up relevant tools.
A good analyst will separate herself from an ordinary analyst with such a mindset. Continuous improvement of processes, tools, methods, and, most importantly, of oneself should be the cornerstone of an analyst’s mindset.
When you immerse yourself in similar fields and you constantly read and learn about such fields or industries, innovation happens.
When you neglect the other fields, you don’t innovate; you repeat.
1.7.5 Intrinsic Motivation
Research shows that an intrinsically motivated person is better at learning than an extrinsically motivated person (Ryan and Deci 2000). People who are intrinsically motivated enjoy their learning, persist longer, and are more creative than those who are extrinsically motivated.
The extrinsically motivated expect and await rewards. It is hard to keep an extrinsically motivated employee happy in a job requiring nimbleness, curiosity, and continuous improvement.
1.7.6 Portfolio Approach
While tinkering with data and developing various data products, a good approach is the portfolio approach. As Karl Ulrich (a Wharton Business School professor) explained in his talk⁵, one must generate many ideas and work on them simultaneously while looking for breakthrough ideas. One of them is likely to be a winner.
1.7.7 Selling
Although you may disagree, it is a fact that at all times, you are selling something. In every conversation, you are explaining your perspective or convincing others to accept your idea. Selling is critical when you want your users to take action on your insights and recommendations. They will not take an action if they can’t trust your models and theories, or, worse, you.
You need to explain your processes using stories or analogies so that you don’t have to hard sell, but make them understand that you are solving their problems. Selling becomes easier if you clearly communicate that you are solving your users’ problems and that you explain your methods without confusing your listeners. I’ll paraphrase Wayne Gretzky: “You waste 100% of your analysis that your readers don’t take action on.”⁶
1.8 Technical Expertise
You may ask, “Why did you put mindset and softer characteristics first and technical expertise second?” “Aren’t technical skills more important than soft skills?” Yes, they are important because analysts would be unable to do their jobs if they didn’t have the technical skills, but they would be unable to succeed if they didn’t have soft skills. Plus, one can be trained in technical skills, but it is very hard to extrinsically cultivate soft skills.
1.9 Areas of Importance
To do a high level of analytics or use data science, the following areas are of high importance.
1.9.1 Data Mining / Machine Learning / Statistics
Data mining, machine learning, or applied statistics. Whatever you call them, these skills are the foundation of analytics. Data mining is a general name for the process of finding patterns from data. Machine learning is a field of computer science that focuses on using various pattern-detection algorithms. Some examples of machine learning algorithms are association rules, nearest neighbors, decision trees, random forests, Bayesian methods, and neural networks. Some methods from applied statistics have also made their way into machine learning. Multiple linear regression, logistic regression, and Bayesian methods are the most used techniques from the applied statistics field.
1.9.2 Data Visualization
In this infographic-crazy world, it is easy to dismiss graphics. I know I do.
Bad data visualizations take up the whole space to describe very few data points (think people, flags, buildings, and exploding pie charts), whereas good data visualizations get out of your way and actually show the underlying data (think tables, simple charts, and patterns). If carefully crafted, data visualizations can tell powerful stories. The key is to avoid the trap of making them overly beautiful but hardly actionable.
Noah Iliinsky⁷, a data visualization expert, said that “data visualizations are advertisements and not art.” Your main objectives are: make the visualizations tell your story, and let the data/patterns stand out and not distract your reader. If you follow the principles of effective data visualizations, you will more than likely make your visualizations actionable and good looking.
1.9.3 Database Management
Of all the processes in an analytics project, data gathering and manipulation take the most time. If you are unable to get the required data in a structure suitable for analysis, you’ll spend even more time manipulating the data.
Structured Query Language (SQL) is handy in such cases. Most likely, your data is stored in a database management system such as Oracle™ or Microsoft SQL Server®. There are three things you must know to efficiently get the data out of such systems.
Database structure: Knowing which data elements are stored in which tables and how the tables are connected to each other.
Concepts: Understanding the theory and principles of relational databases will help you get the required data faster and with accuracy.
SQL: You’ll need to write queries to access the desired data.
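The three items above can be illustrated with a small, self-contained Python sketch that uses the standard-library `sqlite3` module in place of Oracle or SQL Server. The `donors` and `gifts` tables, their columns, and every row are hypothetical; your own database will have a different structure, which is exactly why knowing that structure matters.

```python
import sqlite3

# An in-memory SQLite database stands in for a production system.
conn = sqlite3.connect(":memory:")

# Database structure: two hypothetical tables linked by donor_id.
conn.executescript("""
    CREATE TABLE donors (
        donor_id INTEGER PRIMARY KEY,
        name     TEXT,
        city     TEXT
    );
    CREATE TABLE gifts (
        gift_id   INTEGER PRIMARY KEY,
        donor_id  INTEGER REFERENCES donors(donor_id),
        amount    REAL,
        gift_date TEXT
    );
    INSERT INTO donors VALUES
        (1, 'Donor A', 'Chicago'),
        (2, 'Donor B', 'Boston'),
        (3, 'Donor C', 'Chicago');
    INSERT INTO gifts VALUES
        (1, 1, 100.0, '2016-01-15'),
        (2, 1, 250.0, '2017-03-02'),
        (3, 2,  50.0, '2017-06-20');
""")

# Concepts + SQL: a LEFT JOIN keeps donors who have no gifts,
# and GROUP BY produces one summary row per donor.
query = """
    SELECT d.name,
           COUNT(g.gift_id)           AS gift_count,
           COALESCE(SUM(g.amount), 0) AS total_given
    FROM donors AS d
    LEFT JOIN gifts AS g ON g.donor_id = d.donor_id
    GROUP BY d.donor_id, d.name
    ORDER BY total_given DESC, d.name
"""
rows = conn.execute(query).fetchall()
for name, gift_count, total_given in rows:
    print(f"{name}: {gift_count} gifts, ${total_given:,.2f} total")

conn.close()
```

An inner join would silently drop Donor C, who has never given; choosing the join type is a small example of how relational concepts affect the accuracy of what you pull.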
1.9.4 Programming
You may complete various analytics projects without writing a single piece of code, but programming offers tools to become efficient. The other benefits of using a programming language are reproducibility, repeatability, and readability. Reproducibility helps you track your steps when someone asks you how you arrived at a certain number. Repeatability helps you modify your process when someone asks you to make some changes to your analysis. Readability helps you and others to understand the logic of your analysis.
Open source and free statistical and scientific programming languages such as R and Python are helpful in our analytics pursuit, as both languages provide countless libraries for various topics. Plus, they both make data manipulation and analysis very easy, as you will find out in this book.
1.9.5 Communication Skills / Storytelling
Imagine yourself speaking in front of the consumers of your analysis. You want to describe how your predictive models performed. You can show them the “confusion matrix,” that is, the errors and accuracy of your model. Or you can describe a single person (and her characteristics) from your data and how those characteristics impact your model. Which version do you think your audience will most likely understand, remember, and trust? I am willing to bet on the second one. Even the most serious scientists enjoy good stories. I take huge inspiration from The Economist articles. These often start with the story of one person and later describe a wider phenomenon with detailed statistics.
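For readers who have not met the term, the “confusion matrix” mentioned above is simply a count of correct and incorrect predictions by type. The sketch below builds one in plain Python from made-up labels (1 = responded, 0 = did not respond); the data and function names are illustrative assumptions, not output from any model in this book.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each combination of actual and predicted outcome (1 = responded)."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(1, 1)],   # predicted a response, and got one
        "false_negative": counts[(1, 0)],  # missed someone who responded
        "false_positive": counts[(0, 1)],  # predicted a response that never came
        "true_negative": counts[(0, 0)],   # correctly predicted no response
    }


def accuracy(matrix):
    """Share of all predictions that were correct."""
    correct = matrix["true_positive"] + matrix["true_negative"]
    return correct / sum(matrix.values())


# Made-up outcomes for ten prospects.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

matrix = confusion_matrix(actual, predicted)
for cell, count in matrix.items():
    print(f"{cell}: {count}")
print(f"accuracy: {accuracy(matrix):.0%}")
```

Numbers like these are useful for the analyst, but as the paragraph above argues, a story about one of these prospects is more likely to stay with your audience.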
1.10 Where to Find Them
You may look for one person with all of the above skills, but you may also be able to build a team with complementary skills. I would like to see organizations create another important position that I call the insights manager. Although the analysts themselves can communicate the results to the stakeholders, you will see better results if you have a dedicated person to work with management team members, listen, ask questions, and formulate data questions for the analysts. Once the analysts complete their analysis, the insights manager then builds a plan to put the analysis in action and makes sure that the analysis is used in decision making. This person frames the right questions, applies the analysis, and understands that a mediocre analysis that is used is more effective than an excellent analysis that sits on a desk.
Now that you know the technical skills and the required mindset of a sound analytics team, the next logical question you may have is where to find this talent. Two obvious choices are to grow or to hire the talent.
Growing talent in house would be a good choice if a person interested in this field is interested for the right reason, that is, not to make a quick buck but to learn various tools and their uses. If an employee, in your mind, has already passed all the mindset tests, it is quite easy to put her on the above training or ask her to complete related data science courses at https://www.coursera.org. If you are lucky to work near universities, you could also look for interns who major in computer science or other quantitative fields. Also, you have this book in your hands. :)
Although hiring the almighty data scientist or the measly data analyst may seem like an obvious choice, both are quite hard to find, let alone hire. If you go about hiring, you could look at recent graduates from applied statistics or analytics programs, such as North Carolina State University’s MS program, or you could work with a recruiter specializing in analytics. Most likely, your best hires are passive candidates who are already doing well in their current job. Wherever you find them, I recommend testing the technical skills and analytical problem-solving skills of these candidates. The hardest qualities to test, I have learned, are perseverance, patience, and hunger to learn. The following job description may help you create your own job posting.
Sample Job Description
We are looking for an experienced data analyst who enjoys working with messy datasets and finding patterns of business significance from them. An ideal candidate would have a graduate degree or equivalent coursework in a technical and quantitative field along with strong programming skills.
Data Manipulation, Enrichment, and Analysis (80%)
- Manage, acquire, clean, and manipulate data to support analyses and reporting
- Use machine learning and advanced statistical techniques to draw meaningful and actionable recommendations from various data sources
- Use various software and tools (R, SQL, Weka, Python and others) to analyze and present the data analysis
Concept Development and Learning (10%)
- Build and keep up with the knowledge of literature, practices, and techniques in data science, business intelligence, and management communities
- Develop product ideas and solutions to increase our operational efficiency
- Promote data-driven culture
- Build and share analytics expertise
- Strong critical thinking and project-management skills along with curiosity, a passion for learning, and balanced skepticism
- Graduate degree or equivalent course work in a technical or quantitative area
- Strong computing and programming expertise
Mankins, Michael, and Lori Sherer. 2014. “Help Reluctant Employees Put Analytic Tools to Work.” Harvard Business Review. https://hbr.org/2014/10/help-reluctant-employees-put-analytic-tools-to-work.
Amatriain, Xavier, and Justin Basilico. 2011. “Netflix Recommendations.” http://bit.ly/2tnLnKt.
Brynjolfsson, Erik, Lorin Hitt, and Heekyung Kim. 2011. “Strength in Numbers: How Does Data-Driven Decision-Making Affect Firm Performance?”
O’Neil, Cathy. 2013. On Being a Data Skeptic. O’Reilly Media, Inc.
Ryan, Richard M, and Edward L Deci. 2000. “Self-Determination Theory and the Facilitation of Intrinsic Motivation, Social Development, and Well-Being.” American Psychologist 55 (1). American Psychological Association: 68.
¹ Don’t note the lack of your promotion as a worry – numbers will always be against you!
² LinkedIn has one of the largest relationships datasets in the world. Even they put a stop to their network mapping tool in September 2014. These maps did not add anything to our knowledge.
³ I’m suggesting a careful and objective point of view toward everything and not becoming a “devil’s advocate,” which I think people use as a shield while bringing down other people’s ideas.
⁴ Some comment that 80% of any data analysis project is spent on data preparation.
⁶ Original: “You miss one hundred percent of the shots you don’t take.”