May 03, 2017

What’s next for the Digital Analytics Association?

I’ve been a member of the Digital Analytics Association for, it turns out, about twelve years – over half my professional life. In that time I’ve seen the organization grow and blossom into a vibrant community of professionals who are passionate about the work they do and about helping others to develop their own skills and career in digital analytics.

When the DAA started (as the WAA), web analytics was a decidedly niche activity, not considered as rigorous or demanding as ‘proper’ data mining or database development. Many of its early practitioners, like me, did not come from formal data backgrounds; we were to a large extent making things up as we went along, arguing with one another (often in lobby bars) about things like the proper definition of a page view, or the relative merits of JavaScript tags vs log files.

We didn’t know it at the time, but the niche activity we were helping to define would grow to dominate the entire field of data analytics. Today, transactional (i.e. log-like) and unstructured data comprise the vast majority of data being captured and analyzed worldwide and the analytical principles and techniques that the DAA championed have become the norm, not the exception.

The DAA and its members can justly derive a certain amount of satisfaction from knowing we were part of something so early on, but now that the rest of the world has shown up to the party that we started, how do we continue to differentiate the organization and add value to its members and the industry?

It’s to help answer this and other interesting and challenging questions facing the DAA that I’ve put my name forward for a position on the organization’s board. You can read my nomination (and, hopefully, vote for me) here if you’re a DAA member. After twelve years of benefiting from my DAA membership, it’s time to give something back to the organization.

If I’m elected to the board, I’ll devote my energies to helping DAA members adapt to and embrace the next set of transformations that are taking place within the industry. In my role at Microsoft I’m participating in a very rapid shift from traditional descriptive analytics, based around a recognizable cycle of do/measure/analyze/adjust, to machine learning-based optimization of business processes, particularly digital marketing. Predictive analytics and data science skills are therefore becoming more and more important in digital analytics, while the range of data and scenarios is exploding. This raises tricky questions for the DAA: Which skillsets and data scenarios should the association focus its energies on, and how to stay relevant as the industry changes so rapidly?

A big part of the answer, I believe, lies with the DAA members ourselves. At a DAA member event in Seattle last week, I met the excellent Scott Fasser of HackerAgency and had a fascinating conversation with him about a current passion of mine, multi-armed bandit experimentation for digital marketing. There are many experienced members of the DAA like Scott, who have deep expertise in different areas of digital analytics, and who are keen to share their knowledge with others. We need to find ways to connect the Scotts of this world to people who can benefit from their expertise, and more broadly connect the DAA’s more experienced members with those newer to the discipline so that they can pass on their hard-won knowledge.

Finally, given that so many new people have moved into the analytics neighborhood, the DAA needs to get out and meet some of the new neighbors rather than peering out through the curtains muttering about hipsters and gentrification. Many new groups of analytics & data science professionals have sprung up over the years, both formal and informal, and there are likely profitable connections to be made with at least some of these organizations, many of which share some of the same members as the DAA.

So if you’d like to see me put my shoulder to the wheel to address these and other challenges, please vote for me by May 12.


October 22, 2015

6 steps to building your Marketing Data Strategy

Your company has a Marketing Strategy, right? It’s that set of 102 slides presented by the CMO at the offsite last quarter, immediately after lunch on the second day, the session you may have nodded off in (it’s ok, nobody noticed. Probably). It was the one that talked about customer personas and brand positioning and social buzz, and had that video towards the end that made everybody laugh (and made you wake up with a start).

Your company may also have a Data Strategy. At the offsite, it was relegated to the end of the third day, after the diversity session and that presentation about patent law. Unfortunately several people had to leave early to catch their flights, so quite a few people missed it. The guy talked about using Big Data to drive product innovation through continuous improvement, and he may (at the very end, when your bladder was distracting you) have mentioned using data for marketing. But that was something of an afterthought, and was delivered with almost a sneer of disdain, as if using your company’s precious data for the slightly grubby purpose of marketing somehow cheapened it.

Which is a shame, because Marketing is one of the most noble and enlightened ways to use data, delivering a direct kick to the company’s bottom line that is hard to achieve by other means. So when it comes to data, your marketing shouldn’t just grab whatever table scraps it can and be grateful; it should actually drive the data that you produce in the first place. This is why you don’t just need a Marketing Strategy, or a Data Strategy: You need a Marketing Data Strategy.

A Marketing Data What?

What even is a Marketing Data Strategy, anyway? Is it even a thing? It certainly doesn’t get many hits on Bing, and those hits it does get tend to be about building a data-driven Marketing Strategy (i.e. a marketing strategy that focuses on data-driven activities). But that’s not what a Marketing Data Strategy is, or at least, that’s not my definition, which is:

A Marketing Data Strategy is a strategy for acquiring, managing, enriching and using data for marketing.

The four boldface words are the key here. If you want to make the best use of data for your marketing, you need to be thinking about how you can get hold of the data you need, how you can make it as useful as possible, and how you can use your marketing efforts themselves to generate even more useful data – creating a positive feedback loop and even contributing to the pool of Big Data that your Big Data guy is so excited about turning into an asset for the company.

Building your Marketing Data Strategy

So now that you know why it’s important to have a Marketing Data Strategy, how do you put one together? Everyone loves a list, so here are six steps you can take to build and then start executing on your Marketing Data Strategy.

Step 1: Be clear on your marketing goals and approach

This seems obvious, but it’s a frequently missed step. Having a clear understanding of what you’re trying to achieve with your digital marketing will help you to determine what data you need, and what you need to do with/to it to make it work for you. Ideally, you already have a marketing strategy that captures a lot of this, though the connection between the lofty goals of a marketing strategy (sorry, Marketing MBA people) and the practical data needs to execute the strategy is not always clear.

Here are a few questions you should be asking:

Get new customers, or nurture existing ones? If your primary goal is to attract new customers, you’ll need to think differently about data (for example relying on third-party sources) than if you are looking to deepen your relationship with your existing customers (about whom you presumably have some data already).

What are your goals & success criteria? If you are aiming to drive sales, are you more interested in revenue, or margin? If you’re looking to drive engagement or loyalty, are you interested in active users/customers, or engagement depth (such as frequency of usage)?

Which communications strategies & channels? The environments in which you want to engage your audience make a big difference to your data needs – for example, you may have more data at your disposal to target people using your website compared to social or mobile channels.

Who’s your target audience? What attributes identify the people you’d most like to reach with your marketing? Are they primarily demographic (e.g. gender, age, locale) or behavioral (e.g. frequent users, new users)?

What is your conversion funnel? Can you convert customers entirely online, or do you need to hand over to humans (e.g. in store) at some point? If the latter, you’ll need a way to integrate offline transaction data with your online data.

These questions will not only help you identify the data you’ll need, but also some of the data that you can expect to generate with your marketing.

Step 2: Identify the most important data for your marketing efforts

Once you’re clear on your goals and success criteria, you need to consider what data is going to be needed to help you achieve them, and to measure your success.

The best way to break this down is to consider which events (or activities) you need to capture and then which attributes (or dimensions) you need on those events. But how to pick the events and attributes you need?

Let’s start with the events. If your marketing goals include driving revenue, you will need revenue (sales) events in your data, such as actual purchase amounts. If you are looking to drive adoption, then you might need product activation events. If engagement is your goal, then you will need engagement events – this might be usage of your product, or engagement with your company website or via social channels.

Next up are the attributes. Which data points about your customers do you think would be most useful for targeted marketing? For example, does your product particularly appeal to men, or women, or people within a certain geography or demographic group?

For example, say you’re an online gambling business. You will have identified that geo/location information is very important (because online gambling is banned in some countries, such as the US). Therefore, good quality location information will be an important attribute of your data sources.

At this step in the process, try not to trip yourself up by second-guessing how easy or difficult it will be to capture a particular event or attribute. That’s what the next step (the data audit) is for.

Step 3: Audit your data sources

Now to the exciting part – a data audit! I’m sure the very term sends shivers of anticipation down your spine. But if you skip this step, you’ll be flying blind, or worse, making costly investments in acquiring data that you already have.

The principle of the data audit is relatively simple – for every dataset you have which describes your audience/customers and their interaction with you, write down whether (and at what kind of quality) they contain the data you need, as identified in the previous step:

  • Events (e.g. purchases, engagement)
  • Attributes (aka dimensions, e.g. geography, demographics)
  • IDs (e.g. cookies, email addresses, customer IDs)

The key to keeping this process from consuming a ton of time and energy is to make sure you’re focusing on the events, attributes and IDs which are going to be useful for your marketing efforts. Documenting datasets in a structured way is notoriously challenging (some of the datasets we have here at Microsoft have hundreds or even thousands of attributes), so keep it simple, especially the first time around – you can always go back and add to your audit knowledge base later on.

The one type of data you probably do want to be fairly inclusive with is ID data. Unless you already have a good idea which ID (or IDs) you are going to use to stitch together your data, you should capture details of any ID data in your datasets. This will be important for the next step.

To get you started on this process, I’ve created a very simple data audit template which you can download here. You’re welcome.
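If you would rather keep the audit in code than in a spreadsheet, here is a minimal sketch of the same idea. It is not the template mentioned above; the class name, the dataset names, and the "needed" events and attributes are all hypothetical placeholders for whatever you identified in Step 2.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetAudit:
    """One row of the audit: what a dataset holds that marketing needs."""
    name: str
    events: dict = field(default_factory=dict)      # event -> quality note
    attributes: dict = field(default_factory=dict)  # attribute -> quality note
    ids: list = field(default_factory=list)         # every ID found (be inclusive)


# Hypothetical needs carried over from Step 2.
NEEDED_EVENTS = {"purchase", "product_activation"}
NEEDED_ATTRIBUTES = {"country", "age_band"}


def find_gaps(audits):
    """Return the needed events and attributes that no audited dataset covers."""
    seen_events = set()
    seen_attributes = set()
    for audit in audits:
        seen_events.update(audit.events)
        seen_attributes.update(audit.attributes)
    return {
        "events": NEEDED_EVENTS - seen_events,
        "attributes": NEEDED_ATTRIBUTES - seen_attributes,
    }


# Hypothetical audit entries for two datasets.
audits = [
    DatasetAudit(
        "web_logs",
        events={"page_view": "good"},
        attributes={"country": "IP-derived, approximate"},
        ids=["cookie_id"],
    ),
    DatasetAudit(
        "order_system",
        events={"purchase": "good"},
        ids=["customer_id", "email"],
    ),
]

gaps = find_gaps(audits)
print(gaps)
```

The free-text quality notes are deliberately loose; the point is only to record what exists, roughly how good it is, and which IDs each dataset carries.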

Step 4: Decide on a common ID (or IDs)

This is a crucial step. In order for you to build a rich profile of your users/customers that will enable you to target them effectively with marketing, you need to be able to stitch the various sources of data about them together, and for this you need a common ID.

Unless you’re spectacularly lucky, you won’t be issuing (or logging) a single ID consistently across all touchpoints with your users, especially if you have things like retail stores, where IDing your customers reliably is pretty difficult (well, for the time being, at least). So you’ll need to pick an ID and use this as the basis for a strategy to stitch together data.

When deciding which ID or IDs to use, take into consideration the following attributes:

  • The persistence of the ID. You might have a cookie that you set when people come visit your website, but cookie churn ensures that that ID (if it isn’t linked to a login) will change fairly regularly for many of your users, and once it’s gone, it won’t come back.
  • The coverage of the ID. You might have a great ID that you capture when people make a purchase, or sign up for online support, but if it only covers a small fraction of your users, it will be of limited use as a foundation for targeted marketing unless you can extend its reach.
  • Where the ID shows up. If your ID is present in the channels that you want to use for marketing (such as your own website), you’re in good shape. More likely, you’ll have an ID which has good representation in some channels, but you want to find those users in another channel, where the ID is not present.
  • Privacy implications. User email address can be a good ID, but if you start transmitting large numbers of email addresses around your organization, you could end up in hot water from a privacy perspective. Likewise other sensitive data like Social Security Numbers or credit card numbers – do not use these as IDs.
  • Uniqueness to your organization. If you issue your own ID (e.g. a customer number) that can have benefits in terms of separating your users from lists or extended audiences coming from other providers; though on the other hand, if you use a common ID (like a Facebook login), that can make joining data externally easier later.

Whichever ID you pick, you will need to figure out how you can extend its reach into the datasets where you don’t currently see it. There are a couple of broad strategies for achieving this:

  • Look for technical strategies to extend the ID’s reach, such as cookie-matching with a third-party provider like a DMP. This can work well if you’re using multiple digital touchpoints like web and mobile (though mobile is still a challenge across multiple platforms).
  • Look for strategies to increase the number of signed-in or persistently identified users across your touchpoints. This requires you to have a good reason to get people to sign up (or sign in with a third-party service like Facebook) in the first place, which is more of a business challenge than a technical one.

As you work through this, make sure you focus on the touchpoints/channels where you most want to be able to deliver targeted messaging – for example, you might decide that you really want to be able to send targeted emails and complement this with messaging on your website. In that case, finding a way to join ID data between those two specific environments should be your first priority.
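To make the stitching idea concrete, here is a small sketch that joins website activity (keyed on a cookie) to email activity (keyed on a customer ID) through a mapping captured at sign-in. All field names and records are invented for illustration, and the mapping table is an assumption about how you might extend the ID's reach, not a description of any particular product.

```python
from collections import defaultdict

# Hypothetical records from two touchpoints that use different IDs.
web_visits = [
    {"cookie_id": "c1", "page": "/widgets"},
    {"cookie_id": "c2", "page": "/pricing"},
    {"cookie_id": "c9", "page": "/home"},  # never signed in, so unmatched
]
email_clicks = [
    {"customer_id": "u100", "campaign": "spring_sale"},
]

# Assumed mapping, recorded whenever a user signs in on the website:
# the cookie seen at sign-in is linked to the customer ID.
cookie_to_customer = {"c1": "u100", "c2": "u200"}


def stitch(web_visits, email_clicks, cookie_to_customer):
    """Group activity from both sources under the common customer ID.

    Returns the profiles plus the share of web visits that could be matched.
    """
    profiles = defaultdict(lambda: {"pages": [], "campaigns": []})
    unmatched = 0
    for visit in web_visits:
        customer_id = cookie_to_customer.get(visit["cookie_id"])
        if customer_id is None:
            unmatched += 1
            continue
        profiles[customer_id]["pages"].append(visit["page"])
    for click in email_clicks:
        profiles[click["customer_id"]]["campaigns"].append(click["campaign"])
    coverage = 1 - unmatched / len(web_visits) if web_visits else 0.0
    return dict(profiles), coverage


profiles, coverage = stitch(web_visits, email_clicks, cookie_to_customer)
print(profiles)
print(f"share of web visits matched: {coverage:.0%}")
```

The coverage figure is the number to watch: it tells you how much of a channel your chosen ID actually reaches, which is the gap the two strategies above are meant to close.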

Step 5: Find out what gaps you really need to fill

Your data audit and decisions around IDs will hopefully have given you some fairly good indications of where you’re weak in your data. For example, you may know that you want to target your marketing according to geography, but have very little geographic data for your users. But before you run off to put a bunch of effort into getting hold of this data, you should try to verify whether a particular event or attribute will actually help you deliver more effective marketing.

The best way to do this is to run some test marketing with a subset of your audience that has a particular attribute or behavior, and compare the results with similar messaging to a group which does not have this attribute (but is as similar in other regards as you can make it). I could write another whole post on this topic of A/B testing, because there is a myriad of ways that you can mess up a test like this and invalidate your results, or I could just recommend you read the work of my illustrious Microsoft colleague, Ronny Kohavi.

If you are able to run a reasonably unbiased bit of test marketing, you will discover whether the datapoint(s) you were interested in actually make a difference to marketing outcomes, and are therefore worth pursuing more of. You can end up in a bit of a chicken-and-egg situation in this regard, because of course you need data in the first place to test its impact, and even if you do have some data, you need to test over a sufficiently large population to be able to draw reliable conclusions. To address this, you could try working with a third-party data provider over a limited portion of your user base, or over a population the provider provides.
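As a rough illustration of what "reliable conclusions" means here, the sketch below compares the conversion rates of two groups with a pooled two-proportion z-test. This is one common choice rather than the only valid one, the numbers are made up, and it assumes the two groups are independent and large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates with a pooled two-proportion z-test.

    Returns (difference_in_rates, z, two_sided_p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, z, p_value


# Hypothetical test: group A has the attribute of interest, group B does not.
diff, z, p_value = two_proportion_z_test(conv_a=120, n_a=2000, conv_b=90, n_b=2000)
print(f"difference in conversion rate: {diff:.3%}, z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the attribute is associated with a real difference in response, but it says nothing about whether the two groups were comparable in the first place, which is where most of these tests go wrong.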

Step 6: Fix what you can, patch what you can’t, keep feeding the beast

Once you’ve figured out which data you actually need and the gaps you need to fill, the last part of your Marketing Data Strategy is about tactics to actually get this data. Of course the tactics then represent an ongoing (and never-ending) process to get better and better data about your audience. Here are four approaches you can use to get the data you need:

Measure it. Adding instrumentation to your website, your product, your mobile apps, or other digital touchpoints is (in principle) a straightforward way of getting behavioral events and attributes about your users. In practice, of course, a host of challenges exist, such as actually getting the instrumentation done, getting the signals back to your datacenter, and striking a balance between well-intentioned monitoring of your users and appearing to snoop on them (we know a little bit about the challenges of striking this balance).

Gather it. If you are after explicit user attributes such as age or gender, the best way to get this data is to ask your users for it. But of course, people aren’t just going to give you this information for no reason, and an over-nosy registration or checkout form is a sure-fire way to increase drop-out from your site, which can cost you money (just ask Bryan Eisenberg). So you will need to find clever ways of gathering this data which are linked to concrete benefits for your audience.

Model it. A third way to fill in data gaps is to use data modeling to extrapolate attributes that you have on some of your audience to another part of your audience. You can use predictive or affinity modeling to model an existing attribute (e.g. gender) by using the behavioral attributes of existing users whose gender you know to predict the gender of users you don’t know; or you can use similar techniques to model more abstract attributes, such as affinity for a particular product (based on signals you already have for some of your users who have recently purchased that product). In both cases you need some data to base your models on and a large enough group to make your predictions reasonably accurate. I’ll explore these modeling techniques in another post.
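Until that post arrives, here is a deliberately simple stand-in for the idea: predict a missing attribute for a user by majority vote among the most similar users whose attribute you already know (a k-nearest-neighbors approach). The features, labels and data are all hypothetical, and a real model would need far more data and proper validation.

```python
import math
from collections import Counter


def predict_attribute(known_users, unknown_features, k=3):
    """Predict a missing attribute by majority vote of the k most similar users.

    known_users: list of (feature_vector, attribute_value) pairs.
    unknown_features: feature vector of the user whose attribute is missing.
    """
    by_distance = sorted(
        known_users,
        key=lambda user: math.dist(user[0], unknown_features),
    )
    votes = Counter(value for _, value in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (visits per month, share of sessions on product pages).
known_users = [
    ((10, 0.90), "high_affinity"),
    ((12, 0.80), "high_affinity"),
    ((9, 0.85), "high_affinity"),
    ((2, 0.10), "low_affinity"),
    ((1, 0.20), "low_affinity"),
    ((3, 0.15), "low_affinity"),
]

print(predict_attribute(known_users, (11, 0.70)))
print(predict_attribute(known_users, (2, 0.20)))
```

Note that the features here are unscaled, so visit counts dominate the distance; in practice you would normalize them first.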

Buy it. If you have money to spend, you can often (not always) buy the data you need. The simplest (and crudest) version of this is old-fashioned list-buying – you buy a standalone list of emails (possibly with some other attributes) and get spamming. The advantage of this method is that you don’t need any data of your own to go down this path; the disadvantages are that it’s a horrible way to do marketing, will deliver very poor response rates, and could even damage your brand if you’re seen as spamming people. The (much) better approach is to look for data brokers that can provide data that you can join to your existing user/customer data (e.g. they have a record for user abc@xyz.com and so do you, so you can join the data together using the email address as a key).
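A sketch of that kind of join is below. It normalizes and hashes the email address so the raw address is not used directly as the key; that choice is my assumption rather than something every data broker supports, and hashing on its own is not anonymization. The records and field names are invented.

```python
import hashlib


def email_key(email):
    """Normalize an email address and return its SHA-256 hex digest as a join key."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical first-party records and purchased broker records.
own_customers = [
    {"email": "abc@xyz.com", "last_purchase": "Widget2000"},
    {"email": "def@xyz.com", "last_purchase": "Widget3000"},
]
broker_records = [
    {"email": " ABC@xyz.com", "household_size": 3},
]


def join_on_email(own, purchased):
    """Left-join purchased attributes onto our own records via the hashed key."""
    purchased_by_key = {email_key(r["email"]): r for r in purchased}
    joined = []
    for record in own:
        extra = purchased_by_key.get(email_key(record["email"]), {})
        merged = {**extra, **record}  # our own values win on conflicts
        merged["matched"] = bool(extra)
        joined.append(merged)
    return joined


joined = join_on_email(own_customers, broker_records)
for row in joined:
    print(row)
```

The `matched` flag gives you a quick read on how much of your customer base the bought data actually covers, which is worth knowing before you pay for more of it.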

Once you’ve determined which data makes the most difference for your marketing, and have hit upon a strategy (or strategies) to get more of this data, you need to keep feeding the beast. You won’t get all the data you need – whether you’re measuring it, asking for it, or modeling it – right away, so you’ll need to keep going, adjusting your approach as you go and learn about the quality of the data you’re collecting. Hopefully you can reduce your dependency on bought data as you go.

Finally, don’t forget – all this marketing you’re doing (or plan to do) is itself a very valuable source of data about your users. You should make sure you have a means to capture data about the marketing you’re exposing your users to, and how they’re responding to it, because this data is not just useful for refining your marketing as you go along, but can actually be useful to other areas of your business such as product development or support. Perhaps you’ll even get your company’s Big Data people to have a bit more begrudging respect for marketing…


August 26, 2015

Got a DMP coming in? Pick up your underwear

If you’re like me, and have succumbed to the unpardonably bourgeois luxury of hiring a cleaner, then you may also have found yourself running around your house before the cleaner comes, picking up stray items of laundry and frantically doing the dishes. Much of this is motivated by “cleaner guilt”, but there is a more practical purpose – if our house is a mess when the cleaner comes, all she spends her time doing is tidying up (often in ways that turn out to be infuriating, as she piles stuff up in unlikely places) rather than actually cleaning (exhibit one: my daughter’s bedroom floor).

This analogy occurred to me as I was thinking about the experience of working with a Data Management Platform (DMP) provider. DMPs spend a lot of time coming in and “cleaning house” for their customers, tying together messy datasets and connecting them to digital marketing platforms. But if your data systems and processes are covered with the metaphorical equivalent of three layers of discarded underwear, the DMP will have to spend a lot of time picking that up (or working around it) before they can add any serious value.

So what can you do ahead of time to get the best value out of bringing in a DMP? That’s what this post is about.

What is a DMP, anyway?

That is an excellent question. DMPs have evolved and matured considerably since they emerged onto the scene a few years ago. It’s also become harder to clearly identify the boundaries of a DMP’s services because many of the leading solutions have been integrated into broader “marketing cloud” offerings (such as those from Adobe, Oracle or Salesforce). But most DMPs worth their salt provide the following three core services:

Data ingestion & integration: The starting place for DMPs, this is about bringing a marketer’s disparate audience data together in a coherent data warehouse that can then be used for analytics and audience segment building. Central to this warehouse is a master user profile  – a joined set of ID-linked data which provides the backbone of a customer’s profile, together with attributes drawn from first-party sources (such as product telemetry, historical purchase data or website usage data) and third-party sources (such as aggregated behavioral data the DMP has collected or brokered).

Analytics & segment building: DMPs typically offer their own tools for analyzing audience data and building segments, often as part of a broader campaign management workflow. These capabilities can vary in sophistication, and sometimes include lookalike modeling, where the DMP uses the attributes of an existing segment (for example, existing customers) to identify other prospects in the audience pool who have similar attributes, and conversion attribution - identifying which components of a multi-channel campaign actually influenced the desired outcomes (e.g. a sale).

Delivery system integration: The whole point of hiring a DMP to integrate data and enable segment building is to support targeted digital marketing. So DMPs now provide integration points to marketing delivery systems across email, display (via DSP and Exchange integration), in-app and other channels. This integration is typically patchy and influenced by other components of the DMP provider’s portfolio, but is steadily improving.

Making the best of your DMP relationship

The whole reason that DMPs exist in the first place is that achieving the above three things is hard – unless your organization is in a position to build out and manage its own data infrastructure and put some serious investment behind data integration and development, you are unlikely to be able to replicate the services of a DMP (especially when it comes to integration with third-party data and delivery systems). But there are a number of things you can do to make sure you get the best value out of your DMP relationship.

 

1. Clean up your data

This is the area where you can make the most difference ahead of time. Bringing signals about your audience/customers together will benefit your business across the board, not just in a marketing context. You should set your sights on integrating (or at least cataloging and understanding) all data that represents customer/prospect interaction with your organization, such as:

  • Website visits
  • Purchases
  • Product usage (if you have a product that you can track the usage of)
  • Mobile app usage
  • Social media interaction (e.g. tweets)
  • Marketing campaign response (e.g. email clicks)
  • Customer support interactions
  • Survey/feedback response

You should also integrate any datasets you have that describe what you already know about your customers or users, such as previous purchases or demographic data.

The goal here is, for a given user/customer, to be able to identify all of their interactions with your organization, so that you can cross-reference that data to build interesting and useful segments that you can use to communicate with your audience. So for user XYZ123, for example, you want to know that:

  • They visited your website 3 times in the past month, focusing mainly on information about your Widget3000 product
  • They have downloaded your free WidgetFinder app, and run it 7 times
  • They previously purchased a Widget2000, but haven’t used it for four months
  • They are male, and live in Sioux Falls, South Dakota
  • Last week they tweeted: [image of tweet]

Unless you’re some kind of data saint (or delusional), reading the two preceding paragraphs probably filled you with exhaustion. Because all of the above kinds of data have different schemas (if they have schemas at all), and more importantly (or depressingly), they all use different (or at least independent) ways of identifying who the user/customer actually is. How are you supposed to join all this data if you don’t have a common key?

DMPs solve these problems in a couple of ways:

  • They provide a unified ID system (usually via a third-party tag/cookie) for all online interaction points (such as web, display ads, some social)
  • They will map/aggregate key behavioral signals onto a common schema to create a single user profile (or online user profile, at any rate), typically hosted in the DMP’s cloud

The upside of this approach is that you can achieve some degree of data integration via the (relatively) painless means of inserting another bit of JavaScript into all of your web pages and ad templates, and also that you can access other companies’ audiences who are tagged with the same cookie – so-called audience extension.

However, there are some downsides, also. Key amongst these are:

Yet another ID: If you already have multiple ways of IDing your users, adding another “master ID” to the mix may just increase complexity. And it may be difficult to link key behaviors (such as mobile app purchases) or offline data (such as purchase history) to this ID.

Your data in someone else’s cloud: Most marketing cloud/DMP solutions assume that the master audience profile dataset will be stored in the cloud. That necessarily limits the amount and detail of information you can include in the profile – for example, credit card information.

It doesn’t help your data: Just taking a post-facto approach with a DMP (i.e. fixing all your data issues downstream of the source, in the DMP’s profile store) doesn’t do anything to improve the core quality of the source data.

So what should you do? My recommendation is to catalog, clean up and join your most important datasets before you start working with a DMP, and (if possible) identify an ID that you already own that you can use as a master ID. The more you can achieve here, the less time your DMP will spend picking up your metaphorical underwear, and the more time they’ll spend providing value-added services such as audience extension and building integrations into your online marketing systems.

 

2. Think about your marketing goals and segments

You should actually think about your marketing goals before you even think about bringing in a DMP or indeed make any other investments in your digital marketing capabilities. But if your DMP is already coming in, make sure you can answer questions about what you want to achieve with your audience (for example, conversions vs engagement) and how you segment them (or would like to segment them).

Once you have an idea of the segments you want to use to target your audience, then you can see whether you have the data already in-house to build these segments. Any work you can do here up-front will save your DMP a lot of digging around to find this data themselves. It will also equip you well for conversations with the DMP about how you can go about acquiring or generating that data, and may save you from accidentally paying the DMP for third-party data that you actually don’t need.

 

3. Do your own due diligence on delivery systems and DSPs

Your DMP will come with their own set of opinions and partnerships around Demand-side Platforms (DSPs) and delivery systems (e.g. email or display ad platforms). Before you talk with the DMP on this, make sure you understand your own needs well, and ideally, do some due diligence with the solutions in the marketplace (not just the tools you’re already using) as a fit to your needs. Questions to ask here include:

  • Do you need realtime (or near-realtime) targeting capabilities, and under what conditions? For example, if someone activates your product, do you want to be able to send them an email with hints and tips within a few hours?
  • What kinds of customer journeys do you want to enable? If you have complex customer journeys (with several stages of consideration, multiple channels, etc) then you will need a more capable ‘journey builder’ function in your marketing workflow tools, and your DMP will need to integrate with this.
  • Do you have any unusual places you want to serve digital messaging, such as in-product/in-app, via partners, or offline? Places where you can’t serve (or read) a cookie will be harder to reach with your DMP and may require custom integration.

The answers to these questions are important: on the one hand there may be a great third-party system with functionality that you really like, but which will need custom integration with your DMP; on the other hand, the solutions that the DMP can integrate with easily may get you started quickly and painlessly, but may not meet your needs over time.

 

If you can successfully perform the above housekeeping activities before your DMP arrives and starts gasping at the mountain of dishes piled up in your kitchen sink, you’ll be in pretty good shape.

June 23, 2015

The seven people you need on your data team

Congratulations! You just got the call – you’ve been asked to start a data team to extract valuable customer insights from your product usage, improve your company’s marketing effectiveness, or make your boss look all “data-savvy” (hopefully not just the last one of these). And even better, you’ve been given carte blanche to go hire the best people! But now the panic sets in – who do you hire? Here’s a handy guide to the seven people you absolutely have to have on your data team. Once you have these seven in place, you can decide whether to style yourself more on John Sturges or Akira Kurosawa.

Before we start, what kind of data team are we talking about here? The one I have in mind is a team that takes raw data from various sources (product telemetry, website data, campaign data, external data) and turns it into valuable insights that can be shared broadly across the organization. This team needs to understand both the technologies used to manage data, and the meaning of the data – a pretty challenging remit, and one that needs a pretty well-balanced team to execute.

1. The Handyman
The Handyman can take a couple of battered, three-year-old servers, a copy of MySQL, a bunch of Excel sheets and a roll of duct tape and whip up a basic BI system in a couple of weeks. His work isn’t always the prettiest, and you should expect to replace it as you build out more production-ready systems, but the Handyman is an invaluable help as you explore datasets and look to deliver value quickly (the key to successful data projects). Just make sure you don’t accidentally end up with a thousand people accessing the database he’s hosting under his desk every month for your month-end financial reporting (ahem).

Really good handymen are pretty hard to find, but you may find them lurking in the corporate IT department (look for the person everybody else mentions when you make random requests for stuff), or in unlikely-seeming places like Finance. He’ll be the person with the really messy cubicle with half a dozen servers stuffed under his desk.

The talents of the Handyman will only take you so far, however. If you want to run a quick and dirty analysis of the relationship between website usage, marketing campaign exposure, and product activations over the last couple of months, he’s your guy. But for the big stuff you’ll need the Open Source Guru.

2. The Open Source Guru
I was tempted to call this person “The Hadoop Guru”. Or “The Storm Guru”, or “The Cassandra Guru”, or “The Spark Guru”, or… well, you get the idea. As you build out infrastructure to manage the large-scale datasets you’re going to need to deliver your insights, you need someone to help you navigate the bewildering array of technologies that has sprung up in this space, and integrate them.

Open Source Gurus share many characteristics in common with that most beloved urban stereotype, the Hipster. They profess to be free of corrupting commercial influence and pride themselves on plowing their own furrow, but in fact they are subject to the whims of fashion just as much as anyone else. Exhibit A: The enormous fuss over the world-changing effects of Hadoop, followed by the enormous fuss over the world-changing effects of Spark. Exhibit B: Beards (on the men, anyway).

So be wary of Gurus who ascribe magical properties to a particular technology one day (“Impala’s, like, totally amazing”), only to drop it like ombre hair the next (“Impala? Don’t even talk to me about Impala. Sooooo embarrassing.”) Tell your Guru that she’ll need to live with her recommendations for at least two years. That’s the blink of an eye in traditional IT project timescales, but a lifetime in Internet/Open Source time, so it will focus her mind on whether she really thinks a technology has legs (vs. just wanting to play around with it to burnish her resumé).

3. The Data Modeler
While your Open Source Guru can identify the right technologies for you to use to manage your data, and hopefully manage a group of developers to build out the systems you need, deciding what to put in those shiny distributed databases is another matter. This is where the Data Modeler comes in.

The Data Modeler can take an understanding of the dynamics of a particular business, product, or process (such as marketing execution) and turn that into a set of data structures that can be used effectively to reflect and understand those dynamics.

Data modeling is one of the core skills of a Data Architect, which is a more identifiable job description (searching for “Data Architect” on LinkedIn generates about 20,000 results; “Data Modeler” only generates around 10,000). And indeed your Data Modeler may have other Data Architecture skills, such as database design or systems development (they may even be a bit of an Open Source Guru). But if you do hire a Data Architect, make sure you don’t get one with just those more technical skills, because you need datasets which are genuinely useful and descriptive more than you need datasets which are beautifully designed and have subsecond query response times (ideally, of course, you’d have both). And in my experience, the data modeling skills are the rarer skills; so when you’re interviewing candidates, be sure to give them a couple of real-world tests to see how they would actually structure the data that you’re working with.

4. The Deep Diver
Between the Handyman, the Open Source Guru, and the Data Modeler, you should have the skills on your team to build out some useful, scalable datasets and systems that you can start to interrogate for insights. But who to generate the insights? Enter the Deep Diver.

Deep Divers (often known as Data Scientists) love to spend time wallowing in data to uncover interesting patterns and relationships. A good one has the technical skills to be able to pull data from source systems, the analytical skills to use something like R to manipulate and transform the data, and the statistical skills to ensure that his conclusions are statistically valid (i.e. he doesn’t mix up correlation with causation, or make pronouncements on tiny sample sizes). As your team becomes more sophisticated, you may also look to your Deep Diver to provide Machine Learning (ML) capabilities, to help you build out predictive models and optimization algorithms.

If your Deep Diver is good at these aspects of his job, then he may not turn out to be terribly good at taking direction, or communicating his findings. For the first of these, you need to find someone that your Deep Diver respects (this could be you), and use them to nudge his work in the right direction without being overly directive (because one of the magical properties of a really good Deep Diver is that he may take his analysis in an unexpected but valuable direction that no one had thought of before).

For the second problem – getting the Deep Diver’s insights out of his head – pair him with a Storyteller (see below).

5. The Storyteller
The Storyteller is the yin to the Deep Diver’s yang. Storytellers love explaining stuff to people. You could have built a great set of data systems, and be performing some really cutting-edge analysis, but without a Storyteller, you won’t be able to get these insights out to a broad audience.

Finding a good Storyteller is pretty challenging. You do want someone who understands data quite well, so that she can grasp the complexities and limitations of the material she’s working with; but it’s a rare person indeed who can be really deep in data skills and also have good instincts around communications.

The thing your Storyteller should prize above all else is clarity. It takes significant effort and talent to take a complex set of statistical conclusions and distil them into a simple message that people can take action on. Your Storyteller will need to balance the inherent uncertainty of the data with the ability to make concrete recommendations.

Another good skill for a Storyteller to have is data visualization. Some of the most light bulb-lighting moments I have seen with data have been where just the right visualization has been employed to bring the data to life. If your Storyteller can balance this skill (possibly even with some light visualization development capability, like using D3.js; at the very least, being a dab hand with Excel and PowerPoint or equivalent tools) with her narrative capabilities, you’ll have a really valuable player.

There’s no one place you need to go to find Storytellers – they can be lurking in all sorts of fields. You might find that one of your developers is actually really good at putting together presentations, or one of your marketing people is really into data. You may also find that there are people in places like Finance or Market Research who can spin a good yarn about a set of numbers – poach them.

6. The Snoop
These next two people – The Snoop and The Privacy Wonk – come as a pair. Let’s start with the Snoop. Many analysis projects are hampered by a lack of primary data – the product, or website, or marketing campaign isn’t instrumented, or you aren’t capturing certain information about your customers (such as age, or gender), or you don’t know what other products your customers are using, or what they think about them.

The Snoop hates this. He cannot understand why every last piece of data about your customers, their interests, opinions and behaviors, is not available for analysis, and he will push relentlessly to get this data. He doesn’t care about the privacy implications of all this – that’s the Privacy Wonk’s job.

If the Snoop sounds like an exhausting pain in the ass, then you’re right – this person is the one who has the team rolling their eyes as he outlines his latest plan to remotely activate people’s webcams so you can perform facial recognition and get a better Unique User metric. But he performs an invaluable service by constantly challenging the rest of the team (and other parts of the company that might supply data, such as product engineering) to be thinking about instrumentation and data collection, and getting better data to work with.

The good news is that you may not have to hire a dedicated Snoop – you may already have one hanging around. For example, your manager may be the perfect Snoop (though you should probably not tell him or her that this is how you refer to them). Or one of your major stakeholders can act in this capacity; or perhaps one of your Deep Divers. The important thing is not to shut the Snoop down out of hand, because it takes relentless determination to get better quality data, and the Snoop can quarterback that effort. And so long as you have a good Privacy Wonk for him to work with, things shouldn’t get too out of hand.

7. The Privacy Wonk
The Privacy Wonk is unlikely to be the most popular member of your team, either. It’s her job to constantly get on everyone’s nerves by identifying privacy issues related to the work you’re doing.

You need the Privacy Wonk, of course, to keep you out of trouble – with the authorities, but also with your customers. There’s a large gap between what is technically legal (which itself varies by jurisdiction) and what users will find acceptable, so it pays to have someone whose job it is to figure out what the right balance between these two is. But while you may dread the idea of having such a buzz-killing person around, I’ve actually found that people tend to make more conservative decisions around data use when they don’t have access to high-quality advice about what they can do, because they’re afraid of accidentally breaking some law or other. So the Wonk (much like Sadness) turns out to be a pretty essential member of the team, and even regarded with some affection.

Of course, if you do as I suggest, and make sure you have a Privacy Wonk and a Snoop on your team, then you are condemning both to an eternal feud in the style of the Corleones and Tattaglias (though hopefully without the actual bloodshed). But this is, as they euphemistically say, a “healthy tension” – with these two pulling against one another you will end up with the best compromise between maximizing your data-driven capabilities and respecting your users’ privacy.

Bonus eighth member: The Cat Herder (you!)
The one person we haven’t really covered is the person who needs to keep all of the other seven working effectively together: to stop the Open Source Guru from sneering at the Handyman’s handiwork; to ensure the Data Modeler and Deep Diver work together so that the right measures and dimensionality are exposed in the datasets you publish; and to referee the debates between the Snoop and the Privacy Wonk. This is you, of course – The Cat Herder. If you can assemble a team with at least one of each of the above people, plus probably a few developers for the Open Source Guru to boss about, you’ll be well on the way to unlocking a ton of value from the data in your organization.

Think I’ve missed an essential member of the perfect data team? Tell me in the comments.

May 17, 2015

The rise of the Chief Data Officer

As the final season of Mad Men came to a close this weekend, one of my favorite memories from Season 7 is the appearance of the IBM 360 mainframe in the Sterling Cooper & Partners offices, much to the chagrin of the creative team (whose lounge was removed to make space for the beast), especially poor old Ginsberg, who became convinced the “monolith” was turning him gay (and took radical steps to address the issue).

My affection for the 360 is partly driven by the fact that I started my career at IBM, closer in time to Mad Men Season 7 (set in 1969) than the present day (and now I feel tremendously old having just written that sentence). The other reason I feel an affinity for the Big Blue Box is that my day job consists of thinking of ways to use data to make marketing more effective, and of course that is what the computer at SC&P was for. It was brought in at the urging of the nerdish (and universally unloved) Harry Crane, to enable him to crunch the audience numbers coming from Nielsen’s TV audience measurement service to make TV media buying decisions. This was a major milestone in the evolution of data-driven marketing, because it linked advertising spend to actual advertising delivery, something that we now take for granted.

The whole point of Mad Men introducing the IBM computer into the SC&P offices was to make a point about the changing nature of advertising in the early 1970s – in particular that Don Draper and his “three martini lunch” tribe’s days were numbered. Since then, the rise of the Harry Cranes, and the use of data in marketing and advertising, has been relentless. Today, many agencies have a Chief Data Officer, an individual charged with the task of helping the agency and its clients to get the best out of data.

But what does, or should, a Chief Data Officer (or CDO) do? At an advertising & marketing agency, it involves the following areas:

Enabling clients to maximize the value they get from data. Many agency clients have significant data assets locked up inside their organization, such as sales history, product telemetry, or web data, and need help to join this data together and link it to their marketing efforts, in order to deliver more targeted messaging and drive loyalty and ROI. Additionally, the CDO should advise clients on how they can use their existing data to deliver direct value, for example by licensing it.

Advising clients on how to gather more data, safely. A good CDO offers advice to clients on strategies for collecting more useful data (e.g. through additional telemetry), or working with third-party data and data service providers, while respecting the client’s customers’ privacy needs.

Managing in-house data assets & services. Some agencies maintain their own in-house data assets and services, from proprietary datasets to analytics services. The CDO needs to manage and evolve these services to ensure they meet the needs of clients. In particular, the CDO should nurture leading-edge marketing science techniques, such as predictive modeling, to help clients become even more data-driven in their approach.

Managing data partnerships. Since data is such an important part of a modern agency’s value proposition, most agencies maintain ongoing relationships with key third-party data providers, such as BlueKai or Lotame. The CDO needs to manage these relationships so that they complement the in-house capabilities of the agency, and so the agency (and its clients) don’t end up letting valuable data “walk out of the door”.

Driving standards. As agencies increasingly look to data as a differentiating ingredient across multiple channels, using data and measurement consistently becomes ever more important. The CDO needs to drive consistent standards for campaign measurement and attribution across the agency so that as a client works with different teams, their measurement framework stays the same.

Engaging with the industry & championing privacy. Using data for marketing & advertising is not without controversy, so the CDO needs to be a champion for data privacy and actively engaged with the industry on this and other key topics.

As you can see, that’s plenty for the ambitious CDO to do, and in particular plenty that is not covered by other traditional C-level roles in an ad agency. I think we’ll be seeing plenty more CDOs appointed in the months and years to come.

February 07, 2012

Big (Hairy) Data

My eye was caught the other day by a question posed to the “Big Data, Low Latency” group on LinkedIn. The question was as follows:

“I've customer looking for low latency data injection to hadoop . Customer wants to inject 1million records per/sec. Can someone guide me which tools or technology can be used for this kind of data injection to hadoop.”

The question itself is interesting, given its assumption that Hadoop is part of the answer – Hadoop really is the new black in data storage & management these days – but the answers were even more interesting. Among the eleven or so people who responded to the question, there was almost no consensus. No single product (or even shortlist of products) emerged, but more importantly, the actual interpretation of the question (or what the question was getting at) differed widely, spinning off a moderately impassioned debate about the true meaning of “latency”, the merits of solid-state storage vs HD storage, and whether to clean/dedupe the data at load time, or once the data is in Hadoop.

I wouldn’t class myself as a Hadoop expert (I’m more of a Cosmos guy), much less a data storage architect, so I may be unfairly mischaracterizing the discussion, but the message that jumped out of the thread at me was this: This Big Data stuff really is not mature yet.

I was very much put in mind of the early days of the Web Analytics industry, where so many aspects of the industry and the way customers interacted with it had yet to mature. Not only was there still a plethora of widely differing solutions available, with heated debates about tags vs logs, hosted vs on-premise, and flexible-vs-affordable, but customers themselves didn’t even know how to articulate their needs. Much of the time I spent with customers at WebAbacus in those days was taken up by translating the customer’s requirements (which often had been ghost-written by another vendor who took a radically different approach to web analytics) into terms that we could respond to.

This question thread felt a lot like that – there didn’t seem to be a very mature common language or frame of reference which united the asker of the question and the various folk that answered it. As I read the answers, I found myself feeling mightily sorry for the question-poser, because she now has a list as long as her arm of vendors and technologies to investigate, each of which approaches the problem in a different way, so it’ll be hard going to choose a winner.

If this sounds like a grumble, it’s really not – the opposite, in fact. It’s very exciting to be involved in another industry that is forming before my very eyes. Buy most seasoned Web Analytics professionals enough drinks and they’ll admit to you that the industry was actually a bit more interesting before it was carved up between Omniture and Google (yes, I know there are other players still – as Craig Ferguson would say, I look forward to your letters). So I’m going to enjoy the childhood and adolescence of Big Data while I can.
