FEATURE BUREAUX SERVICES
“A database of three million customers would be in the low Terabytes today – five years ago it was rare for a marketing database to be larger than a few Gigabytes.” Director of Insight and Innovation, Celerity
Although marketing databases remain small in comparison with true big data volumes, the move to the Terabyte scale and rising demand for real-time answers have forced changes in the databases used by MSPs. “We have changed the architecture of our marketing database,” says Gregory. “Real time is now the norm for us and the relational structure is no longer adequate.”
Cleansing big files
For MSPs specialising in list hygiene, client file size is rarely a problem, as all attributes bar name and address are usually stripped out before processing.
“We are seeing larger files come across these days,” says Data8’s Antony Allen, whose company specialises in online cleansing services. “We recently processed a 13m record file for one client. But it’s simply a list of names and addresses, and by the time the data is zipped, how big is it going to be?”
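Allen doesn’t describe Data8’s tooling, but the pre-processing step he alludes to is easy to picture. The sketch below is a minimal illustration in Python – the column names and file paths are invented for the example – that keeps only the name and address fields of a client file and writes the result gzip-compressed:

import csv
import gzip

# Hypothetical column names - a real client file's layout would differ.
KEEP = ["title", "first_name", "last_name",
        "address_1", "address_2", "town", "postcode"]

def strip_for_hygiene(src_path, dst_path):
    """Drop every attribute bar name and address, writing a gzipped CSV."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=KEEP, extrasaction="ignore")
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({k: row.get(k, "") for k in KEEP})

strip_for_hygiene("client_file.csv", "hygiene_input.csv.gz")

Names and addresses repeat heavily, which is why, as Allen suggests, even a 13m record file tends to compress down to a very manageable size.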
Depending on the complexity of validation and the business rules applied, modern cleansing software should be able to match the entire UK population against a set of reference files within a manageable period. But as deadlines tighten, whether that period is hours or days for larger files matters more and more. “With tighter schedules and the growth in volumes, our bureau clients are putting pressure on us to get our software to run as fast as possible,” says Mark Dobson, Client Services Director at The Software Bureau. “We’re rewriting Cygnus module by module to take advantage of the latest software innovations.” That means using multi-threading to allow load sharing across multiple servers, and also offering SaaS delivery for the first time. The software already offers alternative matching techniques that suit different applications, for example high-volume deduping versus name and address matching to suppression files.
“There’s not much more that we can add in functionality, but clients will be able to choose from an installed version, a service they can use themselves or one that we can deliver,” says Dobson.
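The article does not detail how Cygnus implements these techniques, but the distinction Dobson draws can be sketched in a few lines of Python. In this hypothetical illustration – the key construction is deliberately crude compared with real bureau-grade fuzzy matching – both operations reduce to building a normalised match key; the difference is whether keys are compared within one file (deduping) or against a reference file (suppression):

import re

def match_key(name, postcode):
    # Crude normalised key: surname + postcode, case and punctuation removed.
    surname = name.strip().split()[-1] if name.strip() else ""
    return re.sub(r"[^A-Z0-9]", "", (surname + postcode).upper())

def dedupe(records):
    # High-volume deduping: keep the first record seen for each key.
    seen, unique = set(), []
    for rec in records:
        key = match_key(rec["name"], rec["postcode"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def suppress(records, suppression_list):
    # Suppression matching: drop records whose key appears in a
    # reference file such as a goneaway or deceased list.
    blocked = {match_key(r["name"], r["postcode"]) for r in suppression_list}
    return [r for r in records
            if match_key(r["name"], r["postcode"]) not in blocked]

Because records only ever need to meet others sharing a key, the key space can be partitioned across threads or servers – one plausible shape, though not necessarily Cygnus’s, for the load sharing Dobson describes.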
Celerity now employs a data structure developed by data warehouse guru Ralph Kimball. Rather than the hierarchical structure you would tend to find in a relational marketing database where everything is tied back to an individual from the start, the company stores data at a transactional level and only later makes the linkages required to form a picture of a group or an individual’s behaviour – the so-called “presentation layer”.
That suits online data in particular, where a site visitor might start as a cookie entry in a web log and only become a customer later on. This structure can handle far larger data volumes and is also better suited to the distributed computing model required for top performance. “Shaping of the data in its storage phase of processing immediately limits its usefulness by fixing relationships between data items,” says Grace.
“This fixed structure does not lend itself to the wide variety of analysis and selection techniques that may be required, so there is an increasing trend to provide this data shaping dynamically through the presentation layer.”
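To make the contrast with a conventional relational marketing database concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table, view and column names are invented for illustration: facts are stored at transaction level keyed only by a cookie, the link to a customer is made later, and the “presentation layer” is a view imposed at query time rather than a structure fixed at load:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Facts land at transaction level, keyed only by a cookie:
    -- at this point there is no customer to tie them back to.
    CREATE TABLE events (cookie_id TEXT, event_type TEXT, occurred_at TEXT);

    -- The cookie-to-customer linkage is made later, once the visitor converts.
    CREATE TABLE identity_link (cookie_id TEXT, customer_id TEXT);
""")

db.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("ck_42", "page_view", "2012-05-01"),  # anonymous site visitor...
    ("ck_42", "purchase",  "2012-05-20"),  # ...who later becomes a customer
])
db.execute("INSERT INTO identity_link VALUES ('ck_42', 'cust_007')")

# The "presentation layer": relationships are imposed at query time,
# not fixed when the data is stored.
db.execute("""
    CREATE VIEW customer_behaviour AS
    SELECT l.customer_id, e.event_type, e.occurred_at
    FROM events e JOIN identity_link l USING (cookie_id)
""")
print(db.execute("SELECT * FROM customer_behaviour").fetchall())

Because nothing about the relationship is baked in at storage time, the same event rows can later be re-shaped into a different view – by household rather than individual, say – without reloading the data.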
Storage media are cheap and are possibly the simplest part of the big data challenge to solve – but that is really only for archiving purposes. Crunching the numbers at the required velocity tends to demand high-end hardware, and dynamically building the required view for analysis also requires much faster processing.
“This places much higher levels of workload on the storage, which needs to retrieve and join the data together as demanded,” says Grace. “This, combined with large volumes, leads to specialist solutions such as solid-state storage, appliance-based database engines or the new breed of in-memory solutions.”
This need for computing horsepower on demand is accelerating the shift to cloud-based data centres. “We haven’t bought a server for years,” says Gregory. “Scale and availability are more important now due to the need to send and respond to campaigns in real time. The days when an MSP manages its own infrastructure are almost gone.”

However, the traditional single customer view (SCV) building challenges of cleansing and merging feeds into a coherent, actionable database still remain. Higher volumes and the variety of new data types simply exacerbate them.
“A lot of data feeds will contain erroneous or spurious data,” says Steven Day, Director at UKChanges. “There’s a learning curve to go through to successfully filter out the junk and filter in the useful stuff. Homing in on the right indicators from the data is likely to be an iterative process.” Day also pinpoints consistency in the format and content of data feeds as crucial. “If third-party feeds feature, there’s typically less control, and ‘big data’ types of projects are necessarily heavily automated,” he says. “Appropriate checks and balances are required to ensure everything stays in line.”
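Day doesn’t spell out what those checks and balances look like. As one hypothetical example – the feed layout and rules below are invented – an automated feed can be gated on simple format-and-content tests before it is allowed into the database, with the reject pile feeding the iterative tuning he describes:

import re

EXPECTED_COLUMNS = {"email", "postcode", "last_seen"}  # assumed feed layout
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.I)

def check_feed(rows):
    # Return (clean, rejects): junk and format drift are filtered out
    # and logged rather than silently loaded.
    clean, rejects = [], []
    for row in rows:
        if set(row) != EXPECTED_COLUMNS:   # has the layout changed upstream?
            rejects.append((row, "unexpected columns"))
        elif "@" not in row["email"]:
            rejects.append((row, "bad email"))
        elif not POSTCODE.match(row["postcode"]):
            rejects.append((row, "bad postcode"))
        else:
            clean.append(row)
    return clean, rejects

A rising reject rate is itself a useful signal that a feed’s format or content has drifted and the rules need revisiting.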
As ever, most companies are well behind the leading edge of marketing practice, and to those in the midmarket big data at the moment really means making better – or any – use of online information. Email response data stagnates on Email Service Providers’ servers, while web analytics are only used to optimise site design. “At present, big data is a potential requirement but it is not a reality in the day-to-day world of most of our clients,” says Day. “They struggle to decide which attribute, data feed or source to measure, what the data may mean and how to action it.”

If web and call centre data are two of the feeds most responsible for growing marketing database volumes, what about all that social media data? Social media analysis still tends to exist in a silo, as it is difficult to bring the data into a marketing database in a meaningful way. “The need to recognise the same people on and offline is fundamental to making use of social data,” says Thomas. “Sentiment analysis is more like traditional market research. But if you want to engage with someone directly through social media, you will at some point have to bring messages into the database at an individual level.”

“Volume is definitely in the top three challenges for any new client solution – a rule of thumb for cost and effort is volume multiplied by complexity.”
Andy Grace, Technical Architect, Occam
PREPARING THE WAY

Just as faster computers with more memory led developers to create larger, more complex software applications and bigger data sets, so marketers will seize the opportunity to store and analyse ever larger data sets in order to track and understand customer behaviour. Big data may be niche today, but the numbers using those kinds of volumes will surely grow.
“The industry may be a good two or three years away from this but it will happen,” says Gregory. “As an industry, we need to make sure we can provide the road map for our clients.”