Friday, January 11, 2013

The @WalmartLabs Social Media Analytics project

Firstly, let me wish you all a happy new year. 2012 was definitely very exciting for us all. While scientists at CERN were sifting through a whopping 200 petabytes of data analyzing 800 trillion collision events to detect the Higgs boson, we at @WalmartLabs started out on a little hunt of our own. I’m talking about an insights-mining project we started working on last year.

Social Media Analytics is all about mining retail-related insights from social channels, a perilous and, for us, personally exciting task. When our team spent the 22nd of November feverishly following the social retail pulse on Black Friday, we knew the world wasn’t preparing for an apocalypse.

As we watched the incredible surge in Walmart-related social buzz that day, we were gently reminded once more of the promise hidden deep within the social data goldmine: the promise of social media analytics, which is emphasized so often today that it is almost a cliché. The potential is nevertheless still largely untapped. We are only beginning to scratch the surface of all the great tales that the data has to tell us.

Social buzz typically precedes retail buzz. People are constantly talking about upcoming products on social apps: the hot new video game whose trailer was just released, the cool gadget about to be launched, or a new upgrade to that toy your child always loved. There are good things said, and bad things too. Social media is thus a direct, real-time feedback channel to us from our many customers. I am still only stating the obvious.

Our goal is to tap into this social buzz and help Walmart with decision making on aspects like inventory and assortment. As an example, the figure below shows a noticeable spike in social activity about Sony's new Android phone, the Xperia Z, a few days ahead of its actual launch. Such insights can help our merchants make smarter decisions ahead of time.

Social data mining comes with incredible challenges, which only makes it all the more exciting for our super smart engineers to come to work every day. The data volume is formidable. We are talking about petabytes here. Real-time social data processing requires sophisticated data stores and blazingly fast algorithms. The noise levels are exorbitant, the language used in social forums is heavily informal, unstructured and often ungrammatical, and extracting the helpful insight from the huge amount of noise is super hard. Just consider parsing this algorithmically: “OMG!!! dis is sooo coool! i luv ma new fone. i cant believ ma luck 4 chosin this! #wellwhatdoyathink”. Popular text analytics and natural language processing techniques based on standard language models simply fail.

We need altogether different techniques to filter out the noise and focus on the social data that is relevant to us, which in itself is a daunting task. The next step is to map this data to meaningful retail products. All of these are difficult tasks. As a quick sneak peek, a new technique we are trying out today is to look for any of several hand-verified n-grams around brands in a large time window. Several more schemes are to follow. It is only after conquering all of these manifold challenges that meaningful recommendations can be made.
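To make the n-gram idea a little more concrete, here is a minimal Python sketch. It is only an illustration, not our production code: the brand list, the n-grams, the tokenizer and the function names are all made up for this example, and "around brands" is simplified to "in the same post as a brand mention".

```python
from datetime import datetime, timedelta

# Hypothetical examples only: a real list would be curated and verified by hand.
VERIFIED_NGRAMS = {("cant", "wait", "for"), ("just", "bought"), ("luv", "ma", "new")}
BRANDS = {"sony", "xperia"}
PUNCT = ".,!?#\"'"


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    stripped = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in stripped if t]


def ngrams(tokens, n):
    """Return the set of all n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_relevant(text, brands=BRANDS, verified=VERIFIED_NGRAMS):
    """A post counts if it names a brand and contains any verified n-gram."""
    tokens = tokenize(text)
    if not brands.intersection(tokens):
        return False
    sizes = {len(gram) for gram in verified}
    return any(ngrams(tokens, n) & verified for n in sizes)


def count_relevant(posts, start, window):
    """Count relevant posts whose timestamp falls in [start, start + window)."""
    end = start + window
    return sum(1 for ts, text in posts if start <= ts < end and is_relevant(text))


posts = [
    (datetime(2013, 1, 5), "Cant wait for the Xperia Z!!!"),
    (datetime(2013, 1, 6), "just bought a toaster"),       # no brand
    (datetime(2013, 1, 7), "Sony makes phones"),           # brand, no n-gram
    (datetime(2012, 11, 1), "just bought a sony tv"),      # outside the window
]
buzz = count_relevant(posts, datetime(2013, 1, 1), timedelta(days=14))
```

Matching whole n-grams rather than single keywords is what lets a short hand-verified list cut through informal spelling: "cant wait for" is a usable signal even when the rest of the post would defeat a standard parser.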

Today, our social media analytics project operates on top of a searchable index of 60 billion social documents and helps merchants at Walmart monitor sentiments and popular interests in real time, or look into past trends. One can also see geographical variations in social sentiment and buzz levels. There are also tools that marry search trends on, sales trends in our brick-and-mortar stores and social buzz all in one place, to help make correlations. Together, these tools provide powerful social insights today.

As we step into another fantastic year, we are excited to be taking up more audacious goals. On one hand, we aspire to improve the quality of our insights and work with our merchants to act on them quickly and effectively. On the other, we aim to map the interest trends we observe to demand levels for actual products and come up with insights for assortment and inventory management. And all of this well ahead of time, while we can still make a difference.

It is going to be an exciting year indeed.