How I Built a Data Analytics Platform in a Month

During the evolution of my current company's product line and technical skillset, there came a point when leadership decided to realign our vision and start focusing on the needs of our customers rather than solely on the development project of building the world's fastest triplestore.

It was at that point that leadership said: in order to build a better relationship with our customers, we're going to step into their shoes and do with our products what they wish they could do, if only they had a better understanding of linked data and graphs. So our team began vigorously researching and developing a process that performs the very analytics our tool is supposed to foster.

I was made the product manager / product owner of that effort within the company.

Over the next several weeks, it became clear to me that analytics is not only a technical skillset, but also a process and an IT management challenge. To be successful, the analytics team had to be able to easily ingest data from the client and meet with them regularly to truly understand their needs. Our analysts had to be able to ask tough questions about the data, then rapidly generate analysis snippets that answered those questions, and do it in a repeatable way. Those analysis snippets couldn't just stand on their own, though. They needed to be addressed by our analysts explicitly: after performing each piece of analysis, the analysts had to officially record the insights it produced for those questions. And finally, after all of that, we needed a way to generate a report that encompassed the entire effort.
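To make that workflow concrete, here is a minimal sketch of how the question-to-snippet-to-insight-to-report chain could be modeled with Mongoose (which shows up in the toolset below). The schema and field names are hypothetical, for illustration only, not the models from our actual application.

```javascript
// Hypothetical sketch of the analysis workflow as Mongoose schemas.
// Names are illustrative only; the real application's models differ.
const mongoose = require('mongoose');

// A tough question the analysts ask about the client's data.
const questionSchema = new mongoose.Schema({
  clientName: String,
  text: String,
  askedAt: { type: Date, default: Date.now }
});

// A repeatable piece of analysis (query, script, API call) that answers a question.
const snippetSchema = new mongoose.Schema({
  question: { type: mongoose.Schema.Types.ObjectId, ref: 'Question' },
  description: String,
  resultData: mongoose.Schema.Types.Mixed   // raw output of the analysis run
});

// The insight an analyst explicitly records after reviewing a snippet.
const insightSchema = new mongoose.Schema({
  snippet: { type: mongoose.Schema.Types.ObjectId, ref: 'Snippet' },
  summary: String,
  recordedBy: String
});

// The report that rolls all of the recorded insights up for the client.
const reportSchema = new mongoose.Schema({
  clientName: String,
  insights: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Insight' }],
  generatedAt: { type: Date, default: Date.now }
});

module.exports = {
  Question: mongoose.model('Question', questionSchema),
  Snippet: mongoose.model('Snippet', snippetSchema),
  Insight: mongoose.model('Insight', insightSchema),
  Report: mongoose.model('Report', reportSchema)
};
```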

During that time, we were heading down two major paths simultaneously. On one path, we were manually performing all of our data analysis for our clients by composing queries, running command line tools, building data models, consuming APIs and other services, and manually reporting all of our efforts. The first thing I learned is that it takes FOREVER to do all of this work manually.

Luckily, as a development team, we are also very good at writing applications. So the other path, naturally, was to write an application that could organize, enable, and demonstrate all of these pieces of the process at scale. Alongside the manual work, we built exactly that.

I am happy to report that after exactly one month of working tirelessly, literally from the moment we woke up to the moment we went to bed, we have it: a working prototype of an analytics suite that does everything we need and improves the speed at which we do it.

The toolset consisted of the following:

  • NodeJS for middle tier development

  • AzureSQL for MSSQL data storage and retrieval

  • MongoDB for rapid caching and application data storage

  • Mongoose for mid-tier object-document mapping within the application

  • AngularJS for the consumer application

  • Twitter Bootstrap and Sass for dynamic interface styling

  • Countless 3rd-party NPM modules for specific data analysis tasks (natural language processing, correlation generation, machine learning, data cleansing, etc.)

  • Azure and Amazon Web Services for storage and deployment of the application

The combination of these elements resulted in a multi-tier application that can not only perform many difficult types of data analysis, but is also designed to be extended with additional data sources and new types of analysis in the future, including graph exploration and ontology management.
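To give a feel for how the tiers fit together, here is a minimal sketch of a middle-tier endpoint that pulls rows from AzureSQL with the `mssql` driver, caches them in MongoDB through Mongoose, and serves JSON to the AngularJS client. Express, the connection strings, and the table name are assumptions made for illustration, not the actual pieces of our application.

```javascript
// Hypothetical middle-tier wiring: AzureSQL -> MongoDB cache -> JSON for AngularJS.
// Connection strings, table names, and the Express framework are assumptions.
const express = require('express');
const sql = require('mssql');          // MSSQL / AzureSQL driver
const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost/analytics');

// Simple cache collection for rows pulled out of AzureSQL.
const CachedRow = mongoose.model('CachedRow', new mongoose.Schema({
  source: String,
  payload: mongoose.Schema.Types.Mixed,
  cachedAt: { type: Date, default: Date.now }
}));

// Connect to AzureSQL once at startup; the pool is reused per request.
const poolPromise = sql.connect('mssql://user:pass@example.database.windows.net/analytics');

const app = express();

app.get('/api/sales', async (req, res) => {
  try {
    // Serve from the Mongo cache when we already have the rows.
    const cached = await CachedRow.find({ source: 'sales' });
    if (cached.length > 0) {
      return res.json(cached.map(row => row.payload));
    }

    // Otherwise hit AzureSQL, cache the result, and return it.
    const pool = await poolPromise;
    const result = await pool.request().query('SELECT TOP 100 * FROM Sales');
    await CachedRow.insertMany(
      result.recordset.map(row => ({ source: 'sales', payload: row }))
    );
    res.json(result.recordset);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => console.log('middle tier listening on 3000'));
```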

This is one of the most difficult projects I've done in a while, not only because of the time constraints, but also because of the requirement to become genuinely well versed in data analysis and data science. It was not easy, but it has been done, and from here on out, I expect every addition to the application to produce some very interesting and insightful data revelations for our customers. :)

My Journey to Breaking 80 in Golf, and What I Learned Along the Way

I work very close to a golf course, so when weather permits, I get out there to practice on the range during the occasional lunch break. It's been about a year since I played my very first round of golf, and last week, I broke 80. Since the beginning, I've gotten my share of punishment and reward from the game, and now that I'm a decent enough golfer that it's possible to play at that level, I'll tell you what I've discovered about my own game, and perhaps anyone's game at that level.

Continue reading

The Time I Met Joan Rivers

I occasionally get invited to attend performances at a resort and casino in the Palm Springs area by a family member who has hookups there. Usually what happens is we'll get seats to a performance, and often before the show, my wife and I will get the opportunity to meet and greet with the performer. Usually, the performer is kinda in a trance, stuck in their own thoughts, presumably preparing mentally for their performance, and the meet and greet is more like a quiet handshake and a picture, and that's that. But one time, Joan Rivers did a show there, and in the meet and greet, she was clearly a much different kind of person than many of the other performers we'd been able to meet, and I also got some interesting insight into what kind of businesswoman she was. It was illuminating.

Continue reading

What is DBPedia and How Do I Use It?

DBPedia, in general, is a linked-data extraction of Wikipedia. If you've been living under a rock and don't know what Wikipedia is, it's a crowd-sourced encyclopedia hosted on the internet. In terms of data structure, Wikipedia reports on its own wiki page that it is powered by clusters of Linux servers and MySQL databases, and uses Squid caching servers in order to handle the 25,000 to 60,000 page requests per second that it gets on average. In terms of the product, it is very culturally significant in that it is one of the most referenced sources of general information on earth, if not the outright leader. Again, DBPedia, for all intents and purposes, is a linked-data version of that dataset.
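As a small taste of the "how do I use it" part, here is a minimal sketch of querying DBPedia's public SPARQL endpoint from Node for the English abstract of a resource. It assumes a recent Node version with a global fetch available, and the resource chosen is just an example.

```javascript
// Minimal sketch: ask DBPedia's public SPARQL endpoint for the English
// abstract of a resource. Assumes Node 18+ for the global fetch.
const query = `
  PREFIX dbo: <http://dbpedia.org/ontology/>
  SELECT ?abstract WHERE {
    <http://dbpedia.org/resource/Semantic_Web> dbo:abstract ?abstract .
    FILTER (lang(?abstract) = "en")
  }`;

const url = 'https://dbpedia.org/sparql'
  + '?query=' + encodeURIComponent(query)
  + '&format=' + encodeURIComponent('application/sparql-results+json');

fetch(url)
  .then(res => res.json())
  .then(data => {
    // Standard SPARQL JSON results: one binding per solution row.
    data.results.bindings.forEach(b => console.log(b.abstract.value));
  })
  .catch(err => console.error(err));
```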

Continue reading

When to Use a Triplestore in Your Application

So, you've heard of a triplestore; that's an important first step. Now you're wondering why you'd need one. That is a good question. I believe the best way to answer it is to talk a little bit about what we know about triples as a data model, what SPARQL is good for, and where the industry has gone in the last few years that has caused us to need triples and SPARQL in the first place. Let's get started.
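Before you dig in, here is a toy sketch of the triple data model in plain JavaScript (not any particular triplestore's API): every fact is a subject-predicate-object statement, and a SPARQL basic graph pattern boils down to matching statements against patterns with variables.

```javascript
// Toy illustration of the triple data model. The identifiers are made up.
const triples = [
  { s: ':TimBernersLee', p: ':invented',  o: ':WorldWideWeb' },
  { s: ':TimBernersLee', p: ':bornIn',    o: ':London' },
  { s: ':London',        p: ':locatedIn', o: ':England' }
];

// A null in the pattern behaves like a SPARQL variable.
function match(pattern) {
  return triples.filter(t =>
    (pattern.s === null || pattern.s === t.s) &&
    (pattern.p === null || pattern.p === t.p) &&
    (pattern.o === null || pattern.o === t.o)
  );
}

// "What do we know about Tim Berners-Lee?"  (SPARQL: :TimBernersLee ?p ?o)
console.log(match({ s: ':TimBernersLee', p: null, o: null }));
```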

Continue reading

Answer Synthesis is the Future, Let me Tell You What It Is

The act of computationally creating an answer via cognitive computing or conceptual reasoning, rather than searching for it with text, curiously gets described in so many ways, yet nobody ever seems to talk about it directly; it's always talked about in terms of how it is done. I propose we call it "answer synthesis". Let's dig deeper.

Continue reading

What Type of Problem is Ubiquitous Computing, Really?

Ubiquitous computing, as a term, has been around for quite some time now. It refers to a state of computing in which data, interfaces, and computation are essentially omnipresent and available for interaction in a wide variety of forms for a wide array of purposes. In essence, when people talk about the Internet of Things, they usually are describing what others refer to as ubiquitous computing. One of the aspects of this paradigm that makes it ubiquitous is a somehow-universal interoperability between all connected things.

Also, separate from that, there should be a sense of ambient intelligence that persists around all of these interacting agents. Obviously, interoperability, intelligence, high availability, access, security, communication, data interoperability, data analysis, prediction, etc., all fall under the umbrella of the term. However, does all of this really need to be solved in order to deliver the user experience of interoperability and ambient intelligence? I think not. Either way, there are lots of things to think about when it comes to putting your finger on what real problems are left to solve in this space.

Continue reading

Is Semantic Web Dead or Alive?

The semantic web is alive, and I will tell you why. But first, let me tell you how I arrived at that conclusion.

When I first came to my current job, I was tasked with writing an automated implementation of Schema.org as a service, which multi-site owners could adopt as a way to shortcut the tagging and structuring of their site data for the sake of acquiring rich snippets, and ultimately to get better search engine performance.
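To show what that kind of output looks like, here is a minimal sketch of the sort of Schema.org markup such a service might emit for an article page, rendered as a JSON-LD script tag. The generator function and its fields are hypothetical placeholders, not the actual service I built.

```javascript
// Hypothetical sketch: emit Schema.org JSON-LD for an article page so search
// engines can pick it up for rich snippets. Not the actual service I built.
function articleJsonLd(page) {
  const data = {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: page.title,
    author: { '@type': 'Person', name: page.author },
    datePublished: page.publishedOn,
    publisher: { '@type': 'Organization', name: page.siteName }
  };
  // Site owners drop this tag into the page's <head>.
  return '<script type="application/ld+json">' +
         JSON.stringify(data, null, 2) +
         '</script>';
}

// Example usage with placeholder values.
console.log(articleJsonLd({
  title: 'Is Semantic Web Dead or Alive?',
  author: 'Example Author',
  publishedOn: '2014-01-01',
  siteName: 'Example Blog'
}));
```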

During that time, I learned a lot about Schema.org, semantic web technologies, linked data, and Google. So, with that said, if you're here wanting to know whether you should care about the semantic web, let me drop some knowledge.

Continue reading