Solving the Problem of Unified Deep Business Search is Fascinating

At this moment, there are exactly 604,844 registered corporations in the USA.

The story of why I know that is pretty fascinating. A while back, I embarked on a personal project to unify data sources and build a reasonably powerful business search tool, and in the process, I learned a lot about what information is and isn’t easy to acquire out there on the web.

First, you have social networks. Those were easy. Just connect to the APIs, run a basic string search for an entity, and let the user rapidly dive into the content and flag it as the correct entity. From there, you can look into social network engagement, activity, and those types of things.

Second, you have general information repositories, like Wikipedia. Using the DBpedia and Wikidata APIs, I started pulling in RDF data about companies. Due to the crowdsourced nature of those data sources, some companies are rich with data and some have none. Basically, these sources proved to be unreliable when it came to identifying all possible companies.

Third, identifying the products and services a company offers proved to be the most difficult. Most sources don’t reliably track a company’s services, products, product lines, etc. So the most reliable place to get that information is the company’s website itself. The only problem is that this is supposed to be a unified search, so I had to build a special scraper / parser that took that information and translated it into my data graphs.

Fourth, getting financial data depends more on the type of company than I thought it would. On one hand, if you’re searching for a public company, you have stocks, accounting data, and all sorts of interesting financial information. On the other hand, if you’re dealing with a private company, you have to be creative about how to characterize its finances.

It was here that I discovered that you can look into a private company’s finances by searching through SEC filings. They’re available through an antiquated XML / search interface called EDGAR, which is served over FTP. I was able to get a basic API going that consumed the information from that FTP server and parsed it into useful formats, but even then, it was limited in what you could do with it. One interesting thing that I found in these filings is that the company’s board members are always listed.

It was then that I realized why services like Bloomberg’s company search always have directors and leadership listed: you can extract that from SEC filings. Easy peasy! I also discovered that you can actually get a raw data list of all existing corporations in the US, and if it’s parsed correctly, you can tease out their CIK numbers, which are used to identify trails of SEC filings.
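
As a rough illustration of teasing out CIK numbers, here is a minimal Python sketch. It assumes each line of the raw list looks like `COMPANY NAME:0000123456:` (a name, a colon, a zero-padded ten-digit CIK, and a trailing colon). That layout, the sample rows, and the `parse_cik_line` helper are all assumptions made for the example, not details of the parser I actually built.

```python
import re
from typing import Optional, Tuple

# Assumed line layout: "<company name>:<10-digit CIK>:".
# Names may contain colons themselves, so the pattern anchors on the
# end of the line rather than splitting on the first colon.
LINE_PATTERN = re.compile(r"^(?P<name>.+):(?P<cik>\d{10}):\s*$")


def parse_cik_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (company_name, cik) for a well-formed line, else None."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return match.group("name").strip(), match.group("cik")


# Made-up sample rows, including a name with a colon and a malformed row.
sample = (
    "EXAMPLE HOLDINGS INC:0000000001:\n"
    "ACME: WIDGETS LLC:0000000002:\n"
    "not a valid row\n"
)

parsed = [
    pair
    for pair in (parse_cik_line(line) for line in sample.splitlines())
    if pair is not None
]
cik_by_name = dict(parsed)
```

The CIK is kept as a string so the leading zeros survive, which matters if the value is later used to build a lookup key or a URL.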

When I was looking at that raw line-delimited list of registered corporations in the USA, I became curious how long the list was. Frankly, it was killing my browser’s scroll bar, so I just had to know. So I opened up my browser’s JavaScript console, selected the text from the DOM, and ran a basic regex splitter function to break all of that content into a long JavaScript array. From there, I just console.log’ed out the array.length, et voila! 604,844 items. Holy cow. Is that a lot? Or is that a little? The United States is so large. I couldn’t really map it mentally, so I started doing basic string searches in it, and found the regulars… Google, Uber, Twitter, etc. Then I started looking up other, smaller private companies that were either LLCs or traditional corporations, and yep, they were there. What about the local taco shop? Nope. Not a corporation. Fascinating!
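
I did the count and the spot checks in the browser’s JavaScript console, but the same idea is easy to sketch in Python. The sample text and the `split_lines` and `find_companies` helpers below are made up for illustration; the real list was far longer.

```python
import re
from typing import List


def split_lines(raw_text: str) -> List[str]:
    """Split on Unix or Windows newlines and drop blank lines."""
    return [line.strip() for line in re.split(r"\r?\n", raw_text) if line.strip()]


def find_companies(entries: List[str], query: str) -> List[str]:
    """Case-insensitive substring search over the list of entries."""
    needle = query.casefold()
    return [entry for entry in entries if needle in entry.casefold()]


# Made-up sample standing in for the raw line-delimited list.
raw_text = "GOOGLE INC\r\nUBER TECHNOLOGIES INC\n\nTWITTER INC\n"

entries = split_lines(raw_text)
print(len(entries))                      # number of non-blank lines
print(find_companies(entries, "uber"))   # entries containing "uber"
print(find_companies(entries, "taco"))   # empty list when nothing matches
```

Dropping blank lines before counting matters here, since a trailing newline would otherwise inflate the total by one.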

Once you parse all of the above information, you can build quite the graph of data. And while I’ve seen some companies that provide bits and pieces of that kind of information, nobody really provides a unified search across everything. Hopefully I will provide that service someday…

How I Built a Data Analytics Platform

Recently, I was tasked with building a method for performing data analysis as a service, as well as building a data science team that could perform those services for various industries. So I quickly kicked off a deep exploration of the field and the technologies available in the space. I found what I learned to be very engaging, and it got me thinking not only about the technical challenges in performing analysis, but also about managing the process of doing data science in a business context.


My Journey to Breaking 80 in Golf, and What I Learned Along the Way

I work very close to a golf course, so when weather permits, I get out there to practice on the range during the occasional lunch break. It’s been about a year since I played my very first round of golf, and last week, I broke 80. Since the beginning, I’ve gotten my share of punishment and reward from the game, and now that I’m a decent enough golfer that it’s possible to play at that level, I’ll tell you what I’ve discovered about my own game, and perhaps anyone’s game at that level.


The Time I Met Joan Rivers

I occasionally get invited to attend performances at a resort / casino in the Palm Springs area by a family member who has hookups there. Usually what happens is we’ll get seats to a performance, and often before the show, my wife and I will get the opportunity to meet and greet the performer. Usually, the performer is kinda in a trance, stuck in their own thoughts, presumably preparing mentally for their performance, and the meet and greet is more like a quiet handshake and a picture, and that’s that. But one time, Joan Rivers did a show there, and in the meet and greet, she was clearly a much different kind of person than many of the other performers we’d been able to meet. I also got some interesting insight into what kind of businesswoman she is. It was illuminating.
