DBpedia, in general, is a linked-data extraction of Wikipedia. If you’ve been living under a rock and don’t know what Wikipedia is, it’s a crowd-sourced encyclopedia hosted on the internet. In terms of data structure, Wikipedia reports on its own wiki page that it is powered by clusters of Linux servers and MySQL databases, and uses Squid caching servers to handle the 25,000 to 60,000 page requests per second that it gets on average. In terms of the product, it is culturally significant in that it is one of the most referenced sources of general information on earth, if not the outright leader. Again, DBpedia, for all intents and purposes, is a linked-data version of that dataset.
So, you’ve heard of a triplestore; that’s an important first step. Now you’re wondering why you’d need one? That’s a good question. I believe the best way to answer it is to talk a little bit about what we know about triples as a data model, what SPARQL is good for, and where the industry has gone in the last few years that has caused us to need triples and SPARQL in the first place. Let’s get started.
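To make the data model concrete before we go further, here’s a minimal sketch in Python of what triples look like and how a SPARQL-style pattern match works over them. The data and the toy `match` function are purely illustrative, not how a real triplestore is implemented:

```python
# A triple is just (subject, predicate, object). A triplestore is,
# conceptually, a big set of these plus indexes for pattern matching.
triples = {
    ("dbpedia:Tim_Berners-Lee", "rdf:type", "foaf:Person"),
    ("dbpedia:Tim_Berners-Lee", "foaf:name", "Tim Berners-Lee"),
    ("dbpedia:Wikipedia", "rdf:type", "schema:WebSite"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None behaves like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Roughly analogous to: SELECT ?s WHERE { ?s rdf:type foaf:Person }
people = match(p="rdf:type", o="foaf:Person")
```

The point is that every fact lives in one uniform shape, so any question you can phrase as a pattern over subjects, predicates, and objects is answerable without designing a schema up front.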
The act of computationally creating an answer via cognitive computing or conceptual reasoning, rather than searching for it with text, gets described in curiously many ways, but nobody ever seems to talk about it directly; it’s always talked about in terms of how it is done. I propose we call it “answer synthesis”. Let’s dig deeper.
Ubiquitous computing, as a term, has been around for quite some time now. It refers to a state of computing in which data, interfaces, and computation are essentially omnipresent and available for interaction in a wide variety of forms, for a wide array of purposes. In essence, when people talk about the Internet of Things, they are usually describing what others refer to as ubiquitous computing. One of the aspects of this paradigm that makes it ubiquitous is a somehow-universal interoperability between all connected things.
Also, separate from that, there should be a sense of ambient intelligence that persists around all of these interacting agents. Obviously, interoperability, intelligence, high availability, access, security, communication, data interoperability, data analysis, prediction, and more all fall under the umbrella of the term. However, does all of this really need to be solved in order to give users the experience of interoperability and ambient intelligence? I think not. Either way, there is a lot to think about when it comes to putting your finger on the real problems left to solve in this space.
The semantic web is alive, and I will tell you why. But first, let me tell you how I arrived at this conclusion.
When I first came to my current job, I was tasked with writing an automated implementation of Schema.org as a service, which multi-site owners could use as a shortcut to tagging and structuring their site data, for the sake of acquiring rich snippets and, ultimately, better search engine performance.
During that time, I learned a lot about Schema.org, semantic web technologies, linked data, and Google. So, with that said, if you’re here wanting to know whether you should care about the semantic web, let me drop some knowledge.
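To give you a flavor of what that kind of tagging looks like, here’s a rough Python sketch of generating a Schema.org snippet as JSON-LD, the format Google accepts for structured data. The field values and the `article_jsonld` helper are illustrative, not our actual service:

```python
import json

def article_jsonld(headline, author, url):
    """Build a minimal Schema.org Article description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "url": url,
    }

snippet = article_jsonld(
    "Why the Semantic Web Matters",   # hypothetical page title
    "Jane Doe",                        # hypothetical author
    "https://example.com/semweb",
)

# A site would embed this in the page head for crawlers to pick up:
markup = '<script type="application/ld+json">%s</script>' % json.dumps(snippet)
```

The appeal of automating this is obvious once you see it: the structure is mechanical, so a service can derive it from existing page data instead of asking every site owner to hand-write markup.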
There are a lot of people out there talking about the Internet of Things. A lot of them are really enamored with the idea that it is going to be a gazillion-dollar industry, but I think there are other things about it that are more fun to think about, namely how my dog is technically a thing in the Internet of Things. Before I get to that, let me explain what the IoT is and isn’t.
Partial updates are somewhat problematic in the world of RESTful applications. Currently, we use POST and PUT to write or update data, but updating sub-properties of a resource can get genuinely hard to code for once you get into the subtler application logic and error management, let alone on datasets that are very large or that pack deeply nested structures into a single JSON object, for example.
But, regardless, PUT and POST have done a satisfactory job up until now, and nobody really needs to use PATCH in a relational context. But therein lies an interesting point: data is getting bigger, and naturally, semantic data is becoming much more prevalent, and it’s URI-based. It logically follows that if data continues to become more semantic, and you’re dealing more often in deeply nested structures, you’ll need a URI-based updating method that is more flexible than PUT and POST. But you don’t have to take my word for it; let’s ask an expert.
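As a concrete illustration of why PATCH is attractive for nested data, here’s a minimal Python updater in the spirit of JSON Merge Patch (RFC 7396), where the request body names only what changes and `null` deletes a member. It’s a sketch, not a production implementation, and the example document is hypothetical:

```python
def merge_patch(target, patch):
    """Apply a JSON-Merge-Patch-style partial update (per RFC 7396)."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target outright
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

doc = {"user": {"name": "Ada", "email": "ada@example.com",
                "prefs": {"theme": "dark"}}}

# A PATCH body describes only the delta, however deeply nested:
patched = merge_patch(doc, {"user": {"prefs": {"theme": "light"},
                                     "email": None}})
```

Compare that to PUT, where the client would have to send the entire updated object and the server would have to replace it wholesale; the patch stays small no matter how large the resource grows.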
As a software guy, I believe it, and as a user experience guy, I guarantee it: data entry is the biggest problem in the software world.
A lot of applications are bidirectional in the sense that I use the application for something, and the application uses me for something. I want gratification, or information, or help with something, and it wants information. So why is it that we’re still using forms for everything? Well, in the not-so-distant past, we were introduced to the concept of devices and human interaction working harmoniously to produce wildly fluid experiences, with users able to provide instant feedback and information without having to type anything. Of course, this isn’t always the case, but we’ve gotten a taste of that design philosophy, and it’s really nice.
Fast forward a few years, and we started seeing wearables creep into the tech market. Wearable technology often makes use of accelerometers and other hardware to record data passively while you function in the world: exercising, sleeping, walking, monitoring your vitals, and things like that. This is revolutionary not just because the hardware can do amazing things, but because it gives people a way to record specific data and upload it into applications without having to type anything. Eventually, wearables will include miniature Wi-Fi access points, computer vision, and other amazing technologies. Combining these technologies into physical devices lets you, the human, skip over the cumbersome step of manual data entry.
So, if you’re part of the SEO community, you’ve probably heard the news that Google recently decided to go from sometimes-encrypted search to always-encrypted search. If you use Google Analytics to track the performance of your website out on the web, then you’ve probably noticed that the top recorded search terms that brought visitors to your site often showed a value of “(not provided)”. That value corresponds to visitors who were using the then-sometimes-encrypted search. Now that encryption is universal, the prediction is that this will be the only keyword information we’ll see for inbound search traffic.
This scares SEO companies, because their business model has typically been heavily based on the notion that websites need to be optimized for certain keywords, which then results in people seeing your website when they search for those exact terms. “If we can’t sell that service, then is this the death of SEO?” they ask. The answer is no. But before I explain why SEO is not dead, let me explain why Google doesn’t mind getting rid of keyword optimization.
When I talk to someone who belongs to or runs a development team, I often hear recurring themes in their criticisms of their existing setup or business flow. I frequently hear things like, “Our management wants XYZ to be written and delivered by this date, and that’s totally unrealistic!”, “We were stressing out last week because we found a really critical bug in production”, or “Our server did XYZ the other day, which caused downtime”, and much more.
What’s interesting is that, while these problems aren’t uncommon, they have pretty common solutions, and they also suffer from a very common set of barriers to fixing them: the team is too deep in some technology, too used to doing some other process, the code was architected in a funny way, and now there is too much legacy code to do it differently without restructuring everything.
I’m not saying those problems aren’t valid; in fact, they’re really difficult to fix. But if you find yourself in a position in which your application is not very old, there isn’t too much code, and your team isn’t that big yet, I guarantee you can make your life easier down the road by doing the following three things.