Our academic background compels us to thoroughly evaluate the algorithms and systems we develop, and that is exactly what we do. Recent experiments on publicly available data (LETOR4.0) revealed that our self-learning search engine substantially outperforms state-of-the-art search algorithms. What does this mean? That you will be hard-pressed to find a better learning to rank search algorithm out there than the one we have.
Let’s dive into some of the details. We ran experiments on the LETOR4.0 dataset, which is made available by Microsoft. This dataset is particularly useful for testing learning to rank approaches: search methods based on machine learning. Since 904Labs self-learning search is exactly this, the LETOR4.0 dataset is perfect for determining its performance. The dataset contains about 25 million web pages and 1,700 queries with labeled documents. Labeled documents have been given a relevance score with respect to a query, so that we know which documents are relevant to that query and which ones are not.
For learning to rank experiments we do not actually need the original text of documents and queries. In the LETOR4.0 dataset, documents and queries are represented by feature vectors. Let’s take a simple example: we have a query "bicycle" and a certain document X (consisting of 100 words in total) that contains this word eight times. On top of that, this document has 20 incoming links, and a URL length of 70 characters. Now, instead of representing the query and document by their actual content, we can also say that for the query "bicycle", document X is [100, 8, 20, 70]. We have now represented the document by four features and their values: document length (100 words), the frequency of the query term in the document (8), the number of incoming links (20), and the length of the URL of the document (70). We can think of many more features like these, and this is what has been done for LETOR4.0. The dataset contains 46 features, all of which can be found in this pdf.
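To make this concrete, here is a small sketch of how such a query-document pair might look on disk and how to read it back. LETOR-style files store one line per labeled query-document pair, with a relevance label, a query id, and the feature values; the feature indices and the exact line below are illustrative, not the official LETOR4.0 feature list.

```python
# Sketch: the "bicycle" example as a LETOR-style line, plus a tiny parser.
# Feature indices here are illustrative: 1=doc length, 2=term frequency,
# 3=incoming links, 4=URL length.

def parse_letor_line(line: str):
    """Parse one 'label qid:Q idx:val ... # comment' line into its parts."""
    line = line.split("#")[0].strip()      # drop any trailing comment
    parts = line.split()
    label = int(parts[0])                  # graded relevance label
    qid = parts[1].split(":")[1]           # query identifier
    features = {}
    for token in parts[2:]:
        idx, val = token.split(":")
        features[int(idx)] = float(val)
    return label, qid, features

# Document X for the query "bicycle": [100, 8, 20, 70]
line = "2 qid:42 1:100 2:8 3:20 4:70"
label, qid, feats = parse_letor_line(line)
print(label, qid, feats)  # 2 42 {1: 100.0, 2: 8.0, 3: 20.0, 4: 70.0}
```

In the real dataset the raw values are typically normalized per query, so a learner sees comparable feature ranges across queries.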
The goal of any learning to rank system is to find an optimal combination of the available features to generate the best possible ranking of search results for any given query. To do this, the system needs to be trained, which is why the LETOR4.0 dataset contains training, validation, and test data. Systems are first trained on the training data and are then validated on the validation data. This can be repeated several times, until one is satisfied with the results. Only when everything seems to be in order is the final trained system applied to the test data. This way we make sure that the test data has never been seen before by the system, so it could not have "learned it by heart".
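The protocol above can be sketched in a few lines of code. The `fit` and `evaluate` methods below belong to a hypothetical ranker interface standing in for any learning to rank model; they are assumptions for illustration, not part of LETOR4.0 or our system.

```python
# Sketch of the train / validate / test protocol. Each candidate ranker is
# trained on the training split and scored on the validation split; only the
# winner ever touches the test split, exactly once.

def select_model(candidates, train, valid):
    """Train each candidate on `train`, keep the one that validates best."""
    best, best_score = None, float("-inf")
    for make_ranker in candidates:
        ranker = make_ranker()
        ranker.fit(train)               # learn how to combine the features
        score = ranker.evaluate(valid)  # e.g. nDCG on the validation queries
        if score > best_score:
            best, best_score = ranker, score
    return best

# The held-out test data is used only at the very end:
# final_score = select_model(candidates, train, valid).evaluate(test)
```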
Over the years, many different learning to rank systems have been proposed. The most prominent ones have been trained and tested on the LETOR4.0 dataset. These systems include RankSVM-Struct, ListNet, AdaRank, and RankBoost. In the results section below, we compare these systems to the learning to rank method implemented in our self-learning search engine, called 3PR.
We measure results in nDCG@5. Normalized Discounted Cumulative Gain (nDCG) is a metric that expresses how good a system is at getting the best results to the top of the ranking. The better a system is, the closer it gets to 1.0. We measure nDCG at the fifth document for each query, mimicking a user who wants the best result in the top 5. The results in the list below show that our system improves over the best learning to rank method (RankBoost) by more than 4% in nDCG.
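For readers who want to see the metric itself, here is a minimal sketch of nDCG@5 using the common exponential-gain formulation, (2^rel - 1) / log2(rank + 1); evaluation scripts for learning to rank benchmarks use a formulation along these lines, though exact details can vary between tools.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """DCG normalized by the DCG of the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of a ranked result list, top to bottom. A ranking that
# already has the most relevant documents on top scores close to 1.0:
print(round(ndcg_at_k([2, 2, 1, 0, 1, 0], k=5), 3))
```

Because the discount shrinks with rank, getting a relevant document from position 5 to position 1 moves the score far more than any reshuffling further down the list, which is exactly the behavior we want from a top-5 metric.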
The plot below shows that the performance of our self-learning search system improves as it is fed more training data. One can look at this as improvement over time: as more data comes into the self-learning search engine, it gets better at combining all available features and produces better document rankings.
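A learning curve like this one can be produced by training on growing slices of the training queries and evaluating each resulting model on the same held-out set. The `fit`/`evaluate` ranker interface below is again a hypothetical stand-in, not our actual API.

```python
# Sketch: compute points on a learning curve. Each iteration trains a fresh
# ranker on a larger prefix of the training data and scores it on a fixed
# held-out set, yielding (training size, score) pairs to plot.

def learning_curve(make_ranker, train_queries, heldout_queries, steps=5):
    points = []
    for i in range(1, steps + 1):
        subset = train_queries[: len(train_queries) * i // steps]
        ranker = make_ranker()
        ranker.fit(subset)
        points.append((len(subset), ranker.evaluate(heldout_queries)))
    return points  # e.g. x-axis: training size, y-axis: nDCG@5
```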
To conclude, 904Labs self-learning search solutions are powered by the best learning to rank method out there. If you want to know more about our solutions or about the underlying algorithms, don't hesitate to get in touch.