Serving and Evaluating Prediction Engines Online

This post covers using an AI engine to display predictions on a live website and setting up A/B experiments to measure their performance.

A/B testing involves creating two or more versions of a page and splitting user traffic among them in order to evaluate the impact of a change. It can be used to compare recommendation engines against each other, or to measure the difference between having and not having recommendations at all. In the serving path, a serving configuration is specified and a prediction API returns a list of item IDs for that placement.

Take the case of predicting a list of items. A typical API provides a ‘predict’ method that returns a list of items sorted by predicted probability according to the model’s objective. The page’s rendering functions then display those items to users.
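As a concrete sketch, assuming the service is the Google Cloud Retail API (which matches the “retail client library” mentioned later), a minimal predict call might look like the following; the project ID, serving-config name, and visitor ID are placeholders:

```python
# Minimal predict-call sketch, assuming the google-cloud-retail package.
from google.cloud import retail_v2

client = retail_v2.PredictionServiceClient()

response = client.predict(
    request=retail_v2.PredictRequest(
        # The serving config (placement) identifies which model/slot to query.
        placement=(
            "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
            "/servingConfigs/recommended_for_you"
        ),
        user_event=retail_v2.UserEvent(
            event_type="home-page-view",
            visitor_id="visitor-123",
        ),
    )
)

# Results come back already sorted by the model's score for its objective,
# so rendering them in order preserves the ranking.
for result in response.results:
    print(result.id)
```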

The predict call will typically need to authenticate against the service provider’s authentication service using an OAuth or service account, and it is advisable to create an account dedicated to predictions. In production, use the provider’s client libraries to call the prediction API rather than, say, a curl script.
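One common pattern for that step (shown here as an assumption, using Google’s google-auth library and a service-account key file with a placeholder path) is to load the dedicated credentials and hand them to the client library, which then signs and refreshes tokens on every call:

```python
from google.cloud import retail_v2
from google.oauth2 import service_account

# Key file for a service account created specifically for predictions;
# "predictions-sa.json" is a placeholder path.
credentials = service_account.Credentials.from_service_account_file(
    "predictions-sa.json"
)

# The client library handles token signing and refresh, which is the main
# advantage over hand-rolling calls with curl and a manually obtained token.
client = retail_v2.PredictionServiceClient(credentials=credentials)
```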

The fields in a prediction request typically include the serving configuration (the identifier for the placement on the page) and request-specific user information: the event type, the session ID, the user’s ID, and, for models that use them, a product ID. The product ID applies only to models built around a product of interest, such as an “others you may like” placement, which needs a product against which the ‘others’ can be retrieved. Placements such as “recommended for you” do not need one, since only the user’s history informs the recommendations.
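Continuing the Retail API assumption, the user-event portion of the request carries those fields; the IDs below are placeholders, and the product detail is only attached for product-aware placements:

```python
from google.cloud import retail_v2

# Event for a product-aware model ("others you may like"): the product being
# viewed is attached so the model knows what the "others" relate to.
detail_page_event = retail_v2.UserEvent(
    event_type="detail-page-view",
    visitor_id="visitor-123",                         # browser/session identifier
    user_info=retail_v2.UserInfo(user_id="user-42"),  # logged-in user, if any
    product_details=[
        retail_v2.ProductDetail(product=retail_v2.Product(id="sku-789"))
    ],
)

# Event for a history-only model ("recommended for you"): no product ID needed.
home_page_event = retail_v2.UserEvent(
    event_type="home-page-view",
    visitor_id="visitor-123",
    user_info=retail_v2.UserInfo(user_id="user-42"),
)
```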

Results can be shaped by other request options, too. For example, a ‘filter’ can be applied to exclude ‘out of stock’ or otherwise tagged items, and a limit can be placed on the number of predictions returned in a response. Additional parameters control whether full product data and per-item probability scores are returned, as well as levels for settings such as fairness and reranking.
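Those controls map onto request options roughly as follows; this is a sketch under the same Retail API assumption, and the filter syntax and parameter key names should be checked against the provider’s documentation:

```python
from google.cloud import retail_v2
from google.protobuf import struct_pb2

request = retail_v2.PredictRequest(
    placement=(
        "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
        "/servingConfigs/recommended_for_you"
    ),
    user_event=retail_v2.UserEvent(
        event_type="home-page-view",
        visitor_id="visitor-123",
    ),
    # Drop items tagged as out of stock before results are returned.
    filter="filterOutOfStockItems",
    # Cap the number of predictions in the response.
    page_size=10,
    # Optional knobs: return the full product payload and per-item scores.
    # Other keys (e.g. price-reranking or diversity levels) follow the same
    # map-of-values pattern.
    params={
        "returnProduct": struct_pb2.Value(bool_value=True),
        "returnScore": struct_pb2.Value(bool_value=True),
    },
)
```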

The response from the API can be used however the developer sees fit. Typical destinations include web pages, personalized emails, and in-app recommendations. Most models are personalized in real time, so results generally should not be cached or stored, since they become stale quickly.

Returning recommendation results as part of rendering the larger webpage has some drawbacks. Because the request is “blocking”, adding it to the server side can introduce latency and tightly couples page serving with recommendation serving. Some server-side API integrations may also limit the recommendation-serving code to web results, so extra attention may be needed to handle mobile results.

The coupling issue can be addressed with Ajax. The predict method usually cannot be called directly from client-side Javascript because it requires authentication, but a handler that serves Ajax requests can work around that. The existing serving infrastructure can expose such endpoints through extra applications, such as app engines or cloud functions, and cloud functions are a convenient ‘serverless’ way to deploy these handlers. For example, Python and a retail client library can be deployed as a cloud function endpoint that returns results in JSON or HTML.
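A minimal sketch of such a handler, assuming Google Cloud Functions with the functions-framework package and the same Retail client as above (the endpoint name, query parameters, and serving config are placeholders):

```python
import json

import functions_framework
from google.cloud import retail_v2

client = retail_v2.PredictionServiceClient()

PLACEMENT = (
    "projects/PROJECT_ID/locations/global/catalogs/default_catalog"
    "/servingConfigs/recommended_for_you"
)


@functions_framework.http
def recommend(request):
    """HTTP handler reachable from client-side JavaScript via Ajax/fetch.

    The browser never handles credentials: the function runs under its own
    service identity and returns only the predicted item IDs as JSON.
    """
    visitor_id = request.args.get("visitor_id", "anonymous")

    response = client.predict(
        request=retail_v2.PredictRequest(
            placement=PLACEMENT,
            user_event=retail_v2.UserEvent(
                event_type="home-page-view",
                visitor_id=visitor_id,
            ),
        )
    )

    body = json.dumps({"items": [r.id for r in response.results]})
    headers = {
        "Content-Type": "application/json",
        # Allow the page's JavaScript to fetch the result cross-origin if needed.
        "Access-Control-Allow-Origin": "*",
    }
    return (body, 200, headers)
```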

How do we evaluate deployed recommendations? The general way is through A/B tests. One can test different AI services against each other, an AI service against no service, changes to model or serving configurations (different objectives like CTR vs. CVR, changes to price ranking and fairness scores), or changes to the interface and placements.
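When the split is not handled by an external testing tool, one generic way to assign traffic is a deterministic hash on the visitor ID, so each user stays in the same arm while per-arm metrics such as CTR or CVR accumulate. This is a sketch of that idea, not a feature of any particular service:

```python
import hashlib


def assign_variant(visitor_id: str, variants=("control", "treatment")) -> str:
    """Deterministically assign a visitor to an experiment arm.

    Hashing the visitor ID (instead of picking randomly per request) keeps a
    given user in the same arm for the life of the experiment, which is what
    makes per-arm metrics such as CTR or CVR comparable.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Example: route "treatment" visitors to the serving config under test
# and "control" visitors to the existing one.
print(assign_variant("visitor-123"))
```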

Certain cloud providers allow A/B tests to be created and run without server-side code changes. Front-end changes are generally the most tractable to test this way, and Javascript can be added to each variant of an experiment, which is useful for content fetched from calls to cloud functions. A/B experiments on server-side-rendered content can instead be handled through redirect tests or other API calls.

Say you want to test two models on the same page. Both models use product IDs and serve a product-details-page placement. For this example, a cloud function returns the recommendations as HTML, which can then be inserted into a DOM element and displayed on page load, as in the sketch below.
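A sketch of that cloud function, again assuming functions-framework and the Retail client; each experiment variant passes a different serving-config name, and the product and visitor IDs come from the page via query parameters (all names here are placeholders):

```python
from html import escape

import functions_framework
from google.cloud import retail_v2

client = retail_v2.PredictionServiceClient()

CATALOG = "projects/PROJECT_ID/locations/global/catalogs/default_catalog"


@functions_framework.http
def recommend_html(request):
    """Return recommendations as an HTML fragment ready for DOM insertion.

    The two experiment variants call this endpoint with different
    `serving_config` values, so the same handler serves both models under test.
    """
    serving_config = request.args.get("serving_config", "others_you_may_like")
    product_id = request.args.get("product_id", "")
    visitor_id = request.args.get("visitor_id", "anonymous")

    response = client.predict(
        request=retail_v2.PredictRequest(
            placement=f"{CATALOG}/servingConfigs/{serving_config}",
            user_event=retail_v2.UserEvent(
                event_type="detail-page-view",
                visitor_id=visitor_id,
                product_details=[
                    retail_v2.ProductDetail(
                        product=retail_v2.Product(id=product_id)
                    )
                ],
            ),
        )
    )

    # Build a simple list the page can drop straight into a placeholder div.
    items = "".join(
        f'<li><a href="/products/{escape(r.id)}">{escape(r.id)}</a></li>'
        for r in response.results
    )
    return (f"<ul>{items}</ul>", 200, {"Content-Type": "text/html"})
```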

The experiment itself is orchestrated by the analytics suite, whose settings, such as objectives, traffic allocation, and matching rules, guide the test. The console provides the code that populates the div for each variant. If the two variants call the recommender service with different placement IDs, take care not to call the recommender more often than necessary. Some testing services can also project a likely winning variant from early metric results.

That wraps up the discussion. Thank you!
