Precision and Recall calculation for Search Engines

Search engines retrieve a number of documents as the result of a query. The relevance of those results can be checked by consulting the users. For details, see the Cranfield Evaluation Methodology.

Different user groups might find different result sets more relevant. Depending on the context, a user sometimes wants as many results as possible, accepting a proportionally larger share of irrelevant documents, and sometimes prefers fewer results with a higher share of relevant hits.

The precision is the ratio of relevant returned documents to the total number of returned documents. If an index has 1M documents and query q returns 100 documents, of which 45 are relevant, the precision is 45/100.
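As a minimal sketch of the precision calculation above, assuming the relevance judgments have already been counted. The function name `precision` and its parameter names are made up for this illustration, and scoring an empty result list as 0 is an assumption of the sketch, not part of the definition.

```python
def precision(relevant_returned: int, total_returned: int) -> float:
    """Fraction of the returned documents that are relevant."""
    if total_returned == 0:
        # Assumption of this sketch: an empty result list scores 0.
        return 0.0
    return relevant_returned / total_returned


# The example from the text: 100 returned documents, 45 of them relevant.
print(precision(45, 100))  # 0.45
```

Note that the size of the index (1M documents) plays no role in the calculation; only the returned documents matter.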

The recall is the ratio of relevant returned documents to the total number of relevant documents in the index. If an index has 1M documents and query q returns 100 documents, of which 45 are relevant, but the index contains 90 relevant documents in total, the recall is 45/90.
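The recall calculation can be sketched with sets of document IDs. This is only an illustration: the function name `recall`, the variable names, and the document IDs are all invented here, and it assumes the full set of relevant documents is known, which in practice requires relevance judgments over the collection.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of all relevant documents that were actually returned."""
    if not relevant:
        # Assumption of this sketch: no relevant documents scores 0.
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical document IDs matching the example in the text.
relevant = set(range(90))                            # 90 relevant documents
retrieved = set(range(45)) | set(range(1000, 1055))  # 45 relevant + 55 irrelevant

print(len(retrieved))               # 100
print(recall(retrieved, relevant))  # 0.5
```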

The formula for the relevance, commonly known as the F-measure, is:

P – Precision

R – Recall

B – Parameter, often set to 1

\[ F_B = \frac{(1 + B^2) \cdot P \cdot R}{B^2 \cdot P + R} \]

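Parameter B tunes the relative weight of the P and R values: values above 1 give more weight to recall, values below 1 give more weight to precision. It is often set to 1. When it is set to 1, the formula becomes the harmonic mean of P and R:

\[ F_1 = \frac{2 \cdot P \cdot R}{P + R} \]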
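To make the role of B concrete, here is a small sketch that assumes the formula is the standard F-measure, (1 + B²)·P·R / (B²·P + R). The function name `f_measure` is invented for this illustration, and returning 0 when both P and R are 0 is an assumption of the sketch.

```python
def f_measure(p: float, r: float, b: float = 1.0) -> float:
    """Combine precision p and recall r; b weights recall relative to precision."""
    denominator = b * b * p + r
    if denominator == 0:
        # Assumption of this sketch: score 0 when both p and r are 0.
        return 0.0
    return (1 + b * b) * p * r / denominator


# Values from the examples in the text: P = 45/100, R = 45/90.
p, r = 45 / 100, 45 / 90
print(round(f_measure(p, r), 4))         # B = 1
print(round(f_measure(p, r, b=2.0), 4))  # B = 2 puts more weight on recall
```

With B = 1 the result lies between P and R, closer to the smaller of the two, which is why a system cannot score well by maximizing only one of them.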

 
