Kuhn-Tucker Demystified

For some, Kuhn-Tucker method (also known as Karush-Kuhn-Tucker method or KKT for short) seems like a random optimization method that comes out of nowhere. The truth is, KKT multiplier is just a glorified Lagrange multiplier. I do not want to downplay the importance and relevance of KKT multiplier. It has been successfully utilized in a variety of optimization cases in miscellaneous areas. However, its mathematics is far from hard to understand (given that one understands Lagrange multiplier) and I will make my case by looking at an example:

maximize f(x,y)= \sqrt{xy}

subject to

  1. x \geq 0
  2. y \geq 0
  3. 5x+3y \leq 10

The area defined by the constraint can be seen in the following graph:


Let us suppose that we do not know anything about KKT but we are familiar with unconstrained optimization and Lagrange multiplier, how are we going to solve this optimization problem?

One may approach this optimization problem by doing the following steps (I know a sensible person would not do these but bear with me for a second. You can skim step 2 onwards and revisit them later):

  1. Derive f(x,y) with respect to x and y and solve for these equal to 0. If the solution (say (x_1,y_1)) lies in the interior of the triangle, then compute the value of the function on this point and save them (i.e. save (x_1,y_1,f(x_1,y_1))) so that we can compare them with the other possible maxima later to determine the global maximum. Repeat this for all solutions. This will give us all possible local maxima in the interior. For this particular example, there is no point in the interior of the diagram making (f_x,f_y) = (0,0) so there is nothing to save.
  2. Use Lagrange multiplier to get all possible maxima on line segment b (but not necessarily on point A and C due to how Lagrange multiplier work). That is to solve: maximize f(x,y)= \sqrt{xy} subject to 5x+3y = 10 and ignore all the result with x<0 or y<0. A typical way is to define the Lagrange function Lb (x,y,\lambda) =\sqrt{xy} + \lambda(10-5x-3y) and solve Lb_x = Lb_y = Lb_{\lambda} = 0. Then, we’ll save all solutions as well as the value of the functions on these points. For this particular example:
    1. Lb_x = \frac{1}{2} \sqrt{y/x}-5\lambda = 0
    2. Lb_y = \frac{1}{2} \sqrt{x/y}-3\lambda = 0
    3. Lb_{\lambda}=10-5x-3y = 0
    4. The solution is (1,5/3) and the value of the function at this point is \sqrt{5/3}
  3. Similarly, use Lagrange multiplier to get all possible maxima on line c (but not necessarily on point A and B):
    1. Lc=\sqrt{xy} + \mu x
    2. Lc_x = \frac{1}{2} \sqrt{y/x}+\mu = 0
    3. Lc_y = \frac{1}{2} \sqrt{x/y} = 0
    4. Lc_{\mu}=x = 0
    5. There is no solution.
  4. Again, do this for line a (but not point B and C):
    1. La=\sqrt{xy} + \nu y
    2. La_x = \frac{1}{2} \sqrt{y/x} = 0
    3. La_y = \frac{1}{2} \sqrt{x/y}+\nu = 0
    4. La_{\nu}=x =0
    5. There is no solution.
  5.  On point A:
    1. LA=\sqrt{xy} + \mu x +\lambda(10-5x-3y)
    2. LA_x = \frac{1}{2} \sqrt{y/x} + \mu - 5\lambda
    3. LA_y = \frac{1}{2} \sqrt{x/y}- 3\lambda
    4. LA_{\mu}=x
    5. LA_{\lambda} =10-5x-3y
    6. There solution is (0,10/3) and the value of the function here is 0.
  6. On point B: …
  7. On poi.. OK I think you get my point: In general, we need to do a lot of  Lagrange multiplier calculations and all these calculations are very similar. Sure, using Lagrange multiplier seemed like a waste of time in my example above as the function and constraints are pretty easy but usually it would not be this simple and you may have no choice.

This is where KKT makes your life (just a little bit) easier: Notice how all the Lagrange functions and their derivatives are really similar that we basically repeat many calculations over and over again? The idea of KKT is that you only need to define one Lagrange function and derive once. The Lagrange function you need to define is

L =\sqrt{xy} + \mu x + \nu y +\lambda(10-5x-3y);

that is, the Lagrange function when you consider all constraints as equalities. And you will get:

i)L_x = \frac{1}{2} \sqrt{y/x} + \mu - 5\lambda

ii)L_y =\frac{1}{2} \sqrt{x/y}+ \nu - 3\lambda

iii) L_{\mu} = x

iv) L_{\nu} = y

v) L_{\lambda} =10-5x-3y

Why define and derive this function? Because it is really easy to get to all the derivatives in Step 1-7 from here. The idea is, since we have computed Lagrange of all constraints, we can then turn any equality on and off. For instance, if we want to find maxima in the interior just set L_{\mu},L_{\nu},L_{\lambda} >0 (that is x>0, y>0 and 10>5x+3y which means we only consider the pink region inside the triangle) and \mu = \nu = \lambda =0 on equations i-v and you will get the exact same equations we had in Step 1.

On line segment c, we know that the equality sign on constraint 1 has to be satisfied (binding) while the equality signs on constraint 2 and 3 do not have to be satisfied (non-binding). Therefore, to get all possible maxima on line c, just set the following values on equations i-v: L_{\mu} =0, L_{\nu},L_{\lambda} >0 (which just means that x=0, y> 0 and 10>5x+3y) and take \nu = \lambda = 0 and we will end up with the same equations in Step 3. And on point A, just set L_{\mu} =0, L_{\lambda}=0, L_{\nu}>0 and take \nu = 0 and we shall recover equations of Step 5.

Note that in all our example above, whenever \mu,\nu or \lambda equal 0, then L_{\mu},L_{\nu},L_{\lambda} will be non-zero and vice-versa. This condition you get when you turn on/off some constraints is known as complementary slackness. Do this for all possible combinations of \mu,\nu and \lambda where each can be zero or non-zero and then compare the value of the functions on the points we got by solving all these systems of equations. In this example, there are 8 possibilities and in general there 2^n cases where n is the number of constraints. In practice, you can quickly reduce the number by arguing that some cases are impossible. Take the case \mu,\nu and  \lambda all non-zero. Their complements L_{\mu},L_{\nu},L_{\lambda} will all be zero but this means we are considering points in the intersection of the line a,b,c but we know there is no such point.

You may have realized at this point that KKT multiplier is the optimization method that let you derive once, then branches off into multiple systems of equations (by using complementary slackness condition) instead of branching off into multiple Lagrange derivations just to get the same multiple systems of equations. You may even think that you can formulate the KKT multiplier for a more general optimization problem; I really encourage you to do this and then compare your version with the ones available on many lecture notes/textbooks.

Link: A worked example of KKT multiplier


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s