Optimization Methods

Optimal Control Theory

The term optimal control theory (OCT) refers to all the methods and algorithms that have been developed to solve an optimization in controlled dynamics, i.e. a problem of the form

\[ \text{min}\,F(x) \\ \left\{ \begin{array} \dot x(t) = g( x(t), a) \qquad (t>0)\\x(0) = x^0 \end{array} \right. \]

where \(a\) is a continuous or discrete function over time that represents the control and \(F(x)\) is the function to minimize. The latter one is known and used in OCT under different names, such as merit function, figure of merit or functional to optimize.

Quantum optimal control is a branch of OCT that studies the dynamics of a quantum system, quite often solving the Schrödinger equation with some controls \(\epsilon_i(t)\) linearly coupled to a set of Hamiltonians:

\[\Op{H}(t) = \Op{H}_0 + \sum_i \epsilon_i(t) \Op{H}_i.\]

Our main goal is to find a series of controls (usually representing some electromagnetic pulses that interact with the system) that minimize the defined functional. This functional is described in a such a way that its value is minimum whenever the system evolves in some desired manner, which is usually given by (but not restricted to) one of the following cases:

  • State-to-state transitions: an initial set of states at time \(t=0\) must evolve to a set of target states at final time \(t = T\): \(\{ \ket{\Psi_k} \}\rightarrow \{ e^{i \phi_k} \ket{\Psi^{\tgt}_k} \}\) for an arbitrary of fixed phase \(\phi_k\). For an open quantum system, \(\{ \Op{\rho}_k \} \rightarrow \{ \Op{\rho}^{\tgt}_k \}\).

  • Implementation of a unitary gate \(\Op{O}\): any state \(\ket{\Psi(t=0)}\) must evolve into the state \(\ket{\Psi(t=T)} = \Op{O}\ket{\Psi(t=0)}\). Since any unitary \(\Op{O}\) can be represented in a basis of the full Hilbert space, the optimization of a unitary gate maps to a simultaneous state-to-state transition for all of the basis states that span \(\Op{O}\). For the example of a two-qubit gate, we require

    \[ \Ket{k}\rightarrow e^{i \phi}\Op{O}\Ket{k}, \qquad\Ket{k} \in \{ \Ket{00},\Ket{01}, \Ket{10}, \Ket{11}\}. \]

    Note that for a unitary gate, there can be (at most) an arbitrary global phase \(\phi\), whereas for an arbitrary state-to-state transitions each target may have an different phase \(\phi_k\). In Liouville space, the optimization for a unitary gate is expressed as

    \[ \Op{\rho}_{ij}\rightarrow\Op{O}\Op{\rho}_{ij}\Op{O}^\dagger, \qquad \Op{\rho}_{ij} = \Ket{i}\Bra{j}. \]

Normally the functional \(J(\{\ket{\Psi_k(T)}\}, \{\epsilon_i(t)\})\) primarily consists of a final time functional \(J_T(\{\ket{\Psi_k(T)}\})\) (similarly for \(\{\Op{\rho}_k(T)\}\)),but it may include additional running costs that would impose some extra restrictions over the system. with these special conditions we can manipulate can also

  • Put an upper bount to the controls’ amplitude.

  • Prevent the population from going into a forbidden subspace.

  • Make a robust control against some kind of noise

QDYN includes the qdyn_optimize utility to set up and perform optimization of all of the above cases, and more. Please refer to the documentation of that utility for details.

How to optimize (using QDYN)

The main optimization algorithm used in our group is Krotov’s algorithm, which is why we focus on how to use this method in the following guide. We will only describe how to optimize with First-Order-Krotov. For the optimization using different methods, please have a look of the needed variables for their respective routines in their own documentation.

The heart of QDYN’s optimization part is the routine optimize_pulses(), which routine should be used when doing optimizations with QDYN. The three most important parameters are

targets(:):

This list of targets not only tells the optimization routine, from or onto which state/gate to optimize, but also e.g. the dynamic generator of the system, i.e. the Hamiltonian or Liouvillian, which are coupled to the list of pulses described in the next item. This variable is also passed to the chis interface, needed for krotov.

pulses(:):

A list of all the pulses which are coupled to the Hamiltonian or Liouvillian. Apart from the values defining the pulse amplitude at each point of the time grid, it also contains information on how (or even if) to optimize the pulse. This information is stored in pulse%oct. The most important parameter for the optimization with Krotov’s algorithm is stored in lambda_a, which gives the step size of the update. Another typically used variable is the update-shape, which allows for the inclusion of a weighting function for the field update in every step in the form of a shape function (e.g. a gaussian or a rectangular shape. This can be used for fixing pulse values at specific times (e.g. fixing the pulse amplitude to zero at t=0 and t=T). For more information see pulse_t.

get_chis:

This is an external interface which needs to be provided and which calculates the co-states \(\ket{\chi}\), which are described below. These are the core of Krotov’s method and provide the gradient information to the algorithm (not the J_T argument). There are several routines already implemented, which are labeled chis_*. We will have a look at the most common ones further below

Another important parameter is para, which holds all the information of the config file. We will also have a look on the most important config file parameters later.

The targets parameter

The optimization target is saved in a type oct_target_t data structure. Initializing them should be done using the set_oct_target() routine. Since oct targets are only used for the optimization routine, this function takes an array of oct_target_t variables, which is allocatable and is set individually by the index i. Apart from that, we need to set an initial state for each target, from which the propagation is started. We also have to specify a dynamic generator describing our system and the grid, which can also be a fake grid. Ideally this grid is also set via the config file. The next depend on what we would like to optimize. In the most general case (most of the functionals) we will need to give an additional state, which commonly acts as the final state and is labeled as state_target, when giving it to the set_oct_target() routine.

The pulses parameter

The pulses parameter takes an array of pulse_t, which can conveniently initialized via the config file using the helpful init() routine. All the operators in the targets variable (or rather the dynamical generator) need to point to the pulses in this list. After the optimization is done, this array contains the optimized pulses.

The fw_states_T parameter

The fw_states_T parameter is an allocatable list of states (state_t), which will hold the states at time T, propagated with the optimized pulses. Therefore it is enough to provide an allocated array of state_t.

The chis parameter

The chis interface is used to provide the co-states for the optimization, which are propagated backward in time. The external interface has the following structure:

subroutine get_chis(chis, fw_states_T, targets, tau, para_oct)
    type(state_t), allocatable, intent(inout) :: chis(:)
    type(state_t),              intent(in)    :: fw_states_T(:)
    type(oct_target_t),         intent(inout) :: targets(:)
    complex(idp),               intent(in)    :: tau(:)
    type(oct_pt),               intent(in)    :: para_oct
end subroutine get_chis

chis are the co-states calculated by this routine. The fw_states_T and targets parameters are the ones directly passed on from the routine. The array of complex numbers tau however, is the weighted overlap of the final state \(\ket{\Psi_i(T)}\) obtained by the propagation with the current set of pulses and the desired final state \(\ket{\Psi^{\mathrm{tgt}}_i(T)}\), as set for the targets earlier. \(\tau\) is calculated by f_tau(). Note, that these are not always needed and can also be left undefined. Finally the para_oct parameter holds all the relevant information on the optimization itself and can be used for the calculation of the co states. This is in general not needed but can be really hand for carrying information needed for the calculation of chis, such as in chis_split().

In general it is not necessary to write these routines on our own since QDYN offers several predefined chis routines, which can be found together with the corresponding functionals (J_T_*) in the oct_funcs module (and some in the local_invariants module). Very common functionals/chis are

Functional

Chis

Purpose

J_T_sm()

chis_sm()

Optimize to target states, independent of local and global

J_T_ss()

chis_ss()

Optimize to target states, independent of global phase

J_T_re()

chis_re()

Optimize to target states, with matching local and global phase

The para parameter or How to set up the config file

The Optimization in QDYN can be configured to large extent using the oct section in the config file. However, many of these parameters are not important for a basic optimization, but are used for second order optimization or special functionals. In the following, we give an overview of the most important variables for a simple state to state (or similar) optimization and describe briefly what they mean and how to use them.

oct:method

This is probably the most important keyword for all optimizations. It specifies the method used for the optimization. In most cases we want to choose ‘krotovpk’ (KROTOV-Palao-Kosloff), which is also referred to as Krotov’s method (or more commonly only krotov). Apart from that, we can also specify ‘lbfgs’, which is Grape with LBFGS. However, this is not commonly used and might have some bugs

oct:J_T_conv

With this keyword we can define the value of J_T at which the optimization is considered to be converged. Since we are minimizing this is in general something close to 0, e.g. 1e-2.

oct:iter_stop

By setting this keyword, we can define the maximum number of iteration steps the program should optimize for.

oct:continue

This keyword does exactly what it says, it describes if the optimization should be continued from a previous optimization. This can for example be useful if we set ‘iter_stop’ to some value and see after the optimization that we are almost there. Then we could add a continue=T to the config file and increase the ‘iter_stop’ variable. Now the optimization will continue from the last optimized pulse and only needs to optimize for the additional iterations we specified in iter_stop. So let’s say we had iter_stop=100 which was not enough, then we could go for iter_stop=110 and continue=T and our script will continue from step 100 and optimize for 10 more iterations (or until another stop condition is fulfilled).

Apart from these, there are some other keywords, which can also be helpful.

oct:delta_J_T_conv/oct:delta_J_conv

To avoid, that the optimization is stuck at some plateau and might optimize for too long without any benefit, we can define a condition which stops the optimization if the relative difference between the new J(_T) and the old J(_T) is smaller than delta_J(_T)_conv. In other words, the optimization will be stopped if \(\Delta J_{(\mathrm{T})}/|J_{(\mathrm{T})}|\) < delta J(_T)_conv.

Adaptive \(\lambda_a\)

QDYN also offers the possibility to dynamically adjust the \(\lambda_a\) parameter during the optimization to account for difficult optimization landscapes. This can be done by setting the oct:dynamic_lambda_a parameter to a value >0. This parameter scales \(\lambda_a\) by a factor of \(\frac{\mathrm{EMA}\left(g_a^\mathrm{int}\right)}{J_T\cdot\lambda^{\mathrm{dyn}}_a}\), where EMA is the exponential moving average, J_T the overall functional value and \(\lambda^{\mathrm{dyn}}_a\) the value given by oct:dynamic_lambda_a. Using this, it is possible to keep the update at a certain ratio of the overall functional. For example if you want to keep the update of the control-dependent cost integral \(g_a^\mathrm{int}\) around \(10^{-3}J_T\), you would need to set

oct: dynamic_lambda_a=0.001

Since the EMA averages over the last few optimization steps, it makes sense to wait for some more steps before adjusting \(\lambda_a\) again, which is why the adaption of \(\lambda_a\) is suspened for a certain amount of steps after the update. This can be controlled by the oct:dynamic_la_wait parameter in the config file, which is set to be 10 if not specified. We can adjust this for example by using

oct: dynamic_lambda_a=0.001, dynamic_la_wait=5

which means that QDYN would only wait for five steps until it checks if \(\lambda_a\) can be updated again. This makes sense, if we alter the averaging with the EMA, which is controlled by a parameter \(\alpha\) representing the weight by which the value of \(g_a^\mathrm{int}\) from the last optimization step enters the exponential moving average. We can change this by the config file parameter oct:ema_alpha which is set to 0.3 by default

oct: dynamic_lambda_a=0.001, dynamic_la_wait=5, ema_alpha=0.5

In this case, the last value of \(g_a^\mathrm{int}\) would enter with a factor of 0.5, the previous one with a factor 0.25 and so on.

After having changed all these parameters, \(\lambda_a\) will now get updated dynamically during the optimization process. However, \(\lambda_a\) is only rescaled if a certain threshold is exceeded. By default this is set to a factor of two. This means, that \(\lambda_a\) is only updated, if the new \(\lambda_a\) would be smaller than 0.5 or larger than 2.0 relative to the previous \(\lambda_a\). If you need smaller margins for the update you can set these by adding the oct:dynamic_la_marginfac parameter to the oct section in the config file

oct: dynamic_lambda_a=0.001, dynamic_la_wait=5, ema_alpha=0.5,
     dynamic_la_marginfac=1.25

In this case, \(\lambda_a\) would get updated if the new value would exeed the old one by a factor of 0.8 or 1.25. Note, that this is equivalent to setting dynamic_la_marginfac=0.8.

All other parameters

Besides the parameters of optimize_pulses() which we have discussed above, we also have to set the g_b interface, which provides a function for the state dependent part of the functional. However, this is not often used and can be set to g_b_zero(), which just ignores this part of the functional.

Another type of parameter which might also be useful is the info-hook, which includes all the subroutines that can be used to write out more information or even alter certain parameters of the optimization after certain step in each iteration. The oct_info_hook is called after each optimization step, whereas the prop_info_hook is called after each forward propagation. Some use cases could be:

oct_info_hook:
  • Manually truncating the pulse spectrum after each oct step

  • Monitoring individual parts of a composite functional for diagnosis

  • Adaptive scaling of \(\lambda_a\) (see also oct:dynamic_lambda_a as config parameter)

prop_info_hook:
  • Track the norm of the Hilbert space state / trace of the density matrix

  • Output trajectory information, e.g. for plotting

  • Calculate partial traces (for example needed for simulating measurements).

Note

Most of the commonly used cases for the prop_info_hook can also be treated more elegantly by using the observables section (insert link here) in the config file. For a detailed description c.f. observables_pt (add link) and the code of the qdyn_prop_traj utility.

Optimization Algorithms

The optimization of any of the functionals is done in an iterative procedure: in each OCT iteration \(i\), we find an update to the pulse \(\epsilon^{(i)}(t)\) such that the updated pulse \(\epsilon^{(i+1)}(t)\) yields an improved value for the functional. The optimization loop continues until \(J\) reaches a value that is smaller than some predefined limit, or until the value of \(J\) shows no significant improvement.

There are two basic categories of algorithms for finding updates for the control parameters that improve the value of the functional. The first are “gradient-free”, employing only evaluations of the functional. Most prominently, this includes the downhill-simplex algorithm discussed below. Gradient-free optimization algorithms are extremely versatile and easy to apply, but they also tend to converge very slowly, particularly for a large number of control parameters. Moreover, they are prone to running into local minima, although more advanced global methods such as genetic algorithms and swarm search optimization also exist. Gradient-free optimization methods can easily be incorporated into experimental setups as a closed-loop control, where a measurement determines the figure of merit and drives a variation of the control parameters for the next iteration.

The second category are gradient-based methods. Including information about how the optimization functional varies with changes in the controls greatly speeds up convergence. However, it requires to derive analytical expressions for the gradient, and then additional numerical resources for evaluating that gradient. A concurrent scheme like the gradient ascent discussed below varies each control parameter individually according to the derivative of the optimization functional with respect to that parameter. In contrast, Krotov’s method (see below) takes the time-continuous \(\epsilon(t)\) as a whole. Subsequent discretization to a time grid yields a sequential scheme in which the update for \(\epsilon_j\) takes into account the updates at earlier times. While not applicable to arbitrary pulse parametrizations, Krotov’s method guarantees monotonic convergence. Gradient-based methods are typically used as an open-loop control, where the entire optimization is performed based on a numerical simulation, before taking the final optimized set of controls to an experimental implementation.

QDYN supports an optimization with gradient-based algorithms through a single common interface, implemented by the optimize_pulses routine. It receives an array of targets (see above) and information about the optimization functional and method that is to be used.

1. Downhill Simplex Optimization

The downhill simplex, or Nelder-Mead algorithm is a particularly simple, gradient-free optimization method that is very effective if there is only a handful of \(N\) optimization parameters. The idea is to construct a simplex polytope consisting of \(N+1\) points in the \(N\)-dimensional parameter space. The point with the largest value of the functional is then replaced by reflection on the remainder of the polytope, followed by some contraction and expansion steps, yielding a new point with an improved value of the functional. Intuitively, the simplex “rolls” down the optimization landscape.

The fact that the algorithm only relies on the evaluation of the optimization functional makes it extremely versatile, giving a black box method for the optimization of arbitrary figures of merit. It is well-suited as a pre-optimization for finding pulses of simple analytical forms that may then be optimized further on using a gradient-based method on a time grid. In particular, the pulse duration \(T\) can easily be included as a control parameter for a simplex method. In contrast, \(T\) must be fixed when optimizing on a time grid.

Non-gradient methods may be used for controls that are inherently coarse-grained, due to the limitations of experimental setups, e.g. the limited number of pixels in early femtosecond pulse shapers. Some control problems are known to require only a small number of frequency components. The CRAB algorithm [21] has been developed for such cases.

QDYN does not contain direct support for gradient-free optimization. However, the propagators in QDYN can easily be used to evaluate the optimization functional. An external software such as the Scipy optimize package can then wrap the QDYN program. And example for this is shown in the CRAB tutorial.

2. Gradient Ascent

The GRAPE algorithm (Gradient Ascent Pulse Engineering) [22] considers the gradient \(\frac{\partial J}{\partial \epsilon_j}\) with respect to any control parameter and then updates that control parameter according to

\[\epsilon_j^{(i+1)} = \epsilon_j^{(i)} - \alpha \frac{\partial J}{\partial \epsilon_j}\]

using a suitable step width \(\alpha\). In practice, using only the gradient of the functional to steer the optimization will generally not yield sufficiently fast convergence; the situation could be improved by using Newton’s method, taking into account the second derivative, i.e., the Hessian \(H_{jj'} = \frac{\partial^2 J}{\partial \epsilon_j \partial \epsilon_{j'}}.\) However, calculating the Hessian is generally prohibitively expensive. Therefore, quasi-Newton methods are employed to estimate the Hessian using only gradient information. The popular LBFGS-B algorithm implements this efficiently. QDYN handles the calculation of the gradient only, and passes it to the LBFGS-B library for the actual optimization.

For functionals depending on the overlap between propagated and expected state, and assuming a time-grid parametrization of the pulse, the gradient becomes

\[\begin{split}\begin{align} \frac{\partial \tau_k}{\partial \epsilon_j} &= \frac{\partial}{\partial \epsilon_j} \langle k \vert \Op{O}^\dagger \Op{U}_{nt-1} \dots \Op{U}_{j} \dots \Op{U}_{1} \vert k \rangle \\ &= \langle k \vert \Op{O}^\dagger\, \Op{U}_{nt-1} \dots \Op{U}_{j+1} \, \frac{\partial \Op{U}_{j}}{\partial \epsilon_j} \, \Op{U}_{j-1} \dots \Op{U}_{1} \vert k \rangle \\ &= \langle \chi_k(t_{j+1}) \vert \frac{\partial \Op{U}_{j}}{\partial \epsilon_j} \vert \phi_k(t_j)\rangle\,, \end{align}\end{split}\]

where \(\ket{\phi_k(t)}\) is the forward-propagated basis state \(\ket{k}\), and \(\ket{\chi(t)}\) is the backward-propagated target state \(\Op{O}\ket{k}\). The numerical effort in calculating the gradient compared to a simple evaluation of the functional is therefore an additional backward propagation. Moreover, the states of either the backward or the forward propagation at every point in time need to be stored in order to calculate the gradient. The derivative of the \(j\)’th time evolution operator an be efficiently calculated using the QDYN gradU_schirmer routine.

In order to use GRAPE/LBFGS-B, optimize_pulses must be called with method=lbfgs in the OCT section of the config file.

3. Krotov’s Method

For time-continuous controls, Krotov’s method considers a functional of the form

\[J[\epsilon^{(i)}(t)] = J_T({\phi_k^{(i)}(T)}) + \int_0^T g_a[\epsilon^{(i)}(t)] \dd t + \int_0^T g_b[{\phi^{(i)}_k(t)}] \dd t.\]

In addition to the final time functional, running costs that depend on the control field and the propagated states at each point in time are also included. As before, for a gate optimization, the \(\{\ket{\phi^{(i)}_k(t)}\}\) are the basis states \(\{\ket{k}\}\) propagated under the pulse \(\epsilon^{(i)}(t)\) at the current OCT iteration \(i\).

Krotov’s method uses an auxiliary functional to disentangle the interdependence of the states and the field, allowing to find an updated \(\epsilon^{(i+1)}(t)\) such that \(J[\epsilon^{(i+1)}] < J[\epsilon^{(i)}]\) is guaranteed. The derivation, see Ref. [23], yields the condition

\[\begin{split}\begin{split} \left.\frac{\partial g_a}{\partial \epsilon}\right \vert_{\epsilon^{(i+1)}(t)} & = 2 \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg(\left.\frac{\partial \Op{H}}{\partial \epsilon} \right\vert_{i+1} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle + \right. \\ & \qquad \left. + \frac{1}{2} \sigma(t) \Bigg\langle \Delta\phi_k(t) \Bigg\vert \Bigg(\left.\frac{\partial \Op{H}}{\partial \epsilon} \right\vert_{i+1}\Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,, \end{split}\end{split}\]

with

\[\ket{\Delta \phi_k(t)} \equiv \ket{\phi_k^{(i+1)}(t)} - \ket{\phi_k^{(i)}(t)}.\]

Assuming the equation of motion for the forward propagation of \(\ket{\phi_k(0)} = \ket{k}\) is written as

\[\frac{\partial}{\partial t} \Ket{\phi_k^{(i+1)}(t)} = -\frac{i}{\hbar} \Op{H}^{(i+1)} \Ket{\phi_k^{(i+1)}(t)},\]

the co-states \(\Ket{\chi_k}\) are backward-propagated under the old pulse as

\[\frac{\partial}{\partial t} \Ket{\chi_k^{(i)}(t)} = -\frac{i}{\hbar} \Op{H}^{\dagger,(i)} \Ket{\chi_k^{(i)}(t)} + \left.\frac{\partial g_b}{\partial \Bra{\phi_k}} \right\vert_{\phi^{(i)}(t)},\]

with the boundary condition

\[\Ket{\chi_k^{(i)}(T)} = -\left.\frac{\partial J_T}{\partial \Bra{\phi_k}} \right\vert_{\phi_k^{(i)}(T)}.\]

The scalar function \(\sigma(t)\) that must be properly chosen to ensure monotonic convergence. In most cases, it is sufficient to set \(\sigma(t) \equiv 0\).

In order to obtain an explicit equation for \(\epsilon^{(i+1)}(t)\), a control-dependent running cost \(g_a\) must be used, and usually takes the form

\[g_a[\epsilon(t)] = \frac{\lambda_a}{S(t)} \left(\epsilon(t) - \epsilon^{\text{ref}} (t)\right)^2,\]

with a scaling parameter \(\lambda_a\) and a shape function \(S(t) \in [0,1]\). When \(\epsilon^{\text{ref}}\) is set to the optimized field \(\epsilon^{(i)}\) from the previous iteration,

\[g_a[\epsilon^{(i+1)}(t)] = \frac{\lambda_a}{S(t)} \left(\Delta\epsilon(t) \right)^2, \quad \Delta\epsilon(t) \equiv \epsilon^{(i+1)}(t) - \epsilon^{(i)}(t),\]

and for \(\sigma(t) \equiv 0\), the explicit first-order Krotov update equation is obtained:

\[\Delta\epsilon(t) = \frac{S(t)}{\lambda_a} \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon} \right\vert_{i+1} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right].\]

If \(S(t) \in [0,1]\) is chosen as a function that smoothly goes to zero at \(t=0\) and \(t=T\), then the update will be suppressed there, and thus a smooth switch-on and switch-off can be maintained. The scaling factor \(\lambda_a\) controls the overall magnitude of the pulse update. Values that are too large will change \(\epsilon^{(i)}(t)\) by only a small amount, causing slow convergence.

The functional \(J_T\) enters the first-order update equation only in the boundary condition for the backward propagated co-state. For the standard functionals for gate optimization, this evaluates to

\[\begin{split}\begin{aligned} - \left.\frac{\partial J_{T,sm}}{\partial \Bra{\phi_k}} \right\vert_{\phi_k^{(i)}(T)} &= \left( \frac{1}{N^2} \sum_{l=1}^N \tau_l \right) \Op{O} \Ket{k}\,, \\ - \left.\frac{\partial J_{T,re}}{\partial \Bra{\phi_k}}\right \vert_{\phi_k^{(i)}(T)} &= \frac{1}{2N} \Op{O} \Ket{k}\,. \end{aligned}\end{split}\]

At first glance, there is a striking similarity between the Krotov update formula and the GRAPE/LBFGS-B gradient, except that for GRAPE/LBFGS-B both backward- and forward-propagation are performed with the same pulse \(\epsilon^{(i)}(t)\). However, a closer look shows the key difference between GRAPE/LBFGS-B and Krotov’s method: whereas the former is inherently discrete, the latter gives a continuous update equation that is discretized only afterwards. In fact the monotonic convergence of Krotov’s method is only guaranteed in the continuous limit; a coarse time step must be compensated by larger values of \(\lambda_a\), slowing down convergence. Generally, choosing \(\lambda_a\) too small will lead to numerical instabilities and unphysical features in the optimized pulse.

In order to use Krotov’s method, optimize_pulses must be called with method=krotovpk (first order), or method=krotov2 (second order) in the oct section of the config file.

Choosing an Optimization Method

Whether to use a gradient-free optimization method, gradient ascent, or Krotov’s method depends on the size of the problem (both the Hilbert space dimension and the number of control parameters), the requirements on the control pulse, and the optimization functional. Gradient-free methods should be used if propagation is extremely cheap (small Hilbert space dimension), the number of independent control parameters is relatively small, or the functional is of a form that does not allow to calculate gradients.

Gradient ascent should be used if the control parameters are discrete, such as on a coarse-grained time grid, and the derivative of \(J\) with respect to each control parameter is known. Moreover, evaluation of the gradient must be numerically feasible. Parametrization on a time grid directly yields an efficient scheme. Gradients for many other parametrizations (e.g. an expansion in Fourier coefficients) can be derived from that time grid gradient.

Krotov’s method should be used if the control is near-continuous, and if the derivative of \(J_T\) with respect to the states can be calculated. When these conditions are met, Krotov’s method gives excellent convergence, although it is often observed to slow down when getting close to the minimum of \(J\). Since quasi-Newton gradient ascent does not show such a slow-down, it can be beneficial to switch from Krotov’s method to LBFGS-B in the final stage of the optimization.

Bibliography

  • [24] Palao and Kosloff, Phys. Rev. A 68, 062308 (2003)

  • [21] Caneva, et al. Phys. Rev. A 84, 022326 (2011).

  • [22] Khaneja et al. J. Magnet. Res. 172, 296 (2005).

  • [23] Reich et al. J. Chem. Phys. 136, 104103 (2012)