
# Notation

![image.png](https://cos.easydoc.net/46811466/files/l7lt16kw.png)

- $\mathbf{s}_t$: environment state
- $\mathbf{o}_t$: observation
- $\mathbf{a}_t$: action
- $r\left(\mathbf{s}_t, \mathbf{a}_t\right)$: reward function
- $\pi_\theta\left(\mathbf{a}_t \mid \mathbf{o}_t\right)$: policy
- $\pi_\theta\left(\mathbf{a}_t \mid \mathbf{s}_t\right)$: policy (fully observed)

![](https://i.imgur.com/aX6YerV.png)
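To make the symbols concrete, here is a minimal sketch of one rollout that records $\mathbf{s}_t$, $\mathbf{o}_t$, $\mathbf{a}_t$ and $r(\mathbf{s}_t, \mathbf{a}_t)$ at every step. Everything in it is a hypothetical toy: a 5-state chain with a goal at the right end, a tabular softmax policy, and the helper names `transition`, `observe`, `reward`, `policy_probs` and `rollout` are all illustrative assumptions. The observation here is a deterministic, lossy function of the state, which is a simplification; in general $\mathbf{o}_t$ is sampled from a distribution conditioned on $\mathbf{s}_t$. The only difference between $\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)$ and the fully observed $\pi_\theta(\mathbf{a}_t \mid \mathbf{s}_t)$ is which variable the policy is fed.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy environment: states 0..4 on a line, goal at state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # a_t: move left or move right


def transition(s: int, a: int) -> int:
    """Deterministic toy dynamics: s_{t+1} = clip(s_t + a_t, 0, N_STATES - 1)."""
    return min(max(s + a, 0), N_STATES - 1)


def observe(s: int) -> int:
    """o_t: a coarse view of s_t that only reveals whether s_t is in the right half."""
    return int(s >= N_STATES // 2)


def reward(s: int, a: int) -> float:
    """r(s_t, a_t): 1 when the agent is at the goal state, else 0 (a_t unused here)."""
    return 1.0 if s == N_STATES - 1 else 0.0


def policy_probs(theta: Dict[int, List[float]], x: int) -> List[float]:
    """pi_theta(a | x): softmax over the logits theta[x]; x is either o_t or s_t."""
    logits = theta[x]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def rollout(
    theta: Dict[int, List[float]],
    T: int,
    fully_observed: bool,
    rng: random.Random,
) -> List[Tuple[int, int, int, float]]:
    """Sample one trajectory of length T as a list of (s_t, o_t, a_t, r_t) tuples."""
    s = 0  # fixed initial state s_0
    traj = []
    for _ in range(T):
        o = observe(s)
        x = s if fully_observed else o  # pi(a_t | s_t) vs pi(a_t | o_t)
        a = rng.choices(ACTIONS, weights=policy_probs(theta, x))[0]
        traj.append((s, o, a, reward(s, a)))
        s = transition(s, a)
    return traj


if __name__ == "__main__":
    rng = random.Random(0)
    # Partially observed: one row of logits per observation value (0 or 1).
    theta_obs = {0: [0.0, 2.0], 1: [0.0, 2.0]}
    # Fully observed: one row of logits per state.
    theta_state = {s: [0.0, 2.0] for s in range(N_STATES)}
    print(rollout(theta_obs, T=6, fully_observed=False, rng=rng))
    print(rollout(theta_state, T=6, fully_observed=True, rng=rng))
```

Note that `theta_obs` has only two rows because the policy never sees more than the two observation values, whereas `theta_state` needs one row per state; this is the practical cost of the fully observed setting, in exchange for a policy that can distinguish every state.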