Montemurro M A, Senatore R, Panzeri S
Neural Comput. 2007 Nov;19(11):2913-57. doi: 10.1162/neco.2007.19.11.2913.
The estimation of the information carried by spike times is crucial for a quantitative understanding of brain function, but it is difficult because of an upward bias due to limited experimental sampling. We present new progress, based on two basic insights, on reducing the bias problem. First, we show that by means of a careful application of data-shuffling techniques, it is possible to cancel almost entirely the bias of the noise entropy, the most biased part of information. This procedure provides a new information estimator that is much less biased than the standard direct one and has similar variance. Second, we use a nonparametric test to determine whether all the information encoded by the spike train can be decoded assuming a low-dimensional response model. If this is the case, the complexity of response space can be fully captured by a small number of easily sampled parameters. Combining these two different procedures, we obtain a new class of precise estimators of information quantities, which can provide data-robust upper and lower bounds to the mutual information. These bounds are tight even when the number of trials per stimulus available is one order of magnitude smaller than the number of possible responses. The effectiveness and the usefulness of the methods are tested through applications to simulated data and recordings from somatosensory cortex. This application shows that even in the presence of strong correlations, our methods constrain precisely the amount of information encoded by real spike trains recorded in vivo.
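To illustrate the shuffling idea described in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract gives no formula, so the sketch assumes one common formulation of a shuffling-corrected estimator, I_sh = H(R) - H_ind(R|S) + H_sh(R|S) - H(R|S). Here H_ind is the noise entropy computed from single-bin marginals, and H_sh is the noise entropy of responses whose bins have been shuffled independently across trials within each stimulus. All function names, the binary-word response format, and the simulated data are hypothetical choices made for this example.

```python
import numpy as np

def entropy_from_counts(counts):
    """Plug-in (maximum-likelihood) entropy in bits from occurrence counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def word_entropy(words):
    """Plug-in entropy of the rows of a 2-D integer array (trials x bins)."""
    _, counts = np.unique(words, axis=0, return_counts=True)
    return entropy_from_counts(counts)

def plugin_information(responses, stimuli):
    """Direct plug-in estimate I = H(R) - H(R|S)."""
    h_noise = 0.0
    for s in np.unique(stimuli):
        mask = stimuli == s
        h_noise += mask.mean() * word_entropy(responses[mask])
    return word_entropy(responses) - h_noise

def shuffled_information(responses, stimuli, rng):
    """Assumed form: I_sh = H(R) - H_ind(R|S) + H_sh(R|S) - H(R|S)."""
    n_bins = responses.shape[1]
    h_noise = h_ind = h_sh = 0.0
    for s in np.unique(stimuli):
        mask = stimuli == s
        r_s = responses[mask]
        p_s = mask.mean()
        h_noise += p_s * word_entropy(r_s)
        # Entropy of the product of single-bin marginals is the sum of
        # the single-bin entropies.
        h_ind += p_s * sum(word_entropy(r_s[:, [j]]) for j in range(n_bins))
        # Shuffle each bin independently across trials with the same
        # stimulus, which destroys within-trial correlations.
        shuffled = np.column_stack(
            [rng.permutation(r_s[:, j]) for j in range(n_bins)]
        )
        h_sh += p_s * word_entropy(shuffled)
    return word_entropy(responses) - h_ind + h_sh - h_noise

# Simulated data whose true information is zero by construction:
# binary spike words are drawn independently of the stimulus label.
rng = np.random.default_rng(0)
stimuli = np.repeat(np.arange(4), 20)             # 4 stimuli, 20 trials each
responses = (rng.random((80, 4)) < 0.3).astype(int)  # 4 time bins per trial
print("plug-in estimate (bits): ", plugin_information(responses, stimuli))
print("shuffled estimate (bits):", shuffled_information(responses, stimuli, rng))
```

Because the plug-in estimate is the mutual information of the empirical distribution, it can never fall below zero, so on data like these any sampling error shows up as a positive value. The shuffled estimate can take either sign, and under the assumed formulation the biases of H_sh(R|S) and H(R|S) are expected to largely offset each other.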