Efficient, robust moving scale estimation for a Python array

I’m looking for a fast and effective way to calculate a robust moving scale estimate for a set of data. I’m working with 1d arrays of typically 3-400k elements. Until recently I had been working with simulated data (no catastrophic outliers), and the move_std function from the excellent Bottleneck package served me well. However, now that I’ve switched to noisy data, std no longer behaves usefully.


In the past, I’ve used a very simple biweight midvariance function to deal with this misbehavior:

import numpy as np

def bwmv(data_array):
    cent = np.median(data_array)
    MAD = np.median(np.abs(data_array - cent))
    u = (data_array - cent) / 9. / MAD
    uu = u * u
    I = np.asarray((uu <= 1.), dtype=int)
    return np.sqrt(len(data_array) * np.sum((data_array - cent)**2 * (1. - uu)**4 * I)
                   / (np.sum((1. - uu) * (1. - 5 * uu) * I)**2))
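(As an aside, not part of the original post: a usage sketch of the naive moving version, which re-runs bwmv over a sliding window and is exactly the loop that becomes prohibitively slow on 3-400k elements. The window size is an arbitrary example value.)

import numpy as np

# Hypothetical illustration: naive O(n * window) sliding-window application of bwmv
data = np.random.randn(10000)
window = 101
half = window // 2
moving_scale = np.array([bwmv(data[max(i - half, 0):i + half + 1])
                         for i in range(len(data))])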

But the arrays I’m working with now are large enough that this is far too slow. Does anyone know of a package that provides such an estimator, or have any suggestions on how to approach this quickly and efficiently?

I have used a simple low-pass filter in similar situations.

Conceptually, you can get a moving estimate of the mean with fac = 0.99; filtered[k] = fac*filtered[k-1] + (1-fac)*data[k], which is really efficient (in C). A slightly fancier IIR filter than that one, a Butterworth low-pass, is easy to set up with scipy:

b, a = scipy.signal.butter(2, 0.1)
filtered = scipy.signal.lfilter(b, a, data)
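(Side note, my own sketch rather than part of the original answer: the simple one-pole filter described above can also be written with lfilter, using b = [1 - fac] and a = [1, -fac]; fac = 0.99 is just the example value from the text.)

import numpy as np
from scipy.signal import lfilter

fac = 0.99
data = np.random.randn(1000)
# Implements y[k] = fac*y[k-1] + (1-fac)*data[k] as a first-order IIR filter
mean_estimate = lfilter([1 - fac], [1, -fac], data)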

To get an estimate of the scale, you can subtract this “mean estimate” from the data. This effectively turns the low-pass into a high-pass filter. Take the abs() of the result and run it through another low-pass filter.

The result might look like this:

Full script:

from pylab import *
from scipy.signal import lfilter, butter

# Synthetic data: Gaussian noise with a level shift at 300 and a scale change at 600
data = randn(1000)
data[300:] += 1.0
data[600:] *= 3.0

# Second-order Butterworth low-pass for the moving mean estimate
b, a = butter(2, 0.03)
mean_estimate = lfilter(b, a, data)

# Low-pass the absolute deviations from the mean estimate to get a scale estimate
scale_estimate = lfilter(b, a, abs(data - mean_estimate))

plot(data, '.')
plot(mean_estimate)
plot(mean_estimate + scale_estimate, color='k')
plot(mean_estimate - scale_estimate, color='k')

show()

Obviously, the butter() parameters need to be tuned to your problem. If you set the order to 1 instead of 2, you get the simple first-order filter I described at the start.
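(Again just a sketch of my own, not from the answer: the order-1 variant, with the 0.03 cutoff reused from the script above.)

import numpy as np
from scipy.signal import butter, lfilter

data = np.random.randn(1000)
# First-order Butterworth low-pass; its recursion y[k] = -a[1]*y[k-1] + ... has
# the same shape as the simple averaging filter described earlier
b, a = butter(1, 0.03)
mean_estimate_1st = lfilter(b, a, data)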

Disclaimer: this is an engineer’s take on the problem; the approach may be unsound in any statistical or mathematical sense. Also, I’m not sure it really solves your problem (if not, please explain it in more detail), but either way I had some fun with it ;-)

