How to retrieve the original order of keyword arguments passed to a function call?

Retrieving the order of keyword arguments passed via **kwargs would be extremely useful in the particular project I am working on. It is about making a kind of n-d numpy array with meaningful dimensions (currently called dimarray), which is particularly useful for geophysical data handling.

Now we have:

import numpy as np
from dimarray import Dimarray # the handy class I am programming

def make_data(nlat, nlon):
    """ generate some example data
    """
    values = np.random.randn(nlat, nlon)
    lon = np.linspace(-180,180,nlon)
    lat = np.linspace(-90,90,nlat)
    return lon, lat, values

What works:

>>> lon, lat, values = make_data(180,360)
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0]
-180.0 -90.0

What does not:

>>> lon, lat, values = make_data(180,180) # square, no shape checking possible !
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0] # is random
-90.0 -180.0 # could be (actually I raise an error in such ambiguous cases)

The reason is that the signature of Dimarray's __init__ method is (values, **kwargs), and since kwargs is an unordered dictionary (dict), the best it can do is check against the shape of values.
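
To illustrate the shape check described above, here is a minimal sketch, not dimarray's actual code. The helper name order_axes_by_shape is hypothetical. It matches each keyword axis to a dimension by its length and raises an error when two dimensions share a size, which is exactly the ambiguous case.

```python
import numpy as np

def order_axes_by_shape(shape, **kwargs):
    """Return [(name, axis), ...] ordered to match `shape` (hypothetical helper)."""
    if len(kwargs) != len(shape):
        raise ValueError("expected %d axes, got %d" % (len(shape), len(kwargs)))
    if len(set(shape)) != len(shape):
        # e.g. a square array: lengths alone cannot tell lat from lon
        raise ValueError("ambiguous: several dimensions share the same size")
    ordered = []
    for size in shape:
        matches = [(n, a) for n, a in kwargs.items() if len(a) == size]
        if len(matches) != 1:
            raise ValueError("no unique axis of length %d" % size)
        ordered.append(matches[0])
    return ordered

lat = np.linspace(-90, 90, 180)
lon = np.linspace(-180, 180, 360)
# The keyword order does not matter here, only the lengths do.
names = [n for n, _ in order_axes_by_shape((180, 360), lon=lon, lat=lat)]
print(names)  # ['lat', 'lon']
```

With a (180, 180) array the same call raises ValueError, which mirrors the error mentioned for ambiguous cases.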

Of course, I want it to work for any kind of dimensions:

a = Dimarray(values, x1=.., x2=...,x3=...)

So it has to be written with **kwargs.
The chance of hitting an ambiguous case increases with the number of dimensions.
There are ways around that; for instance, a signature (values, axes, names, **kwargs) makes it possible to write:

a = Dimarray(values, [lat, lon], ["lat","lon"])

But this syntax is cumbersome for interactive use (ipython), and I would like this package to really become part of my (and others') daily use of python, as an actual replacement for numpy arrays in geophysics.

I would be very interested in a way around this. The best I can think of for now is to use the inspect module's stack method to parse the caller's statement:

import inspect
def f(**kwargs):
    print inspect.stack()[1][4]
    return tuple([kwargs[k] for k in kwargs])

>>> print f(lon=360, lat=180)
[u'print f(lon=360, lat=180) ']
(180, 360)

>>> print f(lat=180, lon=360)
[u'print f(lat=180, lon=360) ']
(180, 360)

One could work something out from that, but there are unsolvable issues because stack() captures the whole statement:

>>> print (f(lon=360, lat=180), f(lat=180, lon=360))
[u'print (f(lon=360, lat=180), f(lat=180, lon=360)) ']
[u'print (f(lon=360, lat=180), f(lat=180, lon=360)) ']
((180, 360), (180, 360))

Is there any other inspect trick I am not aware of that could solve this problem? (I am not familiar with this module.) I would imagine that getting the piece of code located between the parentheses, lon=360, lat=180, should be feasible, no?

So for the first time I feel that python is preventing me from doing something that is theoretically feasible based on all available information (the ordering provided by the user is valuable information!).

I read an interesting suggestion here: https://mail.python.org/pipermail/python-ideas/2011-January/009054.html
and was wondering whether this idea has moved forward in some way?

I understand why an ordered **kwargs is not desirable in general, but a patch for these rare cases would be neat. Does anyone know of a reliable hack?

Note: this is not about pandas. I am actually trying to develop a lightweight alternative to it, whose usage remains very close to numpy. The github link will be posted soon.

EDIT: Note that this concerns the interactive use of dimarray. The dual syntax is needed anyway.

EDIT2: I also see a counter-argument: knowing that the data is not ordered can be considered valuable, because it leaves Dimarray free to check the shape of values and adjust the order automatically. It may even be that forgetting the dimension order of the data is more common than having two dimensions of the same size. So for now, I think it is fine to raise an error in ambiguous cases and ask the user to provide the names parameter. Still, it would be nice to be free to make that kind of choice (how the Dimarray class should behave) instead of being constrained by a missing feature of python.

EDIT 3, solutions, following kazagistar's suggestions:

I did not mention that there are other optional attribute parameters, such as name="" and units="", and a couple of other parameters related to slicing, so the *args construct would need to come with keyword name testing on kwargs.

In summary, there are several possibilities:

*Choice a): keep the current syntax

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")

*Choice b): kazagistar's second suggestion, dropping the axis definition via **kwargs

a = Dimarray(values, ("lat", mylat), ("lon", mylon), name="myarray")

*Choice c): kazagistar's second suggestion, with optional axis definition via **kwargs
(Note that this involves name= being extracted from **kwargs, see background below)

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, ("lat", mylat), ("lon", mylon), name="myarray")

*Choice d): kazagistar's third suggestion, with optional axis definition via **kwargs

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [("lat", mylat), ("lon", mylon)], name="myarray")

Well, it comes down to aesthetics and some design issues (is lazy ordering an important feature in interactive mode?). I am hesitating between b) and c). I am not sure that **kwargs really brings anything. Ironically, the more I think about it, the more what I started out criticizing looks like a feature...

Thank you very much for the answers. I will mark the question as answered, but you are welcome to vote for a), b), c) or d)!

====================

EDIT 4: a better solution: choice a)!!, but adding a from_tuples class method. The reason is to allow one more degree of freedom. If the axis names are not provided, they are generated automatically as "x0", "x1", etc., so that it can be used just like pandas, but with axis naming. This also avoids mixing axes and attributes in **kwargs, which is left to the axes only. More will follow as soon as I am done with the documentation.

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")
a = Dimarray.from_tuples(values, ("lat", mylat), ("lon", mylon), name="myarray")

EDIT 5: a more pythonic solution? It is similar to EDIT 4 above in terms of the user API, but goes through a wrapper function dimarray, while being very strict about how Dimarray itself is instantiated. This is also in the spirit of what kazagistar proposed.

from dimarray import dimarray, Dimarray 

a = dimarray(values, lon=mylon, lat=mylat, name="myarray") # error if lon and lat have same size
b = dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")
c = dimarray(values, [mylat, mylon, ...], ['lat', 'lon', ...], name="myarray")
d = dimarray(values, [mylat, mylon, ...], name="myarray2")

And from the class itself, for example:

e = Dimarray.from_dict(values, lon=mylon, lat=mylat) # error if lon and lat have same size
e.set(name="myarray", inplace=True)
f = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")
g = Dimarray.from_list(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
h = Dimarray.from_list(values, [mylat, mylon, ...], name="myarray")

In cases d) and h), the axes are automatically named "x0", "x1", and so on, unless mylat and mylon are actually instances of the Axis class (which I did not mention in this post, but Axes and Axis do their job of building axes and dealing with indexing).

Explanations:

class Dimarray(object):
    """ ndarray with meaningful dimensions and clean interface
    """
    def __init__(self, values, axes, **kwargs):
        assert isinstance(axes, Axes), "axes must be an instance of Axes"
        self.values = values
        self.axes = axes
        self.__dict__.update(kwargs)

    @classmethod
    def from_tuples(cls, values, *args, **kwargs):
        # each positional argument is a (name, axis) tuple
        axes = Axes.from_tuples(*args)
        return cls(values, axes, **kwargs)

    @classmethod
    def from_list(cls, values, axes, names=None, **kwargs):
        if names is None:
            names = ["x{}".format(i) for i in range(len(axes))]
        # pair each name with its axis, in (name, axis) order
        return cls.from_tuples(values, *zip(names, axes), **kwargs)

    @classmethod
    def from_dict(cls, values, names=None, **kwargs):
        # here kwargs holds the axes themselves, not attributes
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
        # with necessary assert statements in the above
        return cls(values, axes)

Here is the trick (schematically):

def dimarray(values, axes=None, names=None, name=..., units=..., **kwargs):
    """ my wrapper with all fancy options
    """
    if len(kwargs) > 0:
        new = Dimarray.from_dict(values, axes, **kwargs)

    # a list of (name, axis) tuples
    elif isinstance(axes[0], tuple):
        new = Dimarray.from_tuples(values, *axes, **kwargs)

    else:
        new = Dimarray.from_list(values, axes, names=names, **kwargs)

    # reserved attributes
    new.set(name=name, units=units, ..., inplace=True)

    return new

The only thing we lose is the *args syntax, which could not accommodate so many
options. But that is fine.

It also makes subclassing easy. How does that sound to the Python experts here?

(This whole discussion could really be split into two parts.)

=====================

Some background (EDIT: partly outdated, for cases a), b), c), d) only), just in case you are interested:

*Choice a) involves:

def __init__(self, values, axes=None, names=None, units="", name="", ..., **kwargs):
    """ schematic representation of Dimarray's init method
    """
    # automatic ordering according to values' shape (unless names is also provided)
    # the user is allowed to forget about the exact shape of the array
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

    # otherwise initialize from list
    # exact ordering + more freedom in axis naming
    else:
        axes = Axes.from_list(axes, names)

    ... # check consistency

    self.values = values
    self.axes = axes
    self.name = name
    self.units = units

*Choices b) and c) impose:

def __init__(self, values, *args, **kwargs):
    ...

b) All attributes are naturally passed via kwargs, with self.__dict__.update(kwargs). This is very clean.

c) Keyword arguments need to be filtered:

def __init__(self, values, *args, **kwargs):
    """ most flexible for interactive use
    """
    # filter out known attributes
    default_attrs = {'name': '', 'units': ''}  # ... and other reserved attributes
    for k in default_attrs:
        # take the value from kwargs if given, otherwise fall back to the default
        setattr(self, k, kwargs.pop(k, default_attrs[k]))
    names = kwargs.pop('names', None)  # optional explicit ordering

    # same as before
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

    # same, just unzip
    else:
        names, numpy_axes = zip(*args)
        axes = Axes.from_list(numpy_axes, names)

This is actually quite nice and handy. The only (minor) drawback is that the default parameters name="", units="" and a few other more relevant parameters are not accessible by inspection or tab completion.
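
The filtering step in c) can be tried in isolation. The following is only a sketch under assumptions: the helper split_kwargs and the RESERVED_DEFAULTS table are hypothetical names, and the set of reserved attributes is reduced to name and units.

```python
# Assumed set of reserved attribute names and their defaults (illustrative only).
RESERVED_DEFAULTS = {"name": "", "units": ""}

def split_kwargs(kwargs, defaults=RESERVED_DEFAULTS):
    """Split keyword arguments into (attributes, axes) (hypothetical helper)."""
    remaining = dict(kwargs)  # work on a copy so the caller's dict is untouched
    attrs = {key: remaining.pop(key, default) for key, default in defaults.items()}
    return attrs, remaining

attrs, axes = split_kwargs({"lon": [0, 1], "lat": [2, 3], "name": "myarray"})
print(attrs)  # {'name': 'myarray', 'units': ''}
print(sorted(axes))  # ['lat', 'lon']
```

Whatever is left after the reserved names are popped is treated as an axis definition, which is why an axis can never be called name or units with this approach.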

*Choice d): a clear __init__

def __init__(self, values, axes, name="", units="", ..., **kwaxes)

But it is a bit verbose.

==========

EDIT, FYI: I ended up using a list of tuples for the axes parameter, or alternatively the parameters dims= and labels= for axis names and axis values, respectively. The related project dimarray is on github. Thanks again to kazagistar.

No, you cannot know the order in which items were added to a dictionary, because tracking it would significantly increase the complexity of implementing the dictionary. (For when you really need this, collections.OrderedDict has you covered.)
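
For readers on newer interpreters: since Python 3.6 (PEP 468), **kwargs preserves the order in which the keyword arguments were passed, which was not the case for the Python 2 sessions shown in the question. A quick check, assuming Python 3.6 or later:

```python
def f(**kwargs):
    # On Python 3.6+, kwargs keeps the call-site order of the keyword arguments.
    return list(kwargs)

first = f(lon=360, lat=180)
second = f(lat=180, lon=360)
print(first)   # ['lon', 'lat']
print(second)  # ['lat', 'lon']
```

On older versions the order is arbitrary, so the alternatives discussed here still apply to code that has to support them.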

But, have you considered some basic alternative syntax? For example:

a = Dimarray(values, 'lat', lat, 'lon', lon)

or (probably the best choice)

a = Dimarray(values, ('lat', lat), ('lon', lon))

or (the most explicit)

a = Dimarray(values, [('lat', lat), ('lon', lon)])

But at some level, the need for ordering is inherently positional. **kwargs is often abused for labeling, but argument names generally should not be "data", because they are a pain to set programmatically. Just use a tuple to make the association between the two parts of the data explicit, use a list to preserve the ordering, and provide strong assertions with error messages that make it clear why an input is invalid.
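
As a rough sketch of that last recommendation (the function name check_axes and the message wording are illustrative assumptions, not part of dimarray), a list of (name, axis) tuples can be validated against the array shape like this:

```python
import numpy as np

def check_axes(values, axes):
    """Validate a list of (name, axis) tuples against values.shape (hypothetical helper)."""
    assert isinstance(axes, list), "axes must be a list of (name, axis) tuples"
    assert len(axes) == values.ndim, (
        "expected %d axes, got %d" % (values.ndim, len(axes)))
    for i, item in enumerate(axes):
        assert isinstance(item, tuple) and len(item) == 2, (
            "axes[%d] must be a (name, axis) tuple, got %r" % (i, item))
        name, axis = item
        assert len(axis) == values.shape[i], (
            "axis %r has length %d but dimension %d has size %d"
            % (name, len(axis), i, values.shape[i]))
    return [name for name, _ in axes]

values = np.zeros((180, 180))  # square: the shape alone cannot tell the axes apart
lat = np.linspace(-90, 90, 180)
lon = np.linspace(-180, 180, 180)
print(check_axes(values, [("lat", lat), ("lon", lon)]))  # ['lat', 'lon']
```

Because the order comes from the list itself, the square case that defeats shape checking is no longer ambiguous.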

