How to stop numpy from iterating over custom Mapping objects?

Issue

The Question

Please consider this code:

import numpy as np


class MappingFoo:
    bar = {"a": 1}

    __len__ = bar.__len__
    __iter__ = bar.__iter__
    __getitem__ = bar.__getitem__


foo = MappingFoo()

print(np.array([foo.bar]))  # [{'a': 1}] (np array with dtype object)

print(np.array([foo]))  # [['a']] (np array with dtype str)

Note: MappingFoo implements exactly the three magic methods (__len__, __iter__ and __getitem__) that a class must define in order to inherit from collections.abc.Mapping.

Here are my observations:

  • numpy treats MappingFoo like a list/iterable.
  • If you remove any of the 3 magic methods of MappingFoo, np.array([foo]) will also return an object array just like the first np.array(...) call.
  • It makes no difference whether MappingFoo inherits from collections.abc.Mapping or not.
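The observations above can be reproduced with a small experiment. This is only a sketch: `make_partial`, `METHODS` and `results` are hypothetical names introduced here, and the printed dtypes and shapes may depend on the installed NumPy version. It builds variants of the class with one of the three methods left out and records what np.array([obj]) produces for each.

```python
import numpy as np

bar = {"a": 1}
METHODS = ("__len__", "__iter__", "__getitem__")


def make_partial(omitted=None):
    """Hypothetical helper: a class borrowing bar's methods, minus `omitted`."""
    namespace = {name: getattr(bar, name) for name in METHODS if name != omitted}
    return type("Partial", (), namespace)


results = {}
for omitted in (None, *METHODS):
    # omitted=None keeps all three methods, like MappingFoo in the question.
    arr = np.array([make_partial(omitted)()])
    results[omitted] = (arr.dtype, arr.shape)
    print(f"omitted={omitted}: dtype={arr.dtype}, shape={arr.shape}")
```

Comparing the line for omitted=None with the other three shows which combinations np.array expands and which it stores as objects.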

Here are my questions:

  1. How does numpy decide which objects should be expanded/iterated over when creating an array with numpy.array(...)?
  2. How do I stop it from iterating over my MappingFoo objects and make it store them as objects in the array instead? Remark: In my real example, I don’t know the types of the input in advance. Hence, I want to use np.array(...) for array creation instead of relying on np.empty(...) and then filling it in a loop, or on other workarounds that require an explicit dtype.

A Piece of Code to Illustrate Question 2

As the saying goes, a piece of code is worth a thousand words. So here is question 2 rephrased as code.

from random import (
    randint,
    random,
)

def generate_input():
    """Generate a non-deterministic list as input for np.array.

    Generate a list or list of lists or 3-times nested list with either
        * only integers
        * a mixture of integers and MappingFoo objects
        * only MappingFoo objects
    """
    def generator(shape_, foo_it_all):
        if len(shape_) == 1:
            return [randint(0, 255) 
                    if not foo_it_all and random() < .99
                    else MappingFoo()
                    for _ in range(shape_[0])]
        else:
            return [generator(shape_[1:], foo_it_all)
                    for _ in range(shape_[0])]

    dim = randint(1, 3)
    shape = tuple(randint(1, 10) for _ in range(dim))
    return shape, generator(shape, random() > .5)
        

shape, input_ = generate_input()
array = np.array(input_)
print(array)
print(array.dtype)
print(shape)
assert array.shape == shape

How do I make the assertion pass each time without changing generate_input or making use of shape? Changing MappingFoo would be ok (of course not removing functionality). Lastly, array should have the minimum dtype required to hold the objects in the sequence.

Solution

Missing import that almost made me quit:

In [31]: from random import random, randint

The "thousand words" code is only as good as the readers' willingness to run it. Even so, it’s a good idea to give some sample outputs, not just the code.

For example, the first time I tried this I got a (7, 2, 3) shape. It wasn’t obvious why.

The next time, I got a simple shape:

In [32]: shape,x = generate_input()
In [33]: shape
Out[33]: (1,)
In [34]: x
Out[34]: [169]
In [35]: np.array(x)
Out[35]: array([169])

Again:

In [36]: shape,x = generate_input()
In [37]: shape
Out[37]: (4, 9)
In [38]: x
Out[38]: 
[[<__main__.MappingFoo at 0x7f8e123d81f0>,
  <__main__.MappingFoo at 0x7f8e238906a0>,
  <__main__.MappingFoo at 0x7f8e23890e50>,
  <__main__.MappingFoo at 0x7f8e23890ac0>,
  ...
  <__main__.MappingFoo at 0x7f8e123e67f0>,
  <__main__.MappingFoo at 0x7f8e123e6970>,
  <__main__.MappingFoo at 0x7f8e123e6100>]]



In [39]: np.array(x).shape
Out[39]: (4, 9, 1)    # and a dtype='<U1'

Using the empty/fill approach:

In [40]: res = np.empty(shape,object)
In [41]: res[:]=x
In [42]: res
Out[42]: 
array([[<__main__.MappingFoo object at 0x7f8e123d81f0>,
        <__main__.MappingFoo object at 0x7f8e238906a0>,
        <__main__.MappingFoo object at 0x7f8e23890e50>,
        <__main__.MappingFoo object at 0x7f8e23890ac0>,
        <__main__.MappingFoo object at 0x7f8e23890700>,
         ....
        <__main__.MappingFoo object at 0x7f8e123e67f0>,
        <__main__.MappingFoo object at 0x7f8e123e6970>,
        <__main__.MappingFoo object at 0x7f8e123e6100>]], dtype=object)

Since res.shape matches the nesting of the x list, there’s no need to iterate; res[:]=x is enough. Sometimes an assignment like this can raise broadcasting errors, but that’s not the case here.
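If a direct res[:]=x assignment does raise a broadcasting error, one fallback is to set the elements one at a time. Here is a minimal sketch, assuming the nesting is regular and the shape is known; `fill_elementwise` is a hypothetical name, not a NumPy function.

```python
import numpy as np


def fill_elementwise(nested, shape):
    """Hypothetical helper: copy each leaf of `nested` into an object array."""
    res = np.empty(shape, dtype=object)
    for idx in np.ndindex(*shape):
        item = nested
        for i in idx:          # walk down exactly len(shape) levels of lists
            item = item[i]
        res[idx] = item        # full integer index: stores the object as-is
    return res


# The leaves here are lists of different lengths; each is stored whole.
leaves = [[[1, 2], [3]], [[4], [5, 6, 7]]]
demo = fill_elementwise(leaves, (2, 2))
print(demo.shape, demo[1, 1])
```

This is slower than a single slice assignment, but because each element is assigned through a full integer index, NumPy never tries to expand the leaf objects.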

As I commented, empty/fill is the only sure way of creating an object dtype array with a chosen shape. np.array tries too hard to do its own thing.
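If the shape is not available separately, one option is to derive it from the list nesting and use empty/fill only when np.array does not reproduce that shape. This is a sketch under stated assumptions, not an established NumPy recipe: `list_shape` and `to_array` are hypothetical names, only plain `list` instances are treated as array dimensions, and the nesting is assumed to be regular (only the first element at each level is inspected).

```python
import numpy as np


class MappingFoo:
    bar = {"a": 1}
    __len__ = bar.__len__
    __iter__ = bar.__iter__
    __getitem__ = bar.__getitem__


def list_shape(nested):
    """Hypothetical helper: shape implied by plain `list` nesting only."""
    shape = []
    while isinstance(nested, list) and nested:
        shape.append(len(nested))
        nested = nested[0]     # assumes regular (non-ragged) nesting
    return tuple(shape)


def to_array(nested):
    """Hypothetical helper: np.array if it keeps the list shape, else empty/fill."""
    shape = list_shape(nested)
    try:
        arr = np.array(nested)
        if arr.shape == shape:
            return arr         # e.g. all integers: keep NumPy's inferred dtype
    except ValueError:
        pass                   # newer NumPy rejects inhomogeneous input
    res = np.empty(shape, dtype=object)
    res[:] = nested            # the empty/fill approach from this answer
    return res


ints = to_array([[1, 2], [3, 4]])
foos = to_array([[MappingFoo(), MappingFoo()], [MappingFoo(), MappingFoo()]])
mixed = to_array([[1, MappingFoo()], [2, 3]])
print(ints.dtype, foos.dtype, mixed.dtype)
```

The first attempt with np.array keeps a numeric dtype for purely numeric input, and the fallback only applies when the resulting shape disagrees with the list nesting.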

Answered By – hpaulj

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
