fixes for push_attr/pull_attr #105

ilia-kats · 2025-10-06T14:28:53Z

- don't raise exception if mods is used together with common or prefixed (push) / common, nonunique, or unique (pull): there is nothing in the logic preventing that
- don't raise if columns is used together with common or prefixed (push) / common, nonunique, or unique (pull), warn instead
- bugfix for push_attr: correct ordering of pushed column
- minor code cleanup

codecov-commenter · 2025-10-06T14:33:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@ddbcfa4). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #105   +/-   ##
=======================================
  Coverage        ?   96.78%           
=======================================
  Files           ?       13           
  Lines           ?      872           
  Branches        ?        0           
=======================================
  Hits            ?      844           
  Misses          ?       28           
  Partials        ?        0

Files with missing lines	Coverage Δ
tests/test_pull_push.py	`99.52% <100.00%> (ø)`

gtca

Thanks for the fixes!

I am also wondering whether this fixes specific use cases that were failing before. If so, can we add them to our tests?

src/mudata/_core/utils.py

gtca · 2025-10-09T22:31:00Z

src/mudata/_core/mudata.py

-                assert v is None, f"Cannot use {k} with columns."
+                if v is not None:
+                    warnings.warn(
+                        f"Both columns and {k} given. Columns take precedence, {k} will be ignored",


Would something like this improve readability? (I am not sure we have a consistent policy for formatting in such cases.)

Both `columns=...` and `{k}=True` were given. <...>

If yes, this is also true for similar warnings in other parts of the PR.

I think that would be a bit misleading here, since the warning will also be emitted if k=False or any other value which is not the None default. Perhaps something like

Both `columns=...` and `{k}={locals()[k]}` were given...

? But I'm not sure if that brings the message across that it should just be not passed at all (as in leave the None default).
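The trade-off in this thread can be sketched in isolation. This is a minimal, hypothetical example: the function `resolve_selection` and its return value are assumptions made for illustration and are not mudata's code. It warns whenever a flag is anything other than the `None` default while `columns` is given, and it echoes the value that was actually passed, so that `False` does not get reported as `True`.

```python
import warnings


def resolve_selection(columns=None, common=None, prefixed=None):
    """Hypothetical helper, not part of mudata.

    Illustrates warning (instead of asserting) when `columns` is combined
    with a flag that was set to anything other than the None default.
    """
    flags = {"common": common, "prefixed": prefixed}
    if columns is None:
        return {"columns": None, **flags}
    for k, v in flags.items():
        if v is not None:
            # Echo the passed value: the warning fires for False as well as True.
            warnings.warn(
                f"Both `columns=...` and `{k}={v!r}` were given. "
                f"`columns` takes precedence; leave `{k}` unset to silence this.",
                UserWarning,
                stacklevel=2,
            )
    # `columns` wins, so the flags are reported as unset.
    return {"columns": list(columns), "common": None, "prefixed": None}
```

Whether echoing the value makes it clearer that the argument should simply be omitted is still a judgment call; the sketch only shows that the message can be made accurate for non-`True` values.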

ilia-kats · 2025-10-10T07:32:59Z

I am also wondering whether this fixes specific use cases that were failing before. If so, can we add them to our tests?

I did add a test for the one bugfix that this contains (the ordering of pushed columns). The rest of this PR is more quality of life / logic improvements. I can add tests for those, but I'd prefer to do that as part of a larger effort to improve test coverage.

ilan-gold

Generally just found it a bit hard to reason about what things are done to the columns, so maybe a class would help?

ilan-gold · 2025-10-10T09:01:39Z

src/mudata/_core/mudata.py

        common
            If True, pull common columns.
            Common columns do not have modality prefixes.
            Pull from all modalities.


What does this mean?

So if you have a column that is present in all modalities (AnnData objects) and is named, say, batch, it will be pulled, and its name in the global dataframe will also be batch, without any modality prefix.
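As a toy illustration of that naming rule (plain dictionaries stand in for the per-modality `.obs` tables, and the column names `qc` and `depth` are made up; this is not how mudata implements pulling):

```python
# Toy per-modality annotation tables: column name -> values (stand-ins for .obs).
mod_obs = {
    "mod1": {"batch": ["a", "b"], "qc": [1, 2]},
    "mod2": {"batch": ["a", "b"], "depth": [10, 20]},
}

# A column is "common" when every modality has a column of that name.
common = set.intersection(*(set(cols) for cols in mod_obs.values()))

global_obs = {}
for mod, cols in mod_obs.items():
    for name, values in cols.items():
        # Common columns keep their bare name; all others get a modality prefix.
        target = name if name in common else f"{mod}:{name}"
        # Simplification: the first modality seen supplies a common column's values.
        global_obs.setdefault(target, values)

print(list(global_obs))  # ['batch', 'mod1:qc', 'mod2:depth']
```

Combining the values of a common column across modalities with different observations is more involved than the `setdefault` shortcut used here.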

src/mudata/_core/mudata.py

ilan-gold · 2025-10-10T09:04:38Z

src/mudata/_core/mudata.py

        only_drop
            If True, drop the columns but do not actually pull them.
            Forces drop=True. False by default.


So this argument turns this function into a "drop columns" function instead of a "pull columns" function? Maybe add some context on why one might want to do this (deleting data from your underlying AnnData stores strikes me as a bad idea, but I'm sure there's a reason to do this in some sort of structured manner).

That was originally implemented by @gtca, so he's best qualified to answer, but I think if you have a bunch of AnnDatas coming from somewhere (e.g. constructed from the output of a proprietary pipeline or some publication), there are often metadata that are either useless (the same value for all observations) or that you just don't care about in your analysis, in which case one can drop them to reduce visual clutter (e.g. when printing the dataframe in a notebook or tab-completing column names).

ilan-gold · 2025-10-10T09:26:17Z

src/mudata/_core/mudata.py

        if only_drop:
            drop = True

        cols = _classify_attr_columns(


I think cols should be a class, not just a dictionary, with methods to encapsulate the iterations below (and its sub-dictionary values as well, to handle the modcols logic).

Generally I would agree with you, but this is used in exactly one place in the entire codebase, so I think a class is a bit overkill at this point.

Although I would consider making it a named tuple for performance reasons.

ilan-gold · 2025-10-10T09:27:12Z

src/mudata/_core/mudata.py

-
-            if mods is not None:
-                cols = [col for col in cols if col["prefix"] in mods]
+            cols = {


i.e., with cols as a class this could be

prefix_to_cols = cols.filter_by_name_or_derived_name(columns)

(I would also advocate changing the name from cols to prefix_to_cols to avoid confusion with columns)

src/mudata/_core/utils.py

ilan-gold · 2025-10-10T09:49:49Z

src/mudata/_core/utils.py

        {"name": "global", "prefix": "", "derived_name": "global", "class": "common"},
        {"name": "mod1:annotation", "prefix": "mod1", "derived_name": "annotation", "class": "prefixed"},
        {"name": "mod2:annotation", "prefix": "mod2", "derived_name": "annotation", "class": "prefixed"},
        {"name": "mod1:unique", "prefix": "mod1", "derived_name": "annotation", "class": "prefixed"},


The class key seems redundant; it looks more like behavior that belongs on a class. Isn't it just a check on name?

In general yes, but having this as a separate attribute makes the filtering in _push_attr() much easier.

It's a big refactor but this gets back to my point about using classes. It just feels like _classify_attr_columns and _classify_prefixed_columns do such similar things and return such weak types that refactoring around using classes would be a good idea.

It's tough to say without doing it myself, but offhand, I don't see why this should return a list and not the same thing as _classify_attr_columns with methods for getting the needed information that might distinguish them. For example, moving prefix here into the key of the dictionary returned in classify_attr_columns seems. Even there, in that return type the name and derived_name are just slightly different than name and prefix here it seems.

I just generally don't like untyped dictionaries and magic strings and especially in this case, it seems like there's a decent amount of overlapping functionality. But if it's too much for this PR, that's fine. I think stronger typing on the return types is a good start.

OK, there is now a custom class encapsulating one column and its logic. But moving all the filtering logic etc. into another class feels like overengineering to me tbh. If anything, I would get rid of _classify_attr_columns and just move the few lines into _pull_attr.

It's up to you. It's not so much about code correctness as it is about readability for the next person. For example:

https://github.com/scverse/mudata/pull/105/files#diff-5fc9f5c31eeb9fbab11538b609e263905fcc3e03d32bc2436469f26ded6514afR2261-R2267

or

https://github.com/scverse/mudata/pull/105/files#diff-5fc9f5c31eeb9fbab11538b609e263905fcc3e03d32bc2436469f26ded6514afR2012-R2026

I find these a little difficult to reason about, especially in the absence of comment strings.

I added some more comments throughout.

ilan-gold

Definitely like the MetadataColumn class!
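For readers without the diff at hand, here is a rough sketch of what a per-column record along these lines could look like. It is an assumption-laden illustration, not the PR's `MetadataColumn`: the class name `ColumnInfo`, the `parse` constructor, and the `klass` property are all hypothetical, and only the field names `name`, `prefix`, and `derived_name` are taken from the dictionaries quoted earlier in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical stand-in for a per-column record; not the PR's MetadataColumn."""

    name: str          # full column name in the global table, e.g. "mod1:annotation"
    prefix: str        # modality prefix, "" when the column has none
    derived_name: str  # name with the prefix stripped

    @classmethod
    def parse(cls, name: str, mods: set[str]) -> "ColumnInfo":
        # Only treat the part before the first ":" as a prefix if it names a modality.
        head, sep, tail = name.partition(":")
        if sep and head in mods:
            return cls(name=name, prefix=head, derived_name=tail)
        return cls(name=name, prefix="", derived_name=name)

    @property
    def klass(self) -> str:
        # Derived from the prefix, so it cannot drift out of sync with it.
        return "prefixed" if self.prefix else "common"


names = ["global", "mod1:annotation", "mod2:annotation"]
cols = [ColumnInfo.parse(n, {"mod1", "mod2"}) for n in names]
# Filtering by modality becomes an attribute check instead of a magic-string lookup.
mod1_cols = [c.name for c in cols if c.prefix == "mod1"]
```

Computing the class from the prefix speaks to the redundancy concern raised above, while keeping it available as an attribute preserves the easy filtering that motivated storing it separately.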

src/mudata/_core/mudata.py

ilan-gold · 2025-10-13T10:08:45Z

src/mudata/_core/utils.py

        {"name": "global", "prefix": "", "derived_name": "global", "class": "common"},
        {"name": "mod1:annotation", "prefix": "mod1", "derived_name": "annotation", "class": "prefixed"},
        {"name": "mod2:annotation", "prefix": "mod2", "derived_name": "annotation", "class": "prefixed"},
        {"name": "mod1:unique", "prefix": "mod1", "derived_name": "annotation", "class": "prefixed"},


It's up to you. It's not so much about code correctness as it is about readability for the next person. For example:

https://github.com/scverse/mudata/pull/105/files#diff-5fc9f5c31eeb9fbab11538b609e263905fcc3e03d32bc2436469f26ded6514afR2261-R2267

or

https://github.com/scverse/mudata/pull/105/files#diff-5fc9f5c31eeb9fbab11538b609e263905fcc3e03d32bc2436469f26ded6514afR2012-R2026

I find these a little difficult to reason about, especially in the absence of comment strings.

_pull_attr/_push_attr

ilan-gold

Are there explicit tests for the fixes? For example:

don't raise exception if mods is used together with common or prefixed (push) / common, nonunique, or unique (pull): There is nothing in the logic preventing that

doesn't seem to have a test. I searched through the file: mods is None by default, and the other two instances of it in test_push_pull are only with columns=...

src/mudata/_core/mudata.py

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

ilia-kats · 2025-10-22T11:03:58Z

Are there explicit tests for the fixes? For example:

There is a test for push_attr: correct ordering of pushed column, which is the only actual fix in here. The rest is more about behavior changes / being more permissive in terms of what it accepts. I can add some tests for that now, but I'd prefer to do that as part of a larger effort to get our test coverage up (which is on my todo list, but depends on #104 and #106 being merged).

no need to argsort

ilan-gold · 2025-10-24T11:25:22Z

tests/test_pull_push.py

+                == mdata.mod[m].obs["common_obs_col"].to_numpy()[modmap[mask] - 1]
+            ).all()


From what I understand, the reason is that 0 encodes missing (why not NA?) so the obsmap indexing starts at 1 and then you need to shift by 1. Is this right?

Unrelated to the PR, why are these not nullables?

Yes, exactly. When we first implemented this, AnnData's IO did not support nullables yet, so we went with this.
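The indexing convention confirmed here can be shown with a small NumPy sketch. The array contents are invented for illustration; only the convention itself (0 encodes "missing", positions are 1-based, hence the `- 1`) comes from the exchange above.

```python
import numpy as np

# Values stored in one modality, in that modality's own row order.
mod_values = np.array(["x", "y", "z"])

# Hypothetical map for four global rows: entry i is the 1-based position of
# global row i within the modality, and 0 means "not present in this modality".
modmap = np.array([2, 0, 3, 1])

mask = modmap > 0  # global rows that exist in the modality
pulled = np.full(modmap.shape, None, dtype=object)
# Shift by one to turn the 1-based map entries into 0-based array positions.
pulled[mask] = mod_values[modmap[mask] - 1]
```

After this runs, global rows 0, 2, and 3 hold the modality values at positions 1, 2, and 0 respectively, and global row 1 stays `None` because its map entry is 0.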

ilia-kats requested a review from gtca October 6, 2025 14:28

ilia-kats force-pushed the push_pull_fixes branch from 143a0a1 to 2e797b1 Compare October 6, 2025 14:33

ilia-kats force-pushed the push_pull_fixes branch 2 times, most recently from aad3562 to 2cc418f Compare October 7, 2025 11:49

ilia-kats requested a review from ilan-gold October 9, 2025 16:11

gtca reviewed Oct 9, 2025

ilia-kats force-pushed the push_pull_fixes branch from 2cc418f to ca406d7 Compare October 10, 2025 07:07

ilan-gold reviewed Oct 10, 2025

ilia-kats force-pushed the push_pull_fixes branch 5 times, most recently from 1072cf6 to 0dc33db Compare October 13, 2025 09:12

ilia-kats added 3 commits October 13, 2025 11:26

pull_attr fixes

3a4b2e1

- don't raise exception if mods is used together with common or prefixed: there is nothing in the logic preventing it
- don't raise if columns is used together with common or prefixed, warn instead
- minor code cleanup

push_attr fixes

8647ddd

- don't raise exception if mods is used together with common, nonunique, or unique: there is nothing in the logic preventing it
- don't raise if columns is used together with common, nonunique, or unique, warn instead
- fix ordering of pushed column
- minor code cleanup

push/pull: replace dict holding column information with custom class

2b26ebf

ilia-kats force-pushed the push_pull_fixes branch from 0dc33db to 2b26ebf Compare October 13, 2025 09:26

ilan-gold reviewed Oct 13, 2025

get rid of _classify_attr_columns, sprinkle more comments throughout

21b2e83

_pull_attr/_push_attr

ilia-kats force-pushed the push_pull_fixes branch from 4e2177d to 21b2e83 Compare October 13, 2025 12:36

ilia-kats requested a review from ilan-gold October 22, 2025 08:50

ilan-gold reviewed Oct 22, 2025

Apply suggestion from @ilan-gold

8b8d1c1

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

ilia-kats added 3 commits October 22, 2025 17:54

improve push performance scaling

f4d5eca

no need to argsort

fixup! improve push performance scaling

0d646ed

fixup! improve push performance scaling

717c362

ilan-gold approved these changes Oct 24, 2025

ilia-kats merged commit 0681f45 into scverse:main Oct 24, 2025
8 checks passed

ilia-kats mentioned this pull request Nov 4, 2025

push_obs transfers labels out of order #109

Open
