Unexpected inconsistency in records

(codeblog.jonskeet.uk)

59 points | by OptionOfT 3 days ago

12 comments

  • andygocke 15 hours ago
    Unfortunately, there are alternatives to this behavior, but they all have other downsides. The biggest constraint was that the schedule didn't support a new version of the .NET IL format (and revving the IL format is an expensive change for compat purposes, as well). There were two strong lowering contenders, each with its own problems.

    The first is to use a `With` method and rely on "optional" parameters in some sense. When you write `with { x = 3 }` you're basically writing a `.With(x: 3)` call, and `With` presumably calls the constructor with the appropriate values. The problem here is that optional parameters are also kind of fake. The .NET IL format doesn't have a notion of optional parameters -- the C# compiler just fills in the parameters when lowering the call. So that means that adding a new field to a record would require adding a new parameter. But adding a new parameter means that you've broken binary backwards compatibility. One of the goals of records was to make these kinds of "simple" data updates possible, instead of the current situation with classes where they can be very challenging.
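
    A rough sketch of the shape this could have taken (hypothetical code, not anything the compiler emits; `int?` stands in for "argument omitted"):

        public record Point(int X, int Y)
        {
            // Hypothetical single 'With' method using optional parameters.
            // The real design would have needed "default to the current
            // field value", which IL can't express as a constant default.
            public Point With(int? x = null, int? y = null) =>
                new Point(x ?? X, y ?? Y);
        }

    `p with { X = 3 }` would lower to `p.With(x: 3)`, and adding a `Z` field later changes the `With` signature: that's the binary break described above.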

    The second option is a `With` method for every field. A single `with { }` expression turns into a chain of N calls, e.g. `WithX(3).WithY(5)`, one per field being set. The problem is that this produces a lot of dead intermediate copies that need to be unwound by the JIT. We didn't see that happening reliably, which was pretty concerning because it would also result in a lot of allocation garbage.
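
    A sketch of that second shape (again hypothetical):

        public record Point(int X, int Y)
        {
            // Hypothetical per-field copy methods; each call clones the record.
            public Point WithX(int x) => new Point(x, Y);
            public Point WithY(int y) => new Point(X, y);
        }

    Here `p with { X = 3, Y = 5 }` lowers to `p.WithX(3).WithY(5)`, where the intermediate object from `WithX(3)` is immediately dead; that's the garbage the JIT would need to eliminate reliably.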

    So basically, this was a narrow decision that fit into the space we had. If I had the chance, I would completely rework dotnet/C# initialization for a reboot of the language.

    One thing I proposed, but was not accepted, was to make records much simpler across the board. By forbidding a lot of the complex constructs, the footguns are also avoided. But that was seen as too limiting. Reading between the lines, I bet Jon wouldn’t have liked this either, as some of the fancy things he’s doing may not have been possible.

    • louthy 13 hours ago
      > The biggest constraint was that the schedule didn't support a new version of the .NET IL format (and revving the IL format is an expensive change for compat purposes, as well).

      My biggest sadness reading this is that what MS have done is to outsource the issue to all C# devs. We will all hit this problem at some point (I have a couple of times) and I suspect we will all lose hours of time trying to work out WTF is going on. It may not quite be the Billion Dollar Mistake, but it's an ongoing cost to us all.

      A possible approach I mentioned elsewhere in the thread is this (for the generation of the `with`):

          var n2 = n1.<Clone>$();
          n2.Value = 3;                  // 'with' field setters
          n2.<OnPostCloneInitialise>();  // run the initialisers
      
      Then the <OnPostCloneInitialise>:

          public virtual void <OnPostCloneInitialise>()
          {
              base.<OnPostCloneInitialise>();
      
              Even = (Value & 1) == 0;    
          }
      
      If the compiler could generate the <OnPostCloneInitialise> based on the initialisation code in the record/class, could that work?

      That would just force the new object to initialise after the cloning without any additional IL or modifications.

      • andygocke 11 hours ago
        > MS have done is to outsource the issue to all C# devs

        Let's be clear: breaking dozens of tools because of a change to the IL format also outsources an issue to all C# devs. The .NET IL format has been basically unchanged since .NET 2.0, and huge numbers of people take very hard dependencies on exactly what it does and does not contain. I don't expect we would have been able to make significant changes due to the breaking change impact.

        > A possible approach I mentioned

        This would likely be even harder to understand. For better or worse, the .NET design is that external initializers happen _after_ the constructor runs. That's been true all the way back to when the initializer syntax was first introduced in C# 3. Making regular initializers and `with` initializers have inverted order strikes me as being way worse.

        If I could go back in time, I think the main change to C# I would make would be to enforce that the constructor always runs after all external initialization.

        • louthy 10 hours ago
          Slightly confused. My suggestion was to run the initialisers after the new object has been constructed (cloned+modified). The semantics are the same as you describe even if the underlying implementation is different.

          What am I missing?

          • andygocke 6 hours ago
            There are two types of initializers: internal and external. Internal are inside the type, like field and property initializers. External are outside, like object initializers, collection initializers, and ‘with’ clauses.

            Internal initializers are run as part of the constructor, before any user code. External initializers are run after the constructor, on the constructed object.

            For instance:

              class C
              {
                public int P = 5;
              }
            
              var c = new C { P = 3 };
            
            `c.P` has the value 3.

            In your example:

                var n2 = n1.<Clone>$();
                n2.Value = 3;                    // 'with' field setters
                n2.<OnPostCloneInitialise>();  // run the initialisers
            
            The “PostCloneInitializers” you’re running are the field initializers, so the order is backwards. You’re overwriting the value of the external initializers with the internal initializers.
  • wpollock 19 hours ago
    As I read the post, I thought of relational data models. The behavior is expected. I believe the root issue is that your records should not have computed fields that depend on mutable fields. Change your record schemas to eliminate that and you should have no further problems using "with".

    If changing the schema isn't reasonable, use a copy constructor instead.

    • louthy 19 hours ago
      > The behavior is expected

      It isn't, that's why there's a blog article documenting how unexpected it is.

      > that depend on mutable fields

      The fields are not mutable. The `with` expression creates a whole new record, clones the fields, and then sets the field you're changing (the field is read-only, so this is the compiler going 'behind the scenes' to update the new record before it sets the reference). The reason for all this is performance: the new record is allocated on the heap, the old record's fields are copied onto it, and then the `with` changes are applied. It's just that at this point the 'init time' property initialisers aren't re-run on the new object.

      In the language the fields are immutable. So, the argument is that a whole new record initialised with the fields of the old record (with some changes) should run the 'init time' properties so that they get set too, otherwise the data-structure can become inconsistent/poorly-defined.
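
      As a sketch of that lowering (the real generated member has an unspeakable name, `<Clone>$`; `Clone` below is a stand-in):

          public record Node(int Value)
          {
              // This initialiser runs only inside the primary constructor.
              public bool Even { get; } = (Value & 1) == 0;
          }

          // `n1 with { Value = 3 }` becomes, roughly:
          //   var n2 = n1.Clone();   // copy ctor: copies Value AND the Even backing field
          //   n2.Value = 3;          // init accessor re-run on the fresh copy
          // Even is never recomputed, so n2.Even still reflects n1.Value.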

      > use a copy constructor instead

      It's probably worth reading the article:

      "Note that because Value is set after the cloning operation, we couldn’t write a copy constructor to do the right thing here anyway."

      • cbsmith 18 hours ago
        > It isn't, that's why there's a blog article documenting how unexpected it is.

        The behaviour is expected for the language design. Whether developers using the language expect it is a separate matter.

        The with operator clearly allows someone to break encapsulation and as such should only be used in cases where you aren't expecting encapsulation of the underlying record.

        > It's probably worth reading the article:

        > "Note that because Value is set after the cloning operation, we couldn’t write a copy constructor to do the right thing here anyway."

        It's probably worth reading the entire article, as that quote is followed by: "(At least, not in any sort of straightforward way – I’ll mention a convoluted approach later.)", which presumably was what was being referred to there.

        In general, there's a whole ton of gotchas around encapsulation of precomputed values. That's just life outside of a purely functional programming context.

        • louthy 18 hours ago
          > The behaviour is expected for the language design.

          So, Microsoft meant it. Ok...

          > Whether developers using the language expect it is a separate matter.

          Really? Perhaps read the 'Principle of Least Astonishment' [1] to see why this is a problem. If I create a new object I would expect the 'init time' properties to be initialised.

          > It's probably worth reading the entire article, as that quote is followed by: "(At least, not in any sort of straightforward way – I’ll mention a convoluted approach later.)", which presumably was what was being referred to there.

          It's probably worth continuing to read the article, because the attempt to deal with it required manually writing Lazy properties:

              private readonly Lazy<ComputedMembers> computed =
                   new(() => new(Value), LazyThreadSafetyMode.ExecutionAndPublication);
          
          That's not practical. Might as well use computed properties.

          > In general, there's a whole ton of gotchas around encapsulation of precomputed values. That's just life outside of a purely functional programming context.

          Great insight. Let's not run the 'init time' properties for a newly initialised object, just in case it works as expected. This 'feature' can't even be manually resolved by doing post-`with` updates (because often the properties are init/read-only). It makes the whole init-property feature brittle as fuck.

          [1] https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...

          • cbsmith 16 hours ago
            > Really? Perhaps read the 'Principle of Least Astonishment' [1] to see why this is a problem. If I create a new object I would expect the 'init time' properties to be initialised.

            The Principle of Least Astonishment definitely applies. It's a deliberate design choice that unfortunately violates the principle.

            > Great insight. Let's not run the 'init time' properties for a newly initialised object, just in case it works as expected. This 'feature' can't even be manually resolved by doing post-`with` updates (because often the properties are init/read-only). It makes the whole init-property feature brittle as fuck.

            ? I'm not sure I follow where you're going with this, but yeah, in general you'd have to carefully limit all uses of "with" for objects with precomputed values to inside the encapsulation of said objects. Alternatively, as you mentioned, you could just not have precomputed properties.

            • louthy 14 hours ago
              > in general you'd have to carefully limit all uses of "with" for objects with precomputed values

              What you’re describing is incidental complexity. It is not a good thing. We can’t constrain the complexity with language constraints, we have to rely on the programmer following some guidance and never ever making a mistake. In large teams and large code-bases that's a problem. And it's an offloaded problem from Microsoft to every software development team on the planet that uses C#.

              The incidental complexity for the average C# developer has increased, whereas a better direction of travel would be toward correctness and declarative features. I would prefer it if the csharplang team worked toward that.

              • cbsmith 12 hours ago
                > What you’re describing is incidental complexity. It is not a good thing.

                Yes.

  • wavemode 19 hours ago
    If Jon Skeet of all people is confused about something in C#, that probably means it's objectively confusing behavior.
  • OptionOfT 19 hours ago
    I think this stems from having properties, which are syntactic sugar for a backing field and getter & setter functions.

    This muddies the water between just setting a field vs executing a function that does work and then sets the field.
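
    Roughly, the sugar looks like this (a sketch; the generated names differ):

        public class Example
        {
            // What you write:
            public int Foo { get; set; }

            // What it roughly expands to:
            private int _bar;                 // compiler-generated backing field
            public int Bar
            {
                get { return _bar; }          // emitted as a get_Bar() method
                set { _bar = value; }         // emitted as a set_Bar(int) method
            }
        }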

    If I write a record with an explicit setFoo(foo: Foo) I wouldn't expect a clone & subsequent direct field assignment to execute the setFoo(foo: Foo) code.

    • louthy 18 hours ago
      > wouldn't expect a clone

      If I create a new object I would expect the 'init time' properties to be initialised, regardless of how it was initialised. The current approach just leads to inconsistent data structures, with significant issues for debugging how a data structure got into an inconsistent state. Modern language features should not be trying to save nanoseconds like they did in the past. Or they should at least default to 'correct', with performance opt-outs.

      • OptionOfT 14 hours ago
        You don't even need to use the `with` operator to get into an inconsistent state:

            using System;
            
            // same behavior with 
            // public sealed class Inner
            public struct Inner
            {
                public int Value { get; set; }
            }
            
            // same behavior with
            // public sealed record class Outer(Inner Inner)
            public record struct Outer(Inner Inner)
            {
                public bool Even { get; } = (Inner.Value & 1) == 0;
            }
            
            class Program
            {
                public static void Main(string[] args)
                {
                    var inner = new Inner { Value = 42 };
            
                    var outer = new Outer(inner);
            
                    inner.Value = 43;
            
                    Console.WriteLine("{0} is {1}", inner.Value, outer.Even);
                }
            }
        
        Would you expect Even to be updated here?

        • louthy 14 hours ago
          Sure, but you’re talking about a 25-year-old feature (structs). One that was implemented like that for performance reasons in C# 1.0, because we were all using single- or dual-core machines back then.

          Records were a relatively recent feature addition, computers are significantly more powerful, and our programs are significantly more complex. And so it’s hella frustrating that MS didn’t opt for program correctness over performance. They even missed the opportunity to close the uninitialised struct issue with record structs (they could have picked different semantics).

          I find their choices utterly insane when we’re in a world where bad data could mean a data breach and/or massive fine. The opportunity to make records robust and bulletproof was completely missed.

          I was hoping that we’d get to a set of data-types in C# that were robust and correct: product-types and sum-types, with proper pattern matching (exhaustiveness checking).

          Then either a Roslyn analyser or a C# ‘modern mode’ which only used the more modern ‘correct’ features so that we could actually go away from the compromises of the past.

          Unfortunately many modern features of C# are being built with these ugly artefacts seeping in. It’s the opposite of declarative programming, so it just keeps increasing incidental complexity: it exports the complexity to the users, something a type-safe compiler should be reducing.

  • _old_dude_ 20 hours ago
    For the record (sorry), I believe C# uses the clone operation because records support inheritance.

    For me, this is where the design flaw lies: trying to support both inheritance and immutability at the same time.

    • ethan_smith 16 hours ago
      The inheritance + immutability combination forces the compiler to use field-by-field copying rather than constructor chaining, which bypasses the property initialization logic that would maintain consistency between related fields.
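
      A small illustration of why constructor chaining can't work here (a sketch; the compiler-generated clone member's real name isn't valid C#):

          using System;

          public record Base(int A);
          public record Derived(int A, int B) : Base(A);

          class Program
          {
              static void Main()
              {
                  Base x = new Derived(1, 2);

                  // `with` on a Base-typed reference must still produce a Derived,
                  // so the compiler emits a virtual clone method backed by a copy
                  // constructor rather than chaining to any particular constructor.
                  Base y = x with { A = 3 };

                  Console.WriteLine(y);  // prints: Derived { A = 3, B = 2 }
              }
          }

      The virtual clone preserves the runtime type; the cost is that no constructor logic, including property initialisers, runs on the copy.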
    • louthy 18 hours ago
      Cloning anything creates a new object of a known type (well, the runtime knows at least) and so if the object re-runs the init-properties of the known type then it will be the same as constructing that type afresh.

      You could even imagine a compiler generated virtual method: `OnCloneReinitialiseFields()`, or something, that just re-ran the init-property setters (post clone operation).

      Is there some other inheritance issue that is problematic here? Immutability isn't a concern; it's purely about what happens after cloning an object. Whether the fields are immutable or not doesn't change the behaviour of the `with` operation.

    • hvb2 20 hours ago
      It was never immutable? You can have collections on there that are perfectly mutable, as well as being able to set values. You CAN use the with keyword, but that's a choice.

      I've seen people use records for value-based equality and for things like dictionary keys. Immutability in C# just doesn't exist; any attempt to achieve it is flawed from the start.

  • benmmurphy 20 hours ago
    Seems like enough of a gotcha that other people have stumbled upon the issue as well: https://blog.codingmilitia.com/2022/09/01/beware-of-records-...
    • firesteelrain 19 hours ago
      > SomeCalculatedValue in the second line still has the same value as in the first. This makes sense, as what happens when we use the with expression, is that the record is cloned and then the properties provided within the brackets are overwritten.

      Shouldn't SomeCalculatedValue be "This is another some value *calculated*" when using `with`?

      Edit: Actually that is the problem, that it isn't "recalculating" due to the way that the initialization of read-only properties works in a C# record.

      • bee_rider 18 hours ago
        Did you go through the same journey as the author of the article, in compressed form? Further evidence it is a confusing feature, if so!
        • firesteelrain 17 hours ago
          Yes! It didn’t make sense at first. It’s not intuitive unless you understand the internals of how it works, and then it makes sense. I don’t think there is anything wrong; it just needs to be documented better.
  • louthy 19 hours ago
    I've hit this before and facepalmed when I realised what they had done. Records were supposed to work more like immutable product-types in F# and other functional languages, but with this approach it can't be seen as anything other than broken IMHO.

    Sometimes the csharplang team make some utterly insane decisions. This is the kind of thing that would happen back-in-the-day™ in the name of performance, but in the modern world just adds to the endless list of quirks (like uninitialised structs). I suspect there are elements of the design team that are still stuck in C#1.0/Java mode and so holes like this don't even seem that bad in their minds. But it literally leads to inconsistent data structures which is where bugs live (and potential security issues in the worst cases).

  • uticus 20 hours ago
    It's been a while since I followed Jon Skeet, but his books from Manning were always worthwhile. Plus the Jon Skeet facts [0] are fun.

    [0] https://meta.stackexchange.com/questions/9134/jon-skeet-fact...

  • harpiaharpyja 15 hours ago
    Ehh, I think the real footgun here is using a property with backing storage to store what is clearly a derived value. Using a computed property is what we really should be doing here, if we think our code should line up with our intentions.

    I feel like what's happened here is that the author actually needed a system to cache their derived values, but didn't think to build that out explicitly.
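
    Something like this, reusing the Value/Even shape from the article (a sketch):

        public record Node(int Value)
        {
            // Stored flavour: computed once in the primary constructor,
            // goes stale after `with`:
            //   public bool Even { get; } = (Value & 1) == 0;

            // Computed flavour: evaluated on every read, so it can never
            // disagree with Value.
            public bool Even => (Value & 1) == 0;
        }

    With the computed version, `new Node(2) with { Value = 3 }` reports `Even == false`, as you'd expect.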

  • high_na_euv 20 hours ago
    To be fair, I don't think this behaviour is unreasonable.
    • mlhpdx 15 hours ago
      I tend to agree. It’s always been my understanding that record types were strictly data holders and shouldn’t be embellished with behavior.
  • apwell23 20 hours ago
    I wonder how much of coding agents/AIs is now just Jon Skeet.
  • croemer 20 hours ago
    The "– Jon Skeet's coding blog" in the title is not necessary, as the URL shows in parentheses after the title. Adding C#, however, might be helpful.