Link State Routing (Active WG)
Rtg Area: Alvaro Retana, Martin Vigoureux, John Scudder | 2018-Feb-23 —  

IETF-111 lsr minutes


minutes-111-lsr-00 minutes

          IETF 111 LSR Agenda
          Chairs:      Acee Lindem (acee@cisco.com)
                       Chris Hopps (chopps@chopps.org)
          Secretary:   Yingzhen Qu (yingzhen.ietf@gmail.com)
          WG Page:     http://tools.ietf.org/wg/lsr/
          Materials:   https://datatracker.ietf.org/meeting/111/session/lsr
          1. Meeting Administrivia and WG Update
          Chairs     (10 mins)
          John Scudder: The flex-algo draft is in my queue, I'll get to it.
          2. Flooding Speed   (65 mins)
              - Les Ginsberg (15 mins)
              - Bruno Decraene  (20 mins)
              - Discussion (30 mins)
          Chris H:   Tony P mentioned in the chat that it should be LSPTxRate in
                     the pic.
          Les:       Right, apologies for that.
          Acee:      Are those attempts at the source?
          Les:       No, these are seconds.
          Chris H:   How much faster with RWin than with congestion control? Is
                     there a comparison?
          Guillaume: I don't think the comparison is relevant.
          Chris:     They have to be relevant so we can compare how much faster
                     we can flood. Les's presentation has numbers. Is there
                     some number we can give? For example, can RWin get
                     everything done in 3s?
          Guillaume: The RWin algorithm should not be used alone. It should be
                     used together with congestion control as an additional
                     guarantee, so congestion control doesn't lose packets due
                     to CPU contention.
          Chris:     Is RWin equivalent to Les's making the transmit rate dynamic
                     based on acks?
          Guillaume: In a way, yes.
          Bruno:     It is difficult to compare directly because the hardware is
                     different. If you just use RWin, the sender adapts to the
                     receiver; it's the maximum the receiver can do. If the
                     receiver pauses for 300ms, the sender will pause for 300ms.
                     If anything changes in the receiver, the sender adapts
                     quickly.
          Chris:     Les, is there a fixed target in your proposal? In RWin, it's
                     a target set by the receiver.
          Les:       I agree with Bruno: you can't compare raw numbers because of
                     hardware and implementation differences. The more relevant
                     question is which one is more adaptive. From the email
                     communication, they're not expecting to modify RWin
                     dynamically; it's chosen at startup. So to me, it means you
                     have to pick a conservative number.
          Chris:     A good point. RWin reminds me of credits.
          Les:       I'll defer this to Bruno. My understanding is that it's not
                     adaptive. To be adaptive, there are numbers you have to get,
                     and it's hard to get them.
          Guillaume: There are two things, RWin and congestion control. RWin is
                     the upper bound on what can be stored before ISIS processing.
                     We need both RWin and congestion control. In our case the
                     congestion control is different from Les's monitoring of the
                     ack rate: we monitor whether an LSP gets delayed for too long
                     compared to the usual rate, so the dynamics of the sender
                     come from the congestion control. RWin is a guarantee on top.
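A minimal Python sketch of the kind of sender-side congestion control Guillaume describes, in which the sender reacts when an LSP's acknowledgment takes much longer than usual. The class name, the EWMA smoothing of the "usual" delay, the 2x "too long" threshold, and the additive-increase/multiplicative-decrease (AIMD) reaction are all assumptions for illustration. AIMD is swapped in here as a familiar technique; this is not the algorithm from either draft.

```python
class AckDelayController:
    """Hypothetical sender-side controller: back off when an LSP's ack takes
    much longer than the recent average ack delay (AIMD reaction assumed)."""

    def __init__(self, rate=100.0, min_rate=10.0, max_rate=1000.0,
                 alpha=0.125, slow_factor=2.0, step=10.0):
        self.rate = rate                # current LSP transmit rate (LSPs/s)
        self.min_rate = min_rate        # never go below this rate
        self.max_rate = max_rate        # sanity upper bound on the rate
        self.alpha = alpha              # EWMA weight for the "usual" ack delay
        self.slow_factor = slow_factor  # delay > slow_factor * usual => "too long"
        self.step = step                # additive increase per timely ack
        self.avg_delay = None           # smoothed ack delay in seconds

    def on_ack(self, delay):
        """Adjust the rate given the send-to-PSNP-ack delay (seconds) of one LSP."""
        if self.avg_delay is None:      # first sample only seeds the average
            self.avg_delay = delay
            return self.rate
        if delay > self.slow_factor * self.avg_delay:
            # Ack much slower than usual: treat as congestion and halve the rate.
            self.rate = max(self.min_rate, self.rate / 2)
        else:
            # Ack arrived in the usual time: probe upward gently.
            self.rate = min(self.max_rate, self.rate + self.step)
        # Fold this sample into the "usual" delay after the comparison.
        self.avg_delay = (1 - self.alpha) * self.avg_delay + self.alpha * delay
        return self.rate


ctl = AckDelayController()
for d in (0.02, 0.02, 0.02):   # three acks at the usual 20 ms
    ctl.on_ack(d)
print(ctl.rate)                # 120.0
ctl.on_ack(0.2)                # one ack takes 200 ms
print(ctl.rate)                # 60.0
```

Comparing each ack against a smoothed history, rather than a fixed timeout, is what lets a sender like this follow a receiver whose normal speed is not known in advance.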
          Tony Li:   Consider this as a control loop problem. The point is that
                     with feedback, we can do better. Les's slides show that it
                     takes some time for the transmitter to react. That's
                     something to improve.
          Chris H:   I agree with Tony, it's a good starting point. Some people
                     are interested in not adding too much info, rate-limiting,
                     etc. I don't think we should be so averse to it.
          Bruno:     The window is static, but it's not the rate we're going to
                     achieve. The rate is determined by how fast the receiver can
                     process and ack. The window is static but the rate is dynamic
                     based on how fast the sender and the receiver can process.
          Chris H:   How is the TxMax value in Les's slides related to RWin?
          Les:       The answer is yes. But the Tx-based algorithm is built to
                     adapt. RWin is based on a picked number, and it's not going
                     to change or adapt. With RWin, the receiver says "I can
                     receive 10 LSPs" (the magic number), and then I have to
                     pause. In both cases, the testing done so far is artificial
                     because we only run ISIS on a limited topology, with no FIB
                     updates, etc. In the real world, there will be data traffic,
                     etc. The capability of the receiver is not going to be
                     static.
          Chris H:   Did you simulate this?
          Les:       Yes, in a simplified way, to demonstrate that the algorithm
                     does adapt. It doesn't adapt to a slowdown with zero
                     retransmissions, but it does adapt, and I think this is an
                     important aspect. In the real world, we will be doing a lot
                     more than just ISIS.
          Guillaume: I agree you need congestion control. The value you advertise
                     is not magic: it's the space where LSPs are stored before
                     they get processed, and I don't think it's too much to ask.
                     You know the size of the buffer where LSPs get stored before
                     processing. I agree that RWin can become a bottleneck,
                     especially with large RTTs. In the example I sent to the
                     list, with a burst of 10 LSPs and an RTT of 10ms, you're
                     able to reach 1000 LSPs per second. So if you have a large
                     buffer, you will likely never reach this bottleneck. So you
                     actually need a congestion control algorithm. I agree with
                     this.
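Guillaume's numbers follow from a window-limited sender. If at most RWin LSPs may be unacknowledged and each LSP is acknowledged roughly one RTT after it is sent, throughput cannot exceed RWin / RTT. The following Python sketch computes that bound. The function name is hypothetical, and the sketch assumes that receiver processing time is folded into the RTT.

```python
def window_limited_rate(rwin_lsps: int, rtt_ms: float) -> float:
    """Upper bound on LSPs/s when at most rwin_lsps LSPs may be outstanding
    and each one is acknowledged about one RTT (rtt_ms) after it is sent."""
    return rwin_lsps * 1000.0 / rtt_ms


print(window_limited_rate(10, 10))    # 1000.0: the 10-LSP, 10 ms example
print(window_limited_rate(10, 100))   # 100.0: same window, 10x the RTT
print(window_limited_rate(100, 100))  # 1000.0: a larger buffer offsets the RTT
```

The bound grows linearly with the window and shrinks with the RTT. This is why a large RTT can make RWin the bottleneck, while a large receive buffer pushes the bound out of reach.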
          Tony Li:   Although RWin is static, let's not set it in stone. We don't
                     want to pick A or B right now. The point we're trying to make
                     is that feedback is helpful. In a perfect world we could tell
                     the transmitter instantaneously what rx-max is at all times.
                     We can't do that, so then the question is how fast we can
          Bruno:     I agree with Les. We want the sender to be dynamic and
                     adaptive. If the receiver stops for 200ms, the sender will
                     stop for 200ms after one RTT without losing any LSPs. We can
                     adapt to different numbers of neighbors without losing LSPs.
                     In the CPU-bound case, as you can see in Les's slides, it
                     adapts but only after losing LSPs. So both adapt, but one
                     adapts faster and without losing LSPs.
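Bruno's claim is that a window-limited sender stops about one RTT after the receiver stops, and loses no LSPs. A toy Python simulation can check this. Everything in it is an illustrative assumption, not data from the meeting. The model uses 1 ms ticks and zero one-way delay to the receiver. A fixed ack delay stands in for the RTT. The receiver buffer is exactly RWin LSPs deep, and the receiver processes 2 LSPs/ms except during a 200 ms pause.

```python
from collections import defaultdict


def simulate(total_lsps=2000, rwin=10, ack_delay_ms=10, proc_per_ms=2,
             pause=(100, 300)):
    """Toy 1 ms-tick model of a window-limited sender (hypothetical parameters).

    Returns the number of LSPs sent in each ms and the deepest receiver queue.
    """
    acks_due = defaultdict(int)  # ms -> acks (PSNPs) arriving at the sender then
    unacked = queue = sent_total = max_queue = 0
    sends = []
    t = 0
    while sent_total < total_lsps or unacked:
        unacked -= acks_due.pop(t, 0)          # acks arriving this ms free the window
        if not (pause[0] <= t < pause[1]):     # receiver is not paused
            done = min(queue, proc_per_ms)
            queue -= done
            if done:
                acks_due[t + ack_delay_ms] += done
        n = min(rwin - unacked, total_lsps - sent_total)  # only fill the window
        unacked += n
        sent_total += n
        queue += n                             # LSPs reach the receiver immediately
        max_queue = max(max_queue, queue)
        sends.append(n)
        t += 1
    return sends, max_queue


sends, max_queue = simulate()
print(sum(sends[110:310]))  # 0: sender idle from ~1 RTT after the pause until acks resume
print(max_queue)            # 10: the queue never exceeds RWin, so nothing overflows
```

In this model the receiver queue can never exceed the window, because every queued LSP is also an unacknowledged one. That property is the source of the no-loss behavior, and it holds only if the receiver really has RWin LSPs of buffer.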
          Ketan:     RWin is like TxMax Rate, more or less. Whether it's static or
                     constant, I don't see it as a max rate configured on a
                     per-link basis. I see the challenge is that it should be a
                     dynamic value, not static. Even if static, I don't know how
                     it can be determined: socket, BGP, etc. There are additional
                     requirements needed to implement this (backward
                     compatibility, implementation assumptions, etc.), and they
                     should be documented.
          Les:       I agree with Tony Li. If we could adapt RWin dynamically, it
                     would be useful, but we don't know how to do it and it's
                     very difficult. Anything presented so far doesn't give a
                     and that's a significant issue. We need a practical
                     solution. Like Ketan just said, it's important to work with
                     nodes that are not optimized; the RWin-based proposal is
                     heavily dependent on optimized PSNP response time. I just
                     don't want to assume all routers are optimized.
          Chris H:   We haven't talked about where to derive the information; I
                     personally don't believe it's so hard. The first thing that
                     comes to mind is the line-card queue depth to the RP. We
                     need more
          Acee:      I agree feedback is good. These two proposals are using
                     different feedback: one is RWin and the other is taking the
                     actual behavior of the receiver as indicated by the acks
                     sent. There are a lot of differences in how you implement
                     it. The other thing is whether or not to have an interim for
          Chris H:   This is fruitful discussion and we may have an interim for it.
          Acee:      Let's take it to the list what we should do about
          Chris H:   It would be great if we could get an apples-to-apples
                     comparison or close to it. Let's take the discussion to the
                     list.
          3. IS-IS Flood Reflection
          Tony Przygienda   (5 mins)
          Acee:      I'd like to see more discussions on this draft on the list.
          Tony P:    I'll work with you on the code point stuff.
          Chris H:   Can you do without tunneling?
          Tony P:    It's possible, but you may not want to do it operationally.
                     I will add some clarifications.
          Les:       The code point has been renewed.
          4. Flexible Algorithms: Bandwidth, Delay, Metrics and Constraints
          Shraddha Hegde    (10 mins)
          Chris H:   We're not going to make a consensus call now. If you have a
                     contention, better take it to the WG since it's a WG doc.
                     Just chair
          Ketan:     Normally SR-TE is set up to the node, not to the prefix. I'm
                     not sure whether that's achieved through the generic metric
                     on the link. That applies to RSVP-TE as well. If a generic
                     metric is needed at the prefix level, we can add it later.
          Shraddha:  Maybe I was not clear, I will clarify on the list.
          Acee:      To reiterate what I said on the list: we spent all this time
                     on ASLAs, and now in an existing WG doc that does something
                     everybody agreed we needed, we introduce this generic
                     metric, which is not compatible. It's ambiguous: rather than
                     using application-specific metrics, you have different types
                     for different applications. Maybe we should use a metric for
                     these bandwidth constraints and move the generic metric to a
                     separate proposal. For example, you said you could put it in
                     extended link attributes or the TE LSA; we would have to go
                     back to correlating LSAs for flex-algo, and that would be a
                     disaster.
          Shraddha:  I don't understand your concern. Are you saying we should not
                     have it in the TE LSA? What's the point? There is no proposal
                     for flex-algo to use it from the TE LSA.
          Tony Li:   There's no end run going on. In an earlier draft, we had it as
                     a bandwidth metric, with definitions of how the bandwidth
                     metric should be defined; later we concluded that it's purely
                     a local definition. There's no reason to mandate that an
                     operator use a particular algorithm, so it makes more sense
                     to make it generic.
          Chris H:   In the draft, where you specifically talk about cases, are
                     you moving those to use cases?
          Tony Li:   You could use generic metric for bandwidth.
          Chris:     Maybe we didn't talk about it much in the draft.
          Ron:       We didn't violate the words written in RFC8919, maybe just the
                     authors' intent.
          Chris:     If an erratum is needed, we can file one.
          Ron:       It's an update to the document. The community reviewed the
                     text on the page, not the author's mind.
          John:      If you open an erratum, I'll look at the consensus and how
                     it was written down. If it got written down wrong, then it's
                     an erratum; otherwise the erratum doesn't get confirmed.
          Chris H:   Thanks. Let's go with the low-hanging fruit, and go for the
                     heavier items if we have to.
          5. IS-IS and OSPF Extension for Event Notification
          Peter Psenak      (15 mins)
          Acee:      It's like a best-effort delivery.
          Peter:     Yes. There is reliability, but it's limited.
          Acee:      It's based on when you got the component of the summary.
                     There are two parts, the mechanism and the events triggered.
          Peter:     It can be any application.
          Huaimo:    It defines generic procedures and encodings for distributing
                     events. Ten years ago, I had a draft using traditional way.
                     This way is much better.
          Aijun:     This is another approach for the scenario in the PUA draft,
                     xxxxx (voice broken, will send it to the list).
          Chris H:   Interesting new work, let's discuss more on the list. Is this
                     related to the next presentation?
          Peter:     Yes. One of the use cases is related to prefix
                     unreachability, but we use a completely different mechanism.
                     And this defines a generic mechanism.
          6. Updates for PUA and Passive Interface Attributes 
          Gyan Mishra/Aijun Wang    (10 mins)
          Acee:      I don't think we need this. We have links topologically
                     significant to the IGPs, and we have prefixes for local
                     addresses. You can use this stub link to carry info for
                     applications, but the IGP doesn't need it. Inventing a new
                     construct to do it, and advertising the prefix separately
                     from the prefix used for route computation, is what I think
                     is wrong. I know we disagree on it, but that's my comment.
          Chris H:   Is this a WG doc?
          Acee:      They requested WG adoption.
          Chris H:   Let's have more discussions on the list.
          Aijun:     We had it in the prefix TLV, but after discussion on the list
                     we changed it to a stub link. We will discuss more.
          Acee:      Once you add an address to the stub link, you're advertising
                     it two different ways; that's a good indication this is not
                     the right way to encode it.
          7. Meeting Closure
          Chairs (5 mins)
          Chris H:   Discussions are good. We will look into an interim. Thanks
                     for participating, see you next time.
          From the Chat:
          Ketan Talaulikar
          @ Acee the authors remove the reference to OSPFv3 SRv6 from the flex-algo;
          it will be covered in the OSPFv3 SRv6 draft.
          s/remove/removed ... this was done in draft-ietf-lsr-flex-algo
          Bruno Decraene
          Actually, Guillaume will present
          Tony Przygienda
just as clarification, the ack delay is not even necessary, it's just an
optimization to prevent the algorithm from backing off too quickly on large
ACK delays on the Rx
my observation to Les was actually that it's even simpler to watch the
outstanding LSPs in the LSP-RETX queue and just start to back off when
it starts to reach a certain % of the max rate (equivalent to not enough
acks really for whatever reason).
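The following hypothetical Python sketch shows one way to read Tony P's suggestion, combined with the hysteresis he mentions later in the chat. The sender watches how full the LSP retransmission queue is. It starts backing off once the number of outstanding LSPs crosses a high-water percentage, and it speeds up again only after the queue drains below a low-water percentage. The class name, the queue limit, the 80%/30% thresholds, and the halving/1.25x rate steps are illustrative assumptions, not values from the meeting.

```python
class RetxQueueBackoff:
    """Hypothetical Tx-rate controller driven by LSP retransmit-queue fill,
    with hysteresis between a high-water and a low-water mark."""

    def __init__(self, max_rate=1000.0, min_rate=50.0, queue_limit=100,
                 high=0.8, low=0.3, decrease=0.5, increase=1.25):
        self.max_rate = max_rate         # "sanity upper bound" (MaxTX)
        self.min_rate = min_rate         # floor so flooding never stops entirely
        self.rate = max_rate             # current LSP transmit rate (LSPs/s)
        self.queue_limit = queue_limit   # outstanding-LSP count treated as 100%
        self.high, self.low = high, low  # hysteresis band, fractions of the limit
        self.decrease, self.increase = decrease, increase
        self.backing_off = False

    def update(self, outstanding):
        """Adjust the rate from the number of sent-but-unacked LSPs."""
        fill = outstanding / self.queue_limit
        if fill >= self.high:
            self.backing_off = True      # queue building up: start backing off
        elif fill <= self.low:
            self.backing_off = False     # queue drained: allow speeding up again
        # Between low and high the previous state is kept (hysteresis).
        if self.backing_off:
            self.rate = max(self.min_rate, self.rate * self.decrease)
        else:
            self.rate = min(self.max_rate, self.rate * self.increase)
        return self.rate


ctl = RetxQueueBackoff()
rates = [ctl.update(q) for q in (10, 85, 60, 40, 20, 20)]
print(rates)  # [1000.0, 500.0, 250.0, 125.0, 156.25, 195.3125]
```

Outstanding counts of 60 and 40 fall inside the band, so the controller keeps backing off until the queue drains to 20. The gap between the two marks keeps the rate from oscillating on every small change in queue depth.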
          Jeffrey Haas
          I'm waiting to see if there will be a case simulating some % of packet
          loss and its behavior
          Tony Przygienda
yeah, lots of graphs coming
ah, you mean loss on purpose? Les didn't do it but from experience it
doesn't matter
          Jeffrey Haas
          I have different experience, but that's mostly in tcp timers.
          Tony Przygienda
overload/loss/slow are all the same to the algorithm. it just builds up the LSP
retx queue or gets not enough acks, and hysteresis backs off
          yeah, that's why we should NOT use TCP here ;-)
          TCP collapses very quickly on losses
          Jeffrey Haas
          yep. but it does give you a strong sense how various re-xmit algos play
          Guillaume Solignac
There are TCP algorithms that work on bandwidth monitoring as well
          (Google's BBR)
          Tony Przygienda
@Guillaume, correct, all the newer work
ultimately the problem is TCP is ordered and link state flooding is not,
hence you don't have to back off since you don't get a big mbuf buildup
Graph _still_ incorrect, those are LSPTxRate!
          Jeffrey Haas
          A case of cadence matters.
          Tony Przygienda
          as side note: even with a socket per peer in ISIS lots of platforms have
          single queue bottlenecks in the whole chain from port to user space
          Les Ginsberg
          It is not possible for IS-IS to know (at either end) the difference
          between loss and delay. This is because the state of the queues/punt
          path from dataplane to IS-IS in control plane is not known by IS-IS -
          and is difficult to know. And because there is no ordering of LSPs - so
          receiving an update from Node A tells you nothing about whether you should
          have received an update from Node B. (The TCP analogy is not a good fit)
          Tony Przygienda
yeah, we can go for quite a bit really. on lots of platforms buffer space
is shared amongst sockets as well
          Bruno Decraene
          @TonyP that's not a problem. RWIN is used to control rate between
          application (IS-IS). For limitations on the path, a congestion control
algo (roughly similar in both drafts) is used.
          So RWIN is used in addition (not in replacement)
          Tony Przygienda
          2000 on all VLANs or just one?
          looks like all VLANs. this is a very low rate to saturate even a small
          CPU by well implemented flooding IME. I'm surprised
          Bruno Decraene
total (200LSP/s per neighbor)
          Tony Przygienda
          IME MaxTX is superfluous. The hysteresis will back off nicely
          it's more of a "sanity upper bound" so the thing doesn't run away to
          eat all CPU/I/O possibly ;-)
          Guillaume Solignac
          You have to take care of TCP fairness though, your algorithm could get
          crushed by BGP
          Tony Li
          Take a look at Les' graphs and look at the latency to adapt.
          Tony Przygienda
@Guillaume, that's solved differently on a good box
          Tony Li
          What would happen if the receiver could signal RXmax?
          Guillaume Solignac
          Does it even exist ?
          Tony Przygienda
          @Tony: yepp, of course. if you signal every 50 msecs that will be always
          faster than waiting for backpressure by loss/queue overbuild ;-)
          it could be even faster if you entangle the RX and TX ;-)
          OK, tired of waiting for the meandering mike so I keep it very clip &
1. fixed window ain't gonna cut it and you won't be able to compute it
since every platform has so many variables you never get it right. And
the load of the system changes on top
2. you can't signal that fast reliably with lots of peers, @ scale. Assuming
very short, precise timers in user space on real systems is just that, an
assumption. You can have very few or you can have them in the kernel;
in user space, timer slips in 100s of msecs are normal fare.
          Jeffrey Haas
          lack of resources in whatever flavor will manifest as drop.
          Guillaume Solignac
          @Tony the real signal is not in the TLV, it is the PSNPs
          Fix window allows to pace the sender to the PSNPs
          The PSNPs rate is a dynamic signal that you use as well, but you have
          one information less
          So you have less guarantees
          Tony Przygienda
          yes, that's a reasonable way to see it, you are free to send less ACKs
          to backpressure. It's a "poor man's window" if you want. It kind of
          happens naturally when ISIS gets busy and doesn't get to push PSNPs
          (modulo parallel implementation which is a different kettle of pisces ;-)
          Tony Li
          You could estimate your own RXmax.
How many did you process in the last 1s?
          Jeff Tantsura
          great stuff!
          Robert Raszuk
          @Tony Doesn't it also depend on how much you got ?
          Tony Li
          No, it doesn't need to.
          If you managed to process 100 in the last second, you say that.
          Transmitter can infer that you dropped a million. :-)
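The following Python sketch illustrates Tony Li's suggestion that a receiver estimate its own RXmax by counting how many LSPs it actually processed in the last second. The sliding-window counter is an assumption about how that count might be kept, and the class and method names are hypothetical. The sketch does not model how, or whether, the value would be signaled to the transmitter.

```python
from collections import deque


class RxRateEstimator:
    """Hypothetical receiver-side RXmax estimate: LSPs processed in the last
    window_s seconds, expressed as LSPs per second."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.times = deque()  # processing timestamps (seconds), oldest first

    def lsp_processed(self, now):
        """Record that one LSP finished processing at time `now`."""
        self.times.append(now)
        self._expire(now)

    def rx_estimate(self, now):
        """LSPs/s actually processed over the trailing window ending at `now`."""
        self._expire(now)
        return len(self.times) / self.window_s

    def _expire(self, now):
        # Drop samples older than the trailing window.
        while self.times and self.times[0] < now - self.window_s:
            self.times.popleft()


est = RxRateEstimator()
for i in range(100):            # 100 LSPs processed at 10 ms intervals
    est.lsp_processed(i / 100)
print(est.rx_estimate(1.0))     # 100.0
print(est.rx_estimate(1.5))     # 50.0: only the LSPs from t >= 0.5 s remain
```

As Tony Li notes, the figure says nothing about how many LSPs arrived. The transmitter would compare it with what it sent over the same interval to infer how many the receiver did not get to.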
          Robert Raszuk
          Oh in that direction ... sure
          I was looking at the max in maximum point
          Tony Li
          The point is that we need feedback. As we learn, we can be more
          sophisticated about the feedback.
          And yes, the feedback is optional. We do have to work with legacy.
          John Scudder
          I missed precisely what it was Acee was calling an "end run"?
          Bruno Decraene
          @Les in slide 8 you have loss of LSPs when adapting/slowing down. Do you
          think that you could add a test in your implementation? if UnAcknowledged
          LSP > 40, pause sending LSP. And see if this can reduce or eliminate
          your loss of LSPs?
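Bruno's proposed test can be expressed as a small gate on a rate-paced sender. The sender keeps pacing at its current transmit rate but pauses whenever more than 40 LSPs are unacknowledged. The following Python sketch is hypothetical. Only the threshold of 40 comes from the chat; the function name, the millisecond clock, and the pacing check are assumptions. Les reports in the chat that a similar strategy performed worse in their testing.

```python
def may_send(now_ms: float, last_send_ms: float, tx_rate: float,
             unacked: int, max_unacked: int = 40) -> bool:
    """Hypothetical send gate: pace at tx_rate LSPs/s, but pause while more
    than max_unacked LSPs are waiting for an acknowledgment."""
    if unacked > max_unacked:
        return False                                  # too many outstanding LSPs
    return now_ms - last_send_ms >= 1000.0 / tx_rate  # respect the pacing interval


print(may_send(10, 0, 100, unacked=5))    # True: 10 ms since the last send at 100 LSPs/s
print(may_send(10, 0, 100, unacked=41))   # False: over the unacked threshold
print(may_send(5, 0, 100, unacked=5))     # False: pacing interval not yet elapsed
```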
          John Scudder
          I mean, I can and will go back and listen to the replay to try to
          understand it, but I'd appreciate clarification from @Acee.
          Robert Raszuk
          Well Pulse would trigger BGP route calculation (best path run) - how
          do you communicate Pulses from IGP to BGP ? via RIB ? how if it only
          contains IGP summary ...
          Also is there no worry about Pulse based DDoS to poor nodes when we have
          massive failures ? I assume you are not planning on summarizing Pulses ?
          Les Ginsberg
          @Bruno - we tried several different strategies - one of them was what
          you suggest. Performance was not as good.
          Tony Przygienda
          @Bruno. roughly what I said as in "don't count ACKs, enough to look @
your outstanding queue". However, from direct implementation experience,
you want to look @ % of queue as your flooding speed rather than a
          constant number. Yes, if RX could somehow signal his window, that could
          be taken into account but again, there is no timestamp on anything (or
          for that matter distributed time), you cannot tell losses from delays
          or no sending etc. Especially assuming the very small timescales given
          the burstiness of load on real world systems.
          Bruno Decraene
@Les performance is limited by size of RWIN and RTT (same as with TCP). So
probably, the PSNPs were not sent fast enough. How fast do you send PSNPs?
          Les Ginsberg
          @Bruno - we have tested w a variety of PSNP times - but all the data
          shared today we were acking within 50 ms.
          Bruno Decraene
          @Les ok. With a RWIN of 40, that should give 800LSP/s per neighbour. Below
          what you achieve with a single neighbor. Probably better starting with
          3 neighbours.
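Bruno's figure applies the window/RTT bound per neighbor. With RWin = 40 and acks returned within 50 ms, one adjacency is capped at 40 / 0.05 s = 800 LSPs/s. Because each neighbor has its own window, the aggregate cap grows with the number of neighbors. The following Python sketch checks that arithmetic. It assumes that the 50 ms ack time dominates the RTT, so link delay is ignored. It also assumes that neighbors do not share a bottleneck, which Tony P notes is often not true on real platforms.

```python
def per_neighbor_cap(rwin: int, ack_ms: float) -> float:
    """LSPs/s one adjacency can reach when at most rwin LSPs are unacked
    and each is acked within ack_ms (link delay ignored)."""
    return rwin * 1000.0 / ack_ms


cap = per_neighbor_cap(40, 50)
print(cap)                          # 800.0
print({n: n * cap for n in (1, 3)}) # {1: 800.0, 3: 2400.0}
```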
          Tony Li
          IGP is not a dump truck.
          Jeff Tantsura
          we have got BGP for that...
