Archive for 2009

First they laugh at you…

Friday, November 20th, 2009

I found this article in “Network Computing” pretty interesting, although not exactly for the content.   Just the framing of the whole article, with Microsoft is touting the fact that they’ve managed to achieve performance parity with Linux on some HPC benchmarks as an achievement (and putting up a graph that shows they are still at least a few percent behind), shows how dominant Linux is in HPC.  Also, the article says:

The beta also reportedly includes optimizations for new processors and can deploy and manage up to 1,000 nodes.

So in other words Microsoft is stuck at the low end of the HPC market, only usable on small clusters.

Poor choice of words

Tuesday, August 11th, 2009

I just got a marketing email from drugstore.com with the subject “Diaper blowout: save on Pampers, Huggies, Seventh Generation and more.”  If you’re not a parent and you don’t know why that’s unintentionally hilarious, just do a web search for “diaper blowout.”  Suffice it to say that when placed next to the word “diaper,” the word “blowout” does not usually connote slashed prices.

Lazyweb: best Verizon data card?

Tuesday, June 9th, 2009

I currently have Verizon mobile data service with a Kyocera PC card, and it works well with recent distros using NetworkManager.  However, my venerable laptop is being replaced with a Lenovo X200, which has no PC card slot, so I’ll have to replace my Verizon data card as well.  According to the Verizon Wireless web site, my choices seem to be the NovaTel V740 for ExpressCard, or for USB the UTStarcom UM175 or the Novatel USB760.

My question for the lazyweb is: which data card/EV-DO modem should I get (assume that I’ll be running Linux 99.9% of the time when I use it)?  The ExpressCard is substantially more expensive and less flexible (since I may want to use this card on a system without an ExpressCard slot someday), so I’d probably go with one of the USB cards if it’s left up to me.  The USB760 doubles as a micro SD reader, which is not useful to me, and confounds things with a mass storage interface that probably just causes confusion, so my first choice would be the UM175 probably.  However if someone with first-hand knowledge knows why that’s a bad decision, I’d love to hear about it in the comments.

(And I put a very high value in not having to boot into Windows periodically to update cell tower locations or anything like that, for what it’s worth)

RDMA on Converged Ethernet

Wednesday, March 25th, 2009

I recently read Andy Grover’s post about converged fabrics, and since I particupated in the OpenFabrics panel in Sonoma that he alluded to, I thought it might be worth sharing my (somewhat different) thoughts.

The question that Andy is dealing with is how to run RDMA on “Converged Ethernet.” I’ve already explained what RDMA is, so I won’t go into that here, but it’s probably worth talking about Ethernet, since I think the latest developments are not that familiar to many people.  The IEEE has been developing a few standards they collectively refer to as “Data Center Bridging” (DCB) and that are also sometimes referred to as “Converged Enhanced Ethernet” (CEE).  This refers to high speed Ethernet (currently 10 Gb/sec, with a clear path to 40 Gb/sec and 100 Gb/sec), plus new features.  The main new features are:

  • Priority-Based Flow Control (802.1Qbb), sometimes called “per-priority pause”
  • Enhanced Transmission Selection (802.1Qaz)
  • Congestion Notification (802.1Qau)

The first two features let an Ethernet link be split into multiple “virtual links” that operate pretty independently — bandwidth can be reserved for a given virtual link so that it can’t be starved, and by having per-virtual-link flow control, we can make sure certain traffic classes don’t overrun their buffers and avoid dropping packets.  Then congestion notification means that we can tell senders to slow down to avoid congestion spreading caused by that flow control.

The main use case that DCB was developed for was Fibre Channel over Ethernet (FCoE).  FC requires a very reliable network — it simply doesn’t work if packets are dropped because of congestion — and so DCB provides the ability to segregate FCoE traffic onto a “no drop” virtual link.  However, I think Andy misjudges the real motivation for FCoE; the TCP/IP overhead of iSCSI was not really an issue (and indeed there are many people running iSCSI with very high performance on 10 Gb/sec Ethernet).

The real motivation for FCoE is to give a way for users to continue using all the FC storage they already have, while not requiring every server that wants to talk to the storage to have both a NIC and an FC HBA.  With a gateway that’s easy to build an scale, legacy FC storage can be connected to an FCoE fabric, and now servers with a “converged network adapter” that functions as both an Ethernet NIC and an FCoE HBA can talk to network and storage over one (Ethernet) wire.

Now, of course for servers that want to do RDMA, it makes sense that they want a triple-threat converged adapter that does Ethernet NIC, FCoE HBA, and RDMA.  The way that people are running RDMA over Ethernet today is via iWARP, which runs an RDMA protocol layered on top of TCP.  The idea that Andy and several other people in Sonoma are pushing is to do something analogous to FCoE instead, that is, take the InfiniBand transport layer and stick it into Ethernet somehow.  I see a number of problems with this idea.

First, one of the big reasons given for wanting to use InfiniBand on Ethernet instead of iWARP is that it’s the fastest path forward.  The argument is, “we just scribble down a spec, and everyone can ship it easily.”  That ignores the fact that iWARP adapters are already shipping from multiple vendors (although, to be fair, none with support for the proposed IEEE DCB standards yet; but DCB support should soon be ubiquitous in all 10 gigE NICs, iWARP and non-iWARP alike).  And the idea that an IBoE spec is going to be quick or easy to write flies in the face of the experience with FCoE; FCoE sounded dead simple in theory (just stick an Ethernet header on FC frames, what more could there be?) it turns out that the standards work has taken at least 3 years, and a final spec is still not done.  I believe that IBoE would be more complicated to specify, and fewer resources are available for the job, so a realistic view is that a true standard is very far away.

Andy points at a TOE page to say why running TCP on an iWARP NIC sucks.  But when I look at that page, pretty much all the issues are still there with running the IB transport on a NIC.  Just to take the first few on that page (without quibbling about the fact that many of the issues are just wrong even about TCP offload):

  • Security updates: yup, still there for IB
  • Point-in-time solution: yup, same for IB
  • Different network behavior: a hundred times worse if you’re running IB instead of TCP
  • Performance: yup
  • Hardware-specific limits: yup

And so on…

Certainly, given infinite resources, one could design an RDMA protocol that was cleaner than iWARP and took advantage of all the spiffy DCB features.  But worse is better and iWARP mostly works well right now; fixing the worst warts of iWARP has a much better chance of success than trying to shoehorn IB onto Ethernet and ending up with a whole bunch of unforseen problems to solve.

Know anyone at Coverity?

Thursday, February 19th, 2009

The recent mention of scan.coverity.com at lwn.net reminded me that the Coverity results for the kernel (what they call “linux-2.6″) have become pretty useless lately.  The number of “results” that their checker produce jumped by a factor of 10 a month or so ago, with all of the new results apparently warning about nonsensical things.  For example, CID 8429 is a warning about a resource leak, where the code is:

      req = kzalloc(sizeof *req, GFP_KERNEL);
      if (!req)
              return -ENOMEM;

and the checker thinks that req can be leaked here if we hit the return statement.

The reason for this seems to be that the checker is run with all config options enabled (which is sensible to get maximum code coverage), and in particular it seems to be because the config variable CONFIG_PROFILE_ALL_BRANCHES is enabled, which leads to a complex C macro redefininition of if() that fatally confuses the scanner.

I’ve sent email to scan-admin about this but not gotten any reply (or had any effect on the scan). So I’m appealing to the lazyweb to find someone at Coverity who can fix this and make the scanner useful for the kernel again; having nine-tenths or more of the results be false positives makes it really hard to use the current scans. What needs to be done to fix this is simple to make sure CONFIG_PROFILE_ALL_BRANCHES is not set; in fact it may be a good idea to set CONFIG_TRACE_BRANCH_PROFILING to n as well, since enabling that option causes all if statements annotated with likely() or unlikely to be obfuscated by a complex macro, which will probably lead to a similar level of false positives.

Update: Dave Jones got me in touch with David Maxwell at Coverity, and he updated the kernel config so that we don’t get all the spurious results any more.  Thanks guys!