8.5.1 DAOS in the Replicator benchmarks, a slight disappointment
It does this by allowing the two servers taking part in replication to communicate whether a given NLO attachment already exists in the target server's DAOS repository. If the NLO already exists in the target server's DAOS repository, that attachment is not sent. This means only "net new" attachments (ones for which no NLO exists on the target) are sent to a target server during replication. Many users and mail files share the same attachments, which is why we see such amazing disk storage savings with DAOS. With this change, the many, many copies of those attachments only ever replicate once between two discrete servers.
Think of it this way:
Source server : I am about to send you DAOS attachment abc.nlo (size 2.4MB)
Target server : I don't have this, send it over
Source server : Sending 2.4MB
Or
Source server : I am about to send you DAOS attachment abc.nlo (size 2.4MB)
Target server : I already have this NLO file. Do not send
Source server : Not sending 2.4MB
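The exchange above can be modeled as a toy sketch. This is not Domino's actual replicator code or wire protocol. `TargetServer`, `needs()`, `receive()` and `replicate_attachments()` are hypothetical names. NLOs are identified here simply by name, and sizes are in bytes, purely for illustration. The model shows why the bytes sent depend only on the attachments the target does not already hold.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetServer:
    # Hypothetical stand-in for the target server's DAOS repository:
    # just the set of NLO names it already holds.
    daos_store: set[str] = field(default_factory=set)

    def needs(self, nlo_name: str) -> bool:
        """Answer the source's offer: True means "I don't have this, send it over"."""
        return nlo_name not in self.daos_store

    def receive(self, nlo_name: str) -> None:
        """Store a newly received NLO so later offers for it are declined."""
        self.daos_store.add(nlo_name)


def replicate_attachments(attachments: list[tuple[str, int]], target: TargetServer) -> int:
    """Offer each (nlo_name, size_bytes) to the target and send only net-new NLOs.

    Returns the total number of attachment bytes actually sent.
    """
    bytes_sent = 0
    for nlo_name, size in attachments:
        if target.needs(nlo_name):
            bytes_sent += size        # "Sending 2.4MB"
            target.receive(nlo_name)
        # else: "Not sending 2.4MB". Only the offer crosses the wire.
    return bytes_sent


if __name__ == "__main__":
    # Two hypothetical mail files sharing one 2.4MB attachment.
    mail_a = [("abc.nlo", 2_400_000), ("def.nlo", 500_000)]
    mail_b = [("abc.nlo", 2_400_000)]
    target = TargetServer()
    print(replicate_attachments(mail_a, target))  # 2900000
    print(replicate_attachments(mail_b, target))  # 0
```

Without the check, the two mail files would push 5,300,000 bytes of attachments. With it, mail file B costs nothing, because abc.nlo already arrived with mail file A. The more mail files that share attachments, the bigger that gap becomes.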
Even from this one simple example, you can see how powerful this will be once it scales to entire Domino servers. As the attachment count goes up, and users hit "Reply to All with History", the bandwidth savings can be phenomenal. "Enough talk", you say, "show me the money".
Disclaimer: as 8.5.1 is only in beta right now, these results could change significantly (or IBM could change the functionality) in the production release of ND 8.5.1.
Figure 1. Bandwidth used without DAOS enabled during replication
In figure 1, we move 9.2GB of mail and archive files as new replicas via AdminP to a new server. Without DAOS enabled, we see a total of 9.33GB of bandwidth used.
So what happens when we enable DAOS?
Figure 2. Bandwidth used with DAOS enabled during replication
In figure 2, we see no difference? Really? WTF? I was expecting to see a 40-60% decrease.
Note: the extra GB or so is just normal replication that ran after the initial build, so no, it really isn't larger.
This result threw me for a loop until I thought through the process. See, to get the mail files over to the server, we used AdminP to create replica stubs on the new target server. Then replication took place. My guess here is that DAOS only works once the replicas have been fully initialized. Figure 3 seems to confirm this: it shows only a handful of attachments (in this case 38) that were "optimized", and these were in fact "optimized" after the replicas were initialized. I was expecting to see 15,000+ here. Bummer. Maybe in 8.5.2 :)
Figure 3. DAOS "optimized" attachment count.
Bottom line: DAOS in the replicator works the way I described above (the 2.4MB example), except that the replicas have to be fully created and initialized first. The way I "thought" it would work, with replica stubs, would be very useful too.
Comments (20)
If the source database is DAOS-enabled and the destination server is DAOS-enabled, and you attempt an accelerated create replica, it should fall back to a normal replication so that you can take advantage of DAOS. Were the source database and the destination server DAOS-enabled when you created the replica?
Did the objects already exist in the DAOS store on the destination server when you created the new replica?
Was the server time of the two servers within 5 minutes of each other?
@all, so AdminP looks like the culprit. Maybe I'll try creating the stubs from the client and then letting server-to-server replication take over.
@2, I'm not sure about the router yet. Clients at 8.5.1 or later do have this capability; I'm just not sure about the server yet.
@3, yes. So that explains it. Sucks, but I guess that is the way it is.
@4, yes, kind of, and yes. By "kind of" I mean that we moved 10GB of mail, and there are some duplicate attachments across all of it. Again, AdminP is the issue.
Time to retest using client initiated stubs.
Both sides were DAOS-enabled. Now, maybe I'm just overthinking this, but....
I figured that if both are DAOS-enabled, then during initial replication the source server would not send any NLOs that already exist on the target. Given the bandwidth and DAOS stats above, this does not appear to be the case.
Both servers are 8.5.1 CD8.
Did you use adminp accelerated replication to create the replicas or did you create stubs via the client?
Did the same behavior occur when creating the stubs via the client and then letting replication happen?
Also, there were already databases on the destination server that contained objects in DAOS that were also replicated over during the initial replication of the new replicas?
@8 - 1, AdminP.
-2, I have not yet tried creating the stubs from the client.
-3, there were, but only from mail files replicated earlier in the process. What I'm trying to say is that mail file "A" was replicated and all of its attachments were added to DAOS, then mail file "B" was replicated and had some of the same attachments. However, both "A" and "B" were stubbed at the same time via AdminP. Hence my feeling that AdminP stubbing is the issue.
It will be a day or two before I can fully test some more.
So when mail file "B" was replicated over it didn't take advantage of any of the objects that had already been replicated over by mail file "A"?
Did the replication happen after mail file "A" was completely replicated or did they happen at the same time?
We've done some testing here and it seems to work even when the stubs are created by adminp.
No it didn't (at least based on the bandwidth utilization I saw).
"A" was completely replicated before "B" began.
I'm pretty sure the bandwidth (and the DAOS count of 38 items) is due to the "x-copy" type copying that takes place when AdminP stubs the files out.
So two things seem apparent:
1) AdminP stubbing of replicas will not give any wire bandwidth savings when initially creating replicas on other servers.
2) SCR (streaming cluster replication) does NOT save any bandwidth either.
What we've seen here is that if the source DB is DAOS-enabled and the destination server is DAOS-aware, the AdminP "x-copy" shouldn't happen; it should fall back to creating the stub through normal replication. Have you verified whether client-created stubs work differently?
SCR only saves the disk I/O on the destination side.
Over on his blog, Ulrich may have found the solution. See here: http://www.eknori.de/2009-09-04/daos-domino-8-5-1-and-replication/
@11, I will run another test once I'm back in the office (doing an 8.5 upgrade this weekend). In your lab tests, what does "sh daos stats" show? Does it show 0 after the initial replication and initialization have completed, or does it show the number of NLO objects that were not sent (i.e., where only the ticket was sent)? If you monitor the bandwidth until initialization completes successfully between the servers, do you see the dramatic decrease in bandwidth utilization I had expected? Thanks to Ulrich's blog post, I think the full text options may have croaked the accelerated replica request and forced it back to "non-accelerated". That could explain the differences between what you are seeing and what I posted above.
As this is all starting to make some sense, I'll do some more testing and do another post next week.
Also, thanks for the clarification on the cluster replicator. I had wondered about that since I first saw it on Ed's slides.
@16, provided your BES server supports 8.5.x, DAOS is not an issue for BES servers. Specifically, DAOS is transparent to all other Domino tasks, to the Notes client, and indeed to other Domino servers.
Now, what I haven't tested is BES installed on a server that is also the mail server hosting the DAOS-enabled mail files. This is not supported by RIM anyway. I think it should work OK, but test it first.
What versions of BES and Domino is the BES server running? I've never seen an issue on any BES with DAOS.
Had the same problem with attachments not showing up on the device. A reboot of the BES server did the trick.
Was adminp able to do an accelerated create replica on the new server? If so, there may be a confounding variable in your experiment. As I recall, accelerated create replica results in an OS-level xCopy for the initial replication job (which isn't able to use replication formulas and ACL restrictions, much less DAOS). You may want to manually create an empty replication stub on the remote server and see if the numbers work out better. If that works out better, we'll then need to see if we can get IBM to change the accelerated create replica behavior for DAOS enabled servers (since the xCopy thing was introduced in R6 to improve performance).