15:01:03 #startmeeting oVirt Release Go/No Go Meeting
15:01:03 Meeting started Mon Jan 30 15:01:03 2012 UTC. The chair is mburns. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:01:08 #chair oschreib
15:01:08 Current chairs: mburns oschreib
15:01:38 who is here?
15:01:45 108 people
15:01:59 * ykaul is here
15:02:05 * mgoldboi here
15:02:30 * sgordon is here
15:02:54 \me accidentally listening
15:03:24 anyone else?
15:03:27 * jumper45 here
15:04:38 #info attending: mburns, oschreib, sgordon, ykaul, jumper45
15:04:57 oschreib: nice of you
15:05:36 #info and mgoldboi
15:05:38 * danken is here!
15:05:50 * lhornyak pong
15:05:58 * pstehlik is spectator
15:06:11 oschreib: as long as people say something, they'll be noted in the minutes
15:06:18 good :)
15:06:23 lets start
15:06:28 so no need to keep doing infos for them
15:07:01 mburns: want to update us with the UEFI status?
15:07:54 oschreib: kernel bug is fixed and released in fedora
15:08:13 there is some apparent reluctance in kernel upstream to accept the patch, but that's a fight for another day
15:08:32 oschreib, can you #link that pre-release check list from the wiki?
15:08:40 i built ovirt-node yesterday with the patches and the new kernel
15:08:50 there is one new bug filed today that we're investigating
15:09:07 #link https://bugzilla.redhat.com/show_bug.cgi?id=785728
15:09:15 #chair sgordon
15:09:15 Current chairs: mburns oschreib sgordon
15:09:52 #link http://www.ovirt.org/wiki/Releases/First_Release
15:09:56 mburns: looks like a blocker to me
15:10:08 mburns: do we expect any more tests to be carried out on this image?
15:10:30 ykaul: it was on my plan for today before i saw this bz
15:10:46 if it is a blocker can we add it to the page?
15:10:54 #link http://www.ovirt.org/wiki/Releases/First_Release_Blockers
15:10:54 i'm hopeful this bug is a simple fix and i can get a new one out shortly
15:11:32 mburns: anyone else but you planning to test that image?
15:11:35 #link http://www.ovirt.org/wiki/First_release
15:12:00 ykaul: i'm going to wrap jboggs in as well, but no qe team afaik
15:13:03 OK, we started with the node (will cover vdsm and engine later)
15:13:19 are we OK with the current status of the node for tomorrow?
15:14:05 oschreib: we'll need to figure out this bug scale - install fails is a big nono from my side
15:14:38 right, we need triage on this bug before i can give ack on release
15:14:39 mburns: thoughts?
15:14:54 if it's small subset, i'd be willing to say go, but i just don't know at this point
15:15:26 mburns: when will we have an answer for that? I'm not sure we can wait much longer.
15:16:14 oschreib: i hope within about an hour i'll have an answer on whether it is a blocker or not
15:18:44 ykaul: mgoldboi: how do you feel about releasing ovirt-node in the current status if this bug is not a blocker?
15:19:17 oschreib: we are not testing ovirt-node {enough|at all} to comment.
15:21:12 oschreib: UEFI seems to still be an issue - it's going to be hurtful from our past experience
15:21:53 mgoldboi: but mburns says it was fixed...
15:22:19 oschreib: the new bug points to efi or at least related to the uefi patches
15:23:18 oschreib: i should know more in an hour or 2 about it
15:23:23 if the bug is UEFI related, then I think we proceed with the release and just say "oVirt Node does not work with UEFI" and we can reference the fact that upstream kernel is still not quite sure how to handle this
15:23:48 (i.e. latest patch that is in Fedora to handle dmm issue might not get in upstream is our current understanding)
15:24:06 and that leads me to the next question: are we ok to release oVirt without the support for UEFI?
15:24:33 oschreib: i'd vote to go ahead without UEFI
15:24:53 Agreed with mburns
15:25:02 Fedora doesn't even yet properly support UEFI in Grub2
15:25:03 just call it out in bold italicized underlined font in the release notes that it doesn't work with UEFI
15:25:25 sgordon: can you handle that?
15:25:29 yeah
15:25:33 cool
15:25:33 i will be writing them up today
15:25:36 oschreib: releasing an interim ovirt-node with uefi support should be possible too
15:25:39 on the assumption we are going ahead
15:25:57 s/interim/async
15:26:18 ykaul: mgoldboi - I remember you both didn't like the idea of releasing oVirt Node without UEFI support.
15:26:20 it shouldn't impact any other ovirt subproject
15:26:33 mburns: I agree
15:27:16 oschreib: I still don't like the idea - I know several people on the list complained about it. My input is to not release it without it, due to the lousy user experience it provides out of the box.
15:28:04 ykaul: you heard the guy, Fedora doesn't support that. do you want us to wait until they will?
15:28:16 my question is which is worse: release without uefi? or don't release?
15:28:27 * mburns thinks releasing is the more important thing at this point
15:29:23 oschreib: I suspect we have enough other blocking issues, but no, I would not release without UEFI, based on the number of people who bumped into it AND complained on the mailing list - I'm sure there were more who just ditched it after failing to install.
15:30:21 oschreib: let's understand the scale of the bug first - we don't have a clue on what's going on there - be smarter a bit later and let's continue
15:30:36 mgoldboi: we can't wait much longer.
15:31:00 mgoldboi: for how long will we wait?
we must agree whether we are releasing tomorrow or not
15:31:12 yeah i think we need to decide
15:31:26 even if that is to push it back and re-assess tomorrow or something
15:31:47 ok, I think we'd better move on to talking about vdsm
15:31:49 * mburns would agree to that, let's push until wednesday
15:32:21 #topic VDSM
15:32:57 #link https://bugzilla.redhat.com/show_bug.cgi?id=773371
15:33:00 do we have any blockers? (or suggested blockers?)
15:33:32 basic installation flow - will cause a problem with spice on tls
15:34:19 danken: your thoughts on the above BZ would be much appreciated
15:35:38 I know why it happens
15:35:50 * lpeer is here
15:35:51 because the code assumes sysv at one point
15:36:00 but has systemd instead
15:36:06 welcome lpeer :)
15:36:13 which does not support vdsm's "reconfigure".
15:36:14 oschreib: sorry for being late
15:36:35 In my opinion it is not a blocker, I'd rather add a release note.
15:36:38 lpeer: we're not talking about the engine-core yet, in few minutes
15:36:50 is there an easy workaround?
15:37:01 danken: what would be the RN? don't install vdsm from command line?
15:37:10 danken: note that Spice is our only remote console.
15:37:33 run `/lib/systemd/systemd-vdsmd reconfigure` if your vdsm is ill-configured
15:37:49 #link https://bugzilla.redhat.com/show_bug.cgi?id=785557 - bridge configured on a NM_CONTROLLED device wouldn't work after reboot - another nasty paper cut.
15:38:00 sorry for jumping in...
15:38:15 let's finish with the first bug
15:38:17 #chair
15:38:17 Current chairs: mburns oschreib sgordon
15:38:28 mburns: can mgoldboi add links?
15:38:38 #chair mgoldboi
15:38:38 Current chairs: mburns mgoldboi oschreib sgordon
15:38:45 mgoldboi: do you think we can ship with this release note?
15:38:47 #link https://bugzilla.redhat.com/show_bug.cgi?id=785557 - bridge configured on a NM_CONTROLLED device wouldn't work after reboot - another nasty paper cut.
15:38:48 mgoldboi: please add the links again
15:38:54 or not :)
15:39:00 #link https://bugzilla.redhat.com/show_bug.cgi?id=773371
15:39:12 * mburns doesn't remember if the links get added
15:39:31 i think you can #action yourself if you're not a chair, but not sure about links
15:40:11 danken: I really wonder whether your workaround is acceptable. it's easy, but might hurt lots of people
15:40:20 danken: if it's a simple fix i would like a fix - it's another bad user experience out of the box
15:40:52 mgoldboi: it is hard to fix this nicely for both sysv and systemd
15:41:13 without our beloved vdsm-tool (still in the works^Wdreams)
15:41:50 danken: and if we run reconfigure as part of the vdsm bootstrap?
15:42:22 we do that - but via sysv...
15:42:25 mgoldboi: I don't think the code fix is in the scope here. we should decide if it's a blocker or not.
15:42:50 danken: can't you put a condition around it?
15:43:06 if rpm -q systemd; do systemd; else sysv
15:43:34 might not be good for non-fedora distros, but something like that should work...
15:43:52 mburns: of course we can, but I simply do not think it is worth it. We should solve it properly via a vdsm-tool that has this logic
15:44:18 bottom line - who thinks it's a blocker?
15:45:14 oschreib: bad user experience - i would like a fix for it
15:45:57 i think we should revisit the requirements we set out for a release
15:46:08 http://www.ovirt.org/wiki/First_release
15:46:20 we can come up with papercuts all day
15:46:24 sgordon: we have there "MUST: Pass minimal smoke test"
15:46:36 we don't talk about workarounds
15:46:39 sure, but does a minimal smoke test involve a fedora node
15:46:45 or does it involve *the* node
15:46:59 sgordon: it involves host installation, for sure.
15:47:10 and it might fail on this BZ
15:47:15 might?
15:47:17 or does?
15:47:20 and on the second BZ we didn't talk about
15:48:10 sgordon: danken can elaborate, I understand the user might experience it, if he installed vdsm manually on the host (not so unusual process)
15:48:11 well i am not sure we need to continue if you guys are already convinced that we have bugs in multiple components that are blockers
15:48:21 because this is a go or no-go meeting
15:48:34 if vdsm is started before it has its keys configured, it configures itself to avoid ssl keys - even if vdsm is later installed properly with its keys and certificates.
15:48:36 yes, but we must understand which bugs should be fixed.
15:48:51 and again i say
15:49:05 the list of suggested blockers is supposed to be here: http://www.ovirt.org/wiki/Releases/First_Release_Blockers
15:49:25 so let's get a clear list there please
15:49:45 but we need to decide whether a certain BZ is a blocker.
15:49:54 the key word was suggested
15:50:01 since we didn't mention UEFI in the release criteria
15:50:40 sgordon: so adding the UEFI as a blocker is controversial
15:50:56 well we have been talking about this for 50 minutes, but no bugs have been added to that list?
15:51:04 sgordon: we have suggested a list, sent via email. If they are accepted as blockers, I believe they should be added to the wiki. Also, note that from my perspective, a difficult user experience is also a blocker. We do care about the user experience, especially for the first release and want it to be smooth.
15:51:08 so is it going to be clear from the minutes which bugs are and aren't blockers
15:52:02 i am all for a continued discussion of which bugs should and should be fixed, but to fulfil the scope of this meeting
15:52:08 ykaul: yea, you're right.
but "bad user experience" is not an acceptable criterion
15:52:12 i think we need to determine whether we are pushing the release
15:52:26 *should and should not
15:53:10 I'm afraid that blocking on NM_CONTROLLED device doesn't work after reboot 785557
15:53:10 is not realistic
15:53:21 sgordon: added ovirt-node and 2 vdsm bugs to that page
15:53:27 mburns: thanks
15:53:35 I'm not sure what is the solution for this
15:53:53 danken: why not? can't you remove the "nm_controlled" from the configuration?
15:53:54 I reviewed the engine bug list sent by mgoldboi and none is a blocker for the release IMO
15:54:27 oschreib: I'm not sure if NM would like us stealing its nic
15:55:01 I'm fine with an RN saying we're not supporting NM_CONTROLLED devices atm
15:55:10 we still use bridges for everything, right?
15:55:11 to be honest for nm to play nicely with bridges it's better to just turn it off period
15:55:12 again I think it's more realistic to ask the user: disconnect your nic from NM
15:55:18 mburns: yes
15:55:25 if so, NM doesn't deal with bridges, so stealing the nic is what you have to do
15:56:11 bottom line - sounds to me that the NM issues could be an RN
15:56:23 and the spice issues should be fixed (and a blocker)
15:57:17 danken: how much time will it take to fix the tls issue?
15:58:19 * danken added RN for the NM bug
15:59:09 oschreib: mgoldboi: with pre-integ commitment, reconfigure can be done by tomorrow..
15:59:35 danken: we can help with that ;)
16:00:14 mgoldboi: remember that dougsland is awake during tlv nights
16:00:23 #info https://bugzilla.redhat.com/show_bug.cgi?id=785557 is not a release blocker. Release notes will be added.
16:00:42 #info https://bugzilla.redhat.com/show_bug.cgi?id=773371 - is a blocker, should be fixed tomorrow.
16:00:46 any other vdsm issues?
16:01:00 mgoldboi: ^^
16:01:30 we have a deadlock on prepare for shutdown - not nice - but don't think it's a blocker - danken?
16:01:53 #link https://bugzilla.redhat.com/show_bug.cgi?id=785749
16:02:04 mgoldboi: I saw the bug - but I did not understand the deadlock
16:02:10 what is blocking exactly?
16:02:15 vdsm won't go down?
16:02:22 danken: right
16:03:00 ok, can we move to oVirt engine?
16:03:14 mgoldboi: non-blocker imho
16:03:39 we have 1 more issue with libvirt - mass migration #link https://bugzilla.redhat.com/show_bug.cgi?id=785789
16:03:55 mgoldboi: mass migration is not part of the first release criteria
16:04:28 #topic libvirt
16:04:44 #link https://bugzilla.redhat.com/show_bug.cgi?id=785789
16:04:56 that's the only issue with libvirt so far
16:05:03 #info mass migration is not part of criteria for first release
16:05:14 #topic engine
16:05:27 mgoldboi: any open engine issues?
16:06:02 ovirt-engine-core: extend fails when there is more than one host in cluster
16:06:24 #link https://bugzilla.redhat.com/show_bug.cgi?id=782432
16:06:44 not in the criteria, and doesn't sound to me like it should be there.
16:07:13 oschreib: it's a basic flow - but it's a large system...
16:07:24 mgoldboi: I don't think it is a release blocker
16:08:04 lpeer: if we will delay the first release, is it quickly solvable?
16:08:47 oschreib: not sure, It looks like an easy fix but have to look in the logs to make sure
16:08:52 oschreib: and just as interesting - if we do NOT delay, when will it be solved?
16:09:39 oschreib: I can see how soon we can push a fix but would not delay the release for it
16:10:03 lpeer: sure, we already have a blocker in some other places
16:10:04 ykaul: since this is urgent i'll take a look ASAP and comment on the bug
16:10:28 #link https://bugzilla.redhat.com/show_bug.cgi?id=784900 null pointer exception when merging snapshot and engine gets restarted - VM stuck in unknown state
16:10:37 IMO, this issue is a SHOULD for the release
16:11:01 mgoldboi: you must wait before adding links, so we can add info
16:11:26 oschreib: got you
16:12:32 playing with snapshot sounds more important to me.
16:13:03 although it's not fully mentioned in the release criteria
16:13:41 mgoldboi: just to be clear - EVERY merge fails?
16:14:01 oschreib: AFAIK yes
16:14:19 ammm
16:15:06 VDSM has one obvious blocker, and we're not sure about oVirt-node.
16:15:23 sounds to me like we should delay the release for a few days
16:15:31 mgoldboi: AFAIU it is only if you do a jboss restart while the task is running
16:16:37 mgoldboi: any chance you can see if http://gerrit.ovirt.org/#change,1360,patchset=1 is good for reconfigure?
16:16:38 had 0 DEV testing.
16:16:59 lpeer: you're right regarding this one, but there is also - https://bugzilla.redhat.com/show_bug.cgi?id=785671 around this area
16:17:49 danken: probably too late today - we can have a look at it tomorrow morning - please open a ticket for us
16:19:46 mgoldboi: please discuss the engine blockers on engine-devel, so we will understand if they are real blockers
16:20:00 any suggestions on the new release date?
16:21:22 mgoldboi: mburns sgordon ykaul danken ?
16:21:39 mgoldboi: 785671 is about taking two snapshots and creating a race, I don't see it as a release blocker
16:22:18 oschreib: I'd say tentatively push to thursday
16:22:24 oschreib: list has been sent - need to add also kvm crash https://bugzilla.redhat.com/show_bug.cgi?id=784324 - happens when storage connection is blocked
16:22:54 mgoldboi: again, not in the release criteria, and "blocked storage" sounds like a corner case
16:22:55 oschreib: and resync wednesday on the board call
16:23:13 #topic decision
16:23:21 mgoldboi: ticket 600 is yours
16:23:51 oschreib: blocked storage is NOT a corner case. I don't know where you got this idea.
16:24:24 ykaul: do you think it's a first release blocker? the intention was to release something that works, not without bugs
16:24:34 #undo
16:24:34 Removing item from minutes:
16:24:39 oschreib: it's a very common fault scenario in both NFS and iSCSI storages.
That being said, the result (what happens when there is a temp disconnection) should be the criterion by which we measure the bug severity.
16:24:40 and no one mentioned this in the release criteria
16:24:42 ykaul: imho error flows should not be blockers for our release unless they are very common
16:25:28 oschreib, lpeer: I think it depends what happens due to the error. If your VM dies, it's quite bad. If its disk is corrupted, it's unacceptable.
16:26:20 so has anyone actually tried the latest nightly Ovirt-Management packages for F16?
16:26:27 They break engine-notifierd
16:26:29 oschreib: I'm not saying "don't release", but let's not put it into the "corner case" bucket. And error flows are an issue we need to look at.
16:26:51 ykaul: maybe you're right. but no one raised these issues when we wrote the "release criteria".
16:27:18 ykaul: If you think we should edit it for the first release, raise it on the mailing list, not now.
16:27:21 also I have spent the last 5 days hammering on Ovirt trying to test it and it's far from any release
16:28:18 killsudo_m, can you link the relevant bug numbers?
16:29:14 oschreib: got to go - let's continue discussing it on the list.
16:29:15 I've hit so many "uhhh wtf" I quit keeping count
16:29:37 One glaring missing thing is tap interfaces and the lack of support
16:29:41 killsudo_m: we can't really comment without specific issues, bugs or reports in the mailing list.
16:29:46 oschreib: let's take this to various lists and circle back wednesday during the board meeting...
16:30:05 mburns: so what is our decision?
16:30:08 killsudo_m, all very well, but the only way things improve is if bugs are raised and they are tracked
16:30:09 I would love to report them but after the latest yum update from nightly repo my entire installation broke
16:30:18 oschreib, i think the suggestion was to delay to thursday
16:30:23 oschreib: we have enough blockers and/or near blockers that we have to delay, IMO
16:30:34 so I can't even walk through to document a full reproducible bug report
16:30:35 and reassess wednesday in the sync meeting
16:30:49 yes, but we did not decide what is considered as a blocker
16:30:55 i think we tentatively say thursday with a go/nogo during board meeting
16:31:05 ok
16:31:08 sounds ok to me
16:31:13 oschreib: let's figure that out on mailing lists
16:31:29 #topic decision
16:31:40 ykaul: can you please send an email about your requirements for the first release?
16:31:52 #info delay release until thursday with go/no-go during board meeting wednesday
16:31:53 we have one clear blocker in VDSM
16:32:13 oschreib: yes, I'll revisit it. I think we have set the bar low.
16:32:16 #info blocker/non-blocker needs to be determined by each package on their mailing list *before* the board meeting
16:32:47 ykaul: can you please reduce them to the bare minimum?
16:32:47 could someone at least check their "/etc/init.d/engine-notifierd" and tell me what user that is supposed to use
16:32:47 Mine is set to engine and I get a "no such user" then engine-notifierd exits
16:33:06 #action mburns danken lpeer oschreib for blocker status before board meeting
16:33:27 #action ykaul to discuss release criteria on the mailing list
16:33:52 killsudo_m: I think you can already file a BZ on this. but the notifierd is really not critical for the operation of oVirt.
16:34:22 oschreib: anything else or can we end the meeting?
16:34:43 I'm just wondering whether we should send an email about the delay
16:34:51 since it might be delayed again
16:34:52 then the update broke something else as my fedora ovirt-node is completely "unresponsive" from the management interface and no amount of fiddling or restarting the management side works but the node is fine
16:35:13 killsudo_m: we don't support ovirt updates yet....
16:35:41 oschreib: probably send to users@ and announce@
16:36:00 saying that we're postponing the release and we'll re-evaluate wednesday
16:36:08 ok
16:36:20 i wouldn't put a date on it
16:36:40 mburns: announce@ ??
16:36:47 announce@ovirt.org
16:37:00 killsudo_m: and do you have VDSM logs which we can look at?
16:37:03 never used it
16:37:10 killsudo_m, mine runs as engine, i assume at some point it might have still been rhevm or something else
16:37:35 (this is a fresh install i created on friday)
16:37:51 oschreib: very low traffic, but probably worth sending the delay to
16:37:52 I have tons of logs and have been digging through all of them trying to track down why my setup borked itself for the third time during normal usage
16:38:53 might be worth pinging quaid or rbergeron as i expect that list needs moderator approval for the mail to be sent out
16:38:56 mburns: ok, thanks
16:39:43 ok, anything else? or can i end the meeting?
16:39:51 end it
16:39:57 #endmeeting