13:58:51 #startmeeting oVirt Infra
13:58:51 Meeting started Mon Sep 30 13:58:51 2013 UTC. The chair is knesenko. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:58:51 Useful Commands: #action #agreed #help #info #idea #link #topic.
13:59:02 #chair ewoud knesenko dcaro
13:59:02 Current chairs: dcaro ewoud knesenko
13:59:19 eedri is not here today ....
13:59:28 obasan is not here
13:59:35 ok
14:00:02 anyone else here ? new folks ?
14:00:09 * orc_orc is here
14:00:14 ... Russ Herrold
14:01:31 dcaro: can you lead the meeting ?
14:01:35 dcaro: I need to go
14:01:40 dcaro: explain later
14:01:59 knesenko: ok
14:04:04 Hi Russ, you want to join ovirt infra then?
14:04:27 more, I wish to watch a bit and learn more on the project first
14:04:52 I have tried local setups and such and the CI model is something I wish to replicate on my development bench
14:05:34 I have a wiki account (so I may correct bitrot) and have been trying to work through it
14:05:52 orc_orc: ok, you are welcome to ask anytime :)
14:05:52 hi all
14:05:52 orc_orc: I assume you also follow the mailing list then?
14:06:00 dcaro: absolutely
14:06:22 #chair dneary
14:06:22 Current chairs: dcaro dneary ewoud knesenko
14:07:18 I don't think we have an agenda
14:07:46 let's first talk about the artifactory server
14:08:04 #topic Artifactory server
14:08:26 eyal told me that it's almost ready, what's missing?
14:08:51 #info network issues resolved (duplicate MAC addresses); the pool on alterway02.ovirt.org has been adjusted not to conflict
14:09:05 I don't know how it was installed, but it wasn't using puppet
14:09:26 I haven't seen any patches for it either
14:11:04 I think what was missing was mostly that jenkins wasn't using it yet
14:11:32 :), yep, we have to make some changes in some of the jobs to get the maven settings
14:11:58 I think we need a puppet patch to fill ~jenkins/.m2/settings.xml on every slave
14:13:04 looking at the bash history, the install was rather trivial so I'll try to write a very simple patch
14:13:13 #action ewoud puppetize artifactory
14:13:30 ewoud: it's a little more complicated, as some of the jobs run maven inside makefiles and scripts, which requires adding some parameters to use other maven configs
14:14:47 dcaro: doesn't maven read the settings stacked? so first system, then user and then project?
14:16:01 ewoud: I was told that you have to pass some extra flags to make-based projects, maybe they overwrite some of those options inside the scripts or something
14:16:24 hmm ok
14:16:50 ewoud: but we can try :)
14:18:57 ok, any volunteers?
14:19:27 I think someone on the ML already supplied the config
14:21:07 http://lists.ovirt.org/pipermail/infra/2013-September/004023.html
14:21:07 yep, alon passed the new settings.xml file
14:21:18 should we just deploy that using puppet?
14:21:35 agree
14:22:11 #action ewoud deploy settings.xml to all jenkins slaves
14:22:49 ewoud: we will have to make sure the jobs are actually using it
14:23:13 dcaro: could you have a look at that once we've deployed it?
14:23:41 #action dcaro make sure jobs use the new settings.xml file
14:24:39 ok, next issue
14:25:16 #topic new machine ovirt03.redhat.com/rackspace03.ovirt.org
14:25:56 I've opened the ticket to get it installed, I hope this week it will be available
14:26:15 ok, then we need to think of a migration plan
14:26:26 one option is just to install new slaves
14:26:49 I was thinking that it might be useful to have the OS in the hostname
14:27:18 such as slave-f18-01, slave-f19-01 etc
14:28:35 ewoud: is there a convention between using a unique hode name in the A record, and then using a CNAME for such mutable information?
14:28:59 orc_orc: ewoud yep, I was thinking of something like that
14:29:11 s//host/
14:29:16 as we use dns entries to connect to the slaves
14:30:09 orc_orc: since the OS is part of the role of those slaves, I think it's part of the hostname
14:30:10 but we do not control the dns directly (each change requires a request to a third party)
14:30:28 dcaro: yes, eventually we may need to find a solution for that
14:30:34 so it's not easy to change right now
14:30:43 s/easy/agile/
14:31:43 so, a locally manageable solution would be to ask for a permanent and unique A record when a host is deployed, and then assign and manage a CNAME in a domain under local control?
14:31:57 that solves the reliance on a third party after the initial setup
14:32:14 (I do something like for the LSB_
14:32:50 orc_orc: ovirt.org is managed by Red Hat IT, so requesting an A record and a CNAME is the same procedure
14:33:29 I think he proposes to host our own dns servers for a subdomain so we can change the config ourselves
14:33:31 we could run bind on foreman and replicate it to other servers for reliability, change the NS records in the .org zone and manage it ourselves
14:34:13 dcaro: yes -- either a sub-delegation (harder) or a local utility domain (easier)
14:35:11 at $dayjob I've done the sub-delegation and it's not that hard if you have access to the correct accounts
14:35:17 not sure how easy that is to do with RH IT
14:35:38 we can try
14:35:55 * eedri here
14:36:08 #chair eedri
14:36:08 Current chairs: dcaro dneary eedri ewoud knesenko
14:36:39 but what's the plan with the new host?
14:36:51 install fresh slaves or try to migrate the existing ones?
14:36:59 hehehe, we diverged a bit yes :)
14:37:08 I am sorry, I am back
14:37:29 knesenko: eedri welcome back
14:37:51 we are discussing what's the plan with the new rackspace03 server
14:38:14 ah ok
14:38:27 seems like we need to reinstall rackspace
14:38:30 ovirt0-engine
14:38:38 I don't like local storage setups
14:38:53 knesenko: that was the plan with the new server
14:39:07 install that in a cluster setup with just 1 host initially
14:39:27 attach the gluster storage to it
14:39:42 then we can either migrate existing hosts or install fresh ones
14:39:47 I'm leaning toward the latter
14:40:03 yes ok
14:40:49 so how will we migrate the old hosts to gluster? Just create the vms from scratch again?
14:41:13 I'd say yes
14:41:29 dcaro: I think that we can create export domain ....
export them and import to another DC
14:41:46 so we start by emptying one (say rackspace01)
14:41:55 and exporting + importing could work
14:42:00 right
14:42:17 then once it's empty, we reconfigure rackspace01 to join the cluster with rackspace03
14:42:25 and when that works, we do the same for rackspace02
14:42:30 deleting the vms means creating/reconfiguring the slaves on jenkins
14:43:20 export + import could be quick, depending on the storage size
14:46:03 I prefer export + import
14:46:13 dcaro: me too
14:46:31 * ewoud has no preference
14:46:54 can we document the procedure somewhere?
14:47:27 wiki? etherpad?
14:47:29 dcaro: I have a ticket somewhere to plan that migration
14:47:40 perfect
14:48:00 next then?
14:48:41 yes
14:49:25 ewoud: did you check if we can kickstart using dhcp on rackspace?
14:49:39 dcaro: haven't had time
14:49:44 ok
14:50:04 #topic jenkins update to latest lts
14:50:18 ewoud: what's the status?
14:50:28 dcaro: oh, I forgot to mail an update
14:50:44 but I updated it using yum update, service jenkins restart
14:51:44 ewoud: I remember I had some problems with jenkins sending the gerrit message twice, I had to restart it, but I think it was on wednesday though
14:51:54 anyhow, any issues?
14:52:00 not that I know
14:52:06 good :)
14:52:10 * ewoud checked if all slaves reconnected
14:52:24 we are having dead slaves lately
14:53:12 one slave or different ones?
14:53:31 iirc more than one
14:54:48 I think they run out of memory and the java process gets killed
14:55:24 then dmesg should show it
14:55:57 ewoud: it does, but I'm not sure of the date :S
14:56:41 dcaro: you can clean it using dmesg -c, then it should be at least somewhat recent
14:57:10 ewoud: I'll do
14:57:43 ewoud a more durable solution might be: ( date ; dmesg -c ) >> /var/log/messages
14:57:57 that gets a searchable record with timestamps
14:58:19 dcaro: what orc_orc said ;)
14:59:32 ewoud: I recall I wrote that for a deployment outline when I was having troubles with a given project. Also I made sure sysstat (sar) was present and running
15:02:31 any other items on the agenda?
15:02:46 I had an 'add on item' I noticed
15:03:14 'add on item'?
15:03:26 an item not 'on the agenda'
15:03:29 reading the mailing list, I had reviewed the Logwatch report for linode01.ovirt.org
15:03:34 see, eg: http://lists.ovirt.org/pipermail/infra/2013-September/004030.html at: sendmail-largeboxes
15:03:39 and it appears that the mail spool for userid: jenkins is not being purged of older messages
15:04:05 I assume the emails to that account ID may be needed to trigger a VCS check and a build when a commit message is seen.
15:04:12 ... but then perhaps those emails should be deleted after a week or so, to prevent the file from growing without limit
15:04:31 * ewoud looks what's even in there
15:04:34 it may be that the emails are not wanted at all and should be devnulled at once
15:04:40 * nod *
15:05:27 I fear that most of them might be cron jobs or something
15:05:36 (buildbot, with which I am familiar, may have builds triggered by 'seeing' commit emails)
15:06:54 it seems it tries to mail to gerrit2@gerrit.ovirt.org but can't connect
15:07:22 ...
so probably those: email delivery delayed warnings
15:07:39 Question: I'm interested in pre-populating SSL keys on the ovirt engine location and on the vdsm side of things (so that as much as possible can be automated). Does anybody know what files need to be copied to the VDSM side to allow this, specifically for the authorized_keys file entries?
15:07:48 orc_orc: thanks for noticing
15:07:55 ewoud: no problem ;)
15:08:51 dcaro: any way we can tell jenkins not to mail gerrit?
15:09:16 why is it trying to email gerrit in the first place?
15:09:46 dcaro: I think because a build failed and then it will try to mail the one who broke it?
15:10:02 gerrit will place itself as committer if you cherry pick
15:10:09 I think that the problem is that jenkins creates the patch review as user 'jenkins@ovirt.org', and then gets all the emails for further updates of the patches...
15:11:01 Project: http://jenkins.ovirt.org/job/ovirt_engine_update_db_multiple_os/./label=centos64/
15:11:05 that job at least mails
15:11:23 Project: http://jenkins.ovirt.org/job/ovirt-engine_master_create_rpms_quick/./label=fedora18/
15:11:26 on failure as well
15:12:21 ok, we can troubleshoot it on the ml
15:12:23 can jenkins be instructed to substitute the last committer instead of itself?
15:14:01 orc_orc: I think it's configured to mail those who changed something since the last build, which is normally a good thing
15:14:04 ewoud, dcaro every time someone uses the rebase button on gerrit the committer changes to gerrit2@ovirt.org
15:14:18 eedri: I think gerrit 2.7 changes that
15:14:21 ewoud, dcaro that's a "feature" and gerrit developers are not willing to change that
15:14:22 maybe 2.8
15:14:30 ewoud: noted -- just hoping that in some cases an over-ride might be used
15:14:46 ewoud, dcaro this is why it's important to check the checkbox in the scm advanced options for tracking commit author and not committer
15:14:51 alternatively editing the MTA's aliasing table to send email to a monitored address may make sense
15:15:00 eedri: or I'm confused with another patch
15:15:09 ewoud, i really hope so
15:15:21 eedri: I was not aware of the commit author vs committer
15:15:21 ewoud, right now we have 2.6.1
15:15:36 ewoud, if you check the advanced git options on a jenkins job, you'll see the option
15:16:28 ewoud, did you see artifactory is online
15:16:54 eedri: yes, but it would be nice to have it puppetized
15:17:32 I'll add an action to check the email issue
15:18:05 #action dcaro investigate jenkins@linode01 emails origin
15:18:43 eedri: http://gerrit.ovirt.org/19698 should puppetize the install
15:19:04 it doesn't deal with fetching the RPM and I didn't see a yum repo for their RPMs either
15:19:39 eedri: is openjdk-1.7.0-devel needed to run artifactory as well?
15:20:01 http://dl.bintray.com/content/jfrog/artifactory-rpms looks like a yum repo :)
15:21:21 ewoud, yes
15:21:23 ewoud, must
15:21:33 ewoud, as they write in installation notes
15:22:10 eedri: ok
15:22:15 ewoud, +1, that's about what i did
15:22:25 ewoud, we should consider adding a running apache in front
15:22:33 ewoud, to use port 80, but not critical
15:24:06 eedri: I did look in history to determine that :)
15:24:56 any other issues?
15:26:04 I think we can close the meeting
15:29:00 #endmeeting
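
Note on the dead-slave discussion (14:54:48 to 14:57:57): the idea of keeping a timestamped, searchable record of kernel messages could be sketched roughly as below. This is only an illustration, not something agreed in the meeting. It assumes the OOM killer logs the usual "Killed process PID (name)" wording; the function names and the sample text are made up for the example.

```python
import re
from datetime import datetime, timezone

# Assumed wording of the kernel's OOM-killer message; adjust if your kernel differs.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(dmesg_text):
    """Return a (pid, process_name) tuple for each OOM kill found in the text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_KILL.finditer(dmesg_text)]


def stamp(dmesg_text, now=None):
    """Prefix a captured dmesg dump with a wall-clock timestamp line,
    mirroring the ( date ; dmesg -c ) idea from the meeting."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat()}\n{dmesg_text}"


# Hypothetical sample, not taken from any real slave.
sample = (
    "[12345.678] Out of memory: Kill process 4242 (java) score 901 or sacrifice child\n"
    "[12345.679] Killed process 4242 (java) total-vm:3145728kB, anon-rss:2097152kB, file-rss:0kB\n"
)

print(find_oom_kills(sample))
print(stamp(sample))
```

Feeding it the output of dmesg from a slave would show whether java was the process being killed, and the timestamp line answers the "not sure of the date" problem raised at 14:55:57.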