14:00:50 <ewoud> #startmeeting infra weekly
14:00:50 <ovirtbot> Meeting started Mon Jun 24 14:00:50 2013 UTC. The chair is ewoud. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:50 <ovirtbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:53 <ewoud> #chair obasan knesenko
14:00:53 <ovirtbot> Current chairs: ewoud knesenko obasan
14:01:01 <knesenko> eedri: ?
14:02:44 <ewoud> dcaro not here either?
14:03:00 <obasan> ewoud, dcaro is not here today
14:03:24 <ewoud> ok
14:03:30 <ewoud> I see I've been slacking with the agenda
14:04:26 <knesenko> let's go guys
14:04:36 <ewoud> #topic hosting
14:05:01 <ewoud> knesenko: any progress on the rackspace servers?
14:05:19 <knesenko> ewoud: yes ... I have installed the ovirt engine service there
14:05:33 <knesenko> there were some issues with PTR records ...
14:06:07 * eedri here
14:06:11 <knesenko> so I have installed a DNS server on rackspace01.ovirt.org that holds the PTR records for rackspace01 and 02
14:06:21 <ewoud> #chair eedri
14:06:21 <ovirtbot> Current chairs: eedri ewoud knesenko obasan
14:06:51 <ewoud> knesenko: and you have set that up as a recursor for the rackspace machines?
14:07:02 <knesenko> I opened ports 80 and 443 in iptables, but it seems like we are blocked by the HW firewall there, so I opened a ticket for the rackspace guys
14:07:11 <knesenko> ewoud: yes
14:07:38 <knesenko> so I think the firewall issue will be solved soon
14:08:04 <knesenko> also I changed the schema a little bit
14:08:10 <ewoud> knesenko: and the DNS issue?
14:08:22 <knesenko> ewoud: DNS issue solved
14:08:31 <knesenko> regarding the schema ...
14:08:53 <knesenko> we will use rackspace01 as engine and NFS server ... instead of using the local storage
14:09:16 <ewoud> knesenko: but I think you don't want to run a DNS server in the long run, and would rather have the PTR records served by rackspace
14:09:19 <knesenko> I mean rackspace01 will be engine and host at the same time, but without local storage
14:09:35 <ewoud> how so? won't that be a lot slower?
14:09:35 <knesenko> ewoud: they can't handle it ... we asked them
14:09:36 <eedri> ewoud, rackspace said they don't support PTR records for private IPs
14:09:47 <eedri> ewoud, only public IPs
14:09:51 <ewoud> ah
14:09:59 <ewoud> and you need PTR? /etc/hosts is insufficient?
14:10:02 <knesenko> ewoud: this will be a bit slower, but we will have all the HA features
14:10:24 <knesenko> ewoud: PTR is a must ...
14:10:34 <knesenko> the last thing I wanted to do was to install a DNS server
14:10:35 <knesenko> :)
14:10:44 <Yamaksi> LOL ewoud is a chair ;)
14:10:58 <ewoud> chair == voorzitter (Dutch for "chairman")
14:11:06 <ewoud> knesenko: but NFS isn't HA, so what do you win?
14:11:08 <Yamaksi> ewoud: chair"man" ;)
14:11:20 <Yamaksi> ewoud: NFS can be HA
14:11:25 <Yamaksi> if the backend supports it
14:11:37 <knesenko> ewoud: 2 hosts in the same DC
14:11:55 <knesenko> instead of using 1 host per DC
14:12:08 <ewoud> knesenko: but who is the NFS server?
14:12:10 <knesenko> we will have 1 DC with 2 hosts in it
14:12:16 <knesenko> rackspace01
14:12:42 <ewoud> so if rackspace01 goes down, it all goes down?
14:13:12 <knesenko> ewoud: same with all-in-one
14:13:40 <ewoud> knesenko: not true, with all-in-one rackspace02 will keep running if rackspace01 goes down
14:14:29 <knesenko> ewoud: yes, but you can't manage them
14:14:38 <knesenko> ewoud: the engine will be down
14:14:50 <ewoud> knesenko: but that's less of a problem imho
14:15:10 <knesenko> ewoud: there are benefits to using NFS ...
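
For the minutes: a reverse zone of the kind knesenko describes would, under BIND, look roughly like the sketch below. The 10.0.0.0/24 range, file paths, and host addresses are illustrative assumptions; the log does not give the actual private IPs.

    # /etc/named.conf on rackspace01.ovirt.org -- serve the PTRs locally
    # (private range 10.0.0.0/24 is an assumption; adjust to the real one)
    zone "0.0.10.in-addr.arpa" IN {
        type master;
        file "/var/named/0.0.10.rev";
    };

    ; /var/named/0.0.10.rev -- the reverse zone itself
    $TTL 86400
    @   IN SOA rackspace01.ovirt.org. hostmaster.ovirt.org. (
            2013062401 ; serial
            3600       ; refresh
            900        ; retry
            604800     ; expire
            86400 )    ; negative-caching TTL
        IN NS  rackspace01.ovirt.org.
    1   IN PTR rackspace01.ovirt.org.   ; assumed 10.0.0.1
    2   IN PTR rackspace02.ovirt.org.   ; assumed 10.0.0.2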
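Likewise, the iptables change knesenko mentions usually comes down to two rules plus a save; a sketch assuming an EL6-style setup with the default INPUT chain (as the discussion notes, host rules do not help while an upstream hardware firewall still blocks the ports):

    # allow inbound HTTP/HTTPS on the host firewall
    iptables -A INPUT -p tcp --dport 80  -m state --state NEW -j ACCEPT
    iptables -A INPUT -p tcp --dport 443 -m state --state NEW -j ACCEPT
    # persist across reboots (EL6; other distributions differ)
    service iptables save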
14:15:21 <Yamaksi> ewoud: for what environment is this? ovirt test?
14:15:29 <knesenko> ewoud: we have two choices here ... NFS or 2 local DCs
14:15:39 <ewoud> knesenko: but why not gluster instead of NFS? then you'd at least have the benefit of HA storage
14:15:55 <knesenko> ewoud: possible
14:15:55 * Yamaksi has some netapps laying around...
14:16:10 <ewoud> Yamaksi: computing power for CI using jenkins
14:16:26 <Yamaksi> ewoud: CI?
14:16:29 <knesenko> ewoud: gluster is an option ... we can go with that as well
14:16:30 <Yamaksi> Code Igniter?
14:16:38 <ewoud> continuous integration
14:16:59 <Yamaksi> ewoud: and what is that going to do?
14:17:01 <knesenko> ewoud: will using gluster make our NFS HA?
14:17:06 <eedri> Yamaksi, stateless vms for jenkins slaves
14:17:24 <Yamaksi> ah ok
14:17:53 <Yamaksi> uhm, guys, why not have a "mirror" somewhere which can provide it? We have redundant netapps in a cluster that cannot go down
14:18:10 <Yamaksi> unless you unplug the cable(s)
14:18:55 <knesenko> ewoud: I am sorry
14:18:56 <ewoud> Yamaksi: they're rather stateless so it's all throw-away data, which is why I think HA is less important than uptime
14:19:15 <knesenko> ewoud: I was disconnected ... can you repeat?
14:19:25 <ewoud> knesenko: you missed nothing
14:19:40 <knesenko> ewoud: I asked if gluster will make NFS HA?
14:19:42 <eedri> sorry, got disconnected from the network
14:19:46 <Yamaksi> ewoud: okay, but you want to "share" data, don't you?
14:19:49 <ewoud> knesenko: and you DC'ed before I could answer
14:20:27 <ewoud> knesenko: I don't know how production-ready gluster is and what the performance does, but gluster would replace NFS
14:20:52 <ewoud> knesenko: it does replication, so the data will be on both rackspace01 and rackspace02
14:21:15 <eedri> ewoud, I don't think we need to invest too much in HA for jenkins slaves
14:21:17 <knesenko> ewoud: want to try gluster?
14:21:29 <eedri> ewoud, they're stateless vms that we can always reinstall with foreman
14:21:32 <knesenko> I really don't want to use local storage
14:21:49 <eedri> ewoud, as long as they are properly puppetized
14:22:23 <ewoud> eedri: I fully agree, but I don't think NFS is a solution for us
14:22:38 <eedri> ewoud, and local storage?
14:22:49 <eedri> ewoud, will that be a problem too?
14:22:49 <ewoud> it only gives the illusion of HA, while in practice it will double the chance of downtime in this case
14:23:45 <ewoud> eedri: if you use local storage, the VMs on rackspace02 will keep running when rackspace01 is down
14:24:19 <knesenko> but we need to think about the future as well ... what if we grow and grow?
14:24:21 <ewoud> when you use NFS on rackspace01, both hosts will be down while you perform maintenance
14:24:35 <knesenko> we will get one more bare metal host
14:24:53 <knesenko> but gluster solves it ...
14:24:55 <knesenko> right?
14:25:02 <ewoud> knesenko: then depending on what we want to do, we IMHO either go for gluster or local storage again
14:25:40 <Yamaksi> ewoud: doesn't it depend on the rackspace backend? I mean performance
14:25:52 <ewoud> Yamaksi: they're bare metal
14:26:02 <knesenko> I vote for gluster
14:26:50 <knesenko> obasan: eedri ewoud ?
14:26:50 <eedri> knesenko, what is the process for installing gluster?
14:26:58 <eedri> knesenko, installing the rpms on one baremetal?
14:27:07 <knesenko> eedri: it's built into the all-in-one installation
14:27:09 <obasan> knesenko, I heard that gluster is a good solution
14:27:26 <eedri> knesenko, ok, we're still early in the installation, so no harm
14:27:30 <eedri> +1 for gluster
14:27:52 <Yamaksi> ewoud: aha, no local storage then
14:27:53 <knesenko> guys, we can try to use gluster ... if this won't work, installing local storage takes 5 minutes
14:28:09 <ewoud> +1 on trying, if not fall back to local
14:28:22 <knesenko> ok, so we decided to go with gluster
14:29:03 <ewoud> #agree we're going to try to set up gluster on the rackspace hosts and fall back to local storage if it doesn't work out
14:30:10 <ewoud> knesenko: I also see another action item for you
14:30:20 <knesenko> ewoud: which one please?
14:30:21 <ewoud> the migration plan for linode resources => alterway
14:30:50 <knesenko> ewoud: haven't touched it yet ... let me finish with the rackspace servers and I will move to the migration plan
14:31:12 <ewoud> sounds good to me
14:31:39 <knesenko> ewoud: still, we can't migrate until we have answers for the alterway setup ...
14:31:47 <knesenko> external storage and a VM for the engine
14:32:05 <eedri> ewoud, I'm waiting for answers on additional resources from rackspace that might help
14:32:21 <eedri> ewoud, we might get an additional baremetal and some VMs.
14:32:42 <ewoud> knesenko: true, and it seems quite stable now, so I'd rather focus on installing the jenkins slaves now
14:32:47 <eedri> ewoud, do you know if there might be an issue running an engine on rackspace that manages the alterway servers?
14:32:48 <ewoud> eedri: ok
14:33:21 <ewoud> eedri: I think you need layer 2 access, and I don't know how well it reacts to higher latency
14:34:57 <knesenko> it would be better to use a VM that is located in the alterway DC
14:35:08 <knesenko> ewoud: I am not sure about L2 ...
14:36:29 <ewoud> knesenko: I don't know either
14:36:40 <knesenko> ewoud: I can ask ...
14:36:42 <knesenko> :)
14:36:49 <ewoud> please do
14:37:24 <knesenko> ok
14:37:49 <eedri> ewoud, can we ask kevin if that's possible?
14:37:50 <ewoud> so to summarize: we're going to install the rackspace hosts now as a gluster cluster, then think about alterway hosting and the linode migration?
14:38:02 <eedri> ewoud, +1
14:38:03 <knesenko> ewoud: yes
14:38:04 <eedri> +1
14:38:45 <ewoud> ok, then let's move on
14:38:56 <ewoud> unless there's more about hosting
14:39:16 <knesenko> no more
14:39:48 <knesenko> quaid: hello
14:40:00 <ewoud> ok
14:40:23 <ewoud> obasan: your action item about monitoring the openshift quota, any progress?
14:40:37 <obasan> ewoud, yes
14:40:41 <obasan> ewoud, I have a solution for that
14:41:07 <obasan> ewoud, all there is to do is ssh to the openshift instance
14:41:17 <knesenko> eedri: Oved fixed ovirt_engine_find_bugs
14:41:21 <knesenko> eedri: good news
14:41:24 <obasan> ewoud, ssh foo@bar-ohadbasan.rhcloud.com
14:41:27 <eedri> knesenko, :)
14:41:38 <obasan> ewoud, and then run the command "quota"
14:41:54 <eedri> knesenko, great, now we need to get unit_tests fixed (but let's wait till we reach the jenkins topic)
14:42:13 <ewoud> obasan: I knew that part was possible, but do you know if we can easily hook that into icinga?
14:42:28 <obasan> ewoud, that won't be any problem.
14:42:35 <obasan> ewoud, it can be executed by icinga as a command...
14:43:10 <obasan> ewoud, just a custom script that sends the command, parses the output and alerts if needed...
14:44:06 <ewoud> obasan: cool
14:45:58 <ewoud> ok, anything else on hosting?
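
For the record, the gluster setup agreed on above would, in rough outline, look like the sketch below. Only the two hostnames come from the log; the volume name vmstore, the brick paths, and the EL6-style package/service names are illustrative assumptions.

    # on both hosts: install and start the gluster daemon
    yum install -y glusterfs-server
    service glusterd start

    # on rackspace01: form the trusted pool and create a two-way replica,
    # so every write lands on both hosts
    gluster peer probe rackspace02.ovirt.org
    gluster volume create vmstore replica 2 \
        rackspace01.ovirt.org:/gluster/brick1 \
        rackspace02.ovirt.org:/gluster/brick1
    gluster volume start vmstore

With replica 2, losing either host still leaves a full copy of the storage on the other, which is the HA property the NFS-on-rackspace01 schema lacks.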
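The quota check obasan sketches out could be wrapped as an Icinga plugin along these lines. The gear address is the placeholder from the log; the thresholds and the parsing of quota(1) output are assumptions.

    #!/bin/sh
    # check_openshift_quota -- ssh to the OpenShift gear, run `quota`,
    # and exit with standard Nagios/Icinga return codes.
    HOST="foo@bar-ohadbasan.rhcloud.com"   # placeholder from the log
    WARN=80   # percent used; illustrative thresholds
    CRIT=90

    # typical quota(1) output: line 3 holds blocks used ($2) and the
    # hard limit ($4); the column positions are an assumption
    PCT=$(ssh -o BatchMode=yes "$HOST" quota 2>/dev/null |
          awk 'NR == 3 && $4 > 0 { printf "%d", $2 * 100 / $4 }')

    if [ -z "$PCT" ]; then
        echo "UNKNOWN - could not read quota"; exit 3
    elif [ "$PCT" -ge "$CRIT" ]; then
        echo "CRITICAL - disk quota ${PCT}% used"; exit 2
    elif [ "$PCT" -ge "$WARN" ]; then
        echo "WARNING - disk quota ${PCT}% used"; exit 1
    fi
    echo "OK - disk quota ${PCT}% used"; exit 0

Icinga would then invoke it as a command from the monitoring host, as obasan suggests.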
14:47:41 <eedri> ewoud, well
14:47:54 <eedri> ewoud, about the fedora 17 slaves upgrade to f19
14:48:12 <eedri> ewoud, we need to ask at the ovirt meeting if it's OK to stop running tests / delivering nightly builds for f17
14:48:19 <eedri> ewoud, and upgrade your host to f19 instead
14:48:38 <eedri> ewoud, or we can wait for rackspace to be ready and install an f19 slave there
14:49:07 <ewoud> eedri: then I think that f17 will still be outdated
14:49:18 <ewoud> eedri: can you ask if it's OK to stop?
14:49:42 <eedri> ewoud, I can send an email to the list, not sure if I'll attend the meeting tomorrow
14:49:46 <eedri> mburns, ping
14:50:06 <eedri> mburns, do you know if we can stop supporting f17 in jenkins and upgrade the slave to f19?
14:50:21 <mburns> eedri: I'd say yes
14:50:34 <eedri> mburns, so no more nightly builds for f17
14:50:41 <mburns> eedri: makes sense to me
14:50:48 <eedri> mburns, would you say it's worth raising in tomorrow's meeting?
14:50:53 <mburns> though we should definitely have f19 builds
14:50:54 <eedri> mburns, or to go ahead with it
14:51:21 <mburns> eedri: probably worth bringing up
14:51:30 <eedri> mburns, ok
14:51:37 <eedri> mburns, thanks
14:51:42 <mburns> eedri: I would think you could move most of the slaves to f19
14:51:51 <eedri> mburns, what about f18?
14:52:00 <eedri> mburns, we currently have 2 f18, 1 f17
14:52:04 <mburns> oh
14:52:06 <eedri> and one rhel
14:52:20 <mburns> let's leave it as is for now, and we'll get agreement at the weekly meeting
14:52:26 <eedri> mburns, ok
14:53:18 <ewoud> eedri: anything else on jenkins?
14:53:21 <dneary> Hi
14:53:31 <dneary> Sorry I am so late - was on a train
14:53:31 <ewoud> dneary: hi
14:53:41 <eedri> ewoud, there is an issue with the jenkins backups
14:53:43 <eedri> ewoud, I opened a ticket
14:53:54 <eedri> ewoud, might be worth going over the trac tickets
14:54:12 <eedri> dneary, hi
14:54:50 <ewoud> eedri: I didn't see it
14:55:16 <ewoud> but we certainly should go over the issues
14:55:36 <ewoud> RH TLV has been a bit unstable lately
14:56:27 <eedri> bad network issues here... sorry
14:57:03 <ewoud> eedri: yes, it's been bad for the past week I think
14:57:13 <eedri> ewoud, for you too?
14:57:19 <eedri> ewoud, so it's an OFTC issue?
14:57:31 <ewoud> eedri: no, I just see a huge wave of nat-pool-tlv-t1 going offline
14:58:01 <Yamaksi> mburns: where will the ISO be published, also on the docs or only on gerrit?
14:58:15 <ewoud> eedri: can you link the ticket you were referring to? I can't find it
14:58:27 <mburns> Yamaksi: it will be published on ovirt.org
14:58:47 <ewoud> eedri: is it https://fedorahosted.org/ovirt/ticket/59?
14:58:49 <mburns> Yamaksi: it will go under here: http://resources.ovirt.org/releases/node-base/
14:59:30 <eedri> ewoud, yep
14:59:51 <Yamaksi> mburns: ah nice, was looking there already. Will place a nephew on it and tell him to press F5 every second ;)
14:59:57 <eedri> ewoud, I have another topic on hosting
15:00:13 <ewoud> eedri: do go ahead
15:00:24 <eedri> ewoud, recently we've been hitting a lot of issues with the wiki on openshift... out of space / slowness
15:00:34 <eedri> ewoud, and a lack of response on the irc channel as well
15:00:36 <mburns> Yamaksi: we're probably at least a few hours away from having something posted
15:01:00 <Yamaksi> mburns: ah, it will keep him busy, he has vacation I guess :)
15:01:02 <eedri> ewoud, should we consider migrating it out of it and onto another service (on one of our vms/rackspace)?
15:01:05 <Yamaksi> keeps them off the street ;)
15:01:21 <ewoud> eedri: possibly
15:01:34 <eedri> ewoud, worth opening a thread on it on the list
15:01:40 <eedri> ewoud, see what our options are
15:01:56 <eedri> ewoud, the wiki has had too much downtime lately, which is not healthy for the project...
15:01:59 <ewoud> eedri: yes, a ML thread sounds good
15:02:08 <ewoud> and I fully agree with that
15:02:20 <eedri> dneary, ^^?
15:02:23 <eedri> dneary, what do you think?
15:02:29 <ewoud> by using PaaS we shouldn't have to worry about it
15:02:47 <dneary> eedri, Catching up
15:02:55 <eedri> ewoud, yeah... but something isn't working, apparently
15:03:03 <dneary> eedri, Yes, agreed re wiki
15:03:14 <eedri> dneary, what are our options?
15:03:37 <dneary> Garrett is working on an update this week which will make things better wrt disk usage on the PaaS - that's been our main issue
15:03:51 <eedri> dneary, and the slowness?
15:03:56 <dneary> There was a major upgrade of infrastructure ~3 weeks ago which is causing this "no email" situation
15:04:03 <garrett> it will also have other bugfixes and an improved mobile experience
15:04:10 <dneary> The slowness was another badly behaved app. That just shouldn't happen
15:04:24 <eedri> dneary, I got a complaint today from the tlv site
15:04:26 <dneary> I'm chasing it down with the OpenShift guys
15:04:35 <eedri> dneary, but that might be related to local network issues... not sure
15:04:45 <dneary> eedri, Yes, it was very slow this morning, it cleared up ~11:30 CEST
15:05:22 <eedri> dneary, so you're saying we should give it a chance? and keep it on openshift for now
15:06:20 <dneary> eedri, Yes - let us get this update out the door, and we'll re-evaluate in a month
15:06:29 <dneary> eedri, A report will go to infra@ after that
15:06:34 <eedri> dneary, ok. thanks
15:06:38 <dneary> (after the update, that is)
15:07:10 <ewoud> I'm also quite overdue with setting a new meeting time
15:07:56 <ewoud> right, I think we're over time now, so any last items?
15:09:47 <ewoud> going once
15:09:48 <dneary> eedri, This was probably covered before I arrived, but we talked about putting together a "who has access to what / how to restart or fix service X if it's down or broken" page in the wiki
15:09:52 <dneary> Does anyone own that?
15:10:08 <ewoud> dneary: I don't really think so
15:10:55 <ewoud> we did discuss it a few times, and I think the closest we came was http://lists.ovirt.org/pipermail/infra/2013-April/002625.html
15:11:22 <dneary> ewoud, Can we put a name and a deadline on it?
15:11:39 <dneary> If it doesn't get done by then, fair enough - but at least we'll be able to check progress each week
15:11:59 <ewoud> dneary: do I hear a volunteer? :)
15:14:34 <lhornyak> eedri: is there a jenkins job that runs the engine junit tests?
15:17:09 <dneary> ewoud, I wish I could
15:17:14 <dneary> I don't have most of the information
15:17:23 <dneary> Nor a decent chunk of time
15:17:44 <dneary> theron, Do you have some time?
15:17:56 <theron> dneary, I do. but we have a call in 15.
15:18:14 <dneary> theron, I mean, in the next month or so, to put together ^^^
15:18:25 <theron> dneary, yes lol :)
15:18:42 <dneary> It doesn't have to be done in the next 15 mins
15:18:50 <dneary> Although if it were, that would be cool :-)
15:18:53 <ewoud> dneary: I'm also quite lacking time
15:19:12 <dneary> ewoud, Seems like Theron just "volunteered" :-)
15:19:13 <ewoud> dneary: we need to compile more info from the ML to the wiki
15:19:35 <theron> dneary, I can certainly "try"
15:20:25 <theron> dneary, we'll need to sort it out, certainly.
15:22:34 <eedri> lhornyak, yes
15:23:04 <ewoud> #action theron compile a list of services and who has access
15:23:09 <ewoud> #endmeeting