14:00:50 <ewoud> #startmeeting infra weekly
14:00:50 <ovirtbot> Meeting started Mon Jun 24 14:00:50 2013 UTC. The chair is ewoud. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:50 <ovirtbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:53 <ewoud> #chair obasan knesenko
14:00:53 <ovirtbot> Current chairs: ewoud knesenko obasan
14:01:01 <knesenko> eedri: ?
14:02:44 <ewoud> dcaro not here either?
14:03:00 <obasan> ewoud, dcaro is not here today
14:03:24 <ewoud> ok
14:03:30 <ewoud> I see I've been slacking with the agenda
14:04:26 <knesenko> let's go guys
14:04:36 <ewoud> #topic hosting
14:05:01 <ewoud> knesenko: any progress on the rackspace servers?
14:05:19 <knesenko> ewoud: yes ... I have installed the ovirt engine service there
14:05:33 <knesenko> there were some issues with PTR records ...
14:06:07 * eedri here
14:06:11 <knesenko> so I have installed a DNS server on rackspace01.ovirt.org that holds the PTR records for rackspace01 and 02
14:06:21 <ewoud> #chair eedri
14:06:21 <ovirtbot> Current chairs: eedri ewoud knesenko obasan
14:06:51 <ewoud> knesenko: and you have set that up as a recursor for the rackspace machines?
14:07:02 <knesenko> I opened ports 80 and 443 in iptables, but it seems like we are blocked by the HW firewall there, so I opened a ticket for the rackspace guys
14:07:11 <knesenko> ewoud: yes
14:07:38 <knesenko> so I think the firewall issue will be solved soon
14:08:04 <knesenko> also I changed the schema a little bit
14:08:10 <ewoud> knesenko: and the DNS issue?
14:08:22 <knesenko> ewoud: DNS issue solved
14:08:31 <knesenko> regarding the schema ...
14:08:53 <knesenko> we will use rackspace01 as engine and NFS server ... instead of using the local storage
14:09:16 <ewoud> knesenko: but I think you don't want to run a DNS server in the long run, and would rather have the PTR records served by rackspace
14:09:19 <knesenko> I mean rackspace01 will be engine and host at the same time, but without local storage
14:09:35 <ewoud> how so? won't that be a lot slower?
14:09:35 <knesenko> ewoud: they can't handle it ... we asked them
14:09:36 <eedri> ewoud, rackspace said they don't support PTR records for private IPs
14:09:47 <eedri> ewoud, only public IPs
14:09:51 <ewoud> ah
14:09:59 <ewoud> and you need PTR? /etc/hosts is insufficient?
14:10:02 <knesenko> ewoud: this will be a bit slower, but we will have all the HA features
14:10:24 <knesenko> ewoud: PTR is a must ...
14:10:34 <knesenko> the last thing I wanted to do was to install a DNS server
14:10:35 <knesenko> :)
14:10:44 <Yamaksi> LOL ewoud is a chair ;)
14:10:58 <ewoud> chair == voorzitter (Dutch for "chairman")
14:11:06 <ewoud> knesenko: but NFS isn't HA, so what do you win?
14:11:08 <Yamaksi> ewoud: chair"man" ;)
14:11:20 <Yamaksi> ewoud: NFS can be HA
14:11:25 <Yamaksi> if the backend supports it
14:11:37 <knesenko> ewoud: 2 hosts in the same DC
14:11:55 <knesenko> instead of using 1 host per DC
14:12:08 <ewoud> knesenko: but who is the NFS server?
14:12:10 <knesenko> we will have 1 DC with 2 hosts in it
14:12:16 <knesenko> rackspace01
14:12:42 <ewoud> so if rackspace01 goes down, it all goes down?
14:13:12 <knesenko> ewoud: same with all-in-one
14:13:40 <ewoud> knesenko: not true, with all-in-one rackspace02 will keep running if rackspace01 goes down
14:14:29 <knesenko> ewoud: yes, but you can't manage them
14:14:38 <knesenko> ewoud: the engine will be down
14:14:50 <ewoud> knesenko: but that's less of a problem imho
14:15:10 <knesenko> ewoud: there are benefits to using NFS ...
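
For the minutes: a reverse zone of the kind knesenko describes would, under BIND, look roughly like the sketch below. The 10.0.0.0/24 range, file paths, and host addresses are illustrative assumptions; the log does not give the actual private IPs.

    # /etc/named.conf on rackspace01.ovirt.org -- serve the PTRs locally
    # (private range 10.0.0.0/24 is an assumption; adjust to the real one)
    zone "0.0.10.in-addr.arpa" IN {
        type master;
        file "/var/named/0.0.10.rev";
    };

    ; /var/named/0.0.10.rev -- the reverse zone itself
    $TTL 86400
    @   IN SOA rackspace01.ovirt.org. hostmaster.ovirt.org. (
            2013062401 ; serial
            3600       ; refresh
            900        ; retry
            604800     ; expire
            86400 )    ; negative-caching TTL
        IN NS  rackspace01.ovirt.org.
    1   IN PTR rackspace01.ovirt.org.   ; assumed 10.0.0.1
    2   IN PTR rackspace02.ovirt.org.   ; assumed 10.0.0.2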
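Likewise, the iptables change knesenko mentions usually comes down to two rules plus a save; a sketch assuming an EL6-style setup with the default INPUT chain (as the discussion notes, host rules do not help while an upstream hardware firewall still blocks the ports):

    # allow inbound HTTP/HTTPS on the host firewall
    iptables -A INPUT -p tcp --dport 80  -m state --state NEW -j ACCEPT
    iptables -A INPUT -p tcp --dport 443 -m state --state NEW -j ACCEPT
    # persist across reboots (EL6; other distributions differ)
    service iptables save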
14:15:21 <Yamaksi> ewoud: for what environment is this? ovirt test?
14:15:29 <knesenko> ewoud: we have two choices here ... NFS or 2 local DCs
14:15:39 <ewoud> knesenko: but why not gluster instead of NFS? then you'd at least have the benefit of HA storage
14:15:55 <knesenko> ewoud: possible
14:15:55 * Yamaksi has some netapps laying around...
14:16:10 <ewoud> Yamaksi: computing power for CI using jenkins
14:16:26 <Yamaksi> ewoud: CI?
14:16:29 <knesenko> ewoud: gluster is an option ... we can go with that as well
14:16:30 <Yamaksi> Code Igniter?
14:16:38 <ewoud> continuous integration
14:16:59 <Yamaksi> ewoud: and what is that going to do?
14:17:01 <knesenko> ewoud: will using gluster make our NFS HA?
14:17:06 <eedri> Yamaksi, stateless vms for jenkins slaves
14:17:24 <Yamaksi> ah ok
14:17:53 <Yamaksi> uhm, guys, why not have a "mirror" somewhere which can provide it? We have redundant netapps in a cluster that cannot go down
14:18:10 <Yamaksi> unless you unplug the cable(s)
14:18:55 <knesenko> ewoud: I am sorry
14:18:56 <ewoud> Yamaksi: they're rather stateless so it's all throw-away data, which is why I think HA is less important than uptime
14:19:15 <knesenko> ewoud: I was disconnected ... can you repeat?
14:19:25 <ewoud> knesenko: you missed nothing
14:19:40 <knesenko> ewoud: I asked if gluster will make NFS HA?
14:19:42 <eedri> sorry, got disconnected from the network
14:19:46 <Yamaksi> ewoud: okay, but you want to "share" data, don't you?
14:19:49 <ewoud> knesenko: and you DC'ed before I could answer
14:20:27 <ewoud> knesenko: I don't know how production-ready gluster is and what the performance does, but gluster would replace NFS
14:20:52 <ewoud> knesenko: it does replication, so the data will be on both rackspace01 and rackspace02
14:21:15 <eedri> ewoud, I don't think we need to invest too much in HA for jenkins slaves
14:21:17 <knesenko> ewoud: want to try gluster?
14:21:29 <eedri> ewoud, they're stateless vms that we can always reinstall with foreman
14:21:32 <knesenko> I really don't want to use local storage
14:21:49 <eedri> ewoud, as long as they are properly puppetized
14:22:23 <ewoud> eedri: I fully agree, but I don't think NFS is a solution for us
14:22:38 <eedri> ewoud, and local storage?
14:22:49 <eedri> ewoud, will that be a problem too?
14:22:49 <ewoud> it only gives the illusion of HA, while in practice it will double the chance of downtime in this case
14:23:45 <ewoud> eedri: if you use local storage, the VMs on rackspace02 will keep running when rackspace01 is down
14:24:19 <knesenko> but we need to think about the future as well ... what if we grow and grow?
14:24:21 <ewoud> when you use NFS on rackspace01, both hosts will be down while you perform maintenance
14:24:35 <knesenko> we will get one more bare metal host
14:24:53 <knesenko> but gluster solves it ...
14:24:55 <knesenko> right?
14:25:02 <ewoud> knesenko: then depending on what we want to do, we IMHO either go for gluster or local storage again
14:25:40 <Yamaksi> ewoud: doesn't it depend on the rackspace backend? I mean performance
14:25:52 <ewoud> Yamaksi: they're bare metal
14:26:02 <knesenko> I vote for gluster
14:26:50 <knesenko> obasan: eedri ewoud ?
14:26:50 <eedri> knesenko, what is the process for installing gluster?
14:26:58 <eedri> knesenko, installing the rpms on one baremetal?
14:27:07 <knesenko> eedri: it's built into the all-in-one installation
14:27:09 <obasan> knesenko, I heard that gluster is a good solution
14:27:26 <eedri> knesenko, ok, we're still early in the installation, so no harm
14:27:30 <eedri> +1 for gluster
14:27:52 <Yamaksi> ewoud: aha, no local storage then
14:27:53 <knesenko> guys, we can try to use gluster ... if this won't work, installing local storage takes 5 minutes
14:28:09 <ewoud> +1 on trying, if not fall back to local
14:28:22 <knesenko> ok, so we decided to go with gluster
14:29:03 <ewoud> #agree we're going to try to set up gluster on the rackspace hosts and fall back to local storage if it doesn't work out
14:30:10 <ewoud> knesenko: I also see another action item for you
14:30:20 <knesenko> ewoud: which one please?
14:30:21 <ewoud> the migration plan for linode resources => alterway
14:30:50 <knesenko> ewoud: haven't touched it yet ... let me finish with the rackspace servers and I will move to the migration plan
14:31:12 <ewoud> sounds good to me
14:31:39 <knesenko> ewoud: still, we can't migrate until we have answers for the alterway setup ...
14:31:47 <knesenko> external storage and a VM for the engine
14:32:05 <eedri> ewoud, I'm waiting for answers on additional resources from rackspace that might help
14:32:21 <eedri> ewoud, we might get an additional baremetal and some VMs.
14:32:42 <ewoud> knesenko: true, and it seems quite stable now, so I'd rather focus on installing the jenkins slaves now
14:32:47 <eedri> ewoud, do you know if there might be an issue running an engine on rackspace that manages the alterway servers?
14:32:48 <ewoud> eedri: ok
14:33:21 <ewoud> eedri: I think you need layer 2 access, and I don't know how well it reacts to higher latency
14:34:57 <knesenko> it would be better to use a VM that is located in the alterway DC
14:35:08 <knesenko> ewoud: I am not sure about L2 ...
14:36:29 <ewoud> knesenko: I don't know either
14:36:40 <knesenko> ewoud: I can ask ...
14:36:42 <knesenko> :)
14:36:49 <ewoud> please do
14:37:24 <knesenko> ok
14:37:49 <eedri> ewoud, can we ask kevin if that's possible?
14:37:50 <ewoud> so to summarize: we're going to install the rackspace hosts now as a gluster cluster, then think about alterway hosting and the linode migration?
14:38:02 <eedri> ewoud, +1
14:38:03 <knesenko> ewoud: yes
14:38:04 <eedri> +1
14:38:45 <ewoud> ok, then let's move on
14:38:56 <ewoud> unless there's more about hosting
14:39:16 <knesenko> no more
14:39:48 <knesenko> quaid: hello
14:40:00 <ewoud> ok
14:40:23 <ewoud> obasan: your action item about monitoring the openshift quota, any progress?
14:40:37 <obasan> ewoud, yes
14:40:41 <obasan> ewoud, I have a solution for that
14:41:07 <obasan> ewoud, all there is to do is ssh to the openshift instance
14:41:17 <knesenko> eedri: Oved fixed ovirt_engine_find_bugs
14:41:21 <knesenko> eedri: good news
14:41:24 <obasan> ewoud, ssh foo@bar-ohadbasan.rhcloud.com
14:41:27 <eedri> knesenko, :)
14:41:38 <obasan> ewoud, and then run the command "quota"
14:41:54 <eedri> knesenko, great, now we need to get unit_tests fixed (but let's wait till we reach the jenkins topic)
14:42:13 <ewoud> obasan: I knew that part was possible, but do you know if we can easily hook that into icinga?
14:42:28 <obasan> ewoud, that won't be any problem.
14:42:35 <obasan> ewoud, it can be executed by icinga as a command...
14:43:10 <obasan> ewoud, just a custom script that sends the command, parses the output and alerts if needed...
14:44:06 <ewoud> obasan: cool
14:45:58 <ewoud> ok, anything else on hosting?
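
For the record, the gluster setup agreed on above would, in rough outline, look like the sketch below. Only the two hostnames come from the log; the volume name vmstore, the brick paths, and the EL6-style package/service names are illustrative assumptions.

    # on both hosts: install and start the gluster daemon
    yum install -y glusterfs-server
    service glusterd start

    # on rackspace01: form the trusted pool and create a two-way replica,
    # so every write lands on both hosts
    gluster peer probe rackspace02.ovirt.org
    gluster volume create vmstore replica 2 \
        rackspace01.ovirt.org:/gluster/brick1 \
        rackspace02.ovirt.org:/gluster/brick1
    gluster volume start vmstore

With replica 2, losing either host still leaves a full copy of the storage on the other, which is the HA property the NFS-on-rackspace01 schema lacks.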
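The quota check obasan sketches out could be wrapped as an Icinga plugin along these lines. The gear address is the placeholder from the log; the thresholds and the parsing of quota(1) output are assumptions.

    #!/bin/sh
    # check_openshift_quota -- ssh to the OpenShift gear, run `quota`,
    # and exit with standard Nagios/Icinga return codes.
    HOST="foo@bar-ohadbasan.rhcloud.com"   # placeholder from the log
    WARN=80   # percent used; illustrative thresholds
    CRIT=90

    # typical quota(1) output: line 3 holds blocks used ($2) and the
    # hard limit ($4); the column positions are an assumption
    PCT=$(ssh -o BatchMode=yes "$HOST" quota 2>/dev/null |
          awk 'NR == 3 && $4 > 0 { printf "%d", $2 * 100 / $4 }')

    if [ -z "$PCT" ]; then
        echo "UNKNOWN - could not read quota"; exit 3
    elif [ "$PCT" -ge "$CRIT" ]; then
        echo "CRITICAL - disk quota ${PCT}% used"; exit 2
    elif [ "$PCT" -ge "$WARN" ]; then
        echo "WARNING - disk quota ${PCT}% used"; exit 1
    fi
    echo "OK - disk quota ${PCT}% used"; exit 0

Icinga would then invoke it as a command from the monitoring host, as obasan suggests.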
14:47:41 <eedri> ewoud, well
14:47:54 <eedri> ewoud, about the fedora 17 slaves upgrade to f19
14:48:12 <eedri> ewoud, we need to ask at the ovirt meeting if it's OK to stop running tests / delivering nightly builds for f17
14:48:19 <eedri> ewoud, and upgrade your host to f19 instead
14:48:38 <eedri> ewoud, or we can wait for rackspace to be ready and install an f19 slave there
14:49:07 <ewoud> eedri: then I think that f17 will still be outdated
14:49:18 <ewoud> eedri: can you ask if it's OK to stop?
14:49:42 <eedri> ewoud, I can send an email to the list, not sure if I'll attend the meeting tomorrow
14:49:46 <eedri> mburns, ping
14:50:06 <eedri> mburns, do you know if we can stop supporting f17 in jenkins and upgrade the slave to f19?
14:50:21 <mburns> eedri: I'd say yes
14:50:34 <eedri> mburns, so no more nightly builds for f17
14:50:41 <mburns> eedri: makes sense to me
14:50:48 <eedri> mburns, would you say it's worth raising in tomorrow's meeting?
14:50:53 <mburns> though we should definitely have f19 builds
14:50:54 <eedri> mburns, or to go ahead with it
14:51:21 <mburns> eedri: probably worth bringing up
14:51:30 <eedri> mburns, ok
14:51:37 <eedri> mburns, thanks
14:51:42 <mburns> eedri: I would think you could move most of the slaves to f19
14:51:51 <eedri> mburns, what about f18?
14:52:00 <eedri> mburns, we currently have 2 f18, 1 f17
14:52:04 <mburns> oh
14:52:06 <eedri> and one rhel
14:52:20 <mburns> let's leave it as is for now, and we'll get agreement at the weekly meeting
14:52:26 <eedri> mburns, ok
14:53:18 <ewoud> eedri: anything else on jenkins?
14:53:21 <dneary> Hi
14:53:31 <dneary> Sorry I am so late - was on a train
14:53:31 <ewoud> dneary: hi
14:53:41 <eedri> ewoud, there is an issue with the jenkins backups
14:53:43 <eedri> ewoud, I opened a ticket
14:53:54 <eedri> ewoud, might be worth going over the trac tickets
14:54:12 <eedri> dneary, hi
14:54:50 <ewoud> eedri: I didn't see it
14:55:16 <ewoud> but we certainly should go over the issues
14:55:36 <ewoud> RH TLV has been a bit unstable lately
14:56:27 <eedri> bad network issues here... sorry
14:57:03 <ewoud> eedri: yes, it's been bad for the past week I think
14:57:13 <eedri> ewoud, for you too?
14:57:19 <eedri> ewoud, so it's an OFTC issue?
14:57:31 <ewoud> eedri: no, I just see a huge wave of nat-pool-tlv-t1 going offline
14:58:01 <Yamaksi> mburns: where will the ISO be published, also on the docs or only on gerrit?
14:58:15 <ewoud> eedri: can you link the ticket you were referring to? I can't find it
14:58:27 <mburns> Yamaksi: it will be published on ovirt.org
14:58:47 <ewoud> eedri: is it https://fedorahosted.org/ovirt/ticket/59?
14:58:49 <mburns> Yamaksi: it will go under here: http://resources.ovirt.org/releases/node-base/
14:59:30 <eedri> ewoud, yep
14:59:51 <Yamaksi> mburns: ah nice, was looking there already. Will place a nephew on it and tell him to press F5 every second ;)
14:59:57 <eedri> ewoud, I have another topic on hosting
15:00:13 <ewoud> eedri: do go ahead
15:00:24 <eedri> ewoud, recently we've been hitting a lot of issues with the wiki on openshift... out of space / slowness
15:00:34 <eedri> ewoud, and a lack of response on the irc channel as well
15:00:36 <mburns> Yamaksi: we're probably at least a few hours away from having something posted
15:01:00 <Yamaksi> mburns: ah, it will keep him busy, he has vacation I guess :)
15:01:02 <eedri> ewoud, should we consider migrating it out of it and onto another service (on one of our vms/rackspace)?
15:01:05 <Yamaksi> keeps them off the street ;)
15:01:21 <ewoud> eedri: possibly
15:01:34 <eedri> ewoud, worth opening a thread on it on the list
15:01:40 <eedri> ewoud, see what our options are
15:01:56 <eedri> ewoud, the wiki has had too much downtime lately, which is not healthy for the project...
15:01:59 <ewoud> eedri: yes, a ML thread sounds good
15:02:08 <ewoud> and I fully agree with that
15:02:20 <eedri> dneary, ^^?
15:02:23 <eedri> dneary, what do you think?
15:02:29 <ewoud> by using PaaS we shouldn't have to worry about it
15:02:47 <dneary> eedri, Catching up
15:02:55 <eedri> ewoud, yeah... but something isn't working, apparently
15:03:03 <dneary> eedri, Yes, agreed re wiki
15:03:14 <eedri> dneary, what are our options?
15:03:37 <dneary> Garrett is working on an update this week which will make things better wrt disk usage on the PaaS - that's been our main issue
15:03:51 <eedri> dneary, and the slowness?
15:03:56 <dneary> There was a major upgrade of infrastructure ~3 weeks ago which is causing this "no email" situation
15:04:03 <garrett> it will also have other bugfixes and an improved mobile experience
15:04:10 <dneary> The slowness was another badly behaved app. That just shouldn't happen
15:04:24 <eedri> dneary, I got a complaint today from the tlv site
15:04:26 <dneary> I'm chasing it down with the OpenShift guys
15:04:35 <eedri> dneary, but that might be related to local network issues... not sure
15:04:45 <dneary> eedri, Yes, it was very slow this morning, it cleared up ~11:30 CEST
15:05:22 <eedri> dneary, so you're saying we should give it a chance? and keep it on openshift for now
15:06:20 <dneary> eedri, Yes - let us get this update out the door, and we'll re-evaluate in a month
15:06:29 <dneary> eedri, A report will go to infra@ after that
15:06:34 <eedri> dneary, ok. thanks
15:06:38 <dneary> (after the update, that is)
15:07:10 <ewoud> I'm also quite overdue with setting a new meeting time
15:07:56 <ewoud> right, I think we're over time now, so any last items?
15:09:47 <ewoud> going once
15:09:48 <dneary> eedri, This was probably covered before I arrived, but we talked about putting together a "who has access to what / how to restart or fix service X if it's down or broken" page in the wiki
15:09:52 <dneary> Does anyone own that?
15:10:08 <ewoud> dneary: I don't really think so
15:10:55 <ewoud> we did discuss it a few times, and I think the closest we came was http://lists.ovirt.org/pipermail/infra/2013-April/002625.html
15:11:22 <dneary> ewoud, Can we put a name and a deadline on it?
15:11:39 <dneary> If it doesn't get done by then, fair enough - but at least we'll be able to check progress each week
15:11:59 <ewoud> dneary: do I hear a volunteer? :)
15:14:34 <lhornyak> eedri: is there a jenkins job that runs the engine junit tests?
15:17:09 <dneary> ewoud, I wish I could
15:17:14 <dneary> I don't have most of the information
15:17:23 <dneary> Nor a decent chunk of time
15:17:44 <dneary> theron, Do you have some time?
15:17:56 <theron> dneary, I do. but we have a call in 15.
15:18:14 <dneary> theron, I mean, in the next month or so, to put together ^^^
15:18:25 <theron> dneary, yes lol :)
15:18:42 <dneary> It doesn't have to be done in the next 15 mins
15:18:50 <dneary> Although if it were, that would be cool :-)
15:18:53 <ewoud> dneary: I'm also quite lacking time
15:19:12 <dneary> ewoud, Seems like Theron just "volunteered" :-)
15:19:13 <ewoud> dneary: we need to compile more info from the ML to the wiki
15:19:35 <theron> dneary, I can certainly "try"
15:20:25 <theron> dneary, we'll need to sort it out, certainly.
15:22:34 <eedri> lhornyak, yes
15:23:04 <ewoud> #action theron compile a list of services and who has access
15:23:09 <ewoud> #endmeeting