<div dir="ltr">I see. Valid points.<div>Whenever you break a production site - do you try to add a test which simulates the parameters of the breakage?</div><div>It sounds to me like some sort of image versioning could still help here; that way you can really "roll back" (actually boot to a previous version of the image) properly.</div><div>For instance, VyOS (<a href="http://vyos.net/wiki/Upgrade">http://vyos.net/wiki/Upgrade</a>) rolls out new versions this way. I'm not sure how exactly they do that, but the bottom line is that it's possible to upgrade to the next release and still save all the versions and configuration to roll back if you have to.</div><div class="gmail_extra"><br><div class="gmail_quote">On 7 August 2016 at 14:18, Elazar Leibovich <span dir="ltr"><<a href="mailto:elazarl@gmail.com" target="_blank">elazarl@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr">It's a radio antenna.<div><br></div><div>It is of course tested beforehand to some extent, in a "staging" environment.</div><div><br></div><div>But since the physical environment varies, and sometimes antenna-related parameters change between releases (e.g., duration of receive time), it is not easy to know you're not breaking something for someone by mistake.</div><div><br></div><div>It could be, for example, the physical location of the antenna at the client which would make a difference.<br><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Aug 6, 2016 at 2:27 AM, Amos Shapira <span dir="ltr"><<a href="mailto:amos.shapira@gmail.com" target="_blank">amos.shapira@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr">What kind of hardware is this that's 
connected to the servers, and what does the software do that you can't test before installing on production servers?</div><div class="gmail_extra"><div><div><br><div class="gmail_quote">On 6 August 2016 at 02:14, Elazar Leibovich <span dir="ltr"><<a href="mailto:elazarl@gmail.com" target="_blank">elazarl@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>All real servers, with custom hardware attached, geographically distributed across the planet.</div><div><br></div><div>Real people actually use the hardware attached to these computers, and it's not obvious how to test whether or not it failed.</div><div><br></div><div>The strategy therefore is: deploy randomly to a small percentage of the machines, wait to see if you get complaints from the customers using these hardware devices, and if everything went well, update the rest of the servers.</div><div><br></div><div>The provisioning solution is Chef, but I'm open to changing it. As I said, I don't think it makes too much difference.</div><div><br></div><div>As for immutable server images, I'd do it with ZFS/btrfs snapshots (+docker/machinectl/systemd-nspawn if you must have some sort of virtual environment), but it's probably a better idea than apt-get install pkg=oldversion. An immutable filesystem for execution is of course not enough, since you might have migrations for the mutable part, etc. 
In this particular case, I don't think it's a big deal.</div><div><br></div><div>You see, not everything is a web startup with a customer-facing website ;-)</div><div><br></div><div>Thanks,</div><div>Appreciate you sharing your experience.</div><div>I'm not disagreeing with your points, but in this particular case, where testing is expensive, not all of them seem valid.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 5, 2016 at 3:15 PM, Amos Shapira <span dir="ltr"><<a href="mailto:amos.shapira@gmail.com" target="_blank">amos.shapira@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>What provisioning tools do you use to manage these servers? Please tell me you aren't doing all of this manually.</div><div>Also what's your environment? All hardware servers? Any virtualisation involved? Cloud servers?</div><div><br></div>Reading your question it feels like you are setting yourself up to fail instead of minimising the failure altogether.<div><br></div><div>What I suggest is that you test your package automatically in a test environment (to me, Vagrant + RSpec/ServerSpec would be first candidates to check) then roll out the package to the repository for the servers to pick it up.</div><div><br></div><div>As for "roll-back" - with comprehensive automatic testing this concept is becoming obsolete: there is no such thing as "roll-back", only "roll-forward", i.e. since the testing and rolling out are small and "cheap", it should be feasible to fix whatever problem was found instead of having to revert the change altogether.</div><div><br></div><div>If you are in a properly supported virtual environment then I'd even go for immutable server images (e.g. 
Packer building AMIs, or Docker containers), then it's a matter of just firing up an instance of the new image both when testing and in production.</div><div><br></div><div>--Amos</div></div><div class="gmail_extra"><div><div><br><div class="gmail_quote">On 3 August 2016 at 16:55, Elazar Leibovich <span dir="ltr"><<a href="mailto:elazarl@gmail.com" target="_blank">elazarl@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr">How exactly you connect to the server is not in the scope of the discussion, and I agree that ansible is a sensible solution.<div><br></div><div>But what you're proposing is to manually update the package on a small percent of the machines.</div><div><br></div><div>A manual solution is fine, but I would like to hear the experience of people who actually did that on many servers.</div><div><br></div><div>There are many other issues, for example, how do you roll back?</div><div><br></div><div>apt-get remove exposes you to the risk that the uninstallation script would be buggy. 
There are other solutions, e.g., btrfs snapshots on root partitions, but I'm curious to hear from someone experienced with it who could expose issues I didn't even think of.</div><div><br></div><div>Another issue is, how do you select the servers you try it on?</div><div><br></div><div>You suggested a static "beta" list, and I think it's better to select the candidates randomly on each update.</div><div><br></div><div>Anyhow, how exactly you connect to the server is not the essence of the issue.</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 3, 2016 at 9:30 AM, Evgeniy Ginzburg <span dir="ltr"><<a href="mailto:nad.oby@gmail.com" target="_blank">nad.oby@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hello.</div><div>I'm assuming that you have passwordless ssh to the servers in question as root.</div><div>Also I assume that you don't use central management/deployment software (ansible/puppet/chef).</div>In similar cases I usually use parallel-ssh (gnu-parallel is another alternative).<div>First stage: install the package manually on one server to see that the configuration is OK, daemons restart, etc...</div><div>If this stage is OK, the second step will be creating a "complain" list of servers and installing the package on them through parallel-ssh.</div><div>Instead of waiting for complaints, one can define metrics to check and use some monitoring appliance for verification.</div><div>In case of failure, remove the package from the repository and remove-install again.</div><div>The third will be a parallel-ssh install on all the servers.</div><div><br></div><div>P. S. 
In the case of a few tens of servers I'd prefer to work with ansible or an alternative; it's worth it in most cases.</div><div><br></div><div>Best Regards, Evgeniy.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Tue, Aug 2, 2016 at 8:50 PM, Elazar Leibovich <span dir="ltr"><<a href="mailto:elazarl@gmail.com" target="_blank">elazarl@gmail.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div><div dir="ltr">Hi,<div><br></div><div>I have a few (say, a few tens) Debian machines, with a local repository defined.</div><div><br></div><div>In the local repository I have some home-made packages I'm building and pushing to the local repository.</div><div><br></div><div>When I'm upgrading my package, I want to be sure the update wouldn't cause a problem.</div><div><br></div><div>So I wish to install it on a small percentage of the machines and wait for complaints.</div><div><br></div><div>If complaints arrive - roll back.</div><div>Otherwise keep upgrading the rest of the machines.</div><div><br></div><div>I'd appreciate your advice and experience of similar situations,</div><div>I'd appreciate it if someone who had actual real life experience with this situation would mention it in the comments.</div><div><br></div><div>Thanks,</div></div>
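<div><br></div><div>The procedure described above (install on a small random share of the machines, wait for complaints, then roll back or continue) could be sketched roughly as follows. This is only an illustration under assumptions: the names pick_canaries and staged_rollout, and the install, rollback and healthy callbacks, are hypothetical and do not come from any existing tool. In practice the callbacks might wrap apt-get over ssh and a check for customer complaints.</div>
<pre>
```python
import math
import random


def pick_canaries(hosts, fraction, seed=None):
    """Split hosts into a random canary sample and the remaining hosts."""
    rng = random.Random(seed)
    # Pick at least one host, and never more hosts than exist.
    count = min(len(hosts), max(1, math.ceil(len(hosts) * fraction)))
    canaries = rng.sample(hosts, count)
    chosen = set(canaries)
    rest = [host for host in hosts if host not in chosen]
    return canaries, rest


def staged_rollout(hosts, fraction, install, rollback, healthy, seed=None):
    """Upgrade a random sample first; continue only if no complaints arrive.

    install(host) and rollback(host) are hypothetical callbacks that act on
    one machine. healthy(canaries) is a hypothetical check that returns True
    when no complaints were received for the sampled machines.
    """
    canaries, rest = pick_canaries(hosts, fraction, seed)
    for host in canaries:
        install(host)
    if not healthy(canaries):
        # Complaints arrived: undo the upgrade on the sampled machines only.
        for host in canaries:
            rollback(host)
        return "rolled_back"
    for host in rest:
        install(host)
    return "completed"
```
</pre>
<div>Because the sample is drawn again on every call, a different set of machines is tried on each update, rather than a static "beta" list. Passing a fixed seed makes a given selection reproducible, which may help when investigating a complaint afterwards.</div>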
<br></div></div>______________________________<wbr>_________________<br>
Linux-il mailing list<br>
<a href="mailto:Linux-il@cs.huji.ac.il" target="_blank">Linux-il@cs.huji.ac.il</a><br>
<a href="http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il" rel="noreferrer" target="_blank">http://mailman.cs.huji.ac.il/m<wbr>ailman/listinfo/linux-il</a><br>
<br></blockquote></div><span><font color="#888888"><br><br clear="all"><span><font color="#888888"><div><br></div>-- <br><div data-smartmail="gmail_signature">So long, and thanks for all the fish.</div>
</font></span></font></span></div><span><font color="#888888">
</font></span></blockquote></div><span><font color="#888888"><br></font></span></div><span><font color="#888888">
</font></span></div></div><span><font color="#888888"><br>
<br></font></span></blockquote></div><span><font color="#888888"><br><br clear="all"><span><font color="#888888"><div><br></div>-- <br></font></span></font></span></div></div><span><font color="#888888"><span><font color="#888888"><div data-smartmail="gmail_signature"><div dir="ltr"><a href="http://au.linkedin.com/in/gliderflyer" target="_blank"><img src="https://static.licdn.com/scds/common/u/img/webpromo/btn_viewmy_160x25.png"></a><br></div></div>
</font></span></font></span></div><span><font color="#888888">
</font></span></blockquote></div><span><font color="#888888"><br></font></span></div><span><font color="#888888">
</font></span></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div></font></span></div></div><span><font color="#888888"><span><font color="#888888">-- <br><div data-smartmail="gmail_signature"><div dir="ltr"><a href="http://au.linkedin.com/in/gliderflyer" target="_blank"><img src="https://static.licdn.com/scds/common/u/img/webpromo/btn_viewmy_160x25.png"></a><br></div></div>
</font></span></font></span></div>
</blockquote></div><br></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div data-smartmail="gmail_signature"><div dir="ltr"><a href="http://au.linkedin.com/in/gliderflyer" target="_blank"><img src="https://static.licdn.com/scds/common/u/img/webpromo/btn_viewmy_160x25.png"></a><br></div></div>
</div></div>