Discussion: [Openstack-operators] Configuring local instance storage
Arne Wiebalck
2014-05-08 12:18:07 UTC
Hi all,

In our cloud we use the non-shared local filesystem of the compute nodes for instance storage. As our cloud gets busier, this is increasingly becoming a serious bottleneck.

We saw that tuning the configuration, such as the I/O scheduler, can improve things significantly, but in the end things are of course limited by the h/w used in the hypervisors (which are RAIDed spinning disks in our case). For now, we don't want to go down the road of local shared storage (due to the additional complexity of setting this up) nor the road of non-local shared storage (so as to continue to profit from the limited impact of storage failures). So we are considering adding SSDs to our hardware setup, which could then either be used directly or via a block-level caching mechanism (e.g. bcache).
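
For concreteness, a minimal sketch of the kind of I/O-scheduler tuning referred to above, assuming a Linux hypervisor that exposes the scheduler through sysfs; the device name and scheduler choice are placeholders, not a recommendation:

    # Minimal sketch: switch a block device's I/O scheduler via sysfs.
    # Assumes a Linux hypervisor; device name and scheduler are placeholders.
    import sys

    def set_io_scheduler(device="sda", scheduler="deadline"):
        path = "/sys/block/%s/queue/scheduler" % device
        with open(path) as f:
            print("before: %s" % f.read().strip())   # e.g. "noop [deadline] cfq"
        with open(path, "w") as f:                   # needs root
            f.write(scheduler)
        with open(path) as f:
            print("after:  %s" % f.read().strip())

    if __name__ == "__main__":
        set_io_scheduler(*sys.argv[1:3])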

When discussing the various options to set this up, we were wondering how other clouds deal with the problem of compute disk contention in general and the integration of SSDs in particular.

So, any suggestions or experiences in this area you'd like to share would be very welcome!

Thanks!
Arne

--
Arne Wiebalck
CERN IT

Robert van Leeuwen
2014-05-08 13:03:09 UTC
> In our cloud we use the non-shared local filesystem of the compute nodes for instance storage.
> As our cloud gets busier, this is increasingly becoming a serious bottleneck.
>
> When discussing the various options to set this up, we were wondering how other clouds deal with the problem of compute disk contention in general and the integration
> of SSDs in particular.
>
> So, any suggestions or experiences in this area you'd like to share would be very welcome!

Hi Arne,

We run all our compute nodes with SSDs for local storage.

We optimized for two different flavors; based on the flavor, an instance ends up on the right type of hypervisor (one way to wire this up with host aggregates is sketched below).
* normal instances: these are hosted on an SSD RAID 1 and use the QCOW2 disk format
* fastio instances (e.g. for our database team): these are hosted on a bigger RAID 10 volume of SSDs and use the RAW disk format
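
A minimal sketch of one common way to get this flavor-based placement, using a host aggregate plus a flavor extra spec via python-novaclient; the credentials, names, sizes, and host list are placeholders, and nova's scheduler needs the AggregateInstanceExtraSpecsFilter enabled for the extra spec to take effect:

    # Sketch: steer a flavor to specific hypervisors via a host aggregate.
    # Credentials, names, and host list are placeholders.
    from novaclient import client

    nova = client.Client("2", "admin", "secret", "admin", "http://keystone:5000/v2.0")

    # Tag the fast-IO hypervisors with an aggregate...
    agg = nova.aggregates.create("fastio-ssd", None)
    nova.aggregates.set_metadata(agg.id, {"fastio": "true"})
    for host in ["compute-ssd-01", "compute-ssd-02"]:
        nova.aggregates.add_host(agg.id, host)

    # ...and bind a flavor to that aggregate via an extra spec.
    flavor = nova.flavors.create("m1.fastio", ram=8192, vcpus=4, disk=80)
    flavor.set_keys({"aggregate_instance_extra_specs:fastio": "true"})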

We noticed a very big impact of QCOW2 vs. RAW in our IOPS tests:
about a factor of 10 with random 16k writes.

Since we have mostly internal customers, we were also able to optimize the images.
We made sure they do not do any unnecessary I/O,
e.g. no local logging; everything goes to our central log servers.

Cheers,
Robert van Leeuwen
Tim Bell
2014-05-08 15:57:30 UTC
Robert,

The difference between RAW and QCOW2 is pretty significant... what hypervisor are you using?

Have you seen scenarios where the two SSDs fail at the same time? Red Hat was recommending against mirroring SSDs since, with the same write pattern, the failure points for SSDs from the same batch would be close together.

Tim

Abel Lopez
2014-05-08 19:50:58 UTC
I second that question. Using KVM at least, I couldn't find any significant difference between QCOW2- and RAW-based images.
By "significant" I mean enough to justify tossing the benefits of QCOW2.


Robert van Leeuwen
2014-05-09 06:13:29 UTC
We are using KVM.
What I noticed on the hypervisor was that it was actually doing lots of reads when running the benchmark on QCOW2 images.
I simulated a MySQL workload on the guest OS: I created a 20GB file and did 100% random 16K write IOPS in it.
On QCOW2 we got about 600 IOPS, on RAW about 5000.
With RAW there were no reads while writing, whereas with QCOW2 I saw 100MB+ of reads per second.
I would be happy to know if we can improve this somehow :)
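
For reference, a minimal sketch of the workload described above (100% random 16K writes into a large pre-created file, reporting IOPS); the path, size, and duration are placeholders, and a real measurement would normally use a tool like fio:

    # Sketch of the test described above: random 16K writes into a large
    # pre-created file, reporting IOPS. Path, size and duration are placeholders.
    import os, random, time

    PATH = "/var/lib/nova/instances/bench.dat"   # placeholder location
    FILE_SIZE = 20 * 1024 ** 3                   # 20GB file, as above
    BLOCK = 16 * 1024                            # 16K writes
    DURATION = 30                                # seconds

    buf = os.urandom(BLOCK)
    nblocks = FILE_SIZE // BLOCK

    fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, FILE_SIZE)                  # sparse; pre-fill for a fair test

    writes, start = 0, time.time()
    while time.time() - start < DURATION:
        os.lseek(fd, random.randrange(nblocks) * BLOCK, os.SEEK_SET)
        os.write(fd, buf)
        os.fsync(fd)                             # push each write to stable storage
        writes += 1
    os.close(fd)
    print("random 16K write IOPS: %.0f" % (writes / (time.time() - start)))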

We are monitoring the lifetime of the SSDs and will do some preventive swapping:
there are lifetime estimations you can read from the SSDs.
Luckily the array controller we have lets us query those stats per disk even when they are in a RAID :)
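
A minimal sketch of reading such lifetime estimations with smartctl from smartmontools; the attribute names vary by vendor (these are common examples only), and disks behind a RAID controller typically need an extra device-type option:

    # Sketch: read SSD wear/lifetime attributes via smartctl (smartmontools).
    # Attribute names differ per vendor; disks behind a RAID controller
    # usually need an option such as "-d megaraid,N".
    import subprocess

    WEAR_ATTRS = ("Media_Wearout_Indicator", "Wear_Leveling_Count", "SSD_Life_Left")

    def ssd_wear(device, extra_args=()):
        out = subprocess.check_output(["smartctl", "-A", device] + list(extra_args))
        for line in out.decode(errors="replace").splitlines():
            if any(attr in line for attr in WEAR_ATTRS):
                print(line.strip())

    ssd_wear("/dev/sda")
    # Behind a MegaRAID controller, something like:
    # ssd_wear("/dev/sda", extra_args=["-d", "megaraid,0"])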

Cheers,
Robert van Leeuwen


Arne Wiebalck
2014-05-09 09:50:00 UTC
Any experiences with "unexpected" SSD failures, i.e. failures that were not predicted? I am asking this as we're considering using block-level caching on non-RAIDed SSDs, and I'd like to get a feeling for how much we have to reflect this in the SLA :)

Thanks!
Arne

Robert van Leeuwen
2014-05-09 11:07:53 UTC
We started with flashcache devices for our Swift cluster about 2 1/2 years ago.
These have one Corsair Force 3 each, and their lifetime is now at about 50%.

Using "non-supported" SSDs on "a-brand" hardware is a bit of a pain though:
We had "unexpected" faillures on SSDs we could not monitor due to a raid controller in between.
These had a RAID 0 of 2 240GB ssds and we had 2 graphite nodes with the same data.
Those failed at the same time so pretty sure they ran out. I think it was predicted if we could have read the data :
The SSDs were doing 8K IOPS / 40MB sustained for about 2 years or about 2PB of written data.
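
As a back-of-envelope check of that figure, 40MB/s sustained for two years indeed works out to roughly 2.5PB:

    # Rough check of the write volume quoted above: 40MB/s sustained for ~2 years.
    mb_per_s = 40
    seconds = 2 * 365 * 24 * 3600
    total_bytes = mb_per_s * 1e6 * seconds
    print("%.1f PB written" % (total_bytes / 1e15))   # ~2.5 PB, i.e. "about 2PB"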

We moved to Intel SSDs when we switched to Supermicro hardware, where they are supported, and we are very happy with them so far.

In general the SSDs seem pretty reliable (better than spinning platters).
We are not yet close enough to the expected (write) lifetimes to be 100% sure the counters are perfect, but up to now it looks okay.
Our plan is to do some preventive swapping just to make sure we do not end up with a big problem.

Cheers,
Robert van Leeuwen
George Shuklin
2014-05-11 18:59:21 UTC
It's simple: use SSD only. The price difference between 15k SAS and SSD is negative (15k is more costly), and the difference between 10k SAS and SSD is small. SSD is pricier than low-end SATA, but I wouldn't want to be near a system running instances on a SATA array.

So the answer is simple: SSD. Use RAID 5/6 to save a bit of money in exchange for some write performance loss; it will still be much faster than spindles.
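
To make the write-performance trade-off concrete, a rough rule-of-thumb estimate (the per-disk IOPS figure and disk count below are purely illustrative):

    # Rule-of-thumb random-write penalties: every front-end write costs
    # ~2 back-end I/Os on RAID 10, ~4 on RAID 5, ~6 on RAID 6.
    WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

    def effective_write_iops(disks, per_disk_iops, level):
        return disks * per_disk_iops // WRITE_PENALTY[level]

    for level in ("raid10", "raid5", "raid6"):
        print(level, effective_write_iops(disks=6, per_disk_iops=20000, level=level))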