<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Storage on Luis Logs</title>
    <link>https://luislogs.com/categories/storage/</link>
    <description>Recent content in Storage on Luis Logs</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 30 Mar 2024 20:30:30 +0900</lastBuildDate><atom:link href="https://luislogs.com/categories/storage/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>K8s with external Ceph, disaster recovery, and StorageClass migration</title>
      <link>https://luislogs.com/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/</link>
      <pubDate>Sat, 30 Mar 2024 20:30:30 +0900</pubDate>
      
      <guid>https://luislogs.com/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/</guid>
<description>In the past couple of weeks I was able to source matching mini USFF PCs, upgrading the mini homelab from 14 CPU cores to 18! Along with this I decided to attach a 2.5GbE NIC and a 1TB NVMe drive to each device to be used for Ceph, allowing for hyper-converged infrastructure.
Ceph on its own is a huge topic. It has so many moving parts: monitors, metadata servers, OSDs, and placement groups, to name a few.</description>
<content:encoded><![CDATA[<p>In the past couple of weeks I was able to source matching mini USFF PCs, upgrading the mini homelab from 14 CPU cores to 18! Along with this I decided to attach a 2.5GbE NIC and a 1TB NVMe drive to each device to be used for Ceph, allowing for hyper-converged infrastructure.</p>
<p>
    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/minipcs/1.jpg" alt="Mini PCs">
    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/minipcs/2.jpg" alt="Mini PCs">
    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/minipcs/3.jpg" alt="Mini PCs">
  </p>

<p>Ceph on its own is a huge topic. It has so many moving parts: monitors, metadata servers, OSDs, and placement groups, to name a few. And yes, I am completely new to it. The good thing is that in today&rsquo;s age, every beautiful piece of software, together with the community backing it, stems from how well the user documentation is written. Here I am not just talking about Ceph but also Proxmox. Both have very well-written documentation to get you started, and if anything goes haywire, you have a plethora of resources to point you back in the right direction.

    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/pveplusceph.png" alt="PVE and Ceph">
  </p>
<p>That said, Proxmox nicely integrates Ceph, and the complete installation can be done from the GUI itself. All I had to do was create the VLANs and specify the interfaces to be used for the public and private networks. Though to be honest, this part took me some time and a fair amount of Ceph reinstallations to understand the networking side a bit better.</p>
<p>Today&rsquo;s agenda is to walk through how I was able to integrate my K8s cluster with an externally managed Ceph cluster, and how I am migrating to this new cluster by performing a disaster recovery procedure. I am also sharing the resources I used for this operation.</p>
<p>References:</p>
<ul>
<li><a href="https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster">https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster</a></li>
<li><a href="https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/">https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/</a></li>
<li><a href="https://github.com/catalogicsoftware/velero-docs/blob/main/velero-intro-pvs/velero-intro.md">https://github.com/catalogicsoftware/velero-docs/blob/main/velero-intro-pvs/velero-intro.md</a></li>
<li><a href="https://velero.io/docs/v1.13/migration-case/">https://velero.io/docs/v1.13/migration-case/</a></li>
<li><a href="https://velero.io/docs/v1.13/contributions/minio/">https://velero.io/docs/v1.13/contributions/minio/</a></li>
</ul>
<h2 id="current-situation">Current situation</h2>
<p>At the moment I have an RKE2 cluster running Longhorn as my storage solution for persistent volumes. It has worked wonderfully, safeguarding my data with replication. But the long-term goal had always been to leverage Proxmox&rsquo;s built-in Ceph management tool. Back then I did not have the appropriate hardware, but with the recent &ldquo;reasonable&rdquo; upgrades, I think now is the right time. There aren&rsquo;t any heavy workloads, and the majority of what is running is media management tools. The problem is that the virtual disk is 150GB, which takes up about a third of my 500GB SSD. Apart from this, Longhorn storage runs on top of the same disk as the RKE2 etcd DB, and everything converges on the same physical disk where Proxmox is also installed. The cluster uses a single storageclass for both RWO and RWX PVCs. It also runs in the same VLAN as the rest of my VMs, with a single vNIC attached for both k8s traffic and Longhorn replication traffic.</p>
<h2 id="planning-and-logistics">Planning and logistics</h2>
<p>The second cluster will have about 100GB total size (about 80GB of ephemeral storage after the OS). <a href="https://rook.io/">Rook</a> will be used to connect to the Proxmox-managed Ceph cluster for persistent volumes. With this, the cluster will have two storageclasses: <code>ceph-rbd</code> for RWO PVCs, and <code>cephfs</code> for RWX. Each VM will have two vNICs attached. The first vNIC will be assigned a new VLAN3 to be used solely for k8s traffic, and the second vNIC will run in another VLAN dedicated to Ceph traffic. <a href="https://velero.io/">Velero</a> will be used to migrate to the new cluster. <a href="https://min.io/">MinIO</a> will be used as the S3-compatible storage for storing the backups. Existing Cilium LB IPs will be re-used by re-learning BGP routes from the main router on the new VLAN.</p>
<p>I am working on a diagram for the network at home, and although it&rsquo;s still a work in progress, it should be more than enough to help you understand how things are connected and segmented.

    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/netdiag.png" alt="Toky-prod network diagram">
  </p>
<p>To expound more on this, at a high level we have two clusters: cluster 1 (old) and cluster 2 (new). The migration will go as follows:</p>
<ul>
<li>First we will prepare cluster 2 by integrating it with the external Ceph cluster.</li>
<li>Then install Velero on both k8s clusters.</li>
<li>On cluster 1, perform a backup of the resources.</li>
<li>Go back to cluster 2 and perform the restore. RWO PVCs should be migrated directly to ceph-rbd, and RWX PVCs to cephfs.</li>
<li>Block cluster 1 IPs (on VLAN20) from internet access to avoid issues with cert-manager.</li>
<li>Enable BGP peering on cluster 2 (VLAN3) and reconfigure BGP on OPNsense to route externally exposed IPs to this cluster.</li>
</ul>
<h2 id="installing-rook-and-integrating-with-an-external-cluster">Installing Rook and integrating with an external cluster</h2>
<p>Download create-external-cluster-resources.py:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"><span class="line"><span class="cl">curl -s https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/examples/create-external-cluster-resources.py &gt; create-external-cluster-resources.py
</span></span></code></pre></div><p>Without enabling prometheus you might get the following error after the next step:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-v" data-lang="v"><span class="line"><span class="cl"><span class="nc">ExecutionFailureException</span><span class="p">:</span> <span class="nv">can</span><span class="s1">&#39;t find monitoring_endpoint, prometheus module might not be enabled, enable the module by running &#39;</span><span class="nv">ceph</span> <span class="nv">mgr</span> <span class="kn">module</span> <span class="nv">enable</span> <span class="nv">prometheus</span><span class="s1">&#39;
</span></span></span></code></pre></div><p>To avoid this you can either enable prometheus with the command below or add <code>--skip-monitoring-endpoint</code> when executing the python script.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-v" data-lang="v"><span class="line"><span class="cl"><span class="nv">ceph</span> <span class="nv">mgr</span> <span class="kn">module</span> <span class="nv">enable</span> <span class="nv">prometheus</span>
</span></span></code></pre></div><p>Execute the script to generate the required environment variables:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl">python3 create-external-cluster-resources.py --rbd-data-pool-name tokyoceph --namespace rook-ceph-external --format bash
</span></span></code></pre></div><p>Sample output:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">NAMESPACE</span><span class="o">=</span>rook-ceph-external
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">ROOK_EXTERNAL_FSID</span><span class="o">=</span>31833ea9-541d-435d-96ef-e9c653e97d2b
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">ROOK_EXTERNAL_USERNAME</span><span class="o">=</span>client.healthchecker
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">ROOK_EXTERNAL_CEPH_MON_DATA</span><span class="o">=</span><span class="nv">pve1</span><span class="o">=</span>10.22.0.11:6789
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">ROOK_EXTERNAL_USER_SECRET</span><span class="o">=</span><span class="nv">AQAigfdlyQocHhAAtBlaM6EMLp9N6ysQoUUN2A</span><span class="o">==</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_RBD_NODE_SECRET</span><span class="o">=</span>AQAigfdlfB/OHhAAeVpSf6Ow4OE8pWjOU2XPkA<span class="o">==</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_RBD_NODE_SECRET_NAME</span><span class="o">=</span>csi-rbd-node
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_RBD_PROVISIONER_SECRET</span><span class="o">=</span><span class="nv">AQAigfdld8WCHxBBtakNxF2axoyXbghVzsLROA</span><span class="o">==</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_RBD_PROVISIONER_SECRET_NAME</span><span class="o">=</span>csi-rbd-provisioner
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CEPHFS_POOL_NAME</span><span class="o">=</span>tokyocephfs_data
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CEPHFS_METADATA_POOL_NAME</span><span class="o">=</span>tokyocephfs_metadata
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CEPHFS_FS_NAME</span><span class="o">=</span>tokyocephfs
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_CEPHFS_NODE_SECRET</span><span class="o">=</span>AQA6gvdlNsZGKxAAXbWkhSawMStb+CGwV7Lurw<span class="o">==</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_CEPHFS_PROVISIONER_SECRET</span><span class="o">=</span>AQA6gvdlP3MhLBAA+vykqaFjLh20bagnGNwGsg<span class="o">==</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_CEPHFS_NODE_SECRET_NAME</span><span class="o">=</span>csi-cephfs-node
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">CSI_CEPHFS_PROVISIONER_SECRET_NAME</span><span class="o">=</span>csi-cephfs-provisioner
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">MONITORING_ENDPOINT</span><span class="o">=</span>10.22.0.12
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">MONITORING_ENDPOINT_PORT</span><span class="o">=</span><span class="m">9283</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">RBD_POOL_NAME</span><span class="o">=</span>tokyoceph
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">RGW_POOL_PREFIX</span><span class="o">=</span>default
</span></span></code></pre></div><p>Paste the output into your shell environment, either in the CLI or in a shell configuration file, e.g. <code>~/.bashrc</code> or <code>~/.zshrc</code>. If you are pasting into your shell configuration file, don&rsquo;t forget to reload.</p>
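<p>For example, assuming zsh and that the variables were appended to <code>~/.zshrc</code>, reloading and spot-checking one of them looks like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"># reload the shell configuration so the exports take effect
source ~/.zshrc
# spot-check that the variables are now set
echo $ROOK_EXTERNAL_FSID
</code></pre></div>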
<p>Download the import script and run it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl">curl -s https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/examples/import-external-cluster.sh &gt; import-external-cluster.sh
</span></span><span class="line"><span class="cl">./import-external-cluster.sh
</span></span></code></pre></div><p>This import script reads the environment variables and creates the necessary resources to connect to your external Ceph cluster. The following resources should get created:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"><span class="line"><span class="cl">namespace/rook-ceph-external created
</span></span><span class="line"><span class="cl">secret/rook-ceph-mon created
</span></span><span class="line"><span class="cl">configmap/rook-ceph-mon-endpoints created
</span></span><span class="line"><span class="cl">secret/rook-csi-rbd-node created
</span></span><span class="line"><span class="cl">secret/rook-csi-rbd-provisioner created
</span></span><span class="line"><span class="cl">secret/rook-csi-cephfs-node created
</span></span><span class="line"><span class="cl">secret/rook-csi-cephfs-provisioner created
</span></span><span class="line"><span class="cl">storageclass.storage.k8s.io/ceph-rbd created
</span></span><span class="line"><span class="cl">storageclass.storage.k8s.io/cephfs created
</span></span></code></pre></div><p>Now it&rsquo;s time to install Rook! Take note of the first two <code>export</code> lines in the commands below: they specify the namespace the Rook operator will run in and the namespace for the external Ceph cluster. You can use a common namespace if you wish.</p>
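<p>If the Rook chart repository is not yet registered on your machine, add it first (this is the standard Rook Helm repo; adjust if you mirror charts elsewhere):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"># register the chart repository referenced by the install commands below
helm repo add rook-release https://charts.rook.io/release
helm repo update
</code></pre></div>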
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">operatorNamespace</span><span class="o">=</span><span class="s2">&#34;rook-ceph&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nb">export</span> <span class="nv">clusterNamespace</span><span class="o">=</span><span class="s2">&#34;rook-ceph-external&#34;</span>
</span></span><span class="line"><span class="cl">curl -s https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/charts/rook-ceph/values.yaml &gt; values.yaml
</span></span><span class="line"><span class="cl">curl -s https://raw.githubusercontent.com/rook/rook/release-1.13/deploy/charts/rook-ceph-cluster/values-external.yaml &gt; values-external.yaml
</span></span><span class="line"><span class="cl">helm install --create-namespace --namespace <span class="nv">$operatorNamespace</span> rook-ceph rook-release/rook-ceph -f values.yaml
</span></span><span class="line"><span class="cl">helm install --create-namespace --namespace <span class="nv">$clusterNamespace</span> rook-ceph-cluster <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>--set <span class="nv">operatorNamespace</span><span class="o">=</span><span class="nv">$operatorNamespace</span> rook-release/rook-ceph-cluster -f values-external.yaml
</span></span></code></pre></div><p>Logging the important notes post-Helm installation (this is for personal reference and can be skipped):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Operator installation</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Important Notes:</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - You must customize the &#39;CephCluster&#39; resource in the sample manifests for your cluster.</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - Any disk devices you add to the cluster in the &#39;CephCluster&#39; must be empty (no filesystem and no partitions).</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># CephCluster installation</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Important Notes:</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - Visit https://rook.io/docs/rook/latest/CRDs/ceph-cluster-crd/ for more information about the Ceph CRD.</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - You can only deploy a single cluster per namespace</span>
</span></span><span class="line"><span class="cl"><span class="c1"># - If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`</span>
</span></span></code></pre></div><p>In a few minutes you should be able to see a successful connection to your ceph cluster. You can also query the storageclasses created as part of the import script.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">❯ k -n rook-ceph-external get cephcluster
</span></span><span class="line"><span class="cl">NAME                 DATADIRHOSTPATH   MONCOUNT   AGE     PHASE       MESSAGE                          HEALTH      EXTERNAL   FSID
</span></span><span class="line"><span class="cl">rook-ceph-external   /var/lib/rook     3          2m53s   Connected   Cluster connected successfully   HEALTH_OK   true       31833ea9-541d-435d-96ef-e9c653e97d2b
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">❯ k get sc
</span></span><span class="line"><span class="cl">NAME       PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
</span></span><span class="line"><span class="cl">ceph-rbd   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   3h10m
</span></span><span class="line"><span class="cl">cephfs     rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   3h10m
</span></span></code></pre></div><h2 id="installing-velero-on-cluster-1-and-cluster-2">Installing velero on Cluster 1 and Cluster 2</h2>
<p>For this I prepared five files:</p>
<ul>
<li>credentials-velero</li>
<li>velero-exclude-nfs-volumepolicy.yaml</li>
<li>minio-service-endpoint.yaml</li>
<li>1_velero-change-sc-configmap.yaml</li>
<li>2_velero-change-sc-cephfs-configmap.yaml</li>
</ul>
<p>Before installation, ensure that you have S3-compatible storage for the backups. I am using MinIO hosted on my Openmediavault NAS. To be able to reach it from within my cluster, I have to manually create a Service and Endpoints pointing to the IP and port reachable by the nodes. You can follow this YAML file and change the IP and port as necessary:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Namespace</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">external</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nn">---</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Service</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">external</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">minio</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">app</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">9000</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">targetPort</span><span class="p">:</span><span class="w"> </span><span class="m">9000</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">web</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">9001</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">targetPort</span><span class="p">:</span><span class="w"> </span><span class="m">9001</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">clusterIP</span><span class="p">:</span><span class="w"> </span><span class="l">None</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">ClusterIP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nn">---</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Endpoints</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">external</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">minio</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">subsets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>- <span class="nt">addresses</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">ip</span><span class="p">:</span><span class="w"> </span><span class="m">10.0.0.8</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">app</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">9000</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">web</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">9001</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span></code></pre></div><p>Next we install velero on both clusters. To connect to an S3-compatible we must install the velero plugin for AWS apart from specifying it as the provider.</p>
<p>Input your S3 access and secret key in the following format:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">[default]
</span></span><span class="line"><span class="cl">aws_access_key_id = inputs3SecretKeyIdHere
</span></span><span class="line"><span class="cl">aws_secret_access_key = inputs3SecretKeyHere
</span></span></code></pre></div><p>In the same install command we specify the backup location and point to the MinIO URL. We can follow the cluster FQDN format of <em>svc-name</em>.<em>namespace</em>.svc.cluster.local to reach this externally hosted service from within the pods.</p>
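<p>Note that the install command below expects a bucket named <code>velero</code> to already exist. If you haven&rsquo;t created one, a sketch using the MinIO client (the alias name and credentials are placeholders):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"># point mc at the MinIO instance and create the backup bucket
mc alias set nas http://10.0.0.8:9000 YOUR_ACCESS_KEY YOUR_SECRET_KEY
mc mb nas/velero
</code></pre></div>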
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sh" data-lang="sh"><span class="line"><span class="cl">velero install <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --provider aws <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --plugins velero/velero-plugin-for-aws:v1.9.1,velero/velero-plugin-for-csi:v0.7.0 <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --use-node-agent --features<span class="o">=</span>EnableCSI <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --bucket velero <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --secret-file ./credentials-velero <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --use-volume-snapshots<span class="o">=</span><span class="nb">false</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --backup-location-config <span class="nv">region</span><span class="o">=</span>minio,s3ForcePathStyle<span class="o">=</span><span class="s2">&#34;true&#34;</span>,s3Url<span class="o">=</span>http://minio.external.svc.cluster.local:9000
</span></span></code></pre></div><p>Execute <code>velero backup-location get</code> (or <code>v backup-location get</code> if, like me, you have aliased the CLI and configured auto-completion):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">❯ v backup-location get
</span></span><span class="line"><span class="cl">NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
</span></span><span class="line"><span class="cl">default   aws        velero          Available   2024-03-23 08:15:37 +0900 JST   ReadWrite     true
</span></span></code></pre></div><h2 id="perform-backup-on-cluster-1">Perform backup on Cluster 1</h2>
<p>First we create a configmap to disable backup of NFS volumes. We do not want NFS volumes to be part of the backup since these are expected to be remounted later on.</p>
<p>Create a file with the following content:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">volumePolicies</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span>- <span class="nt">conditions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c"># nfs could be empty which matches any nfs volume source</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">nfs</span><span class="p">:</span><span class="w"> </span>{}<span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">action</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">skip</span><span class="w">
</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">kubectl -n velero create cm exclude-nfs-volumepolicy --from-file velero-exclude-nfs-volumepolicy.yaml
</span></span></code></pre></div><p>Create two backups: one for all RWX volumes, and another for all RWO volumes. In my case the only application using an RWX volume is adguard, so I can specify its namespace here. I also don&rsquo;t want the Cilium resources that are created dynamically, so those are excluded:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"><span class="line"><span class="cl">v backup create all-rwx-fs-backup --default-volumes-to-fs-backup<span class="o">=</span><span class="nb">true</span> --exclude-resources storageclasses.storage.k8s.io,ciliumendpoints.cilium.io,ciliumidentities.cilium.io --include-namespaces adguard --resource-policies-configmap exclude-nfs-volumepolicy
</span></span></code></pre></div><p>For RWO volumes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"><span class="line"><span class="cl">v backup create all-rwo-fs-backup --default-volumes-to-fs-backup<span class="o">=</span><span class="nb">true</span> --exclude-namespaces kube-system,longhorn-system,velero,adguard --exclude-resources storageclasses.storage.k8s.io,ciliumendpoints.cilium.io,ciliumidentities.cilium.io,ciliumbgppeeringpolicies.cilium.io --resource-policies-configmap exclude-nfs-volumepolicy --parallel-files-upload <span class="m">10</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">❯ v backup get
</span></span><span class="line"><span class="cl">NAME                STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
</span></span><span class="line"><span class="cl">all-rwo-fs-backup   Completed   0        5          2024-03-23 09:31:50 +0900 JST   29d       default            &lt;none&gt;
</span></span><span class="line"><span class="cl">all-rwx-fs-backup   Completed   0        0          2024-03-23 09:31:28 +0900 JST   29d       default            &lt;none&gt;
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">❯ k get pv
</span></span><span class="line"><span class="cl">NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
</span></span><span class="line"><span class="cl">adguard-conf-pv        1Gi        RWX            Retain           Bound    adguard/adguard-conf-pvc           longhorn                20d
</span></span><span class="line"><span class="cl">adguard-work-pv        2Gi        RWX            Retain           Bound    adguard/adguard-work-pvc           longhorn                20d
</span></span><span class="line"><span class="cl">code-server-pv         500Mi      RWO            Retain           Bound    vsc/code-server-config-pvc         longhorn                20d
</span></span><span class="line"><span class="cl">dashy-config-pv        200Mi      RWO            Retain           Bound    dashy/dashy-config-pvc             longhorn                20d
</span></span><span class="line"><span class="cl">grafana-pv             10Gi       RWO            Retain           Bound    grafana/grafana-pvc                longhorn                20d
</span></span><span class="line"><span class="cl">influxdb-data-pv       10Gi       RWO            Retain           Bound    influxdb/influxdb-data-pvc         longhorn                20d
</span></span><span class="line"><span class="cl">jellyseerr-config-pv   512Mi      RWO            Retain           Bound    jellyseerr/jellyseerr-config-pvc   longhorn                20d
</span></span><span class="line"><span class="cl">radarr-config-pv       1Gi        RWO            Retain           Bound    radarr/radarr-config-pvc           longhorn                20d
</span></span><span class="line"><span class="cl">sabnzbd-config-pv      512Mi      RWO            Retain           Bound    sabnzbd/sabnzbd-config-pvc         longhorn                20d
</span></span><span class="line"><span class="cl">sonarr-config-pv       1Gi        RWO            Retain           Bound    sonarr/sonarr-config-pvc           longhorn                20d
</span></span><span class="line"><span class="cl">wikijs-config-pv       512Mi      RWO            Retain           Bound    wikijs/wikijs-config-pvc           longhorn                20d
</span></span><span class="line"><span class="cl">wikijs-data-pv         10Gi       RWO            Retain           Bound    wikijs/wikijs-data-pvc             longhorn                20d
</span></span></code></pre></div><h1 id="perform-restoration-on-cluster-2">Perform restoration on Cluster 2</h1>
<p>Verify that the same backups are reflected with <code>velero backup get</code>. Then patch the backup-location as read-only to avoid overwriting the backups:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"><span class="line"><span class="cl">kubectl patch backupstoragelocation default <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --namespace velero <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --type merge <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --patch <span class="s1">&#39;{&#34;spec&#34;:{&#34;accessMode&#34;:&#34;ReadOnly&#34;}}&#39;</span>
</span></span></code></pre></div><p>First we will restore all RWO volumes. Since we are changing the storageclass from longhorn to ceph-rbd, we have to create a configmap to allow this conversion. Apply 1_velero-change-sc-configmap.yaml:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ConfigMap</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">change-storage-class-config</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">velero</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">velero.io/plugin-config</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">velero.io/change-storage-class</span><span class="p">:</span><span class="w"> </span><span class="l">RestoreItemAction</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">longhorn</span><span class="p">:</span><span class="w"> </span><span class="l">ceph-rbd</span><span class="w">
</span></span></span></code></pre></div><p>Execute the restoration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">v restore create --from-backup all-rwo-fs-backup
</span></span></code></pre></div><p>Wait for the pods to come up and start running.

    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/watchpods.png" alt="Watch kubectl get pods">
  </p>
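<p>While waiting, the restore itself can also be monitored from the CLI; <code>velero restore describe</code> with <code>--details</code> shows per-volume progress (the restore name below is an example; use whatever <code>v restore get</code> reports):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"># list restores, then inspect one in detail (name is an example)
v restore get
v restore describe all-rwo-fs-backup-20240323093000 --details
</code></pre></div>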
<p>Next we update the configmap to migrate Longhorn storageclasses to cephfs. Apply 2_velero-change-sc-cephfs-configmap.yaml:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ConfigMap</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">change-storage-class-config</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">velero</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">labels</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">velero.io/plugin-config</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">velero.io/change-storage-class</span><span class="p">:</span><span class="w"> </span><span class="l">RestoreItemAction</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">longhorn</span><span class="p">:</span><span class="w"> </span><span class="l">cephfs</span><span class="w">
</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">v restore create --from-backup all-rwx-fs-backup
</span></span></code></pre></div><p>You can check the restoration status with:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">v restore get
</span></span></code></pre></div><p>
    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/restorecomplete.png" alt="Restore complete">
  </p>
<p>Have a look at your PVs and make sure everything is there. Check the storageclasses as well.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">❯ k get pv
</span></span><span class="line"><span class="cl">NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
</span></span><span class="line"><span class="cl">pvc-002f450b-1765-4a2d-8900-916f83ac4fec   1Gi        RWO            Delete           Bound    sonarr/sonarr-config-pvc           ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-5613d3f3-473a-47ed-8e3f-8bedd3cb36c4   10Gi       RWO            Delete           Bound    grafana/grafana-pvc                ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-5d088e43-fd9e-4e08-ac4c-c095a681b18b   500Mi      RWO            Delete           Bound    vsc/code-server-config-pvc         ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-80d7d785-9c82-4995-808e-dfb7985c910e   512Mi      RWO            Delete           Bound    jellyseerr/jellyseerr-config-pvc   ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-853f057a-c851-474d-8825-3bfbe2eb6573   10Gi       RWO            Delete           Bound    wikijs/wikijs-data-pvc             ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-8f5a7b54-cffa-4453-909a-519e9dc74f83   512Mi      RWO            Delete           Bound    wikijs/wikijs-config-pvc           ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-a5734f5f-7f10-4e22-bcb0-9c2ab4d95023   10Gi       RWO            Delete           Bound    influxdb/influxdb-data-pvc         ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-b9a70ed6-1a60-4e2b-a1fe-1de19d64c1f5   512Mi      RWO            Delete           Bound    sabnzbd/sabnzbd-config-pvc         ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-c7d935b3-2194-43ac-96b7-5eb184cb716a   200Mi      RWO            Delete           Bound    dashy/dashy-config-pvc             ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-f0d2af55-b1d9-4d76-b929-03007c787529   1Gi        RWO            Delete           Bound    radarr/radarr-config-pvc           ceph-rbd                168m
</span></span><span class="line"><span class="cl">pvc-f7e848cb-4ce8-47c9-8a81-65775fd1c747   1Gi        RWX            Delete           Bound    adguard/adguard-conf-pvc           cephfs                  161m
</span></span><span class="line"><span class="cl">pvc-fa9a9b75-48e8-48f0-8e53-75105b80b913   2Gi        RWX            Delete           Bound    adguard/adguard-work-pvc           cephfs                  161m
</span></span></code></pre></div><p>By now we should be able to see Ceph logs being generated continuously as the cluster serves I/O:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">2024-03-24T10:25:57.822352+0900 mgr.pve2 (mgr.1064599) 46085 : cluster 0 pgmap v46023: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 17 KiB/s wr, 2 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:25:59.822721+0900 mgr.pve2 (mgr.1064599) 46086 : cluster 0 pgmap v46024: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 19 KiB/s wr, 3 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:01.823161+0900 mgr.pve2 (mgr.1064599) 46087 : cluster 0 pgmap v46025: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 27 KiB/s wr, 4 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:03.823408+0900 mgr.pve2 (mgr.1064599) 46088 : cluster 0 pgmap v46026: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 19 KiB/s wr, 3 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:05.823949+0900 mgr.pve2 (mgr.1064599) 46089 : cluster 0 pgmap v46027: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 29 KiB/s wr, 4 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:07.824176+0900 mgr.pve2 (mgr.1064599) 46090 : cluster 0 pgmap v46028: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 22 KiB/s wr, 2 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:09.824519+0900 mgr.pve2 (mgr.1064599) 46091 : cluster 0 pgmap v46029: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 23 KiB/s wr, 3 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:11.824919+0900 mgr.pve2 (mgr.1064599) 46092 : cluster 0 pgmap v46030: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 51 KiB/s wr, 6 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:13.825139+0900 mgr.pve2 (mgr.1064599) 46093 : cluster 0 pgmap v46031: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 41 KiB/s wr, 5 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:15.825790+0900 mgr.pve2 (mgr.1064599) 46094 : cluster 0 pgmap v46032: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 53 KiB/s wr, 7 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:17.826087+0900 mgr.pve2 (mgr.1064599) 46095 : cluster 0 pgmap v46033: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 42 KiB/s wr, 6 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:19.826454+0900 mgr.pve2 (mgr.1064599) 46096 : cluster 0 pgmap v46034: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 44 KiB/s wr, 6 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:21.826811+0900 mgr.pve2 (mgr.1064599) 46097 : cluster 0 pgmap v46035: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 50 KiB/s wr, 7 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:23.827027+0900 mgr.pve2 (mgr.1064599) 46098 : cluster 0 pgmap v46036: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 21 KiB/s wr, 3 op/s
</span></span><span class="line"><span class="cl">2024-03-24T10:26:25.827427+0900 mgr.pve2 (mgr.1064599) 46099 : cluster 0 pgmap v46037: 97 pgs: 97 active+clean; 3.4 GiB data, 14 GiB used, 2.7 TiB / 2.7 TiB avail; 30 KiB/s wr, 4 op/s
</span></span></code></pre></div><h1 id="migrate-the-bgp-peers">Migrate the BGP peers</h1>
<p>Last but not least is to remove the BGP peering with the old cluster and configure it on the new one! If you are also using OPNsense, and given you have the BGP peer configuration resources already created in the cluster, then you just have to clone the existing BGP configuration (on OPNsense) for each of the old nodes, change the ASN, and update the router ID to be in the new VLAN.

    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/bgp.png" alt="OPNsense BGP peer configuration">
  </p>
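<p>For reference, on the cluster side the peering is declared through Cilium&rsquo;s BGP resources. Below is a minimal sketch of what such a CiliumBGPPeeringPolicy could look like, applied via a heredoc; every value, from the ASNs to the peer address on VLAN3 and the node label, is a placeholder rather than my actual configuration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-zsh" data-lang="zsh"># declare BGP peering from the k8s nodes to the router; all values are placeholders
cat &lt;&lt;EOF | kubectl apply -f -
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: tor-peering
spec:
  nodeSelector:
    matchLabels:
      bgp: enabled
  virtualRouters:
  - localASN: 64513
    exportPodCIDR: false
    neighbors:
    - peerAddress: &#34;10.3.0.1/32&#34;   # OPNsense gateway on the new VLAN
      peerASN: 64512
    serviceSelector:
      matchExpressions:
      - {key: somekey, operator: NotIn, values: [&#34;never-used-value&#34;]}
EOF
</code></pre></div>
<p>The <code>serviceSelector</code> above uses the common never-matching-exclusion trick to advertise all LoadBalancer IPs; the OPNsense side then only needs a neighbor entry per node on the new VLAN, which is exactly the cloning step described above.</p>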
<h1 id="onwards-with-ceph">Onwards with Ceph</h1>
<p>This sums up how I was able to migrate the storageclass from Longhorn to Ceph by performing disaster recovery to a new cluster. In the past I&rsquo;ve done a similar kind of operation, but by simply restoring Longhorn volumes with the same storageclass. It was fairly easy, and Longhorn even allows you to reuse the last PV and PVC names. The backups can be stored in either NFS or S3-compatible storage as well. But with Velero it just feels a lot more straightforward. Since the operation is via CLI, it allows you to prepare backup and restore commands that can be kept as a playbook for future use. By simply copy-pasting, you will be able to restore your entire cluster without having to navigate the GUI. Though don&rsquo;t get me wrong - I am only saying this in relation to backup and restore. I haven&rsquo;t even operated Longhorn long enough!

    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/ceph-pve-1.png" alt="Ceph dashboard on PVE">
  

    <img src="/posts/k8s-with-external-ceph-disaster-recovery-and-storageclass-migration/ceph-pve-2.png" alt="Ceph dashboard on PVE">
  </p>
<p>Talking about Ceph, I don&rsquo;t really have much to say yet. I just know it has long been established as the grandfather of distributed storage. The first time I came across it was at work, when we started to face DB latency issues that seemed to be due to the BlueStore cache and OSD memory facing some kind of memory leak. Personally, I just started using it as the storage backend of my RKE2 cluster, and so far connecting to it with Rook has been a breeze. Even with a very minimal workload, I still look forward to whatever new learning I can get from running my small cluster at home.</p>
]]></content:encoded>
    </item>
    
  </channel>
</rss>
