Coding Blocks

Pragmatic talk about software design best practices: design patterns, software architecture, coding for performance, object oriented programming, database design and implementation, tips, tricks and a whole lot more.

All Episodes

We wrap up the discussion on partitioning from our collective favorite book, Designing Data-Intensive Applications, while Allen is properly substituted, Michael can’t stop thinking about Kafka, and Joe doesn’t live in the real sunshine state. The full show notes for this episode are available at https://www.codingblocks.net/episode172. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines. Survey Says How many different data storage technologies do you use for your day job? Take the survey at: https://www.codingblocks.net/episode172. News Game Ja Ja Ja Jam is coming up, sign up is open now! (itch.io) Joe finished the Create With Code Unity Course (learn.unity.com) New MacBook Pro Review, notch be darned! Last Episode … Best book evar! In our previous episode, we talked about data partitioning, which refers to how you can split up data sets, which is great when you have data that’s too big to fit on a single machine, or you have special performance requirements. We talked about two different partitioning strategies: key ranges, which work best with homogeneous, well-balanced keys, and hashing, which provides a much more even distribution that helps avoid hot-spotting. This episode we’re continuing the discussion, talking about secondary indexes, rebalancing, and routing. Partitioning, Part Deux Partitioning and Secondary Indexes Last episode we talked about key range partitioning and key hashing to deterministically figure out where data should land based on a key that we chose to represent our data. But what happens if you need to look up data by something other than the key? For example, imagine you are partitioning credit card transactions by a hash of the date. If I tell you I need the data for last week, then it’s easy: we hash the date for each day in the week. But what happens if I ask you to count all the transactions for a particular credit card? You have to look at every single record, in every single partition! Secondary indexes refer to metadata about our data that helps keep track of where our data is. In our example about counting a user’s transactions in a data set that is partitioned by date, we could keep a separate data structure that keeps track of which partitions each user has data in. We could even easily keep a count of those transactions so that you could return the count of a user’s transactions solely from the information in the secondary index. Secondary indexes are complicated. HBase and Voldemort avoid them, while search engines like Elasticsearch specialize in them. There are two main strategies for secondary indexes: document based partitioning and term based partitioning. Document Based Partitioning Remember our example dataset of transactions partitioned by date? Imagine now that each partition keeps a list of each user it holds, as well as the key for the transaction. When you query for users, you simply ask each partition for the keys for that user. Counting is easy, and if you need the full record, then you know where the key is in the partition. Assuming you store the data in the partition ordered by key, it’s a quick lookup. Remember Big O? Finding an item in an ordered list is O(log n), which is much, much, much faster than looking at every row in every partition, which is O(n).
We have to take a small performance hit when we insert (i.e. write) new items to the index, but if it’s something you query often it’s worth it. Note that each partition only cares about the data it stores; it doesn’t know anything about what the other partitions have. Because of that, we call it a local index. Another name for this type of approach is “scatter/gather”: the data is scattered as you write it and gathered up again when you need it. This is especially nice when you have data retention rules. If you partition by date and only keep 90 days worth of data, you can simply drop old partitions and the secondary index data goes with them. Term Based Partitioning If we are willing to make our writes a little more complicated in exchange for more efficient reads, we can step up to term based partitioning. One problem with having each partition keep track of its own local data is that you have to query all the partitions. What if the data’s only on one partition? Our client still needs to wait to hear back from all partitions before returning the result. What if we pulled the index data away from the partitions to a separate system? Now we check this secondary index to figure out the keys, which we can then go look up on the appropriate partitions. We can go one step further and partition this secondary index so it scales better. For example, userId 1-100 might be on one, 101-200 on another, etc. The benefit of term based partitioning is that you get more efficient reads; the downside is that you are now writing to multiple spots: the node the data lives on and any partitions in our indexing system that hold the secondary indexes we need to update. And this is multiplied by replication. This is usually handled by asynchronous writes that are eventually consistent. Amazon’s DynamoDB states its global secondary indexes are updated within a fraction of a second normally. Rebalancing Partitions What do you do if you need to repartition your data, maybe because you’re adding more nodes for CPU or RAM, or because you’re losing nodes? Then it’s time to rebalance your partitions, with the goals being to … Distribute the load equally-ish (notice we didn’t say data; you could have some data that is more important, or mismatched nodes), Keep the database operational during the rebalance procedure, and Minimize data transfer to keep things fast and reduce strain on the system. Here’s how not to do it: hash % (number of nodes) Imagine you have 100 nodes and a key of 1000 hashes to 0. Going to 99 nodes, that same key now hashes to 10; at 102 nodes, it hashes to 82 … it’s a lot of change for a lot of keys. Partitions > Nodes You can mitigate this problem by fixing the number of partitions to a value higher than the number of nodes. This means you move where the partitions go, not the individual keys. The same recommendation applies to Kafka: keep the number of partitions high and you can change nodes. In our example of partitioning data by date, with a 7-year retention period, rebalancing from 10 nodes to 11 is easy. What if you have more nodes than partitions, like if you had so much data that a single day was too big for a node given the previous example? It’s possible, but most vendors don’t support it. You’ll probably want to choose a different partitioning strategy. Can you have too many partitions? Yes! If partitions are large, rebalancing and recovering from node failures is expensive. On the other hand, there is overhead for each partition, so having many, small partitions is also expensive.
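To put some numbers behind the hash % (number of nodes) warning above, here is a quick Python sketch of our own (not from the book or the episode); the key count, node counts, and the 1,024 fixed partitions are made-up illustration values. It counts how many keys have to move when a node is added under each scheme.

import hashlib
from collections import defaultdict

def stable_hash(key: str) -> int:
    # Deterministic hash; Python's built-in hash() is salted per process.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

KEYS = [f"key-{i}" for i in range(10_000)]
PARTITIONS = 1024  # fixed, chosen to be much larger than the node count

def keys_moved_with_mod(old_nodes: int, new_nodes: int) -> int:
    # Placing keys with hash % node_count: changing the node count remaps most keys.
    return sum(stable_hash(k) % old_nodes != stable_hash(k) % new_nodes for k in KEYS)

def keys_moved_with_fixed_partitions(old_nodes: int) -> int:
    # Keys always map to the same partition; adding a node moves whole partitions,
    # and only enough of them to even out the load.
    per_node = defaultdict(list)
    for p in range(PARTITIONS):
        per_node[p % old_nodes].append(p)   # initial partition -> node layout
    target = PARTITIONS // (old_nodes + 1)  # fair share for the new node
    moved = set()
    while len(moved) < target:
        for node in list(per_node):         # steal partitions round-robin from existing nodes
            if len(moved) >= target:
                break
            if per_node[node]:
                moved.add(per_node[node].pop())
    return sum(stable_hash(k) % PARTITIONS in moved for k in KEYS)

print("hash % N, 10 -> 11 nodes:", keys_moved_with_mod(10, 11))                   # most keys move
print("fixed partitions, 10 -> 11 nodes:", keys_moved_with_fixed_partitions(10))  # roughly 1/11 move

Running it, the modulo scheme reshuffles the large majority of the 10,000 keys, while the fixed-partition scheme only moves roughly a new node’s fair share of them.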
Other methods of partitioning Dynamic partitioning: It’s hard to get the number of partitions right, especially with data that changes its behavior over time. There is no magic algorithm here. The database just handles repartitioning for you by splitting large partitions. Databases like HBase and RethinkDB create partitions dynamically, while Mongo has an option for it. Partitioning proportionally to nodes: Cassandra and Ketama can handle partitioning for you, based on the number of nodes. When you add a new node it randomly chooses some partitions to take ownership of. This is really nice if you expect a lot of fluctuation in the number of nodes. Automated vs Manual Rebalancing We talked about systems that automatically rebalance, which is nice for systems that need to scale fast or have workloads that are homogeneous. You might be able to do better if you are aware of the patterns of your data or want to control when these expensive operations happen. Some systems like Couchbase, Riak, and Voldemort will suggest partition assignment, but require an administrator to kick it off. But why? Imagine launching a large online video game and taking on tons of data into an empty system … there could be a lot of rebalancing going on at a terrible time. It would have been much better if you could have pre-provisioned ahead of time … but that doesn’t work with dynamic scaling! Request Routing One last thing … if we’re dynamically adding nodes and partitions, how does a client know who to talk to? This is an instance of a more general problem called “service discovery”. There are a couple of ways to solve this: The nodes keep track of each other. A client can talk to any node and that node will route them anywhere else they need to go. Or a centralized routing service that the clients know about, and it knows about the partitions and nodes, and routes as necessary. Or require that clients be aware of the partitioning and node data. No matter which way you go, partitioning and node changes need to be propagated. This is notoriously difficult to get right and REALLY bad to get wrong. (Imagine querying the wrong partitions …) Apache ZooKeeper is a common coordination service used for keeping track of partition/node mapping. Systems check in or out with ZooKeeper and ZooKeeper notifies the routing tier. Kafka (although not for much longer), Solr, HBase, and Druid all use ZooKeeper. MongoDB uses a custom ConfigServer that is similar. Cassandra and Riak use a “gossip protocol” that spreads the work out across the nodes. Elasticsearch has different roles that nodes can have, including data, ingestion and … you guessed it, routing. Parallel Query Execution So far we’ve mostly talked about simple queries, i.e. searching by key or by secondary index … the kinds of queries you would be running in NoSQL type situations. What about Massively Parallel Processing (MPP) relational databases, which are known for complex joins, filtering, and aggregations? The query optimizer is responsible for breaking down these queries into stages which target primary/secondary indexes when possible and run these stages in parallel, effectively breaking down the query into subqueries which are then joined together. That’s a whole other topic, but based on the way we talked about primary/secondary indexes today you can hopefully have a better understanding of how the query optimizer does that work.
It splits up the query you give it into distinct tasks, each of which could run across multiple partitions/nodes, runs them in parallel, and then aggregates the results. Designing Data-Intensive Applications goes into it in more depth in future chapters while discussing batch processing. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Tip of the Week PowerLevel10k is a Zsh “theme” that adds some really nice features and visual candy. It’s highly customizable and works great with Kubernetes and Git. (GitHub) If for some reason VS Code isn’t in your path, you can add it easily within VS Code. Open up the command palette (CTRL+SHIFT+P / COMMAND+SHIFT+P) and search for “path”. Easy peasy! Gently Down the Stream is a guidebook to Apache Kafka written and illustrated in the style of a children’s book. Really neat way to learn! (GentlyDownThe.Stream) PostgreSQL is one of the most powerful and versatile databases. Here is a list of really cool things you can do with it that you may not expect. (HakiBenita.com) Check out PowerLevel10k

Nov 22

2 hr 19 min

We crack open our favorite book again, Designing Data-Intensive Applications by Martin Kleppmann, while Joe sounds different, Michael comes to a sad realization, and Allen also engages “no take backs”. The full show notes for this episode are available at https://www.codingblocks.net/episode171. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines. Survey Says Have you ever had to partition your data? Take the survey at: https://www.codingblocks.net/episode171. News Thank you for the review! iTunes: Wohim321 Best book evar! The Whys and Hows of Partitioning Data Partitioning is known by different names in different databases: Shard in MongoDB, Elasticsearch, and SolrCloud; Region in HBase; Tablet in BigTable; vNode in Cassandra and Riak; vBucket in Couchbase. What are they? In contrast to the replication we discussed, partitioning is spreading the data out over multiple storage sections, either because all the data won’t fit on a single storage mechanism or because you need faster read capabilities. Typically, each data record (record, row, document) is stored on exactly one partition. Each partition is a mini database of its own. Why partition? Scalability Different partitions can be put on completely separate nodes. This means that large data sets can be spread across many disks, and queries can be distributed across many processors. Each node executes queries for its own partition. For more processing power, spread the data across more nodes. Examples of these are NoSQL databases and Hadoop data warehouses. These can be set up for either analytic or transactional workloads. While partitioning means that records belong to a single partition, those partitions can still be replicated to other nodes for fault tolerance. A single node may store more than one partition. Nodes can also be a leader for some partitions and a follower for others. They noted that the partitioning scheme is mostly independent of the replication used. Figure 6-1 in the book shows this leader / follower scheme for partitioning among multiple nodes. The goal in partitioning is to try and spread the data around as evenly as possible. If data is unevenly spread, it is called skewed. Skewed partitioning is less effective as some nodes work harder while others sit more idle. Partitions with higher than normal loads are called hot spots. One way to avoid hot-spotting is putting data on random nodes. The problem with this is that you won’t know where the data lives when running queries, so you have to query every node, which is not good. Partitioning by Key Range Assign a continuous range of keys to a particular partition, just like old encyclopedias or even the rows of shelves in a library. By doing this type of partitioning, your database can know which node to query for a specific key. Partition boundaries can be determined manually or they can be determined by the database system. Automatic partitioning is done by BigTable, HBase, RethinkDB, and MongoDB. The partitions can keep the keys sorted, which allows for fast lookups. Think back to the SSTables and LSM Trees. They used the example of using timestamps as the key for sensor data – i.e., YY-MM-DD-HH-MM. The problem with this is that it can lead to hot-spotting on writes. All other nodes are sitting around doing nothing while the node with today’s partition is busy.
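As a rough illustration of that key range idea (our own sketch, not code from the book or the episode), here are a few lines of Python where the partition boundaries are a sorted list of lower-bound keys and a binary search picks the partition to ask; the date boundaries are invented for the example.

import bisect

# Sorted lower-bound keys; partition i holds keys >= BOUNDARIES[i] and < BOUNDARIES[i+1].
BOUNDARIES = ["2021-01-01", "2021-04-01", "2021-07-01", "2021-10-01"]

def partition_for(key: str) -> int:
    # bisect_right counts how many boundaries are <= key; subtract 1 for the partition index.
    idx = bisect.bisect_right(BOUNDARIES, key) - 1
    if idx < 0:
        raise KeyError(f"{key!r} is below the lowest partition boundary")
    return idx

# Point lookup: only one partition needs to be asked.
print(partition_for("2021-05-15"))        # -> 1

# Range scan: only the partitions overlapping the range need to be asked.
start, end = partition_for("2021-03-20"), partition_for("2021-08-02")
print(list(range(start, end + 1)))        # -> [0, 1, 2]

A point lookup touches one partition and a range scan only touches the partitions whose ranges overlap the query, which is exactly the property that gets lost once keys are hashed.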
One way they mentioned you could avoid this hot-spotting is to prefix the timestamp with the name of the sensor, which could balance writing to different nodes. The downside to this is that if you wanted the data for all the sensors, you’d have to issue separate range queries for each sensor to get that time range of data. Some databases attempt to mitigate the downsides of hot-spotting. For example, Elastic has the ability to specify an index lifecycle that can move data around based on the key. Take the sensor example for instance: new data comes in constantly, but older data is rarely queried. Depending on the query patterns, it may make sense to move older data to slower machines to save money as time marches on. Elastic uses a temperature analogy allowing you to specify policies for data that is hot, warm, cold, or frozen. Partitioning by Hash of the Key To avoid the skew and hot-spot issues, many data stores use key hashing for distributing the data. A good hashing function will take data and make it evenly distributed. Hashing algorithms for the sake of distribution do not need to be cryptographically strong. Mongo uses MD5. Cassandra uses Murmur3. Voldemort uses Fowler-Noll-Vo. Another interesting thing is that not all programming languages have suitable hashing algorithms. Why? Because the hash can change for the same key in different processes. Java’s Object.hashCode() and Ruby’s Object#hash were called out. Partition boundaries can be set evenly or done pseudo-randomly, aka consistent hashing. Consistent hashing doesn’t work well for databases. While the hashing of keys buys you good distribution, you lose the ability to do range queries on known nodes, so now those range queries are run against all nodes. Some databases don’t even allow range queries on the primary keys, such as Riak, Couchbase, and Voldemort. Cassandra actually does a combination of keying strategies. They use the first column of a compound key for hashing. The other columns in the compound key are used for sorting the data. This means you can’t do a range query over the first portion of a key, but if you specify a fixed key for the first column you can do a range query over the other columns in the compound key. An example usage would be storing all posts on social media by the user id as the hashing column and the updated date as the additional column in the compound key; then you can quickly retrieve all posts by the user using a single partition. Hashing is used to help prevent hot spots, but there are situations where they can still occur. A popular social media personality with millions of followers may cause unusual activity on a partition. Most systems cannot automatically handle that type of skew. In the case that something like this happens, it’s up to the application to try and “fix” the skew. One example provided in the book: appending a random 2 digit number to the key would spread that record out over 100 partitions (there’s a short sketch of this after this episode’s notes). Again, this is great for spreading out the writes, but now your reads will have to issue queries to 100 different partitions. A couple of examples: Sensor data: as new readings come in, users can view real-time data and pull reports of historical data, Multi-tenant / SAAS platforms, Giant e-commerce product catalog, Social media platform users, such as Twitter and Facebook. The first Google computer at Stanford was housed in custom-made enclosures constructed from Mega Blocks. (Wikipedia)
Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) History of Google (Wikipedia) Tip of the Week VS Code lets you open the search results in an editor instead of the side bar, making it easier to share your results or further refine them with something like regular expressions. Apple Magic Keyboard (for iPad Pro 12.9-inch – 5th Generation) is on sale on Amazon. Normally $349, now $242.99 on Amazon and Best Buy usually matches Amazon. (Amazon) Compatible Devices: iPad Pro 12.9-inch (5th generation), iPad Pro 12.9-inch (4th generation), iPad Pro 12.9-inch (3rd generation) Room EQ Wizard is free software for room acoustic, loudspeaker, and audio device measurements. (RoomEQWizard.com)
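Following up on the hot-key mitigation mentioned in the notes above (the random two-digit suffix), here is a small Python sketch of our own; the user name, suffix range, and partition count are made-up illustration values. It shows the trade-off: salted writes spread over many partitions, while reads have to fan out over every possible suffix and merge the results.

import hashlib
import random

NUM_PARTITIONS = 100

def partition_for(key: str) -> int:
    # Hash-of-key partitioning, as discussed in the notes above.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

def salted_write_key(user_id: str) -> str:
    # On write, tack a random two-digit suffix onto the hot key, e.g. "celebrity-42".
    return f"{user_id}-{random.randint(0, 99):02d}"

def salted_read_keys(user_id: str) -> list[str]:
    # On read, fan out over every possible suffix and merge the results yourself.
    return [f"{user_id}-{i:02d}" for i in range(100)]

hot_user = "celebrity"
print("unsalted, every write lands on partition", partition_for(hot_user))
print("one salted write goes to key", salted_write_key(hot_user))
spread = {partition_for(k) for k in salted_read_keys(hot_user)}
print(f"salted, writes spread over {len(spread)} partitions, but a read must query {len(salted_read_keys(hot_user))} keys")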

Nov 8

1 hr 39 min

The Mathemachicken strikes again for this year’s shopping spree, while Allen just realized he was under a rock, Joe engages “no take backs”, and Michael ups his decor game. The full show notes for this episode are available at https://www.codingblocks.net/episode170. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says What's your favorite feature on the new MacBook Pro? Take the survey at: https://www.codingblocks.net/episode170. News Thank you to everyone that left a review! iTunes: BoldAsLove88 Audible: Tammy Joe’s List Price Description   “Fun” Answers $3,499.95 Jura x8 (Williams and Sonoma) $3,499.00 2021 Macbook Pro – 16″ Screen, M1 Max, 32GB RAM, 1TB drive (Apple) Robotics $359.99 Lego Mindstorms (Amazon) $149.99 Sphero BOLT (Amazon) Entertainment $499.99 Xbox Series X (Microsoft) $180 Game Pass (Microsoft) $179.00 Play Date (Website) Health $929.99 Trek Dual Sport 3 (Trek) $179.95 Fitbit Charge 5 (Amazon) $921.22 Dorito Dust Supplies (Recipe) Levelling Up $199 / year Educative.io (Website) $159 / year LeetCode Subscription (Website) $99 ACM Subscription (Sign Up) Allen’s List   Description Price Honorable mention: Steam Deck (Steam) $399.00 Honorable mention: Microsoft Surface Laptop Studio 14.4″ (Amazon) $2,700.00 LG 48″ C1 OLED TV (Amazon) $1,297.00 Honorable mention: Aorus 48″ OLED Gaming Monitor (Newegg) $1,500.00 HTC Vive Pro 2 (Amazon) $799.00 Valve Index Controllers (Steam/Valve) $279.00 Kinesis Advantage 2 (Amazon) $339.00 Corsair MP600 NVME PCIE x4 2TB (Amazon) $240.00 Arduino Ultimate Starter Kit (Amazon) $63.00 Michael’s List Price Description   My smart home can beat up your smart home $14.99 Kasa Smart Light Switch HS200 (Amazon) $16.99 Kasa Smart Dimmer Switch HS220 (Amazon) $26.99 Kasa Smart Plug Mini 15A 4-Pack EP10P4 (Amazon) $17.99 Kasa Outdoor Smart Plug with 2 Sockets EP40 (Amazon) For my health $529.00 Apple Watch Series 7 GPS + Cellular (Amazon) Need moar power! 
$34.00 Apple MagSafe Charger (Amazon) $12.99 elago W6 Apple Watch Stand (Amazon) $10.99 Honorable mention: elago W3 Apple Watch Stand (Amazon) $29.00 Honorable mention: Apple Watch Magnetic Charging Cable (0.3m) (Amazon) When I lose my stuff $98.99 Apple AirTag 4 Pack (Amazon) $10.99 Protective Case for Airtags (Amazon) $14.88 Honorable mention: Air Tags Airtag Holder for Dogs/Cat Pet Collar (Amazon) I need to get some work done $180.00 Code V3 104-Key Illuminated Mechanical Keyboard (Amazon) $169.00 Honorable mention: Das Keyboard 4 Professional Wired Mechanical Keyboard (Amazon) $280.00 Honorable mention: Drop SHIFT Mechanical Keyboard (Amazon) $240.00 Honorable mention: Drop CTRL Mechanical Keyboard (Amazon) If you insist on an ergo keyboard $199.00 Honorable mention: KINESIS GAMING Freestyle Edge RGB Split Mechanical Keyboard (Amazon) Turns out, keycaps matter $29.99 Honorable mention: Razer Doubleshot PBT Keycap Upgrade Set (Amazon) $24.99 Honorable mention: HyperX Pudding Keycaps (Amazon) Things I need to buy again $19.99 HyperX Wrist Rest (Amazon) $28.99 Honorable mention: Glorious Gaming Wrist Pad/Rest (Amazon) $34.99 Honorable mention: Razer Ergonomic Wrist Rest Pro (Amazon) When things go wrong $69.99 iFixit Pro Tech Toolkit (Amazon) $64.99 Honorable mention: iFixit Manta Driver Kit (Amazon) For all your calling needs $599.00 Rode RODECaster Pro Podcast Production Studio (Amazon) $549.99 Honorable mention: Zoom PodTrak P8 Podcast Recorder (Amazon) $12.95 On-Stage DS7100B Desktop Microphone Stand (Amazon) $199.99 Elgato Ring Light (Amazon) $159.99 Elgato HD60 S+ Capture Card (Amazon) Music to your ears $148.49 Kali Audio LP-6 Studio Monitor (Amazon) $189.00 Honorable mention: KRK RP5 Rokit G4 Studio Monitor (Amazon) $379.99 Honorable mention: Yamaha HS7I Studio Monitor (Amazon) $199.99 Honorable mention: ADAM Audio T5V Two-Way Active Nearfield Monitor (Amazon) $155.00 Honorable mention: JBL Professional Studio Monitor (305PMKII) (Amazon) $599.00 Kali Audio WS-12 12 inch Powered Subwoofer (Sweetwater) $65.00 Palmer Audio Interface (PMONICON) (Amazon) $169.99 Honorable mention: Focusrite Scarlett 2i2 (3rd Gen) USB Audio Interface (Amazon) For the decor $34.99 Dumb and Dumber Canvas (Amazon) $34.99 Honorable mention: The Big Lebowski Canvas (Amazon) $34.99 Honorable mention: Pulp Fiction Canvas (Amazon) $34.99 Honorable mention: Friday Canvas (Amazon) $34.99 Honorable mention: Jurassic Park (Amazon) $34.99 Honorable mention: Bridesmaids Canvas (Amazon) $34.99 Honorable mention: There’s Something About Mary (Amazon) Resources We Like Security Now 834, Life: Hanging By A Pin (Twit.tv) Buyer Beware: Crucial Swaps P2 SSD’s TLC NAND for Slower Chips (ExtremeTech.com) Samsung Is the Latest SSD Manufacturer Caught Cheating Its Customers (ExtremeTech.com) Tip of the Week VS Code … in the browser … just … there? Not all extensions work, but a lot do! (VSCode.dev) Skaffold is a tool you can use to build and maintain Kubernetes environments that we’ve mentioned on the show several times and guess what!? You can make your life even easier with Skaffold with environment variables. It’s another great way to maintain flexibility for your environments … both local and CI/CD. (Skaffold.dev) K9s is a Kubernetes terminal UI that makes it easy to quickly search, browse, filter, and edit your clusters and it also has skins! The Solarized Light theme is particularly awesome for customizing your experience, especially for presenting. (GitHub)

Oct 25

2 hr 59 min

We discuss the pros and cons of speaking at conferences and similar events, while Joe makes a verbal typo, Michael has turned over a new leaf, and Allen didn’t actually click the link. The full show notes for this episode are available at https://www.codingblocks.net/episode169. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says How likely are you to give a presentation? Take the survey at: https://www.codingblocks.net/episode169. News The Kinesis Gaming Freestyle Edge RGB Split Mechanical Keyboard might be the current favorite. Thank you to everyone that left a review! iTunes: dahol1337, Pesri How long does it take to get the Moonlander? (ZSA.io) Is the Kinesis Gaming Freestyle the current favorite? (Amazon) Atlanta Code Camp was fantastic, see you again next year! (atlantacodecamp.com) What kind of speaking are we talking about? Conferences Meetups Does YouTube/Twitch count as tech presentations? There are some similarities! Streaming has the engagement, but generally isn’t as rehearsed. Published videos are closer to the format but you have to make some assumptions about your audience and can get creative with the editing. Why do people speak? Can help you build an audience Establish credibility Check out Azure Steve! Promotional opportunities Networking Free travel/conferences Great way to learn something Become a better communicator Is it fun? Who speaks at conferences? People speak at conferences for different reasons Couple different archetypes of speakers: Sponsored: the speakers are on the job, promoting their company and products Practitioners: Talks from people in the trenches, usually more technical and focused on specific results or challenges Idea people: People who have a strong belief in something that is controversial, may have an axe to grind or an idea that’s percolating into a product Professionals: Some companies encourage speakers to bolster the company reputation, promotions and job descriptions might require this How do you put together a talk? How do you pick a talk? Know who is selecting talks, go niche for larger conferences if you don’t have large credentials/backing Sometimes conferences will encourage “tracks” certain themes for topics What are some talks you like? What do they do differently? Do you aim for something you know, or want to know? How do you write your talks? How do you practice for a talk? Differences between digital and physical presentations? How long does it take you? Where can you find places to speak? Is this the right question? What does this tell you about your motivation? Meet new people who share your interests through online and in-person events. (Meetup) Find your next tech conference (Confs.Tech) Google for events in your area! Final Questions Is it worth the time and anxiety? What do you want out of talks? What are some alternatives? Blogging Videos Open Source Participating in communities Resources Is Speaking At A Conference Really Worth Your Time? (Cleverism.com) We’re 93% certain that Burke Holland gave a great talk about a dishwasher and Vue.js. 
(Twitter) Monitor your Netlify sites with Datadog (Datadog) Netlify (docs.datadoghq.com) Rick Astley – Never Gonna Give You Up (Official Music Video) (YouTube) Simple Minds – Don’t You (Forget About Me) (YouTube) Foo Fighters With Rick Astley – Never Gonna Give You Up – London O2 Arena 19 September 2017 (YouTube) Tip of the Week Next Meeting is a free app for macOS that keeps a status message up in the top right of your toolbar so you know when your next meeting is. It does other stuff too, like making it easier to join meetings and see your day’s events, but … the status is enough to warrant the install. Thanks MadVikingGod! (Mac App Store) How do I disable “link preview” in iOS Safari? (Stack Exchange) Here is your new favorite YouTube channel: Rick Beato is a music professional who makes great videos about the music you love, focusing on what makes the songs and artists special. (YouTube) Hot is a free app for macOS that shows you the temperature of your MacBook Pro … and the percentage of CPU you’re limited to because of the heat! Laptop feels slow? Maybe it’s too hot! (GitHub, XS-Labs) What is the meaning of $? in a shell script? (Stack Exchange) Did you know … you can install brew on Linux? That’s right, the popular macOS packaging software is available on your favorite distro. (docs.brew.sh, brew.sh)

Oct 11

2 hr 16 min

Joe goes full shock jock, but only for a moment. Allen loses the "Most Tips In A Single Episode: 2021" award, and Michael didn't get the invite notification in this week's episode. The full show notes for this episode are available at https://www.codingblocks.net/episode168. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Shortcut - Project management has never been easier. Check out how Shortcut (formerly known as Clubhouse) is project management without all the management. Survey Says Well...no survey this week, but this is where it would be! News This book has a whole chapter on transactions in distributed systems Thank you to everyone that left a review! Podchaser: alexi*********, Nicholas G Larsen, Kubernutties, iTunes: Kidboyadde, Metalgeeksteve, cametumbling, jstef16, Fr1ek Audible: Anonymous (we are like your mother - go clean your room and learn Docker) Atlanta Code Camp is right around the corner on October 9th. Stop by the CB booth and say hi! (AtlantaCodeCamp.com) Maintaining data consistency Each service should have its own data store What about transactions? microservices.io suggests the saga pattern (website) A sequence of local transactions must occur Order service saves to its data store, then sends a message that it is done Customer service attempts to save to its data store…if it succeeds, the transaction is done. If it fails, it sends a message stating so, and then the Order service would need to run another update to undo the previous action Sound complicated? It is…a bit; you can't rely on a standard two-phase commit at the database level to ensure an atomic transaction Ways to achieve this - choreography or orchestration Choreography Saga - The Order Service receives the POST /orders request and creates an Order in a PENDING state - It then emits an Order Created event - The Customer Service's event handler attempts to reserve credit - It then emits an event indicating the outcome - The OrderService's event handler either approves or rejects the Order Each service's local transaction sends a domain event that triggers another service's local transaction To sum things up, each service knows where to listen for work it should do, and it knows where to publish the results of its work. It's up to the designers of the system to set things up such that the right things happen What's good about this approach? "The code I wrote sux. The code I'm writing is cool. The code I'm going to write rocks!" Thanks for the paraphrase, Mike!
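To make the choreography flow above a little more concrete, here is a heavily simplified, single-process Python sketch of our own; the event names, the in-memory "broker", and the credit-limit rule are all invented for illustration and are not code from microservices.io. Each service only runs its own local transaction and publishes an event about the outcome; no central coordinator is involved.

from collections import defaultdict

handlers = defaultdict(list)          # event name -> list of subscriber callbacks (the "broker")

def publish(event: str, payload: dict) -> None:
    for handler in handlers[event]:
        handler(payload)

# --- Order service ---
orders = {}

def create_order(order_id: str, customer: str, amount: int) -> None:
    # Local transaction #1: save the order as PENDING, then announce it.
    orders[order_id] = {"customer": customer, "amount": amount, "state": "PENDING"}
    publish("OrderCreated", {"order_id": order_id, "customer": customer, "amount": amount})

def on_credit_result(payload: dict) -> None:
    # Local transaction #2: approve or reject the pending order based on the outcome event.
    orders[payload["order_id"]]["state"] = "APPROVED" if payload["ok"] else "REJECTED"

handlers["CreditChecked"].append(on_credit_result)

# --- Customer service ---
credit_limits = {"alice": 100, "bob": 10}

def on_order_created(payload: dict) -> None:
    # Local transaction: try to reserve credit, then announce whether it worked.
    ok = credit_limits.get(payload["customer"], 0) >= payload["amount"]
    if ok:
        credit_limits[payload["customer"]] -= payload["amount"]
    publish("CreditChecked", {"order_id": payload["order_id"], "ok": ok})

handlers["OrderCreated"].append(on_order_created)

create_order("o-1", "alice", 40)   # -> APPROVED
create_order("o-2", "bob", 40)     # -> REJECTED
print(orders)

The orchestration variant described next replaces this implicit event chain with an explicit coordinator object that issues commands and tracks the saga's state.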
Orchestration Saga - The Order Service receives the POST /orders request and creates the Create Order saga orchestrator - The saga orchestrator creates an Order in the PENDING state - It then sends a Reserve Credit command to the Customer Service - The Customer Service attempts to reserve credit - It then sends back a reply message indicating the outcome - The saga orchestrator either approves or rejects the Order There is an orchestrator object that tells each service what transaction to run The difference between Orchestration and Choreography is that the orchestration approach has a "brain" - an object that centralizes the logic and can make more advanced changes These patterns allow you to maintain data consistency across multiple services The programming is quite a bit more complicated - you have to write rollback / undo transactions - can't rely on ACID types of transactions we've come to rely on in databases Other issues to understand The service must update the local transaction AND publish the message / event The client that initiates the saga (asynchronously) needs to be able to determine the outcome The service sends back a response when the saga completes The service sends back a response when the order id is created and then polls for the status of the overall saga The service sends back a response when the order id is created and then submits an event via a webhook or similar when the saga completes When would you use Orchestration vs Choreography for transactions across Microservices? Friend of the show @swyx works for Temporal, a company that does microservice orchestration as a service, https://temporal.io/ Tips for writing Great Microservices Fantastic article on how to keep microservices loosely coupled https://www.capitalone.com/tech/software-engineering/how-to-avoid-loose-coupled-microservices/ Mentions using separate data storage / dbs per service Can't hide implementation from other services if they can see what's happening behind the scenes - leads to tight coupling Share as little code as possible Tempting to share things like customer objects, but doing so tightly couples the various microservices Better to nearly duplicate those objects in a NON-shared way - that way the services can change independently Avoid synchronous communication where possible This means relying on message brokers, polling, callbacks, etc Don't use shared test environments / appliances May not sound right, but sharing a service may lead to problems - like multiple services using the same test service could introduce performance problems Share as little domain data as possible - ie. important pieces of information shouldn't be passed around various services in domain objects. Only the bits of information necessary should be shared with each service - ie an order number or a customer number. Just enough to let the next microservice be able to do its job Resources https://microservices.io/ Sam Newman books (Thanks Jim!) Monolith to Microservices Building Microservices https://segment.com/blog/goodbye-microservices/ https://stackoverflow.blog/2020/11/23/the-macro-problem-with-microservices/ Designing Data Intensive Applications https://www.dashcon.io/ Tip of the Week Podman is an open-source containerization tool from Red Hat that provides a drop in replacement for Docker (they even recommend aliasing it!). The major difference is in how it works underneath, spawning process directly rather than relying on resident daemons. 
Additionally, podman was designed in a post Kubernetes world, and it has some additional tooling that makes it easier to transition to Kubernetes- like being able to spawn pods and generate Kubernetes yaml files. Website Check out this episode from Google's Kubernetes podcast all about it: Podcast Unity is the most popular game engine and they have a ton of resources in their Learning Center. Including one that is focused on writing code. It walks you through writing 5 microgames with hands on exercises where you fix projects and ultimately design and write your own simple game. Also it's free! https://learn.unity.com/course/create-with-code Bonus: Make sure you subscribe to Jason Weimann's YouTube channel if you are interested in making games. Brilliant coder and communicator has a wide variety of videos: YouTube Educative.io has been a sponsor of the show before and we really like their approach to hands on teaching so Joe took a look to see if they had any resources on C++ since he was interested in possibly pursuing competitive programming. Not only do they have C++ courses, but they actually have a course specifically for Competitive Programming in C++. Great for devs who already know a programming language and are wanting to transition without having to start at step 1. Educative Course The most recent Coding Blocks Mailing List contest asked for "Summer Song" recommendations, we compiled them into a Spotify Summer Playlist. These are songs that remind you of summer, and don't worry we deduped the list so there is only one song from Rick Astley on there. Spotify Finally, one special recommendation for Coding Music. It's niche, for sure, but if you like coding to instrumental rock/hard-rock then you have to check out a 2018 album from a band called Night Verses. It's like Russian Circles had a baby with the Mercury Program. If you are familiar with either of those bands, or just want something different then make sure to check it out. Spotify

Sep 27

1 hr 14 min

Some things just require discussion, such as Docker’s new licensing, while Joe is full of it, Allen *WILL* fault them, and Michael goes on the record. The full show notes for this episode are available at https://www.codingblocks.net/episode167. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Shortcut – Project management has never been easier. Check out how Shortcut (formerly known as Clubhouse) is project management without all the management. Survey Says How do you prefer to get on the network? Take the survey at: https://www.codingblocks.net/episode167. News Thank you to everyone that left a review! iTunes: Badri Ravi Audible: Dysrhythmic, Brent Atlanta Code Camp is right around the corner on October 9th. Stop by the CB booth and say hi! (AtlantaCodeCamp.com) Docker Announcement Docker recently announced big changes to the licensing terms and pricing for their product subscriptions. These changes would mean some companies have to pay a lot more to continue using Docker like they do today. So…what will happen? Will Docker start raking in the dough or will companies abandon Docker? Resources Docker is Updating and Extending Our Product Subscriptions (Docker) Minikube documentation (Thanks MadVikingGod! From the Tips n’ Tools channel in Slack.) Open Container Initiative, an open governance structure for the purpose of creating open industry standards around container formats and runtimes. (opencontainers.org) Podman, a daemonless container engine for developing, managing, and running OCI containers. (podman.io) Getting Started with K9s (YouTube) How valuable is education? How do you decide when it’s time to go back to school or get a certification? What are the determining factors for making those decisions? Full-Stack Road Map What’s on your roadmap? We found a full-stack roadmap on dev.to and it’s got some interesting differences from other roadmaps we’ve seen or the roadmaps we’ve made. What are those differences? Resources Full Stack Developer’s Roadmap (dev.to) Bonus Tip: You can find the top dev.to articles for certain time periods like: https://dev.to/top/year. Works for week, month, and day, too. Where does your business logic go? Business logic should be in a service, not in a model … or should it? What’s the right way to do this? Is there a right way? Resources How accurate is “Business logic should be in a service, not in a model”? (Stack Exchange) AnemicDomainModel (MartinFowler.com) Are the M1/M1X chips a good idea for devs? Last year’s MacBook Pros introduced new M1 processors based on a RISC architecture. Now Apple is rolling out the rest of the line. What does this mean for devs? Is there a chance you will regret purchasing one of these laptops? Resources Apple Silicon M1: A Developer’s Perspective (steipete.com) Tip of the Week Hit . (i.e. the period key) in GitHub to bring up an online VS Code editor while you are logged in. Thanks Morten Olsrud! (blog.yogeshchavan.dev) Shoutout to Coder, cloud-powered development environments that feel local. (coder.com) The podcast that puts together the “perfect album” for the topic du jour: The Perfect Album Side Podcast (iTunes, Spotify, Google Podcasts) Bon Jovi – Livin’ On A Prayer / Wanted Dead Or Alive (Los Angeles 1989) (YouTube) Docker’s system prune command now includes a filter option to easily get rid of older Docker resources. (docs.docker.com)
Example: docker system prune --filter="until=72h" The GitHub CLI makes it easy to create a PR by autofilling information, as well as pushing your branch to origin: Example: gh pr create --fill (cli.github.com) Apache jclouds is an open-source multi-cloud toolkit that abstracts the details of your cloud provider away so you can focus on your code and still support multiple providers. (jclouds.apache.org)

Sep 13

2 hr

We step away from our microservices deployments to meet around the water cooler and discuss the things on our minds, while Joe is playing Frogger IRL, Allen “Eeyores” his way to victory, and Michael has some words about his keyvoard, er, kryboard, leybaord, ugh, k-e-y-b-o-a-r-d! The full show notes for this episode are available at https://www.codingblocks.net/episode166. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Clubhouse – Project management has never been easier. Check out how Clubhouse (soon to be Shortcut) is project management without all the management. Survey Says Do you find that you're more productive ... Take the survey at: https://www.codingblocks.net/episode166 News The threats worked and we got new reviews! Thank you to everyone that left a review: iTunes: ArcadeyGamey, Mc’Philly C. Steak, joby_h Audible: Jake Tucker Atlanta Code Camp is right around the corner on October 9th. Stop by the CB booth and say hi! (AtlantaCodeCamp.com) Water Cooler Gossip > Office Memos Are you interested in competitive programming? Michael gives a short term use review of his Moonlander. Spring makes Java better. Resources We Like CoRecursive episode 65: From Competitive Programming to APL With Conor Hoekstra (corecursive.com) Competitive Programming – A Complete Guide (GeeksForGeeks.org) Algorithms (GeeksForGeeks.org) Get started solving problems on Code Chef (CodeChef.com) Data Structures and Algorithms (CodeChef.com) Introduction to Dynamic Programming 1 (HackerEarth.com) Enhance your skills, expand your knowledge, and prepare for technical interviews with LeetCode. (LeetCode.com) Getting started with Competitive Programming – Build your algorithm skills (dev.to) ZSA Moonlander (zsa.io) Spring Framework Documentation (docs.spring.io) Spring Expression Language (SpEL) (docs.spring.io) RethinkDB, the open-source database for the realtime web. (RethinkDB.com) Tip of the Week Learn C the Hard Way: Practical Exercises on the Computational Subjects You Keep Avoiding (Like C) by Zed Shaw (Amazon) With Windows Terminal installed: In File Explorer, right click on or in a folder and select Open in Windows Terminal. Right click on the Windows Terminal icon to start a non-default shell. SonarLint is a free and open source IDE extension that identifies and helps you fix quality and security issues as you code. (SonarLint.org) Use docker buildx to create custom builders. Just be sure to call docker buildx stop when you’re done with it. (Docker docs: docker buildx, docker buildx stop)

Aug 30

1 hr 50 min

We decide to dig into the details of what makes a microservice and do we really understand them as Joe tells us why we really want microservices, Allen incorrectly answers the survey, and Michael breaks down in real time. The full show notes for this episode are available at https://www.codingblocks.net/episode165. Stop by, check it out, and join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says For your next laptop, are you leaning ... Take the survey at: https://www.codingblocks.net/episode165. News Want to know why we’re so hot on Skaffold? Check out this video from Joe: Getting Started with Skaffold (YouTube) Atlanta Code Camp is coming up October 9th, come hang out at the CB booth! Want to know what’s up with Skaffold? We Thought We Knew About Microservices What are Microservices? A collection of services that are… Highly maintainable and testable Loosely coupled (otherwise you just have a distributed monolith!) Independently deployable Organized around business capabilities (super important, Microservices are just as much about people organization as they are about code) Owned by a small team A couple from Jim Humelsine (Design Patterns Evangelist) Stateless Independently scalable (both in terms of tech, but also personnel) Note: we didn’t say anything about size but Sam Newman’s definition is: “Microservices are small, autonomous services that work together.” Semantic Diffusion (vague term getting vaguer) Enables frequent and reliable delivery of complex applications Allows you to evolve your tech stack (reminiscent of the strangler pattern) They are NOT a silver bullet – Many downsides A Pattern Language A collection of patterns for apply microservice patterns Example Microservice Implementation: https://microservices.io/patterns/microservices.html 3 micro-services in the example: Inventory service Account service Shipping service Each services talks to a separate backend database – i.e., inventory service talks to inventory DB, etc. Fronting those micro-services are a couple of API’s – a mobile gateway API and an API that serves a website When an order is placed, a request is made to the mobile API to place the order, the mobile API has to make individual calls to each one of the individual micro-services to get / update information regarding the order This setup is in contrast to a monolithic setup where you’d just have a single API that talks to all the backends and coordinates everything itself The macro problem with microservices (Stack Overflow) Pros of the Microservice Architecture Each service is small so it’s easier to understand and change Easier / faster to test as they’re smaller and less complex Better deployability – able to deploy each service independently of the others Easier to organize development effort around smaller, autonomous teams Because the code bases are smaller, the IDEs are actually better to work in Improved fault isolation – example they gave is a memory leak won’t impact ALL parts of the system like in a monolithic design Applications start and run faster when they are smaller Allows you to be more flexible with tech stacks – you can change out small pieces rather than entire systems if necessary Cons of the Microservice Approach Additional complexity of a distributed system Distributed debugging is hard! 
Requires additional tooling Additional cost (overhead of services, network traffic) Multi-system transactions are really hard Implementing inter-service communication and handling of failures Implementing multi-service requests is more complex Not only more complex, but you may be interfacing with multiple developer teams as well Testing interactions between services is more complex IDEs don’t really make distributed application development easier – more geared towards monolithic apps Deployments are more complex – managing multiple services, dependencies, etc. Increased infrastructure requirements – CPU, memory, etc. How to Know When to Choose the Microservice Architecture This is actually a hard problem. Choosing this path can slow down development However, if you need to scale in the future, splitting apart / decomposing a monolith may be very difficult Decomposing an Application into Microservices Do so by business capability Example for e-commerce: Product catalog management, Inventory management, Order management, Delivery management How do you know the right way to break down the business capabilities? Organizational structure – customer service department, billing, shipping, etc Domain model – these usually map well from domain objects to business functions Which leads to decomposing by domain driven design Decompose by “verb” – ship order, place order, etc Decompose by “noun” – Account service, Order service, Billing service, etc Follow the Single Responsibility Principle – similar to software design Questions About Microservices Are Microservices a conspiracy? Isn’t this just SOA over again? How can you tell if you should have Microservices? Who uses Microservices? Netflix Uber Amazon Lots of other big companies Who has abandoned Microservices? Lots of small companies…seeing a pattern here? Resources We Like https://microservices.io/ Sam Newman books (Thanks Jim!) Monolith to Microservices Building Microservices https://segment.com/blog/goodbye-microservices/ https://stackoverflow.blog/2020/11/23/the-macro-problem-with-microservices/ Tip of the Week NeoVim is a fork of Vim 7 that aims to address some technical debt in vim in hopes of speeding up maintenance, plugin creation, and new features. It supports RPC now too, so you can write vim plugins in any language you want. It also has better support for background jobs and async tasks. Apparently the success of nvim has also led to some of the more popular features being brought into vim as well. Thanks Claus/@komoten! (neovim.io) Portable Apple Watch charger lets you charge your watch wirelessly from an outlet or a USB port. Super convenient! (Amazon) Free book from Linode explaining how to secure your Docker containers. Thanks Jamie! (Linode) There is a daily.dev plugin for Chrome that gives you the dev home page you deserve, delivering you dev news by default. Thanks @angryzoot! (Chrome Web Store) SonarQube is an open-source tool that you can run on your code to pull metrics on its quality. And it’s available for you to run in Docker. Thanks Derek Chasse! (hub.docker.com)

Aug 16

1 hr 57 min

We dive into JetBrains’ findings after they recently released their State of the Developer Ecosystem for 2021, while Michael has the open down pat, Joe wants the old open back, and Allen stopped using the command line. The full show notes for this episode are available at https://www.codingblocks.net/episode164. Stop by, check it out, and join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says What's your IDE of choice? Take the survey at: https://www.codingblocks.net/episode164. News We really appreciate the latest reviews, so thank you! iTunes: TenPal7 Allen has been making videos of some of our tips: HTTP Requests Made Easy in Visual Studio Code Easily See Text Diffs in Visual Studio Code Atlanta Code Camp is coming up October 9th, come hang out at the CB booth! Check out Allen’s Quick Tips! Why JetBrains? JetBrains has given us free licenses to give out for years now. Sometimes people ask us what it is that we like about their products, especially when VS Code is such a great (and 100% free) experience…so we’ll tell ya! JetBrains produces (among other things) a host of products that are all based on the same IDEA platform, but are custom tailored for certain kinds of development. CLion for C, Rider for C#, IntelliJ for JVM, WebStorm for front-end, etc. These IDEs support plugins, but they come stocked with out-of-the-box functionality that you would have to add via plugins in a generalized editor or IDE. This also helps keep consistency amongst developers…everybody using the same tools for git, databases, formatting, etc. Integrated experience vs General Purpose Tool w/ Plugins: individual plugins allow for a lot of innovation and evolution, but they aren’t designed to work together in the same way that you get from an integrated experience. JetBrains has assembled a great community Supporting user groups, podcasts, and conferences for years with things like personal licenses Great learning materials for multiple languages (see the JetBrains Academy) Community (free) versions of popular products (Android Studio, IntelliJ, WebStorm, PyCharm) Advanced features that have taken many years of investment and iteration (Resharper/Refactoring tools) TL;DR JetBrains has been making great products for 20 years, and they are still excelling because those products are really good! Survey Results The survey comprised 31,743 developers from 183 countries. JetBrains attempted to get a wide swath of diverse responses and they weighted the results in an attempt to get a realistic view of the world. Read more about the methodology What would you normally expect from JetBrains’ audience? (Compare to surveys from StackOverflow or Github or State of JS) JetBrains is mainly known for non-cheap, heavy-duty tools, so you might expect to see more senior or full time employees than Stack Overflow, but that’s not the case…it skews younger Professional / Enterprise (63% full-time, 70.9% on latest Stack Overflow) JetBrains 3-5 vs StackOverflow 5-9 years of experience Education level is similar 71% of respondents develop for web backend!
Key Takeaways JavaScript is the most popular language Python is more popular than Java overall, but Java is more popular as a main language Top 5 languages devs are planning to adopt: Go Kotlin TypeScript Python Rust Top 5 Languages devs learning in 2021: JS Python TS Java Go Languages that fell: Ruby Objective C Scala Top 5 Fastest Growing: Python TypeScript SQL Go Kotlin 71% of respondents develop for web backend Primary programming languages, so much JS! Developer OS: 61% Windows 47% Linux 44% macOS Lifestyle and Fun What sources of information… Podcasts 31%! Glad to see this up there, of course 74% of the respondents use online ad-blocking tools Accounts: Github 84% Reddit…47%? Workplace and Events – pre-COVID comparisons Video Games are #1 hobby, last year was programming Databases Used in last 12 Months, Primary…so much MySQL Really cool to see relative popularity by programming language DevOps How familiar are you with Docker? DevOps engineers are 2x more likely to be architects, 30% more likely to be leads Kubernetes: went from 16% to 29% to 40% to…40%. Is Kubernetes growth stalling? 90% of devs who use k8s have SSD, have above average RAM 53% of hosting is in the cloud? Still moving up, but there’s also a lot of growth with Hybrid AWS has a big lead in cloud services…GCP 2nd!? Let’s speculate how that happened, that’s not what we see in financial reports During development, where do you run and debug your code? (Come to Joe’s skaffold talk!) Microservices 35% of respondents develop microservices!!!!! Can this be right? Mostly senior devs are doing microservices GraphQL at 14%, coming up a little bit from last year Miscellaneous How much RAM? (Want more RAM? Be DevOps, Architect, Data Analyst, leads) 79% of devs have SSD? Excellent! How old is your computer? Survey says….2 years? That’s really great. Testing 75% say tests play an integral role, 44% involved. Not bad…but 44% not involved, huh? 67% Unit tests, yay! Resources We Like https://www.jetbrains.com/lp/devecosystem-2021/ https://insights.stackoverflow.com/survey/2020#most-popular-technologies https://octoverse.github.com/ Tip of the Week The CoRecursive podcast has fantastic interviews with some really interesting people (corecursive.com) Thanks @msuriar. Some highlights: The Untold Story of SQLite with Richard Hipp (CoRecursive episode 66) Software That Doesn’t Suck, Building Subversion with Jim Blandy (CoRecursive episode 54) Reinforcement Learning At Facebook with Jason Gauci (CoRecursive episode 61, bonus: it’s Jason from Programming Throwdown) Free audiobook/album from the Software Daily host: Move Fast: How Facebook Builds Software (softwareengineeringdaily.com) Apple has great features and documentation on the different ways to take screenshots in macOS (support.apple.com) Data, Data, Data: Let the data guide your decisions. Not feelings. HTTPie is a utility built in Python that makes it really easy to issue web requests. curl is great…but it’s not very user friendly. Give HTTPie a shot! (httpie.io)

Aug 2

2 hr 14 min

It’s time to take a break, stretch our legs, grab a drink, and maybe even join in some interesting conversations around the water cooler as Michael goes off script, Joe is very confused, and Allen insists that we stay on script. The full show notes for this episode are available at https://www.codingblocks.net/episode163. Stop by, check it out, and join the conversation. Sponsors Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. Survey Says Which desktop OS do you prefer? Take the survey at: https://www.codingblocks.net/episode163 News We really appreciate the latest reviews, so thank you! iTunes: EveryNickIsTaken2858, Memnoch97 Allen finished his latest ergonomic keyboard review: Moonlander Ergonomic Keyboard Long Term Review (YouTube) Sadly, the .http files tip from episode 161 for JetBrains IDEs is only applicable to JetBrains’ Ultimate version. Test RESTful Web services (JetBrains) Meantime, at the watercooler…. GitHub Copilot (GitHub) In short, it’s a VS Code Extension that leverages the OpenAI Codex, a product that translates natural language to code, in order to … wait for it … write code. It’s currently in limited preview. What’s the value? Is the code correct? GitHub says ~40-50% in some large-scale test cases It works best with small, documented functions Does having the code written for you steer you towards solutions? Could this encourage similar bugs/security holes across multiple languages by people importing the same code? Is this any different from developers using the same common solutions from StackOverflow? Could it become a crutch for new developers? Better for certain kinds of code? (Boiler plate, common accessors, date math) Boiler Plate (like angular / controller vars) Common APIs (Twitter, Goodreads) Common Algorithms, Design Patterns Less Familiar Languages But is it useful? We’ll see! Is this the future? We see more low, no, and now co-code solutions all the time, is this where things are going? This probably won’t be “it”, but maybe we will see things like this more commonly – in any case it’s different, why not give it a shot? Is it Ethical? The “AI” or whatever has been trained on “billions of lines” of open-source code…but not strictly permissive licenses. This means a dev using this tool runs the risk of accidentally including proprietary code Quake Engine Source Code Example (GPLv2) (Twitter) From an article in VentureBeat: 54 million public software repositories hosted on GitHub as of May 2020 (for Python) 179GB of unique Python files under 1MB in size. Some basic limitations on line and file length, sanitization: The final training dataset totaled 159GB. There is a problem with bias, especially in more niche categories Is it ethical to use somebody else’s data to train an AI without their permission? Can it get you sued? Would your thoughts change if the data is public? License restricted? Would your thoughts change if the product/model were open-sourced? Abstractions… how far is too far? Services should communicate with datastores and services via APIs that hide the details; these provide a nice indirection that allows for easier maintenance in the future Do you abstract at the service level or the feature level? Are ORMs a foregone conclusion? 
What about services that have a unique communication pattern, or assist with cross-cutting concerns for things like microservices (We are looking at you here, Kafka!) (A minimal sketch of this kind of datastore abstraction appears after this episode's tips below.) The 10 Best Practices for Remote Software Engineering From article: The 10 Best Practices for Remote Software Engineering (ACM) Work on Things You Care About Define Goals for Yourself Define Productivity for Yourself Establish Routine and Environment Take Responsibility for Your Work Take Responsibility for Human Connection Practice Empathetic Review Have Self-Compassion Learn to Say Yes, No, and Not Anymore Choose Correct Communication Channels Terminal Tricks (CodeMag.com) Some of Michael’s (Linux/macOS) favorites from the article: Abbreviate your directories with tab completion when changing directories, such as cd /v/l/a, and assuming that the abbreviated path can uniquely identify something like /var/logs/apache, tab completion will take care of the rest. Use nl to get a numbered list of some previous command’s output, such as ls -l | nl. ERRATUM: During the episode, Michael mentioned that the output would first list the total lines, but that just happened to be due to output from ll and was unrelated to the output from nl. On macOS, you can use the powermetrics command to gain access to all sorts of metrics related to the internals of your computer, such as the temperature at various sensors. Use !! to repeat the last command. This can be especially helpful when you want to do something like prepend/append the previous command, such as sudo !!. ERRATUM: Wow, Michael really got this one wrong during the episode. It doesn’t repeat the “last sudo command” nor does it leave the command in edit mode. Listen to Allen’s description. Awesome keyboard shortcuts: CTRL+A takes you to the start of the line and CTRL+E takes you to the end. No need to type clear any longer as CTRL+L will clear your screen. CTRL+U deletes the content to the left of the cursor and CTRL+K deletes the content to the right of the cursor. Made a mistake while typing your command? Use CTRL+SHIFT+- to undo what you last typed. Using the history command, you can see your previous commands and even limit it with a negative number, such as history -5 to see only the last five commands. Tip of the Week Partial Diff is a VS Code extension that makes it easy to compare text. You can right click to compare files or even blocks of text in the same file, as well as in different files. (Visual Studio Marketplace) StackBlitz is an online development environment for full stack applications. (StackBlitz.com) Microcks, an open source Kubernetes native tool for API mocking and testing. (Microcks.io) Bridging the HTTP protocol to Apache Kafka (Strimzi.io) Difference Between grep, sed, and awk (Baeldung.com) As an alternative to the ruler hack mentioned in episode 161, there are several compact, travel-ready laptop stands. (Amazon)
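To make the abstraction discussion a little more concrete, here is a minimal Python sketch of a datastore hidden behind a small API. The OrderStore interface and both implementations are hypothetical names invented for illustration, not anything from the episode or from a specific library; the point is only that service code depends on the interface, so the datastore (or ORM) behind it can be swapped without touching callers.

```python
from abc import ABC, abstractmethod
from typing import Optional


class OrderStore(ABC):
    """Hypothetical abstraction: callers depend on this interface,
    not on any particular datastore or ORM."""

    @abstractmethod
    def get(self, order_id: str) -> Optional[dict]: ...

    @abstractmethod
    def save(self, order: dict) -> None: ...


class InMemoryOrderStore(OrderStore):
    """A swappable implementation, handy for tests."""

    def __init__(self) -> None:
        self._orders: dict[str, dict] = {}

    def get(self, order_id: str) -> Optional[dict]:
        return self._orders.get(order_id)

    def save(self, order: dict) -> None:
        self._orders[order["id"]] = order


def ship_order(store: OrderStore, order_id: str) -> bool:
    """Service code only talks to the abstraction."""
    order = store.get(order_id)
    if order is None:
        return False
    order["status"] = "shipped"
    store.save(order)
    return True


if __name__ == "__main__":
    store = InMemoryOrderStore()
    store.save({"id": "123", "status": "placed"})
    print(ship_order(store, "123"), store.get("123"))
```

Whether you abstract at the service level or the feature level, the same shape applies: the interface is the seam where the indirection lives.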

Jul 19

2 hr 25 min

We wrap up our replication discussion of Designing Data-Intensive Applications, this time discussing leaderless replication strategies and issues, while Allen missed his calling, Joe doesn’t read the gray boxes, and Michael lives in a future where we use apps. If you’re reading this via your podcast player, you can find this episode’s full show notes at https://www.codingblocks.net/episode162. As Joe would say, check it out and join in on the conversation. Sponsors Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. Survey Says Do you have TikTok installed? Take the survey at: https://www.codingblocks.net/episode162. News Thank you for the latest review! iTunes: tuns3r Check out the book! Single Leader to Multi-Leader to Leaderless When you have leaders and followers, the leader is responsible for making sure the followers get operations in the correct order Dynamo brought the trend to the modern era (all are Dynamo inspired) but also… Riak Cassandra Voldemort We talked about NoSQL Databases before: Episode 123 Data Models: Relational vs Document What exactly is NewSQL? https://en.wikipedia.org/wiki/NewSQL What if we just let every replica take writes? Couple ways to do this… You can write to several replicas You can use a coordinator node to pass on the writes But how do you keep these operations in order? You don’t! Thought exercise, how can you make sure operation order doesn’t matter? Couple ideas: No partial updates, increments, version numbers Multiple Writes, Multiple Reads What do you do if your client (or coordinator) tries to write to multiple nodes…and some are down? Well, it’s an implementation detail, you can choose to enforce a “quorum”. Some number of nodes have to acknowledge the write. This ratio can be configurable, making it so some % is required for a write to be accepted What about nodes that are out of date? The trick to mitigating stale data…the replicas keep a version number, and you only use the latest data – potentially by querying multiple nodes at the same time for the requested data We’ve talked about logical clocks before, it’s a way of tracking time via observed changes…like the total number of changes to a collection/table…no timezone or nanosecond differences How do you keep data in sync? About those unavailable nodes…2 ways to fix them up Read Repair: When the client realizes it got stale data from one of the replicas, it can send the updated data (with the version number) back to that replica. Pretty cool! – works well for data that is read frequently Anti-Entropy: The nodes can also do similar background tasks, querying other replicas to see which are out of date – ordering not guaranteed! Voldemort: ONLY uses read repair – this could lead to loss of data if multiple replicas went down and the “new” data was never read from after being written Quorums for reading and writing Quick Reminder: We are still talking about 100% of the data on each replica 3 major numbers at play: Number of nodes Number of confirmed writes Number of reads required If you want to be safe, the nodes you write to and the ones you read from should include some overlap A common way to ensure that: keep the number of writes + the number of reads greater than the number of nodes Example: You have 10 nodes – if you use 5 for writing and 5 for reading…you may not have an overlap resulting in potentially stale data! (A small sketch of this overlap check follows below.) 
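A minimal sketch of that overlap check in plain Python, with made-up numbers and no particular database implied: with w confirmed writes and r reads out of n replicas, requiring w + r > n guarantees that every read set intersects every write set, so at least one replica in any read holds the newest version.

```python
def has_overlap(n: int, w: int, r: int) -> bool:
    """Rule of thumb: if w + r > n, every set of r readers must intersect
    every set of w writers, so a read always touches at least one replica
    holding the newest version (version numbers pick the winner)."""
    return w + r > n


# The 10-node example above: 5 writers and 5 readers can miss each other
# entirely, so stale reads are possible.
print(has_overlap(n=10, w=5, r=5))   # False -> potential stale reads

# Requiring one more confirmed write (or read) restores the guarantee.
print(has_overlap(n=10, w=6, r=5))   # True
```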
Common approach – take the number of nodes (an odd number) + 1, then divide that number by 2, and that’s the number of readers and writers you should have 9 Nodes – 5 writes and 5 reads – ensures non-stale data When using this approach, you can handle Nodes / 2 (rounded down) number of failed nodes How would you tweak the numbers for a write-heavy workload? Typically, you write and read to ALL replicas, but you only need a successful response from these numbers What if you have a LOT of nodes?!? Note: there’s still room for problems here – the author explicitly lists 5 types of edge cases, and one category of miscellaneous timing edge cases. All variations of readers and writers getting out of sync or things happening at the same time If you really want to be safe, you need consensus (r = w = n) or transactions (that’s a whole other chapter) Note that if the number of required readers or writers doesn’t return an OK, then an error is returned from the operation Also worth considering is you don’t have to have overlap – having readers + writers < nodes means you could have stale data, but at possibly lower latencies and lower probabilities of error responses Monitoring staleness Single/Multi Leader lag is generally easy to monitor – you just query the leader and the replicas to see which operation they are on Leaderless databases don’t have guaranteed ordering so you can’t do it this way If the system only uses read repair (where the data is fixed up by clients only as it is read) then you can have data that is ancient It’s hard to give a good algorithm description here because so much relies on the implementation details Paper discussing Probabilistic Bounded Staleness (PBS) http://www.bailis.org/papers/pbs-cacm2014.pdf And when things don’t work? Multi-writes and multi-reads are great when a small % of nodes are down or slow What if that % is higher? Return an error when we can’t get quorum? Accept writes and catch the unavailable nodes back up later? If you choose to continue operating, we call it “sloppy quorum” – when you allow reads or writes from replicas that aren’t the “home” nodes – they likened it to getting locked out of your house and asking your neighbor if you can stay at their place for the night This increases (write) availability, at the cost of consistency Technically it’s not a quorum at all, but it’s the best we can do in that situation if you really care about availability – the data is stored somewhere just not where it’d normally be stored Detecting Concurrent Writes What do you get when you write the same key at the same time with different values? Remember, we’re talking about logical clocks here so imagine that 2 clients both write version #17 to two different nodes This may sound unlikely, but when you realize we’re talking logical clocks, and systems that can operate at reduced capacity…it happens What can we do about it? Last write wins: But which one is considered last? Remember how we catch up? (Readers fix or leaders communicate) …either way, the data will eventually become consistent but we can’t say which one will win…just that one will eventually take over Note: We can take something else into account here, like clock time…but no perfect answer LWW is good when your data is immutable, like logs – Cassandra recommends using a UUID as a key for each write operation Happens-Before Relationship – (Riak has CRDTs that bundle a version vector to help with this) This “happens-before” relationship and concurrency How do we know whether the operations are concurrent or not? 
Basically if neither operation knows about the other, then they are concurrent… Three possible states if you have writes A and B A happened before B B happened before A A and B happened concurrently When there is a happens-before relationship, you take the later value When they are concurrent, then you have to figure out how to resolve the conflicts Merging concurrently written values Last write wins? Union the data? No good answer Version vectors The collection of version numbers from all replicas is called a version vector Riak uses dotted version vectors – the version vectors are sent back to the clients when values are read, and need to be sent back to the db when the value is written back Doing this allows the db to understand if the write was an overwrite or concurrent This also allows applications to merge siblings by reading from one replica and writing to another without losing data if the siblings are merged correctly (A small sketch of detecting concurrent writes with version vectors appears after this episode's tips below.) Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Past episode discussions on Designing Data-Intensive Applications (Coding Blocks) Designing Data-Intensive Applications – Data Models: Relational vs Document (episode 123) NewSQL (Wikipedia) Do not allow Jeff Bezos to return to Earth (Change.org) Man Invests $20 in Obscure Cryptocurrency, Becomes Trillionaire Overnight, at Least Temporarily (Newsweek) Quantifying Eventual Consistency with PBS (Bailis.org) Riak Distributed Data Types (Riak.com) Tip of the Week A GitHub repo for a list of “falsehoods”: common things that people believe but aren’t true, but targeted at the kinds of assumptions that programmers might make when they are working on domains they are less familiar with. (GitHub) The Linux at command lets you easily schedule commands to run in the future. It’s really user friendly so you can be lazy with how you specify the command, for example echo "command_to_be_run" | at 09:00 or at 09:00 -f /path/to/some/executable (linuxize.com) You can try Kotlin online at play.kotlinlang.org, it’s an online editor with links to lots of examples. (play.kotlinlang.org) The Docker COPY command will need to be re-run if there are changes to the files being copied. You can use a .dockerignore to skip files that you don’t care about to trim down on unnecessary work and build times. (docs.docker.com).
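As referenced above, here is a minimal sketch of how a version vector comparison can classify two writes as ordered or concurrent. The replica names and the plain-dict representation are invented for illustration and are not Riak's actual dotted version vector format; only the general comparison idea is shown.

```python
def compare(vv_a: dict, vv_b: dict) -> str:
    """Compare two version vectors (replica id -> counter).
    Returns which write happened before the other, or 'concurrent'
    when neither dominates the other."""
    replicas = set(vv_a) | set(vv_b)
    a_behind = any(vv_a.get(r, 0) < vv_b.get(r, 0) for r in replicas)
    b_behind = any(vv_b.get(r, 0) < vv_a.get(r, 0) for r in replicas)
    if a_behind and b_behind:
        return "concurrent"           # siblings: the application must merge them
    if a_behind:
        return "a happened before b"  # b knew about a, so b supersedes it
    if b_behind:
        return "b happened before a"
    return "equal"


# A write that built on the other vs. two writes that never saw each other.
print(compare({"n1": 2, "n2": 1}, {"n1": 3, "n2": 1}))  # a happened before b
print(compare({"n1": 3, "n2": 1}, {"n1": 2, "n2": 2}))  # concurrent
```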

Jul 6

2 hr 4 min

We continue our discussion of Designing Data-Intensive Applications, this time focusing on multi-leader replication, while Joe is seriously tired, and Allen is on to Michael’s shenanigans. For anyone reading this via their podcast player, this episode’s show notes can be found at https://www.codingblocks.net/episode161, where you can join the conversation. Sponsors Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. Survey Says How do you put on your shoes? Take the survey at: https://www.codingblocks.net/episode161. News Thank you very much for the new reviews: iTunes: GubleReid, tbednarick, JJHinAsia, katie_crossing Audible: Anonymous User, Anonymous User … hmm When One Leader Just Won’t Do Talking about Multi-Leader Replication Replication Recap and Latency When you’re talking about single or multi-leader replication, remember all writes go through leaders If your application is read-heavy, then you can add followers to increase your scalability That doesn’t work well with sync writes…the more followers, the higher the latency The more nodes the more likely there will be a problem with one or more The upside is that your data is consistent The problem is if you allow async writes, then your data can be stale. Potentially very stale (it does dial up the availability and perhaps performance) You have to design your app knowing that followers will eventually catch up – “eventual consistency“ “Eventual” is purposely vague – could be a few seconds, could be an hour. There is no guarantee. Some common use cases make this particularly bad, like a user updating some information…they often expect to see that change afterwards There are a couple techniques that can help with this problem Techniques for mitigating replication lag Read Your Writes Consistency refers to an attempt to read significant data from the leader or in-sync replicas by the user that submitted the data In general this ensures that the user who wrote the data will get the same data back – other users may get a stale version of the data But how can you do that? Read important data from a leader if a change has been made OR if the data is known to only be changeable by that particular user (user profile) Read from a leader/In Sync Replica for some period of time after a change Client can keep a timestamp of its most recent write, then only allow reads from a replica that has that timestamp (logical clocks keep problems with clock synchronization at bay here) But…what if the user is using multiple devices? Centralize MetaData (1 leader to read from for everything) You make sure to route all devices for a user the same way Monotonic Reads: a guarantee of sorts that ensures you won’t see data moving backwards in time. One way to do this – keep a timestamp of the most recent read data, discard any reads older than that…you may get errors, but you won’t see data older than you’ve already seen. Another possibility – ensure that the reads are always coming from the same replica Consistent Prefix Reads: Think about causal data…an order is placed, and then the order is shipped…but what if we had writes going to more than one spot and you query and the order is shipped…but nothing was placed? 
(We didn’t have this problem with a Single Replica) We’ll talk more about this problem in a future episode, but the short answer is to make sure that causal data gets sent to the same “partition” Replication isn’t as easy as it sounds, is it? Multi-Leader Rep…lication Single leader replication had some problems. There was a single point of failure for writes, and it could take time to figure out the new leader. Should the old leader come back then…we have a problem. Multi-Leader replication… Allows more than one node to receive writes Most things behave just like single-leader replication Each leader acts as a follower to the other leaders When to use Multi-Leader Replication Many database systems that support single-leader replication can be taken a step further to make them multi-leader. Usually, you don’t want to have multiple leaders within the same datacenter because the complexity outweighs the benefits. When you have multiple leaders you would typically have a leader in each datacenter An interesting approach is for each datacenter to have a leader and followers…similar to the single leader. However, each leader would be a follower to the other datacenter leaders Sort of a chained single-leader replication setup Comparing Single-Leader vs Multi-Leader Replication Performance – because writes can occur in each datacenter without having to go through a single datacenter, latency can be greatly reduced in multi-leader The synchronization of that data across datacenters can happen asynchronously making the system feel faster overall Fault tolerance – in single-leader, everything is on pause while a new leader is elected In multi-leader, the other datacenters can continue taking writes and will catch back up when a new leader is selected in the datacenter where the failure occurred Network problems Usually, multi-leader replication is more capable of handling network issues as there are multiple data centers handling the writes – therefore a major issue in one datacenter doesn’t cause everything to take a dive So it’s clear, right? Multi-leader all the things? Hint: No! Problems with Multi-Leader Replication Changes to the same data concurrently in multiple datacenters have to be resolved – conflict resolution – to be discussed later External tools for popular databases: Tungsten replicator for MySQL BDR for PostgreSQL GoldenGate for Oracle Additional problems – multi-leader is typically bolted on after the fact Auto-incrementing keys, triggers, constraints can all be problematic Those reasons alone are why it’s usually recommended to avoid multi-leader replication Clients with offline operation Multi-leader makes sense when there are applications that need to continue to work even when they’re not connected to the network Calendars were an example given – you can make changes locally and when your app is online again it syncs back up with the remote databases Each application’s local database acts as a leader CouchDB was designed to handle this type of setup Collaborative editing Google Docs, Etherpad: changes are saved to the “local” version that’s open per user, then changes are synced to a central server and pushed out to other users of the document Conflict resolution One of the problems with multi-leader writes is there will come times when there will be conflicting writes when two leaders write to the same column in a row with different values How do you solve this? 
If you can automate, you should because you don’t want to be putting this together by hand Make one leader more important than the others Make certain writes always go through the same data centers It’s not easy – Amazon was brought up as having problems with this as well Multi-Leader Replication Topologies A replication topology describes how replicas communicate Two leaders is easy Some popular topologies: Ring: Each leader reads from “right”, writes to the “left” All to All: Very Chatty, especially as you add more and more nodes Star: 1 special leader that all other leaders read from Depending on the topology, a write may need to pass through several nodes before it reaches all replicas How do you prevent infinite loops? Tagging is a popular strategy (a small sketch of this appears after this episode's tips below) If you have a star or circular topology, then a single node failure can break the flow All to all is safest, but some networks are faster than others, which can cause problems with “overrun” – a dependent change can get recorded before the change it depends on You can mitigate this by keeping “version vectors”, a kind of logical clock you can use to keep from getting too far ahead Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Past episode discussions on Designing Data-Intensive Applications (Coding Blocks) Amazon Yesterday Shipping (YouTube) Uber engineering blog (eng.uber.com) Tip of the Week .http files are a convenient way of running web requests. The magic is in the IDE support. IntelliJ has it built in and VSCode has an extension. (IntelliJ Products, VSCode Extension) iTerm2 is a macOS Terminal Replacement that adds some really nice features. Some of our Outlaw’s favorite short-cuts: (iTerm2, Features and Screenshots) CMD+D to create a new panel (split vertically) CMD+SHIFT+D to create a new panel (split horizontally) CMD+Option+arrow keys to navigate between panes CMD+Number to navigate between tabs Ruler Hack – An architect scale ruler is a great way to prevent heat buildup on your laptop by giving the hottest parts of the laptop some air to breathe. (Amazon) Fizz Buzz Enterprise Edition is a funny, and sadly reminiscent, way of doing FizzBuzz that incorporates all the buzzwords and most abused design patterns that you see in enterprise code. (GitHub) From our friend Jamie Taylor (of DotNet Core Podcast, Tabs ‘n Spaces, and Waffling Taylors), mkcert is a “zero-config” way to easily generate self-signed certificates that your computer will trust. Great for dev! (GitHub) Find out more about Jamie on these great shows… https://dotnetcore.show/ https://tabsandspaces.io/ https://wafflingtaylors.rocks/
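As mentioned above under topologies, here is a minimal sketch of the tagging idea for preventing infinite replication loops: each change carries the set of nodes that have already applied it, so a write travelling around a ring stops when it arrives back where it started. The node names and the dict-based change record are made up for illustration; real systems tag changes with replica identifiers in their own formats.

```python
def replicate(change: dict, node: str, peers: dict) -> None:
    """Forward a change to peers, tagging it with every node that has
    already seen it so the same write never loops back around.
    `peers` maps a node name to its list of neighbors (the topology)."""
    seen = change.setdefault("seen_by", set())
    if node in seen:
        return                      # already applied here; stop the loop
    seen.add(node)
    apply_locally(change, node)
    for neighbor in peers[node]:
        replicate(change, neighbor, peers)


def apply_locally(change: dict, node: str) -> None:
    print(f"{node} applied {change['key']}={change['value']}")


if __name__ == "__main__":
    # A tiny ring topology: n1 -> n2 -> n3 -> n1
    ring = {"n1": ["n2"], "n2": ["n3"], "n3": ["n1"]}
    replicate({"key": "x", "value": 42}, "n1", ring)
```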

Jun 21

1 hr 48 min

We dive back into Designing Data-Intensive Applications to learn more about replication while Michael thinks cluster is a three syllable word, Allen doesn’t understand how we roll, and Joe isn’t even paying attention. For those that like to read these show notes via their podcast player, we like to include a handy link to get to the full version of these notes so that you can participate in the conversation at https://www.codingblocks.net/episode160. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines. Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. Survey Says How important is it to learn advanced programming techniques? Take the survey at: https://www.codingblocks.net/episode160. News Thank you to everyone that left us a new review: Audible: Ashfisch, Anonymous User (aka András) The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair. – Douglas Adams In this episode, we are discussing Data Replication, from chapter 5 of “Designing Data-Intensive Applications”. Replication in Distributed Systems When we talk about replication, we are talking about keeping copies of the same data on multiple machines connected by a network For this episode, we’re talking about data small enough that it can fit on a single machine Why would you want to replicate data? Keeping data close to where it’s used Increase availability Increase throughput by allowing more access to the data Data that doesn’t change is easy, you just copy it 3 popular algorithms Single Leader Multi-Leader Leaderless Well established (1970’s!) algorithms for dealing with syncing data, but a lot of data applications haven’t needed replication so the practical applications are still evolving Cluster: a group of computers that make up our data system. Node: each computer in the cluster (whether it has data or not). Replica: each node that has a copy of the database. Every write to the database needs to be copied to every replica The most common approach is “leader-based replication”, two of the algorithms we mentioned apply One of the nodes is designated as the “leader”, all writes must go to the leader The leader writes the data locally, then sends the data to its followers via a “replication log” or “change stream” The followers tail this log and apply the changes in the same order as the leader Reads can be made from any of the replicas This is a common feature of many databases (Postgres, Mongo), and it’s common for queues and some file systems as well Synchronous vs Asynchronous Writes How does a distributed system determine that a write is complete? 
The system could hang on till all replicas are updated, favoring consistency…this is slow, potentially a big problem if one of the replicas is unavailable The system could confirm receipt to the writer immediately, trusting that replicas will eventually keep up… this favors availability, but your chances for incorrectness increase You could do a hybrid, wait for x replicas to confirm and call it a quorum All of this is related to the CAP theorem…you get at most two: Consistency, Availability and Partition Tolerance Side Note: Can you ever have Consistent/Available databases? https://codahale.com/you-cant-sacrifice-partition-tolerance/ The book mentions “chain replication” and other variants, but those are still rare Example: Chain replication in Mongo: https://docs.mongodb.com/manual/tutorial/manage-chained-replication/ Steps for Adding New Followers Take a consistent snapshot of the leader at some point in time (most databases can do this without any sort of lock) Copy the snapshot to the new follower The follower connects to the leader and requests all changes since the back-up When the follower is fully caught up, the process is complete Handling Outages Nodes can go down at any given time What happens if a non-leader goes down? What does your db care about? (Availability or Consistency) Often Configurable When the replica becomes available again, it can use the same “catch-up” mechanism we described before when we add a new follower What happens if you lose the leader? Failover: One of the replicas needs to be promoted, clients need to reconfigure for this new leader Failover can be manual or automatic Rough Steps for Failover Determining that the leader has failed (trickier than it sounds! how can a replica know if the leader is down, or if it’s a network partition?) Choosing a new leader (election algorithms determine the best candidate, which is tricky with multiple nodes, separate systems like Apache Zookeeper) Reconfigure: clients need to be updated (you’ll sometimes see things like “bootstrap” services or zookeeper that are responsible for pointing to the “real” leader…think about what this means for client libraries…fire and forget? try/catch?) Failover is Hard! How long do you wait to declare a leader dead? What if the leader comes back? What if it still thinks it’s the leader? Has data the others didn’t know about? Discard those writes? Split brain – two replicas think they are leaders…imagine this with auto-incrementing keys… Which one do you shut down? What if both shut down? There are solutions to these problems…but they are complex and are a large source of problems Node failures, unreliable networks, tradeoffs around consistency, durability, availability, latency are fundamental problems with distributed systems Implementation of Replication Logs 3 main strategies for replication, all based around followers replaying the same changes Statement-Based Replication Leader logs every Insert, Update, Delete command, and followers execute them Problems Statements like NOW() or RAND() can be different Auto-increments, triggers depend on things happening in the exact order…but databases are multi-threaded, and what about multi-step transactions? What about LSM databases that do things with delete/compaction phases? 
You can work around these, but it’s messy – this approach is no longer popular For example, MySQL used to do it Write Ahead Log Shipping LSM and B-Tree databases keep an append-only WAL containing all writes Similar to statement-based, but lower level…contains details on which bytes change to which disk blocks Tightly coupled to the storage engine, this can mean upgrades require downtime Examples: Postgres, Oracle Row Based Log Replication Decouples replication from the storage engine Similar to WAL, but a little higher level – updates contain what changed, deletes similar to a “tombstone” Also known as Change Data Capture (a small sketch of followers replaying a row-based change log appears after this episode's tips below) Often seen as an optional configuration (SQL Server, for example) Examples: (New MySQL/binlog) Trigger-Based Replication Application-based replication, for example an app can ask for a backup on demand Doesn’t keep replicas in sync, but can be useful Resources We Like Other Episodes on “Designing Data-Intensive Applications” Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) You Can’t Sacrifice Partition Tolerance (codahale.com) Manage Chained Replication (docs.mongodb.com) Doug DeMuro’s YouTube channel (YouTube) Apache ZooKeeper (Wikipedia, Apache) Tip of the Week A collection of CSS generators for grid, gradients, shadows, color palettes etc. from Smashing Magazine. Learn This One Weird Trick To Debug CSS (freecodecamp.org) Previously mentioned in episode 81. Use tree to see a visualization of a directory structure from the command line. Install it in Ubuntu via apt install tree. (manpages.ubuntu.com) Initialize a variable in Kotlin with a try-catch expression, like val myvar: String = try { ... } catch (e: Exception) { ... }. (Stack Overflow) Manage secrets and protect sensitive data (and more) with HashiCorp Vault. (Hashicorp)
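As referenced above, here is a tiny sketch of the row-based replication log idea: the leader appends every change (including delete "tombstones") to a log, and a follower replays that log in order to catch up. This is a toy in-memory model to show the mechanism, not how any particular database implements its binlog or WAL.

```python
class Leader:
    """Appends every row change to a log; followers tail the log."""

    def __init__(self) -> None:
        self.log: list[dict] = []      # the replication log / change stream
        self.data: dict = {}

    def write(self, key, value) -> None:
        self.data[key] = value
        self.log.append({"op": "upsert", "key": key, "value": value})

    def delete(self, key) -> None:
        self.data.pop(key, None)
        self.log.append({"op": "delete", "key": key})  # a "tombstone"


class Follower:
    """Applies changes in log order; `position` tracks how far it has caught up."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.position = 0

    def catch_up(self, leader: Leader) -> None:
        for change in leader.log[self.position:]:
            if change["op"] == "upsert":
                self.data[change["key"]] = change["value"]
            else:
                self.data.pop(change["key"], None)
            self.position += 1


if __name__ == "__main__":
    leader, follower = Leader(), Follower()
    leader.write("user:1", "Allen")
    leader.write("user:2", "Joe")
    leader.delete("user:1")
    follower.catch_up(leader)          # replays the log in order
    print(follower.data)               # {'user:2': 'Joe'}
```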

Jun 7

2 hr 12 min

We couldn’t decide if we wanted to gather around the water cooler or talk about some cool APIs, so we opted to do both, while Joe promises there’s a W in his name, Allen doesn’t want to say graph, and Michael isn’t calling out applets. For all our listeners that read this via their podcast player, this episode’s show notes can be found at https://www.codingblocks.net/episode159, where you can join the conversation. Sponsors Datadog –  Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines. ConfigCat – The feature flag and config management service that lets you turn features ON after deployment or target specific groups of users with different features. Survey Says How often do you leetcode? Take the survey at: https://www.codingblocks.net/episode159. News Thank you all for the latest reviews: iTunes: Lp1876 Audible: Jon, Lee Overheard around the Water Cooler Where do you draw the line before you use a hammer to solve every problem? When is it worth bringing in another technology? Can you have too many tools? APIs of Interest Joe’s Picks Video game related APIs RAWG – The Biggest Video Game Database on RAWG – Video Game Discovery Service (rawg.io) RAWG Video Games Database API (v1.0) (api.rawg.io) PS: Your favorite video games might have an API: PokéAPI – The RESTful Pokémon API (pokeapi.co) Legends of Runeterra (developer.riotgames.com) The Bungie.Net API (GitHub) Blizzard Battle.net Developer Portal (develop.battle.net) Satellite imagery related APIs Planet – A leader in Earth observation (planet.com) bird.i – Satellite Imagery API (hibirdi.com) Get into the affiliate game Rainforest – The (missing) Amazon Product Data API (rainforestapi.com) Allen’s Picks Amazon Market Web Service (docs.developer.amazonservices.com) We would *love* a Libsyn API! Should paid services provide an API? Michael’s Picks Alpha Vantage – Free Stock APIs (alphavantage.co) Why so serious? icanhazdadjoke – The largest selection of dad jokes on the Internet (icanhazdadjoke.com) Channel your inner Stuart Smalley with affirmations. (affirmations.dev) HTTP Cats – The ultimate source for HTTP status code images. (http.cat) Relevant call backs from episode 127: Random User Generator – A free, open-source API for generating random user data. (randomuser.me) Remember the API – Programmer gifts and merchandise (remembertheapi.com) Resources We Like ReDoc – OpenAPI/Swagger-generated API Reference Documentation (GitHub) Google Earth – The world’s most detailed globe. (google.com/earth) Google Sky – Traveling to the stars has never been easier. (google.com/sky) apitracker.io – Discover the best APIs and SaaS products to integrate with. (apitracker.io) ProgrammableWeb – The leading source of news and information about Internet-based APIs.(ProgrammableWeb.com) NASA APIs – NASA data, including imagery, accessible to developers. (api.nasa.gov) RapidAPI – The Next-Generation API Platform (rapidapi.com) Stuart Smalley (Wikipedia) Al Franken (Wikipedia) Muzzle – A simple Mac app to silence embarrassing notifications while screensharing. (MuzzleApp.com) Previously mentioned in episode 125. Tip of the Week Not sure what project to do? 
Google for an API or check out RapidAPI for a consistent way to farm ideas: RAWG Video Games Database API Documentation (rapidapi.com) Press F12 in Firefox, Chrome, or Edge, then go to the Elements tab (or Inspector in Firefox) to start hacking away at the DOM for immediate prototyping. All things K9s Getting Started with K9s – A Love Letter to K9s Use K9s to easily monitor your Kubernetes cluster Not only does K9s support skins and themes, but it supports *cluster-specific* skins (k9scli.io) If you like xkcd, Monkey User is for you! xkcd – A webcomic of romance, sarcasm, math, and language. (xkcd.com) Monkey User – Created out of a desire to bring joy to people working in IT. (MonkeyUser.com) Remap Windows Terminal to use CTRL+D, and other keyboard customizations. (docs.microsoft.com) PostgreSQL and Foreign Data (postgresql.org) A listing of available foreign data wrappers for PostgreSQL on the wiki. (wiki.postgresql.org) Cheerio – Fast, flexible & lean implementation of core jQuery designed specifically for the server. (npmjs.com) JetBrains MPS (Meta Programming System) – Create your own domain-specific language (JetBrains) Case study – Domain-specific languages to implement Dutch tax legislation and process changes of that legislation. (JetBrains)

May 24

1 hr 46 min

We talk about the various ways we can get paid with code while Michael failed the Costco test, Allen doesn’t understand multiple choice questions, and Joe has a familiar pen name. This episode’s show notes can be found at https://www.codingblocks.net/episode158, where you can join the conversation, for those reading this via their podcast player. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines. Survey Says Do you want to run your own business? Take the survey at: https://www.codingblocks.net/episode158. News Thank you all for the latest reviews: iTunes: PriestRabbitWalkIntoBloodBank, Sock-puppet Sophist sez, Rogspug, DhokeDev, Dan110024 Audible: Aiden Show Me the Money Active Income Active income is income earned by exchanging time for money. This typically includes salary and hourly employment, as well as contracting. Some types of active income blur the lines. Ways to find active income include job sites like Stack Overflow Jobs, Indeed, Upwork, etc. Government grants and jobs are out there as well. Active income typically has some ceiling, such as your time. Passive Income Passive income is income earned on an investment, any kind of investment, such as stock markets, affiliate networks, content sales for things like books, music, courses, etc. The work you do for the passive income can blur lines, especially when that work is promotion. Passive income is generally not tied to your time. Passive Income Options Create a SaaS platform to keep people coming back. Don’t let the term SaaS scare you off. This can be something smaller like a regex validator. Affiliate links are a great example of passive income because you need to invest the time once to create the link. Ads and sponsors: typically, the more targeted the audience is for the ad, the more the ad is worth. Donations via services like Ko-fi, Patreon, and PayPal. Apps, plugins, website templates/themes Create content, such as books, courses, videos, etc. Self-publishing can have a bigger reward and offer more freedom, but doesn’t come with the built-in audience and marketing team that a publisher can offer. Arbitrage between markets. Grow an audience, be it on YouTube, Twitch, podcasting, blogging, etc. Things to Consider What’s the up-front effort and/or investment? How much maintenance can you afford? How much will it cost you? Who gets hurt if you choose to quit? What can you realistically keep up with? What are the legal and tax liabilities? Resources We Like Apply for Grants To Fund Open Source Work (changeset.nyc) Government grants and loans (usa.gov, grants.gov) 35 Passive Income Ideas for Developers [All Types] (beginnerspassiveincome.com) 8 Side Income Ideas For Programmers (That Actually Work) (afternerd.com) Podcasts The Smart Passive Income Podcast with Pat Flynn (smartpassiveincome.com) Entrepreneurs On Fire (eofire.com) How I Built This with Guy Raz (npr.org) Who Moved My Cheese (Amazon) The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses (Amazon) These top Patreon creators earn more than $200,000 a year (blog.patreon.com) Ko-fi – Make an Income Doing What You Love (ko-fi.com) PayPal – Make your Donate button (paypal.com) How Long Does It Take To Create An Online Course? (onlinecoursehow.com) Udemy – Planning your online course (udemy.com) Am I Procrastinating? 
(amiprocrastinating.com) Google Ads Help: Use Keyword Planner (support.google.com) Tip of the Week Google developer documentation style guide: Word list (developers.google.com) In Windows Terminal, use CTRL+SHIFT+W to close a tab or the window. The GitHub CLI manual (cli.github.com) Use gh pr create --fill to create a pull request using your last commit message as the title and body of the PR. We’ve discussed the GitHub CLI in episode 142 and episode 155. How to get a dependency tree for an artifact? (Stack Overflow) xltrail – Version control for Excel workbooks (xltrail.com) Spring Initializr (start.spring.io) You can leverage the same thing in IntelliJ with Spring.

May 10

2 hr 24 min

We discuss all things APIs: what makes them great, what makes them bad, and what we might like to see in them while Michael plays a lawyer on channel 46, Allen doesn’t know his favorite part of the show, and Joe definitely pays attention to the tips of the week. For those reading this episode’s show notes via their podcast player, you can find this episode’s show notes at https://www.codingblocks.net/episode157 where you can be a part of the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says How do you prefer to be interviewed? Take the survey at: https://www.codingblocks.net/episode157. News Big thanks to everyone that left us a new review: iTunes: hhskakeidmd Audible: Colum Ferry All About APIs What are APIs? API stands for application programming interface and is a formal way for applications to speak to each other. An API defines requests that you can make, what you need to provide, and what you get back. If you do any googling, you’ll see that articles are overwhelmingly focused on Web APIs, particularly REST, but that is far from the only type. Others include: All libraries, All frameworks, System Calls, e.g.: the Windows API, Remote API (aka RPC – remote procedure call), Web-related standards such as SOAP, REST, HATEOAS, or GraphQL, and Domain Specific Languages (SQL for example) The formal definition of APIs, who owns them, and what can be done with them is complicated à la Google LLC v. Oracle America, Inc. Different types of APIs have their own sets of common problems and best practices Common REST issues: Authentication, Rate limiting, Asynchronous operations, Filtering, Sorting, Pagination, Caching, and Error handling. Game libraries: Heavy emphasis on inheritance and “hidden” systems to cut down on complexity. Libraries for service providers Support multiple languages and paradigms (documentation, versioning, rolling out new features, supporting different languages and frameworks) OData provides a set of standards for building and consuming REST APIs. General tips for writing great APIs Make them easy to work with. Make them difficult to misuse (good documentation goes a long way). Be consistent in the use of terms, input/output types, error messages, etc. Simplicity: there’s one way to do things. Introduce abstractions for common actions. Service evolution, e.g. including the version number as part of your API call enforces good versioning habits. Documentation, documentation, documentation, with enough detail that’s good to ramp up from getting started to in-depth detail. Platform Independence: try to stay away from specifics of the platforms you expect to deal with. Why is REST taking over the term API? REST is crazy popular in web development and it’s really tough to do anything without it. It’s simple. Well, not really if you consider the 43 things you need to think about. Some things about REST are great by design, such as: By using it, you only have one protocol to support, It’s verb oriented (commonly used verbs include GET, POST, PUT, PATCH, and DELETE), and It’s based on open standards. Some things about REST are great by convention, such as: Noun orientation like resources and identifiers, Human-readable I/O, Stateless requests, and HATEOAS provides a methodology to decouple the client and the server. Maybe we can steal some ideas from REST Organize the API around resources, like /orders + verbs instead of /create-order. 
Note that nouns can be complex, an order can be complex … products, addresses, history, etc. Collections are their own resources (i.e. /orders could return more than 1). Consistent naming conventions make for easy discovery. Microsoft recommends plural nouns in most cases, but they’re skewing heavily towards REST, because REST already has a mechanism for behaviors with their verbs. For example /orders and /orders/123. You can drill in further and further when you orient towards nouns like /orders/123/status. (A small sketch of these noun-oriented routes appears after this episode's tips below.) The general guidance is to return resource identifiers rather than whole objects for complex nouns. In the order example, it’s better to return a customer ID associated with the whole order. Avoid introducing dependencies between the API and the underlying data sources or storage, the interface is meant to abstract those details! Verb orientation is okay in some very action-based instances, such as a calculator API. Resources We Like API (Wikipedia) The Linux Kernel API (kernel.org) gRPC (Wikipedia) S3 Compatible API (Backblaze) Storing Data Shouldn’t Cost More Than Generating It (Wasabi) Top 50 Most Popular APIs on RapidAPI (2021) (rapidapi.com) Free Public APIs for Developers APIs (rapidapi.com) API Reference (Datadog) OData – the best way to REST (odata.org) Understand OData in 6 steps (odata.org) Best practices for REST API design (Stack Overflow) Best Practices in API Design (swagger.io) Web API design (docs.microsoft.com) The Web API Checklist — 43 Things To Think About When Designing, Testing, and Releasing your API (mathieu.fenniak.net) Google API Design Guide (apistylebook.com) The DevOps Handbook – Create Organizational Learning (episode 145) How to Scrum (episode 156) State of JS 2020 (stateofjs.com) Tip of the Week Docker Desktop: WSL 2 Best practices (Docker) Experiencing déjà vu? That’s because we talked about this during episode 156. With Minikube, you can easily configure the amount of CPU and RAM each time you start it. Minikube default CPU/Memory (Stack Overflow) Listen to American Scandal. A great podcast with amazing production quality. (Wondery) If you have a license for DataGrip and use other JetBrains IDEs, once you add a data source, the IDE will recognize strings that are SQL in your code, be they Java, JS, Python, etc., and give syntax highlighting and autocomplete. Also, you can set the connection to a DB in DataGrip as read-only under the options. This will give you a warning message if you try a write operation even if your credentials have write permissions. API Blueprint. A powerful high-level API description language for web APIs. (apiblueprint.org) Apache Superset – A modern data exploration and visualization platform. (Apache) Use console.log() like a pro. (markodenic.com) Turns out we did discuss something similar to this back in episode 44. Telerik Fiddler – A must-have web debugging tool for your web APIs. (Telerik) New Docker Build secret information (docs.docker.com)
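As referenced above, a minimal sketch of noun-oriented routes using Flask. The /orders data and route layout are hypothetical, just to show a collection resource, a single resource, drilling in noun-by-noun, and returning identifiers from the collection rather than whole objects; this is an illustration of the convention, not a recommended production API.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory data; a real API would hide a datastore behind this.
ORDERS = {"123": {"status": "shipped", "customer_id": "c-42"}}


@app.get("/orders")                      # collection resource
def list_orders():
    # Return identifiers rather than whole objects.
    return jsonify(list(ORDERS.keys()))


@app.get("/orders/<order_id>")           # single resource
def get_order(order_id):
    order = ORDERS.get(order_id)
    return (jsonify(order), 200) if order else ("not found", 404)


@app.get("/orders/<order_id>/status")    # drill in noun-by-noun
def get_order_status(order_id):
    order = ORDERS.get(order_id)
    return (jsonify(order["status"]), 200) if order else ("not found", 404)


if __name__ == "__main__":
    app.run(debug=True)
```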

Apr 26

2 hr 34 min

We discuss the parts of the scrum process that we’re supposed to pay attention to while Allen pronounces the m, Michael doesn’t, and Joe skips the word altogether. If you’re reading this episode’s show notes via your podcast player, just know that you can find this episode’s show notes at https://www.codingblocks.net/episode156. Stop by, check it out, and join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says For your next car, you plan to buy: Take the survey at: https://www.codingblocks.net/episode156. News Hey, we finally updated the Resources page. It only took a couple years. Apparently we don’t understand the purpose of the scrum during rugby. (Wikipedia) Standup Time User Stories A user story is a detailed, valuable chunk of work that can be completed in a sprint. Use the INVEST mnemonic created by Bill Wake: I = Independent – the story should be able to be completed without any dependencies on another story N = Negotiable – the story isn’t set in stone – it should have the ability to be modified if needed V = Valuable – the story must deliver value to the stakeholder E = Estimable – you must be able to estimate the story S = Small – you should be able to estimate within a reasonable amount of accuracy and complete it within a single sprint T = Testable – the story must have criteria that allow it to be testable Stories Should be Written in a Format Very Much Like… “As a _____, I want _____ so that _____.”, like “As a user, I want MFA in the user profile so I can securely log into my account” for a functional story, or “As a developer, I want to update our version of Kubernetes so we have better system metrics” for a nonfunctional story. Stories Must have Acceptance Criteria Each story has its own UNIQUE acceptance criteria. For the MFA story, the acceptance criteria might be: Token is captured and saved. Verification of code completed successfully. Login works with new MFA. The acceptance criteria defines what “done” actually means for the story. Set up Team Boundaries Define “done”. Same requirement for ALL stories: Must be tested in QA environment, Must have test coverage for new methods. Backlog prioritization or “grooming”. Must constantly be ordered by value, typically by the project owner. Define sprint cadence Usually 1-4 weeks in length, 2-3 is probably best. Two weeks seems to be what most choose simply because it sort of forces a bit of urgency on you. Estimates Actual estimation, “how many hours will a task take?” Relative estimation, “I think this task will take 2x as long as this other ticket.” Scrum uses both; user stories are compared to each other in relative fashion. By doing it this way, it lets external stakeholders know that it’s an estimate and not a commitment. Story points are used to convey relative sizes. Estimation is supposed to be lightweight and fast. Roadmap and Release Plan The roadmap shows when themes will be worked on during the timeframe. You should be able to have a calendar and map your themes across that calendar and in an order that makes sense for getting a functional system. Just because you should have completed, functional components at the end of each sprint, based on the user stories, that doesn’t mean you’re releasing that feature to your customer. It may take several sprints before you’ve completed a releasable feature. 
It will take several sprints to find out what a team’s stabilized velocity is, meaning that the team will be able to decently predict how many story points they can complete in a given sprint. Filling up the Sprint Decide how many points you’ll have for a sprint. Determine how many sprints before you can release the MVP. Fill up the sprints as full as possible in priority order UNLESS the next priority story would overflow the sprint. Simple example: let’s say your sprint will have 10 points and you have the following stories: Story A – 3 points Story B – 5 points Story C – 8 points Story D – 2 points Your sprints might look like: Sprint 1 – A (3), B (5), D (2) = 10 points; Sprint 2 – C (8) Story C got bumped to Sprint 2 because the goal is to maximize the amount of work that can be completed in a given sprint in priority order, as much as possible. (A small sketch of this greedy fill appears after this episode's tips below.) The roadmap is an estimate of when the team will complete the stories and should be updated at the end of each sprint. In other words, the roadmap is a living document. Sprint Planning This is done at the beginning of each sprint. Attendees – all developers, scrum master, project owner. Project owner should have already prioritized the stories in the backlog. The goal of the planning meeting is to ensure all involved understand the stories and acceptance criteria. Also make sure the overarching definition of “done” is posted as a reminder. Absolutely plan for a Q&A session. Crucial to make sure any misunderstandings of the stories are cleared here. Next the stories are broken down into specific tasks. These tasks are given actual estimates in time. Once this is completed, you need to verify that the team has enough capacity to complete the tasks and stories in the sprint. In general, each team member can only complete 6 hours of actual work per day on average. Each person is then asked whether they commit to the work in the sprint. Must give a “yes” or “no” and why. If someone can’t commit with good reason, then the project owner and team need to work together to modify the sprint so that everyone can commit. This is a highly collaborative part of scrum planning. Stakeholder Feedback Information radiators are used to post whatever you think will help inform the stakeholders of the progress, be it a task board or burn down chart. Task board Lists stories committed to in the sprint. Shows the status of any current tasks. Lists which tasks have been completed. Swimlanes are typically how these are depicted with lanes like: Story, Not Started, In Progress, Completed. Sprint burndown chart Shows ongoing status of how you’re doing with completing the sprint. Daily Standup The purpose of the standup is the three C’s: Collaboration, Communication, and Cadence. The entire team must join: developers, project owner, QA, scrum master. Should occur at the same time each day. Each status should just be an overview and light on the details. Tasks are moved to a new state during the standup, such as from Not Started to In Progress. Stakeholders can come to the scrum but should hold questions until the end. Cannot go over 15 minutes. It can be shorter, but should not be longer. Each person should answer three questions: What did you do yesterday?, What are you doing today?, and Are there any blockers? If you see someone hasn’t made progress in several days, this is a great opportunity to offer to help. This is part of keeping the team members accountable for progressing. 
Blockers are brought up during the meeting so anyone on the team can try and step in to help. If the issue hasn’t been resolved by the next day, then it’s the responsibility of the scrum master to try and resolve it, and escalate it further up the chain after that, such as to the project owner and so on, each consecutive day. Again, very important, this is just the formal way to keep the entire team aware of the progress. People should be communicating throughout the day to complete whatever tasks they’re working on. Backlog Refinement The backlog is constantly changing as the business requirements change. It is the job of the project owner to be in constant communication with the stakeholders to ensure the backlog represents the most important needs of the business and making sure the stories are prioritized in value order. Stories are constantly being modified, added, or removed. Around the midpoint of the sprint, there is usually a 30-60 minute “backlog refinement session” where the team comes together to discuss the changes in the backlog. These new stories can only be added to future sprints. The current sprint commitment cannot be changed once the sprint begins. The importance of this mid-sprint session is the team can ask clarifying questions and will be better prepared for the upcoming sprint planning. This helps the project owner know when there are gaps in the requirements and helps to improve the stories. Marking a Story Done The project owner has the final say in making sure all the acceptance criteria have been met. There could be another meeting called the “sprint review” where the entire team meets to get signoff on the completed stories. Anything not accepted as done gets reviewed, prioritized, and moved out to another sprint. This can happen when a team discovers new information about a story while working on it during a sprint. The team agrees on what was completed and what can be demonstrated to the stakeholders. The Demo This is a direct communication between the team and the stakeholders, and a chance to receive feedback. This may result in new stories. Stakeholders may not even want the new feature and that’s OK. It’s better to find out early rather than sinking more time into building something not needed or wanted. This is a great opportunity to build a relationship between the team members and the stakeholders. This demo also shows the overall progress towards the final goal. May not be able to demo at the end of every sprint, but you want to do it as often as possible. Team Retrospective Focus is on team performance, not the product, and is facilitated by the scrum master. This is a closed-door session and must be a safe environment for discussion. Only dedicated team members are present and the team norms must be observed. You want an open dialogue. What worked well? Focus on good team collaboration. What did not work well? Focus on what you can actually change. What can be improved? Put items into an improvement backlog Focus on one or two items in the next sprint Start with team successes first! Resources We Like Scrum: The Basics (LinkedIn) Manifesto for Agile Software Development (agilemanifesto.org) Scrum (rugby) (Wikipedia) INVEST (mnemonic) (Wikipedia) Excellent examples of how not to write user stories (Twitter) The Pragmatic Programmer – How to Estimate (episode 109) Why is there a 20 and not 21 in some versions of Planning Poker? 
(Stack Overflow) Tip of the Week Test your YAML with the Ansible Template Tester (ansible.sivel.net) Hellscape by Andromida (Spotify, YouTube) Use ALT+LEFT CLICK in Windows Terminal to open a new terminal in split screen mode. Learn how to tie the correct knot for every situation! (animatedknots.com) Apply zoom levels to each tab independently of other tabs of the same website with Per Tab Zoom. (Chrome web store) Nearly every page on GitHub has a keyboard shortcut to perform actions faster. Learn them! (GitHub) Speaking of shortcuts, here’s a couple for Visual Studio Code: Use CTRL+P (or CMD+P on a Mac) to find a file by name or path. List (and search) all available commands with CTRL+SHIFT+P (or CMD+SHIFT+P on a … you know). Use CTRL+K M (CMD+K M) to change the current document’s language mode. Access your WSL2 filesystem from Windows using special network share: like \\wsl$\ubuntu_instance_name\home\your_username\some_path Docker Desktop: WSL 2 Best practices (Docker)
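Referring back to the sprint-filling example in the notes above, here is a minimal, hypothetical Python sketch of the greedy rule described there: fill each sprint in priority order, skip any story that would overflow the sprint, and pick skipped stories up in a later sprint. The story names and point values match the example; the handling of a story larger than a whole sprint is an assumption, since the notes don't cover that case.

def fill_sprints(stories, capacity):
    # stories: (name, points) pairs, highest priority first.
    remaining = list(stories)
    sprints = []
    while remaining:
        used, current, leftover = 0, [], []
        for name, points in remaining:
            if used + points <= capacity:
                current.append((name, points))
                used += points
            else:
                leftover.append((name, points))  # skipped; a later sprint picks it up
        if not current:
            # Assumption: a story bigger than an entire sprint gets its own sprint.
            current.append(leftover.pop(0))
        sprints.append(current)
        remaining = leftover
    return sprints

backlog = [("A", 3), ("B", 5), ("C", 8), ("D", 2)]  # priority order, as in the example
for i, sprint in enumerate(fill_sprints(backlog, capacity=10), start=1):
    print(f"Sprint {i}: {sprint} = {sum(p for _, p in sprint)} points")
# Sprint 1: [('A', 3), ('B', 5), ('D', 2)] = 10 points
# Sprint 2: [('C', 8)] = 8 points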

Apr 12

2 hr 25 min

During today's standup, we focus on learning all about Scrum as Joe is back (!!!), Allen has to dial the operator and ask to be connected to the Internet, and Michael reminds us why Blockbuster failed. If you didn't know and you're reading these show notes via your podcast player, you can find this episode's show notes in their original digital glory at https://www.codingblocks.net/episode155 where you can also jump in the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Survey Says While in Slack, do you reply ... Take the survey at: https://www.codingblocks.net/episode155. News Thank you all for the reviews: iTunes: DareOsewa, Miggybby, MHDev, davehadley, GrandMasterJR, Csmith49, ForTheHorde&Tacos, A-Fi Audible: Joshua, Alex Do You Even Scrum? Why Do We Call it Scrum Anyways? Comes from the game of rugby. A scrummage is when a team huddles after a foul to figure out their next set of plays and readjust their strategy. Why is Scrum the Hot Thing? Remember waterfall? Plan and create documentation for a year up front, only to build a product with rigid requirements for the next year. By the time you deliver, it may not even be the right product any longer. Waterfall works for things that have very repeatable steps, such as planning the completion of a building. It doesn't work great for things that require more experimentation and discovery. Project managers saw the flaw in planning for the complete "game" rather than planning to achieve each milestone and tackle the hurdles as they show up along the way. Scrum breaks the deliverables and milestones into smaller pieces to deliver. The Core Tenets of Scrum Have business partners and stakeholders work with the development of the software throughout the project, measure success using completed software throughout the project, and allow teams to self-organize. Scrum Wants You to Fail Fast Failure is OK as long as you're learning from it. But those lessons learned need to happen quickly, with fast feedback cycles. Small scale focus and rapid learning cycles. In other words, fail fast really means "learn fast". It's super important to recognize that Scrum is *not* prescriptive. It's more like guardrails to a process. An Overview of the Scrum Framework The product owner has a prioritized backlog of work for the team. Every sprint, the team looks at the backlog and decides what they can accomplish during that sprint, which is generally 2-3 weeks. The team develops and tests their solutions until completed. This effort needs to happen within that sprint. The team then demonstrates their finished product to the product owner at the end of the sprint. The team has a retrospective to see how the sprint went, what worked, and what they can improve going forward. Focusing on creating a completed, demo-able piece of work in the sprint allows the team to succeed or fail/learn fast. Projects are typically composed of three basic things: time, cost, and scope. Usually time and cost are fixed, so all you can work with is the scope. There are Two Key Roles Within Scrum Project owner – The business representative dedicated 100% to the team. Acts as a full time business representative. Reviews the team's work constantly to ensure the proper deliverable is being created. Interacts with the stakeholders. Is the keeper of the product vision. Responsible for making sure the work is continuously sorted per the ongoing business needs.
The Scrum master – Responsible for helping resolve daily issues and balance ongoing changes in requirements and/or scope. This person has a mastery of Scrum. Also helps improve internal team processes. Responsible for protecting the team and their processes. Balances the demands of the product owner and the needs of the team. This means keeping the team working at a sustainable rate. Acts as the spokesperson for the entire team. Provides charts and other views into the progress for others for transparency. Responsible for removing any blockers. Project owner focuses on what needs to be done while the Scrum master focuses on how the team gets it done. Scrum doesn't value heroics by teams or team members. Scrum is all about Daily Collaboration Whatever you can do to make daily collaboration easier will yield great benefits. Co-locate your team if possible. If you can't do that, use video conferencing, chat, and/or conference calls to keep communication flowing. The Team Makeup You must have a dedicated team. If members of your team are split amongst different projects, it will be difficult to accomplish your goals as you lose efficiency. The ideal team size is 5 to 9 members. You want a number of T-shaped developers. These are people who can work on more than one type of deliverable. You also need some "consultants" you can call on who have more specialized/focused skill sets but may not be core members of the team. Team Norms Teams will need to have standard ways of dealing with situations. How people will work together. How they'll resolve conflicts. How to come to a consensus. Must have full team buy-in and everyone must be willing to hold each other accountable. Agree to disagree, but move forward with the agreed-upon solution. Product Vision It's the map for your team; it's what tells you how to get where you want to go. This must be established by the project owner. The destination should be the "MVP", i.e. the Minimum Viable Product. Why MVP? Creating just enough to get it out to the early adopters allows you to get feedback early. This allows for a fast feedback cycle. Minimizes scope creep. Must set the vision, and then decompose it. Break the Vision Down into Themes Start with a broad grouping of similar work. Allows you to be more efficient by grouping work together in similar areas. This also allows you to think about completing work in the required order. Once You've Identified the Themes, You Break it Down Further into Features If you had a theme of a User Profile, your features might be things like: Change password, Setup MFA, and Link social media. To get the MVP out the door, you might decide that only the Change Password feature is required (see the sketch after these notes). Resources We Like Scrum: The Basics (LinkedIn) Manifesto for Agile Software Development (agilemanifesto.org) Epics, stories, themes, and initiatives (Atlassian) Bad Software Engineering KILLED Cyberpunk 2077's Release (YouTube) Tip of the Week Learn and practice your technical writing skills. Online Technical Writing: Contents, Free Online Textbook for Technical Writing (prismnet.com) Examples, Cases & Models (prismnet.com) Using k9s makes running your Kubernetes cronjobs on demand super easy. Find the cronjob you want to run (hint: :cronjobs) and then use CTRL+T to execute the cronjob now. (GitHub) Windows Terminal is your new favorite terminal. (microsoft.com) What is Windows Terminal?
(docs.microsoft.com) TotW redux: GitHub CLI – Your new favorite way to interact with your GitHub account, be it public GitHub or GitHub Enterprise. (GitHub) Joe previously mentioned the GitHub CLI as a TotW (episode 142) Grep Console – grep, tail, filter, and highlight … everything you need for a console, in your JetBrains IDE. (plugins.jetbrains.com) Use my_argument:true when calling pwsh to pass boolean values to your Powershell script. JetBrains allows you to prorate your license upgrades at any point during your subscription.
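To make the vision-to-themes-to-features decomposition above concrete, here is a tiny, hypothetical Python sketch using the User Profile example from the notes; the data layout itself is made up for illustration.

# Hypothetical backlog decomposition: theme -> features, with an MVP flag.
vision = {
    "User Profile": [
        {"feature": "Change password", "mvp": True},
        {"feature": "Setup MFA", "mvp": False},
        {"feature": "Link social media", "mvp": False},
    ],
}

# Only the features flagged as MVP are needed to ship the Minimum Viable Product.
mvp_features = [
    f["feature"]
    for features in vision.values()
    for f in features
    if f["mvp"]
]
print(mvp_features)  # ['Change password']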

Mar 29

1 hr 56 min

We dig into recursion and learn that Michael is the weirdo, Joe gives a subtle jab, and Allen doesn't play well with others while we dig into recursion. This episode's show notes can be found at https://www.codingblocks.net/episode154, for those that might be reading this via their podcast player. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. News Thank you all for the reviews: iTunes: ripco55, Jla115 Audible: _onetea, Marnu, Ian Here I Go Again On My Own What is Recursion? Recursion is a method of solving a problem by breaking the problem down into smaller instances of the same problem. A simple "close enough" definition: functions that call themselves. Simple example: fib(n) { n <= 1 ? n : fib(n - 1) + fib(n - 2) } Recursion pros: Elegant solutions that read well for certain types of problems, particularly with unbounded data. Work great with dynamic data structures, like trees, graphs, and linked lists. Recursion cons: Tricky to write. Generally perform worse than iterative solutions. Runs the risk of stack overflow errors. Recursion is often used for sorting algorithms. How Functions Roughly Work in Programming Languages Programming languages generally have the notion of a "call stack". A stack is a data structure designed for LIFO. The call stack is a specialized stack that is common in most languages. Any time you call a function, a "frame" is added to the stack. The frame is a bucket of memory with (roughly) space allocated for the input arguments, local variables, and a return address. Note: "value types" will have their values duplicated in the stack and reference types contain a pointer. When a method "returns", its frame is popped off of the stack, deallocating the memory, and execution resumes in the previous function where it left off. When the last frame is popped off of the call stack, the program is complete. The stack size is limited. In C#, the size is 1MB for 32-bit processes and 4MB for 64-bit processes. You can change these values but it's not recommended! When the stack tries to exceed its size limitations, BOOM! … stack overflow exception! How big is a frame? Roughly, add up your arguments (values + references), your local variables, and add an address. Ignoring some implementation details and compiler optimizations, a function that adds two 32b numbers together is going to be roughly 96b on the stack: 32 * 2 + return address. You may be tempted to "optimize" your code by condensing arguments and inlining code rather than breaking out functions… don't do this! These are the very definition of micro optimizations. Your compiler/interpreter does a lot of the work already and this is probably not your bottleneck by a long shot. Use a profiler! Not all languages are stack based though: Stackless Python (kinda), Haskell (graph reduction), Assembly (jmp), Stackless C (essentially inlines your functions, has limitations) The Four Memory Segments (source: Quora) How Recursive Functions Work The stack doesn't care about what the return address is. When a function calls any other function, a frame is added to the stack. To keep things simple, suppose for a Fibonacci sequence function, the frame requires 64b: 32b for the argument and 32b for the return address. Every Fibonacci call, aside from 0 or 1, adds 2 frames to the stack. So for the 100th number we will allocate roughly .6kb (100 * 2 * 32). And remember, we only have 1mb for everything.
You can actually solve Fibonacci iteratively, skipping the backtracking. Fibonacci is often given as an example of recursion for two reasons: It's relatively easy to explain the algorithm, and It shows the danger of this approach. What is Tail Recursion? The recursive Fibonacci algorithm discussed so far relies on backtracking, i.e. getting to the end of our data before starting to wind back. If we can re-write the program such that the last operation, the one in "tail position", is the ONLY recursive call, then we no longer need the frames, because they are essentially just a pass-through. A smart compiler can see that there are no operations left to perform after the next frame returns and collapse it. The compiler will first remove the last frame before adding the new one. This means we no longer have to allocate 100 * 2 extra frames on the stack and instead just 1 frame. A common approach to rewriting these types of problems involves adding an "accumulator" that essentially holds the state of the operation and then passing that on to the next function (see the sketch after these notes). The important thing here is that your ONE AND ONLY recursive call must be the LAST operation … all by itself. Joe's (Un)Official Recursion Tips Start with the end. Do it by hand. Practice, practice, practice. Joe Recursion Joe's Motivational Script:

#!/usr/bin/env python3
import sys

def getname(arg):
    print(f"{arg}")

    if arg == 'Joe':
        getname('Recursion')
    else:
        getname('Joe')
    return

if __name__ == "__main__":
    sys.setrecursionlimit(500)
    print(f"Who is awesome?")

    try:
        getname('Joe')
    except:
        print("You got this!")

Recap Recursion is a powerful tool in programming. It can roughly be defined as a function that calls itself. It's great for dynamic/unbounded data structures like graphs, trees, or linked lists. Recursive functions can be memory intensive and since the call stack is limited, it is easy to overflow. Tail call optimization is a technique and compiler trick that mitigates the call stack problem, but it requires language support and that your recursive call be the last operation in your function. FAANG-ish interviews love recursive problems, and they love to push you on the memory. Resources We Like Recursion (computer science) (Wikipedia) Dynamic Programming (LeetCode) Grokking Dynamic Programming Patterns for Coding Interviews (educative.io) Boxing and Unboxing in .NET (episode 2) IDA EBP variable offset (Stack Exchange) What is the difference between the stack and the heap? (Quora) Data Structures – Arrays and Array-ish (episode 95) Function Calls, Part 3 (Frame Pointer and Local Variables) (codeguru.com) How to implement the Fibonacci sequence in Python (educative.io) Tail Recursion for Fibonacci (GeeksforGeeks.org) Recursion (GeeksforGeeks.org) Structure and Interpretation of Computer Programs (MIT) Tail Recursion Explained – Computerphile (YouTube) !!Con 2019- Tail Call Optimization: The Musical!! by Anjana Vakil & Natalia Margolis (YouTube) Tip of the Week How to take good care of your feet (JeanCoutu.com) Be sure to add labels to your Kubernetes objects so you can later use them as your selector. (kubernetes.io) Example: kubectl get pods --selector=app=nginx Security Now!, Episode 808 (twit.tv, grc.com) AdGuard Home (adguard.com) Match CNAME records against the blocklists (GitHub)
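To make the Fibonacci discussion above concrete, here is a minimal Python sketch of the naive recursive version, the accumulator ("tail") rewrite, and the iterative version. Note that CPython does not perform tail-call elimination, so the accumulator form still uses one stack frame per call here; the point is only to show the shape of the rewrite that a tail-call-optimizing language could collapse down to a single frame.

import sys

def fib(n):
    # Naive recursion: easy to read, but the number of calls grows roughly
    # exponentially with n, and every call costs a stack frame while it's active.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def fib_tail(n, prev=0, curr=1):
    # The accumulators carry the running state, so the recursive call in tail
    # position is the only work left to do when it is made.
    if n == 0:
        return prev
    return fib_tail(n - 1, curr, prev + curr)

def fib_iter(n):
    # The same accumulator idea written as a loop: constant stack usage.
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev

print(fib(10), fib_tail(10), fib_iter(10))  # 55 55 55
print(sys.getrecursionlimit())              # CPython's default depth limit, typically 1000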

Mar 15

2 hr 8 min

It's been a minute since we last gathered around the water cooler, as Allen starts an impression contest, Joe wins said contest, and Michael earned a participation award. For those following along in their podcast app, this episode's show notes can be found at https://www.codingblocks.net/episode153. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. DataStax – Sign up today to get $300 in credit with promo code CODINGBLOCKS and make something amazing with Cassandra on any cloud. Survey Says When you start a new project, in regards to the storage technology, do you ... Take the survey at: https://www.codingblocks.net/episode153. News Thank you all for the latest reviews: iTunes: peter b :(, Jackifus, onetea_ Getting BSOD? Test your memory with MemTest86. Check out all of the test descriptions. Gather Around the Water Cooler Go deep on a single language? Or know enough about many of them? Who is hiring for remote work? Remote job resources: Datadog DataStax Microsoft Facebook Amazon GitHub GitLab Twitter Confluent Elastic MongoDB Netlify Heroku remotegamejobs.com Hand-picked remote jobs from Hacker News – Who is hiring (RemoteLeaf.com) remoteok.io Stack Overflow What companies are in your top 3? Know your storage technology. What it excels at and what it doesn't. Resources We Like DB-Engines Ranking (db-engines.com) Search Driven Apps (episode 83) Amazon CloudSearch FAQs (aws.amazon.com) Kubernetes Failure Stories (k8s.af) The Uber Engineering blog (eng.uber.com) Datadog and the Container Report, with Michael Gerstenhaber (Kubernetes Podcast from Google, episode 137) Tip of the Week Automated Google Cloud Platform Authentication with minikube. Be careful about how you use ARG in your Docker images. (docs.docker.com) Calvin and Hobbes the Search Engine (MichaelYingling.com) 11 Facts About Real-World Container Use (Datadog) Tips & Tricks for running Strimzi with kubectl (Strimzi)

Mar 1

2 hr 3 min

We dig into all things Python, which Allen thinks is pretty good, and its rise in popularity, while Michael and Joe go toe-to-toe over a gripe, ahem, feature. We realize that you _can_ use your podcast player to read these notes, but if you didn't know, this episode's show notes can be found at https://www.codingblocks.net/episode152. Check it out and join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. DataStax – Sign up today to get $300 in credit with promo code CODINGBLOCKS and make something amazing with Cassandra on any cloud. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode's Linux virtual machines. Survey Says What's your favorite Python feature? Take the survey at: https://www.codingblocks.net/episode152. News The Coding Blocks Game Jam 2021 results are in! (itch.io) Our review page has been updated! (/review) Ergonomic keyboard reviews: Kinesis Advantage 2 Full Review after Heavy Usage (YouTube) Ergonomic Keyboard Zergotech Freedom Full Review (YouTube) Why Python? A Brief History of Python. Very Brief. Python is a general-purpose high-level programming language, which can be used to develop desktop GUI applications, websites, and apps that run on sophisticated algorithms. Python was created in 1991, before JavaScript or Java, but didn't make major leaps in popularity until 1998 – 2003, according to the Tiobe index. Coincidentally, this lines up with the early days of Google, where they had a motto of "Python where we can, C++ where we must". In 2009, MIT switched from Scheme to Python, and others in academia followed. Some Python Benefits, But Only Some It's an easy language for new developers as well as those who don't consider themselves developers, such as data scientists or hobbyists, but have a need to write some code. Python has a great standard library when compared to languages like JavaScript that largely rely on third party libraries to provide depth in functionality. It's cross platform. As long as we're talking OS. Mobile? Not really, as that space is consumed with Swift, Java, and Objective-C. But with things like Pythonista, you can write and execute Python on mobile. Web? No, at least not on the client side. That space is dominated by JavaScript. But with frameworks like Django and Flask, you can use Python on the server side. In addition to the standard library, there are also many great/popular third party libraries, like NumPy, that are available on PyPI. The Downsides to Python Performance when compared to a natively compiled application because it's a dynamic, interpreted language. Which brings up the late binding type system. The lack of mobile and web presence as previously mentioned. Legacy issues when dealing with v2, which is still in use. Language features that haven't aged well, such as PEP-8. Quirks like self, or __init__, private functionality, and immutability (see the sketch after these notes). Resources We Like The Python Programming Language (Tiobe) Heavy usage of Python at Google (Stack Overflow) The Incredible Growth of Python (Stack Overflow) 2020 Developer Survey, Top Paying Technologies (Stack Overflow) What makes Python more popular than Ruby?
(Reddit) 2020 Developer Survey, Most Loved, Dreaded, and Wanted (Stack Overflow) Top 10 Python Packages For Machine Learning (ActiveState.com) 56 Groundbreaking Python Open-source Projects – Get Started with Python (data-flair.training) Top 10 Python Packages Every Developer Should Learn (ActiveState.com) Choosing the right estimator aka the scikit-learn algorithm cheat-sheet (scikit-learn.org) Previously discussed as a Tip of the Week during episode 92, Azure Functions and CosmosDB from MS Ignite (/episode92) Introduction to Celery (docs.celeryproject.org) Is it possible to compile a program written in Python? (Stack Overflow) Pythonista 3 – A Full Python IDE for iOS (omz-software.com) Previously discussed as a Tip of the Week during episode 88, What is Algorithmic Complexity? (/episode88) Flask – Web development, one drop at a time (flask.palletsprojects.com) Django – A high-level Python Web framework that encourages rapid development and clean, pragmatic design. (DjangoProject.com) PyPi – The Python Package Index (PyPI) is a repository of software for the Python programming language. (pypi.org) Ten Reasons To Use Python (cuelogic.com) Top 10 Reasons Why Python is So Popular With Developers in 2021 (upgrad.com) Python – 12. Virtual Environments and Packages (docs.python.org) Python’s virtual environments have been mentioned as a Tip of the Week twice: first during episode 102, Why Date-ing is Hard and again during episode 140, The DevOps Handbook – Enabling Safe Deployments. (/episode102, /episode140) PEP 8 — Style Guide for Python Code (python.org) The Gary Gnu Intro With Knock Knock – The Great Space Coaster (YouTube) Datadog has a blog article for everything: Tracing asynchronous Python code with Datadog APM (Datadog) How to collect, customize, and centralize Python logs (Datadog) Tip of the Week It’s easy to get the last character of a string in Python: last_char = sample_str[-1] Follow us on Twitch: Joe Allen Michael Install the Git cheat NPM module to use git cheat to easily see a Git cheat sheet in your terminal. (GitHub) gcloud has a built-in cheat sheet. Use gcloud cheat-sheet to access it. (Google Cloud)
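A small, hypothetical sketch tying together a few of the items mentioned above: the negative-indexing tip, explicit self, __init__, and Python's convention-based "privacy". The Episode class and its attributes are made up for illustration.

sample_str = "coding blocks"
last_char = sample_str[-1]   # negative indexing, as in the tip above
print(last_char)             # "s"
# Strings are immutable: sample_str[0] = "C" would raise a TypeError.

class Episode:
    def __init__(self, number):      # __init__ initializes the new instance
        self.number = number         # "self" must be spelled out explicitly
        self._notes = []             # single underscore: "private" by convention only
        self.__secret = "hunter2"    # double underscore: name-mangled, not truly private

e = Episode(152)
e._notes.append("nothing actually stops this")  # convention, not enforcement
print(e._Episode__secret)                       # name mangling is easy to bypass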

Feb 15

2 hr 20 min

We step back to reflect on what we learned from our first game jam, while Joe's bathroom is too close and Allen taught Michael something (again). Stop squinting to read this via your device's podcast player. This episode's show notes can be found at https://www.codingblocks.net/episode151, where you can join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. DataStax – Sign up today to get $300 in credit with promo code CODINGBLOCKS and make something amazing with Cassandra on any cloud. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode's Linux virtual machines. Survey Says What is your favorite lesson learned from Game Jam? Take the survey at: https://www.codingblocks.net/episode151. News We appreciate the new reviews, thank you! iTunes: ddausdd Audible: devops.rob Kinesis Advantage 2 Full Review after Heavy Usage (YouTube) Game Jam Tips Aim for the browser and to be embedded. Be careful sharing your custom URL when hosting somewhere other than the game jam as it splits your traffic and, likely, your feedback. Time management is super important. Be realistic about how much time you have. You'll be tired by the end! Start with the Game Loop (see the sketch after these notes). Try to always be playable. Aim small and prioritize the "must haves". Know what you want to learn. Judge your game against what you can do. Beware of graphics and animations! Inspiration is fine, but it can become a sinkhole. Recall from the above tips about time management and focusing on a playable game. Play into the theme. Or don't. Use tools, asset stores, and libraries, such as Tiled, PyGame, Photoshop, and/or Butler, to simplify your effort and make maximum use of your time. Consider teamwork. Borrowing ideas is fine. Keep your "elevator pitch" in mind, and evolve it. Publish early and save energy for playing. Save time to write up your game's description, take video/screenshots, etc. for the submission. Keep your game testable by having a dev mode and/or the ability to initialize a certain game state. Think about the player over and over and over. How do you teach them the game's mechanics, physics, when the game is over, etc.? And again, save time and energy for publishing your game, as well as playing and rating others' games. Resources We Like The Coding Blocks Game Jam 2021 submissions (itch.io) Tip of the Week Sign up for a game jam! (itch.io) Use -A number, -B number, or -C number to include context with your next grep output. (gnu.org) -A number will print number lines after the match. -B number will print number lines before the match. -C number will print number lines before and after the match. Add your Git commit to your Docker images as a label like: docker build --tag 1.0.0.1 --label git-commit=$GIT_COMMIT . Where $GIT_COMMIT might be something like: GIT_COMMIT=$(git rev-parse HEAD) or GIT_COMMIT=$(git rev-parse --short HEAD) if you only want to use the abbreviated commit ID. In Jenkins, you can use ${env.GIT_COMMIT} to get the current commit ID that the current build is building.
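As a starting point for the "start with the game loop" tip above, here is a minimal PyGame loop sketch (it assumes pygame is installed; the window size, fill color, and frame rate are arbitrary placeholders).

import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()

running = True
while running:
    for event in pygame.event.get():      # handle input and window events
        if event.type == pygame.QUIT:
            running = False

    # update game state here, then draw the frame
    screen.fill((30, 30, 30))
    pygame.display.flip()
    clock.tick(60)                        # cap the loop at roughly 60 FPS

pygame.quit()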

Feb 1

1 hr 43 min

We discuss all things open-source, leaving Michael and Joe to hold down the fort while Allen is away, while Joe's impersonations are spot on and Michael is on a first name basis, assuming he can pronounce it. This episode of the Coding Blocks podcast is about the people and organizations behind open-source software. We talk about the different incentives behind projects, and their governance, to see if we can understand our ecosystem better. This episode's show notes can be found at https://www.codingblocks.net/episode150, if you're reading this via your podcast player. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode's Linux virtual machines. Survey Says Which company has the best open source projects? Take the survey at: https://www.codingblocks.net/episode150. News We appreciate the new reviews, thank you! iTunes: @k_roll242, code turtle Upcoming Events: Joe is presenting at the San Diego Elastic Meetup Tuesday January 19th 2021 #CBJam is right around the corner, January 21st to 24th Subscribe to our Coding Blocks YouTube channel for all the upcoming keyboard reviews You Thought You Knew OSS Q: What do most developers think about when they think of "open-source" software? Q: Is the formal definition more important than the general perception? Formal Definitions of Open-Source opensource.org: Open source software is made by many people and distributed under an OSD-compliant license which grants all the rights to use, study, change, and share the software in modified and unmodified form. Software freedom is essential to enabling community development of open source software. opensource.com: Open source commonly refers to software that uses an open development process and is licensed to include the source code Pop Quiz, who created…? C: Dennis Ritchie Linux: Linus Torvalds Curl: Daniel Stenberg Python: Guido van Rossum JavaScript: Brendan Eich (Netscape) Node: Ryan Dahl Java: James Gosling (Sun) Git: Linus Torvalds C#: Microsoft (Anders Hejlsberg) Kubernetes: Google Postgres: Michael Stonebraker React: Facebook Rust: Mozilla Chromium: Google Flutter: Google TypeScript: Microsoft Vue: Evan You Q: It seems like most newer projects (with the exception of Vue) are associated with corporations or foundations. When and why did that change? GitHub Star Distribution Q: What are the most popular projects? Who were they made for, and why? https://github.com/EvanLi/Github-Ranking Who uses open-source software? There are a lot of stats and surveys… none great. All surveys and stats agree that open-source is on the rise. You kinda can't not use open-source software. Your OS, tools, networking hardware, etc. all use copious amounts of open-source software. Individuals Many (most) smaller libraries are written and maintained by individual authors, and have few or no contributors. Some large / important libraries have thousands of contributors. 10 most contributed GitHub projects in 2019 VS Code has almost 20k contributors. Flutter has 13k contributors. Kubernetes and Ansible have around 7k. Q: Why do individuals create open source? What do they get out of it?
Corporations A lot of corporate "open source" consists of utilities or tools for working with those companies (i.e., Azure SDK). Many open source projects are stewarded by a single company (Confluent, Elastic, MongoDB). Many open source projects listed below are now run by a foundation. Let's look at some of the most prominent projects that were started by corporations. Note: many of these projects came in through acquisitions, and many have since been donated to foundations. Microsoft Maybe the biggest? Maybe? .NET, Helm, TypeScript, Postgres, VS Code, NPM, GitHub https://opensource.microsoft.com/ https://en.wikipedia.org/wiki/Microsoft_and_open_source Google Kubernetes, Angular, Chromium, Android, Go, Dart, Protobuf, TensorFlow, Flutter, Skaffold, Spinnaker, Polymer, Yeoman https://opensource.google/projects/explore/featured Facebook React, PyTorch, GraphQL, RocksDB, Presto, Jest, Flux https://opensource.facebook.com/projects Amazon Well, lots of toolkits and SDKs for AWS… Oracle Java, MySQL https://developer.oracle.com/open-source/ Focused Corporations Sometimes a company will either outright own, or otherwise build a business centered around, a technology. These companies will typically offer services and support around open-source projects. DataStax Elasticsearch Canonical MongoDB Q: Why do corporations publish open-source software? What do they get out of releasing projects? Foundations Foundations are organizations that own open-source projects. Foundations have many different kinds of governance models, but generally they are responsible for things like… code stewardship (pull requests, versions, planning, contributors, lifecycle, support, certification), financial support (domains, hosting, marketing, grants), and legal issues (including protecting the contributors from liability). Most big open-source projects you can think of run under some sort of foundation. Typically they are funded by large corporate backers. There are a ton of foundations here, including many "one-offs": https://opensource.com/resources/organizations WordPress Foundation, Python Foundation, Mozilla foundations. Foundations are run in a variety of ways, and for different reasons; some even offer many competing projects. https://opensource.guide/leadership-and-governance/ BDFL – Python, small projects, one person has final say. "Meritocracy" (not a great term) – Active contributors are given decision-making power, voting. Liberal Contribution – Projects seek consensus rather than a pure vote, and strive to be inclusive (Node, Rust). Apache (1999) Governance: https://www.apache.org/foundation/governance/ Org Chart: https://www.apache.org/foundation/governance/orgchart https://projects.apache.org/ Non-profit company, mostly Java, tons of libraries, data. All volunteer board, 350+ projects. Projects HTTPD, Kafka, Spark, Flink, Groovy, Avro, Log4…, Maven, ActiveMQ, Lucene, Solr, Cloud Native Computing Foundation Kubernetes, Helm, Prometheus, Fluentd, Linkerd, OpenTracing A whole bunch of others that start with a K Linux Foundation Linux Kernel Kubernetes..? Ah, they're over CNCF, and many, many, many other things Let's Encrypt, NodeJS (through the OpenJS Foundation) Q: Why do corporations donate projects, why do individuals? Who really owns open-source code? Resources We Like USA Facts – Our nation, in numbers (usafacts.org) Tip of the Week Peacock – Subtly change the color of your VS Code workspace. (marketplace.visualstudio.com) SketchUp – 3D model *all* of your projects. (sketchup.com)

Jan 18

2 hr 9 min

We start off the year discussing our favorite developer tools of 2020, as Joe starts his traditions early, Allen is sly about his résumé updates, and Michael lives to stream. For those that read the show notes via their podcast player and find themselves wondering where they can find these show notes on their computer, the answer is simple: https://www.codingblocks.net/episode149. Check it out and join the discussion. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says Which hand do you type the 6 key with? Take the survey at: https://www.codingblocks.net/episode149. News Thank you for the new reviews we received this holiday! iTunes: larsankile, 0xACE, Tbetcha33, The_Shrike_, shineycloud Watch Joe speak at the virtual San Diego Elastic Meetup, Tuesday, January 19, 2021 at 5:00 PM PST, where he is talking about Easy Local Development with Elastic Cloud on Kubernetes using Skaffold. Signup at community.elastic.co. Join our game jam January 21 – 24, 2021 and let’s make games! The Top Tools of 2020 K9s – A terminal UI to interact with your Kubernetes cluster. Popeye – A utility that scans a live Kubernetes cluster and reports potential issues with deployed resources and configurations. All things JetBrains, especially IntelliJ IDEA, PyCharm, and DataGrip. Amazing, cross-platform tools that developers from all tech stacks can use. Skaffold – Handles the workflow for building, pushing, and deploying your application within Kubernetes, allowing you to focus on writing your code. Kustomize – A template-free way to customize Kubernetes application configuration, built into kubectl. Oh My Zsh – An open source, community-driven framework for managing your zsh configuration. Netlify – Unites everything teams need to build and run dynamic web experiences, from preview to production, using an intuitive git-based workflow and a powerful serverless platform. Jira StopWatch – A desktop tool for recording time spent on different Jira issues. Lens – The Kubernetes IDE, Lens provides full situational awareness for everything that runs in Kubernetes, lowering the barrier of entry for those just getting started with Kubernetes, while radically improving productivity for the experienced users. Zoom – Keeping you connected wherever you are. Hands down, the best video calling software out there for screen sharing; you can actually read the other person’s screen! Prometheus – Power your metrics and alerting with a leading open-source monitoring solution. Grafana – The world’s most popular technology used to compose observability dashboards. Visual Studio Code – Code editing, redefined. Free and built on open source. Runs everywhere. Favorite VS Code extensions: Bracket Pair Colorizer – Matching brackets can be easily identified in color. GitLens – Supercharges the Git capabilities built into Visual Studio Code. Rainbow CSV – Highlight CSV files and run SQL-like queries. Cmder – A software package created out of pure frustration over the absence of nice console emulators on Windows. Unraid – Take control of your data, media, applications, and desktops, using just about any combination of hardware. 
Resources We Like Streaming: Debugging C# in Kubernetes and Skaffold vs Kustomize (YouTube) Tip of the Week Challenging projects every programmer should try (utk.edu) More challenging projects every programmer should try (utk.edu) Duck DNS – A free dynamic DNS hosted on AWS (duckdns.org) Honorable DNS mentions: 8.8.8.8/8.8.4.4 – Google Public DNS (Wikipedia) 9.9.9.9/9.9.9.10/9.9.9.11 – Quad9 (Wikipedia) 1.1.1.1/1.1.1.2/1.1.1.3 – Cloudflare’s DNS (Wikipedia) MIT’s Missing Semester of CS Education: The Missing Semester of Your CS Education (mit.edu) Missing Semester IAP 2020 (YouTube) 2020 Lectures (mit.edu)

Jan 4

2 hr 31 min

It's the end of 2020. We're all tired. So we phone it in for the last episode of the year as we discuss the State of the Octoverse, while Michael prepared for the wrong show (again), Allen forgot to pay his ISP bill, and Joe's game finished downloading. In case you're wondering where you can find these show notes in all their 1:1 pixel digital glory because you're reading them via your podcast app, you can find them at https://www.codingblocks.net/episode148, where you can also join the conversation. Sponsors Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. xMatters – Sign up today to learn how to take control of your incident management workflow and get a free xMatters t-shirt. Survey Says What's your favorite Christmas movie? Take the survey at: https://www.codingblocks.net/episode148. News Joe will be speaking at the virtual San Diego Elastic Meetup, Tuesday, January 19, 2021 at 5:00 PM PST, talking about Easy Local Development with Elastic Cloud on Kubernetes using Skaffold. Sign up at community.elastic.co. We're hosting a game jam January 21 – 24, 2021 – let's make games! Follow all of our upcoming events! (/events) Watch and learn with Joe and Michael as they dive into Kubernetes: Local Kubernetes dev with Helm and Skaffold (YouTube) Streaming: Debugging C# in Kubernetes and Skaffold vs Kustomize (YouTube) Resources We Like The 2020 State of the Octoverse (GitHub) Curl turns 20, HTTP/2, QUIC, The Changelog, Episode #299 (changelog.com) Parent Driven Development, a podcast about parenting in tech (parentdrivendevelopment.com) Tip of the Week K9s, a terminal UI to interact with your Kubernetes clusters (GitHub) Popeye, a utility that scans a live Kubernetes cluster and reports potential issues with deployed resources and configurations. (popeyecli.io) IPython, a powerful interactive shell and so much more. (ipython.org) The Google Authenticator app now makes it super easy to export your two-factor settings to another device. (App Store, Google Play) Dungeon Map Doodler, a free drawing tool that allows you to easily create maps for all your gaming needs. (dungeonmapdoodler.com)

Dec 2020

2 hr 4 min

We discuss the things we're excited about for 2021 as Michael prepared for a different show, Joe can't stop looking at himself, and Allen gets paid by the tip of the week. For those that aren't in the know, these show notes can be found at https://www.codingblocks.net/episode147. Stop by and join the conversation. Sponsors Command Line Heroes – A podcast that tells the epic true tales of developers, programmers, hackers, geeks, and open source rebels who are revolutionizing the technology landscape. Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. xMatters – Sign up today to learn how to take control of your incident management workflow and get a free xMatters t-shirt. Survey Says What are the least amount of bits (or smallest data type) your annual salary, in whole dollars, could fit in? Take the survey at: https://www.codingblocks.net/episode147. News We really appreciate the new reviews. iTunes: 8BitToaster, NotResp, masterull We're hosting a game jam January 21 – 24, 2021 – let's make games! Follow all of our upcoming events! (/events) Who's Excited about What Joe Interactive online/streaming events DevOps and SRE technologies Python Game development Allen .NET 5 DevOps technologies Kubernetes Game Jams Big Data Video content creation Presentations IoT Machine Learning Michael Kubernetes all the things Kotlin Resources We Like 15 Kubernetes Tools For Deployment, Monitoring, Security, & More (phoenixNAP.com) Kubernetes Training and Certification (kubernetes.io) We're sad to see you go, Azure Notebooks (notebooks.azure.com) At least they list some alternatives. Announcing .NET 5.0 (devblogs.microsoft.com) Must Buys Price Description $17.99 Kasa Smart HS220 Dimmer Switch by TP-Link (Amazon, Best Buy) $17.46 Real-World Machine Learning (Amazon) Tip of the Week Manning Publications has a lot of their books available in audio form on Audible. Top 9 companies that are hiring software engineers to work remotely (HackReactor.com) iTerm2 – A terminal emulator for macOS that does amazing things. (iterm2.com) Use CMD+SHIFT+. to edit the command being pasted before running it. Some helpful tips for the holiday season: Capital One Shopping: Save in seconds (chrome web store) Automatically find and apply coupon codes with Honey. (chrome web store) Use PayPal Key as a virtual credit card to use your PayPal account anywhere credit cards are accepted. (PayPal) Use Privacy to create single- or limited-use credit cards (privacy.com) Port forward services from your Kubernetes cluster for external access to debug and test, like kubectl port-forward svc/svc-name 7000:8000. Use Port Forwarding to Access Applications in a Cluster (kubernetes.io)

Dec 2020

2 hr 14 min

We learn all the necessary details to get into the world of developer game jams, while Michael triggers all parents, Allen's moment of silence is oddly loud, and Joe hones his inner Steve Jobs. If you're reading these show notes via your podcast player and wondering where you can find them in your browser, well wonder no more. These show notes can be found at https://www.codingblocks.net/episode146 in all their 8-bit glory. Check it out and join the conversation. Sponsors Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. xMatters – Sign up today to learn how to take control of your incident management workflow and get a free xMatters t-shirt. Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says What kind of game do you want to make? Take the survey at: https://www.codingblocks.net/episode146. News Thank you to everyone that left us a new review! iTunes: AbhishekN12, shkpstrbtorurn, Herkamer's dad, Bamers22 Stitcher: Pour one out for Stitcher reviews, as sadly, they no longer have them. 8( Follow our upcoming events! (/events) Unlimited Google Photos … for a limited time. (blog.google) Woo-hoo! We cracked the top 15 in the Apple Podcasts Technology category somehow! Is Game Dev your Jam? What are Game Jams? A timed challenge to create and publish games. Similar to a musical jam, people bring in their own perspectives and skills, and make new stuff for the world. Did you know there is an International Game Jam conference? When's the next game jam? Popular Game Jams Ludum Dare (ldjam.com) An online event where games are made from scratch in a weekend. Check it out every April and October! Global Game Jam (https://globalgamejam.org) Empowers individuals worldwide to learn, experiment, and create together through the medium of games. In January 2020, they had 934 locations in 118 countries create 9,601 games in one weekend! GGJ 2021 is scheduled for January 29-31, 2021. 7DRL (7drl.com) The 7DRL Challenges are NOT about being a fast coder, but rather proving you can release a finished, playable roguelike to the world. itch.io (itch.io) itch.io is a simple way to find and share indie games online for free. GMTK is 48 hours, with 18k entrants. Anyone can instantly create and host a jam. 109,365 games have been created for jams hosted on itch.io. Why Should You do a Game Jam? Meet new people, see how they solved similar problems. Learn something new. Maybe you'll love it? Great for your GitHub, blog, Twitch, etc. content. Maybe make millions of $$$. Step 1. enter game jam, Step 2. … something about making a game …, and Step 3. profit! How can you Game Jam? Check Indie Game Jams for a jam that interests you. No need to take time off, just do what you can. You can even make your own jams. Popular Tools Unity Unreal Godot Game Maker RPG Maker App Game Kit Resources We Like Indie Game Jams (indiegamejams.com) International Conference on Game Jams (indiegamejams.com) Ludum Dare (ldjam.com) Ludum Dare 47 event stats (ldjam.com) Global Game Jam Online (globalgamejam.org) What is Global Game Jam? (YouTube) The 7DRL Challenge (7drl.com) 7DRL Challenge 2020 (7drl.com) itch.io Game Jams on itch.io GMTK Game Jam 2020 (itch.io) GMTK Game Jam 2020 submissions (itch.io) GMTK Game Jam 2020 Winners (gmtk.itch.io) Stay Safe!
Jam submissions (itch.io) 10 Awesome Game Jam Success Stories (gamesparks.com) All Your Database Are Belong to Us (episode 13) Tip of the Week Some places to learn Python (pyatl.dev) JSONPath StatusBar (marketplace.visualstudio.com) An awesome curated list of chaos engineering resources (GitHub) Ultra-portable 13″ laptops that Michael has his eye on: Mentioned on-air Description Starting Price Dell XPS 13 Developer Edition (Dell) $1,049.00 Razer Blade Stealth 13 (Best Buy) online/clearance at $1,299.99 (seen for less in the store) Razer Book 13 (Razer) $1,199.99 Honorable (and/or forgotten) mentions Starting Price Description $999.00 Samsung Galaxy Book Flex 13.3″ (Amazon) $999.99 Apple MacBook Air 13.3″ with Apple M1 Chip (Amazon)

Nov 2020

2 hr 16 min

We wrap up our deep dive into The DevOps Handbook, while Allen ruined Halloween, Joe isn't listening, and Michael failed to… forget it, it doesn't even matter. If you're reading this via your podcast player, this episode's full show notes can be found at https://www.codingblocks.net/episode145 where you can join the conversation, as well as find past episodes' show notes. Sponsors Command Line Heroes – A podcast that tells the epic true tales of developers, programmers, hackers, geeks, and open source rebels who are revolutionizing the technology landscape. Educative.io – Learn in-demand tech skills with hands-on courses using live developer environments. Visit educative.io/codingblocks to get an additional 10% off an Educative Unlimited annual subscription. xMatters – Sign up today to learn how to take control of your incident management workflow and get a free xMatters t-shirt. Survey Says How often _should_ you update your resume? How often _do_ you update your resume? Take the surveys at: https://www.codingblocks.net/episode145. News Thank you to everyone that left us a new review! iTunes: AbhishekN12, Streichholzschächtelchen Mann Stitcher: emirdev Zoom introduces the PodTrak P8. (zoomcorp.com) Wrapping up The Third Way Use Chat Rooms and Bots to Automate and Capture Organizational Knowledge Chat rooms have been increasingly used for triggering actions. One of the first to do this was ChatOps at GitHub. By integrating automation tools within the chat, it was easy for people to see exactly how things were done. Everyone sees what's happening. Onboarding is nice because people can look through the history and see how things work. This helps enable fast organizational learning. Another benefit is that typically chat rooms are public, so it creates an environment of transparency. One of the more beneficial outcomes was that ops engineers were able to discover problems quickly and help each other out more easily. "Even when you're new to the team, you can look in the chat logs and see how everything is done. It's as if you were pair-programming with them all the time." Jesse Newland Automate Standardized Processes in Software for Re-Use Oftentimes developers document things in wikis, SharePoint systems, Word documents, Excel documents, etc., but other developers aren't aware these documents exist, so they do things a different way, and you end up with a bunch of disparate implementations. The solution is to put these processes and standards into executable code stored in a repository. Create a Single, Shared Source Code Repository for Your Entire Organization This single repository enables quick sharing amongst an entire organization. In 2015, Google had a single repository with over 1 billion files and over 2 billion lines of code. This single repository is used by every software engineer and every product. This repository doesn't just include code, but also: Configuration standards for libraries, infrastructure, and environments like Chef, Ansible, etc., Deployment tools, Testing standards and tools as well as security, Deployment pipeline tools, Monitoring and analysis tools, and Tutorials and standards. Whenever a commit goes in, everything is built from code in the shared repo: no dynamic linking. This ensures everything works with the latest code in the repository. By building everything off a single source tree, Google eliminates the problems you encounter when you use external dependency management systems like Artifactory, NuGet, etc.
Spread Knowledge by Using Automated Tests as Documentation and Communities of Practice Sharing libraries throughout an organization means you need a good way of sharing expertise and improvements. Automated tests are a great way to ensure things work with new commits, and they are self-documenting. TDD turns tests into up-to-date specifications of a system. Want to know how to use the library? Take a look at the test suites (a small example follows these notes). Ideally you want to have one group responsible for owning and supporting a library. Ideally you only ever have one version of that code out in production. It will contain the current best collaborative knowledge of the organization. The owner is also responsible for migrating each consumer of the library to the next version. This requires the consumers to have a good suite of automated testing as well. Another great use of chat rooms is to have one for each library. Design for Operations Through Codified Non-Functional Requirements When developers are responsible for incident response in their deployed applications, their applications become better designed for operations. As developers are involved in non-functional requirements, we design our systems for faster deployment, better reliability, the ability to detect problems, and graceful degradation. Some of these non-functionals are: Production telemetry, Ability to track dependencies, Resilient and gracefully degrading services, Forward and backward compatibility between versions, Ability to archive data to reduce size requirements, Ability to search and understand log messages, Ability to trace requests through multiple services, and Centralized runtime configurations. Build Reusable Operations User Stories into Development When there is ops work that needs to be done but can't be fully automated, we need to make it as repeatable and deterministic as we can. Automate as much as possible. Document the rest for operations. Automation for handoffs is also helpful. By having these workflows and handoffs in place, it's easier to measure and forecast future needs and ETAs. Ensure Technology Choices Help Achieve Organizational Goals Any technology introduced introduces more pressure on operations for support. If operations cannot support it, then the group that owns the service or library becomes a bottleneck, which could be a major problem. Always be identifying technologies that appear to be the problem areas. Maybe they: Slow the flow of work, Create high levels of unplanned work (i.e. fire fighting), Cause an unbalanced number of support requests, and/or Don't really meet organizational goals, such as stability, throughput, etc. This doesn't mean don't use new technologies or languages, but know that your level of support greatly diminishes as you go into uncharted territories. Reserve Time to Create Organizational Learning and Improvement Dedicate time over several days to attack and resolve a particular problem or issue. Use people outside the process to assist those inside the process. The most intense methodology is a 30-day focus group with coaches and engineers that focus on solving real company problems. It's not uncommon to solve in days what used to take months. Institutionalize Rituals to Pay Down Technical Debt Schedule time, a few days, a week, whatever, to fix problems you care about. No feature work allowed. Could be code problems, environment, configuration, etc. Usually you want to include people from different teams working together, i.e. operations, developers, InfoSec, etc.
Present accomplishments at the end of the blitz. Enable Everyone to Teach and Learn Everyone should be encouraged to teach and learn in their own ways. It's becoming more important than ever for folks to have knowledge in more than just one area to be successful. Encourage cross functional pollination, i.e. have operations show developers how they do something, or vice versa. Share Your Experiences from DevOps Conferences Organizations should encourage their employees to attend and/or speak at conferences. Hold your own company conference, even if it's just for your small team. Create Internal Consulting and Coaches to Spread Practices Encourage your SMEs to have office hours where they'll answer technical questions. Create groups with missions to help the organization move forward. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) Comparing Git Workflows (episode 90) Git Large File Storage (GitHub) Devopsdays (devopsdays.org) DevOps: Job Title or Job Responsibility? (episode 118) Tip of the Week Diff syntax highlighting in GitHub Markdown (Stack Overflow) Code Chefs – Hungry Web Developer Podcast (Apple Podcasts) Maria, a coding environment for beginners (maria.cloud) CodeWorld, create drawings, animations, and games using math, shapes, colors, and transformations. (code.world) Generation numbers and preconditions – Apply preconditions to guarantee atomicity of multi-step transactions with object generation numbers to uniquely identify data resources. (cloud.google.com) helm search repo – Search repositories for a keyword in charts. (helm.sh) Use -cur_console:p5 in your cmder WSL profile to ensure that the arrow keys work as expected on Windows 10 (GitHub) cmder – A portable console emulator for Windows (cmder.net)
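As a small illustration of the "automated tests as documentation" idea in the notes above, here is a hypothetical shared-library function and a test that doubles as its usage example. The slugify function and file name are made up; the test runs with pytest.

# test_slugify.py
def slugify(title):
    # A shared-library helper: turn a title into a URL-friendly slug.
    return "-".join(title.lower().split())

def test_slugify_documents_how_to_use_it():
    # Reading this test tells a new consumer exactly what to pass in and what comes back.
    assert slugify("The DevOps Handbook") == "the-devops-handbook"
    assert slugify("  extra   spaces  ") == "extra-spaces"

In a shared, single repository, tests like this ride along with the library, so every consumer sees the current, working usage rather than a stale wiki page.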

Nov 2020

2 hr 16 min

It’s our favorite time of year where we discuss all of the new ways we can spend our money in time for the holidays, as Allen forgets a crucial part, Michael has “neons”, and Joe has a pet bear. Reading this via your podcast player? If so, you can find this episode’s full show notes at https://www.codingblocks.net/episode144, where you can join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Teamistry – A podcast that tells the stories of teams who work together in new and unexpected ways, to achieve remarkable things. Survey Says Which do you want the most? Take the survey at: https://www.codingblocks.net/episode144. News Thank you for the latest new review! Stitcher: wsha And I’m Spent Allen’s List for the Content Creators Price Description   $149.99 Elgato Stream Deck (Amazon) $69.99 TDisplay Capture Card (Amazon) $329.00 Oktava MK-012 Small Diaphragm Condenser mic (Amazon) $199.99 Zoom Podtrak P4 (Best Buy) $7.00 Oktava WS-012 foam Wind-Screen for MK-012 (Amazon) $16.99 LYRCRO Microphone Shock Mount Holder (Amazon) $274.99 product (TechSmith) $26.99 Kasa Smart Plugs 4 Pack (Amazon) $14.99 3 prong extension cables 1′ (Amazon) $29.99 Toazoe Articulating 11″ Arm (Amazon) $12.99 Mounting pole (Amazon) $14.99 Desk mounting bracket for pole (Amazon) $199.99 Elgato Key Light (Amazon) $90.00 Aputure Amaran MC RGBWW (Amazon) $186.20 Silverstone ATX CS380B (Amazon) $139.99 Ryzen 5 3400G (Amazon) $99.99 Silicon Power 32GB 3200Mhz (Amazon) $217.99 WD Easy Store 12TB (Amazon) $44.99 Thermaltake 500w 80+ (Amazon) $365.00 ZSA Moonlander (ZSA) Michael’s List to Pimp Your Desk   Description Price Apple iPad Pro (12.9-inch, WiFi, 256GB) (Amazon) $1,079.00 amFilm (2 Pack) Glass Screen Protector for iPad Pro 12.9 inch (2020 & 2018 Models) (Amazon) $12.99 ProCase iPad Pro 12.9 Case 4th Generation 2020 & 2018 (Amazon) $21.99 Apple AirPods with Charging Case (Wired) (Amazon) $129.00 Honorable mention: Apple AirPods Pro (Amazon) $219.00 Philips Hue 555334 BLE Lightstrip LED Light strip, 2m / 6ft Base Kit (Amazon) $79.99 Philips Hue Indoor Motion Sensor for Smart Lights (Amazon) $38.95 Elgato Stream Deck (15 Keys) (Amazon) $149.99 Knox Microphone Shock Mount for Audio-Technica ATR2100-USB and Samson Q2U (Amazon) $19.99 On-Stage Foam Ball-Type Microphone Windscreen (Amazon) $2.99 ZSA Moonlander (ZSA) $365.00 Glorious 3XL Extended Gaming Mouse Mat/Pad (Amazon) $49.99 Razer Gaming Mouse Bungee v2 (Amazon) $19.99 ALIENWARE AW3420DW Curved 34 Inch WQHD 3440 X 1440 120Hz Monitor (Amazon) $1,029.99 Honorable mention:Dell AW3418DW Alienware 34 Curved Gaming Monitor (Amazon) n/a Microsoft Xbox Series X (Amazon) $499.99 Castle Grayskull …   Description Price Tilted Nation RGB Headset Stand and Gaming Headphone Display with Mouse Bungee Cord Holder with USB 3.0 HUB (Amazon) n/a Joe’s List to Make Bank Well, first there’s *the* chair … Price Description   $3,299.00 Cluvens Scorpion Computer Cockpit (Cluvens) Flippa – Buy an online business, become an acquisition entrepreneur, and invest in digital real estate. Start Engine – Invest and buy shares in startups and small businesses. (DEFUNCT) Gratipay – An open source startup helping open source projects. Liberapay – A recurrent donations platform. Fund the creators and projects you appreciate. Wefunder – Back founders solving the problems you care about and help their startups grow. Crowdfunder – Where entrepreneurs and investors meet. Fundrise – The future of real estate investing. 
Localstake – Connecting businesses and local investors. SeedInvest – Own a piece of your favorite startups. Resources We Like Indoor Boom Microphones: Oktava MK-012, RODE NT5, Audio Technica AT4053b (YouTube) The Aputure MC Video LED Light is Amazing! (YouTube) Tip of the Week The Docker image for Google Cloud SDK is the easiest way to interact with the Google cloud. (hub.docker.com) Play Hades. Play it now. (Steam) Use git log --reverse to see the repo history from the beginning. (git-scm.com)
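If you want to try the Cloud SDK tip without installing anything locally, a hedged example (the image is the google/cloud-sdk listing on Docker Hub; the tag and volume name are just illustrative): docker run --rm -it google/cloud-sdk:latest gcloud version confirms the tooling works, and mounting a volume such as -v gcloud-config:/root/.config on subsequent runs is one way to keep the credentials from gcloud auth login around between containers. In the same spirit, git log --reverse --oneline is a compact way to read a repo’s history oldest-first.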

Oct 2020

2 hr 18 min

We dive into the benefits of enabling daily learning into our processes, while it's egregiously late for Joe, Michael's impersonation is awful, and Allen's speech is degrading. This episode’s show notes can be found at https://www.codingblocks.net/episode143, for those reading this via their podcast player, where you can join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Teamistry – A podcast that tells the stories of teams who work together in new and unexpected ways, to achieve remarkable things. Survey Says How often do you change jobs? Take the survey at: https://www.codingblocks.net/episode143. News Thank you to everyone that left us a new review! iTunes: John Roland, Shefodorf, DevCT, Flemon001, ryanjcaldwell, Aceium Stitcher: Helia Allen saves your butt with his latest chair review on YouTube. Enable and Inject Learning into Daily Work To work on complex systems effectively and safely we must get good at: Detecting problems, Solving problems, and Multiplying the effects by sharing the solutions within the organization. The key is treating failures as an opportunity to learn rather than an opportunity to punish. Establish a Just, Learning Culture By promoting a culture where errors are “just” it encourages learning ways to remove and prevent those errors. On the contrary, an “unjust” culture, promotes bureaucracy, evasion, and self-protection. This is how most companies and management work, i.e. put processes in place to prevent and eliminate the possibility of errors. Rather than blaming individuals, take moments when things go wrong as an opportunity to learn and improve the systems that will inevitably have problems. Not only does this improve the organization’s systems, it also strengthens relationships between team members. When developers do cause an error and are encouraged to share the details of the errors and how to fix them, it ultimately benefits everyone as the fear of consequences are lowered and solutions on ensuring that particular problem isn’t encountered again increase. Blameless Post Mortem Create timelines and collect details from many perspectives. Empower engineers to provide details of how they may have contributed to the failures. Encourage those who did make the mistakes to share those with the organization and how to avoid those mistakes in the future. Don’t dwell on hindsight, i.e. the coulda, woulda, and shoulda comments. Propose countermeasures to ensure similar failures don’t occur in the future and schedule a date to complete those countermeasures. Stakeholders that should be present at these meetings People who were a part of making the decisions that caused the problem. People who found the problem. People who responded to the problem. People who diagnosed the problem. People who were affected by the problem. Anyone who might want to attend the meeting. The meeting Must be rigorous about recording the details during the process of finding, diagnosing, and fixing, etc. Disallow phrases like “could have” or “should have” because they are counterproductive. Reserve enough time to brainstorm countermeasures to implement. These must be prioritized and given a timeline for implementation. Publish the learnings and timelines, etc. from the meeting so the entire organization can gain from them. 
Finding More Failures as Time Moves On As you get better at resolving egregious errors, the errors become more subtle and you need to modify your tolerances to find weaker signals indicating errors. Treat applications as experiments where everything is analyzed, rather than stringent compliance and standardization. Redefine Failure and Encourage Calculated Risk Taking Create a culture where people are comfortable with surfacing and learning from failures. It seems counter-intuitive, but allowing more failures also means that you’re moving the ball forward. Inject Production Failures The purpose is to make sure failures can happen in controlled ways. We should think about making our systems crash in a way that keeps the key components protected as much as possible, i.e. graceful degradation. Use Game Days to Rehearse Failures “A service is not really tested until we break it in production.” Jesse Robbins Introduce large-scale fault injection across your critical systems. These game days are scheduled with a goal, like maybe losing connectivity to a data center. This gives everyone time to prepare for what would need to be done to make sure the system still functions, failovers, monitoring, etc. Take notes of anything that goes wrong, then find, fix, and retest. On game day, force an outage. This exposes things you may have missed, not anticipated, etc. Obviously the goal is to create more resilient systems. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) Netflix Chaos Monkey (GitHub) Chaos Mesh, a cloud-native platform that orchestrates chaos on Kubernetes environments. (GitHub) Alexey Golub’s Twitter response (thread) to our discussion of his article Unit Testing is Overrated during episode 141. Etsy’s post mortem tracker: morgue (GitHub) 1987 Crash Test Dummies PSA – Buckle Up (YouTube) Tip of the Week Firefox Relay – Hide your real email address to help protect your identity (relay.firefox.com) Honorable mention: Sign in with Apple (support.apple.com) How I Built This with Guy Raz – Khan Academy: Sal Khan (NPR) Boost your student’s learning (Khan Academy) Automate your world at the push of a button with the Elgato Stream Deck. (Elgato) The Social Dilemma – A hybrid documentary-drama that explores the dangerous human impact of social networking. (Netflix) Migrate your repos from TFVC (aka Team Foundation Version Control) to Git using git-tfs. (GitHub) Migrate from TFVC to Git (docs.microsoft.com) Use Azure DevOps to simplify the migration process: Import repositories from TFVC to Git (docs.microsoft.com) What’s the difference between git-tf and git-tfs? (Stack Overflow)
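If you go the git-tfs route from the tips above, the clone step looks roughly like this (the collection URL and $/ project path are placeholders, so check the git-tfs docs for your exact layout): git tfs clone http://tfs:8080/tfs/DefaultCollection $/MyProject. The result is a plain Git repository with the TFVC history, which you can then push to whatever Git host you use.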

Oct 2020

1 hr 52 min

We wrap up the second way from The DevOps Handbook, while Joe has a mystery episode, Michael doesn’t like ketchup, and Allen has a Costco problem. These show notes, in all of their full sized digital glory, can be found at https://www.codingblocks.net/episode142, where you can join the conversation, for those using their podcast player to read this. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says When a new mobile OS update comes out on iOS or Android, do you ... Take the survey at: https://www.codingblocks.net/episode142. News We appreciate the new reviews, thank you! iTunes: Jynx_Protocol Stitcher: IllKeepItEasyForYouMike, KingJArthur Treating Features as Experiments Integrate Hypothesis Driven Development and A/B Testing “The most inefficient way to test a business model or product idea is to build the complete product to see whether the predicted demand actually exists.” Jez Humble Constantly ask should we build it and why? A/B testing will allow us to know if an idea is worthwhile because it allows for fast feedback on what’s working. Doing these experiments during peak season can allow you to out-experiment the competition. But this is only possible if you can deploy quickly and reliably. This allows A/B testing to support rapid, high-velocity experiments. A/B testing is also known as “split testing”. A/B testing is where one group sees one version of a page or feature and the other group sees another version (see the small bucketing sketch after these notes). A study from Microsoft found that only about 1/3 of features actually improved the key metric they were trying to move! The important takeaway? Without measuring the impact of features, you don’t know if you’re adding value or decreasing it while increasing complexity. Integrate A/B Testing Into Releases Effective A/B testing is only possible with the ability to do production releases quickly and easily. Using feature flags allows you to deliver multiple versions of the application without requiring separate hardware to deploy to. This requires meaningful telemetry at every level of the stack to understand how the application is being used. Etsy open-sourced their Feature API, used for online ramp-ups and throttling exposure to features. Optimizely and Google Analytics offer similar features. Integrating A/B Testing into Feature Planning Tie feature changes to actual business goals, i.e. the business has a hypothesis and an expected result and A/B testing allows the business owner to experiment. The ability to deploy quickly and reliably is what enables these experiments. Create Processes to Increase Quality Eliminate the need for “approvals” from those not closely tied to the code being deployed. Development, Operations and InfoSec should constantly be collaborating. The Dangers of Change Approval Processes Bad deployments are often attributed to: Not enough approval processes in place, or Not good enough testing processes in place. The finding, however, is that command-and-control environments usually raise the likelihood of bad deployments. Beware of “Overly Controlling Changes” Traditional change controls can lead to: Longer lead times, and/or Reducing the “strength and immediacy” of the deployment process. Adding these traditional controls adds more “friction” to the deployment process, by: Multiplying the number of steps in the approval process, Increasing batch sizes (size of deployments), and/or Increasing deployment lead times. People closest to the items know the most about them. 
Requiring people further from the problem to do approvals reduces the likelihood of success. As the distance between the person doing the work and the person approving the work increases, so does the likelihood of failure. Organizations that rely on change approvals often have worse stability and throughput in their IT systems. The takeaway is that peer reviews are much more effective than outside approvals. Enable Coordination and Scheduling of Changes The more loosely coupled our architecture, the less we have to communicate between teams. This allows teams to make changes in a much more autonomous way. This doesn’t mean that communication isn’t necessary; sometimes you HAVE to speak to someone. Especially true when overarching infrastructure changes are necessary. Enable Peer Review of Changes Those who are familiar with the systems are better suited to review the changes. Smaller changes are much better. The size of a change is not linear with the risk of the change. As the size of a change increases, the risk goes up by way more than a factor of one. Prefer short-lived branches. “Scarier” changes may require more than just one reviewer. Potential Dangers of Doing More Manual Testing and Change Freezes The more manual testing you do, the slower you are to release. The larger the batch sizes, the slower you are to release. Enable Pair Programming to Improve All Our Changes “I can’t help wondering if pair programming is nothing more than code review on steroids.” Jeff Atwood Pair programming forces communication that may never have happened. Pair programming brings many more design alternatives to life. It also reduces bottlenecks of code reviews. Evaluating the Effectiveness of Pull Request Processes Look at production outages and tie them back to the peer reviews. The pull request should have good information about what the context of the change is: Sufficient detail on why the change is being made, How the change was made, and Any risks associated with it. Fearlessly Cut Bureaucratic Processes The goal should be to reduce the number of outside approvals, meetings, and signoffs that need to happen to deploy the application. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) Multivariate Testing vs A/B Testing (optimizely.com) Etsy’s Feature API (GitHub) Pair Programming vs. Code Reviews (blog.codinghorror.com) Tip of the Week Allen’s love of all things Costco has been extended to their credit card. Of course it has. (Costco) Doom Emacs – A configuration framework tailored for those that want a faster, stable environment with less framework in their frameworks. (GitHub) Create (and commit!) an EditorConfig based on your current project with IntelliCode in Visual Studio. (docs.microsoft.com) Allen’s less trick: Is it possible to keep the output of less on the screen after quitting? (Stack Overflow) Head in the Clouds podcast (libsyn.com) GitHub CLI (!!!) (github.blog) Speed up your k8s development on Windows with WSL: Developing for Docker + Kubernetes with Windows WSL (Medium) Configure bash completion for your kubectl commands: kubectl Cheat Sheet (kubernetes.io) We previously discussed this cheat sheet during episode 107, but didn’t highlight the bash completion at that time.
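To make the split testing idea above a little more concrete, here is a minimal Python sketch of deterministic bucketing (this is not Etsy’s Feature API or any particular tool, and the function, experiment, and user names are made up for illustration):

import hashlib

def variant_for(user_id: str, experiment: str, rollout_percent: float) -> str:
    # Hash the user and experiment name so the same user always lands in the same bucket.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash to a value in [0, 1]
    return "B" if bucket < rollout_percent / 100.0 else "A"

# Ramp the new checkout flow up to 10% of users; everyone else stays on the control.
print(variant_for("user-42", "new-checkout-flow", 10))

Because the assignment is deterministic, you can raise the rollout percentage over time without reshuffling users who have already seen variant B.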

Sep 2020

1 hr 50 min

We gather around the water cooler to discuss some random topics, while Joe sends too many calendar invites, Allen interferes with science, and Michael was totally duped. If you’re reading these show notes via your podcast player, you can find this episode’s full show notes at https://www.codingblocks.net/episode141. As Joe would say, check it out and join the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt. Survey Says Do you eat at your desk? Take the survey at: https://www.codingblocks.net/episode141. News Thank you to everyone that left us a new review: iTunes: I Buy, Not Play! Stitcher: kconrad53, TheNicknameFieldIsTooShor, hopkir Factorio v1 is official! (factorio.com) Overheard Around the Water Cooler Are your unit tests bringing you down? Do we hate unit tests now? Unit Testing Is Overrated (Hacker News) Is TDD Dead? (Reddit, Hacker News) Write tests. Not too many. Mostly integration. (KentCDodds.com) Do you have to read your emails? Are you ruining everything by working late? Is Kubernetes programmings? When developing the next big deal application, which of the following are most important to you? Features Automation, i.e. CI / CD pipeline Unit tests, maybe even TDD Dependency Injection ALM (Alerting, Logging, Monitoring) Security first Resources We Like RAW Vim Workshop/Tutorial (YouTube) Explore Kubernetes resources with Datadog Live Containers (Datadog) Tip of the Week Lens, The Kubernetes IDE (GitHub) Bind Docker inside a running container to the host’s Docker instance to use Docker within Docker by adding the following to your docker run command: -v /var/run/docker.sock:/var/run/docker.sock (this syntax also works on Windows; a full example follows these notes).
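Putting that last tip together into a full command, here is a hedged example using the official docker image, which ships the Docker CLI (the image and the docker ps call are just for illustration): docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock docker docker ps (the first docker after the volume flag is the image name, the second is the command run inside it). Run that way, the client inside the container talks to the host’s daemon over the mounted socket, so docker ps lists the host’s containers.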

Sep 2020

1 hr 42 min

We learn the secrets of a safe deployment practice while continuing to study The DevOps Handbook as Joe is a cartwheeling acrobat, Michael is not, and Allen is hurting, so much. For those of you that are reading these show notes via your podcast player, you can find this episode’s full show notes at https://www.codingblocks.net/episode140. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt. Survey Says Do you prefer that your laptop keyboard ... Take the survey at: https://www.codingblocks.net/episode140. News Rob Pike’s Rules of Programming (utexas.edu) No, you click the button … Enable Feedback to Safely Deploy Code Without a quick feedback loop: Operations doesn’t like deploying developer code. Developers complain about operations not wanting to deploy their code. Given a button for anyone to push to deploy, nobody wants to push it. The solution is to deploy code with quick feedback loops. If there’s a problem, fix it quickly and add new telemetry to track the fix. Puts the information in front of everyone so there are no secrets. This encourages developers to write more tests and better code and they take more pride in releasing successful deployments. An interesting side effect is developers are willing to check in smaller chunks of code because they know they’re safer to deploy and easier to reason about. This also allows for frequent production releases with constant, tight feedback loops. Automating the deployment process isn’t enough. You must have monitoring of your telemetry integrated into that process for visibility. Use Telemetry to Make Deployments Safer Always make sure you’re monitoring telemetry when doing a production release. If anything goes wrong, you should see it almost immediately. Nothing is “done” until it is operating as expected in the production environment. Just because you improve the development process, i.e. more unit tests, telemetry, etc., that doesn’t mean there won’t be issues. Having these monitors in place will enable you to find and fix these issues more quickly and add more telemetry to help eliminate that particular issue from happening again going forward. Production deployments are one of the top causes of production issues. This is why it’s so important to overlay those deployments on the metric graphs. Pager Duty – Devs and Ops together Problems sometimes can go on for extremely long periods of time. Those problems might be sent off to a team to be worked on, but they get deprioritized in favor of features to be added. These issues can be a major problem for operations, but not even a blip on the radar for dev. Upstream work centers that are optimizing for themselves reduce performance for the overall value stream. This means everyone in the value stream should share responsibility for handling operational incidents. When developers were awakened at 2 AM, New Relic found that issues were fixed faster than ever. Business goals are not achieved when features have been marked as “done”, but instead only when they are truly operating properly. Have Developers Follow Work Downstream Having a developer “watch over the shoulder” of end-users can be very eye-opening. This almost always leads to the developers wanting to improve the quality of life for those users. Developers should have to do the same for the operational side of things. 
They should endure the pain the Ops team does to get the application running and stable. When developers do this downstream, they make better and more informed decisions in what they do daily, with regard to things such as deployability, manageability, operability, etc. Developers Self-Manage Their Production Service Sometimes deployments break in production because we learn about operational problems too late in the cycle. Have developers monitor and manage the service when it first launches before handing it over to operations. This is practiced by Google. Ops can act as consultants to assist in the process. Launch guidance: Defect counts and severity Type and frequency of pager alerts Monitoring coverage System architecture Deployment process Production hygiene If these items in the checklist aren’t met, they should be addressed before being deployed and managed in production. Any regulatory compliance necessary? If so, you now have to manage technical AND security / compliance risks. Create a service hand back mechanism. If a production service becomes difficult to manage, operations can hand it back to the developers. Think of it as a pressure release valve. Google still does this, and it shows a mutual respect between development and operations. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) Improve mobile user experience with Datadog Mobile Real User Monitoring (Datadog) Tip of the Week Configure an interpreter using Docker (JetBrains) JetBrains describes how to connect PyCharm to use Docker as the interpreter. BONUS: Why Date-ing is Hard (episode 102) We discuss using the venv Python module to create separate virtual environments, allowing each to have its own version dependencies. (docs.python.org) To use venv, create the virtual environment: python -m venv c:\path\to\myenv Activate the virtual environment: c:\path\to\myenv\Scripts\activate.bat NOTE that the venv module documentation includes the variations for different OSes and shells. Node Anchors in YAML (yaml.org) Tweaks (Visual Studio Marketplace) Install Tweaks to gain features, such as Presentation Mode, for Visual Studio. Angular state inspector (chrome web store) Angular Language Service (Visual Studio Marketplace) Angular Snippets (Version 9) (Visual Studio Marketplace) NOTE that the author has similar plugins available for different Angular versions.
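As a small companion to the venv commands above (the path is just the example used there), the activation step on Linux and macOS shells is typically source /path/to/myenv/bin/activate, and PowerShell uses the Activate.ps1 script instead of activate.bat; the venv docs linked above list the exact script per shell.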

Aug 2020

1 hr 36 min

We’re using telemetry to fill in the gaps and anticipate problems while discussing The DevOps Handbook, while Michael is still weird about LinkedIn, Joe knows who’s your favorite JZ, and Allen might have gone on vacation. You can find these show notes at https://www.codingblocks.net/episode139, in case you’re reading these within your podcast player. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt. Survey Says What's your favorite mobile device? Joe’s Super Secret Survey Go or Rust? Take both surveys at: https://www.codingblocks.net/episode139. News Thank you to everyone that left us a new review: iTunes: AbhiZambre, Traz3r Stitcher: AndyIsTaken Most important things to do for new developer job seekers? I Got 99 Problems and DevOps ain’t One Find and Fill Any Gaps Once we have telemetry in place, we can identify any gaps in our metrics, especially in the following levels of our application: Business level – These are metrics on business items, such as sales transactions, signups, etc. Application level – This includes metrics such as timing metrics, errors, etc. Infrastructure level – Metrics at this level cover things like databases, OS’s, networking, storage, CPU, etc. Client software level – These metrics include data like errors, crashes, timings, etc. Deployment pipeline level – This level includes metrics for data points like test suite status, deployment lead times, frequencies, etc. Application and Business Metrics Gather telemetry not just for technical bits, but also organizational goals, i.e. things like new users, login events, session lengths, active users, abandoned carts, etc. Have every business metric be actionable. And if they’re not actionable, they’re “vanity metrics”. By radiating these metrics, you enable fast feedback with feature teams to identify what’s working and what isn’t within their business unit. Infrastructure Metrics Need enough telemetry to identify what part of the infrastructure is having problems. Graphing telemetry across infrastructure and application allows you to detect when things are going wrong. Using business metrics along with infrastructure metrics allows development and operations teams to work quickly to resolve problems. Need the same telemetry in pre-production environments so you can catch problems before they make it to production. Overlaying other Relevant Information onto Our Metrics In addition to our business and infrastructure telemetry graphing, you also want to graph your deployments so you can quickly correlate if a release caused a deviation from normal. There may even be a “settling period” after a deployment where things spike (good or bad) and then return to normal. This is good information to have to see if deployments are acting as expected. Same thing goes for maintenance. Graphing when maintenance occurs helps you correlate infrastructure and application issues at the time they’re deployed. 
Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) The ONE Metric More Important Than Sales & Subscribers (YouTube) 2020 Developer Survey – Most Loved, Dreaded, and Wanted Languages (Stack Overflow) Instrument your Python applications with Datadog and OpenTelemetry (Datadog) Why does speed matter? (web.dev) Dash goes virtual! Join us on Tuesday, August 11 (Datadog) Tip of the Week Google Career Certificates (grow.google) Google Offers 100,000 Scholarships – Here’s How To Get One (Forbes) Grow with Google (grow.google) Hearth Bound (HearthBoundPodcast.com, Twitter) Tsunami (GitHub) is a general purpose network security scanner with an extensible plugin system for detecting high severity vulnerabilities with high confidence. Plugins for Tsunami Security Scanner (GitHub)
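To make the “overlay deployments onto your metrics” idea above concrete, here is a minimal, tool-agnostic sketch in Python (the JSON shape, field names, and service name are invented for the example; a real setup would send this to whatever event or monitoring API you already use):

import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_deployment_marker(service: str, version: str) -> None:
    # Emit a structured event that dashboards can later overlay on metric graphs.
    logging.info(json.dumps({
        "event": "deployment",
        "service": service,
        "version": version,
        "timestamp": time.time(),
    }))

log_deployment_marker("checkout-api", "1.4.2")

Even something this simple makes it much easier to answer “did that spike start right after the release?” when you are staring at a graph.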

Aug 2020

1 hr 22 min

It’s all about telemetry and feedback as we continue learning from The DevOps Handbook, while Joe knows his versions, Michael might have gone crazy if he didn’t find it, and Allen has more than enough muscles. For those that use their podcast player to read these show notes, did you know that you can find them at https://www.codingblocks.net/episode138? Well, you can. And now you know, and knowing is half the battle. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt. Survey Says Which one? Take the survey at: https://www.codingblocks.net/episode138. News We give a heartfelt thank you in our best announcer voice to everyone that left us a new review! iTunes: TomJerry24, Adam Korynta Stitcher: VirtualShinKicker, babbansen, Felixcited Cost of a Data Breach Report 2020 (IBM) Garmin Risks Repeat Attack If It Paid $10 Million Ransom (Forbes) Almost 4,000 databases wiped in ‘Meow’ attacks (WeLiveSecurity.com) The Second Way: The Principles of Feedback Implementing the technical practices of the Second Way Provides fast and continuous feedback from operations to development. Allows us to find and fix problems earlier in the software development life cycle. Create Telemetry to Enable Seeing and Solving Problems The cause of a problem can be difficult to pinpoint: was it the code, was it networking, was it something else? Use a disciplined approach to identifying the problems; don’t just reboot servers. The only way to do this effectively is to always be generating telemetry. Needs to be in our applications and deployment pipelines. More metrics provide the confidence to change things. Companies that track telemetry are 168 times faster at resolving incidents than companies that don’t, per the 2015 State of DevOps Report (Puppet). The two things that contributed to this increased MTTR ability were operations using source control and proactive monitoring (i.e. telemetry). Create Centralized Telemetry Infrastructure Must create a comprehensive set of telemetry from application metrics to operational metrics so you can see how the system operates as a whole. Data collection at the business logic, application, and environmental layers via events, logs and metrics. Event router that stores events and metrics. This enables visualization, trending, alerting, and anomaly detection. Transforms logs into metrics, grouping by known elements. Need to collect telemetry from our deployment pipelines, for metrics like: How many unit tests failed? How long does it take to build and execute tests? Static code analysis. Telemetry should be easily accessible via APIs. The telemetry data should be usable without the application that produced the logs. Create Application Logging Telemetry that Helps Production Dev and Ops need to be creating telemetry as part of their daily work for new and old services. Should at least be familiar with the standard log levels: Debug – extremely verbose, logs just about everything that happens in an application, typically disabled in production unless diagnosing a problem. Info – typically action based logging, either actions initiated by the system or user, such as saving an order. Warn – something you log when it looks like there might be a problem, such as a slow database call. Error – the actual error that occurs in a system. 
Fatal – logs when something has to exit and why. Using the appropriate log level is more important than you think. Low toner is not an Error. You wouldn’t want to be paged about low toner while sleeping! Examples of some things that should be logged: Authentication events, System and data access, System and app changes, Data operations (CRUD), Invalid input, Resource utilization, Health and availability, Startups and shutdowns, Faults and errors, Circuit breaker trips, Delays, Backup success and failure Use Telemetry to Guide Problem Solving A lack of telemetry has some negative consequences: people can use it to avoid being blamed for problems, which is often a sign of a political atmosphere and is SUPER counter-productive. Telemetry allows for scientific methods of problem solving to be used. This approach leads to faster MTTR and a much better relationship between Dev and Ops. Enable Creation of Production Metrics as Part of Daily Work This needs to be easy, i.e. one-line implementations. StatsD, often used with Graphite or Grafana, creates timers and counters with a single line of code (see the short example after these notes). Use data to generate graphs, and then overlay those graphs with production changes to see if anything changed significantly. This gives you the confidence to make changes. Create Self-Service Access to Telemetry and Information Radiators Make the data available to anyone in the value stream without having to jump through hoops to get it, be they part of Development, Operations, Product Management, or Infosec, etc. Information radiators are displays which are placed in highly visible locations so everyone can see the information quickly. Nothing to hide from visitors OR from the team itself. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) 2015 State of DevOps Report (Puppet) StatsD (GitHub) Graphite (graphiteapp.org) Grafana (grafana.com) The Twelve-Factor App (12factor.net) The Twelve-Factor App: Codebase, Dependencies, and Config (episode 32) The Twelve-Factor App: Backing Services, Building and Releasing, Stateless Processes (episode 33) The Twelve-Factor App: Port Binding, Concurrency, and Disposability (episode 35) The Twelve Factor App: Dev/Prod Parity, Logs, and Admin Processes (episode 36) Break Up With IE8 (breakupwithie8.com) Tip of the Week Bookmarks for VS Code (GitHub, Visual Studio Marketplace) Pwn your zsh! (ohmyz.sh) Companion cheat sheet (GitHub) Use Docker BuildKit’s experimental features to enable and use build caches (GitHub) Docker BuildKit (GitHub) Disable all of your VS Code extensions and then re-enable just the ones you need using CTRL+SHIFT+P. (code.visualstudio.com) Color code your environments in Datagrip! Right click on the server and select Color Settings. Use green for local and red for everything else to easily differentiate between the two. Can be applied at the server and/or DB levels. For example, color your default local postgres database orange. This color coding will be applied to both the navigation tree and the open file editors (i.e. tabs).
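Here is the kind of one-liner instrumentation the StatsD point above is describing, sketched with the community Python statsd client (the package choice, metric names, and the localhost:8125 daemon address are assumptions for the example, not anything specific from the episode):

import time

import statsd  # pip install statsd

metrics = statsd.StatsClient("localhost", 8125)  # wherever your StatsD daemon listens

metrics.incr("checkout.orders_placed")        # one-line counter
with metrics.timer("checkout.payment_call"):  # one-line timer around a block of work
    time.sleep(0.1)                           # stand-in for the real payment call

Because StatsD ships metrics over UDP in a fire-and-forget style, instrumentation like this is cheap enough to leave enabled everywhere.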

Aug 2020

1 hr 51 min

Our journey into the world of DevOps continues with The DevOps Handbook as Michael doesn’t take enough tangents, Joe regrets automating the build, err, wait never regrets (sorry), and ducks really like Allen. If you’re reading these show notes via your podcast player, you can find this episode’s full show notes at https://www.codingblocks.net/episode137, where you can be a part of the conversation. Sponsors Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Secure Code Warrior – Start gamifying your organization’s security posture today, score 5,000 points, and get a free Secure Code Warrior t-shirt. Survey Says What is your meeting limit? Take the survey at: https://www.codingblocks.net/episode137. News Thank you to everyone that left us a new review! iTunes: justsomedudewritingareview, stupub, andrew.diamond, scipiomarcellus Stitcher: Bicycle Repairman, BrunoLC Using Bazel to build and test software of any size, quickly and reliably. (bazel.build) Reflections on Trusting Trust (cs.cmu.edu) Build your own Linux distro! (linuxfromscratch.org) That System76 Oryx Pro keyboard though? (System76) Fast, Reliable. Pick Two Continuously Build, Test, and Integrate our Code and Environments Build and test processes run constantly, independent of coding. This ensures that we understand and codify all dependencies. This ensures repeatable deployments and configuration management. Once changes make it into source control, the packages and binaries are created only ONCE. Those same packages are used throughout the rest of the pipeline to ensure all environments are receiving the same bits. What does this mean for our team culture? You need to maintain reliable automated tests that truly validate deploy-ability. You need to stop the “production pipeline” when validation fails, i.e. pull the andon cord. You need to work in small, short lived batches that are based on trunk. No long-lived feature branches. Short, fast feedback loops are necessary; builds on every commit. Integrate Performance Testing into the Test Suite Should likely build the performance testing environment at the beginning of the project so that it can be used throughout. Logging results on each run is also important. If a set of changes shows a drastic difference from the previous run, then it should be investigated. Enable and Practice Continuous Integration Small batch and andon cord style development practices optimize for team productivity. Long lived feature branches optimize for individual productivity. But: They require painful integration periods, such as complex merges, which is “invisible work”. They can complicate pipelines. The integration complexity scales exponentially with the number of feature branches in play. They can make adding new features, teams, and individuals to a team really difficult. Trunk based development has some major benefits such as: Merging more often means finding problems sooner. It moves us closer to “single piece flow”, such as single envelope at a time story, like one big assembly line. Automate and Enable Low-Risk Releases Small batch changes are inherently less risky. The time to fix is strongly correlated with the time to remediate, i.e. the mean time to find (MTF) and the mean time to remediate (MTR). Automation needs to include operational changes, such as restarting services, that need to happen as well. Enable “self-service” deployments. Teams and individuals need to be able to dynamically spin up reliable environments. 
Decouple Deployments from Releases Releases are marketing driven and refer to when features are made available to customers. Feature flags can be used to toggle the release of functionality independent of its deployment (see the sketch after these notes). Feature flags enable roll back, graceful degradation, graceful release, and resilience. Architect for Low-Risk Releases Don’t start over! You make a lot of the same mistakes, and new ones, and ultimately end up at the same place. Instead, fix forward! Use the strangler pattern instead to push the good stuff in and push the bad stuff out, like how a strangler fig grows to cover and eventually subsume its host tree. Decouple your code and architecture. Use good, strongly versioned APIs and dependency management to help get there. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) Andon (manufacturing) (Wikipedia) LEGO Nintendo Entertainment System (lego.com) Comparing Git Workflows (episode 90) StranglerFigApplication (MartinFowler.com) Tip of the Week Be sure to check out the gaming channel in Slack to find and connect with some great people for your next game. Also, look up in the Slack channel. There may be cool information in the channel’s description. Within JetBrains IDEs, such as Datagrip and IntelliJ, press CTRL+SHIFT+V to see and choose from the list of items recently copied to the clipboard. Tired of working from home? Work from a park! Save your sanity by customizing your MacBook Pro Touch Bar in Chrome by going to View -> Customize Touch Bar.
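Since decoupling deployments from releases leans so heavily on feature flags, here is a bare-bones Python sketch of the idea (an in-memory dict stands in for whatever flag store or service you would actually use, and every name in it is invented):

# Deployed code can sit behind a flag until the business decides to release it.
FLAGS = {
    "new-search": {"enabled": False, "allowlist": {"beta-tester-7"}},
}

def is_enabled(flag: str, user_id: str) -> bool:
    cfg = FLAGS.get(flag, {})
    return bool(cfg.get("enabled")) or user_id in cfg.get("allowlist", set())

if is_enabled("new-search", "beta-tester-7"):
    print("serving the new search")  # released to this user
else:
    print("serving the old search")  # deployed, but still dark

Flipping the flag off again (or shrinking the allowlist) also gives you the graceful roll back mentioned above, with no redeploy required.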

Jul 2020

2 hr 5 min

We begin our journey into the repeatable world of DevOps by taking cues from The DevOps Handbook, while Allen loves all things propane, Joe debuts his “singing” career with his new music video, and Michael did a very bad, awful thing. These show notes can be found at https://www.codingblocks.net/episode136 for those reading this via their podcast player. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says When you do listen to music, how do you do it? Take the survey at: https://www.codingblocks.net/episode136. News We’re very thankful for all of the new reviews! iTunes: galTheJewishHammer, Ragnaroekk, Marcel7473826, rkosko31 Stitcher: Bicycle Repairman, BrunoLC Joe serenades us in song! (YouTube) Congratulations Allen on his renewed MVP status! What is The DevOps Handbook? It’s a collection of arguments and high level guidance for understanding the spirit of DevOps. It’s light on specifics and heavy on culture. The tools aren’t the problem here, the people need to change. It’s also a book about scaling features, teams, people, and environments. The First Way: The Principles of Flow The Deployment Pipeline is the Foundation Continuous delivery: Reduces the risk associated with deploying and releasing changes. Allows for an automated deployment pipeline. Allows for automated tests. Environments on Demand Always use production like environments at every stage of the stream. Environments must be created in an automated fashion. Should have all scripts and configurations stored in source control. Should require no intervention from operations. The reality though … Often times the first time an application is tested in a production like environment, is in production. Many times test and development environments are not configured the same. Ideally though … Developers should be running their code in production like environments from the very beginning, on their own workstations. This provides an early and constant feedback cycle. Rather than creating wiki pages on how to set things up, the configurations and scripts necessary are committed to source control. This can include any of all of the following: Copying virtualized environments. Building automated environments on bare metal. Using infrastructure as code, i.e. Puppet, Chef, Ansible, Salt, CFEngine, etc. Using automated OS configuration tools. Creating environments from virtual images or containers. Creating new environments in public clouds. All of this allows entire systems to be spun up quickly making this … A win for operations as they don’t have to constantly battle configuration problems. A win for developers because they can find and fix things very early in the development process that benefits all environments. “When developers put all their application source files and configurations in version control, it becomes the single repository of truth that contains the precise intended state of the system.” The DevOps Handbook Check Everything into One Spot, that Everybody has Access to Here are the types of things that should be stored in source control: All application code and its dependencies (e.g. libraries, static content, etc.) Scripts for creating databases, lookup data, etc. Environment creation tools and artifacts (VMWare, AMI images, Puppet or Chef recipes). Files used to create containers (Docker files, Rocket definition files, etc.) All automated tests and manual scripts. 
Scripts for code packaging, deployments, database migrations, and environment provisioning. Additional artifacts such as documentation, deployment procedures, and release notes. Cloud configuration files, such as AWS CloudFormation templates, Azure ARM templates, Terraform scripts, etc.) All scripts or configurations for infrastructure supporting services for things like services buses, firewalls, etc. Make Infrastructure Easier to Rebuild than to Repair Treat servers like cattle instead of pets, meaning, rather than care for and fix them when they’re broken, instead delete and recreate them. This has the side effect of keeping your architecture fluid. Some have adopted immutable infrastructure where manual changes to environments are not allowed. Instead, changes are in source control which removes variance among environments. The Definition of Done “Done” means your changeset running in a production-like environment. This ensures that developers are involved in getting code to production and bring operations closer to the code. Enable Fast and Reliable Automated Testing Automated tests let you move faster, with more confidence, and shortens feedback cycles for catching and fixing problems earlier. Automated testing allowed the Google Web Server team to go from one of the least productive, to most productive group in the company. Resources We Like The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon) The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon) The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon) Kubernetes Failure Stories (k8s.af) Tip of the Week Press SHIFT twice to search everywhere in your IntelliJ project: Search for a target by name (Search everywhere) (JetBrains) Use ALT+F1 in Datagrip to see available options for a schema object such as navigating to it in the left pane. Use database migrations for all of your DB DevOps needs: Flyway by Redgate (flywaydb.org) RoundhousE (GitHub) Liquibase (liquibase.org)
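For the database migration tip above, the typical Flyway flow fits the “everything in source control” theme nicely: versioned SQL files named like V1__create_users_table.sql live in the repo (the double underscore matters, and the file name here is just an example), and running flyway migrate locally or from the pipeline brings any environment up to the same schema version.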

Jul 2020

1 hr 50 min

We review the Stack Overflow Developer Survey in the same year it was created for the first time ever, while Joe has surprising news about the Hanson Brothers, Allen doesn’t have a thought process, and Michael’s callback is ruined. If you’re reading these show notes via your podcast player, you can find this episode’s full show notes and join the conversation at https://www.codingblocks.net/episode135. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Registration is NOW OPEN! Sign up and reserve your seat today! Survey Says How do you feel about semi-colons? Take the survey at: https://www.codingblocks.net/episode135, News Thank you, we appreciate the latest reviews: iTunes: Akma12345678843225, Rdudek101, SuperGoodDave, vis_1, Asaurus Rex, Brainswart, pr0ph3t, JoesGotTalent, RunsWithScissors Stitcher: TheDude01, barnabasj, oneWithTwoDotsOverTheO, MustardMakerDeluxe, OnlyRaul, _agentShrapnel, yael, d3v3l0p3r, eats_glue Zoom says free users will get end-to-end encryption after all (The Verge) AMD Ryzen 4000-Powered Asus Mini PC Challenges Intel’s NUC (Tom’s Hardware) Joe was a guest on Gaming Fyx! Gaming Fyx – Episode 125! (Cloudy With A Chance Of PS5!!) (fyx.space) Resources We Like 2020 Developer Survey (Stack Overflow) Tip of the Week Firefox now has their own VPN service: Firefox Private Network. (fpn.firefox.com) SDKMAN! The software development kit manager for managing parallel versions of multiple SDKs on most Unix based systems. (sdkman.io) What is Scaffolder, and how you can use it to increase your team dev velocity (dev.to) Fast, repeatable, simple, local Kubernetes development. (skaffold.dev)
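If you want to try the SDKMAN! tip, the basic flow (command names per sdkman.io; the Java candidate is only an example) is: sdk list java to see the available versions, sdk install java <version-identifier> to install one, and sdk use java <version-identifier> to switch the current shell over to it.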

Jun 2020

2 hr 7 min

As we learn from Google about how to navigate a code review, Michael learns to not give out compliments, Joe promises to sing if we get enough new reviews, and Allen introduces a new section to the show. For those reading this via their podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode134. Sponsors University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Registration is NOW OPEN! Sign up and reserve your seat today! Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says How many hours per week do you work on average? Take the survey at: https://www.codingblocks.net/episode134. News Thank you, we appreciate the latest reviews: Stitcher: Jean Guillaume Misteli, gitterskow LGTM Navigating a CL in Review A couple of starting questions when reviewing a CL (changelist): Does the change make sense? Does the CL have a good description? Take a broad view of the CL If the change doesn’t make sense, you need to immediately respond with WHY it shouldn’t be there. Typically if you do this, you should probably also respond with what they should have done. Be courteous. Give a good reason why. If you notice that you’re getting more than a single CL or two that doesn’t belong, you should consider putting together a quick guide to let people know what should be a part of CLs in a particular area of code. This will save a lot of work and frustration. Examine the main parts of the CL Look at the file with the most changes first as that will typically aid in figuring out the rest of the CL more quickly. The smaller changes are usually part of that bigger change. Ask the developer to point you in the right direction. Ask to have the CL split into multiple smaller CLs. If you see a major problem with the CL, you need to send that feedback immediately, maybe even before you look at the rest of the CL. It might be that the rest of the CL isn’t even legit any longer if the major problem ends up being a show stopper. Why’s it so important to review and send out feedback quickly? Developers might be moving on to their next task that builds off the CL in review. You want to reduce the amount of wasted effort. Developers have deadlines they have to meet, so if there’s a major change that needs to happen, they need to find out about it as soon as possible. Look at the rest of the CL in an appropriate sequence Looking at files in a meaningful order will help with understanding the CL. Reviewing the unit tests first will help with a general understanding of the CL. Speed of Code Reviews Velocity of the team is more important than the individual. The individual slacking on the review gets other work done, but they slow things down for the team. Looking at the other files in the CL in a meaningful order may help in speed and understanding of the CL. If there are long delays in the process, it encourages rubber stamping. One business day is the maximum time to respond to a CL. You don’t have to stop your flow immediately though. Wait for a natural break point, like after lunch or a meeting. The primary focus is on response time to the CL. When is it okay to LGTM (looks good to me)? The reviewer trusts the developer to address all of the issues raised. The changes are minor. How to write code review comments Be kind. Explain your reasoning. 
Balance giving directions with pointing out problems. Encourage simplifications or add comments instead of just complaining about complexity. Courtesy is important. Don’t be accusatory. Don’t say “Why did you…” Say “This could be simpler by…” Explain why things are important. It’s the developer’s responsibility to fix the code, not the reviewer’s. It’s sufficient to state the problem. Code review comments should be captured either in code or in code comments. Pull request comments aren’t easily searchable. Handling pushback in code reviews When the developer disagrees, consider if they’re right. They are probably closer to the code than you. If you believe the CL improves things, then don’t give up. Stay polite. People tend to get more upset about the tone of comments than about the reviewer’s insistence on quality. The longer you wait to clean up, the less likely the clean-up is to happen. Better to block the request up front than to just move on. Having a standard to point to clears up a lot of disputes. Change takes time; people will adjust. Resources We Like Google Engineering Practices Documentation (GitHub) Navigating a CL in review (GitHub) Speed of Code Reviews (GitHub) How to write code review comments (GitHub) Handling pushback in code reviews (GitHub) The CL author’s guide to getting through code review (GitHub) Writing good CL descriptions (GitHub) Small CLs (GitHub) How to handle reviewer comments (GitHub) The Myers diff algorithm: part 1 (blog.jcoglan.com) Yagni (MartinFowler.com) You aren’t gonna need it (Wikipedia) Tip of the Week Build your own Pi-hole for network-wide ad blocking (pi-hole.net) Joe’s Pi-picks: Vilros Raspberry Pi 4 4GB Complete Kit with Clear Transparent Fan Cooled Case – $99.99 (Amazon) Ubiquiti Networks EdgeRouter 12 – $228.99 (Amazon) GeeekPi New Raspberry Pi Cluster Case – $39.99 (Amazon) uBlock Origin – Browser based plug-in for content-filtering, including ad-blocking. (Wikipedia) FREE (!!!) O’Reilly site reliability engineering books made available by Google. (landing.google.com) Remove *all* background noise with your NVIDIA RTX card, using NVIDIA RTX Voice (Nvidia.com) scoop – A command-line installer for Windows. (scoop.sh) kubefwd – Kubernetes bulk service port-forwarding (kubefwd.com)

Jun 2020

1 hr 42 min

We learn what to look for in a code review while reviewing Google’s engineering practices documentation as Michael relates patterns to choo-choos, Joe has a “weird voice”, and Allen has a new favorite portion of the show. Are you reading this via your podcast player? You can find this episode’s full show notes at https://www.codingblocks.net/episode133 where you can also join the conversation. Sponsors University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Registration is NOW OPEN! Sign up and reserve your seat today! Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says How likely are you to advocate for working from home in the future? Take the survey at: https://www.codingblocks.net/episode133. News Thank you to everyone that left us a review: iTunes: codewith_himanshu, SpaceDuckets, akirakinski Stitcher: Anonymous (from Croatia), Llanfairpwllgwyngyll (Wikipedia), Murali Suriar Watch Joe solve LeetCode Problems (YouTube) Regarding the OWNERs file … // TODO: Insert Clever Subtitle Here Design This is the MOST IMPORTANT part of the review: the overall design of the changelist (CL). Does the code make sense? Does it belong in the codebase or in a library? Does it meld well with the rest of the system? Is it the right time to add it to the code base? Functionality Does the CL do what it’s supposed to do? Even if it does what it’s supposed to do, is it a good change for the users, both developers and actual end-users? As a reviewer, you should be thinking about all the edge-cases, concurrency issues, and generally just trying to see if any bugs arise just looking at the code. As a reviewer, you can verify the CL if you’d like, or have the developer walk you through the changes (the actual implemented changes rather than just slogging through code). Google specifically calls out parallel programming types of issues that are hard to reason about (even when debugging) especially when it comes to deadlocks and similar types of situations. Complexity This should be checked at every level of the change: Single lines of code, Functions, and Classes Too complex is code that is not easy to understand just looking at the code. Code like this will potentially introduce bugs as developers need to change it in the future. A particular type of complexity is over-engineering, where developers have made the code more generic than it needs to be, or added functionality that isn’t presently needed by the system. Reviewers should be especially vigilant about over-engineering. Encourage developers to solve the problem they know needs to be solved now, not the problem that the developer speculates might need to be solved in the future. The future problem should be solved once it arrives and you can see its actual shape and requirements in the physical universe. Google’s Engineering Practices documentation Tests Usually tests should be added in the same CL as the change, unless the CL is for an emergency. Emergencies were discussed in episode 132. Make sure the tests are correct and useful. Will the tests fail if the code is broken? Are the assertions simple and useful? Are the tests separated appropriately into different test methods? Naming Were good names chosen? 
A good name is long enough to be useful but not so long that it’s hard to read. Comments Were the comments clear and understandable, in English? Were the comments necessary? They should explain WHY code exists and NOT what it’s doing. If the code isn’t clear enough on its own, it should be refactored. Exceptions to the rule can include regular expressions and complex algorithms. Comments are different from documentation of code. Code documentation expresses the purpose, usage and behavior of that code. Style Have a style guide. Google has one for most of the languages they use. Make sure the CL follows the style guide. If something isn’t in the style guide, and as the reviewer you want to comment on the CL to make a point about style, prefix your comment with “Nit”. DO NOT BLOCK PR’s based on personal style preference! Style changes should not be mixed in with “real” changes. Those should be a separate CL. Consistency Google indicates that if existing code conflicts with the style guide, the style guide wins. If the style guide is a recommendation rather than a hard requirement, it’s a judgement call on whether to follow the guide or existing code. If no style guide applies, the CL should remain consistent with existing code. Use TODO statements for cleaning up existing code if outside the scope of the CL. Documentation If the CL changes any significant portion of builds, interactions, tests, etc., then appropriate README’s, reference docs, etc. should be updated. If the CL deprecates portions of the documentation, that should also likely be removed. Every Line Look over every line of non-generated, human-written code. You need to at least understand what the code is doing. If you’re having a hard time examining the code in a timely fashion, you may want to ask the developer to walk you through it. If you can’t understand it, it’s very likely future developers won’t either, so getting clarification is good for everyone. If you don’t feel qualified to be the only reviewer, make sure someone else reviews the CL who is qualified, especially when you’re dealing with sensitive subjects such as security, concurrency, accessibility, internationalization, etc. Context Sometimes you need to back up to get a bigger view of what’s changing, rather than just looking at the individual lines that changed. Seeing the whole file versus the few lines that were changed might reveal that 5 lines were added to a 200-line method, which likely needs to be revisited. Is the CL improving the health of the system? Is the CL complicating the system? Is the CL making the system more tested or less tested? “Don’t accept CLs that degrade the code health of the system.” Most systems become complex through many small changes. Good Things If you see something good in a CL, let the author know. Many times we focus on mistakes as reviewers, but some positive reinforcement may actually be more valuable. Especially true when mentoring. Resources We Like OWNERS files (chromium.googlesource.com) Modern Code Review: A Case Study at Google (research.google) Google Engineering Practices Documentation (GitHub) What to look for in a code review (GitHub) Comparing Git Workflows (episode 90) Google Style Guides (GitHub) Perl Special Variables Quick Reference (PerlMonks) Email Address Regular Expression That 99.99% Works. (emailregex.com) Tip of the Week List of common misconceptions (Wikipedia) The unofficial extension that integrates Draw.io into VS Code. 
(marketplace.visualstudio.com) Use Dataproc’s Cluster properties to easily update XML settings. (cloud.google.com) Bonus tip: Include a Dockerfile (or Docker Compose) file with your open source project to help it gain traction.

May 2020

1 hr 41 min

We dig into Google’s engineering practices documentation as we learn how to code review while Michael, er, Fives is done with proper nouns, Allen can’t get his pull request approved, and Joe prefers to take the average of his code reviews. In case you’re reading this via your podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode132. Be sure to check it out and join the conversation. Sponsors University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Registration is NOW OPEN! Sign up and reserve your seat today! Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after your first dashboard. Survey Says Do you *always* include (new or updated) unit tests with your pull requests? Take the survey at: https://www.codingblocks.net/episode132. News Thank you to everyone that left us a review: iTunes: Jbarger, Podcast Devourer, Duracce Stitcher: Daemyon C How to Code Review Code Review Developer Guide Q: What is a code review? A: When someone other than the author of the code examines that code. Q: But why code review? A: To ensure high quality standards for code as well as helping ensure more maintainable code. What should code reviewers look for? Design: Is the code well-designed and appropriate for your system? Functionality: Does the code behave as the author likely intended? Is the way the code behaves good for its users? Complexity: Could the code be made simpler? Would another developer be able to easily understand and use this code when they come across it in the future? Tests: Does the code have correct and well-designed automated tests? Naming: Did the developer choose clear names for variables, classes, methods, etc.? Comments: Are the comments clear and useful? Style: Does the code follow our style guides? Documentation: Did the developer also update relevant documentation? Picking the Best Reviewers Get the best reviewer you can, someone who can review your code within the appropriate time frame. The best reviewer is the one who can give you the most thorough review. This might or might not be people in the OWNERS file. Different people might need to review different portions of your changes for the same pull request. If the “best” person isn’t available, they should still be CC’d on the change list. In Person Reviews If you pair-programmed with someone who was the right person for a code review, then the code is considered reviewed. You can also do code reviews where the reviewer asks questions and the coder only speaks when responding to the questions. How to do a Code Review The Standard of a Code Review The purpose of the code review is to make sure code quality is improving over time. There are trade-offs: Developers need to actually be able to complete some tasks. If reviewers are a pain to work with, for example they are overly critical, then folks will be less incentivized to make good improvements or ask for good reviews in the future. It is still the duty of the reviewer to make sure the code is good quality. You don’t want the health of the product or code base to degrade over time. The reviewer has ownership and responsibility over the code they’re reviewing. Reviewers should favor approving the changes when the code health is improved even if the changes aren’t perfect. There’s no such thing as perfect code, just better code. 
Reviewers can actually reject a set of changes even if it’s quality code if they feel it doesn’t belong in “their” system. Reviewers should not seek perfection but they should seek constant improvement. This doesn’t mean that reviewers must stay silent. They can point out things in a comment using a prefix such as “Nit”, indicating something that could be better but doesn’t block the overall change request. Code that worsens the overall quality or health of a system should not be admitted unless it’s under extreme/emergency circumstances. What constitutes an emergency? A small change that: Allows a major launch to continue, Fixes a significant production bug impacting users, Addresses a legal issue, or Patches a security hole. What does not constitute an emergency? You want the change in sooner rather than later. You’ve worked hard on the feature for a long time. The reviewers are away or in another timezone. Because it’s Friday and you want the code merged in before the weekend. A manager says that it has to be merged in today because of a soft deadline. Rolling back causes test failures or breaks the build. Mentoring Code reviews can absolutely be used as a tool for mentoring, for example teaching design patterns, explaining algorithms, etc., but if it’s not something that needs to be changed for the PR to be completed, note it as a “Nit” or “Note”. Principles Technical facts and data overrule opinions and/or preferences. The style guide is the authority. If it’s not in the style guide, it should be based on previous coding style already in the code, otherwise it’s personal preference. The reviewer may request the code follow existing patterns in the code base if there isn’t a style guide. Resolving Conflicts If there are conflicts between the coder and reviewer, they should first attempt to come to a consensus based on the information discussed here as well as what’s in the CL Author’s Guide or the Reviewer Guide. If the conflict remains, it’s probably worth having a face to face to discuss the issues and then make sure notes are taken to put on the code review for future reference and readers. If the conflict still remains, then it’s time to escalate to a team discussion, potentially having a team leader weigh in on the decision. NEVER let a change sit around just because the reviewer and coder can’t come to an agreement. Resources We Like Google Engineering Practices Documentation (GitHub) Code Review Developer Guide (GitHub) How to do a code review (GitHub) The Standard of Code Review (GitHub) Emergencies (GitHub) The CL author’s guide to getting through code review (GitHub) Technical Writing Courses (developers.google.com) Ruffles Potato Chips, Cheddar and Sour Cream (Amazon) Flawless Execution Tip of the Week William Lin’s competitive programming channel (YouTube) Register for the free Microsoft Build digital event, May 19-20. (register.build.microsoft.com) Apple to host virtual Worldwide Developers Conference beginning June 22 (Apple) Checkstyle helps Java developers adhere to a coding standard. (checkstyle.sourceforge.io) CheckStyle-IDEA – An IDEA plugin that uses Checkstyle but isn’t officially part of it. (plugins.jetbrains.com) Black – The uncompromising code formatter for Python. (pypi.org)

May 2020

1 hr 39 min

We gather around the water cooler at 6 foot distances as Michael and Joe aren’t sure what they streamed, we finally learn who has the best fries, at least in the US, and Allen doesn’t understand evenly distributing your condiments. For those reading this via their podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode131. Stop by and join in on the conversation. Survey Says Are you staying sane during these stay-at-home orders? Take the survey at: https://www.codingblocks.net/episode131. News We really appreciate the latest reviews, so thank you! iTunes: Braver1996summer, eleneshector, Dis14Joe Stitcher: Nik P, Anonymous, Please HelP, Dis14Joe, thephdwasamistake Be on the lookout for live streams of Joe on YouTube or Twitch! Heard Around the Water Cooler COVID-19 Pushes Up Internet Use 70% And Streaming More Than 12%, First Figures Reveal (Forbes) Security at Zoom (Zoom) Joe has been busy live streaming (YouTube) Come learn Apache Drill with us! (YouTube) Cmder – Portable console emulator for Windows. We’re still learning the keyboard shortcuts. 30-Day LeetCoding Challenge (LeetCode.com) Codewars – Achieve mastery through challenge. Conway’s Game of Life (Wikipedia) by John Horton Conway (Wikipedia) Coding Interview Tips, How to get better at technical interviews without practicing (InterviewCake.com) True descriptions of languages (Reddit) Allen upgrades to the AMD Ryzen 9 3900x AMD Ryzen 9 3900X 12-core, 24-thread CPU (Amazon) Asus TUF A15 laptop review: AMD’s Ryzen 4000 is a groundbreaking mobile CPU (Eurogamer.net) Big data has been on our minds lately. Data lake (Wikipedia) Apache Hadoop Apache Cassandra Apache Parquet Google Cloud Bigtable Uber’s Big Data Platform: 100+ Petabytes with Minute Latency (eng.uber.com) Tip of the Week Interested in COBOL, game development, and Dvorak keyboards? Check out Joe’s new favorite streamer Zorchenhimer. (Twitch) Using helm uninstall doesn’t remove persistent volumes nor their claims. After doing helm uninstall RELEASE_NAME, delete the persistent volume claim using kubectl delete pvc PVC_NAME to remove the claim, which depending on the storage class and reclaim policy, will also remove the persistent volume. Otherwise, you’d need to manually remove the persistent volume using kubectl delete pv PV-NAME. kafkacat – A generic non-JVM producer and consumer for Apache Kafka. (GitHub)

Apr 2020

1 hr 51 min

We dig into the details of how databases use B-trees as we continue our discussion of Designing Data-Intensive Applications while Michael’s description of median is awful, live streaming isn’t for Allen, and Joe really wants to bring us back from the break. For those reading this via their podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode130 in all their glory. Check it out, as Joe would say, and join the conversation. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after installing the agent. Survey Says What tools are you using to ease WFH? Take the survey at: https://www.codingblocks.net/episode130. News We really appreciate the latest reviews, so thank you! iTunes: Anips79, Jacobboyd23, LoadedGalaxy, JenT Avid Listener Pluralsight is free for the month of April! (Pluralsight) TechSmith is offering Snagit and Video Review for free through June 2020. (TechSmith) Remember when we gushed over Zoom? Zoom: Every security issue uncovered in the video chat app (CNET) Zoom Go Boom (TWiT) Maybe we should use Jitsi instead of Zoom. (Jitsi) Be on the lookout for live streams of Joe on YouTube or Twitch! B-Trees are Awesome B-trees are the most commonly used indexing structure. Introduced in 1970, and called ubiquitous 10 years later. They are the implementation used by most relational database systems, as well as a number of non-relational DB’s. “Indexing” is the way databases store metadata about your data to enable quick lookups. Like the SSTable, the B-tree stores key/value pairs sorted by key. This makes range query lookups quick. B-trees use fixed block sizes, referred to as pages, that are usually 4 KB in size which (generally) map well to the underlying hardware because disks are typically arranged in fixed block sizes. Every page has an address that can be referenced from other pages. These are pointers to positions on a disk. Knowing (or being able to quickly find) which page the data you are looking for is in drastically cuts down on the amount of data you have to scan through. B-trees start with a root page. All key searches start here. This root will contain references to child pages based on key ranges. The child pages might contain more references to other child pages based on more narrowly focused key ranges. This continues until you reach the page that has the data for the key you searched for. These pages are called leaf pages, where the values live along with the key. The branching factor is the number of references to child pages in one page of a B-tree. The branching factor is tied to the space needed to store the page references and the range boundaries. The book states that it’s common to have a branching factor of several hundred, some even say low thousands! A higher branching factor means fewer levels to go through, i.e. fewer pages to scan, when looking for your data. Updating a value in a B-tree can be complicated. You search for the leaf node containing the key and then update the value and write it to disk. Assuming everything fits in the page, then none of the upstream references change and everything is still valid. If you are inserting a new key, you find the leaf node where the key should live based on the ranges and then you add the key and value there. Again, if everything fits in the page, then similar to the update, none of the upstream references need to change. 
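To make that lookup path concrete, here’s a rough, purely in-memory sketch of the root-to-leaf walk (toy pages as nested dicts with made-up keys; a real engine stores fixed-size pages on disk and follows page addresses). Page splits for writes are covered next.

import bisect

# Toy pages: internal pages hold sorted boundary keys plus child references,
# leaf pages hold the actual key/value data.
leaf_a = {"type": "leaf", "data": {"adam": 1, "amy": 2, "bob": 3}}
leaf_b = {"type": "leaf", "data": {"carl": 4, "zoe": 5}}
root = {"type": "internal", "bounds": ["carl"], "children": [leaf_a, leaf_b]}

def btree_get(page, key):
    # Walk down from the root, picking the child whose key range covers `key`.
    while page["type"] == "internal":
        child_index = bisect.bisect_right(page["bounds"], key)
        page = page["children"][child_index]
    return page["data"].get(key)  # None if the key isn't present

print(btree_get(root, "bob"))  # 3
print(btree_get(root, "zoe"))  # 5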
Back to writes: if the key/value would exceed the size of the page, the page is split into two half-pages, and the parent page’s references are updated to point to the new pages. This update to the parent page might require it to also be split. And this update/split pattern might continue up to and including the root page. Splitting pages in half as data outgrows the page size keeps the tree balanced. A balanced tree is the secret to consistent lookup times. In terms of big-O, a B-tree with n keys has a depth of O(log n). Most DB’s only go 3 to 4 levels deep. A tree with four levels, using a 4 KB page size, and a branching factor of 500 can store up to 256 TB! (Roughly 500^4 leaf references × 4 KB ≈ 2.56 × 10^14 bytes.) Making B-Trees Reliable The main notion is that writes in a B-tree occur in the same location as the original page, that way no references have to change, assuming the page size isn’t exceeded. Think of this as a hardware operation. These actually map to spinning drives better than SSDs. SSDs must rewrite large blocks of a storage chip at a time. Because some operations require multiple pages to be written, in the case of splitting full pages and updating the parent, it can be dangerous because if there is a DB crash at any point during the writing of the pages, you can end up with orphaned pages. To combat this, implementations usually include a write-ahead log (WAL, aka a redo log). This is an append-only file where all modifications go before the tree is updated. If the database crashes, this file is read first and used to put the DB back in a good, consistent state. Another issue is that of concurrency. Multiple threads reading and writing to the B-tree at the same time could read things that would be in an inconsistent state. In order to counter this problem, latches, or lightweight locks, are typically used. B-Tree Optimizations Some databases use a copy-on-write scheme. This alleviates the need for the append-only log mentioned previously; instead, each updated page is written to a new location, along with updated parents that point to it. In some cases, abbreviated keys can be stored which saves space and would allow for more branching but fewer node levels, which is fewer hops to get to the leaf nodes. This is technically a B+ tree. Some implementations attempt to keep leaf pages next to each other in sequential order which would improve the seek speed to the data. Some implementations keep additional pointers, such as references to the previous and next sibling pages so it’s quicker to scan without having to go back to the parent to find the pointer to those same nodes. Variants like fractal trees use tactics from log-structured ideas to reduce disk seeks. Comparing B-Trees and LSM-Trees B-trees are much more common and mature. We’ve ironed out the kinks and we understand the ways people use RDBMSes. LSM-trees are typically faster for writes. B-trees are typically faster for reads because LSM-trees have to check multiple data structures, including SSTables that might be at different levels of compaction. Use cases vary, so benchmarking your use cases is important. LSM-Tree Advantages The write amplification problem: B-trees must write all data at least twice, once to the WAL and another to the page (and again if pages are split). Some storage engines go even further for redundancy. LSM-trees also rewrite data, due to compaction and tree/SSTable merging. This is particularly a problem for SSDs, which don’t do so well with repeated writes to the same segment. 
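To put a rough number on that write amplification point, here’s a toy calculation (assumed figures: a 100-byte update, one write-ahead log record, and one rewritten 4 KB page; an illustration, not a benchmark).

# Write amplification ~= bytes physically written / bytes of new data.
new_data_bytes = 100
wal_bytes = 100          # the record is written to the write-ahead log first...
page_bytes = 4 * 1024    # ...and then the whole 4 KB page is rewritten in place
write_amplification = (wal_bytes + page_bytes) / new_data_bytes
print(round(write_amplification, 1))  # ~42x for this small update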
LSM-trees typically have better sustained write throughput because they have lower write amplification and because they generally write the SSTable files sequentially, which is particularly important on HDDs. LSM-trees can be compressed better, and involve less space on disk. LSM-trees also have lower fragmentation on writes. LSM-Tree Downsides Compaction of the SSTables can affect performance, even though the compaction can happen in another thread, because it takes up disk I/O resources, i.e. the disk has a finite amount of I/O bandwidth. It’s possible that compaction cannot keep up with incoming events, causing you to run out of disk space, which also slows down reads as more SSTable files need to be read. This problem is magnified in an LSM-tree because a key can exist multiple times (before compaction), unlike B-trees which have just one location for a given key. The B-tree method for updating also makes it easier for B-trees to guarantee transactional isolation. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Data Structures – (some) Trees (episode 97) B-Tree Visualization (USFCA) SQL Server Transaction Log Architecture and Management Guide (docs.microsoft.com) Log-structured merge-tree (Wikipedia) Postgres Indexes Under the Hood (rcoh.me) Is Docker the new Git? (Coding Blocks) Tip of the Week Chocolatey adds a PowerShell command, Update-SessionEnvironment (or refreshenv for short), that you can use to update the environment variables in your current PowerShell session, much like . $HOME/.profile for MacOS/Linux. (Chocolatey) Use docker stats to monitor the usage of your running Docker containers. It’s like top for Docker. (Docker) Click the Equivalent REST or command line link at the bottom of the Google Cloud Console to get the equivalent as a command you can script and iterate on. Jupyter has a Docker image for you: Jupyter Docker Stacks. (jupyter-docker-stacks.readthedocs.io) Apache Drill is an amazing schema-free SQL query engine for Hadoop, NoSQL, and Cloud Storage. (drill.apache.org) Get up and running in minutes with Drill + Docker (drill.apache.org) Presto, aka Presto DB, not to be confused with Presto SQL, is a distributed SQL query engine for big data originally developed by Facebook. (prestodb.io)

Apr 2020

1 hr 56 min

Since we can’t leave the house, we discuss what it takes to effectively work remote while Allen’s frail body requires an ergonomic keyboard, Joe finally takes a passionate stance, and Michael tells them why they’re wrong. Reading these show notes via your podcast player? You can find this episode’s full show notes at https://www.codingblocks.net/episode129 and be a part of the conversation. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after installing the agent. University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Spring registration is NOW OPEN! Sign up and reserve your seat today! Survey Says What’s your preferred method to increase your productivity? Take the survey at: https://www.codingblocks.net/episode129. News Thank you, krauseling, for the latest iTunes review. TechSmith is offering Snagit and Video Review for free through June 2020. (TechSmith) How to WFH The Essentials First and foremost, get a quality internet connection. For video calls, favor lower latency over higher bandwidth. Turn your camera on. Use a comfortable headset with a good microphone. Wired headphones are definitely the way to go. Better audio quality, fewer problems, and no battery life issues to worry about. Mute unless talking. Not all video sharing is equal. Know which to use when screen sharing. Screen sharing in Zoom is much better than in Hangouts. The text on the screen is crisp and readable and the screen sharing session is responsive. Communicate when you will be away during normal hours. Make sure your IM application status and/or availability is accurate. Sound Good: Sony MDRXB50AP Extra Bass Earbuds Headset with mic – $30 (Amazon); SteelSeries Arctis 7 Gaming Headphones – $149 (Amazon); Apple EarPods with 3.5mm Headphone Plug – $18 (Amazon). Avoid these headphones: Sennheiser SC 130 USB Single Sided Headset – price NA (Amazon); Sennheiser SC 160 USB Double-Sided Headset – $65 (Amazon). Look Good, too: Logitech c930e WebCam – price NA (Amazon). Digging Deeper Don’t be afraid to spend time on calls just chatting about non-work-related stuff. Working from home means there’s little opportunity to connect personally, and that connection is sorely needed. Taking time to chat will help to keep the team connected. Keep it light and have fun! Be available and over-communicate. During business hours make sure you’re available. That doesn’t mean you need to be in front of your computer constantly, but it does mean making sure you can be reached via phone, email, or chat and can participate when needed. Working from home also means it is super important to communicate status and make sure people feel like progress is being made. Also, if you need to be offline for any reason, send up a flare, don’t just disappear. Make sure your chat application status is really your status. People will rely on you showing “Active” meaning that you are available. Don’t game your status. Take a break if you need to but if you aren’t available, don’t show available. Also, if you don’t show “Active” many will assume that you aren’t available or online. We’ve also found that sometimes it is good to show “offline” or “unavailable” to give us a chance to get into a flow and get things done, so don’t be afraid to do that. 
Having this be a “known agreement” will signal to others that they may just want to send you an e-mail or schedule a conference later. If something is urgent in email, make sure to send the subject with a prefix of “URGENT:”. But beware: an “urgent” email doesn’t mean you’ll get an instant reply. If you need an answer right now, consider a phone call. An “urgent” email should be treated as “as soon as you read this”, knowing that it might not be read for a while. Make sure your calendar is up to date. If you are busy or out of the office (OOO) then make sure you schedule that in your calendar so that people will know when they can meet with you. Along with the above, when scheduling meetings, check the availability of your attendees. Be flexible. This goes with things mentioned above. As a manager especially, you need to be flexible and recognize that working from home sometimes means people need to be away for periods of time for personal reasons. Don’t sweat that unless these people aren’t delivering per the next point. Favor shorter milestones or deliverables and an iterative approach. This helps keep people focused and results-oriented. Science projects are easy to squash if you define short milestones that provide quick wins on the way to a longer-term goal. We use the term “fail fast” a lot where we break projects into smaller bits and try to attack what’s scariest first in an effort to “fail fast” and change course. We use JIRA and work in 2-week sprints. Define work in small enough increments. If something exceeds two weeks, it means it needs to be reviewed and refined into smaller work streams. Spend the time to think through it. Require estimates on work items to help keep things on track. Allow and encourage people to work in groups or teams if appropriate, for things like: Brainstorming sessions. Mini-scrums that are feature or project based. Pair programming. Use of the proper video application for screen sharing is important here. Conference etiquette: Mute. If you’re not talking, mute. Lots of participants? Mute. Smaller/Team meeting? Up to you. But probably best to mute. Use a microphone and verify people hear you okay. Don’t forgo a real headset or microphone and instead try to use your internal laptop microphone and speakers. You will either be super loud with background noise, for example people just hear you typing the whole time or hear your fan running, or people won’t hear you at all. When you start presenting, it is a good practice to ask “can you see my screen?” Give others opportunities to talk and if someone hasn’t said anything, mention it and ask for their feedback especially if you think their opinion is important on the subject at hand. Use a tool to help you focus. It is easy to get distracted by any number of things. A technique that works well for some is the Pomodoro Technique. There are also nifty applications and timers that you can use to reinforce it. Music may not be the answer. For some people, just putting on noise-cancelling headphones helps with external noise (kids, TV, etc.). Choose the right desktop sharing tool when needed. We’ve found that Hangouts is a great tool to meet quickly and while it does provide for screen sharing, the video quality isn’t great. It does not allow people who are viewing your screen to zoom in and if you have a very high resolution monitor, people may find it hard to read/see it. 
While Webex is a little more challenging to use, it does provide the ability for others to zoom in when you share, and the shared screens are clearer than in Hangouts. Additionally, Webex allows you to view all participants in one gallery view, thus reinforcing team cohesion. That said though, we’ve found Zoom to be far superior to its competitors. Develop a routine. Get up and start working at roughly the same time if you can. Shower and dress as if you’re going out for errands at least. If possible, have a dedicated workspace. Most importantly, make sure you stop work at some point and just be home. If at all possible, coupled with the dedicated workspace tip, if you can have a physical barrier, such as a door, use it, i.e. close the door and “be home” and not “at work”. It’s hard not to overeat at first, but try to avoid the pantry that is probably really close to your workspace. Try to get out of the house for exercise or errands in the middle of the day to break things up. Working from home is much more sedentary than working in an office. Make it a point to get up from your desk and walk around, check the mail, do whatever you can to stretch your legs. Resources We Like Zoom Training Resources (Zoom) Slack. Have we mentioned, we have a Slack? Microsoft Teams Pomodoro Technique (Wikipedia) Does music help us work better? It depends (BBC) GitLab’s Guide to All-Remote (about.gitlab.com) How To Work From Home (haacked.com) How to Lead From Home (haacked.com) Make “work from home” work for you (blog.google) Working from home tips from our experienced remote employees (stackoverflow.blog) Geographically Distributed Teams (haacked.com) Spouses Share the Hilarious Things They’ve Learned About Their Partner Working from Home (WorkingMother.com) Here Are Some Tweets You’ll Enjoy If You’re Currently Stuck Working From Home (BuzzFeed) 18 Jokes About Working From Home That Are Equal Parts Hilarious And Accurate (BuzzFeed) Tip of the Week Unity Learn is free for 3 months! (learn.unity.com) Use GitHub Learning Lab to grow your skills (lab.github.com) VS Code Remote Development allows you to use a container, remote machine, or WSL as a full-featured development environment. (code.visualstudio.com) You’ll need the Remote Development extension pack from the marketplace. https://twitter.com/Nick_Craver/status/1241357050988433411 https://twitter.com/Metallica/status/1242202008674803712

Mar 2020

2 hr 10 min

It’s time to learn about SSTables and LSM-Trees as Joe feels pretty zacked, Michael clarifies what he was looking forward to, and Allen has opinions about Dr Who. These show notes can be found at https://www.codingblocks.net/episode128 where you can be a part of the conversation, in case you’re reading this via your podcast player. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after installing the agent. ABOUT YOU processes > 200,000 API calls per minute. You like things that scale? Give their corporate page a visit! They are looking for new team members! Apply now at aboutyou.com/job. University of California, Irvine Division of Continuing Education – One of the top 50 nationally ranked universities, UCI offers over 80 certificates and specialized programs designed for working professionals. Spring registration is NOW OPEN! Sign up and reserve your seat today! Survey Says Do you leave your laptop plugged in the majority of the time? Take the survey at: https://www.codingblocks.net/episode128. News Thank you for all of the great reviews: iTunes: devextremis, CaffinatedGamer, Matt Hussey, index out of range Stitcher: Marcos Sagrado, MoarLiekCodingRokzAmirite, Asparges69 Sadly, due to COVID-19 (aka Coronavirus), the 15th Annual Orlando Code Camp & Tech Conference has been cancelled. We’ll keep you informed of your next opportunity to kick us in the shins. (orlandocodecamp.com) During this unprecedented time, TechSmith is offering Snagit and Video Review for free through June 2020. (TechSmith) SSTables and LSM-Trees SSTables SSTable is short for “Sorted String Table”. SSTable requires that the writes be sorted by key. This means we cannot append the new key/value pairs to the segment immediately because we need to make sure the data is sorted by key first. What are the benefits of the SSTable over the hash-indexed log segments? Merging the segments is much faster, and simpler. It’s basically a mergesort against the segment files being merged. Look at the first key in each file, and take the lowest key (according to the sort order), add it to the new segment file … rinse-n-repeat. When the same key shows up in multiple segment files, keep the newer segment’s key/value pair, sticking with the notion that the last written key/value for any given key is the most up to date value. To find keys, you no longer need to keep the entire hash of indexes in memory. Instead, you can use a sparse index where you store a key in memory for every few kilobytes of a segment file. This saves on memory. This also allows for quick scans. For example, when you search for a key, say Michael, and that key isn’t in the index, you can find the two keys in the sparse index that Michael falls between, such as Micah and Mick, then start at the Micah offset and scan that portion of the segment until you find the Michael key. Another improvement for speeding up read scans is to write chunks of data to disk in compressed blocks. Then, the keys in the sparse index point to the beginning of that compressed block. So how do you write this to disk in the proper order? If you just append them to disk as you get them, they’ll be out of order, because you’re likely going to receive them out of order. One method is to actually write them to disk in a sorted structure. B-Tree is one option. 
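Here’s a rough sketch of that sparse-index lookup (made-up keys and offsets, not any particular engine’s on-disk format); the memtable approach for keeping data sorted picks up right after this.

import bisect

# Sparse index: one (key, byte offset) entry every few kilobytes of the segment.
sparse_keys = ["aaron", "micah", "mick", "tom"]
sparse_offsets = [0, 4096, 8192, 12288]

def scan_start_offset(key):
    # Find the last indexed key <= the search key, then scan the segment
    # forward from that offset until the key is found or passed.
    i = bisect.bisect_right(sparse_keys, key) - 1
    return sparse_offsets[max(i, 0)]

print(scan_start_offset("michael"))  # 4096, i.e. scan between "micah" and "mick"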
However, maintaining a sorted structure in memory is actually easier than trying to maintain it on disk, thanks to well-known tree data structures like red-black trees and AVL trees. The keys are sorted as they’re inserted due to the way nodes are shuffled during inserts. This allows you to write the data to memory in any order and retrieve it sorted. When data arrives, write it to an in-memory balanced tree data structure, such as a red-black tree. This is also referred to as a memtable. Once you’ve reached a predefined size threshold, you dump the data from memory to disk in a new SSTable file. While the new segment is being written to disk, any incoming key/value pairs get written to a new memtable. When serving up read requests, you search in your memtable first, then the most recent segment, and so on moving backwards until you find the key you’re looking for. Occasionally run a merge on the segments to get rid of overwritten or deleted items. Downside of this method? If the database crashes for some reason, the data in the memtable is lost. To avoid this, you can use an append-only, unsorted log for each new record that comes in. If the database crashes, that log file can be used to recreate the memtable. LSM-Trees This implementation is the groundwork for LevelDB (GitHub) and RocksDB (GitHub), databases intended to be embedded in other applications; RocksDB is embedded in Kafka Streams and is used for GlobalKTables. Similar storage engines are used by Cassandra and HBase. Both took some design cues from Google’s BigTable whitepaper, which introduced the terms SSTable and memtable. All of this was initially described under the name Log-Structured Merge Tree, LSM-Tree. Storage engines that are based on the notion of storing compacted and sorted files are often called LSM storage engines. Lucene, the indexing engine used in Solr and ElasticSearch, uses a very similar process. Optimizing One of the problems with the LSM-Tree model is that searching for keys that don’t exist can be expensive. You must search the memtable first, then the latest segment, then the next oldest segment, etc., all the way back through all the segments. One solution for this particular problem is a Bloom filter. A Bloom filter is a data structure used for approximating what is in a set of data. It can tell you if the key does not exist, saving a lot of I/O looking for the key. There are competing strategies for determining when and how to perform the merge and compaction operations. The most common approaches include: Leveled compaction – Key ranges are split into smaller SSTables and old data is moved to different “levels”, allowing the compaction process to use less disk and run incrementally. This is the strategy used by LevelDB and RocksDB. Size-tiered compaction – Smaller and newer SSTables are merged into larger and older SSTables. This is the strategy used by HBase. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Red-black trees in 5 minutes – Insertions (examples) (YouTube) Data Structures – (some) Trees (episode 97) B-Tree Visualization (USFCA) Red Black Tree vs AVL Tree (GeeksforGeeks) How to: Use Bloom Filters in Redis (YouTube) A Busy Developer’s Guide to Database Storage Engines – The Basics (yugabyteDB) Tip of the Week Save time typing paths by drag-n-dropping a folder from Finder/File Explorer to your command shell. 
Works on Windows and macOS in Command Prompt, PowerShell, Cmder, and Terminal. Popular and seminal white papers curated by Papers We Love (GitHub) See if there is an upcoming PWL meetup in your area (paperswelove.org) And there’s a corresponding Papers We Love Conference (pwlconf.org) Ever find yourself in the situation where you’re asked to pivot from your current work to another task that would require you to stash your current changes and change branches? Maybe you do that. Or maybe you clone the repo into another path and work from there? But there’s a pro-tip way. Instead, you can use git worktree to work with your repo in another path without needing to re-clone the repo. For example, git worktree add -b myhotfix /temp master copies the files from master to /temp and creates a new branch named myhotfix. Get your Silicon Valley fix with Mythic Quest. (Apple) Level up your programming skills with exercises and mentors with Exercism. (exercism.io) Exercism has been worth mentioning a few times: Algorithms, Puzzles, and the Technical Interview (episode 26) Deliberate Practice for Programmers (episode 78) Use elasticdump’s import and export tools for Elasticsearch. (GitHub) Use docker run --network="NETWORK-NAME-HERE" to connect a container to an existing Docker network. (docs.docker.com)

Mar 2020

1 hr 38 min

In this episode, Allen is back, Joe knows his maff, and Michael brings the jokes, all that and more as we discuss the internals of how databases store and retrieve the data we save as we continue our deep dive into Designing Data-Intensive Applications. If you’re reading these show notes via your podcast player, did you know that you can find them at https://www.codingblocks.net/episode127? Well you do now! Check it out and join in the conversation. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 10% off any course or annual subscription. Clubhouse – The fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Sign up to get two additional free months of Clubhouse on any paid plan by visiting clubhouse.io/codingblocks. Survey Says Which fast food restaurant makes the better fries? Take the survey at: https://www.codingblocks.net/episode127. News We thank all of the awesome people that left us reviews: iTunes: TheLunceforce, BrianMorrisonMe, Collectorofmuchstuff, Momentum Mori, brianbrifri, Isyldar, James Speaker Stitcher: adigolee Come see Allen, Joe, and Michael in person at the 15th Annual Orlando Code Camp & Tech Conference, March 28th. Sign up for your chance to kick them all in the shins and grab some swag. (orlandocodecamp.com) Database Storage and Retrieval A database is a collection of data. A database management system includes the database, APIs for managing the data and access to it. RDBMS Storage Data Structures Generally speaking, data is written to a log in an append-only fashion, which is very efficient. Log: an append-only sequence of records; this doesn’t have to be human readable. These write operations are typically pretty fast because writing to the end of a file is generally a very fast operation. Reading the value for a key from that file is much more expensive, though, as the entire file has to be scanned for instances of the key. To solve this problem, there are indexes. Generally speaking, an index is just an additional structure, derived from the primary set of data, stored in a way that speeds up lookups. Having indices incurs additional overhead on writes. You’re no longer just writing to the primary data file, but you’re also keeping the indices up to date at the same time. This is a trade-off you incur in databases: indexes speed up reads but slow down writes. Hash Indexes One possible solution is to keep every key’s offset (which points to the location of the value of the key) in memory. This is what is done for Bitcask, the default storage engine for Riak. The system must have enough RAM for the index though. In the example given, all the keys stay in memory, but the file is still always appended to, meaning that the key’s offset is likely to change frequently, but it’s still very efficient as you’re only ever storing a pointer to the location of the value. If you’re always writing to a file, aren’t you going to run out of disk space? File segmenting / compaction solves this. Duplicate keys in a given file are compacted to store just the last value written for the key, and those values are written to a new file. This typically happens on a background thread. Once the new segment file has been created, after merging in changes from the previous file, it becomes the new “live” log file. 
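As a minimal sketch of that idea (an append-only log plus an in-memory hash of key to byte offset, loosely Bitcask-style), something like the following captures the read/write path. The comma-separated format and file name are just for illustration; a real store would use a binary, length-prefixed encoding and tombstones for deletes.

import os

index = {}  # in-memory hash index: key -> byte offset of the latest value

def db_set(log, key, value):
    log.seek(0, os.SEEK_END)
    index[key] = log.tell()                  # remember where this record starts
    log.write(f"{key},{value}\n".encode())   # always append, never overwrite

def db_get(log, key):
    log.seek(index[key])                     # jump straight to the latest value
    return log.readline().decode().rstrip("\n").split(",", 1)[1]

with open("toy.log", "w+b") as log:
    db_set(log, "42", "hello")
    db_set(log, "42", "world")   # the old value stays in the file until compaction
    print(db_get(log, "42"))     # world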
While that background thread is running to create the new segment, key locations are still read from the old segment files so that reads aren’t blocked. After the new segment file creation is completed, the old segment files can be deleted. This is how Kafka topic retention policies work, and what happens when you run “force merge” on an Elasticsearch index (same goes for similar systems). Some key factors in making this work well: File format – CSV is not a great format for logs. Typically you want to use a binary format that encodes the length of the string in bytes with the actual string appended afterwards. Deleting records requires some special attention – you have to add a tombstone record to the file. During the merge process, the key and values will be deleted. Crash recovery – If things go south on the server, recovering might take some time if there are large segments or key/value pairs. Bitcask makes this faster by snapshotting the in-memory hashes on occasion so that starting back up can be faster. Incomplete record writes – Bitcask files include checksums so any corruption in the logs can be ignored. Concurrency control – It’s common for there to only be one writer thread, but multiple reader threads, since written data is immutable. Why not update the file, instead of only appending to it? Appending and merging are sequential operations, which are particularly efficient on HDD and somewhat on SSD. Concurrency and crash recovery are much simpler. Merging old segments is a convenient and unintrusive way to avoid fragmentation. Downsides to Hash Indexes The hash table must fit in memory or else you have to spill over to disk, which is inefficient for a hash table. Range queries are not efficient; you have to look up each key individually. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Grokking the System Design Interview (Educative.io) Tip of the Week Add authentication to your applications with minimum fuss using Keycloak. (keycloak.org) Master any major city’s transit system like a boss with CityMapper. (citymapper.com) Spin up a new VM with a single command using Multipass. (GitHub) We referenced Stefan Scherer’s Docker images again. (episode 80, Docker, GitHub) Random User Generator – like Lorem Ipsum, but for people. (randomuser.me) Example calls: US male, US female The perfect gifts for that nerd in your life. (remembertheapi.com) Git Cheat Sheet coffee mug Use CTRL+SHIFT+O in Chrome’s Sources tab to navigate to your JavaScript function by name. tabs AND spaces – A new podcast that talks the topics that developers care about. (tabsandspaces.io)

Mar 2020

2 hr 15 min

Jamie from https://dotnetcore.show/ and Allen, ya know, from Coding Blocks, sat down together at NDC London to talk about the hot topics from the conference as well as how to get the most out of any conference you attend. If you're reading this episode's show notes via your podcast player, you can find this episode's full show notes at https://www.codingblocks.net/episode126 where you can join in on the conversation. Sponsors Datadog - Sign up today at codingblocks.net/datadog for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Read Datadog's new State of Serverless research report that breaks down the state of Serverless, with a detailed look at AWS Lambda usage. Educative.io - Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 10% off any course or annual subscription. Clubhouse - The fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Sign up to get two additional free months of Clubhouse on any paid plan by visiting clubhouse.io/codingblocks. How to get the most out of a Conference If the conference has an app - I highly recommend downloading it - typically latest breaking changes to venue, rooms, talks, etc. will be in the app Attend talks that are outside your immediate realm of knowledge - get exposed to new things, new ideas, new ways of thinking Walk away with fewer "unknown unknowns" and gain some more "known unknowns" Provides you with things to search when you get back from the conference Picking the talks you want to attend Sometimes you have to sacrifice bigger names just to attend talks that may pique your interest Know that what you're seeing on stage for an hour was probably weeks' worth of effort to make it go off without a hitch - so when you try to replicate these things at home, don't lose hope when your attempt isn't as smooth as what you saw on stage This next bit goes for Meetups, Conferences, etc - Get involved in conversations - don't just sit on the sideline - many developers are introverts, but to truly get the most out of a conference you want to have some meaningful discussions Pacman effect - leave a gap when you're standing in a group having a conversation Take advantage of eating times - find a table with an open spot and don't pick up your phone!!! Say good morning or good afternoon! "What's been your favorite talk today?" When it's "drinking time", talk to people. If you're not a drinker, grab a water or a soda and join in on the conversation Try and reach out BEFORE the conference online - Twitter, Facebook, Slack, etc - try and find out who all is going to be attending and try to make a point to meet up at the event! Makes things much less awkward when you've planned a meeting rather than just shouldering your way in. Be a wingman/wingwoman or bring one along - help introduce people to your ring of contacts Maybe sign up to be a speaker at one of these things! 
If you watch the other folks giving presentations, you'll see they're regular people just sharing the things they're passionate about The big names in the industry became big names because they took that first step - you don't become a big name overnight Must-see Presentations Allen Underwood (shameless plug) - Big Data Analytics in Near-Real-Time with Apache Kafka Streams Twitter: @theallenu [https://www.twitter.com/theallenu] Summary: https://ndc-london.com/talk/big-data-analytics-in-near-real-time-with-apache-kafka-streams/ Actual talk here: Coming Soon Laura Silvanavičiūtė (this was my favorite of the entire conference) - How to code music? Twitter: @laurasilvanavi [https://www.twitter.com/@laurasilvanavi] Summary: https://ndc-london.com/talk/how-to-code-music/ Actual talk here: Coming Soon Tess Ferrandez-Norlander - We are the Guardians of our Future Summary: https://ndc-london.com/talk/keynote-we-are-the-guardians-of-our-future/ Actual talk here: https://www.youtube.com/watch?v=2YjrmgFJ_S8 Clifford Agius - 3D printed Bionic Hand a little IOT and a Xamarin Mobile App Twitter: @CliffordAgius [https://www.twitter.com/CliffordAgius] Summary: https://ndc-london.com/talk/3d-printed-bionic-hand-a-little-iot-and-a-xamarin-mobile-app/ Actual talk here: ComingSoon Blog version: https://cliffordagius.co.uk/2019/10/06/3d-printing-a-hand/ Carl Franklin from .NET Rocks - Deep Dive on Server-Side Blazor Twitter: @carlfranklin [https://www.twitter.com/carlfranklin] Summary: https://ndc-london.com/talk/deep-dive-on-server-side-blazor/ Actual talk here: Coming Soon David Fowler: SignalR Deep Dive: Building Servers Twitter: @davidfowl [https://www.twitter.com/davidfowl] Summary: https://ndc-london.com/talk/signalr-deep-dive-building-servers/ Actual talk here: Coming Soon Steve Gordon Twitter: @stevejgordon Summary: https://ndc-london.com/talk/turbocharged-writing-high-performance-c-and-net-code/ Actual talk here: Coming Soon David James - Turning a side project into a business 10 lessons in 10 minutes Twitter: @davidjames [https://www.twitter.com/davidjames] Summary: https://ndc-london.com/talk/lightning-talks-9/ Actual talk here: Coming Soon Steve Sanderson Twitter: @stevesanderson [http://www.twitter.com/stevesanderson] Summary: https://ndc-london.com/talk/blazor-a-new-framework-for-browser-based-net-apps-1/ Actual talk here: Coming Soon Notes from some of the Talks Machine Learning - we as developers need to take much more care in what we release to the world A number of talks / discussion panels revolved around this topic Even with good intentions, you can make something that has consequences that aren't easy to see Knowing your data intimately is the key to everything - but, you need to have different perspectives on the data - it'd be really easy to get laser focused on what you think makes for a good set of data for a model, and miss the pieces that actually provide the best model Microsoft's ethical approach to AI - AI Principles https://www.microsoft.com/en-us/ai/responsible-ai Cities with most camera coverage? Looks like Allen got it wrong - London isn't in the first spot anymore, but they're still top 10! https://www.comparitech.com/vpn-privacy/the-worlds-most-surveilled-cities/ Favorite part of the conference? "Why go if you can just watch the videos?" Interacting with people Thanking the people who make an impact on your daily life Miscellaneous It's not "Steve Ardalis" as Allen said - it's Steve Smith, better known as @Ardalis online! 
Twitter: @ardalis [https://twitter.com/ardalis] App Center - DevOps Pipeline for Mobile https://appcenter.ms/ Zac Braddy / Jamie Taylor plan to launch a new podcast! Jamie doesn't care about tabs or spaces...what?!?! He also has other podcasts in the works, but they're on hold at the moment...stay tuned!

Feb 2020

1 hr 16 min

We dive into declarative vs imperative query languages as we continue our deep dive into Designing Data-Intensive Applications while Allen is gallivanting around London, Michael had a bullish opinion, and Joe might not know about The Witcher. If you’re reading this episode’s show notes via your podcast player, you can find this episode’s full show notes at https://www.codingblocks.net/episode125 where you can join in on the conversation. Sponsors Datadog – Sign up today at codingblocks.net/datadog for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Read Datadog’s new State of Serverless research report that breaks down the state of Serverless, with a detailed look at AWS Lambda usage. Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 10% off any course or annual subscription. Clubhouse – The fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Sign up to get two additional free months of Clubhouse on any paid plan by visiting clubhouse.io/codingblocks. Survey Says How do you pronounce data? Take the survey at: https://www.codingblocks.net/episode125. News We thank everyone that left us some great reviews: iTunes: 3divint, RyansWorld23 Stitcher: Thomasvc, thew_s_witcher4, DaveTheShirt, Yarpskendaya Get your shin-kicking shoes on and sign up for the South Florida Software Developers Conference 2020, February 29th, where Joe will be giving his talk, Streaming Architectures by Example. (fladotnet.com) Come meet us at the 15th annual Orlando Code Camp & Tech Conference, March 28th. Grab some swag and kick us in the shins. (orlandocodecamp.com) Query Languages Declarative vs Imperative The relational model introduced a declarative query language: SQL. Prior models used imperative code. An imperative language performs certain operations in a certain order, i.e. do this, then do that. With a declarative query language, you specify the pattern of data you want, the conditions that must be met, any sorting, grouping, etc. Note that you don’t specify how to retrieve the data. That is left to the optimizer to figure out. Declarative languages are attractive because they are shorter and easier to work with. Consider UI frameworks where you declaratively describe the UI without needing to write code that actually draws a button of a specific size in a specific place with a specific label, etc. Additionally, declarative languages hide the implementation details. This means it’s easier to continue using the code as the underlying engine is updated, be it a database, UI framework, etc. This also means that the declarative code can take advantage of performance enhancements with little to no change (often) to the declarative code. Because declarative languages only specify the result, instead of how to get the result, they are often more likely to be able to take advantage of parallel execution. Conversely, because imperative code needs to happen in a specific order, it’s more difficult to parallelize. MapReduce Made popular by Google, MapReduce is a programming model meant for processing large amounts of data in bulk in a horizontally distributed fashion. Some NoSQL databases, such as MongoDB and CouchDB, support MapReduce in a limited form as a way to perform read-only queries across many documents. MapReduce isn’t a declarative query language but it’s also not completely an imperative query API either. 
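As a rough analogy in plain Python (nothing database-specific, and the data here is made up), the same question can be asked imperatively or declaratively:

animals = [("shark", "big"), ("lion", "big"), ("shark", "small")]

# Imperative: spell out how to walk the list and build the result, step by step.
big_sharks = []
for family, size in animals:
    if family == "shark" and size == "big":
        big_sharks.append((family, size))

# Declarative-ish: state what you want; in SQL this would be a WHERE clause,
# and the optimizer would decide how to actually fetch it.
big_sharks_declared = [a for a in animals if a == ("shark", "big")]

assert big_sharks == big_sharks_declared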
MapReduce sits in between because, to use it, you’re implementing the Template Pattern (episode 16). With MapReduce, you implement two methods: map() and reduce(). The map() and reduce() functions are pure functions. They can only use the data passed into them, they can’t perform additional queries, and they must not have side effects. Pure functions are a concept used in functional programming. From a usability perspective though, it does require writing two functions that are somewhat tied to each other, which may be more effort than just writing a single SQL query. Plus a purely declarative SQL query is better able to take advantage of the optimizer. For this reason, MongoDB added a declarative query language called the aggregation pipeline to wrap the MapReduce functionality. Its expressiveness is similar to a subset of SQL, but in a JSON syntax. Graph-Like Data Models Relationships, particularly many-to-many, are an important factor in deciding which data model to use. As relationships get even more complicated, graph models start to feel more natural. Whereas document databases have documents, and relational databases have tables, rows, and columns, graph databases have: Vertices: Nodes in the graph Edges: Define the relationships between nodes, and can contain data about those relationships. Examples of graph-like data: Social graphs: Vertices are the entities (people, media, articles), and edges are the relationships (friends with, likes, etc.) Web graph: Vertices are the pages, and edges are the links. Maps: Addresses are the vertices, and roads, rails, sidewalks are the edges. There are some things that are trivial to express in a graph query that are really hard any other way. For example, fetch the top 10 people that are friends with my friends, but not friends with me, and liked pages that I like sorted by the count of our common interests. These queries work just like graph algorithms; you define how the graph is traversed. Graph databases tend to be highly flexible since you can keep adding new vertices and edges without changing any other relationships. This makes graphs great for evolvability. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Grokking the System Design Interview (Educative.io) Design Patterns Part 2 – Oh behave! (episode 16) Developer Survey Results 2019 (Stack Overflow) Graph Algorithms (episode 85) CloudSQL with Amy Krishnamohan (gcppodcast.com) Tip of the Week Is there an equivalent of tail -f on Windows? (Stack Overflow) Recursively find files whose content matches a regex pattern and display the 10 lines before each match for context: Get-ChildItem .\*.txt -Recurse | Select-String -Pattern 'MyPattern' -context 10,0 Use the Microsoft Application Inspector to identify and surface well-known features and other interesting characteristics of a component’s source code to determine what it is and/or what it does. (GitHub) Automatically silence those pesky, or worse: embarrassing, notifications while screensharing on your Mac. (Muzzle)

Feb 2020

1 hr 38 min

While we continue to dig into Designing Data-Intensive Applications, we take a step back to discuss data models and relationships as Michael covers all of his bases, Allen has a survey answer just for him, and Joe really didn’t get his tip from Reddit. This episode’s full show notes can be found at https://www.codingblocks.net/episode124, in case you’re reading this via your podcast player, where you can be a part of the conversation. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 10% off any course or annual subscription. Clubhouse – The fast and enjoyable project management platform that breaks down silos and brings teams together to ship value, not features. Sign up to get two additional free months of Clubhouse on any paid plan by visiting clubhouse.io/codingblocks. Survey Says Which keyboard do you use? Take the survey at: https://www.codingblocks.net/episode124. News Thank you for the awesome reviews: iTunes: Kampfirez, Ameise776, JozacAlanOutlaw, skmetzger, Napalm684, Dingus the First Get your tickets now for NDC { London }, January 27th – 31st, where you can kick Allen in the shins while he gives his talk, Big Data Analytics in Near-Real-Time with Apache Kafka Streams. (ndc-london.com) Hurry and sign up for the South Florida Software Developers Conference 2020, February 29th, where Joe will be giving his talk, Streaming Architectures by Example. This is a great opportunity for you to try to kick him in the shins. (fladotnet.com) The CB guys will be at the 15th Annual Orlando Code Camp & Tech Conference, March 28th. Sign up for your chance to kick them all in the shins and grab some swag. (orlandocodecamp.com) Relationships … It’s complicated Normalization Relational databases are typically normalized. A quick description of normalization would be associating meaningful data with a key and then relating data by keys rather than storing all of the data together. Normalization reduces redundancy and improves data integrity. Relational normalization has several benefits: Consistent styling and spelling for meaningful values. No ambiguity, even when text values are coincidentally the same, for example, Georgia the state vs Georgia the country. Updating meaningful values is easy since there is only one spot to change. Language localization support can be easier because you can associate different meaningful values with the same key for each supported language. Searching for hierarchical relationships can be easier, for example, getting a list of cities for a particular state. This can vary based on how the data is stored. See episode 28 and episode 29 for more detailed discussions related to some strategies. There are legitimate reasons for having denormalized data in a relational database, like faster searches, although there might be better tools for the specific use case. Relationships … In Document Databases Document databases struggle as relationships get more complicated. Document database designers have to make careful decisions about where data will be stored. A big benefit of document databases is locality, meaning all of the relevant data for an entity is stored in one spot. Fetching an order object is one simple get in a document database, while the relational database might end up being more than one query and will surely join multiple tables. 
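A tiny sketch of that trade-off using plain dicts (hypothetical order data, no real database involved): the document version keeps everything in one place, while the normalized version stores each meaningful value once and relates rows by key.

# Document style: one lookup fetches the whole order, but "Georgia" is repeated
# in every document that mentions it and must be fixed everywhere if it changes.
order_doc = {"id": 1, "customer": {"name": "Ada", "state": "Georgia"},
             "items": [{"sku": "kb-42", "qty": 1}]}

# Normalized style: meaningful values live in one spot; reassembling the order
# means following keys (what a relational database does with a join).
states = {10: "Georgia"}
customers = {7: {"name": "Ada", "state_id": 10}}
orders = {1: {"customer_id": 7}}
order_items = [{"order_id": 1, "sku": "kb-42", "qty": 1}]

customer = customers[orders[1]["customer_id"]]
print(customer["name"], states[customer["state_id"]])  # Ada Georgia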
Relationships … In Relational Databases There are several benefits of relational database relationships, particularly Many-to-One and Many-to-Many relationships. To illustrate a Many-to-One example, there are many parts associated with one particular computer. To illustrate a Many-to-Many example, a person can be associated with many computers and a computer can be associated with many people (a small sketch follows these show notes). As your product matures, your database (typically) gets more complicated. The relational model holds up really well to these changes over time. The queries get more complicated as you add more relationships, but your flexibility remains. Query Optimization A query optimizer, a common part of popular RDBMSes, is responsible for deciding which parts of your written query to execute in which order and which indexes to use. The query optimizer has a huge impact on performance and is a big part of the reason why proprietary RDBMSes like Oracle and SQL Server are so popular. Imagine if you, the developer, had to be smarter about the order in which you joined your tables and the order of items in your WHERE clause … and then the ratios of data in the tables were different in production vs development, and then a new index was added, … The query optimizer uses advanced statistics about your data to make smart choices about how to execute your query. A key insight into the relational model is that the query optimizer only has to be built once and everybody benefits from it. In document databases, the developers and data model designers have to consider their designs and querying constantly. How to choose Document vs Relational Document Databases … Better performance in some use cases because of locality. Often scale very well because of that locality. Are flexible in what they can store, often called “schemaless” or “schema on read”, but put another way, this is a lack of enforced integrity. Have poor support for joining because you have to fetch the whole document for a simple lookup. Require extra care when designing because it’s difficult to change the document formats after the fact and because there is no generic query optimizer available. Relational Databases … Can provide powerful relationships, particularly with highly connected data. However, they don’t scale horizontally very well. Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Grokking the System Design Interview (Educative.io) Generate metrics from your logs to view historical trends and track SLOs (Datadog) Hierarchical Data – Adjacency Lists and Nested Set Models (episode 28) Hierarchical Data cont’d – Path Enumeration and Closure Tables (episode 29) Tip of the Week Presto – The Distributed SQL Query Engine for Big Data. (prestodb.io) Use the Files app in iOS to proxy files from Box or Google Drive (support.apple.com) Pin tabs in Chrome for all of your must-have open tabs. (support.google.com) Use the Microsoft Authenticator to keep all of your one-time passwords in sync across all of your devices. And it requires you to authenticate with it to even see the OTPs! (App Store, Google Play) Combine poker with learning using Varianto:25’s Git playing cards. (varianto25.com) Search your Gmail for unread old emails with queries like before:2019/01/01 is:unread. The new JetBrains Mono font is almost as awesome as the page that describes it. (JetBrains)
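For the Many-to-Many example in the notes above (people and computers), here is a hypothetical sketch of the usual relational answer: a junction table that pairs person IDs with computer IDs. The data and helper names are invented for illustration.

people = {1: "Alice", 2: "Bob"}
computers = {10: "build-server", 11: "laptop"}

# Junction "table": each pair links one person to one computer, so a person
# can have many computers and a computer can belong to many people.
person_computer = [(1, 10), (1, 11), (2, 10)]

def computers_for(person_id):
    return [computers[c] for p, c in person_computer if p == person_id]

def people_for(computer_id):
    return [people[p] for p, c in person_computer if c == computer_id]

print(computers_for(1))  # ['build-server', 'laptop']
print(people_for(10))    # ['Alice', 'Bob']

In SQL this would be a join across the junction table; in a document database you would have to decide up front whether to embed computers inside people, people inside computers, or both.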

Jan 2020

2 hr 13 min

We’re comparing data models as we continue our deep dive into Designing Data-Intensive Applications as Coach Joe is ready to teach some basketball, Michael can’t pronounce 6NF, and Allen measured some geodesic distances just this morning. For those reading these show notes via a podcast player, this episode’s full show notes can be found at https://www.codingblocks.net/episode123 where you can also join in on the conversation. Sponsors Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. Educative.io – Level up your coding skills, quickly and efficiently. Visit educative.io/codingblocks to get 20% off any course or, for a limited time, get 50% off an annual subscription. ABOUT YOU – One of the fastest growing e-commerce companies, headquartered in Hamburg, Germany, and looking for motivated team members like you. Apply now at aboutyou.com/job. Survey Says Which data model do you prefer? Take the survey at: https://www.codingblocks.net/episode123. News We thank everyone that took a moment to leave us a review: iTunes: BoulderDude333, the pang1, fizch26 Hurry up and get your tickets now for NDC { London }, January 27th – 31st, where Allen will be giving his talk, Big Data Analytics in Near-Real-Time with Apache Kafka Streams. This is your chance to kick him in the shins on the other side of the pond. (ndc-london.com) Sign up for your chance to kick Joe in the shins at the South Florida Software Developers Conference 2020, February 29th, where he will be giving his talk, Streaming Architectures by Example. (fladotnet.com) Want a chance to kick all three Coding Blocks hosts in the shins? Sign up for the 15th Annual Orlando Code Camp & Tech Conference, March 28th, and grab some swag while you’re at it. (orlandocodecamp.com) Data Models Data models are one of the most important pieces of developing software. They dictate how the software is written and how we think about the problems we’re solving. Software is typically written by stacking layers of modeling on top of each other. We write objects and data structures to reflect the real world. These then get translated into some format that will be persisted in JSON, XML, relational tables, graph DBs, etc. The people that built the storage engine had to determine how to model the data on disk and in memory to support things like search, fast access, etc. Even further down, those bits have to be converted to electrical current, pulses of light, magnetic fields, and so on. Complex applications commonly have many layers: APIs built on top of APIs. What’s the purpose of these layers? To hide the complexity of the layer below it. The abstractions allow different groups of people (potentially with completely different skillsets) to work together. There are MANY types of data models, all with different usages and needs in mind. It can take a LOT of time and effort to master just a single model. Data models have a HUGE impact on how you write your applications, so it’s important to choose one that makes sense for what you’re trying to accomplish. Relational Model vs Document Model The best-known model today is probably the one based on SQL. The relational model was proposed by Edgar Codd back in 1970. The relational model organizes data into relations (i.e. tables in SQL) where each relation contains an unordered collection of tuples (i.e. rows in SQL).
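As a tiny, hypothetical illustration of that last point, a relation can be pictured as an unordered collection of tuples; the sketch below models one with a Python set of namedtuples. The "table" and "column" names are invented.

from collections import namedtuple

# One "relation" (table): each namedtuple is a tuple (row) with named attributes (columns).
Customer = namedtuple("Customer", ["id", "name", "city"])

customers = {
    Customer(1, "Alice", "Atlanta"),
    Customer(2, "Bob", "Orlando"),
}

# The collection is unordered; a query describes what to return,
# not the order in which the rows happen to be stored.
atlanta_names = {c.name for c in customers if c.city == "Atlanta"}
print(atlanta_names)  # {'Alice'}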
People originally doubted it would work, but its dominance has lasted since the mid-80s, which the author points out is basically an eternity in software. Its origins were in business data processing, particularly transaction processing. There have been a number of competing data storage and querying approaches over the years: network and hierarchical models in the 70s and 80s, object databases in the late 80s and early 90s, XML databases, and so on. Nobody has dethroned the relational database, though. Almost everything you see and use today has some sort of relational database working behind it. NoSQL NoSQL is the latest competitor to relational databases. It was originally intended as a catchy Twitter hashtag for a meetup about open source, distributed, non-relational databases. It has since been reinterpreted as “Not Only SQL”. What needs does NoSQL aim to address? The need for greater scalability than traditional RDBMSes can typically achieve, including very large datasets and fast writes. The desire for FOSS (free and open source software), as opposed to very expensive, commercial RDBMSes. Specialized query operations that are not supported well in the relational model. Shortcomings of relational models – the need for more dynamic and/or expressive data models. Different applications (or even different pieces of the same application) have different needs and may require different data models. For that reason, it’s very likely that NoSQL won’t replace SQL, but rather it’ll augment it. This is referred to as polyglot persistence. Object-Relational Mismatch Most applications today are written in an object oriented programming language. There’s typically a translation layer required to map the relational data models to an object model. The disconnect between models can be referred to as impedance mismatch. Frameworks like ActiveRecord, Hibernate, Entity Framework, etc., can reduce the boilerplate code needed for the translation, but typically don’t fully hide the impedance mismatch issues (a small sketch of that boilerplate follows these show notes). Resources We Like Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon) Grokking the System Design Interview (Educative.io) Monitor Azure DevOps workflows and pipelines with Datadog (Datadog) Monitor Amazon EKS on AWS Fargate with Datadog (Datadog) Best practices for tagging your infrastructure and applications (Datadog) Introducing: Educative Subscriptions (Educative.io) Santosh Hari – Not all data is created equal: NoSQL (YouTube) TIOBE Index (tiobe.com) Database Schema for Multiple Types of Products Tip of the Week Got data? Use DataGrip. One tool for many databases. (JetBrains) KafkaHQ – A Kafka GUI for topics, data, consumer groups, schema registry and more. (GitHub) Grafka – A GraphQL interface for Apache Kafka (GitHub) Use Google Maps to measure geodesic distances (citylab.com) How to undo (almost) anything with Git (GitHub) Will Save the Galaxy for Food by Yahtzee Croshaw (Amazon)
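As a rough sketch of the impedance mismatch and the translation boilerplate that frameworks like Hibernate or Entity Framework reduce, here is a hypothetical Python example that stitches flat relational rows back into a nested object. The schema, sample rows, and helper names are invented for illustration.

from dataclasses import dataclass, field

# Relational side: flat rows, related by keys.
user_rows = [(1, "Alice")]                                   # (id, name)
phone_rows = [(100, 1, "555-1234"), (101, 1, "555-5678")]    # (id, user_id, number)

# Object side: a nested structure that is natural in application code.
@dataclass
class User:
    id: int
    name: str
    phones: list = field(default_factory=list)

def load_users(user_rows, phone_rows):
    # The hand-written translation layer: stitch flat rows back into objects.
    users = {uid: User(uid, name) for uid, name in user_rows}
    for _, user_id, number in phone_rows:
        users[user_id].phones.append(number)
    return list(users.values())

print(load_users(user_rows, phone_rows))
# [User(id=1, name='Alice', phones=['555-1234', '555-5678'])]

Every place an application needs the object view, someone has to write (or generate) code like load_users; that repeated translation is the boilerplate ORMs aim to reduce, without fully hiding the mismatch.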

Jan 2020

1 hr 53 min