  • How Does Linux Manage the Executable Code of a Process in Primary Memory (RAM)?

    A Linux process has many properties that define its metadata.

    task_struct is the C structure that stores the various properties of a process.

    It stores the pid, state, tgid, mm (a pointer to mm_struct, which describes the memory metadata of the process and which we will cover in another article), active_mm, and so on.
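
    As a heavily simplified sketch (the real task_struct in include/linux/sched.h has hundreds of members; this trimmed version only illustrates the fields mentioned above):

    /* Heavily simplified, illustrative view of task_struct; the real
     * structure in include/linux/sched.h is far larger. */
    struct mm_struct;                 /* memory metadata (covered later)  */

    struct task_struct {
        int pid;                      /* process ID                       */
        int tgid;                     /* thread-group ID                  */
        unsigned int state;           /* running, sleeping, stopped, ...  */
        struct mm_struct *mm;         /* memory descriptor of the process */
        struct mm_struct *active_mm;  /* mm borrowed by kernel threads    */
        /* ... many more: scheduling class, priority, parent, ...         */
    };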

    The scheduler uses this metadata when making scheduling decisions.

    But the process also has executable code, right? Where is that code stored?

    task_struct does not store the executable code of a process; Linux has a more elaborate mechanism for that.

    Linux keeps the process's code in virtual memory.

    Virtual memory spans both RAM and disk, and locations in it are represented by virtual addresses. vm_area_struct describes a range of these addresses; one process can have multiple vm_area_struct instances, and all of these entries are registered in another structure named mm_struct.
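
    As a rough, simplified sketch of that relationship (the real definitions live in include/linux/mm_types.h and are much larger):

    /* Simplified, illustrative view of how mm_struct and vm_area_struct
     * relate; the real definitions are in include/linux/mm_types.h. */
    struct vm_area_struct {
        unsigned long vm_start;          /* first virtual address of region */
        unsigned long vm_end;            /* one past the last address       */
        unsigned long vm_flags;          /* read/write/execute permissions  */
        struct vm_area_struct *vm_next;  /* next region (older kernels used
                                            a linked list; newer ones keep
                                            the regions in a tree)          */
    };

    struct mm_struct {
        struct vm_area_struct *mmap;         /* all regions of the process  */
        unsigned long start_code, end_code;  /* bounds of executable code   */
        /* ... page-table pointer (pgd), heap and stack bounds, and more ... */
    };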

    How does Linux store processes in virtual memory, and how does that map to primary memory? We will cover all of this in detail in another article.

    This article focuses on when code is loaded into RAM, how it is represented there, and how the Linux kernel treats it.

    This is a very high-level explanation of how Linux treats executable code. We will discuss everything in detail in upcoming articles.

    So Linux treats each process's memory as a set of pages in RAM.

    One process may have many pages associated with it, and each page is typically 4 KB in size.

    Each process has its own separate page table.

    In upcoming chapters, we will see how Linux manages these page tables and how they form a tree.

    Now, back to pages: once a page is loaded into RAM, it has a base address, also called the page frame address. Linux uses this address for process execution, and because each page is 4 KB, Linux also knows where the code ends in RAM.
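
    Because every page is exactly 4 KB, a virtual address splits cleanly into a page number and an offset within that page. A small runnable illustration of the arithmetic (the sample address is made up):

    /* Splitting a virtual address into page number + offset for 4 KB pages.
     * The sample address is arbitrary; the arithmetic is what matters. */
    #include <stdio.h>

    #define PAGE_SIZE  4096UL
    #define PAGE_SHIFT 12                /* 2^12 = 4096 */

    int main(void) {
        unsigned long vaddr = 0x0040154c;              /* made-up address   */
        unsigned long page  = vaddr >> PAGE_SHIFT;     /* which page        */
        unsigned long off   = vaddr & (PAGE_SIZE - 1); /* where in the page */
        printf("page %lu, offset %lu\n", page, off);
        return 0;
    }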

    The PTE (page table entry) is the key piece that describes each of these pages.

    Using PTE permission bits, Linux controls how the executable code can be accessed.
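
    Conceptually, a PTE holds the physical page-frame number plus permission bits, and the kernel grants or denies access by setting those bits. A simplified sketch (the bit positions here are invented for clarity and do not match any real architecture's page-table layout):

    /* Illustrative PTE flags: positions are invented for clarity and do
     * NOT match any real architecture's page-table format. */
    #include <stdbool.h>
    #include <stdio.h>

    #define PTE_PRESENT (1UL << 0)   /* page is in RAM            */
    #define PTE_WRITE   (1UL << 1)   /* writes allowed            */
    #define PTE_EXEC    (1UL << 2)   /* instruction fetch allowed */

    static bool can_execute(unsigned long pte) {
        return (pte & PTE_PRESENT) && (pte & PTE_EXEC);
    }

    int main(void) {
        /* A code page is typically present + executable but not writable. */
        unsigned long code_pte = PTE_PRESENT | PTE_EXEC;
        printf("executable: %s\n", can_execute(code_pte) ? "yes" : "no");
        return 0;
    }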

    This is how Linux maps executable code in RAM.

    In the next chapters, we will see the entire process of how Linux stores processes in virtual memory and how a process is given to the CPU for execution.

  • Why Linux Doesn't Have a Main Event Loop Like Web Servers

    As system programming lovers, you might have seen some web server code, or might even have written a small web server in C, C++, or Java. If not, you have surely written some application programs. Much software is designed like this: one main loop, for example while(1) { // all code here, with exit statements }.
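
    For illustration, here is a minimal sketch of that design: a toy C server whose entire life is one while(1) loop. The port and reply are made up, and error handling is omitted for brevity:

    /* Toy illustration of the classic "one main loop" server design.
     * Port and reply are invented; error handling is omitted. */
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(8080);        /* illustrative port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, 16);

        while (1) {                                /* the main loop */
            int client = accept(listener, NULL, NULL); /* sleeps here until
                                                          a client connects */
            const char *reply = "hello\n";
            write(client, reply, strlen(reply));
            close(client);
        }
    }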

    Many web servers, even today, follow a design in which an infinite main event loop is the central component.

    Does Linux, the world's most popular open-source operating system, follow the same main-loop design, where all code and function calls happen inside one loop that keeps running until the user shuts down the machine?

    The answer is that Linux does not have a main event loop. It does have an idle loop, but as a central component there is no infinite loop like while(1) { // OS code goes here }. Linux is primarily driven by the interrupt mechanism: the kernel runs code when hardware interrupts, such as those from the disk, keyboard, and network, arrive.

    Apart from interrupts, the Linux kernel also reports for duty when the process scheduler decides that a particular process needs the kernel's attention so that its executable code can be handed to the CPU.

    The third important trigger that wakes kernel code is system calls. Whenever a user program wants to read a file, access a device such as a USB drive, or manipulate any other resource, a system call goes to the kernel, and the kernel then runs the corresponding code.
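
    For illustration, here is a tiny program whose every interesting line crosses into the kernel through a system call (the file path is just a convenient example that exists on most Linux systems):

    /* Each of these calls ends in a system call (openat, read, close)
     * that transfers control to the kernel. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY);   /* syscall: openat */
        if (fd < 0)
            return 1;
        ssize_t n = read(fd, buf, sizeof buf - 1);  /* syscall: read   */
        if (n > 0) {
            buf[n] = '\0';
            printf("%s", buf);
        }
        close(fd);                                  /* syscall: close  */
        return 0;
    }

    Running the program under strace makes each of these kernel entries visible.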

    So, instead of executing in rounds of an infinite loop, Linux kernel code runs in response to interrupts, scheduler decisions, and system calls.

    But Linux still has a loop: the idle loop.

    When the scheduler finds no task to run and no interrupt arrives, this idle loop, or idle task, runs on the CPU.
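
    Inside the kernel, the idle task ultimately executes a privileged halt instruction (HLT on x86) so the CPU sleeps until the next interrupt. As a rough user-space analogy, here is a loop that sleeps until a signal, the user-space counterpart of an interrupt, arrives:

    /* User-space analogy of the kernel's idle loop: pause() blocks until
     * a signal is delivered, so the loop burns no CPU while idle. */
    #include <unistd.h>

    int main(void) {
        for (;;)
            pause();   /* sleep; wake only when a signal arrives */
    }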

    So the question is: is an infinite-loop-based design simply not possible in OS kernel development? The answer is that it is possible, but it is intentionally avoided by Unix and Unix-like systems such as Linux. A busy infinite loop wastes CPU power, and interrupts serve the same purpose, so designers avoided this kind of system.

    Even Minix, the OS that motivated Linus Torvalds to create Linux, avoided this design.

    Is an infinite-loop-based design good for at least other types of software?

    Today, much popular software uses an infinite-loop-based design. Nginx, one of the most popular reverse proxy servers, tops the list.

    Another popular product that uses an infinite-loop design is Node.js.

    We should remember that most of this software does not burn the CPU by spinning through the loop constantly; the loop also waits and sleeps, blocking until there is work to do.

  • The $100 Million Gamble: Why Instagram Left AWS for Its Own Servers

    Introduction:

    In 2014, Instagram was at a crossroads. They had a massive user base but were burning cash on AWS S3 storage. With over 20 billion photos and a growth rate that was terrifying, the engineering team decided to do the impossible: move everything to Facebook's data centers without the world noticing.

    The Engineering Powerhouse:

    You might think it took thousands of people, but the core migration was handled by a surprisingly small and elite team. Only about 8 to 10 core engineers were responsible for the architecture of this migration. They had to ensure that while they moved petabytes of data, the 200 million active users could still post their lunch photos without a single error message.

    The Strategy: Zero Downtime or Bust

    To achieve zero downtime, the team used a method called "The Dark Launch" (a small sketch of the dual-write-and-verify idea follows the list):

    Dual-Writing: Every time a user uploaded a photo, it was written to both Amazon S3 and Instagram's new servers at the same time.

    Verification: They built a "Verifier" daemon that checked every single file's integrity using MD5 checksums. If even one pixel was different, the migration tool would retry.

    The Cutover: Once all 20 billion old photos were copied, they simply flipped the "Read" switch from AWS to their own servers.
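
    For illustration, here is a minimal C sketch of the dual-write-and-verify idea. The two in-memory "stores" and the upload_photo() helper are hypothetical stand-ins (Instagram's real pipeline is not public as code); MD5() comes from OpenSSL, so link with -lcrypto:

    /* Minimal sketch of dual-write-and-verify. The in-memory "stores"
     * are hypothetical stand-ins for S3 and the new data center. */
    #include <openssl/md5.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    static unsigned char old_store[1024], new_store[1024];

    static void write_old(const unsigned char *d, size_t n) { memcpy(old_store, d, n); }
    static void write_new(const unsigned char *d, size_t n) { memcpy(new_store, d, n); }

    /* Dual-write a photo, then verify both copies with MD5 checksums. */
    static bool upload_photo(const unsigned char *data, size_t len) {
        write_old(data, len);                    /* old store (S3-like)  */
        write_new(data, len);                    /* new store            */

        unsigned char a[MD5_DIGEST_LENGTH], b[MD5_DIGEST_LENGTH];
        MD5(old_store, len, a);                  /* checksum each copy   */
        MD5(new_store, len, b);
        return memcmp(a, b, sizeof a) == 0;      /* mismatch => retry    */
    }

    int main(void) {
        const unsigned char photo[] = "fake photo bytes";
        puts(upload_photo(photo, sizeof photo) ? "verified" : "retry needed");
        return 0;
    }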

    Why Did They Do It? (The 3 Big Pillars)

    Cost Saving: Cloud storage for 20 billion photos is insanely expensive. Moving to internal servers saved them millions of dollars every single month.

    Total Control: On AWS, they were limited by Amazon's infrastructure. In their own data centers, they could optimize the hardware specifically for photo-heavy traffic.

    Latency: By moving to the same network as Facebook, the speed of data transfer improved significantly, making the app feel "snappier" for users.

    Conclusion:

    Moving 20 billion photos is like moving a skyscraper while people are still living in it. It remains one of the greatest feats in the history of system design.

    Visit our site for daily system design stories.

    I am Nadeem, a systems lover. My answers on Quora got 500k views in just 45 days and received upvotes from senior Microsoft employees.

    I promise that I will share deep system facts daily on DSS (DeepSystemStuff.com).

    Read another article of mine here: Does Linux Have a Main Event Loop?
