System Design Problems

| Comments

Unique User ID

Design a system to generate unique id for 1 billion user in distributed system

Mean and Median

You are given say 20 nodes in a distributed system and each node have 1 billion numbers. Find mean and median. You can have some other nodes for co-ordination.

Mean = (sum of all the numbers) / total numbers
Median = Mid element in sorted sequence

For example
Input = [2, 3, 4, 1]

Mean = (2+3+4+1) / 4 =2.5
For median sorted sequence is [1, 2, 3, 4]. Median is 2 or 3.



How ls works?

  • Distributed systems people talk about NFS
  • Typical low-level systems folks get into the copy-on-write implementation of fork
  • Filesystem folks get into finding the blocks that constitute the directory

File descriptor

How file descriptor is generated?

Before I start talking about File Descriptor lets first understand the Library functions and System calls.

System calls are functions that transfer control from the user process to the operating system kernel.

Functions such as read(), write() etc are system calls.

Library functions typically provide a richer set of features. Library functions are implemented on top of system calls.

The first step is the open() system call which either open an existing file or create a new file. Multiple parameters can be provided for the requirement. For details of these parameters visit here.

In C program, we use fopen() rather than open() and this is the difference between Library function and System Call. Library function fopen() will internally call open() and configure parameter based on input provided to fopen()

fopen()
#include <stdio.h>

int main()
{
   FILE *fp;

    //Open file test.txt in read mode
   fp = fopen("/tmp/test.txt", "r");
   fclose(fp);

   return 0;
}

Executing this code under dtruss

$ sudo dtruss ./file 2>&1 | grep open
open_nocancel(“/tmp/test.txt\0”, 0x0, 0x1B6) = -1 Err#2
open(“/dev/dtracehelper\0”, 0x2, 0x7FFF5CE62180) = 3 0
open(“.\0”, 0x0, 0x1) = 3 0
open(“/usr/lib/dtrace/libdtrace_dyld.dylib\0”, 0x0, 0x0) = 3 0

Lets deep dive now into the File descriptor. In above C code we created a FILE* which is file pointer or *file descriptor at Library function level. In reality library function fopen() will call system call open() which will get the file descriptor information from filesystem.

fopen() library call want to embed the file descriptor information with some additional information. For this purpose a struct is created called FILE and in this structure file descriptor information returned by open() system call is stored with other information.

FILE structure fields may vary depending on OS

In MAC OS its as follows

FILE - stdio.h
 /*
  * stdio state variables.
  *
  * The following always hold:
  *
  *        if (_flags&(__SLBF|__SWR)) == (__SLBF|__SWR),
  *                _lbfsize is -_bf._size, else _lbfsize is 0
  *        if _flags&__SRD, _w is 0
  *        if _flags&__SWR, _r is 0
  *
  * This ensures that the getc and putc macros (or inline functions) never
  * try to write or read from a file that is in `read' or `write' mode.
  * (Moreover, they can, and do, automatically switch from read mode to
  * write mode, and back, on "r+" and "w+" files.)
  *
  * _lbfsize is used only to make the inline line-buffered output stream
  * code as compact as possible.
  *
  * _ub, _up, and _ur are used when ungetc() pushes back more characters
  * than fit in the current _bf, or when ungetc() pushes back a character
  * that does not match the previous one in _bf.  When this happens,
  * _ub._base becomes non-nil (i.e., a stream has ungetc() data iff
  * _ub._base!=NULL) and _up and _ur save the current values of _p and _r.
  *
  * NB: see WARNING above before changing the layout of this structure!
  */

 typedef struct __sFILE {
         unsigned char *_p;        /* current position in (some) buffer */
         int     _r;               /* read space left for getc() */
         int     _w;               /* write space left for putc() */
         short   _flags;           /* flags, below; this FILE is free if 0 */
         short   _file;            /* fileno, if Unix descriptor, else -1 */
         struct  __sbuf _bf;       /* the buffer (at least 1 byte, if !NULL) */
         int     _lbfsize;         /* 0 or -_bf._size, for inline putc */
 
         /* operations */
         void    *_cookie;        /* cookie passed to io functions */
         int     (*_close)(void *);
         int     (*_read) (void *, char *, int);
         fpos_t  (*_seek) (void *, fpos_t, int);
         int     (*_write)(void *, const char *, int);
 
         /* separate buffer for long sequences of ungetc() */
         struct  __sbuf _ub;     /* ungetc buffer */
         struct __sFILEX *_extra; /* additions to FILE to not break ABI */
         int     _ur;            /* saved _r when _r is counting ungetc data */
 
         /* tricks to meet minimum requirements even when malloc() fails */
         unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
         unsigned char _nbuf[1]; /* guarantee a getc() buffer */
 
         /* separate buffer for fgetln() when line crosses buffer boundary */
         struct  __sbuf _lb;     /* buffer for fgetln() */
 
         /* Unix stdio files get aligned to block boundaries on fseek() */
         int     _blksize;        /* stat.st_blksize (may be != _bf._size) */
         fpos_t  _offset;        /* current lseek offset (see WARNING) */
 } FILE;

In above snippet

FILE - stdio.h
  <code>// File descriptor from open()</code>
  short   _file;            /* fileno, if Unix descriptor, else -1 */
  
  // The buffer through which the data will pass to system calls.
  struct  __sbuf _bf;       /* the buffer (at least 1 byte, if !NULL) */
  int     _lbfsize;         /* 0 or -_bf._size, for inline putc */

  unsigned char _ubuf[3]; /* guarantee an ungetc() buffer */
  unsigned char _nbuf[1]; /* guarantee a getc() buffer */

To be continued…



Comments