```python
def calculate_median(nums):
    if len(nums) == 0:
        return 0
    nums = sorted(nums)
    if len(nums) % 2 == 1:
        return nums[len(nums) // 2]
    else:
        return (nums[len(nums) // 2 - 1] + nums[len(nums) // 2]) / 2
```

But what if the list keeps growing indefinitely, and you need to find the median in real time? This scenario presents a challenging problem known as the "Infinite Rolling Median Problem."

To address this, we will start from simple approaches and progressively improve them until we reach a solution that, to the best of my knowledge, is the most efficient for this problem.

Let's keep things simple and first see how far the naive approach gets us. We will use the `calculate_median` function from above and determine the time and space complexity of the resulting solution.

```python
import random

def calculate_median(nums):
    # ... same as before
    ...

def naive_solution(N):
    data = []
    for _ in range(N):
        rn = random.random()
        data.append(rn)
        # recompute the median after every new number
        median = calculate_median(data)
```

`calculate_median` uses Python's built-in `sorted` function, which has a time complexity of \(O(Nlog(N))\), and `naive_solution` calls it once for each of the \(N\) incoming numbers. The overall time complexity is therefore \(O(N^2log(N))\), which is tolerable for small \(N\) but grows rapidly as \(N\) increases.

To understand the execution of the function we will be carrying out multiple runs for each specific \(N\) and calculate the average time taken to execute it.
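The measurement described above can be sketched roughly as follows. This is a minimal sketch, not the benchmark code behind the graphs: `time_runs` and `dummy_solution` are hypothetical names introduced for illustration, and `dummy_solution` merely stands in for a function such as `naive_solution`.

```python
import statistics
import time

def time_runs(func, n, runs=10):
    """Call func(n) `runs` times; return (mean, stdev) of the wall-clock times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(n)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

def dummy_solution(n):
    # Hypothetical stand-in for a solution such as naive_solution.
    return sum(range(n))

if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        mean, dev = time_runs(dummy_solution, n)
        print(f"N={n}: {mean:.6f}s +/- {dev:.6f}s")
```

Averaging over several runs smooths out noise from the random input and from other processes running on the machine.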

The above graph shows the runtime of 10 runs for each specific \(N\) ranging from 1000 to over 8000. As you can see, the runtime grows roughly quadratically with \(N\).

Upon examining our naive solution, it becomes evident that a more efficient approach is to keep the incoming numbers in a sorted list instead of re-sorting a copy of the list inside `calculate_median` on every call. With the list already sorted, a `compute_median` function can read the middle element(s) directly in \(O(1)\). Furthermore, because the list is nearly sorted each time a new number is appended, each in-place sort is cheaper than sorting from scratch: it costs at most \(O(Nlog(N))\), and Python's sort takes advantage of the existing order.

```python
import random

def compute_median(nums):
    # nums must already be sorted
    if len(nums) == 0:
        return 0
    mid_index = len(nums) // 2
    if len(nums) % 2 == 1:
        return nums[mid_index]
    else:
        return (nums[mid_index - 1] + nums[mid_index]) / 2

def sorted_list_solution(N=100):
    data = []
    for _ in range(N):
        r_num = random.random()
        data.append(r_num)
        data.sort()  # keep the list sorted in place
        median = compute_median(data)
```

We compared our `naive_solution` and `sorted_list_solution` by measuring their execution times across a range of \(N\). The mean and deviation of the execution times were then plotted for analysis and evaluation.

By maintaining the list sorted, as demonstrated in the previous solution, we have achieved a substantial performance boost. However, we are not done yet. We will continue building upon this improved solution to explore further enhancements and push the boundaries of its performance capabilities.

In the Python standard library, `bisect` is a module whose functions keep a list sorted as items are inserted, using binary search to find each insertion point. A solution similar to `sorted_list_solution` is written below using `bisect`.

```python
import bisect
import random

def compute_median(nums):
    ## similar to previous
    ...

def bisect_solution(N=100):
    data = []
    for _ in range(N):
        r_num = random.random()
        bisect.insort(data, r_num)  # insert while keeping data sorted
        median = compute_median(data)
```

Since the `naive_solution` takes a lot of time to execute, we will no longer compare its runtime with the other solutions. We will focus only on the better solutions going forward.

As evident from the above comparison between our sorted list approach and the bisect approach, for much larger `N` the `bisect_solution` outperforms the one that just sorts the list.

To devise an even more efficient solution, it helps to reflect on what we actually need. Our primary goal is to keep continuous track of the median, not to keep the whole list sorted. It is therefore worth exploring a structure that always has the median at hand and still accommodates the insertion of new numbers cheaply.

The underlying concept involves efficiently storing the left (smaller values) and right (larger values) sides of the list. The median number will be tracked separately, enabling us to determine the appropriate placement of new numbers. To accomplish this, the heap data structure from the `heapq` module in the Python standard library has been used for its efficiency.

By leveraging the heap data structure, we can ensure efficient insertion and retrieval operations while maintaining the desired order of the values on the left and right sides of the list. This approach facilitates an optimised implementation of the median tracker.

Because `heapq` implements a min-heap, the right side (larger numbers) is easy to handle. Reading the smallest number larger than the current median takes \(O(1)\) time, and inserting a number into the heap takes \(O(log(N))\) time.

The left side (smaller numbers) is slightly more complex, because there we need the largest number smaller than the current median. To get it from a min-heap, we store the negation of each number (i.e., -x): the smallest stored value then corresponds to the largest original value.

This approach enables us to efficiently handle both sides of the median calculation, maximising performance by leveraging the strengths of heaps while considering the unique requirements of each side.
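The negation trick can be seen in isolation in the small sketch below. The names `larger` and `smaller` and the sample values are made up for illustration; only `heapq.heappush` from the standard library is used.

```python
import heapq

larger = []   # min-heap: larger[0] is the smallest value on the right side
smaller = []  # min-heap of negated values: -smaller[0] is the largest value on the left side

for x in (7, 9, 8):
    heapq.heappush(larger, x)
for x in (1, 4, 2):
    heapq.heappush(smaller, -x)  # store -x so the largest x sits at the top

smallest_right = larger[0]
largest_left = -smaller[0]
print(smallest_right, largest_left)  # prints: 7 4
```

Both lookups read index 0 of a list, which is why peeking at either neighbour of the median is a constant-time operation.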

To implement the median tracker, it is important to establish the requirements that it needs to fulfil. The tracker should be able to:

- Add a number to the tracker.
- Retrieve the median from the numbers added so far.

Given that we intend to store the median and maintain separate structures for the right and left sides of the list, the structure of the `MedianTracker` class would appear as follows.

```python
class MedianTracker:
    def __init__(self):
        self.r_set = []  # larger values
        self.l_set = []  # smaller values
        self.mid = None

    def add_number(self, n):
        ## to be implemented
        pass

    def calculate_median(self):
        ## to be implemented
        pass
```

Maintaining the invariant of the structure during the insertion process is crucial to ensure that we always keep track of the median of the numbers added thus far. This requirement can be simplified into two scenarios: when the current median is less than the incoming number, and when the current median is greater than the incoming number.

To consider a number as the median, we must have a balanced set of numbers on the left and right sides of the median. Therefore, while inserting numbers into the median tracker, it is essential to ensure a well-balanced number of items on both sides of the median. The following code presents an implementation that satisfies these conditions.

```python
import heapq

class MedianTracker:
    ## ...
    def add_number(self, n):
        if self.mid is None:
            self.mid = n
            return
        # check if the sides are balanced
        # if a number is greater than mid -> take the number and put
        # it to the right side and take the least of the right side
        # if a number is less or equal to mid -> take the number and put
        # it to the left side and take the largest of the left side
        # replace the mid with the largest or the smallest values
        # from either side
        d = len(self.l_set) - len(self.r_set)
        if d == 0:
            if n > self.mid:
                heapq.heappush(self.r_set, n)
            else:
                heapq.heappush(self.l_set, -n)
        elif d > 0:
            if n > self.mid:
                heapq.heappush(self.r_set, n)
            else:
                # push n first, then pop: n itself may be the largest value on
                # the left, in which case it has to become the new mid
                heapq.heappush(self.l_set, -n)
                heapq.heappush(self.r_set, self.mid)
                self.mid = -heapq.heappop(self.l_set)
        else:
            if n > self.mid:
                # same idea mirrored: n may be the smallest value on the right
                heapq.heappush(self.r_set, n)
                heapq.heappush(self.l_set, -self.mid)
                self.mid = heapq.heappop(self.r_set)
            else:
                heapq.heappush(self.l_set, -n)
    ## ...
```

The core idea behind the implementation is to consider the disparity between the left and right sides of the median, as well as the appropriate placement of new numbers. By analysing this difference, necessary adjustments can be made to maintain a balanced distribution on both sides.

When the difference between the left and right sides is zero, indicating a balanced state, the process becomes relatively straightforward. If the new number is smaller than the current median, it is added to the left side; if it is greater, it is added to the right side.

However, when the difference is non-zero, an additional operation is required. This operation involves transferring a number from the side with a greater number of elements to the side with fewer elements. This step ensures the continual balance between the two sides, guaranteeing the integrity of the structure.

By employing this approach, we can effectively manage the median tracker while preserving the equilibrium of the left and right sides throughout the insertion process.

Calculating the median itself is a relatively straightforward function to implement. First, we need to determine the total number of elements, which is the sum of the elements on the left side, the right side, and an additional 1 for the midpoint itself.

If the total number of elements is odd, we can simply return the current median as the output, as we have a single middle value. However, if the total number of elements is even, we need to take the average of the median and either the largest value on the smaller side or the smallest value on the larger side, depending on which side contains the greater number of elements.

```python
class MedianTracker:
    ## ...
    def calculate_median(self):
        if self.mid is None:
            return 0
        if (len(self.r_set) + len(self.l_set) + 1) % 2 == 1:
            return self.mid
        if len(self.r_set) > len(self.l_set):
            x = self.r_set[0]
            return (x + self.mid) / 2
        else:
            x = -self.l_set[0]
            return (x + self.mid) / 2
    ## ...
```

In terms of time complexity, the `MedianTracker` solution offers an improvement. Inserting a number into the `MedianTracker` takes at most \(O(log(N))\) time, since each insertion performs a constant number of heap operations to keep the two sides balanced. Calculating the median takes constant \(O(1)\) time, as it is read directly from the maintained structure.

Below is a graph comparing the `MedianTracker` solution with the `bisect` solution, the two most efficient approaches we have implemented thus far. For smaller values of N, the performance of the `MedianTracker` solution is comparable to that of the `bisect` solution. However, as N becomes larger, the `MedianTracker` solution starts to outperform the `bisect` solution significantly, which indicates that it scales better to larger datasets.

Among the three improved approaches for calculating and tracking the median of a growing list of numbers (the sorted list solution, the bisect solution, and the median tracker solution), the median tracker stands out as the most efficient. The sorted list solution re-sorts the list on every insertion, which gives it a worst-case bound of \(O(N^2log(N))\) over \(N\) insertions. The bisect solution improves upon it by finding each insertion point in \(O(log(N))\), although inserting into a Python list still has to shift elements, which costs \(O(N)\) per insertion in the worst case. The median tracker solution outperforms both by maintaining the balance of the left and right sides: with \(O(log(N))\) time for each insertion and \(O(1)\) for median calculation, it offers efficient real-time tracking of the median and scales better as the number of elements increases. The space complexity remains \(O(N)\) for all three solutions. Therefore, the median tracker solution emerges as the most efficient approach for solving the "Infinite Rolling Median Problem."

Source Code: https://github.com/THasthika/rolling-median

The figure below gives an overview of how the process works: the app first authenticates itself with Keycloak to retrieve a JWT, after which it can use that token to access Jitsi.

JWTs can use two methods to prove their authenticity. The first is a shared secret known by every system that needs to verify the tokens. This is the easier approach, but it is a bit tedious since we have to keep the secret in sync across the applications. The other approach is to use a private and public key pair, which Keycloak also supports. Here Keycloak signs the JWT using its private key: it takes the JWT header and JWT payload and signs them with the private key, as shown in the diagram below.

To verify the authenticity of a jwt token we take the public key, the received jwt header and jwt payload and verify it with the signature of the jwt.
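To make the header, payload and signature structure concrete, here is a minimal sketch of the simpler shared-secret method (HS256), chosen because it needs only the Python standard library. It is not what Keycloak or Jitsi run internally; the public-key method verifies the same `header.payload` text against an RSA signature instead of an HMAC. The function names, the secret and the payload are all made up for illustration.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    # The signature covers "<base64url(header)>.<base64url(payload)>".
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify_hs256(token: str, secret: bytes) -> bool:
    if token.count(".") != 2:
        return False
    signing_input, signature = token.rsplit(".", 1)
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    # Constant-time comparison of the recomputed and the received signature.
    return hmac.compare_digest(b64url(expected).encode("utf-8"), signature.encode("utf-8"))

token = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"room": "*"}, b"demo-secret")
print(verify_hs256(token, b"demo-secret"))   # True
print(verify_hs256(token, b"other-secret"))  # False
```

A real verifier would also check claims such as the expiry and the issuer; this sketch covers only the signature step.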

In the Keycloak admin panel, navigate to the client that you are using for the app. Head over to the Mappers tab, which shows the custom and built-in claims that have been set up. Click the Create button on the top right to create the custom mappers we need.

Jitsi needs two claims inside the JWT: `room` and `nbf`. The `room` claim is used to check whether the user has access to a certain room. The `nbf` (not before) claim can be set to a constant 0 (zero).

For the room claim, you can take the value from the user properties, where it can be controlled using the admin APIs available to you. For this example, however, I'll be using the hardcoded value `*`, which tells Jitsi that the token has access to every room.

After creating these mappers, your client should have its mappers section populated with the two created mappers. If you check the jwt token provided to your application using a tool like https://jwt.io/ you can verify that the claims are actually there.

In Jitsi, we need to change the `.env` file and put the following configs into it.

```
# Enable authentication
ENABLE_AUTH=1

# Enable guest access
ENABLE_GUESTS=0

# Select authentication type: internal, jwt or ldap
AUTH_TYPE=jwt

# JWT authentication
#
# Application identifier
JWT_APP_ID=jitsi
JWT_ASAP_KEYSERVER=http://localhost:9000/certs

# (Optional) Set asap_accepted_issuers as a comma separated list
JWT_ACCEPTED_ISSUERS=http://localhost:8080/auth/realms/test

# (Optional) Set asap_accepted_audiences as a comma separated list
# JWT_ACCEPTED_AUDIENCES=my_server1,my_server2

JWT_AUTH_TYPE=token
JWT_TOKEN_AUTH_MODULE=token_verification
```

`JWT_ACCEPTED_ISSUERS` is a field where you can specify the accepted `iss` (issuer) values of the JWT, so that only tokens from the issuers listed in the configuration are permitted to access the rooms. Likewise, we can use `JWT_ACCEPTED_AUDIENCES` to restrict access based on the `aud` property of the JWT.

`JWT_ASAP_KEYSERVER` is the property that tells Jitsi where to get the public key used to verify the JWT. For this you have to create a service that serves the public key in `.pem` file format.

The Jitsi server requests the public key by using the `kid` property in the JWT header.

The `kid` of a JWT remains the same as long as the same private/public key pair is used for signing the token. The Jitsi server takes the `sha256` hash of the `kid` property and uses it to request the key from the key server, which in this case is `http://localhost:9000/certs`.

In this case `sha256("1DKOk8q4Dc9BSgDLmksFemg5lEGuoYQvYrHVOnXNj3k")` is `"e15452c2c03fc8afdb1d558953ab30ffd235a622fad1175de7f791a3c86eb08d"`.

Therefore, the Jitsi server requests the public key by sending a `GET` request to `http://localhost:9000/certs/e15452c2c03fc8afdb1d558953ab30ffd235a622fad1175de7f791a3c86eb08d.pem`.
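The lookup described above can be sketched in a few lines of Python. The helper name `key_url` is hypothetical, and the sketch assumes the `kid` is hashed as UTF-8 text and the digest is written as lowercase hex, which is what the example values suggest.

```python
import hashlib

def key_url(keyserver: str, kid: str) -> str:
    """Build the URL at which the key server is expected to serve the key for `kid`."""
    digest = hashlib.sha256(kid.encode("utf-8")).hexdigest()
    return f"{keyserver.rstrip('/')}/{digest}.pem"

url = key_url("http://localhost:9000/certs", "1DKOk8q4Dc9BSgDLmksFemg5lEGuoYQvYrHVOnXNj3k")
print(url)
```

A custom key server can run the same computation over every `kid` it knows about to decide which public key a request refers to.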

So the goal of our custom key server will be to take this request and produce the relevant public key.

So, how do you get the public key for a specific `kid` value in Keycloak? To do that we first need to head over to the Keycloak admin panel and, in the General tab of the realm, click on OpenID Endpoint Configuration in the Endpoints section.

You are then presented with a JSON document describing the endpoints for that realm. Take the `jwks_uri` from it and open that URL; it might look something like `https://localhost:8080/auth/realms/test/protocol/openid-connect/certs`. This also returns JSON, with a `keys` array that lists all the public keys used by that realm. As you can see, the key with the `kid` that we were looking for, `1DKOk8q4Dc9BSgDLmksFemg5lEGuoYQvYrHVOnXNj3k`, is also there.

For each key, `n` (the modulus) and `e` (the exponent) are listed in the JSON response from Keycloak. We need to take these two values and produce the public key. The code below gives an example of how you can produce the public key in Java; you can search around to find an equivalent in your language.

```java
package com.tharinduhasthika.certs;

import java.math.BigInteger;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.spec.RSAPublicKeySpec;
import java.util.Base64;

class CertService {

    private String createPublicKey(String b64Modulus, String b64Exponent) throws Exception {
        // n and e arrive base64url-encoded. The original snippet used Base64URL and
        // Base64 helper classes without showing their imports; the JDK's
        // java.util.Base64 is used here instead.
        BigInteger modulus = new BigInteger(1, Base64.getUrlDecoder().decode(b64Modulus));
        BigInteger exponent = new BigInteger(1, Base64.getUrlDecoder().decode(b64Exponent));
        RSAPublicKeySpec spec = new RSAPublicKeySpec(modulus, exponent);
        KeyFactory factory = KeyFactory.getInstance("RSA");
        PublicKey pub = factory.generatePublic(spec);
        return Base64.getEncoder().encodeToString(pub.getEncoded());
    }

    private String makeCertFile(String b64Modulus, String b64Exponent, String kidHash) throws Exception {
        String publicKey = createPublicKey(b64Modulus, b64Exponent);
        String certBody = "-----BEGIN PUBLIC KEY-----\n" + publicKey + "\n" + "-----END PUBLIC KEY-----";
        return certBody;
    }
}
```
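As a small illustration of the first step in another language, the sketch below decodes a base64url value from a JWKS entry into an integer using only the Python standard library. The helper name `b64url_to_int` is made up; `"AQAB"` is the encoding of the common RSA exponent 65537. Turning the resulting modulus and exponent into a PEM file would need a cryptography library, which is not shown here.

```python
import base64

def b64url_to_int(value: str) -> int:
    """Decode an unpadded base64url string into a big-endian unsigned integer."""
    padded = value + "=" * (-len(value) % 4)  # restore the stripped padding
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")

exponent = b64url_to_int("AQAB")
print(exponent)  # 65537
```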

After generating the public key you have to set the content type of the response to `Content-Type: application/x-x509-ca-cert; charset=utf-8`.

That's all that needs to be done. I hope you got a basic idea of how to set up the integration. If anything needs clarification or could be improved, feel free to let me know. I have listed below some references that I used to learn how to do it myself.

References

PyTorch Lightning is a framework built on top of the PyTorch deep learning framework for ease of use; think of it as a Keras-like API for PyTorch. I have planned to write this series of articles from my own experience in using it for my research purposes. These articles assume that you have a good grasp of Deep Learning and PyTorch.

To install PyTorch Lightning use `pip install pytorch-lightning`.

First, we will go over some of the important concepts in PyTorch Lightning so that it would be easier to work with them later. The PyTorch Lightning framework has been able to capture most of the requirements of people who are creating deep learning models. At the end of this article, we will be going through a mock dataset to show the full framework in action.

A model is the neural network that we train to learn some particular task. For that, we have `pytorch_lightning.LightningModule`, which is similar to the PyTorch module, `nn.Module`.

The scaffold for a basic model is as follows.

```python
import torch
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        ## the forward pass
        pass

    def configure_optimizers(self):
        ## configure the optimizer that is used by the model
        # optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
        # return optimizer
        pass

    def training_step(self, batch, batch_idx):
        ## the training step
        pass

    def validation_step(self, batch, batch_idx):
        ## the validation step
        pass

    def test_step(self, batch, batch_idx):
        ## the test step
        pass
```

The `forward` method is similar to the one in PyTorch: it is called whenever input needs to be fed into the network for a forward pass.

Likewise, the methods `training_step`, `validation_step` and `test_step` are called when the model is in the `training`, `validation` and `test` phases respectively.

To load data into the model we have to create a class that extends the PyTorch `Dataset` class. Even though the PyTorch Lightning framework has its own `LightningDataModule` class, it in turn depends on the PyTorch `Dataset` class.

Below is a way to handle data pipelining in PyTorch Lightning.

```python
from torch.utils.data import Dataset
import pandas as pd

class MyDataset(Dataset):
    def __init__(self, dataset_type="train"):
        if dataset_type == "train":
            ## load the train dataset
            self.df = ...
        elif dataset_type == "validation":
            self.df = ...
        elif dataset_type == "test":
            self.df = ...

    def get_features(self, index):
        ## extract the needed features
        X = ...
        return X

    def get_label(self, index):
        ## extract the needed label data
        y = ...
        return y

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        X = self.get_features(index)
        y = self.get_label(index)
        return (X, y)
```

The `__getitem__` method is important here because it lets us mould the data however we want it to be presented to the model.

Here, you can load the data any way you like but the flow would be similar. In this case, we have used `dataset_type` to differentiate between the types of data that we need, but you can use whichever method is best for your particular need.

To load the data efficiently, PyTorch has the `DataLoader` class, which loads the data in batches and can use multiple worker processes to speed up the process.

```python
from torch.utils.data import DataLoader

dataset = MyDataset()
dataloader = DataLoader(
    dataset,
    batch_size=32,  # number of samples to load at a time
    num_workers=4   # number of worker subprocesses used for loading
)
```

The `Trainer`, as the name implies, is the class responsible for the training and evaluation of the models that you create. It has a myriad of options that you can go through in the official documentation. For this article, we will go through a subset of these options that are critical for operating with it.

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

# create the model
model = MyModel()

# create the datasets
train_dataset = MyDataset(dataset_type="train")
validation_dataset = MyDataset(dataset_type="validation")

# create the dataloaders
train_dataloader = DataLoader(train_dataset, batch_size=32, num_workers=4)
validation_dataloader = DataLoader(validation_dataset, batch_size=32, num_workers=4)

# create the trainer
trainer = pl.Trainer(
    gpus=1,        # number of gpus to use, -1 to use all
    max_epochs=10  # maximum number of epochs the trainer will execute
)

# argument names follow the Lightning release this article was written for;
# later releases renamed train_dataloader to train_dataloaders
trainer.fit(
    model,
    train_dataloader=train_dataloader,
    val_dataloaders=validation_dataloader
)
```

- `gpus` - Specifies how many GPUs to use for training; by default it uses none.
- `max_epochs` - Specifies the maximum number of epochs (how many times the dataset is shown to the model).

`fit` Method Arguments

- `train_dataloader` - Specifies the data loader which is used by the trainer for training.
- `val_dataloaders` - This can be either a list of data loaders or a single data loader, which is then used by the trainer to evaluate the model.

To make use of what we have gone through, we will build a simple model that can identify data points belonging to 4 classes. These data points will be created using `sklearn`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

classes = 4
n_samples = 1000

(X, y) = make_blobs(n_samples=n_samples, n_features=2, centers=classes,
                    cluster_std=2.5, center_box=(-10, 10), random_state=42)

## Splitting the datasets
(X_train, X_test, y_train, y_test) = train_test_split(X, y, test_size=0.2, random_state=42)
(X_train, X_validation, y_train, y_validation) = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

colors = ['red', 'green', 'blue', 'black', 'purple']
cdict = dict(map(lambda x: (x, colors[x]), range(0, classes)))

# scatter plot with one colour per class
fig, ax = plt.subplots()
for g in np.unique(y):
    ix = np.where(y == g)
    ax.scatter(X[ix, 0], X[ix, 1], c=cdict[g], label=g)
ax.legend()
plt.show()
```

Below is the clustering of points that we are trying to fit a model to.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyCustomDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.count = X.shape[0]

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        X = self.X[index]
        y = self.y[index]
        return (torch.tensor(X, dtype=torch.float32), y)

ds_train = MyCustomDataset(X_train, y_train)
ds_validation = MyCustomDataset(X_validation, y_validation)
ds_test = MyCustomDataset(X_test, y_test)

dl_train = DataLoader(ds_train, batch_size=16, num_workers=2)
dl_validation = DataLoader(ds_validation, batch_size=16, num_workers=2)
dl_test = DataLoader(ds_test, batch_size=16, num_workers=2)
```

```python
## create the model class
import pytorch_lightning as pl
import torch
from torch import nn
from torch.nn import functional as F

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        ## make the model
        self.classifier = nn.Sequential(
            nn.Linear(in_features=2, out_features=4),
            nn.ReLU(),
            nn.Linear(in_features=4, out_features=4)
        )
        ## use cross entropy loss for categorical problems
        self.loss = F.cross_entropy

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.01)
        return optimizer

    def forward(self, x):
        x = self.classifier(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_logit = self(x)
        loss = self.loss(y_logit, y)
        pred = F.softmax(y_logit, dim=1)
        self.log('train/loss', loss, prog_bar=True, on_step=False, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_logit = self(x)
        loss = self.loss(y_logit, y)
        pred = F.softmax(y_logit, dim=1)
        self.log("val/loss", loss, prog_bar=True)

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_logit = self(x)
        loss = self.loss(y_logit, y)
        pred = F.softmax(y_logit, dim=1)
        self.log("test/loss", loss)
```

```python
trainer = pl.Trainer(
    max_epochs=10
)
model = MyModel()
trainer.fit(
    model,
    train_dataloader=dl_train,
    val_dataloaders=dl_validation
)
```

With the above code we execute the `training_step` and `validation_step` of the model to train and also validate the model.

After training the model, we can use the test set to check the model performance with unseen data.

```python
trainer.test(
    model,
    test_dataloaders=dl_test
)
```

The output of training and testing of the model is as follows.

We will manually save the model for now, but the `Trainer` has more advanced options that allow us to automate the saving of models. You can check the documentation for more details.

`trainer.save_checkpoint("example.ckpt")`

In this article, we have gone through each of the main steps that are necessary for using the PyTorch Lightning framework. The framework is a wonderful addition on top of the PyTorch framework. I will be posting more topics regarding PyTorch Lightning and Deep Learning in general.

In a coding competition there was a problem, which was essentially to find,

$${Fib}(n) \ {mod} \ 10 \ {for} \ n \le 10^9$$

As you may know, Fibonacci numbers increase very rapidly. Given below is the \(100^{th}\) Fibonacci number.

$${Fib}({100}) = {354224848179261915075}$$

So you can see that for an input of about \(10^9\) the result would be impossible to represent in a 64-bit integer, and computing it would also take a very long time.

To bypass this bottleneck I needed a way to compute \({Fib}(n) \ mod \ m \ \forall(m\gt1)\) without having to compute very large Fibonacci numbers. To do that, we first need to get to know *The Pisano Period*.

The Pisano Period, \(\pi(m)\), is defined as the period with which the Fibonacci sequence modulo \(m\) repeats.

For example, let's take \(m = 2\).

| \(n\) | \(Fib(n)\) | \(Fib(n) \ mod \ 2\) |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 1 |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
| 4 | 5 | 1 |
| 5 | 8 | 0 |
| 6 | 13 | 1 |
| 7 | 21 | 1 |
| 8 | 34 | 0 |

As you can see there is a pattern of period 3.

So essentially, for a given number \(m\) there is a number \(\pi(m)\), which is the period at which the Fibonacci sequence modulo \(m\) repeats itself.

If you need to know more about the Pisano Period, feel free to google it!

So let's code the `get_pisano_period` function.

```python
def get_pisano_period(m):
    a = 0
    b = 1
    for i in range(0, m * m):
        c = (a + b) % m
        a = b
        b = c
        # the sequence restarts once the pair (0, 1) shows up again
        if a == 0 and b == 1:
            return i + 1
```

So how does the Pisano Period help us find a Fibonacci number modulo \(m\)? Let me demonstrate.

Let \(Fib(n) \ mod \ m\) be the value we want to find.

Let \(p = \pi(m)\), where \(\pi(x)\) is the Pisano Period of \(x\).

Then we know that \(Fib(x) \ mod \ m\) repeats after every \(p\) iterations.

We can express \(n\) as \(n = a*p + b \ (a,b \in \mathbb{N}, \ 0 \le b \lt p)\), so that \(Fib(n) \ mod \ m = Fib(b) \ mod \ m\). So we only need to compute \(Fib(b) \ mod \ m\).
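As a worked instance of this reduction, take the competition case \(m = 10\) and \(n = 10^9\), and assume the known value \(\pi(10) = 60\) (it is not derived in this article):

$$10^9 = 16666666 * 60 + 40 \ \Rightarrow \ {Fib}(10^9) \ {mod} \ 10 = {Fib}(40) \ {mod} \ 10$$

So instead of a billion iterations, only 40 are needed.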

The Python code for that is as follows,

```python
def fib(n):
    if n == 0:
        return 1
    n1 = 0
    n2 = 1
    for i in range(0, n):
        t = n1 + n2
        n1 = n2
        n2 = t
    return n2

def fibmod(n, m):
    p = get_pisano_period(m)
    b = n % p
    return fib(b) % m  # use the reduced index b, not n
```
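One way to gain confidence in the reduction is to compare it against a direct computation for small inputs. The sketch below is self-contained and uses hypothetical function names; note that it assumes the standard indexing \(Fib(0) = 0, Fib(1) = 1\), which is shifted by one from the table above, and that it reduces modulo \(m\) at every step so no large integers are ever built.

```python
def pisano_period(m):
    """Length of the repeating cycle of the Fibonacci sequence modulo m (m > 1)."""
    a, b = 0, 1
    for i in range(m * m):
        a, b = b, (a + b) % m
        if a == 0 and b == 1:
            return i + 1

def fib_mod_direct(n, m):
    """Fib(n) mod m by plain iteration, with Fib(0) = 0 and Fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % m
    return a

def fib_mod_pisano(n, m):
    """Fib(n) mod m, iterating only n mod pisano_period(m) times."""
    return fib_mod_direct(n % pisano_period(m), m)

# Cross-check the two versions on small inputs.
for m in (2, 3, 10):
    assert all(fib_mod_direct(n, m) == fib_mod_pisano(n, m) for n in range(200))

print(pisano_period(2), pisano_period(10))  # 3 60
```

The period does not depend on where the sequence is taken to start, so the same reduction holds for the indexing used in the rest of this article.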

I hope you understood this article. If you have any doubts, or you can think of a quicker way to find Fibonacci numbers modulo \(m\), please feel free to comment!

First we'll go through these 3 functions one by one and try to visualize what happens inside of them.

The `filter` function acts as a gatekeeper which filters the elements of the input list or array using the function that is given to it.

From the above image it is clear that only the elements for which the function f(x) returns true are allowed into the resulting array.

- for loop

```javascript
const arr = [ 6, 8, 12, 4, 23, 1 ];
const resultArray = [];
for (var i = 0; i < arr.length; i++) {
  if (arr[i] < 10) {
    resultArray.push(arr[i]);
  }
}
// resultArray <- [6, 8, 4, 1]
```

- filter function

```javascript
const arr = [ 6, 8, 12, 4, 23, 1 ];
const resultArray = arr.filter(function (num) {
  return num < 10;
});
// resultArray <- [6, 8, 4, 1]
```

As the name implies, `map` maps stuff, mainly arrays or lists. If you remember from mathematics, a function f(x) maps one set to another set, called the domain and the co-domain.

As you can see from the above image, map is sort of a sliding function that goes over an array and creates a new array according to the function that you provided.

- for loop

```javascript
const arr = [ 6, 8, 12, 4, 23, 1 ];
const resultArray = [];
for (var i = 0; i < arr.length; i++) {
  resultArray.push(2 * arr[i]);
}
// resultArray <- [12, 16, 24, 8, 46, 2]
```

- map function

```javascript
const arr = [ 6, 8, 12, 4, 23, 1 ];
const resultArray = arr.map(function (num) {
  return 2 * num;
});
// resultArray <- [12, 16, 24, 8, 46, 2]
```

The `reduce` function is a bit tricky to understand at first, but once you get it, it's really not that complicated. To understand reduce, you have to think of accumulation or aggregation. The image below will help with the visualization of the reduce function.

For this example I've chosen a simple task: calculating the total of the elements of an array. But you can use it for all sorts of stuff. Below I'll show the code for doing this task using the for loop as well as using the reduce function.

```javascript
const arr = [ 6, 8, 12, 4, 23, 1 ];
let total = 0;
for (var i = 0; i < arr.length; i++) {
  total += arr[i];
}
// total <- 54
```

```javascript
const arr = [ 6, 8, 12, 4, 23, 1 ];
const total = arr.reduce(function (total, currentValue) {
  return total + currentValue;
}, 0); // <- 0 is important, it is the initial value of total
// total <- 54
```

As you can see, the reduce function reduces the amount of code and also makes the code cleaner.

Let's say that we have an array of objects, something like this:

```javascript
const people = [
  { name: 'John', age: 32, occupation: 'Manager' },
  { name: 'Chris', age: 24, occupation: 'Programmer' },
  { name: 'Will', age: 14, occupation: 'Student' },
  // ...
];

// get names as an array
const names = people.map((p) => { // short form of a function
  return p.name;
});
// names <- ['John', 'Chris', 'Will', ...]

// get people with occupation = 'Programmer'
// (the comparison is case-sensitive, so it must match the data exactly)
const programmers = people.filter((p) => {
  return p.occupation === 'Programmer';
});
// programmers <- [{ name: 'Chris', age: 24, occupation: 'Programmer' }, ...]

// get names of people of age >= 18
const adultNames = people.filter((p) => {
  return p.age >= 18;
}).map((p) => {
  return p.name;
});
// adultNames <- ['John', 'Chris', ...]
```

As you can see, the map, reduce and filter functions are really powerful when they are chained to create a sort of pipeline for your data. I hope this article helped to clear up some things about these three functions.

Happy Coding!

It used to be that if you needed to share code among a group working on the same software, you needed a floppy disk to copy the code from one computer to another. After the internet was invented, and email with it, programmers began to share code by email. As sending the whole project was a waste of time and bandwidth, some decided to use a patch system that sent only the difference between the old and the new version of the code. Following this idea some great tools were developed, like Mercurial and Apache Subversion. These tools, known as Version Control Systems, were and still are used in the industry.

Git is also a version control system. It was originally created for the Linux kernel, because the tools available at the time did not satisfy the needs of its developers. Since its creation it has been adopted by almost all of the software industry, so knowing git will be a plus in your skill set if you are planning to work in any position in the software industry.

There are two flavors of the git tool.

- CLI (Command Line Interface)
- GUI (Graphical User Interface), only for Windows and Mac

In this article I'll be going through the CLI tool of git. It can be installed from here. So if you would like to follow this guide hands-on, please install git on your system.

You'll have to configure your name and email address before using git.

```shell
git config --global user.name [name]
git config --global user.email [email]
```

To use git in your project first cd into your project folder

`cd [project-folder]`

Then type

`git init`

This will create a subfolder named ".git" in your project folder. You don't need to worry about it; just don't delete it, because it is how git manages your project.

The git repository (project) that you created is essentially empty for now, because git doesn't automatically add files to it. You have to add the files that need to be committed yourself. To do that, type

`git add [file-name-list]`

Or

`git add -A`

The `-A` flag adds all the files in your project folder.

A commit is a snapshot of your project at a specific point in time, which git labels with a hash value and stores in a safe place. If unexpected bugs appear later, you can come back to that specific point in time. Now that you have an idea why commits are important, let's see how to create them.

Now that you have added the files using `git add`, they are in the staging area; you can still remove files that you don't need in the commit. Let's create a commit:

`git commit -m "an insightful message"`

The `-m` flag adds a message alongside the commit so that the commit can be identified easily.

The easiest way for me to think of git branches is as parallel universes: you take the exact code and split it off into another space. You can change the code in one branch without affecting the other branches. This is very useful when every developer is given a small part of the project and can develop that part in their own branch.

The branch mechanism is useless without a way to merge branches back into a single branch once a branch has finished its development. For that there is `git merge`. I'll come to that later; I just wanted you to see the whole branch and merge mechanism and how it can be used.

So to create a new branch

`git branch [branch-name]`

To switch between branches

`git checkout [branch-name]`

You can create and switch to a new branch with this one liner

`git checkout -b [branch-name]`

Usually the main branch is named `main`.

After switching to a branch you can follow the add commit cycle for your development.

As I said earlier, merge is used to combine two branches, or more precisely to merge one branch into another. Suppose you want to merge `branch01` into `main`.

First make sure that `branch01` is based on the latest `main` commit. If it is not, git cannot simply fast-forward `main` and will create a merge commit instead; to keep the history linear you can `rebase` onto the latest `main` commit first. If someone else changed the same places in the code that you changed, git will report a merge conflict and you'll have to resolve it yourself. I'll explain conflicts later, so for now assume that didn't happen.

To rebase to the latest main do the following

```shell
git checkout branch01
git rebase main
```

Then switch to the main branch

`git checkout main`

Then merge `branch01` into it

`git merge branch01`

If changes were made to the same places that you changed after you started working on that part, git will generate conflicts and it'll be your job to resolve them. There are tools that help with this; to open one, type

`git mergetool`

Or if you happen to be using some kind of an IDE or a good editor it'll show you the places where the conflicts took place. To resolve them you just need to remove the special markers that git places between the conflict areas and also decide on which code to keep in that commit.

If you need to see the history of the commits of the repository you can use

`git log`

This lists the commits along with the messages that were set for them.

To use git effectively you need a remote location that stores your project so that multiple people can access it. Several sites provide this service, such as GitHub and GitLab. To use them you'll need to create an account on one of those sites and create a remote repository for your local repository.

After you have created your remote repository you'll get a link to the repo that ends in .git, for instance `https://github.com/torvalds/linux.git`. You can use that link to set the remote in your local repo.

`git remote add origin [remote-link]`

To push your code to the remote you can use

`git push origin main`

Here `origin` is the remote name that you set when you added the remote to your local repository, and `main` is the name of the remote branch.

If you want to be able to push with just `git push`, you have to set the upstream remote and branch permanently, either on the first push with

`git push -u origin main`

or, at any later time while on your local `main` branch, with

`git branch -u origin/main`

To pull changes from the remote you can use

`git pull origin main`

Or, if you have already set the default remote branch, simply

`git pull`

A pull may also create merge conflicts that you will need to resolve.

The commands that I have explained are the most used ones, but they are not the only commands in git. There are many more in the git documentation, which you can now look through more comfortably since you have the gist of what git does.

There are many ways of checking if a number is prime; below are some of them.

This is the simplest and most straightforward way of checking primality.

```python
def is_prime(n):
    # prime numbers >= 2
    if n <= 1:
        return False
    # loop i from 2 to n - 1
    for i in range(2, n):
        # if n is divisible by i and the remainder is 0
        # then it is surely not prime
        if n % i == 0:
            return False
    # if a factor was not found it is prime!
    return True
```

The Running Time: \(O(n)\)

This isn't that bad for small integers, but if the number is very large, like ten or a hundred million, it slows down.

So what can we do to reduce our running time?

If you remember your maths classes, you may have noticed that every factor has a partner factor, and the two multiplied together give the number in question. More formally,

if \(a\) is a factor of \(n\)

then there is some \(b\) such that \(a*b=n\)

So let's see some examples and figure out what can be done to make the algorithm faster.

For example, let's take 20:

- \(1 * 20 = 20\)
- \(2 * 10 = 20\)
- \(4 * 5 = 20\)
- \(5 * 4 = 20\)
- \(10 * 2 = 20\)
- \(20 * 1 = 20\)

As you can see, every pair of factors appears twice, so finding the second factor of a pair in a separate step is unnecessary. The smaller factor of each pair is always less than or equal to \(\sqrt{n}\), so we only need to check the numbers up to \(\sqrt{n}\).
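The pairing idea can be sketched in a few lines. This is only an illustration of the reasoning, not one of the primality functions in this article, and the helper name `factor_pairs` is made up for the example.

```python
def factor_pairs(n):
    """Return each factor pair (a, b) with a * b == n and a <= b."""
    pairs = []
    a = 1
    # the smaller factor of a pair never exceeds sqrt(n), so stop at a * a > n
    while a * a <= n:
        if n % a == 0:
            # n // a is the partner factor, found without a separate search
            pairs.append((a, n // a))
        a += 1
    return pairs


print(factor_pairs(20))  # [(1, 20), (2, 10), (4, 5)]
print(factor_pairs(36))  # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]
```

Each pair from the list for 20 shows up exactly once, even though the loop only runs up to 4.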

With that intuition we can write a new function that is much faster for larger numbers. While coding, remember that computing the `sqrt` of a number is more costly than a multiplication, so we can use the loop condition \(i*i \leq n\) instead.

```python
def is_prime(n):
    # prime numbers >= 2
    if n <= 1:
        return False
    # loop i from 2 to sqrt(n)
    i = 2
    while i * i <= n:
        # if n is divisible by i and the remainder is 0
        # then it is surely not prime
        if n % i == 0:
            return False
        i += 1
    # if a factor was not found it is prime!
    return True
```
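If you would rather compute the bound once instead of multiplying on every iteration, one possible alternative is the integer square root from the standard library. The sketch below assumes Python 3.8 or newer, where `math.isqrt` is available; the name `is_prime_isqrt` is only for illustration, and this is a variant, not a replacement for the function above.

```python
import math


def is_prime_isqrt(n):
    # prime numbers >= 2
    if n <= 1:
        return False
    # integer square root, computed a single time (exact, no floating point)
    limit = math.isqrt(n)
    for i in range(2, limit + 1):
        if n % i == 0:
            return False
    return True


print([x for x in range(30) if is_prime_isqrt(x)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```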

There are some other optimizations that can be applied to the function to reduce the running time even further.

Check whether \(n\) is divisible by \(2\); if it is and \(n \ne 2\), then it is not prime.

If a number is not divisible by \(2\) then we can be sure that it will not be divisible by any other even number. So we can cut down the iteration count in half!

With those optimizations the final function is as follows.

```python
def is_prime(n):
    # prime numbers >= 2
    if n <= 1:
        return False
    if n == 2:
        return True
    # reject all even numbers except for 2
    if n % 2 == 0:
        return False
    # loop i from 3 to sqrt(n)
    i = 3
    while i * i <= n:
        # if n is divisible by i and the remainder is 0
        # then it is surely not prime
        if n % i == 0:
            return False
        # loop over odd numbers only
        i += 2
    # if a factor was not found it is prime!
    return True
```
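To get a feel for how much work each version saves, here is a rough sketch that counts the trial divisions each approach performs. The three counting helpers are hypothetical and only mirror the loops of the three functions in this article; they assume \(n > 2\).

```python
def divisions_naive(n):
    # mirrors the first version: try every i from 2 to n - 1
    count = 0
    for i in range(2, n):
        count += 1
        if n % i == 0:
            break
    return count


def divisions_sqrt(n):
    # mirrors the second version: try every i while i * i <= n
    count = 0
    i = 2
    while i * i <= n:
        count += 1
        if n % i == 0:
            break
        i += 1
    return count


def divisions_odd_only(n):
    # mirrors the final version: one division by 2, then odd i only
    count = 1
    if n % 2 == 0:
        return count
    i = 3
    while i * i <= n:
        count += 1
        if n % i == 0:
            break
        i += 2
    return count


# 97 is prime, so every loop runs to completion (the worst case)
print(divisions_naive(97), divisions_sqrt(97), divisions_odd_only(97))  # 95 8 5
```

For a prime input the naive loop grows linearly with \(n\), while the other two grow with \(\sqrt{n}\), and the odd-only version does roughly half the work of the second.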

Hope you find it useful!

As some of you know, C has no abstract classes and the like, so how do we make C change the type of data used in different scenarios? This is where `void` pointers, `malloc` and `free` come in handy. Given below is a way to allocate an arbitrary amount of memory and keep it to store whatever data you want.

```c
void *data_pointer;
data_pointer = malloc(size_of_data);
```

So `data_pointer` keeps the location of the newly allocated data on the heap.

- when a node is inserted into the list, the data is copied (the list does not keep a pointer to the caller's data)
- when a node is removed from the list, the data is copied to a location that the caller of the function provides

To implement a linked list, we first need to define some structures in C. The structures are `ListNode` and `List`; the `List` is the structure in charge of all the `ListNode` structures that belong to it. First we'll code the structure of the `ListNode`, which is easy.

```c
typedef struct _ListNode {
    struct _ListNode *next;
    void *data;
} ListNode;
```

The tricky part is the `List` data structure. The requirements for that data type are as follows:

- know the size of the data type that we want to work with.
- know how to safely discard the data after it has been removed from the list.

With the above requirements in mind, let's write the code for the `List`.

```c
typedef struct {
    ListNode *head;              // pointer to the first node in the linked list
    size_t item_size;            // the size of an element
    unsigned int count;          // not really necessary
    void (*destroy)(void *data); // a function pointer to give customized ways to delete the data inside the node
} List;
```

The `unsigned int count` member is just an optimization that can be applied to any linked list implementation to improve performance when we need to find the size of the list.

As for `void (*destroy)(void *data)`, don't be scared if you have not seen a pointer to a function before; it's just a complicated way of stating the type of the function that we want to point to. The syntax of a pointer to a function is as follows:

```c
// return_type (*pointer_name)(arg1_type arg1, arg2_type arg2, ...);

// example
int foo(int num1, int num2) {
    return num1 + num2;
}

void bar() {
    int (*add_function)(int a, int b);
    add_function = &foo;
    int n = add_function(1, 2);
    n = (*add_function)(1, 2); // same as the above
}
```

I hope the above gave a simple idea of how pointers to functions work. If you still don't understand them, I would suggest reading about them on the web and trying one or two examples.

So, back to the implementation. We now have to implement the actual workings of the linked list. In this article I'll only be showing you how to

- initialize the list
- insert node after a specific node
- remove node after a specific node
- cleanup the list

After implementing the above features all the other features should be really easy to implement, and also you could add in other features which you find important to a linked list.

To initialize the list we'll be implementing a function named `list_initialize`, creative right? Let's get right to it.

```c
void list_initialize(List *list, size_t item_size, void (*destroy)(void *data)) {
    list->head = NULL; // set the head to NULL
    list->item_size = item_size;
    list->destroy = destroy;
    list->count = 0;
}
```

The initialization needs little explanation, but briefly:

- set the `head` of the list to `NULL`, because there are currently no items in the list
- set `item_size` to the size that the caller of the function provides
- set the `destroy` pointer to the function that the caller provides; this can also be `NULL`
- set the count to 0

To insert a node after a specific node we'll be implementing a function named `list_insert_after`. Let's code this thing!

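```c
#include <stdlib.h> // malloc, free
#include <string.h> // memcpy

void list_insert_after(List *list, ListNode *node, void *data) {
    ListNode *new_node = (ListNode *) malloc(sizeof(ListNode));
    new_node->next = NULL;

    if (node == NULL) {
        // no node given: insert at the head of the list
        if (list->head != NULL) {
            new_node->next = list->head;
        }
        list->head = new_node;
    } else {
        // link the new node in right after the given node
        new_node->next = node->next;
        node->next = new_node;
    }

    // copy the caller's data into freshly allocated heap memory
    new_node->data = malloc(list->item_size);
    memcpy(new_node->data, data, list->item_size);
    list->count++;
}
```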

As you can see, the `malloc` and `memcpy` calls at the end of the above function are where we copy the needed data into another memory location on the heap. The rest of the code is the same as for any singly linked list.

To remove a node from the list we need a pointer to the node before the one that we want to remove; then the function `list_remove` can be used. If we want to remove the head, we set `prev_node` to NULL. Let's see its implementation.

```c
void list_remove(List *list, ListNode *prev_node, void *data) {
    ListNode *node;

    if (prev_node == NULL) {
        node = list->head;
        if (node != NULL) list->head = node->next;
    } else {
        node = prev_node->next;
        if (node != NULL) prev_node->next = node->next;
    }

    if (node == NULL) return;

    memcpy(data, node->data, list->item_size);
    if (list->destroy != NULL) {
        (list->destroy)(node->data);
    }
    free(node->data);
    free(node);
    node = NULL;
    list->count--;
}
```

As you can see, after the boring link changing that has to be done in order to remove a node, we copy the content of the data into a caller-specified location. Then we call the destroy function (if there is one) and free up the allocated memory. The count also has to be decremented.

To cleanup the list we have to start from the head and remove all the linked nodes, then reset the list to the initial state. The code is as follows.

```c
void list_cleanup(List *list) {
    ListNode *prev_node, *node;

    prev_node = list->head;
    while (prev_node != NULL) {
        node = prev_node->next;
        if (list->destroy != NULL) {
            (list->destroy)(prev_node->data);
        }
        free(prev_node->data);
        free(prev_node);
        prev_node = node;
    }

    list->head = NULL;
    list->destroy = NULL;
    list->count = 0;
    list->item_size = 0;
}
```

This implementation is just a glimpse of how `malloc`, `free` and some memory management can be used to create a linked list that holds data of variable size.

Hope this article was useful!
